  1. Apr 2024
    1. Reviewer #1 (Public Review):

      Warming and precipitation regime change significantly influence both above-ground and below-ground processes across Earth's ecosystems. Soil microbial communities, which underpin the biogeochemical processes that often shape ecosystem function, are no exception to this, and although research shows they can adapt to this warming, population dynamics and ecophysiological responses to these disturbances are not currently known. The Qinghai-Tibet Plateau, the Third Pole of the Earth, is considered among the most sensitive ecosystems to climate change. The manuscript describes an integrated, trait-based understanding of these dynamics with the qSIP data. The experimental design and methods appear to be of sufficient quality. The data and analyses are of great value to the larger microbial ecological community and may help advance our understanding of how microbial systems will respond to global change. There are very few studies in which the growth rates of bacterial populations from multifactorial manipulation experiments on the Qinghai-Tibet Plateau have been investigated via qSIP, and the large quantity of data that comprises the study described in this manuscript will substantially advance our knowledge of bacterial responses to warming and precipitation manipulations.

  2. Mar 2024
    1. eLife assessment

      This study presents a useful deep learning-based inter-protein contact prediction method named PLMGraph-Inter which combines protein language models and geometric graphs. The evidence supporting the claims of the authors is solid. The authors show that their approach may be used in cases where AlphaFold-Multimer performs poorly. This work will be of interest to researchers working on protein complex structure prediction, particularly when accurate experimental structures are available for one or both of the monomers in isolation.

    1. eLife assessment

      This important paper provides solid evidence for an alternative conceptualization of the functional role of the place and grid cell network in the medial temporal lobe for memory as opposed to spatial processing or navigation. The theory accounts for many experimental results and generates predictions for future studies. The theory's simplicity and potential explanatory power will be of interest to researchers in this field. The impact of the work at present is limited by insufficient evidence for the advantage of this model over prior models, especially as the theory does not appear to fit with some well-established existing data.

    2. Author Response

      OVERVIEW OF RESPONSE TO REVIEWS

      I thank the three anonymous reviewers for providing well-informed, constructive feedback on the initial version of this manuscript. Based on their comments I will revise the manuscript and hopefully improve it in several ways. I expected a great deal of resistance to the ideas proposed in this model because they break from traditional approaches. One of my goals in developing this model was to argue for a paradigm shift regarding the concept of a “receptive field”. Experimentally, the receptive field is defined as the set of preferred environmental sensory circumstances that cause a neuron to become highly active. Traditional interpretation of receptive fields implicitly assumes that the environmental circumstances that give rise to the receptive field do so in a purely bottom-up fashion (the cell is “receiving” its field), in which case the receptive field specifies the function of the cell. In other words, the receptive field is what the cell does. However, some brain regions (e.g., entorhinal cortex) receive substantial feedback from downstream regions (e.g., hippocampus), and feedback can play an important role in determining the receptive field. As applied to a memory account of MTL, this feedback is memory retrieval and reactivation. Thus, the multifield spatial response of grid cells doesn’t necessarily mean that their function is spatial. Consideration of bottom-up versus top-down signals gives rise to the proposal that the bottom-up preference of many grid cells is some non-spatial attribute even though they exhibit a spatial receptive field owing to retrieval in specific locations.

      One thing I will emphasize in a revision is that this model can address findings in the vast literature on learning, memory, and consolidation. The question asked in this study is whether a memory model can also explain the rodent navigation literature. This is not an attempt to provide definitive evidence that this is a better account of the rodent navigation literature. Instead, the goal is to model the rodent navigation literature even though this is a memory model rather than a spatial/navigation model. Nevertheless, within the domain of rodent spatial/navigation, this model makes different predictions/explanations than spatial/navigation models. For instance, this is the only model predicting that many grid cells with spatial receptive fields are non-spatial (see predictions in Box 1). As reviewed in Box 1, this is the only model that can explain why head direction conjunctive grid cells become head direction cells in the absence of hippocampal feedback and it is the only model that can explain why some grid cells are also sensitive to sound frequency (see several other unique explanations in Box 1).

      This study is an attempt to unify the spatial/navigation and learning/memory literatures with a relatively simple model. Given the simplicity of the model, there are important findings that the model cannot address -- it is not that the model makes the wrong predictions but rather that it makes no predictions. The role of running speed is one such variable for which the model makes no predictions. Similarly, because the model is a rate-coded model rather than a model of oscillating spiking neurons, it makes no predictions regarding theta oscillations. The model is an account of learning and memory for an adult animal, and it makes no predictions regarding the developmental or evolutionary time course of different cell types. This model contains several purely spatial representations such as border cells, head direction cells, and head direction conjunctive grid cells. In evolution and/or in development, it may be that these purely spatial cell types emerged first, followed by the evolution and/or development of non-spatial cell types. However, this does not invalidate the model. Instead, this is a model for an adult animal that has both episodic memory capabilities and spatial navigation capabilities, irrespective of the order in which these capabilities emerged.

      Grid cell models that are purely spatial are agnostic regarding the thousands of findings in the literature on memory, learning, and consolidation whereas this model can potentially unify the learning/memory and spatial/navigation literatures. The reason to prefer this model is parsimony. Rather than needing to develop a theory of memory that is separate from a theory of spatial navigation, it might be possible to address both literatures with a unified account. There are other grid cell models that can explain non-spatial grid-like responses (Mok & Love, 2019; Rodríguez‐Domínguez & Caplan, 2019; Stachenfeld et al., 2017; Wei et al., 2015) and these models may be similarly positioned to explain memory results. However, these models assume that grid cells exhibiting spatial receptive fields serve the function of identifying positions in the environment (i.e., their function is spatial). As such, these models do not explain why most of the input to rodent hippocampus appears to be spatial (these models would need to assume that rodent hippocampus is almost entirely concerned with spatial navigation). This account provides an answer to this conundrum by proposing that grid cells with spatial receptive fields have been misclassified as spatial. Below I give responses to some of the specific comments made by reviewers, grouping these comments by topic:

      COMMENTS RELATED TO THE NEED/MOTIVATION FOR THIS MODEL

      In a revision, I will clarify that the non-spatial MTL cell types that are routinely found in primate and human studies are fully compatible with this model. The reported simulations are focused on the specific question of how it can be that most mEC and hippocampal cell types in the rodent literature appear to be spatial. It is known that perirhinal cortex is not spatial. However, entorhinal cortex is the gateway to hippocampus. If the hippocampus has the capacity to represent non-spatial memories, it must receive non-spatial input from entorhinal cortex. These simulations suggest that characterization of the rodent mEC as primarily spatial might be incorrect if most grid cells (except perhaps head direction conjunctive grid cells) have been mischaracterized as spatial.

      Lateral entorhinal cortex also projects to hippocampus, and one reviewer asks about the distinction between lateral versus medial entorhinal cortex. From this memory perspective, the important question is which part of the entorhinal cortex represents the non-spatial attributes common to the entire recording session, under the assumption that the animal is creating and retrieving memories during recording. If these non-spatial attributes are represented in lateral EC, there would be grid cells in lateral EC (but these are not found). There is evidence that lateral EC cells respond selectively in relation to objects (Deshmukh & Knierim, 2011), but in a typical rodent navigation study there are no objects in the enclosure.

      One reviewer asks whether this model is built to explain the existing data or whether the assumptions of this model are made for theoretical reasons. The BVC model (Barry et al., 2006), which is a precursor to this model, is a theoretically efficient representation of space that could support place coding. If the distances to different borders are known, it’s not clear why the MTL also needs the two-dimensional Fourier-like representation provided by grid cells. This gives rise to the proposal that grid cells with spatial receptive fields are serving some function other than representing space. In the proposed model, the precise hexagonal arrangement of grid cells indicates a property that is found everywhere in the enclosure (i.e., a “tiling” of knowledge for where the property can be found). This arrangement arises from the well-documented learning process termed “differentiation” in the memory literature (McClelland & Chappell, 1998; Norman & O’Reilly, 2003; Shiffrin & Steyvers, 1997), which highlights differences between memories to avoid interference and confusion.

      CONCERNS RELATED TO LIMITATIONS AND CONFLICTING RESULTS

      One reviewer points out that individual grid cells will typically reveal a grid pattern regardless of the environmental circumstances, which, according to this model, indicates that all such circumstances have the same non-spatial attribute. This might seem strange at first, but I suggest that there is a great deal of “sameness” to the environments used in the published rodent navigation experiments. For instance, as far as I’m aware, the animal is never allowed to interact with other animals during spatial navigation recording. Furthermore, the animal is always attached to wires during recording. The internal state of the animal (fear, aloneness, the noise of electronics, etc.) is likely similar across all recording situations and attributes of this internal state are likely represented in the hippocampus as well as in the regions that provide excitatory drive to hippocampus. The claim of this model is that the grid cells are “tagging” different navigation enclosures as places where these things happen (fear, aloneness, electronics, metal floor, no objects, etc.). The interesting question is what happens when the animal is allowed to navigate in a more naturalistic setting that includes varied objects, varied food sources, varied surfaces, other animals, etc. Do grid cells persist in such a naturalistic environment? Or do they lose their regularity, or even become silent, considering that there is no longer a uniformity to the non-spatial attributes? The results of Barry et al. (2012) demonstrate that the grid pattern expands and becomes less regular in a novel environment. Nevertheless, the novel environment in that study was uncluttered rather than naturalistic. It remains to be seen what will happen with a truly naturalistic environment.

      One reviewer asks how this model relates to non-grid multifield cells found in mEC (Diehl et al., 2017; see also the irregularly arranged 3D multifield cells reported by Ginosar et al., 2021). A full explanation of these cells would require a new simulation study. In a revision, I will discuss these cells, which reveal a consistent multifield spatial receptive field and yet the multiple fields are irregular in their arrangement rather than a precise hexagonal lattice. On this memory account, precise hexagonal arrangement of memories is something that occurs when there is a non-spatial attribute found throughout the enclosure. However, in a typical rodent navigation study, there may be some non-spatial attributes that are not found everywhere in the enclosure. For instance, consider the set of locations within the enclosure that afford a particular view of something outside of the enclosure or the set of locations corresponding to remembered episodic events (e.g., memory for the location where the animal first entered the enclosure). For non-spatial characteristics that are found in some locations but not others within the enclosure, the cells representing those non-spatial attributes should reveal multifield firing at irregular locations, reflecting the subset of locations associated with the non-spatial attribute.

      One reviewer suggests that this model cannot explain the finding that grid fields become warped (e.g., grid fields arranged in an ellipse rather than a circle) in the same manner that the enclosure is warped when a wall is moved (Barry et al., 2007). The way in which I would simulate this result would be to assume that the change in the boundary location was too modest to be noticed by the animal. Because the distances are calculated relative to the borders, an unnoticed change in the border would not change the model in terms of the grid field as measured by proportional distances between borders. However, because the real-world Euclidean positions of the border are changed, the grid fields would be changed in terms of real-world coordinates. This is what I was referring to in the paper when I wrote “For instance, perhaps one egocentric/allocentric pair of mEC grid modules is based on head direction (viewpoint) in remembered positions relative to the enclosure borders whereas a different egocentric/allocentric pair is based on head direction in remembered positions relative to landmarks exterior to the enclosure. This might explain why a deformation of the enclosure (moving in one of the walls to form a rectangle rather than a square) caused some of the grid modules but not others to undergo a deformation of the grid pattern in response to the deformation of the enclosure wall (see also Barry et al., 2007). More specifically, if there is one set of non-orthogonal dimensions for enclosure borders and the movement of one wall is too modest to cause global remapping, this would deform the grid modules based on the enclosure border cells.
At the same time, if other grid modules are based on exterior properties (e.g., perhaps border cells in relation to the experimental room rather than the enclosure), then those grid modules would be unperturbed by moving the enclosure wall.” Related to the question of enclosure geometry, the irregularity that can emerge in trapezoid shaped enclosures was discussed in the section of the paper that reads “As seen in Figure 12, because all but one of the place cells was exterior when the simulated animal was constrained to a narrow passage, the hippocampal place cell memories were no longer arranged in a hexagonal grid. This disruption of the grid array for narrow passages might explain the finding that the grid pattern (of grid cells) is disrupted in the thin corner of a trapezoid (Krupic et al., 2015) and disrupted when a previously open enclosure is converted to a hairpin maze by insertion of additional walls within the enclosure (Derdikman et al., 2009).”
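
      The proportional-distance argument in this response can be illustrated with a small worked example. This is a sketch under stated assumptions, not code from the manuscript: the function name `field_positions`, the one-dimensional layout, and the wall positions (100 and 70 units) are hypothetical and chosen only for illustration. It assumes grid fields are stored as proportions of the distance between two opposing borders, so an unnoticed wall movement leaves the proportions unchanged while rescaling the fields in real-world coordinates.

```python
def field_positions(proportions, wall_a, wall_b):
    """Map proportional positions (0 = wall_a, 1 = wall_b) to real-world coordinates.

    Hypothetical helper: the 1D layout and argument names are assumptions
    made for illustration, not part of the published model.
    """
    width = wall_b - wall_a
    return [wall_a + p * width for p in proportions]


# Field centers expressed as proportions of the distance between two borders.
proportions = [0.25, 0.5, 0.75]

# Original enclosure: walls at 0 and 100 (arbitrary units).
original = field_positions(proportions, 0.0, 100.0)

# One wall moved inward to 70; the change is assumed to go unnoticed,
# so the stored proportions stay the same.
compressed = field_positions(proportions, 0.0, 70.0)

print(original)    # field centers in real-world units before the wall moves
print(compressed)  # same proportions, compressed real-world spacing
```

      In this toy case the field spacing shrinks from 25 to 17.5 units along the compressed axis only, which is the kind of axis-specific rescaling the response attributes to border-referenced grid modules.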

      CONCERNS THAT WILL BE ADDRESSED WITH GREATER CLARIFICATION

      One reviewer asks why a cell representing a non-spatial attribute found everywhere in the enclosure would not fire everywhere in the enclosure. In theory, cells could fire constantly. However, in practice, cells habituate and rapidly reduce their firing rate by an order of magnitude when their preferred stimulus is presented without cessation (Abbott et al., 1997; Tsodyks & Markram, 1997). After habituation, the firing rate of the cell fluctuates with minor variation in the strength of the excitatory drive. In other words, habituation allows the cell to become sensitive to changes in the excitatory drive (Huber & O’Reilly, 2003). Thus, if there is stronger top-down memory feedback in some locations as compared to others, the cell will fire at a higher rate in those remembered locations. In brief, when faced with constant excitatory drive, the cell accommodates and becomes sensitive to change in the magnitude of the excitatory drive.
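
      The habituation argument can be made concrete with a toy simulation. The sketch below is not the model from the manuscript; it is a heavily simplified, discrete-time resource-depletion rule loosely inspired by the synaptic depression literature cited above, and the function name `simulate_habituation` and the parameter values (`depletion`, `recovery`) are assumptions chosen so that constant drive reduces the response by roughly an order of magnitude.

```python
def simulate_habituation(drive, depletion=0.5, recovery=0.05):
    """Toy synaptic-depression rule (illustrative assumption, not the paper's model).

    The output at each step is the drive scaled by the available synaptic
    resources; the resources are used up in proportion to the output and
    recover slowly toward 1.0.
    """
    resources = 1.0
    rates = []
    for d in drive:
        rate = d * resources
        rates.append(rate)
        resources += recovery * (1.0 - resources) - depletion * rate
    return rates


# Constant drive for 200 steps, then a modest 20% increase in drive.
drive = [1.0] * 200 + [1.2] * 50
rates = simulate_habituation(drive)

print(rates[0])    # initial response to the preferred input
print(rates[199])  # habituated response under constant drive
print(rates[200])  # response at the moment the drive increases
```

      With these hypothetical parameters, the habituated response is about one eleventh of the initial response, yet the response still rises at the moment the drive increases, which is the sense in which an accommodated cell remains sensitive to changes in excitatory drive.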

      One reviewer asks for greater clarification regarding the simulation result of immediate stability for grid cells but not place cells. In a revision, I will provide a video showing a sped-up bird's-eye view of the place cell memories for the 3D simulations that include head direction, showing the manner in which memories tend to linger in some locations more than others as they consolidate. This behavior was explained in the text that reads “Because the non-spatial cell’s grid field reflects on-average memory positions during the recording session (i.e., the locations where the non-spatial attribute is more often remembered, even if the locations of the memories are shifting), the grid fields for the non-spatial cells are immediately apparent, reflecting the tendency of place cells to linger in some locations as compared to other locations during consolidation. More specifically, the place cells tend to linger at the peaks and troughs of the border cell tuning functions (see the explanation above regarding the tendency of the grid to align with border cell dimensions). By analogy, imagine a time-lapsed bird's-eye view of cars traversing the city-block structure of a densely populated city; this on-average view would show a higher density of cars at the cross-street junctions owing to their tendency to become temporarily stuck at stoplights. However, with additional learning and consolidation, the place cells stabilize their positions (e.g., the cars stop traveling), producing a consistent grid field for the head direction conjunctive grid cells.” The text describing why some locations are more “sticky” than others reads “Additional analyses revealed that this tendency to align with border cell dimensions is caused by weight normalization (Step 6 in the pseudocode). Specifically, connection weights cannot be updated above their maximum nor below their minimum allowed values.
This results in a slight tendency for consolidated place cell memories to settle at one of the three peak values or three trough values of the sine wave basis set. This “stickiness” at one of 6 peak or trough values for each basis set is very slight and only occurred after many consolidation steps. In terms of biological systems, there is an obvious lower-bound for excitatory connections (i.e., it is not possible to have an excitatory weight connection that is less than zero), but it is not clear if there is an upper-bound. Nevertheless, it is common practice with deep learning models to include an upper-bound for connection weights because this reduces overfitting (Srivastava et al., 2014) and there may be similar pressures for biological systems to avoid excessively strong connections.”
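
      The bounded-weight rule described in the quoted text can be sketched in a few lines. This is a minimal illustration, not the pseudocode from the manuscript: the function name `update_weights` and the bounds of 0.0 and 1.0 are assumptions, and the actual Step 6 normalization may differ in detail. The sketch shows only the stated constraint that an update cannot push a weight above its maximum or below its minimum.

```python
def update_weights(weights, deltas, w_min=0.0, w_max=1.0):
    """Apply weight changes, then clip each weight to [w_min, w_max].

    Hypothetical helper: the bounds and the simple additive update are
    assumptions for illustration only.
    """
    return [min(w_max, max(w_min, w + d)) for w, d in zip(weights, deltas)]


weights = [0.9, 0.1, 0.5]
deltas = [0.5, -0.5, 0.25]

# The first weight saturates at the upper bound, the second at the lower
# bound, and the third is updated without clipping.
print(update_weights(weights, deltas))
```

      Weights that reach a bound stop moving in that direction, which is one simple way that values near the extremes of a tuning function could become slightly "sticky" over many updates.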

      One reviewer points out that border cells are not typically active in the center of the enclosure. However, the model can be built without assuming between-border cells (early simulations with the model did not make this assumption). Regarding this issue, the text reads “Unlike the BVC model, the boundary cell representation is sparsely populated using a basis set of three cells for each of the three dimensions (i.e., 9 cells in total), such that for each of the three non-orthogonal orientations, one cell captures one border, another the opposite border, and the third cell captures positions between the opposing borders (Solstad et al., 2008). However, this is not a core assumption, and it is possible to configure the model with border cell configurations that contain two opponent border cells per dimension, without needing to assume that any cells prefer positions between the borders (with the current parameters, the model predicts there will be two border cells for each between-border cell). Similarly, it is possible to configure the model with more than 3 cells for each dimension (i.e., multiple cells representing positions between the borders).” The Solstad paper found a few cells that responded in positions between borders, but perhaps not as many as 1 out of 3 cells, such as this particular model simulation predicts. If the paucity of between-border cells is a crucial data point, the model can be reconfigured with opponent-border cells without any between-border cells. The reason that 3 border cells were used rather than 2 opponent border cells was for simplicity. Because 3 head direction cells were used to capture the face-centered cubic packing of memories, the simulation also used 3 border cells per dimension to allow a common linear sum metric when conjoining dimensions to form memories.
If the border dimensions used 2 cells while head direction used 3 cells, a dimensional weighting scheme would be needed to allow this mixing of “apples and oranges” in terms of distances in the 3D space that includes head direction.

      REFERENCES

      Abbott, L. F., Varela, J. A., Sen, K., & Nelson, S. B. (1997). Synaptic depression and cortical gain control. Science, 275(5297), 220–224.

      Barry, C., Ginzberg, L. L., O’Keefe, J., & Burgess, N. (2012). Grid cell firing patterns signal environmental novelty by expansion. Proceedings of the National Academy of Sciences of the United States of America, 109(43), 17687–17692. https://doi.org/10.1073/pnas.1209918109

      Barry, C., Hayman, R., Burgess, N., & Jeffery, K. J. (2007). Experience-dependent rescaling of entorhinal grids. Nature Neuroscience, 10(6), 682–684.

      Barry, C., Lever, C., Hayman, R., Hartley, T., Burton, S., O’Keefe, J., Jeffery, K., & Burgess, N. (2006). The boundary vector cell model of place cell firing and spatial memory. Reviews in the Neurosciences, 17(1–2), 71–98.

      Derdikman, D., Whitlock, J. R., Tsao, A., Fyhn, M., Hafting, T., Moser, M. B., & Moser, E. I. (2009). Fragmentation of grid cell maps in a multicompartment environment. Nature Neuroscience, 12(10), 1325–1333. https://doi.org/10.1038/nn.2396

      Deshmukh, S. S., & Knierim, J. J. (2011). Representation of non-spatial and spatial information in the lateral entorhinal cortex. Frontiers in Behavioral Neuroscience, 5, 69.

      Diehl, G. W., Hon, O. J., Leutgeb, S., & Leutgeb, J. K. (2017). Grid and nongrid cells in medial entorhinal cortex represent spatial location and environmental features with complementary coding schemes. Neuron, 94(1), 83–92.e6.

      Ginosar, G., Aljadeff, J., Burak, Y., Sompolinsky, H., Las, L., & Ulanovsky, N. (2021). Locally ordered representation of 3D space in the entorhinal cortex. Nature, 596(7872), 404–409.

      Huber, D. E., & O’Reilly, R. C. (2003). Persistence and accommodation in short-term priming and other perceptual paradigms: Temporal segregation through synaptic depression. Cognitive Science, 27(3), 403–430. https://doi.org/10.1207/s15516709cog2703_4

      Krupic, J., Bauza, M., Burton, S., Barry, C., & O’Keefe, J. (2015). Grid cell symmetry is shaped by environmental geometry. Nature, 518(7538), 232–235.

      McClelland, J. L., & Chappell, M. (1998). Familiarity breeds differentiation: A subjective-likelihood approach to the effects of experience in recognition memory. Psychological Review, 105(4), 724–760.

      Mok, R. M., & Love, B. C. (2019). A non-spatial account of place and grid cells based on clustering models of concept learning. Nature Communications, 10(1), 5685.

      Norman, K. A., & O’Reilly, R. C. (2003). Modeling hippocampal and neocortical contributions to recognition memory: A complementary-learning-systems approach. Psychological Review, 110(4), 611–646.

      Rodríguez‐Domínguez, U., & Caplan, J. B. (2019). A hexagonal Fourier model of grid cells. Hippocampus, 29(1), 37–45.

      Shiffrin, R. M., & Steyvers, M. (1997). A model for recognition memory: REM - retrieving effectively from memory. Psychonomic Bulletin & Review, 4, 145–166.

      Solstad, T., Boccara, C. N., Kropff, E., Moser, M. B., & Moser, E. I. (2008). Representation of geometric borders in the entorhinal cortex. Science, 322(5909), 1865–1868. https://doi.org/10.1126/science.1166466

      Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929–1958.

      Stachenfeld, K. L., Botvinick, M. M., & Gershman, S. J. (2017). The hippocampus as a predictive map. Nature Neuroscience, 20(11), 1643–1653.

      Tsodyks, M. V., & Markram, H. (1997). The neural code between neocortical pyramidal neurons depends on neurotransmitter release probability. Proceedings of the National Academy of Sciences of the United States of America, 94(2), 719–723. https://doi.org/10.1073/pnas.94.2.719

      Wei, X.-X., Prentice, J., & Balasubramanian, V. (2015). A principle of economy predicts the functional architecture of grid cells. eLife, 4, e08362.

    3. Reviewer #3 (Public Review):

      The author presents a novel theory and computational model suggesting that grid cells do not encode space, but rather encode non-spatial attributes. Place cells in turn encode memories of where those specific attributes occurred. The theory accounts for many experimental results and generates useful predictions for future studies. The model's simplicity and potential explanatory power will interest others in the field, though there are a number of concerns that should first be addressed.

      A crucial assumption of the model is that the content of experience must be constant in space. It's difficult to imagine a real-world example that satisfies this assumption. Odors and sounds are used as examples. While they are often more spatially diffuse than objects on the ground, odors and sounds have sources that are readily detectable. Animals can easily navigate to a food source or to a vocalizing conspecific. This assumption is especially problematic because it predicts that all grid cells should become silent when their preferred non-spatial attribute (e.g. a specific odor) is missing. I'm not aware of any experimental data showing that grid cells become silent. On the contrary, grid cells are known to remain active across all contexts that have been tested, including across sleep/wake states. Unlike place cells, grid cells do not seem to turn off. Since grid cells are active in all contexts, their preferred attribute must also be present in all contexts, and therefore they would not convey any information about the specific content of an experience.

      The proposed novelty of this theory is that other models all assume that grid cells encode space. This isn't quite true of models based on continuous attractor networks, the discussion of which is notably absent. More specifically, these models focus on the importance of intrinsic dynamics within the entorhinal cortex in generating the grid pattern. While this firing pattern is aligned to space during navigation and therefore can be used as a representation of that space, the neural dynamics are preserved even during sleep. Similarly, it is because the grid pattern does not strictly encode physical space that grid-like signals are also observed in relation to other two-dimensional continuous variables.

      The use of border cells or boundary vector cells as the main (or only) source of spatial information in the hippocampus is not well supported by experimental data. Border cells in the entorhinal cortex are not active in the center of an environment. Boundary-vector cells can fire farther away from the walls but are not found in the entorhinal cortex. They are located in the subiculum, a major output of the hippocampus. While the entorhinal-hippocampal circuit is a loop, the route from boundary-vector cells to place cells is much less clear than from grid cells. Moreover, both border cells and boundary-vector cells (which are conflated in this paper) comprise a small population of neurons compared to grid cells.

    4. Reviewer #1 (Public Review):

      Huber proposes a theory where the role of the medial temporal lobe (MTL) is memory, where properties of spatial cells in the MTL can be explained through memory function rather than spatial processing or navigation. Instantiating the theory through a computational model, the author shows that many empirical phenomena of spatial cells can be captured, and may be better accounted for through a memory theory. It is an impressive computational account of MTL cells with a lot of theoretical reasoning and aims to tightly relate to various spatial cell data.

      In general, the paper is well written, but likely due to the complexity, there are various aspects of the paper that are difficult to understand. One point is that it is not entirely clear to me that it is a convincing demonstration of purely memory rather than navigation, but rather an account of the findings through the lens of memory. Below, I raise several big-picture theoretical questions. I also have some clarification questions about the model (where I also have some theoretical question marks - due to not achieving a full understanding).

      (1) Although the theory is based on memory, it is also based on spatially selective cells. Not all cells in the hippocampus that play a role in memory fulfill the criteria of place/HD/border/grid cells. E.g., work from the Tonegawa and Buzsáki labs does not focus on only those cells, and there are certainly a lot of non-pure spatial cells in monkeys (Martinez-Trujillo) and humans (iEEG). Does the author mainly focus on saying that "spatial cells" are memory, without accounting for non-spatial memory cells? This seems to be an incomplete account of memory - which is fine, but the way the model is set up suggests that *all* memory is place (what/where) and non-spatial attributes ("grid") - but cells that don't fulfil these criteria in MTL (Diehl et al., 2017, Neuron; non-grid cells; Schaeffer et al., 2022, ICML; Luo et al., 2024, bioRxiv) certainly contribute to memory, and even navigation. This is also related to the question of whether these cell definitions matter at all (Luo et al., 2024).

      The author notes "However, this memory conjunction view of the MTL must be reconciled with the rodent electrophysiology finding that most cells in MTL appear to have receptive fields related to some aspect of spatial navigation (Boccara et al., 2010; Grieves & Jeffery, 2017). The paucity of non-spatial cells in MTL could be explained if grid cells have been mischaracterized as spatial." Is the author mainly talking about rodent work?

      (2) Related to the last point, how about non-grid multi-field mEC cells? In theory, these also should be the same; but the author only presents perfect-looking grid cells. In empirical work, clearly, this is not the case, and many mEC cells are multi-field non-grid cells (Diehl et al., 2017). Does the model find these cells? Do they play a different role?

      As noted by the author "Because the non-spatial attributes are constant throughout the two-dimensional surface, this results in an array of discrete memory locations that are approximately hexagonal (as explained in the Model Methods, an "online" memory consolidation process employing pattern separation rapidly turns an approximately hexagonal array into one that is precisely hexagonal). "

      If they are indeed all precisely hexagonal, does that mean the model doesn't have non-grid spatial cells?

      (3) Theoretical reasons for why the model is put together this way, and why grid cells must be coding a non-spatial attribute: Is this account more data-driven (fits the data so formulated this way), or is it theoretical - there is a reason why place, border, grid cells are formulated to be like this. For example, is it an efficient way to code these variables? It can be both, like how the BVC model makes theoretical sense that you can use boundaries to determine a specific location (and so place cell), but also works (creates realistic place cells).

      But in this case, the purpose of grid cell coding a non-spatial attribute, and having some kind of system where it doesn't fire at all locations seems a little arbitrary. If it's not encoding a spatial attribute, it doesn't have to have a spatial field. For example, it could fire in the whole arena - which some cells do (and don't pass the criteria of spatial cells as they are not spatially "selective" to another location, related to above).

      (4) Why are grid cells given such a large role for encoding non-spatial attributes? If anything, shouldn't it be lateral EC or perirhinal cortex? Of course, they both could, but there is less reason to think this, at least for rodent mEC.

      (5) Clarification: why do place cells and grid cells differ in terms of stability in the model? Place cells are not stable initially but grid cells come out immediately. They seem directly connected, so it is a bit unclear why, especially if place cell feedback leads to grid cell fields. There is an explanation in the text - based on grid cells coding the on-average memories, but these should be based on place cell inputs as well. So how is it that place fields are unstable while grid fields do not move at all? I wonder if a set of images or videos (gifs) showing the differences in spatial learning would be nice and clarify this point.

      (6) Other predictions. Clearly, the model makes many interesting (and quite specific!) predictions. But does it make some known simple predictions?
      • More place cells at rewarded (or more visited) locations. Some empirical researchers seem to think this is not as obvious as it seems (e.g., Duvelle et al., 2019, JoN; Nyberg et al., 2021, Neuron Review).
      • Grid cell fields move toward reward (Butler et al., 2019; Boccara et al., 2019).
      • Grid cells deform in trapezoids (Krupic et al., 2015) and change in environments like mazes (Derdikman et al., 2014).

    5. Reviewer #2 (Public Review):

      The manuscript describes a new framework for thinking about the place and grid cell system in the hippocampus and entorhinal cortex in which these cells are fundamentally involved in supporting non-spatial information coding. If this framework were shown to be correct, it could have high impact because it would suggest a completely new way of thinking about the mammalian memory system in which this system is non-spatial. Although this idea is intriguing and thought-provoking, a very significant caveat is that the paper does not provide evidence that specifically supports its framework and rules out the alternate interpretations. Thus, although the work provides interesting new ideas, it leaves the reader with more questions than answers because it does not rule out any earlier ideas.

      Basically, the strongest claim in the paper, that grid cells are inherently non-spatial, cannot be specifically evaluated versus existing frameworks on the basis of the evidence that is shown here. If, for example, the author had provided behavioral experiments showing that human memory encoding/retrieval performance shifts in relation to the predictions of the model following changes in the environment, it would have been exciting because it could support the author's reconceptualization of this system. But in its current form, the paper merely shows that a new type of model is capable of explaining the existing findings. There are not adequate data or results to show that the new model is a significantly better fit to the data compared to earlier models, which limits the impact of the work. In fact, there are some key data points in which the earlier models seem to better fit the data.

      Overall, I would be more convinced that the findings from the paper are impactful if the author showed specific animal memory behavioral results that were only supported by their memory model but not by a purely spatial model. Perhaps the author could run new experiments to show that there are specific patterns of human or animal behavior that are only explained by their memory model and not by earlier models. But in its current form, I cannot rule out the existing frameworks and I believe some of the claims in this regard are overstated.

      In addition to the broader concerns noted above regarding the absence of any specific behavioral data that are explained by their model and not by existing spatial models, I am additionally concerned that this manuscript does not explain a number of important key empirical results in the rodent grid cell literature.

      * The paper does not fully take into account all the findings regarding grid cells, some of which very clearly show spatial processing in this system. For example, findings on grid-by-direction cells (e.g., Sargolini et al. 2006) would seem to suggest that the entorhinal grid system is very specifically spatial and related to path integration. Why would grid-by-direction cells be present and intertwined with grid cells in the author's memory-related reconceptualization? It seems to me that the existence of grid-by-direction cells is strong evidence that at least part of this network is specifically spatial.

      * I am also concerned that the paper does not do enough to address findings regarding how the elliptical shape of grid fields shifts when boundaries of an environment compress in one direction or change shape/angles (Lever et al.; Krupic et al.). Those studies show compression in grid fields based on boundary position, and I don't see how the author's model would explain these findings.

      * Are findings regarding speed modulation of grid cells problematic for the paper's memory results?

      * A further issue is that the paper does not seem to adequately address developmental findings related to the timecourses of the emergence of different cell types. In their simulation, researchers demonstrate the immediate emergence of grid fields in a novel environment, while noting that the stabilization of place cell positions takes time. However, these simulation findings contradict previous empirical developmental studies (Langston et al., 2010). Those studies showed that head direction cells show the earliest development of spatial response, followed by the appearance of place cells at a similar developmental stage. In contrast, grid cells emerge later in this developmental sequence. The gradual improvement in spatial stability in firing patterns likely plays a crucial role in the developmental trajectory of grid cells. Contrary to the model simulation, grid cells emerge later than place cells and head direction cells, yet they also hold significance in spatial mapping.

      * The model simulations suggest that certain grid patterns are acquired more gradually than others. For instance, egocentric grid cells require the stabilization of place cell memories amidst ongoing consolidation, while allocentric grid cells tend to reflect average place field positions. However, these findings seemingly conflict with empirical studies, particularly those on the conjunctive representation of distance and direction in the earliest grid cells. A previous study found no significant differences, relative to adults, in grid cells and grid cells with directional correlates across these age groups (Wills et al., 2012). This indicates that the combined representation of distance and direction in single mEC cells is present from the earliest ages at which grid cells emerge.

    1. eLife assessment

      This is a useful study describing an implementation of awake mouse fMRI with implanted head coils at high fields. The evidence presented is solid but could with some work become stronger. In particular, the authors need to better contextualize their work with the existing literature on awake fMRI, include further details regarding their experimental methods, and further discuss some of their unexpected (but potentially novel and interesting) brain activations.

    2. Reviewer #1 (Public Review):

      Summary:

      The authors bring together implanted radiofrequency coils, high-field MRI imaging, awake animal imaging, and sensory stimulation methods in a technological demonstration. The results are very detailed descriptions of the sensory systems under investigation.

      Strengths:

      - The maps are qualitatively excellent for rodent whole-brain imaging.
      - The design of the holder and the coil is pretty clever.

      Weaknesses:

      - Some unexpected regions appear on the whole brain maps, and the discussion of these regions is succinct.
      - The authors do not make the work and effort to train the animals and average the data from several hundred trials apparent enough. This is important for any reader who would like to consider implementing this technology.
      - The data is not available. This does not let the readers make their own assessment of the results.

    3. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Hike et al. entitled 'High-resolution awake mouse fMRI at 14 Tesla' describes the implementation of awake mouse BOLD-fMRI at high field. This work is timely as the field of mouse fMRI is working toward collecting high-quality data from awake animals. Imaging awake subjects offers opportunities to study brain function that are otherwise not possible under the more common anesthetized conditions, not to mention the confounding effects that anesthesia has on neurovascular coupling. What has made progress in this area slow (relative to other imaging approaches like optical imaging) is the environment within the MRI scanner (high acoustic noise) - as well as the intolerance of head and body motion. This work adds to a relatively small, but quickly growing literature on awake mouse fMRI. The findings in the study include testing of an implanted head-coil (for MRI data reception). Two designs are described and the SNR of these units at 9.4T and 14T is reported. Further, responses to visual as well as whisker stimulation recorded in acclimated awake mice are shown. The most interesting finding, and most novel, is the observation that mice seem to learn to anticipate the presentation of the stimulus - as demonstrated by activations evident ~6 seconds prior to the presentation of the stimulus when stimuli are delivered at regular intervals (but not when stimuli are presented at random intervals). These kinds of studies are very challenging to do. The surgical preparation and length of time invested into training animals are grueling. I also see this work as a step in the right direction and evidence of the foundations for lots of interesting future work. However, I also found a few shortcomings listed below.

      Weaknesses:

      (1) The surface coil, although offering a great SNR boost at the surface, ultimately comes at a cost of lower SNR in deeper more removed brain regions in comparison to commercially available Bruker coils (at room temperature). This should be quantified. A rough comparison in SNR is drawn between the implanted coils and the Bruker Cryoprobe - this should be a quantitative comparison (if possible) - including any differences in SNR in deeper brain structures. There are drawbacks to the Cryoprobe, which can be discussed, but a more thorough comparison between the implanted coils, and other existing options should be provided (the Cryoprobe has been used previously in awake mouse experiments). Further, the details of how to build the implanted coils should be provided (shared) - this should include a parts list as well as detailed instructions on how to build the units. Also, how expensive are they? And can they be reused?

      (2) In the introduction, the authors state that "Awake mouse fMRI has been well investigated". I disagree with this statement and others in the manuscript that give the reader the impression that awake experiments are not a challenging and unresolved approach to fMRI experiments in mice (or rodents). Although there are multiple labs (maybe 15 worldwide) that have conducted awake mouse experiments (with varying degrees of success/thoroughness), we are far from a standardized approach. This is a strength of the current work and should be highlighted as such. I encourage the authors to read the recent systematic review that was published on this topic in Cerebral Cortex by Mandino et al. There are several elements in there that should influence the tone of this piece including awake mouse implementations with the Bruker Cryoprobe, prevalence of surgical preparations, and evaluations of stress.

      (3) The authors also comment on implanted coils reducing animal stress - I don't know where this comment is coming from, as this has not been reported in the literature (to my knowledge) and the authors don't appear to have evaluated stress in their mice.

      (4) Following on the above point, measures of motion, stress, and more details on the acclimation procedure that was implemented in this study should be included.

      (5) It wasn't clear to me at what times the loop versus "Figure 8" coil was being used, nor how many mice (or how much data) were included in each experiment/plot. There is also no mention of biological sex.

      (6) Building on the points above, the manuscript overall lacks experimental detail (especially since the format has the results prior to the methods).

      (7) An observation is made in the manuscript that there is an appreciable amount of negative BOLD signal. The authors speculate that this may come from astrocyte-mediated BOLD during brain state changes (and cite anesthetized rat and non-human primate experiments). This is very strange to me. First, the negative BOLD signal is not plotted (please do this). Further, there are studies in awake mice that measure astrocyte activation eliciting positive BOLD responses (see Takata et al. in Glia, 2017).

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, Chen et al. reported that the core binding factor beta (Cbfβ), a heterodimeric subunit of the RUNX family transcription factors (TFs), is crucial in maintaining cartilage homeostasis and counteracting traumatic OA pathology. Using mouse models in which Cbfβ is conditionally inactivated in the Col2a1+ and Acan+ cells, the authors claimed that Cbfβ ablation led to articular cartilage (AC) degeneration, which is associated with aberrant cartilage gene expression and chondrocyte signaling, particularly the elevated Wnt/Catenin and the decreased Hippo/YAP and TGFβ signaling. The authors further showed that Cbfβ transcripts are decreased in human OA cartilage, and sustaining Cbfβ expression in mouse knee joints mitigated the severity of surgery-evoked OA.

      On the whole, the work reported is interesting and exciting. Genetic and biochemical data support key statements. Both in vivo and in vitro experiments were well designed with proper controls; semiquantitative data were digitalized and processed for statistical significance. Furthermore, new findings were adequately discussed in contrast to the current available knowledge. However, the conceptual novelty of this study is slightly compromised by recent publications showing that Cbfβ reduction is associated with OA (Che et al. 2023; Li et al. 2021). Also, the authors claimed that multiple signaling pathways were affected by Cbfβ ablation in cartilage cells; many of them, however, are indirect effects given the nature of Cbfβ as a TF. The authors also showed that pSMAD2/3 and active βCatenin decreased and increased, respectively, upon Cbfβ depletion in the mouse AC cartilage. However, how the deficiency of Cbfβ, a widely expressed TF, affected the posttranslational modification of SMAD2/3 and βCatenin is unclear and needs further discussion. Overall, Cbfβ's role in cartilage and OA pathology is an emerging area of study; the authors provided a body of genetic evidence showing that Cbfβ is indispensable for cartilage homeostasis.

      We thank the reviewer for the positive appraisal of our manuscript. We greatly appreciate the insightful comments and critiques. In accordance with the reviewer’s suggestions, we have thoroughly revised all parts of the manuscript. We are glad that the reviewers considered our work to be of interest, and we are grateful for this opportunity to resubmit our manuscript. With regard to concerns about the novelty of our study, Li et al.’s study only reported the relationship between abnormal Cbfβ expression in human cartilage and osteoarthritis. Che et al.’s study employed Cbfβf/fAggrecan-cre mice, while our study used a novel inducible Cbfβf/fCol2α1-CreERT mouse model. While the Aggrecan-creERT system provides valuable insights into the role of Cbfβ in differentiated cartilage cells and its implications in the advanced stages of osteoarthritis, our current study also used Cbfβf/fCol2α1-CreERT mice to explore the gene's function from a broader perspective. A previous study points out that Col2α1 is expressed in both early and late stages of chondrogenesis, including in skeletal mesenchymal cells, perichondrium and presumptive joint cells, whereas aggrecan is expressed specifically in differentiated chondrocytes (1). However, studies show that not only differentiated chondrocytes but also chondrocyte progenitors are involved in OA pathogenesis (2). In our current study, the Col2α1-CreERT system allowed us to investigate Cbfβ's role not only in mature chondrocytes but also in early chondroprogenitor cells, offering a comprehensive view of Cbfβ’s involvement in cartilage in osteoarthritis. Therefore, the use of the Cbfβf/fCol2α1-CreERT mouse mutant strain was instrumental in expanding our understanding of Cbfβ's multifaceted role in osteoarthritis, highlighting its importance not only in mature cartilage but also in the early stages of cartilage formation and differentiation.
      In addition to the different type of Cre used compared to our previous study, our current study also used a gain-of-function approach in the ACLT-induced OA disease model to understand the potential therapeutic function of Cbfβ under OA pathological conditions. Adding our current findings to our previous research, we can now piece together a more complete picture of Cbfβ's role across the entire spectrum of cartilage development in osteoarthritis.

      We agree with the reviewer that how the deficiency of Cbfβ, a widely expressed TF, affected the posttranslational modification of SMAD2/3 and βCatenin is unclear and needs further exploration. So far there is no clear explanation of this, which is why we used RNA-seq and heatmap analysis to examine the expression of other genes, which could help to uncover the mechanism underlying these results. Interestingly, Che et al.’s result showed that TGFβ signaling (P-Smad3) increased in Cbfβf/fAggrecan-cre mice, while our data showed that TGFβ signaling (both P-Smad3 and Smad3) decreased in Cbfβf/fCol2α1-CreERT mice, as shown in our results in Figure 8. These results were also confirmed by RNA-seq analysis, as shown in the heatmaps in Figure 5.

      These differences could be the result of different mouse ages used in our study and Che et al’s study.

      1. Blaney Davidson EN, van de Loo FA, van den Berg WB, van der Kraan PM. How to build an inducible cartilage-specific transgenic mouse. Arthritis Res Ther. 2014;16(3):210.

      2. Tong L, Yu H, Huang X, Shen J, Xiao G, Chen L, et al. Current understanding of osteoarthritis pathogenesis and relevant new approaches. Bone Res. 2022;10(1):60.

      Reviewer #3 (Public Review):

      The authors comprehensively demonstrated that the Cbfβ gene, which is involved in articular cartilage homeostasis, can promote articular cartilage regeneration and repair in osteoarthritis (OA) through regulating Hippo/YAP signaling, TGF-β signaling, and canonical Wnt signaling. First, the authors demonstrated that the deletion of Cbfβ can induce OA phenotypes, including decreased articular cartilage and osteoblasts and increased osteoclasts and subchondral bone hyperplasia, and induce the early onset of OA. Additionally, the authors showed that the deficiency of Cbfβ in cartilage can increase canonical Wnt signaling and decrease TGF-β and Hippo signaling. Finally, the authors demonstrated that the overexpression of Cbfβ can inhibit Wnt signaling and enhance Hippo/YAP signaling in knee joint articular cartilage of ACLT-induced OA mice and protect against ACLT-induced OA. The manuscript is overall well-constructed, and the authors provided evidence to support their findings.

      In Fig. 7I, it could be better to show the statistical analysis between normal and AAV-mediated Cbfβ ACLT mice groups.

      We thank the reviewer for bringing this to our attention. In the revised figure 7I, we have included the statistical analysis between normal and AAV-mediated Cbfβ ACLT mice groups.

      In Fig. 9H-K, in the quantification analysis, the OARSI score in the DMM+AAV-YFP group is higher than in the sham group significantly. However, the SO staining results appear to show no significant difference between the DMM+AAV-luc-YFP group (Fig. 9I) and the sham group (Fig. 9H).

      We thank the reviewer for bringing this to our attention. Although both the sham and DMM+AAV-luc-YFP groups stain positive for SO, the SO stain intensity of the DMM+AAV-luc-YFP group is noticeably lower. In addition, SO staining is not the only parameter included in the OARSI score. We also evaluated the cartilage thickness, proteoglycan structure, and cartilage surface fibrillation index. Our evaluation to determine the OARSI score relies on the qualities of the whole joint, not only the magnified portion. For convenience, we have also outlined the region of positive SO stain in the revised Figure 9I.

    2. eLife assessment

      This fundamental work advances our understanding of the role of Cbfβ in maintaining articular cartilage homeostasis and the underlying mechanisms. The evidence supporting the conclusion is mostly convincing, although including additional experiments and discussions would have strengthened the study. This paper is of potential interest to skeletal biologists and orthopaedic surgeons who study the pathogenesis and the therapeutics of osteoarthritis.

    3. Reviewer #1 (Public Review):

      Osteoarthritis (OA) is associated with painful, chronic inflammation that often leads to severe joint pain and joint stiffness for people over the age of 55. There is no effective therapeutic drug in the treatment of osteoarthritis. The authors found that mice without Cbfβ in their chondrocytes develop spontaneous OA. Authors uncovered that the deficiency of Cbfβ caused increased canonical Wnt signaling and inflammatory response, and decreased Hippo/YAP signaling and TGF-β signaling in articular cartilage. Authors showed that ACLT surgery-induced OA decreased Cbfβ and Yap expression and increased active β-catenin expression in articular cartilage, while local AAV-mediated Cbfβ overexpression promoted Yap expression, diminished active β-catenin expression in OA lesions. The authors demonstrated that AAV-mediated Cbfβ overexpression in knee joints of mice with OA showed the significant protective effect of Cbfβ on articular cartilage in the ACLT OA mouse model. The results from the study demonstrated Cbfβ maintains articular cartilage homeostasis through inhibiting Wnt/β-catenin signaling and increasing Hippo/Yap, and TGFβ signaling. Importantly, the authors proved that local Cbfβ overexpression could be an effective strategy for treatment of OA. The data shown in the study demonstrated that the findings are novel and very significant, and the authors' claims and conclusions are justified by their data. The paper is generally excellent with an interesting scientific premise and strong scientific rigor. The findings in this manuscript are novel, the manuscript is clearly written, and the findings will make a significant impact in the field.

    4. Reviewer #2 (Public Review):

      In this manuscript, Chen et al. reported that the core binding factor beta (Cbfβ), a heterodimeric subunit of the RUNX family transcription factors (TFs), is crucial in maintaining cartilage homeostasis and counteracting traumatic OA pathology. Using mouse models in which Cbfβ is conditionally inactivated in the Col2a1+ and Acan+ cells, the authors claimed that Cbfβ ablation led to articular cartilage (AC) degeneration, which is associated with aberrant cartilage gene expression and chondrocyte signaling, particularly the elevated Wnt/Catenin and the decreased Hippo/YAP and TGFβ signaling. The authors further showed that Cbfβ transcripts are decreased in human OA cartilage, and sustaining Cbfβ expression in mouse knee joints mitigated the severity of surgery-evoked OA.

      On the whole, the work reported is interesting and exciting. Genetic and biochemical data support key statements. Both in vivo and in vitro experiments were well designed with proper controls; semiquantitative data were digitalized and processed for statistical significance. Furthermore, new findings were adequately discussed in contrast to the current available knowledge. However, the conceptual novelty of this study is slightly compromised by recent publications showing that Cbfβ reduction is associated with OA (Che et al. 2023; Li et al. 2021). Also, the authors claimed that multiple signaling pathways were affected by Cbfβ ablation in cartilage cells; many of them, however, are indirect effects given the nature of Cbfβ as a TF. The authors also showed that pSMAD2/3 and active βCatenin decreased and increased, respectively, upon Cbfβ depletion in the mouse AC cartilage. However, how the deficiency of Cbfβ, a widely expressed TF, affected the posttranslational modification of SMAD2/3 and βCatenin is unclear and needs further discussion. Overall, Cbfβ's role in cartilage and OA pathology is an emerging area of study; the authors provided a body of genetic evidence showing that Cbfβ is indispensable for cartilage homeostasis.

    5. Reviewer #3 (Public Review):

      The authors comprehensively demonstrated that the Cbfβ gene, which is involved in articular cartilage homeostasis, can promote articular cartilage regeneration and repair in osteoarthritis (OA) through regulating Hippo/YAP signaling, TGF-β signaling, and canonical Wnt signaling. First, the authors demonstrated that the deletion of Cbfβ can induce OA phenotypes, including decreased articular cartilage and osteoblasts and increased osteoclasts and subchondral bone hyperplasia, and induce the early onset of OA. Additionally, the authors showed that the deficiency of Cbfβ in cartilage can increase canonical Wnt signaling and decrease TGF-β and Hippo signaling. Finally, the authors demonstrated that the overexpression of Cbfβ can inhibit Wnt signaling and enhance Hippo/YAP signaling in knee joint articular cartilage of ACLT-induced OA mice and protect against ACLT-induced OA. The manuscript is overall well-constructed, and the authors provided evidence to support their findings.

      In Fig. 7I, it could be better to show the statistical analysis between normal and AAV-mediated Cbfβ ACLT mice groups.

      In Fig. 9H-K, in the quantification analysis, the OARSI score in the DMM+AAV-YFP group is higher than in the sham group significantly. However, the SO staining results appear to show no significant difference between the DMM+AAV-luc-YFP group (Fig. 9I) and the sham group (Fig. 9H).

    1. Author Response

      eLife assessment

      This important study provides a new, apparently high-performance algorithm for B cell clonal family inference. The new algorithm is highly innovative and based on a rigorous probabilistic analysis of the relevant biological processes and their imprint on the resulting sequences, however, the strength of evidence regarding the algorithm's performance is incomplete, due to (1) a lack of clarity regarding how different data sets were used for different steps during algorithm development and validation, resulting in concerns of circularity, (2) a lack of detail regarding the settings for competitor programs during benchmarking, and (3) method development, data simulation for method validation, and empirical analyses all based on the B cell repertoire of a single subject. With clarity around these issues and application to a more diverse set of real samples, this paper could be fundamental to immunologists and important to any researcher or clinician utilizing B cell receptor repertoires in their field (e.g., cancer immunology).

      We apologize for the long delay in implementing the suggested changes. Some of the co-authors had some personal issues that made it hard to efficiently work on the revision.

      We have addressed all the essential points below, as well as all the detailed comments of each reviewer in the following pages.

      Due to the journal’s guidelines we have to upload an “all black” version of the manuscript as the main version. We have uploaded a revised manuscript with the changes marked in red as a “Related Manuscript file”, which appears at the very end of the Merged Manuscript File, after all the Figures, and at the end of the list of files on the webpage. We apologize for this inconvenience.

      In addition, we have added an extension of HILARy to deal with paired-chain repertoires, and have benchmarked the new method on a recently published synthetic dataset. This new analysis is now presented in new Fig. 5.

      Reviewer #1 (Public Review):

      Identifying individual BCR/Ab chain sequences that are members of the same clone is a longstanding problem in the analysis of BCR/Ab repertoire sequencing data. The authors propose a new method designed to be scalable for application to huge repertoire data sets without sacrificing accuracy. Their approach utilizes Hamming Distance between CDR3 sequences followed by clustering for a fast, high-precision approach to classifying pairs of sequences as related or not, and then refines the classification using mutation information from germline-encoded regions. They compare their method with other state-of-the-art methods using synthetic data.
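
      To make the first step described above concrete, the following is a minimal sketch of Hamming-distance clustering of CDR3 sequences. It is an illustration only and not the authors' implementation: the function names, the single-linkage rule, and the threshold of 0.2 are all assumptions chosen for the example, and the second step (refinement using mutations in germline-encoded regions) is not shown.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def cluster_cdr3(seqs, threshold=0.2):
    """Single-linkage clustering of CDR3 sequences (illustrative sketch).

    Two sequences of the same length are linked when their Hamming distance,
    divided by the length, is <= threshold. The threshold value is a
    hypothetical placeholder, not a value taken from the manuscript.
    """
    parent = list(range(len(seqs)))  # union-find forest over sequence indices

    def find(i):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        n = len(seqs[i])
        if n == 0 or n != len(seqs[j]):
            continue  # only compare non-empty CDR3s of equal length
        if hamming(seqs[i], seqs[j]) / n <= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i, s in enumerate(seqs):
        clusters.setdefault(find(i), []).append(s)
    return sorted(sorted(c) for c in clusters.values())


example = ["CARDYW", "CARDFW", "CTTGGW", "CARW"]
print(cluster_cdr3(example))
```

      In this toy input, the first two sequences differ at one of six positions and are grouped together, while the other two remain singletons. The all-pairs loop is quadratic in the number of sequences, so a sketch like this would not scale to the huge repertoires the review mentions without further indexing.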

      The authors address an important problem in an interesting, innovative, and rigorous way, using probabilistic representations of CDR3 differences, frequencies of shared and not-shared mutations, and the relationships between the two under hypotheses of related pairs and unrelated pairs, and from these develop an approach for determining thresholds for classification and lineage assignment. Benchmarking shows that the proposed method, the complete method including both steps, outperforms other methods.

      Strengths of the method include its theoretical underpinnings which are consistent with an immunologist's intuition about how related and unrelated sequences would compare with each other in terms of the metrics to use and how those metrics are related to each other.

      I have two high-level concerns:

      (1) It isn't clear how the real and synthetic data are being used to estimate parameters for the classifier and evaluate the classifier to avoid circularity. It seems like the approach is used to assign lineages in the data from [1], and then properties of this set of lineages are used to estimate parameters that are then used to refine the approach and generate synthetic data that is used to evaluate the approach. This may not be a problem with the approach but rather with its presentation, but it isn't entirely clear what data is being used and where for what purpose. An understanding of this is necessary in order to truly evaluate the method and results.

      The reviewer is correct in their understanding of the pipeline. It should be stressed that the lineages used to guide the generation of the synthetic data were inferred within VJl classes for which the clustering was easy and reliable, and should therefore be largely model independent.

      We have added an explanation in the main text of why the re-use of real data lineages inferred by HILARy doesn’t bias the procedure, since it’s done on a subset of lineages within VJl classes that are easy to infer (section “Test on synthetic dataset”).

      (2) Regarding the data used for benchmarking - given the intertwined fashion by which the classification approach and synthetic data generation approach appear to have been developed, it is not surprising that the proposed approach outperforms the other methods when evaluated on the synthetic data presented here. It would be better to include in the benchmark the data used by the other methods to benchmark themselves or also generate synthetic data using their data generation procedures.

      We agree with the reviewer that a test of the method on an independent synthetic dataset is important for its applicability and to compare to other methods.

      We have added a new synthetic dataset from the group that designed the partis method to our benchmark. Our method still performs competitively, on par with partis—which was developed and tested on that dataset—and better than other methods. The results are presented in revised Fig. 4 (panels E-G), and Figure 4–figure supplement 1 as a function of the mutation rate.

      In addition, we have used that dataset to benchmark a new version of HILARy that also uses the light chain. We present the results in new Figure 5 and Figure 4–figure supplement 1.

      An improved method for BCR/Ab sequence lineage assignment would be a methodologic advancement that would enable more rigorous analyses of BCR/Ab repertoires across many fields, including infectious disease, cancer, autoimmune disease, etc., and in turn, enable advancement in our understanding of humoral immune responses. The methods would have utility to a broad community of researchers.

      Reviewer #2 (Public Review):

      This manuscript describes a new algorithm for clonal family inference based on V and J gene identity, sequence divergence in the CDR3 region, and shared mutations outside the CDR3. Specifically, the algorithm starts by grouping sequences that have the same V and J genes and the same CDR3 length. It then performs single-linkage clustering on these groups based on CDR3 Hamming distance, then further refines these groups based on shared mutations.
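
      As a rough illustration of the first two steps described above (grouping by V gene, J gene and CDR3 length, then single-linkage clustering on CDR3 Hamming distance), the following Python sketch may help. It is not the authors' implementation: the function names and the `(v_gene, j_gene, cdr3)` record layout are hypothetical, the threshold is a fixed input rather than the adaptive one derived in the manuscript, the pairwise loop is the naive quadratic version, and the refinement based on shared mutations is omitted.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(cdr3s, threshold):
    """Cluster equal-length CDR3s by linking every pair within `threshold` mismatches.

    Returns a list of clusters, each a list of positions into `cdr3s`.
    """
    parent = list(range(len(cdr3s)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive O(n^2) comparison of all pairs (illustrative only).
    for i, j in combinations(range(len(cdr3s)), 2):
        if hamming(cdr3s[i], cdr3s[j]) <= threshold:
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(cdr3s)):
        clusters[find(i)].append(i)
    return list(clusters.values())


def cluster_repertoire(records, threshold):
    """Group (v_gene, j_gene, cdr3) records into candidate clonal families.

    Hypothetical record layout; returns sorted lists of record indices.
    """
    # Step 1: partition by V gene, J gene and CDR3 length.
    classes = defaultdict(list)
    for idx, (v_gene, j_gene, cdr3) in enumerate(records):
        classes[(v_gene, j_gene, len(cdr3))].append(idx)

    # Step 2: single-linkage clustering inside each class.
    families = []
    for members in classes.values():
        seqs = [records[i][2] for i in members]
        for cluster in single_linkage(seqs, threshold):
            families.append(sorted(members[k] for k in cluster))
    return sorted(families)


records = [
    ("IGHV1-2", "IGHJ4", "AAAAAA"),
    ("IGHV1-2", "IGHJ4", "AAAAAT"),
    ("IGHV1-2", "IGHJ4", "AAAATT"),
    ("IGHV1-2", "IGHJ4", "CCCCCC"),
    ("IGHV3-23", "IGHJ4", "AAAAAA"),
    ("IGHV1-2", "IGHJ4", "AAAAAAA"),
]
families = cluster_repertoire(records, threshold=1)
```

      With a threshold of 1, the first three records end up in one family because single linkage chains neighbouring pairs, even though `AAAAAA` and `AAAATT` differ at two positions. This chaining is one reason the choice of cutoff matters so much.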

      Although there are a number of algorithms that use a similar overall strategy, a couple of aspects make this work unique. First, a persistent challenge for algorithms such as this one is how to set a cutoff for single-linkage clustering: if it is too low, then one separates clusters that should be together, and if too high one joins together clusters that should be separate. Here the authors leverage a rich collection of probabilistic tools to make an optimal choice. Specifically, they model the probability distributions of within- and between-cluster CDR3 Hamming distances, with parameters depending on CDR3 length and the "prevalence" of clonal sequence pairs (i.e. family size distribution). This allows the algorithm to make optimal choices for separating clusters, given the particular chosen distance metric, and assuming the sample in question has been accurately modeled. Second, the algorithm uses a highly efficient means of doing single-linkage clustering on nucleotide sequences.
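
      One way to picture the cutoff choice is the sketch below, which rests on loudly stated assumptions. The truncated Poisson (related pairs) and binomial (unrelated pairs) distance distributions are placeholders chosen only for illustration, not the models fitted in the manuscript, and the rule of taking the largest cutoff that keeps the expected pairwise precision above a target, as well as all function names, are hypothetical.

```python
from math import comb, exp, factorial


def binomial_pmf(n, p):
    """Placeholder model for distances between unrelated CDR3s of length n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def truncated_poisson_pmf(n, mean):
    """Placeholder model for distances between related CDR3s, truncated at n."""
    weights = [exp(-mean) * mean**k / factorial(k) for k in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def pick_threshold(p_related, p_unrelated, prevalence, min_precision):
    """Largest cutoff t such that pairs at distance <= t have expected
    precision >= min_precision. Returns -1 if no cutoff qualifies.

    `prevalence` is the assumed fraction of sequence pairs that are truly related.
    """
    best = -1
    cum_related = 0.0
    cum_unrelated = 0.0
    for t, (pr, pu) in enumerate(zip(p_related, p_unrelated)):
        cum_related += pr
        cum_unrelated += pu
        true_pos = prevalence * cum_related
        false_pos = (1 - prevalence) * cum_unrelated
        total = true_pos + false_pos
        if total > 0 and true_pos / total >= min_precision:
            best = t
    return best


cdr3_length = 45  # nucleotides, illustrative value
related = truncated_poisson_pmf(cdr3_length, mean=3.0)
unrelated = binomial_pmf(cdr3_length, p=0.75)

t_high_prevalence = pick_threshold(related, unrelated, prevalence=0.5, min_precision=0.99)
t_low_prevalence = pick_threshold(related, unrelated, prevalence=1e-4, min_precision=0.99)
```

      In this toy model, lowering the prevalence can only keep the selected cutoff the same or make it stricter, which mirrors the reviewer's point that the best cutoff depends on both the CDR3 length and the family size distribution of the sample.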

      This leads to a fast and highly performant algorithm on data meant to replicate the original sample used in algorithm design. The ideas are new and beautifully developed. The application to real data is interesting, especially the point about dN/dS.

      However, the paper leaves open the question of how this inference algorithm works on samples other than the one used for simulation and as a template for validation. If I understand the simulation procedure correctly - that one takes a collection of inferred trees from the real data, then re-draws the root sequence and the identity of the mutations on the branches - then the simulated data should be very close to the data used to develop the methods in the paper. This consideration seems especially important given that key methods in this paper use mutation counts and overall mutation counts are preserved.

      Repertoires come in all shapes and sizes: infants to adults, healthy to cancerous, and naive to memory to plasma-cell-just-after-vaccination. If this is being proposed as a general-purpose clonal inference algorithm rather than one just for this sample, then a more diverse set of validations are needed.

      We agree that testing the method on a differently generated dataset is a useful check. We should point out, however, that our synthetic dataset is not as biased as it may seem. In particular, it is based on trees from VJl classes that we predicted are very easy to cluster, which means that they are truly faithful to the data, and not dependent on the particular algorithm used to infer them. The big advantage of this synthetic dataset over others is that it recapitulates the power law statistics of clone size distribution, as well as the diversity of mutation rates. To us, it still represents a more useful benchmark than synthetic datasets generated by population genetics models, which miss most of this very broad variability.

      However, to check how the method generalizes to other datasets, we repeated our validation procedure on the dataset used to evaluate Partis in Ralph et al 2022. The new results are discussed in the main text and in new panels of Fig. 4 in the same form as the previous comparisons. We also added a comparison of performance as a function of mutation rate in the new Figure 4–figure supplement 1.

      It is unclear how to run the code. The software repo has a nice readme explaining the file layout, dependencies, and input file format, but the repo seems to be lacking an inference.ipynb mentioned there which runs an analysis. Perhaps this is a typo and refers to inference.py, which in addition to the documented cdr3 clustering, seems to have functions to run both clustering methods. However, it does not seem to have any documentation or help messages about how to run these functions.

      We have completely overhauled the github to provide a detailed step by step explanation of how to run the code. The code is now easily installable using pip.

      The results are not currently reproducible, because the simulated data is not available. The data availability statement says that no data have been generated for this manuscript, however simulated data has been generated, and that is a key aspect of the analysis in the paper.

      We have uploaded the simulated data to zenodo, as well as provided scripts in the github to run the benchmarks.

      More detail is needed to understand the timing comparisons. The new software is clearly written to use many threads. Were the other software packages run using multiple threads? What type of machine was used for the benchmarks?

      All timing comparisons were made on a single VJl class, on a computer with 14 dual-threaded CPU cores. HILARy uses all 28 threads, and other methods were run with default settings, with multi-threading allowed.

      We have clarified the specifications of the computer.

      Reviewer #3 (Public Review):

      B cell receptors are produced through a combination of random V(D)J recombination and somatic hypermutation. Identifying clonal lineages - cells that descend from a common V(D)J rearrangement - is an important part of B cell repertoire analysis. Here, the authors developed a new method to identify clonal lineages from BCR data. This method builds off of prior advances in the field and uses both an adaptive clonal distance threshold and shared somatic hypermutation information to group B cells into clonal lineages.

      The major strength of this paper is its thorough quantitative treatment of the subject and integration of multiple improvements into the clonal clustering process. By their simulation results, the method is both highly efficient and accurate.

      The only notable weakness we identified is that much of the impact of the method will depend on its superiority to existing approaches, and this is not convincingly demonstrated by Fig. 4. In particular, little detail is given on how the other clonal clustering programs were run, and this can significantly impact their performance. More specifically:

      We have added a new benchmark to address these concerns, presented in Fig. 4 and in new figure 4 – figure supplement 1 as a function of a controllable mutation rate.

      (1) Scoper supports multiple methods for clonal clustering, including both adaptive CDR3 distance thresholds (Nouri and Kleinstein, 2018) and shared V-gene mutations (Nouri and Kleinstein, 2020). It is not clear which method was used for benchmarking. The specific functions and settings used should have been detailed and justified. Spectral clustering with shared V gene mutations would be the most comparable to the authors' method. Similar detail is needed for partis.

      In the updated version we use the 2020 method. The 2018 method is very similar to simple single linkage, so it will be removed from the benchmark.

      (2) It is not clear how the adaptive thresholds and shared mutation analysis in the authors' method differ from prior approaches such as scoper and partis.

      We have changed the paragraph in the discussion section about the benchmark to highlight the innovative aspects and differences with previous approaches.

      (3) The scripts for performing benchmarking analyses, as well as the version numbers of programs tested, are not available.

      We have added to the github all the scripts used for benchmarking. We have added details about the version numbers in the data and code availability section of the methods.

      (4) Similar to above, P. 10 describes single linkage hierarchical clustering with a fixed threshold as a "crude method" that "suffers from inaccuracy as it loses precision in the case of highly-mutated sequences and junctions of short length." As far as we could tell, this statement is not backed up by either citations or analyses in the paper. It should not be difficult for the authors to test this though using their simulations, as this method is also implemented in scoper.

      We have added this method to our benchmark to support that point. The results are presented in Figure 4 – figure supplement 2.

      References

      Nouri N, Kleinstein SH. 2020. Somatic hypermutation analysis for improved identification of B cell clonal families from next-generation sequencing data. PLOS Comput Biol 16:e1007977. doi:10.1371/journal.pcbi.1007977

      Nouri N, Kleinstein SH. 2018. A spectral clustering-based method for identifying clones from high-throughput B cell repertoire sequencing data. Bioinformatics 34:i341-i349. doi:10.1093/bioinformatics/bty235

      We have changed citation [22] to refer to the 2018 paper. The 2020 paper is citation [18].

    2. eLife assessment

      This important study provides a new, apparently high-performance algorithm for B cell clonal family inference. The new algorithm is highly innovative and based on a rigorous probabilistic analysis of the relevant biological processes and their imprint on the resulting sequences, however, the strength of evidence regarding the algorithm's performance is incomplete, due to (1) a lack of clarity regarding how different data sets were used for different steps during algorithm development and validation, resulting in concerns of circularity, (2) a lack of detail regarding the settings for competitor programs during benchmarking, and (3) method development, data simulation for method validation, and empirical analyses all based on the B cell repertoire of a single subject. With clarity around these issues and application to a more diverse set of real samples, this paper could be fundamental to immunologists and important to any researcher or clinician utilizing B cell receptor repertoires in their field (e.g., cancer immunology).

    3. Reviewer #1 (Public Review):

      Identifying individual BCR/Ab chain sequences that are members of the same clone is a long-standing problem in the analysis of BCR/Ab repertoire sequencing data. The authors propose a new method designed to be scalable for application to huge repertoire data sets without sacrificing accuracy. Their approach utilizes Hamming Distance between CDR3 sequences followed by clustering for a fast, high-precision approach to classifying pairs of sequences as related or not, and then refines the classification using mutation information from germline-encoded regions. They compare their method with other state-of-the-art methods using synthetic data.

      The authors address an important problem in an interesting, innovative, and rigorous way, using probabilistic representations of CDR3 differences, frequencies of shared and not-shared mutations, and the relationships between the two under hypotheses of related pairs and unrelated pairs, and from these develop an approach for determining thresholds for classification and lineage assignment. Benchmarking shows that the proposed method, the complete method including both steps, outperforms other methods.

      Strengths of the method include its theoretical underpinnings which are consistent with an immunologist's intuition about how related and unrelated sequences would compare with each other in terms of the metrics to use and how those metrics are related to each other.

      I have two high-level concerns:

      (1) It isn't clear how the real and synthetic data are being used to estimate parameters for the classifier and evaluate the classifier to avoid circularity. It seems like the approach is used to assign lineages in the data from [1], and then properties of this set of lineages are used to estimate parameters that are then used to refine the approach and generate synthetic data that is used to evaluate the approach. This may not be a problem with the approach but rather with its presentation, but it isn't entirely clear what data is being used and where for what purpose. An understanding of this is necessary in order to truly evaluate the method and results.

      (2) Regarding the data used for benchmarking - given the intertwined fashion by which the classification approach and synthetic data generation approach appear to have been developed, it is not surprising that the proposed approach outperforms the other methods when evaluated on the synthetic data presented here. It would be better to include in the benchmark the data used by the other methods to benchmark themselves or also generate synthetic data using their data generation procedures.

      An improved method for BCR/Ab sequence lineage assignment would be a methodologic advancement that would enable more rigorous analyses of BCR/Ab repertoires across many fields, including infectious disease, cancer, autoimmune disease, etc., and in turn, enable advancement in our understanding of humoral immune responses. The methods would have utility to a broad community of researchers.

    4. Reviewer #2 (Public Review):

      This manuscript describes a new algorithm for clonal family inference based on V and J gene identity, sequence divergence in the CDR3 region, and shared mutations outside the CDR3. Specifically, the algorithm starts by grouping sequences that have the same V and J genes and the same CDR3 length. It then performs single-linkage clustering on these groups based on CDR3 Hamming distance, then further refines these groups based on shared mutations.

      Although there are a number of algorithms that use a similar overall strategy, a couple of aspects make this work unique. First, a persistent challenge for algorithms such as this one is how to set a cutoff for single-linkage clustering: if it is too low, then one separates clusters that should be together, and if too high one joins together clusters that should be separate. Here the authors leverage a rich collection of probabilistic tools to make an optimal choice. Specifically, they model the probability distributions of within- and between-cluster CDR3 Hamming distances, with parameters depending on CDR3 length and the "prevalence" of clonal sequence pairs (i.e. family size distribution). This allows the algorithm to make optimal choices for separating clusters, given the particular chosen distance metric, and assuming the sample in question has been accurately modeled. Second, the algorithm uses a highly efficient means of doing single-linkage clustering on nucleotide sequences.

      This leads to a fast and highly performant algorithm on data meant to replicate the original sample used in algorithm design. The ideas are new and beautifully developed. The application to real data is interesting, especially the point about dN/dS.

      However, the paper leaves open the question of how this inference algorithm works on samples other than the one used for simulation and as a template for validation. If I understand the simulation procedure correctly - that one takes a collection of inferred trees from the real data, then re-draws the root sequence and the identity of the mutations on the branches - then the simulated data should be very close to the data used to develop the methods in the paper. This consideration seems especially important given that key methods in this paper use mutation counts and overall mutation counts are preserved.

      Repertoires come in all shapes and sizes: infants to adults, healthy to cancerous, and naive to memory to plasma-cell-just-after-vaccination. If this is being proposed as a general-purpose clonal inference algorithm rather than one just for this sample, then a more diverse set of validations are needed.

      It is unclear how to run the code. The software repo has a nice readme explaining the file layout, dependencies, and input file format, but the repo seems to be lacking an `inference.ipynb` mentioned there which runs an analysis. Perhaps this is a typo and refers to `inference.py`, which in addition to the documented cdr3 clustering, seems to have functions to run both clustering methods. However, it does not seem to have any documentation or help messages about how to run these functions.

      The results are not currently reproducible, because the simulated data is not available. The data availability statement says that no data have been generated for this manuscript, however simulated data has been generated, and that is a key aspect of the analysis in the paper.

      More detail is needed to understand the timing comparisons. The new software is clearly written to use many threads. Were the other software packages run using multiple threads? What type of machine was used for the benchmarks?

    5. Reviewer #3 (Public Review):

      B cell receptors are produced through a combination of random V(D)J recombination and somatic hypermutation. Identifying clonal lineages - cells that descend from a common V(D)J rearrangement - is an important part of B cell repertoire analysis. Here, the authors developed a new method to identify clonal lineages from BCR data. This method builds off of prior advances in the field and uses both an adaptive clonal distance threshold and shared somatic hypermutation information to group B cells into clonal lineages.

      The major strength of this paper is its thorough quantitative treatment of the subject and integration of multiple improvements into the clonal clustering process. By their simulation results, the method is both highly efficient and accurate.

      The only notable weakness we identified is that much of the impact of the method will depend on its superiority to existing approaches, and this is not convincingly demonstrated by Fig. 4. In particular, little detail is given on how the other clonal clustering programs were run, and this can significantly impact their performance. More specifically:

      (1) Scoper supports multiple methods for clonal clustering, including both adaptive CDR3 distance thresholds (Nouri and Kleinstein, 2018) and shared V-gene mutations (Nouri and Kleinstein, 2020). It is not clear which method was used for benchmarking. The specific functions and settings used should have been detailed and justified. Spectral clustering with shared V gene mutations would be the most comparable to the authors' method. Similar detail is needed for partis.

      (2) It is not clear how the adaptive thresholds and shared mutation analysis in the authors' method differ from prior approaches such as scoper and partis.

      (3) The scripts for performing benchmarking analyses, as well as the version numbers of programs tested, are not available.

      (4) Similar to above, P. 10 describes single linkage hierarchical clustering with a fixed threshold as a "crude method" that "suffers from inaccuracy as it loses precision in the case of highly-mutated sequences and junctions of short length." As far as we could tell, this statement is not backed up by either citations or analyses in the paper. It should not be difficult for the authors to test this though using their simulations, as this method is also implemented in scoper.

      References

      Nouri N, Kleinstein SH. 2020. Somatic hypermutation analysis for improved identification of B cell clonal families from next-generation sequencing data. PLOS Comput Biol 16:e1007977. doi:10.1371/journal.pcbi.1007977

      Nouri N, Kleinstein SH. 2018. A spectral clustering-based method for identifying clones from high-throughput B cell repertoire sequencing data. Bioinformatics 34:i341-i349. doi:10.1093/bioinformatics/bty235

    1. Author Response

      We acknowledge the editors and reviewers for their careful and thoughtful review of the preprint. Their comments and suggestions will be very useful in improving the manuscript's revised version, which we plan to submit in the coming weeks.

    1. eLife assessment

      In this study, the authors develop a strategy for fluorophore-tagging endogenous proteins in human induced pluripotent stem cells (iPSCs) using a split mNeonGreen approach, and they conclude that the system will be appropriate for performing live imaging studies of highly dynamic cellular processes such as cytokinesis in iPSCs. Experimentally, the methods are solid, and the data presented support the authors' conclusions. Overall, these methodologies should be useful to a wide audience of cell biologists who want to study protein localization and dynamics at endogenous levels in iPSCs.

    2. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this study, the authors develop a useful strategy for fluorophore-tagging endogenous proteins in human induced pluripotent stem cells (iPSCs) using a split mNeonGreen approach. Experimentally, the methods are solid, and the data presented support the authors' conclusions. Overall, these methodologies should be useful to a wide audience of cell biologists who want to study protein localization and dynamics at endogenous levels in iPSCs.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors have applied an asymmetric split mNeonGreen2 (mNG2) system to human iPSCs. Integrating a constitutively expressed long fragment of mNG2 at the AAVS1 locus allows other proteins to be tagged through the use of available ssODN donors. This removes the need to generate long AAV donors for tagging, thus greatly facilitating high-throughput tagging efforts. The authors then demonstrate the feasibility of the method by successfully tagging 9 markers expressed in iPSC at various levels, and one expressed upon endoderm differentiation. Several additional differentiation markers were also successfully tagged but not subsequently tested for expression/visibility. As one might expect for high-throughput tagging, a few proteins, while successfully tagged at the genomic level, failed to be visible. Finally, to demonstrate the utility of the tagged cells, the authors isolated clones with genes relevant to cytokinesis tagged, and together with an AI to enhance signal-to-noise ratios, monitored their localization over cell division.

      Strengths:

      Characterization of the mNG2 tagged parental iPSC line was well and carefully done including validation of a single integration, the presence of markers for continued pluripotency, selected off-target analysis, and G-banding-based structural rearrangement detection.

      The ability to tag proteins with simple ssODNs in iPSC capable of multi-lineage differentiation will undoubtedly be useful for localization tracking and reporter line generation.

      Validation of clone genotypes was carefully performed and highlights the continued need for caution with regard to editing outcomes.

      Weaknesses:

      IF and flow cytometry figures lack quantification and information on replication. How consistent is the brightness and localization of the markers? How representative are the specific images? Stability is mentioned in the text but data on the stability of expression/brightness is not shown.

      To address this comment, we have quantified the mean fluorescence intensity of the tagged cell populations in Fig. S3B-T. This data correlates well with the expected expression levels of each gene relative to the others (Fig. S3A), apart from CDH1 and RACGAP1, which are described in the discussion.

      The images in Fig. 2 show tagged populations enriched by FACS so they are non-clonal and are representative of the diversity of the population of tagged cells.

      The images shown in Fig. 3 are representative of the clonal tagged populations. The stability of the tag was not quantified directly. However, the fluorescence intensity was very stable across cells in clonal populations. Since these populations were recovered from a single cell and grown for several weeks, this low variability across cells in a population suggests that these tags are stable.

      The localization of markers, while consistent with expectations, is not validated by a second technique such as antibody staining, and in many cases not even with Hoechst to show nuclear vs cytoplasmic.

      We find that the localization of each protein is distinct and consistent with previous studies. To address this comment, we have added an overlay of the green fluorescence images with brightfield images to better show the location of the tagged protein relative to the nuclei and cytoplasm. We have also added references to other studies that showed the same localization patterns for these proteins in iPSCs and other relevant cell lines.

      For the multi-germ layer differentiation validation, NCAM is also expressed by ectoderm, so isn't a good solo marker for mesoderm as it was used. Indeed, the kit used for the differentiation suggests Brachyury combined with either NCAM or CXCR4, not NCAM alone.

      Since Brachyury is the most common mesodermal marker, we first tested differentiation using anti-Brachyury antibodies, but they did not work well for flow cytometry. We then switched to anti-NCAM antibodies. Since we used a kit for directed differentiation of iPSCs into the mesodermal lineage, NCAM staining should still report for successful differentiation. In the context of mixed differentiation experiments (embryoid body formation or teratoma assay), NCAM would not differentiate between ectoderm and mesoderm. The parental cells (201B7) have also been edited at the AAVS1 locus in multiple other studies, with no effect on their differentiation potential.

      Only a single female parental line has been generated and characterized. It would have been useful to have several lines and both male and female to allow sex differences to be explored.

      We agree that it would be interesting (and important) to study differences in protein localization between female and male cell types, and from different individuals with different genetic backgrounds. We see our tool as opening a door for cell biology to move away from randomly collected, transformed, differentiated cell types to more directed comparative studies of distinct normal cell types. Since few studies of cell biological processes have been done in normal cells, a first step is to understand how processes compare in an isogenic background, then future studies can reveal how they compare with other individuals and sexes. We hope that either our group or others will continue to build similar lines so that these studies can be done.

      The AI-based signal-to-noise enhancement needs more details and testing. Such models can introduce strong assumptions and thus artefacts into the resolved data. Was the model trained on all markers or were multiple models trained on a single marker each? For example, if trained to enhance a single marker (or co-localized group of markers), it could introduce artefacts where it forces signal localization to those areas even for others. What happens if you feed in images with scrambled pixel locations, does it still say the structures are where the training data says they should be? What about markers with different localization from the training set? If you feed those in, does it force them to the location expected by the training data or does it retain their differential true localization and simply enhance the signal?

      The image restoration neural network was used as in Weigert et al., 2018. The model was trained independently for each marker. Each trained model was used only on the corresponding marker and with the same imaging conditions as the training images. From visual inspection, the fluorescent signal in the restored images was consistent with the signal in the raw images, both for interphase and mitotic cells. We found very few artefacts of the restoration (small bright or dark areas) that were discarded. We did not try to restore scrambled images or images of mismatched markers.

      Reviewer #2 (Public Review):

      Summary:

      The authors have generated human iPSC cells constitutively expressing mNG2(1-10) and tested them by endogenously tagging multiple genes with mNG2(11) (several tagged iPS cell line clones were isolated). With this tool, they have explored several weakly expressed cytokinesis genes and gained insights into how cytokinesis occurs.

      Strengths:

      Human iPSC cells are used.

      Weaknesses:

      i) The manuscript is extremely incremental, no improvements are present in the split-fluorescent (split-FP) protein variant used nor in the approach for endogenous tagging with split-FPs (both of them are already very well established and used in literature as well as in different cell types).

      Although split fluorescent proteins and the endogenous tagging methodology had been developed previously, their use in human stem cells has not been explored. We argue that human iPSCs are a valuable model for cell biologists to study cellular processes in differentiating cells in an isogenic context for proper comparison. Many normal human cell types have not been studied at the cellular/subcellular level, and this tool will enable those studies. Importantly, other existing cell lines required transformation to persist in culture and represent a single, differentiated cell type that is not normal. Moreover, the protocols that we developed along with this methodology (e.g. workflows for iPSC clonal isolation that include automated colony screening and Nanopore sequencing) will be useful to other groups undertaking gene editing in human cells. Therefore, we argue that our work opens new doors for future cell biology studies.

      ii) The fluorescence intensity of the split mNeonGreen appears rather low, for example in Figure 2C the H2BC11, ANLN, SOX2, and TUBB3 signals are very noisy (differences between the structures observed are almost absent). For low-expression targets, this is an important limitation. This is also stated by the authors but image restoration could not be the best solution since a lot of biologically relevant information will be lost anyway.

      The split mNeonGreen tag is one of the brighter fluorescent proteins available. The low expression that the reviewer refers to for H2BC11, ANLN, TUBB3 and SOX2 is expected based on their predicted expression levels. Further, these images were taken with cells in dishes using lower resolution imaging and were not intended to be used for quantification. As shown in the images in Figure 3H, when using a different microscope with different optical settings and higher magnification, the localization is very clear and quantifiable without needing to use restoration (e.g., compare H2BC11 and ANLN). Microscopes with high NA objectives, lasers, and high-sensitivity EMCCD or sCMOS cameras can detect very weakly expressed proteins at levels that can be quantified above background and compared across cells. It is worth noting that each tag may be studied in very different contexts. For example, ANLN will be useful for studies of cytokinesis, while the loss of SOX2 expression and gain of TUBB3 expression may be used to screen for differentiation rather than for localization per se. The reason for endogenous tagging is to study proteins at their native levels rather than using over-expression or fixation with antibodies, where artefacts can be introduced. Endogenous tags will also enable studies of dynamic changes in localization during differentiation in an isogenic background, as described previously.

      Importantly, image restoration is not required to image any of these probes! We use it to demonstrate how a researcher can increase the temporal resolution of imaging weakly-expressed proteins for extended periods of time. This data can be used to compare patterns of localization and reveal how patterns change with time and during differentiation. Imaging with fewer timepoints and altered optical settings will still permit researchers to extract quantifiable information from the raw data without requiring image restoration.

      iii) There is no comparison with other existing split-FP variants, methods, or imaging and it is unclear what the advantages of the system are.

      We are not sure what the reviewer means by this comment. In the future, we plan to incorporate an additional split-FP variant (e.g., split sfCherry) in this iPSC line to enable the imaging of more than one protein in the same cell. However, the split mNeonGreen system is still amenable for use with dyes with different fluorescence spectra that can mark other cellular components, especially for imaging over shorter timespans. In addition to tagging efficiency, the main advantage of split FPs is their scalability, as demonstrated by the OpenCell project, which tagged 1,310 proteins endogenously (Cho et al., 2022). We developed protocols that facilitate the identification of edited cell lines with high throughput. We also used multiple imaging methods throughout the study that relied on the use of different microscopes and flow cytometry, demonstrating the flexibility of this tagging system. Even for more weakly expressed proteins, the probe could be sufficiently visualized by multiple systems. Such endogenous tags can be used for everything from simply knowing when cells have differentiated (e.g., loss of SOX2 expression, gain of differentiation markers), to studying biological processes over a range of timescales.

      Reviewer #3 (Public Review):

      The authors report on the engineering of an induced Pluripotent Stem Cell (iPSC) line that harbours a single copy of a split mNeonGreen, mNG2(1-10). This cell line is subsequently used to tag endogenous proteins with a smaller part of mNeonGreen, mNG2(11), enabling the complementation of mNG into a fluorescent protein that is then used to visualize the protein. The parental cell line is validated and used to construct several iPSC lines with endogenously tagged proteins. These are used to visualize and quantify endogenous protein localisation during mitosis.

      I see the advantage of tagging endogenous loci with small fragments, but the complementation strategy has disadvantages that deserve some attention. One potential issue is the level of the mNG2(1-10). Is it clear that the current level is saturating? Based on the data in Figure S3, the expression levels and fluorescence intensity levels show a similar dose-dependency which is reassuring, but not definitive proof that all the mNG2(11)-tagged protein is detected.

      We have not quantified the levels of mNG2(1-10) expression directly. However, the increase in fluorescence observed with highly expressed proteins (e.g., ACTB) supports that mNG2(1-10) levels must be sufficiently high to permit differences among endogenous proteins with vastly different expression levels. To ensure high expression, we used a previously validated expression system comprised of the CAG promoter integrated at the AAVS1 locus, which has previously been used to provide high and stable transgene expression (e.g. Oceguera-Yanez et al., 2016). We acknowledge that it is difficult to confirm that all of the endogenous mNG2(11)-tagged protein is ‘detectable’.

      Do the authors see a difference in fluorescence intensity for homo- and heterozygous cell lines that have the same protein tagged with mNG2(11)? One would expect two-fold differences, or not?

      To answer this question, we measured the fluorescence intensity of homozygous and heterozygous clones carrying smNG2-anillin and smNG2-RhoA. We found homozygous clones that were approximately twice as bright as the corresponding heterozygous clones (Fig. S4H and I). This suggests that the complementation between mNG2(1-10) and mNG2(11) occurs efficiently over a range of mNG2(11) expression, since anillin is expressed weakly and RhoA is expressed more strongly in iPSCs. However, we also observed some homozygous clones that were not brighter than the corresponding heterozygous clones, which could be due to undetected byproducts of CRISPR or clonal variation in protein expression.

      Related to this, would it be favourable to have a homozygous line for expressing mNG2(1-10)?

      Our heterozygous cell line leaves the other AAVS1 allele available for integrations of other transgenes for future experiments. While a homozygous line could express more mNG2(1-10), it does not seem to be rate-limiting even with a highly-expressed protein like beta-actin, and we are not sure that it is necessary. The value gained by having the free allele could outweigh the difference in mNG2(1-10) levels.

      The complementation seems to work well for the proteins that are tested. Would this also work for secreted (or other organelle-resident) proteins, for which the mNG2(11) tag is localised in a membrane-enclosed compartment?

      The interaction between the 1-10 and 11 fragments is strong and should be retained when proteins are secreted. It was recently shown that secreted proteins tagged with GFP11 can be detected when interacting with GFP1-10 in the extracellular space, albeit using over-expression (Minegishi et al., 2023). However, in our work, the mNG2(1-10) fragment is cytosolic and we have only explored proteins localized to the nucleus or the cytoplasm, similar to Cho et al. (2022). By GO annotation, 75% of human proteins are present in the cytoplasm and/or nucleus, which still covers a wide range of proteins of interest. Future versions of our line could include incorporating organelle-targeting peptides to drive the large fragment to specific, non-cytosolic locations.

      The authors present a technological advance and it would be great if others could benefit from this as well by having access to the cell lines.

      As discussed below, some of the resources are already available, and we are working to make the mNG2(1-10) cell line available for distribution.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is methodological, the main achievement is the generation of a stable iPSC with the split Neon system available for the scientific community. Although it is technically solid, the judgement of this reviewer is that the manuscript should be considered for a more specialised/methodological/resource-based journal.

      Indeed, we have submitted this article under the “tools and resources” category of eLife, which publishes methodology-centered papers of high technical quality. We felt this was a good venue for the audience that it can reach compared to more specialized journals that may be more limited in scope. For example, our system will be a useful resource for cell biologists and they are more likely to see it in eLife compared to more specialized journals.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors present a technological advance and it would be great if others can benefit from this as well. Therefore access to the materials (and data) would be valuable (the authors do a great job by listing all the repair templates and primers).

      We have added several pieces of data and information to the supplementary materials, as described below.

      For instance:

      What is the (complete/plasmid) sequence of the AAVS1-mNG2(1-10) repair plasmid? Will it be deposited at Addgene?

      The plasmids used in this paper are now available on Addgene, along with their sequences [ID 206042 for pAAVS1-Puro-CAG-mNG2(1-10) and 206043 for pH2B-mNG2(11)].

      The ImageJ code for the detection of colonies is interesting and potentially valuable. Will the code be shared (e.g. at Github, or as supplemental text)?

      The ImageJ macro has been uploaded to the CMCI Github page (https://github.com/CMCI/colony_screening). The parameters are optimized to perform segmentation on images obtained using a Cytation5 microscope with our specific settings, but they can be tweaked for any other sets of images. The following text has been added to the methods section: “The code for this macro is available on Github (https://github.com/CMCI/colony_screening)”.

      The cell line with the mNG2(1-10) as well as other cell lines can be of interest to others. Will the cell lines be made available? If so, can the authors indicate how?

      We are in the process of depositing our cell line in a public repository. This process may take some time for quality control. For now, the cells can be made available by requesting them from the corresponding authors.

      (2) How well does the ImageJ macro for detection of the colonies in the well work? Is there any comparison of analysis by a human vs. the macro?

      In our most recent experiment, the colony screening macro correctly identified 99.5% of wells compared to manual annotation (83/84 positive wells and 108/108 negative wells). For each 96-well plate, imaging takes 25 minutes, and it takes 7 minutes for analysis. Despite a few false negatives, we expect this macro to be useful for large-scale experiments where multiple 96-well plates need to be screened, which would take hours manually.
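      The counts quoted above can be turned into the usual screening metrics with a few lines of arithmetic. The following is only a sketch based on the numbers stated in the response; the variable names are illustrative and are not taken from the authors' macro.

```python
# Counts reported in the response: 83 of 84 positive wells and
# 108 of 108 negative wells were classified correctly by the macro.
true_pos, total_pos = 83, 84
true_neg, total_neg = 108, 108

false_neg = total_pos - true_pos  # positive wells that were missed
false_pos = total_neg - true_neg  # negative wells that were flagged

accuracy = (true_pos + true_neg) / (total_pos + total_neg)
sensitivity = true_pos / total_pos
specificity = true_neg / total_neg

print(f"false negatives = {false_neg}, false positives = {false_pos}")
print(f"accuracy    = {accuracy:.1%}")
print(f"sensitivity = {sensitivity:.1%}")
print(f"specificity = {specificity:.1%}")

# Time per 96-well plate, as stated: 25 min imaging plus 7 min analysis.
minutes_per_plate = 25 + 7
print(f"time per plate = {minutes_per_plate} min")
```

      On these counts, the single error is a missed positive well, so the macro would be expected to drop an occasional edited colony rather than to flag empty wells.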

      (3) The CDH labeling was not readily detected by FACS, but was visible by microscopy. Is the labeling potentially disturbed by the procedure (low extracellular calcium + trypsin?) to prepare the cell for FACS?

      It is not clear why the CDH labelling was not detected by FACS. As the reviewer suggests, there could be several reasons: E-cadherin could be broken down by the dissociation reagent (Accutase), or recycled into the cell following the loss of adhesion and the low extracellular calcium in PBS. However, the C-terminal intracellular tail of E-cadherin was tagged, which should not be affected by Accutase. Moreover, recycling into the cell should still result in a detectable fluorescent signal. Notably, the flow cytometry experiments were done as quickly as possible after dissociation to minimize the time that E-cadherin could be degraded or recycled. We also resuspended the cells in mTeSR Plus media instead of PBS, and compared cells grown on iMatrix-511 to those grown on Matrigel in case differences in the extracellular matrix affected E-cadherin expression. Another possibility is that the microscopy used for detection of E-cadherin in cells involved using a sweptfield livescan confocal microscope with high NA objective, 100 mW 488 nm laser and an EMCCD camera with high sensitivity, and perhaps this combination permitted detection better than the detector on the BD FACSMelody used for FACS.

      (4) The authors write that the "Tubulin was cytosolic during interphase" which is surprising (and see also figure 3H), as I was expecting it to be incorporated in microtubules. May this be an issue of insufficient resolution (if I'm right this was imaged with 20x, NA=0.35 and so the resolution could be improved by imaging at higher NA)?

      Indeed, as the reviewer points out, our terminology (cytosol vs. microtubule) reflects the low resolution of the imaging for the cell populations in dishes and the individual alpha-tubulin monomers being labelled with the mNG2(11) tag, which are present as cytoplasmic monomers as well as polymers on microtubules. However, even in this image (Fig. 2C), the mitotic spindle microtubules are visible as they are so robust compared to the interphase microtubules. Notably, when we imaged cells from the cloned tagged cell line using a microscope designed for live imaging with a higher NA objective (see above), endogenously tagged TUBA1B was even more clearly visible in spindle microtubules, and was weakly observed in some microtubules in interphase cells, although they are slightly out of focus (Fig. 3H). If we had focused on a lower focal plane where the interphase cells are located and altered the optical settings, we would see more microtubules.

      (5) It would be nice to have access to the time-lapse data as supplemental movies (e.g., from the experiments shown in Figure 4).

      We have added the movies corresponding to the time-lapse images as supplementary movies (Movies S1-6), with the raw and restored movies shown side-by-side.

      (6) In Figure 3B, the order of the colors in the bar is reversed relative to the order of the legend. Would it be possible to use the same order? That makes it easier for me (as a colorblind person) to match the colors in the figure with that of the legend.

      We have modified the legend in Fig 2B and 3B to be in the same order as the bars.

    3. Reviewer #1 (Public Review):

      Summary:

      In this manuscript the authors have applied an asymmetric split mNeonGreen2 (mNG2) system to human iPSCs. By integrating a constitutively expressed long fragment of mNG2 at the AAVS1 locus, this allows other proteins to be tagged through the use of available ssODN donors. This removes the need to generate long AAV donors for tagging, thus greatly facilitating high-throughput tagging efforts. The authors then demonstrate the feasibility of the method by successfully tagging 9 markers expressed in iPSCs at various levels, and one expressed upon endoderm differentiation. Several additional differentiation markers were also successfully tagged but not subsequently tested for expression/visibility. As one might expect for high-throughput tagging, a few proteins, while successfully tagged at the genomic level, failed to be visible. Finally, to demonstrate the utility of the tagged cells, the authors isolated clones with genes relevant to cytokinesis tagged and, together with an AI to enhance signal-to-noise ratios, monitored their localization over cell division.

      Strengths

      Reviewer Comment: Characterization of the mNG2 tagged parental iPSC line was well and carefully done including validation of a single integration, the presence of markers for continued pluripotency, selected off-target analysis and G-banding-based structural rearrangement detection.

      The ability to tag proteins with simple ssODNs in iPSC capable of multi-lineage differentiation will undoubtedly be useful for localization tracking and reporter line generation.

      Validation of clone genotypes was carefully performed and highlights the continued need for caution with regards to editing outcomes.

      Weaknesses

      Reviewer Comment: IF and flow cytometry figures lack quantification and information on replication. How consistent is the brightness and localization of the markers? How representative are the specific images? Stability is mentioned in the text but data on the stability of expression/brightness is not shown.

      Author Response: To address this comment, we have quantified the mean fluorescence intensity of the tagged cell populations in Fig. S3B-T. This data correlates well with the expected expression levels of each gene relative to the others (Fig. S3A), apart from CDH1 and RACGAP1, which are described in the discussion.

      Reviewer Reply: Great, thanks.

      Reviewer Comment: The localization of markers, while consistent with expectations, is not validated by a second technique such as antibody staining, and in many cases not even with Hoechst to show nuclear vs cytoplasmic.

      Author Response: We find that the localization of each protein is distinct and consistent with previous studies. To address this comment, we have added an overlay of the green fluorescence images with brightfield images to better show the location of the tagged protein relative to the nuclei and cytoplasm. We have also added references to other studies that showed the same localization patterns for these proteins in iPSCs and other relevant cell lines.

      Reviewer Reply: There was no question that the localization fit with expectations, however, this still doesn't show that in the same cell the tag is in the same spot. It would have been fairly simple to do for at least a handful of markers, image, fix and stain to demonstrate unequivocally the tag and protein are co-localized. Of course, this isn't damning by any means, it just would have been nice.

      Reviewer Comment: For the multi-germ layer differentiation validation, NCAM is also expressed by ectoderm, so isn't a good solo marker for mesoderm as it was used. Indeed, the kit used for the differentiation suggests Brachyury combined with either NCAM or CXCR4, not NCAM alone.

      Author Response: Since Brachyury is the most common mesodermal marker, we first tested differentiation using anti-Brachyury antibodies, but they did not work well for flow cytometry. We then switched to anti-NCAM antibodies. Since we used a kit for directed differentiation of iPSCs into the mesodermal lineage, NCAM staining should still report for successful differentiation. In the context of mixed differentiation experiments (embryoid body formation or teratoma assay), NCAM would not differentiate between ectoderm and mesoderm. The parental cells (201B7) have also been edited at the AAVS1 locus in multiple other studies, with no effect on their differentiation potential.

      Reviewer Reply: This is placing a lot of trust in the kit that it only makes what it says it makes. It could have been measured by options other than flow such as qPCR, Western blot, or imaging, but fine.

      Reviewer Comment: Only a single female parental line has been generated and characterized. It would have been useful to have several lines and both male and female to allow sex differences to be explored.

      Author Response: We agree that it would be interesting (and important) to study differences in protein localization between female and male cell types, and from different individuals with different genetic backgrounds. We see our tool as opening a door for cell biology to move away from randomly collected, transformed, differentiated cell types to more directed comparative studies of distinct normal cell types. Since few studies of cell biological processes have been done in normal cells, a first step is to understand how processes compare in an isogenic background, then future studies can reveal how they compare with other individuals and sexes. We hope that either our group or others will continue to build similar lines so that these studies can be done.

      Reviewer Reply: Fair enough.

      Reviewer Comment: The AI-based signal-to-noise enhancement needs more details and testing. Such models can introduce strong assumptions and thus artefacts into the resolved data. Was the model trained on all markers or were multiple models trained on a single marker each? For example, if trained to enhance a single marker (or co-localized group of markers), it could introduce artefacts where it forces signal localization to those areas even for others. What happens if you feed in images with scrambled pixel locations, does it still say the structures are where the training data says they should be? What about markers with different localization from the training set? If you feed those in, does it force them to the location expected by the training data or does it retain their differential true localization and simply enhance the signal?

      Author Response: The image restoration neural network was used as in Weigert et al., 2018. The model was trained independently for each marker. Each trained model was used only on the corresponding marker and with the same imaging conditions as the training images. From visual inspection, the fluorescent signal in the restored images was consistent with the signal in the raw images, both for interphase and mitotic cells. We found very few artefacts of the restoration (small bright or dark areas) that were discarded. We did not try to restore scrambled images or images of mismatched markers.

      Reviewer Reply: I understand. What I'm saying is that for the restoration technique to be useful you need to know that it won't introduce artefacts if you have an unexpected localization. Think of it this way, if you already know the localization, then there's no point measuring it. If you don't, or there's a possibility that it is somewhere unexpected, then you need to know with confidence that your algorithm will be able to accurately detect that unexpected localization. As such, it would be extremely important to validate that your restoration algorithm will not bias the results to the expected localization if the true localization is unexpected/not seen in the training dataset. It would have been extremely trivial to run this analysis and I do not feel this comment has been in any way adequately addressed.
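
      The scrambled-pixel control requested above could be set up roughly as follows. This is a minimal sketch and not the authors' pipeline: `restore` is a hypothetical stand-in for whichever trained restoration model is under test (an identity function is used here so that the example runs), and `structure_score` is an ad hoc neighbour-correlation measure chosen only for illustration.

```python
import numpy as np


def scramble_pixels(image, seed=0):
    """Return a copy of `image` with its pixel positions randomly permuted.

    The intensity histogram is unchanged; only spatial structure is destroyed.
    """
    rng = np.random.default_rng(seed)
    return rng.permutation(image.ravel()).reshape(image.shape)


def structure_score(image):
    """Correlation between horizontally adjacent pixels (ad hoc measure)."""
    left = image[:, :-1].ravel()
    right = image[:, 1:].ravel()
    return float(np.corrcoef(left, right)[0, 1])


def scrambled_control(raw, restore, seed=0):
    """Score the restored raw image and the restored scrambled image."""
    scrambled = scramble_pixels(raw, seed=seed)
    return structure_score(restore(raw)), structure_score(restore(scrambled))


# Synthetic test image: one smooth bright blob on a dark background.
yy, xx = np.mgrid[0:64, 0:64]
raw = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 8.0 ** 2))


def restore(image):
    """Hypothetical placeholder: swap in the trained model's prediction here."""
    return image


score_raw, score_scrambled = scrambled_control(raw, restore)
print(f"restored raw: {score_raw:.2f}, restored scrambled: {score_scrambled:.2f}")
```

      With a real model in place of the placeholder, a high score on the scrambled input would suggest that the network is imposing learned structure rather than enhancing signal that is actually present. The same harness could be reused with images of a marker the model was not trained on.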

    4. Reviewer #2 (Public Review):

      Summary:

      The authors have generated human iPSCs constitutively expressing mNG2(1-10) and tested them by endogenously tagging multiple genes with mNG2(11) (several tagged iPSC clones were isolated). With this tool they have explored several weakly expressed cytokinesis genes and gained insights into how cytokinesis occurs.

      Strengths:

      (i) Human iPSC cells are used

      Weaknesses:

      (i) The manuscript is extremely incremental, no improvements are present in the split-fluorescent (split-FP) protein variant used nor in the approach for endogenous tagging with split-FPs (both of them are already very well established and used in literature as well as in different cell types).

      (ii) The fluorescence intensity of the split mNeonGreen appears rather low, for example in Figure 2C the H2BC11, ANLN, SOX2 and TUBB3 signals are very noisy (differences between the structures observed are almost absent). For low expression targets this is an important limitation. This is also stated by the authors but image restoration could not be the best solution since a lot of biologically relevant information will be lost anyway.

      (iii) There is no comparison with other existing split-FP variants, methods, or imaging and it is unclear what the advantages of the system are.

    5. Reviewer #3 (Public Review):

      The authors report on the engineering of an induced Pluripotent Stem Cell (iPSC) line that harbours a single copy of a split mNeonGreen, mNG2(1-10). This cell line is subsequently used to tag endogenous proteins with a smaller part of mNeonGreen, mNG2(11), enabling complementation of mNG into a fluorescent protein that is then used to visualize the protein. The parental cell line is validated and used to construct several iPSC lines with endogenously tagged proteins. These are used to visualize and quantify endogenous protein localisation during mitosis.

      I see the advantage of tagging endogenous loci with small fragments, but the complementation strategy has disadvantages that deserve some attention. One potential issue is the level of the mNG2(1-10). In addition, this will probably not work for organelle-resident proteins, where the mNG2(11) tag is localised in a membrane-enclosed compartment.

      Overall the tools and resources reported in this paper will be valuable for the community that aims to study proteins at endogenous levels.

    1. eLife assessment

      This useful study introduces a simple mechanical model of C. elegans locomotion that captures aspects of the worm's behavioral repertoire beyond forward crawling. While the kinetic model (ElegansBot) provides a compromise and starting point to help understand the mechanical components of C. elegans behavior, the claim that this work improves on extant mechanical models is incomplete, including modeling a 3-dimensional turning behavior with a 2-dimensional model without sufficient justification. In addition, the results of the application of the model to previously unstudied behaviors are primarily qualitative and do not produce new predictions.

    2. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1

      Public Review

      Summary:

      (1) This work describes a simple mechanical model of worm locomotion, using a series of rigid segments connected by damped torsional springs and immersed in a viscous fluid.

      (2) It uses this model to simulate forward crawling movement, as well as omega turns.

      Strengths:

      (3) The primary strength is in applying a biomechanical model to omega-turn behaviors.

      (4) The biomechanics of nematode turning behaviors are relatively less well described and understood than forward crawling.

      (5) The model itself may be a useful implementation to other researchers, particularly owing to its simplicity.

      Weaknesses:

      (6) The strength of the model presented in this work relative to prior approaches is not well supported, and in general, the paper would be improved with a better description of the broader context of existing modeling literature related to undulatory locomotion.

      (7) This paper claims to improve on previous approaches to taking body shapes as inputs.

      (8) However, the sole nematode model cited aims to do something different, and arguably more significant, which is to use experimentally derived parameters to model both the neural circuits that induce locomotion as well as the biomechanics and to subsequently compare the model to experimental data.

      (9) Other modeling approaches do take experimental body kinematics as inputs and use them to produce force fields, however, they are not cited or discussed.

      (10) Finally, the overall novelty of the approach is questionable.

      (11) A functionally similar approach was developed in 2012 to describe worm locomotion in lattices (Majmudar, 2012, Roy. Soc. Int.), which is not discussed and would provide an interesting comparison and needed context.

      9-11: The paper you recommended and our manuscript have some similarities and differences.

      Similarities

      Firstly, the components constituting the worm are similar in both models. ElegansBot models the worm as a chain of n rods, while the study by Majmudar et al. (2012) models it as a chain of n beads. Each bead in the Majmudar et al. model has a directional vector, making it very similar to ElegansBot's rod. However, there's a notable difference: in the Majmudar et al. model, each bead has an area for detecting contact between the obstacle and the bead, while in ElegansBot, the rod does not feature such an area.

      Secondly, the types of forces and torques acting on the components constituting the worm are similar. Each rod in ElegansBot receives frictional force, muscle force, and joint force. Each bead in the Majmudar et al. model receives a constraint force, viscous force, and a repulsive force from obstacles. Each rod in ElegansBot receives frictional torque, muscle torque, and joint torque. Each bead in the Majmudar et al. model receives elastic torque, constraint torque, drive torque, and viscous torque. The Majmudar et al. model's constraint force and torque are similar to ElegansBot's joint force and torque in that they prevent two connected components of the worm from separating. The Majmudar et al. model's viscous force and torque are similar to ElegansBot's frictional force and torque in that they are forces exchanged between the worm and its surrounding environment (ground surface). The Majmudar et al. model's drive torque is similar to ElegansBot's muscle force and muscle torque as a cause of the worm's motion. However, unlike ElegansBot, the Majmudar et al. model did not consider the force generating the drive torque, and there are differences in how each force and torque is calculated. This will be discussed in more detail below.

      Differences

      Firstly, the medium in which the worm locomotes is different. ElegansBot is a model describing motion in a homogeneous medium like agar or water without obstacles, while the Majmudar et al. model describes motion in water with circular obstacles fixed at each lattice point. This is because the purposes of the models are different. ElegansBot analyzes locomotion patterns based on the friction coefficient, while the Majmudar et al. model analyzes locomotion patterns based on the characteristics of the obstacle lattice, such as the distance between obstacles. Also, for this reason, the Majmudar et al. model's bead, unlike ElegansBot's rod, receives a repulsive force from obstacles.

      Secondly, the specific methods of calculating similar types of forces differ. ElegansBot calculates joint forces by substituting frictional forces, muscle forces, frictional torques, and muscle torques into an equation derived by differentiating twice with respect to time the boundary condition that two neighboring rods always meet at one point. This involves determining the process through which various forces and torques are transmitted across the worm. Specifically, it entails calculating how the frictional forces and torques, as well as the muscle forces and torques acting on each rod, are distributed throughout the entire length of the worm. In contrast, the Majmudar et al. model uses the method of Lagrange multipliers, based on the boundary condition that the curve length determined by each bead's tangential angle does not change, to calculate the constraint force and torque before calculating the drive torque and viscous force. This implies that the Majmudar et al. model did not consider the mechanism by which the drive torque and viscous force received by one bead are distributed throughout the worm. ElegansBot's rod receives an anisotropic Stokes frictional force from the ground surface, while the Majmudar et al. model considered the frictional force according to the Navier-Stokes equation for incompressible fluid, assuming the fluid velocity at the bead's location as the bead's velocity.

      Thirdly, unlike the Majmudar et al. model, ElegansBot considers the inertia of the worm components. Therefore, ElegansBot can run simulations regardless of how low or high the ground surface's friction coefficient is. The Majmudar et al. model does not have this property.
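The anisotropic Stokes friction mentioned in the comparison above can be illustrated with a short sketch. This is not the ElegansBot source code: the function name, argument names, and the two-dimensional setting are assumptions made for illustration only. It assumes the drag on a rod is linear in velocity, with separate coefficients for motion along the rod and perpendicular to it.

```python
import math

def anisotropic_drag(velocity, heading, b_parallel, b_perp):
    """Linear (Stokes-like) drag on a rod in 2D.

    velocity   : (vx, vy) of the rod's centre
    heading    : angle of the rod's long axis, in radians
    b_parallel : drag coefficient for motion along the rod (hypothetical name)
    b_perp     : drag coefficient for motion across the rod (hypothetical name)
    """
    vx, vy = velocity
    # Unit tangent (along the rod) and unit normal (across the rod).
    tx, ty = math.cos(heading), math.sin(heading)
    nx, ny = -ty, tx
    # Split the velocity into tangential and normal components.
    v_par = vx * tx + vy * ty
    v_perp = vx * nx + vy * ny
    # Each component is opposed with its own coefficient.
    fx = -(b_parallel * v_par * tx + b_perp * v_perp * nx)
    fy = -(b_parallel * v_par * ty + b_perp * v_perp * ny)
    return fx, fy
```

When `b_perp` is larger than `b_parallel`, a rod resists sideways slip more than sliding along its axis, which is the asymmetry that lets an undulating body generate thrust.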

      (12) The idea of applying biomechanical models to describe omega turns in C. elegans is a good one, however, the kinematic basis of the model as used in this paper (the authors do note that the control angle could be connected to a neural model, but don't do so in this work) limits the generation of neuromechanical control hypotheses.

      8, 12: We do not agree with the claim that ElegansBot could limit other researchers in generating neuromechanical control hypotheses. The term $\theta_{\mathrm{ctrl},i}^{(t)}$ used in our model is designed to be replaceable with neuromechanical control in the future.

      (13) The model may provide insights into the biomechanics of such behaviors, however, the results described are very minimal and are purely qualitative.

      (14-1) Overall, direct comparisons to the experiments are lacking or unclear.

      14-1: The text explaining Figs. 2 and 5 (Figs. 2 and 4 in the old version) directly compares the velocity, wave number, and period, as numerical indicators representing the behavior of the worm, between the experiment and ElegansBot.

      (14-2) Furthermore, the paper claims the value of the model is to produce the force fields from a given body shape, but the force fields from omega turns are only pictured qualitatively.

      13, 14-2: We gratefully accept the point that our analysis of the omega-turn is qualitative. Therefore, we have conducted additional quantitative analysis on the omega-turn and inserted the results into the new Fig. 4. We have considered the term 'Force field' as referring to the force vector received by each rod. We have created numerical indicators representing various behaviors of the worm and included them in the revised manuscript.

      (15) No comparison is made to other behaviors (the force experienced during crawling relative to turning for example might be interesting to consider) and the dependence of the behavior on the model parameters is not explored (for example, how does the omega turn change as the drag coefficients are changed).

      Thank you for the great idea. To compare behaviors, a clear criterion for distinguishing behaviors is needed first. Therefore, we have created a new mathematical definition for behavior classification in the revised manuscript (“Defining Behavioral Categories” in Method). We then compared the force and power (rate of energy consumption) among forward locomotion, backward locomotion, and the omega-turn (Fig. 4). In the revised manuscript, we also newly analyzed how the turning behavior changes with variations in the friction coefficients (Figs. S4-S7).

      (16) If the purpose of this paper is to recapitulate the swim-to-crawl transition with a simple model, and then apply the model to new behaviors, a more detailed analysis of the behavior of the model variables and their dependence on the variables would make for a stronger result.

      In our revised manuscript, we have quantitatively analyzed the changes occurring in turning behavior from water to agar, and the results are presented in Figs. S9 and S10.

      (17) In some sense, because the model takes kinematics as an input and uses previously established techniques to model mechanics, it is unsurprising that it can reproduce experimentally observed kinematics, however, the forces calculated and the variation of parameters could be of interest.

      (18) Relatedly, a justification of why the drag coefficients had to be changed by a factor of 100 should be explored.

      (19) Plate conditions are difficult to replicate and the rheology of plates likely depends on a number of factors, but is, for example, a change in hydration level likely to produce a 100-fold change in drag? Or is something more interesting/subtle within the model producing the discrepancy?

      18, 19: As mentioned in the paper, we do not know if the friction coefficients in the study of Boyle et al. (2012) and the friction coefficients in the experiment of Stephens et al. (2016) are the same. In our revised manuscript, we have explored in more detail the effects of the friction coefficient's scale factor, and explained why we chose a scale factor of 1/100 (“Proper Selection of Friction Coefficients” in Supplementary Information). In summary, we analyzed the changes in trajectory due to scaling of the friction coefficient, and chose the scale factor 1/100 because it allowed ElegansBot to accurately reproduce the worm's trajectory while also staying close to the friction coefficients in the Boyle et al. paper.

      (20) Finally, the language used to distinguish different modeling approaches was often unclear.

      (21) For example, it was unclear in what sense the model presented in Boyle, 2012 was a "kinetic model" and in many situations, it appeared that the term kinematic might have been more appropriate.

      Thank you for the feedback. As you pointed out, we have corrected that part to 'kinematic' in the revised manuscript.

      (22) Other phrases like "frictional forces caused by the tension of its muscles" were unclear at first glance, and might benefit from revision and more canonical usage of terms.

      We agree that the expression may not be immediately clear. This is due to the word limit for the abstract (the abstract of eLife VOR should be under 200 words, and our paper's abstract is 198 words), which forced us to convey the causality in a limited number of words. Therefore, although we will not change the abstract, the expression in question means that the muscle tension, which is the cause of the worm's locomotion, ultimately generates the frictional force between the worm and the ground surface.

      Recommendations For The Authors

      (23) As I stated in my public review, I think the paper could be made much stronger if a more detailed exploration of turning mechanics was presented.

      (24) Relatedly, rather than restricting the analysis to individual videos of turning behaviors, I wonder if a parameterized model of the turning kinematics would be fruitful to study, to try to understand how different turning gaits might be more or less energetically favorable.

      We thank the reviewer once again for their suggestion. Thanks to their proposal, we were able to conduct additional quantitative analysis on turning behavior.

      Reviewer #2

      Public Review

      Summary:

      (1) Developing a mechanical model of C. elegans is difficult to do from basic principles because it moves at a low (but not very small) Reynolds number, is itself visco-elastic, and often is measured moving at a solid/liquid interface.

      (2) The ElegansBot is a good first step at a kinetic model that reproduces a wide range of C. elegans motility behavior.

      Strengths:

      (3) The model is general due to its simplicity and likely useful for various undulatory movements.

      (4) The model reproduces experimental movement data using realistic physical parameters (e.g. drags, forces, etc).

      (5) The model is predictive (semi?) as shown in the liquid-to-solid gait transition.

      (6) The model is straightforward in implementation and so likely is adaptable to modification and addition of control circuits.

      Weaknesses:

      (7) Since the inputs to the model are the actual shape changes in time, parameterized as angles (or curvature), the ability of the model to reproduce a realistic facsimile of C. elegans motion is not really a huge surprise.

      (8) The authors do not include some important physical parameters in the model and should explain in the text these assumptions.

      (9. 1) The cuticle stiffness is significant and has been measured [1].

      (10. 2) The body of C. elegans is under high hydrostatic pressure which adds an additional stiffness [2].

      (11. 3) The visco-elasticity of C. elegans body has been measured. [3]

      Thank you for asking. The stiffness of C. elegans is an important consideration. We took this into account when creating ElegansBot, but did not explain it in the paper. The detailed explanation is as follows. C. elegans indeed has stiffness due to its cuticle and internal pressure. This stiffness is treated as a passive elastic force (elastic force term of lateral passive body force) in the paper of Boyle et al. (2012). However, the maximum spring constant of the passive elastic force is 1/20 of the maximum spring constant of the active elastic force. If we consider this fact in our model, the elastic term of the muscle torque is as follows: ( is the active torque elasticity coefficient, is the passive torque elasticity coefficient)

      where

      Therefore, there is no need to describe the active and passive terms separately in

      Furthermore, since , assuming , then and .
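The argument above, that a weak passive elastic term can be folded into the active one, can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' formulation: it assumes both terms are linear torsional springs, that the passive spring relaxes toward a zero bending angle, and it uses hypothetical symbol names (`k_active`, `k_passive`, `theta_ctrl`). Only the 1/20 stiffness ratio is taken from the response.

```python
def torque_two_springs(k_active, k_passive, theta_ctrl, theta):
    """Elastic torque written with separate active and passive terms.

    Assumption: the active spring pulls toward theta_ctrl and the
    passive spring pulls toward a straight joint (angle 0).
    """
    return k_active * (theta_ctrl - theta) + k_passive * (0.0 - theta)

def merge_springs(k_active, k_passive, theta_ctrl):
    """Equivalent single spring: effective stiffness and target angle."""
    k_eff = k_active + k_passive
    theta_eff = k_active / k_eff * theta_ctrl
    return k_eff, theta_eff

# Passive stiffness set to 1/20 of the active stiffness, as stated in the text.
k_a, k_p = 20.0, 1.0
k_eff, theta_eff = merge_springs(k_a, k_p, theta_ctrl=0.5)
single = k_eff * (theta_eff - 0.2)
double = torque_two_springs(k_a, k_p, theta_ctrl=0.5, theta=0.2)
```

Under these assumptions the two expressions give the same torque for any bending angle, and with a 1/20 ratio the effective stiffness and target angle each differ from the active-only values by about 5%.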

      (12) There is only a very brief mention of proprioception.

      (13) The lack of inclusion of proprioception in the model should be mentioned and referenced in more detail in my opinion.

      As you emphasized, proprioception is an important aspect in the study of C. elegans' locomotion. In our paper, its importance is briefly introduced with a sentence each in the introduction and discussion. However, our research models how body motion arises from muscle forces, and it does not model the sensory system that senses body posture. Therefore, there is no mention of using proprioception in our paper's results section. What is mentioned in the discussion is that ElegansBot can serve as the kinetic body model in a combined model consisting of a kinetic body model and a neuronal circuit model that receives proprioception as a sensory signal.

      (14) These are just suggested references.

      (15) There may be more relevant ones available.

      The papers you provided contain specific information about the Young's modulus of the C. elegans body. The first paper (Rahimi et al., 2022) measured the Young's modulus of the cuticle after chemically isolating it from C. elegans, while the second paper (Park et al., 2007) and third paper (Backholm et al., 2013) measured the elasticity and Young's modulus of C. elegans without separating the cuticle. Based on the Young's modulus provided in each paper (although the second and third papers did not measure stiffness in the longitudinal direction), we derived the elastic coefficient (assuming a worm radius of 25 μm, a cuticle thickness of 0.5 μm, and a cuticle segment length of 40 μm, which is 1/25 of the longitudinal length). The range was quite broad, from 9.82 × 10^11 μg/sec^2 (from the first paper) to 2.16 × 10^8 μg/sec^2 (from the third paper). Although the elastic coefficient value in our paper falls within this range, the range is wide, so we think we can modify the elastic coefficient in our paper and reapply our model if more accurate values become known in the future.
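One way such an elastic coefficient could be obtained from a Young's modulus is sketched below. The response does not state the formula used, so this is an assumption: the cuticle is treated as a thin cylindrical shell of cross-sectional area 2πrt, stretched along a segment of length L, giving a spring constant k = E·A/L. The geometric values are those quoted in the response; the function name and the modulus inputs in the usage lines are illustrative placeholders, not figures taken from the cited papers.

```python
import math

# 1 Pa = 1 kg/(m s^2) = 1e9 ug / (1e6 um s^2) = 1e3 ug/(um s^2)
PA_TO_UG_PER_UM_S2 = 1e3

def cuticle_spring_constant(youngs_modulus_pa,
                            radius_um=25.0,
                            thickness_um=0.5,
                            segment_length_um=40.0):
    """Longitudinal spring constant (ug/s^2) of a thin cylindrical shell.

    Assumed model: k = E * A / L with A = 2 * pi * r * t.
    """
    area_um2 = 2.0 * math.pi * radius_um * thickness_um
    modulus = youngs_modulus_pa * PA_TO_UG_PER_UM_S2
    return modulus * area_um2 / segment_length_um

# Illustrative inputs only: a stiff isolated cuticle and a soft whole body.
k_stiff = cuticle_spring_constant(5.0e8)   # hypothetical 500 MPa
k_soft = cuticle_spring_constant(1.1e5)    # hypothetical 110 kPa
```

With these placeholder moduli the sketch returns values of the same order as the two ends of the range quoted above, which suggests, but does not confirm, that a relation of this form underlies the conversion.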

      Reviewer #3

      Public Review

      Summary:

      (1) A mechanical model is used with input force patterns to generate output curvature patterns, corresponding to a number of different locomotion behaviors in C. elegans.

      Strengths:

      (2) The use of a mechanical model to study a variety of locomotor sequences and the grounding in empirical data are strengths.

      (3) The matching of speeds (though qualitative and shown only on agar) is a strength.

      Weaknesses:

      (4) What is the relation between input and output data?

      ElegansBot takes the worm's body control angle as the input, and produces trajectory and force of each segment of the worm as the output.

      (5) How does the input-output relation depend on the parameters of the model?

      If 'parameter' is understood as vertical and horizontal friction coefficients, then the explanation for this can be found in Fig. 5 (Fig. 4 in the old version).

      (6) What biological questions are addressed and can significant model predictions be made?

      Our model provides an equation of motion that deciphers the locomotion of C. elegans, including turning behaviors, which were relatively less well understood.

      Recommendations For The Authors

      (7) The novelty and significance of the paper should be clarified.

      We have added quantitative analyses of turning behavior in the revised manuscript, and we hope this will be helpful to you.

      (8) Previously much more detailed models have been published, as compared to this one.

      We hope the reviewer can point out any previous model that we may have missed.

      (9) The mechanics here are simplified (e.g. no information about dorsal/ventral innervation but only a bending angle) setting limitations on the capacity for model predictiveness.

      (10) Such limitations should be discussed.

      We view the difference between dorsal/ventral innervation and bending angle not as a matter of simplification, but rather as a reflection of the hierarchy that our model implements. Our model does not consider dorsal/ventral innervation, but it uses the bending angle to reproduce behavior for various inputs and frictional environments, which signifies the strong predictiveness of ElegansBot (Figs. 2, 3, and 5; Figs. 2, 3, and 4 in the old version). Moreover, if the midline of C. elegans is incompressible, then modeling the dorsal and ventral sides separately, as opposed to modeling solely with the bending angle, does not increase the degrees of freedom of the worm model, and therefore does not increase its predictiveness.

      (11) The aims of the paper and results need to be supported quantitatively and analyzed through parameter sweeps and intervention.

      We have conducted additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

      (12) The methods are given only in broad brushstrokes, and need to be much more clear (and ideally sharing all code).

      We have thoroughly detailed every aspect of this research, from deriving the physical constants of C. elegans, agar, and water to developing the formulas and proofs necessary for operating ElegansBot and its applications. This comprehensive information is all presented in the Results, Methods, and Supplementary Information sections, as well as in the source code. Moreover, we have already ensured that our research can be easily reproduced by providing detailed explanations and by making ElegansBot accessible through public software databases (PyPI, GitHub). To further aid in its application and understanding, especially for those less familiar with the subject, we have also included minimal code as examples in the database. This code is designed to simplify the process of reproducing the results of the paper, thereby making our research more accessible and understandable. Therefore, we believe that readers will easily gain significant assistance from the extensive information we have provided. Should readers require further help, they can always contact us, and we will be readily available to offer support.

      (13) The supporting figures and movies need to include a detailed analysis to evidence the claims.

      We have conducted and provided additional quantitative analyses on turning behavior as suggested by Reviewer #1 (Fig. 4, S4-S7, S9, and S10).

    3. Reviewer #1 (Public Review):

      Summary:

      This work describes a simple mechanical model of worm locomotion, using a series of rigid segments connected by damped torsional springs and immersed in a viscous fluid. It uses this model to simulate forward crawling movement, as well as omega turns.

      Strengths:

      The primary strength is in applying a biomechanical model to omega-turn behaviors. The biomechanics of nematode turning behaviors are relatively less well described and understood than forward crawling, and the increase in power during omega turns is one of the more novel results. The model itself may be a useful implementation for other researchers, particularly owing to its simplicity.

      Weaknesses:

      The strength of the model presented in this work relative to prior approaches is not well supported, and in general the paper would be improved with a better description of the broader context of existing modeling literature related to undulatory locomotion. This paper claims to improve on previous approaches to taking body shapes as inputs. However, the sole nematode model cited aims to do something different, and arguably more significant, which is to use experimentally derived parameters to model both the neural circuits that induce locomotion as well as the biomechanics and to subsequently compare the model to experimental data. Other modeling approaches do take experimental body kinematics as inputs and use them to produce force fields, however, they are not cited or discussed. Finally, the overall novelty of the approach is questionable. A functionally similar approach was developed in 2012 to describe worm locomotion in lattices (Majmudar, 2012, Roy. Soc. Int.), which is not discussed and would provide an interesting comparison and needed context.

      In some sense, because the model takes kinematics as an input and uses previously established techniques to model mechanics, it is unsurprising that it can reproduce experimentally observed kinematics, however, the forces calculated and the variation of parameters could be of interest, but other methods derived from kinematics could provide similar results. It is unclear what the predictive power of the model is.

      Relatedly, a justification of why the drag coefficients had to be changed by a factor of 100 should be explored. Plate conditions are difficult to replicate and the rheology of plates likely depends on several factors, but is, for example, a change in hydration level likely to produce a 100-fold change in drag? Or is something more interesting/subtle within the model producing the discrepancy?

      Finally, the language used to distinguish different modeling approaches was often unclear. For example, it was unclear in what sense the model presented in Boyle, 2012 was a "kinetic model" and in many situations, it appeared that the term kinematic might have been more appropriate. Other phrases like "frictional forces caused by the tension of its muscles" were unclear at first glance, and might benefit from revision and more canonical usage of terms.

    4. Reviewer #2 (Public Review):

      Summary:

      Developing a mechanical model of C. elegans is difficult to do from basic principles because it moves at low (but not very small) Reynolds number, is itself visco-elastic, and often is measured moving at a solid/liquid interface. The ElegansBot is a good first step at a kinetic model that reproduces a wide range of C. elegans motility behavior.

      Strengths:

      The model is general due to its simplicity and likely useful for various undulatory movements. The model reproduces experimental movement data using realistic physical parameters (e.g. drags, forces, etc). The model is predictive (semi?) as shown in the liquid to solid gait transition. The model is straightforward in implementation and so likely is adaptable to modification and addition of control circuits.

      Comments on revised version:

      This is a revised manuscript. I'm happy with the changes made, including the specific responses to my previous concerns.

    5. Reviewer #3 (Public Review):

      A mechanical model of C. elegans, embedded in a resistive force environment, is used to calculate input torque patterns required to generate output curvature patterns and coordinates, corresponding to a number of different locomotion behaviors in C. elegans.

      Strengths:

      The use of a mechanical model to study a variety of locomotor sequences and the grounding in empirical data are strengths. The matching of speeds (though requiring adjusted drag coefficients) is a strength.

      Weaknesses:

      The paper lacks evidence of numerical validation or comparison with the results and tools in the literature. E.g. is it surprising that the uniform torque distribution yields maximal speed? What is the relation between input and output data? How does the input-output relation depend on the parameters of the model? What novel model predictions are made?

      In particular, if validated, the breakdown of drag forces and torque distributions during forward locomotion and turning behaviors may be interesting to compare to predictions by other tools, and to empirical measurement. One caveat is that the worm touches itself during such turns, and even crosses over itself in delta turns, and so the estimated drag coefficients and the resultant mechanical forces are likely incorrect.

    1. eLife assessment

      This useful study, which systematically addresses off-target effects of a commonly used chemotherapy drug on bone and bone marrow cells and which therefore is of potential interest to a broad readership, presents evidence that reducing systemic inflammation induced by doxorubicin limits bone loss to some extent. The demonstration of the effect of systemic inflammation on bone loss is convincing. Building on prior work, this study sets the scene for additional genetic and pharmacologic experiments as well as future analyses of the bone phenotypes, which should speak to the mechanisms involved in doxorubicin-induced bone loss – which are not addressed in the current study – and which may substantiate the clinical relevance of targeting inflammation in order to limit the negative impact of chemotherapies on bone quality.

    2. Author Response

      The following is the authors’ response to the previous reviews.

      Reviewing Editor

      We thank you for clarifying several of the questions raised by the reviewers. Since the study has otherwise largely stayed unchanged, we will leave the eLife assessment as “before”:

      We respectfully disagree because we addressed all concerns raised by the two reviewers except one (below), which was not satisfactorily answered according to reviewer 1; it has now been addressed (new S3 Fig).

      Reviewer #1 (Recommendations For The Authors):

      The authors addressed most of my previous comments. However, there is one important point that was not satisfactorily addressed: "The band intensities on Western blots in Fig. 4 and Fig. 5 are not quantified, and the numbers of repeats are also not provided." The response was that "It is not straightforward to quantify and describe the intensity of the bands of these numerous with different fate outcomes." In the revision, they mentioned that at least three repeats were performed. If so, it's not entirely clear why they couldn't quantify the Western blot results. Including quantitative data will strengthen the rigor of the findings.

      Quantitative data from Fig. 4 and Fig. 5 are now provided as S3 Fig and described in the manuscript (lines 170-175; 184-188).

    3. Reviewer #1 (Public Review):

      Summary:

      Doxorubicin has long been known to cause bone loss by increasing osteoclast and suppressing osteoblast activities. The study by Wang et al. reports a comprehensive investigation into the off-target effects of doxorubicin on bone tissues and potential mechanisms. They used a tumor-free model with wild type mice and found that even a single dose of doxorubicin has a major influence by increasing leukopenia and DAMPs and inflammasomes in macrophages and neutrophils, and inflammation-related cell death (pyroptosis and NETosis). The gene knockout study shows that AIM2 and NLRP3 are the major contributors to bone loss. Overall, the study confirmed previous findings regarding the impact of doxorubicin on tissue inflammation and expands the research further into bone tissue. The data presented are consistent; however, a major question remains regarding whether doxorubicin drives inflammation and its related events. Most of the in vitro studies showed that the effect of doxorubicin cannot be demonstrated without LPS priming. This observation raises the question of whether doxorubicin itself could activate the inflammasome and the related events. The in vivo study, on the other hand, suggested that it doesn't require LPS. The inconsistency here was not explained further. Moreover, a tumor-free mouse model was used for the study; however, immune responses in tumor-bearing models would likely be distinct from those in tumor-free ones. The justification for using tumor-free models is not well-established.

      Strengths:

      The paper includes a comprehensive study that shows the effects of doxorubicin on cytokine levels in serum, release of DAMPs and NETosis, and leukopenia using both in vivo and in vitro models. Bone marrow cells, macrophages and neutrophils were isolated from the bone marrow, and the levels of cytokines in serum were also determined.

      They employed multiple knockout models with deficiency in Aim2, Nlrp3, and double deficiencies to dissect the functional involvement of these two inflammasomes.

      The experiments in general are well designed. The paper is also logically written, and figures were clearly labeled.

      Weaknesses:

      Most of the data presented are correlative, and there is not much effort to dissect the underlying molecular mechanism.

      It is not entirely clear why a tumor free model is chosen to study immune responses, as immune responses can differ significantly with or without tumor-bearing.

      Immune responses in isolated macrophages, neutrophils and bone marrow cells require priming with LPS, while such responses are not observed in vivo. There is no explanation for these differences.

      The band intensities on Western blots in Fig. 4 and Fig. 5 are not quantified, and the numbers of repeats are also not provided.

      Many abbreviations are used throughout the text, and some of the full names are not provided.

      Fig. 5B needs a label on X axis.

    4. Reviewer #2 (Public Review):

      Summary:

      Wang and collaborators have evaluated the impact of inflammation on bone loss induced by Doxorubicin, which is commonly used in chemotherapy to treat various cancers. In mice, they show that a single injection of Doxorubicin induces systemic inflammation, leukopenia, and a significant bone loss associated with increased bone-resorbing osteoclast numbers. In vitro, the authors show that Doxorubicin activates the AIM2 and NLRP3 inflammasomes in macrophages and neutrophils. Importantly, they show that the full knockouts (germline deletions) of AIM2 (Aim2-/-) and NLRP3 (Nlrp3-/-) and Caspase 1 (Casp1-/-) limit (but do not completely abolish) bone loss induced 4 weeks after a single injection of Doxorubicin in mice. From these results, they conclude that Doxorubicin activates inflammasomes to cause inflammation-associated bone loss.

      Strength:

      This manuscript provides functional experiments demonstrating that NLRP3 and/or AIM2 loss-of-functions (and thus the systemic impairment of the inflammatory response) prevent bone loss induced by Doxorubicin in mice.

      Weaknesses:

      Numerous studies have reported that Doxorubicin induces systemic inflammation and activates the inflammasome in myeloid cells and various other cell types. It is also known that systemic inflammation and Doxorubicin treatment lead to bone loss. Hence, the key conclusions drawn from this work have been known already or were very much expected. Therefore, the novelty appears somewhat limited. One important limitation is the lack of experiments that could determine which cell lineages are involved in bone loss induced by Doxorubicin in vivo, while the tools to do so exist. The characterization of the bone phenotype is incomplete, and unfortunately does not tell us whether the inflammasome is activated in some of the cell lineages present in bones in vivo. Another limitation is that the relative importance of the inflammasomes compared to cell senescence and autophagy, which are also induced by Doxorubicin, has not been evaluated. Hence the main molecular mechanisms responsible for bone loss induced by Doxorubicin in vivo remain unknown. Lastly, it would have been interesting, from a more clinical point of view, to compare the few relevant treatments that could limit the deleterious effect of Doxorubicin on bone loss while preserving the toxicity on tumor cells.

    1. eLife assessment

      This important study combines in vivo and in vitro models to characterise the role of CYRI-B, an interactor of the small GTPase Rac1, in controlling pancreatic cancer progression towards a higher proliferative and metastatic stage. The evidence supporting the claims of the authors is convincing in characterizing a novel Rac1 binding protein, CYRI-B, as a regulator of metastatic potential in vivo, with distinct functions at different stages of tumour progression. CYRI-B reduces the typical hyperactivation of Rac1 in the early stages of tumour progression; subsequently, CYRI-B mediates internalization of lysophosphatidic acid receptor 1 (LPAR1) uptake through macropinocytosis, thus regulating chemotactic migration of cancer cells towards lysophosphatidic acid (LPA). Although the inclusion of human pancreatic cancer cell lines would have strengthened the study, the work will be of broad interest to cell biologists and the signalling research communities.

    2. Reviewer #1 (Public Review):

      In this paper, Nikolaou et al. demonstrated that CYRI-B expression is upregulated in a mouse model of pancreatic ductal adenocarcinoma (PDAC). Interestingly, they found that, while CYRI-B KO promotes the early stages of tumour progression, it prevents the formation of metastasis at later stages. Focusing on the latter, the authors highlight a role for CYRI-B in controlling the membrane availability of the LPA receptor LPAR1, which is required to support PDAC cell chemotaxis towards serum or LPA.

      Strengths: the in vivo and imaging data are very solid, and convincingly support the authors' conclusions. The KPC model is well-established in PDAC research and is a very powerful tool to investigate disease onset and progression. The imaging approaches used are of a very high standard. Good data presentation with the use of super-plots.

      Weaknesses: the authors focused on chemotaxis, but did not present any evidence with regard to the role of CYRI-B in 3D cell invasion, which is a key process associated with cell invasion. The data presented clearly show a specific effect towards liver metastasis, while diaphragm and bowel metastasis were not affected by CYRI-B deletion. It would be beneficial to include a discussion about this, providing some potential explanation behind this observation.

      This work is of interest to cell biologists not only working in pancreatic cancer but also more broadly to researchers interested in vesicular trafficking, plasma membrane receptor dynamics and cell migration.

    3. Reviewer #2 (Public Review):

      The manuscript expands on the previous work from the lab where novel interactors of Rac1 GTPase (CYRI-A and B) provide localized inhibition by sequestration of activated Rac1. These novel regulators are fascinating as they complement the functions of the classical negative modulators of GTPases, GAPs and GDIs. The current manuscript focuses on the in vivo role of CYRI-B in pancreatic cancer progression, and distinct CYRI-B functions are shown for early and later stages. The in vivo data following CYRI-B depletion (no change in proliferation, reduced metastatic potential) is substantiated with in vitro analyses of receptor uptake, temporal recruitment of CYRI-B on macropinosomes and reduced chemotaxis.

      The authors describe in detail the role of CYRI-B in pancreatic adenocarcinoma, building from their prior studies mapping CYRI-B function in the regulation of polarity, motility and chemotaxis. The experiments are well-designed and performed, and the text was clearly written. However, the results partially support some of their conclusions. The interpretation of the data and the discussion in the context of human pancreatic tumours would help the understanding and impact of the work.

      The hypothesis is that depletion of CYRI-B would promote localized Rac1 activation at the membrane. However, the authors show that CYRI-B is found overexpressed in PDAC, consistent with other papers where its high expression correlates with poor outcome of many cancers. The prediction is that Rac1 functions modulated by CYRI-B would be inhibited in those tumours where CYRI-B is overexpressed. Is this the case and has it been formally demonstrated?

      Most experiments use the depletion of CYRI-B to probe its function. It would be useful to readers and important to elaborate on how the specific CYRI-B functions shown upon depletion would fit with the in vivo observation of CYRI-B overexpressed in tumours. For example, loss of CYRI-B reduces chemotaxis potential. How can this result be reconciled with the predicted increase in Rac1 activation in the absence of CYRI-B? Conversely, a prediction of CYRI-B overexpression in human tumours would imply the inactivation of Rac1 whereas chemotaxis is promoted. The discussion could be improved with the addition of the authors' views and further explanations in this context.

      Similarly, it is confusing to extrapolate a proposed increase in LPAR1 internalization by macropinocytosis with CYRI-B overexpression in PDAC. It is predicted that Rac1 would be locally inhibited in this scenario, and thus macropinocytosis would be compromised. It would be good to spell out what the authors envisage happens. For example, uptake could be switched to another receptor uptake process that would not involve CYRI-B sequestration of Rac1. Discussion of the potential alternatives will strengthen the manuscript.

      "..LPAR1 is a cargo of CYRI-B dependent macropinocytosis" (page 21). This statement reads as an overinterpretation of the specificity of the process. It may suggest that there is a cargo selectivity by CYRI-B, which has neither been formally demonstrated nor is well accepted. Macropinocytosis is thought to occur as a bulk engulfment of the membrane and thus any receptor at the cell surface would be internalised non-specifically. The demonstrated reduction in LPAR1 uptake could, for example, be proportional to the reduction in macropinocytosis rate caused by CYRI-B depletion.

      Furthermore, the readers would benefit from clearer explanations of the differences and similarities between CYRI-A and CYRI-B. It will be important to clarify the specificity of the proposed functions of each protein. Both localize at the macropinosomes, modulate engulfment and regulate integrin α5β1 trafficking. It will be informative to specify if CYRI-A is also upregulated in human tumours, has a similar outcome as CYRI-B in vivo and also regulates LPAR1 uptake.

      Upon depletion of CYRI-B in pancreatic tumour cells in vivo, the presence of similar levels of jaundice is confusing. Fewer metastases are detected in the mesentery. Are liver metastases affected in the absence of CYRI-B?

    4. Reviewer #3 (Public Review):

      The authors first characterize their mouse model of pancreatic cancer and show that CYRI-B mRNA is detectable in pancreatic lesions and that its amount increases over time. They also show that genetic deletion of CYRI-B accelerates pancreatic ductal adenocarcinoma (PDAC), leading to lower survival of mice. This is accompanied by higher levels of phospho-(i.e. activated)-JNK and -ERK, which are likely two of the factors driving cancer cell proliferation. Using in vivo transplantation, the authors further demonstrate that cancer cells depleted for CYRI-B exhibit decreased numbers of metastases in the mesentery, despite showing similar proliferation as control cells.

      Cancer cell migration can be driven by LPA, which binds LPAR1 at the surface of PDAC cells. Investigation of chemotactic migration of cancer cells towards fetal bovine serum as a source of LPA further shows that cancer cells depleted for CYRI-B and expressing GFP as control exhibit strongly reduced chemotactic migration, while cells re-expressing CYRI-B-GFP show normal chemotactic migration. Furthermore, this restored migration is blocked by using the LPAR1/3 inhibitor Ki16425, showing that CYRI-B is required for the chemotactic migration of PDAC cells in a gradient of serum LPA.

      Using live cell imaging, the authors show that CYRI-B-GFP and LPAR1-mCherry localize to macropinocytic cups and to macropinosomes, indicating that LPAR1 can be internalized by PDAC cells through macropinocytosis. This notion is supported by immunofluorescence analyses showing that PDAC cells depleted for CYRI-B have reduced LPAR1-mCherry internalization upon stimulation with LPA, compared to cells rescued by CYRI-B-GFP expression. Collectively, the authors suggest that CYRI-B regulates macropinocytic uptake of LPAR1, thus regulating the chemotactic migration of PDAC cells towards LPA, which supports the metastasis of pancreatic cancer.

      This is an interesting manuscript that makes a convincing case for the involvement of CYRI-B as a driver of PDAC. A particular strength is the expert use of different mouse models and derived cancer cell lines. The major conclusions are supported by the data presented. The results could be further strengthened by detecting CYRI-B protein (in addition to mRNA) in cancer lesions and also by staining endogenous CYRI-B and LPAR1 in the macropinocytosis experiments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Roget et al. build on their previous work developing a simple theoretical model to examine whether ageing can be under natural selection, challenging the mainstream view that ageing is merely a byproduct of other biological and evolutionary processes. The authors propose an agent-based model to evaluate the adaptive dynamics of a haploid asexual population with two independent traits: fertility timespan and mortality onset. Through computational simulations, their model demonstrates that ageing can give populations an evolutionary advantage. Notably, this observation arises from the model without invoking any explicit energy tradeoffs, commonly used to explain this relationship.

      The model’s results are based on both numerical simulations and formal mathematical analysis.

      Additionally, the theoretical model developed here indicates that mortality onset is generally selected to start before the loss of fertility, irrespective of the initial values in the population. The selected relationship between the fertility timespan and mortality onset depends on the strength of fertility and mortality effects, with larger effects resulting in the loss of fertility and mortality onset being closer together. By allowing for a trans-generational effect on ageing in the model, the authors show that this can be advantageous as well, lowering the risk of collapse in the population despite an apparent fitness disadvantage in individuals. Upon closer examination, the authors reveal that this unexpected outcome is a consequence of the trans-generational effect on ageing increasing the evolvability of the population (i.e., allowing a more effective exploration of the parameter landscape), reaching the optimum state faster.

      The simplicity of the proposed theoretical model represents both the major strength and weakness of this work. On one hand, with an original and rigorous methodology, the logic of their conclusions can be easily grasped and generalised, yielding surprising results. Using just a handful of parameters and relying on direct competition simulations, the model qualitatively recapitulates the negative correlation between lifespan and fertility without requiring energy tradeoffs. This alone makes this work an important milestone for the rapidly growing field of adaptive dynamics, opening many new avenues of research, both theoretically and empirically.

      We thank the reviewers and editor for highlighting the importance of the work presented here.

      On the other hand, the simplicity of the model also makes its relationship with living organisms difficult to gauge, leaving open questions about how much the model represents the reality of actual evolution in a natural context.

      We presented, in both the results and the discussion, how the mathematical trade-offs between fertility and survival time give rise to (x_b, x_d) configurations representative of existing ageing modes.

      In particular, a more explicit discussion of how the specifics of the model can impact the results and their interpretation is needed. For example, the lack of mechanistic details on the trans-generational effect on ageing makes the results difficult to interpret.

      We discussed the role played by the transgenerational Lansing effect in terms of its function; there is no need for a particular mechanism beyond that function of a transgenerational negative effect. We reinforced this in the discussion by adding the following sentence: “Regarding the nature of the transgenerational effect, our model is agnostic and the mere transmission of any negative effect would be sufficient to exert the function.”

      Even if analytical results are obtained, most of the observations appear derived from simulations as they are currently presented. Also, the choice of parameters for the simulations shown in the paper and how they relate to our biological knowledge are not fully addressed by the authors.

      The long-time limit of the system with and without the Lansing effect is based on analytical results that were later confirmed using numerical simulations. The choice of parameters is explained in the introduction as being the minimal set needed to define a living organism. As for the parameters’ values, our numerical analysis gives a solution for any i_b, i_d, x_b and x_d in R+, making the choice of initial values arbitrary.

      Finally, the conclusions of evolvability are insufficiently supported, as the authors do not show if the wider genotypic variability in populations with the ageing trans-generational effect is, in fact, selected.

      We neither show nor claim that evolvability per se is selected for, but rather that the apparent advantage given by this transgenerational effect seems to be mediated by an increased genotypic/phenotypic variability conferred to the lineage, which we interpreted as evolvability.

      Recommendations for the authors

      (1) The authors could use the lineage tracing results for the evolvability aspect. Specifically, within subpopulations featuring the Lansing effect, it would be valuable to explore whether individuals with parental age greater than the mortality onset (a > x_d) demonstrate higher fitness compared to individuals with a < x_d. Additionally, an examination of how this variation evolves over time could provide further insights into the dynamics of the proposed model.

      We thank the reviewer for this suggestion. This is ongoing work in the group, especially in the context of varying environmental conditions.

      (2) In all simulations, I_b = I_d = 1, resulting in total fertility (x_b * I_b) equating to x_b, while x_d is proportional to life expectancy. Considering an exploration of the implications of this parameter setting, the authors could frame x_d as a 'lifespan cost', potentially allowing for the model to be conceptualised in terms of energetic tradeoffs. This might offer additional perspectives on the dynamics of the model and its alignment with biological principles.

      We discuss how the apparent trade-offs given by the model, depending on the i_b and i_d values, can be related to the interpretation of such trade-offs that has been accepted for most of the past century. Our claim here in the discussion is that one does not need such an energetic trade-off for the fertility/longevity trade-offs to appear. Such an energetic trade-off is not a “biological principle” but merely an accepted interpretation of a fertility/longevity trade-off that is not even a general mechanism.

      (3) Considering the necessity of variation in x_d for the observed patterns, an exploration could be undertaken by the authors to examine a model where x_d is simply variable without inheritance. This could involve centring x_d at some value d with some variance σ_d for all individuals. In such a scenario, it may be observed whether the same convergence of x_b - x_d occurs without requiring x_d to be selected. Furthermore, similar consequences of the Lansing effect could potentially be identified.

      This was done early on during our work and did not show any major changes in the model’s behaviour beyond the time of convergence. We did not include it in the final manuscript because of the low added value to an already long and complex manuscript.

      (4) While it may not be necessary to alter the model itself, it is suggested that the authors consider acknowledging the potential consequences of certain modelling decisions that might be perceived as biologically unrealistic. Notable examples include assumptions such as fertility from birth and zero mortality prior to x_d. These assumptions, such as infertility from birth, could be viewed as distinctive features, and it might be worth mentioning that parental care of offspring could have co-evolved with such features. This is particularly relevant considering the energy tradeoff hypothesis that has been postulated.

      Although inspired by results obtained in Drosophila, mice, nematodes and zebrafish, the model is so far haploid and asexual, thus involving individuals likely more similar to unicellular organisms. In these conditions, infertility from birth did not seem relevant to us. However, the model and code are accessible online and we hope that others will use them to address such questions. It is interesting, though, to notice that ageing appears here without such a constraint.

      Additionally, the consideration that all organisms face a non-trivial mortality rate at every age, not solely from physiological causes, reflects the reality within which selection operates.

      We thought this was the best way to reflect an environment with a limited carrying capacity. A more complex model is under construction to take into account the fact that older individuals might be more sensitive to it than younger ones.

      (5) While acknowledging the technical rigour applied by the authors, it is suggested that further attention be given to conducting a comprehensive 'reality check' associated with the chosen parameters, particularly regarding the biological relevance of the results. For instance, the authors argue that offspring of old organisms do not, on average, live similarly to their parents. However, it is noted that studies in the haploid asexual organism yeast, akin to what the authors model (albeit not necessarily yeast), revealed that the average lifespan of yeast progeny born from young or old mothers is very similar.

      We do not claim that the progeny of old parents live less long than those of younger parents on average; we say that this happens in the progeny of physiologically old parents, representing at most 10% of the population in our numerical simulations.

      The authors cite experimental evolution in Drosophila progeny conceived later in the life of the parent, indicating that the onset of mortality in these progeny occurs late, sometimes even after the end of the fertility period (Burke et al., 2016; Rose et al., 2002). While the authors report their own previous studies with divergent results, independent experiments have suggested an increase of x_d following an artificial increase of x_b (Luckinbill and Clare, 1985; Sgro et al., 2000). A more in-depth consideration of these contrasting observations and their potential implications for the current model could enhance the overall robustness of the study.

      The increase of x_d following an artificial increase of x_b is predicted by our model as discussed. The divergence of observations between studies is alas hard to assess.

      (6) To enhance readability and maintain consistency, it is suggested that the authors homogenise the description of key parameters, specifically x_b and x_d, throughout the text. This could contribute to improved clarity and rigour. One recommendation is to refer to x_b consistently as the 'fertility span' and x_d as the 'mortality onset' for the sake of uniformity in terminology.

      We have modified the text accordingly.

      (7) At various points in the text, the assertion is made that observations have indicated a tradeoff between fertility and longevity. It is recommended that the authors provide references or data to substantiate this claim. This addition would contribute to the empirical grounding of the mentioned tradeoff and strengthen the overall support for the assertions made in the study.

      We added the following references to the discussion: Lemaitre et al., 2015; Kirkwood, 2005; and Rodrigues and Flatt, 2016.

      (8) The statement claiming that the model is 'able to describe all types of ageing observed in the wild' should be moderated. As the authors themselves acknowledge, the model is referred to as a 'toy model,' and it is made clear that it cannot capture, nor is intended to capture, the entire diversity observed in life. Adjusting this statement to reflect the limited scope and purpose of the model would enhance precision and accuracy in the presentation of its capabilities.

      Although it is a toy model, its possible configurations encompass all the configurations described so far across the diversity of ageing throughout the tree of life, from negligible senescence with no loss of fertility (x_b and x_d >> 0) to menopause-like configurations (x_b >> x_d) through fast mortality increase post reproduction (x_b = x_d). Replacing our current square functions would allow an age-dependent decrease or increase of fertility and/or of mortality risk.
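      As an illustration only, the square functions mentioned above can be sketched as follows. This is a minimal sketch, not the authors' code: the function names, the choice of inclusive/exclusive boundaries at x_b and x_d, and the closed-form life expectancy (which assumes a constant hazard i_d after x_d and ignores competition and any other source of mortality) are assumptions made for the example.

```python
def birth_rate(age, x_b, i_b=1.0):
    """Square fertility function: constant rate i_b up to age x_b, zero afterwards.

    Treating age == x_b as still fertile is an assumption of this sketch.
    """
    return i_b if age <= x_b else 0.0


def death_rate(age, x_d, i_d=1.0):
    """Square mortality function: zero up to age x_d, constant rate i_d afterwards."""
    return i_d if age > x_d else 0.0


def max_total_fertility(x_b, i_b=1.0):
    """Expected offspring of an individual that survives the whole fertile span.

    Equals x_b * i_b; with i_b = 1 this reduces to x_b.
    """
    return x_b * i_b


def life_expectancy(x_d, i_d=1.0):
    """Expected lifespan under the square mortality function alone.

    No deaths occur before x_d; afterwards a constant hazard i_d gives an
    expected residual lifetime of 1 / i_d (hypothetical simplification that
    ignores competition).
    """
    return x_d + 1.0 / i_d
```

      With i_b = i_d = 1, `max_total_fertility(x_b)` returns x_b and `life_expectancy(x_d)` returns x_d + 1, so under these simplifying assumptions the two traits map directly onto fertility and lifespan.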

      (9) To bolster the biological relevance of the study, it is strongly recommended that the authors cross-check the results of their simulations with previously published experimental findings. This approach would serve to strengthen the alignment between the model outcomes and observed biological phenomena. Additionally, placing greater emphasis on the biological relevance aspects throughout the text would contribute to a more robust and comprehensive exploration of the study's implications.

      In the present manuscript we have tried to cite a number of results from artificial selection experiments on life history traits in order to strengthen the interpretations of our model’s behaviour. There are numerous other studies, going in the same direction or not, but we do not think that it would be relevant to add an exhaustive list of them. Nevertheless, we added Stearns et al., 2000, which adds extrinsic high mortality to the evolution of life history traits.

      (10) For enhanced clarity, it is suggested that the x-axis in Figure 1 be labelled as 'age.' Considering this adjustment could contribute to clearer visual communication of the data.

      We agree with the reviewer and modified the figure accordingly.

      (11) The addition of graphical legends is recommended for Figures 3-5, as well as the supplementary figures. Including these legends would provide essential context and improve the interpretability of the figures for readers.

      We agree with the reviewer and modified the figures accordingly.

      (12) For improved distinction of the ranges indicated by quantiles in Figure 3, it is suggested that the authors consider enhancing visual clarity. One approach could involve making the middle quantile thicker or using a different line type. Additionally, it is recommended to explore the calculation of the highest density 90% intervals rather than the 1-9 deciles. This adjustment could contribute to a clearer representation of the data distribution in the figure.

      We named the different deciles directly on the figure to improve readability.

      (13) It is observed that the mathematical proofs in Annex 1 are not displaying properly in the PDF. Additionally, there seem to be missing and broken references for the Annex. This issue may be related to LaTeX formatting. The authors could consider revisiting the formatting of Annex 1 to ensure the correct display of mathematical proofs and address the referencing concerns, possibly by checking and rectifying any LaTeX-related issues.

      The LaTeX file of the supplementary material was not correctly compiled. It is now corrected.

      (14) There is inconsistency in the text regarding the reference to the Annex, with both 'Annex' and 'Annexe' being used interchangeably. To maintain uniformity, it is suggested that the authors consistently use either 'Annex' or 'Annexe' throughout the text. This adjustment would contribute to a more polished presentation of the supplementary material.

      We corrected them accordingly.

      (15) There appears to be a typographical error in the name of Supplementary Figure 3.

      We corrected it accordingly.

    2. eLife assessment

      Through a theoretical approach, this study makes important contributions to our understanding of the evolutionary causes of the ageing process. Using a simple individual-based model and computational simulations, the authors provide convincing evidence that ageing can be a trait under natural selection, opening the door for further discussion in the context of lifespan extension research.

    3. Joint Public Review:

      Roget et al. build on their previous work developing a simple theoretical model to examine whether ageing can be under natural selection, challenging the mainstream view that ageing is merely a byproduct of other biological and evolutionary processes. The authors propose an agent-based model to evaluate the adaptive dynamics of a haploid asexual population with two independent traits: fertility timespan and mortality onset. Through computational simulations, their model demonstrates that ageing can give populations an evolutionary advantage. Notably, this observation arises from the model without invoking any explicit energy tradeoffs, commonly used to explain this relationship.

      Additionally, the theoretical model developed here indicates that mortality onset is generally selected to start before the loss of fertility, irrespective of the initial values in the population. The selected relationship between the fertility timespan and mortality onset depends on the strength of fertility and mortality effects, with larger effects resulting in the loss of fertility and mortality onset being closer together. By allowing for a trans-generational effect on ageing in the model, the authors show that this can be advantageous as well, lowering the risk of collapse in the population despite an apparent fitness disadvantage in individuals. Upon closer examination, the authors reveal that this unexpected outcome is a consequence of the trans-generational effect on ageing increasing the evolvability of the population (i.e., allowing a more effective exploration of the parameter landscape), reaching the optimum state faster.

      The simplicity of the proposed theoretical model represents both the major strength and weakness of this work. On one hand, with an original and rigorous methodology, the logic of their conclusions can be easily grasped and generalised, yielding surprising results. Using just a handful of parameters and relying on direct competition simulations, the model qualitatively recapitulates the negative correlation between lifespan and fertility without requiring energy tradeoffs. This alone makes this work an important milestone for the rapidly growing field of adaptive dynamics, opening many new avenues of research, both theoretically and empirically.

      On the other hand, the simplicity of the model also makes its relationship with living organisms difficult to gauge, leaving open questions about how much the model represents the reality of actual evolution in a natural context. In particular, a more explicit discussion of how the specifics of the model can impact the results and their interpretation is needed. For example, the lack of mechanistic details on the trans-generational effect on ageing makes the results difficult to interpret. Even if analytical results are obtained, most of the observations appear derived from simulations as they are currently presented. Also, the choice of parameters for the simulations shown in the paper and how they relate to our biological knowledge are not fully addressed by the authors. Finally, the conclusions of evolvability are insufficiently supported, as the authors do not show if the wider genotypic variability in populations with the ageing trans-generational effect is, in fact, selected.

    1. eLife assessment

      This study presents an interesting set of findings that connects N-cadherin and glypican-4 to Slit signaling during the regulation of contact inhibition of locomotion of Schwann cells in culture. Solid evidence is provided showing that N-cadherin not only regulates cell recognition but also proper trafficking of Slit to the cell surface. An ex-vivo model demonstrates the importance of Slit signaling during cell migration, but the molecular details of how N-cadherin traffics Slit to the surface and the role of glypican are unclear. The data would have been strengthened with a similar interrogation of N-cadherin in this system. The work will be of interest to cell biologists studying the mechanisms behind peripheral nervous system regeneration.

    2. Reviewer #1 (Public Review):

      Hoving and colleagues investigated the mechanisms of contact inhibition of locomotion (CIL) in Schwann cells using cell migration assays, in combination with siRNA as well as an ex-vivo model for collective cell migration of the peripheral nervous system. They found that N-cadherin is needed for proper cell repulsion during CIL. Schwann cells depleted of N-cadherin failed CIL when encountering other Schwann cells depleted of N-cadherin; however, they maintained CIL when encountering Schwann cells expressing N-cadherin. Depletion of alpha-catenin and to some degree p120 did not have the same effect as N-cadherin depletion. Further, they determined that the extracellular domain is needed for CIL as well as an interaction with Glypican-4. Glypicans often act as co-receptors for other signaling molecules, and so the authors further narrowed CIL's dependence to Slit signaling. N-cadherin was needed for proper Slit surface expression, again, dependent on the extracellular domain, and depletion of both Slit2 and 3 led to a cell clumping and rounding phenotype. Finally, using an ex-vivo model of Schwann cell migration they showed that rSlit led to a similar cell rounding and clumping phenotype, ultimately leading to an inhibition of cell migration.

      Strengths

      This was a very methodical examination of what is needed for CIL in cultured Schwann cells. The data presented largely support the findings, and the linking of N-cad to glypican-4 to Slit signaling further illuminates this process, helping to define the molecular players. The mechanistic insight goes further in that they demonstrate that Slit does not get to the cell surface without the expression of the extracellular domain of N-cad.

      Weaknesses

      The conclusions that can be drawn from this study remain a little narrow since only Schwann cells were used. This is not so much a weakness, in that the authors were indeed investigating peripheral nervous system regeneration, but it does limit their findings. The experiments carried out in the ex-vivo system only touch on one aspect of their cell culture work, the mechanism of Slit. No other aspects of their cell culture system were tested ex-vivo which

    3. Reviewer #2 (Public Review):

      Schwann cells actively repair and regenerate peripheral nerves following tissue injury. Central to this process is the collective cell migration of 'cords' of Schwann cells, which guide the regenerating axons across an injury site. Previously published research from the Lloyd lab shows that at the injury site, Schwann cell cords are maintained via N-cadherin-based cell-cell adhesions; however, when cultured under low density conditions, Schwann cells display cell repulsion and contact inhibition of locomotion (CIL) phenotypes, suggesting Schwann cell behaviour is plastic. In this study, Hoving, Lloyd and colleagues build upon their previous work to show that Slit2/3/Robo signalling triggers cell repulsion between Schwann cells in an N-cadherin-dependent manner. This in turn induces contact inhibition of locomotion to propel Schwann cells to migrate collectively and with direction. The authors show that N-cadherin has a dual function in Schwann cell migration: to keep migrating Schwann cells together as a group, and concomitantly present Slit2/3 repulsive cues to cells to trigger cell repulsion locally. Their data also show that extracellular N-cadherin is required for cell repulsion, independent of cell-cell adhesion functions. The authors use a combination of in vitro Schwann cell cultures and live cell imaging, with an ex vivo precision cut tissue slice system to show that Slit2/3-dependent CIL underpins proper Schwann cell migration in an injury model.

      This is a very well executed and important study, which provides new insights into mechanisms of CIL and places CIL in the context of tissue repair and regeneration in adult tissues. The experiments are well designed, and the main findings and conclusions are based on robust and convincing data.

    4. Reviewer #3 (Public Review):

      The submitted paper by Hoving et al addresses the role of N-cadherin in Schwann cell collective cell migration and its previously unknown relationship with the slit/robo signaling pathway. The main conclusion is that N-cadherin has two distinct functions. One that is dependent on its classical role as a cell-cell junction protein promoting cell clustering and one that promotes cell repulsion and polarity independently of the formation of cell adhesion complexes. The second function is mediated by the Slit/Robo pathway. It is proposed that N-cadherin and Glypican-4 act together to present Slit2/3 at the surface of Schwann cells in order to trigger Robo signaling on neighboring cells.

      The data about N-cadherin loss of function and the associated rescue experiments with the various truncated forms of N-cadherin are well substantiated by proper controls for efficiency and specificity. They show that the extracellular domain of N-cadherin is the one required for the repulsive effect. The experiments performed to distinguish the roles in adhesion and repulsion seem clear and conclusive. In addition, the fact that the Slit signal needs to be provided in a polarized manner for directional migration to occur is also clearly demonstrated in vitro and in slice assays. Overall, the model that N-cadherin plays two different roles, a repulsive one via presentation of Slit at the cell surface and a cell adhesion one via formation of adherens junctions, is well supported by the data and will be of interest beyond the subfield of the authors.

      However, other parts of the manuscript seem weaker. If N-cadherin presentation of the Slit signal is so critical, why are repulsion rates still very high in cells without N-cadherin? The same is observed with Glypican-4 knockdowns. In both loss-of-function conditions, 50% of cell collisions lead to repulsion (compared to 70% amongst control cells). While significant, such a drop remains modest. The authors propose a cooperative role of Glypican-4 and N-cadherin at the cell surface as co-binding factors for Slit2/3, but they have not checked whether double knockdown of N-cad and Glypican-4 might have a stronger effect. Could Glypican and N-cadherin present Slit at the cell surface independently in a somewhat redundant manner? Can Glypican and Slit interact physically in the absence of N-cadherin? They also have not further analyzed the putative colocalization of N-cad and Glypican at the cell surface.

      The data supporting a role for N-cadherin in Slit's trafficking to the cell surface also seem circumstantial. While western blot data seem to indicate no change in Slit protein level after N-cadherin knockdown, immunostaining for Slit under the same condition shows a dramatic loss of Slit signal. These two independent data sets are difficult to reconcile and are not designed to address whether Slit reaches the cell surface in control or N-cadherin knockdown conditions.

      If Slit signaling is so critical for repulsion, why do 40% of collisions still lead to repulsion in the double Slit2/3 knockdown? Also, no analysis of cell collisions is provided upon Robo1/2 knockdown for comparison with the Slit knockdowns. Altogether, these relatively mild effects of N-cadherin, Slit or Glypican-4 knockdown on repulsion seem to indicate that other signals might contribute to contact inhibition and polarization/repulsion of cells upon physical contact, but this is unfortunately not discussed. All statements related to cell polarity stem from the overall cell morphology without being substantiated by actual polarity analysis (using markers such as detection of Rac-GTP or using a proxy such as the Golgi-nucleus axis). The authors present the cell clusters generated after Sox2 expression and after Sox2 expression plus exposure to recombinant Slit2 as lacking polarity; however, in one case cells do not present any flat membrane at their free edge whereas in the other case they do. This suggests a minimal cell polarity with a protrusive-like organization away from the contact. Finally, Robo1/2 siRNA knockdowns are used but, contrary to the other loss-of-function experiments, it seems that controls for knockdown efficiency/specificity were not provided.

      Therefore, while the study is overall well documented and based on solid data, some weaknesses exist.

      The overall topic is clearly of broad interest, as N-cadherin is a protein essential in various biological settings from development to disease, but the range of its biological functions remains to be fully explored. This study clearly adds to the current knowledge of how N-cadherin might act in vivo and, in particular, how it could mediate crosstalk between various signaling pathways.

    1. eLife assessment

      This work presents a potentially important behavioral finding: that perceptual learning may not only improve but also distort the appearance of visual stimuli. The strength of the presented evidence in support of the main claim is however incomplete, and requires further analyses to confirm that perceptual learning does increase overestimation bias, and clarify why a very large baseline overestimation bias is present in the data.

    2. Reviewer #1 (Public Review):

      In this manuscript the authors report an experiment to assess how training on a perceptual task may not only increase performance on that task but impact on the appearance of the trained stimuli. They compare discrimination performance, coherence thresholds, and estimation biases for random dot motion direction relative to horizontal rightward in three groups of observers before and after 3 days in which they either trained on a discrimination task, an estimation task, or did not train. The authors report significant increases in discrimination performance post training compared to not training. They also report increases in estimation biases when assessed as the average estimate (over a bimodal distribution that crosses 0) but not when assessed as the mode of the bimodal distribution. They conclude that training resulted in "increases in already-large estimation biases away from horizontal".

      The methods and results are strengthened by the combination of classical psychophysical techniques and sophisticated computational modelling. One weakness is the possibility of misleading summary statistics when dealing with bimodal distributions. Convincing evidence that observers perceived stimulus directions as further from horizontal (in the absolute sense) following training is not presented in the current manuscript. Irrespective, this work is likely to impact the field.

    3. Reviewer #2 (Public Review):

      It is well-known that repeated exposure to perceptual stimuli improves discrimination performance, but less is known about the effects on perceptual appearance. In the present work, the authors tackle this question and focus on one particular effect on perceptual appearance termed boundary avoidance, i.e. the tendency to perceive (or report) a stimulus as biased away from a discrimination boundary.

      In the study, participants performed either a motion discrimination task (clockwise or counterclockwise with respect to a reference axis) or an estimation task (reproducing the orientation of the motion stimulus). Participants were divided into three groups which either i) trained on the discrimination task, ii) trained on the estimation task or iii) received no training (control group). Performance in both tasks was assessed before and after training. The main behavioral finding is that training (which did not involve feedback) improved discrimination performance and increased estimation precision, but at the same time appeared to increase the boundary avoidance effect. Thus, the authors conclude that perceptual learning improved performance at the cost of appearance.

      To explain these effects, the authors created a computational model in which performance improvements were implemented as a gain increase of neurons sensitive to the trained motion directions. Repulsive biases away from the reference orientation were implemented by a combination of two modeling choices: i) Even during estimation, participants perform an implicit categorization such that they assume that their percept was created by a stimulus in line with their categorization (clockwise or counterclockwise). This effectively biases their response away from the boundary. ii) There is an abundance of neurons tuned to the horizontal reference axis (the "boundary") which likewise leads to a repulsive bias. Overall, the authors conclude that the model was able to explain the major behavioral effects, including the a priori presence of repulsive biases, the increase in performance, the increase in estimation precision and the increase of the repulsive bias.

      At first glance, it was a pleasure reading this paper due to a number of aspects the authors got quite right in my opinion:
      - A clear and well-explained research question.
      - The results are generally well-presented. Much effort and expertise was put into the Figures and many helpful auxiliary Figures are included as a Supplement.
      - The writing was concise and clear.

      However, as outlined below, I'm afraid that the main conclusion of the study and the main motivation for computational modeling are not backed up by the data.

      (1) No evidence for a change in overestimation
      Overestimation is (rightly) defined by the authors as a bias of the perceived orientations towards more extreme values (visualized also in Fig. 2F). However, as acknowledged by the authors, there is nearly no evidence for such an effect. The modal estimation response (correct trials) doesn't change significantly between the sessions. The mean, which is the primary measure used by the authors, is not an appropriate measure of overestimation, as it is severely biased by accuracy. It was unclear to me why it was chosen as the primary measure for nearly all figures and analyses, given that the authors were aware of (and reported) a more suitable measure.

      In my opinion, the mode of the correct responses would be the best way to quantify the overestimation bias. An alternative would be looking at the average absolute (unsigned) distance from the boundary, possibly including both correct and incorrect responses. However, such a "mean of absolute differences" approach would be affected by lucky guessing trials, which could manifest in a probability mass close to the boundary (and the proportion of which changes with overall accuracy). Therefore I see the mode as the strongest and least confounded measure.
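
      The concern about the mean can be illustrated with a toy simulation. This is only a sketch under assumed values, not the authors' analysis: the peak location (20°), spread (5°), trial count and the two accuracy levels (0.6 and 0.9) are hypothetical, and `simulate_estimates`, `histogram_mode` and `summarize` are illustrative helper names. The peak of the response distribution is held fixed and only the proportion of correct-side responses changes.

```python
import random
import statistics


def simulate_estimates(p_correct, peak=20.0, sd=5.0, n=20000, seed=0):
    """Draw signed direction estimates (deg) for a stimulus just above the boundary.

    With probability p_correct the estimate lands on the correct (positive) side,
    centred on +peak; otherwise it is centred on -peak. All values are hypothetical.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n):
        sign = 1.0 if rng.random() < p_correct else -1.0
        estimates.append(sign * rng.gauss(peak, sd))
    return estimates


def histogram_mode(values, bin_width=2.0):
    """Return the centre of the most populated bin (a crude mode estimate)."""
    counts = {}
    for v in values:
        b = int(v // bin_width)
        counts[b] = counts.get(b, 0) + 1
    best = max(counts, key=counts.get)
    return (best + 0.5) * bin_width


def summarize(p_correct):
    """Compare the signed mean (all trials) with the mode of correct-side trials."""
    est = simulate_estimates(p_correct)
    correct = [e for e in est if e > 0]
    return {
        "signed_mean": statistics.fmean(est),
        "mode_correct": histogram_mode(correct),
    }


pre = summarize(0.6)   # hypothetical pre-training accuracy
post = summarize(0.9)  # hypothetical post-training accuracy
print("pre :", pre)
print("post:", post)
```

      In this sketch the signed mean rises with accuracy alone, because fewer wrong-side responses cancel the correct-side ones, while the mode of the correct responses stays near the same value. That is the sense in which the mean is confounded with accuracy.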

      (2) Nature of the biases
      Although, as outlined in 1), there might actually be no evidence for a *change* in overestimation bias, there clearly was a baseline overestimation bias. However, the reported biases appear extremely large. For instance, for the 2° orientation the modal estimation is close to 20°. To me this raises the question whether we're really dealing with a pure perceptual effect (18° misperception seems quite suboptimal) or whether there are some other psychological effects at work that could be rather classified as a response bias.

      In particular, I wondered whether the baseline bias is partly explained by participants "wanting to make sure" they indicate the correct category in estimation and therefore bias their estimation response away from the ambiguous proximity of the cardinal axes. Does it require more effort to set the estimation orientation close to a cardinal axis while still making sure that it has the correct categorical orientation? I guess there was no horizontal reference line on the screen which would help with this?

      The overall discrimination-focused task design might have contributed to this bias. First, because the participants trained on estimation also performed a discrimination task (pre/post) which very likely could have affected their response style. Second, the presented orientations during estimation were likewise 50:50 around the horizontal reference which could shift the focus towards "getting the sign right" rather than "getting the precise orientation right".

      (3) The mechanism of the model
      As a disclaimer a priori, I am not very familiar with this particular modeling literature (but this may be the case for other readers as well). For this reason I could have used a bit more guidance about how the model works. My understanding is that there are three key mechanisms: 1) Gain modulation which explains the improvement in discrimination; 2) Warping which partly explains boundary avoidance; 3) Implicit categorization which likewise partly explains boundary avoidance. In addition, there are two levels of analysis: 1) the pre-training state (a priori presence of a repulsive bias) and 2) learning effects (bias and performance increase through training). If the models were to be kept as part of a revised manuscript, my suggestion would be to structure the corresponding section in the Results ("Observer Model") a bit more along these anchors. I suggest also providing a bit more explanation already at this point. For instance, I consider the fact that implicit categorization effectively works through Bayes' rule by assuming a uniform(?) prior over either the negative or positive orientation axis, as very relevant. I assume that other priors would have been conceivable for conditioning on the response, e.g. taking into account the actual (objective or subjective) distribution of orientations for the particular choice category, so this is a non-trivial modeling choice.

      Intuitively, I would have also thought that if more resources are devoted to the cardinal directions (and the decoder is unaware of this), this would lead to a bias *towards* the cardinal directions. If more neurons fire particularly strongly to near-cardinal orientations (such as the ±4° in training), why would the decoder be repulsed *away* from the cardinal orientation? I trust the authors that the presentation is correct, but to me, this was not obvious and I would have wished for some guidance.
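
      To make the implicit-categorization mechanism concrete, here is a minimal sketch of how conditioning on the chosen category can push an estimate away from the boundary. It is not the authors' observer model. It assumes a Gaussian likelihood and a uniform prior over the half-axis matching the sign of the measurement, which is the reading queried above; `conditioned_estimate` and the noise level of 8° are hypothetical.

```python
import math


def conditioned_estimate(m, sigma, upper=90.0, step=0.1):
    """Posterior-mean direction (deg) given a noisy measurement m.

    Assumptions (illustrative only): Gaussian likelihood with s.d. sigma, and a
    uniform prior restricted to the half-axis [0, upper] (or [-upper, 0]) that
    matches the sign of m, i.e. the implicitly chosen category.
    """
    sign = 1.0 if m >= 0 else -1.0
    num = 0.0
    den = 0.0
    n_steps = int(upper / step)
    for i in range(n_steps + 1):
        theta = sign * i * step
        # Unnormalised posterior weight on this candidate direction
        w = math.exp(-0.5 * ((m - theta) / sigma) ** 2)
        num += w * theta
        den += w
    return num / den


# A measurement close to the boundary is pushed away from it ...
near = conditioned_estimate(2.0, 8.0)
# ... while a measurement far from the boundary is left almost unchanged.
far = conditioned_estimate(40.0, 8.0)
print(near, far)
```

      Under these assumptions the repulsion is largest near the boundary and vanishes far from it. A different prior over the chosen category, for example the actual distribution of stimulus directions, would change the size of the effect, which is why this modeling choice deserves explicit discussion.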

    1. Reviewer #1 (Public Review):

      In this manuscript by Buchanan and colleagues, the authors set out to determine if mutations associated with resistance to the Plasmodium apicoplast inhibitor azithromycin (AZ) had a measurable impact on the fitness of Plasmodium berghei and P. falciparum parasites as they traverse both the mosquito host and vertebrate liver.

      The Plasmodium endosymbiotic organelles - the mitochondrion and apicoplast - are attractive drug targets as they (1) possess essential functions across the multi-host, multi-compartment life cycle of these parasites, and (2) are of bacterial origin and thus are vulnerable to inhibition both by extant antibiotics and by novel drugs with high parasite specificity.

      Historically, however, the high resistance propensity of drug targets encoded in the organellar genomes (most notably those of atovaquone and doxycycline) has precluded the use of these drugs in an endemic setting, limiting these potent compounds to use in prophylaxis for travelers from non-endemic countries. Several studies in the last decade now fairly definitively show that parasites carrying mutations in the mitochondrial gene cytochrome b that confer resistance to atovaquone are, in a mutation-dependent manner, totally or near-totally compromised in their ability to infect, grow in, and escape the mosquito host, leading to a reexamination of the potential utility of this extraordinarily potent drug in endemic settings. Symmetries exist between the Plasmodium mitochondrion and apicoplast, which both appear to have highly expanded roles in the mosquito and liver relative to the blood stages. Thus, the authors set out to explore whether mutations in essential apicoplast genes were, in a similar manner to mutations in cytochrome b, associated with fitness effects in the mosquito and/or liver.

      Towards this, the authors selected for several AZ-resistant parasite populations, all of which acquired mutations in the apicoplast genome-encoded ribosomal protein Rpl4. Interestingly, the authors observed contrasting fitness effects caused by these mutations, both between mutants within Plasmodium species, and between species. In P. berghei, AZ mutants were compromised in their ability to form oocysts and sporozoites, and a large proportion of sporozoites lacked an intact apicoplast and displayed aberrant gliding behaviour. Similarly, in the liver, Rpl4 mutant P. berghei liver schizonts were smaller, had fewer nuclei, and appeared extremely limited in their ability to cause a patent infection - crucially in particular via mosquito bites. Surprisingly, a P. falciparum Rpl4 mutant (notably in a different position of the protein) had no impact on sporogony but appeared to have a strong impact on liver schizont development in a liver-humanized mouse model, suggesting that establishment of blood stage infection in a subsequent human host would be less likely for mutant parasites.

      This is a well-executed study that presents novel and noteworthy findings. The impact of drug-resistance-conferring mutations in Plasmodium outside of the blood stage is woefully understudied, primarily due to significant challenges associated with studying Plasmodium, especially P. falciparum, in both the mosquito and liver, which the authors navigate commendably. The results presented in this manuscript leverage state-of-the-art techniques and clearly support the authors' conclusion that mutations conferring AZ resistance have a strong negative effect on the ability of Plasmodium parasites to both reinfect and cause symptomatic infection in a subsequent vertebrate host. This could indicate that apicoplast-targeted inhibitors are more attractive as co-drugs for malaria treatment than previously thought, due to the reduced probability of the spread of resistance, which has been a perennial issue in malaria therapeutic care.

    2. eLife assessment

      This important work addressed the identification of antimalarial drug resistance mutations that do not readily transmit to new human hosts, focusing on azithromycin resistance. The technically challenging analyses of azithromycin-resistant parasites as they traverse the mosquito host and human liver are conducted using state-of-the-art tools, including humanized mice. While the claim regarding the lack of transmission by atovaquone-resistant P. berghei is convincing, the evidence for the lack of transmission by atovaquone-resistant P. falciparum is insufficient. This work will appeal to biologists and biomedical scientists in parasitology and drug discovery, offering insights into combating antimalarial drug resistance and understanding the fitness costs associated with drug-resistant parasites.

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript answers an important question about the transmission of Plasmodium parasites resistant to apicoplast inhibitors, specifically azithromycin. This study builds on previous work showing that parasites resistant to the mitochondrial inhibitor atovaquone cannot be transmitted, owing to fitness defects in transmission stages in the mosquito. The transmissibility of drug-resistant parasites is grounded in the basic biology of the Plasmodium lifecycle and has implications for the selection of drug regimens for clinical treatment, so these questions are highly significant. The authors clearly demonstrate severe defects in mosquito stages of azithromycin-resistant (AZR) P. berghei (rodent species) that inhibit transmission of AZR parasites. Surprisingly, however, AZR P. falciparum (human species) is unaffected in mosquito stages; rather, defects are observed in liver-stage development, suggesting AZR P. falciparum can transmit but may not mount a productive blood infection. The differences in the observed defects in the two species are important and well demonstrated in the results but are obscured in the title/summary of the manuscript. The results demonstrate that AZR parasites are unlikely to spread.

      Strengths:

      The authors performed experiments with both P. berghei (rodent species) and P. falciparum (human species). P. falciparum is the more relevant species from a clinical standpoint; however, there are limitations to studying the full lifecycle of P. falciparum, which only infects humans and some primates, for example requiring humanized mice without intact immune systems. P. berghei is commonly used in lifecycle analyses as a proxy because of its experimental tractability; however, there are cases where the biology of P. berghei does not reflect that of P. falciparum. So the use of both species is complementary and most informative. Specific modification of the apicoplast genome, where AZR mutations are located, is not currently possible, so matched genotypes could not be produced, but multiple AZR mutants were analyzed for each species. Acknowledging these limitations in the experimental systems available, the authors perform a thorough set of experiments to pinpoint the specific defects in AZR Pb vs Pf during mosquito and liver developmental stages. The results show phenotypic differences between AZR Pb and Pf in mosquito stages, which were not expected but are in line with differences in apicoplast biology of Pb vs Pf that are important to document and be aware of when using P. berghei as a model for P. falciparum development.

      Weaknesses:

      The claim, as stated in the title and abstract, that human AZR malaria parasites (P. falciparum) are not readily transmitted to mosquitoes is incorrect. Strictly speaking, transmission refers to the infection of one human host by another via mosquitoes. The evidence that AZR Pf is unaffected in mosquito development indicates that transmission is not reduced compared to WT Pf. Rather, transmitted AZR Pf has disrupted liver stage development and may not mount a productive blood infection. This distinction between the phenotypes of AZR Pb vs Pf is surprising, significant (suggesting differences between Pb and Pf and/or specific mutations in Rpl4), and should be more accurately represented in the title/abstract. To their credit, the authors performed thorough experiments to pinpoint the specific defects in AZR Pb vs Pf, but the current claim about AZR Pf is misleading.

      Additional context:

      Clinical trials (MORDOR studies) in several African countries have shown that biannual administration of azithromycin reduces childhood mortality (PMID: 31167050). The mechanism of this survival benefit is unknown and may be multifactorial. The findings in this manuscript can also be considered in the context that azithromycin is a commonly used antibiotic and may be administered for purposes other than malaria treatment. In theory, AZR parasites could be selected in asymptomatic patients not receiving malaria treatment but receiving azithromycin. This study indicates that mass distribution of azithromycin for other clinical applications would not result in adverse effects on malaria transmission in the same population.

    4. Reviewer #3 (Public Review):

      Summary:

      Drug resistance is a perennial problem for malaria control and strategies to prevent the acquisition and spread of drug resistance mutations are desperately needed. One strategy is to identify drug resistance mutations that arise in blood-stage parasites, but cannot be readily spread to a new human host. Since malaria parasites must survive and replicate in mosquitoes in order to be transmitted, mutations with elevated mosquito-stage fitness defects will not spread efficiently. Buchanan and coworkers focus on the drug azithromycin and its known role of inhibiting the ribosomes found in parasite apicoplast organelles. Apicoplast organelles are known to have elevated metabolic activity in mosquito stage parasites and azithromycin resistance mutations could interfere with mosquito stage parasite development and parasite transmission.

      To address this hypothesis, azithromycin-resistant P. berghei and P. falciparum parasites were generated and analyzed for transmission defects. All lines had mutations in the apicoplast ribosomal protein Rpl4, consistent with the known role of azithromycin in inhibiting the 50S ribosomal subunit. Overall, the four lines (three P. berghei and one P. falciparum) had phenotypes that should limit parasite transmission; however, detailed characterization showed that there were surprising differences between the two parasite species and even between the P. berghei lines. The P. berghei lines produced fewer oocysts and sporozoites with aberrant apicoplast morphology compared to wild-type controls. Sporozoites from azithromycin-resistant lines appeared to have motility defects and typically were not able to infect mice (one strain produced infections when 10,000 sporozoites were injected, but not when 1,000 were).

      By contrast, the azithromycin-resistant P. falciparum strain did not display any mosquito-stage phenotypes and produced motile sporozoites with intact apicoplast organelles. These sporozoites, however, developed abnormally in a humanized mouse model with reduced liver-stage nuclear division and abnormal apicoplast morphology. These defects combined with a five-fold lower prevalence suggest that azithromycin-resistant P. falciparum parasites experience significant fitness costs during liver stage development (at least those harboring the G76V mutation).

      Strengths:

      This work was carefully conducted and transparently presented. It provides a comprehensive view of how parasite development is impacted by azithromycin resistance mutations during the mosquito and liver stages in P. berghei and P. falciparum. It adds a new dimension to the growing literature on the transmissibility of drug-resistant parasites, by showing that mutations in the apicoplast genome can impact transmission.

      Weaknesses:

      Whether these liver-stage defects in P. falciparum are severe enough to completely block subsequent blood-stage infection remains to be seen and would require experiments with humanized mice continuously grafted with human red blood cells - a difficult and expensive model system.

    1. Author Response

      eLife assessment

      The authors present evidence that small extracellular vesicles can be secreted from cells inside larger vesicles that they call amphiectosomes, which then tear to release their small vesicle contents. There are questions and concerns relating to the quality of the data and the in vivo significance of the observations. The findings are potentially important but the data are incomplete and the claims are only partially supported.

      We agree that the in vivo significance and details of the molecular background of amphiectosome release remain to be studied further. However, as Reviewer 2 indicated, our data in this Short Report may have a substantial impact on our understanding of EV biogenesis. We therefore considered it important to publish our data as soon as possible, because they may significantly impact other EV biogenesis studies.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors' research group had previously demonstrated the release of large multivesicular body-like structures by human colorectal cancer cells. This manuscript expands on their findings, revealing that this phenomenon is not exclusive to colorectal cancer cells but is also observed in various other cell types, including different cultured cell lines, as well as cells in the mouse kidney and liver. Furthermore, the authors argue that these large multivesicular body-like structures originate from intracellular amphisomes, which they term "amphiectosomes." These amphiectosomes release their intraluminal vesicles (ILVs) through a "torn-bag mechanism." Finally, the authors demonstrate that the ILVs of amphiectosomes are either LC3B positive or CD63 positive. This distinction implies that the ILVs either originate from amphisomes or multivesicular bodies, respectively.

      Strengths:

      The manuscript reports a potential origin of extracellular vesicle (EV) biogenesis. The reported observations are intriguing.

      Weaknesses:

      It is essential to note that the manuscript has issues with experimental designs and lacks consistency in the presented data. Here is a list of the major concerns:

      (1) The authors culture the cells in the presence of fetal bovine serum (FBS) in the culture medium. Given that FBS contains a substantial amount of EVs, this raises a significant issue, as it becomes challenging to differentiate between EVs derived from FBS and those released by the cells. This concern extends to all transmission electron microscopy (TEM) images (Figure 1, 2P-S, S5, Figure 4 P-U) and the quantification of EV numbers in Figure 3. The authors need to use an FBS-free cell culture medium.

      (1) Although FBS indeed contains bovine EVs, the presence of the very large multivesicular EVs (amphiectosomes) that our manuscript focuses on has never been observed or reported in FBS. For reported size distributions of EVs in FBS, please find a few relevant references below:

      PMID: 29410778, PMID: 33532042, PMID: 30940830 and PMID: 37298194

      All the above publications show that the number of lEVs > 350-500 nm is negligible in FBS. The average diameter of MV-lEVs (amphiectosomes) described in our manuscript is around 1.00-1.50 micrometre.

      (1) When we demonstrated the TEM of isolated EVs, we consistently used serum-free conditioned medium (Fig2 P-S, Fig2S5 J, O) as described previously (Németh et al 2021, PMID: 34665280).

      (2) Our TEM images show cells captured in the process of budding and scission of large multivesicular EVs excluding the possibility that these structures could have originated from FBS.

      (3) In addition, in our confocal analysis, we studied Palm-GFP positive, cell-line derived MV-lEVs. Importantly, in these experiments, FBS-derived EVs are non-fluorescent, therefore, the distinction between GFP positive MV-lEVs and FBS-derived EVs was evident.

      (4) In addition, culturing cells in FBS-free medium (serum starvation) significantly affects autophagy. Given that in our study, we focused on autophagy related amphiectosome secretion, we intentionally chose to use FBS supplemented medium.

      (5) Even though the authors of this manuscript are not familiar with the technological details of how FBS is processed before commercialization, it is reasonable to assume that the samples are subjected to sterile filtration (through a 0.22 micron filter), after which MV-lEVs cannot be present in the commercial FBS samples.

      (2) The data presented in Figure 2 are not convincingly supportive of the authors' conclusion. The authors argue that "...CD81 was present in the plasma membrane-derived limiting membrane (Figures 2B, D, F), while CD63 was only found inside the MV-lEVs (Fig. 2A, C, E)." However, in Figure 2G, there is an observable CD63 signal in the limiting membrane (overlapping with the green signals), and in Figure 2J, CD81 also exhibits overlap with MV-lEVs.

      Both CD63 and CD81 are tetraspanins known to be present both in the membrane of sEVs and in the plasma membrane of cells (for references, please see the UniProt subcellular location maps: https://www.uniprot.org/uniprotkb/P08962/entry#subcellular_location and https://www.uniprot.org/uniprotkb/P60033/entry#subcellular_location). However, following the reviewer's feedback, we will delete the sentence in question from the text for clarity.

      (3) Following up on the previous concern, the authors argue that CD81 and CD63 are exclusively located on the limiting membrane and MV-lEVs, respectively (Figure 2-A-M). However, in lines 104-106, the authors conclude that "The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs..." This statement indicates that CD63 and CD81 co-localize to the MV-lEVs. The authors need to address this apparent discrepancy and provide an explanation.

      There must be a misunderstanding because we did not claim or imply in the text that “CD81 and CD63 are exclusively located on the limiting membrane and MV-lEVs”. Here we studied co-localization of the above proteins in the case of intraluminal vesicles (ILVs). In Fig. 2, we did not show any analysis of limiting membrane co-localization.

      (4) The specificity of the antibodies used in Figure 2 should be validated through knockout or knockdown experiments. Several of the antibodies used in this figure detect multiple bands on western blots, raising doubts about their specificity. Verification through additional experimental approaches is essential to ensure the reliability and accuracy of all the immunostaining data in this manuscript.

      We will consider this suggestion during the revision of the manuscript.

      (5) In Figures 2P-R, the morphology of the MV-IEVs does not resemble those shown in Figures 1-A, H, and D, indicating a notable inconsistency in the data.

      EM images in Figure 2P-R show sEVs separated from serum-free conditioned media as opposed to MV-lEVs, which were captured in situ in fixed tissue cultures (Fig. 1). Therefore, the two EV populations necessarily have different sizes and structures. Furthermore, Fig. 1 shows images of ultrathin sections, while in Figure 2P-R we used negative-positive contrasting of intact sEVs without embedding and sectioning.

      (6) There are no loading controls provided for any of the western blot data.

      Not even the latest MISEV 2023 guidelines give recommendations for a proper loading control for separated EVs in Western blots (MISEV 2023, DOI: 10.1002/jev2.12404, PMID: 38326288). Here we applied our previously developed method (PMID: 37103858), which, in our opinion, is the most reliable approach for sEV Western blotting. For whole cell lysates, we used actin as a loading control (Fig3_S2B).

      Additionally, for Figures 2-S4B, the authors should run the samples from lanes i-iii in a single gel.

      Please note that in Figure 2- S4B, we did run a single gel, and the blot was cut into 4 pieces, which were tested by anti-GFP, anti-RFP, anti-LC3A and anti-LC3B antibodies. Full Western blots are shown in Fig.3_S2 B, and lanes “1”, “2” and “3” correspond to “i”, “ii” and “iii” in Fig.2_S4, respectively.

      (7) In Figure 2-S4, is there co-localization observed between LC3RFP (LC3A?) and other MV-lEV markers? How about LC3B? Does LC3B co-localize with other MV-lEV markers?

      In Supplementary Figure 2-S4, we showed the successful generation of the HEK293T-PalmGFP-LC3RFP cell line. In this case, we tested the cells and not the released MV-lEVs. LC3A co-localized with the RFP signal, as expected.

      (8) The TEM images presented in Figure 2-S5, specifically F, G, H, and I, do not closely resemble the images in Figure 2-S5 K, L, M, N, and O. Despite this dissimilarity, the authors argue that these images depict the same structures. The authors should provide an explanation for this observed discrepancy to ensure clarity and consistency in the interpretation of the presented data.

      As indicated in Materials and Methods, Fig 2_S5 F, G, H and I are conventional TEM images of samples fixed with 4% glutaraldehyde and 1% OsO4 (2 h) and embedded in Epon resin, with post-contrasting by 3.75% uranyl acetate (10 min) and lead citrate (12 min). Samples processed this way have very high structure preservation and better image quality; however, they are not suitable for immune detection. In contrast, Fig.2._S5 K,L,M,N show immunogold labelling of in situ fixed samples. In this case we used milder fixation (4% PFA, 0.1% glutaraldehyde, postfixed by 0.5% OsO4 30 min) and LR-White hydrophilic resin embedding. This special resin enables immunogold TEM analysis. The sections were exposed to H2O2 and NaBH4 to render the epitopes accessible in the resin. Because different techniques were applied, the preservation of the structure is not the same. In the case of Fig.2 J, O, separated sEVs were visualised by negative-positive contrast and immunogold labelling as described previously (PMID: 37103858).

      (9) For Figures 3C and 3-S1, the authors should include the images used for EV quantification. Considering the concern regarding potential contamination introduced by FBS (concern 1), it is advisable for the authors to employ an independent method to identify EVs, thereby confirming the reliability of the data presented in these figures.

      In our revised manuscript, we will provide all the images used for EV quantification in Figure 3C. Given that Figures 3C and 3-S1 show MV-lEVs released by HEK293T-PalmGFP cells, the possible interference by FBS-derived non-fluorescent EVs can be excluded.

      (10) Do the amphiectosomes released from other cell types as well as cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

      Based on our confocal microscopic analysis, in addition to the HEK293T-PalmGFP cells, HT29 and HepG2 cells also release similar LC3B- and CD63-positive MV-lEVs. Preliminary evidence shows MV-lEV secretion by additional cell types.

    2. eLife assessment

      The authors present evidence that small extracellular vesicles can be secreted from cells inside larger vesicles that they call amphiectosomes, which then tear to release their small vesicle contents. There are questions and concerns relating to the quality of the data and the in vivo significance of the observations. The findings are potentially important but the data are incomplete and the claims are only partially supported.

    3. Reviewer #1 (Public Review):

      Summary:

      The authors' research group had previously demonstrated the release of large multivesicular body-like structures by human colorectal cancer cells. This manuscript expands on their findings, revealing that this phenomenon is not exclusive to colorectal cancer cells but is also observed in various other cell types, including different cultured cell lines, as well as cells in the mouse kidney and liver. Furthermore, the authors argue that these large multivesicular body-like structures originate from intracellular amphisomes, which they term "amphiectosomes." These amphiectosomes release their intraluminal vesicles (ILVs) through a "torn-bag mechanism." Finally, the authors demonstrate that the ILVs of amphiectosomes are either LC3B positive or CD63 positive. This distinction implies that the ILVs either originate from amphisomes or multivesicular bodies, respectively.

      Strengths:

      The manuscript reports a potential origin of extracellular vesicle (EV) biogenesis. The reported observations are intriguing.

      Weaknesses:

      It is essential to note that the manuscript has issues with experimental designs and lacks consistency in the presented data. Here is a list of the major concerns:

      (1) The authors culture the cells in the presence of fetal bovine serum (FBS) in the culture medium. Given that FBS contains a substantial amount of EVs, this raises a significant issue, as it becomes challenging to differentiate between EVs derived from FBS and those released by the cells. This concern extends to all transmission electron microscopy (TEM) images (Figure 1, 2P-S, S5, Figure 4 P-U) and the quantification of EV numbers in Figure 3. The authors need to use an FBS-free cell culture medium.

      (2) The data presented in Figure 2 is not convincingly supportive of the authors' conclusion. The authors argue that "...CD81 was present in the plasma membrane-derived limiting membrane (Figures 2B, D, F), while CD63 was only found inside the MV-lEVs (Fig. 2A, C, E)." However, in Figure 2G, there is an observable CD63 signal in the limiting membrane (overlapping with the green signals), and in Figure 2J, CD81 also exhibits overlap with MV-lEVs.

      (3) Following up on the previous concern, the authors argue that CD81 and CD63 are exclusively located on the limiting membrane and MV-lEVs, respectively (Figure 2-A-M). However, in lines 104-106, the authors conclude that "The simultaneous presence of CD63, CD81, TSG101, ALIX, and the autophagosome marker LC3B within the MV-lEVs..." This statement indicates that CD63 and CD81 co-localize to the MV-lEVs. The authors need to address this apparent discrepancy and provide an explanation.

      (4) The specificity of the antibodies used in Figure 2 should be validated through knockout or knockdown experiments. Several of the antibodies used in this figure detect multiple bands on western blots, raising doubts about their specificity. Verification through additional experimental approaches is essential to ensure the reliability and accuracy of all the immunostaining data in this manuscript.

      (5) In Figures 2P-R, the morphology of the MV-lEVs does not resemble those shown in Figures 1-A, H, and D, indicating a notable inconsistency in the data.

      (6) There are no loading controls provided for any of the western blot data. Additionally, for Figures 2-S4B, the authors should run the samples from lanes i-iii in a single gel.

      (7) In Figure 2-S4, is there co-localization observed between LC3RFP (LC3A?) and other MV-lEV markers? How about LC3B? Does LC3B co-localize with other MV-lEV markers?

      (8) The TEM images presented in Figure 2-S5, specifically F, G, H, and I, do not closely resemble the images in Figure 2-S5 K, L, M, N, and O. Despite this dissimilarity, the authors argue that these images depict the same structures. The authors should provide an explanation for this observed discrepancy to ensure clarity and consistency in the interpretation of the presented data.

      (9) For Figures 3C and 3-S1, the authors should include the images used for EV quantification. Considering the concern regarding potential contamination introduced by FBS (concern 1), it is advisable for the authors to employ an independent method to identify EVs, thereby confirming the reliability of the data presented in these figures.

      (10) Do the amphiectosomes released from other cell types as well as cells in mouse kidneys or liver contain LC3B positive and CD63 positive ILVs?

    4. Reviewer #2 (Public Review):

      Summary:

      The authors had previously identified that a colorectal cancer cell line generates small extracellular vesicles (sEVs) via a mechanism where a larger intracellular compartment containing these sEVs is secreted from the surface of the cell and then tears to release its contents. Previous studies have suggested that intraluminal vesicles (ILVs) inside endosomal multivesicular bodies and amphisomes can be secreted by the fusion of the compartment with the plasma membrane. The 'torn bag mechanism' considered in this manuscript is distinctly different because it involves initial budding off of a plasma membrane-enclosed compartment (called the amphiectosome in this manuscript, or MV-lEV). The authors successfully set out to investigate whether this mechanism is common to many cell types and to determine some of the subcellular processes involved.

      The strengths of the study are:

      (1) The high-quality imaging approaches used seem to show good examples of the proposed mechanism.

      (2) They screen several cell lines for these structures, also search for similar structures in vivo, and show the tearing process by real-time imaging.

      (3) Regarding the intracellular mechanisms of ILV production, the authors also try to demonstrate the different stages of amphiectosome production and differently labelled ILVs using immuno-EM.

      Several of these techniques are technically challenging to do well, and so these are critical strengths of the manuscript.

      The weaknesses are:

      (1) Most of the analysis is undertaken with cell lines. In fact, all of the analysis involving the assessment of specific proteins associated with amphiectosomes and ILVs is performed in vitro, so it is unclear whether these processes are really mirrored in vivo. The images shown in vivo only demonstrate putative amphiectosomes in the circulation, which is perhaps surprising if they normally have a short half-life and would need to pass through an endothelium to reach the vessel lumen unless they were secreted by the endothelial cells themselves.

      (2) The analysis of the intracellular formation of compartments involved in the secretion process (Figure 2_S5) relies on immuno-EM, which is generally less convincing than high-/super-resolution fluorescence microscopy because the immuno-labelling is inevitably very sporadic and patchy. High-quality EM is challenging for many labs (and seems to be done very well here), but high-/super-resolution fluorescence microscopy techniques are more commonly employed, and the study already shows that these techniques should be applicable to studying the intracellular trafficking processes.

      (3) One aspect of the mechanism, which needs some consideration, is what happens to the amphisome membrane, once it has budded off inside the amphiectosome. In the fluorescence images, it seems to be disrupted, but presumably, this must happen after separation from the cell to avoid the release of ILVs inside the cell. There is an additional part of Figure 1 (Figure 1Y onwards), which does not seem to be discussed in the text (and should be), that alludes to amphiectosomes often having a double membrane.

      (4) The real-time analysis of the amphiectosome tearing mechanism seemed relatively slow to me (over three minutes), and if this has been observed multiple times, it would be helpful to know if this is typical or whether there is considerable variation.

      Overall, I think the authors have been successful in identifying amphiectosomes secreted from multiple cell lines and demonstrating that the ILVs inside them have at least two origins (autophagosome membrane and late endosomal multivesicular body) based on the markers that they carry. The analysis of intracellular compartments producing these structures is rather less convincing and it remains unclear what cells release these structures in vivo.

      I think there could be a significant impact on the EV field and consequently on our understanding of cell-cell signalling based on these findings. It will flag the importance of investigating the release of amphiectosomes in other studies, and although the authors do not discuss it, the molecular mechanisms involved in this type of 'ectosomal-style' release will be different from multivesicular compartment fusion to the plasma membrane, and it should be possible to manipulate them independently. Any experiments that demonstrate this would greatly strengthen the manuscript.

      In general, the EV field has struggled to link up analysis of the subcellular biology of sEV secretion and the biochemical/physical analysis of the sEVs themselves, so from that perspective, the manuscript provides a novel angle on this problem.

    5. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors describe a novel mode of release of small extracellular vesicles. These small EVs are released via the rupture of the membrane of so-called amphiectosomes that resemble "morphologically" Multivesicular Bodies.

      These structures were initially described by the authors as released by colorectal cancer cells (https://doi.org/10.1080/20013078.2019.1596668). In this manuscript, they provide experiments that allow us to generalize this process to other cells. In brief, amphiectosomes are likely released by ectocytosis of amphisomes that are formed by the fusion of multivesicular endosomes with autophagosomes. The authors' model puts forward the hypothesis that LC3-positive vesicles are formed by "curling" of the autophagosomal membrane, which then gives rise to an organelle where both CD63- and LC3-positive small EVs co-exist and would then be released by a budding mechanism at the cell surface that appears similar to the budding of microvesicles/ectosomes. Very correctly, the authors make the distinction from migrasomes because these structures appear very similar in morphology.

      Strengths:

      The findings are interesting, although it is unclear what the functional relevance of such a process would be and even how it could be induced. It points to a novel mode of release of extracellular vesicles.

      Weaknesses:

      This reviewer has comments and concerns regarding the interpretation of the data and the proposed model. In addition, in my opinion, some of the results, in particular micrographs and immunoblots (even shown as supplementary data), are not of sufficient quality to support the conclusions.

    1. eLife assessment

      The authors present a critique of current usage of principal component analysis in geometric morphometrics, making a convincing case with benchmark data that standard techniques perform poorly. The work is an important contribution to the field and will hopefully lead to a reassessment of the methodology most scientists in morphometrics currently use. The authors also present a new Python package that uses machine learning to provide better-supported results than principal component analysis. While the package is in the title of the manuscript, it is not its main focus.

    2. Reviewer #1 (Public Review):

      Mohseni and Elhaik's article offers a critical evaluation of Geometric Morphometrics (GM), a common tool in physical anthropology for studying morphological differences and making phylogenetic inferences. Although I am not a geneticist or an expert on PCA theory, I read their article with great interest, since the problem of morphology-based classification is at the core of paleoanthropology.

      The authors developed a Python package for processing superimposed landmark data with classifier and outlier detection methods, to evaluate the adequacy of the standard approach to shape analysis via modern GM. They call into question the accuracy, robustness, and reproducibility of GM, and demonstrate how PCA introduces statistical artefacts specific to the data, thus challenging its scientific rigor. The authors demonstrate the superiority of machine learning methods in classification and outlier detection tasks. The paper is well-written and provides strong evidence in support of the authors' argument. Thus, in my opinion, it constitutes a major contribution to the field of physical anthropology, as it provides a critical and necessary evaluation of what has become a basic tool for studying morphology, and of the assumptions allowing its application for phylogenetic inferences. Again, I am not an expert in these statistical methods, nor a geneticist, but the authors' contribution is of substantial relevance to our field (physical anthropology). The examples of NR fossils and HLD 6 are cases in point, in line with other notable examples of critical assessment of phylogenetic inferences made on the basis of PCA results of GM analysis. For example, see Lordkipanidze et al.'s (2013) GM analyses of the Dmanisi fossils, suggesting that the five crania represent a single regional variant of Homo erectus; and see Schwartz et al.'s (2014) comment on their findings, claiming that the dental, mandibular, and cranial morphology of these fossils suggest taxic diversity. Schwartz et al. (2014) ask, "Why did the GMA of 78 landmarks not capture the visually obvious differences between the Dmanisi crania and specimens commonly subsumed H. erectus? ... one wonders how phylogenetically reliable a method can be that does not reflect even easily visible gross morphological differences" (p. 360).

      As an alternative to the PCA step in GM, the authors tested eight leading supervised learning classifiers and outlier detection methods on three-dimensional datasets. The authors demonstrated inconsistency of PCA clustering with the taxonomy of the species investigated for the reconstruction of their phylogeny, by analyzing a database comprising landmarks of 6 known species that belong to the Old World monkeys tribe Papionini, using PCA for classification. The authors also demonstrated that high explained variance should not be used as an estimate of high accuracy (reliability). Then, the authors altered the dataset in several ways to simulate the characteristic nature of paleontological data.

      The authors excluded taxa from the database to study how PCA and alternative classifiers are affected by partial sampling, and the results presented in Figures 4 and 5, among others, are quite remarkable in showing the deviations from the benchmark data. These results expose the perils of applying PCA and GM for interpreting morphological data. Furthermore, they provide evidence showing that the alternative classifiers are superior to PCA, and that they are less susceptible to experimenter intervention. Similar results, i.e., inconsistencies in the PC plots, were obtained in examinations of the effect of removing specimens from the dataset and in the interesting test of removing landmarks to simulate partial morphological data, as is often the case with fossils. To test the combined effect of these data alterations, the authors combined removal of taxa, specific samples, and landmarks from the dataset. In this case, as well, the PCA results indicate deviation from the benchmark data. However, the ML classifiers could not remedy the situation. The authors discuss how these inconsistencies may lead to different interpretations of the data, and in turn, different phylogenetic conclusions. Lastly, the authors simulated the situation of a specimen of unknown taxonomy using outlier detection methods, demonstrating LOF's ability to identify a novelty in the morphospace.
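
      To make the outlier-detection step concrete, the following is a minimal sketch of the Local Outlier Factor (LOF) idea on toy 2D points. It is plain Python, does not use the authors' package, and is not their implementation; the function name `local_outlier_factor`, the toy coordinates, and the choice `k=3` are illustrative assumptions. It also simplifies standard LOF by using exactly `k` neighbours (distance ties are ignored) and assumes there are no duplicate points.

```python
import math


def local_outlier_factor(points, k=3):
    """Return one LOF score per point (simplified: exactly k neighbours)."""
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours = []  # indices of each point's k nearest neighbours
    k_distance = []  # distance from each point to its k-th nearest neighbour
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_distance.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Hypothetical toy data: five tightly clustered points and one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05), (2.0, 2.0)]
scores = local_outlier_factor(points, k=3)
print("most outlying point:", points[scores.index(max(scores))])
```

      A score near 1 means a point is about as densely surrounded as its neighbours, whereas a much larger score flags a candidate novelty, which is the role LOF plays for a specimen of unknown taxonomy.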

      References
      Bookstein FL. 1991. Morphometric tools for landmark data: geometry and biology [Orange book]. Cambridge New York: Cambridge University Press.
      Cooke SB, and Terhune CE. 2015. Form, function, and geometric morphometrics. The Anatomical Record 298:5-28.
      Lordkipanidze D, et al. 2013. A complete skull from Dmanisi, Georgia, and the evolutionary biology of early Homo. Science 342: 326-331.
      Schwartz JH, Tattersall I, and Chi Z. 2014. Comment on "A complete skull from Dmanisi, Georgia, and the evolutionary biology of Early Homo". Science 344(6182): 360-a.

    3. Reviewer #2 (Public Review):

      I completely agree with the basic thrust of this study. Yes, of course, machine learning is FAR better than any variant of PCA for the paleosciences. I agree with the authors' critique early on that this point is not new per se - it is familiar to most of the founders of the field of GMM, including this reviewer. A crucial aspect is the dependence of ALL of GMM, PCA or otherwise, on the completely unexamined, unformalized praxis by which a landmark configuration is designed in the first place. I must admit that I am stunned by the authors' estimate of over 32K papers that have used PCA with GMM.

      But beating a dead horse is not a good way of designing a motor vehicle. I think the manuscript needs to begin with a higher-level view of the pathology of its target disciplines, paleontology and paleoanthropology, along the lines that David Hull demonstrated for numerical taxonomy some decades ago. That many thousands of bad methodologies require some sort of explanation all of their own in terms of (a) the fears of biologists about advanced mathematics, (b) the need for publications and tenure, (c) the desirability of covers of Nature and Science, and (d) the even greater glory of getting to name a new "species." This cumulative pathology of science results in paleoanthro turning into a branch of the humanities, where no single conclusion is treated as stable beyond the next dig, the next year or so of applied genomics, and the next chemical trace analysis. In short, the field is not cumulative.

      It is not obvious that the authors' suggestion of supervised machine learning will remedy this situation, since (a) that field itself is undergoing massive changes month by month with the advent of applications AI, and even more relevant (b) the best ML algorithms, those based on deep neural nets, are (literally) unpublishable - we cannot see how their decisions have actually been computed. Instead, to stabilize, the field will need to figure out how to base its inferences on some syntheses of actual empirical theories.

      It's not that this reviewer is cynical, but it is fair to suggest a revision conveying a concern for the truly striking lack of organized skepticism in the literature that is being critiqued here. A revision along those lines would serve as a flagship example of exactly the deeper argument that reference (17) was trying to seed, that the applied literature obviously needs a hundred times more of. Such a review would do the most good if it appeared in one of the same journals - AJBA, Evolution, Journal of Human Evolution, Paleobiology - where the bulk of the most highly cited misuses of PCA themselves have appeared.

    4. Reviewer #3 (Public Review):

      Mohseni and Elhaik challenge the widespread use of PCA as an analytical and interpretive tool in the study of geometric morphometrics. The standard approach in geometric morphometrics analysis involves Generalised Procrustes Analysis (GPA) followed by Principal Component Analysis (PCA). Recent research challenges PCA outcomes' accuracy, robustness, and reproducibility in morphometrics analysis. In this paper, the authors demonstrate that PCA is unreliable for such studies. Additionally, they test and compare several Machine-Learning methods and present MORPHIX, a Python package of their making that incorporates the tools necessary to perform morphometrics analysis using ML methods.
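
      The superimposition step that precedes PCA can be illustrated with a minimal sketch. The code below performs an ordinary (pairwise) Procrustes alignment of two 2D landmark configurations using complex numbers; full GPA would repeat this step against an iteratively updated mean shape. It is not taken from MORPHIX: the function name `superimpose` and the toy triangle are assumptions made for illustration, and reflections are not handled.

```python
import cmath


def superimpose(reference, target):
    """Align `target` onto `reference`; both are lists of (x, y) landmarks.

    Each shape is centred and scaled to unit centroid size, then `target`
    is rotated to minimise the summed squared landmark distances.
    """
    def normalise(shape):
        z = [complex(x, y) for x, y in shape]
        centroid = sum(z) / len(z)
        z = [p - centroid for p in z]
        size = sum(abs(p) ** 2 for p in z) ** 0.5  # centroid size
        return [p / size for p in z]

    ref, tgt = normalise(reference), normalise(target)
    # The optimal rotation angle is the argument of sum(ref * conj(tgt)).
    angle = cmath.phase(sum(r * t.conjugate() for r, t in zip(ref, tgt)))
    rotation = cmath.exp(1j * angle)
    aligned = [rotation * t for t in tgt]
    # Procrustes distance: residual shape difference after alignment.
    distance = sum(abs(r - a) ** 2 for r, a in zip(ref, aligned)) ** 0.5
    return aligned, distance


# Hypothetical example: the same triangle rotated by 90 degrees,
# doubled in size, and shifted by (5, 5).
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
moved = [(5.0, 5.0), (5.0, 13.0), (-1.0, 5.0)]
aligned, distance = superimpose(triangle, moved)
print("Procrustes distance is negligible:", distance < 1e-9)
```

      Because position, scale, and rotation are removed, only shape differences remain in the aligned coordinates, and these are what PCA or an alternative classifier would then receive as input.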

      Mohseni and Elhaik conducted a set of thorough investigations to test PCA's accuracy, robustness, and reproducibility following renewed recent criticism and publications where this method was abused. Using a set of 2D and 3D morphometric benchmark data, the authors performed a traditional analysis using GPA and PCA, followed by a reanalysis of the data using alternative classifiers and rigorous testing of the different outcomes.

      In the current paper, the authors evaluated eight ML methods and compared their classification accuracy to traditional PCA. Additionally, common occurrences in the attempted morphological classification of specimens, such as non-representative partial sampling, missing specimens, and missing landmarks, were simulated, and the performance of PCA vs ML methods was evaluated.

      The main problem with this manuscript is that it is three papers rolled into one, and the link doesn't work. The title promises a new Python package, but the actual text of the manuscript spends relatively little time on the Python package itself and barely gives any information about the package and what it includes or its usefulness. It is definitely not the focus of the manuscript. The main thrust of the manuscript, which takes up most of the text, is the analysis of the papionin dataset, which shows very convincingly that PCA underperforms in virtually all conditions tested. In addition, the manuscript includes a rather vicious attack against two specific cases of misuse of PCA in paleoanthropological studies, which does not connect with the rest of the manuscript at all.

      If the manuscript is a criticism of PCA techniques, this should be reflected in the title. If it is a report of a new Python package, it should focus on the package. Otherwise, there should be two separate manuscripts here.

      The criticism of PCA is valid and important. However, pointing out that it is problematic in specific cases and is sometimes misused does not justify labeling tens of thousands of papers as questionable and does not justify vilifying an entire discipline. The authors do not make a convincing enough case that their criticism of the use of PCA in analyzing primate or hominin skulls is relevant to all its myriad uses in morphometrics. The criticism is largely based on statistical power, but it is framed as though it is a criticism of geometric morphometrics in general.

    1. eLife assessment

      The study presents valuable findings on compensatory mechanisms in response to glycosuria. The evidence supporting the claims is solid, although a causal relationship is somewhat uncertain and the addition of a more clinically relevant model would have strengthened the findings. The work will be of interest to diabetes investigators.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Faniyan and colleagues build on their recent finding that renal Glut2 knockout mice display normal fasting blood glucose levels despite massive glucosuria. Renal Glut2 knockout mice were found to exhibit increased endogenous glucose production along with decreased hepatic metabolites associated with glucose metabolism. Crh mRNA levels were higher in the hypothalamus, while circulating ACTH and corticosterone were elevated in this model. While these mice were able to maintain normal fasting glucose levels, ablating afferent renal signals to the brain resulted in substantially lower blood glucose levels compared to wildtype mice. In addition, the higher CRH and higher corticosterone levels of the knockout mice were lost following this denervation. Finally, acute phase proteins were altered, plasma Gpx3 was lower, and major urinary protein MUP18 and its gene expression were higher in renal Glut2 knockout mice. Overall, the main conclusion that afferent signaling from the kidney is required for renal Glut2-dependent increases in endogenous glucose production is well supported by these findings.

      Strengths:

      An important strength of the paper is the novelty of the identification of kidney to brain communication as being important for glucose homeostasis. Previous studies had focused on other functions of the kidney modulated by or modulating brain function. This work is likely to promote interest in CNS pathways that respond to afferent renal signals and the response of the HPA axis to glucosuria. Additional strengths of this paper stem from the use of incisive techniques. Specifically, the authors use isotope enabled measurement of endogenous glucose production by GC-MS/MS, capsaicin ablation of afferent renal nerves, and multifiber recording from the renal nerve. The authors also paid excellent attention to rigor in the design and performance of these studies. For example, they used appropriate surgical controls, confirmed denervation through renal pelvic CGRP measurement, and avoided the confounding effects of nerve regrowth over time. These factors strengthen confidence in their results. Finally, humans with glucose transporter mutations and those being treated with SGLT2 inhibitors show a compensatory increase in endogenous glucose production. Therefore, this study strengthens the case for using renal Glut2 knockout mice as a model for understanding the physiology of these patients.

      Weaknesses:

      A few weaknesses exist. Most concerns relate to the interpretation of this study's findings. The authors state that loss of glucose in urine is sensed as a biological threat based on the HPA axis activation seen in this mouse model. This interpretation is understandable but speculative. Importantly, whether stress hormones mediate the increase in endogenous glucose production in this model and in humans with altered glucose transporter function remains to be demonstrated conclusively. For example, the paper found several other circulating and local factors that could be causal. This model is also unable to shed light on how elevated stress hormones might interact with insulin resistance, which is known to increase endogenous glucose production. That issue is of substantial clinical relevance for patients with T2D and metabolic disease. Finally, while findings from the Glut2 knockout mice are of scientific interest, it should be noted that the Glut2 receptor is critical to the function of pancreatic islets and as such is not a good candidate for pharmacological targeting.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors previously generated renal Glut2 knockout mice, which have high levels of glycosuria but normal fasting glucose. They use this as an opportunity to investigate how compensatory mechanisms are engaged in response to glycosuria. They show that renal and hepatic glucose production, but not metabolism, is elevated in renal Glut2 male mice. They show that renal Glut2 male mice have elevated Crh mRNA in the hypothalamus, and elevated plasma levels of ACTH and corticosterone. They also show that temporary denervation of renal nerves leads to a decrease in fasting and fed blood glucose levels in female renal Glut2 mice, but not control mice. Finally, they perform plasma proteomics in male mice to identify plasma proteins with a greater than 25% change (up or down) between the knockouts and controls.

      Strengths:

      The question that is trying to be addressed is clinically important: enhancing glycosuria is a current treatment for diabetes, but is limited in efficacy because of compensatory increases in glucose production.

      Weaknesses:

      (1) Although I appreciate that the initial characterization of the mice in another publication showed that both males and females have glycosuria, this does not mean that both sexes have the same mechanisms giving rise to glycosuria. There are many examples of sex differences in HPA activation in response to threat, for example. There is an unfounded assumption here that males and females have the same underlying mechanisms of glycosuria that undermines the significance of the findings.

      (2) The authors state that they induced the Glut2 knockout with tamoxifen as in their previous publication. The methods of that publication indicate that all experiments were completed within 14 days of inducing the Glut2 knockout. This means that the last dose of tamoxifen was delivered 14 days prior to the experimental endpoint of each experiment. This seems like an important experimental constraint that should be discussed in this manuscript. Is the glycosuria that follows Glut2 knockout only a temporary change? If so, then the long-term change in glycosuria that follows SGLT2 inhibition in humans might not be best modelled by this knockout. Please specify in the Methods when the surgeries to implant a jugular catheter or ablate the renal nerves were performed relative to the Glut2 knockout.

      (3) I am still unclear what group was used for controls. Are these wild-type mice who receive tamoxifen? Are they KspCadCreERT2;Glut2loxP/loxP mice who do not receive tamoxifen? This is important and needs to be specified.

      (4) The authors should report some additional control measures for the renal denervation that could also impact blood glucose and perhaps some of their other measures. The control measures, which one would like to see unimpacted by renal denervation, include body weights, food consumption and water intake, and glycosuria itself.

      (5) The graphical abstract shows a causal link between the hypothalamus and the liver that is unsupported by any of the current findings. That arrow should be removed or a question mark should be added next to the arrow.

      (6) Though the authors have toned down their language implying a causal link between the HPA measures and compensatory elevation of blood glucose in the face of glycosuria, the title still implies this causal link. It is still the case that their data do not support causation. There are many potential ways to establish a causal link but those experiments are not performed here. The renal afferents are correlated with Crh content of the PVN, but nothing has been done to show that the Crh content is important for elevating blood glucose. In light of this, the title should be toned down. Perhaps something like "Renal nerves maintain blood glucose production and elevated HPA activity in response to glycosuria". The link between HPA and glucose is not shown in this paper.

    1. eLife assessment

      This study explores the interplay between fungal colonization and controlled programmed cell death in Arabidopsis thaliana root cells. The authors reveal how this process is affected by corpse clearance in the root cap, highlighting some of the key elements in this process including a root cap-specific transcription factor. With this, the authors have discovered an important relationship between transcriptional regulation of developmentally controlled cell death and the beneficial colonization of plants by fungi. The work thus establishes a solid basis for further studies of plant-microbiome associations along the root axis.

    2. Reviewer #1 (Public Review):

      Summary:

      The study investigated how root cap cell corpse removal affects the ability of microbes to colonize Arabidopsis thaliana plants. The findings demonstrate how programmed cell death and its control in root cap cells affect the establishment of symbiotic relationships between plants and fungi. Key details on molecular mechanisms and transcription factors involved are also given. The study suggests reevaluating microbiome assembly from the root tip, thus challenging traditional ideas about this process. While the work presents a key foundation, more research along the root axis is recommended to gain a better understanding of the spatial and temporal aspects of microbiome recruitment.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors identify the root cap as an important key region for establishing microbial symbioses with roots. By highlighting for the first time the crucial importance of tight regulation of a specific form of programmed cell death of root cap cells and the clearance of their cell corpses, they start unraveling the molecular mechanisms and its regulation at the root cap (e.g. by identifying an important transcription factor) for the establishment of symbioses with fungi (and potentially also bacterial microbiomes).

      Strengths:

      It is often believed that the recruitment of plant microbiomes occurs from bulk soil to rhizosphere to endosphere. These authors demonstrate that we have to re-think microbiome assembly as a process starting and regulated at the root tip and proceeding along the root axis.

      Weaknesses:

      The study is a first crucial starting point to investigate the spatial recruitment of beneficial microorganisms along the root axis of plants. It identifies e.g. an important transcription factor for programmed cell death, but more detailed investigations along the root axis are now needed to better understand - spatially and temporally - the orchestration of microbiome recruitment.

    1. eLife assessment

      This study provides important biophysical insights into the molecular mechanism underlying the association of alpha-synuclein chains, which is essential for understanding the pathogenesis of Parkinson's disease. The data analysis is solid, and the methodology can help investigate other molecular processes involving intrinsically disordered proteins. The benchmarking of the cgMD simulations should be improved to give the reader greater confidence in the conclusions presented.

    2. Reviewer #1 (Public Review):

      Summary:

      In this paper, the authors performed molecular dynamics (MD) simulations to investigate the molecular basis of the association of alpha-synuclein chains under molecular crowding and salt conditions. Aggregation of alpha-synuclein is linked to the pathogenesis of Parkinson's disease, and liquid-liquid phase separation (LLPS) is considered to play an important role in the nucleation step of alpha-synuclein aggregation. This paper re-tuned the Martini3 coarse-grained force field parameters, which allows long-timescale MD simulations of intrinsically disordered proteins with explicit solvent under diverse environmental perturbations. Their MD simulations showed that alpha-synuclein does not have a high LLPS-forming propensity, but molecular crowding and salt addition tend to enhance the tendency of droplet formation and therefore modulate alpha-synuclein aggregation. The MD simulation results also revealed important intra- and inter-molecular conformational features of the alpha-synuclein chains in the formed droplets and the key interactions responsible for the stability of the droplets. These MD simulation data add biophysical insights into the molecular mechanism underlying the association of alpha-synuclein chains, which is important for understanding the pathogenesis of Parkinson's disease.

      Strengths:

      (1) The re-parameterized Martini 3 coarse-grained force field enables the large-scale MD simulations of the intrinsically disordered proteins with explicit solvent, which will be useful for a more realistic description of the molecular basis of LLPS.

      (2) This paper showed that molecular crowding and salt contribute to the modulation of the LLPS through different means. The molecular crowding minimally affects surface tension, but adding salt increases surface tension. It is also interesting to show that the aggregation pathway involves the disruption of the intra-chain interactions arising from C-terminal regions, which potentially facilitates the formation of inter-chain interactions.

      Weaknesses:

      (1) Although the authors emphasized the advantage of the Martini3 force field for its explicit description of solvent, the whole paper did not discuss the water's role in the aggregation and LLPS.

      (2) This paper discussed the effects of crowders and salt on the surface tension of the droplets. The calculation of the surface tension relies on the droplet shape. However, for the formed clusters in the MD simulations, the typical size is <10, which may be too small to rigorously define the droplet shape. As shown in previous work cited by this paper [Benayad et al., J. Chem. Theory Comput. 2021, 17, 525−537], the calculated surface tension becomes stable when the chain number is larger than 100.

      (3) In this work, the Martini 3 force field was modified by rescaling the LJ parameters \epsilon and \sigma with a common factor \lambda. It has not been very clearly described in the manuscript why these two different parameters can be rescaled by a common factor and why it is necessary to separately tune these two parameters, instead of just tuning the coefficient \epsilon, as was done in a previous work [Larsen et al., PLoS Comput Biol 16: e1007870].

      (4) Both the sizes and volume fractions of the crowders can affect the protein association. It will be interesting to perform MD simulations by adding crowders with various sizes and volume fractions. In addition, in this work, the crowders were modelled by fullerenes, which contribute to protein aggregation mainly by entropic means as discussed in the manuscript. It is not very clear how the crowder effect is sensitive to the chemical nature of the crowders (e.g., inert crowders with excluded volume effect or crowders with non-specific attractive interactions with proteins, etc) and therefore the force field parameters.
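
      The concern in point (3) can be made concrete with a minimal sketch. It assumes the standard 12-6 Lennard-Jones form; the parameter values and the factor `LAM` are hypothetical and are not taken from the manuscript or from Martini 3. Scaling \epsilon alone deepens the potential well without moving it, whereas scaling \sigma as well also shifts the position of the minimum, so the two parameters are not interchangeable.

```python
def lj_energy(r, epsilon, sigma):
    """Standard 12-6 Lennard-Jones energy at separation r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_minimum(epsilon, sigma):
    """Return (r_min, U_min); analytically r_min = 2^(1/6) * sigma and U_min = -epsilon."""
    r_min = 2.0 ** (1.0 / 6.0) * sigma
    return r_min, lj_energy(r_min, epsilon, sigma)


# Hypothetical values, for illustration only.
EPSILON, SIGMA, LAM = 2.0, 0.47, 1.05

base = lj_minimum(EPSILON, SIGMA)
eps_only = lj_minimum(LAM * EPSILON, SIGMA)    # deeper well, same position
both = lj_minimum(LAM * EPSILON, LAM * SIGMA)  # deeper well, shifted outward
```

      Comparing `base`, `eps_only`, and `both` shows that a common factor changes the interaction strength and the effective bead size at once, which is why the reviewer asks for the rationale behind rescaling both.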

    3. Reviewer #2 (Public Review):

      In the manuscript "Modulation of α-Synuclein Aggregation Amid Diverse Environmental Perturbation", Wasim et al describe coarse-grained molecular dynamics (cgMD) simulations of α-Synuclein (aSyn) at several concentrations and in the presence of molecular crowding agents or high salt. They begin by benchmarking their cgMD against all-atom simulations by Shaw. They then carry out 2.4-4.3 µs cgMD simulations under the above-noted conditions and analyze the data in terms of protein structure, interaction network analysis, and extrapolated fluid mechanics properties. This is an interesting study because a molecular scale understanding of protein droplets is currently lacking, but I have a number of concerns about how it is currently executed and presented.

      (1) It is not clear whether the simulations have reached a steady state. If they have not, it invalidates many of their analysis methods and conclusions.

      (2) The benchmarking used to validate their cgMD methods is very minimal and fails to utilize a large amount of available all-atom simulation and experimental data.

      (3) They also miss opportunities to compare their simulations to experimental data on aSyn protein droplets.

      (4) Aspects such as network analysis are not contextualized by comparison to other protein condensed phases.

      (5) Data are not made available, which is an emerging standard in the field.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Wang et al investigated the evolution, expression, and function of the X-linked miR-506 miRNA family. They showed that the miR-506 family underwent rapid evolution. They provided evidence that miR-506 appeared to have originated from the MER91C DNA transposons. Human MER91C transposon produced mature miRNAs when expressed in cultured cells. A series of mouse mutants lacking individual clusters, a combination of clusters, and the entire X-linked cluster (all 22 miRNAs) were generated and characterized. The mutant mice lacking four or more miRNA clusters showed reduced reproductive fitness (litter size reduction). They further showed that the sperm from these mutants were less competitive in polyandrous mating tests. RNA-seq revealed the impact of deletion of miR-506 on the testicular transcriptome. Bioinformatic analysis examined the relationship among miR-506 binding, transcriptomic changes, and target sequence conservation. The miR-506-deficient mice showed no apparent defects in sperm production, motility, or morphology. Lack of severe phenotypes is typical for miRNA mutants in other species as well. However, the miR-506-deficient males did exhibit reduced litter size; such an effect would have been quite significant on an evolutionary time scale. The number of mouse mutants and sequencing analysis represent a tour de force. This study is a comprehensive investigation of the X-linked miR-506 miRNA family. It provides important insights into the evolution and function of the miR-506 family.

      The conclusions of this preprint are mostly supported by the data except being noted below. Some descriptions need to be revised for accuracy.

      L219-L285: The conclusion that X-linked miR-506 family miRNAs are expanded via LINE1 retrotransposition is not supported by the data. LINE1s and SINEs are very abundant, accounting for nearly 30% of the genome. In addition, the LINE1 content of the mammalian X chromosome is twice that of the autosomes. One can easily find flanking LINE1/SINE repeats. Therefore, the analyses in Fig. 2G, Fig. 2H and Fig. S3 are not informative. In order to claim LINE1-mediated retrotransposition, it is necessary to show the hallmarks of LINE1 retrotransposition, which are only possible for new insertions. The X chromosome is known to be enriched for testis-specific multi-copy genes that are expressed in round spermatids (PMID: 18454149). The conclusion on the LINE1-mediated expansion of miR-506 family on the X chromosome is not supported by the data and does not add additional insights. I think that the LINE1 related figure panels and description (L219-L285) need to be deleted. In discussion (L557-558), "...and subsequently underwent sequence divergence via LINE1-mediated retrotransposition during evolution" should also be deleted. This section (L219-L285) needs to deal only with the origin of miR-506 from MER91C DNA transposons, which is both convincing and informative.

      Reply: Agreed, the corresponding sentences were deleted.

      Fig. 3A: can you speculate/discuss why the miR-506 expression in sperm is higher than in round spermatids?

      Reply: RNAs are much less abundant in sperm than in somatic or spermatogenic cells (~1/100). Sperm-borne small RNAs represent a small fraction of total small RNAs expressed in their precursor spermatogenic cells, including spermatocytes and spermatids. Therefore, when the same amount of total/small RNAs are used for quantitative analyses, sperm-borne small RNAs (e.g., miR-506 family miRNAs) would be proportionally enriched in sperm compared to other spermatogenic cells. We discussed this point in the text (Lines 550-556).
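
      The normalization argument in this reply can be illustrated with a toy calculation. All numbers below are hypothetical: only the roughly 100-fold difference in total RNA per cell comes from the reply, and the per-cell miRNA copy numbers are invented for illustration.

```python
def per_input_abundance(copies_per_cell, total_rna_per_cell):
    """miRNA copies per unit of total RNA, i.e. what an assay reports
    when the same amount of RNA is loaded for every sample."""
    return copies_per_cell / total_rna_per_cell


# Hypothetical: a sperm cell retains only 5% of the miRNA copies found in a
# round spermatid, but carries about 1% of the total RNA.
spermatid = per_input_abundance(copies_per_cell=1000, total_rna_per_cell=100.0)
sperm = per_input_abundance(copies_per_cell=50, total_rna_per_cell=1.0)

# Apparent enrichment in sperm when equal RNA input is used.
fold = sperm / spermatid
```

      In this sketch the miRNA appears five-fold higher in sperm on an equal-input basis even though each sperm cell holds twenty-fold fewer copies, which is the proportional enrichment the authors describe.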

      Reviewer #2 (Public Review):

      In this paper, Wang and collaborators characterize the rapid evolution of the X-linked miR-506 cluster in mammals and characterize the functional relevance of depleting a few or most of the miRNAs in the cluster. The authors show that the cluster originated from the MER91C DNA transposon and provide some evidence that it might have expanded through the retrotransposition of adjacent LINE1s. Although the animals depleted of most miRNAs in the cluster show normal sperm parameters, the authors observed a small but significant reduction in litter size. The authors then speculate that the depletion of most miRNAs in the cluster could impair sperm competitiveness in polyandrous mating. Using a successive mating protocol, they show that, indeed, sperm lacking most X-linked miR-506 family members is outcompeted by wild-type sperm. The authors then analyze the evolution of the miR-506 cluster and its predicted targets. They conclude that the main difference between mice and humans is the expansion of the number of target sites per transcript in humans.

      The conclusions of the paper are, in most cases, supported by the data; however, a more precise and in-depth analysis would have helped build a more convincing argument in most cases.

      (1) In the abstracts and throughout the manuscript, the authors claim that "... these X-linked miRNA-506 family miRNA [...] have gained more targets [...] " while comparing the human miRNA-506 family to the mouse. An alternative possibility is that the mouse has lost some targets. A proper analysis would entail determining the number of targets in the mouse and human common ancestor.

      Reply: This question alerted us that we did not describe our conclusion accurately, causing confusion for this reviewer. Our data suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis. In other words, mice never lost any targets compared to humans, but each miR-506 family miRNA tends to target more genes in humans than in mice.

      We revised the text to more accurately report our data. The pertaining text (lines 490-508) now reads: “Furthermore, we analyzed the number of all potential targets of the miR-506 family miRNAs predicted by the aforementioned four algorithms among humans, mice, and rats. The total number of targets for all the X-linked miR-506 family miRNAs among different species did not show significant enrichment in humans (Fig. S9C), suggesting the sheer number of target genes does not increase in humans. We then compared the number of target genes per miRNA. When comparing the number of target genes per miRNA for all the miRNAs (baseline) between humans and mice, we found that on a per miRNA basis, human miRNAs have more targets than murine miRNAs (p<0.05, t-test) (Fig. S9D), consistent with higher biological complexity in humans. This became even more obvious for the X-linked miR-506 family (p<0.05, t-test) (Fig. S9D). In humans, the X-linked miR-506 family, on a per miRNA basis, targets a significantly greater number of genes than the average of all miRNAs combined (p<0.05, t-test) (Fig. S9D). In contrast, in mice, we observed no significant difference in the number of targets per miRNA between X-linked miRNAs and all of the mouse miRNAs combined (mouse baseline) (Fig. S9D). These results suggest that although the sheer number of target genes remains the same between humans and mice, the human X-linked miR-506 family targets a greater number of genes than the murine counterpart on a per miRNA basis.”

      We also changed “have gained” to “have” throughout the text to avoid confusion.

      (2) The authors claim that the miRNA cluster expanded through L1 retrotransposition. However, the possibility of an early expansion of the cluster before the divergence of the species while the MER91C DNA transposon was active was not evaluated. Although L1 likely contributed to the diversity within mammals, the generalization may not apply to all species. For example, SINEs are closer on average than L1s to the miRNAs in the SmiR subcluster in humans and dogs, and the horse SmiR subcluster seems to have expanded by a TE-independent mechanism.

      Reply: Agreed. We deleted the data mentioned by this reviewer.

      (3) Some results are difficult to reconcile and would have benefited from further discussion. The miR-465 sKO has over two thousand differentially expressed transcripts and no apparent phenotype. Also, the authors show a sharp downregulation of CRISP1 at the RNA and protein level in the mouse. However, most miRNAs of the cluster increase the expression of Crisp1 on a reporter assay. The only one with a negative impact has a very mild effect. miRNAs are typically associated with target repression; however, most of the miRNAs analyzed in this study activate transcript expression.

      Reply: Both mRNA and protein levels of Crisp1 were downregulated in KO mice, and these results are consistent with the luciferase data showing overexpression of these miRNAs upregulated the Crisp1 3’UTR luciferase activity. We agree that miRNAs usually repress target gene expression. However, numerous studies have also shown that some miRNAs, such as human miR-369-3, Let-7, and miR-373, mouse miR-34/449 and the miR-506 family, and the synthetic miRNA miRcxcr4, activate gene expression both in vitro (1, 2) and in vivo (3-6). Earlier reports have shown that these miRNAs can upregulate their target gene expression, either by recruiting FXR1, targeting promoters, or sequestering RNA subcellular locations (1, 2, 6). We briefly discussed this in the text (Lines 605-611).

      (4) More information is required to interpret the results of the differential RNA targeting by the murine and human miRNA-506 family. The materials and methods section needs to explain how the authors select their putative targets. In the text, they mention the use of four different prediction programs. Are they considering all sites predicted by any method, all sites predicted simultaneously by all methods, or something in between? Also, what are they considering as a "shared target" between mice and humans? Is it a mRNA that any miR-506 family member is targeting? Is it a mRNA targeted by the same miRNA in both species? Does the targeting need to occur in the same position determined by aligning the different 3'UTRs?

      Reply: Since each prediction method has its merit, we included all putative targets predicted by any of the four methods. The "shared target" refers to a mRNA that any miR-506 family member targets because the miR-506 family is highly divergent among different species. We have added the information to the “Large and small RNA-seq data analysis” section in Materials and Methods (Lines 871-882).

      (5) The authors highlight the particular evolution of the cluster derived from a transposable element. Given the tendency of transposable elements to be expressed in germ cells, the family might have originated to repress the expression of the elements while still active but then remained to control the expression of the genes where the element had been inserted. The authors did not evaluate the expression of transcripts containing the transposable element or discuss this possibility. The authors proposed an expansion of the target sites in humans. However, whether this expansion was associated with the expansion of the TE in humans was not discussed either. Clarifying whether the transposable element was still active after the divergence of the mouse and human lineages would have been informative to address this outstanding issue.

      Reply: Agreed. The MER91C DNA transposon is denoted as nonautonomous (7); however, whether it was active during the divergence of mouse and human lineages is unknown. To determine whether the expansion of the target sites in humans was due to the expansion of the MER91C DNA transposon, we analyzed the MER91C DNA transposon-containing transcripts and associated them with our DETs. Of interest, 28 human and 3 mouse mRNAs possess 3’UTRs containing MER91C DNA sequences, and only 3 and 0 out of those 28 and 3 genes belonged to DETs in humans and mice, respectively (Fig. S9E), suggesting a minimal effect of MER91C DNA transposon expansion on the number of target sites. We briefly discussed this in the text (Lines 511-518).
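
      The target-selection rules stated in the replies to points (1) and (4) can be sketched as set operations: a gene counts as a target if any prediction method reports it, and a shared target is an mRNA targeted by any family member in both species. The miRNA names, tool names, and gene identifiers below are hypothetical, and the sketch assumes orthologous genes have already been mapped to common identifiers.

```python
def union_targets(predictions):
    """Targets reported by any prediction method (union across methods)."""
    targets = set()
    for genes in predictions.values():
        targets |= set(genes)
    return targets


def family_targets(per_mirna_predictions):
    """Targets of a miRNA family: genes targeted by any family member."""
    family = set()
    for predictions in per_mirna_predictions.values():
        family |= union_targets(predictions)
    return family


def mean_targets_per_mirna(per_mirna_predictions):
    """Average number of distinct target genes per miRNA."""
    counts = [len(union_targets(p)) for p in per_mirna_predictions.values()]
    return sum(counts) / len(counts)


# Hypothetical toy data: {miRNA: {prediction tool: [target genes]}}.
human = {
    "miR-A": {"tool1": ["GENE1", "GENE2"], "tool2": ["GENE2", "GENE3"]},
    "miR-B": {"tool1": ["GENE1", "GENE3"], "tool2": ["GENE4"]},
}
mouse = {
    "miR-C": {"tool1": ["GENE1"], "tool2": ["GENE2"]},
    "miR-D": {"tool1": ["GENE3"], "tool2": ["GENE4"]},
}

shared = family_targets(human) & family_targets(mouse)
human_mean = mean_targets_per_mirna(human)
mouse_mean = mean_targets_per_mirna(mouse)
```

      In this toy case both families cover the same four genes, yet each human miRNA targets three genes against two for each mouse miRNA. This mirrors the distinction the authors draw between the total number of targets and the number of targets per miRNA.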

      Post-transcriptional regulation is exceptionally complex in male haploid cells, and the functional relevance of many regulatory pathways remains unclear. This manuscript, together with recent findings on the role of piRNA clusters, starts to clarify the nature of the selective pressure that shapes the evolution of small RNA pathways in the male germ line.

      Reply: Agreed. We appreciate your insightful comments.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors conducted a comprehensive study of the origin, evolution, expression, and function of the X-linked miR-506 family miRNAs in mice. They demonstrate that the X-linked miR-506 family, predominantly expressed in the testis, may be derived from MER91C DNA transposons and further expanded by retrotransposition. By genetic deletion of different combinations of 5 major clusters of this miRNA family in mice, they found these miRNAs are not required for spermatogenesis. However, on further examination, the mutant mice show a mild fertility problem and inferior sperm competitiveness. The authors conclude that the X-linked miR-506 miRNAs fine-tune spermatogenesis to enhance sperm competition.

      Strengths:

      This is a comprehensive study with extensive computational and genetic dissection of the X-linked miR-506 family, providing a holistic view of its evolution and function in mice. The finding that this family of miRNAs could enhance sperm competition is interesting and could explain their roles in fine-tuning germ cell gene expression to regulate reproductive fitness.

      Weaknesses:

      The authors specifically addressed the function of 5 clusters of the X-linked miR-506 family containing 19 miRNAs. There is another small cluster containing 3 miRNAs close to the Fmr1 locus. Would this small cluster act in concert with the 5 clusters to regulate spermatogenesis? In addition, any autosomal miR-506-like miRNAs may compensate for the loss of the X-linked miR-506 family. These possibilities should be discussed.

      Reply: The three FmiRs were not deleted in this study because the SmiRs are much more abundant than the FmiRs in WT mice (Author Response image 1, heatmap version of Fig. 5C). Based on small RNA-seq, some FmiRs, e.g., miR-201 and miR-547, were upregulated in the SmiRs KO mice, suggesting that this small cluster may act in concert with the other 5 clusters and is thus worth further investigation. To the best of our knowledge, all the miR-506 family miRNAs are located on the X chromosome; although some other miRNAs were upregulated in the KO mice, they do not belong to the miR-506 family. We briefly discussed this point in the text (Lines 635-638).

      Author response image 1.

      sRNA-seq of WT and miR-506 family KO testis samples.

      Direct molecular link to sperm competitiveness defect remains unclear but is difficult to address.

      Reply: In this study, we identified a target of the miR-506 family, i.e. Crisp1. KO of Crisp1 in mice, or inhibition of CRISP1 in human sperm (7, 8), appears to phenocopy the quinKO mice, displaying largely normal sperm motility but compromised ability to penetrate eggs. The detailed mechanism warrants further investigation in the future.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Lines 84-85: "Several cellular events are unique to the male germ cells, e.g., meiosis, genetic recombination, and haploid male germ cell differentiation (also called spermiogenesis)". This statement is not accurate. Please revise. Meiosis and genetic recombination are common to both male and female germ cells. They are highly conserved in both sexes in many species including mouse.

      Reply: Agreed. We have revised the sentence and it now reads: “Several cellular events are unique to the male germ cells, e.g., postnatal formation of the adult male germline stem cells (i.e., spermatogonia stem cells), pubertal onset of meiosis, and haploid male germ cell differentiation (also called spermiogenesis) (9)” (Lines 83-86).

      Lines 163-164: "we found that Slitrk2 and Fmr1 were syntenically linked to autosomes in zebrafish and birds (Fig. 1A), but had migrated onto the X chromosome in most mammals". This description is not accurate. Chr 4 in zebrafish and birds is syntenic to the X chromosome in mammals. The term "migrated" is not appropriate. Suggestion: Slitrk2 and Fmr1 mapped to Chr 4 (syntenic with mammalian X chromosome) in zebrafish and birds but to the X chromosome in most mammals.

      Reply: Agreed. Revised as suggested.

      Reviewer #2 (Recommendations For The Authors):

      (1) In the significance statement, the authors mention that the mutants are "functionally infertile," although the decrease in competitiveness is partial. I suggest referring to them as "functionally sub-fertile."

      Reply: Agreed. Revised as suggested.

      (2) I will urge the authors to explain in more detail how some figures are generated and what they mean. Some critical information needs to be included in various panels.

      (2a) Figure S1. The phastCons track does not seem to align as expected with the rest of the figure. The highest conservation peak is only present in humans, and the sequence conserved in the sea turtle has the lowest phastCons score. I was expecting the opposite from the explanation.

      Reply: The tracks for phyloP and phastCons are the scores for all 100 species, whereas the tracks with the species names on the left are the corresponding sequences aligned to the human genome. We have revised our figure to make it clearer.

      (2b) Figure 2A and Figure S2C. Although all the functional analysis of the manuscript has been done in mice, the alignments showing sequence conservation do not include the murine miRNAs. Please include the mouse miRNAs in these panels.

      Reply: The mouse has Mir-506-P7 with the conserved miRNA-3P seed region, which was included in the lower panel in Figure S2C. However, mice do not have Mir-506-P6, which may have been lost during evolution or become too divergent to be recognized, and thus was not included in Figure 2A and the upper panel in Figure S2C.

      (2c) Figure S7H. The panel could be easier to read.

      Reply: Agreed. We combined all the same groups and turned Figure S7H (now Figure S6H) into a heatmap.

      (2d) The legend of Figure 6G reads, "The number of target sites within individual target mRNAs in both humans and mice ." Can the author explain why the value 1 of the human "Number of target sites" is connected to virtually all the "Number of target sites" values in mice?

      Reply: Sorry for the confusion. For example, for gene 1, we have 1 target site in the human and 1 target site in the mouse; but for gene 2, we have 1 target site in the human and multiple sites in the mouse; therefore, the value 1 is connected to more than one value in the mouse.

      Reviewer #3 (Recommendations For The Authors):

      CRISP1 and EGR1 protein localization in WT and mutant sperm by immunostaining would be helpful.

      Reply: Agreed. We performed immunostaining for CRISP1 on WT sperm, and the new results are presented in Figure S8D. CRISP1 seems mainly expressed in the principal piece and head of sperm.

      The detailed description of the generation of various mutant lines should be included in the Methods.

      Reply: We added more details on the generation of knockout lines in the Materials and Methods (Lines 686-701).

      References:

      (1) S. Vasudevan, Y. Tong, J. A. Steitz, Switching from repression to activation: microRNAs can upregulate translation. Science 318, 1931-1934 (2007).

      (2) R. F. Place, L. C. Li, D. Pookot, E. J. Noonan, R. Dahiya, MicroRNA-373 induces expression of genes with complementary promoter sequences. Proc Natl Acad Sci U S A 105, 1608-1613 (2008).

      (3) Z. Wang et al., X-linked miR-506 family miRNAs promote FMRP expression in mouse spermatogonia. EMBO Rep 21, e49024 (2020).

      (4) S. Yuan et al., Motile cilia of the male reproductive system require miR-34/miR-449 for development and function to generate luminal turbulence. Proc Natl Acad Sci U S A 116, 3584-3593 (2019).

      (5) S. Yuan et al., Oviductal motile cilia are essential for oocyte pickup but dispensable for sperm and embryo transport. Proc Natl Acad Sci U S A 118 (2021).

      (6) M. Guo et al., Uncoupling transcription and translation through miRNA-dependent poly(A) length control in haploid male germ cells. Development 149 (2022).

      (7) V. G. Da Ros et al., Impaired sperm fertilizing ability in mice lacking Cysteine-RIch Secretory Protein 1 (CRISP1). Dev Biol 320, 12-18 (2008).

      (8) J. A. Maldera et al., Human fertilization: epididymal hCRISP1 mediates sperm-zona pellucida binding through its interaction with ZP3. Mol Hum Reprod 20, 341-349 (2014).

      (9) L. Hermo, R. M. Pelletier, D. G. Cyr, C. E. Smith, Surfing the wave, cycle, life history, and genes/proteins expressed by testicular germ cells. Part 1: background to spermatogenesis, spermatogonia, and spermatocytes. Microsc Res Tech 73, 241-278 (2010).

    2. eLife assessment

      This study provides important findings on the evolution and function of the X-linked miR-506 miRNA cluster. The evidence supporting the conclusions is convincing, including the generation and characterization of an impressive number of the miRNA deletion mutants. This work will be of interest to RNA biologists, evolution biologists and reproductive biologists.

    3. Reviewer #1 (Public Review):

      Wang et al. investigated the evolution, expression, and function of the X-linked miR-506 miRNA family. They showed that the miR-506 family underwent rapid evolution and provided evidence that it appears to have originated from MER91C DNA transposons. The human MER91C transposon produced mature miRNAs when expressed in cultured cells. A series of mouse mutants lacking individual clusters, a combination of clusters, and the entire X-linked cluster (all 22 miRNAs) were generated and characterized. The mutant mice lacking four or more miRNA clusters showed reduced reproductive fitness (litter size reduction). They further showed that the sperm from these mutants were less competitive in polyandrous mating tests. RNA-seq revealed the impact of deletion of miR-506 on the testicular transcriptome. Bioinformatic analysis examined the relationship among miR-506 binding, transcriptomic changes, and target sequence conservation. The miR-506-deficient mice showed no apparent defects in sperm production, motility, or morphology. Lack of severe phenotypes is typical for miRNA mutants in other species as well. However, the miR-506-deficient males did exhibit reduced litter size; such an effect would have been quite significant on an evolutionary time scale. The number of mouse mutants and the sequencing analysis represent a tour de force.

      Strengths:

      This study is a comprehensive investigation of the X-linked miR-506 miRNA family. It provides important insights into the evolution and function of the miR-506 family.

    4. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors conducted a comprehensive study of the origin, evolution, expression, and function of the X-linked miR-506 family miRNAs in mice. They demonstrate that the X-linked miR-506 family, predominantly expressed in the testis, may be derived from MER91C DNA transposons and further expanded by retrotransposition. By genetic deletion of different combinations of 5 major clusters of this miRNA family in mice, they found that these miRNAs are not required for spermatogenesis. However, on further examination, the mutant mice show a mild fertility problem and inferior sperm competitiveness. The authors conclude that the X-linked miR-506 miRNAs fine-tune spermatogenesis to enhance sperm competition.

      Strengths:

      This is a comprehensive study with extensive computational and genetic dissection of the X-linked miR-506 family, providing a holistic view of its evolution and function in mice. The finding that this family of miRNAs could enhance sperm competition is interesting and could explain their role in fine-tuning germ cell gene expression to regulate reproductive fitness.

      Comments on revised version:

      The authors have addressed the concerns raised.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The present study by Berger et al. analyzes to what extent memory formation depends on available energy reserves. This has been dealt with extensively in the case of aversive memory formation, but only very sparsely in the case of appetitive memory formation. It has long been known that an appetitive memory in flies can only be formed after starvation. However, the authors here additionally show that the duration of starvation not only plays a role but also determines which form of memory (short- or long-term memory) is formed. The authors demonstrated that internal glycogen stores play a role in this process and that this is achieved through insulin-like signaling in octopaminergic reward neurons that integrates internal energy stores into memory formation. Here, the authors suggest that octopamine plays a role as a negative regulator of different forms of memory.

      The study sheds light on an old question, to what extent the octopaminergic neuronal system plays a role in the formation of appetitive memory, since in recent years only the dopaminergic system has been in focus. Furthermore, the data are an interesting contribution to the ongoing debate whether insulin receptors play a role in neurons themselves or in glial cells. The experiments are very well designed and the authors used a variety of behavioural experiments, genetic tools to manipulate neuronal activity and state-of-the-art imaging techniques. In addition, they not only clearly demonstrated that octopamine is a negative regulator of appetitive memory formation, but also proposed a mechanism by which the insulin receptor in octopaminergic neurons senses the internal energy status and then controls the activity of those neurons. The conclusions are mostly supported by the data, but some aspects related to the experimental design, some explanations and literature references need more clarification and revision.

      (1) Usually, long-term memory (LTM) is tested 24 hours after training. Here, the authors usually refer to LTM as a memory that is tested 6 hours after training. The addition of a control experiment to show that LTM that the authors observe here lasts longer would increase the power of this study immensely.

      We thank the reviewer for this comment, as it helped greatly to clarify the matter.

      We measured the memory of control and mutant flies 24 h after training and included the data in the manuscript (Figure 1B and summarized in a model in Figure 2C). We show that control flies develop an intermediate type of memory that is, depending on the length of starvation, either anesthesia-sensitive or anesthesia-resistant. Mutants lacking octopamine develop either anesthesia-sensitive or anesthesia-resistant long-term memory.

      (2) The authors define here another consolidated memory component as ARM, when they applied a cold-shock 2 hours after training. However, some publications showed that LTM is formed after only one training cycle (Krashes et al 2008, Tempel et al 1983). This makes it difficult to determine whether appetitive ARM can be formed. Furthermore, one study showed that appetitive ARM is absent after massed training (Colomb et al 2009). Therefore, the conclusion could also be that different starvation protocols lead to different stabilities of LTM. Additional experiments could help to clarify this opposing explanation. From these results, it can then be concluded either that differently stable forms of LTM are formed depending on the starvation state, or that two differently consolidated memory phases (LTM, ARM) are formed, as has already been shown for aversive memory. This is also important for other statements in the manuscript, and therefore the authors should address this. For example, the findings about the insulin receptor (is it two opposing memories or different stabilities of LTM).

      The flies indeed develop different types of memory depending on the length of starvation and the internal energy supply.

      Reviewer #2 (Public Review):

      How organism physiological state modulates establishment and perdurance of memories is a timely question that the authors aimed at addressing by studying the interplay between energy homeostasis and food-related conditioning in Drosophila. Specifically, they studied how starvation modulates the establishment of short-term vs long-term memories and clarified the role of the monoamine Octopamine in food-related conditioning, showing that it is not per se involved in formation of appetitive short-term memories but rather gates memory formation by suppressing LTM when energy levels are high. This work clarifies previously described phenotypes and provides insight about interconnections between energy levels, feeding and formation of short-term and long-term food-related memories. In the absence of population-specific manipulation of octopamine signaling, it however does not reach a circuit-level understanding of how these different processes are integrated.

      Strengths

      • Previous studies have documented the impact of Octopamine on different aspects of food-related behaviors (regulation of energy homeostasis, feeding, sugar sensing, appetitive memory...), but we currently lack a clear understanding of how these different functions are interconnected. The authors have used a variety of experimental approaches to systematically test the impact of internal energy levels in establishment of appetitive memory and the role of Octopamine in this process.

      • The authors have used a range of approaches, performed carefully controlled experiments and produced high quality data.

      Weaknesses

      (1) In the tbh mutant flies, Tyramine-to-Octopamine conversion is inhibited, resulting not only in a lack of Octopamine, but also in elevated levels of Tyramine. If and how elevated levels of Tyramine contribute to the described phenotypes is unclear. In the current version of the manuscript, only one set of experiments (Figure 2) has been performed using an Octopamine agonist. This is particularly important in light of recently published data showing that starvation modifies Tyramine levels.

      (2) Octopamine (and its precursor Tyramine) have been implicated in numerous processes, complicating the analysis of the phenotypes resulting from a general inhibition of tbh.

      We thank the reviewer for raising these points. The observed memory defects of the Tbh mutants can be solely explained by loss of octopamine. We included models into the manuscript to illustrate this (Figure 2 C and Figure 7E).

      To address whether the elevated levels of tyramine observed in Tbh mutants interfere with food consumption, we analyzed the effect of increased levels of tyramine and octopamine on food consumption. We included the data (Figure S2). An increase in tyramine levels did not result in a change in food intake, rather the increase in octopamine levels reduced food intake. Our data show that the reduction of food intake observed in starved Tbh mutants is due to the increased internal energy supply.

      (3) The manuscript explores various aspects of the impact of energy levels on food-related behaviors and the underlying sensing and effector mechanism, both in wild-type and tbh mutants, making it difficult to follow the flow of the results.

      We included models illustrating the results to clarify the content of the manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, Berger et al. study how internal energy storage influences learning and memory. Since octopamine (OA) is involved in the regulation of energy homeostasis in Drosophila melanogaster, they focus on the roles of OA. To do so, they use the tyramine-β-hydroxylase (Tbh) mutant, which lacks the neurotransmitter OA, and study short-term memory (STM), long-term memory (LTM) and anesthesia-resistant memory (ARM). They show that the duration of starvation affects the magnitude of both short- and long-term memory. In addition, they show that OA has a suppressive effect on learning and memory. In terms of energy storage, they show that internal glycogen storage influences how long sucrose is remembered and high glycogen suppresses memory. Finally, they show that insulin-like signaling in octopaminergic neurons, which is also related to internal energy storage, suppresses learning and memory.

      This is an important study that extends our knowledge on OA activity in learning and memory and the effects the metabolic state has on learning and memory. The authors nicely use the genetic tools available in flies to try and unravel the complex circuitry of metabolic state level, OA activity and learning and memory.

      Nevertheless, I do have some comments that I think require attention:

      (1) The authors use RNAi to reduce the level of glycogen synthase or glycogen phosphorylase. These manipulations are expected to affect the level of glycogen. Using specific drivers the authors attempt to manipulate glycogen level at the muscles and fat bodies and examine how this affects learning and memory. The conclusions of the authors arise solely from the manipulation intended (i.e. the genetics). However, the authors also directly measured glycogen levels at these organs and those do not follow the manipulation intended, i.e. the RNAi had very limited effect on the glycogen level. Nevertheless, these results are ignored.

      We agreed with the reviewer and repeated the experiments. While we could not detect differences in whole animals, we detected differences in tissues enriched for muscles or fat, e.g. thorax or abdomen. We added the data.

      (2) The authors claim in the summary that OA is not required for STM. However, according to one experiment OA is required for STM as Tbh mutants cannot form STM. In another experiment OA is suppressive to STM as wt flies fed with OA cannot form STM. Therefore, it is very difficult to appreciate the actual role of OA on STM.

      During mild starvation, the internal energy supply is greater in Tbh mutants than in control flies. This information is integrated into the reward system via insulin receptor signaling. Therefore, the association between the odorant and sucrose is not meaningful to the mutants and no STM is formed. At the same time there is no release of octopamine and therefore no repression of LTM. In starved animals, octopamine suppresses food intake (we added the data). This is consistent with a function of Octopamine as a signal for the presence of food. Depending on when the signal comes, this might suppress the formation of STM or LTM.

      (3) The authors use t-test and ANOVA for most of the statistics, however, they did not perform normality tests. While I am quite sure that most datasets will pass normality test, nevertheless, this is required.

      Thanks for pointing this out. We have included a description in the “Materials and Methods” section that explains how we tested the data for normal distribution. We corrected the figure legends accordingly.

      “We used the Shapiro-Wilk test (significance level P < 0.05) followed by a QQ-Plot chart analysis to determine whether the data were normally distributed.”

      (4) While it is logical to assume that OA neurons are upstream to R15A04 DA neurons, I am not sure this really arises from the experiment that is presented here. It is well established that without activity in R15A04 DA neurons there is no LTM. Since OA acts to decrease LTM, can one really conclude anything about the location of OA effect when there is no learning?

      Normally control flies did not form memory 6 h after training, only Tbh mutants. We wanted to investigate what kind of memory develops in Tbh mutants. During the experiments of the manuscript, we kept the training procedure constant.

      (5) It is unclear how expression of a dominant negative form of insulin receptor (InR) in OA neurons can rescue the lack of OA due to the Tbh mutation. If OA neurons cannot release anything to the presumably downstream DA neurons, how can changing their internal signaling has any effect?

      Expression of the dominant-negative form of the insulin receptor signals the absence of food or low energy levels, whereas activation of the insulin receptor signals that there is enough food. The reward is a source of food, but its energy content is not high enough to fill the energy stores. Insulin receptor activation can activate at least three different signaling cascades, one of which might regulate octopamine release.

      While I stressed some comments that need to be addressed, the overall take-home message of the manuscript is supported and the authors do show that the metabolic state of the animal affects learning and memory. I do think though, that some more caution is required for some of the conclusions.

      We added additional data to address the points raised.

      Recommendations for the authors:

      We addressed all points raised by the reviewers, clarified the content or added more data.

      Reviewer #1 (Recommendations For The Authors):

      (1) Throughout the manuscript, the full stop of a sentence is always placed before the references.

      We fixed this.

      (2) I find the English in the manuscript not yet sufficient for publication. I suggest that the authors carefully revise the manuscript. I think if the sentences are structured a little more clearly, this paper has enormous potential to be read by your broad community.

      We agree and revised the manuscript. We hope the manuscript is now clearer.

      (3) Sentences l114 to l117 are misleading. The authors imply that they tested the same flies for changes in odor perception or sucrose sensitivity. I assume that the authors meant that they analyzed different groups of animals.

      We clarified the sentence as follows:

      “To ensure that the observed differences in learning and memory were not due to changes in odorant perception, odorant evaluation or sucrose sensitivity, different fly populations of the same genotypes were tested for their odorant acuity, odorant preference and their sucrose responsiveness (Table S1).”

      (4) In the title as well as in the abstract the influence of octopamine on appetitive memory formation is described in more detail, this is also the main focus of this study. However, in the introduction, the influence of the insulin receptor on memory formation is discussed first. Personally, I would describe this later in the manuscript, ideally in the results section. At this point in the manuscript, this leads to an interruption in the flow of reading.

      Thanks for the suggestion. We changed the order in the introduction.

      (5) The authors could consider, since they only used Drosophila melanogaster, changing "Drosophila melanogaster" to "Drosophila" throughout the manuscript.

      We modified the text accordingly.

      (6) All evaluations and statistical tests are state of the art. However, I have one comment. For each statistical test, a correction should be made depending on the number of tests. However, I could not determine whether this was also done for the parametric or non-parametric one-sample t-test. From the results and the methods section, I would guess not. Here I would recommend a Bonferroni correction or even better a Sidak-Holm correction. Furthermore, the authors could also go into more detail about which non-parametric one-sample t-test they used.

      We described the statistic used in more detail in the material and method section.

      “We used the Shapiro-Wilk test (significance level P < 0.05) followed by a QQ-Plot chart analysis to determine whether the data were normally distributed. For normally distributed data, we used the Student’s t test to compare differences between two groups and the one-way ANOVA with Tukey’s post hoc HSD test for differences between more than two groups. For nonparametric data, we used the Mann-Whitney U test for differences between two groups and for more than two groups the Kruskal-Wallis test with post hoc Dunn analysis and Bonferroni correction. The nonparametric one-sample sign test was used to analyze whether behavior was not based on random choice and differed from zero (P < 0.05). The statistical data analysis was performed using statskingdom (https://www.statskingdom.com).”
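      To make the one-sample sign test and the multiple-testing correction discussed in this exchange concrete, the following is a minimal Python sketch using only the standard library. It is not the authors' analysis, which was run on statskingdom. The function names (sign_test_p, bonferroni, holm) and the example values are hypothetical, and the step-down procedure shown is the plain Holm-Bonferroni method rather than the Sidak variant mentioned by the reviewer.

```python
from math import comb


def sign_test_p(values, mu0=0.0):
    """Two-sided exact one-sample sign test against the median mu0."""
    pos = sum(v > mu0 for v in values)
    neg = sum(v < mu0 for v in values)
    n = pos + neg  # observations equal to mu0 are discarded
    if n == 0:
        return 1.0
    k = min(pos, neg)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # the smallest p-value is multiplied by m, the next by m - 1, ...
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical performance indices for two groups (illustrative values only)
groups = {
    "control": [0.31, 0.22, 0.45, 0.18, 0.27, 0.39, 0.12, 0.33],
    "mutant": [0.05, -0.08, 0.11, -0.02, 0.07, -0.10, 0.03, -0.04],
}
raw = [sign_test_p(v) for v in groups.values()]
for name, p_raw, p_adj in zip(groups, raw, holm(raw)):
    print(f"{name}: raw P = {p_raw:.4f}, Holm-adjusted P = {p_adj:.4f}")
```

      The sign test only counts how many values fall above or below zero, so it makes no normality assumption. Holm's procedure is never less powerful than the plain Bonferroni correction and controls the same family-wise error rate.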

      (7) In nearly all figure legends the sentence "The letter "a" marks a significant difference from random choice as determined by a one-sample sign test (P < 0.05; P< 0.01)" occur. This is correctly indexed in the figures. However, I do not understand here what then P < 0.05; P**< 0.01 means. The significance level should be described here. I would strongly recommend the authors to make the definition clearer.

      We corrected this in the figure legends (see also above).

      (8) In Fig. 1B the labelling is a bit confusing. I interpret the two right groups as the mutants for octopamine, but there is still w[1118] in front.

      We modified the Figure 1B.

      Reviewer #2 (Recommendations For The Authors):

      Suggestions

      (1) Assessing the contribution of Tyramine in the observed phenotypes (for example by reducing the levels of Tyramine or its specific receptor) would help understand the contribution of Tyramine in the observed phenotypes.

      See comments above.

      We thank the reviewer for raising these points. The observed memory defects of the Tbh mutants can be solely explained by loss of octopamine. We included models into the manuscript to illustrate this (Figure 2 C and Figure 7E).

      To address whether the elevated levels of tyramine observed in Tbh mutants interfere with food consumption, we analyzed the effect of increased levels of tyramine and octopamine on food consumption. We included the data (Figure S2). An increase in tyramine levels did not result in a change in food intake, rather the increased octopamine levels reduced food intake. Our data show that the reduction of food intake observed in starved Tbh mutants is due to the increased internal energy supply.

      (2) Cell-specific inhibition of octopamine receptors should thus be performed to precisely interpret the observed phenotypes and dissect how interconnected the different phenotypes are, which is the object of this publication.

      We observed that the time point and duration of octopamine application changes the behavioral output. The behavior analyzed depends on pulses of octopamine and differences of the internal energy status. A cell-specific inhibition via RNAi knock down of octopamine receptors might not clarify the issue.

      (3) Defining of streamline and progressively integrating the different observations into a unifying model would improve the clarity and flow of the manuscript.

      We included models explaining the observed results (Figure 2C and Figure 7E).

      Minor comments

      Line 129: Figure 1B should be mentioned, not 2B.

      Figure 1 legend: E should be replaced by C (after A,B).

      Figure S5: what are the arrows pointing to? Why are the Inr foci visible in A not seen in B? It should be mCD8-GFP and not mCD on top of the images.

      We fixed this.

      Reviewer #3 (Recommendations For The Authors):

      Major:

      (1) Can one really conclude from Figure 2A that OA acts on R15A04 DA neurons? It is well established that without activity in these DA neurons there is no LTM. Since OA acts to decrease learning, how can one conclude anything about the location of the OA effect when there is no learning? With STM the situation was the opposite: OA supported learning and this was abolished when DA neurons were silenced. I think some supporting experiments are required, i.e. how OA affects DA neuron activity, or, alternatively, the writing should be toned down a bit.

      Normally control flies did not form memory 6 h after training, only Tbh mutants. We wanted to investigate what kind of memory develops in Tbh mutants. During the experiments of the manuscript, we kept the training procedure constant. The inhibition of dopaminergic neurons blocks the memory of Tbh mutants. Taken together, the duration of the memory, the cold-shock experiments and the inhibition of the dopaminergic neurons indicate that Tbh mutants develop LTM after training. This training does not evoke memory in controls.

      The loss of STM in mildly starved Tbh mutants depends on the integration of the high internal energy levels via InR signaling. Reducing the internal energy levels further by extending starvation results in STM, supporting the view that OA is not directly involved in the formation of STM.

      (2) Figure 4 requires some clarifications. In Supplementary Figure S2 the authors show that they could not manipulate glycogen levels in muscles. However, in Figure 4B they show that "Increasing glycogen levels in the muscles did not change short-term memory in 16 h starved flies, but the reduction in glycogen significantly improved memory strength (Figure 4B)" (lines 231-233). How can this be reconciled?

      While we could not detect differences in whole animals, we detected differences in glycogen content in body parts enriched with muscles or fat, e.g. thorax or abdomen when using UAS-GlyP-RNAi or UAS-GlyS-RNAi under the control of the respective Gal4 drivers.

      We added the data.

      Likewise, the authors write that "Increasing or decreasing glycogen levels in the fat bodies had no effect on memory performance (Figure 4C)" Line (233-234). However, in Figure S2 they show that they can only increase glycogen levels but not decrease them.

      As explained above the conclusion of Figure 4 "Thus, low levels of glycogen in the muscles upon starvation positively influence appetitive short-term memory, while high levels of glycogen in the muscles and fat body reduce short-term memory" lines 245-246, is not supported by the direct measurements of glycogen presented in Figure S2.

      We added the data showing that the reduction or increase can be measured when analyzing the specific body parts enriched in muscles tissue or fat tissue.

      (3) In cases where mutant flies do not display learning, a control should be done to see if they ate the sugar (with dye). Especially since the genetic manipulation affects metabolism.

      We analyzed how much sucrose the animals consumed in the behavioral test. Tbh and controls fed and there was no difference in feeding behavior between the mutants and the controls.

      “We next determined whether differences in preferences influence sucrose intake during training. Therefore, we measured the sucrose intake of starved flies in the behavioral set up. We used a food-colored sucrose solution and evaluated the presence of food in the abdomen of the fly after 2 min (Table S1). Flies fed sucrose within 2 min and there was no difference between w1118 and TβhnM18 flies.”

      (4) The use of t-test requires the data to be normally distributed. If I am not mistaken this was not demonstrated for any of the datasets used. I did a quick check on one of the datasets provided in the excel sheet and it is normally distributed. Therefore, please add normality test for all data sets. If some do not pass normality, please use a suitable non-parametric test.

      We added normality test to all data sets and used non-parametric tests for non-normal distributed data. We clarify this in the material and method section and the figure legends.

      (5) The authors show that OA also suppresses STM. This result is in contradiction to previously published results. This by itself is not a problem. However, this result also seems to me to be in contradiction to the authors' own results. According to Figure 1B, OA is required for STM as its absence in the tbh mutant results in loss of STM. According to Figure 2C, OA reduces STM as wt flies fed with OA just prior to learning do not form STM. This appears in other places in the manuscript as well.

      In addition, in the text lines 178-180, the authors write "A short pulse of octopamine before the training inhibits the STM. Thus, octopamine is a negative regulator of appetitive dopaminergic neuron-dependent long-term memory and can block STM." But in the summary they write "Octopamine is not required for short-term memory, since octopamine deficient mutants form appetitive short-term memory to sucrose and to other nutrients depending on the internal energy status." So, the take-home message regarding OA and STM is unclear.

      The authors need to better clarify this point.

      We clarified these points. See comments above. The loss of memory in Tbh mutants is not due to loss of octopamine, but increased energy levels that changes the reward properties of sucrose.

      (6) The manuscript is very difficult to follow. The authors constantly change between 16 and 40 hours starvation, short term memory, 3 hour memory and 6 hour memory. I think it would have been better to have a more focused manuscript. However, if this is not possible, I recommend preparing a diagram with the different neurons or signaling pathways (i.e. insulin) and how they affect each other. Also, perhaps add to each figure a panel describing exactly the experimental conditions. I think also simplifying the text and adding more conclusions throughout the results section will help the readers to follow. Finally, I think that it would help understanding the conclusions if the authors can add a diagram of the flow that they think occurs. For example, the authors show that glycogen suppresses learning as its reduction increases learning. They also show that InR activity receptor suppresses learning as its KD also increases learning. If I am not mistaken the link between the two is not straight forward (but I may be wrong here). A diagram of the flow would be very helpful.

      We prepared diagrams summarizing and explaining the results.

      Minor

      (1) I may not have understood correctly as I am not sure that I found Table S1.

      Also, there was no legend for Table S1.

      Nevertheless, if I understood correctly, the authors write that "Before the experiments, flies were tested to determine whether they perceived the odorants, preferred one odorant over other and responded to the reward similarly to ensure that the observed differences in behavior were not due to changes in odorant perception or sucrose sensitivity (Table S1)." However, according to the Table that I found it seems that following 40h starvation wt flies show preference to OCT whereas this does not occur for the mutant. Also, it seems that at 16h the mutant has a much higher preference to the odors than after 40h. This is a bit odd. I am also not sure what the balance value refers to. Finally, the mutant shows really low 2M sucrose preference after 40h. In general, this set of experiments requires a bit more explanation.

      I think it is better to show these experiments using graphs and add this to the supplementary figures.

      We clarified the experiments in the result section as follows and added an explanation to the material and method section. We tested the odorant acuity and sucrose preference for all genotypes used in the manuscript and added the data to the Table S1.

      “The flies of the different genotypes sensed the odorants and evaluated them as similarly salient in comparison. This is important to avoid a bias in the situation where flies have to choose between the two odorants after training. They also sensed sucrose. We next determined whether differences in preferences influence sucrose intake during training. Therefore, we measured the sucrose intake of starved flies in the behavioral set up. We used a food-colored sucrose solution and evaluated the presence of food in the abdomen of the fly after 2 min (Table S1). Flies fed sucrose within 2 min and there was no difference between w1118 and TβhnM18 flies.”

      (2) Line 129 should be Figure 1B

      Is corrected.

      (3) Line 133, Figure 1C, how can one explain the negative reinforcement? I can understand no reinforcement, but negative?

      The effect of glucose might be dose-dependent. A concentration of 0.15 M sucrose is much closer to a realistic concentration found in fruits than 2 M sucrose and might therefore elicit aversion. When animals are starved enough, they might find any food source attractive, even when the concentration of sucrose is unrealistic.

      (4) Figure 1, why are the graphs different between panel B and C?

      Is corrected.

      (5) In Figure S1, do the TβhnM18 groups differ significantly from zero? I think they do, so it would be better to state this somewhere. If not, the claims in lines 134-135 are not supported by the data.

      We added the significance and added the data to Figure 1.

      Figure S1 legend: there is no A panel. Also "below box blots" should be box plots.

      Thanks for pointing that out. We corrected it.

      (6) It is not clear what duration of starvation was used in Figure 2A. I assume that 16 h starvation and 2 M sucrose were used, but I would state that explicitly.

      We added the information to the figure legends.

      (7) Figure 2A is missing a control of flies with both the driver and UAS shibirets at the permissive temperature.

      We added the controls to the supplement (Figure S1).

      (8) It seems to me that Figure 3B, in which the authors state that "Only after 40 h of starvation did TβhnM18 mutants show a similar preference to control sucrose consumption" (line 198), is somewhat in contradiction to Table S1, in which I see a sucrose preference of 0.36 for wt and 0.17 for tbh. I think this comment arises because I did not understand Table S1 correctly, so please explain it better.

      We rewrote this section.

      (9) In Figure 3C, consider not using std as this stands for standard deviation and may be confusing.

      We now use the term “food” instead of “std” and explained in the legend that food means standard fly food.

      We fixed this.

      (10) Please check the Supplementary Figures. I think Figures S2 and S3 are switched.

      We fixed this.

      (11) There is a mistake in Figure S3A. The right column should have another "+" sign.

      Thanks, we fixed this.

      (12) I am somewhat puzzled by Figures 4 and 5. If I understand correctly figure 4B w1118 mef2-G4 is exactly the same experiment as Figure 5A w1118 mef2-G4 and yet in Figure 4B performance index is 0.2 and in Figure 5A about 0.4. According to other comparisons it seems to me that these will be significantly different and yet it is the same experiment.

      They are two independent experiments done at different times. The controls were independently repeated.

      (13) Line 273 should be Figure 5C.

      Is corrected.

      (14) I don't think this is a correct sentence "Virgin females remembered sucrose significantly better than mated females." Line 274.

      Reads now:

      “Virgin females remembered the odorant paired with sucrose significantly better than mated females.”

      (15) Line 340 there is no Figure 1E

      Is fixed (1 C)

      (16) The data excel file is difficult to follow. In Figure 2 there are references to Figure 5. The graphs are pointing to other files. Text is not always in English. It is not clear what W stands for. I recommend making it more accessible.

      We corrected the data excel files.

      (17) The manuscript is difficult to follow. I recommend preparing a diagram with the different neurons or signaling pathways (i.e. insulin) and how they affect each other.

      We improved the data presentation by

      a) adding a model showing the kinetics of memory formation in controls and mutants (Figure 2C)

      b) a model explaining how the internal state is integrated into the formation of memory (Figure 7D).

    2. eLife assessment

      This important study dissects the role of octopamine in the interplay between internal energy homeostasis, food intake and food-related memories. The solid experimental evidence will shed additional light on previously published work and should be of interest to the growing community of biologists interested in how internal state shapes behavior, including decision making processes, learning and memory.

    3. Reviewer #1 (Public Review):

      The present study conducted by Berger et al. delves into the extent to which memory formation relies on available energy reserves. While aversive memory formation has been extensively studied in this context, the investigation into appetitive memory formation has been comparatively sparse. It has long been recognized that flies can only form appetitive memory under conditions of starvation. However, the authors of this study go beyond this understanding by revealing that not only the duration of starvation matters, but it also dictates the type of memory formed, whether short- or long-term memory. The authors illustrate that internal glycogen stores play a crucial role in this process, facilitated by insulin-like signaling in octopaminergic reward neurons, which integrates internal energy reserves into memory formation. Consequently, the authors propose that octopamine serves as a negative regulator of various forms of memory, shedding light on the enduring question of the octopaminergic neuronal system's involvement in appetitive memory formation, which has been overshadowed by the focus on the dopaminergic system in recent years. Additionally, the findings contribute to the ongoing debate concerning the role of insulin receptors, whether they function within neurons themselves or in glial cells. Moreover, the authors not only convincingly demonstrate that octopamine negatively regulates appetitive memory formation, but they also propose a mechanism whereby the insulin receptor in octopaminergic neurons senses the internal energy status and subsequently modulates the activity of these neurons. The experiments are meticulously designed, employing a variety of behavioral assays, genetic tools for manipulating neuronal activity, and state-of-the-art imaging techniques. The conclusions are well supported by the data and carefully performed controlled experiments, yielding high-quality data.

    4. Reviewer #2 (Public Review):

      How organism physiological state modulates establishment and perdurance of memories is a timely question that the authors aimed at addressing by studying the interplay between energy homeostasis and food-related conditioning in Drosophila. Specifically, they studied how starvation modulates the establishment of short-term vs long-term memories and clarified the role of the monoamine Octopamine in food-related conditioning, showing that it is not per se involved in formation of appetitive short-term memories but rather gates memory formation by suppressing LTM when energy levels are high. This work clarifies previously described phenotypes and provides insight about interconnections between energy levels, feeding and formation of short-term and long-term food-related memories.

      Strengths

      - Previous studies have documented the impact of Octopamine on different aspects of food-related behaviors (regulation of energy homeostasis, feeding, sugar sensing, appetitive memory...), but we currently lack a clear understanding of how these different functions are interconnected. The authors have used a variety of experimental approaches to systematically test the impact of internal energy levels in establishment of different forms of appetitive memory and the role of Octopamine in this process.

      - The authors have used a range of approaches, performed carefully controlled experiments and produced high quality data.

    5. Reviewer #3 (Public Review):

      In this manuscript, Berger et al. study how internal energy storage influences learning and memory. Since octopamine (OA) is involved in the regulation of energy homeostasis in Drosophila melanogaster, they focus on the roles of OA. To do so, they use the tyramine-β-hydroxylase (Tbh) mutant, which lacks the neurotransmitter OA, and study short-term memory (STM), long-term memory (LTM) and anesthesia-resistant memory (ARM). They show that the duration of starvation affects the magnitude of both short- and long-term memory. In addition, they show that OA has a suppressive effect on learning and memory. In terms of energy storage, they show that internal glycogen storage influences how long sucrose is remembered, and high glycogen suppresses memory. Finally, they show that insulin-like signaling in octopaminergic neurons, which is also related to internal energy storage, suppresses learning and memory.

      The revised version of the manuscript is greatly improved, and I thank the authors for taking the comments seriously. This is an important study that extends our knowledge on OA activity in learning and memory and the effects the metabolic state has on learning and memory. The authors nicely use the genetic tools available in flies to try and unravel the complex circuitry of metabolic state level, OA activity and learning and memory. The overall take-home message of the manuscript is clear and supported by the data presented.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This study reports important evidence that infants' internal factors guide children's attention and that caregivers respond to infants' attentional shifts during caregiver-infant interactions. The authors analyzed EEG data and multiple types of behaviors using solid methodologies that can guide future studies of neural responses during social interaction in infants. However, the analysis is incomplete, as several methodological choices need more adequate justification.

      Reviewer #1

      Public Review:

      The authors bring together multiple study methods (brain recordings with EEG and behavioral coding of infant and caregiver looking, and caregiver vocal changes) to understand social processes involved in infant attention. They test different hypotheses on whether caregivers scaffold attention by structuring a child's behavior, versus whether the child's attention is guided by internal factors and caregivers then respond to infants' attentional shifts. They conclude that internal processes (as measured by brain activation preceding looking) control infants' attention, and that caregivers rapidly modify their behaviors in response to changes in infant attention.

      The study is meticulously documented, with cutting-edge analytic approaches to testing alternative models; this type of work provides a careful and well-documented guide for how to conduct studies and process and analyze data for researchers in the relatively new area of neural response in infants in social contexts.

      We are very pleased that R1 considers our work an important contribution to this developing field, and we hope that we have now addressed their concerns below.

      Some concerns arise around the use of terms (for example, an infant may "look" at an object, but that does not mean the infant is actually "attending"); the collapsing of different types of looks (to people and objects); and the averaging of data across infants, which may mask some of the individual patterns.

      We thank the reviewer for this feedback and their related comments below, and we feel that our manuscript is much stronger as a result of the changes we have made. Please see below for a detailed description of our rationale for defining and analysing the attention data, as well as the textual changes made in response to the reviewer’s comments.

      Recommendations For The Authors

      This paper is rigorous in method, theoretically grounded, and makes an important contribution to understanding processes of infant attention, brain activity, and the reciprocal temporal features of caregiver-infant interactions. The alternative hypothesis approach sets up the questions well (although authors should temper any wording that suggests attention processes are one or the other. That is, certain bouts of infant attention can be guided by exogenous factors such as social input, and others be endogenous; so averaging across all bouts can actually mask the variation in these patterns). I appreciated the focus on multiple types of behavior (e.g., gaze, vocal fluctuations in maternal speech); the emphasis on contingent responding; and the very clear summaries of takeaways after each section. Furthermore, methods and analyses are well described, details on data processing and so on are very thorough, and visualizations aptly facilitate data interpretation. However, I am not an expert on infant neural responses in EEG and assume that a reviewer with such expertise will weigh in on the treatment and quality of the data; therefore, my comments should be interpreted in light of this lack of knowledge.

      We thank R1 for these very positive and insightful comments on our analyses which are the result of a number of years of methodological and technical developmental work.

      We do agree with R1 that we should more carefully word parts of our argument in the Introduction to make clear the fact that shifts in infant attention could be driven by a combination of interactive and endogenous influences. As a result of this comment, we have made direct changes to parts of the Introduction; removing any wording that suggests that these processes are ‘alternative’ or ‘separate’, and our overall aim states: ‘Here, recording EEG from infants during naturalistic interactions with their caregiver, we examined the (inter)-dependent influences of infants’ endogenous oscillatory neural activity, and inter-dyadic behavioural contingencies in organising infant attention’.

      Examining how the factors that influence the length and timing of an attention episode vary between infant attention episodes is an important area for future investigation. We now include a discussion of this on page 38 of the Discussion section, with suggestions for how this could be examined. Investigating different subtypes of infant attention is methodologically challenging, given the number of infant behaviours that would need to inform such an analysis, all of which are time-consuming to code. Developing automated methods for performing these kinds of analyses is an important avenue for future work.

      Here, I review various issues that require revision or elaboration based on my reading of what I consider to otherwise be a solid and important research paper.

      Problem in the use of the term attention scaffolding. Although there may be literature precedent in the use of this term, it is problematic to narrowly define scaffolding as mother-initiated guidance of attention. A mother who responds to infant behaviors, but expands on the topic or supports continued attention, and so on, is scaffolding learning to a higher level. I would think about a different term because it currently implies a caregiver as either scaffolding OR responding contingently. It is not an either-or situation in conceptual meaning. In fact, research on social contingency (or contingent responsiveness), often views the follow-in responding as a way to scaffold learning in an infant.

      Yes, we agree with R1 that the term ‘attention scaffolding’ could be confusing given the use of this term in previous work conducted with children and their caregivers in problem-solving tasks, that emphasise modulations in caregiver behaviour as a function of infant behaviour. As a result of this suggestion, we have made direct edits to the text throughout, replacing the term attentional scaffold with terms such as ‘organise’ and ‘structure’ in relation to the caregiver-leading or ‘didactic’ perspective, and terms such as ‘contingent responding’ and ‘dynamic modulation’ in relation to the caregiver-following perspective. We feel that this has much improved the clarity of the argument in the Introduction and Discussion sections.

      Do individual data support the group average trends? My concern with unobservable (by definition) is that EEG data averages may mask what's going on in individual brain response. Effects appear to be small as well, which occurs in such conditions of averaging across perhaps very variable response patterns. In the interest of full transparency and open science, how many infants show the type of pattern revealed by the average graph (e.g., do neural markers of infant engagement forward predict attention for all babies? Majority?). Non-parametric tests on how many babies show a claimed pattern would offer the litmus test of significance on whether the phenomenon is robust across infants or pulled by a few infants with certain patterns of data. Ditto for all data. This would bolster my confidence in the summaries of what is going on in the infant brain. (The same applies as I suggest to attention bouts. To what extent does the forward-predict or backward-predict pattern work for all bouts, only some bouts, etc.?). I recognize that to obtain power, summaries are needed across infants and bouts, but I want to know if what's being observed is systematic.
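
      One way to make the reviewer's suggestion concrete is an exact two-sided sign test on the number of infants whose individual data go in the direction of the group effect. The sketch below is illustrative only: the function name and the counts are hypothetical and are not taken from the manuscript, and it assumes each infant can be classified as showing or not showing the pattern.

```python
from math import comb

def sign_test_p(n_showing, n_total):
    """Exact two-sided sign test against a 50/50 null.

    n_showing: number of infants whose data show the pattern (hypothetical input).
    n_total:   number of infants classified.
    """
    # Use the larger of the two counts so the upper tail is always computed.
    k = max(n_showing, n_total - n_showing)
    upper_tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    # Double the one-sided tail and cap at 1 for a two-sided p-value.
    return min(1.0, 2 * upper_tail)

# Made-up counts, purely for illustration.
p_consistent = sign_test_p(9, 10)   # 9 of 10 infants show the pattern
p_mixed = sign_test_p(5, 10)        # 5 of 10 infants show the pattern
```

      A small p-value here would indicate that the pattern is shared by most infants rather than driven by a few; as the authors note in their reply, noisy single-participant EEG limits how reliably each infant can be classified in the first place.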

      We thank R1 for this comment and understand their concern that the overall pattern of findings reported in relation to the infants’ EEG data might obscure inter-individual variability in the associations between attention and theta power. Averaging across individual participant EEG responses is, however, the gold standard way to perform both the event-locked (Jones et al., 2020) and continuous (Attaheri et al., 2020) methods of EEG analysis that are reported in the current manuscript. EEG data, and naturalistic EEG data in particular, are inherently noisy, and averaging across participants increases the signal-to-noise ratio (i.e. inconsistent, and, therefore, non-task-related activity is averaged out of the response (Cohen, 2014; Noreika et al., 2020)). Examining individual EEG responses is unlikely to tell us anything meaningful: if a response is not found for a particular participant, it could be that the response is not present for that participant, or that it is present but the EEG recording for that participant is too noisy to show the effect. Computing group-level effects, as is most common in neuroimaging analyses, is therefore the most appropriate way to examine our main research questions.

      The findings reported in this analysis also replicate previous work conducted by our lab which showed that infant attention to objects significantly forward-predicted increases in infant theta activity during joint table-top play with their caregiver, involving one toy object (compared to our paradigm, which involved three; Wass et al., 2018). More recent work conducted by our lab has also shown continuous and time-locked associations between infant look durations and infant theta activity when infants play with objects on their own (Perapoch Amadó et al., 2023). To reassure readers of the replicability of the current findings, we now reference the Wass et al. (2018) study at the beginning of the Discussion section.

      Could activity artifacts lead to certain reported trends? Babies typically look at an object before they touch or manipulate the object, and so longer bouts of attention likely involve a look and then a touch for lengthier time frames. If active involvement with an object (touching for example) amplifies theta activity, that may explain why attention duration forward predicts theta power. That is, baby looks, then touches, then theta activates, and coding would show visual gaze preceding the theta activation. Careful alignment of infants' touches and other such behaviors with the theta peak might help address this question, again to lend confidence to the robustness of the interpretation.

      Yes, again this is a very important point, and the removal of movement-related artifact is something we have given careful attention to in the analysis of our naturalistic EEG data (Georgieva et al., 2020; Marriott Haresign et al., 2021). As a result of this comment, we have made direct changes to the Results section on page 18 to more clearly direct the reader to our EEG pre-processing section before presenting the results of the cross-correlation analyses.

      As we describe in the Methods section of the main text, movement-related artifacts are removed from the data with ICA decomposition, utilising an automatic-rejection algorithm, specially designed for work with our naturalistic EEG data (Marriott Haresign et al., 2021). Given that ICA rejection does not remove all artifact introduced to the EEG signal, additional analysis steps were taken to reduce the possibility that movement artifacts influenced the results of the reported analyses. As explained in the Methods section, rather than absolute theta power, relative theta was used in all EEG analyses, computed by dividing the power at each theta frequency by the summed power across all frequencies. Eye and head movement-related artifacts most often associate with broadband increases in power in the EEG signal (Cohen, 2014): computing relative theta activity therefore further reduces the potential influence of artifact on the EEG signal.
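
      The relative-power normalisation described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name, the toy spectrum, and the modelling of a broadband artifact as a uniform gain are all assumptions made for the example, and the 3-6 Hz band is the theta range quoted elsewhere in this response.

```python
def relative_band_power(power, freqs, band=(3.0, 6.0)):
    """Relative power at each in-band frequency.

    Each in-band power value is divided by the power summed across all frequencies.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("summed power must be positive")
    return [p / total for p, f in zip(power, freqs) if band[0] <= f <= band[1]]

# Toy spectrum (made-up values): 1-16 Hz in 1 Hz steps, with extra power at 3-6 Hz.
freqs = [float(f) for f in range(1, 17)]
spectrum = [5.0 if 3 <= f <= 6 else 1.0 for f in freqs]

# Simplifying assumption: a broadband artifact scales power equally at every frequency.
with_artifact = [10.0 * p for p in spectrum]

rel_clean = relative_band_power(spectrum, freqs)
rel_artifact = relative_band_power(with_artifact, freqs)
```

      Under this simplified model the two results are identical, because a gain applied to every frequency cancels in the ratio. Real artifacts are not perfectly uniform across frequencies, so the normalisation reduces rather than eliminates their influence.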

      It is also important to highlight that previous work examining movement artifacts in controlled paradigms with infants has shown that limb movements actually associate with a decrease in power at theta frequencies, compared to rest (Georgieva et al., 2020). It is therefore unlikely that limb movement artifacts explain the pattern of association observed between theta power and infant attention in the current study.

      That said, examining the association between body movements and fluctuations in EEG activity during naturalistic interactions is an important next step, and something our lab is currently working on. Given that touching an object is most often the end-state of a larger body movement, aligning the EEG signal to the onset of infant touch is not all that informative to understanding how body movements associate with increases and decreases in power in the EEG signal. Our lab is currently working on developing new methods using motion tracking software and arousal composites to understand how data-derived behavioural sub-types associate with differential patterns of EEG activity.

      The term attention may be misleading. The behavior being examined is infant gaze or looks, with the assumption that gaze is a marker of "attention". The authors are aware that gaze can be a blank stare that doesn't reflect underlying true "attention". I recommend substitution of a conservative, more precise term that captures the variable being measured (gaze); it would then be fine to state that, in their interpretation, gaze is taken as a marker for attention or something like that. At minimum, using the term "visual attention" can be a solution if the authors do not want to use the precise term gaze. As an example, the sentence "An attention episode was defined as a discrete period of attention towards one of the play objects on the table, or to the partner" should be modified so that the episode is defined as looking at a play object or partner.

      We thank the reviewer for this comment, and we understand their concern with the use of the term ‘attention’ where we are referring to shifts in infant eye gaze. However, the use of this term to describe patterns of infant gaze, irrespective of whether they are ‘actually attending’ or not is used widely in the literature, in both interactive (e.g. Yu et al., 2021) and screen-based experiments examining infant attention (Richards, 2010). We therefore feel that its use in our current manuscript is acceptable and consistent with the reporting of similar interaction findings. On page 39 of the Discussion we now also include a discussion on how future research might further investigate differential subtypes of infant looks to distinguish between moments where infants are attending vs. just looking.

      Why collapse across gaze to object vs. other? Conceptually, it's unclear why the same hypotheses and research questions on neural-attention (i.e., gaze in actuality) links would apply to looks to a mom's face or to an object. Some rationale would be useful to the reader as to why these two distinct behaviors are taken as following the same principles in ordering of brain and behavior. Perhaps I missed something, however, because later in the Discussion the authors state that "fluctuations in neural markers of infants' engagement or interest forward-predict their attentiveness towards objects", which suggests there was an object-focused variable only? Please clarify. (Again, sorry if I missed something).

      This is a really important point, and we agree with R1 that it could have been more clearly expressed in our original submission, for which we apologise. In the cross-correlation analyses conducted in parts 2 and 3, which examine forwards-predictive associations between infant attention durations and infant endogenous oscillatory activity (part 2) and caregiver behaviour (part 3), we include, as R1 describes, all infant looks towards objects and their partner. Including all infant look types is necessary to produce a continuous variable to cross-correlate with the other continuous variables (e.g. theta activity, caregiver vocal behaviours), and these analyses therefore do not concentrate only on infant attention episodes towards objects.
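
      To illustrate what a forwards-predictive association looks like in a cross-correlation, the sketch below simulates two continuous series in which one follows the other with a fixed delay, then computes the correlation at a range of lags. The data, the two-sample delay, and the function names are all invented for the example; this is not the authors' analysis code.

```python
import random
from math import sqrt
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)

def lagged_correlation(x, y, lag):
    """Correlate x[t] with y[t + lag]; a positive lag means x leads y."""
    if lag >= 0:
        return pearson(x[:len(x) - lag], y[lag:])
    return pearson(x[-lag:], y[:len(y) + lag])

rng = random.Random(0)            # fixed seed so the simulation is repeatable
n, true_lag = 300, 2
x = [rng.gauss(0, 1) for _ in range(n)]
# y copies x two samples later, with independent noise added.
y = [rng.gauss(0, 1) if t < true_lag else x[t - true_lag] + 0.5 * rng.gauss(0, 1)
     for t in range(n)]

profile = {lag: lagged_correlation(x, y, lag) for lag in range(-5, 6)}
best_lag = max(profile, key=profile.get)
```

      In this simulation the correlation profile is near zero everywhere except at the positive lag matching the built-in delay, which is the signature of one series forward-predicting the other. Real behavioural and EEG series are autocorrelated, so their profiles are broader and need a significance test against a null distribution.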

      We take the reviewers’ point that different attention and neural mechanisms may be associated with looks towards objects vs. the partner, which we now acknowledge directly on page 10 of the Introduction. However, our focus here is on the endogenous and interactive mechanisms that drive fluctuations in infant engagement with the ongoing, free-flowing interaction. Indeed, previous work has shown increases in theta activity during sustained episodes of infant attention to a range of different stimuli, including cartoon videos (Xie et al., 2018), real-life screen-based interactions (Jones et al., 2020), as well as objects (Begus et al., 2016). In the second half of part 2, we go on to address the endogenous processes that support infant attention episodes specifically towards objects.

      As a result of this comment, we have made direct changes to the Introduction on page 10 to more clearly explain the looking behaviours included in the cross-correlation analysis, and the rationale for conducting the analysis in this way, which differs from the reactive analyses conducted in the second half of parts one and three, which examine infant object looks only. Direct edits to the text have also been made throughout the Results and Methods sections as a result of this comment, to more clearly specify the types of looks included in each analysis. Now, where we discuss the cross-correlation analyses we refer only to infant ‘attention durations’ or infant ‘attention’, whilst ‘object-directed attention’ and ‘looks towards objects’ are clearly specified in sections discussing the reactive analyses conducted in parts 2 and 3. We have also amended the Discussion on page 31 so that the cross-correlation analyses are interpreted relative to infants' overall attention, rather than their attention towards objects only.

      Why are mothers' gazes shorter than infants' gazes? This was the flip of what I'd expect, so some interpretation would be useful to understanding the data.

      This is a really interesting observation. Our findings of the looking behaviour of caregivers and infants in our joint play interactions actually correspond to much previous micro-dynamic analysis of caregiver and infant looking behaviour during early table-top interactions (Abney et al., 2017; Perapoch Amadó et al., 2023; Yu & Smith, 2013, 2016). The reason for the shorter look durations in the adult is due to the fact that the caregivers alternate their gaze between their infant and the objects (i.e. they spend a lot of the interaction time monitoring their infants’ behaviours). This can be seen in Figure 2 (see main text) which shows that caregiver looks are divided between looks to their infants and looks towards objects. In comparison, infants spend most of their time focussing on objects (see Figure 2, main text), with relatively infrequent looks to their caregiver. As a result, infant looks are, overall, longer in comparison to their caregivers’.

      Minor points

      Use the term association or relation (relationships is for interpersonal relationships, not in statistics).

      This has now been amended throughout.

      I'm unsure I'd call the interactions "naturalistic" when they occur at a table, with select toys, EEG caps on partners, and so on. The term seems more appropriate for studies with fewer constraints that occur (for example) in a home environment, etc.

      We understand R1's concern with our use of the term ‘naturalistic’ to refer to the joint play interactions that we analyse in the current study. However, we feel the term is appropriate, given that the interactions are unstructured: the only instruction given to caregivers at the beginning of the interaction is to play with their infants in the way that they might do at home. The interactions, therefore, measure free-flowing caregiver and infant behaviours, where modulations in each individual’s behaviour are the result of the intra- and inter-individual dynamics of the social exchange. This is in comparison to previous work on early infant attention development, which has used more structured designs in which modulations in infant behaviour occur as a result of the parameters of the experimental task.

      Reviewer #2

      Public Review

      Summary:

      This paper acknowledges that most development occurs in social contexts, with other social partners. The authors put forth two main frameworks of how development occurs within a social interaction with a caregiver. The first is that although social interaction with mature partners is somewhat bi-directional, mature social partners exogenously influence infant behaviors and attention through "attentional scaffolding", and that in this case infant attention is reactive to caregiver behavior. The second framework posits that caregivers support and guide infant attention by contingently responding to reorientations in infant behavior, thus caregiver behaviors are reactive to infant behavior. The aim of this paper is to use moment-to-moment analysis techniques to understand the directionality of dyadic interaction. It is difficult to determine whether the authors prove their point as the results are not clearly explained as is the motivation for the chosen methods.

      Strengths

      The question driving this study is interesting and a genuine gap in the literature. Almost all development occurs in the presence of a mature social partner. While it is known that these interactions are critical for development, the directionality of how these interactions unfold in real-time is less known.

      The analyses largely seem to be appropriate for the question at hand, capturing small moment-to-moment dynamics in both infant and child behavior, and their relationships with themselves and each other. Autocorrelations and cross-correlations are powerful tools that can uncover small but meaningful patterns in data that may not be uncovered with other more discretized analyses (i.e. regression).

      We are pleased that R2 finds our work to be an interesting contribution to the field, which utilises appropriate analysis techniques.

      Weaknesses

      The major weakness of this paper is that the reader is assumed to understand why these results lead to their claimed findings. The authors need to describe more carefully their reasoning and justification for their analyses and what they hope to show. While a handful of experts would understand why autocorrelations and cross-correlations should be used, they are by no means basic analyses. It would also be helpful to use simulated data or even a simple figure to help the reader more easily understand what a significant result looks like versus an insignificant result.

      We thank the reviewer for this comment, and we agree that much more detail should be added to the Introduction section. As a result of this comment, we have made direct changes to the Introduction on pages 9-11 to more clearly detail these analysis methods, our rationale for using these methods; and how we expect the results to further our understanding of the drivers of infant attention in naturalistic social interactions.

      We also provide a figure in the SM (Fig. S6) to help the reader more clearly understand the permutation method used in our statistical analyses described in the Methods, on page 51, which depicts significant vs. insignificant patterns of results against their permutation distribution.
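
      The logic of comparing an observed statistic against its permutation distribution can be sketched as follows. This is a generic permutation test on a correlation, written for illustration with simulated data; the exact permutation scheme used in the manuscript is not described in this response, so the shuffling procedure, function names, and parameter values below are assumptions.

```python
import random
from math import sqrt
from statistics import mean

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)

def permutation_test(x, y, n_perm=1000, seed=0):
    """Return the observed correlation and a two-sided permutation p-value."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)                      # break the pairing between x and y
        if abs(pearson(x, shuffled)) >= abs(observed):
            n_extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (n_extreme + 1) / (n_perm + 1)

rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(60)]
y_related = [v + 0.5 * rng.gauss(0, 1) for v in x]   # simulated: depends on x
y_unrelated = [rng.gauss(0, 1) for _ in x]           # simulated: independent of x

r_related, p_related = permutation_test(x, y_related)
r_unrelated, p_unrelated = permutation_test(x, y_unrelated)
```

      A result counts as significant when the observed value lies in the extreme tail of the shuffled values, as for the related pair here, and as non-significant when it falls within the bulk of that distribution. Simple shuffling destroys temporal structure, so for autocorrelated time series a scheme that preserves it is usually preferable.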

      While the overall question is interesting the introduction does not properly set up the rest of the paper. The authors spend a lot of time talking about oscillatory patterns in general but leave very little discussion to the fact they are using EEG to measure these patterns. The justification for using EEG is also not very well developed. Why did the authors single out fronto-temporal channels instead of using whole brain techniques, which are more standard in the field? This is idiosyncratic and not common.

      We very much agree with R2 that the rationale and justification for using EEG to understand the processes that influence infants’ attention patterns is under-developed in the current manuscript. As a result of this comment we have made direct edits to the Introduction section of the main text on pages 7-8 to more clearly describe the rationale for examining the relationship between infant EEG activity and their attention during the play interactions with their caregivers.

      As we describe in the Introduction section, previous behavioural work conducted with infants has suggested that endogenous cognitive processes (i.e. fluctuations in top-down cognitive control) might be important in explaining how infants allocate their attention during free-flowing, naturalistic interactions towards the end of the first year. Oscillatory neural activity occurring at theta frequencies (3-6Hz), which can be measured with EEG, has previously been associated with top-down intrinsically guided attentional processes in both adulthood and infancy (Jones et al., 2020; Orekhova, 1999; Xie et al., 2018). Measuring fluctuations in infant theta activity therefore provides a method to examine how endogenous cognitive processes structure infant attention in naturalistic social interactions which might be otherwise unobservable behaviourally.

      It is important to note that the Introduction distinguishes between two different oscillatory mechanisms that could possibly explain the organisation of infant attention over the course of the interaction. The first refers to oscillatory patterns of attention, that is, consistent attention durations produced by infants that likely reflect automatic, regulatory functions, related to fluctuations in infant arousal. The second mechanism is oscillatory neural activity occurring at theta frequencies, recorded with EEG, which, as mentioned above, is thought to reflect fluctuations in intrinsically guided attention in early infancy. We have amended the Introduction to make the distinction between the two more clear.

      A worrisome weakness is that the figures are not consistently formatted. The y-axes are not consistent within figures making the data difficult to compare and interpret. Labels are also not consistent and very often the text size is way too small making reading the axes difficult. This is a noticeable lack of attention to detail.

      This has now been adjusted throughout, where appropriate.

      No data is provided to reproduce the figures. This does not need to include the original videos but rather the processed and de-identified data used to generate the figures. Providing the data to support reproducibility is increasingly common in the field of developmental science and the authors are greatly encouraged to do so.

      This will be provided with the final manuscript.

      Minor Weaknesses

      Figure 4, how is the pattern in a not significant while in b a very similar pattern with the same magnitude of change is? This seems like a spurious result.

      The statistical analysis conducted for all cross-correlation analyses reported follows a rigorous and stringent permutation-based temporal clustering method which controls for family-wise error rate using a non-parametric Monte Carlo method (see Methods in the main text for more detail). Permutations are created by shuffling data sets between participants and, therefore, patterns of significance identified by the cluster-based permutation analysis will depend on the mean and standard deviation of the cross-correlations in the permutation distribution. Fig. S6 now depicts the cross-correlations against their permutation distributions which should help readers to understand the patterns of significance reported in the main text.
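
The logic of a between-participant permutation test on cross-correlations can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function names, the fixed cluster-forming threshold of 0.2, and the use of summed |r| as the cluster statistic are all assumptions made here. The point it shows is that the null distribution is built by re-pairing infant and caregiver series across dyads, so significance depends on how the observed curve compares with curves from mismatched pairs.

```python
import random

def _pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va > 0 and vb > 0 else 0.0

def xcorr(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for lag = -max_lag..max_lag."""
    out = []
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        out.append(_pearson(a, b))
    return out

def group_mean_xcorr(xs, ys, max_lag):
    """Average the per-dyad cross-correlation curves lag by lag."""
    curves = [xcorr(x, y, max_lag) for x, y in zip(xs, ys)]
    return [sum(col) / len(col) for col in zip(*curves)]

def largest_cluster_mass(curve, threshold):
    """Largest summed |r| over a run of adjacent lags whose |r| exceeds the threshold."""
    best = run = 0.0
    for r in curve:
        run = run + abs(r) if abs(r) > threshold else 0.0
        best = max(best, run)
    return best

def cluster_permutation_p(xs, ys, max_lag, threshold=0.2, n_perm=200, seed=0):
    """Compare the observed largest cluster with clusters from randomly re-paired dyads."""
    rng = random.Random(seed)
    observed = largest_cluster_mass(group_mean_xcorr(xs, ys, max_lag), threshold)
    null = []
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)  # randomly re-pair infant and caregiver series across dyads
        null.append(largest_cluster_mass(group_mean_xcorr(xs, shuffled, max_lag), threshold))
    p = (1 + sum(m >= observed for m in null)) / (1 + n_perm)
    return observed, p
```

Taking the maximum cluster per permutation is what controls the family-wise error rate across lags: each shuffled data set contributes only its single largest cluster to the null distribution.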

      The correlations appear very weak in Figures 3b, 5a, 7e. Despite a linear mixed effects model showing a relationship, it is difficult to believe looking at the data. Both the Spearman and Pearson correlations for these plots should be clearly included in the text, figure, or figure legend.

      We thank the reviewer for this comment, and agree that reporting the correlations for these plots would strengthen the findings of the linear mixed effects models reported in the text. As a result, we have added both Spearman and Pearson correlations to the legends of Figures 3b, 5a and 7e, corresponding to the statistically significant relationships examined in the linear mixed effects models. The strength of the relationships is entirely consistent with that documented in previous research that used similar methods (e.g. Piazza et al., 2018). How strong the relationship looks to the observer is entirely dependent on the graphical representation chosen to represent it. We have chosen to present the data in this way because we feel that it is the most honest way to represent the statistically significant, and very carefully analysed, effects that we have observed in our data.
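
For readers less familiar with the two coefficients, the sketch below shows how they differ. It is a generic illustration with hypothetical function names, not code from the study: Pearson measures linear association on the raw values, while Spearman is the Pearson correlation of the ranks and therefore captures any monotonic relationship.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))
```

Reporting both is informative because a large gap between them would suggest a monotonic but non-linear relationship, or the influence of a few extreme points.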

      Linear mixed effects models need more detail. Why were they built the way they were built? I would have appreciated seeing multiple models in the supplementary methods and a reasoning to have landed on one. There are multiple ways I can see this model being built (especially with the addition of a random intercept). Also, there are methods to test significance between models and aid in selection. That being said, although participant identity is a very common random effect, its use should be clearly stated in the main text.

      We very much agree with R2 that the reporting of the linear mixed effects models needs more detail and this has now been added to the Method section (page 54). Whilst it is true that there are multiple ways in which this model could be built, given the specificity of our research questions, regarding the reactive changes in infant theta activity and caregiver behaviours that occur after infant look onsets towards objects (see pages 9-11 of the Introduction), we take a hypothesis driven approach to building the linear mixed effects models. As a result, random intercepts are specified for participants, as well as uncorrelated by-participant random slopes (Brown, 2021; Gelman & Hill, 2006; Suarez-Rivera et al., 2019). In this way, infant look durations are predicted from caregiver behaviours (or infant theta activity), controlling for between participant variability in look durations, as well as the strength of the effect of caregiver behaviours (or infant theta activity) on infant look durations.
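
The model structure described above can be written out as follows. The notation is assumed here for illustration and is not taken from the manuscript: $y_{ij}$ stands for the duration of look $i$ by participant $j$, and $x_{ij}$ for the predictor (a caregiver behaviour or infant theta activity).

```latex
\begin{aligned}
y_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,x_{ij} + \varepsilon_{ij},\\
u_{0j} &\sim \mathcal{N}(0, \tau_0^2), \qquad
u_{1j} \sim \mathcal{N}(0, \tau_1^2), \qquad
\operatorname{Cov}(u_{0j}, u_{1j}) = 0, \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

Here $u_{0j}$ is the by-participant random intercept and $u_{1j}$ the by-participant random slope; fixing their covariance at zero is what "uncorrelated" means. In lme4-style syntax this kind of model is commonly written as `y ~ x + (1 + x || participant)`, where the variable names are placeholders rather than the authors' own.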

      Some parentheses aren't closed, a more careful re-reading focusing on these minor textual issues is warranted.

      This has now been corrected.

      Analysis of F0 seems unnecessarily complex. Is there a reason for this?

      Computation of the continuous caregiver F0 variable may seem complex but we feel that all analysis steps are necessary to accurately and reliably compute this variable in our naturalistic, noisy and free-flowing interaction data. For example, we place the F0 only into segments of the interaction identified as the mum speaking so that background noises and infant vocalisations are not included in the continuous variable. We then interpolate through unvoiced segments (similar to Räsänen et al., 2018), and compute the derivative in 1000ms intervals as a measure of the rate of change. The steps taken to compute this variable have been both carefully and thoughtfully selected given the many ways in which this continuous rate of change variable could be computed (cf. Piazza et al., 2018; Räsänen et al., 2018).

      The choice of a 20hz filter seems odd when an example of toy clacks is given. Toy clacks are much higher than 20hz, and a 20hz filter probably wouldn't do anything against toy clacks given that the authors already set floor and ceiling parameters of 75-600Hz in their F0 extraction.

      We thank the reviewer for this comment and we can see that this part of the description of the F0 computation is confusing. A 20Hz low pass filter is applied to the data stream after extracting the F0 with floor and ceiling parameters set between 75-600Hz. The 20Hz filter therefore filters modulations in the caregivers’ F0 that occur at a modulation frequency greater than 20Hz. The 20Hz filter does not, therefore, refer to the spectral filtering of the speech signal. The description of this variable has been rephrased on page 48 of the main text.

      Linear interpolation is a choice I would not have made. Where there is no data, there is no data. It feels inappropriate to assume that the data in between is simply a linear interpolation of surrounding points.

      The choice to interpolate where there was no data was something we considered in a lot of detail, given the many options for dealing with missing data points in this analysis, and the difficulties involved with extracting a continuous F0 variable in our naturalistic data sets. As R2 points out, one option would be to set data points to NaN values where no F0 is detected and/or the Mum is not vocalising. A second option, however, would be to set the continuous variable to 0s where no F0 is detected and/or the Mum is not vocalising (where the mum is not producing sound there is no F0, so rather than treating these points as missing data, it arguably makes most sense to set them to 0).

      Either of these options (setting parts where no F0 is detected to NaN or 0) makes it difficult to then meaningfully compute the rate of change in F0: where NaN values are inserted, this reduces the number of data points in each time window; where 0s are inserted this creates large and unreal changes in F0. Inserting NaN values into the continuous variable also reduces the number of data points included in the cross-correlation and event-locked analyses. It is important to note that, in our naturalistic interactions, caregivers’ vocal patterns are characterised by lots of short vocalisations interspersed by short pauses (Phillips et al., in prep), similar to previous findings in naturalistic settings (Gratier et al., 2015). Interpolation will, therefore, have largely interpolated through the small pauses in the caregiver’s vocalisations.
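
The trade-off between the three options can be made concrete with a toy contour. The sketch below is illustrative only: the function names, the window length, and the use of mean absolute sample-to-sample change as the rate-of-change measure are assumptions made here, not the authors' implementation.

```python
import math

def interpolate_gaps(f0):
    """Linearly interpolate NaN runs that lie between two voiced samples; edge gaps stay NaN."""
    out = list(f0)
    voiced = [i for i, v in enumerate(out) if not math.isnan(v)]
    for left, right in zip(voiced, voiced[1:]):
        for i in range(left + 1, right):
            frac = (i - left) / (right - left)
            out[i] = out[left] + frac * (out[right] - out[left])
    return out

def zero_fill(f0):
    """Replace unvoiced (NaN) samples with 0."""
    return [0.0 if math.isnan(v) else v for v in f0]

def window_rate_of_change(f0, window):
    """Mean absolute change between adjacent samples in consecutive, non-overlapping windows.

    Differences that involve a NaN are skipped, so NaN gaps shrink the usable data.
    """
    rates = []
    for start in range(0, len(f0) - window + 1, window):
        seg = f0[start:start + window]
        diffs = [abs(b - a) for a, b in zip(seg, seg[1:])
                 if not (math.isnan(a) or math.isnan(b))]
        rates.append(sum(diffs) / len(diffs) if diffs else math.nan)
    return rates

# Toy contour in Hz with a two-sample pause in the middle.
contour = [200.0, 210.0, math.nan, math.nan, 240.0, 250.0]
nan_rate = window_rate_of_change(contour, 6)[0]
zero_rate = window_rate_of_change(zero_fill(contour), 6)[0]
interp_rate = window_rate_of_change(interpolate_gaps(contour), 6)[0]
```

On this contour the NaN version estimates the rate from only two of the five adjacent-sample differences, the zero-filled version is dominated by the artificial jumps to and from 0, and the interpolated version uses all five differences without introducing jumps.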

      The only limitation listed was related to the demographics of the sample, namely saying that middle class moms in east London. Given that the demographics of London, even east London are quite varied, it's disappointing their sample does not reflect the community they are in.

      Yes we very much agree with R2 that the lack of inclusion of caregivers from wider demographic backgrounds is disappointing, and something which is often a problem in developmental research. Our lab is currently working to collect similar data from infants with a family history of ADHD, as part of a longitudinal, ongoing project, involving families from across the UK, from much more varied demographic backgrounds. We hope that the findings reported here will feed directly into the work conducted as part of this new project.

      That said, a demographic table of the subjects included in this study should be added.

      This is now included in the SM, and referenced in the main text.

      References

      Abney, D. H., Warlaumont, A. S., Oller, D. K., Wallot, S., & Kello, C. T. (2017). Multiple Coordination Patterns in Infant and Adult Vocalizations. Infancy, 22(4), 514–539. https://doi.org/10.1111/infa.12165

      Attaheri, A., Choisdealbha, Á. N., Di Liberto, G. M., Rocha, S., Brusini, P., Mead, N., Olawole-Scott, H., Boutris, P., Gibbon, S., Williams, I., Grey, C., Flanagan, S., & Goswami, U. (2020). Delta- and theta-band cortical tracking and phase-amplitude coupling to sung speech by infants [Preprint]. Neuroscience. https://doi.org/10.1101/2020.10.12.329326

      Begus, K., Gliga, T., & Southgate, V. (2016). Infants’ preferences for native speakers are associated with an expectation of information. Proceedings of the National Academy of Sciences, 113(44), 12397–12402. https://doi.org/10.1073/pnas.1603261113

      Brown, V. A. (2021). An Introduction to Linear Mixed-Effects Modeling in R.

      Cohen, M. X. (2014). Analyzing neural time series data: Theory and practice. The MIT Press.

      Gelman, A., & Hill, J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

      Georgieva, S., Lester, S., Noreika, V., Yilmaz, M. N., Wass, S., & Leong, V. (2020). Toward the Understanding of Topographical and Spectral Signatures of Infant Movement Artifacts in Naturalistic EEG. Frontiers in Neuroscience, 14, 352. https://doi.org/10.3389/fnins.2020.00352

      Gratier, M., Devouche, E., Guellai, B., Infanti, R., Yilmaz, E., & Parlato-Oliveira, E. (2015). Early development of turn-taking in vocal interaction between mothers and infants. Frontiers in Psychology, 6. https://doi.org/10.3389/fpsyg.2015.01167

      Jones, E. J. H., Goodwin, A., Orekhova, E., Charman, T., Dawson, G., Webb, S. J., & Johnson, M. H. (2020). Infant EEG theta modulation predicts childhood intelligence. Scientific Reports, 10(1), 11232. https://doi.org/10.1038/s41598-020-67687-y

      Marriott Haresign, I., Phillips, E., Whitehorn, M., Noreika, V., Jones, E. J. H., Leong, V., & Wass, S. V. (2021). Automatic classification of ICA components from infant EEG using MARA. Developmental Cognitive Neuroscience, 52, 101024. https://doi.org/10.1016/j.dcn.2021.101024

      Noreika, V., Georgieva, S., Wass, S., & Leong, V. (2020). 14 challenges and their solutions for conducting social neuroscience and longitudinal EEG research with infants. Infant Behavior and Development, 58, 101393. https://doi.org/10.1016/j.infbeh.2019.101393

      Orekhova, E. (1999). Theta synchronization during sustained anticipatory attention in infants over the second half of the first year of life. International Journal of Psychophysiology, 32(2), 151–172. https://doi.org/10.1016/S0167-8760(99)00011-2

      Perapoch Amadó, M., Greenwood, E., James, Labendzki, P., Haresign, I. M., Northrop, T., Phillips, E., Viswanathan, N., Whitehorn, M., Jones, E. J. H., & Wass, S. (2023). Naturalistic attention transitions from subcortical to cortical control during infancy. [Preprint]. Open Science Framework. https://doi.org/10.31219/osf.io/6z27a

      Piazza, E. A., Hasenfratz, L., Hasson, U., & Lew-Williams, C. (2018). Infant and adult brains are coupled to the dynamics of natural communication [Preprint]. Neuroscience. https://doi.org/10.1101/359810

      Räsänen, O., Kakouros, S., & Soderstrom, M. (2018). Is infant-directed speech interesting because it is surprising? – Linking properties of IDS to statistical learning and attention at the prosodic level. Cognition, 178, 193–206. https://doi.org/10.1016/j.cognition.2018.05.015

      Richards, J. E. (2010). The development of attention to simple and complex visual stimuli in infants: Behavioral and psychophysiological measures. Developmental Review, 30(2), 203–219. https://doi.org/10.1016/j.dr.2010.03.005

      Suarez-Rivera, C., Smith, L. B., & Yu, C. (2019). Multimodal parent behaviors within joint attention support sustained attention in infants. Developmental Psychology, 55(1), 96–109. https://doi.org/10.1037/dev0000628

      Wass, S. V., Noreika, V., Georgieva, S., Clackson, K., Brightman, L., Nutbrown, R., Covarrubias, L. S., & Leong, V. (2018). Parental neural responsivity to infants’ visual attention: How mature brains influence immature brains during social interaction. PLOS Biology, 16(12), e2006328. https://doi.org/10.1371/journal.pbio.2006328

      Xie, W., Mallin, B. M., & Richards, J. E. (2018). Development of infant sustained attention and its relation to EEG oscillations: An EEG and cortical source analysis study. Developmental Science, 21(3), e12562. https://doi.org/10.1111/desc.12562

      Yu, C., & Smith, L. B. (2013). Joint Attention without Gaze Following: Human Infants and Their Parents Coordinate Visual Attention to Objects through Eye-Hand Coordination. PLoS ONE, 8(11), e79659. https://doi.org/10.1371/journal.pone.0079659

      Yu, C., & Smith, L. B. (2016). The Social Origins of Sustained Attention in One-Year-Old Human Infants. Current Biology, 26(9), 1235–1240. https://doi.org/10.1016/j.cub.2016.03.026

      Yu, C., Zhang, Y., Slone, L. K., & Smith, L. B. (2021). The infant’s view redefines the problem of referential uncertainty in early word learning. Proceedings of the National Academy of Sciences, 118(52), e2107019118. https://doi.org/10.1073/pnas.2107019118

    2. eLife assessment

      This study reports important evidence that infants' internal factors guide children's attention, and that caregivers respond to infants' attentional shifts during caregiver-infant interactions. The authors analyzed EEG data and multiple types of behaviors using solid methodologies that can guide future studies of neural responses during social interaction in infants.

    3. Reviewer #2 (Public Review):

      Summary:

      This paper acknowledges that most development occurs in social contexts, with other social partners. The authors put forth two main frameworks of how development occurs within a social interaction with a caregiver. The first, is that although social interaction with mature partners is somewhat bi-directional, mature social partners exogenously influence infant behaviors and attention through "attentional scaffolding", and that in this case infant attention is reactive to caregiver behavior. The second framework posits that caregivers support and guide infant attention by contingently responding to reorientations in infant behavior, thus caregiver behaviors are reactive to infant behavior. The aim of this paper is to use moment-to-moment analysis techniques to understand the directionality of dyadic interaction.

      Strengths:

      The question driving this study is interesting and a genuine gap in the literature. Almost all development occurs in the presence of a mature social partner. While it is known that these interactions are critical for development, the directionality of how these interactions unfold in real-time is less known.

      The analyses are appropriate for the question at hand, capturing small moment-to-moment dynamics in both infant and child behavior, and their relationships with themselves and each other. Autocorrelations and cross-correlations are powerful tools that can uncover small but meaningful patterns in data that may not be uncovered with other more discretized analyses (i.e. regression).

      Weaknesses:

      While the authors improved their explanation of why they are using cross-correlations and the resting EEG patterns and what they mean, they did not address this specific piece of feedback: to explain their rationale for only focussing on fronto-temporal channels, rather than averaging channels across the whole brain.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We are grateful to the reviewers for their appreciation of our study and thoughtful comments. In response to the main concern raised by all reviewers regarding the potential influences of external noise factors on intuitive inference, such as external disturbances or imperfect observations, we have conducted three new experiments suggested by the reviewers. These experiments were designed to: (1) assess the influence of external forces on humans’ judgments by implementing a wall to block wind disturbances from one direction, (2) examine human accuracy in predicting the landing position of a falling ball when its trajectory is obscured, and (3) evaluate the effect of object geometry on human judgment of stability. The findings from these experiments consistently support our proposal of the stochastic world model on gravity embedded in the human mind. We have also addressed the reviewers’ remaining comments point by point.

      Reviewer #1 (Recommendations For The Authors):

      As mentioned in the public review, I did not find it entirely convincing that the study shows evidence for a Gaussian understanding of gravity. There are two studies that would bolster this claim: 1. Replicate experiment 1, but also ask people to infer whether there was a hidden force. If people are truly representing gravity as proposed in the paper, you should get no force inferences. However, if the reason the Gaussian gravity model works is that people infer unseen forces, this should come out clearly in this study.

      Author response image 1.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right: the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      R1: We thank the reviewer for this suggestion. To directly test whether participants’ judgments were influenced by their implicit assumptions about external forces, we duplicated the original experimental setup with the addition of a wall implemented on one side (Supplementary Figure 4A). Before the start of the experiment, we explicitly informed the participants that the wall was designed to block wind, ensuring that any potential wind forces from the direction of the wall would not influence the collapse. If participants’ judgments were affected by external noise, we would expect to observe a skewed angle distribution. Contrary to this prediction, our results showed a normal distribution across all three participants tested (1 female; ages: 24-30), similar to the experiment without the wall (Supplementary Figure 4B). Therefore, the stochastic nature of intuitive inference on objects’ stability is embedded in the mind, not shaped by external forces or explicit instructions.
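
The prediction that one-sided external forces would produce a skewed distribution can be checked with a standard moment-based skewness statistic. The sketch below is a generic illustration, not the authors' analysis; it assumes the responses are expressed as signed angular deviations that can be treated as ordinary numbers.

```python
def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5; close to 0 for a symmetric sample."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5
```

A value near zero is what a symmetric (for example Gaussian) response distribution would give, whereas responses pushed away from the wall side would give a clearly positive or negative value, depending on the sign convention.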

      This new experiment has been added to the revised manuscript:

      Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”

      (2) Similarly, you can imagine a simple study where you drop an object behind a floating occluder and you check where people produce an anticipatory fixation (i.e., where do they think the object will come out?). If people have a stochastic representation of gravity, this should be reflected in their fixations. But my guess is that everyone will look straight down.

      Author response image 2.

      Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experiment design. In this experiment, participants were required to use a mouse to determine the landing point of a parabolic trajectory (marked by the green dot), obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity, and no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Predictive errors from three participants. The predictive errors from all three participants conform to Gaussian distributions with non-negligible variances. These results suggest the notion of an inherent stochastic property of gravity represented in the mind.

      R2: We thank the reviewer for suggesting this thought experiment. However, when predicting the landing point of a falling object, participants may rely more on learned knowledge that an unimpeded object continues to fall in a straight line, rather than drawing on their intuitive physics. To avoid this potential confounding factor, we designed a similar experiment where participants were asked to predict the landing point of a parabolic trajectory, obscured by an occluder (Author response image 2A). In each trial, participants used a mouse (clicking the left button) to predict the landing point of each parabolic trajectory, and there were 100 trials in total. This design not only limits the impact of direct visual cues but also actively engages the mental simulation of intuitive physics. All three participants (1 female; ages: 24-30) were unable to accurately predict the landing points of the trajectories, and the predictive errors conformed to Gaussian distributions with different variances (Author response image 2B). Therefore, this new experiment confirms the stochastic nature of intuitive physics.
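
To see how a Gaussian uncertainty in gravity's direction would translate into spread in predicted landing points, consider the following sketch. It is a hypothetical simulation, not the experiment's code: the horizontal launch, the launch speed and height, the in-plane tilt angle, and all function names are assumptions made here, since the actual task parameters appear only in the figure.

```python
import math
import random

def landing_x(v0, height, theta, g=9.8):
    """Horizontal landing position for a horizontal launch at speed v0 from `height`,
    with gravity of magnitude g tilted by theta radians from straight down."""
    # Vertical motion: height - 0.5 * g * cos(theta) * t**2 = 0
    t = math.sqrt(2 * height / (g * math.cos(theta)))
    # Horizontal motion: launch velocity plus the sideways component of tilted gravity
    return v0 * t + 0.5 * g * math.sin(theta) * t ** 2

def simulate_errors(v0, height, sigma, n, seed=0):
    """Landing-point errors when the tilt angle is drawn from N(0, sigma) on each trial."""
    rng = random.Random(seed)
    true_x = landing_x(v0, height, 0.0)
    return [landing_x(v0, height, rng.gauss(0.0, sigma)) - true_x for _ in range(n)]
```

With `sigma = 0` every simulated prediction coincides with the true landing point; any non-zero `sigma` yields errors scattered on both sides of it, which is the qualitative pattern a stochastic representation of gravity would predict.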

      (3) I believe the correct alternative model should be the one that has uncertainty over unseen forces, which better captures current proposals in the field, and controls for the amount of uncertainty in the models.

      R3: We thank the reviewers for the above-mentioned suggestions, and the findings from these two new experiments reinforce our proposal regarding the inherent stochastic characteristic of how the mind represents gravity.

      (4) I was not convinced that the RL framework was set up correctly to tackle the questions it claims to tackle. What this shows is that you can evolve a world model with Gaussian gravity in a setup that has no external perturbations. That does not imply that that is how humans evolved their intuitive physics, particularly when creatures have evolved in a world full of external perturbations. Showing that when (1) there are hidden perturbations, and (2) these perturbations are learnable, but (3) the model nonetheless just learns stochastic gravity, would be a more convincing result.

      R4: We completely agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model on gravity, rather than as a demonstration of the developmental origins of intuitive physics abilities. The genesis of such abilities is multifaceted and unlikely to be fully replicated through a simple simulation like RL. Therefore, the purpose of incorporating the RL framework in our study is to demonstrate that external perturbations are not necessary for the development of a stochastic representation of gravity. In fact, introducing additional external noise into the RL framework likely heightens the uncertainty in learning gravity’s direction, potentially amplifying, rather than diminishing, the stochastic nature of mental gravity.

      In revision, we have clarified the role of the RL framework:

      Line 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.

      Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”

      (5) Some comments on the writing:

      The word 'normality' is used to refer to people's judgments about whether a tower collapsed looked 'normal'. I was a bit confused by this because normality can also mean 'Gaussian' and the experiments are also sampling from Gaussian distributions. There were several points where it took me a second to figure out which sense of 'normality' the paper was using. I would recommend using a different term.

      R5: We are sorry for the confusion. In revision, the term “normality” has been replaced with “confidence level about normal trajectory”.

      (6) One small comment is that Newton's laws are not a faithful replica of the "physical laws of the world" they are a useful simplification that only works at certain timescales. I believe some people propose Newtonian physics as a model of intuitive physics in part because it is a rapid and useful approximation of complex physical systems, and not because it is an untested assumption of perfect correspondence.

      R6: We are sorry for the inaccurate expression. We have revised our statements in the manuscript Line 15-16: “We found that the world model on gravity was not a faithful replica of the physical laws, but instead encoded gravity’s vertical direction as a Gaussian distribution.”

      (7) Line 49-50: Based on Fig 1d, lower bound of possible configurations for 10 blocks is ~17 in log-space, which is about 2.5e7. But the line here says it's 3.72e19, which is much larger. Sorry if I am missing something.

      R7: We thank the reviewer for pointing out this error. We re-calculated the number of possible configurations using formula (3) in the appendix, and the number of configurations with 10 blocks is:

      Thus,

      This estimated number is much larger than that in our previous calculation, which has been corrected in the revised text.

      Line 827-829: “d) The lower bound of configurations’ possible number and the number of blocks in a stack followed an exponential relationship with a base of 10. The procedure can create at least 1.14×10^50 configurations for stacks consisting of 10 blocks.”

      Line 49-50: “… but the universal cardinality of possible configurations is at least 1.14×10^50 (Supplementary Figure 1), …”

      Line 1017-1018: “… the number of configurations can be estimated with formula (9), which is 1.14×10^50.”

      (8) Lines 77-78: "A widely adopted but not rigorously tested assumption is that the world model in the brain is a faithful replica of the physical laws of the world." This risks sounding like you are asserting that colleagues in the field do not rigorously test their models. I think you meant to say that they did not 'directly test', rather than 'rigorously test'. If you meant rigorous, you might want to say more to justify why you think past work was not rigorous.

      R8: We apologize for the inappropriate wording. The sentence has been revised, and we explain the motivation more fully in the revised text:

      Line 76-92: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), thereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that the distinction between these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach.”

      (9) Lines 79-84 States that past models encode gravity downward. It then says that alternatively there is consensus that the brain uses data from sensory organs and adds meaning to them. I think there might be a grammatical error here because I did not follow why saying there is 'consensus' on something is a theoretical alternative. I also had trouble following why those two statements are in opposition. Is any work on physics engines claiming the brain does not take data from sensory organs and add meaning to them?

      R9: We are sorry for the confusion. Here we intend to contrast the deterministic model (i.e., the uncertainty comes from outside the model) with the stochastic model (i.e., the uncertainty is inherently built into the model). In revision, we have clarified the intention. For details, please see R8.

      (10) Lines 85-88: Following on the sentence above, you then conclude that the representation of the world may therefore not be the same as reality. I did not understand why this followed. It seems you are saying that, because the brain takes data from sensory organs, therefore its representations may differ from reality.

      R10: Again, we are sorry about the confusion. Please see the revised text in R8.

      (11) Lines 190-191: I had trouble understanding this sentence. I believe you are missing an adjective to clarify that participants were more inclined to judge taller stacks as more likely to collapse.

      R11: We are sorry for the confusion. What we intended to state here is that participants’ judgment was biased, showing a tendency to predict a collapse for stacks regardless of their actual stability. We have revised this confusing sentence in the revision. Line 202–204: “However, the participants showed an obvious bias towards predicting a collapse for stacks regardless of their actual stability, as the dots in Fig 2b are more concentrated on the lower side of the diagonal line.”

      (12) Line 201: I don't think it's accurate to say that MGS "perfectly captured participants' judgments" unless the results are actually perfect.

      R12: We agree, and in the revision we have toned down the statement. Line 213–214: “…, the MGS, in contrast to the NGS, more precisely reflected participants’ judgments of stability …”

      Reviewer #2 (Recommendations For The Authors):

      I think this is an impressive set of experiments and modeling work. The paper is nicely written and I appreciate the poetic license the authors took at places in the manuscript. I only have clarification points and suggest a simple experiment that could lend further support to their conclusions.

      (1) In my opinion, the impact of this work is twofold. First, the suggestion that gravity is represented as a distribution of the world and not a result of (inferred) external perturbations. Second, that the distribution is advantageous as it balances speed and accuracy, and lessens computational processing demands (i.e., number of simulations). The second point here is contingent on the first point, which is really only supported by the RL model and potentially the inverted scene condition. I am somewhat surprised that the RL model does not converge on a width much smaller than ~20 degrees after 100,000 simulations. From my understanding, it was provided feedback with collapses based on natural gravity (deterministically downward). Why is learning so slow and the width so large? Could it be the density of the simulated world model distribution? If the model distribution of Qs was too dense, then Q-learning would take forever. If the model distribution was too sparse, then its final estimate would hit a floor of precision. Could the authors provide more details on the distribution of the Qs for the RL model?

      Author response image 3.

      RL learning curves as a function of θ angle with different sampling densities and learning rates. Learning rates were adjusted to low (a), intermediate (b) and high (c) settings, while sampling densities were chosen at four levels: 5x5, 11x11, 31x31, and 61x61 shown from the left to the right. Two key observations emerged from the simulations as the reviewer predicted. First, higher learning rates resulted in a more rapid decline in learning curves but introduced larger variances. Second, increased sampling density necessitated more iterations for convergence. Note that in all simulations, we limited the iterations to 1,000 times (as opposed to 100,000 times reported in the manuscript) to demonstrate the trend without excessive computational demands.

      R1: To illustrate the distribution of the Q-values for the RL model, we re-ran the RL model with various learning rates and sampling densities (Author response image 3). These results support the reviewer’s prediction that higher learning rates resulted in a more rapid decline in learning curves but introduced larger variances, and increased sampling density requires more iterations for convergence.

      This simulation also elucidates the slower learning observed in the experiment described in the text, where the force sphere was divided into 61x61 angle pairs, and the learning rate was set to 0.15. This set of parameters ensured convergence within a reasonably brief timeframe while maintaining high-resolution force assessments.

      Besides, the width of the Gaussian distribution is mainly determined by the complexity of stacks. As shown in Figure 3c and Supplementary Figure 9, stacks with fewer blocks (i.e., less complex) caused a larger width, whereas those with more blocks resulted in a narrower spread. In the study, we used a collection of stacks varying from 2 to 15 blocks to simulate the range of stacks humans typically encounter in daily life.

      In revision, we have incorporated these insights suggested by the reviewer to clarify the performance of the RL framework:

      Line 634-639: “The angle density and learning rate are two factors that affect the learning speed. A larger angle density prolongs the time to reach convergence but enables a more detailed force space; a higher learning rate accelerates convergence but incurs larger variance during training. To balance speed and convergence, we utilized 100,000 configurations for the training.”

      Line 618-619: “…, separately divided them into 61 sampling angles across the spherical force space (i.e., the angle density).”
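      The trade-off between sampling density and learning rate described in this response can be illustrated with a toy tabular update. This is a minimal sketch and not the authors' RL model: the reward function (a noisy value that peaks at zero tilt), the 0 to 45 degree range, the noise level, and the function names `train_values` and `mean_abs_error` are all assumptions made for illustration. Only the 61x61 grid and the learning rate of 0.15 come from the response.

```python
import math
import random


def train_values(n_angles, learning_rate, n_iters, seed=0):
    """Toy tabular learning over an n_angles x n_angles grid of angle pairs.

    Each iteration samples one grid cell, observes a noisy reward, and moves
    that cell's value toward the reward by a fraction `learning_rate`.
    """
    rng = random.Random(seed)
    # Hypothetical tilt angles (degrees) along one axis of the grid.
    tilts = [45.0 * k / (n_angles - 1) for k in range(n_angles)]
    # Hypothetical noise-free reward: largest when the tilt is zero.
    target = [[math.exp(-(t / 20.0) ** 2) for _ in range(n_angles)] for t in tilts]
    values = [[0.0] * n_angles for _ in range(n_angles)]
    for _ in range(n_iters):
        i = rng.randrange(n_angles)
        j = rng.randrange(n_angles)
        reward = target[i][j] + rng.gauss(0.0, 0.1)  # noisy feedback
        values[i][j] += learning_rate * (reward - values[i][j])
    return values, target


def mean_abs_error(n_angles, learning_rate, n_iters, seed=0):
    """Average absolute gap between learned values and the noise-free reward."""
    values, target = train_values(n_angles, learning_rate, n_iters, seed)
    total = sum(
        abs(v - t)
        for v_row, t_row in zip(values, target)
        for v, t in zip(v_row, t_row)
    )
    return total / (n_angles * n_angles)


if __name__ == "__main__":
    for n in (5, 11, 31, 61):
        print(n, round(mean_abs_error(n, 0.15, 1000), 3))
```

      With a fixed budget of iterations, the denser grids leave most cells barely updated, so their error stays high. A larger learning rate closes the gap faster, at the cost of noisier values once a cell has converged.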

      (2) Along similar lines, the authors discuss the results of the inverted scene condition as reflecting cognitive impenetrability. However, do they also interpret it as support for an intrinsically noisy distribution of gravity? I would be more convinced if they created a different scene that could have the possibility of affecting the direction of an (inferred) external perturbation - a previously held explanation of the noisy world model. For example, a relatively simple experiment would be to have a wall on one side of the scene such that an external perturbation would be unlikely to be inferred from that direction. In the external perturbation account, phi would then be affected resulting in a skewed distribution of angle pairs. However, in the authors' stochastic world model phi would remain unaffected resulting in the same uniform distribution of phi the authors observed. In my opinion, this would provide more compelling evidence for the stochastic world model.

      Author response image 4.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right, the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      R2: We thank the reviewer for this suggestion. Following the reviewer’s suggestion, we designed an experiment with the addition of a wall implemented on one side (Supplementary figure 4A). Before the start of the experiment, we explicitly informed the participants that the wall was designed to block wind, ensuring that no potential wind forces from the direction of the wall could influence the collapse trajectory of configurations. Participants needed to judge whether the trajectory was normal. If participants’ judgments were influenced by external noise, we would expect to observe a skewed angle distribution. However, our results still showed a normal distribution across all participants tested, consistent with the experiment without the wall (Supplementary figure 4B). This experiment suggests that the stochastic nature of intuitive inference on objects’ stability is embedded in the mind, rather than shaped by external forces or explicit instructions.

      We revised the original manuscript and added this new experiment:

      Line 166-168: “…, and remained unchanged with the addition of a wall on one side to block potential external disturbances from wind (Supplementary Figure 4).”

      (3) I didn't completely follow the authors' explanation for the taller objects illusion. On lines 229-232, the authors state that deviations from gravity's veridical direction are likely to accumulate with the height of the objects. Is this because, in the stochastic world model account, each block gets its own gravity vector that is sampled from the distribution? The authors should clarify this more explicitly. If this is indeed the author's claim, then it would seem that it could be manipulated by varying the dimensions of the blocks (or whatever constitutes an object).

      R3: We are sorry for the confusion caused by the use of the term ‘accumulate’. In the study, there is only one gravity vector sampled from the distribution for the entire structure, rather than each block having a unique gravity vector. The height illusion is attributed to the fact that the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction. This is especially true for objects consisting of multiple blocks stacked atop one another. In revision, we have removed the confusing term ‘accumulate’ for clarification.

      Line 242-244: “…, because the center of gravity in taller objects is more susceptible to influence when gravity deviates slightly from a strictly downward direction during humans’ internal simulations.”
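      The geometric argument in this response can be made concrete for the simplest case. The following is a minimal sketch under stated assumptions, not the authors' simulation: it treats the structure as a single uniform rigid block that cannot slide, restricts the gravity tilt to one plane, and draws the tilt from a zero-mean Gaussian. The function names and the 20-degree standard deviation used in the example are illustrative choices. For such a block the center of mass sits at half the height, so the block tips once the tilt exceeds atan(width / height).

```python
import math


def critical_tilt_deg(width, height):
    """Tilt of gravity (degrees from vertical) beyond which a uniform block
    of the given base width and height tips about a base edge."""
    return math.degrees(math.atan2(width, height))


def tip_probability(width, height, sigma_deg):
    """Probability that a zero-mean Gaussian tilt with standard deviation
    sigma_deg exceeds the critical tilt in either direction."""
    crit = critical_tilt_deg(width, height)
    # For X ~ N(0, sigma): P(|X| > c) = erfc(c / (sigma * sqrt(2))).
    return math.erfc(crit / (sigma_deg * math.sqrt(2.0)))


if __name__ == "__main__":
    for height in (1.0, 2.0, 4.0, 8.0):
        p = tip_probability(1.0, height, sigma_deg=20.0)
        print(f"height={height:>4}: critical tilt="
              f"{critical_tilt_deg(1.0, height):5.1f} deg, P(tip)={p:.3f}")
```

      Because the critical tilt shrinks as height grows, the same distribution of gravity directions yields a higher predicted collapse probability for taller blocks, without sampling a separate gravity vector per block.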

      (4) The authors refer to the RL simulations as agent-environment interactions, but in reality, the RL model does not interact with the blocks. Would experience-dependent or observation be more apropos?

      R4: We completely agree. Indeed, the RL model did not manipulate stacks; rather, it updated its knowledge of natural gravity based on the discrepancies between the RL model’s predictions and observed outcomes. In revision, we have removed the confusing term ‘agent-environment interactions’ and clarified its intended meaning.

      Line 19-22: “Furthermore, a computational model with reinforcement learning revealed that the stochastic characteristic likely originated from experience-dependent comparisons between predictions formed by internal simulations and the realities observed in the external world, …”

      Reviewer #3 (Public Review):

      (1) In spite of the fact that the Mental Gravity Simulation (MGS) seems to predict the data of the two experiments, it is an untenable hypothesis. I give the main reason for this conclusion by illustrating a simple thought experiment. Suppose you ask subjects to determine whether a single block (like those used in the simulations) is about to fall. We can think of blocks of varying heights. No matter how tall a block is, if it is standing on a horizontal surface it will not fall until some external perturbation disturbs its equilibrium. I am confident that most human observers would predict this outcome as well. However, the MGS simulation would not produce this outcome. Instead, it would predict a non-zero probability of the block to tip over. A gravitational field that is not perpendicular to the base has the equivalent effect of a horizontal force applied on the block at the height corresponding to the vertical position of the center of gravity. Depending on the friction determined by the contact between the base of the block and the surface where it stands there is a critical height where any horizontal force being applied would cause the block to fall while pivoting about one of the edges at the base (the one opposite to where the force has been applied). This critical height depends on both the size of the base and the friction coefficient. For short objects this critical height is larger than the height of the object, so that object would not fall. But for taller blocks, this is not the case. Indeed, the taller the block the smaller the deviation from a vertical gravitational field is needed for a fall to be expected. The discrepancy between this prediction and the most likely outcome of the simple experiment I have just outlined makes the MGS model implausible. Note also that a gravitational field that is not perpendicular to the ground surface is equivalent to the force field experienced by the block while standing on an inclined plane. For small friction values, the block is expected to slide down the incline, therefore another prediction of this MGS model is that when we observe an object on a surface exerting negligible friction (think of a puck on ice) we should expect that object to spontaneously move. But of course, we don't, as we do not expect tall objects that are standing to suddenly fall if left unperturbed. In summary, a stochastic world model cannot explain these simple observations.
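      The two thresholds in this thought experiment can be written out for an idealized case. The sketch assumes a uniform rigid block of mass $m$, base width $w$ and height $h$ on a flat surface with static friction coefficient $\mu$, and a gravity vector of magnitude $g$ tilted by an angle $\theta$ from the surface normal. These symbols are introduced here for illustration and do not appear in the manuscript.

```latex
\begin{gather*}
g_{\parallel} = g\sin\theta, \qquad g_{\perp} = g\cos\theta \\[4pt]
\text{tipping about a base edge:}\quad
  m\,g_{\parallel}\,\frac{h}{2} > m\,g_{\perp}\,\frac{w}{2}
  \;\Longleftrightarrow\; \tan\theta > \frac{w}{h} \\[4pt]
\text{sliding:}\quad
  m\,g_{\parallel} > \mu\,m\,g_{\perp}
  \;\Longleftrightarrow\; \tan\theta > \mu
\end{gather*}
```

      Under these assumptions, for a fixed tilt $\theta$ the block tips without sliding when $w/h < \tan\theta \le \mu$, which gives the critical height $h^{*} = w/\tan\theta$. With $\theta = 0$ neither condition is ever met, which matches the observation that an unperturbed block stays put.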

      Author response image 5.

      Differentiating Subjectivity from Objectivity. In both Experiment 1 (a) and Experiment 2 (b), participants were instructed to determine which shape appeared most stable. Objectively, in the absence of external forces, all shapes possess equal stability. Yet, participants typically perceived the shape on the left as the most stable because of its larger base area. The discrepancy between objective realities and subjective feelings, as we propose, is attributed to the human mind representing gravity’s direction as a Gaussian distribution, rather than as a singular value pointing directly downward.

      R1: We agree with the reviewer that objects will remain stable until disturbed by external forces. However, in many cases, there is a clear discrepancy between objective realities and subjective feelings. For example, electromagnetic waves associated with purple and red colors are the farthest in the electromagnetic space, yet purple and red are the closest colors in the color space. Similarly, as shown in Supplementary Figure 4, in reality all shapes possess equal stability in the absence of external forces. Yet, humans typically perceive the shape on the left as more stable because of its larger base area. In this study, we tried to explore the mechanism underlying this discrepancy by proposing that the human mind represents gravity’s direction as a Gaussian distribution, rather than as a singular value pointing directly downward.

      In revision, we have clarified the rationale of this study:

      Line 76-98: “A prevailing theory suggests that the world model in the brain accurately mirrors the physical laws of the world (Allen et al., 2020; Battaglia et al., 2013; Zhou et al., 2022). For example, the direction of gravity encoded in the world model, a critical factor in stability inference, is assumed to be straight downward, aligning with its manifestation in the physical world. To explain the phenomenon that tall and thin objects are subjectively perceived as more unstable compared to short and fat ones (Supplementary Figure 2), external noise, such as imperfect perception and assumed external forces, is introduced to influence the output of the model. However, when the brain actively transforms sensory data into cognitive understanding, these data can become distorted (Kriegeskorte and Douglas, 2019; Naselaris et al., 2011), hereby introducing uncertainty into the representation of gravity’s direction. In this scenario, the world model inherently incorporates uncertainty, eliminating the need for additional external noise to explain the inconsistency between subjective perceptions of stability and the actual stability of objects. Note that this distinction of these two theories is nontrivial: the former model implies a deterministic representation of the external world, while the latter suggests a stochastic approach. Here, we investigated these two alternative hypotheses regarding the construction of the world model in the brain by examining how gravity’s direction is represented in the world model when participants judged object stability.”

      (2) The question remains as to how we can interpret the empirical data from the two experiments and their agreement with the predictions of the stochastic world model if we assume that the brain has internalized a vertical gravitational field. First, we need to look more closely at the questions posed to the subjects in the two experiments. In the first experiment, subjects are asked about how "normal" a fall of a block construction looks. Subjects seem to accept 50% of the time a fall is normal when the gravitational field is about 20 deg away from the vertical direction. The authors conclude that according to the brain, such an unusual gravitational field is possible. However, there are alternative explanations for these findings that do not require a perceptual error in the estimation of the direction of gravity. There are several aspects of the scene that may be misjudged by the observer. First, the 3D interpretation of the scene and the 3D motion of the objects can be inaccurate. Indeed, the simulation of a normal fall uploaded by the authors seems to show objects falling in a much weaker gravitational field than the one on Earth since the blocks seem to fall in "slow motion". This is probably because the perceived height of the structure is much smaller than the simulated height. In general, there are even more severe biases affecting the perception of 3D structures that depend on many factors, for instance, the viewpoint.

      R2: We thank the reviewer for highlighting several potential confounding factors in our study. We address each of these concerns point-by-point:

      (a) Misinterpretation of the 3D scene and motion. In Response Figure 4 shown above, there is no 3D structure, yet participants’ judgment on stability still deviated from objective realities. In addition, the introduction of 3D motion was to aid in understanding the stacks’ 3D structure. Previous studies without 3D motion have reported similar findings (Allen et al., 2020). Therefore, regardless of whether objects are presented in 2D or 3D, or in static or in motion formats, humans’ judgment on object stability appears consistent.

      (b) Errors in perceived height. While there might be discrepancies between perceived and simulated heights, such errors are systematic across all conditions. Therefore, they may affect the width of the Gaussian distribution but do not fundamentally alter its existence.

      (c) The viewpoint. In one experiment, we inverted gravity’s direction to point upward, diverging from common daily experience. Despite this change in viewpoint, the Gaussian distribution was still observed. That is, the viewpoint does not appear to be a key factor in influencing how gravity’s direction is represented as a Gaussian distribution in our mental world.

      In summary, both our and previous studies (Allen et al., 2020; Battaglia et al., 2013) agree that humans’ subjective assessments of objects’ stability deviate from actual stability due to noise in mental simulation. Unlike previous studies, we suggest that this noise is intrinsic, rather than stemming from external forces or imperfect observations.

      (3) Second, the distribution of weight among the objects and the friction coefficients acting between the surfaces are also unknown parameters. In other words, there are several parameters that depend on the viewing conditions and material composition of the blocks that are unknown and need to be estimated. The authors assume that these parameters are derived accurately and only that assumption allows them to attribute the observed biases to an error in the estimate of the gravitational field. Of course, if the direction of gravity is the only parameter allowed to vary freely then it is no surprise that it explains the results. Instead, a simulation with a tilted angle of gravity may give rise to a display that is interpreted as rendering a vertical gravitational field while other parameters are misperceived. Moreover, there is an additional factor that is intentionally dismissed by the authors that is a possible cause of the fall of a stack of cubes: an external force. Stacks that are initially standing should not fall all of a sudden unless some unwanted force is applied to the construction. For instance, a sudden gust of wind would create a force field on a stack that is equivalent to that produced by a tilted gravitational field. Such an explanation would easily apply to the findings of the second experiment. In that experiment subjects are explicitly asked if a stack of blocks looks "stable". This is an ambiguous question because the stability of a structure is always judged by imagining what would happen to the structure if an external perturbation is applied. The right question should be: "do you think this structure would fall if unperturbed". However, if stability is judged in the face of possible external perturbations then a tall structure would certainly be judged as less stable than a short structure occupying the same ground area. This is what the authors find. What they consider as a bias (tall structures are perceived as less stable than short structures) is instead a wrong interpretation of the mental process that determines stability. If subjects are asked the question "Is it going to fall?" then tall stacks of sound structure would be judged as stable as short stacks, just more precarious.

      R3: Indeed, the external forces suggested by the reviewer certainly influence judgments of objects’ stability. The critical question, however, is whether humans’ judgments on objects’ stability accurately mirror the actual stability of objects in the absence of external forces. To address this question, we designed two new experiments.

      Experiment 1: we duplicated the original experimental setup with the addition of a wall implemented on one side (Supplementary Figure 4A). We explicitly informed the participants that the wall could block wind, ensuring that no potential wind from the direction of the wall could influence the configuration. If participants’ judgments were affected by external noise, we would expect to observe a skewed angle distribution. Contrary to this prediction, our results showed a normal distribution across all three participants (Age: 25-30, two females), which is similar to the experiment without the wall (Supplementary Figure 4B).

      Author response image 6.

      Wall experiment to test the impact of external forces on the measurement of stochastic gravity. (a) Experimental setting. We replicated the original setup with the addition of a wall implemented on one side. Left: the overall experimental scene; Right, the scene shown to participants. (b) Human behaviors. Three participants conducted this experiment, and their responses consistently showed normal distributions without any skewness, suggesting that their judgments were not affected by the presence of the wall. These results support our claim that humans’ judgments on stability were not affected by potential concerns regarding external forces.

      Experiment 2: The second experiment adopted another paradigm to test the hypothesis of stochastic mental simulation. Participants were asked to infer the landing point of a parabolic trajectory that was obscured by an occluder (Author response image 2A). The stochastic mental simulation predicts that the resulting errors follow a Gaussian distribution, whereas if humans’ judgments were influenced by external noise, the landing points would not be Gaussian. The experiment consisted of 100 trials in total, and in each trial participants predicted the landing point of the trajectory by clicking the left mouse button. All three participants (1 female; ages: 24-30) were unable to predict the landing points accurately, and their predictive errors conformed to Gaussian distributions with different variances (Author response image 2B). Therefore, this new experiment confirms the stochastic nature of intuitive physics.

      Author response image 7.

      Trajectory experiment to test the stochastic nature of gravity represented in the mind. (a) Experiment design. In this experiment, participants were required to use a mouse to determine the landing point of a parabolic trajectory (marked by the green dot), obscured by a grey rectangle. Note that the parabolic trajectory was determined only by gravity, and no external disturbances were introduced. The parameters used in this experiment are detailed in the upper right corner. (b) Predictive errors from three participants. The predictive errors from all three participants conform to Gaussian distributions with non-negligible variances. These results suggest the notion of an inherent stochastic property of gravity represented in the mind.
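      To illustrate how a sampled gravity direction would propagate to landing-point errors in a task of this kind, the following is a minimal sketch rather than the authors' analysis. It assumes a point mass, a tilt confined to the plane of motion, and launch parameters chosen only for illustration; the experiment's actual parameters are given in the figure and are not reproduced here. The names `landing_x` and `landing_errors` are hypothetical.

```python
import math
import random

G = 9.81  # m/s^2, standard gravitational acceleration


def landing_x(vx, vy, height, tilt_deg, g=G):
    """Horizontal landing position of a point mass launched from (0, height)
    with velocity (vx, vy), when gravity is tilted by tilt_deg from straight
    down within the plane of motion (positive tilt pushes toward +x)."""
    tilt = math.radians(tilt_deg)
    ax = g * math.sin(tilt)  # sideways component of the tilted gravity
    ay = g * math.cos(tilt)  # downward component of the tilted gravity
    # Positive root of: height + vy * t - 0.5 * ay * t**2 = 0
    t = (vy + math.sqrt(vy * vy + 2.0 * ay * height)) / ay
    return vx * t + 0.5 * ax * t * t


def landing_errors(vx, vy, height, sigma_deg, n_samples, seed=0):
    """Landing-point errors relative to the untilted case, with the tilt
    drawn from a zero-mean Gaussian of standard deviation sigma_deg."""
    rng = random.Random(seed)
    true_x = landing_x(vx, vy, height, 0.0)
    return [
        landing_x(vx, vy, height, rng.gauss(0.0, sigma_deg)) - true_x
        for _ in range(n_samples)
    ]


if __name__ == "__main__":
    errors = landing_errors(3.0, 4.0, 0.0, sigma_deg=5.0, n_samples=2000)
    mean = sum(errors) / len(errors)
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
    print(f"mean error = {mean:.3f} m, sd = {sd:.3f} m")
```

      For small tilts the landing error is close to linear in the tilt angle, so a Gaussian spread in gravity's direction gives an approximately Gaussian spread of landing points centered near the true one.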

      (4) The RL model used as a proof of concept for how the brain may build a stochastic prior for the direction of gravity is based on very strong and unverified assumptions. The first assumption is that the brain already knows about the force of gravity, but it lacks knowledge of the direction of this force of gravity. The second assumption is that before learning the brain knows the effect of a gravitational field on a stack of blocks. How can the brain simulate the effect of a non-vertical gravitational field on a structure if it has never observed such an event?

      R4: We agree with the reviewer that the RL framework serves primarily as a theoretical model to explain the stochastic nature of the world model on gravity, rather than as a demonstration of the developmental origins of intuitive physics abilities. The genesis of such abilities is multifaceted and unlikely to be fully replicated through a simple simulation like RL. Therefore, the purpose of incorporating the RL framework in our study is to demonstrate that external perturbations are not necessary for the development of a stochastic representation of gravity.

      In revision, we have clarified the role of the RL framework:

      Line 265-277: “While the cognitive impenetrability and the self-consistency observed in this study, without resorting to an external perturbation, favor the stochastic model over the deterministic one, the origin of this stochastic feature of the world model is unclear.

      Here we used a reinforcement learning (RL) framework to unveil this origin, because our intelligence emerges and evolves under the constraints of the physical world. Therefore, the stochastic feature may emerge as a biological agent interacts with the environment, where the mismatches between external feedback from the environment and internal expectations from the world model are in turn used to fine-tune the world model (Friston et al., 2021; MacKay, 1956; Matsuo et al., 2022). Note that a key aspect of the framework is determining whether the stochastic nature of the world model on gravity emerges through this interaction, even in the absence of external noise.”

      (5) The third assumption is that from the visual input, the brain is able to figure out the exact 3D coordinates of the blocks. This has been proven to be untrue in a large number of studies. Given these assumptions and the fact that the only parameters the RL model modifies through learning specify the direction of gravity, I am not surprised that the model produces the desired results.

      Author response image 8.

      Perception Uncertainty in 3D stacks structures. (a) Experimental design. A pair of two stacks with similar placements of blocks were presented sequentially to participants, who were instructed to judge whether the stacks were identical and to rate their confidence in this judgment. Each stack was presented on the screen for 2 seconds. (b) Behavior Performance. Three participants (2 males, age range: 24-30) were recruited to the experiment. The confidence in determining whether a pair of stacks remained unchanged rapidly decreased when each block had a very small displacement, suggesting humans could keenly perceive trivial changes in configurations. The x-axis denotes the difference in block placement between stacks, with the maximum value (0.4) corresponding to the length of a block’s short side. The Y-axis denotes humans’ confidence in reporting no change. The red curve illustrates the average confidence level across 4 runs, while the yellow curve is the confidence level of each run.

      R5: Indeed, uncertainty is inevitable when perceiving the external world, because our perception is not a faithful replica of external reality. A more critical question pertains to the accuracy of our perception in representing the 3D coordinates of a stack’s blocks. To address this question, we designed a straightforward experiment (Author response image 5a), where participants were instructed to determine whether a pair of stacks were identical. The position of each block was randomly changed horizontally. We found that all participants were able to accurately identify even minor positional variations in the 3D structure of the stacks (Author response image 5b). This level of perceptual precision is adequate for locating the difference between predictions from mental simulations and actual observations of the external world.

      (6) Finally, the argument that the MGS is more efficient than the NGS model is based on an incorrect analysis of the results of the simulation. It is true that 80% accuracy is reached faster by the MGS model than the 95% accuracy level is reached by the NGS model. But the question is: how fast does the NGS model reach 80% accuracy (before reaching the plateau)?

      R6: Yes. The NGS model achieved 80% accuracy as rapidly as the MGS model. However, the NGS model required a significantly longer period to reach the plateau crucial for decision-making. In revision, this information is now included.

      Line 348-350: “…, while the initial growth rates of both models were comparable, the MGS reached the plateau crucial for decision-making sooner than the NGS.”

      We greatly appreciate the thorough and insightful review provided by all three reviewers, which has considerably improved our manuscript, especially in terms of clarity in the presentation of the approach and further validation of the robustness implications of our results.

      References:

      Allen KR, Smith KA, Tenenbaum JB. 2020. Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning. Proceedings of the National Academy of Sciences 117:29302–29310.

      Battaglia PW, Hamrick JB, Tenenbaum JB. 2013. Simulation as an engine of physical scene understanding. Proceedings of the National Academy of Sciences 110:18327–18332.

      Friston K, Moran RJ, Nagai Y, Taniguchi T, Gomi H, Tenenbaum J. 2021. World model learning and inference. Neural Networks 144:573–590.

      Kriegeskorte N, Douglas PK. 2019. Interpreting encoding and decoding models. Current opinion in neurobiology 55:167–179.

      MacKay DM. 1956. The epistemological problem for automata. Automata Studies (AM-34), Volume 34. Princeton University Press. pp. 235–252.

      Matsuo Y, LeCun Y, Sahani M, Precup D, Silver D, Sugiyama M, Uchibe E, Morimoto J. 2022. Deep learning, reinforcement learning, and world models. Neural Networks.

      Naselaris T, Kay KN, Nishimoto S, Gallant JL. 2011. Encoding and decoding in fMRI. Neuroimage 56:400–410.

      Zhou L, Smith K, Tenenbaum J, Gerstenberg T. 2022. Mental Jenga: A counterfactual simulation model of physical support.

    2. eLife assessment

      In this valuable study, the authors present findings that suggest that people do not faithfully replicate the physics of the real world but rather have a stochastic world model, specifically a stochastic representation of gravity. This contrasts with prior accounts that suggested a potentially noisy Newtonian model where the noise arises from perceptual uncertainty or (inferred) external perturbations. The experimental evidence is generally solid, with all experiments and model simulations being consistent with the proposed account. In the revision, the authors also added a number of control experiments that address some of the most pressing concerns of the original submission.

    3. Reviewer #2 (Public Review):

      Summary:

      Through a set of experiments and model simulations, the authors tested whether the commonly assumed world model of gravity was a faithful replica of the physical world. They found that participants did not model gravity as a single, fixed vector but instead as a distribution of possible vectors. Surprisingly, the width of this distribution was quite large (~20 degrees). While previous accounts had suggested that this uncertainty was due to perceptual noise or an inferred external perturbation, the authors suggest that this uncertainty simply arises from a noisy distribution of the representation of gravity's direction. A reinforcement learning model with an initial uniform distribution for gravity's direction ultimately converged to a precision on the same order as the human participants, which lends support to the authors' conclusion and suggests that this distribution is learned through experience. What's more, further simulations suggest that representing gravity with such a wide distribution may balance speed and accuracy, providing a potentially normative explanation for the world model with gravity as a distribution.

      Strengths:

      The authors present surprising findings in a relatively straightforward, now-classic experimental task. They provide a normative explanation based on a resource-rational framework for why people may have a stochastic world model instead of a deterministic world model. While the stochastic world model could be the result of people mentally simulating an external perturbation, the authors include several control experiments to test this possibility.

      Weaknesses:

      The possibility of inferred external perturbations, as opposed to a stochastic world model, is difficult to rule out. This could stem from how people interpret task instructions, and it will likely take many clever studies to fully reconcile these two alternative accounts.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study combines a comparative approach in different synapses with experiments that show how synaptic vesicle endocytosis in nerve terminals regulates short-term plasticity. The data presented support the conclusions and make a convincing case for fast endocytosis as necessary for rapid vesicle recruitment to active zones. Some aspects of the description of the data and analysis are however incomplete and would benefit from a more rigorous approach. With more discussion of methods and analysis, this paper would be of great interest to neurobiologists and biophysicists working on synaptic vesicle recycling and short-term plasticity mechanisms.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study examines the role of release site clearance in synaptic transmission during repetitive activity under physiological conditions in two types of central synapses, calyx of Held and hippocampal CA1 synapses. After the acute block of endocytosis by pharmacology, deeper synaptic depression or less facilitation was observed in two types of synapses. Acute block of CDC42 and actin polymerization, which possibly inhibits the activity of Intersectin, affected synaptic depression at the calyx synapse, but not at CA1 synapses. The data suggest an unexpected, fast role of the site clearance in counteracting synaptic depression.

      Strengths:

      The study uses an acute block of the molecular targets with pharmacology together with precise electrophysiology. The experimental results are clear-cut and convincing. The study also examines the physiological roles of the site clearance using action potential-evoked transmission at physiological Ca and physiological temperature at mature animals. This condition has not been examined.

      Weaknesses:

      Pharmacology may have some off-target effects, though acute manipulation should be appreciated. Although this is a hard question and difficult to address experimentally, reagents may affect synaptic vesicle mobilization to the release sites directly in addition to blocking endocytosis.

      To acutely block vesicle endocytosis, we utilized two different pharmacological tools, Dynasore and Pitstop-2, after testing their blocking spectra and potencies at the calyx presynaptic terminals and collected data of their common effects on target functions. Since the recovery from STD was faster at the calyx synapses in the presence of both endocytic blockers in physiological 1.3 mM [Ca2+] (Figure 2B), but not in 2.0 mM [Ca2+] (Figure S4), they might facilitate vesicle mobilization in physiological condition.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Mahapatra and Takahashi report on the physiological consequences of pharmacologically blocking either clathrin and dynamin function during compensatory endocytosis or the cortical actin scaffold, both in the calyx of Held synapse and in hippocampal boutons in acute slice preparations.

      Strengths:

      Although many aspects of these pharmacological interventions have been studied in detail during the past decades, this is a nice comprehensive and comparative study, which reveals some interesting differences between a fast synapse (Calyx of Held) tuned to reliably transmit at several hundred Hz and a slower hippocampal CA1 synapse. In particular, the authors find that acute disturbance of the synaptic actin network leads to a marked frequency-dependent enhancement of synaptic depression in the Calyx, but not in the hippocampal synapse. This striking difference between both preparations is the most interesting and novel finding.

      Weaknesses:

      Unfortunately, however, these findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      The concepts of FRP and SRP are derived from voltage-clamp step-depolarization experiments at calyces of Held in pre-hearing rodents at RT, and cannot be directly dissected in data from action-potential-evoked EPSCs at post-hearing calyces under physiological conditions. However, we addressed this as far as possible by referring to the related literature in new paragraphs in the Results section (p9-10), particularly on the different effects of Latrunculin application and experimental conditions, and by adding a new supplementary Figure (now S5). Regarding the role of F-actin in vesicle replenishment at cerebellar synapses, we added sentences in the Discussion section (p14, last paragraph).

      Reviewer #3 (Public Review):

      General comments:

      (1) While Dynasore and Pitstop-2 may impede release site clearance due to an arrest of membrane retrieval, neither Latrunculin-B nor ML-141 specifically acts on AZ scaffold proteins. Interference with actin polymerization may have a number of consequences many of which may be unrelated to release site clearance. Therefore, neither Latrunculin-B nor ML-141 can be considered suitable tools for specifically identifying the role of AZ scaffold proteins (i.e. ELKS family proteins, Piccolo, Bassoon, α-liprin, Unc13, RIM, RBP, etc) in release site clearance which was defined as one of the principal aims of this study.

      In this study, we focused our analysis on the downstream activity of scaffold protein intersectin by comparing the common inhibitory effects of CDC42 and actin polymerization, by use of ML141 and Latrunculin B, respectively, on vesicle endocytosis and synaptic depression/ facilitation without addressing diverse individual drug effects. To avoid confusion we removed “AZ” from scaffold protein.

      (2) Initial EPSC amplitudes more than doubled in the presence of Dynasore at hippocampal SC->CA1 synapses (Figure S2). This unexpected result raises doubts about the specificity of Dynasore as a tool to selectively block SV endocytosis.

      It is possible that Dynasore might have unknown or off-target effects. However, the main conclusion is backed up by Pitstop-2.

      (3) In this study, the application of Dynasore and Pitstop-2 strongly decreases 100 Hz steady-state release at calyx synapses while - quite unexpectedly - strongly accelerates recovery from depression. A previous study found that genetic ablation of dynamin-1 actually enhanced 300 Hz steady-state release while only little affecting recovery from depression (Mahapatra et al., 2016). A similar scenario holds for the Latrunculin-B effects: In this study, Latrunculin-B strongly increased steady-state depression while in Babu et al. (2020), Latrunculin-B did not affect steady-state depression. In Mahapatra et al. (2016), Latrunculin-B marginally enhanced steady-state depression. The authors need to make a serious attempt to explain all these seemingly contradicting results.

      The latrunculin effect on STD can vary according to the condition of application and external [Ca2+], which we show in a new supplemental Figure S5. The latrunculin effect on the recovery from STD also varies with temperature, [Ca2+], and animal age, which affect Ca2+-dependent fast recovery component from depression. We added paragraphs for this issue in Results section (p9-10).

      (4) The experimental conditions need to be better specified. It is not clear which recordings were obtained in 1.3 mM and which (if any?) in 2 mM external Ca. It is also unclear whether 'pooled data' are presented (obtained from control recordings and from separate recordings after pre-incubation with the respective drugs), or whether the data actually represent 'before'/'after' comparisons obtained from the same synapses after washing in the respective drugs. The exact protocol of drug application (duration of application/pre-incubation?, measurements after wash-out or in the continuous presence of the drugs?) needs to be clearly described in the methods and needs to be briefly mentioned in Results and/or Figure legends.

      We added methodological explanations and reworded sentences in the text to be clear for pharmacological data derived from non-sequential separate experiments.

      (5) The authors compare results obtained in calyx with those obtained in SC->CA1 synapses which they considered examples for 'fast' and 'slow' synapses, respectively. There is little information given to help readers understand why these two synapse types were chosen, what the attributes 'fast' and 'slow' refer to, and how that may matter for the questions studied here. I assume the authors refer to the maximum frequency these two synapse types are able to transmit rather than to EPSC kinetics?

      Yes, the “fast” and “slow” labels refer to the maximum frequency at which these synapses can transmit. We reworded “fast and slow” to “fast-signaling and slow-plastic” and added an explanation in the text.

      (6) Strong presynaptic stimuli such as those illustrated in Figures 1B and C induce massive exocytosis. The illustrated Cm increase of 2 to 2.5 pF represents a fusion of 25,000 to 30,000 SVs (assuming a single SV capacitance of 80 aF) corresponding to a 12 to 15% increase in whole terminal membrane surface (assuming a mean terminal capacitance of ~16 pF). Capacitance measurements can only be considered reliable in the absence of marked changes in series and membrane conductance. Since the data shown in Figs. 1 and 3 are central to the argumentation, illustration of the corresponding conductance traces is mandatory. Merely mentioning that the first 450 ms after stimulation were skipped during analysis is insufficient.

      Conductance trace is shown with a trace of capacitance change induced by a square pulse in our previous paper (Yamashita et al, 2005 Science).
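
      The arithmetic behind comment (6) can be checked with a short sketch. It uses only the values stated by the reviewer (a Cm increase of 2 to 2.5 pF, an assumed single-vesicle capacitance of 80 aF, and an assumed mean terminal capacitance of ~16 pF); the function names are illustrative and not taken from the manuscript. The exact results are 25,000 to 31,250 vesicles and a 12.5 to 15.6% membrane increase, which the reviewer rounds to 25,000 to 30,000 and 12 to 15%.

```python
# Back-of-envelope check of the vesicle count implied by a capacitance jump.
# All constants are the reviewer's stated assumptions, not measured values.
SINGLE_SV_CAPACITANCE_F = 80e-18   # 80 aF per synaptic vesicle (assumed)
TERMINAL_CAPACITANCE_F = 16e-12    # ~16 pF mean terminal capacitance (assumed)


def vesicles_fused(delta_cm_farads: float) -> float:
    """Number of vesicles whose fusion would account for the Cm increase."""
    return delta_cm_farads / SINGLE_SV_CAPACITANCE_F


def fractional_membrane_increase(delta_cm_farads: float) -> float:
    """Cm increase as a fraction of the resting terminal capacitance."""
    return delta_cm_farads / TERMINAL_CAPACITANCE_F


if __name__ == "__main__":
    for delta_cm_pf in (2.0, 2.5):
        delta_cm = delta_cm_pf * 1e-12
        print(
            f"dCm = {delta_cm_pf} pF: "
            f"{vesicles_fused(delta_cm):,.0f} vesicles, "
            f"{100 * fractional_membrane_increase(delta_cm):.1f}% membrane increase"
        )
```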

      (7) It is essential for this study to preclude a contamination of the results with postsynaptic effects (AMPAR saturation and desensitization). AMPAR saturation limits the amplitudes of initial responses in EPSC trains and hastens the recovery from depression due to a 'ceiling effect'. AMPAR desensitization occludes paired-pulse facilitation and reduces steady-state responses during EPSC trains while accelerating the initial recovery from depression. The use of, for example, 1 mM kynurenic acid in the bath is a well-established strategy to attenuate postsynaptic effects at calyx synapses. All calyx EPSC recordings should have been performed under such conditions. Otherwise, recovery time courses and STP parameters are likely contaminated by postsynaptic effects. Since the effects of AMPAR saturation on EPSC_1 and desensitization on EPSC_ss may partially cancel each other, an unchanged relative STD in the presence of kynurenic acid is not necessarily a reliable indicator for the absence of postsynaptic effects. The use of kynurenic acid in the bath would have had the beneficial side effect of massively improving voltage-clamp conditions. For the typical values given in this MS (10 nA EPSC, 3 MOhm Rs) the expected voltage escape is ~30 mV corresponding to a change in driving force of 30 mV/80 mV=38%, i.e. initial EPSCs in trains are likely underestimated by 38%. Such large voltage escape usually results in unclamped INa(V) which was suppressed in this study by routinely including 2 mM QX-314 in the pipette solution. That approach does, however, not reduce the voltage escape.

      Glutamate released during AP-evoked EPSCs does not saturate or desensitize postsynaptic receptors at post-hearing calyces of Held (Ishikawa et al, 2002; Yamashita et al, 2003) although it does in pre-hearing calyces (Yamashita et al, 2009). In fact, as shown in Figure S3, our results are essentially the same with or without kynurenate.
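
      The voltage-escape figure quoted in comment (7) follows from Ohm's law, and a minimal sketch makes the steps explicit. It takes the reviewer's typical values (10 nA EPSC, 3 MOhm series resistance, 80 mV driving force) as given; the function names are hypothetical, and the calculation is a first-order estimate that ignores series-resistance compensation and the fact that the current itself shrinks as the driving force drops.

```python
# First-order estimate of voltage-clamp error from uncompensated series resistance.
# Inputs are the reviewer's typical values, used here as assumptions.

def voltage_escape_mv(epsc_na: float, series_resistance_mohm: float) -> float:
    """Voltage drop across the series resistance; nA * MOhm gives mV."""
    return epsc_na * series_resistance_mohm


def driving_force_loss(escape_mv: float, driving_force_mv: float) -> float:
    """Fraction of the nominal driving force lost to the voltage escape."""
    return escape_mv / driving_force_mv


if __name__ == "__main__":
    escape = voltage_escape_mv(epsc_na=10.0, series_resistance_mohm=3.0)
    loss = driving_force_loss(escape, driving_force_mv=80.0)
    print(f"voltage escape: {escape:.0f} mV, driving-force loss: {100 * loss:.1f}%")
```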

      (8) In the Results section (pages 7 and 8), the authors analyze the time course into STD during 100 Hz trains in the absence and presence of drugs. In the presence of drugs, an additional fast component is observed which is absent from control recordings. Based on this observation, the authors conclude that '... the mechanisms operate predominantly at the beginning of synaptic depression'. However, the consequences of blocking or slowing site clearing are expected to be strongly release-dependent. Assuming a probability of <20% that a fusion event occurs at a given release site, >80% of the sites cannot be affected at the arrival of the second AP even by a total arrest of site clearance simply because no fusion has yet occurred. That number decreases during a train according to (1-0.2)^n, where n is the number of the AP, such that after 10 APs, ~90% of the sites have been used and may potentially be unavailable for new rounds of release after slowing site clearance. Perhaps, the faster time course into STD in the presence of the drugs isn't related to site clearance?

      Enhanced depression at the beginning of stimulation indicates the block of rapid SV replenishment mechanism, which includes endocytosis-dependent site-clearance and scaffold-dependent vesicle translocation to release sites.
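
      The release-site argument in comment (8) rests on the expression (1-0.2)^n, which a few lines of code can evaluate. The sketch below assumes, as the reviewer does, an independent fusion probability per site per action potential; it uses 0.2 as the upper bound of the reviewer's "<20%", and the function name is illustrative only.

```python
# Fraction of release sites that have undergone at least one fusion event
# after n action potentials, assuming independent events with fixed probability.

def fraction_sites_used(release_probability: float, n_aps: int) -> float:
    """1 - (1 - p)^n: sites that have fused at least once after n APs."""
    return 1.0 - (1.0 - release_probability) ** n_aps


if __name__ == "__main__":
    p = 0.2  # upper bound of the reviewer's "<20%" per-site fusion probability
    for n in (1, 2, 5, 10):
        print(f"after {n:2d} AP(s): {100 * fraction_sites_used(p, n):.1f}% of sites used")
```

      With p = 0.2, only 20% of sites have been used when the second AP arrives, while roughly 89% have been used after 10 APs, in line with the reviewer's "~90%".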

      (9) In the Discussion (page 10), the authors present a calculation that is supposed to explain the reduced size of the second calyx EPSC in a 100 Hz train in the presence of Dynasore or Pitstop-2. Does this calculation assume that all endocytosed SVs are immediately available for release within 10 ms? Please elaborate.

      We do not assume rapid endocytosed vesicle reuse within 10 ms as it requires much longer time for glutamate refilling (7s at PT; Hori & Takahashi, 2012). Instead, already filled reserved vesicles can rapidly replenish release sites if sites are clean and scaffold works properly. Results shown in Figure S6 also indicate that block of vesicle transmitter refilling has no immediate effect on synaptic responses.

      (10) It is not clear, why the bafilomycin/folimycin data is presented in Fig. S5. The data is also not mentioned in the Discussion. Either explain the purpose of these experiments or remove the data.

      These v-ATPase blockers, which block vesicular transmitter refilling, are reported to enhance EPSC depression at hippocampal synapses at RT and 2 mM [Ca2+], presumably because of a lack of filled vesicles undergoing rapid vesicle recycling (e.g., kiss-and-run). We thought it important to determine whether these data have physiological relevance since such a mechanism might also regulate synaptic strength during repetitive transmission. However, our results did not support its physiological relevance. Since these results are not within our main questions, the negative results are shown in supplementary Figure 6 and explained in the last paragraph of the Results section (p11), but were not discussed further in the Discussion section.

      (11) The scheme in Figure 7 is not very helpful.

      We updated the scheme to summarize our conclusion that vesicle replenishment through endocytosis-dependent site-clearance and scaffold-dependent mechanism independently co-operate to strengthen synaptic efficacy during repetitive transmission at calyx fast-signaling synapses. However, endocytic site clearance is solely required to support facilitation at slow-plastic hippocampal SC-CA1 synapses.

      Recommendations for the authors:

      First, my deep apologies for the long delay in reviewing your paper. All reviewers are now in agreement that the paper has valuable new information, but some methods are not described well and some results appear to be incompatible with previous results in the literature. The discussion of previous literature is also incomplete and not well-balanced. With a fuller description of the methods and a strengthened discussion of the literature, this paper would be of great interest to neurobiologists and biophysicists working on synaptic vesicle recycling and short-term plasticity mechanisms. We ask that you address the comments and revise your paper before we can fully recommend the paper as being an important contribution with compelling evidence and a strong data set that supports the conclusions.

      We explained methods more explicitly. Apparent incompatibility with previous results is now explained and discussed with new supplementary data.

      Major:

      (1) In this study, the application of Dynasore and Pitstop-2 strongly decreased 100 Hz steady-state release at calyx synapses while - quite unexpectedly - it strongly accelerated recovery from depression. A previous study found that genetic ablation of dynamin-1 actually enhanced 300 Hz steady-state release while only little affecting recovery from depression (Mahapatra et al., 2016). A similar scenario holds for the Latrunculin-B effects: In this study, Latrunculin-B strongly increased steady-state depression while in Babu et al. (2020), Latrunculin-B did not affect steady-state depression. In Mahapatra et al. (2016), Latrunculin-B marginally enhanced steady-state depression. The authors need to make a serious attempt to explain all these seemingly contradicting results.

      Lack of change in the recovery from depression in dynamin-1 knockout mice by Mahapatra et al (2016) is consistent with results in Figure S4 in 2 mM [Ca2+], whereas accelerated recovery by Dynasore (Figure 2B2) is observed in 1.3 mM [Ca2+] suggesting that it is masked in 2 mM [Ca2+] but revealed in physiological [Ca2+] (p7, top paragraph). In both cases, however, recovery from STD is not prolonged unlike Hosoi et al (2009).

      The latrunculin issues are discussed in Results section with newly added Supplementary Figure S5 (p9-10).

      (2) The experimental conditions need to be better specified. It is not clear which recordings were obtained in 1.3 mM and which (if any?) in 2 mM external Ca. It is also unclear whether 'pooled data' are presented (obtained from control recordings and from separate recordings after pre-incubation with the respective drugs), or whether the data actually represent 'before'/'after' comparisons obtained from the same synapses after washing in the respective drugs. The exact protocol of drug application (duration of application/pre-incubation?, measurements after wash-out or in the continuous presence of the drugs?) needs to be clearly described in the methods and needs to be briefly mentioned in Results and/or Figure legends.

      We made these points clearer in Method section and Result section.

      (3) Please cite and discuss briefly previous papers that have shown fast endocytosis in the calyx of Held with membrane capacitance measurements like Renden and von Gersdorff, J Neurophysiology, 98:3349, 2007 and Taschenberger et al., Neuron, 2002. These papers first showed exocytosis and endocytosis kinetics in more mature (hearing) mice calyx of Held and at higher physiological temperatures.

      One of these papers, which is relevant to the present study, is cited on p4.

      (4) The findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the Calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

      We added discussions on the issue of latrunculin in Result section by quoting previous literatures (p9-10). Since there is no direct evidence (by vesicle imaging) for the presence of FRP and SRP, these definitions derived from voltage clamp step-depolarization studies are difficult to incorporate into the dissection of synaptic depression in physiological conditions.

      Reviewer #1 (Recommendations For The Authors):

      I have no major comments, but the following issues may be addressed.

      (1) The term "fast and slow" synapses may be relative and a bit confusing. I do not think hippocampal synapses are slow synapses.

      We have replaced “fast and slow” by “fast-signaling and slow-plastic” to represent their functions and added explanation in the text.

      (2) Off-target effects of pharmacological effects may be discussed. In this respect, bafilomycin experiments can be used to argue against the slow effects of vesicle cycling such as endocytosis, and vesicle mobilization. However, the effects on rapid vesicle mobilization cannot be excluded entirely. Because I cannot exclude the absence of off-target effects either (can be addressed by looking at single vesicle imaging at nano-scale, which is hard to do or looking at EM level quantitatively?), I feel this is a matter of discussion.

      It is possible that Dynasore might have unknown or off-target effects. However, the main conclusion is backed up by Pitstop-2.

      (3) Fig2 A2, B2 and Fig 4 A2 and B2. It is easier to plot the recovery only normalized to the initial value. Subtracting steady-state is somewhat confusing because the recovery looks faster after deeper depression, but this may be just apparent.

      We have given values for both types of plots in Table 2, which indicates no essential difference in the recovery parameters.

      Reviewer #2 (Recommendations For The Authors):

      Line 51: Rajappa et al. (2016) investigated clearance deficits in synaptophysin KO mice (not synaptobrevin).

      Corrected.

      Line 54: intersectin is introduced as AZ scaffold protein, although in most of the literature, it is referred to as an endocytic scaffold protein (also in the cited one, e.g. Sakaba et al. 2013). At least, this should be discussed.

      Since blockers of intersectin downstream protein activity has no effect on vesicle endocytosis (Figure 3 and Sakaba et al, 2013), we called it (presynaptic) scaffold protein instead of endocytic scaffold protein.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments

      Page 1, Title: I don't think the presented data address the role of the presynaptic scaffold in SV replenishment. In addition, 'SV replenishment' and 'site clearance' should not be used synonymously as it seems to be implied here.

      In this study our focus was on the downstream activity of scaffold protein intersectin and since block of its downstream effector proteins CDC42 and actin activities do not obstruct the endocytic activity (Fig 3, and Sakaba et al., 2013), instead of naming it as “endocytic scaffold protein”, we adopted “presynaptic scaffold protein”.

      We have corrected it in the text.

      Page 2, Abstract: Clarify 'physiologically optimized condition' here and elsewhere in the manuscript.

      Abstract: in physiologically optimized condition → in physiological temperature and Ca2+.

      Page 3, line 62: I don't think 'the site-clearance hypothesis is widely accepted'. There are very few models that implement such a mechanism. Examples would be Pan & Zucker (2009) Neuron and Lin, Taschenberger & Neher 2022 (PNAS) which could be cited.

      62: the site-clearance hypothesis is “widely accepted”→ “well supported”

      Page 3, line 77: Please clarify 'fast synapses'.

      77: fast synapses→fast-signaling synapses, added clarification in the text.

      Page 4, line 100: Please clarify 'in the maximal rate'.

      100: in the maximal rate→reached during 1-Hz stimulation.

      Page 6, line 136: Please clarify 'to reduce the gap'.

      136: To reduce the gap between these different results→To explore the reason for these different results

      Page 7, line 157: I don't consider ML141 and Latrunculin-B 'scaffold protein inhibitors'.

      157: scaffold protein inhibitors had no effect on→ reworded as “none of these inhibitors affected fast or slow endocytosis”.  

      Page 7, line 162: P-value missing.

      162: p < 0.001 added.

      Page 8, line 184: "Since both endocytic blockers and scaffold inhibitors enhanced synaptic depression with a similar time course" consider rephrasing. Sounds like you refer to the time course by which these drugs exert their effect after being applied.

      184: Since both endocytic blockers and scaffold inhibitors enhance synaptic depression with a similar time course→Since the enhancement of synaptic depression by endocytic blockers or scaffold inhibitor occurred mostly at the early phase of synaptic depression.

      Same on page 11, line 250: "At the calyx of Held, scaffold protein inhibitors significantly enhanced synaptic depression with a time course closely matching to that enhanced by endocytic blocker" Please consider rephrasing.

      At the calyx of Held, scaffold protein inhibitors significantly enhanced synaptic depression with a time course closely matching to that enhanced by endocytic blocker →the early phase of synaptic depression like endocytic blockers

      Page 13, line 318: Please clearly state which experiments were performed at 1.3 mM and which at 2 mM external Ca if two different concentrations were used during recordings.

      320: Added text “Unless otherwise noted, EPSCs were recorded in 1.3 mM [Ca2+] aCSF at 37°C” in the methods.

      Page 15: line 346: Reference in the wrong format.

      346: (25) → (Yamashita et al, 2005)

      Page 15: line 351: Do you mean to say every 10 s and every 20 s? Please clarify.

      No, averaged at 10 ms and 20 ms, respectively as written.

      Page 16, line 369: 1 mM kyn was present in only very few experiments shown in the supplemental figures. Please clarify.

      368: In some experiments, we tested, in the presence of 1 mM kyn, whether there is any difference in enhanced STD following endocytic block. However, as shown in Figure S3, our results are essentially the same with or without kynurenate, suggesting that glutamate released during AP-evoked EPSCs does not saturate or desensitize postsynaptic receptors at post-hearing calyces of Held (Ishikawa et al, 2002; Yamashita et al, 2003), unlike in pre-hearing calyces (Yamashita et al, 2009).

      Page 16, line 387: You cannot simply use multiple t-tests to compare a single control to multiple test conditions which seems to be the scenario here. Please correct or clarify.

      Experimental protocols are clarified in Methods as “Experiments were designed as population study using different cells from separate brain slices under control and drug treatment, rather than on a same cell before and after the drug exposure.”

      Table S1: 'Endo decay rate'. It's either the 'Endo rate' or the 'Decay rate of delta Cm'. Please correct.

      Corrected as Endocytosis rate (Endo rate).

    2. eLife assessment

      Following synaptic vesicle fusion events at release sites, vesicle remnants will need to be cleared in order to allow new rounds of vesicle docking and fusion. This fundamental study of Mahapatra and Takahashi examines the role of release site clearance in synaptic transmission during repetitive activity in two types of central synapses, the giant calyx of Held and hippocampal CA1 synapses. The study uses pharmacological approaches to interfere with release site clearance by blocking membrane retrieval (endocytosis). They compare the effects on short-term plasticity with those obtained by pharmacologically inhibiting scaffold protein activity. The data presented make a compelling case for fast endocytosis as necessary for rapid site clearance and vesicle recruitment to active zones. The data reveal an unexpected, fast role for local site clearance in counteracting synaptic depression.

    3. Reviewer #1 (Public Review):

      Summary:

      The study examines the role of release site clearance in synaptic transmission during repetitive activity under physiological conditions in two types of central synapses, calyx of Held and hippocampal CA1 synapses. After acute block of endocytosis by pharmacology, deeper synaptic depression or less facilitation was observed in two types of synapses. Acute block of CDC42 and actin polymerization, which possibly inhibits the activity of Intersectin, affected synaptic depression at the calyx synapse, but not at CA1 synapses. The data suggest an unexpected, fast role of the site clearance in counteracting synaptic depression.

      Strengths:

      The study uses acute block of the molecular targets with pharmacology together with precise electrophysiology. The experimental results are clear-cut and convincing. The study also examines the physiological roles of site clearance using action potential-evoked transmission at physiological Ca2+ and physiological temperature in mature animals. This condition has not been examined previously.

      Weaknesses:

      Pharmacology may have some off-target effects, though acute manipulation should be appreciated and the authors have tried several reagents to verify the overall conclusions.

    4. Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Mahapatra and Takahashi report on the physiological consequences of pharmacologically blocking either clathrin and dynamin function during compensatory endocytosis or the cortical actin scaffold, both in the calyx of Held synapse and in hippocampal boutons in acute slice preparations.

      Strengths:

      Although many aspects of these pharmacological interventions have been studied in detail during the past decades, this is a nice comprehensive and comparative study, which reveals some interesting differences between a fast synapse (calyx of Held) tuned to reliably transmit at several hundred Hz and a slower hippocampal CA1 synapse. In particular, the authors find that acute disturbance of the synaptic actin network leads to a marked frequency-dependent enhancement of synaptic depression in the calyx, but not in the hippocampal synapse. This striking difference between both preparations is the most interesting and novel finding.

      Weaknesses:

      Unfortunately, however, these findings concerning the different consequences of actin depolymerization are not sufficiently discussed in comparison to the literature. My only criticism concerns the interpretation of the ML 141 and Lat B data. With respect to the calyx data, I am missing a detailed discussion of the effects observed here in light of the different RRP subpools SRP and FRP. This is very important since Lee et al. (2012, PNAS 109 (13) E765-E774) showed earlier that disruption of actin inhibits the rapid transition of SRP SVs to the FRP at the AZ. The whole literature on this important concept is missing. Likewise, the role of actin for the replacement pool at a cerebellar synapse (Miki et al., 2016) is only mentioned in half a sentence. There is quite some evidence that actin is important both at the AZ (SRP to FRP transition, activation of replacement pool) and at the peri-active zone for compensatory endocytosis and release site clearance. Both possible underlying mechanisms (SRP to FRP transition or release site clearance) should be better dissected.

    5. Reviewer #3 (Public Review):

      The manuscript by Mahapatra and Takahashi addresses the role of presynaptic release site clearance during sustained synaptic activity. The authors characterize the effects of pharmacologically interfering with SV endocytosis (pre-incubation with Dynasore or Pitstop-2) on synaptic short-term plasticity (STP) at two different CNS synapses (calyx of Held synapses and hippocampal SC to CA1 synapses) using patch-clamp recordings in acute slices under experimental conditions designed to closely mimic a physiological situation (37 °C and 1.3 mM external [Ca2+]). Endocytosis blocker-induced changes in STP and in the recovery from short-term depression (STD) are compared to those seen after pharmacologically inhibiting actin filament assembly (pre-incubation with Latrunculin-B or the selective Cdc42 GTPase inhibitor ML-141). Presynaptic capacitance (Cm) recordings in calyx terminals were used to establish the effects of the pharmacological maneuvers on SV endocytosis.

      Latrunculin-B and ML-141 affect neither SV endocytosis (assayed by Cm recordings) nor EPSC recovery following conditioning trains, but strongly enhance STD at calyx synapses. No changes in STP were observed at Latrunculin-B- or ML-141-treated SC to CA1 synapses.

      Dynasore and Pitstop-2 slow down endocytosis, limit the total amount of exocytosis in response to long stimuli, enhance STD in response to 100 Hz stimulation, but profoundly accelerate EPSC recovery following conditioning 100 Hz trains at calyx synapses. At SC to CA1 synapses, Dynasore and Pitstop-2 reduce the extent of facilitation and lower relative steady-state EPSCs, suggesting a change in the facilitation-depression balance in favor of the latter.

      The authors use state-of-the-art techniques, and their data, which are clearly presented, lead the authors to conclude that endocytosis is universally important for clearance of release sites while the importance of scaffold protein-mediated site clearance is limited to 'fast synapses'.

      Unfortunately, and perhaps not completely unexpected in view of the pharmacological tools chosen, there are several observations which remain difficult to understand:

      (1) Blocking site clearance affects release sites that have previously been used, i.e. sites at which SV fusion has occurred and which therefore need to be cleared. Calyces use at most 20% of all release sites during a single AP, likely fewer at 1.3 mM external [Ca2+]. Even if all those 20% of release sites become completely unavailable due to a block of release site clearance, the 2nd EPSC in a train should not be reduced by >20% because ~80% of the sites cannot be affected. However, ~50% EPSC reduction was observed (Fig. 2B1, lower right panel), raising the possibility that Dynasore does more than specifically interfere with SV endocytosis (and possibly Pitstop as well). Non-specific effects are also suggested by the observed two-fold increase in initial EPSC size in SC to CA1 synapses after Dynasore pre-incubation.

      (2) More severe depression was observed at calyx synapses after blocking endocytosis which the authors attribute to a presynaptic mechanism affecting pool replenishment. When probing EPSC recovery after conditioning 100 Hz trains, a speed up was observed mediated by an "unknown mechanism" which is "masked in 2 mM [Ca2+]". These two observations, deeper synaptic depression during 100 Hz but faster recovery from depression following 100 Hz, are difficult to align and no attempt was made to find an explanation.

      (3) To reconcile previous data reporting a block of Ca2+-dependent recovery (CDR) by Dynasore or Latrunculin (measured at 2 mM external [Ca2+]) with the data presented here (using 1.3 mM external [Ca2+]) reporting no effect or a speed-up of recovery from depression, the authors postulate that "CDR may operate only when excessive Ca2+ enters during massive presynaptic activation" (page 10 line 244). While that is possible, such an explanation ignores plenty of calyx studies demonstrating fiber stimulation-induced CDR and elucidating molecular pathways mediating fiber stimulation-induced CDR, and it also completely dismisses the strong change in recovery time course after 10 Hz conditioning (single exponential) as compared to 100 Hz conditioning (double exponential with a pronounced fast component).
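
      The upper bound invoked in point (1) can be written out as a short calculation. The sketch below is illustrative only: it assumes that all release sites contribute equally to the EPSC and that sites not used during the first AP are unaffected by a clearance block. The function names are hypothetical, and the inputs are the approximate figures quoted in the comment (about 20% of sites used, about 50% observed reduction).

```python
def max_reduction_from_clearance_block(fraction_sites_used: float) -> float:
    """Largest fractional drop of the 2nd EPSC if every used site stays blocked."""
    if not 0.0 <= fraction_sites_used <= 1.0:
        raise ValueError("fraction_sites_used must lie in [0, 1]")
    # Only sites that released during the first AP need clearing, so at most
    # that fraction of the response can be lost (equal-site assumption).
    return fraction_sites_used


def exceeds_clearance_bound(observed_reduction: float,
                            fraction_sites_used: float) -> bool:
    """True if the observed drop is larger than a pure clearance block allows."""
    return observed_reduction > max_reduction_from_clearance_block(fraction_sites_used)


# Approximate values taken from the reviewer's comment, not measured data.
bound = max_reduction_from_clearance_block(0.20)
print(f"maximum reduction explained by site clearance: {bound:.0%}")
print("observed ~50% reduction exceeds bound:",
      exceeds_clearance_bound(0.50, 0.20))
```

      Under these simplifying assumptions, a reduction larger than the fraction of sites used cannot be attributed to blocked clearance alone, which is the core of the reviewer's argument.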

      Strong presynaptic stimuli such as those illustrated in Figs. 1B,C induce massive exocytosis. The illustrated Cm increase of 2 to 2.5 pF represents fusion of 25,000 to 30,000 SVs (assuming a single SV capacitance of 80 aF) corresponding to a 12 to 15% increase in whole terminal membrane surface (assuming a mean terminal capacitance of ~16 pF). Capacitance measurements can only be considered reliable in the absence of marked changes in series and membrane conductance. Documentation of the corresponding conductance traces is therefore advisable for such massive Cm jumps and merely mentioning that the first 450 ms after stimulation were skipped during analysis or referring to previous publications showing conductance traces is insufficient.

      All bar graphs in Figures 1 through 6 and Figures S3 through S6 compare three or even four (Fig. 5C) conditions, i.e. one control and at least two treatment data sets. It appears as if repeated t-tests were used to run multiple two-group comparisons (i.e. using the same control data twice for two different comparisons). Either a proper multiple comparison test should be used or a Bonferroni correction or similar multiple-comparison correction needs to be applied.
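
      The correction requested above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: the p-values are invented, the function names are hypothetical, and the Holm step-down procedure is shown only as one example of a "similar" correction. A dedicated many-to-one procedure such as Dunnett's test would be another option for comparing several treatments against a single control.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values: List[float]) -> List[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for two treatment-vs-control t-tests.
raw = [0.01, 0.04]
print("Bonferroni:", bonferroni(raw))
print("Holm:      ", holm(raw))
```

      With two comparisons against the same control, a raw p-value of 0.04 no longer falls below 0.05 after Bonferroni adjustment, which illustrates why uncorrected repeated t-tests can overstate significance.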

      Finally, the terminology of contrasting "fast-signaling" (calyx synapses) and "slow-plastic" (SC synapses) synapses seems to imply that calyx synapses lack plasticity, as does the wording "conventional bouton-type synapses involved in synaptic plasticity" (page 11, line 251). I assume the authors primarily refer to the maximum frequencies these two synapse types typically transmit (fast-signaling vs slow-signaling)?

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major change:

      All three of our reviewers raised the possibility that changes in movement during the time spent at the center ports could have contributed to changes in SWR rates. Analyses to address this possibility, based on the examination of trials with high and low speeds, were originally included in the supplement but we did not sufficiently highlight and explain these results. To rectify this, we have moved these results into a new main Figure 3 and now include a paragraph describing our interpretation of these results (page 9). We also include a more detailed description of the subjects’ behavior during port times – namely, that all subjects must remain quite stationary while at the reward ports in order to keep their nose in a specific position which keeps the port triggered. As a result, all subjects maintain head speeds well below our typical speed threshold for immobility while at the ports. This leads us to predict that any feedback based on periods of immobility alone (as requested by Reviewer 3) would show results very similar to our Control cohort and would not alter SWR rates seen during neurofeedback trials.

      Minor changes:

      (1) Reviewer 1 observed that our reported statistics appeared to be missing an interaction term showing that neurofeedback differentially affected the SWR rate/count pre- and post-reward. We apologize for a lack of clarity here: we fit pre- and post-reward times with separate linear mixed effects models, so this interaction term is neither expected nor defined in our model. We have added a sentence clarifying this aspect of our LME approach in the Methods section: “Each model is designed to compare samples from all trials of the control group to samples from neurofeedback and delay trials from the neurofeedback cohort for a specific time period (for instance, pre-reward-delivery at the center ports).” Combining both times in the same model would require adding an additional hierarchical level in order to preserve the pairing of the pre- and post-reward time period for each trial, which we are concerned would complicate the formulation and interpretation of the model. However, the reviewer raises a good point that the comparison between these two time periods reveals an additional difference between the trial types: SWR rate remains relatively consistent between the pre- and post-reward periods during neurofeedback trials, while delay and control trials show a clear increase in SWR rate between the two time periods. To visualize and quantify this effect, we calculated the difference in SWR rates between the two time periods and now include this plot as Supplementary Figure 2F, which is referenced on page 8 of the main text.

      (2) Reviewer 2 found our original title, “Neurofeedback training can modulate task-relevant memory replay in rats”, to be misleading and suggestive of a manipulation to memory content. We are in complete agreement with the Reviewer that our manipulation does not alter replay content, so, to be more specific and accurate, we have changed our title to their suggestion: “Neurofeedback training can modulate task-relevant memory replay rate in rats”.

      (3) Reviewer 2 also requested that we include analyses quantifying baseline SWR rates for each of our experimental subjects. Although we initially considered reporting our results in measures of change relative to each individual animal’s baseline, we decided against this approach for several reasons.

      First, it is important to clarify that we extensively train the animals on the task prior to implant, so we do not have access to a truly naïve, pre-behavior baseline SWR rate for any of our subjects. However, because the pre-implant training is conducted consistently between our neurofeedback and our control cohort, we have no reason to believe that the behavioral training prior to implant would introduce differences in SWR rate between the cohorts. Indeed, we find no difference in post-reward SWR rate (or SWR rate at the home well) when we quantify the first 250 trials of post-implant behavior for each subject (see panel A below). Note that we cannot compare the pre-reward SWR rate at this point, because it is influenced by the task structure which guarantees at least one SWR in each neurofeedback trial pre-reward.

      Further, we do find that SWR rate is quite consistent over many days of task performance in the control cohort (shown for the post-reward period in panel B below). This suggests that comparing the post-neurofeedback training SWR rates for the neurofeedback cohort to SWR rates throughout the training for the control cohort is not likely to be confounded by differing amounts of training experience. This is supported by our analyses in Figure 2 which show no differences in SWR rate between the two cohorts when considering pre- and post-reward times combined.

      Author response image 1.

      (A) SWR rate calculated during the post-reward period at the center port for the first 250 trials of post-implant behavior for each animal. Trials of all types are included (i.e., both neurofeedback trials and delay trials for the manipulation cohort). Groupwise comparison p=0.192. (B) Mean SWR rate during the post-reward period at the center port for each behavioral training epoch shows no systematic change over time across subjects within the control cohort.

      Finally, within each cohort, we found the overall SWR rates to be quite consistent across animals. If each subject in the neurofeedback cohort had shown dramatically different SWR rates at the beginning of neurofeedback training, we would have needed to express the effect of neurofeedback training relative to baseline for each animal. However, since the ranges of SWR rates were highly comparable, we felt that it was more accessible, and easier to place our results within the context of the literature, to express our results as simple SWR rates rather than as measures of relative change. Within the neurofeedback cohort, comparing neurofeedback to delay trials is inherently matched for baseline SWR rate since these comparisons are made within the same animal.

      (4) Finally, Reviewer 2 raises the possibility that older animals or those with cognitive deficits might respond to neurofeedback differently. We entirely agree with this possibility, and note this in our Discussion section: “Since the neurofeedback paradigm depends on the occurrence of at least a low endogenous rate of SWR occurrence, it would be important to implement neurofeedback training as a relatively early interventional strategy prior to extensive neurodegeneration, and training may take longer in aged or impaired subjects.”

    2. eLife assessment

      This study tests the effects of using neurofeedback, in the form of reward delivery when large sharp wave-ripples (SWRs) are detected, on neurophysiological and behavioral measures. The results are important, and the authors provide convincing evidence that the rate of SWRs increased prior to reward delivery and decreased in the period after reward delivery, with no significant effect on memory performance. The ability to manipulate SWR rate in a naturalistic way is an exciting new tool for studies that seek to understand the function of SWRs.

    3. Reviewer #2 (Public Review):

      Gillespie et al. introduced a novel neurofeedback (NF) procedure to train rats in enhancing their sharp-wave ripple (SWR) rate within a short duration, a key neural mechanism associated with memory consolidation. The training, embedded within a spatial memory task, spanned 20-30 days and utilized food rewards as positive reinforcement upon SWR detection. Rats were categorized into NF and control groups, with the NF group further divided into NF and delay trials for within-subject control. While single trial differences were elusive due to the variability of SWR occurrence, the study revealed that statistically rats in NF trials exhibited a notably higher SWR rate before receiving rewards compared to delay trials. This difference was even more pronounced when juxtaposed with rats not exposed to NF training (control group). The unique design of blending the NF phase with the memory-dependent spatial task enabled the authors to analyze whether the NF training influenced the task performance and replay content during SWRs across three different conditions (NF trials, delay trials and control group). Interestingly, despite the NF training, there was no significant improvement or decline in the performance of the spatial memory task, and the replay content remained consistent across all three conditions. Hence, the operant conditioning only amplified the SWR rate before reward in NF trials without altering the task performance and the replay content during SWR. Moreover, considering the post-reward period, the total SWR count was consistent across all conditions as well, meaning the NF training also does not affect the total SWR count. The study concludes with the hypothesis of a potential homeostatic mechanism governing the total SWR production in rats. This research significantly extends previous work by Ishikawa et al. (2014), offering insights into the NF training with external reward on the SWR rate/counts, replay content and task performance.

      Strengths:

      - Integration of NF task and spatial memory task in a single trial
        The integration of NF training within a spatial memory task poses significant challenges. Gillespie and colleagues overcame this by seamlessly blending the NF task and the spatial memory task into a single trial. Each trial involved a rat undergoing three steps: First, initiating a trial. Second, moving to either the NF port or the delay trial port, as indicated by an LED, and then maintaining a nosepoke at one of the center ports. During this step, the rat had to keep its nose (in the NF port) until a sharp-wave ripple (SWR) exceeding a set threshold was detected, which then triggered a reward, or until a variable time elapsed (in the delay port). Third, the rat would choose one of eight arms to explore before starting the next trial. This integration of the two tasks (step two as the NF task and step three as the spatial memory task) facilitated a direct analysis of the impact of NF training on behaviorally relevant replay content during SWRs and the performance in the spatial memory task.

      - Clear Group Separation
        A robust study design necessitates clear distinctions between experimental conditions to ensure that observed differences can be attributed to the variable under investigation. This study meticulously categorized rats into three distinct conditions: NF trials, delay trials (for within-subject control), and a control group (for across-subject control). Furthermore, for each trial, the times of interest (TOI) were separated into pre-reward and post-reward periods. This clear separation ensures that any observed differences in SWR rates and other outcomes can be confidently attributed to the effects of neurofeedback training during specific time periods, minimizing potential confounding factors.

      - Evidence of SWR rate modulation
        The study's results offer compelling evidence that rats can be trained to modulate their SWR rates during the pre-reward period. This is evident from the observation that rats in the NF trials consistently displayed a higher SWR rate before receiving rewards compared to those in delay trials or the control group (Fig. 2). Such findings not only validate the efficacy of the NF paradigm but also underscore the potential of operant conditioning in influencing neural mechanisms. The observation that rats were able to produce larger SWR events by modulating their occurrence rate, rather than merely waiting for these events, suggests a learned strategy to generate them more efficiently.

      - Evidence of SWR count homeostasis
        A notable finding from the study was the observation of a consistent total SWR count during both pre-reward and post-reward periods across all conditions, despite the evident increase in SWR rates during the pre-reward period in NF trials. This points to a potential homeostatic mechanism governing SWR production in rats. This balance suggests that while NF training can modulate the timing and rate of SWRs over a short duration, it doesn't influence the overall count of SWRs over a longer period. Such a mechanism might be essential in ensuring that the brain neither overcompensates nor depletes its capacity for SWRs, maintaining the overall neural balance and functionality. This discovery deepens our understanding of neural mechanisms and highlights potential avenues for future research into the regulatory processes governing neural activity.
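
      The distinction between SWR rate and total SWR count drawn in the strengths above can be made concrete with a toy calculation. The sketch below uses invented event times and hypothetical function names; it is not the authors' pipeline and only illustrates how two trials can share the same total count while differing in pre-reward rate.

```python
from typing import Sequence


def swr_count(event_times: Sequence[float], start: float, stop: float) -> int:
    """Number of SWR events with start <= time < stop (times in seconds)."""
    return sum(start <= t < stop for t in event_times)


def swr_rate(event_times: Sequence[float], start: float, stop: float) -> float:
    """SWR events per second within the window [start, stop)."""
    duration = stop - start
    if duration <= 0:
        raise ValueError("stop must be later than start")
    return swr_count(event_times, start, stop) / duration


# Invented event times; reward is assumed to be delivered at t = 2 s,
# with a 2 s pre-reward window and a 4 s post-reward window.
REWARD, END = 2.0, 6.0
nf_events = [0.5, 1.0, 1.5, 4.0]      # hypothetical neurofeedback trial
delay_events = [1.0, 3.0, 4.0, 5.0]   # hypothetical delay trial

for name, events in (("NF", nf_events), ("delay", delay_events)):
    print(name,
          "pre rate:", swr_rate(events, 0.0, REWARD),
          "post rate:", swr_rate(events, REWARD, END),
          "total count:", swr_count(events, 0.0, END))
```

      In this toy example the pre-reward rate is higher in the neurofeedback trial, the post-reward rate is lower, and the total count over the whole window is identical, which mirrors the pattern the reviewer describes as consistent with homeostatic regulation.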

      In this revision, the paper explores a neurofeedback technique in rats that modulates hippocampal sharp-wave ripple (SWR) rates, crucial for memory replay, without altering the content of those replays. The study demonstrates that neurofeedback can specifically increase SWR rates during a task's pre-reward phase. Revisions address concerns about movement's impact on SWR rates, clarify the statistical approach used, and modify the title for accuracy, now emphasizing the modulation of memory replay rates rather than suggesting alterations to memory content itself. I think all the concerns in the previous version have been addressed.

    4. Reviewer #3 (Public Review):

      Summary:

      This study implements an innovative neurofeedback procedure in rats, providing food reward upon detection of a sharp wave-ripple event (SWR) in the hippocampus. The elegant experimental design enables a within-animal comparison of the effects of this neurofeedback procedure as compared to a control condition in which equivalent reward is provided in a non-contingent manner. The neurofeedback procedure was found to increase SWR rate, followed by a compensatory reduction in SWR rate. These changes in SWR rate were not accompanied by any changes in memory performance on the memory-guided task.

      Strengths:

      The scientific premise for the study is outstanding. It addresses an issue of high importance, of developing ways to not merely describe correlations between SWRs (and their content) and memory performance, but to manipulate them. The authors argue clearly and convincingly that even studies that have performed causal manipulations of SWRs have important confounds and limitations, and most importantly for translational purposes, they are all invasive. So, the idea of developing a potentially non-invasive neurofeedback procedure for modulating SWRs is compelling both as an innovative new experimental manipulation in studies of SWRs, and as a potentially impactful therapeutic avenue.

      In addition to addressing an important issue with an innovative approach, the study has many other strengths. The data unambiguously show that the method is effective at increasing SWR rate in each individual subject. The experimental design allows within-subject comparison of neurofeedback and control trials, where the subjects wait an equivalent amount of time. The careful analyses of SWR properties and their content establish that neurofeedback SWRs are comparable to control SWRs. The data add further evidence to the notion that SWR rate is subject to homeostatic control. The paper is also exceptionally well written, and was a pleasure to read. So, there is a clear technical advance, in that there is now a method for increasing SWR rate non-invasively, which is rigorously established and characterized.

      Weaknesses:

      The one overall limitation I find with this study is that it is unclear to what extent the same (or better) results could have been obtained using behavior-feedback instead of neuro-feedback. Because SWR rates are generally higher during states of quiescence compared to active movement or task engagement, it is possible that reinforcing behaviorally detected quiescent states (e.g. low movement) would indirectly increase SWR rates. The authors include an important control analysis showing higher SWR rates in the neurofeedback condition even when movement speed is controlled for by subsetting the data, demonstrating that changes in movement speed cannot be the only explanation of the results. At the same time, the observation that all 4 subjects had lower movement speeds during neurofeedback compared to control trials suggests that neurofeedback is likely reinforcing both overt (behavior) and covert (SWR) processes. Understanding the relative contributions of each to the observed SWR increase would help clarify whether the neurofeedback approach is worth the additional effort and expense compared to behavioral feedback.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      (1) The authors' primary research question revolves around the inquiry of "how far in advance semantic information might become available from parafoveal preview." In contrast to prior studies, the current research seeks to achieve a breakthrough in terms of timing by employing innovative technology. They mention in the manuscript that "most of these studies have been limited to measuring parafoveal preview from fixations to an immediately adjacent word... We tackle these core issues using a new technique that combines the use of frequency tagging and the measurement of magnetoencephalography (MEG)-based signals." However, the argumentation for how this new technology constitutes a breakthrough is not sufficiently substantiated. Specifically, there are two aspects that require further clarification. Firstly, the authors should clarify the importance of investigating the timing of semantic integration in their research question. They need to justify why previous studies focusing on the preview effect during fixations to an immediately adjacent word cannot address their specific inquiry about "how far in advance semantic information might become available from parafoveal preview," which requires examining parafoveal processing (POF). Secondly, in terms of the research methodology, the authors should provide a more comprehensive explanation of the advantages offered by MEG technology in the observation of the timing of semantic integration compared to the techniques employed in prior research. Indeed, the authors have overlooked some rather significant studies in this area. For instance, the research conducted by Antúnez, Milligan, Hernández-Cabrera, Barber, & Schotter in 2022 addresses the same research question mentioned in the current study and employs a similar experimental design. Importantly, they utilize a natural reading paradigm with synchronized ERP and eye-tracking recordings. 
      Collectively, these studies, along with the series of prior research studies employing ERP techniques and RSVP paradigms discussed by the authors in their manuscript, provide ample evidence that semantic information becomes available and integrated from words before fixation occurs. Therefore, the authors should provide a more comprehensive citation of relevant research and delve deeper into explaining the potential contributions of their chosen technology to this field.

      We express our gratitude to the reviewer for providing insightful comments. Firstly, we clarify the advantages of the RIFT technique. The revised paragraph is on Page 4 with tracked changes and is copied as follows:

      “…… The RIFT technique provides a notable advantage by generating a signal — the tagging response signal — specifically yoked to just the tagged word. This ensures a clear separation in processing the tagged word from the ongoing processing of other words, addressing a challenge faced by eye tracking and ERP/FRP approaches. Moreover, RIFT enables us to monitor the entire dynamics of attentional engagement with the tagged word, which may begin a few words before the tagged word is fixated.”

      We also rephrased our research questions in the introduction section on Page 5 with tracked changes:

      “This paradigm allows us to address three questions. First, we aimed to measure when in the course of reading people begin to direct attention to parafoveal words. Second, we sought to ascertain when semantic information obtained through parafoveal preview is integrated into the sentence context. Modulations of pre-target RIFT responses by the contextual congruity of target words would serve as evidence that parafoveal semantic information has not only been extracted and integrated into the sentence context but that it is affecting how readers allocate attention across the text. Third, we explored whether these parafoveal semantic attention effects have any relationship to reading speed.”

      Secondly, we would like to elucidate the significance of investigating the timing of semantic integration and why this complements existing findings of parafoveal processing (POF) during reading. Our manuscript has been revised accordingly, with specific modifications highlighted on Page 2. The revised passage reads as follows:

      “…… eye tracking-based evidence for the extraction of parafoveal semantic information …… was eventually extended into English …… For example, Schotter and Jia (2016) showed preview benefits on early gaze measures for plausible compared to implausible words, even for plausible words that were unrelated to the target. These results demonstrate that semantic information can indeed be extracted from parafoveal words. However, due to the limitations of the boundary paradigm, which only assesses effects after target words have been fixated, it is challenging to precisely determine when and how parafoveal semantic processing takes place. Furthermore, it is generally hard to distinguish between the effects of cross-saccade integration (e.g., mismatch between the preview and the word fixated) and the effects of how differing words fit into the context itself (Veldre and Andrews, 2016a, 2016b).”

      Thirdly, we now better highlight the contributions of the Antúnez et al. paper, as they have provided important evidence for parafoveal semantic processing during natural reading. The relevant modifications are highlighted on Page 3. The revised passage is as follows: “Although many of these effects have been measured in the context of unnatural reading paradigms (e.g., the “RSVP flanker paradigm”), similar effects obtain during natural reading. Using the stimuli and procedures from Schotter and Jia (2016), Antúnez et al. (2022) showed that N400 responses, measured relative to the fixation before the target words (i.e., before the boundary change while the manipulated words were in parafoveal preview), were sensitive to the contextual plausibility of these previewed words. These studies suggest that semantic information is available from words before they are fixated, even if that information does not always have an impact on eye fixation patterns.”

      References:

      Schotter ER, Jia A. 2016. Semantic and plausibility preview benefit effects in English: Evidence from eye movements. J Exp Psychol Learn Mem Cogn 42:1839–1866. doi:10.1037/xlm0000281

      Veldre A, Andrews S. 2016a. Is Semantic Preview Benefit Due to Relatedness or Plausibility? J Exp Psychol Hum Percept Perform 42:939–952. doi:10.1037/xhp0000200

      Veldre A, Andrews S. 2016b. Semantic preview benefit in English: Individual differences in the extraction and use of parafoveal semantic information. J Exp Psychol Learn Mem Cogn 42:837–854. doi:10.1037/xlm0000212

      Antúnez M, Milligan S, Andrés Hernández-Cabrera J, Barber HA, Schotter ER. 2022. Semantic parafoveal processing in natural reading: Insight from fixation-related potentials & eye movements. Psychophysiology 59:e13986. doi:10.1111/PSYP.13986

      (2) Further, the authors emphasize semantic integration in their observed results but overlook the intricate relationship between access, priming, and integration. This assertion appears overly confident. Despite using low-constraint sentences and low-predicted targets (lines 439-441), differences between congruent and incongruent conditions may be influenced by word-level factors. For instance, in the first coherent sentence, such as "Last night, my lazy brother came to the party one minute before it was over" (line 1049), replacing the keyword "brother" with an incongruent word could create an incoherent sentence, possibly due to semantic violation, relation mismatch with "lazy," or prediction error related to animate objects. A similar consideration applies to the second example sentence, "Lily says this blue jacket will be a big fashion trend this fall" (line 1050), where the effect might result from a discrepancy between "blue" and an incongruent word. However, the authors do not provide incongruent sentences to substantiate their claims. I recommend that the authors discuss alternative explanations and potentially control for confounding factors before asserting that their results unequivocally reflect semantic integration. My intention is not to dispute the semantic integration interpretation but to stress the necessity for stronger evidence to support this assertion.

      We agree with the reviewer that stimulus control is very critical for this kind of work and apologize for the lack of clarity in the original manuscript.

      (1) We fully agree that word-level factors can be an important confound, which is why we carefully controlled word-level factors in the experimental design. As detailed in the Appendix of the original manuscript, each pair of target words has been strategically embedded into two sentences, allowing for the creation of both congruent and incongruent sentence pairs through the interchange of these words. We now have explicitly specified this design in all sentences, as reflected in the edited manuscript on Page 38. For example, considering the exemplar pair of “brother/jacket”,

      “Last night, my lazy brother/jacket came to the party one minute before it was over.

      Lily says this blue jacket/brother will be a big fashion trend this fall.”

      In this design, the pair of target words is presented in both congruent and incongruent sentences. Participant A reads “lazy brother” and “blue jacket”, while Participant B reads “lazy jacket” and “blue brother”. This approach ensures that the same target words appear in both congruent and incongruent conditions across participants, serving as an effective control for word-level factors.

      (2) We acknowledge that the consideration of word-level information is crucial when making claims about contextual integration in the current study. However, we don’t think there are many cases in the stimulus set where a single feature like animacy is enough to create the mismatch. Instead, the stimuli were written so that it is not possible to strongly predict any word or even a specific semantic feature, so that appreciating the mismatch requires the comprehender to integrate the word into the context (and especially to integrate the word with the immediately preceding one). However, this more local modifier/noun plausibility may behave differently from a more global contextual plausibility, which is a limitation of the stimulus set and has been discussed in the revised manuscript, as indicated by the tracked changes on Page 16, as copied below:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206
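
      The word-swapping design described in point (1) of this response can be illustrated with a minimal sketch. It is not the authors' stimulus-generation code: the function name `build_lists` and the constants `FRAMES` and `TARGETS` are hypothetical, and only the single "brother/jacket" pair quoted above is used. How conditions were distributed across the full stimulus set is not specified here, so the sketch simply mirrors the Participant A / Participant B example.

```python
# Hypothetical illustration of the counterbalancing described in the response.
# Names and structure are assumptions, not taken from the study's materials.

FRAMES = [
    "Last night, my lazy {} came to the party one minute before it was over.",
    "Lily says this blue {} will be a big fashion trend this fall.",
]
# TARGETS[i] is the word that is congruent with FRAMES[i].
TARGETS = ("brother", "jacket")


def build_lists(frames, targets):
    """Return two lists of (sentence, condition) for one target-word pair.

    List A places each word in its congruent frame; list B swaps the two
    words, so the same words occur in the incongruent condition.
    """
    list_a, list_b = [], []
    for i, frame in enumerate(frames):
        list_a.append((frame.format(targets[i]), "congruent"))
        list_b.append((frame.format(targets[1 - i]), "incongruent"))
    return list_a, list_b


list_a, list_b = build_lists(FRAMES, TARGETS)
for sentence, condition in list_a + list_b:
    print(condition, "|", sentence)
```

      Because both lists draw on the same two words, any word-level property (length, frequency, and so on) is matched across the congruent and incongruent conditions by construction.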

      Reviewer #2 (Public Review):

      This MEG study used co-registered eye-tracking and Rapid Invisible Frequency Tagging (RIFT) to track the effects of semantic parafoveal preview during natural sentence reading. Unpredictable target words could either be congruent or incongruent with sentence context. This modulated the RIFT response already while participants were fixating on the preceding word. This indicates that the semantic congruency of the upcoming word modulates visual attention demands already in parafoveal preview.

      The quest for semantic parafoveal preview in natural reading has attracted a lot of attention in recent years, especially with the development of co-registered EEG and MEG. Evidence from dynamic neuroimaging methods using innovative paradigms as in this study is important for this debate.

      We express our gratitude to the reviewer for recognizing the significance of our research question in the domain of natural reading.

      Major points:

      (1) The authors frame their study in terms of "congruency with sentence context". However, it is the congruency between adjective-noun pairs that determines congruency (e.g. "blue brother" vs "blue jacket", and examples p. 16 and appendix). This is confirmed by Suppl Figure 1, which shows a significantly larger likelihood of refixations to the pre-target word for incongruent sentences, probably because the pre-target word is most diagnostic for the congruency of the target word. The authors discuss some possibilities as to why there is variability in parafoveal preview effects in the literature. It is more likely to see effects for this simple and local congruency, rather than congruency that requires an integration and comprehension of the full sentence. I'm not sure whether the authors really needed to present their stimuli in a full-sentence context to obtain these effects. This should be explicitly discussed and also mentioned in the introduction (or even the abstract).

      We have addressed this limitation of the study explicitly in the revised manuscript. The modifications can be found in the tracked changes on Page 16 and are copied as follows:

      “Two noteworthy limitations exist in the current study. Firstly, the construction of pretarget–target word pairs consistently follows an adjective–noun phrase structure, potentially leading to semantic violations arising from immediate local incongruence rather than a broader incongruence derived from the entire sentential context. While the context preceding target words was deliberately minimized to ensure a pure effect of bottom-up parafoveal processing rather than the confounding impact of top-down prediction, it is essential to recognize that information from both local and global contexts can exert distinct effects on word processing during natural reading (Wong et al., 2022). Future investigations should incorporate more information-rich contexts to explore the extent to which the parafoveal semantic integration effect observed in this study can be generalized.”

      References:

      Wong R, Veldre A, Andrews S. 2022. Are There Independent Effects of Constraint and Predictability on Eye Movements During Reading? J Exp Psychol Learn Mem Cogn. doi:10.1037/XLM0001206

      (2) The authors used MEG and provided a source estimate for the tagging response (Figure 2), which unsurprisingly is in the visual cortex. The most important results are presented at the sensor level. This does not add information about the brain sources of the congruency effect, as the RIFT response probably reflects top-down effects on visual attention etc. Was it necessary to use MEG? Would EEG have produced the same results? In terms of sensitivity, EEG is better than MEG as it is more sensitive to radial and deeper sources. This should be mentioned in the discussion and/or methods section.

      Source estimation was exclusively provided for the tagging response rather than the congruency effect because we posit that this conditional contrast would emanate from the same brain regions exhibiting the tagging responses in general. As depicted in the following figure, source localization for the congruency effect was identified in the left association cortex (Brodmann area 18), the same area as the source localization for the tagging response (the negative cluster observed here is due to the incongruent minus congruent contrast). While we agree with the Reviewer that the RIFT result might indicate a top-down effect on visual attention, it is important to note that, due to the low-pass filter property of synapses, observing a tagging response at a high frequency beyond the visual cortex is challenging.

      Author response image 1.

      We discussed the necessity of using MEG in the edited manuscript, with tracked changes on Page 20; the passage is copied as follows:

      “While the current study was conducted using MEG, these procedures might also work with EEG. If so, this would make our approach accessible to more laboratories as EEG is less expensive. However, there are currently no studies directly comparing the RIFT response in EEG versus MEG. Therefore, it would be of great interest to investigate if the current findings can be replicated using EEG.”

      (3) The earliest semantic preview effects occurred around 100ms after fixating the pre-target word (discussed around l. 323). This means that at this stage the brain must have processed the pre-target and the target word and integrated their meanings (at some level). Even in the single-word literature, semantic effects at 100 ms are provocatively early. Even studies that tried to determine the earliest semantic effects arrived at around 200 ms (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382728/, https://psycnet.apa.org/record/2013-17451-002). The present results need to be discussed in a bit more detail in the context of the visual word recognition literature.

      We have incorporated this valuable suggestion into the discussion section to enhance the clarity of our key result regarding the timing of parafoveal semantic integration. The revised manuscript with tracked changes can be found on Page 14, and the relevant passage is provided below:

      “Our results also provide information about the time course of semantic integration …… by as early as within 100 ms after fixating on the pre-target word. The timing of this parafoveal semantic effect appears remarkably early, considering that typical semantic access for a single word occurs no earlier than around 200 ms, as demonstrated in the visual word recognition literature (Carreiras et al., 2014). For instance, in a Go/NoGo paradigm, the earliest distinguishable brain activity related to category-related semantic information of a word occurs at 160 ms (Amsel et al., 2013; Hauk et al., 2012). Therefore, the RIFT results presented here suggest that natural reading involves parallel processing that spans multiple words. The level of (covert) attention allocated to the target word, as indexed by the significant difference in RIFT responses compared to the baseline interval, was observed even three words in advance (see Figure 2C). This initial increase in RIFT coincided with the target entering the perceptual span (McConkie and Rayner, 1975; Rayner, 1975; Underwood and McConkie, 1985), likely aligning with the initial extraction of lower-level perceptual information about the target. The emerging sensitivity of the RIFT signal to target plausibility, detected around 100 ms after the fixation on the pre-target word, suggests that readers at that time had accumulated sufficient semantic information about the target words and integrated that information with the evolving sentence context. Therefore, it is plausible that the initial semantic processing of the target word commenced even before the pre-target fixation and was distributed across multiple words. This parallel processing of multiple words facilitates rapid and fluent reading.”

      References:

      Carreiras M, Armstrong BC, Perea M, Frost R. 2014. The what, when, where, and how of visual word recognition. Trends Cogn Sci 18:90–98. doi:10.1016/j.tics.2013.11.005

      Amsel BD, Urbach TP, Kutas M. 2013. Alive and grasping: Stable and rapid semantic access to an object category but not object graspability. Neuroimage 77:1–13. doi:10.1016/J.NEUROIMAGE.2013.03.058

      Hauk O, Coutout C, Holden A, Chen Y. 2012. The time-course of single-word reading: Evidence from fast behavioral and brain responses. Neuroimage 60:1462. doi:10.1016/J.NEUROIMAGE.2012.01.061

      McConkie GW, Rayner K. 1975. The span of the effective stimulus during a fixation in reading. Percept Psychophys 17:578–586. doi:10.3758/BF03203972

      Rayner K. 1975. The perceptual span and peripheral cues in reading. Cogn Psychol 7:65–81.

      Underwood NR, McConkie GW. 1985. Perceptual Span for Letter Distinctions during Reading. Read Res Q 20:153. doi:10.2307/747752

      (4) As in previous EEG/MEG studies, the authors found a neural but no behavioural preview effect. As before, this raises the question of whether the observed effect is really "critical" for sentence comprehension. The authors provide a correlation analysis with reading speed, but this does not allow causal conclusions: Some people may simply read slowly and therefore pay more attention and get a larger preview response. Some readers may hurry and therefore not pay attention and not get a preview response. In order to address this, one would have to control for reading speed and show an effect of RIFT response on comprehension performance (or vice versa, with a task that is not close to ceiling performance). The last sentence of the discussion is currently not justified by the results.

      We acknowledge that the correlation analysis between the RIFT effect and reading speed on the group level lacks causality, making it less ideal for addressing this question. We have incorporated this acknowledgment as one of the limitations of the current study in the revised manuscript on Page 16, as indicated by the tracked changes, and the relevant passage is provided below:

      “Two noteworthy limitations exist in the current study. …… Secondly, the correlation analysis between the pre-target RIFT effect and individual reading speed (Figure 5) does not establish a causal relationship between parafoveal semantic integration and reading performance. Given that the comprehension questions in the current study were designed primarily to maintain readers’ attention and the behavioural performance reached a ceiling level, employing more intricate comprehension questions in future studies would be ideal to accurately measure reading comprehension and reveal the impact of semantic parafoveal processing on it.”

      We reformulated the last sentence:

      “These results support the idea that words are processed in parallel and suggest that early and deep parafoveal processing may be important for fluent reading.”

      (5) L. 577f.: ICA components were selected by visual inspection. I would strongly recommend including EOG in future recordings when the control of eye movements is critical.

      We appreciate the reviewer for providing this valuable suggestion. We acknowledge that EOG recordings were not included in the current study due to restrictions on MEG data collection from the University of Birmingham during the COVID-19 pandemic. In our future studies, we will follow the reviewer's suggestion to incorporate EOG recordings in data collection. This addition will facilitate optimal eye movement-related artifact rejection through ICA, as recommended by Dimigen in his methodological paper:

      Dimigen, O. (2020). Optimizing the ICA-based removal of ocular EEG artifacts from free viewing experiments. NeuroImage, 207, 116117.

      (6) The authors mention "saccade planning" a few times. I would suggest looking at the SWIFT model of eye movement control, which is less mechanistic than the dominant EZ-Reader model (https://psycnet.apa.org/record/2005-13637-003). It may be useful for the framing of the study and interpretation of the results (e.g. second paragraph of discussion).

      In the revised manuscript, we have provided a more comprehensive explanation of eye movements/saccade planning, aligning it with the SWIFT model. Please refer to Page 15 with tracked changes, and the updated passage is provided below:

      “The results of the present study are aligned with the SWIFT model of eye movement control in natural reading (Engbert et al., 2005), wherein the activation field linked to a given word is hypothesized to be both temporally and spatially distributed. Indeed, we found that the initial increase in covert attention to the target word occurred as early as three words before, as measured by RIFT responses (Figure 2C). These covert processes enable the detection of semantic incongruity (Figure 3B and Figure 3C). However, it may occur at the non-labile stage of saccade programming, preventing its manifestation in fixation measures of the currently fixated pre-target word (Figure 1B). Therefore, the RIFT technique’s capacity to yoke patterns to a specific word offers a unique opportunity to track the activation field of word processing during natural reading.”

      References:

      Engbert R, Nuthmann A, Richter EM, Kliegl R. 2005. SWIFT: A dynamical model of saccade generation during reading. Psychol Rev 112:777–813. doi:10.1037/0033-295X.112.4.777

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      While the manuscript is well-written and presents a structured analysis of the data, it requires further clarification and substantiation regarding the originality of the research questions, the advantages of the proposed methodology, and the interpretation of the results related to semantic integration. Additional references and a more thorough discussion of related research are needed to strengthen the manuscript's contribution to the field.

      We appreciate the reviewer's kind words about this manuscript and the insightful comments and suggestions provided. In the revised manuscript, we have now placed additional emphasis on the importance of investigating semantic integration within the realm of parafoveal processing in natural reading. We have clarified the advantages of employing MEG and RIFT and expanded upon our results in the context of Antúnez et al.'s 2022 paper, as suggested by the reviewer.

      Reviewer #2 (Recommendations For The Authors):

      (1) L. 59: The "N400" has been linked to much more than "semantic access". I think it is widely accepted that "access" happens (or at least begins) earlier, and that the N400 reflects high-level integration processes etc.

      Earlier debates about whether the N400 is more linked to access or integration have resolved in favour of an access account, but with a growing appreciation of the blurred boundaries between constructs like access, priming, and integration, as Reviewer 1 also pointed out in comment #2.

      (2) L. 177: I wasn't sure about the selection of sensors. Were the same sensors used for all participants (whether they had a tagging response or not)?

      We appreciate the reviewer for highlighting the confusion regarding the sensor selection procedure in the study. In response, we have added further clarifications about this procedure in the Method section of the revised manuscript. The relevant changes can be found on Page 25 with tracked changes, and the modified passage is reproduced below:

      "Please note that the tagging response sensors may vary in number across participants (7.9 ± 4.5 sensors per participant, M ± SD). Additionally, they may have a different but overlapping spatial layout, primarily over the visual cortex. For the topography of all tagging response sensors, please refer to Figure 2A."

      (3) Ll. 247ff.: I don't understand the idea of a "spill-over effect". The future cannot spill into the past. Or does this refer to possible artefacts or technical problems?

      In the revised manuscript, we have rephrased this passage with tracked changes on Page 11, and the updated version is provided below:

      “We conducted a similar analysis of the coherence measured when participants fixated the target word and found no significant modulations related to the contextual congruity of that target word. …… Thus, the parafoveal semantic integration effect identified during the pre-target intervals cannot be attributed to signal contamination from fixations on the target word induced by the temporal smoothing of filters.”

      (4) I struggled to follow the "internal attention" explanation for the paradoxical RIFT effect (p. 11/12).

      We appreciate the reviewer for pointing out the confusion, and we have rephrased the passage in the revised manuscript with tracked changes on Page 13. The revised version is provided below:

      "Previous work has demonstrated that tagging responses decrease as attention shifts from an external task (e.g., counting visual targets) to an internal task (e.g., counting heartbeats) (Kritzman et al., 2022). Similarly, in a reading scenario, visually perceiving the flickering word constitutes an external task, while the internal task involves the semantic integration of previewed information into the context. If more attentional resources are internally directed when faced with the challenge of integrating a contextually incongruent word, fewer attentional resources would remain for processing the flickering word. This may be the kind of shift reflected in the reduction in RIFT responses."

      References:

      Kritzman L, Eidelman-Rothman M, Keil A, Freche D, Sheppes G, Levit-Binnun N. 2022. Steady-state visual evoked potentials differentiate between internally and externally directed attention. Neuroimage 254:119133.

      (5) L. 572: Why was detrending necessary on top of a 0.5 Hz high-pass filter? Was detrending applied to the continuous raw data, or to epochs? Was it just the linear trend or other polynomial terms?

      We agree with the Reviewer that, given the prior application of a 0.5 Hz high-pass filter to the data, the detrending does not alter the data. Nonetheless, we included this procedure in the manuscript for the sake of completeness. In the revised manuscript, we have provided additional clarification on this point, as indicated by the tracked changes on Page 23. The modified passage is presented below:

      "Subsequently, detrending was applied individually to each channel of the continuous raw data to factor out the linear trend."

      (6) Source analysis, p. 25f.: How was the beamformer regularized?

      This information was already included in the original manuscript on Page 26. The original text is provided below for reference:

      “No regularisation was performed to the CSD matrices (lambda = 0).”
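
      As an aside on point (5) above, removing a linear trend from each channel of continuous data can be sketched as follows. This is a generic least-squares illustration in plain Python, not the authors' preprocessing pipeline; the function names `detrend_linear` and `detrend_channels` are hypothetical, and the data layout (a list of channels, each a list of samples) is an assumption.

```python
# Minimal sketch of per-channel linear detrending (ordinary least squares).
# Function names and data layout are illustrative assumptions.

def detrend_linear(x):
    """Subtract the best-fitting straight line from one channel's samples."""
    n = len(x)
    if n < 2:
        # A single sample has no slope; removing the mean leaves zero.
        return [0.0] * n
    t_mean = (n - 1) / 2.0
    x_mean = sum(x) / n
    cov = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    intercept = x_mean - slope * t_mean
    return [v - (slope * t + intercept) for t, v in enumerate(x)]


def detrend_channels(data):
    """Apply linear detrending to each channel independently."""
    return [detrend_linear(channel) for channel in data]


# Example: a pure ramp is removed entirely; residuals are (numerically) zero.
residual = detrend_linear([1.0, 3.0, 5.0, 7.0])
print(residual)
```

      A slow drift that survives a 0.5 Hz high-pass filter is expected to be small, which is consistent with the authors' remark that this step leaves the data essentially unchanged.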

    2. eLife assessment

      This important study contributes to the understanding of how parafoveal words are neurally processed during naturalistic sentence reading. Solid evidence is provided that the MEG response to a word can be modulated by the semantic congruency of a parafoveal target word. The study addresses a classic question in reading using a new Rapid Invisible Frequency Tagging (RIFT) technique, which can separately monitor the neural processing of multiple words during sentence reading.

    3. Reviewer #1 (Public Review):

      The study investigates parafoveal processing during natural reading, combining eye-tracking and MEG techniques, building upon the RIFT paradigm previously introduced by Pan et al. (2021). Overall, the manuscript is well-written with a clear structure, and the data analysis and experimental results are presented in a lucid manner.

      The authors have addressed the issues I raised in the previous round of review to my satisfaction. However, I still have two concerns that require the authors' consideration.

      Firstly, the similarity between the RIFT analysis process in this study and traditional ERP analysis could lead readers to equate RIFT with components like N400, potentially influencing their interpretation of the results. Although the authors' response has somewhat clarified my queries, I seek confirmation: does RIFT itself signify "visual attention" or the "allocation of attentional resources to the flickering target words" (line 208) in this study? While this may not be pivotal, as it primarily serves as an indicator to evaluate whether contextual congruity can indeed modulate the RIFT response rather than indicating early parafoveal semantic integration, I recommend that the authors explicitly address this point in the manuscript, maybe in the discussion section, to enhance reader comprehension of the article's rationale.

      Secondly, regarding the study's conclusions, there appears to be an overemphasis in stating that "semantic information ... can also be integrated with the sentence context ..." (line 21-22). As raised by Reviewer 2 (Major Point 1) and acknowledged by the authors in the limitations of the revised manuscript (lines 403-412), the RIFT effect observed likely stems from local congruency. Therefore, adjusting the conclusion to "integrated with previous context" may offer a more precise reflection of the findings.

    4. Reviewer #2 (Public Review):

      This MEG study used co-registered eye-tracking and Rapid Invisible Frequency Tagging (RIFT) to track the effects of semantic parafoveal preview during natural sentence reading. Unpredictable target words could either be congruent or incongruent with sentence context. This modulated the RIFT response already while participants were fixating the preceding word. This indicates that the semantic congruency of the upcoming word modulates visual attention demands already in parafoveal preview.

      The quest for semantic parafoveal preview in natural reading has attracted a lot of attention in recent years, especially with the development of co-registered eye-tracking and EEG/MEG. Evidence from dynamic neuroimaging methods using innovative paradigms as in this study is important for this debate.

      Major points:

      (1) The authors frame their study in terms of "congruency with sentence context". However, it is the congruency between adjective-noun pairs that determines congruency (e.g. "blue brother" vs "blue jacket", and examples p. 16 and appendix). This is confirmed by Suppl Figure 1, which shows a significantly larger likelihood of refixations to the pre-target word for incongruent sentences, probably because the pre-target word is most diagnostic for the congruency of the target word. The authors discuss some possibilities why there is variability in parafoveal preview effects in the literature. It is more likely to see effects for this simple and local congruency, rather than congruency that requires an integration and comprehension of the full sentence. Future studies should investigate whether the observed effects depend on sentence context or local congruency.

      (2) The authors used MEG and provided a source estimate for the tagging response (Figure 2), which unsurprisingly is in visual cortex. The most important results are presented at the sensor level. This does not add information about the brain sources of the congruency effect, as the RIFT response probably reflects top-down effects on visual attention etc. The RIFT response can test for preview effects on visual brain responses but does not allow the localisation of context processing effects that cause them.

      (3) The earliest semantic preview effects occurred around 100ms after fixating the pre-target word (discussed around l. 323). This means that at this stage the brain must have processed the pre-target and the target word and integrated their meanings (at some level). Even in the single-word literature, semantic effects at 100 ms are provocatively early. Future studies should aim at reconciling these different findings.

      (4) As in previous EEG/MEG studies, the authors found a neural but no behavioural preview effect. As before, this raises the question whether the observed effect is really "critical" for sentence comprehension. If interpreted in terms of "information" and "attention", then one would expect a positive effect on behaviour, either reading speed or accuracy. The authors provide a correlation analysis with reading speed, but this does not allow causal conclusions: Some people may simply read slowly and therefore pay more attention and get a larger preview response. Some readers may hurry and therefore not pay attention and not get a preview response. In order to address this, one would have to control for reading speed and show an effect of RIFT response on comprehension performance (or vice versa, with a task that is not close to ceiling performance).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Zhu, et al present a genome-wide histone modification analysis comparing patients with schizophrenia (on or off antipsychotics) to non-psychiatric controls. The authors performed analyses across the dorsolateral prefrontal cortex and tested for enrichment of nearby genes and pathways. The authors performed an analysis measuring the effect of age on the epigenomic landscape as well. While this paper provides a unique resource around SCZ and its epigenetic correlates, and some potentially intriguing findings in the antipsychotic response dataset there were some potential missed opportunities - related to the integration of outside datasets and genotypes that could have strengthened the results and novelty of the paper.

      Major Comments

      (1) Is there genotype data available for this cohort of donors or can it be generated? This would open several novel avenues of investigation for the authors. First the authors can test for enrichment of heritability for SCZ or even highly comorbid disorders such as bipolar. Second, it would allow the authors to directly measure the genetic regulation of histone markers by calculating QTLs (in this case histone hQTLs). The authors assert that although interesting, ATACseq approach does not provide the same chromatin state information as histone mods mapped by ChIP. Why do the authors not test this? There are several ATACseq datasets available for SCZ [https://pubmed.ncbi.nlm.nih.gov/30087329/] and an additional genomic overlap could help tease apart genetic regulation of the changes observed.

      As detailed in our Methods section, brain samples have previous medical diagnosis, treatment record, and toxicological screening. Unfortunately, there was no genotype information on our brain sample collection. However, we examined overlap of differential enhancer and promoter peaks with genetic variants using linkage disequilibrium score regression (Fig. S10). Additionally, to assess agreement with the literature, we compared DEGs identified in our study with a previous snRNA-seq study in postmortem prefrontal cortex of schizophrenics and controls (Table S7).

      Repressive histone marks tend to provide different information than ATAC-seq data. However, we examined only activating marks in this study. Thus, the sentence in the Introduction mentioning that “ATAC-seq approach does not provide the same chromatin state information as histone modifications mapped by chromatin immunoprecipitation sequencing (ChIP-seq) assays do” has been removed.

      (2) Can the authors theorize why their analysis found significant effects for H3K27Ac for antipsychotic use when a recent epigenomic study of SCZ using a larger cohort of samples and including the same histone modifications did not [https://pubmed.ncbi.nlm.nih.gov/30038276/]? Given the lower n and lower number of cells in this group, it would be helpful if the authors could speculate on why they see this. Do the authors know if there is any overlap with the Girdhar study donors or if there are other phenotypic differences that could account for this?

      As mentioned in the Methods section, three strengths of this brain bank include i) inclusion of samples of schizophrenia subjects with antemortem diagnosis (i.e., based on clinical histories) and not with postmortem diagnosis (i.e., based on interviews with relatives and friends – a diagnostic approach used by many brain banks worldwide but with important limitations, see here: PMID: 15607306), ii) inclusion of control subjects individually matched by sex, age and PMD, and iii) our ability to test the presence or absence of antipsychotic medications in blood samples as an independent experimental variable. This allowed us to obtain novel and statistically valid conclusions related to cell-type epigenetic alterations in the frontal cortex of schizophrenia subjects, and the impact of age and antipsychotic treatment on chromatin organization.

      There is no overlap with Girdhar study donors.

      (3) The reviewer is concerned about the low concordance between bulk nuclei RNA-seq and single-cell RNA-seq for SCZ (236 of 802 DEGs in NeuN+ and 63 of 1043 in NeuN-). While it is not surprising for different cohorts to have different sets of DEGs, these seem to be vastly different. Was there a particular cell type(s) that enriched for the authors' DEGs in the single-cell dataset? Do the authors know if any donors overlapped between these cohorts?

      This overlap is acceptable considering that these are datasets originated from an entirely distinct cohort of postmortem human brain samples.

      (4) Functional enrichment analyses: details are not provided by the authors and should be added. The authors need to consider a) providing a gene universe, ie only considering the sets of genes with nearby H3K4me3/ H3K27ac levels, to such pathway tools, and b) should take into account the fact that some genes have many more peaks with data. There are known biases in seemingly just using the best p-value per gene in other epigenetic analysis (ie. DNA methylation data) and software is available to run correct analyses: https://pubmed.ncbi.nlm.nih.gov/23732277.

      GREAT was used to map differential peak loci to target genes using the whole genome as the background set and default basal extension as per Nord et al. http://dx.doi.org/10.1016/j.cell.2013.11.033. We argue that it is more biologically relevant than comparing against an artificially selected background. These gene sets were then passed to Panther for Gene Ontology enrichment analysis as per Liu et al. 10.1186/s12940-015-0052-5.

      Additional details are provided in Materials and Methods section:

      ChIP-seq annotation and functional enrichment

      GREAT analysis (http://great.stanford.edu) was performed on differential peaks using the whole genome as background and default basal extension from 5kb upstream to 1kb downstream of the TSS.

      Significantly enriched Gene Ontology biological processes were identified using the Panther Classification tools using a hypergeometric test.
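      The hypergeometric enrichment test mentioned above can be illustrated with a minimal sketch. This is not Panther's implementation; the function name and the toy numbers (a background of 20 genes, 5 of them annotated to a term, and a query list of 5 genes containing 3 annotated ones) are hypothetical and chosen only to keep the arithmetic easy to check.

```python
from math import comb

def hypergeom_enrichment_p(background_size, annotated_in_background,
                           query_size, annotated_in_query):
    """Upper-tail hypergeometric p-value: probability of drawing at least
    `annotated_in_query` annotated genes when sampling `query_size` genes
    without replacement from the background."""
    total = comb(background_size, query_size)
    upper = min(query_size, annotated_in_background)
    tail = sum(
        comb(annotated_in_background, i)
        * comb(background_size - annotated_in_background, query_size - i)
        for i in range(annotated_in_query, upper + 1)
    )
    return tail / total

# Hypothetical toy example: 20 background genes, 5 annotated, 5 queried, 3 hits
p_value = hypergeom_enrichment_p(20, 5, 5, 3)
```

      The choice of `background_size` is exactly the point raised by the reviewer: using the whole genome rather than only genes with nearby peaks enlarges the background and generally lowers the p-value for the same overlap.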

      Reviewer #2 (Public Review):

      The manuscript by Zhu has generated ChIP-seq and RNA-seq data from sizeable cohorts of SCZ patient samples and controls. The samples include 15 AF-SCZ samples and 15 controls, as well as 14 AT-SCZ samples and 14 controls. The genomics data was generated using techniques optimized for low-input samples: MOWChIP-seq and SMART-seq2 for histone profiles and transcriptome, respectively. The study has generated a significant data resource for the investigation of epigenomic alterations in SCZ. I am not convinced that the hierarchical pairwise design - first comparing AF-SCZ and AT-SCZ with their corresponding controls and secondarily contrasting the two comparisons is fully justified. The authors should repeat the statistical analysis by modeling all three groups simultaneously with an interaction effect for treatment or directly compare AF-SCZ to AT-SCZ groups and evaluate if the main conclusions remain supported.

      Major comments

      (1) The manuscript did not discuss (mention) the quality control of RNA-seq data shown in Fig. 1B. The color scheme choice for the heatmap visualization did not provide a quantitative presentation of the specificity of the RNA-seq data. I would recommend using bar plots to present the results more quantitatively.

      QC of raw RNA-seq data including per sequence GC and adapter content was assessed with FastQC. Reads underwent soft-clipping during STAR alignment with on average 73.8% (+/- 0.08%) reads for neurons and 69.0% (+/- 0.99%) reads for glia being uniquely mapped. A new supplementary figure (Figure S5) has been included to show four bar plots representing the expression values more quantitatively.

      These details are now provided in the RNA-seq data processing part of the Materials and Methods section:

      RNA-seq data processing

      The human genome (GRCh38) and comprehensive gene annotation were obtained from GENCODE (v29). Quality control of RNA-seq reads including per sequence GC and adapter content was assessed with FastQC. Reads were mapped with STAR (2.7.0f) with soft-clipping (average of 73.8% (+/- 0.08%) reads uniquely mapped for neurons and 69.0% (+/- 0.99%) reads for glia) and quantified with featureCounts (v2.0.1) using the default parameters.

      (2) How does the specificity of this RNA-seq dataset compare to previous studies using a similar NeuN sorting strategy?

      As mentioned in the Results section, highly significant (median p-value = 6 × 10⁻⁷) pairwise differences in molecular marker expression were observed for all markers ranging from mature, functional and synaptic neuron markers to astrocyte, oligodendrocyte and microglial markers (Figure 1B; Figures S4 and S5; Table S5). This confirms neuronal and non-neuronal cell-type identities in the NeuN+ and NeuN- nuclei samples, respectively.

      (3) I appreciate the effort to assess the ChIP-seq data quality using phantompeakqualtools. However, prior knowledge/experience with this tool is required to fully understand the QC results. The authors should additionally provide browser shots at different scales for key neuronal/glial genes, so readers can have a more direct assessment of data quality, such as the enrichment of H3K4me3 at promoters (but not elsewhere), and H3K27ac at promoters and enhancers. Existing browser views, such as Fig. 2B are too zoomed out for assessing the data quality.

      A new Fig 2B has been generated with a magnified view for clearer examination.

      (4) The pairwise regression model should be explicitly reported in methods.

      Additional details are included in the Methods section:

      Differential analysis for RNA-seq data

      We analyzed the bulk RNA-seq data of 29 schizophrenia subjects and 29 controls. The initial step involved filtering out genes with low read counts (less than 20 reads in over 50% of samples). The analysis then employed a two-step method to estimate the technical and biological noise. The first step was identifying the top 10 principal components (PCs) of the dataset. Subsequently, the correlation between each PC and various experimental (alignment rate, unique rate, exon percentage, number of unique mapped reads) and demographic (sex, age at death, PMD, antemortem diagnosis) factors was calculated. Covariates with high correlation to the PCs were included in the analysis to minimize their impact. The analysis was conducted using the 'DESeq2' software package, and genes with a false discovery rate (FDR) below 0.05 were identified as differentially expressed.
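      The filtering and covariate-screening steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the toy count matrix, the toy covariate, and the log2 transform applied before the principal component analysis are all hypothetical. Only the thresholds (fewer than 20 reads in over 50% of samples, top 10 PCs) come from the text.

```python
import numpy as np

def filter_low_count_genes(counts, min_reads=20, max_low_fraction=0.5):
    """Drop genes with fewer than `min_reads` reads in more than
    `max_low_fraction` of samples. `counts` is a genes x samples array."""
    low_fraction = (counts < min_reads).mean(axis=1)
    keep = low_fraction <= max_low_fraction
    return counts[keep], keep

def pc_covariate_correlations(counts, covariates, n_pcs=10):
    """Pearson correlation of each top principal component (per-sample
    scores) with each covariate. `covariates` maps name -> per-sample values."""
    logged = np.log2(counts + 1.0)                       # assumed transform
    centered = logged - logged.mean(axis=1, keepdims=True)
    # Rows of vt are the per-sample PC scores up to a scale factor,
    # which does not affect the correlation.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pcs = vt[:n_pcs]
    return {
        name: np.array([np.corrcoef(pc, values)[0, 1] for pc in pcs])
        for name, values in covariates.items()
    }

# Hypothetical toy data: 4 genes x 4 samples
toy_counts = np.array([
    [100, 120,  90, 110],   # well expressed: kept
    [  0,   5, 100, 200],   # low in exactly 50% of samples: kept
    [  0,   0,   3, 100],   # low in 75% of samples: removed
    [ 30,  60, 120, 240],   # kept
])
filtered, keep = filter_low_count_genes(toy_counts)
correlations = pc_covariate_correlations(
    filtered, {"age": np.array([30.0, 40.0, 50.0, 60.0])}, n_pcs=2
)
```

      Covariates whose correlation with a leading PC is high in absolute value would then be candidates for inclusion in the DESeq2 design; the cutoff for "high" is not stated in the text and is left open here.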

      (5) The statistical strategy to compare AF-SCZ and AT-SCZ to their corresponding control groups was unjustified. Why not model all three groups simultaneously with an interaction effect for treatment or directly compare AF-SCZ to AT-SCZ groups? If the manuscript argues that the antipsychotic effect is the main novelty, why not directly compare AF-SCZ and AT-SCZ?

      This is an important point. As mentioned above, one of the main strengths of our experimental design is that schizophrenia subjects and controls were individually matched by sex and age and (if possible) postmortem delay and freezing storage time. Our study is also among the first to report the potential impact of antipsychotic treatment on chromatin organization using postmortem human brain samples. Because of this individual matching method, we only compared schizophrenia subjects (either antipsychotic-free or antipsychotic-treated) with their respective individually matched controls. This experimental design is supported by our previous publications with postmortem human brain samples (PMID: 36100039; PMID: 28783139; PMID: 26758213; PMID: 23129762; PMID: 22864611; PMID: 18297054). The rationale behind this experimental design – as well as potential limitations particularly related to the division of the schizophrenia group in antipsychotic-free and antipsychotic-treated – is mentioned in the Discussion:

      Related to the effect of antipsychotic treatment, frontal cortex samples of schizophrenia subjects were divided into AF and AT based on postmortem toxicological analysis in both blood and when possible brain samples, which provides information about a longer retrospective drug-free period due to the high liposolubility of antipsychotic medications (Voicu and Radulescu, 2009). However, we cannot fully exclude the possibility of previous exposure to antipsychotic medications in the AF-schizophrenia group, and hence that the epigenetic alterations observed exclusively in the AF-schizophrenia group are a consequence of a potential period of decompensation, which typically occurs following voluntary treatment discontinuation (Liu-Seifert et al., 2005).

      It is also worth mentioning here that data were analyzed both at the cohort level, as well as at an individual level (schizophrenia/cohort pairs). This is mentioned in the manuscript:

      It should be noted that in the differential analyses here, the schizophrenia subjects (whether AF or AT) and their controls were compared at the cohort level, while matched schizophrenia/control pairs were examined individually in the TF-based analyses.

      (6) The method of pairwise comparison to corresponding control groups, then further comparing the pairwise results opens the study to a number of statistical vulnerabilities. For example, on page 12, the studies identified 166 DEGs between AF and control, and 1273 DEGs between AT and control. Instead of implicating a greater amount of difference between AT and control, such a result can often be driven by differences in between-group variance, rather than between-group means, that is, are the SCZ-AF and SCZ-treated effect size magnitudes and directionalities similar (but the treated group has lower variance) or are the two groups truly different in terms of means? The result in Fig. 5A suggests effect sizes for the two comparisons (AF-Ctrl and AT-Ctrl) are similar but have lower variability in the treated group.

      For a discussion regarding our approach, which involves a pairwise comparison, see above.
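      The concern in comment (6), that a larger DEG count can reflect lower within-group variance and not a larger mean difference, can be made concrete with a small calculation. The sketch below uses Welch's t-statistic computed from summary statistics; the mean difference and standard deviations are hypothetical, and only the group sizes (15 and 14 per arm) follow the cohort description.

```python
import math

def welch_t(mean_diff, sd_a, sd_b, n_a, n_b):
    """Welch's t-statistic from a mean difference, the two group standard
    deviations, and the two group sizes."""
    return mean_diff / math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)

# Same hypothetical effect size (0.5) in both comparisons; only the spread differs.
t_af_vs_ctrl = welch_t(0.5, 1.0, 1.0, 15, 15)   # noisier groups
t_at_vs_ctrl = welch_t(0.5, 0.4, 0.4, 14, 14)   # tighter groups
```

      With identical effect sizes, the tighter comparison yields a t-statistic more than twice as large, so more genes would pass a fixed FDR threshold even though the underlying mean shifts are the same.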

      (7) The pairwise comparison further raised the possibility the results were driven by the difference in the two control cohorts rather than the two SCZ cohorts.

      We clearly show that age is an important independent factor (Fig 7). Since controls are individually matched by sex and age, this limits the validity of the comparison among the two cohort groups including subjects of different age (see Tables S1 and S2).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor Comments

      (1) Why not mention what histone modifications you measured by ChIP-seq in the abstract? A certainly minor point but I felt I read for quite a while before I got to that point in the intro.

      The two histone marks are now mentioned in the abstract.

      (2) There are several places in the introduction where improper grammar is utilized and this should be edited.

      Introduction has been edited.

      (3) Related to major comments, how many donors overlapped with the PsychENCODE, CommonMind papers?

      Our datasets were generated from an entirely distinct cohort of postmortem human brain samples. Our postmortem sample collection does not overlap with postmortem samples included in PsychENCODE and/or CommonMind publications.

      (4) Since studies have already measured H3K4me3 and H3K27ac in the SCZ prefrontal cortex, why didn't the authors consider measuring changes in a related repressive marker? This is not to suggest the authors should do that now, but additional comments about other markers would help provide context for this analysis and point toward potential future studies.

      This is an interesting question and will be the goal of our future investigation.

    2. eLife assessment

      The study by Zhu et al. provides important insights into cell-specific genome-wide histone modifications in the frontal cortex of individuals with schizophrenia, as well as shedding light on the role of age and antipsychotic treatment in these associations. The evidence supporting the conclusions is solid.

    3. Reviewer #1 (Public Review):

      Zhu, et al present a genome-wide histone modification analysis comparing patients with schizophrenia (on or off antipsychotics) to non-psychiatric controls. The authors performed analyses across the dorsolateral prefrontal cortex and tested for enrichment of nearby genes and pathways. The authors performed an analysis measuring the effect of age on the epigenomic landscape as well. This paper provides a unique resource around SCZ and its epigenetic correlates, and some potentially intriguing findings in the antipsychotic response dataset.

      Comments on revised version:

      The authors have adequately responded to my review comments.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Why does stimulation at 0.15 Hz show a third harmonic signal (Figure 5A) but 0.25 Hz does not show a second harmonic signal?

      Second and third harmonic signals were sometimes observed with 0.15 Hz stimulation and also with 0.25 Hz and other stimulation frequencies. The second harmonic signal is easier to understand, as vasomotion may be reacting to both directions of the oscillating stimuli. The reason for the emergence of the third harmonic is unknown. These harmonic signals were not always observed, and their magnitude was variable. The frequency-locked signal was robust; thus, in this manuscript, we decided to describe only this signal. These observations are mentioned in the revised manuscript (Results, page 9, paragraph 2).
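      For readers less familiar with harmonic analysis, the sketch below shows how a frequency-locked signal and its harmonics appear in an amplitude spectrum. The synthetic trace, sampling rate, amplitudes, and function names are illustrative assumptions, not the authors' data or analysis code.

```python
import numpy as np

def amplitude_spectrum(trace, sampling_rate_hz):
    """One-sided amplitude spectrum of a mean-subtracted trace."""
    n = len(trace)
    spectrum = np.fft.rfft(trace - np.mean(trace))
    freqs = np.fft.rfftfreq(n, d=1.0 / sampling_rate_hz)
    amplitude = 2.0 * np.abs(spectrum) / n
    return freqs, amplitude

def amplitude_at(freqs, amplitude, target_hz):
    """Amplitude at the frequency bin closest to `target_hz`."""
    return amplitude[np.argmin(np.abs(freqs - target_hz))]

# Synthetic trace (assumed values): 0.25 Hz fundamental plus a weaker
# second harmonic at 0.50 Hz, sampled at 10 Hz for 400 s.
fs = 10.0
t = np.arange(4000) / fs
trace = 1.0 * np.sin(2 * np.pi * 0.25 * t) + 0.3 * np.sin(2 * np.pi * 0.50 * t)

freqs, amp = amplitude_spectrum(trace, fs)
fundamental = amplitude_at(freqs, amp, 0.25)
second_harmonic = amplitude_at(freqs, amp, 0.50)
third_harmonic = amplitude_at(freqs, amp, 0.75)
```

      Because the recording spans a whole number of stimulus cycles, each component falls on a single frequency bin and its amplitude is recovered without leakage; the third harmonic bin is empty because none was put into the synthetic trace.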

      References for the windows are missing. Closed craniotomy: (Morii, Ngai, and Winn 1986). Thinned skull: (Drew et al. 2010).

      These references were incorporated into the revised manuscript.

      An explanation of, or at least a discussion on, why a flavoprotein or other intrinsic signal from the parenchyma might follow vasomotion with high fidelity would be most helpful.

      We spend a large part of the Results describing that any fluorescence signal from the brain parenchyma follows the vasomotion because the blood vessels largely lack fluorescence signals within the filter band that we observe. This is described as “shadow imaging”. What was rather puzzling was that flavoprotein or other intrinsic signals were phase-shifted in time. This suggests that these autofluorescence signals have an anti-phase “shadow imaging” component and another component that is phase-shifted in time. This is described in the manuscript as follows.

      (Results, page 13, paragraph 2)

      “Production and degradation of flavin and other metabolites may be induced by the fluctuation in the blood vessel diameter with a fixed delay time. The phase shift in the autofluorescence could be due to the additive effect of “shadow” imaging of the vessel and to the concentration fluctuation of the autofluorescent metabolite”

      Glucose and oxygen are likely to be delivered more abundantly during the vasodilation phase than during the vasoconstriction phase of vasomotion. These molecules will trigger cell metabolism, and endogenous fluorescent molecules such as NADH, NADPH, and FAD may increase or decrease with a certain delay, which is required for the chemical reactions to occur. Therefore, the concentration fluctuation of these metabolites could lag behind the changes in the blood flow. This discussion is added in the revised manuscript (Discussions, page 19, paragraph 2).
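      The additive explanation above can be checked numerically: an anti-phase “shadow” component plus a delayed metabolic component sums to a single sinusoid whose phase lies between the two. The sketch below is a toy model only; the relative amplitude (0.8), the 1 s delay, and the function name are assumptions made for illustration, not measured values.

```python
import numpy as np

def phase_deg(signal, t, freq_hz):
    """Phase (degrees, wrapped to [-180, 180)) of the component of `signal`
    at `freq_hz`, relative to sin(2*pi*freq_hz*t)."""
    coeff = np.sum(signal * np.exp(-2j * np.pi * freq_hz * t))
    # For signal = sin(wt + phi), the angle of coeff equals phi - 90 degrees.
    phi = np.degrees(np.angle(coeff)) + 90.0
    return (phi + 180.0) % 360.0 - 180.0

freq = 0.25                      # Hz, stimulus frequency
t = np.arange(4000) / 10.0       # 400 s at 10 Hz: a whole number of cycles
diameter = np.sin(2 * np.pi * freq * t)

shadow = -diameter                                       # anti-phase shadow
delay_s = 1.0                                            # assumed delay
metabolic = 0.8 * np.sin(2 * np.pi * freq * (t - delay_s))
autofluorescence = shadow + metabolic

shadow_phase = phase_deg(shadow, t, freq)
metabolic_phase = phase_deg(metabolic, t, freq)
combined_phase = phase_deg(autofluorescence, t, freq)
```

      In this toy case the shadow sits at ±180°, the delayed component at −90°, and their sum at about −141°, so a fixed metabolic delay on top of shadow imaging is sufficient to produce an intermediate phase shift.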

      Reviewer #2 (Recommendations For The Authors):

      Minor corrections to the text and figures:

      (1) Figures 1 and 2- The single line slice basal and dilated traces are larger in Figure 2 (intact skull) than in Figure 1 (thinned skull)- have these been mixed up, as the authors state in the text that larger dilations are detected in the thinned skull preparation?

      The example vessel described for the thinned skull (Figure 1) happened to be larger than that shown for the intact skull (Figure 2). We did not describe that larger dilations are observed in the thinned skull preparation. What was described was that the vessel profiles were shallower in the intact skull. This is because the presence of the intact skull blurs the fluorescence image.

      (2) Figure 3- I think the lower panel of the amplitude spectrums from 3 individual animals included in D would benefit from being in its own panel within this Figure (i.e. E). The peak ratio is also used in this figure, but the equation to calculate this is not displayed until Figure 4.

      We thank the reviewer for recommending making the figure more comprehensible. We have divided panel D into D and E and shifted the panel character accordingly. The manuscript text was also updated.

      As the reviewer describes, the peak ratio of 0.25 Hz is used in Figure 3E (original). However, the equation to calculate this figure is described in the appropriate location within the main text of the manuscript (Results, page 10, paragraph 2) as well as in the figure legend.

      (3) Figure 5- In the visual stimulation traces displayed in C you have included a 10-degree scale bar, which looks similar in amplitude to the trace but the text states these are 17-degree amplitude traces.

      We thank the reviewer for noticing this mistake of labeling in the figure. We have corrected the error in the revised figure.

      (4) Figure 6- For the Texas red fluorescence traces and image scales displayed in F, you have shown the responding traces on the right and non-responding on the left, but the figure legend states the amplitude is strong on the left and weak on the right.

      We thank the reviewer for noticing the error in the figure legend text. We have corrected the error in the revised manuscript.

      (5) Figure 6- It would be helpful for the reader if the r value was displayed on the graph in G.

      We thank the reviewer for the suggestion. We have indicated the r value in Figure 6G as the reviewer recommended.

      Reviewer #3 (Recommendations For The Authors):

      Major

      It is unclear to me if the authors are studying vasomotion per se. Vasomotion is an intrinsic, natural rhythm of blood vessel diameter oscillation that is entrained by endogenous rhythmic neural activity. Importantly, if you take neural activity away, the blood vessel (with flow and pressure) should still be capable of oscillating due to an intrinsic mechanism within the vessel wall. In contrast, if one increases neural activity by way of sensory stimulation and blood flow increases, this is the basis of functional hyperemia. If one stimulates the brain over and over again at a particular frequency, it is expected that blood flow will increase whenever neural activity increases to the stimulus, up to a particular frequency until the blood vessel cannot physically track the stimulus fast enough. Functional hyperemia does not depend on an intrinsic oscillator mechanism. It occurs when the brain becomes active above endogenous resting activity due to sensory or motor activity.

      We thank the reviewer for stressing the importance of the distinction between “vasomotion” and functional “hyperemia”.

      We recognized that the terminology used in our paper was not explicitly explained. Traditionally, “vasomotion” is defined as the dilation and constriction of the blood vessels that occurs spontaneously at low frequencies in the 0.1 Hz range without any apparent external stimuli. Sensory-induced changes in the blood flow are usually called “hyperemia”. However, in our paper, we used the term, vasomotion, literally, to indicate both forms of “vascular” “motion”. Therefore, the traditional vasomotion was called “spontaneous vasomotion” and the hyperemia, with both vasoconstriction and vasodilation, induced with slow oscillating visual stimuli was called “visually induced vasomotion”. This distinction in the terminology is now explicitly introduced in the revised manuscript (Introduction, page 3, paragraph 2-3; page 4, paragraph 1-2).

      Using our newly devised methods, we show the presence of “spontaneous vasomotion”. However, this spontaneous vasomotion was often fragmented and did not last long at a specific frequency. With visual stimuli that slowly oscillated at temporal frequencies close to the frequency of spontaneous vasomotion, oscillating hyperemia, or “visually induced vasomotion”, was observed. Importantly, this visually induced vasomotion is not observed in novice animals. Therefore, the visually induced vasomotion is not a simple sensory reaction of the vasculature in response to neuronal activity in the primary visual cortex. We also do not know how the synchronized vasomotion can spread throughout the whole brain. Where the plasticity for vasomotion entrainment occurs is also unknown. How much of the visually induced vasomotion relies on the mechanisms of intrinsic spontaneous vasomotion is also undetermined. Future directions for understanding the mechanisms of visually induced vasomotion and entrainment are discussed in more detail in the revised manuscript (Discussions, page 19, paragraph 1).

      To me, one would need to silence the naturally occurring vasomotion to study it. As soon as one activates the brain with an external stimulus, functional hyperemia is being studied. One idea that would be interesting to look at is whether a single or perhaps a double stimulus, in an untrained vs trained mouse, shows vasodilation that occurs across the cortex and in the cerebellum. In other words, is there something special about repeating the signal over and over again that results in brain-wide synchronization, or does a single or double oscillation of the same frequency (0.25Hz) also transiently synchronize the brain? My guess is that a short stimulus would give you the same thing (especially in a trained mouse) and that there is nothing special about oscillating the signal over and over again (except for the learning component).

      We thank the reviewer for the ideas of new experiments to understand whether the visually induced vasomotion shares the same mechanisms for creating spontaneous vasomotion or not.

      We would like to emphasize again that the visually induced vasomotion is not observed in the Novice animals. Therefore, the visually induced vasomotion is not a simple sensory reaction of the vasculature in response to the visual stimuli. Entrainment with repeated presentation of visual stimuli is required for this global synchronization phenomenon to occur.

      We would also like to emphasize that, even in Expert animals, the visually induced vasomotion that is frequency-locked to the presented stimulus does not always occur immediately. As shown in Figure 3D lower panel (Figure 3E in the revised figure), the vasomotion did not always immediately frequency-lock. The vasomotion was also not always stable throughout the 15 min of visual stimulation presentation. These characteristics are emphasized in the revised manuscript (Results, page 10, paragraph 1).

      Therefore, we would assume that a single or double frequency of the visual stimulation would not always be sufficient to transiently frequency-lock the visually induced vasomotion.

      An alternative idea is to test frequencies lower than vasomotion. Vasomotion typically oscillates around a wide range of very low frequencies averaging around 0.1Hz, yet here the authors entrain blood vessel oscillations towards the top end of vasomotion, at 0.25Hz. What would happen if the authors tried synchronizing brain activity with 0.025Hz? Would the natural vasomotion frequency still be there, or would it be gone, dominated by the 0.025Hz entrainment?

      We would assume that visually induced vasomotion will not be induced with 0.025 Hz visual stimuli. This is too slow to induce smooth pursuit of the visual stimuli with eye movement. We show that, even if smooth eye pursuit occurs, the visually induced vasomotion may or may not occur (Figure 6F). However, visually induced vasomotion does not largely occur without eye movement. Therefore, the proposed experiment by the reviewer is likely not doable.

      Finally, perhaps the authors can see if there is a long-lasting change in natural vasomotion occurring after the animal has been trained to 0.25Hz. For example, is there greater power in the endogenous fluctuation at either 0.25Hz (or perhaps 0.1Hz) with no visual stimulation given but after the animal has been trained? These ideas would be interesting to test and could help clarify whether this is plasticity in functional hyperemia or plasticity in vasomotion.

      It should also be mentioned that the frequency-locked vasomotion quickly dissipates as soon as the visual stimulation is halted (Figure 3D upper panel, middle). However, we agree with the reviewer that it would be interesting to see whether the fragmentation of the spontaneous vasomotion is observed less in the Trained or Expert mice compared to the Novice mice, to understand whether the entrainment effect would propagate to the properties of the spontaneous vasomotion.

      This issue I have raised is not a fundamental flaw in the paper, it pertains more to the wording, phrasing, and pitch of the paper i.e. is this really entrained and plastic vasomotion? I am skeptical. Nevertheless, I think the authors should try some of these suggestions to better characterize this effect.

      We agree that the phrasing used in the original manuscript was rather confusing, as “vasomotion” normally refers to spontaneous vascular movement. However, functional “hyperemia” may not adequately express the phenomenon that we observe either. The phenomenon that we observe is slowly oscillating vasodilation and vasoconstriction that is induced with visual stimuli with a temporal frequency similar to the spontaneously occurring “vasomotion”. This phenomenon is not a direct hyperemia response to the visual stimuli as it requires entrainment and it spreads globally throughout the whole brain. We revised our manuscript to define the terminology that we use.

      An important question is if neural activity is entraining the CBF responses. The authors should do one experiment in a pan-neural GCaMP line to test if neural activity in the visual cortex (and other areas captured in the widefield microscope) shows a progressive and gradual synchronization (or not) to the vasomotion responses with training. It is possible to do this through a thinned skull window. This is important to know if/how synchronized population neural activity scales with training. Perhaps they will not correlate and there is something more subtle going on.

      In our paper, we mainly studied visually induced vasomotion (or visual stimulus-triggered vasomotion). Therefore, visual stimulation must first activate the neurons and, through neurovascular coupling, the initial drive for vasomotion is likely triggered. However, visually induced vasomotion is not observed in novice animals. Therefore, the visually induced vasomotion is not a simple sensory reaction of the vasculature in response to neuronal activity in the primary visual cortex.

      An important point is that the neuronal visual response in the primary visual cortex could potentially decrease with repeated visual stimulation presentation, as the adaptive movement of the eye should decrease the retinal slip. With repeated training sessions, a more static projection of the presented image will likely be shown to the retina. The neurovascular coupling could be enhanced with increased responsiveness of the vessels, and vascular-to-vascular coupling could also be potentiated. This argument is now incorporated in the revised manuscript (Discussions, page 19, paragraph 1).

      We agree with the reviewer that, to identify the extent of the neuronal contribution to the vasomotion triggering, whole brain synchronization, and vasomotion entrainment, simultaneous neuronal calcium imaging would be ideal. However, due to the fact that fluorescent Ca2+ indicators expressed in neurons would also be distorted by the “shadow” effect from the vasomotion, exquisite imaging techniques would be required. We recognize this “shadow” effect and we are currently developing methods to take out the “shadow” effect and the intracellular pH fluctuation effect from the fluorescence traces.

      The authors nicely show that plasticity in vasomotion coincides with the mouse learning the HOKR task and that as eye movement tracks the stimulus, CBF gets entrained. However, there could also be a stress effect going on in the early trials, and as the mouse gets used to the procedure and stress comes down, the vasomotion entrainment can be seen. It could be the case that the vasomotion process is there on the first trial, but masked by stress-induced effects on neural and/or vascular activity. I did not see anything in the methods about how the mouse was habituated to head restraint. Was the first visual stim trial the first time the mouse was head restrained? If so, there could be a strong stress effect. The authors should address this either by clarifying that habituation to head restraint was done, or by doing a control experiment where each animal receives at least 1 week of progressive and gradual head restraint before doing the same HOKR experiment using multiple trials.

      We agree with the reviewer that stress could well affect spontaneous vasomotion as well as visually induced vasomotion (or visual stimulus-triggered vasomotion). As the reviewer suggested, we could have compared the initial visually induced vasomotion response between habituated and non-habituated mice. In addition, whether an experimentally induced increase in stress would interfere with the vasomotion could also be studied. With the TexasRed experiments, we observed that tail-vein injection stress appeared to interfere with the HOKR learning process. In the experiments presented in Fig. 3, TexasRed was injected before session 1. Vasomotion entrainment likely progressed with sessions 2 and 3 training. Before session 4, TexasRed was injected again to visualize the vasomotion. The vasomotion was clearly observed in session 4, indicating that the stress induced by tail-vein injection did not interfere with the generation of visually induced vasomotion. This argument is included in the revised manuscript (Discussions, page 20, paragraph 2).

      Minor

      The first sentence of the introduction requires citations. It is also a somewhat irrelevant comparison to make.

      Necessary citation was made in the revised manuscript, as the reviewer suggested. We think that describing how the energy is distributed in the brain would provide one of the most important breakthroughs to the understanding of how efficient information processing in the brain works. Therefore, we would like to keep this introduction.

      The third and fourth sentence of the introduction equates vasodilation/vasoconstriction with vasomotion and it is not this simple. Vasomotion is a specific physiological process involving rhythmic changes to artery diameter. Also, the frequency of these slow oscillations needs to be stated. The authors only say they are slower than 10Hz.

      The definition of spontaneous vasomotion with indication of typical temporal frequency is described in the revised manuscript, as the reviewer suggested.

      More than half of the introduction is describing the paper itself, rather than setting the stage for the findings. The authors need a more thorough account of what is known and what is not known in this area. Some of this information is in the discussion, which should be moved up to the intro.

      We have revised the introduction to include the definition of spontaneous vasomotion and visually induced vasomotion or functional hyperemia, as the reviewer suggested.

      In the first paragraph of the results section, the authors should state in what way the mice are awake. Are they freely mobile? Are they head-restrained? Are they resting or moving or doing both at different times? This is clarified later but it should come up front as someone reads through the paper.

      As the reviewer suggested, we clarified that the experiments were done in awake and head-restrained mice within the first paragraph for the Results section.

      The authors say "As shown later, blood vessels on the surface...". There is no need to say "as shown later".

      This is deleted as the reviewer suggested.

      The use of "full width at 10% maximum" of the Texas red intensity for the diameter measure is a little odd, as it may actually overestimate the diameter, but I see what the authors were trying to do. A full-width half max is standard here and that is likely more appropriate. Also, the line profiles of intensity are not raw data. The authors say the trace is strongly filtered/smoothed. If so, this creates a somewhat artificial platform to make the diameter measurement. The authors should show raw data from a single experiment and make the measurement from that. The raw line profile should look almost square, where a full-width half-max would work well.

      Contrary to what the reviewer observed, the raw line profile was not almost square. Even if there were almost no blur in the XY dimension in the optical imaging system, one would not expect to see a square line profile, as the thickness of the vessel increases in the Z dimension towards the center, as this is not a confocal or two-photon microscope image, and an ideal optical section was not created. Therefore, the full-width half-maximum value would definitely be an underestimate of the actual vessel diameter. It may be possible to equate an ideal value for cutoff if we have the 3D point spread function of the imaging. 10% is an arbitrary number but we think 10% is the minimum intensity that we can distinguish from the background intensity fluctuations. We did not attempt to derive the “true” diameter of the vessel and full-width at 10% maximum is just an index of the actual diameter. In most of the manuscript, we only deal with the change of the vessel diameter relative to the basal diameter, therefore, we considered that careful derivation of the absolute diameter estimate is not necessary. This argument is detailed in the Materials and Methods section in the revised manuscript (page 31, paragraph 2).

      The raw line profile before filtering is shown overlaid in Figure 1C, as the reviewer suggested.
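
      The geometric argument above can be illustrated with a short sketch. It assumes an idealized case that is not taken from the manuscript: a cylindrical vessel uniformly filled with dye, imaged without optical sectioning and without blur, so that the line-profile intensity is proportional to the chord length through the vessel. The function name `full_width_at_fraction`, the crude background subtraction, and all numeric values are hypothetical choices for illustration only.

```python
import numpy as np

def full_width_at_fraction(profile, fraction, pixel_size=1.0):
    """Width of a line profile at `fraction` of its peak height.

    The two crossing points are located by linear interpolation
    between neighbouring samples.
    """
    p = np.asarray(profile, dtype=float)
    p = p - p.min()                      # crude background subtraction (assumption)
    level = fraction * p.max()
    above = np.flatnonzero(p >= level)   # samples at or above the threshold
    left, right = above[0], above[-1]

    def crossing(i_out, i_in):
        # Position between an outside sample and an inside sample
        # where the profile reaches `level`.
        if i_out < 0 or i_out >= p.size:
            return float(i_in)
        frac = (level - p[i_out]) / (p[i_in] - p[i_out])
        return i_out + frac * (i_in - i_out)

    x_left = crossing(left - 1, left)
    x_right = crossing(right + 1, right)
    return (x_right - x_left) * pixel_size

# Idealized vessel: intensity proportional to the chord length 2*sqrt(R^2 - x^2).
R = 10.0                                 # hypothetical vessel radius (arbitrary units)
x = np.linspace(-20.0, 20.0, 4001)
profile = 2.0 * np.sqrt(np.clip(R**2 - x**2, 0.0, None))
dx = x[1] - x[0]

fwhm = full_width_at_fraction(profile, 0.5, dx)
fw10 = full_width_at_fraction(profile, 0.1, dx)
print(f"true diameter: {2 * R:.2f}")
print(f"full width at 50% maximum: {fwhm:.2f}")
print(f"full width at 10% maximum: {fw10:.2f}")
```

      Under these assumptions the half-maximum width equals sqrt(3)·R, roughly 87% of the true diameter, whereas the 10%-maximum width is about 99.5% of it, which is consistent with the authors' point that a half-maximum criterion would underestimate the diameter for a non-sectioned image. Real profiles include blur and background fluctuations, so the sketch says nothing about which threshold is best in practice.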

      In Figures 1 and 2, state/label what brain region this is.

      The blood vessels between the bregma and lambda on the cortex were observed and described in Figures 1 and 2. This is described in the revised manuscript, as the reviewer suggested.

      Can the authors also show what a vein or venule looks like using their quantification method in Figures 1 and 2? This would be a helpful comparison to a static vein.

      The methods shown in Figures 1 and 2 would not allow us to distinguish between vein and venule in our study. Methods that allow quantification of the relative blood vessel diameter fluctuation due to spontaneous or visually induced vasomotion activities are shown in Figures 1 and 2. Later in the manuscript, the whole intensity fluctuation of TexasRed or autofluorescence in the brain parenchyma is studied, and in this case, no distinction between vein and venules could be made.

      Statements such as this are not necessary: "Later in the manuscript, we will be dealing with vasomotion dynamics observed with the optical fiber photometry methods, in which the blood vessel type under the detection of the fiber could not be identified". Simply talk about this data when you get to it.

      We have deleted this statement in this part of the manuscript, as the reviewer suggested.

      Same as this, please consider deleting: "Spontaneous vasomotion dynamic differences between different classes of blood vessels would be of interest to study using a more sophisticated in vivo two-photon microscope which we do not own." Just describe the data you have from the methods you have. There is no need to lament.

      We deleted this sentence, as the reviewer suggested.

      Figure 3 D the light blue boxes showing the time period of visual stimulation physically overlay with the frequency-time spectrograms. They should not overlay with this graph because it makes them more light blue, distorting the figure which also uses light blue in the heat map.

      Figure 3D was modified, as the reviewer suggested.

      The authors say: "The reason why the vasomotion detected in our system through the intact skull in awake in vivo mice was less periodic was unknown." Yes, but you are imaging an awake mouse. Many spontaneous behaviours such as whisking, grooming, twitching, and struggling will manifest as increased artery diameter. These will be functional hyperemia occurring events on top of rhythmic vasomotion. This can be briefly discussed.

      As the reviewer comments, the vasomotion detected in awake mice was likely to be less periodic because the spontaneous animal behavior induces functional hyperemia and interrupts spontaneous vasomotion. This interpretation was included in the revised manuscript (Results, page 8, paragraph 1).

      The authors say "extremely tuned" on page 8. They should not use words like "extremely". Perhaps say "more strongly tuned" or equivalent.

      We have changed “extremely” to “more strongly”, as the reviewer suggested.

      The authors say "First, the Texas Red fluorescence images were Gaussian filtered in the spatial XY dimension to take out the random noise presumably created within the imaging system." It is inadvisable to alter the raw data in this way unless there is a sound reason to do so. If there is random noise this should not affect the Fast Fourier Transform analysis. If there is regular noise caused by instrumentation artefact, which is picked up by the analysis then perhaps this could be filtered out. A static Texas red sample in a vial can be used to determine if there is artefactual noise.

      We mainly used the Gaussian filter for better presentation of the imaged data. The TexasRed fluorescence was low in intensity and the acquired images were Gaussian filtered in the spatial XY dimension to reduce the pixelated noise at the expense of spatial resolution. This filter should not affect the temporal frequency of the observed vasomotion. This is now more clearly indicated in the revised manuscript (Results, page 10, paragraph 2).
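
      The claim that a purely spatial filter leaves the temporal frequency content untouched can be checked on synthetic data. The following is a minimal sketch, not the authors' analysis code: the frame rate, image size, noise level, kernel width, and the helper names `gaussian_kernel`, `smooth_xy`, and `peak_frequency` are all assumptions made for illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_xy(movie, sigma):
    """Separable Gaussian blur applied to every frame of a (T, Y, X) movie."""
    k = gaussian_kernel(sigma)
    blur = lambda v: np.convolve(v, k, mode="same")
    out = np.apply_along_axis(blur, 1, movie)    # blur along Y
    return np.apply_along_axis(blur, 2, out)     # blur along X

def peak_frequency(trace, fs):
    """Frequency (Hz) of the largest peak in the amplitude spectrum."""
    trace = trace - trace.mean()
    amplitude = np.abs(np.fft.rfft(trace))
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fs)
    return freqs[np.argmax(amplitude)]

# Synthetic movie: a 0.25 Hz oscillation shared by all pixels plus pixel noise.
rng = np.random.default_rng(0)
fs, n_frames, f_stim = 10.0, 400, 0.25           # hypothetical acquisition settings
t = np.arange(n_frames) / fs
signal = 1.0 + 0.2 * np.sin(2 * np.pi * f_stim * t)
raw = signal[:, None, None] + rng.normal(0.0, 0.3, (n_frames, 16, 16))
smoothed = smooth_xy(raw, sigma=1.5)

print("peak before filtering:", peak_frequency(raw[:, 8, 8], fs), "Hz")
print("peak after filtering: ", peak_frequency(smoothed[:, 8, 8], fs), "Hz")
```

      Because the blur acts on each frame independently, it reduces per-pixel noise but cannot move the spectral peak of an oscillation that is spatially broader than the kernel. It would, however, mix signals from structures closer together than the kernel width, which is the loss of spatial resolution the authors acknowledge.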

      There are endogenous fluorescent molecules in cell metabolism that change dynamically to neural activity: NADH, NADPH, and FAD. These are almost certainly a fraction of the auto-fluorescent signal the authors are measuring and it would be expected to see small fluctuations in these metabolites with neural activity. Perhaps this can be discussed, and the authors can likely argue that metabolic signals are much smaller than the change caused by vasodilation.

      We found that the autofluorescence signal was phase-shifted in time relative to the vasomotion, which was visualized with TexasRed. This suggests that these autofluorescence signals have an anti-phase “shadow imaging” component and another component that is phase-shifted in time. Glucose and oxygen are likely to be abundantly delivered during the vasodilation phase compared to the vasoconstriction phase of vasomotion. These molecules will trigger cell metabolism and endogenous fluorescent molecules such as NADH, NADPH, and FAD may increase or decrease with a certain delay, which is required for the chemical reactions to occur. Therefore, the concentration fluctuation of these metabolites could lag in time to the changes in the blood flow. It is also expected that these metabolites may fluctuate according to the neuronal activity that triggers visually induced vasomotion or functional hyperemia. These discussions are added in the revised manuscript (Discussions, page 19, paragraph 2).

      The authors say "however, we found that, if Texas Red had to be injected before every training session, the mouse did not learn very well." This is interesting. Why do the authors suppose this was the case? Stress from the injection? Or perhaps some deleterious effect on blood vessel function caused by the dye itself? Either way, I think this honest statement should remain. Others need to know about it.

      We think that the stress from the injection interferes with the HOKR learning. However, as shown, TexasRed injection after the mouse had learned did not interfere with the eye movement or with the visually induced vasomotion. We do not know whether the injection stress directly interferes with the blood vessel function and affects the plastic vasomotion entrainment. These arguments are now described in the revised manuscript (Discussions, page 20, paragraph 2). The statement above remains as is, as the reviewer suggested.

      YCnano50 is a calcium sensor and not really appropriate for the use employed by the authors. They are exciting YFP at 505nm but unless the authors are using a laser line, there is some bandwidth of excitation light that is likely exciting the CFP too which still absorbs light up to ~490nm. Here, calcium signalling may affect the YFP signal. This can be discussed.

      A multiband-pass filter (Chroma 69008x with the relevant band of 503 nm / 19.5 nm (FWHM)) was used for direct excitation of YFP. Negligible light is passed below 490 nm. CFP excitation above 490 nm is assumed to be negligible and is usually not defined in the literature. We assume that, with our optical system, fluorescence from direct YFP excitation dominates over the minor contribution from CFP excitation. We explicitly describe this in the revised manuscript (Materials and Methods, page 28, paragraph 2).

      The discussion is interesting but does not actually discuss much of the data or measurements in the paper. Most of the discussion reads more like a topical review, rather than a critical analysis of the effects/measurements and why the authors' interpretations are likely correct. This can be improved.

      As the reviewer suggests, we have improved the discussion by starting with the summary of the results (Discussion, page 19, paragraph 1). We also included the possibility of stress affecting visually induced vasomotion (Discussion, page 20, paragraph 2).

    2. eLife assessment

      This manuscript presents important results indicating a plastic enhancement in the vasomotion response of pial cortical arterioles to external stimulation in awake mice using a wide range of external visual stimulation paradigms. The evidence for this interesting effect, with broad potential applications, is solid. These results are relevant for scientists and clinicians interested in the regulation of blood flow in the brain.

    3. Reviewer #2 (Public Review):

      Sasaki et al. investigated methods to entrain vasomotion in awake wild-type mice across multiple regions of the brain using a horizontally oscillating visual pattern which induces an optokinetic response (HOKR) eye movement. They found that spontaneous vasomotion could be detected in individual vessels of their wild-type mice through either a thinned cranial window or intact skull preparation using a widefield macro-zoom microscope. They showed that low-resolution autofluorescence signals coming from the brain parenchyma could be used to capture vasomotion activity using a macro-zoom microscope or optical fibre, as this signal correlates well with the intensity profile of fluorescently-labelled single vessels. They show that vasomotion can also be entrained across the cortical surface using an oscillating visual stimulus with a range of parameters (with varying temporal frequencies, amplitudes, or spatial cycles), and that the amplitude spectrum of the detected vasomotion frequency increases with repeated training sessions. The authors include some control experiments to rule out fluorescence fluctuations being due to artifacts of eye movement or screen luminance and attempt to demonstrate some functional benefit of vasomotion entraining as HOKR performance improves after repeat training. These data add in an interesting way to the current knowledge base on vasomotion, as the authors demonstrate the ability to entrain vasomotion across multiple brain areas and show some functional significance to vasomotion with regards to information processing as HOKR task performance correlates well with vascular oscillation amplitudes.

    4. Reviewer #3 (Public Review):

      Summary:

      Here the authors show global synchronization of cerebral blood flow (CBF) induced by oscillating visual stimuli in the mouse brain. The study validates the use of endogenous autofluorescence to quantify the vessel "shadow" to assess the magnitude of frequency-locked cerebral blood flow changes. This approach enables straightforward estimation of artery diameter fluctuations in wild-type mice, employing either low magnification wide-field microscopy or deep-brain fibre photometry. For the visual stimuli, awake mice were exposed to vertically oscillating stripes at a low temporal frequency (0.25 Hz), resulting in oscillatory changes in artery diameter synchronized to the visual stimulation frequency. This phenomenon occurred not only in the primary visual cortex but also across a broad cortical and cerebellar surface. The induced CBF changes adapted to various stimulation parameters, and interestingly, repeated trials led to plastic entrainment. The authors control for different artefacts that may have confounded the measurements such as light contamination and eye movements but found no influence of these variables. The study also tested horizontally oscillating visual stimuli, which induce the horizontal optokinetic response (HOKR). The amplitude of eye movement, known to increase with repeated training sessions, showed a strong correlation with CBF entrainment magnitude in the cerebellar flocculus. The authors suggest that parallel plasticity in CBF and neuronal circuits is occurring. Overall, the study proposes that entrained "vasomotion" contributes to meeting the increased energy demand associated with coordinated neuronal activity and subsequent neuronal circuit reorganization.

      Strengths:

      -The paper describes a simple and useful method for tracking vasomotion in awake mice through an intact skull.
      -The work controls for artefacts in their primary measurements.
      -There are some interesting observations, including the nearly brain-wide synchronization of cerebral blood flow oscillations to visual stimuli and that this process only occurs after mice are trained in a visual task.
      -This topic is interesting to many in the CBF, functional imaging, and dementia fields.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable paper presents a thoroughly detailed methodology for mesoscale-imaging of extensive areas of the cortex, either from a top or lateral perspective, in behaving mice. While the examples of scientific results to be derived with this method are in the preliminary stages, they offer promising and stimulating insights. Overall, the method and results presented are convincing and will be of interest to neuroscientists focused on cortical processing in rodents.

      Authors’ Response: We thank the reviewers for the helpful and constructive comments. They have helped us plan for significant improvements to our manuscript. Our preliminary response and plans for revision are indicated below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors introduce two preparations for observing large-scale cortical activity in mice during behavior. Alongside this, they present intriguing preliminary findings utilizing these methods. This paper is poised to be an invaluable resource for researchers engaged in extensive cortical recording in behaving mice.

      Strengths:

      -Comprehensive methodological detailing:

      The paper excels in providing an exceptionally detailed description of the methods used. This meticulous documentation includes a step-by-step workflow, complemented by thorough protocols and a list of materials in the supplementary materials.

      -Minimal movement artifacts:

      A notable strength of this study is the remarkably low movement artifacts. To further underscore this achievement, a more robust quantification across all subjects, coupled with benchmarking against established tools (such as those from suite2p), would be beneficial.

      Authors’ Response: This is a good suggestion. We have records of the fast-z correction applied by ScanImage on the microscope during acquisition, so we have supplied the online fast-z motion correction .csv files for two example sessions on our GitHub page as supplementary files:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/online_fast_z_correction

      These files correspond to Figure S3b (2367_200214_E210_1) and to Figures 5 and 6 (3056_200924_E235_1). These are now also referenced in the main text. See lines ~595, pg 18 and lines ~762, pg 24.

      We have also made minor revisions to the main text of the manuscript with clear descriptions of methods that we have found important for the minimization of movement artifacts, such as fully tightening all mounting devices, implanting the cranial window with proper, evenly applied pressure across its entire extent, and mounting the mouse so that it is not too close or far from the surface of the running wheel. See Line ~309, pg 10.

      Insightful preliminary data and analysis:

      The preliminary data unveiled in the study reveal interesting heterogeneity in the relationships between neural activity and detailed behavioral features, particularly notable in the lateral cortex. This aspect of the findings is intriguing and suggests avenues for further exploration.

      Weaknesses:

      -Clarification about the extent of the method in the title and text:

      The title of the paper, using the term "pan-cortical," along with certain phrases in the text, may inadvertently suggest that both the top and lateral view preparations are utilized in the same set of mice. To avoid confusion, it should be explicitly stated that the authors employ either the dorsal view (which offers limited access to the lateral ventral regions) or the lateral view (which restricts access to the opposite side of the cortex). For instance, in line 545, the phrase "lateral cortex with our dorsal and side mount preparations" should be revised to "lateral cortex with our dorsal or side mount preparations" for greater clarity.

      Authors’ Response: We have opted to not change the title of the paper, because we feel that adding the qualifier, “in two preparations,” would add unnecessary complexity. In addition, while the dorsal mount preparation allows for imaging of bilateral dorsal cortex, the side mount preparation does indeed allow for imaging of both dorsal and lateral cortex across the right hemisphere (a bit of contralateral dorsal cortex is also imageable), and the design can be easily “flipped” across a mirror-plane to allow for imaging of left dorsal and lateral cortex. Taken together, we do show preparations that allow for pan-cortical 2-photon imaging.

      We do agree that imprecise reference to the two preparations can sometimes lead to confusion. Therefore, we made several small revisions to the manuscript, including at ~line 545, to make it clearer that we used two imaging preparations to generate our combined 2-photon mesoscope dataset, and that each of those two preparations had both benefits and limitations.

      -Comparison with existing methods:

      A more detailed contrast between this method and other published techniques would add value to the paper. Specifically, the lateral view appears somewhat narrower than that described in Esmaeili et al., 2021; a discussion of this comparison would be useful.

      Authors’ Response: The preparation by Esmaeili et al. 2021 has some similarities to, but also differences from, our preparation. Our preliminary reading is that their through-the-skull field of view is approximately the same as our through-the-skull field of view that exists between our first (headpost implantation) and second (window implantation) surgeries for our side mount preparation, although our preparation appears to include more anterior areas both near to and on the contralateral side of the midline. We have compared these preparations more thoroughly in the revised manuscript. (See lines ~278.)

      Furthermore, the number of neurons analyzed seems modest compared to recent papers (50k) - elaborating on this aspect could provide important context for the readers.

      Authors’ response: With respect to the “modest” number of neurons analyzed (between 2000 and 8000 neurons per session for our dorsal and side mount preparations with medians near 4500; See Fig. S2e) we would like to point out that factors such as use of dual-plane imaging or multiple imaging planes, different mouse lines, use of different duration recording sessions (see our Fig S2c), use of different imaging speeds and resolutions (see our Fig S2d), use of different Suite2p run-time parameters, and inclusion of areas with blood vessels and different neuron cell densities, may all impact the count of total analyzed neurons per session. We now mention these various factors and have made clear that we were not, for the purposes of this paper, trying to maximize neuron count at the expense of other factors such as imaging speed and total spatial FOV extent.

      We refer to these issues now briefly in the main text. (See ~line 93, pg 3).

      -Discussion of methodological limitations:

      The limitations inherent to the method, such as the potential behavioral effects of tilting the mouse's head, are not thoroughly examined. A more comprehensive discussion of these limitations would enhance the paper's balance and depth.

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this configuration (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript at ~line 235, pg. 7.

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it is possible to image the side mount preparation in the same optical configuration that we do without rotating the mouse, by rotating the objective 20 degrees to the right of vertical, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees rotation of the Thorlabs mesoscope objective), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use ultrasound gel instead (which we found to be, to some degree, optically inferior to water), but without the horizontal light shield, light from the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult under these conditions because the camera would need the same optical access angle as the 2-photon objective, or would need to be moved downward toward the air table and rotated up at an angle of 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      -Preliminary nature of results:

      The results are at a preliminary stage; for example, the B-SOiD analysis is based on a single mouse, and the validation data are derived from the training data set.

      Authors’ Response: In this methods paper, we have chosen to supply proof of principle examples, without a complete analysis of animal-to-animal variance.

      The B-SOiD analysis that we show in Figure 6 is based on a model trained on 80% of the data from four sessions taken from the same mouse, and then tested on all of a single session from that mouse. Initial attempts to train across sessions from different mice were unsuccessful, probably due to differences in behavioral repertoires across mice. However, we have performed extensive tests with B-SOiD and are confident that these sorts of results are reproducible across mice, although we are not prepared to publish these results at this time.

      We now clarify these points in the main text at ~line 865, pg 27.

      An additional comparison of the results of B-SOiD trained on different numbers of sessions to that of keypoint-MOSEQ (Weinreb et al, 2023, bioRxiv) trained on ~20 sessions can now be found as supplementary material on our GitHub site:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/Figure_SZZ_BSOID_MOSEQ_align.pdf

      The discrepancy between the maps in Figures 5e and 6e might indicate that a significant portion of the map represents noise. An analysis of variability across mice and a method to assign significance to these maps would be beneficial.

      Authors’ Response: After re-examination of the original analysis output files, we have indeed discovered that some of the Rastermap neuron density maps in Figure 6e were incorrectly aligned with their respective qualitative behaviors due to a discrepancy in file numbering between the images in 6e and the ensembles identified in 6c (each time that Rastermap is run on the same data, at least with the older version available at the time of creation of these figures, the order of the ensembles on the y-axis changes and thus the numbering of the ensembles would change even though the neuron identities within each group stayed the same for a given set of parameters).

      This unfortunate panel alignment / graphical display error present in the original reviewed preprint has been fixed in the current, updated figure (i.e. twitch corresponds to Rastermap groups 2 and 3, whisk to group 6, walk to groups 5 and 4, and oscillate to groups 0 and 1), and in the main text at ~line 925, pg 29. We have also changed the figure legend, which also contained accurate but misaligned information, for Figure 6e to reflect this correction.

      One can now see that, because the data from both figures is from the same session in the same mouse, as you correctly point out, Fig 5d left (walk and whisk) corresponds roughly to Fig 6e group R7, “walk”, and that Fig 5d right (whisk) corresponds roughly to Fig 6e group R4, “twitch”.

      We have double-checked the identity of other CCF map displays of Rastermap neuron density and of mean correlations between neural activity and behavioral primitives in all other figures, and we found no other such alignment or mis-labeling errors.

      We have also added a caveat in the main text at ~lines 925-940, pg. 30, pointing out the preliminary nature of these findings, which are shown here as an example of the viability of the methods. Analysis of the variability of Rastermap alignments across sessions is beyond the scope of the current paper, although it is an issue that we hope to address in upcoming analysis papers.

      -Analysis details:

      More comprehensive details on the analysis would be beneficial for replicability and deeper understanding. For instance, the statement "Rigid and non-rigid motion correction were performed in Suite2p" could be expanded with a brief explanation of the underlying principles, such as phase correlation, to provide readers with a better grasp of the methodologies employed.

      Authors’ Response: We added a brief explanation of Suite2p motion correction at ~line 136, pg 4. We have also added additional details concerning CCF / MMM alignment and other analysis issues. In general we cite other papers where possible to avoid repeating details of analysis methods that are already published.
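
      For readers unfamiliar with the principle the reviewer mentions, rigid registration by phase correlation can be sketched in a few lines. This is a generic illustration and not Suite2p's implementation, which involves additional steps not shown here; the function name `phase_correlation_shift`, the random test image, and the integer-pixel, circular-shift assumption are all hypothetical simplifications.

```python
import numpy as np

def phase_correlation_shift(reference, frame):
    """Integer (dy, dx) that realigns `frame` to `reference` when applied with np.roll."""
    f_ref = np.fft.fft2(reference)
    f_frm = np.fft.fft2(frame)
    cross = f_ref * np.conj(f_frm)
    cross /= np.abs(cross) + 1e-12       # discard magnitude, keep phase only
    corr = np.fft.ifft2(cross).real      # sharp peak at the relative displacement
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Indices past the midpoint correspond to negative (wrapped) shifts.
    if dy > reference.shape[0] // 2:
        dy -= reference.shape[0]
    if dx > reference.shape[1] // 2:
        dx -= reference.shape[1]
    return int(dy), int(dx)

# Synthetic test: displace a random image by a known amount and recover it.
rng = np.random.default_rng(1)
reference = rng.random((64, 64))
frame = np.roll(reference, (3, -5), axis=(0, 1))   # simulated brain motion

shift = phase_correlation_shift(reference, frame)
registered = np.roll(frame, shift, axis=(0, 1))
print("estimated correction (dy, dx):", shift)
print("registered frame matches reference:", np.allclose(registered, reference))
```

      Normalizing the cross-power spectrum to unit magnitude is what distinguishes phase correlation from plain cross-correlation: it makes the correlation peak narrow and less sensitive to overall brightness. Non-rigid correction is typically obtained by estimating such shifts separately for sub-blocks of the field of view, which is beyond this sketch.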

      Reviewer #2 (Public Review):

      Summary:

      The authors present a comprehensive technical overview of the challenging acquisition of large-scale cortical activity, including surgical procedures and custom 3D-printed headbar designs to obtain neural activity from large parts of the dorsal or lateral neocortex. They then describe technical adjustments for stable head fixation, light shielding, and noise insulation in a 2-photon mesoscope and provide a workflow for multisensory mapping and alignment of the obtained large-scale neural data sets in the Allen CCF framework. Lastly, they show different analytical approaches to relate single-cell activity from various cortical areas to spontaneous activity by using visualization and clustering tools, such as Rastermap, PCA-based cell sorting, and B-SOID behavioral motif detection.

      Authors’ Response: Thank you for this excellent summary of the scope of our paper.

      The study contains a lot of useful technical information that should be of interest to the field. It tackles a timely problem that an increasing number of labs will be facing as recent technical advances allow the activity measurement of an increasing number of neurons across multiple areas in awake mice. Since the acquisition of cortical data with a large field of view in awake animals poses unique experimental challenges, the provided information could be very helpful to promote standard workflows for data acquisition and analysis and push the field forward.

      Authors’ Response: We very much support the idea that our work here will contribute to the development of standard workflows across the field including those for multiple approaches to large-scale neural recordings.

      Strengths:

      The proposed methodology is technically sound and the authors provide convincing data to suggest that they successfully solved various problems, such as motion artifacts or high-frequency noise emissions, during 2-photon imaging. Overall, the authors achieved their goal of demonstrating a comprehensive approach for the imaging of neural data across many cortical areas and providing several examples that demonstrate the validity of their methods and recapitulate and further extend some recent findings in the field.

      Weaknesses:

      Most of the descriptions are quite focused on a specific acquisition system, the Thorlabs Mesoscope, and the manuscript is in part highly technical making it harder to understand the motivation and reasoning behind some of the proposed implementations. A revised version would benefit from a more general description of common problems and the thought process behind the proposed solutions to broaden the impact of the work and make it more accessible for labs that do not have access to a Thorlabs mesoscope. A better introduction of some of the specific issues would also promote the development of other solutions in labs that are just starting to use similar tools.

      Authors’ Response: We have edited the motivations behind the study to clarify the general problems that are being addressed. However, as the 2-photon imaging component of these experiments was performed on a Thorlabs mesoscope, the imaging details necessarily deal specifically with this system.

      We briefly compare the methods and results from our Thorlabs system to that of Diesel-2p, another comparable system, based on what we have been able to glean from the literature on its strengths and weaknesses. See ~lines 206-213, pg 6.

      Reviewer #3 (Public Review):

      Summary

      In their manuscript, Vickers and McCormick have demonstrated the potential of leveraging mesoscale two-photon calcium imaging data to unravel complex behavioural motifs in mice. Particularly commendable is their dedication to providing detailed surgical preparations and corresponding design files, a contribution that will greatly benefit the broader neuroscience community. The quality of the data is high, but it is not clear whether the data are available to the community; some datasets should be deposited. More importantly, the authors have acquired activity-clustered neural ensembles at an unprecedented spatial scale to further correlate with high-level behaviour motifs identified by B-SOiD. Such an advancement marks a significant contribution to the field. While the manuscript is comprehensive and the analytical strategy proposed is promising, some technical aspects warrant further clarification. Overall, the authors have presented an invaluable and innovative approach, effectively laying a solid foundation for future research in correlating large-scale neural ensembles with behaviour. The implementation of a custom sound insulator for the scanner is a great idea and should be something implemented by others.

      Authors’ Response: Thank you for the kind words.

      We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      The data is located here: https://doi.org/10.25452/figshare.plus.c.7052513

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with in-depth analysis papers that are currently in preparation.

      This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other. This is described in the methods, but a visual representation would greatly benefit the readers looking to implement something similar.

      Authors’ Response: This is an excellent suggestion. We have included a workflow diagram in the revised manuscript, in the form of a 3-part figure, for the methods (a), data collection (b and c), and analysis (d). This supplementary figure is now located on the GitHub page at the following link:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/pancortical_workflow_diagrams.pdf

      We now reference this figure on ~lines 190-192, pg 6 of the main text, near the beginning of the Results section.

      The authors should cite sources for the claims stated in lines 449-453 and cite the claim of the mouse's hearing threshold mentioned in line 463.

      Authors’ Response: For the claim stated in lines 449-453:

      “The unattenuated or native high-frequency background noise generated by the resonant scanner causes stress to both mice and experimenters, and can prevent mice from achieving maximum performance in auditory mapping, spontaneous activity sessions, auditory stimulus detection, and auditory discrimination sessions/tasks”

      we can provide the following references: (i) for mice: Sadananda et al, 2008 (“Playback of 22-kHz and 50-kHz ultrasonic vocalizations induces differential c-fos expression in rat brain”, Neuroscience Letters, Vol 435, Issue 1, p 17-23), and (ii) for humans: Fletcher et al, 2018 (“Effects of very high-frequency sound and ultrasound on humans. Part I: Adverse symptoms after exposure to audible very-high frequency sound”, J Acoust Soc Am, 144, 2511-2520). We will include these references in the revised paper.

      For the claim stated on line 463:

      “i.e. below the mouse hearing threshold at 12.5 kHz of roughly 15 dB”

      we can provide the following reference: Zheng et al, 1999 (“Assessment of hearing in 80 inbred strains of mice by ABR threshold analyses”, Hearing Research, Vol 130, Issues 1-2, p 94-107).

      We have included these two new references in the new, revised version of our paper. Thank you for identifying these citation omissions.

      No stats for the results shown in Figure 6e, it would be useful to know which of these neural densities for all areas show a clear statistical significance across all the behaviors.

      Authors’ Response: It would be useful if we could provide a statistic similar to what we provide for Fig. S6c and f, in which for each CCF area we compare the observed mean correlation values to a null of 0. In this case, we would compare the population density of each Rastermap group within each CCF area to a null value equal to the total number of recorded neurons for that group divided by the total number of CCF areas (i.e. a Rastermap group with 500 neurons evenly distributed across ~30 CCF areas would contain ~17 neurons, or ~3.3% density, per CCF area). Our current figure legend states the maximums of the scale bar look-up values (reds) for each group, which range from ~8% to 32%.

      However, because the data in panel 6e are from a single session and are being provided as an example of our methods and not for the purpose of claiming a specific result at this point, we choose not to report statistics. It is worth pointing out, perhaps, that Rastermap group densities for a given CCF area close to 3.3% are likely not different from chance, and those closer to ~40%, which is our highest density (for area M2 in Rastermap group 7, which corresponds to the qualitative behavior “walk”), are most likely not due to chance. Without analysis of multiple sessions from the same mouse we believe that making a clear statement of significance for this likelihood would be premature.
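
      The chance-level arithmetic above can be sketched in a few lines of Python. This is only an illustration and not an analysis performed in the paper: it assumes that every CCF area is equally likely to contain a given neuron (which ignores differences in area size and imaging coverage), and the function names are hypothetical.

```python
from math import comb


def chance_density(n_areas):
    """Expected fraction of a group's neurons per area under an even spread."""
    return 1.0 / n_areas


def binomial_upper_tail(n_neurons, n_in_area, n_areas):
    """P(X >= n_in_area) if each neuron falls in a given area with probability 1/n_areas."""
    p = chance_density(n_areas)
    return sum(
        comb(n_neurons, k) * p**k * (1 - p) ** (n_neurons - k)
        for k in range(n_in_area, n_neurons + 1)
    )


n_neurons, n_areas = 500, 30  # the example values quoted in the response
expected_count = n_neurons * chance_density(n_areas)
print(f"expected neurons per area: {expected_count:.1f}")
print(f"chance density: {100 * chance_density(n_areas):.1f}%")
# Tail probability for a hypothetical area holding 40% of the group (200 of 500).
print(f"P(>= 200 neurons in one area): {binomial_upper_tail(n_neurons, 200, n_areas):.3g}")
```

      Under this simplified null, a count near the expected ~17 neurons is unremarkable, whereas a density near 40% is far out in the tail. A proper test would still need area-size weighting and data from multiple sessions.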

      We now clarify this decision and related considerations in the main text at ~line 920, pg 29.

      While I understand that this is a methods paper, it seems like the authors are aware of the literature surrounding large neuronal recordings during mouse behavior. Indeed, in lines 178-179, the authors mention how a significant portion of the variance in neural activity can be attributed to changes in "arousal or self-directed movement even during spontaneous behavior." Why then did the authors not make an attempt at a simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc). These models are straightforward to implement, and indeed it would benefit this work if the model extracts information on par with what is known from the literature.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the current methods paper. We are following up with an in-depth analysis of neural activity and corresponding behavior across the cortex during spontaneous and trained behaviors.

      Here, we prefer to present examples of the types of results that can be expected to be obtained using our methods, and how these results compare with those obtained by others in the field.

      Specific strengths and weaknesses with areas to improve:

      The paper should include an overall cartoon diagram that indicates how the various modules are linked together for the sampling of both behaviour and mesoscale GCAMP. This is a methods paper, but there is no large diagram that shows how all the parts are connected, communicating, and triggering each other.

      Authors’ Response: This is an excellent suggestion. We have included a workflow diagram in the revised manuscript, in the form of a 3-part figure, for the methods (a), data collection (b and c), and analysis (d). This supplementary figure is now located on the GitHub page at the following link:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/pancortical_workflow_diagrams.pdf

      The paper contains many important results regarding correlations between behaviour and activity motifs on both the cellular and regional scales. There is a lot of data and it is difficult to draw out new concepts. It might be useful for readers to have an overall figure discussing various results and how they are linked to pupil movement and brain activity. A simple linear model that tries to predict the activity of their many thousands of neurons by employing the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc) may help in this regard.

      Authors’ Response: This is an excellent suggestion, but beyond the scope of the present methods paper. Such an analysis is a significant undertaking with such large and heterogeneous datasets, and we provide proof-of-principle data here so that the reader can understand the type of data that one can expect to obtain using our methods. We will provide a more complete analysis of data obtained using our methodology in the near future in another manuscript.

      Previously, widefield imaging methods have been employed to describe regional activity motifs that correlate with known intracortical projections. Within the authors' data it would be interesting to perhaps describe how these two different methods are interrelated (they do collect both datasets). Surprisingly, such macroscale patterns are not immediately obvious from the authors' data. Some of this may be related to the scaling of correlation patterns or other factors. Perhaps there still isn't enough data to readily see these and it is too sparse.

      Authors’ Response: Unfortunately, we are unable to directly compare 1-photon widefield GCaMP6s activity with mesoscope 2-photon GCaMP6s activity. During widefield data acquisition, animals were stimulated with visual, auditory, or somatosensory stimuli (i.e. “passive sensory stimulation”), while 2-photon mesoscope data collection occurred during spontaneous changes in behavioral state, without sensory stimulation. The suggested comparison is, indeed, an interesting project for the future.

      In lines 71-71, the authors described some disadvantages of one-photon widefield imaging including the inability to achieve single-cell resolution. However, this is not true. In recent years, the combination of better surgical preparations, camera sensors, and genetically encoded calcium indicators has enabled the acquisition of single-cell data even using one-photon widefield imaging methods. These methods include miniscopes (Cai et al., 2016), multi-camera arrays (Hope et al., 2023), and spinning disks (Xie et al., 2023).

      Cai, Denise J., et al. "A shared neural ensemble links distinct contextual memories encoded close in time." Nature 534.7605 (2016): 115-118.

      Hope, James, et al. "Brain-wide neural recordings in mice navigating physical spaces enabled by a cranial exoskeleton." bioRxiv (2023).

      Xie, Hao, et al. "Multifocal fluorescence video-rate imaging of centimetre-wide arbitrarily shaped brain surfaces at micrometric resolution." Nature Biomedical Engineering (2023): 1-14.

      Authors’ Response: We have corrected these statements and incorporated these and other relevant references. There are advantages and disadvantages to each chosen technique, such as ease of use, field of view, accuracy, and speed. We will reference the papers you mention without an extensive literature review, but we would like to emphasize the following points:

      Even the best one-photon imaging techniques typically have ~10-20 micrometer resolution in xy (we image at 5 micrometer resolution for our large FOV configuration, but the xy point-spread function for the Thorlabs mesoscope is 0.61 x 0.61 micrometers in xy with 970 nm excitation) and undefined z-resolution (4.25 micrometers for Thorlabs mesoscope). A coarser resolution increases the likelihood that activity related fluorescence from neighboring cells may contaminate the fluorescence observed from imaged neurons. Reducing the FOV and using sparse expression of the indicator lessens this overlap problem.

      We do appreciate these recent advances, however, particularly for use in cases where more rapid imaging is desired over a large field of view (CCD acquisition can be much faster than standard 2-photon galvo-galvo scanning, or even the galvo-resonant scanning used by the Thorlabs mesoscope). This being said, there are few currently available genetically encoded Ca2+ sensors that are able to measure fluctuations faster than ~10 Hz, which is a speed achievable on the Thorlabs 2-photon mesoscope with our techniques using the “small, multiple FOV” method (Fig. S2d, e).

      We have further clarified our discussion of these issues in the main text at ~lines 76-80, pg 2.

      The authors' claim of achieving optical clarity for up to 150 days post-surgery with their modified crystal skull approach substantially exceeds the 8 weeks (approximately 56 days) reported in the original study by Kim et al. (2016). Since surgical preparations are an integral part of the manuscript, it may be helpful to provide more details to address the feasibility and reliability of the preparation in chronic studies. A series of images documenting the progression of the optical quality of the window would offer valuable insight.

      Authors’ Response: As you suggest, we now include brief supplementary material demonstrating the changes in the window preparation that we observed over the prolonged time periods of our study, for both the dorsal and side mount preparations. The following link to this material is now referenced at ~line 287, pg 9, and at the end of Fig S1:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/window_preparation_stability.pdf

      We have also included brief additional details in the main text that we found were useful for facilitating long-term use of these preparations. These are located at ~lines 287-290, pg 9.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Sharing raw data and code:

      I strongly encourage sharing some of the raw data from your experiments and all the code used for data analysis (e.g. in a github repository). This would help the reader evaluate data quality, and reproduce your results.

      Authors’ Response: We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with second and third in-depth analysis papers that are currently in preparation.

      The data is located here: https://doi.org/10.25452/figshare.plus.c.7052513

      Our existing GitHub repository, already referenced in the paper, is located here:

      https://github.com/vickerse1/mesoscope_spontaneous

      We have added an additional reference in the main text to the existence of these publicly available resources, including the appropriate links, located at ~lines 190-200, pg 6.

      (2) Use of proprietary software:

      The reliance on proprietary tools like LabView and Matlab could be a limitation for some researchers, given the associated costs and accessibility issues. If possible, consider incorporating or suggesting alternatives that are open-source, to make your methodology more accessible to a broader range of researchers, including those with limited resources.

      Authors’ Response: We are reluctant to recommend open source software that we have not thoroughly tested ourselves. However, we will mention, when appropriate, possible options for the reader to consider.

      Although LabView is proprietary and can be difficult to code, it is particularly useful when used in combination with National Instruments hardware. ScanImage in use with the Thorlabs mesoscope uses National Instruments hardware, and it is convenient to maintain hardware standards across the integrated rig/experimental system. LabView is also useful because it comes with a huge library of device drivers that makes the addition of new hardware from basically any source very convenient.

      That being said, there are open source alternatives that could conceivably be used to replace parts of our system. One example is AutoPilot (author: Jonny Saunders), for control of behavioral data acquisition: https://open-neuroscience.com/post/autopilot/.

      We are not aware of an alternative to Matlab for control of ScanImage, which is the supported control software for the Thorlabs 2-photon mesoscope.

      Most of our processing and analysis code (see GitHub page: https://github.com/vickerse1/mesoscope_spontaneous) is in Python, but some of the code that we currently use remains in Matlab. Certainly, this could be rewritten in Python. However, we feel that this is outside the scope of the current paper. We have commented all code in an attempt to aid users in translating it to other languages, if they so desire.

      (3) Quantifying the effect of tilted head:

      To address the potential impact of tilting the mouse's head on your findings, a quantitative analysis of any systematic differences in the behavior (e.g. B-SOiD motifs) could be illuminating.

      Authors’ Response: We have performed DeepLabCut analysis of all sessions from both preparations, across several iterations with different parameters, to extract pose estimates, and we have also performed B-SOiD analysis of these sessions. We did not find any obvious qualitative differences in the number of behavioral motifs identified, the dwell times of these motifs, or similar measures that could be attributed to the tilting of the mouse’s head in the side mount preparation. We also did not find any obvious differences in the relative frequencies of high level qualitative behaviors, such as the ones referred to in Fig. 6, between the two preparations.

      Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this configuration (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript. (See ~line 235, pg. 7)

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it would in principle be nearly possible to image the side mount preparation in the same optical configuration without rotating the mouse, by instead rotating the objective 20 degrees to the right of vertical, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees of rotation of the Thorlabs mesoscope objective), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, light from the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause the PMTs to trip. Third, imaging the right pupil and face of the mouse is difficult or impossible under these conditions, because the camera would either need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      (4) Clarification in the discussion section:

      The paragraph titled "Advantages and disadvantages of our approach" seems to diverge into discussing future directions, rather than focusing on the intended topic. I suggest revisiting this section to ensure that it accurately reflects the strengths and limitations of your approach.

      Authors’ Response: We agree with the reviewer that this section included several potential next steps or solutions for each advantage and disadvantage, which the reviewer refers to as “future directions” and are thus arguably beyond the scope of this section. Therefore we have retitled this section as, “Advantages and disadvantages of our approach (with potential solutions):”.

      Although we believe this to be a logical organization, and we already include a section focused purely on future directions in the Discussion section, we have refocused each paragraph of the advantages/disadvantages subsection to concentrate on the advantages and disadvantages per se. In addition, we have made minor changes to the “future directions” section to make it more succinct and practical. These changes can be found at ~lines 1016-1077, pg 33-34.

      Reviewer #2 (Recommendations For The Authors):

      Below are some more detailed points that will hopefully help to further improve the quality and scope of the manuscript.

      • While it is certainly favorable for many questions to measure large-scale activity from many brain regions, the introduction appears to suggest that this is a prerequisite to understanding multimodal decision-making. This is based on the argument that combining multiple recordings with movement indicators will 'necessarily obscure the true spatial correlation structures'. However, I don't understand why this is the case or what is meant by 'true spatial correlation structures'. Aren't there many earlier studies that provided important insights from individual cortical areas? It would be helpful to improve the writing to make this argument clearer.

      Authors’ Response: The reviewer makes an excellent point and we have re-worded the manuscript appropriately, to reflect the following clarifications. These changes can be found at ~lines 58-71, pg. 2.

      We believe you are referring to the following passage from the introduction:

      “Furthermore, the arousal dependence of membrane potential across cortical areas has been shown to be diverse and predictable by a temporally filtered readout of pupil diameter and walking speed (Shimoaka et al, 2018). This makes simultaneous recording of multiple cortical areas essential for comparison of the dependence of their neural activity on arousal/movement, because combining multiple recording sessions with pupil dilations and walking bouts of different durations will necessarily obscure the true spatial correlation structures.”

      Here, we do not mean to imply that earlier studies of individual cortical areas are of no value. This argument is provided as one example among others. The general idea is that, for sequences or distributed encoding schemes that simultaneously span many cortical areas that are too far apart to be imaged simultaneously with conventional 2-photon imaging, or that are too sparse to be discovered with 1-photon widefield imaging, our new methods offer advantages over conventional imaging methods that will allow for truly novel scientific analyses and insights.

      The general idea of the present example, based on the findings of Shimoaka et al, 2018, is that it is not possible to directly combine and/or compare the correlations between behavior and neural activity across regions that were imaged in separate sessions, because the correlations between behavior and neural activity in each region appear to depend on the exact time since the behavior began (Shimoaka et al, 2018), in a manner that differs across regions. So, for example, if one were to record from visual cortex in one session with mostly brief walk bouts, and then from somatosensory cortex in a second session with mostly long walk bouts, any inferred difference between the encoding of walk speed in neural activity between the two areas would run the risk of being contaminated by the “temporal filtering” effect shown in Shimoaka et al, 2018. However, this would not be the case in our recordings, because the distribution of behavior durations corresponding to our recorded neural activity across areas will be exactly the same, because they were recorded simultaneously.
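
      The confound described above can be made concrete with a toy calculation. The sketch below is purely illustrative and is not taken from Shimaoka et al, 2018 or from our data: it assumes that each area reads out walking through a first-order low-pass filter, and the time constants in `TAU` and the bout durations are hypothetical values chosen for the example.

```python
from math import exp


def filtered_response(bout_duration_s, tau_s):
    """Output of a first-order low-pass filter at the end of a unit step (walk bout)."""
    return 1.0 - exp(-bout_duration_s / tau_s)


# Hypothetical filter time constants (seconds) for two cortical areas.
TAU = {"visual": 0.5, "somatosensory": 5.0}


def apparent_difference(visual_bout_s, somato_bout_s):
    """Difference in walk-related response between the two areas."""
    return (
        filtered_response(visual_bout_s, TAU["visual"])
        - filtered_response(somato_bout_s, TAU["somatosensory"])
    )


# Separate sessions: brief bouts while imaging visual cortex, long bouts for somatosensory.
separate_sessions = apparent_difference(1.0, 10.0)
# Simultaneous imaging: both areas experience the same bout durations.
simultaneous_short = apparent_difference(1.0, 1.0)
simultaneous_long = apparent_difference(10.0, 10.0)

print(f"separate sessions:        {separate_sessions:+.3f}")
print(f"simultaneous, 1 s bouts:  {simultaneous_short:+.3f}")
print(f"simultaneous, 10 s bouts: {simultaneous_long:+.3f}")
```

      With these made-up numbers, the mismatched bout durations in the separate sessions cancel a tenfold difference in time constants, so the two areas look identical. With simultaneous recording, the difference between areas is visible at either bout duration.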

      • The text describes different timescales of neural activity but is an imaging rate of 3 Hz fast enough to be seen as operating at the temporal dynamics of the behavior? It appears to me that the sampling rate will impose a hard limit on the speed of correlations that can be observed across regions. While this might be appropriate for relatively slow behaviors and spontaneous fluctuations in arousal, sensory processing and decision formation likely operate on faster time scales below 100ms which would even be problematic at 10 Hz which is proposed as the ideal imaging speed in the manuscript.

      Authors’ Response: Imaging rate is always a concern and the limitations of this have been discussed in other manuscripts. We will remind the reader of these limitations, which must always be kept in mind when interpreting fluorescence based neural activity data.

      Previous studies imaging on a comparable yet more limited spatial scale (Stringer et al, 2019) used an imaging speed of ~1 Hz. With this in view, our work represents an advance both in spatial extent of imaged cortex and in imaging speed. Specifically, we believe that ~1 Hz imaging may be sufficient to capture flip/flop type transitions between low and high arousal states that persist in general for seconds to tens of seconds, and that ~3-5 Hz imaging likely provides additional information about encoding of spontaneous movements and behavioral syllables/motifs.

      Indeed, even 10 Hz imaging would not be fast enough to capture the detailed dynamics of sensory processing and decision formation, although these speeds are likely sufficient to capture “stable” encodings of sensory representations and decisions that must be maintained during a task, for example with delayed match-to-sample tasks.
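
      For reference, the relationship between imaging rate and the fastest resolvable timescale can be written out as simple arithmetic. The sketch below uses only the sampling theorem and hypothetical helper names; it ignores indicator kinetics (for example the slow decay of GCaMP6s), which limit the effective temporal resolution further.

```python
def frame_interval_ms(rate_hz):
    """Time between successive frames, in milliseconds."""
    return 1000.0 / rate_hz


def nyquist_limit_hz(rate_hz):
    """Highest fluctuation frequency that can be sampled without aliasing."""
    return rate_hz / 2.0


def can_resolve(rate_hz, timescale_ms):
    """True if at least two frames fall within an event of the given duration."""
    return timescale_ms >= 2.0 * frame_interval_ms(rate_hz)


for rate_hz in (1, 3, 5, 10):
    print(
        f"{rate_hz:>2} Hz: one frame every {frame_interval_ms(rate_hz):.0f} ms, "
        f"Nyquist limit {nyquist_limit_hz(rate_hz):.1f} Hz"
    )

# Arousal states lasting seconds are sampled adequately even at ~1 Hz,
# whereas processes faster than 100 ms are not resolved even at 10 Hz.
print(can_resolve(1, 5000), can_resolve(10, 100))
```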

      In general we are further developing our preparations to allow us to perform simultaneous widefield imaging and Neuropixels recordings, and to perform simultaneous 1.2 x 1.2 mm 2-photon imaging and visually guided patch clamp recordings.

      Both of these techniques will allow us to combine information across both the slow and fast timescales that you refer to in your question.

      We have clarified these points in the Introduction and Discussion sections, at ~lines 93-105, pg 3, and ~lines 979-983, pg 31 and ~lines 1039-1045, pg 33, respectively.

      • The dorsal mount is very close to the crystal skull paper and it was ultimately not clear to me if there are still important differences aside from the headbar design that a reader should be aware of. If they exist, it would be helpful to make these distinctions a bit clearer. Also, the sea shell implants from Ghanbari et al in 2019 would be an important additional reference here.

      Authors’ Response: We have added brief references to these issues in our revised manuscript at ~lines 89-97, pg 3:

      Although our dorsal mount preparation is based on the “crystal skull paper” (Kim et al, 2016), which we reference, the addition of a novel 3-D printable titanium headpost, support arms, light shields, and modifications to the surgical protocols and CCF alignment represent significant advances that made this preparation useable for pan-cortical imaging using the Thorlabs mesoscope. In fact, we were in direct communication with Cris Niell, a UO professor and co-author on the original Kim et al, 2016 paper, during the initial development of our preparation, and he and members of his lab consulted with us in an ongoing manner to learn from our successful headpost and other hardware developments. Furthermore, all of our innovations for data acquisition, imaging, and analysis apply equally to both our dorsal mount and side mount preparations.

      Thank you for mentioning the Ghanbari et al, 2019 paper on the transparent polymer skull method, “See Shells.” We were in fact not aware of this study. However, it should be noted that their preparation seems to, like the crystal skull preparation and our dorsal mount preparation, be limited to bilateral dorsal cortex and not to include, as does our cranial window side mount preparation and the through-the-skull widefield preparation of Esmaeili et al, 2021, a fuller range of lateral cortical areas, including primary auditory cortex.

      • When using the lateral mount, rotating the objective, rather than the animal, appears to be preferable to reduce the stress on the animal. I also worry that the rather severe head tilt could be an issue when training animals in more complex behaviors and would introduce an asymmetry between the hemispheres due to the tilted body position. Is there a strong reason why the authors used water instead of an imaging gel to resolve the issue with the meniscus?

      Authors’ Response: Our mice readily adapted to the 22.5 degree head tilt and learned to perform 2-alternative forced choice (2-AFC) auditory and visual tasks in this situation (Hulsey et al, 2024; Cell Reports). The advantages and limitations of such a rotation of the mouse, and possible ways to alleviate these limitations, as detailed in the following paragraphs, are now discussed more thoroughly in the revised manuscript. (See ~line 235, pg. 7)

      One can look at Supplementary Movie 1 for examples of the relatively similar behavior between the dorsal mount (not rotated) and side mount (rotated) preparations. We do not have behavioral data from mice that were placed in both configurations. Our preliminary comparisons across mice indicate that side and dorsal mount mice show similar behavioral variability. We have added brief additional mention of these considerations on ~lines 235-250, pg 7.

      It was in general important to make sure that the distance between the wheel and all four limbs was similar for both preparations. In particular, careful attention must be paid to the positioning of the front limbs in the side mount mice so that they are not too high off the wheel. This can be accomplished by a slight forward angling of the left support arm for side mount mice.

      Although it would in principle be nearly possible to image the side mount preparation in the same optical configuration that we do without rotating the mouse, by rotating the objective 20 degrees to the right of vertical, we found that the last 2-3 degrees of missing rotation (our preparation is rotated 22.5 degrees left, which is more than the full available 20 degrees rotation of the objective), along with several other factors, made this undesirable. First, it was very difficult to image auditory areas without the additional flexibility to rotate the objective more laterally. Second, it was difficult or impossible to attach the horizontal light shield and to establish a water meniscus with the objective fully rotated. One could use gel instead (which we found to be optically inferior to water), but without the horizontal light shield, the UV and IR LEDs can reach the PMTs via the objective and contaminate the image or cause tripping of the PMT. Third, imaging the right pupil and face of the mouse is difficult to impossible under these conditions because the camera would need the same optical access angle as the objective, or would need to be moved down toward the air table and rotated up 20 degrees, in which case its view would be blocked by the running wheel and other objects mounted on the air table.

      • In parts, the description of the methods is very specific to the Thorlabs mesoscope which makes it harder to understand the general design choices and challenges for readers that are unfamiliar with that system. Since the Mesoscope is very expensive and therefore unavailable to many labs in the field, I think it would increase the reach of the manuscript to adjust the writing to be less specific for that system but instead provide general guidance that could also be helpful for other systems. For example (but not exclusively) lines 231-234 or lines 371 and below are very Thorlabs-specific.

      Authors’ Response: We have revised the manuscript so that it is more generally applicable to mesoscopic methods.

      We will make revisions as you suggest where possible, although we have limited experience with the other imaging systems that we believe you are referring to. However, please note that we already mentioned at least one other comparable system in the original eLife reviewed pre-print (Diesel 2p, line 209; Yu and Smith, 2021).

      Here are a couple of examples of how we have broadened our description:

      (1) On lines ~231-234, pg 7, we write:

      “However, if needed, the objective of the Thorlabs mesoscope may be rotated laterally up to +20 degrees for direct access to more ventral cortical areas, for example if one wants to use a smaller, flat cortical window that requires the objective to be positioned orthogonally to the target region.”

      Here we have modified this to indicate that one may in general rotate the objective lens if the system allows it. Some systems, such as the Thorlabs Bergamo microscope and the Sutter MOM system, allow more than 20 degrees of rotation.

      (2) On line ~371, pg 11, we write:

      “This technique required several modifications of the auxiliary light-paths of the Thorlabs mesoscope”

      Here, we have changed the writing to be more general such as “may require…of one’s microscope.”

      Thank you for these valuable suggestions.

      • Lines 287-299: Could the authors quantify the variation in imaging depth, for example by quantifying to which extent the imaging depth has to be adjusted to obtain the position of the cortical surface across cortical areas? Given that curvature is a significant challenge in this preparation this would be useful information and could either show that this issue is largely resolved or to what extent it might still be a concern for the interpretation of the obtained results. How large were the required nominal corrections across imaging sites?

      Authors’ Response: This information was provided previously (lines 297-299):

      “In cases where we imaged multiple small ROIs, nominal imaging depth was adjusted in an attempt to maintain a constant relative cortical layer depth (i.e. depth below the pial surface; ~200 micrometer offset due to brain curvature over 2.5 mm of mediolateral distance, symmetric across the center axis of the window).”

      This statement is based on a qualitative assessment of cortical depth based on neuron size and shape, the density of neurons in a given volume of cortex, the size and shape of blood vessels, and known cortical layer depths across regions. A ground-truth measurement of this depth error is beyond the scope of the present study. However, we do specify the type of glass, thickness, and curvature that we use, and the field curvature characterization of the Thorlabs mesoscope is given in Fig. 6 of the Sofroniew et al, 2016 eLife paper.
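
      As a rough illustration of the scale of this correction (not the authors' method), the quoted ~200 micrometer offset at 2.5 mm of mediolateral distance can be turned into a simple depth-versus-position estimate. The sketch below assumes the imaged surface follows a circular arc that is symmetric about the center axis of the window; the function names and the arc model are hypothetical and are used here only for illustration.

```python
import math


def radius_from_sagitta(half_width_mm, sag_mm):
    """Radius of a circular arc that drops sag_mm at half_width_mm from its apex."""
    return (half_width_mm ** 2 + sag_mm ** 2) / (2.0 * sag_mm)


def depth_offset_um(x_mm, radius_mm):
    """Nominal depth offset (micrometers) at distance x_mm from the apex of the arc."""
    return (radius_mm - math.sqrt(radius_mm ** 2 - x_mm ** 2)) * 1000.0


# Assumption: 0.2 mm (200 micrometers) of offset at 2.5 mm from the window center.
radius_mm = radius_from_sagitta(2.5, 0.2)

# Estimated offsets at a few mediolateral positions (mm from the center axis).
offsets_um = {x: depth_offset_um(x, radius_mm) for x in (0.0, 1.0, 2.5)}
```

      Under this assumed geometry the offset grows roughly quadratically with distance from the center, so ROIs near the center axis would need much smaller nominal corrections than ROIs near the window edge.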

      In addition, we have provided some documentation of online fast-z correction parameters on our GitHub page at:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/online_fast_z_correction

      Additional relevant documentation can be found in our publicly available data repository on FigShare+ at: https://doi.org/10.25452/figshare.plus.c.7052513

      • Given the size of the implant and the subsequent work attachments, I wonder to which extent the field of view of the animal is obstructed. Did the authors perform receptive field mapping or some other technique that can estimate the size of the animals' remaining field of view?

      Authors’ Response: The left eye is pointed down ~22.5 degrees, but we position the mouse near the left edge of the wheel to minimize the degree to which this limits their field of view. One may view our Fig. 1 and Suppl Movies 1 and 6 to see that the eyes on the left and right sides are unobstructed by the headpost, light shields, and support arms. However, other components of the experimental setup, such as the speaker, cameras, etc. can restrict a few small portions of the visual field, depending on their exact positioning.

      Mice responded to left side visual stimuli in preliminary recordings during our multimodal 2-AFC task, and the unobstructed left and right camera views, along with pupillometry recordings, showed that a significant portion of the mouse’s field of view, from either side, remains intact in our preparation.

      We have clarified these points in the text at ~lines 344-346, pg. 11.

      • Line 361: What does movie S7 show in this context? The movie seems to emphasize that the observed calcium dynamics are not driven by movement dynamics but it is not clear to me how this relates to the stimulation of PV neurons. The neural dynamics in the example cell are also not very clear. It would be helpful if this paragraph would contain some introduction/motivation for the optogenetic stimulation as it comes a bit out of the blue.

      Authors’ Response: This result was presented for two reasons.

      First, we showed it as a control for movement artifacts, since inhibition of neural activity enhances the relative prominence of non-activity dependent fluorescence that is used to examine the amplitude of movement-related changes in non-activity dependent fluorescence (e.g. movement artifacts). We have included a reference to this point at ~lines 587-588, pg 18.

      Second, we showed it as a demonstration of how one may combine optogenetics with imaging in mesoscopic 2-P imaging. References to this point were already present in the original version of the manuscript (the eLife “reviewed preprint”).

      • Lines 362-370: This paragraph and some of the following text are quite technical and would benefit from a better description and motivation of the general workflow. I have trouble following what exactly is done here. Are the authors using an online method to identify the CCF location of the 2p imaging based on the vessel pattern? Why is it important to do this during the experiment? Wouldn't it be sufficient to identify the areas of interest based on the vessel pattern beforehand and then adjust the 2p acquisition accordingly? Why are they using a dial, shutter, and foot pedal and how does this relate to the working distance of the objective? Does the 'standardized cortical map' refer to the Allen common coordinate framework?

      Authors’ Response: We have revised this section to make it more clear.

      Currently, the general introduction to this section appears in lines 349-361. Starting in line 362, we currently present the technical considerations needed to implement the overall goals stated in that first paragraph of this section.

      In general we use a post-hoc analysis step to confirm the location of neurons recorded with 2-photon imaging. We use “online” juxtaposition of the multimodal map image with overlaid CCF with the 2-photon image by opening these two images next to each other on the ScanImage computer and matching the vasculature patterns “by eye”. We have made this more clear in the text so that the interested reader can more readily implement our methods.

      By use of the phrase “standardized cortical map” in this context, we meant to point out that we had not decided a priori to use the Allen CCF v3.0 when we started working on these issues.

      • Does Fig. 2c show an example of the online alignment between widefield and 2p data? I was confused here since the use of suite2p suggests that this was done post-recording. I generally didn't understand why the user needed to switch back and forth between the two modes. Doesn't the 2p image show the vessels already? Also, why was an additional motorized dichroic to switch between widefield and 2p view needed? Isn't this the standard in most microscopes (including the Thorlabs scopes)?

      Authors’ Response: We have explained this methodology more clearly in the revised manuscript, both at ~lines 485-500, pg 15-16, and ~lines 534-540, pg 17.

      The motorized dichroic we used replaced the motorized mirror that comes with the Thorlabs mesoscope. We switched to a dichroic to allow for near-simultaneous optogenetic stimulation with 470 nm blue light and 2-photon imaging, so that we would not have to move the mirror back and forth during live data acquisition (it takes a few seconds and makes an audible noise that we wanted to avoid).

      Figure 2c shows an overview of our two step “offline” alignment process. The image at the right in the bottom row labeled “2” is a map of recorded neurons from suite2p, determined post-hoc or after imaging. In Fig. 2d we show what the CCF map looks like when it’s overlaid on the neurons from a single suite2p session, using our alignment techniques. Indeed, this image is created post-hoc and not during imaging. In practice, “online” during imaging, we would have the image at left in the bottom row of Fig. 2c (i.e. the multimodal map image overlaid onto an image of the vasculature also acquired on the widefield rig, with the 22.5 degree rotated CCF map aligned to it based on the location of sensory responses) rotated 90 degrees to the left and flipped over a horizontal mirror plane so that its alignment matches that of the “online” 2-photon acquisition image and is zoomed to the same scale factor. Then, we would navigate based on vasculature patterns “by-eye” to the desired CCF areas, and confirm our successful 2-photon targeting of predetermined regions with our post-hoc analysis.
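
      The reorientation step described above (rotate the widefield reference image 90 degrees to the left, then flip it over a horizontal mirror plane) can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes that "to the left" means counterclockwise and that the flip is top-to-bottom, and the function name is hypothetical. Rescaling to the 2-photon zoom factor is not shown.

```python
import numpy as np


def to_two_photon_orientation(widefield_img):
    """Rotate an image 90 degrees counterclockwise, then flip it top-to-bottom."""
    rotated = np.rot90(widefield_img, k=1)  # k=1 is one counterclockwise quarter turn
    return np.flipud(rotated)               # mirror across the horizontal axis


# Tiny stand-in for a widefield reference image.
demo = np.array([[1, 2],
                 [3, 4]])
oriented = to_two_photon_orientation(demo)
```

      Under this reading of the two steps, the combined operation is equivalent to transposing the image, which offers a quick sanity check when matching vasculature by eye.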

      • Why is the widefield imaging done through the skull under anesthesia? Would it not be easier to image through the final window when mice have recovered? Is the mapping needed for accurate window placement?

      Authors’ Response: The headpost and window surgeries are done 3-7 days apart to increase success rate and modularize the workflow. Multimodal mapping by widefield imaging is done through the skull between these two surgeries for two major reasons. First, to make efficient use of the time between surgeries. Second, to allow us to compare the multimodal maps to skull landmarks, such as bregma and lambda, for improved alignment to the CCF.

      Anesthesia was applied to prevent state changes and movements of the mouse, which can produce large, undesired effects on neural responses in primary sensory cortices in the context of these mapping experiments. We sometimes re-imaged multimodal maps on the widefield microscope through the window, roughly every 30-60 days or whenever/if significant changes in vasculature pattern became apparent.

      We have clarified these points in the main text at ~lines 510-522, pg 20-21, and we added a link to our new supplementary material documenting the changes observed in the window preparation over time:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/window_preparation_stability.pdf

      Thank you for these questions.

      • Lines 445 and below: Reducing the noise from resonant scanners is also very relevant for many other 2p experiments so it would be helpful to provide more general guidance on how to resolve this problem. Is the provided solution only applicable to the Thorlabs mesoscope? How hard would it be to adjust the authors' noise shield to other microscopes? I generally did not find many additional details on the Github repo and think readers would benefit from a more general explanation here.

      Authors’ Response: Our revised Github repository has been modified to include more details, including both diagrams and text descriptions of the sound baffle, respectively:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_for_noise_reduction_on_resonant_scanner_devices.pdf

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_methodology_summary.pdf

      However, we cannot presently disclose our confidential provisional patent application. Complete design information will likely be available in early 2025 when our full utility patent application is filed.

      With respect to your question, yes, this technique is adaptable to any resonant scanner, or, for that matter, any complicated 3D surface that emits sound. We first 3D scan the surface, and then we reverse engineer a solid that fully encapsulates the surface and can be easily assembled in parts with bolts and interior foam that allow for a tight fit, in order to nearly completely block all emitted sound.

      It is this adaptability that has prompted us to apply for a full patent, as we believe this technique will be quite valuable as it may apply to a potentially large number of applications, starting with 2-photon resonant scanners but possibly moving on to other devices that emit unwanted sound.

      • Does line 458 suggest that the authors had to perform a 3D scan of the components to create the noise reduction shield? If so, how was this done? I don't understand the connection between 3D scanning and printing that is mentioned in lines 464-466.

      Authors’ Response: We do not want to release full details of the methodology until the full utility patent application has been submitted. However, we have now included a simplified text description of the process on our GitHub page and included a corresponding link in the main text:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/resonant_scanner_baffle/closed_cell_honeycomb_baffle_methodology_summary.pdf

      We also clarified in the main text, at the location that you indicate, why the 3D scanning is a critical part of our novel 3D-design, printing, and assembly protocol.

      • Lines 468 and below: Why is it important to align single-cell data to cortical areas 'directly on the 2-photon microscope'? Is this different from the alignment discussed in the paragraph above? Why not focus on data interpretation after data acquisition? I understand the need to align neural data to cortical areas in general, I'm just confused about the 'on the fly' aspect here and why it seems to be broken out into two separate paragraphs. It seems as if the text in line 485 and below could also be placed earlier in the text to improve clarity.

      Authors’ Response: Here by “such mapping is not routinely possible directly on the 2-photon mesoscope” what we mean is that it is not possible to do multimodal mapping directly on the mesoscope - it needs to be done on the widefield imaging rig (a separate microscope). Then, the CCF is mapped onto the widefield multimodal map, which is overlaid on an image of the vasculature (and sometimes also the skull) that was also acquired on the widefield imaging rig, and the vasculature is used as a sort of Rosetta Stone to co-align the 2-photon image to the multimodal map and then, by a sort of commutative property of alignment, to the CCF, so that each individual neuron in the 2-photon image can be assigned a unique CCF area name and numerical identifier for subsequent analysis.
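
      The chaining of alignments described above (2-photon image to widefield vasculature image, then widefield image to CCF) can be illustrated by composing two coordinate transforms. The sketch below is only an assumption-laden illustration: it models each step as a 2-D similarity transform with made-up parameters, whereas the actual alignment in the authors' notebooks may be landmark-based or non-rigid. All names and numbers are hypothetical.

```python
import numpy as np


def similarity_transform(scale, angle_deg, tx, ty):
    """3x3 homogeneous matrix: scale and rotate about the origin, then translate."""
    t = np.deg2rad(angle_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[scale * c, -scale * s, tx],
                     [scale * s,  scale * c, ty],
                     [0.0,        0.0,       1.0]])


def map_point(transform, x, y):
    """Apply a homogeneous transform to a single (x, y) point."""
    out = transform @ np.array([x, y, 1.0])
    return float(out[0]), float(out[1])


# Hypothetical transforms for the two alignment steps.
two_photon_to_widefield = similarity_transform(0.5, 0.0, 10.0, 20.0)
widefield_to_ccf = similarity_transform(1.0, 90.0, 0.0, 0.0)

# Chaining: the step applied first goes on the right of the matrix product.
two_photon_to_ccf = widefield_to_ccf @ two_photon_to_widefield

# A neuron centroid in 2-photon pixel coordinates, mapped straight into CCF space.
ccf_xy = map_point(two_photon_to_ccf, 4.0, 6.0)
```

      Once a neuron's centroid is expressed in CCF coordinates, it can be looked up in a CCF area mask to assign an area name and numerical identifier.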

      We have clarified this in the text, thank you.

      The Python code for aligning the widefield and 2-photon vessel images would also be of great value for regular 2p users. It would strongly improve the impact of the paper if the repository were better documented and the code would be equally applicable for alignment of imaging data with smaller cranial windows.

      Authors’ Response: All of the code for multimodal map, CCF, and 2-photon image alignment is, in fact, already present on the GitHub page. We have made some minor improvements to the documentation, and readers are more than welcome to contact us for additional help.

      Specifically, the alignment you refer to starts in cell #32 of the meso_pre_proc_1.ipynb notebook. In general the notebooks are meant to be run sequentially, starting with cell #1 of meso_pre_proc_1, then going to the next cell etc…, then moving to meso_pre_proc_2, etc… The purpose of each cell is labeled at the top of the cell in a comment.

      We now include a cleaned, abridged version of the meso_pre_proc_1.ipynb notebook that contains only the steps needed for alignment, and we have included a direct link to this notebook in the main text:

      https://github.com/vickerse1/mesoscope_spontaneous/blob/main/python_code/mesoscope_preprocess_MMM_creation.ipynb

      Rotated CCF maps are in the CCF map rotation folder, in subfolders corresponding to the angle of rotation.

      Multimodal map creation involves use of the SensoryMapping_Vickers_Jun2520.m script in the Matlab folder.

      We updated the main text to clarify these points and included direct links to scripts relevant to each processing step.

      • Figure 4a: I found it hard to see much of the structure in the Rastermap projection with the viridis colormap - perhaps also because of a red-green color vision impairment. Correspondingly, I had trouble seeing some of the structure that is described in the text or clearer differences between the neuron sortings to PC1 and PC2. Is the point of these panels to show that both PCs identify movement-aligned dynamics or is the argument that they isolate different movement-related response patterns? Using a grayscale colormap as used by Stringer et al might help to see more of the many fine details in the data.

      Authors’ Response: In Fig. 4a the viridis color range is from blue to green to yellow, as indicated in the horizontal scale bar at bottom right. There is no red color in these Rastermap projections, or in any others in this paper. Furthermore, the expanded Rastermap insets in Figs. S4 and S5 provide additional detailed information that may not be clear in Fig 4a and Fig 5a.

      We prefer, therefore, not to change these colormaps, which we use throughout the paper.

      We have provided grayscale png versions of all figures on our GitHub page:

      https://github.com/vickerse1/mesoscope_spontaneous/tree/main/grayscale_figures

      In Fig 4a the point of showing both the PC1 and PC2 panels is to demonstrate that they appear to correspond to different aspects of movement (PC1 more to transient walking, both ON and OFF, and PC2 to whisking and sustained ON walk/whisk), and to exhibit differential ability to identify neurons with positive and negative correlations to arousal (PC1 finds both, but PC2 seems to find only the ON neurons).

      We now clarify this in the text at ~lines 696-710, pg 22.

      • I find panel 6a a bit too hard to read because the identification and interpretation of the different motifs in the different qualitative episodes is challenging. For example, the text mentions flickering into motif 13 during walk but the majority of that sequence appears to be shaped by what I believe to be motif 11. Motif 11 also occurs prominently in the oscillate state and the unnamed sequence on the left. Is this meaningful or is the emphasis here on times of change between behavioral motifs? The concept of motif flickering should be better explained here.

      Authors’ Response: Here motif 13 corresponds to a syllable that might best be termed “symmetric and ready stance”. This tends to occur just before and after walking, but also during rhythmic wheel balancing movements that appear during the “oscillate” behavior.

      The intent of Fig. 6a is to show that each qualitatively identified behavior (twitch, whisk, walk, and oscillate) corresponds to a period during which a subset of BSOiD motifs flicker back and forth, and that the identity of motifs in this subset differs across the identified qualitative behaviors. This is not to say that a particular motif occurs only during a single identified qualitative behavior. Admittedly, the identification of these qualitative behaviors is a bit arbitrary - future versions of BSOiD (e.g. ASOiD) in fact combine supervised (i.e. arbitrary, top down) and unsupervised (i.e. algorithmic, objective, bottom-up) methods of behavior segmentation in an attempt to more reliably identify and label behaviors.

      Flickering appears to be a property of motif transitions in raw BSOiD outputs that have not been temporally smoothed. If one watches the raw video, it seems that this may in fact be an accurate reflection of the manner in which behaviors unfold through time. Each behavior could be thought of, to use terminology from MOSEQ (B Datta), as a series of syllables strung together to make a phrase or sentence. Syllables can repeat over either fast or slow timescales, and may be shared across distinct words and sentences although the order and frequency of their recurrence will likely differ.

      We have clarified these points in the main text at ~lines 917-923, pg 29, and we added motif 13 to the list of motifs for the qualitative behavior labeled “oscillate” in Fig. 6a.

      • Lines 997-998: I don't understand this argument. Why does the existence of different temporal dynamics make imaging multiple areas 'one of the keys to potentially understanding the nature of their neuronal activity'?

      Authors’ Response: We believe this may be an important point: comparisons of neurobehavioral alignment across cortical areas cannot be performed by pooling sessions that contain different distributions of dwell times for different behaviors if the dependence of neural activity on behavior in fact varies with the exact elapsed time since the beginning of the current behavioral “bout”. Other reasons that imaging many areas simultaneously would provide a unique advantage over imaging smaller areas one at a time and attempting to pool data across sessions include the identification of sequences or neural ensembles that span many areas across large distances, and the understanding of distributed coding of behavior (an issue we explore in an upcoming paper).

      We have clarified these points at the location in the Discussion that you have identified. Thank you for your questions and suggestions.

      Minor

      Line 41: What is the difference between decision, choice, and response periods?

      Authors’ Response: This now reads “...temporal separation of periods during which cortical activity is dominated by activity related to stimulus representation, choice/decision, maintenance of choice, and response or implementation of that choice.”

      Line 202: What does ambulatory mean in this context?

      Authors’ Response: Here we mean that the mice are able to walk freely on the wheel. In fact they do not actually move through space, so we have changed this to read “able to walk freely on a wheel, as shown in Figs. 1a and 1b”.

      Is there a reason why 4 mounting posts were used for the dorsal mount but only 1 post was sufficient for the lateral mount?

      Authors’ Response: Here, we assume you mean 2 posts for the side mount and 4 posts for the dorsal mount.

      In general our idea was to use as many posts as possible to provide maximum stability of the preparations and minimize movement artifacts during 2-photon imaging. However, the design of the side mount headpost precluded the straight-forward or easy addition of a right oriented, second arm to its lateral/ventral rim - this would have blocked access of both the 2-photon objective and the right face camera. In the dorsal mount, the symmetrical headpost arms are positioned further back (i.e. posterior), so that the left and right face cameras are not obscured.

      When we created the side mount preparation, we discovered that the 2 vertical 1” support posts were sufficient to provide adequate stability of the preparation and minimize 2-photon imaging movement artifacts. The side mount used two attachment screws on the left side of the headpost, instead of the one screw per side used in the dorsal mount preparation.

      We have included these points/clarifications in the main text at ~lines 217-230, pg 7.

      Figure S1g appears to be mislabeled.

      Authors’ Response: Yes, on the figure itself that panel was mislabeled as “f” in the original eLife reviewed preprint. We have changed this to read “g”.

      Line 349 and below: Why is the method called pseudo-widefield imaging?

      Authors’ Response: On the mesoscope, broad spectrum fluorescent light is passed through a series of excitation and emission filters that, based on a series of tests that we performed, allow both reflected blue light and epifluorescence emitted (i.e. Stokes-shifted) green light to reach the CCD camera for detection. Furthermore, the CCD camera (Thorlabs) has a much smaller detector chip than that of the other widefield cameras that we use (RedShirt Imaging and PCO), and we use it to image at an acquisition speed of around 10 Hz maximum, instead of ~30-50 Hz, which is our normal widefield imaging acquisition speed (it also has a slower readout than what we would consider to be a standard or “real” 1-photon widefield imaging camera).

      For these 3 reasons we refer to this as “pseudo-widefield” imaging. We would not use this for sensory activity mapping on the mesoscope - we primarily use it for mapping cortical vasculature and navigating based on our multimodal map to CCF alignment, although it is actually “contaminated” with some GCaMP6s activity during these uses.

      We have briefly clarified this in the text.

      Figures 4d & e: Do the colors show mean correlations per area? Please add labels and units to the colorbars as done in panel 4a.

      Authors’ Response: For both Figs 4 and 5, we have added the requested labels and units to each scale bar, and have relabeled panels d to say “Rastermap CCF area cell densities”, and panels e to say “mean CCF area corrs w/ neural activity.”

      Thank you for catching these omissions/mislabelings.

      Line 715: what is superneuron averaging?

      Authors’ Response: This refers to the fact that when Rastermap displays more than ~1000 neurons it averages the activity of each group of adjacent 50 neurons in the sorting to create a single display row, to avoid exceeding the pixel limitations of the display. Each single row representing the average activity of 50 neurons is called a “superneuron” (Stringer et al, 2023; bioRxiv).
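
      The averaging described above can be sketched in a few lines. This is a minimal illustration and not Rastermap's own implementation: it assumes the activity matrix is already ordered by the Rastermap sort (neurons x time), and it simply drops any incomplete trailing group, which is an assumption made here for simplicity. The function name is hypothetical.

```python
import numpy as np


def superneuron_average(sorted_activity, group_size=50):
    """Average consecutive groups of rows in a sorted (neurons x time) array."""
    n_neurons, n_time = sorted_activity.shape
    n_groups = n_neurons // group_size  # assumption: drop an incomplete last group
    trimmed = sorted_activity[: n_groups * group_size]
    # Reshape to (groups, neurons per group, time) and average within each group.
    return trimmed.reshape(n_groups, group_size, n_time).mean(axis=1)


# Toy example: 120 "neurons" and 4 time points, already in sort order.
activity = np.arange(120 * 4, dtype=float).reshape(120, 4)
superneurons = superneuron_average(activity, group_size=50)
```

      Each row of the result is one "superneuron" display row, so a recording of several thousand neurons collapses to a raster with 50 times fewer rows.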

      We have modified the text to clarify this point.

      Line 740: it would be good to mention what exactly the CCF density distribution quantifies.

      Authors’ Response: In each CCF area, a certain percentage of neurons belongs to each Rastermap group. The CCF density distribution is the set of these percentages, or densities, across all CCF areas in the dorsal or side mount preparation being imaged in a particular session. We have clarified this in the text.
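
      As a minimal sketch of this definition (not the authors' code), the density distribution can be computed from two per-neuron label lists: the CCF area of each neuron and its Rastermap group. The function name, the area labels, and the group numbers below are illustrative assumptions only.

```python
from collections import Counter, defaultdict


def ccf_density_distribution(area_labels, group_labels):
    """Return {area: {group: percent of that area's neurons in the group}}."""
    counts = defaultdict(Counter)
    for area, group in zip(area_labels, group_labels):
        counts[area][group] += 1
    return {
        area: {g: 100.0 * n / sum(c.values()) for g, n in c.items()}
        for area, c in counts.items()
    }


# Toy example: six neurons, two areas, three hand-selected Rastermap groups.
areas = ["VISp", "VISp", "VISp", "VISp", "AUDp", "AUDp"]
groups = [1, 1, 2, 3, 2, 2]
densities = ccf_density_distribution(areas, groups)
```

      Within each area the percentages sum to 100, so areas with very different neuron counts can be compared on the same scale.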

      Line 745: what does 'within each CCF' mean? Does this refer to different areas?

      Authors’ Response: The corrected version of this sentence now reads: “Next, we compared, across all CCF areas, the proportion of neurons within each CCF area that exhibited large positive correlations with walking speed and whisker motion energy.”

      How were different Rastermap groups identified? Were they selected by hand?

      Authors’ Response: Yes, in Figs. 4, 5, and 6, we selected the identified Rastermap groups “by hand”, based on qualitative similarity of their activity patterns. At the time, there was no available algorithmic or principled means by which to split the Rastermap sort. The current, newer version of Rastermap (Stringer et al, 2023) seems to allow for algorithmic discretization of embedding groups (we have not tested this yet), but it was not available at the time that we performed these preliminary analyses.

      In terms of “correctness” of such discretization or group identification, we intend to address this issue in a more principled manner in upcoming publications. For the purposes of this first paper, we decided that manual identification of groups was sufficient to display the capabilities and outcomes of our methods.

      We clarify this point briefly at several locations in the revised manuscript, throughout the latter part of the Results section.

      Reviewer #3 (Recommendations For The Authors):

      In "supplementary figures, protocols, methods, and materials", Figure S1 g is mislabeled as Figure f.

      Authors’ Response: Yes, on the figure itself this panel was mislabeled as “f” in the original reviewed preprint. We have changed this to read “g”.

      In S1 g, the success rate of the surgical procedure seems quite low. Less than 50% of the mice could be imaged under two-photon. Can the authors elaborate on the criteria and difficulties related to their preparations?

      Authors’ Response: We will elaborate on the difficulties that sometimes hinder success in our preparations in the revised manuscript.

      The success rate indicated up to the point of “Spontaneous 2-P imaging (window)” reads 13/20, which is 65%, not 50%. The drop to 9/20 by the time one gets to the left edge of “Behavioral Training” indicates that some mice do not master the task.

      Protocol I contains details of the different ways in which mice either die or become unsuitable or “unsuccessful” at each step. These surgeries are rather challenging - they require proper instruction and experience. With the current protocol, our survival rate for the window surgery alone is as high as 75-100%. Some mice can be lost at headpost implantation, in particular if they are low weight or if too much muscle is removed over the auditory areas. Finally, some mice survive windowing but the imageable area of the window might be too small to perform the desired experiment.

      We have added a paragraph detailing this issue in the main text at ~lines 287-320, pg 9.

      In both Suppl_Movie_S1_dorsal_mount and Suppl_Movie_S1_side_mount provided (Movie S1), the behaviour video quality seems to be unoptimized which will impact the precision of Deeplabcut. As evident, there were multiple instances of mislabeled key points (paws are switched, large jumps of key points, etc) in the videos.

      Many tracked points are in areas of the image that are over-exposed.

      Despite using a high-speed camera, motion blur is obvious.

      Occlusions of one paw by the other paws moving out of frame.

      As DeepLabCut accuracy is key to the higher-level motifs generated by B-SOiD, can the authors provide an example of tracking with exclusion or smoothing of mislabeled points (possibly by the median filtering provided by DeepLabCut)? This may help readers address such errors.

      Authors’ Response: We agree that we would want to carefully rerun and carefully curate the outputs of DeepLabCut before making any strong claims about behavioral identification. As the aim of this paper was to establish our methods, we did not feel that this degree of rigor was required at this point.

      It is inevitable that there will be some motion blur and small areas of over-exposure, respectively, when imaging whiskers, which can contain movement components up to ~150 Hz, and when imaging a large area of the mouse, which has planes facing various aspects. For example, perfect orthogonal illumination of both the center of the eye and the surface of the whisker pad on the snout would require two separate infrared light sources. In this case, use of a single LED results in overexposure of areas orthogonal to the direction of the light and underexposure of other aspects, while use of multiple LEDs would partially fix this problem, but still lead to variability in summated light intensity at different locations on the face. We have done our best to deal with these limitations.

      We now briefly point out these limitations in the methods text at ~lines 155-160, pg 5.

      In addition, we have provided additional raw and processed movies and data related to DeepLabCut and B-SOiD behavioral analysis in our FigShare+ repository, which is located at:

      https://doi.org/10.25452/figshare.plus.c.7052513

      In lines 153-154, the authors mentioned that the DeepLabCut model was trained for 650k iterations. In our experience (100-400k), this seems excessive and may result in the model overfitting, yielding incorrect results in unseen data. Echoing point 4, can the authors show the accuracy of their DeepLabCut model (training set, validation set, errors, etc.)?

      Authors’ Response: Our behavioral analysis is preliminary and is included here as an example of our methods, and not to make claims about any specific result. Therefore we believe that the level of detail that you request in our DeepLabCut analysis is beyond the scope of the current paper. However, we would like to point out that we performed many iterations of DeepLabCut runs, across many mice in both preparations, before converging on these preliminary results. We believe that these results are stable and robust.

      We believe that 650k iterations is within the reasonable range suggested by DLC, and that 1 million iterations is given as a reasonable upper bound. This seems to be supported by the literature; for example, see Willmore et al, 2022 (“Behavioral and dopaminergic signatures of resilience”, Nature, 124:611, 124-132). Here, in a paper focused squarely on behavioral analysis, DLC training was run with 1.3 million iterations with default parameters.

      We now note, on ~lines 153-154, pg 5, that we used 650K iterations, a number significantly less than the default of 1.03 million, to avoid overfitting.

      In lines 140-141, the authors mentioned the use of slicing to downsample their data. Have any precautions, such as a low pass filter, been taken to avoid aliasing?

      Authors’ Response: Most of the 2-photon data we present was acquired at ~3 Hz and upsampled to 10 Hz. Most of the behavioral data was downsampled from 5000 Hz to 10 Hz by slicing, as stated. We did not apply any low-pass filter to the behavioral data before sampling. The behavioral variables have heterogeneous real sampling/measurement rates - for example, pupil diameter and whisker motion energy are sampled at 30 Hz, and walk speed is sampled at 100 Hz. In addition, the 2-photon acquisition rate varied across sessions.

      These facts made principled, standardized low-pass filtering difficult to implement. We chose rather to use a common resampling rate of 10 Hz in an unbiased manner. This downsampled 10 Hz rate is also used by B-SOiD to find transitions between behavioral motifs (Hsu and Yttri, 2021).

      We do not think that aliasing is a major factor because the real rate of change of our Ca2+ indicator fluorescence and behavioral variables was, with the possible exception of whisker motion energy, likely at or below 10 Hz.

      We now include a brief statement to this effect in the methods text at ~lines 142-146, pg. 4.
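
      For readers weighing the aliasing question raised here, the sketch below contrasts downsampling by slicing with a simple block-average (boxcar) alternative. It is an illustration only, not the authors' pipeline: the helper names are hypothetical, and the boxcar is a crude low-pass filter chosen for brevity rather than a recommended anti-aliasing design.

```python
import math


def downsample_by_slicing(signal, step):
    """Keep every `step`-th sample; no filtering is applied."""
    return signal[::step]


def downsample_by_block_mean(signal, step):
    """Average each complete block of `step` samples before decimating.

    This acts as a simple boxcar low-pass filter; a trailing partial
    block is dropped (an assumption of this sketch).
    """
    n_blocks = len(signal) // step
    return [
        sum(signal[i * step:(i + 1) * step]) / step for i in range(n_blocks)
    ]


def cosine(freq_hz, rate_hz, duration_s):
    """Unit-amplitude cosine test signal sampled at `rate_hz`."""
    n_samples = int(rate_hz * duration_s)
    return [
        math.cos(2 * math.pi * freq_hz * k / rate_hz) for k in range(n_samples)
    ]
```

      Going from 5000 Hz to 10 Hz corresponds to `step=500`. A 10 Hz component sits above the new Nyquist limit of 5 Hz, so slicing `cosine(10, 5000, 1)` returns a constant value near 1 (an alias at 0 Hz), whereas the block mean returns values near 0. Components well below 5 Hz pass through both approaches largely unchanged, which is consistent with the authors' reasoning that aliasing matters only for variables with substantial energy above that limit.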

      Lines 288-299: the authors have made considerable effort to compensate for the curvature of the brain, which is particularly important when imaging the whole dorsal cortex. Can the authors provide performance metrics and related details on how well the combination of online curvature field correction (ScanImage) and fast-z "sawtooth"/"step" (Sofroniew, 2016) performs?

      Authors’ Response: We did not perform additional “ground-truth” experiments that would allow us to make definitive statements concerning field curvature, as was done in the initial eLife Thorlabs mesoscope paper (Sofroniew et al, 2016).

      We estimate that we experience ~200 micrometers of depth offset across 2.5 mm - for example, if the objective is orthogonal to our 10 mm radius bend window and centered at the apex of its convexity, a small ROI located at the lateral edge of the side mount preparation would need to be positioned around 200 micrometers below that of an equivalent ROI placed near the apex in order to image neurons at the same cortical layer/depth, and would be at close to the same depth as an ROI placed at or near the midline, at the medial edge of the window. We determined this by examining the geometry of our cranial windows, and by comparing z-depth information from adjacent sessions in the same mouse, the first of which used a large FOV and the second of which used multiple small FOVs optimized so that they sampled from the same cortical layers across areas.

      We have included this brief explanation in the main text at ~lines 300-311, pg 9.

      In lines 513-515, the authors mentioned that the vasculature pattern can change over the course of the experiment, which then requires re-performing the realignment procedure. How stable is the vasculature pattern? Would laser speckle contrast yield more reliable results?

      Authors’ Response: In general the changes in vasculature we observed were minimal but involved the following: i) sometimes a vessel was displaced or moved during the window surgery, ii) sometimes a vessel, in particular the sagittal sinus, enlarged or increased its apparent diameter over time if it was not properly pressured by the cranial window, and iii) sometimes an area experiencing window pressure that was too low could, over time, show outgrowth of fine vascular endings. The most common of these was (i), and (iii) was perhaps the least common. In general the vasculature was quite stable.

      We have added this brief discussion of potential vasculature changes after cranial window surgery to the main text at ~lines 286-293, pg 9.

      We already mentioned, in the main text of the original eLife reviewed preprint, that we re-imaged the multimodal map (MMM) every 30-60 days or whenever changes in vasculature are observed, in order to maintain a high accuracy of CCF alignment over time. See ~lines 507-511, pg 16.

      We are not very familiar with laser speckle contrast, and it seems like a technique that could conceivably improve the fine-grained accuracy of our MMM-CCF alignment in some instances. We will try this in the future, but for now it seems like our alignments are largely constrained by several large blood vessels present in any given FOV, and so it is unclear how we would incorporate such fine-grained modifications without applying local non-rigid manipulations of our images.

      In lines 588-598, the authors mentioned that the occasional use of online fast-z corrections yielded no difference. However, it seems that the combination of the online fast-z correction yielded "cleaner" raster maps (Figure S3)?

      Authors’ Response: The Rastermaps in Fig S3a and b are qualitatively similar. We do not believe that any systematic difference exists between their clustering or alignments, and we did not observe any such differences in other sessions that either used or didn’t use online fast-z motion correction.

      We now provide raw data and analysis files corresponding to the sessions shown in Fig S3 (and other data-containing figures) on FigShare+ at:

      https://doi.org/10.25452/figshare.plus.c.7052513

      Ideally, the datasets contained in the paper should be available on an open repository for others to examine. I could not find a clear statement about data availability. Please include a linked repo or state why this is not possible.

      Authors’ Response: We have made ~500 GB of raw data and preliminary analysis files publicly available on FigShare+ for the example sessions shown in Figures 2, 3, 4, 5, 6, S3, and S6. We ask to be cited and given due credit for any fair use of this data.

      The data is located here:

      Vickers, Evan; A. McCormick, David (2024). Pan-cortical 2-photon mesoscopic imaging and neurobehavioral alignment in awake, behaving mice. Figshare+. Collection:

      https://doi.org/10.25452/figshare.plus.c.7052513

      We intend to release a complete data set to the public as a Dandiset on the DANDI archive in conjunction with second and third in-depth analysis papers that are currently in preparation.

    2. eLife assessment

      This important paper presents a thoroughly detailed methodology for mesoscale-imaging of extensive areas of the cortex, either from a top or lateral perspective, in behaving mice. The examples of scientific results to be derived with this method offer promising and stimulating insights. Overall, the method and results presented are convincing and will be of interest to neuroscientists focused on cortical processing in rodents and beyond.

    3. Reviewer #1 (Public Review):

      Summary:

      The authors introduce two preparations for observing large-scale cortical activity in mice during behavior. Alongside, they present intriguing preliminary findings utilizing these methods. This paper is poised to be an invaluable resource for researchers engaged in extensive cortical recording in behaving mice.

      Strengths:

      Comprehensive methodological detailing: The paper excels in providing an exceptionally detailed description of the methods used. This meticulous documentation includes a step-by-step workflow, complemented by thorough protocols and a list of materials in the supplementary materials.

      Minimal movement artifacts: A notable strength of this study is the remarkably low level of movement artifacts, with specific strategies outlined to attain this outcome.

      Insightful preliminary data and analysis: The preliminary data unveiled in the study reveal interesting heterogeneity in the relationships between neural activity and detailed behavioral features, particularly notable in the lateral cortex. This aspect of the findings is intriguing and suggests avenues for further exploration.

      Weaknesses:

      Clarification about the extent of the method in title: The title of the paper, using the term "pan-cortical", may inadvertently suggest that both the top and lateral view preparations are utilized in the same set of mice, while the authors employ either the dorsal view (which offers limited access to the lateral ventral regions) or the lateral view (which restricts access to the opposite side of the cortex).

      Despite the authors not identifying qualitative effects, tilting the mouse's head could potentially influence behavioral outcomes in certain paradigms.

    4. Reviewer #2 (Public Review):

      Summary:

      The authors present a comprehensive technical overview of the challenging acquisition of large-scale cortical activity, including surgical procedures and custom 3D-printed headbar designs to obtain neural activity from large parts of the dorsal or lateral neocortex. They then describe technical adjustments for stable head fixation, light shielding, and noise insulation in a 2-photon mesoscope and provide a workflow for multisensory mapping and alignment of the obtained large-scale neural data sets in the Allen CCF framework. Lastly, they show different analytical approaches to relate single-cell activity from various cortical areas to spontaneous activity by using visualization and clustering tools, such as Rastermap, PCA-based cell sorting, and B-SOiD behavioral motif detection.

      The study contains a lot of useful technical information that should be of interest to the field. It tackles a timely problem that an increasing number of labs will be facing as recent technical advances allow the activity measurement of an increasing number of neurons across multiple areas in awake mice. Since the acquisition of cortical data with a large field of view in awake animals poses unique experimental challenges, the provided information could be very helpful to promote standard workflows for data acquisition and analysis and push the field forward.

      Strengths:

      The proposed methodology is technically sound and the authors provide convincing data to suggest that they successfully solved various challenging problems, such as motion artifacts of large imaging preparations or high-frequency noise emissions, during 2-photon imaging. Overall, the authors achieved their goal of demonstrating a comprehensive approach for imaging neural data across many cortical areas and providing several examples that demonstrate the validity of their methods and recapitulate and further extend some recent findings in the field. A particular focus of the results is to emphasize the need for imaging large population activity across cortical areas to identify cross-area information processing during active behaviors.

      Weaknesses:

      The manuscript contains a lot of technical details and might be challenging for readers without previous experimental experience. However, the different paragraphs illuminate a large range of technical aspects and challenges of large-scale functional imaging. Therefore, the work should be a valuable source of solutions for a diverse audience.

    5. Reviewer #3 (Public Review):

      Summary:

      In their manuscript, Vickers and McCormick have demonstrated the potential of leveraging mesoscale two-photon calcium imaging data to unravel complex behavioural motifs in mice. Particularly commendable is their dedication to providing detailed surgical preparations and corresponding design files, a contribution that will greatly benefit the broader neuroscience community. The quality of the data is high and examples are available to the community. More importantly, the authors have acquired activity-clustered neural ensembles at an unprecedented spatial scale to further correlate with high-level behavioural motifs identified by B-SOiD. Such an advancement marks a significant contribution to the field. While the manuscript is comprehensive and the analytical strategy proposed is promising, some technical aspects warrant further clarification. Overall, the authors have presented an invaluable and innovative approach, effectively laying a solid foundation for future research in correlating large-scale neural ensembles with behaviour. The implementation of a custom sound insulator for the scanner is a great idea and should be adopted by others.

      This is a methods paper, but there is no large diagram in the main figures showing how all the parts are connected and how they communicate with and trigger one another. This is described in the methods and now in a supplemental figure; a visual representation as a main figure would greatly benefit readers looking to implement something similar, though they can find the information in the methods. No statistics are given for the results shown in Figure 6e; it would be useful to know which of these neural densities, across all areas, show clear statistical significance across all the behaviors. While I understand that this is a methods paper, the authors seem aware of the literature on large-scale neuronal recordings during mouse behavior. Indeed, in lines 178-179 the authors mention how a significant portion of the variance in neural activity can be attributed to changes in "arousal or self-directed movement even during spontaneous behavior." Why, then, did the authors not attempt a simple linear model that predicts the activity of their many thousands of neurons from the multitude of regressors at their disposal (pupil, saccades, stimuli, movements, facial changes, etc.)? These models are straightforward to implement, and it would benefit this work if the model extracted information on par with what is known from the literature. We also realize such a model could be done in the future.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Summary:

      This manuscript provides some valuable findings concerning the hippocampal circuitry and the potential role of adult-born granule cells in an interesting long-term social memory retrieval. The behavior experiments and strategy employed to understand how adult-born granule cells contribute to long-term social discrimination memory are interesting.

      We thank the reviewer for the positive evaluation.

      I have a few concerns, however, with the strength of the evidence presented for some of the experiments. The data presented and the methods described are incomplete in describing the connection between cell types in CA2 and the projections from abGCs. Likewise, I worry about the interpretation of the data in Figures 1 and 2 given the employed methodology. I think that the interpretation should be broadened. This second concern does not impact the interest and significance of the findings.

      In response to this concern, we have removed the data concerning abGC projections to PCP4+ and PV-GFP+ cell bodies from Figure 1 and have focused this analysis on dendrites. We now provide high magnification images of dendrites and expand on the methodology, results, and interpretations in the manuscript. We also broaden the interpretation throughout the manuscript to address the reviewer’s concern.

      Strengths:

      The behavior experiments are beautifully designed and executed. The experimental strategy is interesting.

      We appreciate these positive comments.

      Weaknesses:

      The interpretation of the results may not be justified given the methods and details provided.

      We have addressed this concern by providing more methodological details and broadening our interpretation of the results.

      Reviewer #2:

      Summary:

      Laham et al. investigate how the projection from adult-born granule cells into CA2 affects the retrieval of social memories at various developmental points. They use chemogenetic manipulations and electrophysiological recordings to test how this projection affects hippocampal network properties during behavior. I find the study to be very interesting, the results are important for our understanding of how social memories of different natures (remote or immediate) are encoded and supported by the hippocampal circuitry. I have some points that I added below that I think could help clarify the conclusions:

      We appreciate the positive assessment and have addressed the more specific points below.

      My major concern with the manuscript was that the transitions between the different experiments for each result section are not very smooth. Maybe the authors can add a summary conclusion sentence at the end of each result section explaining why the next set of experiments is the most logical step.

      In response, we have added summary conclusion sentences at the end of each result section.

      In line 113, the authors say that "the DG is known to influence hippocampal theta-gamma coupling and SWRs". Another recent study, Fernandez-Ruiz et al. 2021, examined how various gamma frequencies in the dentate gyrus modulate hippocampal dynamics.

      We cite this paper in the revised manuscript.

      Having no single cells in the electrophysiological recordings makes it difficult to interpret the ephys part. Perhaps having a discussion on this would help interpret the results. If more SWRs are produced from the CA2 region (perhaps aided by projections from abGC), more CA2 cells that respond to social stimuli (Oliva et al. 2020) would reactivate the memories, therefore making them consolidate faster/stronger. On the other hand, the projections from abGC that the authors see, also target a great deal of PV+ interneurons, which have been shown to pace the SWRs frequency (Stark et al 2014, Gan et al 2017), which further suggests that this projection could be involved in SWRs modulation.

      We discuss these possibilities and cite Gan et al 2017, Schlingloff et al., 2014, and Stark et al., 2014 in the revised manuscript.

      The authors should cite and discuss Shuo et al., 2022 (A hypothalamic novelty signal modulates hippocampal memory).

      We mention Chen et al (A hypothalamic novelty signal modulates hippocampal memory.) in the revised manuscript. “Shuo” is the first name of the first author on this paper, so we believe that this is the same paper to which the reviewer refers.

      I think the authors forgot to refer to Fig 3a-f, maybe around lines 163-168.

      We thank the reviewer for pointing out this error. In the revised manuscript, we refer to all figure panels. Since Fig 3 is now broken into two figures (Fig 3 and 4), the panel lettering has changed in the revised manuscript.

      Are the SWRs counted only during interaction time or throughout the whole behavior session for each condition?

      The SWRs are counted throughout the whole behavior session for each condition. This is now stated in the revised manuscript.

      Figure 3t shows a shift in the preferred gamma phase within theta cycles as a result of abGC projections to CA2 ablation with CNO, especially during Mother CNO condition. I think this result is worth mentioning in the text.

      We now mention this finding in the revised manuscript.

      The legend of Figure 3u mentions "scale bars = 200um"; what does this refer to?

      The scale bar refers to that shown in Figure 3b, which is now indicated in the legend.

      What exactly is calculated as SWR average integral? Is it a cumulative rate? Please clarify.

      The integral measure provides information regarding the average total power of SWR events. It sums z-scored amplitude values from the beginning to the end of each SWR envelope, and then takes the average across all summed envelopes. SWR integral has been shown to influence SWR propagation (De Filippo and Schmitz, 2023). This is now described in the text.
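
      The calculation described in this response can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the representation of events as (start, end) sample indices, and the choice to treat the end index as inclusive are assumptions.

```python
from statistics import fmean


def swr_average_integral(z_envelope, events):
    """Mean over SWR events of the summed z-scored envelope amplitude.

    z_envelope: z-scored ripple-band envelope, one value per sample.
    events: list of (start, end) sample indices for each detected SWR,
    with `end` treated as inclusive (an assumption of this sketch).
    """
    # Sum the envelope within each event, then average across events.
    integrals = [sum(z_envelope[start:end + 1]) for start, end in events]
    return fmean(integrals)
```

      Because each event is summed rather than averaged, both taller and longer ripples increase the measure, so it is not a rate.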

      Alexander et al 2017, "CA2 neuronal activity controls hippocampal oscillations and social behavior", examined some of the CA2 effects in the hippocampal network after CNO silencing, and the authors should cite it.

      Alexander et al., 2018, which we believe is the relevant paper, is now cited in the revised manuscript.

      Strengths:

      Behavioral experiments after abGC projections to CA2 are compelling as they show clearly distinct behavioral readout.

      We thank the reviewer for this positive assessment.

      Weaknesses:

      Electrophysiological experiments are difficult to interpret without additional quantifications (single-cell responses during interactions etc.)

      We have addressed this concern by expanding the interpretation of our results.

      Reviewer #3:

      Laham et al. present a manuscript investigating the function of adult-born granule cells (abGCs) projecting to the CA2 region of the hippocampus during social memory. It should be noted that no function for the general DG to CA2 projection has been proposed yet. The authors use targeted ablation, chemogenetic silencing, and in vivo ephys to demonstrate that the abGCs to CA2 projection is necessary for the retrieval of remote social memories such as the memory of one's mother. They also use in vivo ephys to show that abGCs are necessary for differential CA2 network activity, including theta-gamma coupling and sharp wave-ripples, in response to novel versus familiar social stimuli.

      The question investigated is important since the function of DG to CA2 projection remained elusive a decade after its discovery. Overall, the results are interesting but focused on the social memory of the mother, and their description in the manuscript and figures is too cursory. For example, raw interaction times must be shown before their difference. The assumption that mice exhibit social preference between familiar or novel individuals such as mother and non-mother based on social memory formation, consolidation, and retrieval should be better explained throughout the manuscript. Thus, when describing the results, the authors should comment on changes in preference and how this can be interpreted as a change in social memory retrieval. Several critical experimental details such as the total time of presentation to the mother and non-mother stimulus mice are also lacking in the manuscript. The in vivo e-phys results are interesting as well but even more succinct with no proposed mechanism as to how abGCs could regulate SWR and PAC in CA2.

      In response to these comments, we provide raw interaction times in a new Figure (Fig. S1). We also provide more information about the experiments and figures in the revision. We explain the rationale for our behavioral interpretations and discuss proposed mechanisms for how abGCs regulate SWR and PAC.

      The manuscript is well-written with the appropriate references. The choice of the behavioral test is somewhat debatable, however. It is surprising that the authors chose to use a direct presentation test (presentation of the mother and non-mother in alternation) instead of the classical 3-chamber test which is particularly appropriate to investigate social preference. Since the authors focused exclusively on this preference, the 3-chamber test would have been more adequate in my opinion. It would greatly strengthen the results if the authors could repeat a key experiment from their investigation using such a test. In addition, the authors only impaired the mother's memory. An additional experiment showing that disruption of the abGCs to CA2 circuit impairs social memory retrieval would allow us to generalize the findings to social memories in general. As the manuscript stands, the authors can only conclude the importance of this circuit for the memory of the mother. Developmental memory implies the memory of familiar kin as well.

      We selected the direct social interaction test because it allows for more naturalistic social behaviors than measuring investigation times toward social stimuli located inside wire mesh containers. We also decided to focus our studies on the retrieval of mother memories because these are likely the first social memories to be formed. We emphasize that our results cannot be generalized to memories of other social stimuli but given studies on recent social memory formation and retrieval in adults that manipulate abGCs and CA2 separately, we feel that it is likely that this circuit is involved in these functions as well. However, we specify throughout the manuscript that our experiments can only tell us about mother memories. We have also changed the title to reflect this.

      The in vivo ephys section (Figure 3) is interesting but even more minimalistic and it is unclear how abGCs projection to CA2 can contribute to SWR and theta-gamma PAC. In Figure 1, the authors suggest that abGCs project preferentially to PV+ neurons in CA2. At a minimum, the authors should discuss how the abGCs to PV+ neurons to CA2 pyramidal neurons circuit can facilitate SWR and theta-gamma PAC.

      We have divided Figure 3 into two figures (Figures 3 and 4) and revised the electrophysiology section of the results section. In the revised paper, we now discuss how abGC projections to PV+ interneurons may facilitate SWR and PAC.

      Finally, proposing a function for 4-6-week-old abGCs projecting to CA2 begs two questions: What are abGCs doing once they mature further, and more generally, what is the function of the DG to CA2 projection? It would be interesting for the authors to comment on these questions in the discussion.

      In response to these comments, we discuss possible answers to these interesting questions.

      Recommendations for the authors:

      Reviewer #1:

      Specifically, my concern is with the analysis in Figure 1 of the synapses formed between abGCs and CA2 PNs (as identified by PCP4 expression) and CA2 PV+ cells (as identified by cre-dependent AAV-mCherry expression) in the PV-cre line. In panels c and d, the soma of a CA2 PN and the soma of a PV cell are shown. Why was the soma analyzed? What relevance is there for this? It is my understanding that synapses form on dendrites; this would be much more relevant to show, in my opinion. Also, the methods for panels e and f state that the 3R-Tau+ intensity was analyzed only in stratum lucidum. (There was a normalization for the overall 3R-Tau intensity in SL of CA2 that was obtained by dividing the 3R-Tau intensity of corpus callosum). I don't understand then how a comparison of 3R-Tau intensity could have been done for CA2 PN soma. There are no CA2 PN soma in stratum lucidum. (This is fairly clearly shown in Figure 1aiii, with the PCP4 staining showing the soma in the somatic layer, not in stratum lucidum). What is being analyzed here?

      If the 3R-Tau intensity for dendrites is higher for PV cell dendrites, an example image of dendrites would be very helpful. How was the CA2 PV cell dendrite delimited from the CA2 PN dendrites at 40x magnification for the 3R-Tau intensity? Why were pre-synaptic puncta not examined? Is it possible to determine the post-synaptic target with these methods? This result could be particularly interesting, but I find it very difficult to understand the quantification or the justification behind it. To truly know if a cell is getting a connection, the best method would be to perform whole-cell patch clamp recordings of the post-synpatic target cells and use optogenetics of the abGCs. I understand that perhaps this may be beyond the scope of the paper, but it is a severe limitation for these results.

      We have eliminated the cell body measures from Figure 1 and focus instead on the dendrite measures, which we agree are more relevant. We now provide high magnification example images of pyramidal cell (PCP4+) and PV+ interneuron (GFP+) dendrites in Figure 1. We thank the reviewer for pointing out the error about the stratum lucidum as some of the dendrites analyzed are located in the pyramidal cell layer. In addition, neither PCP4 nor GFP label the full extent of dendrites emanating from CA2 pyramidal cells or PV+ interneurons respectively. We mention this in the revised manuscript because abGC projections to more distal dendrites might show a different pattern than that which was observed for proximal dendrites. We also provide more details about how the dendrites were delimited for the analysis, and mention that these results cannot definitively inform us about whether functional synaptic connections have been formed.

      Cannulation over CA2 is potentially not specific to CA2 terminals. It would be optimal if the authors had some histology demonstrating specific cannula placement, as these surgeries are really tough to get perfectly centered over CA2. Even if it is perfectly centered, how much would the CNO diffuse into CA3? I think that given the methodology, the authors really need to consider that the behavioral results are not only a result of blocking abGC terminals in CA2 alone. Would it really change much if the abGC terminals are also silenced in CA3a/b as well? The McHugh lab has shown that area CA3 is also playing a role in social memory (Chiang, M.-C., Huang, A. J. Y., Wintzer, M. E., Ohshima, T. & McHugh, T. J. A role for CA3 in social recognition memory. Behav Brain Res 354, 2018). It may be that both areas CA2 and CA3 are important for the phenomenon being demonstrated in Figure 2. I think the impact of the study is just as interesting, as this examination of early social memories is very interesting and nicely done. In fact, areas CA2 and CA3 may be acting together (please see Stöber, T. M., Lehr, A. B., Hafting, T., Kumar, A. & Fyhn, M. Selective neuromodulation and mutual inhibition within the CA3-CA2 system can prioritize sequences for replay. Hippocampus 30, 1228-1238, 2020).

      We agree that it is possible that CNO infusions targeted at the CA2 would also influence CA3a/b and have revised the paper to include this possible interpretation. We also cite the suggested paper on CA3 involvement in social memory (Chiang et al., 2018) and the paper on CA2-CA3 interactions (Stöber et al., 2020).

      Figure 3 is packed with information, but it is not communicated in a reasonable way. Much more information and a description of the experimental protocol need to be presented. Furthermore, why are there no example traces for the SWRs recorded? There should be more analysis than just a difference score and frequency. How are panels j, k, and l analyzed and interpreted? Why no example traces there? Also, the n's seem way too small for Figure 3m-r. Are there only two or three animals used for some of these conditions? This is insufficient in my opinion to conclude much for a 5-minute interaction.

      In response to this concern, we have divided Figure 3 into 2 figures – Figure 3 and Figure 4. In Figure 3, we provide example traces for SWRs, with additional SWR data presented in Figures S3 and S4, including data to complement the difference score data in Figure 3. In Figure 4, we include traces of phase amplitude coupling. We also provide more information in the methods about how the phase amplitude coupling data were analyzed. For Figure 4, we used methods described by Tort et al., 2010 to produce a modulation index, which is a measure of the intensity of coupling between theta phase and gamma amplitude. This method additionally allows for visualization of how gamma amplitude is modified across individual theta phase cycles. Regarding the question about n sizes in the 10-12 week abGC group (Fig. 3), the numbers are lower than in the 4-6 week abGC group because by 6 weeks after the first set of recordings, the electrodes in some of the mice were no longer usable. The n sizes for this specific study are 4-5 per group for Nestin-cre mice; 7-8 for Nestin-cre:Gi. This is now clarified in the figure legend.
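
      The modulation index described above can be sketched in a few lines. This is a minimal illustration of the general Tort-style calculation, not the authors' analysis code: the function name, the choice of 18 phase bins, and the assumption that theta phase and gamma amplitude have already been extracted (for example by filtering and a Hilbert transform) are all ours.

```python
import math


def modulation_index(phases, amplitudes, n_bins=18):
    """Phase-amplitude modulation index (hypothetical helper).

    phases     : theta phase per sample, in radians within [-pi, pi]
    amplitudes : gamma amplitude envelope per sample, non-negative
    n_bins     : number of phase bins (18 bins of 20 degrees is an assumption)
    """
    if len(phases) != len(amplitudes):
        raise ValueError("phases and amplitudes must have the same length")

    sums = [0.0] * n_bins
    counts = [0] * n_bins
    width = 2 * math.pi / n_bins
    for phi, amp in zip(phases, amplitudes):
        # Map the phase to a bin index; the clamp handles phi == pi.
        j = min(int((phi + math.pi) / width), n_bins - 1)
        sums[j] += amp
        counts[j] += 1
    if any(c == 0 for c in counts):
        raise ValueError("every phase bin needs at least one sample")

    # Mean gamma amplitude per theta-phase bin, normalized to sum to 1.
    mean_amp = [s / c for s, c in zip(sums, counts)]
    total = sum(mean_amp)
    if total <= 0:
        raise ValueError("amplitudes must not all be zero")
    p = [a / total for a in mean_amp]

    # Divergence of the amplitude distribution from uniform, scaled to [0, 1].
    entropy = -sum(pj * math.log(pj) for pj in p if pj > 0)
    return (math.log(n_bins) - entropy) / math.log(n_bins)


# Synthetic check: flat amplitude gives no coupling, phase-locked amplitude does.
phases = [-math.pi + 2 * math.pi * (k + 0.5) / 360 for k in range(360)]
mi_flat = modulation_index(phases, [1.0] * 360)
mi_coupled = modulation_index(phases, [1.0 + math.cos(phi) for phi in phases])
```

      A value near 0 means gamma amplitude is spread evenly across theta phase, and larger values mean it is concentrated at particular phases. The normalized per-bin amplitudes (`p` in the sketch) are what a phase-amplitude plot across the theta cycle would display.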

      The discussion section of this paper does not put these results into a broader context with the field. There are other studies examining abGCs and their roles in novelty and memory formation (the work from Juan Song's lab, for example). These should be properly mentioned and discussed.

      In response, we have added discussion on the roles of abGCs in nonsocial novelty and memory formation and have cited papers from the Song lab.

      In the figure legend for Figure 2, there is no specific explanation for panel h. Perhaps the label is missing in the legend.

      We thank the reviewer for noting this error and now include a description in the revised manuscript.

      Reviewer #2:

      Adding more quantifications (single cells, isolating data during interactions versus non-interaction times) would help understand the results better. In the absence of this, adding a clearer rationale (even if only through the presentation of hypotheses) at the transitions between the different results sections would make the study easier to read.

      In response to this comment, we have added transition sentences between results sections to clarify the rationale and make the manuscript easier to understand.

      Reviewer #3:

      Line 110: "Hippocampal phase-amplitude coupling (PAC) and generation of sharp wave-ripples (SWRs) have been linked to novel experience, memory consolidation, and retrieval (Colgin, 2015; Fernandez Ruiz et al., 2019; Meier et al., 2020; Joo and Frank, 2018; Vivekananda et al., 2021). The DG is known to influence hippocampal theta-gamma coupling and SWRs (Bott et al., 2016; Meier et al., 2020), yet no studies have examined the influence of abGCs on these oscillatory patterns." This information comes too early in the Results section and is somewhat confusing.

      In response to this comment, we have moved this information and provided a better description.

      Line 118: "we found that mice with normal levels of abGCs can discriminate between their own mother and a novel mother." Be more descriptive of the results (present the raw interaction times with the statistical test to compare them), this is the conclusion.

      In response to this comment, we provide the raw interaction times in a new Figure (Fig. S1) and describe the results in more detail.

      Line 121: "These effects were not due to changes in physical activity". Be more specific. Did you subject the mice to a specific test? If not, how did you calculate locomotion? The data presented in the supplementary figure 1a only states the % locomotion.

      Locomotion was manually scored whenever an animal moved in the testing apparatus. Speed was not recorded. Total locomotion was divided by trial duration to create a % locomotion measure. We have added these details to the methods.

      Line 124: "Coinciding with the recovery of adult neurogenesis, GFAP-TK animals regained the ability to discriminate between their mother and a novel mother". Explain how the difference in interaction time can be interpreted as the ability to discriminate. You could also compute the discrimination index used by several other laboratories (difference of interaction normalized by the total interaction time).

      In response to this comment, we describe how the difference in interaction time can be interpreted as the ability to discriminate between novel and familiar mice.
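
      For reference, the discrimination index the reviewer mentions (difference in interaction time normalized by total interaction time) can be written as follows. This is a sketch only: the function name, the sign convention (positive values indicate more investigation of the novel mouse), and the example times are assumptions, not values from the study.

```python
def discrimination_index(novel_s, familiar_s):
    """Normalized difference in investigation time (hypothetical helper).

    novel_s    : seconds spent investigating the novel mouse
    familiar_s : seconds spent investigating the familiar mouse
    Returns a value in [-1, 1]; 0 means no preference.
    """
    total = novel_s + familiar_s
    if total <= 0:
        raise ValueError("total interaction time must be positive")
    return (novel_s - familiar_s) / total


# Made-up example: 60 s with the novel mouse, 40 s with the familiar one.
example_index = discrimination_index(60.0, 40.0)
```

      Because the index is bounded and independent of total investigation time, it allows animals with different overall sociability to be compared on the same scale, which a raw difference score does not.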

      Line 133: "Targeted CNO infusion in Nestin-Cre:Gi mice enabled the inhibition of GiDREADD+ abGC axon terminals present in CA2." Provide data or references to support this claim. Injection of a dye of comparable size to CNO would help. Otherwise, mention that nearby CA3a could be affected as well.

      We cannot rule out that nearby CA3a was affected by our cannula infusions of CNO into CA2. Furthermore, since dyes likely diffuse at different rates than CNO, we believe that a dye injection would not eliminate this concern completely. Therefore, we have revised the paper to acknowledge the likelihood that the CNO infusion affected parts of CA3 in addition to CA2. We also changed the title to focus more on the CA2 electrophysiological recordings, which we know were obtained only from the CA2.

      Line 150: "When reintroduced to the now familiar adult mouse 6 hours later, after the effects of CNO had largely worn off". Provide data or references supporting this claim.

      In response, we cite articles showing that the behavioral effects of CNO-mediated DREADD activation have returned to baseline by 6 hrs.

      Line 165: "We found that SWR production is increased during social interaction, with more SWRs produced during novel mouse investigation, presumably during encoding social memories, than during familiar mouse investigation, presumably during retrieval of developmental social memories". How does this compare to the results in Oliva et al, Nature 2021?

      The Oliva et al 2021 paper recorded CA2 SWRs during home cage and during post-social stimulus exposure periods of sleep. The timing of the study does not coincide with the measures we made, but we cite the paper.

      Line 168: "Inhibition of abGCs in the presence of a social stimulus". How does silencing abGC impact CA2 pyramidal neurons' firing rate?

      The direct answer to this question is unknown because we did not measure single units, but based on studies done in the CA3, it is likely that firing rate in CA2 would increase.

      Line 203: "abGCs possess a time-sensitive ability to support retrieval of developmental social memories." Can you speculate on the function of the cells later on?

      In the revised paper, we speculate about the function of abGCs after they mature and no longer support retrieval of developmental social memories.

      Line 229: "GFAP-TK mice were group housed by genotype". Why not house them with CD1 littermates?

      We housed these mice according to genotype to avoid having mice with different levels of abGCs (GFAP-TK + VGCV and CD1 + VGCV) living together in social groups. We did this to avoid potential differences that might emerge in social behavior.

      Line 237: "Adult TK, Nestin-cre, and Nestin-cre:Gi offspring underwent a social interaction test in which they directly interacted with the mother". Specify how long was the social interaction time.

      In the revised manuscript, we specify that mice interacted with each social stimulus for 5 minutes.

      Line 240: "After a 1-hour delay spent in the home cage". Were the mice single-housed or with their littermates during this delay?

      In the revised manuscript, we indicate that mice were put back into the home cage with their cagemates during the 1 hr delay period.

      Line 241: "The order of stimulus exposure was counterbalanced in all tests." Can you show some data to confirm that the order of presentation did not impair the interaction? Have you considered using your own version of the classical 3-chamber test in order to assess directly the preference for one or the other female mouse?

      Our data suggest that the order of testing is not responsible for the observed results. Across all experimental groups without an abGC manipulation (i.e., all direct social interaction assays excluding VGCV+ GFAP-TK trials and CNO+ Nestin-cre:Gi trials), ~84.4% of animals demonstrate a social preference for the novel mother over the mother (CD1 + GFAP-TK VGCV- cohort: 28/33; CD1 VGCV+ cohort: 17/17; CD1 and TK recovery cohort: 24/31; Nestin-cre and Nestin-cre:Gi 4-6-week-old abGC cohort: 77/95; 10-12-week-old abGC cohort: 49/55; Total = 195/231 mice with an investigation preference for the novel mother). If stimulus presentation order were to bias social investigation preference toward the first stimulus presented, we would expect the percentage of animals demonstrating a social preference for each stimulus to be around 50%, as roughly half the animals were first exposed to the mother with the other half first exposed to the novel mother. The social novelty preference percentage reported above is comparable to percentages we observe in our lab's novel to familiar social interaction experiments, in which all animals are first exposed to a novel conspecific. We have yet to conduct experiments testing adults using the modified 3-chamber assay described in Laham et al., 2021.
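
      The pooled counts quoted above can be checked with a short script. The cohort counts are taken from the response; the one-sided exact binomial comparison against 50% is our own illustration of the argument, not an analysis the authors report, and it assumes the animals can be pooled across cohorts and treated as independent.

```python
from math import comb

# (animals preferring the novel mother, animals tested) per cohort, as quoted.
cohorts = {
    "CD1 + GFAP-TK VGCV-": (28, 33),
    "CD1 VGCV+": (17, 17),
    "CD1 and TK recovery": (24, 31),
    "Nestin-cre and Nestin-cre:Gi, 4-6-week-old abGCs": (77, 95),
    "10-12-week-old abGCs": (49, 55),
}

novel = sum(k for k, _ in cohorts.values())
total = sum(n for _, n in cohorts.values())
percent_novel = 100 * novel / total


def binomial_upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5), computed with exact integers."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


# Probability of seeing at least this many novel-preferring animals
# if each animal's preference were a fair coin flip.
p_value = binomial_upper_tail(novel, total)

print(f"{novel}/{total} = {percent_novel:.1f}% prefer the novel mother")
```

      The totals reproduce the 195/231 (about 84.4%) figure in the response, and the tail probability under a 50% null is very small, which is consistent with the claim that presentation order alone would not produce this preference.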

      Statistics: The statistical tests used throughout the paper are appropriate but their description is too cursory. Please provide F values and specify the name of the tests used in the figure legends before giving the exact p values.

    2. eLife assessment

      This paper reports a valuable set of new results. The main result is that the projection from adult-born granule cells in the dentate gyrus to the hippocampal subfield CA2 is necessary for the retrieval of a social memory formed during development, and solid evidence is provided to support this conclusion.

    3. Reviewer #2 (Public Review):

      Summary:

      Laham et al. investigate how the projection from adult born granule cells into CA2 affects the retrieval of social memories at various developmental points. They use chemogenetic manipulations and electrophysiological recordings to test how this projection affects hippocampal network properties during behavior. The study is of relevant interest for the neuroscience community and the results are important for our understanding of how social memories of different nature (remote or immediate) are encoded and supported by the hippocampal circuitry. The behavioral experiments after abGC projections to CA2 are compelling as they show clearly distinct behavioral readout. While the electrophysiological experiments are difficult to interpret without more single cell responses quantifications, they clearly show that more than one region in the hippocampus is involved in the formation of social memories.

    4. Reviewer #3 (Public Review):

      Laham et al. present a manuscript investigating the function of adult-born granule cells (abGCs) projecting to the CA2 region of the hippocampus during social memory. It should be noted that no function for the general DG to CA2 projection has been proposed yet. The authors use targeted ablation, chemogenetic silencing and in vivo ephys to demonstrate that the abGCs to CA2 projection is necessary for the retrieval of remote social memories such as the memory of one's mother. They also use in vivo ephys to show that abGCs are necessary for differential CA2 network activity, including theta-gamma coupling and sharp wave-ripples, in response to novel versus familiar social stimuli.

      The question investigated is important since the function of DG to CA2 projection remained elusive a decade after its discovery. Overall, the results are interesting but focused on the social memory of the mother and their description in the manuscript and figures is too cursory. For example, raw interaction times must be shown before their difference. The assumption that mice exhibit social preference between familiar or novel individuals such as mother and non-mother based on social memory formation, consolidation and retrieval should be better explained throughout the manuscript. Thus, when describing the results, the authors should comment on changes in preference and how this can be interpreted as a change in social memory retrieval. Several critical experimental details such as the total time of presentation to the mother and non-mother stimulus mice are also lacking from the manuscript. The in vivo e-phys results are interesting as well but even more succinct with no proposed mechanism as to how abGCs could regulate SWR and PAC in CA2.

      The manuscript is well-written with the appropriate references. The choice of behavioral test is somewhat debatable however. It is surprising the authors chose to use a direct presentation test (presentation of the mother and non-mother in alternation) instead of the classical 3-chamber test which is particularly appropriate to investigate social preference. Since the authors focused exclusively on this preference, the 3-chamber test would have been more adequate in my opinion. It would greatly strengthen the results if the authors could repeat a key experiment from their investigation using such a test. In addition, the authors only impaired the mother's memory. An additional experiment showing that disruption of the abGCs to CA2 circuit impairs social memory retrieval in general would allow the findings to be generalized to other social memories. As the manuscript stands, the authors can only conclude as to the importance of this circuit for the memory of the mother. Developmental memory implies the memory of familiar kin as well.

      The in vivo ephys section (Figure 3) is interesting but even more minimalistic and it is unclear how the abGC projection to CA2 can contribute to SWR and theta-gamma PAC. In Figure 1, the authors suggest that abGCs project preferentially to PV+ neurons in CA2. At a minimum, the authors should discuss how the abGCs to PV+ neurons to CA2 pyramidal neurons circuit can facilitate SWR and theta-gamma PAC.

      Finally, proposing a function for 4-6-week-old abGCs projecting to CA2 begs two questions: What are abGCs doing once they mature further and more generally, what is the function of the DG to CA2 projection? It would be interesting for the authors to comment on these questions in the discussion.

      Revision:

      The authors have followed my recommendations except for the ones suggesting new experiments. As a result, the clarity of the manuscript and the links between evidence and claims have improved but the message is quite reduced. Many important questions remain open such as: What makes mother's memories so special they require the abGC projection to CA2 unlike other types of social memories? Do abGCs truly connect CA2 PV+ interneurons and how does this connection shape sharp-wave ripples in CA2?

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this valuable study the authors propose a new regulatory role for one of the most abundant circRNAs, circHIPK3, mediated by the RNA binding protein IGF2BP2. While the study presents interesting and largely solid evidence, part of the work is incomplete, requiring additional controls to more robustly support the major claims. The work would also benefit from further discussion addressing the apparently contradictory effects of circHIPK3 and STAT3 depletion in cancer progression.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this work the authors propose a new regulatory role for one of the most abundant circRNAs, circHIPK3, by showing that it interacts with an RNA binding protein (IGF2BP2) and, by sequestering it, it regulates the expression of hundreds of genes containing a sequence (11-mer motif) in their untranslated regions (3'-UTR). This sequence is also present in circHIPK3, precisely where IGF2BP2 binds. The study further focuses on one specific case, the STAT3 gene, whose mRNA product is downregulated upon circHIPK3 depletion apparently through sequestering IGF2BP2, which otherwise binds to and stabilizes STAT3 mRNA. The study presents mechanistic insight into the interactions, sequence motifs, and stoichiometries of the molecules involved in this new mode of regulation. Altogether, this new mechanism seems to underlie the effects of circHIPK3 in cancer progression.

      Strengths:

      The authors show mechanistic insight into a proposed novel "sponging" function of circHIPK3 which is not mediated by sequestering miRNAs but rather by a specific RNA binding protein (IGF2BP2). They address the stoichiometry of the molecules involved in the interaction, which is a critical aspect that is frequently overlooked in this type of study. They provide both genome-wide analysis and a specific case (STAT3) that is relevant for cancer progression.

      Weaknesses:

      One of the central conclusions of the manuscript, namely that circHIPK3 sequesters IGF2BP2 and thereby regulates target mRNAs, lacks more direct experimental evidence such as rescue experiments where both species are simultaneously knocked down. CircRNA overexpression lacks a demonstration of circularization efficiencies. There seem to be contradictory effects of circHIPK3 and STAT3 depletion in cancer progression, namely that while circHIPK3 is frequently downregulated in cancer, circHIPK3 downregulation in this study leads to downregulation of STAT3. This does not seem to fit the fact that STAT3 is normally activated in a wide diversity of cancers and is positively associated with cell proliferation. Nor is the result consistent with the fact that circHIPK3 expression positively correlates with good clinical outcomes. Overall, the authors have achieved some of their aims but additional controls would be advisable to fully support their conclusions.

      We thank the reviewer for the important and constructive criticism. All the raised points have now been addressed as described below.

      Rescue experiment:

      We have now performed the suggested rescue experiment, exploring the potential normalization of target expression upon double knockdown (both circHIPK3 and IGF2BP2). Expression of the targets STAT3, NEU and TRAPPC9 was assessed, and all target mRNAs became normalized upon double knockdown, supporting our suggested IGF2BP2 sponging mechanism for circHIPK3. These results have been included in Supplementary Figure 5F.

      Circularization efficiency of ectopically expressed circRNAs:

      For efficient expression of circRNAs in human cells, we have used a state-of-the-art plasmid construct (Laccase2-circRNA; Kramer et al., 2015, Genes Dev. 2015 Oct 15;29(20):2168-82. doi: 10.1101/gad.270421.115), which has proved superior to many alternatives presented in the literature. To ensure proper circularization efficiency of circHIPK3, we have now subjected purified RNA from transfected HEK293 cells (and from HEK293 Flp-In T-Rex cells with stable integration of the cassette) to northern blotting (Supplementary Figure S5H). This demonstrates the production of a single RNase R resistant band of the correct size for both circHIPK3 expression constructs. Due to a relatively weak signal-to-noise ratio (rRNA background), we are unable to calculate an accurate linear-to-circ ratio. Nevertheless, the results suggest efficient production of WT and mutant circHIPK3 using the Laccase2 vector system.

      circHIPK3 and STAT3 expression in cancer:

      It is correct that STAT3 expression is often positively correlated with disease progression in many patients suffering from different cancers, and that the observed expression pattern with downregulation of circHIPK3 and STAT3 in BC cells can be perceived as counterintuitive. We note that the STAT3 profile in our time-course knockdown experiments is somewhat dynamic. While downregulation of STAT3 is most pronounced after 24 hrs of circHIPK3 knockdown, the expression tends to be more normalized after 48 and 72 hrs, which could be due to compensatory mechanisms initiated by the cells. Indeed, comparing long-term development of tumors in patients, with numerous primary and accumulating secondary effects, to transient (0-72 hrs) gene-expression analyses has limitations. In addition, although the oncogenic role of STAT3 has been widely demonstrated, evidence suggests that STAT3 functions are multifaceted and not always trivial to classify. Recent evidence has shown that STAT3 can have opposite functions in cancer and act as both a potent tumor promoter and a tumor suppressor (reviewed in Tolomeo and Cascio, 2021, Int J Mol Sci. 2021 Jan; 22(2): 603. doi: 10.3390/ijms22020603). We have now discussed this in more detail (in the discussion section) and stated some of the limitations of our study in terms of the regulation of the STAT3/p53 axis.

      Reviewer #2 (Public Review):

      The manuscript by Okholm and colleagues identified an interesting new instance of ceRNA involving a circular RNA. The data are clearly presented and support the conclusions. Quantification of the copy number of circRNA and quantification of the protein were performed, and this is important to support the ceRNA mechanism.

      We thank the reviewer for the positive feedback.

      Reviewer #3 (Public Review):

      In Okholm et al., the authors evaluate the functional impact of circHIPK3 in bladder cancer cells. By knocking it down and performing an RNA-seq analysis, the authors found thousands of deregulated genes that look unaffected by miRNAs sponging function and that are, instead, enriched for an 11mer motif. Further investigations showed that the 11-mer motif is shared with the circHIPK3 and able to bind the IGF2BP2 protein. The authors validated the binding of IGF2BP2 and demonstrated that IGF2BP2 KD antagonizes the effect of circHIPK3 KD and leads to the upregulation of genes containing the 11-mer. Among the genes affected by circHIPK3 KD and IGF2BP2 KD (resulting in downregulation and upregulation, respectively) the authors found the STAT3 gene. This was accompanied by consistent concomitant upregulation of one of its targets, TP53. The authors propose a mechanism of competition between circHIPK3 and IGF2BP2 triggered by IGF2BP2 nucleation, potentially via phase separation.

      Strengths:

      The number of circRNAs continues to drastically grow; however, the field lacks detailed molecular investigations. The presented work critically addresses some of the major pitfalls in the field of circRNAs and there has been a careful analysis of aspects frequently poorly investigated. The timepoint KD followed by RNA-seq, investigation of the miRNAs-sponge function of circHIPK3, identification of 11-mer motif, identification, and validation of IGF2BP2, and the analysis of copy number ratio between circHIPK3 and IGF2BP2 in assessing the potential ceRNA mode of action have been extensively explored and, taken together, are convincing.

      Weaknesses:

      In some parts, the manuscript lacks appropriate internal controls (e.g., comparison with normal bladder cells, linear transcript measurements upon the KD, RIP internal controls/WB analysis, etc.), statistical analysis and significance (in some qPCRs), an exhaustive description in the methods of microscopy and image analysis and western blot, and a separate section on the cell lines used. The rationale for using bladder cancer cells versus non-bladder cells in some experiments is also unclear.

      Overall, the presented study adds new knowledge in describing circHIPK3 function, its capability to regulate some downstream genes and its interaction and competition for IGF2BP2. However, whereas the experimental part appears technically logical, the overall goal of this study and its final conclusions remain unclear. The mechanism of condensation proposed, although interesting and encouraging, would need further experimental support and information, especially in the context of cancer.

      In summary, this study is a promising step forward in the comprehension of the functional role of circHIPK3. These data could possibly help to better understand the circHIPK3 role in cancer.

      We thank the reviewer for the important and constructive criticism. All the raised points have now been addressed as described below.

      Internal controls/description of methods:

      We have now included suggested internal controls and provided statistical significance measures where needed. We have also described in more detail the usage of different cell lines for different experiments and a comprehensive description of microscopy, image, and western analyses.

      The condensation mechanism of circHIPK3 and IGF2BP2 that we propose has been toned down slightly in the discussion, as we agree that these observations are not unequivocal and could potentially be explained by alternative and yet undefined events as discussed in further detail.

      Recommendations for the authors:

      Major points

      (1) In Figure 1B the authors show neither error bars nor statistical analysis. Did they sequence each cell line in single replicates? A clarification on this point would be of help.

      All timepoints for J82 and UMUC3 were sequenced in biological triplicates (Figure 1C-G). The data shown in Figure 1B represents prior single RNA-seq runs of all specific cell lines sequenced for selection of appropriate BC cell lines used for further study.

      (2) In Figure 1C the quantification of the cognate linear Hipk3 RNA would be desired in order to rule out changes in this species levels that could account for the observed effects upon circHIPK3 KD.

      We do not observe a non-specific downregulation of the HIPK3 mRNA upon circHIPK3 knockdown, rather we observe a moderate upregulation at later timepoints. However, western blotting shows that this upregulation is not translated into significantly increased protein levels. This data is now available in Supplementary Figure S1A and S1B.

      (3) In Supplementary Figure S1B the authors show the number of differentially expressed genes between time points and baseline upon circHIPK3 KD or scr siRNA transfection. However, in this referee's opinion, the relevant comparison would be the differentially expressed genes between circHIPK3 KD and scr siRNA at different time points. Otherwise, they would be focusing on both circHIPK3-specific and non-specific effects.

      The requested comparison is part of the main figures (Figure 1F). The plotted data in Supplementary Figure 1B (Supplementary Figure S1D in the revised version) was included to allow the reviewer to better assess the variability in the data. We therefore believe it provides relevant information and that it should be kept in the final version.

      (4) Figure 1E. How many hours of KD do these measurements correspond to? Even if they correspond to 72 h, there seems to be a discrepancy between Fig 1E and 1F in terms of the total number of differentially expressed (DE) genes. Why are there more DE genes in 1E?

      The number of differentially expressed genes in Figure 1E represents the total number at all timepoints, while Figure 1F represents single timepoints. We have modified the figure legend to clarify this issue.

      (5) In Figure 3B, in order to verify pulldown efficiency, RT-qPCR should be performed instead of endpoint RT-PCR. Otherwise, no robust claim can be made regarding interaction affinities.

      We agree that these RIP-PCR results in Figure 3B are only semi-quantitative and therefore do not unequivocally assess binding strength. However, since IGF2BP2 is the RNA binding protein in focus throughout the rest of the study, where additional quantitative RIP-RT-qPCR experiments have been performed, we find this issue negligible. In addition, the semi-quantitative nature of the endpoint PCR experiment has now been mentioned in the main text and figure legend.

      (6) The authors claim that IGF2BP2 KD counteracts the effect of circHIPK3 KD on target mRNAs. However, in order to support this claim the authors should perform a rescue experiment where they simultaneously knock down both circHIPK3 and IGF2BP2. Otherwise, the conclusion remains largely supported by a correlation.

      Indeed, such an experiment is important. A rescue experiment with double knockdown has now been performed and demonstrates that levels of the tested targets (STAT3, NEU and TRAPPC9) become normalized under these conditions, supporting our IGF2BP2/circHIPK3 sponging model. The data is available in Supplementary Figure S5F.

      (7) The authors claim that circHIPK3 interacts strongly with IGF2BP2 in bladder cancer cells but not with GRWD1. This is shown in Figure 4A where neither standard errors nor statistical analysis is shown. The authors need to show replicates of this experiment and perform statistics in order to support their claims.

      These experiments have been redone with even higher stringency in biological triplicates and fully support our claims. The data is available in a modified Figure 4A, now including error bars and indications of significance. In addition, we have included western blots demonstrating Input (IN), Flowthrough (FT) and Immunoprecipitation (IP) of correctly sized proteins in Supplementary Figure S4A.

      (8) The authors claim that the STAT3 gene, which contains the 11-mer motif in its 3'UTR, becomes downregulated upon circHIPK3 KD in UMUC3 and J82 cells, while it is upregulated upon IGF2BP2 depletion in both cell lines. It is unclear why they show the effect of circHIPK3 KD on STAT3 within a time course while the effect of IGF2BP2 KD in a fixed time point (Figures 5A/S5A and 5B/S5B respectively), and it would be convenient to clarify this point.

      The initial time course knockdown experiment for circHIPK3 was conducted to provide a comprehensive dataset for circHIPK3-mediated events and clarify any temporal effects. After identification of IGF2BP2 as an interaction partner of circHIPK3, we chose to harvest cells after 48 hrs of knockdown, as knockdown efficiency was prominent at this point. The temporal knockdown efficiency of RNAs (circHIPK3) and proteins (IGF2BP2) differs considerably due to the increased stability of proteins compared to target RNA. This is the main reason why only a single timepoint has been assessed.

      (9) In Figure 5F the authors show that upon overexpression of wildtype or 11-mer motif-mutant circHIPK3, the binding of IGF2BP2 was reduced while the binding of STAT3 mRNA to IGF2BP2 was increased. In order to rule out differences in circularization efficiencies, it would be convenient to show a northern blot comparing the efficiency of circHIPK3 overexpression relative to its linear cognate RNA for both constructs.

      Indeed, circRNA expression constructs may differ considerably in circularization efficiencies. We are using the Laccase2 system developed by the Jeremy Wilusz lab (Kramer et al., 2015), which, at least in our hands, efficiently produces circRNAs from almost any inserted sequence. To address whether the WT and mutant circHIPK3 constructs express similar amounts of circRNA with high efficiency, we performed the suggested northern blot, which displays very similar RNase R resistant circHIPK3 levels. The data is now available in Supplementary Figure S5H. Due to background signal from 18S rRNA in non-RNase R treated samples, we cannot accurately calculate a linear/circular RNA ratio, since no distinct linear RNA species above background is visible on the blot. However, the important point, that mutant and WT (RNase R resistant) circRNA are expressed at similar levels, makes us confident about our conclusion that WT circHIPK3 expression interferes with IGF2BP2 binding to STAT3 mRNA.

      (10) Figure 1G, several genes were selected as up and downregulated for J82 and UMUC3 cell lines. Were these consistently involved in specific biological processes?

      Genes were classified as down or upregulated based on significant (FDR<0.1) fold changes. The most significant genes in both directions were named, regardless of involvement in any specific biological processes. Initially, we performed a GO-term analysis on these genes and received many hits, but we did not observe a very specific pattern or cluster of genes, suggesting that we are looking at both primary and secondary effects of knocking down circHIPK3. We believe our GSEA of the 50 hallmarks of cancer gene sets, presented in Figure 4D, 4E and Supplementary Figure S4E and S4F, addresses this point in a satisfactory manner.
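      The classification rule described in this response can be sketched as follows. This is only an illustrative sketch, assuming DESeq2-style results with a log2 fold change and an adjusted p-value per gene; the function name, the dictionary layout and the example values are hypothetical and are not taken from the authors' pipeline.

```python
def classify_genes(results, fdr_cutoff=0.1):
    """Label each gene 'up', 'down' or 'ns' (not significant).

    `results` maps a gene name to (log2_fold_change, adjusted_p_value).
    An adjusted p-value of None (no value reported) is treated as not significant.
    """
    labels = {}
    for gene, (log2fc, padj) in results.items():
        if padj is None or padj >= fdr_cutoff:
            labels[gene] = "ns"      # fails the FDR threshold
        elif log2fc > 0:
            labels[gene] = "up"      # significant increase upon knockdown
        elif log2fc < 0:
            labels[gene] = "down"    # significant decrease upon knockdown
        else:
            labels[gene] = "ns"      # no change in either direction
    return labels


# Hypothetical values for illustration only
example = {
    "GENE_A": (-1.2, 0.01),
    "GENE_B": (0.8, 0.05),
    "GENE_C": (2.0, 0.50),
    "GENE_D": (1.0, None),
}
labels = classify_genes(example)
```

      In this sketch the direction is decided only by the sign of the fold change; whether an additional effect-size cutoff was applied is not stated in the response.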

      (11) For differential expression analysis, which data sets were used to group outcomes at different time points? Also, there is an increased number of genes affected after KD - please describe in more detail how you reached that gene number.

      As also discussed above (point 3), at each timepoint (Figure 1F) “Scr” was compared to “circHIPK3” knockdown. It makes sense that more and more genes are DE over the course of time as both primary and secondary effects of knockdown will build up over time. We have now clarified which datasets have been used in the figure legend and rewritten the Methods’ section on differential expression analysis.

      (12) What happens with the expression of circHIPK3 if STAT3 is KD? What biological processes are modulated by silencing circHIPK3?

      (13) What happens in bladder cancer cells if STAT3 and circHIPK3 are KD?

      The main goal of our work is to clarify how circRNAs (here circHIPK3) affect gene-expression and cancer pathways. While it would be interesting to explore the consequences of STAT3 knockdown, alone and in combination with circHIPK3 knockdown, such experiments would require comprehensive additional analyses (RNA-seq), which we believe are beyond the scope of this study at this point.

      (14) The rationale of the study and conclusions are unclear. Quote "we extensively evaluate the functional impact of circHIPK3 in bladder cancer cells". As previously published by the authors, as well as mentioned in the manuscript, circHIPK3 is downregulated in cancers and possesses tumor suppressor functions in bladder cancers. Could the authors clarify how the results of the presented study based on the depletion of circHIPK3 fit with the previous discoveries? If the circHIPK3 is generally downregulated compared to normal cells (although higher compared to the linear transcript) why do the authors use a KD approach? Are the bladder cancer cells simply a cell model to study circRNA vs linear? How the condensation model reconciles with circHIPK3 tumor suppressor function based on these results?

      We believe that it remains unclear whether circHIPK3 is a direct tumor suppressor, although this is possible judging from the clinical patient data, since STAT3, which has been shown to become activated in many cancers, is also downregulated upon circHIPK3 knockdown. However, the immediate effects of circHIPK3 knockdown on gene-expression (0-72 hrs) and the long-term development of tumors within patients may be difficult to compare directly. If STAT3 downregulation contributes to cancer phenotypes in bladder cancer as suggested for several other cancer types (Glioblastoma, prostate cancer, lung cancer etc.), circHIPK3 may indeed still be classified as a tumor suppressor in bladder cancer. It is worth noting that circHIPK3 has been shown to be upregulated and have oncogenic phenotypes in many other cancers, which makes direct correlations between cancers complex and difficult to reconcile. We have revised the discussion to reflect these issues in a more comprehensive fashion. Fully delving into STAT3 regulation in terms of bladder cancer development, progression, cell invasiveness, and survival is, we believe, more suitable for future experiments.

      At this point, we have identified a novel mechanism of a circRNA deregulated in cancer being able to sponge/regulate the function of an oncogenic RNA binding protein, even though it is severely outnumbered in cells. Importantly, circHIPK3 likely does not function as a miRNA sponge as proposed in several previous studies based on circRNA overexpression, reporter constructs and miRNA mimics. We therefore believe that these findings provide new important insights into circHIPK3 function and that the current understanding of circRNAs functioning primarily as miRNA sponges likely should be revised.

      (15) Related to the previous point, if the purpose is to study the role of circHIPK3 in bladder cancer, there is a bit of a lack of consistency and it is sometimes confusing to understand the use of certain cell lines for specific experiments. The initial circHIPK3 KD experiments have been conducted in 2 (out of 11 not malignant/ metastatic) bladder cancer cell lines (J82 and UMUC3). Why this specific selection of exclusively metastatic bladder cell lines? For comparison are the normal bladder cell lines characterized by the same circRNA vs linear ratio?

      The selection of bladder cancer cell lines (J82, UMUC3 and FL3) is based on several criteria including expression levels of circHIPK3, cell maintenance characteristics and knockdown/transfection efficiencies. Initially, we included HT1197 cells as well, but batch effects precluded the use of these data.

      Furthermore, the subsequent miRNA analysis was conducted exclusively in one bladder cell line (J82 but not in UMUC3), the initial identification of motif again in bladder cells but the initial RBP identification and experimental interaction is conducted in non-bladder cells HepG2 and k562 (reported as main figure 3B) and only subsequently in bladder cell (4A), again in a different cell line (only FL3, but not in J82 and UMUC3). The validation of the interaction of STAT3 by RIP is performed exclusively in FL3. All this also makes someone wonder how specific this mechanism/binding is in bladder cancer cells. There is an attempt to explain this by comparing cell cycle progression analysis upon circHIPK3 KD and IGF2BP2 KD later on but the final conclusions of this analysis remain unclear. The authors should provide more explanation and information in this part of the manuscript.

      It is correct that the different bladder cancer cell lines (FL3, J82 and UMUC3) have been used more or less interchangeably between experiments. This is due to the observed common phenotypes, e.g. sharing up to 92% DE genes, and highly significant enrichment of the IGF2BP2 11-mer-motif in downregulated mRNAs upon circHIPK3 knockdown in all three cell lines. The ENCODE cell lines HepG2 and K562 were used since the accessible RBP-CLIP data originates from the ENCODE project, where these cells have been used exclusively. Hence, we validated the binding of candidate RBPs (semi-quantitatively) in HepG2 and K562 prior to assessing their RNA binding in the BC cell line FL3. We have used FL3 for RIP and validation of IGF2BP2 binding mainly due to better transfection efficiency and higher expression levels, allowing detection of all interrogated components. The fact that we have included three BC cell lines in many experiments instead of only one, and obtained consistent results, solidifies the conclusion that our phenotypes and regulatory mechanisms are likely common for most, if not all, bladder cancer cell lines. We have included a paragraph in the materials and methods section to further clarify the usage of cell lines in the different experiments.

      (16) STAT3 gene is used as an example. Where is this gene coming from? How has this gene been selected? Is there any complete list of RNA-seq data of up/down-regulated genes upon circHIPK3 KD? The raw data and gene list should be publicly available to the reviewers.

      STAT3 is a major regulator of cancer pathways and therefore an interesting candidate for further analysis as it is differentially expressed between control and circHIPK3 knockdown in all cell lines. We have now included the complete list of DE genes from the time-resolved RNA-seq analyses (DESeq2 output files) in the supplementary material. This data is now available in Supplementary Tables S6 and S7.

      (17) In performing the KD of circHIPK3 the authors use a unique siRNA on a splice junction. The authors claim that this is a way to not affect the linear transcript, however, have the authors also ensured experimentally that this doesn't affect in any way the linear RNA? This should be included as an initial internal control.

      We do not observe a downregulation of the HIPK3 mRNA upon circHIPK3 downregulation, rather we observe a moderate upregulation at later timepoints. When assessing the HIPK3 protein levels, we observe no significant change after 48 hrs of knockdown. This data is now available in Supplementary Figure S1A and S1B.

      (18) Additional controls should be provided for RIP, especially for Fig3B and 4A, Sfig4, 5C such as an internal positive control (es: AGAP2-AS1) of the correct pulldown of IGF2BP2 and/or WB should be shown (in the methods it is told that WB has been used for the analysis of RIP but I couldn't find any)

      Indeed, IGF2BP2 likely binds to many mRNAs in the cell. We have now included b-actin mRNA as a low affinity control in the Figure 4A RIP data, showing that circHIPK3 represents a tight binding substrate for IGF2BP2. We have also included a western blot showing the IP of IGF2BP2, IGF2BP2, GRWD1 and GFP. This data is now available in Supplementary Figure S4A.

      (19) Additional internal experimental controls should be included to assess the successful transfection and overexpression of circHIPK3 with the laccase-2 driven plasmid and mutated versions before the RIP in 4B and in the 5F. Supportive controls to show equal transfection would be required for Figure 6C-D. Further controls to show that the ASO specifically targets the 11-mer in circHIPK3 but not IGF2BP2 target genes should also be included. Please include this information in the supplementary materials.

      We have now included a northern blot showing successful transfection and expression of RNase R resistant circHIPK3 from the Laccase2 vector (WT and mutant) in relation to RIP experiments. This data is now available in Supplementary Figure S5H (see also comments about this above). Equal transfections in cells shown in Figure 6C-D is assessed by comparable levels of GFP expression, which is included as an expression cassette in the modified Laccase2 construct. Pictures were acquired with same exposure time and scaling to ensure that they can be compared directly. The ASO targets circHIPK3 with full complementarity, while STAT3 mRNA has 2 mismatches, leaving the “lesser interaction” with STAT3 theoretical. This has now been clarified in the main text.

      (20) Specifically, in 1C and 4A, Sfig4 there is no statistical analysis made and/or significance? This is only reported for the RIP experiment in Fig 5C.

      Statistical analyses have now been performed and shown in Figure 4A and we have included binding of ACTB as a low affinity control. In Figure 1C, which displays knockdown efficiency (highly efficient) at the various timepoints, no statistical significance has been displayed, since this is normally not done for such knockdown experiments. In addition, it is also not clear which comparisons would be beneficial. Except for the J82 cell line at 12 hrs compared to 0 hrs, knockdown efficiency is high and statistically significant at all timepoints.

      (21) In the assessment of copy number ensuring the same primer efficiency is fundamental, it can't be simply "assumed". Please clarify this point and possibly include this information in the supplementary materials.

      It is correct that identical, or at least very similar, primer efficiencies are necessary to make the conclusion that the relationship between GAPDH mRNA and circHIPK3 levels in the cell reflects the quantitatively measured number of molecules. However, since this single comment is only to support the quantitatively measured circHIPK3 molecules by a ballpark estimate, and since we already assume that there are an estimated 10.000-20.000 copies of GAPDH mRNAs in most cells (which we also do not know precisely), we have chosen to remove this statement.
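      To illustrate why primer efficiency matters for this kind of estimate, the following is a minimal sketch, assuming the standard exponential amplification model in which the starting amount is proportional to E^(-Ct), where E is the per-cycle amplification factor (2.0 for a perfectly efficient primer pair). The function name and the Ct and efficiency values are hypothetical and not taken from the manuscript.

```python
def relative_abundance(ct_target, ct_ref, eff_target=2.0, eff_ref=2.0):
    """Estimate target/reference starting abundance from qPCR Ct values.

    Assumes both amplicons reach the same detection threshold and that
    each amplifies by a constant factor per cycle (eff_target, eff_ref).
    """
    return (eff_ref ** ct_ref) / (eff_target ** ct_target)


# Hypothetical Ct values: target detected two cycles later than the reference
equal_eff = relative_abundance(22, 20)                    # both primer pairs at E = 2.0
lower_eff = relative_abundance(22, 20, eff_target=1.9)    # target primers slightly less efficient
```

      With equal efficiencies the two-cycle difference corresponds to a fourfold lower abundance, whereas a modest drop in the assumed efficiency of the target primers changes the estimate about threefold in this hypothetical case, which is why efficiencies need to be measured rather than assumed.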

      (22) The methodology section is not well organized and looks incomplete. For example, there are two separate sections for circHIPK3 expression conducted in different cell lines, this would be better explained in a single paragraph.

      We have now rewritten this section to make it clearer.

      The section reporting cell lines and growth conditions is incorporated in "circHIPK3 KD and overexpression" while it should be a separate paragraph and valid for all experiments where these cells have been used. There is no information regarding Western blots, including Antibodies used, and densitometry performed.

      This information has now been included.

      In "immunofluorescence microscopy" it is not clear what microscope has been used, how many acquisitions have been made, and how acquisition has been performed. Related to this, how the image analysis has been performed? Figures 5I-J "Finally, immunofluorescence staining showed that nuclear and overall STAT3 protein levels are significantly lower upon circHIPK3 KD, while nuclear p53 protein levels are higher" and 6C and D "we observed a significantly higher prevalence of large cytoplasmic condensates in cells expressing high levels of circHIPK3 compared to controls" how this quantification has been made? The conclusive part about the condensation role remains a bit too loose and mostly speculative, largely due to the lack of robust information provided on microscopy and image analysis

      We have now included a better description of the acquisition and quantification methods.

      Minor

      (1) The Van Nostrand et al 2018 citation should refer to the updated publication in Nature and not to the original preprint in Biorxiv.

      This reference has now been updated.

      (2) In Supplementary Figure S3B, the authors offer no explanation as to why genes that become upregulated upon circHIPK3 knockdown generally contain more circHIPK3-RBP binding sites other than for IGF2BP2. A clarification would be of help.

      We do not have any evidence to explain this observation. One possibility is that other RBPs elicit mRNA stabilizing effects on average, whereas abundant IGF2BP2 (~ 120.000-200.000 copies per cell) is now able to bind more target mRNAs and elicit destabilization. This remains highly speculative though.

      (3) In Supplementary Figure S3D, the authors' claim that the 11-mer motif is found more bound to IGF2BP2 than for other circHIPK3-RBPs should be referred to the corresponding dataset/reference.

      This information is stated in the figure legend (K562) and we have now included it in the main text as well: “We evaluated how often binding sites of circHIPK3-RBPs overlap the 11-mer motif and found that this is more often the case for IGF2BP2 binding sites than binding sites of the other circHIPK3-RBPs when scrutinizing K562 datasets (Supplementary Figure S3D)”.

      (4) In Figure 4C the authors show that, according to previously performed experiments of their group, the 11-mer motif is enriched in upregulated genes compared to downregulated genes upon IGF2BP2 KD in UMUC3. This seems like a confirmation of the results presented in the preceding section (Figure 3H) and it would be clearer if it were presented in the same section.

      The data in Figure 3H is based on ENCODE data from IGF2BP2 knockdowns in K562 cells, while in Figure 4C these are from IGF2BP2 knockdown followed by sequencing in UMUC3 cells. We believe the timing of the data is fitting as is, since they relate to non-BC cells and BC cells, respectively.

      (5) More in vitro experiments are needed to investigate the implication of circHIPK3 in bladder cancer cell phenotype, and how different cancer hallmarks are modulated by this ceRNA network.

      We agree that this study does not fully clarify how these complex molecular interactions relate to bladder cancer progression, including fluctuations of key cancer genes/proteins. Since our focus has been on the mechanisms of circRNA function in relation to bladder cancer, these issues will await further future experimentation.

      (6) "apparent" competition (introduction - pag4)? Maybe rephrase more appropriately.

      This has been rephrased and “apparent” excluded.

      (7) Fig1C. Relative quantification. Statistical analysis? Is this significant?

      See also comment to point 20 above. In Figure 1C we show the knockdown efficiency at the different timepoints. At all timepoints knockdowns are highly significant compared to the control (Scr), which is not significantly changed over time. It seems somewhat redundant to include p-values for such data. Also, which comparisons should be highlighted? Knockdown is highly efficient, which is what we want to show.

      (8) Figure 5H. Western blot. Densitometry quantification performed, how?

      This is now described in the Materials and Methods section.

      (9) Please specify the concentration of circHIPK3-specific siRNA used.

      20 nM. The information is included in the Materials and Methods section.

      (10) The control sample refers to scrambled or untreated cells? Instead of using "control samples without siRNA transfection" or "No siRNA" use untreated cells - otherwise, it is a bit confusing.

      This has now been modified.

      (11) Figure 3 is starting with hepatocellular and leukemia cells; why not with bladder cells?

      These experiments were performed based on CLIP-data and RBP knockdown data from the ENCODE project. The cells used are limited to HepG2 and K562.

      (12) For Figure 4B, which is the time-point?

      This is 24 hrs, which has now been stated.

      (13) Figure 5I and J, the expression of STAT3 and circHIPK3 can be also investigated for cellular distribution.

      The expression of STAT3 is investigated in Figure 5I. Localization of circRNA by standard RNA-FISH protocols using multiple (>20) probes is inherently difficult due to the cross reaction of probes with the linear mRNA. Certain amplification steps can be included if using a single backsplicing junction probe, but this often gives rise to highly ambiguous results as specificity is very limited due to the “one probe” nature of the design.

      (14) Some discussion of the limitations of the study would be of value.

      We have included this in the discussion.

    2. eLife assessment

      This study explores the role of one of the most abundant circRNAs, circHIPK3, in bladder cancer cells, providing convincing data that circHIPK3 depletion affects thousands of genes and that those downregulated (including STAT3) share an 11-mer motif with circHIPK3, corresponding to a binding site for IGF2BP2. The experiments demonstrate that circHIPK3 can compete with the downregulated mRNA targets for IGF2BP2 binding and that IGF2BP2 depletion antagonizes the effect of circHIPK3 depletion by upregulating the genes containing the 11-mer motif. These valuable findings contribute to the growing recognition of the complexity of cancer signaling regulation and highlight the intricate interplay between circRNAs and protein-coding genes in tumorigenesis.

    3. Reviewer #1 (Public Review):

      Short Assessment

      In this work the authors propose a new regulatory role for one of the most abundant circRNAs, circHIPK3. They demonstrate that circHIPK3 interacts with an RNA binding protein (IGF2BP2), sequestering it away from its target mRNAs. This interaction is shown to regulate the expression of hundreds of genes that share a specific sequence motif (11-mer motif) in their untranslated regions (3'-UTR), identical to one present in circHIPK3 where IGF2BP2 binds. The study further focuses on the specific case of the STAT3 gene, whose mRNA product is found to be downregulated upon circHIPK3 depletion. This suggests that circHIPK3 sequesters IGF2BP2, preventing it from binding to and destabilizing STAT3 mRNA. The study presents evidence supporting this mechanism and discusses its potential role in tumor cell progression. These findings contribute to the growing complexity of understanding cancer regulation and highlight the intricate interplay between circRNAs and protein-coding genes in tumorigenesis.

      Strengths:

      The authors show mechanistic insight into a proposed novel "sponging" function of circHIPK3 which is not mediated by sequestering miRNAs but rather a specific RNA binding protein (IGF2BP2). They address the stoichiometry of the molecules involved in the interaction, which is a critical aspect that is frequently overlooked in this type of study. They provide both genome-wide analysis and a specific case (STAT3) which is relevant for cancer progression. Overall, the authors have significantly improved their manuscript in their revised version.

      Weaknesses:

      While the authors have performed northern blots to measure circRNA levels, an estimation of the circRNA overexpression efficiency, namely the circular-to-linear expression ratio, would be desired. The seemingly contradictory effects of circHIPK3 and STAT3 depletion in cancer progression are now addressed by the authors in their revised manuscript, incorporating potential reasons that might explain such complexity.

      Major points about revised manuscript

      (1) In Supplementary Figure S5H, the membrane may have been trimmed too closely to the circRNA band, potentially resulting in the absence of the linear RNA band. Could the authors provide a full image of the membrane that includes the loading points? Having access to the complete image would allow for a more comprehensive evaluation of the results, including the presence or absence of expected linear and circular RNA bands.

    4. Reviewer #2 (Public Review):

      Summary:

      The authors have diligently addressed most of the points raised during the review process (except the important point of "additional in vitro experiments [...] needed to investigate the implication of circHIPK3 in bladder cancer cell phenotype" for which no additional experiments were performed), resulting in an improvement in the study. The data are now described with clarity and conciseness, enhancing the overall quality of the manuscript.

      Strengths:

      New, well-defined molecular mechanism of circRNAs involvement in bladder cancer.

      Weaknesses:

      Lack of solid translational significance data.

    5. Reviewer #3 (Public Review):

      In Okholm et al., the authors evaluate the functional impact of circHIPK3 in bladder cancer cells. By knocking down circHIPK3 and performing an RNA-seq analysis, the authors found thousands of deregulated genes which look unaffected by miRNA sponging function and that are, instead, enriched for an 11-mer motif. Further investigations showed that the 11-mer motif is shared with circHIPK3 and able to bind the IGF2BP2 protein. The authors validated the binding of IGF2BP2 and demonstrated that IGF2BP2 KD antagonizes the effect of circHIPK3 KD and leads to the upregulation of genes containing the 11-mer. Among the genes affected by circHIPK3 KD and IGF2BP2 KD, resulting in downregulation and upregulation respectively, the authors found the STAT3 gene, which also consistently has concomitant upregulation of one of its targets TP53. The authors propose a mechanism of competition between circHIPK3 and IGF2BP2 triggered by IGF2BP2 nucleation, potentially via phase separation.

      Strengths:

      Although the number of circRNAs continues to grow, this field lacks many instances of detailed molecular investigations. The presented work critically addresses some of the major pitfalls in the field of circRNAs, and there has been a careful analysis of aspects frequently poorly investigated. Experiments involving use of time-point knockdown followed by RNA-seq, investigation of miRNA-sponge function of circHIPK3, identification of 11-mer motif, identification and validation of IGF2BP2, and the analysis of copy number ratio between circHIPK3 and IGF2BP2 in assessing the potential ceRNA mode of action are thorough and convincing.

      Weaknesses:

      It is unclear why the authors used certain bladder cancer cells versus non-bladder cells in some experiments. The efficacy of certain experiments (specifically rescue experiments) and some control conditions is still questionable. Overall, the presented study adds some further knowledge in describing circHIPK3 function, its capability to regulate some downstream genes, and its interaction and competition for IGF2BP2.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors were attempting to determine the extent to which CIH altered swallowing motor function; specifically, the timing and probability of the activation of the laryngeal and submental motor pools. The paper describes a variety of different motor patterns elicited by optogenetic activation of individual neuronal phenotypes within PiCo in a group of mice exposed to CIH. They show that there are a variety of motor patterns that emerge in CIH mice; this is apparently different than the more consistent motor patterns elicited by PiCo activation in normoxic mice (previously published).

      Strengths:

      The preparation is technically challenging and gives valuable information related to the role of PiCo in the pattern of motor activation involved in swallowing and its timing with phrenic activity. Genetic manipulations allow for the independent activation of the individual neuronal phenotypes of PiCo (glutamatergic, cholinergic) which is a strength.

      We thank the reviewers for acknowledging and summarizing the strengths of this study.

      Weaknesses:

      (1) The data presented are largely descriptive in terms of the effect of PiCo activation on the probability of swallowing and the pattern of motor activation changes following CIH. Comparisons made between experimental data acquired currently and those obtained in a previous cohort of animals (possibly years before) are extremely problematic, with the potential confounding influence of changing environments, genetics, and litter effects. The statistical analyses (i.e. comparing CIH with normoxic) appear insufficiently robust. Exactly how the data were compared is not described.

      Yes, we agree the data are descriptive in terms of characterizing the effect of CIH on PiCo activation. However, we would like to emphasize that the data are also mechanistic because they characterize the effects of specifically, optogenetically manipulating PiCo neurons after being exposed to CIH.

      Thank you for this comment and for pointing out our misleading description in the paper. This manuscript is meant to independently characterize the effects of CIH on the response to PiCo stimulation. We are not making direct comparisons with the previously published manuscript, in which mice were exposed to room air. No statistical analysis has been made between the previously published control data and the current CIH data, since we are not making a direct comparison, only an observational comparison.

      To make this clearer, and to address the reviewer's concern, we have removed the room air data from figures 1E, 2C and 3A. However, we believe it is important to keep the data from mice exposed to room air in Figure 2B since we did not include this information in the previously published manuscript. It is important to point out that all mice exposed to CIH have some form of submental activity during laryngeal activation in response to PiCo stimulation. This is not the case when mice are exposed to room air only. In this figure, only descriptive analyses are presented. We adjusted our wording throughout the text, particularly in the discussion, to eliminate any confusion that we are making direct comparisons between the two studies. The following sentence has been added to the discussion “While we do not intend to make direct quantitative comparisons between the previously published PiCo-triggered swallows in control mice exposed to room air (Huff et al 2023) and the data presented here for mice exposed to CIH, we believe it is important to compare the conclusions made in these two studies.” This was the motivation for using the eLife Advance format, since the present study demonstrates that PiCo affects swallow patterning, which was not observed in the control data.

      (2) There is limited mechanistic insight into how PiCo manipulation alters the pattern and probability of motor activation. For example, does CIH alter PiCo directly, or some other component of the circuit (NTS)? Techniques that silence or activate projections to/from PiCo should be interrogated. This is required to further delineate and define the swallowing circuit, which remains enigmatic.

      We agree with the reviewer that our study raises many more questions than we are able to answer at the moment. This, however, applies to most scientific studies. Even though swallowing has been studied for many decades, the underlying circuitry remains largely enigmatic. We will continue to investigate the role of PiCo and its interaction with the NTS in healthy and diseased states. These investigations require many different techniques and approaches, some of which are still in development. For example, we are currently conducting experiments that silence portions of the NTS related to swallow and PiCo: ChAT/Vglut2 neurons using novel unpublished viral approaches. However, these are separate, ongoing studies beyond the scope of the current one.

      To address the reviewer’s comment, we have added the following to the limitations section: “In addition, this preparation does not allow for recording of PiCo neurons to evaluate the direct effects of CIH in PiCo neuronal activity”. The following has also been added to the discussion: “Rather, our data reveal CIH disrupts the swallow motor sequence which is likely due to changes in the interaction between PiCo and the SPG, presumably located in the cNTS. While it has previously been demonstrated that PiCo is an important region in swallow-breathing coordination (Huff et al., 2023), previous studies did not demonstrate that PiCo is involved in swallow motor patterning itself. Here we show for the first time that CIH leads to disturbances in the generation of the swallow motor pattern that is activated by stimulating PiCo. This suggests that PiCo is not only important for coordinating swallow and breathing, but also modulating swallow motor patterning. Further studies are necessary to directly evaluate the presumed interactions between PiCo and the cNTS.”

      (3) The functional significance of the altered (non-classic) patterns is unclear.

      As in our original study, the preparation used to stimulate PiCo does not allow us to simultaneously characterize the functional significance of the swallows. Therefore, we have included this as a limitation in the limitations section: “In this preparation we are unable to directly determine the functionality of the variable swallow motor pattern seen after CIH. Different experimental techniques, such as videofluoroscopy would need to be used to directly evaluate functional significance. This technique is beyond the scope of this study and not possible to perform in this preparation. We acknowledge this limits our ability to make direct comparisons between dysphagic swallows in OSA patients.”

      Reviewer #1 (Recommendations For The Authors):

      (1) A more rigorous experimental approach is required. Littermates should be separated and exposed to either room air or CIH at the same (or close to the same) time.

      As stated above, we did not directly compare mice exposed to room air with mice exposed to CIH. Hence, we believe this is not necessary, and it would have meant repeating all the experiments already published in the original eLife paper.

      (2) Robust statistical analyses are required to determine whether the effects of CIH on the pattern/probability of motor activation are required.

      Since the control and CIH groups were not compared in this study, statistical hypothesis testing is not appropriate or applicable.

      (3) Use a combination of retrograde, Cre- AAVs and Cre-dependent approaches to interrogate the circuitry to/from PiCO that forms the swallowing network. This is what is needed to push this area forward, in my view.

      Thank you for this suggestion; we will consider it as we plan future experiments. Indeed, we are in the process of developing novel approaches. However, in this context we would like to emphasize that further network investigations are far more complicated because we need to use a Flpo/Cre approach to specifically characterize the glutamatergic-cholinergic PiCo neurons. Most other laboratories that have studied PiCo have avoided this experimental complication and used only a “cre-dependent” approach. This approach is much simpler, but the data are much less specific and the conclusions sometimes misleading. For example, stimulating cholinergic neurons in the PiCo area will also activate Nucleus ambiguus neurons, and stimulating glutamatergic neurons will also activate glutamatergic neurons that are not necessarily the glutamatergic/cholinergic neurons that we use to define PiCo specifically. Readers who are unfamiliar with these different approaches often miss this important difference. Hence, stimulating the cholinergic-glutamatergic neurons in PiCo is much more specific than stimulating other areas, e.g. preBötzinger complex neurons. There are no markers that allow specific stimulation of only preBötzinger complex neurons or neurons in the parafacial Nucleus. Unfortunately, this difference is often overlooked.

      (4) It should be made more clear how each of the "non-classic" swallowing patterns could cause dysfunction - especially to the reader who is not completely familiar with the neural control of swallowing.

      We agree that it would be helpful to understand the functional implications of these alterations in swallow-related motor activation. However, since our approach does not allow us to use any tools to measure or evaluate functional activity, it would be inappropriate to make suggestions of this type without any data to back up our conclusions. This is why we have not speculated on the functional implications. We have added the following to the discussion section of this manuscript: “While fine wire EMG studies are an excellent evaluation tool to observe temporal motor pattern of sequential swallow related muscles; it must be combined with tools such as videofluoroscopic swallow study (VFSS) and/or high resolution manometry (HRM) in order to characterize the functional significance of these alterations to the swallow motor pattern shown in this study (Park et al., 2017). Since the preparation in this study utilizes only fine wire EMGs we are not able to evaluate or comment on the functional significance of the variable swallow motor patterns.”

      Minor:

      The Results should be written in a way that better conveys the neurophysiological effects of the manipulations. As it stands, it reads like a statistical report on how activation of each neuronal phenotype is statistically different from each other. As such it is difficult to read and understand the salient findings.

      Thank you for this insight. We have adjusted the language in the results section.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigated the role of a medullary region, named Postinspiratory Complex (PiCo), in the mediation of swallow/laryngeal behaviours, their coordination with breathing, and the possible impact on the reflex exerted by chronic intermittent hypoxia (CIH). This region is characterized by the presence of glutamatergic/cholinergic interneurons. Thus, experiments have been performed in single allelic and intersectional allelic recombinase transgenic mice to specifically excite cholinergic/glutamatergic neurons using optogenetic techniques, while recording from relevant muscles involved in swallowing and laryngeal activation. The data indicate that in anaesthetized transgenic mice exposed to CIH, the optogenetic activation of PiCo neurons triggers swallow activity characterized by variable motor patterns. In addition, these animals show an increased probability of triggering a swallow when stimulation is applied during the first part of the respiratory cycle. They conclude that the PiCo region may be involved in the occurrence of swallow and other laryngeal behaviours. These data interestingly improve the ongoing discussion on neural pathways involved in swallow-breathing coordination, with specific attention to factors leading to disruption that may contribute to dysphagia under some pathological conditions.

      The Authors' conclusions are partially justified by their data. However, it should be acknowledged that the impact of the study is to a certain extent limited by the lack of knowledge on the source of excitatory inputs to PiCo during swallowing under physiological conditions, i.e. during water-evoked swallowing. Also the connectivity between this region and the swallowing CPG, a structure not well defined, or other brain regions involved in the reflex is not known.

      We thank the reviewer for the comments and for noting the strengths of the paper. However, with regard to the “lack of knowledge”, we would like to emphasize that PiCo was first described in 2016, while e.g. the preBötzinger complex was described in 1991. Thus, it is not fair to assume the same level of anatomical and physiological understanding for PiCo as we have become accustomed to for the preBötzinger complex. We are fairly confident that 25 years from now, our knowledge of the inputs and outputs of PiCo will be much less limited than it currently is.

      Strengths:

      Major strengths of the manuscript:

      • The methodological approach is refined and well-suited for the experimental question. The in vivo mouse preparation developed for this study takes advantage of selective optogenetic stimulation of specific cell types with the simultaneous EMG recordings from upper airway muscles involved in respiration and swallowing to assess their motor patterns. The animal model and the chronic intermittent hypoxia protocol have already been published in previous papers (Huff et al. 2022, 2023).

      • The choice of the topic. Swallow disruption may contribute to the dysphagia under some pathological conditions, such as obstructive sleep apnea. Investigations aimed at exploring and clarifying neural structures involved in this behaviour as well as the connectivity underpinning muscle coordination are needed.

      • This study fits in with previous works. This work is a logical extension of previous studies from this group on swallowing-breathing coordination with further advances using a mouse model for obstructive sleep apnea.

      We thank the reviewers for acknowledging and summarizing the strengths of this study.

      Weaknesses:

      Major weaknesses of the manuscript:

      • The Authors should be more cautious in concluding that the PiCo is critical for the generation of swallowing itself. It remains to demonstrate that PiCo is necessary for swallowing and laryngeal function in a more physiological situation, i.e. swallow of a bolus of water or food. It should be interesting to investigate the effects of silencing PiCo cholinergic/glutamatergic neurons on normal swallowing. In this perspective, the title should be slightly modified to avoid "swallow pattern generation" (e.g. Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production).

      Thank you for pointing out that this manuscript suggests PiCo is necessary for swallow generation. We agree that further interventions to specifically silence PiCo ChAT/Vglut2 neurons will be necessary to investigate this claim, and we have begun to evaluate this for a future study by developing a novel, as yet unpublished approach. We have altered the language throughout the text to limit the perception that PiCo is the swallow pattern generator. We have also changed the title to: Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production.

      • The duration of swallows evoked by optogenetic stimulation of PiCo is considerably shorter in comparison with the duration of swallows evoked by a physiological stimulus (water). This makes it hard to compare the timing and the pattern of motor response in CIH-exposed mice. In Figure 1, the trace time scale should be the same for water-triggered and PiCo-triggered swallows. In addition, it is not clear if exposure to CIH alters the ongoing respiratory activity. Is the respiratory rhythm altered by hypoxia? If a disturbed or irregular pattern of breathing is already present in CIH-exposed mice, could this alteration interfere with the swallowing behaviour?

      Thank you. We have changed the time scale so that all representative traces are on the same time scale.

      We explained in the original paper (Huff et al 2023) that the significant decrease in PiCo-evoked swallow duration compared to water-evoked swallow duration is likely due to the absence of oral/upper airway feedback. We are not comparing the effects of CIH on swallow motor pattern between water-evoked and PiCo-evoked swallows. Rather, we are only characterizing the effects of CIH on the swallow motor pattern in PiCo-evoked swallows. The purpose of Figure 1A is to show that the rostrocaudal submental-laryngeal sequence in water-evoked swallows is preserved in “canonical” PiCo-evoked swallows, as shown in the original study. While we did not measure the effects of CIH on breathing and the respiratory pattern in this study, it has been established by others that CIH causes respiratory muscle weakness, impaired motor control of the upper airway, and variable respiratory rhythm and rhythm generation. However, when characterizing the timing of swallow in relation to inspiration (Figure 1 - Figure Supplement 1) and the reset of the respiratory rhythm (Figure 3 - Figure Supplement 1), and observationally comparing these results with mice exposed to room air (Huff et al 2023), we do not observe any obvious differences in swallow-breathing coordination. A separate study in wild-type mice focusing on a characterization of water-evoked swallowing after CIH would be better suited to understanding the physiological changes of swallowing after CIH. We would like to point out that Huff et al 2022 showed that altering respiratory rate/pattern via activation of various preBötzinger Complex neurons does not change swallow behavior, except in the case of Dbx1 PreBötC neuron activation, which was independent of CIH. Increasing or decreasing respiratory rate via activation of PreBötC Vgat and SST neurons did not change the swallow pattern; rather, it changed the timing of when swallows occurred. It has been reported before by others that swallow has hierarchical control over breathing and has the ability to shut breathing down. We believe that the swallowing behavior is independent of respiratory pattern, and that alterations in breathing pattern do not necessarily affect the swallow motor pattern but could affect swallow timing.

      Reviewer #2 (Recommendations For The Authors):

      Abstract

      Lines 37-41 "Here we show that optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH does not alter swallow-breathing coordination, but unexpectedly the generation of swallow motor pattern was significantly disturbed."

      It should be better:

      "Here we show that optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH does not alter swallow-breathing coordination, but unexpectedly triggers variable swallow motor patterns".

      Thank you, this has been changed

      Lines 41-43 "This suggests, glutamatergic-cholinergic neurons in PiCo are not only critical for the gating of postinspiratory and swallow activity but also play important roles in the generation of swallow motor pattern." I suggest removing any language claiming PiCo is swallow gating and change "generation" in "modulation"

      "This suggests that glutamatergic-cholinergic neurons in PiCo are not only critical in regulating swallow-breathing coordination but also play important roles in the modulation of swallow motor pattern."

      Thank you, this has been changed

      Introduction:

      Line 88-90: Actually, in Huff et al. 2023 it is said "PiCo acts as an interface between the swallow pattern generator and the preBötzinger complex to coordinate swallow and breathing". Please, change accordingly. Please, remove Toor et al., 2019 since their conclusions are quite different.

      Line 100-101: Please, change the sentence according to the comments reported above.

      Thank you, this has been changed

      Results:

      Lines 104-105: Did you mean: "We confirmed that optogenetic stimulation of PiCo neurons in ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH triggers swallow and laryngeal activation similar to the control mice exposed to room air (Huff et al., 2023)." Otherwise, the sentence is not clear.

      Thank you, this has been changed

      Lines 129-130: This finding is not surprising since similar results have been reported in Huff et al. 2023.

      Thank you, we wanted to confirm that CIH did not alter this characteristic, which it did not. We believe that it is important to include this as it is a criterion for characterizing laryngeal activation.

      Lines 219: The number of water swallows is considerably lower than stimulation-evoked swallows. Why?

      We inject water into the mouth three times. Typically, there is one swallow in response to each water injection. PiCo is stimulated 25 times at each duration. If we were to stimulate swallow with water as many times as with optogenetic stimulation, there would be an adaptive response to the water stimulation and the mouse would not respond. This does not seem to be the case with PiCo stimulation. The simple answer is that there are many more PiCo stimulations than water stimulations.

      Lines 228-232: "PiCo-triggered swallows are characterized by a significant decrease in duration compared to swallows evoked by water in ChATcre:Ai32 mice (265 ± 132ms vs 144 ± 101ms; paired t-test: p= 0.0001, t= 5.21, df= 8), Vglut2cre:Ai32 mice (308 ± 184ms vs 125 ± 44ms; paired t-test: p= 0.0003, t= 6.46, df= 7), and ChATcre:Vglut2FlpO:ChR2 mice (230 ± 67ms vs 130 ± 35ms; paired t-test: p= 0.0005, t= 5.62, df= 8) exposed to CIH (Table S1).".

      Thank you, this has been changed

      Line 252 and 254: remove SEM.

      Thank you, this has been changed

      Discussion

      Line 267: ...(Figure 1Bi), while 28% of PiCo-triggered swallows...

      Thank you, this has been changed

      Lines 283-290: "Thus, CIH does not alter PiCo's ability to coordinate the timing for swallowing and breathing. Rather, our data reveals that CIH disrupts the swallow motor sequence likely due to changes in the interaction between PiCo and the SPG, presumably the cNTS.

      While it has previously been demonstrated that PiCo is an important region in swallow-breathing coordination (Huff et al., 2023), previous studies did not demonstrate that PiCo is involved in swallow pattern generation itself. Thus, here we show for the first time that CIH resulted in the instability of the swallow motor pattern activated by stimulating PiCo, suggesting PiCo plays a role in its modulation.".

      Thank you, this has been changed

      Could the observed effects be due to a non-specific effect of hypoxia on neuronal excitability? In addition, it should be considered that PiCo-triggered swallows lack the behavioural setting of water-evoked swallows and do not activate the sensory component of the SPG to the same extent as the water-evoked swallows.

      Yes, this is very possible. We stated in our first manuscript that the decrease in PiCo-triggered swallow duration, as compared to water-triggered swallow duration, is likely because oral sensory components are not being activated to the same extent (Huff et al. 2023). Since we do not directly measure neuronal excitability, it is not known (in this study) whether CIH causes changes in the excitability of swallow-related areas. However, others have shown increased excitability and activity of Vglut2 neurons after CIH exposure (Kline et al 2007, 2010), and we have shown, for example, changes in the excitability of preBötC neurons (Garcia et al. 2016, 2017).

      Lines 293-300: The sentence is not clear. Is there any evidence indicating that glutamatergic neurons are differently affected by hypoxia than cholinergic neurons?

      Thank you, these sentences have been changed to increase clarity. The section now reads: “There was no statistical difference in the probability of triggering a swallow during optogenetic stimulation of ChATcre:Ai32, Vglut2cre:Ai32 and ChATcre:Vglut2FlpO:ChR2 neurons in mice exposed to room air (Huff et al 2023). However, when exposed to CIH, ChATcre:Ai32 and Vglut2:Ai32 mice have a lower probability of triggering a swallow -- in some mice swallow was never triggered via PiCo activation, while water-triggered swallows remained – compared to the ChATcre:Vglut2FlpO:ChR2 mice. While it is possible that portions of the presumed SPG remain less affected by CIH, which could offset these instabilities to produce functional swallows, our data suggest that PiCo targets microcircuits within the SPG that are highly affected by CIH. The NTS is a primary first site for upper airway and swallow-related sensory termination in the brainstem (Jean, 1984). CIH induces changes to the cardio-respiratory Vglut2 neurons, resulting in an increase in cNTS neuronal activity (Kline, 2010; Kline et al., 2007), as well as changes to preBötzinger neurons (Garcia et al., 2017; Garcia et al., 2016) and ChAT neurons in the basal forebrain (Tang et al., 2020). It is reasonable to suggest that CIH has differential effects on neurons that only express ChATcre and Vglut2cre versus the PiCo-specific interneurons that co-express ChATcre and Vglut2FlpO, emphasizing the importance of targeting and manipulating these PiCo-specific interneurons.”

      Lines 372-374: "Here we show that PiCo, a neuronal network which is critical for the generation of postinspiratory activity (Andersen et al. 2016) and implicated in the coordination of swallowing and breathing (Huff et al., 2023), is severely affected by CIH.".

      Thank you, this has been changed.

      Methods

      Line 398: Did you mean Slc17a6-IRES2-FlpO-D?

      Thank you, this has been changed.

      Line 399: were.

      Thank you, this has been changed.

      Line 403: ... expressing both ChAT and Vglut2 and will be reported as ChATcre:Vglut2FlpO.

      Thank you, this has been changed.

      Line 437: Mice of the ChATcre:Ai32, Vglut2cre:Ai32 and ChATcre:Vglut2FlpO:ChR2 lines were kept in collective cages with food and water ad libitum placed inside custom-built chambers.

      Thank you, this has been changed.

      Line 479: (Figure 6a in Huff et al., 2022).

      Line 497: What does Fig 7 refer to?

      This should say Figure 1 - Figure Supplement 2. This has been changed.

      Lines 501-506: "First, swallow was stimulated by injecting 0.1cc of water into the mouth using a 1.0 cc syringe connected to a polyethylene tube. Second, 25 pulses of each 40ms, 80ms, 120ms, 160ms and 200ms continuous TTL laser stimulation at PiCo was repeated, at random, throughout the respiratory cycle. The lasers were each set to 0.75mW and triggered using Spike2 software (Cambridge Electronic Design, Cambridge, UK). These stimulation protocols were performed in all ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2." .

      Thank you, this has been changed.

      Line 526 and 540: (Fig.6 in Huff et al., 2022) and (Fig.6d in Huff et al., 2022).

      Thank you, this has been fixed

      Line 594: Figure 5 doesn't exist. Please, change the sentence.

      Thank you, this has been fixed

      Line 595 and 609: The reference Kirkcaldie et al. 2012 is referred to the neocortex and doesn't seem appropriate. Please, quote the atlas of Paxinos and Franklin.

      Thank you, this has been changed.

      Reference:

      Please, correct throughout the text editing of references by removing e.g J.M. or A. or David D. and so on. Only surnames should be mentioned.

      Thank you, this has been changed.

      Figures:

      Figure 1. A and B as well as the purple arrow are lacking. In addition, optogenetic stimulation is applied during different periods of inspiratory activity and this could impact the swallow motor pattern. In Bv, Non-LAR seems very similar to LAR. In panel E, please add the number of animals.

      Thank you, this has been fixed.

      We used the same optogenetic protocols as in the original paper (Huff et al. 2023) and did not observe any changes to the swallow motor pattern in relation to the time at which PiCo was stimulated. The only phase-dependent response seen in both control and CIH mice is that when PiCo is stimulated during inspiration and a swallow is triggered, inspiration is inhibited. Therefore, we do not believe variability in swallow motor pattern depends on the phase of breathing in which PiCo is stimulated.

      In Biv, the LAR has a pause in EMG activity before the swallow begins (red arrow pointing to the pause), whereas in Bv the Non-LAR does not have this pause; rather, the two behaviors converge (red arrow). For a response to be considered an LAR, the pause must be present, which is why we separated these two motor patterns.

      Figure 1 - Figure Supplement 1. Why do the Authors call the lines "histograms"?

      Thank you, this has been fixed. This is a line graph of swallow frequency in relation to inspiration.

      Tables:

      In tables, data are provided as means and standard deviation. Please, specify this in the Method section.

      Thank you, the following is listed in the methods section: “All data are expressed as mean ± standard deviation (SD), unless otherwise noted.”

      Reviewer #3 (Public Review):

      In the present study, the authors investigated the effects of CIH on the swallowing and breathing responses to PICO stimulation. Their conclusion is that glutamatergic-cholinergic neurons from PICO are not only critical for the gating of post-inspiratory and swallow activity, but also play important roles in the generation of swallow motor patterns. There are several aspects that deserve the authors' attention and comments, mainly related to the study's conclusions.

      • The authors refer to PICO as the generator of post-inspiratory rhythm. However, evidence points to this region as a modulator of post-inspiratory activity rather than a rhythmogenic site (Toor et al., 2019 - 10.1523/JNEUROSCI.0502-19.2019; Oliveira et al., 2021 - 10.1016/j.neuroscience.2021.09.015). For example, sustained activation of PICO for 10 s barely affected the vagus or laryngeal post-inspiratory activity (Huff et al., 2023 - 10.7554/eLife.86103).

      Yes, we did refer to PiCo as the postinspiratory rhythm generator, as defined by Anderson et al. 2016. We base this statement on the following criteria and experiments. In Anderson et al. 2016, we demonstrate that PiCo can be isolated in vitro, that PiCo neurons are activated in phase with postinspiration, and that they are inhibited during inspiration by preBötC neurons via GABAergic mechanisms and not glycinergic mechanisms. We also demonstrate that optogenetically stimulating cholinergic neurons in the PiCo area resets the inspiratory rhythm both in vivo and in vitro. We also show that PiCo, when isolated in transverse slices, is autorhythmic and that PiCo, like the preBötC in transverse slices, can generate respiratory rhythmic activity in vitro independently of the preBötC. We also demonstrate that PiCo neurons are an order of magnitude more sensitive to opioids (DAMGO) than the preBötC and that local injections of DAMGO into the PiCo area in vivo abolish postinspiration and also abolish the phase delay of the respiratory rhythm. None of these specific rhythmogenic properties was studied in the Toor et al. study or the Oliveira et al. study. Hence, we do not understand why the reviewer cites these studies as evidence for modulation as opposed to rhythmogenic properties. The fact that PiCo is rhythmogenic should not be considered an “exclusive property”. Specifically, it does not mean that PiCo cannot also be “modulating” swallow-breathing coordination, as we have demonstrated more specifically in the Huff et al. study. In the same sentence we also referred to the preBötzinger complex as the inspiratory rhythm generator, as defined by Smith et al. 1991, and it seems that the reviewer did not object to this reference. We would like to point out that the same criteria were used to define the preBötzinger complex as we used for PiCo, except that PiCo neurons are better defined than preBötzinger complex neurons. Dbx1 neurons are often used to characterize the preBötC, but these neurons form a rostrocaudal and ventrodorsal column that also includes glial cells and extends beyond the preBötC. Glutamatergic neurons are everywhere, and so are Somatostatin or Neurokinin neurons. Moreover, the 1991 study was only performed in vitro and did not include a histochemical analysis. We would also like to point out that the present manuscript investigates the role of PiCo in swallow and laryngeal behaviors, and not specifically postinspiration. Thus, we are not entirely sure how this comment relates to this manuscript.

      • The optogenetic activation of glutamatergic and cholinergic neurons from PICO evoked submental and laryngeal responses, and CIH changed these motor responses. Therefore, the authors proposed that PICO is directly involved in swallow pattern generation and that CIH disrupts the connection between PICO and SPG (swallow pattern generator). However, the experiments of the present study did not provide evidence about connections between these two regions nor their possible disruption after CIH, or even whether PICO is part of SPG.

      We have edited the text to suggest PiCo modulates swallow motor sequence in addition to the coordination of swallow and breathing. We have also added that further experiments will be necessary to further investigate the connections between PiCo and SPG. But, unfortunately, compared to PiCo, the SPG is much less defined. As already stated above, it cannot be expected that a single study can address all possible open questions. Clearly, more work needs to be done outside of this study to answer all of these questions, which makes this an exciting area of research.

      • CIH affects several brainstem regions which might contribute to generating abnormal motor responses to PICO stimulation. For example, Bautista et al. (1995 - 10.1152/japplphysiol.01356.2011) documented that intermittent hypoxia induces changes in the activity of laryngeal motoneurons by neural plasticity mechanisms involving serotonin.

      Yes, we thank the reviewer for this comment, and we agree that CIH affects multiple brainstem regions. We stated in the manuscript that we are measuring changes in two muscle complexes whose innervation is spread among three motor neuron pools: the hypoglossal nucleus, the trigeminal nucleus, and the nucleus ambiguus. We have added a discussion of laryngeal activity in the presence of acute bouts of extreme hypoxia, acute intermittent hypoxia, and chronic intermittent hypoxia.

      • To support the hypothesis that PICO is directly involved in swallow pattern generation the authors should perform the inhibition of Vglut2-ChAT neurons from PICO and then evoke swallow motor responses. If swallow is abolished when the neurons from this region are inhibited, it would indicate that PICO is crucial to generate this behavior.

      Thank you. We would like to clarify: “involvement” does not mean “necessary for”. Conflating the two has caused much confusion and debate in the field. Just as an example: We can argue at great length whether inhibition is necessary for respiratory rhythmogenesis in vivo, but I think there is no question that inhibition is involved in respiratory rhythmogenesis in vivo. But to avoid any confusion, we have changed the text to suggest PiCo is involved in the modulation of swallow motor sequence. We agree various additional inhibition experiments are necessary to explain if PiCo is also a necessary component of the SPG, but this is not the question we have set out to address in this study. To specifically target PiCo we must not only inhibit Vglut2 neurons but neurons that express both ChAT and Vglut2. To our knowledge there are no inhibitory DREADD or opsin techniques for cre/FlpO to specifically target these neurons. As stated above, non-experts in the field do not appreciate this technical nuance. However, we have begun to develop novel techniques necessary to inhibit these specific neurons which will be published in the future.

      • In almost all the data presented, the authors observed different patterns of changes in the motor submental and laryngeal responses to PICO activation, including that animals submitted to CIH (6%) presented a "normal" motor response. However, the authors did not discuss the possible explanations and functional implications of this variability.

      We agree that it would be helpful to understand the functional implications of these alterations in swallow-related motor activation, however since we are not using any tools to measure or evaluate functional activity it would be inappropriate to make suggestions of this type without any data to back up our conclusion. This is why we have not included any functional implications. We have added the following to the manuscript. “While fine wire EMG studies are an excellent evaluation tool to observe temporal motor pattern of sequential swallow related muscles; it must be combined with tools such as videofluoroscopic swallow study (VFSS) and/or high resolution manometry (HRM) in order to characterize the functional significance of these alterations to the swallow motor pattern shown in this study (Park et al., 2017). Since the preparation in this study utilizes only fine wire EMGs we are not able to evaluate or comment on the functional significance of the variable swallow motor patterns.”

      • In Figure 4, the authors need to present low magnification sections showing the PICO transfected neurons as well as the absence of transfection in the ventral respiratory column. The authors could also check the scale since the cAmb seems very small.

      Thank you. We have added different histology images to show a more comparable cAmb, as well as lower-magnification images to show the absence of transfection in the VRC.

      • Finally, the title does not reflect the study. The present study did not demonstrate that PICO is a swallow pattern generator.

      We have also changed the title to say: Chronic Intermittent Hypoxia reveals the role of the Postinspiratory Complex in the mediation of normal swallow production

    2. eLife assessment

      This important study represents a follow-up of previous papers by Huff et al. (2023) in which the authors further investigate a specific medullary region named the Postinspiratory Complex (PiCo) involved in the control of swallow behaviour and its coordination with breathing. In the present work, they tested the impact of chronic intermittent hypoxia on the swallow motor pattern evoked by optogenetic stimulation of the same medullary area in transgenic mice. These solid results indicate that in chronic intermittent hypoxia-exposed mice PiCo stimulation triggers atypical swallow motor patterns. The experimental procedures are rigorous and technically remarkable. The work will be of interest in the field of respiratory physiology and pathophysiology since a disruption of swallowing and possibly discoordination with breathing may be involved in diseases characterized by the presence of hypoxic conditions such as obstructive sleep apnea.

    3. Reviewer #1 (Public Review):

      Summary:

      Authors were attempting to determine the extent that CIH altered swallowing motor function; specifically, the timing and probability of the activation of the laryngeal and submental motor pools. The paper describes a variety of different motor patterns elicited by optogenetic activation of individual neuronal phenotypes within PiCo in a group of mice exposed to CIH. They show that there are a variety of motor patterns that emerge in CIH mice; this is apparently different than the more consistent motor patterns elicited by PiCo activation in normoxic mice (previously published).

      Strengths:

      The preparation is technically challenging and gives valuable information related to the role of PiCo in the pattern of motor activation involved in swallowing and its timing with phrenic activity. Genetic manipulations allow for the independent activation of the individual neuronal phenotypes of PiCo (glutamatergic, cholinergic) which is a strength.

      Weaknesses:

      (1) Comparisons made between experimental data acquired currently with those previously published are extremely problematic, with the potential confounding influence of changing environments, genetics and litter effects. For example, were the current mice tested at the same time as those exposed to normoxia? Are they littermates (or at least from the same colony) as those previously examined? If they were tested at the same time and age, then the authors should explicitly state this in the methods. The authors have provided no statistical analyses to determine whether there is an effect of CIH on the motor patterns. In short, how can they be sure that the phenomena they observe with respect to motor patterns is due to CIH?

      (2) The data are descriptive in nature, reporting only differences (diversity) of motor patterns in this cohort of animals exposed to CIH. There is limited mechanistic insight into how PiCo manipulation alters the pattern and probability of motor activation. Can they utilize Fos or another marker of activation within the nTS, or in another nucleus that forms part of the circuit, to provide initial insight?

      (3) The differences between the genotypes (ChATcre; Vglut2cre; ChATcre:Vglut2FlpO) with regard to the probability of generating a swallow are not sufficiently discussed, in my view. If, as the authors state, it is "reasonable to suggest that CIH differentially affects" these populations, then what are some viable reasons? What are the known differences in these populations of neurons that could lead to variable responses? Do they project to different places?

      (4) The Results section is difficult to follow and interpret. It would be beneficial to have a couple of sentences after each sub-section stating what the data actually mean. As of now it reads like a statistical report of the data with little "basic" interpretation of the data.

      (5) I have a hard time understanding the functional significance of calculating and plotting the degree of correlation between shifting/delaying the following inspiratory burst and triggering a swallow.

    4. Reviewer #2 (Public Review):

      The manuscript has been revised according to Reviewer's suggestions. Recommendations for the Authors have been almost entirely followed. However, there are some points where the authors state that they have made changes, but the text does not show this. The revised version would have gained in clarity if it was with track changes and numbered rows. In particular, I cannot see the following changes:

      Lines 104-105: Did you mean: "We confirmed that optogenetic stimulation of PiCo neurons in ChATcre:Vglut2FlpO:ChR2 mice exposed to CIH triggers swallow and laryngeal activation similar to the control mice exposed to room air (Huff et al., 2023)." Otherwise, the sentence is not clear.

      Thank you, this has been changed.

      Lines 228-232: "PiCo-triggered swallows are characterized by a significant decrease in duration compared to swallows evoked by water in ChATcre:Ai32 mice (265 ± 132ms vs 144 ± 101ms; paired t-test: p= 0.0001, t= 5.21, df= 8), Vglut2cre:Ai32 mice (308 ± 184ms vs 125 ± 44ms; paired t-test: p= 0.0003, t= 6.46, df= 7), and ChATcre:Vglut2FlpO:ChR2 mice (230 ± 67ms vs 130 ± 35ms; paired t-test: p= 0.0005, t= 5.62, df= 8) exposed to CIH (Table S1)."

      Thank you, this has been changed.

      Lines 283-290: "Thus, CIH does not alter PiCo's ability to coordinate the timing for swallowing and breathing. Rather, our data reveals that CIH disrupts the swallow motor sequence likely due to changes in the interaction between PiCo and the SPG, presumably the cNTS.

      While it has previously been demonstrated that PiCo is an important region in swallow-breathing coordination (Huff et al., 2023), previous studies did not demonstrate that PiCo is involved in swallow pattern generation itself. Thus, here we show for the first time that CIH resulted in the instability of the swallow motor pattern activated by stimulating PiCo, suggesting PiCo plays a role in its modulation."

      Thank you, this has been changed.

      Line 437: Mice of the ChATcre:Ai32, Vglut2cre:Ai32 and ChATcre:Vglut2FlpO:ChR2 lines were kept in collective cages with food and water ad libitum placed inside custom-built chambers.

      Thank you, this has been changed.

      Overall, the manuscript has been improved.

    1. Author Response

      eLife assessment

      This valuable study examines the activity and function of dorsomedial striatal neurons in estimating time. The authors examine striatal activity as a function of time and the impact of optogenetic striatal manipulation on the animal's ability to estimate a time interval. However, the task's design and methodology present several confounding factors that mean the evidence in support of the authors' claims is incomplete. With these limitations addressed, the work would be of interest to neuroscientists examining how striatum contributes to behavior.

      We appreciate the editorial process and are grateful for the thorough, detailed, and constructive reviews. We will respond in detail to every point raised by reviewers in a full revision.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, the authors examine the activity and function of D1 and D2 MSNs in dorsomedial striatum (DMS) during an interval timing task. In this task, animals must first nose poke into a cued port on the left or right; if not rewarded after 6 seconds, they must switch to the other port. Critically, this task thus requires animals to estimate if at least 6 seconds have passed after the first nose poke - this is the key aspect of the task focused on here. After verifying that animals reliably estimate the passage of 6 seconds by leaving on average after 9 seconds, the authors examine striatal activity during this interval. They report that D1-MSNs tend to decrease activity, while D2-MSNs increase activity, throughout this interval. They suggest that this activity follows a drift-diffusion model, in which activity increases (or decreases) to a threshold after which a decision (to leave) is made. The authors next report that optogenetically inhibiting D1 or D2 MSNs, or pharmacologically blocking D1 and D2 receptors, increased the animals' average wait time to 10 seconds. This suggests that both D1 and D2 neurons contribute to the estimate of time, with a decrease in their activity corresponding to a decrease in the rate of 'drift' in their drift-diffusion model. Lastly, the authors examine MSN activity while pharmacologically inhibiting D1 or D2 receptors. The authors observe that most recorded MSNs decrease their activity over the interval, with the rate decreasing with D1/D2 receptor inhibition.

      Major strengths:

      The study employs a wide range of techniques - including animal behavioral training, electrophysiology, optogenetic manipulation, pharmacological manipulations, and computational modeling. The behavioral task used by the authors is quite interesting and a nice way to probe interval timing in rodents. The question posed by the authors - how striatal activity contributes to interval timing - is of importance to the field and has been the focus of many studies and labs; thus, this paper can meaningfully contribute to that conversation. The data within the paper is presented very clearly, and the authors have done a nice job presenting the data in a transparent manner (e.g., showing individual cells and animals). Overall, the manuscript is relatively easy to read and clear, with sufficient detail given in most places regarding the experimental paradigm or analyses used.

      We are glad our main points came through to the reviewer.

      Major weaknesses:

      I perceive two major weaknesses. The first is the impact or contextualization of their results in terms of the results of the field more broadly. More specifically, it was not clear to me how the authors are interpreting the striatal activity in the context of what others have observed during interval timing tasks. In other words - what was the hypothesis going into this experiment? Does observing increasing/decreasing activity in D2 versus D1 support one model of interval timing over another, or does it further support a more specific idea of how DMS contributes to interval timing? Or was the main question that we didn't know if D2 or D1 neurons had differential activity during interval timing?

      Our hypothesis, based on prior behavioral work from our group showing that blocking striatal D1 and D2 dopamine receptors impaired interval timing (De Corte et al., 2019; Stutt et al., 2023), was that D1 and D2 MSNs would have similar patterns of activity during interval timing. We will clarify this in the revision.

      In the second, I felt that some of the conclusions suggested by the authors don't seem entirely supported by the data they present, or the data presented suggests a slightly more complicated story. Below I provide additional detail on some of these instances.

      Regarding the results presented in Figures 2 and 3:

      I am not sure the PC analysis adds much to the interpretation, and potentially unnecessarily complicates things. In particular, running PCA on a matrix of noisy data that is smoothed with a Gaussian will often return PCs similar to what is observed by the authors, with the first PC being a line up/down, the 2nd PC being a parabola that is up/down, etc. Thus, I'm not sure that there is much to be interpreted by the specific shape of the PCs here.

      These are insightful points. We will clarify details of our PCA analysis in the revision. We include PCA for comparisons with our past work (Emmons et al., 2017, 2021; Bruce et al., 2021). Second, it is true that these components can be observed in smoothed data; however, when we generated random data using identical parameters, we found that the variance explained by PC1 was not commonly observed in random data. Third, our goal is to compare between D1 and D2 MSNs, not to interpret the PCs. We will make this explicit in our revision.
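
      The reviewer's concern about smoothed noise, and the kind of null comparison described in the response, can be illustrated with a minimal sketch. It is not the authors' analysis: the matrix size (40 neurons by 60 time bins), the kernel width, the seeds, and the helper names `gaussian_smooth` and `pc1_variance_ratio` are all assumptions chosen for illustration. The sketch compares the fraction of variance captured by PC1 in white noise before and after Gaussian smoothing, using power iteration so that only the Python standard library is needed.

```python
import math
import random


def gaussian_smooth(row, sigma):
    """Smooth one time series with a truncated Gaussian kernel."""
    half = int(3 * sigma)
    offsets = range(-half, half + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    out = []
    for i in range(len(row)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(row):  # renormalize the kernel at the edges
                num += w * row[j]
                den += w
        out.append(num / den)
    return out


def pc1_variance_ratio(data, n_iter=300, seed=0):
    """Fraction of total variance captured by the first principal component.

    Rows are observations (neurons), columns are features (time bins).
    The leading eigenvector of X^T X is found by power iteration.
    """
    n, p = len(data), len(data[0])
    col_means = [sum(row[j] for row in data) / n for j in range(p)]
    x = [[row[j] - col_means[j] for j in range(p)] for row in data]
    total = sum(val * val for row in x for val in row)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(p)]
    for _ in range(n_iter):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]  # X v
        v = [sum(scores[i] * x[i][j] for i in range(n)) for j in range(p)]  # X^T X v
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return sum(s * s for s in scores) / total


# Hypothetical "recording": pure noise, with no temporal structure at all.
rng = random.Random(1)
noise = [[rng.gauss(0.0, 1.0) for _ in range(60)] for _ in range(40)]
smoothed = [gaussian_smooth(row, sigma=5.0) for row in noise]

ratio_raw = pc1_variance_ratio(noise)
ratio_smoothed = pc1_variance_ratio(smoothed)
print(f"PC1 variance explained, raw noise:      {ratio_raw:.3f}")
print(f"PC1 variance explained, smoothed noise: {ratio_smoothed:.3f}")
```

      Smoothing alone concentrates variance in the leading component, which is why the observed PC1 variance would need to be compared against a null distribution built from many such surrogate datasets rather than against zero.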

      I think an alternative analysis that might be both easier and more informative is to compute the slope of the activity of each neuron across the 6 seconds. This would allow the authors to quantify how many neurons increase or decrease their activity much like what is shown in Figure 2.

      This is exactly the analysis shown in Figure 3D. We will clarify this in the revision.
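
      As a rough sketch of the per-neuron slope analysis the reviewer describes, the snippet below fits an ordinary least-squares line to each neuron's binned firing rate over the 0-6 second interval and counts how many neurons ramp up or down. The firing rates, bin centers, neuron labels, and the helper `linear_slope` are hypothetical; the actual analysis in Figure 3D may differ (for example, it may be computed trial by trial).

```python
from statistics import mean


def linear_slope(times, rates):
    """Ordinary least-squares slope of firing rate against time."""
    t_bar, r_bar = mean(times), mean(rates)
    num = sum((t - t_bar) * (r - r_bar) for t, r in zip(times, rates))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


# Hypothetical binned firing rates (Hz) in 1 s bins across the 0-6 s interval.
bin_centers = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
neurons = {
    "ramp_up": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "ramp_down": [9.0, 8.0, 7.0, 6.0, 5.0, 4.0],
    "flat": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
}

slopes = {name: linear_slope(bin_centers, rates) for name, rates in neurons.items()}
n_increasing = sum(s > 0 for s in slopes.values())
n_decreasing = sum(s < 0 for s in slopes.values())
print(slopes, n_increasing, n_decreasing)
```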

      Relatedly, it seems that the data shown in Figure 2D doesn't support the authors' main claim regarding D2/D1 MSNs increasing/decreasing their activity, as the trial-by-trial slope is near 0 for both cell types.

      This likely refers to Figure 3D. In the revision, we will clarify this analysis, add error bars, and note that our goal was to differentiate D2 and D1 MSNs in this analysis. We will also add to this analysis to better make the point that D2 and D1 MSNs are distinct, contrary to our hypothesis.

      Regarding the results in Figure 4:

      The authors suggest that their data is consistent with a drift-diffusion model. However, it is unclear how well the output from the model fits the activity from neurons the authors recorded. Relatedly, it is unclear how the parameters were chosen for the D1/D2 versions of this model. I think that an alternate approach that would answer these questions is to fit the model to each cell, and then examine the best-fit parameters, as well as the ability of the model to predict activity on trials held out from the fitting process. This would provide a more rigorous method to identify the best parameters and would directly quantify how well the model captures the data.

      This is a great point. Our goal was to fit behavioral activity, not neuronal activity; in our revision, we will do exactly what the reviewer suggests and present data of fits to neuronal activity.
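
      For readers unfamiliar with the model class under discussion, here is a minimal sketch of a single-boundary drift-diffusion process, assuming arbitrary illustrative parameters rather than the values used in the manuscript. The names `simulate_ddm` and `mean_crossing_time`, the threshold of 1.0, the noise level, and both drift values are hypothetical. The sketch shows only the qualitative point at issue: lowering the drift rate D lengthens the mean time to reach threshold, which for this process is approximately threshold / D.

```python
import random


def simulate_ddm(drift, noise_sd, threshold, dt=0.01, max_time=60.0, rng=None):
    """Simulate one trial; return the time at which x first reaches threshold."""
    rng = rng or random.Random()
    x, t = 0.0, 0.0
    while x < threshold and t < max_time:
        # Euler-Maruyama step: deterministic drift plus scaled Gaussian noise.
        x += drift * dt + noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
    return t


def mean_crossing_time(drift, n_trials=200, seed=0):
    """Average threshold-crossing time over many simulated trials."""
    rng = random.Random(seed)
    times = [
        simulate_ddm(drift, noise_sd=0.1, threshold=1.0, rng=rng)
        for _ in range(n_trials)
    ]
    return sum(times) / len(times)


# Hypothetical drift rates: the second stands in for a manipulation
# that slows accumulation.
baseline = mean_crossing_time(drift=0.125)  # expected near 1.0 / 0.125 = 8 s
reduced = mean_crossing_time(drift=0.100)   # expected near 1.0 / 0.100 = 10 s
print(f"baseline: {baseline:.2f} s, reduced drift: {reduced:.2f} s")
```

      Fitting such a model to individual neurons, as the reviewer suggests, would additionally require a likelihood or loss function and held-out trials for validation, which this sketch does not attempt.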

      Relatedly, looking at the raw data in Figure 2, it seems that many neurons either fire at the beginning or end of the interval, with more neurons firing at the end, and more firing at the beginning, for D2/D1 neurons respectively. Thus, it's not clear to me whether the drift-diffusion model is a good model of activity. Or, perhaps the model is supposed to be related to the aggregate activity of all D1/D2 neurons? (If so, this should be made more explicit. The comment about fitting the model directly to the data also still stands).

      Our model was inspired by the averages in Figure 2G&H; however, we will fit drift-diffusion models to individual neurons exactly as the reviewer suggests.

      Further, it's unclear to me how, or why, the authors changed the specific parameters they used to model the optogenetic manipulation. Were these parameters chosen because they fit the manipulation data? This I don't think is in itself an issue, but perhaps should be clearly stated, because otherwise it sounds a bit odd given the parameter changes are so specific. It is also not clear to me why the noise in the diffusion process would be expected to change with increased inhibition.

      We will clarify this in our revision, as this is an important point.

      Regarding the results in Figure 6:

      My comments regarding the interpretation of PCs in Figure 2 apply here as well. In addition, I am not sure that examining PC2 adds much here, given that the authors didn't examine such nonlinear changes earlier in the paper.

      We agree – we will remove PC2 in Figure 6 and Figure S9 and add context to the PC analysis noting that we are including for 1) comparisons with past work, 2) our observed variance is much higher than observed in random/smoothed data, and 3) we are primarily interested in comparisons between conditions rather than interpreting the components.

      A larger concern though that seems potentially at odds with the authors' interpretation is that there seems to be very little change in the firing pattern after D1 or D2 blockade. I see that in Figure 6F the authors suggest that many cells slope down (and thus, presumably, they are recording more D1 cells), and that this change in slope is decreased, but this effect is not apparent in Figure 6C, and Figure 6B shows an example of a cell that seems to fire in the opposite direction (increase activity). I think it would help to show some (more) individual examples that demonstrate the summary effect shown by the authors, and perhaps the authors can comment on the robustness (or the variability) of this result.

      We agree, although we note D1/D2 blockade changes PC1, which explains the most variance in MSN activity. In the revision, we will show more examples and comment on the robustness of PC1, exactly as the reviewer recommends. The changes in PC1 are rather consistent.

      Also, it seems that the authors want to claim that this manipulation lowers the drift rate. To make this claim, they could fit the DDM and examine whether D is significantly lower.

      This is a great idea – we will try to do this.

      Regarding the results in Figure 7:

      I am overall a bit confused about what the authors are trying to claim here. In Figure 7, they present data suggesting that D1 or D2 blockade disrupts their ability to decode time in the interval of interest (0-6 seconds). However, in the final paragraph of the results, the authors seem to say that by using another technique, they didn't see any significant change in decoding accuracy after D1 or D2 blockade. What do the authors make of this?

      We were not clear. The second classifier was predicting response time. This was confusing and we will remove it.

      Impact:

      The task and data presented by the authors are very intriguing, and there are many groups interested in how striatal activity contributes to the neural perception of time. The authors perform a wide variety of experiments and analysis to examine how DMS activity influences time perception during an interval-timing task, allowing for insight into this process. However, the significance of the key finding - that D2/D1 activity increases/ decreases with time - remains somewhat ambiguous to me. This arises from a lack of clarity regarding the initial hypothesis and the implications of this finding for advancing our understanding of striatal functions.

      Again, we are grateful for the constructive and very insightful comments that we look forward to clarifying in a full revision.

      Reviewer #2 (Public Review):

      Summary:

      In the present study, the authors investigated the neural coding mechanisms for D1- and D2-expressing striatal direct and indirect pathway MSNs in interval timing by using multiple strategies. They concluded that D2-MSNs and D1-MSNs have opposing temporal dynamics yet disrupting either type produced similar effects on behavior, indicating the complementary roles of D1- and D2-MSNs in cognitive processing. However, the data are insufficient to fully support this major finding. One major reason is the heterogeneous responses within the D1- or D2-MSN populations. In addition, there are concerns about the statistical methods used. For example, the majority of the statistical tests are based on the number of neurons, but not the number of mice. It appears that the statistical difference was due to the large sample size they used (n=32 D2-MSNs and n=41 D1-MSNs), but different neurons recorded in the same mouse cannot be treated as independent samples; they should use independent mouse-based statistical analysis.

      Strengths:

      The authors used multiple approaches including behavioral training of awake mice, optogenetically assisted cell-type-specific recording, optogenetic or pharmacological manipulation, neural computation, and modeling to study neuronal coding for interval timing.

      We appreciate the reviewer’s careful read recognizing the breadth of our approach.

      Weaknesses:

      (1) More detailed behavior results should be shown, including the rate of the success switches, and how long it takes to wait in the second nose poke to get a reward. For line 512 and the Figure 1 legend, the reviewer is not clear about the reward delivery. The methods appear to state that the mouse had to wait for 18s, then make nose pokes at the second port to get the reward. What happens if the mouse made the second nose poke before 18 seconds, but then exited? Would the mouse still get the reward at 18 seconds? Similarly, what happens if the mice made the third or more nosepokes within 18 seconds? It is important to clarify because, according to the method described, if the mice made a second nose poke before 18 seconds, this already counted as the mouse making the "switch." Lastly, what if the mice exited before 6s in the first nosepoke?

      We agree. These were presented in detail in our prior work (Bruce et al., 2021; Larson et al., 2022; and Weber et al., 2023) and work from others (Balci et al., 2008; Tosun et al., 2016). However, we will work on a detailed behavioral schematic in the revision and move supplementary behavioral data in Figure S1 to the main manuscript.

      (2) There are a lot of time parameters in this behavior task, the description of those time parameters is mentioned in several parts, in the figure legend, supplementary figure legend, and methods, but was not defined clearly in the main text. It is inconvenient, sometimes, confusing for the readers. The authors should make a schematic diagram to illustrate the major parameters and describe them clearly in the main text.

      This is a great suggestion – we will do this – and clarify in the above schematic.

      (3) In Line 508, the reviewer suggests the authors pay attention to those trials without "switch". It would be valuable to compare the MSN activity between those trials with or without a "switch".

      We analyzed MSN activity on errors in detail in Figure 6 of Bruce et al., 2021. These errors are infrequent and inconsistent; we will discuss this in the revision.

      (4) The definition of interval is not very clear. It appears that the authors used a 6-second interval in analyzing the data in Figure 2 and Figure 3. But from my understanding, the interval should be the time from time "0" to the "switch", when the mice start to exit from the first nose poke.

      We agree. The switch time can be vastly different on some trials, making it challenging to compare different lengths and slopes. However, we will clarify the interval as noted above, and we have a few ideas on how to do the analysis the reviewer suggests.

      (5) For Figure 2 C-F, the authors only recorded 32 D2-MSNs in 4 mice, and 41 D1-MSNs in 5 mice. The sample size is too small compared to the sample size usually used in the field. In addition to the small sample size, the single-cell activity exhibited heterogeneity, which created potential issues. For both D1 and D2 MSNs, the authors tried to make conclusions on the "trend" of increasing in D2-MSNs and decreasing in D1-MSNs populations, respectively, during the interval. However, such a conclusion is not sufficiently supported by the data presented. It looks like the single-cell activity patterns can be separated into groups: one is a decreasing activity group, one is an increasing activity group and a small group for on and off response. Because of the small sample size, the author should pay attention to the variance across different mice (which needs to be clearly presented in the manuscript), instead of pooling data together and analyzing the mean activity.

      We were not clear: we did this analysis exactly as the reviewer suggested. We are not pooling any data. Instead, as we state on line 620, we are using linear mixed-effects models to account for mouse-specific and neuron-specific variance. This approach was developed with our statistics core for exactly the reasons the reviewer raised. Furthermore, we will extend this analysis to demonstrate that it is resistant to outliers. Finally, we will include measures of effect size, noting that it is a medium to large effect.

      It’s a helpful idea to plot data individually by mice, and we will do so in the revision.
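
      One simple way to inspect mouse-level variability, offered here only as a sketch and not as the authors' method (which they describe as linear mixed-effects modeling), is to collapse per-neuron slopes to one mean per mouse so that the unit of analysis becomes the animal. The mouse identifiers, cell-type labels, slope values, and the helper `per_mouse_means` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (mouse_id, cell_type, slope) records, one per recorded neuron.
records = [
    ("m1", "D2", 0.4), ("m1", "D2", 0.2),
    ("m2", "D2", 0.3), ("m2", "D2", -0.1),
    ("m3", "D1", -0.5), ("m3", "D1", -0.1),
    ("m4", "D1", -0.2), ("m4", "D1", 0.2),
]


def per_mouse_means(rows):
    """Average neuron-level slopes within each (cell_type, mouse) group."""
    grouped = defaultdict(list)
    for mouse, cell_type, slope in rows:
        grouped[(cell_type, mouse)].append(slope)
    return {key: mean(values) for key, values in grouped.items()}


mouse_means = per_mouse_means(records)
d2_by_mouse = [v for (cell_type, _), v in mouse_means.items() if cell_type == "D2"]
d1_by_mouse = [v for (cell_type, _), v in mouse_means.items() if cell_type == "D1"]

# Each list now has one value per mouse, so n reflects animals, not neurons.
print("D2 per-mouse mean slopes:", d2_by_mouse)
print("D1 per-mouse mean slopes:", d1_by_mouse)
```

      Per-mouse averaging discards within-mouse information that a mixed-effects model retains, so it is best treated as a complementary view for plotting rather than a replacement for the model.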

      (6) For Figure 2, from the activity in E and F, it seems that the activity already rose before the trial started, the authors should add some longer baseline data before time zero for clarification and comparison, and show the timing of the actual start of the activity with the corresponding behavior. What behavior states are the mice in when initiating the activity?

      We can certainly include a longer baseline. We can clarify in the revision that mice initiate trials at the rear nosepoke, and this is what initiates the task cues and the temporal interval.

      (7) The authors were focused on the "switch " behavior in the task, but they used an arbitrary 6s time window to analyze the activity, and tried to correlate the decreasing or increasing activities of MSNs to the neural coding for time. A better way to analyze is to sort the activity according to the "switch" time, from short to long intervals. This way, the authors could see and analyze whether the activity of D1 or D2 MSNs really codes for the different length of interval, instead of finding a correlation between average activity trends and the arbitrary 6s time window.

      This is a great idea, and we have some ideas on how to adapt the GLM analysis to perform this analysis.

      Reviewer #3 (Public Review):

      Summary:

      The cognitive striatum, also known as the dorsomedial striatum, receives input from brain regions involved in high-level cognition and plays a crucial role in processing cognitive information. However, despite its importance, the extent to which different projection pathways of the striatum contribute to this information processing remains unclear. In this paper, Bruce et al. conducted a study using a range of causal and correlational techniques to investigate how these pathways collectively contribute to interval timing in mice. Their results were consistent with previous research, showing that the direct and indirect striatal pathways perform opposing roles in processing elapsed time. Based on their findings, the authors proposed a revised computational model in which two separate accumulators track evidence for elapsed time in opposing directions. These results have significant implications for understanding the neural mechanisms underlying cognitive impairment in neurological and psychiatric disorders, as disruptions in the balance between direct and indirect pathway activity are commonly observed in such conditions.

      Strengths:

      The authors employed a well-established approach to study interval timing and employed optogenetic tagging to observe the behavior of specific cell types in the striatum. Additionally, the authors utilized two complementary techniques to assess the impact of manipulating the activity of these pathways on behavior. Finally, the authors utilized their experimental findings to enhance the theoretical comprehension of interval timing using a computational model.

      We are grateful for the reviewer’s consideration of our work and recognizing the strengths of our approach.

      Weaknesses:

      The behavioral task used in this study is best suited for investigating elapsed time perception, rather than interval timing. Timing bisection tasks are often employed to study interval timing in humans and animals.

      This is certainly valid, and we will include these points in the revision.

      The main results from unit recording (opposing slopes of D1/D2 cell firing rate, as shown in Figure 3D) appear to be very sensitive to a couple of outlier cells, and the predictive power of ensemble recording seems to be only slightly above chance levels.

      We are glad that the reviewer raised this. We will extend this analysis to demonstrate that it is resistant to outliers. Finally, we will include measures of effect size, noting that it is a medium to large effect. Thus, it is significantly above chance, and rather reliable, and supported by our PCA results in Figure 3C.

      In the optogenetic experiment, the laser was kept on for too long (18 seconds) at high power (12 mW). This has been shown to cause adverse effects on population activity (for example, through heating the tissue) that are not necessarily related to their function during the task epochs.

      Again, this is an important point. We are well aware of heating effects with optogenetics. For the exact reasons noted by the reviewer, we included opsin-negative controls, in which the laser was delivered with exactly the same time course and parameters (Figure S5). There were no behavioral effects in controls with identical heating and other effects of the laser. Furthermore, these effects are similar to pharmacological effects in this manuscript and in our prior work (De Corte et al., 2019; Stutt et al., 2023). We will better highlight these issues in the revision.

      Given the systemic delivery of pharmacological interventions, it is difficult to conclude that the effects are specific to the dorsomedial striatum. Future studies should use the local infusion of drugs into the dorsomedial striatum.

      This is a great point. We did exactly this experiment in De Corte et al., 2019 with local drug infusions. This earlier study was the departure point for this experiment, although it is challenging to combine focal pharmacological inactivation with recordings in mice (we have extensive experience with this in rats in Parker et al., 2015 and Parker et al., 2015). Furthermore, we have similar local optogenetics effects in this paper. We will include these points in the revised manuscript.

    2. eLife assessment

      This valuable study examines the activity and function of dorsomedial striatal neurons in estimating time. The authors examine striatal activity as a function of time and the impact of optogenetic striatal manipulation on the animal's ability to estimate a time interval. However, the task's design and methodology present several confounding factors that mean the evidence in support of the authors' claims is incomplete. With these limitations addressed, the work would be of interest to neuroscientists examining how striatum contributes to behavior.

    3. Reviewer #1 (Public Review):

      Summary:

      In this work, the authors examine the activity and function of D1 and D2 MSNs in dorsomedial striatum (DMS) during an interval timing task. In this task, animals must first nose poke into a cued port on the left or right; if not rewarded after 6 seconds, they must switch to the other port. Critically, this task thus requires animals to estimate if at least 6 seconds have passed after the first nose poke - this is the key aspect of the task focused on here. After verifying that animals reliably estimate the passage of 6 seconds by leaving on average after 9 seconds, the authors examine striatal activity during this interval. They report that D1-MSNs tend to decrease activity, while D2-MSNs increase activity, throughout this interval. They suggest that this activity follows a drift-diffusion model, in which activity increases (or decreases) to a threshold after which a decision (to leave) is made. The authors next report that optogenetically inhibiting D1 or D2 MSNs, or pharmacologically blocking D1 and D2 receptors, increased the average wait time of the animals to 10 seconds. This suggests that both D1 and D2 neurons contribute to the estimate of time, with a decrease in their activity corresponding to a decrease in the rate of 'drift' in their drift-diffusion model. Lastly, the authors examine MSN activity while pharmacologically inhibiting D1 or D2 receptors. The authors observe that most recorded MSNs decrease their activity over the interval, with the rate decreasing with D1/D2 receptor inhibition.

      Major strengths:

      The study employs a wide range of techniques - including animal behavioral training, electrophysiology, optogenetic manipulation, pharmacological manipulations, and computational modeling. The behavioral task used by the authors is quite interesting and a nice way to probe interval timing in rodents. The question posed by the authors - how striatal activity contributes to interval timing - is of importance to the field and has been the focus of many studies and labs; thus, this paper can meaningfully contribute to that conversation. The data within the paper is presented very clearly, and the authors have done a nice job presenting the data in a transparent manner (e.g., showing individual cells and animals). Overall, the manuscript is relatively easy to read and clear, with sufficient detail given in most places regarding the experimental paradigm or analyses used.

      Major weaknesses:

      I perceive two major weaknesses. The first is the impact or contextualization of their results in terms of the results of the field more broadly. More specifically, it was not clear to me how the authors are interpreting the striatal activity in the context of what others have observed during interval timing tasks. In other words - what was the hypothesis going into this experiment? Does observing increasing/decreasing activity in D2 versus D1 support one model of interval timing over another, or does it further support a more specific idea of how DMS contributes to interval timing? Or was the main question that we didn't know if D2 or D1 neurons had differential activity during interval timing?

      In the second, I felt that some of the conclusions suggested by the authors don't seem entirely supported by the data they present, or the data presented suggests a slightly more complicated story. Below I provide additional detail on some of these instances.

      Regarding the results presented in Figures 2 and 3:

      I am not sure the PC analysis adds much to the interpretation, and potentially unnecessarily complicates things. In particular, running PCA on a matrix of noisy data that is smoothed with a Gaussian will often return PCs similar to what is observed by the authors, with the first PC being a line up/down, the 2nd PC being a parabola that is up/down, etc. Thus, I'm not sure that there is much to be interpreted by the specific shape of the PCs here. I think an alternative analysis that might be both easier and more informative is to compute the slope of the activity of each neuron across the 6 seconds. This would allow the authors to quantify how many neurons increase or decrease their activity much like what is shown in Figure 2.

      Relatedly, it seems that the data shown in Figure 2D *doesn't* support the authors' main claim regarding D2/D1 MSNs increasing/decreasing their activity, as the trial-by-trial slope is near 0 for both cell types.
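      The per-neuron slope analysis suggested above can be sketched as follows. This is a minimal illustration on synthetic firing rates, not the authors' analysis: the function names `firing_rate_slopes` and `count_by_direction`, the bin width, and the example rates are all assumptions made for the sketch.

```python
import numpy as np

def firing_rate_slopes(rates, times):
    """Least-squares slope (Hz per second) of each neuron's trial-averaged rate.

    rates: array of shape (n_neurons, n_bins); times: array of shape (n_bins,).
    """
    rates = np.asarray(rates, dtype=float)
    times = np.asarray(times, dtype=float)
    # np.polyfit fits each column of a 2-D y independently,
    # so transposing gives one linear fit per neuron.
    slopes, _intercepts = np.polyfit(times, rates.T, deg=1)
    return slopes

def count_by_direction(slopes, tol=0.0):
    """Count neurons whose slope is above tol, below -tol, or within tol of zero."""
    slopes = np.asarray(slopes)
    return {
        "increasing": int(np.sum(slopes > tol)),
        "decreasing": int(np.sum(slopes < -tol)),
        "flat": int(np.sum(np.abs(slopes) <= tol)),
    }

# Synthetic example (hypothetical values): two neurons ramp up, one ramps down
# across the 0-6 second interval.
times = np.linspace(0.0, 6.0, 61)
rates = np.vstack([5 + 0.5 * times, 3 + 0.2 * times, 8 - 0.4 * times])
slopes = firing_rate_slopes(rates, times)
counts = count_by_direction(slopes)
```

      In practice a nonzero `tol`, or a per-neuron significance test on the slope, would be needed to separate genuinely flat cells from noisy ones; that threshold is a design choice the sketch leaves open.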

      Regarding the results in Figure 4:

      The authors suggest that their data is consistent with a drift-diffusion model. However, it is unclear how well the output from the model fits the activity from neurons the authors recorded. Relatedly, it is unclear how the parameters were chosen for the D1/D2 versions of this model. I think that an alternate approach that would answer these questions is to fit the model to each cell, and then examine the best-fit parameters, as well as the ability of the model to predict activity on trials held out from the fitting process. This would provide a more rigorous method to identify the best parameters and would directly quantify how well the model captures the data.

      Relatedly, looking at the raw data in Figure 2, it seems that many neurons either fire at the beginning or end of the interval, with more neurons firing at the end, and more firing at the beginning, for D2/D1 neurons respectively. Thus, it's not clear to me whether the drift-diffusion model is a good model of activity. Or, perhaps the model is supposed to be related to the aggregate activity of all D1/D2 neurons? (If so, this should be made more explicit. The comment about fitting the model directly to the data also still stands).

      Further, it's unclear to me how, or why, the authors changed the specific parameters they used to model the optogenetic manipulation. Were these parameters chosen because they fit the manipulation data? This I don't think is in itself an issue, but perhaps should be clearly stated, because otherwise it sounds a bit odd given the parameter changes are so specific. It is also not clear to me why the noise in the diffusion process would be expected to change with increased inhibition.
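      The suggestion above, to fit the model and check it against trials held out from fitting, can be sketched as follows. This is a simplified illustration under stated assumptions: activity is treated as a directly observed drift-diffusion trajectory, the data are simulated, and the function names and parameter values are hypothetical rather than taken from the manuscript.

```python
import numpy as np

def simulate_ddm(drift, noise, dt, n_steps, n_trials, rng):
    """Simulate x[t+1] = x[t] + drift*dt + noise*sqrt(dt)*N(0, 1), starting at 0."""
    steps = drift * dt + noise * np.sqrt(dt) * rng.standard_normal((n_trials, n_steps))
    start = np.zeros((n_trials, 1))
    return np.concatenate([start, np.cumsum(steps, axis=1)], axis=1)

def fit_ddm(trajectories, dt):
    """Maximum-likelihood drift and noise estimated from trajectory increments."""
    increments = np.diff(trajectories, axis=1).ravel()
    drift = increments.mean() / dt
    noise = increments.std() / np.sqrt(dt)
    return drift, noise

def mean_log_likelihood(trajectories, dt, drift, noise):
    """Average Gaussian log-likelihood per increment under the given parameters."""
    increments = np.diff(trajectories, axis=1).ravel()
    var = noise ** 2 * dt
    log_lik = -0.5 * np.log(2 * np.pi * var) - (increments - drift * dt) ** 2 / (2 * var)
    return float(np.mean(log_lik))

# Simulated data (hypothetical parameters): 200 trials of 6 s at 10 ms resolution.
rng = np.random.default_rng(0)
dt = 0.01
data = simulate_ddm(drift=0.5, noise=0.5, dt=dt, n_steps=600, n_trials=200, rng=rng)

# Fit on half of the trials, evaluate on the other half.
train, held_out = data[:100], data[100:]
drift_hat, noise_hat = fit_ddm(train, dt)
ll_fit = mean_log_likelihood(held_out, dt, drift_hat, noise_hat)
ll_null = mean_log_likelihood(held_out, dt, 0.0, noise_hat)  # zero-drift baseline
```

      Comparing `ll_fit` with `ll_null` on held-out trials gives a direct measure of whether a nonzero drift improves prediction. The same fit applied separately to control and manipulation sessions would yield drift estimates that could be compared statistically.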

      Regarding the results in Figure 6:

      My comments regarding the interpretation of PCs in Figure 2 apply here as well. In addition, I am not sure that examining PC2 adds much here, given that the authors didn't examine such nonlinear changes earlier in the paper.

      A larger concern though that seems potentially at odds with the authors' interpretation is that there seems to be very little change in the firing pattern after D1 or D2 blockade. I see that in Figure 6F the authors suggest that many cells slope down (and thus, presumably, they are recording more D1 cells), and that this change in slope is decreased, but this effect is not apparent in Figure 6C, and Figure 6B shows an example of a cell that seems to fire in the opposite direction (increase activity). I think it would help to show some (more) individual examples that demonstrate the summary effect shown by the authors, and perhaps the authors can comment on the robustness (or the variability) of this result.

      Also, it seems that the authors want to claim that this manipulation lowers the drift rate. To make this claim, they could fit the DDM and examine whether D is significantly lower.

      Regarding the results in Figure 7:

      I am overall a bit confused about what the authors are trying to claim here. In Figure 7, they present data suggesting that D1 or D2 blockade disrupts their ability to decode time in the interval of interest (0-6 seconds). However, in the final paragraph of the results, the authors seem to say that by using another technique, they didn't see any significant change in decoding accuracy after D1 or D2 blockade. What do the authors make of this?

      Impact:

      The task and data presented by the authors are very intriguing, and there are many groups interested in how striatal activity contributes to the neural perception of time. The authors perform a wide variety of experiments and analyses to examine how DMS activity influences time perception during an interval-timing task, allowing for insight into this process. However, the significance of the key finding - that D2/D1 activity increases/decreases with time - remains somewhat ambiguous to me. This arises from a lack of clarity regarding the initial hypothesis and the implications of this finding for advancing our understanding of striatal functions.

    4. Reviewer #2 (Public Review):

      Summary:

      In the present study, the authors investigated the neural coding mechanisms for D1- and D2-expressing striatal direct and indirect pathway MSNs in interval timing by using multiple strategies. They concluded that D2-MSNs and D1-MSNs have opposing temporal dynamics yet disrupting either type produced similar effects on behavior, indicating the complementary roles of D1- and D2-MSNs in cognitive processing. However, the data were incomplete to fully support this major finding. One major reason is the heterogeneous responses within the D1- or D2-MSN populations. In addition, there are concerns about the statistical methods used. For example, the majority of the statistical tests are based on the number of neurons, but not the number of mice. It appears that the statistical difference was due to the large sample size they used (n=32 D2-MSNs and n=41 D1-MSNs), but different neurons recorded in the same mouse cannot be treated as independent samples; they should use independent mouse-based statistical analysis.
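      One way to make the mouse-based analysis requested above concrete is to average neuron-level measurements within each mouse and then compare the two groups of mice. The sketch below uses an exact permutation test, which suits group sizes as small as 4 and 5 mice. It is an illustration under stated assumptions: the slope values, mouse labels, and function names are hypothetical, and a mixed-effects model with mouse as a random effect would be an alternative the sketch does not cover.

```python
from itertools import combinations

import numpy as np

def per_mouse_means(values, mouse_ids):
    """Average neuron-level values within each mouse; returns {mouse_id: mean}."""
    values = np.asarray(values, dtype=float)
    mouse_ids = np.asarray(mouse_ids)
    return {str(m): float(values[mouse_ids == m].mean()) for m in np.unique(mouse_ids)}

def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation p-value for the difference of group means."""
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    observed = abs(np.mean(group_a) - np.mean(group_b))
    count = 0
    total = 0
    # Enumerate every way of relabeling n_a of the pooled mice as group A.
    for chosen in combinations(range(len(pooled)), n_a):
        mask = np.zeros(len(pooled), dtype=bool)
        mask[list(chosen)] = True
        diff = abs(pooled[mask].mean() - pooled[~mask].mean())
        count += int(diff >= observed - 1e-12)
        total += 1
    return count / total

# Hypothetical neuron-level slopes: 4 mice with tagged D2-MSNs, 5 with tagged D1-MSNs.
d2_ids = ["d2_a", "d2_a", "d2_b", "d2_b", "d2_c", "d2_c", "d2_d", "d2_d"]
d2_slopes = [0.2, 0.4, 0.3, 0.5, 0.4, 0.6, 0.5, 0.7]
d1_ids = ["d1_a", "d1_a", "d1_b", "d1_b", "d1_c", "d1_c", "d1_d", "d1_d", "d1_e", "d1_e"]
d1_slopes = [-0.6, -0.4, -0.5, -0.3, -0.4, -0.2, -0.3, -0.1, -0.2, 0.0]

d2_means = np.array(list(per_mouse_means(d2_slopes, d2_ids).values()))
d1_means = np.array(list(per_mouse_means(d1_slopes, d1_ids).values()))
p_value = exact_permutation_test(d2_means, d1_means)
```

      With 4 and 5 mice there are only 126 possible relabelings, so the smallest attainable p-value is 1/126. This shows how much less statistical resolution is available once the mouse, rather than the neuron, is the unit of analysis.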

      Strengths:

      The authors used multiple approaches, including behavioral training in awake mice, optogenetics-assisted cell-type-specific recording, optogenetic or pharmacological manipulation, neural computation, and modeling to study neuronal coding for interval timing.

      Weaknesses:

      (1) More detailed behavior results should be shown, including the rate of successful switches, and how long the mice wait at the second nose poke to get a reward. For line 512 and the Figure 1 legend, the reviewer is not clear about the reward delivery. The methods appear to state that the mouse had to wait for 18s, then make nose pokes at the second port to get the reward. What happens if the mouse made the second nose poke before 18 seconds, but then exited? Would the mouse still get the reward at 18 seconds? Similarly, what happens if the mice made the third or more nose pokes within 18 seconds? It is important to clarify because, according to the method described, if the mice made a second nose poke before 18 seconds, this already counted as the mouse making the "switch." Lastly, what if the mice exited before 6s in the first nose poke?

      (2) There are a lot of time parameters in this behavior task. These parameters are described in several places (the figure legend, the supplementary figure legend, and the methods) but are not defined clearly in the main text. This is inconvenient and sometimes confusing for readers. The authors should make a schematic diagram to illustrate the major parameters and describe them clearly in the main text.

      (3) In Line 508, the reviewer suggests the authors pay attention to those trials without "switch". It would be valuable to compare the MSN activity between those trials with or without a "switch".

      (4) The definition of interval is not very clear. It appears that the authors used a 6-second interval in analyzing the data in Figure 2 and Figure 3. But from my understanding, the interval should be the time from time "0" to the "switch", when the mice start to exit from the first nose poke.

      (5) For Figure 2 C-F, the authors only recorded 32 D2-MSNs in 4 mice, and 41 D1-MSNs in 5 mice. The sample size is too small compared to the sample size usually used in the field. In addition to the small sample size, the single-cell activity exhibited heterogeneity, which created potential issues. For both D1 and D2 MSNs, the authors tried to make conclusions on the "trend" of increasing in D2-MSNs and decreasing in D1-MSNs populations, respectively, during the interval. However, such a conclusion is not sufficiently supported by the data presented. It looks like the single-cell activity patterns can be separated into groups: a decreasing activity group, an increasing activity group, and a small group with on and off responses. Because of the small sample size, the authors should pay attention to the variance across different mice (which needs to be clearly presented in the manuscript), instead of pooling data together and analyzing the mean activity.

      (6) For Figure 2, from the activity in E and F, it seems that the activity already rose before the trial started, the authors should add some longer baseline data before time zero for clarification and comparison, and show the timing of the actual start of the activity with the corresponding behavior. What behavior states are the mice in when initiating the activity?

      (7) The authors were focused on the "switch" behavior in the task, but they used an arbitrary 6s time window to analyze the activity, and tried to correlate the decreasing or increasing activities of MSNs to the neural coding for time. A better way to analyze is to sort the activity according to the "switch" time, from short to long intervals. This way, the authors could see and analyze whether the activity of D1 or D2 MSNs really codes for different interval lengths, instead of finding a correlation between average activity trends and the arbitrary 6s time window.
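      The trial-sorting analysis proposed in point (7) can be sketched as follows: order trials by switch time, then estimate each trial's ramp using only the bins before that trial's switch. This is a minimal sketch on synthetic data. The assumption that activity ramps to a fixed level at the switch, the function names, and all numerical values are hypothetical.

```python
import numpy as np

def sort_trials_by_switch(trial_rates, switch_times):
    """Reorder trials from shortest to longest switch time."""
    switch_times = np.asarray(switch_times, dtype=float)
    order = np.argsort(switch_times, kind="stable")
    return np.asarray(trial_rates)[order], switch_times[order]

def pre_switch_slopes(trial_rates, times, switch_times):
    """Slope of each trial's rate, fitted only to bins before that trial's switch."""
    slopes = np.full(len(trial_rates), np.nan)
    for i, (rate, t_switch) in enumerate(zip(trial_rates, switch_times)):
        keep = times < t_switch
        if keep.sum() >= 2:  # need at least two bins for a line fit
            slopes[i] = np.polyfit(times[keep], rate[keep], deg=1)[0]
    return slopes

# Synthetic trials (hypothetical): the rate ramps linearly to 10 Hz at the switch,
# so shorter switch times should show steeper ramps.
times = np.arange(0.0, 18.0, 0.1)
switch_times = [12.0, 8.0, 10.0]
trial_rates = np.array([10.0 * times / s for s in switch_times])

sorted_rates, sorted_switch = sort_trials_by_switch(trial_rates, switch_times)
slopes = pre_switch_slopes(sorted_rates, times, sorted_switch)
```

      If ramp slope varies systematically with switch time in real data, that would support the idea that the activity tracks the animal's own temporal estimate rather than the fixed 6-second window.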

    5. Reviewer #3 (Public Review):

      Summary:

      The cognitive striatum, also known as the dorsomedial striatum, receives input from brain regions involved in high-level cognition and plays a crucial role in processing cognitive information. However, despite its importance, the extent to which different projection pathways of the striatum contribute to this information processing remains unclear. In this paper, Bruce et al. conducted a study using a range of causal and correlational techniques to investigate how these pathways collectively contribute to interval timing in mice. Their results were consistent with previous research, showing that the direct and indirect striatal pathways perform opposing roles in processing elapsed time. Based on their findings, the authors proposed a revised computational model in which two separate accumulators track evidence for elapsed time in opposing directions. These results have significant implications for understanding the neural mechanisms underlying cognitive impairment in neurological and psychiatric disorders, as disruptions in the balance between direct and indirect pathway activity are commonly observed in such conditions.

      Strengths:

      The authors employed a well-established approach to study interval timing and employed optogenetic tagging to observe the behavior of specific cell types in the striatum. Additionally, the authors utilized two complementary techniques to assess the impact of manipulating the activity of these pathways on behavior. Finally, the authors utilized their experimental findings to enhance the theoretical comprehension of interval timing using a computational model.

      Weaknesses:

      The behavioral task used in this study is best suited for investigating elapsed time perception, rather than interval timing. Timing bisection tasks are often employed to study interval timing in humans and animals. The main results from unit recording (opposing slopes of D1/D2 cell firing rate, as shown in Figure 3D) appear to be very sensitive to a couple of outlier cells, and the predictive power of ensemble recording seems to be only slightly above chance levels. In the optogenetic experiment, the laser was kept on for too long (18 seconds) at high power (12 mW). This has been shown to cause adverse effects on population activity (for example, through heating the tissue) that are not necessarily related to their function during the task epochs. Given the systemic delivery of pharmacological interventions, it is difficult to conclude that the effects are specific to the dorsomedial striatum. Future studies should use the local infusion of drugs into the dorsomedial striatum.

    1. eLife assessment

      This work addresses an important biological question: what is the cellular basis of wound healing? Using the Drosophila pupal notum as a model, the paper provides an elegant, thorough, descriptive characterisation of syncytia-driven wound closure using state-of-the-art confocal live imaging of the pupal notum. The authors meticulously characterize the cell-cell fusion events during wound healing, but without any mechanisms to inhibit cell fusion, it is incomplete, since it remains unclear whether cell fusion is required or not for speeding wound healing and/or increasing the level of actin resources at the leading edge.

    2. Reviewer #1 (Public Review):

      Summary:

      This study aims to understand how cell fusion contributes to wound healing using a laser-induced injury in the notum epithelium of a developing fruit fly. The authors meticulously characterize the epithelial fusion events using a live imaging approach and report that syncytia arise by 'border breakdown' and 'cell shrinking'. The syncytial epithelial cells also appear to outcompete mononucleated cells and preferentially dissolve their tangential borders, which correlates with the accumulation of actin at the leading edge.

      Strengths:

      The strength of this study is the authors' live imaging approach to capture these dynamic fusion events that are a fundamental, yet poorly understood biological process.

      Weaknesses:

      A major weakness is that all the authors' conclusions are based on descriptive studies, in which the role of cell fusion is not directly tested. This is particularly important because other models of wound-induced polyploidization have demonstrated that another cytoskeletal protein, myosin, was upregulated and dependent on endoreplication, and not cell fusion. Therefore it remains unclear to what extent cell fusion, endoreplication, or both are required to outcompete mononucleated cells as well as pool actin as described in this study.

    3. Reviewer #2 (Public Review):

      Summary:

      Overall, this study provides a thorough description of the formation of syncytia following wounding of the proliferation-competent diploid epithelium of the pupal notum. While this phenomenon has already been described briefly for this particular tissue by the Galko lab in Wang et al 2015, the authors provide a much more detailed description and characterisation of the process providing some novel insights (radial versus tangential border breakdown, cell shrinkage, timings, syncytia outcompeting mononucleated cells, etc.).

      Strengths:

      This paper provides an elegant, thorough, descriptive characterisation of syncytia-driven wound closure using state-of-the-art confocal live imaging of the pupal notum. The authors show that laser-induced wounding of this diploid, proliferation-competent epithelium results in the formation of syncytia of various sizes in the first few cell rows around the wound edge, which progressively become bigger as healing proceeds. This results in ~50% of cells becoming part of these syncytia. The cell fusion events were convincingly demonstrated by showing the disappearance of p120ctnRFP and E-Cadherin-GFP from cell-cell borders as well as cytoplasmic GFP mixing of GFP-positive cells with a GFP-negative cell.

      Apart from cell-cell fusion by border breakdown that mostly happens in the first 2h following wounding, the authors also found that at later stages of wound healing cell shrinkage following cytoplasmic mixing contributed to syncytia formation.

      Next, the authors provided some convincing evidence that syncytia outcompete mononuclear cells for being positioned in the first cell row around the wound.

      The authors then show that radial border breakdown occurs much less frequently than tangential border breakdown. They suggest that radial border breakdown reduces the requirement for cell-cell intercalations. They also hypothesise that tangential border breakdown might allow fused cells to share resources and provide more resources to be used near the wound edge, e.g. for actomyosin cable formation. To test this, the authors generate single-cell clones that overexpress Actin-GFP. They then show convincingly how a single Actin-GFP-positive cell in the second cell row fuses with one GFP-negative cell in the first cell row. The Actin-GFP signal then spreads in the fused cell and labels some previously unlabelled actin-rich structure near the wound edge which most likely is the actomyosin cable. This provides some evidence for resource sharing by cytoplasmic mixing following fusion.

      Weaknesses:

      The authors provide some convincing evidence that syncytia outcompete mononuclear cells for being positioned in the first cell row around the wound. The authors suggest that the syncytial cells might be better able to close the wound. However, some genetic studies would need to be done to establish this more convincingly. E.g. Could the authors genetically block syncytia formation and then show that these wounds now heal slower?

      The authors suggest that radial border breakdown reduces the requirement for cell intercalation. While this might be true it also raises the question of how the various syncytia facing the wound border change shape to allow the shrinkage of the first cell row over time to allow wound closure. None of the four movies included in the study shows the whole wound healing process until the later stages, making it hard to assess this. It would be good to include one such movie showing the syncytia in the whole wound and comment on this point.

      The authors hypothesise that tangential border breakdown might allow fused cells to share resources and provide more resources to be used near the wound edge, e.g. for actomyosin cable formation. They show convincingly through the fusion of a single Actin-GFP-positive cell in the second cell row with a GFP-negative cell in the first cell row that Actin-GFP spreads in the fused cell and labels the previously unlabelled actomyosin cable. While the hypothesis of resource sharing to improve healing is intriguing and makes sense, this experiment doesn't necessarily prove the benefit of resource sharing. It does show cytoplasmic mixing following fusion, now allowing the GFP-labelled actin to diffuse and be incorporated into the actomyosin cable. In a wild-type condition, fusion would not increase the total concentration of resources, although it would increase the total amount of resources within this bigger fused cell. The question is whether resource sharing without increasing the protein concentration is beneficial and increases the efficiency of certain wound healing mechanisms. There might be a benefit of cell fusion, if for example certain resources were only present in limited amounts or if protein transport could increase the concentration locally. To provide better evidence for the hypothesis that resource sharing improves wound healing, maybe the authors could look at the actomyosin cable in a wounded epithelium (such as in Figure 4E, F), in which all cells express MyoII-GFP. The authors could compare the average intensity of the actomyosin cable at the wound edge in mononucleated cells versus in syncytia. If resource sharing is indeed beneficial, it might be that the actomyosin cable is stronger/brighter in syncytia or it forms quicker.

      The biggest limitation of this study is that the authors don't address how the formation of these syncytia is regulated. While the manuscript in its current form provides some valuable new insights into syncytial-driven wound closure, it would be much more informative if it also provided some mechanistic details. The authors could test if some of the mechanisms shown to regulate syncytial formation in other types of syncytia-driven wound healing are also involved here. E.g. Yorkie was shown to negatively regulate cell fusion in adult syncytial-driven wound closure (Losick et al 2013). The authors could test for the effect of Yorkie-RNAi in the epithelium on wound closure and syncytia formation. Expression of the dominant negative RacN17 also blocked cell fusion in adult syncytial-driven wound closure (Losick et al 2013).

      Moreover, JNK activation was shown to be needed in larval syncytial-driven wound closure (Galko and Krasnow 2004). The authors could test JNK pathway reporters to assess pathway activation or test if the JNK pathway is needed for syncytial-driven wound closure by expressing a dominant-negative form of Basket JNK in the epithelium.

      Or could syncytia formation be regulated by changes in Integrin-mediated adhesion as shown by the Galko lab in Wang et al 2015? They show that wounding provoked a striking relocalization of PINCH and ILK, indicating the disassembly of functional FA complexes concomitant with syncytium formation. Maybe the authors could investigate some of these.

      Another general question that the authors raise but don't address enough is whether syncytia-driven wound closure in proliferation-competent epithelia is any different from the one in post-mitotic, polyploid epithelia. Since the mechanism regulating the former is not known, this remains unclear.

      Finally, it is not clear, whether syncytia in these proliferation-competent epithelia get resolved after wound healing. Do they get removed and replaced by mononucleated proliferation-competent cells or do the syncytia stay in the epithelium like a scar? The authors should provide some images of wound areas a few hours after wound closure is complete and comment on this.

      Minor points:

      Figure 3: It would be better to have the microscopy images alongside the quantifications.

      Figure 4A: The syncytium at the wound edge here doesn't look straight but wavy. Does it not form an actomyosin cable that straightens the front? Or are there lamellipodia/filopodia?

      248: The authors suggest an interesting hypothesis that mitochondria or ER could be pooled in fused cells. It would be nice to see some evidence: e.g. by labeling mitochondria and assessing where they are in syncytia versus mononucleated cells and whether they are concentrated around the wound edge.

      141-145 (Figure 4B and C) This example is not completely convincing. First, it is hard to see where the wound edge is. Second, it would be good to include an even later time point when the cell is clearly no longer at the wound edge.

    4. Reviewer #3 (Public Review):

      Summary:

      White et al. described laser-induced wound healing of the Drosophila pupal notum. They found that the epithelial monolayer is dynamically induced to form syncytia by cell-cell fusion as an important part of repair. They reveal two processes: cell shrinking and border breakage that occur as part of syncytia formation. Expression of GFP in the cytoplasms of some epithelial cells reveals that cytoplasmic contents mix following injury and the GFP rapidly diffuses between cells. Using live imaging they observe that syncytia expand towards the wound, maintain their positions close to the leading edge, and apparently displace smaller cells. They propose that syncytia redistribute cellular components towards the wound facilitating repair and show that labelled actin becomes concentrated at the leading edge.

      Strengths:

      The manuscript is interesting and on an important and emerging topic of wound healing in a genetically tractable organism. The manuscript is very well written.

      Weaknesses:

      There are three major issues that the authors must address:

      (1) Is cell-cell fusion sufficient to enhance/facilitate wound healing?

      (2) Characterization of "border breakdown"; Is this phenomenon disassembly of apical junctions following membrane fusion?

      (3) Are cells really shrinking or is it only the apical domains that "shrink" as the cells join the syncytium?

    1. eLife assessment

      This important study describes the new observation that nuclear volume responds to confinement in a manner that requires transit through mitosis. The authors present solid evidence demonstrating that nuclear volume decreases upon nuclear envelope reformation under confinement in a manner that reestablishes a homeostatic state of nuclear envelope tension. Additional experimental support could provide a more complete case for the proposed underlying mechanisms governing this response. The work will be of broad interest to cell biologists and those interested in cell and organismal scaling.

    2. Author Response

      Reviewer #1 (Public Review):

      (1) It is unclear whether the authors took into consideration the contribution of nuclear blebs for nuclear volume measurements. This would be particularly relevant in situations of very strong confinement. Blebs were previously shown to affect volume (Mistriotis et al., JCB 2019). One could argue that the decreased nuclear volume was due to the increased blebbing observed in very strong confinements.

      As stated in the main text: “[Nuclear Blebs] had a limited contribution to the increase in nuclear projected area, as the increase remained significantly different even if protrusions were dismissed to compute the projected area (Fig S3C)”. In addition, a decrease in the nuclear volume was also observed for slight and intermediate confinement (height = 7 and 9 µm), while in these two conditions, no blebs are observed.

      (2) From their experimental setup, it is unclear whether the reduced nuclear volume observed after confined cell division arises from a geometrical constraint or is due to an intrinsic nuclear feature. One could argue that cells exiting mitosis under confinement have clustered chromosomes and, therefore, will have decreased volume. This would imply that the nucleus is not "reset" but rather that a geometrical constraint is forcing nuclei to be smaller. One way to test this would be to follow individual cells under confinement, let them enter mitosis, and then release the confinement. If, under these conditions, the daughter nuclei are smaller, then it supports their model. If daughter nuclei recover to their initial value, then it's simply due to a geometrical constraint that forces the clustering of chromosomes and the reassembly of the NE in a confined space.

      We agree with the reviewer. As stated in the discussion, “For now, the mechanisms involved remain elusive”, and “Our results call for an in-depth analysis of the molecular pathways at play”. The experiments suggested by the reviewer are definitely important experiments that we plan to carry out. Indeed, it is important to know if cells that were ‘born’ under confinement will retain smaller nuclei in the next generation if confinement is released, or whether the next generation will recover their initial larger nuclei.

      (3) The authors claim that the nucleus adapts to confinement based on evidence that the nucleus no longer shrinks in the second division following the first division. I would argue no further decrease is possible because the DNA is already compacted in the smallest possible volume. If indeed nuclei are in a new homeostatic state as the authors claim, then one would expect nuclei to remain smaller even after confinement is removed. This analysis is missing.

      As mentioned above, we agree that “deconfinement experiments” are indeed important. Nevertheless, we respectfully want to point out that the DNA is not compacted to its maximum level during confinement.

      First, we observed that the nuclei of the second generation of cells born in confinement no longer shrink for all investigated confinement conditions, including for slight confinement (height of 9 µm, corresponding to an initial nuclear deformation of 41%), where DNA is less confined compared to the very strong confinement condition (height of 3 µm, corresponding to an initial nuclear deformation of 70%).

      Second, the total incompressible volumetric fraction of a cell is smaller than 30% (Roffay et al., PMID: 34785592; Cell Biology by the Numbers, ISBN: 9780815345374); this allows a nucleus to be compressed to over 70% of its size, as we observed in the extreme scenario.

      (4) Also, if the authors want to claim that this is a mechanism used for cancer cells to adapt to confined situations as the title says, they need to show that normal, near-diploid cells do not behave in the same way. This analysis is missing.

      We agree with the reviewer. For the revised version, we plan to analyze cell response to confinement using the RPE-1 cell line, as a model of a diploid and untransformed cell line. These will be important experiments for determining whether the nuclear mechanism identified in the HT-29 cell line also operates in normal cells.

      (5) Authors state that "Loss of nuclear blebs is clearly linked to mitosis, suggesting that nuclear volume and nuclear envelope tension are tightly coupled, and supports the hypothesis that mitosis is a key regulator of nuclear envelope tension". I have a few issues with the way this sentence is written. Firstly, one could say that all nuclear structures (and not only blebs) are lost during mitosis because the nucleus disassembles. Hence, the new homeostatic state could be determined by envelope reassembly after mitosis and not mitosis itself. Thirdly, how can mitosis be a key regulator of nuclear envelope tension when the nucleus is disassembled during the process? These require clarification.

      We agree with the reviewer that the formulation used requires clarification, which will be made in the revised version: for now, we only have evidence that nuclear volume regulation takes place at mitosis. The most probable hypothesis is that confinement perturbs NE reassembly after mitosis, and that this perturbed reassembly leads to a change in nuclear volume. Complementary experiments are needed to test such a hypothesis, using cell lines stably expressing LAP2/LAP2b-GFP for instance. These are, however, delicate experiments that will require a dedicated study of their own.

      Secondly, I don't understand why the loss of nuclear blebs suggests that volume and tension are tightly coupled.

      Nuclear blebs appear once nuclei have reached a critical NE tension (Srivastava et al., PMID: 33662810). The fact that cells “born” under confinement have no nuclear blebs means that their nuclei are no longer under tension. This is a direct consequence of the decrease in nuclear volume, implying a coupling between volume and tension.

      (6) The authors claim that, unlike previous studies (Lomakin et al), this work shows a "gradual nuclear adaptation". From their results, this is difficult to conclude simply because they do not analyse cPLA2 levels. This is solely based on indirect evidence obtained from cPLA2 inhibition. A gradual adaptation would mean that based on the level of confinement we would expect to have increasingly higher levels of cPLA2 (and therefore nuclear tension).

      We thank the reviewer for his/her comment. Indeed, we have no direct evidence of gradual cPLA2 recruitment in our study, as we did not analyze cPLA2 levels.

      However, of note, in our study, nuclear volume and tension adaptation occur in the entire range of confinement height (from 3 to 9 µm), with a decrease in nuclear volume inversely correlated with the imposed initial nuclear deformation (fig S2C). On the contrary, in Lomakin et al., for HeLa cells, a threshold of 5 µm confinement is needed to trigger a cell motility response mediated by cPLA2. Such a difference suggests that other parameters are used as a confinement readout by cells during the reassembly of the NE after mitosis.

      (7) The authors should refrain from saying that the mechanism behind DNA repair is coupled to the nuclear adaptation they show. There are several points regarding this statement. Firstly, increased DNA damage could be due to nuclear ruptures imposed by confinement at 2h. In fact, the authors show leakage of NLS from the nucleus after confinement (Figure S3A). Secondly, the decrease in DNA damage at 24h could be because these nuclei did not rupture. How can they ensure that cells with low DNA damage at 24h had increased DNA damage at 2h? Finally, one needs to confirm if the nuclei they are analysing at 24h did undergo a round of cell division previously. From the evidence provided, the authors cannot conclude that DNA damage regulation is occurring in confined cells. Moreover, cell cycle arrest is a known effect of DNA damage. Cells with high damage at 2h most likely are arrested or will present with increased mitotic errors (which the authors exclude from their analyses).

      We need to clarify our analysis workflow: it was only in live experiments that we excluded cells with abnormal cell division, as cell division was visible in the timelapse. For immuno-staining analysis on fixed samples, all non-apoptotic cells were taken into account in the analysis. The decrease in DNA damage observed at 24h thus applies to all cells under confinement. There is a clear difference between 2h and 24h in the γH2AX immunostaining (which is used as a proxy for DNA damage): whereas at 2h almost all cells have several foci (10-15 foci per cell on average, Fig. 3H), the number of foci in the entire cell population decreases to 1-2 foci per cell at 24h. The population at 24h mainly includes cells that have undergone a round of cell division, with >80% of normal cells, as quantified in Fig. 3E. In the revised version, we will include, as a supplementary figure, a quantification of the percentage of cells having more than 5 foci at 2h and 24h, as well as large fields of view of the γH2AX immunostaining to illustrate the distribution.

      Reviewer #2 (Public Review)

      One major limitation is that all experiments are performed in a single cell line, HT-29 human colorectal cancer cells, which has an unusual nuclear envelope composition as it has no lamin B2, low lamin B1 levels, and contains a p53 mutation. Because lamins B1 and B2 play important functions in protecting the nuclear envelope from blebs and confinement-induced rupture, and p53 is crucial in the cellular DNA damage response, it remains unclear whether other cell lines exhibit similar adaptation behavior.

      We agree that including other cell lines would help generalize our findings. It would be interesting in the future to analyze if a similar regulation exists for other cell types. In particular, as stated in the discussion, it would be very interesting to investigate whether this nuclear adaptation is universal, or if it is a consequence of a dysregulation in a specific cancer pathway. Our current manuscript is relevant as it uncovers the existence of this highly interesting phenomenon.

      Investigating whether other cell types have the same capacity to adapt would provide insights into the molecular mechanisms involved. In the revised version, we specifically plan to analyze nuclear response under prolonged confinement in two types of cells: (1) normal cells with near-diploid characteristics (RPE-1 cell line, as a model of a diploid and untransformed cell line); (2) other colorectal cancer cell lines presenting higher levels of lamin B2 and B1, and no p53 mutation (HCT-116).

      Furthermore, although the time-lapse experiments suggest that reduction in nuclear volume occurs primarily during mitosis, the authors do not address whether prolonged confinement, even in the absence of apoptosis, could also result in cells adjusting their nuclear volume, or alternatively normalizing nuclear envelope tension by recruiting additional membrane from the endoplasmic reticulum, which is continuous with the nuclear membranes.

      Even if we cannot completely rule out the hypothesis raised by the reviewer, we respectfully want to stress that if additional membrane from the endoplasmic reticulum were recruited, we should observe an increase in nuclear volume at S/G2, which is the case only for the strongest imposed confinement (h = 3 µm, corresponding to an initial nuclear deformation of 70%, Figure S2E). It would, however, be very interesting in the future to directly assess nuclear envelope tension and to follow, with high-resolution live experiments, the possible recruitment of additional membrane.

      Regarding the proposed role of cPLA2, previous studies have shown that cPLA2 recruitment to the nuclear membrane, which is essential to mediate its nuclear mechanotransduction function, requires both an increase in nuclear membrane tension and intracellular calcium. However, the current study does not include any data showing the recruitment of cPLA2 to the nuclear membrane upon confinement, or the disappearance of nuclear membrane-associated cPLA2 during prolonged confinement, leaving unclear the precise function and dynamics of cPLA2 in the process.

      We agree with the reviewer that it would be very informative to analyze the recruitment of cPLA2 in live experiments. We plan to do this in future experiments using cPLA2 immunostaining at different time points or the cPLA2-mKate construct. This will be the subject of a dedicated study, together with possible changes in nuclear pore size and organization, as well as nuclear tension analysis. For this article, we plan to add the analysis of the effect of cPLA2 inhibition in live experiments.

      Lastly, it remains unclear (1) whether the reduction in nuclear volume is caused by a reduction in nuclear water content, by chromatin compaction, e.g. associated with an increase in heterochromatin, or through other mechanisms, (2) whether the change in nuclear volume is reversible, and if so, how quickly,

      We thank the reviewer for his/her comment. This point was also mentioned by Reviewer #1. It is important to know if cells that were ‘born’ under confinement will retain smaller nuclei in the next generation if confinement is released, or whether the next generation will recover their initial larger nuclei. We plan to perform such “deconfinement” experiments and add the results in the revised version. In addition, we also plan to investigate in more detail the DNA compaction state during confinement.

      and (3) what functional consequences the substantial reduction in nuclear volume has on nuclear function, as one would expect that this reduction would be associated with a substantial increase in nuclear crowding, affecting numerous nuclear processes.

      We agree with the reviewer that such a reduction in nuclear volume would most probably affect numerous nuclear processes that would be highly interesting to decipher in the future. Especially, as pointed out in the discussion, “the regulation of nuclear size identified in this study could have important consequences on resistance to classical chemotherapeutic treatments that target proliferation”. This question merits an entire study and is outside the scope of our current manuscript.

      Reviewer #3 (Public Review)

      (1) One essential consideration that goes unaddressed is whether the nuclear volume alone is changing under compression (resulting in a higher nuclear to cytoplasmic ratio) or if the cell volume is changing and the nuclear volume is following suit (no change in the N:C ratio). Depending on which of these is the case, the overall model would likely shift. In particular, interpreting the effect of disrupting myosin II activity given its different distribution at the cortex in response to the higher confinement would be influenced by which of these conditions are at play.

      We agree with the reviewer. As stated in the discussion, “the nuclear to cytoplasmic volume ratio, which is constant within a given population, is most likely to be impacted by confinement and changes in nuclear envelope tension (24, 45, 46), and might be at play in the regulation we describe herein”.

      As mentioned in the results section, “the distance between the cell membrane and the nuclear envelope was significantly reduced with confinement (Fig. 1D, Fig. S1B) and accompanied by the relocalization of the contractility machinery (Phosphorylated Myosin Light Chain (p-MLC) staining) from above the nucleus to the side, indicating a cortex rearrangement (Fig. S1C)”. For the revised version, we plan to investigate if such relocalization is accompanied by a change in the nuclear to cytoplasmic ratio using the p-MLC and nuclei immunostaining performed at 2h and 24h under the entire range of confinement investigated.

      (2) -A key approach used and interpreted by the investigators is an assessment of the folding of the "inner lamin envelope", which they derive from an image analysis routine of lamin staining that they developed and argue reflects "nuclear envelope tension". I am not convinced of the robustness of this approach or what it mechanistically reveals. It may or may not reflect the contour of the inner nuclear membrane, which (perhaps) is the most relevant to the authors' interpretation of nuclear envelope tension. Given the major contribution of this data to the model, which is based on the "unfolding" of the nuclear envelope, an orthogonal approach (e.g. electron microscopy - which one needs to truly address the high-frequency undulations of the nuclear envelope) is needed to support the larger conclusions.

      We agree with the reviewer that the precise measurement of NE surface area is challenging because of the NE folds, and that our approach provides only semi-quantitative information. Higher-resolution approaches, such as 3D super-resolution imaging, would be necessary to investigate that point in more detail. However, we want to point out that even with our limited resolution, the differences observed in lamin A/C staining are striking (Fig. 3A): while lamin folds are completely absent at 2h under strong confinement, inner lamin folds are massively observed at 24h, showing a pattern very similar to the control condition. In the revised version, we will add more images to show that our analysis is representative of our observations.

      (3) The authors argue that nuclear tension is lost after mitosis in the confined devices because nuclear volume has decreased. While a smaller nuclear volume might indeed translate to less compressive force from the device on the nucleus, one would imagine that the chromosomes still have to be accommodated and that confining them in a smaller volume could increase the tension. Although arguable, the potential alternative possibilities suggest that actual measurements of nuclear envelope tension are needed to robustly test the model. The authors cite the observation that blebs are less prevalent after mitosis as additional support for this model, but this is expected as nuclear envelope breakdown and reformation will "reset" the nuclear contour while the appearance of blebs at mitotic entry is essentially a "memory" of all blebs and ruptures over the entire preceding cell cycle.

      We agree with the reviewer that assessing the nuclear envelope tension would enable a better description of the underlying process. It will be the subject of a dedicated study, together with possible changes in nuclear pore size and organization, as well as the analysis of cPLA2 recruitment.

      The proposed model in the current study is for the moment simply a geometrical model. Given the simplicity of the model, the fit with our experimental points is striking.

      (4) Representative images for the pharmacological perturbations other than blebbistatin are notably absent - only the analyzed data are presented in the manuscript or the supplemental material. How these perturbations (e.g. to cPLA2) also affect the cortex is important to interpret the data given the point raised above. Orthogonal approaches would also strengthen the conclusions (for example, the statement that "nuclear adaptation observed during mitosis requires nuclear tension sensing through cPLA2" requires more evidence to be convincing - it is not sufficiently supported by the data presented). Even if this is the case, the authors acknowledge that cPLA2 is likely not the answer to the adaption observed under the lower degrees of confinement. Thus, the mechanisms underlying the adaptive changes to nuclear volume remain enigmatic.

      We thank the reviewer for this insightful comment, and we plan to add representative images for the pharmacological perturbation in the revised version of the manuscript.

      (5) One more consideration that seems to go without comment is that the cells under confinement do not appear to successfully complete cytokinesis (Fig. 5b). At a minimum this seems like a major perturbation to cell physiology and needs to be more fully discussed by the authors as playing a role in the observed changes in nuclear volume.

      We agree that in the image chosen for Fig. 5b, cytokinesis does not seem to be complete. This is not representative of the entire cell population as 80% of the cell population showed a normal phenotype under very strong confinement with no drug (Fig. 5C and 3E, as well as fig S3D for a representative large field of view). Live experiments using the FUCCI cell lines also show that cells are capable of making several complete divisions under confinement (Fig. 2). Complementary experiments under pharmacological treatments and confinement are planned to extend our analysis of such processes.

    3. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors discover that nuclear volume decreases after mitotic exit following cell confinement in a manner that scales with the extent of confinement. This adaptation appears to protect the cells from adverse outcomes of critical confinement such as nuclear blebs and DNA damage. The evidence to support these claims is strong.

      The authors also provide a model in which they argue that what they call the "apparent nuclear surface area" is modulated by confinement through a mechanism regulated by cPLA2 and myosin II activities. Here there are weaknesses in that the manuscript relies on a single approach, measurements are indirect, and alternative models are not explored. Similarly, additional considerations need to be addressed so that the reader can interpret the data presented - for example whether cell volume is also changing coincident with nuclear volume changes, and whether other aspects of cell physiology such as cytokinesis are altered.

      Considerations that could support the manuscript further:

      One essential consideration that goes unaddressed is whether the nuclear volume alone is changing under compression (resulting in a higher nuclear to cytoplasmic ratio) or if the cell volume is changing and the nuclear volume is following suit (no change in the N:C ratio). Depending on which of these is the case, the overall model would likely shift. In particular, interpreting the effect of disrupting myosin II activity given its different distribution at the cortex in response to the higher confinement would be influenced by which of these conditions are at play.

      A key approach used and interpreted by the investigators is an assessment of the folding of the "inner lamin envelope", which they derive from an image analysis routine of lamin staining that they developed and argue reflects "nuclear envelope tension". I am not convinced of the robustness of this approach or what it mechanistically reveals. It may or may not reflect the contour of the inner nuclear membrane, which (perhaps) is the most relevant to the authors' interpretation of nuclear envelope tension. Given the major contribution of this data to the model, which is based on the "unfolding" of the nuclear envelope, an orthogonal approach (e.g. electron microscopy - which one needs to truly address the high-frequency undulations of the nuclear envelope) is needed to support the larger conclusions.

      The authors argue that nuclear tension is lost after mitosis in the confined devices because nuclear volume has decreased. While a smaller nuclear volume might indeed translate to less compressive force from the device on the nucleus, one would imagine that the chromosomes still have to be accommodated and that confining them in a smaller volume could increase the tension. Although arguable, the potential alternative possibilities suggest that actual measurements of nuclear envelope tension are needed to robustly test the model. The authors cite the observation that blebs are less prevalent after mitosis as additional support for this model, but this is expected as nuclear envelope breakdown and reformation will "reset" the nuclear contour while the appearance of blebs at mitotic entry is essentially a "memory" of all blebs and ruptures over the entire preceding cell cycle.

      Representative images for the pharmacological perturbations other than blebbistatin are notably absent - only the analyzed data are presented in the manuscript or the supplemental material. How these perturbations (e.g. to cPLA2) also affect the cortex is important to interpret the data given the point raised above. Orthogonal approaches would also strengthen the conclusions (for example, the statement that "nuclear adaptation observed during mitosis requires nuclear tension sensing through cPLA2" requires more evidence to be convincing - it is not sufficiently supported by the data presented). Even if this is the case, the authors acknowledge that cPLA2 is likely not the answer to the adaption observed under the lower degrees of confinement. Thus, the mechanisms underlying the adaptive changes to nuclear volume remain enigmatic.

      One more consideration that seems to go without comment is that the cells under confinement do not appear to successfully complete cytokinesis (Fig. 5b). At a minimum this seems like a major perturbation to cell physiology and needs to be more fully discussed by the authors as playing a role in the observed changes in nuclear volume.

    1. eLife assessment

      Tatekoshi et al. endeavored to utilize human-induced pluripotent stem cell-derived cardiomyocytes to create an in vitro model of Heart Failure with preserved Ejection Fraction (HFpEF) to investigate the impacts of inflammation, sera from HIV patients, and the application of cardioprotective and antiviral medications on cardiomyocyte relaxation as a proxy for diastolic function. Their investigations revealed that inflammatory cytokines lead to an increased decay in calcium transients, a process that could be alleviated through the use of SGLT2 inhibitors and mitochondrial antioxidants. These results indicate that inflammation might contribute to diastolic dysfunction and suggest that SGLT2 inhibitors and mitochondrial antioxidants could offer cardioprotective benefits by reducing inflammation in cardiomyocytes. These valuable, yet incomplete results, can be complemented by employing biomechanical and molecular analyses, alongside validation of the therapeutic duration and dosages.

    2. Reviewer #1 (Public Review):

      Summary:

      This is an original manuscript submission by Tatekoshi et al entitled, "Human induced pluripotent stem cell-derived cardiomyocytes to study inflammation-induced diastolic dysfunction." Based on the premise that treated HIV individuals commonly have heart failure with preserved ejection fraction, yet robust animal models have not been established, the team developed iPS-CM models to study HFpEF with this angle in mind. The group established iPS-CMs using standard methods and studied TNFa and IFNy effects on calcium transients. They observed that both cytokines increased calcium transient decay and downstroke times, which could be reversed by mitoTempo treatment in the case of TNFa. To determine how mitochondrial dysfunction may impact the cytokine-induced calcium transient changes, the team measured OCR treatment changes. They observed that NAC and TNFa co-treated cells demonstrated reduced OCR. The team went on to test the effects of antiretroviral therapies including tenofovir, raltegravir, elvitegravir, and darunavir at 3-10 uM levels in iPS-CMs. The team noted that ART treatments reversed the diastolic dysfunction associated with TNFa treatment suggesting that ART therapies may improve diastolic dysfunction that is associated with TNFa signaling directly in cardiomyocytes. Following up on this treatment effect, the team screened several other candidates across drug classes and identified that dapagliflozin (SGLT2i) reversed diastolic dysfunction induced by TNFa. Finally, the team collected human serum from HIV+ patients from two hospitals - Northwestern with diastolic dysfunction by cMRI, and UCSF with normal diastolic function by echo. Both cohort serum samples did not change calcium transients in iPS-CMs. However, due to numerous and significant major methodological concerns, and the potential low impact of the study results, this manuscript is expected to be of very low impact to the field in its current form.

      Strengths:

      1) From a significance standpoint, understanding the mechanisms of HFpEF, particularly in conditions such as HIV would be very impactful.

      2) Collecting HIV patient serum and identifying a plasma factor that impacts cardiac function could be very significant if successful.

      Weaknesses:

      (1) I am not convinced that this study relates to HFpEF in individuals with HIV, and the study design does not seem to be well thought out.

      (2) The connectivity of the study experiments is loose, and data analysis and conclusions are broadly overstated and misinterpreted.

      (3) For example, the study lacks any measure of diastolic contractile function, and even if performed, the relevance of TNFa treatments to cells in vitro in these immature cell contexts would remain unclear. There are surprisingly no reported molecular analyses of potential mechanisms of the calcium transient changes. The study falls short in molecular detail and instead relies on drug treatments and responses that are hard to interpret with dosages that are not well justified and treatments that are numerous. It is unclear what changes in calcium transients mean functionally without a comprehensive assessment of CM biomechanical contraction and relaxation measurements, and this would also require parallel molecular investigations of potential targets of any phenotypes observed.

      (4) Calcium transient data need to be better illustrated such as with representative peak tracings. Overall, the data comprise too few samples, particularly given the inherent heterogeneity of iPS-CM studies. The iPS-CM system as a model for diastolic dysfunction remains unestablished.

      (5) There are unclear dose choices for the various ART drugs tested, as well as the other drugs tested such as SGLT2i. In addition, SLC5A2 (SGLT2 target) is not established to be expressed in adult mammalian cardiomyocytes.

      (6) HIV plasma samples were not tested for cytokine levels, but this could be done to assess the validity of the final experiments. It is unclear what is being tested with these experiments.

      (7) The choice of serum controls from a second institution (UCSF) opens up concerns over batch effects unrelated to differences in diastolic dysfunction. However, there were no differences with the Northwestern samples. It is unclear why this data is included as it does not add to the impact of the study.

      (8) There are concerns about the quality of the iPS-CMs since there is no cell imaging or molecular analyses. Figure 5 Supplement 1 images are of too low quality and resolution to assess cell quality. Overall, the iPS-CM QC data is extremely sparse.

    3. Reviewer #2 (Public Review):

      The authors investigated the role of inflammatory molecules in diastolic dysfunction and screened antiviral and cardioprotective pharmacological agents for their potential to reverse inflammation-mediated diastolic dysfunction. This study focuses on heart failure with preserved ejection fraction (HFpEF) in people living with HIV (PLWH), a condition often challenging to study due to the lack of suitable animal models. Using human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs), researchers simulated HFpEF in vitro. They observed that inflammatory cytokines impaired cardiomyocyte relaxation, mimicking HFpEF, while SGLT2 inhibitors and mitochondrial antioxidants reversed this effect. Exposure to serum from HIV patients did not induce dysfunction in hiPSC-CMs. These findings suggest hiPSC-CMs as a promising model for understanding HFpEF mechanisms and testing potential treatments.

      Comments to improve the study:

      The manuscript is well-written, and the results are well-illustrated. However, there are some topics that are not well-connected, and the rationale and hypothesis are not clearly defined beforehand, such as mitochondrial membrane potential, mitochondrial ROS, and angiogenic potential.

      As the hiPSC cardiomyocytes are treated with various reagents to measure diastolic dysfunction, it is important to confirm whether the treatment time and dose used were sufficient to exert a functional effect. Dose and time-dependent experiments are essential, or at least sufficient citations should be provided for selecting the dose for IFN and TNF.

      After IFN and TNF treatment, determining the expression levels of molecular markers of DD/HFpEF is crucial. Again, if sufficient evidence is available, it can be cited.

      The Methods section describes TMRE colocalization and immunofluorescence, but no images are provided.

      The concentration of TNF and IFN in patients is critical, which was acknowledged and discussed as a limitation of the study by the authors. Authors should consider this aspect, and if not feasible, clinical reports should be cited to provide a rough estimation of their concentration.

    1. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Weng et al. detect a neuron-specific transcriptome that regulates aging. The authors first profile neuron-specific responses during aging at a time point where a loss in memory function is present. They discover signatures unique to neurons which validate their pipeline and reveal the loss of neuron identity with age. For example, old neurons reduce the expression of genes related to synaptic function and neuropeptide signaling and increase the expression of chromatin regulators, insulin peptides, and glycoproteins. The authors discover the detrimental effect of selected upregulated genes (utx-1, ins-19, and nmgp-1) by knocking them down in the whole body and detecting improvement of short memory functions. They then use their pipeline to test neuronal profiles of long-lived insulin/IGF mutants. They discover that genes related to stress response pathways are upregulated upon longevity (e.g. dod-24, F08H9.4) and that they are required for improved neuron function in long-lived individuals.

      Strengths:

      Overall, the manuscript is well-written, and the experiments are well-described. The authors take great care to explain their reasoning for performing experiments in a specific way and guide the reader through the interpretation of the results, which makes this manuscript an enjoyable and interesting read. Using neuron-specific transcriptomic analysis in aged animals, the authors discover novel regulators of learning and memory, which underlines the importance of cell-specific deep sequencing. The time points of the transcriptomic profiling are elegantly chosen, as they coincide with the loss of memory and can be used to specifically reveal gene expression profiles related to neuron function. The authors use the dod-24 example to showcase how powerful this approach is. In long-lived insulin/IGF-1 receptor mutants, body-wide dod-24 expression differs from neuron-specific profiles. Importantly, the depletion of dod-24 has opposing effects on lifespan and on learning and memory. The dataset will provide a useful resource for the C. elegans and aging community.

      Weaknesses:

      While this study nicely describes the neuron-specific profiles, the authors do not test the relevance in a tissue-specific way. It remains unclear if modifying the responses only in neurons has implications for either memory or potentially for lifespan. The authors point to this in the text and refer to tissue-specific datasets. However, it is possible that the tissue-specific profile changes with age. The authors should consider mining publicly available cell-specific aging datasets and performing neuron-specific RNAi to test the functional relevance of the neuron-specific response. This would strengthen the importance of cell-specific profiling.

    2. eLife assessment

      This important study investigates the transcriptional changes in neurons that underlie loss of learning and memory with age in C. elegans, and how cognition is maintained in insulin/IGF-1-like signaling mutants. The presented evidence is convincing, utilizing a cutting-edge method to isolate neurons from worms for genomics that is clearly conveyed with a rigorous experimental approach. Overall, this study supports that older daf-2 worms maintain cognitive function via mechanisms that are unique from younger wild type worms, which will be of interest to neuroscientists and researchers studying ageing.

    3. Reviewer #1 (Public Review):

      The authors perform RNA-seq on FACS-isolated neurons from adult worms at days 1 and 8 of adulthood to profile the gene expression changes that occur with cognitive decline. Supporting data are included indicating that by day 7 of adulthood, learning and memory are reduced, indicating that this time point or after represents cognitively aged worms. Neuronal identity genes are reduced in expression within cognitively aged worms, whereas genes involved in proteostasis, transcription/chromatin, and stress response are elevated. A number of specific examples are provided, representing markers of specific neuronal subtypes, and correlating expression changes to the erosion of particular functions (e.g. motor neurons, chemosensory neurons, aversive learning neurons, etc).

      To investigate whether the upregulation of genes in neurons with age is compensatory or deleterious, the authors reduced the expression of a set of three significantly upregulated genes and performed behavioral assays in young adults. In each case, reduction of expression improved memory, consistent with a model in which age-associated increases impair neuronal function. This claim would be bolstered by an experiment elevating the expression of these genes in young neurons, which should reduce the learning index if the hypothesis is correct.

      The authors then characterize learning and memory in wild-type, daf-2, and daf-2;daf-16 worms with age and find that daf-2 worms have an extended ability to learn for approximately 10 days longer than wild types. This was daf-16 dependent. Memory was extended in daf-2 as well, and strikingly, daf-2;daf-16 had no short-term memory even at day 1. Transcriptomic analysis of FACS-sorted neurons was performed on the three groups at day 8. The authors focus their analysis on daf-2 vs. daf-2;daf-16 and present evidence that daf-2 neurons express a stress-resistance gene program. One question that remains unanswered is how well the N2 and daf-2;daf-16 correlate overall, and are there differences? This may be informative as wild type and daf-2;daf-16 mutants are not phenotypically identical when it comes to memory, and there may be differences that can be detected despite the overlap in the PCA. This analysis could reveal the daf-16 targets involved in memory.

      The authors tested eight candidate genes that were more highly expressed in daf-2 neurons vs. daf-2;daf-16 and showed that reduction of 2 and 5 of these genes impaired learning and memory, respectively, in daf-2 worms. This finding implicates specific neuronal transcriptional targets of IIS in maintaining cognitive ability in daf-2 with age, which, importantly, are distinct from those in young wild type worms.

    4. Reviewer #2 (Public Review):

      Weng et al. perform a comprehensive study of gene expression changes in young and old animals, in wild-type and daf-2 insulin receptor mutants, in the whole animal, and specifically in the nervous system. Using this data, they identify gene families that are correlated with neuronal ageing, as well as a distinct set of genes that are upregulated in neurons of aged daf-2 mutants. This is particularly interesting as daf-2 mutants show both extended lifespans and healthier neurons in aged animals, reflected by better learning/memory in older animals compared with wild-type controls. Indeed, the knockdown of several of these upregulated genes resulted in poorer learning and memory. In addition, the authors showed that several genes upregulated during ageing in wild-type neurons also contribute to learning and memory; specifically knockdown of these genes in young animals resulted in improved memory. This indicates that (at least in this small number of cases), genes that show increased transcript levels with age in the nervous system somehow suppress memory, potentially by having damaging effects on neuronal health.

      Finally, from a resource perspective, the neuronal transcriptome provided here will be very useful for C. elegans researchers as it adds to other existing datasets by providing the transcriptome of older animals (animals at day 8 of adulthood) and demonstrating the benefits of performing tissue-specific RNAseq instead of whole-animal sequencing.

      The work presented here is of high quality and the authors present convincing evidence supporting their conclusions. I only have a few comments/suggestions:

      (1) Do the genes identified to decrease learning/memory capacity in daf-2 animals (Figure 4d/e) also impact neuronal health? daf-2 mutant worms show delayed onset of age-related changes to neuron structure (Tank et al., 2011, J Neurosci). Does knockdown of the genes shown to affect learning also affect neuron structure during ageing, potentially one mechanism through which they modulate learning/memory?

      (2) The learning and memory assay data presented in this study uses the butanone olfactory learning paradigm, which is well established by the same group. Have the authors tried other learning assays when testing for learning/memory changes after the knockdown of candidate genes? Depending on the expression pattern of these genes, they may have more or less of an effect on olfactory learning versus for example gustatory or mechanosensory-based learning.

      (3) I have a comment on the 'compensatory vs dysregulatory' model as stated by the authors on page 7. I understand that this model presents the two main options, but perhaps this is slightly too simplistic: the gene expression that rises during ageing may be detrimental for memory (= dysregulatory), but at the same time may also be beneficial for other physiological roles in other tissues (=compensatory).

    1. eLife assessment

      This useful study reports data supporting the importance of sterol homeostasis in sperm development and consequently male reproduction. While most of the data are supportive of the conclusion, some remain incomplete and need more experimental verification. This work would be of interest to basic researchers and clinicians working on sterol homeostasis and male fertility.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript by Relovska and colleagues aims to decipher the importance of sterol homeostasis on male reproduction and, in particular, the impact of altered sterol homeostasis in sperm cells. To this end, they are generating a global line of Dhcr24 transgenic mice by mating Dhcr24fl/fl mice (overexpressing the construct in the Rosa269c gene locus) with EIla 100 CRE mice (expressing Cre recombinase in the early mouse embryo).

      The data provided are robust, using a range of approaches from sperm analysis (structure, function) to lipid analysis. Results show that overexpression of DHCR24 (TG) leads to altered sterol homeostasis in spermatozoa. Sperm from TG mice have abnormal mitochondria and sperm tails. TG spermatozoa have reduced efficiency in undergoing the acrosomal reaction. Furthermore, the data suggest that TG spermatozoa have an altered metabolism with increased oxygen consumption. These data highlight that desmosterol depletion and/or altered sterol homeostasis impact sperm morphology, number, motility, and metabolism, resulting in reduced male fertility.

      Strengths:

      The manuscript is clear and well-written, and the results are presented in high-quality figures.

      Weaknesses:

      The main concern is the clear analysis of the rodent model. Indeed, because this particular Cre leads to whole-body overexpression, the remaining question is whether the observed effects are directly mediated by the testicular impacts of Dhcr24 overexpression. Even if it is a testicular effect, we can't conclude where it comes from. Is it at the level of spermatogenesis, as the authors speculate? Furthermore, the authors mention that normally, Dhcr24c is mainly expressed in spermatogonia, so it's not clear why they focus only on spermatozoa, which in WT males do not normally express dhcr24 according to the authors. It is worth clarifying the testicular phenotype in more detail.

      The lower level of TG in aged mice could suggest an extinction of the transgene at least in sperm during aging, which could be difficult to reconcile with the observed phenotype. With this in mind, it would be interesting to define the penetrance of the phenotype during aging, to define variability between mice, and to clearly define potential correlations between sterol levels and fertility disorders, or altered sperm parameters.

    3. Reviewer #2 (Public Review):

      Summary:

      Sterols, including desmosterol and cholesterol, play critical roles in male fertility including membrane rearrangements associated with sperm capacitation, steroidogenesis, and germ cell development. Relovska, Sona, et al. investigated the effects of global overexpression of classic cholesterol biosynthesis enzyme DHCR24 in a mouse model, focusing on the impacts on sperm function and male fertility. While mice were viable and did not exhibit altered plasma cholesterol levels or obesity, the authors demonstrated that concentrations of relevant sterols in sperm from transgenic mice were altered compared to WT mouse sperm, including the expected depletion of desmosterol. The transgenic males exhibited several indicators of reduced sperm function and fertility. Mitochondrial dysfunction was indicated by a noted depletion of localization in the distal middle-piece of up to approximately 20-25% of transgenic sperm flagella, and alterations in mitochondrial membrane potential and oxygen consumption rates in transgenic sperm were noted.

      Strengths:

      The authors demonstrate that DHCR24 overexpression was achieved and that sperm sterol levels are altered. The conclusions that global DHCR24 overexpression impacts mitochondrial localization and male fertility parameters are supported by the number of different supporting assessments utilized to reach these conclusions and this is a strength. Overall, the authors achieve their aim of demonstrating DHCR24 overexpression impacts on indicators of sperm function and fertility including reduced sperm counts and sperm motility, reduced fertility in mating trials with aged males, and reduced IVF success when sperm were capacitated in conditions of higher sperm concentrations in vitro. The authors further investigate sperm mitochondrial localization and function. While a mitochondrial sheath can form in sperm from transgenic mice, 25% of the sperm exhibit a shortened mitochondrial sheath where a distal portion of the middle piece of the sperm flagella lacks mitochondria and instead exhibits exposed outer dense fibers.

      Weaknesses:

      In the current study, the authors conclude that desmosterol may not act as an LXR activator in testicular cells based on assessment of relevant mRNA levels in whole testis that indicated the relevant transcripts were not altered in transgenic testes. However, caution should be taken in utilizing whole testis transcriptomics to rule out a role in specific cell populations within the testis with minor relative representation, such as macrophages or undifferentiated spermatogonia. This is an important distinction for a few reasons. The authors reveal through single-cell assessments of DHCR24 expression in WT testis that it is most highly expressed in undifferentiated spermatogonia. Further, the authors previously reported that DHCR24 over-expression in myeloid/macrophage populations did impact LXR activation impacting atherosclerosis. Taken together with emerging evidence that testis macrophages may impact spermatogonial fate decisions, the potential for DHCR24 to impact these minor testicular cell populations should not yet be ruled out. The significance of individual observations needs to be clarified through improved reporting of methodologies, specific biological and technical replicates, and statistical significance for each individual assessment. The lack of these details obfuscates the ability of the reader to interpret or replicate several reported observations which is a weakness.

      (1) The fertility trials indicate a reduced number of pups/litter in aged but not younger transgenic males. However, the data for the aged males includes three data points of 0 pups, which brings to question if the data points each represent the average pups/litter for individual males or individual litters with multiple litters separately included for fertile males. Clarification could help in interpreting whether litter sizes were reduced, or if litter frequency and/or fertility of individual males was reduced. In the latter case, behavioral infertility would not be excluded from consideration.

      (2) The statistical significance is not clear for altered acrosome reaction data, hyperactivated motility data, waveform analysis, mitochondrial membrane potential, and some of the sperm morphology assessments. In many assessments, the biological and technical replicates assessed need to be clarified.

      (3) Methods utilized for image assessment of waveform analysis and mitochondrial membrane potential are lacking detail sufficient for replication of the assessments or for reader interpretation of how conclusions were reached.
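
      The ambiguity raised in point (1) can be made concrete with a minimal sketch. It assumes mating-trial data are stored as pup counts per litter for each male; the function name, the input format, and the numbers in the usage note are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def litter_summaries(litters_by_male):
    """Summarize mating-trial data in three different ways.

    `litters_by_male` maps a male ID to a list of pup counts, one entry per
    litter he sired; an empty list means the male produced no litters.
    (Hypothetical input format, chosen only for illustration.)
    """
    all_litters = [n for litters in litters_by_male.values() for n in litters]
    per_male_means = [
        mean(litters) if litters else 0.0
        for litters in litters_by_male.values()
    ]
    n_fertile = sum(1 for litters in litters_by_male.values() if litters)
    return {
        # Mean litter size across the litters that were actually born.
        "mean_pups_per_litter": mean(all_litters) if all_litters else 0.0,
        # Mean of each male's average, counting infertile males as 0.
        "mean_of_male_averages": mean(per_male_means),
        # Fraction of males that sired at least one litter.
        "fraction_fertile_males": n_fertile / len(litters_by_male),
    }
```

      With made-up input such as `{"m1": [8, 6], "m2": [], "m3": [7]}`, the litter size among born litters is unchanged while the per-male average drops because one male sired nothing, which is the distinction the reviewer asks the authors to report.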

      Summary of impact:

      Overall, the novel observations in this study are consistent with a role for controlled sterol concentrations being important for male fertility and indicate that this model will be useful to further investigate sterol biosynthesis contributions to testis function including steroidogenesis, spermatogenesis, and sperm function including capacitation.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors have created DHCR24 knockin mice and noted changes in the sperm sterol composition. Concurrently, alterations in the quantity, motility, and function of the sperm in DHCR24 knockin mice were identified.

      Strengths:

      The manuscript offers an intriguing perspective on how disruptions in sperm sterol composition can lead to sperm abnormalities.

      Weaknesses:

      From the current data, several issues remain to be clarified, including the fertility test results, which merit a more detailed presentation to ascertain whether differences stem from individual variability or overall changes. The authors suggest an increase in ROS in the sperm of DHCR24 knockin mice, leading to sperm damage, which also requires further confirmation. Moreover, the quality of some data requires verification or improvement, such as the morphological analysis of testicular sections and the OCR experiments.

    1. eLife assessment

      The study is solid and reports interesting findings on the sensitivity of different cell types to EDCs in vitro. The data are valuable but the manuscript requires more experimental details. Bioinformatic analyses need to be improved and the interpretation of results and conclusions adjusted to the results.

    2. Reviewer #1 (Public Review):

      Summary:

      The goal of this study was to use in vitro cell populations to determine mechanisms that may be important for the propagation of epimutations induced by EDCs in vivo. To do this, authors exposed induced pluripotent stem cells (iPS), somatic cells (Sertoli, granulosa), and primordial germ cell like cells (PGCLCs) to BPS, and conducted epigenomic and transcriptomic analyses on outcomes. The importance of estrogen receptors, and the relationship of epigenomic results to genomic sites expressing EREs, were also determined in the different cell types. Results revealed differential effects of BPS in each cell population on each of these endpoints, and that epimutations were prevalent in enhancer regions with EREs with the exception of PGCLCs (which do not express ERs). The authors speculate that because epimutations also occurred in regions without EREs, especially in PGCLCs, other mechanisms may be in place. Finally, epimutations induced in iPSCs exposed to BPS that were subsequently differentiated into PGCLCs demonstrated that most epimutations were corrected.

      Strengths:

      A strength of this work is the use of different cell types representing somatic cells that would be the major recipient of EDC exposure; pluripotent cells representing preimplantation embryos; and PGCLCs that model the early germline in which epigenetic reprogramming takes place. Work differentiating the iPSCs into PGCLCs with or without BPS exposure at the iPSC level is also very informative as it suggests that most epimutations are corrected, at least in vitro. The paper is well-written and studies were technically well-executed and validated. Results are novel and likely to be of interest to those interested in transgenerational inheritance of environmentally-induced traits, as well as others more broadly interested in epigenetic mechanisms.

      Critique/Weaknesses:

      (1) A problem with in vitro work is that homogeneous cell lines/cultures are, by nature, absent from the rest of the microenvironment. The authors need to discuss this.

      (2) What are n's/replicates for each study? Were the same or different samples used to generate the data for RNA sequencing, methylation beadchip analysis, and EM-seq? This clarification is important because if the same cultures were used, this would allow comparisons and correlations within samples.

      (3) In Figure 1, it is interesting that the 50 uM BPS dose mainly resulted in hypermethylation whereas 100 uM appears to be mainly hypomethylation. (This is based on the subjective appearance of graphs). The authors should discuss and/or present these data more quantitatively. For example, what percentage of changes were hypo/hypermethylation for each treatment? How many DMRs did each dose induce? For the RNA-seq results, again, what were the number of up/down-regulated genes for each dose?

      (4) Also in Figure 1, were there DMRs or genes in common across the doses? How did DMRs relate to gene expression results? This would be informative in verifying or refuting expectations that greater methylation is often associated with decreased gene expression.

      (5) In Figure 2, was there an overlap in the hypo- and/or hyper-methylated DMCs? Please also add more description of the data in 2b to the legend including what the dot sizes/colors mean, etc. Some readers (including me) may not be familiar with this type of data presentation. Some of this comes up in Figure 4, so perhaps allude to this earlier on, or show these data earlier.

      (6) iPSCs were derived from male mice MEFs, and subsequently used to differentiate into PGCLCs. The only cell type from an XX female is the granulosa cells. This might be important, and should be mentioned and its potential significance discussed (briefly).

      (7) EREs are only one type of hormone response element. The authors make the point that other mechanisms of BPS action are independent of canonical endocrine signaling. Would authors please briefly speculate on the possibility that other endocrine pathways including those utilizing AREs or other HREs may play a role? In other words, it may not be endocrine signaling independent. The statement that the differences between PGCLCs and other cells are largely due to the absence of ERs is overly simplistic.

      (8) Interpretation of data from the GO analysis is similarly overly simplistic. The pathways identified and discussed (e.g. PI3K/AKT and ubiquitin-like protease pathways) are involved in numerous functions, both endocrine and non-endocrine. Also, are the data shown in Figure 6a from all 4 cell types? I am confused by the heatmap in 6c: which genes were significantly affected by treatment in which cell types?

      (9) In Figure 7, what were the 138 genes? Any commonalities among them?

      (10) The Introduction is very long. The last paragraph, beginning line 105, is a long summary of results and interpretations that better fit in a Discussion section.

      (11) Provide some details on husbandry: e.g. were they bred on-site? What food was given, and how was water treated? These questions are to get at efforts to minimize exposure to other chemicals.
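
      The quantitative summary requested in point (3) could be produced along the following lines. This is a minimal sketch that assumes each DMR is recorded as a dose label plus a treated-minus-control methylation difference; the function name, the input format, and the example values are hypothetical and do not come from the study.

```python
from collections import Counter


def summarize_dmrs(dmrs):
    """Count hyper- and hypomethylated DMRs per dose and give percentages.

    `dmrs` is a list of (dose_label, methylation_difference) pairs, where the
    difference is treated minus control (a hypothetical input format).
    """
    counts = {}
    for dose, delta in dmrs:
        if delta == 0:
            continue  # no change, so not counted in either direction
        direction = "hyper" if delta > 0 else "hypo"
        counts.setdefault(dose, Counter())[direction] += 1

    summary = {}
    for dose, c in counts.items():
        total = c["hyper"] + c["hypo"]
        summary[dose] = {
            "n_dmrs": total,
            "pct_hyper": 100.0 * c["hyper"] / total,
            "pct_hypo": 100.0 * c["hypo"] / total,
        }
    return summary


# Made-up values purely to show the output shape; not data from the study.
example = [("50 uM", 0.2), ("50 uM", 0.1), ("50 uM", -0.3), ("100 uM", -0.4)]
print(summarize_dmrs(example))
```

      Reporting these three numbers per dose would replace the subjective reading of the graphs, and the same tally applies to up- and down-regulated genes in the RNA-seq results.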

    3. Reviewer #2 (Public Review):

      Summary:

      This manuscript uses cell lines representative of germ line cells, somatic cells, and pluripotent cells to address the question of how the endocrine-disrupting compound BPS affects these various cells with respect to gene expression and DNA methylation. They find a relationship between the presence of estrogen receptor gene expression and the number of DNA methylation and gene expression changes. Notably, PGCLCs do not express estrogen receptors and although they do have fewer changes, changes are nevertheless detected, suggesting a noncanonical pathway for BPS-induced perturbations. Additionally, there was a significant increase in the occurrence of BPS-induced epimutations near EREs in somatic and pluripotent cell types compared to germ cells. Epimutations in the somatic and pluripotent cell types were predominantly in enhancer regions whereas those in the germ cell type were predominantly in gene promoters.

      Strengths:

      The strengths of the paper include the use of various cell types to address the sensitivity of the lineages to BPS as well as the observed relationship between the presence of estrogen receptors and changes in gene expression and DNA methylation.

      Weaknesses:

      The weaknesses include the lack of reporting of replicates, superficial bioinformatic analysis, and the fact that exposures are more complicated in a whole organism than in an isolated cell line.

    1. eLife assessment

      This study concerns how macaque visual cortical area MT represents stimuli composed of more than one speed of motion. The study is valuable because little is known about how the visual pathway segments and preserves information about multiple stimuli, and the study involves perceptual reports from both humans and one monkey regarding whether there are one or two speeds in the stimulus. The study presents compelling evidence that (on average) MT neurons represent the average of the two speeds, with a bias that accentuates the faster of the two speeds. Ultimately, this study raises intriguing questions about how exactly the response patterns in visual cortical area MT might preserve information about each speed, since such information could potentially be lost in an average response as described here, depending on assumptions about how MT activity is evaluated by other visual areas.

    2. Reviewer #1 (Public Review):

      Summary:

      Most studies in sensory neuroscience investigate how individual sensory stimuli are represented in the brain (e.g., the motion or color of a single object). This study starts tackling the more difficult question of how the brain represents multiple stimuli simultaneously and how these representations help to segregate objects from cluttered scenes with overlapping objects.

      Strengths:

      The authors first document the ability of humans to segregate two motion patterns based on differences in speed. Then they show that a monkey's performance is largely similar; thus establishing the monkey as a good model to study the underlying neural representations.

      Careful quantification of the neural responses in the middle temporal area during the simultaneous presentation of fast and slow speeds leads to the surprising finding that, at low average speeds, many neurons respond as if the slowest speed is not present, while they show averaged responses at high speeds. This unexpected complexity of the integration of multiple stimuli is key to the model developed in this paper.

      One experiment in which attention is drawn away from the receptive field supports the claim that this is not due to the involuntary capture of attention by fast speeds.

      A classifier using the neuronal response and trained to distinguish single-speed from bi-speed stimuli shows a similar overall performance and dependence on the mean speed as the monkey. This supports the claim that these neurons may indeed underlie the animal's decision process.

      The authors expand the well-established divisive normalization model to capture the responses to bi-speed stimuli. The incremental modeling (eq 9 and 10) clarifies which aspects of the tuning curves are captured by the parameters.

      Weaknesses:

      While the comparison of the overall pattern of behavioral performance between monkeys and humans is important, some of the detailed comparisons are not well supported by the data. For instance, whether the monkey used the apparent coherence simply wasn't tested and a difference between 4 human subjects and a single monkey subject cannot be tested statistically in a meaningful manner. I recommend removing these observations from the manuscript and leaving it at "The difference between the monkey and human results may be due to species differences or individual variability" (and potentially add that there are differences in the task as well; the monkey received feedback on the correctness of their choice, while the humans did not.)

      A control experiment aims to show that the "fastest speed takes all" behavior is general by presenting two stimuli that move at fast/slow speeds in orthogonal directions. The claim that these responses also show the "fastest speed takes all" is not well supported by the data. In fact, for directions in which the slow speed leads to the largest response on its own, the population response to the bi-speed stimulus is the average of the response to the components. Only for the directions where the fast speed stimulus is the preferred direction is there a bias towards the faster speed (Figure 7A). The quantification of this effect in Figure 7B seems to suggest otherwise, but I suspect that this is driven by the larger amplitude of Rf in Figure 8, and the constraint that ws and wf are constant across directions. The interpretation of this experiment needs to be reconsidered.

    3. Reviewer #2 (Public Review):

      Summary:

      This is a paper about the segmentation of visual stimuli based on speed cues. The experimental stimuli are random dot fields in which each dot moves at one of two velocities. By varying the difference between the two speeds, as well as the mean of the two speeds, the authors estimate the capacity of observers (human and non-human primates) to segment overlapping motion stimuli. Consistent with previous work, perceptual segmentation ability depends on the mean of the two speeds. Recordings from area MT in monkeys show that the neuronal population response to compound stimuli often shows a bias towards the faster-speed stimuli. This bias can be accounted for with a computational model that modulates single-neuron firing rates by the speed preferences of the population. The authors also test the capacity of a linear classifier to produce the psychophysical results from the MT data.

      Strengths:

      Overall, this is a thorough treatment of the question of visual segmentation with speed cues. Previous work has mostly focused on other kinds of cues (direction, disparity, color), so the neurophysiological results are novel. The connection between MT activity and perceptual segmentation is potentially interesting, particularly as it relates to existing hypotheses about population coding.

      Weaknesses:

      Page 10: The relationship between (R-Rs) and (Rf-Rs) is described as "remarkably linear". I don't actually find this surprising, as the same term (Rs) appears on both the x- and y-axes. The R^2 values are a bit misleading for this reason.
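
      The point about the shared Rs term can be illustrated with a small simulation. This is a minimal sketch, assuming that R, Rs, and Rf are independent Gaussian draws with equal variance (an assumption made only for illustration, not a description of the recorded data); the function names are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def shared_term_correlation(n_neurons=20000, seed=0):
    """Correlate (R - Rs) with (Rf - Rs) when R, Rs, Rf are unrelated."""
    rng = random.Random(seed)
    # Hypothetical responses: three independent draws per simulated neuron.
    r = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]
    rs = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]
    rf = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]
    x = [f - s for f, s in zip(rf, rs)]  # x-axis quantity: Rf - Rs
    y = [b - s for b, s in zip(r, rs)]   # y-axis quantity: R - Rs
    return pearson_r(x, y)


print(shared_term_correlation())
```

      Under these assumptions the expected correlation is 0.5, because Cov(R - Rs, Rf - Rs) = Var(Rs) while each difference has variance 2 Var(Rs). A sizable R^2 can therefore arise from the shared term alone, so the reported values would be more informative if compared against such a baseline.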

      Figure 9: I'm confused about the linear classifier section of the paper. The idea makes sense - the goal is to relate the neuronal recordings to the psychophysical data. However the results generally provide a poor quantitative match to the psychophysical data. There is mention of a "different paper" (page 26) involving a separate decoding study, as well as a preprint by Huang et al. (2023) that has better decoding results. But the Huang et al. preprint appears to be identical to the current manuscript, in that neither has a Figure 12, 13, or 14. The text also says (page 26) that the current paper is not really a decoding study, but the linear classifier (Figure 9F) is a decoder, as noted on page 10. It sounds like something got mixed up in the production of two or more papers from the same dataset. In any case, I think that some kind of decoding analysis would really strengthen the current paper by linking the physiology to the psychophysics, but given the limitations of the linear classifier, a more sophisticated approach might be necessary -- see for example Zemel, Dayan, and Pouget, 1998. The authors might also want to check out closely related work by Treue et al. (Nature Neuroscience 2000) and Watamaniuk and Duchon (1992).

      What do we learn from the normalization model? Its formulation is mostly a restatement of the results - that the faster and slower speeds differentially affect the combined response. This hypothesis is stated quantitatively in equation 8, which seems to provide a perfectly adequate account of the data. The normalization model in equation 10 is effectively the same hypothesis, with the mean population response interposed - it's not clear how much the actual tuning curve in Figure 10A even matters, since the main effect of the model is to flatten it out by averaging the functions in Figure 10B. Although the fit to the data is reasonable, the model uses 4 parameters to fit 5 data points and is likely underconstrained; the parameters other than alpha should at least be reported, as it would seem that sigma is actually the most important one. And I think it would help to examine how robust the statistical results are to different assumptions about the normalization pool.

    4. Reviewer #3 (Public Review):

      Summary:

      This study concerns how macaque visual cortical area MT represents stimuli composed of more than one speed of motion.

      Strengths:

      The study is valuable because little is known about how the visual pathway segments and preserves information about multiple stimuli. The study presents compelling evidence that (on average) MT neurons represent the average of the two speeds, with a bias that accentuates the faster of the two speeds. An additional strength of the study is the inclusion of perceptual reports from both humans and one monkey participant performing a task in which they judged whether the stimuli involved one vs two different speeds. Ultimately, this study raises intriguing questions about how exactly the response patterns in visual cortical area MT might preserve information about each speed, since such information could potentially be lost in an average response as described here, depending on assumptions about how MT activity is evaluated by other visual areas.

      Weaknesses:

      My main concern is that the authors are missing an opportunity to make clear that the divisive normalization, while commonly used to describe neural response patterns in visual areas (and which fits the data here), fails on the theoretical front as an explanation for how information about multiple stimuli can be preserved. Thus, there is a bit of a disconnect between the goal of the paper - how does MT represent multiple stimuli? - and the results: mostly averaging responses which, while consistent with divisive normalization, would seem to correspond to the perception of a single intermediate speed. This is in contrast to the psychophysical results which show that subjects can at least distinguish one from two speeds. The paper would be strengthened by grappling with this conundrum in a head-on manner.

    1. eLife assessment

      This work provides a valuable analysis of the effect of two commonly used hyperparameters, noise amplitude and firing rate regularization, on the representations of relevant and irrelevant stimuli in trained recurrent neural networks (RNNs). The results suggest an interesting interpretation of prefrontal cortex (PFC) dynamics, based on comparisons to previously published data from the same lab, in terms of decreasing metabolic cost during learning. The evidence indicating that the mechanisms identified in the RNNs are the same ones operating in PFC was considered incomplete, but could potentially be bolstered by additional analyses and appropriate revisions.

    2. Reviewer #1 (Public Review):

      Summary:

      This study compares experimental data recorded from the PFC of monkeys to the activity of recurrent neural networks trained to perform the same 'task' as the monkeys, namely, to predict the delivery of reward following the presentation of visual stimuli. The visual information varied along 3 dimensions: color, shape, and width. Shape was always relevant for reward prediction, width was always irrelevant, and color was irrelevant at the beginning of the trial but became relevant later on, once it could be assessed together with shape. The neural data showed systematic changes in the representations of these features and of the expected reward as the learning progressed, and the objective of this study was to try to understand what principles could underlie these changes. The simulations and theoretical calculations indicated that the changes in PFC activity (throughout learning and throughout a trial) can be understood as an attempt by the circuitry to use an efficient representational strategy, i.e., one that uses as few spikes as possible, given that the resulting representation should be accurate enough for task performance.

      Strengths:

      - The paper is concise and clearly written.

      - The paper shows that, in a neural circuit, the information that is decodable and the information that is task-relevant may relate in very different ways. Decodable information may be very relevant or very irrelevant. This fact is critical for interpreting the results of pure decoding studies, which often assume an equivalence. This take-home message is not emphasized by the authors, but I think is quite important.

      - The results provide insight as to how neural representations may be transformed as a task is learned, which often results in subtle changes in selectivity and overall activity levels whose impact or reason is not entirely clear just by looking at the data.

      Weaknesses:

      The match between the real PFC and the model networks is highly qualitative, and as noted by the authors, comparisons only make sense in terms of *changes* between early and late learning. The time scales, activity levels, and decoding accuracies involved are all different between the model and recording data. This is not to disregard what the authors have done, but simply to point out an important limitation.

    3. Reviewer #2 (Public Review):

      Summary:

      The study investigates the representation of irrelevant stimuli in neural circuits using neural recordings from the primate prefrontal cortex during a passive object association task. They find a significant decrease in the linear decodability of irrelevant stimuli over the course of learning (in the time window in which the stimuli are irrelevant). They then compare these trends to RNNs trained with varying levels of noise and firing rate regularization and find agreement when these levels are at an intermediate value. In a complementary analysis, they found (in both RNNs and PFC) that the magnitude of relevant and irrelevant stimuli increased and decreased, respectively, during learning. These findings were interpreted in terms of a minimization of metabolic cost in the cortex.

      To understand how stimuli can be dynamically suppressed at times when they are irrelevant, the authors constructed and analyzed a reduced two-neuron model of the task. They found a mechanism in which firing rate regularization increased the probability of negative weights in the input, pushing the neural activities below the threshold. A similar mechanism was observed in RNNs.

      Strengths:

      The article is well-written and the figures are easily understood. The analyses are well explained and motivated. The article provides a valuable analysis of the effect of two parameters on representations of irrelevant stimuli in trained RNNs.

      Weaknesses:

      (1) The mechanism for suppressing dynamically relevant stimuli appears to be incomplete and does not explain clearly enough how representations of 'color' which are suppressed through negative input weights become un-suppressed in the presence of the second variable 'shape'.

      (2) Interpretation of results in terms of the effect of metabolic cost on cortical dynamics is not backed up by the presented data/analyses. The change in dynamics of 'color' representations in the prefrontal cortex only qualitatively matches RNN dynamics and may arise from other causes.

    4. Reviewer #3 (Public Review):

      Summary:

      In order to study the factors and neural dynamics that lead to the suppression of irrelevant information in the brain, the authors trained artificial neural networks in the execution of a task that involved the discrimination of complex stimuli with three main features: color, shape, and width. Specific combinations of color and shape led to a reward, but the temporal structure made color dynamically irrelevant at the beginning of the trial, and then it became relevant once the shape was presented. On the other hand, the width of the stimulus was always irrelevant. Importantly, non-human primates were also trained to execute this task (in a previous study by the authors) and the activity from neural populations from the dorsolateral Prefrontal Cortex (dlPFC) was recorded, allowing a comparison of the coding of information by the artificial neural network model with what happens in biological neural populations.

      The authors changed systematically the amount of noise present in the neural network model, as well as limiting the firing rate of the artificial neurons to simulate the limitations imposed by high metabolic costs in biological neurons. They found that models with medium and low noise, as well as medium and low metabolic cost, developed information encoding patterns that resembled the patterns observed throughout learning in the dlPFC, as follows: early in the learning process, color information was strongly represented during the whole trial, as well as shape and width, whereas the color/shape combination significance (XOR operation) was weakly encoded. Late in learning, color information was initially suppressed (while it was deemed irrelevant) and became more prominent during the shape presentation. Width information coding decreased, and the XOR operation result became more strongly encoded.

      Subthreshold activity dynamics were studied by training artificial networks consisting of 2 neurons, with the aim of understanding how dynamically irrelevant information is suppressed and then encoded more strongly at a different time during the trial. Under medium noise and medium metabolic cost, color information is suppressed by the divergence of the activity away from the level that triggers spikes. The authors claim that this subthreshold dynamic explains the suppression of irrelevant information in biological neural networks.

      Strengths:

      The study leverages the power of computational models to simulate biological networks and do manipulations that are difficult (if not impossible) to perform in vivo. The analyses of the activity of the network model are neat and thorough and provide a clear demonstration of how noise and metabolic costs may affect the information coding in the brain. The mathematical analyses are rigorous and nicely documented.

      Weaknesses:

      The study does not leverage the fact that they have access to the activity of individual neurons both on a neural network model and in neural recordings. The model/brain comparison results are limited to the decodability of different pieces of information during the execution of the task at different stages of learning. It would have been useful if the authors had shown response profiles of individual neurons, both biological and artificial, to strengthen the claim that the activity patterns are similar. Perhaps showing that the firing rates vary in a similar way in the large models (like they do for the 2-neuron model) would have been informative. For instance, it is possible that suppression is not occurring in the dlPFC, but that the PFC receives input with this information already suppressed. If suppression indeed happens in the PFC, response profiles associated with this process may be observed.

      There is no way to say that the 2-neuron models are in any way informative of what happens in brain neurons, or even larger artificial networks since the sources of sensory input, noise, and inhibition will differ between biological and artificial networks. And because the firing patterns are not shown for large networks, it is not clear if some non-coding artificial neurons will become broadly inhibitory but maintain a relatively high firing rate (to mention only one possibility).

    1. eLife assessment

      The authors present valuable empirical and modelling evidence that statistical learning in speech perception may contain processes like segmentation and anticipation. While the evidence for statistical learning effects is solid, the link between the pattern of effects (both empirical and simulated) and the theoretical concepts of segmentation and anticipation would need to be much stronger to exclude other accounts of the data. This work will be of broad interest to researchers working on, or with, statistical learning, and to any researcher interested in the challenges of whether data and modeling can effectively adjudicate between competing theoretical constructs.

    2. Reviewer #1 (Public Review):

      Summary:

      This paper presents two experiments, both of which use a target detection paradigm to investigate the speed of statistical learning. The first experiment is a replication of Batterink, 2017, in which participants are presented with streams of uniform-length, trisyllabic nonsense words and asked to detect a target syllable. The results replicate previous findings, showing that learning (in the form of response time facilitation to later-occurring syllables within a nonsense word) occurs after a single exposure to a word. In the second experiment, participants are presented with streams of variable-length nonsense words (two trisyllabic words and two disyllabic words) and perform the same task. A similar facilitation effect was observed as in Experiment 1. The authors interpret these findings as evidence that target detection requires mechanisms different from segmentation. They present results of a computational model to simulate results from the target detection task and find that an "anticipation mechanism" can produce facilitation effects, without performing segmentation. The authors conclude that the mechanisms involved in the target detection task are different from those involved in the word segmentation task.

      Strengths:

      The paper presents multiple experiments that provide internal replication of a key experimental finding, in which response times are facilitated after a single exposure to an embedded pseudoword. Both experimental data and results from a computational model are presented, providing converging approaches for understanding and interpreting the main results. The data are analyzed very thoroughly using mixed effects models with multiple explanatory factors.

      Weaknesses:

      In my view, the main weaknesses of this study relate to the theoretical interpretation of the results.

      (1) The key conclusion from these findings is that the facilitation effect observed in the target detection paradigm is driven by a different mechanism (or mechanisms) than those involved in word segmentation. The argument here I think is somewhat unclear and weak, for several reasons:

      First, there appears to be some blurring in what exactly is meant by the term "segmentation" with some confusion between segmentation as a concept and segmentation as a paradigm.

      Conceptually, segmentation refers to the segmenting of continuous speech into words. However, this conceptual understanding of segmentation (as a theoretical mechanism) is not necessarily what is directly measured by "traditional" studies of statistical learning, which typically (at least in adults) involve exposure to a continuous speech stream followed by a forced-choice recognition task of words versus recombined foil items (part-words or nonwords). To take the example provided by the authors, a participant presented with the sequence GHIABCDEFABCGHI may endorse ABC as being more familiar than BCG, because ABC is presented more frequently together and the learned association between A and B is stronger than between C and G. However, endorsement of ABC over BCG does not necessarily mean that the participant has "segmented" ABC from the speech stream, just as faster reaction times in responding to syllable C versus A do not necessarily indicate successful segmentation. As the authors argue on page 7, "an encounter to a sequence in which two elements co-occur (say, AB) would theoretically allow the learner to use the predictive relationship during a subsequent encounter (that A predicts B)." By the same logic, encoding the relationship between A and B could also allow for the above-chance endorsement of items that contain AB over items containing a weaker relationship.

      Both recognition performance and facilitation through target detection reflect different outcomes of statistical learning. While they may reflect different aspects of the learning process and/or dissociable forms of memory, they may best be viewed as measures of statistical learning, rather than mechanisms in and of themselves.

      (2) The key manipulation between experiments 1 and 2 is the length of the words in the syllable sequences, with words either constant in length (experiment 1) or mixed in length (experiment 2). The authors show that similar facilitation levels are observed across this manipulation in the current experiments. By contrast, they argue that previous findings have found that performance is impaired for mixed-length conditions compared to fixed-length conditions. Thus, a central aspect of the theoretical interpretation of the results rests on prior evidence suggesting that statistical learning is impaired in mixed-length conditions. However, it is not clear how strong this prior evidence is. There is only one published paper cited by the authors - the paper by Hoch and colleagues - that supports this conclusion in adults (other mentioned studies are all in infants, which use very different measures of learning). Other papers not cited by the authors do suggest that statistical learning can occur to stimuli of mixed lengths (Thiessen et al., 2005, using infant-directed speech; Frank et al., 2010 in adults). I think this theoretical argument would be much stronger if the dissociation between recognition and facilitation through RTs as a function of word length variability was demonstrated within the same experiment and ideally within the same group of participants.

      (3) The authors argue for an "anticipation" mechanism in explaining the facilitation effect observed in the experiments. The term anticipation would generally be understood to imply some kind of active prediction process, related to generating the representation of an upcoming stimulus prior to its occurrence. However, the computational model proposed by the authors (page 24) does not encode anything related to anticipation per se. While it demonstrates facilitation based on prior occurrences of a stimulus, that facilitation does not necessarily depend on active anticipation of the stimulus. It is not clear that it is necessary to invoke the concept of anticipation to explain the results, or indeed that there is any evidence in the current study for anticipation, as opposed to just general facilitation due to associative learning.

      In addition, related to the model, given that only bigrams are stored in the model, could the authors clarify how the model is able to account for the additional facilitation at the 3rd position of a trigram compared to the 2nd position?

      (4) In the discussion of transitional probabilities (page 31), the authors suggest that "a single exposure does provide information about the transitions within the single exposure, and the probability of B given A can indeed be calculated from a single occurrence of AB." Although this may be technically true in that a calculation for a single exposure is possible from this formula, it is not consistent with the conceptual framework for calculating transitional probabilities, as first introduced by Saffran and colleagues. For example, Saffran et al. (1996, Science) describe that "over a corpus of speech there are measurable statistical regularities that distinguish recurring sound sequences that comprise words from the more accidental sound sequences that occur across word boundaries. Within a language, the transitional probability from one sound to the next will generally be highest when the two sounds follow one another within a word, whereas transitional probabilities spanning a word boundary will be relatively low." This makes it clear that the computation of transitional probabilities (i.e., Y | X) is conceptualized to reflect the frequency of XY / frequency of X, over a given language inventory, not just a single pair. Phrased another way, a single exposure to pair AB would not provide a reliable estimate of the raw frequencies with which A and AB occur across a given sample of language.
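
      The corpus-level definition discussed in point (4) can be made concrete with a minimal sketch. The function name `transitional_probabilities` is a hypothetical helper introduced here for illustration; it is not taken from the manuscript or from Saffran et al. It estimates TP(Y | X) as count(XY) / count(X), where X is counted only in positions that have a successor. The sketch is applied once to the stream GHIABCDEFABCGHI from point (1) and once to a single exposure to AB.

```python
from collections import Counter


def transitional_probabilities(stream):
    """Estimate TP(y | x) = count(xy) / count(x) over adjacent elements.

    Assumption: x is counted only where it has a successor, so the final
    element of the stream does not inflate the denominator.
    """
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Stream used as an example in point (1): words GHI, ABC, DEF, ABC, GHI.
tp = transitional_probabilities(list("GHIABCDEFABCGHI"))
within_word = tp[("A", "B")]   # A is always followed by B in this stream
across_words = tp[("C", "G")]  # C is followed by G on one of its two occurrences

# A single exposure to AB: the ratio is computable, but it rests on one observation.
single = transitional_probabilities(list("AB"))
```

      In this toy stream the within-word estimate is higher than the across-boundary estimate, whereas the single-exposure case returns a ratio of 1.0 from a single count. That is the sense in which the calculation is technically possible but not a reliable estimate over a language sample.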

      (5) In experiment 2, the authors argue that there is robust facilitation for trisyllabic and disyllabic words alike. I am not sure about the strength of the evidence for this claim, as it appears that there are some conflicting results relevant to this conclusion. Notably, in the regression model for disyllabic words, the omnibus interaction between word presentation and syllable position did not reach significance (p= 0.089). At face value, this result indicates that there was no significant facilitation for disyllabic words. The additional pairwise comparisons are thus not justified given the lack of omnibus interaction. The finding that there is no significant interaction between word presentation, word position, and word length is taken to support the idea that there is no difference between the two types of words, but could also be due to a lack of power, especially given the p-value (p = 0.010).

      (6) The results plotted in Figure 2 seem to suggest that RTs to the first syllable of a trisyllabic item slow down with additional word presentations, while RTs to the final position speed up. If anything, in this figure, the magnitude of the effect seems to be greater for 1st syllable positions (e.g., the RT difference between presentation 1 and 4 for syllable position 1 seems to be numerically larger than for syllable position 3, Figure 2D). Thus, it was quite surprising to see in the results (p. 16) that RTs for syllable position 1 were not significantly different for presentation 1 vs. the later presentations (but that they were significant for positions 2 and 3 given the same comparison). Is this possibly a power issue? Would there be a significant slowdown to 1st syllables if results from both the exact replication and conceptual replication conditions were combined in the same analysis?

      (7) It is difficult to evaluate the description of the PARSER simulation on page 36. Perhaps this simulation should be introduced earlier in the methods and results rather than in the discussion only.

    3. Reviewer #2 (Public Review):

      Summary:

      This valuable study investigates how statistical learning may facilitate a target detection task and whether the facilitation effect is related to statistical learning of word boundaries. Solid evidence is provided that target detection and word segmentation rely on different statistical learning mechanisms.

      Strengths:

      The study is well designed, using the contrast between the learning of words of uniform length and words of variable length to dissociate general statistical learning effects and effects related to word segmentation.

      Weaknesses:

      The study relies on the contrast between word length effects on target detection and word learning. However, the study only tested the target detection condition and did not attempt to replicate the word segmentation effect. It is true that the word segmentation effect has been replicated before but it is still worth reviewing the effect size of previous studies.

      The paper seems to distinguish prediction, anticipation, and statistical learning, but it is not entirely clear what each term refers to.

    1. eLife assessment

      The authors report valuable findings that temporary pharmacological inhibition targeting the dorsal or intermediate hippocampus in rats disrupted navigation to a goal location in a new virtual place-preference task and that functional inhibition of the intermediate hippocampus is more detrimental than functional inhibition of the dorsal hippocampus. The work has the potential to provide novel insights into function differentiation along the dorsal-ventral axis of the hippocampus. However, the evidence for the paper's claim that the dorsal hippocampus is responsible for accurate spatial navigation and the intermediate hippocampus for place-value associations is currently incomplete.

    2. Reviewer #1 (Public Review):

      Summary:

      The manuscript examines the contribution of the dorsal and intermediate hippocampus to goal-directed navigation in a wide virtual environment where visual cues are provided by the scenery on the periphery of a wide arena. Among a choice of 2 reward zones located near the arena periphery, rats learn to navigate from the center of the arena to the reward zone associated with the highest reward. Navigation performance is largely assessed from the rats' body orientation when they leave the arena center and when they reach the periphery, as well as the angular mismatch between the reward zone and the site rats reach the periphery. Muscimol inactivation of the dorsal and intermediate hippocampus alters rat navigation to the reward zone, but the effect was more pronounced for the inactivation of the intermediate hippocampus, with some rat trajectories ending in the zone associated with the lowest reward. Based on these results, the authors suggest that the intermediate hippocampus is critical, especially for navigating to the highest reward zone.

      Strengths:

      - The authors developed an effective approach to study goal-directed navigation in a virtual environment where visual cues are provided by the peripheral scenery.

      - In general, the text is clearly written and the figures are well-designed and relatively straightforward to interpret, even without reading the legends.

      - An intriguing result, which would deserve to be better investigated and/or discussed, was that rats tended to rotate always in the counterclockwise direction. Could this be because of a hardware bias making it easier to turn left, some aspect of the peripheral landscape, or a natural preference of rats to turn left that is observable (or reported) in a real environment?

      - Another interesting observation, which would also deserve to be addressed in the discussion, is the fact that dHP/iHP inactivations produced to some extent consistent shifts in departing and peripheral crossing directions. This is visible from the distributions in Figures 6 and 7, which still show a peak under muscimol inactivation, but this peak is shifted to earlier angles than the correct ones. Such change is not straightforward to interpret, unlike the shortening of the mean vector length.

      Maybe rats under muscimol could navigate simply by using the association of the reward zone with some visual cues in the peripheral scene, relying on brain areas other than the hippocampus, and therefore stopped their rotation as soon as they saw the cues, a bit before the correct angle. With their hippocampus intact, by contrast, rats could estimate precisely the spatial relationship between the reward zone and visual cues.
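
      The distinction drawn above between a shifted peak and a shortened mean vector can be illustrated with a minimal sketch of standard circular statistics. The function name `circular_summary` and all angle values are hypothetical and chosen only for illustration; they are not data from the manuscript.

```python
import math


def circular_summary(angles_deg):
    """Return (mean direction in degrees, mean vector length R in [0, 1])."""
    xs = [math.cos(math.radians(a)) for a in angles_deg]
    ys = [math.sin(math.radians(a)) for a in angles_deg]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    mean_direction = math.degrees(math.atan2(my, mx)) % 360
    return mean_direction, math.hypot(mx, my)


# Hypothetical departure directions (degrees); goal assumed at 90.
control = [85, 90, 95]     # tight cluster on the goal
shifted = [65, 70, 75]     # equally tight, but the peak sits 20 degrees early
dispersed = [30, 90, 150]  # centred on the goal, but widely scattered

control_dir, control_r = circular_summary(control)
shifted_dir, shifted_r = circular_summary(shifted)
dispersed_dir, dispersed_r = circular_summary(dispersed)
```

      In this toy example the shifted set keeps the same mean vector length as the control set while its mean direction moves, and the dispersed set keeps the mean direction while its mean vector length drops. Reporting both quantities would therefore separate a systematic angular bias from a loss of directional consistency.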

      Weaknesses:

      - I am not sure that the differential role of dHP and iHP for navigation to high/low reward locations is supported by the data. The current results could be compatible with iHP inactivation producing a stronger impairment on spatial orientation than dHP inactivation, generating more erratic trajectories that crossed by chance the second reward zone.

      To make the point that iHP inactivation affects the disambiguation of high and low reward locations, the authors should show that the fraction of trajectories aiming at the low reward zone is higher than expected by chance. Somehow we would expect to see a significant peak pointing toward the low reward zone in the distribution of Figures 6-7.
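
      One way such a comparison against chance could be set up is sketched below as a one-sided exact binomial test. Everything numerical here is a hypothetical placeholder: the trial counts are invented, and the chance level assumes that an undirected trajectory reaches any point on the perimeter with equal probability, so that chance equals the angular fraction of the perimeter occupied by the low-reward zone. Trials from the same rat are also unlikely to be independent, so a real analysis would probably need a per-animal or mixed-effects approach.

```python
from math import comb


def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0); one-sided exact test against chance."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers, for illustration only.
n_trials = 40          # trials under iHP inactivation
n_low_zone = 9         # trajectories ending in the low-reward zone
chance = 30 / 360      # assumed: the zone spans 30 degrees of the perimeter

p_value = binomial_upper_tail(n_low_zone, n_trials, chance)
```

      A small `p_value` under these assumptions would indicate more low-reward visits than undirected trajectories would produce, which is the pattern the value-disambiguation account predicts.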

    3. Reviewer #2 (Public Review):

      Summary:

      The aim of this paper was to elucidate the role of the dorsal HP and intermediate HP (dHP and iHP) in value-based spatial navigation through behavioral and pharmacological experiments using a newly developed VR apparatus. The authors inactivated dHP and iHP by muscimol injection and analyzed the differences in behavior. The results showed that dHP was important for spatial navigation, while iHP was critical for both value judgments and spatial navigation. The present study developed a new sophisticated behavioral experimental apparatus and proposed a behavioral paradigm that is useful for studying value-dependent spatial navigation. In addition, the present study provides important results that support previous findings of differential function along the dorsoventral axis of the hippocampus.

      Strengths:

      The authors developed a VR-based value-based spatial navigation task that allowed separate evaluation of "high-value target selection" and "spatial navigation to the target." They were also able to quantify behavioral parameters, allowing detailed analysis of the rats' behavioral patterns before and after learning or pharmacological inactivation.

      Weaknesses:

      Although differences in function along the dorsoventral axis of the hippocampus are an important topic that has received considerable attention, differences in value coding have already been shown in previous studies, including the authors' own work. The present paper is an important study that supports those previous studies, but the novelty of the findings is limited, as the results come from pharmacological and behavioral experiments only.

    4. Reviewer #3 (Public Review):

      Summary:

      The authors established a new virtual reality place preference task. On the task, rats, which were body-restrained on top of a moveable Styrofoam ball and could move through a circular virtual environment by moving the Styrofoam ball, learned to navigate reliably to a high-reward location over a low-reward location, using allocentric visual cues arranged around the virtual environment.

      The authors also showed that functional inhibition by bilateral microinfusion of the GABA-A receptor agonist muscimol, which targeted the dorsal or intermediate hippocampus, disrupted task performance. The impact of functional inhibition targeting the intermediate hippocampus was more pronounced than that of functional inhibition targeting the dorsal hippocampus.

      Moreover, the authors demonstrated that the same manipulations did not significantly disrupt rats' performance on a virtual reality task that required them to navigate to a spherical landmark to obtain reward, although there were numerical impairments in the main performance measure and the absence of statistically significant impairments may partly reflect a small sample size (see comments below).

      Overall, the study established a new virtual-reality place preference task for rats and established that performance on this task requires the dorsal to intermediate hippocampus. They also established that task performance is more sensitive to the same muscimol infusion (presumably - doses and volumes used were not clearly defined in the manuscript, see comments below) when the infusion was applied to the intermediate hippocampus, compared to the dorsal hippocampus, although this does not offer strong support for the authors' claim that dorsal hippocampus is responsible for accurate spatial navigation and intermediate hippocampus for place-value associations (see comments below).

      Strengths:

      (1) The authors established a new place preference task for body-restrained rats in a virtual environment and, using temporary pharmacological inhibition by intra-cerebral microinfusion of the GABA-A receptor agonist muscimol, showed that task performance requires dorsal to intermediate hippocampus.

      (2) These findings extend our knowledge about place learning tasks that require dorsal to intermediate hippocampus and add to previous evidence that, for some place memory tasks, the intermediate hippocampus may be more important than other parts of the hippocampus, including the dorsal hippocampus, for goal-directed navigation based on allocentric place memory.

      (3) The hippocampus-dependent task may be useful for future recording studies examining how hippocampal neurons support behavioral performance based on place information.

      Weaknesses:

      (1) The new findings do not strongly support the authors' suggestion that the dorsal hippocampus is responsible for accurate spatial navigation and the intermediate hippocampus for place-value associations.

      The authors base this claim on the differential effects of the dorsal and intermediate hippocampal muscimol infusions on different performance measures. More specifically, dorsal hippocampal muscimol infusion significantly increased perimeter crossings and perimeter crossing deviations, whereas dorsal infusion did not significantly change other measures of task performance, including departure direction and visits to the high-value location. However, these statistical outcomes offer only limited evidence that dorsal hippocampal infusion specifically affected the perimeter crossing, without affecting the other measures. Numerically the pattern of infusion effects is quite similar across these various measures: intermediate hippocampal infusions markedly impaired these performance measures compared to vehicle infusions, and the values of these measures after dorsal hippocampal muscimol infusion were between the values in the intermediate hippocampal muscimol and the vehicle condition (Figures 5-7). Moreover, I am not so sure that the perimeter crossing measures really reflect distinct aspects of navigational performance compared to departure direction and hit rate, and, even if they did, which aspects this would be. For example, in line 316, the authors suggest that 'departure direction and PCD [perimeter crossing deviation] [are] indices of the effectiveness and accuracy of navigation, respectively'. However, what do the authors mean by 'effectiveness' and 'accuracy'? Accuracy typically refers to whether or not the navigation is 'correct', i.e. how much it deviates from the goal location, which would be indexed by all performance measures.

      So, overall, I would recommend toning down the claim that the findings suggest that the dorsal hippocampus is responsible for accurate spatial navigation and the intermediate hippocampus for place-value associations.

      (2) The claim that the different effects of intermediate and dorsal hippocampal muscimol infusions reflect different functions of intermediate and dorsal hippocampus rests on the assumption that both manipulations inhibit similar volumes of hippocampal tissue to a similar extent, but at different levels along the dorso-ventral axis of the hippocampus. However, this is not a foregone conclusion (e.g., drug spread may differ depending on the infusion site or drug effects may differ due to differential expression of GABA-A receptors in the dorsal and intermediate hippocampus), and the authors do not provide direct evidence for this assumption. Therefore, a possible alternative account of the weaker effects of dorsal compared to intermediate hippocampal muscimol infusions on place-preference performance is that the dorsal infusions affect less hippocampal volume or less markedly inhibit neurons within the affected volume than the intermediate infusions. I would recommend that the authors briefly consider this issue in the discussion. Moreover, from the Methods, it is not clear which infusion volume and muscimol concentration were used for the different infusions (see below, 4.a.), and this must be clarified.

      (3) It is good that the authors included a comparison/control study using a spherical beacon-guided navigation task, to examine the specific psychological mechanisms disrupted by the hippocampal manipulations. However, as outlined below (4.b.), the sample size for the comparison study was lower than for the main study, and the data in Figure 8 suggest that the comparison task may be affected by the hippocampal manipulations similarly to the place-preference task, albeit less markedly. This would raise the question as to which mechanisms that are common to the two tasks may be affected by hippocampal functional inhibition, which should be considered in the discussion.

      (4) Several important methodological details require clarification:

      a. Drug infusions (from line 673):

      - '0.3 to 0.5 μl of either phosphate-buffered saline (PBS) or muscimol (MUS) was infused into each hemisphere'; the authors need to clarify which infusion volume was used when, and why different infusion volumes were used.

      - I could not find the concentration of the muscimol solution that was used. The authors must clarify this and should also include a justification of the doses used, e.g. based on previous studies.

      - Please also clarify whether the injectors and dummies were flush with the guides or by what distance they protruded from the guides.

      b. Sample sizes: The authors should include sample size justifications, e.g. based on considerations of statistical power, previous studies, practical considerations, or a combination of these factors. Importantly, the smaller sample size in the control study using the spherical beacon-guided navigation task (n=5 rats) limits comparability with the main study using the place-preference task (n=8). Numerically, the findings on the control task (Figure 8) look quite similar to the findings on the place-preference task, with intermediate hippocampal muscimol infusions causing the most pronounced impairment and dorsal hippocampal muscimol infusions causing a weaker impairment. These effects may have reached statistical significance if the same sample size had been used as in the place-preference study.

      c. Statistical analyses: Why were the data of the intermediate and dorsal hippocampal PBS infusion conditions averaged for some of the analyses (Figure 5; Figure 6B and C; Figure 7B and C; Figure 8B) but not for others (Figure 6A and Figure 7A)?

    1. eLife assessment

      In their study, Diana et al. introduce a novel method for spike inference from calcium imaging data using a Monte Carlo-based approach, emphasizing the quantification of uncertainties in spike time estimates through a Bayesian framework. This method employs particle Gibbs sampling for estimating model parameter probabilities, offering accuracy comparable to existing methods with the added benefit of directly assessing uncertainties. Although the paper provides a solid methodological explanation, it lacks a thorough comparison with other inference methods. Nevertheless, it presents a valuable advancement for neuroscientists interested in new approaches for parameter estimation from calcium imaging data.

    2. Reviewer #1 (Public Review):

      Summary:

      In this study, Diana et al. present a Monte Carlo-based method to perform spike inference from calcium imaging data. A particular strength of their approach is that they can estimate not only averages but also uncertainties of the modeled process. The authors then focus on the quantification of spike time uncertainties in simulated data and in data recorded with a high sampling rate in cerebellar slices with GCaMP8f.

      Strengths:

      - The authors provide a solid groundwork for sequential Monte Carlo-based spike inference, which extends previous work of Pnevmatikakis et al., Greenberg et al., and others.

      - The integration of two states (silence vs. burst firing) seems to improve the performance of the model.

      - The acquisition of a GCaMP8f dataset in the cerebellum is useful and helps make the point that high spike time inference precision is possible under certain conditions.

      Weaknesses:

      - The algorithm is designed to predict single spike times. Currently, it is not benchmarked against other algorithms in terms of single spike precision and spike time errors. A benchmarking with the most recent other SMC model and another good model focused on single spike outputs (e.g., MLSpike) would be useful to have.

      - Some of the analyses and benchmarks seem too cursory, and the reporting simply consists of a visual impression of results instead of proper analysis and quantification. For example, the authors write "The spike patterns obtained using our method are very similar across trials, showing that PGBAR can reliably detect single-trial action potential-evoked GCaMP8f fluorescence transients." This is a highly qualitative statement, just based on the (subjective) visual impression of a plot. Similarly, the authors write "we could reliably identify the two spikes in each trial", but this claim is not supported by quantification or a figure, as far as I can see. The authors write "but the trade-off between temporal accuracy, SNR and sampling frequency must be considered", but they don't discuss these trade-offs systematically.

      - It has been shown several times from experimental data that spike inference with single spike resolution does not work well (Huang et al. eLife, 2021; Rupprecht et al., Nature Neuroscience, 2021) in general. This limitation should be discussed with respect to the applicability of the proposed algorithm for standard population calcium imaging data.

      - Several analyses are based on artificial, simulated data with simplifying assumptions. Ever since Theis et al., Neuron, 2016, it has been known that artificially generated ground truth data should not be used as the primary means to evaluate spike inference algorithms. It would have been informative if the authors had used either the CASCADE dataset or their cerebellum dataset for more detailed analyses, in particular of single spike time precision.

      - In its current state, the sum of the current weaknesses makes the suggested method, while interesting for experts, rather unattractive for experimentalists who want to perform spike inference on their recorded calcium imaging data.

      Other comments:

      - One of the key features of the SMC model is the assumption of two states (bursting vs. non-bursting). However, while it seems clear that this approach is helpful, it is not clear whether this idea comes from an observation of the data or from another concept.

      - Another SMC algorithm (Greenberg et al., 2018) stated that the fitted parameters showed some degeneracy, resulting in ambiguous fitting parameters. It would be good to know if this problem was avoided by the authors.

    3. Reviewer #2 (Public Review):

      Summary:

      Methods to infer action potentials from fluorescence-based measurements of intracellular calcium dynamics are important for optical measurements of activity across large populations of neurons. The variety of existing methods can be separated into two broad classes: a) model-independent approaches that are trained on ground truth datasets (e.g., deep networks), and b) approaches based on a model of the processes that link action potentials to calcium signals. Models usually contain parameters describing biophysical variables, such as rate constants of the calcium dynamics and features of the calcium indicator. The method presented here, PGBAR, is model-based and uses a Bayesian approach. A novelty of PGBAR is that static parameters and state variables are jointly estimated using particle Gibbs sampling, a sequential Monte Carlo technique that can efficiently sample the latent embedding space.

      Strengths:

      A main strength of PGBAR is that it provides probability distributions rather than point estimates of spike times. This is different from most other methods and may be an important feature in cases when estimates of uncertainty are desired. Another important feature of PGBAR is that it estimates not only the state variable representing spiking activity but also other variables such as baseline fluctuations and stationary model variables, in a joint process. PGBAR can therefore provide more information than various other methods. The information in the GitHub repository is well-organized.

      Weaknesses:

      On the other hand, the accuracy of spike train reconstructions is not higher than that of other model-based approaches, and clearly lower than the accuracy of a model-independent approach based on a deep network. The authors demonstrate convincingly that PGBAR can resolve inter-spike intervals in the range of 5 ms using fluorescence data obtained with a very fast genetically encoded calcium indicator at very high sampling rates (line scans at >= 1 kHz). It would be interesting to more systematically compare the performance of PGBAR to other methods in this regime of high temporal resolution, which has not been explored much.

    1. Author Response

      We would like to thank the three reviewers and the eLife editors for their careful analysis of our work, and for their constructive feedback and positive evaluation. We are especially pleased to see echoed in the reviews and in the editorial assessment that our results underline the importance of taking into account glycosylation in viral evolution, immune surveillance, and in the interpretation of complex epistatic interactions. With this provisional response we would like to communicate to the editors, reviewers and to the eLife readership our intention to integrate in the paper a detailed description of the GM1os and GM2os binding site on the RBD with details on the computational approach we used. We agree that this addition will strengthen the work by making it more self-contained. Also, as suggested by the editorial team, we will provide a comprehensive discussion of published data, as a firmer foundation for our findings.

    2. eLife assessment

      This study presents a valuable finding on the structural role of glycosylation at position N343 of the SARS-CoV-2 spike protein's receptor-binding domain in maintaining its stability, with implications across different variants of concern. The evidence supporting the claims of the authors is solid, although a more complete discussion of published data would have strengthened the study by providing a foundation for the new findings. The work will be of interest to evolutionary virologists.

    3. Reviewer #1 (Public Review):

      Summary:

      The authors seek to elucidate the structural role of N-glycosylation at the N343 position of the SARS-CoV-2 Spike protein's Receptor Binding Domain (RBD) and its evolution across different variants of concern (VoCs). Specifically, they aim to understand the impact of this glycosylation on the RBD's stability and function, which could have implications for the virus's infectivity and, eventually, the effectiveness of vaccines.

      Strengths:

      The major strength of the study stems from the molecular-level picture emerging from the use of over 45 μs of cumulative molecular dynamics (MD) simulations, including both conventional and enhanced sampling schemes, which provide detailed insights into the structural role of N343 glycosylation. The combination of these simulations with experimental assays, such as electrospray ionization mass spectrometry (ESI-MS) for affinity measurements, bolsters the reliability of the findings. At the same time, one potential weakness is the inherent limitation of the current computational models to fully capture the complexities of in vivo systems. While the authors acknowledge the difficulty in completely gauging the N343 glycosylation's impact on RBD folding due to the dynamic nature of glycan structures, their computational/experimental approach lends support to their claims.

      Weaknesses:

      One potential weakness is the inherent limitation of computational models to fully capture the complexities of in vivo systems. While the authors acknowledge the difficulty in completely gauging the N343 glycosylation's impact on RBD folding due to the dynamic nature of glycan structures, their multi-faceted approach lends solid support to their claims.

      Other Comments:

      The study shows that N343 glycosylation plays a structural role in stabilizing the RBD across various SARS-CoV-2 strains. The removal of this glycan led to conformational changes that could affect the virus's infectivity. The results correlate with a reported reduction in viral infectivity upon deletion of glycosylation sites, supporting the authors' conclusion that N343 glycosylation is functionally essential for viral infection.

      By providing molecular insights into the spike protein's architectural changes, the work could influence the design of more effective vaccines and therapeutic agents. The data and methods used could serve as a valuable resource for researchers looking into viral evolution, protein-glycan interactions, and the development of glycan-based interventions.

    4. Reviewer #2 (Public Review):

      The authors sought to establish the role played by N343 glycosylation on the SARS-CoV-2 S receptor binding domain structure and binding affinity to the human host receptor ACE2 across several variants of concern. The work includes both computational analysis in the form of molecular dynamics simulations and experimental binding assays between the RBD and ganglioside receptors.

      The work extensively samples the conformational space of the RBD beginning with atomic coordinates representing both the bound and unbound states and computes molecular dynamics trajectories until equilibrium is achieved with and without removing N343 glycosylation. Through comparison of these simulated structures, the authors are able to demonstrate that N343 glycosylation stabilizes the RBD. Prior work had demonstrated that glycosylation at this site plays an important role in shielding the RBD core and in this work, the authors demonstrate that removal of this glycan can trigger a conformational change to reduce water access to the core without it. This response is variant-dependent and variants containing interface substitutions that increase RBD stability, including Delta substitution L452R, do not experience the same conformational change when the glycan is removed. The authors also explore structures corresponding to Alpha and Beta in which no structure-reinforcing substitutions were identified and two Omicron variants in which other substitutions with an analogous effect to L452R are present.

      The authors experimentally assessed these inferred structural changes by measuring the binding affinity of the RBD for the oligosaccharides of the mono-sialylated gangliosides GM1os and GM2os with and without the glycan at N343. While GM1os and GM2os binding is influenced by additional factors in the Beta and Omicron variants, the comparison between Delta and Wuhan-hu-1 is clear: removal of the glycan abrogated binding for Wuhan-hu-1 and minimally affected Delta as predicted by structural simulations.

      In summary, these findings suggest, in the words of the authors, that SARS-CoV-2 has evolved to render the N-glycosylation site at N343 "structurally dispensable". This study emphasizes how glycosylation impacts both viral immune evasion and structural stability which may in turn impact receptor binding affinity and infectivity. Mutations that stabilize the antigen may relax the structural constraints on glycosylation opening up avenues for subsequent mutations that remove glycans and improve immune evasion. This interplay between immune evasion and receptor stability may support complex epistatic interactions which may in turn substantially expand the predicted mutational repertoire of the virus relative to expectations that do not take into account glycosylation.

    5. Reviewer #3 (Public Review):

      Summary:

      The receptor binding domain of the SARS-CoV-2 spike protein contains two N-glycans, which have been conserved across the variants observed over the last four years. Through the use of extensive molecular dynamics, the authors demonstrate that even if glycosylation is conserved, the stabilizing role of the glycan at N343 differs among the strains. They also investigate the effect of this glycosylation on the binding of the RBD to sialylated gangliosides, as a function of evolution.

      Strengths:

      The molecular dynamics characterization is well performed and demonstrates differences in the effect of glycosylation as a function of evolution. The binding of different strains to human gangliosides shows variations of strong interest. Analyzing the structure-function relationship of glycans on the SARS-CoV-2 surface as a function of evolution is important for the surveillance of novel variants, since it can influence their virulence.

      Weaknesses:

      The article is difficult to read, with insufficient effort at clarification for non-glycobiology audiences. The presentation of previous knowledge about RBD glycosylation and its effect on structure is very difficult to follow and should be reorganized. The choice of the biantennary glycan at N343 is not justified. A major weakness is the absence of data supporting the proposed binding site for the gangliosides.

    1. Author Response

      We are deeply grateful for the highly professional analysis of our work by the Journal Editor and Reviewers. Here is our provisional response to some of the reviewer comments. In our response, we would like to address two comments that were common to all Reviewers' responses. We will thoroughly address all of the Reviewers' comments in the final version of the paper.

      Incomplete analysis of maturational changes of striato-nigral connections.

      In the initial study, we showed that chronic inhibition of striosomal neurons with the DREADD approach during early postnatal development leads to decreased functional innervation of dopaminergic cells by striosomes in adulthood. We showed this using two approaches: (1) analysis of miniature inhibitory post-synaptic currents (mIPSCs) and (2) analysis of GFP and gephyrin puncta densities around dopaminergic cells. The results from these experiments strongly suggest a decrease in inhibitory drive to dopaminergic neurons of the substantia nigra pars compacta, yet we agree that only GFP puncta density can be considered direct evidence for weakened striatonigral connections. Reviewers indicated that additional direct measurements of striatonigral synaptic efficacy would be needed to strengthen our conclusions. We completely agree with this statement and will evaluate the possibility of doing the suggested experiments, using optogenetic stimulation of striosomal inputs to dopaminergic neurons.

      Inconsistent description of Ca2+ imaging experiments.

      Unfortunately, there was a general misunderstanding in interpreting the Ca2+ imaging methods description. In all our experiments, baseline Ca2+ oscillations and oscillations in the presence of a drug were recorded in the usual ACSF (containing 3 mM KCl) in the patch-clamp setup chamber, so conditions were exactly the same as for cell-attached and whole-cell recordings. At the end of each experiment, ACSF containing 8 mM KCl was applied. This high-KCl condition was used to count the total number of viable cells reacting to elevated potassium concentrations, and this number was taken as 100%. Therefore, the percentages displayed in the paper represent the actively oscillating cells in standard ACSF (3 mM KCl), counted as a percentage of the total number of cells that responded to the subsequent high-potassium stimulation (8 mM KCl). The formula was: (number of active cells in 3 mM KCl / number of viable cells active at 8 mM KCl) * 100.
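
      The normalization described above can be written as a short sketch. The function name and the cell counts below are hypothetical and chosen only for illustration; they are not taken from the manuscript.

```python
def percent_active(n_active_3mm: int, n_responsive_8mm: int) -> float:
    """Percentage of cells oscillating in standard ACSF (3 mM KCl),
    relative to all viable cells that respond to 8 mM KCl."""
    if n_responsive_8mm <= 0:
        raise ValueError("at least one cell must respond to 8 mM KCl")
    if not 0 <= n_active_3mm <= n_responsive_8mm:
        raise ValueError("active cells must be a subset of viable cells")
    return 100.0 * n_active_3mm / n_responsive_8mm


# Hypothetical counts for one slice (illustration only).
print(percent_active(12, 48))  # 25.0
```

      Under this definition the denominator is fixed per slice by the final high-KCl step, so percentages from different drug conditions in the same slice share a common reference.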

    2. eLife assessment

      This valuable study describes early postnatal compartmental differences in the functional maturation of striatal projection neurons. It explores how the postnatal activity of these neurons may determine the GABAergic innervation of dopaminergic neurons in the adult substantia nigra pars compacta. While the functional characterization of striatal neuron development is solid, analysis of how early postnatal activity of striatal projection neurons shapes their functional innervation of dopaminergic neurons is incomplete.

    3. Reviewer #1 (Public Review):

      Summary:

      This study offers a comprehensive examination of the early postnatal development of the patch and matrix compartments within the striatum. These are segregated circuits within the striatum with distinct embryonic origins and functional roles in mature brain physiology. Despite the recognized significance of these circuits, a comprehensive understanding of their postnatal maturation remains elusive.

      Strengths:

      The authors undertake a thorough investigation, characterizing the intrinsic properties of direct pathway spiny projection neurons (dSPNs) and indirect pathway spiny projection neurons (iSPNs) across both matrix and striosome compartments throughout development. The authors identify the regulatory role of M1 receptors in modulating spontaneous activity in SPNs, and demonstrate the impact of chemogenetic inhibition of MOR-positive neurons during development on GABAergic synapses in substantia nigra pars compacta (SNc) dopamine (DA) neurons. These findings significantly advance our understanding of striatal development and function.

      Weaknesses:

      Certain methodological considerations warrant attention. Notably, the reliance on TdTomato expression for the identification of striosomes raises concerns, particularly regarding the substantial difference in slice thickness between the immunohistochemistry (IHC) images (50 μm) shown in Figure 2 and those utilized for whole-cell recordings (300 μm).

      Further clarification is needed on how patches could be identified under electrophysiology rig conditions. Images acquired with a widefield microscope rather than a confocal would strengthen the reliability of this methodology.

      In the Ca2+ imaging experiments of Figure 2, striosomes were defined as the regions of brighter GCaMP fluorescence. This presents a potential limitation because it presupposes higher activity levels within patch cells, which is what the experiment is designed to test. Based on this criterion, neurons in these regions will necessarily show more activity than neurons in other regions.

      There is also no information on how Ca2+ imaging traces were analyzed. In the examples provided, putative matrix neurons seem to exhibit different Ca2+ dynamics compared to striosome neurons. The plateau responses might reflect even higher activity than the transient signals observed in striosome neurons. It'll be important to know how the data was quantified. For example, calculations of F0 based on rolling functions tend to underestimate dF/F in traces like this. Calculations of the area under the curve can also provide valuable information in these cases.
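
      A minimal sketch of this concern, assuming a synthetic plateau-like trace and an arbitrary window length and percentile (none of these values come from the manuscript, and the helper names are illustrative): a rolling-percentile F0 follows a long plateau upward and flattens dF/F, whereas a fixed pre-event F0 preserves it, and the area under the curve makes the difference explicit.

```python
from statistics import median


def rolling_percentile_baseline(trace, window, pct=10.0):
    """F0 at each sample: the pct-th percentile of a centered window."""
    half = window // 2
    f0 = []
    for i in range(len(trace)):
        chunk = sorted(trace[max(0, i - half): i + half + 1])
        k = int(round(pct / 100.0 * (len(chunk) - 1)))
        f0.append(chunk[k])
    return f0


def dff(trace, f0):
    """dF/F = (F - F0) / F0, sample by sample."""
    return [(f - b) / b for f, b in zip(trace, f0)]


def auc(values, dt=1.0):
    """Area under the curve by the trapezoidal rule."""
    return sum((a + b) / 2.0 * dt for a, b in zip(values, values[1:]))


# Synthetic plateau response: 20 baseline samples, 60 plateau samples, 20 baseline.
trace = [100.0] * 20 + [200.0] * 60 + [100.0] * 20

# Fixed F0 from the pre-event period versus a rolling F0 (21-sample window).
f0_fixed = [median(trace[:20])] * len(trace)
f0_rolling = rolling_percentile_baseline(trace, window=21)

dff_fixed = dff(trace, f0_fixed)
dff_rolling = dff(trace, f0_rolling)

print(dff_fixed[50], dff_rolling[50])  # mid-plateau dF/F under each baseline
print(auc(dff_fixed), auc(dff_rolling))
```

      In this toy case the rolling baseline reports no signal in the middle of the plateau, which is why the choice of F0 estimator and window length needs to be stated in the Methods.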

      There is no description of the 8 mM KCl treatment in the methods. Was this only used for the Ca2+ imaging experiments? The percentage of active cells in Figures 2C-D is similar to or lower than that described in Figure 2B, which is confusing. Were recordings always performed in 8 mM KCl?

      Lastly, while the findings of Figure 6 suggest a deficit in striosomal inputs to SNc DA neurons, they do not conclusively demonstrate this point (DA neurons receive many sources of inhibition, and local interneurons in SNc are highly plastic). Given the availability of Oprm1-Cre mice and the utilization of multiple viruses in Figure 6 experiments, the inclusion of experiments employing ChR2 to directly assess striatal/striosome inputs would substantially strengthen this claim. This is the main claim stated in the manuscript title, so it is important to provide evidence of specific striatonigral deficits.

    4. Reviewer #2 (Public Review):

      Summary:

      The manuscript by Kokinovic et al. presents evidence that a significant portion of striatal projection neurons (SPNs) are spontaneously active early in development. This spontaneous activity (as measured in ex vivo brain slices) is due to intrinsic mechanisms, and subsides over the course of the first few postnatal weeks in a cell-type specific way: striosome direct and indirect pathway SPNs (dSPNs and iSPNs, respectively) remain spontaneously active until postnatal days 10-14, by which time matrix dSPNs and iSPNs have become entirely silent. The authors suggest that this early spontaneous activity may be in part due to M1 muscarinic receptor signaling. Through chemogenetic inhibition of striosome SPNs (of which dSPNs target dopaminergic neurons of the SNc), the authors present evidence that critical postnatal windows of SPN activity shape the strength of GABAergic innervation of the SNc (measured in adults). This study provides a useful and solid characterization of the functional, postnatal compartmental development of the striatum. However, some weaknesses in the experimental design should be addressed before definitively concluding that postnatal striosome SPN activity determines its functional innervation of dopaminergic SNc neurons.

      Specific Comments:

      (1) While certainly interesting and possibly true, evidence for the necessity of early striosome dSPN activity in shaping their functional innervation of dopaminergic SNc neurons is not entirely convincing. The functional measure of GABAergic innervation of dopamine neurons is inferred from mIPSCs. As the authors state, dopaminergic neurons have numerous other sources of GABAergic inputs in addition to striosome dSPNs. So while manipulating striosome activity may ultimately alter the overall GABAergic innervation of SNc dopamine neurons, the specificity of this to striosome dSPN inputs is not known. Optogenetic stimulation of striosome->SNc neurons after chemogenetic silencing would help support the authors' interpretation. Related to this point, while striatonigral projections form embryonically, is there evidence that striosome->SNc synapses are indeed functional by P6-14 when CNO is delivered?

      (2) One big caveat that needs to be addressed is that all measures of early postnatal spontaneous SPN activity were performed in ex vivo slices. Are SPNs active (in pathway/compartmental specific ways) in vivo during this time? If it is unknown, is there other evidence (e.g. immediate early gene expression, etc...) that may suggest this is indeed the case in vivo?

      (3) It appears that 8 mM KCl (external) was only used while measuring spontaneous calcium oscillations, not spontaneous spiking (Figure 2). Was there any evidence of spontaneous calcium activity in the lower KCl concentration (3 mM?) used for cell-attached recordings? One caveat is that experiments demonstrating that SPNs fire spontaneously in the presence of AMPA receptor blockers (Figure S1) were presumably performed in 3 mM KCl. Does elevated KCl increase spontaneous EPSPs during the ages examined? If so, are the calcium oscillations shown in Figure 2 synaptically driven or intrinsically generated? Somewhat related, speculation on why M1 receptor blockade reduces calcium oscillations but not spontaneous spikes in striosome dSPNs would be useful.

      (4) Several statements in the introduction could use references.

    5. Reviewer #3 (Public Review):

      Summary:

      Kokinovic et al. present an interesting paper that addresses an important gap in knowledge about the differences in the development of direct and indirect pathway striatal neurons in the striosome and matrix compartments. The division of the striatum into four distinct populations (striosome-dSPNs, striosome-iSPNs, matrix-dSPNs, and matrix-iSPNs) is important, but rarely done. This study records all four populations across early development and shows differences in action potential characteristics and intrinsic properties. The authors also suppress striosome activity during postnatal development and evaluate the characteristics of adult dopaminergic neurons in control and previously striosome-quieted conditions.

      Strengths:

      The striatal electrophysiology is beautifully and carefully done and shows important developmental differences between neural subtypes.

      Testing the striatonigral connection is a good idea.

      Weaknesses:

      The authors didn't actually test the striatonigral connection. The experiments they do instead don't convincingly show that the striosomal or even striatal connection to the dopaminergic neurons is altered after postnatal striosome suppression.

      Major concerns:

      (1) mIPSCs are measured and are reduced after chemogenetic suppression of striosomal neurons during development. This is an interesting finding, but these mIPSCs could be coming from any inhibitory input onto the SNc neurons. It is unlikely that most of the mIPSCs are coming from the striosomal inputs. The GPe is much more likely to be the source of these mIPSCs than the striatum because the GPe inputs form synapses nearer the soma and have a higher probability of release (Evans et al., 2020). dSPNs inhibit GPe neurons through a non-canonical pathway (Cui et al., 2021; Spix et al., 2021) and striosomes also inhibit the SNr (McGregor et al., 2019). The striatum has the potential to disinhibit SNc neurons through either the SNr or the GPe (Evans, 2022), and modification of the striosome-SNr or striosome-GPe connections during development could be what is causing the mIPSC changes. To claim that the striosome-SNc connection is altered, a direct test of this connection is necessary.

      (2) The dopaminergic neurons recorded seem to be randomly selected, but the striosomes do not inhibit all SNc dopamine neurons. They selectively inhibit the ventral tier SNc neurons (Evans et al., 2020). In the present manuscript, it is impossible to know which subpopulation of SNc neurons was recorded, so it is impossible to tell whether the dopaminergic neurons recorded are the ones expected to receive striosomal input.

      (3) Very similarly, the striosomes selectively wrap around the "SNr dendrite" of SNc neurons that participate in striosome-dendron bouquets (Crittenden et al., 2016). However, not all SNc neurons have prominent SNr dendrites (Henny et al., 2012). In the morphological images of Supplemental Figure 3, it looks like the recorded cells sometimes have an SNr dendrite and sometimes don't (but it is hard to tell because the medial-lateral rostral-caudal axis is not labeled in the images). The presence or absence of the "SNr dendrite" is a strong determinant of whether an individual dopaminergic neuron receives striosomal inhibition or not (Evans et al., 2020). As above, not knowing whether the neurons recorded have SNr dendrites makes it impossible to know whether they should be receiving striosomal input at all.

      (4) It's quite interesting that the dendron-bouquet structure is intact even after striosomal activity suppression, as cannabinoid receptor knockout greatly disrupts the structural integrity of bouquets (Crittenden et al., 2022). However, going along with point 3, the gephyrin puncta analysis only at the somas is very limiting. The striosome-SNc relevant puncta would be primarily on the SNr dendrite. Gephyrin density on the SNr dendrites or in bouquets would be much more informative than density on the soma.

      (5) The authors claim that "CNO didn't affect the shape of the DA neuron dendritic tree", but more information about the morphological analysis should be added. It is not clear how the Sholl analysis was conducted or whether a full 3D reconstruction was made. This claim seems to be based on only one dendritic measurement (Sholl analysis), but many other dendritic or morphological features could be altered.
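
      For readers unfamiliar with the measurement raised in point (5), the core of a Sholl analysis can be sketched as counting how many traced dendritic segments cross circles (or spheres, for a 3D reconstruction) of increasing radius centered on the soma. The sketch below is only an illustration under stated assumptions: the function name `sholl_counts`, the toy tracing, and the endpoint-based crossing rule are hypothetical and are not the authors' pipeline.

```python
import math

def sholl_counts(segments, soma, radii):
    """Count, for each radius, the traced segments that cross a circle
    (2D points) or sphere (3D points) of that radius centered on the soma.

    Simplifying assumption: a segment counts as crossing when one endpoint
    lies inside the radius and the other lies on or beyond it. Segments that
    merely cut a chord through the circle are ignored.
    """
    counts = []
    for r in radii:
        n = 0
        for p, q in segments:
            dp = math.dist(p, soma)  # distance of first endpoint from soma
            dq = math.dist(q, soma)  # distance of second endpoint from soma
            if min(dp, dq) < r <= max(dp, dq):
                n += 1
        counts.append(n)
    return counts

# Hypothetical toy tracing (arbitrary units): one primary dendrite that
# bifurcates at (30, 0), plus a second short primary dendrite.
segments = [
    ((0, 0), (30, 0)),
    ((30, 0), (60, 20)),
    ((30, 0), (60, -20)),
    ((0, 0), (0, 25)),
]
soma = (0.0, 0.0)
radii = [10, 20, 40, 70]
counts = sholl_counts(segments, soma, radii)
```

      Because the same function accepts 3D coordinates, whether a reconstruction was 2D or fully 3D changes the result, which is one reason the requested methodological detail matters.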

      Crittenden, J.R., Tillberg, P.W., Riad, M.H., Shima, Y., Gerfen, C.R., Curry, J., Housman, D.E., Nelson, S.B., Boyden, E.S., & Graybiel, A.M. (2016) Striosome-dendron bouquets highlight a unique striatonigral circuit targeting dopamine-containing neurons. Proc. Natl. Acad. Sci. U.S.A., 113, 11318-11323.
      Crittenden, J.R., Yoshida, T., Venu, S., Mahar, A., & Graybiel, A.M. (2022) Cannabinoid Receptor 1 Is Required for Neurodevelopment of Striosome-Dendron Bouquets. eNeuro, 9, ENEURO.0318-21.2022.
      Cui, Q., Du, X., Chang, I.Y.M., Pamukcu, A., Lilascharoen, V., Berceau, B.L., García, D., Hong, D., Chon, U., Narayanan, A., Kim, Y., Lim, B.K., & Chan, C.S. (2021) Striatal Direct Pathway Targets Npas1+ Pallidal Neurons. J Neurosci, 41, 3966-3987.
      Evans, R.C. (2022) Dendritic involvement in inhibition and disinhibition of vulnerable dopaminergic neurons in healthy and pathological conditions. Neurobiol Dis, 172, 105815.
      Evans, R.C., Twedell, E.L., Zhu, M., Ascencio, J., Zhang, R., & Khaliq, Z.M. (2020) Functional Dissection of Basal Ganglia Inhibitory Inputs onto Substantia Nigra Dopaminergic Neurons. Cell Rep, 32, 108156.
      Henny, P., Brown, M.T.C., Northrop, A., Faunes, M., Ungless, M.A., Magill, P.J., & Bolam, J.P. (2012) Structural correlates of heterogeneous in vivo activity of midbrain dopaminergic neurons. Nat. Neurosci., 15, 613-619.
      McGregor, M.M., McKinsey, G.L., Girasole, A.E., Bair-Marshall, C.J., Rubenstein, J.L.R., & Nelson, A.B. (2019) Functionally Distinct Connectivity of Developmentally Targeted Striosome Neurons. Cell Rep, 29, 1419-1428.e5.
      Spix, T.A., Nanivadekar, S., Toong, N., Kaplow, I.M., Isett, B.R., Goksen, Y., Pfenning, A.R., & Gittis, A.H. (2021) Population-specific neuromodulation prolongs therapeutic benefits of deep brain stimulation. Science, 374, 201-206.

    1. Author Response

      We appreciate your constructive feedback on our manuscript entitled “Deletion of sulfate transporter SUL1 extends yeast replicative lifespan via reduced PKA signaling instead of decreased sulfate uptake” (ID: eLife-RP-RA-2023-94609). Your comments/suggestions are very helpful for improving our manuscript. In particular, we feel the additional experiments and analysis suggested by the reviewers will help strengthen our argument that the Sul1 deletion mutant extends lifespan via decreased PKA signaling, instead of via decreased sulfate uptake. Below we outline our response to the reviewers' comments/suggestions and the plans for additional experiments and analysis.

      (1) Our current model is that lifespan extension following SUL1 knockout depends on the PKA signaling pathway but not sulfate transport. To further substantiate this, we plan to conduct further transcriptome sequencing and dynamic sulfate uptake experiments using WT, Sul1D and Sul1E427Q strains. If our model is correct, we expect that the PKA signaling pathway will be more repressed in the Sul1D strain than in the Sul1E427Q strain, but that sulfate transport will be similar in both strains. This will add strong evidence supporting the model in addition to the lifespan data.

      (2) The reviewer mentioned the disparities observed between the lifespan of WT in Figure 1B and other experimental assays. Although it is known that the lifespan of WT varies considerably from experiment to experiment (thus the need for a WT control in every lifespan measurement), we agree it is important to reach a solid conclusion that Sul1E427Q does not extend lifespan. We plan to measure the lifespan of more cells for the mutant strains illustrated in Figure 1B and update the data and charts.

      (3) Other issues, for example, the small images of Msn2/4 in the nucleus, grammar and formatting errors, and the lifespan data of double (Sul1/Msn4) mutants will be addressed in the revised version of the manuscript after we have performed the additional experiments/analysis.

    2. Reviewer #2 (Public Review):

      Summary:

      In this study, the authors find that the deletion of a sulfate transporter in yeast, Sul1, leads to the extension of replicative lifespan. They investigate mechanisms underlying this extension and claim that the effects on longevity can be separated from sulfate transport, and are instead linked to a previously proposed transceptor function of the Sul1 transporter. Through RNA sequencing analysis, the authors find that Sul1 loss triggers activation of several stress response pathways, and conclude that deletion of two pathways, autophagy or Msn2/4, partially prevents lifespan extension in cells lacking Sul1. Overall, while it is well-appreciated that activation of Msn2/4 or autophagy is beneficial for lifespan extension in yeast, the results of this study would add an important new mechanism by which this could be achieved, through perceived sulfate starvation. However, as described below, several of the experiments utilized to support the authors' conclusion are not experimentally sound, and significant additional experimentation is required to support the authors' claims throughout the manuscript.

      Strengths:

      The major strength of the study is the robust RNA-seq data that identified differentially expressed genes in cells lacking Sul1. This facilitated the authors' focus on two of these pathways, autophagy and the Msn2/4 stress response pathway.

      Weaknesses:

      Several critical experimental flaws need to be addressed by the authors to more rigorously test their hypothesis.

      (1) The lifespan assays throughout the manuscript contain inconsistencies in the mean lifespan of the wild-type strain, BY4741. For example, in Figure 1A, the lifespan of BY4741 is 24.3, and the extended lifespan of the sul1 mutant is 31. However, although all mutants tested in Figure 1B also have lifespans close to 30 cell divisions, the wild-type control is at 30 divisions in those experiments as well. This is problematic, as it makes it impossible to conclude anything about the lifespan extension of various mutants with inconsistencies in the wild-type lifespan. Additionally, the mutants analyzed in 1B are what the authors use to claim that loss of the transporter does not extend lifespan through sulfate limitation, but instead through a signaling function. Thus, it remains unclear whether loss of sul1 extends lifespan at all, and if it does, whether this is separable from cellular sulfate levels.

      (2) While the authors use mutants in Figure 1 that should have differential effects on sulfate levels in cells, the authors need to include experiments to measure sulfate levels in their various mutant cells to draw any conclusions about their data.

      (3) Similar to point 2, the authors focused their RNA sequencing analysis on the deletion of sul1 and did not include important RNA seq analysis of the specific Sul1 mutation or other mutants in Figure 1B that do not exhibit lifespan extension. The prediction is that they should not see the activation of stress response pathways in these mutants as they do not see lifespan extension, but this needs to be tested.

      (4) While the RNA-seq data is robust in Figure 2 as well as the follow-up quantitative PCR and trehalose/glycogen assays in 2A-B, the follow-up imaging assays for Msn2/4 localization in Figure 2 are not robust and are difficult to interpret. The authors need to include more high-resolution imaging or at least a close-up of the cells in Figure 3C.

      (5) The autophagy assays utilized in Figure 4 appear to all be done with a C-terminal GFP-tagged Atg8 protein. As C-terminal GFP is removed from Atg8 prior to conjugation to phosphatidylethanolamine, microscopy assays of this reporter cannot be utilized to report on autophagy activity or flux. Instead, the authors need to utilize N-terminally tagged Atg8, which they can monitor for vacuole uptake as an appropriate readout of autophagy levels. As it stands, the authors cannot draw any conclusions about autophagy activity in their studies.

    3. eLife assessment

      The study by Long et al. presents valuable findings on the role of the SUL1 gene in yeast longevity, proposing that lifespan extension can occur through signaling pathways independent of its sulfate transport function, offering new insights into aging mechanisms with potential implications beyond yeast biology. However, the evidence supporting the uncoupling of SUL1's transport and signaling functions is inadequate, relying on limited lifespan analysis without measurements for nutrients and nutrient signaling status. This research is of particular interest to the aging research community, although additional experiments are needed to fully substantiate the claims.

    4. Reviewer #1 (Public Review):

      The manuscript by Long et al. focused on SUL1, a gene encoding a sulfate transporter with signaling roles in yeast. The authors claim that the deletion of SUL1, rather than SUL2 (encoding a similar transporter), extended yeast replicative lifespan independent of sulfate transport. They also show that SUL1 loss-of-function mutants display decreased PKA activity, indicated by stress-protective carbohydrate accumulation, relevant transcription factor relocalization (measured during aging in single cells), and changes in gene expression. Finally, they show that loss of SUL1 increases autophagy, which is consistent with the longer lifespan of these cells. Overall, this is an interesting paper, but additional work should strengthen several conclusions, especially for the role of sulfate transport. Specific points include the following:

      - What prompted the authors to measure the RLS of sul1 mutants? Prior systematic surveys of RLS in the same strain background (which included the same sul1 deletion strain they used) did not report lifespan extension in sul1 cells (PMID: 26456335).

      - Cells carrying a mutant Sul1 (E427Q), which was reported to be disrupted in sulfate transport, did not have a longer lifespan (Figure 1), leading them to conclude that "lifespan extension by SUL1 deletion is not caused by decreased sulfate uptake". They would need to measure sulfate uptake in the mutants they test to draw that conclusion firmly.

      - Related to my previous point, another simple experiment would be to repeat the assays in Figure 1 with exogenous sulfur added to see if the lifespan extension is suppressed.

      - There needs to be more information in the text or the methods about how they did the enrichment analysis in Figure 2B. P-values are typically insufficient, and adjusted FDR values are reported from standard gene ontology platforms (e.g., PANTHER).

      - It is somewhat puzzling that relocalization of Msn2 was not seen in very old cells (past the 17th generation), but it was evident in younger cells. The authors could consider another possibility, that it was early and midlife experiences that made those cells live longer. Past that window, loss of Sul1 may have no impact on longevity. A conditional shutoff system to regulate SUL1 expression would be needed to test the above, albeit this is probably beyond the scope of this report.

      - The connections between glucose restriction, autophagy, and sul1 (Figure 4) could be further tested by measuring the RLS of sul1 cells under glucose restriction. If RLS is further extended by glucose restriction, then whatever effects they see should be independent of glucose restriction.

      - They made and tested the double (sul1, msn2) mutants, but they should also test the sul1, msn4 combination since Msn4 functions similarly to Msn2.
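
      To make the point above about enrichment statistics concrete: one common way to turn raw enrichment p-values into the adjusted FDR values the reviewer asks for is the Benjamini-Hochberg procedure. The following is a minimal sketch only; the function name `benjamini_hochberg` and the p-values are hypothetical placeholders and are not taken from the manuscript or from any specific gene ontology platform.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease (monotonicity).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five enriched terms
pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
qvals = benjamini_hochberg(pvals)

# Terms that would pass a 5% FDR cutoff in this toy example
significant = [q < 0.05 for q in qvals]
```

      In this toy example, two terms with raw p-values below 0.05 no longer pass after adjustment, which illustrates why unadjusted p-values alone can overstate enrichment.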

    5. Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Long et al. demonstrated that the deletion of SUL1, which encodes a sulfate transporter localized on the plasma membrane, extends the replicative lifespan in S. cerevisiae. The authors further investigated the mechanism underlying this lifespan extension. They found that, unlike sul1∆ mutants, other mutants that have been shown to have a deficiency in sulfate transport cannot extend lifespan, from which they concluded that it is unlikely that SUL1 deletion extends lifespan by impairing sulfate intake. The authors then performed a series of characterizations on sul1∆ mutants and found that consistent with previous studies, PKA activity is downregulated when SUL1 is deleted. The authors demonstrated that SUL1 deletion promotes the nuclear localization of Msn2, as well as autophagy, which are known downstream signals of the PKA pathway. In addition, the authors show that MSN2 and ATG8 are indispensable for the lifespan extension in sul1∆ cells. Altogether, this manuscript suggests that SUL1 deletion extends lifespan by affecting PKA activity.

      Strengths:

      This study reported an interesting phenotype that the deletion of SUL1, but not SUL2, promotes lifespan extension in budding yeast. The authors performed some characterizations on sul1∆ mutants and epistatic studies to demonstrate that this lifespan extension requires MSN2 and ATG8, which further support the importance of the PKA pathway in regulating lifespan.

      Weaknesses:

      However, one of the major findings in this paper, that SUL1 deletion extends lifespan independently of its role in sulfate uptake, was based merely on lifespan measurements of sul2∆, SUL1E427Q, and met3∆ mutants, which cannot exclude the possibility that yeast lifespan is affected by sulfate intake. In addition, the evidence for whether SUL1 deletion extends lifespan through affecting PKA activity is incomplete. It has been shown that Sul1 and Sul2 have redundant functions in both sulfate transport and PKA activation (Kankipati et al. 2015). However, in this manuscript, as shown by the authors, the deletion of SUL2 does not extend the lifespan compared with sul1∆ mutants. Without a further characterization of why deletion of SUL1, but not SUL2, extends lifespan, it is likely that SUL1 deletion extends lifespan independently of either sulfate transport or PKA activation.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This study presents a valuable finding on the possible use of vilazodone in the management of thrombocytopenia through regulating 5-HT1A receptor signaling. The evidence supporting the claims of the authors is solid, with the combined use of computational methods and biochemical assays. The work will be of broad interest to scientists working in the field of thrombocytopenia.

      Public Review:

      Reviewer #1 (Public Review):

      Summary:

      This is well-performed research with solid results and thorough controls. The authors did a good job of finding the relationship between the 5-HT1A receptor and megakaryocytopoiesis, which demonstrated the potential of vilazodone in the management of thrombocytopenia. The paper emphasizes the regulatory mechanism of 5-HT1A receptor signaling on hematopoietic lineages, which could further advance the field of thrombocytopenia for therapeutic purposes.

      Strengths:

      This is comprehensive and detailed research using multiple methods and model systems to determine the pharmacological effects and molecular mechanisms of vilazodone. The authors conducted in vitro experiments using HEL and Meg-01 cells and in vivo experiments using Zebrafish and Kunming-irradiated mice. The experiments and bioinformatics analysis have been performed with a high degree of technical proficiency. The authors demonstrated how vilazodone binds to 5-HTR1A and regulates the SRC/MAPK pathway, which is inhibited by particular 5-HTR1A inhibitors. The authors determined this to be the mechanistic underpinning for the effects of vilazodone in promoting megakaryocyte differentiation and thrombopoiesis.

      Weaknesses:

      (1) Which database are the drug test sets and training sets for the creation of drug screening models obtained from? What criteria are used to grade the results?

      Response: Thank you for your thoughtful comment. The database was built by our laboratory. Firstly, we collected 39 small molecule compounds that can promote MK differentiation or platelet formation and 691 small molecule compounds that have no obvious effect on MK differentiation or platelet formation to build the database. Then, the Molecular Descriptors of 2 types of active and 15 types of inactive small molecule compounds were randomly picked as the Validation set, and the data of the remaining 713 types of small molecule compounds were utilized as the Training set. With regard to the activity evaluation criteria, the prediction score for each molecule was between 0 and 1, and the model decision was made with a threshold of 0.5. A molecule with a score above the 0.5 threshold was identified as a megakaryopoiesis inducer (1).

      Reference:

      (1) Mo Q, Zhang T, Wu J, et al. Identification of thrombopoiesis inducer based on a hybrid deep neural network model. Thromb Res. 2023;226:36-50. doi:10.1016/j.thromres.2023.04.011
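
      The data split and decision rule described in this response can be checked and sketched in a few lines. This is an illustration only: the counts and the 0.5 threshold come from the response above, while the function name `classify`, the compound names, and the scores are hypothetical and do not reproduce the authors' model.

```python
# Counts stated in the response
N_ACTIVE, N_INACTIVE = 39, 691           # compounds collected
N_TRAIN = 713                            # training set
N_VAL_ACTIVE, N_VAL_INACTIVE = 2, 15     # validation set

total = N_ACTIVE + N_INACTIVE
# The training and validation sets should account for every compound
assert total == N_TRAIN + N_VAL_ACTIVE + N_VAL_INACTIVE

THRESHOLD = 0.5  # decision threshold stated in the response

def classify(score, threshold=THRESHOLD):
    """Return True when a prediction score in [0, 1] is strictly above the
    threshold, i.e. the compound is called a megakaryopoiesis inducer."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("prediction score must lie between 0 and 1")
    return score > threshold

# Hypothetical prediction scores for three made-up compounds
scores = {"compound_A": 0.91, "compound_B": 0.50, "compound_C": 0.12}
calls = {name: classify(s) for name, s in scores.items()}
```

      Note that a score of exactly 0.5 is not called an inducer under the "above the threshold" wording; whether the original model treats the boundary this way is an assumption.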

      (2) What is the base of each group in Figure 3b for the survival screening of zebrafish? The positivity rate of GFP-labeled platelets is too low, as indicated by the quantity of eGFP+ cells. What gating technique was used in Figure 3e?

      Response: We are deeply grateful for the insightful feedback you have provided regarding Figure 3 and the assessment of the zebrafish model. We used 50 zebrafish embryos per group to evaluate VLZ toxicity, and we think this is a suitable and fair baseline. Our gating procedure is clearly depicted in the resulting diagram. Since our goal was to evaluate the fluorescence intensity quantitatively, we isolated cells from the entire zebrafish. Since the proportion of eGFP+ cells in various zebrafish tissues reported in other literature is likewise quite low, and we are unsure of the typical eGFP+ threshold for zebrafish (1, 2), we think this finding is reasonable given that all groups in the experiment were processed in parallel.

      Reference:

      (1) Yang L, Wu L, Meng P, et al. Generation of a thrombopoietin-deficient thrombocytopenia model in zebrafish. J Thromb Haemost. 2022; 20(8): 1900-1909. doi:10.1111/jth.15772

      (2) Fallatah W, De Silva IW, Verbeck GF, Jagadeeswaran P. Generation of transgenic zebrafish with 2 populations of RFP- and GFP-labeled thrombocytes: analysis of their lipids. Blood Adv. 2019;3(9):1406-1415. doi:10.1182/bloodadvances.2018023960

      (3) In Figure 4C, the MPV values of each group of mice did not show significant downregulation or upregulation. The possible reasons for this should be explained.

      Response: Thank you for your thoughtful comment. Megakaryocytes build pseudopodia, which form extensions that release proplatelets into the bone marrow sinusoids. Proplatelets convert into barbell-shaped proplatelets to form platelets in an integrin αIIbβ3 mediated process (1-2). Platelet size is established by microtubule and actin-myosin-spectrin cortical forces which determine platelet size during the vascular formation of barbell proplatelets (3). Conversion is regulated by the diameter and thickness of the peripheral microtubule coil. Proplatelets can also be formed from proplatelets in the circulation (4). Megakaryocyte ploidy correlates with platelet volume following a direct nonlinear relationship to mean platelet volumes (5). Usually there is an equilibrium between platelet generation and clearance from the circulation (normal turnover) controlled by thrombopoietin. When healthy humans receive thrombopoietin, their platelet size decreases (6). Proplatelet formation is dynamic and influenced by platelet turnover (7), which increases upon increased platelet consumption and/or sequestration. In our study, the MPV values of each group of mice did not show significant downregulation or upregulation. From our point of view, there are several possible reasons for these results.

      (1) Radiation damage in mice may decrease the platelet count while at the same time stimulating the bone marrow to release young, larger platelets, thus keeping the MPV relatively stable.

      (2) After radiation injury, bone marrow cells were suppressed, resulting in a decrease in the number of platelets produced, but MPV remained unchanged, possibly because the direct effects of radiation on the bone marrow reduced platelet numbers without necessarily changing the average platelet size.

      Reference:

      (1) Thon JN, Italiano JE. Platelet formation. Semin Hematol. 2010(3):220-226. doi: 10.1053/j.seminhematol.2010.03.005.

      (2) Larson MK, Watson SP. Regulation of proplatelet formation and platelet release by integrin alpha IIb beta3. Blood. 2006(5):1509-1514. doi: 10.1182/blood-2005-11-011957.

      (3) Thon JN, Macleod H, Begonja AJ, et al., Microtubule and cortical forces determine platelet size during vascular platelet production. Nat. Commun. 2012(3):852. doi: 10.1038/ncomms1838.

      (4) Machlus KR, Thon JN, Italiano JE Jr. Interpreting the developmental dance of the megakaryocyte: a review of the cellular and molecular processes mediating platelet formation. Br. J. Haematol. 2014(2):227-36. doi: 10.1111/bjh.12758.

      (5) Bessman JD. The relation of megakaryocyte ploidy to platelet volume. Am. J. Hematol. 1984(2):161-170. doi: 10.1002/ajh.2830160208.

      (6) Harker LA, Roskos LK, Marzec UM, et al., Effects of megakaryocyte growth and development factor on platelet production, platelet life span, and platelet function in healthy human volunteers. Blood. 2000(8):2514-2522. doi: 10.1182/blood.V95.8.2514.

      (7) Kowata S, Isogai S, Murai K, et al., Platelet demand modulates the type of intravascular protrusion of megakaryocytes in bone marrow. Thromb. Haemost. 2014(4):743-756. doi: 10.1160/TH14-02-0123.

      (4) The PPI diagram and the KEGG diagram in Figure 6 both provide a possible mechanism pathway for the anti-thrombocytopenia effect of vilazodone. How can the authors analyze the differences in their results?

      Response: We appreciate your valuable comments. PPI (Protein-Protein Interaction) refers to the interaction between proteins. Inside cells, proteins interact with each other to perform various biological functions, influencing cell signaling, metabolic pathways, cell cycle, and more. KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database that integrates information on genomes, chemicals, and biological systems. In pharmacoinformatics, KEGG pathways are often used to understand the molecular mechanisms of specific diseases or biological processes. KEGG contains the interrelationships between genes, proteins, and metabolites, helping to reveal key nodes in biological processes. PPI information can be integrated with data from KEGG pathways, such as metabolic and signaling pathways, to gain a more comprehensive understanding of the role of protein-protein interactions in cellular processes and biological functions. For example, by analyzing nodes in the PPI network, proteins associated with a specific disease can be identified, and further examination of these proteins' locations in KEGG pathways can reveal molecular mechanisms underlying the onset and development of the disease. However, this method also has some limitations:

      Uncertainty (1): The construction of protein-protein interaction networks and drug interaction networks involves many assumptions and speculations. The edges of these networks may be based on experimental data but can also rely on bioinformatics predictions. Therefore, the accuracy of predictions is limited by the quality and reliability of the data used during network construction.

      Insufficient data (2): Despite the availability of a large amount of bioinformatics data for network construction, interactions between some proteins and drugs may still lack sufficient experimental data. This data insufficiency can result in inaccuracies in network predictions.

      Dynamics and temporal-spatial changes (3): The dynamics and temporal-spatial changes in biological systems are crucial for drug effects. Pharmacoinformatics may struggle to capture these changes as it often relies on static network representations, overlooking the temporal and dynamic nature of biological systems.

      Reference:

      (1) Fernando PC, Mabee PM, Zeng E. Integration of anatomy ontology data with protein-protein interaction networks improves the candidate gene prediction accuracy for anatomical entities. BMC Bioinformatics. 2020(1):442. doi: 10.1186/s12859-020-03773-2.

      (2) Zhang S, Zhao H, Ng MK. Functional module analysis for gene coexpression networks with network integration. IEEE/ACM Trans. Comput. Biol. Bioinform. 2015(5):1146-1160. doi: 10.1109/TCBB.2015.2396073.

      (3) Cinaglia P, Cannataro M. A method based on temporal embedding for the pairwise alignment of dynamic networks. Entropy (Basel). 2023(4):665. doi: 10.3390/e25040665.

      (5) 5-HTR1A protein expression is measured only in the Meg-01 cell assay. Similar quantitation through western blot is not shown in other cell models.

      Response: Your insightful criticism and recommendation to use different cell models in order to obtain a more accurate depiction of 5-HTR1A protein expression are greatly appreciated. We completely concur that using this strategy would greatly increase the validity of our research. However, establishing a primary megakaryocyte model requires specialized expertise and technical resources, which unfortunately are not readily available to us within the given timeframe. Nevertheless, we acknowledge the limitations of Meg-01 cells, which may exhibit distinct properties compared to true megakaryocytes. To mitigate this concern, we have ensured robust experimental design and rigorous data analysis to interpret our findings within the context of these model cell lines. We believe our results still provide valuable insights into megakaryocyte differentiation and address an important biological question.

      Reviewer #2 (Public Review):

      Summary:

      The authors tried to understand the mechanism of how a drug candidate, VLZ, works on a receptor, 5-HTR1A, by activating the SRC/MAPK pathway to promote the formation of platelets.

      Strengths:

      The authors used both computational and experimental methods. This definitely saves time and funds to find a useful drug candidate and its therapeutic marker in the subfield of platelet reduction in cancer patients. The authors achieved the aim of explaining the mechanism of VLZ in improving thrombocytopenia by using two cell lines and two animal models.

      Weaknesses:

      Only two cell lines, HEL and Meg-01 cells, were evaluated in this study. However, whether more cell lines can be used depends on the workflow and grant situation of the current research team.

      Response: We deeply appreciate your insightful feedback and valuable suggestions regarding the use of more suitable models for studying the role of VLZ in megakaryocyte differentiation and platelet production. We fully agree that CD34+ hematopoietic stem/progenitor cells or primary megakaryocytes would provide a more accurate representation of in vitro megakaryopoiesis compared to HEL and Meg-01 cells, which possess limited potential for this process. We acknowledge that our current study did not include experiments with these preferred cell models. This is because our laboratory is still actively developing the technical expertise and resources required for establishing and maintaining primary megakaryocyte and CD34+ cell cultures. Despite the limitations of the current study, we believe the results using HEL and Meg-01 cells provide valuable preliminary insights into the potential effects of VLZ on megakaryocyte differentiation. We are actively working to overcome these limitations and plan to incorporate these more advanced models in our future investigations.

      Reviewer #1 (Recommendations For The Authors):

      I think the authors can enhance the mechanism study by developing more reliable models and methodologies. The connection to clinical research should be strengthened at the same time.

      Response: We deeply appreciate your insightful feedback and valuable suggestions regarding the use of more suitable models for studying the role of VLZ in megakaryocyte differentiation and platelet production. Despite the limitations, we are committed to expanding our research in the future by incorporating your suggestion and establishing a primary megakaryocyte model to further validate our findings and strengthen our conclusions. At the same time, we wholeheartedly concur with your suggestion to combine clinical research. Unfortunately, VLZ is not a first-line treatment for depression in China, and getting blood samples from the matching number of patients for analysis is a challenge. To give additional experimental support for the medication, we have attempted to improve the data in vivo as much as feasible, including by implementing the intervention in normal mice. Our findings should also contribute to the theoretical underpinnings of this medication and aid in its practical application.

      Reviewer #2 (Recommendations For The Authors):

      Issues the authors need to address:

      Figure 7: Why is the band intensity of GAPDH in b or e much greater than that in f, g, or h?

      Response: Thank you for your careful observation and insightful comment regarding Figure 7. Because the protein concentration differs between batches of samples, a larger loading volume sometimes increases the GAPDH band intensity. Other factors that may influence the GAPDH band intensity include the instrument's contrast adjustment during exposure and the use of different numbers of wells for electrophoresis. The original three replicates of all WB results will be provided in the supplementary materials.

      Finally, we sincerely thank you for providing us with this opportunity to make a further revision and modification of our manuscript, and your valuable and scientific comments are useful for the great improvement of our manuscript!

    2. Reviewer #1 (Public Review):

      Summary:

      This is well-performed research with solid results and thorough control. The authors did a good job of finding the relationship between the 5-HT1A receptor and megakaryocytopoiesis, which demonstrated the potential of vilazodone in the management of thrombocytopenia. It emphasizes the regulatory mechanism of 5-HT1A receptor signaling on hematopoietic lineages, which could further advance the field of thrombocytopenia for therapeutic purposes.

      Strengths:

      This is a comprehensive and detailed research using multiple methods and model systems to determine the pharmacological effects and molecular mechanisms of vilazodone. The authors conducted in vitro experiments using HEL and Meg-01 cells and in vivo experiments using Zebrafish and Kunming-irradiated mice. The experiments and bioinformatics analysis have been performed with a high degree of technical proficiency. The authors demonstrated how vilazodone binds to 5-HTR1A and regulates the SRC/MAPK pathway, which is inhibited by particular 5-HTR1A inhibitors. The authors determined this to be the mechanistic underpinning for the effects of vilazodone in promoting megakaryocyte differentiation and thrombopoiesis.

      Weaknesses:

      (1) Which database are the drug test sets and training sets for the creation of drug screening models obtained from? What criteria are used to grade the results?
      (2) What is the base of each group in Figure 3b for the survival screening of zebrafish? The positivity rate of GFP-labeled platelets is too low, as indicated by the quantity of eGFP+ cells. What gating technique was used in Figure 3e?
      (3) In Figure 4C, the MPV values of each group of mice did not show significant downregulation or upregulation. Please explain the possible reasons.
      (4) The PPI diagram and the KEGG diagram in Figure 6 both provide a possible mechanism pathway for the anti-thrombocytopenia effect of vilazodone. How can the author analyze the differences in their results?
      (5) 5-HTR1A protein expression is measured only in the Meg-01 cells assay. Similar quantitation through western blot is not shown in other cell models.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Response to reviewers

      We wish to thank the reviewers for the time taken to appraise the manuscript and the helpful feedback to improve it. We have taken onboard the suggested feedback and incorporated it into the revision. The findings of the revised manuscript are unchanged. Below is a point-by-point response to specific comments.

      Public reviews

      Reviewer 1

      Thank you to reviewer 1 for the thorough and insightful review of our manuscript. We are pleased that the strengths of our research, particularly the use of whole-genome bisulfite sequencing, the combination of animal and human data, and the investigation of a potential dietary intervention were recognized. We are confident that these aspects contribute significantly to the value and originality of our work.

      We acknowledge the concerns regarding the statistical rigor of the study, particularly the sample size and data analysis methods. We would like to address these points in more detail:

      Sample size: While we agree that a larger sample size would be ideal, the chosen sample size (n=4 per group) is consistent with other murine whole-genome bisulfite sequencing experiments in the field. We have carefully considered the cost-benefit trade-off in selecting this approach. In the revision we discuss the potential limitations of this sample size.

      Data analysis: We acknowledge the inconsistencies in the study reporting and have committed to improving the clarity in the revision. We carefully reviewed the concerns regarding the use of causal language and the interpretation of differences in our results. In some cases, the use of causal language is justified by the intervention study design. We also believe other explanations like stochastic variation affecting the same genomic regions in different tissues, are exceedingly unlikely from a statistical viewpoint. In the revision we have adopted a balanced approach to the language.

      Confounders: We acknowledge the importance of accounting for potential confounders such as birthweight, alcohol exposure and sex. The pups selected for genome analysis were matched for sex and on litter size as a proxy for in utero alcohol exposure. This careful selection of mice for genome analysis was intentionally guided to mitigate potential confounding.

      Statistical rigour: We acknowledge the importance of multiple testing correction in the genome-wide analysis. We used the DSS method of Feng et al (PMID: 2456180) which employs a two-step procedure for assessing significance of a region. Instead of a single p-value for the whole DMR, DSS uses the area statistic to rank candidate regions and control the false discovery rate through shrinkage estimation methods. This approach reduces the risk of reporting false positives due to multiple testing across numerous CpG sites. It is similar in respects to employing local FDR correction at 0.05 level, with an additional minimum effect size threshold applied, and particularly suited to experiments where the number of replicates is low. In the revision we have committed to improving the clarity of the reporting of statistical methods.

      Reviewer 2

      Thank you to reviewer 2 for the comprehensive and valuable feedback on our manuscript. We take your concerns about the generalizability of our findings and the interpretation of certain results seriously. We would like to address your specific criticisms in detail:

      Generalizability and Human Data: We agree that the generalizability of mouse models to human conditions has limitations. However, our study focused on understanding the early molecular alterations caused by moderate PAE, which can be more effectively modelled in a controlled environment like mice. To clarify this, we have strengthened the manuscript by emphasizing the focus on moderate PAE in the title and throughout the paper.

      Transcriptome Analysis: We recognize the importance of investigating the functional consequences of PAE-induced DMRs and agree that transcriptome analysis would be highly valuable. We are currently planning to conduct future transcriptomic studies to understand the link between DMRs and gene expression.

      Species-Specificity and DMR Enrichment: We acknowledge the likelihood of species-specific PAE effects. Our finding of enrichment of DMRs in non-coding regions was consistent with observations from the Lussier study of FASD. We agree there is further work to do and now highlight this in the discussion.

      Tissue Sample Locations: Due to technical restrictions of processing newborn mouse tissue, we are unable to enhance the manuscript with specific tissue regions sampled.

      Interpretation of Shared Genomic Regions: We appreciate your point about the alternative explanation for the shared genomic regions between brain and liver. Our interpretation is that regions identified only in the alcohol group and affected equally in both tissues are likely established stochastically (as a result of the exposure) in the early embryo and then maintained in the germ layers. We have revised the manuscript to suggest this is the most likely explanation, and we acknowledge that a more detailed examination in more tissues would be warranted for proof.

      Additional Feedback

      Reviewer 1

      Introduction

      • Line 65 - alcohol consumption is not always preventable and these statements further increase the stigma associated with FASD. A better way to say this would be "a leading cause of neurodevelopmental impairments".

      We have implemented this suggestion in revised manuscript.

      • The studies cited in lines 87-89 are somewhat outdated, as several more recent studies with better sample sizes have been published in recent years. I would recommend citing more recent publications in addition to these studies. Similarly, the authors should also cite Portales-Casamar et al., 2016 (Epigenetic & Chromatin) for the validation in humans, as it was the original study for those data.

      We have added a citation for the study mentioned by Portales-Casamar et al. (2016) in the revised manuscript.

      • Lines 95-95 - the authors should elaborate further on the "encouraging results" from choline supplementation studies, as these details may help interpret the findings from their own study.

      In the revised manuscript, we replaced “encouraging results” with “results suggesting a high methyl donor diet (HMD) could at least partially mitigate the adverse effects of PAE on various behavioural outcomes”.

      • Minor point: DNA methylation is preferable to "methylation" alone when not referring to specific CpGs or sites, as methylation can also refer to protein or RNA methylation.

      “Methylation” has been replaced with “DNA methylation” in revised manuscript

      Results

      • Line 118 - HMD should be defined here.

      HMD defined in revised manuscript

      • The figures in the main manuscript and supplemental materials are not in the same order as they are presented in the text.

      We apologise for this and thank the reviewer for their attention to detail. In the revision we have corrected the order of figures to match the text.

      • It is concerning that the H2O-HMD group had lower baseline weights, which could impact the findings from these analyses. Please discuss how these differences were accounted for in the study design and analyses.

      We appreciate the reviewer's concern about the lower baseline weight in the H2O-HMD group. We agree that this difference could potentially affect our findings. However, we want to emphasize that total weight gain during pregnancy was statistically similar across all groups by linear mixed effect model. Additionally, all dams were within the healthy weight range for their strain. While we cannot completely rule out any potential influence of baseline weight, we believe the similarity in weight gain and the healthy range of all dams suggest that the in-utero experience of pups regarding weight-related factors was likely comparable across groups.

      • I have some concerns regarding the cutoffs used to identify the DMRs, particularly given the small N and number of tests. The authors should report the number of DMRs that meet a multiple testing threshold; if none, they should use a more stringent threshold than p<0.05, as one would expect 950,000 CpGs to meet that threshold by chance (19,000,000 CpGs x 0.05). The authors should also report the number of DMRs tested, as this will be a more appropriate benchmark for their analyses than the number of CpGs (they should also report the specific number here).

      We appreciate the reviewer's concerns regarding the DMR cut-offs. We agree that clarifying the methods and justifying our choices is crucial. Our implementation of the DSS method for defining DMRs employs a local FDR p<0.05 cut-off, with an additional delta beta threshold of 5%. We have clarified this in the methods section of the revised manuscript. We want to emphasize that the local FDR approach effectively mitigates the concern of chance findings by adjusting for multiple comparisons across the genome. Lines 414-420 in the revised methods contain the following amended text:

      “Differentially methylated regions (DMRs) were identified within each tissue using a Bayesian hierarchical model comparing average DNA methylation ratios in each CpG site between PAE and non-PAE mice using the Wald test with smoothing, implemented in the R package DSS (46). False-discovery rate control was achieved through shrinkage estimation methods. We declared DMRs as those with a local FDR P-value < 0.05 based on the p-values of each individual CpG site in the DMR, and minimum mean effect size (delta) of 5%”
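
      The multiple-testing concern discussed here can be made concrete with a short illustrative sketch. It reproduces the reviewer's arithmetic (19,000,000 tests at p < 0.05) and applies a Benjamini-Hochberg adjustment to a toy list of p-values. Benjamini-Hochberg is used only as a familiar stand-in for FDR control; it is not the shrinkage-based procedure implemented in DSS, and the p-values and function names below are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests passing an unadjusted threshold if every null is true."""
    return n_tests * alpha


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Reviewer's arithmetic: roughly 19 million CpGs tested at p < 0.05.
print(expected_false_positives(19_000_000, 0.05))

# Toy p-values (hypothetical, not taken from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
adjusted = benjamini_hochberg(toy_p)
print([round(v, 4) for v in adjusted])
```

      In this toy list, four p-values fall below 0.05 before adjustment but only two remain below 0.05 afterwards, which is the kind of attrition the reviewer's question is probing.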

      • I also have concerns about the delta cutoff for their DMRs. First, it is not clear if this cutoff is set for a single CpG or across the DMR (even then, it is not clear if this is a mean, median, max, min, etc.) Second, since the authors analyzed CpGs with 10X coverage, they can only reliably detect a delta of 0.1 (1/10 reads).

      Thank you for raising this important point. In the revision we have clarified the effect size cutoff reflects the mean effect across CpGs within the DMR as follows (line 418)

      “We declared DMRs as those with a local FDR P-value < 0.05 based on the p-values of each individual CpG site in the DMR, and minimum mean effect size (delta) of 5%”

      We chose the mean as it provides a comprehensive representation of the overall methylation change within the region, while ensuring all individual CpGs used in the analysis had at least 10x coverage. It is not true that we can only detect a delta of 1/10 reads: the mean effect is the relative difference in means between groups and is not dependent on the underlying sequencing depth.
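
      The distinction between per-site resolution and the resolution of a regional mean can be illustrated with a small simulation. The sketch below assumes a hypothetical region of 50 CpGs, 4 samples per group, exactly 10 reads per site, and true methylation levels of 0.50 and 0.55; none of these values come from the study. Every site-level ratio is a multiple of 0.1, yet the group means, averaged over sites and samples, are not restricted to that grid.

```python
import random


def simulate_region_mean(true_level, n_cpgs, n_samples, coverage, rng):
    """Mean methylation ratio across all CpGs and samples in one group."""
    ratios = []
    for _ in range(n_samples * n_cpgs):
        # Each read is methylated with probability true_level.
        methylated = sum(rng.random() < true_level for _ in range(coverage))
        ratios.append(methylated / coverage)  # always a multiple of 1/coverage
    return sum(ratios) / len(ratios)


rng = random.Random(0)  # fixed seed so the run is reproducible
control = simulate_region_mean(0.50, n_cpgs=50, n_samples=4, coverage=10, rng=rng)
exposed = simulate_region_mean(0.55, n_cpgs=50, n_samples=4, coverage=10, rng=rng)
print(f"control mean = {control:.3f}, exposed mean = {exposed:.3f}")
print(f"estimated delta = {exposed - control:.3f}")
```

      The precision of the estimated delta in such a simulation still depends on read depth, the number of CpGs in the region, and the number of animals, so this sketch illustrates the averaging argument rather than settling the question of statistical power.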

      • Prenatal alcohol exposure is known to impact cell type proportions in the brain, which could lead to differences in DNAm patterns. The authors should address this possibility in the discussion, as well as examine their list of DMRs to determine if they are associated with specific brain cell types. The possibility of cell type differences in the liver should also be discussed.

      We agree with the reviewer that PAE-induced alterations in cell type proportions can influence DNA methylation patterns. While isolating specific cell types in our current study's brain and liver samples was not achievable due to tissue limitations, we acknowledge this as a limitation and recognize the need for further investigations incorporating single-cell or cell type-specific approaches in the discussion.

      • It is interesting, but maybe not surprising, that more DMRs were identified in the liver compared to the brain. This finding would warrant some additional interpretation in the discussion.

      We appreciate and agree that this finding indeed warrants further interpretation. We have added the following sentence into the discussion section of the revised manuscript that provides some potential factors behind this observation.

      Line 263: “Indeed, most of the observed effects were tissue-specific, with more perturbations to the epigenome observable in liver tissue, which may reflect the liver’s specific role in metabolic detoxification of alcohol. Alternatively, cell type composition differences between brain and liver might explain differential sensitivity to alcohol’s effects”.

      • Lines 148-149 - I disagree about the enrichment of decreased DNAm in brain DMRs, as 52.6% is essentially random chance. The authors should also include a statistical test here, such as a chi-squared test, to support this statement.

      We agree that a revised interpretation is warranted. The updated manuscript has been amended as follows: “Lower DNA methylation with early moderate PAE in NC mice was more frequently observed in liver DMRs (93.5% of liver DMRs), while brain DMRs were almost equally divided between lower and higher DNA methylation with early moderate PAE (52.6% of brain DMRs had lower DNA methylation with early moderate PAE).”
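
      The reviewer's request for a formal test of the direction of DNA methylation change can be sketched with an exact two-sided binomial test against an even 50/50 split. Only percentages are reported in this response, so the DMR counts below are hypothetical values chosen to match 52.6% and 93.5%; the function name is likewise illustrative.

```python
from math import comb


def binomial_two_sided_p(successes, n, p0=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes
    no more likely than the observed one under the null proportion p0."""
    def pmf(k):
        return comb(n, k) * p0**k * (1 - p0) ** (n - k)

    observed = pmf(successes)
    # A small relative tolerance guards against floating-point ties.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts chosen only to match the reported percentages.
brain_p = binomial_two_sided_p(successes=100, n=190)  # 52.6% with lower DNA methylation
liver_p = binomial_two_sided_p(successes=187, n=200)  # 93.5% with lower DNA methylation
print(f"brain: p = {brain_p:.3f}")
print(f"liver: p = {liver_p:.3g}")
```

      With counts of this order, a 52.6% split would not be distinguishable from chance while a 93.5% split would be, which is consistent with the revised wording; the actual conclusion depends on the true number of DMRs in each tissue.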

      • Similarly, I would recommend the authors use increased/decreased DNAm, rather than hypermethylated/hypomethylation, as the latter terms are better suited to DNAm values near 100% or 0%.

      The use of hyper/hypo methylation is still considered common and well understood even for moderate changes. We agree the use of increased/decreased is more inclusive for a broader audience, so we have amended all references accordingly in the main text.

      • Lines 153-155 - please report the statistics to support these enrichment results. A permutation test would be well suited to this analysis.

      The reporting of statistics related to the enrichment test has now been amended to read “Overlap permutation tests showed liver DMRs were enriched in inter-CpG regions and non-coding intergenic regions (p < 0.05), while being depleted in all CpG regions and genic regions except 1to5kb, 3UTR and 5UTR regions, where there was no significant difference (Figure 2f).”
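
      The logic of an overlap permutation test can be sketched as follows. This is a simplified illustration on a toy single-chromosome genome, with hypothetical interval coordinates and function names; it is not the regioneR implementation, whose randomization strategies are more elaborate.

```python
import random


def count_overlaps(regions, features):
    """Number of regions overlapping at least one feature (half-open intervals)."""
    return sum(
        any(start < f_end and f_start < end for f_start, f_end in features)
        for start, end in regions
    )


def overlap_permutation_test(regions, features, genome_length, n_perm, rng):
    """One-sided enrichment p-value from randomly re-placing regions on a toy genome."""
    observed = count_overlaps(regions, features)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        shuffled = []
        for start, end in regions:
            width = end - start  # keep each region's width when re-placing it
            new_start = rng.randrange(0, genome_length - width + 1)
            shuffled.append((new_start, new_start + width))
        if count_overlaps(shuffled, features) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)


# Hypothetical toy data: a 100 kb "genome", two annotated features, five DMRs.
features = [(10_000, 12_000), (50_000, 53_000)]
dmrs = [(10_500, 10_900), (11_200, 11_600), (50_100, 50_400),
        (52_000, 52_300), (80_000, 80_300)]
observed, p_value = overlap_permutation_test(
    dmrs, features, genome_length=100_000, n_perm=999, rng=random.Random(1)
)
print(f"observed overlaps = {observed}, permutation p = {p_value:.3f}")
```

      Testing for depletion rather than enrichment would count permutations with overlap less than or equal to the observed value instead.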

      • Line 156 - "overwhelming enrichment" is a very strong statement considering the numbers themselves.

      Omitted “overwhelming” in revised manuscript. Revised manuscript states: “Using open chromatin assay and histone modification datasets from the ENCODE project, we found enrichment (p < 0.05) of DMRs in open chromatin regions (ATAC-seq), enhancer regions (H3K4me1), and active gene promoter regions (H3K27ac), in mouse fetal forebrain tissue and fetal liver (Table 2).”

      • Lines 165-167 - Please describe the analyses and metrics used to determine if the DNAm differences were mitigated in the HMD groups. As it stands, it is not clear if they are simply not significant, or if the delta was decreased. In terms of a figure, a scatter plot of the deltas for these DMRs would be better suited to visualizing these changes.

      To determine whether DMRs were mitigated we simply applied the same statistical testing procedure on the subset of PAE DMRs in the group of mice exposed to the HM diet. The sample size is the same, and the burden of multiple testing is reduced as we did not test the entire genome. We believe our interpretation stands, although we have urged caution in the discussion as follows (line 319):

      “Another key finding from this study was that HMD mitigated some of the effects of PAE on DNA methylation. Although a plausible alternative explanation is that some of the PAE regions were not reproduced in the set of mice given the folate diet, our data are consistent with preclinical studies of choline supplementation in rodent models (34, 35) (36). Moreover, a subset of PAE regions were statistically replicated in subjects with FASD, suggestive of robust associations. Although our findings should be interpreted with caution, they collectively support the notion that alcohol induced perturbation of epigenetic regulation may occur, at least in part, through disruption of the one-carbon metabolism.”

      • Given the lenient threshold to identify DMRs, it is possible that PAE-associated DMRs are simply false positives and do not "replicate" in a different subset of animals. One way to check this would be to determine whether there are any differences between mitigated/unmitigated DMRs and the strength of their initial associations. Should the mitigated DMRs skew towards higher p-values and lower deltas, one might consider that these findings could be false positives.

      We appreciate the reviewer's concern about potential false positives due to the chosen DMR identification threshold. We reiterate the DMR calling thresholds were adjusted for local FDR; however, we acknowledge the need for further validation. We haven't observed this trend of mitigated DMRs having higher p-values and lower deltas, but we have replicated some PAE DMRs in independent human datasets and found support for their biological plausibility in the context of PAE.

      • Related to the HMD analyses, I am concerned that the EtOH-HMD group consumed less alcohol, which could manifest in the PAE-induced DMRs disappearing, unrelated to the HMD exposure. The authors should comment on whether the pups were matched for ethanol exposure and include sensitivity analyses that include ethanol level as a covariate to confirm that their results are not simply due to decreased alcohol exposure.

      We appreciate the reviewer's concern regarding the lower alcohol consumption by dams in the EtOH-HMD group and its potential impact on DMRs. We agree that consistent in utero exposure is crucial for reliable results. Although the average alcohol consumption was lower for the EtOH-HMD group, we matched pups across treatment groups on litter size as a proxy for in utero alcohol exposure, excluding pups with significantly different exposure levels. We agree more robust methods, including direct measurement of blood alcohol content, would improve the study. We have now incorporated this into the discussion of the revised manuscript at line 351: “Additionally, we employed an ad-libitum alcohol exposure model rather than direct dosing of dams. Although the trajectories of alcohol consumption were not statistically different between groups, this introduces more variability into alcohol exposure patterns, and might impact offspring methylation data.”

      • Lines 172 - please be more specific about the neurocognitive domains tested.

      In the revision we have included more detail about the neurocognitive domains tested (originally mentioned in the results) in the methods as follows:

      “These tests included the open field test (locomotor activity, anxiety) (38), object recognition test (locomotor activity, spatial recognition) (39), object in place test (locomotor activity, spatial recognition) (40), elevated plus maze test (locomotor activity, anxiety) (41), and two trials of the rotarod test (motor coordination, balance) (42)”

      • Line 191 - please report the tissue type used in the human study, as well as the method used to estimate cell type proportions.

      We stated in the results section that buccal swabs were used in both human cohorts.

      We added to the revised manuscript that cell type proportions were estimated using the EpiDISH R package.

      • Related to validation, it is unclear whether the human-identified DMRs were also validated in mice, or if the authors are showing their own DMRs. Please also discuss why DMRs might not have been replicated in AQUA.

      We used human data sets to validate observations from our murine model, focusing on regions identified in our early moderate PAE model. This is now explicitly stated on line 209 of the revision:

      “We undertook validation studies by examining PAE sensitive regions identified in our murine model using existing DNA methylation data from human cohorts to address the generalizability of our findings.”

      In the section entitled ‘Candidate Gene Analysis..’ we used our murine data sets to reproduce previously published associations that included regions identified in both animal and human studies. We posit the lack of replication of our early moderate PAE regions in AQUA is explained in part by species-specific differences and, considering the striking differences in effect size seen in regions that did replicate in FASD subjects, the exposure may need to be of sufficient magnitude and duration for the effects seen in brain and liver to survive reprogramming in the blood. The AQUA cohort is largely enriched for low to moderate patterns of alcohol consumption.

      • Line 197 - please provide a citation for the ethanol-sensitive regions. There are also several existing DNAm analyses in brain tissues from animal models that should be included as part of these analyses, as several have shown brain-region and sex-specific DMRs related to prenatal alcohol exposure. These contrasts might help the authors further delineate the effects of prenatal alcohol in their model and expand on current literature to explain the deficits caused by alcohol exposure.

      Our candidate gene/region selection was informed by a systematic review of previously published human and animal studies reporting associations between in utero exposure to PAE and offspring DNA methylation. We synthesized evidence across several models, tissues and methylation platforms to arrive at a core set of reproducible associations. Line 481 of the methods now includes a citation to our systematic review which details our selection criteria.

      Discussion

      • Line 211 - This is a strong statement for one hypothesis. It is also possible that different cell types have similar responses to prenatal alcohol exposure. In this scenario, perturbations need not arise before germ layer separation. The authors should soften this causal statement.

      We appreciate this point although given the genome size relative to the size of the DMRs we have detected, the chance that different cell types would respond similarly in exactly the same regions seems exceedingly rare. We posit a more likely explanation is early perturbations in the embryo are established stochastically as a result of the exposure (supported by the interventional design) and maintained in the differentiating tissues. We agree further work is needed to prove this, specifically in a wider set of tissues from multiple germ layers so we have amended the discussion as follows:

      “These perturbations may have been established stochastically because of alcohol exposure in the early embryo and maintained in the differentiating tissue. Further analysis in different germ layer tissues is required to formally establish this.”

      • Lines 222-224 - I completely agree with this statement. However, the authors had the opportunity to examine dosage effects in their model as they measured alcohol-levels from the dams. At the very least, I would recommend sensitivity analyses in their DMRs to assess whether alcohol level/dosage influences their results.

      Although this is a great suggestion to improve the manuscript, we did not have the opportunity to examine dosages by design, as we selected mice for genome analysis with matched exposure patterns. It would be fascinating to conduct a sensitivity analysis.

      Methods:

      • Please include the lysis protocol.

      Thank you for picking up this error in our reporting. We have now included the following details in the methods which improve the reproducibility of this study: “Ten milligrams of tissue were collected from each liver and brain and lysed in Chemagic RNA Tissue10 Kit special H96 extraction buffer”.

      • Please include the total reads for each sample and details of the QC pipeline, including filtering flags, quality metrics, and genome build.

      Thank you for suggesting improvements to our reporting which improve the reproducibility of this study. We have included a new supplementary table of sequencing statistics and details of the quality metrics. Please note the genome build is explicitly stated in the methods already.

      • Please make your code publicly available to ensure that these analyses can be replicated.

      Thank you for this suggestion. A data availability statement has now been included in the revision, and code will be made available upon request.

      • Why were Y chromosome reads included in the dataset?

      Y chromosomal reads were not included in the DMR analysis. Amended “We filtered the X chromosomal reads” to “We filtered the sex chromosomal reads” in revised manuscript.

      • Please provide the number of total CpGs available for analysis.

      Added sentence into results section of revised manuscript: “A total of 21,842,961 CpG sites were initially available for analysis.” We also clarified that the ~19,000,000 CpGs were analysed following coverage filtering.

      • Please provide the parameters for the DMR analysis and report how the p-values and deltas were calculated.

      We have addressed this in previous comments

      • The supplemental materials for the human data are missing.

      Thank you for picking up this oversight. The revision now includes an additional data supplement which details the analysis of the human data sets for interested readers.

      Tables and figures

      • Table 1. It is not clear how the DMRs for this table were selected. The exact p-values and FDR should also be reported in this table. The number of CpGs in these DMRS should also be reported.

      Table 1 includes select DMRs that were consistently detected in both brain and liver tissue. These are particularly of interest as they represent regions highly sensitive to alcohol exposure. We agree that exact reporting of p-values would be ideal. Instead of a single p-value for the whole DMR, DSS uses the area statistic to rank candidate regions and control the false discovery rate (FDR) through shrinkage estimation methods. In the revision we have now included region size and number of CpGs in table 1.

      • Table 3. Please include p-values for the DMR analyses.

      As above we report the area-statistic which is an equivalent measure to assess evidence for differential methylation.

      • Figure 2 (Figure 4 in revised manuscript). Please report the N for these analyses. It also seems that the pairwise t-tests were only compared to the H2O-NC, which does not provide much insight into the PAE group. The relevance of the sexP analysis to the present manuscript is also unclear.

      Figure 2 is now Figure 4 in the revision and the sample size has been included in the figure legend. We compared all groups to the control group (H2O-NC) as we aimed to determine any differences in intervention groups from the control.

      We apologise for the lack of clarity around the ‘sex P’ terminology. This refers to the p-value for the main effect of sex on the behavioural outcome. We agree it lacks relevance since the regression models were adjusted for sex. In the revision we have updated the methods as follows (line 426) and removed references to sex P:

      “To examine the effect of alcohol exposure on behavioural outcomes we used linear regression with alcohol group (binary) as the main predictor adjusted for diet and sex.”

      • Figure 3ef (Figure 2ef in revised manuscript). It is unclear how the random regions were generated. A permutation test would be relevant to determine whether there are any actual enrichment differences.

      As stated in methods section: “DMRs were then tested for enrichment within specific genic and CpG regions of the mouse genome, compared to a randomly generated set of regions in the mouse genome generated with resampleRegions in regioneR, with equivalent means and standard deviations.”

      • Figure 5. Please include the gene names for these DMRs, as well as their genomic locations. It would also be relevant to annotate these plots with the max, min, and mean delta between groups.

      Thank you. We considered this; however, the DMRs are not in genes, so we cannot apply a gene label. The locations are reported on the x-axis and the statistics are shown in Table 3.

      • Figure S1b and S2c- It is quite worrisome that the PAE-HMD group drank less throughout pregnancy than their PAE counterparts. Please discuss how this was addressed in the analyses.

      We appreciate the reviewer's concern regarding the lower alcohol consumption in the PAE-HMD group and its potential impact on DMRs. We agree that consistent in-utero exposure is crucial for reliable results. Although the total amount of liquid consumed over pregnancy was lower in this group, they started with a lower baseline and the trajectory was not statistically different compared to other groups.

      We have now incorporated this into the discussion section of the revised manuscript on line 336: “Additionally, we employed an ad-libitum alcohol exposure model rather than direct dosing of dams. Although the trajectories of alcohol consumption were not statistically different between groups, this introduces more variability into alcohol exposure patterns, and might impact offspring methylation data.”

      • Figure S1cd. See my comments about Figure 2.

      Suggested changes have been incorporated.

      • Figure S2d. it is not clear to what the statistics presented in this panel refer. Please clarify and discuss the implications of dietary intake differences on your findings.

      Added sentence to caption in revised manuscript: “Statistical analysis involved linear mixed-effects regression comparing trajectories of treatment groups to H2O-NC baseline control group.”

      • Figure S3. See my comments about Figure 2.

      Suggested changes have been incorporated.

      • Figure S4. I am confused by the color legend, as it seems both colors are PAE. I also do not see how any regions show increased or decreased DNAm in PAE based on this plot (also no statistics are presented to support these conclusions).

      The plot is intended to show that there are no gross changes in methylation when averaged across all CpGs within different regulatory genomic contexts. Statistics are not included as it is intuitive from the plot that the means are the same. We have updated the figure legend, which now reads:

      “Figure S4. No evidence for global disruption of methylation by PAE. The figure shows methylation levels averaged across CpGs in different regulatory genomic contexts. Neither brain tissue (A & B), nor liver tissue (C & D) were grossly affected by PAE exposure (blue bars). Bars represent means and standard deviation.”

    2. Reviewer #2 (Public Review):

      Summary:

      Bestry et al. investigated the effects of prenatal alcohol exposure (PAE) and high methyl donor diet (HMD) on offspring DNA methylation and behavioral outcomes using a mouse model that mimics common patterns of alcohol consumption in pregnancy in humans. The researchers employed whole-genome bisulfite sequencing (WGBS) for unbiased assessment of the epigenome in the newborn brain and liver, two organs affected by ethanol, to explore tissue-specific effects and to determine any "tissue-agnostic" effects that may have arisen prior to the germ-layer commitment during early gastrulation. The authors found that PAE induces measurable changes in offspring DNA methylation. DNA methylation changes induced by PAE coincide with non-coding regions, including enhancers and promoters, with the potential to regulate gene expression. Though the majority of the alcohol-sensitive differentially methylated regions (DMRs) were not conserved in humans, the ones that were conserved were associated with clinically relevant traits such as facial morphology, educational attainment, intelligence, autism, and schizophrenia. Finally, the study provides evidence that maternal dietary support with methyl donors alleviates the effects of PAE on DNA methylation, suggesting a potential prenatal care option.

      Strengths:

      The strengths of the study include the use of a mouse model where confounding factors such as genetic background and diet can be well controlled. The study performed whole-genome bisulfite sequencing, which allows a comprehensive analysis of the effects of PAE on DNA methylation.

      Weaknesses:

      Transcriptome analysis to test if the identified DMRs indeed affect gene expression would help determine the potential function of the identified methylation changes.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We greatly appreciate the editor and reviewers’ careful and professional assessment of this manuscript. We are delighted with the reviewers’ instructive comments and suggestions. We have tried to address the raised points comprehensively. The reviewers’ scrutiny has helped us immensely to discuss and present our work extensively and properly. We are grateful for the reviewers’ efforts and insights. The detailed responses are listed here.

      Recommendations for the authors

      (1) The intuition behind the model is not properly explained, i.e., the derivation of Eqs. 1-2 and the biological meaning of the AA/OO logic modes. A different notation could be helpful.

      We thank the reviewers for this comment, and agree that the interpretation of our model in the manuscript was indeed in need of improvement. We have incorporated this suggestion into the manuscript. For clarity, we have substituted AND-AND/OR-OR for the original AA/OO notation, and hope that the new notation is helpful for interpreting our work.

      In general, considering the diverse audience including those with experimental background, we feel that it is essential to present this manuscript in a more digestible manner. We therefore retain the entire derivation of Eqs. 1-2 in the supplementary method. We have added a qualitative introduction to model derivation and molecular biological significance underlying different logic motifs (AND-AND/OR-OR) in the revised manuscript. Please refer to Page 5 of the revised manuscript, lines 161-167 (see below).

      “X and Y are TFs in the CIS network. n1 and n2 are the coefficients of molecular cooperation. k1-k3 in Eq1 and k4-k6 in Eq2 represent the relative probabilities for possible configurations of binding of TFs and CREs (Fig2.A). d1 and d2 are degradation rates of X and Y, respectively. Here, we considered a total of four CRE configurations as shown in Figure 2A (i.e., TFs bind to the corresponding CREs or not, 2^2 = 4). Accordingly, depending on the transcription rates (i.e., r0x, r1, r2, r3 in Eq1, similarly in Eq2) of each configuration, we can model the dynamics of TFs in the Shea-Ackers formalism[1, 2].

      Thus, the distinct logic operations (AND/OR) of two inputs (e.g., activation by X itself and inhibition by Y) can be further implemented by assigning a corresponding profile of transcription rates to the four configurations (Fig2.A). From the perspective of molecular biology, the regulatory logics embody the complicated nature of TF regulation, in which TFs function in a context-dependent manner. Considering the CIS network, when X and Y bind their respective CREs concurrently, whether the expression of the target gene is turned on or off depends on the regulatory logic (specifically, off in the AND logic and on in the OR logic; Fig2.A). Notably, instead of exploring the different logics of one certain gene[3, 4], we focus on different combinations of regulatory logics, because the dynamics of cell fate decisions are generally orchestrated by GRNs with multiple TFs.”
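      To make the idea of "assigning a profile of transcription rates to four configurations" concrete, the sketch below uses a generic Shea-Ackers-style weighted average. The exact functional form of Eqs. 1-2 is given in the supplementary method and is not reproduced here, so the formula, the rate profiles `AND_RATES` and `OR_RATES`, and all numerical values are assumptions chosen only to match the qualitative description (both TFs bound: off under AND, on under OR).

```python
def shea_ackers_rate(x, y, k, r, n1=2, n2=2):
    """Weighted-average transcription rate over four CRE configurations.

    k = (k1, k2, k3): relative weights of the X-only, Y-only and
        both-bound configurations (the unbound configuration has weight 1).
    r = (r0, r1, r2, r3): transcription rates of the unbound, X-only,
        Y-only and both-bound configurations.
    The functional form is an illustrative assumption.
    """
    weights = (1.0, k[0] * x**n1, k[1] * y**n2, k[2] * x**n1 * y**n2)
    return sum(w * ri for w, ri in zip(weights, r)) / sum(weights)


def dx_dt(x, y, k, r, d1=1.0):
    """Production minus first-order degradation for TF X."""
    return shea_ackers_rate(x, y, k, r) - d1 * x


# Hypothetical rate profiles for a gene activated by X and inhibited by Y.
AND_RATES = (0.0, 1.0, 0.0, 0.0)  # on only when X is bound AND Y is not
OR_RATES = (1.0, 1.0, 0.0, 1.0)   # on when X is bound OR Y is not
K = (1.0, 1.0, 1.0)               # hypothetical binding weights

# With both TFs abundant, the both-bound configuration dominates.
both_high_and = shea_ackers_rate(100.0, 100.0, K, AND_RATES)
both_high_or = shea_ackers_rate(100.0, 100.0, K, OR_RATES)
```

      Under these assumptions `both_high_and` is close to 0 and `both_high_or` is close to 1, which is the distinction between the two logics described in the quoted passage.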

      (2) More clearly specify the used parameters and how these are chosen. This would be helpful to get a more quantitative grasp of the conditions that they compare.

      We appreciate the reviewers pointing out unspecified parts in the main text. We have now included related discussion in the revised manuscript. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”).

      We would like to highlight that the Boolean models with different logic motifs (Fig. 2B) explicitly display the difference of state spaces (i.e., attractor basin). Moreover, as the focus of this work is on the role of regulatory logics in cell fate decisions, we consider it reasonable to specify the geometry of the landscape based on the hint from Boolean models. Therefore, we reason that it is intuitive and reliable to assign parameter values by mapping our ODE models (Eqs. 1-2) to the corresponding Boolean models qualitatively (refer to the statement in our original manuscript, Page 5, lines 162-163, “With appropriate parameters, we are able to reproduce the Boolean-like attractor basin in the continuous models”). In producing Figures 2-5, parameters were set heuristically, without a systematic search. However, to draw general conclusions, such as the "trade-offs between progression and accuracy" and the presence of the fully-connected stage, we sampled a substantial number of parameter sets to ensure statistically robust findings.

      (3) Include the explanation of how the nullclines and basins shown in the figures (e.g., Fig. 2C, Fig. 4C, Fig. 4F, etc.) are calculated.

      We thank the reviewers for this suggestion. We have incorporated this into the legend of corresponding figures when first mentioned in the main text. Please refer to Page 7 of the revised manuscript, lines 217-223 (see below).

      “Fig2.C:

      (C) State spaces of the AND-AND (top panel) and OR-OR (bottom panel) motifs in ODE models. Dark and red lines represent nullclines of X and Y, respectively. Stable steady states (SSSs) are denoted as orange dots. Unstable steady states (USSs) are denoted as white dots. Each axis represents the concentration of a transcription factor, in arbitrary units. Blue, green and purple areas in state spaces indicate attractor basins representing LX, S and LY, respectively. The color of each point in state space was assigned according to the attractor it finally enters in the deterministic models (Eq1, Eq2). These annotations were used for the following Figures 3-7.”

      (4) Clarity on the decisions in the work is needed. For example, the "introduction" of asymmetry of the noise levels (as stated in line 215) appears completely arbitrary. The reason behind it can be guessed in the following paragraph, but the reader shouldn't have to guess.

      We agree entirely with the reviewers’ comment. Indeed, this should have been stated more explicitly. The motivation for incorporating asymmetry in the noise levels stems from our endeavor to mimic the inherent biological variability in gene expression within a cell population. We have adjusted the manuscript to better convey the motivation for investigating asymmetric noise level. Please refer to Page 8 of the revised manuscript, lines 237-238 (“In biological systems, it is unlikely that the noise level of different genes is kept perfectly the same.”).

      (5) Arbitrary and/or out-of-context jargon is used throughout the manuscript, making it hard to read and follow what the authors mean in some cases. For example, "temporal fully-connected stage" is used for the first time in line 290, and the term is not explained either in the main text or in the manuscript. Similarly, the reference to a Boolean-like and Boolean model (line 163 and Figure 1) without clarifying if this is just an analogy or if a formal model is built, nor the utility and implications of this comparison. Another problem related to jargon occurs on line 291, where the authors talk about "parameter sensibility", but such analysis (as it is normally understood in the field) is never performed; the authors perform a parameter exploration and make some general conclusions about the parameter space, but that is different than a parameter sensitivity analysis.

      We thank the reviewers for this comment, as it has prompted us to better clarify our manuscript. We have reviewed the manuscript and made the necessary adjustments to improve its clarity. We do hope that this revision meets the reviewers’ expectations on the clarity and comprehensiveness of our analysis.

      Regarding the term "temporal fully-connected stage", we realized that it was vague and in need of improvement. Instead, we now employ “transitory fully-connected stage” in the revised manuscript to underline the brief emergence of this particular stage. Please refer to Page 11 of the revised manuscript, line 323.

      We thank the reviewers for pointing out the lack of clarity concerning the Boolean models. We have now amended the manuscript to make this implicit expression explicit. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B; see Methods), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”). Specifically, we employed the Boolean models (Fig.2B) as the reference to assist us to heuristically evaluate the applicability of used parameters in the ODE models. Therefore, the Boolean models are built formally, and corresponding updated rules are listed in Fig.2A (refer to the middle row in the table called “Logic Function”, now also noted in the legend of Fig.2B, Page 7, lines 213-214). Nevertheless, we do utilize the analogy between the attractor basins from Boolean models and ODE models (refer to Fig.2B-C). Accordingly, we used the term “Boolean-like” to describe the landscape presented by the continuous models (Eqs. 1-2; refer to the statement in our original manuscript, Page 5, lines 162-163, “With appropriate parameters, we are able to reproduce the Boolean-like attractor basin in the continuous models”).
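      As an illustration of how a Boolean reference model of this kind yields attractors, the sketch below enumerates the fixed points of a two-gene network under synchronous updating. The update rules are an assumption based on the description in this response (each TF activates itself and is inhibited by the other, combined by AND or by OR); the actual rules are those listed under “Logic Function” in Fig.2A.

```python
from itertools import product

# Assumed update rules: self-activation combined with inhibition by the other TF.
RULES = {
    "AND-AND": lambda x, y: (x and not y, y and not x),
    "OR-OR": lambda x, y: (x or not y, y or not x),
}


def fixed_points(rule):
    """Return all states (X, Y) that map to themselves under the rule."""
    return [s for s in product((False, True), repeat=2) if tuple(rule(*s)) == s]


def basin_of(rule, state, max_steps=8):
    """Follow synchronous updates from `state` until a fixed point is reached."""
    for _ in range(max_steps):
        nxt = tuple(rule(*state))
        if nxt == state:
            return state
        state = nxt
    return None  # no fixed point reached (e.g. a cycle)


attractors = {name: fixed_points(rule) for name, rule in RULES.items()}
```

      Under these assumed rules each motif has three fixed points: two states with a single TF on, plus a third state that is (off, off) for AND-AND and (on, on) for OR-OR. This is the kind of qualitative target against which the ODE parameters can be checked heuristically.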

      We appreciate the reviewers for this valuable comment, and agree that the usage of “parameter sensibility” was in need of adjustment. We have now amended the manuscript. Please refer to Page 10 of the revised manuscript, lines 318-321 (see below).

      “To manifest the generality, we globally screened 6,213 groups of parameter sets under the AND-AND motif, and this logic-dependent intermediated stage can be observed for 82.7% of them (see Methods; Table S1), indicating little dependence on particular parameter setting (1.8% in the OR-OR motif).”

      (6) Probably related just to the language clarity (i.e., the abuse of jargon), but we don't understand the conclusion on lines 296-298.

      We thank the reviewers for this comment. We have adjusted the manuscript accordingly. Please refer to Page 11 of the revised manuscript, lines 323-327 (see below). And we hope that the reviewers agree with our attempt at mapping into the particular stage in cell fate decisions from the point of landscape.

      “Furthermore, this transitory fully-connected stage locates between the fate-undetermined stage (Fig4.C top panel) and fate-determined stage (Fig4.C 3rd panel), comparable to the initiation (or activation) stage before the lineage commitment in experimental observations [5-7]. Therefore, we suspected that the robust fully-connected stage in the AND-AND motif may correspond to a specific period in cell fate decisions.”

      (7) The so-called "solution landscape" in Figure 4E needs to be better explained.

      We thank the reviewers for this comment. We have introduced the concept of solution landscape, which is a pathway map consisting of all stationary points and their connections, in lines 196-198 of the revised manuscript (see below).

      “Furthermore, we introduced the solution landscape method. Solution landscape is a pathway map consisting of all stationary points and their connections, which can describe different cell states and transfer paths of them [82-84].”

      In Figure 4E, we added detailed explanation of the solution landscape for the AND-AND motif. Specifically, it describes a hierarchical structure including one 2-saddle (yellow triangle), three 1-saddles (crimson X-cross sign), and three attractors (green dot). The layer of 1-saddles is represented by a blue translucent plane, and the bottom layer is the flow field diagram. The connections from 2-saddle to 1-saddles and from 1-saddles to the attractors are represented by red and blue lines, respectively. The arrow and color of the heatmap correspond to the flow direction and the length of the acceleration at each point in the state space.

      (8) Table S1 is not properly annotated, and then it is impossible to interpret how it supports the observations in the paragraph in lines 342-342.

      We appreciate the reviewers’ useful feedback. We have refined the annotations of all tables in our manuscript (Table S1-3). Please refer to “Supplementary Table” in resubmitted files.

      Specifically, we randomly collected 6,231 sets of parameters for the AND-AND motif and 6,682 sets for the OR-OR motif (k1-k6 in Eq1 and Eq2; refer to Page 6 of the revised supplementary method, see below).

      “First, to collect parameter sets with 3 SSSs, we used Latin hypercube sampling (LHS) to screen k-series parameters symmetrically (i.e., k1 = k4, k2 = k5, k3 = k6) ranging from 0.001 to 5 both in the AND-AND and OR-OR motifs. We ultimately collected 6,231 sets for the AND-AND motif and 6,682 sets for the OR-OR motifs (Table S1).”

      To analyze the sequence of vanishing SSSs, we further filtered for parameter sets in which 2 SSSs remained as ux increased (corresponding to Eq3 in the revised manuscript, Page 10, line 293). This yielded 6,207 sets for the AND-AND motif and 6,634 sets for the OR-OR motif. Based on these parameter settings, we checked whether the observations (refer to Page 13, lines 377-378, “The distinct sequences of attractor basin disappearance as ux increasing can be viewed as a trade-off between progression and accuracy.”) are artifacts of a particular parameter choice.
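      The sampling step quoted above (Latin hypercube sampling of k1-k3 with the symmetry k1 = k4, k2 = k5, k3 = k6, over the range 0.001 to 5) can be sketched as follows. This is a plain-Python illustration, not the authors' code: sampling on a linear rather than logarithmic scale, the sample size, and the function names are assumptions. The subsequent filtering for three SSSs is not shown.

```python
import random


def latin_hypercube(n_samples, n_dims, low, high, seed=0):
    """Draw n_samples points so each dimension has one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of n_samples equal-width strata,
        # then shuffle so strata are paired at random across dimensions.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)
        columns.append([low + u * (high - low) for u in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def symmetric_parameter_set(k123):
    """Mirror (k1, k2, k3) onto (k4, k5, k6), as in the symmetric screen."""
    k1, k2, k3 = k123
    return {"k1": k1, "k2": k2, "k3": k3, "k4": k1, "k5": k2, "k6": k3}


LOW, HIGH, N = 0.001, 5.0, 10  # range from the text; N is illustrative
candidates = [symmetric_parameter_set(p) for p in latin_hypercube(N, 3, LOW, HIGH)]
```

      Each candidate would then be passed to a steady-state solver and kept only if the model has the required number of stable steady states.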

      (9) The flow in Section 5 needs to be reorganised. For instance, it is not clear which question the authors are addressing in line 395, or how the proposed approach answers the question stated in lines 381-382.

      We greatly thank the reviewers for pointing this out, and acknowledge that the Section 5 was definitely in need of improvement. We have now amended the manuscript to make this implicit understanding explicit. Please refer to Page 15 of the revised manuscript, lines 426-430 (see below).

      “In prior sections, we systematically investigated two logic motifs under the noise- and signal-driven modes in silico. With various combinations of logic motifs and driving forces, features about fate-decision behaviors were characterized by computational models. Next, we questioned whether observations in computation can be mapped into real biological systems. And how to discern different logic motifs and driving modes is a prerequisite for answering this question.

      To this end, we first evaluated the performance of different models, specifically in simulating the process of stem cells differentiating towards LX (Fig6.A).”

      (10) There are two important weak points for the successful classification of the regulatory logic of real gene expression data as presented in the manuscript: (1) the small number of time-points in the datasets and clear peaks in gene expression heterogeneity cannot be identified, and (2) it is not always clear whether cell differentiation really exclusively relies on a CIS network, and which genes constitute it. These limitations should be solved or at least discussed in the manuscript.

      We thank the reviewer for this comment. First, we agree entirely that analysis of datasets with more time points will be more amenable to identifying the trends of gene expression variation. We have made a concerted effort towards searching for such datasets, but unfortunately, there are not many such datasets publicly available. Specifically, to apply our computational framework, the datasets of our interest need to fulfill the following three characteristics: (i) sampling at multiple time points (as many as possible); (ii) to illustrate/validate our findings clearly and representatively, we would like the cell fate decisions in the biological systems to follow the classical binary tree-like pattern. i.e., there is one stem cell fate (or progenitor) and two downstream cell fates in the systems; (iii) the core GRN circuits for orchestrating the fate-decision processes have been experimentally confirmed (at least clearly supported). We have also extended the discussion to include above points to explicitly note the limitations regarding the used datasets. Please refer to Page 25 of the revised manuscript, lines 762-766 (see below).

      “The gene expression datasets analyzed here are only available for a limited number of time points. Though they meet the need for discerning trends, it is evident that the application to the datasets with more time points will yield clearer and less ambiguous changing trends to support the conclusions of this paper more generally.”

      Regarding the second point, we acknowledge that the CIS network may not always be the core module for every fate-decision case (but to our knowledge, this can be assumed in many cases, especially in the binary tree-like pattern). For applicability and potential relevance to our intended readership, we developed the models and drew our conclusions primarily based on the CIS topology because it is representative. We intend to incorporate diverse topologies (such as mutual activation with self-activation, feed-forward loops, etc.) into the computational framework presented here in the near future. Additionally, we have incorporated this point into the discussion in the revised manuscript. Please refer to Page 25 of the revised manuscript, lines 766-769 (see below).

      “Notwithstanding the fact that the CIS network is prevalent in fate-decision programs, there are other topologies of networks that serve important roles in the cell-state transitions, like feed-forward loop, etc. The framework presented in this work should further incorporate diverse network motifs in the future.”

      As the reviewers note, even given the CIS network, we may not be sure which genes constitute it in some cases. We agree that further extension of our framework to mining key regulators is an interesting question. We also note that we have become very enthusiastic about recent work that shows how to nominate core factors from high-throughput data[8, 9]. Of note, in the last section of our manuscript titled “The chemical-induced reprogramming of human erythroblasts (EBs) to induced megakaryocytes (iMKs) is the signal-driven fate decisions with an OR-OR-like motif”, we leveraged patterns of temporal expression variance to filter out key regulators (Fig7.F and H). We thus underline the potential of mining genes comprising core GRN circuits through expression variance. Nevertheless, as the focus of the present paper is on the role of regulatory logic in cell fate decisions, we feel it is beyond the scope of the present article to continue the development of our results on this point. Instead, we have included discussion of the case in which the genes comprising the CIS network are not defined. Please refer to Page 23 of the revised manuscript, lines 685-687 (see below).

      “Notably, if the genes constituting the CIS network are not specified, we can conversely leverage the patterns of temporal expression variance to nominate key regulators in a model-guided manner.”

      (11) The models used in Figure S5 are never clearly described.

      We thank the reviewers for pointing this out. We have now introduced the settings of the models used in Figure S5 more clearly in the legend (see below).

      Two logic motifs with the noise-driven mode (FigS5.A, see below):

      Author response image 1.

      “Initial values were identical with attractor of S fate in Figure 2C (SSSs in green attractor basins). Simulation was performed 1000 times for each pseudo-time point, with each temporal state (from left to right) recorded as a dot on the plot. Top panel: Noise level of X (σx) is set to 0.21, and σy is 0.09. Bottom panel: Noise level of Y (σy) is set to 0.21, and σx is 0.09. Red arrow represents the direction of fate transitions of S to LX. Other than adding a white noise, parameters were identical with those in Figure 2C.”

      Two logic motifs with the signal-driven mode (FigS5.B, see below):

      Author response image 2.

      “Initial values were identical with attractor of S fate in Figure 2C (SSSs in green attractor basins). Top panel: Noise level of X (σx) and Y (σy) are both set to 0.06. Simulation was performed 1000 times, with each final state recorded as a dot on the plot. Parameter ux switched from 0 to 0.09 (0, 0.045, 0.09, from left to right). Bottom panel: Noise level of X (σx) and Y (σy) are both set to 0.05. Simulation was performed 1000 times, with each final state recorded as a dot on the plot. Parameter ux switched from 0 to 0.24 (0, 0.12, 0.24, from left to right). Red arrow represents the direction of fate transitions of S to LX. Other model’s parameters were identical with those in Figure 2C.”
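      The repeated stochastic simulations described in these legends (a deterministic model plus white noise of strength σx and σy, run 1000 times) can be sketched with a generic Euler-Maruyama integrator. The integration scheme, the step size, and the placeholder `toy_drift` are assumptions; the authors' actual drift terms are Eqs. 1-2, and how the signal ux enters the model is not shown here.

```python
import math
import random


def euler_maruyama(drift, state, sigmas, dt=0.01, steps=200, seed=0):
    """Integrate d(X, Y) = drift dt + sigma dW with additive white noise."""
    rng = random.Random(seed)
    x, y = state
    sqrt_dt = math.sqrt(dt)
    for _ in range(steps):
        fx, fy = drift(x, y)
        x += fx * dt + sigmas[0] * sqrt_dt * rng.gauss(0.0, 1.0)
        y += fy * dt + sigmas[1] * sqrt_dt * rng.gauss(0.0, 1.0)
    return x, y


def toy_drift(x, y):
    """Placeholder drift (linear relaxation to 1); NOT the model of Eqs. 1-2."""
    return 1.0 - x, 1.0 - y


# 1000 independent runs with asymmetric noise (sigma_x = 0.21, sigma_y = 0.09),
# each started from the same initial state and recorded at its final state.
final_states = [
    euler_maruyama(toy_drift, (1.0, 1.0), (0.21, 0.09), seed=s) for s in range(1000)
]
```

      Replacing `toy_drift` with the right-hand sides of the model would give one dot per run, as in the scatter plots described above.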

      (12) Up until Section 5, "noise levels" have been used to refer to an input/parameter in the model. Here it is assumed as an emergent property. Are the authors talking about the variance in expression (e.g., see line 398)? Is it defined as the coefficient of variation? Clarity is essential to interpret the observations in this section, e.g., "different driving modes change in the patterns of noise rather than expression levels" (lines 399-400).

      We greatly appreciate the reviewers pointing this ambiguity out. The term “noise level” was indeed used to refer to the strength of the noise in the models in Sections 1-4. For classifying different logic motifs with two driving forces, we needed a practical metric that can be quantified from data, and we found that population-level gene expression variance (i.e., “noise level” in line 398), defined as the coefficient of variation, is useful. For clarity, we have substituted “expression variance” for “noise level” in Sections 5-6. We have amended the manuscript accordingly, and hope this revision will be helpful for interpreting our results. Please refer to Page 15 of the revised manuscript.

      (13) "Pulse-like behaviour" is used in an arbitrary way, not as it is normally used in the field. Moreover, we consider this jargon expression does not contribute to the understanding of the paper. (The authors probably meant "discrete transitions" vs "gradual transitions".)

      We appreciate the reviewers’ valuable feedback regarding our use of the term “Pulse-like behavior”. We agree with the reviewers’ statement, and acknowledge that terminology of noise level’s patterns between different driving modes (noise-driven vs signal-driven; refer to Section 5 in our manuscript) was in need of improvement.

      Upon comprehensive consideration, we primarily decided to adopt the terms “monotonic transitions” and “nonmonotonic transitions” to recapitulate the trends of noise level, underlining the distinct temporal noise’s patterns in cell fate decisions brought by two driving forces in a more contrastive way. We anticipate that current jargon expressions will be beneficial for interpreting our work. Please refer to Page 15 of the revised manuscript.
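      For readers who want the metric spelled out, the sketch below computes the coefficient of variation (standard deviation divided by mean) of expression across cells at each time point and labels the resulting trend as monotonic or nonmonotonic. The toy expression values, the use of the population standard deviation, and the helper names are assumptions for illustration only.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for zero mean")
    return pstdev(values) / m


def is_monotonic(series):
    """True if the series never decreases or never increases."""
    pairs = list(zip(series, series[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


# Hypothetical expression of one gene across four cells at three time points.
expression_by_time = {
    0: [10.0, 10.0, 10.0, 10.0],
    1: [5.0, 15.0, 5.0, 15.0],
    2: [9.0, 11.0, 9.0, 11.0],
}

cv_series = [coefficient_of_variation(expression_by_time[t]) for t in sorted(expression_by_time)]
trend = "monotonic" if is_monotonic(cv_series) else "nonmonotonic"
```

      In this toy example the variance rises and then falls, so the trend is labelled nonmonotonic. With only a few time points, as in the datasets discussed in this response, such a label should be read with caution.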

      (14) The temporal resolution of the scRNAseq datasets that the authors used is too low to unambiguously distinguish a discrete pattern of gene expression heterogeneity from a rising profile. This limitation needs to be at least acknowledged in the text. Alternatively, the authors might want to identify more recent datasets with higher time resolution.

      We appreciate the reviewers’ insightful suggestions. We agree that analysis of datasets with higher time resolution will be more unambiguous to identifying the trends of gene expression variation. We have made a concerted effort towards searching for such datasets, but unfortunately, there are not many such datasets publicly available. Specifically, to apply our computational framework, the datasets of our interest need to fulfill the following three characteristics: (i) sampling at multiple time points (as many as possible); (ii) to illustrate/validate our findings clearly and representatively, we would like the cell fate decisions in the biological systems to follow the classical binary tree-like pattern. i.e., there is one stem cell fate (or progenitor) and two downstream cell fates in the systems; (iii) the core GRN circuits for orchestrating the fate-decision processes have been experimentally confirmed (at least clearly supported). Nevertheless, we recognize this limitation should be mentioned in the paper. So, we have also extended the discussion to include above points. Please refer to Page 25 of the revised manuscript, lines 762-766 (see below).

      “The gene expression datasets analyzed here are only available for a limited number of time points. Though they meet the need for discerning trends, it is evident that the application to the datasets with more time points will yield clearer and less ambiguous changing trends to support the conclusions of this paper more generally.”

      (15) In the case of embryonic stem cell differentiation, an additional complication is that this protocol yields heterogeneous cell type mixtures, whereas the authors' simulations usually are designed to give differentiation towards a single cell type. This difference makes it difficult to compare measures of gene expression heterogeneity between simulations and the experimental system, and makes inferring the regulatory logic questionable.

      We thank the reviewers for this valuable comment and realize that we were not clear enough in the manuscript regarding the case of embryogenesis. In the biological system devised by Semrau et al[10], mouse embryonic stem cells (mESCs) differentiate into two lineages simultaneously, just as mentioned by the reviewers. We noticed this additional complication and performed other simulations in two logic motifs with increasing noise level of gene X and Y, as presented in Fig.S6E (see below).

      Author response image 3.

      “(E) Time courses on the coefficient of variation in expression levels of X and Y genes in silico during differentiation under the noise-driven mode. Initial values were set to the attractors of S fate in Figure 2C (SSSs in green attractor basins). Top panel: Noise level of X (σx) and Y (σy) are both set to 0.14. Bottom panel: Noise level of X (σx) and Y (σy) are both set to 0.1. Stochastic simulation was performed 1000 times for each pseudo-time point.”

      Given the noise-driven mode, we further employed the expression pattern of Gbx2-Tbx3 circuit to heuristically infer the logic motif.

      (16) In contrast to the hematopoiesis example, the authors do not focus on a specific gene regulatory circuit with the ESC dataset. How their approach is possible on genome-wide data needs to be discussed.

      We thank the reviewers for this comment. Indeed, the core GRN orchestrating the fate-decision process reported by Semrau et al[10] is not fully elucidated. We here focus on the Gbx2-Tbx3 circuit (Fig.6H, Fig.S6D). These two TFs were filtered out from 22 candidate TFs and suggested as potential key regulators in the original paper[10]. Accordingly, at this point we followed the original paper’s statement.

      In regards to extension into biological systems without specific gene regulatory circuits, we have included discussions about the possibility that genes comprising the CIS network are not defined. Please refer to Page 23 of the revised manuscript, lines 685-687 (see below).

      “Notably, if the genes constituting the CIS network are not specified, we can conversely leverage the patterns of temporal expression variance to nominate key regulators in a model-guided manner.”

      (17) [In supplemental material, pp.1] Possible typo: "In our word, we considered a GRN comprised...".

      Thanks for spotting this typo. We have amended it in the revised supplemental method (refer to Page 1 of the revised supplementary method).

      (18) [In supplemental material, pp.1] In Eqs. (1), the notation for the function HX([X]) implies that HX only depends on X, leaving the combinatorial regulation out. HX([X],[Y]) would be more general and accurate.

      Thanks for pointing this out. We have incorporated this suggestion into the manuscript. Please refer to Page 1 of the revised supplementary method.

      (19) [In supplemental material, pp.1] There are several works that have shown that the Hill coefficient is rarely representative of the number of binding elements. The model can be more general. See, for example, «Santillán, Moisés. "On the Use of the Hill Functions in Mathematical Models of Gene Regulatory Networks." Mathematical Modelling of Natural Phenomena 3, no. 2 (October 22, 2008): 85-97. https://doi.org/10.1051/mmnp:2008056.» and «Nam, Kee-Myoung, Rosa Martinez-Corral, and Jeremy Gunawardena. "The Linear Framework: Using Graph Theory to Reveal the Algebra and Thermodynamics of Biomolecular Systems." Interface Focus 12, no. 4 (June 10, 2022): 20220013. https://doi.org/10.1098/rsfs.2022.0013.»;

      We thank the reviewer for drawing our attention to this and highlighting the above works. Indeed, this is important information to include in the manuscript. We have incorporated this suggestion into the revised supplemental method (refer to Page 1 of the revised supplementary method). These references have now been included in the revised supplemental method (refer to references [2]-[3]).

      (20) [Minor] The configuration labels can be confusing, especially the AA, which is rather an AND NOT gate.

      We thank the reviewers for this comment. For clarity, we have substituted AND-AND/OR-OR for the original AA/OO labels, and hope that the new notation is helpful for interpreting our work.

      (21) [Minor] Very low printing quality in Figure 1.

      Thanks for the feedback regarding the printing quality of Figure 1. We have made the necessary adjustments to improve its quality. We have also ensured that all other figures in the manuscript meet the required standards.

      (22) [Minor] We suggest including a quantitative scale for the bias in Fig. 3E.

      Thanks, we have incorporated this suggestion into the manuscript.

      (23) [Recommendation] Authors could also evaluate the cell fate decision processes as mutations or other perturbations affect a regulatory network.

      We thank the reviewers for this valuable recommendation. We agree that including new cases would be helpful, especially mutation-driven, disease-related fate-decision processes such as neutropenia in chemotherapy. However, given the considerable effort required to search for appropriate datasets, we have decided not to make this change.

      (24) [Recommendation] The authors could include some discussion of the likely impact of the work on the field and the utility of the methods and data to the community. For example, understanding the fluidity of the epigenetic landscape and the regulatory forces behind cell fate decisions can be of great importance in designing synthetic gene regulatory circuits.

      We greatly appreciate the reviewers pointing this out. In the original manuscript, we intentionally limited the length of the discussion to keep the whole story more focused. We thank the reviewers for their insightful suggestions regarding the content of the discussion. We have incorporated this suggestion into the revised manuscript. Please refer to Page 25, lines 751-757 (see below).

      “Recently, synthetic biology has realized the insertion of the CIS network in mammalian cells. One of the prerequisites for recapitulating the complex dynamics of fate transitions in synthetic biology is a systematic understanding of the role of GRNs and driving forces in differentiation. Logic motifs are essential and indispensable elements of GRNs. Our work also provides a blueprint for designing logic motifs with particular functions. We are also interested in validating the conclusions drawn from our models in a synthetic biology system.”

      In addition, a longstanding question of interest to us in cell fate decisions is what accounts for the distinctive development across species such as human and mouse. Beyond protein-coding sequences, regulatory interactions between genes (i.e., activation and inhibition) also exhibit conservation, as reported in recent work on a multi-species cell atlas [11], and it is generally acknowledged that gene regulatory networks (GRNs) orchestrate fate-decision procedures. Namely, conserved regulatory programs imply a conserved topology of core GRNs. Thus, the logic of regulation, as another vital element of GRNs, is naturally in the spotlight (related to the introduction, lines 99-120 of the revised manuscript). Nevertheless, to our knowledge, regulatory logic in cell fate decisions has received only scant attention. We hope that our elucidation of the role of logic motifs in cell fate decisions will attract more inquiries in the community into the regulatory logic of GRNs.

      Public reviews

      In this manuscript, Xue and colleagues investigate the fundamental aspects of cellular fate decisions and differentiation, focusing on the dynamic behaviour of gene regulatory networks. The manuscript explores the debate between static (noise-driven) and dynamic (signal-driven) perspectives within Waddington's epigenetic landscape, highlighting the essential role of gene regulatory networks in this process. The authors propose an integrated analysis of fate-decision modes and gene regulatory networks, using the Cross-Inhibition with Self-activation (CIS) network as a model. Through mathematical modelling, they differentiate two logic modes and their effect on cell fate decisions: one in which transcription requires both the presence of an activator and the absence of a repressor (AA configuration), and one in which transcription occurs as long as the repressor is not the only species on the promoter (OO configuration).
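
      The two integration logics described in this summary can be illustrated with a minimal Boolean sketch. It assumes, for illustration only, that each promoter has one activator and one repressor (as for each gene in the CIS network); the function names `and_logic`, `or_logic` and `truth_table` are hypothetical and do not come from the manuscript.

```python
from itertools import product


def and_logic(activator: bool, repressor: bool) -> bool:
    """AND-type promoter: transcribed only if the activator is present
    and the repressor is absent."""
    return activator and not repressor


def or_logic(activator: bool, repressor: bool) -> bool:
    """OR-type promoter: transcribed unless the repressor is the only
    species on the promoter."""
    return activator or not repressor


def truth_table(logic):
    """Map every (activator, repressor) combination to the promoter output."""
    return {(a, r): logic(a, r) for a, r in product([False, True], repeat=2)}


if __name__ == "__main__":
    for name, logic in [("AND", and_logic), ("OR", or_logic)]:
        print(name, truth_table(logic))
```

      Under these assumptions the two logics differ in two of the four promoter states: the empty promoter and the promoter occupied by both species.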

      The authors establish a relationship between noise profiles, logic-motifs, and fate-decision modes, showing that defining any two of these properties allows the inference of the third. They also identify, under the signal-driven mode, two fundamental patterns of cell fate decisions: either prioritising progression or accuracy in the differentiation process. The authors apply this analysis to available high-throughput datasets of cell fate decisions in hematopoiesis and embryogenesis, proposing the underlying driving force in each case and utilising the observed noise patterns to nominate key regulators.

      The paper makes a substantial contribution by rigorously evaluating assumptions in gene regulatory network modelling. Notably, it extensively compares two model configurations based on different integration logic, illuminating the consequences of these assumptions in a clear, understandable manner. The practical simulation results effectively bridge theoretical models with real biological systems, adding relevance to the study's insights. With its potential to enhance our understanding of gene regulatory networks across biological processes, the paper holds promise. Its implications extend practically to synthetic circuit design, impacting biotechnology. The conclusions stand out, addressing cell fate decisions and noise's role in gene networks, contributing significantly to our understanding. Moreover, the adaptable approach proposed offers versatility for broader applications in diverse scenarios, solidifying its relevance beyond its current scope.

      We thank the reviewers for their enthusiasm for our work, and appreciate the professional, insightful and encouraging assessment.

      However, the manuscript in its current form also has some important weaknesses, including the lack of clarity in the text and the questionable generality of specific observations.

      We thank the reviewers for this comment. We have reviewed the manuscript and made the necessary adjustments to improve its clarity. We do hope that this revision meets the reviewers’ expectations on the clarity and comprehensiveness of our analysis.

      For instance, even when focusing on the CIS network, the effect of alternative model implementations is not discussed. Notably, the input signals are only considered as an additive effect over the differential equations, while signals can potentially affect each of the individual processes.

      We agree with the reviewers’ comment that signals may act at each level of the central dogma, including transcription, translation, etc. We have also included an additional section titled “limitation of this study” on this point in the revised manuscript, which explicitly points to the potential limitations of our models. Please refer to Page 25 of the revised manuscript, lines 769-771 (see below).

      “In addition, for simplicity and intuition, we here considered signals as uncoupled and additive effects in ODE models, due to feasible mapping in real biological systems, such as ectopic overexpression.”

      The proposed model allows for a continuum of interactions/competition between transcription factors, yet only very restrictive scenarios are explored (strict AND/OR logic operations).

      We thank the reviewers for this comment, and appreciate them sharing the potential for further generalization of our framework. Indeed, in addition to logic operations, our framework can be applied to all two-node circuits (3^4 = 81 in total), including mutual activation with self-activation. As the focus of this work is to illustrate the role of logic motifs in cell fate decisions, we mainly concentrated on two classical, intuitive and representative (at least to us) logic operations, AND/OR, in the context of the CIS network. Nonetheless, we already have four combinations to consider (two logic motifs and two driving forces), and we feel that the scenarios currently covered are sufficient to demonstrate the role of logic motifs. Hence, we decided not to explore more logic operations in this work. Instead, we have included an additional section titled “limitation of this study” in the revised manuscript. Please refer to Page 25 of the revised manuscript, lines 760-762.

      “Although our framework enables the investigation of more logic motifs, we chose two classical and symmetrical logic combinations for our analysis. Future work should involve more logic gates like XOR and explore asymmetrical logic motifs like AND-OR.”
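
      The count of 81 two-node circuits can be reproduced with a short enumeration. This is a sketch under one assumption that is not spelled out in the response: each of the four possible edges (X→X, X→Y, Y→X, Y→Y) is taken to be activation, inhibition, or absent. The names `EDGES`, `STATES` and `enumerate_circuits` are illustrative.

```python
from itertools import product

# Assumed edge set of a two-node circuit and the three states per edge.
EDGES = ("X->X", "X->Y", "Y->X", "Y->Y")
STATES = ("activation", "inhibition", "none")


def enumerate_circuits():
    """Return every assignment of a state to each of the four edges."""
    return [dict(zip(EDGES, combo)) for combo in product(STATES, repeat=len(EDGES))]


circuits = enumerate_circuits()

# The CIS network (cross-inhibition with self-activation) is one such circuit.
cis = {"X->X": "activation", "X->Y": "inhibition",
       "Y->X": "inhibition", "Y->Y": "activation"}

if __name__ == "__main__":
    print(len(circuits))
    print(cis in circuits)
```

      The enumeration counts topologies only; a choice of logic motif (for example AND-AND or OR-OR) would be an additional layer on top of each topology.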

      Moreover, how the model parameters are chosen throughout the paper is not clear. Similarly, the concentration and times are not clearly specified, making their comparison to experimental data troublesome.

      We thank the reviewers for this comment. Regarding how to specify parameters in our model, we have now revised the manuscript. Please refer to Page 5 of the revised manuscript, lines 179-181 (“Benchmarking the Boolean models with different logic motifs (Fig2.B; see Methods), we reproduced the geometry of the attractor basin in the continuous models resembling those represented by corresponding Boolean models (Fig2.C; see Methods).”). In terms of concentration and time, we acknowledge that their units are arbitrary compared to a real experimental system. We now have noted this point in the legend of corresponding figures (Fig2.C, Fig3.B&D, Fig6.B-C, Fig7.E).

      We would like to highlight that our entire work is organized in a model-driven (also called top-down) fashion. We did not fine-tune the parameter sets used in our model to specifically match the experimental data. Indeed, parameterization is a longstanding challenge in computational biology, since experimental datasets are usually insufficient to specify the parameters of a dynamical model. In general, it is therefore inevitable to involve more assumptions, such as non-Markov processes [12, 13], which may lead to artifacts. Thus, we decided to draw qualitative conclusions (e.g., trends over time) from a quantitative model with sampling of parameter sets. Hence, we did not intentionally tailor our models to fit different datasets (i.e., all models used in our work share the same basic parameter settings), and instead mapped them onto real biological systems in a top-down manner.

      Regarding clarity, how the general model (equations 1-2) transforms into the specific cases evaluated in the paper is not clearly stated in the main text, nor are the positive and negative effects of individual transcription factors adequately explained. Similarly, in the main text and Figure 2, the authors refer to a Boolean model. However, they do not clearly explain how this relates to the differential equation model, nor its relevance to understanding the paper.

      We thank the reviewers for this comment, as it has prompted us to better clarify our manuscript. We have adjusted the manuscript accordingly and made the necessary adjustments to improve its clarity.

      Additionally, the term "noise levels" is generally used to refer to noise introduced in the "noise-driven" analysis (i.e., as an input or parameter in the models). Nonetheless, it is later claimed to be evaluated as an intrinsic property of the network (likely referring to expression level variability measured by the coefficient of variation).

      We greatly appreciate the reviewers pointing this ambiguity out. The term “noise level” was indeed used to refer to the strength of the noise in the models in Sections 1-4. To classify different logic motifs under the two driving forces, we needed a practical metric that can be quantified from data, and we found that population-level gene expression variance (i.e., “noise level” in line 398), defined as the coefficient of variation, is useful.

      For clarity, we have decided to substitute “expression variance” for “noise level” in Sections 5-6. We have amended the manuscript accordingly.
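
      As an illustration of the metric referred to here, the coefficient of variation of one gene across a cell population is the standard deviation divided by the mean. The sketch below assumes the population standard deviation; whether the manuscript uses the population or the sample form is not stated, and the function name and example values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Return std / mean for a sequence of expression levels.

    Uses the population standard deviation (an assumption of this sketch).
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.pstdev(values) / mean


if __name__ == "__main__":
    # Made-up expression levels of one gene in eight cells.
    expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(coefficient_of_variation(expression))
```

      Because the value is normalized by the mean, it can be compared between genes or time points with different average expression levels.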

      Finally, some jargon is introduced without sufficient context about its meaning (e.g., "temporal fully-connected stage").

      Regarding the jargon of "temporal fully-connected stage", we have realized that this term was slightly vague and in need of improvement. Instead, we now employ “transitory fully-connected stage” in the revised manuscript to underline the short emergence of this particular stage. Please refer to Page 10-11 of the revised manuscript, lines 316-327 (see below).

      “Notably, in the AND-AND motif we observed a brief intermediate stage before the S attractor disappears, where all three fates are directly interconnected (Fig4.C 2nd panel and D 2nd panel, Fig.4E). To manifest the generality, we globally screened 6,213 groups of parameter sets under the AND-AND motif, and this logic-dependent intermediate stage can be observed for 82.7% of them (see Methods; Table S1), indicating little dependence on particular parameter settings (1.8% in the OR-OR motif). Unlike the indirect attractor adjacency structure mediated by the S attractor (Fig2.D), the solution landscape with fully-connected structure facilitates transitions between any pair of fates. Furthermore, this transitory fully-connected stage lies between the fate-undetermined stage (Fig4.C top panel) and the fate-determined stage (Fig4.C 3rd panel), comparable to the initiation (or activation) stage before the lineage commitment in experimental observations [5-7]. Therefore, we suspected that the robust fully-connected stage in the AND-AND motif may correspond to a specific period in cell fate decisions.”

      Additionally, proper discussion of previous work is also missing. For instance, the dynamics of the CIS network investigated by the authors have been extensively characterised (see e.g., Huang et al., Dev Biol, 2007), and how the author's results compare to this previous work should be discussed. In particular, the central assumptions behind the derivation of the model proposed in the manuscript must be assessed in the context of previous work.

      Thanks for pointing this out. We have extended the discussion to include the above points. We have also discussed and cited the work of Huang et al. mentioned above. Please refer to Page 22, lines 644-647 in the revised manuscript (see below).

      “One of the most representative works is that of Huang et al. [14], who modeled the bifurcation in hematopoiesis to reveal the lineage commitment quantitatively. Compared to simply modularizing the activation or inhibition effect by employing Hill functions as in previous work, our models reconsidered the multiple regulations from the level of TF-CRE binding.”

      References

      (1) Ackers, G.K., A.D. Johnson, and M.A. Shea, Quantitative model for gene regulation by lambda phage repressor. Proc Natl Acad Sci U S A, 1982. 79(4): p. 1129.

      (2) Shea, M.A. and G.K. Ackers, The OR control system of bacteriophage lambda: A physical-chemical model for gene regulation. Journal of Molecular Biology, 1985. 181(2): p. 211-230.

      (3) Hunziker, A., et al., Genetic flexibility of regulatory networks. Proc Natl Acad Sci U S A, 2010. 107(29): p. 12998-3003.

      (4) Kittisopikul, M. and G.M. Suel, Biological role of noise encoded in a genetic network motif. Proc Natl Acad Sci U S A, 2010. 107(30): p. 13300-5.

      (5) Brand, M. and E. Morrissey, Single-cell fate decisions of bipotential hematopoietic progenitors. Curr Opin Hematol, 2020. 27(4): p. 232-240.

      (6) Zhang, Y., et al., Hematopoietic Hierarchy - An Updated Roadmap. Trends Cell Biol, 2018. 28(12): p. 976-986.

      (7) Arinobu, Y., et al., Reciprocal activation of GATA-1 and PU.1 marks initial specification of hematopoietic stem cells into myeloerythroid and myelolymphoid lineages. Cell Stem Cell, 2007. 1(4): p. 416-27.

      (8) Kamimoto, K., et al., Dissecting cell identity via network inference and in silico gene perturbation. Nature, 2023. 614(7949): p. 742-751.

      (9) Hammelman, J., et al., Ranking reprogramming factors for cell differentiation. Nat Methods, 2022. 19(7): p. 812-822.

      (10) Semrau, S., et al., Dynamics of lineage commitment revealed by single-cell transcriptomics of differentiating embryonic stem cells. Nat Commun, 2017. 8(1): p. 1096.

      (11) Li, J., et al., Deep learning of cross-species single-cell landscapes identifies conserved regulatory programs underlying cell types. Nature Genetics, 2022. 54(11): p. 1711-1720.

      (12) Stumpf, P.S., F. Arai, and B.D. MacArthur, Modeling Stem Cell Fates using Non-Markov Processes. Cell Stem Cell, 2021. 28(2): p. 187-190.

      (13) Stumpf, P.S., et al., Stem Cell Differentiation as a Non-Markov Stochastic Process. Cell Syst, 2017. 5(3): p. 268-282 e7.

      (14) Huang, S., et al., Bifurcation dynamics in lineage-commitment in bipotent progenitor cells. Dev Biol, 2007. 305(2): p. 695-713.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This work presents some valuable information regarding the molecular mechanisms controlling the regeneration of pancreatic beta cells following induced cell ablation. However, the study lacks the critical lineage tracing result to support the conclusion about the origin of the regenerated beta cells. The results of the pharmacological manipulation of CaN signaling are also incomplete. In particular, these manipulations are not cell-specific, making the results difficult to interpret, and thus a genetic approach is recommended.

      Public Reviews:

      Reviewer #1 (Public Review):

      Induction of beta cell regeneration is a promising approach for the treatment of diabetes. In this study, Massoz et al. identified calcineurin (CaN) as a new potential modulator of beta cell regeneration, using zebrafish as a model. They also showed that calcineurin (CaN) works together with Notch signaling to regulate beta cell regeneration. Overall, the paper is well organized and technically sound. However, some of the evidence seems too weak to support the conclusions.

      Reviewer #2 (Public Review):

      This work started with transcriptomic profiling of ductal cells to identify the upregulation of calcineurin in the zebrafish after beta-cell ablation. By suppressing calcineurin with its chemical inhibitor cyclosporin A and expressing a constitutively active form of calcineurin ubiquitously or specifically in ductal cells, the authors found that inhibited calcineurin activity promoted beta-cell regeneration transiently while ectopic calcineurin activity hindered beta-cell regeneration in the pancreatic tail. They also showed similar effects in the basal state but only when it was within a particular permissive window of Notch activity. To further investigate the roles of calcineurin in the ductal cells, the authors demonstrated that calcineurin inhibition additionally induced the proliferation of the ductal cells in the regenerative context or under a limited level of Notch activity. Interestingly, the enhanced proliferation was followed by a depletion of ductal cells, suggesting that calcineurin inhibition would exhaust the ductal cells. Based on the data, the authors proposed a very attractive and intriguing model of the role of calcineurin in maintaining the balance of the progenitor proliferation and the endocrine differentiation. However, the conclusions of this paper are only partially supported by the data as some evidence from the data remains suggestive.

      (1) In the transcriptomic profiling, genes differentially regulated in the ablated adults could be solely due to the chemical effects of metronidazole instead of the beta-cell ablation. A control group without ins:NTR-mCherry but treated with metronidazole is necessary to exclude the side effects of metronidazole.

      We believe that it is unlikely that the differential regulation observed is due to metronidazole rather than the beta cell loss. This experimental strategy has proven successful in well-published studies to identify regulators of beta cell regeneration in zebrafish larvae. Importantly, the candidates identified in these studies were subsequently functionally validated in mammalian models (Lu et al. 2016, Karampelias 2021). Moreover, in our study, we also used another chemical compound, nifurpirinol (Bergemann et al., 2018), to ablate the beta cells. Regardless of whether we employed metronidazole or nifurpirinol for beta cell ablation, our results consistently indicate a notable involvement of calcineurin. Of note, nifurpirinol is commonly used in fishkeeping, with no reported toxicity to the overall health of the fish.

      (2) Although it has been shown that the pancreatic duct is a major source of the secondary islets in the pancreatic tail in previous studies, there is no direct evidence showing the cyclosporin A-induced cells share the source in this manuscript. Without any proper lineage tracing work, the origin of those cyclosporin A-induced cells cannot be concluded.

      Our experimental setting is similar to the one described in Ninov et al. 2013, where lineage tracing experiments demonstrate an increase in beta cell formation in the pancreatic tail that originates from the pancreatic ducts. In our study, we performed the same experiment with the addition of CsA and showed more ductal cell proliferation (Figure 5G) followed by a 19% increase in beta cell regeneration compared to nonregenerative conditions (Figure 2B). It is unlikely that the additional 19% of regenerated beta cells under CaN inhibition come from a source other than that of the first 68%.

      On the other hand, the acinar cells cannot be considered as another source of regenerated beta cells, as they are not able to form beta cells unless they are artificially reprogrammed (Maddison et al., 2012). Therefore, the only other potential source of regenerated beta cells is the endocrine compartment. However, at the stage where we performed beta cell ablation, there are no endocrine cells in the pancreatic tail. Moreover, there is no evidence that secondary islets could come from the principal islet; they are tightly associated with the ducts and differentiate from ductal cells (Mi et al., 2023).

      Importantly, we demonstrated that overexpression of CaN specifically in the pancreatic ducts prevents beta cell regeneration. The CaN effect is therefore intrinsic to the ducts. Moreover, we showed that CsA increases beta cell formation when Notch signalling is repressed. Given that Notch signalling is known to act on the ductal cell population, this again strongly suggests that CsA enhances beta cell formation from the ducts.

      All of this compelling evidence strongly supports the notion that the cyclosporin-induced beta cells originate from the ductal cells.

      (3) It is interesting to see an increase of beta cells in the primary islet after cyclosporin A treatment (Supplemental Fig 2B). However, it remains unclear if their formation shares the same mechanism with the newly formed beta cells in the pancreatic tail.

      There are indeed several sources of beta cell regeneration in the primary islet. However, a recent study showed that the contribution of alpha cells to regeneration is minor and that the main contributors are ductal and sst1.1 cells (Mi et al., 2023). In our previous publication, we indeed showed that a major source of beta cells in the principal islet is the delta 1.1 cell population. Those sst1.1 cells begin to express insulin and are therefore named ‘bihormonal’ (Carril et al., 2022). We tested whether this population is impacted by CsA treatment, and we show below that CsA does not affect bi-hormonal cell formation (Figure 2D supplemental). These new results suggest that the CsA-mediated increase of beta cells in the principal islet arises from the ductal cells, as observed in the tail. These results were added to the manuscript as Figure 2D supplemental.

      Author response image 1.

      Tg (sst1.1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3dpf with NFP 4µM to induce beta cell ablation. Then larvae were treated with CsA 1µM from 4 to 6 dpf (or ctl with DMSO), prior to fixation and analysis of bi-hormonal cells in the principal islet at 6dpf.

      (4) The conclusion of the effect of cyclosporin A on the endocrine progenitors (Line 175) is not convincing because the data cannot distinguish the endocrine progenitors from the insulin-expressing cells. Indeed, Figure 2E shows that neurod1+ cells are fewer than ins+ cells (Figure 2D) in the pancreatic tail at 10 dpt, suggesting that all or at least the majority of neurod1+ cells are already ins+.

      The neurod1+ cell population indeed includes both endocrine progenitor cells and differentiated endocrine cells. However, we would like to point out that the timing of the analysis is essential to reach our conclusion. When we treat with CsA, we show an increase of neurod1+ cells already at 4dpt. At this time point, no hormone-producing cells can yet be detected (Figure 2E). Those additional neurod1+ cells are therefore endocrine progenitors and not beta cells. This result shows that CaN inhibition induces pro-endocrine cell formation in regenerative conditions.

      At 10dpt, the neurod1+ cell population includes beta cells as well as endocrine progenitor cells. We agree that the way the data are presented in Figures 2D and 2E can be confusing. These two figures come from two separate experiments; the number of beta cells in Figure 2D therefore cannot be compared to the number of neurod1+ cells in Figure 2E. Indeed, from one experiment to another, the efficiency and rate of regeneration can vary, independently of calcineurin. To clarify, we added the number of beta cells regenerated in the experiment of Figure 2E (see Author response image 2, in red). As you can see, in this experiment regeneration was a bit slower than usual.

      Author response image 2.

      Tg (neurod1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3dpf with NFP 4µM to induce beta cell ablation. Then larvae were treated with CsA 1µM from 4 to 6 dpf (or ctl with DMSO), prior to fixation and analysis of GFP+ cells (in grey, pink, dark grey and green), and mCherry+ cells for the condition ablated + CsA in red from 2 to 10 dpf.

      (5) Figure 5D shows a significant loss of nkx6.1+ cells in the combined treatment group but there is no direct evidence showing this was a result of differentiation as the authors suggested. This cell loss also outnumbered the increase in ins+ cells (Figure 4D). The cell fates of these lost cells are still undetermined, and the authors did not demonstrate if apoptosis could be a reason of the cell loss.

      Firstly, as you can see in the graphs, we encountered very high variability between individuals within the same condition. We decided to show this variability by presenting the raw data. This high variability could partially explain the differences that you point out. Moreover, we would like to point out that, independently of CaN inhibition, the loss of progenitors (nkx6.1+ cells) outnumbers the gain of beta cells. Indeed, on average there is a loss of 29% (41 GFP+) of the nkx6.1+ cells and a gain of only 6 beta cells after Notch inhibitory treatment, the other progenitor cells being differentiated into other endocrine cell types (pro-endocrine, alpha, delta). In the combined treatment (Notch and CaN inhibitors), we decreased the number of progenitor cells by 50%, i.e., 21% (20 cells) more than without the CaN inhibitor. However, we increased the number of regenerated beta cells twofold (from 6 cells to 12 cells). In brief, the substantial loss of progenitor cells could be explained by precocious differentiation into pro-endocrine and endocrine cell types. It is therefore expected that the number of regenerated beta cells does not match the number of progenitor cells lost, in the presence or absence of CaN inhibition.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major concerns:

      (1) The evidence that the proliferating ductal cells differentiate into beta cells is weak. The authors should use lineage tracing, or immunostaining for other marker genes, to confirm this.

      The experiment in Figure 5A-D is a short-term tracing experiment and should have been presented as such in the manuscript. After LY411575 (Notch inhibitor) and CsA treatments at 3dpf, we exposed the larvae to EdU at 4dpf for 8 hours (Figure 5A). We showed that EdU is incorporated in dividing ductal cells at 4dpf (Figure 5C) and that 2 days later there are newly formed beta cells that are EdU+ (see Author response image 3). To reinforce our conclusion, the image below will be added to the manuscript.

      Author response image 3.

      Tg (nkx6.1:GFP); Tg (ins:NTR*-mCherry) larvae were treated at 3dpf with both CsA 1µM and LY411575 5µM. At 4dpf, the larvae were exposed to EdU 4mM for 8 hours, before analysis at 6 dpf.

      (2) To inhibit CaN and the Notch pathway, the authors used only pharmacological approaches; genetic approaches should be used to obtain stronger evidence.

      We employed two distinct inhibitors specifically targeting calcineurin (CsA and FK506) for CaN inhibition. While these inhibitors have distinct chemical structures and potential non-specific effects, they both yield the same result of increased beta cell formation under Notch repression (see Figure 4D and Figure 4B in the supplementary data). This convergence of outcomes strongly suggests that the observed effect is primarily attributable to the specific inhibition of calcineurin.

      Furthermore, we complemented our inhibitor-based approach with a genetic strategy involving CaN overexpression (see Figure 3). Notably, the overactivation of CaN resulted in a reduction of beta cell regeneration. Given that this genetic approach generated an effect contrary to that achieved with the inhibitors, it provides robust support for our model, which postulates that calcineurin plays a critical role in the regulation of beta cell regeneration (see Figure 3, panels C-E).

      As for Notch inhibition, previously published data from our laboratory compared the effects of a Notch inhibitor (LY411575) and genetic approaches (mib mutant and transgenic line) on pro-endocrine cell (ascl1b+) and ductal cell (nkx6.1+) formation. This study showed that the Notch inhibitor (LY411575) and Notch repression using genetic approaches produce the same effect: an induction of pro-endocrine cell formation. Since the specificity of this inhibitor has been validated (Ghaye et al., 2015), we did not consider a genetic approach necessary.

      (3) The most enriched pathways among the up-regulated genes were DNA replication and cell cycle, which suggested that these genes are more important for duct cell proliferation. How is Calcineurin related to these pathways, for example by regulating the genes important for proliferation?

      The transcriptomic data presented in this manuscript suggest that the ductal cells undergo a strong proliferative response after beta cell ablation. This is in accordance with our experimental data showing activation of ductal proliferation after beta cell ablation (Ghaye et al., 2015) and data from this manuscript (Figure 1 I-J).

      Calcineurin is a well-known regulator of the cell cycle, and can either promote or repress the cell cycle depending on the cell type. For example, stressing the cell provokes an entry of calcium and subsequently CaN activation, which results in cell cycle arrest (Leech et al. 2020). Nevertheless, depending on the cell type, CaN can be either necessary or deleterious to cell proliferation (Goshima et al. 2019; Masaki and Shimada 2022). The intriguing dual role of CaN in the cell cycle is well illustrated in β cell regeneration. While CaN should be repressed to enable ductal progenitor amplification and subsequent endocrine differentiation, CaN is then necessary for β cell function and for their replication (Dai et al. 2017; Heit et al. 2006). Moreover, CaN is related to cellular senescence and CaN function is important for proper fin regeneration in zebrafish.

      (4) It is hard to understand why they picked the cellular senescence signature pathway for duct cell progenitor neogenesis. Moreover, among these senescence genes, many are cell cycle regulators.

      In response to beta cell ablation, the ductal cells undergo a strong proliferative response, as shown in our previous data (Ghaye et al., 2015). It was therefore not surprising that many differentially expressed genes are cell cycle regulators. On the other hand, the cellular senescence signature was surprising. Indeed, senescence is usually associated with cell cycle arrest and aging. However, recent studies showed that cellular senescence is required for proper development and regeneration. We therefore wanted to investigate this pathway and more particularly the function of calcineurin, which can either promote or repress the cell cycle in different cell types (see comment above).

      (5) The RNA-seq data were obtained from adult fish, while the authors use larvae to explore the CaN functions; the conclusions may differ in adult fish. Moreover, it is unclear whether CaN increased when the beta cells were ablated in young larvae.

      We decided to first perform functional experiments in the larvae, as this model enables the quantification of beta cell regeneration from the ducts in the pancreatic tail. However, to validate our results in non-developmental stages, we performed experiments in juveniles (2 months old) and adults. CsA treatments in juvenile zebrafish recapitulated the results obtained in larvae (Figure 2B and Figure 6A-C). Moreover, we showed that CaN overactivation delayed glycemia recovery after ablation in adults (Figure 6D-E), which is in accordance with an impaired regeneration. Altogether, these results strongly suggest that CaN acts as a regulator of beta cell regeneration in both the juvenile/adult and larval stages.

      Concerning the expression of CaN in the zebrafish larvae, we tried to detect the level of CaN in the different experimental conditions by in situ hybridization. However, we were not able to detect it using this technique. We also tried immunostaining with an anti-phospho-NFATc3 (Ser165) polyclonal antibody (Invitrogen), but this antibody does not seem to work in zebrafish. Finally, we tried to sort ductal cells at the larval stage to perform a transcriptomic analysis, but we were unable to collect enough ductal cells to proceed further. Indeed, our staining experiment showed that there are only around 150 ductal cells (nkx6.1+, Figure 5D) at this stage.

      (6) The beta cell regeneration in the young larvae usually recovers within ~ 5 days in the principal islet. Please also show the beta cell number (PI) during the beta cell recovery after ablation.

      We did show beta cell regeneration in the principal islet in Figure 2A-B supplemental. While new beta cells appear quickly in this islet (Carril, Massoz, Dupont et al., 2023), the principal islet has not yet fully recovered at 5dpt.

      (7) Since the studies did not show the CaN level in Fig.3, it is hard to know that the CaN is exactly expressed.

      In Figure 3B, using Tg(hsp70:GFP-CaNCA), it is indeed not possible to see CaN expression at 10 dpt, as the heat shocks induce CaNCA overexpression only transiently. However, the transient expression was detected in live larvae shortly after the heat shocks. On the other hand, in the transgenic line Tg(UAS:GFP-CaNCA); Tg(cftr:Gal4), GFP-CaNCA is continuously expressed, allowing us to show CaNCA expression in the pancreatic ducts (Figure 3).

      (8) In Fig. 6D and 6E, did these drug treatments change the glucose level in non-ablated fish?

      As you can see below, the CaN inhibitor CsA does not affect the glycemia of the fish in non-regenerative conditions.

      Author response image 4.

      Glycemia of non-ablated fish, 3 days after drug treatment.

      (9) The logic of writing in Results is very hard to understand.

      We proofread the paper in an effort to clarify it.

      Minor concerns,

      (1) Make a scheme for ablation and RNA-seq, and indicate the age of the fish used in Fig. 1.

      We added the scheme in Figure 1 supplemental.

      (2) In Fig. 1G, the mCherry+ cells indicated by the two arrows are hard to see in the non-ablated fish.

      One arrow was indeed misplaced; we moved it and tried to improve the intensity of the red signal. However, these cells are indeed small and can be difficult to see.

      (3) In Fig. 6, it is hard to know that the islets indicated by the arrows are small islets (up to 5 cells); how were they compared with big islets and defined as small islets? Moreover, some of these islets are almost invisible.

      We now show a close up of a portion of the pancreatic tail and show the beta cells with arrows only in this picture, to enhance clarity.

      Reviewer #2 (Recommendations For The Authors):

      (1) This manuscript needs more proofreading and polishing to increase its readability.

      We proofread the manuscript and changed some paragraphs for clarity.

      (2) The extensive use of words like "modulate" or "regulate" sometimes makes the text ambiguous as the effect is not stated directly and clearly.

      We rewrote some parts of the text and tried to avoid using “regulate” as often.

      However, as we used both repression and over-activation of CaN, we still use words such as “regulate” to state general conclusions on the function of CaN.

      (3) The list of individual differentially regulated genes after the beta-cell ablation in the RNAseq seems missing. This list could be interesting and helpful for other researchers.

      We added it.

      (4) In Figure 1D, "modulated" genes are shown but were they all upregulated like those in Figure 1A? The modulation should be indicated more clearly (e.g. up- or down-regulated) in the figure. The authors can use different colours to illustrate that.

      Done.

      (5) Is Figure 2D showing the same data extracted from Figure 2B? Does Figure 2D add any information to the data?

      No, it does not add data. We added Figure 2D for a better visualisation of the increase at 10dpt.

      (6) In the y-axis of Figure 3E, it should be "mCherry".

      It already is. We checked all the axes again to be sure they are correct.

      (7) Line 219, "Figure 4E supplemental" instead of "Figure 4D supplemental"

      Done.

      (8) Line 266, "ablated juveniles" instead of "ablated larvae"

      Done. Thank you for noticing these mistakes.

      (9) In Figure 6A, many mCherry+ cells are hardly visible and there are some greyish white signals in the images that are supposed to show the mCherry channel only. What are those grey signals?

      There is no channel showing grey in the picture. We improved the overall quality of these pictures and now show a close-up to improve the figure.

      (10) In Figure 6D and 6E, CaNCA overexpression had a significant effect on the glycemia. But did the overexpression affect the beta cell formation or regeneration?

      We showed that CaNCA overexpression did not affect beta cell formation in the absence of regeneration in the larvae (Figure 3E). Moreover, it does not affect the glycemia of the fish in non-regenerative conditions (Author response image 5). As for regenerative conditions, CaN overexpression decreased the regeneration in the larvae (Figure 3E).

      Author response image 5.

      Glycemia of Tg(UAS:GFP-CaNCA); Tg(cftr:Gal4) fish, overexpressing CaNCA, compared to control fish, in non-regenerative conditions.

      (11) The role of calcineurin seems transient (e.g. Figure 2B and 4E) and does not play a significant role in long term. It would be interesting to see if long-term/repeated treatments of calcineurin inhibitors and overexpression/knockout of important members of calcineurin signaling would affect the pool of progenitors in long term.

      We were also interested in the long-term consequences of CaN overexpression. Our overexpression tool Tg(UAS:CaNCA) allows us to address this question, as CaN is overexpressed permanently. We assessed the structure of the ducts and the number of beta cells in transgenic larvae and did not see any defects of the ducts, whether in a regenerative context or not. On the other hand, we showed in this manuscript that the CaN effect is specific to regenerative conditions. As a consequence, it is not likely that repeated treatments long after the ablation would continue to affect beta cell formation and the progenitor pool.

    2. Reviewer #2 (Public Review):

      This work started with transcriptomic profiling of ductal cells to identify the upregulation of calcineurin in the zebrafish after beta-cell ablation. By suppressing calcineurin with its chemical inhibitor cyclosporin A and expressing a constitutively active form of calcineurin ubiquitously or specifically in ductal cells, the authors found that inhibited calcineurin activity promoted beta-cell regeneration transiently while ectopic calcineurin activity hindered beta-cell regeneration in the pancreatic tail. They also showed similar effects in the basal state but only when it was within a particular permissive window of Notch activity. To further investigate the roles of calcineurin in the ductal cells, the authors demonstrated that calcineurin inhibition additionally induced the proliferation of the ductal cells in the regenerative context or under a limited level of Notch activity. Interestingly, the enhanced proliferation was followed by a depletion of ductal cells, suggesting that calcineurin inhibition would exhaust the ductal cells. Based on the data, the authors proposed a very attractive and intriguing model of the role of calcineurin in maintaining the balance of the progenitor proliferation and the endocrine differentiation. However, the conclusions of this paper are only partially supported by the data as some evidence of the lineage between ductal cells and beta cells remains suggestive.

    1. Author Response

      eLife assessment

      We appreciate the assessment carried out by the editorial team at eLife. Therefore, we plan to review the methods section in order to make the statistical analysis more comprehensible for each of the displayed figures.

      Public reviews

      Reviewer 1

      We would like to express our gratitude to Reviewer 1 for providing a thorough summary of our work and highlighting its strengths. With regard to the weaknesses, we are committed to improving the manuscript by making the necessary changes. First, we will specify the exact p-value in all cases.

      Regarding the discussion section, we acknowledge the feedback regarding its potential confusion. In line with the reviewer's suggestion, we will reduce the literature review and highlight our findings.

      Finally, for the preprint we did not include confounders such as HIV infection and ethnicity, as our study population did not exhibit viral infections and comprised only Hispanic individuals. We will describe the study population more thoroughly and address these characteristics explicitly in both the methods section and the initial part of the results.

      Reviewer 2

      We thank Reviewer 2 for the comments. Although it is true that several papers have described the role of the microbiome in COVID-19 severity, we firmly believe that our current work stands out.

      There is not much information related to this association in Mediterranean countries, especially in the south of Spain. In addition, most of the studies only describe microbiota composition in stool or nasopharyngeal samples separately, without investigating any potential relationships between them as we do.

      (1) We agree with the reviewer that the sample size is limited. We faced the challenge of collecting the samples during the peak of the COVID-19 pandemic. Doctors and nurses were therefore overwhelmed and not always available to carry out patient recruitment following the inclusion criteria. Despite these constraints, we ensured that all included samples met our specified inclusion criteria and were from subjects with confirmed symptomatology.

      In addition, our main goal was to identify whether severity of the disease could be assessed through microbiota composition. Therefore we did not include a healthy group. Despite not having a large N, our results should be reproducible as they are supported by statistical analysis.

      (2) We thank the reviewer for this comment. Since our original sentence may have lacked clarity, we intend to modify it to ensure it conveys the intended meaning more effectively.

      Nonetheless, we remain confident in the significance of our findings. Not only have we found a correlation between microbiota and COVID severity, but we have also described how specific bacteria from each condition are associated with key biochemical parameters of clinical COVID infection.

      (3) We appreciate the feedback provided by the reviewer. In this case, we have performed 16S analysis due to its cost-effectiveness compared to metagenomic approaches. Furthermore, 16S analysis has undergone refinements that ensure comprehensive coverage and depth, along with standardized analysis protocols. Unlike 16S, metagenomic approaches lack software tools such as QIIME that facilitate standardization of analysis, which reduces the reproducibility of results.

      (4) We sincerely appreciate this insightful suggestion. Since simply listing associations between both microbiomes and COVID-19 severity may not be enough, we intend to discuss in our discussion how microbiota composition may be linked to the mechanisms underlying COVID-19 pathogenesis.

      (5) We are grateful for the constructive criticism and intend to rewrite our abstract to enhance clarity. Additionally, we will thoroughly review all figures and their descriptions to ensure accuracy and comprehensibility.

      Reviewer 3

      We acknowledge the annotations made by reviewer 3 and are committed to addressing all identified weaknesses to enhance the quality of our work. Our idea is to modify the methods section and figures to make them easier to understand.

      Specifically, in the case of Figure 1, we recognize an error in the description of the Bray-Curtis test. We appreciate the comment and will make the necessary changes. Moreover, there is another observation related to the description of Figure 1. We are going to modify it to improve its accuracy.

      For Figure 2, we are planning to add a supplementary table showing the abundance of the detected genera. We will also update the manuscript text to clarify how we obtained this result. Regarding the clarification about "1% abundance," we want to emphasize that we are referring to relative abundance, where 1 represents 100%. To avoid confusion, we will explicitly state this in both the methods section and figure descriptions. In addition, it is true that the statistical test employed for the analysis is not mentioned in the figure description, and we recognize that the image may be difficult to interpret. Therefore, we will modify the text and add a supplementary table displaying the abundance and p values.
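
      As an illustration only, the relative-abundance convention described above (a 0 to 1 scale on which 1 represents 100%, so 1% corresponds to 0.01) can be sketched as follows. The genus names, read counts, and the use of the value as a filtering cut-off are hypothetical assumptions made for this sketch; they are not data or methods taken from the study.

```python
def relative_abundance(counts):
    """Convert raw per-genus read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {genus: n / total for genus, n in counts.items()}


def filter_by_threshold(fractions, threshold=0.01):
    """Keep genera whose relative abundance is at least `threshold` (0.01 = 1%)."""
    return {g: f for g, f in fractions.items() if f >= threshold}


# Hypothetical read counts for one sample (illustrative, not study data).
sample = {"GenusA": 900, "GenusB": 95, "GenusC": 5}
fractions = relative_abundance(sample)
kept = filter_by_threshold(fractions, threshold=0.01)
```

      Stating the scale next to each reported value (fraction or percentage) avoids the ambiguity raised by the reviewer.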

      Furthermore, we agree with the reviewer's suggestion to investigate whether the bacteria identified as potential biomarkers for each condition are specific to their respective severity index or if there is a threshold. Thus, we will reanalyze the data and include a supplementary table with the abundance of each biomarker for each condition. We will also place greater emphasis on these results in our discussion.

      Finally, in response to the reviewer's suggestion, we are going to revisit the nasopharyngeal-fecal axis part of the discussion. It is well described that COVID-19 induces a dysbiosis in both microbiomes.

      Consequently, we understand that the ratio we have described could be an interesting tool for assessing COVID severity development as it considers alterations in both environments. However, we acknowledge that there may be room for improvement in clarifying the significance of this intriguing finding and its implications.

    1. Author Response

      The following is the authors’ response to the original reviews.

      eLife assessment

      This comprehensive study provides valuable information on the cooperation of Ikaros with Foxp3 to establish and regulate a major portion of the epigenome and transcriptome of T-regulatory cells. However, the characterization is incomplete in that incontrovertible evidence that these are intrinsic features regulating biological function and not outcomes of the inflammatory micro-environment of the genetically manipulated mice is missing.

      Public Reviews:

      This study investigates the role of Ikaros, a zinc finger family transcription factor related to Helios and Eos, in T-regulatory (Treg) cell functionality in mice. Through genome-wide association studies and chromatin accessibility studies, the authors find that Ikaros shares similar binding sites to Foxp3. Ikaros cooperates with Foxp3 to establish a major portion of the Treg epigenome and transcriptome. Ikaros-deficient Treg exhibits Th1-like gene expression with abnormal expression of IL-2, IFNg, TNFa, and factors involved in Wnt and Notch signaling. Further, two models of inflammatory/ autoimmune diseases - Inflammatory Bowel Disease (IBD) and organ transplantation - are employed to examine the functional role of Ikaros in Treg-mediated immune suppression. The authors provide a detailed analysis of the epigenome and transcriptome of Ikaros-deficient Treg cells.

      These studies establish Ikaros as a factor required in Treg for tolerance and the control of inflammatory immune responses. The data are of high quality. Overall, the study is well organized, and reports new data consolidating mechanistic aspects of Foxp3 mediated gene expression program in Treg cells.

      Strengths:

      The authors have performed biochemical studies focusing on mechanistic aspects of molecular functions of the Foxp3-mediated gene expression program and complemented these with functional experiments using two models of autoimmune diseases, thereby strengthening the study. The studies are comprehensive at both the cellular and molecular levels. The manuscript is well organized and presents a plethora of data regarding the transcriptomic landscape of these cells.

      Response: We thank the reviewers for their careful review and feedback on our manuscript. We appreciate that the reviewers and editors recognize the strength and comprehensive nature of our in vivo, cellular, biochemical, and genome-wide molecular studies, which are well-organized in the manuscript. The acknowledgment of the complementary functional experiments in two models of inflammatory disease is also encouraging.

      Weakness:

      The authors claim that the mice have no pathologic signs of autoimmune disease even at a relatively old age, yet mice have an increased number of activated CD4+ T cells and T-follicular helper cells (even at the age of 6 weeks) as well as reduced naïve T-cells. Thus, immune homeostasis is perturbed in these mice even at a young age and the effect of inflammatory microenvironments on cellular functions cannot be ruled out. Further, clear conclusions from the genome-wide studies are lacking.

      Response: We agree with the reviewers' comment regarding the absence of overt autoimmune pathologies in Ikzf1-fl/fl-Foxp3-Cre+ mice, despite the increased frequency of activated CD4+ T cells, TFH cells, and apparent perturbation of lymphocyte homeostasis, even at a young age. It is noteworthy that while Ikaros is implicated in various autoimmune diseases, our specific mouse model, in which Ikaros expression is lost only in Tregs, may not lead to a strong autoimmune phenotype in part due to the controlled environment of an extra-clean, pathogen-free animal facility. This aligns with a related study by Ana et al. (2019, J. Immunol: doi:10.4049/jimmunol.1801270) in Ikzf1-fl/fl-dLck-Cre+ mice with loss of Ikaros expression in all mature CD4+ T cells, including Tregs, that exhibit no overt signs of autoimmune disease. Moreover, our transcriptomic studies reveal that increased expression of inflammatory genes in Ikzf1-deficient Treg is coupled with the simultaneous upregulation of genes with positive roles in Treg function. This balance suggests a compensatory mechanism within Ikaros-deficient Tregs that maintains their suppressive function until encountering an inflammatory immune challenge, which eventually leads to loss of Treg suppressive function in Treg-specific Ikaros-deficient mice. Our studies clearly show that Ikaros has cell-intrinsic effects in Treg that also lead to cell-extrinsic effects mediated by secreted factors that are likewise regulated by Ikaros. This can be said about the function of any transcription factor in any cell type. Our data clearly support the conclusion from the genome-wide studies that Ikaros plays a major role in establishing the active chromatin landscape, gene expression profile, and function of regulatory T cells in mice.

      The following recommendations consolidate the views of the three reviewers of the manuscript.

      The experiments suggested and, in some instances, fresh analysis, are thought necessary, so that the evidence of Ikaros-Foxp3 interactions regulating T-regulatory cell biology is comprehensive and solid. We hope the comments are useful to strengthen the comprehensive analysis reported in this submission.

      The primary concern is that the indications of inflammation in the mice (see points 1 & 2 below) do not reflect in the experiments or consequent conclusions. The gap in the data should be addressed by testing these interactions in an appropriate context for which suggestions are included.

      Please note that the title of the manuscript may be modified to reflect the use of mice as the system of study for this work.

      (1) The evidence of inflammation (increased CD4 and T follicular cells) reported in the work requires new experiments to rigorously examine the relationship between Ikaros and Foxp3 to rule out the possible impact of the (inflammatory) microenvironment of the mice (Please see: Zemmour et al., Nat. Immunology 22, 607, 2021). Two possible experimental systems in mice are suggested.

      a) The use of heterozygous female mice, which should be phenotypically normal due to the presence of 50% normal Treg. Or,

      b) The generation of bone chimeras between wild-type and deficient mice using congenic markers.

      Response: We agree that immune dysregulation that develops in the mice with age or during an inflammatory insult due to loss of Ikaros function in the Treg lineage is an important part of the phenotype of the animals. Our studies show that loss of Ikaros function in Treg influences the gene expression program such that Treg now produce inflammatory cytokines and ligands capable of engaging receptors expressed on Treg and other cells. This likely results in autocrine and paracrine signaling that induces further metabolic and gene expression differences not observed in wild-type mice. Indeed, we report in the manuscript that a sizable fraction of the differentially expressed genes do not appear to be direct Ikaros targets, but rather are downstream of Ikaros target genes such as Il2, Ifng, Notch, and Wnt. The mosaic experiments suggested will be a useful topic of future studies. Importantly, we argue that no gene expression study involving modulation of transcription factor activity in an organism- or cell-based system can be designed to measure only the direct effects of that transcription factor in a manner isolated from any indirect, downstream effects on the expression of other genes. We suggest that our current data remain highly valuable, as they reveal real and relevant biology in physiologic in vivo systems that do not depend upon the use of heterologous models. The fact that loss of Ikaros has an effect not only on its direct targets, but on gene programs driven in turn by the indirect effects of Ikaros-regulated factors, has been acknowledged in the manuscript.

      (2) Figs. 7 and S5 show accumulation of CD4 cells (activated, memory, Tfh, Tfr) in LNs and spleens of the Ikaros KO over time. This is accompanied by elevated Igs but without overt autoimmune disease. KO Tregs had equivalent suppressive activity as WT Tregs against WT Teff in vitro. However, Teff from KO mice were resistant to the suppressive effects of WT or KO Tregs. The authors interpret this as due to the increased percentage of memory cells within the KO Teffs, although they did not formally prove this point. Figs. 9 and S6 show that Ikaros KO mice are unable to be tolerized for cardiac allograft survival using two different standard tolerogenic regimens. The rejecting allografts are accompanied by increased T-cell infiltration and upregulation of inflammatory genes. The authors suggest there is increased alloantibody, but alloantibody does not seem to have been measured.

      Response: We are currently exploring in more detail the dysregulation of humoral immunity in the Ikzf1-deficient Treg model and plan to report these results in a future study.

      (3) Linked to the above, a comparison of the chromatin occupancy of Ikaros in resting and activated Tregs would inform on whether and how Ikaros occupancy changes with the activation status of Tregs. Since the authors use in vitro stimulation for RNAseq and ATAC seq, ChIP seq analyses under these matching conditions will greatly add to the quality of the study. Since "Foxp3-dependent", i.e. differential gene expression in the Foxp3GFPKO cells (PMID: 17220874) gene expression has been shown to be not entirely the same as Treg signature (i.e. gene expression or Tregs compared to Tnv), it will be worth correlating Ikaros, Foxp3 co-occupied genes and the corresponding fate of their expression with Foxp3-dependent and independent Treg signature gene sets.

      Response: The prior study by Gavin et al. referred to above used duplicate samples instead of the standard three or more replicates required for a robust differential analysis of gene expression. The two samples in this study are variable, and no statistically significant differential gene expression was found between the experimental groups when we subjected these data to current analysis methods. For this reason, we have elected not to compare these prior data with our current data, which are robust, reproducible, and analyzed using current statistical methods. Furthermore, the mice used for the prior study develop a fatal inflammatory disease (scurfy) and therefore the Treg examined in this study would be subject to a much stronger extrinsic inflammatory environment than the Treg in our study, as our mice show no overt disease even with age.

      Further, the consequence of the cooperation between the two transcription factors that can be inferred from the experiments in the study remains unclear. It is suggested that the authors could first consider the ChIP seq data from Foxp3, Ikaros co- and differentially occupied genes, and then correlate with the ATAC seq and gene expression data to comment on the consequence of this cooperation.

      Response: We find that Ikaros binding at a given region has a strong effect on accessibility, as reported in the manuscript, but that Foxp3 occupancy has less consequence, consistent with a prior study suggesting that Foxp3 largely utilizes the open chromatin landscape already present in the conventional CD4 T cell lineage (PMID:23021222). Our data suggest that the dominant effect of Ikaros on Foxp3 is at the level of chromatin occupancy.

      (4) In the comparative analyses of Ikaros and Foxp3 co-occupied regions and gene expression outcome, the authors mention "A total of 4423 Foxp3 binding sites were detected in the open chromatin landscape of wild-type Treg (Supplementary Table 9), and this ChIP-seq signal was enriched at accessible Foxp3 motifs." It is unclear whether the authors focused on the ATAC seq data and only examined the open chromatin regions for this analysis. In that case, it is unclear why. More so because the Ikaros footprint is more apparent in regions where accessibility is reduced upon deletion of Ikaros.

      Response: Foxp3 has been shown to bind primarily at open chromatin shared between Tconv and Treg, unlike the pioneer activity of other Fox family members (PMID: 23021222, biorXiv https://www.biorxiv.org/content/10.1101/2023.10.06.561228v2.full.pdf). Consistent with this, we found the majority of peaks were in open chromatin. The motif analysis is quantitative, not binary, and takes into account Foxp3 binding sites at regions considered open in either condition, which is why we can see enrichment of Foxp3 motifs at sites going from more open to less open in the absence of Ikaros.

      (5) Comments on figures:

      The authors use MFI repeatedly in many of the figures for quantitation of antigen expression. This is misleading as several of the target antigens are normally expressed on a subpopulation of cells, e.g., Eos. Percent positive and MFI would be more relevant. Cytokine production should be presented by intracellular staining (e.g., IL-2, IFNg) as ELISA data does not allow one to determine the percentage of abnormally producing cells.

      Response: We show both ICS and ELISA in this paper, preferring ELISA because it is much more quantitative than ICS.

      Suppl. Fig. 1c - the panels do not correspond precisely to the legend or the text. At least one panel is missing. In Supp fig 1c, the authors plotted effector Tregs, which are by definition CD62LloCD44hi, but the Y axis says CD44hiCD62Lhi. Is this a typo? Also on page 4, describing this data the authors mentioned Tfr, but the data is not shown in the Supp fig 1c.

      Response: We thank the reviewer for catching these mistakes. We have corrected the typo in the figure panel for Supplementary Figure 1c. Follicular Treg data are indeed presented in Figure 7h, not Supplementary Figure 1, and we have corrected the text.

      Fig. 2, which lists the different categories of differentially expressed genes, it will be helpful if the authors add two columns indicating fold change and FDR values.

      Response: These values are included in Table S1.

      Fig. 3c, the resolution of the histograms in the inset should be enhanced.

      Fig. 3d, a histogram of representative CTV dilution plots, and an explanation of how the quantifications were done may be included.

      Fig. 3e - not well labeled. Are these fold changes? Enrichments? Number of gene elements within the GO term that are affected? Something else?

      Fig. 3f - presented out of sequence. The data are a little hard to understand as the color scale is so subtle and the colors so close to one another that it is not entirely clear which gene expressions are increased vs decreased. Other than the simple statement that the Ikaros KO causes numerous changes, there does not seem to be a more consistent message from this data panel.

      Fig. 4a, in addition to the bar graphs, it will be better to show the plots in a histogram, gated on Foxp3+ Tregs in WT and KO groups, with representative MFI indicated on top. The resolution of the scatter plots in this figure, as well as some others throughout the manuscript, may be improved. Please increase the resolution wherever necessary.

      Fig. 4b should include representative plots for cytokine production gated in Tconv (CD4+Foxp3-) cells.

      Figs. 5a-h, S2-3a-d, and Suppl. Tables S4-8 show a comprehensive ATAC-seq and ChIP-seq analysis of genes and chromatin occupied or regulated by Ikaros, comparing Tconv vs Treg, stimulated vs naïve, and WT vs KO cells. It is a comprehensive tour-de-force analysis, again showing the major effects of Ikaros on the entire Treg landscape of gene regulation.

      Fig. S5h-j should be explained or labeled in more detail. The fonts are too small to read, even at 200% magnification; and the cell and gene comparisons are not entirely clear.

      Supp. Fig. S3e is not referred to in the text.

      Fig. S4a is very difficult to read; the font and plotted points are too small.

      Response: We have improved the clarity of the figures where necessary. We also indicate in the figure legends that full gene lists are to be found in the supplementary tables.

      Page 8, "Regions that exhibit reduced accessibility in Ikzf1 cko compared to wild-type Treg are enriched for the binding motif for Ikaros and the motif for TCF1 (Figure 5g).... ". Is this Fig. 5i or 5g?

      Response: This statement is correct and is referring to data depicted in Figure 5g.

      In Fig 6e, Flag-Ik7 is not visible in any of the inputs. The co-IP between Foxp3 and Runx1 (presumably a positive control) is not efficient in this experimental condition. Co-IP experiments performed in primary cells upon retroviral transduction of the tagged proteins to confirm observations in cell lines are suggested.

      Response: Runx1 is shown to co-precipitate with Foxp3 as expected, although the band is not intense, and the data depicted are representative of 3 experiments. Ik7 was included in this transient transfection experiment as a redundant control, and the referee is correct that Ik7 did not express well in this experiment and cannot be seen in this exposure. We showed these blots intact in the spirit of not digitally altering the data, and because the low Ik7 expression did not impact our ability to demonstrate specific co-precipitation of Foxp3 with full-length Ikaros (Ik1). The images include nearly the entire mini-blots, and we have added molecular weight markers for clarity. As indicated in the legend, the cytokine and ChIP data in 6f are from a separate model of retrovirally Foxp3/Ik7-transduced T cells that we and others have used in multiple prior studies (e.g. Thomas JI 2007, Thomas JI 2010). The interpretability of these experiments is not impacted by the transient transfection data from figure 6e. It should be noted that a prior study by Rudra et al. that is cited and referred to in the manuscript used a similar approach to also establish that Foxp3 and Ikaros form a complex in cells.

      In Fig 6f, the authors state that Foxp3 overexpression in CD4 cells results in promoter occupancy of both IL2 and IFNg, however, data shows only IL2. Also in 6f, Foxp3 overexpression reduces IL2 and IFNg secretion, measured by ELISA, which is recovered by IkDN. However, the effect of Foxp3 along with WT Ikaros (which should not modulate, and if anything, further repress IL2, IFNg production) is not shown.

      Response: The reviewer is correct that ectopic expression of Ikaros leads to repression of cytokine gene expression, which we and others have shown in prior studies. Because the focus of this study was on loss of Ikaros function in Treg, we did not elect to overexpress full-length Ikaros. However, we completely agree that Ikaros GOF in Treg is an important topic for future studies.

      Fig. 7e-g, how is %suppression calculated? Can representative CTV dilution plots for the suppression assays be shown?

      Response: Cell division was quantified as described previously (see ref 50), and percent suppression represents the reduction in cell division measured by Tconv in the presence of Treg compared to in the absence of Treg. This has been clarified in the methods section.
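
      As a minimal sketch of the calculation described in this response, assuming one scalar measure of Tconv division per condition (the exact division metric defined in ref 50 is not reproduced here, and the function name `percent_suppression` is hypothetical):

```python
def percent_suppression(division_with_treg, division_without_treg):
    """Reduction in Tconv division caused by Treg, as a percentage.

    Both arguments are the same (assumed) division metric, measured for
    Tconv cultured with and without Treg respectively.
    """
    if division_without_treg <= 0:
        raise ValueError("Tconv-alone division must be positive")
    return 100.0 * (1.0 - division_with_treg / division_without_treg)


# Hypothetical values: division falls from 100 to 25 when Treg are added.
print(percent_suppression(25, 100))
```

      Under this definition, equal division with and without Treg gives 0% suppression, and complete arrest of division gives 100%.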

      In Fig 8 and the supplementary figures the representative colon pictures (Fig. S6a-c) do not show convincing differences in colon morphology even though all the other histology and clinical parameters are clear. Are the figures mislabeled?

      In Fig 8c-e and other histology figures scale bars should be shown.

      Fig. 8c-e, the Alcian blue staining among the groups appears similar; perhaps this is due to the low power magnification.

      Response: We have edited this figure for clarity.

      Additional comments:

      Fig 10 is explained in the discussion section for the first time. The authors may want to consider including this when introducing Ikzf1 ChIPseq data for the first time in the study.

      Response: The reviewer raises a valid point but we have elected to retain the current organizational structure of the manuscript.

      A more complete characterization of the activated conventional cells including both CD4+ and CD8+ T cells for cytokine production during aging may be considered, as it is highly likely that abnormalities in cytokine production will be observed.

      Response: We agree and are planning additional such experiments in future studies focusing on in vivo models of tolerance.

      The failure of suppression of T cell proliferation which the authors claim is due to the presence of activated memory T cells can be better documented by using naive responder cells from the cKO mice.

      Response: We agree and are planning additional such experiments in a future study focusing on further aspects of cellular immunobiology impacted by Ikaros, but we will give preference to in vivo models of tolerance in such studies.

    1. eLife assessment

      This study is a useful showcase of a workflow to perform large-scale characterization of drug mechanisms of action using proteomics. The work is backed by solid evidence, however, more statistical analyses and a user-friendly interface to enhance data mining by the readers are recommended. The strengths of this study include the large number of compounds tested within a common workflow and well-described experimental protocols. This will be of broad interest to medicinal chemists, toxicologists, and biochemists.

    2. Reviewer #1 (Public Review):

      Summary:

      This is an interesting and potentially important paper, which however has some deficiencies.

      Strengths:

      A significant amount of potentially useful data.

      Weaknesses:

      One issue is a confusion of thermal stability with solubility. The thermal stability of a protein is a thermodynamic parameter that can be described by the Gibbs-Helmholtz equation, which relates the free energy difference between the folded and unfolded states to temperature, as well as to the entropy of unfolding. What is actually measured in PISA, by contrast, is a change in protein solubility, which is an empirical parameter affected by a great many variables, including the presence and concentration of other ambient proteins and other molecules. One might possibly argue that in TPP, where one measures the melting temperature change ∆Tm, thermal stability plays a decisive or at least an important role, but no such assertion can be made in PISA analysis that measures the solubility shift.
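
      For reference, a commonly used textbook form of the Gibbs-Helmholtz relation for two-state protein unfolding is given below. It is included only as an illustration of the thermodynamic quantity the reviewer refers to; it is not taken from the manuscript, and the symbols ($T_m$ for the melting temperature, $\Delta H_m$ for the unfolding enthalpy at $T_m$, $\Delta C_p$ for the heat capacity change, assumed temperature-independent) are notation introduced here:

```latex
\Delta G_u(T) = \Delta H_m \left(1 - \frac{T}{T_m}\right)
              - \Delta C_p \left[(T_m - T) + T \ln\!\left(\frac{T}{T_m}\right)\right],
\qquad
\Delta S_m = \frac{\Delta H_m}{T_m}
```

      At $T = T_m$ both terms vanish and $\Delta G_u = 0$, which is the sense in which a shift in $T_m$ reports on thermodynamic stability, whereas a solubility readout depends on additional factors.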

      Another important issue is that the authors claim to have discovered for the first time a number of effects well described in prior literature, sometimes a decade ago. For instance, they marvel at the differences between the solubility changes observed in lysate versus intact cells, while this difference has been investigated in a number of prior studies. No reference to these studies is given during the relevant discussion.

      The validity of statistical analysis raises concern. In fact, no calculation of statistical power is provided. As only two replicates were used in most cases, the statistical power must have been pretty limited. Also, there seems to be an absence of the multiple-hypothesis correction.

      Also, the authors forgot that whatever results PISA produces, even at high statistical significance, represent just a prediction that needs to be validated by orthogonal means. In the absolute majority of cases such validation is missing.

      Finally, to be a community-useful resource the paper needs to provide the dataset with a user interface so that the users can data-mine on their own.

    3. Reviewer #2 (Public Review):

      Summary:

      Using K562 (Leukemia) cells as an experimental model, Van Vracken et al. use Thermal Proteome Profiling (TPP) to investigate changes in protein stability after exposing either live cells or crude cell lysates to a library of anti-cancer drugs. This was a large-scale and highly ambitious study, involving thousands of hours of mass spectrometry instrument time. The authors used an innovative combination of TPP together with Proteome Integral Solubility Alteration (PISA) assays to reduce the amount of instrument time needed, without compromising on the amount of data obtained.

      The paper is very well written, the relevance of this work is immediately apparent, and the results are well-explained and easy to follow even for a non-expert. The figures are well-presented. The methods appear to be explained in sufficient detail to allow others to reproduce the work.

      Strengths:

      Using CDK4/6 inhibitors, the authors observe strong changes in protein stability upon exposure to the drug. This is expected and shows their methodology is robust. Further, it adds confidence when the authors report changes in protein stability for drugs whose targets are not well-known. Many of the drugs used in this study - even those whose protein targets are already known - display numerous off-target effects. Although many of these are not rigorously followed up in this current study, the authors rightly highlight this point as a focus for future work.

      Weaknesses:

      While the off-target effects of several drugs could've been more rigorously investigated, it is clear the authors have already put a tremendous amount of time and effort into this study. The authors have made their entire dataset available to the scientific community - this will be a valuable resource to others working in the fields of cancer biology/drug discovery.

    4. Reviewer #3 (Public Review):

      Summary:

      This work aims to demonstrate how recent advances in thermal stability assays can be utilised to screen chemical libraries and determine the compound mechanism of action. Focusing on 96 compounds with known mechanisms of action, they use the PISA assay to measure changes in protein stability upon treatment with a high dose (10 µM) in live K562 cells and whole cell lysates from K562 or HCT116. They intend this work to showcase a robust workflow that can serve as a roadmap for future studies.

      Strengths:

      The major strength of this study is the combination of live and whole cell lysates experiments. This allows the authors to compare the results from these two approaches to identify novel ligand-induced changes in thermal stability with greater confidence. More usefully, this also enables the authors to separate the primary and secondary effects of the compounds within the live cell assay.

      The study also benefits from the number of compounds tested within the same framework, which allows the authors to make direct comparisons between compounds.

      These two strengths are combined when they compare CHEK1 inhibitors and suggest that AZD-7762 likely induces secondary destabilisation of CRKL through off-target engagement with tyrosine kinases.

      Weaknesses:

      One of the stated benefits of PISA compared to the TPP in the original publication (Gaetani et al 2019) was that the reduced number of samples required allows more replicate experiments to be performed. Despite this, the authors of this study performed only duplicate experiments. They acknowledge this precludes the use of frequentist statistical tests to identify significant changes in protein stability. Instead, they apply an 'empirically derived framework' in which they apply two thresholds to the fold change vs DMSO: absolute z-score (calculated from all compounds for a protein) > 3.5 and absolute log2 fold-change > 0.2. They state that the fold-change threshold was necessary to exclude non-specific interactors. While the thresholds appear relatively stringent, this approach will likely reduce the robustness of their findings in comparison to an experimental design incorporating more replicates. Firstly, the magnitude of the effect size should not be taken as a proxy for the importance of the effect. They acknowledge this and demonstrate it using their data for PIK3CB and p38α inhibitors (Figures 2B-C). They have thus likely missed many small, but biologically relevant changes in thermal stability due to the fold-change threshold. Secondly, this approach relies upon the fold-changes between DMSO and compound for each protein being comparable, despite them being drawn from samples spread across 16 TMT multiplexes. Each multiplex necessitates a separate MS run and the quantification of a distinct set of peptides, from which the protein-level abundances are estimated. Thus, it is unlikely the fold changes for unaffected proteins are drawn from the same distribution, which is an unstated assumption of their thresholding approach. The authors could alleviate the second concern by demonstrating that there is very little or no batch effect across the TMT multiplexes. However, the first concern would remain. 
      The limitations of their approach could have been avoided with more replicates and the use of an appropriate statistical test. It would be helpful if the authors could clarify if any of the missed targets passed the z-score threshold but fell below the fold-change threshold.
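
      The two-threshold rule discussed above can be sketched as follows. This is a minimal illustration, assuming the z-score for a protein is computed from the mean and sample standard deviation of its log2 fold-changes across all compounds; the manuscript's exact z-score definition is not given in this review, and the function names and example values are hypothetical.

```python
from statistics import mean, stdev


def flag_hits(log2_fc_by_compound, z_cutoff=3.5, fc_cutoff=0.2):
    """Return compounds passing both thresholds for a single protein.

    log2_fc_by_compound maps compound name -> log2 fold-change vs DMSO.
    The z-score definition (mean/sample SD across compounds) is an assumption.
    """
    values = list(log2_fc_by_compound.values())
    mu, sigma = mean(values), stdev(values)
    hits = []
    for compound, fc in log2_fc_by_compound.items():
        z = (fc - mu) / sigma if sigma > 0 else 0.0
        if abs(z) > z_cutoff and abs(fc) > fc_cutoff:
            hits.append(compound)
    return hits


def make_protein(background, outlier):
    """Hypothetical data: 49 near-zero compounds plus one shifted compound."""
    data = {f"cmpd_{i:02d}": (background if i % 2 == 0 else -background)
            for i in range(49)}
    data["cmpd_outlier"] = outlier
    return data


# A large shift passes both thresholds.
large_shift = make_protein(background=0.01, outlier=1.0)
# A small but clearly outlying shift passes the z-score cutoff only.
small_shift = make_protein(background=0.001, outlier=0.15)

print(flag_hits(large_shift))
print(flag_hits(small_shift))
```

      In this toy data the first protein yields a single hit, while the second yields none even though its outlier has a z-score well above 3.5, which illustrates how the fold-change cutoff can discard small but consistent shifts.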

      The authors use a single, high concentration of 10 µM for all compounds. Given that many of the compounds likely have low nM IC50s, this concentration will often be multiple orders of magnitude above the one at which they inhibit their target. This makes it difficult to assess the relevance of the off-target effects identified to clinical applications of the compounds or biological experiments. The authors acknowledge this and use ranges of concentrations for follow-up studies (e.g. Figure 2E-F). Nonetheless, this weakness is present for the vast bulk of the data presented.

      The authors' claim that combining cell-based and lysate-based assays increases coverage (Figure 3F) is not supported by their data. The '% targets' presented in Figure 3F have a different denominator for each bar. As it stands, all 49 targets quantified in both assays which have a significant change in thermal stability may be significant in the cell-based assay. If so, the apparent increase in % targets when combining reflects only the subsetting of the data. To alleviate this lack of clarity, the authors could update Figure 3F so that all three bars present the % targets figure for just the 60 compounds present in both assays.
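
      The denominator issue can be illustrated with a small sketch. The target names and hit sets below are hypothetical and are not taken from the manuscript; the point is only that all three percentages are computed over the same shared set of targets.

```python
def percent_detected(hits, shared_targets):
    """Percentage of the shared targets that appear in a set of hits."""
    shared = set(shared_targets)
    return 100.0 * len(set(hits) & shared) / len(shared)


# Hypothetical example: five targets quantified in both assays.
shared_targets = {"T1", "T2", "T3", "T4", "T5"}
cell_hits = {"T1", "T2", "T3"}
lysate_hits = {"T2", "T3"}          # here a subset of the cell-based hits
combined_hits = cell_hits | lysate_hits

for label, hits in [("cell", cell_hits),
                    ("lysate", lysate_hits),
                    ("combined", combined_hits)]:
    print(label, percent_detected(hits, shared_targets))
```

      With a common denominator, the combined bar exceeds the cell-based bar only if the lysate assay detects targets that the cell-based assay missed; in this toy case it does not.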

      Aims achieved, impact and utility:

      The authors have achieved their main aim of presenting a workflow that serves to demonstrate the potential value of this approach. However, by using a single high dose of each compound and failing to adequately replicate their experiments and instead applying heuristic thresholds, they have limited the impact of their findings. Their results will be a useful resource for researchers wishing to explore potential off-target interactions and/or mechanisms of action for these 96 compounds, but are expected to be superseded by more robust datasets in the near future. The most valuable aspect of the study is the demonstration that combining live cell and whole cell lysate PISA assays across multiple related compounds can help to elucidate the mechanisms of action.

    1. eLife assessment

      This is a very strong, well-written, and interesting paper analyzing in an original way how tension pattern dynamics can reveal the contribution of active versus passive intercalation during tissue elongation. The authors apply a new concept of isogonal tension decomposition to extract a global map of tissue mechanics that will be extremely valuable in the field of biomechanics. The model is convincing to explain the authors' data but could be strengthened further by analyzing data from mutant backgrounds that could serve as a test.

    2. Joint Public Review:

      Summary:

      Brauns et al. work to decipher the respective contribution of active versus passive contributions to cell shape changes during germ band elongation. Using a novel quantification tool of local tension, their results suggest that epithelial convergent extension results from internal forces.

      Strengths:

      The approach developed here, tension isogonal decomposition, is original and the authors made the demonstration that we can extract comprehensive data on tissue mechanics from this type of analysis.

      They present an elegant diagram that quantifies how active and passive forces interact to drive cell intercalations.

      The model qualitatively recapitulates the features of passive and active intercalation for a T1 event.

      Regions of high isogonal strains are consistent with the proximity of known active regions.

      They define a parameter (the LTC parameter) which encompasses the geometry of the tension triangles and allows the authors to define a criterion for T1s to occur.

      The data are clearly presented, going from cellular scale to tissue scale, and integrating modeling approach to complement the thoughtful description of tension patterns.

      Weaknesses:

      The modeling is interesting, with the integration of tension through tension triangulation around vertices and thus integrating force inference directly in the vertex model. However, the authors are not using it to test their hypothesis and support their analysis at the tissue level. Thus, although interesting, the analysis at the tissue level stays mainly descriptive.

      Major points:

      (1) The authors mention that from their analysis, they can predict what is the tension threshold required for intercalations in different conditions and predict that in Snail and Twist mutants the T1 tension threshold would be around √2. Since movies of these mutants are most probably available, it would be nice to confirm these predictions.

      (2) While the formalism is very elegant and convincing, and also convincingly allows making sense of the data presented in the paper, it is not all that clear whether the claims are compatible with previous experimental observations. In particular, it has been reported in different papers (including Collinet et al NCB 2015, Clement et al Curr Biol 2017) that affecting the initial Myosin polarity or the rate of T1s does not affect tissue-scale convergent extension. The paper would benefit from analysis/discussion of the Tor phenotype (no extension with myosin anisotropy) and the Eve/Runt phenotype (extension without Myosin anisotropy), which seem to contradict an extension mostly driven by myosin anisotropy.

    1. eLife assessment

      This paper makes a valuable contribution by implicating S-acylation of Cys-130 in recruitment of the inflammasome receptor NLRP3 to the Golgi. Enzymes are identified as candidates for mediating S-acylation and de-acylation of NLRP3, and evidence is presented that S-acylation plays a role in response to the stress induced by nigericin treatment. Although it seems likely that Cys-130 does indeed contribute to membrane association of NLRP3, the mechanistic analyses are incomplete and the interpretations about the effects of nigericin are not fully conclusive.

    2. Reviewer #1 (Public Review):

      This is an interesting study investigating the mechanisms underlying membrane targeting of the NLRP3 inflammasome and reporting a key role for the palmitoylation-depalmitoylation cycle of cys130 in NLRP3. The authors identify ZDHHC3 and APT2 as the specific ZDHHC and APT/ABHD enzymes that are responsible for the S-acylation and de-acylation of NLRP3, respectively. They show that the levels of ZDHHC3 and APT2, both localized at the Golgi, control the level of palmitoylation of NLRP3. The S-acylation-mediated membrane targeting of NLRP3 cooperates with polybasic domain (PBD)-mediated PI4P-binding to target NLRP3 to the TGN under steady-state conditions and to the disassembled TGN induced by the NLRP3 activator nigericin.

      However, the study has several weaknesses in its current form as outlined below.

      (1) The novelty of the findings concerning cys130 palmitoylation in NLRP3 is unfortunately compromised by recent reports on the acylation of different cysteines in NLRP3 (PMID: 38092000), including palmitoylation of the very same cys130 in NLRP3 (Yu et al https://doi.org/10.1101/2023.11.07.566005), which was shown to be relevant for NLRP3 activation in cell and animal models. What remains novel and intriguing is the finding that NLRP3 activators induce an imbalance in the acylation-deacylation cycle by segregating NLRP3 in late Golgi/endosomes from de-acylating enzymes confined in the Golgi. The interesting hypothesis put forward by the authors is that the increased palmitoylation of cys130 would finally contribute to the activation of NLRP3. However, the authors should clarify the trafficking pathway of acylated-NLRP3. This pathway should, in principle, coincide with that of TGN46 which constitutively recycles from the TGN to the plasma membrane and is trapped in endosomes upon treatment with nigericin.

      (2) To affect the S-acylation, the authors used 16 hrs treatment with 2-bromopalmitate (2-BP). In Figure 1f, it is quite clear that NLRP3 in 2-BP treated cells completely redistributed in spots dispersed throughout the cells upon nigericin treatment. What is the Golgi like in those cells? In other words, does 2-BP alter/affect Golgi morphology? What about PI4P levels after 2-BP treatment? These are important missing pieces of data since both the localization of many proteins and the activity of one key PI4K in the Golgi (i.e. PI4KIIalpha) are regulated by palmitoylation.

      (3) The authors argue that the spots observed with NLRP3-GFP result from non-specific effects mediated by the addition of the GFP tag to the NLRP3 protein. However, puncta are visible upon nigericin treatment, as a hallmark of endosomal activation. How do the authors reconcile these data? Along the same lines, the NLRP3-C130S mutant behaves similarly to wt NLRP3 upon 2-BP treatment (Figure 1h). Are those NLRP3-C130S puncta positive for endosomal markers? Are they still positive for TGN46? Are they positive for PI4P?

      (4) The authors expressed the minimal NLRP3 region to identify the domain required for NLRP3 Golgi localization. These experiments were performed in control cells. It might be informative to perform the same experiments upon nigericin treatment to investigate the ability of NLRP3 to recognize activating signals. It has been reported that PI4P increases on Golgi and endosomes upon NG treatment. Hence, all the differences between the domains may be lost or preserved. In parallel, also the timing of such recruitment upon nigericin treatment (early or late event) may be informative for the dynamics of the process and of the contribution of the single protein domains.

      (5) As noted above for the chemical inhibitors (1) the authors should check the impact of altering the balance between acyl transferase and de-acylases on the Golgi organization and PI4P levels. What is the effect of overexpressing PATs on Golgi functions?