  1. Nov 2025
    1. Author Response

      Reviewer #1 (Public Review):

      The authors convincingly show in this study the effects of the fas5 gene on changes in the CHC profile and the importance of these changes toward sexual attractiveness.

      The main strength of this study lies in its holistic approach (from genes to behaviour) showing a full and convincing picture of the stated conclusions. The authors succeeded in putting a very interdisciplinary set of experiments together to support the main claims of this manuscript.

      We appreciate the kind comments from the reviewer.

      The main weakness stems from the lack of transparency behind the statistical analyses conducted in the study. Detailed statistical results are never mentioned in the text, nor is it always clear what was compared to what. I also believe that some tests that were conducted are not adequate for the given data. I am therefore unable to properly assess the significance of the results from the presented information. Nevertheless, the graphical representations are convincing enough for me to believe that a revision of the statistics would not significantly affect the main conclusions of this manuscript.

      We apologize for neglecting a detailed description of the statistical tests that were performed. We have added paragraphs to the Methods section that specifically explain the statistical analyses (lines 435-445; 489-502; 559-561; 586-591).

      The second major problem I had with the study was how it brushes over the somewhat contradicting results they found in males (Fig S2). These are only mentioned twice in the main text and in both cases as being "similarly affected", even though their own stats seem to indicate otherwise for many of the analysed compound groups. This also should affect the main conclusion concerning the effects of fas5 genes in the discussion, a more careful wording when interpreting the results is therefore necessary.

      Thank you for pointing this out. Although our focus clearly lay on the female CHC profiles, as a function in sexual signaling has thus far only been described for them, we have now elaborated on the results and discussion for the fas5 RNAi males (lines 167-178; 258-268).

      Reviewer #2 (Public Review):

      Insects have long been known to use cuticular hydrocarbons for communication. While the general pathways for hydrocarbon synthesis have been worked out, their specificity and in particular the specificity of the different enzymes involved is surprisingly little understood. Here, the authors convincingly demonstrate that a single fatty acid synthase gene is responsible for a shift in the positions of methyl groups across the entire alkane spectrum of a wasp, and that the male wasps recognize females specifically based on these methyl group positions. The strength of the study is the combination of gene expression manipulations with behavioural observations evaluating the effect of the associated changes in the cuticular hydrocarbon profiles. The authors make sure that the behavioural effect is indeed due to the chemical changes by not only testing live animals, but also dead animals and corpses with manipulated cuticular hydrocarbons.

      I find the evidence that the hydrocarbon changes do not affect survival and desiccation resistance less convincing (due to the limited set of conditions and relatively small sample size), but the data presented are certainly congruent with the idea that the methyl alkane changes do not have large effects on desiccation.

      We appreciate the kind comments from the reviewer.

      Reviewer #3 (Public Review):

      In this manuscript, the authors are aiming to demonstrate that a fatty-acyl synthase gene (fas5) is involved in the composition of the blend of surface hydrocarbons of a parasitoid wasp and that it affects the sexual attractiveness of females for males. Overall, the manuscript reads very well, it is very streamlined, and the authors' claims are mostly supported by their experiments and observations.

      We appreciate the kind comments from the reviewer.

      However, I find that some experiments, information and/or discussion are absent to assess how the effects they observe are, at least in part, not due to other factors than fas5 and the methyl-branched (MB) alkanes. I'm also wondering if what the authors observe is only a change in the sexual attractiveness of females and not related to species recognition as well.

      We appreciate the interesting point that the reviewer raises in sexual attractiveness and species recognition and now expand upon this potential aspect in the discussion (lines 327-330). However, in this manuscript, we very much focused on the effect of fas5 knockdown on the conveyance of female sexual attractiveness in a single species (Nasonia vitripennis). Therefore, we argue that species recognition constitutes a different communication modality here, and we currently cannot infer whether and how species recognition is exactly encoded in Nasonia CHC profiles despite some circumstantial evidence for species-specificity (Buellesbach et al. 2013; Mair et al. 2017). Thus, we would like to refrain from any further speculation on species recognition before this can be unambiguously demonstrated, and remain within the mechanism of sexual attractiveness within a single species which we clearly show is mediated by the female MB-alkane fraction governed by the fatty acid synthase genes. We however still consider potential alternative explanations (e.g., n-alkenes acting as a deterrent of homosexual mating attempts).

      The authors explore the function of cuticular hydrocarbons (CHCs) and a fatty-acyl synthase in Nasonia vitripennis, a parasitic wasp. Using RNAi, they successfully knockdown the expression of the fas5 gene in wasps. The authors do not justify their choice of fatty-acyl synthase candidate gene. It would have been interesting to know if that is one of many genes they studied or if there was some evidence that drove them to focus their interest in fas5.

      In a previous study, 5 fas candidate genes orthologous to Drosophila melanogaster fas genes were identified and mapped in the genome of Nasonia vitripennis (Buellesbach et al. 2022). We actually investigated the effects of all of these fas genes on CHC variation, but only fas5 led to such a striking, traceable pattern shift. We are currently preparing another manuscript discussing the effects of the other fas genes, but decided to focus exclusively on fas5 here, due to its significance for revealing how sexual attractiveness can be encoded and conveyed in complex chemical profiles, maintained and governed by a surprisingly simple genetic basis.

      The authors observe large changes in the cuticular hydrocarbon (CHC) profiles of males and females. These changes are mostly a reduction of some MB alkanes and an increase in others, as well as an increase of n-alkenes in fas5 knockdown females. For male fas5 knockdowns, the overall quantity of CHC is increased and, consequently, multiple types of compounds are increased compared to wild-type, with only one compound appearing to decrease compared to wild-type. Insects are known to rely on ratios of compounds in blends to recognize odors. The authors address this by showing a plot of the relative ratios, but it seems to me that they do not show statistical tests of those changes in the proportions of the different types of compounds. In the results section, the authors give percentages while referring to figures showing the absolute amount of CHCs. They should also test if the ratios are significantly different or not between experimental conditions. Similar data should be displayed for the males as well.

      We appreciate your suggestions. We kindly refer you to our response to reviewer 1, where we addressed the statistical tests. Specifically, we generated separate subplots to display the proportions of different compound classes and performed statistical tests to compare these proportions between different treatments for both males and females. Additionally, we have revised the results section to replace relative abundances with absolute quantity, as depicted in Figure 2C-G.

      Furthermore, the authors didn't use an internal standard to measure the quantity of CHCs in the extracts, which, to me, is the gold standard in the field. If I understood correctly, the authors check the abundance measured for known quantities of n-alkanes. I'm sure this method is fine, but I would have liked to be reassured that the quantities measured through this method are good by either testing some samples with an internal standard, or referring to work that demonstrates that this method is always accurate to assess the quantities of CHC in extracts of known volumes.

      We actually did include 7.5 ng/μl dodecane (C12) as an “internal” standard in the hexane resuspensions of all of our processed samples (line 456, Materials and Methods). This was primarily done to allow for visually inspecting and comparing the congruence of all chromatograms in the subsequent data analysis, and to immediately detect any variation arising from sample preparation, the injection process, or instrument fluctuation. Our study uses a very elaborate and standardized CHC extraction method in which the volume of solvent and the duration of extraction are strictly controlled to minimize variation from the sample preparation steps. Furthermore, we calibrated each individual CHC compound quantity with a dilution series of external standards (C21-C40) of known concentration. By constructing a calibration curve based on this dilution series, we achieved the most accurate compound quantification, also taking into account and counteracting the generally diminishing quantities of compounds with higher chain lengths.
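
      As an illustration of the external-standard calibration described above, the following minimal sketch fits a straight line to a dilution series and inverts it to estimate the concentration of an unknown peak. It assumes a linear detector response for a single compound; the numbers, function names, and units are hypothetical and are not taken from the manuscript, whose actual calibration procedure may differ.

```python
def fit_calibration(concentrations, peak_areas):
    """Ordinary least-squares fit of: peak_area = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(peak_areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(concentrations, peak_areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(peak_area, slope, intercept):
    """Invert the calibration line to estimate a concentration from a peak area."""
    return (peak_area - intercept) / slope


# Hypothetical dilution series for one external standard (arbitrary units).
standard_conc = [1.0, 2.0, 4.0, 8.0]
standard_area = [105.0, 205.0, 405.0, 805.0]

slope, intercept = fit_calibration(standard_conc, standard_area)
sample_conc = quantify(305.0, slope, intercept)
print(slope, intercept, sample_conc)
```

      Fitting one such curve per standard chain length would be one way to account for the lower detector response to longer-chain compounds that the response mentions.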

      The authors provide a sensible control for their RNAi experiments: targeting an unrelated gene, absent in N. vitripennis (the GFP). This allows us to see if the injection of RNAi might affect CHC profiles, which it appears to do in some cases in males, but not in females. The authors also show to the reader that their RNAi experiments do reduce the expression of the target gene. However, one of the caveats of their experiments, is that the authors don't provide evidence or information to allow the (non-expert) reader to assess whether the fas5 RNAi experiments did affect the expression of other fatty-acyl synthase genes. I'm not an expert in RNAi, so maybe this suggestion is not relevant, but it should, at least, be addressed somewhere in the manuscript that such off-target effects are very unlikely or impossible, in that case, or more generally.

      We acknowledge the reviewer’s concern about potential off-target effects of the fas5 knockdown. We did initially check for off-target effects on the other four previously published fas genes in N. vitripennis (Lammers et al. 2019; Buellesbach et al. 2022) and did not find any effects on their respective expression levels. We now include these results as supplementary data (Figure 2-figure supplement 1). However, as mentioned in the cover letter to the editor, we discovered a previously uncharacterized fas gene in the most recent N. vitripennis genome assembly (NC_045761.1), fas6, most likely constituting a tandem gene duplication of fas5. These two genes turned out to have such high sequence similarity (> 90 %, Figure 2-figure supplement 2) that both were simultaneously downregulated by our fas5 dsRNAi construct, which we confirmed with qPCR and have now incorporated into our manuscript (Fig. 2H). Therefore, we now explicitly mention that the knockdown affects both genes, and that either one or both could have the observed phenotypic effects. Recognizing this RNAi off-target effect, we have now also incorporated a discussion of this issue in the appropriate section of the manuscript (lines 364-377), as well as of the potential off-target effects of our GFP dsRNAi controls (lines 262-274).

      The authors observe that the modified CHCs profiles of RNAi females reduce courtship and copulation attempts, but not antennation, by males toward live and (dead) dummy females. They show that the MB alkanes of the CHC profile are sufficient to elicit sexual behaviors from males towards dummy females and that the same fraction from extracts of fas5 knockdown females does so significantly less. From the previous data, it seems that dummy females with fas5 female's MB alkanes profile elicit more antennation than CHC-cleared dummy females, but the authors do not display data for this type of target on the figure for MB alkane behavioral experiments.

      Actually, similar proportions of males performed antennation behavior towards female dummies with the MB alkane fraction of fas5 RNAi females and towards CHC-cleared female dummies (55% and 50%, respectively; see Author response image 1 for the corresponding parts of sub-figures 3E and 4D). We did not deem it necessary to show the same data on CHC-cleared female dummies in Figure 3 as well.

      Author response image 1.

      Unfortunately, the authors don't present experiments testing the effect of the non-MB alkanes fractions of the CHC extracts on male behavior toward females. As such, they are not able to (and didn't) conclude that the MB-alkane is necessary to trigger the sexual behaviors of males. I believe testing this would have significantly enhanced the significance of this work. I would also have found it interesting for the authors to comment on whether they observe aggressive behavior of males towards females (live or dead) and/or whether such behavior is expected or not in inter-individual interactions in parasitoids wasps.

      In our experiment, we focus on the function of the MB-alkane fraction in female CHC profiles, and we comprehensively demonstrate in Figure 4 that the MB-alkane fraction from WT females alone is sufficient to trigger mating behavior consistent with that towards live females and untreated female dummies. Therefore, we do not completely understand the reviewer’s concern about us not being “able to (and didn't) conclude that the MB-alkane is necessary to trigger the sexual behaviors of males”. We appreciate the reviewer’s suggestion of testing the non-MB alkanes (n-alkanes and n-alkenes). However, because the CHC compound class fractions were separated through elution with molecular sieves, it was not possible for us to retrieve either the whole n-alkane or the whole n-alkene fraction, which remained bound to the sieves after separation. The role of n-alkenes in N. vitripennis is, however, considered in the discussion, as a deterrent for homosexual interactions between males (Wang et al. 2022a). Moreover, we did not observe aggressive behavior of males towards live or dead females.

      CHCs are used by insects to signal and/or recognize various traits of targets of interest, including species or groups of origin, fertility, etc. The authors claim that their experiments show the sexual attractiveness of females can be encoded in the specific ratio of MB alkanes. While I understand how they come to this conclusion, I am somewhat concerned. The authors very quickly discuss their results in light of the literature about the role of CHCs (and notably MB alkanes) in various recognition behaviors in Hymenoptera, including conspecific recognition. Previous work (cited by the authors) has shown that males recognize males from females using an alkene (Z9C31). As such, it remains possible that the "sexual attractiveness" of N. vitripennis females for males relies on them not being males and being from the right species as well. The authors do not address the question of whether the CHCs (and the MB alkanes in particular) of females signal their sex or their species. While I acknowledge that responding to this question is beyond the scope of this work, I also strongly believe that it should be discussed in the manuscript. Otherwise, non-specialist readers would not be able to understand what I believe is one of the points that could temper the conclusions from this work.

      We acknowledge the reviewer’s insight about the MB alkanes in signaling sex or species in N. vitripennis, and now include this aspect in our revised discussion (lines 324-330). Moreover, we clearly demonstrate that n-alkenes have been reduced to minute trace components after our compound class separation, and that males still do not display courtship and copulation behaviors similar to those elicited by WT females. This strongly indicates that the n-alkenes do not play a role when males rely solely on the changed MB-alkane patterns, further strengthening our main argument.

      References

      Benjamini, Y. and D. Yekutieli. 2001. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29:1165-1188.

      Buellesbach, J., J. Gadau, L. W. Beukeboom, F. Echinger, R. Raychoudhury, J. H. Werren, and T. Schmitt. 2013. Cuticular hydrocarbon divergence in the jewel wasp Nasonia: Evolutionary shifts in chemical communication channels? J. Evol. Biol. 26:2467-2478.

      Buellesbach, J., C. Greim, and T. Schmitt. 2014. Asymmetric interspecific mating behavior reflects incomplete prezygotic isolation in the jewel wasp genus Nasonia. Ethology 120:834-843.

      Buellesbach, J., H. Holze, L. Schrader, J. Liebig, T. Schmitt, J. Gadau, and O. Niehuis. 2022. Genetic and genomic architecture of species-specific cuticular hydrocarbon variation in parasitoid wasps. Proc. R. Soc. B 289:20220336.

      Engl, T., N. Eberl, C. Gorse, T. Krüger, T. H. P. Schmidt, R. Plarre, C. Adler, and M. Kaltenpoth. 2018. Ancient symbiosis confers desiccation resistance to stored grain pest beetles. Mol. Ecol. 27:2095-2108.

      Ferveur, J. F., J. Cortot, K. Rihani, M. Cobb, and C. Everaerts. 2018. Desiccation resistance: effect of cuticular hydrocarbons and water content in Drosophila melanogaster adults. Peerj 6.

      Lammers, M., K. Kraaijeveld, J. Mariën, and J. Ellers. 2019. Gene expression changes associated with the evolutionary loss of a metabolic trait: lack of lipogenesis in parasitoids. BMC Genom. 20:309.

      Mair, M. M., V. Kmezic, S. Huber, B. A. Pannebakker, and J. Ruther. 2017. The chemical basis of mate recognition in two parasitoid wasp species of the genus Nasonia. Entomol. Exp. Appl. 164:1-15.

      Wang, Y., W. Sun, S. Fleischmann, J. G. Millar, J. Ruther, and E. C. Verhulst. 2022a. Silencing Doublesex expression triggers three-level pheromonal feminization in Nasonia vitripennis males. Proc. R. Soc. B 289:20212002.

      Wang, Z., J. P. Receveur, J. Pu, H. Cong, C. Richards, M. Liang, and H. Chung. 2022b. Desiccation resistance differences in Drosophila species can be largely explained by variations in cuticular hydrocarbons. eLife 11:e80859.

    1. Author Response

      Reviewer #1 (Public Review):

      This study investigates how pathogens might shape animal societies by driving the evolution of different social movement rules. The authors find that higher disease costs induce shifts away from positive social movement (preference to move towards others) to negative social movement (avoidance from others). This then has repercussions on social structure and pathogen spread.

      Overall, the study comprises a good mixture of intuitive and less intuitive results. One major weakness of the work, however, is that the model is constructed around one pathogen that repeatedly enters a population across hundreds of generations. While the authors provide some justification for this, it does not capture any biological realism in terms of the evolution of the pathogen itself, which would be expected. The lack of co-evolution in the model substantially limits the generality of the results. For example, a number of recent studies have reported that animals might be expected to become very social when pathogens are very infectious, because if the pathogen is unavoidable they may as well gain the benefits of being social. The authors make some arguments about being focused on introduction events, but this does not really align well with their study design that carries through many generations after the introduction. Given the rapid evolutionary dynamics, perhaps the study could have a more focused period immediately after the initial introduction of the pathogen to look at rapid evolutionary responses (albeit this may need some sensitivity analyses around the parameters such as the mutation rates).

      We appreciate the reviewer’s evaluation of our work, and acknowledge that we have not currently included evolutionary dynamics for the pathogen.

      One conceptual impediment to such inclusion is knowing how pathogen traits could be modelled in a mechanistic way. For example, it is widely held that there is a trade-off between infection cost and transmissibility, with a quadratic relationship between them, but this is a pattern and not a process per se. We are unsure which mechanisms could be modelled that impinge upon both infection cost and transmissibility.

      On the practical side, we feel that a mechanistic, individual-based model that includes both pathogen and host evolution would become very challenging to interpret. It might be more tractable to begin with a mechanistic, spatial model that examines pathogen trait evolution with an unchanging host (such as an adaptation of Lion and Boots, 2010). We would be happy to take this on in future work, with a view to combining models thereafter.

      We have taken the suggestion to focus on the period immediately after the introduction, and we now focus on the following 500 generations. While 500 generations is still a long time, we would note that our model dynamics typically stabilise within 200 generations. We show the following generations primarily to check that some stability in the dynamics has indeed been reached (but see our new scenario 2).

      We also appreciate the point regarding mutation rates. Our mutation rates are relatively high to account for the small size of our population. We have found that with smaller mutation rates (0.001 rather than 0.01), evolutionary shifts in our population do not occur within the first 500 generations. This is primarily because prior to pathogen introduction, the ‘agent avoiding’ strategy that becomes common later is actually quite rare. Whether a rapid transition takes place thus depends on whether there are any agent avoiding individuals in the population at the moment of pathogen introduction, or on whether such individuals emerge rapidly thereafter through mutations on the social weights. We expect that with larger population sizes, we would be able to recover our results with smaller mutation rates as well.

      A final, and much more minor comment is whether this is really a paper about movement. The model does not really look at evolutionary changes in how animals move, but rather at where they move. How important is the actual movement process under this model? For example, would the results change if the model was constructed without explicit consideration of space and resources, but instead simply modelled individuals' decisions to form and break ties? (Similar to the recent paper by Ashby & Farine https://onlinelibrary.wiley.com/doi/full/10.1111/evo.14491 ). It might help to provide more information about how putting social decisions into a spatially explicit framework is expected to extend studies that have not done so (e.g., because they are analytical).

      This paper is indeed about movement, as where to move is a key part of the movement ecology paradigm (Nathan et al. 2008). That said, we appreciate the advice to emphasise the importance of social decisions in a spatial context, and we have added this to the Introduction (L. 79 – 81) and Discussion (L. 559 – 562). In brief, we do expect different dynamics to result from the explicit spatial context, as compared to a model in which social associations are probabilistic and could occur with any individual in the population.

      In our models, individual social tendency (whether individuals prefer moving towards others) is separated from individual sociality (whether they actually associate with other individuals). This can be seen from our (new) Fig. 3D, in which individuals of each of the social strategies can sometimes have similar numbers of associations (although modulated by movement). This separation of the pattern from the underlying process is possible, we believe, due to the heterogeneity in the social landscape created by the explicit spatial context.

      Reviewer #2 (Public Review):

      This theoretical study looks at individuals' strategies to acquire information before and after the introduction of pathogens into the system. The manuscript is well-written and gives a good summary of the previous literature. I enjoyed reading it and the authors present several interesting findings about the development of social movement strategies. The authors successfully present a model to look at the costs and benefits of sociality.

      I have a couple of major comments about the work in its current form that I think are very important for the authors to address. That said, I think this is a promising start and that with some revisions, this could be a valuable contribution to the literature on behavioral ecology.

      We appreciate the reviewer’s kind words.

      Before starting, I would like to be precise that, given the scope of the models and the number of parameter choices that were necessary, I am going to avoid criticisms of the decisions made when designing the models. However, there are a few assumptions I rather find problematic and would like to give proper attention to.

      The first regards social vs. personal information. Most of the model argumentation is based on the reliance on social information (considering four, but to me overlapping, social strategies that are somehow static and heritable) but in fact, individuals may oscillate between relying on their personal information and/or on social information -- which may depend on the availability of resources, population density, stochastic factors, among others (Dall et al. 2005 Trends Ecol. Evol., Duboscq et al. 2016 Frontiers in Psychology). In my opinion, ignoring the influence of personal and social information decreases the significance of this work. I am aware that the authors consider the detection of food present in the model, but this is considered to a much smaller extent (as seen in their weight on individual decisions) than the social information cues.

      We appreciate the point that individuals can switch between relying on social and personal information. However, we would point out that in our model, the social strategies are not static. The social strategy is a convenient way of representing individuals’ position in behavioural trait-space (the ‘behavioural hypervolume’ of Bastille-Rousseau and Wittemyer 2019). This essentially means that the importance assigned to each of the three cues available in our model varies among individuals. There are indeed individuals that are primarily guided by the density of food items, and this is the commonest ‘overall’ movement strategy before the pathogen is introduced. We represent this by showing how the importance of social information is low before pathogen introduction (Fig. 2B).

      While we primarily focus on the importance of social information, this is because the population quite understandably evolves a persistent preference for moving towards food items (i.e., using personal information if available). We have made this clearer in the text on lines 367 – 371.
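
      To make the idea of cue weights concrete, the sketch below scores candidate locations as a weighted sum of the three cues (food items, handlers, non-handlers) and labels a social strategy from the signs of the two social weights. This is an illustrative assumption about how such weights could be combined, not the authors' implementation; the function names, the example weights, and the labels "agent tracking" and "non-handler tracking" are hypothetical.

```python
CUES = ("food", "handlers", "non_handlers")


def location_score(cues, weights):
    """Weighted sum of the three cue densities at one candidate location."""
    return sum(weights[name] * cues[name] for name in CUES)


def choose_location(candidates, weights):
    """Return the index of the candidate location with the highest score."""
    return max(range(len(candidates)),
               key=lambda i: location_score(candidates[i], weights))


def social_strategy(weights):
    """Label a strategy by the signs of the two social weights (assumed scheme)."""
    towards_handlers = weights["handlers"] > 0
    towards_non_handlers = weights["non_handlers"] > 0
    if towards_handlers and towards_non_handlers:
        return "agent tracking"
    if towards_handlers:
        return "handler tracking"
    if towards_non_handlers:
        return "non-handler tracking"
    return "agent avoiding"


# Two hypothetical candidate locations: one food-rich and empty, one crowded.
candidates = [
    {"food": 2, "handlers": 0, "non_handlers": 0},
    {"food": 1, "handlers": 3, "non_handlers": 1},
]
food_driven = {"food": 1.0, "handlers": 0.1, "non_handlers": 0.1}
handler_tracker = {"food": 0.5, "handlers": 1.0, "non_handlers": -0.5}
avoider = {"food": 1.0, "handlers": -0.5, "non_handlers": -0.5}

print(choose_location(candidates, food_driven), social_strategy(food_driven))
print(choose_location(candidates, handler_tracker), social_strategy(handler_tracker))
```

      In this sketch an individual with small social weights is guided almost entirely by food, which corresponds to the low importance of social information before pathogen introduction described above.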

      Critically, it is also unclear how, if at all, the information and pathogen traits are related to each other. If a handler gets sick, how does this affect its foraging activity (does it stop foraging, slow its activities, or does it show signs of sickness)? Perhaps this model is attempting to explore the emergence of social movement strategies only, but how they disentangle an individual's sickness status and behavioral response is unclear.

      We appreciate that infection may lead to physiological effects (e.g. altered metabolic rates, reduction in cognitive capacity) that may then influence behaviour. Our model aims to be a relatively simple and general one, and does not consider the explicit mechanisms by which infection imposes a cost on fitness. Thus we do not include any behavioural modifications due to infection, as we feel that these would be much too complex to include in such a model. We would be happy to explore, in future work, phenomena such as the evolution of self-isolation and infection detection, which are common among animals such as social insects (Stroeymeyt et al. 2018, Pusceddu et al. 2021).

      However, we have considered an alternative implementation of our model’s scenario 1 which could be interpreted as the infection reducing foraging efficiency by a certain percentage (other interpretations of the redirection of energy away from reproduction are also possible). We show how this implementation leads to very similar outcomes as those seen in our

      Very little is presented about the virulence of the pathogens and how they could affect the emergence of social strategies. The authors keep their main argumentation based on the introduction of novel pathogens (without distinctions on their pathogenicity), but a behavioral response is rather influenced by how fast individuals are infected and which are their chances of recovering. Besides, they consider that only one or two social interactions would be enough for pathogen transmission to occur.

      We have indeed considered a fixed transmission probability of 0.05, a relatively modest attack rate. Setting the transmission probability to two other values (0.025, 0.1), we find that our general results are recovered: there is an evolutionary transition away from sociality, and the proportion of the population that evolves agent avoidance increases with the transmission probability. While we do not show these results in the main text, we have included figures showing the proportions of each social movement strategy here for the reviewers’ reference.

      Figures showing the proportion of social movement strategies in two simulation runs of our default implementation of scenario 1 (dE = 0.25, R = 2, pathogen introduction begins from G = 500). Top: Probability of transmission = 0.025 (half of the default). Bottom: Probability of transmission = 0.10 (double the default). Overall, the proportion of agent avoidance evolved (purple) increases with the probability of transmission. Each figure shows a single replicate of each parameter combination, for only 1,000 generations.
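
      As a rough guide to what these per-encounter transmission probabilities imply, the sketch below computes the chance of having been infected after a given number of encounters with infected individuals. It assumes that encounters are independent and that each carries the same transmission probability; this simplification and the function name are ours, not part of the described model.

```python
def cumulative_infection_probability(p_transmission, n_encounters):
    """Probability of at least one transmission in n independent encounters."""
    return 1.0 - (1.0 - p_transmission) ** n_encounters


# Compare the three transmission probabilities discussed in the response.
for p in (0.025, 0.05, 0.10):
    risks = [round(cumulative_infection_probability(p, n), 3) for n in (1, 2, 10)]
    print(p, risks)
```

      Under these assumptions, one or two encounters at the default probability of 0.05 carry a cumulative risk below 10%, so repeated contact rather than a single interaction drives most of the infection risk.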

      Another important component is that individuals do not die, and it seems that they always have a chance (even if it is small) to reproduce. So, how the authors consider unsuccessful strategies in the model outputs or how these social strategies would be potentially "dismissed" by natural selection are not considered.

      We appreciate the point that our simulation does not include mortality effects, and that all individuals have some small chance of reproducing. There are a few practical and conceptual challenges when incorporating this level of realism in a general model. Including mortality effects could allow for the emergence of more complex density-dependent dynamics, as dead individuals would not be able to transmit the pathogen to other foragers (although for some pathogens, this could be a valid choice), nor would they be sources of social information. This would make the model much more challenging to interpret, and we have tried to keep this model as simple as possible.

      We have also sought to keep the model’s focus on the evolutionary dynamics, and to not focus on mortality. In order to balance this aim with the reviewer's suggestion, we have included a new implementation of the model’s scenario 1 which has a threshold on reproduction. That means that only individuals with a positive energy balance (intake > infection costs) are allowed to reproduce. We show a potentially counter-intuitive result, that the more social ‘handler tracking’ strategy persists at a higher frequency than in our default implementation, despite having a higher infection rate than the ‘agent avoiding’ strategy. We suggest that this is because the ‘agent avoiding’ individuals have very low or no intake. This is sufficient in our default implementation to have relatively higher fitness than the more frequently infected handler tracking individuals.
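
      The difference between the default and threshold implementations can be sketched as two ways of turning net energy into reproduction weights. The threshold rule (intake > infection costs) follows the description above; the rescaling used for the default case, the constant `epsilon`, and the example agents are hypothetical choices made only so that every individual keeps a small chance of reproducing.

```python
def net_energy(intake, infection_cost):
    """Energy balance of one individual: intake minus accumulated infection cost."""
    return intake - infection_cost


def reproduction_weights(agents, use_threshold, epsilon=1e-3):
    """Relative reproduction weights for a list of (intake, infection_cost) pairs.

    use_threshold=True: only individuals with positive net energy can reproduce.
    use_threshold=False: weights are shifted so everyone has a small nonzero
    chance (an assumed rescaling, for illustration only).
    """
    energies = [net_energy(intake, cost) for intake, cost in agents]
    if use_threshold:
        return [e if e > 0 else 0.0 for e in energies]
    lowest = min(energies)
    return [e - lowest + epsilon for e in energies]


# Hypothetical agents: a well-fed infected forager, an avoider with no intake
# and no infection, and a poorly fed infected forager.
agents = [(10, 2), (0, 0), (3, 5)]

default_weights = reproduction_weights(agents, use_threshold=False)
threshold_weights = reproduction_weights(agents, use_threshold=True)
print(default_weights)
print(threshold_weights)
```

      In this toy example the uninfected individual with zero intake outranks the poorly fed infected forager under the default rule but is excluded under the threshold, which mirrors the explanation given above for why the threshold disadvantages ‘agent avoiding’ individuals.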

      Reviewer #3 (Public Review):

      Gupte and colleagues develop an individual-based model to examine how the introduction of a novel pathogen influences the evolution of social cue use in a population of agents for which social cues can both facilitate more efficient foraging, but also expose individuals to infection. In their simulations, individuals move across a landscape in search of food, and their movements are guided by a combination of cues related to food patches, individuals that are currently handling food items, and individuals that are not actively handling food. The latter two cues can provide indirect information about the likely presence of food due to the patchiness of food across the landscape.

      The authors find that prior to introducing the novel pathogen, selection favors strategies that home in on agents, regardless of whether those agents are currently handling food items. The overall contribution of these social cues to movement decisions, however, tends to be relatively small. After pathogen introduction, agents evolve to rely more heavily on social information and to either be more selective in their use of it (attending to other agents that are currently handling food and avoiding non-handlers) or avoiding other agents altogether. Gupte and colleagues further examine the ecological consequences of these shifts in social decision-making in terms of individuals' overall movement, food consumption, and infection risk. Relative to pre-introduction conditions, individuals move more, consume less food, and are less likely to be infected due to reduced contact with others. Epidemiological models on emergent social networks confirm that evolved behavioral changes generate networks that impede the spread of disease.

      The introduction of novel pathogens into wild populations is expected to be increasingly common due to climate change and increasing global connectedness. The approach taken here by the authors is a potentially worthwhile avenue to explore the potential eco-evolutionary consequences of such introductions. A major strength of this study is how it couples ecological and evolutionary timescales. Dominant behavioral strategies evolve over time in response to changing environmental conditions and impact social, foraging, and epidemiological dynamics within generations. I imagine there are many further questions that could be fruitfully explored using the authors' framework. There are, however, important caveats that impact the interpretation of the authors' findings.

      First, reproduction bears no cost in this model. Individuals produce offspring in proportion to their lifetime net energy intake, which is increased by consuming food and decreased by a set amount per turn once infected. However, prior to reproduction, net energy intake is normalized (0-1) according to the lowest individual value within the generation. This means that individuals need not maintain a positive energy balance nor even consume food at all to successfully reproduce, so long as they perform reasonably well relative to other members of the population. Since consuming food is not necessary to reproduce, declining per capita intake due to evolved social avoidance (Fig. 1d) likely decreases the importance of food to an individual's reproductive success relative to simply avoiding infection. This dynamic could explain the delayed emergence of the 'agent avoiding' strategy (Fig. 1a), as this strategy potentially is only viable once per capita intake reaches a sufficiently low level across the population (Fig. 1d). I am curious to know what the results would be if reproduction required some minimal positive net energy, such that individuals must risk food patches in order to reproduce. It would also be useful for the authors to provide information on how net energy intake changes across generations, as well as whether (and if so, how) attraction to the food itself may change over time.

      We thank the reviewer for their assessment of our work, and appreciate the point raised here (and in an earlier review) about individuals potentially reproducing without any intake. We have addressed this by running our default model [repeated introductions, R = 2, dE = 0.25], with a threshold on reproduction such that only individuals with a positive energy balance can reproduce. We mention these results in the text (L. 495 – 500), and show related figures in the SI Appendix. In brief, as the reviewer suggests, agent avoiding is less common for our default parameter combination, but becomes as common as the default combination when the infection cost is doubled (to dE = 0.5).

      We appreciate the reviewer’s suggestion about decreasing per-capita intake being a precondition for the proliferation of the agent avoiding strategy. With our new results, we now show that there is no overall decrease in intake, but the agent avoiding strategy still becomes a common strategy after pathogen introduction. As the reviewer suggests, this is because these individuals have a net energy equivalent to that of handler tracking individuals, as they are less frequently infected.

      We suggest that the delayed emergence of the agent avoiding strategy is primarily due to mutation limitations – such individuals are uncommon or non-existent in the simulation before pathogen introduction, and random mutations are required for them to emerge. As we have noted in response to an earlier comment, this becomes clear when the mutation rate is reduced from 0.01 to 0.001 – agent avoidance usually does not evolve at all.

      A second important caveat is that the evolutionary responses observed in the model only appear when novel pathogen introductions are extremely frequent. The model assumes no pathogen co-evolution, but rather that the same (or a functionally identical) pathogen is re-introduced every generation (spillover rate = 1.0). When the authors considered whether evolutionary responses were robust to less frequent introductions, however, they found that even with a per-generation spillover rate of 0.5, there was no impact on social movement strategies. The authors do discuss this caveat, but it is worth highlighting here as it bears on how general the study's conclusions may be.

      We appreciate the reviewer’s point entirely. We would point out that current knowledge about pathogen introductions across species and populations in the wild is very poor. However, the ongoing highly pathogenic avian influenza outbreak (Wille and Barr 2022), the spread of multiple strains of SARS-CoV-2 to wild deer in several different human-to-wildlife transmission events, and recent work on the potential for coronavirus spillovers from bats to humans, all suggest that at least some generalist pathogens must circulate quite widely among wildlife, often crossing into novel host species or populations. We have added these considerations to the text on lines 218 – 231.

      We have also added, in order to confront this point more squarely, a new scenario of our model in which the pathogen is introduced just once, and then transmits vertically and horizontally among individuals (lines 519 – 557). This scenario more clearly suggests when evolutionary responses to pathogen introductions are likely to occur, and what their consequences might be for a pathogen becoming endemic in a population. This scenario also serves as a potential starting point for models of host-pathogen trait co-evolution, and we have added this consideration to the text on lines 613 – 623.

      References

      ● Albery, G. F. et al. 2021. Multiple spatial behaviours govern social network positions in a wild ungulate. - Ecology Letters 24: 676–686.

      ● Bastille-Rousseau, G. and Wittemyer, G. 2019. Leveraging multidimensional heterogeneity in resource selection to define movement tactics of animals. - Ecology Letters 22: 1417–1427.

      ● Gupte, P. R. et al. 2021. The joint evolution of animal movement and competition strategies. - bioRxiv in press.

      ● Lion, S. and Boots, M. 2010. Are parasites “prudent” in space? - Ecology Letters 13: 1245–1255.

      ● Lloyd-Smith, J. O. et al. 2005. Superspreading and the effect of individual variation on disease emergence. - Nature 438: 355–359.

      ● Nathan, R. et al. 2008. A movement ecology paradigm for unifying organismal movement research. - PNAS 105: 19052–19059.

      ● Pusceddu, M. et al. 2021. Honey bees increase social distancing when facing the ectoparasite Varroa destructor. - Science Advances 7: eabj1398.

      ● Sánchez, C. A. et al. 2022. A strategy to assess spillover risk of bat SARS-related coronaviruses in Southeast Asia. - Nat Commun 13: 4380.

      ● Stroeymeyt, N. et al. 2018. Social network plasticity decreases disease transmission in a eusocial insect. - Science 362: 941–945.

      ● Wilber, M. Q. et al. 2022. A model for leveraging animal movement to understand spatio-temporal disease dynamics. - Ecology Letters in press.

      ● Wille, M. and Barr, I. G. 2022. Resurgence of avian influenza virus. - Science 376: 459–460.

    1. Author Response

      Reviewer #2 (Public Review):

      This paper by Angueyra, et al., adds to the field’s current understanding of photoreceptor specification and factors regulating opsin expression in vertebrates. Current models of specification of vertebrate photoreceptors are largely based on studies of mammals. However, a great number of animals including teleosts express a wider array of photoreceptor subtypes. Zebrafish for example have 4 distinct cone subtypes and rods. The approach is sound and the data are quite convincing. The only minor weaknesses are that the statistical analyses need to be revisited and the discussion should be a bit more focused.

      To identify differentially expressed transcription factors, the authors performed bulk RNA-seq of pooled, hand-sorted photoreceptors. The selection criterion was tightly controlled to limit unhealthy cells and cellular debris from other photoreceptor subtypes. The pooling of cells provided a considerable depth of sequencing, orders of magnitude better than scSeq. The authors identified known transcription factors and several that appear to be novel or whose role has not been determined. The data are made available on the PI's website, as is a program to access and compare the gene expression data.

      The authors then used CRISPR/Cas9 gene targeting of two known and several novel factors identified in their analysis for effects on cell fate decisions and opsin expression. Phenotyping performed on the injected larvae is possible, and the target genes were amplified and sequenced to demonstrate the efficiency of the gene targeting. Targeting of 2 genes with known functions in photoreceptor specification in zebrafish, Tbx2b and Foxq2, resulted in the anticipated changes in cell fate, albeit the strength of the alterations in cell fate in the F0 larvae appears to be less than the published phenotypes for the inherited alleles. Interestingly, the authors also identified the expression of an RH2 opsin in SWS2 cones, another cone type. The changes are subtle but important.

      The authors then targeted tbx2a, the function of which was not known. The result is quite interesting as it matches the increase of rods and decrease of UV cones observed in tbx2b mutants. However, the injected animals also showed RH2 opsin expression but are now in the LWS cone subtype. These data suggest that Tbx2 transcription factors repress misexpression of opsins in the wrong cell type.

      The authors also show that targeting additional differentially expressed factors does not affect photoreceptor fate or survival in the time frame investigated. These are important data to present. For these or any of the other targeted genes above, did the authors test for changes in photoreceptor number or survival?

      We have attempted to address this point, but the answer is not clear cut. We used activated caspase-3 immunolabeling as a marker of apoptosis (Lusk and Kwan 2022). At 5 dpf, the age we chose to make quantifications, we don’t see an increase in activated caspase-3 positive cells when we compare control and tbx2a F0 mutants (Reviewer Figure 1A-B). Labeled cells are very rare and located near the ciliary marginal zone irrespective of genotype. This suggests that there is no detectable active death at this late stage of development in tbx2 F0 mutants. Earlier in development, at 3 dpf, when photoreceptor subtypes first appear, there is also a normal wave of apoptosis in the retina (Blume et al. 2020; Biehlmaier, Neuhauss, and Kohler 2001), resulting in many cells positive for activated caspase-3; our preliminary quantifications don’t show a marked increase in the number of labeled cells in tbx2a F0 mutants, but we consider it likely that subtle effects might be obscured by the physiological wave of apoptosis (Reviewer Figure 1C-D).

      Reviewer Figure 1 - Assessment of apoptosis in tbx2a F0 mutants. (A-B) Confocal images of 5 dpf larval eyes of control (A and A’) and tbx2a F0 mutants (B and B’) counterstained with DAPI (grey) and immunolabeled against activated Caspase 3 (yellow) show sparse and dim labeling, restricted to cells located in the ciliary marginal zone, without clear differences between groups. (C-D) Confocal images of 3 dpf larval eyes of control (C and C’) and tbx2a F0 mutants (D and D’) immunolabeled against activated Caspase 3 show many positive cells, located in all retinal layers, as expected from physiological apoptosis at this stage of development and without clear differences between groups.

      Furthermore, the additional single-cell RNA-seq datasets we have reanalyzed suggest that tbx2a and tbx2b are expressed by other retinal neurons and progenitors and not just photoreceptors (Reviewer Figure 2), further confounding attempts at the quantification of apoptosis specifically in photoreceptor progenitors.

      Reviewer Figure 2 – Expression of tbx2 paralogues across retinal cell types. The transcription factors tbx2a and tbx2b are expressed by many retinal cells. Plots show average counts across clusters in RNA-seq data obtained by Hoang et al. (2020).

      At this stage, we consider that fully resolving this issue is important and will require considerably more work, which we will pursue in the future using full germline mutants and live-imaging experiments.

      Reviewer #3 (Public Review):

      Angueyra et al. tried to establish a method to identify key factors regulating fate decisions in the retinal visual photoreceptor cells by combining transcriptomic and fast genome editing approaches. First, they isolated and pooled five subtypes of photoreceptor cells from transgenic lines, in each of which a specific subtype of photoreceptor cells is labeled by a fluorescent protein, and then subjected them to RNA-seq analyses. Second, by comparing the transcriptome data, they extracted the list of the transcription factor genes enriched in the pooled samples. Third, they applied CRISPR-based F0 knockout to functionally identify transcription factor genes involved in cell fate decisions of photoreceptor subtypes. To benchmark this approach, they initially targeted foxq2 and nr2e3 genes, which have been previously shown to regulate S-opsin expression and S-cone cell fate (foxq2) and to regulate rhodopsin expression and rod fate (nr2e3). They then targeted other transcription factor genes in the candidate list and found that tbx2a and tbx2b are independently required for UV-cone specification. They also found that tbx2a expressed in the L-cone subtype and tbx2b expressed in the S-cone subtype inhibit M-opsin gene expression in the respective cone subtypes. From these data, the authors concluded that the transcription factors Tbx2a and Tbx2b play a central role in controlling the identity of all photoreceptor subtypes within the retina.

      Overall, the contents of this manuscript are well organized and technically sound. The authors presented convincing data, and carefully analyzed and interpreted them. It includes an evaluation of the presented data on cell-type specific transcriptome by comparing it with previously published ones. I think the current transcriptomic data will be a valuable platform to identify the genes regulating cell-type specific functions, especially in combination with the fast CRISPR-based in vivo screening methods provided here. I hope that the following points would be helpful for the authors to improve the manuscript appropriately.

      1) The manuscript uses the word “FØ” quite often without any proper definition. I wonder how “Ø” should be pronounced - zero or phi? This word is not common and has not been used in previous publications. I feel the phrase “F0 knockout,” which was used in the paper cited by the authors (Kroll et al 2021), is more straightforward. If it is to be used in the manuscript, please define “FØ” and “CRISPR-FØ screening” appropriately, especially in the abstract.

      We have made changes to replace “FØ” with “F0.” In our other citation (Hoshijima et al., 2019), “F0 embryo” was used throughout the paper. Following our references and Dr Kojima’s suggestion, we adopted “F0 mutant larva” as the most straightforward and least confusing term. We have also made changes in the abstract to define our approach more clearly and made appropriate changes throughout the manuscript.

      2) Figure 1-supplement 1 shows that opn1mw4 has quite high (normalized) FPKM in one of the S-cone samples in contrast to the least (or no) expression in the M-cone samples, in which opn1mw4 is expected to be detected. The authors should address a possible origin of this inconsistent result for opn1mw4 expression as well as a technical limitation of using the Tg(opn1mw2:egfp) line for detection of opn1mw4 expression in the GFP-positive cells.

      In Figure 1 - Supplement 1, we had attempted to provide a summarized figure of all phototransduction genes, but the big differences in expression levels (in particular, the high expression of opsin genes) forced us to use gene-by-gene normalization for display. Without normalization, the expression of opn1mw4 is very low across all samples, and its detection in that sole S-cone sample can likely be attributed to some degree of inherent noise in our methods. We have revised Figure 1 - Supplement 1: we find that we can avoid gene-by-gene normalization and still provide a good summary of the expression of phototransduction genes if the heatmap is broken down by gene families, which have more similar expression levels. In addition, we have added caveats to the use of the Tg(opn1mw2:egfp) line as our sole M-cone marker in the results section describing our RNA-seq approach, including our inability to provide data on Opn1mw4-expressing M cones.
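      The way gene-by-gene normalization can exaggerate a barely detected gene can be shown with a small sketch. The gene labels and FPKM values below are hypothetical placeholders, not values from the dataset, and dividing by each gene's own maximum is assumed as the normalization.

```python
def normalise_per_gene(expression):
    """Divide each gene's values by that gene's own maximum across samples."""
    normalised = {}
    for gene, values in expression.items():
        peak = max(values)
        normalised[gene] = [v / peak if peak > 0 else 0.0 for v in values]
    return normalised


# Hypothetical FPKM values for three samples (illustration only).
expression = {
    "abundant_gene": [5000.0, 4000.0, 20.0],
    "rare_gene": [0.0, 1.5, 0.0],
}
per_gene = normalise_per_gene(expression)
```

      After normalization, the single sample with 1.5 FPKM of the rare gene is displayed at the same intensity as the 5000 FPKM peak of the abundant gene, so noise-level detection can look like strong expression. Grouping genes with similar expression levels on a shared scale avoids this.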

      3) The manuscript lacks a description of the sampling time point. It is well known that many genes are expressed with daily (or circadian) fluctuation (cf. Doherty & Kay, 2010 Annu. Rev. Genet.). For example, the cone-specific gene list in Fig.2C includes a circadian clock gene, per3, whose expression was reported to fluctuate in a circadian manner in many tissues of zebrafish including the retina (Kaneko et al. 2006 PNAS). It appears to be cone-specific at this time point of sample collection as shown in Fig.2, but might be expressed in a different pattern at other time points (eg, rod expression). The authors should add, at least, a clear description of the sampling time points so as to make their data more informative.

      We have included this information in the materials and methods. We collected all our samples during the most active peak of the zebrafish circadian rhythm between 11am and 2pm (3h to 6h after light onset) to avoid the influence of circadian fluctuations in our analysis.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work George et al. describe RatInABox, a software system for generating surrogate locomotion trajectories and neural data to simulate the effects of a rodent moving about an arena. This work is aimed at researchers that study rodent navigation and its neural machinery.

      Strengths:

      • The software contains several helpful features. It has the ability to import existing movement traces and interpolate data with lower sampling rates. It allows varying the degree to which rodents stay near the walls of the arena. It appears to be able to simulate place cells, grid cells, and some other features.

      • The architecture seems fine and the code is in a language that will be accessible to many labs.

      • There is convincing validation of velocity statistics. There are examples shown of position data, which seem to generally match between data and simulation.

      Weaknesses:

      • There is little analysis of position statistics. I am not sure this is needed, but the software might end up more powerful and the paper higher impact if some position analysis was done. Based on the traces shown, it seems possible that some additional parameters might be needed to simulate position/occupancy traces whose statistics match the data.

      Thank you for this suggestion. We have added a new panel to figure 2 showing a histogram of the time the agent spends at positions of increasing distance from the nearest wall. As you can see, RatInABox is a good fit to the real locomotion data: positions very near the wall are under-explored (in the real data this is probably because whiskers and physical body size block positions very close to the wall) and positions just away from but close to the wall are slightly over-explored (an effect known as thigmotaxis, already discussed in the manuscript).

      As you correctly suspected, fitting this warranted a new parameter which controls the strength of the wall repulsion; we call this “wall_repel_strength”. The motion model hasn’t mathematically changed: all we did was take a parameter which was originally a fixed constant 1, unavailable to the user, and make it a variable which can be changed (see methods section 6.1.3 for maths). The curves fit best when wall_repel_strength ~= 2. Methods and parameters table have been updated accordingly. See Fig. 2e.
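      A wall-distance occupancy histogram of the kind described above can be computed from any list of positions. The sketch below is a minimal, standalone illustration for a rectangular arena; the function names, bin count and sample positions are assumptions and it does not use the RatInABox API.

```python
def distance_to_nearest_wall(x, y, width=1.0, height=1.0):
    """Distance from (x, y) to the closest wall of a width x height box."""
    return min(x, width - x, y, height - y)


def wall_distance_histogram(positions, n_bins=10, width=1.0, height=1.0):
    """Fraction of samples falling in each distance-to-wall bin (positions must be non-empty)."""
    max_distance = min(width, height) / 2  # centre of the box is the farthest point
    bin_width = max_distance / n_bins
    counts = [0] * n_bins
    for x, y in positions:
        d = distance_to_nearest_wall(x, y, width, height)
        index = min(int(d / bin_width), n_bins - 1)  # clip the centre into the last bin
        counts[index] += 1
    total = len(positions)
    return [c / total for c in counts]


# Hypothetical samples: one at the centre, two hugging walls, one in between.
positions = [(0.5, 0.5), (0.01, 0.5), (0.5, 0.98), (0.26, 0.5)]
histogram = wall_distance_histogram(positions)
```

      Applying the same function to real and simulated trajectories sampled at a fixed time step gives two histograms that can be compared directly, for example while varying the wall repulsion strength.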

      • The overall impact of this work is somewhat limited. It is not completely clear how many labs might use this, or have a need for it. The introduction could have provided more specificity about examples of past work that would have been better done with this tool.

      At the point of publication we, like yourself, did not know to what extent there would be a market for this toolkit; however, we were pleased to find that there was. In its initial 11 months RatInABox has accumulated a growing, global user base, over 120 stars on GitHub and north of 17,000 downloads through PyPI. We have accumulated a list of testimonials[5] from users of the package vouching for its utility and ease of use, four of which are abridged below. These testimonials come from a diverse group of 9 researchers spanning 6 countries across 4 continents and varying career stages from pre-doctoral researchers with little computational exposure to tenured PIs. Finally, not only does the community use RatInABox, they are also building it: at the time of writing RatInABox has logged 20 GitHub “Issues” and 28 “pull requests” from external users (i.e. those who aren’t authors on this manuscript) ranging from small discussions and bug-fixes to significant new features, demos and wrappers.

      Abridged testimonials:

      ● “As a medical graduate from Pakistan with little computational background…I found RatInABox to be a great learning and teaching tool, particularly for those who are underprivileged and new to computational neuroscience.” - Muhammad Kaleem, King Edward Medical University, Pakistan

      ● “RatInABox has been critical to the progress of my postdoctoral work. I believe it has the strong potential to become a cornerstone tool for realistic behavioural and neuronal modelling” - Dr. Colleen Gillon, Imperial College London, UK

      ● “As a student studying mathematics at the University of Ghana, I would recommend RatInABox to anyone looking to learn or teach concepts in computational neuroscience.” - Kojo Nketia, University of Ghana, Ghana

      ● “RatInABox has established a new foundation and common space for advances in cognitive mapping research.” - Dr. Quinn Lee, McGill, Canada

      The introduction continues to include the following sentence highlighting examples of past work which relied on generating artificial movement and/or neural data and which, by implication, could have been done better (or at least accelerated and standardised) using our toolbox.

      “Indeed, many past[13, 14, 15] and recent[16, 17, 18, 19, 6, 20, 21] models have relied on artificially generated movement trajectories and neural data.”

      • Presentation: Some discussion of case studies in Introduction might address the above point on impact. It would be useful to have more discussion of how general the software is, and why the current feature set was chosen. For example, how well does RatInABox deal with environments of arbitrary shape? T-mazes? It might help illustrate the tool's generality to move some of the examples in supplementary figure to main text - or just summarize them in a main text figure/panel.

      Thank you for this question. Since the initial submission of this manuscript RatInABox has been upgraded and environments have become substantially more “general”. Environments can now be of arbitrary shape (including T-mazes), boundaries can be curved, they can contain holes and can also contain objects (0-dimensional points which act as visual cues). A few examples are showcased in the updated figure 1 panel e.

      To further illustrate the tool's generality beyond the structure of the environment we continue to summarise the reinforcement learning example (Fig. 3e) and neural decoding example in section 3.1. In addition to this we have added three new panels into figure 3 highlighting new features which, we hope you will agree, make RatInABox significantly more powerful and general and satisfy your suggestion of clarifying utility and generality in the manuscript directly.

      On the topic of generality, we wrote the manuscript in such a way as to demonstrate the rich variety of ways RatInABox can be used without providing an exhaustive list of potential applications. For example, RatInABox can be used to study neural decoding and it can be used to study reinforcement learning, but not because it was purpose built with these use-cases in mind. Rather, it is because it contains a set of core tools designed to support spatial navigation and neural representations in general. For this reason we would rather keep the demonstrative examples as supplements and implement your suggestion of further raising attention to the large array of tutorials and demos provided on the GitHub repository by modifying the final paragraph of section 3.1 to read:

      “Additional tutorials, not described here but available online, demonstrate how RatInABox can be used to model splitter cells, conjunctive grid cells, biologically plausible path integration, successor features, deep actor-critic RL, whisker cells and more. Despite including these examples we stress that they are not exhaustive. RatInABox provides the framework and primitive classes/functions from which highly advanced simulations such as these can be built.”

      Reviewer #3 (Public Review):

      George et al. present a convincing new Python toolbox that allows researchers to generate synthetic behavior and neural data specifically focusing on hippocampal functional cell types (place cells, grid cells, boundary vector cells, head direction cells). This is highly useful for theory-driven research where synthetic benchmarks should be used. Beyond just navigation, it can be highly useful for novel tool development that requires jointly modeling behavior and neural data. The code is well organized and written and it was easy for us to test.

      We have a few constructive points that they might want to consider.

      • Right now the code only supports X,Y movements, but Z is also critical and opens new questions in 3D coding of space (such as grid cells in bats, etc). Many animals effectively navigate in 2D, as a whole, but they certainly make a large number of 3D head movements, and modeling this will become increasingly important and the authors should consider how to support this.

      Agents now have a dedicated head direction variable (before head direction was just assumed to be the normalised velocity vector). By default this just smoothes and normalises the velocity but, in theory, could be accessed and used to model more complex head direction dynamics. This is described in the updated methods section.

      In general, we try to tread a careful line. For example we embrace certain aspects of physical and biological realism (e.g. modelling environments as continuous, or fitting motion to real behaviour) and avoid others (such as the biophysics/biochemistry of individual neurons, or the mechanical complexities of joint/muscle modelling). It is hard to decide where to draw the line but we have a few guiding principles:

      1. RatInABox is most well suited for normative modelling and neuroAI-style probing questions at the level of behaviour and representations. We consciously avoid unnecessary complexities that do not directly contribute to these domains.

      2. Compute: To best accelerate research we think the package should remain fast and lightweight. Certain features are ignored if computational cost outweighs their benefit.

      3. Users: If, and as, users require complexities e.g. 3D head movements, we will consider adding them to the code base.

      For now we believe proper 3D motion is out of scope for RatInABox. Calculating motion near walls is already surprisingly complex and to do this in 3D would be challenging. Furthermore, all cell classes would need to be rewritten too. This would be a large undertaking, probably requiring rewriting the package from scratch or making a new package, RatInABox3D (BatInABox?), altogether, something which we don’t intend to undertake right now. One option: if users really needed 3D trajectory data, they could quite straightforwardly simulate a 2D Environment (X,Y) and a 1D Environment (Z) independently. With this method (X,Y) and (Z) motion would be entirely independent, which is of course unrealistic but, depending on the use case, may well be sufficient.
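      The idea of combining an independent (X,Y) trajectory with an independent (Z) trajectory can be sketched as follows. To keep the example standalone, a simple reflected Gaussian random walk stands in for the RatInABox motion model; the function names, box sizes and step sizes are all assumptions for illustration.

```python
import random


def bounded_walk(n_steps, n_dims, size, step_std, rng):
    """Gaussian random walk kept inside a [0, size]^n_dims box by reflection."""
    pos = [size / 2] * n_dims
    path = []
    for _ in range(n_steps):
        for d in range(n_dims):
            p = pos[d] + rng.gauss(0.0, step_std)
            if p < 0:
                p = -p  # reflect off the lower wall
            elif p > size:
                p = 2 * size - p  # reflect off the upper wall
            pos[d] = min(max(p, 0.0), size)  # clamp in case of a very large step
        path.append(tuple(pos))
    return path


def combine_to_3d(xy_path, z_path):
    """Zip a 2D path and a 1D path, sampled at the same time steps, into 3D points."""
    return [(x, y, z) for (x, y), (z,) in zip(xy_path, z_path)]


# Separate random generators make the planar and vertical motion independent.
xy_path = bounded_walk(200, n_dims=2, size=10.0, step_std=0.3, rng=random.Random(0))
z_path = bounded_walk(200, n_dims=1, size=5.0, step_std=0.1, rng=random.Random(1))
path_3d = combine_to_3d(xy_path, z_path)
```

      Because the two walks never interact, the result carries no coupling between horizontal and vertical motion, which is exactly the limitation noted above.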

      Alternatively, as you said that many agents effectively navigate in 2D but show complex 3D head and other body movements, RatInABox could interface with and feed data downstream to other software (for example MuJoCo[11]) which specialises in joint/muscle modelling. This would be a very legitimate use-case for RatInABox.

      We’ve flagged all of these assumptions and limitations in a new body of text added to the discussion:

      “Our package is not the first to model neural data[37, 38, 39] or spatial behaviour[40, 41], yet it distinguishes itself by integrating these two aspects within a unified, lightweight framework. The modelling approach employed by RatInABox involves certain assumptions:

      1. It does not engage in the detailed exploration of biophysical[37, 39] or biochemical[38] aspects of neural modelling, nor does it delve into the mechanical intricacies of joint and muscle modelling[40, 41]. While these elements are crucial in specific scenarios, they demand substantial computational resources and become less pertinent in studies focused on higher-level questions about behaviour and neural representations.

      2. A focus of our package is modelling experimental paradigms commonly used to study spatially modulated neural activity and behaviour in rodents. Consequently, environments are currently restricted to being two-dimensional and planar, precluding the exploration of three-dimensional settings. However, in principle, these limitations can be relaxed in the future.

      3. RatInABox avoids the oversimplifications commonly found in discrete modelling, predominant in reinforcement learning[22, 23], which we believe impede its relevance to neuroscience.

      4. Currently, inputs from different sensory modalities, such as vision or olfaction, are not explicitly considered. Instead, sensory input is represented implicitly through efficient allocentric or egocentric representations. If necessary, one could use the RatInABox API in conjunction with a third-party computer graphics engine to circumvent this limitation.

      5. Finally, focus has been given to generating synthetic data from steady-state systems. Hence, by default, agents and neurons do not explicitly include learning, plasticity or adaptation. Nevertheless we have shown that a minimal set of features such as parameterised function-approximator neurons and policy control enable a variety of experience-driven changes in behaviour and cell responses[42, 43] to be modelled within the framework.”

      • What about other environments that are not "Boxes" as in the name - can the environment only be a Box, what about a circular environment? Or Bat flight? This also has implications for the velocity of the agent, etc. What are the parameters for the motion model to simulate a bat, which likely has a higher velocity than a rat?

      Thank you for this question. Since the initial submission of this manuscript RatInABox has been upgraded and environments have become substantially more “general”. Environments can now be of arbitrary shape (including circular), boundaries can be curved, they can contain holes and can also contain objects (0-dimensional points which act as visual cues). A few examples are showcased in the updated figure 1 panel e.

      Whilst we don’t know the exact parameters for bat flight, users could fairly straightforwardly figure these out themselves and set them using the motion parameters as shown in the table below. We would guess that bats have a higher average speed (speed_mean) and a longer decoherence time due to increased inertia (speed_coherence_time), so the following code might roughly simulate a bat flying around in a 10 x 10 m environment. Author response image 1 shows all Agent parameters which can be set to vary the random motion model.

      Author response image 1.

      • Semi-related, the name suggests limitations: why Rat? Why not Agent? (But it's a personal choice)

      We came up with the name “RatInABox” when we developed this software to study hippocampal representations of an artificial rat moving around a closed 2D world (a box). We also fitted the random motion model to open-field exploration data from rats. You’re right that it is not limited to rodents but for better or for worse it’s probably too late for a rebrand!

      • A future extension (or now) could be the ability to interface with common trajectory estimation tools; for example, taking in the (X, Y, (Z), time) outputs of animal pose estimation tools (like DeepLabCut or such) would also allow experimentalists to generate neural synthetic data from other sources of real-behavior.

      This is actually already possible via our “Agent.import_trajectory()” method. Users can pass an array of time stamps and an array of positions into the Agent class, which will be loaded and smoothly interpolated, as shown in Fig. 3a and demonstrated in two new papers[9,10] that used RatInABox by loading in behavioural trajectories.
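
      The idea of loading time stamps and positions and then interpolating between them can be sketched in plain Python. This is only an illustration under stated assumptions: interpolate_trajectory is a hypothetical helper that uses linear interpolation, it does not reproduce the internals of Agent.import_trajectory() (whose interpolation scheme may differ), and the sample data are made up.

```python
from bisect import bisect_right


def interpolate_trajectory(times, positions, t):
    """Linearly interpolate an (x, y) position at time t.

    times: strictly increasing time stamps (at least two);
    positions: list of (x, y) tuples, one per time stamp.
    """
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the recorded time range")
    # Index of the last sample at or before t, clamped so i + 1 stays valid.
    i = min(bisect_right(times, t) - 1, len(times) - 2)
    t0, t1 = times[i], times[i + 1]
    w = (t - t0) / (t1 - t0)
    (x0, y0), (x1, y1) = positions[i], positions[i + 1]
    return (x0 + w * (x1 - x0), y0 + w * (y1 - y0))


# Made-up tracking output, e.g. exported from a pose-estimation tool.
times = [0.0, 1.0, 2.0]
positions = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
midpoint = interpolate_trajectory(times, positions, 1.5)
```

      Resampling the recorded path at the simulation time step in this way is what allows cell activity to be evaluated at times that fall between the original tracking frames.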

      • What if a place cell is not encoding place but is influenced by reward or encodes a more abstract concept? Should a PlaceCell class inherit from an AbstractPlaceCell class, which could be used for encoding more conceptual spaces? How could their tool support this?

      In fact PlaceCells already inherit from a more abstract class (Neurons) which contains basic infrastructure for initialisation, saving data, and plotting data etc. We prefer the solution that users can write their own cell classes which inherit from Neurons (or PlaceCells if they wish). Then, users need only write a new get_state() method which can be as simple or as complicated as they like. Here are two examples we’ve already made which can be found on the GitHub:

      Author response image 2.

      Phase precession: PhasePrecessingPlaceCells(PlaceCells)[12] inherit from PlaceCells and modulate their firing rate by multiplying it by a phase dependent factor causing them to “phase precess”.

      Splitter cells: Perhaps users wish to model PlaceCells that are modulated by recent history of the Agent, for example which arm of a figure-8 maze it just came down. This is observed in hippocampal “splitter cells”. In this demo[1] SplitterCells(PlaceCells) inherit from PlaceCells and modulate their firing rate according to which arm was last travelled along.
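
      The subclassing pattern described above can be sketched as follows. To keep the block runnable without the package installed, Neurons and PlaceCells are redefined here as heavily simplified stand-ins, so their constructors and the get_state() signature are assumptions and may differ from the real classes. RewardModulatedPlaceCells is a hypothetical example of a user-written class.

```python
import math


class Neurons:
    """Simplified stand-in for the package's base class (assumption)."""

    def __init__(self, n):
        self.n = n

    def get_state(self, pos):
        raise NotImplementedError


class PlaceCells(Neurons):
    """Simplified Gaussian place cells on a 1D track (illustration only)."""

    def __init__(self, centres, width=0.2):
        super().__init__(len(centres))
        self.centres = centres
        self.width = width

    def get_state(self, pos):
        return [
            math.exp(-((pos - c) ** 2) / (2 * self.width ** 2))
            for c in self.centres
        ]


class RewardModulatedPlaceCells(PlaceCells):
    """Hypothetical subclass: firing is scaled up near a reward location."""

    def __init__(self, centres, reward_pos, gain=2.0, **kwargs):
        super().__init__(centres, **kwargs)
        self.reward_pos = reward_pos
        self.gain = gain

    def get_state(self, pos):
        # Reuse the parent's spatial tuning, then apply the modulation.
        base = super().get_state(pos)
        near_reward = abs(pos - self.reward_pos) < self.width
        return [r * (self.gain if near_reward else 1.0) for r in base]


cells = RewardModulatedPlaceCells([0.2, 0.8], reward_pos=0.8)
rates_at_reward = cells.get_state(0.8)
```

      Only get_state() is overridden; everything else is inherited from the parent class, which is the design choice the response describes.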

      • This is a bit odd in the Discussion: "If there is a small contribution you would like to make, please open a pull request. If there is a larger contribution you are considering, please contact the corresponding author" This should be left to the repo contribution guide, which ideally shows people how to contribute and your expectations (code formatting guide, how to use git, etc). Also this can be very off-putting to new contributors: what is small? What is big? We suggest using more inclusive language.

      We’ve removed this line and left it to the GitHub repository to describe how contributions can be made.

      • Could you expand on the run time for BoundaryVectorCells, namely, for how long of an exploration period? We found it was on the order of 1 min to simulate 30 min of exploration (which is of course fast, but mentioning relative times would be useful).

      Absolutely. How long it takes to simulate BoundaryVectorCells will depend on the discretisation timestep and how many neurons you simulate. Assuming you used the default values (dt = 0.1, n = 10) then the motion model should dominate compute time. This is evident from our analysis in Figure 3f which shows that the update time for n = 100 BVCs is on par with the update time for the random motion model, therefore for only n = 10 BVCs, the motion model should dominate compute time.

      So how long should this take? Fig. 3f shows the motion model takes ~10^-3 s per update. One hour of simulation equals 3600/dt = 36,000 updates, which would therefore take about 36,000 × 10^-3 s = 36 seconds. So your estimate of 1 minute seems to be in the right ballpark and consistent with the data we show in the paper.
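
      The arithmetic above can be checked with a few lines of Python. The function name is hypothetical; the per-update cost of roughly 10^-3 s is the figure quoted from Fig. 3f and dt = 0.1 s is the default mentioned above, so the result is an order-of-magnitude estimate rather than a benchmark.

```python
def estimated_runtime_s(sim_duration_s, dt, time_per_update_s):
    """Wall-clock estimate: number of updates times the cost of one update."""
    n_updates = round(sim_duration_s / dt)
    return n_updates * time_per_update_s


# One hour of simulated exploration: 36,000 updates, roughly 36 s.
one_hour = estimated_runtime_s(3600, dt=0.1, time_per_update_s=1e-3)

# The reviewer's 30-minute run: 18,000 updates, roughly 18 s by this estimate.
half_hour = estimated_runtime_s(1800, dt=0.1, time_per_update_s=1e-3)
```

      This estimate covers the motion model only, so some additional time for cell updates, plotting and other overheads is to be expected on top of it.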

      Interestingly this corroborates the results in a new inset panel where we calculated the total time for cell and motion model updates for a PlaceCell population of increasing size (from n = 10 to 1,000,000 cells). It shows that the motion model dominates compute time up to approximately n = 1000 PlaceCells (for BoundaryVectorCells it’s probably closer to n = 100) beyond which cell updates dominate and the time scales linearly.
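
      A simple cost model makes this crossover explicit. In the sketch below the motion-model cost of about 10^-3 s per update is taken from the text, whereas the per-cell cost of 10^-6 s is an assumption chosen only so that the crossover falls near n = 1000; it is not a measured value.

```python
MOTION_UPDATE_S = 1e-3  # per-update cost of the motion model (from Fig. 3f)
CELL_UPDATE_S = 1e-6    # assumed per-cell cost, picked to give a crossover at n ~ 1000


def total_update_time(n_cells):
    """Fixed motion-model cost plus a cost that grows linearly with n_cells."""
    return MOTION_UPDATE_S + n_cells * CELL_UPDATE_S


# Small populations: total time is nearly flat, dominated by the motion model.
small_ratio = total_update_time(100) / total_update_time(10)

# Large populations: ten times more cells costs nearly ten times longer.
large_ratio = total_update_time(1_000_000) / total_update_time(100_000)
```

      Under these assumptions the total time is almost independent of n well below the crossover and almost proportional to n well above it.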

      These are useful and non-trivial insights as they tell us that the RatInABox neuron models are quite efficient relative to the RatInABox random motion model (something we hope to optimise further down the line). We’ve added the following sentence to the results:

      “Our testing (Fig. 3f, inset) reveals that the combined time for updating the motion model and a population of PlaceCells scales sublinearly O(1) for small populations n < 1000 where updating the random motion model dominates compute time, and linearly for large populations n > 1000. PlaceCells, BoundaryVectorCells and the Agent motion model update times will be additionally affected by the number of walls/barriers in the Environment. 1D simulations are significantly quicker than 2D simulations due to the reduced computational load of the 1D geometry.”

      And this sentence to section 2:

      “RatInABox is fundamentally continuous in space and time. Position and velocity are never discretised but are instead stored as continuous values and used to determine cell activity online, as exploration occurs. This differs from other models which are either discrete (e.g. “gridworld” or Markov decision processes) or approximate continuous rate maps using a cached list of rates precalculated on a discretised grid of locations. Modelling time and space continuously more accurately reflects real-world physics, making simulations smooth and amenable to fast or dynamic neural processes which are not well accommodated by discretised motion simulators. Despite this, RatInABox is still fast; to simulate 100 PlaceCells for 10 minutes of random 2D motion (dt = 0.1 s) it takes about 2 seconds on a consumer grade CPU laptop (or 7 seconds for BoundaryVectorCells).”

      Whilst this would be very interesting it would likely represent quite a significant edit, requiring rewriting of almost all the geometry-handling code. We’re happy to consider changes like these according to (i) how simple they will be to implement, (ii) how disruptive they will be to the existing API, (iii) how many users would benefit from the change. If many users of the package request this we will consider ways to support it.

      • In general, the set of default parameters might want to be included in the main text (vs in the supplement).

      We also considered this but decided to leave them in the methods for now. The exact values of these parameters are subject to change in future versions of the software. Also, we’d prefer for the main text to provide a low-detail, high-level description of the software and the methods to provide a place for keen readers to dive into the mathematical and coding specifics.

      • It still says you can only simulate 4 velocity or head directions, which might be limiting.

      Thanks for catching this. This constraint has been relaxed. Users can now simulate an arbitrary number of head direction cells with arbitrary tuning directions and tuning widths. The methods have been adjusted to reflect this (see section 6.3.4).

      • The code license should be mentioned in the Methods.

      We have added the following section to the methods:

      6.6 License RatInABox is currently distributed under an MIT License, meaning users are permitted to use, copy, modify, merge, publish, distribute, sublicense and sell copies of the software.

    1. Author Response:

      Reviewer #1:

      The largest concern with the manuscript is its use of resting-state recordings in Parkinson's Disease patients on and off levodopa, which the authors interpret as indicative of changes in dopamine levels in the brain but not indicative of altered movement and other neural functions. For example, when patients are off medication, their UPDRS scores are elevated, indicating they likely have spontaneous movements or motor abnormalities that will likely produce changed activations in MEG and LFP during "rest". Authors must address whether it is possible to study a true "resting state" in unmedicated patients with severe PD. At minimum this concern must be discussed in the manuscript.

      We agree that Parkinson’s disease can lead to unwanted movements such as tremor as well as hyperkinesias. This would of course be a deviation from a resting state in healthy subjects. However, such movements are part of the disease and occur unwillingly. The main tremor in Parkinson’s disease is a rest tremor and, as the name already suggests, it occurs while not doing anything. Therefore, such movements can arguably be considered part of the resting state of Parkinson’s disease. Resting state activity with and without medication is therefore still representative of changes in brain activity in Parkinson’s patients and indicative of alterations due to medication.

      To further investigate the effect of movement in our patients, we subdivided the UPDRS part 3 score into tremor and non-tremor subscores. For the tremor subscore we took the mean of items 15 and 17 of the UPDRS, whereas for the non-tremor subscore items 1, 2, 3, 9, 10, 12, 13, and 14 were averaged. Following Spiegel et al., 2007, we classified patients as akinetic-rigid (non-tremor score at least twice the tremor score), tremor-dominant (tremor score at least twice as large as the non-tremor score), and mixed type (for the remaining scores). Of the 17 patients, 1 was tremor-dominant and 1 was classified as mixed type (his/her non-tremor score was greater than the tremor score). None of our patients exhibited hyperkinesias during the recording. To exclude that our results are driven by tremor-related movement, we re-ran the HMM without the tremor-dominant and the mixed-type patient (see Figure R1 of the response letter).
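
      The classification rule described above can be written out as a short function. This is a minimal sketch assuming the item scores are available as a mapping from UPDRS item number to score; the function and variable names are hypothetical and the example scores are made up, not patient data.

```python
TREMOR_ITEMS = (15, 17)
NON_TREMOR_ITEMS = (1, 2, 3, 9, 10, 12, 13, 14)


def subscore(item_scores, items):
    """Mean score over the given UPDRS items."""
    return sum(item_scores[i] for i in items) / len(items)


def classify_subtype(item_scores):
    """Apply the twofold-ratio rule to the tremor and non-tremor subscores."""
    tremor = subscore(item_scores, TREMOR_ITEMS)
    non_tremor = subscore(item_scores, NON_TREMOR_ITEMS)
    # Note: if both subscores are zero, this first branch is taken.
    if non_tremor >= 2 * tremor:
        return "akinetic-rigid"
    if tremor >= 2 * non_tremor:
        return "tremor-dominant"
    return "mixed"


# Made-up example: non-tremor items score 2, tremor items score 1.5.
example = {i: 2 for i in NON_TREMOR_ITEMS}
example.update({15: 1, 17: 2})
subtype = classify_subtype(example)  # non-tremor 2.0 vs tremor 1.5
```

      In this example neither subscore is at least twice the other, so the patient falls into the mixed category even though the non-tremor score is the larger of the two, as for the mixed-type patient mentioned above.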

      ON medication results for all HMM states remained the same. OFF medication results for the Ctx-Ctx and STN-STN state remained the same as well. The Ctx-STN state OFF medication was split into two states: Sensorimotor-STN connectivity was captured in one state and all other types of Ctx-STN connections were captured in another state (see Figure R1 of the response letter). The important point is that the biological conclusions stand across these solutions. Regardless, both with and without the two subjects a stable covariance matrix entailing sensorimotor-STN connectivity was determined, which is the main finding for the Ctx-STN state OFF medication.

      We therefore discuss this issue now within the limitation section (page 20):

      “Both motor impairment and motor improvement can cause movement during the resting state in PD. While such movement is a deviation from a resting state in healthy subjects, such movements are part of the disease and occur unwillingly. Therefore, such movements can arguably be considered part of the resting state of Parkinson’s disease. None of the patients in our cohort experienced hyperkinesia during the recording. All patients except for two were of the akinetic-rigid subtype. We verified that tremor movement is not driving our results. Recalculating the HMM states without these 2 subjects, even though it slightly changed some particular aspects of the HMM solution, did not materially affect the conclusions.”

      Figure R1: States obtained after removing one tremor dominant and one mixed type patient from analysis. Panel C shows the split OFF medication cortico-STN state. Most of the cortico-STN connectivity is captured by the state shown in the top row (Figure 1 C OFF). Only the motor-STN connectivity in the alpha and beta band (along with a medial frontal-STN connection in the alpha band) is captured separately by the states labeled “OFF Split” (Figure 1 C OFF SPLIT).

      This reviewer was unclear on why increased "communication" in the medial OFC in delta and theta was interpreted as a pathological state indicating deteriorated frontal executive function. Given that the authors provide no evidence of poor executive function in the patients studied, the authors must at least provide evidence from other studies linking this feature with impaired executive function.

      If we understand the comment correctly it refers to the statement in the abstract “Dopaminergic medication led to communication within the medial and orbitofrontal cortex in the delta/theta frequency range. This is in line with deteriorated frontal executive functioning as a side effect of dopamine treatment in Parkinson’s disease”

      This statement is based on the dopamine overdose hypothesis reported in the Parkinson’s disease (PD) literature (Cools 2001; Kelly et al. 2009; MacDonald and Monchi 2011; Vaillancourt et al. 2013). We have elaborated upon the dopamine overdose hypothesis in the discussion on page 16. In short, dopaminergic neurons are primarily lost from the substantia nigra in PD, which causes a higher dopamine depletion in the dorsal striatal circuitry than within the ventral striatal circuits (Kelly et al. 2009; MacDonald and Monchi 2011). Thus, dopaminergic medication to treat the PD motor symptoms leads to increased dopamine levels in the ventral striatal circuits including frontal cortical activity, which can potentially explain the cognitive deficits observed in PD (Shohamy et al. 2005; George et al. 2013). We adjusted the abstract to read:

      “Dopaminergic medication led to coherence within the medial and orbitofrontal cortex in the delta/theta frequency range. This is in line with known side effects of dopamine treatment such as deteriorated executive functions in Parkinson’s disease.”

      In this article, authors repeatedly state their method allows them to delineate between pathological and physiological connectivity, but they don't explain how dynamical systems and discrete-state stochasticity support that goal.

      To recapitulate, the HMM divides a continuous time series into discrete states. Each state is a time-delay embedded covariance matrix reflecting the underlying connectivity between brain regions as well as the specific temporal dynamics in the data when that state is active. See Packard et al. (1980) for details about how a time-delay embedding characterises a linear dynamical system.

      Please note that the HMM was used as a data-driven, descriptive approach without explicitly assuming any a-priori relationship with pathological or physiological states. The relation between biology and the HMM states, thus, purely emerged from the data; i.e. is empirical. What we claim in this work is simply that the features captured by the HMM hold some relation with the physiology even though the estimation of the HMM was completely unsupervised (i.e. blind to the studied conditions). We have added this point also to the limitations of the study on page 19 and the following to the introduction to guide the reader more intuitively (page 4):

      “To allow the system to dynamically evolve, we use time delay embedding. Theoretically, delay embedding can reveal the state space of the underlying dynamical system (Packard et al., 1980). Thus, by delay-embedding PD time series OFF and ON medication we uncover the differential effects of a neurotransmitter such as dopamine on underlying whole brain connectivity.”
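
      To make the notion of a time-delay embedded covariance matrix concrete, the sketch below embeds a single toy time series and computes the covariance of the embedded vectors. It is a generic illustration in plain Python, not the HMM pipeline used in the study: the function names are hypothetical, the choice of delays is arbitrary, and no state inference is performed.

```python
def delay_embed(series, n_delays):
    """Stack n_delays consecutive samples into each embedded vector."""
    if n_delays < 1 or n_delays > len(series):
        raise ValueError("n_delays must be between 1 and len(series)")
    return [series[t:t + n_delays] for t in range(len(series) - n_delays + 1)]


def covariance(rows):
    """Sample covariance matrix, treating each row as one observation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


embedded = delay_embed([1, 2, 3, 4], n_delays=2)
cov = covariance(embedded)
```

      The off-diagonal entries of this matrix are covariances between the signal and delayed copies of itself, which is how an embedded covariance matrix can carry information about temporal (and hence spectral) structure in addition to zero-lag coupling.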

      Reviewer #2:

      Sharma et al. investigated the effect of dopaminergic medication on brain networks in patients with Parkinson's disease combining local field potential recordings from the subthalamic nucleus and magnetoencephalography during rest. They aim to characterize both physiological and pathological spectral connectivity.

      They identified three networks, or brain states, that are differentially affected by medication. Under medication, the first state (termed hyperdopaminergic state) is characterized by increased connectivity of frontal areas, supposedly responsible for deteriorated frontal executive function as a side effect of medical treatment. In the second state (communication state), dopaminergic treatment largely disrupts cortico-STN connectivity, leaving only selected pathways communicating. This is in line with current models that propose that alleviation of motor symptoms relates to the disruption of pathological pathways. The local state, characterized by STN-STN oscillatory activities, is less affected by dopaminergic treatment.

      The authors utilize sophisticated methods with the potential to uncover the dynamics of activities within different brain networks, which opens the avenue to investigate how the brain switches between different states, and how these states are characterized in terms of spectral, local, and temporal properties. The conclusions of this paper are mostly well supported by data, but some aspects, mainly about the presentation of the results, remain:

      We would like to thank the reviewer for his succinct and clear understanding of our work.

      1) The presentation of the results is suboptimal and needs improvement to increase readers' comprehension. At some points this section seems rather unstructured, some results are presented multiple times, and some passages already include points rather suitable for the discussion, which adds too much information for the results section.

      We have removed repetitions in the results sections and removed the rather lengthy introductory parts of each subsection. Moreover, we have now moved all parts, which were already an interpretation of our findings to the discussion.

      2) It is intriguing that the hyperdopaminergic state is not only identified under medication but also in the off-state. This is intriguing, especially with the results on the temporal properties of states showing that the time of the hyperdopaminergic state is unaffected by medication. When such a state can be identified even in the absence of levodopa, is it really optimal to call it "hyperdopaminergic"? Do the results not rather suggest that the identified network is active both off and on medication, while during the latter state its activities are modulated in a way that could relate to side effects?

      The reviewer’s interpretations of the results pertaining to the hyper-dopaminergic state are correct. The states had been named post-hoc as explained in the results section. The hyper-dopaminergic state’s name derived from it showing the overdosing effects of dopamine. Of course, these results are only visible on medication. But off medication, this state also exists without exhibiting the effects of excess dopamine. To avoid confusion or misinterpretation of the findings and also following the relevant comment by reviewer 1, we renamed all states to be more descriptive:

      Hyperdopaminergic > Cortico-cortical state

      Communication > Cortico-STN state

      Local > STN-STN state.

      3) Some conclusions need to be improved/more elaborated. For example, the coherence of bilateral STN-STN did not change between the medication off and on states. Yet it is argued that a) "Since synchrony limits information transfer (Cruz et al. 2009; Cagnan, Duff, and Brown 2015; Holt et al. 2019), local oscillations are a potential mechanism to prevent excessive communication with the cortex" (line 436) and b) "Another possibility is that a loss of cortical afferents causes local basal ganglia oscillations to become more pronounced" (line 438). Can these conclusions really be drawn if the local oscillations did not change in the first place?

      We apologize for the unclear description. Our conclusion was based on the following results:

      a) We state that STN-STN connectivity as measured by the magnitude of STN-STN coherence does not change OFF vs ON medication in the Cortico-STN state. This result is obtained using inter-medication analysis.

      b) But ON medication, STN-STN coherence in the Cortico-STN state was significantly different from mean coherence within the ON condition. These results are obtained using intra-medication analysis.

      Based on this, we conclude that in the Cortico-STN state, although OFF vs ON medication the magnitude of STN-STN coherence was unchanged, the STN-STN coherence was significantly different from mean coherence in the ON medication condition. The emergence of synchronous STN-STN activity may limit information exchange between STN and cortex ON medication.

      An alternative explanation for these findings might be a mechanism preventing connectivity between cortex and the STN ON medication. This missing interaction between STN and cortex might cause STN-STN oscillations to increase compared to the mean coherence within the ON state. Unfortunately, we cannot test such causal influences with our analysis.

      We have added the following discussion to the manuscript on page 17 in order to improve the exposition:

      “Bilateral STN–STN coherence in the alpha and beta band did not change in the cortico-STN state ON versus OFF medication (InterMed analysis). However, STN-STN coherence was significantly higher than the mean level ON medication (IntraMed analysis). Since synchrony limits information transfer (Cruz et al. 2009; Cagnan, Duff, and Brown 2015; Holt et al. 2019), the high coherence within the STN ON medication could prevent communication with the cortex. A different explanation would be that a loss of cortical afferents leads to increased local STN coherence. The causal nature of the cortico-basal ganglia interaction is an endeavour for future research.”

      Reviewer #3:

      In PD, pathological neuronal activity along the cortico-basal ganglia network notably consists in the emergence of abnormal synchronized oscillatory activity. Nevertheless, synchronous oscillatory activity is not necessarily pathological and also serves crucial cognitive functions in the brain. Moreover, the effects of dopaminergic medication on oscillatory network connectivity occurring in PD are still poorly understood. To clarify these issues, Sharma and colleagues simultaneously recorded MEG-STN LFP signals in PD patients and characterized the effect of dopamine (ON and OFF dopaminergic medication) on oscillatory whole-brain networks (including the STN) in a time-resolved manner. Here, they identified three physiologically interpretable spectral connectivity patterns and found that cortico-cortical, cortico-STN, and STN-STN networks were differentially modulated by dopaminergic medication.

      Strengths:

      1) Both the methodological and experimental approaches used are thoughtful and rigorous.

      a) The use of an innovative data-driven machine learning approach (by employing a hidden Markov model), rather than hand-crafted analyses, to identify physiologically interpretable spectral connectivity patterns (i.e., distinct networks/states) is undeniably an added value. In doing so, the results are not biased by the human expertise and subjectivity, which make them even more solid.

      b) So far, the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN reported in PD was evaluated/assessed to specific cortico-STN spectral connectivity. Conversely, whole-brain MEG studies in PD patients did not account for cortico-STN and STN-STN connectivity. Here, the authors studied, for the first time, the whole-brain connectivity including the STN (whole brain-STN approach) and therefore provide new evidence of the brain connectivity reported in PD, as well as new information regarding the effect of dopaminergic medication on the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN reported in PD.

      2) Studying the temporal properties of the recurrent oscillatory patterns of transient network connectivity both ON and OFF medication is extremely important and provides interesting and crucial information in order to delineate pathological versus physiologically-relevant spectral brain connectivity in PD.

      We would like to thank the reviewer for their valuable feedback and correct interpretation of our manuscript.

      Weaknesses:

      1) In this study, the authors implied that the ON dopaminergic medication state corresponds to a physiological state. However, as correctly mentioned in the limitations of the study, they did not have (for obvious reasons) a control/healthy group. Moreover, no one can exclude the emergence of compensatory and/or plasticity mechanisms in the brain of the PD patients related to the duration of the disease and/or the history of the chronic dopamine-replacement therapy (DRT). Duration of the disease and DRT history should therefore be considered when characterizing the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN reported in PD, as well as when examining the effect of the dopaminergic medication on the functioning of these specific networks.

      We would like to thank the reviewer for pointing this out. We regressed duration of disease (year of measurement – year of onset) on the temporal properties of the HMM states. We found no relationship between any of the temporal properties and disease duration. Similarly, we regressed levodopa equivalent dosage for each subject on the temporal properties and found no relationship. We now discuss this point in the manuscript (page 20):

      “A further potential influencing factor might be the disease duration and the amount of dopamine patients are receiving. Both factors were not significantly related to the temporal properties of the states.”

      2) Here, the authors recorded LFPs in the STN activity. LFP represents sub-threshold (e.g., synaptic input) activity at best (Buzsaki et al., 2012; Logothetis, 2003). Recent studies demonstrated that mono-polar, but also bi-polar, BG LFPs are largely contaminated by volume conductance of cortical electroencephalogram (EEG) activity even when re-referenced (Lalla et al., 2017; Marmor et al., 2017). Therefore, it is likely that STN LFPs do not accurately reflect local cellular activity. In this study, the authors examined and measured coherence between cortical areas and STN. However, they cannot guarantee that STN signals were not contaminated by volume conducted signals from the cortex.

      We appreciate this concern and thank the reviewer for bringing it up. Marmor et al. (2017) investigated this in humans and is therefore most closely related to our research. They find that re-referenced STN recordings are not contaminated by cortical signals. Furthermore, the data in Lalla et al. (2017) are based on recordings in rats, making a direct transfer to human STN recordings problematic due to the different brain sizes. Since we re-referenced our LFP signals as recommended in the Marmor paper, we think that contamination due to cortical signals is relatively minor; see Litvak et al. (2011), Hirschmann et al. (2013), and Neumann et al. (2016) for additional references supporting this. That being said, we now discuss this potential issue in the paper on page 20.

      “Lastly, we recorded LFPs from within the STN, an established recording procedure during the implantation of DBS electrodes in various neurological and psychiatric diseases. Although for Parkinson patients results on beta and tremor activity within the STN have been reproduced by different groups (Reck et al. 2010, Litvak et al. 2011, Florin et al. 2013, Hirschmann et al. 2013, Neumann et al. 2016), it is still not fully clear whether these LFP signals are contaminated by volume-conducted cortical activity. However, while volume conduction seems to be a larger problem in rodents even after re-referencing the LFP signal (Lalla et al. 2017), the same was not found in humans (Marmor et al. 2017).”

      3) The methods and data processing are rigorous but also very sophisticated which make the perception of the results in terms of oscillatory activity and neural synchronization difficult.

      To aid intuition on how to interpret the result in light of the methods used, one can compare the analysis pipeline to a windowing approach. In a more standard approach, windows of different time length can be defined for different epochs within the time series and for each window coherence and connectivity can be determined. The difference in our approach is that we used an unsupervised learning algorithm to select windows of varying length based on recurring patterns of whole brain network activity. Within those defined windows we then determine the oscillatory properties via coherence and power – which is the same as one would do in a classical analysis. We have added an explanation of the concept of “oscillatory activity” within our framework to the introduction (page 2 footnote):

      “For the purpose of our paper, we refer to oscillatory activity or oscillations as recurrent, but transient frequency–specific patterns of network activity, even though the underlying patterns can be composed of either sustained rhythmic activity, neural bursting, or both (Quinn et al. 2019).”

      Moreover, we provide a more intuitive explanation of the analysis within the first section of the results (page 4):

      “Using an HMM, we identified recurrent patterns of transient network connectivity between the cortex and the STN, which we henceforth refer to as an ‘HMM state’. In comparison to classic sliding-window analysis, an HMM solution can be thought of as a data-driven estimation of time windows of variable length (within which a particular HMM state was active): once we know the time windows when a particular state is active, we compute coherence between different pairs of regions for each of these recurrent states.”
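      As a rough illustration of the windowing analogy above, the following Python sketch extracts the variable-length time windows during which a given state is active from a decoded state sequence. The function name `state_windows` and the toy sequence are hypothetical, and a hard assignment (one state per sample) is assumed; this is not the authors' HMM pipeline.

```python
import numpy as np

def state_windows(states, k):
    """Return (start, end) sample indices (end exclusive) of every
    contiguous run in which state k is active."""
    active = np.asarray(states) == k
    # Pad with False so runs touching either end still produce two edges.
    padded = np.concatenate(([False], active, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    # Edges alternate: run start, run end, run start, run end, ...
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]

# Hypothetical decoded state sequence (one state label per time sample).
toy_states = [0, 0, 1, 1, 1, 0, 2, 2]
windows_state0 = state_windows(toy_states, 0)
windows_state1 = state_windows(toy_states, 1)
```

      Once such windows are known, coherence and power can be computed within them in the same way as in a classical fixed-window analysis.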

      4) Previous studies have shown that abnormal oscillations within the STN of PD patients are limited to its dorsolateral/motor region, thus dividing the STN into a dorsolateral oscillatory/motor region and ventromedial non-oscillatory/non-motor region (Kuhn et al. 2005; Moran et al. 2008; Zaidel et al. 2009, 2010; Seifreid et al. 2012; Lourens et al. 2013, Deffains et al., 2014). However, the authors do not provide clear information about the location of the LFP recordings within the STN.

      We selected the electrode contacts based on intraoperative microelectrode recordings (for details, see page 23). The first directional recording height after the entry into the STN was selected to obtain the three directional LFP recordings from the respective hemisphere. This practice has been shown to improve lead placement (Kochanski et al., 2019; Krauss et al., 2021). The common target area for DBS surgery is the dorsolateral STN. To confirm that the electrodes were actually located within this part of the STN, we reconstructed the DBS location with Lead-DBS (Horn et al. 2019). All electrodes except one were located within the dorsolateral STN (see figure 7 of the manuscript). To rule out that our results were driven by this outlier, we reanalysed our data without this patient. No change in the overall connectivity pattern was observed (see figure R3 of the response letter).

      Figure R2: Lead DBS reconstruction of the location of electrodes in the STN for different subjects. The red electrodes have not been placed properly in the STN. The contacts marked in red represent the directional contacts from which the data was used for analysis.

      Figure R3: HMM states obtained after running the analysis without the subject with the electrode outside the STN.

      References:

      Buzsáki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents-EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012; 13: 407–20.

      Cagnan H, Duff EP, Brown P. The relative phases of basal ganglia activities dynamically shape effective connectivity in Parkinson’s disease. Brain 2015; 138: 1667–78.

      Cools R. Enhanced or impaired cognitive function in Parkinson’s disease as a function of dopaminergic medication and task demands. Cereb Cortex 2001; 11: 1136–43.

      Cruz AV, Mallet N, Magill PJ, Brown P, Averbeck BB. Effects of dopamine depletion on network entropy in the external globus pallidus. J Neurophysiol 2009; 102: 1092–102.

      Florin E, Erasmi R, Reck C, Maarouf M, Schnitzler A, Fink GR, et al. Does increased gamma activity in patients suffering from Parkinson’s disease counteract the movement inhibiting beta activity? Neuroscience 2013; 237: 42–50.

      George JS, Strunk J, Mak-Mccully R, Houser M, Poizner H, Aron AR. Dopaminergic therapy in Parkinson’s disease decreases cortical beta band coherence in the resting state and increases cortical beta band power during executive control. NeuroImage Clin 2013; 3: 261–70.

      Hirschmann J, Özkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Differential modulation of STN-cortical and cortico-muscular coherence by movement and levodopa in Parkinson’s disease. Neuroimage 2013; 68: 203–13.

      Holt AB, Kormann E, Gulberti A, Pötter-Nerger M, McNamara CG, Cagnan H, et al. Phase-dependent suppression of beta oscillations in Parkinson’s disease patients. J Neurosci 2019; 39: 1119–34.

      Horn A, Li N, Dembek TA, Kappel A, Boulay C, Ewert S, et al. Lead-DBS v2: Towards a comprehensive pipeline for deep brain stimulation imaging. Neuroimage 2019; 184: 293–316.

      Kelly C, De Zubicaray G, Di Martino A, Copland DA, Reiss PT, Klein DF, et al. L-dopa modulates functional connectivity in striatal cognitive and motor networks: A double-blind placebo-controlled study. J Neurosci 2009; 29: 7364–78.

      Kochanski RB, Bus S, Brahimaj B, Borghei A, Kraimer KL, Keppetipola KM, et al. The impact of microelectrode recording on lead location in deep brain stimulation for the treatment of movement disorders. World Neurosurg 2019; 132: e487–95.

      Krauss P, Oertel MF, Baumann-Vogel H, Imbach L, Baumann CR, Sarnthein J, et al. Intraoperative neurophysiologic assessment in deep brain stimulation surgery and its impact on lead placement. J Neurol Surgery, Part A Cent Eur Neurosurg 2021; 82: 18–26.

      Lalla L, Rueda Orozco PE, Jurado-Parras MT, Brovelli A, Robbe D. Local or not local: Investigating the nature of striatal theta oscillations in behaving rats. eNeuro 2017; 4: 128–45.

      Litvak V, Jha A, Eusebio A, Oostenveld R, Foltynie T, Limousin P, et al. Resting oscillatory cortico-subthalamic connectivity in patients with Parkinson’s disease. Brain 2011; 134: 359–74.

      MacDonald PA, MacDonald AA, Seergobin KN, Tamjeedi R, Ganjavi H, Provost JS, et al. The effect of dopamine therapy on ventral and dorsal striatum-mediated cognition in Parkinson’s disease: Support from functional MRI. Brain 2011; 134: 1447–63.

      MacDonald PA, Monchi O. Differential effects of dopaminergic therapies on dorsal and ventral striatum in Parkinson’s disease: Implications for cognitive function. Parkinsons Dis 2011; 2011: 1–18.

      Marmor O, Valsky D, Joshua M, Bick AS, Arkadir D, Tamir I, et al. Local vs. volume conductance activity of field potentials in the human subthalamic nucleus. J Neurophysiol 2017; 117: 2140–51.

      Neumann WJ, Degen K, Schneider GH, Brücke C, Huebl J, Brown P, et al. Subthalamic synchronized oscillatory activity correlates with motor impairment in patients with Parkinson’s disease. Mov Disord 2016; 31: 1748–51.

      Packard NH, Crutchfield JP, Farmer JD, Shaw RS. Geometry from a time series. Phys Rev Lett 1980; 45: 712–6.

      Quinn AJ, van Ede F, Brookes MJ, Heideman SG, Nowak M, Seedat ZA, et al. Unpacking Transient Event Dynamics in Electrophysiological Power Spectra. Brain Topogr 2019; 32: 1020–34.

      Reck C, Himmel M, Florin E, Maarouf M, Sturm V, Wojtecki L, et al. Coherence analysis of local field potentials in the subthalamic nucleus: Differences in parkinsonian rest and postural tremor. Eur J Neurosci 2010; 32: 1202–14.

      Shohamy D, Myers CE, Grossman S, Sage J, Gluck MA. The role of dopamine in cognitive sequence learning: Evidence from Parkinson’s disease. Behav Brain Res 2005; 156: 191–9.

      Spiegel J, Hellwig D, Samnick S, Jost W, Möllers MO, Fassbender K, et al. Striatal FP-CIT uptake differs in the subtypes of early Parkinson’s disease. J Neural Transm 2007; 114: 331–5.

      Vaillancourt DE, Schonfeld D, Kwak Y, Bohnen NI, Seidler R. Dopamine overdose hypothesis: Evidence and clinical implications. Mov Disord 2013; 28: 1920–9.

    1. Author Response

      Reviewer #1 (Public Review):

      1-1. I do have some concerns that the differences in network clustering reported in Fig 6 may be due to noise and I think the comparisons against the HCP parcellation could be more robust. Specifically, with regard to the network clustering in Fig 6. The authors use a clustering algorithm (which is not explained) to cluster the parcels into different functional networks. They achieve this by estimating the mean time series for each parcel in each individual, which they then correlate between the n regions, to generate an nxn connectivity matrix. This they then binarise, before averaging across individuals within an age group. It strikes me that binarising before averaging will artificially reduce connections for which only a subset of individuals are set to zero. Therefore averaging should really occur before binarising. Then I think the stability of these clusters should be explored by creating random repeat and generation groups (as done for the original parcells) or just by bootstrapping the process. I would be interested to see whether after all this the observation that the posterior frontoparietal expands to include the parahippocampal gryus from 3-6 months and then disappears at 9 months - remains.

      We thank the reviewer for this insightful comment on our clustering process. For the step of “binarizing before averaging”, we followed the method proposed by Yeo et al. (1). In this method, each correlation matrix is binarized according to an individual-specific threshold. Specifically, each individual-specific threshold is determined by percentile: only 10% of connections are kept and set to 1, while all other connections are set to 0. Yeo et al. (1) explained their motivation for doing so as “the binarization of the correlation matrix leads to significantly better clustering results, although the algorithm appears robust to the particular choice of the threshold”. A possible reason is that binarizing connectivity within each individual offers a certain level of normalization, so that each subject contributes the same number of connections. If averaging occurs before binarizing, the connectivity contributed by different subjects would differ, which leads to bias. We also tested the stability of ‘binarizing first’ and ‘averaging first’, and the result is shown in Fig. R1 below. This figure supports the same conclusion as (1): binarizing before averaging leads to better clustering stability. We added the motivation for binarizing before averaging in the revised manuscript between line 577 and line 581.

      Fig. R1. The comparison of clustering stability of different methods. The red line refers to the clustering stability when binarizing the correlation matrices first and then averaging the matrices across individuals, while the blue line refers to the clustering stability when averaging the correlation matrices across individuals first and then binarizing the average matrix.
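      To make the two orderings concrete, here is a minimal Python sketch of both. It assumes that the 10% of connections kept are the strongest ones in each matrix and that the threshold is taken over off-diagonal entries; the function names and the random toy data are hypothetical and do not reproduce our actual pipeline.

```python
import numpy as np

def binarize_top_fraction(corr, keep=0.10):
    """Set the strongest `keep` fraction of off-diagonal connections to 1
    and everything else to 0 (threshold is specific to this matrix)."""
    corr = np.asarray(corr, dtype=float)
    upper = np.triu_indices_from(corr, k=1)
    threshold = np.percentile(corr[upper], 100 * (1 - keep))
    binary = (corr > threshold).astype(float)
    np.fill_diagonal(binary, 0.0)
    return binary

def binarize_then_average(mats, keep=0.10):
    # Each subject contributes the same number of connections.
    return np.mean([binarize_top_fraction(m, keep) for m in mats], axis=0)

def average_then_binarize(mats, keep=0.10):
    # Subjects with stronger overall correlations dominate the average.
    return binarize_top_fraction(np.mean(mats, axis=0), keep)

# Toy data: 5 hypothetical subjects, 30 time points, 20 parcels.
rng = np.random.default_rng(0)
subject_mats = [np.corrcoef(rng.standard_normal((30, 20)).T) for _ in range(5)]
group_binarize_first = binarize_then_average(subject_mats)
group_average_first = average_then_binarize(subject_mats)
```

      In the first ordering the group matrix holds, for each connection, the fraction of subjects in which it survived thresholding; in the second it is a single binary matrix.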

      For the final clustering results, we repeated our clustering method on 100 bootstrap samples, and the final result is a majority vote for each parcel. The comparison of these two results is shown in Fig. R2. Overall, we observe good repeatability between the two results. However, some parcels show different patterns between the two results, especially parcels located near network boundaries or the medial wall. The observation that “the posterior frontoparietal expands to include the parahippocampal gyrus from 3-6 months and then disappears at 9 months” was not reproduced in the bootstrapped results. These results suggest that the clustering method is quite robust and the discovered patterns are relatively stable, and that the differences between our original results and the bootstrapping results might be caused by noise or inter-subject variability.

      Fig. R2. Top panel: the network clustering results using all data in the original manuscript. Bottom panel: the network clustering results using majority voting through 100 times of bootstrapping. Black circles and red arrows point to the parahippocampal gyrus, which was included in the posterior frontoparietal network, and is not well repeated in the bootstrapped results. (M: months)
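      The majority-voting step can be sketched as follows in Python. This is a simplified illustration: it assumes that network labels have already been matched across bootstrap runs (cluster labels are otherwise arbitrary) and that labels are non-negative integers; the function names are hypothetical.

```python
import numpy as np

def bootstrap_indices(n_subjects, n_boot, rng):
    """Draw subject indices with replacement, one row per bootstrap run."""
    return rng.integers(0, n_subjects, size=(n_boot, n_subjects))

def majority_vote(label_runs):
    """label_runs: (n_boot, n_parcels) array of network labels.
    Returns the most frequent label per parcel (ties go to the lowest label)."""
    runs = np.asarray(label_runs, dtype=int)
    n_labels = runs.max() + 1
    # counts[l, p] = number of runs assigning label l to parcel p.
    counts = np.apply_along_axis(np.bincount, 0, runs, minlength=n_labels)
    return counts.argmax(axis=0)

# Three hypothetical bootstrap runs over three parcels.
toy_runs = [[0, 1, 2],
            [0, 1, 1],
            [1, 1, 2]]
consensus = majority_vote(toy_runs)
```

      Parcels whose votes are nearly split across runs are the ones most likely to differ from the original single-run result.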

      1-2. Then with regard to the comparison against the HCP parcellation, this is only qualitative. The authors should see whether the comparison is quantitatively better relative to the null clusterings that they produce.

      Thank you for this great suggestion! As suggested, we added a quantitative comparison using the Hausdorff distance. As in the comparison of parcel variance and homogeneity, the 1,000 null parcellations were created by randomly rotating our parcellation by small angles on the spherical surface 1,000 times. We compared our parcellation and the null parcellations by evaluating their Hausdorff distances to specific areas of the HCP parcellation on the spherical surface, including Brodmann's areas 2, 3b, 4+3a, 44+45, V1, and MT+MST. The results are listed in Figure 4. Our parcellation generally shows significantly lower Hausdorff distances to the HCP parcellation, suggesting that its parcel borders are closer to those of the HCP parcellation than the borders of the null parcellations are.

      However, a small number of null parcellations show smaller Hausdorff distances than our parcellation. A possible reason is that our surface registration to the HCP template is based purely on cortical folding, without using functional gradient density maps, which are not available in the HCP template. As a result, high-quality functional alignment between our infant data and the HCP space is not ensured, which inevitably increases the Hausdorff distance between our parcellation and the HCP parcellation.
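      For readers unfamiliar with the metric, the following Python sketch computes a symmetric Hausdorff distance between two sets of border vertices on a sphere. It assumes great-circle (geodesic) distance on the unit sphere, which may differ from the exact distance definition used in our analysis; the function name and toy points are hypothetical.

```python
import numpy as np

def geodesic_hausdorff(border_a, border_b):
    """Symmetric Hausdorff distance (in radians) between two point sets
    given as (n, 3) and (m, 3) arrays of coordinates on a sphere."""
    a = np.asarray(border_a, dtype=float)
    b = np.asarray(border_b, dtype=float)
    # Project onto the unit sphere so dot products are cosines of angles.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    angles = np.arccos(np.clip(a @ b.T, -1.0, 1.0))
    # Directed distances: worst-case nearest-neighbour angle in each direction.
    a_to_b = angles.min(axis=1).max()
    b_to_a = angles.min(axis=0).max()
    return float(max(a_to_b, b_to_a))

# Two hypothetical single-vertex "borders" a quarter circle apart.
quarter_turn = geodesic_hausdorff([[1, 0, 0]], [[0, 1, 0]])
```

      A null distribution is then obtained by applying the same function to each rotated parcellation and comparing the resulting distances with that of the unrotated parcellation.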

      1-3. … not all individuals appear (from Fig 8) to be acquired exactly at the desired timepoints, so maybe the authors might comment on why they decided not to apply any kernel weighted or smoothing to their averaging? Pg. 8 'and parcel numbers show slight changes that follow a multi-peak fluctuation, with inflection ages of 9 and 18 months' explain - the parcels per age group vary - with age with peaks at 9 and 18 - could this be due to differences in the subject numbers, or the subjects that were scanned at that point?

      We do agree with the reviewer that subjects are not scanned at similar time points. This is designed in the data acquisition protocol to seamlessly cover the early postnatal stage so that we will have a quasi-continuous observation of the dynamic early brain development.

      We didn’t apply a kernel-weighted average or smoothing when generating the parcellation, as we would like each scan to contribute equally, so that each parcellation map is representative of the whole cohort in the covered age range instead of only part of it. Meanwhile, our final ‘age-common parcellation’ could be representative of all subjects from birth to 2 years of age. However, we agree that for a parcellation map designed for a specific age only, e.g., 1-year-olds, a kernel-weighted average or an even more restricted age range could be a more appropriate solution.

      To test whether the parcel number fluctuates with subject numbers, we added an experiment in which we used bootstrapping to randomly select 100 scans per age group (chosen with regard to the minimum scan number across age groups) and repeated this process 100 times. The average parcel number of each age is reported in Table R1. We didn’t observe strong changes in parcel numbers when reducing scan numbers, which indicates that our parcel numbers are not strongly related to subject numbers. However, the parcel number does not increase greatly from 18M to 24M in the bootstrapping results, so we modified the statement in the manuscript about the parcel number to ‘… all parcel numbers fall between 461 to 493 per hemisphere, where the parcel number attains a maximum at around 9 months and then reduces slightly and remains relatively stable afterward. …’, which can be found between line 121 and line 122.

      1-4. I also have some residual concerns over the number of parcels reported, specifically as to whether all of this represents fine-grained functional organisation, or whether some of it represents noise. The number of parcels reported is very high. While Glasser et al 2016 reports 360 as a lower bound, it seems unlikely that the number of parcels estimated by that method would greatly exceed 400. This would align with the previous work of Van Essen et al (which the authors cite as 53) which suggests a high bound of 400 regions. While accepting Eickhoff's argument that a more modular view of parcellation might be appropriate, these are infants with underdeveloped brain function.

      We thank the reviewer for this insightful comment. We agree that some of the parcels might be affected by noise, as noise is introduced at each step, such as data acquisition, image processing, surface reconstruction, and registration, especially considering that functional MRI is noisier than structural MRI. Though our experiments show that our parcellation is fine-grained and suitable for studying infant brain functional development, it is hard to validate directly and quantitatively, as there is no ground truth available.

      Despite this, we are still motivated to create fine-grained parcellations: with larger and higher-resolution imaging data and advanced computational methods, parcellations with more fine-grained regions are desired for downstream analyses, especially considering the hierarchical nature of brain organization (2). The main reason that our method generates much finer parcellation maps is that both our registration and parcellation processes are based on the functional gradient density, which characterizes a fine-grained feature map based on fMRI. This leads to both better inter-subject alignment of functional boundaries and finer region partitions. This strategy differs from that of Glasser et al. (3), which jointly considers multimodal information for defining parcel boundaries, so parcels revealed purely by functional MRI might be ignored in the HCP parcellation. We hope our parcellation framework can be a useful reference for this research direction. We added this discussion in the revised manuscript between line 268 and line 271.

      Regarding the parcel number, even without performing surface registration based on fine-grained functional features, recent adult fMRI-based parcellations have greatly increased parcel numbers, such as up to 1,000 parcels in Schaefer et al. (4), 518 parcels in Peng et al. (5), and 1,600 parcels in Zhao et al. (6). For infants, we agree that functional connectivity might not be as strong as in adults. However, there are opinions (7-9) that the basic units of functional organization are likely to be present in infant brains, and that brain functional development gradually shapes the brain networks. Therefore, the functional parcel units in infants could be on a scale comparable to that in adults. Even so, we agree that more research needs to be performed on larger datasets for better evaluation. We added this discussion in the revised manuscript between line 275 and line 280.

      1-5. Further comparisons across different subjects based on small parcels increases the chances of downstream analyses incorporating image registration noise, since as Glasser et al 2016 noted, there are many examples of topographic variation, which diffeomorphic registration cannot match. Therefore averaging across individuals would likely lose this granularity. I'm not sure how to test this beyond showing that the networks work well for downstream analyses but I think these issues should be discussed.

      We agree with the reviewer that averaging across individuals inevitably brings some registration errors to the parcellation, especially for regions with high topographic variation across subjects, which would lead to loss of granularity in these regions. We believe this is an important issue that exists in most methods on group-level parcellations, and the eventual solution might be individualized parcellation, which will be our future work. We added this discussion in the revised manuscript between line 288 and line 292.

      We also agree with the reviewer that downstream analyses are important evaluations for parcellations. We provided a beta version of our parcellation with 602 parcels (10) to our colleagues, and they tested our parcellation in the task of infant individual recognition across ages using functional connectivity, to explore infant functional connectome fingerprinting (10). We compared the performance of different parcellations with 602 ROIs (our beta version), 360 ROIs (HCP MMP parcellation (3)), and 68 ROIs (FreeSurfer parcellation (11)). The results (Fig. R3) show that our parcellation with a higher parcellation number yields better accuracy compared to other parcellations. We added a description of this downstream application in the discussion between line 284 and line 287.

      Fig. R3. The comparison of different parcellations for infant individual recognition across age based on functional connectivity (figure source: Hu et al. (10)). The parcellation with 602 ROIs is the beta version of our parcellation, 360 ROIs stands for HCP MMP parcellation (3) and 68 ROIs stands for the FreeSurfer parcellation (11). This downstream task shows that a higher parcellation number does lead to better accuracy in the application.

      1-6. Finally, I feel the methods lack clarity in some areas and that many key references are missing. In general I don't think that key methods should be described only through references to other papers. And there are many references, particular to FSL papers, that are missing.

      We thank the reviewer for this great suggestion. We added related references for FLIRT, FSL, MCFLIRT, and TOPUP. For the alignment to the HCP 32k_LR space, we first aligned all subjects to the fsaverage space using spherical demons, and then used part of the HCP pipeline (12) to map the surface from the fsaverage space to the HCP 164k_LR space, and downsampled to the 32k_LR space. We now cite the HCP pipeline by Glasser et al. (12) instead and detail this registration process in the revised manuscript between line 434 and line 440, as quoted below:

      “… The population-mean surface maps were mapped to the HCP 164k ‘fs_LR’ space using the deformation field that deforms the ‘fsaverage’ space to the ‘fs_LR’ space released by Van Essen et al. (13), which was obtained by landmark-based registration. By concatenating the three deformation fields of steps 1, 3, and 4, we directly warped all cortical surfaces from individual scan spaces to the HCP 164k_LR space and then resampled them to 32k_LR using the HCP pipeline (12), thus establishing vertex-to-vertex correspondences across individuals and ages …”

      Reviewer #2 (Public Review):

      2-1. Diminishing enthusiasm is the lack of focus in the result section, the frequent use of jargon, and figures that are often difficult to interpret. If those issues are addressed, the proposed atlas could have a high impact in the field especially as it is aligned with the template of the Human Connectome Project.

      We’d like to thank Reviewer #2 for the appreciation of our atlas. According to the reviewer’s suggestion, we went through the manuscript again by focusing on correcting the use of jargon, clarity in the result section, as well as figures and figure captions. We hope our corrections can help explain our work to a broader community. Our revisions are accordingly detailed in the following. Meanwhile, our parcellation maps have been aligned with the templates in HCP and FreeSurfer and made available via NITRC at: https://www.nitrc.org/projects/infantsurfatlas/.

      References

      1. B. T. Yeo, F. M. Krienen, J. Sepulcre, M. R. Sabuncu, D. Lashkari, M. Hollinshead, J. L. Roffman, J. W. Smoller, L. Zöllei, J. R. Polimeni, The organization of the human cerebral cortex estimated by intrinsic functional connectivity. Journal of Neurophysiology 106, 1125-1165 (2011).

      2. S. B. Eickhoff, R. T. Constable, B. T. Yeo, Topographic organization of the cerebral cortex and brain cartography. NeuroImage 170, 332-347 (2018).

      3. M. F. Glasser, T. S. Coalson, E. C. Robinson, C. D. Hacker, J. Harwell, E. Yacoub, K. Ugurbil, J. Andersson, C. F. Beckmann, M. Jenkinson, S. M. Smith, D. C. Van Essen, A multi-modal parcellation of human cerebral cortex. Nature 536, 171-178 (2016).

      4. A. Schaefer, R. Kong, E. M. Gordon, T. O. Laumann, X.-N. Zuo, A. J. Holmes, S. B. Eickhoff, B. T. Yeo, Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity MRI. Cerebral Cortex 28, 3095-3114 (2018).

      5. L. Peng, Z. Luo, L.-L. Zeng, C. Hou, H. Shen, Z. Zhou, D. Hu, Parcellating the human brain using resting-state dynamic functional connectivity. Cerebral Cortex, (2022).

      6. J. Zhao, C. Tang, J. Nie, Functional parcellation of individual cerebral cortex based on functional MRI. Neuroinformatics 18, 295-306 (2020).

      7. W. Gao, S. Alcauter, J. K. Smith, J. H. Gilmore, W. Lin, Development of human brain cortical network architecture during infancy. Brain Structure and Function 220, 1173-1186 (2015).

      8. W. Gao, H. Zhu, K. S. Giovanello, J. K. Smith, D. Shen, J. H. Gilmore, W. Lin, Evidence on the emergence of the brain's default network from 2-week-old to 2-year-old healthy pediatric subjects. Proceedings of the National Academy of Sciences 106, 6790-6795 (2009).

      9. K. Keunen, S. J. Counsell, M. J. Benders, The emergence of functional architecture during early brain development. NeuroImage 160, 2-14 (2017).

      10. D. Hu, F. Wang, H. Zhang, Z. Wu, Z. Zhou, G. Li, L. Wang, W. Lin, G. Li, UNC/UMN Baby Connectome Project Consortium, Existence of Functional Connectome Fingerprint during Infancy and Its Stability over Months. Journal of Neuroscience 42, 377-389 (2022).

      11. R. S. Desikan, F. Ségonne, B. Fischl, B. T. Quinn, B. C. Dickerson, D. Blacker, R. L. Buckner, A. M. Dale, R. P. Maguire, B. T. Hyman, An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 31, 968-980 (2006).

      12. M. F. Glasser, S. N. Sotiropoulos, J. A. Wilson, T. S. Coalson, B. Fischl, J. L. Andersson, J. Xu, S. Jbabdi, M. Webster, J. R. Polimeni, The minimal preprocessing pipelines for the Human Connectome Project. NeuroImage 80, 105-124 (2013).

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors present a system that allows the measurement of OCR on diverse tissues. Using two optodes, one before the tissue under examination and one after, allows the OCR to be measured as the difference between the concentration of O2 in the in-flow gas and the concentration of O2 in the out-flow gas. The system maintains the tissue at a set concentration of dissolved O2 so that experiments can be performed over a long period of time. The authors have provided ample data and full methods and their conclusions are most likely reliable.
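      The inflow/outflow difference described above amounts to a simple mass balance, sketched here in Python. The function name, the normalisation by tissue amount, and the units in the comments are illustrative assumptions, not details taken from the manuscript.

```python
def oxygen_consumption_rate(flow_rate, o2_in, o2_out, tissue_amount=1.0):
    """Mass-balance estimate of OCR.

    flow_rate     : volumetric flow through the tissue chamber (e.g. mL/min)
    o2_in, o2_out : O2 concentration before and after the tissue (e.g. nmol/mL)
    tissue_amount : normalisation factor (e.g. number of islets); assumed here
    """
    if tissue_amount <= 0:
        raise ValueError("tissue_amount must be positive")
    # O2 removed per unit time = flow x concentration drop across the tissue.
    return flow_rate * (o2_in - o2_out) / tissue_amount

# Hypothetical numbers: 0.1 mL/min flow, 200 -> 180 nmol/mL across the tissue.
example_ocr = oxygen_consumption_rate(0.1, 200.0, 180.0)
```

      Because the estimate depends on a concentration difference, its accuracy rests on the stability of both detectors and of the flow rate.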

      Currently, we know that O2 is critical for diverse physiological processes; however, it is rarely controlled for as well as non-gas solutes such as glucose, as we lack methods to control its delivery and infer its consumption. By addressing this need, the authors contribute something valuable to the field, which will hopefully be built on by others. The authors have already begun to show the utility of their system by exploring the complicated biology of H2S. As delivering this gas in a controlled manner is hard, people often use NaHS instead. In line with previous studies (well cited by the authors), differences are observed.

      Specific points

      1) The gas control system is used with islets, INS-1 832/12 cells, retinas, and liver tissue, demonstrating its broad applicability.

      2) The system as a platform can have diverse extra measurement modalities attached to it, for example visible-wavelength absorbance and fluorescence. Metabolite concentrations in the tissue culture outflow could also be measured.

      3) The reduction state of cyt c and cyt c oxidase are measured from the second derivative of absorbance at 550 and 605 nm. Ideally, to reliably decompose these signals full spectra around 550-605 nm would be collected. As the authors are only using cytochrome reduction state as a qualitative measure and appear careful to avoid over-interpretation this method should be fine. However, the authors ought to show a representative time course including the fully oxidised and reduced states demonstrating this approach as making these measurements is demanding and will depend on the exact spectroscopic set-up. Without this information it is hard to judge the reliability of the paper.

      We appreciate the reviewer giving us latitude for a less robust measurement. However, we did in fact do what the reviewer suggests. That is, with the Ocean Optics spectrophotometer, we measure the full light spectrum from 400 to 650 nm. Using these spectral data, we calculate the first and second derivatives of the absorption. We have previously published our approach to spectral analysis, as well as the inclusion of the fully oxidized and reduced states (Sweet IR, Khalil G, Wallen AR, Steedman M, Schenkman KA, Reems JA, Kahn SE, Callis JB. Continuous measurement of oxygen consumption by pancreatic islets. Diabetes Tech. Ther. 4: 661-672, 2002; Sweet IR, Cook DL, DeJulio E, Wallen AR, Khalil G, Callis JB, Reems JA. Regulation of ATP/ADP in pancreatic islets. Diabetes 53: 401-409, 2004), so we did not include all the details. To ensure that our description is clear, we have added a more thorough explanation that we used spectral analysis and not just data obtained at single wavelengths.
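      As a minimal sketch of the derivative step, the Python snippet below differentiates an absorbance spectrum twice with respect to wavelength and reads off the values at 550 and 605 nm. The synthetic Gaussian spectrum and the function name are hypothetical, and the smoothing and calibration against fully oxidized and reduced states used in practice are omitted.

```python
import numpy as np

def second_derivative_at(wavelengths_nm, absorbance, targets_nm=(550.0, 605.0)):
    """Numerically differentiate absorbance twice with respect to wavelength
    and interpolate the result at the target wavelengths."""
    wl = np.asarray(wavelengths_nm, dtype=float)   # must be increasing
    a = np.asarray(absorbance, dtype=float)
    first = np.gradient(a, wl)
    second = np.gradient(first, wl)
    return {t: float(np.interp(t, wl, second)) for t in targets_nm}

# Synthetic spectrum: one absorbance peak centred at 550 nm (width 10 nm).
wl = np.arange(400.0, 651.0, 1.0)
spectrum = np.exp(-((wl - 550.0) ** 2) / (2 * 10.0 ** 2))
derivs = second_derivative_at(wl, spectrum)
```

      A peak in absorbance appears as a negative second derivative at its centre, which is why the second derivative helps separate a narrow cytochrome peak from a slowly varying background.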

      Reviewer #2 (Public Review):

      The present project is an extension of prior work from this work group in which they describe a technological advancement to their published flow-culture system. Such improvements now incorporate technology that allows for metabolic characterization of mammalian tissues while precisely controlling the concentration of abundant gases (e.g., O2), as well as trace gases (e.g., H2S). The present article demonstrates the utility of this system in the context of hypoxia/re-oxygenation experiments, as well as exposure to H2S. Although the methodology described herein is clearly capable of detecting nuanced metabolic changes in response to variations in O2 or H2S, the lack of a head-to-head comparison with other techniques makes it difficult to discern the potential impact of the technology.

      We understand the benefit of comparing a new method with currently used methods. However, the novelty of our methodology is that it is able to control the exposure of tissue to levels of both abundant and trace dissolved gases, functions that existing instruments do not provide. In addition, continuous flow of media allows maintenance and assessment of tissue models that cannot be accommodated by static or spinner systems. Since we are the first to report an entirely novel technology, a direct comparison to benchmarks is not possible. In the past, however, we have tested liver slices and retina in a Seahorse and the tissue died within 120 minutes, presumably due to the lack of flow/reoxygenation in the tissue. In addition, islets placed in spinner systems such as the Oxygraph become fragmented and broken very rapidly. So, a head-to-head comparison of the tissue OCR response to changes in gas composition cannot be meaningfully carried out for the facets of our method that we highlighted. The methodology we present has capabilities that do not exist in any other commercially available system. We have stated this latter point in the last line of the second paragraph of the Introduction. Regarding the general reliability of the O2 consumption measurement: the unprecedented accuracy and stability of the O2 detectors and the ability of our flow system to maintain tissue for days while generating accurate and reproducible measurements of O2 consumption have previously been established (Sweet IR, Gilbert M, Sabek O, Fraga DW, Gaber AO, Reems JA. Glucose Stimulation of Cytochrome c Reduction and Oxygen Consumption as Assessment of Human Islet Quality. Transplantation 80: 1003-1011, 2005; Neal AS, Rountree AM, Philips CW, Kavanagh TJ, Williams DP, Newham P, Khalil G, Cook DL, Sweet IR. Quantification of low-level drug effects using real-time, in vitro measurement of oxygen consumption rate. Toxicological Sciences 148: 594-602, 2015).

      In addition, diffusion gradients both in the bath, as well as the tissue itself likely impact the accuracy of the metabolic measurements. This is likely relevant for the liver slices experiments.

      We agree that there are certainly concentration gradients within tissue, and these are increased in the absence of capillary flow. Nonetheless, the gradients will certainly be smaller than those that occur in static systems. In general, the optimal size of tissue pieces is a trade-off between the potential for hypoxia if the tissue is too large and a lack of untraumatized tissue if it is too small. We have added text noting that these effects should be considered when choosing the size and shape of the liver slices or other tissue models to place into the flow system.

      Following resection, liver tissue can be mechanically permeabilized (PMID: 12054447). In the present experiments, no controls were put in place to discern if the tissue was permeabilized. This could be checked by adding in adenylates and additional carbon substrates and assessing the impact on OCR. Similar controls likely need to be implemented for the islet and retina experiments.

      We have used flow systems in the past to maintain islets and liver for 24 hours and more (Neal AS, Rountree AM, Kernan K, Van Yserloo B, Zhang H, Reed BJ, Osborne W, Wang W, Sweet IR. Real time imaging of intracellular hydrogen peroxide in pancreatic islets. Biochem. J. 473:4443-4456, 2016; Neal AS, Rountree AM, Philips CW, Kavanagh TJ, Williams DP, Newham P, Khalil G, Cook DL, Sweet IR. Quantification of low-level drug effects using real-time, in vitro measurement of oxygen consumption rate. Toxicological Sciences 148: p. 594-602, 2015), and based on stable OCR we concluded that the tissue is viable. However, it is possible that the membranes of some of the tissue would become permeabilized, which would affect the responses to test compounds. We considered this issue from two perspectives: 1. Whether the established models that we used to test the BaroFuse were prone to high cell permeability; and 2. Whether loading and maintenance of the tissue models in the fluidics system resulted in increased permeability. We performed experiments measuring the ADP responses in OCR by islets and retina within the fluidics system. Effects were observable but small. However, these results are not definitive, because it was difficult to know what the response in permeabilized tissue was (and permeabilizing tissue slices was difficult). We then used propidium iodide (PI) staining to visualize and quantify the level of permeability. The fluorescence in isolated islets before and after perifusion was negligible compared to that in islets permeabilized by H2O2 treatment (see below).

      Fig. 1. Staining of isolated rat islets with the indicator of cell membrane integrity propidium iodide. Islets were stained either before or after a 3-hour perifusion. As a positive control for PI staining, islets were treated with 500 uM H2O2 for 30 minutes and incubated overnight. Each data point was the average +/- SE for an n of 3.

      There was some fluorescence in retina and liver, but it was difficult to interpret these data in terms of the fraction of the tissue that is permeabilized, because dye close to the surface of the tissue is preferentially imaged. So, we finally assessed the amount of permeabilized tissue in retina and liver by comparing the uptake of 3H-H2O with that of the extracellular marker 14C-sucrose.

      Fig. 2. Fraction of tissue water space that is accessible to the extracellular marker sucrose. Left: Mouse retina. Right: Rat liver slice. Each data point was the average +/- SE for an n of 3.

      Extracellular water in liver and retina is well established to be about 25%, close to the volume of distribution of sucrose. Thus, we cannot rule out that there is a small percentage of cells that are permeabilized, but the vast majority are not.
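
      The comparison described above can be illustrated with a minimal sketch. It assumes the usual dual-label approach in which each tracer's volume of distribution is the tissue count divided by the tracer concentration in the medium; the response does not spell out the exact calculation, and the function names and counts below are hypothetical.

```python
def distribution_volume(tissue_dpm, medium_dpm_per_ul):
    """Volume of medium (in uL) that would contain the tracer counts found in the tissue."""
    if medium_dpm_per_ul <= 0:
        raise ValueError("medium tracer concentration must be positive")
    return tissue_dpm / medium_dpm_per_ul


def sucrose_accessible_fraction(sucrose_tissue_dpm, sucrose_medium_dpm_per_ul,
                                water_tissue_dpm, water_medium_dpm_per_ul):
    """Fraction of the tissue water space reached by the extracellular marker."""
    sucrose_space = distribution_volume(sucrose_tissue_dpm, sucrose_medium_dpm_per_ul)
    water_space = distribution_volume(water_tissue_dpm, water_medium_dpm_per_ul)
    return sucrose_space / water_space


# Hypothetical counts, chosen only to illustrate the arithmetic (not measured data).
fraction = sucrose_accessible_fraction(
    sucrose_tissue_dpm=1300.0, sucrose_medium_dpm_per_ul=500.0,
    water_tissue_dpm=20000.0, water_medium_dpm_per_ul=2000.0,
)
print(f"sucrose-accessible fraction of water space: {fraction:.2f}")
```

      A fraction close to the roughly 25% extracellular water quoted above would indicate that sucrose is largely excluded from cells; a clearly higher value would point to permeabilized membranes.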

      Additional comments are detailed below:

      -The experiments with H2S are particularly interesting, as this system does seem well suited to investigate the metabolic effects of H2S.

      Thanks! We are excited by the potential for this method to assess the effects of H2S and other trace gases.

      -The authors state the transient rise in O2 consumption was surprising; however, accumulation of succinate during ischemia and rapid oxidation upon reperfusion has been previously demonstrated (PMID: 32863205).

      This is an interesting paper which describes findings that speak to the role of succinate in supplying fuel that could drive the transient changes in O2 consumption observed following hypoxia. It would be an interesting experiment to perform our hypoxia-reoxygenation experiment in the absence and presence of the permeable malonate to see if the spike in O2 consumption following reoxygenation was absent in the presence of the drug. We have removed the word surprising and cited this paper.

      -In the paper, Zaprinast was used to block pyruvate uptake. However, the rationale to use this compound, as opposed to the more specific MPC inhibitor UK5099 is unclear.

      We could have used UK5099, but we had used Zaprinast in past studies (Du J, Cleghorn WM, Contreras L, Lindsay K, Rountree AM, Chertov AO, Turner SJ, Sahaboglu A, Linton J, Sadilek M, Satrústegui I, Sweet IR, Paquet-Durand F, Hurley JB. Inhibition of mitochondrial pyruvate transport by Zaprinast causes massive accumulation of aspartate at the expense of glutamate in retinas. J Biol. Chem, 288:36129-40, 2013), so we knew that, in our hands, it blocked mitochondrial pyruvate uptake and would therefore be a good test of the rapid transfer of pyruvate across the plasma membrane.

      -Throughout the paper, the authors list 'COVID-19' as a potential application. It is not clear how this technology could be used in the context of COVID-19.

      Reference to COVID-19 has been removed.

    1. Author Response:

      Reviewer #1 (Public Review):

      Overall, the authors have done a nice job covering the relevant literature, presenting a story out of complicated data, and performing many thoughtful analyses.

      However, I believe the paper requires quite major revisions.

      We thank the reviewer for their encouraging assessment of our manuscript. We are grateful for their valuable and especially detailed feedback that helped us to substantially improve our manuscript.

      Major issues:

      I do not believe the current results present a clear, comprehensible story about sleep and motor memory consolidation. As presented, sleep predicts an increase in the subsequent learning curve, but there is a negative relationship between learning curve and task proficiency change (which is, as far as I can tell, similar to "memory retention"). This makes it seem as if sleep predicts more forgetting on initial trials within the subsequent block (or worse memory retention) - is this true? Regardless of whether it is statistically true, there appears another story in these data that is being sacrificed to fit a story about sleep. To my eye, the results may first and foremost tell a circadian (rather than sleep) story. Examining the data in Figure 2A and 2B, it appears that every AM learning period has a higher learning curve (slope) than every PM period. While this could, of course, be due to having just slept, the main story gleaned from such a result is not a sleep effect on retention, which has been the emphasis on motor memory consolidation research in the last couple of decades, but on new learning. The fact that this effect appears present in the first session (juggling blocks 1-3 in adolescents and blocks 1-5 in adults) makes this seem the more likely story here, since it has less to do with "preparing one to re-learn" and more to do with just learning and when that learning is optimal. But even if it does not reach statistical significance in the first session alone, it remains a concern and, in my opinion, should be considered a focus in the manuscript unless the authors can devise a reason to definitively rule it out.

      Here is how I recommend the authors proceed on this point: include all sessions from all subjects into a mixed effect model, predicting the slope of the learning curve with time of day and age group as fixed effects and subjects as random effects:

      learning curve slope ~ AM/PM [AM (0) or PM (1)] + age [adolescent (0) or adult (1)] + (1|subject)

      …or something similar with other regressors of interest. If this is significant for AM/PM status, they should re-try the analysis using only the first session. If this is significant, then a sleep-centric story cannot be defended here at all, in my opinion. If it is not (which could simply result from low power, but the authors could decide this), the authors should decide if they think they can rule out circadian effects and proceed accordingly. I should note that, while to many, a sleep story would be more interesting or compelling, that is not my opinion, and I would not solely opt to reject this paper if it centered a time-of-day story instead.
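
      Written out, the random-intercept model suggested above could take the following form. This is only a sketch of one plausible specification; the symbols are introduced here for illustration and do not appear in the original review.

```latex
\[
\text{slope}_{ij} = \beta_0 + \beta_1\,\mathrm{PM}_{ij} + \beta_2\,\mathrm{Adult}_{i} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\]
```

      Here $\text{slope}_{ij}$ is the learning curve slope of subject $i$ in session $j$, $\mathrm{PM}_{ij}$ is 0 for a morning and 1 for an evening session, $\mathrm{Adult}_{i}$ is 0 for adolescents and 1 for adults, and $u_i$ is the subject-specific random intercept. A time-of-day effect would correspond to $\beta_1 \neq 0$.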

      The authors need to work out precisely what is happening in the behavior here, and let the physiology follow that story. They should allow themselves to consider very major revisions (and drop the physiology) if that is most consistent with the data. As presented, I am very unclear of what to take away from the study.

      We thank the reviewer for the opportunity to further elaborate on our behavioral results. We agree that the interpretation of the behavior in the complex gross-motor task is not straightforward, which might be partly due to less controllability compared to, for example, finger-tapping tasks. The reviewer is correct that, initially, sleep seems to predict more forgetting on initial trials within the subsequent block, given the dip in task proficiency and the resulting increase in steepness of the learning curve after the sleep retention interval. Notably, this dip in performance after sleep has also been reported for finger-tapping tasks (cf. Eichenlaub et al, 2020). The performance dip is also present in the wake-first group (Figure 2) after the first interval. This observation suggests that picking up the task again after a period of time comes at a cost. Interestingly, this performance dip is no longer present after the second retention interval, indicating that the better the task proficiency, the easier it is to pick up juggling again. In other words, juggling has been better consolidated after additional training. Critically, our results show that participants with higher SO-spindle coupling strength have a smaller dip in performance after the retention interval, thus indicating a learning advantage.

      Figure 2

      (A) Number of successful three-ball cascades (mean ± standard error of the mean [SEM]) of adolescents (circles) for the sleep-first (blue) and wake-first group (green) per juggling block. Grand average learning curve (black lines) as computed in (C) are superimposed. Dashed lines indicate the timing of the respective retention intervals that separate the three performance tests. Note that adolescents improve their juggling performance across the blocks. (B) Same conventions as in (A) but for adults (diamonds). Similar to adolescents, adults improve their juggling performance across the blocks regardless of group.

      We discuss the sleep effect on juggling in the discussion section (page 22 – 23, lines 502 – 514):

      "How relevant is sleep for real-life gross-motor memory consolidation? We found that sleep impacts the learning curve but did not affect task proficiency in comparison to a wake retention interval (Figure 2DE). Two accounts might explain the absence of a sleep effect on task proficiency. (1) Sleep rather stabilizes than improves gross-motor memory, which is in line with previous gross-motor adaption studies (Bothe et al, 2019; Bothe et al, 2020). (2) Pre-sleep performance is critical for sleep to improve motor skills (Wilhelm et al, 2012). Participants commonly reach asymptotic pre-sleep performance levels in finger tapping tasks, which is most frequently used to probe sleep effects on motor memory. Here we found that using a complex juggling task, participants do not reach asymptotic ceiling performance levels in such a short time. Indeed, the learning progression for the sleep-first and wake-first groups followed a similar trend (Figure 2AB), suggesting that more training and not in particular sleep drove performance gains."

      If indeed the authors keep the sleep aspect of this story, here are some comments regarding the physiology. The authors present several nice analyses in Figure 3. However, given the lack of behavioral difference between adolescents and adults (Fig 2D), they combine the groups when investigating behavior-physiology relationships. In some ways, then, Figure 3 has extraneous details to the point of motor learning and retention, and I believe the paper would benefit from more focus. If the authors keep their sleep story, I believe Figure 3 and 4 should be combined and some current figure panels in Figure 3 should be removed or moved to the supplementary information.

      We thank the reviewers for their suggestion and we agree that the figures of our manuscript would benefit from more focus. Therefore, we combined Figure 3 and 4 from the original manuscript into a revised Figure 3 in the updated version of the manuscript. In more detail, subpanels that explain our methodological approach can now be found in Figure 3 – figure supplement 1, while the updated Figure 3 now focuses on developmental changes in oscillatory dynamics and SO-spindle coupling strength as well as their relationship to gross-motor learning.

      Updated Figure 3:

      (A) Left: topographical distribution of the 1/f corrected SO and spindle amplitude as extracted from the oscillatory residual (Figure 3 – figure supplement 1A, right). Note that adolescents and adults both display the expected topographical distribution of more pronounced frontal SO and centro-parietal spindles. Right: single subject data of the oscillatory residual for all subjects with sleep data color coded by age (darker colors indicate older subjects). SO and spindle frequency ranges are indicated by the dashed boxes. Importantly, subjects displayed high inter-individual variability in the sleep spindle range and a gradual spindle frequency increase by age that is critically underestimated by the group average of the oscillatory residuals (Figure 3 – figure supplement 1A, right). (B) Spindle peak locked epoch (NREM3, co-occurrence corrected) grand averages (mean ± SEM) for adolescents (red) and adults (black). Inset depicts the corresponding SO-filtered (2 Hz lowpass) signal. Grey-shaded areas indicate significant clusters. Note, we found no difference in amplitude after normalization. Significant differences are due to more precise SO-spindle coupling in adults. (C) Top: comparison of SO-spindle coupling strength between adolescents and adults. Adults displayed more precise coupling than adolescents in a centro-parietal cluster. T-scores are transformed to z-scores. Asterisks denote cluster-corrected two-sided p < 0.05. Bottom: Exemplary depiction of coupling strength (mean ± SEM) for adolescents (red) and adults (black) with single subject data points. Exemplary single electrode data (bottom) is shown for C4 instead of Cz to visualize the difference. (D) Cluster-corrected correlations between individual coupling strength and overnight task proficiency change (post – pre retention) for adolescents (red, circle) and adults (black, diamond) of the sleep-first group (left, data at C4). Asterisks indicate cluster-corrected two-sided p < 0.05. Grey-shaded area indicates 95% confidence intervals of the trend line. Participants with a more precise SO-spindle coordination show improved task proficiency after sleep. Note that the change in task proficiency was inversely related to the change in learning curve (cf. Figure 2D), indicating that a stronger improvement in task proficiency related to a flattening of the learning curve. Further note that the significant cluster formed over electrodes close to motor areas. (E) Cluster-corrected correlations between individual coupling strength and overnight learning curve change. Same conventions as in (D). Participants with more precise SO-spindle coupling over C4 showed attenuated learning curves after sleep.

      and

      Figure 3 - figure supplement 1

      (A) Left: Z-normalized EEG power spectra (mean ± SEM) for adolescents (red) and adults (black) during NREM sleep in semi-log space. Data is displayed for the representative electrode Cz unless specified otherwise. Note the overall power difference between adolescents and adults due to a broadband shift on the y-axis. Straight black line denotes cluster-corrected significant differences. Middle: 1/f fractal component that underlies the broadband shift. Right: Oscillatory residual after subtracting the fractal component (A, middle) from the power spectrum (A, left). Both groups show clear delineated peaks in the SO (< 2 Hz) and spindle range (11 – 16 Hz) establishing the presence of the cardinal sleep oscillations in the signal. (B) Top: Spindle frequency peak development based on the oscillatory residuals. Spindle frequency is faster at all but occipital electrodes in adults than in adolescents. T-scores are transformed to z-scores. Asterisks denote cluster-corrected two-sided p < 0.05. Bottom: Exemplary depiction of the spindle frequency (mean ± SEM) for adolescents (red) and adults (black) with single subject data points at Cz. (C) SO-spindle co-occurrence rate (mean ± SEM) for adolescents (red) and adults (black) during NREM2 and NREM3 sleep. Event co-occurrence is higher in NREM3 (F(1, 51) = 1209.09, p < 0.001, partial eta² = 0.96) as well as in adults (F(1, 51) = 11.35, p = 0.001, partial eta² = 0.18). (D) Histogram of co-occurring SO-spindle events in NREM2 (blue) and NREM3 (purple) collapsed across all subjects and electrodes. Note the low co-occurring event count in NREM2 sleep. (E) Single subject (top) and group averages (bottom, mean ± SEM) for adolescents (red) and adults (black) of individually detected, for SO co-occurrence-corrected sleep spindles in NREM3. Spindles were detected based on the information of the oscillatory residual. Note the underlying SO-component (grey) in the spindle detection for single subject data and group averages indicating a spindle amplitude modulation depending on SO-phase. (F) Grand average time frequency plots (-2 to -1.5s baseline-corrected) of SO-trough-locked segments (corrected for spindle co-occurrence) in NREM3 for adolescents (left) and adults (right). Schematic SO is plotted superimposed in grey. Note the alternating power pattern in the spindle frequency range, showing that SO-phase modulates spindle activity in both age groups.
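
      As a reading aid for the effect sizes in panel (C), partial eta² can be recovered from an F statistic and its degrees of freedom with the standard identity partial eta² = (F · df_effect) / (F · df_effect + df_error). The short sketch below applies it to the two F values quoted in the caption; the function name is illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics quoted in panel (C) of the caption above.
sleep_stage_effect = partial_eta_squared(1209.09, 1, 51)
age_group_effect = partial_eta_squared(11.35, 1, 51)

print(round(sleep_stage_effect, 2), round(age_group_effect, 2))
```

      Both values round to the effect sizes reported in the caption (0.96 and 0.18).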

      Why did the authors use Spearman rather than Pearson correlations in Figure 4? Was it to reduce the influence of the outlier subject? They should minimally clarify and justify this, since it is less conventional in this line of research. And it would be useful to know if the relationship is significant with Pearson correlations when robust regression is applied. I see the authors are using MATLAB, and the robustfit toolbox (https://www.mathworks.com/help/stats/robustfit.html) is a simple way to address this issue.

      We thank the reviewers for their suggestion. We agree that, when inspecting the scatter plots, it looks as though the correlations could be severely influenced by two outliers in the adult group. Because this is an important matter, we recalculated all previously reported correlations without the two outliers (Figure R4, left column) and followed the reviewer’s suggestion to also compute robust regression (Figure R4, right column). We found no substantial deviation from our original results.

      In more detail, an increase in task proficiency resulted in a flattening of the learning curve both when removing outliers (Figure R4A, rhos = -0.70, p < 0.001) and when applying robust regression analysis (Figure R4B, b = -0.30, t(67) = -10.89, rho = -0.80, p < 0.001). Likewise, higher coupling strength still predicted better task proficiency (mean rho = 0.35, p = 0.029, cluster-corrected) and flatter learning curves after sleep (rho = -0.44, p = 0.047, cluster-corrected) when removing the outliers (Figure R4CE) and when calculating robust regression (Figure R4DF, task proficiency: b = 82.32, t(40) = 3.12, rho = 0.45, p = 0.003; learning curve: b = -26.84, t(40) = -2.96, rho = -0.43, p = 0.005). Furthermore, we calculated Spearman rank correlations and cluster-corrected Spearman rank correlations in our original manuscript to mitigate the impact of outliers, even though Pearson correlations are more widely used in the field. Therefore, we still report Spearman rank correlations for single electrodes instead of robust correlations, as this is more consistent with the cluster-correlation analyses.
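
      Why rank correlations dampen the influence of outliers can be seen in a small, self-contained sketch. The data are made up for illustration and are not taken from the study; the Spearman coefficient is computed here simply as the Pearson correlation of the ranks.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up, strictly increasing data whose last point is an extreme value.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.0, 30.0]

r = pearson(x, y)
rho = spearman(x, y)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

      Because the extreme value keeps its rank, the Spearman coefficient still reflects the perfectly monotonic relationship, whereas the Pearson coefficient is pulled well below 1 by the single point.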

      We now use robust trend lines instead of linear trend lines in our scatter plots. Further, we added the correlations without outliers (Figure R4ACE) to the supplements as Figure 2 – figure supplement 1D and Figure 3 – figure supplement 2 FG. These additional analyses are now reported in the results section of the revised manuscript (page 9, lines 186 – 191):

      "[…] we confirmed a strong negative correlation between the change (post retention values – pre retention values) in task proficiency and the change in learning curve after the retention interval (Figure 2F; rhos = -0.71, p < 0.001), which also remained strong after outlier removal (Figure 2 – figure supplement 1D). This result indicates that participants who consolidate their juggling performance after a retention interval show slower gains in performance."

      And (page 16, lines 343 – 346):

      "[…] Furthermore, our results remained consistent when including coupled spindle events in NREM2 (Figure 3 – figure supplement 2E) and after outlier removal (Figure 3 – figure supplement 2FG)."

      Furthermore, we now state in the method section that we specifically utilized Spearman rank correlations to mitigate the impact of outliers in our analyses (page 35, lines 808 – 813):

      "For correlational analyses we utilized spearman rank correlations (rhos; Figure 2F & Figure 3DE) to mitigate the impact of possible outliers as well as cluster-corrected spearman rank correlations by transforming the correlation coefficients to t-values (p < 0.05) and clustering in the space domain (Figure 3DE). Linear trend lines were calculated using robust regression."

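      The transformation of correlation coefficients to t-values mentioned in the quoted passage can be sketched as follows. The snippet assumes the textbook conversion t = rho · sqrt((n − 2) / (1 − rho²)) with n − 2 degrees of freedom; the exact formula used for the cluster statistics is not given in the response, and the function name and example values are hypothetical.

```python
from math import sqrt


def correlation_to_t(rho, n):
    """t-value for a correlation coefficient rho based on n paired observations."""
    if n < 3:
        raise ValueError("at least three observations are required")
    if abs(rho) >= 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return rho * sqrt((n - 2) / (1.0 - rho ** 2))


# Hypothetical example: rho = 0.5 observed in 27 subjects (25 degrees of freedom).
t_value = correlation_to_t(0.5, 27)
print(f"t(25) = {t_value:.2f}")
```

      In a cluster-based analysis, such t-values would be computed per electrode, thresholded, and grouped across neighbouring electrodes before the cluster-level test.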
      Figure R4

      (A) Spearman rank correlation between task proficiency change and learning curve change collapsed across adolescents (red dot) and adults (black diamonds) after removing two outlier subjects in the adult age group. Grey-shaded area indicates 95% confidence intervals of the robust trend line. (B) Robust regression of task proficiency change and learning curve change of the original sample. (C) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) after outlier removal (left, spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. (D) Robust regression of coupling strength at C4 and task proficiency of the original sample. (E) Same conventions as in (C) but for overnight learning curve change. (F) Same conventions as in (D) but for overnight learning curve change.

      Additionally, with only a single night of recording data, it is impossible to disentangle possible trait-based sleep characteristics (e.g., Subject 1 has high SO-spindle coupling in general and retains motor memories well, but these are independent of each other) from a specific, state-based account (e.g., Subject 1's high SO-spindle coupling on night 1 specifically led to their improved retention or change in learning, etc., and this is unrelated to their general SO-spindle coupling or motor performance abilities). Clearly, many studies face this limitation, but this should be acknowledged.

      We thank the reviewers for their important remark. We agree that it is impossible to make a sound statement about whether our reported correlations represent trait- or state-based aspects of the sleep and learning relationship with the data that we have reported in the manuscript. However, while we are lacking a proper baseline condition without any task engagement, we still recorded polysomnography for all subjects during an adaptation night. Given the expected pronounced differences in sleep architecture between the adaptation nights and learning nights (see Table R3 for an overview collapsed across both age groups), we initially refrained from entering data from the adaptation nights into our original analyses, but we now fully report the data below. Note that the differences are driven by the adaptation night, where subjects first have to adjust to sleeping with attached EEG electrodes in a sleep laboratory.

      Table R3. Sleep architecture (mean ± standard deviation) for the adaptation and learning night collapsed across both age groups. Nights were compared using paired t-tests

      To further clarify whether subjects with high coupling strength have a motor learning advantage (i.e. trait-effect) or whether a learning-induced enhancement of coupling strength is indicative of improved overnight memory change (i.e. state-effect), we ran additional analyses using the data from the adaptation night. Note that the coupling strength metric was not impacted by differences in event number and our correlations with behavior were not influenced by sleep architecture (please refer to our answer to issue #7 for the results). Therefore, we considered it appropriate to also utilize data from the adaptation night.

      First, we correlated SO-spindle coupling strength obtained from the adaptation night with the coupling strength in the learning night. We found that overall, coupling strength is highly correlated between the two measurements (mean rho across all channels = 0.55, Figure R5A), supporting the notion that coupling strength remains rather stable within the individual (i.e. trait), similar to what has been reported about the stable nature of sleep spindles as a “neural finger-print” (De Gennaro & Ferrara, 2003; De Gennaro et al, 2005; Purcell et al, 2017).

      To investigate a possible state-effect for coupling strength and motor learning, we calculated the difference in coupling strength between the two nights (learning night – adaptation night) and correlated these values with the overnight change in task proficiency and learning curve. We identified no significant correlations with a learning induced coupling strength change; neither for task proficiency nor learning curve change (Figure R5B). Note that there was a positive correlation of coupling strength change with overnight task proficiency change at Cz (Figure R5B, left), however it did not survive cluster-corrected correlational analysis (rhos = 0.34, p = 0.15). Combined, these results favor the conclusion that our correlations between coupling strength and learning rather reflect a trait-like relationship than a state-like relationship. This is in line with the interpretation of our previous studies that SO-spindle coupling strength reflects the efficiency and integrity of the neuronal pathway between neocortex and hippocampus that is paramount for memory networks and the information transfer during sleep (Hahn et al, 2020; Helfrich et al, 2019; Helfrich et al, 2018; Winer et al, 2019). For a comprehensive review please see Helfrich et al (2021), which argued that SO-spindle coupling predicts the integrity of memory pathways and therefore correlates with various metrics of behavioral performance or structural integrity.

      Figure R5

      (A) Topographical plot of spearman rank correlations of coupling strength in the adaptation night and learning night across all subjects. Overall coupling strength was highly correlated between the two measurements. (B) Cluster-corrected correlation between learning induced coupling strength changes (learning night – adaptation night) and overnight change in task proficiency (left) as well as learning curve (right). We found no significant clusters, although correlations showed similar trends as our original analyses, with more learning induced changes in coupling strength resulting in better overnight task proficiency and flattened learning curves.

      We have now added the additional state-trait analyses (Figure R5) to the updated manuscript as Figure 3 – figure supplement 2HI and report them in the results section (page 17, lines 361 – 375):

      "Finally, we investigated whether subjects with high coupling strength have a gross-motor learning advantage (i.e. trait-effect) or a learning induced enhancement of coupling strength is indicative for improved overnight memory change (i.e. state-effect). First, we correlated SO-spindle coupling strength obtained from the adaptation night with the coupling strength in the learning night. We found that overall, coupling strength is highly correlated between the two measurements (mean rho across all channels = 0.55, Figure 3 – figure supplement 2H), supporting the notion that coupling strength remains rather stable within the individual (i.e. trait). Second, we calculated the difference in coupling strength between the learning night and the adaptation night to investigate a possible state-effect. We found no significant cluster-corrected correlations between coupling strength change and task proficiency- as well as learning curve change (Figure 3 – figure supplement 2I).

      Collectively, these results indicate the regionally specific SO-spindle coupling over central EEG sensors encompassing sensorimotor areas precisely indexes learning of a challenging motor task."

      We further refer to these new results in the discussion section (page 23, lines 521 – 528):

      "Moreover, we found that SO-spindle coupling strength remains remarkably stable between two nights, which also explains why a learning-induced change in coupling strength did not relate to behavior (Figure 3 – figure supplement 2I). Thus, our results primarily suggest that strength of SO-spindle coupling correlates with the ability to learn (trait), but does not solely convey the recently learned information. This set of findings is in line with recent ideas that strong coupling indexes individuals with highly efficient subcortical-cortical network communication (Helfrich et al, 2021)."

      Additionally, we now provide descriptive data of the adaptation and learning night (Table R3) in the Supplementary file – table 1 and explicitly mention the adaptation night in the results section, which was previously only mentioned in the method section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)."

      Reviewer #2 (Public Review):

      In this study, Hahn and colleagues investigate the role of slow oscillation-spindle (SO-spindle) coupling for motor memory consolidation and the impact of brain maturation on these interactions. The authors employed a real-life gross-motor task, where adolescents and adults learned to juggle. They demonstrate that during post-learning sleep, SO-spindles are more strongly coupled in adults than in adolescents. The authors further show that the strength of SO-spindle coupling correlates with overnight changes in the learning curve and task proficiency, indicating a role of SO-spindle coupling in motor memory consolidation.

      Overall, the topic and the results of the present study are interesting and timely. The authors employed state-of-the-art analyses, carefully taking the general variability of oscillatory features into account. It also has to be acknowledged that the authors moved away from using rather artificial lab-tasks to study the consolidation of motor memories (as it is standard in the field), adding ecological validity to their findings. However, some features of their analyses need further clarification.

      We thank the reviewer for their positive assessment of our manuscript. Incorporating the encouraging and helpful feedback, we believe that we substantially improved the clarity and robustness of our analyses.

      1) Supporting and extending previous work of the authors (Hahn et al, 2020), SO-spindle coupling over centro-parietal areas was stronger in adults as compared to adolescents. Despite these differences in the EEG results the authors collapsed the data of adults and adolescents for their correlational analyses (Fig. 4a and 4b). Why would the authors think that this procedure is viable (also given the fact that different EEG systems were used to record the data)?

      We thank the reviewers for the opportunity to clarify why we think it is viable to collapse the data of adolescents and adults for our correlational analyses. In the following we split our answers based on the two points raised by the reviewers: (1) electrophysiological differences (i.e. coupling strength) between the groups and (2) potential signal differences due to different EEG systems.

      1. Electrophysiological differences

      First, upon inspecting the original Figure 4, it is apparent that the coupling strength of the combined sample does not form isolated clusters for each age group. In other words, while adult coupling strength is on the higher and adolescent coupling on the lower end due to the developmental increase in coupling strength we reported in the original Figure 3F, both samples overlap, forming a linear trend. Second, when running the correlational analyses between coupling strength and task proficiency as well as learning curve separately for each age group, we found that they follow the same direction (Figure R3). Adolescents with higher coupling strength show better task proficiency (Figure R3A, rhos = 0.66, p = 0.005). This effect was also present when using robust regression (b = 109.97, t(15) = 3.13, rho = 0.63, p = 0.007). Like adolescents, adults with higher coupling strength at C4 displayed better task proficiency after sleep (Figure R3B, rhos = 0.39, p = 0.053). This relationship was stronger when using robust regression (b = 151.36, t(23) = 3.17, rho = 0.56, p = 0.004). For learning curves, we found the expected negative correlation at C4 for adolescents (Figure R3C, rhos = -0.57, p = 0.020) and adults (Figure R3D, rhos = -0.44, p = 0.031). Results were comparable when using robust regression (adolescents: b = -59.58, t(15) = -2.94, rho = -0.60, p = 0.010; adults: b = -21.99, t(23) = -1.71, rho = -0.37, p = 0.101).

      Taken together, these results demonstrate that adolescents and adults show effects in the same direction at the same electrode, making it highly unlikely that our results arose by chance or that our initial correlation analyses were driven by one group alone.
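
As an illustration of the per-group analysis described above, the following minimal sketch computes a Spearman rank correlation separately for each age group. It is not the code used for the reported analyses: the function names (`ranks`, `spearman`, `spearman_by_group`) and the toy values are hypothetical, and only the Python standard library is assumed.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg_rank
        i = j + 1
    return out


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def spearman_by_group(coupling, behavior, group):
    """Compute Spearman rho separately for every group label."""
    result = {}
    for label in sorted(set(group)):
        idx = [i for i, g in enumerate(group) if g == label]
        result[label] = spearman([coupling[i] for i in idx],
                                 [behavior[i] for i in idx])
    return result


# Toy data (hypothetical values, not taken from the study).
coupling = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
behavior = [1, 2, 3, 4, 2, 3, 5, 9]
group = ["adolescent"] * 4 + ["adult"] * 4
per_group = spearman_by_group(coupling, behavior, group)
```

If both groups return coefficients of the same sign, the pooled correlation is unlikely to be carried by a single group, which is the argument made above.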

      Additionally, we already controlled for age in our original analyses using partial correlations (also refer to our answer to issue #6). Hence, these analyses provide further support that it is viable to collapse the analyses across both age groups even though they differ in coupling strength.

      2. Different EEG systems

        The reviewers also raise the question whether our analyses might be impacted by the different EEG systems we used to record our data. This is an important concern especially when considering that cross-frequency coupling analyses can be severely confounded by differences in signal properties (Aru et al, 2015). In our sample, the strongest impact factor on signal properties is most likely age, given the broadband power differences in the power spectrum we found between the groups (original Figure 3A). Importantly, we also found a similar systematic power difference in our longitudinal study using the same ambulatory EEG system for both data recordings (Hahn et al, 2020). This is in line with numerous other studies demonstrating age related EEG power changes in broadband as well as SO and sleep spindle frequency ranges (Campbell & Feinberg, 2016; Feinberg & Campbell, 2013; Helfrich et al, 2018; Kurth et al, 2010; Muehlroth et al, 2019; Muehlroth & Werkle-Bergner, 2020; Purcell et al, 2017). Therefore, we already had to take differences in signal properties into account for our cross-frequency analyses, regardless of whether the underlying cause is an age difference or different signal-to-noise ratios of the EEG systems.

      To mitigate confounds in the signal, we used a data-driven and individualized approach detecting SO and sleep spindle events based on individualized frequency bands and a 75th-percentile amplitude criterion relative to the underlying signal. Additionally, we z-normalized all spindle events prior to the cross-frequency coupling analyses (Figure R3E). We found no amplitude differences around the spindle peak (point of SO-phase readout) between adolescents that were recorded with an ambulatory amplifier system (alphatrace) and adults that were recorded with a stationary amplifier system (neuroscan) using cluster-based random permutation testing. This was also the case for the SO-filtered (< 2 Hz) signal (Figure R3E, inset). Critically, the significant differences in amplitude from -1.4 to -0.8 s (p = 0.023, d = -0.73) and 0.4 to 1.5 s (p < 0.001, d = 1.1) are not caused by age related differences in power or different EEG-systems but instead by the increased coupling strength (i.e. higher coupling precision of spindles to SOs) in adults giving rise to a more pronounced SO-wave shape when averaging across spindle peak locked epochs.
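
To make the normalization step concrete, the sketch below z-normalizes each spindle-locked epoch before averaging. It is an illustration only: we assume here that normalization is applied per epoch using the population standard deviation, and the function names and toy epochs are hypothetical rather than taken from our analysis pipeline.

```python
from statistics import fmean, pstdev


def z_normalize(epoch):
    """Subtract the epoch mean and divide by the epoch standard deviation."""
    m = fmean(epoch)
    s = pstdev(epoch)  # assumption: population SD, computed per epoch
    if s == 0:
        raise ValueError("epoch has zero variance and cannot be z-normalized")
    return [(v - m) / s for v in epoch]


def grand_average(epochs):
    """Average z-normalized epochs sample by sample."""
    normalized = [z_normalize(e) for e in epochs]
    return [fmean(samples) for samples in zip(*normalized)]


# Two toy epochs with the same shape but a tenfold amplitude difference,
# mimicking a pure gain difference between recordings.
low_gain = [1.0, 2.0, 3.0]
high_gain = [10.0, 20.0, 30.0]
z_low = z_normalize(low_gain)
z_high = z_normalize(high_gain)
```

After normalization the two toy epochs are identical, which is why a pure gain or power difference between age groups or amplifiers no longer affects the signal that enters the phase readout.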

      Consequently, our analysis pipeline already controlled for possible differences in signal property introduced through different amplifier systems. Nonetheless, we also wanted to directly compare the signal-to-noise ratio of the ambulatory and stationary amplifier systems. However, we only obtained data from both amplifier systems in the adult sleep first group, because we recorded EEG during the juggling learning phase with the ambulatory system in addition to the PSG with the stationary system. First, we computed the power spectra in the 1 to 49 Hz frequency range during the juggling learning phase (ambulatory) and during quiet wakefulness (stationary) for every subject in the adult sleep first group in 10-second segments. Next, we computed the signal-to-noise ratio (mean/standard deviation) of the power spectra per frequency across all segments. We only found a small negative cluster from 21.9 to 22.5 Hz (p = 0.042, d = 0.53; Figure R3F), which did not pertain to our frequency bands of interest. Critically, the signal-to-noise ratio of both amplifiers converged in the upper frequency bands approaching the noise floor, therefore strongly supporting the notion that both systems in fact provided highly comparable estimates.
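
The signal-to-noise computation described above (mean divided by standard deviation of the power spectra, per frequency, across segments) can be sketched as follows. This is a simplified illustration: the input is assumed to be a list of already computed per-segment power spectra on a common frequency grid, the sample standard deviation is an assumption, and the name `snr_per_frequency` is hypothetical.

```python
from statistics import fmean, stdev


def snr_per_frequency(segment_spectra):
    """Return mean/SD of power across segments for every frequency bin.

    segment_spectra: list of spectra, one per 10-second segment, each a list
    of power values on the same frequency grid.
    """
    if len(segment_spectra) < 2:
        raise ValueError("at least two segments are needed to estimate the SD")
    snr = []
    for power_at_freq in zip(*segment_spectra):
        sd = stdev(power_at_freq)  # assumption: sample SD (n - 1)
        snr.append(float("inf") if sd == 0 else fmean(power_at_freq) / sd)
    return snr


# Toy example: two segments, two frequency bins (hypothetical values).
spectra = [[1.0, 8.0],
           [3.0, 12.0]]
snr = snr_per_frequency(spectra)
```

Running this once per amplifier system yields two curves over frequency that can then be compared with a cluster-based permutation test, as done for Figure R3F.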

      In conclusion, both age groups display highly similar effects and direction when correlating coupling strength with behavior. Further, after individualization and normalization of the analytical signal, we found no differences in signal properties that would confound the cross-frequency analysis. Lastly, we did not find systematic differences in signal-to-noise ratio between the different EEG-systems. Thus, we believe it is justified to collapse the data across all participants for the correlational analyses, as it combines both the developmental aspect of enhanced coupling precision from adolescence to adulthood and the behavioral relevance for motor learning, which we deem a critical research advance from our previous study.

      Figure R3

      (A) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) for adolescents of the sleep-first group (left, spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. Grey-shaded area indicates 95% confidence intervals of the robust trend line. Participants with a more precise SO-spindle coordination show improved task proficiency after sleep. (B) Cluster-corrected correlation of coupling strength and overnight task proficiency change for adults. Same conventions as in (A). Similar trend of higher coupling strength predicting better task proficiency after sleep. (C) Cluster-corrected correlation of coupling strength and overnight learning curve change for adolescents. Same conventions as in (A). Higher coupling strength related to a flatter learning curve after sleep. (D) Cluster-corrected correlation of coupling strength and overnight learning curve change for adults. Same conventions as in (A). Higher coupling strength related to a flatter learning curve after sleep. (E) Spindle peak locked epoch (NREM3, co-occurrence corrected) grand averages (mean ± SEM) for adolescents (red) and adults (black). Inset depicts the corresponding SO-filtered (2 Hz lowpass) signal. Black lines indicate significant clusters. Note, we found no difference in amplitude after normalization. Significant differences are due to more precise SO-spindle coupling in adults. Spindle frequency is blurred due to individualized spindle detection. (F) Signal-to-noise ratio for the stationary EEG amplifier (green) during quiet wakefulness and for the ambulatory EEG amplifier (purple) during juggling training. Grey shaded area denotes cluster-corrected p < 0.05. Note that signal-to-noise ratio converges in the higher frequency ranges.

      We have now added Figure R3E as Figure 3B to the revised version of the manuscript to demonstrate that there were no systematic differences between the two age groups in the analytical signal due to the expected age related power differences or EEG-systems. Specifically, we now state in the results section (page 13 – 14, lines 282 – 294):

      "We assessed the cross frequency coupling based on z-normalized spindle epochs (Figure 3B) to alleviate potential power differences due to age (Figure 3 – figure supplement 1A) or different EEG-amplifier systems that could potentially confound our analyses (Aru et al, 2015). Importantly, we found no amplitude differences around the spindle peak (point of SO-phase readout) between adolescents and adults using cluster-based random permutation testing (Figure 3B), indicating an unbiased analytical signal. This was also the case for the SO-filtered (< 2 Hz) signal (Figure 3B, inset). Critically, the significant differences in amplitude from -1.4 to -0.8 s (p = 0.023, d = -0.73) and 0.4 to 1.5 s (p < 0.001, d = 1.1) are not caused by age related differences in power or different EEG-systems but instead by the increased coupling strength (i.e. higher coupling precision of spindles to SOs) in adults giving rise to a more pronounced SO-wave shape when averaging across spindle peak locked epochs."

      Further, we added the correlational analyses that we computed separately for the age groups (Figure R3A-D) to the revised manuscript (Figure 3 – figure supplement 2CD) as they further substantiate our claims about the relationship between SO-spindle coupling and gross-motor learning.

      We now refer to these analyses in the results section (page 16, lines 338 – 343):

      "Critically, when computing the correlational analyses separately for adolescents and adults, we identified highly similar effects at electrode C4 for task proficiency (Figure 3 – figure supplement 2C) and learning curve (Figure 3 – figure supplement 2D) in each group. These complementary results demonstrate that coupling strength predicts gross-motor learning dynamics in both, adolescents as well as adults, and further show that this effect is not solely driven by one group."

      2) The authors might want to explicitly show that the reported correlations (with regards to both learning curve and task proficiency change) are not driven by any outliers.

      We thank the reviewers for their suggestion. We agree that when inspecting the scatter plots it looks as though the correlations could be severely influenced by two outliers in the adult group. Because this is an important matter, we recalculated all previously reported correlations without the two outliers (Figure R4, left column) and followed the reviewer’s suggestion to also compute robust regression (Figure R4, right column) and found no substantial deviation from our original results.

      In more detail, increase in task proficiency resulted in flattening of the learning curve when removing outliers (Figure R4A, rhos = -0.70, p < 0.001) and when applying robust regression analysis (Figure R4B, b = -0.30, t(67) = -10.89, rho = -0.80, p < 0.001). Likewise, higher coupling strength still predicted better task proficiency (mean rho = 0.35, p = 0.029, cluster-corrected) and flatter learning curves after sleep (rho = -0.44, p = 0.047, cluster-corrected) when removing the outliers (Figure R4CE) and when calculating robust regression (Figure R4DF, task proficiency: b = 82.32, t(40) = 3.12, rho = 0.45, p = 0.003; learning curve: b = -26.84, t(40) = -2.96, rho = -0.43, p = 0.005). Furthermore, we calculated spearman rank correlations and cluster-corrected spearman rank correlations in our original manuscript, to mitigate the impact of outliers, even though Pearson correlations are more widely used in the field. Therefore, we still report spearman rank correlations for single electrodes instead of robust correlations as it is more consistent with the cluster-correlation analyses.

      We now use robust trend lines instead of linear trend lines in our scatter plots. Further, we added the correlations without outliers (Figure R4ACE) to the supplements as Figure 2 – figure supplement 1D and Figure 3 – figure supplement 2 FG. These additional analyses are now reported in the results section of the revised manuscript (page 9, lines 186 – 191):

      "[…] we confirmed a strong negative correlation between the change (post retention values – pre retention values) in task proficiency and the change in learning curve after the retention interval (Figure 2F; rhos = -0.71, p < 0.001), which also remained strong after outlier removal (Figure 2 – figure supplement 1D). This result indicates that participants who consolidate their juggling performance after a retention interval show slower gains in performance."

      And (page 16, lines 343 – 346):

      "[…] Furthermore, our results remained consistent when including coupled spindle events in NREM2 (Figure 3 – figure supplement 2E) and after outlier removal (Figure 3 – figure supplement 2FG)."

      Furthermore, we now state that we specifically utilized spearman rank correlations to mitigate the impact of outliers in our analyses in the method section (page 35, lines 808 – 813):

      "For correlational analyses we utilized spearman rank correlations (rhos; Figure 2F & Figure 3DE) to mitigate the impact of possible outliers as well as cluster-corrected spearman rank correlations by transforming the correlation coefficients to t-values (p < 0.05) and clustering in the space domain (Figure 3DE). Linear trend lines were calculated using robust regression."
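
For readers unfamiliar with the transformation mentioned in this passage, the sketch below shows the standard conversion of a correlation coefficient to a t-value with n - 2 degrees of freedom. The passage does not spell out the exact formula, so treating this textbook conversion as the one applied is an assumption, and the function name `corr_to_t` is hypothetical.

```python
import math


def corr_to_t(r, n):
    """Convert a correlation coefficient r from n observations to a t-value.

    Uses t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.
    """
    if n < 3:
        raise ValueError("need at least three observations")
    if abs(r) >= 1:
        raise ValueError("t is undefined for |r| >= 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Toy example (hypothetical values): r = 0.5 from 27 observations.
t_value = corr_to_t(0.5, 27)
```

Electrodes whose t-values exceed the threshold corresponding to p < 0.05 would then be grouped with their spatial neighbours to form candidate clusters.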

      Figure R4:

      (A) Spearman rank correlation between task proficiency change and learning curve change collapsed across adolescents (red dots) and adults (black diamonds) after removing two outlier subjects in the adult age group. Grey-shaded area indicates 95% confidence intervals of the robust trend line. (B) Robust regression of task proficiency change and learning curve change of the original sample. (C) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) after outlier removal (left, spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. (D) Robust regression of coupling strength at C4 and task proficiency of the original sample. (E) Same conventions as in (C) but for overnight learning curve change. (F) Same conventions as in (D) but for overnight learning curve change.

      3) The sleep data of all participants (thus from both sleep first and wake first) were used to determine the features of SO-spindle coupling in adolescents and adults. Were there any differences between groups (sleep first vs. wake first)? This might be interesting in general but especially because only data of the sleep first group entered the subsequent correlational analyses.

      We thank the reviewers for their remark. We agree that adding additional information about possible differences between the sleep first and wake first groups would allow for a more comprehensive assessment of the reported data. We did not explain our reasoning to include only the sleep first groups for the correlation analyses clearly enough in the original manuscript. Unfortunately, we can only report data for the adolescents in our sample, because we did not record polysomnography (PSG) for the adult wake first group. This is also one of the two reasons why we focused on the sleep first groups for our correlational analyses.

      Adolescents in the sleep first group did not differ from adolescents in the wake first group in terms of sleep architecture (except REM (%), which did not correlate with behavior [task proficiency: rho = -0.17, p = 0.28; learning curve: -0.02, p = 0.90]) as well as SO and sleep spindle event descriptive measures (see Table R2). Importantly, we found no differences in coupling strength between the two groups (Figure R2A).

      Table R2. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents in the sleep first and wake first group (mean ± standard deviation). Independent t-tests were used for comparisons

      The second reason why we focused our analyses on sleep first was that adolescents in the wake first group had higher task proficiency after the sleep retention interval than the sleep first group (Figure R2B; t(23) = -2.24, p = 0.034). This difference in performance is directly explained by the additional juggling test that the wake first group performed at the time point of their learning night, which should be considered as additional training. Therefore, we excluded the wake first group from our correlational analyses because sleep and wake first group are not comparable in terms of juggling training during the night when we assessed SO-spindle coupling strength.

      Figure R2

      (A) Comparison of SO-spindle coupling strength in the adolescent sleep first (blue) and wake first (green) group using cluster-based random permutation testing (Monte-Carlo method, cluster alpha 0.05, max size criterion, 1000 iterations, critical alpha level 0.05, two-sided). Left: exemplary depiction of coupling strength at electrode C4 (mean ± SEM). Right: z-transformed t-values plotted for all electrodes obtained from the cluster test. No significant clusters emerged. (B) Comparison of task proficiency between sleep first and wake first group after the sleep retention interval (mean ± SEM). Adolescents in the wake first group had higher task proficiency given the additional juggling performance test, which also reflects additional training.

      These additional analyses (Figure R2) and the summary statistics of sleep architecture and SO/spindle event descriptives of adolescents in the sleep first and wake first group (Table R2), are now reported in the revised version of the manuscript as Figure 3 – figure supplement 2AB and Supplementary file – table 7. We now explicitly explain our rationale of why we only considered participants in the sleep first group for our correlational analyses in the results section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)"

      And (page 15, lines 311 – 320):

      "[…] Furthermore, given that we only recorded polysomnography for the adults in the sleep first group and that adolescents in the wake first group showed enhanced task proficiency at the time point of the sleep retention interval due to additional training (Figure 3 – figure supplement 2A), we only considered adolescents and adults of the sleep-first group to ensure a similar level of juggling experience (for summary statistics of sleep architecture and SO and spindle events of subjects that entered the correlational analyses see Supplementary file – table 6). Notably, we found no differences in electrophysiological parameters (i.e. coupling strength, event detection) between the adolescents of the wake first and sleep first group (Figure 3 – figure supplement 2B & Supplementary file – table 7)."

      4) To allow a more comprehensive assessment of the underlying data, information with regard to general sleep descriptives (minutes, per cent of time spent in different sleep stages, overall sleep time etc.) as well as related to SOs, spindles and coupled events (e.g. number, density etc.) would be needed.

      We agree with the reviewers that additional information about sleep architecture and SO as well as sleep spindle characteristics is needed for a more comprehensive assessment of our data. We now added summary tables for sleep architecture and SO/spindle event descriptive measures for the whole sample (Table R4) and for the sleep first groups that we used for our correlational analyses (Table R5) to the supplementary material in the updated manuscript. It is important to note that, due to the longer sleep opportunity of adolescents that we provided to accommodate the overall higher sleep need in younger participants, adolescents and adults differed in most general sleep architecture markers and SO as well as sleep spindle descriptive measures. In addition, changes in sleep architecture are prominent during the maturational phase from adolescence to adulthood, which might introduce additional variance between the two age groups.

      Table R4. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents and adults across the whole sample (mean ± standard deviation) in the learning night. Independent t-tests were used for comparisons

      Table R5. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents and adults in the sleep first group (mean ± standard deviation) in the learning night. Independent t-tests were used for comparisons

      In order to ensure that our correlational analyses are not driven by these systematic differences between the two age groups, we used cluster-corrected partial correlations to control for sleep architecture markers (Figure R7) and SO/spindle descriptive measurements (Figure R8A). Critically, none of these possible confounders changed the pattern of our initial correlational analyses of coupling strength and task proficiency/learning curve. Additionally, we also controlled for differences in spindle event number by using a bootstrapped resampling approach. We randomly drew 200 spindle events in 100 iterations and subsequently recalculated the coupling strength for each subject. We found that resampled values and our original observation of coupling strength are almost perfectly correlated, indicating that differences in event number are unlikely to have an impact on coupling strength as long as there are at least 200 events (Figure R8B). Combined, these analyses demonstrate that our correlations between coupling strength and behavior are not influenced by the reported differences in sleep architecture and SO/spindle descriptive measures.
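
The resampling control can be illustrated with the following sketch, which repeatedly draws 200 events per subject and recomputes coupling strength. Several details are assumptions made for illustration only: coupling strength is taken to be the mean resultant vector length of the SO phases at the spindle peaks, events are drawn with replacement, the resampled value is the average across iterations, and the function names are hypothetical.

```python
import cmath
import random
from statistics import fmean


def coupling_strength(phases):
    """Mean resultant vector length of phases in radians (0 = none, 1 = perfect)."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def resampled_coupling(phases, n_events=200, n_iter=100, seed=0):
    """Average coupling strength over n_iter random draws of n_events events."""
    if len(phases) < n_events:
        raise ValueError("subject has fewer events than requested per draw")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    draws = (coupling_strength(rng.choices(phases, k=n_events))
             for _ in range(n_iter))
    return fmean(draws)


# Toy subject (hypothetical): 250 spindle events locked to one SO phase.
toy_phases = [0.3] * 250
toy_resampled = resampled_coupling(toy_phases)
```

Correlating the resampled values with the original per-subject coupling strength across subjects then indicates whether differences in event number bias the estimate.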

      Figure R7

      Summary of cluster-corrected partial correlations of coupling strength with task proficiency (left) and learning curve (right) controlling for possible confounding factors. Asterisks indicate location of the detected cluster. The pattern of initial results remained highly stable.

      Figure R8

      (A) Summary of cluster-corrected partial correlations of coupling strength with task proficiency (left) and learning curve (right) controlling for SO/spindle descriptive measures at critical electrode C4. Asterisks indicate location of the detected cluster. The pattern of initial results remained highly stable. (B) Spearman correlation between resampled coupling strength (N = 200, 100 iterations) and original observation of coupling strength for adolescents (red circles) and adults (black diamonds), indicating that coupling strength is not influenced by spindle event number if at least 200 events are present. Grey-shaded area indicates 95% confidence intervals of the robust trend line.

      We now provide general sleep descriptives (Table R4 & R5) in the revised version of the manuscript as Supplementary file – table 2 & table 6. These data are referred to in the results section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)."

      And (page 15, lines 311 – 318):

      "Furthermore, given that we only recorded polysomnography for the adults in the sleep first group and that adolescents in the wake first group showed enhanced task proficiency at the time point of the sleep retention interval due to additional training (Figure 3 – figure supplement 2A), we only considered adolescents and adults of the sleep-first group to ensure a similar level of juggling experience (for summary statistics of sleep architecture and SO and spindle events of subjects that entered the correlational analyses see Supplementary file – table 6)."

      The additional control analyses (Figure R7 & R8) are also now added to the revised manuscript as Figure 3 – figure supplement 3 & 4 in the results section (page 16, lines 356 – 360):

      "For a summary of the reported cluster-corrected partial correlations as well as analyses controlling for differences in sleep architecture see Figure 3 – figure supplement 3. Further, we also confirmed that our correlations are not influenced by individual differences in SO and spindle event parameters (Figure 3 – figure supplement 4)."

      5) The authors used partial correlations to rule out that age drove the relationship between coupling strength, learning curve and task proficiency. It seems like this analysis was done specifically for electrode C4, after having already established that coupling strength at electrode C4 correlates in general with changes in the learning curve and task proficiency. I think the claim that results were not driven by age as confounding factor would be stronger if the authors used a cluster-corrected partial correlation in the first place (just as in the main analysis).

      The reviewers are correct that initially we only conducted the partial correlation for electrode C4. Following the reviewers' suggestion, we now additionally computed cluster-corrected partial correlations similar to our main analysis. Like in our original analyses, we found a significant positive central cluster (Figure R6A, mean rho = 0.40, p = 0.017) showing that higher coupling strength related to better task proficiency after sleep and a negative cluster-corrected correlation at C4 showing that higher coupling strength was related to flatter learning curves after sleep (Figure R6B, rho = -0.47, p = 0.049) also when controlling for age.

      Figure R6

      (A) Cluster-corrected partial correlation of individual coupling strength in the learning night and overnight change in task proficiency (post – pre retention) collapsed across adolescents and adults, controlling for age. Asterisks indicate cluster-corrected two-sided p < 0.05. A similar significant cluster to the original analysis (Figure 4A) emerged comprising electrodes Cz and C4. (B) Same conventions as in A. Like in the original analysis (Figure 4B) a negative correlation between coupling strength at C4 and learning curve change survived cluster-corrected partial correlations when controlling for age.

      We now always report cluster-corrected partial correlations when controlling for possible confounding variables in the updated version of the manuscript (also see answer to issue #7). A summary of all computed partial correlations including Figure R6 can now be found as Figure 3 – figure supplement 3 & 4 in the revised manuscript.

      Specifically we now state in the results section (page 16 – 17, lines 347 – 360):

      "To rule out age as a confounding factor that could drive the relationship between coupling strength, learning curve and task proficiency in the mixed sample, we used cluster-corrected partial correlations to confirm their independence of age differences (task proficiency: mean rho = 0.40, p = 0.017; learning curve: rhos = -0.47, p = 0.049). Additionally, given that we found that juggling performance could underlie a circadian modulation we controlled for individual differences in alertness between subjects due to having just slept. We partialed out the mean PVT reaction time before the juggling performance test after sleep from the original analyses and found that our results remained stable (task proficiency: mean rho = 0.37, p = 0.025; learning curve: rhos = -0.49, p = 0.040). For a summary of the reported cluster-corrected partial correlations as well as analyses controlling for differences in sleep architecture see Figure 3 – figure supplement 3. Further, we also confirmed that our correlations are not influenced by individual differences in SO and spindle event parameters (Figure 3 – figure supplement 4)."

      And in the methods section (page 35, lines 813 – 814):

      "To control for possible confounding factors we computed cluster-corrected partial rank correlations (Figure 3 – figure supplement 3 and 4)."

      References

      Aru, J., Aru, J., Priesemann, V., Wibral, M., Lana, L., Pipa, G., Singer, W. & Vicente, R. (2015) Untangling cross-frequency coupling in neuroscience. Curr Opin Neurobiol, 31, 51-61.

      Bothe, K., Hirschauer, F., Wiesinger, H. P., Edfelder, J., Gruber, G., Birklbauer, J. & Hoedlmoser, K. (2019) The impact of sleep on complex gross-motor adaptation in adolescents. Journal of Sleep Research, 28(4).

      Bothe, K., Hirschauer, F., Wiesinger, H. P., Edfelder, J. M., Gruber, G., Hoedlmoser, K. & Birklbauer, J. (2020) Gross motor adaptation benefits from sleep after training. J Sleep Res, 29(5), e12961.

      Campbell, I. G. & Feinberg, I. (2016) Maturational Patterns of Sigma Frequency Power Across Childhood and Adolescence: A Longitudinal Study. Sleep, 39(1), 193-201.

      Dayan, E. & Cohen, L. G. (2011) Neuroplasticity subserving motor skill learning. Neuron, 72(3), 443-54.

      De Gennaro, L. & Ferrara, M. (2003) Sleep spindles: an overview. Sleep Med Rev, 7(5), 423-40.

      De Gennaro, L., Ferrara, M., Vecchio, F., Curcio, G. & Bertini, M. (2005) An electroencephalographic fingerprint of human sleep. Neuroimage, 26(1), 114-22.

      Dinges, D. F., Pack, F., Williams, K., Gillen, K. A., Powell, J. W., Ott, G. E., Aptowicz, C. & Pack, A. I. (1997) Cumulative sleepiness, mood disturbance, and psychomotor vigilance performance decrements during a week of sleep restricted to 4-5 hours per night. Sleep, 20(4), 267-77.

      Dinges, D. F. & Powell, J. W. (1985) Microcomputer Analyses of Performance on a Portable, Simple Visual Rt Task during Sustained Operations. Behavior Research Methods Instruments & Computers, 17(6), 652-655.

      Eichenlaub, J. B., Biswal, S., Peled, N., Rivilis, N., Golby, A. J., Lee, J. W., Westover, M. B., Halgren, E. & Cash, S. S. (2020) Reactivation of Motor-Related Gamma Activity in Human NREM Sleep. Front Neurosci, 14, 449.

      Feinberg, I. & Campbell, I. G. (2013) Longitudinal sleep EEG trajectories indicate complex patterns of adolescent brain maturation. American Journal of Physiology - Regulatory, Integrative and Comparative Physiology, 304(4), R296-303.

      Hahn, M., Heib, D., Schabus, M., Hoedlmoser, K. & Helfrich, R. F. (2020) Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. Elife, 9.

      Helfrich, R. F., Lendner, J. D. & Knight, R. T. (2021) Aperiodic sleep networks promote memory consolidation. Trends Cogn Sci.

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J. & Knight, R. T. (2019) Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572.

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T. & Walker, M. P. (2018) Old Brains Come Uncoupled in Sleep: Slow Wave-Spindle Synchrony, Brain Atrophy, and Forgetting. Neuron, 97(1), 221-230 e4.

      Killgore, W. D. (2010) Effects of sleep deprivation on cognition. Prog Brain Res, 185, 105-29.

      Kurth, S., Jenni, O. G., Riedner, B. A., Tononi, G., Carskadon, M. A. & Huber, R. (2010) Characteristics of sleep slow waves in children and adolescents. Sleep, 33(4), 475-80.

      Maris, E. & Oostenveld, R. (2007) Nonparametric statistical testing of EEG- and MEG-data. J Neurosci Methods, 164(1), 177-90.

      Muehlroth, B. E., Sander, M. C., Fandakova, Y., Grandy, T. H., Rasch, B., Shing, Y. L. & Werkle-Bergner, M. (2019) Precise Slow Oscillation-Spindle Coupling Promotes Memory Consolidation in Younger and Older Adults. Sci Rep, 9(1), 1940.

      Muehlroth, B. E. & Werkle-Bergner, M. (2020) Understanding the interplay of sleep and aging: Methodological challenges. Psychophysiology, 57(3), e13523.

      Niethard, N., Ngo, H. V. V., Ehrlich, I. & Born, J. (2018) Cortical circuit activity underlying sleep slow oscillations and spindles. Proceedings of the National Academy of Sciences of the United States of America, 115(39), E9220-E9229.

      Purcell, S. M., Manoach, D. S., Demanuele, C., Cade, B. E., Mariani, S., Cox, R., Panagiotaropoulou, G., Saxena, R., Pan, J. Q., Smoller, J. W., Redline, S. & Stickgold, R. (2017) Characterizing sleep spindles in 11,630 individuals from the National Sleep Research Resource. Nature Communications, 8, 15930.

      Van Dongen, H. P., Maislin, G., Mullington, J. M. & Dinges, D. F. (2003) The cumulative cost of additional wakefulness: dose-response effects on neurobehavioral functions and sleep physiology from chronic sleep restriction and total sleep deprivation. Sleep, 26(2), 117-26.

      Wilhelm, I., Metzkow-Meszaros, M., Knapp, S. & Born, J. (2012) Sleep-dependent consolidation of procedural motor memories in children and adults: the pre-sleep level of performance matters. Developmental Science, 15(4), 506-15.

      Winer, J. R., Mander, B. A., Helfrich, R. F., Maass, A., Harrison, T. M., Baker, S. L., Knight, R. T., Jagust, W. J. & Walker, M. P. (2019) Sleep as a potential biomarker of tau and beta-amyloid burden in the human brain. J Neurosci.

    1. Author Response

      Reviewer #1 (Public Review):

      This study explores the mechanisms responsible for reduced steroidogenesis of adrenocortical cells in a mouse model of systemic inflammation induced by LPS administration. Working from RNA and protein profiling data sets in adrenocortical tissue from LPS-treated mice they report that LPS perturbs the TCA cycle at the level of succinate dehydrogenase B (SDHB) impairing oxidative phosphorylation. Additional studies indicate these events are coupled to increased IL-1β levels which inhibit SDHB expression through DNA methyltransferase-dependent DNA methylation of the SDHB promoter.

      In general, these are interesting studies with some novel implications. I do, however, have concerns with some of the author's rather broad conclusions given the limitations of their experimental approach. The paper could be improved by addressing the following points:

      1) The limitations of using LPS as the model for systemic inflammation need to be explicitly described.

      We thank the Reviewer for this suggestion. Indeed, the LPS model has several limitations as a preclinical model of sepsis, which are outlined in the revised Discussion. Despite these limitations, we chose this model over other models of sepsis, such as the cecal slurry model, because of its high reproducibility, which enabled the mechanistic studies presented here.

      2) The initial in vivo findings, which support the proposed metabolic perturbation, are based on descriptive profiling data obtained at one time point following a single dose of LPS. The author's conclusion that the ultimate transcriptional pathway identified hinges critically on knowledge of the time course of this effect following LPS, which is not adequately addressed in the paper. How was this time and dose of LPS established and are there data from different dose and time points?

      We thank the Reviewer for raising this question, which we indeed addressed at the beginning of our studies in order to determine a suitable time point and dose of LPS treatment. We chose 6 h as a suitable starting time point to perform transcriptional analyses, based on the fact that LPS triggers transcriptional changes in the adrenal gland and other tissues within a few hours (1-3). Confirming our expectations, we found 2,609 differentially expressed genes (Figure 1a) in the adrenal cortex of LPS-treated mice, many of which were involved in cellular metabolism (Figure 1d,e, 2a-e, Table 1, Table 2). Acute transcriptional changes, which are more likely to reflect direct effects of inflammatory signals than changes occurring at later time points (for instance in the range of days), would allow us to mechanistically investigate the effects of inflammation in the adrenal gland, which was the purpose of our studies. Hence, we were guided by the transcriptional changes observed at 6 h of LPS treatment and established the hypothesis that disruption of the TCA cycle in adrenocortical cells is key to the impact of inflammation on adrenal function. Along this line, we analyzed the metabolomic profile of the adrenal gland at 6 and 24 h of LPS treatment. At 6 h, succinate levels as well as the succinate / fumarate ratio remained unchanged (Author response image 1A), while at 24 h post-injection these were increased by LPS (Author response image 1B, Figure 2l,o,q). The delay between the downregulation of Sdhb mRNA expression (at 6 h) and the increase in succinate levels (observed at 24 h) can be explained by the time required for SDHB protein levels to fall, which depends on the protein turnover, suggested to be approximately 12 h in HeLa cells (4). Based on these findings, all further metabolomic analyses were performed at 24 h of LPS treatment.

      Author response image 1. LPS increases the succinate/fumarate ratio at 24 but not 6h. Mice were i.p. injected with 1 mg/kg LPS and 6 h (A) and 24 h (B) post-injection succinate and fumarate levels were determined by LC-MS/MS in the adrenal gland. n=8-10; data are presented as mean ± s.e.m. Statistical analysis was done with two-tailed Mann-Whitney test. *p < 0.05.

      Having established the most suitable time points of LPS treatment to observe induced transcriptional and metabolic changes, we set out to define the LPS dose to be used in subsequent experiments. The data shown in Author response image 1 were acquired after treatment with 1 mg/kg LPS. This is a dose that was previously reported to cause transcriptional re-profiling of the adrenal gland (1, 2). However, 5 mg/kg LPS, similarly to 1 mg/kg LPS, also reduced Sdhb, Idh1 and Idh2 expression at 4 h (Author response image 2A) and increased succinate and isocitrate levels at 24 h (Author response image 2B) in the adrenal gland. Given that the effects of 1 and 5 mg/kg LPS were similar, for animal welfare reasons we continued our studies with the lower dose.

      Author response image 2. Five mg/kg LPS downregulate Sdhb, Idh1 and Idh2 expression and increase succinate and isocitrate levels in the adrenal gland of mice. Sdhb, Idh1 and Idh2 expression (A) and succinate and isocitrate levels (B) were assessed in the adrenal gland of mice treated with 5 mg/kg LPS for 4 h (A) and 24 h (B). n=5; data are presented as mean ± s.d. Statistical analysis was done with two-tailed Mann-Whitney test. *p < 0.05, **p < 0.01.

      3) Related to the point above, the authors' data supporting a break in the TCA cycle would be strengthened by direct biochemical assessment (metabolic flux analysis) of the step in the TCA cycle process impacted.

      We entirely agree with the Reviewer and considered performing TCA cycle metabolic flux analyses in adrenocortical cells. Unfortunately, the low yield of adrenocortical cells per mouse (approx. 3,000-6,000) does not allow the performance of metabolic flux experiments, which require higher cell numbers per sample, several time points per condition and an adequate number of replicates per experiment. Moreover, NCI-H295R cells, being adrenocortical carcinoma cells, are expected to have substantially altered metabolic fluxes compared to normal cells. Since we wouldn’t have the capacity to confirm findings from metabolic flux experiments in NCI-H295R cells in primary adrenocortical cells, as we did for the rest of the experiments, we decided not to perform metabolic flux experiments in NCI-H295R cells. However, performing metabolic flux analyses in adrenocortical cells under inflammatory or other stress conditions remains an important future task that we will pursue upon establishment of a more suitable cell culture system.

      4) The proposed connection of DNMT and IL1 signaling to systemic inflammation and reduced steroidogenesis could be more firmly established by additional studies in adrenal cortical cells lacking these genes.

      We thank the Reviewer for this excellent suggestion. In the revised manuscript we strengthened the evidence for an IL-1β–DNMT1 link and show that DNMT1 deficiency blocks the effects of IL-1β on SDHB promoter methylation (Figure 6k), the succinate / fumarate ratio (Figure 6m), the oxygen consumption rate (Figure 6n) and steroidogenesis (Figure 6o-q) in adrenocortical cells. In order to validate the role of IL-1β in vivo, mice were simultaneously treated with LPS and Raleukin, an IL-1R antagonist. Treatment with Raleukin increased the SDH activity (Figure 6r), reduced succinate levels and the succinate / fumarate ratio (Figure 6s,t) and increased corticosterone production in LPS-treated mice (Figure 6u).

      Reviewer #2 (Public Review):

      The present manuscript provides a mechanistic explanation for an event in adrenal endocrinology: the resistance which develops during excessive inflammation relative to acute inflammation. The authors identify disturbances in adrenal mitochondrial function that differentiate excessive inflammation. During severe inflammation the TCA cycle in the adrenal is disrupted at the level of succinate production, producing an accumulation of succinate in the adrenal cortex. The authors also provide a mechanistic explanation for the accumulation of succinate: they demonstrate that IL1b decreases expression of SDH, the enzyme that degrades succinate, through a methylation event in the SDH promoter. This work presents a solid explanation for an important phenomenon. Below are a few questions that should be resolved experimentally.

      1) The authors should confirm through direct biochemical assays of enzymatic activity that steroidogenesis enzyme activity is not impaired. Many of these enzymes are located in the mitochondria and their activity may be diminished due to the disturbed, high succinate environment of the cortical cell as opposed to the low ATP production.

      We thank the Reviewer for this question. The activity of the first and rate-limiting steroidogenic enzyme, cytochrome P450-side-chain-cleavage (SCC, CYP11A1) which generates pregnenolone from cholesterol, was recently shown to require intact SDH function (5). In agreement with this report we show that production of progesterone, the direct derivative of pregnenolone, is impaired upon SDH inhibition (Figure 5b,e,h). In addition, we assessed the activity of CYP11B1 (steroid 11β-hydroxylase), the enzyme catalyzing the conversion of 11-deoxycorticosterone to corticosterone, i.e. the last step of glucocorticoid synthesis, by determining the corticosterone and 11-deoxycorticosterone levels by LC-MS/MS and calculating the ratio of corticosterone to 11-deoxycorticosterone in ACTH-stimulated adrenocortical cells and explants. The corticosterone / 11-deoxycorticosterone ratio was not affected by Sdhb silencing in adrenocortical cells (Figure 5- Supplement 2g) nor did it change upon LPS treatment in adrenal explants (Figure 5- Supplement 2h), suggesting that CYP11B1 activity may not be altered upon SDH blockage. Hence, we propose that upon inflammation impairment of SDH function may disrupt at least the first steps of steroidogenesis (producing pregnenolone/progesterone), thereby diminishing production of all downstream adrenocortical steroids. This is now discussed in the revised manuscript.

      2) What is the effect of high ROS production? Is steroidogenesis resolved if ROS is pharmacologically decreased even if the reduction of ATP is not resolved?

      We thank the Reviewer for this suggestion, which helped us to broaden our findings. Indeed, ROS scavenging by the vitamin E analog Trolox (Figure 5n) partially reversed the inhibitory effect of DMM on steroidogenesis (Figure 5o,p), suggesting that impairment of SDH function impacts steroidogenesis also via enhanced ROS production (Figure 4g).

      3) Does increased intracellular succinate (through cell permeable succinate treatment) inhibit steroidogenesis even if there is not a blockage of OXPHOS?

      We suggest that SDH inhibition and succinate accumulation lead to reduced steroidogenesis due to impaired oxidative phosphorylation (Figure 4c,e, 5i), reduced ATP synthesis (Figure 4d, 5j-m) and increased ROS production (Figure 4g, 5o,p). Since SDH is part (complex II) of the electron transport chain, it cannot be decoupled from oxidative phosphorylation, which limits the experimental means for addressing this question.

      4) It should be demonstrated that the genetic loss of IL1 signaling in adrenal cortical cells results in a loss of the effect of LPS on reduced steroidogenesis and increased succinate accumulation.

      We thank the Reviewer for this suggestion. Development of a mouse line with genetic loss of Il-1r in adrenocortical cells was not feasible within the short revision period. Instead, mice under LPS treatment were treated with the IL-1R antagonist, Raleukin, to study the in vivo effects of IL-1β in the adrenal gland. IL-1R antagonism increased SDH activity in the adrenal cortex (Figure 6r), decreased succinate levels and the succinate/fumarate ratio in the adrenal gland (Figure 6s,t) and enhanced corticosterone production (Figure 6u) in LPS-treated mice, supporting our hypothesis that IL-1β mediates the effects of systemic inflammation in the adrenal cortex.

      5) It should be demonstrated that the genetic loss of IL1 signaling in adrenal cortical cells results in a loss of the effect of LPS on SDH activity and ATP production and SDH promoter methylation.

      As outlined above, Raleukin treatment increased SDH activity in the adrenal cortex (Figure 6r) and decreased succinate levels and the succinate/fumarate ratio in the adrenal gland (Figure 6s,t) of mice treated with LPS. Furthermore, IL-1β reduced the ATP/ADP ratio (Figure 6e) and enhanced SDHB promoter methylation in NCI-H295R cells (Figure 6k).

      6) It should be shown that the silencing of DNMT eliminates or diminishes the effect of LPS on reduced steroidogenesis and increased succinate accumulation.

      We thank the Reviewer for this suggestion, which prompted us to strengthen the evidence for the implication of DNMT1 in the effects of LPS on adrenocortical cell metabolism and function. As mentioned above, development of a new mouse line, in this case bearing genetic loss of DNMT1 in adrenocortical cells, was not feasible within the short revision period. Therefore, we assessed the role of DNMT1 by silencing it via siRNA transfections in primary adrenocortical cells and NCI-H295R cells. We show that DNMT1 silencing inhibits the effect of IL-1β on SDHB promoter methylation (Figure 6k), restores Sdhb expression (Figure 6l) and reduces the succinate/fumarate ratio in IL-1β treated adrenocortical cells (Figure 6m). Accordingly, DNMT1 silencing restores ACTH-induced production of corticosterone, 11-deoxycorticosterone and progesterone in IL-1β treated adrenocortical cells (Figure 6o-q). We chose to stimulate adrenocortical cells with IL-1β instead of LPS because in vitro the effects of IL-1β were more robust than those of LPS (possibly due to a reduction of TLR4 expression or function in cultured adrenocortical cells) and in order to show the link between IL-1β and DNMT1.

      7) Does silencing of DNMT reduce OXPHOS in adrenal cortical cells?

      We measured the oxygen consumption rate in NCI-H295R cells, which were transfected with siRNA against DNMT1 and treated or not with IL-1β. IL-1β reduced the OCR in cells transfected with control siRNA, while DNMT1 silencing blunted the effect of IL-1β (Figure 6n).

      8) The effects of LPS on reduced adrenal steroidogenesis are not elaborated at the physiological level. The manuscript should demonstrate the ramifications of the adrenal function decreasing after LPS. Does CORT release become less pronounced after subsequent challenges? Does baseline CORT decrease at some point? No physiological consequences are shown. Similarly, these physiological consequences of decreased adrenal function should be dependent on decreased SDH activity and OXPHOS in adrenal cells and this should be demonstrated experimentally.

      We thank the Reviewer for raising this excellent question. Inflammation is a potent inducer of the Hypothalamus-Pituitary-Adrenal gland (HPA) axis, causing increased glucocorticoid production, a stress response leading to vital immune and metabolic adaptations. Accordingly, LPS treatment rapidly increases glucocorticoid production in mice (1, 6, 7). Reduced adrenal gland responsiveness to ACTH associates with decreased survival of septic mice (8). These preclinical findings stand in accordance with observations in septic patients, in which impairment of adrenal function correlates with high risk for death (9). Along this line, ACTH test was suggested to have prognostic value for identification of septic patients with high mortality risk (9, 10).

      In order to confirm impairment of the adrenal gland function in septic mice, animals were subjected to sepsis via administration of a high LPS dose (10 mg / kg) and treated with ACTH 24 h later. Indeed, the ACTH-induced increase in corticosterone levels was diminished in LPS-treated mice (Author response image 3). This finding was further confirmed in adrenal explants, in which LPS pre-treatment also blunted ACTH-stimulated corticosterone production (Figure 5s).

      Author response image 3. High LPS dose blunts the ACTH response in mice. C57BL/6J mice were i.p. injected with 10 mg/kg LPS or PBS and 24 h later they were i.p. injected with 1 mg/kg ACTH. One hour after ACTH administration blood was retroorbitally collected and corticosterone plasma levels were determined by LC-MS/MS. n=4-5; data are presented as mean ± s.d. Statistical analysis was done with two-tailed Mann-Whitney test. *p < 0.05.

      Given that the purpose of our studies was to dissect the mechanisms underlying adrenal gland dysfunction in inflammation rather than to analyze the physiological consequences thereof, we chose not to follow these lines of investigation and to concentrate on the role of cell metabolism in adrenocortical cells in the context of inflammation.

      References

      1. W. Kanczkowski, A. Chatzigeorgiou, M. Samus, N. Tran, K. Zacharowski, T. Chavakis, S. R. Bornstein, Characterization of the LPS-induced inflammation of the adrenal gland in mice. Mol Cell Endocrinol 371, 228-235 (2013).
      2. L. S. Chen, S. P. Singh, M. Schuster, T. Grinenko, S. R. Bornstein, W. Kanczkowski, RNA-seq analysis of LPS-induced transcriptional changes and its possible implications for the adrenal gland dysregulation during sepsis. J Steroid Biochem Mol Biol 191, 105360 (2019).
      3. V. I. Alexaki, G. Fodelianaki, A. Neuwirth, C. Mund, A. Kourgiantaki, E. Ieronimaki, K. Lyroni, M. Troullinaki, C. Fujii, W. Kanczkowski, A. Ziogas, M. Peitzsch, S. Grossklaus, B. Sonnichsen, A. Gravanis, S. R. Bornstein, I. Charalampopoulos, C. Tsatsanis, T. Chavakis, DHEA inhibits acute microglia-mediated inflammation through activation of the TrkA-Akt1/2-CREB-Jmjd3 pathway. Mol Psychiatry 23, 1410-1420 (2018).
      4. C. Yang, J. C. Matro, K. M. Huntoon, D. Y. Ye, T. T. Huynh, S. M. Fliedner, J. Breza, Z. Zhuang, K. Pacak, Missense mutations in the human SDHB gene increase protein degradation without altering intrinsic enzymatic function. FASEB J 26, 4506-4516 (2012).
      5. H. S. Bose, B. Marshall, D. K. Debnath, E. W. Perry, R. M. Whittal, Electron Transport Chain Complex II Regulates Steroid Metabolism. iScience 23, 101295 (2020).
      6. W. Kanczkowski, V. I. Alexaki, N. Tran, S. Grossklaus, K. Zacharowski, A. Martinez, P. Popovics, N. L. Block, T. Chavakis, A. V. Schally, S. R. Bornstein, Hypothalamo-pituitary and immune-dependent adrenal regulation during systemic inflammation. Proc Natl Acad Sci U S A 110, 14801-14806 (2013).
      7. W. Kanczkowski, A. Chatzigeorgiou, S. Grossklaus, D. Sprott, S. R. Bornstein, T. Chavakis, Role of the endothelial-derived endogenous anti-inflammatory factor Del-1 in inflammation-mediated adrenal gland dysfunction. Endocrinology 154, 1181-1189 (2013).
      8. C. Jennewein, N. Tran, W. Kanczkowski, L. Heerdegen, A. Kantharajah, S. Drose, S. Bornstein, B. Scheller, K. Zacharowski, Mortality of Septic Mice Strongly Correlates With Adrenal Gland Inflammation. Crit Care Med 44, e190-199 (2016).
      9. D. Annane, V. Sebille, G. Troche, J. C. Raphael, P. Gajdos, E. Bellissant, A 3-level prognostic classification in septic shock based on cortisol levels and cortisol response to corticotropin. JAMA 283, 1038-1045 (2000).
      10. E. Boonen, S. R. Bornstein, G. Van den Berghe, New insights into the controversy of adrenal function during critical illness. Lancet Diabetes Endocrinol 3, 805-815 (2015).
      11. C. C. Huang, Y. Kang, The transient cortical zone in the adrenal gland: the mystery of the adrenal X-zone. J Endocrinol 241, R51-R63 (2019).
    1. Author Response

      Reviewer #1:

      This is a very timely paper that addresses an important and difficult-to-address question in the decision-making field - the degree to which information leakage can be strategically adapted to optimise decisions in a task-dependent fashion. The authors apply a sophisticated suite of analyses that are appropriate and yield a range of very interesting observations. The paper centres on analyses of one possible model that hinges on certain assumptions about the nature of the decision process for this task which raises questions about whether leak adjustments are the only possible explanation for the current data. I think the conclusions would be greatly strengthened if they were supported by the application and/or simulation of alternative model structures.

      We thank the reviewer for this positive appraisal of our study. We now entirely agree with their central comment about whether leak adjustments are the only (or even the best) explanation for the current data. We hope that the additional modelling sections that we have discussed in response to main comment 1 above have strengthened the paper. We have responded point-by-point to their public review, as this contained their main recommendations for revision.

      The behavioural trends when comparing blocks with frequent versus rare response periods seem difficult to tally with a change in the leak. […] Are there other models that could reproduce such effects? For example, could a model in which the drift rate varies between Rare and Frequent trials do a similar or better job of explaining the data?

      We can see why the reviewer has advocated for a possible change of drift rate (or ‘gain’ applied to sensory evidence) between conditions to explain our behavioural findings. We found, however, that changes in drift rate could elicit qualitatively similar changes in integration kernels to changes in decision threshold:

      Author response image 1.

      Changes in gain applied to incoming sensory evidence (A parameter in model) have similar effects on recovered integration kernels from Ornstein-Uhlenbeck simulation as changes in decision threshold.

      The likely reason for this is that the overall probability of emitting a response at any point in the continuous decision process is determined by the ratio of accumulated evidence to decision threshold. A similar logic applies to effects on reaction times and detection probability (main figure 2): increasing sensory gain/decreasing decision threshold will lead to faster reaction times and increased detection probability during response periods.
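
      The ratio argument can be illustrated with a minimal sketch of a leaky accumulator, assuming a noise-free, discretised Ornstein-Uhlenbeck update. The function name, parameter values and step stimulus are hypothetical and are not the fitted model from the paper. Because the accumulated evidence scales linearly with the gain, doubling the gain gives the same first crossing time as halving the threshold; with internal noise that does not scale with gain, the equivalence would only be approximate.

```python
def first_crossing(stimulus, gain, threshold, leak=2.0, dt=0.01):
    """Index of the first sample at which |x| reaches the threshold, or None."""
    x = 0.0
    for t, s in enumerate(stimulus):
        # Leaky integration: decay towards zero plus gain-weighted evidence.
        x += dt * (-leak * x + gain * s)
        if abs(x) >= threshold:
            return t
    return None


# Hypothetical stimulus: 50 samples of zero evidence, then a sustained signal.
stimulus = [0.0] * 50 + [0.5] * 200

baseline = first_crossing(stimulus, gain=1.0, threshold=0.2)
higher_gain = first_crossing(stimulus, gain=2.0, threshold=0.2)
lower_threshold = first_crossing(stimulus, gain=1.0, threshold=0.1)
```

      In this sketch `higher_gain` and `lower_threshold` coincide and both precede `baseline`, which mirrors the faster reaction times described above for either manipulation.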

      Both parameters may even have a similar effect on ‘false alarms’, because (as the reviewer notes below) false alarms in our paradigm are primarily being driven by the occurrence of stimulus changes as well as internal noise. In fact, the false alarm findings mean it is difficult to fully reconcile all of our behavioural findings in terms of changes in a single set of model parameters in the O-U process. It is possible that other changes not considered within our model (such as expectations of hazard rates of inter-response intervals leading to dynamic thresholds etc.) may have had a strong impact upon the resulting false alarm rates. A full exploration of different variations in O-U model (with varying urgency signals, hazard rates, etc.) is beyond the scope of this paper.

      For this reason, we have decided in our new modelling section to focus primarily on a single, well-established model (the O-U process) and explore how changes in leak and threshold affect task performance and the resulting integration kernels. We note that this is in line with the suggestion of reviewer #2, who focussed on similar behavioural findings to reviewer #1 but suggested that we look at decision threshold rather than drift rate as our primary focus.

      This ties in to a related query about the nature of the task employed by the authors. Due to the very significant volatility of the stimulus, it seems likely that the participants are not solely making judgments about the presence/absence of coherent motion but also making judgments about its duration (because strong coherent motion frequently occurs in the inter-target intervals). If that is so, then could the Rare condition equate to less evidence because there is an increased probability that an extended period of coherent motion could be an outlier generated from the noise distribution? Note that a drift rate reduction would also be expected to result in fewer hits and slower reaction times, as observed.

      As mentioned above, the rare and frequent targets are indeed matched in terms of the ease with which they can be distinguished from the intervening noise intervals. To confirm this, we directly calculated the variance (across frames) of the motion coherence presented during baseline periods and response periods (until response) in all four conditions:

      Author response image 2.

      The average empirical standard deviation of the stimulus stream presented during each baseline period (‘baseline’) and response period (‘trial’), separated by each of the four conditions (F = frequent response periods, R = rare, L = long response periods, S = short). Data were averaged across all response/baseline periods within the stimuli presented to each participant (each dot = 1 participant). Note that the standard deviation shown here is the standard deviation of motion coherence across frames of sensory evidence. This is smaller than the standard deviation of the generative distribution of ‘step’-changes in the motion coherence (std = 0.5 for baseline and 0.3 for response periods), because motion coherence remains constant for a period after each ‘step’ occurs.
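
      A minimal simulation sketch of why the frame-wise standard deviation within a period tends to fall below the generative standard deviation of the steps, assuming Gaussian steps with mean zero that are held constant for a random number of frames. The period length, hold durations and function name are hypothetical choices for illustration and are not the task parameters.

```python
import random
import statistics


def step_hold_stream(n_frames, step_sd, min_hold, max_hold, rng):
    """Piecewise-constant stream: draw a value, then hold it for several frames."""
    frames = []
    while len(frames) < n_frames:
        value = rng.gauss(0.0, step_sd)
        hold = rng.randint(min_hold, max_hold)
        frames.extend([value] * hold)
    return frames[:n_frames]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
step_sd = 0.5           # generative SD of the step values

# Empirical SD across frames, computed separately for many short periods.
per_period_sd = [
    statistics.pstdev(step_hold_stream(300, step_sd, 10, 60, rng))
    for _ in range(500)
]
mean_empirical_sd = statistics.mean(per_period_sd)
```

      Each period contains only a handful of independent step values, so its frame-wise standard deviation is estimated from few effective samples and is on average smaller than `step_sd`.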

      Some adjustment of the language used when discussing FAs seems merited. If I have understood correctly, the sensory samples encountered by the participants during the inter-response intervals can at times favour a particular alternative just as strongly (or more strongly) than that encountered during the response interval itself. In that sense, the responses are not necessarily real false alarms because the physical evidence itself does not distinguish the target from the non-target. I don't think this invalidates the authors' approach but I think it should be acknowledged and considered in light of the comment above regarding the nature of the decision process employed on this task.

      This is a good point. We hope that the reviewer will allow us to keep the term ‘false alarms’ in the paper, as it does conveniently distinguish responses during baseline periods from those during response periods, but we have sought to clarify the point that the reviewer makes when we first introduce the term.

      “Indeed, participants would occasionally make ‘false alarms’ during baseline periods in which the structure of the preceding noise stream mistakenly convinced them they were in a response period (see Figure 4, below). Indeed, this means that a ‘false alarm’ in our paradigm has a slightly different meaning than in most psychophysics experiments; rather than it referring to participants responding when a stimulus was not present, we use the term to refer to participants responding when there was no shift in the mean signal from baseline.”

      And:

      “The fact that evidence integration kernels naturally arise from false alarms, in the same manner as from correct responses, demonstrates that false alarms were not due to motor noise or other spurious causes. Instead, false alarms were driven by participants treating noise fluctuations during baseline periods as sensory evidence to be integrated across time, and the physical evidence preceding ‘false alarms’ need not even distinguish targets from non-targets.”

      The authors report that preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods. It is not clear what identifies this signal as reflecting motor preparation. Did the authors consider using other effector-selective EEG signatures of motor preparation such as beta-band activity which has been used elsewhere to make inferences about decision bounds? Assuming that this central ERP signal does reflect the decision bounds, the observation that it has a larger amplitude at the response on Rare trials appears to directly contradict the kernel analyses which suggest no difference in the cumulative evidence required to trigger commitment.

      Thanks for this comment. First, we should note that this finding emerged from an agnostic time-domain analysis of the data time-locked to button presses, in which we simply observed that the negative-going potential was greater (more negative) in RARE vs. FREQUENT trials. We relate it to motor preparation only because it precedes each button press; nonetheless, we note that Kelly and O’Connell (2013) found similar negative-going potentials at central sensors without applying a CSD transform (as in this study). Like them, we would relate this potential to either the well-established Bereitschaftspotential or the contingent negative variation (CNV).

      We agree that many other studies have focussed on beta-band activity as another measure of motor preparation, and have used it to make inferences about decision bounds. To investigate this, we used a Morlet wavelet transform to examine the time-varying power estimate at a central frequency of 20 Hz (wavelet factor 7). We repeated the convolutional GLM analysis on this time-varying power estimate.
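
      As a rough illustration of this kind of analysis (a minimal sketch, not the code used for the analysis itself), time-varying beta power can be estimated by convolving a single-channel signal with a complex Morlet wavelet centred on 20 Hz. The function name `morlet_power`, the sampling-rate argument and the reading of "wavelet factor 7" as the number of cycles in the wavelet are assumptions made for this example.

```python
import numpy as np

def morlet_power(signal, sfreq, freq=20.0, n_cycles=7.0):
    """Time-varying power of a 1-D signal at `freq` (Hz).

    Assumption: `n_cycles` plays the role of the "wavelet factor",
    i.e. the number of oscillation cycles under the Gaussian envelope.
    """
    signal = np.asarray(signal, dtype=float)
    # Temporal standard deviation of the Gaussian envelope (seconds).
    sigma_t = n_cycles / (2.0 * np.pi * freq)
    # Symmetric time axis covering +/- 5 standard deviations.
    half = int(np.ceil(5.0 * sigma_t * sfreq))
    t = np.arange(-half, half + 1) / sfreq
    # Complex sinusoid multiplied by a Gaussian envelope.
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    # Scale the wavelet to unit L1 norm.
    wavelet /= np.sum(np.abs(wavelet))
    # Convolve and take the squared magnitude as the power estimate.
    analytic = np.convolve(signal, wavelet, mode="same")
    return np.abs(analytic) ** 2

if __name__ == "__main__":
    sfreq = 250.0                       # hypothetical sampling rate (Hz)
    times = np.arange(0, 2.0, 1.0 / sfreq)
    beta = np.cos(2 * np.pi * 20.0 * times)
    print(morlet_power(beta, sfreq)[len(times) // 2])
```

      The resulting power time course could then be entered into a regression in place of the time-domain signal.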

      We first examined average beta desynchronisation at a central cluster of electrodes (CPz, CP1, CP2, C1, Cz, C2) in the run-up to correct button presses during response periods. We found that a reliable beta desynchronisation occurred and that, just as in the time-domain signal, it reached a greater threshold in the RARE trials than in the FREQUENT trials:

      Author response image 3.

      Beta desynchronisation prior to a correct response is greater over central electrodes in the RARE condition than in the FREQUENT condition.

      We agree with the reviewer that this is likely indicative of a change in decision threshold between rare and frequent trials. We also note that our new computational modelling of the O-U process suggests that this in fact reconciles well with the behavioural findings (changes in integration kernels). We now mention this at the relevant point in the results section:

      “As large changes in mean evidence are less frequent in the RARE condition, the increased neural response to |Δevidence| may reflect the increased statistical surprise associated with the same magnitude of change in evidence in this condition. In addition, when making a correct response, preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods (Figure 7b; p=0.041, cluster-based permutation test). We found similar effects in beta-band desynchronisation prior to the response, averaged over the same electrodes; beta desynchronisation was greater in RARE than FREQUENT response periods. As discussed in the computational modelling section above, this is consistent with the changes in integration kernels between these conditions as it may reflect a change in decision threshold (figure 2d, 3c/d). It is also consistent with the lower detection rates and slower reaction times when response periods are RARE (figure 2 b/c).”

      We also investigated the lateralised response (left minus right beta desynchronisation, contrasted on left minus right responses). However, we were unable to detect a reliable lateralised signal in either condition using these lateralised responses. We suspect that this is because we have far fewer response periods than conventional trial-based EEG experiments of decision making, and so we did not have sufficient SNR to reliably detect this signal. This is consistent with standard findings in the literature, which report that the magnitude of the lateralised signal is far smaller than the magnitude of the overall beta desynchronisation (e.g. Doyle et al., 2005).

      P11, the "absolute sensory evidence" regressor elicited a triphasic potential over centroparietal electrodes. The first two phases of this component look to have an occipital focus. The third phase has a more centroparietal focus but appears markedly more posterior than the change in evidence component. This raises the question of whether it is safe to assume that they reflect the same process.

      We agree. We have now referred to this as a ‘triphasic component over occipito-parietal cortex’ rather than centroparietal electrodes.

      Reviewer #2:

      Overall, the authors use a clever experimental design and approach to tackle an important set of questions in the field of decision-making. The manuscript is easy to follow with clear writing. The analyses are well thought-out and generally appropriate for the questions at hand. From these analyses, the authors have a number of intriguing results. So, there is considerable potential and merit in this work. That said, I have a number of important questions and concerns that largely revolve around putting all the pieces together. I describe these below.

      Thanks to the reviewer for their positive appraisal of the manuscript; we are obviously pleased that they found our work to have considerable potential and merit. We seek to address the main comments from their public review and recommendations below.

      1) It is unclear to what extent the decision threshold is changing between subjects and conditions, how that might affect the empirical integration kernel, and how well these two factors can together explain the overall changes in behavior.

      I would expect that less decay in RARE would have led to more false alarms, higher detection rates, and faster RTs unless the decision threshold also increased (or there was some other additional change to the decision process). The CPP for motor preparatory activity reported in Fig. 5 is also potentially consistent with a change in the decision threshold between RARE and FREQUENT. If the decision threshold is changing, how would that affect the empirical integration kernel? These are important questions on their own and also for interpreting the EEG changes.

      This important comment, alongside the comments of reviewer 1 above, made us carefully consider the effects of changes in decision threshold on the evidence integration kernel via simulation. As discussed above (in response to ‘essential revisions for the authors’), we now include an entirely new section on how changes in decision threshold and leak may affect the evidence integration kernel, and be used to optimise performance across the different sensory environments. In particular, we agree with the reviewer that the motor preparatory activity that differs between RARE and FREQUENT is consistent with a change in decision threshold, and our simulations suggest that our behavioural findings on evidence integration are also consistent with this change. These are detailed on pp.1-4 of the rebuttal, above.
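
      To make the logic of such a simulation concrete, the following is a minimal sketch of a leaky accumulator with a response threshold, together with a response-triggered average of the preceding evidence (an empirical integration kernel). It is not the model code from the manuscript: the function names, the Euler update, and the reset of the accumulator to zero after each response are assumptions made for illustration.

```python
import numpy as np

def simulate_leaky_accumulator(evidence, dt, leak, threshold):
    """Euler integration of dx/dt = -leak * x + evidence.

    Returns the sample indices at which x reaches `threshold`.
    Assumption: the accumulator is reset to zero after each response.
    """
    x = 0.0
    responses = []
    for i, e in enumerate(evidence):
        x += dt * (-leak * x + e)
        if x >= threshold:
            responses.append(i)
            x = 0.0
    return np.array(responses, dtype=int)

def response_triggered_average(evidence, responses, n_back):
    """Mean evidence over the `n_back` samples up to and including each response."""
    evidence = np.asarray(evidence, dtype=float)
    # Skip responses that occur too early to have a full window.
    segments = [evidence[r - n_back + 1 : r + 1]
                for r in responses if r >= n_back - 1]
    if not segments:
        return np.full(n_back, np.nan)
    return np.mean(segments, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.normal(0.0, 1.0, size=20000)   # hypothetical noisy evidence stream
    for leak in (0.5, 2.0):                    # hypothetical leak values
        resp = simulate_leaky_accumulator(noise, dt=0.1, leak=leak, threshold=1.0)
        kernel = response_triggered_average(noise, resp, n_back=20)
        print(leak, len(resp), kernel[-3:])
```

      Varying `leak` and `threshold` in such a sketch is one way to explore how each parameter changes the number of responses and the shape of the recovered kernel.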

      2) The authors find an interesting difference in the CPP for the FREQUENT vs RARE conditions where they also show differences in the decay time constant from the empirical integration kernel. As mentioned above, I'm wondering what else may be different between these conditions. Do the authors have any leverage in addressing whether the decision threshold differs? What about other factors that could be important for explaining the CPP difference between conditions? Big picture, the change in CPP becomes increasingly interesting the more tightly it can be tied to a particular change in the decision process.

      We fully agree with the spirit of this comment, and we’ve tried much more carefully to consider what the influences of decision threshold and leak would be on our behavioural analyses. As discussed in the response to reviewer 1, we think that the negative-going potential at the time of responses (which is greater in RARE vs. FREQUENT, main figure 7b) and the equivalent changes in beta desynchronisation (see Reviewer Response Figure 5 above) both reflect a change in decision threshold between RARE and FREQUENT conditions. We have tried to make this link explicit in the revised results section:

      “As large changes in mean evidence are less frequent in the RARE condition, the increased neural response to |Δevidence| may reflect the increased statistical surprise associated with the same magnitude of change in evidence in this condition. In addition, when making a correct response, preparatory motor activity over central electrodes reached a larger decision threshold for RARE vs. FREQUENT response periods (Figure 7b; p=0.041, cluster-based permutation test). We found similar effects in beta-band desynchronisation prior to the response, averaged over the same electrodes; beta desynchronisation was greater in RARE than FREQUENT response periods. As discussed in the computational modelling section above, this is consistent with the changes in integration kernels between these conditions as it may reflect a change in decision threshold (figure 2d, 3c/d). It is also consistent with the lower detection rates and slower reaction times when response periods are RARE (figure 2 b/c).”

      I'll note that I'm also somewhat skeptical of the statements by the authors that large shifts in evidence are less frequent in the RARE compared to FREQUENT conditions (despite the names) - a central part of their interpretation of the associated CPP change. The FREQUENT condition obviously has more frequent deviations from the baseline, but this is countered to some extent by the experimental design that has reduced the standard deviation of the coherence for these response periods. I think a calculation of overall across-time standard deviation of motion coherence between the RARE and FREQUENT conditions is needed to support these statements, and I couldn't find that calculation reported. The authors could easily do this, so I encourage them to check and report it.

      See Author response image 2.

      3) The wide range of decay time constants between subjects and the correlation of this with another component of the CPP is also interesting. However, in trying to interpret this change in CPP, I'm wondering what else might be changing in the inter-subject behavior. For instance, it looks like there could be up to 4 fold changes in false alarm rates. Are there other changes as well? Do these correlate with the CPP? Similar to my point above, the changes in CPP across subjects become increasingly interesting the more tightly it can be tied to a particular difference in subject behavior. So, I would encourage the authors to examine this in more depth.

      Thanks for the interesting suggestion. We explored whether there might be any interindividual correlation in this measure with the false alarm rate across participants, but found that there was no such correlation. (See Author response image 4; plotting conventions are as in main figure 9).

      Author response image 4.

      No evidence of between-subject correlations in CPP responses and false alarm rates, in any of the four conditions.

      We hope instead that the extended discussion of how the integration kernel should be interpreted (in light of computational modelling) provides at least some increased interpretability of the between-subject effects that we report in figure 9.

      Reviewer #3 (Public Review):

      The main strength is in the task design which is novel and provides an interesting approach to studying continuous evidence accumulation. Because of the continuous nature of the task, the authors design new ways to look at behavioral and neural traces of evidence. The reverse-correlation method looking at the average of past coherence signals enables us to characterize the changes in signal leading to a decision bound and its neural correlate. By varying the frequency and length of the so-called response period, that the participants have to identify, the method potentially offers rich opportunities to the wider community to look at various aspects of decision-making under sensory uncertainty.

      We are pleased that the reviewer agrees with our general approach as a novel way of characterising various aspects of decision-making under uncertainty.

      The main weaknesses that I see lie within the description and rigor of the method. The authors refer multiple times to the time constant of the exponential fit to the signal before the decision but do not provide a rigorous method for its calculation and neither a description of the goodness of the fit. The variable names seem to change throughout the text which makes the argumentation confusing to the reader. The figure captions are incomplete and lack clarity.

      We apologise that some of our original submission was difficult to follow in places, and we are very grateful to the reviewer for their thorough suggestions for how this could be improved. We address these in turn below, and we hope that this answers their questions, and has also led to a significant improvement in the description and rigour of the methodology.

    1. Author Response:

      Reviewer #1:

      This manuscript by Gabor Tamas' group defines features of ionotropic and metabotropic output from a specific cortical GABAergic cell type, the so-called neurogliaform cells (NGFCs), by using electrophysiology, anatomy, calcium imaging and modelling. Experimental data suggest that NGFCs converge onto postsynaptic neurons with sublinear summation of ionotropic GABAA potentials and linear summation of metabotropic GABAB potentials. The modelling results suggest a preferential spatial distribution of GABA-B receptor-GIRK clusters on the dendritic spines of postsynaptic neurons. The data provide the first experimental quantitative analysis of the distinct integration mechanisms of GABA-A and GABA-B receptor activation by the presynaptic NGFCs, and especially give insights into the logic of volume transmission and the subcellular distribution of postsynaptic GABA-B receptors. Therefore, the manuscript provides novel and important information on the role of the GABAergic system within cortical microcircuits.

      We have made all changes humanly possible under the current circumstances and we are open to further suggestions deemed necessary.

      Reviewer #2:

      The authors present a compelling study that aims to resolve the extent to which synaptic responses mediated by metabotropic GABA receptors (i.e. GABA-B receptors) summate. The authors address this question by evaluating the synaptic responses evoked by GABA released from cortical (L1) neurogliaform cells (NGFCs), an inhibitory neuron subtype associated with volume neurotransmission, onto Layer 2/3 pyramidal neurons. While response summation mediated by ionotropic receptors is well-described, metabotropic receptor response summation is not, thereby making the authors' exploration of the phenomenon novel and impactful. By carrying out a series of elegant and challenging experiments that are coupled with computational analyses, the authors conclude that summation of synaptic GABA-B responses is linear, unlike the sublinear summation observed with ionotropic, GABA-A receptor-mediated responses.

      The study is generally straightforward, even if the presentation is often dense. Three primary issues worth considering include:

      1) The rather strong conclusion that GABA-B responses linearly summate, despite evidence to the contrary presented in Figure 5C.

      2) Additional analyses of data presented in Figure 3 to support the contention that NGFCs co-activate.

      3) How the MCell model informs the mechanisms contributing to linear response summation.

      These and other issues are described further below. Despite these comments, this reviewer is generally enthusiastic about the study. Through a set of very challenging experiments and sophisticated modeling approaches, the authors provide important observations on both (1) NGFC-PC interactions, and (2) GABA-B receptor mediated synaptic response dynamics.

      The differences between the sublinear, ionotropic responses and the linear, metabotropic responses are small. Understandably, these experiments are difficult – indeed, a real tour de force – from which the authors are attempting to derive meaningful observations. Therefore, asking for more triple recordings seems unreasonable. That said, the authors may want to consider showing all control and gabazine recordings corresponding to these experiments in a supplemental figure. Also, why are sublinear GABA-B responses observed when driven by three or more action potentials (Figure 5C)? It is not clear why the authors do not address this observation considering that it seems inconsistent with the study's overall message. Finally, the final readout – GIRK channel activation – in the MCell model appears to summate (mostly) linearly across the first four action potentials. Is this true and, if so, is the result inconsistent with Figure 5C?

      GABAB responses elicited by three and four presynaptic NGFC action potentials were investigated to better understand the limits of the NGFC-PC connection. Although our spatial model suggests that, at a single point in the L1 volume, one or two NGFCs could provide a GABAB response through their respective volume transmission, it remains important that in a minority of cases three or more NGFCs could converge their output. The experiments in Fig 5 not only offer the mechanistic insight that possible HCN channel activation and GABA reuptake do not significantly influence the summation of metabotropic receptor-mediated responses, but also provide additional information about the extensive GABAB signaling from more than two NGFC outputs. Interestingly, in this experiment summation up to two action potentials shows linear integration very similar to that seen in the triplet recordings. This result suggests that temporal and spatial summation are identical when a limited number of inputs arrive at the postsynaptic target cell. A similar summation can be seen in our model up to two consecutive GABA releases. Whereas three or four consecutive GABA releases still produce linear summation in our model, our experiments show moderate sublinearity. One possible explanation for this inconsistency is vesicle depletion in NGFCs after multiple rapid releases of GABA, which was not taken into account in our model.

      Presumably, the motivation for Figure 3 is that it provides physiological context for when NGFCs might be coactive, thereby providing the context for when downstream, PC responses might summate. This is a nice, technically impressive addition to the study. However, it seems that a relevant quantification/evaluation is missing from the figure. That is, the authors nicely show that hind limb stimulation evokes responses in the majority of NGFCs. But how many of these neurons are co-active, and what are their spatial relationships? Figure 3D appears to begin to address this point, but it is not clear if this plot comes from a single animal, or multiple? Also, it seems that such a plot would be most relevant for the study if it only showed alpha-actin 2-positive cells. In short, can one conclude that nearby, presumptive NGFCs co-activate, and is this conclusion derived from multiple animals?

      The aim of Fig. 3D was to indicate that the active, presumably NGFC, cells are located close to each other. The figure comes from a single animal. We agree with the reviewer and have therefore replaced the scatter plot in Fig. 3D with another one that provides information about the molecular profiles of the active/inactive cells. We also made an effort to further analyze our in vivo data and the spatial localization of the monitored interneurons (see Author response image 3). The results are from 4 different animals; in these experiments numerous L1 interneurons are active during the sensory stimulus, as shown in the scatter plot. We calculated the shortest distance between all active cells and all ɑ-actinin2+ cells that were active in the experiments. The data suggest that, in the case of identified active ɑ-actinin2+ cells, the interneuron somata were on average 182.69 ± 60.54 or 305.135 ± 34.324 μm from each other. Data from Fig. 2D indicate that the axonal arborization of NGFCs reaches ~200-250 μm on average. Taking these two results together, it is in theory probable that the spatial arrangement would allow neighboring NGFCs to directly interact at the same point in space.
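
      For readers who wish to reproduce this type of measurement, a shortest-distance calculation between cell bodies can be sketched as follows. This is an illustrative example only: the function name and the soma coordinates are hypothetical, and the sketch assumes that positions are given in micrometres in a common coordinate frame.

```python
import numpy as np

def nearest_neighbour_distances(coords):
    """Distance from each cell to its closest neighbour.

    `coords` is an (n_cells, n_dims) array of soma positions; at least
    two cells are required.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise coordinate differences, shape (n_cells, n_cells, n_dims).
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude each cell's zero distance to itself.
    np.fill_diagonal(dist, np.inf)
    return dist.min(axis=1)

if __name__ == "__main__":
    # Hypothetical soma positions of active cells (micrometres).
    somata = [[0.0, 0.0], [150.0, 80.0], [320.0, 40.0], [90.0, 260.0]]
    d = nearest_neighbour_distances(somata)
    print(d.mean(), d.std(ddof=1))
```

      The mean and standard deviation of these distances can then be compared with the reach of the axonal arborization.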

      The inclusion of the diffusion-based model (MCell) is commendable and enhances the study. Also, the description of GABA-B receptor/GIRK channel activation is highly quantitative, a strength of the study. However, a general summary/synthesis of the observations would be helpful. Moreover, relating the simulation results back to the original motivation for generating the MCell model would be very helpful (i.e. the authors asked whether "linear summation was potentially a result of the locally constrained GABAB receptor - GIRK channel interaction when several presynaptic inputs converge"). Do the model results answer this question? It seems as if performing "experiments" on the model wherein local constraints are manipulated would begin to address this question. Why not use the model to provide some data – albeit theoretical – that begins to address their question?

      We re-formulated the problem to be addressed in this Results section. We acknowledge in the Discussion that our model has several limitations and, consequently, we restricted its application to a limited set of quantitative comparisons paired to our experimental dataset or directly related to pioneering studies on GABAB efficacy on spines vs shafts. We believe that a proper answer to the reviewer’s suggestion would be worth a separate and dedicated study with an extended set of parameters and an elaborated model.

      In sum, the authors present an important study that synthesizes many experimental (in vitro and in vivo) and computational approaches. Moreover, the authors address the important question of how synaptic responses mediated by metabotropic receptors summate. Additional insights are gleaned from the function of neurogliaform cells. Altogether, the authors should be congratulated for a sophisticated and important study.

      Reviewer #3:

      The authors of this manuscript combine electrophysiological recordings, anatomical reconstructions and simulations to characterize synapses between neurogliaform interneurons (NGFCs) and pyramidal cells in somatosensory cortex. The main novel finding is a difference in summation of GABAA versus GABAB receptor-mediated IPSPs, with a linear summation of metabotropic IPSPs in contrast to the expected sublinear summation of ionotropic GABAA IPSPs. The authors also provide a number of structural and functional details about the parameters of GABAergic transmission from NGFCs to support a simulation suggesting that sublinear summation of GABAB IPSPs results from recruitment of dendritic shaft GABAB receptors that are efficiently coupled to GIRK channels.

      I appreciate the topic and the quality of the approach, but there are underlying assumptions that leave room to question some conclusions. I also have a general concern that the authors have not experimentally addressed mechanisms underlying the linear summation of GABAB IPSPs, reducing the significance of this most interesting finding.

      1) The main novel result of broad interest is supported by nice triple recording data showing linear summation of GABAB IPSPs (Figure 4), but I was surprised this result was not explored in more depth.

      We chose to study GABAB-GABAB interactions through the lens of neurogliaform cells and explored how neurogliaform cells as a population might give rise to the summation properties studied with triple recordings. This was a purposeful choice that admittedly neglects other possible sources of GABAB-GABAB interactions, which possibly take place during high-frequency coactivation of homogeneous or heterogeneous populations of interneurons innervating the same postsynaptic cell. We agree with the reviewer that the topic of summation of GABAB IPSPs is important and that an in-depth mechanistic understanding requires further separate studies.

      2) To assess the effective radius of NGFC volume transmission, the authors apply quantal analysis to determine the number of functional release sites to compare with structural analysis of presynaptic boutons at various distances from PC dendrites. This is a powerful approach for analyzing the structure-function relationship of conventional synapses but I am concerned about the robustness of the results (used in subsequent simulations) when applied here because it is unclear whether volume transmission satisfies the assumptions required for quantal analysis. For example, if volume transmission is similar to spillover transmission in that it involves pooling of neurotransmitter between release sites, then the quantal amplitude may not be independent of release probability. Many relevant issues are mentioned in the discussion but some relevant assumptions about QA are not justified.

      Indeed, pooling of neurotransmitter between release sites may affect quantal amplitude. We therefore examined quantal amplitude under low release probability conditions, using 0.7-1.5 mM [Ca]o to detect postsynaptic uniquantal events initiated by neurogliaform cell activation (Author response image 7). In this way we measured quantal current amplitudes similar to those obtained with the BQA method, with no significant difference (4.46±0.83 pA, n=4, P=0.8, Mann-Whitney Test).

      3) The authors might re-think the lack of GABA transporters in the model since the presence and characteristics of GATs will have a large effect on the spread of GABA in the extracellular space.

      We agree that the presence of GAT could effectively shape the GABA exposure (e.g. Scimemi 2014). During the development of the model, we took into consideration different possibilities and solutions to create the model’s environment. To our knowledge, there is no detailed electron microscopic study that would provide ultrastructural measurements of structural elements around the NGFC release sites and postsynaptic pyramidal cell dendrites in layer 1 while preserving the extracellular space. Moreover, quantitative information is scarce about the exact localization and density of the GATs along the membrane surface of glial processes around confirmed NGFC release sites. We felt that developing a functional environment that would contain GABA transporters without possessing such information would be speculative. Furthermore, during the development of the model it became clear that incorporating thousands of differentially located GABA transporters would massively increase the processing time of single simulations, because each interaction between GATs and GABA molecules would have to be monitored and the diffusion of GABA molecules in the extracellular space would have to be computed even for molecules that are far from the postsynaptic dendritic site and do not interact with it.

      As an admittedly simple and constrained alternative, we decided to set a decay half-life for the GABA molecules released. This approach allows us to mimic the GABA exposure time of 20-200 ms, based on experimental data (Karayannis et al 2010). In the model the GABA exposure time was 114.87 ± 2.1 ms with decay time constants of 11.52 ± 0.14 ms. After ~200 ms all the released GABA molecules disappeared from the simulation environment.
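
      As a back-of-the-envelope check of what such a decay implies, the sketch below relates a decay time constant to a half-life and to the fraction of molecules remaining at a given time. It assumes a single-exponential decay, which is a simplification made for this example and is not stated as the exact rule implemented in the model; the function names are hypothetical.

```python
import math

def half_life_from_tau(tau_ms):
    """Half-life (ms) of a single-exponential decay with time constant tau_ms."""
    return tau_ms * math.log(2.0)

def fraction_remaining(t_ms, tau_ms):
    """Fraction of molecules left after t_ms, assuming exp(-t / tau) decay."""
    return math.exp(-t_ms / tau_ms)

if __name__ == "__main__":
    tau = 11.52  # decay time constant reported above (ms)
    print(half_life_from_tau(tau))
    print(fraction_remaining(200.0, tau))
```

      Under this assumption the remaining fraction at 200 ms is vanishingly small, in line with the observation that all released molecules had disappeared by then.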

      A detailed treatment of extracellular diffusion was outside the scope of our model; we were interested in investigating how the subcellular localization of receptors and channels determines the summation properties.

      4) I'm not convinced that the repetitive stimulation protocol of a single presynaptic cell shown (Figure 5) is relevant for understanding summation of converging inputs (Figure 4), particularly in light of the strong use-dependent depression of GABA release from NGFCs. It is also likely that shunting inhibition contributes to sublinear summation to a greater extent during repetitive stimulation than summation from presynaptic cells that may target different dendritic domains. The authors claim that HCN channels do not affect integration of GABAB IPSPs but one would not expect HCN channel activation from the small hyperpolarization from a relatively depolarized holding potential.

      Use-dependent synaptic depression of NGFC-induced postsynaptic responses was nicely documented by Karayannis and coworkers (2010), although they investigated the GABAA component of the responses and found that the depression is caused by the desensitization of postsynaptic GABAA receptors. We are not aware of published experiments on the short-term plasticity of GABAB responses. In our experiments represented in Fig 5 we found linearity in the summation of GABAB responses up to two action potentials and sublinearity for 3 and 6 action potentials. In fact, our results show that no synaptic depression is detectable in response to paired pulses, since the amplitudes of the voltage responses were doubled compared to a single pulse, which means that the paired pulse ratio is around 1. To verify our result, we repeated our dual recording measurements with one, two, three and four spikes initiated in the presynaptic neurogliaform cell (Author response image 6). Measuring both the amplitude and the overall charge of GABAB responses, we again found a linear relationship between the one- and two-spike protocols.

      Author response image 6 - Integration of GABAB receptor-mediated synaptic currents (A) Representative recording of a neurogliaform synaptic inhibition on a voltage clamped pyramidal cell. Bursts of up to four action potentials were elicited in NGFCs at 100 Hz in the presence of 1 μM gabazine and 10 μM NBQX. (B) Summary of normalized IPSC peak amplitudes (left) and charge (right). (C) Pharmacological separation of neurogliaform initiated inhibitory current.
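
      The reasoning that a doubled response implies a paired pulse ratio of about 1 can be written out explicitly. The sketch below is illustrative only: it assumes that the responses to consecutive spikes sum in time, so that the contribution of the second spike is the two-spike amplitude minus the one-spike amplitude. The function names and the example amplitudes are hypothetical.

```python
def paired_pulse_ratio(amp_one_spike, amp_two_spikes):
    """Contribution of the second spike relative to the first.

    Assumes temporally summed responses: second = two-spike - one-spike.
    """
    return (amp_two_spikes - amp_one_spike) / amp_one_spike

def linearity_index(amp_n_spikes, amp_one_spike, n_spikes):
    """Measured response divided by the arithmetic sum of n single responses.

    A value of 1 indicates linear summation; below 1 indicates sublinearity.
    """
    return amp_n_spikes / (n_spikes * amp_one_spike)

if __name__ == "__main__":
    # Hypothetical normalised amplitudes for 1, 2 and 3 presynaptic spikes.
    a1, a2, a3 = 1.0, 2.0, 2.4
    print(paired_pulse_ratio(a1, a2))
    print(linearity_index(a2, a1, 2), linearity_index(a3, a1, 3))
```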

    1. Author Response:

      Reviewer #2:

      Non-canonical pathways for regulating protein synthesis serve important roles for controlling gene expression in critical developmental pathways. Homeobox (Hox) genes encode many mRNAs regulated at the level of translation. A general feature for many of these mRNAs has been the proposal they are regulated by Internal Ribosome Entry Sites (IRESs) and possess sequences in the 5'-untranslated regions (5'-UTR) of the mRNA that prevent canonical cap-dependent translation, termed "translation inhibitory elements" or TIEs. However, the mechanisms by which these Hox mRNAs are regulated remain unclear. Here, the authors focus on two Hox mRNAs, Hox a3 and Hox a11, and find they use entirely different means to achieve the same end of repressing cap-dependent translation. Hox a3 uses the non-canonical translation initiation factor eIF2D and an upstream open reading frame (uORF), whereas a11 uses a "start-stop" uORF followed by a thermodynamically stable stem-loop to inhibit translation. Overall, the experiments support the major conclusions drawn by the authors, and nail down mechanisms that have been left unresolved since the Hox mRNAs were first discovered to be regulated at the level of translation. These results will be of wide interest to the translation and developmental biology fields.

      Some issues the authors should consider:

      1) The mapping of the TIE boundaries is in general well-supported by the luciferase reporter experiments. However, there seems to be a disconnect in the luciferase values in Fig. 1B compared to the western blots in Supplementary Fig. 1D. For example, in the a3 case the 106 and 113 bands don't seem to correspond to levels consistent with the luciferase activity. For a11, the 153 band is not consistent with the luciferase activity. Also, the gels at the bottom are confusing. Should 74 in the left gel be 77? It would help to have a clearer explanation in the figure legend.

      The reviewer is right, supplementary figure 1D is misleading. We have clarified the data with a new supplementary figure 1D. The gels presented in this figure are not western blots; they are SDS-PAGE analyses of the translated product (i.e. Renilla luciferase protein) in the presence of 35S-methionine. Since the function of TIE elements was measured in comparison with reporters that do not contain any TIE element, we loaded on each gel a reference (lanes w/o TIE) for quantification purposes. Since the exposure time of distinct gels was variable, one should not compare intensities between gels. We added the quantification of the gel intensity relative to the reference construct (w/o TIE). We agree with the reviewer that the two gels at the bottom are not informative, and we removed them from the new supplemental figure 1D.

      2) The results in the various sucrose gradients are not entirely convincing as presented. In all these cases, the experiment would benefit from the use of high-salt conditions (See Lodish and Rose, 1977, JBC 252, 1181-ff) in the gradient to remove background 80S not engaged with mRNAs. For the +cycloheximide sample in Fig. 8, this looks more like a "half-mer" between a monosome and disome, rather than a standard polysome.

      We do not agree with the point raised by the reviewer on sucrose gradients. This is evidently due to a misunderstanding of the experiments conducted. We would like to point out that the plots shown in the manuscript represent the percentage of mRNA transcripts labelled with a radioactive cap that were introduced into cell-free translation extracts. Since we monitor only radioactivity, only the radioactive mRNA transcripts tested in these experiments are observed; consequently, there is no background from 80S ribosomes that are not engaged with mRNAs. Such background 80S ribosomes are visible on the OD profile now shown in a new supplementary figure S6. However, non-engaged 80S are not radioactive, and mRNAs that are not engaged in the 80S are found in the RNP fraction. The absence of radioactive background 80S is further corroborated by the use of edeine, which prevents the codon-anticodon interaction (see data below).

      When we set up our experimental strategy, we first used edeine to validate our protocol; in this case no radioactive 80S is observed, confirming that no background 80S is present in our assays. In conclusion, peaks at the level of 80S can only be radioactive mRNA engaged in an 80S. We have extended the figure legend to clarify the conducted experiments.

      Concerning Fig 8, we agree that this experiment is not conclusive and propose to remove it as mentioned in response to a comment from reviewer #1.

      3) In Fig. 7, it would be helpful to see the absolute level of translation from the reporters, as it is not clear what the baseline level of translation is in the knockdown cell lines. It's hard to judge the eIF4E knockdown case in particular without this information. Also in panel B, the GGCCC147 cell line is missing.

      As previously mentioned, we agree that Fig 7 is misleading and we have completely remodelled the figure in the revised manuscript. See also point 5 from reviewer #1. Because the GGCCC147 mutation had no effect in RRL, we decided not to test it in HEK cells and focused on the GGCC107 that has a significant effect both in RRL and in HEK cells.

      4) From the MS experiments in Fig. 6 and Supplementary Fig. 6, the authors focus on eIF2D, which makes sense. But they don't comment on two other highly suggestive hits in the a3 vs. beta-globin and a3 vs. a11 comparisons. These are eIF5B and HBS1L. Both are highly suggestive of what might be going on with the eIF2D-dependent translation mechanism. They don't show up in the GMP-PNP samples in Supplementary Fig. 6, which is interesting and would deserve a comment.

      We are grateful for this very interesting comment. As suggested, we have inserted a comment related to HBS1L and eIF5B in the discussion of the manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      “A sample size of 3 idiopathic seems underpowered relative to the many types of genetic changes that can occur in ASD. Since the authors carried out WGS, it would be useful to know what potential causative variants were found in these 3 individuals and even if not overlapping if they might expect to be in a similar biological pathway.

      If the authors randomly selected 3 more idiopathic cell lines from individuals with autism, would these cell lines also have altered mTOR signaling? And could a line have the same cell biology defects without a change in mTOR signaling? The authors argue that the sample size could be the reason for lack of overlap of the proteomic changes (unlike the phospho-proteomic overlaps), which makes the overlapping cell biology findings even more remarkable. Or is the phenotyping simply too crude to know if the phenotypes truly are the same?”

      We appreciate these thoughtful comments and also agree that of several models, our studies indicate the possibility of mTOR alteration in multiple forms of ASD. As above, we are currently pursuing this hypothesis with newly acquired DOD support. With regard to the I-ASD population, we agree that there are a large variety of genetic changes that can occur in genetically undefined ASDs. Indeed, this is precisely why we expected to see “personalized” phenotypes in each I-ASD individual when we embarked on this study. At that time, several years ago, we had planned to expand the analyses to more I-ASD individuals to assess for additional personalized phenotypes. However, as our studies progressed, we were surprised to find convergence in our I-ASD population in terms of neurite outgrowth and migration and later proteomic results showing convergence in mTOR. We found it particularly remarkable that despite a sample size of 3 that this convergence was noted. When we had the opportunity to extend our studies to the 16p11.2 deletion population, we were thrilled to conduct the first comparison between I-ASD and a genetically defined ASD and, as such, the scope of the paper turned towards this comparison. We do agree that analyses of the other I-ASD individuals would be a beneficial endeavor, both to understand how pervasive NPC migration and neurite deficits are in autism and to assess the presence of mTOR dysregulation. Furthermore, it would be important to see whether alterations in other pathways could also lead to similar cell biological deficits, though we know that other studies of neurodevelopmental disorders have found such cellular dysregulations without reporting concurrent mTOR dysregulation. Given our current grant funding to extend these analyses, such experiments within this manuscript would not be feasible.

      Regarding the phenotyping methods used, we decided to assess neurite outgrowth and migration as they are both cytoskeleton dependent processes that are critical for neurodevelopment and are often regulated by the same genes. Furthermore, similar analyses have been applied to Fragile-X Syndrome, 22q11.2 deletion syndrome, and schizophrenia NPCs (Shcheglovitov A. et al., 2013; Mor-Shaked H. et al., 2016; Urbach A. et al., 2010; Kelley D. J. et al., 2008; Doers M. E. et al., 2014; Brennand K. et al., 2015; Lee I. S. et al., 2015; Marchetto M. C. et al., 2011). As such, it seems that multiple underlying etiologies can lead to similar dysregulated cellular phenotypes that can contribute to a variety of neurodevelopmental disorders. On a more global level, there are only a few different cellular functions a developing neuron can undergo, and these include processes such as proliferation, survival, migration, and differentiation. Thus, to understand neurodevelopmental disorders, it is important to study the more “crude” or “global” cellular functions occurring during neurodevelopment to determine whether they are disrupted in disorders such as ASD. In our studies we find that there are indeed dysregulations in many of these basic developmental processes, indicating that the typical steps that occur for normal brain cytoarchitecture may be disrupted in ASD. To understand why, we then further utilized molecular studies to “zoom” in on potential mechanisms which implicated common dysregulation in mTOR signaling as one driver for these common cellular phenotypes. As suggested, we did complete WGS on all the I-ASD individuals and did not see any overlapping genetic variants between the three I-ASD individuals as mentioned in our manuscript. The genetic data was published in a larger manuscript incorporating the data (Zhou A. et al., 2023). 
      However, there were variants that were unique to each I-ASD individual which were not seen in their unaffected family members, and it is possible these variants could be contributing to the I-ASD phenotypes. We also utilized IPA to conduct pathway analysis on the WGS data, using the same approach as in our analysis of the p-proteome and proteome data. From the WGS data, we selected high read-quality variants that were found only in I-ASD individuals and had a functional impact on protein (i.e., excluding synonymous variants). The enriched pathways obtained from this data were strikingly different from the pathways we found in the p-proteome analysis and are now included in supplemental Figure 6 in the manuscript. Briefly, the top 5 enriched pathways were: O-linked glycosylation, MHC class 1 signaling, Interleukin signaling, Antigen presentation, and regulation of transcription.

      Reviewer #2 (Public Review):

      1) I found interpreting how differential EF sensitivity is connected to the rest of the story difficult at times. First, it is unclear why these extracellular factors were picked. These are seemingly different in nature (a neuropeptide, a growth factor and a neuromodulator) targeting largely different pathways. This limits the interpretation of the ASD subtype-specific rescue results. One way of reframing that could help is that these are pro-migratory factors instead of EFs broadly defined that fail to promote migration in I-ASD lines due to a shared malfunctioning of the intracellular migration machinery or cell-cell interactions (possibly through tight junction signaling, Fig S2A). Yet, this doesn't explain the migration/neurite phenotypes in 16p11 lines where EF sensitivity is not altered, overall implying that divergent EF sensitivity is independent of underlying mTOR state. What is the proposed model that connects all three findings (divergent EF sensitivity based on ASD subtypes, 2 mTOR classes, convergent cellular phenotypes)?

      We thank you for the kind assessment of our manuscript and for the thought-provoking questions posed. In terms of extracellular factors, for our study, we defined an extracellular factor as any growth factor, amino acid, neurotransmitter, or neuropeptide found in the extracellular environment of the developing cells. The EFs utilized were selected due to their well-established role in regulation of early neurodevelopmental phenotypes, their expression during the “critical window” of mid-fetal development (as determined by the Allen Brain Atlas), and in the case of 5-HT, its association with ASD (Abdulamir H. A. et al., 2018; Adamsen D. et al., 2014; Bonnin A. et al., 2011; Bonnin A. et al., 2007; Chen X. et al., 2015; El Marroun H. et al., 2014; Hammock E. et al., 2012; Yang C. J. et al., 2014; Dicicco-Bloom E. et al., 1998; Lu N. et al., 1998; Suh J. et al., 2001; Watanabe J. et al., 2016; Gilmore J. H. et al., 2003; Maisonpierre P. C. et al., 1990; Dincel N. et al., 2013; Levi-Montalcini R., 1987). Lastly, prior experiments in our lab with a mouse model of neurodevelopmental disorders had shown atypical responses to EFs (IGF-1, FGF, PACAP). As such, when we first chose to use EFs in human NPCs we wanted to know 1) whether human NPCs even responded to these EFs, 2) whether EFs regulated neurite outgrowth and migration and 3) whether there would be a differential response in NPCs derived from those with ASD. Our studies were initiated on the I-ASD cohort and, given the heterogeneity of ASD, we had hypothesized we would get “personalized” neurite and migration phenotypes. For this reason, we also wanted to select multiple types of EFs that worked on different signaling pathways. Ultimately, instead of personalized phenotypes we found that all the I-ASD NPCs did not respond to any of the EFs tested whereas the 16p11.2 deletion NPCs did – this was therefore the only difference we found between these two “forms” of ASD.
      As noted, in I-ASD the lack of response to EFs can be ameliorated by modulating mTOR. However, in the 16p11.2 deletion, despite similar mTOR dysregulation as seen in I-ASD, there is no EF impairment. We do not have a cohesive model to explain why the 16pDel individuals differ from the I-ASD model other than to point to the p-proteomes, which do show that the 16pDel NPCs are distinct from the I-ASD NPCs. It seems that mTOR alteration can contribute to impaired EF responsiveness in some NPCs but perhaps there is an additional defect that needs to be present in order for this defect to manifest, or that 16p11.2 deletion NPCs have specific compensatory features. For example, as noted in the thoughtful comment, the p-proteome canonical pathway analysis shows tight junction malfunction in I-ASD which is not present in the 16pDel NPCs, and it could be the combination of mTOR dysregulation + dysregulated tight junction signaling that has led to lack of response to EFs in I-ASD. Regardless, we do not think the differences between two genetically distinct ASDs diminish the convergent mTOR results we have uncovered. That is, regardless of whatever defects are present in the ASD NPCs, we are able to rescue them with mTOR modulation, which has fascinating implications for treatment and conceptualization of ASD. Lastly, we see our EF studies as an important inclusion as they show that in some subtypes of ASD, lack of response to appropriate EFs could be contributing to neurodevelopmental abnormalities. Moreover, lack of response to these EFs could have implications for treatment of individuals with ASD (for example, SSRIs are commonly used to treat co-morbid conditions in ASD but if an individual is unresponsive to 5-HT, perhaps this treatment is less effective). We have edited the manuscript to include an additional discussion section to address the EFs more thoroughly and have included a few extra sentences in the introduction as well!

      2) A similar bidirectional migration phenotype has been described in hiPSC-derived human cortical interneurons generated from individuals with Timothy Syndrome (Birey et al 2022, Cell Stem Cell). Here, authors show that the intracellular calcium influx that is excessive in Timothy Syndrome or pharmacologically dampened in controls results in similar migration phenotypes. Authors can consider referring to this report in support of the idea that bimodal perturbations of cardinal signaling pathways can converge upon common cellular migration deficits.

      We thank you for pointing out the similar migration phenotype in the Timothy Syndrome paper and have now cited it in our manuscript. We have also expanded on the concept of “too much or too little” of a particular signaling mechanism leading to common outcomes.

      3) Given that authors have access to 8 I-ASD hiPSC lines, it'd be very informative to assay the mTOR state (e.g. pS6 westerns) in NPCs derived from all 8 lines instead of the 3 presented, even without assessing any additional cellular phenotypes, which authors have shown to be robust and consistent. This can help the readers better get a sense of the proportion of high-mTOR vs low-mTOR classes in a larger cohort.

      We have already addressed this in response to reviewer 1 and the essential revisions section, providing our reasoning for not expanding the study to all 8 I-ASD individuals.

      4) Does the mTOR modulation rescue EF-specific responses to migration as well (Figure 7)?

      We did not conduct sufficient replicates of the rescue EF specific responses to migration due to the time consuming and resource intensive nature of the neurosphere experiments. Unlike the neurite experiments, the neurosphere experiments require significantly more cells, more time, selection of neurospheres based on a size criterion, and then manual trace measurements. We did one experiment in Family-1 where we utilized MK-2206 to abolish the response of Sib NPCs to PACAP. Likewise, adding SC-79 to I-ASD-1 neurospheres allowed for response to PACAP.

      Author response image 1.

      Author response image 2.

      Reviewer #3: Public Review

      We appreciate the kind, detailed and very thorough review you provided for us!

      The results on the mTOR signaling pathway as a point of convergence in these particular ASD subtypes is interesting, but the discussion should address that this has been demonstrated for other autism syndromes, and in the present manuscript, there should be some recognition that other signaling pathways are also implicated as common factors between the ASD subtypes.

      With regard to the mTOR pathway, we had included the other ASD syndromes in which mTOR dysregulation has been seen, including tuberous sclerosis, Cowden Syndrome, NF-1, as well as Fragile-X, Angelman, Rett and Phelan McDermid, in the final paragraph of the discussion section “mTOR Signaling as a Point of Convergence in ASD”. We have now expanded our discussion to note that other signaling pathways, such as MAPK, cyclins, WNT, and reelin, have also been implicated as common factors between the ASD subtypes.

      The conclusions of this paper are mostly well supported by data, but for the cell migration assay, it is not clear if the authors control for initial differences in the inner cell mass area of the neurospheres in control vs ASD samples, which would affect the measurement of migration.

      Thank you for this thoughtful comment! When we first started collecting our migration data, inner cell mass size was indeed a major concern for which we controlled in our methods. First, when plating the neurospheres, we would only collect spheres when a majority of spheres had a diameter of approximately 100 μm. Very large spheres often could not be imaged due to being out of focus and very small spheres would often disperse when plated. Thus, there were some constraints on the variability of inner cell mass size.

      Furthermore, when we initially collected data, we conducted a proof of principle test to see if initial inner cell mass area (henceforth referred to as initial sphere size or ISS) influenced migration data. To do so, we obtained migration and ISS data from each diagnosis (Sib, NIH, I-ASD, 16pASD). Then we utilized RStudio to see if there is a relationship between Migration and ISS in each diagnosis category using the call lm(Migration ~ ISS, data = bydiagnosis). In this call, lm fits a linear model, the tilde (~) specifies that Migration is modeled as a function of ISS, and the argument data = bydiagnosis supplies the data organized by diagnosis.
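
      For readers who want to see what such a per-diagnosis fit involves, the following is a minimal sketch in Python rather than R, and it is not the script used for the analysis. It fits Migration as a linear function of ISS separately for each diagnosis and reports the slope, intercept, and R-squared. All numbers, units, and the names fit_line and toy_data are hypothetical and chosen only for illustration; the p-value that R's summary() would report is not computed here.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y as a linear function of x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-predictor model, R-squared equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical (ISS, Migration) values per diagnosis, arbitrary units.
toy_data = {
    "Sib": ([7.2, 7.9, 8.4, 9.1, 7.6], [51.0, 47.5, 52.3, 49.8, 50.4]),
    "NIH": ([7.0, 8.1, 8.8, 7.5, 9.3], [48.2, 50.9, 47.7, 51.5, 49.0]),
    "I-ASD": ([7.4, 8.0, 8.6, 9.0, 7.8], [38.1, 40.2, 37.5, 39.4, 38.8]),
    "16pASD": ([7.1, 7.7, 8.3, 8.9, 9.4], [60.3, 58.7, 61.1, 59.5, 60.0]),
}

# One independent fit per diagnosis, mirroring a model fitted group by group.
results = {dx: fit_line(iss, mig) for dx, (iss, mig) in toy_data.items()}

for dx, (slope, intercept, r2) in results.items():
    print(f"{dx}: slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

      An R-squared near zero in a given group would indicate that ISS explains little of the variance in migration for that group, which is the pattern the response describes.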

      The results were expressed as R-squared values indicating the correlation between ISS and Migration for each diagnosis, with a p-value showing the statistical significance of each comparison. As shown in Author response table 1, there is minimal correlation between Migration and ISS in each data set. Moreover, there are no statistically significant relationships between Migration and ISS, indicating that initial sphere size does not influence migration data in any of our data sets.

      Author response table 1.

      Lastly, utilizing R, we modeled what predicted migration would be like for Sib, NIH, I-ASD, and 16pASD if we accounted for ISS in each group. Raw migration data was then plotted against the predicted data as in Author response image 3.

      Author response image 3.

      As shown in the graph, there are no statistical differences between the raw migration data (the data that we actually measured in the dish) and the modeled data in which ISS is accounted for as a variable. As such, we chose not to normalize to or account for ISS in our other experiments. We have now included the above RStudio analyses in our supplemental figures (Figure S1) as well.

      Also, in Fig 5 and 6, panels I and J omit the effects of drug on mTOR phosphorylation as shown for other conditions.

      Both SC-79 and MK-2206 were selected in our experiments after thorough analysis of their effects on human epithelial cells and other cultured cells (citations in manuscript). However, initially, we did not know whether either of these drugs would modulate the mTOR pathway in human NPCs; thus, in Figures 5A, 5D, 6A and 6D we chose to focus on two of our data sets to establish the effect of these drugs in human NPCs. Our experiments in Family-1 and Family-2 showed us that SC-79 increases pS6 in human NPCs while MK-2206 downregulates it. Once this was established, we expected the drugs to have similar effects in the NPCs from the other families. Thus, we only conducted a proof of principle test to confirm the drug does indeed have the intended effect in I-ASD-3 and 16pDel. We have included these proof of principle westerns in Figure 5I, 5K, 6I and 6K to show that the effects of these drugs are reproducible across all our NPC lines. We did not include quantification since the data is only from our single proof of principle western.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript will interest cognitive scientists, neuroimaging researchers, and neuroscientists interested in the systems-level organization of brain activity. The authors describe four brain states that are present across a wide range of cognitive tasks and determine that the relative distribution of the brain states shows both commonalities and differences across task conditions.

      The authors characterized the low-dimensional latent space that has been shown to capture the major features of intrinsic brain activity using four states obtained with a Hidden Markov Model. They related the four states to previously-described functional gradients in the brain and examined the relative contribution of each state under different cognitive conditions. They showed that states related to the measured behavior for each condition differed, but that a common state appears to reflect disengagement across conditions. The authors bring together a state-of-the-art analysis of systems-level brain dynamics and cognitive neuroscience, bridging a gap that has long needed to be bridged.

      The strongest aspect of the study is its rigor. The authors use appropriate null models and examine multiple datasets (not used in the original analysis) to demonstrate that their findings replicate. Their thorough analysis convincingly supports their assertion that common states are present across a variety of conditions, but that different states may predict behavioural measures for different conditions. However, the authors could have better situated their work within the existing literature. It is not that a more exhaustive literature review is needed; rather, some of their results are unsurprising given the work reported in other manuscripts; some of their work reinforces or is reinforced by prior studies; and some of their work is not compared to similar findings obtained with other analysis approaches. While space is not unlimited, some of these gaps are important enough that they are worth addressing:

      We appreciate the reviewer’s thorough read of our manuscript and positive comments on its rigor and implications. We agree that the original version of the manuscript insufficiently situated this work in the existing literature. We have made extensive revisions to better place our findings in the context of prior work. These changes are described in detail below.

      1) The authors' own prior work on functional connectivity signatures of attention is not discussed in comparison to the latest work. Neither is work from other groups showing signatures of arousal that change over time, particularly in resting state scans. Attention and arousal are not the same things, but they are intertwined, and both have been linked to large-scale changes in brain activity that should be captured in the HMM latent states. The authors should discuss how the current work fits with existing studies.

      Thank you for raising this point. We agree that the relationship between low-dimensional latent states and predefined activity and functional connectivity signatures is an important and interesting question in both attention research and more general contexts. Here, we did not empirically relate the brain states examined in this study to the functional connectivity signatures previously investigated in our lab (e.g., Rosenberg et al., 2016; Song et al., 2021a) because the research question and methodological complexities deserve separate attention that goes beyond the scope of this paper. Therefore, we conceptually addressed the reviewer’s question on how functional connectivity signatures of attention are related to the brain states that were observed here. Next, we asked how arousal relates to the brain states by indirectly predicting arousal levels of each brain state based on its activity patterns’ spatial resemblance to the predefined arousal network template (Goodale et al., 2021).

      Latent states and dynamic functional connectivity

      Previous work suggested that, on medium time scales (~20-60 seconds), changes in functional connectivity signatures of sustained attention (Rosenberg et al., 2020) and narrative engagement (Song et al., 2021a) predicted changes in attentional states. How do these attention-related functional connectivity dynamics relate to latent state dynamics, measured on a shorter time scale (1 second)?

      Theoretically, there are reasons to think that these measures are related but not redundant. Both HMM and dynamic functional connectivity provide summary measures of the whole-brain functional interactions that evolve over time. Whereas HMM identifies recurring low-dimensional brain states, dynamic functional connectivity used in our and others’ prior studies captures high-dimensional dynamical patterns. Furthermore, while the Gaussian mixture function used to infer emission probability in our HMM infers the states from both the BOLD activity patterns and their interactions, functional connectivity considers only pairwise interactions between regions of interest. Thus, on the theoretical grounds that brain states can be characterized at multiple scales and with different methods (Greene et al., 2023), we can hypothesize that both measures could (and perhaps should be able to) capture brain-wide latent state changes. For example, if we were to apply k-means clustering to the sliding window-based dynamic functional connectivity as in Allen et al. (2014), the resulting clusters could arguably be similar to the latent states derived from the HMM.

      However, there are practical reasons why the correspondence between our prior dynamic functional connectivity models and current HMM states is difficult to test directly. A time point-by-time point matching of the HMM state sequence and dynamic functional connectivity is not feasible because, in our prior work, dynamic functional connectivity was measured in a sliding time window (~20-60 seconds), whereas the HMM state identification is conducted at every TR (1 second). An alternative would be to concatenate all time points that were categorized as each HMM state to compute representative functional connectivity of that state. This “splicing and concatenating” method, however, disrupts continuous BOLD-signal time series and has not previously been validated for use with our dynamic connectome-based predictive models. In addition, the difference in time series lengths across states would make comparisons of the four states’ functional connectomes unfair.
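
      To make the “splicing and concatenating” alternative concrete, the following is a minimal Python sketch under stated assumptions, not an analysis that was run for the manuscript. It groups TRs by their HMM state label, concatenates the frames of each state, and computes a region-by-region Pearson correlation matrix per state. The toy BOLD values, the three-region layout, the state sequence, and the function names are all hypothetical. The sketch also returns the number of TRs per state, which shows how unequal time series lengths arise.

```python
from collections import defaultdict
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    denom = sqrt(var_a * var_b)
    return cov / denom if denom > 0 else float("nan")


def state_connectivity(bold, states):
    """Concatenate the TRs assigned to each state and correlate regions pairwise.

    bold   : list of frames, one per TR; each frame holds one value per region
    states : list of state labels, one per TR
    Returns {state: (number_of_TRs, correlation_matrix)}.
    """
    frames_by_state = defaultdict(list)
    for frame, label in zip(bold, states):
        frames_by_state[label].append(frame)  # temporal continuity is lost here

    out = {}
    for label, frames in frames_by_state.items():
        n_regions = len(frames[0])
        series = [[f[r] for f in frames] for r in range(n_regions)]
        fc = [[pearson(series[i], series[j]) for j in range(n_regions)]
              for i in range(n_regions)]
        out[label] = (len(frames), fc)
    return out


# Hypothetical data: 8 TRs, 3 regions, two state labels.
toy_bold = [
    [1.0, 2.0, 0.5], [2.0, 4.1, 0.1], [3.0, 5.9, 0.9], [0.5, 1.0, 2.0],
    [1.5, 0.2, 1.0], [2.5, 3.0, 0.3], [3.5, 1.1, 1.7], [0.2, 2.2, 0.8],
]
toy_states = ["DMN", "DMN", "DMN", "SM", "SM", "SM", "SM", "SM"]

per_state = state_connectivity(toy_bold, toy_states)
for label, (n_trs, fc) in per_state.items():
    print(label, "TRs:", n_trs)
```

      Because each state contributes a different number of TRs, the per-state matrices are estimated with different precision, which is one reason a direct comparison across states would be uneven.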

      One main focus of our manuscript was to relate brain dynamics (HMM state dynamics) to static manifold (functional connectivity gradients). We agree that a direct link between two measures of brain dynamics, HMM and dynamic functional connectivity, is an important research question. However, due to some intricacies that needed to be addressed to answer this question, we felt that it was beyond the scope of our paper. We are eager, however, to explore these comparisons in future work which can more thoroughly address the caveats associated with comparing models of sustained attention, narrative engagement, and arousal defined using different input features and methods.

      Arousal, attention, and latent neural state dynamics

      Next, the reviewer posed an important question about the relationship between arousal, attention, and latent states. The current study was designed to assess the relationship between attention and latent state dynamics. However, previous neuroimaging work showed that low-dimensional brain dynamics reflect fluctuations in arousal (Raut et al., 2021; Shine et al., 2016; Zhang et al., 2023). Behavioral studies showed that attention and arousal hold a nonlinear relationship: for example, mind-wandering states are associated with lower arousal and externally distracted states are associated with higher arousal, even though both of these states indicate low attention (Esterman and Rothlein, 2019; Unsworth and Robison, 2018, 2016).

      To address the reviewer’s suggestion, we wanted to test if our brain states reflected changes in arousal, but we did not collect relevant behavioral or physiological measures. Therefore, to indirectly test for relationships, we predicted levels of arousal in brain states by applying the “arousal network template” defined by Dr. Catie Chang’s group (Chang et al., 2016; Falahpour et al., 2018; Goodale et al., 2021). The arousal network template was created from resting-state fMRI data to predict arousal levels indicated by eye monitoring and electrophysiological signals. In the original study, the arousal level at each time point was predicted by the correlation between the BOLD activity patterns of each TR to the arousal template. The more similar the whole-brain activation pattern was to the arousal network template, the higher the participant was predicted to be aroused at that moment. This activity pattern-based model was generalized to fMRI data during tasks (Goodale et al., 2021).
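
      The template-matching idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the published arousal model: the five-parcel template, the state activity patterns, and the function names are hypothetical, and a real application would use whole-brain maps. The score for each pattern is its Pearson correlation with the template, so a pattern that resembles the template more closely receives a higher predicted arousal score.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def template_similarity(patterns, template):
    """Correlate each activity pattern with the template, parcel by parcel."""
    return {name: pearson(pattern, template) for name, pattern in patterns.items()}


# Hypothetical template weights and state activity patterns over five parcels.
toy_template = [0.8, -0.2, 0.5, -0.6, 0.1]
toy_patterns = {
    "DMN": [0.6, 0.1, 0.3, -0.4, 0.2],
    "SM": [-0.5, 0.4, -0.1, 0.7, 0.0],
}

scores = template_similarity(toy_patterns, toy_template)
for name, r in scores.items():
    print(f"{name}: r = {r:.3f}")
```

      The same function could be applied to the activity pattern of each TR instead of each state, which corresponds to the time point-wise prediction described for the original template study.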

      We correlated the arousal template to the activity patterns of the four brain states that were inferred by the HMM. The DMN state was positively correlated with the arousal template (r=0.264) and the SM state was negatively correlated with the arousal template (r=-0.303) (Author response image 1). These values were not tested for significance because they were single observations. While speculative, this may suggest that participants are in a high arousal state during the DMN state and a low arousal state during the SM state. Together with our results relating brain states to attention, it is possible that the SM state is a common state indicating low arousal and low attention. On the other hand, the DMN state, a signature of a highly aroused state, may benefit gradCPT task performance but not necessarily in engaging with a sitcom episode. However, because this was a single observation and we did not collect a physiological measure of arousal to validate this indirect prediction result, we did not include the result in the manuscript. We hope to more directly test this question in future work with behavioral and physiological measures of arousal.

      Author response image 1.

      Changes made to the manuscript

      Importantly, we agree with the reviewer that a theoretical discussion about the relationships between functional connectivity, latent states, gradients, as well as attention and arousal was a critical omission from the original Discussion. We edited the Discussion to highlight past literature on these topics and encourage future work to investigate these relationships.

      [Manuscript, page 11] “Previous studies showed that large-scale neural dynamics that evolve over tens of seconds capture meaningful variance in arousal (Raut et al., 2021; Zhang et al., 2023) and attentional states (Rosenberg et al., 2020; Yamashita et al., 2021). We asked whether latent neural state dynamics reflect ongoing changes in attention in both task and naturalistic contexts.”

      [Manuscript, page 17] “Previous work showed that time-resolved whole-brain functional connectivity (i.e., paired interactions of more than a hundred parcels) predicts changes in attention during task performance (Rosenberg et al., 2020) as well as movie-watching and story-listening (Song et al., 2021a). Future work could investigate whether functional connectivity and the HMM capture the same underlying “brain states” to bridge the results from the two literatures. Furthermore, though the current study provided evidence of neural state dynamics reflecting attention, the same neural states may, in part, reflect fluctuations in arousal (Chang et al., 2016; Zhang et al., 2023). Complementing behavioral studies that demonstrated a nonlinear relationship between attention and arousal (Esterman and Rothlein, 2019; Unsworth and Robison, 2018, 2016), future studies collecting behavioral and physiological measures of arousal can assess the extent to which attention explains neural state dynamics beyond what can be explained by arousal fluctuations.”

      2) The 'base state' has been described in a number of prior papers (for one early example, see https://pubmed.ncbi.nlm.nih.gov/27008543). The idea that it might serve as a hub or intermediary for other states has been raised in other studies, and discussion of the similarity or differences between those studies and this one would provide better context for the interpretation of the current work. One of the intriguing findings of the current study is that the incidence of this base state increases during sitcom watching, the strongest evidence to date that it has a cognitive role and is not merely a configuration of activity that the brain must pass through when making a transition.

      We greatly appreciate the reviewer’s suggestion of prior papers. We were not aware of previous findings of the base state at the time of writing the manuscript, so it was reassuring to see consistent findings. In the Discussion, we highlighted the findings of Chen et al. (2016) and Saggar et al. (2022). Both studies highlighted the role of the base state as a “hub”-like transition state. However, as the reviewer noted, these studies did not address the functional relevance of this state to cognitive states because both were based on resting-state fMRI.

      In our revised Discussion, we write that our work replicates previous findings of the base state that consistently acted as a transitional hub state in macroscopic brain dynamics. We also note that our study expands this line of work by characterizing what functional roles the base state plays in multiple contexts: The base state indicated high attentional engagement and exhibited the highest occurrence proportion as well as longest dwell times during naturalistic movie watching. The base state’s functional involvement was comparatively minor during controlled tasks.

      [Manuscript, page 17-18] “Past resting-state fMRI studies have reported the existence of the base state. Chen et al. (2016) used the HMM to detect a state that had “less apparent activation or deactivation patterns in known networks compared with other states”. This state had the highest occurrence probability among the inferred latent states, was consistently detected by the model, and was most likely to transition to and from other states, all of which mirror our findings here. The authors interpret this state as an “intermediate transient state that appears when the brain is switching between other more reproducible brain states”. The observation of the base state was not confined to studies using HMMs. Saggar et al. (2022) used topological data analysis to represent a low-dimensional manifold of resting-state whole-brain dynamics as a graph, where each node corresponds to brain activity patterns of a cluster of time points. Topologically focal “hub” nodes were represented uniformly by all functional networks, meaning that no characteristic activation above or below the mean was detected, similar to what we observe with the base state. The transition probability from other states to the hub state was the highest, demonstrating its role as a putative transition state.

      However, the functional relevance of the base state to human cognition had not been explored previously. We propose that the base state, a transitional hub (Figure 2B) positioned at the center of the gradient subspace (Figure 1D), functions as a state of natural equilibrium. Transitioning to the DMN, DAN, or SM states reflects incursion away from natural equilibrium (Deco et al., 2017; Gu et al., 2015), as the brain enters a functionally modular state. Notably, the base state indicated high attentional engagement (Figure 5E and F) and exhibited the highest occurrence proportion (Figure 3B) as well as the longest dwell times (Figure 3—figure supplement 1) during naturalistic movie watching, whereas its functional involvement was comparatively minor during controlled tasks. This significant relevance to behavior verifies that the base state cannot simply be a byproduct of the model. We speculate that susceptibility to both external and internal information is maximized in the base state—allowing for roughly equal weighting of both sides so that they can be integrated to form a coherent representation of the world—at the expense of the stability of a certain functional network (Cocchi et al., 2017; Fagerholm et al., 2015). When processing rich narratives, particularly when a person is fully immersed without having to exert cognitive effort, a less modular state with high degrees of freedom to reach other states may be more likely to be involved. The role of the base state should be further investigated in future studies.”

      3) The link between latent states and functional connectivity gradients should be considered in the context of prior work showing that the spatiotemporal patterns of intrinsic activity that account for most of the structure in resting state fMRI also sweep across functional connectivity gradients (https://pubmed.ncbi.nlm.nih.gov/33549755/). In fact, the spatiotemporal dynamics may give rise to the functional connectivity gradients (https://pubmed.ncbi.nlm.nih.gov/35902649/). HMM states bear a marked resemblance to the high-activity phases of these patterns and are likely to be closely linked to them. The spatiotemporal patterns are typically obtained during rest, but they have been reported during task performance (https://pubmed.ncbi.nlm.nih.gov/30753928/) which further suggests a link to the current work. Similar patterns have been observed in anesthetized animals, which also reinforces the conclusion of the current work that the states are fundamental aspects of the brain's functional organization.

      We appreciate the comments that relate spatiotemporal patterns, functional connectivity gradients, and the latent states derived from the HMM. Our work was also inspired by the papers that the reviewer suggested, especially Bolt et al.’s (2022), which compared the results of numerous dimensionality and clustering algorithms and suggested three spatiotemporal patterns that seemed to be commonly supported across algorithms. We originally cited these studies throughout the manuscript, but did not discuss them comprehensively. We have revised the Discussion to situate our findings on past work that used resting-state fMRI to study low-dimensional latent brain states.

      [Manuscript, page 15-16] “This perspective is supported by previous work that has used different methods to capture recurring low-dimensional states from spontaneous fMRI activity during rest. For example, to extract time-averaged latent states, early resting-state analyses identified task-positive and task-negative networks using seed-based correlation (Fox et al., 2005). Dimensionality reduction algorithms such as independent component analysis (Smith et al., 2009) extracted latent components that explain the largest variance in fMRI time series. Other lines of work used time-resolved analyses to capture latent state dynamics. For example, variants of clustering algorithms, such as co-activation patterns (Liu et al., 2018; Liu and Duyn, 2013), k-means clustering (Allen et al., 2014), and HMM (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017), characterized fMRI time series as recurrences of and transitions between a small number of states. Time-lag analysis was used to identify quasiperiodic spatiotemporal patterns of propagating brain activity (Abbas et al., 2019; Yousefi and Keilholz, 2021). A recent study extensively compared these different algorithms and showed that they all report qualitatively similar latent states or components when applied to fMRI data (Bolt et al., 2022). While these studies used different algorithms to probe data-specific brain states, this work and ours report common latent axes that follow a long-standing theory of large-scale human functional systems (Mesulam, 1998). Neural dynamics span principal axes that dissociate unimodal to transmodal and sensory to motor information processing systems.”

      Reviewer #2 (Public Review):

      In this study, Song and colleagues applied a Hidden Markov Model to whole-brain fMRI data from the unique SONG dataset and a grad-CPT task, and in doing so observed robust transitions between low-dimensional states that they then attributed to specific psychological features extracted from the different tasks.

      The methods used appeared to be sound and robust to parameter choices. Whenever choices were made regarding specific parameters, the authors demonstrated that their approach was robust to different values, and also replicated their main findings on a separate dataset.

      I was mildly concerned that similarities in some of the algorithms used may have rendered some of the inter-measure results as somewhat inevitable (a hypothesis that could be tested using appropriate null models).

      This work is quite integrative, linking together a number of previous studies into a framework that allows for interesting follow-up questions.

      Overall, I found the work to be robust, interesting, and integrative, with a wide-ranging citation list and exciting implications for future work.

      We appreciate the reviewer’s comments on the study’s robustness and future implications. Our work was highly motivated by the reviewer’s prior work.

      Reviewer #3 (Public Review):

      My general assessment of the paper is that the analyses done after they find the model are exemplary and show some interesting results. However, the method they use to find the number of states (Calinski-Harabasz score instead of log-likelihood), the model they use generally (HMM), and the fact that they don't show how they find the number of states on HCP, with the Schaefer atlas, and do not report their R^2 on a test set is a little concerning. I don't think this per se impedes their results, but it is something that they can improve. They argue that the states they find align with long-standing ideas about the functional organization of the brain and align with other research, but they can improve their selection for their model.

      We appreciate the reviewer’s thorough read of the paper, evaluation of our analyses linking brain states to behavior as “exemplary”, and important questions about the modeling approach. We have included detailed responses below and updated the manuscript accordingly.

      Strengths:

      • Use multiple datasets, multiple ROIs, and multiple analyses to validate their results

      • Figures are convincing in the sense that patterns clearly synchronize between participants

      • Authors select the number of states using the optimal model fit (although this turns out to be a little more questionable due to what they quantify as 'optimal model fit')

      We address this concern on page 30-31 of this response letter.

      • Replication with Schaefer atlas makes results more convincing

      • The analyses around the fact that the base state acts as a flexible hub are well done and well explained

      • Their comparison of synchrony is well-done and comparing it to resting-state, which does not have any significant synchrony among participants is obvious, but still good to compare against.

      • Their results with respect to similar narrative engagement being correlated with similar neural state dynamics are well done and interesting.

      • Their results on event boundaries are compelling and well done. However, I do not find their Chang et al. results convincing (Figure 4B), it could just be because it is a different medium that explains differences in DMN response, but to me, it seems like these are just altogether different patterns that can not 100% be explained by their method/results.

      We entirely agree with the reviewer that the Chang et al. (2021) data are different in many ways from our own SONG dataset. Whereas data from Chang et al. (2021) were collected while participants listened to an audio-only narrative, participants in the SONG sample watched and listened to audiovisual stimuli. They were scanned at different universities in different countries with different protocols by different research groups for different purposes. That is, there are numerous reasons why we would expect the model should not generalize. Thus, we found it compelling and surprising that, despite all of these differences between the datasets, the model trained on the SONG dataset generalized to the data from Chang et al. (2021). The results highlighted a robust increase in the DMN state occurrence and a decrease in the base state occurrence after the narrative event boundaries, irrespective of whether the stimulus was an audiovisual sitcom episode or a narrated story. This external model validation was a way that we tested the robustness of our own model and the relationship between neural state dynamics and cognitive dynamics.

      • Their results that when there is no event, transition into the DMN state comes from the base state is 50% is interesting and a strong result. However, it is unclear if this is just for the sitcom or also for Chang et al.'s data.

      We apologize for the lack of clarity. We show the statistical results of the two sitcom episodes as well as Chang et al.’s (2021) data in Figure 4—figure supplement 2 in our original manuscript. Here, we provide the exact values of the base-to-DMN state transition probability, and how they differ across moments after event boundaries compared to non-event boundaries.

      For sitcom episode 1, the probability of base-to-DMN state transition was 44.6 ± 18.8 % at event boundaries versus 62.0 ± 10.4 % at non-event boundaries (FDR-p = 0.0013). For sitcom episode 2, the probability of base-to-DMN state transition was 44.1 ± 18.0 % at event boundaries versus 62.2 ± 7.6 % at non-event boundaries (FDR-p = 0.0006). For the Chang et al. (2021) dataset, the probability of base-to-DMN state transition was 33.3 ± 15.9 % at event boundaries versus 58.1 ± 6.4 % at non-event boundaries (FDR-p < 0.0001). Thus, our result, “At non-event boundaries, the DMN state was most likely to transition from the base state, accounting for more than 50% of the transitions to the DMN state” (pg 11, line 24-25), holds true for both the internal and external datasets.
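
      As a rough illustration of the quantity reported above, the following sketch computes, for a single state sequence, the share of transitions into a target state that originate from a given source state. It is a simplified sketch under stated assumptions: the function name `incoming_transition_share` and the toy sequence are hypothetical, and the restriction to time windows around event or non-event boundaries is omitted.

```python
import numpy as np


def incoming_transition_share(states, target, source):
    """Among transitions INTO `target`, the fraction that came FROM `source`.

    states: 1-D sequence of state labels, one per TR.
    Returns nan when the sequence contains no transition into `target`.
    """
    states = np.asarray(states)
    prev, curr = states[:-1], states[1:]
    # A transition into the target: current TR is the target, previous TR is not.
    into_target = (curr == target) & (prev != target)
    n_into = int(into_target.sum())
    if n_into == 0:
        return float("nan")
    from_source = (prev == source) & into_target
    return float(from_source.sum() / n_into)


# Toy sequence (hypothetical): three entries into DMN, two of them from base.
seq = ["base", "base", "DMN", "DMN", "SM", "DMN", "base", "DMN", "DAN", "DAN"]
share = incoming_transition_share(seq, target="DMN", source="base")
```

      In a full analysis this share would presumably be computed per participant within each boundary condition and then summarized across participants, which is how a mean ± SD per condition could arise.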

      • The involvement of the base state as being highly engaged during the comedy sitcom and the movie are interesting results that warrant further study into the base state theory they pose in this work.

      • It is good that they make sure SM states are not just because of head motion (P 12).

      • Their comparison between functional gradient and neural states is good, and their results are generally well-supported, intuitive, and interesting enough to warrant further research into them. Their findings on the context-specificity of their DMN and DAN state are interesting and relate well to the antagonistic relationship in resting-state data.

      Weaknesses:

      • Authors should train the model on part of the data and validate on another

      Thank you for raising this issue. To the best of our knowledge, past work that applied the HMM to the fMRI data has conducted training and inference on the same data, including initial work that implemented HMM on the resting-state fMRI (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017) as well as more recent work that applied HMMs to the task or movie-watching fMRI (Cornblath et al., 2020; Taghia et al., 2018; van der Meer et al., 2020; Yamashita et al., 2021). That is, the parameters—emission probability, transition probability, and initial probability—were estimated from the entire dataset and the latent state sequence was inferred using the Viterbi algorithm on the same dataset.

      However, we were also aware of the potential problem this may have. Therefore, in our recent work asking a different research question in another fMRI dataset (Song et al., 2021b), we trained an HMM on a subset of the dataset (moments when participants were watching movie clips in the original temporal order) and inferred latent state sequence of the fMRI time series in another subset of the dataset (moments when participants were watching movie clips in a scrambled temporal order). To the best of our knowledge, this was the first paper that used different segments of the data to fit and infer states from the HMM.

      In the current study, we wanted to capture brain states that underlie brain activity across contexts. Thus, we presented the same-dataset training and inference procedure as our primary result. However, for every main result, we also showed results where we separated the data used for model fitting and state inference. That is, we fit the HMM on the SONG dataset, primarily report the inference results on the SONG dataset, but also report inference on the external datasets that were not included in model fitting. The datasets used were the Human Connectome Project dataset (Van Essen et al., 2013), Chang et al. (2021) audio-listening dataset, Rosenberg et al. (2016) gradCPT dataset, and Chen et al. (2017) Sherlock dataset.

      However, to further address the reviewer’s concern about whether the HMM fit is reliable when applied to held-out data, we computed the reliability of the HMM inference by conducting cross-validation and split-half reliability analyses.

      (1) Cross-validation

      To separate the dataset used for HMM training and inference, we conducted cross-validation on the SONG dataset (N=27) by training the model with the data from 26 participants and inferring the latent state sequence of the held-out participant.

      First, we compared the robustness of the model training by comparing the mean activity patterns of the four latent states fitted at the group level (N=27) with the mean activity patterns of the four states fitted across cross-validation folds. Pearson’s correlations between the group-level vs. cross-validated latent states’ mean activity patterns were r = 0.991 ± 0.010, with a range from 0.963 to 0.999.

      Second, we compared the robustness of model inference by comparing the latent state sequences that were inferred at the group level vs. from held-out participants in a cross-validation scheme. All fMRI conditions had mean similarity higher than 90%; Rest 1: 92.74 ± 5.02 %, Rest 2: 92.74 ± 4.83 %, GradCPT face: 92.97 ± 6.41 %, GradCPT scene: 93.27 ± 5.76 %, Sitcom ep1: 93.31 ± 3.92 %, Sitcom ep2: 93.13 ± 4.36 %, Documentary: 92.42 ± 4.72 %.

      Third, with the latent state sequences inferred from cross-validation, we replicated the analysis of Figure 3 to test for synchrony of the latent state sequences across participants. The cross-validated results were highly similar to manuscript Figure 3, which was generated from the group-level analysis. Mean synchrony of latent state sequences are as follows: Rest 1: 25.90 ± 3.81 %, Rest 2: 25.75 ± 4.19 %, GradCPT face: 27.17 ± 3.86 %, GradCPT scene: 28.11 ± 3.89 %, Sitcom ep1: 40.69 ± 3.86 %, Sitcom ep2: 40.53 ± 3.13 %, Documentary: 30.13 ± 3.41 %.

      Author response image 2.
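
      The two percentage measures used in the cross-validation steps above can be sketched as follows. This assumes that sequence similarity is the percentage of time points assigned the same state label, and that synchrony is the mean of that percentage over all participant pairs; both definitions are assumptions made for illustration, and the function names and toy sequences are hypothetical.

```python
import numpy as np


def sequence_agreement(seq_a, seq_b):
    """Percentage of time points at which two state sequences carry the same label."""
    a, b = np.asarray(seq_a), np.asarray(seq_b)
    if a.shape != b.shape:
        raise ValueError("sequences must have the same length")
    return 100.0 * float(np.mean(a == b))


def mean_pairwise_synchrony(sequences):
    """Mean agreement (%) over all unordered pairs of participants.

    sequences: 2-D array-like, participants x time points.
    """
    seqs = np.asarray(sequences)
    n = seqs.shape[0]
    values = [
        sequence_agreement(seqs[i], seqs[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return float(np.mean(values))


# Toy example (hypothetical): group-level vs. held-out inference for one participant.
group_level = [0, 0, 1, 1, 2, 2, 3, 3]
held_out = [0, 0, 1, 2, 2, 2, 3, 3]
agreement = sequence_agreement(group_level, held_out)

# Toy example (hypothetical): three participants, four time points each.
synchrony = mean_pairwise_synchrony([[0, 0, 1, 1], [0, 0, 1, 2], [0, 1, 1, 2]])
```

      Note that state labels from separately fitted models must be aligned to a common ordering before agreement is computed; that alignment step is not shown here.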

      (2) Split-half reliability

      To test for the internal robustness of the model, we randomly assigned SONG dataset participants into two groups and fit the HMM separately in each. Similarities (Pearson’s correlations) between the two groups’ activation patterns were DMN: 0.791, DAN: 0.838, SM: 0.944, base: 0.837. Similarities of the covariance patterns were DMN: 0.995, DAN: 0.996, SM: 0.994, base: 0.996.

      Author response image 3.

      We further validated the split-half reliability of the model using the HCP dataset, which contains data from a larger sample (N=119). Similarities (Pearson’s correlations) between the two groups’ activation patterns were DMN: 0.998, DAN: 0.997, SM: 0.993, base: 0.923. Similarities of the covariance patterns were DMN: 0.995, DAN: 0.996, SM: 0.994, base: 0.996.
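
      A split-half comparison of this kind requires pairing the states from the two independently fitted models before correlating them, because state ordering is arbitrary across fits. The sketch below shows one simple way to do this, by searching all permutations for the pairing that maximizes the summed Pearson correlation. The function name `match_states` and the toy patterns are hypothetical, and the exhaustive search is an assumption chosen for clarity with a small number of states, not a description of the procedure used in the study.

```python
import itertools

import numpy as np


def match_states(patterns_a, patterns_b):
    """Pair states from two HMM fits by activation-pattern similarity.

    patterns_a, patterns_b: K x P arrays (K states, P parcels).
    Returns (perm, r) where state i of fit A is paired with state perm[i] of fit B,
    and r[i] is the Pearson correlation of that pair.
    """
    a = np.asarray(patterns_a, dtype=float)
    b = np.asarray(patterns_b, dtype=float)
    k = a.shape[0]
    # Cross-correlation block: corr[i, j] = r(a_i, b_j).
    corr = np.corrcoef(a, b)[:k, k:]
    best = max(
        itertools.permutations(range(k)),
        key=lambda p: sum(corr[i, p[i]] for i in range(k)),
    )
    return list(best), [float(corr[i, best[i]]) for i in range(k)]


# Toy data (hypothetical): fit B recovers fit A's states in shuffled order plus noise.
rng = np.random.default_rng(1)
half_a = rng.standard_normal((4, 30))
order = [2, 0, 3, 1]  # row j of half_b is a noisy copy of row order[j] of half_a
half_b = half_a[order] + 0.1 * rng.standard_normal((4, 30))
perm, r_per_state = match_states(half_a, half_b)
```

      With K = 4 there are only 24 permutations, so exhaustive search is cheap; for larger K an assignment solver would be the more practical choice.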

      Together the cross-validation and split-half reliability results demonstrate that the HMM results reported in the manuscript are reliable and robust to the way we conducted the analysis. The result of the split-half reliability analysis is added in the Results.

      [Manuscript, page 3-4] “Neural state inference was robust to the choice of 𝐾 (Figure 1—figure supplement 1) and the fMRI preprocessing pipeline (Figure 1—figure supplement 5) and consistent when conducted on two groups of randomly split-half participants (Pearson’s correlations between the two groups’ latent state activation patterns: DMN: 0.791, DAN: 0.838, SM: 0.944, base: 0.837).”

      • Comparison with just PCA/functional gradients is weak in establishing whether HMMs are good models of the timeseries. Especially given that the HMM does not explain a lot of variance in the signal (~0.5 R^2 for only 27 brain regions) for PCA. I think they don't report their own R^2 of the timeseries

      We agree with the reviewer that the PCA that we conducted to compare with the explained variance of the functional gradients was not directly comparable because PCA and gradients utilize different algorithms to reduce dimensionality. To make more meaningful comparisons, we removed the data-specific PCA results and replaced them with data-specific functional gradients (derived from the SONG dataset). This allows us to directly compare SONG-specific functional gradients with predefined gradients (derived from the resting-state HCP dataset from Margulies et al. [2016]). We found that the degrees to which the first two predefined gradients explained whole-brain fMRI time series (SONG: 𝑟² = 0.097, HCP: 0.084) were comparable to the amount of variance explained by the first two data-specific gradients (SONG: 𝑟² = 0.100, HCP: 0.086). Thus, the predefined gradients explain as much variance in the SONG data time series as SONG-specific gradients do. This supports our argument that the low-dimensional manifold is largely shared across contexts, and that the common HMM latent states may tile the predefined gradients.

      These analyses and results were added to the Results, Methods, and Figure 1—figure supplement 8. Here, we only attach changes to the Results section for simplicity, but please see the revised manuscript for further changes.

      [Manuscript, page 5-6] “We hypothesized that the spatial gradients reported by Margulies et al. (2016) act as a low-dimensional manifold over which large-scale dynamics operate (Bolt et al., 2022; Brown et al., 2021; Karapanagiotidis et al., 2020; Turnbull et al., 2020), such that traversals within this manifold explain large variance in neural dynamics and, consequently, cognition and behavior (Figure 1C). To test this idea, we situated the mean activity values of the four latent states along the gradients defined by Margulies et al. (2016) (see Methods). The brain states tiled the two-dimensional gradient space with the base state at the center (Figure 1D; Figure 1—figure supplement 7). The Euclidean distances between these four states were maximized in the two-dimensional gradient space, compared to a chance where the four states were inferred from circular-shifted time series (p < 0.001). For the SONG dataset, the DMN and SM states fell at more extreme positions of the primary gradient than expected by chance (both FDR-p values = 0.004; DAN and SM states, FDR-p values = 0.171). For the HCP dataset, the DMN and DAN states fell at more extreme positions on the primary gradient (both FDR-p values = 0.004; SM and base states, FDR-p values = 0.076). No state was consistently found at the extremes of the secondary gradient (all FDR-p values > 0.021).

      We asked whether the predefined gradients explain as much variance in neural dynamics as latent subspace optimized for the SONG dataset. To do so, we applied the same nonlinear dimensionality reduction algorithm to the SONG dataset’s ROI time series. Of note, the SONG dataset includes 18.95% rest, 15.07% task, and 65.98% movie-watching data whereas the data used by Margulies et al. (2016) was 100% rest. Despite these differences, the SONG-specific gradients closely resembled the predefined gradients, with significant Pearson’s correlations observed for the first (r = 0.876) and second (r = 0.877) gradient embeddings (Figure 1—figure supplement 8). Gradients identified with the HCP data also recapitulated Margulies et al.’s (2016) first (r = 0.880) and second (r = 0.871) gradients. We restricted our analysis to the first two gradients because the two gradients together explained roughly 50% of the entire variance of functional brain connectome (SONG: 46.94%, HCP: 52.08%), and the explained variance dropped drastically from the third gradients (more than 1/3 drop compared to second gradients). The degrees to which the first two predefined gradients explained whole-brain fMRI time series (SONG: 𝑟² = 0.097, HCP: 0.084) were comparable to the amount of variance explained by the first two data-specific gradients (SONG: 𝑟² = 0.100, HCP: 0.086; Figure 1—figure supplement 8). Thus, the low-dimensional manifold captured by Margulies et al. (2016) gradients is highly replicable, explaining brain activity dynamics as well as data-specific gradients, and is largely shared across contexts and datasets. This suggests that the state space of whole-brain dynamics closely recapitulates low-dimensional gradients of the static functional brain connectome.”

      The reviewer also pointed out that the PCA-gradient comparison was weak in establishing whether HMMs are good models of the time series. However, we would like to point out that the purpose of the comparison was not to validate the performance of the HMM. Instead, we wanted to test whether the gradients introduced by Margulies et al. (2016) could act as a generalizable low-dimensional manifold of brain state dynamics. To argue that the predefined gradients are a shared manifold, these gradients should explain SONG data fMRI time series as much as the principal components derived directly from the SONG data. Our results showed comparable 𝑟², both in predefined gradient vs. data-specific PC comparisons and predefined gradient vs. data-specific gradient comparisons, which supported our argument that the predefined gradients could be the shared embedding space across contexts and datasets.

      The reviewer pointed out that an 𝑟² of ~0.5 does not explain enough variance in the fMRI signal. However, we respectfully disagree with this point because there is no established criterion for what constitutes a high or low 𝑟² for this type of analysis. Of note, previous literature that also applied PCA to fMRI time series (Author response image 4A and 4B) (Lynn et al., 2021; Shine et al., 2019) also found that the cumulative explained variance of the top 5 principal components is around 50%. Author response image 4C shows the cumulative variance of the resting-state functional connectome that is explained by the gradients (Margulies et al., 2016).

      Author response image 4.

      Finally, the reviewer pointed out that the 𝑟² of the HMM-derived latent sequence to the fMRI time series should be reported. However, there is no standardized way of measuring the explained variance of the HMM inference. There is no report of explained variance in the traditional HMM-fMRI papers (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017). Rather than 𝑟², the HMM computes the log likelihood of the model fit. However, because log likelihood values are dependent on the number of data points, studies do not report log likelihood values nor do they use these metrics to interpret the goodness of model fit.

      To ask whether the goodness of the HMM fit was significantly above chance, we compared the log likelihood of the HMM to the log likelihood distribution of null HMM fits. First, we extracted the log likelihood of the HMM fit to the real fMRI time series. We then fit null HMMs to circular-shifted fMRI time series, repeating this 1,000 times to build a chance distribution. The log likelihood of the real model was significantly higher than the chance distribution, with a z-value of 2182.5 (p < 0.001). This indicates that the HMM explained a large amount of variance in our fMRI time series data, significantly above chance.
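
      The logic of this null comparison can be sketched generically: compute a fit score on the real data, recompute it on circular-shifted surrogates, and express the real score as a z-value relative to the surrogate distribution. The sketch below is illustrative only. In the actual analysis the score would be the HMM log likelihood; here `mean_pairwise_corr` is a stand-in score so that the example runs without an HMM library. Shifting each region by an independent random offset is also an assumption, as are all function names and the toy data.

```python
import numpy as np


def circular_shift_null_z(data, score_fn, n_null=1000, seed=0):
    """z-value and permutation p-value of a score against a circular-shift null.

    data: T x R array (time points x regions).
    score_fn: callable mapping a T x R array to a scalar (higher = better fit).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_time, n_regions = data.shape
    observed = score_fn(data)
    null = np.empty(n_null)
    for i in range(n_null):
        # Assumption: every region gets its own random non-zero circular offset.
        shifts = rng.integers(1, n_time, size=n_regions)
        shifted = np.column_stack(
            [np.roll(data[:, r], s) for r, s in enumerate(shifts)]
        )
        null[i] = score_fn(shifted)
    z = (observed - null.mean()) / null.std(ddof=1)
    p = (1 + np.sum(null >= observed)) / (n_null + 1)
    return float(z), float(p)


def mean_pairwise_corr(x):
    """Stand-in score (NOT a log likelihood): mean correlation between regions."""
    c = np.corrcoef(x, rowvar=False)
    upper = np.triu_indices_from(c, k=1)
    return float(c[upper].mean())


# Toy data (hypothetical): five regions sharing one underlying signal plus noise.
rng = np.random.default_rng(2)
shared = rng.standard_normal(200)
toy = shared[:, None] + 0.5 * rng.standard_normal((200, 5))
z_value, p_value = circular_shift_null_z(toy, mean_pairwise_corr, n_null=200)
```

      Circular shifting preserves each region's own temporal autocorrelation while breaking the alignment between regions, which is what makes it a reasonable null for structure that depends on coordinated activity.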

      • Authors do not specify whether they also did cross-validation for the HCP dataset to find 4 clusters

      We apologize for the lack of clarity. When we computed the Calinski-Harabasz score with the HCP dataset, three was chosen as the optimal number of states (Author response image 5A). When we set K to 3, the HMM inferred the DMN, DAN, and SM states (Author response image 5C). The base state was included when K was set to 4 (Author response image 5B). The activation pattern similarities of the DMN, DAN, and SM states were r = 0.981, 0.984, and 0.911, respectively.

      Author response image 5.
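
      For readers unfamiliar with the Calinski-Harabasz score, it is the ratio of between-cluster dispersion to within-cluster dispersion, each normalized by its degrees of freedom, so higher values indicate tighter and better-separated clusters. The sketch below implements the standard definition on toy data. Treating each time point as an observation and its inferred state as the cluster label is an assumption about how the score would be applied to HMM output; the function name and data are hypothetical.

```python
import numpy as np


def calinski_harabasz(X, labels):
    """Calinski-Harabasz score: [B / (k - 1)] / [W / (n - k)].

    X: n x d array of observations (e.g., time points x regions).
    labels: length-n array of cluster (state) assignments.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    clusters = np.unique(labels)
    k = clusters.size
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than samples")
    overall_mean = X.mean(axis=0)
    between = 0.0  # B: size-weighted squared distance of centroids from the grand mean
    within = 0.0   # W: squared distance of samples from their own centroid
    for c in clusters:
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += members.shape[0] * np.sum((centroid - overall_mean) ** 2)
        within += np.sum((members - centroid) ** 2)
    return float((between / (k - 1)) / (within / (n - k)))


# Toy data (hypothetical): two well-separated groups on a line.
X = [[0.0], [2.0], [10.0], [12.0]]
good_score = calinski_harabasz(X, [0, 0, 1, 1])  # labels follow the true grouping
bad_score = calinski_harabasz(X, [0, 1, 0, 1])   # labels cut across the grouping
```

      Selecting K then amounts to computing this score for each candidate number of states and choosing the maximum, which is a statement about clustering quality rather than about whether additional states are meaningful.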

      We did not use K = 3 for the HCP data replication because we were not trying to test whether these four states would be the optimal set of states in every dataset. Although the Calinski-Harabasz score chose K = 3 because it showed the best clustering performance, this does not mean that the base state is not meaningful to this dataset. Likewise, the latent states that are inferred when we increase/decrease the number of states are also meaningful states. For example, in Figure 1—figure supplement 1, we show an example of the SONG dataset’s latent states when we set K to 7. The seven latent states included the DAN, SM, and base states, the DMN state was subdivided into DMN-A and DMN-B states, and the FPN state and DMN+VIS state were included. Setting a higher number of states like K = 7 would mean that we are capturing brain state dynamics in a higher dimension than when using K = 4. Because we are utilizing a higher number of states, a model set to K = 7 would inevitably capture a larger variance of fMRI time series than a model set to K = 4.

      The purpose of latent state replication with the HCP dataset was to validate the generalizability of the DMN, DAN, SM, and base states. Before characterizing these latent states’ relevance to cognition, we needed to verify that these latent states were not simply overfit to the SONG dataset. The fact that the HMM revealed a similar set of latent states when applied to the HCP dataset suggested that the states were not merely specific to SONG data.

      To make our points clearer in the manuscript, we emphasized that we are not arguing for the four states to be the exclusive states. We made edits to Discussion as follows.

      [Manuscript, page 16] “Our study adopted the assumption of low dimensionality of large-scale neural systems, which led us to intentionally identify only a small number of states underlying whole-brain dynamics. Importantly, however, we do not claim that the four states will be the optimal set of states in every dataset and participant population. Instead, latent states and patterns of state occurrence may vary as a function of individuals and tasks (Figure 1—figure supplement 2). Likewise, while the lowest dimensions of the manifold (i.e., the first two gradients) were largely shared across datasets tested here, we do not argue that it will always be identical. If individuals and tasks deviate significantly from what was tested here, the manifold may also differ along with changes in latent states (Samara et al., 2023). Brain systems operate at different dimensionalities and spatiotemporal scales (Greene et al., 2023), which may have different consequences for cognition. Asking how brain states and manifolds—probed at different dimensionalities and scales—flexibly reconfigure (or not) with changes in contexts and mental states is an important research question for understanding complex human cognition.”

      • One of their main contributions is the base state but the correlation between the base state in their Song dataset and the HCP dataset is only 0.399

      This is a good point. However, there is precedent for lower spatial pattern correlation of the base state compared to other states in the literature.

      Compared to the DMN, DAN, and SM states, the base state did not show characteristic activation or deactivation of functional networks. Most of the functional networks showed activity levels close to the mean (z = 0). With this flattened activation pattern, relatively low activation pattern similarity was observed between the SONG base state and the HCP base state.
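
      The reasoning above can be illustrated with a small simulation. This is a minimal sketch, not the analysis code used in the manuscript: the parcel count, noise level, and pattern amplitudes are all hypothetical values chosen for illustration. It correlates two noisy estimates of the same underlying activity pattern, once for a pattern with strong activation and deactivation and once for a pattern that stays close to z = 0.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def replicate_similarity(pattern_sd, noise_sd=0.2, n_parcels=200, seed=0):
    """Correlate two noisy estimates of one underlying activity pattern.

    pattern_sd controls how strongly parcels deviate from z = 0;
    noise_sd is the (hypothetical) estimation noise in each dataset.
    """
    rng = random.Random(seed)
    true_pattern = [rng.gauss(0.0, pattern_sd) for _ in range(n_parcels)]
    dataset_a = [v + rng.gauss(0.0, noise_sd) for v in true_pattern]
    dataset_b = [v + rng.gauss(0.0, noise_sd) for v in true_pattern]
    return pearson_r(dataset_a, dataset_b)


r_modular = replicate_similarity(pattern_sd=1.0)   # strong (de)activation
r_flat = replicate_similarity(pattern_sd=0.05)     # close to z = 0 everywhere
print(f"modular-like state:    r = {r_modular:.3f}")
print(f"flat, base-like state: r = {r_flat:.3f}")
```

      With identical noise, the flat pattern yields a much lower across-dataset correlation simply because there is little pattern variance for the correlation to pick up.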

      In Figure 1—figure supplement 6, we write, “The DMN, DAN, and SM states showed similar mean activity patterns. We refrained from making interpretations about the base state’s activity patterns because the mean activity of most of the parcels was close to z = 0”.

      A similar finding has been reported in a previous work by Chen et al. (2016) that discovered the base state with HMM. State 9 (S9) of their results is comparable to our base state. They report that even though the spatial correlation coefficient of the brain state from the split-half reliability analysis was the lowest for S9 due to its low degrees of activation or deactivation, S9 was stably inferred by the HMM. The following is a direct quote from their paper:

      “To the best of our knowledge, a state similar to S9 has not been presented in previous literature. We hypothesize that S9 is the “ground” state of the brain, in which brain activity (or deactivity) is similar for the entire cortex (no apparent activation or deactivation as shown in Fig. 4). Note that different groups of subjects have different spatial patterns for state S9 (Fig. 3A). Therefore, S9 has the lowest reproducible spatial pattern (Fig. 3B). However, its temporal characteristics allowed us to distinguish it consistently from other states.” (Chen et al., 2016)

      Thus, we believe our data and prior results support the existence of the “base state”.

      • Figure 1B: Parcellation is quite big but there seems to be a gradient within regions

      This is a function of the visualization software. Mean activity (z) is the same for all voxels within a parcel. To visualize the 3D contours of the brain, we chose an option in the nilearn Python function that smooths the mean activity values based on the surface-reconstructed anatomy.

      In the original manuscript, our Methods state, “The brain surfaces were visualized with nilearn.plotting.plot_surf_stat_map. The parcel boundaries in Figure 1B are smoothed from the volume-to-surface reconstruction.”

      • Figure 1D: Why are the DMNs further apart between SONG and HCP than the other states

      To address this question, we first tested whether the position of the DMN states in the gradient space is significantly different for the SONG and HCP datasets. We generated surrogate HMM states from the circular-shifted fMRI time series and positioned the four latent states and the null DMN states in the 2-dimensional gradient space (Author response image 6).

      Author response image 6.

      We next tested whether the Euclidean distance between the SONG dataset’s DMN state and the HCP dataset’s DMN state is larger than would be expected by chance (Author response image 7). To do so, we took the difference between the DMN state positions and compared it to the 1,000 differences generated from the surrogate latent states. The DMN states of the SONG and HCP datasets did not significantly differ in the Gradient 1 dimension (two-tailed test, p = 0.794). However, as the reviewer noted, the positions differed significantly in the Gradient 2 dimension (p = 0.047). The DMN state leaned more towards the Visual gradient in the SONG dataset, whereas it leaned more towards the Somatosensory-Motor gradient in the HCP dataset.

      Author response image 7.
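
      The logic of this surrogate test can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used for the analysis: `circular_shift` and `two_tailed_p` are hypothetical helper names, the null values are simulated rather than obtained from refitted HMMs, and the p-value is defined here as the fraction of null values at least as far from the null mean as the observed value.

```python
import random


def circular_shift(series, offset):
    """Rotate a list by `offset` samples, wrapping the end around to the start."""
    offset %= len(series)
    return series[-offset:] + series[:-offset]


def two_tailed_p(observed, null_values):
    """Fraction of null values at least as extreme as the observed value.

    Extremeness is measured as absolute deviation from the null mean.
    """
    null_mean = sum(null_values) / len(null_values)
    observed_dev = abs(observed - null_mean)
    n_extreme = sum(abs(v - null_mean) >= observed_dev for v in null_values)
    return n_extreme / len(null_values)


# Circular shifting preserves the values and their temporal order up to rotation.
print(circular_shift([1, 2, 3, 4, 5], 2))

# Stand-in for 1,000 differences obtained from surrogate latent states.
rng = random.Random(0)
null_differences = [rng.gauss(0.0, 1.0) for _ in range(1000)]
observed_difference = 2.0  # hypothetical observed difference in gradient position
p_value = two_tailed_p(observed_difference, null_differences)
print(f"two-tailed p = {p_value:.3f}")
```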

      Though we cannot claim an exact reason for this across-dataset difference, we note a distinctive difference between the SONG and HCP datasets. Both datasets largely consisted of resting-state, controlled-task, and movie-watching runs. The SONG dataset comprised 18.95% rest, 15.07% task, and 65.98% movie watching. The task consisted only of the gradCPT, i.e., a sustained attention task. In contrast, the HCP dataset comprised 52.71% rest, 24.35% task, and 22.94% movie watching, and included 7 different tasks. It is possible that the different proportions of rest, task, and movie watching, and the different cognitive demands involved in each dataset, created dataset-specific latent states.

      • Page 5 paragraph starting at L25: Their hypothesis that functional gradients explain large variance in neural dynamics needs to be explained more, is non-trivial especially because their R^2 scores are so low (Fig 1. Supplement 8) for PCA

      We address this concern on page 21-23 of this response letter.

      • Generally, I do not find the PCA analysis convincing and believe they should also compare to something like ICA or a different model of dynamics. They do not explain their reasoning behind assuming an HMM, which is an extremely simplified idea of brain dynamics meaning they only change based on the previous state.

      We appreciate this perspective. We replaced the Margulies et al. (2016) gradient vs. SONG-specific PCA comparison with a more direct Margulies et al. (2016) gradient vs. SONG-specific gradient comparison, as described on page 21-23 of this response letter.

      More broadly, we elected to use HMM because of recent work showing correspondence between low-dimensional HMM states and behavior (Cornblath et al., 2020; Taghia et al., 2018; van der Meer et al., 2020; Yamashita et al., 2021). We also found the model’s assumption—a mixture Gaussian emission probability and first-order Markovian transition probability—to be the most suited to analyzing the fMRI time series data. We do not intend to claim that other data-reduction techniques would not also capture low-dimensional, behaviorally relevant changes in brain activity. Instead, our primary focus was identifying a set of latent states that generalize (i.e., recur) across multiple contexts and understanding how those states reflect cognitive and attentional states.
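
      For reference, the two model assumptions named above can be written compactly. This is a generic statement of a first-order HMM with Gaussian emissions; the symbols ($s_t$ for the latent state at time $t$, $x_t$ for the observed network activity, $A$ for the transition matrix, $\mu_k$ and $\Sigma_k$ for the state-specific mean and covariance, $K$ for the number of states) are illustrative notation and are not taken from the manuscript.

```latex
\begin{aligned}
P(s_t = j \mid s_{t-1} = i,\, s_{t-2}, \dots, s_1)
  &= P(s_t = j \mid s_{t-1} = i) = A_{ij},
  \qquad \sum_{j=1}^{K} A_{ij} = 1, \\
x_t \mid (s_t = k) &\sim \mathcal{N}(\mu_k, \Sigma_k).
\end{aligned}
```

      The first line is the first-order Markov property: the next state depends only on the current state. The second line says that, within a state, observations are drawn from a Gaussian with that state's mean and covariance.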

      Although a comparison of possible data-reduction algorithms is out of the scope of the current work, an exhaustive comparison of different models can be found in Bolt et al. (2022). The authors compared dozens of latent brain state algorithms spanning zero-lag analysis (e.g., principal component analysis, principal component analysis with Varimax rotation, Laplacian eigenmaps, spatial independent component analysis, temporal independent component analysis, hidden Markov model, seed-based correlation analysis, and co-activation patterns) to time-lag analysis (e.g., quasi-periodic pattern and lag projections). Bolt et al. (2022) write, “a range of empirical phenomena, including functional connectivity gradients, the task-positive/task-negative anticorrelation pattern, the global signal, time-lag propagation patterns, the quasiperiodic pattern and the functional connectome network structure, are manifestations of the three spatiotemporal patterns.” That is, many previous findings that used different methods essentially describe the same recurring latent states. A similar argument was made in previous papers (Brown et al., 2021; Karapanagiotidis et al., 2020; Turnbull et al., 2020).

      We agree that the HMM is a simplified model of brain dynamics. We do not argue that four states can fully explain the complexity and flexibility of cognition. Instead, we hoped to show that there are different dimensionalities at which brain systems can operate, and that these may have different consequences for cognition. We “simplified” neural dynamics to a discrete sequence of a small number of states. However, what is fascinating is that these overly “simplified” brain state dynamics can explain certain cognitive and attentional dynamics, such as event segmentation and sustained attention fluctuations. We highlight this point in the Discussion.

      [Manuscript, page 16] “Our study adopted the assumption of low dimensionality of large-scale neural systems, which led us to intentionally identify only a small number of states underlying whole-brain dynamics. Importantly, however, we do not claim that the four states will be the optimal set of states in every dataset and participant population. Instead, latent states and patterns of state occurrence may vary as a function of individuals and tasks (Figure 1—figure supplement 2). Likewise, while the lowest dimensions of the manifold (i.e., the first two gradients) were largely shared across datasets tested here, we do not argue that it will always be identical. If individuals and tasks deviate significantly from what was tested here, the manifold may also differ along with changes in latent states (Samara et al., 2023). Brain systems operate at different dimensionalities and spatiotemporal scales (Greene et al., 2023), which may have different consequences for cognition. Asking how brain states and manifolds—probed at different dimensionalities and scales—flexibly reconfigure (or not) with changes in contexts and mental states is an important research question for understanding complex human cognition.”

      • For the 25- ROI replication it seems like they again do not try multiple K values for the number of states to validate that 4 states are in fact the correct number.

      In the manuscript, we do not argue that four will be the optimal number of states in any dataset. (We actually predict that this may differ depending on the amount of data, participant population, tasks, etc.) Instead, we claim that the four states identified in the SONG dataset are not specific (i.e., overfit) to that sample, but rather recur in independent datasets as well. More broadly, we argue that the complexity and flexibility of human cognition stem from the fact that computation occurs at multiple dimensions and that the low-dimensional states observed here are robustly related to cognitive and attentional states. To prevent misunderstanding of our results, we emphasized in the Discussion that we are not arguing for a fixed number of states. A paragraph included in our response to the previous comment (page 16 in the manuscript) illustrates this point.

      • Fig 2B: Colorbar goes from -0.05 to 0.05 but values are up to 0.87

      We apologize for the confusion. The current version of the figure is correct. The figure legend states, “The values indicate transition probabilities, such that values in each row sums to 1. The colors indicate differences from the mean of the null distribution where the HMMs were conducted on the circular-shifted time series.”

      We recognize that this complicates the interpretation of the figure. However, after much consideration, we decided that it was valuable to show both the actual transition probabilities (values) and their difference from the mean of null HMMs (colors). The values demonstrate the Markovian property of latent state dynamics, with a high probability of remaining in the same state at consecutive moments and a low probability of transitioning to a different state. The colors indicate that the base state is a transitional hub state by illustrating that the DMN, DAN, and SM states are more likely to transition to the base state than would be expected by chance.
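
      The two quantities shown in the figure (values and colors) can be sketched as follows. This is a minimal illustration, not the code used for the figure: it estimates transition probabilities by counting transitions in a decoded state sequence, whereas the reported values may instead come from the fitted HMM parameters, and the null matrices here are hypothetical placeholders for matrices obtained from HMMs fit to circular-shifted time series.

```python
def transition_matrix(states, n_states):
    """Row-normalized transition counts from a sequence of integer state labels."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, nxt in zip(states, states[1:]):
        counts[prev][nxt] += 1
    probs = []
    for row in counts:
        total = sum(row)
        # Each visited row sums to 1; unvisited states get a row of zeros.
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def difference_from_null(observed, null_matrices):
    """Element-wise difference between observed probabilities and the null mean."""
    n = len(observed)
    diff = []
    for i in range(n):
        row = []
        for j in range(n):
            null_mean = sum(m[i][j] for m in null_matrices) / len(null_matrices)
            row.append(observed[i][j] - null_mean)
        diff.append(row)
    return diff


# Toy decoded sequence with three states (0, 1, 2).
sequence = [0, 0, 0, 1, 1, 0, 0, 2, 2, 2, 0]
observed = transition_matrix(sequence, n_states=3)

# Hypothetical null: two matrices with uniform transition probabilities.
uniform = [[1 / 3] * 3 for _ in range(3)]
colors = difference_from_null(observed, [uniform, uniform])

for values_row, colors_row in zip(observed, colors):
    print([round(v, 2) for v in values_row], [round(c, 2) for c in colors_row])
```

      In this framing, the printed values correspond to the numbers in each cell and the differences correspond to the cell colors.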

      • P 16 L4 near-critical, authors need to be more specific in their terminology here especially since they talk about dynamic systems, where near-criticality has a specific definition. It is unclear which definition they are looking for here.

      We agree that our explanation was vague. Because we do not have evidence for this speculative proposal, we removed the mention of near-criticality. Instead, we focus on our observation that the base state is a transitional hub state within a metastable system.

      [Manuscript, page 17-18] “However, the functional relevance of the base state to human cognition had not been explored previously. We propose that the base state, a transitional hub (Figure 2B) positioned at the center of the gradient subspace (Figure 1D), functions as a state of natural equilibrium. Transitioning to the DMN, DAN, or SM states reflects incursion away from natural equilibrium (Deco et al., 2017; Gu et al., 2015), as the brain enters a functionally modular state. Notably, the base state indicated high attentional engagement (Figure 5E and F) and exhibited the highest occurrence proportion (Figure 3B) as well as the longest dwell times (Figure 3—figure supplement 1) during naturalistic movie watching, whereas its functional involvement was comparatively minor during controlled tasks. This significant relevance to behavior verifies that the base state cannot simply be a byproduct of the model. We speculate that susceptibility to both external and internal information is maximized in the base state—allowing for roughly equal weighting of both sides so that they can be integrated to form a coherent representation of the world—at the expense of the stability of a certain functional network (Cocchi et al., 2017; Fagerholm et al., 2015). When processing rich narratives, particularly when a person is fully immersed without having to exert cognitive effort, a less modular state with high degrees of freedom to reach other states may be more likely to be involved. The role of the base state should be further investigated in future studies.”

      • P16 L13-L17 unnecessary

      We prefer to have the last paragraph as a summary of the implications of this paper. However, if the length of this paper becomes a problem as we work towards publication with the editors, we are happy to remove these lines.

      • I think this paper is solid, but my main issue is with using an HMM, never explaining why, not showing inference results on test data, not reporting an R^2 score for it, and not comparing it to other models. Secondly, they use the Calinski-Harabasz score to determine the number of states, but not the log-likelihood of the fit. This clearly creates a bias in what types of states you will find, namely states that are far away from each other, which likely also leads to the functional gradient and PCA results they have. Where they specifically talk about how their states are far away from each other in the functional gradient space and correlated to (orthogonal) components. It is completely unclear to me why they used this measure because it also seems to be one of many scores you could use with respect to clustering (with potentially different results), and even odd in the presence of a loglikelihood fit to the data and with the model they use (which does not perform clustering).

      (1) Showing inference results on test data

      We address this concern on page 19-21 of this response letter.

      (2) Not reporting R^2 score

      We address this concern on page 21-23 of this response letter.

      (3) Not comparing the HMM model to other models

      We address this concern on page 27-28 of this response letter.

      (4) The use of the Calinski-Harabasz score to determine the number of states rather than the log-likelihood of the model fit

      To our knowledge, the log-likelihood of the model fit is not used in the HMM literature to select the number of states. This is because the log-likelihood tends to increase monotonically as the number of states increases. Baker et al. (2014) illustrate this problem, writing:

      “In theory, it should be possible to pick the optimal number of states by selecting the model with the greatest (negative) free energy. In practice however, we observe that the free energy increases monotonically up to K = 15 states, suggesting that the Bayes-optimal model may require an even higher number of states.”

      The following figure shows the log-likelihood estimated from the SONG dataset. Consistent with the findings of Baker et al. (2014), the log-likelihood increased monotonically as the number of states increased (Author response image 8, right). Measures such as AIC or BIC, which account for the number of parameters, have the same issue of monotonic increase.

      Author response image 8.

      Because there is “no straightforward data-driven approach to model order selection” (Baker et al., 2014), past work has used different approaches to decide on the number of states. For example, Vidaurre et al. (2018) iterated over a range of the number of states to repeat the same HMM training and inference procedures 5 times using the same hyperparameters. They selected the number of states that showed the highest consistency across iterations. Gao et al. (2021) tested the clustering performance of the model output using the Calinski-Harabasz score. The number of states that showed the highest within-cluster cohesion compared to the across-cluster separation was selected as the number of states. Chang et al. (2021) applied HMM to voxels of the ventromedial prefrontal cortex using a similar clustering algorithm, writing: “To determine the number of states for the HMM estimation procedure, we identified the number of states that maximized the average within-state spatial similarity relative to the average between-state similarity”. In our previous paper (Song et al., 2021b), we reported both the reliability and clustering performance measures to decide on the number of states.

      In the current manuscript, the model consistency criterion from Vidaurre et al. (2018) was ineffective because the HMM inference was extremely robust (i.e., always inferring the exact same sequence) due to a large number of data points. Thus, we used the Calinski-Harabasz score as our criterion for the number of states selected.
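
      For readers unfamiliar with the criterion, the Calinski-Harabasz score is the ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom. The sketch below is a plain-Python illustration of that definition on toy data, not the code used in the manuscript; in an HMM setting, the labels would be the decoded state of each time point and the points would be the corresponding activity vectors.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz score: [B / (k - 1)] / [W / (n - k)].

    B is the between-cluster sum of squares, W the within-cluster sum of
    squares, k the number of clusters, and n the number of points.
    """
    n = len(points)
    dim = len(points[0])

    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = 0.0
    within = 0.0
    for rows in clusters.values():
        center = centroid(rows)
        between += len(rows) * squared_distance(center, overall)
        within += sum(squared_distance(r, center) for r in rows)

    return (between / (k - 1)) / (within / (n - k))


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]   # labels that match the two groups
bad_labels = [0, 1, 0, 1, 0, 1]    # labels that mix the two groups

score_good = calinski_harabasz(points, good_labels)
score_bad = calinski_harabasz(points, bad_labels)
print(f"matching labels: {score_good:.2f}")
print(f"mixed labels:    {score_bad:.2f}")
```

      A higher score indicates tighter, better-separated clusters, so the number of states with the highest score is selected. scikit-learn offers an equivalent function, `sklearn.metrics.calinski_harabasz_score`.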

      We agree with the reviewer that the selection of the number of states is critical to any study that implements HMM. However, the field lacks a consensus on how to decide on the number of states in the HMM, and the Calinski-Harabasz score has been validated in previous studies. Most importantly, the latent states’ relationships with behavioral and cognitive measures give strong evidence that the latent states are indeed meaningful states. Again, we are not arguing that the optimal set of states in any dataset will be four nor are we arguing that these four states will always be the optimal states. Instead, the manuscript proposes that a small number of latent states explains meaningful variance in cognitive dynamics.

      • Grammatical error: P24 L29 rendering seems to have gone wrong

      Our intention was correct here. To avoid confusion, we changed “(number of participantsC2 iterations)” to “($_{N}C_{2}$ iterations, where N = number of participants)” (page 26 in the manuscript).

      Questions:

      • Comment on subject differences, it seems like they potentially found group dynamics based on stimuli, but interesting to see individual differences in large-scale dynamics, and do they believe the states they find mostly explain global linear dynamics?

      We agree with the reviewer that whether low-dimensional latent state dynamics explain individual differences, above and beyond what could be explained by the high-dimensional, temporally static neural signatures of individuals (e.g., Finn et al., 2015), is an important research question. However, because the SONG dataset was collected in a single lab, with a focus on covering diverse contexts (rest, task, and movie watching) over 2 sessions, we were only able to collect data from 27 participants. Due to this small sample size, we focused on investigating group-level, shared temporal dynamics and across-condition differences, rather than on investigating individual differences.

      Past work has studied individual differences (e.g., behavioral traits like well-being, intelligence, and personality) using the HMM (Vidaurre et al., 2017). In the lab, we are working on a project that investigates latent state dynamics in relation to individual differences in clinical symptoms using the Healthy Brain Network dataset (Ji et al., 2022, presented at SfN; Alexander et al., 2017).

      Finally, the reviewer raises an interesting question about whether the latent state sequence that was derived here mostly explains global linear dynamics as opposed to nonlinear dynamics. We have two responses: one methodological and one theoretical. First, methodologically, we defined the emission probabilities as a linear mixture of Gaussian distributions for each input dimension with the state-specific mean (mean fMRI activity patterns of the networks) and variance (functional covariance across networks). Therefore, states are modeled with an assumption of linearity of feature combinations. Second, theoretically, recent work favors the nonlinearity of large-scale neural dynamics, especially as tasks get richer and more complex (Cunningham and Yu, 2014; Gao et al., 2021). However, whether low-dimensional latent states should be modeled nonlinearly (that is, whether linear algorithms are insufficient at capturing latent states compared to nonlinear algorithms) is still unknown. We agree with the reviewer that the assumption of linearity is an interesting topic in systems neuroscience. However, together with prior work showing that numerous algorithms, either linear or nonlinear, recapitulated a common set of latent states, we argue that the HMM provides a strong low-dimensional model of large-scale neural activity and interaction.

      • P19 L40 why did the authors interpolate incorrect or no-responses for the gradCPT runs? It seems more logical to correct their results for these responses or to throw them out since interpolation can induce huge biases in these cases because the data is likely not missing at completely random.

      Interpolating the RTs of the trials without responses (omission errors and incorrect trials) is a standard protocol for analyzing gradCPT data (Esterman et al., 2013; Fortenbaugh et al., 2018, 2015; Jayakumar et al., 2023; Rosenberg et al., 2013; Terashima et al., 2021; Yamashita et al., 2021). This analysis choice rests on the assumption that sustained attention is a continuous attentional state; the RT, a proxy for the attentional state in the gradCPT literature, is a noisy measure of a smoothed, continuous attentional state. Thus, the RTs of the trials without responses are interpolated and the RT time courses are smoothed by convolving with a Gaussian kernel.
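
      The two steps described above can be sketched as follows. This is an illustrative outline only: it assumes linear interpolation between the nearest valid trials and an arbitrary kernel width, neither of which is specified in this response, and the function names and RT values are hypothetical.

```python
import math


def interpolate_missing(rts):
    """Fill None entries by linear interpolation between neighbouring valid RTs.

    Missing values before the first (or after the last) valid trial take the
    nearest valid value.
    """
    valid = [i for i, v in enumerate(rts) if v is not None]
    if not valid:
        raise ValueError("no valid response times to interpolate from")
    filled = list(rts)
    for i, value in enumerate(rts):
        if value is not None:
            continue
        left = max((j for j in valid if j < i), default=None)
        right = min((j for j in valid if j > i), default=None)
        if left is None:
            filled[i] = rts[right]
        elif right is None:
            filled[i] = rts[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = rts[left] + weight * (rts[right] - rts[left])
    return filled


def gaussian_smooth(values, sigma):
    """Smooth a sequence with a Gaussian kernel, renormalizing at the edges."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        weighted_sum = 0.0
        weight_total = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):
                weighted_sum += w * values[j]
                weight_total += w
        smoothed.append(weighted_sum / weight_total)
    return smoothed


# Hypothetical RTs in seconds; None marks trials without a usable response.
rts = [0.8, None, 1.0, None, None, 0.7]
filled = interpolate_missing(rts)
smoothed = gaussian_smooth(filled, sigma=1.0)
print([round(v, 3) for v in filled])
print([round(v, 3) for v in smoothed])
```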

      References

      Abbas A, Belloy M, Kashyap A, Billings J, Nezafati M, Schumacher EH, Keilholz S. 2019. Quasiperiodic patterns contribute to functional connectivity in the brain. Neuroimage 191:193–204.

      Alexander LM, Escalera J, Ai L, Andreotti C, Febre K, Mangone A, Vega-Potler N, Langer N, Alexander A, Kovacs M, Litke S, O’Hagan B, Andersen J, Bronstein B, Bui A, Bushey M, Butler H, Castagna V, Camacho N, Chan E, Citera D, Clucas J, Cohen S, Dufek S, Eaves M, Fradera B, Gardner J, Grant-Villegas N, Green G, Gregory C, Hart E, Harris S, Horton M, Kahn D, Kabotyanski K, Karmel B, Kelly SP, Kleinman K, Koo B, Kramer E, Lennon E, Lord C, Mantello G, Margolis A, Merikangas KR, Milham J, Minniti G, Neuhaus R, Levine A, Osman Y, Parra LC, Pugh KR, Racanello A, Restrepo A, Saltzman T, Septimus B, Tobe R, Waltz R, Williams A, Yeo A, Castellanos FX, Klein A, Paus T, Leventhal BL, Craddock RC, Koplewicz HS, Milham MP. 2017. Data Descriptor: An open resource for transdiagnostic research in pediatric mental health and learning disorders. Sci Data 4:1–26.

      Allen EA, Damaraju E, Plis SM, Erhardt EB, Eichele T, Calhoun VD. 2014. Tracking whole-brain connectivity dynamics in the resting state. Cereb Cortex 24:663–676.

      Baker AP, Brookes MJ, Rezek IA, Smith SM, Behrens T, Probert Smith PJ, Woolrich M. 2014. Fast transient networks in spontaneous human brain activity. Elife 3:e01867.

      Bolt T, Nomi JS, Bzdok D, Salas JA, Chang C, Yeo BTT, Uddin LQ, Keilholz SD. 2022. A Parsimonious Description of Global Functional Brain Organization in Three Spatiotemporal Patterns. Nat Neurosci 25:1093–1103.

      Brown JA, Lee AJ, Pasquini L, Seeley WW. 2021. A dynamic gradient architecture generates brain activity states. Neuroimage 261:119526.

      Chang C, Leopold DA, Schölvinck ML, Mandelkow H, Picchioni D, Liu X, Ye FQ, Turchi JN, Duyn JH. 2016. Tracking brain arousal fluctuations with fMRI. Proc Natl Acad Sci U S A 113:4518–4523.

      Chang CHC, Lazaridi C, Yeshurun Y, Norman KA, Hasson U. 2021. Relating the past with the present: Information integration and segregation during ongoing narrative processing. J Cogn Neurosci 33:1–23.

      Chang LJ, Jolly E, Cheong JH, Rapuano K, Greenstein N, Chen P-HA, Manning JR. 2021. Endogenous variation in ventromedial prefrontal cortex state dynamics during naturalistic viewing reflects affective experience. Sci Adv 7:eabf7129.

      Chen J, Leong YC, Honey CJ, Yong CH, Norman KA, Hasson U. 2017. Shared memories reveal shared structure in neural activity across individuals. Nat Neurosci 20:115–125.

      Chen S, Langley J, Chen X, Hu X. 2016. Spatiotemporal Modeling of Brain Dynamics Using Resting-State Functional Magnetic Resonance Imaging with Gaussian Hidden Markov Model. Brain Connect 6:326–334.

      Cocchi L, Gollo LL, Zalesky A, Breakspear M. 2017. Criticality in the brain: A synthesis of neurobiology, models and cognition. Prog Neurobiol 158:132–152.

      Cornblath EJ, Ashourvan A, Kim JZ, Betzel RF, Ciric R, Adebimpe A, Baum GL, He X, Ruparel K, Moore TM, Gur RC, Gur RE, Shinohara RT, Roalf DR, Satterthwaite TD, Bassett DS. 2020. Temporal sequences of brain activity at rest are constrained by white matter structure and modulated by cognitive demands. Commun Biol 3:261.

      Cunningham JP, Yu BM. 2014. Dimensionality reduction for large-scale neural recordings. Nat Neurosci 17:1500–1509.

      Deco G, Kringelbach ML, Jirsa VK, Ritter P. 2017. The dynamics of resting fluctuations in the brain: Metastability and its dynamical cortical core. Sci Rep 7:3095.

      Esterman M, Noonan SK, Rosenberg M, Degutis J. 2013. In the zone or zoning out? Tracking behavioral and neural fluctuations during sustained attention. Cereb Cortex 23:2712–2723.

      Esterman M, Rothlein D. 2019. Models of sustained attention. Curr Opin Psychol 29:174–180.

      Fagerholm ED, Lorenz R, Scott G, Dinov M, Hellyer PJ, Mirzaei N, Leeson C, Carmichael DW, Sharp DJ, Shew WL, Leech R. 2015. Cascades and cognitive state: Focused attention incurs subcritical dynamics. J Neurosci 35:4626–4634.

      Falahpour M, Chang C, Wong CW, Liu TT. 2018. Template-based prediction of vigilance fluctuations in resting-state fMRI. Neuroimage 174:317–327.

      Finn ES, Shen X, Scheinost D, Rosenberg MD, Huang J, Chun MM, Papademetris X, Constable RT. 2015. Functional connectome fingerprinting: Identifying individuals using patterns of brain connectivity. Nat Neurosci 18:1664–1671.

      Fortenbaugh FC, Degutis J, Germine L, Wilmer JB, Grosso M, Russo K, Esterman M. 2015. Sustained attention across the life span in a sample of 10,000: Dissociating ability and strategy. Psychol Sci 26:1497–1510.

      Fortenbaugh FC, Rothlein D, McGlinchey R, DeGutis J, Esterman M. 2018. Tracking behavioral and neural fluctuations during sustained attention: A robust replication and extension. Neuroimage 171:148–164.

      Fox MD, Snyder AZ, Vincent JL, Corbetta M, Van Essen DC, Raichle ME. 2005. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc Natl Acad Sci U S A 102:9673–9678.

      Gao S, Mishne G, Scheinost D. 2021. Nonlinear manifold learning in functional magnetic resonance imaging uncovers a low-dimensional space of brain dynamics. Hum Brain Mapp 42:4510–4524.

      Goodale SE, Ahmed N, Zhao C, de Zwart JA, Özbay PS, Picchioni D, Duyn J, Englot DJ, Morgan VL, Chang C. 2021. fMRI-based detection of alertness predicts behavioral response variability. Elife 10:1–20.

      Greene AS, Horien C, Barson D, Scheinost D, Constable RT. 2023. Why is everyone talking about brain state? Trends Neurosci.

      Greene DJ, Marek S, Gordon EM, Siegel JS, Gratton C, Laumann TO, Gilmore AW, Berg JJ, Nguyen AL, Dierker D, Van AN, Ortega M, Newbold DJ, Hampton JM, Nielsen AN, McDermott KB, Roland JL, Norris SA, Nelson SM, Snyder AZ, Schlaggar BL, Petersen SE, Dosenbach NUF. 2020. Integrative and Network-Specific Connectivity of the Basal Ganglia and Thalamus Defined in Individuals. Neuron 105:742-758.e6.

      Gu S, Pasqualetti F, Cieslak M, Telesford QK, Yu AB, Kahn AE, Medaglia JD, Vettel JM, Miller MB, Grafton ST, Bassett DS. 2015. Controllability of structural brain networks. Nat Commun 6:8414.

      Jayakumar M, Balusu C, Aly M. 2023. Attentional fluctuations and the temporal organization of memory. Cognition 235:105408.

      Ji E, Lee JE, Hong SJ, Shim W (2022). Idiosyncrasy of latent neural state dynamic in ASD during movie watching. Poster presented at the Society for Neuroscience 2022 Annual Meeting.

      Karapanagiotidis T, Vidaurre D, Quinn AJ, Vatansever D, Poerio GL, Turnbull A, Ho NSP, Leech R, Bernhardt BC, Jefferies E, Margulies DS, Nichols TE, Woolrich MW, Smallwood J. 2020. The psychological correlates of distinct neural states occurring during wakeful rest. Sci Rep 10:1–11.

      Liu X, Duyn JH. 2013. Time-varying functional network information extracted from brief instances of spontaneous brain activity. Proc Natl Acad Sci U S A 110:4392–4397.

      Liu X, Zhang N, Chang C, Duyn JH. 2018. Co-activation patterns in resting-state fMRI signals. Neuroimage 180:485–494.

      Lynn CW, Cornblath EJ, Papadopoulos L, Bertolero MA, Bassett DS. 2021. Broken detailed balance and entropy production in the human brain. Proc Natl Acad Sci 118:e2109889118.

      Margulies DS, Ghosh SS, Goulas A, Falkiewicz M, Huntenburg JM, Langs G, Bezgin G, Eickhoff SB, Castellanos FX, Petrides M, Jefferies E, Smallwood J. 2016. Situating the default-mode network along a principal gradient of macroscale cortical organization. Proc Natl Acad Sci U S A 113:12574–12579.

      Mesulam MM. 1998. From sensation to cognition. Brain 121:1013–1052.

      Munn BR, Müller EJ, Wainstein G, Shine JM. 2021. The ascending arousal system shapes neural dynamics to mediate awareness of cognitive states. Nat Commun 12:1–9.

      Raut RV, Snyder AZ, Mitra A, Yellin D, Fujii N, Malach R, Raichle ME. 2021. Global waves synchronize the brain’s functional systems with fluctuating arousal. Sci Adv 7.

      Rosenberg M, Noonan S, DeGutis J, Esterman M. 2013. Sustaining visual attention in the face of distraction: A novel gradual-onset continuous performance task. Attention, Perception, Psychophys 75:426–439.

      Rosenberg MD, Finn ES, Scheinost D, Papademetris X, Shen X, Constable RT, Chun MM. 2016. A neuromarker of sustained attention from whole-brain functional connectivity. Nat Neurosci 19:165–171.

      Rosenberg MD, Scheinost D, Greene AS, Avery EW, Kwon YH, Finn ES, Ramani R, Qiu M, Todd Constable R, Chun MM. 2020. Functional connectivity predicts changes in attention observed across minutes, days, and months. Proc Natl Acad Sci U S A 117:3797–3807.

      Saggar M, Shine JM, Liégeois R, Dosenbach NUF, Fair D. 2022. Precision dynamical mapping using topological data analysis reveals a hub-like transition state at rest. Nat Commun 13.

      Schaefer A, Kong R, Gordon EM, Laumann TO, Zuo X-N, Holmes AJ, Eickhoff SB, Yeo BTT. 2018. Local-Global Parcellation of the Human Cerebral Cortex from Intrinsic Functional Connectivity MRI. Cereb Cortex 28:3095–3114.

      Shine JM. 2019. Neuromodulatory Influences on Integration and Segregation in the Brain. Trends Cogn Sci 23:572–583.

      Shine JM, Bissett PG, Bell PT, Koyejo O, Balsters JH, Gorgolewski KJ, Moodie CA, Poldrack RA. 2016. The Dynamics of Functional Brain Networks: Integrated Network States during Cognitive Task Performance. Neuron 92:544–554.

      Shine JM, Breakspear M, Bell PT, Ehgoetz Martens K, Shine R, Koyejo O, Sporns O, Poldrack RA. 2019. Human cognition involves the dynamic integration of neural activity and neuromodulatory systems. Nat Neurosci 22:289–296.

      Smith SM, Fox PT, Miller KL, Glahn DC, Fox PM, Mackay CE, Filippini N, Watkins KE, Toro R, Laird AR, Beckmann CF. 2009. Correspondence of the brain’s functional architecture during activation and rest. Proc Natl Acad Sci 106:13040–13045.

      Song H, Finn ES, Rosenberg MD. 2021a. Neural signatures of attentional engagement during narratives and its consequences for event memory. Proc Natl Acad Sci 118:e2021905118.

      Song H, Park B-Y, Park H, Shim WM. 2021b. Cognitive and Neural State Dynamics of Narrative Comprehension. J Neurosci 41:8972–8990.

      Taghia J, Cai W, Ryali S, Kochalka J, Nicholas J, Chen T, Menon V. 2018. Uncovering hidden brain state dynamics that regulate performance and decision-making during cognition. Nat Commun 9:2505.

      Terashima H, Kihara K, Kawahara JI, Kondo HM. 2021. Common principles underlie the fluctuation of auditory and visual sustained attention. Q J Exp Psychol 74:705–715.

      Tian Y, Margulies DS, Breakspear M, Zalesky A. 2020. Topographic organization of the human subcortex unveiled with functional connectivity gradients. Nat Neurosci 23:1421–1432.

      Turnbull A, Karapanagiotidis T, Wang HT, Bernhardt BC, Leech R, Margulies D, Schooler J, Jefferies E, Smallwood J. 2020. Reductions in task positive neural systems occur with the passage of time and are associated with changes in ongoing thought. Sci Rep 10:1–10.

      Unsworth N, Robison MK. 2018. Tracking arousal state and mind wandering with pupillometry. Cogn Affect Behav Neurosci 18:638–664.

      Unsworth N, Robison MK. 2016. Pupillary correlates of lapses of sustained attention. Cogn Affect Behav Neurosci 16:601–615.

      van der Meer JN, Breakspear M, Chang LJ, Sonkusare S, Cocchi L. 2020. Movie viewing elicits rich and reliable brain state dynamics. Nat Commun 11:1–14.

      Van Essen DC, Smith SM, Barch DM, Behrens TEJ, Yacoub E, Ugurbil K. 2013. The WU-Minn Human Connectome Project: An overview. Neuroimage 80:62–79.

      Vidaurre D, Abeysuriya R, Becker R, Quinn AJ, Alfaro-Almagro F, Smith SM, Woolrich MW. 2018. Discovering dynamic brain networks from big data in rest and task. Neuroimage, Brain Connectivity Dynamics 180:646–656.

      Vidaurre D, Smith SM, Woolrich MW. 2017. Brain network dynamics are hierarchically organized in time. Proc Natl Acad Sci U S A 114:12827–12832.

      Yamashita A, Rothlein D, Kucyi A, Valera EM, Esterman M. 2021. Brain state-based detection of attentional fluctuations and their modulation. Neuroimage 236:118072.

      Yeo BTT, Krienen FM, Sepulcre J, Sabuncu MR, Lashkari D, Hollinshead M, Roffman JL, Smoller JW, Zöllei L, Polimeni JR, Fisch B, Liu H, Buckner RL. 2011. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J Neurophysiol 106:1125–1165.

      Yousefi B, Keilholz S. 2021. Propagating patterns of intrinsic activity along macroscale gradients coordinate functional connections across the whole brain. Neuroimage 231:117827.

      Zhang S, Goodale SE, Gold BP, Morgan VL, Englot DJ, Chang C. 2023. Vigilance associates with the low-dimensional structure of fMRI data. Neuroimage 267.

    1. Author Response:

      Reviewer #1:

      Lee et al. identify miR-20b as a molecular regulator of hepatic lipid metabolism through the post-transcriptional regulation of the nuclear receptor PPAR alpha. Through mechanistic studies the authors identified the 3'UTR of PPARa as a direct target for miR-20b regulation of expression. The experiments are well controlled and the study provides deep mechanistic insight into the miR-20b/PPARa circuit in modulating hepatic lipid metabolism. Furthermore, the authors provide evidence that targeting the miR-20b pathway to enhance PPARa activation via synthetic ligand fenofibrate. The studies provide much needed mechanistic insight into molecular regulators of hepatic lipid metabolism in response to nutrient stress such as high fat diet. While this is a detailed and thorough assessment of this pathway, there are several issues that were identified in the review of this article outlined below:

      1) The authors state there is no off target expression of miR-20b in adipose tissue in their over expression experiments. However, per figure 4 supplement 1, EpiWAT has increased expression over controls in HFD fed conditions. Furthermore, figure 4 supplement 2 shows a functional difference in EpiWAT weight in HFD where miR-20b treated mice have higher fat weight. The authors need at the least to discuss the potential role of adipose tissue in promoting their observed phenotype.

      This is a good point. We increased the number of samples and carefully analyzed the changes in both the expression of Mir20b and the weight of epididymal adipose tissue. We observed a slight increase in Mir20b expression in the epididymal adipose tissue of AAV-Mir20b HFD-fed mice compared with AAV-control NCD-fed mice, but not compared with AAV-control HFD-fed mice. The expression of Mir20b in adipose tissue did not differ significantly between AAV-control HFD and AAV-Mir20b HFD mice (Figure 5-figure supplement 1).

      We have revised the text and added a discussion of the potential role of adipose tissue (pages 25-26, lines 582-594). Hepatic steatosis could be affected by adipose tissue through free fatty acid (FFA) release and hepatic uptake of circulating FFAs (Rasineni et al., 2021). Our results showed that the epididymal adipose tissue of HFD-fed mice was enlarged upon AAV-Mir20b treatment; however, the serum FFA levels in these mice were comparable to those in mice treated with the AAV-Control (Figure 5-figure supplement 4B). Of note, the expression of genes related to lipolysis did not change in adipose tissues, and that of the hepatic FA transporter CD36 was decreased by AAV-Mir20b treatment (Figure 5Q and Figure 5-figure supplement 4A). In addition, excess hepatic triglycerides (TGs) are secreted as very low density lipoproteins (VLDLs), and the secretion rate increases with the TG level (Fabbrini et al., 2008). VLDLs deliver TGs from the liver to adipose tissue and contribute to the expansion of adipose tissue (Chiba et al., 2003). Together, these reports suggest that adipose tissue is also remodeled by the liver in HFD-fed mice and non-alcoholic fatty liver disease (NAFLD) patients. Therefore, the levels of hepatic TGs are unlikely to be affected by epididymal adipose tissue, and the increase in fat content (Figure 5-figure supplement 3) may be a consequence of increased hepatic TG levels.

      2) Figure 5 shows anti-miR-20b essentially restores PPARa expression. However, the rescue effects in terms of body weight, liver triglycerides and liver damage are only modestly improved. The authors need to discuss this modest effect and potentially offer alternative mechanisms aside from PPARa as the physiological target.

      Previously, we introduced AAV treatment after four weeks of high fat diet (HFD) feeding. Anti-Mir20b treatment significantly changed the expression of PPARA; however, the effect on the pathophysiological properties of the liver was significant but modest. We thought that this was because there was not enough time to make a proper impact on the liver. Thus, to maximize the effect of anti-Mir20b, the AAV was administered when the HFD was started. The new results showed more significant effects of anti-Mir20b (Figure 6).

      We also observed that other nuclear receptors, such as RORA, RORC, and THRB, could be potential targets of MIR20B (Figure 2H and Figure 2-figure supplement 3). However, in the patient data, there was no significant correlation between the expression of those nuclear receptors and that of MIR20B. In addition, among the candidate targets, only PPARA was identified as an overlapping predicted target of MIR20B by various miRNA target prediction programs, including miRDB, PicTar, TargetScan, and miRmap (Figure 2J, Figure 2-figure supplement 2). Consistent with these results, we observed that Ppara, not other nuclear receptors, is the target gene of Mir20b in both AAV-Mir20b and AAV-anti-Mir20b mice (Figure 5-figure supplement 2, Figure 6-figure supplement 2). Thus, we focused on PPARA as a MIR20B target in NAFLD.

      3) The authors performed experiments with mutated 3'UTR of PPARa and show mutated PPARa is refractory to regulation by miR-20b. However, the authors provide no functional evidence that mutating the 3'UTR of PPARa elicits changes in hepatic lipid metabolism. Discussion of this point is needed at the minimal.

      Thank you for your comment. To provide functional evidence, we tried to establish the PPARA 3’UTR mutation knock-in (KI) system in cells. However, we could not succeed because of technical difficulties and time constraints. Alternatively, we introduced the wild type PPARA open reading frame (ORF) followed by either the wild type (WT) or mutant (Mut) 3’UTR of PPARA in HepG2 cells, and analyzed the importance of the 3’UTR of PPARA. As shown in Figure 2-figure supplement 5C, MIR20B significantly suppressed the expression of PPARA and its target genes in PPARA-3’UTR WT expressing cells. Furthermore, Oil Red O staining showed that MIR20B expression increased the intracellular lipid content in these cells (Figure 2-figure supplement 5B). However, MIR20B did not have an effect on either the expression of PPARA and its target genes or intracellular lipid content in PPARA-3’UTR Mut expressing cells (Figure 2-figure supplement 5C, D). We have added the new results in page 17-18, line 350-359 and Figure 2-figure supplement 5.

      Reviewer #2:

      1) In the experiments depicted in Figures 1D and E, did OA treatment of HepG2 and/or Huh-7 cells produce a reduction in the levels of mRNA encoding PPARalpha (or PPARalpha protein levels) in concordance with the shown rise in mRNA for miR-20b?

      Thank you for your question. The samples used in Figure 1C–E were also analyzed to observe the changes in the expression of PPARA (Figure 2-figure supplement 4A–C). In each sample, the increase in MIR20B expression resulting from oleic acid (OA) treatment and HFD was accompanied by a reduction in the levels of PPARA mRNA.

      2) Moreover, Figure 1 shows a fuller landscape of the transcriptional impact of microRNAs in context of obese livers in mice and human. Given this, what made miR20-b more interesting than, for example, miR106a, miR-17, or others that also appear to be robustly regulated? Why focus on miR20b?

      This is a very good point. In the analysis of the regulatory network, other miRNAs including MIR129 and MIR106A appeared to possibly regulate nuclear receptors in NAFLD. We further confirmed the relationship between candidate miRNAs and NAFLD progression in patient samples. As shown in the revised Figure 1B, we observed that the expression of MIR20B was more robustly and significantly changed with NAFLD progression than that of MIR129 and MIR106A. This tendency was also confirmed in other experiments using OA-treated HepG2 and Huh7 cells or HFD-fed mice (Figure 1-figure supplement 4). Thus, we focused on the role of MIR20B in NAFLD. Nevertheless, we do not rule out the possibility that other miRNAs may be involved in NAFLD progression. Subsequent studies may uncover the roles of other miRNAs in liver physiology.

      3) What does the rank and p-value exactly represent in tabular part of Figure 1A? This is very unclear as shown, including the figure legend.

      The p-values in the table of Figure 1A were obtained from the hypergeometric distribution used for testing the enrichment of downregulated nuclear receptors among the targets of a miRNA. In other words, they indicate the probability of observing, by chance, at least as many downregulated nuclear receptors among the miRNA targets as were actually found. They were calculated by the following equation:

      where N is the total number of genes analyzed, M is the number of candidate target genes of the miRNA, D is the number of downregulated NR genes, and O is the observed overlap between miRNA targets and the downregulated NR genes, as described in the Materials and Methods (page 9, lines 155-157). The ranks in the table were determined according to the p-value. The legend of Figure 1A has been modified as follows:

      “Figure 1. MIR20B expression is significantly increased in the livers of dietary and genetic obese mice and humans. (A) The miRNA regulatory networks for NR genes downregulated in the transcriptome of NAFLD patients. The adjusted p-values in the table represent the enrichment of miRNA targets in the downregulated NR genes (hypergeometric distribution).”
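
      The enrichment test described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the usual upper-tail form of the hypergeometric test, P(X >= O), with N, M, D, and O as defined in the response. The function name and the toy numbers are hypothetical, and the multiple-testing adjustment mentioned in the legend is not included.

```python
from math import comb


def enrichment_p_value(N: int, M: int, D: int, O: int) -> float:
    """Upper-tail hypergeometric p-value, P(X >= O).

    N: total number of genes analyzed
    M: number of candidate target genes of the miRNA
    D: number of downregulated NR genes
    O: observed overlap between the miRNA targets and the downregulated NR genes
    (Assumed form of the test; the exact equation is not reproduced in the text.)
    """
    max_overlap = min(M, D)   # the overlap cannot exceed either set size
    total = comb(N, D)        # number of ways to choose D genes out of N
    tail = sum(
        comb(M, i) * comb(N - M, D - i)   # exactly i of the D genes are targets
        for i in range(O, max_overlap + 1)
    )
    return tail / total


# Toy numbers for illustration only (not taken from the study):
# 10 genes, 4 miRNA targets, 3 downregulated genes, 2 of which are targets.
print(round(enrichment_p_value(10, 4, 3, 2), 4))  # 0.3333
```

      A smaller value means that the observed overlap would be less likely under random assignment of targets, which is the sense in which the table ranks miRNAs by p-value.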

      4) Figure 1, supplement 1 shows characteristics of patients involved in data for Figure 1, etc. This shows that the normal patients are younger than the other two groups, the M-F ratio is not identical (more female in the normal group), and the total cholesterol levels are not well matched either. What other parameters are available? Hemoglobin A1c? Fasting glucose? In the end, we need to know that the groups, apart from the severity of NAFLD and NASH, were well matched. Given the small size of each group (n = only 4-5), this matching is critical to avoid confounding of the relationship between miR-20b, PPARalpha, and NAFLD/NASH progression.

      Thank you for your comment. Accordingly, we have included the patient information in a table (Figure 1-figure supplement 1A, B). To increase the statistical power and prevent confounding effects, we increased the number of samples and tried to match them to compare age, weight, and male/female ratio between the groups. Due to the limited number of patient samples, the cohorts could not be perfectly matched. Nevertheless, there were no significant differences in age and male/female ratio among the three groups. Specifically, serum AST, ALT, and fasting glucose levels were significantly increased with progression from normal to non-alcoholic steatohepatitis (NASH), but total cholesterol was comparable as previously reported (Chung et al., 2020). We have revised the text on pages 7-8, lines 118-130.

      5) The title of Figure 2 relates to PPARalpha. However, in Figure 2G, it is clear that several NRs are downregulated by miR20b overexpression in cells. Although the paper focuses on PPARalpha, should the authors not explore at least some of the other hits to ensure that the impact of PPARalpha is of particular importance vs. others?

      This is a good point. We also observed that other nuclear receptors, such as RORA, RORC, and THRB, could be potential targets of MIR20B (Figure 2H and Figure 2-figure supplement 3). However, in the patient data, there was no significant correlation between the expression of those nuclear receptors and that of MIR20B. In addition, among the candidate targets, only PPARA was identified as an overlapping predicted target of MIR20B by various miRNA target prediction programs, including miRDB, PicTar, TargetScan, and miRmap (Figure 2J, Figure 2-figure supplement 2). Consistent with these results, we observed that PPARA, not other nuclear receptors, is the target gene of MIR20B in both AAV-Mir20b and AAV-anti-Mir20b mice (Figure 5-figure supplement 2, Figure 6-figure supplement 2). Thus, we focused on PPARA as a MIR20B target in NAFLD.

      6) In Figure 3, the data show, presumably, that OA induces miR20b, which then represses PPARalpha and, in turn, CD36 downstream of PPARalpha. If this is the case, then how does OA continue to get into the cells? Once CD36 expression falls dramatically, doesn't the key OA uptake mechanism fall with it? Then, does the induction of miR20b abate? Or, does FATP6 or another uptake mechanism account for OA entry into these cells?

      This is a good point. FA uptake was decreased by overexpression of MIR20B, and was accompanied by a considerable decrease in CD36 expression (Figure 4B, J). However, other lipid transporters such as FATPs were not significantly altered (Figure 4-figure supplement 5), suggesting that FA uptake is continued by these transporters. The expression of CD36 is relatively low in normal hepatocytes, and the molecule may not be the primary fatty acid transporter in these cells (Wilson et al., 2016). Furthermore, the decrease in FA uptake upon CD36 KO is modest even during a HFD (Wilson et al., 2016). In addition, we observed that the expression of MIR20B is induced and increased for up to 24 h by OA treatment. This is followed by a slight decrease, after which expression remains at a constant elevated level (Figure 4-figure supplement 6). Together, the findings indicate that other fatty acid transporters contributing to FA uptake account for the entry of OA into cells. We have added this discussion on page 25, lines 571-581.

      7) Similarly, what happens to AGPAT, GPAT, and DGAT expression in context of OA treatment and modulation of miR20b? Does the capacity of the cell to store OA in the form of triglyceride inside of lipid droplets change, so that the amount of free OA or oleyl-CoA inside the cell rises? Could this impact the transcriptional phenotype?

      This is a very good point. Accordingly, we analyzed the transcriptional phenotype in the context of OA treatment and modulation of MIR20B. The expression of glycerolipid synthetic genes, including AGPATs, GPATs, and DGATs, was increased by OA treatment, but MIR20B overexpression did not influence the expression of lipogenic genes except for that of DGAT1. However, treatment with anti-MIR20B significantly reduced the expression of glycerolipid synthetic genes, including GPATs and DGATs, under OA treatment (Figure 4C, N). These results suggested that MIR20B is necessary but not sufficient to induce the expression of glycerolipid synthetic genes under OA treatment. We have shown that OA induces the expression of MIR20B (Figure 1C), which can explain why MIR20B overexpression did not show an additional enhancement under OA treatment. The increase in DGAT1 expression induced by MIR20B might contribute to the increase in TG formation and capacity to store OA. This could change the flux of oleyl-CoA to TG synthesis, not β-oxidation with reduced expression of lipid oxidation-associated genes (Figure 4B). Thus, we can expect that the decrease in OA uptake and increase in TG formation induced by MIR20B resulted in reduced amounts of OA or oleyl-CoA inside the cell. However, as lipid consumption through FA oxidation is decreased by MIR20B, free OA or oleyl-CoA might be maintained at a stably increased level compared to that of OA-untreated MIR NC or MIR20B condition, and the impact of the changes in OA or oleyl-CoA levels on the transcriptional phenotype might not be significant as found in a constant elevated level of MIR20B by OA (Figure 4-figure supplement 6). We have added these results in Figure 4C and the Discussion (page 26, line 595-610). Due to technical constraints, we could not measure the amounts of free OA and oleyl-CoA.

      8) In Figure 3P, would the impact of anti-miR on the effect of OA on FASN be lost in PPARalpha KO cells? This would really test the functional relevance of the purported transcriptional hierarchy.

      Thank you for your valuable comment. We tested the impact of anti-MIR20B treatment on OA-treated PPARA knock-down (KD) cells, not KO cells, due to technical constraints. PPARA KD cells showed a significant decrease in PPARA expression. As shown in Figure 4-figure supplement 4I, anti-MIR20B treatment enhanced the expression of PPARA but did not have a significant effect on fatty acid synthase (FASN) expression in both control and PPARA KD cells. In addition, PPARA KD did not affect FASN expression. The expression patterns of PPARα target genes differ between mice and humans. FASN is regulated by PPARα in mice, but this is unclear in humans (Rakhshandehroo, Hooiveld, Muller, & Kersten, 2009; Rakhshandehroo, Knoch, Muller, & Kersten, 2010). Moreover, fenofibrate, a PPARα agonist, reduces the expression of FASN in methionine choline-deficient (MCD)-fed mice (Cui et al., 2021). Here, we used human HepG2 cells to investigate the effect of OA and MIR20B. It is plausible that FASN might not be regulated by PPARα in our system. We have added these results in Figure 4-figure supplement 4I.

      9) The authors should really at least perform a bulk RNAseq analysis to confirm the similarity of the effect of miR20b or anti-miR seen in cells, at the mouse or human liver tissue level. As it is, they only look at 3 FAOX genes, 2 FA uptake associated genes, and 2 FA synthesis genes. This is not very comprehensive as a validation of the in vitro data, although it is intriguing. Or, at the very least, look at a large validated set of PPARalpha target genes in vivo.

      Thank you for your comment. Accordingly, we selected PPARα target genes altered by MIR20B in OA-treated cells (Figure 4-figure supplement 1A, B), and then examined the hepatic expression of PPARα target genes in HFD-fed mice treated with MIR20B or anti-MIR20B (Figure 5R and 6R). The expression of most PPARα target genes was decreased by OA treatment and the HFD, and MIR20B treatment further reduced their expression. In contrast, anti-MIR20B treatment rescued the reduced expression of PPARα target genes under OA treatment and the HFD. These results suggested that MIR20B suppresses PPARA in vivo, which is consistent with the results from cells. We have added these results in Figure 4-figure supplement 1A, B, Figure 5R, and Figure 6R.

      10) Notably, the figures in general do NOT show individual data points. This is the standard for visual display, rather than bar graphs with simple SEM bars.

      Thank you for your comment. We have revised the graphs to include individual data points.

      11) The in vivo data (e.g. Figure 4) are very low n values. Augmenting this would add confidence to the data. As an example, of inconsistencies potentially stemming from very low n, the liver weights (Figure 4F) are not very different across groups, although the triglyceride levels in the livers (Figure 4H) are more than twice as high. The images of liver specimens shown as examples (Figure 4F) are also more dramatic than the weights would indicate. Note also that the body weights of the mice (Figure 4C) are different as well, and this alone could explain the livers being modestly heavier. Indeed, the extent of body weight excess mirrors the extent of liver weight excess, suggesting that the entire animal may be larger across multiple metabolic tissues including adipose. This is proven in Figure 4D, where the fat mass looks to be larger as well. To this end, Figure 4 supplement 2 shows multiple tissue weights to be increased in this model, suggesting that specificity for hepatic steatosis may be low.

      Thank you for your comment. Accordingly, we conducted additional in vivo experiments with larger n values (n = 10). Then, we replaced the liver images with more representative ones. AAV-Mir20b robustly induced the hepatic expression of Mir20b and significantly increased the liver weight and hepatic TG levels (Figure 5F, 5I). In the normal human liver, intrahepatic TGs do not exceed 5% of the liver weight (Fabbrini & Magkos, 2015). In our results, TG levels were increased more than three times by the HFD, but the impact on liver weight was limited, as TGs did not account for more than 10% of the liver weight (Figure 5I). Excess hepatic TGs are secreted as very low density lipoproteins (VLDLs), and the secretion rate increases with the TG level (Fabbrini et al., 2008). VLDLs deliver TGs from the liver to adipose tissue and other metabolic tissues (Heeren & Scheja, 2021). The excess hepatic TGs induced by Mir20b were presumably transferred to epididymal adipose tissue, contributing to the increase in adipose tissue weight, while inguinal and brown adipose tissues were not significantly affected by Mir20b (Figure 5-figure supplement 3). Together, the fat mass measured by EchoMRI included intrahepatic and adipose TGs, and mirrored the increases shown in Figure 5D. In addition, Mir20b induced the expression of hepatic DGAT1, which could explain increased TG secretion through VLDLs (Figure 4C) (Alves-Bezerra & Cohen, 2017; Liang et al., 2004).

      Conversely, the supply of FFAs from adipose tissue might have contributed to hepatic steatosis. However, we observed that there were no significant changes in the expression of Mir20b and lipolytic genes in adipose tissue (Figure 5-figure supplement 4A). Furthermore, the serum FFA levels in the AAV-Control and AAV-Mir20b groups under the HFD were comparable (Figure 5-figure supplement 4B). These findings suggested that increased intrahepatic TG levels constituted the specific and primary effect of AAV-Mir20b.

      12) In figure 5 S1, the anti-miR20b substantially reduces the weights of multiple tissues in mice fed a HFD, given this, why does overall body weight (figure 5c) show such a modest difference. Figure 5 E and F also suggest that the overall weights would have been lower than shown in Figure 5C. In the end, instead of bar graphs of the final weights, the entire weight curve for the mice fed the HFD should have been shown.

      Thank you for your comment. To make our results more robust, we increased the sample size (n = 10). Moreover, we provided the entire weight curve and revised the results (Figure 6C). AAV-anti-Mir20b treatment significantly reduced the liver weight (Figure 6F). The weight of adipose tissue, including epididymal white adipose tissue (EpiWAT), tended to decrease; however, the difference was not significant (Figure 6-figure supplement 3). As indicated in a previous question (#11), the change in hepatic TG levels could affect the weight of other tissues. In our revised Figure 6C, we show that the overall weight change might be higher than the sum of weight change of specific metabolic tissues, such as the liver and adipose tissues.

      13) How well were the NAFLD vs. normal GSE individuals matched? This is very important, since PPARalpha emerges from comparing these data sets. Matching is very important to make sure that the differences in NR expression do not stem from a confound that went along in parallel with the NAFLD cohort vs. the normal GSE cohort.

      This is a very good point. PPARA emerged from regulatory network analysis (Figure 1A) and was selected as a target of MIR20B through the analysis of RNA-seq data from MIR20B-overexpressing HepG2 cells (Figure 2). By constructing a regulatory network in NAFLD patients, we determined that MIR20B is responsible for NR regulation in NAFLD. As shown in Figure 1A, we analyzed the differential expression of NR in NAFLD using public GSE data (GSE130970) consisting of patients with NAFLD and age- and weight-matched normal controls (Hoang et al., 2019). To verify the expression of MIR20B, we assessed the miRNA levels in another non-coding RNA GSE dataset (GSE40744) in the original manuscript (previous Figure 1B). However, in the process of reviewing GSE40744 patients’ information with physicians, we found that some of the patients were virus-infected. Thus, we removed the data from GSE40744 and truly apologize for the confusion.

      In the revised manuscript (page 16 line 303-304), we examined the expression of MIR20B and other candidate miRNAs such as MIR129 and MIR219A in patient samples from the Asan Medical Center (Seoul, Republic of Korea), who were diagnosed by pathologists and age- and weight-matched. As shown in Figure 1B, MIR20B is one of the main miRNAs involved in NAFLD progression. In addition, the expression of PPARA was significantly negatively correlated with that of MIR20B (Figure 2-figure supplement 3).

      Reviewer #3:

      In this manuscript, Lee et al. use an elegant combination of cultured cells, patient samples, and mouse models to show that miR-20b promotes non-alcoholic fatty liver disease (NAFLD) by suppressing PPAR-alpha. The authors show that miR-20b inhibits PPAR-alpha expression, resulting in reduced fatty acid oxidation, decreased mitochondrial biogenesis, and increased hepatocyte lipid accumulation both in vitro and in vivo. Inhibition of miR-20b in mouse NAFLD models leads to increased PPAR-alpha, reduced hepatic lipid accumulation, decreased inflammation, and improved glucose tolerance. Overall, the data are well-controlled and support the authors' conclusions.

      Strengths:

      1) In Figure 1, the authors show miR-20b is increased in NAFLD patients, mouse obesity/NAFLD models, and cultured liver cancer cells treated with oleic acid (OA). The use of multiple complementary approaches is very powerful, although more information regarding the diagnoses in the 13 patient samples would be helpful (see below).

      Thank you for your comment. Accordingly, we have included the patient information in a table (Figure 1-figure supplement 1A, B). To increase the statistical power and prevent confounding effects, we increased the number of samples and tried to match them to compare age, weight, and male/female ratio between the groups. Due to the limited number of patient samples, the cohorts could not be perfectly matched. Nevertheless, there were no significant differences in age and male/female ratio among the three groups. Specifically, serum AST, ALT, and fasting glucose levels were significantly increased with progression from normal to non-alcoholic steatohepatitis (NASH), but total cholesterol was comparable as previously reported (Chung et al., 2020). We have revised the text on pages 7-8, lines 118-130.

      2) In Figure 2, the authors show that PPAR-alpha is a direct target of miR-20b. These data include a luciferase reporter assay regulated by the 3'UTR of PPAR-alpha. Importantly, when the 3'UTR is mutated, suppression of luciferase expression by miR-20b is no longer observed. The authors use multiple different algorithms to predict miR-20b targets, look for overlap, and then confirm PPAR-alpha as the most important "hit" in vitro.

      3) Figure 3 highlights changes in fatty acid metabolism in HepG2 cells transfected with miR-20b, miR-NC, or anti-miR-20b and treated with oleic acid. Figure 3, supplement 4 shows that anti-miR-20b can alleviate OA-induced hepatic steatosis in both HepG2 cells and primary hepatocytes. The use of another (primary) cell line here is important, because HepG2 is a liver cancer cell line, and metabolic changes in HepG2 cells might not be representative of non-neoplastic hepatocytes.

      4) In Figure 4, the authors show that miR-20b promotes hepatic steatosis, increases liver weight, increases liver injury markers, and impairs glucose tolerance and insulin sensitivity in HFD-fed mice. Conversely, anti-miR-20b inhibits hepatic steatosis, decreases liver weight and liver injury markers, and improves glucose tolerance and insulin sensitivity in HFD-fed mice (Figure 5). Anti-miR-20b also inhibits hepatic steatosis and fibrosis and decreases liver injury markers in MCD-fed mice (Figure 8). These in vivo studies provide excellent support for the authors' hypothesis regarding the role of miR-20b in promoting fatty liver disease. The liver readily takes up small nucleic acids, including miRs and anti-miRs. Thus, the possibility of using anti-miR-20b as a therapeutic for fatty liver disease is intriguing, and supported by these experiments.

      5) In Figure 6, in HepG2 cells, the authors demonstrate that PPAR-alpha overexpression (or to a lesser extent fenofibrate treatment) is able to rescue the transcriptional effects of miR-20b overexpression. Conversely, siPPAR-alpha can rescue the transcriptional effects of anti-miR-20b. Similar results are shown in Figure 7: fenofibrate is able to at least partially suppress some of the metabolic phenotypes that are exacerbated by miR-20b overexpression in HFD-fed mice (the decreased lean/BW ratio, elevated fasting glucose, some transcriptional changes). Again, it is nice to see that the in vitro data are supported by in vivo results.

      Thank you for your comments.

      Weaknesses:

      1) In Figure 3, figure supplement 2, it seems the effects of miR-20b overexpression in primary hepatocytes may be a bit overstated. While it does seem that miR-20b enhances the accumulation of fat in primary hepatocytes upon OA treatment, miR-20b overexpression alone does not seem to have significant effects on steatosis (A), cholesterol (B), or triglycerides (C).

      Thank you for your comment. We have revised the text: “Unlike in HepG2 cells (Figure 2A-C), MIR20B alone did not induce lipid accumulation in primary hepatocytes without OA treatment, but MIR20B significantly increased lipid accumulation in the presence of OA (Figure 4-figure supplement 2)” (page 19, lines 383-385). We have also revised the title of Figure 4-figure supplement 2: “Figure 4-figure supplement 2. MIR20B enhances lipid accumulation in primary hepatocytes under OA-treatment”.

      2) Histologic analysis of mouse liver samples by a pathologist is lacking. In Figure 4, is there increased inflammation and/or fibrosis with miR-20b overexpression, or just increased steatosis? In Figure 4 and Figure 8, it would be helpful if steatosis, fibrosis, and inflammation were quantified/scored histologically.

      Thank you for your comment. Accordingly, a pathologist conducted histological analysis and determined the NAFLD activity score (NAS) and fibrosis score. We have added the scoring graphs in Figure 5H, 6H, 7H, 8I, 8J, 9G, and 9H. In Figure 5G and 5H, AAV-Mir20b significantly increased steatosis, but the increase in inflammation was not significant under the HFD; however, AAV-anti-Mir20b significantly decreased steatosis, inflammation, and fibrosis under the MCD (Figure 8H-J). In addition, the combination of AAV-anti-Mir20b with fenofibrate significantly alleviated steatosis, inflammation, and fibrosis compared to AAV-Control under the MCD (Figure 9F-H).

      3) The effects of anti-miR-20b on hepatic triglycerides and inflammatory markers in vivo are modest (Figures 5 and 8). Perhaps an enhancement could be seen by combining anti-miR-20b with fenofibrate. While the authors show that fenofibrate's effects are suppressed with miR-20b overexpression, they don't examine what happens when fenofibrate is combined with anti-miR-20b. To me, this experiment is critical to determine if PPAR-alpha activity could be further maximized to combat NAFLD (beyond what is seen with fenofibrate alone).

      This is a very good point. Accordingly, we performed a new experiment in which fenofibrate was combined with anti-Mir20b to treat MCD-fed mice. The combination showed further improvements compared with those obtained by fenofibrate treatment alone. The results are described on pages 23-24, lines 518-536.

      “Recently, drug development strategies for NAFLD/NASH are moving toward combination therapies (Dufour, Caussy, & Loomba, 2020). However, the efficacy of developing drugs, including fenofibrate, against NAFLD/NASH is limited (Fernandez-Miranda et al., 2008). Thus, we tested whether the combination of anti-Mir20b and fenofibrate would improve NAFLD in MCD-fed mice. The levels of hepatic Mir20b were reduced after administration of AAV-anti-Mir20b in MCD-fed mice compared to those in mice administered with AAV-Control, and this reduction was also observed after fenofibrate treatment (Figure 9A). Interestingly, the combination of AAV-anti-Mir20b and fenofibrate increased the levels of PPARα to a greater extent than AAV-Mir20b alone (Figure 9B, C). AAV-anti-Mir20b or fenofibrate administration significantly reduced the liver weight and hepatic TG levels, and co-administration further reduced hepatic steatosis (Figure 9D, E). Histological sections showed that the combination of AAV-anti-Mir20b and fenofibrate improved NAFLD, as evidenced by the effects on both lipid accumulation and fibrosis in the liver (Figure 9F-H). Consistently, the levels of AST and ALT were significantly lower after combined treatment with AAV-anti-Mir20b and fenofibrate than after a single treatment (Figure 9I, J). In addition, the expression of genes related to hepatic inflammation, such as Tnf and Il6 (Figure 9K), and fibrosis, such as Acta2, Col1a1, Fn, and Timp1 (Figure 9L), was further decreased by the combination of AAV-anti-Mir20b and fenofibrate. These results suggest that AAV-anti-Mir20b may increase the efficacy of fenofibrate, especially its effect on fibrosis, and provide a more effective option for improving NAFLD/NASH.”

    1. Author Response:

      Reviewer #1:

      This study examines the use of terahertz wave modulation (THM), a technique for transmitting terahertz wave electromagnetic energy to the cochlea with the aim of improving the sensitivity of the cochlear outer hair cells. ABR obtained with and without THM suggests that sensitivity thresholds were improved by 10 dB when using THM. Whole-cell patch clamp recordings from outer hair cells suggest that THM significantly increases both K+ and MET currents of the cochlear outer hair cells. These results are convincing and potentially important for understanding normal cochlear physiology.

      On the other hand, the numerous claims about translational applicability of this work seem overstated.

      61-65 This is incorrect. For example, optogenetics or stem cell use are not currently seen as "treatment for hearing impairment" and, in fact, the manuscript says as much later in the paragraph. Also, pharmacological treatment is rarely effective, and only in limited circumstances.

      Many thanks to the reviewer for pointing out this mistake. We have replaced the discussion with:

      “At present, treatment for hearing impairment is primarily administered through pharmacological treatment, hearing aid equipment, and electronic cochlear implantation (Wilson et al., 1991; Kipping et al., 2020; Gang et al., 2008). Optogenetics (Huet et al., 2021), stem cell differentiation and transplantation (Oshima et al., 2010; Li et al., 2003; Chen et al., 2012) are also being explored to treat hearing loss. However, pharmacological treatment is rarely effective, and only in limited circumstances.”

      283-294 The discussion of near-infrared vs THM is misguided. Near-infrared has been proposed as a possible alternative technology to stimulate spiral ganglion neurons, thus replacing cochlear implants. This is plausible, even though feasibility has not yet been demonstrated. In contrast, THM does not seem like a plausible alternative to cochlear implants. Patients who are candidates for cochlear implantation may not have enough (or any) outer hair cells, which are the target for THM.

      We thank the reviewer for pointing out the difference in principle between near-infrared auditory stimulation and THM. We have now modified the main text and compared the differences and similarities between THM and NIRS. Please see the revised Discussion.

      295-299 "In comparison with wearing hearing aids, stem cell differentiation and transplantation (Oshima et al., 2010; Li et al., 2003; Chen et al., 2012), optogenetics (Huet et al., 2021) and electronic cochlear implantation (Wilson et al., 1991; Kipping et al., 2020; Gang et al., 2008), THM requires no traumatic surgery, cumbersome equipment, or genetic manipulation, and is thus more suitable for use in human subjects." In the described experiment, optic fibers had to be placed close to outer hair cells. That seems to require "cumbersome equipment" and obviously would require surgery for use in humans.

      Many thanks to the reviewer for pointing out this inappropriate statement. We completely agree and have revised it in the revised manuscript.

      The data show that sensitivity was improved by 8.75 dB. In practical terms this is a very small change. Sensitivity improvement of 10 dB (and much more than that) can be obtained noninvasively and on a frequency-dependent basis using traditional amplification.

      Any neural stimulation technology would require not only spatial selectivity but also temporal responsiveness. It seems that THM could meet the former criterion but the latter is unknown. In other words, for any practical application it would be necessary to show that modulation of a THM signal can be perceived by listeners. However, this criticism is moot if the claims about clinical applicability of THM are removed.

      We thank the reviewer for the constructive comments. We completely agree with them, and the claims about the clinical applicability of THM have been removed.

      Reviewer #2:

      This manuscript uses mid-infrared light to enhance the currents from natural stimuli (mechanical and voltage) of hair cells. The authors show increased voltage-gated K+ current and MET currents while being illuminated with mid-infrared light. Based on molecular dynamics simulations, the authors hypothesize that the augmented voltage-gated K+ currents are due to stimulation of C=O groups in the selectivity filter which allows K+ ions to pass through the pore more quickly to increase conductance; there was no hypothesis as to why MET currents were augmented. The authors also demonstrate improved ABR thresholds when the cochlea was illuminated with the mid-infrared light, demonstrating a potential therapeutic application. The enthusiasm for the novelty of this work is reduced because other work has shown that neurons can be excited by near-infrared (~2 microns) wavelength due to thermal stimulation and changes in cell capacitance, so this work mainly differs in their proposed mechanism and the longer wavelength of light (8.6 microns). Additionally, the Hudspeth group (Azimzadeh et al, 2018, PMC5805653) has shown thermal gating of MET channels using ultraviolet light and infrared light (1.47 microns). If the THM mechanism is indeed different from thermal stimulation, this would be a novel therapeutic mode, however, the data are not yet convincing that thermal stimulation is not the mechanism of action.

      We thank the reviewer for the suggestions, which are essential for improving our manuscript, and in particular for pointing out the important literature on thermal gating of MET channels. We have now cited and discussed this paper and other related papers.

      Since the structure of the MET channels has not been resolved, we cannot study the mechanism at the atomic or chemical bond level by molecular dynamics.

      Infrared stimulation is emerging as an area of interest for neuromodulation and potential clinical application. While most studies on infrared stimulation have been conducted at near-infrared wavelengths, whether mid-infrared wavelengths can impact neuronal function is unknown. A large number of studies have shown that the threshold for evoking an action potential by infrared neural stimulation (INS) correlates with the absorption coefficient of the solution at the stimulation wavelength: the higher the absorption coefficient, the lower the threshold. Therefore, the mechanism by which INS induces action potentials is generally believed to be the rapid rise in solution temperature caused by INS, namely the “photothermal effect” [1]. However, as shown in Figure R1, water absorbs only weakly at the 8.6 μm wavelength we use.

      How does near-infrared light affect the excitability of cells or nerves through the “photothermal effect”, so as to promote or inhibit the generation or propagation of action potentials in neurons? In other words, what is the target of the “photothermal effect”? Currently, there are few studies on the mechanisms, and the possible biophysical mechanisms include the following three:

      (1) After INS is absorbed by the solution, the solution temperature increases rapidly, the membrane capacitance changes, and an inward current is induced, which leads to depolarization of the membrane potential and the generation of an action potential [2]; (2) INS activates temperature-sensitive TRP ion channels, which causes an action potential [3]; (3) INS enhances inhibitory postsynaptic transmission by acting on GABA receptors, thus producing an inhibitory effect [4].

      At present, INS mainly uses near-infrared light (1-3 microns), the parameters used are not consistent, and many factors affect whether INS excites or inhibits (such as the diameter of the fiber, the energy of the infrared light, pulse width, and repetition frequency). On the one hand, the photothermal effect is difficult to control, and some studies have found that excessive heating blocks the generation and propagation of action potentials and can even cause irreversible inhibition of action potentials and tissue damage [5]. On the other hand, the target of the photothermal action is difficult to determine, which hinders the safe and effective adoption of INS as a neuromodulation tool in clinical or research settings. Therefore, new modulation strategies with more explicit mechanisms are needed in the field of optical neuromodulation.

      References:

      1. Wells, J., Kao, C., Konrad, P., Milner, T., Kim, J., Mahadevan-Jansen, A., Jansen, E.D.: Biophysical mechanisms of transient optical stimulation of peripheral nerve. Biophysical Journal. 93, 2567-2580 (2007).

      2. Shapiro, M.G., Homma, K., Villarreal, S., Richter, C.P., Bezanilla, F.: Infrared light excites cells by changing their electrical capacitance. Nature Communications. 3, (2012).

      3. Albert, E.S., Bec, J.M., Desmadryl, G., Chekroud, K., Travo, C., Gaboyard, S., Bardin, F., Marc, I., Dumas, M., Lenaers, G., Hamel, C., Muller, A., Chabbert, C.: TRPV4 channels mediate the infrared laser-evoked response in sensory neurons. Journal of Neurophysiology. 107, 3227–3234 (2012).

      4. Feng, H.J., Kao, C., Gallagher, M.J., Jansen, E.D., Mahadevan-Jansen, A., Konrad, P.E., Macdonald, R.L.: Alteration of GABAergic neurotransmission by pulsed infrared laser stimulation. Journal of Neuroscience Methods. 192, 110–114 (2010).

      5. Walsh, A.J., Tolstykh, G.P., Martens, S., Ibey, B.L., Beier, H.T.: Action potential block in neurons by infrared light. Neurophotonics. 3, 040501 (2016).

      The authors hypothesize that the increase in K+ current through voltage gated channels is due to increasing the speed of movement of the K+ ions through the selectivity filter, which they modeled with molecular dynamics simulations. However, the simulations are not validated with experimental manipulations.

      We thank the reviewer for pointing this out. As shown in Figure R1, we overlaid the vibration spectra of the modeled channels on the attenuation of infrared light in water.

      Figure R1. Comparison of the absorption intensity of water molecules (green curve), the Na+ channel (orange curve), and the K+ channel (black curve) from our MD simulation, and the values from other molecular dynamics calculations [1] (purple star).

      As shown in Figure R1, the K+ channel absorbs THz waves strongly at a frequency of 49.86 THz, but this frequency falls in the strong absorption region of water molecules. Consequently, THz wave modulation (THM) at this frequency would be subject to interference from the thermal effect caused by the strong absorption of water molecules.

      For Na+ channels, the strongest absorption peak is located at 48.20 THz, which is consistent with the calculation results reported in PNAS 118, e2015685118 (2021) [1]. Nevertheless, it falls in the absorption region of water molecules and is preferentially and strongly absorbed by them. In theory, a frequency of 39.82 THz can avoid the absorption of water molecules and regulate the carboxyl (-COO-) groups of Na+ channels in a non-thermal way, thus promoting or inhibiting the Na+ current. Unfortunately, these results are difficult to confirm experimentally because no light source at this frequency is intense enough; the laser therefore cannot be effectively coupled into the optical fiber to focus on nerve cells, which prevents measurement of ion channel currents under terahertz stimulation [2]. We believe that the regulation of Na+ channels by terahertz waves of specific frequencies will be studied further once light sources and coupling technology at the relevant frequencies are well developed.
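
      Because this response quotes some values as frequencies (in THz) and others as wavelengths (in μm), a minimal sketch of the conversion may help when comparing them. It assumes vacuum wavelengths and the relation f = c/λ; the function names are illustrative and are not taken from the manuscript.

```python
# Convert between vacuum wavelength (micrometers) and frequency (terahertz)
# using f = c / wavelength. Function names are hypothetical, for illustration.

C_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact SI value)


def thz_to_um(freq_thz: float) -> float:
    """Return the vacuum wavelength in micrometers for a frequency in THz."""
    wavelength_m = C_M_PER_S / (freq_thz * 1e12)
    return wavelength_m * 1e6


def um_to_thz(wavelength_um: float) -> float:
    """Return the frequency in THz for a vacuum wavelength in micrometers."""
    freq_hz = C_M_PER_S / (wavelength_um * 1e-6)
    return freq_hz / 1e12


if __name__ == "__main__":
    # Frequencies quoted in the response above.
    for freq in (49.86, 48.20, 39.82):
        print(f"{freq:.2f} THz corresponds to {thz_to_um(freq):.2f} um")
    # Wavelength quoted for the stimulation light.
    print(f"8.6 um corresponds to {um_to_thz(8.6):.2f} THz")
```

      With these definitions, 49.86, 48.20, and 39.82 THz correspond to roughly 6.0, 6.2, and 7.5 μm, and 8.6 μm corresponds to roughly 34.9 THz.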

      References:

      1. Liu, X., Qiao, Z., Chai, Y., Zhu, Z., Wu, K., Ji, W., Li, D., Xiao, Y., Li, J., Mao, L., Chang, C., Wen, Q., Song, B., Shu, Y.: Non-thermal and reversible control of neuronal signaling and behavior by mid-infrared stimulation. Proceedings of the National Academy of Sciences. 118, e2015685118 (2021).

      2. Seddon, A.B.: Mid-infrared (MIR) photonics: MIR passive and active fiberoptics chemical and biomedical, sensing and imaging. Emerging Imaging and Sensing Technologies. International Society for Optics and Photonics. 9992, 999206 (2016).

      It was unclear to this reviewer whether the temperature effect would be measurable with the technique used. It appears that the temperature measuring system is rather large as compared to the cell, therefore it would likely measure changes in bulk solution temperature and not necessarily a local or micro-scale change in temperature that the cell may be responding to. Additionally, Littlefield and Richter have suggested that temperature changes on the order of 0.1 degrees Celsius are sufficient to evoke action potentials (Littlefield & Richter, 2021, PMC8035937), which is well within the temperature changes observed by the authors. At the longer wavelengths used in this study, the absorption of water is generally even higher as well, suggesting even greater temperature changes with the same power. In vestibular hair cells a 10 deg Celsius increase in temperature led to a 50-60% increase in peak MET current (Songer & Eatock, 2013, PMC3857958).

      We thank the reviewer for pointing out this issue. Indeed, the temperature measuring system is rather large compared to the cell. We performed the temperature measurement protocol with an ADInstruments acquisition system (PowerLab 4/35) coupled to a T-type hypodermic thermocouple (MT 29/5, Physitemp); the diameter of the thermocouple is 100 μm. However, our new experiment measuring tissue temperature in vitro showed that the maximum temperature elevation was less than 4 °C with the 75 mW stimulation, which was much lower than the temperature change reported in the reference paper (10 °C; Songer & Eatock, 2013, PMC3857958). The other paper mentioned by the reviewer (Littlefield & Richter, 2021, PMC8035937) also proposes in its introduction that light stimulation evokes neural responses due to photons rather than heat. In addition, when the power is 10 mW, the temperature rise is not more than 1 °C. Two studies have found that light illumination of the kind commonly used for optogenetics increases the temperature by ~2 °C [1-2]. This temperature elevation is associated with the inhibition of neuronal spiking in different brain areas and cannot explain the excitation effect of THM observed in our experiment. We now mention this point in the main text. In addition, we also mention in the main text that the wavelength of 8.6 μm falls in the strong absorption region of water.

      References:

      1. Owen, S. F., Liu, M. H. & Kreitzer, A. C. Thermal constraints on in vivo optogenetic manipulations. Nat. Neurosci. 22, 1061–1065 (2019).

      2. Ait Ouares, K., Beurrier, C., Canepari, M., Laverne, G. & Kuczewski, N. Opto nongenetics inhibition of neuronal firing. Eur. J. Neurosci. 49, 6–26 (2019).

      In figure 1, when THM is on, there appears to be an increase in the inward current without any mechanical stimulation. There is no discussion of this, and this could be a baseline effect that is not aimed at simply enhancing existing conductances. The increase in K+ conductance seen in the voltage-gated K channel cannot account for this increased inward current, since K+ conductance is outward. THM itself could also activate a small amount of MET current, maybe via the thermal effect demonstrated by Azimzadeh et al. This increased conductance could also be from the Tmc1 leak conductance that the authors have published on previously.

      We thank the reviewer for pointing out this issue, and in particular for suggesting several possible reasons for the increase in the inward current. We have now discussed this effect and cited related papers. In addition, the increase in MET currents caused by THM was far greater than the baseline offset, indicating that THM has a non-thermal effect.

      Line 232-233: With regard to the ABR data, data is not shown about whether an OABR can be elicited. The data show that once the THM is turned on and then a click stimulus is presented, there is no response; however, this experiment does not really test whether the THM can evoke an OABR since many repetitions are required to get the ABR waveform out of the noise. If THM is on and the stimulus is below threshold, then there is unlikely to be an evoked response since the THM stimulus is not synchronized with the ABR recording. The authors need to show that THM onset stimulation that is synchronized with the ABR recording does not result in an ABR waveform.

      We thank the reviewer for suggesting this very important experiment. Following this suggestion, we tested whether THM onset stimulation that is synchronized with the ABR recording can evoke an OABR. We now present the new data in Figure S5.

    1. Author Response

      Reviewer #1 (Public Review):

      High resolution mechanistic studies would be instrumental in driving the development of Cas7-11 based biotechnology applications. This work is unfortunately overshadowed by a recent Cell publication (PMID: 35643083) describing the same Cas7-11 RNA-protein complex. However, given the tremendous interest in these systems, it is my opinion that this independent study will still be well cited, if presented well. The authors obviously have been trying to establish a unique angle for their story, by probing deeper into the mechanism of crRNA processing and target RNA cleavage. The study is carried out rigorously. The current version of the manuscript appears to have been rushed out. It would benefit from clarification and text polishing.

      We thank the reviewer for the positive and helpful comments that have made the manuscript more impactful.

      To summarize the revisions, we have resolved the metal-dependence issue, updated the maps in both main and supplementary figures that support the model, re-organized the labels for clarity, and added a comparison between our structure and that of Kato et al.

      In addition, we describe a new result with an isolated C7L.1 fragment that retains the processing and crRNA binding activities.

      Reviewer #2 (Public Review):

      In this manuscript, Goswami et al. solved a cryo-EM structure of Desulfonema ishimotonii Cas7-11 (DiCas7-11) bound to a guiding CRISPR RNA (crRNA) and target RNA. Cas7-11 is of interest due to its unusual architecture as a single polypeptide, in contrast to other type III CRISPR-Cas effectors that are composed of several different protein subunits. The authors have obtained a high-quality cryo-EM map at 2.82 angstrom resolution, allowing them to build a structural model for the protein, crRNA and target RNA. The authors used the structure to clearly identify a catalytic histidine residue in the Cas7-11 Cas7.1 domain that is important for crRNA processing activity. The authors also investigated the effects of metal ions and crRNA-target base pairing on target RNA cleavage. Finally, the authors used their structure to guide engineering of a compact version of Cas7-11 in which an insertion domain that is disordered in the cryo-EM map was removed. This compact Cas7-11 appears to have comparable cleavage activity to the full-length protein.

      The cryo-EM map presented in this manuscript is generally of high quality and the manuscript is very well illustrated. However, some of the map interpretation requires clarification (outlined below). This structure will be valuable as there is significant interest in DiCas7-11 for biotechnology. Indeed, the authors have begun to engineer the protein based on observations from the structure. Although characterization of this engineered Cas7-11 is limited in this study and similar engineering was also performed in a recently published paper (PMID 35643083), this proof-of-principle experiment demonstrates the importance of having such structural information.

      The biochemistry experiments presented in the study identify an important residue for crRNA processing, and suggest that target RNA cleavage is not fully metal-ion dependent. Most of these conclusions are based on straightforward structure-function experiments. However, some results related to target RNA cleavage are difficult to interpret as presented. Overall, while the cryo-EM data presented in this work is of high quality, both the structural model and the biochemical results require further clarification as outlined below.

      We thank the reviewer for the positive and helpful comments that have made the manuscript more impactful.

      To summarize the revisions, we have resolved the metal-dependence issue, updated the maps in both main and supplementary figures that support the model, re-organized the labels for clarity, and added a comparison between our structure and that of Kato et al.

      In addition, we describe a new result with an isolated C7L.1 fragment that retains the processing and crRNA binding activities.

      1. The DiCas7-11 structure bound to target RNA was also recently reported by Kato et al. (PMID 35643083). The authors have not cited this work or compared the two structures. While the structures are likely quite similar, it is notable that the structure reported in the current paper is for the wild-type protein and the sample was prepared under reactive conditions, resulting in a partially cleaved target. Kato et al. used a catalytically dead version of Cas7-11 in which the target RNA should remain fully intact. Are there differences in the Cas7-11 structure observed in the presence of a partially cleaved target RNA in comparison to the Kato et al. structure? Such a comparison is appropriate given the similarities between the two reports. A figure comparing the two structures could be included in the manuscript.

      We have added a paragraph on page 12 that describes the differences in the preparation of the two complexes and in their structures. We observed minor differences in the overall protein structure (r.m.s.d. 0.918 Å for 8114 atoms) but did observe quite different interactions between the protein and the first 5’-tag nucleotide (U(-15) vs. G(-15)) due to the different pre-crRNA constructs, which suggests that U(-15) is important in forming the processing-competent active site. We added Figure 2-figure supplement 3, which illustrates the similarities and the differences.

      2. The cryo-EM density map is of high quality, but some of the structural model is not fully supported by the experimental data (e.g. protein loops from the AlphaFold model were not removed despite lack of cryo-EM density). Most importantly, there is little density for the target RNA beyond the site 1 cleavage site, suggesting that the RNA was cleaved and the product was released. However, this region of the RNA was included in the structural model. It is unclear what density this region of the target RNA model was based on. Further discussion of the interpretation of the partially cleaved target RNA is necessary. Were 3D classes observed in various states of RNA cleavage and with varied density for the product RNAs?

      We should have made it clear in the Methods that multiple maps were used in building the structure, but we submitted only the post-processed map to reviewers. When using the map generated by local resolution estimation in Relion 4.0, we observed sufficient density for some of the regions the reviewer is referring to. For instance, the density does support the model for the two nucleotides beyond the site 1 cleavage site (see the revised Figure 1 & Figure 1-figure supplement 3).

      However, some protein loops still lack convincing density. These include residues 134-141 and 1316-1329, which are now removed from the final coordinates.

      The phrase “partially cleaved target RNA” reflects weak density for nucleotides downstream of site 1 (+2 and +3) but clear density flanking site 2. This feature indicates that cleavage had likely taken place at site 1 but not at site 2 in most of the particles that went into the reconstruction. To further clarify this phrase, we added “The PFS region plus the first base paired nucleotide (+1*) are not observed.” on page 4, and Figure 1 now better indicates which nucleotides are or are not built in our model.

      3. The authors argue that site 1 cleavage of target RNA is independent of metal ions. This is a potentially interesting result, but it is difficult to determine whether it is supported by the evidence provided in the manuscript. The Methods section only describes a buffer containing 10 mM MgCl2, but does not describe conditions containing EDTA. How much EDTA was added and was MgCl2 omitted from these samples? In addition, it is unclear whether the site 1 product is visible in Figures 2d and 3d. To my eye, the products that are present in the EDTA conditions on these gels migrated slightly slower than the typical site 1 product. This may suggest an alternate cleavage site or chemistry (e.g. cyclic phosphate is maintained following cleavage). Further experimental details and potentially additional experiments are required to fully support the conclusion that site 1 cleavage may be metal independent.

      As we pointed out in response to Reviewer 1’s #8 comment, this conclusion may have been a result of using an older batch of DiCas7-11 that contains degraded fragments.

      As shown in the attached figure below, “batch Y” was an older prep from our in-house clone and “batch X” is a newer prep from the Addgene purchased clone (gel on right), and they consistently produce metal-independent (batch Y) or metal-dependent (batch X) cleavage (gel on left). It is possible that the degraded fragments in batch Y carry a metal-independent cleavage activity that is absent in the more pure batch X.

      We further performed mass spectrometry analysis of two of the degraded fragments from batch Y (indicated by arrows below) and discovered that these are indeed part of DiCas7-11. We, however, cannot rationalize, without more experimental evidence, why these fragments might have generated metal-independent cleavage at site 1. Therefore, we simply replaced all our cleavage results with those from the new and cleaner prep (batch X; for instance, Figure 3c). As a result, all references to “metal-independence” were removed.

      With regard to the nature of cleaved products, we found both sites could be inhibited by specific 2’-deoxy modifications, consistent with the previous observation that Type III systems generate a 2’, 3’-cyclic product in spite of the metal dependence (for instance, see Hale, C. R., Zhao, P., Olson, S., Duff, M. O., Graveley, B. R., Wells, L., ... & Terns, M. P. (2009). RNA-guided RNA cleavage by a CRISPR RNA-Cas protein complex. Cell, 139(5), 945-956.)

      We added this rationale based on the new results and believe that these characterizations are now thorough and conclusive.

      1. The authors performed an experiment investigating the importance of crRNA-target base pairing on cleavage activity (Figure 3e). However, negative controls for the RNA targets in the absence of crRNA and Cas7-11 were not included in this experiment, making it impossible to determine which bands on the gel correspond to substrates and which correspond to products. This result is therefore not interpretable by the reader and does not support the conclusions drawn by the authors.

      Our original gel image (below) does contain these controls, but we did not include them in the figure due to space considerations (we should have included them as a supplementary figure). We have now completely updated Figure 3e with a much better-quality gel and controls. Both the older and the updated experiments show the same results.

      Original gel for Figure 3e containing controls.

    1. Author Response:

      Reviewer #1:

      Salehinejad et al. run a battery of tests to investigate the effects of sleep deprivation on cortical excitability using TMS, LTP/LTD-like plasticity using tDCS, EEG-derived measures and behavioral task-performance. The study confirms evidence for sleep deprivation resulting in an increase in cortical excitability, diminishing LTP-like plasticity changes, increase in EEG theta band-power and worse task-performance. Additionally, a protocol usually resulting in LTD-like plasticity results in LTP-like changes in the sleep deprivation condition.

      We appreciate the reviewer's time for carefully reading our work and providing important suggestions/recommendations. In what follows, we addressed the comments one by one, revised the main text accordingly, and pasted the changes here as well.

      1) My main comment is regarding the motivation for executing this specific study setup, which did not become clear to me. It's a robust experimental design, with a general approach quite similar to the (in the current manuscript heavily cited) Kuhn et al. 2016 study (which investigates cortical excitability, EEG markers, and changes in LTP mechanisms), with additional inclusion of LTD-plasticity measures. The authors list comprehensiveness as motivation, but the power of a comprehensive study like this would lie in being able to make comparisons across measures to identify new interrelations or interesting subgroups of participants differentially affected by sleep deprivation. These comparisons are presented in l. 322 and otherwise at the end of the supplementary material, and the study does not seem to be designed with these as the main motivation in mind. Can the authors comment on this & clarify their motivation? Maybe the authors can highlight in what way their study constitutes a methodological improvement and incorporates new aspects regarding hypothesis development as compared to e.g. Kuhn et al. 2016; currently, the authors highlight mainly the addition of LTD-plasticity protocols. Similarly, no motivation/context/hypotheses are given for saliva testing. There are a lot of different results, but e.g. the cortical excitability results are not discussed in depth: there is no effect on the IO curve, but there are effects on other measures of excitability, and the conclusion of that paragraph is only "our results demonstrate that corticocortical and corticospinal excitability are upscaled after sleep deprivation." There are some conflicting results regarding cortical excitability measures in the literature; possibly this could be discussed, so the reader can evaluate in what way the current study constitutes an improvement, for instance methodologically, over previous studies.

      Thank you for your comment/suggestion. The main motivation behind this study was to examine different physiological/behavioral/cognitive measures under sleep conditions and to provide a reasonably complete overview. This approach was not covered in detail by previous work, which is often limited to one or two pieces of behavioral and/or physiological evidence. Our study was not sufficiently powered to identify new interrelations between measures, because this was a secondary aim, although we found some relevant associations in exploratory analyses (i.e., association of motor learning with plasticity, and cortical excitability with memory and attention). Future studies, however, which are sufficiently powered for these comparisons, are needed to explore interrelations between physiological, and cognitive parameters more clearly and we stated this as a limitation (Page 22).

      That said, we agree that specific rationales of the study were not sufficiently clarified in the previous version. We rephrased and clarified respective motivations and rationales here:

      1) By comprehensive, we mean that we obtained measures from basic physiological parameters to behavior and higher-order cognition, which is not sufficiently covered so far. This includes also the exploration of expected associations between behavioral motor learning and plasticity measures, as well as excitability parameters and cognitive functions.

      2) In the Kuhn et al. (2016) study, cortical excitability was obtained by TMS intensity (single-pulse protocol) to elicit a predefined amplitude of the motor-evoked potential, which is a relatively unspecific parameter of corticospinal excitability. In the present study, cortical excitability was monitored by different TMS protocols, which cover not only corticospinal excitability, but also intracortical inhibition, facilitation, I-wave facilitation, and short-latency afferent inhibition, which allow more specific conclusions with respect to the involvement of cortical systems, neurotransmitters, and -modulators.

      3) Furthermore, Kuhn et al. (2016) only investigated LTP-like, but not LTD-like plasticity. LTD-like plasticity was also not investigated in previous works to the best of our knowledge. LTD-like plasticity has however relevance for cognitive processing, and furthermore, knowledge about alterations of this kind of plasticity is important for mechanistic understanding of sleep-dependent plasticity alterations: The conversion of LTD-like to LTP-like plasticity under sleep deprivation is crucial for the interpretation of the study results as likely caused by cortical hyperactivity.

      4) Finally, an important motivation was to compare how brain physiology and cognition are differently affected by sleep deprivation, as compared to chronotype-dependent brain physiology and cognitive performance, especially with respect to brain physiology and performance at non-preferred times of the day. Our findings regarding the latter were recently published (Salehinejad et al., 2021), and comparisons of the present study with the published one have novel and important implications. Specifically, the results of both studies imply that the mechanisms underlying reduced performance after sleep deprivation differ relevantly from those underlying reduced performance at a non-optimal time of day.

      We clarified these motivations in the introduction and discussion. Please see the revised text below:

      "The number of available studies about the impact of sleep deprivation on human brain physiology relevant for cognitive processes is limited, and knowledge is incomplete. With respect to cortical excitability, Kuhn et al. (2016) showed increased excitability under sleep deprivation via a global measure of corticospinal excitability, the TMS intensity needed to induce motor-evoked potentials of a specific amplitude. Specific information about the cortical systems, including neurotransmitters, and -modulators involved in these effects (e.g. glutamatergic, GABAergic, cholinergic), is however missing. The level of cortical excitability affects neuroplasticity, a relevant physiological derivate of learning, and memory formation. Kuhn and co-workers (2016) describe accordingly a sleep deprivation-dependent alteration of LTP-like plasticity in humans. The effects of sleep deprivation on LTD-like plasticity, which is required for a complete picture, have however not been explored so far. In the present study, we aimed to complete the current knowledge and explored also cognitive performance on those tasks which critically depend on cortical excitability (working memory, and attention), and neuroplasticity (motor learning) to gain mechanistic knowledge about sleep deprivation-dependent performance decline. Finally, we aimed to explore if the impact of sleep deprivation on brain physiology and cognitive performance differs from the effects of non-optimal time of day performance in different chronotypes, which we recently explored in a parallel study with an identical experimental design (Salehinejad et al., 2021). The use of measures of different modalities in this study allows us to comprehensively investigate the impact of sleep deprivation on brain and cognitive functions which is largely missing in the human literature."

      We added more details about the rationale for saliva sampling:

      "We also assessed resting-EEG theta/alpha, as an indirect measure of homeostatic sleep pressure, and examined cortisol and melatonin concentration to see how these are affected under sleep conditions, given the reported mixed effects in previous studies."

      We also rephrased the cortical excitability results. Please see the revised text below:

      "Taken together, our results demonstrate that glutamate-related intracortical excitability is upscaled after sleep deprivation. Moreover, cortical inhibition was decreased or turned into facilitation, which is indicative of enhanced cortical excitability as a result of GABAergic reduction. Corticospinal excitability showed only a trendwise upscaling, indicative of a major contribution of cortical, but not downstream, excitability to this sleep deprivation-related enhancement."

      "The increase of cortical excitability parameters and the resultant synaptic saturation following sleep deprivation can explain the respective cognitive performance decline. It is, however, worth noting that our study was not powered to identify these correlations with sufficient reliability, and future studies that are powered for this aim are needed.

      Our findings have several implications. First, they show that sleep and circadian preference (i.e., chronotype) have functionally different impacts on human brain physiology and cognition. The same parameters of brain physiology and cognition were recently investigated at circadian optimal vs non-optimal time of day in two groups of early and late chronotypes (Salehinejad et al., 2021). While we found decreased cortical facilitation and lower neuroplasticity induction (same for both LTP and LTD) at the circadian nonpreferred time in that study (Salehinejad et al., 2021), in the present study we observed upscaled cortical excitability and a functionally different pattern of neuroplasticity alteration (i.e., diminished LTP-like plasticity induction and conversion of LTD- to LTP-like plasticity)."

      2) EEG-measures. In general, I find the presented evidence regarding a link between synaptic strength and human theta-power weak. In humans, rhythmic theta activity can be found mostly in the form of midfrontal theta. Here, the largest changes seem to be in posterior electrodes (judging from Fig. 4, bottom row), which will not capture rhythmic midfrontal theta in humans. Can the authors explain the scaling of the Fig. 4 top vs. bottom row? There seems to be a mismatch. No legend is given for the bottom row. The activity captured here is probably related to changes in nonrhythmic 1/f-type activity (which displays large changes relating to arousal; see e.g. https://elifesciences.org/articles/55092). It would be of benefit to see a power spectrum for the EEG-measures to see the specific type of power changes across all frequencies & to verify that these are actually oscillatory peaks in individual subjects. As far as I understood, the referenced study Vyazovskiy et al., 2008 contains no information regarding theta as a marker for synaptic potentiation. The evidence that synaptic strength is captured by the specifically used measures needs to be strengthened or statements like "measured synaptic strength via the resting-EEG theta/alpha pattern" need to be more carefully stated.

      Thank you for this comment. We removed the Pz electrode from the figure and instead added F3 and F4 along with Fz and Cz to capture more mid-frontal regions. Please see the revised Figure 4. The top rows now include only midfrontal and midcentral areas (Fz, Cz, F3, F4), and show numerical comparisons of midfrontal theta which is significantly different across conditions (and larger after sleep deprivation). The purpose of the bottom figures, which are removed now, was just to provide an overall visual comparison of theta distribution across sleep conditions. However, we agree that the bottom-row figures are misleading because these just capture average theta band power without specifying midfrontal regions. We removed this part of the figure to prevent confusion. Please see below.

      Regarding the power spectrum, we also added new figures (4 g) showing how different frequency bands of the power spectrum are affected by sleep deprivation. Please see the revised Figure 4 below.

      Updated results, page 12-13:

      "In line with this, we investigated how sleep deprivation affects resting-state brain oscillations at the theta band (4-7 Hz), the beta band (15-30 Hz) as another marker of cortical excitability, vigilance and arousal (Eoh et al., 2005; Fischer et al., 2008) and the alpha band (8-14 Hz) which is important for cognition (e.g. memory, attention) (Klimesch, 2012). To this end, we analyzed EEG spectral power at mid-frontocentral electrodes (Fz, Cz, F3, F4) using a 4×2 mixed ANOVA. For theta activity, significant main effects of location (F1.71=18.68, p<0.001; ηp2=0.40) and sleep condition (F1=17.82, p<0.001; ηp2=0.39), but no interaction was observed, indicating that theta oscillations at frontocentral regions were similarly affected by sleep deprivation. Post hoc tests (paired, p<0.05) revealed that theta oscillations, grand averaged at mid-central electrodes, were significantly increased after sleep deprivation (p<0.001) (Fig. 4a,b). For the alpha band, the main effects of location (F1.49=12.92, p<0.001; ηp2=0.31) and sleep condition (F1=5.03, p=0.033; ηp2=0.15) and their interaction (F2.31=4.60, p=0.010; ηp2=0.14) were significant. Alpha oscillations, grand averaged at mid-frontocentral electrodes, were significantly decreased after sleep deprivation (p=0.033) (Fig. 4c,d). Finally, the analysis of beta spectral power showed significant main effects of location (F1.34=6.73, p=0.008; ηp2=0.19) and sleep condition (F1=6.98, p=0.013; ηp2=0.20) but no significant interaction. Beta oscillations, grand averaged at mid-frontocentral electrodes, were significantly increased after sleep deprivation (p=0.013) (Fig. 4e,f)."

      Fig. 4. Resting-state theta, alpha, and beta oscillations at electrodes Fz, Cz, F3 and F4. a,b, Theta band activity was significantly higher after the sleep deprivation vs sufficient sleep condition (tFz=4.61, p<0.001; tCz=2.22, p=0.034; tF3=2.93, p=0.007; tF4=4.78, p<0.001). c,d, Alpha band activity was significantly lower at electrodes Fz and Cz (tFz=2.39, p=0.023; tCz=2.65, p=0.013) after the sleep deprivation vs the sufficient sleep condition. e,f, Beta band activity was significantly higher at electrodes Fz, Cz and F4 after sleep deprivation compared with the sufficient sleep condition (tFz=3.06, p=0.005; tCz=2.38, p=0.024; tF4=2.25, p=0.032). g, Power spectrum including theta (4-7 Hz), alpha (8-14 Hz), and beta (15-30 Hz) bands at the electrodes Fz, Cz, F3 and F4 respectively. Data of one participant were excluded due to excessive noise. All pairwise comparisons for each electrode were calculated via post hoc Student’s t-tests (paired, p<0.05). n=29. Error bars represent s.e.m. ns = nonsignificant; Asterisks indicate significant differences. Boxes indicate the interquartile range that contains 50% of values (range from the 25th to the 75th percentile) and whiskers show the 1 to 99 percentiles.
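
      To make the band definitions above concrete, the following is a minimal sketch of how spectral power can be summed within the theta (4-7 Hz), alpha (8-14 Hz), and beta (15-30 Hz) bands. It is not the study's EEG pipeline: the sampling rate, the synthetic two-component signal, and the helper names `periodogram` and `band_power` are assumptions made purely for illustration, and a naive DFT is used only to keep the example free of external libraries.

```python
import cmath
import math

# Band edges (Hz) as stated in the revised Results text.
BANDS = {"theta": (4.0, 7.0), "alpha": (8.0, 14.0), "beta": (15.0, 30.0)}


def periodogram(signal, fs):
    """One-sided periodogram via a naive DFT (fine for short signals)."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_power(freqs, power, low, high):
    """Sum periodogram values whose frequency lies in [low, high]."""
    return sum(p for f, p in zip(freqs, power) if low <= f <= high)


# Hypothetical recording: 2 s at 128 Hz with a strong 6 Hz (theta)
# component and a weaker 20 Hz (beta) component.
fs = 128.0
n_samples = 256
signal = [
    math.sin(2 * math.pi * 6 * t / fs) + 0.3 * math.sin(2 * math.pi * 20 * t / fs)
    for t in range(n_samples)
]

freqs, power = periodogram(signal, fs)
powers = {name: band_power(freqs, power, lo, hi) for name, (lo, hi) in BANDS.items()}

for name, value in powers.items():
    print(f"{name}: {value:.3f}")
```

      In practice an averaged estimator such as Welch's method would usually replace the raw periodogram, but the band-summing step would stay the same.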

      Regarding the reference, unfortunately, we were referring to a different work of the Vyazovskiy team. We meant Vyazovskiy et al. (2005). We removed this reference and the part that needed to be toned down from the introduction and added new relevant references while toning down the statement about synaptic strength. Please see below:

      Revised text, Results, page 12:

      "So far, we found that sleep deprivation upscales cortical excitability, prevents induction of LTP-like plasticity, presumably due to saturated synaptic potentiation, and converts LTD- into LTP-like plasticity. Previous studies in animals (Vyazovskiy and Tobler, 2005; Leemburg et al., 2010) and humans (Finelli et al., 2000) have shown that EEG theta activity is a marker for homeostatic sleep pressure and increased cortical excitability (Kuhn et al., 2016)."

      3) In general, the authors do a good job pointing out multiple comparison corrected tests. In some cases, e.g. for their correlational analyses across measures, significant results are reported, but without a clearer discussion of what other tests were computed and how correction was applied, the strength of this evidence is hard to evaluate. Please check for all presented correlations.

      Thank you for your comment. For correlational analyses, no correction for multiple comparisons was computed, because these were secondary exploratory analyses. We state this now clearly in the manuscript. For the other analyses, the description of multiple comparisons is included below:

      Methods, pages 35-37:

      "For the TMS protocols with a double-pulse condition (i.e., SICI-ICF, I-wave facilitation, SAI), the resulting mean values were normalized to the respective single-pulse condition. First, mean values were calculated individually and then inter-individual means were calculated for each condition. For the I-O curves, absolute MEP values were used. To test for statistical significance, repeated-measures ANOVAs were performed with ISIs, TMS intensity (in I-O curve only), and condition (sufficient sleep vs sleep deprivation) as within-subject factors and MEP amplitude as the dependent variable. In case of significant results of the ANOVA, post hoc comparisons were performed using Bonferroni-corrected t-tests to compare mean MEP amplitudes of each condition against the baseline MEP and to contrast sufficient sleep vs sleep deprivation conditions. To determine if individual baseline measures differed within and between sessions, SI1mV and Baseline MEP were entered as dependent variables in a mixed-model ANOVA with session (4 levels) and condition (sufficient sleep vs sleep deprivation) as within-subject factors, and group (anodal vs cathodal) as between-subject factor. The mean MEP amplitude for each measurement time-point was normalized to the session’s baseline (individual quotient of the mean from the baseline mean) resulting in values representing either increased (> 1.0) or decreased (< 1.0) excitability. Individual averages of the normalized MEP from each time-point were then calculated and entered as dependent variables in a mixed-model ANOVA with repeated measures with stimulation condition (active, sham), time-point (8 levels), and sleep condition (normal vs deprivation) as within-subject factors and group (anodal vs cathodal) as between-subject factor. 
      In case of significant ANOVA results, post hoc comparisons of MEP amplitudes at each time point were performed using Bonferroni-corrected t-tests to examine if active stimulation resulted in a significant difference relative to sham (comparison 1), baseline (comparison 2), the respective stimulation condition at sufficient sleep vs sleep deprivation (comparison 3), and the between-group comparisons at respective timepoints (comparison 4).

      The mean RT, RT variability and accuracy of blocks were entered as dependent variables in repeated-measures ANOVAs with block (5 vs 6, 6 vs 7) and condition (sufficient sleep vs sleep deprivation) as within-subject factors. Because the RT differences between blocks 5 vs 6 and 6 vs 7 were those of major interest, post hoc comparisons were performed on RT differences between these blocks using paired-sample t-tests (two-tailed, p<0.05) without correction for multiple comparisons. For 3-back, Stroop and AX-CPT tasks, mean and standard deviation of RT and accuracy were calculated and entered as dependent variables in repeated-measures ANOVAs with sleep condition (sufficient sleep vs sleep deprivation) as the within-subject factor. For significant ANOVA results, post hoc comparisons of dependent variables were performed using paired-sample t-tests (two-tailed, p<0.05) without correction for multiple comparisons.

      For the resting-state data, brain oscillations at mid-central electrodes (Fz, Cz, F3, F4) were analyzed with a 4×2 ANOVA with location (Fz, Cz, F3, F4) and sleep condition (sufficient sleep vs sleep deprivation) as the within-subject factors. For all tasks, individual ERP means were grand-averaged and entered as dependent variables in repeated-measures ANOVAs with sleep condition (sufficient sleep vs sleep deprivation) as the within-subject factor. Post hoc comparisons of grand-averaged amplitudes were performed using paired-sample t-tests (two-tailed, p<0.05) without correction for multiple comparisons.

      To assess the relationship between induced neuroplasticity and motor sequence learning, and the relationship between cortical excitability and cognitive task performance, we calculated Pearson correlations. For the first correlation, we used individual grand-averaged MEP amplitudes obtained from anodal and cathodal tDCS pooled for the time-points between 0 and 20 min after interventions, and individual motor learning performance (i.e. BL6-5 and BL6-7 RT difference) across sleep conditions. For the second correlation, we used individual grand-averaged MEP amplitudes obtained from each TMS protocol and individual accuracy/RT obtained from each task across sleep conditions. No correction for multiple comparisons was done for correlational analyses as these were secondary exploratory analyses."
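
      Two steps of the Methods text above lend themselves to a short illustration: normalizing MEP amplitudes to the session baseline (values above 1.0 indicating increased and below 1.0 decreased excitability) and applying a Bonferroni correction to a family of post hoc p-values. The sketch below is only a hedged example; the function names, the MEP amplitudes, and the p-values are hypothetical and are not taken from the study.

```python
def normalize_to_baseline(mep_by_timepoint, baseline_mep):
    """Express each post-intervention MEP as a quotient of the baseline MEP.

    Values > 1.0 indicate increased excitability, values < 1.0 decreased
    excitability, as described in the Methods text.
    """
    if baseline_mep <= 0:
        raise ValueError("baseline MEP amplitude must be positive")
    return [mep / baseline_mep for mep in mep_by_timepoint]


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical mean MEP amplitudes (mV) for one participant and session.
baseline = 0.8
post_intervention = [0.96, 1.04, 0.72]
normalized = normalize_to_baseline(post_intervention, baseline)

# Hypothetical uncorrected p-values from a family of four post hoc t-tests.
raw_p = [0.01, 0.02, 0.2, 0.5]
adjusted_p = bonferroni(raw_p)

print([round(v, 3) for v in normalized])
print([round(p, 3) for p in adjusted_p])
```

      Comparing each adjusted p-value with 0.05 is equivalent to comparing the raw p-value with 0.05 divided by the number of tests in the family.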

      There are also inconsistencies like: "The average levels of cortisol and melatonin were lower after sleep deprivation vs sufficient sleep (cortisol: 3.51±2.20 vs 4.85±3.23, p=0.05; melatonin 10.50±10.66 vs 16.07±14.94, p=0.16)"

      The p-values are not significant here?

      Thank you for your comment. The p-value was only marginally significant for the cortisol level changes. We clarified this in the revision. Please see below:

      Revised text, page 19:

      "The average levels of cortisol and melatonin were numerically lower after sleep deprivation vs sufficient sleep (cortisol: 3.51±2.20 vs 4.85±3.23, p=0.056; melatonin 10.50±10.66 vs 16.07±14.94, p=0.16), but these differences were only marginally significant for the cortisol level and showed only a trendwise reduction for melatonin."

      Reviewer #2:

      This study represents the currently most comprehensive characterization of indices of synaptic plasticity and cognition in humans in the context of sleep deprivation. It provides further support for an interplay between the time course of synaptic strength/cortical excitability (homeostatic plasticity) and the inducibility of associative synaptic LTP-/LTD-like plasticity. The study is of great interest, the translation of findings is of potential clinical relevance, the methods appear to be solid and the results are mostly convincing. I believe that the writing of the manuscript should be improved (e.g. quality of referencing, a clearer framework and hypothesis, reduction of redundancies, and a more precise discussion). However, all of these points can be addressed since the overall concept, design, conduct and findings are convincing and of great interest to the field of sleep research, but also more broadly to the neurosciences, to clinicians and the public.

      We appreciate the reviewer's time for carefully reading our work and providing important suggestions/recommendations.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this article, Bollmann and colleagues demonstrated both theoretically and experimentally that blood vessels could be targeted at the mesoscopic scale with time-of-flight magnetic resonance imaging (TOF-MRI). With a mathematical model that includes partial voluming effects explicitly, they outline how small voxels reduce the dependency of blood dwell time, a key parameter of the TOF sequence, on blood velocity. Through several experiments on three human subjects, they show that increasing resolution improves contrast and evaluate additional issues such as vessel displacement artifacts and the separation of veins and arteries.

      The overall presentation of the main finding, that small voxels are beneficial for mesoscopic pial vessels, is clear and well discussed, although difficult to grasp fully without a good prior understanding of the underlying TOF-MRI sequence principles. Results are convincing, and some of the data both raw and processed have been provided publicly. Visual inspection and comparisons of different scans are provided, although no quantification or statistical comparison of the results are included.

      Potential applications of the study are varied, from modeling more precisely functional MRI signals to assessing the health of small vessels. Overall, this article reopens a window on studying the vasculature of the human brain in great detail, for which studies have been surprisingly limited until recently.

      In summary, this article provides a clear demonstration that small pial vessels can indeed be imaged successfully with extremely high voxel resolution. There are however several concerns with the current manuscript, hopefully addressable within the study.

      Thank you very much for this encouraging review. While smaller voxel sizes theoretically benefit all blood vessels, we are specifically targeting the (small) pial arteries here, as the inflow-effect in veins is unreliable and susceptibility-based contrasts are much more suited for this part of the vasculature. (We have clarified this in the revised manuscript by substituting ‘vessel’ with ‘artery’ wherever appropriate.) Using a partial-volume model and a relative contrast formulation, we find that the blood delivery time is not the limiting factor when imaging pial arteries, but the voxel size is. Taking into account the comparatively fast blood velocities even in pial arteries with diameters ≤ 200 µm (using t_delivery=l_voxel/v_blood), we find that blood dwell times are sufficiently long for the small voxel sizes considered here to employ the simpler formulation of the flow-related enhancement effect. In other words, small voxels eliminate blood dwell time as a consideration for the blood velocities expected for pial arteries.
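
      The relation t_delivery=l_voxel/v_blood quoted above can be evaluated directly. The sketch below is illustrative only: the function name `delivery_time_ms` is hypothetical, the blood velocities are assumed example values rather than measurements from the study, and the 0.8 mm voxel is an assumed coarser comparison; the 0.14 mm and 0.16 mm voxel sizes correspond to the resolutions mentioned by the reviewer.

```python
def delivery_time_ms(voxel_length_mm, blood_velocity_mm_per_s):
    """Time (ms) for blood to traverse one voxel: t = l_voxel / v_blood."""
    if blood_velocity_mm_per_s <= 0:
        raise ValueError("blood velocity must be positive")
    return 1000.0 * voxel_length_mm / blood_velocity_mm_per_s


# Voxel edge lengths in mm (0.8 mm is an assumed coarser comparison).
voxel_sizes_mm = [0.14, 0.16, 0.8]
# Assumed example blood velocities in mm/s (hypothetical values).
velocities_mm_per_s = [5.0, 10.0, 20.0]

for l_voxel in voxel_sizes_mm:
    for v_blood in velocities_mm_per_s:
        t_ms = delivery_time_ms(l_voxel, v_blood)
        print(f"l_voxel={l_voxel} mm, v_blood={v_blood} mm/s -> {t_ms:.1f} ms")
```

      Because the time scales linearly with voxel length and inversely with velocity, halving the voxel size or doubling the velocity halves the result.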

      We have extended the description of the TOF-MRA sequence in the revised manuscript, and all data and simulations/analyses presented in this manuscript are now publicly available at https://osf.io/nr6gc/ and https://gitlab.com/SaskiaB/pialvesseltof.git, respectively. This includes additional quantifications of the FRE effect for large vessels (adding to the assessment for small vessels already included), and the effect of voxel size on vessel segmentations.

      Main points:

      1) The manuscript needs clarifying through some additional background information for a readership wider than expert MR physicists. The TOF-MRA sequence and its underlying principles should be introduced first thing, even before discussing vascular anatomy, as it is the key to understanding what aspects of blood physiology and MRI parameters matter here. MR physics shorthand terms should be avoided or defined, as 'spins' or 'relaxation' are not obvious to everybody. The relationship between delivery time and slab thickness should be made clear as well.

      Thank you for this valuable comment that the Theory section is perhaps not accessible for all readers. We have adapted the manuscript in several locations to provide more background information and details on time-of-flight contrast. We found, however, that there is no concise way to first present the MR physics part and then introduce the pial arterial vasculature, as the optimization presented therein is targeted towards this structure. To address this comment, we have therefore opted to provide a brief introduction to TOF-MRA first in the Introduction, and then a more in-depth description in the Theory section.

      Introduction section:

      "Recent studies have shown the potential of time-of-flight (TOF) based magnetic resonance angiography (MRA) at 7 Tesla (T) in subcortical areas (Bouvy et al., 2016, 2014; Ladd, 2007; Mattern et al., 2018; Schulz et al., 2016; von Morze et al., 2007). In brief, TOF-MRA uses the high signal intensity caused by inflowing water protons in the blood to generate contrast, rather than an exogenous contrast agent. By adjusting the imaging parameters of a gradient-recalled echo (GRE) sequence, namely the repetition time (T_R) and flip angle, the signal from static tissue in the background can be suppressed, and high image intensities are only present in blood vessels freshly filled with non-saturated inflowing blood. As the blood flows through the vasculature within the imaging volume, its signal intensity slowly decreases. (For a comprehensive introduction to the principles of MRA, see for example Carr and Carroll (2012)). At ultra-high field, the increased signal-to-noise ratio (SNR), the longer T_1 relaxation times of blood and grey matter, and the potential for higher resolution are key benefits (von Morze et al., 2007)."

      Theory section:

      "Flow-related enhancement

      Before discussing the effects of vessel size, we briefly revisit the fundamental theory of the flow-related enhancement effect used in TOF-MRA. Taking into account the specific properties of pial arteries, we will then extend the classical description to this new regime. In general, TOF-MRA creates high signal intensities in arteries using inflowing blood as an endogenous contrast agent. The object magnetization—created through the interaction between the quantum mechanical spins of water protons and the magnetic field—provides the signal source (or magnetization) accessed via excitation with radiofrequency (RF) waves (called RF pulses) and the reception of ‘echo’ signals emitted by the sample around the same frequency. The T1-contrast in TOF-MRA is based on the difference in the steady-state magnetization of static tissue, which is continuously saturated by RF pulses during the imaging, and the increased or enhanced longitudinal magnetization of inflowing blood water spins, which have experienced no or few RF pulses. In other words, in TOF-MRA we see enhancement for blood that flows into the imaging volume."
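
      The contrast mechanism described in the passage above can be sketched numerically with the standard description of an ideally spoiled gradient-echo sequence, in which each RF pulse with flip angle α reduces the longitudinal magnetization by cos(α) and T1 recovery acts over each repetition time. This is a textbook approximation and not necessarily the formulation used in the manuscript; the T1, repetition time, and flip angle values below are assumptions chosen only for illustration.

```python
import math


def steady_state_mz(m0, tr_ms, t1_ms, flip_deg):
    """Steady-state longitudinal magnetization of static tissue (ideal spoiling)."""
    e1 = math.exp(-tr_ms / t1_ms)
    cos_a = math.cos(math.radians(flip_deg))
    return m0 * (1.0 - e1) / (1.0 - e1 * cos_a)


def mz_after_pulses(m0, tr_ms, t1_ms, flip_deg, n_pulses):
    """Longitudinal magnetization of initially fresh spins after n RF pulses."""
    e1 = math.exp(-tr_ms / t1_ms)
    cos_a = math.cos(math.radians(flip_deg))
    mz = m0  # inflowing blood starts fully relaxed
    for _ in range(n_pulses):
        # One pulse (factor cos_a) followed by T1 recovery over one TR.
        mz = mz * cos_a * e1 + m0 * (1.0 - e1)
    return mz


# Assumed illustrative parameters (hypothetical, not the study's protocol).
M0 = 1.0
TR_MS = 20.0
T1_BLOOD_MS = 2100.0
T1_TISSUE_MS = 1950.0
FLIP_DEG = 18.0

tissue = steady_state_mz(M0, TR_MS, T1_TISSUE_MS, FLIP_DEG)
for n in (0, 5, 20, 100):
    blood = mz_after_pulses(M0, TR_MS, T1_BLOOD_MS, FLIP_DEG, n)
    print(f"pulses={n:3d}  blood Mz={blood:.3f}  static tissue Mz={tissue:.3f}")
```

      Under these assumptions, blood that has experienced few pulses retains more longitudinal magnetization than the saturated background, and the difference shrinks as the number of pulses grows.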

      "Since the coverage or slab thickness in TOF-MRA is usually kept small to minimize blood delivery time by shortening the path-length of the vessel contained within the slab (Parker et al., 1991), and because we are focused here on the pial vasculature, we have limited our considerations to a maximum blood delivery time of 1000 ms, with values of few hundreds of milliseconds being more likely."

      2) The main discussion of higher resolution leading to improvements rather than loss presented here seems a bit one-sided: for a more objective understanding of the differences it would be worth to explicitly derive the 'classical' treatment and show how it leads to different conclusions than the present one. In particular, the link made in the discussion between using relative magnetization and modeling partial voluming seems unclear, as both are unrelated. One could also argue that in theory higher resolution imaging is always better, but of course there are practical considerations in play: SNR, dynamics of the measured effect vs speed of acquisition, motion, etc. These issues are not really integrated into the model, even though they provide strong constraints on what can be done. It would be good to at least discuss the constraints that 140 or 160 microns resolution imposes on what is achievable at present.

      Thank you for this excellent suggestion. We found it instructive to illustrate the different effects separately, i.e. relative vs. absolute FRE, and then partial volume vs. no-partial volume effects. In response to comment R2.8 of Reviewer 2, we also clarified the derivation of the relative FRE vs the ‘classical’ absolute FRE (please see R2.8). Accordingly, the manuscript now includes the theoretical derivation in the Theory section and an explicit demonstration of how the classical treatment leads to different conclusions in the Supplementary Material. The important insight gained in our work is that only when considering relative FRE and partial-volume effects together, can we conclude that smaller voxels are advantageous. We have added the following section in the Supplementary Material:

      "Effect of FRE Definition and Interaction with Partial-Volume Model

      For the definition of the FRE effect employed in this study, we used a measure of relative FRE (Al-Kwifi et al., 2002) in combination with a partial-volume model (Eq. 6). To illustrate the implications of these two effects, as well as their interaction, we have estimated the relative and absolute FRE for an artery with a diameter of 200 µm or 2 000 µm (i.e. no partial-volume effects at the centre of the vessel). The absolute FRE expression explicitly takes the voxel volume into account, and so instead of Eq. (6) for the relative FRE we used"

      Eq. (1)

      "Note that the division by M_zS^tissue⋅l_voxel^3 to obtain the relative FRE from this expression removes the contribution of the total voxel volume (l_voxel^3). Supplementary Figure 2 shows that, when partial volume effects are present, the highest relative FRE arises in voxels with the same size as or smaller than the vessel diameter (Supplementary Figure 2A), whereas the absolute FRE increases with voxel size (Supplementary Figure 2C). If no partial-volume effects are present, the relative FRE becomes independent of voxel size (Supplementary Figure 2B), whereas the absolute FRE increases with voxel size (Supplementary Figure 2D). While the partial-volume effects for the relative FRE are substantial, they are much more subtle when using the absolute FRE and do not alter the overall characteristics."

      Supplementary Figure 2: Effect of voxel size and blood delivery time on the flow-related enhancement (FRE) using either a relative (A,B) (Eq. (3)) or an absolute (C,D) (Eq. (12)) FRE definition assuming a pial artery diameter of 200 μm (A,C) or 2 000 µm (B,D), i.e. no partial-volume effects at the central voxel of this artery considered here.
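
      The opposite behaviour of the two definitions can be illustrated with a minimal sketch. It assumes a cylindrical vessel centred in a cubic voxel and running parallel to one voxel edge, a blood magnetization that is held fixed (i.e. the dependence on blood delivery time is ignored), and hypothetical magnetization values; the function names are ours and do not correspond to the code used in the study.

```python
import math


def blood_volume(d_vessel, l_voxel):
    """Blood volume in a cubic voxel (assumed geometry: centred, axis-aligned cylinder)."""
    # The vessel cross-section is capped at the voxel face. For voxel sizes between
    # d/sqrt(2) and d this slightly overestimates the overlap (simplification).
    cross_section = min(math.pi * d_vessel ** 2 / 4.0, l_voxel ** 2)
    return cross_section * l_voxel


def absolute_fre(m_blood, m_tissue, d_vessel, l_voxel):
    """Total voxel magnetization minus that of a voxel filled with tissue only."""
    v_blood = blood_volume(d_vessel, l_voxel)
    m_total = v_blood * m_blood + (l_voxel ** 3 - v_blood) * m_tissue
    return m_total - m_tissue * l_voxel ** 3


def relative_fre(m_blood, m_tissue, d_vessel, l_voxel):
    """Absolute FRE divided by the tissue magnetization of the whole voxel."""
    return absolute_fre(m_blood, m_tissue, d_vessel, l_voxel) / (m_tissue * l_voxel ** 3)


# Hypothetical magnetization values in arbitrary units (not taken from the study).
M_BLOOD, M_TISSUE = 1.0, 0.25

for d in (0.2, 2.0):  # vessel diameters in mm, as in Supplementary Figure 2
    for l in (0.3, 0.4, 0.5, 0.8):  # voxel sizes in mm
        rel = relative_fre(M_BLOOD, M_TISSUE, d, l)
        abs_ = absolute_fre(M_BLOOD, M_TISSUE, d, l)
        print(f"d={d} mm, l={l} mm: relative FRE={rel:.3f}, absolute FRE={abs_:.4f}")
```

      Under these assumptions, the relative FRE of the 0.2 mm vessel falls as the voxel grows, the relative FRE of the 2 mm vessel stays constant, and the absolute FRE rises with voxel size in both cases.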

      In addition, we have also clarified the contribution of the two definitions and their interaction in the Discussion section. Following the suggestion of Reviewer 2, we have extended our interpretation of relative FRE. In brief, absolute FRE is closely related to the physical origin of the contrast, whereas relative FRE is much more concerned with the “segmentability” of a vessel (please see R2.8 for more details):

      "Extending classical FRE treatments to the pial vasculature

      There are several major modifications in our approach to this topic that might explain why, in contrast to predictions from classical FRE treatments, it is indeed possible to image pial arteries. For instance, the definition of vessel contrast or flow-related enhancement is often stated as an absolute difference between blood and tissue signal (Brown et al., 2014a; Carr and Carroll, 2012; Du et al., 1993, 1996; Haacke et al., 1990; Venkatesan and Haacke, 1997). Here, however, we follow the approach of Al-Kwifi et al. (2002) and consider relative contrast. While this distinction may seem to be semantic, the effect of voxel volume on FRE for these two definitions is exactly opposite: Du et al. (1996) concluded that larger voxel size increases the (absolute) vessel-background contrast, whereas here we predict an increase in relative FRE for small arteries with decreasing voxel size. Therefore, predictions of the depiction of small arteries with decreasing voxel size differ depending on whether one is considering absolute contrast, i.e. difference in longitudinal magnetization, or relative contrast, i.e. contrast differences independent of total voxel size. Importantly, this prediction changes for large arteries where the voxel contains only vessel lumen, in which case the relative FRE remains constant across voxel sizes, but the absolute FRE increases with voxel size (Supplementary Figure 2). Overall, the interpretations of relative and absolute FRE differ, and one measure may be more appropriate for certain applications than the other. Absolute FRE describes the difference in magnetization and is thus tightly linked to the underlying physical mechanism. Relative FRE, however, describes the image contrast and segmentability. If blood and tissue magnetization are equal, both contrast measures would equal zero and indicate that no contrast difference is present. 
However, when there is signal in the vessel and as the tissue magnetization approaches zero, the absolute FRE approaches the blood magnetization (assuming no partial-volume effects), whereas the relative FRE approaches infinity. While this infinite relative FRE does not directly relate to the underlying physical process of ‘infinite’ signal enhancement through inflowing blood, it instead characterizes the segmentability of the image in that an image with zero intensity in the background and non-zero values in the structures of interest can be segmented perfectly and trivially. Accordingly, numerous empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007) and the data provided here (Figure 5, 6 and 7) have shown the benefit of smaller voxel sizes if the aim is to visualize and segment small arteries."

      Note that our formulation of the FRE, even without considering SNR, does not suggest that higher resolution is always better, but instead that the resolution should be matched to the size of the target arteries:

      "Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      Further, we have extended the concluding paragraph of the Imaging limitation section to also include a practical perspective:

      "In summary, numerous theoretical and practical considerations remain for optimal imaging of pial arteries using time-of-flight contrast. Depending on the application, advanced displacement artefact compensation strategies may be required, and zero-filling could provide better vessel depiction. Further, an optimal trade-off between SNR, voxel size and acquisition time needs to be found. Currently, the partial-volume FRE model only considers voxel size, and—as we reduced the voxel size in the experiments—we (partially) compensated the reduction in SNR through longer scan times. This, ultimately, also required the use of prospective motion correction to enable the very long acquisition times necessary for 140 µm isotropic voxel size. Often, anisotropic voxels are used to reduce acquisition time and increase SNR while maintaining in-plane resolution. This may indeed prove advantageous when the (also highly anisotropic) arteries align with the anisotropic acquisition, e.g. when imaging the large supplying arteries oriented mostly in the head-foot direction. In the case of pial arteries, however, there is not preferred orientation because of the convoluted nature of the pial arterial vasculature encapsulating the complex folding of the cortex (see section Anatomical architecture of the pial arterial vasculature). A further reduction in voxel size may be possible in dedicated research settings utilizing even longer acquisition times and/or larger acquisition volumes to maintain SNR. However, if acquisition time is limited, voxel size and SNR need to be carefully balanced against each other."

      3) The article seems to imply that TOF-MRA is the only adequate technique to image brain vasculature, while T2* mapping, UHF T1 mapping (see e.g. Choi et al., https://doi.org/10.1016/j.neuroimage.2020.117259) phase (e.g. Fan et al., doi:10.1038/jcbfm.2014.187), QSM (see e.g. Huck et al., https://doi.org/10.1007/s00429-019-01919-4), or a combination (Bernier et al., https://doi.org/10.1002/hbm.24337, Ward et al., https://doi.org/10.1016/j.neuroimage.2017.10.049) all depict some level of vascular detail. It would be worth quickly reviewing the different effects of blood on MRI contrast and how those have been used in different approaches to measure vasculature. This would in particular help clarify the experiment combining TOF with T2* mapping used to separate arteries from veins (more on this question below).

      We apologize if we inadvertently created the impression that TOF-MRA is a suitable technique to image the complete brain vasculature, and we agree that susceptibility-based methods are much more suitable for venous structures. As outlined above, we have revised the manuscript in various sections to indicate that it is the pial arterial vasculature we are targeting. We have added a statement on imaging the venous vasculature in the Discussion section. Please see our response below regarding the use of T2* to separate arteries and veins.

      "The advantages of imaging the pial arterial vasculature using TOF-MRA without an exogenous contrast agent lie in its non-invasiveness and the potential to combine these data with various other structural and functional image contrasts provided by MRI. One common application is to acquire a velocity-encoded contrast such as phase-contrast MRA (Arts et al., 2021; Bouvy et al., 2016). Another interesting approach utilises the inherent time-of-flight contrast in magnetization-prepared two rapid acquisition gradient echo (MP2RAGE) images acquired at ultra-high field that simultaneously acquires vasculature and structural data, albeit at lower achievable resolution and lower FRE compared to the TOF-MRA data in our study (Choi et al., 2020). In summary, we expect high-resolution TOF-MRA to be applicable also for group studies to address numerous questions regarding the relationship of arterial topology and morphometry to the anatomical and functional organization of the brain, and the influence of arterial topology and morphometry on brain hemodynamics in humans. In addition, imaging of the pial venous vasculature—using susceptibility-based contrasts such as T2-weighted magnitude (Gulban et al., 2021) or phase imaging (Fan et al., 2015), susceptibility-weighted imaging (SWI) (Eckstein et al., 2021; Reichenbach et al., 1997) or quantitative susceptibility mapping (QSM) (Bernier et al., 2018; Huck et al., 2019; Mattern et al., 2019; Ward et al., 2018)—would enable a comprehensive assessment of the complete cortical vasculature and how both arteries and veins shape brain hemodynamics.*"

      4) The results, while very impressive, are mostly qualitative. This seems a missed opportunity to strengthen the points of the paper: given the segmentations already made, the amount/density of detected vessels could be compared across scans for the data of Fig. 5 and 7. The minimum distance between vessels could be measured in Fig. 8 to show a 2D distribution and/or a spatial map of the displacement. The number of vessels labeled as veins instead of arteries in Fig. 9 could be given.

      We fully agree that estimating these quantitative measures would be very interesting; however, this would require the development of a comprehensive analysis framework, which would considerably shift the focus of this paper from data acquisition and flow-related enhancement to data analysis. As noted in the discussion section Challenges for vessel segmentation algorithms, ‘The vessel segmentations presented here were performed to illustrate the sensitivity of the image acquisition to small pial arteries’, because the smallest arteries tend to be concealed in the maximum intensity projections. Further, the interpretation of these measures is not straightforward. For example, the number of detected vessels for the artery depicted in Figure 5 does not change across resolutions, but their length does. We have therefore estimated the relative increase in skeleton length across resolutions for Figures 5 and 7. However, these estimates are not only a function of the voxel size but also of the underlying vasculature, i.e. the number of arteries with a certain diameter present, and may thus not generalise well to enable quantitative predictions of the improvement expected from increased resolutions. We have added an illustration of these analyses in the Supplementary Material, and the following additions in the Methods, Results and Discussion sections.

      "For vessel segmentation, a semi-automatic segmentation pipeline was implemented in Matlab R2020a (The MathWorks, Natick, MA) using the UniQC toolbox (Frässle et al., 2021): First, a brain mask was created through thresholding which was then manually corrected in ITK-SNAP (http://www.itksnap.org/) (Yushkevich et al., 2006) such that pial vessels were included. For the high-resolution TOF data (Figures 6 and 7, Supplementary Figure 4), denoising to remove high frequency noise was performed using the implementation of an adaptive non-local means denoising algorithm (Manjón et al., 2010) provided in DenoiseImage within the ANTs toolbox, with the search radius for the denoising set to 5 voxels and noise type set to Rician. Next, the brain mask was applied to the bias corrected and denoised data (if applicable). Then, a vessel mask was created based on a manually defined threshold, and clusters with less than 10 or 5 voxels for the high- and low-resolution acquisitions, respectively, were removed from the vessel mask. Finally, an iterative region-growing procedure starting at each voxel of the initial vessel mask was applied that successively included additional voxels into the vessel mask if they were connected to a voxel which was already included and above a manually defined threshold (which was slightly lower than the previous threshold). Both thresholds were applied globally but manually adjusted for each slab. No correction for motion between slabs was applied. The Matlab code describing the segmentation algorithm as well as the analysis of the two-echo TOF acquisition outlined in the following paragraph are also included in our github repository (https://gitlab.com/SaskiaB/pialvesseltof.git). To assess the data quality, maximum intensity projections (MIPs) were created and the outline of the segmentation MIPs were added as an overlay. 
To estimate the increased detection of vessels with higher resolutions, we computed the relative increase in the length of the segmented vessels for the data presented in Figure 5 (0.8 mm, 0.5 mm, 0.4 mm and 0.3 mm isotropic voxel size) and Figure 7 (0.16 mm and 0.14 mm isotropic voxel size) by computing the skeleton using the bwskel Matlab function and then calculating the skeleton length as the number of voxels in the skeleton multiplied by the voxel size."
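
      The thresholding, cluster removal and region-growing steps described above can be sketched as follows. This is an illustrative Python re-implementation under stated assumptions, not the Matlab code from the repository: the volume is a nested list, connectivity is restricted to the six face neighbours (the text does not specify the connectivity), and the function name `segment_vessels` and its parameters are hypothetical.

```python
from collections import deque

# Assumption: 6-connectivity (face neighbours only).
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def _neighbours(voxel, shape):
    """Yield the in-bounds face neighbours of a voxel coordinate."""
    for offset in OFFSETS:
        n = tuple(v + o for v, o in zip(voxel, offset))
        if all(0 <= c < s for c, s in zip(n, shape)):
            yield n


def _clusters(voxels, shape):
    """Split a set of voxel coordinates into connected clusters."""
    remaining = set(voxels)
    while remaining:
        seed = remaining.pop()
        cluster, queue = {seed}, deque([seed])
        while queue:
            for n in _neighbours(queue.popleft(), shape):
                if n in remaining:
                    remaining.remove(n)
                    cluster.add(n)
                    queue.append(n)
        yield cluster


def segment_vessels(volume, high, low, min_cluster_size):
    """Return the set of voxel coordinates in the final vessel mask."""
    shape = (len(volume), len(volume[0]), len(volume[0][0]))
    # Step 1: initial vessel mask from the higher threshold.
    initial = {(x, y, z)
               for x in range(shape[0])
               for y in range(shape[1])
               for z in range(shape[2])
               if volume[x][y][z] > high}
    # Step 2: discard clusters smaller than the minimum cluster size.
    mask = set()
    for cluster in _clusters(initial, shape):
        if len(cluster) >= min_cluster_size:
            mask |= cluster
    # Step 3: region growing; add connected voxels above the lower threshold.
    queue = deque(mask)
    while queue:
        for n in _neighbours(queue.popleft(), shape):
            if n not in mask and volume[n[0]][n[1]][n[2]] > low:
                mask.add(n)
                queue.append(n)
    return mask


# Toy 1 x 1 x 8 volume with hypothetical intensities.
toy = [[[0, 5, 9, 9, 6, 0, 9, 0]]]
vessel_mask = segment_vessels(toy, high=8, low=4, min_cluster_size=2)
print(sorted(vessel_mask))
```

      In the toy volume, the isolated bright voxel is removed as a too-small cluster, while the two dimmer voxels flanking the bright pair are added during region growing.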

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result. Figure 5 shows thin maximum intensity projections of a small vessel. While the vessel is not detectable at the largest voxel size, it slowly emerges as the voxel size decreases and approaches the vessel size. Presumably, this is driven by the considerable increase in FRE as seen in the single slice view (Figure 5, small inserts). Accordingly, the FRE computed from the vessel mask for the smallest part of the vessel (Figure 5, red mask) increases substantially with decreasing voxel size. More precisely, reducing the voxel size from 0.8 mm, 0.5 mm or 0.4 mm to 0.3 mm increases the FRE by 2900 %, 165 % and 85 %, respectively. Assuming a vessel diameter of 300 μm, the partial-volume FRE model (section Introducing a partial-volume model) would predict similar ratios of 611%, 178% and 78%. However, as long as the vessel is larger than the voxel (Figure 5, blue mask), the relative FRE does not change with resolution (see also Effect of FRE Definition and Interaction with Partial-Volume Model in the Supplementary Material). To illustrate the gain in sensitivity to detect smaller arteries, we have estimated the relative increase of the total length of the segmented vasculature (Supplementary Figure 9): reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 %, reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 %, and reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 %. 
In summary, when imaging small pial arteries, these data support the hypothesis that it is primarily the voxel size, not the blood delivery time, which determines whether vessels can be resolved."
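
      The predicted increases quoted above can be reproduced with a short sketch. It assumes that, for fixed blood and tissue magnetization, the relative FRE scales with the blood volume fraction of a cylindrical vessel centred in a cubic voxel and aligned with one voxel edge; the function names are illustrative only.

```python
import math


def blood_volume_fraction(d_vessel_mm, l_voxel_mm):
    """Fraction of a cubic voxel filled by a centred, axis-aligned cylinder (assumed geometry)."""
    # Capped at 1 for vessels that fill the voxel completely (simplification).
    return min(1.0, math.pi * d_vessel_mm ** 2 / (4.0 * l_voxel_mm ** 2))


def predicted_fre_increase_percent(d_vessel_mm, l_from_mm, l_to_mm):
    """Percentage increase in relative FRE when changing the voxel size."""
    ratio = (blood_volume_fraction(d_vessel_mm, l_to_mm)
             / blood_volume_fraction(d_vessel_mm, l_from_mm))
    return (ratio - 1.0) * 100.0


# Vessel diameter of 0.3 mm, target voxel size of 0.3 mm, as in the text.
for l_from in (0.8, 0.5, 0.4):
    increase = predicted_fre_increase_percent(0.3, l_from, 0.3)
    print(f"{l_from} mm -> 0.3 mm: {increase:.0f} %")
```

      Under these assumptions the sketch yields 611 %, 178 % and 78 %, matching the values predicted by the partial-volume FRE model in the text.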

      "Indeed, the reduction in voxel volume by 33 % revealed additional small branches connected to larger arteries (see also Supplementary Figure 8). For this example, we found an overall increase in skeleton length of 14 % (see also Supplementary Figure 9)."

      "We therefore expect this strategy to enable an efficient image acquisition without the need for additional venous suppression RF pulses. Once these challenges for vessel segmentation algorithms are addressed, a thorough quantification of the arterial vasculature can be performed. For example, the skeletonization procedure used to estimate the increase of the total length of the segmented vasculature (Supplementary Figure 9) exhibits errors particularly in the unwanted sinuses and large veins. While they are consistently present across voxel sizes, and thus may have less impact on relative change in skeleton length, they need to be addressed when estimating the absolute length of the vasculature, or other higher-order features such as number of new branches. (Note that we have also performed the skeletonization procedure on the maximum intensity projections to reduce the number of artefacts and obtained comparable results: reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 % (3D) vs 37 % (2D), reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 % (3D) vs 26 % (2D), reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 % (3D) vs 16 % (2D), and reducing the voxel size from 0.16 mm to 0.14 mm isotropic increases the skeleton length by 14 % (3D) vs 24 % (2D).)"

      Supplementary Figure 9: Increase of vessel skeleton length with voxel size reduction. Axial maximum intensity projections for data acquired with different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic (TOP) (corresponding to Figure 5) and from 0.16 mm to 0.14 mm isotropic (BOTTOM) (corresponding to Figure 7) are shown. Vessel skeletons derived from segmentations performed for each resolution are overlaid in red. A reduction in voxel size is accompanied by a corresponding increase in vessel skeleton length.
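
      The skeleton-length measure (number of skeleton voxels multiplied by the voxel size) and its relative increase can be sketched as below. The skeleton voxel counts are hypothetical and serve only to illustrate the arithmetic; the actual skeletons were obtained with Matlab's bwskel. Note that counting voxels treats diagonal steps like straight ones, so the result is an approximation of the true path length.

```python
def skeleton_length_mm(n_skeleton_voxels, voxel_size_mm):
    """Skeleton length as the number of skeleton voxels times the voxel size."""
    return n_skeleton_voxels * voxel_size_mm


def relative_increase_percent(old_value, new_value):
    """Percentage increase from old_value to new_value."""
    return (new_value - old_value) / old_value * 100.0


def voxel_volume_reduction_percent(old_size_mm, new_size_mm):
    """Percentage reduction in isotropic voxel volume."""
    return (1.0 - (new_size_mm / old_size_mm) ** 3) * 100.0


# Hypothetical skeleton voxel counts, for illustration only (not from the study).
length_016 = skeleton_length_mm(1000, 0.16)
length_014 = skeleton_length_mm(1300, 0.14)

print(f"Skeleton length increase: {relative_increase_percent(length_016, length_014):.2f} %")
print(f"Voxel volume reduction: {voxel_volume_reduction_percent(0.16, 0.14):.0f} %")
```

      The second printed value reproduces the 33 % reduction in voxel volume stated for the change from 0.16 mm to 0.14 mm isotropic voxels.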

      Regarding further quantification of the vessel displacement presented in Figure 8, we have estimated the displacement using the Horn-Schunck optical flow estimator (Horn and Schunck, 1981; Mustafa, 2016) (https://github.com/Mustafa3946/Horn-Schunck-3D-Optical-Flow). However, the results are dominated by the larger arteries, whereas we are mostly interested in the displacement of the smallest arteries, so this quantification may not be helpful.

      Because the theoretical relationship between vessel displacement and blood velocity is well known (Eq. 7), and we have also outlined the expected blood velocity as a function of arterial diameter in Figure 2, which provided estimates of displacements that matched what was found in our data (as reported in our original submission), we believe that the new quantification in this form does not add value to the manuscript. What would be interesting would be to explore the use of this displacement artefact as a measure of blood velocities. This, however, would require more substantial analyses in particular for estimation of the arterial diameter and additional validation data (e.g. phase-contrast MRA). We have outlined this avenue in the Discussion section. What is relevant to the main aim of this study, namely imaging of small pial arteries, is the insight that blood velocities are indeed sufficiently fast to cause displacement artefacts even in smaller arteries. We have clarified this in the Results section:

      "Note that correction techniques exist to remove displaced vessels from the image (Gulban et al., 2021), but they cannot revert the vessels to their original location. Alternatively, this artefact could also potentially be utilised as a rough measure of blood velocity."

      "At a delay time of 10 ms between phase encoding and echo time, the observed displacement of approximately 2 mm in some of the larger vessels would correspond to a blood velocity of 200 mm/s, which is well within the expected range (Figure 2). For the smallest arteries, a displacement of one voxel (0.4 mm) can be observed, indicative of blood velocities of 40 mm/s. Note that the vessel displacement can be observed in all vessels visible at this resolution, indicating high blood velocities throughout much of the pial arterial vasculature. Thus, assuming a blood velocity of 40 mm/s (Figure 2) and a delay time of 5 ms for the high-resolution acquisitions (Figure 6), vessel displacements of 0.2 mm are possible, representing a shift of 1–2 voxels."

      Regarding the number of vessels labelled as veins, please see our response below to R1.5.

      In the main quantification given, the estimation of FRE increase with resolution, it would make more sense to perform the segmentation independently for each scan and estimate the corresponding FRE: using the mask from the highest resolution scan only biases the results. It is unclear also if the background tissue measurement one voxel outside took partial voluming into account (by leaving a one voxel free interface between vessel and background). In this analysis, it would also be interesting to estimate SNR, so you can compare SNR and FRE across resolutions, also helpful for the discussion on SNR.

      The FRE serves as an indicator of the potential performance of any segmentation algorithm (including manual segmentation) (also see our discussion on the interpretation of FRE in our response to R1.2). If we were to segment each scan individually, we would, in the ideal case, always obtain the same FRE estimate, as FRE influences the performance of the segmentation algorithm. In practice, this simply means that it is not possible to segment the vessel in the low-resolution image to its full extent that is visible in the high-resolution image, because the FRE is too low for small vessels. However, we agree with the core point that the reviewer is making, and so to help address this, a valuable addition would be to compare the FRE for the section of a vessel that is visible at all resolutions, where we found—within the accuracy of the transformations and resampling across such vastly different resolutions—that the FRE does not increase any further with higher resolution if the vessel is larger than the voxel size (page 18 and Figure 5). As stated in the Methods section, and as noted by the reviewer, we used the voxels immediately next to the vessel mask to define the background tissue signal level. Any resulting potential partial-volume effects in these background voxels would affect all voxel sizes, introducing a consistent bias that would not impact our comparison. However, inspection of the image data in Figure 5 showed partial-volume effects predominantly within those voxels intersecting the vessel, rather than voxels surrounding the vessel, in agreement with our model of FRE.

      "All imaging data were slab-wise bias-field corrected using the N4BiasFieldCorrection (Tustison et al., 2010) tool in ANTs (Avants et al., 2009) with the default parameters. To compare the empirical FRE across the four different resolutions (Figure 5), manual masks were first created for the smallest part of the vessel in the image with the highest resolution and for the largest part of the vessel in the image with the lowest resolution. Then, rigid-body transformation parameters from the low-resolution to the high-resolution (and the high-resolution to the low-resolution) images were estimated using coregister in SPM (https://www.fil.ion.ucl.ac.uk/spm/), and their inverse was applied to the vessel mask using SPM’s reslice. To calculate the empirical FRE (Eq. (3)), the mean of the intensity values within the vessel mask was used to approximate the blood magnetization, and the mean of the intensity values one voxel outside of the vessel mask was used as the tissue magnetization."

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result. Figure 5 shows thin maximum intensity projections of a small vessel. While the vessel is not detectable at the largest voxel size, it slowly emerges as the voxel size decreases and approaches the vessel size. Presumably, this is driven by the considerable increase in FRE as seen in the single slice view (Figure 5, small inserts). Accordingly, the FRE computed from the vessel mask for the smallest part of the vessel (Figure 5, red mask) increases substantially with decreasing voxel size. More precisely, reducing the voxel size from 0.8 mm, 0.5 mm or 0.4 mm to 0.3 mm increases the FRE by 2900 %, 165 % and 85 %, respectively. Assuming a vessel diameter of 300 μm, the partial-volume FRE model (section Introducing a partial-volume model) would predict similar ratios of 611%, 178% and 78%. However, if the vessel is larger than the voxel (Figure 5, blue mask), the relative FRE remains constant across resolutions (see also Effect of FRE Definition and Interaction with Partial-Volume Model in the Supplementary Material). To illustrate the gain in sensitivity to smaller arteries, we have estimated the relative increase of the total length of the segmented vasculature (Supplementary Figure 9): reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 %, reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 %, and reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 %. 
In summary, when imaging small pial arteries, these data support the hypothesis that it is primarily the voxel size, not blood delivery time, which determines whether vessels can be resolved."

      Figure 5: Effect of voxel size on flow-related vessel enhancement. Thin axial maximum intensity projections containing a small artery acquired with different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic are shown. The FRE is estimated using the mean intensity value within the vessel masks depicted on the left, and the mean intensity values of the surrounding tissue. The small insert shows a section of the artery as it lies within a single slice. A reduction in voxel size is accompanied by a corresponding increase in FRE (red mask), whereas no further increase is obtained once the voxel size is equal to or smaller than the vessel size (blue mask).

      After many internal discussions, we had to conclude that deducing a meaningful SNR analysis that would benefit the reader was not possible given the available data due to the complex relationship between voxel size and other imaging parameters in practice. In detail, we have reduced the voxel size but at the same time increased the acquisition time by increasing the number of encoding steps, which we have now also highlighted in the manuscript. We have, however, added additional considerations about balancing SNR and segmentation performance. Note that these considerations are not specific to imaging the pial arteries but apply to all MRA acquisitions, and have thus been discussed previously in the literature. Here, we wanted to focus on the novel insights gained in our study. Importantly, while we previously noted that reducing voxel size improves contrast in vessels whose diameters are smaller than the voxel size, we now explicitly acknowledge that, for vessels whose diameters are larger than the voxel size, reducing the voxel size is not helpful (since it only reduces SNR without any gain in contrast) and may hinder segmentation performance, and thus become counterproductive.

      "In general, we have not considered SNR, but only FRE, i.e. the (relative) image contrast, assuming that segmentation algorithms would benefit from higher contrast for smaller arteries. Importantly, the acquisition parameters available to maximize FRE are limited, namely repetition time, flip angle and voxel size. SNR, however, can be improved via numerous avenues independent of these parameters (Brown et al., 2014b; Du et al., 1996; Heverhagen et al., 2008; Parker et al., 1991; Triantafyllou et al., 2011; Venkatesan and Haacke, 1997), the simplest being longer acquisition times. If the aim is to optimize a segmentation outcome for a given acquisition time, the trade-off between contrast and SNR for the specific segmentation algorithm needs to be determined (Klepaczko et al., 2016; Lesage et al., 2009; Moccia et al., 2018; Phellan and Forkert, 2017). Our own—albeit limited—experience has shown that segmentation algorithms (including manual segmentation) can accommodate a perhaps surprising amount of noise using prior knowledge and neighborhood information, making these high-resolution acquisitions possible. Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      5) The separation of arterial and venous components is a bit puzzling, partly because the methodology used is not fully explained, but also partly because the reasons invoked (flow artefact in large pial veins) do not match the results (many small vessels are included as veins). This question of separating both types of vessels is quite important for applications, so the whole procedure should be explained in detail. The use of short T2* seemed also sub-optimal, as both arteries and veins result in shorter T2* compared to most brain tissues: wouldn't a susceptibility-based measure (SWI or better QSM) provide a better separation? Finally, since the T2* map and the regular TOF map are at different resolutions, masking out the vessels labeled as veins will likely result in the smaller veins being left out.

      We agree that while the technical details of this approach were provided in the Data analysis section, the rationale behind it was only briefly mentioned. We have therefore included an additional section, Inflow-artefacts in sinuses and pial veins, in the Theory section of the manuscript. We have also extended the discussion of the advantages and disadvantages of the different susceptibility-based contrasts, namely T2*, SWI and QSM. While in theory both T2* and QSM should allow the reliable differentiation of arterial and venous blood, we found T2* to perform more robustly. QSM can fail in many places, e.g. due to the strong susceptibility sources within the superior sagittal and transverse sinuses and pial veins and their proximity to the brain surface, so that dedicated processing is required (Stewart et al., 2022). Further, we have also elaborated in the Discussion section why the interpretation of Figure 9 regarding the absence or presence of small veins is challenging. Namely, the intensity-based segmentation used here provides only an incomplete segmentation even of the larger sinuses, because the overall lower intensity found in veins, combined with the heterogeneity of the intensities in veins, violates the assumption made by most vascular segmentation approaches of homogeneous, high image intensities within vessels, which is satisfied in arteries (page 29f) (see also the illustration below). Accordingly, quantifying the number of vessels labelled as veins (R1.4a) would provide misleading results, as often only small subsets of the same sinus or vein are segmented.

      "Inflow-artefacts in sinuses and pial veins

      Inflow in large pial veins and the sagittal and transverse sinuses can cause flow-related enhancement in these non-arterial vessels. One common strategy to remove this unwanted signal enhancement is to apply venous suppression pulses during the data acquisition, which saturate blood spins outside the imaging slab. Disadvantages of this technique are the technical challenges of applying these pulses at ultra-high field due to constraints of the specific absorption rate (SAR) and the necessary increase in acquisition time (Conolly et al., 1988; Heverhagen et al., 2008; Johst et al., 2012; Maderwald et al., 2008; Schmitter et al., 2012; Zhang et al., 2015). In addition, optimal positioning of the saturation slab in the case of pial arteries requires further investigation, and in particular suppressing signal from the superior sagittal sinus without interfering with the imaging of the pial arterial vasculature at the top of the cortex might prove challenging. Furthermore, this venous saturation strategy is based on the assumption that arterial blood is traveling head-wards while venous blood is drained foot-wards. For the complex and convoluted trajectory of pial vessels this directionality-based saturation might be oversimplified, particularly when considering the higher-order branches of the pial arteries and veins on the cortical surface. Inspired by techniques to simultaneously acquire a TOF image for angiography and a susceptibility-weighted image for venography (Bae et al., 2010; Deistung et al., 2009; Du et al., 1994; Du and Jin, 2008), we set out to explore the possibility of removing unwanted venous structures from the segmentation of the pial arterial vasculature during data postprocessing.
Because arteries filled with oxygenated blood have T2*-values similar to tissue, while veins have much shorter T2*-values due to the presence of deoxygenated blood (Pauling and Coryell, 1936; Peters et al., 2007; Uludağ et al., 2009; Zhao et al., 2007), we used this criterion to remove vessels with short T2* values from the segmentation (see Data Analysis for details). In addition, we also explored whether unwanted venous structures in the high-resolution TOF images (where a two-echo acquisition is not feasible due to the longer readout) can be removed based on detecting them in a lower-resolution image."

      "Removal of pial veins

      Inflow in large pial veins and the superior sagittal and transverse sinuses can cause a flow-related enhancement in these non-arterial vessels (Figure 9, left). The higher concentration of deoxygenated haemoglobin in these vessels leads to shorter T2* values (Pauling and Coryell, 1936), which can be estimated using a two-echo TOF acquisition (see also Inflow-artefacts in sinuses and pial veins). These vessels can be identified in the segmentation based on their T2* values (Figure 9, left), and removed from the angiogram (Figure 9, right) (Bae et al., 2010; Deistung et al., 2009; Du et al., 1994; Du and Jin, 2008). In particular, the superior and inferior sagittal and the transverse sinuses and large veins which exhibited an inhomogeneous intensity profile and a steep loss of intensity at the slab boundary were identified as non-arterial (Figure 9, left). Further, we also explored the option of removing unwanted venous vessels from the high-resolution TOF image (Figure 7) using a low-resolution two-echo TOF (not shown). This indeed allowed us to remove the strong signal enhancement in the sagittal sinuses and numerous larger veins, although some small veins, which are characterised by inhomogeneous intensity profiles and can be detected visually by experienced raters, remain."
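      The T2*-based labelling described in the passage above can be illustrated with a minimal sketch. It assumes a mono-exponential signal decay between the two echoes, which is a common simplification and not necessarily the exact procedure in the authors' Matlab code. The function names, echo times and threshold are hypothetical placeholders chosen for illustration only; they are not values from the study.

```python
import math


def t2star_from_two_echoes(s1, s2, te1_ms, te2_ms):
    """Estimate T2* (ms) from magnitudes s1, s2 at echo times te1_ms < te2_ms.

    Assumes mono-exponential decay: s2 = s1 * exp(-(te2 - te1) / T2*).
    """
    if s1 <= 0 or s2 <= 0 or s2 >= s1:
        # No measurable decay (or invalid data): treat as very long T2*.
        return math.inf
    return (te2_ms - te1_ms) / math.log(s1 / s2)


def label_vessel(s1, s2, te1_ms, te2_ms, t2star_threshold_ms):
    """Label a segmented voxel as 'vein' if its T2* falls below the threshold.

    The threshold is a hypothetical, user-chosen value.
    """
    t2star = t2star_from_two_echoes(s1, s2, te1_ms, te2_ms)
    return "vein" if t2star < t2star_threshold_ms else "artery"
```

      In this sketch a voxel without measurable decay is kept as arterial, which errs on the side of not removing arteries from the angiogram.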

      Figure 9: Removal of non-arterial vessels in time-of-flight imaging. LEFT: Segmentation of arteries (red) and veins (blue) using T2* estimates. RIGHT: Time-of-flight angiogram after vein removal.

      Our approach also assumes that the unwanted veins are large enough that they are also resolved in the low-resolution image. If we consider the source of the FRE effect, it might indeed be exclusively large veins that are present in TOF-MRA data, which would suggest that our assumption is valid. Fundamentally, the FRE depends on the inflow of un-saturated spins into the imaging slab. However, small veins drain capillary beds in the local tissue, i.e. the tissue within the slab. (Note that due to the slice oversampling implemented in our acquisition, spins just above or below the slab will also be excited.) Thus, small veins only contain blood water spins that have experienced a large number of RF pulses due to the long transit time through the pial arterial vasculature, the capillaries and the intracortical venules. Hence, their longitudinal magnetization would be similar to that of stationary tissue. To generate an FRE effect in veins, “pass-through” venous blood from outside the imaging slab is required. This is only available in veins that are passing through the imaging slab, which have much larger diameters. These theoretical considerations are corroborated by the findings in Figure 9, where large disconnected vessels with varying intensity profiles were identified as non-arterial. Due to the heterogeneous intensity profiles in large veins and the sagittal and transverse sinuses, the intensity-based segmentation applied here may only label a subset of the vessel lumen, creating the impression of many small veins. This is particularly the case for the straight and inferior sagittal sinus in the bottom slab of Figure 9.
Nevertheless, future studies potentially combining anatomical prior knowledge, advanced segmentation algorithms and susceptibility measures may be capable of removing these unwanted veins in post-processing to enable an efficient TOF-MRA image acquisition dedicated to optimally detecting small arteries without the need for additional venous suppression RF pulses.

      6) A more general question also is why this imaging method is limited to pial vessels: at 140 microns, the larger intra-cortical vessels should be appearing (group 6 in Duvernoy, 1981: diameters between 50 and 240 microns). Are there other reasons these vessels are not detected? Similarly, it seems there is no arterial vasculature detected in the white matter here: is it due to the rather superior location of the imaging slab, or a limitation of the method? Likewise, all three results focus on a rather homogeneous region of cerebral cortex, in terms of vascularisation. It would be interesting for applications to demonstrate the capabilities of the method in more complex regions, e.g. the densely vascularised cerebellum, or more heterogeneous regions like the midbrain. Finally, it is notable that all three subjects appear to have rather different densities of vessels, from sparse (participant II) to dense (participant I), with some inhomogeneities in density (frontal region in participant III) and inconsistencies in detection (sinuses absent in participant II). All these points should be discussed.

      While we are aware that the diameter of intracortical arteries has been suggested to be up to 240 µm (Duvernoy et al., 1981), it remains unclear how prevalent intracortical arteries of this size are. For example, note that in a different context in the Duvernoy study (in the revised manuscript), the following values are mentioned (which we followed in Figure 1):

      “Central arteries of the lobule always have a large diameter of 260 µ to 280 µ, at their origin. Peripheral arteries have an average diameter of 150 µ to 180 µ. At the cortex surface, all arterioles of 50 µ or less, penetrate the cortex or form anastomoses. The diameter of most of these penetrating arteries is approximately 40 µ.”

      Further, the examinations by Hirsch et al. (2012) (albeit in the macaque brain), showed one (exemplary) intracortical artery belonging to group 6 (Figure 1B), whose diameter appears to be below 100 µm. Given these discrepancies and the fact that intracortical arteries in group 5 only reach 75 µm, we suspect that intracortical arteries with diameters > 140 µm are a very rare occurrence, which we might not have encountered in this data set.

      Similarly, arteries in white matter (Nonaka et al., 2003) and the cerebellum (Duvernoy et al., 1983) are beyond our resolution at the moment. The midbrain is an interesting suggestion, although we believe that the cortical areas chosen here, with their gradual reduction in diameter along the vascular tree, provide a better illustration of the effect of voxel size than the rather abrupt reduction in vascular diameter found in the midbrain. We have added the even higher resolution requirements in the discussion section:

      "In summary, we expect high-resolution TOF-MRA to be applicable also for group studies, to address numerous questions regarding the relationship of arterial topology and morphometry to the anatomical and functional organization of the brain, and the influence of arterial topology and morphometry on brain hemodynamics in humans. Notably, we have focused on imaging pial arteries of the human cerebrum; however, other brain structures such as the cerebellum, subcortex and white matter are of course also of interest. While the same theoretical considerations apply, imaging the arterial vasculature in these structures will require even smaller voxel sizes due to their smaller arterial diameters (Duvernoy et al., 1983, 1981; Nonaka et al., 2003)."

      Regarding the apparent sparsity of results from participant II, this is mostly driven by the much smaller coverage in this subject (19.6 mm in Participant II vs. 50 mm and 58 mm in Participant I and III, respectively). The reduction in density in the frontal regions might indeed constitute a difference in anatomy or might be driven by the presence of more false-positive veins in Participant I than in Participant III in these areas. Following the depiction in Duvernoy et al. (1981), one would not expect large arteries in frontal areas, but large veins are common. Thus, the additional vessels in Participant I in the frontal areas might well be false-positive veins, and their removal would result in similar densities for both participants. Indeed, as pointed out in section Future directions, we would expect a lower arterial density in frontal and posterior areas than in middle areas. The sinuses (and other large false-positive veins) in Participant II have been removed as outlined and discussed in sections Removal of pial veins and Challenges for vessel segmentation algorithms, respectively.

      7) One of the main practical limitations of the proposed method is the use of a very small imaging slab. It is mentioned in the discussion that thicker slabs are not only possible, but beneficial both in terms of SNR and acceleration possibilities. What are the limitations that prevented their use in the present study? With the current approach, what would be the estimated time needed to acquire the vascular map of an entire brain? It would also be good to indicate whether specific processing was needed to stitch together the multiple slab images in Fig. 6-9, S2.

      Time-of-flight acquisitions are commonly performed with thin acquisition slabs, following initial investigations by Parker et al. (1991) to maximise vessel sensitivity and minimize noise. We therefore followed this practice for our initial investigations but wanted to point out in the discussion that thicker slabs might provide several advantages that need to be evaluated in future studies. This would include theoretical and empirical evaluations balancing SNR gains from larger excitation volumes and SNR losses due to more acceleration. For this study, we have chosen the slab thickness so as to keep the acquisition time reasonable and thereby minimize motion artefacts (as outlined in the Discussion). In addition, due to the extreme matrix sizes, in particular for the 0.14 mm acquisition, we were also limited in the number of data points per image that can be indexed. Raising this limit would require even more substantial changes to the sequence than what we have already performed. With 16 slabs, assuming optimal FOV orientation, full-brain coverage including the cerebellum of 95 % of the population (Mennes et al., 2014) could be achieved with an acquisition time of 16 × 11 min 42 s = 3 h 7 min 12 s at 0.16 mm isotropic voxel size. No stitching of the individual slabs was performed, as subject motion was minimal. We have added a corresponding comment in the Data Analysis.
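      The whole-brain acquisition time quoted above is a simple product of slab count and per-slab duration. The short sketch below reproduces that arithmetic; the function names are hypothetical, and the only inputs are the 16 slabs and 11 min 42 s per slab stated in the response.

```python
def total_acquisition_seconds(n_slabs, minutes_per_slab, seconds_per_slab):
    """Total scan time in seconds for n_slabs acquired back to back."""
    return n_slabs * (minutes_per_slab * 60 + seconds_per_slab)


def format_hms(total_seconds):
    """Format a duration in seconds as 'H h M min S s'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours} h {minutes} min {seconds} s"


# 16 slabs at 11 min 42 s each, as stated in the response.
print(format_hms(total_acquisition_seconds(16, 11, 42)))  # 3 h 7 min 12 s
```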

      "Both thresholds were applied globally but manually adjusted for each slab. No correction for motion between slabs was applied as subject motion was minimal. The Matlab code describing the segmentation algorithm as well as the analysis of the two-echo TOF acquisition outlined in the following paragraph are also included in the GitLab repository (https://gitlab.com/SaskiaB/pialvesseltof.git)."

      8) Some researchers and clinicians will argue that you can attain best results with anisotropic voxels, combining higher SNR and higher resolution. It would be good to briefly mention why isotropic voxels are preferred here, and whether anisotropic voxels would make sense at all in this context.

      Anisotropic voxels can be advantageous if the underlying object is anisotropic, e.g. an artery running straight through the slab, which would have a certain diameter (imaged using the high-resolution plane) and an ‘infinite’ elongation (in the low-resolution direction). However, the vessels targeted here can have any orientation and curvature; an anisotropic acquisition could therefore introduce a bias favouring vessels with a particular orientation relative to the voxel grid. Note that the same argument applies when answering the question of why a further reduction in slab thickness would eventually result in a smaller increase in FRE (section Introducing a partial-volume model). We have added a corresponding comment in our discussion on practical imaging considerations:

      "In summary, numerous theoretical and practical considerations remain for optimal imaging of pial arteries using time-of-flight contrast. Depending on the application, advanced displacement artefact compensation strategies may be required, and zero-filling could provide better vessel depiction. Further, an optimal trade-off between SNR, voxel size and acquisition time needs to be found. Currently, the partial-volume FRE model only considers voxel size, and, as we reduced the voxel size in the experiments, we (partially) compensated the reduction in SNR through longer scan times. This, ultimately, also required the use of prospective motion correction to enable the very long acquisition times necessary for 140 µm isotropic voxel size. Often, anisotropic voxels are used to reduce acquisition time and increase SNR while maintaining in-plane resolution. This may indeed prove advantageous when the (also highly anisotropic) arteries align with the anisotropic acquisition, e.g. when imaging the large supplying arteries oriented mostly in the head-foot direction. In the case of pial arteries, however, there is no preferred orientation because of the convoluted nature of the pial arterial vasculature encapsulating the complex folding of the cortex (see section Anatomical architecture of the pial arterial vasculature). A further reduction in voxel size may be possible in dedicated research settings utilizing even longer acquisition times and a larger field-of-view to maintain SNR. However, if acquisition time is limited, voxel size and SNR need to be carefully balanced against each other."

      Reviewer #2 (Public Review):

      Overview

      This paper explores the use of inflow contrast MRI for imaging the pial arteries. The paper begins by providing a thorough background description of pial arteries, including past studies investigating the velocity and diameter. Following this, the authors consider this information to optimize the contrast between pial arteries and background tissue. This analysis reveals spatial resolution to be a strong factor influencing the contrast of the pial arteries. Finally, experiments are performed on a 7T MRI to investigate: the effect of spatial resolution by acquiring images at multiple resolutions, demonstrate the feasibility of acquiring ultrahigh resolution 3D TOF, the effect of displacement artifacts, and the prospect of using T2* to remove venous voxels.

      Impression

      There is certainly interest in tools to improve our understanding of the architecture of the small vessels of the brain and this work does address this. The background description of the pial arteries is very complete and the manuscript is very well prepared. The images are also extremely impressive, likely benefiting from motion correction, 7T, and a very long scan time. The authors also commit to open science and provide the data in an open platform. Given this, I do feel the manuscript to be of value to the community; however, there are concerns with the methods for optimization, the qualitative nature of the experiments, and conclusions drawn from some of the experiments.

      Specific Comments :

      1) Figure 3 and Theory surrounding. The optimization shown in Figure 3 is based on fixing the flip angle or the TR. As is well described in the literature, there is a strong interdependency of flip angle and TR. This is all well described in literature dating back to the early 90s. While I think it reasonable to consider these effects in optimization, the language needs to include this interdependency or simply reference past work and specify how the flip angle was chosen. The human experiments do not include any investigation of flip angle or TR optimization.

      We thank the reviewer for raising this valuable point, and we fully agree that there is an interdependency between these two parameters. To simplify our optimization, we did fix one parameter value at a time, but in the revised manuscript we clarified that both parameters can be optimized simultaneously. Importantly, a large range of parameter values will result in a similar FRE in the small artery regime, which is illustrated in the optimization provided in the main text. We have therefore chosen the repetition time based on encoding efficiency and then set a corresponding excitation flip angle. In addition, we have also provided additional simulations in the supplementary material outlining the interdependency for the case of pial arteries.

      "Optimization of repetition time and excitation flip angle

      As the main goal of the optimisation here was to start within an already established parameter range for TOF imaging at ultra-high field (Kang et al., 2010; Stamm et al., 2013; von Morze et al., 2007), we only needed to then further tailor these for small arteries by considering a third parameter, namely the blood delivery time. From a practical perspective, a TR of 20 ms as a reference point was favourable, as it offered a time-efficient readout minimizing wait times between excitations but allowing low encoding bandwidths to maximize SNR. Due to the interdependency of flip angle and repetition time, for any one blood delivery time any FRE could (in theory) be achieved. For example, a similar FRE curve at 18 ° flip angle and 5 ms TR can also be achieved at 28 ° flip angle and 20 ms TR; or the FRE curve at 18 ° flip angle and 30 ms TR is comparable to the FRE curve at 8 ° flip angle and 5 ms TR (Supplementary Figure 3 TOP). In addition, the difference between optimal parameter settings diminishes for long blood delivery times, such that at a blood delivery time of 500 ms (Supplementary Figure 3 BOTTOM), the optimal flip angle at a TR of 15 ms, 20 ms or 25 ms would be 14 °, 16 ° and 18 °, respectively. This is in contrast to a blood delivery time of 100 ms, where the optimal flip angles would be 32 °, 37 ° and 41 °. In conclusion, in the regime of small arteries, long TR values in combination with low flip angles ensure flow-related enhancement at blood delivery times of 200 ms and above, and within this regime there are marginal gains by further optimizing parameter values and the optimal values are all similar."
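      The interdependency of flip angle and repetition time described above can be explored with a minimal sketch of the textbook spoiled gradient-echo model. This is an assumption-laden illustration and not the authors' equations (1) to (3): it assumes plug flow, fully relaxed blood entering the slab, equal equilibrium magnetization for blood and tissue, and no T2* decay. The default T1 values and all function names are hypothetical and chosen only for illustration.

```python
import math


def steady_state_signal(tr_ms, flip_deg, t1_ms, m0=1.0):
    """Steady-state spoiled gradient-echo signal of stationary spins."""
    theta = math.radians(flip_deg)
    e1 = math.exp(-tr_ms / t1_ms)
    return m0 * (1 - e1) * math.sin(theta) / (1 - math.cos(theta) * e1)


def blood_signal(n_pulses, tr_ms, flip_deg, t1_ms, m0=1.0):
    """Signal at the n-th RF pulse for spins that entered fully relaxed."""
    theta = math.radians(flip_deg)
    e1 = math.exp(-tr_ms / t1_ms)
    m_ss = m0 * (1 - e1) / (1 - math.cos(theta) * e1)
    # Longitudinal magnetization approaches steady state geometrically.
    mz = m_ss + (math.cos(theta) * e1) ** (n_pulses - 1) * (m0 - m_ss)
    return mz * math.sin(theta)


def fre(delivery_ms, tr_ms, flip_deg, t1_blood_ms=2100.0, t1_tissue_ms=1950.0):
    """Relative flow-related enhancement of blood over stationary tissue.

    The T1 defaults are assumed illustrative values, not taken from the text.
    """
    n_pulses = max(1, round(delivery_ms / tr_ms))  # pulses seen during delivery
    s_blood = blood_signal(n_pulses, tr_ms, flip_deg, t1_blood_ms)
    s_tissue = steady_state_signal(tr_ms, flip_deg, t1_tissue_ms)
    return (s_blood - s_tissue) / s_tissue


def ernst_angle_deg(tr_ms, t1_ms):
    """Flip angle maximizing the steady-state signal of stationary spins."""
    return math.degrees(math.acos(math.exp(-tr_ms / t1_ms)))
```

      Evaluating `fre` over a grid of flip angles and repetition times for a fixed blood delivery time is one way to visualize the kind of trade-off discussed in the quoted passage, under the stated simplifications.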

      Supplementary Figure 3: Optimal imaging parameters for small arteries. This assessment follows the simulations presented in Figure 3, but in addition shows the interdependency for the corresponding third parameter (either flip angle or repetition time). TOP: Flip angles close to the Ernst angle show only a marginal flow-related enhancement; however, the influence of the blood delivery time decreases further (LEFT). As the flip angle increases well above the values used in this study, the flow-related enhancement in the small artery regime remains low even for the longer repetition times considered here (RIGHT). BOTTOM: The optimal excitation flip angle shows reduced variability across repetition times in the small artery regime compared to shorter blood delivery times.

      "Based on these equations, optimal T_R and excitation flip angle values (θ) can be calculated for the blood delivery times under consideration (Figure 3). To better illustrate the regime of small arteries, we have illustrated the effect of either flip angle or T_R while keeping the other parameter values fixed to the value that was ultimately used in the experiments; although both parameters can also be optimized simultaneously (Haacke et al., 1990). Supplementary Figure 3 further delineates the interdependency between flip angle and T_R within a parameter range commonly used for TOF imaging at ultra-high field (Kang et al., 2010; Stamm et al., 2013; von Morze et al., 2007). Note how longer T_R values still provide an FRE effect even at very long blood delivery times, whereas using shorter T_R values can suppress the FRE effect (Figure 3, left). Similarly, at lower flip angles the FRE effect is still present for long blood delivery times, but it is not available anymore at larger flip angles, which, however, would give maximum FRE for shorter blood delivery times (Figure 3, right). Due to the non-linear relationships of both blood delivery time and flip angle with FRE, the optimal imaging parameters deviate considerably when comparing blood delivery times of 100 ms and 300 ms, but the differences between 300 ms and 1000 ms are less pronounced. In the following simulations and measurements, we have thus used a T_R value of 20 ms, i.e. a value only slightly longer than the readout of the high-resolution TOF acquisitions, which allowed time-efficient data acquisition, and a nominal excitation flip angle of 18°. From a practical standpoint, these values are also favorable as the low flip angle reduces the specific absorption rate (Fiedler et al., 2018) and the long T_R value decreases the potential for peripheral nerve stimulation (Mansfield and Harvey, 1993)."

      2) Figure 4 and Theory surrounding. A major limitation of this analysis is the lack of inclusion of noise in the analysis. I believe the results to be obvious that the FRE will be modulated by partial volume effects, here described quadratically by assuming the vessel to pass through the voxel. This would substantially modify the analysis, with a shift towards higher voxel volumes (scan time being equal). The authors suggest the FRE to be the dominant factor affecting segmentation; however, segmentation is limited by noise as much as contrast.

      We of course agree with the reviewer that contrast-to-noise ratio is a key factor that determines the detection of vessels and the quality of the segmentation; however, there are subtleties regarding the exact inter-relationship between CNR, resolution, and segmentation performance.

      The main purpose of Figure 4 is not to provide a trade-off between flow-related enhancement and signal-to-noise ratio—in particular as SNR is modulated by many more factors than voxel size alone, e.g. acquisition time, coil geometry and instrumentation—but to decide whether the limiting factor for imaging pial arteries is the reduction in flow-related enhancement due to long blood delivery times (which is the explanation often found in the literature (Chen et al., 2018; Haacke et al., 1990; Masaryk et al., 1989; Mut et al., 2014; Park et al., 2020; Parker et al., 1991; Wilms et al., 2001; Wright et al., 2013)) or due to partial volume effects. Furthermore, when reducing voxel size one will also likely increase the number of encoding steps to maintain the imaging coverage (i.e., the field-of-view) and so the relationship between voxel size and SNR in practice is not straightforward. Therefore, we had to conclude that deducing a meaningful SNR analysis that would benefit the reader was not possible given the available data due to the complex relationship between voxel size and other imaging parameters. Note that these considerations are not specific to imaging the pial arteries but apply to all MRA acquisitions, and have thus been discussed previously in the literature. Here, we wanted to focus on the novel insights gained in our study, namely that it provides an expression for how relative FRE contrast changes with voxel size with some assumptions that apply for imaging pial arteries.

      Further, depending on the definition of FRE and whether partial-volume effects are included (see also our response to R2.8), larger voxel volumes have been found to be theoretically advantageous even when only considering contrast (Du et al., 1996; Venkatesan and Haacke, 1997), which is not in line with empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007).

      The notion that vessel segmentation algorithms perform well on noisy data but poorly on low-contrast data was mainly driven by our own experiences. However, we still believe that the assumption that (all) segmentation algorithms are linearly dependent on contrast and noise (which the formulation of a contrast-to-noise ratio presumes) is similarly not warranted. Indeed, the necessary trade-off between FRE and SNR might be specific to the particular segmentation algorithm being used rather than a general property of the acquisition. Please also note that our analysis of the FRE does not suggest that an arbitrarily high resolution is needed. Importantly, while we previously noted that reducing voxel size improves contrast in vessels whose diameters are smaller than the voxel size, we now explicitly acknowledge that, for vessels whose diameters are larger than the voxel size, reducing the voxel size is not helpful (it only reduces SNR without any gain in contrast) and may hinder segmentation performance, thus becoming counterproductive. But we take the reviewer’s point and also acknowledge that these intricacies need to be mentioned, and therefore we have rephrased the statement in the discussion in the following way:

      "In general, we have not considered SNR, but only FRE, i.e. the (relative) image contrast, assuming that segmentation algorithms would benefit from higher contrast for smaller arteries. Importantly, the acquisition parameters available to maximize FRE are limited, namely repetition time, flip angle and voxel size. SNR, however, can be improved via numerous avenues independent of these parameters (Brown et al., 2014b; Du et al., 1996; Heverhagen et al., 2008; Parker et al., 1991; Triantafyllou et al., 2011; Venkatesan and Haacke, 1997), the simplest being longer acquisition times. If the aim is to optimize a segmentation outcome for a given acquisition time, the trade-off between contrast and SNR for the specific segmentation algorithm needs to be determined (Klepaczko et al., 2016; Lesage et al., 2009; Moccia et al., 2018; Phellan and Forkert, 2017). Our own—albeit limited—experience has shown that segmentation algorithms (including manual segmentation) can accommodate a perhaps surprising amount of noise using prior knowledge and neighborhood information, making these high-resolution acquisitions possible. Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      3) Page 11, Line 225. "only a fraction of the blood is replaced" I think the language should be reworded. There are certainly water molecules in blood which have experienced more excitation B1 pulses due to the parabolic flow upstream and the temporal variation in flow. There is magnetization diffusion which reduces the discrepancy; however, it seems pertinent to just say the authors assume the signal is represented by the average arrival time. This analysis is never verified and is only approximate anyways. The "blood dwell time" is also an average since voxels near the wall will travel more slowly. Overall, I recommend reducing the conjecture in this section.

      We fully agree that our treatment of the blood dwell time does not account for the much more complex flow patterns found in cortical arteries. However, our aim was not to comment on these complex patterns, but to help establish whether, in the simplest scenario assuming plug flow, the often-mentioned slow blood flow requires multiple velocity compartments to describe the FRE (as is commonly done for 2D MRA (Brown et al., 2014a; Carr and Carroll, 2012)). We did not intend to comment on the effects of laminar flow or even more complex flow patterns, which would require a more in-depth treatment. However, as the small arteries targeted here are often just one voxel thick, all signals are indeed integrated within that voxel (i.e. there is no voxel near the wall that travels more slowly), which may average out more complex effects. We have clarified the purpose and scope of this section in the following way:

      "In classical descriptions of the FRE effect (Brown et al., 2014a; Carr and Carroll, 2012), significant emphasis is placed on the effect of multiple “velocity segments” within a slice in the 2D imaging case. Using the simplified plug-flow model, where the cross-sectional profile of blood velocity within the vessel is constant and effects such as drag along the vessel wall are not considered, these segments can be described as ‘disks’ of blood that do not completely traverse through the full slice within one T_R, and, thus, only a fraction of the blood in the slice is replaced. Consequently, estimation of the FRE effect would then need to accommodate contribution from multiple ‘disks’ that have experienced 1 to k RF pulses. In the case of 3D imaging as employed here, multiple velocity segments within one voxel are generally not considered, as the voxel sizes in 3D are often smaller than the slice thickness in 2D imaging and it is assumed that the blood completely traverses through a voxel each T_R. However, the question arises whether this assumption holds for pial arteries, where blood velocity is considerably lower than in intracranial vessels (Figure 2). To answer this question, we have computed the blood dwell time, i.e. the average time it takes the blood to traverse a voxel, as a function of blood velocity and voxel size (Figure 2). For reference, the blood velocity estimates from the three studies mentioned above (Bouvy et al., 2016; Kobari et al., 1984; Nagaoka and Yoshida, 2006) have been added in this plot as horizontal white lines. For the voxel sizes of interest here, i.e. 50–300 μm, blood dwell times are, for all but the slowest flows, well below commonly used repetition times (Brown et al., 2014a; Carr and Carroll, 2012; Ladd, 2007; von Morze et al., 2007).
Thus, in a first approximation using the plug-flow model, it is not necessary to include several velocity segments for the voxel sizes of interest when considering pial arteries, as one might expect from classical treatments, and the FRE effect can be described by equations (1) – (3), simplifying our characterization of FRE for these vessels. When considering the effect of more complex flow patterns, it is important to bear in mind that the arteries targeted here are only one-voxel thick, and signals are integrated across the whole artery."
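
      The plug-flow argument above can be illustrated with a minimal sketch. It assumes the blood dwell time is simply the voxel edge length divided by the blood velocity, which is our reading of the verbal definition. The velocities and the repetition time below are hypothetical placeholders chosen for illustration, not values taken from the manuscript or the cited studies.

```python
def blood_dwell_time_ms(voxel_size_um: float, velocity_mm_per_s: float) -> float:
    """Average time (ms) for blood in plug flow to traverse one voxel.

    Assumption: dwell time = voxel edge length / blood velocity.
    """
    if velocity_mm_per_s <= 0:
        raise ValueError("velocity must be positive")
    voxel_size_mm = voxel_size_um / 1000.0
    return 1000.0 * voxel_size_mm / velocity_mm_per_s


def needs_velocity_segments(voxel_size_um: float,
                            velocity_mm_per_s: float,
                            tr_ms: float) -> bool:
    """True if the blood is NOT fully replaced within one repetition time."""
    return blood_dwell_time_ms(voxel_size_um, velocity_mm_per_s) > tr_ms


# Hypothetical example values (not from the manuscript).
TR_MS = 20.0
for velocity in (1.0, 10.0, 40.0):      # mm/s
    for voxel_size in (50, 160, 300):   # micrometres
        dwell = blood_dwell_time_ms(voxel_size, velocity)
        flag = needs_velocity_segments(voxel_size, velocity, TR_MS)
        print(f"v={velocity:5.1f} mm/s, voxel={voxel_size:3d} um: "
              f"dwell={dwell:6.1f} ms, multiple segments needed: {flag}")
```

      Under these assumptions, multiple velocity segments only become relevant when the dwell time exceeds the repetition time, i.e. for the slowest flows combined with the largest voxels.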

      4) Page 13, Line 260. "two-compartment modelling" I think this section is better labeled "Extension to consider partial volume effects" The compartments are not interacting in any sense in this work.

      Thank you for this suggestion. We have replaced the heading with Introducing a partial-volume model (page 14) and replaced all instances of ‘two-compartment model’ with ‘partial-volume model’.

      5) Page 14, Line 284. "In practice, a reduction in slab …." "reducing the voxel size is a much more promising avenue" There is a fair amount of conjecture here which is not supported by experiments. While this may be true, the authors also use a classical approach with quite thin slabs.

      The slab thickness used in our experiments was mainly limited by the acquisition time and the participants' ability to lie still. We indeed performed one measurement with a very experienced participant with a thicker slab, but found that with over 20 minutes acquisition time, motion artefacts were unavoidable. The data presented in Figure 5 were acquired with similar slab thickness, supporting the statement that reducing the voxel size is a promising avenue for imaging small pial arteries. However, we indeed have not provided an empirical comparison of the effect of slab thickness. Nevertheless, we believe it remains useful to make the theoretical argument that due to the convoluted nature of the pial arterial vascular geometry, a reduction in slab thickness may not reduce the acquisition time if no reduction in intra-slab vessel length can be achieved, i.e. if the majority of the artery is still contained in the smaller slab. We have clarified the statement and removed the direct comparison (‘much more’ promising) in the following way:

      "In theory, a reduction in blood delivery time increases the FRE in both regimes, and—if the vessel is smaller than the voxel—so would a reduction in voxel size. In practice, a reduction in slab thickness―which is the default strategy in classical TOF-MRA to reduce blood delivery time―might not provide substantial FRE increases for pial arteries. This is due to their convoluted geometry (see section Anatomical architecture of the pial arterial vasculature), where a reduction in slab thickness may not necessarily reduce the vessel segment length if the majority of the artery is still contained within the smaller slab. Thus, given the small arterial diameter, reducing the voxel size is a promising avenue when imaging the pial arterial vasculature."

      6) Figure 5. These image differences are highly exaggerated by the lack of zero filling (or any interpolation) and the fact that the wildly different. The interpolation should be addressed, and the scan time discrepancy listed as a limitation.

      We have extended the discussion around zero-filling by including additional considerations based on the imaging parameters in Figure 5 and highlighted the substantial differences in voxel volume. Our choice not to perform zero-filling was driven by the open question of what an ‘optimal’ zero-filling factor would be. We have also highlighted the substantial differences in acquisition time when describing the results.

      Changes made to the results section:

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result."

      Changes made to the discussion section:

      "Nevertheless, slight qualitative improvements in image appearance have been reported for higher zero-filling factors (Du et al., 1994), presumably owing to a smoother representation of the vessels (Bartholdi and Ernst, 1973). In contrast, Mattern et al. (2018) reported no improvement in vessel contrast for their high-resolution data. Ultimately, for each application, e.g. visual evaluation vs. automatic segmentation, the optimal zero-filling factor needs to be determined, balancing image appearance (Du et al., 1994; Zhu et al., 2013) with loss in statistical independence of the image noise across voxels. For example, in Figure 5, when comparing across different voxel sizes, the visual impression might improve with zero-filling. However, it remains unclear whether the same zero-filling factor should be applied for each voxel size, which means that the overall difference in resolution remains, namely a nearly 20-fold reduction in voxel volume when moving from 0.8-mm isotropic to 0.3-mm isotropic voxel size. Alternatively, the same ‘zero-filled’ voxel sizes could be used for evaluation, although then nearly 94 % of the samples used to reconstruct the image with 0.8-mm voxel size would be zero-valued for a 0.3-mm isotropic resolution. Consequently, all data presented in this study were reconstructed without zero-filling."

      7) Figure 7. Given the limited nature of experiment may it not also be possible the subject moved more, had differing brain blood flow, etc. Were these lengthy scans acquired in the same session? Many of these differences could be attributed to other differences than the small difference in spatial resolution.

      The scans were acquired in the same session using the same prospective motion correction procedure. Note that the acquisition time of the images with 0.16 mm isotropic voxel size was comparatively short, taking just under 12 minutes. Although the difference in spatial resolution may seem small, it still amounts to a 33% reduction in voxel volume. For comparison, reducing the voxel size from 0.4 mm to 0.3 mm also ‘only’ reduces the voxel volume by 58 %—not even twice as much. Overall, we fully agree that additional validation and optimisation of the imaging parameters for pial arteries are beneficial and have added a corresponding statement to the Discussion section.
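
      The voxel-volume figures quoted in this response and in the zero-filling passage above follow from cubing the ratio of isotropic voxel sizes. The sketch below reproduces that arithmetic. The zero-valued sample fraction additionally assumes an unchanged field of view and isotropic zero-padding in all three dimensions, so that the number of samples scales inversely with voxel volume. The function names are ours.

```python
def volume_reduction_percent(from_mm: float, to_mm: float) -> float:
    """Percentage by which the isotropic voxel volume shrinks."""
    return 100.0 * (1.0 - (to_mm / from_mm) ** 3)


def volume_fold_reduction(from_mm: float, to_mm: float) -> float:
    """Factor by which the isotropic voxel volume shrinks."""
    return (from_mm / to_mm) ** 3


def zero_valued_fraction(acquired_mm: float, zero_filled_mm: float) -> float:
    """Fraction of k-space samples that are zeros after zero-filling.

    Assumes the same field of view and isotropic padding in 3D.
    """
    return 1.0 - (zero_filled_mm / acquired_mm) ** 3


print(f"0.16 -> 0.14 mm: {volume_reduction_percent(0.16, 0.14):.1f} % smaller volume")
print(f"0.40 -> 0.30 mm: {volume_reduction_percent(0.40, 0.30):.1f} % smaller volume")
print(f"0.80 -> 0.30 mm: {volume_fold_reduction(0.80, 0.30):.1f}-fold smaller volume")
print(f"0.80 mm data zero-filled to 0.30 mm: "
      f"{100.0 * zero_valued_fraction(0.80, 0.30):.1f} % zero-valued samples")
```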

      Changes made to the results section (also in response to Reviewer 1 (R1.22))

      "We have also acquired one single slab with an isotropic voxel size of 0.16 mm with prospective motion correction for this participant in the same session to compare to the acquisition with 0.14 mm isotropic voxel size and to test whether any gains in FRE are still possible at this level of the vascular tree."

      Changes made to the discussion section:

      "Acquiring these data at even higher field strengths would boost SNR (Edelstein et al., 1986; Pohmann et al., 2016) to partially compensate for SNR losses due to acceleration and may enable faster imaging and/or smaller voxel sizes. This could facilitate the identification of the ultimate limit of the flow-related enhancement effect and identify at which stage of the vascular tree the blood delivery time becomes the limiting factor. While Figure 7 indicates the potential for voxel sizes below 0.16 mm, the singular nature of this comparison warrants further investigations."

      8) Page 22, Line 395. Would the analysis be any different with an absolute difference? The FRE (Eq 6) divides by a constant value. Clearly there is value in the difference as other subtractive inflow imaging would have infinite FRE (not considering noise as the authors do).

      Absolutely; using an absolute FRE would result in the highest FRE for the largest voxel size, whereas in our data small vessels are more easily detected with the smallest voxel size. We also note that relative FRE would indeed become infinite if the value in the denominator representing the tissue signal was zero, but this special case highlights how relative FRE can help characterize “segmentability”: a vessel with any intensity surrounded by tissue with an intensity of zero is trivially/infinitely segmentable. We have added this point to the revised manuscript as indicated below.

      Following the suggestion of Reviewer 1 (R1.2), we have included additional simulations to clarify the effects of relative FRE definition and partial-volume model, in which we show that only when considering both together are smaller voxel sizes advantageous (Supplementary Material).

      "Effect of FRE Definition and Interaction with Partial-Volume Model

      For the definition of the FRE effect in this study, we used a measure of relative FRE (Al-Kwifi et al., 2002) in combination with a partial-volume model (Eq. 6). To illustrate the effect of these two definitions, as well as their interaction, we have estimated the relative and absolute FRE for an artery with a diameter of 200 µm and 2 000 µm (i.e. no partial-volume effects). The absolute FRE explicitly takes the voxel volume into account, i.e. instead of Eq. (6) for the relative FRE we used"

      Eq. (1)

      Note that the division by

      to obtain the relative FRE removes the contribution of the total voxel volume
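
      The equation and the inline expression referred to here did not survive extraction. As a hedged sketch only, and not the authors' exact formula, the cancellation described can be written out under the assumption that the voxel volume enters as a common factor l_voxel^3 on both the total and the tissue magnetization:

```latex
% Assumed forms, reconstructed from the verbal description only.
\begin{align}
\mathrm{FRE}_{\mathrm{abs}}
  &= \left( M_z^{\mathrm{total}} - M_{zS}^{\mathrm{tissue}} \right) l_{\mathrm{voxel}}^{3} \\
\mathrm{FRE}_{\mathrm{rel}}
  &= \frac{\left( M_z^{\mathrm{total}} - M_{zS}^{\mathrm{tissue}} \right) l_{\mathrm{voxel}}^{3}}
          {M_{zS}^{\mathrm{tissue}} \, l_{\mathrm{voxel}}^{3}}
   = \frac{M_z^{\mathrm{total}} - M_{zS}^{\mathrm{tissue}}}{M_{zS}^{\mathrm{tissue}}}
\end{align}
```

      In this form the factor l_voxel^3 appears in both numerator and denominator of the relative measure and cancels, whereas it remains in the absolute measure.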

      "Supplementary Figure 2 shows that, when partial volume effects are present, the highest relative FRE arises in voxels with the same size as or smaller than the vessel diameter (Supplementary Figure 2A), whereas the absolute FRE increases with voxel size (Supplementary Figure 2C). If no partial-volume effects are present, the relative FRE becomes independent of voxel size (Supplementary Figure 2B), whereas the absolute FRE increases with voxel size (Supplementary Figure 2D). While the partial-volume effects for the relative FRE are substantial, they are much more subtle when using the absolute FRE and do not alter the overall characteristics."

      Supplementary Figure 2: Effect of voxel size and blood delivery time on the flow-related enhancement (FRE) using either a relative (A,B) (Eq. (3)) or an absolute (C,D) (Eq. (12)) FRE definition, assuming a pial artery diameter of 200 μm (A,C) or 2 000 µm (B,D), i.e. no partial-volume effects at the central voxel of this artery considered here.

      Following the established literature (Brown et al., 2014a; Carr and Carroll, 2012; Haacke et al., 1990) and because we would ultimately derive a relative measure, we have omitted the effect of voxel volume on the longitudinal magnetization in our derivations, which makes it appear as if we are dividing by a constant in Eq. 6, as the effect of total voxel volume cancels out for the relative FRE. We have now made this more explicit in our derivation of the partial volume model.

      "Introducing a partial-volume model

      To account for the effect of voxel volume on the FRE, the total longitudinal magnetization M_z needs to also consider the number of spins contained within a voxel (Du et al., 1996; Venkatesan and Haacke, 1997). A simple approximation can be obtained by scaling the longitudinal magnetization with the voxel volume (Venkatesan and Haacke, 1997). To then include partial volume effects, the total longitudinal magnetization in a voxel M_z^total becomes the sum of the contributions from the stationary tissue M_zS^tissue and the inflowing blood M_z^blood, weighted by their respective volume fractions V_rel:"

      Eq. (4)

      For simplicity, we assume a single vessel is located at the center of the voxel and approximate it to be a cylinder with diameter d_vessel and length l_voxel of an assumed isotropic voxel along one side. The relative volume fraction of blood V_rel^blood is the ratio of vessel volume within the voxel to total voxel volume (see section Estimation of vessel-volume fraction in the Supplementary Material), and the tissue volume fraction V_rel^tissue is the remainder that is not filled with blood, or

      Eq. (5)

      We can now replace the blood magnetization in Eq. (3) with the total longitudinal magnetization of the voxel to compute the FRE as a function of vessel-volume fraction:

      Eq. (6)
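
      Because Eqs. (4) to (6) are not reproduced in this text, the following is a minimal sketch of the partial-volume model as it is described in words, not the authors' implementation. It assumes a cylindrical vessel centred in a cubic voxel, a blood volume fraction capped at 1 once the cylinder would exceed the voxel, a relative FRE of the form (M_total - M_tissue) / M_tissue, and an absolute FRE that scales the same difference by the voxel volume. The magnetization values are hypothetical and in arbitrary units.

```python
import math


def blood_volume_fraction(d_vessel_um: float, l_voxel_um: float) -> float:
    """Volume fraction of a centred cylinder (diameter d, length l) in a cube of side l."""
    vessel_volume = math.pi * (d_vessel_um / 2.0) ** 2 * l_voxel_um
    voxel_volume = l_voxel_um ** 3
    # Simplification: cap at 1 when the vessel is wider than the voxel.
    return min(1.0, vessel_volume / voxel_volume)


def total_magnetization(m_blood: float, m_tissue: float, v_blood: float) -> float:
    """Volume-fraction-weighted sum of blood and tissue magnetization."""
    return v_blood * m_blood + (1.0 - v_blood) * m_tissue


def relative_fre(m_blood, m_tissue, d_vessel_um, l_voxel_um):
    """Assumed relative FRE: signal difference divided by tissue signal."""
    v_blood = blood_volume_fraction(d_vessel_um, l_voxel_um)
    m_total = total_magnetization(m_blood, m_tissue, v_blood)
    return (m_total - m_tissue) / m_tissue


def absolute_fre(m_blood, m_tissue, d_vessel_um, l_voxel_um):
    """Assumed absolute FRE: signal difference scaled by voxel volume."""
    v_blood = blood_volume_fraction(d_vessel_um, l_voxel_um)
    m_total = total_magnetization(m_blood, m_tissue, v_blood)
    return (m_total - m_tissue) * l_voxel_um ** 3


# Hypothetical magnetization values (arbitrary units).
M_BLOOD, M_TISSUE = 1.0, 0.2
for diameter in (200.0, 2000.0):
    for voxel in (300.0, 400.0, 600.0, 800.0):
        rel = relative_fre(M_BLOOD, M_TISSUE, diameter, voxel)
        ab = absolute_fre(M_BLOOD, M_TISSUE, diameter, voxel)
        print(f"d={diameter:6.0f} um, voxel={voxel:5.0f} um: "
              f"relative FRE={rel:5.2f}, absolute FRE={ab:.3e}")
```

      With these assumptions the sketch shows the qualitative behaviour described for Supplementary Figure 2: for the 200 µm vessel the relative FRE grows as the voxel shrinks, for the 2 000 µm vessel it is constant, and the absolute FRE grows with voxel size in both cases.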

      Based on your suggestion, we have also extended our interpretation of relative and absolute FRE. Indeed, a subtractive flow technique where no signal in the background remains and only intensities in the object are present would have infinite relative FRE, as this basically constitutes a perfect segmentation (bar a simple thresholding step).

      "Extending classical FRE treatments to the pial vasculature

      There are several major modifications in our approach to this topic that might explain why, in contrast to predictions from classical FRE treatments, it is indeed possible to image pial arteries. For instance, the definition of vessel contrast or flow-related enhancement is often stated as an absolute difference between blood and tissue signal (Brown et al., 2014a; Carr and Carroll, 2012; Du et al., 1993, 1996; Haacke et al., 1990; Venkatesan and Haacke, 1997). Here, however, we follow the approach of Al-Kwifi et al. (2002) and consider relative contrast. While this distinction may seem to be semantic, the effect of voxel volume on FRE for these two definitions is exactly opposite: Du et al. (1996) concluded that larger voxel size increases the (absolute) vessel-background contrast, whereas here we predict an increase in relative FRE for small arteries with decreasing voxel size. Therefore, predictions of the depiction of small arteries with decreasing voxel size differ depending on whether one is considering absolute contrast, i.e. difference in longitudinal magnetization, or relative contrast, i.e. contrast differences independent of total voxel size. Importantly, this prediction changes for large arteries where the voxel contains only vessel lumen, in which case the relative FRE remains constant across voxel sizes, but the absolute FRE increases with voxel size (Supplementary Figure 9). Overall, the interpretations of relative and absolute FRE differ, and one measure may be more appropriate for certain applications than the other. Absolute FRE describes the difference in magnetization and is thus tightly linked to the underlying physical mechanism. Relative FRE, however, describes the image contrast and segmentability. If blood and tissue magnetization are equal, both contrast measures would equal zero and indicate that no contrast difference is present. 
However, when there is signal in the vessel and as the tissue magnetization approaches zero, the absolute FRE approaches the blood magnetization (assuming no partial-volume effects), whereas the relative FRE approaches infinity. While this infinite relative FRE does not directly relate to the underlying physical process of ‘infinite’ signal enhancement through inflowing blood, it instead characterizes the segmentability of the image in that an image with zero intensity in the background and non-zero values in the structures of interest can be segmented perfectly and trivially. Accordingly, numerous empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007) and the data provided here (Figure 5, 6 and 7) have shown the benefit of smaller voxel sizes if the aim is to visualize and segment small arteries."

      9) Page 22, Line 400. "The appropriateness of " This also ignores noise. The absolute enhancement is the inherent magnetization available. The results in Figure 5, 6, 7 don't readily support a ratio over an absolute difference accounting for partial volume effects.

      We hope that the additional explanations of how the relative FRE definition interacts with the partial-volume model, the interpretation of relative FRE provided in the previous response (R2.8), and the observation that Figures 5, 6 and 7 show smaller arteries for smaller voxels together clarify our argument: only relative FRE in combination with a partial-volume model can explain why smaller voxel sizes are advantageous for depicting small arteries.

      While we appreciate that there exists a fundamental relationship between SNR and voxel volume in MR (Brown et al., 2014b), this relationship is also modulated by many more factors (as we have argued in our responses to R2.2 and R1.4b).

      We hope that the additional derivations and simulations provided in the previous response have clarified why a relative FRE model in combination with a partial-volume model helps to explain the enhanced detectability of small vessels with small voxels.

      10) Page 24, Line 453. "strategies, such as radial and spiral acquisitions, experience no vessel displacement artefact" These do observe flow related distortions as well, just not typically called displacement.

      Yes, this is a helpful point, as these methods will also experience a degradation of spatial accuracy due to flow effects, which will propagate into errors in the segmentation.

      As the reviewer suggests, flow-related artefacts in radial and spiral acquisitions usually manifest as a slight blur, and less as the prominent displacement found in Cartesian sampling schemes. We have added a corresponding clarification to the Discussion section:

      "Other encoding strategies, such as radial and spiral acquisitions, experience no vessel displacement artefact because phase and frequency encoding take place in the same instant, although a slight blur might be observed instead (Nishimura et al., 1995, 1991). However, both trajectories pose engineering challenges and much higher demands on hardware and reconstruction algorithms than the Cartesian readouts employed here (Kasper et al., 2018; Shu et al., 2016), particularly to achieve 3D acquisitions with 160 µm isotropic resolution."

      11) Page 24, Line 272. "although even with this nearly ideal subject behaviour approximately 1 in 4 scans still had to be discarded and repeated" This is certainly a potential source of bias in the comparisons.

      We apologize if this section was written in a misleading way. For the comparison presented in Figure 7, we acquired one additional slab in the same session at 0.16 mm voxel size using the same prospective motion correction procedure as for the 0.14 mm data. For the images shown in Figure 6 and Supplementary Figure 4 at 0.16 mm voxel size, we did not use a motion correction system and, thus, had to discard a portion of the data. We have clarified in the Discussion section that, for the comparison of the high-resolution data, prospective motion correction was used for both resolutions:

      "This allowed for the successful correction of head motion of approximately 1 mm over the 60-minute scan session, showing the utility of prospective motion correction at these very high resolutions. Note that for the comparison in Figure 7, one slab with 0.16 mm voxel size was acquired in the same session also using the prospective motion correction system. However, for the data shown in Figure 6 and Supplementary Figure 4, no prospective motion correction was used, and we instead relied on the experienced participants who contributed to this study. We found that the acquisition of TOF data with 0.16 mm isotropic voxel size in under 12 minutes acquisition time per slab is possible without discernible motion artifacts, although even with this nearly ideal subject behaviour approximately 1 in 4 scans still had to be discarded and repeated."

      12) Page 25, Line 489. "then need to include the effects of various analog and digital filters" While the analysis may benefit from some of this, most is not at all required for analysis based on optimization of the imaging parameters.

      We have included all four correction factors for completeness, given the unique acquisition parameter and contrast space our time-of-flight acquisition occupies, e.g. very low bandwidth of only 100 Hz, very large matrix sizes > 1024 samples, ideally zero SNR in the background (fully suppressed tissue signal). However, we agree that probably the most important factor is the non-central chi distribution of the noise in magnitude images from multiple-channel coil arrays, and have added this qualification in the text:

      "Accordingly, SNR predictions then need to include the effects of various analog and digital filters, the number of acquired samples, the noise covariance correction factor, and—most importantly—the non-central chi distribution of the noise statistics of the final magnitude image (Triantafyllou et al., 2011)."
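
      To illustrate why the noise distribution matters for a background with ideally zero signal, the sketch below simulates magnitude noise for a multi-channel array. It assumes a root-sum-of-squares channel combination with uncorrelated, equal-variance complex Gaussian noise per channel, which may differ from the reconstruction actually used. In that signal-free case the non-central chi distribution reduces to a central chi distribution with 2n degrees of freedom, whose mean is non-zero and grows with the number of channels n. The channel counts are hypothetical.

```python
import math
import random


def sos_background_magnitude(n_channels: int, sigma: float, rng: random.Random) -> float:
    """Root-sum-of-squares magnitude of pure complex Gaussian noise."""
    total = 0.0
    for _ in range(n_channels):
        real = rng.gauss(0.0, sigma)
        imag = rng.gauss(0.0, sigma)
        total += real * real + imag * imag
    return math.sqrt(total)


def simulated_background_mean(n_channels: int, sigma: float = 1.0,
                              n_voxels: int = 5000, seed: int = 0) -> float:
    """Monte Carlo estimate of the mean background magnitude."""
    rng = random.Random(seed)
    samples = (sos_background_magnitude(n_channels, sigma, rng)
               for _ in range(n_voxels))
    return sum(samples) / n_voxels


def analytic_background_mean(n_channels: int, sigma: float = 1.0) -> float:
    """Mean of a chi distribution with 2 * n_channels degrees of freedom."""
    log_ratio = math.lgamma(n_channels + 0.5) - math.lgamma(n_channels)
    return sigma * math.sqrt(2.0) * math.exp(log_ratio)


for n in (1, 8, 32):  # hypothetical channel counts
    print(f"{n:2d} channels: simulated mean = {simulated_background_mean(n):.3f}, "
          f"analytic mean = {analytic_background_mean(n):.3f}")
```

      The non-zero background mean is the reason a naive Gaussian noise model would misestimate SNR in regions where the tissue signal is fully suppressed.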

      Al-Kwifi, O., Emery, D.J., Wilman, A.H., 2002. Vessel contrast at three Tesla in time-of-flight magnetic resonance angiography of the intracranial and carotid arteries. Magnetic Resonance Imaging 20, 181–187. https://doi.org/10.1016/S0730-725X(02)00486-1

      Arts, T., Meijs, T.A., Grotenhuis, H., Voskuil, M., Siero, J., Biessels, G.J., Zwanenburg, J., 2021. Velocity and Pulsatility Measures in the Perforating Arteries of the Basal Ganglia at 3T MRI in Reference to 7T MRI. Frontiers in Neuroscience 15.

      Avants, B.B., Tustison, N., Song, G., 2009. Advanced normalization tools (ANTS). Insight j 2, 1–35.

      Bae, K.T., Park, S.-H., Moon, C.-H., Kim, J.-H., Kaya, D., Zhao, T., 2010. Dual-echo arteriovenography imaging with 7T MRI: CODEA with 7T. J. Magn. Reson. Imaging 31, 255–261. https://doi.org/10.1002/jmri.22019

      Bartholdi, E., Ernst, R.R., 1973. Fourier spectroscopy and the causality principle. Journal of Magnetic Resonance (1969) 11, 9–19. https://doi.org/10.1016/0022-2364(73)90076-0

      Bernier, M., Cunnane, S.C., Whittingstall, K., 2018. The morphology of the human cerebrovascular system. Human Brain Mapping 39, 4962–4975. https://doi.org/10.1002/hbm.24337

      Bouvy, W.H., Biessels, G.J., Kuijf, H.J., Kappelle, L.J., Luijten, P.R., Zwanenburg, J.J.M., 2014. Visualization of Perivascular Spaces and Perforating Arteries With 7 T Magnetic Resonance Imaging. Investigative Radiology 49, 307–313. https://doi.org/10.1097/RLI.0000000000000027

      Bouvy, W.H., Geurts, L.J., Kuijf, H.J., Luijten, P.R., Kappelle, L.J., Biessels, G.J., Zwanenburg, J.J.M., 2016. Assessment of blood flow velocity and pulsatility in cerebral perforating arteries with 7-T quantitative flow MRI: Blood Flow Velocity And Pulsatility In Cerebral Perforating Arteries. NMR Biomed. 29, 1295–1304. https://doi.org/10.1002/nbm.3306

      Brown, R.W., Cheng, Y.-C.N., Haacke, E.M., Thompson, M.R., Venkatesan, R., 2014a. Chapter 24 - MR Angiography and Flow Quantification, in: Magnetic Resonance Imaging. John Wiley & Sons, Ltd, pp. 701–737. https://doi.org/10.1002/9781118633953.ch24

      Brown, R.W., Cheng, Y.-C.N., Haacke, E.M., Thompson, M.R., Venkatesan, R., 2014b. Chapter 15 - Signal, Contrast, and Noise, in: Magnetic Resonance Imaging. John Wiley & Sons, Ltd, pp. 325–373. https://doi.org/10.1002/9781118633953.ch15

      Carr, J.C., Carroll, T.J., 2012. Magnetic resonance angiography: principles and applications. Springer, New York.

      Cassot, F., Lauwers, F., Fouard, C., Prohaska, S., Lauwers-Cances, V., 2006. A Novel Three-Dimensional Computer-Assisted Method for a Quantitative Study of Microvascular Networks of the Human Cerebral Cortex. Microcirculation 13, 1–18. https://doi.org/10.1080/10739680500383407

      Chen, L., Mossa-Basha, M., Balu, N., Canton, G., Sun, J., Pimentel, K., Hatsukami, T.S., Hwang, J.-N., Yuan, C., 2018. Development of a quantitative intracranial vascular features extraction tool on 3DMRA using semiautomated open-curve active contour vessel tracing: Comprehensive Artery Features Extraction From 3D MRA. Magn. Reson. Med 79, 3229–3238. https://doi.org/10.1002/mrm.26961

      Choi, U.-S., Kawaguchi, H., Kida, I., 2020. Cerebral artery segmentation based on magnetization-prepared two rapid acquisition gradient echo multi-contrast images in 7 Tesla magnetic resonance imaging. NeuroImage 222, 117259. https://doi.org/10.1016/j.neuroimage.2020.117259

      Conolly, S., Nishimura, D., Macovski, A., Glover, G., 1988. Variable-rate selective excitation. Journal of Magnetic Resonance (1969) 78, 440–458. https://doi.org/10.1016/0022-2364(88)90131-X

      Deistung, A., Dittrich, E., Sedlacik, J., Rauscher, A., Reichenbach, J.R., 2009. ToF-SWI: Simultaneous time of flight and fully flow compensated susceptibility weighted imaging. J. Magn. Reson. Imaging 29, 1478–1484. https://doi.org/10.1002/jmri.21673

      Detre, J.A., Leigh, J.S., Williams, D.S., Koretsky, A.P., 1992. Perfusion imaging. Magnetic Resonance in Medicine 23, 37–45. https://doi.org/10.1002/mrm.1910230106

      Du, Y., Parker, D.L., Davis, W.L., Blatter, D.D., 1993. Contrast-to-Noise-Ratio Measurements in Three-Dimensional Magnetic Resonance Angiography. Investigative Radiology 28, 1004–1009.

      Du, Y.P., Jin, Z., 2008. Simultaneous acquisition of MR angiography and venography (MRAV). Magn. Reson. Med. 59, 954–958. https://doi.org/10.1002/mrm.21581

      Du, Y.P., Parker, D.L., Davis, W.L., Cao, G., 1994. Reduction of partial-volume artifacts with zero-filled interpolation in three-dimensional MR angiography. J. Magn. Reson. Imaging 4, 733–741. https://doi.org/10.1002/jmri.1880040517

      Du, Y.P., Parker, D.L., Davis, W.L., Cao, G., Buswell, H.R., Goodrich, K.C., 1996. Experimental and theoretical studies of vessel contrast-to-noise ratio in intracranial time-of-flight MR angiography. Journal of Magnetic Resonance Imaging 6, 99–108. https://doi.org/10.1002/jmri.1880060120

      Duvernoy, H., Delon, S., Vannson, J.L., 1983. The Vascularization of The Human Cerebellar Cortex. Brain Research Bulletin 11, 419–480.

      Duvernoy, H.M., Delon, S., Vannson, J.L., 1981. Cortical blood vessels of the human brain. Brain Research Bulletin 7, 519–579. https://doi.org/10.1016/0361-9230(81)90007-1

      Eckstein, K., Bachrata, B., Hangel, G., Widhalm, G., Enzinger, C., Barth, M., Trattnig, S., Robinson, S.D., 2021. Improved susceptibility weighted imaging at ultra-high field using bipolar multi-echo acquisition and optimized image processing: CLEAR-SWI. NeuroImage 237, 118175. https://doi.org/10.1016/j.neuroimage.2021.118175

      Edelstein, W.A., Glover, G.H., Hardy, C.J., Redington, R.W., 1986. The intrinsic signal-to-noise ratio in NMR imaging. Magn. Reson. Med. 3, 604–618. https://doi.org/10.1002/mrm.1910030413

      Fan, A.P., Govindarajan, S.T., Kinkel, R.P., Madigan, N.K., Nielsen, A.S., Benner, T., Tinelli, E., Rosen, B.R., Adalsteinsson, E., Mainero, C., 2015. Quantitative oxygen extraction fraction from 7-Tesla MRI phase: reproducibility and application in multiple sclerosis. J Cereb Blood Flow Metab 35, 131–139. https://doi.org/10.1038/jcbfm.2014.187

      Fiedler, T.M., Ladd, M.E., Bitz, A.K., 2018. SAR Simulations & Safety. NeuroImage 168, 33–58. https://doi.org/10.1016/j.neuroimage.2017.03.035

      Frässle, S., Aponte, E.A., Bollmann, S., Brodersen, K.H., Do, C.T., Harrison, O.K., Harrison, S.J., Heinzle, J., Iglesias, S., Kasper, L., Lomakina, E.I., Mathys, C., Müller-Schrader, M., Pereira, I., Petzschner, F.H., Raman, S., Schöbi, D., Toussaint, B., Weber, L.A., Yao, Y., Stephan, K.E., 2021. TAPAS: An Open-Source Software Package for Translational Neuromodeling and Computational Psychiatry. Front. Psychiatry 12. https://doi.org/10.3389/fpsyt.2021.680811

      Gulban, O.F., Bollmann, S., Huber, R., Wagstyl, K., Goebel, R., Poser, B.A., Kay, K., Ivanov, D., 2021. Mesoscopic Quantification of Cortical Architecture in the Living Human Brain. https://doi.org/10.1101/2021.11.25.470023

      Haacke, E.M., Masaryk, T.J., Wielopolski, P.A., Zypman, F.R., Tkach, J.A., Amartur, S., Mitchell, J., Clampitt, M., Paschal, C., 1990. Optimizing blood vessel contrast in fast three-dimensional MRI. Magn. Reson. Med. 14, 202–221. https://doi.org/10.1002/mrm.1910140207

      Helthuis, J.H.G., van Doormaal, T.P.C., Hillen, B., Bleys, R.L.A.W., Harteveld, A.A., Hendrikse, J., van der Toorn, A., Brozici, M., Zwanenburg, J.J.M., van der Zwan, A., 2019. Branching Pattern of the Cerebral Arterial Tree. Anat Rec 302, 1434–1446. https://doi.org/10.1002/ar.23994

      Heverhagen, J.T., Bourekas, E., Sammet, S., Knopp, M.V., Schmalbrock, P., 2008. Time-of-Flight Magnetic Resonance Angiography at 7 Tesla. Investigative Radiology 43, 568–573. https://doi.org/10.1097/RLI.0b013e31817e9b2c

      Hirsch, S., Reichold, J., Schneider, M., Székely, G., Weber, B., 2012. Topology and Hemodynamics of the Cortical Cerebrovascular System. J Cereb Blood Flow Metab 32, 952–967. https://doi.org/10.1038/jcbfm.2012.39

      Horn, B.K.P., Schunck, B.G., 1981. Determining optical flow. Artificial Intelligence 17, 185–203. https://doi.org/10.1016/0004-3702(81)90024-2

      Huck, J., Wanner, Y., Fan, A.P., Jäger, A.-T., Grahl, S., Schneider, U., Villringer, A., Steele, C.J., Tardif, C.L., Bazin, P.-L., Gauthier, C.J., 2019. High resolution atlas of the venous brain vasculature from 7 T quantitative susceptibility maps. Brain Struct Funct 224, 2467–2485. https://doi.org/10.1007/s00429-019-01919-4

      Johst, S., Wrede, K.H., Ladd, M.E., Maderwald, S., 2012. Time-of-Flight Magnetic Resonance Angiography at 7 T Using Venous Saturation Pulses With Reduced Flip Angles. Investigative Radiology 47, 445–450. https://doi.org/10.1097/RLI.0b013e31824ef21f

      Kang, C.-K., Park, C.-A., Kim, K.-N., Hong, S.-M., Park, C.-W., Kim, Y.-B., Cho, Z.-H., 2010. Non-invasive visualization of basilar artery perforators with 7T MR angiography. Journal of Magnetic Resonance Imaging 32, 544–550. https://doi.org/10.1002/jmri.22250

      Kasper, L., Engel, M., Barmet, C., Haeberlin, M., Wilm, B.J., Dietrich, B.E., Schmid, T., Gross, S., Brunner, D.O., Stephan, K.E., Pruessmann, K.P., 2018. Rapid anatomical brain imaging using spiral acquisition and an expanded signal model. NeuroImage 168, 88–100. https://doi.org/10.1016/j.neuroimage.2017.07.062

      Klepaczko, A., Szczypiński, P., Deistung, A., Reichenbach, J.R., Materka, A., 2016. Simulation of MR angiography imaging for validation of cerebral arteries segmentation algorithms. Computer Methods and Programs in Biomedicine 137, 293–309. https://doi.org/10.1016/j.cmpb.2016.09.020

      Kobari, M., Gotoh, F., Fukuuchi, Y., Tanaka, K., Suzuki, N., Uematsu, D., 1984. Blood Flow Velocity in the Pial Arteries of Cats, with Particular Reference to the Vessel Diameter. J Cereb Blood Flow Metab 4, 110–114. https://doi.org/10.1038/jcbfm.1984.15

      Ladd, M.E., 2007. High-Field-Strength Magnetic Resonance: Potential and Limits. Top Magn Reson Imaging 18, 139–152.

      Lesage, D., Angelini, E.D., Bloch, I., Funka-Lea, G., 2009. A review of 3D vessel lumen segmentation techniques: Models, features and extraction schemes. Medical Image Analysis 13, 819–845. https://doi.org/10.1016/j.media.2009.07.011

      Maderwald, S., Ladd, S.C., Gizewski, E.R., Kraff, O., Theysohn, J.M., Wicklow, K., Moenninghoff, C., Wanke, I., Ladd, M.E., Quick, H.H., 2008. To TOF or not to TOF: strategies for non-contrast-enhanced intracranial MRA at 7 T. Magn Reson Mater Phy 21, 159. https://doi.org/10.1007/s10334-007-0096-9

      Manjón, J.V., Coupé, P., Martí‐Bonmatí, L., Collins, D.L., Robles, M., 2010. Adaptive non-local means denoising of MR images with spatially varying noise levels. Journal of Magnetic Resonance Imaging 31, 192–203. https://doi.org/10.1002/jmri.22003

      Mansfield, P., Harvey, P.R., 1993. Limits to neural stimulation in echo-planar imaging. Magn. Reson. Med. 29, 746–758. https://doi.org/10.1002/mrm.1910290606

      Masaryk, T.J., Modic, M.T., Ross, J.S., Ruggieri, P.M., Laub, G.A., Lenz, G.W., Haacke, E.M., Selman, W.R., Wiznitzer, M., Harik, S.I., 1989. Intracranial circulation: preliminary clinical results with three-dimensional (volume) MR angiography. Radiology 171, 793–799. https://doi.org/10.1148/radiology.171.3.2717754

      Mattern, H., Sciarra, A., Godenschweger, F., Stucht, D., Lüsebrink, F., Rose, G., Speck, O., 2018. Prospective motion correction enables highest resolution time-of-flight angiography at 7T: Prospectively Motion-Corrected TOF Angiography at 7T. Magn. Reson. Med 80, 248–258. https://doi.org/10.1002/mrm.27033

      Mattern, H., Sciarra, A., Lüsebrink, F., Acosta‐Cabronero, J., Speck, O., 2019. Prospective motion correction improves high‐resolution quantitative susceptibility mapping at 7T. Magn. Reson. Med 81, 1605–1619. https://doi.org/10.1002/mrm.27509

      Mennes, M., Jenkinson, M., Valabregue, R., Buitelaar, J.K., Beckmann, C., Smith, S., 2014. Optimizing full-brain coverage in human brain MRI through population distributions of brain size. NeuroImage 98, 513–520. https://doi.org/10.1016/j.neuroimage.2014.04.030

      Moccia, S., De Momi, E., El Hadji, S., Mattos, L.S., 2018. Blood vessel segmentation algorithms — Review of methods, datasets and evaluation metrics. Computer Methods and Programs in Biomedicine 158, 71–91. https://doi.org/10.1016/j.cmpb.2018.02.001

      Mustafa, M.A.R., 2016. A data-driven learning approach to image registration.

      Mut, F., Wright, S., Ascoli, G.A., Cebral, J.R., 2014. Morphometric, geographic, and territorial characterization of brain arterial trees. International Journal for Numerical Methods in Biomedical Engineering 30, 755–766. https://doi.org/10.1002/cnm.2627

      Nagaoka, T., Yoshida, A., 2006. Noninvasive Evaluation of Wall Shear Stress on Retinal Microcirculation in Humans. Invest. Ophthalmol. Vis. Sci. 47, 1113. https://doi.org/10.1167/iovs.05-0218

      Nishimura, D.G., Irarrazabal, P., Meyer, C.H., 1995. A Velocity k-Space Analysis of Flow Effects in Echo-Planar and Spiral Imaging. Magnetic Resonance in Medicine 33, 549–556. https://doi.org/10.1002/mrm.1910330414

      Nishimura, D.G., Jackson, J.I., Pauly, J.M., 1991. On the nature and reduction of the displacement artifact in flow images. Magnetic Resonance in Medicine 22, 481–492. https://doi.org/10.1002/mrm.1910220255

      Nonaka, H., Akima, M., Hatori, T., Nagayama, T., Zhang, Z., Ihara, F., 2003. Microvasculature of the human cerebral white matter: Arteries of the deep white matter. Neuropathology 23, 111–118. https://doi.org/10.1046/j.1440-1789.2003.00486.x

      North, D.O., 1963. An Analysis of the factors which determine signal/noise discrimination in pulsed-carrier systems. Proceedings of the IEEE 51, 1016–1027. https://doi.org/10.1109/PROC.1963.2383

      Park, C.S., Hartung, G., Alaraj, A., Du, X., Charbel, F.T., Linninger, A.A., 2020. Quantification of blood flow patterns in the cerebral arterial circulation of individual (human) subjects. Int J Numer Meth Biomed Engng 36. https://doi.org/10.1002/cnm.3288

      Parker, D.L., Goodrich, K.C., Roberts, J.A., Chapman, B.E., Jeong, E.-K., Kim, S.-E., Tsuruda, J.S., Katzman, G.L., 2003. The need for phase-encoding flow compensation in high-resolution intracranial magnetic resonance angiography. J. Magn. Reson. Imaging 18, 121–127. https://doi.org/10.1002/jmri.10322

      Parker, D.L., Yuan, C., Blatter, D.D., 1991. MR angiography by multiple thin slab 3D acquisition. Magn. Reson. Med. 17, 434–451. https://doi.org/10.1002/mrm.1910170215

      Pauling, L., Coryell, C.D., 1936. The magnetic properties and structure of hemoglobin, oxyhemoglobin and carbonmonoxyhemoglobin. Proceedings of the National Academy of Sciences 22, 210–216. https://doi.org/10.1073/pnas.22.4.210

      Payne, S.J., 2017. Cerebral Blood Flow And Metabolism: A Quantitative Approach. World Scientific.

      Peters, A.M., Brookes, M.J., Hoogenraad, F.G., Gowland, P.A., Francis, S.T., Morris, P.G., Bowtell, R., 2007. T2* measurements in human brain at 1.5, 3 and 7 T. Magnetic Resonance Imaging 25, 748–753. https://doi.org/10.1016/j.mri.2007.02.014

      Pfeifer, R.A., 1930. Grundlegende Untersuchungen für die Angioarchitektonik des menschlichen Gehirns. Berlin: Julius Springer.

      Phellan, R., Forkert, N.D., 2017. Comparison of vessel enhancement algorithms applied to time-of-flight MRA images for cerebrovascular segmentation. Medical Physics 44, 5901–5915. https://doi.org/10.1002/mp.12560

      Pohmann, R., Speck, O., Scheffler, K., 2016. Signal-to-Noise Ratio and MR Tissue Parameters in Human Brain Imaging at 3, 7, and 9.4 Tesla Using Current Receive Coil Arrays. Magn. Reson. Med. 75, 801–809. https://doi.org/10.1002/mrm.25677

      Reichenbach, J.R., Venkatesan, R., Schillinger, D.J., Kido, D.K., Haacke, E.M., 1997. Small vessels in the human brain: MR venography with deoxyhemoglobin as an intrinsic contrast agent. Radiology 204, 272–277. https://doi.org/10.1148/radiology.204.1.9205259

      Schmid, F., Barrett, M.J.P., Jenny, P., Weber, B., 2019. Vascular density and distribution in neocortex. NeuroImage 197, 792–805. https://doi.org/10.1016/j.neuroimage.2017.06.046

      Schmitter, S., Bock, M., Johst, S., Auerbach, E.J., Uğurbil, K., Moortele, P.-F.V. de, 2012. Contrast enhancement in TOF cerebral angiography at 7 T using saturation and MT pulses under SAR constraints: Impact of VERSE and sparse pulses. Magnetic Resonance in Medicine 68, 188–197. https://doi.org/10.1002/mrm.23226

      Schulz, J., Boyacioglu, R., Norris, D.G., 2016. Multiband multislab 3D time-of-flight magnetic resonance angiography for reduced acquisition time and improved sensitivity. Magn Reson Med 75, 1662–8. https://doi.org/10.1002/mrm.25774

      Shu, C.Y., Sanganahalli, B.G., Coman, D., Herman, P., Hyder, F., 2016. New horizons in neurometabolic and neurovascular coupling from calibrated fMRI, in: Progress in Brain Research. Elsevier, pp. 99–122. https://doi.org/10.1016/bs.pbr.2016.02.003

      Stamm, A.C., Wright, C.L., Knopp, M.V., Schmalbrock, P., Heverhagen, J.T., 2013. Phase contrast and time-of-flight magnetic resonance angiography of the intracerebral arteries at 1.5, 3 and 7 T. Magnetic Resonance Imaging 31, 545–549. https://doi.org/10.1016/j.mri.2012.10.023

      Stewart, A.W., Robinson, S.D., O’Brien, K., Jin, J., Widhalm, G., Hangel, G., Walls, A., Goodwin, J., Eckstein, K., Tourell, M., Morgan, C., Narayanan, A., Barth, M., Bollmann, S., 2022. QSMxT: Robust masking and artifact reduction for quantitative susceptibility mapping. Magnetic Resonance in Medicine 87, 1289–1300. https://doi.org/10.1002/mrm.29048

      Stucht, D., Danishad, K.A., Schulze, P., Godenschweger, F., Zaitsev, M., Speck, O., 2015. Highest Resolution In Vivo Human Brain MRI Using Prospective Motion Correction. PLoS ONE 10, e0133921. https://doi.org/10.1371/journal.pone.0133921

      Szikla, G., Bouvier, G., Hori, T., Petrov, V., 1977. Angiography of the Human Brain Cortex. Springer Berlin Heidelberg, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-81145-6

      Triantafyllou, C., Polimeni, J.R., Wald, L.L., 2011. Physiological noise and signal-to-noise ratio in fMRI with multi-channel array coils. NeuroImage 55, 597–606. https://doi.org/10.1016/j.neuroimage.2010.11.084

      Tustison, N.J., Avants, B.B., Cook, P.A., Zheng, Y., Egan, A., Yushkevich, P.A., Gee, J.C., 2010. N4ITK: Improved N3 Bias Correction. IEEE Transactions on Medical Imaging 29, 1310–1320. https://doi.org/10.1109/TMI.2010.2046908

      Uludağ, K., Müller-Bierl, B., Uğurbil, K., 2009. An integrative model for neuronal activity-induced signal changes for gradient and spin echo functional imaging. NeuroImage 48, 150–165. https://doi.org/10.1016/j.neuroimage.2009.05.051

      Venkatesan, R., Haacke, E.M., 1997. Role of high resolution in magnetic resonance (MR) imaging: Applications to MR angiography, intracranial T1-weighted imaging, and image interpolation. International Journal of Imaging Systems and Technology 8, 529–543. https://doi.org/10.1002/(SICI)1098-1098(1997)8:6<529::AID-IMA5>3.0.CO;2-C

      von Morze, C., Xu, D., Purcell, D.D., Hess, C.P., Mukherjee, P., Saloner, D., Kelley, D.A.C., Vigneron, D.B., 2007. Intracranial time-of-flight MR angiography at 7T with comparison to 3T. J. Magn. Reson. Imaging 26, 900–904. https://doi.org/10.1002/jmri.21097

      Ward, P.G.D., Ferris, N.J., Raniga, P., Dowe, D.L., Ng, A.C.L., Barnes, D.G., Egan, G.F., 2018. Combining images and anatomical knowledge to improve automated vein segmentation in MRI. NeuroImage 165, 294–305. https://doi.org/10.1016/j.neuroimage.2017.10.049

      Wilms, G., Bosmans, H., Demaerel, Ph., Marchal, G., 2001. Magnetic resonance angiography of the intracranial vessels. European Journal of Radiology 38, 10–18. https://doi.org/10.1016/S0720-048X(01)00285-6

      Wright, S.N., Kochunov, P., Mut, F., Bergamino, M., Brown, K.M., Mazziotta, J.C., Toga, A.W., Cebral, J.R., Ascoli, G.A., 2013. Digital reconstruction and morphometric analysis of human brain arterial vasculature from magnetic resonance angiography. NeuroImage 82, 170–181. https://doi.org/10.1016/j.neuroimage.2013.05.089

      Yushkevich, P.A., Piven, J., Hazlett, H.C., Smith, R.G., Ho, S., Gee, J.C., Gerig, G., 2006. User-guided 3D active contour segmentation of anatomical structures: Significantly improved efficiency and reliability. NeuroImage 31, 1116–1128. https://doi.org/10.1016/j.neuroimage.2006.01.015

      Zhang, Z., Deng, X., Weng, D., An, J., Zuo, Z., Wang, B., Wei, N., Zhao, J., Xue, R., 2015. Segmented TOF at 7T MRI: Technique and clinical applications. Magnetic Resonance Imaging 33, 1043–1050. https://doi.org/10.1016/j.mri.2015.07.002

      Zhao, J.M., Clingman, C.S., Närväinen, M.J., Kauppinen, R.A., van Zijl, P.C.M., 2007. Oxygenation and hematocrit dependence of transverse relaxation rates of blood at 3T. Magn. Reson. Med. 58, 592–597. https://doi.org/10.1002/mrm.21342

      Zhu, X., Tomanek, B., Sharp, J., 2013. A pixel is an artifact: On the necessity of zero-filling in fourier imaging. Concepts Magn. Reson. 42A, 32–44. https://doi.org/10.1002/cmr.a.21256

    1. Author Response

      Reviewer #2 (Public Review):

      I have only one concern with the study. I am not fully convinced that the disruption of behavioral updating is specifically due to NA signaling within OFC. In the first two studies, they observed non-specific anatomical effect likely due to the ablation of fibers of passage through OFC. The DREADD experiment is claimed to allay this concern. However, the DCZ was injected systemically. This means that any collaterals of LC NA neurons outside OFC will also be suppressed. While the lack of effect with the mPFC projection is interesting, this does not preclude an effect mediated in other target regions. Overall, I believe that none of the experiments truly demonstrate a specific effect of NA in OFC. A few experimental options that could be considered are injection of DCZ directly in OFC, optogenetic inhibition of fibers in OFC, or pharmacological disruption of NA signaling in OFC.

      The other options are to measure the effect of the toxin ablations from experiments 1 and 2 not just in mPFC but in other regions. If the non-specific effect is truly only in mPFC outside of OFC, that would lead to more confidence that mPFC projection is the only other viable pathway mediating the effect.

      As requested, we have quantified the effect of toxin ablations in neighbouring cortical regions known to be involved in goal-directed behavior, namely the insular cortex (IC, e.g., Balleine & Dickinson, 2000; Parkes & Balleine, 2013), the medial orbitofrontal cortex (MO, e.g., Bradfield et al., 2015; Gourley et al., 2016) and the secondary motor cortex (M2, Gremel et al., 2016). Briefly, we found that injection of the saporin toxin in the VO and LO (Experiment 1) led to a significant decrease in NA fiber density in all examined regions. Injection of 6-OHDA also produced significant loss of NA fibres in MO and M2 but not in the insular cortex. These results are presented in Suppl. Figures 1 and 3 (pages 28 and 30) and the statistics are reported in the main text (page 6 and page 11).

      We have also added the following to our discussion on the reason for the off-target depletions that we observed and acknowledged the potential role of collateral LC neurons:

      Page 21, line starting 374: “The use of the saporin toxin led to a dramatic decrease of NA fiber density in all analysed cortical areas (Suppl Fig 1). This may be due to diffusion of the toxin from the injection site, the existence of collateral LC neurons and/or fibers passing through the ventral portion of the OFC but targeting other cortical areas (Cerpa et al 2019). However, injection of 6-OHDA led to much less offsite NA depletion, suggesting that a large part of the previous observation is toxin-specific. Indeed, no significant loss of NA fibers was visible in the insular cortex, which has been previously implicated in goal-directed behaviour (Balleine & Dickinson, 2000; Parkes et al., 2013; 2015; 2017). We did nevertheless observe an offsite depletion in more proximal prefrontal areas (prelimbic and medial orbitofrontal cortices), albeit a more modest depletion than what was observed using the saporin toxin. Several studies have described the projection pattern of LC cells. These studies, using various techniques, indicate that LC cells mainly target a single region, and that only a small proportion of LC neurons collateralize to minor targets (Plummer et al., 2020, Kebschull et al 2016, Uematsu et al 2017, Chandler et al 2014). Therefore, even if the OFC noradrenergic innervation is presumably specific (Chandler et al 2013), we cannot rule out a possible collateralization of some neurons toward neighbouring prefrontal areas (PL and MO). We have previously discussed that the posterior ventral portion of the OFC is an entry point for LC fibers en passant, which ultimately target other prefrontal areas (Cerpa et al 2019).

      To achieve greater anatomical selectivity, we used a CAV-2 vector carrying the noradrenergic promoter PRS to target either the LC:A32 or the LC:OFC pathways (Hayat et al., 2020; Hirschberg et al., 2017). It has been shown that the CAV-2 vector can infect axons-of-passage; however, the vector does not spread more than 200 µm from the injection site (Schwarz et al 2015). Therefore, when targeting the OFC we injected anteriorly to the level where the highest density of fibers of passage is expected (Cerpa et al 2019) in order to minimize infection of such fibers and restrict inhibition to our pathway of interest.

      Overall, the current behavioural results are in line with our previous work showing that the ability to associate new outcomes to previously acquired actions is impaired following chemogenetic inhibition of the VO and LO (Parkes et al., 2018) or disconnection of the VO and LO from the submedius thalamic nucleus (Fresno et al 2019). These results point to a necessary role of the ventral and lateral parts of the OFC and its noradrenergic innervation for updating A-O associations. However, it is worth mentioning that different subregions of the OFC, both along the medio-lateral and antero-posterior axes of OFC, display clear functional heterogeneities (Dalton et al 2016, Izquierdo 2017, Panayi & Killcross, 2018, Bradfield et al 2018, Barreiros et al 2021). Therefore, while we have previously focused on the anatomical heterogeneity of the noradrenergic innervation in these prefrontal subregions (Cerpa et al 2019), a thorough characterization of its functional role in each of these subregions still needs to be addressed.”

      One last concern is that the lack of the effect due to disruption of the mPFC projection is not guaranteed to not be from experimental issues. If the authors have some evidence that the mPFC projection disruption produced some other behavioral effect, that would make the lack of effect in this case more convincing.

      Unfortunately, we do not provide evidence in the current paper that disrupting the LC:mPFC (now termed LC:A32 in the current study, based on the recommendation of reviewer 1) projection produces some other behavioural effect. However, in an on-going series of experiments, using the same tools as the current study, we found that inhibiting the LC:A32, but not the LC:OFC, pathway impairs Pavlovian contingency degradation, as shown in the figure below. We therefore believe that the failure of LC:mPFC pathway inhibition to affect outcome identity reversal in the present study is not due to experimental issues. Please note that in the figure below mPFC is referred to as area 32 (A32), as requested by reviewer 1.

      Figure 1. A) Experimental timeline for the Pavlovian contingency degradation procedure. Prior to behavioural training, rats were injected with CAV2-PRS-hM4D-mCherry into either the vlOFC or area 32 (A32). Number of food port entries during the non-degraded CS and degraded CS for rats injected with vehicle and rats injected with DCZ during degradation training (B, D) and the test in extinction (C, E). Inhibition of the LC:vlOFC had no effect on Pavlovian contingency degradation, whereas inhibition of LC:A32 during degradation training rendered rats insensitive to the change in the causal relationship between the CS and the US.

      Reviewer #3 (Public Review):

      I would be curious about the authors' thoughts regarding the recent Duan ... Robbins Neuron paper (https://pubmed.ncbi.nlm.nih.gov/34171290/), in which marmosets displayed paradoxical responses to VLO inactivation and stimulation in contingency degradation tasks. Are there ways to reconcile these reports?

      We previously argued that the updating processes underlying changes in causal contingency versus outcome identity may be supported by different prefrontal regions (Cerpa et al., 2021, Behav Neurosci). Unfortunately, the tasks used in the current study do not allow us to test if our rats are sensitive to changes in the action-outcome contingency. In fact, the effect of inactivation (or overactivation) of the ventral and lateral regions of OFC on an instrumental contingency degradation task similar to that used in Duan et al (2022) has not yet been examined in rats.

      Indeed, while it is stated in Duan et al (2022) that rats with lesions of lateral OFC are insensitive to contingency degradation, none of the citations provided support this conclusion (Balleine & Dickinson, 1998; Corbit & Balleine, 2003; Ostlund and Balleine, 2007; Yin et al., 2005). Balleine and Dickinson (1998) assessed the effect of prelimbic and insular cortex lesions (insular anteroposterior coordinate +1.2), with only the former affecting instrumental contingency degradation. Ostlund and Balleine (2007) assessed the effect of orbitofrontal lesions on Pavlovian contingency degradation (degradation of the S-O contingency), not instrumental contingency degradation. Finally, Corbit and Balleine (2003) and Yin et al (2005) assessed the effect of prelimbic and dorsomedial striatum lesions, respectively. Nevertheless, there are some reports on the effect of chemogenetic inhibition of VO/LO on degradation in a nose-poke response task, but the results are conflicting (e.g., Whyte et al., 2019; Zimmerman et al., 2017; 2018). It would be very interesting to study the impact of both inactivation and overactivation of VO and LO in rats to compare with the results found in marmosets, using comparable tasks.

      We have added the following to our discussion, which cites Duan et al (2022) and the need to better understand the role of VO and LO in contingency degradation.

      Page 24, line starting 450: “However, it is not yet clear if the NA-OFC system is also involved in detecting the causal relationship between an action and its outcome (see Cerpa et al., 2021 for a discussion). Some have reported impaired adaptation to contingency changes following inhibition of VO and LO or BDNF-knockdown in these regions (Whyte et al., 2019; Zimmerman et al., 2017), while another study shows that inhibition of VO/LO leaves sensitivity to degradation intact, at least during an initial test (Zimmerman et al., 2018). Interestingly, a recent paper in marmosets demonstrates that inactivation of anterior OFC (area 11) improves instrumental contingency degradation, whereas overactivation impairs degradation (Duan et al., 2022). The potential role of the rodent ventral and lateral regions of OFC, and the NA innervation of OFC, in adapting to degradation of instrumental contingencies requires further investigation.”

    1. Author Response:

      Reviewer #1 (Public Review):

      There is growing precedent for the utility of GWAS-type analyses in elucidating otherwise cryptic genotypic associations with specific Mtb phenotypes, most commonly drug resistance. This study represents the latest instalment of this type of approach, utilizing a large set of WGS data from clinical Mtb isolates and refining the search for DR-associated alleles by restricting the set to those predicted (or known) to be phenotypically DR. This revealed a number of potential candidate mutations, including some in nucleotide excision repair (uvrA, uvrB), in base excision repair (mutY), and homologous recombination (recF). In validating these leads with functional assays, the authors present evidence supporting the impact of the identified mutations on antibiotic susceptibility in vitro and in macrophage and animal infection models. These results extend the number of candidate mutations associated with Mtb drug resistance; however, the following must be considered:

      (i) The GWAS analysis is the basis of this study, yet the description of the approach used and presentation of results obtained is occasionally obscure; for example, the authors report the use of known drug resistance phenotypes (where available) or inferences of drug-resistance from genotypic data to enhance the potential to identify other mutations that might be implicated in enabling the DR mutations, yet their list of known DR mutations seems to consist predominantly of rare or unusual mutations, not those commonly associated with clinical DR-TB. In addition, the distribution of the identified resistance-associated mutations across the different lineages needs to be explained more clearly.

      In the revised manuscript, we have performed a phylogenetic analysis of the strains used. A phylogenetic tree was generated using Mycobacterium canettii as an outgroup (Figure 1b). The phylogenetic analysis shows the strains clustering in lineages 1, 2, 3, and 4. Lineages 2, 3, and 4 cluster together, and lineage 1 is monophyletic, as reported previously. The genome sequence data of 2773 clinical strains were downloaded from NCBI. These strains were also part of the GWAS analyses performed by Coll et al. (https://pubmed.ncbi.nlm.nih.gov/29358649/) and Manson et al. (https://pubmed.ncbi.nlm.nih.gov/28092681/). The phenotypes of the strains used for the association analysis were reported in the previous studies. We have not performed other predictions. The supplementary tables provide the lineage origin of each strain used in the study (Supplementary File 1 & 2). The distributions of resistance-associated mutations in different strains are shown (Figure 2-figure supplement 6a-h). As suggested, we have performed an analysis wherein we looked for direct target mutations in the strains that harbor mutations in the DNA repair genes (Figure 2-figure supplement 6i-k).

      We identified mostly rare mutations for the following reasons:

      1. For association mapping, we looked for mutations that were present only in the multidrug-resistant strains and not in the susceptible strains. This strategy yielded mainly variants associated with the multidrug-resistant phenotype.

      2. We used a Mixed Linear Model (MLM) for the association analysis. MLM removes all the population-specific SNPs based on PCA and kinship corrections. The false discovery rate (FDR) adjusted p-values in the GAPIT software are stringent because the effect of each marker is corrected for population structure (Q) as well as kinship (K). Therefore, the probability of identifying a false-positive SNP is very low. We combined this with Bonferroni correction to identify markers associated with the drug-resistant phenotype.
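
      The two multiple-testing corrections mentioned in point 2 can be sketched as follows. This is a minimal illustration, not the GAPIT implementation: it assumes the FDR adjustment is of the Benjamini-Hochberg type and that "combining" the corrections means a marker must pass both thresholds. The function names, the p-values and the 0.05 cutoff are hypothetical.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five SNPs (illustrative only, not study data).
p_vals = [0.0001, 0.004, 0.03, 0.2, 0.9]
bonf = bonferroni(p_vals)
fdr = benjamini_hochberg(p_vals)

# Assumed rule: keep a marker only if it passes both corrections at 0.05.
significant = [i for i in range(len(p_vals)) if bonf[i] < 0.05 and fdr[i] < 0.05]
print(significant)
```

      Requiring both thresholds is the more conservative choice: the third SNP in this toy example sits at the FDR boundary but clearly fails the Bonferroni threshold, so it is dropped.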

      (ii) By combining target gene deletions with different complementation alleles, the authors provide compelling microbiological evidence supporting the inferred role of the mutY and uvrB mutations in enhanced survival under antibiotic treatment. The experimental work, however, is limited to assessments of competitive survival in various models, with/without antibiotic selection, or to mutant frequency analyses; there is no direct evidence provided in support of the proposed mechanism.

      To ascertain if the better survival of RvΔmutY or RvΔmutY::mutY-R262Q is indeed due to the acquisition of mutations in the direct targets of antibiotics, we performed WGS of the strains from the ex vivo evolution experiment (Figure 5). Genomic DNA extracted from ten independent colonies (grown in vitro) was mixed in equal proportions before library preparation. Only those SNPs present in >20% of reads were retained for the analysis. Analysis of Rv sequences grown in vitro suggested that the laboratory strain has accumulated 100 SNPs compared with the reference strain. The sequence of the Rv laboratory strain was used as the reference for the subsequent analysis. WGS data for RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q strains grown in vitro did not show the presence of a mutation in the antibiotic target genes. In a similar vein, ten independent colonies, each from the 7H11-OADC plates, after the final round of ex vivo selection in the presence or absence of antibiotics, were selected for WGS. The data indicated that in the absence of antibiotics, no direct target mutations were identified in the ex vivo passaged strains (Figure 6a & e). In the presence of isoniazid, we found mutations in katG (Ser315Thr or Ser315Ile) in Rv and RvΔmutY but not in RvΔmutY::mutY and RvΔmutY::mutY-R262Q (Figure 6b & e). These findings are in congruence with the ex vivo evolution CFU analysis, wherein we did not observe a significant increase in the survival of RvΔmutY and RvΔmutY::mutY-R262Q in the presence of isoniazid (Figure 5). In the presence of ciprofloxacin and rifampicin, direct target mutations were identified in gyrA and rpoB (Figure 6c-e). Asp94Glu/Asp94Gly mutations were identified in gyrA, and His445Tyr/Ser450Leu mutations were identified in rpoB, of RvΔmutY and RvΔmutY::mutY-R262Q, respectively. No direct target mutations were identified in Rv and RvΔmutY::mutY, suggesting that the perturbed DNA repair aids in acquiring the drug resistance-conferring mutations in Mtb (Figure 6c-e & Supplementary File 8).
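
      The filtering step described above (retain SNPs supported by more than 20% of reads, then ask whether any fall in a direct target gene) can be sketched as below. This is only an illustration under stated assumptions: the `Variant` record, the function names and the read counts are hypothetical, and the actual variant-calling pipeline is not described in this response.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    gene: str
    change: str
    alt_reads: int    # reads supporting the variant allele
    total_reads: int  # total read depth at the position


# Genes in which the response reports direct target mutations.
TARGET_GENES = {"katG", "gyrA", "rpoB"}


def retained(variants: List[Variant], min_fraction: float = 0.20) -> List[Variant]:
    """Keep variants whose supporting-read fraction exceeds min_fraction."""
    return [
        v for v in variants
        if v.total_reads > 0 and v.alt_reads / v.total_reads > min_fraction
    ]


def direct_target_hits(variants: List[Variant]) -> List[Variant]:
    """Return the variants that fall in one of the antibiotic target genes."""
    return [v for v in variants if v.gene in TARGET_GENES]


# Hypothetical calls from a pooled-colony library (illustrative numbers only).
calls = [
    Variant("gyrA", "Asp94Gly", 45, 100),
    Variant("rpoB", "Ser450Leu", 15, 100),       # below the 20% cutoff
    Variant("cobQ1", "hypothetical", 60, 100),   # not a target gene
]
hits = direct_target_hits(retained(calls))
print([f"{v.gene} {v.change}" for v in hits])
```

      Because DNA from ten colonies is pooled in equal proportions, a variant carried by a single colony would contribute roughly 10% of reads, so a cutoff above 20% in effect asks for support from about three or more colonies.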

      To determine if the better survival of RvΔmutY or RvΔmutY::mutY-R262Q in the guinea pig infection experiment (Figure 8) is due to the accumulation of mutations in the host, we performed WGS of the strains isolated from guinea pig lungs. The analysis revealed that specific genes such as cobQ1, smc, espI, and valS were mutated only in RvΔmutY and RvΔmutY::mutY-R262Q but not in Rv and RvΔmutY::mutY. In addition, tcrA and gatA were mutated only in RvΔmutY, whereas rv0746 was mutated exclusively in RvΔmutY::mutY (Figure 8-Figure Supplement 2). However, we did not observe any direct target mutations; this may be because the guinea pigs were not subjected to antibiotic treatment. The data suggest that continued long-term selection pressure is necessary for bacilli to acquire mutations.

      (iii) The low drug concentrations used (especially of rifampicin against M. smegmatis) suggest the identified mutations confer low-level resistance to multiple antimycobacterial agents - in turn implying tolerance rather than resistance. If correct, it would be interesting to know how broadly tolerant strains containing these mutations are; that is, whether susceptibility is decreased to a broad range of antibiotics with different mechanisms of action (including both cidal and static agents), and whether the extent of the decrease can be determined quantitatively (for example, as a change in MIC value).

      To evaluate the effect of different drugs on the survival of RvΔmutY or RvΔmutY::mutY-R262Q, we performed killing kinetics in the presence and absence of isoniazid, rifampicin, ciprofloxacin, and ethambutol (Figure 4a). In the absence of antibiotics, the growth kinetics of Rv, RvΔmutY, RvΔmutY::mutY, and RvΔmutY::mutY-R262Q were similar (Figure 4b). In the presence of isoniazid, a ~2 log-fold decrease in bacterial survival was observed on day 3 in Rv and RvΔmutY::mutY; however, in RvΔmutY and RvΔmutY::mutY-R262Q, the decrease was limited to ~1.5 log-fold (Figure 4c). A similar trend was apparent on days 6 and 9, suggesting a ~5-fold increase in the survival of RvΔmutY and RvΔmutY::mutY-R262Q compared with Rv and RvΔmutY::mutY (Figure 4c). Interestingly, in the presence of ethambutol, we did not observe any significant difference (Figure 4d). In the presence of rifampicin and ciprofloxacin, we observed a ~10-fold increase in the survival of RvΔmutY and RvΔmutY::mutY-R262Q compared with Rv and RvΔmutY::mutY (Figure 4e-f). Thus, the results suggest that the absence of mutY or the presence of the mutY variant aids in subverting the antibiotic stress.
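
      The log-fold and fold values quoted in this paragraph are related by a power of ten. The short sketch below only illustrates that arithmetic with the rounded values from the text; the function names are hypothetical, and the underlying CFU counts are not reproduced here.

```python
import math


def fold_from_log10_difference(log10_diff: float) -> float:
    """Convert a difference in log10 CFU reduction into a fold difference."""
    return 10.0 ** log10_diff


def log10_from_fold(fold: float) -> float:
    """Convert a fold difference in survival into log10 units."""
    return math.log10(fold)


# Day 3, isoniazid: ~2 log10 reduction versus ~1.5 log10 reduction.
day3_fold = fold_from_log10_difference(2.0 - 1.5)

# The ~5-fold and ~10-fold survival differences expressed in log10 units.
five_fold_in_logs = log10_from_fold(5.0)
ten_fold_in_logs = log10_from_fold(10.0)

print(round(day3_fold, 2), round(five_fold_in_logs, 2), ten_fold_in_logs)
```

      On this scale a 0.5 log10 gap corresponds to roughly a 3-fold difference in survival, a 5-fold difference to about 0.7 log10 units, and a 10-fold difference to exactly 1 log10 unit.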

      Reviewer #2 (Public Review):

      This interesting manuscript uses a collection of whole genome sequences of TB isolates to associate specific sequence polymorphisms with MDR/XDR strains, and having found certain mutations in DNA repair pathways, does a detailed analysis of several mutations. The evaluation of the MutY polymorphism reveals that it is a loss-of-function allele and that TB strains carrying this mutation have a higher mutation frequency and enhanced survival in serial passage in macrophages. The strengths of the manuscript are the leveraging of a large sequence dataset to derive interesting candidate mutations in DNA repair pathways and the demonstration that at least one of these mutations has a detectable effect on mutagenicity and pathogenesis. The main weakness of the manuscript is a lack of experimental exploration of the mechanism by which loss of a DNA repair pathway would enhance survival in vivo. The model presented is that these phenotypes are due to hypermutagenicity and thereby evolution of enhanced pathogenesis, but this is not actually directly tested or investigated. There are also some technical concerns about some of the experimental data, which could be strengthened.

      This paper presents the following data:

      • Analyzed whole-genome sequences of 2773 clinical strains: 160 000 SNPs identified
      • 1815 drug-susceptible/422 MDR/XDR strains: 188 mutations correlated with drug resistance.
      • Novel mutations associated with the drug resistance have been found in base excision repair (BER), nucleotide excision repair (NER), and homologous recombination (HR) pathway genes (mutY, uvrA, uvrB, and recF).
      • Specific mutations mutY-R262Q and uvrB-A524V were studied.
      • mutY-R262Q and uvrB-A524V mutations behave as loss of function alleles in vivo, as measured by non-complementation of the increased mutation frequency measured by resistance to Rif and INH.
      • The mutY deletion and the mutY-R262Q mutation increase Mtb survival over WT in macrophages when Mtb has not been submitted to previous rounds of macrophage infection.
      • This advantage is exacerbated in presence of antibiotic (Rif and Cipro but not INH).
      • The MutY deletion and the MutY-R262Q mutation result in an enhanced survival of Mtb during guinea pig infection.

      Major issues:

      The finding that mutations in MutY confer an advantage during macrophage infection is convincing based on the macrophage experiments, but it is premature to conclude that the mechanism of this effect is due to hypermutagenesis and selection of fitter bacterial clones. It is described in E. coli (Foti et al., 2012) and recently in mycobacteria (Dupuy et al., 2020) that the MutY/MutM excision pathways can increase the lethality of antibiotic treatment because of double-strand breaks caused by Adenine/oxoG excisions. The higher survival of the mutY mutant during antibiotic treatment could instead be due to lower Adenine/oxoG excision in the mutant rather than acquisition of advantageous mutations, or some other mechanism. The same hypothesis cannot be excluded for the guinea pig experiments (no antibiotics, but oxidative stress mediated by host defenses could also increase oxoG) and should at least be discussed. Experiments that would support the idea that the in vivo advantage is due to hypermutagenesis would be whole genome sequencing of the output vs input populations to directly document increased mutagenesis. Similarly, is the ΔmutY survival advantage after rounds of macrophage infections dependent on the macrophage environment? What happens if the ΔmutY strain is cultivated in vitro in 7H9 (same number of generations) before infecting macrophages?

      We thank the reviewer for the insightful comments. To ascertain whether the better survival of RvDmutY or RvDmutY::mutY-R262Q is indeed due to the acquisition of mutations in the direct targets of antibiotics, we performed WGS of the strains from the ex vivo evolution experiment (Figure 5). Genomic DNA extracted from ten independent colonies (grown in vitro) was mixed in equal proportion prior to library preparation. For the analysis, only those SNPs that were present in >20% of reads were retained. Analysis of Rv sequences grown in vitro suggested that the laboratory strain has accumulated 100 SNPs compared with the reference strain. The sequence of the Rv laboratory strain was used as the reference for the subsequent analysis. WGS data for RvDmutY, RvDmutY::mutY, and RvDmutY::mutY-R262Q strains grown in vitro did not show the presence of mutations in the antibiotic target genes. In a similar vein, ten independent colonies, each from the 7H11-OADC plates, after the final round of ex vivo selection in the presence or absence of antibiotics, were selected for WGS. Data indicated that in the absence of antibiotics, no direct target mutations were identified in the ex vivo passaged strains (Figure 6a & e). In the presence of isoniazid, we found mutations in katG (Ser315Thr or Ser315Ile) in Rv and RvDmutY but not in RvDmutY::mutY and RvDmutY::mutY-R262Q (Figure 6b & e). These findings are in congruence with the ex vivo evolution CFU analysis, wherein we did not observe a significant increase in the survival of RvDmutY and RvDmutY::mutY-R262Q in the presence of isoniazid (Figure 5). In the presence of ciprofloxacin and rifampicin, direct target mutations were identified in gyrA and rpoB (Figure 6c-e). Asp94Glu/Asp94Gly mutations were identified in gyrA, and His445Tyr/Ser450Leu mutations were identified in rpoB, of RvDmutY and RvDmutY::mutY-R262Q, respectively. No direct target mutations were identified in Rv and RvDmutY::mutY, suggesting that perturbed DNA repair aids in acquiring drug resistance-conferring mutations in Mtb (Figure 6c-e & Supplementary File 8).
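
      The read-fraction cutoff described above can be sketched as follows. This is a minimal illustration only: the `SnpCall` record, its field names, and the example positions and read counts are hypothetical and do not come from the authors' pipeline; only the ">20% of reads" rule is taken from the text.

```python
from typing import List, NamedTuple


class SnpCall(NamedTuple):
    """Hypothetical record for one candidate SNP in a pooled sample."""
    position: int
    ref: str
    alt: str
    alt_reads: int
    total_reads: int


def filter_snps(calls: List[SnpCall], min_fraction: float = 0.20) -> List[SnpCall]:
    """Keep SNPs whose alternate allele is supported by more than min_fraction of reads."""
    kept = []
    for call in calls:
        if call.total_reads == 0:
            continue  # no coverage, nothing to evaluate
        if call.alt_reads / call.total_reads > min_fraction:
            kept.append(call)
    return kept


# Made-up example calls (positions and counts are placeholders).
calls = [
    SnpCall(1000, "C", "T", 5, 100),   # 5% of reads: dropped
    SnpCall(2000, "G", "A", 20, 100),  # exactly 20%: dropped, the cutoff is strict
    SnpCall(3000, "C", "G", 45, 100),  # 45% of reads: kept
]
print([c.position for c in filter_snps(calls)])
```

      One consequence worth keeping in mind, assuming the ten colonies contribute equal amounts of DNA: a mutation present in only one colony would account for about 10% of reads and would not pass this cutoff.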

      To determine whether the better survival of RvDmutY or RvDmutY::mutY-R262Q in the guinea pig infection experiment (Figure 8) is due to the accumulation of mutations in the host, we performed WGS of the strains isolated from guinea pig lungs. Analysis revealed that specific genes such as cobQ1, smc, espI, and valS were mutated only in RvDmutY and RvDmutY::mutY-R262Q but not in Rv and RvDmutY::mutY. Besides, tcrA and gatA were mutated only in RvDmutY, whereas rv0746 was mutated exclusively in RvDmutY::mutY (Figure 8-figure supplement 2). However, we did not observe any direct target mutations; this may be because the guinea pigs were not subjected to antibiotic treatment. The data suggest that continued long-term selection pressure is necessary for bacilli to acquire mutations.

      • It would be useful to present more data about the strain relatedness and genome characteristics of the DNA repair mutant strains in the GWAS. For example, the model would suggest that strains carrying DNA repair mutations should have higher SNP load than control strains. Additionally, it would be helpful to know whether the identified DNA repair pathway mutations are from epidemiologically linked strains in the collection to deduce whether these events are arising repeatedly or are a founder effect of a single mutant since for each mutation, the number of strains is small.

      We analyzed the genomes of the clinical strains that possess DNA repair gene mutations to determine the additional polymorphisms. The number of SNPs in the strains harboring DNA repair mutations and in the drug-susceptible strains appears to be similar. The marginal differences, if any, were not statistically significant.

      We agree with the reviewer that these strains might be epidemiologically linked. In the present study, all the strains harboring mutations in mutY belong to lineage 4. We observed that all the mutY mutation-containing strains were either MDR or pre-XDR compared with drug-susceptible strains of the same clade.

      • Some of the mutation frequency, survival and competition data could be strengthened by more experimental replicates. Data Lines 370-372 (mutation frequency), lines 387-388 (Survival of strains ex vivo), line 394 (competition experiment) : "Two biologically independent experiments were performed. Each experiment was performed in technical triplicates. Data represent one of the two biological experiments." Two biological replicates is insufficient for the phenotypes presented and all replicates should be included in the analysis. In addition, the definition of "technical triplicates" should be given, does this mean the same culture sampled in triplicate?

      We thank the reviewer for the comment. We performed at least two independent experiments with biological triplicates (not technical triplicates). We apologize for writing this incorrectly. We have reported data from one independent experiment consisting of at least biological triplicates. For the mutation rate analysis, we performed the experiment using six independent colonies. These points are mentioned in the methods and legends of the revised manuscript.

      • MutY phenotypes. One caveat to the conclusion that the MutY R262Q mutant is nonfunctional is the lack of examination of the expression of the complementing protein. It would be informative to comment on the location of this mutation in relation to the known structures of MutY proteins. Similarly, for the UvrB polymorphism, this null strain has a clear UV sensitivity phenotype in the literature, so a fuller interrogation for UV killing would be informative re: the A524V mutation.

      We have now included the western blot data on both complementation strains (Figure 3-figure supplement 1). We agree with the reviewer that the uvrB null mutant may have UV sensitivity phenotype, but we have not performed the experiment in the present study.

      Reviewer #3 (Public Review):

      STRENGTHS

      • This ambitious study is broad in scope, beginning with a bacterial GWAS study and extending all the way to in vivo guinea pig infection models.

      • Numerous reports have attempted to identify Mtb strains with elevated mutation rates, and the results are conflicting. The present study sets out to thoroughly evaluate one such mutation that may produce a mutator phenotype, mutY-Arg262Gln.

      WEAKNESSES

      • While the authors' follow-up experiments with the mutY-Arg262Gln allele are all consistent with the conclusion that this mutation elevates the mutation rate in Mtb and thus could promote the evolution of drug resistance, further work is needed to unambiguously demonstrate this link.

      • The authors highlight five mutations in genes associated with DNA replication and/or repair from their GWAS analysis:

      o dnaA-Arg233Gln: as the authors note in the Discussion, Hicks et al. associate SNPs in dnaA with low-level isoniazid resistance, as a result of lowered katG expression. Since this is unrelated to their focus on DNA repair genes whose mutation could elevate mutation rates, I would consider removing this allele from the Table.

      As suggested, we have removed the dnaA from Table 3.

      o mutY-Arg262Gln: querying publicly available whole genome sequences of clinical Mtb isolates, this SNP appears to be restricted to lineage 4.3 (L4.3). All of these L4.3 strains appear to be drug-resistant. How many times did the mutY-Arg262Gln mutation evolve in the authors' dataset? If there is evidence of homoplastic evolution, this would strengthen their case. If not, it doesn't mean the authors' findings are incorrect, but it does elevate the risk that this mutation could be a passenger (i.e. not driver) mutation. To address this, the authors could attempt to date when mutY-Arg262Gln arose. If it was before the evolution of drug-resistance conferring alleles in these L4.3 strains, that is consistent with (but not proof of) a driver mutation. If mutY-Arg262Gln arose after, this is much more consistent with a passenger mutation.

      As pointed out by the reviewer, the mutY-Arg262Gln mutation is restricted to lineage 4. We have checked the mutY gene sequence from the strains harboring the mutY-Arg262Gln mutation and from sensitive strains of the same clade. We identified only the reported mutation in the drug-resistant strains, and there was no synonymous mutation that could be used for performing molecular clock analysis. To ascertain whether it is a passenger or a driver mutation, we have performed multiple experiments, which suggest that the identified mutation aids in the acquisition of drug resistance.

      o uvrB-Ala524Val: curiously we don't see this SNP in our dataset of publicly available whole genome sequences of clinical Mtb isolates (~45,000 genomes).

      We have rechecked this SNP in our dataset. This SNP was present in 87 drug-resistant strains that belong to lineage 2.

      o uvrA-Gln135Lys: this SNP also appears to be restricted to lineage 4.3. Same question as for mutY-Arg262Gln.

      As pointed out by the reviewer, the uvrA-Gln135Lys mutation is restricted to lineage 4. We identified only the reported mutation in the drug-resistant strains, and there was no synonymous mutation that could be used for performing molecular clock analysis.

      o recF-Gly269Gly: this is a very common mutation, is it unique to lineage 2.2.1? Same question as for mutY-Arg262Gln.

      The recF-Gly269Gly mutation was present in the lineage 2 strains. Here also, we identified only the reported mutation in the drug-resistant strains, and there was no synonymous mutation that could be used for performing molecular clock analysis.

      • The CRYPTIC consortium recently published a number of preprints on biorxiv detailing very large GWAS studies in Mtb. Did any of these reports also associate drug resistance with mutY? If yes, this should be stated. If not, the potential reasons for this discrepancy should be discussed.

      We have checked the recently published CRYPTIC consortium article (https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.3001721#sec012) for mutY-Arg262Gln. We did not find the mutY-Arg262Gln mutation in their analysis; this is due to the different strains used in the study. However, we identified the recF-Gly269Gly mutation in their dataset.

      • Based on the authors' follow-up studies in vivo, MutY-Arg262Gln is presumed to be a loss-of-function allele. If the authors could convincingly demonstrate this biochemically with recombinant proteins, this would significantly strengthen their case.

      Experiments performed in Msm and Mtb mutant strains suggest that the MutY variant is a loss-of-function allele. We have not performed in vitro assays to confirm this.

      • If the authors are correct and mutY-Arg262Gln strains have elevated mutation rates, presumably there would be evidence of this in the clinical strain sequencing data. Do mutY-Arg262Gln containing strains have elevated C→G or C→A mutations in their genomes? Presumably such strains would also have a higher number of SNPs than closely related strains WT for mutY- is this the case?

      We analyzed the genomes of the clinical strains that possess DNA repair gene mutations to determine the additional polymorphisms. The number of SNPs in the strains harboring DNA repair mutations appears to be higher than in the drug-susceptible strains. We have also looked for C→T and C→G mutations in the same strains. C→T mutations are higher in the strains harboring the mutY variant compared with the susceptible strains (Figure 2-figure supplement 6 l). However, we could not perform statistical analysis as the number of strains that harbor the mutY variant is limited to 8. Thus, the data suggest that, empirically, the strains harboring the mutY variant show more SNPs elsewhere and more C→T mutations. We are not stating these conclusions strongly in the manuscript as the data are not statistically significant.

      • While more work, mutation rates as measured by Luria-Delbruck fluctuation analysis are more accurate than mutation frequencies. I would recommend repeating key experiments by Luria-Delbruck fluctuation analysis. It is also important to report both drug-resistant colony counts and total CFU in these sorts of experiments. Given the clumpy nature of mycobacteria, mutation rates can appear to be artificially elevated due to low total CFU and not an increase in the number of drug-resistant colonies.

      As suggested, we determined the mutation rate in the presence of isoniazid, rifampicin, and ciprofloxacin (Figure 3g-j). The fold increase in the mutation rate relative to Rv for RvDmutY, RvDmutY::mutY, and RvDmutY::mutY-R262Q was 2.90, 0.76, and 3.0 in the presence of isoniazid; 5.62, 1.13, and 5.10 in the presence of rifampicin; and 9.14, 1.57, and 8.71 in the presence of ciprofloxacin, respectively (Figure 3).

      • Figure 4 would appear to be measuring drug tolerance, not resistance? Are the elevated CFU in the presence of drugs in the mutY-Arg262Gln strain due to an increase in the number of drug resistant strains or drug sensitive strains? This could be assessed by quantifying the resulting CFU in the presence or absence of the indicated drugs.
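
      For readers unfamiliar with how a mutation rate and a fold change of this kind can be derived from fluctuation-assay data, here is a minimal sketch using the classical p0 method of Luria and Delbrück. The response does not state which estimator the authors used, so the choice of the p0 method, the function names, and all counts below are assumptions made purely for illustration.

```python
import math
from typing import Sequence


def p0_mutation_rate(mutant_counts: Sequence[int], cells_per_culture: float) -> float:
    """Estimate mutations per cell per generation with the p0 method.

    p0 is the fraction of parallel cultures with no resistant colonies;
    the expected number of mutational events per culture is m = -ln(p0),
    and the rate is m divided by the final number of cells per culture.
    """
    zero_cultures = sum(1 for count in mutant_counts if count == 0)
    p0 = zero_cultures / len(mutant_counts)
    if p0 == 0.0 or p0 == 1.0:
        raise ValueError("p0 method needs some, but not all, cultures without mutants")
    events_per_culture = -math.log(p0)
    return events_per_culture / cells_per_culture


def fold_change(rate: float, reference_rate: float) -> float:
    """Mutation rate of a strain expressed relative to a reference strain."""
    return rate / reference_rate


# Made-up resistant-colony counts from six parallel cultures per strain.
reference_counts = [0, 0, 0, 1, 0, 2]
mutant_counts = [0, 3, 1, 0, 5, 2]
total_cfu = 1e8  # hypothetical total CFU per culture, assumed equal for both strains

reference_rate = p0_mutation_rate(reference_counts, total_cfu)
mutant_rate = p0_mutation_rate(mutant_counts, total_cfu)
print(fold_change(mutant_rate, reference_rate))
```

      The sketch also makes the reviewer's point about total CFU concrete: the estimate is divided by the number of cells per culture, so an undercount of total CFU (for example, due to clumping) inflates the apparent rate.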

      To ascertain whether the better survival is due to the acquisition of mutations in the direct targets of antibiotics or to drug tolerance, we performed WGS of the strains from the ex vivo evolution experiment (Figure 5). Genomic DNA extracted from ten independent colonies (grown in vitro) was mixed in equal proportion prior to library preparation. Only those SNPs present in >20% of reads were retained for the analysis. Analysis of Rv sequences grown in vitro suggested that the laboratory strain has accumulated 100 SNPs compared with the reference strain. The sequence of the Rv laboratory strain was used as the reference for the subsequent analysis. WGS data for RvDmutY, RvDmutY::mutY, and RvDmutY::mutY-R262Q strains grown in vitro did not show the presence of mutations in the antibiotic target genes. In a similar vein, ten independent colonies, each from the 7H11-OADC plates, after the final round of ex vivo selection in the presence or absence of antibiotics, were selected for WGS. Data indicated that in the absence of antibiotics, no direct target mutations were identified in the ex vivo passaged strains (Figure 6a & e). In the presence of isoniazid, we found mutations in katG (Ser315Thr or Ser315Ile) in Rv and RvDmutY but not in RvDmutY::mutY and RvDmutY::mutY-R262Q (Figure 6b & e). These findings are in congruence with the ex vivo evolution CFU analysis, wherein we did not observe a significant increase in the survival of RvDmutY and RvDmutY::mutY-R262Q in the presence of isoniazid (Figure 5). In the presence of ciprofloxacin and rifampicin, direct target mutations were identified in gyrA and rpoB (Figure 6c-e). Asp94Glu/Asp94Gly mutations were identified in gyrA, and His445Tyr/Ser450Leu mutations were identified in rpoB, of RvDmutY and RvDmutY::mutY-R262Q, respectively. No direct target mutations were identified in Rv and RvDmutY::mutY, suggesting that perturbed DNA repair aids in acquiring drug resistance-conferring mutations in Mtb (Figure 6c-e & Supplementary File 8).

      To determine whether the better survival of RvDmutY or RvDmutY::mutY-R262Q in the guinea pig infection experiment (Figure 8) is due to the accumulation of mutations in the host, we performed WGS of the strains isolated from guinea pig lungs. Analysis revealed that specific genes such as cobQ1, smc, espI, and valS were mutated only in RvDmutY and RvDmutY::mutY-R262Q but not in Rv and RvDmutY::mutY. Besides, tcrA and gatA were mutated only in RvDmutY, whereas rv0746 was mutated exclusively in RvDmutY::mutY (Figure 2-figure supplement 6). However, we did not observe any direct target mutations; this may be because the guinea pigs were not subjected to antibiotic treatment. The data suggest that continued long-term selection pressure is necessary for bacilli to acquire mutations.

    1. Author Response:

      Reviewer #3 (Public Review):

      INaR is related to an alternative inactivation mode of voltage-activated sodium channels. It was suggested that an intracellular charged particle blocks the sodium channel alpha subunit from the intracellular space in addition to the canonical fast inactivation pathway. Putative particles revealed were the sodium channel beta4 subunit and fibroblast growth factor 14. However, abolishing the expression of either protein does not eliminate INaR. Therefore, as recently suggested by several authors, it is conceivable that INaR is not mediated by a particle-driven mechanism at all. Instead, these and other proteins might bind to the pore-forming alpha subunit and endow it with an alternative inactivation pathway, as envisioned in this paper by the authors.

      The main experimental findings were: (1) The amplitude of INaR is independent of the voltage of the preceding step. (2) The peak amplitudes of INaR are dependent on the time of the depolarizing step but independent of the sodium driving force. (3) INaT and INaR are differentially sensitive to recovery from inactivation. According to their experimental data, the authors put forward a kinetic scheme that was fitted to their voltage-clamp patch-clamp recordings of freshly isolated Purkinje cells. The kinetic model proposed here has one open state and three inactivated states, two states related to fast inactivation (IF1, IF2) and one state related to a slower process (IS). Notably, IS and IF are not linked directly in the kinetic scheme.
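
      To make the idea of such a kinetic scheme concrete, the sketch below integrates state occupancies for a small Markov model with one closed state, one open state, two fast-inactivated states, and one slow-inactivated state that is not directly connected to the fast-inactivated states. The exact connectivity, the fixed (voltage-independent) rate constants, and the time units are placeholder assumptions for illustration; they are not the authors' fitted model.

```python
STATES = ["C", "O", "IF1", "IF2", "IS"]

# Hypothetical rate constants (per ms) at a single fixed voltage.
# IS connects to C and O only, never to IF1 or IF2.
RATES = {
    ("C", "O"): 2.0, ("O", "C"): 0.5,
    ("O", "IF1"): 3.0, ("IF1", "O"): 0.1,
    ("IF1", "IF2"): 0.5, ("IF2", "IF1"): 0.05,
    ("O", "IS"): 0.3, ("IS", "O"): 0.02,
    ("C", "IS"): 0.05, ("IS", "C"): 0.02,
}


def euler_step(occupancy, rates, dt):
    """Advance state occupancies by one forward-Euler step of the master equation."""
    updated = dict(occupancy)
    for (source, target), rate in rates.items():
        flux = rate * occupancy[source] * dt
        updated[source] -= flux
        updated[target] += flux
    return updated


def simulate(rates, start="C", dt=0.001, n_steps=5000):
    """Return final occupancies and the open-state time course."""
    occupancy = {state: 0.0 for state in STATES}
    occupancy[start] = 1.0
    open_trace = [occupancy["O"]]
    for _ in range(n_steps):
        occupancy = euler_step(occupancy, rates, dt)
        open_trace.append(occupancy["O"])
    return occupancy, open_trace


final_occupancy, open_trace = simulate(RATES)
print({state: round(p, 3) for state, p in final_occupancy.items()})
```

      With these placeholder rates the open-state occupancy rises transiently and then decays as channels distribute between the fast and slow inactivated states, which is the qualitative behaviour a fitted model of this type is meant to capture.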

      In my humble opinion, the proposed kinetic model fails to explain important experimental aspects and falls short of relating to the molecular machinery of sodium channels, as outlined below. Still, it is high time to advance the concepts of INaR. The new experimental findings of the authors are important in this respect, and some ideas of the new model might be integrated into future kinetic schemes. In addition, the framework of INaR is not easy to grasp, given the many experimental findings in the literature. Likely, my review also falls short in some aspects. Discussion is much needed and appreciated.

      INaT & INaR decay The authors stated that decay speed of INaT and INaR is different and hence different mechanisms are involved. However at a given voltage (-45 mV) they have nicely illustrated (Fig. 2D and in the simulation Fig. 3H) that this is not the case. This statement is also not compatible with the used Markov model. That is because (at a given voltage) the decay of both current identities proceed from the same open state. Apparent inactivation time constants might be different, though, due to the transition to the on state.

      We apologize that the language used was confusing. Our suggestion that there is more than one pathway for inactivation (from an open/conducting state) is based on the observation that the decay of INaT is biexponential at steady-state voltages. In the revised manuscript, we point out (lines 546-549) that, at some voltages, the slower of the two decay time constants (of INaT) is identical to the time constant of INaR decay. We also discuss how this observation was previously (Raman and Bean, 2001) interpreted.

      Accumulation in the IS state after INaT inactivation in IF1 and IF2 has to proceed through closed states. How is this compatible with current NaV models? The authors have addressed this issue in the discussion. The arguments they have brought forward are not convincing for me since toxins and mutations are grossly impairing channel function.

      Thank you for this comment. We would like to point out that, in our Markov model, Nav channels may accumulate in IS through either the closed state or open state. This requires, of course, that Nav channels can recover from inactivation prior to deactivation. While we agree that toxins and mutations can grossly impair channel function, we think these studies remain crucial in revealing the potential gating mechanisms of Nav channel pore-forming subunits, and how these mechanisms may vary across cell types that express different combinations of accessory proteins.

      Fast inactivation - parallel inactivation pathways Related to the comment above, the motivation to introduce a second fast-inactivated state IF2 is not clear. Using three states for inactivation would imply three inactivation time constants (O->IF1, IF1->IF2, O->IS), which are indeed partially visible in the simulation (Fig. 3). However, experimental data of INaT inactivation seldom require more than one time constant for fast inactivation. Importantly, the authors do not provide data on INaT inactivation of the model in Fig. 3. Fast inactivation is mapped to the binding of the IFM particle. In this model, at slightly negative potentials, IF1 and IF2 reverse from absorbing states to dissipating states. How is this compatible with the IFM mechanism? Additionally, the statements in the discussion are not helpful: either a second time constant is required for IF (two distinct states, with two time constants) or not.

      We thank this Reviewer for this comment. We tried to develop the model based on previous data on Nav channel inactivation. Indeed, much experimental data exists for the fast inactivation pathway (O -> IF1). As we noted in the discussion, without the inclusion of the IF2 state, we were unable to fully reproduce our experimental data, which led us to add the IF2 state. As with all model development, we balanced the need to faithfully reproduce the experimental data with efforts to limit the complexity of the model structure. In addition, as noted in the Methods section, our routine is an automatic parameter optimization routine that seeks to minimize the error between simulation and experiments. We can never be sure that we have found an absolute minimum, or that the optimization did not get stuck at a local minimum when simulating without inclusion of IF2. In other words, there may be a parameter set that sufficiently fits the data without inclusion of IF2, but we were unable to find it. As a safeguard against local minima, we used multistarts of the optimization routine with different initial parameter sets. In each case, we were unable to find a sufficiently acceptable parameter set.
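
      The multistart strategy mentioned above can be illustrated with a small, self-contained sketch: a local optimizer is launched from several random initial parameter sets and the best result is kept. The pattern-search optimizer, the one-parameter toy loss with two minima, and all names below are illustrative assumptions standing in for the authors' actual routine and error function, which are not described here.

```python
import random


def local_search(loss, start, step=0.5, min_step=1e-6):
    """Coordinate pattern search: a simple stand-in for any local optimizer."""
    point = list(start)
    best = loss(point)
    while step > min_step:
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                candidate = list(point)
                candidate[i] += delta
                value = loss(candidate)
                if value < best:
                    point, best, improved = candidate, value, True
        if not improved:
            step /= 2  # refine the search once no neighbour improves the loss
    return point, best


def multistart(loss, bounds, n_starts=20, seed=0):
    """Run the local search from random starting points and keep the best result."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_starts):
        start = [rng.uniform(low, high) for low, high in bounds]
        results.append(local_search(loss, start))
    return min(results, key=lambda result: result[1]), results


def toy_loss(params):
    """Toy error surface with a local minimum near x = 2 and a global one near x = -2."""
    x = params[0]
    return (x * x - 4) ** 2 + x


best, all_results = multistart(toy_loss, bounds=[(-3.0, 3.0)])
print(best)
```

      A single run started on the wrong side of the toy surface settles in the shallower minimum; repeating the search from many starting points reduces, but never eliminates, the chance of missing the better solution, which matches the caveat stated in the response.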

      We agree with this Reviewer that at slightly negative potentials (compared to strong depolarizations), channels exit the IF1 state at different rates, although we would point out that channels dissipate from the IF1 state (accumulating into IS1) under both conditions (see Figure 8B-C). This requires the binding and unbinding of the IFM motif to occur with some voltage sensitivity. We believe this to be a possibility in light of evidence that suggests IFM binding (and fast inactivation) is an allosteric effect (Yan et al., 2017) and evidence showing that mutations in the pore-lining S6 segments can give rise to shifts of the voltage-dependence of fast inactivation without correlated shifts in the voltage-dependence of activation (Cervenka et al., 2018). However, it remains unclear how voltage-sensing in the Nav channel interacts with fast- and slow-inactivation processes.

      Due to space constraints in Figure 3, we did not show a plot of INaT voltage dependence. However, below, please find the experimental data (points), and simulated (line) INaT in our model.

      Differential recovery of INaT & INaR Different kinetics for INaT and INaR are a very interesting finding. In my opinion, these data are not compatible with the proposed Markov model (and the authors do not provide data on the simulation). If INaT1 and INaT2 (Fig. 5 A) have the same amplitude, the occupancy of the open state must be the same. I think there is no way to proceed differentially to the open state of INaR in subsequent steps unless e.g. slow inactivated states are introduced.

      Thank you for bringing up this important point. The differential recovery of INaT and INaR indicates there are distinct Nav channel populations underlying the Nav currents in Purkinje neurons. We make this point on lines 632-635 of the revised manuscript. Because our Markov model is used to simulate a single channel population, we do not expect the model to reproduce the results shown in Figure 5. We have now added this point to the Discussion section on lines 637-640.

      Kinetic scheme Comparison with the Raman-Bean model is a bit unfair unless the parameters are fitted to the same dataset used in this study. However, the authors have an important point in stating that this model could not reproduce all aspects of INaR. A more detailed discussion (and maybe analysis) of the states required for the models would be ideal, including recent literature (e.g., J Physiol. 2020 Jan;598(2):381-40). Could the Raman-Bean model perform better if an additional inactivated state is introduced? Are alternative connections possible in the proposed model? How ambiguous is the model? Given my statements above, is a second open state required? Finally, a better link of the introduced states to the NaV structure-function relationship would be beneficial.

      These are all excellent points. We absolutely agree; it was/is not our intention to “prove” that the Raman-Bean model does not fit our dataset (as you mention, with proper refinement of the parameters, some of the data may be well fit). In fact, qualitatively we found the Raman-Bean model quite consistent with our dataset (which is an excellent validation of both the model, and our data). It was our intention to show (in Figure 7) that there is good agreement between the Raman-Bean model and our experimental data for steady state inactivation (C), availability (D), and recovery from inactivation (E). While we find the magnitude of the resurgent current (F) to be markedly different than the Raman-Bean data, we now note this to likely be due to the large differences in the extracellular Na+ concentrations used in voltage-clamp experiments (lines 440-444). Our models, however, specifically differ in our parallel fast and slow inactivation pathways (Figure 7H). As seen in the Raman-Bean model, in response to a prolonged depolarizing holding potential, there is negligible inactivation, as the OB state remains absorbent until the channel is repolarized. This is primarily because the channel must transit through the Open state on repolarization. We find distinctly different behavior in our data. As seen in the experimental data shown in 7H, despite a prolonged depolarization, Nav channels begin to inactivate and accumulate in the slow inactivated state without prerequisite channel opening. This behavior is impossible to fit in the Raman-Bean model, given the topological constraint of the model requiring a single pathway through the open state from the OB state.

      To that point, it is also unlikely that the addition of inactivated states to the Raman-Bean model would help fit this new dataset. Indeed, the Raman-Bean model contains 7 inactivated states. If there were a connection between OB ->I6, it is possible that direct inactivation (bypassing the O state) may help. Again, however, it is not our intention to discredit the Raman-Bean model, nor is it our intention to improve the Raman-Bean model. With new datasets, a fresh look at model topology was undertaken, which is how we developed our proposed model.

      This Reviewer astutely points out a known limitation of Markov (state-chain) modeling; it is impossible to tell uniqueness, or ambiguity of the model (both with parameters as well as model topology). Following the results of Menon et al. 2009 (PNAS vol. 106 / #39 / 16829 – 16834), in which they used a state mutating genetic algorithm to vary topologies of a Markov model, our group (Mangold et al. 2021, PLoS Comp Bio) recently published an algorithm to distinctly enumerate all possible model structures using rooted graph theory (e.g. all possible combinations of models, rooted around a single open state). What we found (which is not entirely surprising) is that there are many model structures and parameter sets that adequately fit certain datasets (e.g., cardiac Nav channels).

      Therefore, the goal is never to find the model (indeed we don’t propose that we have done so), but rather to find a model with acceptable fits to the data and then use that model to hypothesize why that model structure works, as well as to hypothesize higher dimensional dynamics. We make these points in the revised manuscript (lines 591-597).

      We did not specifically explore the impact of a second open state in our modeling and simulation studies, but we would certainly agree that a model with a second open state may recapitulate the dataset.

    1. Author Response

      Reviewer #1 (Public Review):

      Trudel and colleagues aimed to uncover the neural mechanisms of estimating the reliability of the information from social agents and non-social objects. By combining functional MRI with a behavioural experiment and computational modelling, they demonstrated that learning from social sources is more accurate and robust compared with that from non-social sources. Furthermore, dmPFC and pTPJ were found to track the estimated reliability of the social agents (as opposed to the non-social objects). The strength of this study is to devise a task consisting of the two experimental conditions that were matched in their statistical properties and only differed in their framing (social vs. non-social). The novel experimental task allows researchers to directly compare the learning from social and non-social sources, which is a prominent contribution of the present study to social decision neuroscience.

      Thank you so much for your positive feedback about our work. We are delighted that you found that our manuscript provided a prominent contribution to social decision neuroscience. We really appreciate your time to review our work and your valuable comments that have significantly helped us to improve our manuscript further.

      One of the major weaknesses is the lack of a clear description about the conceptual novelty. Learning about the reliability/expertise of social and non-social agents has been of considerable concern in social neuroscience (e.g., Boorman et al., Neuron 2013; and Wittmann et al., Neuron 2016). The authors could do a better job in clarifying the novelty of the study beyond the previous literature.

      We understand the reviewer’s comment and have made changes to the manuscript that, first, highlight more strongly the novelty of the current study. Crucially, second, we have also supplemented the data analyses with a new model-based analysis of the differences in behaviour in the social and non-social conditions which we hope makes clearer, at a theoretical level, why participants behave differently in the two conditions.

      There has long been interest in investigating whether ‘social’ cognitive processes are special or unique compared to ‘non-social’ cognitive processes and, if they are, what makes them so. Differences between conditions could arise during the input stage (e.g. the type of visual input that is processed by social and non-social system), at the algorithm stage (e.g. the type of computational principles that underpin social versus non-social processes) or, even if identical algorithms are used, social and non-social processes might depend on distinct anatomical brain areas or neurons within brain areas. Here, we conducted multiple analyses (in figures 2, 3, and 4 in the revised manuscript and in Figure 2 – figure supplement 1, Figure 3 – figure supplement 1, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4) that not only demonstrated basic similarities in mechanism generalised across social and non-social contexts, but also demonstrated important quantitative differences that were linked to activity in specific brain regions associated with the social condition. The additional analyses (Figure 4 – figure supplement 3, Figure 4 – figure supplement 4) show that differences are not simply a consequence of differences in the visual stimuli that are inputs to the two systems1, nor does the type of algorithm differ between conditions. Instead, our results suggest that the precise manner in which an algorithm is implemented differs when learning about social or non-social information and that this is linked to differences in neuroanatomical substrates.

      The previous studies mentioned by the reviewer are, indeed, relevant ones and were, of course, part of the inspiration for the current study. However, there are crucial differences between them and the current study. In the case of the previous studies by Wittmann, the aim was a very different one: to understand how one’s own beliefs, for example about one’s performance, and beliefs about others, for example about their performance levels, are combined. Here, however, instead we were interested in the similarities and differences between social and non-social learning. It is true that the question resembles the one addressed by Boorman and colleagues in 2013 who looked at how people learned about the advice offered by people or computer algorithms but the difference in the framing of that study perhaps contributed to authors’ finding of little difference in learning. By contrast, in the present study we found evidence that people were predisposed to perceive stability in social performance and to be uncertain about non-social performance. By accumulating evidence across multiple analyses, we show that there are quantitative differences in how we learn about social versus non-social information, and that these differences can be linked to the way in which learning algorithms are implemented neurally. We therefore contend that our findings extend our previous understanding of how, in relation to other learning processes, ‘social’ learning has both shared and special features.

      We would like to emphasize the way in which we have extended several of the analyses throughout the revision. The theoretical Bayesian framework has made it possible to simulate key differences in behaviour between the social and non-social conditions. We explain in our point-by-point reply below how we have integrated a substantial number of new analyses. We have also more carefully related our findings to previous studies in the Introduction and Discussion.

      Introduction, page 4:

      [...] Therefore, by comparing information sampling from social versus non-social sources, we address a long-standing question in cognitive neuroscience, the degree to which any neural process is specialized for, or particularly linked to, social as opposed to non-social cognition 2–9. Given their similarities, it is expected that both types of learning will depend on common neural mechanisms. However, given the importance and ubiquity of social learning, it may also be that the neural mechanisms that support learning from social advice are at least partially specialized and distinct from those concerned with learning that is guided by nonsocial sources. However, it is less clear on which level information is processed differently when it has a social or non-social origin. It has recently been argued that differences between social and non-social learning can be investigated on different levels of Marr’s information processing theory: differences could emerge at an input level (in terms of the stimuli that might drive social and non-social learning), at an algorithmic level or at a neural implementation level 7. It might be that, at the algorithmic level, associative learning mechanisms are similar across social and non-social learning 1. Other theories have argued that differences might emerge because goal-directed actions are attributed to social agents which allows for very different inferences to be made about hidden traits or beliefs 10. Such inferences might fundamentally alter learning about social agents compared to non-social cues.

      Discussion, page 15:

      […] One potential explanation for the assumption of stable performance for social but not non-social predictors might be that participants attribute intentions and motivations to social agents. Even if the social and non-social evidence are the same, the belief that a social actor might have a goal may affect the inferences made from the same piece of information 10. Social advisors first learnt about the target’s distribution and accordingly gave advice on where to find the target. If the social agents are credited with goal-directed behaviour then it might be assumed that the goals remain relatively constant; this might lead participants to assume stability in the performances of social advisors. However, such goal-directed intentions might not be attributed to non-social cues, thereby making judgments inherently more uncertain and changeable across time. Such an account, focussing on differences in attribution in social settings aligns with a recent suggestion that any attempt to identify similarities or differences between social and non-social processes can occur at any one of a number of the levels in Marr’s information theory 7. Here we found that the same algorithm was able to explain social and non-social learning (a qualitatively similar computational model could explain both). However, the extent to which the algorithm was recruited when learning about social compared to non-social information differed. We observed a greater impact of uncertainty on judgments about social compared to non-social information. We have shown evidence for a degree of specialization when assessing social advisors as opposed to non-social cues. At the neural level we focused on two brain areas, dmPFC and pTPJ, that have not only been shown to carry signals associated with belief inferences about others but, in addition, recent combined fMRI-TMS studies have demonstrated the causal importance of these activity patterns for the inference process […]

      Another weakness is the lack of justifications of the behavioural data analyses. It is difficult for me to understand why 'performance matching' is suitable for an index of learning accuracy. I understand the optimal participant would adjust the interval size with respect to the estimated reliability of the advisor (i.e., angular error); however, I am wondering if the optimal strategy for participants is to exactly match the interval size with the angular error. Furthermore, the definitions of 'confidence adjustment across trials' and 'learning index' look arbitrary.

      First, having read the reviewer’s comments, we realise that our choice of the term ‘performance matching’ may not have been ideal as it indeed might not be the case that the participant intended to directly match their interval sizes with their estimates of advisor/predictor error. Like the reviewer, our assumption is simply that the interval sizes should change as the estimated reliability of the advisor changes and, therefore, that the intervals that the participants set should provide information about the estimates that they hold and the manner in which they evolve. On re-reading the manuscript we realised that we had not used the term ‘performance matching’ consistently or in many places in the manuscript. In the revised manuscript we have simply removed it altogether and referred to the participants’ ‘interval setting’.

      Most of the initial analyses in Figure 2a-c aim to better understand the raw behaviour before applying any computational model to the data. We were interested in how participants make confidence judgments (decision-making per se), but also how they adapt their decisions with additional information (changes or learning in decision making). In the revised manuscript we have made clear that these are used as simple behavioural measures and that they will be complemented later by more analyses derived from more formal computational models.

      In what we now refer to as the ‘interval setting’ analysis (Figure 2a), we tested whether participants select their interval settings differently in the social compared to non-social condition. We observe that participants set their intervals closer to the true angular error of the advisor/predictor in the social compared to the non-social condition. This observation could arise in two ways. First, it could be due to quantitative differences in learning despite general, qualitative similarity: mechanisms are similar but participants differ quantitatively in the way that they learn about non-social information and social information. Second, it could, however, reflect fundamentally different strategies. We tested basic performance differences by comparing the mean reward between conditions. There was no difference in reward between conditions (mean reward: paired t-test social vs. non-social, t(23)= 0.8, p=0.4, 95% CI= [-0.007 0.016]), suggesting that interval setting differences might not simply reflect better or worse performance in social or non-social contexts but instead might reflect quantitative differences in the processes guiding interval setting in the two cases.

      In the next set of analyses, in which we compared raw data, applied a computational model, and provided a theoretical account for the differences between conditions, we suggest that there are simple quantitative differences in how information is processed in social and nonsocial conditions but that these have the important impact of making long-term representations – representations built up over a longer series of trials – more important in the social condition. This, in turn, has implications for the neural activity patterns associated with social and non-social learning. We, therefore, agree with the reviewer, that one manner of interval setting is indeed not more optimal than another. However, the differences that do exist in behaviour are important because they reveal something about the social and non-social learning and its neural substrates. We have adjusted the wording and interpretation in the revised manuscript.

      Next, we analysed interval setting with two additional, related analyses: interval setting adjustment across trials and derivation of a learning index. We tested the degree to which participants adjusted their interval setting across trials and according to the prediction error (learning index, Figure 2f); the latter analysis is very similar to a trial-wise learning rate calculated in previous studies11. In contrast to many other studies, the intervals set by participants provide information about the estimates that they hold in a simple and direct way and enable calculation of a trial-wise learning index. We therefore decided to call it a ‘learning index’ instead of a ‘learning rate’, as it is not estimated via a model applied to the data but is instead calculated directly from the data. Arguably the directness of the approach, and its lack of dependence on a specific computational model, is a strength of the analysis.
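
      To make the idea of a model-free, trial-wise index concrete, the short Python sketch below computes one plausible version of it. The exact definition is not spelled out in this response, so the formula here (change in interval divided by the preceding prediction error, with the prediction error taken as observed angular error minus the interval set) is an assumption made for illustration, as are the function and variable names.

```python
def learning_index(intervals, angular_errors):
    """Trial-wise learning index for one predictor (illustrative assumption).

    intervals[t]      : interval set on encounter t
    angular_errors[t] : angular error observed on encounter t (same units)
    Returns one value per transition t -> t+1; None where the
    prediction error is zero and the ratio is undefined.
    """
    indices = []
    for t in range(len(intervals) - 1):
        # Assumed prediction error: observed error minus the interval set
        prediction_error = angular_errors[t] - intervals[t]
        if prediction_error == 0:
            indices.append(None)
            continue
        # How far the next interval moved, relative to the prediction error
        update = intervals[t + 1] - intervals[t]
        indices.append(update / prediction_error)
    return indices


# Hypothetical values, chosen only to show the calculation
example = learning_index([10, 15, 15], [20, 15])
```

      Under this assumed definition, a value near 1 would mean the interval was moved by the full size of the last prediction error, and a value near 0 would mean the interval barely changed.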

      Subsequently in the manuscript, a new analysis (illustrated in new Figure 3) employs Bayesian models that can simulate the differences in the social and non-social conditions and demonstrate that a number of behavioural observations can arise simply as a result of differences in noise in each trial-wise Bayesian update (Figure 3 and specifically 3d; Figure 3 – figure supplement 1b-c). In summary, the descriptive analyses in Figure 2a-c aid an intuitive understanding of the differences in behaviour in the social and non-social conditions. We have then repeated these analyses with Bayesian models incorporating different noise levels and showed that in such a way, the differences in behaviour between social and non-social conditions can be mimicked (please see next section and manuscript for details).

      We adjusted the wording in a number of sections in the revised manuscript such as in the legend of Figure 2 (figures and legend), Figure 4 (figures and legend).

      Main text, page 5:

      The confidence interval could be changed continuously to make it wider or narrower, by pressing buttons repeatedly (one button press resulted in a change of one step in the confidence interval). In this way participants provided what we refer to as an ’interval setting’.

      We also adjusted the following section in Main text, page 6:

      Confidence in the performance of social and non-social advisors

      We compared trial-by-trial interval setting in relation to the social and non-social advisors/predictors. When setting the interval, the participant’s aim was to minimize it while ensuring it still encompassed the final target position; points were won when it encompassed the target position but were greater when it was narrower. A given participant’s interval setting should, therefore, change in proportion to the participant’s expectations about the predictor’s angular error and their uncertainty about those expectations. Even though, on average, social and non-social sources did not differ in the precision with which they predicted the target (Figure 2 – figure supplement 1), participants gave interval settings that differed in their relationships to the true performances of the social advisors compared to the non-social predictors. The interval setting was closer to the angular error in the social compared to the non-social sessions (Figure 2a, paired t-test: social vs. non-social, t(23)= -2.57, p= 0.017, 95% confidence interval (CI)= [-0.36 -0.4]). Differences in interval setting might be due to generally lower performance in the nonsocial compared to social condition, or potentially due to fundamentally different learning processes utilised in either condition. We compared the mean reward amounts obtained by participants in the social and non-social conditions to determine whether there were overall performance differences. There was, however, no difference in the reward received by participants in the two conditions (mean reward: paired t-test social vs. non-social, t(23)= 0.8, p=0.4, 95% CI= [-0.007 0.016]), suggesting that interval setting differences might not simply reflect better or worse performance

      Discussion, page 14:

      Here, participants did not match their confidence to the likely accuracy of their own performance, but instead to the performance of another social or non-social advisor. Participants used different strategies when setting intervals to express their confidence in the performances of social advisors as opposed to non-social advisors. A possible explanation might be that participants have a better insight into the abilities of social cues – typically other agents – than non-social cues – typically inanimate objects.

      As the authors assumed simple Bayesian learning for the estimation of reliability in this study, the degree/speed of the learning should be examined with reference to the distance between the posterior and prior belief in the optimal Bayesian inference.

      We thank the reviewer for this suggestion. We agree with the reviewer that further analyses that aim to disentangle the underlying mechanisms that might differ between both social and non-social conditions might provide additional theoretical contributions. We show additional model simulations and analyses that aim to disentangle the differences in more detail. These new results allowed clearer interpretations to be made.

      In the current study, we showed that judgments made about non-social predictors were changed more strongly as a function of the subjective uncertainty: participants set a larger interval, indicating lower confidence, when they were more uncertain about the non-social cue’s accuracy in predicting the target. In response to the reviewer’s comments, the new analyses were aimed at understanding the conditions under which such a negative uncertainty effect might emerge.

      Prior expectations of performance

      First, we compared whether participants had different prior expectations in the social condition compared to the non-social condition. One way to compare prior expectations is by comparing the first interval set for each advisor/predictor. This is a direct readout of the initial prior expectation with which participants approach our two conditions. In such a way, we test whether the prior beliefs before observing any social or non-social information differ between conditions. Even though this does not test the impact of prior expectations on subsequent belief updates, it does test whether participants have generally different expectations about the performance of social advisors or non-social predictors. There was no difference in this measure between social or non-social cues (Figure below; paired t-test social vs. non-social, t(23)= 0.01, p=0.98, 95% CI= [-0.067 0.68]).

      Figure. Confidence interval for the first encounter of each predictor in social and non-social conditions. There was no initial bias in predicting the performance of social or non-social predictors.

      Learning across time

      We have now seen that participants do not have an initial bias when predicting performances in social or non-social conditions. This suggests that differences between conditions might emerge across time when encountering predictors multiple times. We tested whether inherent differences in how beliefs are updated according to new observations might result in different impacts of uncertainty on interval setting between social and non-social conditions. More specifically, we tested whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. This approach was inspired by the reviewer’s comments about potential differences in the speed of learning as well as the reduction of uncertainty with increasing predictor encounters. Similar ideas were tested in previous studies, when comparing the learning rate (i.e. the speed of learning) in environments of different volatilities 12,13. In these studies, a smaller learning rate was prevalent in stable environments during which reward rates change slower over time, while higher learning rates often reflect learning in volatile environments so that recent observations have a stronger impact on behaviour. Even though most studies derived these learning rates with reinforcement learning models, similar ideas can be translated into a Bayesian model. For example, an established way of changing the speed of learning in a Bayesian model is to introduce noise during the update process14. This noise is equivalent to adding in some of the initial prior distribution and this will make the Bayesian updates more flexible to adapt to changing environments. It will widen the belief distribution and thereby make it more uncertain.

      Recent information has more weight on the belief update within a Bayesian model when beliefs are uncertain. This increases the speed of learning. In other words, a wide distribution (after adding noise) allows for quick integration of new information. On the contrary, a narrow distribution does not integrate new observations as strongly and instead relies more heavily on previous information; this corresponds to a small learning rate. So, we would expect a steep decline of uncertainty to be related to a smaller learning index while a slower decline of uncertainty is related to a larger learning index. We hypothesized that participants reduce their uncertainty quicker when observing social information, thereby anchoring more strongly on previous beliefs instead of integrating new observations flexibly. Vice versa, we hypothesized a less steep decline of uncertainty when observing non-social information, indicating that new information can be flexibly integrated during the belief update (new Figure 3a).

      We modified the original Bayesian model (Figure 2d, Figure 2 – figure supplement 2) by adding a uniform distribution (equivalent to our prior distribution) to each belief update – we refer to this as noise addition to the Bayesian model14,21. We varied the amount of noise δ over the range [0,1], where δ = 0 equals the original Bayesian model and δ = 1 represents a very noisy Bayesian model. The uniform distribution was selected to match the first prior belief before any observation was made (equation 2). This δ range resulted in a continuous increase of subjective uncertainty around the belief about the angular error (Figure 3b-c). The modified posterior distribution, denoted as 𝑝′(σ|x), was derived at each trial as follows:
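
      A minimal Python sketch of this kind of noisy update is given below. It is an illustration under stated assumptions, not the manuscript's equation: the mixing rule (a δ-weighted blend of the ordinary posterior with the uniform prior), the zero-mean Gaussian likelihood, the grid over σ, and the observation values are all hypothetical choices made here to show the qualitative effect.

```python
import math


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def likelihood(x, sigma):
    # Assumed likelihood: angular error x drawn from a zero-mean Gaussian
    # whose spread sigma is the quantity being learned
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def noisy_update(belief, sigmas, x, delta):
    # Ordinary Bayesian step: prior times likelihood, renormalised
    posterior = normalize([b * likelihood(x, s) for b, s in zip(belief, sigmas)])
    # Assumed noise step: blend the posterior with the uniform prior.
    # delta = 0 leaves the posterior unchanged; delta = 1 resets it to uniform.
    uniform = 1.0 / len(sigmas)
    return [(1.0 - delta) * p + delta * uniform for p in posterior]


def entropy(belief):
    # Shannon entropy, used here as a simple readout of belief width
    return -sum(p * math.log(p) for p in belief if p > 0)


sigmas = [float(s) for s in range(1, 91)]          # hypothetical grid (degrees)
observations = [20.0, -15.0, 25.0, -18.0, 22.0]    # hypothetical angular errors


def run(delta):
    belief = normalize([1.0] * len(sigmas))        # uniform first prior
    for x in observations:
        belief = noisy_update(belief, sigmas, x, delta)
    return belief


plain = run(0.0)   # original model
noisy = run(0.5)   # intermediate noise level
```

      In this sketch the belief produced with δ = 0.5 stays wider (higher entropy) after the same observations than the belief produced with δ = 0, which is the property the text relies on: a wider belief gives more weight to each new observation.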

      We applied each noisy Bayesian model to participants’ choices within the social and nonsocial condition.

      The addition of a uniform distribution changed two key features of the belief distribution: first, the width of the distribution remains larger with additional observations, thereby making it possible to integrate new observations more flexibly. To show this more clearly, we extracted the model-derived uncertainty estimate across multiple encounters of the same predictor for the original model and the fully noisy Bayesian model (Figure 3 – figure supplement 1). The model-derived ‘uncertainty estimate’ of a noisy Bayesian model decays more slowly compared to the ‘uncertainty estimate’ of the original Bayesian model (upper panel). Second, the model-derived ‘accuracy estimate’ reflects more recent observations in a noisy Bayesian model compared to the ‘accuracy estimate’ derived from the original Bayesian model, which integrates past observations more strongly (lower panel). Hence, as mentioned beforehand, a rapid decay of uncertainty implies a small learning index; or in other words, stronger integration of past compared to recent observations.

      In the following analyses, we tested whether an increasingly noisy Bayesian model mimics behaviour that is observed in the non-social compared to social condition. For example, we tested whether an increasingly noisy Bayesian model also exhibits a strongly negative ‘predictor uncertainty’ effect on interval setting (Figure 2e). In such a way, we can test whether differences in noise in the updating process of a Bayesian model might reproduce important qualitative differences in learning-related behaviour seen in the social and nonsocial conditions.

      We used these modified Bayesian models to simulate trial-wise interval setting for each participant according to the observations they made when selecting a particular advisor or non-social cue. We simulated interval setting at each trial and examined whether an increase in noise produced model behaviours that resembled participant behaviour patterns observed in the non-social condition as opposed to social condition. At each trial, we used the accuracy estimate (Methods, equation 6) – which represents a subjective belief about a single angular error -- to derive an interval setting for the selected predictor. To do so, we first derived the point-estimate of the belief distribution at each trial (Methods, equation 6) and multiplied it by the size of one interval step on the circle. The step size was derived by dividing the circle size by the maximum number of possible steps. Here is an example of transforming an accuracy estimate into an interval: let’s assume the belief about the angular error at the current trial is 50 degrees (Methods, equation 6). Now, we are trying to transform this number into an interval for the current predictor on a given trial. To obtain the size of one interval step, the circle size (360 degrees) is divided by the maximum number of interval steps (40 steps; note, 20 steps on each side), which results in nine degrees that represents the size of one interval step. Next, the accuracy estimate in radians (0.87) is multiplied by the step size in radians (0.1571), resulting in an interval of 0.137 radians or 7.85 degrees. The final interval size would be 7.85 degrees.
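
      The worked example above can be reproduced with a few lines of Python. The function and constant names are hypothetical; the numbers (360 degrees, 40 steps, an accuracy estimate of 50 degrees) are those given in the text.

```python
import math

CIRCLE_DEGREES = 360.0   # full circle
MAX_STEPS = 40           # maximum number of interval steps (20 on each side)


def interval_from_accuracy_estimate(estimate_deg):
    """Convert an accuracy estimate (degrees) into an interval size (degrees)."""
    step_deg = CIRCLE_DEGREES / MAX_STEPS      # 9 degrees per step
    estimate_rad = math.radians(estimate_deg)  # 50 degrees is about 0.87 rad
    step_rad = math.radians(step_deg)          # 9 degrees is about 0.1571 rad
    interval_rad = estimate_rad * step_rad     # about 0.137
    return math.degrees(interval_rad)


interval = interval_from_accuracy_estimate(50.0)
print(round(interval, 2))
```

      Because both quantities are converted to radians before multiplying, the result depends on that unit choice; the sketch follows the order of operations described in the text.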

      Simulating Bayesian choices in that way, we repeated the behavioural analyses (Figure 2b,e,f) to test whether intervals derived from more noisy Bayesian models mimic intervals set by participants in the non-social condition: greater changes in interval setting across trials (Figure 3 – figure supplement 1b), a negative ‘predictor uncertainty' effect on interval setting (Figure 3 – figure supplement 1c), and a higher learning index (Figure 3d).

      First, we repeated the most crucial analysis -- the linear regression analysis (Figure 2e) -- and hypothesized that intervals that were simulated from noisy Bayesian models would also show a greater negative ‘predictor uncertainty’ effect on interval setting. This was indeed the case: irrespective of social or non-social conditions, the addition of noise (increased weighting of the uniform distribution in each belief update) led to an increasingly negative ‘predictor uncertainty’ effect on confidence judgment (new Figure 3d). In Figure 3d, we show the regression weights (y-axis) for the ‘predictor uncertainty’ on confidence judgment with increasing noise (x-axis). This result is highly consistent with the idea that, in the non-social condition, the manner in which task estimates are updated is more uncertain and noisier. By contrast, social estimates appear relatively more stable, also according to this new Bayesian simulation analysis.

      This new finding extends the results and suggests a formal computational account of the behavioural differences between social and non-social conditions. Increasing the noise of the belief update mimics behaviour that is observed in the non-social condition: an increasingly negative effect of ‘predictor uncertainty’ on confidence judgment. Notably, there was no difference in the impact that the noise had in the social and non-social conditions. This was expected because the Bayesian simulations are blind to the framing of the conditions. However, it means that the observed effects do not depend on the precise sequence of choices that participants made in these conditions. It therefore suggests that an increase in the Bayesian noise leads to an increasingly negative impact of ‘predictor uncertainty’ on confidence judgments irrespective of the condition. Hence, we can conclude that different degrees of noise within the belief update are a reasonable explanation for the differences observed between social and non-social conditions.

      Next, we used these simulated confidence intervals and repeated the descriptive behavioural analyses to test whether interval settings that were derived from more noisy Bayesian models mimic behavioural patterns observed in non-social compared to social conditions. For example, more noise in the belief update should lead to more flexible integration of new information and hence should potentially lead to a greater change of confidence judgments across predictor encounters (Figure 2b). Further, a greater reliance on recent information should lead prediction errors to be reflected more strongly in the next confidence judgment; hence, it should result in a higher learning index in the non-social condition, which we hypothesize to be perceived as more uncertain (Figure 2f). We used the simulated confidence intervals from Bayesian models on a continuum of noise integration (i.e. different weighting of the uniform distribution in the belief update) and again derived both absolute confidence change and learning indices (Figure 3 – figure supplement 1b-c).

      ‘Absolute confidence change’ and ‘learning index’ increase with increasing noise weight, thereby mimicking the difference between social and non-social conditions. Further, these analyses demonstrate the tight relationship between descriptive analyses and model-based analyses. They show that noise in the Bayesian updating process is a conceptual explanation that can account for both the differences in learning and the difference in uncertainty processing that exist between social and non-social conditions. The key insight conveyed by the Bayesian simulations is that a wider, more uncertain belief distribution changes more quickly. Correspondingly, in the non-social condition, participants express more uncertainty in their confidence estimate when they set the interval, and they also change their beliefs more quickly as expressed in a higher learning index. Therefore, noisy Bayesian updating can account for key differences between the social and non-social conditions.

      We thank the reviewer for making this point, as we believe that these additional analyses allow theoretical inferences to be made in a more direct manner; we think that it has significantly contributed towards a deeper understanding of the mechanisms involved in the social and non-social conditions. Further, it provides a novel account of how we make judgments when being presented with social and non-social information.

      We made substantial changes to the main text, figures and supplementary material to include these changes:

      Main text, page 10-11 new section:

      The impact of noise in belief updating in social and non-social conditions

      So far, we have shown that, in comparison to non-social predictors, participants changed their interval settings about social advisors less drastically across time, relied on observations made further in the past, and were less impacted by their subjective uncertainty when they did so (Figure 2). Using Bayesian simulation analyses, we investigated whether a common mechanism might underlie these behavioural differences. We tested whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. Similar ideas were tested in previous studies, when comparing the learning rate (i.e. the speed of learning) in environments of different volatilities12,13. We tested these ideas using established ways of changing the speed of learning during Bayesian updates14,21. We hypothesized that participants reduce their uncertainty quicker when observing social information. Vice versa, we hypothesized a less steep decline of uncertainty when observing non-social information, indicating that new information can be flexibly integrated during the belief update (Figure 5a).

      We manipulated the amount of uncertainty in the Bayesian model by adding a uniform distribution to each belief update (Figure 3b-c) (equation 10,11). Consequently, the distribution’s width increases and is more strongly impacted by recent observations (see example in Figure 3 – figure supplement 1). We used these modified Bayesian models to simulate trial-wise interval setting for each participant according to the observations they made by selecting a particular advisor in the social condition or other predictor in the non-social condition. We simulated confidence intervals at each trial. We then used these to examine whether an increase in noise led to simulation behaviour that resembled behavioural patterns observed in non-social conditions that were different to behavioural patterns observed in the social condition.

      First, we repeated the linear regression analysis and hypothesized that interval settings that were simulated from noisy Bayesian models would also show a greater negative ‘predictor uncertainty’ effect on interval setting resembling the effect we had observed in the non-social condition (Figure 2e). This was indeed the case when using the noisy Bayesian model: irrespective of social or non-social condition, the addition of noise (increasing weight of the uniform distribution to each belief update) led to an increasingly negative ‘predictor uncertainty’ effect on confidence judgment (new Figure 3d). The absence of a difference between the social and non-social conditions in the simulations suggests that an increase in the Bayesian noise is sufficient to induce a negative impact of ‘predictor uncertainty’ on interval setting. Hence, we can conclude that different degrees of noise in the updating process are sufficient to cause differences observed between social and non-social conditions. Next, we used these simulated interval settings and repeated the descriptive behavioural analyses (Figure 2b,f). An increase in noise led to greater changes of confidence across time and a higher learning index (Figure 3 – figure supplement 1b-c). In summary, the Bayesian simulations offer a conceptual explanation that can account for both the differences in learning and the difference in uncertainty processing that exist between social and non-social conditions. The key insight conveyed by the Bayesian simulations is that a wider, more uncertain belief distribution changes more quickly. Correspondingly, in the non-social condition, participants express more uncertainty in their confidence estimate when they set the interval, and they also change their beliefs more quickly. Therefore, noisy Bayesian updating can account for key differences between the social and non-social conditions.
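      The idea of adding a uniform distribution to each belief update can be illustrated with a minimal sketch. Equations 10 and 11 are not reproduced in this response, so the mixing rule below (a weighted average of the posterior and a uniform distribution) is an assumption, as are the grid of candidate widths, the Gaussian likelihood, and all function and variable names. The sketch only aims to show the qualitative point made above: with a larger noise weight, the belief stays wider and shifts more after a surprising observation.

```python
import math

def normalise(weights):
    """Rescale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_likelihood(error, sigma):
    """Likelihood of an observed error under a zero-mean Gaussian of width sigma."""
    return math.exp(-0.5 * (error / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def noisy_update(belief, sigmas, error, noise_weight):
    """One Bayesian update, then mixing with a uniform distribution (assumed form)."""
    posterior = normalise(
        [b * gaussian_likelihood(error, s) for b, s in zip(belief, sigmas)]
    )
    uniform = 1.0 / len(sigmas)
    return [(1.0 - noise_weight) * p + noise_weight * uniform for p in posterior]

def entropy(belief):
    """Shannon entropy, used here as a simple measure of belief width."""
    return -sum(p * math.log(p) for p in belief if p > 0.0)

def mean_sigma(belief, sigmas):
    """Expected predictor width under the current belief."""
    return sum(p * s for p, s in zip(belief, sigmas))

def run(errors, sigmas, noise_weight):
    """Start from a flat belief and update it once per observed error."""
    belief = [1.0 / len(sigmas)] * len(sigmas)
    for error in errors:
        belief = noisy_update(belief, sigmas, error, noise_weight)
    return belief

# Hypothetical setup: ten candidate widths and a predictor that is stable
# for twenty trials before producing one unusually large error.
SIGMAS = [5.0 * k for k in range(1, 11)]
STABLE_ERRORS = [10.0, -10.0] * 10
SURPRISE = 40.0

def compare(noise_weight):
    """Return (belief entropy before the surprise, shift in mean width after it)."""
    before = run(STABLE_ERRORS, SIGMAS, noise_weight)
    after = noisy_update(before, SIGMAS, SURPRISE, noise_weight)
    return entropy(before), mean_sigma(after, SIGMAS) - mean_sigma(before, SIGMAS)

for weight in (0.0, 0.3):
    width, shift = compare(weight)
    print(f"noise weight {weight}: entropy {width:.3f}, shift after surprise {shift:.3f}")
```

      In this toy setting a noise weight of zero recovers ordinary Bayesian updating, so the same function covers both the original and the noisy model.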

      Methods, page 23 new section:

      Extension of Bayesian model with varying amounts of noise

      We modified the original Bayesian model (Figure 2d, Figure 2 – figure supplement 2) to test whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. [...] To obtain the size of one interval step, the circle size (360 degrees) is divided by the maximum number of interval steps (40 steps; note, 20 steps on each side), which results in nine degrees, the size of one interval step. Next, the accuracy estimate in radians (0.87) is multiplied by the step size in radians (0.1571), resulting in an interval of 0.137 radians or 7.85 degrees. The final interval size would be 7.85 degrees.
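      The arithmetic of the interval-step conversion can be written out as a short sketch. The constants come from the text above; the function names are illustrative only. Note that carrying full precision gives 0.87 × 9 = 7.83 degrees, whereas the value of 7.85 degrees quoted above follows from rounding the intermediate result to 0.137 radians before converting back to degrees.

```python
import math

CIRCLE_DEGREES = 360.0     # full circle, as stated in the text
MAX_INTERVAL_STEPS = 40    # 20 steps on each side

def step_size_degrees():
    """Size of one interval step in degrees (360 / 40 = 9)."""
    return CIRCLE_DEGREES / MAX_INTERVAL_STEPS

def step_size_radians():
    """Size of one interval step in radians (about 0.1571)."""
    return math.radians(step_size_degrees())

def interval_size_degrees(accuracy_estimate):
    """Scale the step size by the accuracy estimate and report degrees."""
    interval_radians = accuracy_estimate * step_size_radians()
    return math.degrees(interval_radians)

print(step_size_degrees())
print(round(step_size_radians(), 4))
print(round(interval_size_degrees(0.87), 2))
```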

      We repeated behavioural analyses (Figure 2b,e,f) to test whether confidence intervals derived from more noisy Bayesian models mimic behavioural patterns observed in the non-social condition: greater changes of confidence across trials (Figure 3 – figure supplement 1b), a greater negative ‘predictor uncertainty’ effect on confidence judgment (Figure 3 – figure supplement 1c) and a greater learning index (Figure 3d).

      Discussion, page 14: […] It may be because we make just such assumptions that past observations are used to predict performance levels that people are likely to exhibit next 15,16. An alternative explanation might be that participants experience a steeper decline of subjective uncertainty in their beliefs about the accuracy of social advice, resulting in a narrower prior distribution, during the next encounter with the same advisor. We used a series of simulations to investigate how uncertainty about beliefs changed from trial to trial and showed that belief updates about non-social cues were consistent with a noisier update process that diminished the impact of experiences over the longer term. From a Bayesian perspective, greater certainty about the value of advice means that contradictory evidence will need to be stronger to alter one’s beliefs. In the absence of such evidence, a Bayesian agent is more likely to repeat previous judgments. Just as in a confirmation bias 17, such a perspective suggests that once we are more certain about others’ features, for example, their character traits, we are less likely to change our opinions about them.

      Reviewer #2 (Public Review):

      Humans learn about the world both directly, by interacting with it, and indirectly, by gathering information from others. There has been a longstanding debate about the extent to which social learning relies on specialized mechanisms that are distinct from those that support learning through direct interaction with the environment. In this work, the authors approach this question using an elegant within-subjects design that enables direct comparisons between how participants use information from social and non-social sources. Although the information presented in both conditions had the same underlying structure, participants tracked the performance of the social cue more accurately and changed their estimates less as a function of prediction error. Further, univariate activity in two regions, dmPFC and pTPJ, tracked participants' confidence judgments more closely in the social than in the non-social condition, and multivariate patterns of activation in these regions contained information about the identity of the social cues.

      Overall, the experimental approach and model used in this paper are very promising. However, after reading the paper, I found myself wanting additional insight into what these condition differences mean, and how to place this work in the context of prior literature on this debate. In addition, some additional analyses would be useful to support the key claims of the paper.

      We thank the reviewer for their very supportive comments. We have addressed their points below and have highlighted changes in our manuscript that we made in response to the reviewer’s comments.

      (1) The framing should be reworked to place this work in the context of prior computational work on social learning. Some potentially relevant examples:

      • Shafto, Goodman & Frank (2012) provide a computational account of the domain-specific inductive biases that support social learning. In brief, what makes social learning special is that we have an intuitive theory of how other people's unobservable mental states lead to their observable actions, and we use this intuitive theory to actively interpret social information. (There is also a wealth of behavioral evidence in children to support this account; for a review, see Gweon, 2021).

      • Heyes (2012) provides a leaner account, arguing that social and non-social learning are supported by a common associative learning mechanism, and what distinguishes social from non-social learning is the input mechanism. Social learning becomes distinctively "social" to the extent that organisms are biased or attuned to social information.

      I highlight these papers because they go a step beyond asking whether there is any difference between mechanisms that support social and non-social learning; they also provide concrete proposals about what that difference might be, and what might be shared. I would like to see this work move in a similar direction.

      References

      (In the interest of transparency: I am not an author on these papers.)

      Gweon, H. (2021). Inferential social learning: how humans learn from others and help others learn. PsyArXiv. https://doi.org/10.31234/osf.io/8n34t

      Heyes, C. (2012). What's social about social learning?. Journal of Comparative Psychology, 126(2), 193.

      Shafto, P., Goodman, N. D., & Frank, M. C. (2012). Learning from others: The consequences of psychological reasoning for human learning. Perspectives on Psychological Science, 7(4), 341-351.

      Thank you for this suggestion to expand our framing. We have now made substantial changes to the Discussion and Introduction to include additional background literature, the relevant references suggested by the reviewer, addressing the differences between social and non-social learning. We further related our findings to other discussions in the literature that argue that differences between social and non-social learning might occur at the level of algorithms (the computations involved in social and non-social learning) and/or implementation (the neural mechanisms). Here, we describe behaviour with the same algorithm (Bayesian model), but the weighing of uncertainty on decision-making differs between social and non-social contexts. This might be explained by similar ideas put forward by Shafto and colleagues (2012), who suggest that differences between social and non-social learning might be due to the attribution of goal-directed intention to social agents, but not non-social cues. Such an attribution might lead participants to assume that advisor performances will be relatively stable under the assumption that they should have relatively stable goal-directed intentions. We also show differences at the implementational level in social and non-social learning in TPJ and dmPFC.

      Below we list the changes we have made to the Introduction and Discussion. Further, we would also like to emphasize the substantial extension of the Bayesian modelling which we think clarifies the theoretical framework used to explain the mechanisms involved in social and non-social learning (see our answer to the next comments below).

      Introduction, page 4:

      [...] Therefore, by comparing information sampling from social versus non-social sources, we address a long-standing question in cognitive neuroscience, the degree to which any neural process is specialized for, or particularly linked to, social as opposed to non-social cognition 2–9. Given their similarities, it is expected that both types of learning will depend on common neural mechanisms. However, given the importance and ubiquity of social learning, it may also be that the neural mechanisms that support learning from social advice are at least partially specialized and distinct from those concerned with learning that is guided by non-social sources.

      However, it is less clear on which level information is processed differently when it has a social or non-social origin. It has recently been argued that differences between social and non-social learning can be investigated on different levels of Marr’s information processing theory: differences could emerge at an input level (in terms of the stimuli that might drive social and non-social learning), at an algorithmic level or at a neural implementation level 7. It might be that, at the algorithmic level, associative learning mechanisms are similar across social and non-social learning 1. Other theories have argued that differences might emerge because goal-directed actions are attributed to social agents which allows for very different inferences to be made about hidden traits or beliefs 10. Such inferences might fundamentally alter learning about social agents compared to non-social cues.

      Discussion, page 15:

      […] One potential explanation for the assumption of stable performance for social but not non-social predictors might be that participants attribute intentions and motivations to social agents. Even if the social and non-social evidence are the same, the belief that a social actor might have a goal may affect the inferences made from the same piece of information 10. Social advisors first learnt about the target’s distribution and accordingly gave advice on where to find the target. If the social agents are credited with goal-directed behaviour then it might be assumed that the goals remain relatively constant; this might lead participants to assume stability in the performances of social advisors. However, such goal-directed intentions might not be attributed to non-social cues, thereby making judgments inherently more uncertain and changeable across time. Such an account, focussing on differences in attribution in social settings aligns with a recent suggestion that any attempt to identify similarities or differences between social and non-social processes can occur at any one of a number of the levels in Marr’s information theory 7. Here we found that the same algorithm was able to explain social and non-social learning (a qualitatively similar computational model could explain both). However, the extent to which the algorithm was recruited when learning about social compared to non-social information differed. We observed a greater impact of uncertainty on judgments about social compared to non-social information. We have shown evidence for a degree of specialization when assessing social advisors as opposed to non-social cues. At the neural level we focused on two brain areas, dmPFC and pTPJ, that have not only been shown to carry signals associated with belief inferences about others but, in addition, recent combined fMRI-TMS studies have demonstrated the causal importance of these activity patterns for the inference process […]

      (2) The results imply that dmPFC and pTPJ differentiate between learning from social and non-social sources. However, more work needs to be done to rule out simpler, deflationary accounts. In particular, the condition differences observed in dmPFC and pTPJ might reflect low-level differences between the two conditions. For example, the social task could simply have been more engaging to participants, or the social predictors may have been more visually distinct from one another than the fruits.

      We understand the reviewer’s concern regarding low-level distinctions between the social and non-social condition that could confound the differences in neural activation that are observed between conditions in areas pTPJ and dmPFC. From the reviewer’s comments, we understand that there might be two potential confounders: first, low-level differences such that stimuli within one condition might be more distinct from each other compared to the relative distinctiveness between stimuli within the other condition. Therefore, simply the greater visual distinctiveness of stimuli in one condition than another might lead to learning differences between conditions. Second, stimuli in one condition might be more engaging and potentially lead to attentional differences between conditions. We used a combination of univariate analyses and multivariate analyses to address both concerns.

      Analysis 1: Univariate analysis to inspect potential unaccounted variance between social and non-social condition

      First, we used the existing univariate analysis (exploratory MRI whole-brain analysis, see Methods) to test for neural activation that covaried with attentional differences, or any other unaccounted neural difference, between conditions. If there were neural differences between conditions that we are currently not accounting for with the parametric regressors that are included in the fMRI-GLM, then these differences should be captured in the constant of the GLM model. For example, if there are attentional differences between conditions, then we could expect to see neural differences between conditions in areas such as inferior parietal lobe (or other related areas that are commonly engaged during attentional processes).

      Importantly, inspection of the constant of the GLM model should capture any unaccounted differences, whether they are due to attention or alternative processes that might differ between conditions. When inspecting cluster-corrected differences in the constant of the fMRI-GLM model during the setting of the confidence judgment, there was no cluster-significant activation that differed between social and non-social conditions (Figure 4 – figure supplement 4a; results were familywise-error cluster-corrected at p<0.05 using a cluster-defining threshold of z>2.3). For transparency, we show the sub-threshold activation map across the whole brain (z > 2) for the ‘constant’ contrasted between social and non-social condition (i.e. constant, contrast: social – non-social).

      For transparency, we additionally used an ROI-approach to test differences in activation patterns that correlated with the constant during the confidence phase; that is, we used the same ROI-approach as we did in the paper to avoid any biased test selection. We compared activation patterns between social and non-social conditions in the same ROI as used before; dmPFC (MNI-coordinate [x/y/z: 2,44,36] 16), bilateral pTPJ (70% probability anatomical mask; for reference see manuscript, page 23) and additionally compared activation patterns between conditions in bilateral IPLD (50% probability anatomical mask, 20). We did not find significantly different activation patterns between social and non-social conditions in any of these areas: dmPFC (confidence constant; paired t-test social vs non-social: t(23) = 0.06, p=0.96, [-36.7, 38.75]), bilateral TPJ (confidence constant; paired t-test social vs non-social: t(23) = -0.06, p=0.95, [-31, 29]), bilateral IPLD (confidence constant; paired t-test social vs non-social: t(23) = -0.58, p=0.57, [-30.3 17.1]).
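      For readers less familiar with the test reported above, the following is a minimal sketch of a paired t-test on per-participant values from two conditions. The data are made up for illustration and are not taken from the study; the function name is hypothetical. With 24 participants the degrees of freedom would be 23, matching the t(23) reported above.

```python
import math

def paired_t_statistic(condition_a, condition_b):
    """Return (t, degrees of freedom) for a paired t-test on matched samples."""
    if len(condition_a) != len(condition_b):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(differences)
    mean_diff = sum(differences) / n
    # Sample variance of the differences (n - 1 in the denominator).
    variance = sum((d - mean_diff) ** 2 for d in differences) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean_diff / standard_error, n - 1

# Hypothetical per-participant estimates for two conditions.
social = [1.0, 2.0, 3.0, 4.0]
non_social = [0.0, 2.0, 1.0, 3.0]

t_value, dof = paired_t_statistic(social, non_social)
print(f"t({dof}) = {t_value:.3f}")
```

      The p value and confidence interval additionally require the t distribution; in practice a statistics library such as SciPy (`scipy.stats.ttest_rel`) would typically be used for that step.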

      There were no meaningful activation patterns that differed between conditions in either areas commonly linked to attention (e.g. IPL) or in brain areas that were the focus of the study (dmPFC and pTPJ). Activation in dmPFC and pTPJ covaried with parametric effects such as the confidence that was set at the current and previous trial, and did not correlate with low-level differences such as attention. Hence, these results suggest that activation between conditions was captured better by parametric regressors such as the trial-wise interval setting, i.e. confidence, and are unlikely to be confounded by low-level processes that can be captured with univariate neural analyses.

      Analysis 2: RSA to test visual distinctiveness between social and non-social conditions

      We addressed the reviewer’s other comment directly by testing whether potential differences between conditions might arise due to a varying degree of visual distinctiveness in one stimulus set compared to the other stimulus set. We used RSA analysis to inspect potential differences in early visual processes that should be impacted by greater stimulus similarity within one condition. In other words, we tested whether the visual distinctiveness of one stimulus set was different to the visual distinctiveness of the other stimulus set. We used RSA analysis to compare the Exemplar Discriminability Index (EDI) between conditions in early visual areas. We compared the dissimilarity of neural activation related to the presentation of an identical stimulus across trials (diagonal in RSA matrix) with the dissimilarity in neural activation between different stimuli across trials (off-diagonal in RSA matrix). If stimuli within one stimulus set are very similar, then the difference between the diagonal and off-diagonal should be very small and less likely to be significant (i.e. similar diagonal and off-diagonal values). In contrast, if stimuli within one set are very distinct from each other, then the difference between the diagonal and off-diagonal should be large and likely to result in a significant EDI (i.e. different diagonal and off-diagonal values) (see Figure 4g for schematic illustration). Hence, if there is a difference in the visual distinctiveness between social and non-social conditions, then this difference should result in different EDI values for both conditions; visual distinctiveness between the stimulus sets can therefore be tested by comparing the EDI values between conditions within early visual processing areas. We used a Harvard-cortical ROI mask based on bilateral V1. Negative EDI values indicate that the same exemplars are represented more similarly in the neural V1 pattern than different exemplars. This analysis showed that there was no significant difference in EDI between conditions (Figure 4 – figure supplement 4b; EDI paired sample t-test: t(23) = -0.16, p=0.87, 95% CI [-6.7 5.7]).
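      The diagonal versus off-diagonal comparison described above can be sketched as follows. This is an illustrative toy example, not the analysis pipeline used in the study: the patterns are made up, the dissimilarity measure (one minus the Pearson correlation) is an assumption, and the sign convention (mean diagonal minus mean off-diagonal) is chosen only so that negative values mean "same exemplars are more similar", in line with the description above. Other implementations may define the index with the opposite sign.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long activation patterns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)

def split_data_rdm(patterns_half_a, patterns_half_b):
    """Dissimilarity (1 - r) between every exemplar in one data half and the other."""
    return [[1.0 - pearson(a, b) for b in patterns_half_b] for a in patterns_half_a]

def exemplar_discriminability_index(rdm):
    """Mean diagonal minus mean off-diagonal dissimilarity (assumed sign convention)."""
    n = len(rdm)
    diagonal = [rdm[i][i] for i in range(n)]
    off_diagonal = [rdm[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(diagonal) / len(diagonal) - sum(off_diagonal) / len(off_diagonal)

# Hypothetical voxel patterns for three exemplars, measured in two data halves.
distinct_a = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [1.0, 4.0, 2.0, 3.0]]
distinct_b = [[1.1, 2.0, 3.2, 3.9], [3.9, 3.1, 2.0, 1.2], [0.9, 4.1, 2.1, 3.0]]

# Exemplars that all evoke the same pattern cannot be told apart.
identical_a = [[1.0, 2.0, 3.0, 5.0]] * 3
identical_b = [[1.0, 2.0, 4.0, 5.0]] * 3

edi_distinct = exemplar_discriminability_index(split_data_rdm(distinct_a, distinct_b))
edi_identical = exemplar_discriminability_index(split_data_rdm(identical_a, identical_b))
print(edi_distinct, edi_identical)
```

      Under this convention a visually distinct stimulus set yields a clearly negative index, while a set of indistinguishable stimuli yields an index near zero; comparing the index between the two stimulus sets is then a test of relative visual distinctiveness.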

      We have further replicated results in V1 with a whole-brain searchlight analysis, averaging across both social and non-social conditions.

      In summary, by using a combination of univariate and multivariate analyses, we could test whether neural activation might be different when participants were presented with facial or fruit stimuli and whether these differences might confound observed learning differences between conditions. We did not find meaningful neural differences that were not accounted for with the regressors included in the GLM. Further, we did not find differences in the visual distinctiveness between the stimulus sets. Hence, these control analyses suggest that differences between social and non-social conditions might not arise because of differences in low-level processes but are instead more likely to develop when learning about social or non-social information.

      Moreover, we also examined behaviourally whether participants differed in the way they approached the social and non-social conditions. We tested whether there were initial biases prior to learning, i.e. before actually receiving information from either social or non-social information sources. Therefore, we tested whether participants have different prior expectations about the performance of social compared to non-social predictors. We compared the confidence judgments at the first trial of each predictor. We found that participants set confidence intervals very similarly in social and non-social conditions (Figure below). Hence, it did not seem to be the case that differences between conditions arose due to low-level differences in stimulus sets or prior differences in expectations about performances of social compared to non-social predictors. However, we can show that differences between conditions are apparent when updating one’s belief about social advisors or non-social cues and, as a consequence, in the way that confidence judgments are set across time.

      Figure. Confidence interval for the first encounter of each predictor in social and non-social conditions. There was no initial bias in predicting the performance of social or non-social predictors.

      Main text page 13:

      […] Additional control analyses show that neural differences between social and non-social conditions were not due to the visually different set of stimuli used in the experiment but instead represent fundamental differences in processing social compared to non-social information (Figure 4 – figure supplement 4). These results are shown in ROI-based RSA analysis and in whole-brain searchlight analysis. In summary, the univariate and multivariate analyses together demonstrate that dmPFC and pTPJ represent beliefs about social advisors that develop over a longer timescale and encode the identities of the social advisors.

      References

      1. Heyes, C. (2012). What’s social about social learning? Journal of Comparative Psychology 126, 193–202. 10.1037/a0025180.
      2. Chang, S.W.C., and Dal Monte, O. (2018). Shining Light on Social Learning Circuits. Trends in Cognitive Sciences 22, 673–675. 10.1016/j.tics.2018.05.002.
      3. Diaconescu, A.O., Mathys, C., Weber, L.A.E., Kasper, L., Mauer, J., and Stephan, K.E. (2017). Hierarchical prediction errors in midbrain and septum during social learning. Soc Cogn Affect Neurosci 12, 618–634. 10.1093/scan/nsw171.
      4. Frith, C., and Frith, U. (2010). Learning from Others: Introduction to the Special Review Series on Social Neuroscience. Neuron 65, 739–743. 10.1016/j.neuron.2010.03.015.
      5. Frith, C.D., and Frith, U. (2012). Mechanisms of Social Cognition. Annu. Rev. Psychol. 63, 287–313. 10.1146/annurev-psych-120710-100449.
      6. Grabenhorst, F., and Schultz, W. (2021). Functions of primate amygdala neurons in economic decisions and social decision simulation. Behavioural Brain Research 409, 113318. 10.1016/j.bbr.2021.113318.
      7. Lockwood, P.L., Apps, M.A.J., and Chang, S.W.C. (2020). Is There a ‘Social’ Brain? Implementations and Algorithms. Trends in Cognitive Sciences, S1364661320301686. 10.1016/j.tics.2020.06.011.
      8. Soutschek, A., Ruff, C.C., Strombach, T., Kalenscher, T., and Tobler, P.N. (2016). Brain stimulation reveals crucial role of overcoming self-centeredness in self-control. Sci. Adv. 2, e1600992. 10.1126/sciadv.1600992.
      9. Wittmann, M.K., Lockwood, P.L., and Rushworth, M.F.S. (2018). Neural Mechanisms of Social Cognition in Primates. Annu. Rev. Neurosci. 41, 99–118. 10.1146/annurev-neuro-080317-061450.
      10. Shafto, P., Goodman, N.D., and Frank, M.C. (2012). Learning From Others: The Consequences of Psychological Reasoning for Human Learning. Perspect Psychol Sci 7, 341–351. 10.1177/1745691612448481.
      11. McGuire, J.T., Nassar, M.R., Gold, J.I., and Kable, J.W. (2014). Functionally Dissociable Influences on Learning Rate in a Dynamic Environment. Neuron 84, 870–881. 10.1016/j.neuron.2014.10.013.
      12. Behrens, T.E.J., Woolrich, M.W., Walton, M.E., and Rushworth, M.F.S. (2007). Learning the value of information in an uncertain world. Nature Neuroscience 10, 1214–1221. 10.1038/nn1954.
      13. Meder, D., Kolling, N., Verhagen, L., Wittmann, M.K., Scholl, J., Madsen, K.H., Hulme, O.J., Behrens, T.E.J., and Rushworth, M.F.S. (2017). Simultaneous representation of a spectrum of dynamically changing value estimates during decision making. Nat Commun 8, 1942. 10.1038/s41467-017-02169-w.
      14. Allenmark, F., Müller, H.J., and Shi, Z. (2018). Inter-trial effects in visual pop-out search: Factorial comparison of Bayesian updating models. PLoS Comput Biol 14, e1006328. 10.1371/journal.pcbi.1006328.
      15. Wittmann, M., Trudel, N., Trier, H.A., Klein-Flügge, M., Sel, A., Verhagen, L., and Rushworth, M.F.S. (2021). Causal manipulation of self-other mergence in the dorsomedial prefrontal cortex. Neuron.
      16. Wittmann, M.K., Kolling, N., Faber, N.S., Scholl, J., Nelissen, N., and Rushworth, M.F.S. (2016). Self-Other Mergence in the Frontal Cortex during Cooperation and Competition. Neuron 91, 482–493. 10.1016/j.neuron.2016.06.022.
      17. Kappes, A., Harvey, A.H., Lohrenz, T., Montague, P.R., and Sharot, T. (2020). Confirmation bias in the utilization of others’ opinion strength. Nat Neurosci 23, 130–137. 10.1038/s41593-019-0549-2.
      18. Trudel, N., Scholl, J., Klein-Flügge, M.C., Fouragnan, E., Tankelevitch, L., Wittmann, M.K., and Rushworth, M.F.S. (2021). Polarity of uncertainty representation during exploration and exploitation in ventromedial prefrontal cortex. Nat Hum Behav. 10.1038/s41562-020-0929-3.
      19. Yu, Z., Guindani, M., Grieco, S.F., Chen, L., Holmes, T.C., and Xu, X. (2022). Beyond t test and ANOVA: applications of mixed-effects models for more rigorous statistical analysis in neuroscience research. Neuron 110, 21–35. 10.1016/j.neuron.2021.10.030.
      20. Mars, R.B., Jbabdi, S., Sallet, J., O’Reilly, J.X., Croxson, P.L., Olivier, E., Noonan, M.P., Bergmann, C., Mitchell, A.S., Baxter, M.G., et al. (2011). Diffusion-Weighted Imaging Tractography-Based Parcellation of the Human Parietal Cortex and Comparison with Human and Macaque Resting-State Functional Connectivity. Journal of Neuroscience 31, 4087–4100. 10.1523/JNEUROSCI.5102-10.2011.
      21. Yu, A.J., and Cohen, J.D. Sequential effects: Superstition or rational behavior? 8.
      22. Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., and Kriegeskorte, N. (2014). A Toolbox for Representational Similarity Analysis. PLoS Comput Biol 10, e1003553. 10.1371/journal.pcbi.1003553.
      23. Lockwood, P.L., Wittmann, M.K., Nili, H., Matsumoto-Ryan, M., Abdurahman, A., Cutler, J., Husain, M., and Apps, M.A.J. (2022). Distinct neural representations for prosocial and self-benefiting effort. Current Biology 32, 4172-4185.e7. 10.1016/j.cub.2022.08.010.
    1. Author response:

      Reviewer #1 (Public Review):

      This is an important and very well conducted study providing novel evidence on the role of zinc homeostasis for the control of infection with the intracellular bacterium S. typhimurium also disentangling the underlying mechanisms and providing clear evidence on the importance of spatio-temporal distribution of (free) zinc within the cell.

      We thank the reviewer for the positive comments.

      1) It would be important to provide more information on the genotype of mice.

      As suggested by the reviewer, we have added the detailed genotype of Slc30a1flagEGFP/+ and Slc30a1fl/flLysMCre mice to the revised supplementary Figure supplement 10.

      2) It is rather unlikely that C57Bl6 mice survive up to two weeks after i.p. injection of 1x10E5 bacteria.

      In response to the reviewer’s comment, we tested the survival rate in a group of our experimental animals and in C57BL/6 wild-type mice.

      The Salmonella strain was a gift from our friend, Professor Ge Bao-xue. We sent this strain for genetic characterisation and found 100% identity to Salmonella enterica Typhimurium, with many of the matching strains originating from poultry. One of them is Salmonella enterica subsp. enterica serovar Typhimurium strain MeganVac1 (Accession: CP112994.1), a live attenuated strain. We hope that this helps to explain why the mice survived the high infectious dose.

      Author response image 1.

      (A) Survival rate of Slc30a1fl/fl and Slc30a1fl/flLysMCre (n = 14-15/group) and (B) Survival rate of C57BL/6 wild type (n = 8) after Salmonella infection for two weeks. (C) A full-length sequence (1,478 bases) of the 16S rDNA gene of the Salmonella strain and (D) the sequencing electropherogram.

      3) To be sure that macrophages Slc30A1 fl/fl LysMcre mice really have an impaired clearance of bacteria it would be important to rule out an effect of Slc30A1 deletion of bacterial phagocytosis and containment (f.e. evaluation of bacterial numbers after 30 min of infection).

      As the reviewer advised, we repeated the experiment and measured the bacterial numbers after 30 min of infection (dashed line in A). The results show no statistically significant difference in bacterial numbers after 30 min between Slc30a1fl/flLysMCre and Slc30a1fl/fl BMDMs. Therefore, the reduction of bacterial numbers after 24 hours is due to the impairment of intracellular pathogen-killing capacity, as the reviewer pointed out.

      Author response image 2.

      (A) Time course of the intracellular pathogen-killing capacity of Salmonella-infected Slc30a1fl/flLysMCre and Slc30a1fl/fl BMDMs measured in colony-forming units per ml (n = 5). (B) Fold change in Salmonella survival (CFU/mL) at different time points from A. (C) Representative images of Salmonella colonies on solid agar medium at 24 hours. Data are represented as mean ± SEM. P values were determined using 2-tailed unpaired Student’s t-test. *P<0.05, **P<0.01, and ns, not significant.

      4) Does the addition of zinc to macrophages negatively affect iNOS transcription as previously observed for the divalent metal iron and is a similar mechanism also employed (CEBPß/NF-IL6 modulation) (Dlaska M et al. J Immunol 1999)?

      The reviewer has raised an important point here, since free zinc also plays a role at multiple levels of cellular signaling (Kambe et al., 2015). Dlaska and Weiss reported that NF-IL6, a protein responsible for iNOS transcription, is negatively regulated by iron perturbation under IFNg/LPS stimulation in macrophages (Dlaska and Weiss, 1999). In line with the reviewer’s suggestion, our results showed that zinc supplementation decreases iNOS expression in macrophages after Salmonella infection, suggesting that free zinc might play a role in iNOS regulation.

      However, in Slc30a1fl/flLysMCre macrophages, despite increased intracellular free zinc, the lack of Slc30a1 also induces Mt1, a zinc reservoir, which might negatively affect NO production (Schwarz et al., 1995) or alternatively inhibit iNOS through the NF-kB pathway (Cong et al., 2016), as reported by previous studies. Therefore, we could not rule out the possibility that defects in Salmonella clearance due to iNOS/NO inhibition may be caused by a complex combination of excess free zinc and overexpression of the zinc reservoir. To test this hypothesis, further studies using a specific targeted model, for example Mtfl/fliNOSfl/flLysMCre, might be needed to investigate the precise mechanism.

      Author response image 3.

      RT-qPCR analysis of mRNA encoding Nos2 in BMDMs after infection with Salmonella or Salmonella plus ZnSO4 (20 μM) for 4 h.

      Reference:

      Dlaska M, Weiss G. 1999. Central role of transcription factor NF-IL6 for cytokine and iron-mediated regulation of murine inducible nitric oxide synthase expression. The Journal of Immunology. 162:6171-6177, PMID: 10229861

      Kambe T, Tsuji T, Hashimoto A, Itsumura N. 2015. The physiological, biochemical, and molecular roles of zinc transporters in zinc homeostasis and metabolism. Physiological Reviews. 95:749-784. https://doi.org/10.1152/physrev.00035.2014, PMID: 26084690

      Schwarz MA, Lazo JS, Yalowich JC, Allen WP, Whitmore M, Bergonia HA, Tzeng E, Billiar TR, Robbins PD, Lancaster JR Jr, et al. 1995. Metallothionein protects against the cytotoxic and DNA-damaging effects of nitric oxide. Proceedings of the National Academy of Sciences of the United States of America. 92:4452-4456. https://doi.org/10.1073/pnas.92.10.4452, PMID: 7538671

      Cong W, Niu C, Lv L, Ni M, Ruan D, Chi L, Wang Y, Yu Q, Zhan K, Xuan Y, Wang Y, Tan Y, Wei T, Cai L, Jin L. 2016. Metallothionein prevents age-associated cardiomyopathy via inhibiting NF-κB pathway activation and associated nitrative damage to 2-OGD. Antioxidants & Redox Signaling. 25:936-952. https://doi.org/10.1089/ars.2016.6648, PMID: 27477335

      5) How does Zinc or TPEN supplementation to bacteria in LB medium affect the log growth of Salmonella?

      We found that zinc supplementation at both low (20 µM) and high (640 µM) concentrations negatively affects Salmonella growth in broth culture medium, especially during log phase and stationary phase, whereas TPEN (20 µM) supplementation does not. This indicates that high-zinc conditions occurring at the cellular level, such as within phagosomes (Botella et al., 2011), can limit bacterial growth.

      Author response image 4.

      Growth curve (optical density, OD 600 nm) of Salmonella in LB medium at different concentrations of ZnSO4 and/or TPEN. Bar graph indicating Salmonella growth at specific time points. Each value is expressed as the mean of triplicates for each test, and P values were determined using a 2-tailed unpaired Student’s t-test. *P<0.05, **P<0.01, ***P<0.001 and ns, not significant.

      Reference:

      Botella H, Peyron P, Levillain F, Poincloux R, Poquet Y, Brandli I, Wang C, Tailleux L, Tilleul S, Charrière GM, Waddell SJ, Foti M, Lugo-Villarino G, Gao Q, Maridonneau-Parini I, Butcher PD, Castagnoli PR, Gicquel B, de Chastellier C, Neyrolles O. 2011. Mycobacterial P(1)-type ATPases mediate resistance to zinc poisoning in human macrophages. Cell Host Microbe. 10:248-259. https://doi.org/10.1016/j.chom.2011.08.006, PMID: 21925112

      Reviewer #2 (Public Review):

      This paper explores the importance of zinc metabolism in host defense against the intracellular pathogen Salmonella Typhimurium. Using conditional mice with a deletion of the Slc30a1 zinc exporter, the authors show a critical role for zinc homeostasis in the pathogenesis of Salmonella. Specifically, mice deficient in the Slc30a1 gene in LysM+ myeloid cells are hypersusceptible to Salmonella infection, and their macrophages show altered phenotypes in response to Salmonella. The study adds important new information on the role metal homeostasis plays in microbe-host interactions. Despite the strengths, the manuscript has some weaknesses. The authors conclude that lack of slc30a1 in macrophages impairs nos2-dependent anti-Salmonella activity. However, this idea is not tested experimentally. In addition, the research presented on Mt1 is preliminary. The text related to Figure 7 could be deleted without affecting the overall impact of the findings.

      We thank the reviewer for his/her positive comments and constructive suggestions.

      Reviewer #3 (Public Review):

      Na-Phatthalung et al observed that transcripts of the zinc transporter Slc30a1 were upregulated in Salmonella-infected murine macrophages and in human primary macrophages; therefore, they sought to determine if, and how, Slc30a1 could contribute to the control of bacterial pathogens. Using a reporter mouse, the authors show that Slc30a1 expression increases in a subset of peritoneal and splenic macrophages of Salmonella-infected animals. Specific deletion of Slc30a1 in LysM+ cells resulted in a significantly higher susceptibility of mice to Salmonella infection which, counter to the authors’ conclusions, is not explained by the small differences in the bacterial burden observed in vivo and in vitro. Although loss of Slc30a1 resulted in reduced iNOS levels in activated macrophages, the study lacks experiments that mechanistically link loss of NO-mediated bactericidal activity to Salmonella survival in Slc30a1 deficient cells. The additional deletion of Mt1, another zinc binding protein, resulted in even lower nitrite levels of activated macrophages but only modest effects on Salmonella survival. By combining genetic approaches with molecular techniques that measure variables in macrophage activation and the labile zinc pool, Na-Phatthalung et al successfully demonstrate that Slc30a1 and metallothionein 1 regulate zinc homeostasis in order to modulate effective immune responses to Salmonella infection. The authors have done a lot of work and the information that Slc30a1 expression in macrophages contributes to control of Salmonella infection in mice is a new finding that will be of interest to the field. Whether the mechanism by which SLC30A1 controls bacterial replication and/or lethality of infection involves nitric oxide production by macrophages remains to be shown.

      We very much appreciate the reviewer’s detailed evaluation and suggestions. The manuscript has been revised thoroughly according to the reviewer’s advice.

    1. Author response:

      Reviewer #1 (Public Review):

      In this study, Girardello et al. use proteomics to reveal the membrane tension sensitive caveolin-1 interactome in migrating cells. The authors use EM and surface rendering to demonstrate that caveolae formed at the rear of migrating cells are complex membrane-linked multilobed structures, and they devise a robust strategy to identify caveolin-1 associated proteins using APEX2-mediated proximity biotinylation. This important dataset is further validated using proximity ligation assays to confirm key interactions, and follows up with an interrogation of a surprising relationship between caveolae and RhoGTPase signalling, where caveolin-1 recruits ROCK1 under high membrane tension conditions, and ROCK1 activity is required to reform caveolae upon reversion to isotonic solution. However, caveolin-1 recruits the RhoA inactivator ARHGAP29 when membrane tension is low and ARHGAP29 overexpression leads to disassembly of caveolae and reduced cell motility. This study builds on previous findings linking caveolae to positive feedback regulation of RhoA signalling, and provides further evidence that caveolae serve to drive rear retraction in migration but also possess an intrinsic brake to limit RhoA activation, leading the authors to suggest that cycles of caveolae assembly and disassembly could thereby be central to establish a stable cell rear for persistent cell migration.

      A major strength of the manuscript is the robust proteomic dataset. The experimental set up is well defined and mostly well controlled, and there is good internal validation in the high abundance of core caveolar proteins in low membrane tension (isotonic) conditions, and their absence under high membrane tension (brief hypo-osmotic shock) conditions, correlating very well with previous findings. The data could however be better presented to show where statistically robust changes occur, and supplementary information should include a table showing abundance. It's very good to see a link to PRIDE, providing a useful resource for the community.

      We thank the reviewer for the positive feedback. We have included the outputs from the search engine in Supplementary File 1.

      The authors detail several known interactions and their mechanosensitivity, but also report new interactors of caveolin-1. Several mechanosensitive interactions of caveolin-1 take place at the cell rear, but others are more diffuse across the cell looking at the PLA data (e.g. FLN1, CTTN, HSPB1; Figure 4A-F and Figure 4 supplement 1). It is interesting to speculate that those at the cell rear are involved in caveolae, whilst others are linked specifically to caveolin-1 (e.g. dolines). PLA or localisation analysis with Cavin1/PTRF may be able to resolve this and further specify caveolae versus non-caveolae mechanosensitive interactions.

      We thank the reviewer for this interesting idea. It is true that many if not most proteins we identified to be associated with Cav1 are not restricted to the cell rear. To analyse to what extent the identified proteins interact with Cav1 at the rear we reanalysed our PLA data for some of the antibody combinations we looked at. This new analysis is now shown in Fig 5G. As expected, for Cav1/PTRF and Cav1/EHD2 most PLA dots (70-80%) were found at the rear. This rear bias is also evident from the representative images we show in the Figure panels 5A and 5E. On the contrary, much fewer PLA dots (~40%) were rear-localised for Cav1/CTTN and Cav1/FLNA antibody combinations. This reflects the much broader cellular distribution of these proteins compared to the core caveolae proteins, and might suggest that there are generally few links between caveolae and cortical actin. However, it is also possible that such links/interactions are more difficult to detect using PLA (because of the extended distance between caveolae and the actin cortex, or because of steric constraints).
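
      One way such a rear-localised fraction could be computed is sketched below. This is an illustrative assumption, not the quantification used for the figure: it treats a PLA dot as "rear" when it lies behind the cell centroid along the migration direction, and the function name `rear_fraction` and all coordinates are hypothetical.

```python
def rear_fraction(dots, centroid, direction):
    """Fraction of dots lying behind the centroid along the migration direction.

    dots      -- list of (x, y) dot coordinates
    centroid  -- (x, y) of the cell centroid
    direction -- (dx, dy) vector pointing towards the cell front
    """
    if not dots:
        raise ValueError("no dots to classify")
    cx, cy = centroid
    dx, dy = direction
    # A negative projection onto the migration direction means the dot
    # sits behind the centroid, i.e. in the rear half of the cell.
    rear = sum(1 for x, y in dots if (x - cx) * dx + (y - cy) * dy < 0)
    return rear / len(dots)


# Hypothetical coordinates for a cell migrating in the +x direction.
example_dots = [(-1.0, 0.0), (-2.0, 1.0), (1.0, 0.0), (-3.0, -1.0)]
print(rear_fraction(example_dots, centroid=(0.0, 0.0), direction=(1.0, 0.0)))
```

      A stricter definition, for example counting only dots within a fixed distance of the rear edge, would give lower fractions for all antibody pairs, so the definition of "rear" matters when comparing values across studies.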

      The Cav1/ARHGAP29 influence on YAP signalling is interesting, but appears to be quite isolated from the rest of the manuscript. Does overexpression of ARHGAP29 influence YAP signalling and/or caveolar protein expression/Cav1pY14?

      Our data and published work originally prompted us to speculate that there is a potential functional link between Cav1, YAP, and ARHGAP29. In an attempt to address this we have performed several Western blots on cell lysates from cells overexpressing ARHGAP29. We did not see major changes in Cav1 Y14 phosphorylation levels in cells overexpressing ARHGAP29, and YAP and pYAP levels also remained unchanged (not shown). In addition, based on previous literature 1,2 we expected to see an effect on ARHGAP29 mRNA levels and YAP target gene transcripts in Cav1 siRNA transfected cells. To our surprise, the mRNA levels of three independent YAP target genes and ARHGAP29 were unchanged in Cav1 siRNA treated cells (this is now shown in Figure 6 Figure Supplement 1). Our data therefore suggest that in RPE1 cells, the connection between Cav1 and ARHGAP29 is independent of YAP signalling, and that the increase in ARHGAP29 protein levels observed in Cav1 siRNA cells is due to some unknown post-translational mechanism.

      ARHGAP29 and RhoA/ROCK1 related observations are very interesting and potentially really important. However, the link between ARHGAP29 and caveolae is not well established (other than in proteomic data). PLA or FRET could help establish this.

      We agree that the physical and functional link between caveolae (or Cav1) and ARHGAP29 was not well worked out in the original manuscript. In an attempt to address this we have performed PLA assays in GFP-ARHGAP29 transfected cells (as we did not find a suitable ARHGAP29 antibody that works reliably in IF) using anti-Cav1 and anti-GFP antibodies. The PLA signal we obtained for Cav1 and ARHGAP29 was not significantly different to control PLA experiments. There was very little PLA signal to start with. This is not surprising given that ARHGAP29 localisation is mostly diffuse in the cytoplasm, whilst Cav1 is concentrated at the rear. In addition, in cases where we do see ARHGAP29 localisation at the cell cortex, Cav1 tends to be absent (this is now shown in Figure 6 – Figure Supplement 2E). In other words, with the tools we have available, we see little colocalization between Cav1 and ARHGAP29 at steady state. Altogether we speculate that ARHGAP29, through its negative effect on RhoA, flattens caveolae at the membrane or interferes with caveolae assembly at these sites.

      This of course prompts the question why ARHGAP29 was identified in the Cav1 proteome with such specificity and reproducibility in the first place? This can be explained by the way APEX2 labeling works. Proximity biotinylation with APEX2 is extremely sensitive and restricted to a labelling radius of ~20 nm 3. The labeling reaction is conducted on live and intact cells at room temperature for 1 min. Although 1 min appears short, dynamic cellular processes occur at the time scale of seconds and are ongoing during the labelling reaction. It is conceivable that within this 1 min time frame, ARHGAP29 cycles on and off the rear membrane (kiss and run). This allows ARHGAP29 to be biotinylated by Cav1-APEX2, resulting in its identification by MS. We have included this in the discussion section.

      The relationship between ARHGAP29 and RhoA signalling is not well defined. Is GAP activity important in determining the effect on migration and caveolae formation? What is the effect on RhoA activity? Alternatively, the authors could investigate YAP dependent transcriptional regulation downstream of overexpression.

      We have addressed this point using overexpression and siRNA transfections. We overexpressed ARHGAP29 or ARHGAP29 lacking its GAP domain and performed WB analysis against pMLC (which is a commonly used and reliable readout for RhoA and myosin-II activity). Much to our surprise, overexpression of ARHGAP29 increased (rather than decreased) pMLC levels, partially in a GAP-dependent manner (see Author response image 1). This is puzzling, as ARHGAP29 is expected to reduce RhoA-GTP levels, which in turn is expected to reduce ROCK activity and hence pMLC levels. In addition, and also surprisingly, siRNA-mediated silencing of ARHGAP29 did not significantly change pMLC levels. By contrast, pMLC levels were strongly reduced in Cav1 siRNA treated cells (this is shown in Fig. 6A and 6B in the revised manuscript). These new data underscore the important role of caveolae in the control of myosin-II activity, but do not allow us to draw any firm conclusions about the role of ARHGAP29 at the cell rear.

      Author response image 1.

      Overexpression of ARHGAP29 increases, rather than reduces, pMLC in RPE1 cells.

      We are uncertain as to how to interpret the ARHGAP29 overexpression data presented in Author response image 1 and therefore decided not to include it in the manuscript. One possibility is that inactivation of RhoA below a certain critical threshold causes other mechanisms to compensate. For instance, the activity of alternative MLC kinases such as MLCK could be enhanced under these conditions. Another possibility is that ARHGAP29 controls MLC phosphorylation indirectly. For instance, it has been shown that ARHGAP29 promotes actin destabilization through inactivating LIMK/cofilin signalling 1. In agreement with this, we find that overexpression of ARHGAP29 reduces p-cofilin (serine 3) levels (see Author response image 2). Since cofilin and MLC crosstalk 4, it is possible that increased pMLC levels are the result of a feedback loop that compensates for the effect of actin depolymerisation. This is now discussed in the discussion section. Whichever the case, we hope the reviewers understand that deeper mechanistic insight into the intricate mechanisms of Rho signalling at the cell rear is beyond the scope of this manuscript.

      Author response image 2.

      Overexpression of ARHGAP29 reduces p-cofilin levels in RPE1 cells.

      Reviewer #2 (Public Review):

      Girardello et al investigated the composition of the molecular machinery of caveolae governing their mechano-regulation in migrating cells. Using live cell imaging and RPE1 cells, the authors provide a spatio-temporal analysis of cavin-3 distribution during cell migration and reveal that caveolae are preferentially localized at the rear of the cell in a stable manner. They further characterize these structures using electron tomography and reveal an organization into clusters connected to the cell surface. By performing a proteomic approach, they address the interactome of caveolin-1 proteins upon mechanical stimulation by exposing RPE1 cells to hypo-osmotic shock (which aims to increase cell membrane tension) or not as a control condition. The authors identify over 300 proteins, notably proteins related to actin cytoskeleton and cell adhesion. These results were further validated in cellulo by interrogating protein-protein interactions using proximity ligation assays and hypo-osmotic shock. These experiments confirmed previous data showing that high membrane tension induces caveolae disassembly in a reversible manner. Eventually, based on literature and on the results collected by the proteomic analysis, authors investigated more deeply the molecular signaling pathway controlling caveolae assembly upon mechanical stimuli. First, they confirm the targeting of ROCK1 with Caveolin-1 and the implication of the kinase activity for caveolae formation (at the rear of the cell). Then, they show that RhoGAP ARHGAP29, a factor newly identified by the proteomic analysis, is also implicated in caveolae mechano-regulation likely through YAP protein and found that overexpression of RhoGAP ARHGAP29 affects cell motility. Overall, this paper interrogated the role of membrane tension in caveolae located at the rear of the cell and identified a new pathway controlling cell motility.

      Strengths:

      Using a proximity-based proteomic assay, the authors reveal the protein network interacting with caveolae upon mechanical stimuli. This approach is elegant and allows the identification of a substantial new set of factors involved in the mechano-regulation of caveolin-1, some of which have been verified directly in the cell by PLA. This study provides a compelling set of data on the interactions between caveolae and their cortical network, which was so far ill-characterized.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      The methodology demonstrating an impact of membrane tension is not precise enough to directly assess a direct role on caveolae at a subcellular scale, that is between the front and the rear of the cell. First, a better characterization of the "front-rear" cellular model is encouraged.

      We agree with the reviewer that a quantitative analysis of the caveolae front-rear polarity would strengthen our conclusions. To address this, we have analysed the localisation of Cav1 and cavins in detail and in a large pool of cells, both in fixed and live cells. Our quantification clearly shows that Cav1 and cavins are enriched at the cell rear. This is now shown in Figure 1 and Figure 1 - Figure Supplement 1. To demonstrate that Cav1/cavins are truly rear-localised we analysed live migrating cells expressing tagged Cav1 or cavins. This analysis, which was performed on several individual time lapse movies, showed that caveolae rear localisation is remarkably stable (e.g. Figure 1C and 1D). We also present novel data panels and movies showing caveolae dynamics during rear retractions, in dividing cells, and in cells that polarise de novo. This new data is now described in the first paragraph of the results section.

      Secondly, the authors frequently present osmotic shock as a "high membrane tension" stimulus. While osmotic shock is widely used in the field, this study is focused only on caveolae localized at the rear of the cell, and it remains unclear how a global mechanical stimulus triggered by an osmotic shock could mimic a local stimulus.

      We agree with the reviewer that osmotic shock will cause a global increase in membrane tension and therefore is only of limited value to understand how membrane tension is regulated at the rear, and how caveolae respond to such a local stimulus. It was not our aim nor is it our expertise to address such questions. To answer this, sophisticated optogenetic approaches or localised membrane tension measurements (e.g. through the use of the Flipper-TR probe) are needed. It is beyond the scope of this manuscript to perform such experiments. However, given the strong enrichment of caveolae at the cell rear, we believe it is justified to propose that the changes we observe in the proteome do (mostly) reflect changes in caveolae at the rear. We have now included several quantifications on fixed cells, live cells, and PLA assays to support that caveolae are highly enriched at the rear. In addition, and importantly, a recent preprint by the Roux lab shows that membrane tension gradients indeed exist in many migrating and non-migrating cells 5. Using very similar hypotonic shock assays, the Caswell lab also showed that low membrane tension at the rear is required for caveolae formation 6. We have included a section in the discussion in which we elaborate on how membrane tension is controlled in migrating cells, and how it might regulate caveolae rear localisation.

      In the present case, the extent to which this mechanical stress physiologically mimics the mechanical forces applied at the rear of a migrating cell remains unknown.

      This is true. Our study does not address the nature of mechanical forces at the cell rear. This is a complex subject that is technically challenging to address, and therefore is beyond the scope of this manuscript.

      Some images are not satisfying to fully support the conclusions of the article.

      We agree that some of the images, in particular the ones presented for the PLA assays, do not always show a clear rear localisation of caveolae. We have explained above why this is the case. We hope that our new quantitative measurements, movies and figure panels address the reviewer’s concern.

      At this stage, an unbiased quantitative spatio-temporal analysis of caveolae upon well-defined mechanical stimuli is also needed.

      These are all very good points that were previously addressed beautifully by the Caswell group 6. To address this in part in our RPE1 cell system, we imaged RPE1 cells exposed to the ROCK inhibitor Y27632 (see Author response image 3). The data shows that cell rear retraction is impeded in response to ROCK inhibition, which is in line with several previous reports. Cavin-1 remained mostly associated with the cell rear, although the distribution appeared more diffuse. We believe this data does not add much new insight into how caveolae function at the rear, and hence was not included in the manuscript.

      Author response image 3.

      Effect of ROCK inhibition on cavin1 rear localisation and rear retraction. Cells were imaged one hour after the addition of Y27632.

      Cells in the images, in particular in Figure 1, are difficult to see. Differences in signal-to-noise ratio between cell areas could introduce a bias. Since caveolae density and localization are inconsistent between Figures, more solid illustrations are needed alongside quantitative analysis.

      As mentioned above, we have carefully analysed the localisation of caveolae in fixed cells (using Cav1 and cavin1 antibodies as well as Cav1 and cavin fusion proteins) and in live cells transfected with various different caveolae proteins. The analysis clearly demonstrates an enrichment of caveolae at the rear (Figure 1 and Figure 1 – Figure Supplement 1). Our tomography and TEM data support this as well (Figure 2).

      References:

      1. Qiao Y, Chen J, Lim YB, et al. YAP Regulates Actin Dynamics through ARHGAP29 and Promotes Metastasis. Cell reports. 2017;19(8):1495-1502.

      2. Rausch V, Bostrom JR, Park J, et al. The Hippo Pathway Regulates Caveolae Expression and Mediates Flow Response via Caveolae. Curr Biol. 2019;29(2):242-255 e246.

      3. Hung V, Udeshi ND, Lam SS, et al. Spatially resolved proteomic mapping in living cells with the engineered peroxidase APEX2. Nat Protoc. 2016;11(3):456-475.

      4. Wiggan O, Shaw AE, DeLuca JG, Bamburg JR. ADF/cofilin regulates actomyosin assembly through competitive inhibition of myosin II binding to F-actin. Dev Cell. 2012;22(3):530-543.

      5. Juan Manuel García-Arcos AM, Julissa Sánchez Velázquez, Pau Guillamat, Caterina Tomba, Laura Houzet, Laura Capolupo, Giovanni D’Angelo, Adai Colom, Elizabeth Hinde, Charlotte Aumeier, Aurélien Roux. Actin dynamics sustains spatial gradients of membrane tension in adherent cells. bioRxiv 2024.07.15.603517. 2024.

      6. Hetmanski JHR, de Belly H, Busnelli I, et al. Membrane Tension Orchestrates Rear Retraction in Matrix-Directed Cell Migration. Dev Cell. 2019;51(4):460-475 e410.

      7. Tsai TY, Collins SR, Chan CK, et al. Efficient Front-Rear Coupling in Neutrophil Chemotaxis by Dynamic Myosin II Localization. Dev Cell. 2019;49(2):189-205 e186.

      8. Mueller J, Szep G, Nemethova M, et al. Load Adaptation of Lamellipodial Actin Networks. Cell. 2017;171(1):188-200 e116.

      9. De Belly H, Yan S, Borja da Rocha H, et al. Cell protrusions and contractions generate long-range membrane tension propagation. Cell. 2023.

      10. Matthaeus C, Sochacki KA, Dickey AM, et al. The molecular organization of differentially curved caveolae indicates bendable structural units at the plasma membrane. Nat Commun. 2022;13(1):7234.

      11. Sinha B, Koster D, Ruez R, et al. Cells respond to mechanical stress by rapid disassembly of caveolae. Cell. 2011;144(3):402-413.

      12. Lieber AD, Schweitzer Y, Kozlov MM, Keren K. Front-to-rear membrane tension gradient in rapidly moving cells. Biophysical journal. 2015;108(7):1599-1603.

      13. Shi Z, Graber ZT, Baumgart T, Stone HA, Cohen AE. Cell Membranes Resist Flow. Cell. 2018;175(7):1769-1779 e1713.

      14. Grande-Garcia A, Echarri A, de Rooij J, et al. Caveolin-1 regulates cell polarization and directional migration through Src kinase and Rho GTPases. The Journal of cell biology. 2007;177(4):683-694.

      15. Grande-Garcia A, del Pozo MA. Caveolin-1 in cell polarization and directional migration. Eur J Cell Biol. 2008;87(8-9):641-647.

      16. Ludwig A, Howard G, Mendoza-Topaz C, et al. Molecular composition and ultrastructure of the caveolar coat complex. PLoS biology. 2013;11(8):e1001640.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors set out to extend modeling of bispecific engager pharmacology through explicit modelling of the search of T cells for tumour cells, the formation of an immunological synapse and the dissociation of the immunological synapse to enable serial killing. These features have not been included in prior models and their incorporation may improve the predictive value of the model.

      Thank you for the positive feedback.

      The model provides a number of predictions that are of potential interest- that loss of CD19, the target antigen, to 1/20th of its initial expression will lead to escape and that the bone marrow is a site where the tumour cells may have the best opportunity to develop loss variants due to the limited pressure from T cells.

      Thank you for the positive feedback.

      A limitation of the model is that adhesion is only treated as a 2D implementation of the blinatumomab mediated bridge between T cell and B cells- there is no distinct parameter related to the distinct adhesion systems that are critical for immunological synapse formation. For example, CD58 loss from tumours is correlated with escape, but it is not related to the target, CD19. While they begin to consider the immunological synapse, they don't incorporate adhesion as distinct from the engager, which is almost certainly important.

      We agree that adhesion molecules play critical roles in cell-cell interaction. In our model, we assumed that these adhesion molecules are constant (that is, they do not differ across cell populations). This assumption allowed us to focus on the BiTE-mediated interactions.

      Revision: To clarify this point, we added a couple of sentences in the manuscript.

      “Adhesion molecules such as CD2-CD58, integrins and selectins, are critical for cell-cell interaction. The model did not consider specific roles played by these adhesion molecules, which were assumed constant across cell populations. The model performed well under this simplifying assumption”.

      In addition, we acknowledged the fact that “synapse formation is a set of precisely orchestrated molecular and cellular interactions. Our model merely investigated the components relevant to BiTE pharmacologic action and can only serve as a simplified representation of this process”.

      While the random search is a good first approximation, T cell behaviour is actually guided by stroma and extracellular matrix, which are non-isotropic. In a lymphoid tissue the stroma is optimised for a search that can be approximated as Brownian, or more accurately, a correlated random walk, but in other tissues, particularly tumours, the Brownian search is not a good approximation and other models have been applied. It would be interesting to look at observations from bone marrow or other sites to determine the best approximation for the search related to BiTE targets.

      We agree that the tissue stromal factors greatly influence the patterns of T cell searching strategy. Our current model considered Brownian motion as a good first approximation for two reasons: 1) we define tissues as homogeneous compartments to attain unbiased evaluations of factors that influence BiTE-mediated cell-cell interaction, such as T cell infiltration, T: B ratio, and target expression. The stromal factors were not considered in the model, as they require spatially resolved tissue compartments to represent the gradients of stromal factors; 2) our model was primarily calibrated against in vitro data obtained from a “well-mixed” system that does not recapitulate specific considerations of tissue stromal factors. We did not obtain tissue-specific data to support the prediction of T cell movement. This is under current investigation in our lab. Therefore, we are cautious about assuming different patterns of T cell movement in the model when translating into in vivo settings. We acknowledged the limitation of our model for not considering the more physiologically relevant T-cell searching strategies.

      Revision: In the Discussion, we added a limitation of our model: “We assumed Brownian motion in the model as a good first approximation of T cell movement. However, T cells often take other more physiologically relevant searching strategies closely associated with many stromal factors. Because of these stromal factors, the cell-cell encounter probabilities would differ across anatomical sites.”
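
      The contrast the reviewer draws between a Brownian search and a correlated random walk can be made concrete with a minimal sketch. It is only an illustration under stated assumptions, not the authors' agent-based model: the function name, the unit step length, the two-dimensional geometry and the Gaussian turning angle (`turn_sigma`) are all hypothetical choices made here.

```python
import math
import random

def mean_squared_displacement(n_walkers, n_steps, turn_sigma=None, seed=0):
    """Mean squared displacement of 2-D walkers taking unit-length steps.

    turn_sigma=None  -> uncorrelated (Brownian-like) walk: a fresh random
                        heading is drawn at every step.
    turn_sigma=value -> correlated random walk: the heading changes by a
                        Gaussian turning angle with that standard deviation.
    All names and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_walkers):
        x = y = 0.0
        heading = rng.uniform(0.0, 2.0 * math.pi)
        for _ in range(n_steps):
            if turn_sigma is None:
                heading = rng.uniform(0.0, 2.0 * math.pi)
            else:
                heading += rng.gauss(0.0, turn_sigma)
            x += math.cos(heading)
            y += math.sin(heading)
        total += x * x + y * y
    return total / n_walkers

brownian_msd = mean_squared_displacement(500, 100)
correlated_msd = mean_squared_displacement(500, 100, turn_sigma=0.3)
print(f"Brownian-like MSD:   {brownian_msd:.1f}")
print(f"Correlated walk MSD: {correlated_msd:.1f}")
```

      With the same step length and number of steps, the persistent walker covers far more ground, which is why encounter probabilities derived from a Brownian assumption may not transfer to tissues in which stromal structure imposes directional persistence.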

      Reviewer #3 (Public Review):

      Liu et al. combined mechanistic modeling with in vitro experiments and data from a clinical trial to develop an in silico model to describe response of T cells against tumor cells when bi-specific T cell engager (BiTE) antigens, a standard immunotherapeutic drug, are introduced into the system. The model predicted responses of T cell and target cell populations in vitro and in vivo in the presence of BiTEs where the model linked molecular level interactions between BiTE molecules, CD3 receptors, and CD19 receptors to the population kinetics of the tumor and the T cells. Furthermore, the model predicted tumor killing kinetics in patients and offered suggestions for optimal dosing strategies in patients undergoing BiTE immunotherapy. The conclusions drawn from this combined approach are interesting and are supported by experiments and modeling reasonably well. However, the conclusions can be tightened further by making some moderate to minor changes in their approach. In addition, there are several limitations in the model which deserve some discussion.

      Strengths

      A major strength of this work is the ability of the model to integrate processes from the molecular scales to the populations of T cells, target cells, and the BiTE antibodies across different organs. A model of this scope has to contain many approximations and thus the model should be validated with experiments. The authors did an excellent job in comparing the basic and the in vitro aspects of their approach with in vitro data, where they compared the numbers of engaged target cells with T cells as the numbers of the BiTE molecules, the ratio of effector and target cells, and the expressions of the CD3 and CD19 receptors were varied. The agreement of the model with the data was excellent in most cases, which led to several mechanistic conclusions. In particular, the study found that target cells with lower CD19 expressions escape the T cell killing.

      The in vivo extension of the model showed reasonable agreements with the kinetics of B cell populations in patients where the data were obtained from a published clinical trial. The model explained differences in B cell population kinetics between responders and non-responders and found that the differences were driven by the differences in the T cell numbers between the groups. The ability of the model to describe the in vivo kinetics is promising. In addition, the model leads to some interesting conclusions, e.g., the model shows that the bone marrow harbors tumor growth during the BiTE treatment. The authors then used the model to propose an alternate dosage scheme for BiTEs that needed a smaller dose of the drug.

      Thank you for the positive comments.

      Weaknesses

      There are several weaknesses in the development of the model. Multiscale models of this nature contain parameters that need to be estimated by fitting the model with data. Some of these parameters are associated with model approximations or not measured in experiments. Thus, a common practice is to estimate parameters with some 'training data' and then test model predictions using 'test data'. Though Supplementary file 1 provides values for some of the parameters that appeared to be estimated, it was not clear which datasets were used for training and which for testing. The confidence intervals of the estimated parameters and the sensitivity of the proposed in vivo dosage schemes to parameter variations were unclear.

      We agree with the reviewer on the model validation.

      Revision: To ensure reproducibility, we summarized model assumptions and parameter values/sources in the supplementary file 1. To mimic tumor heterogeneity and the evolution process, we applied stochastic agent-based models, which are challenging to optimize globally against the data. The majority of key parameters were obtained or derived from the literature. Details have been provided in the response to Reviewer 3 - Question 1. In our modeling process, we manually optimized the sensitive coefficient (β) for the base model using pilot in-vitro data and the sensitive coefficient (β) for the in-vivo model by re-calibrating against the in-vitro data at a low BiTE concentration. BiTE concentrations in patients (mostly < 2 ng/ml) are only relevant to the lower bound of the concentration range we investigated in vitro (0.65-2000 ng/ml). We have added some clarification/limitation of this approach in the text (details are provided in the following question). We understand the concerns, but the nature of agent-based modeling prevents us from performing global optimization.
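
      Where global optimization is impractical, a local one-at-a-time sensitivity analysis is one common way to address the reviewer's question about sensitivity to parameter variations. The sketch below is a generic illustration under stated assumptions: `normalized_sensitivities`, `toy_output` and its two parameters are hypothetical names introduced here, and the toy function is a deterministic stand-in, not the authors' stochastic agent-based model.

```python
def normalized_sensitivities(model, params, rel_step=0.01):
    """Central-difference estimate of (dy/y) / (dp/p) for each parameter.

    `model` is any callable taking keyword parameters and returning a number.
    Each parameter is perturbed by +/- rel_step while the others stay fixed.
    """
    baseline = model(**params)
    result = {}
    for name, value in params.items():
        up = dict(params, **{name: value * (1.0 + rel_step)})
        down = dict(params, **{name: value * (1.0 - rel_step)})
        result[name] = (model(**up) - model(**down)) / (2.0 * rel_step * baseline)
    return result

# Hypothetical stand-in for a model output; NOT the authors' model.
def toy_output(encounter_rate, adhesion_prob):
    return encounter_rate * adhesion_prob ** 2

sensitivities = normalized_sensitivities(
    toy_output, {"encounter_rate": 0.5, "adhesion_prob": 0.2}
)
for name, value in sensitivities.items():
    print(f"{name}: {value:.3f}")
```

      For a stochastic model, each evaluation would need to be averaged over enough replicate runs that the finite difference is not dominated by simulation noise.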

      The model appears to show a few unreasonable behaviors and does not agree with experiments in several cases, which could point to missing mechanisms in the model. Here are some examples. The model shows a surprising decrease in the T cell-target cell synapse formation when the affinity of the BiTEs to CD3 was increased; the opposite should have been more intuitive. The authors suggest degradation of CD3 could be a reason for this behavior. However, this probably could be easily tested by removing CD3 degradation in the model. Another example is that the increase in the % of engaged effector cells in the model with increasing CD3 expressions does not agree well with experiments (Fig. 3d); however, a similar fold increase in the % of engaged effector cells in the model agrees better with experiments for increasing CD19 expressions (Fig. 3e). It is unclear how this can be explained given CD3 and CD19 appear to be present in similar copy numbers per cell (~10⁴ molecules/cell), and both receptors bind the BiTE with high affinities (e.g., koff < 10⁻⁴ s⁻¹).

      Thank you for pointing this out. The bidirectional effect of CD3 affinity on IS formation is counterintuitive. In a hypothetical situation when there is no CD3 downregulation, the bidirectional effect disappears (as shown below), consistent with our view that CD3 downregulation accounts for the counterintuitive behavior. We have included the simulation to support our point. From a conceptual standpoint, the inclusion of CD3 degradation means the way to maximize synapse formation is for the BiTE to first bind tumor antigen, after which the tumor-BiTE complex “recruits” a T cell through the CD3 arm.

      We agree that the model did not adequately capture the effect of CD3 expression at the highest BiTE concentration, 100 ng/ml, while the effects at other BiTE concentrations were well captured (as shown below, left). The model predicted a much more moderate effect of CD3 expression on IS formation at the highest concentration. This is partly because the model assumed rapid CD3 downregulation upon antibody engagement. We did a similar simulation as above, with moderate CD3 downregulation (as shown below, right). This increases the effect of CD3 expression at the highest BiTE concentration, consistent with experiments. Interestingly, a rapid CD3 downregulation rate, as we concluded, is required to capture data profiles at all other conditions. Considering that a BiTE concentration of 100 ng/ml is much higher than the therapeutically relevant level in circulation (< 2 ng/ml), we did not investigate the mechanism underlying this inconsistent model prediction, but we acknowledged the fact that the model under-predicted IS formation in Figure 3d. Notably, this discrepancy may rarely appear in our clinical predictions, as CD3 expression is low and the blood BiTE concentration is very low (< 2 ng/ml).

      Revision: We have adjusted the text to increase clarity on these points. In addition, we added: “The base model underpredicted the effect of CD3 expression on IS formation at 100 ng/ml BiTE concentration, which is partially because of the rapid CD3 downregulation upon BiTE engagement and assay variation across experimental conditions.”

      The model does not include signaling and activation of T cells as they form the immunological synapse (IS) with target cells. The formation of the IS leads to aggregation of different receptors, adhesion molecules, and kinases which modulate signaling and activation. Thus, it is likely that variations in the copy numbers of CD3 and of the CD19-BiTE-CD3 complex will lead to variations in the cytotoxic responses and presumably to CD3 degradation as well. Perhaps some of these missing processes are responsible for the disagreements between the model and the data shown in Fig. 3. In addition, the in vivo model does not contain any development of the T cells as they are stimulated by the BiTEs. The differences in development of T cells, such as generation of dysfunctional/exhausted T cells, could lead to the differences in responses to BiTEs in patients. In particular, the in vivo model does not agree with the kinetics of B cells after day 29 in non-responders (Fig. 6d); could the kinetics of T cell development play a role in this?

      We agree that intracellular signaling is critical to T cell activation and cytotoxic effects. IS formation, T cell activation, and cytotoxicity are a cascade of events with highly coordinated molecular and cellular interactions. Compared to the events of T cell activation and cytotoxicity, IS formation occurs at a relatively earlier time. As shown in our study, IS formation can occur at 2-5 min, while the other events often need hours to be observed. We found that IS formation is primarily driven by two intercellular processes: cell-cell encounter and cell-cell adhesion. The intracellular signaling would be initiated in the process of cell-cell adhesion or at the late stage of IS formation. We think these intracellular events are relevant but may not be the reason why our model did not adequately capture the profiles in Figure 3d at the highest BiTE concentrations. Therefore, we did not include intracellular signaling in the models. Another reason was that we simulated our models at an agent level to mimic the process of tumor evolution, which is computationally demanding. Intracellular events for each cell may make it more challenging computationally.

      T cell activation and exhaustion throughout the BiTE treatment is very complicated, time-variant and impacted by multiple factors like T cell status, tumor burden, BiTE concentration, immune checkpoints, and tumor environment. T cell proliferation and death rates are challenging to estimate, as the quantitative relationship with those factors is unknown. Therefore, T cell abundance (expansion) was considered as an independent variable in our model. T cell counts are measured in BiTE clinical trials. We included these data in our model to reveal expanded T cell population. Patients with high T cell expansion are often those with better clinical response. Notably, the T cell decline due to rapid redistribution after administration was excluded in the model. T cell abundance was included in the simulations in Figure 6 but not proof of concept simulations in Figure 7.

      In Figure 6d, kinetics of T cell abundance had been included in the simulations for responders and non-responders in the MT103-211 study. Thus, the kinetics of T cell development can’t be used to explain the disagreement between model prediction and observation after day 29 in non-responders. The observed data are actually median values of B-cell kinetics in non-responders (N = 27) with very large inter-subject variation (baseline from 10-10000/μL), which makes them very challenging for the model to capture perfectly. Many non-responders with severe progression dropped out of the treatment at the end of cycle 1, which resulted in a “more potent” efficacy in the 2nd cycle. This might be the main reason for the disagreement.

      Variation in cytotoxic response was not included in our models. Tumor cells were assumed to be eradicated after the engagement with effector cells; no killing rate or killing probability was implemented. This assumption reduced the model complexity and aligned well with our in-vitro and clinical data. Cytotoxic response in vivo is impacted by multiple factors like copy number of CD3, cytokine/chemokine release, tumor microenvironment and T cell activation/exhaustion. For example, the cytotoxic response and killing rate mediated by 1:1 synapse (ET) and other variants (ETE, TET, ETEE, etc.) are supposed to be different as well. Our model did not differentiate the killing rate of these synapse variants, but the model has quantified these synapse variants, providing a framework for us to address these questions in the future. We agree that differentiating the cytotoxic responses under different scenarios may improve model prediction and that more exploration needs to be done in the future.

      Revision: We added a discussion of the limitations which we believe is informative to future studies.

      “Our models did not include intracellular signaling processes, which are critical for T activation and cytotoxicity. However, our data suggests that encounter and adhesion are more relevant to initial IS formation. To make more clinically relevant predictions, the models should consider these intracellular signaling events that drive T cell activation and cytotoxic effects. Of note, we did consider the T cell expansion dynamics in organs as independent variable during treatment for the simulations in Figure 6. T cell expansion in our model is case-specific and time-varying.”

      References:

      Chen W, Yang F, Wang C, Narula J, Pascua E, Ni I, Ding S, Deng X, Chu ML, Pham A, Jiang X, Lindquist KC, Doonan PJ, Blarcom TV, Yeung YA, Chaparro-Riggers J. 2021. One size does not fit all: navigating the multi-dimensional space to optimize T-cell engaging protein therapeutics. MAbs 13:1871171. DOI: 10.1080/19420862.2020.1871171, PMID: 33557687

      Dang K, Castello G, Clarke SC, Li Y, AartiBalasubramani A, Boudreau A, Davison L, Harris KE, Pham D, Sankaran P, Ugamraj HS, Deng R, Kwek S, Starzinski A, Iyer S, Schooten WV, Schellenberger U, Sun W, Trinklein ND, Buelow R, Buelow B, Fong L, Dalvi P. 2021. Attenuating CD3 affinity in a PSMAxCD3 bispecific antibody enables killing of prostate tumor cells with reduced cytokine release. Journal for ImmunoTherapy of Cancer 9:e002488. DOI: 10.1136/jitc-2021-002488, PMID: 34088740

      Gong C, Anders RA, Zhu Q, Taube JM, Green B, Cheng W, Bartelink IH, Vicini P, Wang B, Popel AS. 2019. Quantitative Characterization of CD8+ T Cell Clustering and Spatial Heterogeneity in Solid Tumors. Frontiers in Oncology 8:649. DOI: 10.3389/fonc.2018.00649, PMID: 30666298

      Mejstríková E, Hrusak O, Borowitz MJ, Whitlock JA, Brethon B, Trippett TM, Zugmaier G, Gore L, Stackelberg AV, Locatelli F. 2017. CD19-negative relapse of pediatric B-cell precursor acute lymphoblastic leukemia following blinatumomab treatment. Blood Cancer Journal 7:659. DOI: 10.1038/s41408-017-0023-x, PMID: 29259173

      Samur MK, Fulciniti M, Samur AA, Bazarbachi AH, Tai YT, Prabhala R, Alonso A, Sperling AS, Campbell T, Petrocca F, Hege K, Kaiser S, Loiseau HA, Anderson KC, Munshi NC. 2021. Biallelic loss of BCMA as a resistance mechanism to CAR T cell therapy in a patient with multiple myeloma. Nature Communications 12:868. DOI: 10.1038/s41467-021-21177-5, PMID: 33558511

      Xu X, Sun Q, Liang X, Chen Z, Zhang X, Zhou X, Li M, Tu H, Liu Y, Tu S, Li Y. 2019. Mechanisms of relapse after CD19 CAR T-cell therapy for acute lymphoblastic leukemia and its prevention and treatment strategies. Frontiers in Immunology 10:2664. DOI: 10.3389/fimmu.2019.02664, PMID: 31798590

      Yoneyama T, Kim MS, Piatkov K, Wang H, Zhu AZX. 2022. Leveraging a physiologically-based quantitative translational modeling platform for designing B cell maturation antigen-targeting bispecific T cell engagers for treatment of multiple myeloma. PLOS Computational Biology 18:e1009715. DOI: 10.1371/journal.pcbi.1009715, PMID: 35839267

    1. Author Response

      Reviewer #1 (Public Review):

      This study examines the factors underlying the assembly of MreB, an actin family member involved in mediating longitudinal cell wall synthesis in rod-shaped bacteria. Required for maintaining rod shape and essential for growth in model bacteria, single molecule work indicates that MreB forms treadmilling polymers that guide the synthesis of new peptidoglycan along the longitudinal cell wall. MreB has proven difficult to work with and the field is littered with artifacts. In vitro analysis of MreB assembly dynamics has not fared much better as helpfully detailed in the introduction to this study. In contrast to its distant relative actin, MreB is difficult to purify and requires very specific conditions to polymerize that differ between groups of bacteria. Currently, in vitro analysis of MreB and related proteins has been mostly limited to MreBs from Gram-negative bacteria which have different properties and behaviors from related proteins in Gram-positive organisms.

      Here, Mao and colleagues use a range of techniques to purify MreB from the Gram-positive organism Geobacillus stearothermophilus, identify factors required for its assembly, and analyze the structure of MreB polymers. Notably, they identify two short hydrophobic sequences, located near one another on the 3-D structure, which are required to mediate membrane anchoring.

      With regard to assembly dynamics, the authors find that Geobacillus MreB assembly requires both interactions with membrane lipids and nucleotide binding. Nucleotide hydrolysis is required for interaction with the membrane and interaction with lipids triggers polymerization. These experiments appear to be conducted in a rigorous manner, although the salt concentration of the buffer (500mM KCl) is quite high relative to that used for in vitro analysis of MreBs from other organisms. The authors should elaborate on their decision to use such a high salt buffer, and ideally, provide insight into how it might impact their findings relative to previous work.

      Response 1.1. MreB proteins are notoriously difficult to maintain in a soluble form. Some labs deleted the N-terminal amphipathic or hydrophobic sequences to increase solubility, while other labs used full-length protein but high KCl concentration (300 mM KCl) (Harne et al, 2020; Pande et al., 2022; Popp et al, 2010; Szatmari et al, 2020). Early in the project, we tested many conditions and noticed that high KCl helped maintain slightly better solubility of full-length MreBGs, without the need for deleting a part of the protein. In addition, concentrations of salt > 100 mM would better mimic the conditions met by the protein in vivo. While 50-100 mM KCl is traditionally used in actin polymerization assays, physiological salt concentrations are around 100-150 mM KCl in invertebrates and vertebrates (Schmidt-Nielsen, 1975), around 50-250 mM in fungal and plant cells (Rodriguez-Navarro, 2000) and 200-300 mM in the budding yeast (Arino et al, 2010). However, cytoplasmic K+ concentration varies greatly (up to 800 mM) depending on the osmolality of the medium in both E. coli (Cayley et al, 1991; Epstein & Schultz, 1965; Rhoads et al, 1976), and B. subtilis, in which the basal intracellular concentration of KCl was estimated to be ~ 350 mM (Eisenstadt, 1972; Whatmore et al, 1990). 500 mM KCl can therefore be considered as physiological as 100 mM KCl for bacterial cells. Since we observed plenty of pairs of protofilaments at 500 mM KCl and this condition helped to avoid aggregation, we kept this high concentration as a standard for most of our experiments. Nonetheless, we had also performed TEM polymerization assays at 100 mM in line with most of the MreB and F-actin in vitro literature, and found no difference in the polymerization (or absence of polymerization) conditions. This was indicated in the initial submission (e.g. M&M section L540 and footnote of Table S2) but since two reviewers bring it up as a main point, it is evident we failed at communicating it clearly, for which we apologize. This has been clarified in the revised version of the manuscript. We have also almost systematically added the 100 mM KCl concentration, as per reviewer #2's request and to reconcile our salt conditions with those used for some in vitro analyses of MreBs from other organisms (see also response to reviewer #2 comments 1A and 1B = Responses 2.1A, 2.1B below). We then decided to refer to the 100 mM KCl concentration as our “standard condition” in the revised version of the manuscript, but we compile and compare the results obtained at 500 mM too, as both concentrations are within the physiological range in Bacillus.

      Additionally, this study, like many others on MreB, makes much of MreB's relationship to actin. This leads to confusion and the use of unhelpful comparisons. For example, MreB filaments are not actin-like (line 58) any more than any polymer is "actin-like." As evidenced by the very beautiful images in this manuscript, MreB forms straight protofilaments that assemble into parallel arrays, not the paired-twisted polymers that are characteristic of F-actin. Generally, I would argue that work on MreB has been hindered by rather than benefitted from its relationship to actin (e.g., early FP fusion data interpreted as evidence for an MreB endoskeleton supporting cell shape or depletion experiments implicating MreB in chromosome segregation) and thus such comparisons should be avoided unless absolutely necessary.

      Response 1.2. We completely agree with reviewer #1 regarding unhelpful comparisons of actin and MreB, and that work on MreB has been traditionally hindered by its relationship to eukaryotic actin. MreB is nonetheless a structural homolog of actin, with a close structural fold and common properties (polymerization into pairs of protofilaments, ATPase activity…). It still makes sense to refer to a widely studied protein with common features and common ancestry, as long as we don’t confine our thinking to a single conceptual framework. This said, actin and MreB diverged very early in evolution, which may account for differences in their biochemical properties and cellular functions. Current data on MreB filaments confirm that they display F-actin-like and F-actin-unlike properties. We thank the reviewer for this insightful comment. We have revised the text to remove any inaccurate or unhelpful comparison to actin (in particular the ‘actin-like filaments’ statement, previously used once).

      Reviewer #2 (Public Review):

      The paper "Polymerization cycle of actin homolog MreB from a Gram-positive bacterium" by Mao et al. provides the second biochemical study of a Gram-positive MreB but, importantly, the first study to examine how Gram-positive MreB filaments bind to membranes. They also show the first crystal structure of a MreB from a Gram-positive bacterium - in two nucleotide-bound forms, finally solving structures that have been missing for too long. They also elucidate what residues in Geobacillus MreB are required for membrane associations. Also, the QCM-D approach to monitoring MreB membrane associations is a direct and elegant assay.

      While the above findings are novel and important, this paper also makes a series of conclusions that run counter to multiple in vitro studies of MreBs from different organisms and other polymers with the actin fold. Overall, they propose that Geobacillus MreB contains biochemical properties that are quite different than not only the other MreBs examined so far but also eukaryotic actin and every actin homolog that has been characterized in vitro. As the conclusions proposed here would place the biochemical properties of Geobacillus MreB as the sole exception to all other actin fold polymers, further supporting experiments are needed to bolster these contrasting conclusions and their overall model.

      Response 2.0. We are grateful to reviewer #2 for stressing the novelty and importance of our results. Most of our conclusions were in line with previous in vitro studies of MreBs (formation of pairs of straight filaments on a lipid layer, both ATP and GTP binding and hydrolysis, distortion of liposomes…), with the exception of the claimed requirement of NTP hydrolysis for membrane binding prior to polymerization, based on the absence of pairs of filaments in free solution or in the presence of AMP-PNP in our experimental conditions (which we agree was not sufficient to make such a bold claim, see below). Thanks to the reviewer’s comments, we have performed many controls and additional experiments that led us to refine our results and largely reconcile them with the literature. Please see the answer to the global review comments - our conclusions have been revised on the basis of our new data.

      1. (Difference 1) - The predominant concern about the in vitro studies that makes it difficult to evaluate many of their results (much less compare them to other MreBs and actin homologs) is the use of a highly unconventional polymerization buffer containing 500(!) mM KCl. As has been demonstrated with actin and other polymers, the high KCl concentration used here (500mM) is certain to affect the polymerization equilibria, as increasing salt increases the hydrophobic effect and inhibits salt bridges, and therefore will affect the affinity between monomers and filaments. For example, past work has shown that high salt greatly changes actin polymerization, causing: a decreased critical concentration, increased bundling, and a greatly increased filament stiffness (Kang et al., 2013, 2012). Similarly, with AlfA, increased salt concentrations have been shown to increase the critical concentration, decrease the polymerization kinetics, and inhibit the bundling of AlfA filaments (Polka et al., 2009).

      A more closely related example comes from the previous observation that increasing salt concentrations increasingly slow the polymerization kinetics of B. subtilis MreB (Mayer and Amann, 2009). Lastly, these high salt concentrations might also change the interactions of MreB(Gs) with the membrane by screening charges and/or increasing the hydrophobic effect. Given that 500mM KCl was used throughout this paper, many (if not all) of the key experiments should be repeated at a more standard salt concentration (~100mM), similar to those used in most previous in vitro studies of polymers.
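
      The charge-screening argument can be put on a rough quantitative footing with the standard Debye length of a 1:1 electrolyte. The sketch below is an illustration only, under assumptions that are not taken from the manuscript: KCl is treated as the sole contributor to ionic strength (other buffer components are ignored), and the temperature (25 °C) and relative permittivity of water (78.4) are textbook values chosen here.

```python
import math

# Physical constants (SI units)
EPSILON_0 = 8.8541878128e-12   # vacuum permittivity, F/m
K_B = 1.380649e-23             # Boltzmann constant, J/K
N_A = 6.02214076e23            # Avogadro constant, 1/mol
E_CHARGE = 1.602176634e-19     # elementary charge, C

def debye_length_nm(molar_conc, temp_k=298.15, eps_r=78.4):
    """Debye length in nm for a 1:1 electrolyte such as KCl.

    molar_conc is in mol/L; for a 1:1 salt the ionic strength equals
    the salt concentration. temp_k and eps_r are assumed values.
    """
    ionic_strength = molar_conc * 1000.0  # mol/L -> mol/m^3
    numerator = eps_r * EPSILON_0 * K_B * temp_k
    denominator = 2.0 * N_A * E_CHARGE ** 2 * ionic_strength
    return math.sqrt(numerator / denominator) * 1e9

for conc in (0.1, 0.5):
    print(f"{conc * 1000:.0f} mM KCl: Debye length ~ {debye_length_nm(conc):.2f} nm")
```

      Under these assumptions the screening length is a little under 1 nm at 100 mM KCl and shrinks by a factor of √5 at 500 mM, so electrostatic contacts between protein and lipid headgroups are expected to be appreciably more screened in the high-salt buffer.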

      Response 2.1A. As per reviewer #2's request, we have repeated at 100 mM KCl most of the experiments (TEM, cryo-EM, QCM-D and ATPase assays) initially performed at 500 mM KCl only. The KCl concentration affects both membrane binding and filament stiffness, as anticipated by the reviewer, but the main conclusions are the same. The revised version of the manuscript compiles and compares the results obtained at both high and low [KCl], both concentrations being within the physiological range in Bacillus. Please see point 1 of the response to the global review comments and the first response to reviewer 1 (Response 1.1) for further elaboration.

      Please note that in Mayer & Amann, 2009 (B. subtilis MreB), light scattering in free solution was inversely proportional to the KCl concentration, with the highest light scattering signal at 0 mM KCl (!), a > 2-fold reduction below 30 mM KCl and no scatter at all at 250 mM, suggesting a “salting in” phenomenon (see also the “Other Points to address” answers 1A and 2, below) (Mayer & Amann, 2009). Since no effective polymer formation (e.g. polymers shown by EM) was demonstrated in these experiments, it cannot be excluded that KCl was simply preventing aggregation of B. subtilis MreB in solution, as we observe. For all their other light scattering experiments, the ‘standard polymerization condition’ used by Mayer & Amann was 0.2 mM ATP, 5 mM MgCl2, 1 mM EGTA and 10 mM imidazole pH 7.0, to which MreB (in 5 mM Tris pH 8.0) was added. No KCl was present in their ‘standard’ polymerization conditions.

      This would test if the many divergent properties of MreB(Gs) reported here arise from some difference in MreB(Gs) relative to other MreBs (and actin homologs), or if they arise from the 400mM difference in salt concentration between the studies. Critically, it would also allow direct comparisons to be made relative to previous studies of MreB (and other actin homologs) that used much lower salt, thereby allowing them to definitively demonstrate whether MreB(Gs) is indeed an outlier relative to other MreB and actin homologs. I would suggest using 100mM KCl, as historically, all polymerization assays of actin and numerous actin homologs have used 50-100mM KCl: 50mM KCl (for actin in F buffer) or 100mM KCl for multiple prokaryotic actin homologs and MreB (Deng et al., 2016; Ent et al., 2014; Esue et al., 2006, 2005; Garner et al., 2004; Polka et al., 2009; Rivera et al., 2011; Salje et al., 2011). Likewise, similar salt concentrations are standard for tubulin (80 mM K-Pipes) and FtsZ (100 mM KCl or 100mM KAc in HMK100 buffer).

      Response 2.1B. We appreciate the reviewer’s feedback on this point. Please note that, although actin polymerization assays are historically performed at 50-100 mM KCl and thus 100 mM KCl was used for other bacterial actin homologs (MamK, ParM and AlfA), MreB polymerization assays have previously been reported at 300 mM KCl too (Harne et al., 2020; Pande et al., 2022; Popp et al., 2010; Szatmari et al., 2020), which is closer to the physiological salt concentration in bacterial cells (see Response 1.1), but also in the absence of KCl (see above). As a matter of fact, we originally wanted to use a “standard polymerization condition” based on the literature on MreB, before realizing there was none: only half used KCl (the other half used NaCl, or no monovalent salt at all) and among these, KCl concentrations varied (out of 8 publications, 2 used 20 mM KCl, 2 used 50 mM KCl and 4 used 300 mM KCl).

      2. (Difference 2) - One of the most important differences claimed in this paper is that MreB(Gs) filaments are straight, a result that runs counter to the curved T. maritima and C. crescentus filaments detailed by the Löwe group (Ent et al., 2014; Salje et al., 2011). Importantly, this difference could also arise from the difference in salt concentrations used in each study (500mM here vs. 100mM in the Löwe studies), and thus one cannot currently draw any direct comparisons between the two studies.

      One example of how high salt could be causing differences in filament geometry: high salts are known to greatly increase the bending stiffness of actin filaments, making them more rigid (Kang et al., 2013). Likewise, increasing salt is known to change the rigidity of membranes. As the ability of filaments to A) bend the membrane or B) deform to the membrane depends on the stiffness of filaments relative to the stiffness of the membrane, the observed difference in the "straight vs. curved" conformation of MreB filaments might simply arise from different salt concentrations. Thus, in order to draw several direct comparisons between their findings and those of other MreB orthologs (as done here), the studies of MreB(Gs) conformations on lipids should be repeated under the same buffer conditions as used in the Löwe papers, then allowing them to be directly compared.

      Response 2.2. We fully agreed with reviewer #2 that the salts could be affecting the assay and did cryo-EM experiments also in the presence of 100 mM KCl as requested. The results unambiguously showed countless curved liposomes on the contact areas with MreB (Fig. 2F-G and Fig. 2-S5), very similar to what was reported for Thermotoga and Caulobacter MreBs by the Löwe group. Our results therefore confirm the previous findings that MreBs can bend lipids, and suggest that, indeed, high salt may increase filament stiffness as has been shown for actin filaments. We are very grateful to reviewer #2 for this suggestion and for drawing our attention to the work of Kang et al, 2013. The different bending observed when varying the salt concentration raises relevant questions regarding the in vivo behavior of MreB, since KCl was shown to vary greatly depending on the medium composition. The manuscript has been updated accordingly in the Results (from L243) and Discussion sections (L585-595).

      1. (Difference 3) - The next important difference between MreB(Gs) and other MreBs is the claim that MreB polymers do not form in the absence of membranes.

      A) This is surprising relative to other MreBs, as MreBs from 1) T. maritima (multiple studies), E. coli (Nurse and Marians, 2013), and C. crescentus (Ent et al., 2014) have been shown to form polymers in solution (without lipids) with electron microscopy, light scattering, and time-resolved multi-angle light scattering. Notably, the Esue work was able to observe the first phase of polymer formation and a subsequent phase of polymer bundling (Esue et al., 2006) of MreB in solution. 2) Similarly, (Mayer and Amann, 2009) demonstrated B. subtilis MreB forms polymers in the absence of membranes using light scattering.

      Response 2.3A. The literature does convincingly show that Thermotoga MreB forms polymers in solution, without lipids (note that for Caulobacter MreB filaments were only reported in the presence of lipids, (van den Ent et al, 2014)). Assemblies reported in solution are bundles or sheets (including at the earlier time points in the time-resolved EM experiments reported by Esue et al. 2006 mentioned by the reviewer – ‘2 minutes after adding ATP, EM revealed that MreB formed short filamentous bundles’) (Esue et al, 2006). However, and as discussed above (Response 2.1A), the light scattering experiments in Mayer and Amann, 2009 do not conclusively demonstrate the presence of polymers of B. subtilis MreB in solution (Mayer & Amann, 2009). We performed many light scattering experiments of B. subtilis MreB in solution in the past (before finding out that filaments were only forming in the presence of lipids), and got similar scattering curves (see two examples of DLS experiments in Author response image 1) in conditions in which NO polymers could ever be observed by EM while plenty of aggregates were present.

      Author response image 1.

      We did not consider these results publishable in the absence of true polymers observed by TEM. As pointed out in the interesting study from Nurse et al. (on E. coli MreB) (Nurse & Marians, 2013), one cannot rely on light scattering alone because non-specific aggregates would show patterns similar to those of polymers. Over the last two decades, about 15 publications showed polymers of MreB from several Gram-negative species, while none (despite the efforts of many) showed a single convincing MreB polymer from a Gram-positive bacterium by EM. A simple hypothesis is that a critical parameter was missing, and we present convincing evidence that lipids are critical for Geobacillus MreB to form pairs of filaments in the conditions tested. However, in solution too we do occasionally see pairs of filaments (Fig 2-S2), and also sheet-like structures among aggregates when the concentration of MreB is increased (Fig. 2-S2 and Fig. 3-S2). Thus, we agree with the reviewer that it cannot be claimed that Geobacillus MreB is unable to polymerize in the absence of lipids, but rather that lipids strongly stimulate its polymerization, depending on the conditions.

      B) The results shown in figure 5A also go against this conclusion, as there is only a 2-fold increase in the phosphate release from MreB(Gs) in the presence of membranes relative to the absence of membranes. Thus, if their model is correct, and MreB(Gs) polymers form only on membranes, this would require the unpolymerized MreB monomers to hydrolyze ATP at 1/2 the rate of MreB in filaments. This high relative rate of hydrolysis of monomers compared to filaments is unprecedented. For all polymers examined so far, the rate of monomer hydrolysis is several orders of magnitude less than that of the filament. For example, actin monomers are known to hydrolyze ATP 430,000X slower than the monomers inside filaments (Blanchoin and Pollard, 2002; Rould et al., 2006).

      Response 2.3B. We agree with the reviewer. We have now found conditions where sheets of MreB form in solution (at high MreB concentration) in the presence of ADP and AMP-PNP. However, we have now added several controls that exclude efficient formation of polymers in solution in the presence of ATP at low concentrations of MreBGs (≤ 1.5 µM), the condition used for the malachite green assays. At these MreB concentrations, pairs of filaments are observed in the presence of lipids, but very infrequently in solution, and sheets are not observed in solution either (Fig. 2-S2A, B). Yet, albeit puzzling, in these conditions Pi release is reproducibly observed in solution, reduced only ~ 2 to 3-fold relative to Pi release in the presence of lipids (Fig. 5A and Fig. 5-S1). A reinforcing observation comes from the ATPase assay performed at 100 mM KCl (Fig. 5A). In this condition MreB binding to lipids is increased relative to 500 mM KCl (Fig. 4-S4C), and the stimulation of the ATPase activity by the presence of lipids is also stronger than at 500 mM (Fig. 5-S1A). Further work is needed to characterize in detail the ATPase activity of MreB proteins, for which data in the literature is very scarce. We can’t exclude that MreB could nucleate in solution or form very unstable filaments that cannot be seen in our EM assay but consume ATP in the process. At the moment, the significance of the Pi released in solution is unknown and will require further investigation.

      C) Thus, there is a strong possibility that MreB(Gs) polymers are indeed forming in solution in addition to those on the membrane, and these "solution polymers" may not be captured by their electron microscopy assay. For example, high salt could be interfering with the adsorption of filaments to glow-discharged grids lacking lipids.

      Response 2.3C. We appreciate the reviewer’s insight about this critical point. Polymers presented in the original Fig. 2A were obtained at 500 mM KCl but we had tested the polymerization of MreB at 100 mM KCl as well, without noticing differences. We have nonetheless redone this quantitatively and used these data for the revised Fig. 2A, as we are now using 100 mM KCl as our standard polymerization condition throughout the revised manuscript. We also followed the other suggestion of the reviewer and tested glow-discharged grids (a more classic preparation for soluble proteins) vs non-glow-discharged EM grids, as well as a higher concentration of MreB. Grids are generally glow-discharged to make them hydrophilic in order to adsorb soluble proteins, but the properties of MreB (soluble but obviously presenting hydrophobic domains) made it difficult to predict what support putative soluble polymers would preferentially interact with. Septins for example bind much better to hydrophobic grids despite their soluble properties (I. Adriaans, personal communication). Virtually no double filaments were observed in solution at either low or high [MreB]. The fact that in some conditions (high [MreB], other nucleotides) we were able to detect sheet-like structures excluded a technical issue that would prevent the detection of existing but “invisible” polymers here. We have added these new data in Fig. 2-S2.

      As indicated above, the reviewer’s comments made us realize that we could not state or imply that MreB cannot polymerize in the absence of lipids. As a matter of fact, we always saw some random filaments in the EM fields, both in solution and in the presence of non-hydrolysable analogues, at very low frequency (Fig. 2A). And we do see now sheets at high MreB concentration (Fig. 2-S2B). We could be just missing the optimal conditions for polymerisation in solution, while our phrasing gave the impression that no polymers could ever form in the absence of ATP or lipids. Therefore, we have:

      1) analyzed all TEM data to present it as semi-quantitative TEM, using our methodology originally implemented for the analysis of the mutants

      2) reworked the text to remove any problematic statements and to indicate that MreBGs was only found to bind to a lipid monolayer as a double protofilament in the presence of ATP/GTP but that this does not exclude that filaments may also form in other conditions.

      In order to definitively prove that MreB(Gs) does not have polymers in solution, the authors should:

      i) conduct orthogonal experiments to test for polymers in solution. The simplest test of polymerization might be conducting pelleting assays of MreB(Gs) with and without lipids, sweeping through the concentration range as done in 2B and 5a.

      Response 2.3Ci. Following reviewer #2's suggestion, we conducted a series of sedimentation assays in the presence and in the absence of lipids, at low (100 mM) and high (500 mM) salt, for both the wild-type protein and the three membrane-anchoring mutants (all at 1.3 µM). Sedimentation experiments in salt conditions preventing aggregation in solution (500 mM KCl) fitted with our TEM results: MreB wild-type pelleting increased in the presence of both ATP and lipids (Fig. R1). The sedimentation was further increased at 100 mM KCl, which would fit our other results indicating an increased interaction of MreB with the membrane. However, in addition to being poorly reproducible (in our hands), the approach does not discriminate between polymers and aggregates (or monomers bound to liposomes) and, since MreB has a strong tendency to aggregate, we believe that the technique is ill-suited to reliably address MreB polymerization and prefer not to include sedimentation data in our manuscript. The recent work from Pande et al. (2022) illustrates this issue well, since no sedimentation of MreB (at 2 µM) was observed in solution in conditions supporting polymerization (at 300 mM KCl): ‘the protein does not pellet on its own in the absence of liposome, irrespective of its polymerization state’, implying that sedimentation does not allow detection of MreB5 filaments in solution (Pande et al., 2022).

      ii) They also could examine if they see MreB filaments in the absence of lipids at 100mM salt (as was seen in both Löwe studies), as the high salt used here might block the charges on glow discharged grids, making it difficult for the polymer to adhere.

      See above, Response 2.3C

      iii) Likewise, the claim that MreB lacking the amino-terminus and the α2β7 hydrophobic loop "is required for polymerization" is questionable: if deleting these residues blocks membrane binding, the lack of polymers on the membrane on the grid is not unexpected, as filaments that cannot bind the membrane would not be observable. Given these mutants cannot bind the membrane, mutant polymers could still indeed exist in solution, and thus pelleting assays should be used to test if non-membrane associated filaments composed of these mutants do or do not exist.

      Response 2.3Ciii. This is a fair point, we thank the reviewer for this remark. We did not mean to state or imply that the hydrophobic loop was required for polymerization per se, but that polymerization into double filaments only efficiently occurs upon membrane binding, which is mediated by the two hydrophobic sequences. We tested all three mutants by sedimentation as suggested by reviewer #2. In the salt condition that limits aggregation (500 mM KCl) the mutants did not pellet while the wild-type protein did (in the presence of lipids) (Fig. R2 below), in agreement with our EM data. We tested the absence of lipids on the mutant bearing the 2 deletions and observed that the (partial) sedimentation observed at low KCl concentration was ATP and lipid dependent (Fig. R3).

      Given our concerns about MreB sedimentation assays (see above, Response 2.3Ci), we prefer not to include these sedimentation data in our manuscript. Instead, we tested by TEM the possible polymerization of the mutants in solution (we only tested them in the presence of lipids in the initial submission). No filaments were detected in solution for any of the mutants (Fig. 4-S3A).

      A final note, the results shown in "Figure 1 - figure supplement 2, panel C" appear to directly refute the claim that MreB(Gs) requires lipids to polymerize. As currently written, it appears they can observe MreB(Gs) filaments on EM grids without lipids. If these experiments were done in the presence of lipids, the figure legend should be updated to indicate that. If these experiments were done in the absence of lipids, the claim that membrane association is required for MreB polymerizations should be revised.

      The TEM experiments shown were indeed performed in the presence of lipids. We apologize that this was not clearly stated in the legend. To prevent all confusion, we have nevertheless removed these images from this figure since the polymerization conditions and lipid requirement are not yet presented when this figure is referred to in the text. We have instead added a panel with the calibration curve for the size exclusion profiles as per request of reviewer #3. The main point of this figure is to show the tendency of MreBGs to aggregate: analytical size-exclusion chromatography shows a single peak corresponding to the monomeric MreBGs, molecular weight ~ 37 kDa, in our purification conditions, but it can readily shift to a peak corresponding to high MW aggregates, depending on the protein concentration and/or storage conditions.

      1. (Difference 4) - The next difference between this study and previous studies of MreB and actin homologs is the conclusion that MreB(Gs) must hydrolyze ATP in order to polymerize. This conclusion is surprising, given the fact that both T. maritima (Salje 2011, Bean 2008) and B. subtilis MreB (Mayer 2009) have been shown to polymerize in the presence of ATP as well as AMP-PNP.

      Likewise, MreB polymerization has been shown to lag ATP hydrolysis in not only T. maritima MreB (Esue 2005), eukaryotic actin, and all other prokaryotic actin homologs whose polymerization and phosphate release have been directly compared: MamK (Deng et al., 2016), AlfA (Polka et al., 2009), and two divergent ParM homologs (Garner et al., 2004; Rivera et al., 2011). Currently, the only piece of evidence supporting the idea that MreB(Gs) must hydrolyze ATP in order to polymerize comes from 2 observations: 1) using electron microscopy, they cannot see filaments of MreB(Gs) on membranes in the presence of AMP-PNP or ApCpp, and 2) no appreciable signal increase appears testing AMPPNP- MreB(Gs) using QCM-D. This evidence is by no means conclusive enough to support this bold claim: While their competition experiment does indicate AMPPNP binds to MreB(Gs), it is possible that MreB(Gs) cannot polymerize when bound to AMPPNP.

      For example, it has been shown that different actin homologs respond differently to different non-hydrolysable analogs: Some, like actin, can hydrolyze one ATP analog but not the other, while others are able to bind to many different ATP analogs but only polymerize with some of them.

      Response 2.4. We agree with the reviewer: it is uncertain which analogs bind, because they are quite different from ATP and some proteins simply do not accept them; they can also change conditions such that filaments stop forming, and thus be (theoretically) misleading. This is why we had tested ApCpp in addition to AMP-PNP as non-hydrolysable analog (Fig. 3A). As indicated above, our new complementary experiments (Fig. 3-S1B-D) now show that some rare (i.e. infrequent and in limited amount) dual polymers are detected in the presence of ApCpp (Fig. 3A) and, at high MreB concentration only, in the presence of AMP-PNP (Fig. 3-S1B-D), suggesting different critical concentrations in the presence of alternative nucleotides. We have dampened our conclusions, in the light of our new data, and modified the discussion accordingly.

      Thus, to further verify their "hydrolysis is needed for polymerization" conclusion, they should:

      A. Test if a hydrolysis deficient MreB(Gs) mutant (such as D158A) is also unable to polymerize by EM.

      Response 2.4A. We thank the reviewer for this suggestion. As this conclusion has been reviewed on the basis of our new data (see previous response), testing putative ATPase deficient mutants is no longer required here. The study of ATPase mutants is planned for future studies (see Response 3.10 to reviewer #3).

      B. They also should conduct an orthogonal assay of MreB polymerization aside from EM (pelleting assays might be the easiest). They should test if polymers of ATP, AMP-PNP, and MreB(Gs)(D158A) form in solution (without membranes) by conducting pelleting assays. These could also be conducted with and without lipids, thereby also addressing the points noted above in point 3.

      Response 2.4B. Please see Response 2.3Ci above.

      C. Polymers may indeed form with ATP-gamma-S, and this non-hydrolysable ATP analog should be tested.

      Response 2.4C. It is quite possible that ATP-γ-S supports polymerization, since it is known to be partially hydrolysable by actin, giving a mild phenotype (Mannherz et al, 1975). This molecule can even be a bona fide substrate for some ATPases (e.g. Peck & Herschlag, 2003). Thus, we decided to exclude this “non-hydrolysable” analog and tested instead AMP-PNP and ApCpp. We know that ATP-γ-S has been and still is frequently used, but we preferred to avoid it for the moment for the above-indicated reasons. We chose AMPPNP and AMPPCP instead because (1) they were shown to be completely non-hydrolysable by actin, in contrast to ATP-γ-S; (2) they are widely used (the most commonly used for structural studies; Lacabanne et al, 2020); (3) AMPPNP was previously used in several publications on MreB (Bean & Amann, 2008; Nurse & Marians, 2013; Pande et al., 2022; Popp et al., 2010; Salje et al, 2011; van den Ent et al., 2014) and thus would allow direct comparison. AMPPCP was added to confirm the finding with AMP-PNP. There are many other analogs that we are planning to explore in future studies (see next Response, 2.4D).

      D. They could also test how the ADP-Phosphate bound MreB(Gs) polymerizes in bulk and on membranes, using beryllium phosphate to trap MreB in the ADP-Pi state. This might allow them to further refine their model.

      Response 2.4D. We plan to address the question of the transition state in depth in follow-up work, using a series of analogs and mutants presumably affected in ATPase activity, both predicted and identified in a genetic screen. As indicated above, it is uncertain which analogs bind because they are quite different from ATP, and some may bind but prevent filament formation. Thus, we anticipate that trying just one may not be sufficient: analogs can change conditions and be (theoretically) misleading, so a thorough analysis is needed to address this question. Since our model and conclusions have been revised on the basis of our new data, we believe that these experiments are beyond the scope of the current manuscript.

      E. Importantly, the Mayer study of B. subtilis MreB found the same results in regard to nucleotides, "In polymerization buffer, MreB produced phosphate in the presence of ATP and GTP, but not in ADP, AMP, GDP or AMP-PNP, or without the readdition of any nucleotide". Thus this paper should be referenced and discussed.

      Response 2.4E. We agree that Pi release was detected previously. We have added the reference (L121).

      1. (Difference 5) - The introduction states (lines 128-130) "However, the need for nucleotide binding and hydrolysis in polymerization remains unclear due to conflicting results, in vivo and in vitro, including the ability of MreB to polymerize or not in the presence of ADP or the non-hydrolysable ATP analog AMP-PNP."

      A) While this is a great way to introduce the problem, the statement is a bit vague and should be clarified, detailing the conflicting results and appropriate references. For example, what conflicting in vivo results are they referring to? Regarding "MreB polymerization in AMP-PNP", multiple groups have shown the polymerization of MreB(Tm) in the presence of AMP-PNP, but it is not clear what papers found opposing results.

      Response 2.5A. Thanks for the comment. We originally did not detail these ‘conflicting results’ in the Introduction because we were doing it later in the text, with the appropriate references, in particular in the Discussion (former L433-442). We have now removed this from the Discussion section and added a sentence in the introduction too (L123-130) quickly detailing the discrepancies and giving the references.

      • For more clarity, we have removed the “in vivo” (which referred to the distinct results reported for the presumed ATPase mutants by the Garner and Graumann groups) and focus on the in vitro discrepancies only.

      • These discrepancies are the following: while some studies showed indeed polymerization (as assessed by EM) of MreBTm in the presence of AMPPNP, the studies from Popp et al and Esue et al on T. maritima MreB, and of Nurse et al on E. coli MreB reported aggregation in the presence of AMP-PNP (Esue et al., 2006; Popp et al., 2010) or ADP (Nurse & Marians, 2013), or no assembly in the presence of ADP (Esue et al., 2006). As for the studies reporting polymerization in the presence of AMP-PNP by light scattering only (Bean & Amann, 2008; Gaballah et al, 2011; Mayer & Amann, 2009; Nurse & Marians, 2013), they could not differentiate between aggregates or true polymers and thus cannot be considered conclusive.

      B) The statement "However, the need for nucleotide binding and hydrolysis in polymerization remains unclear due to conflicting results, in vivo and in vitro, including the ability of MreB to polymerize or not in the presence of ADP or the non-hydrolyzable ATP analog AMP-PNP" is technically incorrect and should be rephrased or further tested.

      i. For all actin (or tubulin) family proteins, it is not that a given filament "cannot polymerize" in the presence of ADP but rather that the ADP-bound form has a higher critical concentration for polymer formation relative to the ATP-bound form. This means that the ADP-bound form can indeed polymerize, but only when the total protein exceeds the ADP critical concentration. For example, many actin-family proteins do indeed polymerize in ADP: ADP actin has a 10-fold higher critical concentration than ATP actin (Pollard, 1984), and the ADP critical concentrations of AlfA and ParM are 5-fold and 50-fold higher (respectively) than those of their ATP-bound forms (Garner et al., 2004; Polka et al., 2009).

      Response 2.5Bi. Absolutely correct. We apologize for the lack of accuracy of our phrasing and have corrected it (L123).

      ii. Likewise, (Mayer and Amann, 2009) have already demonstrated that B. subtilis MreB can polymerize in the presence of ADP, with a slightly higher critical concentration relative to the ATP-bound form.

      Response 2.5Bii. In Mayer and Amann, 2009, the same light scattering signal (interpreted as polymerization) occurred regardless of the nucleotide, and also in the absence of nucleotide (their Fig. 10), and ATP-, ADP- and AMP-PNP-MreB ‘displayed nearly indistinguishable critical concentrations’. They concluded that MreB polymerization is nucleotide-independent. Please see below (responses to ’Other points to address’) our extensive answer to the recurring Mayer & Amann point of reviewer #2.

      Thus, to prove that MreB(Gs) polymers do not form in the presence of ADP would require one to test a large concentration range of ADP-bound MreB(Gs). They should test if ADP-MreB(Gs) polymerizes at the highest MreB(Gs) concentrations that can be assayed. Even if this fails, it may be that MreB(Gs)-ADP polymerizes at higher concentrations than is possible with their protein preps (13 µM). An even simpler fix would be to state that MreB(Gs)-ADP filaments do not form beneath a given MreB(Gs) concentration.

      We agree with the reviewer. Our wording was overstating our conclusions. Based on our new quantifications (Fig. 3-S1B, D), we have rephrased the results section and now indicate that pairs of filaments are occasionally observed in the presence of ADP in our conditions across the range of MreB concentration that could be tested, suggesting a higher critical concentration for MreB-ADP (L310-312). Only at the highest MreB concentration, sheet- and ribbon-like structures were observed in the presence of ADP (Fig. 3-S2B).
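
      The critical-concentration argument in this exchange can be made concrete with a minimal sketch. It assumes the simplest textbook model, in which the steady-state polymer concentration is the total protein minus the critical concentration and never negative. The function name and the two critical-concentration values are hypothetical and chosen only to illustrate a 10-fold gap; they are not measurements for MreB(Gs).

```python
def polymer_concentration(total_um: float, critical_um: float) -> float:
    """Steady-state polymer (µM) in the simple critical-concentration model.

    Below the critical concentration no polymer forms; above it, all
    protein in excess of the critical concentration is in polymer.
    """
    return max(0.0, total_um - critical_um)


# Hypothetical values for illustration only (not measured for MreB).
CC_ATP_UM = 0.5   # assumed critical concentration of the ATP-bound form
CC_ADP_UM = 5.0   # assumed 10-fold higher value for the ADP-bound form

if __name__ == "__main__":
    for total in (0.25, 1.0, 2.0, 10.0):
        atp = polymer_concentration(total, CC_ATP_UM)
        adp = polymer_concentration(total, CC_ADP_UM)
        print(f"total={total:5.2f} µM  ATP polymer={atp:5.2f} µM  "
              f"ADP polymer={adp:5.2f} µM")
```

      In this toy model, a protein concentration between the two thresholds yields polymer with ATP but none with ADP, which is why an absence of ADP filaments at a single concentration cannot by itself show that the ADP-bound form is unable to polymerize.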

      Other Points to address:

      1) There are several points in this paper where the work by Mayer and Amann is ignored, not cited, or readily dismissed as "hampered by aggregation" without any explanation or supporting evidence of that fact.

      We have cited the Mayer study where appropriate. However, we cannot cite it as proof of polymerization in a given condition since their approach does not show that polymers were obtained in their conditions. Again, they based all their conclusions solely on light scattering experiments, which cannot differentiate between polymers and aggregates.

      A) Lines 100-101 - While the irregular 3-D formations seen formed by MreB in the Dersch 2020 paper could be interpreted as aggregates, stating that the results from specifically the Gaballah and Mayer papers (and not others) were "hampered by aggregation" is currently an arbitrary statement, with no evidence or backing provided. Overall, these lines (and others in the paper) dismiss these two works without giving any evidence to that point. Thus, they should provide evidence for why they believe all these papers are aggregation, or remove these (and other) dismissive statements.

      We apologize if our statements about these reports seemed dismissive or disrespectful; it was definitely not our intention. Light scattering shows an increase in particle size over time, but there is no way to tell if the scattering is due to organized (polymers) or disorganized (aggregation) assemblies. Thus, it cannot be considered conclusive evidence of polymerization without proof that true filaments are formed by the protein in the conditions tested, as confirmed by EM for example. MreB is known to easily aggregate (see our size exclusion chromatography profiles and ones from Dersch 2020 (Dersch et al, 2020), and note that no chromatography profiles were shown in the Mayer report) and, as indicated above, we had similar light scattering results for MreB for years, while only aggregates could be observed by TEM (see above Response 2.3A). Several observations also suggest that aggregation instead of polymerization might be at play in the Mayer study, for example ‘polymerization’ occurring in salt-less buffer but ‘inhibited’ with as low as 100 mM KCl, which should rather be “salting in” (see below). We did not intend to be dismissive, but it seemed wrong to report their conclusions as conclusive evidence. We thought that we had cited these papers where appropriate and then explained that they show no conclusive proof of polymerization and why, but it is evident that we failed to communicate this clearly. We have reworked the text to remove any problematic or arbitrary statement about our concerns regarding these reports (e.g. L93 & L126).

      One important note - There are 2 points indicating that dismissing the Mayer and Amann work as aggregation is incorrect:

      1) the Mayer work on B. subtilis MreB shows both an ATP and a slightly higher ADP critical concentration. As the emergence of a critical concentration is a steady-state phenomenon arising from the association/dissociation of monomers (and a kinetically limiting nucleation barrier), an emergent critical concentration cannot arise from protein aggregation; critical concentrations only arise from a dynamic equilibrium between monomer and polymer.

      • Critical concentration for ATP, ADP or AMPPNP were described in Mayer & Amann (Mayer & Amann, 2009) as “nearly indistinguishable” (see Response 2.5Bii)
      • Protein aggregation depends on the solution (pH and ions), protein concentration and temperature. Above a certain concentration, proteins can become unstable, and thus a critical concentration for aggregation can emerge.

      2) Furthermore, Mayer observed that increased salt slowed and reduced B. subtilis MreB light scattering, the opposite of what one would expect if their "polymerization signal" was only protein aggregation, as higher salts should increase the rate of aggregation by increasing the hydrophobic effect.

      It is true that at high salt concentration proteins can precipitate, a phenomenon described as “salting out”. However, it is also true that salts help to solubilize proteins (“salting in”), and that proteins tend to precipitate in the absence of salt. Considering that the starting point of the Mayer and Amann experiment (Mayer & Amann, 2009) is the absence of salt (where they observed the highest scattering) and that they gradually reduced this scattering by increasing KCl (the scattering is almost abolished already below 100 mM!), it is plausible that a salting-in phenomenon might be at play, due to increased solubility of MreB by salt. In any case, this cannot be taken as proof that polymerization rather than aggregation occurred.

      B) Lines 113-137 - The authors reference many different studies of MreB, including both MreB on membranes and MreB polymerized in solution (which formed bundles). However, they again neglect to mention or reference the findings of Mayer and Amann (Mayer and Amann, 2009), as it was dismissed as "aggregation". As B. subtilis is also a gram-positive organism, the Mayer results should be discussed.

      We did cite the Mayer and Amann paper but, as explained above, we cannot cite this study as an example of proven polymerization. We avoided polemicizing in the text as much as possible and cited this paper when possible. Again, we have reworked the text to avoid any problematic or dismissive statement. Also, we forgot to mention this study at L121 as an example of reported ATPase activity, and this has now been corrected.

      2) Lines 387-391 state the rates of phosphate release relative to past MreB findings: "These rates of Pi release upon ATP hydrolysis (~ 1 Pi/MreB in 6 min at 53°C) are comparable to those observed for MreBTm and MreB(Ec) in vitro". While Pi release AND ATP hydrolysis have indeed both been measured for actin, this statement does not apply to MreB and should be corrected: All MreB papers thus far have only measured Pi release alone, not ATP hydrolysis at the same time. Thus, it is inaccurate to state "rates of Pi release upon ATP hydrolysis" for any MreB study, as to accurately determine the rate of Pi release, one must measure: 1) the rate of polymer over time, 2) the rate of ATP hydrolysis, and 3) the rate of phosphate release. For MreB, no one has, so far, even measured the rates of ATP hydrolysis and phosphate release with the same sample.

      We completely agree with the reviewer, we apologize if our formulation was inaccurate. We have corrected the sentence (L479). Thank you for pointing out this mistake.

      3) The interpretation of the interactions between monomers in the MreB crystal should be more carefully stated to avoid confusion. While likely not their intention, the discussions of the crystal packing contacts of MreB can appear to assume that the monomer-monomer contacts they see in crystals represent the contacts within actual protofilaments. One cannot automatically assume the observations of monomer-monomer contacts within a crystal reflect those that arise in the actual filament (or protofilament).

      We agree, we thank the reviewer for his comments. We have revamped the corresponding paragraph.

      A) They state, "the apo form of MreBGs forms less stable protofilaments than its G- homologs." Given filaments of the apo form of MreB(Gs) or B. subtilis MreB have never been observed in solution, this statement is not accurate: while the contacts in the crystal may change with and without nucleotide, if the protein does not form polymers in solution in the apo state, then there are no "real" apo protofilaments, and any statements about their stability become moot. Thus this statement should be rephrased or appropriately qualified.

      See above.

      B) Another example: while they may see that in the apo MreB crystal, the loop of domain IB makes a single salt bridge with IIA and none with IIB. This contrasts with every actin, MreB, and actin homolog studied so far, where domain IB interacts with IIB. This might reflect the real contacts of MreB(Gs) in the solution, or it may be simply a crystal-packing artifact. Thus, the authors should be careful in their claims, making it clear to the reader that the contacts in the crystal may not necessarily be present in polymerized filaments.

      Again, we agree with the reviewer, we cannot draw general conclusions about the interactions between monomers from the apo form. We have rephrased this paragraph.

      4) lines 201-202 - "Polymers were only observed at a concentration of MreB above 0.55 μM (0.02 mg/mL)". Given this concentration dependence of filament formation, which appears the same throughout the paper, the authors could state that 0.55 μM is the critical concentration of MreB on membranes under their buffer conditions. Given the lack of critical concentration measurement in most of the MreB literature, this could be an important point to make in the field.

      Following reviewer #2's suggestion, we have now estimated the critical concentration (Cc = 0.4485 µM) and reported it in the text (L218).

      5) Both mg/ml and uM are used in the text and figures to refer to protein concentration. They should stick to one convention, preferably uM, as is standard in the polymer field.

      Sorry for the confusion. We have homogenized MreB concentrations to µM throughout the text and figures.
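
      For readers moving between the two conventions, the conversion can be sketched as below. This is a minimal illustration only: the molar mass of about 38.5 kDa is an assumption back-calculated from the equivalence 0.05 mg/mL ≈ 1.3 µM quoted in Response 3.1, not a value stated in the response, and the function names are hypothetical.

```python
# Assumed molar mass of the MreB construct (g/mol). This value is NOT given in
# the response; it is back-calculated from "0.05 mg/ml (1.3 µM)" in Response 3.1.
ASSUMED_MOLAR_MASS_G_PER_MOL = 38_500.0


def mg_per_ml_to_micromolar(conc_mg_per_ml, molar_mass=ASSUMED_MOLAR_MASS_G_PER_MOL):
    """Convert a mass concentration in mg/mL to a molar concentration in µM."""
    # mg/mL is numerically equal to g/L; dividing by g/mol gives mol/L,
    # and multiplying by 1e6 converts mol/L to µM.
    return conc_mg_per_ml / molar_mass * 1e6


def micromolar_to_mg_per_ml(conc_um, molar_mass=ASSUMED_MOLAR_MASS_G_PER_MOL):
    """Convert a molar concentration in µM back to a mass concentration in mg/mL."""
    # µM -> mol/L (factor 1e-6), then multiply by g/mol to get g/L (= mg/mL).
    return conc_um * 1e-6 * molar_mass


if __name__ == "__main__":
    for mg_ml in (0.05, 0.25):
        print(f"{mg_ml} mg/mL is about {mg_per_ml_to_micromolar(mg_ml):.2f} µM")
```

      Under this assumed molar mass, 0.05 mg/mL and 0.25 mg/mL come out at roughly 1.3 µM and 6.5 µM, in line with the values quoted in Response 3.1.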

      6) Lines 77-78 - (Teeffelen et al., 2011) should be referenced as well in regard to cell wall synthesis driving MreB motion.

      This has been corrected, sorry for omitting this reference.

      7) Line 90 - "Do they exhibit turnover (treadmill) like actin filaments?". This phrase should be modified, as turnover and treadmilling are two very different things. Turnover is the lifetime of monomers in filaments, while treadmilling entails monomer addition at one end and loss at the other. While treadmilling filaments cause turnover, there are also numerous examples of non-treadmilling filaments undergoing turnover: microtubules, intermediate filaments, and ParM. Likewise, an antiparallel filament cannot directionally treadmill, as there is no difference between the two filament ends to confer directional polarity.

      This is absolutely true, we apologize for our mistake. The sentence has been corrected (L82).

      8) Throughout the paper, the term aggregation is used occasionally to describe the polymerization shown in many previous MreB studies, almost all of which very clearly showed "bundled" filaments, very distinct entities from aggregates, as a bundle of polymers cannot form without the filaments first polymerizing on their own. Evidence to this point, polymerization has been shown to precede the bundling of MreB(Tm) by (Esue et al., 2005).

      We agree with reviewer #2 about polymers preceding bundles and “sheets”. However, we respectfully disagree that we used the word aggregation “throughout the paper” to describe structures that clearly showed polymers or sheets of filaments. A search (Ctrl-F: “aggreg”) reveals only 6 matches, 3 describing our own observations (L152, 163/5, and 1023/28), one referring to (Salje et al., 2011) (L107) but citing her claim that they observed aggregation (due to the N-terminus), and the last two (L100, L440) refer (again) to the Gaballah/Mayer/Dersch publications to say that aggregation could not be excluded in these reports as discussed above (Dersch et al., 2020; Gaballah et al., 2011; Mayer & Amann, 2009).

      9) lines 106-108 mention that "The N-terminal amphipathic helix of E. coli MreB (MreBEc) was found to be necessary for membrane binding." This is not accurate, as Salje observed that one single helix could not cause MreB to bind to the membrane, but rather, multiple amphipathic helices were required for membrane association (Salje et al., 2011).

      Salje et al. showed that in vivo the deletion of the helix abolishes the association of MreB with the membrane. This publication also shows that in vitro, addition of the helix to GFP (not to MreB) prompts binding to lipid vesicles, and that this was increased if there are 2 copies of the helix, but they could not test this directly in vitro with MreB (which is insoluble when expressed with its N-terminus). This prompted them to speculate that multiple MreBs could bind better to the membrane than monomers. However, this remained to be demonstrated. Additional hydrophobic regions in MreB such as the hydrophobic loop could participate in membrane anchoring but are absent in their in vitro assays with GFP.

      The Salje results imply that dimers (or further assemblies) of MreB drive membrane association, a point that should be discussed in regard to the question "What prompts the assembly of MreB on the inner leaflet of the cytoplasmic membrane?" posed on lines 86-87.

      We agree that this is an interesting point. As it is consistent with our results, we have incorporated it into our model (Fig. 6) and we address it in the Discussion (L573-575).

      10) On lines 414-415, it is stated, "The requirement of the membrane for polymerization is consistent with the observation that MreB polymeric assemblies in vivo are membrane-associated only." While I agree with this hypothesis, it must be noted that the presence or absence of MreB polymers in the cytoplasm has not been directly tested, as short filaments in the cytoplasm would diffuse very quickly, requiring very short exposures (<5ms) to resolve them relative to their rate of diffusion. Thus, cytoplasmic polymers might still exist but have not been tested.

      This is also an interesting point. Indeed if a nucleated form, or very short (unbundled) polymers exist in the cytoplasm, they have not been tested by fluorescence microscopy. However, the polymers that localize at the membrane (~ 200 nm), if soluble, would have been detected in the cytoplasm by the work of reviewer #2, us or others.

      11) lines 429-431 state, "but polymerization in the presence of ADP was in most cases concluded from light scattering experiments alone, so the possibility that aggregation rather than ordered polymerization occurred in the process cannot be excluded."

      A) If an increased light scattering signal is initiated by the addition of ADP (or any nucleotide), that signal must come from polymerization or multimerization. What the authors imply is that there must be some ADP-dependent "aggregation" of MreB, which has not been seen thus far for any polymer. Furthermore, why would the addition of ADP initiate aggregation?

      We did not mean that ADP itself would prompt aggregation, but that the protein would aggregate in the buffer regardless of the presence of ADP or other nucleotides. The Mayer & Amann study claims that MreB “polymerization” is nucleotide-independent, as they got identical curves with ATP, ADP, AMPPNP and even with no nucleotides at all (Fig. 10 in their paper, pasted here) (Mayer & Amann, 2009).

      Their experiments with KCl are also remarkable as when they lowered the salt they got faster and faster “polymerization”, with the strongest light scattering signal in the absence of any salt. The high KCl concentration in which they got almost no more “polymers” was 75 mM KCl, and ‘polymerization was almost entirely inhibited at 100 mM’ (Fig. 7, pasted below). Yet the intracellular level of KCl in bacteria is estimated to be ~300 mM (see Response 1.1)

      B) Likewise, the statement "Differences in the purity of the nucleotide stocks used in these studies could also explain some of the discrepancies" is unexplained and confusing. How could an impurity in a nucleotide stock affect the past MreB results, and what is the precedent for this claim?

      We meant that the presence of ATP in the ADP stocks might have affected the outcome of some assays, generating the conflicting results existing in the literature. We agree this sentence was confusing, we have removed it.

      12) lines 467-469 state, "Thus, for both MreB and actin, despite hydrolyzing ATP before and after polymerization, respectively, the ADP-Pi-MreB intermediate would be the long-lived intermediate state within the filaments."

      A) For MreB, this statement is extremely speculative and unbiased, as no one has measured 1) polymerization, 2) ATP hydrolysis, and 3) phosphate release. For example, it could be that ATP hydrolysis is slow, while phosphate release is fast, as is seen in the actin from Saccharomyces cerevisiae.

      We agree that this was too speculative. This has been removed from the (extensively) modified Discussion section. Thanks for the comment.

      B) For actin, the statement of hydrolysis of ATP of monomer occurring "before polymerization" is functionally irrelevant, as the rate of ATP hydrolysis of actin monomers is 430,000 times slower than that of actin monomers inside filaments (Blanchoin and Pollard, 2002; Rould et al., 2006).

      We agree that the difference of hydrolysis rate between G-actin and F-actin implies that ATP hydrolysis occurs after polymerization. We are afraid that we do not follow the reviewer’s point here, we did not say or imply that ATP hydrolysis by actin monomers was functionally relevant.

      13) Lines 442-444. "On the basis of our data and the existing literature, we propose that the requirement for ATP (or GTP) hydrolysis for polymerization may be conserved for most MreBs." Again, this statement both here (and in the prior text) is an extremely bold claim, one that runs contrary to a large amount of past work on not just MreB, but also eukaryotic actin and every actin homolog studied so far. They come to this model based on 1) one piece of suggestive data (the behavior of MreB(Gs) bound to 2 non-hydrolysable ATP analogs in 500 mM KCl), and 2) the dismissal (throughout the paper) of many peer-reviewed MreB papers that run counter to their model as "aggregation" or "contaminated ATP stocks." If they want to make this bold claim that their finding invalidates the work of many labs, they must back it up with further validating experiments.

      We respectfully disagree that our model was based on “one piece of suggestive data” and backed-up by dismissing most past work in the field. We only wanted to raise awareness about the conflicting data between some reports (listed in response 2.5a), and that the claims made by some publications are to be taken with caution because they only rely on light scattering or, when TEM was performed, showed only disorganized structures.

      This said, we clearly failed in proposing our model and we are sorry to see that we really annoyed the reviewer with our suspicion that the work by Mayer & Amann reports aggregation. As indicated above, we have amended our manuscript relative to this point. We also agree that our suggestion to generalize our findings to most MreBs was unsupported, and overstated considering how confusing some results from the literature are. We have refined our model and reworked the text to take on board the reviewer’s remarks as well as the new data generated during the revision process.

      We would like to thank reviewer #2 for his in-depth review of our manuscript.  

      Reviewer #3 (Public Review):

      The major claim from the paper is the dependence of two factors that determine the polymerization of MreB from a Gram-positive, thermophilic bacterium: 1) the role of nucleotide hydrolysis in driving the polymerization; 2) the lipid bilayer as a facilitator/scaffold that is required for hydrolysis-dependent polymerization. These two conclusions contrast with what has been known until now for the MreB proteins that have been characterized in vitro. The experiments performed in the paper do not completely justify these claims as elaborated below.

      We understand the reviewer’s concerns in view of the existing literature on actin and Gram-negative MreBs. We may just be missing the optimal conditions for polymerisation in solution, while our phrasing gave the impression that polymers could never form in the absence of ATP or lipids. Our new data actually show that MreBGs at higher concentration can assemble into bundle- and sheet-like structures in solution and in the presence of ADP/AMP-PNP. Pairs of filaments are however only observed in the presence of lipids for all conditions tested. As indicated in the answers to the global review comments, we have included our new data in the manuscript, revised our conclusions and claims about the lipid requirement and expanded on these points in the Discussion.

      Major comments:

      1) The absence of observed filaments without a lipid monolayer could also be accounted for by a higher critical concentration of polymerization for MreBGS in that condition. It is seen that all the negative staining in the condition without a lipid monolayer has been performed at a concentration of 0.05 mg/mL. It is important to check for polymerization of MreBGS at higher concentration ranges as well, in order to conclusively state the requirement of lipids for polymerization.

      Response 3.1. 0.05 mg/ml (1.3 µM) is our standard condition, and our leeway was limited by the rapid aggregation observed at higher MreB concentrations, as indicated in the text. We have now also tested 0.25 mg/ml (6.5 µM, the maximum concentration possible before major aggregation occurs in our experimental conditions). At this higher concentration, we see some sheet-like structures in solution, confirming a requirement of a higher concentration of MreB for polymerization in these conditions (see the answers to the global review comments for more details).

      We thank the reviewer for pushing us to address this point. We have revised our conclusions accordingly.

      2) The absence of filaments for the non-hydrolysable conditions in the lipid layer could also be because the filaments that might have formed are not binding to the planar lipid layer, and not necessarily because of their inability to polymerize.

      Response 3.2. This is a fair point. To test the possibility that polymers would form but would not bind to the lipid layer we have now added additional semi-quantitative EM controls (for both the non-hydrolysable ATP analogs and the three ‘membrane binding’ deletion mutants) testing polymerization in solution (without lipids) and also using plasma-treated grids. These showed that in our standard polymerization conditions, virtually no polymers form in solution (Fig. 3-S1B and Fig. 4-S4A). Albeit at very low frequency, some dual protofilaments were however detected in the presence of ADP or AMP-PNP at the high MreB concentration (Fig. 3-S1D). At this high MreB concentration, the sheet-like structures occasionally observed in solution in the presence of ATP were frequent in the presence of ADP and very frequent in the presence of AMP-PNP (Fig. 3-S2B). We have revised our conclusions on the basis of these new data: MreBGs can form polymeric assemblies in solution and in the absence of ATP hydrolysis at a higher critical concentration than in the presence of ATP and lipids.

      See the answers to the global review comments (point 2) and Response 2.3C to reviewer #2 for more details.

      3) Given the ATPase activity measurements, it is not very convincing that ATP rather than ADP will be present in the structure. The ATP should have been hydrolysed to ADP within the structure. The structure is now suggestive that MreB is not capable of hydrolysis, which is contradictory to the ATP hydrolysis data.

      Response 3.3. We thank the reviewer for her insightful remarks about the MreB-ATP crystal structure. The electron density map clearly demonstrates the presence of 3 phosphates. However, as suggested by the reviewer, the density which was attributed to a Mg2+ ion was to be interpreted as a water molecule. The absence of Mg2+ in the crystal could thus explain why the ATP had not been hydrolyzed.

      References

      Arino J, Ramos J, Sychrova H (2010) Alkali metal cation transport and homeostasis in yeasts. Microbiology and molecular biology reviews 74: 95-120

      Bean GJ, Amann KJ (2008) Polymerization properties of the Thermotoga maritima actin MreB: roles of temperature, nucleotides, and ions. Biochemistry 47: 826-835

      Cayley S, Lewis BA, Guttman HJ, Record MT, Jr. (1991) Characterization of the cytoplasm of Escherichia coli K-12 as a function of external osmolarity. Implications for protein-DNA interactions in vivo. Journal of molecular biology 222: 281-300

      Dersch S, Reimold C, Stoll J, Breddermann H, Heimerl T, Defeu Soufo HJ, Graumann PL (2020) Polymerization of Bacillus subtilis MreB on a lipid membrane reveals lateral co-polymerization of MreB paralogs and strong effects of cations on filament formation. BMC Mol Cell Biol 21: 76

      Eisenstadt E (1972) Potassium content during growth and sporulation in Bacillus subtilis. Journal of bacteriology 112: 264-267

      Epstein W, Schultz SG (1965) Cation Transport in Escherichia coli: V. Regulation of cation content. J Gen Physiol 49: 221-234

      Esue O, Wirtz D, Tseng Y (2006) GTPase activity, structure, and mechanical properties of filaments assembled from bacterial cytoskeleton protein MreB. Journal of bacteriology 188: 968-976

      Gaballah A, Kloeckner A, Otten C, Sahl HG, Henrichfreise B (2011) Functional analysis of the cytoskeleton protein MreB from Chlamydophila pneumoniae. PloS one 6: e25129

      Harne S, Duret S, Pande V, Bapat M, Beven L, Gayathri P (2020) MreB5 Is a Determinant of Rod-to-Helical Transition in the Cell-Wall-less Bacterium Spiroplasma. Curr Biol 30: 4753-4762 e4757

      Kang H, Bradley MJ, McCullough BR, Pierre A, Grintsevich EE, Reisler E, De La Cruz EM (2012) Identification of cation-binding sites on actin that drive polymerization and modulate bending stiffness. Proceedings of the National Academy of Sciences of the United States of America 109: 16923-16927

      Lacabanne D, Wiegand T, Wili N, Kozlova MI, Cadalbert R, Klose D, Mulkidjanian AY, Meier BH, Bockmann A (2020) ATP Analogues for Structural Investigations: Case Studies of a DnaB Helicase and an ABC Transporter. Molecules 25

      Mannherz HG, Brehme H, Lamp U (1975) Depolymerisation of F-actin to G-actin and its repolymerisation in the presence of analogs of adenosine triphosphate. Eur J Biochem 60: 109-116

      Mayer JA, Amann KJ (2009) Assembly properties of the Bacillus subtilis actin, MreB. Cell motility and the cytoskeleton 66: 109-118

      Nurse P, Marians KJ (2013) Purification and characterization of Escherichia coli MreB protein. The Journal of biological chemistry 288: 3469-3475

      Pande V, Mitra N, Bagde SR, Srinivasan R, Gayathri P (2022) Filament organization of the bacterial actin MreB is dependent on the nucleotide state. The Journal of cell biology 221

      Peck ML, Herschlag D (2003) Adenosine 5 '-O-(3-thio)triphosphate (ATP-gamma S) is a substrate for the nucleotide hydrolysis and RNA unwinding activities of eukaryotic translation initiation factor eIF4A. Rna 9: 1180-1187

      Popp D, Narita A, Maeda K, Fujisawa T, Ghoshdastider U, Iwasa M, Maeda Y, Robinson RC (2010) Filament structure, organization, and dynamics in MreB sheets. The Journal of biological chemistry 285: 15858-15865

      Rhoads DB, Waters FB, Epstein W (1976) Cation transport in Escherichia coli. VIII. Potassium transport mutants. J Gen Physiol 67: 325-341

      Rodriguez-Navarro A (2000) Potassium transport in fungi and plants. Biochimica et biophysica acta 1469: 1-30

      Salje J, van den Ent F, de Boer P, Lowe J (2011) Direct membrane binding by bacterial actin MreB. Molecular cell 43: 478-487

      Schmidt-Nielsen B (1975) Comparative physiology of cellular ion and volume regulation. J Exp Zool 194: 207-219

      Szatmari D, Sarkany P, Kocsis B, Nagy T, Miseta A, Barko S, Longauer B, Robinson RC, Nyitrai M (2020) Intracellular ion concentrations and cation-dependent remodelling of bacterial MreB assemblies. Sci Rep-Uk 10

      van den Ent F, Izore T, Bharat TA, Johnson CM, Lowe J (2014) Bacterial actin MreB forms antiparallel double filaments. eLife 3: e02634

      Whatmore AM, Chudek JA, Reed RH (1990) The Effects of Osmotic Upshock on the Intracellular Solute Pools of Bacillus subtilis. Journal of general microbiology 136: 2527-2535

    1. Author Response

      Reviewer #1 (Public Review):

      Briggs et al use a combination of mathematical modelling and experimental validation to tease apart the contributions of metabolic and electronic coupling to the pancreatic beta cell functional network. A number of recent studies have shown the existence of functional beta cell subpopulations, some of which are difficult to fully reconcile with established electrophysiological theory. More generally, the contribution of beta cell heterogeneity (metabolism, differentiation, proliferation, activity) to islet function cannot be explained by existing combined metabolic/electrical oscillator models. The present studies are thus timely in modelling the islet electrical (structural) and functional networks. Importantly, the authors show that metabolic coupling primarily drives the islet functional network, giving rise to beta cell subpopulations. The studies, however, do not diminish the critical role of electrical coupling in dictating glucose responsiveness, network extent as well as longer-range synchronization. As such, the studies show that islet structural and functional networks both act to drive islet activity, and that conclusions on the islet structural network should not be made using measures of the functional network (and vice versa).

      Strengths:

      • State-of-the-art multi-parameter modelling encompassing electrical and metabolic components.

      • Experimental validation using advanced FRAP imaging techniques, as well as Ca2+ data from relevant gap junction KO animals.

      • Well-balanced arguments that frame metabolic and electrical coupling as essential contributors to islet function.

      • Likely to change how the field models functional connectivity and beta cell heterogeneity.

      Weaknesses:

      • Limitations of FRAP and electrophysiological gap junction measures not considered.

      • Limitations of Cx36 (gap junction) KO animals not considered.

      • Accuracy of citations should be improved in a few cases.

      We thank reviewer 1 for their positive comments, including the many strengths in the approaches, arguments and impact. We do note the weaknesses raised by the reviewer and have addressed them following the comments below.

      We would also like to note that when we refer to metabolic activity driving the functional network, we are not referring to metabolic coupling between beta cells. Rather, we mean that two cells that show either high levels of metabolic activity (glycolytic flux) or similar levels of metabolic activity will show increased synchronization, and thus a functional network edge, as compared to cells with elevated gap junction conductance. Increased metabolic activity would likely generate increased depolarizing currents that will provide an increased coupling current to drive synchronization; whereas similar metabolic activity would mean a given coupling current could more readily drive synchronized activity. We have substantially rewritten the manuscript to clarify this point.

      Reviewer #2 (Public Review):

      In their present work, Briggs et al. combine biophysical simulations and experimental recordings of beta cell activity with analyses of functional network parameters to determine the role played by gap-junctional coupling, metabolism, and KATP conductance in defining the functional roles that the cells play in the functional networks, assess the structure-function relationship, and to resolve an important current open question in the field on the role of so-called hub cells in islets of Langerhans.

      Combining differential equation-based simulations on 1000 coupled cells with demanding calcium, NADPH, and FRAP imaging, as well as with advanced network analyses, and then comparing the network metrics with simulated and experimentally determined properties is an achievement in its own right and a major methodological strength. The findings have the potential to help resolve the issue of the importance of hub cells in beta cell networks, and the methodological pipeline and data may prove invaluable for other researchers in the community.

      However, methodologically functional networks may be based on different types of calcium oscillations present in beta cells, i.e., fast oscillations produced by bursts of electrical activity, slow oscillations produced by metabolic/glycolytic oscillations, or a mixture of both. At present, the authors base the network analyses on fast oscillations only in the case of simulated traces and on a mixture of fast and slow oscillations in the case of experimental traces. Since different networks may depend on the studied beta cell properties to a different extent (e.g., fast oscillation-based networks may, more importantly, depend on electrical properties and slow oscillation-based networks may more strongly depend on metabolic properties), it is important that in drawing the conclusions the authors separately address the influence of a cell's electrical and metabolic properties on its functional role in the network based on fast oscillations, slow oscillations, or a mixture of both.

      We thank reviewer 2 for their positive comments, including addressing the importance of this study as it pertains to islet biology and acknowledging methodological complexities of this study. We also thank the reviewer for their careful reading and providing useful comments. We have integrated each comment into the manuscript. Most importantly, we have now extended our analysis to both fast and slow oscillations by incorporating an additional mathematical model of coupled slow oscillations and performing additional experimental analysis of fast, slow, and mixed oscillations.

      Reviewer #3 (Public Review):

      Over the past decade, novel approaches to understanding beta cell connectivity and how that contributes to the overall function of the pancreatic islet have emerged. The application of network theory to beta cell connectivity has been an extremely useful tool to understand functional hierarchies amongst beta cells within an islet. This helps to provide functional relevance to observations from structural and gene expression data that beta cells are not all identical.

      There are a number of "controversies" in this field that have arisen from the mathematical and subsequent experimental identification of beta "hub" cells. These are small populations of beta cells that are very highly connected to other beta cells, as assessed by applying correlation statistics to individual beta cell calcium traces across the islet.
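
      The hub-cell identification described above (correlation statistics applied to individual beta cell calcium traces) can be sketched as follows. This is a minimal illustration, not the pipeline used by Briggs et al.: the function names, the correlation threshold of 0.9, and the 60% degree cutoff for calling a cell a hub are all assumptions chosen for the example.

```python
import numpy as np


def functional_network(traces, threshold=0.9):
    """Build a functional network from calcium traces.

    traces: array of shape (n_cells, n_timepoints).
    threshold: assumed Pearson correlation cutoff for drawing an edge.
    Returns the boolean adjacency matrix and the degree of each cell.
    """
    corr = np.corrcoef(traces)            # pairwise Pearson correlation between cells
    adjacency = corr >= threshold         # edge where two cells are highly correlated
    np.fill_diagonal(adjacency, False)    # a cell is not connected to itself
    degree = adjacency.sum(axis=1)        # number of functional edges per cell
    return adjacency, degree


def hub_cells(degree, fraction=0.6):
    """Return indices of cells whose degree is at least `fraction` of the maximum.

    The 60% cutoff is an assumed, illustrative hub criterion.
    """
    max_degree = degree.max()
    if max_degree == 0:
        return np.array([], dtype=int)    # no edges at all, so no hubs
    return np.flatnonzero(degree >= fraction * max_degree)


if __name__ == "__main__":
    # Synthetic example: three synchronized cells and one out-of-phase cell.
    t = np.linspace(0.0, 8.0 * np.pi, 400, endpoint=False)
    traces = np.vstack([np.sin(t), np.sin(t), 2.0 * np.sin(t), np.cos(t)])
    _, degree = functional_network(traces)
    print("degree per cell:", degree)
    print("hub cells:", hub_cells(degree))
```

      Note that an edge in this network only records synchronized activity between two cells; it says nothing by itself about whether those cells are physically coupled by gap junctions.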

      In this paper Briggs et al set out to answer the following areas of debate:

      They use computational datasets, based on established models of beta cells acting in concert (electrically coupled) within an islet-like structure, to show that it is similarities in metabolic parameters rather than "structural" connections (i.e. proximity which subserves gap junction coupling) that drive functional network behaviour. Whilst the computational models are quite relevant, the fact that the parameters (e.g. connectivity coefficients) are quite different to what is measured experimentally confirms the limitations of this model. Therefore it was important for the authors to back up this finding by performing both calcium and metabolic imaging of islet beta cells. These experimental data are reported to confirm that metabolic coupling was more strongly related to functional connectivity than gap junction coupling. However, a limitation here is that the metabolic imaging data confirmed a strong link between disconnected beta cells and low metabolic coupling but did not robustly show the opposite. Similarly, I was not convinced that the FRAP studies, which indirectly measured GJ ("structural") connections, were powered well enough to be related to measures of beta cell connectivity.

      The group goes on to provide further analytical and experimental data with a model of increasing loss of GJ connectivity (by calcium imaging islets from WT, heterozygous (50% GJ loss), and homozygous (100% loss) animals). Given the former conclusion that it was metabolic not GJ connectivity that drives small world network behaviour, it was surprising to see such a great effect on the loss of hubs in the homs. That said, the analytical approaches in this model did help the authors confirm that the loss of gap junctions does not alter the preferential existence of beta cell connectivity and confirms the important contribution of metabolic "coupling". One perhaps can therefore conclude that there are two types of network behaviour in an islet (maybe more) and the field should move towards an understanding of overlapping network communities as has been done in brain networks.

      Overall this is an extremely well-written paper which was a pleasure to read. This group has neatly and expertly provided both computational and experimental data to support the notion that it is metabolic but not "structural" ie GJ coupling that drives our observations of hubs and functional connectivity. However, there is still much work to do to understand whether this metabolic coupling is just a random epiphenomenon or somehow fated, the extent to which other elements of "structural" coupling - ie the presence of other endocrine cell types, the spatial distribution of paracrine hormone receptors, blood vessels and nerve terminals are also important.

      We thank reviewer 3 for their positive comments, including the methodology, writing style, and the importance of this paper to the broader islet community. We thank the reviewer for their very in-depth and helpful comments. We have addressed each comment below and made significant changes to the manuscript accordingly. We conducted more FRAP experiments and separated results into slow, fast, and mixed oscillations. We included analysis of an additional computational model that simulates slow calcium oscillations. Additionally, we substantially rewrote the paper to clarify that we are not referring to metabolic coupling and to speak to the broader implications of network theory and our findings.

      Reviewer #4 (Public Review):

      This manuscript describes a complex, highly ambitious set of modeling and experimental studies that appear designed to compare the structural and functional properties of beta cell subpopulations within the islet network in terms of their influence on network synchronization. The authors conclude that the most functionally coupled cell subpopulations in the islet network are not those that are most structurally coupled via gap junctions but those that are most metabolically active.

      Strengths of the paper include (1) its use of an interdisciplinary collection of methods including computer simulations, FRAP to monitor functional coupling by gap junctions, the monitoring of Ca2+ oscillations in single beta cells embedded in the network, and the use of sophisticated approaches from probability theory. Most of these methods have been used and validated previously. Unfortunately, however, it was not clear what the underlying premise of the paper actually is, despite many stated intentions, nor what about it is new compared to previous studies, an additional weakness.

      Although the authors state that they are trying to answer 3 critical questions, it was not clear how important these questions are in terms of significance for the field. For example, they state that a major controversy in the field is whether network structure or network function mediates functional synchronization of beta cells within the islet. However, this question is not much debated. As an example, while it is known that there can be long-range functional coupling in islets, no workers in the field believe there is a physical structure within islets that mediates this, unlike the case for CNS neurons that are known to have long projections onto other neurons. Beta cells within the islets are locally coupled via gap junctions, as stated repeatedly by the authors but these mediate short-range coupling. Thus, there are clearly functional correlations over long ranges but no structures, only correlated activity. This weakness raises questions about the overall significance of the work, especially as it seems to reiterate ideas presented previously.

      We thank reviewer 4 for their positive comments, including on our multidisciplinary use of mathematical models and experimental imaging techniques. We have now included an additional model of slow oscillations (the Integrated Oscillator Model) to strengthen our conclusions. We also thank reviewer 4 for the insightful comments. We have carefully reviewed each comment and made significant changes to the manuscript accordingly. In particular, we have substantially rewritten the introduction and discussion to clarify what is new in our manuscript and what has been shown previously. Additionally, we agree with the reviewer’s sentiment that there is little debate over whether, for example, there are physical structures within the islet that mediate long-range functional connections. However, there is current debate over whether functional beta-cell subpopulations can dictate islet dynamics (see [11]–[13]). This debate can be framed by asking whether these functional subpopulations emerge from the islet due to physical connections (structural network) or something more nuanced (such as intrinsic dynamics). We have reframed the introduction and discussion to clarify this debate and to state the premise of the paper more clearly.

      Specific Comments

      1). The authors state it is well accepted that the disruption of gap junctional coupling is a pathophysiological characteristic of diabetes, but this is not an opinion widely accepted by the field, although it has been proposed. The authors should scale back on such generalizations, or provide more compelling evidence to support such a claim.

      Thank you for pointing this out. We have provided more specific citations and changed the wording from “well accepted” to “has been documented”. See Discussion page 13 lines 415-416.

      2) The paper relies heavily on simulations performed using a version of the model of Cha et al (2011). While this is a reasonable model of fast bursting (e.g. oscillations having periods <1 min.), the Ca2+ oscillations that were recorded by the authors and shown in Fig. 2b of the manuscript are slow oscillations with periods of 5 min and not <1 min, which is a weakness of the model in the current context. Furthermore, the model outputs that are shown lack the well-known characteristics seen in real islets, such as fast-spiking occurring on prolonged plateaus, again as can be seen by comparing the simulated oscillations shown in Fig. 1d with those in Fig. 2b. It is recommended that the simulations be repeated using a more appropriate model of slow oscillations or at least using the model of Cha et al but employed to simulate in slower bursting.

      The reviewer raises an important point and caveat associated with our simulated model and experimental data. This point was also made by other reviewers, and a similar response to this comment can be found elsewhere in response to reviewer 2 point 6. To address this comment, we have performed several additional experiments and analyses:

      1) We collected additional Ca2+ data (to identify the functional network and hubs) and FRAP data (to assess gap junction permeability) in islets that show either pure slow, pure fast, or mixed oscillations. We generated networks based on each time scale to compare with the FRAP gap junction permeability data. We found the conclusions of our first draft to be consistent across all oscillation types. There was no relationship between gap junction conductance, as approximated using FRAP, and normalized degree for slow (Figure 3j), fast (Figure 3 Supp 1d,e), or mixed (Figure 3 Supp 1g,h) oscillations. We also include a discussion of these conclusions. See Results page 7 lines 184-186 and lines 188-191, Discussion page 12 lines 357-360.

      2) We also performed additional simulations with a coupled ‘Integrated Oscillator Model’ (IOM), which shows slow oscillations because of metabolic oscillations (Figure 2). We compared connectivity with gap junction coupling and underlying cell parameters. In this case, there is an association between functional and structural networks, with highly-connected hub cells showing higher gap junction conductance (Figure 2f) but also low KATP channel conductance (gKATP) (Figure 2e). However, there are some caveats to these findings: given the nature of the IOM, we were limited to simulating smaller islets (260 cells), and less heterogeneity in the calcium traces was observed. Additional analysis suggests that the greater association between functional and structural networks in this model was a result of the smaller islets, and that the association was also dependent on threshold and so was not robust (unlike in the Cha-Noma fast oscillator model). These limitations and results are discussed further (Discussion page 11 lines 344-354).

      Additionally, in the IOM, the underlying cell dynamics of highly-connected hub cells are differentiated by KATP channel conductance (gKATP), whereas in the fast oscillator model they are differentiated by metabolism (kglyc). However, this difference between models can be linked to differences in the way duty cycle is influenced by gKATP and kglyc (Figure 1h, Figure 2g). In each model there was a similar association between duty cycle and highly-connected hub cells. We also discuss these findings (Discussion page 11 lines 334-343).

      Overall these results and discussion with respect to the coupled IOM oscillator model can be found in Figure 2, Results page 6 lines 128-156 and Discussion page 11 lines 332-354.
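
      For readers less familiar with functional network analysis, the comparison in point 1 above can be sketched as follows. This is an illustrative Python sketch only and is not the analysis code used for the manuscript: the correlation threshold (0.9), the hub cutoff (0.6 of the maximum degree), the synthetic traces, and the names `normalized_degree` and `frap_rate` are all assumptions made for the example.

```python
import numpy as np

def normalized_degree(traces, threshold=0.9):
    """Degree of each cell in a correlation-thresholded network, divided by the maximum degree.

    traces: array of shape (n_cells, n_timepoints), one Ca2+ trace per cell.
    threshold: assumed correlation cutoff for drawing an edge (illustrative value).
    """
    corr = np.corrcoef(traces)           # pairwise Pearson correlation between cells
    adjacency = corr > threshold         # edge if two traces are highly correlated
    np.fill_diagonal(adjacency, False)   # no self-connections
    degree = adjacency.sum(axis=1)
    max_degree = degree.max()
    if max_degree == 0:
        return np.zeros(len(degree))
    return degree / max_degree

# Synthetic stand-in data: a shared slow oscillation plus cell-specific noise.
rng = np.random.default_rng(0)
n_cells, n_time = 50, 600
t = np.arange(n_time)
shared = np.sin(2 * np.pi * t / 300.0)
noise_sd = np.linspace(0.1, 1.0, n_cells)
traces = shared + noise_sd[:, None] * rng.normal(size=(n_cells, n_time))

norm_degree = normalized_degree(traces)
hubs = norm_degree > 0.6                 # assumed hub cutoff, for illustration only

# Hypothetical per-cell FRAP recovery rates, compared against normalized degree.
frap_rate = rng.uniform(0.0, 1.0, size=n_cells)
r = np.corrcoef(norm_degree, frap_rate)[0, 1]
```

      In this toy example the FRAP values are drawn independently of the traces, so any correlation `r` reflects chance alone; with real data, `frap_rate` would hold the measured recovery rate of each cell.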

      3) Much of the data analyzed whether obtained via simulation or through experiment seems to produce very small differences in the actual numbers obtained, as can be seen in the bar graphs shown in Figs. 1e,g for example (obtained from simulations), or Fig. 2j (obtained from experimental measurements). The authors should comment as to why such small differences are often seen as a result of their analyses throughout the manuscript and why also in many cases the observed variance is high. Related to the data shown, very few dots are shown in Figs. 1eg or Fig 4e and 4h even though these points were derived from simulations where 100s of runs could be carried out and many more points obtained for plotting. These are weaknesses unless specific and convincing explanations are provided.

      We thank the reviewer for these comments, which are similar to those of reviewer 2 (point 4) and reviewer 3 (point 6). Indeed, there is some variability between cells in both simulations and experiments related to the metabolic activity in hubs and non-hubs. This variability suggests that factors beyond kglyc are involved in determining hubs, including a minor role for the gap junction coupling structural network and potentially cell position and other intrinsic factors. We now discuss this point – see Discussion page 12 lines 364-266.

      The differences between hubs and nonhubs appear small because the value of kglyc is very small. For figure 1e, the average kglyc for nonhubs was 1.26 × 10^-4 s^-1 (which is the average of the distribution because most cells are nonhubs), while the average kglyc for hubs was 1.4 × 10^-4 s^-1, which is about half of a standard deviation higher. The paired t-test controls for the small value of average kglyc.

      For simulation data, each of the 5 dots corresponds to a simulated islet averaged over 1000 cells (or 260 cells for the coupled IOM). Generating such data is computationally expensive, so it is not feasible to conduct 100s of runs. Again, we note that the comparisons between hubs and non-hubs are paired, and we find statistically significant differences for kglyc in figure 1 using only 5 paired data points. That we find these differences indicates a substantial difference between hubs and non-hubs. This is further supported by the effect sizes, which were greater than 0.8 for all significantly different findings (Cha-Noma: kglyc 2.85, gcoup 0.82; IOM: gKATP 1.27, gcoup 2.94). We have included these effect sizes in the captions of Figures 1 and 2 (pages 34, 36).
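
      As an illustration of the paired comparison described above, the following Python sketch runs a paired t-test on five per-islet averages and computes a paired-samples effect size. It is not the code used for the manuscript: the numerical values are hypothetical, and the effect size shown (mean difference divided by the standard deviation of the differences) is one common definition that we assume here for the example.

```python
import numpy as np
from scipy import stats

def paired_comparison(hub_means, nonhub_means):
    """Paired t-test and paired-samples Cohen's d for per-islet averages."""
    hub = np.asarray(hub_means, dtype=float)
    non = np.asarray(nonhub_means, dtype=float)
    diff = hub - non
    t_stat, p_value = stats.ttest_rel(hub, non)
    # Paired Cohen's d: mean of the differences over their sample standard deviation.
    d = diff.mean() / diff.std(ddof=1)
    return t_stat, p_value, d

# Hypothetical per-islet averages of kglyc (s^-1) for five simulated islets.
hub_kglyc = [1.41e-4, 1.39e-4, 1.40e-4, 1.42e-4, 1.38e-4]
nonhub_kglyc = [1.26e-4, 1.27e-4, 1.25e-4, 1.26e-4, 1.26e-4]

t_stat, p_value, d = paired_comparison(hub_kglyc, nonhub_kglyc)
```

      Because the test operates on within-islet differences, the absolute magnitude of kglyc does not matter; only the consistency of the hub minus non-hub difference across islets does.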

      To consider all of the available data rather than the average across an entire islet, we created kernel density estimates of kglyc for hubs and nonhubs by concatenating every single cell in each of the five islets. A Kolmogorov-Smirnov test (kstest) shows a highly significant difference (P<0.0001) between these two distributions.
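
      The pooled-cell comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the analysis behind Author response image 1: the per-cell values are synthetic draws (in units of 10^-4 s^-1), and the group sizes and spread are chosen only to resemble the description above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic stand-in for per-cell kglyc values pooled over five islets,
# expressed in units of 1e-4 s^-1. Sizes and spread are assumptions.
nonhub = rng.normal(loc=1.26, scale=0.28, size=4500)
hub = rng.normal(loc=1.40, scale=0.28, size=500)

# Two-sample Kolmogorov-Smirnov test on the pooled per-cell values.
ks_stat, p_value = stats.ks_2samp(hub, nonhub)

# Gaussian kernel density estimates evaluated on a common grid for plotting.
grid = np.linspace(0.0, 3.0, 200)
hub_density = stats.gaussian_kde(hub)(grid)
nonhub_density = stats.gaussian_kde(nonhub)(grid)
```

      Note that pooling cells across islets treats every cell as an independent observation, which is why this analysis complements, rather than replaces, the paired per-islet comparison.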

      Author response image 1.

      4) The data shown in Fig. 4i,j are intended to compare long-range synchronization at different distances along a string of coupled cells but the difference between the synchronized and unsynchronized cells for gcoup and Kglyc was subtle, very much so.

      Thank you for pointing out these subtle differences. The y-axis scale for i and j is broad to allow us to represent all distances on a single plot. After correction for multiple comparisons, the differences were still statistically significant. As the reviewer mentioned in point 3, each plot contains only five data points, each of which represents the average of a single simulated islet; therefore, we are not concerned about statistical significance arising from too large a sample size. We also checked the differences between synchronized and non-synchronized cell pairs in figure 4 panels e and h (now figure 5 e, h). These are the same data as in i and j but normalized such that all of the distances could be averaged together. We again found statistical significance between synchronized and non-synchronized cell pairs. As can be seen in Author response image 2, the difference between synchronized and non-synchronized cell pairs is greater than the variability between simulated islets. Thus, in this case the variability is not substantial.

      Author response image 2.

      5) The data shown in Fig. 5 for Cx36 knockout islets are used to assess the influence of gap junctional coupling, which is reasonable, but it would be reassuring to know that loss of this gene has no effects on the expression of other genes in the beta cell, especially genes involved with glucose metabolism.

      This is an important point. Previous studies have found no significant change in NAD(P)H in Cx36 deficient islets – see Benninger et al J.Physiol 2011 [14]. Islet architecture is also retained. Further, the insulin secretory response of dissociated Cx36 knockout beta cells is the same as that of dissociated wildtype beta cells, further indicating no significant defect in the intrinsic ability of the beta cell to release insulin – see Benninger et al J.Physiol 2011 [14]. We now mention these findings in the discussion. See Discussion page 14 lines 459-464.

      6) In many places throughout the paper, it is difficult to ascertain whether what is being shown is new vs. what has been shown previously in other studies. The paper would thus benefit strongly from added text highlighting the novelty here and not just restating what is known, for instance, that islets can exhibit small-world network properties. This detracts from the strengths of the paper and further makes it difficult to wade through. Even the finding here that metabolic characteristics of the beta cells can infer profound and influential functional coupling is not new, as the authors proposed as much many years ago. Again, this makes it difficult to distill what is new compared to what is mainly just being confirmed here, albeit using different methods.

      Thank you for the suggestion, we have made significant modifications throughout the Introduction, Discussion and Results to be clearer about what is known from previous work and what is newly found in this manuscript.

      Reviewer #5 (Public Review):

      The authors use state-of-the-art computation, experiment, and current network analysis to try and disaggregate the impact of cellular metabolism driving cellular excitability and structural electrical connections through gap junctions on islet synchronization. They perform interesting simulations with a sophisticated mathematical model and compare them with closely associated experiments. This close association is impressive and is an excellent example of using mathematics to inform experiments and experimental results. The current conclusions, however, appear beyond the results presented. The use of functional connectivity is based on correlated calcium traces but is largely without an understood biophysical mechanism. This work aims to clarify such a mechanism between metabolism and structural connection and comes out on the side of metabolism driving the functional connectivity, but both are required and more nuanced conclusions should be drawn.

      We thank reviewer 5 for their positive comments, including on our multifaceted experimental and computational techniques. We also found the reviewer’s careful reading and thoughtful comments to be very helpful, and we have worked to integrate each comment into our manuscript. It is evident from the reviewer’s comments that we did not clearly explain what was meant by our conclusions concerning the functional network reflecting metabolism rather than gap junctions. We have conducted significant rewriting to show that we are not concluding that communication (metabolic or electric) occurs due to conduits other than gap junctions. Rather, our data suggest that the functional network (which reflects calcium synchronization) reflects intrinsic dynamics of the cells, which include metabolic rates, more than individual gap junction connections.

      References referred to in this response to reviewers document:

      [1] A. Stožer et al., “Functional connectivity in islets of Langerhans from mouse pancreas tissue slices,” PLoS Comput Biol, vol. 9, no. 2, p. e1002923, 2013.

      [2] N. L. Farnsworth, A. Hemmati, M. Pozzoli, and R. K. Benninger, “Fluorescence recovery after photobleaching reveals regulation and distribution of connexin36 gap junction coupling within mouse islets of Langerhans,” The Journal of physiology, vol. 592, no. 20, pp. 4431–4446, 2014.

      [3] C.-L. Lei, J. A. Kellard, M. Hara, J. D. Johnson, B. Rodriguez, and L. J. Briant, “Beta-cell hubs maintain Ca2+ oscillations in human and mouse islet simulations,” Islets, vol. 10, no. 4, pp. 151–167, 2018.

      [4] N. R. Johnston et al., “Beta cell hubs dictate pancreatic islet responses to glucose,” Cell metabolism, vol. 24, no. 3, pp. 389–401, 2016.

      [5] V. Kravets et al., “Functional architecture of pancreatic islets identifies a population of first responder cells that drive the first-phase calcium response,” PLoS Biology, vol. 20, no. 9, p. e3001761, 2022.

      [6] H. Ren et al., “Pancreatic α and β cells are globally phase-locked,” Nature Communications, vol. 13, no. 1, p. 3721, 2022.

      [7] A. Stožer et al., “From Isles of Königsberg to Islets of Langerhans: Examining the function of the endocrine pancreas through network science,” Frontiers in Endocrinology, vol. 13, p. 922640, 2022.

      [8] J. Zmazek et al., “Assessing different temporal scales of calcium dynamics in networks of beta cell populations,” Frontiers in physiology, vol. 12, p. 337, 2021.

      [9] M. E. Corezola do Amaral et al., “Caloric restriction recovers impaired β-cell-β-cell gap junction coupling, calcium oscillation coordination, and insulin secretion in prediabetic mice,” American Journal of Physiology-Endocrinology and Metabolism, vol. 319, no. 4, pp. E709–E720, 2020.

      [10] J. M. Dwulet, J. K. Briggs, and R. K. P. Benninger, “Small subpopulations of beta-cells do not drive islet oscillatory [Ca2+] dynamics via gap junction communication,” PLOS Computational Biology, vol. 17, no. 5, p. e1008948, May 2021, doi: 10.1371/journal.pcbi.1008948.

      [11] B. E. Peercy and A. S. Sherman, “Do oscillations in pancreatic islets require pacemaker cells?,” Journal of Biosciences, vol. 47, no. 1, pp. 1–11, 2022.

      [12] G. A. Rutter, N. Ninov, V. Salem, and D. J. Hodson, “Comment on Satin et al.‘Take me to your leader’: an electrophysiological appraisal of the role of hub cells in pancreatic islets. Diabetes 2020; 69: 830–836,” Diabetes, vol. 69, no. 9, pp. e10–e11, 2020.

      [13] L. S. Satin and P. Rorsman, “Response to comment on satin et al.‘Take me to your leader’: An electrophysiological appraisal of the role of hub cells in pancreatic islets. Diabetes 2020; 69: 830–836,” Diabetes, vol. 69, no. 9, pp. e12–e13, 2020.

      [14] R. K. Benninger, W. S. Head, M. Zhang, L. S. Satin, and D. W. Piston, “Gap junctions and other mechanisms of cell–cell communication regulate basal insulin secretion in the pancreatic islet,” The Journal of physiology, vol. 589, no. 22, pp. 5453–5466, 2011.

      [15] R. Fried, Erectile dysfunction as a cardiovascular impairment. Academic Press, 2014.

      [16] T. Pipatpolkai, S. Usher, P. J. Stansfeld, and F. M. Ashcroft, “New insights into KATP channel gene mutations and neonatal diabetes mellitus,” Nature Reviews Endocrinology, vol. 16, no. 7, pp. 378–393, 2020.

      [17] A. M. Notary, M. J. Westacott, T. H. Hraha, M. Pozzoli, and R. K. P. Benninger, “Decreases in Gap Junction Coupling Recovers Ca2+ and Insulin Secretion in Neonatal Diabetes Mellitus, Dependent on Beta Cell Heterogeneity and Noise,” PLOS Computational Biology, vol. 12, no. 9, p. e1005116, Sep. 2016, doi: 10.1371/journal.pcbi.1005116.

      [18] J. V. Rocheleau, G. M. Walker, W. S. Head, O. P. McGuinness, and D. W. Piston, “Microfluidic glucose stimulation reveals limited coordination of intracellular Ca2+ activity oscillations in pancreatic islets,” Proceedings of the National Academy of Sciences, vol. 101, no. 35, pp. 12899–12903, 2004.

      [19] R. K. Benninger, M. Zhang, W. S. Head, L. S. Satin, and D. W. Piston, “Gap junction coupling and calcium waves in the pancreatic islet,” Biophysical journal, vol. 95, no. 11, pp. 5048–5061, 2008.

    1. Author Response:

      Evaluation Summary:

      The authors assessed multivariate relations between a dimensionality-reduced symptom space and brain imaging features, using a large database of individuals with psychosis-spectrum disorders (PSD). Demonstrating both high stability and reproducibility of their approaches, this work showed a promise that diagnosis or treatment of PSD can benefit from a proposed data-driven brain-symptom mapping framework. It is therefore of broad potential interest across cognitive and translational neuroscience.

      We are very grateful for the positive feedback and the careful read of our paper. We would especially like to thank the Reviewers for taking the time to read this lengthy and complex manuscript and for providing their helpful and highly constructive feedback. Overall, we hope the Editor and the Reviewers will find that our responses address all the comments and that the requested changes and edits improved the paper.

      Reviewer 1 (Public Review):

      The paper assessed the relationship between a dimensionality-reduced symptom space and functional brain imaging features based on the large multicentric data of individuals with psychosis-spectrum disorders (PSD).

      The strength of this study is that i) in every analysis, the authors provided high-level evidence of reproducibility in their findings, ii) the study included several control analyses to test other comparable alternatives or independent techniques (e.g., ICA, univariate vs. multivariate), and iii) correlating to independently acquired pharmacological neuroimaging and gene expression maps, the study highlighted neurobiological validity of their results.

      Overall the study has originality and several important tips and guidance for behavior-brain mapping, although the paper contains heavy descriptions about data mining techniques such as several dimensionality reduction algorithms (e.g., PCA, ICA, and CCA) and prediction models.

      We thank the Reviewer for their insightful comments and we appreciate the positive feedback. Regarding the descriptions of methods and analytical techniques, we have removed these descriptions out of the main Results text and figure captions. Detailed descriptions are still provided in the Methods, so that they do not detract from the core message of the paper but can still be referenced if a reader wishes to look up the details of these methods within the context of our analyses.

      Although relatively minor, I also have a few points on the weaknesses, including i) an incomplete description of how to tell the PSD effects from the normal spectrum, ii) a lack of overarching interpretation for principal components other than the 3rd one, and iii) somewhat expected results in the stability of PC and relevant indices.

      We are very appreciative of the constructive feedback and feel that these revisions have strengthened our paper. We have addressed these points in the revision as follows:

      i) We are grateful to the Reviewer for bringing up this point as it has allowed us to further explore the interesting observation we made regarding shared versus distinct neural variance in our data. It is important not to confuse the neural PCA (i.e. the independent neural features that can be detected in the PSD and healthy control samples) with the neuro-behavioral mapping. In other words, both PSD patients and healthy controls are human, and therefore there are a number of neural functions that both cohorts exhibit that may have nothing to do with the symptom mapping in PSD patients, for instance basic regulatory functions such as control of cardiac and respiratory cycles, motor functions, vision, etc. We therefore hypothesized that there are more common than distinct neural features that are on average shared across humans irrespective of their psychopathology status. Consequently, there may only be a ‘residual’ symptom-relevant neural variance. Therefore, in the manuscript we bring up the possibility that a substantial proportion of neural variance may not be clinically relevant. If this is in fact true, then removing the shared neural variance between PSD and CON should not drastically affect the reported symptom-neural univariate mapping solution, because this common variance does not map to clinical features and is therefore statistically orthogonal. We have now verified this hypothesis quantitatively and have added extensive analyses to highlight this important observation made by the Reviewer.

      We first conducted a PCA using the parcellated GBC data from all 436 PSD and 202 CON (a matrix with dimensions 638 subjects x 718 parcels). We will refer to this as the GBC-PCA to avoid confusion with the symptom/behavioral PCA described elsewhere in the manuscript. This GBC-PCA resulted in 637 independent GBC-PCs. Since PCs are orthogonal to each other, we then partialled out the variance attributable to GBC-PC1 from the PSD data by reconstructing the PSD GBC matrix using only scores and coefficients from the remaining 636 GBC-PCs ($\widehat{GBC}_{woPC1}$). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. The results are shown in Fig. S21 and reproduced below. Removing the first PC of shared neural variance (which accounted for about 15.8% of the total GBC variance across CON and PSD) from PSD data attenuated the statistics slightly (not unexpected as the variance was by definition reduced) but otherwise did not strongly affect the univariate mapping solution.
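
      To make the partialling-out step concrete, the following Python sketch removes the leading principal component(s) of a pooled matrix from a subset of its rows. It is a minimal illustration on synthetic data and not the code used for the manuscript: the function name `remove_leading_pcs`, the random stand-in matrix, the assumption that the first 436 rows are the PSD subjects, and the choice to add the pooled mean back after removal are all ours.

```python
import numpy as np

def remove_leading_pcs(all_data, target_rows, n_remove=1):
    """Reconstruct selected rows without the first n_remove PCs of the pooled data.

    all_data: array (n_subjects, n_parcels), pooled across groups.
    target_rows: indices of the subjects to reconstruct.
    Returns the cleaned rows and the fraction of variance explained by each PC.
    """
    mean = all_data.mean(axis=0)
    centered = all_data - mean
    # Rows of vt are the PC coefficients, ordered by explained variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    leading = vt[:n_remove]                 # (n_remove, n_parcels)
    target = centered[target_rows]
    scores = target @ leading.T             # scores on the PCs to be removed
    # Subtracting the leading-PC part equals rebuilding from all remaining PCs.
    cleaned = target - scores @ leading + mean
    return cleaned, explained

# Synthetic stand-in with the dimensions quoted above (638 subjects x 718 parcels).
rng = np.random.default_rng(0)
gbc = rng.normal(size=(638, 718))
psd_rows = np.arange(436)                   # assumed row order: PSD subjects first
gbc_wo_pc1, explained = remove_leading_pcs(gbc, psd_rows, n_remove=1)
```

      Setting `n_remove` to 2, 3, or 4 gives the analogous matrices with more of the shared variance removed. After mean-centering, 638 subjects yield at most 637 components with non-zero variance, which matches the number of GBC-PCs quoted above.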

      We next repeated the symptom-neural regression with the first 2 GBC-PCs partialled out of the PSD data (Fig. S22), with the first 3 GBC-PCs parsed out (Fig. S23), and with the first 4 GBC-PCs parsed out (Fig. S24). The symptom-neural maps remain fairly robust, although the similarity with the original $\beta_{PC}GBC$ maps does drop as more common neural variance is parsed out. These figures are also shown below:

      Fig. S21. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first neural PC parsed out. If a substantial proportion of neural variance is not clinically relevant, then removing the shared neural variance between PSD and CON should not drastically affect the reported symptom-neural univariate mapping solution, because this common variance will not map to clinical features. We therefore performed a PCA on CON and PSD GBC to compute the shared neural variance (see Methods), and then parsed out the first GBC-PC from the PSD GBC data ($\widehat{GBC}_{woPC1}$). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The first GBC-PC accounted for about 15.8% of the total GBC variance across CON and PSD. Removing GBC-PC1 from PSD data attenuated the $\beta_{PC1}GBC$ statistics slightly (not unexpected as the variance was by definition reduced) but otherwise did not strongly affect the univariate mapping solution. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S22. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first two neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first two GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-2}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S23. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first three neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first three GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-3}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      Fig. S24. Comparison between the PSD $\beta_{PC}GBC$ maps computed using GBC and GBC with the first four neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first four GBC-PCs from the PSD GBC data ($\widehat{GBC}_{woPC1-4}$, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The $\beta_{PC1}GBC$ map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two $\beta_{PC1}GBC$ maps shown in A and B. (D-O) The same results are shown for the $\beta_{PC2}GBC$ to $\beta_{PC5}GBC$ maps.

      For comparison, we also computed the $\beta_{PC}GBC$ maps for control subjects, shown in Fig. S11. In support of the $\beta_{PC}GBC$ maps in PSD being circuit-relevant, we observed only mild associations between GBC and PC scores in healthy controls:

      Results: All 5 PCs captured unique patterns of GBC variation across the PSD (Fig. S10), which were not observed in CON (Fig. S11). ... Discussion: On the contrary, this bi-directional “Psychosis Configuration” axis also showed strong negative variation along neural regions that map onto the sensory-motor and associative control regions, also strongly implicated in PSD (1, 2). The “bi-directionality” property of the PC symptom-neural maps may thus be desirable for identifying neural features that support individual patient selection. For instance, it may be possible that PC3 reflects residual untreated psychosis symptoms in this chronic PSD sample, which may reveal key treatment neural targets. In support of this circuit being symptom-relevant, it is notable that we observed a mild association between GBC and PC scores in the CON sample (Fig. S11).

      ii) In our original submission we spotlighted PC3 because of its pattern of loadings on to hallmark symptoms of PSD, including strong positive loadings across Positive symptom items in the PANSS and conversely strong negative loadings on to most Negative items. It was necessary to fully examine this dimension in particular because these are key characteristics of the target psychiatric population, and we found that the focus on PC3 was innovative because it provided an opportunity to quantify a fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. This is a powerful demonstration of how data-driven techniques such as PCA can reveal properties intrinsic to the structure of PSD-relevant symptom data, which may in turn improve the mapping of symptom-neural relationships. We refrained from explaining each of the five PCs in detail in the main text as we felt that it would further complicate an already dense manuscript. Instead, we opted to provide the interpretation and data from all analyses for all five PCs in the Supplement. However, in response to the Reviewers’ thoughtful feedback that more focus should be placed on other components, we have expanded the presentation and discussion of all five components (both regarding the symptom profiles and neural maps) in the main text:

      Results: Because PC3 loads most strongly on to hallmark symptoms of PSD (including strong positive loadings across PANSS Positive symptom measures and strong negative loadings onto most Negative measures), we focus on this PC as an opportunity to quantify an innovative, fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional symptom axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. We found that the PC3 result provided a powerful empirical demonstration of how using a data-driven dimensionality-reduced solution (via PCA) can reveal novel patterns intrinsic to the structure of PSD psychopathology.

      iii) We felt that demonstrating the stability of the PCA solution was extremely important, given that this degree of rigor has not previously been tested using broad behavioral measures across psychosis symptoms and cognition in a cross-diagnostic PSD sample. Additionally, we demonstrated reproducibility of the PCA solution using independent split-half samples. Furthermore, we derived stable neural maps using the PCA solution. In our original submission we show that the CCA solution was not reproducible in our dataset. Following the Reviewers’ feedback, we computed the estimated sample sizes needed to sufficiently power our multivariate analyses for stable/reproducible solutions, using the methods in (3). These results are discussed in detail in our resubmitted manuscript and in our response to the Critiques section below.
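
      As a rough illustration of what a split-half reproducibility check of a PCA solution involves, the following Python sketch fits a PCA separately in two random halves of a sample and compares the loadings. It is a simplified sketch under stated assumptions, not the procedure used in the manuscript: the data are synthetic, the number of measures (36) and the two-factor structure are arbitrary, and the similarity metric (absolute cosine similarity between matched loading vectors) is our choice for the example.

```python
import numpy as np

def pca_loadings(data, n_components):
    """Return the first n_components PC loading vectors (unit length) of data."""
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]

def split_half_similarity(data, n_components=5, seed=0):
    """Absolute cosine similarity of matched PC loadings from two random halves."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(data.shape[0])
    half = data.shape[0] // 2
    a = pca_loadings(data[order[:half]], n_components)
    b = pca_loadings(data[order[half:]], n_components)
    # Loadings are unit vectors, so the row-wise dot product is a cosine similarity;
    # the absolute value ignores the arbitrary sign of each PC.
    return np.abs(np.sum(a * b, axis=1))

# Synthetic symptom-like data: two latent factors of unequal strength plus noise.
rng = np.random.default_rng(1)
n_subjects, n_measures = 436, 36
q, _ = np.linalg.qr(rng.normal(size=(n_measures, 2)))   # orthonormal factor loadings
latent = rng.normal(size=(n_subjects, 2)) * np.array([3.0, 1.5])
symptoms = latent @ q.T + rng.normal(size=(n_subjects, n_measures))

similarity = split_half_similarity(symptoms, n_components=2)
```

      Values near 1 indicate that a component has essentially the same loading pattern in both halves. Repeating the split with different seeds would give a distribution of similarities rather than a single estimate.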

      Reviewer 2 (Public Review):

      The work by Ji et al is an interesting and rather comprehensive analysis of the trend of developing data-driven methods for developing brain-symptom dimension biomarkers that bring a biological basis to the symptoms (across PANSS and cognitive features) that relate to psychotic disorders. To this end, the authors performed several interesting multivariate analyses to decompose the symptom/behavioural dimensions and functional connectivity data. To this end, the authors use data from individuals from a transdiagnostic group of individuals recruited by the BSNIP cohort and combine high-level methods in order to integrate both types of modalities. Conceptually there are several strengths to this paper that should be applauded. However, I do think that there are important aspects of this paper that need revision to improve readability and to better compare the methods to what is in the field and provide a balanced view relative to previous work with the same basic concepts that they are building their work around. Overall, I feel as though the work could advance our knowledge in the development of biomarkers or subject level identifiers for psychiatric disorders and potentially be elevated to the level of an individual "subject screener". While this is a noble goal, this will require more data and information in the future as a means to do this. This is certainly an important step forward in this regard.

      We thank the Reviewer for their insightful and constructive comments about our manuscript. We have revised the text to make it easier to read and to clarify our results in the context of prior works in the field. We fully agree that a great deal more work needs to be completed before achieving single-subject level treatment selection, but we hope that our manuscript provides a helpful step towards this goal.

      Strengths:

      • Combined analysis of canonical psychosis symptoms and cognitive deficits across multiple traditional psychosis-related diagnoses offers one of the most comprehensive mappings of impairments experienced within PSD to brain features to date
      • Cross-validation analyses and use of various datasets (diagnostic replication, pharmacological neuroimaging) is extremely impressive, well motivated, and thorough. In addition the authors use a large dataset and provide "out of sample" validity
      • Medication status and dosage also accounted for
      • Similarly, the extensive examination of both univariate and multivariate neuro-behavioural solutions from a methodological viewpoint, including the testing of multiple configurations of CCA (i.e. with different parcellation granularities), offers very strong support for the selected symptom-to-neural mapping
      • The plots of the obtained PC axes compared to those of standard clinical symptom aggregate scales provide a really elegant illustration of the differences and demonstrate clearly the value of data-driven symptom reduction over conventional categories
      • The comparison of the obtained neuro-behavioural map for the "Psychosis configuration" symptom dimension to both pharmacological neuroimaging and neural gene expression maps highlights direct possible links with both underlying disorder mechanisms and possible avenues for treatment development and application
      • The authors' explicit investigation of whether PSD and healthy controls share a major portion of neural variance (possibly present across all people) has strong implications for future brain-behaviour mapping studies, and provides a starting point for narrowing the neural feature space to just the subset of features showing symptom-relevant variance in PSD

      We are very grateful for the positive feedback. We would like to thank the Reviewers for taking the time to read this admittedly dense manuscript and for providing their helpful critique.

      Critiques:

      • Overall I found the paper very hard to read. There are abbreviations everywhere for every concept that is introduced. The paper is methods heavy (which I am not opposed to and quite like). It is clear that the authors took a lot of care in thinking about the methods that were chosen. That said, I think that the organization would benefit from a more traditional Intro, Methods, Results, and Discussion formatting so that it would be easier to parse the Results. The figures are extremely dense and there are often terms that are coined or used that are undefined or poorly defined.

      We appreciate the constructive feedback on how to reduce the dense content and to pay more attention to the frequency of abbreviations, which impacts readability. We implemented the strategies suggested by the Reviewer and have moved the Methods section after the Introduction to make the subsequent Results section easier to understand and contextualize. For clarity and length, we have moved methodological details previously in the Results and figure captions to the Methods (e.g. descriptions of dimensionality reduction and prediction techniques). This way, the Methods are now expanded for clarity without detracting from the readability of the core results of the paper. We have also simplified the text in places where there was room for more clarity. To make the numerous abbreviations easier to follow, we have added a table to the Supplement (Supplementary Table S1).

      • One thing I found conceptually difficult is the explicit comparison to the work in the Xia paper from the Satterthwaite group. Is this a fair comparison? The sample is extremely different as it is non clinical and comes from the general population. Can it be suggested that the groups that are clinically defined here are comparable? Is this an appropriate comparison and standard to make. To suggest that the work in that paper is not reproducible is flawed in this light.

      This is an extremely important point to clarify and we apologize that we did not make it sufficiently clear in the initial submission. Here we are not attempting to replicate the results of Xia et al., which we understand were derived in a sample fundamentally different from ours, both demographically and clinically, and which tested very different questions. Rather, this paper is just one example out of a number of recent papers which employed multivariate methods (CCA) to tackle the mapping between neural and behavioral features. The key point here is that this approach does not produce reproducible results due to over-fitting, as demonstrated robustly in the present paper. It is very important to highlight that in fact we did not single out any one paper when making this point. In fact, we do not mention the Xia paper explicitly anywhere and we were very careful to cite multiple papers in support of the multivariate over-fitting argument, which is now a well-known issue (4). Nevertheless, the Reviewers make an excellent point here and we acknowledge that while CCA was not reproducible in the present dataset, this does not explicitly imply that the results in the Xia et al. paper (or any other paper for that matter) are not reproducible by definition (i.e. until someone formally attempts to falsify them). We have made this point explicit in the revised paper, as shown below. Furthermore, in line with the provided feedback, we also applied the multivariate power calculator derived by Helmer et al. (3), which quantitatively illustrates the statistical point around CCA instability.

      Results: Several recent studies have reported “latent” neuro-behavioral relationships using multivariate statistics (5–7), which would be preferable because they simultaneously solve for maximal covariation across neural and behavioral features. Though concerns have emerged whether such multivariate results will replicate due to the size of the feature space relative to the size of the clinical samples (4), given the possibility of deriving a stable multivariate effect, here we tested if results improve with canonical correlation analysis (CCA) (8), which maximizes relationships between linear combinations of symptom (B) and neural features (N) across all PSD (Fig. 5A).

      Discussion: Here we attempted to use multivariate solutions (i.e. CCA) to quantify symptom and neural feature co-variation. In principle, CCA is well-suited to address the brain-behavioral mapping problem. However, symptom-neural mapping using CCA across either parcel-level or network-level solutions in our sample was not reproducible even when using a low-dimensional symptom solution and parcellated neural data as a starting point. Therefore, while CCA (and related multivariate methods such as partial least squares) are theoretically appropriate and may be helped by regularization methods such as sparse CCA, in practice many available psychiatric neuroimaging datasets may not provide sufficient power to resolve stable multivariate symptom-neural solutions (3). A key pressing need for forthcoming studies will be to use multivariate power calculators to inform sample sizes needed for resolving stable symptom-neural geometries at the single subject level. Of note, though we were unable to derive a stable CCA in the present sample, this does not imply that the multivariate neuro-behavioral effect may not be reproducible with larger effect sizes and/or sample sizes. Critically, this does highlight the importance of power calculations prior to computing multivariate brain-behavioral solutions (3).
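
      For illustration only (this is not the analysis code used in the manuscript), the over-fitting concern can be sketched with pure-noise data. The sketch below uses the sample size (N = 436) and the feature counts (12 or 180 neural features vs. 5 symptom features) mentioned in this response; the function name `first_canonical_corr`, the random seed, and the QR/SVD route to the canonical correlations are assumptions made for the example.

```python
import numpy as np


def first_canonical_corr(X, Y):
    """Largest in-sample canonical correlation between X and Y.

    Columns are mean-centered, each block is orthonormalized with a
    reduced QR decomposition, and the singular values of Qx.T @ Qy are
    the canonical correlations.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    singular_values = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return float(min(singular_values[0], 1.0))


rng = np.random.default_rng(0)   # arbitrary seed, for repeatability only
n_subjects = 436                 # sample size quoted in this response

# Symptom block: 5 independent noise columns, so the true effect is zero.
symptoms = rng.standard_normal((n_subjects, 5))

inflation = {}
for n_neural in (12, 180):
    # Neural block: independent noise, unrelated to the symptom block.
    neural = rng.standard_normal((n_subjects, n_neural))
    inflation[n_neural] = first_canonical_corr(neural, symptoms)

for n_neural, r in inflation.items():
    print(f"{n_neural:>3} neural vs 5 symptom features: in-sample r = {r:.2f}")
```

      Because both blocks are independent noise, any non-zero value printed here is inflation from fitting, and it grows with the number of neural features at a fixed sample size.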

      • Why was PCA selected for the analysis rather than ICA? Authors mention that PCA enables the discovery of orthogonal symptom dimensions, but don't elaborate on why this is expected to better capture behavioural variation within PSD compared to non-orthogonal dimensions. Given that symptom and/or cognitive items in conventional assessments are likely to be correlated in one way or another, allowing correlations to be present in the low-rank behavioural solution may better represent the original clinical profiles and drive more accurate brain-behaviour mapping. Moreover, as alluded to in the Discussion, employing an oblique rotation in the identification of dimensionality-reduced symptom axes may have actually resulted in a brain-behaviour space that is more generalizable to other psychiatric spectra. Why not use something more relevant to symptom/behaviour data like a factor analysis?

      This is a very important point! We agree with the Reviewer that an oblique solution may better fit the data. For this reason, we performed an ICA as shown in the Supplement. We chose to show PCA for the main analyses here because it is a deterministic solution and the number of significant components could be computed via permutation testing. Importantly, certain components from the ICA solution in this sample were highly similar to the PCs shown in the main solution (Supplementary Note 1), as measured by comparing the subject behavioral scores (Fig. S4), and neural maps (Fig. S13). However, notably, certain components in the ICA and PCA solutions did not appear to have a one-to-one mapping (e.g. PCs 1-3 and ICs 1-3). The orthogonality of the PCA solution forces the resulting components to capture maximally separated, unique symptom variance, which in turn map robustly on to unique neural circuits. We observed that the data may be distributed in such a way that in the ICA highly correlated independent components emerge, which do not maximally separate the symptom variance associated with neural variance. We demonstrate this by plotting the relationship between parcel beta coefficients for the βPC3GBC map versus the βIC2GBC and βIC3GBC maps. The sigmoidal shape of the distribution indicates an improvement in the Z-statistics for the βPC3GBC map relative to the βIC2GBC and βIC3GBC maps. We have added this language to the main text Results:

      Notably, independent component analysis (ICA), an alternative dimensionality reduction procedure which does not enforce component orthogonality, produced similar effects for this PSD sample (see Supplementary Note 1 & Fig. S4A). Certain pairs of components between the PCA and ICA solutions appear to be highly similar and exclusively mapped (IC5 and PC4; IC4 and PC5) (Fig. S4B). On the other hand, PCs 1-3 and ICs 1-3 do not exhibit a one-to-one mapping. For example, PC3 appears to correlate positively with IC2 and equally strongly negatively with IC3, suggesting that these two ICs are oblique to the PC and perhaps reflect symptom variation that is explained by a single PC. The orthogonality of the PCA solution forces the resulting components to capture maximally separated, unique symptom variance, which in turn map robustly on to unique neural circuits. We observed that the data may be distributed in such a way that in the ICA highly correlated independent components emerge, which do not maximally separate the symptom variance associated with neural variance. We demonstrate this by plotting the relationship between parcel beta coefficients for the βPC3GBC map versus the βIC2GBC and βIC3GBC maps (Fig. ??G). The sigmoidal shape of the distribution indicates an improvement in the Z-statistics for the βPC3GBC map relative to the βIC2GBC and βIC3GBC maps.
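
      As a hedged illustration of how the number of significant PCs can be obtained by permutation testing, the sketch below runs PCA on a toy symptom matrix and compares each eigenvalue with a null distribution built by shuffling every item independently across subjects. The permutation scheme, the sequential stopping rule, the function names, and the simulated data are assumptions for the example and are not taken from the manuscript.

```python
import numpy as np


def pca_eigenvalues(data):
    """Eigenvalues of the item correlation matrix, largest first."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    singular_values = np.linalg.svd(z, compute_uv=False)
    return singular_values ** 2 / data.shape[0]


def n_significant_pcs(data, n_perm=200, alpha=0.05, seed=0):
    """Count leading PCs whose eigenvalue exceeds the permutation null."""
    rng = np.random.default_rng(seed)
    observed = pca_eigenvalues(data)
    null = np.empty((n_perm, observed.size))
    for i in range(n_perm):
        # Shuffle each item across subjects to destroy item covariance.
        shuffled = np.column_stack([rng.permutation(col) for col in data.T])
        null[i] = pca_eigenvalues(shuffled)
    threshold = np.quantile(null, 1 - alpha, axis=0)
    count = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:
            break  # stop at the first component that fails the test
        count += 1
    return count


# Toy data: 300 subjects, 10 items driven by 2 latent dimensions.
rng = np.random.default_rng(1)
latent = rng.standard_normal((300, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 1.0   # items 0-4 load on the first latent dimension
loadings[1, 5:] = 1.0   # items 5-9 load on the second latent dimension
items = latent @ loadings + 0.5 * rng.standard_normal((300, 10))

n_sig = n_significant_pcs(items)
print("significant PCs:", n_sig)
```

      Given fixed data, the PCA step itself is deterministic; only the null distribution depends on the random seed.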

      Additionally, the Reviewer raises an important point, and we agree that orthogonal versus oblique solutions warrant further investigation especially with regards to other psychiatric spectra and/or other stages in disease progression. For example, oblique components may better capture dimensions of behavioral variation in prodromal individuals, as these individuals are in the early stages of exhibiting psychosis-relevant symptoms and may show early diverging of dimensions of behavioral variation. We elaborate on this further in the Discussion:

      Another important aspect that will require further characterization is the possibility of oblique axes in the symptom-neural geometry. While orthogonal axes derived via PCA were appropriate here and similar to the ICA-derived axes in this solution, it is possible that oblique dimensions more clearly reflect the geometry of other psychiatric spectra and/or other stages in disease progression. For example, oblique components may better capture dimensions of neuro-behavioral variation in a sample of prodromal individuals, as these patients are exhibiting early-stage psychosis-like symptoms and may show signs of diverging along different trajectories.

      Critically, these factors should constitute key extensions of an iteratively more robust model for individualized symptom-neural mapping across the PSD and other psychiatric spectra. Relatedly, it will be important to identify the ‘limits’ of a given BBS solution – namely a PSD-derived effect may not generalize into the mood spectrum (i.e. both the symptom space and the resulting symptom-neural mapping are orthogonal). It will be important to evaluate if this framework can be used to initialize symptom-neural mapping across other mental health symptom spectra, such as mood/anxiety disorders.

      • The gene expression mapping section lacks some justification for why the 7 genes of interest were specifically chosen from among the numerous serotonin and GABA receptors and interneuron markers (relevant for PSD) available in the AHBA. Brief reference to the believed significance of the chosen genes in psychosis pathology would have helped to contextualize the observed relationship with the neuro-behavioural map.

      We thank the Reviewer for providing this suggestion and agree that it will strengthen the section on gene expression analysis. Of note, we did justify the choice for these genes, but we appreciate the opportunity to expand on the neurobiology of selected genes and their relevance to PSD. We have made these edits to the text:

      We focus here on serotonin receptor subunits (HTR1E, HTR2C, HTR2A), GABA receptor subunits (GABRA1, GABRA5), and the interneuron markers somatostatin (SST) and parvalbumin (PVALB). Serotonin agonists such as LSD have been shown to induce PSD-like symptoms in healthy adults (9) and the serotonin antagonism of “second-generation” antipsychotics is thought to contribute to their efficacy in targeting broad PSD symptoms (10–12). Abnormalities in GABAergic interneurons, which provide inhibitory control in neural circuits, may contribute to cognitive deficits in PSD (13–15) and additionally lead to downstream excitatory dysfunction that underlies other PSD symptoms (16, 17). In particular, a loss of prefrontal parvalbumin-expressing fast-spiking interneurons has been implicated in PSD (18–21).

      • What the identified univariate neuro-behavioural mapping for PC3 ("psychosis configuration") actually means from an empirical or brain network perspective is not really ever discussed in detail. E.g., in Results, "a high positive PC3 score was associated with both reduced GBC across insular and superior dorsal cingulate cortices, thalamus, and anterior cerebellum and elevated GBC across precuneus, medial prefrontal, inferior parietal, superior temporal cortices and posterior lateral cerebellum." While the meaning and calculation of GBC can be gleaned from the Methods, a direct interpretation of the neuro-behavioural results in terms of the types of symptoms contributing to PC3 and relative hyper-/hypo-connectivity of the DMN compared to e.g. healthy controls could facilitate easier comparisons with the findings of past studies (since GBC does not seem to be a very commonly-used measure in the psychosis fMRI literature). Also important since GBC is a summary measure of the average connectivity of a region, and doesn't provide any specificity in terms of which regions in particular are more or less connected within a functional network (an inherent limitation of this measure which warrants further attention).

      We acknowledge that GBC is a linear combination measure that by definition does not provide information on connectivity between any one specific pair of neural regions. However, as shown by highly robust and reproducible neurobehavioral maps, GBC seems to be suitable as a first-pass metric in the absence of a priori assumptions of how specific regional connectivity may map to the PC symptom dimensions, and it has been shown to be sensitive to altered patterns of overall neural connectivity in PSD cohorts (22–25) as well as in models of psychosis (9, 26). Moreover, it is an assumption-free method for dimensionality reduction of the neural connectivity matrix (which is a massive feature space). Furthermore, GBC provides neural maps (where each region can be represented by a value, in contrast to full functional connectivity matrices), which were necessary for quantifying the relationship with independent molecular benchmark maps (i.e. pharmacological maps and gene expression maps). We do acknowledge that there are limitations to the method, which we now discuss in the paper. Furthermore, we agree with the Reviewer that the specific regions implicated in these symptom-neural relationships warrant a more detailed investigation and we plan to develop this further in future studies, such as with seed-based functional connectivity using regions implicated in PSD (e.g. thalamus (2, 27)) or restricted GBC (22), which can summarize connectivity information for a specific network or subset of neural regions. We have provided elaboration and clarification regarding this point in the Discussion:

      Another improvement would be to optimize neural data reduction sensitivity for specific symptom variation (28). We chose to use GBC for our initial geometry characterizations as it is a principled and assumption-free data-reduction metric that captures (dys)connectivity across the whole brain and generates neural maps (where each region can be represented by a value, in contrast to full functional connectivity matrices) that are necessary for benchmarking against molecular imaging maps. However, GBC is a summary measure that by definition does not provide information regarding connectivity between specific pairs of neural regions, which may prove to be highly symptom-relevant and informative. Thus symptom-neural relationships should be further explored with higher-resolution metrics, such as restricted GBC (22) which can summarize connectivity information for a specific network or subset of neural regions, or seed-based FC using regions implicated in PSD (e.g. thalamus (2, 27)).
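
      To make the summary nature of GBC concrete, the following is a minimal sketch under stated assumptions, not the pipeline used in the manuscript. It computes one value per region as the mean of that region's Fisher z-transformed correlations with every other region. The Fisher transform, the clipping constant, the function name `global_brain_connectivity`, and the toy time series are assumptions for the example.

```python
import numpy as np


def global_brain_connectivity(timeseries):
    """Return one GBC value per region.

    timeseries has shape (n_timepoints, n_regions). Each region's GBC is
    the mean Fisher z-transformed correlation with all other regions.
    """
    fc = np.corrcoef(timeseries, rowvar=False)   # region-by-region matrix
    np.fill_diagonal(fc, 0.0)                    # ignore self-connections
    fc = np.clip(fc, -0.999999, 0.999999)        # keep arctanh finite
    z = np.arctanh(fc)
    n_regions = z.shape[0]
    return z.sum(axis=1) / (n_regions - 1)


# Toy data: 200 timepoints, 6 regions; regions 0-2 share a common signal.
rng = np.random.default_rng(0)
shared = rng.standard_normal(200)
ts = rng.standard_normal((200, 6))
ts[:, :3] += 2.0 * shared[:, None]

gbc = global_brain_connectivity(ts)
print(np.round(gbc, 2))
```

      The output is a single vector (a map) rather than a matrix, which is what allows comparison with other region-wise maps, but it no longer says which specific pairs of regions drive a region's value.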

      • Possibly a nitpick, but while the inclusion of cognitive measures for PSD individuals is a main (self-)selling point of the paper, there's very limited focus on the "Cognitive functioning" component (PC2) of the PCA solution. Examining Fig. S8K, the GBC map for this cognitive component seems almost to be the inverse for that of the "Psychosis configuration" component (PC3) focused on in the rest of the paper. Since PC3 does not seem to have high loadings from any of the cognitive items, but it is known that psychosis spectrum individuals tend to exhibit cognitive deficits which also have strong predictive power for illness trajectory, some discussion of how multiple univariate neuro-behavioural features could feasibly be used in conjunction with one another could have been really interesting.

      This is an important piece of feedback concerning the cognitive measure aspect of the study. As the Reviewer recognizes, cognition is a core element of PSD symptoms and the key reason for including this symptom into the model. Notably, the finding that one dimension captures a substantial proportion of cognitive performance-related variance, independent of other residual symptom axes, has not previously been reported and we fully agree that expanding on this effect is important and warrants further discussion. We would like to take two of the key points from the Reviewers’ feedback and expand further. First, we recognize that upon qualitative inspection PC2 and PC3 neural maps appear strongly anti-correlated. However, as demonstrated in Fig. S9O, PC2 and PC3 maps were anti-correlated at r=-0.47. For comparison, the PC2 map was highly anti-correlated with the BACS composite cognitive map (r=-0.81). This implies that the PC2 map in fact reflects unique neural circuit variance that is relevant for cognition, but not necessarily an inverse of the PC3.

      In other words, these data suggest that there are PSD patients with more (or less) severe cognitive deficits independent of any other symptom axis, which would be in line with the observation that these symptoms are not treatable with antipsychotic medication (and therefore should not correlate with symptoms that are treatable by such medications; i.e. PC3). We have now added these points into the revised paper:

      Results: Fig. 1E highlights loading configurations of symptom measures forming each PC. To aid interpretation, we assigned a name for each PC based on its most strongly weighted symptom measures. This naming is qualitative but informed by the pattern of loadings of the original 36 symptom measures (Fig. 1). For example, PC1 was highly consistent with a general impairment dimension (i.e. “Global Functioning”); PC2 reflected more exclusively variation in cognition (i.e. “Cognitive Functioning”); PC3 indexed a complex configuration of psychosis-spectrum relevant items (i.e. “Psychosis Configuration”); PC4 generally captured variation in mood and anxiety related items (i.e. “Affective Valence”); finally, PC5 reflected variation in arousal and level of excitement (i.e. “Agitation/Excitation”). For instance, a generally impaired patient would have a highly negative PC1 score, which would reflect low performance on cognition and elevated scores on most other symptomatic items. Conversely, an individual with a high positive PC3 score would exhibit delusional, grandiose, and/or hallucinatory behavior, whereas a person with a negative PC3 score would exhibit motor retardation, social avoidance, possibly a withdrawn affective state with blunted affect (29). Comprehensive loadings for all 5 PCs are shown in Fig. 3G. Fig. 1F highlights the mean of each of the 3 diagnostic groups (colored spheres) and healthy controls (black sphere) projected into a 3-dimensional orthogonal coordinate system for PCs 1,2 & 3 (x,y,z axes respectively; alternative views of the 3-dimensional coordinate system with all patients projected are shown in Fig. 3). Critically, PC axes were not parallel with traditional aggregate symptom scales. For instance, PC3 is angled at 45° to the dominant direction of PANSS Positive and Negative symptom variation (purple and blue arrows respectively in Fig. 1F). ...
      Because PC3 loads most strongly on to hallmark symptoms of PSD (including strong positive loadings across Positive symptom measures in the PANSS and strong negative loadings onto most Negative measures), we focus on this PC as an opportunity to quantify an innovative, fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional symptom axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. We found that the PC3 result provided a powerful empirical demonstration of how using a data-driven dimensionality-reduced solution (via PCA) can reveal novel patterns intrinsic to the structure of PSD psychopathology.

      • Another nitpick, but the Y axes of Fig. 8C-E are not consistent, which causes some of the lines of best fit to be a bit misleading (e.g. GABRA1 appears to have a more strongly positive gene-PC relationship than HTR1E, when in reality the opposite is true.)

      We had scaled each axis to best show the data in each plot, but we see how this is confusing and recognise the need to correct it. We have remade the plots with consistent axis labelling.

      • The authors explain the apparent low reproducibility of their multivariate PSD neuro-behavioural solution using the argument that many psychiatric neuroimaging datasets are too small for multivariate analyses to be sufficiently powered. Applying an existing multivariate power analysis to their own data as empirical support for this idea would have made it even more compelling. The following paper suggests guidelines for sample sizes required for CCA/PLS as well as a multivariate calculator: Helmer, M., Warrington, S. D., Mohammadi-Nejad, A.-R., Ji, J. L., Howell, A., Rosand, B., Anticevic, A., Sotiropoulos, S. N., & Murray, J. D. (2020). On stability of Canonical Correlation Analysis and Partial Least Squares with application to brain-behavior associations (p. 2020.08.25.265546). https://doi.org/10.1101/2020.08.25.265546

      We deeply appreciate the Reviewer’s suggestion and the opportunity to incorporate the methods from the Helmer et al. paper. We now highlight the importance of having sufficiently powered samples for multivariate analyses in our other manuscript first-authored by our colleague Dr. Markus Helmer (3). Using the method described in the above paper (GEMMR version 0.1.2), we computed the estimated sample sizes required to power multivariate CCA analyses with 718 neural features and 5 behavioral (PC) features (i.e. the feature set used throughout the rest of the paper):

      As argued in Helmer et al., rtrue is likely below 0.3 in many cases, thus the estimated sample size of 33k is likely a lower bound for the required sample size for sufficiently-powered CCA analyses using the 718+5 features leveraged throughout the univariate analyses in the present manuscript. This number is two orders of magnitude greater than our available sample (and at least one order of magnitude greater than any single existing clinical dataset). Even if rtrue is 0.5, a sample size of ∼10k would likely be required. We also computed the estimated sample sizes required for 180 neural features (symmetrized neural cortical parcels) and 5 symptom PC features, consistent with the CCA reported in our main text:

      Assuming that rtrue is likely below 0.3, this minimal required sample size remains at least an order of magnitude greater than the size of our present sample, consistent with the finding that the CCA solution computed using these data was unstable. As a lower limit for the required sample size plausible using the feature sets reported in our paper, we additionally computed for comparison the estimated N needed with the smallest number of features explored in our analyses, i.e. 12 neural functional network features and 5 symptom PC features:

      These required sample sizes are closer to the N=436 used in the present sample and samples reported in the clinical neuroimaging literature. This is consistent with the observation that when using 12 neural and 5 symptom features (Fig. S15C) the detected canonical correlation r = 0.38 for CV1 is much lower (and likely not inflated due to overfitting) and may be closer to the true effect because with the n=436 this effect is resolvable. This is in contrast to the 180 neural features and 5 symptom feature CCA solution where we observed a null CCA effect around r > 0.6 across all 5 CVs. This clearly highlights the inflation of the effect in the situation where the feature space grows. There is no a priori plausible reason to believe that the effect for 180 vs. 5 feature mapping is literally double the effect when using 12 vs. 5 feature mapping - especially as the 12 features are networks derived from the 180 parcels (i.e. the effect should be comparable rather than 2x smaller). Consequently, if the true CCA effect with 180 vs. 5 features was actually in the more comparable r = 0.38, we would need >5,000 subjects to resolve a reproducible neuro-behavioral CCA map (an order of magnitude more than in the BSNIP sample). Moreover, to confidently detect effects if rtrue is actually less than 0.3, we would require a sample size >8,145 subjects. We have added this to the Results section on our CCA results:

      Next, we tested if the 180-parcel CCA solution is stable and reproducible, as done with PC-to-GBC univariate results. The CCA solution was robust when tested with k-fold and leave-site-out cross-validation (Fig. S16) likely because these methods use CCA loadings derived from the full sample. However, the CCA loadings did not replicate in non-overlapping split-half samples (Fig. 5L, see Supplementary Note 4). Moreover, a leave-one-subject-out cross-validation revealed that removing a single subject from the sample affected the CCA solution such that it did not generalize to the left-out subject (Fig. 5M). This is in contrast to the PCA-to-GBC univariate mapping, which was substantially more reproducible for all attempted cross-validations relative to the CCA approach. This is likely because substantially more power is needed to resolve a stable multivariate neuro-behavioral effect with this many features. Indeed, a multivariate power analysis using 180 neural features and 5 symptom features, and assuming a true canonical correlation of r = 0.3, suggests that a minimal sample size of N = 8145 is needed to sufficiently detect the effect (3), an order of magnitude greater than the available sample size. Therefore, we leverage the univariate neuro-behavioral result for subsequent subject-specific model optimization and comparisons to molecular neuroimaging maps.
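
      The leave-one-subject-out logic can be sketched as follows. This is an illustration under assumptions rather than the code used for Fig. 5M. It fits CCA on all subjects but one, projects the held-out subject with the fitted weights, and correlates the held-out neural and symptom scores across subjects. For brevity it tracks only the first canonical variate, uses a small noise-only toy dataset (150 subjects, 60 vs. 5 features) instead of the study data, and the function names are hypothetical.

```python
import numpy as np


def fit_cca_first_pair(X, Y):
    """Fit CCA via QR/SVD; return means, first-pair weights, in-sample r."""
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(X - mx)
    Qy, Ry = np.linalg.qr(Y - my)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    a = np.linalg.solve(Rx, U[:, 0])   # weights for the X block
    b = np.linalg.solve(Ry, Vt[0])     # weights for the Y block
    return mx, my, a, b, float(s[0])


def leave_one_out_cca(X, Y):
    """Correlation between held-out X and Y scores across subjects."""
    n = X.shape[0]
    held_x = np.empty(n)
    held_y = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        mx, my, a, b, _ = fit_cca_first_pair(X[keep], Y[keep])
        # Project the held-out subject with weights it did not influence.
        # A sign flip between folds affects a and b together.
        held_x[i] = (X[i] - mx) @ a
        held_y[i] = (Y[i] - my) @ b
    return float(np.corrcoef(held_x, held_y)[0, 1])


# Noise-only toy data: there is no true neural-symptom relationship.
rng = np.random.default_rng(0)
neural = rng.standard_normal((150, 60))
symptoms = rng.standard_normal((150, 5))

r_in_sample = fit_cca_first_pair(neural, symptoms)[4]
r_held_out = leave_one_out_cca(neural, symptoms)
print(f"in-sample r = {r_in_sample:.2f}, held-out r = {r_held_out:.2f}")
```

      A large gap between the in-sample and held-out values is the signature of an over-fit solution that does not generalize to the left-out subject.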

      Additionally, we added the following to Supplementary Note 4: Establishing the Reproducibility of the CCA Solution:

      Here we outline the details of the split-half replication for the CCA solution. Specifically, the full patient sample was randomly split (referred to as “H1” and “H2” respectively), while preserving the proportion of patients in each diagnostic group. Then, CCA was performed independently for H1 and H2. While the loadings for behavioral PCs and original behavioral items are somewhat similar (mean r ≈ 0.5) between the two CCAs in each run, the neural loadings were not stable across H1 and H2 CCA solutions. Critically, CCA results did not perform well for leave-one-subject-out cross-validation (Fig. 5M). Here, one patient was held out while CCA was performed using all data from the remaining 435 patients. The loadings matrices Ψ and Θ from the CCA were then used to calculate the “predicted” neural and behavioral latent scores for all 5 CVs for the patient that was held out of the CCA solution. This process was repeated for every patient and the final result was evaluated for reproducibility. As described in the main text, this did not yield reproducible CCA effects (Fig. 5M). Of note, CCA may yield higher reproducibility if the neural feature space were to be further reduced. As noted, our approach was to first parcellate the BOLD signal and then use GBC as a data-driven method to yield a neuro-biologically and quantitatively interpretable neural data reduction, and we additionally symmetrized the result across hemispheres. Nevertheless, in sharp contrast to the PCA univariate feature selection approach, the CCA solutions were still not stable in the present sample size of N = 436. Indeed, a multivariate power analysis (3) estimates that the following sample sizes will be required to sufficiently power a CCA between 180 neural features and 5 symptom features, at different levels of true canonical correlation (rtrue):

      To test if further neural feature space reduction may improve reproducibility, we also evaluated CCA solutions with neural GBC parcellated according to 12 brain-wide functional networks derived from the recent HCP driven network parcellation (30). Again, we computed the CCA for all 36 item-level symptoms as well as 5 PCs (Fig. S15). As with the parcel-level effects, the network-level CCA analysis produced significant results (for CV1 when using 36 item-level scores and for all 5 CVs when using the 5 PC-derived scores). Here the result produced much lower canonical correlations (approximately 0.3-0.5); however, these effects (for CV1) clearly exceeded the 95% confidence interval generated via random permutations, suggesting that they may reflect the true canonical correlation. We observed a similar result when we evaluated CCAs computed with neural GBC from 192 symmetrized subcortical parcels and 36 symptoms or 5 PCs (Fig. S14). In other words, data-reducing the neural signal to 12 functional networks likely averaged out parcel-level information that may carry symptom-relevant variance, but may be closer to capturing the true effect. Indeed, the power analysis suggests that the current sample size is closer to that needed to detect an effect with 12 + 5 features:

      Note that we do not present a CCA conducted with parcels across the whole brain, as the number of variables would exceed the number of observations. However, the multivariate power analysis using 718 neural features and 5 symptom features estimates that the following sample sizes would be required to detect the following effects:

      This analysis suggests that even the lowest bound of 10k samples exceeds the present available sample size by two orders of magnitude.

      We have also added Fig. S19, illustrating these power analyses results:

      Fig. S19. Multivariate power analysis for CCA. Sample sizes were calculated according to (3), see also https://gemmr.readthedocs.io/en/latest/. We computed the multivariate power analyses for three versions of CCA reported in this manuscript: i) 718 neural vs. 5 symptom features; ii) 180 neural vs. 5 symptom features; iii) 12 neural vs. 5 symptom features. (A) At different levels of features, the ratio of samples (i.e. subjects) required per feature to derive a stable CCA solution remains approximately the same across all values of rtrue. As discussed in (3), at rtrue = 0.3 the number of samples required per feature is about 40, which is much greater than the ratio of samples to features available in our dataset. (B) The total number of samples required (nreq) for a stable CCA solution given the total number of neural and symptom features used in our analyses, at different values of rtrue. In general these required sample sizes are much greater than the N=436 (light grey line) PSD in our present dataset, consistent with the finding that the CCA solutions computed using our data were unstable. Notably, the ‘12 vs. 5’ CCA assuming rtrue = 0.3 requires only 700 subjects, which is closest to the N=436 (horizontal grey line) used in the present sample. This may be in line with the observation of the CCA with 12 neural vs 5 symptom features (Fig. S15C) that the canonical correlation (r = 0.38 for CV1) clearly exceeds the 95% confidence interval, and may be closer to the true effect. However, to confidently detect effects in such an analysis (particularly if rtrue is actually less than 0.3), a larger sample would likely still be needed.
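The samples-per-feature comparison in this caption can be checked with simple arithmetic. The sketch below only re-uses the numbers quoted in this response (N = 8145 for 180 vs. 5 features and N = 700 for 12 vs. 5 features, both at rtrue = 0.3, and the available N = 436); it does not call gemmr, and the function and variable names are illustrative.

```python
# Values quoted in the response; nothing here is computed by gemmr.
AVAILABLE_N = 436

# Required sample sizes at rtrue = 0.3, as stated in the text.
SCENARIOS = {
    "180 vs. 5": {"n_features": 180 + 5, "n_required": 8145},
    "12 vs. 5": {"n_features": 12 + 5, "n_required": 700},
}


def samples_per_feature(n_samples, n_features):
    """Return how many samples there are per feature."""
    return n_samples / n_features


for name, scenario in SCENARIOS.items():
    required = samples_per_feature(scenario["n_required"], scenario["n_features"])
    available = samples_per_feature(AVAILABLE_N, scenario["n_features"])
    print(f"{name}: required {required:.1f} per feature, "
          f"available {available:.1f} per feature")
```

Both required ratios come out slightly above 40 samples per feature, whereas the 180 vs. 5 analysis has only a little over 2 samples per feature available.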

      We also added the corresponding methods in the Methods section:

      Multivariate CCA Power Analysis. Multivariate power analyses to estimate the minimum sample size needed to sufficiently power a CCA were computed using methods described in (3), using the Generative Modeling of Multivariate Relationships tool (gemmr, https://github.com/murraylab/gemmr (v0.1.2)). Briefly, a model was built by: 1) Generating synthetic datasets for the two input data matrices, by sampling from a multivariate normal distribution with a joint covariance matrix that was structured to encode CCA solutions with specified properties; 2) Performing CCAs on these synthetic datasets. Because the joint covariance matrix is known, the true values of estimated association strength, weights, scores, and loadings of the CCA, as well as the errors for these four metrics, can also be computed. In addition, statistical power that the estimated association strength is different from 0 is determined through permutation testing; 3) Varying parameters of the generative model (number of features, assumed true between-set correlation, within-set variance structure for both datasets) and determining, in each case, the required sample size Nreq such that statistical power reaches 90% and all of the above-described error metrics fall to a target level of 10%; and 4) Fitting and validating a linear model to predict the required sample size Nreq from parameters of the generative model. This linear model was then used to calculate Nreq for CCA in three data scenarios: i) 718 neural vs. 5 symptom features; ii) 180 neural vs. 5 symptom features; iii) 12 neural vs. 5 symptom features.

      • Given the relatively even distribution of males and females in the dataset, some examination of sex effects on symptom dimension loadings or neuro-behavioural maps would have been interesting (other demographic characteristics like age and SES are summarized for subjects but also not investigated). I think this is a missed opportunity.

      We have now provided additional analyses for the core PCA and univariate GBC mapping results, testing for effects of age, sex, and SES in Fig. S8. Briefly, we observed a significant positive relationship between age and PC3 scores, which may be because older patients (who presumably have been ill for a longer time) exhibit more severe symptoms along the positive PC3 – Psychosis Configuration dimension. We also observed a significant negative relationship between Hollingshead index of SES and PC1 and PC2 scores. Lower PC1 and PC2 scores indicate poorer general functioning and cognitive performance respectively, which is consistent with higher Hollingshead indices (i.e. lower-skilled jobs or unemployment and fewer years of education). We also found significant sex differences in PC2 – Cognitive Functioning, PC4 – Affective Valence, and PC5 – Agitation/Excitement scores.

      Fig. S8. Effects of age, socio-economic status, and sex on symptom PCA solution. (A) Correlations between symptom PC scores and age (years) across N=436 PSD. Pearson’s correlation value and uncorrected p-values are reported above scatterplots. After Bonferroni correction, we observed a significant positive relationship between age and PC3 score. This may be because older patients have been ill for a longer period of time and exhibit more severe symptoms along the positive PC3 dimension. (B) Correlations between symptom PC scores and socio-economic status (SES) as measured by the Hollingshead Index of Social Position (31), across N=387 PSD with available data. The index is computed as (Hollingshead occupation score * 7) + (Hollingshead education score * 4); a higher score indicates lower SES (32). We observed a significant negative relationship between Hollingshead index and PC1 and PC2 scores. Lower PC1 and PC2 scores indicate poorer general functioning and cognitive performance respectively, which is consistent with higher Hollingshead indices (i.e. lower-skilled jobs or unemployment and fewer years of education). (C) The Hollingshead index can be split into five classes, with 1 being the highest and 5 being the lowest SES class (31). Consistent with (B) we found a significant difference between the classes after Bonferroni correction for PC1 and PC2 scores. (D) Distributions of PC scores across Hollingshead SES classes show the overlap in scores. White lines indicate the mean score in each class. (E) Differences in PC scores between (M)ale and (F)emale PSD subjects. We found a significant difference between sexes in PC2 – Cognitive Functioning, PC4 – Affective Valence, and PC5 – Agitation/Excitement scores. (F) Distributions of PC scores across M and F subjects show the overlap in scores. White lines indicate the mean score for each sex.
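The Hollingshead index formula given in panel (B) can be written as a one-line function. This is a minimal sketch of the stated formula only; the function name and the example input scores are illustrative and do not come from the dataset.

```python
def hollingshead_index(occupation_score, education_score):
    """Combine the two Hollingshead scores as described in the caption.

    index = (occupation score * 7) + (education score * 4);
    a higher index indicates lower socio-economic status.
    """
    return occupation_score * 7 + education_score * 4


# Illustrative inputs only (not taken from the study data).
example = hollingshead_index(occupation_score=3, education_score=2)
print(example)
```

Because both weights are positive, a higher occupation or education score always raises the index, which matches the caption's reading that higher values correspond to lower SES.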

      Bibliography

      1. Jie Lisa Ji, Caroline Diehl, Charles Schleifer, Carol A Tamminga, Matcheri S Keshavan, John A Sweeney, Brett A Clementz, S Kristian Hill, Godfrey Pearlson, Genevieve Yang, et al. Schizophrenia exhibits bi-directional brain-wide alterations in cortico-striato-cerebellar circuits. Cerebral Cortex, 29(11):4463–4487, 2019.
      2. Alan Anticevic, Michael W Cole, Grega Repovs, John D Murray, Margaret S Brumbaugh, Anderson M Winkler, Aleksandar Savic, John H Krystal, Godfrey D Pearlson, and David C Glahn. Characterizing thalamo-cortical disturbances in schizophrenia and bipolar illness. Cerebral cortex, 24(12):3116–3130, 2013.
      3. Markus Helmer, Shaun D Warrington, Ali-Reza Mohammadi-Nejad, Jie Lisa Ji, Amber Howell, Benjamin Rosand, Alan Anticevic, Stamatios N Sotiropoulos, and John D Murray. On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations. bioRxiv, 2020.
      4. Richard Dinga, Lianne Schmaal, Brenda WJH Penninx, Marie Jose van Tol, Dick J Veltman, Laura van Velzen, Maarten Mennes, Nic JA van der Wee, and Andre F Marquand. Evaluating the evidence for biotypes of depression: Methodological replication and extension of. NeuroImage: Clinical, 22:101796, 2019.
      5. Cedric Huchuan Xia, Zongming Ma, Rastko Ciric, Shi Gu, Richard F Betzel, Antonia N Kaczkurkin, Monica E Calkins, Philip A Cook, Angel Garcia de la Garza, Simon N Vandekar, et al. Linked dimensions of psychopathology and connectivity in functional brain networks. Nature communications, 9(1):3003, 2018.
      6. Andrew T Drysdale, Logan Grosenick, Jonathan Downar, Katharine Dunlop, Farrokh Mansouri, Yue Meng, Robert N Fetcho, Benjamin Zebley, Desmond J Oathes, Amit Etkin, et al. Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nature medicine, 23(1):28, 2017.
      7. Meichen Yu, Kristin A Linn, Russell T Shinohara, Desmond J Oathes, Philip A Cook, Romain Duprat, Tyler M Moore, Maria A Oquendo, Mary L Phillips, Melvin McInnis, et al. Childhood trauma history is linked to abnormal brain connectivity in major depression. Proceedings of the National Academy of Sciences, 116(17):8582–8590, 2019.
      8. David R Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural computation, 16(12):2639–2664, 2004.
      9. Katrin H Preller, Joshua B Burt, Jie Lisa Ji, Charles H Schleifer, Brendan D Adkinson, Philipp Stämpfli, Erich Seifritz, Grega Repovs, John H Krystal, John D Murray, et al. Changes in global and thalamic brain connectivity in LSD-induced altered states of consciousness are attributable to the 5-HT2A receptor. eLife, 7:e35082, 2018.
      10. Mark A Geyer and Franz X Vollenweider. Serotonin research: contributions to understanding psychoses. Trends in pharmacological sciences, 29(9):445–453, 2008.
      11. H Y Meltzer, B W Massey, and M Horiguchi. Serotonin receptors as targets for drugs useful to treat psychosis and cognitive impairment in schizophrenia. Current pharmaceutical biotechnology, 13(8):1572–1586, 2012.
      12. Anissa Abi-Dargham, Marc Laruelle, George K Aghajanian, Dennis Charney, and John Krystal. The role of serotonin in the pathophysiology and treatment of schizophrenia. The Journal of neuropsychiatry and clinical neurosciences, 9(1):1–17, 1997.
      13. Francine M Benes and Sabina Berretta. Gabaergic interneurons: implications for understanding schizophrenia and bipolar disorder. Neuropsychopharmacology, 25(1):1–27, 2001.
      14. Melis Inan, Timothy J Petros, and Stewart A Anderson. Losing your inhibition: Linking cortical gabaergic interneurons to schizophrenia. Neurobiology of Disease, 53:36–48, 2013. ISSN 0969-9961. What clinical findings can teach us about the neurobiology of schizophrenia?
      15. Samuel J Dienel and David A Lewis. Alterations in cortical interneurons and cognitive function in schizophrenia. Neurobiology of disease, 131:104208, 2019.
      16. John E Lisman, Joseph T Coyle, Robert W Green, Daniel C Javitt, Francine M Benes, Stephan Heckers, and Anthony A Grace. Circuit-based framework for understanding neurotransmitter and risk gene interactions in schizophrenia. Trends in neurosciences, 31(5):234–242, 2008.
      17. Anthony A Grace. Dysregulation of the dopamine system in the pathophysiology of schizophrenia and depression. Nature Reviews Neuroscience, 17(8):524, 2016.
      18. John F Enwright III, Zhiguang Huo, Dominique Arion, John P Corradi, George Tseng, and David A Lewis. Transcriptome alterations of prefrontal cortical parvalbumin neurons in schizophrenia. Molecular psychiatry, 23(7):1606–1613, 2018.
      19. Daniel J Lodge, Margarita M Behrens, and Anthony A Grace. A loss of parvalbumin-containing interneurons is associated with diminished oscillatory activity in an animal model of schizophrenia. Journal of Neuroscience, 29(8):2344–2354, 2009.
      20. Clare L Beasley and Gavin P Reynolds. Parvalbumin-immunoreactive neurons are reduced in the prefrontal cortex of schizophrenics. Schizophrenia research, 24(3):349–355, 1997.
      21. David A Lewis, Allison A Curley, Jill R Glausier, and David W Volk. Cortical parvalbumin interneurons and cognitive dysfunction in schizophrenia. Trends in neurosciences, 35(1):57–67, 2012.
      22. Alan Anticevic, Margaret S Brumbaugh, Anderson M Winkler, Lauren E Lombardo, Jennifer Barrett, Phillip R Corlett, Hedy Kober, June Gruber, Grega Repovs, Michael W Cole, et al. Global prefrontal and fronto-amygdala dysconnectivity in bipolar i disorder with psychosis history. Biological psychiatry, 73(6):565–573, 2013.
      23. Alex Fornito, Jong Yoon, Andrew Zalesky, Edward T Bullmore, and Cameron S Carter. General and specific functional connectivity disturbances in first-episode schizophrenia during cognitive control performance. Biological psychiatry, 70(1):64–72, 2011.
      24. Avital Hahamy, Vince Calhoun, Godfrey Pearlson, Michal Harel, Nachum Stern, Fanny Attar, Rafael Malach, and Roy Salomon. Save the global: global signal connectivity as a tool for studying clinical populations with functional magnetic resonance imaging. Brain connectivity, 4(6):395–403, 2014.
      25. Michael W Cole, Alan Anticevic, Grega Repovs, and Deanna Barch. Variable global dysconnectivity and individual differences in schizophrenia. Biological psychiatry, 70(1):43–50, 2011.
      26. Naomi R Driesen, Gregory McCarthy, Zubin Bhagwagar, Michael Bloch, Vincent Calhoun, Deepak C D’Souza, Ralitza Gueorguieva, George He, Ramani Ramachandran, Raymond F Suckow, et al. Relationship of resting brain hyperconnectivity and schizophrenia-like symptoms produced by the nmda receptor antagonist ketamine in humans. Molecular psychiatry, 18(11):1199–1204, 2013.
      27. Neil D Woodward, Baxter Rogers, and Stephan Heckers. Functional resting-state networks are differentially affected in schizophrenia. Schizophrenia research, 130(1-3):86–93, 2011.
      28. Zarrar Shehzad, Clare Kelly, Philip T Reiss, R Cameron Craddock, John W Emerson, Katie McMahon, David A Copland, F Xavier Castellanos, and Michael P Milham. A multivariate distance-based analytic framework for connectome-wide association studies. Neuroimage, 93 Pt 1:74–94, Jun 2014.
      29. Alan J Gelenberg. The catatonic syndrome. The Lancet, 307(7973):1339–1341, 1976.
      30. Jie Lisa Ji, Marjolein Spronk, Kaustubh Kulkarni, Grega Repovš, Alan Anticevic, and Michael W Cole. Mapping the human brain’s cortical-subcortical functional network organization. NeuroImage, 185:35–57, 2019.
      31. August B Hollingshead et al. Four factor index of social status. 1975.
      32. Jaya L Padmanabhan, Neeraj Tandon, Chiara S Haller, Ian T Mathew, Shaun M Eack, Brett A Clementz, Godfrey D Pearlson, John A Sweeney, Carol A Tamminga, and Matcheri S Keshavan. Correlations between brain structure and symptom dimensions of psychosis in schizophrenia, schizoaffective, and psychotic bipolar i disorders. Schizophrenia bulletin, 41(1):154–162, 2015.
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewing Editor Comments:

      (A) Revisions related to the first part, regarding data mining and curation:

      (1) One question that arises with the part of the manuscript that discusses the identification and classification of ion channels is whether these will be made available to the wider public. For the 419 human sequences, making a small database to share this result so that these sequences can be easily searched and downloaded would be desirable. There are a variety of acceptable formats for this: GitHub/figshare/zenodo/university website that allows a wider community to access their hard work. Providing such a resource would greatly expand the impact of this paper. The same question can be asked of the 48,000+ ion channels from diverse organisms.

      We thank the reviewer for providing this important feedback. While the long term plan is to provide access to these sequences and annotations through a knowledge base resource like Pharos, we agree with the comments that it would be beneficial to have these sequences made available with the manuscript as well. We have compiled 3 fasta files containing the following: 1) Full length sequences for the curated 419 ion channel sequences. 2) Pore containing domain sequences for the 343 pore domain containing human ion channel sequences. 3) All the identified orthologs for the human ion channels.

      For each sequence in these files, we have extended the ID line to include the most pertinent annotation information to make it readily available. For example, the ID line >sp|P48995|TRPC1_HUMAN|TRP:VGIC--TRP-TRPC|pore-forming|dom:387-637 provides the classification, unit, and domain bounds for human TRPC1 in the FASTA file itself.
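A reader who wants to use these annotations programmatically could split the extended ID line on the pipe character. The sketch below is based only on the single example header shown above; the field names, and the assumption that the domain field may be absent for some entries, are illustrative and not a documented specification of the Zenodo files.

```python
from typing import NamedTuple, Optional, Tuple


class ChannelHeader(NamedTuple):
    database: str
    accession: str
    entry_name: str
    classification: str
    unit: str
    domain: Optional[Tuple[int, int]]  # (start, end), if a "dom:" field exists


def parse_header(line: str) -> ChannelHeader:
    """Parse an extended FASTA ID line of the form shown in the example."""
    fields = line.strip().lstrip(">").split("|")
    if len(fields) < 5:
        raise ValueError(f"unexpected header layout: {line!r}")
    database, accession, entry_name, classification, unit = fields[:5]
    domain = None
    for extra in fields[5:]:
        # Assumed convention: domain bounds are encoded as "dom:START-END".
        if extra.startswith("dom:"):
            start, end = extra[len("dom:"):].split("-")
            domain = (int(start), int(end))
    return ChannelHeader(database, accession, entry_name,
                         classification, unit, domain)


header = parse_header(
    ">sp|P48995|TRPC1_HUMAN|TRP:VGIC--TRP-TRPC|pore-forming|dom:387-637"
)
print(header.accession, header.unit, header.domain)
```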

      These files have been uploaded to Zenodo and are available for download with doi 10.5281/zenodo.16232527. We have included this in the Data Availability statement of the manuscript as well.

      (2) Regarding the 48,000+ sequences, what checks have been done to confirm that they all represent bona fide, full-length ion channel sequences? Uniprot contains a good deal of unreviewed sequences, especially from single-celled organisms. The process by which true orthologues were identified and extraneous hits discarded should be discussed in more detail, and all inclusion criteria should be described and justified, clearly illustrating that the risk of gene duplicates and fragments in this final set of ion channel orthologues has been avoided. Related to this, does this analysis include or exclude isoforms?

      We thank the reviewer for raising this important point. Our selection of curated proteomes and the KinOrtho pipeline for orthology detection returns, to an extent, reliable orthologous sequence sets. In brief, our database sequences are retrieved from full proteomes that only include proteins that are part of an official proteome release. Thus, they are mapped from a reference genome to ensure species-specific relevance and avoid redundancy. The >1500 proteomes in this analysis were selected based on their wider use in other orthology detection pipelines like OMA and InParanoid. Our orthology detection pipeline, KinOrtho, performs a full-length and a domain-based orthology detection, which ensures that the orthologous relationships are being defined based on the pore-domain sequence similarity.

      But we agree with the reviewer that this might leave room for extraneous, fragmentary, or misannotated sequences to be included in our results. Taking this into careful consideration, we have expanded our sequence validation pipeline to include additional checks such as checking the UniProt entry type, protein existence evidence, and sequence-level checks such as evaluating the compositional bias, non-standard codons, and sequence lengths. These validation steps are now described in detail in the Methods section under orthology analysis (lines 768-808). All the originally listed orthologous sequences passed this validation pipeline and thus provide additional confidence that they are bona fide full-length ion channel sequences.
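To make the sequence-level checks concrete, here is a minimal sketch of what such a filter might look like. It is not the authors' pipeline: the thresholds are hypothetical, compositional bias is approximated by the fraction of the single most frequent residue, and the "non-standard" check is interpreted as characters outside the 20 standard amino acids.

```python
from collections import Counter

STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(seq, min_length=100, max_fraction_single_residue=0.5):
    """Return a list of problems found; an empty list means all checks passed.

    min_length and max_fraction_single_residue are hypothetical thresholds
    chosen for illustration only.
    """
    seq = seq.upper()
    problems = []
    if len(seq) < min_length:
        problems.append("too_short")
    if set(seq) - STANDARD_RESIDUES:
        problems.append("non_standard_residue")
    if seq:
        most_common_count = Counter(seq).most_common(1)[0][1]
        if most_common_count / len(seq) > max_fraction_single_residue:
            problems.append("compositional_bias")
    return problems


print(validate_sequence("ACDEFGHIKLMNPQRSTVWY" * 10))
print(validate_sequence("AAAAXA", min_length=5))
```

Returning a list of named problems, rather than a single pass/fail flag, makes it easier to report why a given ortholog would be excluded.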

      We have also expanded this section (lines 758 – 766) to provide more details of the KinOrtho pipeline for orthology detection, which is a previously published method used for orthology detection in kinases by our lab.

      Finally, our orthology analysis excludes isoforms and only spans the primary canonical sequences that are part of the UniProt Proteomes annotated sequence set. The isoforms that are generally available in UniProt Proteomes in a separate file named *_additional.fasta were not included in this analysis.

      (3) The decision to show the families of ion channels in Figure 1 as pie charts within a UMAP embedding is intriguing but somewhat non-intuitive and difficult to understand. Illustrating these results with a standard tree-like visualization of the relationship of these channels to each other would be preferred.

      We appreciate the feedback provided by the reviewer, and understand that a standard tree-like visualization would be much easier to interpret and familiar than a bubble chart based on UMAP embeddings. However, we opted to use the bubble chart for the following reasons:

      Low sequence similarity: the 419 human ICs share very minimal sequence similarity, falling in the twilight zone or lower (Doolittle, 1992; PMID: 1339026). Thus, traditional multiple sequence alignment and phylogenetic reconstruction methods perform very poorly and generate unreliable or even misleading results. To explore the practicality of this option, we performed a multiple sequence alignment of just 3 of the possibly related IC families suggested by reviewer 2 (CALHM, Pannexins, and Connexins) using the state-of-the-art structure-based sequence alignment method Foldmason (doi: https://doi.org/10.1101/2024.08.01.606130). Even then, the sequence alignment and the resulting tree for just these 3 families were poor and unreliable, as illustrated in the attached Author response Image 2.

      Protein embeddings based clustering: Novel LLM based approaches such as the protein language model embeddings offer ways to overcome these limitations by capturing sequence, structure, function and evolutionary properties in a high-dimensional space. Thus, we employed this model using DEDAL followed by UMAP for dimensionality reduction, which preserves biologically meaningful local and global relationships.

      Abstraction at family level: In Figure 1, we aggregate individual channels into family bubbles with their positions representing the average UMAP coordinates of their members. This offers a balance between an intuitive view of how IC families are distributed in the embedding space and reflects potential functional and evolutionary proximities, while not being impeded by individual IC relationships across families.
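The family-level aggregation described in the last point amounts to averaging the 2D coordinates of each family's members. The sketch below illustrates that step on made-up coordinates; the input layout, the function name, and the idea of also returning the member count (e.g. for bubble size) are assumptions for illustration.

```python
from collections import defaultdict


def family_centroids(points):
    """Average the (x, y) embedding coordinates of each family's members.

    points: iterable of (family, x, y) tuples.
    Returns {family: (mean_x, mean_y, n_members)}.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for family, x, y in points:
        acc = sums[family]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return {family: (sx / n, sy / n, n) for family, (sx, sy, n) in sums.items()}


# Made-up coordinates, purely for illustration.
toy_embedding = [
    ("TRP", 1.0, 2.0),
    ("TRP", 3.0, 4.0),
    ("CALHM", -1.0, 0.5),
]
print(family_centroids(toy_embedding))
```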

      We have revised the figure legend (lines 1221 – 1234) with additional description of the visualization and the process used to generate it, and the manuscript text (lines 248-270) provides the rationale behind the selection of this method.

      (4) A strength of this paper is the visualization of 'dark' ion channels. However, throughout the paper, this could be emphasized more as the key advantage of this approach and how this or similar approaches could be used for other families of proteins. Specifically, in the initial statement describing 'light' vs 'dark channels', the importance of this distinction and the historical preference in science to study that which has already been studied can be discussed more, even including references to other studies that take this kind of approach. An example of a relevant reference here is to the Structural Genomics Consortium and its goals to achieve structures of proteins for which functions may not be well-characterized. Clarifying these motivations throughout the entire paper would strengthen it considerably.

      We thank the reviewer for this constructive comment and agree that highlighting the strength of visualizing “dark” channels and prioritizing them for future studies would strengthen the paper. As suggested, we have revised the text throughout the paper (lines 84-89, 176-180) to contextualize and emphasize this distinction. We have also added a reference for the Structural Genomics Consortium, which, along with resources like IDG, has provided significant resources for prioritizing understudied proteins.

      (5) Since the authors have generated the UMAP visualization of the channome, it would be interesting to understand how the human vs orthologue gene sets compare in this space.

      We appreciate the reviewer’s input. It is an interesting idea to explore the UMAP embedding space for the human ICs along with their orthologs. The large number of orthologous sequences (>37,000) would certainly impose a computational challenge to generate embeddings-based pairwise alignments across all of them. Downstream dimensionality reduction from such a large set and the subsequent visualization would also suffer from accuracy and interpretability concerns. However, to follow up on the reviewer’s comments, we selected orthologous sequences from a subset of 12 model organisms spanning all taxa (such as mouse, zebrafish, fruit fly, C. elegans, A. thaliana, S. cerevisiae, E. coli, etc.). This increased the number of sequences for analysis to 1094 from 343, which is still manageable for UMAP. Using the exact same method, we generated the UMAP embeddings plot for this set as shown below.

      Author response image 1.

      UMAP embeddings of the human ICs alongside orthologs from 12 model organisms

      As shown above, we observed that each orthologous set forms tight, well-defined clusters, preserving local relationships among closely related sequences. For example, a large number of VGICs cluster more closely together compared to Supplementary Figure 1 (with only the human ICs). However, families that were previously distant from others now appear to be even more scattered or pushed further away, indicating a loss of global structure. This pattern suggests that while local distances are well preserved, the global topology of the embedding space could be compromised. Moreover, we find that the placement of ICs with respect to other families is highly sensitive to the parameter choices (e.g., n_neighbors and min_dist), an issue which we did not encounter when using only the human IC sequences. The inclusion of a large number of orthologous sequences that are highly similar to a single human IC but dissimilar to others skews the embedding space, emphasizing local structure at the expense of global relationships.

      Since UMAP and similar dimensionality reduction methods prioritize local over global structure, the resulting embeddings accurately reflect strong ortholog clustering but obscure broader interfamily relationships. Consequently, interpreting the spatial arrangement of human IC families with respect to one another becomes unreliable. We have made this plot available as part of this response, and anyone interested can access this in the response document.   

      (6) Figure 1 should say more clearly that this is an analysis of the human gene set and include more of the information in the text: 419 human ion channel sequences, 75 sequences previously unidentified, 4 major groups and 55 families, 62 outliers, etc. Clearer visualizations of these categories and numbers within the UMAP (and newly included tree) visualization would help guide the reader to better understand these results. Specifically, which are the 75 previously unidentified sequences?

      We thank the reviewer for the comments. To address this, we have revised Figure 1 and added more information, including a clear header that states that these are only human IC sets, numbers showing the total number of ICs, and the number of ICs in each group. We have further included new Supplementary Figure 2 and Supplementary Table 2, which show the overlap of IC sequences across the different resources. Supplementary Figure 2 is an upset plot that provides a snapshot of the overlap between curated human ICs in this study compared to KEGG, GtoP, and Pharos. Supplementary Table 2 provides more details on this overlap by listing, for each human IC, whether they are curated as an IC in the 3 IC annotation resources. We believe these additions should provide all the information, including the unidentified sequences we are adding to this resource.

      (7) Overall, the manuscript needs to provide a clearer description of the need for a better-curated sequence database of ion channels, as well as how existing resources fall short.

      We thank the reviewer for pointing out this important gap in the description. As suggested, we have revised the text thoroughly in the Introduction section to address this comment. Specifically, we have added sections to describe existing resources at sequence and structure levels that currently provide details and/or classification of human ion channels. Then, we highlight the facts that these resources are missing some characterized pore-containing ICs, do not include any information on auxiliary channels, and lack a holistic evolutionary perspective, which raises the need for a better-curated database of ion channels. Please refer to lines 57-63, 73-79, and 95 – 119 for these changes and additions.

      (8) Some of the analysis pipeline is unclear. Specifically, the RAG analysis seems critical, but it is unclear how this works - is it on top of the GPT framework and recursively inquires about the answer to prompts? Some example prompts would be useful to understand this.

      We thank the reviewer for highlighting this gap in explanation. We understand that the details provided in the Methods and Supplementary Figure 1 may not have sufficiently explained the pipeline, and are missing some important details. The RAG pipeline leverages vector-based retrieval integrated with OpenAI’s GPT-4o model to systematically search literature and generate evidence-based answers. The process is as follows:

      Literature sources (PubMed articles) relevant to the annotated ion channels were converted into vector representations stored in a Qdrant database.

      Queries constructed from the annotated IC dataset were submitted to the vector database, retrieving contextually relevant literature segments.

      Retrieved contexts served as inputs to the GPT-4o model, which produced structured JSON-formatted responses containing direct evidence regarding ion selectivity and gating mechanisms, along with associated confidence scores.
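The three steps above follow the usual retrieve-then-generate pattern, which can be sketched without any external service. The code below is a toy illustration only: it replaces the Qdrant database with an in-memory list, uses hand-written vectors instead of real embeddings, makes no call to GPT-4o, and the prompt wording and JSON keys are hypothetical rather than the authors' template.

```python
import json
import math

REQUIRED_KEYS = {"ion_selectivity", "gating_mechanism", "confidence"}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, store, top_k=2):
    """Return the texts of the top_k stored segments most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["vector"]),
                    reverse=True)
    return [item["text"] for item in ranked[:top_k]]


def build_prompt(channel, contexts):
    """Assemble a prompt asking for a structured JSON answer (hypothetical wording)."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        f"Using only the context below, report the ion selectivity and gating "
        f"mechanism of {channel} as JSON with keys 'ion_selectivity', "
        f"'gating_mechanism', and 'confidence'.\nContext:\n{context_block}"
    )


def parse_answer(raw):
    """Parse the model's JSON reply and check that the expected keys are present."""
    answer = json.loads(raw)
    missing = REQUIRED_KEYS - answer.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return answer


# Placeholder segments and vectors; real entries would be literature embeddings.
store = [
    {"text": "segment A", "vector": [1.0, 0.0]},
    {"text": "segment B", "vector": [0.0, 1.0]},
    {"text": "segment C", "vector": [0.9, 0.1]},
]
print(build_prompt("TRPC1", retrieve([1.0, 0.0], store)))
```

Validating the JSON keys before accepting an answer is one simple way to keep malformed model output out of the annotation table.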

      To clarify this further, we have rewritten the relevant subsection in lines 649 - 718. Now, this section provides a detailed description of the RAG pipeline. Also, we have improved Supplementary Figure 1 to provide a clearer description of the pipeline. We have also provided an example prompt template to illustrate the query. These additions clarify how the pipeline functions and demonstrate its practical utility for IC annotation.

      (9) The existence of 76 auxiliary non-pore containing 'ion channel' genes in this analysis is a little confusing, as it seems a part of the pipeline is looking for pore-lining residues. Furthermore, how many of these are picked up in the larger orthologues search? Are these harder to perform checks on to ensure that they are indeed ion channel genes? A further discussion of the choice to include these auxiliary sequences would be relevant. This could just be further discussion of the literature that has decided to do this in the past.

      We thank the reviewer for this comment, and agree that further clarification of our selection and definition of auxiliary IC sequences would be helpful. As the reviewer has pointed out, one of the annotation pipeline steps is indeed looking for the pore-lining residues. Any sequences that do not have a pore-containing domain are then considered to be auxiliary, and we search for additional evidence of their binding with one of the annotated pore-containing ICs. If such evidence is not found in the literature, we remove them from our curated IC list. 

      In response to the above comment, we have revised the manuscript text to provide these details. In the Introduction section, we have added references to previous literature that have described auxiliary ICs and also pointed out that the existing ion channel resources do not account for such auxiliary channels (lines 73-79, 107-108, 148-149). We have also expanded the Methods section to describe the selection and definition of auxiliary channels (lines 640-646).

      With regards to the orthology analysis, since auxiliary channels do not have a pore domain, and our orthology pipeline requires a pore domain similarity search and hit, we did not include them in this part of the analysis. We have clarified the text in the Results section to ensure this is communicated properly throughout the manuscript (lines 212-215, 260-263). 
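
      The selection rule described in this response amounts to a short decision procedure, sketched below. The function names, argument names, and labels are hypothetical, and the two boolean inputs stand in for the pore-domain search and the literature evidence check, neither of which is reproduced here.

```python
def classify_candidate(has_pore_domain, binds_annotated_pore_channel):
    """Classify a candidate sequence following the rule described in the response."""
    if has_pore_domain:
        return "pore-containing"
    if binds_annotated_pore_channel:
        # No pore domain, but literature evidence of binding to an annotated pore-containing IC.
        return "auxiliary"
    # No pore domain and no supporting evidence: dropped from the curated list.
    return "removed"

def eligible_for_orthology(label):
    """Only pore-containing channels enter the pore-domain-based orthology search."""
    return label == "pore-containing"
```

      Under this reading, an auxiliary subunit is retained in the curated list but never reaches the orthology step, which matches the statement that auxiliary channels were excluded from that part of the analysis.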

      (10) Why are only evolutionary relationships between rat, mouse, and human shown in Figure 3A? These species are all close on the evolutionary timeline.

      We thank the reviewer for this comment. Figure 3A currently provides a high-level evolutionary relationship across the 6 human CALHM members as a pretext for the pattern based Bayesian analysis. However, since this analysis is based on a wider set of orthologs that span taxa, we agree that a larger tree that includes more orthologs is warranted.

      We have now revised Figure 3A to include an expanded tree that includes 83 orthologs from all 6 human CALHM members spanning 14 organisms from different taxa, including mammals, fishes, birds, nematodes, and cnidarians. The overall structure of the tree is still consistent with 2 major clades as before, with CALHM 1 and 3 in the first clade and CALHM 2, 4, 5, and 6 in the second clade, with good branch support.

      (B) Revisions related to the second part, regarding the analysis of CALHM channel mutations:

      (1) It would strengthen the manuscript if it included additional discussion and references to show that previous methods to analyze conserved residues in CALHM were significantly lacking. What results would previous methods give, and why was this not enough? Were there just not enough identified CALHM orthologues to give strong signals in conservation analysis? Also, the amino acid conservation between CLHM-1 and CALHM1 is extremely low. Thus, there are other CALHM orthologs that give strong signals in conservation analysis. There are ~6 papers that perform in-depth analysis of the role of conserved residues in the gating of CALHM channels (human and C. elegans) that were not cited (Ma et al, Am J Physiol Cell Physiol, 2025; Syrjanen et al, Nat Commun, 2023; Danielli et al, EMBO J, 2023; Kwon et al, Mol Cells, 2021; Tanis et al, Am J Physiol Cell Physiol, 2017; Tanis et al, J Neurosci, 2013; Ma et al, PNAS, 2013); these data need to be discussed in the context of the present work.

      We thank the reviewer for the comment and agree that these are excellent studies that have advanced understanding of conserved residues in CALHM gating. While their analyses compared a limited set of sequences, focusing on residues conserved in specific CALHM homologs or species like C. elegans, our analysis encompasses thousands of sequences across the entire CALHM family, allowing us to identify residues conserved across all family members over evolution. We also coupled this sequence analysis with hypotheses derived from our published structural studies (Choi et al., Nature, 2019), which highlighted the NTH/S1 region as a critical element in channel gating. Based on this, we focused on evolutionarily conserved residues in the S1–S2 linker and at the interface of S1 with the rest of the TMD, reasoning that if S1 movement is essential for gating, these two structural elements (acting as a hinge and stabilizing interface, respectively) would be key determinants of the conformational dynamics of S1. These regions have been largely overlooked in previous studies. As a result, the residues highlighted in our study do not overlap with those previously reported but instead provide complementary insights into gating mechanisms in this unique channel family. Together, our study and the published literature suggest that many regions and residues in CALHM proteins are critical for gating: while some are conserved across the entire family evolutionarily, others appear conserved only within certain species or subfamilies.

      To address the reviewer’s comment, and to highlight the points mentioned above, we have added a brief discussion of these studies and the relevant citations in the revised manuscript (lines 378– 385, 563–576).

      (2) Whereas the current-voltage relations for WT channels are clearly displayed, the data that is shown for the mutants does not allow for determining if their gating properties are indeed different than WT.

      First, the current amplitudes for the mutants were quantified at just one voltage, which makes it impossible to determine if their voltage-dependence was different than WT, which would be a strong indicator for an effect in gating. Current-voltage relations as done for the WT channels should be included for at least some key mutations, which should include additional relevant controls like the use of Gd3+ as an inhibitor to rule out the contribution of some endogenous currents.

      We thank the reviewer for this comment. To address this, we performed additional experiments using a multi-step pulse protocol to obtain current-voltage relations for WT CALHM1, CALHM1(I109W), WT CALHM6, and CALHM6(W113A). Our initial two-step protocol (−80 mV and +120 mV) covers both the physiological voltage range and the extended range commonly used in biophysical characterization of ion channels. Most mutants did not exhibit channel activation even within this broad range. We therefore focused on the three mutants that did show substantial activation to perform full I–V analysis as suggested. In all groups, currents activated at 37 °C were significantly inhibited by Gd<sup>3+</sup>, consistent with published reports (Ma et al., AJP 2025; Danielli et al., EMBO J 2023; Syrjänen et al., Nat Commun 2023). Notably, for CALHM6(Y51A), while this mutation did not significantly alter current amplitudes at positive membrane potentials, it markedly reduced currents at negative potentials, rendering the channel outwardly rectifying and altering its voltage dependence. These new data are incorporated into Figure 5 (panels A–O) and discussed in the manuscript. Figure 5 now also shows current amplitudes at both +120 mV and −80 mV in 0 mM Ca<sup>2+</sup> at 37 °C to facilitate direct comparison between WT and mutants. The previous data at 5 mM Ca<sup>2+</sup> and 0 mM Ca<sup>2+</sup> at 22 °C have been moved to Supplementary Figure 5 as requested.

      Second, it is unclear whether the three experimental conditions (5 mM Ca<sup>2+</sup>, and 0 Ca<sup>2+</sup>, at 22 and 37 °C) were measured in the same cell in each experiment, or if they represent different experiments. This should be clarified. If measurements at each condition were done in the same experiment, direct comparison between the three conditions within each individual experiment could further help identify mutations with altered gating.

      We thank the reviewer for pointing this out and apologize for the confusion. All three conditions (5 mM Ca<sup>2+</sup> at 22 °C, 0 mM Ca<sup>2+</sup> at 22 °C, and 0 mM Ca<sup>2+</sup> at 37 °C) were sequentially measured in the same cell within each experiment. The currents were then averaged across cells and plotted for each group.

      Third, in line 334, the authors state that "expression levels of wild-type proteins and mutants are comparable." However, Western blots showing CALHM protein abundance (Supplementary Fig. 3) are not of acceptable quality; in the top blot, WT CALHM1 appears too dim, representative blots were not shown for all mutants, and individual data points should be included on the group data quantitation of the blots, together with a statistical test comparing mutants with the WT control.

      We thank the reviewer for the comment and agree that representative blots were not shown for all mutants. Supplementary Figure 4 (previously Supplementary Figure 3) has been updated to include representative blots for all mutants, individual data points in the quantification, and statistical tests comparing each mutant to the WT control.

      A more serious concern is that the total protein quantitation is not very informative about the functional impact of mutations in ion channels, because mutations can severely impact channel localization in the plasma membrane without reducing the total protein that is translated. In mammalian cells, CALHM6 is localized to intracellular compartments and only translocates to the plasma membrane in response to an activating stimulus (Danielli et al, EMBO J, 2023). Thus, if CALHM6 is only intracellular, the protein amount would not change, but the measured current would. Abundant intracellular CALHM1 has also been observed in mammalian cells transfected with this protein (Dreses-Werringloer et al., Cell, 2008). Quantitation of surface-biotinylated channels would provide information on whether there are differences between the constructs in relation to surface expression rather than gating. An alternative approach to biotinylation would be to express GFP-tagged constructs in Xenopus oocytes and look for surface expression. This is what has been done in previous CALHM channel studies.

      Without evidence for the absence of defects in localization or clear alterations in gating properties, it is not possible to conclude whether mutant channels have altered activity. Does the analysis of sequences provide any testable hypotheses about substitutions with different side chains at the same position in the sequence?

      We thank the reviewer for this very important comment. We agree that total protein levels alone do not distinguish between intracellular retention and proper trafficking to the plasma membrane. To address this, we performed surface biotinylation assays for all WT and mutant CALHM1 and CALHM6 constructs to assess their plasma membrane localization. The results show that mutants have either comparable or substantially higher surface expression levels than WT, consistent with the Western blot data. Together, these findings support our original interpretation that the observed differences in electrophysiological currents are not due to trafficking defects but reflect functional effects. These new data are presented in Supplementary Figure 5.

      (3) Line 303 - 13 aligned amino acids were conserved across all CALHM homologs - are these also aligned in related connexin and pannexin families? It is likely that cysteines and proline in TM2 are since CALHM channels overall share a lot of similarities with connexins and pannexins (Siebert et al, JBC, 2013). As in line 207, it would be expected that pannexins, connexins, and CALHM channel families would group together. Related to this, see Line 406 - in connexins, there is also a proline kink in TM2 that may play a role in mediating conformational changes between channel states (Ri et al, Biophysical Journal, 1999). This should be discussed.

      We thank the reviewer for the suggestion. We attempted a structure-based sequence alignment of representative structures from all three families (CALHM, connexins, and pannexins), but the resulting alignments are very poor and contain many gapped regions, making it very difficult to comment on the similarities mentioned in this comment. This is expected: although CALHM, connexins, and pannexins are all considered “large-pore” channels, the TMD arrangement and conformation of CALHM are distinct from those of connexins and pannexins. Below, we have included a snapshot of the alignment at the conserved cysteine regions of the CALHM homologs, along with the resulting tree, which has very low support values and has difficulty placing the connexins properly, making it difficult to interpret.

      Author response image 2.

      Structure based sequence alignment and phylogenetic analysis of available crystal structures of members from the CALHM, Pannexin and Connexin families. Top: The resulting sequence alignment is very sparse and does not show conservation of residues in the TM regions. The CPC motif with conserved cysteines in CALHM family is shown. Bottom: Phylogenetic tree based on the alignment has low support values making it difficult to interpret.

      (4) Line 36 - This work does not have experimental evidence to show that the selected evolutionarily conserved residues alter gating functions.

      Our electrophysiology data demonstrate that the selected evolutionarily conserved residues have a major impact on CALHM1 and CALHM6 gating. As shown in Figure 5, mutations at these residues produce two distinct phenotypes: (1) nonconductive channels, and (2) altered voltage dependence, resulting in outward rectification. Importantly, these functional changes occur despite normal total expression and surface trafficking, as confirmed by Western blotting and surface biotinylation (Supplementary Figure 4). These findings indicate that the affected residues are critical for the conformational dynamics underlying channel gating rather than for protein expression or localization.

      (5) Line 296-297 - This could also be put in the context of what we already know about CALHM gating. While all cryo EM structures of CALHM channels are in the open state, we still do understand some things about gating mechanism (Tanis et al, Am J Physiol Cell Physiol, 2017; Ma et al, Am J Physiol Cell Physiol, 2025) with the NT modulating voltage dependence and stabilizing closed channel states and the voltage dependent gate being formed by proximal regions of TM1.

      Thank you for providing this suggestion. As suggested, we have revised the text to place our findings in the context of current knowledge about CALHM gating and have added the relevant citations (lines 370-373).

      (6) Lines 314-315 - Just because residues are conserved does not mean that they play a role in channel gating. These residues could also be important for structure, ion selectivity, etc.

      We agree that evolutionary conservation alone does not imply a role in gating. However, our hypothesis derives from the positioning of these conserved residues, and previous studies that have indicated the importance of the NTH/S1 region for channel gating function. More importantly, our electrophysiology data indicate that these conserved residues specifically impact channel gating in CALHM1 and CALHM6. We have revised the text in lines 404-406 to clarify this further.

      (7) Line 333 - while CALHM6 is less studied than CALHM1, there is knowledge of its function and gating properties. Should CALHM6 be considered a "dark" channel? The IDG development level in Pharos is Tbio. There have been multiple papers published on this channel (ex: Ebihara et al, J Exp Med, 2010; Kasamatsu et al, J Immunol 2014; Danielli et al, EMBO J, 2023).

      We thank the reviewer for noting this important discrepancy. We have updated the text and labels related to CALHM6 to reflect its status as Tbio in the manuscript.

      (8) Please cite Jeon et al., (Biochem Biophys Res Commun, 2021), who have already shown temperature-dependence of CALHM1.

      Thank you for the comment. We have added the citation.  

      (9) It would be helpful to have a schematic showing amino acid residues, TM domains, highlighted residues mutated, etc.

      Thank you for the suggestion. We have revised the figure and added labels for the TM domains, and highlighted the mutated residues.

      Reviewer #1 (Recommendations for the authors):

      (1) Why in the title is 'ion-channels' hyphenated but in the text it is not?

      This has been changed.

      (2) Line 78: 'Cryo-EM' is not defined before the acronym is used.

      This has been fixed.

      (3) Typo in line 519: KinOrthto.

      This has been fixed.

      (4) Capitalizing 'Tree of Life' is a bit strange in section 2 of the results and the Discussion.

      We have removed the capitalization as suggested.

      (5) In Figure 3 and Supplementary Figure 4A, the gene names in the tree are CAHM and not CALHM - I assume this is an error.

      This has been made consistent to CALHM.

      (6) Font sizes throughout all figures, with the exception of Figure 1, need to be more legible. The X-axis labels in Figure 2A are hard to read, for example (though I can see that there is also the CAHM/CALHM typo here...). A good rule of thumb is that they should be the same size as the manuscript text. Furthermore, the grey backgrounds of Figure 4 and Figure 5 are off-putting; just having a white background here should be sufficient.

      This has been addressed. We have increased the font size in all figures with these revisions. The styling for Figure 4 and 5 has also been made consistent with other figures.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 36 - This work does not have experimental evidence to show that the selected evolutionarily conserved residues alter gating functions.

      Addressed in comment #4 for Part B Revisions related to the second part, regarding the analysis of CALHM channel mutations above.

      (2) Line 168 - should also be Supplemental Table 1.

      This has been addressed.

      (3) Line 170 - 419 human ion channel sequences were identified and this was an increase of 75 sequences over previous number. Which 75 proteins are these?

      This is now shown in Supplementary Figure 2 and Supplementary Table 2. Supplementary Figure 2 shows an UpSet plot with the number of sequences that overlap across databases and the novel sequences that we have added as part of this study. The figure of 75 refers specifically to the sequences that were not included in Pharos, which we chose as the reference because it lists the most ICs of the resources compared. Further, Supplementary Table 2 now provides a list of individual ICs and whether they were present in each of the 3 databases compared.

      (4) Line 289 - Ca2+ (not Ca); other similar mistakes throughout the manuscript

      These have been fixed.

      (5) Line 291-292 - Please include more about functions for CALHM channels; ex. CALHM1 regulates cortical neuron excitability (Ma et al, PNAS 2012), CLHM-1 regulates locomotion and induces neurodegeneration in C. elegans (Tanis et al. Journal of Neuroscience 2013); see above for references on CALHM6 function.

      We have added the functions as suggested.

      (6) Line 296-297 - This could also be put in the context of what we already know about CALHM gating. While all cryo EM structures of CALHM channels are in the open state, we still do understand some things about gating mechanism (Tanis et al, Am J Physiol Cell Physiol, 2017; Ma et al, Am J Physiol Cell Physiol, 2025) with the NT modulating voltage dependence and stabilizing closed channel states and the voltage dependent gate being formed by proximal regions of TM1.

      Addressed in comment #5 for Part B Revisions related to the second part, regarding the analysis of CALHM channel mutations above.

      (7) Lines 314-315 - Just because residues are conserved does not mean that they play a role in channel gating. These residues could also be important for structure, ion selectivity, etc.

      Addressed in comment #6 for Part B Revisions related to the second part, regarding the analysis of CALHM channel mutations above.

      (8) Line 333 - While CALHM6 is less studied than CALHM1, there is knowledge of its function and gating properties. Should CALHM6 be considered a "dark" channel? The IDG development level in Pharos is Tbio. There have been multiple papers published on this channel (ex: Ebihara et al, J Exp Med, 2010; Kasamatsu et al, J Immunol 2014; Danielli et al, EMBO J, 2023).

      Addressed in comment #7 for Part B Revisions related to the second part, regarding the analysis of CALHM channel mutations above.

      (9) Line 627 - Do you mean that 5 mM CaCl2 was replaced with 5 mM EGTA in 0 Ca2+ solution?

      This is correct.  

      (10) Why are only evolutionary relationships between rat, mouse, and human shown in Figure 3A? These species are all close on the evolutionary timeline.

      Addressed in comment #10 for Part A Revisions related to the first part, regarding data mining and curation above.

      (11) Figure 5 - no need to show the currents at room temperature in the main text since there are robust currents at 37 degrees; this could go into the supplement. Also, please cite Jeon et al. (Biochem Biophys Res Commun, 2021), who have already shown temperature-dependence of CALHM1.

      Addressed in comment #8 for Part B Revisions related to the second part, regarding the analysis of CALHM channel mutations above.

      (12) It would be helpful to have a schematic showing amino acid residues, TM domains, highlighted residues mutated etc.

      Addressed in comment #9 for Part B Revisions related to the second part, regarding the analysis of CALHM channel mutations above.

      (13) Use of S1-S4 to refer to the transmembrane "segments" is not standard; rather, TM1-TM4 would generally be used to refer to transmembrane domains.

      We have used the S1–S4 helix notation to maintain consistency with the nomenclature employed in our previous study (Choi et al., Nature, 2019).

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Authors’ reply (Ono et al)

      Review Commons Refereed Preprint #RC-2025-03137

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Ono et al addressed how condensin II and cohesin work to define chromosome territories (CT) in human cells. They used FISH to assess the status of CT. They found that condensin II depletion leads to lengthwise elongation of G1 chromosomes, while double depletion of condensin II and cohesin leads to CT overlap and morphological defects. Although the requirement of condensin II in shortening G1 chromosomes was already shown by Hoencamp et al 2021, the cooperation between condensin II and cohesin in CT regulation is a new finding. They also demonstrated that cohesin and condensin II are involved in G2 chromosome regulation on a smaller and larger scale, respectively. Though such roles in cohesin might be predictable from its roles in organizing TADs, it is a new finding that the two work on a different scale on G2 chromosomes. Overall, this is technically solid work, which reports new findings about how condensin II and cohesin cooperate in organizing G1 and G2 chromosomes.

      We greatly appreciate the reviewer’s supportive comments. The reviewer has accurately recognized our new findings concerning the collaborative roles of condensin II and cohesin in establishing and maintaining interphase chromosome territories.

      Major point:

      They propose a functional 'handover' from condensin II to cohesin, for the organization of CTs at the M-to-G1 transition. However, the 'handover', i.e. difference in timing of executing their functions, was not experimentally substantiated. Ideally, they can deplete condensin II and cohesin at different times to prove the 'handover'. However, this would require the use of two different degron tags and go beyond the revision of this manuscript. At least, based on the literature, the authors should discuss why they think condensin II and cohesin should work at different timings in the CT organization.

      We take this comment seriously, especially because Reviewer #2 also expressed the same concern. 

      First of all, we must admit that the basic information underlying the “handover” idea was insufficiently explained in the original manuscript. Let us make it clear below:

      • Condensin II binds to chromosomes and is enriched along their axes from anaphase through telophase (Ono et al., 2004; Hirota et al., 2004; Walther et al., 2018).
      • In early G1, condensin II is diffusely distributed within the nucleus and does not bind tightly to chromatin, as shown by detergent extraction experiments (Ono et al., 2013).
      • Cohesin starts binding to chromatin when the cell nucleus reassembles (i.e., during the cytokinesis stage shown in Fig. 1B), apparently replacing condensins I and II (Brunner et al., 2025).
      • Condensin II progressively rebinds to chromatin from S through G2 phase (Ono et al., 2013).

      The cell cycle-dependent changes in chromosome-bound condensin II and cohesin summarized above are illustrated in Fig. 1A. We now realize that Fig. 1B in the original manuscript was inconsistent with Fig. 1A, creating unnecessary confusion, and we sincerely apologize for this. The fluorescence images shown in the original Fig. 1B were captured without detergent extraction prior to fixation, giving the misleading impression that condensin II remained bound to chromatin from cytokinesis through early G1. This was not our intention. To clarify this, we have repeated the experiment in the presence of detergent extraction and replaced the original Fig. 1B with a revised panel. Figs. 1A and 1B are now more consistent with each other. Accordingly, we have modified the corresponding sentences as follows:

      Although condensin II remains nuclear throughout interphase, its chromatin binding is weak in G1 and becomes robust from S phase through G2 (Ono et al., 2013). Cohesin, in contrast, replaces condensin II in early G1 (Fig. 1 B) (Abramo et al., 2019; Brunner et al., 2025), and establishes topologically associating domains (TADs) in the G1 nucleus (Schwarzer et al., 2017; Wutz et al., 2017).

      While there is a loose consensus in the field that condensin II is replaced by cohesin during the M-to-G1 transition, it remains controversial whether there is a short window during which neither condensin II nor cohesin binds to chromatin (Abramo et al., 2019), or whether there is a stage in which the two SMC protein complexes “co-occupy” chromatin (Brunner et al., 2025). Our images shown in the revised Fig. 1B cannot clearly distinguish between these two possibilities.

      From a functional point of view, the results of our depletion experiments are more readily explained by the latter possibility. If this is the case, the “interplay” or “cooperation” rather than the “handover” may be a more appropriate term to describe the functional collaboration between condensin II and cohesin during the M-to-G1 transition. For this reason, we have avoided the use of the word “handover” in the revised manuscript. It should be emphasized, however, that given their distinct chromosome-binding kinetics, the cooperation of the two SMC complexes during the M-to-G1 transition is qualitatively different from that observed in G2. Therefore, the central conclusion of the present study remains unchanged.

      For example, a sentence in the Abstract has been changed as follows:

      a functional interplay between condensin II and cohesin during the mitosis-to-G1 transition is critical for establishing chromosome territories (CTs) in the newly assembling nucleus.

      While the reviewer suggested one experiment, it is clearly beyond the scope of the current study. It should also be noted that even if such a cell line were available, the proposed application of sequential depletion to cells progressing from mitosis to G1 phase would be technically challenging and unlikely to produce results that could be interpreted with confidence.

      Other points:

      Figure 2E: It seems that the chromosome length without IAA is shorter in Rad21-aid cells than H2-aid cells or H2-aid Rad21-aid cells. How can this be interpreted?

      This comment is well taken. A related comment was made by Reviewer #3 (Major comment #2). Given the substantial genetic manipulations applied to establish multiple cell lines used in the present study, it is, strictly speaking, not straightforward to compare the -IAA controls between different cell lines. Such variations are most prominently observed in Fig. 2E, although they can also be observed to a lesser extent in other experiments (e.g., Fig. 3E). This issue is inherently associated with all studies using genetically manipulated cell lines and therefore cannot be completely avoided. For this reason, we focus on the differences between -IAA and +IAA within each cell line, rather than comparing the -IAA conditions across different cell lines. In this sense, a sentence in the original manuscript (lines 178-180) was misleading. In the revised manuscript, we have modified the corresponding and subsequent sentences as follows:

      Although cohesin depletion had a marginal effect on the distance between the two site-specific probes (Fig.2, C and E), double depletion did not result in a significant change (Fig.2, D and E), consistent with the partial restoration of centromere dispersion (Fig. 1G).

      In addition, we have added a section entitled “Limitations of the study” at the end of the Discussion to address technical issues that are inevitably associated with the current approach.

      Figure 3: Regarding the CT morphology, could they explain further the difference between 'elongated' and 'cloud-like (expanded)'? Is it possible to quantify the frequency of these morphologies?

      In the original manuscript, we provided data that quantitatively distinguished between the “elongated” and “cloud-like” phenotypes. Specifically, Fig. 2E shows that the distance between two specific loci (Cen 12 and 12q15) is increased in the elongated phenotype but not in the cloud-like phenotype. In addition, the cloud-like morphology clearly deviated from circularity, as indicated by the circularity index (Fig. 3F). However, because circularity can also decrease in rod-shaped chromosomes, these datasets alone may not be sufficiently convincing, as the reviewer pointed out. We have now included an additional parameter, the aspect ratio, defined as the ratio of an object’s major axis to its minor axis (new Fig. 3F). While this intuitive parameter was altered upon condensin II depletion and double depletion, we acknowledge that it, too, is not sufficient to convincingly distinguish between the elongated and cloud-like phenotypes proposed in the original manuscript. For these reasons, in the revised manuscript, we have toned down our statements regarding the differences in CT morphology between the two conditions. Nonetheless, together with the data from Figs. 1 and 2, these results indicate that the Rabl configuration observed upon condensin II depletion is further exacerbated in the absence of cohesin. Accordingly, we have modified the main text and the cartoon (Fig. 3H) to more accurately depict the observations summarized above.
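
      The point that circularity also falls for rod-shaped objects can be checked numerically. The sketch below assumes the circularity index is the common image-analysis definition 4πA/P² (the response does not state which formula was used), models each outline as an ideal ellipse, and uses Ramanujan's approximation for the ellipse perimeter. All function names are hypothetical.

```python
import math

def ellipse_perimeter(a, b):
    """Ramanujan's approximation to the perimeter of an ellipse with semi-axes a and b."""
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))

def circularity(area, perimeter):
    """4*pi*A / P**2: equals 1 for a circle and falls toward 0 for less compact outlines."""
    return 4 * math.pi * area / perimeter ** 2

def aspect_ratio(major, minor):
    """Ratio of the major axis to the minor axis; 1 for a circle, larger for elongated shapes."""
    return major / minor

def ellipse_metrics(a, b):
    """Return (circularity, aspect ratio) for an ellipse with semi-axes a >= b."""
    area = math.pi * a * b
    return circularity(area, ellipse_perimeter(a, b)), aspect_ratio(2 * a, 2 * b)

# Progressively more elongated outlines with the same minor axis.
for a, b in [(1.0, 1.0), (2.0, 1.0), (4.0, 1.0)]:
    c, ar = ellipse_metrics(a, b)
    print(f"semi-axes ({a}, {b}): circularity={c:.2f}, aspect ratio={ar:.1f}")
```

      Under these assumptions, circularity drops steadily as the ellipse is stretched, so a low circularity value by itself cannot separate a smooth, elongated territory from an irregular, expanded one; the aspect ratio captures elongation but not irregularity of the outline.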

      Figure 5: How did they assign C, P and D3 for two chromosomes? The assignment seems obvious in some cases, but not in other cases (e.g. in the image of H2-AID#2 +IAA, two D3s can be connected to two Ps in the other way). They may have avoided line crossing between two C-P-D3 assignments, but can this be justified when the CT might be disorganized e.g. by condensin II depletion?

      This comment is well taken. As the reviewer suspected, we avoided line crossing between two sets of assignments. Whenever there was ambiguity, such images were excluded from the analysis. Because most chromosome territories derived from two homologous chromosomes are well separated even under the depleted conditions as shown in Fig. 6C, we did not encounter major difficulties in making assignments based on the criteria described above. We therefore remain confident that our conclusion is valid.

      That said, we acknowledge that our assignments of the FISH images may not be entirely objective. We have added this point to the “Limitations of the study” section at the end of the Discussion.

      Figure 6F: The mean is not indicated on the right-hand side graph, in contrast to other similar graphs. Is this an error? We apologize for having caused this confusion. First, we would like to clarify that the right panel of Fig. 6F should be interpreted together with the left panel, unlike the seemingly similar plots shown in Figs. 6G and 6H. In the left panel of Fig. 6F, the percentages of CTs that contact the nucleolus are shown in grey, whereas those that do not are shown in white. All CTs classified in the “non-contact” population (white) have a value of zero in the right panel, represented by the bars at 0 (i.e., each bar corresponds to a collection of dots having a zero value). In contrast, each CT in the “contact” population (grey) has a unique contact ratio value in the right panel. Because the right panel consists of two distinct groups, we reasoned that placing mean or median bars would not be appropriate. This was why no mean or median bars were shown in the right panel (the same is true for Fig. S5, A and B).

      That said, for the reviewer’s reference, we have placed median bars in the right panel (see below). In the six cases of H2#2 (-/+IAA), Rad21#2 (-/+IAA), Double#2 (-IAA), and Double#3 (-IAA), the median bars are located at zero (note that in these cases the mean bars [black] completely overlap with the “bars” derived from the data points [blue and magenta]). In the two cases of Double#2 (+IAA) and Double#3 (+IAA), they are placed at values of ~0.15. Statistically significant differences between -IAA and +IAA are observed only in Double#2 and Double#3, as indicated by the P-value shown on the top of the panel. Thus, we are confident in our conclusion that CTs undergo severe deformation in the absence of both condensin II and cohesin.
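
The behaviour of the median bars described above follows directly from the two-group structure of the right panel: when more than half of the CTs have a contact ratio of exactly zero, the median is zero regardless of the values in the contact group. A minimal sketch in Python illustrates this. The numbers are hypothetical and chosen only for illustration; they are not data from the study, and `summarize` is an illustrative helper name.

```python
from statistics import mean, median

def summarize(contact_ratios):
    """Summarize a zero-inflated list of per-CT contact ratios."""
    n = len(contact_ratios)
    n_zero = sum(1 for v in contact_ratios if v == 0.0)
    return {
        "fraction_non_contact": n_zero / n,
        "median": median(contact_ratios),
        "mean": mean(contact_ratios),
    }

# Hypothetical population in which 60% of CTs do not touch the nucleolus.
mostly_non_contact = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.4]

# Hypothetical population in which only 30% of CTs do not touch the nucleolus.
mostly_contact = [0.0, 0.0, 0.0, 0.1, 0.15, 0.15, 0.2, 0.3, 0.4, 0.5]

summary_a = summarize(mostly_non_contact)
summary_b = summarize(mostly_contact)

print(summary_a)  # median is 0 because more than half of the values are 0
print(summary_b)  # median moves away from 0 once the contact group dominates
```

In such a mixed distribution a single mean or median bar summarizes neither group well, which is the rationale given above for omitting the bars from the original panel.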

      Figure S1A: The two FACS profiles for Double-AID #3 Release-2 may be mixed up between -IAA and +IAA. The reviewer is right. This inadvertent error has been corrected.

      The method section explains that 'circularity' shows 'how closely the shape of an object approximates a perfect circle (with a value of 1 indicating a perfect circle), calculated from the segmented regions'. It would be helpful to provide further methodological details about it. We have added further explanations regarding the circularity in Materials and Methods together with a citation (two added sentences are underlined below):

      To analyze the morphology of nuclei, CTs, and nucleoli, we measured “circularity,” a morphological index that quantifies how closely the shape of an object approximates a perfect circle (value = 1). Circularity was defined as 4π × Area/Perimeter², where both the area and perimeter of each segmented object were obtained using ImageJ. This index ranges from 0 to 1, with values closer to 1 representing more circular objects and lower values corresponding to elongated or irregular shapes (Chen et al, 2017).

      Chen, B., Y. Wang, S. Berretta and O. Ghita. 2017. Poly Aryl Ether Ketones (PAEKs) and carbon-reinforced PAEK powders for laser sintering. J Mater Sci 52:6004-6019.
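
The circularity definition quoted above, and the reason it cannot by itself separate rod-shaped from irregular objects, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the shapes are idealized polygons rather than segmented CTs, the helper names are hypothetical, and the axis lengths passed to `aspect_ratio` are simple bounding extents, whereas ImageJ derives its axes from a fitted ellipse.

```python
import math

def circularity(area: float, perimeter: float) -> float:
    """Return 4*pi*area / perimeter**2, which is 1.0 for a perfect circle."""
    if perimeter <= 0:
        raise ValueError("perimeter must be positive")
    return 4.0 * math.pi * area / perimeter ** 2

def aspect_ratio(major_axis: float, minor_axis: float) -> float:
    """Return the ratio of the major axis to the minor axis (>= 1)."""
    if minor_axis <= 0:
        raise ValueError("minor axis must be positive")
    return major_axis / minor_axis

# A circle of radius 3: circularity is 1 by construction.
r = 3.0
circle_circ = circularity(math.pi * r ** 2, 2.0 * math.pi * r)

# A 5 x 1 rod: area 5, perimeter 12.
rod_circ = circularity(5.0, 12.0)
rod_ar = aspect_ratio(5.0, 1.0)

# A plus sign built from five unit squares: area 5, perimeter 12,
# with equal extents (3 x 3) along both axes.
plus_circ = circularity(5.0, 12.0)
plus_ar = aspect_ratio(3.0, 3.0)

print(circle_circ, rod_circ, plus_circ)  # rod and plus share one circularity
print(rod_ar, plus_ar)                   # but their aspect ratios differ
```

The rod and the plus-shaped object have the same area and perimeter and therefore the same circularity, while their aspect ratios differ. This is the sense in which the aspect ratio complements circularity, although, as noted in the reply on Figure 3, neither parameter was sufficient to distinguish the two phenotypes convincingly.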

      Reviewer #1 (Significance (Required)):

      Ono et al addressed how condensin II and cohesin work to define chromosome territories (CT) in human cells. They used FISH to assess the status of CT. They found that condensin II depletion leads to lengthwise elongation of G1 chromosomes, while double depletion of condensin II and cohesin leads to CT overlap and morphological defects. Although the requirement of condensin II in shortening G1 chromosomes was already shown by Hoencamp et al 2021, the cooperation between condensin II and cohesin in CT regulation is a new finding. They also demonstrated that cohesin and condensin II are involved in G2 chromosome regulation on a smaller and larger scale, respectively. Though such roles in cohesin might be predictable from its roles in organizing TADs, it is a new finding that the two work on a different scale on G2 chromosomes. Overall, this is technically solid work, which reports new findings about how condensin II and cohesin cooperate in organizing G1 and G2 chromosomes.

      See our reply above.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Ono et al use a variety of imaging and genetic (AID) depletion approaches to examine the roles of condensin II and cohesin in the reformation of interphase genome architecture in human HCT116 cells. Consistent with previous literature, they find that condensin II is required for CENP-A dispersion in late mitosis/early G1. Using in situ FISH at the centromere/q arm of chromosome 12 they then establish that condensin II removal causes lengthwise elongation of chromosomes that, interestingly, can be suppressed by cohesin removal. To better understand changes in whole-chromosome morphology, they then use whole chromosome painting to examine chromosomes 18 and 19. In the absence of condensin II, cells effectively fail to reorganise their chromosomes from rod-like structures into spherical chromosome territories (which may explain why CENP-A dispersion is suppressed). Cohesin is not required for spherical CT formation, suggesting condensin II is the major initial driver of interphase genome structure. Double depletion results in complete disorganisation of chromatin, leading the authors to conclude that a typical cell cycle requires orderly 'handover' from the mitotic to interphase genome organising machinery. The authors then move on to G2 phase, where they use a variety of different FISH probes to assess alterations in chromosome structure at different scales. They thereby establish that perturbation of cohesin or condensin II influences local and longer range chromosome structure, respectively. The effects of condensin II depletion become apparent at a genomic distance of 20 Mb, but are negligible either below or above. The authors repeat the G1 depletion experiment in G2 and now find that condensin II and cohesin are individually dispensable for CT organisation, but that dual depletion causes CT collapse. This rather implies that there is cooperation rather than handover per se.
Overall this study is a broadly informative multiscale investigation of the roles of SMC complexes in organising the genome of postmitotic cells, and solidifies a potential relationship between condensin II and cohesin in coordinating interphase genome structure. The deeper investigation of the roles of condensin II in establishing chromosome territories and intermediate range chromosome structure in particular is a valuable and important contribution, especially given our incomplete understanding of what functions this complex performs during interphase.

      We sincerely appreciate the reviewer’s supportive comments. The reviewer has correctly acknowledged both the current gaps in our understanding of the role of condensin II in interphase chromosome organization and our new findings on the collaborative roles of condensin II and cohesin in establishing and maintaining interphase chromosome territories.

      Major comments:

      In general the claims and conclusions of the manuscript are well supported by multiscale FISH labelling. An important absent control is western blotting to confirm protein depletion levels. Currently only fluorescence is used as a readout for the efficiency of the AID depletion, and we know from prior literature that even small residual quantities of SMC complexes are quite effective in organising chromatin. I would consider a western blot a fairly straightforward and important technical control.

      Let us explain why we used immunofluorescence measurements to evaluate the efficiency of depletion. In our current protocol for synchronizing at the M-to-G1 transition, ~60% of control and H2-depleted cells, and ~30% of Rad21-depleted and co-depleted cells, are successfully synchronized in G1 phase. The apparently lower synchronization efficiency in the latter two groups is attributable to the well-documented mitotic delay caused by cohesin depletion. From these synchronized populations, early G1 cells were selected based on their characteristic morphologies (see the legend of Fig. 1C). In this way, we analyzed an early G1 cell population that had completed mitosis without chromosome segregation defects. We acknowledge that this represents a technically challenging aspect of M-to-G1 synchronization in HCT116 cells, whose synchronization efficiency is limited compared with that of HeLa cells. Nevertheless, this approach constitutes the most practical strategy currently available. Hence, immunofluorescence provides the only feasible means to evaluate depletion efficiency under these conditions.

      Although immunoblotting can, in principle, be applied to G2-arrested cell populations, we do not believe that information obtained from such experiments would affect the main conclusions of the current study. Please note that we carefully designed and performed all experiments with appropriate controls: H2 depletion, RAD21 depletion, and double depletion, with outcomes confirmed using independent cell lines (Double-AID#2 and Double-AID#3) whenever deemed necessary.

      We fully acknowledge the technical limitations associated with the AID-mediated depletion techniques, which are now described in the section entitled “Limitations of the study” at the end of the Discussion. Nevertheless, we emphasize that these limitations do not compromise the validity of our findings.

      I find the point on handover as a mechanism for maintaining CT architecture somewhat ambiguous, because the authors find that the dependence simply switches from condensin II to both condensin II and cohesin, between G1 and G2. To me this implies augmented cooperation rather than handover. I have two further suggestions, both of which I would strongly recommend but would consider desirable but 'optional' according to review commons guidelines.

      First of all, we would like to clarify a possible misunderstanding regarding the phrase “handover as a mechanism for maintaining CT architecture somewhat ambiguous”. In the original manuscript, we proposed handover as a mechanism for establishing G1 chromosome territories, not for maintaining CTs.

      That said, we take this comment very seriously, especially because Reviewer #1 also expressed the same concern. Please see our reply to Reviewer #1 (Major point).

      In brief, we agree with the reviewer that the word “handover” may not be appropriate to describe the functional relationship between condensin II and cohesin during the M-to-G1 transition. In the revised manuscript, we have avoided the use of the word “handover”, replacing it with “interplay”. It should be emphasized, however, that given their distinct chromosome-binding kinetics, the cooperation of the two SMC complexes during the M-to-G1 transition is qualitatively different from that observed in G2. Therefore, the central conclusion of the present study remains unchanged.

      For example, a sentence in Abstract has been changed as follows:

      a functional interplay between condensin II and cohesin during the mitosis-to-G1 transition is critical for establishing chromosome territories (CTs) in the newly assembling nucleus.

      Firstly, the depletions are performed at different stages of the cell cycle but have different outcomes. The authors suggest this is because handover is already complete, but an alternative possibility is that the phenotype is masked by other changes in chromosome structure (e.g. duplication/catenation). I would be very curious to see, for example, how the outcome of this experiment would change if the authors were to repeat the depletions in the presence of a topoisomerase II inhibitor.

      The reviewer’s suggestion here is somewhat vague, and it is unclear to us what rationale underlies the proposed experiment or what meaningful outcomes could be anticipated. Does the reviewer suggest that we perform topo II inhibitor experiments both during the M-to-G1 transition and in G2 phase, and then compare the outcomes between the two conditions?

      For the M-to-G1 transition, Hildebrand et al (2024) have already reported such experiments. They used a topo II inhibitor to provide evidence that mitotic chromatids are self-entangled and that the removal of these mitotic entanglements is required to establish a normal interphase nucleus. Our own preliminary experiments (not presented in the current manuscript) showed that ICRF treatment of cells undergoing the M-to-G1 transition did not affect post-mitotic centromere dispersion. The same treatment also had little effect on the suppression of centromere dispersion observed in condensin II-depleted cells.

      Under G2-arrested conditions, because chromosome territories are largely individualized, we would expect topo II inhibition to affect only the extent of sister catenation, which is not the focus of our current study. We anticipate that inhibiting topo II in G2 would have only a marginal effect, if any, on the maintenance of chromosome territories detectable by our current FISH approaches.

      In any case, we consider the suggested experiment to be beyond the scope of the present manuscript, which focuses on the collaborative roles of condensin II and cohesin as revealed by multi-scale FISH analyses.

      Secondly, if the author's claim of handover is correct then one (not exclusive) possibility is that there is a relationship between condensin II and cohesin loading onto chromatin. There does seem to be a modest co-dependence (e.g. fig S4 and S7), could the authors comment on this?

      First of all, we wish to point out the reviewer’s confusion between the G2 experiments and the M-to-G1 experiments. Figs. S4 and S7 concern experiments using G2-arrested cells, not M-to-G1 cells in which a possible handover mechanism is discussed. Based on Fig. 1, in which the extent of depletion in M-to-G1 cells was tested, no evidence of “co-dependence” between H2 depletion and RAD21 depletion was observed.

      That said, as the reviewer correctly points out, we acknowledge the presence of marginal yet statistically significant reductions in the RAD21 signal upon H2 depletion (and vice versa) in G2-arrested cells (Figs. S4 and S7).

      Another control experiment here would be to treat fully WT cells with IAA and test whether non-AID labelled H2 or RAD21 dip in intensity. If they do not, then perhaps there's a causal relationship between condensin II and cohesin levels?

      Following the reviewer’s suggestion, we tested whether IAA treatment causes an unintended decrease in the H2 or RAD21 signals in G2-arrested cells, and found that this is not the case (see the attached figure below).

      Thus, these data indicate that there is a modest functional interdependence between condensin II and cohesin in G2-arrested cells. For instance, condensin II depletion may modestly destabilize chromatin-bound cohesin (and vice versa). However, we note that these effects are minor and do not affect the overall conclusions of the study. In the revised manuscript, we have described these potentially interesting observations briefly as a note in the corresponding figure legends (Fig. S4).

      I recognise this is something considered in Brunner et al 2025 (JCB), but in their case they depleted SMC4 (so all condensins are lost or at least dismantled). Might bear further investigation.

      Methods:

      Data and methods are described in reasonable detail, and a decent number of replicates/statistical analyses have been performed. Documentation of the cell lines used could be improved. The actual cell line is not mentioned once in the manuscript. Although it is referenced, I'd recommend including the identity of the cell line (HCT116) in the main text when the cells are introduced and also in the relevant supplementary tables. Will make it easier for readers to contextualise the findings.

      We apologize for the omission of important information regarding the parental cell line used in the current study. The information has been added to Materials and Methods as well as the resource table.

      Minor comments:

      Overall the manuscript is well-written and well presented. In the introduction it is suggested that no experiment has established a causal relationship between human condensin II and chromosome territories, but this is not correct, Hoencamp et al 2021 (cell) observed loss of CTs after condensin II depletion. Although that manuscript did not investigate it in as much detail as the present study, the fundamental relationship was previously established, so I would encourage the authors to revise this statement.

      We are somewhat puzzled by this comment. In the original manuscript, we explicitly cited Hoencamp et al (2021) in support of the following sentences:


      (Lines 78-83 in the original manuscript)

      Moreover, high-throughput chromosome conformation capture (Hi-C) analysis revealed that, under such conditions, chromosomes retain a parallel arrangement of their arms, reminiscent of the so-called Rabl configuration (Hoencamp et al., 2021). These findings indicate that the loss or impairment of condensin II during mitosis results in defects in post-mitotic chromosome organization.


      That said, to make the sentences even more precise, we have made the following revision in the manuscript.


      (Lines 78-82 in the revised manuscript)

      Moreover, high-throughput chromosome conformation capture (Hi-C) analysis revealed that, under such conditions, chromosomes retain a parallel arrangement of their arms, reminiscent of the so-called Rabl configuration (Hoencamp et al., 2021). These findings, together with cytological analyses of centromere distributions, indicate that the loss or impairment of condensin II during mitosis results in defects in post-mitotic chromosome organization.


      The following statement was intended to explain our current understanding of the maintenance of chromosome territories. Because Hoencamp et al (2021) did not address the maintenance of CTs, we have kept this sentence unchanged.


      (Lines 100-102 in the original manuscript)

      Despite these findings, there is currently no evidence that either condensin II, cohesin, or their combined action contributes to the maintenance of CT morphology in mammalian interphase cells (Cremer et al., 2020).


      Reviewer #2 (Significance (Required)):

      General assessment:

      Strengths: the multiscale investigation of genome architecture at different stages of interphase allow the authors to present convincing and well-analysed data that provide meaningful insight into local and global chromosome organisation across different scales.

      Limitations:

      As suggested in major comments.

      Advance:

      Although the role of condensin II in generating chromosome territories, and the roles of cohesin in interphase genome architecture are established, the interplay of the complexes and the stage specific roles of condensin II have not been investigated in human cells to the level presented here. This study provides meaningful new insight in particular into the role of condensin II in global genome organisation during interphase, which is much less well understood compared to its participation in mitosis.

      Audience:

      Will contribute meaningfully and be of interest to the general community of researchers investigating genome organisation and function at all stages of the cell cycle. Primary audience will be cell biologists, geneticists and structural biochemists. Importance of genome organisation in cell/organismal biology is such that within this grouping it will probably be of general interest.

      My expertise is in genome organization by SMCs and chromosome segregation.

      We appreciate the reviewer’s supportive comments. As the reviewer fully acknowledges, this study is the first systematic survey of the collaborative role of condensin II and cohesin in establishing and maintaining interphase chromosome territories. In particular, multi-scale FISH analyses have enabled us to clarify how the two SMC protein complexes contribute to the maintenance of G2 chromosome territories through their actions at different genomic scales. As the reviewer notes, we believe that the current study will appeal to a broad readership in cell and chromosome biology. The limitations of the current study mentioned by the reviewer are addressed in our reply above.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      The manuscript “Condensin II collaborates with cohesin to establish and maintain interphase chromosome territories" investigates how condensin II and cohesin contribute to chromosome organization during the M-to-G1 transition and in G2 phase using published auxin-inducible degron (AID) cell lines which render the respective protein complexes nonfunctional after auxin addition. In this study, a novel degron cell line was established that enables the simultaneous depletion of both protein complexes, thereby facilitating the investigation of synergistic effects between the two SMC proteins. The chromosome architecture is studied using fluorescence in situ hybridization (FISH) and light microscopy. The authors reproduce a number of already published data and also show that double depletion causes during the M-to-G1 transition defects on chromosome territories, producing expanded, irregular shapes that obscure condensin II-specific phenotypes. Findings in G2 cells point to a new role of condensin II for chromosome conformation at a scale of ~20Mb. Although individual depletion has minimal effects on large-scale CT morphology in G2, combined loss of both complexes produces marked structural abnormalities, including irregular crescent-shaped CTs displaced toward the nucleolus and increased nucleolus-CT contact. The authors propose that condensin II and cohesin act sequentially and complementarily to ensure proper post-mitotic CT formation and maintain chromosome architecture across genomic scales.

      We greatly appreciate the reviewer’s supportive comments. The reviewer has accurately recognized our new findings concerning the collaborative roles of condensin II and cohesin in the establishment and maintenance of interphase chromosome territories.

      Concerns about statistics:

      • The authors provide the information on how many cells are analyzed but not the number of independent experiments. My concern is that there might be variations in synchronization of the cell population and in the subsequent preparation (FISH) affecting the final result. We appreciate the reviewer’s important comment regarding the biological reproducibility of our experiments. As the reviewer correctly points out, variations in cell-cycle synchronization and FISH sample preparation can occur across experiments. To address this concern, we repeated the key experiments supporting our main conclusions (Figs. 3 and 6) two additional times, resulting in three independent biological replicates in total. All replicate experiments reproduced the major observations from the original analyses. These results further substantiated our original conclusion, despite the inevitable variability arising from cell synchronization or sample preparation in this type of experiment. In the revised manuscript, we have now explicitly indicated the number of biological replicates in the corresponding figures.

      The analyses of chromosome-arm conformation shown in Fig. 5 were already performed in three independent rounds of experiments, as noted in the original submission. In addition, similar results were already obtained in other analyses reported in the manuscript. For example, centromere dispersion was quantified using an alternative centromere detection method (related to Fig. 1), and distances between specific chromosomal sites were measured using different locus-specific probes (related to Figs. 2 and 4). In both cases, the results were consistent with those presented in the manuscript.

      • Statistically the authors analyze the effect of cells with induced degron vs. vehicle control (non-induced). However, the biologically relevant question is whether the data differ between cell lines when the degron system is induced. This is not tested here (cf. major concern 2 and 3). See our reply to major concerns 2 and 3.

      • Some journals ask for blinded analysis of the data, which might make sense here as manual steps are involved in the data analysis (e.g. line 626/627: the convex hull of the signals was manually delineated; line 635/636: Chromosome segmentation in FISH images was performed using individual thresholding). However, personally I have no doubts on the correctness of the work. We thank the reviewer for pointing out that some steps in our data analysis were performed manually, such as delineating the convex hull of signals and segmenting chromosomes in FISH and IF images using individual thresholds. These manual steps were necessary because signal intensities vary among cells and chromosomes, making fully automated segmentation unreliable. To ensure objectivity, we confirmed that the results were consistent across two independently established double-depletion cell lines, which produced essentially identical findings. In addition, we repeated the key experiments underpinning our main conclusions (Figs. 3 and 6) two additional times, and the results were fully consistent with the original analyses. Therefore, we are confident that our current data analysis approach does not compromise the validity of our conclusions. Finally, we appreciate the reviewer’s kind remark that there is no doubt regarding the correctness of our work.

      Major concerns:

      • Degron induction appears to delay in Rad21-AID#1 and Double-AID#1 cells the transition from M to G1, as shown in Fig. S1. After auxin treatment, more cells exhibit a G2 phenotype than in an untreated population. What are the implications of this for the interpretation of the experiments? In our protocol shown in Fig. 1C, cells were released into mitosis after G2 arrest, and IAA was added 30 min after release. It is well established that cohesin depletion causes a prometaphase delay due to spindle checkpoint activation (e.g., Vass et al, 2003, Curr Biol; Toyoda and Yanagida, 2006, MBoC; Peters et al, 2008, Genes Dev), which explains why cells with 4C DNA content accumulated, as judged by FACS (Fig. S1). The same was true for doubly depleted cells. However, a fraction of cells that escaped this delay progressed through mitosis and entered the G1 phase of the next cell cycle. We selected these early G1 cells and used them for downstream analyses. This experimental procedure was explicitly described in the legends of Fig. 1C and Fig. S1A as follows:

      (Lines 934-937; Legend of Fig. 1C)

      From the synchronized populations, early G1 cells were selected based on their characteristic morphologies (i.e., pairs of small post-mitotic cells) and subjected to downstream analyses. Based on the measured nuclear sizes (Fig. S2 G), we confirmed that early G1 cells were appropriately selected.

      (Lines 1114-1119; Legend of Fig. S1A)

      In this protocol, ~60% of control and H2-depleted cells, and ~30% of Rad21-depleted and co-depleted cells, were successfully synchronized in G1 phase. The apparently lower synchronization efficiency in the latter two groups is attributable to the well documented mitotic delay caused by cohesin depletion (Hauf et al., 2005; Haarhuis et al., 2013; Perea-Resa et al., 2020). From these synchronized populations, early G1 cells were selected based on their characteristic morphologies (see the legend of Fig. 1 C).


      Thus, using this protocol, we analyzed an early G1 cell population that had completed mitosis without chromosome segregation defects. We acknowledge that this represents a technically challenging aspect of synchronizing cell-cycle progression from M to G1 in HCT116 cells, whose synchronization efficiency is limited compared with that of HeLa cells. Nevertheless, this approach constitutes the most practical strategy currently available.

      • Line 178 "In contrast, cohesin depletion had a smaller effect on the distance between the two site-specific probes compared to condensin II depletion (Fig. 2, C and E)." The data in Fig. 2 E show both a significant effect of H2 and a significant effect of RAD21 depletion. Whether the absolute difference in effect size between the two conditions is truly relevant is difficult to determine, as the distribution of the respective control groups also appears to be different. This comment is well taken. Reviewer #1 has made a comment on the same issue. See our reply to Reviewer #1 (Other points, Figure 2E).

      In brief, in the current study, we should focus on the differences between -IAA and +IAA within each cell line, rather than comparing the -IAA conditions across different cell lines. In this sense, a sentence in the original manuscript (lines 178-180) was misleading. In the revised manuscript, we have modified the corresponding and subsequent sentence as follows:

      Although cohesin depletion had a marginal effect on the distance between the two site-specific probes (Fig. 2, C and E), double depletion did not result in a significant change (Fig. 2, D and E), consistent with the partial restoration of centromere dispersion (Fig. 1G).

      • In Figures 3, S3 and related text in the manuscript I cannot follow the authors' argumentation, as H2 depletion alone leads to a significant increase in the CT area (Chr. 18, Chr. 19, Chr. 15). Similar to Fig. 2, the authors argue about the different magnitude of the effect (H2 depletion vs double depletion). Here, too, appropriate statistical tests or more suitable parameters describing the effect should be used. I also cannot fully follow the argumentation regarding chromosome elongation, as double depletion in Chr. 18 and Chr. 19 also leads to a significantly reduced circularity. Therefore, the schematic drawing Fig. 3 H (double depletion) seems very suggestive to me. This comment is related to the comment above (Major comment #2). See our reply to Reviewer #1 (Other points, Figure 2E).

      It should be noted that, in Figure 3 (unlike in Figure 2), we did not compare the different magnitudes of the effect observed between H2 depletion and double depletion. Thus, the reviewer’s comment that “Similar to Fig. 2, the authors argue about the different magnitude of the effect (H2 depletion vs double depletion)” does not accurately reflect our description.

      Moreover, while the distance between two specific loci (Fig. 2E) and CT circularity (Fig. 3G) are intuitively related, they represent distinct parameters. It is therefore not unexpected that double depletion resulted in apparently different outcomes for the two measurements, and the reviewer’s counter-argument is not strictly applicable here.

      That said, we agree with the reviewer that our descriptions here need to be clarified.

      The differences between H2 depletion and double depletion are two-fold: (1) centromere dispersion is suppressed upon H2 depletion, but not upon double depletion (Fig 1G); (2) the distance between Cen 12 and 12q15 increased upon H2 depletion, but not upon double depletion (Fig 2E).

      We have decided to remove the “homologous pair overlap” panel (formerly Fig. 3E) from the revised manuscript. Accordingly, the corresponding sentence has been deleted from the main text. Instead, we have added a new panel of “aspect ratio”, defined as the ratio of the major to the minor axis (new Fig. 3F). While this intuitive parameter was altered upon condensin II depletion and double depletion, again, we acknowledge that it is not sufficient to convincingly distinguish between the elongated and cloud-like phenotypes proposed in the original manuscript. For these reasons, in the revised manuscript, we have toned down our statements regarding the differences in CT morphology between the two conditions. Nonetheless, together with the data from Figs. 1 and 2, it is clear that the Rabl configuration observed upon condensin II depletion is further exacerbated in the absence of cohesin. Accordingly, we have modified the main text and the cartoon (Fig 3H) to more accurately depict the observations summarized above.

      • Fig. 5 and accompanying text. I agree with the authors that this is a significant and very interesting effect. However, I believe the sharp bends are in most cases an artifact caused by the maximum intensity projection. I tried to illustrate this effect in two photographs: Reviewer Fig. 1, side view, and Reviewer Fig. 2, same situation top view (https://cloud.bio.lmu.de/index.php/s/77npeEK84towzJZ). As I said, in my opinion, there is a significant and important effect; the authors should simply adjust the description.

      This comment is well taken. We appreciate the reviewer’s effort to help clarify our original observations. We have therefore added a new section entitled “Limitations of the study” to explicitly describe the constraints of our current approach. That said, as the reviewer also acknowledges, our observations remain valid because all experiments were performed with appropriate controls.

      Minor concerns:

      • I would like to suggest proactively discussing possible artifacts that may arise from the harsh conditions during FISH sample preparation.

      We fully agree with the reviewer’s concerns. For FISH sample preparation, we used relatively harsh conditions, including (1) fixation under a hypotonic condition (0.3x PBS), (2) HCl treatment, and (3) a denaturation step. We recognize that these procedures inevitably affect the preservation of the original structure; however, they are unavoidable in the standard FISH protocol. We also acknowledge that our analyses were limited to 2D structures based on projected images, rather than full 3D reconstructions. These technical limitations are now explicitly described in a new section entitled “Limitations of the study”, and the technical details are provided in Materials and Methods.

      • It would be helpful if the authors could provide the original data (microscopic image stacks) for download.

      We thank the reviewer for this suggestion and understand that providing the original image stacks could be of interest to readers. We agree that if the nuclei were perfectly spherical, as is the case for example in lymphocytes, 3D image stacks would contain much more information than 2D projections. However, as is typical for adherent cultured cells, including the HCT116-derived cells used in this study, the nuclei are flattened due to cell adhesion to the culture dish, with a thickness of only about one-tenth of the nuclear diameter (10–20 μm). Considering also the inevitable loss of structural preservation during FISH sample preparation, we were concerned that presenting 3D images might confuse rather than clarify. We therefore believe that representing the data as 2D projections, while explicitly acknowledging the technical limitations, provides the clearest and most interpretable presentation of our results. These limitations are now described in a new section of the manuscript.

      • The authors use a blind deconvolution algorithm to improve image quality. It might be helpful to test other methods for this purpose (optional).

      We thank the reviewer for this valuable suggestion and fully agree that it is a valid point. We recognize that alternative image enhancement methods can offer advantages, particularly for smaller structures or when multiple probes are analyzed simultaneously. In our study, however, the focus was on detecting whole chromosome territories (CTs) and specific chromosomal loci, which can be visualized clearly with our current FISH protocol combined with blind deconvolution. We therefore believe that the image quality we obtained is sufficient to support the conclusions of this manuscript.

      Reviewer #3 (Significance (Required)):

      Advance:

      Ono et al. address the important question of how the complex pattern of chromatin is reestablished after mitosis and maintained during interphase. In addition to affinity interactions (1,2), it is known that cohesin plays an important role in the formation and maintenance of chromosome organization in interphase (3). However, current knowledge does not explain all known phenomena. Even with complete loss of cohesin, TAD-like structures can be recognized at the single-cell level (4), and higher structures such as chromosome territories are also retained (5). The function of condensin II during mitosis is another important factor that affects chromosome architecture in the following G1 phase (6). Although condensin II is present in the cell nucleus throughout interphase, very little is known about the role of this protein in this phase of the cell cycle. This is where the present publication comes in, with a new double degron cell line in which essential subunits of cohesin AND condensin can be degraded in a targeted manner. I find the data from the experiments in the G2 phase most interesting, as they suggest a previously unknown involvement of condensin II in the maintenance of larger chromatin structures such as chromosome territories.

      The experiments regarding the M-G1 transition are less interesting to me, as it is known that condensin II deficiency in mitosis leads to elongated chromosomes (Rabl configuration)(6), and therefore the double degradation of condensin II and cohesin describes the effects of cohesin on an artificially disturbed chromosome structure.

      For further clarification, we provide below a table summarizing previous studies relevant to the present work. We wish to emphasize three novel aspects of the present study. First, newly established cell lines designed for double depletion enabled us to address questions that had remained inaccessible in earlier studies. Second, to our knowledge, no study has previously reported condensin II depletion, cohesin depletion and double depletion in G2-arrested cells. Third, the present study represents the first systematic comparison of two different stages of the cell cycle using multiscale FISH under distinct depletion conditions. Although the M-to-G1 part of the present study partially overlaps with previous work, it serves as an important prelude to the subsequent investigations. We are confident that the reviewer will also acknowledge this point.

      | cell cycle | cond II depletion | cohesin depletion | double depletion |
      |---|---|---|---|
      | M-to-G1 | Hoencamp et al (2021); Abramo et al (2019); Brunner et al (2025); this study | Schwarzer et al (2017); Wutz et al (2017); this study | this study |
      | G2 | this study | this study | this study |

      Hoencamp et al (2021): Hi-C and imaging (CENP-A distribution)

      Abramo et al (2019): Hi-C and imaging

      Brunner et al (2025): mostly imaging (chromatin tracing)

      Schwarzer et al (2017); Wutz et al (2017): Hi-C

      this study: imaging (multi-scale FISH)

      General limitations:

      (1) Single cell imaging of chromatin structure typically shows only minor effects which are often obscured by the high (biological) variability. This holds also true for the current manuscript (cf. major concern 2 and 3).

      See our reply above.

      (2) A common concern is artefacts introduced by the harsh conditions of conventional FISH protocols (7). The authors use a method in which the cells are completely dehydrated, which probably leads to shrinking artifacts. However, differences between samples stained using the same FISH protocol are most likely due to experimental variation and not an artefact (cf. minor concern 1).

      See our reply above.

      • The anisotropic optical resolution (x-, y- vs. z-) of widefield microscopy (and most other light microscopic techniques) might lead to misinterpretation of the imaged 3D structures. This seems to be the case in the current study (cf. major concern 4).

      See our reply above.

      • In the present study, the cell cycle was synchronized. This requires the use of inhibitors such as the CDK1 inhibitor RO-3306. However, CDK1 has many very different functions (8), so unexpected effects on the experiments cannot be ruled out.

      The current approaches involving FISH inevitably require cell cycle synchronization. We believe that the use of the CDK1 inhibitor RO-3306 to arrest the cell cycle at G2 is a reasonable choice, although we cannot rule out unexpected effects arising from the use of the drug. This issue has now been addressed in the new section entitled “Limitations of the study”.

      Audience:

      The spatial arrangement of genomic elements in the nucleus and their (temporal) dynamics are of high general relevance, as they are important for answering fundamental questions, for example, in epigenetics or tumor biology (9,10). The manuscript from Ono et al. addresses specific questions, so its intended readership is more likely to be specialists in the field.

      We are confident that, given the increasing interest in the 3D genome and its role in regulating diverse biological functions, the current manuscript will attract the broad readership of leading journals in cell biology.

      About the reviewer:

      By training I'm a biologist with strong background in fluorescence microscopy and fluorescence in situ hybridization. In recent years, I have been involved in research on the 3D organization of the cell nucleus, chromatin organization, and promoter-enhancer interactions.

      We greatly appreciate the reviewer’s constructive comments on both the technical strengths and limitations of our fluorescence imaging approaches, which have been very helpful in revising the manuscript. As mentioned above, we have decided to add a special paragraph entitled “Limitations of the study” at the end of the Discussion section to discuss these issues.

      All questions regarding the statistics of angularly distributed data are beyond my expertise. The authors do not correct their statistical analyses for "multiple testing". Whether this is necessary, I cannot judge.

      We thank the reviewer for raising this important point. In our study, the primary comparisons were made between -IAA and +IAA conditions within the same cell line. Accordingly, the figures report P-values for these pairwise comparisons.

      For the distance measurements, statistical evaluations were performed in PRISM using ANOVA (Kruskal–Wallis test), and the P-values shown in the figures are based on these analyses (Fig. 1, G and H; Fig. 2 E; Fig. 3 F and G; Fig. 4 F; Fig. 6 F [right]–H; Fig. S2 B and G; Fig. S3 D and H; Fig. S5 A [right] and B [right]; Fig. S8 B). While the manuscript focuses on pairwise comparisons between -IAA and +IAA conditions within the same cell line, we also considered potential differences across cell lines as part of the same ANOVA framework, thereby ensuring that multiple testing was properly addressed. Because cell line differences are not the focus of the present study, the corresponding results are not shown.

      For the angular distribution analyses, we compared -IAA and +IAA conditions within the same cell line using the Mardia–Watson–Wheeler test; these analyses do not involve multiple testing (circular scatter plots; Fig. 5 C–E and Fig. S6 B, C, and E–H). In addition, to determine whether angular distributions exhibited directional bias under each condition, we applied the Rayleigh test to each dataset individually (Fig. 5 F and Fig. S6 I). As these tests were performed on a single condition, they are also not subject to the problem of multiple testing. Collectively, we consider that the statistical analyses presented in our manuscript appropriately account for potential multiple testing issues, and we remain confident in the robustness of the results.
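      For readers unfamiliar with circular statistics, the Rayleigh test mentioned above can be sketched as follows. This is a minimal standard-library illustration, not the software used for the manuscript: the function name `rayleigh_test` and the example angles are assumptions, and the P-value uses a widely used closed-form approximation (given in Zar's Biostatistical Analysis) rather than an exact distribution.

```python
import cmath
import math


def rayleigh_test(angles_rad):
    """Rayleigh test for non-uniformity of circular data.

    Returns (mean resultant length, Z statistic, approximate P-value).
    Hypothetical helper for illustration only.
    """
    n = len(angles_rad)
    # Length of the vector sum of unit vectors at each angle.
    resultant = abs(sum(cmath.exp(1j * a) for a in angles_rad))
    rbar = resultant / n  # mean resultant length, between 0 and 1
    z = n * rbar ** 2     # Rayleigh's Z statistic

    # Closed-form approximation of the P-value (assumption: adequate
    # for illustration; dedicated packages may use other formulas).
    p = math.exp(
        math.sqrt(1 + 4 * n + 4 * (n ** 2 - resultant ** 2)) - (1 + 2 * n)
    )
    return rbar, z, min(p, 1.0)


# Angles spread evenly around the circle: no directional bias expected.
uniform = [2 * math.pi * k / 12 for k in range(12)]
# Angles clustered around 0 rad: strong directional bias expected.
clustered = [0.1 * k for k in range(-5, 6)]

for name, data in (("uniform", uniform), ("clustered", clustered)):
    rbar, z, p = rayleigh_test(data)
    print(f"{name}: Rbar={rbar:.3f}, Z={z:.2f}, P={p:.3g}")
```

      Each call evaluates a single dataset against the null hypothesis of uniformity, which is how the test is applied per condition in the response above.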

      Literature

      1. Falk, M., Feodorova, Y., Naumova, N., Imakaev, M., Lajoie, B.R., Leonhardt, H., Joffe, B., Dekker, J., Fudenberg, G., Solovei, I. et al. (2019) Heterochromatin drives compartmentalization of inverted and conventional nuclei. Nature, 570, 395-399.
      2. Mirny, L.A., Imakaev, M. and Abdennur, N. (2019) Two major mechanisms of chromosome organization. Curr Opin Cell Biol, 58, 142-152.
      3. Rao, S.S.P., Huang, S.C., Glenn St Hilaire, B., Engreitz, J.M., Perez, E.M., Kieffer-Kwon, K.R., Sanborn, A.L., Johnstone, S.E., Bascom, G.D., Bochkov, I.D. et al. (2017) Cohesin Loss Eliminates All Loop Domains. Cell, 171, 305-320 e324.
      4. Bintu, B., Mateo, L.J., Su, J.H., Sinnott-Armstrong, N.A., Parker, M., Kinrot, S., Yamaya, K., Boettiger, A.N. and Zhuang, X. (2018) Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science, 362.
      5. Cremer, M., Brandstetter, K., Maiser, A., Rao, S.S.P., Schmid, V.J., Guirao-Ortiz, M., Mitra, N., Mamberti, S., Klein, K.N., Gilbert, D.M. et al. (2020) Cohesin depleted cells rebuild functional nuclear compartments after endomitosis. Nat Commun, 11, 6146.
      6. Hoencamp, C., Dudchenko, O., Elbatsh, A.M.O., Brahmachari, S., Raaijmakers, J.A., van Schaik, T., Sedeno Cacciatore, A., Contessoto, V.G., van Heesbeen, R., van den Broek, B. et al. (2021) 3D genomics across the tree of life reveals condensin II as a determinant of architecture type. Science, 372, 984-989.
      7. Beckwith, K.S., Ødegård-Fougner, Ø., Morero, N.R., Barton, C., Schueder, F., Tang, W., Alexander, S., Peters, J.-M., Jungmann, R., Birney, E. et al. (2023) Nanoscale 3D DNA tracing in single human cells visualizes loop extrusion directly in situ. BioRxiv https://doi.org/10.1101/2021.04.12.439407.
      8. Massacci, G., Perfetto, L. and Sacco, F. (2023) The Cyclin-dependent kinase 1: more than a cell cycle regulator. Br J Cancer, 129, 1707-1716.
      9. Bonev, B. and Cavalli, G. (2016) Organization and function of the 3D genome. Nat Rev Genet, 17, 661-678.
      10. Dekker, J., Belmont, A.S., Guttman, M., Leshyk, V.O., Lis, J.T., Lomvardas, S., Mirny, L.A., O'Shea, C.C., Park, P.J., Ren, B. et al. (2017) The 4D nucleome project. Nature, 549, 219-226.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The manuscript “Condensin II collaborates with cohesin to establish and maintain interphase chromosome territories” investigates how condensin II and cohesin contribute to chromosome organization during the M-to-G1 transition and in G2 phase using published auxin-inducible degron (AID) cell lines which render the respective protein complexes nonfunctional after auxin addition. In this study, a novel degron cell line was established that enables the simultaneous depletion of both protein complexes, thereby facilitating the investigation of synergistic effects between the two SMC proteins. The chromosome architecture is studied using fluorescence in situ hybridization (FISH) and light microscopy. The authors reproduce a number of previously published findings and also show that, during the M-to-G1 transition, double depletion causes defects in chromosome territories, producing expanded, irregular shapes that obscure condensin II-specific phenotypes. Findings in G2 cells point to a new role of condensin II for chromosome conformation at a scale of ~20Mb. Although individual depletion has minimal effects on large-scale CT morphology in G2, combined loss of both complexes produces marked structural abnormalities, including irregular crescent-shaped CTs displaced toward the nucleolus and increased nucleolus-CT contact. The authors propose that condensin II and cohesin act sequentially and complementarily to ensure proper post-mitotic CT formation and maintain chromosome architecture across genomic scales.

      Concerns about statistics:

      (1) The authors provide information on how many cells were analyzed but not on the number of independent experiments. My concern is that there might be variations in the synchronization of the cell population and in the subsequent preparation (FISH) affecting the final result.

      (2) Statistically the authors analyze the effect of cells with induced degron vs. vehicle control (non-induced). However, the biologically relevant question is whether the data differ between cell lines when the degron system is induced. This is not tested here (cf. major concern 2 and 3).

      (3) Some journals ask for blinded analysis of the data, which might make sense here as manual steps are involved in the data analysis (e.g. line 626 / 627: the convex hull of the signals was manually delineated; line 635 / 636: Chromosome segmentation in FISH images was performed using individual thresholding). However, personally I have no doubts about the correctness of the work.

      Major concerns:

      (1) Degron induction appears to delay the transition from M to G1 in Rad21-AID#1 and Double-AID#1 cells, as shown in Fig. S1. After auxin treatment, more cells exhibit a G2 phenotype than in an untreated population. What are the implications of this for the interpretation of the experiments?

      (2) Line 178 "In contrast, cohesin depletion had a smaller effect on the distance between the two site-specific probes compared to condensin II depletion (Fig. 2, C and E)." The data in Fig. 2 E show both a significant effect of H2 and a significant effect of RAD21 depletion. Whether the absolute difference in effect size between the two conditions is truly relevant is difficult to determine, as the distribution of the respective control groups also appears to be different.

      (3) In Figures 3, S3 and related text in the manuscript I cannot follow the authors' argumentation, as H2 depletion alone leads to a significant increase in the CT area (Chr. 18, Chr. 19, Chr. 15). Similar to Fig. 2, the authors argue about the different magnitude of the effect (H2 depletion vs double depletion). Here, too, appropriate statistical tests or more suitable parameters describing the effect should be used. I also cannot fully follow the argumentation regarding chromosome elongation, as double depletion in Chr. 18 and Chr. 19 also leads to a significantly reduced circularity. Therefore, the schematic drawing Fig. 3 H (double depletion) seems very suggestive to me.

      (4) Fig. 5 and accompanying text. I agree with the authors that this is a significant and very interesting effect. However, I believe the sharp bends are in most cases an artifact caused by the maximum intensity projection. I tried to illustrate this effect in two photographs: Reviewer Fig. 1, side view, and Reviewer Fig. 2, same situation top view (https://cloud.bio.lmu.de/index.php/s/77npeEK84towzJZ). As I said, in my opinion, there is a significant and important effect; the authors should simply adjust the description.

      Minor concerns:

      (1) I would like to suggest proactively discussing possible artifacts that may arise from the harsh conditions during FISH sample preparation.

      (2) It would be helpful if the authors could provide the original data (microscopic image stacks) for download.

      (3) The authors use a blind deconvolution algorithm to improve image quality. It might be helpful to test other methods for this purpose (optional).

      Significance

      Advance:

      Ono et al. address the important question of how the complex pattern of chromatin is reestablished after mitosis and maintained during interphase. In addition to affinity interactions (1,2), it is known that cohesin plays an important role in the formation and maintenance of chromosome organization in interphase (3). However, current knowledge does not explain all known phenomena. Even with complete loss of cohesin, TAD-like structures can be recognized at the single-cell level (4), and higher structures such as chromosome territories are also retained (5). The function of condensin II during mitosis is another important factor that affects chromosome architecture in the following G1 phase (6). Although condensin II is present in the cell nucleus throughout interphase, very little is known about the role of this protein in this phase of the cell cycle. This is where the present publication comes in, with a new double degron cell line in which essential subunits of cohesin AND condensin can be degraded in a targeted manner. I find the data from the experiments in the G2 phase most interesting, as they suggest a previously unknown involvement of condensin II in the maintenance of larger chromatin structures such as chromosome territories. The experiments regarding the M-G1 transition are less interesting to me, as it is known that condensin II deficiency in mitosis leads to elongated chromosomes (Rabl configuration)(6), and therefore the double degradation of condensin II and cohesin describes the effects of cohesin on an artificially disturbed chromosome structure.

      General limitations:

      (1) Single cell imaging of chromatin structure typically shows only minor effects which are often obscured by the high (biological) variability. This holds also true for the current manuscript (cf. major concern 2 and 3).

      (2) A common concern is artefacts introduced by the harsh conditions of conventional FISH protocols (7). The authors use a method in which the cells are completely dehydrated, which probably leads to shrinking artifacts. However, differences between samples stained using the same FISH protocol are most likely due to experimental variation and not an artefact (cf. minor concern 1).

      (3) The anisotropic optical resolution (x-, y- vs. z-) of widefield microscopy (and most other light microscopic techniques) might lead to misinterpretation of the imaged 3D structures. This seems to be the case in the current study (cf. major concern 4).

      (4) In the present study, the cell cycle was synchronized. This requires the use of inhibitors such as the CDK1 inhibitor RO-3306. However, CDK1 has many very different functions (8), so unexpected effects on the experiments cannot be ruled out.

      Audience:

      The spatial arrangement of genomic elements in the nucleus and their (temporal) dynamics are of high general relevance, as they are important for answering fundamental questions, for example, in epigenetics or tumor biology (9,10). The manuscript from Ono et al. addresses specific questions, so its intended readership is more likely to be specialists in the field.

      About the reviewer: By training I'm a biologist with strong background in fluorescence microscopy and fluorescence in situ hybridization. In recent years, I have been involved in research on the 3D organization of the cell nucleus, chromatin organization, and promoter-enhancer interactions.

      All questions regarding the statistics of angularly distributed data are beyond my expertise. The authors do not correct their statistical analyses for "multiple testing". Whether this is necessary, I cannot judge.

      Literature

      1. Falk, M., Feodorova, Y., Naumova, N., Imakaev, M., Lajoie, B.R., Leonhardt, H., Joffe, B., Dekker, J., Fudenberg, G., Solovei, I. et al. (2019) Heterochromatin drives compartmentalization of inverted and conventional nuclei. Nature, 570, 395-399.
      2. Mirny, L.A., Imakaev, M. and Abdennur, N. (2019) Two major mechanisms of chromosome organization. Curr Opin Cell Biol, 58, 142-152.
      3. Rao, S.S.P., Huang, S.C., Glenn St Hilaire, B., Engreitz, J.M., Perez, E.M., Kieffer-Kwon, K.R., Sanborn, A.L., Johnstone, S.E., Bascom, G.D., Bochkov, I.D. et al. (2017) Cohesin Loss Eliminates All Loop Domains. Cell, 171, 305-320 e324.
      4. Bintu, B., Mateo, L.J., Su, J.H., Sinnott-Armstrong, N.A., Parker, M., Kinrot, S., Yamaya, K., Boettiger, A.N. and Zhuang, X. (2018) Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science, 362.
      5. Cremer, M., Brandstetter, K., Maiser, A., Rao, S.S.P., Schmid, V.J., Guirao-Ortiz, M., Mitra, N., Mamberti, S., Klein, K.N., Gilbert, D.M. et al. (2020) Cohesin depleted cells rebuild functional nuclear compartments after endomitosis. Nat Commun, 11, 6146.
      6. Hoencamp, C., Dudchenko, O., Elbatsh, A.M.O., Brahmachari, S., Raaijmakers, J.A., van Schaik, T., Sedeno Cacciatore, A., Contessoto, V.G., van Heesbeen, R., van den Broek, B. et al. (2021) 3D genomics across the tree of life reveals condensin II as a determinant of architecture type. Science, 372, 984-989.
      7. Beckwith, K.S., Ødegård-Fougner, Ø., Morero, N.R., Barton, C., Schueder, F., Tang, W., Alexander, S., Peters, J.-M., Jungmann, R., Birney, E. et al. (2023) Nanoscale 3D DNA tracing in single human cells visualizes loop extrusion directly in situ. BioRxiv https://doi.org/10.1101/2021.04.12.439407.
      8. Massacci, G., Perfetto, L. and Sacco, F. (2023) The Cyclin-dependent kinase 1: more than a cell cycle regulator. Br J Cancer, 129, 1707-1716.
      9. Bonev, B. and Cavalli, G. (2016) Organization and function of the 3D genome. Nat Rev Genet, 17, 661-678.
      10. Dekker, J., Belmont, A.S., Guttman, M., Leshyk, V.O., Lis, J.T., Lomvardas, S., Mirny, L.A., O'Shea, C.C., Park, P.J., Ren, B. et al. (2017) The 4D nucleome project. Nature, 549, 219-226.
    1. o reason why younger students would not benefit from discussion teaching,

      This is why even with my second graders I am still encouraging and promoting those academic discussions. No matter how young they are, they can all benefit from it.

    1. /hyperpost/🌐/🧊/snarf-peergos.chat/

      Use this link to view the page in Peergos

      close the view and the enclosing folder is shown

      where the page can be edited using indy0pad.next

  2. rws511.pbworks.com rws511.pbworks.com
    1. o virtually anyone who wants to pay them, they sell the capacity to precisely target our eyeballs.

      This is what leads to partisan SNS, because the type and form of an advert will be different along party/ideological lines

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

      The manuscript by Dufour et al. is a follow-up on the group's previous publication that introduced the photo-inducible Cre recombinase, LiCre. In the present work, the authors further characterize the properties and kinetics of their optogenetic switch. Initially, the authors show that light affects only LiCre-mediated recombination itself and not DNA binding. Following these observations, they measure and mathematically model LiCre kinetics, demonstrating high efficiency in vivo and a surprising temperature sensitivity. Finally, Dufour et al. evaluate several mutations that affect the LOV photo-cycle and provide recommendations for LiCre applications. The study thoroughly investigates various aspects of the function of LiCre, confirming some previously known characteristics (i.e. temperature-dependence of Cre activity and functionality of LOV-based optogenetic tools in yeast without co-factor supplementation), while providing new LiCre-specific insights (kinetics, light-independent DNA binding). Please note that the reviewer is no expert in mathematical modeling and cannot fully judge the methodological details of the models. While I have some concerns as listed below, I believe the study should be well-suited for publication after a revision.

      Major comments:

      1. After completing the initial experiment, the authors discovered that their plasmids carry different numbers of V5 epitopes. I am wondering whether this was due to a recombination event happening during the experiment or whether the constructs were not sequence verified prior to use? In any case, an additional ChIP experiment using Cre and LiCre constructs with the identical number of tag-repeats will be necessary. The result, i.e. the strong reduction of DNA-binding of LiCre (which is close to the negative control), is quite remarkable given that LiCre is still considerably active and high DNA affinities were observed in SPR experiments. In light of these counterindications, identical experiment conditions for test and reference group become even more important.
      2. The conclusion that DNA-binding of LiCre is completely light-independent is not entirely convincing to me. The differences between the light and dark conditions in Fig. 2d are indeed small, but the values for LiCre are almost on par with the vector control and therefore hard to interpret. Based on this experiment alone, one could even be inclined to argue that LiCre does not bind DNA at all (which is of course falsified by the later experiments), showing that the resolution of the corresponding dataset is too low to draw final conclusions. Light-independent DNA binding should either be confirmed by a more sensitive method or the conclusion statements on this matter should be revised accordingly.
      3. If I understand the explanations correctly, replicates and plotted data points refer to multiple samples (different colonies), that were handled in a single experiment, i.e. by one researcher at the same time/same day. As already mentioned by the authors in the main text, this workflow explains the considerable differences between some of the results in the present manuscript and an identical experiment in a previous publication by the same authors. Providing truly independent experiments (performed on different days) that are therefore independent towards variables such as the fluctuation in incubation temperature (which was the issue in the described experiments) will be crucial, at least for the key datasets.

      Minor comments:

      1. At the end of the Introduction, the authors mention that the interaction of the Cre heptamers was weakened via point mutations in LiCre. A short sentence about the engineering rationale behind this weakened interaction would help readers who are not familiar with the authors' prior work.
      2. Fig. 2a-b depicts images relating to the purification procedure. These could be moved to the supplements as they don't provide any insight apart from the fact that the proteins were successfully purified.
      3. The kinetic characterization was only performed for LiCre. Especially for scientists, who have worked with wildtype Cre before, a side-by-side comparison with wt Cre would be valuable to judge the loss in reaction speed that has to be expected when switching from Cre to LiCre.
      4. The difference between the ChIP results and the SPR results is striking but not mentioned in the discussion section. Also, the statement: "Finally, our results have practical implications on experimental protocols employing LiCre. First, given its high affinity for loxP (Fig. 5b), over-expressing LiCre at high levels will probably not increase its efficiency." (line 502) refers only to the affinity but seems to ignore the low DNA-occupancy of LiCre observed in Fig. 2d. Adapting the discussion section accordingly would improve the manuscript.

      Significance

      General assessment and advance:

      The present study provides a large set of experiments and analyses characterizing the optogenetic LiCre recombinase. In general, the study is well conceived and executed. Although some of my concerns listed above affect key aspects of the study, they should be straightforward to address. The manuscript is a follow-up study providing a more detailed characterization of an optogenetic tool previously developed by the same authors. Its novelty is therefore somewhat limited. While the study provides a rich body of additional data, many of the findings merely confirmed aspects that were to be expected based on the two proteins LiCre is built of (temperature-dependent activity of Cre, optogenetics in yeast w/o the need of co-factor supplementation, weaker DNA-affinity of the Cre fusion protein as compared to wildtype Cre). New insights are provided by the facts that (i) light only controls recombination but not DNA binding and (ii) light activation of only some protomers within the LiCre heptamer is likely to be sufficient to activate recombination. The former aspect is, however, not entirely evident from the results as described above.

      Audience:

      The study will be of interest for researchers focusing on inducible DNA recombination and especially relevant to those who plan to work with LiCre and can now rely on a more detailed and extended characterization compared to the original LiCre publication.

    1. In computational terms, whereas BFGS requires O(n^2) memory, L-BFGS-B reduces the cost to O(mn), with m≪n (typically 3≤m≤20).

      explain the O(n^2)
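      One way to make the O(n^2) term concrete, as a rough sketch: standard BFGS keeps a dense n×n approximation of the (inverse) Hessian, so storage grows with n^2, whereas limited-memory variants keep only the m most recent pairs of length-n update vectors, so storage grows with m·n. The function names, the factor of 2 (one displacement and one gradient-difference vector per pair), and the 8-byte float size below are illustrative assumptions, not taken from the annotated text.

```python
def bfgs_floats(n: int) -> int:
    """Numbers stored by a dense n x n (inverse) Hessian approximation."""
    return n * n


def lbfgs_floats(n: int, m: int) -> int:
    """Numbers stored by m correction pairs, each pair holding two length-n vectors."""
    return 2 * m * n


if __name__ == "__main__":
    n, m = 100_000, 10       # hypothetical problem size and history length
    bytes_per_float = 8      # assuming double precision
    print(f"BFGS:   {bfgs_floats(n) * bytes_per_float / 1e9:.1f} GB")
    print(f"L-BFGS: {lbfgs_floats(n, m) * bytes_per_float / 1e6:.1f} MB")
```

      Under these assumptions the dense matrix becomes impractical well before the limited-memory history does, which is the point of the O(n^2) versus O(mn) comparison.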

    Annotators

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Bisht et al address the hypothesis that protein folding chaperones may be implicated in aggregopathies and in particular Tau aggregation, as a means to identify novel therapeutic routes for these largely neurodegenerative conditions.

      The authors conducted a genetic screen in the Drosophila eye, which facilitates the identification of mutations that either enhance or suppress a visible disturbance in the nearly crystalline organization of the compound eye. They screened by RNA interference all 64 known Drosophila chaperones and revealed that mutations in 20 of them exaggerate the Tau-dependent phenotype, while 15 ameliorated it. The enhancer of the degeneration group included 2 subunits of the typically heterohexameric prefoldin complex and other co-translational chaperones.

      The authors characterized in depth one of the prefoldin subunits, Pfdn5, and convincingly demonstrated that this protein functions in the regulation of microtubule organization, likely due to its regulation of proper folding of tubulin monomers. They demonstrate convincingly using both immunohistochemistry in larval motor neurons and microtubule binding assays that Pfdn5 is a bona fide microtubule-associated protein contributing to the stability of the axonal microtubule cytoskeleton, which is significantly disrupted in the mutants.

      Similar phenotypes were observed in larvae expressing Frontotemporal dementia with Parkinsonism on chromosome 17-associated mutations of the human Tau gene V377M and R406W. On the strength of the phenotypic evidence and the enhancement of the TauV377M-induced eye degeneration, they demonstrate that loss of Pfdn5 exaggerates the synaptic deficits upon expression of the Tau mutants. Conversely, the overexpression of Pfdn5 or Pfdn6 ameliorates the synaptic phenotypes in the larvae, the vacuolization phenotypes in the adult, and even memory defects upon TauV377M expression.

      Strengths

      The phenotypic analyses of the mutant and its interactions with TauV377M at the cell biological, histological, and behavioral levels are precise, extensive, and convincing and achieve the aims of characterization of a novel function of Pfdn5. 

      Regarding this memory defect upon V377M tau expression. Kosmidis et al (2010), PMID: 20071510, demonstrated that pan-neuronal expression of Tau<sup>V377M</sup> disrupts the organization of the mushroom bodies, the seat of long-term memory in odor/shock and odor/reward conditioning. If the novel memory assay the authors use depends on the adult brain structures, then the memory deficit can be explained in this manner. 

      (1) If the mushroom bodies are defective upon Tau<sup>V377M</sup>. expression, does overexpression of Pfdn5 or 6 reverse this deficit? This would argue strongly in favor of the microtubule stabilization explanation.

      We thank the reviewer for this insightful comment. Consistent with Kosmidis et al. (2010), we confirm that expression of hTau<sup>V377M</sup> disrupts the architecture of mushroom bodies.   In addition, we find, as suggested by the reviewer, that coexpression of either Pfdn5 or Pfdn6 with hTau<sup>V377M</sup> significantly restores the organization of the mushroom bodies. These new findings strongly support the hypothesis that Pfdn5 or Pfdn6 mitigate hTau<sup>V377M</sup> -induced memory deficits by preserving the structure of the mushroom body, likely through stabilizing the microtubule network. This data has now been included in the revised manuscript (Figure 7H-O).

      (2) The discovery that Pfdn5 (and 6 most likely) affects tauV377M toxicity is indeed a novel and important discovery for the Tauopathies field. It is important to determine whether this interaction affects only the FTDP-17-linked mutations or also WT Tau isoforms, which are linked to the rest of the Tauopathies. Also, insights on the mode(s) that Pfdn5/6 affect Tau toxicity, such as some of the suggestions above, are aiming at will likely be helpful towards therapeutic interventions.

      We agree that determining whether prefoldin modulates the toxicity of both mutant and wild-type Tau is critical for understanding its broader relevance to Tauopathies. We have now performed the additional experiments required to address this issue. These new data show that loss of Pfdn5 also exacerbates toxicity associated with wild-type Tau (hTau<sup>WT</sup>), in a manner similar to that observed with hTau<sup>V337M</sup> or hTau<sup>R406W</sup>. Specifically, overexpression of hTau<sup>WT</sup> in a Pfdn5 mutant background leads to Tau aggregate formation (Figure S7G-I), and coexpression of Pfdn5 with hTau<sup>WT</sup> reduces the associated synaptic defects (Figure S11F-L). These findings underscore a general role for Pfdn5 in modulating diverse Tauopathy-associated phenotypes and suggest that it could be a broadly relevant therapeutic target.

      Weakness

      (3) What is unclear, however, is how Pfdn5 loss or even overexpression affects the pathological Tau phenotypes. Does Pfdn5 (or 6) interact directly with TauV377M? Colocalization within tissues is a start, but immunoprecipitations would provide additional independent evidence that this is so.

      We appreciate this important suggestion. To investigate a potential direct interaction between Pfdn5 and Tau<sup>V377M</sup>, we performed co-immunoprecipitation experiments using lysates from adult fly brain expressing hTau<sup>V337M</sup>. Under the conditions tested, we did not detect a direct physical interaction. While this does not support a direct interaction, it does not strongly refute it either. We note that Pfdn5 and Tau are colocalized within axons (Figure S13J-K). At this stage, we are unable to resolve the issue of direct vs indirect association. If indirect, then Tau and Pfdn5 act within the same subcellular compartments (axon); if direct, then either only a small fraction of the total cellular proteins is in the Tau-Pfdn5 complex and therefore difficult to detect in bulk protein westerns, or the interactions are dynamic or occur in conditions that we have not been able to mimic in vitro. 

      (4) Does Pfdn5 loss exacerbate Tau<sup>V377M</sup> phenotypes because it destabilizes microtubules, which are already at least partially destabilized by Tau expression? Rescue of the phenotypes by overexpression of Pfdn5 agrees with this notion. 

      However, Cowan et al (2010) pmid: 20617325 demonstrated that wildtype Tau accumulation in larval motor neurons indeed destabilizes microtubules in a Tau phosphorylation-dependent manner. So, is Tau<sup>V377M</sup> hyperphosphorylated in the larvae?? What happens to Tau<sup>V377M</sup> phosphorylation when Pfdn5 is missing and presumably more Tau is soluble and subject to hyperphosphorylation as predicted by the above?

      We completely agree that it is important to link Tau-induced phenotypes with microtubule destabilization and the phosphorylation state of Tau. We performed immunostaining with an anti-Futsch antibody to assess microtubule organization at the NMJ and observed a severe reduction in Futsch intensity when Tau<sup>V337M</sup> was expressed in the Pfdn5 mutant (Elav-Gal4>Tau<sup>V337M</sup>; ΔPfdn5<sup>15/40</sup>), suggesting that the absence of Pfdn5 exacerbates the hTau<sup>V337M</sup> defects through greater microtubule destabilization (Figure S6F-J).

      We have performed additional experiments to examine the phosphorylation state of hTau in Drosophila larval axons. Immunocytochemistry indicated that only a subset of hTau aggregates in Pfdn5 mutants (Elav-Gal4>Tau<sup>V337M</sup>; ΔPfdn5<sup>15/40</sup>) is recognized by phospho-hTau antibodies. For instance, the AT8 antibody (targeting pSer202/pThr205) (Goedert et al., 1995) labelled only a subset of the aggregates identified by the total hTau antibody (D5D8N) (Figure S9A-E). Moreover, larvae of this genotype (Elav-Gal4>Tau<sup>V337M</sup>; ΔPfdn5<sup>15/40</sup>) fed LiCl, which inhibits GSK3β, still showed robust Tau aggregation (Figure S9F-J).

      These results imply that: a) soluble phospho-hTau levels in Pfdn5 mutants are low and not reliably detected with a single phosphorylation-specific antibody; b) loss of Pfdn5 results in Tau aggregation in a hyperphosphorylation-independent manner, similar to what has been reported earlier (LI et al. 2022); and c) the destabilization of microtubules in Elav-Gal4>Tau<sup>V337M</sup>; ΔPfdn5<sup>15/40</sup> results in Tau dissociation and aggregate formation. These data and conclusions have been incorporated into the revised manuscript.

      (5) Expression of WT human Tau (which is associated with most common Tauopathies other than FTDP-17) as Cowan et al suggest has significant effects on microtubule stability, but such Tau-expressing larvae are largely viable. Will one mutant copy of the Pfdn5 knockout enhance the phenotype of these larvae? Will it result in lethality? Such data will serve to generalize the effects of Pfdn5 beyond the two FTDP-17 mutations utilized.

      We have now examined whether heterozygous loss of Pfdn5 (∆Pfdn5/+) enhances the effect of Tau expression. Each genotype (hTau<sup>V337M</sup>, hTau<sup>WT</sup> or ∆Pfdn5/+) is viable on its own, and Elav-Gal4-driven expression of hTau<sup>V337M</sup> or hTau<sup>WT</sup> in a Pfdn5 heterozygous background likewise does not cause lethality.

      (6) Does the loss of Pfdn5 affect TauV377M (and WT Tau) levels? Could the loss of Pfdn5 simply result in increased Tau levels? And conversely, does overexpression of Pfdn5 or 6 reduce Tau levels? This would explain the enhancement and suppression of Tau<sup>V377M</sup> (and possibly WT Tau) phenotypes. It is an easily addressed, trivial explanation at the observational level, which, if true, begs for a distinct mechanistic approach.

      To test whether Pfdn5 modulates Tau phenotypes by altering Tau protein levels, we performed western blot analysis under Pfdn5 or Pfdn6 overexpression conditions and observed no change in hTau<sup>V337M</sup> levels (Figure 6O). However, in the absence of Pfdn5, both hTau<sup>V337M</sup> and hTau<sup>WT</sup> form large, insoluble aggregates that are not detected in soluble lysates by standard western blotting but are visualized by immunocytochemistry (Figure S7G-I). Thus, the apparent reduction in Tau levels on western blots reflects a solubility shift, not an actual decrease in Tau expression. These findings argue against a simple model in which Pfdn5 regulates Tau abundance and instead support a mechanism in which Pfdn5 loss leads to a change in Tau conformation and to its sequestration away from already destabilized microtubules.

      (7) Finally, the authors argue that Tau<sup>V377M</sup> forms aggregates in the larval brain based on large puncta observed especially upon loss of Pfdn5. This may be so, but protocols are available to validate this molecularly the presence of insoluble Tau aggregates (for example, pmid: 36868851) or soluble Tau oligomers, as these apparently differentially affect Tau toxicity. Does Pfdn5 loss exaggerate the toxic oligomers, and overexpression promote the more benign large aggregates??

      We have performed additional experiments to analyze the nature of these aggregates using 1,6-hexanediol (1,6-HD). 1,6-Hexanediol can dissolve the Tau aggregate seeds formed by Tau droplets, but cannot dissolve stable Tau aggregates (WEGMANN et al. 2018). We observed that 5% 1,6-hexanediol failed to dissolve these Tau aggregates (Figure S8), demonstrating the formation of stable filamentous flame-shaped NFT-like aggregates in the absence of Pfdn5 (Figure 5D and Figure S9).

      Reviewer #2 (Public review):

      Bisht et al detail a novel interaction between the chaperone, Prefoldin 5, microtubules, and tau-mediated neurodegeneration, with potential relevance for Alzheimer's disease and other tauopathies. Using Drosophila, the study shows that Pfdn5 is a microtubule-associated protein, which regulates tubulin monomer levels and can stabilize microtubule filaments in the axons of peripheral nerves. The work further suggests that Pfdn5/6 may antagonize Tau aggregation and neurotoxicity. While the overall findings may be of interest to those investigating the axonal and synaptic cytoskeleton, the detailed mechanisms for the observed phenotypes remain unresolved and the translational relevance for tauopathy pathogenesis is yet to be established. Further, a number of key controls and important experiments are missing that are needed to fully interpret the findings.

      The strength of this study is the data showing that Pfdn5 localizes to axonal microtubules and the loss-of-function phenotypic analysis revealing disrupted synaptic bouton morphology. The major weakness relates to the experiments and claims of interactions with Tau-mediated neurodegeneration. 

      In particular, it is unclear whether knockdown of Pfdn5 may cause eye phenotypes independent of Tau. 

      Our new experiments confirm that knockdown of Pfdn5 alone does not cause eye phenotypes.

      Further, the GMR>tau phenotype appears to have been incorrectly utilized to examine age-dependent neurodegeneration.

      In response, we have moderated and explained our conclusions in this regard, as described later in our “rebuttal.”

      This manuscript argues that its findings may be relevant to thinking about mechanisms and therapies applicable to tauopathies; however, this is premature given that many questions remain about the interactions from Drosophila, the detailed mechanisms remain unresolved, and absent evidence that Tau and Pfdn may similarly interact in the mammalian neuronal context. Therefore, this work would be strongly enhanced by experiments in human or murine neuronal culture or supportive evidence from analyses of human data.

      The reviewer is correct that the impact would be greater if Pfdn5-Tau interactions were also examined in human tissue.   While we have not attempted these experiments ourselves, we hope that our observations will stimulate others to test the conservation of phenomena we describe. There are, however, several lines of circumstantial evidence from human Alzheimer’s disease datasets that implicate PFDN5 in disease pathology. For example, recent compilations and analyses of proteomic data show reductions of CCT components, TBCE, as well as Prefoldin subunits, including PFDN5, in AD tissue (HSIEH et al. 2019; TAO et al. 2020; JI et al. 2022; ASKENAZI et al. 2023; LEITNER et al. 2024; SUN et al. 2024). Furthermore, whole blood mRNA expression data from Alzheimer's patients revealed downregulation of PFDN5 transcript (JI et al. 2022). Together, these findings from human data are consistent with the roles of PFDN5 in suppressing diverse neurodegenerative processes. We have incorporated these points into the discussion section of the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      See public review for experimental recommendations focusing on the Tau Pfdn interactions.  I would refrain from using the word aggregates, I would call them puncta, unless there is molecular or visual (ie AFM) evidence that they are indeed insoluble aggregates.  Finally, although including the full genotypes written out below the axis in the bar graphs is appreciated, it nevertheless makes them difficult to read due to crowding in most cases and somewhat distracting from the figure. 

      In my opinion, a more reader-friendly manner of reporting the phenotypes will be highly helpful. For example, listing each component of the genotype on the left of each bar graph and adding a cross or a filled circle under the bar to inform of the full genotype of the animals used.

      As described in the response to the previous comment, we now have strong direct evidence to support our view that the observed puncta are stable Tau aggregates. Thus, we feel justified in using the term Tau aggregates in preference to Tau puncta.

      We have tried to write the genotypes to make them more reader-friendly.

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 119-121: 35 modifiers from 64 seem like an unusually high hit rate. Are these individual genes or lines? Were all modifiers supported by at least 2 independent RNAi strains targeting non-overlapping sequences? A supplemental table should be included detailing all genes and specific strains tested, with corresponding results.

      We agree with the reviewer that 35 modifiers from 64 genes is an unusually high hit rate. However, since the genes knocked down in this study are chaperones, which are crucial for maintaining proteostasis, a high proportion of hits is not unexpected. The information related to individual genes and lines is provided in Supplemental Table 1. We have now included an additional Supplemental Table 3, which lists the genes and the RNAi lines used in Figure 1, detailing the sequence target information. The table also specifies the number of independent RNAi strains used and the corresponding results.

      (2) Figure 1: The authors quantify the areas of ommatidial fusion and necrosis as degeneration, but it is difficult to appreciate the aberrations in the photos provided. Was any consideration given to also quantifying eye size?

      We have processed the images to enhance their contrast and make the aberrations clearer. The percentage of degenerated eye area (Figure 1M) was normalized with total eye area. The method for quantifying degenerated area has been explained in the materials and methods section.

      (3) Figure 1: (a) Only enhancers of rough eyes are shown but no controls are included to evaluate whether knockdown of these genes causes eye toxicity in the absence of Tau. These are important missing controls. All putative Tau enhancers, including Pfdn5/6, need to be tested with GMR-GAL4 independently of Tau to determine whether they cause a rough eye. In a previous publication from some of the same investigators (Raut et al 2017), knockdown of Pfdn using ey-GAL4 was shown to induce severe eye morphology defects - this raises questions about the results shown here.

      We agree that assessing the effects of HSP knockdown independent of Tau is essential to confirm modifier specificity. We have now performed these knockdowns, and the data are reported in Supplemental Table 1. For RNAi lines represented in Figure 1, which enhanced Tau-induced degeneration/eye developmental defect, except for one of the RNAi lines against Pfdn6 (GD34204), no detectable eye defects were observed when knocked down with GMR-Gal4 at 25°C, suggesting that enhancement is specific to the Tau background. 

      The use here of a more eye-specific GMR-Gal4 driver at 25°C, versus the more broadly expressed ey-Gal4 at 29°C in prior work (Raut et al. 2017), likely accounts for the differences in the eye morphological defects.

      (b) Besides RNAi, do the classical Pfdn5 deletion alleles included in this work also enhance the tau rough eye when heterozygous? Please also consider moving the Pfdn5/6 overexpression studies to evaluate possible suppression of the Tau rough eye to Figure 1, as it would enhance the interpretation of these data (but see also below).

      GMR-Gal4-driven expression of hTau<sup>V337M</sup> or hTau<sup>WT</sup> in a Pfdn5 heterozygous background does not enhance the rough eye phenotype.

      (4) For genes of special interest, such as Pfdn5, and other genes mentioned in the results, the main figure, or discussion, it is also important to perform quantitative PCR to confirm that the RNAi lines used actually knock down mRNA expression and by how much. These studies will establish specificity.

      We agree that confirming RNAi efficiency via quantitative PCR (qPCR) is essential for validating the knockdown efficiency. We have now included qPCR data, especially for key modifiers, confirming effective knockdown (Figure S2).

      (5) Lines 235-238: how do you conclude whether the tau phenotype is "enhanced" when Pfdn5 causes a similar phenotype on its own? Could the combination simply be additive? Did overexpression of Pfdn5 suppress the UAS-hTau NMJ bouton phenotype (see below)?

      Although Pfdn5 mutants and hTau expression individually increase satellite boutons, their combination leads to significantly more severe and additional phenotypes, such as significantly decreased bouton size and increased bouton number, indicating an enhancing rather than purely additive interaction (Figure 4 and Figure S6C). Moreover, we now show that overexpression of Pfdn5 significantly suppressed the hTau<sup>V337M</sup>-induced NMJ phenotypes. These new data have been incorporated as Figure S11F-L in the revised manuscript.

      Alternatively, did the authors consider reducing fly tau in the Pfdn5 mutant background?

      In new additional experiments, we observe that double mutants for Drosophila Tau (dTau) and Pfdn5 also exhibit severe NMJ defects, suggesting genetic interactions between dTau and Pfdn5. This data is shown below for the reviewer.

      Author response image 1.

      A double mutant combination of dTau and Pfdn5 aggravates the synaptic defects at the Drosophila NMJ. (A-D') Confocal images of NMJ synapses at muscle 4 of A2 hemisegment showing synaptic morphology in (A-A') control, (B-B') ΔPfdn5<SUP>15/40</SUP>, (C-C') dTauKO/dTauKO (Drosophila Tau mutant), (D-D') dTauKO/dTauKO; ∆Pfdn5<SUP>15/40</SUP> double immunolabeled for HRP (green), and CSP (magenta). The scale bar in D for (A-D') represents 10 µm. 

      (6) It may be important to further extend the investigation to the actin cytoskeleton. It is noted that Pfdn5 also stabilizes actin. Importantly, tau-mediated neurodegeneration in Drosophila also disrupts the actin cytoskeleton, and many other regulators of actin modify tau phenotypes.

      We appreciate the suggestion to examine the actin cytoskeleton. While prior studies indicate that Pfdn5 might regulate the actin cytoskeleton and that Tau<sup>V377M</sup> hyperstabilizes the actin cytoskeleton, we did not observe altered actin levels in Pfdn5 mutants (Figure 2G). However, actin dynamics may represent an additional mechanism through which Pfdn5 might temporally influence Tauopathy. Future work will address potential actin-related mechanisms in Tauopathy.

      (7) Figure 2: in the provided images, it is difficult to appreciate the futsch loops. Please include an image with increased magnification. It appears that fly strains harboring a genomic rescue BAC construct are available for Pfdn-this would be a complementary reagent to test besides Pfdn overexpression.

      We have updated Figure 2 to include high magnification NMJ images as insets, clearly showing the Futsch loops. While we have not yet tested a genomic rescue BAC construct for Pfdn5, we plan to use the fly line harboring this construct in future work.

      (8) Figure 3: Some of the data is not adequately explained. The use of Ran as a loading control seems rather unusual. What is the justification? Pfdn appears to only partially co-localize with a-tubulin in the axon; can the authors discuss or explain this? Further, in Pfdn5 mutants, there appears to be a loss of a-tubulin staining (3b'); this should also be discussed.

      We appreciate the reviewer's concern regarding the choice of loading control for our Western blot analysis. Importantly, since Tubulin levels and related pathways were the focus of our analysis, traditional loading controls such as α- or β-tubulin or actin were deemed unsuitable due to potential co-regulation. Ran, a nuclear GTPase involved in nucleocytoplasmic transport, is not known to be transcriptionally or post-translationally regulated by Tubulin-associated signaling pathways. To ensure its reliability as a loading control, we confirmed by densitometric analysis that Ran expression showed minimal variability across all samples. Hence, we used Ran for accurate normalization in the Western blot data represented in this manuscript. We have also used GAPDH as a loading control and found no difference with respect to Ran as a loading control across samples.

      We appreciate the reviewer's comment regarding the interpretation of our Pearson's correlation coefficient (PCC) results. While the mean colocalization value of 0.6 represents a moderate positive correlation (MUKAKA 2012), which may not reach the conventional threshold for "high positive" colocalization (usually considered 0.7-0.9), it nonetheless indicates substantial spatial overlap between the proteins of interest. Importantly, colocalization analysis provides supportive but indirect evidence for molecular proximity.  To further validate the interaction, we performed a microtubule binding assay, which directly demonstrates the binding of Pfdn5 to stabilized microtubules.
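      For readers less familiar with the metric, the Pearson's correlation coefficient used in colocalization analysis can be sketched as follows. This is a minimal illustration only: the function name, the flattened per-pixel intensity lists, and the example values are hypothetical, and the response does not state which software or thresholding the authors actually used.

```python
from math import sqrt


def pearson_colocalization(channel_a, channel_b):
    """Pearson's correlation between paired per-pixel intensities of two channels."""
    if len(channel_a) != len(channel_b) or len(channel_a) < 2:
        raise ValueError("channels must have the same length (at least 2 pixels)")
    mean_a = sum(channel_a) / len(channel_a)
    mean_b = sum(channel_b) / len(channel_b)
    # Covariance-like numerator and the two sums of squared deviations
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    ss_a = sum((a - mean_a) ** 2 for a in channel_a)
    ss_b = sum((b - mean_b) ** 2 for b in channel_b)
    if ss_a == 0 or ss_b == 0:
        raise ValueError("a channel with constant intensity has no defined correlation")
    return cov / sqrt(ss_a * ss_b)


if __name__ == "__main__":
    # Hypothetical pixel intensities for two channels within one region of interest
    pfdn5_signal = [10, 40, 35, 80, 60, 20]
    tubulin_signal = [15, 30, 50, 70, 40, 35]
    print(round(pearson_colocalization(pfdn5_signal, tubulin_signal), 2))
```

      The coefficient ranges from -1 to 1 and reflects how linearly the two intensities co-vary across pixels, which is why it indicates spatial overlap but not direct binding.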

      In accordance with the western blot analysis shown in Figure 2G-I, the levels of Tubulin are reduced in the Pfdn5 mutants (Figure 3B''). We have incorporated and discussed this in the revised manuscript.

      (9) Figure 4: Overexpression of Pfdn appears to rescue the supernumerary satellite bouton numbers induced by human Tau; however, interpretation of this experiment is somewhat complicated as it is performed in Pfdn mutant genetic background. Can overexpression of Pfdn on its own rescue the Tau bouton defect in an otherwise wildtype background?

      We have now coexpressed Pfdn5 and hTau<SUP>V337M</SUP> in an otherwise wild-type background. As shown in Figure S11F-L, Pfdn5 overexpression suppresses Tau-induced bouton defects. We have incorporated the data in the Results section to support the role of Pfdn5 as a modifier of Tau toxicity.

      (10) Lines 256-263 / Figure 5: (a) What exactly are these tau-positive structures (punctae) being stained in larval brains in Fig 5C-E? Most prior work on tau aggregation using Drosophila models has been done in the adult brain, and human wildtype or mutant Tau is not known to form significant numbers of aggregates in neurons (although aggregates have been described following glia tau expression). 

      Therefore, the results need to be further clarified. Besides the provided schematic, a zoomed-out image showing the whole larval brain is needed here for orientation. Have these aggregates been previously characterized in the literature? 

      We agree with the reviewer that the expression of the wildtype or mutant form of human Tau in Drosophila is not known to form aggregates in the larval brain, in contrast to the adult brain (JACKSON et al. 2002; OKENVE-RAMOS et al. 2024). Consistent with previous reports, we also observed that Tau expression on its own does not form aggregates in the Drosophila larval brain.

      However, in the absence of Pfdn5, microtubule disruption is severe, leading to reduced Tau-microtubule binding and the formation of globular/round or flame-shaped tangle-like aggregates in the larval brain. Previous studies have reported that 1,6-hexanediol can dissolve the Tau aggregate seeds formed by Tau droplets, but cannot dissolve stable Tau aggregates (WEGMANN et al. 2018). We observed that 5% 1,6-hexanediol failed to dissolve these Tau puncta, demonstrating the formation of stable aggregates in the absence of Pfdn5. Additionally, we have now performed a Tau solubility assay and show that, in the absence of Pfdn5, a significant amount of Tau goes into the pellet fraction, which could not be detected by the phospho-specific AT8 Tau antibody (targeting pSer202/pThr205) but was detected by the total hTau antibody (D5D8N) on western blots (Figure S8). These data further reinforce our conclusion that Pfdn5 prevents the transition of hTau from a soluble and/or microtubule-associated state to an aggregated, insoluble, and pathogenic state. These new data have been incorporated into the revised manuscript.

      (b) Can additional markers (nuclei, cell membrane, etc.) be used to highlight whether the tau-positive structures are present in the cell body or at synapses?

      We performed the co-staining of Tau and Elav to assess the aggregated Tau localization. We found that in the presence of Pfdn5, Tau is predominantly cytoplasmic and localised to the cell body and axons. In the absence of Pfdn5, Tau forms aggregates but is still localized to the cell body or axons. However, some of the aggregates are very large, and the subcellular localization could not be determined (Figure S8M-N'). These might represent brain regions of possible nuclear breakdown and cell death (JACKSON et al. 2002).

      (c) It would also be helpful to perform western blots from larval (and adult) brains examining tau protein levels, phospho-tau species, possible higher-molecular weight oligomeric forms, and insoluble vs. soluble species. These studies would be especially important to help interpret the potential mechanisms of observed interactions.

      Western blot analysis revealed that overexpression of Pfdn5 does not alter total Tau levels (Figure 6O). In Pfdn5 mutants, however, hTau<sup>V337M</sup> levels were reduced in the supernatant fraction and increased in the pellet fraction, indicating a shift from soluble monomeric Tau to aggregated Tau.

      (d) Does overexpression of Pfdn5 (UAS-Pfdn5) suppress the formation of tau aggregates? I would therefore recommend that additional experiments be performed looking at adult flies (perhaps in Pfdn5 heterozygotes or using RNAi due to the larval lethality of Pfdn5 null animals).

      Overexpression of Pfdn5 significantly reduced the Tau aggregates observed in Pfdn5 mutants (Elav-Gal4/UAS-Tau<sup>V337M</sup>; UAS-Pfdn5; ΔPfdn5<sup>15/40</sup>) (Figure 5E). Coexpression of Pfdn5 and hTau<sup>V337M</sup> suppresses the Tau aggregates/puncta in 30-day-old adult brains. Since heterozygous ΔPfdn<sup>15</sup>/+ did not show a reduction in Pfdn5 levels, we did not test the suppression of Tau aggregates in ΔPfdn<sup>15</sup>/+; Elav>UAS-Pfdn5, UAS-Tau<sup>V337M</sup>.

      (11) Figure 6, panels A-N: The GMR>Tau rough eye is not a "neurodegenerative" but rather a predominantly developmental phenotype. It results from aberrant retinal developmental patterning and the subsequent secretion/formation of the overlying eye cuticle (lenslets). I am confused by the data shown suggesting a "shrinking eye size" and increasing roughened surface over time (a GMR>tau eye similar to that shown in panel B cannot change to appear like the one in panel H with aging). The rough eye can be quite variable among a population of animals, but it is usually fixed at the time the adult fly ecloses from the pupal case, and quite stable over time in an individual animal. Therefore, any suppression of the Tau rough eye seen at 30 days should be appreciable as soon as the animals eclose. These results need to be clarified. If indeed there is robust suppression of Tau rough eye, it may be more intuitive and clearer to include these data with Figure 1, when first showing the loss-of-function enhancement of the Tau rough eye. Also, why is Pfdn6 included in these experiments but not in the studies shown in Figures 2-5?

      We thank the reviewer for their careful and knowledgeable assessment of the GMR>Tau rough eye model. We appreciate the clarification that the rough eye phenotype could be “developmental” rather than “neurodegenerative.” Our initial observations regarding "shrinking eye size" and "increased surface roughness" clearly show age-related progression of structural change. Such progression has been observed and reported by others (IIJIMA-ANDO et al. 2012; PASSARELLA AND GOEDERT 2018). We observed an age-dependent increase in the number of fused ommatidia in GMR-Gal4>Tau, which was rescued by Pfdn5 or Pfdn6 expression. We noted that adult-specific induction of hTau<sup>V337M</sup> in adult flies using the Gal80<sup>ts</sup> and GMR-GeneSwitch (GMR-GS) systems was not sufficient to induce a significant eye phenotype; thus, early expression of Tau in the developing eye imaginal disc appears to be required for the adult progressive phenotype that we observe. We feel that it is inadequate to refer to this adult progressive phenotype as “developmental,” although it is admittedly arguable whether it can be termed “degenerative.”

      To address neurodegeneration more directly, we focused on 30-day-old adult fly brains and demonstrated that Pfdn5 overexpression suppresses age-dependent Tau-induced neurodegeneration in the central nervous system (Figure 6H-N and Figure S12). This supports our central conclusion regarding the neuroprotective role of Pfdn5 in age-associated Tau pathology. Since we found an enhancement in the Tau-induced synaptic and eye phenotypes by Pfdn6 knockdown, we also generated CRISPR/Cas9-mediated loss-of-function mutants for Pfdn6. However, loss of Pfdn6 resulted in embryonic/early first instar lethality, which precluded its detailed analysis at the larval stages.

      (12) Figure 6, panels O-T: the elav>tau image appears to show a different frontal section plane compared to the other panels. It is advisable to show images at a similar level in all panels since vacuolar pathology can vary by region. It is also useful to be able to see the entire brain at a lower power, but the higher power inset view is obscuring these images. I would recommend creating separate panels rather than showing them as insets.

      In the revised figure, we now display the low- and high-magnification images as separate, clearly labeled panels instead of using insets. This improves visibility of the brain morphology while providing detailed views of the vacuolar pathology (Figure 6H-L).

      (13) Figure 6/7: For the experiments in which Pfdn5/6 is overexpressed and possibly suppresses tau phenotypes (brain vacuoles and memory), it is important to use controls that normalize the number of UAS binding sites, since increased UAS sites may dilute GAL4 and reduce Tau expression levels/toxicity. Therefore, it would be advisable to compare with Elav>Tau flies that also include a chromosome with an empty UAS site or other transgenes, such as UAS-GFP or UAS-lacZ.

      We thank the reviewer for the suggestion. We have now incorporated appropriate controls in the brain vacuolization, mushroom body, and ommatidial fusion rescue experiments. We have also independently verified whether Gal4 dilution has any effect on the Tau phenotypes (Figure 6H-L, Figure 7, and Figure S11A-B).

      (14) Lines 311-312: the authors say vacuolization occurs in human neurodegenerative disease, which is not really true to my knowledge and definitely not stated in the citation they use. Please re-phrase.

      Now we have made the appropriate changes in the revised manuscript.

      (15) Figure 7: The authors claim that Pfdn5/6 expression does not impact memory behavior, but there in fact appears to be a decrease in preference index (panel D vs panel B). Does this result complicate the interpretation of the potential interaction with Tau (panel F)? Are data from wildtype control flies available?

      In our memory assay, a decrease in performance index (PI) of the trained flies compared to the naïve flies indicates memory formation (normal memory in control flies, Figure 7B). In contrast, a lack of significant difference in PI indicates a memory defect (Figure 7C: hTau<sup>V337M</sup> overexpressed flies). "Decrease in preference index (panel D vs panel B)" is not a sign of memory defect; it may be interpreted as a better memory instead. Hence, neuronal overexpression of Pfdn5 (Figure 7D) or Pfdn6 (Figure 7E) in wildtype neurons does not cause memory deficits. In addition, coexpression of Pfdn5/6 and hTau<sup>V337M</sup> successfully rescues the Tau-induced memory defect (significant drop in PI compared to the PI of naïve flies in Figure 7F-G). Moreover, almost complete rescue of the Tau-induced mushroom body defect on Pfdn5 or Pfdn6 expression further establishes potential interaction between Pfdn5/6 and Tau. This data has been incorporated into the revised manuscript.

      The memory assay itself, with extensive data on wildtype flies and various other genotypes, will shortly be submitted for publication in another manuscript (Majumder et al., manuscript in preparation). However, we can confirm for the reviewer that wildtype flies, trained and assayed by the protocol described, show a significant decrease in performance index compared to the naïve flies, indicative of strong learning and memory performance, very similar to the control genotype data shown in Figure 7B.

      Additional minor considerations

      (16) Lines 50-52: there are many therapeutic interventions for treating tauopathies, but not curative or particularly effective ones.

      Now we have made the appropriate changes in the revised manuscript.

      (17) Lines 87-106 seem like a duplication of the abstract. Consider deleting or condensing.

      We have made the appropriate changes in the revised manuscript.

      (18) Where is pfdn5 expressed? Development v. adult? Neuron v. glia? Conservation?

      Prefoldin5 is expressed throughout development but strongly localized to the larval trachea and neuronal axons. Drosophila Pfdn5 shows 35% overall identity with human PFDN5. 

      (19) Line 187: is pfdn5 truly "novel"?

      The role of Pfdn5 as microtubule-binding and stabilizing is a new finding and has not been predicted or described before. Hence, it is a novel neuronal microtubule-associated protein.  

      (20) Figure 5, panel F, genotype labels on the x-axis are confusing; consider simplifying to Control, DPfdn, and Rescue.

      We have made appropriate changes in the figure for better readability.

      (21) Figures 5/8: it might be preferable to use consistent colors for Tau/HRP--Tau is labeled green in Figure 5 and then purple in Figure 8.

      We have made these changes where possible. 

      (22) Lines 311-312: Vacuolar neuropathology is NOT typically observed in human Tauopathy.

      We thank the reviewer for pointing this out. We have made the appropriate changes in the revised manuscript.

      (23) Lines 328-349: The explanation could be made more clear. Naïve flies should not necessarily be called controls. Also, a more detailed explanation of how the preference index is computed would be helpful. Why are some datapoints negative values?

      (a) We have rewritten this paragraph to make the description and explanation clearer. The detailed method and formula to calculate the Preference index have been incorporated in the Materials and Methods section.

      (b) We have replaced the term Control with Naïve. 

      (c) Datapoints with negative values appeared in some of the 'Trained' fly groups. They indicate that, after CuSO<sub>4</sub> training, some groups showed repulsion towards the otherwise attractive odor 2,3B. As 2,3B is an attractive odorant, naïve or control flies show attraction towards it compared to air, which is evident from a higher number of flies in the Odor arm (O) than in the Air arm (A) of the Y-maze; thus, the PI, [(O-A)/(O+A)]*100, is positive for naïve fly groups. Training led the flies to associate the attractive odorant with bitter food, decreasing attraction and in a few instances even producing repulsion towards the odorant, so that fewer flies were counted in the odor arm than in the air arm. Hence, the PI becomes negative because (O-A) is negative in such instances. Thus, it is not an anomaly but indicates strong learning.
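      The sign behaviour of the index described in (c) can be illustrated with a minimal sketch. It assumes the PI is computed as (O-A)/(O+A)*100 from the fly counts in the two Y-maze arms; the function name and the example counts are hypothetical and are not taken from the manuscript.

```python
def preference_index(odor_count: int, air_count: int) -> float:
    """Preference index in percent: positive means attraction to the odor arm."""
    total = odor_count + air_count
    if total == 0:
        # No flies chose either arm, so the index is undefined.
        raise ValueError("no flies counted in either arm")
    return 100.0 * (odor_count - air_count) / total


# Hypothetical counts, for illustration only.
naive_pi = preference_index(odor_count=30, air_count=10)    # more flies in the odor arm
trained_pi = preference_index(odor_count=8, air_count=12)   # more flies in the air arm

print(naive_pi, trained_pi)
```

      With these made-up counts the naïve group gets a positive index and the trained group a negative one, which is the pattern the response attributes to learned repulsion.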

      (24) Line 403: misspelling "Pdfn"

      We have corrected this.

      (25) Lines 423-425: recommend re-phrasing, since tauopathies are human diseases. Mice and other animal models may be susceptible to tau-mediated neuronal dysfunction but not Tauopathy, per se.

      We have made the appropriate changes in the revised manuscript.

      (26) Lines 468-469: "tau neuropathology" rather than "tau associated neuropathies".

      We have made the appropriate changes in the revised manuscript. 

      References

      Askenazi, M., T. Kavanagh, G. Pires, B. Ueberheide, T. Wisniewski et al., 2023 Compilation of reported protein changes in the brain in Alzheimer's disease. Nat Commun 14: 4466.

      Hsieh, Y. C., C. Guo, H. K. Yalamanchili, M. Abreha, R. Al-Ouran et al., 2019 Tau-Mediated Disruption of the Spliceosome Triggers Cryptic RNA Splicing and Neurodegeneration in Alzheimer's Disease. Cell Rep 29: 301-316 e310.

      Iijima-Ando, K., M. Sekiya, A. Maruko-Otake, Y. Ohtake, E. Suzuki et al., 2012 Loss of axonal mitochondria promotes tau-mediated neurodegeneration and Alzheimer's disease-related tau phosphorylation via PAR-1. PLoS Genet 8: e1002918.

      Jackson, G. R., M. Wiedau-Pazos, T. K. Sang, N. Wagle, C. A. Brown et al., 2002 Human wildtype tau interacts with wingless pathway components and produces neurofibrillary pathology in Drosophila. Neuron 34: 509-519.

      Ji, W., K. An, C. Wang and S. Wang, 2022 Bioinformatics analysis of diagnostic biomarkers for Alzheimer's disease in peripheral blood based on sex differences and support vector machine algorithm. Hereditas 159: 38.

      Leitner, D., G. Pires, T. Kavanagh, E. Kanshin, M. Askenazi et al., 2024 Similar brain proteomic signatures in Alzheimer's disease and epilepsy. Acta Neuropathol 147: 27.

      Li, L., Y. Jiang, G. Wu, Y. A. R. Mahaman, D. Ke et al., 2022 Phosphorylation of Truncated Tau Promotes Abnormal Native Tau Pathology and Neurodegeneration. Mol Neurobiol 59: 6183-6199.

      Mershin, A., E. Pavlopoulos, O. Fitch, B. C. Braden, D. V. Nanopoulos et al., 2004 Learning and memory deficits upon TAU accumulation in Drosophila mushroom body neurons. Learn Mem 11: 277-287.

      Mukaka, M. M., 2012 Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71.

      Okenve-Ramos, P., R. Gosling, M. Chojnowska-Monga, K. Gupta, S. Shields et al., 2024 Neuronal ageing is promoted by the decay of the microtubule cytoskeleton. PLoS Biol 22: e3002504.

      Passarella, D., and M. Goedert, 2018 Beta-sheet assembly of Tau and neurodegeneration in Drosophila melanogaster. Neurobiol Aging 72: 98-105.

      Sun, Z., J. S. Kwon, Y. Ren, S. Chen, C. K. Walker et al., 2024 Modeling late-onset Alzheimer's disease neuropathology via direct neuronal reprogramming. Science 385: adl2992.

      Tao, Y., Y. Han, L. Yu, Q. Wang, S. X. Leng et al., 2020 The Predicted Key Molecules, Functions, and Pathways That Bridge Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD). Front Neurol 11: 233.

      Wegmann, S., B. Eftekharzadeh, K. Tepper, K. M. Zoltowska, R. E. Bennett et al., 2018 Tau protein liquid-liquid phase separation can initiate tau aggregation. EMBO J 37.

    1. Are there any criticisms of this framework?
      1. The P-O-L-C functions might be ideal, but they do not accurately depict the day-to-day actions of actual managers.
    1. Smoking: Risk factor. Direct relationship - smoking and the prevalence and incidence of periodontitis. Affects severity. Affects healing. Slight to moderate periodontitis - fair to poor. Severe periodontitis - poor to hopeless.

      (①) Smoking

      (②) Risk factor

      (③) Direct relationship between smoking and the prevalence and incidence of periodontitis

      (④) Affects the severity of the disease

      (⑤) Adversely affects healing

      (⑥) Slight to moderate periodontitis: fair to poor prognosis

      (⑦) Severe periodontitis: poor to hopeless prognosis

    2. The prediction of a present disease (done after the disease is there) • Prognostic factors are the factors that affect the prognosis... • The likelihood to get the disease (the possibility to get the disease) • Risk factors are the factors which make the patient at risk to get the disease..

      ① The prediction of a present disease (done after the disease is there)

      ② Prognostic factors are the factors that affect the prognosis...

      ③ The likelihood to get the disease (the possibility to get the disease)

      ④ Risk factors are the factors which make the patient at risk to get the disease..

      🦷 The difference in brief:

      Risk factors: act before the disease develops.

      Prognostic factors: affect the course of the disease after it has developed.

    3. Prognosis should be established before treatment is started and based on this prognosis your treatment plan should be done ...

      ① Prognosis should be established before treatment is started, and the treatment plan should be based on this prognosis.

      Explanation:

      The prognosis predicts the likely course of the disease and its response to treatment.

    4. Clinical Steps: Taking History & Examination; Further Investigations - if needed; Define the Diagnosis; Determine the Prognosis of the disease; Plan the Treatment. Prognosis is established AFTER the diagnosis is made and BEFORE the treatment plan is established

      ① Clinical Steps

      ② Taking History & Examination

      ③ Further Investigations - if needed

      ④ Define the Diagnosis

      ⑤ Determine the Prognosis of the disease

      ⑥ Plan the Treatment

      ⑦ Prognosis is established AFTER the diagnosis is made and BEFORE the treatment plan is established

    1. Members of school communities may believe that sexuality is not an appropriate topic for young people. However, there are significant numbers of LGBTQ and ally students in schools, as well as significant numbers of sexually aware heterosexual students. Ignoring the issue of sexuality means neglecting to provide LGBTQ students with representations of themselves that enable them to understand themselves, and to provide examples of ways to counter bias and work toward respect for those who initially may not be willing to respect LGBTQ students. Many LGBTQ students report hearing insulting words on a daily basis. According to the 2019 National School Climate Survey of the Gay, Lesbian & Straight Education Network (GLSEN), three quarters of students reported hearing derogatory language such as "faggot" and "dyke" (Kosciw et al., 2020).

      In 2019, I was still in middle school in China. That year, I saw how hard it was for classmates who didn’t fit the “normal” expectations of gender and sexuality. I remember one boy who performed a Blackpink dance during an event—he danced with so much emotion and confidence, but a lot of people laughed at him or called him names. At the time, I didn’t really understand him either. But as I grew older and met more people from the LGBTQ+ community, I started to understand their experiences and slowly began to accept them.

    1. How might the implications of the P-O-L-C framework differ for an organization like Goodwill Industries versus a firm like Starbucks?

      Goodwill needs to establish a comprehensive P-O-L-C approach to support people from various backgrounds. Its core strategy is doing good for people and the planet, while Starbucks focuses on how to provide good products and services.

    1. Changes in labor law 2025/2026: how to prepare your company for the new reality?
      • Pay transparency (from 24 December 2025)

        • Obligation to disclose the salary or pay range before the job interview.
        • Ban on asking about a candidate's previous earnings.
        • Job offers must use gender-neutral job titles.
        • Remuneration covers all components: bonuses, allowances, benefits.
        • From June 2026, companies employing ≥100 people will have to report the pay gap.
        • The pay policy and recruiter training must be updated.
      • New rules for calculating length of service (from 2026)

        • B2B, mandate, and agency contracts covered by ZUS (social insurance) contributions will count toward length of service.
        • Possibility of longer leave, severance pay, and notice periods.
        • The employee has 24 months to document earlier cooperation.
        • Companies should review employment histories and update their regulations.
        • The change will increase organizational and HR costs.
      • New powers of the National Labour Inspectorate (PIP) (from 1 January 2026)

        • PIP will be able to establish the existence of an employment relationship on its own.
        • An administrative decision will replace a court judgment and will be immediately enforceable.
        • Possibility of remote inspections, requests for video transmission, and online interviews.
        • Companies must review civil-law contracts for the risk of reclassification as employment.
      • Mobbing (workplace bullying) and discrimination: new rules

        • Removal of the "long-term" requirement for mobbing; persistent harassment is enough.
        • Minimum compensation: 12 times the monthly salary.
        • Obligation to introduce a formal anti-mobbing procedure.
        • New forms of discrimination: by association and by assumption.
        • Staff training and clear procedures for reporting violations are required.
      • Digital communication with employees

        • "Written form" is replaced by "paper or electronic form".
        • Communication by e-mail, messaging apps, or SMS is possible.
        • The employer must prove delivery of the message.
        • Work regulations must be updated and GDPR (RODO) compliance ensured.
      • Minimum wage 2026

        • From 1 January 2026, an increase to PLN 4,806 gross per month and PLN 31.40 gross per hour.
        • Higher employment costs, ZUS contributions, and benefits.
        • Companies should factor the changes into their HR budgets.
      • AI policy in the organization

        • Requirement to introduce rules for using AI in line with the EU AI Act.
        • Lack of a policy may be treated as a breach of the employer's obligations.
        • Need to specify where AI may and may not be used.
      • Summary of actions for companies

        • Update regulations and contracts.
        • Train managers and the HR department.
        • Prepare a system for pay reporting and digital document workflow.
        • Review B2B and mandate contracts for PIP risk.
        • Early preparation will ensure peace of mind and a competitive advantage.
    1. Universities are either centers of independent and creative thought, or they are not universities at all.

      This would imply that there are almost no universities in Latin America! Barely a few per country. An idealistic stance on the author's part.

    2. The advances that led to democratized access to university brought with them a response from social sectors that doubt whether universities are giving students the learning that economic activity requires, whether they provide companies with the technology they demand, and whether they are really cost-efficient.

      These doubts seem to me, moreover, well founded, although, as noted, the university also has other purposes.

    1. aiming to augment their own experiences and through that ended up uh augmenting uh what the rest of humanity can do.

      augmenting what the rest of humanity can do

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study extends the previous interesting work of this group to address the potentially differential control of movement and posture. Their earlier work explored a broad range of data to make the case for a downstream neural integrator hypothesized to convert descending velocity movement commands into postural holding commands. Included in that data were observations from people with hemiparesis due to stroke. The current study uses similar data, but pushes into a different, but closely related direction, suggesting that these data may address the independence of these two fundamental components of motor control. I find the logic laid out in the second sentence of the abstract ("The paretic arm after stroke is notable for abnormalities both at rest and during movement, thus it provides an opportunity to address the relationships between control of reaching, stopping, and stabilizing") less than compelling, but the study does make some interesting observations. Foremost among them is the relation between the resting force postural bias and the effect of force perturbations during the target hold periods, but not during movement. While this interesting observation is consistent with the central mechanism the authors suggest, it seems hard to me to rule out other mechanisms, including peripheral ones. These limitations should be discussed.

      Thank you for summarizing our work. Note that we have improved the logic in our abstract ("...providing an opportunity to ask whether control of these behaviors is independently affected in stroke") based on your comments, as outlined in our previous revision. We now discuss limitations and potential alternative mechanisms in greater detail, in a dedicated section (lines 846-895; see response to reviewer 2 for further details).

      Reviewer #2 (Public review):

      Summary:

      Here the authors address the idea that postural and movement control are differentially impacted with stroke. Specifically, they examined whether resting postural forces influenced several metrics of sensorimotor control (e.g., initial reach angle, maximum lateral hand deviation following a perturbation, etc.) during movement or posture. The authors found that resting postural forces influenced control only following the posture perturbation for the paretic arm of stroke patients, but not during movement. They also found that resting postural forces were greater when the arm was unsupported, which correlated with abnormal synergies (as assessed by the Fugl-Meyer). The authors suggest that these findings can be explained by the idea that the neural circuitry associated with posture is relatively more impacted by stroke than the neural circuitry associated with movement. They also propose a conceptual model that differentially weights the reticulospinal tract (RST) and corticospinal tract (CST) to explain greater relative impairments with posture control relative to movement control, due to abnormal synergies, in those with stroke.

      Thank you for the brief but comprehensive summary. We would like to clarify one point: we do not suggest that our findings are necessarily due to the neural circuitry associated with posture being more impacted than the neural circuitry associated with movement. Rather, our conceptual model suggests that increased outflow through the (ipsilateral) RST, involved in posture, compensates for CST damage, at the expense of posture abnormalities spilling over into movement. We suggest that the neural circuitry for posture vs. movement control remains relatively separate in stroke, with impairments in posture control not substantially explaining impairments in movement control.

      Comments on revisions:

      The authors should be commended for being very responsive to comments and providing several further requested analyses, which have improved the paper. However, there are still some outstanding issues that make it difficult to fully support the provided interpretation.

      Thank you for appreciating our response to your earlier comments. We address the outstanding issues below.

      The authors say within the response, "We would also like to stress that these perturbations were not designed so that responses are directly compared to each other ***(though of course there is an *indirect* comparison in the sense that we show influence of biases in one type of perturbation but not the other)***." They then state in the first paragraph of the discussion that "Remarkably, these resting postural force biases did not seem to have a detectable effect upon any component of active reaching but only emerged during the control of holding still after the movement ended. The results suggest a dissociation between the control of movement and posture." The main issue here is relying on indirect comparisons (i.e., significant in one situation but not the other) instead of direct comparisons. To use a well-known example, just because one group/condition displays a significant linear relationship (i.e., slope_1 > 0) and another does not (slope_2 = 0) does not necessarily mean that the two groups/conditions are statistically different from one another [see Figure 1 in Makin, T. R., & Orban de Xivry, J. J. (2019). Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife, 8, e48175.].
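      The reviewer's point about direct versus indirect comparisons can be made concrete with a small sketch. It fits an ordinary least-squares line to each condition and then tests the difference between the two slopes directly, using a z-approximation. The function names and the data are hypothetical, and the sketch assumes the two conditions are independent samples; a within-subject design like the one discussed here would more likely call for an interaction term or a mixed model.

```python
import math


def slope_and_se(x, y):
    """Ordinary least-squares slope of y on x and its standard error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se


def slope_difference_z(x1, y1, x2, y2):
    """z-statistic for slope_1 - slope_2, assuming independent samples."""
    b1, se1 = slope_and_se(x1, y1)
    b2, se2 = slope_and_se(x2, y2)
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Hypothetical data: condition 1 has a clear slope, condition 2 is roughly flat.
x = [1, 2, 3, 4, 5, 6]
y_cond1 = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
y_cond2 = [0.2, -0.1, 0.1, 0.0, -0.2, 0.1]

print(slope_difference_z(x, y_cond1, x, y_cond2))
```

      The quantity being tested is the difference between the slopes, not each slope separately against zero, which is the distinction the reviewer draws.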

      We agree and are well aware of the limitation posed by an indirect comparison – hence the language we used to comment on the data (“did not seem”, “suggest”, etc.). To address this limitation, we performed a more direct comparison of how the two types of perturbations (moving vs. holding) interact with resting biases. For this comparison, we calculated a Response Asymmetry Index (RAI):

      Above, 𝑟<sub>𝐴</sub> is the response in the direction where the resting bias is most aligned with the perturbation, and 𝑟<sub>𝑂</sub> is the response in the direction where the resting bias is most opposed to the perturbation.

      We calculated RAIs for two response metrics used for both moving and holding perturbations: maximum deviation and time to stabilization/settling time. For these two response metrics, positive RAIs indicate an asymmetry in line with an effect of resting bias.

      The idea behind the RAI is that, while the magnitude of responses may well differ between the two types of perturbations, this will be accounted for by the ratio used to calculate the asymmetry. The same approach has been used to assess symmetry/laterality across a variety of different modalities, such as gait asymmetry (Robinson et al., 1987), the relative fMRI activity in the contralateral vs. ipsilateral sensorimotor cortex while performing a motor task (Cramer et al., 1997), or the relative strength of ipsilateral vs. contralateral responses to transcranial magnetic stimulation (McPherson et al., 2018). Notably, the normalization also addresses potential differences in overall stiffness between holding vs. moving perturbations, which would similarly affect aligned and opposing cases (see our response to your following point).
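      The normalization argument can be sketched in code. The authors' RAI formula is not reproduced in this text, so the form below, (r_A - r_O)/(r_A + r_O), is only an assumption modeled on commonly used laterality indices and is not claimed to be the authors' definition; the function name and the numbers are hypothetical. The sketch shows that an index of this ratio type is unchanged when both responses are scaled by the same factor, as an overall change in stiffness might do.

```python
def asymmetry_index(r_aligned: float, r_opposed: float) -> float:
    """Assumed ratio-type asymmetry index: (r_A - r_O) / (r_A + r_O)."""
    total = r_aligned + r_opposed
    if total == 0:
        # Both responses are zero, so the ratio is undefined.
        raise ValueError("responses sum to zero; index undefined")
    return (r_aligned - r_opposed) / total


# Hypothetical responses, e.g. maximum deviation in arbitrary units.
holding = asymmetry_index(r_aligned=3.0, r_opposed=1.0)

# The same responses scaled by a common factor (a uniform gain change).
scaled = asymmetry_index(r_aligned=2 * 3.0, r_opposed=2 * 1.0)

print(holding, scaled)
```

      Under this assumed form, a positive value means the response is larger in the aligned direction than in the opposed one, and a common scaling of both responses cancels in the ratio.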

      Figure 8 shows the RAIs we obtained for holding (red) vs. moving/pulse (blue) perturbations. For the maximum deviation (left), there is more asymmetry for the holding case, though the p-value is marginal (p=0.088), likely due to the large variability in the pulse case (individual values shown as black dots). For time to stabilization/settling time (right), the difference is significant (p=0.0048). Together, these analyses indicate that resting biases interact substantially more with holding than with movement control, in line with a relative independence between these two control modalities. We now include this panel as Figure 8, and describe it in Results (lines 587-611).

      Note that even a direct comparison does not prove that resting biases and active movement control are perfectly independent. We now discuss these issues in more depth, in the new Limitations section suggested by the Reviewer (lines 836-849).

      The authors have provided reasonable rationale of why they chose certain perturbation waveforms for different. Yet it still holds that these different waveforms would likely yield very different muscular responses making it difficult to interpret the results and this remains a limitation. From the paper it is unknown how these different perturbations would differentially influence a variety of classic neuromuscular responses, including short-range stiffness and stretch reflexes, which would be at play here.

      Much of the results can be interpreted when one considers classic neuromuscular physiology. In Experiment 1, differences in resting postural bias in supported versus unsupported conditions can readily be explained since there is greater muscle activity in the unsupported condition that leads to greater muscle stiffness to resist mechanical perturbations (Rack, P. M., & Westbury, D. R. (1974). The short-range stiffness of active mammalian muscle and its effect on mechanical properties. The Journal of physiology, 240(2), 331-350.). Likewise muscle stiffness would scale with changes in muscle contraction with synergies. Importantly for experiment 2, muscle stiffness is reduced during movement (Rack and Westbury, 1974) which may explain why resting postural biases do not seem to be impacting movement. Likewise, muscle spindle activity is shown to scale with extrafusal muscle fiber activity and forces acting through the tendon (Blum, K. P., Campbell, K. S., Horslen, B. C., Nardelli, P., Housley, S. N., Cope, T. C., & Ting, L. H. (2020). Diverse and complex muscle spindle afferent firing properties emerge from multiscale muscle mechanics. eLife, 9, e55177.). The concern here is that the authors have not sufficiently considered muscle neurophysiology, how that might relate to their findings, and how that might impact their interpretation. Given the differences in perturbations and muscle states at different phases, the concern is that it is not possible to disentangle whether the results are due to classic neurophysiology, the hypothesis they propose, or both. Can the authors please comment.

      It is possible that neuromuscular physiology may explain part of our results. However, this would not contradict our conceptual model.

      Regarding Experiment 1, it is possible that stiffness would scale with changes in background muscle contraction, as the reviewer suggests. Indeed, Bennett et al. (1992) used brief perturbations on the wrist to assess elbow stiffness, finding that, during movement, stiffness was increased in positions with a higher gravity load (and, in general, in positions where the net muscle torque was higher). However, during posture maintenance (as in our Experiment 1), they found that stiffness did not vary with (elbow) position or gravity load (two characteristics of our findings in Experiment 1):

      “The observed stiffness variation was not simply due to passive tissue or other joint angle dependent properties, as stiffnesses measured during posture were position invariant. Note that the minimum stiffness found in posture was higher than the peak stiffness measured during movement, and did not change much with the gravity load.” (illustrated in Fig. 5 of that paper)

      We thus find it very unlikely that stiffness explains the difference between the supported vs. unsupported conditions in Experiment 1.

      Even if stiffness modulation between the supported vs. unsupported conditions could explain our finding of stronger posture biases in the latter case, it would not be incompatible with our interpretation of increased RST drive: increased stiffness would potentially magnify the effects of the RST drive that we propose underlies these resting biases. It is possible that the increase in resting biases under conditions of increased muscle contraction (lack of arm support) is mediated through an increase in muscle stiffness. In other words, the increase in resting biases may not directly reflect additional RST outflow per se, but the scaling, through stiffness, of the same magnitude of RST outflow. Understanding this interaction was beyond the scope of our experiment design; in line with this, we briefly comment on it in our Limitations section.

      Regarding Experiment 2, stiffness has indeed been shown to be lower during movement, and we now comment on the potential effect of this on our results in the “Limitations” section (lines 815-830, replicated below). Importantly, for the case of holding perturbations, the increased stiffness associated with holding would increase resistance to both extension- and flexion-inducing perturbations. Thus, higher stiffness would be unlikely to explain our finding whereby resting biases resist or aggravate the effects of holding perturbations depending on perturbation direction. In addition, the framework in Blum et al., which describes how interactions between alpha and gamma drive can explain muscle activity patterns, does not rule out central neural control of stiffness: “muscle spindles have a unique muscle-within-muscle design such that their firing depends critically on both peripheral and central factors” (emphasis ours). It may be, for example, that gamma motoneurons controlling muscle spindles and stiffness are modulated by input from the reticular formation, making this a mechanism in line with our conceptual model.

      “Moreover, it has been shown that joint stiffness is reduced during movement compared to holding control (Rack and Westbury, 1974; Bennett et al., 1992). Along similar lines, muscle spindle activity – which may modulate stiffness – scales with extrafusal muscle fiber activity (such as muscle exertion involved in holding) and forces acting through the tendon (Blum et al., 2020). Such observations could, in principle, explain why we were unable to detect a relationship between resting biases and active movement control but we readily found a relationship between resting biases and active holding control: reduced joint stiffness during movement could scale down the influence of resting abnormalities. There are two issues with this explanation, however. First, it is debatable whether this should be considered an alternative explanation per se: stiffness modulation could be, in total or in part, the manifestation of a central movement/posture CST/RST mechanism similar to the one we propose in our conceptual model. For example, (Blum et al., 2020) argue that muscle spindle firing depends on both peripheral and central factors. Second, increased stiffness would not necessarily help detect differences in how active postural control responds to within-resting-posture vs. out-of-resting-posture perturbations. This is because an overall increase in stiffness would likely increase resistance to perturbations in any direction.”

      The authors should provide a limitations paragraph. They should address 1) how they used different perturbation force profiles, 2) the muscles were in different states which would change neuromuscular responses between trial phase / condition, 3) discuss a lack of direct statistical comparisons that support their hypothesis, and 4) provide a couple of paragraphs on classic neurophysiology, such as muscle stiffness and stretch reflexes, and how these various factors could influence the findings (i.e., whether they can disentangle whether the reported results are due to classic neurophysiology, the hypothesis they propose, or both).

      Thank you for your suggestion. We now discuss these points in a separate paragraph (lines 846-895), bringing together our previous discussion on stretch reflexes, our description of different perturbation types, and the additional issues raised by the reviewer above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have responded well to all my concerns, save two minor points.

      Figure 2 appears to be unchanged, although they describe appropriate changes in the response letter.

      Thank you for catching this error – we now include the updated figure (further updated to use the terms near/distant in place of proximal/distal).

      I still take issue with the use of proximal and distal to describe the locations of targets. Taking definitions somewhat randomly from the internet, "The terms proximal and distal are used in structures that are considered to have a beginning and an end," and "Proximal and distal are anatomical terms used to describe the position of a body part in relation to another part or its origin." In any case, the hand does not become proximal just because you bring it to your chest. Why not simply stick to the common and clearly defined terms "near" and "distant"?

      Point taken. We have updated the paper to use the terms near/distant.

      Additional changes/corrections not outlined above

      We now include a link to the data and code supporting our findings (https://osf.io/hufy8/). In addition, we made several minor edits throughout the text to improve readability, and corrected occasional mislabeling of CCW and CW pulse data. Note that this correction did not alter the (lack of) relationship between resting biases and responses to perturbations during active movement.

      Response letter references

      Bennett D, Hollerbach J, Xu Y, Hunter I (1992) Time-varying stiffness of human elbow joint during cyclic voluntary movement. Exp Brain Res 88:433–442.

      Blum KP, Campbell KS, Horslen BC, Nardelli P, Housley SN, Cope TC, Ting LH (2020) Diverse and complex muscle spindle afferent firing properties emerge from multiscale muscle mechanics. Elife 9:e55177.

      Cramer SC, Nelles G, Benson RR, Kaplan JD, Parker RA, Kwong KK, Kennedy DN, Finklestein SP, Rosen BR (1997) A functional MRI study of subjects recovered from hemiparetic stroke. Stroke 28:2518–2527.

      McPherson JG, Chen A, Ellis MD, Yao J, Heckman C, Dewald JP (2018) Progressive recruitment of contralesional cortico-reticulospinal pathways drives motor impairment post stroke. J Physiol 596:1211–1225 Available at: https://doi.org/10.1113/JP274968.

      Rack PM, Westbury D (1974) The short range stiffness of active mammalian muscle and its effect on mechanical properties. J Physiol 240:331–350.

      Robinson R, Herzog W, Nigg BM (1987) Use of force platform variables to quantify the effects of chiropractic manipulation on gait symmetry. J Manipulative Physiol Ther 10:172–176.

      Williams PE, Goldspink G (1973) The effect of immobilization on the longitudinal growth of striated muscle fibres. J Anat 116:45.

    1. What did you pay yesterday?

      1. I paid the rent. 2. He had it at home. 3. I had dinner at the restaurant last night. 4. She went out with some friends. 5. I had lunch at eleven thirty. 6. They went to bed at midnight. 7. I read a newspaper. 8. She got up at six fifteen. 9. I got up at eight twenty. 10. We went to the movies.

  3. inst-fs-iad-prod.inscloudgate.net inst-fs-iad-prod.inscloudgate.net
    1. Machines for these purposes are now of two types: keyboard machines for accounting and the like, manually controlled for the insertion of data, and usually automatically controlled as far as the sequence of operations is concerned; and punched-card machines in which separate operations are usually delegated to a series of machines, and the cards then transferred bodily from one to another. Both forms are very useful; but as far as complex computations are concerned, both are still in embryo.

      It is interesting to see optimism about both forms of technology, and to know that one of them clearly won out in terms of longevity of use.

    2. A record, if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted.

      In Coms2003A, media are described as having a bias toward either time or space. A space bias governs the mobility of a message; a time bias governs the longevity of a message.

    1. happiness is fundamentally egalitarian; it takes in the question of the other, whereas satisfaction, tied to the egoism of survival, ignores equality. Moreover, satisfaction does not depend on encounter or decision. It occurs when we find a good place in the world, a good job, a nice car and lovely holidays abroad. Satisfaction is the consumption of the things we struggle to obtain.

      happiness has to include the other; satisfaction does not.

    1. until the pressure inside the cuff reaches 20-25 cm H2O.

      ① until the pressure inside the cuff reaches 20-25 cm H₂O.

    Annotators

    1. In our consistent hashing model, we can change the number of nodes with only O(N/M + log M) runtime;

      This uses the Ring approach shown above...

      When you add Server D to the ring:

      Only the keys between Server A and the new Server D need to move. (Since we are moving clockwise only)
      

      Result: Smooth scaling. Your cache hit rate dips slightly instead of collapsing
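
      The ring behaviour described in this note can be sketched as follows. This is a minimal illustration, not the implementation from the annotated article: the MD5-based ring position, the `HashRing` class, and the server and key names are all assumptions made for the example. Lookup is a binary search over the sorted server positions (the log M part), and adding a server only reassigns the keys on the arc just before the new server (about N/M keys on average).

```python
import bisect
import hashlib


def ring_position(name: str) -> int:
    """Map a server or key name to a point on the ring (illustrative hash choice)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class HashRing:
    """Hypothetical minimal ring: one point per server, no virtual nodes."""

    def __init__(self, servers=()):
        self._points = []  # sorted ring positions of the servers
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        point = ring_position(server)
        bisect.insort(self._points, point)
        self._owner[point] = server

    def lookup(self, key: str) -> str:
        # Walk clockwise: first server at or after the key, wrapping to the start.
        i = bisect.bisect_left(self._points, ring_position(key))
        return self._owner[self._points[i % len(self._points)]]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["server-a", "server-b", "server-c"])
before = {k: ring.lookup(k) for k in keys}

ring.add("server-d")
after = {k: ring.lookup(k) for k in keys}

# Keys whose owner changed; every one of them should now belong to server-d.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

      Every key in `moved` ends up on the new server, and all other keys keep their previous owner, which is why the cache hit rate only dips. Real deployments usually place several virtual points per server to even out the arcs; that refinement is left out here.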

    1. What time is it

      1. It is eleven o'clock. 2. At one o'clock. 3. Tânia and Patrícia. 4. Beto. 5. No, they are going to an Italian restaurant. 6. Beto is going to watch television. 7. Eleven fifteen.

  4. drive.google.com drive.google.com
    1. o evaluate the tool to ensure it is fit for purpose o evaluate and verify the content generated o ensure that your own thoughts and efforts have provided a substantial contribution to the task o acknowledge generative AI usage with a reference or statement (refer to the library guidance).

      There are double dot points here that need to be removed

  5. minio.la.utexas.edu minio.la.utexas.edu
    1. “justice too long delayed is justice denied.”

      Action should not be delayed, because we only live so long; to bring justice to those who deserve it, it must be given as soon as possible!

    1. Joseph Priestley: discovered CO₂ (1772), N₂O (1772), and O₂ (1777). • 1842 – Crawford W. Long introduced the use of ether. • 1847 – In London, John Snow administered anesthesia with chloroform for many years. • 1864 – Lundy introduced trichloroethylene. • 1882 – Von Freud introduced cyclopropane. • 1951 – Suckling introduced halothane. • 1958 – Larssen introduced methoxyflurane.

      ① Joseph Priestley: discovered CO₂ (1772), N₂O (1772), and O₂ (1777).

      ② 1842 – Crawford W. Long introduced the use of ether.

      ③ 1847 – In London, John Snow administered anesthesia with chloroform for many years.

      ④ 1864 – Lundy introduced trichloroethylene.

      ⑤ 1882 – Von Freud introduced cyclopropane.

      ⑥ 1951 – Suckling introduced halothane.

      ⑦ 1958 – Larssen introduced methoxyflurane.

    Annotators

    1. A lower 2D:4D finger-length ratio may reflect greater prenatal androgen exposure in lesbians. Homosexual men and lesbians are also more often left-handed, which is related to hormonal effects. Finally, epigenetic factors can make the fetus more or less sensitive to androgens.

      ?

    1. Author response:

      (1) General Statements

      Our manuscript studies mechanisms of planar polarity establishment in vivo in the Drosophila pupal wing. Specifically we seek to understand mechanisms of ‘cell-scale signalling’ that is responsible for segregating core pathway planar polarity proteins to opposite cell edges. This is an understudied question, in part because it is difficult to address experimentally.

      We use conditional and restrictive expression tools to spatiotemporally manipulate core protein activity, combined with quantitative measurement of core protein distribution, polarity and stability. Our results provide evidence for a robust cell-scale signal, while arguing against mechanisms that depend on depletion of a limited pool of a core protein or polarised transport of core proteins on microtubules. Furthermore, we show that polarity propagation across a tissue is hard, highlighting the strong intrinsic capacity of individual cells to establish and maintain planar polarity.

      The original manuscript received three fair and thorough peer-reviews, which raised many important points. In response, we decided to embark on a full revision that attempts to answer all of the points. We have included new data to support our conclusions in Supplemental Figures 1, 2 and 5.

      Additionally in response to the reviewers we have revised the manuscript title, which is now ‘Characterisation of cell-scale signalling by the core planar polarity pathway during Drosophila wing development’.

      (2) Point-by-point description of the revisions

      We thank all of the reviewers for their thorough and thoughtful review of our manuscript. They raise many helpful points which have been extremely useful in assisting us to revise the manuscript.

      In response we have carried out a major revision of the manuscript, making numerous changes and additions to the text and also adding new experimental data. Specific changes are listed after our detailed response to each comment.

      Reviewer #1:

      […] Major points:

      The exact meaning of cell-scale signaling is not defined, but I infer that the authors use this term to describe how what happens on one side of a cell affects another side. The remainder of my critique depends on this understanding of the intended meaning.

      As the reviewer points out, it is important that the meaning of the term ‘cell-scale signalling’ is clear to the reader and in response to their comment we have had another go at defining it explicitly in the Introduction to the manuscript.

      Specifically, we use the term ‘cell-scale signalling’ to describe possible intracellular mechanisms acting on core protein segregation to opposite cell membranes during core pathway dependent planar polarisation. For example, this could be a signal from distal complexes at one side of the cell leading to segregation of proximal complexes to the opposite cell edge, or vice versa. See also our response to Reviewer #2 regarding the distinction between ‘molecular-scale’ and ‘cell-scale’ signalling. 

      Changes to manuscript: Revised definition of ‘cell-scale signalling’ in Introduction.

      The authors state that any tissue wide directional information comes from pre-existing polarity and its modification by cell flow, such that the de novo signaling paradigm "bypasses" these events and should therefore not be responsive to any further global cues. It is my understanding that this is not a universally accepted model, and indeed, the authors' data seem to suggest otherwise. For example, the image in Fig 5B shows that de novo induction restores polarity orientation to a predominantly proximal to distal orientation. If no global cue is active, how is this orientation explained?

      We assume that the reviewer’s point is that it is not universally accepted that de novo induction after hinge contraction leads to uncoupling from global cues (rather than that it is not accepted that hinge contraction remodels radial polarity to a proximodistal pattern). We are (we believe) the only lab that has used de novo induction as a tool, and we’re not aware of any debate in the literature about whether this bypasses global cues. Nevertheless, we accept that it is hard to prove there is no influence of global cues, when the nature of those cues and the time at which they act remain unclear. Below we summarise the reasons why we believe there are no significant effects of global cues in our experiments that would influence the interpretation of our results.

      First, our reading of the literature supports a broad consensus that an early radial core planar polarity pattern is realigned by cell flow produced by hinge contraction beginning at around 16h APF (e.g. Aigouy et al., 2010; Strutt and Strutt, 2015; Aw and Devenport, 2017; Butler and Wallingford, 2017; Tan and Strutt, 2025). Taken at face value, this suggests that there are ‘radial’ cues present prior to hinge contraction, maybe coming from the wing margin – arguably these radial cues could be Ft-Ds or Wnts or both, given they are expressed in patterns consistent with such a role (notwithstanding the published evidence arguing against roles for either of these cues). It then appears that hinge contraction supersedes these cues to convert a radial pattern to a proximodistal pattern – whether the radial cues that affect the core pathway earlier remain active after hinge contraction is unclear, although both Ft-Ds and Wnts appear to maintain their ‘radial’ patterns beyond the beginning of hinge contraction (e.g. Merkel et al., 2014; Ewen-Campen et al., 2020; Yu et al., 2020).

      We think that the reviewer is proposing the presence of a proximodistal cue that is active in the proximal region of the wing that we use for our experiments shown e.g. in Fig.5, and that this cue orients core polarity here (but not elsewhere in the wing) in a time window after 18h APF. Ft-Ds and Wnts do not seem to be plausible candidates as they are still in ‘radial’ patterns. This leaves either an unknown proximodistal cue (a gradient of some unknown signalling molecule?), or possibly some ability of hinge contraction to align proximodistal polarity specifically in this wing region but not elsewhere. We cannot definitively rule out either of these possibilities, but neither do we think there is sufficient evidence to justify invoking their existence to explain our observations.

      In particular, the reason that we don’t think there is a proximodistal cue in the proximal part of the wing after 18h APF, is that work from our lab shows that induction of Fz or Stbm expression at times around or after the start of hinge contraction (i.e. >16 h APF) results in increasing levels of trichome swirling with polarity not being coordinated with the tissue axis either proximally or distally (Strutt and Strutt, 2002; Strutt and Strutt 2007). Our simplest interpretation for this is that induction at these stages fails to establish the early radial pattern of core pathway polarity and hence hinge contraction cannot reorient radial to proximodistal. If hinge contraction alone could specify proximodistal polarity in the absence of the earlier radial polarity, then we would not expect to see swirling over much of the proximal wing (where the forces from hinge contraction are strongest (Etournay et al., 2015)).

      In this manuscript, our earliest de novo experiments begin with Fz induction at 18h APF (de novo 10h), then at 20h APF (de novo 8h) and at 22h APF (de novo 6h). The image in Fig. 5B, referred to by the reviewer, is of a wing where Fz is induced de novo at 22 h APF. In these wings, as expected, the core proteins localise asymmetrically in stereotypical swirling patterns throughout the wing surface (see Fig. 2M and also Strutt and Strutt, 2002; Strutt and Strutt 2007), but – usefully for our experiments – they broadly localise along the proximal-distal axis in the region analysed in Fig. 5B. Given the strong swirling in surrounding regions when inducing at >20h APF, we feel reasonably confident in assuming that the pattern is not due to a proximodistal cue present in the proximal wing.

      We appreciate that the original manuscript did not show images including the trichome pattern in adjacent regions, so this point would not have been clear, but we now include these in Supplementary Fig. 5. We have also added a note in the legend to Fig. 5B to clarify that the proximodistal pattern seen is local to this wing region. We apologise for this oversight and the confusion caused and appreciate the feedback.

      The 6 hr condition, that has only partial polarity magnitude, is quite disordered. Do the patterns at 8 and 10 hrs become more proximally-distally oriented? It is stated that they all show swirls, but please provide adult wing images, and the corresponding orientation outputs from QuantifyPolarity to help validate the notion that the global cues are indeed bypassed by this paradigm.

      In all three ‘normal’ de novo conditions (6h, 8h and 10h), regardless of the time of induction, the polarity orientation patterns of Fz-mKate2 in pupal and adult wings are very similar in the experimentally analysed region (Fig. S5B-E). The strong local hair swirling agrees with the previous published data (Strutt and Strutt, 2002; Strutt and Strutt 2007). Overall, we don’t see any evidence that the 10h de novo induction results in more proximodistally coordinated polarity than the 8h or 6h conditions. This is consistent with our contention that there is no global cue present at these stages, which presumably would have a stronger effect when core pathway activity was induced at earlier stages.

      Changes to manuscript: Added additional explanation of the ‘de novo induction’ paradigm and why we believe the resulting polarity patterns are unlikely to be influenced by any global signals in Introduction and Results section ‘Induced core protein relocalisation…’. Added quantification of polarity in the experiment region proximal to the anterior cross-vein in pupal wings (Fig.S5E-E’’’) and zoomed-out images of the surrounding region in adult wings showing that the polarity pattern does not become more proximodistal when induction time is longer, and also that there is not overall proximodistal polarity in proximal regions of the wing (Fig.S5B-D), arguing against an unknown proximodistal polarity cue at these stages of development.

      In the de novo paradigm, polarization is initiated immediately or shortly after heat shock induction. However, the results should be differently interpreted if the level of available Fz protein does not rise rapidly and then stabilize before the 6 hr time point, and instead continues to rise throughout the experiment. Western blots of the Fz::mKate2-sfGFP at time points after induction should be performed to demonstrate steady state prior to measurements. Otherwise, polarity magnitude could simply reflect the total available pool of Fz at different times after induction. Interpreting stability is complex, and could depend on the same issue, as well as the amount of recycling that may occur. Prior work from this lab using FRAP suggested that turnover occurs, and could result from recycling as well as replenishment from newly synthesized protein. 

      The reviewer raises an important point, which we agree could confound our experimental interpretations. As suggested we have now carried out western blotting and quantitation for Fz::mKate2-sfGFP levels and added these data to Fig.S1 (Fig. S1C,D). Quantified Fz is not significantly different between the three de novo polarity induction timings and not significantly different compared to constitutive Fz::mKate2-sfGFP expression (although there is a trend towards increasing Fz::mKate2-sfGFP protein levels with increasing induction times). These data are consistent with Fz::mKate2-sfGFP being at steady state in our experiments and that levels are sufficient to achieve normal polarity (as constitutive Fz::mKate2-sfGFP does so). Therefore it is unlikely that differing protein levels explain the differing polarity magnitudes at the different induction times. Interestingly, Fz::mKate2-sfGFP levels are lower than endogenous Fz levels, possibly due to lower expression or increased turnover/reduced recycling.

      Changes to manuscript: Added western blot analysis of Fz::mKate2-sfGFP expression under 10h, 8h and 6h induction conditions vs endogenous Fz expression and constitutive Fz::mKate2-sfGFP expression (Fig.S1C-D) and discussed in Results section ‘Planar polarity establishment is…’.

      From the Fig 3 results, the authors claim that limiting pools of core proteins do not explain cell-scale signaling, a result expected based on the lack of phenotypes in heterozygotes, but of course they do not test the possibility that Fz is limiting. They do note that some other contributing protein could be.

      Previously published results from our lab (Strutt et al., 2016 Cell Reports; Supplemental Fig. S6E) show that in a heterozygous fz mutant background, Fz protein levels are not affected by halving the gene dosage when compared to wt, suggesting that Fz is most likely produced in excess and is not normally limiting, but that protein that cannot form complexes may be rapidly degraded. We have now added this information to the text.

      Changes to manuscript: Added explanation in text that Fz levels had previously been shown to not be dosage sensitive in Results section ‘Planar polarity establishment is…’ and also added a caveat to the Discussion about not directly testing Fz.

      In Fig 3, it is unclear why the authors chose to test dsh1/+ rather than dsh[null]/+. In any case, the statistically significant effect of Dsh dose reduction is puzzling, and might indicate that the other interpretation is correct. Ideally, a range including larger and smaller reductions would be tested. As is, I don't think limiting Dsh is ruled out. 

      Concerning the choice of dsh allele, we appreciate the query of the reviewer regarding use of dsh[1] instead of a null, as there might be a concern that dsh[1] would give a less strong phenotype. The answer is that over more than two decades we and others have never found any evidence that dsh[1] does not act as a ‘null’ for planar polarity in the pupal wing, and furthermore use of dsh[1] preserves function in Wg signalling – and we would prefer to rule out any phenotypic effects due to any potential cross-talk between the two pathways that might be seen using a complete null. To expand on this point, dsh[1] mutant protein is never seen at cell junctions (Axelrod 2001; Shimada et al., 2001; our own work), and by every criterion we have used, planar polarity is completely disrupted in hemizygous or homozygous mutants e.g. see quantifications of polarity in (Warrington et al., 2017 Curr Biol).

      In terms of the broader point, whether we can rule out Dsh being limiting, we were very careful to be clear that we did not see evidence for Dsh (or other core proteins) being limiting in terms of ‘rates of core pathway de novo polarisation’. When the reviewer says ‘the statistically significant effect of Dsh dose reduction is puzzling’ we believe they are referring to the data in Fig. 3J, showing a small but significantly different reduction in stable Fz in de novo 6h conditions (also seen in 8h de novo conditions, Fig. S3I). As Dsh is known to stabilise Fz in complexes (Strutt et al., 2011 Dev Cell; Warrington et al., 2017 Curr Biol), in itself this result is not wholly surprising. Nevertheless, while this shows that halving Dsh levels does modestly reduce Fz stability, it does not alter our conclusion that halving Dsh levels does not affect Fz polarisation rate under either 6h or 8h de novo conditions.

      Unfortunately, we do not have available to us a practical way of achieving consistent intermediate reductions in Dsh levels (e.g. a series of verified transgenes expressing at different levels). Levels of all the core proteins could be dialled down using transgenes, to see when the system breaks, and indeed we have previously published that lower levels of polarity are seen if Fmi levels are <<50% or if animals are transheterozygous for pk, stbm, dgo or dsh, pk, stbm, dgo simultaneously (Strutt et al., 2016 Cell Reports). However, it seems to be a trivial result that eventually the ability to polarise is lost if insufficient core proteins are present at the junctions. For this reason we have focused on a simple set of experiments reducing gene dosage singly by 50% under two de novo induction conditions, and have been careful to state our results cautiously. The assays we carried out were a great deal of work even for just the 5 heterozygous conditions tested.

      We believe that the experiments shown effectively make the point that there is no strong dosage sensitivity – and it remains our contention that if protein levels were the key to setting up cell-scale polarity, then a 50% reduction would be expected to show an effect on the rate of polarisation. We further note that as Fz::mKate2-sfGFP levels are lower than endogenous Fz levels (see above), the system might be expected to be sensitised to further dosage reductions, and despite this we failed to see an effect on rate of polarisation.

      We note that Reviewer #3 made a similar point about whether we can rule out dosage sensitivity on the basis of 50% reductions in protein level. To address the comments of both reviewers we have now added some further narrative and caveats in the text.

      In a similar vein, Reviewer #2 requested data on whether dosage reduction altered protein levels by the expected amount. We have now added further explanation/references and western blot data to address this.

      Changes to manuscript: Added more explanation of our choice of dsh[1] as an appropriate mutant allele to use in Results section ‘Planar polarity establishment is…’. Added some narrative and caveats regarding whether lowering levels more than 50% would add to our findings in the Discussion. Revised conclusions to be more cautious including altering section title to read ‘Planar polarity establishment is not highly sensitive to variation in protein levels of core complex components’.

      Also added westerns and text/references showing that for the tested proteins there is a reduction in protein levels upon removal of one gene dosage in Results section ‘Planar polarity establishment is…’ and Fig.S2.

      The data in Fig 5 are somewhat internally inconsistent, and inconsistent with the authors' interpretation. In both repolarization conditions, the authors claim that repolarization extends only to row 1, and row 1 is statistically different from non-repolarized row 1, but so too is row 3. Row 2 is not. This makes no sense, and suggests either that the statistical tests are inappropriate and/or the data is too sparse to be meaningful. 

      As we’re sure the reviewer appreciates, this was an extremely complex experiment to perform and analyse. We spent a lot of time trying to find the best way to illustrate the results (finally settling on a 2D vector representation of polarity) and how to show the paired statistical comparisons between different groups. Moreover, in the end we were only able to detect generally quite modest (statistically significant) changes in cell polarity under the experimental conditions.

      However, we note that failure to see large and consistent changes in polarity is exactly the expected result if it is hard to repolarise from a boundary – and this is of course the conclusion that we draw. Conversely, if repolarisation were easy, which was our expectation at least under de novo conditions without existing polarity, then we would have expected large and highly statistically significant changes in polarity across multiple cell rows. Hence we stand by our conclusion that ‘it is hard to repolarise from a boundary of Fz overexpression in both control and de novo polarity conditions’.

      Overall, we were trying to establish three points:

      (1) to demonstrate that repolarisation occurs from a boundary of overexpression i.e. from boundary 0 to row 0

      (2) to establish whether a wave of repolarisation occurs across rows 1, 2 and 3

      (3) to determine if in repolarisation in de novo condition it is easier to repolarise than in repolarisation in the control (already polarised) condition Taking each in turn:

      (1) To detect repolarisation from a boundary relative to the control condition, we have to compare row 0 in repolarisation condition (Fig.5G,K) vs control condition (Fig.5F,J). This comparison shows a significative repolarisation (p=0.0014). From now, row 0 in repolarisation condition is our reference for repolarisation occurring.

      (2) To determine if there is a wave of repolarisation in the repolarisation condition we have to compare row 0 vs row 1 to 3 in the repolarisation condition (Fig.5K). Row 1 is not significantly different to row 0, but rows 2 and 3 are different and the vectors show obviously lower polarity than row 0. Hence no wave of repolarisation is detected over rows 1 to 3.

      (3) To determine if it is easier to repolarise in the de novo condition, our reference for establishment of a repolarisation pattern is the polarisation condition in rows 0 to 3. So, we compare repolarisation condition vs repolarisation in de novo condition, row 0 vs row 0, row 1 vs row 1, row 2 vs row 2 and row 3 vs row 3 – in each case no significant difference in polarity is detected, supporting our conclusion that it is not easier to repolarise in the de novo condition.

      We agree that the variations in row 3 are puzzling, but there is no evidence that this is due to propagation of polarity from row 0, and so in terms of our three questions, it does not alter our conclusions.

      Changes to manuscript: We have extensively revised the text describing the results in Fig.5 to hopefully make the reasons for our conclusions clearer and also be more cautious in our conclusions in Results section ‘Induced core protein relocalisation…’. 

      For the related boundary intensity data in Fig 6, the authors need to describe exactly how boundaries were chosen or excluded from the analysis. Ideally, all boundaries would be classified as either medio-lateral (meaning anterior-posterior) or proximal-distal depending on angle.

      We thank the reviewer for pointing out that this was not clear.

      All boundaries were classified following their orientation compared to the Fz over-expression boundary using hh-GAL4 expressed in the wing posterior compartment. Horizontal junctions were defined as parallel to the Fz over-expression boundary (between 0 and 45 degrees) and mediolateral junctions as junctions linking two horizontal boundaries (between 45 and 90 degrees).
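
      As an illustration only, the angle-based rule described above can be sketched as follows. This is not the analysis code used for the manuscript: the function names, the representation of a junction by its two endpoint coordinates, and the choice to assign a junction at exactly 45 degrees to the mediolateral class are all assumptions made for the example.

```python
import math


def junction_angle_deg(p1, p2, boundary_angle_deg=0.0):
    """Acute angle (0 to 90 degrees) between a junction and the reference boundary.

    p1, p2: (x, y) endpoints of the junction (hypothetical input format).
    boundary_angle_deg: orientation of the Fz over-expression boundary in the image.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    angle = math.degrees(math.atan2(dy, dx)) - boundary_angle_deg
    angle = abs(angle) % 180.0        # a junction is an undirected line
    return min(angle, 180.0 - angle)  # fold into the 0 to 90 degree range


def classify_junction(p1, p2, boundary_angle_deg=0.0, threshold_deg=45.0):
    """Label a junction 'horizontal' (below threshold) or 'mediolateral' (otherwise)."""
    angle = junction_angle_deg(p1, p2, boundary_angle_deg)
    return "horizontal" if angle < threshold_deg else "mediolateral"


if __name__ == "__main__":
    print(classify_junction((0, 0), (10, 1)))  # nearly parallel to the boundary
    print(classify_junction((0, 0), (1, 10)))  # nearly perpendicular to the boundary
```

      Folding the angle into the 0 to 90 degree range means the result does not depend on the order in which the two endpoints are listed.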

      Changes to manuscript: The boundary classification detailed above has been added in the Materials and Methods.

      If the authors believe their Fig 5 and 6 analyses, how do they explain that hairs are reoriented well beyond where the core proteins are not? This would be a dramatic finding, because as far as I know, when core proteins are polarized, prehair orientation always follows the core protein distribution. Surprisingly, the authors do not so much as comment about this. The authors should age their wings just a bit more to see whether the prehair pattern looks more like the adult hair pattern or like that predicted by their protein orientation results.

      Again the reviewer makes an interesting point, and we agree that this is something that we should have more directly addressed in the manuscript.

      There are three reasons why we might expect adult trichomes to show a different effect from the measured core protein polarity pattern seen in our experiments:

      (i) we are assaying core protein polarity at 28h APF, but trichomes emerge at >32h APF, so there is still time for polarity to propagate a bit further from the boundary. We now have added data showing that by the point of trichome initiation, the wave of polarisation extends 3-4 cell rows (Fig.S5A).

      (ii) it has long been known that a strong localisation of core proteins at a cell edge is not required for polarisation of trichome polarity from a boundary. For instance, in Strutt & Strutt 2007 we show clones of cells overexpressing Fz causing propagation through pk[pk-sple] mutant tissue where there is no detectable core protein polarity. We were following up prior observations of Adler et al., 2000 in the wing and Lawrence et al., 2004 in the abdomen.

      (iii) there is evidence to suggest that the polarity of adult trichomes is locally coupled, possibly mechanically. This point is hard to prove without live imaging taking in both initial core protein localisation, the site of actin-rich trichome initiation and then the final orientation of the much larger microtubule filled trichome, and we’re not aware that such data exist. However, Wong & Adler 1993 (JCB) showed that over a number of hours trichomes become much larger and move towards the centre of the cell, presumably becoming decoupled from any core protein cue. The images in Guild … & Tilney, 2005 (MBoC)  are also interesting to look at in this regard. Finally, septate junction proteins have been implicated in local alignment of trichomes, independently of the core pathway (Venema … & Auld, 2004 Dev Biol).

      Changes to manuscript: Added new data in Fig.S5A showing where trichomes initiate under 6h de novo induction conditions, for comparison to core protein localisation and adult trichome data in Fig.5. Added some text explaining why adult trichome repolarisation might be stronger than the observed effects on core protein localisation in Discussion. 

      Minor points:

      As the authors know, there is a model in the literature that suggests microtubule trafficking provides a global cue to orient PCP. The authors' repolarization data in Fig 4 make a reasonably convincing case against a role for microtubules in cell-scale signaling, but do not rule out a role as a global cue. The authors should be careful of language such as "...MTs and core proteins being oriented independently of each other" that would appear to possibly also refer to a role as a global cue.

      Thank you for pointing out that this was not clear. We have now modified the text to hopefully address this.

      Changes to manuscript: Text updated in Results section ‘Microtubules do not provide…’.

      Significance:

      There are two negative conclusions and one positive conclusion made by the authors. Provided the above points are addressed, the negative conclusions, that core proteins are not limiting and that microtubules are not involved in cell-scale signaling are solid. The positive conclusion is more nebulous - the authors say that cell-scale signaling is strong relative to cell-cell signaling - but how strong is strong? Strong relative to their prior expectations? I'm not sure how to interpret such a conclusion. Overall, we learn something from these results, though it fails to reveal anything about mechanism. These results will be of some interest to those studying PCP.

      The reviewer raises an interesting point, which is how do you compare the strength of two different processes, even if both processes affect the same outcome (in this case cell polarity). Repolarisation from a boundary has not been carefully studied at the level of core protein localisation in any previous study to our knowledge – this is one of the important novel aspects of this study. Hence there is not a baseline for defining strong repolarisation. Similarly, there has been no investigation of the nature of ‘cell-scale signalling’. This was a considerable challenge for us in writing the manuscript, and we have done our best to find appropriate language that hopefully conveys our message adequately. Minimally our work may provide a baseline for helping to define the ‘strengths’ of these processes in future studies.

      One of our main points is that we can generate an artificial boundary of Fz expression, where Fz levels are at least several fold higher than in the neighbouring cell (e.g. compare Fig.4N’ and O’) and only two rows of cells show a significant change in polarity relative to controls. Even when the tissue next to the overexpression domain is still in the process of generating polarity (de novo condition) then the boundary has little effect on polarity in neighbouring cell rows. This was a result that surprised us, and we tried to convey that by using language to suggest cell-scale signalling was stronger than cell-cell signalling i.e. stronger in terms of the ability to define the final direction of polarity.

      Changes to manuscript: In the revised manuscript we have reviewed our use of language and now avoid saying ‘strong’ but instead use terms such as ‘effective’ and ‘robust’ in e.g. Results section ‘Induced core protein relocalisation…’, the Discussion and we have also changed the title of the manuscript to avoid claiming a ‘strong’ signal.

      Reviewer #2:

      […] Critique

      The experiments described in this paper are of high quality with a sophisticated level of design and analysis. However, there needs to be some recalibration of the extent of the conclusions that can be drawn (see below). Moreover, a limitation of this paper is that, despite the quality of their data, they cannot give a molecular hint about the nature of their proposed cell-scale signal. Below are two key points that the authors may want to clarify.

      (1) The first set of repolarisation experiment is performed after the global cell rearrangements that have been shown to act as global signal. However, this approach does not exclude the possible contribution of an unknown diffusible global signal.

      A similar point was raised by Reviewer 1. For the convenience of this reviewer, we’ll summarise the arguments against such an unknown cue again below. More broadly, both reviewers asking a similar question indicates that we have failed to lay out the evidence in sufficient detail. In our defence, we have used the same ‘de novo’ paradigm in three previous publications (Strutt and Strutt 2002, 2007; Brittle et al 2022) without attracting (overt) controversy. We have now added text to the Introduction and Results that goes into more detail, as well as more experimental evidence (Fig.S5).

      Firstly, it is worth noting that the global cues acting in the wing are poorly understood, with mostly negative evidence against particular cues accruing in recent years. This makes it a hard subject to succinctly discuss. Secondly, we accept that it is hard to prove there is no influence of global cues, when the nature of those cues and the time at which they act remain unclear. Below we summarise the reasons why we believe there are no significant effects of global cues in our experiments that would influence the interpretation of our results.

      First, our reading of the literature supports a broad consensus that an early radial core planar polarity pattern is realigned by cell flow produced by hinge contraction beginning at around 16h APF (e.g. Aigouy et al., 2010; Strutt and Strutt, 2015; Aw and Devenport, 2017; Butler and Wallingford, 2017; Tan and Strutt, 2025). Taken at face value, this suggests that there are ‘radial’ cues present prior to hinge contraction, maybe coming from the wing margin – arguably these radial cues could be Ft-Ds or Wnts or both, given they are expressed in patterns consistent with such a role (notwithstanding the published evidence arguing against roles for either of these cues). It then appears that hinge contraction supersedes these cues to convert a radial pattern to a proximodistal pattern – whether the radial cues that affect the core pathway earlier remain active after hinge contraction is unclear, although both Ft-Ds and Wnts appear to maintain their ‘radial’ patterns beyond the beginning of hinge contraction (e.g. Merkel et al., 2014; Ewen-Campen et al.,2020; Yu et al., 2020).

      We think that the reviewers are proposing the presence of a proximodistal cue that is active in the proximal region of the wing that we use for our experiments shown e.g. in Fig.5, and that this cue orients core polarity here (but not elsewhere in the wing) in a time window after 18h APF. Ft-Ds and Wnts do not seem to be plausible candidates as they are still in ‘radial’ patterns. This leaves either an unknown proximodistal cue (a gradient of some unknown signalling molecule?), or possibly some ability of hinge contraction to align proximodistal polarity specifically in this wing region but not elsewhere. We cannot definitively rule out either of these possibilities, but neither do we think there is sufficient evidence to justify invoking their existence to explain our observations.

      In particular, the reason that we don’t think there is a proximodistal cue in the proximal part of the wing after 18h APF, is that work from our lab shows that induction of Fz or Stbm expression at times around or after the start of hinge contraction (i.e. >16 h APF) results in increasing levels of trichome swirling with polarity not being coordinated with the tissue axis either proximally or distally (Strutt and Strutt, 2002; Strutt and Strutt 2007). Our simplest interpretation of this is that induction at these stages fails to result in the early radial pattern of core pathway polarity being established and hence a failure of hinge contraction to reorient radial to proximodistal. If hinge contraction alone could specify proximodistal polarity in the absence of the earlier radial polarity, then we would not expect to see swirling over much of the proximal wing (where the forces from hinge contraction are strongest, Etournay et al., 2015).

      In this manuscript, our earliest de novo experiments begin at 18h APF (de novo 10h), then at 20h APF (de novo 8h) and at 22h APF (de novo 6h). The image in Fig. 5B referred to by Reviewer 1, is of a wing where Fz is induced de novo at 22 h APF. In these wings, as expected, the core proteins localise asymmetrically in stereotypical swirling patterns throughout the wing surface (see Fig. 2M and also Strutt and Strutt, 2002; Strutt and Strutt 2007), but – usefully for our experiments – they broadly localise along the proximal-distal axis in the region analysed in Fig. 5B. Given the strong swirling in surrounding regions when inducing at >20h APF, we feel reasonably confident in assuming that the pattern is not due to a proximodistal cue present in the proximal wing. We appreciate that the original manuscript did not show images including the trichome pattern in adjacent regions, so this point would not have been clear, but we now include these in Supplementary Fig.S5. We have also added a note in the legend to Fig. 5B to clarify that the proximodistal pattern seen is local to this wing region.

      Changes to manuscript: Text extended in Introduction and Results to better explain why we believe the de novo conditions that we use most likely result in a polarity pattern that is not significantly influenced by ‘global cues’. Now show zoomed-out images of the surrounding region around the experiment region proximal to the anterior cross-vein region in adult wings, showing that the polarity pattern does not become more proximodistal when induction time is longer, and also that there is not overall proximodistal polarity in proximal regions of the wing, arguing against an unknown proximodistal polarity cue at these stages of development (Fig.S5B-E’’’).

      (2) The putative non-local cell scale signal must be more precisely defined (maybe also given a better name). It is not clear to me that one can separate cell-scale from molecular-scale signal.

      Local signals can redistribute within a cell (or membrane) so local signals are also cell-scale. Without a clear definition, it is difficult to interpret the results of the gene dosage experiments. The link between gene dosage and cell-scale signal is not rigorously stated. Related to this, the concluding statement of the introduction is too cryptic.

      We thank the reviewer for raising this, as again a similar comment was made by Reviewer 1, so we are clearly falling short in defining the term. We have now had another attempt in the Introduction.

      To more specifically answer the point made by the reviewer regarding molecular vs cellular, we are essentially being guided here by the prior computational modelling work, as at the biological level the details are still being worked out. A specific class of previous models only allowed ‘signals’ between core proteins to act ‘locally’, meaning within a cell junction, and within the models there was no explicit mechanism by which proteins on other junctions could ‘detect’ the polarity of a neighbouring junction (e.g. Amonlirdviman et al., 2005; Le Garrec et al., 2006; Fischer et al., 2013). Other models implicitly or explicitly encode a mechanism by which cell junctions can be influenced by the polarity of other junctions (e.g. Meinhardt, 2007; Burak and Shraiman, 2009; Abley et al., 2013; Shadkhoo and Mani, 2019), for instance by diffusion of a factor produced by localisation of particular planar polarity proteins.

      We agree with the reviewer that a cell-scale signal will depend on ‘molecules’ and thus could be called ‘molecular-scale’, but here by ‘molecular-scale’ we mean signals that act at the range of the sizes of molecules i.e. nanometers, rather than cell-scale signals that act at the size of cells i.e. micrometers. A caveat to our definition is that we implicitly include interactions that occur locally on cell junctions (<1 µm range) within ‘molecular-scale’, but this is a shorter range than ‘cellular-scale’ which requires signals acting over the diameter of a cell (3-5 µm). Nevertheless, we think the concept of ‘molecular-scale’ vs ‘cell-scale’ is a helpful one in this context, and have attempted to address the issue through a more careful definition of the terms.

      Changes to manuscript: Text revised in Introduction and legend to Fig.1 to more carefully define ‘cell-scale signalling’ and to distinguish it from ‘molecular-scale signalling’. Final sentence of Introduction also altered so we no longer cryptically speculate on the nature of the cell-scale signal but leave this to the Discussion.

      Minor comments. 

      Some of the (clever) genetic manipulation may need more details in the text. For example:

      - Need to specify if the hs-flp approach induces expression throughout the tissue.

      We apologise for the lack of clarity. In all the experiments, the hs-FLP transgene is present in all cells, and heat-shock results in ubiquitous expression. 

      Changes to manuscript: We have clarified this in the Results and Materials and Methods.

      - Need to specify in the text that in the unpolarised condition the tissue is both dsh and fz mutant.

      The reviewer is of course correct and we have updated this point in the text. The full genotype for the unpolarised condition is: w dsh<sup>1</sup> hsFLP22/y;; Act>>fz-mKate2sfGFP, fz<sup>P21</sup>/fz<sup>P21</sup> (see Table S1). So this line is mutant for dsh and fz with induced expression of Fz-mKate2sfGFP. 

      Changes to manuscript: We have clarified this in the relevant part of the Results.

      - Need to specify in the text that the experiment illustrated in Fig 5 is with hh-gal4. 

      As noted by the reviewer, we continued to use the same hh-GAL4 repolarisation paradigm as in Fig.4 and this info was in the legend to Fig.5. However, we agree it is helpful to be explicit about this in the main text.

      Changes to manuscript: We have added this to this section of the Results.

      - Need to address a possible shortcoming of the hh experiment, that the AP boundary is a region of high tension.

      It is true that the AP boundary is under high tension in the wing disc (e.g. Landsberg et al., 2009). But we are not aware of any evidence that this higher tension persists into the pupal wing. In separate studies we have labelled for Myosin II in pupal wings (Trinidad et al 2025 Curr Biol; Tan & Strutt 2025 Nature Comms), and as far as we have noticed have not seen preferentially higher levels on the AP boundary. We think if tension were higher, the cell boundaries would appear straighter than in surrounding cells (as seen in the wing disc) and this is not evident in our images.

      - Need to dispel the possibility that there is no residual polarisation (e.g. of other components) in fz1 mutant (I assume this is the case).

      We use the null allele fz[P21] throughout this work, and we and others have consistently reported a complete loss of polarisation of other core proteins or downstream components in this background. The caveat to this is that core proteins that persist at cell junctions always appear at least slightly punctate in mutant backgrounds for other core proteins, and so any automated detection algorithm will always find evidence of individual cell polarity above a baseline level of uniform distribution. Hence we tend to use lack of local coordination of polarity (variance of cell polarity angle) as an additional measure of loss of polarisation, in addition to direct measures of average cell polarity. (We discuss this in the QuantifyPolarity manuscript Tan et al 2021 e.g. Fig.S6).
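
      To illustrate what a variance of cell polarity angle can look like in practice, here is a minimal sketch. It is not the QuantifyPolarity implementation. It assumes that polarity angles are axial (an angle and that angle plus 180 degrees describe the same axis) and uses the standard angle-doubling form of circular variance; the function name and the input format (a list of angles in degrees for a group of neighbouring cells) are hypothetical.

```python
import math


def axial_circular_variance(angles_deg):
    """Circular variance for axial data (angles equivalent modulo 180 degrees).

    Returns a value near 0 when all cells share one polarity axis and
    near 1 when the axes are spread evenly (no local coordination).
    """
    if not angles_deg:
        raise ValueError("need at least one angle")
    n = len(angles_deg)
    # Doubling each angle maps the 180-degree period onto a full circle,
    # so that 0 and 180 degrees count as the same axis.
    c = sum(math.cos(math.radians(2 * a)) for a in angles_deg) / n
    s = sum(math.sin(math.radians(2 * a)) for a in angles_deg) / n
    return 1.0 - math.hypot(c, s)


if __name__ == "__main__":
    print(axial_circular_variance([10, 12, 8, 11]))    # well-coordinated axes
    print(axial_circular_variance([0, 45, 90, 135]))   # evenly spread axes
```

      A measure of this kind can report poor coordination between neighbouring cells even when each individual cell still scores above the uniform-distribution baseline for polarity magnitude.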

      Changes to manuscript: We now include in the Materials and Methods section ‘Fly genetics…’ a much more extensive explanation of the evidence for specific mutant alleles being ‘null’ for planar polarity function (including dsh1 as raised by Reviewer 1), specifically that they result in no detectable planar polarisation of either other core proteins or downstream effectors, and added appropriate references.

      - Need to provide evidence that 50% gene dosage commensurately affect protein level. 

      This is a good suggestion. In the case of Stbm, we have already published a western blot showing that a reduction in gene dosage results in reduced protein levels (Strutt et al 2016, Fig.S6). We have now performed western blots to quantify protein levels upon reduction of fmi, pk and dgo levels (we actually used EGFP-dgo for the latter, as we don’t have antibodies that can detect endogenous Dgo on western blots).

      Changes to manuscript: When presenting the dosage reduction experiments, we now refer back to Strutt et al., 2016 explicitly for Stbm, and have added western blot data for Fmi, Pk and EGFP-Dgo in new Fig.S2.

      - I am surprised that the relationship with microtubule polarity was never investigated. Is this true? 

      We agree this is a point that needed further clarification, as Reviewer 1 made a related point regarding the two possible roles for microtubules, one being as a mediator of a global cue upstream of the core pathway, and the second (which we investigate in this manuscript) as a mediator of a cell-scale signal downstream of the core pathway.

      Both the Uemura and Axelrod groups have published on potential upstream function as a global cue mediator in the Drosophila wing (e.g. Shimada et al., 2006; Harumoto et al., 2010; Matis et al., 2014).

      Both groups have also looked at whether core pathway components could affect orientation of microtubules (Harumoto et al., 2010; Olofsson et al., 2014; Sharp and Axelrod 2016). Notably Harumoto et al., 2010 observed that in 24h APF wings, loss of Fz or Stbm did not alter microtubule polarity from a proximodistal orientation consistent with the microtubules aligning along the long cell axis in the absence of other cues. However, this did not rule out an instructive effect of Fz or Stbm on microtubule polarity during core pathway cell-scale signalling. The Axelrod lab manuscripts saw interesting effects of Pk protein isoforms on microtubule polarity, albeit not throughout the entire wing, which hinted at a potential role in cell-scale signalling. Taken together this prior work was the motivation for our directed experiments to specifically test whether the core pathway might generate cell-scale polarity by instructing microtubule polarity.

      Changes to manuscript: We have revised the Results section ‘Microtubules do not…’ to make a clearer distinction regarding possible ‘upstream’ and ‘downstream’ roles of microtubules in Drosophila core pathway planar polarity and the motivation for our experiments investigating the latter.

      - The authors suggest that polarity does not propagate as a wave. And yet the range measured in adult is longer than in the pupal wing. Explain. 

      Again an excellent point, also made by Reviewer 1, which we have now addressed explicitly in the manuscript. For the convenience of this reviewer, we lay out the reasons why we think the propagation of polarity seen in the adult is further than seen for core protein localisation.

      There are three reasons why we might expect adult trichomes to show a different effect from the measured core protein polarity pattern seen in our experiments:

      (i) we are assaying core protein polarity at 28h APF, but trichomes emerge at >32h APF, so there is still time for polarity to propagate a bit further from the boundary. We now have added data showing that by the point of trichome initiation, the wave of polarisation extends 3-4 cell rows (Fig.S5A).  

      (ii) it has long been known that a strong localisation of core proteins at a cell edge is not required for polarisation of trichome polarity from a boundary. For instance, in Strutt & Strutt 2007 we show clones of cells overexpressing Fz causing propagation through pk[pk-sple] mutant tissue where there is no detectable core protein polarity. We were following up prior observations of Adler et al 2000 in the wing and Lawrence et al 2004 in the abdomen.

      (iii) there is evidence to suggest that the polarity of adult trichomes is locally coupled, possibly mechanically. This point is hard to prove without live imaging taking in both initial core protein localisation, the site of actin-rich trichome initiation and then the final orientation of the much larger microtubule filled trichome, and we’re not aware that such data exist. However, Wong & Adler 1993 (JCB) showed that over a number of hours trichomes become much larger and move towards the centre of the cell, presumably becoming decoupled from any core protein cue. The images in Guild … & Tilney, 2005 (MBoC)  are also interesting to look at in this regard. Finally, septate junction proteins have been implicated in local alignment of trichomes, independently of the core pathway (Venema … & Auld, 2004 Dev Biol).

      Changes to manuscript: Added new data in Fig.S5A showing where trichomes initiate under 6h de novo induction conditions, for comparison to core protein localisation and adult trichome data in Fig.5. Added some text explaining why adult trichome repolarisation might be stronger than the observed effects on core protein localisation in Discussion. 

      - The discussion states that the cell-intrinsic system remains to be fully characterised, implying that it has been partially characterised. What do we know about it? 

      As the reviewer probably realises, we were attempting to side-step a long speculative discussion about the various hints and ideas in the literature by grouping them under the umbrella of ‘remaining to be fully characterised’. We would argue that this current manuscript is the first to attempt to systematically investigate the nature of ‘cell-scale signalling’. The lack of prior work is probably due to two factors (i) pioneering theoretical work showed that a sufficiently strong global signal coupled with ‘local’ (i.e. confined to one cell junction) protein interactions was sufficient to polarise cells without the need to invoke the existence of a cell-scale signal; (ii) there is no easy way to identify cell-scale signals as their loss results in loss of polarity which will also occur if other (i.e. more locally acting) core pathway functions are compromised.

      The main investigation of the potential for cell-scale signalling has been another set of theory studies (Burak and Shraiman 2009; Abley et al., 2013; Shadkhoo and Mani 2019) which have considered the possibility of diffusible signals. In our present work we have further considered the possibility of a ‘depletion’ model, based on the pioneering theory work of Hans Meinhardt, and as discussed above the possibility that microtubules could mediate a cell-scale signal.

      Changes to manuscript: We have revised the Discussion to hopefully be clearer about the current state of knowledge.

      Reviewer #3:

      […] Major comments

      The data are clearly presented and the manuscript is well written. The conclusions are well supported by the data. 

      (1) The authors use a system to de novo establish PCP, which has the advantage of excluding global cues orienting PCP and thus to focus on the cell-intrinsic mechanisms. At the same time, the system has the limitation that it is unclear to what extent de novo PCP establishment reflects 'normal' cell scale PCP establishment, in particular because the Gal4/UAS expression system that is used to induce Fz expression will likely result in much higher Fz levels compared with the endogenous levels. The authors should briefly discuss this limitation. 

      We apologise if this wasn’t clear. We only used GAL4/UAS overexpression when we were generating an artificial boundary of Fz expression with hh-GAL4 to induce repolarisation. The de novo induction system involves Fz::mKate2-sfGFP being expressed directly under an Act5C promoter without use of GAL4/UAS. In response to a comment from Reviewer 1 we have now carried out western blot analysis which shows that Fz::mKate2-sfGFP levels under Act5C are actually lower than endogenous Fz levels. As we achieve normal levels of polarity, similar to what we measure in wild-type conditions when measured using QuantifyPolarity, we assume that therefore Fz levels are not limiting under these conditions. However, we note that lower than normal levels of Fz might sensitise the system to perturbation, which in fact would be advantageous in our study, as it might for instance have been expected to more readily reveal dosage sensitivity of other components.

      Changes to manuscript: We now describe the levels of expression achieved using the de novo induction system (Fig.S1C-D) and discuss possible consequences in the relevant Results sections and Discussion.

      (2) Fig. 3. The authors use heterozygous mutant backgrounds to test the robustness of de novo PCP establishment towards (partial) depletion in core PCP proteins. The authors conclude that de novo polarization is 'extremely robust to variation in protein level'. Since the authors (presumably) lowered protein levels by 50%, this conclusion appears to be somewhat overstated. The authors should tune down their conclusion. 

      Reviewer 1 makes a similar point about whether we can argue that the lack of sensitivity to a 50% reduction in protein levels actually rules out the depletion model. To address the comments of both reviewers we have now added some further narrative and caveats in the text.

      We nevertheless believe that the experiments shown effectively make the point that there is no strong dosage sensitivity – and it remains our contention that if protein levels were the key to setting up cell-scale polarity, then a 50% reduction would be expected to show an effect on the rate of polarisation. We further note that as Fz::mKate2-sfGFP levels are lower than endogenous Fz levels, the system might be expected to be sensitised to further dosage reductions, and despite this we fail to see an effect on rate of polarisation.

      In a similar vein, Reviewer 2 requested data on whether dosage reduction altered protein levels by the expected amount. We have now added further explanation/references and western blot data to address this.

      Changes to manuscript: Added some narrative and caveats regarding whether lowering levels more than 50% would add to our findings in the Discussion. Revised conclusions to be more cautious including altering section title to read ‘Planar polarity establishment is not highly sensitive to variation in protein levels of core complex components’.

      Also added westerns and text/references showing that for the tested proteins there is a reduction in protein levels upon removal of one gene dosage in Results section ‘Planar polarity establishment is…’ and Fig.S2.

      Minor comments:

      (1) Page 3. The authors mention and reference that they used the PCA method to quantify cell polarity magnification and magnitude. It would help the unfamiliar reader, if the authors would briefly describe the principle of this method. 

      Changes to manuscript: More details have been added in Materials & Methods.

      Significance:

      The manuscript contributes to our understanding of how planar cell polarity is established. It extends previous work by the authors (Strutt and Strutt, 2002,2007) that already showed that induction of core PCP pathway activity by itself is sufficient to induce de novo PCP. This manuscript further explores the underlying mechanisms. The authors test whether de novo PCP establishment depends on an 'inhibitory signal', as previously postulated (Meinhardt, 2007), but do not find evidence. They also test whether core PCP proteins help to orient microtubules (which could enhance cell intrinsic polarization of core PCP proteins), but, again, do not find evidence, corroborating previous work (Harumoto et al, 2010). The most significant finding of this manuscript, perhaps, is the observation that local de novo PCP establishment does not propagate far through the tissue. A limitation of the study is that the mechanisms establishing intrinsic cell scale polarity remain unknown. The work will likely be of interest to specialists in the field of PCP.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers


      __Reviewer #1__


      Major comments


      1. The manuscript posits that the loss of function of MASh components (Ogc1 and Aralar) decreases adrenergic-stimulated lipolysis by altering the cytosolic NAD⁺/NADH ratio, with AMPK/ACC mentioned as possible mediators. However, this remains speculative. Please provide mechanistic data directly linking MASh-dependent NAD⁺/NADH changes to the regulation of lipolysis in brown adipocytes during adrenergic stimulation.

      Answer 1) The reviewer raises an important point regarding the direct assessment of cytosolic NAD⁺/NADH redox changes as a mechanistic link for altered lipolysis in brown adipocytes lacking MASh components. To address this point, we added new data to the revised manuscript showing the lactate/pyruvate ratio as measured by metabolomics, a well-established surrogate marker for changes in redox balance. Notably, under basal (non-stimulated) conditions, the lactate/pyruvate ratio did not differ significantly between Aralar 1 KD and control cells, suggesting preservation of the cytosolic NAD⁺/NADH ratio in the absence of functional MASh under these conditions. This finding is consistent with reports showing the robustness of NAD⁺ regeneration via multiple shuttles and the possibility of metabolic compensation when one shuttle is compromised (PMID: 40540398; PMID: 37647199).
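      The reasoning behind using the lactate/pyruvate ratio as a redox surrogate can be written out explicitly. The relation below is a minimal sketch that assumes the cytosolic LDH reaction is close to equilibrium at a fixed pH; that assumption, and the symbol $K_{\mathrm{LDH}}$ for the apparent equilibrium constant, are introduced here for illustration and are not measured in the study.

      ```latex
      % Apparent LDH equilibrium at fixed pH (assumed near-equilibrium in the cytosol)
      K_{\mathrm{LDH}} = \frac{[\text{pyruvate}]\,[\mathrm{NADH}]}{[\text{lactate}]\,[\mathrm{NAD^{+}}]}
      \quad\Longrightarrow\quad
      \frac{[\mathrm{NAD^{+}}]}{[\mathrm{NADH}]} = \frac{1}{K_{\mathrm{LDH}}}\cdot\frac{[\text{pyruvate}]}{[\text{lactate}]}
      ```

      Under that assumption, an unchanged lactate/pyruvate ratio implies an unchanged free cytosolic NAD⁺/NADH ratio, which is why the ratio can stand in for a direct measurement.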

      The results have been added as new Supplementary Figure 1, as follows:

      Our new metabolomics data also revealed substantial reductions in the aspartate/glutamate ratio in Aralar 1 knockdown cells, serving as a metabolomic signature of impaired MASh function and reduced exchange of these amino acids between the cytosol and mitochondria. Given that the MASh is a major mechanism for exporting cytosolic reducing equivalents into the mitochondria under high metabolic demand, its loss would be expected to impact redox homeostasis, particularly under adrenergic stimulation when glycolytic flux and lipolytic activity are elevated (PMID: 40540398).

      Importantly, although our redox surrogate marker did not detect alterations, this may be explained by activation of compensatory pathways, most notably the glycerol phosphate shuttle (GPSh), which is highly expressed and active in brown adipocytes. Indirect support for this compensation comes from Figure 4I, which shows reduced glycerol release in Aralar 1 KD cells upon norepinephrine stimulation and when lipolysis is blocked. This suggests a redirection of glycolytically derived G3P away from release and toward enhanced cycling within the GPSh, supporting cytosolic NAD⁺ regeneration via mitochondrial FAD-dependent G3PDH and cytosolic NAD⁺-dependent G3PDH activity. This is consistent with studies documenting that the combined action of MASh and GPSh maintains NAD redox homeostasis in brown adipocytes, especially during non-thermogenic conditions (PMID: 168075; PMID: 40540398; PMID: 37647199). We have included a discussion of this possibility on page 9, third paragraph, as follows:

      *“Previous studies have shown that BAT exhibits high activity of mitochondrial FAD-dependent glycerol-3-phosphate dehydrogenase (mG3PDH), which functions as an electron sink to sustain low cytosolic NADH levels essential for continuous glycolytic flux [11]. Accordingly, suppression of the MASh, either genetically or pharmacologically, is likely to induce a compensatory upregulation of the GPSh. This adaptation would enhance G3P turnover, contributing to the maintenance of cytosolic NAD redox balance. Moreover, the increased flux through the GPSh could favor fatty acid esterification and triglyceride synthesis or re-esterification, consistent with our findings in Ogc and/or Aralar 1 KD cells, where (i) triglyceride content rises (Fig. 3), (ii) overall respiratory rates remain largely unaltered (Figs. 2D–G), and (iii) glycerol release declines significantly (Fig. 4I). Notably, the decrease in glycerol release persists even when lipolysis is blocked by Atglistatin, suggesting that the available G3P pool is rerouted from dephosphorylation and extracellular release toward oxidation to DHAP by mG3PDH to regenerate cytosolic NAD⁺ under MASh-deficient conditions. We propose that interference with the MASh does not directly impact lipolysis but instead alters the cellular balance between DHAP and G3P owing to enhanced activity of the GPSh. This metabolic shift would favor the esterification of G3P with free fatty acids, thereby promoting triglyceride synthesis. These results support the notion that, even during adrenergic stimulation, when long-chain unsaturated fatty acids and their CoA esters strongly inhibit mG3PDH activity [11], the residual flux through the glycerophosphate shuttle remains critical for sustaining cytosolic NAD redox equilibrium [11,19,32].”*


      At the mechanistic level, adrenergic stimulation in brown adipocytes activates robust lipolysis and thermogenic gene programs, generating high NADH that must be efficiently reoxidized to sustain flux through glycolysis and lipolysis-linked pathways. Our findings are consistent with a model in which the loss of MASh does not prevent cytosolic NAD⁺ regeneration or lipolytic flux during acute adrenergic stimulation, due to compensatory upregulation of the GPSh, as suggested by the glycerol release changes. Thus, while MASh normally acts as a conduit for NADH export and aspartate/glutamate exchange, in its absence, the GPSh maintains cytosolic redox balance, thereby sustaining glycolytic and lipolytic capacity.

      We agree that future studies should employ direct measurements of cytosolic NAD⁺/NADH ratios (e.g., genetically-encoded redox sensors) during adrenergic stimulation and specific pharmacological inhibition of both shuttles to dissect these relationships in greater detail. We sincerely appreciate the reviewer's input, which has prompted us to clarify the indirect but robust evidence supporting a role for compensatory redox shuttle activity in preserving brown adipocyte lipolysis in the setting of MASh impairment.

      We have further added a new paragraph in the discussion section (page 10):

      *“Mechanistically, the connection between the MASh and lipolysis appears to involve regulation of the cytosolic NAD⁺/NADH redox balance. MASh activity facilitates the regeneration of NAD⁺ from NADH in the cytosol, primarily through the reduction of oxaloacetate to malate by cytosolic malate dehydrogenase (Fig. 1G-H). Despite the theoretical expectation that reductions in MASh activity would disturb redox homeostasis, our metabolomic data show that the lactate/pyruvate ratio remains unchanged under conditions of MASh impairment, indicating that the overall cytosolic NAD⁺/NADH ratio is maintained (Figure S1A-C). While direct measurements of cytosolic NAD⁺/NADH were not performed, the preserved lactate/pyruvate ratio in Aralar 1 KD cells under basal conditions strongly suggests redox stability, likely due to compensatory activity by alternative mitochondrial shuttles or metabolic adaptations that maintain NAD redox homeostasis despite MASh impairment [18,33]. *

      Previous evidence indicates that BAT exhibits high activity of mitochondrial FAD-dependent glycerol-3-phosphate dehydrogenase (G3PDH), which acts as an electron sink to sustain low cytosolic NADH levels critical for glycolysis [34]. In this sense, it is conceivable that genetic or pharmacological suppression of MASh triggers compensatory enhancement of the G3P shuttle, increasing G3P availability and facilitating the maintenance of cytosolic NAD redox balance. This adaptation could also promote fatty acid esterification and triglyceride synthesis or re-esterification, aligning with our observations that in Ogc and/or Aralar 1 KD cells: (i) triglyceride levels increase (Fig. 3); (ii) overall respiratory rates are preserved (Figs. 2D–G); and (iii) glycerol release is significantly reduced (Fig 4I).”


      2. __The absence of in vivo analysis of lipid-droplet size in MASh loss-of-function models is a major concern. In vitro results could be confounded by differences in differentiation stage between groups. Please document equivalent adipogenesis across groups (e.g., Pparg/Cebpa/Plin1/Fabp4 expression).__

      Answer 2) We thank the reviewer for the thoughtful and constructive comment regarding potential confounding by differences in differentiation stage, and for highlighting the importance of documenting equivalence between experimental groups. We appreciate the opportunity to clarify and provide additional assurance on this point.

      As detailed in our manuscript, we have performed qPCR analysis of multiple well-established markers of brown adipocyte differentiation, including Ucp1, Elovl3, Prdm16, Pparg, Cebpa, Plin1, and Fabp4, in scramble, Aralar 1 KD, and Ogc KD cells (see Fig. S1A and accompanying text). Our results show no apparent effect of these genetic interventions on overall differentiation, as the expression levels of these key markers were consistently unaltered across groups. Furthermore, adenoviral-mediated knockdown of Ogc achieved an approximate 80% reduction in Ogc mRNA (see Fig. S1B), yet most differentiation markers remained unaffected. We did observe significant increases in Atgl, Pgc1α, and Tfam mRNA levels, which may indicate a degree of pathway reprogramming without affecting the general differentiation profile. We propose that interference with the MASh does not directly impact lipolysis but instead alters the cellular balance between DHAP and G3P owing to enhanced activity of the GPSh. This metabolic shift would favor the esterification of G3P with free fatty acids, thereby promoting triglyceride synthesis.

      Additional experimental support for equivalent differentiation can be drawn from our respirometry data presented in Figures 2E and 2G. These figures demonstrate that respiratory rates upon norepinephrine stimulation, a sensitive indicator of brown adipocyte thermogenic capacity, were essentially identical in scramble, Aralar 1 KD, and Ogc KD cells. Since norepinephrine-stimulated respiration requires both functional mitochondria and the full differentiation of brown adipocytes, these results strongly support the conclusion that silencing either MASh component does not impair the fundamental ability of cells to undergo brown adipocyte differentiation or achieve functional thermogenic competence.

      This is consistent with published findings showing that norepinephrine triggers robust respiration and thermogenic activation only in fully differentiated and functional brown adipocytes, making such measurements a widely accepted proxy for differentiation status and mitochondrial integrity. Thus, the equivalent respiratory responses observed in all groups further validate that differentiation was not compromised by the genetic interventions.

      We hope this clarifies that equivalent adipogenesis was carefully documented and that any observed phenotypes are unlikely to be attributable to differences in differentiation stages. Thank you again for your rigorous assessment and for helping to ensure the robustness of our study.

      3. __Please include rescue experiments (add-back OGC1 and Aralar) to rule out siRNA/shRNA off-target effects and verify that the phenotype stems from MASh loss of function.__

      Answer 3) We thank the reviewer for this important suggestion regarding the inclusion of rescue experiments with add-back of Ogc and Aralar to definitively exclude off-target effects of the siRNA/shRNA-mediated knockdowns.

      We would like to kindly point out that although we did not perform add-back rescue experiments directly, the consistency of phenotypes observed across two independent genetic interventions (Aralar 1 KD and Ogc KD) strongly argues against off-target effects being responsible for the observed metabolic and functional alterations. Specifically, both knockdowns yielded remarkably similar phenotypes in multiple assays, including respirometry analyses, mitochondrial morphology, lipid droplet homeostasis, and lipid metabolism, supporting the conclusion that these effects stem from MASh loss of function rather than nonspecific silencing.

      Furthermore, our new supplementary data (new Supplementary Figure 1A-F) reveal a significant reduction in the aspartate/glutamate ratio in Aralar 1 KD cells, a compelling functional readout for MASh impairment. This molecular evidence corroborates that our genetic interventions effectively disrupted MASh activity as intended.

      We sincerely appreciate the reviewer’s thorough evaluation and understand the importance of rescue experiments. While recognizing their value, we believe the convergent genetic, metabolic, and functional evidence presented across two different MASh components provides strong and consistent support that the phenotypes observed are due to specific loss of MASh function.


      4. __Please expand on physiological significance: What is the importance of MASh regulation of BAT lipolysis in long-term adaptive thermogenesis?__

      Answer 4) This is a very interesting aspect, and we have included a new paragraph in the discussion section (page 14) to address it as follows:

      “Our results, supported by recent literature, strongly indicate that the malate–aspartate shuttle (MASh) plays a key role in facilitating fatty acid–dependent thermogenesis in brown adipocytes. Specifically, BAT-targeted overexpression of GOT1 has been shown to enhance β-oxidation and support acute cold-induced thermogenesis (PMID: 40540398). Interestingly, genetic ablation of GOT1—and thus MASh inhibition—preserves cold-induced thermogenesis by promoting a metabolic shift from fatty acid to glucose oxidation. Our findings corroborate and extend these observations by demonstrating that MASh impairment sustains overall respiratory activity in norepinephrine-stimulated brown adipocytes (Figures 2D–2G), while concurrently impairing lipolysis and resulting in an accumulation of small lipid droplets (Figures 3 and 4). Collectively, these data suggest that MASh not only modulates substrate preference towards fatty acid oxidation but also facilitates lipolysis, an essential upstream step that enables lipid oxidation and supports thermogenic heat production.”

      Minor comments

      1. __Fig. 4 legend/title contains a typo ("lypolysis" → lipolysis).__

      Answer 1) Corrected.

      2. __In Fig. 2 legend line: "Adevirus-mediated" → Adenovirus-mediated; "OCAR" → OCR.__

      Answer 2) Corrected.

      3. __For lipolysis imaging, you already show Forskolin/Atglistatin/Etomoxir controls; add a vehicle-only time course overlay in the main figure (currently in text/legend) to aid visual comparison.__

      Answer 3) We thank the reviewer for pointing this out. To improve clarity, we have updated the labeling in Figures 3 and 4: “basal” now clearly refers to the unstimulated/untreated condition, and the previously labeled “UT” condition has been clarified as “untransduced.” These changes make the figure legends and data presentation more consistent and easier to interpret.

      4. __Ensure consistent gene symbols (Atgl/Pnpla2), and protein capitalization.__

      Answer 4) Corrected.

      __Reviewer #2__

      Major points:

      1. __In the current manuscript, mitochondrial morphology (area, aspect ratio, and roundness) was analyzed in OGC1 KD cells using TMRE, whereas MitoTracker Deep Red (MTDR) was used in Aralar1 KD cells. Notably, TMRE is a ΔΨm-dependent probe. The signal intensity can change, or the distribution may reflect alterations in membrane potential rather than true morphological changes. Therefore, the observed differences in OGC1 KD cells based on TMRE staining may be confounded by the dye's functional dependence, potentially biasing the conclusions. It is recommended to evaluate mitochondrial morphology with consistent trackers across conditions. In addition, in the subsequent OCR analysis, mitochondrial area was used for normalization. Please clarify which staining method was employed, and provide justification for its suitability.__

      Answer 1) We thank the reviewer for this insightful comment. Indeed, TMRE is a membrane potential-sensitive dye and could therefore potentially affect measurements of mitochondria.

      We would like to point out that mitochondrial morphology was quantified based on mitochondrial area rather than fluorescence intensity. To create an accurate binary map of mitochondria, we used a low threshold, which allowed us to include even weakly stained mitochondria and thereby detect them independently of their membrane potential. In all imaged cells, TMRE signal was sufficient to reliably identify mitochondrial pixels. Moreover, these images were acquired using a confocal microscope, where the risk of pixel expansion due to higher fluorescence intensity is minimized. Lastly, given that overall mitochondrial oxygen consumption in these cells remains largely intact, we do not expect a substantial loss of membrane potential, although minor effects cannot be entirely excluded.

      We opted to use TMRE for imaging Ogc KD cells because the scramble control for these shRNA viruses carries an mKate fluorescent tag, which overlaps with the MTDR signal. Since accurate assessment of transduction efficiency relied on detecting mKate, MTDR could not be used in these experiments. Importantly, we only compare mitochondrial morphology within the same staining condition and do not draw conclusions across cells stained with different dyes.

      To ensure transparency, we have added a new section to the Discussion (page 17, 2nd paragraph) highlighting the potential influence of ΔΨm-dependent dyes on morphological measurements, as follows:

      “It is also important to note that mitochondrial morphology was quantified using MTDR in Aralar 1 KD cells and TMRE in Ogc KD cells due to experimental constraints (see Methods). TMRE is a membrane potential–dependent dye, which could potentially influence morphology measurements. To minimize this risk, we used confocal microscopy, which reduces the likelihood of pixel expansion due to higher fluorescence intensity, and set thresholds to detect even weakly stained mitochondria. Nonetheless, we cannot fully exclude the possibility that the differences in morphology observed between Aralar 1 and Ogc KD are influenced by the use of different dyes; however, statistical comparisons were never performed across samples stained with different dyes.”

      Also, we have expanded the Methods section (page 22, 2nd paragraph) to include a rationale for using these dyes and to describe the analysis protocol, as follows:

      “TMRE was used for Ogc KD cells because the scramble control for the shRNA viruses carries an mKate fluorescent tag, which overlaps with MTDR fluorescence, preventing its use. MTDR was used for Aralar KD cells. Image Analysis was performed in FIJI (ImageJ, NIH). For the quantification of mitochondrial morphology and area, images stained with TMRE or MTDR were analyzed. Thresholds were adjusted to ensure that even weakly stained mitochondria were detected and included in the analysis. Only the mitochondrial area was evaluated, independent of fluorescence intensity.”
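      The claim that area is evaluated independently of fluorescence intensity can be illustrated with a small sketch. The code below is not the FIJI workflow used in the study; the function names, the toy images and the threshold value are all hypothetical. It shows only that a low threshold yields the same mask area for a brightly and a dimly stained copy of the same structure.

      ```python
      # Illustrative only: names, images and threshold are hypothetical,
      # not the FIJI procedure used in the study.

      def binary_mask(image, threshold):
          """Return a 0/1 mask marking pixels whose intensity exceeds the threshold."""
          return [[1 if px > threshold else 0 for px in row] for row in image]


      def mask_area(mask, pixel_area_um2=1.0):
          """Area covered by the mask: number of positive pixels times the pixel area."""
          return sum(sum(row) for row in mask) * pixel_area_um2


      # Two toy images of the same structure: one bright, one dim
      # (for example, lower dye uptake at a lower membrane potential).
      bright = [
          [0, 0, 0, 0],
          [0, 200, 220, 0],
          [0, 210, 0, 0],
      ]
      dim = [
          [0, 0, 0, 0],
          [0, 40, 45, 0],
          [0, 42, 0, 0],
      ]

      LOW_THRESHOLD = 10  # hypothetical value, set below the dimmest stained pixel

      area_bright = mask_area(binary_mask(bright, LOW_THRESHOLD))
      area_dim = mask_area(binary_mask(dim, LOW_THRESHOLD))
      print(area_bright, area_dim)  # 3.0 3.0
      ```

      With a higher threshold (say 100) the dim image would lose all of its pixels, which is the intensity-dependent bias the low threshold is meant to avoid.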

      Minor points:

      1. __In the introduction, the authors state that "LDH activity increases in the context of BAT activation". This point is important for the logic of the manuscript, but reference [10] cited here is not sufficient to support this claim. It is recommended to provide appropriate references to support this statement.__

      Answer 1) We have substantially changed this paragraph in the revised manuscript to better explain why LDH would not act as a major player in contributing to NAD redox balance in the context of BAT thermogenesis, as follows:

      “In mammalian cells, cytosolic NAD⁺ is regenerated through lactate dehydrogenase (LDH), the glycerol-3-phosphate shuttle (GPSh), or the malate-aspartate shuttle (MASh). In BAT, however, lactate production rises only slightly with adrenergic activation and most lactate is oxidized via the TCA cycle, suggesting that LDH primarily consumes NAD⁺ rather than regenerating it [PMID: 30456392; PMID: 37337122; PMID: 37802078; PMID: 40982723]. Consequently, mitochondrial redox shuttles become critical for sustaining cytosolic NAD⁺ supply”.

      We have also provided additional references to support this new section in the Introduction.

      2. __In Fig. 1A and B-D, there are inconsistencies and duplications in the abbreviation labels. Please check and revise accordingly.__

      Answer 2) We thank the reviewer for this comment. We would like to clarify that Figure 1A is a schematic overview of the system, while Figures 1B–D show protein expression in specific contexts: whole BAT (B), whole liver (C), and BAT mitochondria (D). In Figures 1B and 1C, all components are shown because both cytosolic (MDH1 and GOT1) and mitochondrial proteins (MDH2, GOT2, Aralar 1 and 2 and OGC) are present. In contrast, Figure 1D shows only mitochondrial components (OGC, Aralar1, MDH2, and GOT2). Although Aralar2 is a mitochondrial protein, it was not detected in this study (Forner et al., 2009). Similarly, cytosolic components such as MDH1 and GOT1 are not shown in Figure 1D because they are absent in the mitochondrial fraction. We have revised the figure legend to make these distinctions clearer.

      3. __In Fig. S1, the number of n indicated does not match the number of data points shown. Please clarify whether these represent technical replicates or biological replicates, and provide a detailed description of the statistical methods used throughout the manuscript.__

      Answer 3) We thank the reviewer for catching this and allowing us to correct our mistakes. In the revised version, we have corrected the figure legend of Supplementary Figure 1 so that the number of n matches the data points shown.

      4. __Please provide details on the normalization strategy used in the BODIPY-C12/BODIPY-493 staining analysis, such as whether fluorescence intensity was quantified as mean or integrated values, and whether the analysis was normalized to lipid droplet area, cell number, or baseline. Since lipolytic stimulation can reduce droplet size and increase droplet number, these factors may bias the results.__

      Answer 4) We thank the reviewer for this important comment and apologize for the lack of detail regarding this analysis. The analysis of BODIPY-C12 and BODIPY-493 was performed by quantifying the mean fluorescence intensity of BODIPY-C12 detected within a mask generated from the BODIPY-493 signal. This approach allowed us to define all lipid droplets and measure the release of previously esterified C12. To account for variability across samples, the data were normalized to each sample’s individual baseline at time point 0 and expressed as fold change relative to this baseline. In the revised manuscript we have included this description in the Methods section (page 18, last paragraph) for clarity and reproducibility, as follows:

      “Lipid droplet area was defined based on BODIPY 493/503 signal, which was used to generate a mask identifying all lipid droplets. Within this mask, the mean fluorescence intensity of BODIPY C12 was quantified over time to monitor the release of previously esterified C12. To account for variability between samples, data were normalized to each sample’s individual baseline at time point 0 and expressed as fold change relative to this baseline.”
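      The normalization described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names and toy data are hypothetical, and the sketch assumes one droplet mask is supplied per time point (the text does not say whether the mask was recomputed per frame).

      ```python
      # Illustrative only: function names and data are hypothetical.

      def masked_mean(signal, mask):
          """Mean of `signal` over the pixels where `mask` is truthy."""
          values = [s for s_row, m_row in zip(signal, mask)
                    for s, m in zip(s_row, m_row) if m]
          if not values:
              raise ValueError("mask selects no pixels")
          return sum(values) / len(values)


      def fold_change_series(c12_frames, droplet_masks):
          """Masked mean of each frame, divided by the value at time point 0."""
          means = [masked_mean(f, m) for f, m in zip(c12_frames, droplet_masks)]
          baseline = means[0]
          return [m / baseline for m in means]


      # Toy 2x3 frames: the droplet mask (from the 493/503 channel) covers three pixels.
      mask = [[1, 1, 0], [0, 1, 0]]
      frames = [
          [[100, 100, 5], [5, 100, 5]],  # time point 0 (baseline)
          [[80, 80, 5], [5, 80, 5]],
          [[50, 50, 5], [5, 50, 5]],
      ]
      series = fold_change_series(frames, [mask] * len(frames))
      print(series)  # [1.0, 0.8, 0.5]
      ```

      Because the mean is taken per masked pixel, this readout does not scale with total droplet area, although a mask that changes between frames can still shift which pixels contribute.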

      5. __The manuscript notes that the unexpected result in Fig. 3K-M in parallel with increased Atgl mRNA expression might be because it does not reflect protein levels or enzymatic activity. To strengthen this point, it is recommended to include data on ATGL and phosphorylated ATGL.__

      Answer 5) We thank the reviewer for this constructive comment. We have clarified these aspects in the revised Results and Discussion sections to reflect this interpretation more accurately as follows:

      “Notably, Atgl mRNA measurement in our study was primarily used as a marker of brown adipocyte differentiation, rather than as a direct indicator of ATGL protein abundance or enzymatic activity. We detected increased Atgl expression only in Ogc KD cells (Fig. S1H), but not in Aralar 1 KD cells (Fig. S1G). This likely does not reflect a major difference in differentiation status, as other brown adipocyte markers and norepinephrine-stimulated respiration were comparable between scramble and knockdown cells (Fig. 2D-G and 2N-O and S1G-H). Although lipolysis was not evaluated in Ogc KD cells, in Aralar 1 KD cells basal lipolysis remained unchanged (Fig. 4D-E and 4G-I), whereas norepinephrine-stimulated lipolysis was delayed or partially inhibited. Notably, the enhanced fatty acid esterification observed in Ogc KD cells despite elevated Atgl expression is not contradictory, since in brown adipocytes lipolysis and re-esterification occur concurrently to sustain high lipid turnover [34].”

      6. __Red-on-black is not a great color code for IMFs, how about black-and-white?__

      Answer 6) We have changed the text color to white in Figures 2H and 2K, as suggested.

      __Reviewer #3__

      Major points:

      1. __Although in the manuscript Veliova and coworkers demonstrated that MAS is functional in brown adipocytes showing kinetic parameters equivalent to that previously described in other tissues, surprisingly, when its components are downregulated, no effect, or very little, on mitochondrial respiration is found (figure 2). This is an intriguing result since MAS disruption has been widely reported to impair respiration in different cell types and tissues. However, since no direct evidence of MAS dysfunction is provided, it is possible that MAS may still remain partially or fully functional under the conditions used by the authors, and therefore this point needs to be clarified to validate these results.__

      Answer 1) We thank the reviewer for the insightful comment and the opportunity to clarify these important points regarding MASh dysfunction validation in our study. We acknowledge the reviewer’s observation that mitochondrial respiration was largely unaffected by MASh component knockdown, which is indeed intriguing. Importantly, as already indicated in our responses to Reviewer 1, we have provided new data showing direct molecular evidence of MASh impairment through substantial reductions in the aspartate/glutamate ratio in Aralar 1 KD cells (new Supplementary Figure S1F). This ratio is a well-established functional readout reflecting MASh activity and amino acid exchange between cytosol and mitochondria, as demonstrated in original experimental studies of MASh function in multiple tissues including brown adipocytes (PMID: 4436323). The reduction in the aspartate/glutamate ratio directly confirms loss of MASh functionality even though respiratory rates remained unchanged, likely due to metabolic compensation by robust glycerol phosphate shuttle (GPSh) activity, as further supported by our data showing reduced glycerol release upon norepinephrine stimulation in Aralar 1 KD cells (Figure 4I). This metabolic rerouting maintains cytosolic NAD⁺ regeneration and partially preserves respiration and energy metabolism under these experimental conditions (PMID: 168075; PMID: 40540398; PMID: 37647199). Thus, the combination of metabolomic, respirometry, and functional lipid data strongly indicates that MASh activity was disrupted specifically and effectively by our genetic interventions. This molecular evidence was already signposted in our original manuscript and responses, underscoring that MASh loss of function, and not residual or compensatory MASh activity, is responsible for the phenotypes reported. We greatly appreciate the reviewer’s insightful attention to this critical mechanistic issue and hope this provides clear reassurance that MASh impairment was indeed achieved and functionally validated within our study framework.

      2. __Furthermore, strategies used to downregulate MAS components produce only a partial reduction in mRNA levels, about 70%, but its outcome on protein levels has not been determined, and the remaining protein level could be sufficient to maintain shuttle activity. Therefore, the effect of silencing at the protein level should be analyzed because, as the authors also point out on page 16, "mRNA levels may not reflect actual protein levels or activity".__

      Answer 2) We thank the reviewer for this important point. Our knockdowns resulted in ~70–80% reduction in mRNA levels. While not complete, this represents a substantial decrease and is sufficient to produce strong functional effects. At the time the experiments were performed, we did not have access to suitable antibodies, and the available antibodies did not provide reliable signals in our samples, which is why we used qPCR to estimate knockdown efficiency. Importantly, we observed clear phenotypic changes in both knockdowns (Aralar and OGC), and both showed very similar phenotypes. This suggests that the level of knockdown was sufficient to significantly impair MAS activity. In the revised version we added new data which further validated the functional impact of Aralar KD (given that this protein has an alternative isoform, as pointed out by the reviewer). We performed metabolomics experiments measuring aspartate and glutamate levels. Our new data shows that the aspartate-to-glutamate ratio is significantly reduced in Aralar KD cells. This ratio serves as a proxy for glutamate catabolism, and the observed decrease suggests reduced glutamate catabolism, likely due to impaired MAS activity. Therefore, the reduced whole-cell aspartate/glutamate ratio serves as a metabolic signature of MAS impairment, consistent with Aralar KD. These data indicate that Aralar is sufficiently downregulated to produce a functional effect, supporting our conclusion that MAS activity is impaired. The results have been added as new supplementary Figure 1 as follows:

      3. __In the case of aspartate/glutamate carriers (AGCs) the role of citrin/slc25a13, the second AGC paralog, should also be analyzed. This AGC isoform is discarded based on proteomic data from brown adipose tissue, but, as it is shown in figure 1B, its levels are similar to those of Aralar/slc25a12, the only AGC silenced. Besides, primary brown adipocytes differentiated for 7 days are used here, and it is possible that factors such as culture conditions or differentiation itself could alter AGC levels. Therefore, it is necessary to determine the protein levels of citrin/AGC2, and, if necessary, downregulate it together with the Aralar/AGC1 isoform. Citrin/AGC2 activity may be responsible for the observed difference between the OGC and Aralar/AGC1 KD adipocytes.__

      Answer 3) We thank the reviewer for this important point. We chose Aralar1 because it is the isoform predominantly expressed in brown adipose tissue (PMID: 23436904). We acknowledge, however, that compensatory increases in Citrin/AGC2 upon Aralar1 knockdown are possible. To address this, we have included new metabolomics data in the revised manuscript (added as Supplementary Figure 1), which provides additional support that downregulation of Aralar1, even if not complete, is sufficient to cause a metabolic change reflected by a reduced aspartate/glutamate ratio in these cells. This functional change supports that the knockdown of Aralar1 alone is sufficient to study its role in brown adipocytes, although minor compensation by Citrin/AGC2 cannot be entirely excluded.

      To address this explicitly, we have added a paragraph to the discussion (page 13, 2nd paragraph) highlighting the potential for partial compensation by Citrin/AGC2 and explaining why the observed metabolic effects are still attributable to Aralar 1 knockdown, as follows:

      “Phenotypes observed in Aralar 1 KD cells closely resemble those in Ogc KD cells, particularly in terms of lipid metabolism alterations and energy expenditure. The main difference lies in mitochondrial morphology, which is altered in Ogc KD cells but remains unchanged in Aralar 1-silenced cells (Fig. 2J,M). Unlike Ogc, which lacks an alternative isoform, Aralar 1 has a paralog Aralar 2 (Citrin, or SLC25A13) that may partially compensate for its loss. This potential compensation might explain the preservation of mitochondrial morphology in Aralar 1 KD cells. Nonetheless, our metabolomics data demonstrate that downregulation of Aralar 1 alone significantly reduces the aspartate/glutamate ratio (Fig. S1D-F). Since this ratio reflects glutamate catabolism, its decrease indicates impaired malate-aspartate shuttle activity and reduced glutamate catabolism. Therefore, although compensation by Aralar 2 cannot be entirely excluded, Aralar 1 KD alone suffices to cause substantial impairment of malate-aspartate shuttle function”.

      __ OGC and Aralar/AGC1 silencing is associated with the accumulation of smaller lipid droplets and impaired norepinephrine-induced lipolysis, but no mechanistic evidence is provided. The authors discuss a role for AMPK signaling associated with the redox imbalance generated by MAS dysfunction, but neither is proven.__

      Answer 4) We thank the reviewer for this insightful question, which was also raised by Reviewer 1 (see Reviewer 1, Question 1 above). Here, we aim to clarify the mechanistic basis by which MASh may regulate lipolysis in BAT in a complementary and refined manner.

      Our new data directly addresses this issue by examining cytosolic redox status through the lactate/pyruvate ratio, a well-established indicator of NAD⁺/NADH balance. Under basal conditions, Aralar 1 KD cells showed no change in this ratio compared to controls, indicating preserved cytosolic NAD⁺ regeneration despite reduced MASh activity. This observation is consistent with previous studies demonstrating the resilience of cellular redox homeostasis through overlapping NAD⁺-regenerating systems (PMID: 40540398; PMID: 37647199). The new results are shown in Supplementary Figure 1.

      At the same time, we detected a marked decrease in the aspartate/glutamate ratio in Aralar 1 KD cells, confirming impaired MASh function and reduced amino acid exchange between cytosol and mitochondria. The lack of redox imbalance likely reflects compensatory mechanisms, most notably the GPSh, which is highly active in brown adipocytes. Supporting this view, Aralar 1 KD cells displayed significantly reduced glycerol release upon norepinephrine stimulation (Fig. 4I), suggesting enhanced metabolic cycling of G3P through mitochondrial and cytosolic G3PDH, thereby sustaining NAD⁺ regeneration and redox equilibrium.

      We therefore propose that, although MASh normally facilitates NADH export and aspartate/glutamate exchange, its loss activates GPSh-mediated compensation that preserves cytosolic NAD⁺/NADH balance and maintains lipolytic flux during adrenergic stimulation. These findings refine our mechanistic understanding of how redox shuttle interplay supports glycolytic and lipolytic processes in BAT. Future studies employing NAD⁺/NADH sensors and simultaneous blockade of both shuttles will be essential to dissect these compensatory mechanisms in greater detail.

      Minor points:

      1. __ Is pyruvate present in respiration medium? If so, no effect on respiration is expected as pyruvate reverses the respiratory defects caused by MAS inactivation. __

      Answer 1) Thanks for this important insight. In fact, as indicated in the methods section (page 17, last paragraph), all respirometry experiments were carried out in the absence of pyruvate in the media. Therefore, the preserved overall respiratory rates in Aralar 1 and Ogc KD cells cannot be explained by compensatory oxidation of pyruvate from the media.

      __ In figure 4, only data from Aralar KD cells in relation to norepinephrine-stimulated lipolysis are shown. What happens when OGC is silenced? __

      Answer 2) This is a very interesting and relevant question. We did not perform the norepinephrine-stimulated lipolysis experiments in Ogc-silenced cells, since in most of the other experiments presented in the manuscript Ogc and Aralar 1 silencing converged to very similar, if not identical, phenotypes. Based on these consistent overlaps, we anticipate that Ogc KD would likely lead to comparable effects on lipolysis as observed in Aralar 1 KD cells. Nonetheless, we fully agree that direct assessment of lipolysis upon Ogc KD would strengthen this conclusion, and we consider this an important aspect for future studies.

      __ Nomenclature used for mitochondrial carriers is confusing. Please do not use OGC1 as there is only one isoform. Furthermore, different names for OGC are used in the manuscript: oxoglutarate carrier, malate-ketoglutarate carrier or OGC1/SLC25A11. In the case of citrin/AGC2, Aralar2 is used, which is an uncommon designation.__

      Answer 3) We corrected all OGC naming in the revised manuscript. We also replaced “Aralar 2” with “citrin”, since the latter is more commonly used in the literature.

      __ Some panels of figures 3 and 4 should be improved. Panels 3J, 3L and 4G are difficult to see. In panel 3J, please distinguish the UT line from untreated/NE: are they not transduced? No equivalent conditions are assayed in Aralar KD and OGC KO cells.__

      Answer 4) We thank the reviewer for giving us the opportunity to improve this figure and apologize for the confusing labeling. In the revised version, we have clarified the labels in panels 3J, 3L, and 4G to improve visibility, and we have added descriptions of all abbreviations to the figure legends, accordingly.

    1. Reviewer #1 (Public review):

      Summary:

      This manuscript investigates the effects of oral supplementation with nicotinamide mononucleotide (NMN) on metabolism and inflammation in mice with diet-induced obesity, and whether these effects depend on the NAD⁺-dependent enzyme SIRT1. Using control and inducible SIRT1 knockout mice, the authors show that NMN administration mitigates high-fat diet-induced weight gain, enhances energy expenditure, and normalizes fasting glucose and plasma lipid profiles in a largely SIRT1-dependent manner. However, reductions in fat mass and adipose tissue expansion occur independently of SIRT1. Comprehensive plasma proteomic analyses (O-Link and mass spectrometry) reveal that NMN reverses obesity-induced alterations in metabolic and immune pathways, particularly those related to glucose and cholesterol metabolism. Integrative network and causal analyses identify both SIRT1-dependent and -independent protein clusters, as well as potential upstream regulators such as FBXW7, ADIPOR2, and PRDM16. Overall, the study supports that NMN modulates key metabolic and immune pathways through both SIRT1-dependent and alternative mechanisms to alleviate obesity and dyslipidemia in mice.

      Strengths:

      Well-written manuscript, and state-of-the-art proteomics-based methodologies to assess NMN and SIRT1-dependent effects.

      Weaknesses:

      Unfortunately, the study design, as well as the data analysis approach taken by the authors, are flawed. This limits the authors' ability to make the proposed conclusions.

    2. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates the effects of oral supplementation with nicotinamide mononucleotide (NMN) on metabolism and inflammation in mice with diet-induced obesity, and whether these effects depend on the NAD⁺-dependent enzyme SIRT1. Using control and inducible SIRT1 knockout mice, the authors show that NMN administration mitigates high-fat diet-induced weight gain, enhances energy expenditure, and normalizes fasting glucose and plasma lipid profiles in a largely SIRT1-dependent manner. However, reductions in fat mass and adipose tissue expansion occur independently of SIRT1. Comprehensive plasma proteomic analyses (O-Link and mass spectrometry) reveal that NMN reverses obesity-induced alterations in metabolic and immune pathways, particularly those related to glucose and cholesterol metabolism. Integrative network and causal analyses identify both SIRT1-dependent and -independent protein clusters, as well as potential upstream regulators such as FBXW7, ADIPOR2, and PRDM16. Overall, the study supports that NMN modulates key metabolic and immune pathways through both SIRT1-dependent and alternative mechanisms to alleviate obesity and dyslipidemia in mice.

      Strengths:

      Well-written manuscript, and state-of-the-art proteomics-based methodologies to assess NMN and SIRT1-dependent effects.

      We thank the reviewer for highlighting the state-of-the-art proteomic methods used. We report for the first time significant changes in the plasma proteome of mice after NMN supplementation, in both wild-type and SIRT1-KO mice, using a combination of DIA mass spectrometry and Olink.

      Weaknesses:

      Unfortunately, the study design, as well as the data analysis approach taken by the authors, are flawed. This limits the authors' ability to make the proposed conclusions.

      We agree that the administration of tamoxifen, along with the associated weight loss, could affect the obesity phenotype. For this reason, we ensured that both Cre-positive and Cre-negative mice received tamoxifen. Importantly, after the tamoxifen 'washout', the two groups weighed essentially the same. Going forward, we plan to address this comment by performing additional statistical tests on all six experimental groups to gain insights into dependencies. Based on your suggestions, we will clarify the limitations of the study design and improve the data analysis approaches to provide stronger support for our conclusions in the revised version of the paper.

      Reviewer #2 (Public review):

      Summary:

      Majeed and colleagues aimed to evaluate whether the metabolic effects of NMN in the context of a high-fat diet are SIRT1 dependent. For this, they used an inducible SIRT1 KO model (SIRT1 iKO), allowing them to bypass the deleterious effects of SIRT1 ablation during development. In line with previous reports, the authors observed that NMN prevents, to some degree, diet-induced metabolic damage in wild-type mice. When doing similar tests on SIRT1 iKO mice, the authors see that some, but not all, of the effects of NMN are abrogated. The phenotypic studies are complemented by plasma proteomic analyses evaluating the influence of the high-fat diet, SIRT1, and NMN on circulating protein profiles.

      Strengths:

      The mechanistic aspects behind the potential health benefits of NAD+ precursors have been poorly elucidated. This is in part due to the pleiotropic actions of NAD-related molecules on cellular processes. While sirtuins, most notably SIRT1, have been largely hypothesized to be key players in the therapeutic actions of NAD+ boosters, the proof for this in vivo is very limited. In this sense, this work is an important contribution to the field.

      We thank the reviewer for acknowledging the importance of this work to the field. In this report, we provide in vivo evidence of the action of NAD+ boosting, and hope to delineate the action of Sirt1, as well as the pleiotropic effects of NAD-related molecules on cellular and metabolic processes.

      Weaknesses:

      While the authors use a suitable methodology (SIRT1 iKO mice), the results show very early that the iKO mice themselves have some notable phenotypes, which complicate the picture. The actions of NMN in WT and SIRT1 KO mice are most often presented separately. However, this is not the right approach to evaluate and visualize SIRT1 dependency. Indeed, many of the "SIRT1-dependent" effects of NMN are consequent to the fact that SIRT1 deletion itself has a phenotype equivalent to or larger than that induced by NMN in wild-type mice. This would have been very evident if the two genotypes had been systematically plotted together. Consequently, and despite the value of the study, the results obtained with this model might not allow for solidly established claims of SIRT1 dependency on NMN actions. The fact that some of the effects of SIRT1 deletion are similar to those of NMN supplementation also makes it counterintuitive to propose that activation of SIRT1 is a major driver of NMN actions. Unbiasedly, one might as well conclude that NMN could act by inhibiting SIRT1. The fact that readouts for SIRT1 activity are not explored makes it also difficult to test the influence of NMN on SIRT1 in their experimental setting, or whether compensations could exist.

      We thank the reviewer for raising this point and acknowledge the limitations of using Sirt1 iKO mice. However, inducing Sirt1 KO in adulthood is a better alternative than using a homozygous Sirt1 KO mouse model, as the latter leads to embryonic lethality and many other developmental defects (1, 2). The proteomics analysis can provide insight into the effects of SIRT1 deletion under chow and high-fat diet (HFD) conditions, as well as the effects of diet in the presence or absence of nicotinamide mononucleotide (NMN). We will discuss these limitations and present the results for the two genotypes together, as suggested.

      A second weak point is that the proteomic explorations are interesting, yet feel too descriptive and disconnected from the overall phenotype or from the goal of the manuscript. It would be unreasonable to ask for gain/loss-of-function experiments based on the differentially abundant peptides. Yet, a deeper exploration of whether their altered presence in circulation is consistent with changes in their expression - and, if so, in which tissues - and a clearer discussion on their link to the phenotypes observed would be needed, especially for changes related to SIRT1 and NMN.

      First, we presented the data in this manner as a proof of concept, to demonstrate the effect of the diet on the plasma proteome and corroborate our findings with those published in the literature. We then investigated the effects of NAD boosting and Sirt1 KO in order to identify significant changes. We agree with the reviewer that it would be unreasonable to validate all the differentially abundant proteins. However, we will choose key proteins and assess their expression in different tissues, such as the liver, white adipose tissue (WAT) and muscles, and attempt to connect these changes with the phenotypes.

      Impact on the field and further significance of the work:

      Despite the fact that, in my opinion, the authors might not have conclusively achieved their main aim, there are multiple valuable aspects in this manuscript:

      (1) It provides independent validation for the potential benefits of NAD+ boosters in the context of diet-induced metabolic complications. Previous efforts using NR or NMN itself have provided contradicting observations. Therefore, additional independent experiments are always valuable to further balance the overall picture.

      (2) The metabolic consequences of deleting SIRT1 in adulthood have been poorly explored in previous works. Therefore, irrespective of the actions of NMN, the phenotypes observed are intriguing, and the proteomic differences are also large enough to spur further research to understand the role of SIRT1 as a therapeutic target.

      (3) Regardless of the influence of SIRT1, NMN promotes some plasma proteomic changes that are very well worth exploring. In addition, they highlight once more that the in vivo actions of NMN, as those of other NAD+ boosters, are pleiotropic. Hence, this work brings into question whether single gene KO models are really a good approach to explore the mechanisms of action of NAD+ precursors.

      We thank the reviewer for their analysis in highlighting the valuable aspects of the manuscript and we hope the revised manuscript will further strengthen the key results.

      References:

      (1) McBurney MW, Yang X, Jardine K, Hixon M, Boekelheide K, Webb JR, Lansdorp PM, Lemieux M. The mammalian SIR2alpha protein has a role in embryogenesis and gametogenesis. Mol Cell Biol 2003; 23:38–54.

      (2) Cheng HL, Mostoslavsky R, Saito S, Manis JP, Gu Y, Patel P, Bronson R, Appella E, Alt FW, Chua KF. Developmental defects and p53 hyperacetylation in Sir2 homolog (SIRT1)-deficient mice. Proc Natl Acad Sci U S A 2003; 100:10794–10799.

    1. Normalization refers to each fact being expressed in one place. The objective is to divide up your information into as many standard subsets as is practical. However, atomic specificity and perfection are impossible and not going to help anyone. Getting too granular may make a huge, unwieldy dataset. Ultimately, analysis will likely require recombining data together again, but that task will be straightforward if your data is normal. Whether you’re working in a relational database or performing analyses on derived tables, appropriate normalization may vary. But considering normalization of data from the start can keep things clean.

      So far, my research has involved tabling ingredients in ancient recipes. There is a certain level of extra granularity I need to provide to account for vague descriptions, but my initial model of separating ingredients created an absolute mess.
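
      A minimal sketch of what "each fact being expressed in one place" could look like for a recipe dataset, using Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and chosen only for illustration; they are not taken from the annotated text or from any real project. Each ingredient name is stored once, the vague description lives in a free-text note on the link table, and a join recombines the pieces for analysis.

```python
import sqlite3


def build_recipe_db():
    """Create an in-memory database with one table per kind of fact.

    All names and rows here are hypothetical illustration data.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE recipe (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE ingredient (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE      -- each ingredient is named once
        );
        CREATE TABLE recipe_ingredient (
            recipe_id     INTEGER NOT NULL REFERENCES recipe(id),
            ingredient_id INTEGER NOT NULL REFERENCES ingredient(id),
            note          TEXT,            -- room for vague source wording
            PRIMARY KEY (recipe_id, ingredient_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO recipe VALUES (?, ?)",
        [(1, "Recipe A"), (2, "Recipe B")],
    )
    conn.executemany(
        "INSERT INTO ingredient VALUES (?, ?)",
        [(1, "honey"), (2, "pepper"), (3, "wine")],
    )
    conn.executemany(
        "INSERT INTO recipe_ingredient VALUES (?, ?, ?)",
        [
            (1, 1, None),
            (1, 2, "a little"),
            (2, 1, "quantity unstated"),
            (2, 3, None),
        ],
    )
    return conn


def recipes_using(conn, ingredient_name):
    """Recombine the normalized tables: titles of recipes with an ingredient."""
    rows = conn.execute(
        """
        SELECT r.title
        FROM recipe AS r
        JOIN recipe_ingredient AS ri ON ri.recipe_id = r.id
        JOIN ingredient AS i ON i.id = ri.ingredient_id
        WHERE i.name = ?
        ORDER BY r.title
        """,
        (ingredient_name,),
    )
    return [title for (title,) in rows]
```

      Calling `recipes_using(build_recipe_db(), "honey")` lists both sample recipes even though "honey" is stored only once. Keeping the vague wording in a `note` column is one possible compromise between granularity and a manageable table count, not the only one.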

    2. When working with legacy data, or even on your own large projects, you’ll need to begin by gathering many datasets into a harmonious collection

      This relates to my Dante topic because modern Dante manuscript research is basically legacy data. Scholars compare multiple medieval copies from different libraries, and have to bring them together to analyze patterns. That is similar to what this line is saying about harmonizing datasets.

    3. data can’t ever truly be “raw”(Gitelman 2013)

      This connects to my Dante manuscripts topic because manuscripts were never raw either. Every copy of Dante had interpretation built into it. Scribes made choices at every stage of copying, so the medieval text is not a pure or untouched version. It is already processed knowledge.

    1. “What data am I using? Whose labor produced it and what biases and assumptions are built into it? Why choose this particular phenomenon for digitization or transcription? And what do the data leave out?”

      The main data sources for my assignments are compiled translations of the original Latin texts. While the translators have included some footnote discussion on why they believe their work is the proper translation, I am sure there are other, conflicting translations of this same work out there. Anything subjective, such as translating text into another language, adds the limitations of the new format onto those of the original work.

    1. The key to productive failure as we envision it is to recognize when one’s work is suffering from a type 1 or type 2 fail, and to transform it to a type 3 or type 4

      This is the likely goal of my project. I had some weak foundational material to begin with, but simply making the attempt and getting over a want of total success might help me get a better perspective on my research and on me as a person.

    2. The second, while labeled ‘human failure’ really means that the context, the framework for encountering the technology was not erected properly, leading to a failure to appreciate what the technology could do or how it was intended to be used

      My entire side-tangent on OCR was me completely misinterpreting my project and what it meant. I had plenty of information, I just did not have the skill or context to understand what I was looking at and what it meant. I still also don't quite know what I'm doing.... another possible avenue of failure.

    1. ‘Social Network Analysis promises to revolutionize our knowledge of the social contexts that underpin archaeological fieldwork, putting this power in the hands of everyone from the site director on down.’

      Oh hey, same tool! This social context aligns somewhat with my topic, though I am applying it more to source comparison than this specifically.

    2. outline of a project involves figuring out: the question, problem, or provocation; sources (primary, secondary); analytical activity; audience; product

      I probably should have put more thought into this order of operations when I did my initial project proposal. Changing the method changes the potential product, and I have swapped methods twice since then.

    1. Digital archaeology of the 21st century is necessarily a public archaeology. Public archaeology seeks to promote awareness of what archaeology is, how it is done, and why it matters amongst members of the general audience

      This communicative angle is important for bridging space and time. It allows us in the present to connect to something as fleeting as a meal in the past. We can understand the context of its existence, which can then let us appreciate our own lives through a new lens.

    1. In contrast, DNA hypomethylation has also been identified in brain tissue and peripheral blood of schizophrenia patients. For instance, hypomethylation of the promoter region of catechol-O-methyltransferase (COMT), a gene in the dopaminergic pathway, has been reported in the frontal lobe in schizophrenia (Abdolmaleky et al., 2006)

      This passage underscores the discovery that DNA hypomethylation—specifically in the promoter region of the COMT gene—has been observed in schizophrenia. COMT plays a critical role in dopamine metabolism, a pathway strongly linked to schizophrenia symptoms.

  6. jpf-projects.bubbleapps.io jpf-projects.bubbleapps.io
    1. [SVG word cloud: origo, folder, clues, meta, reflective, integral, autopoietic, design, word, cloud, manually, maintained, outline, inter, related, salient, words, adjacenses, neologism, indwelling, gestalt, focal, subsidiary, situational, awareness, omni-optional, co-evolutional, generative]

    1. reply to u/SlumberCrow at https://reddit.com/r/typewriters/comments/1orwxqq/type_writer_leaving_small_divets_in_paper_when/

      A new platen will certainly help, but it's also a question of having a proper ring and cylinder adjustment across the length of your platen and segment. Often the letters that punch through are the . , and o, which are at the extreme end of the segment. Some machines have adjustment screws at either end of the carriage, and the adjustment should be checked not only at the center of the platen but at both ends. If you don't have an experienced mechanic who knows how to do all of this properly, you can easily get issues, which will most often show up at the far ends of the segment/platen.

      Beyond a proper adjustment, it's also the case that the surface area of the . and , is smaller than that of other characters, so the force of the strike is concentrated on a smaller spot, even when they are actuated by the weaker fingers of the right hand when touch typing. Some older manuals and training films will suggest putting less pressure on these keys when typing. This is likely even more important for those who hunt-and-peck and are likely using the full force of their index fingers.

      Unless your ribbon is obviously dry or marginal, replacing your ribbon isn't likely to help much. Slugs are made out of hardened steel and you'd have to do something incredibly drastic to damage the slugs, so don't sweat that too much. Backing sheet will help as a stop-gap particularly on machines with older/hardened platens, but there's only so much help that will do without a good platen and a properly adjusted machine.

    1. Regular Expressions Notepad++ regular expressions (“regex”) use the Boost regular expression library v1.85 (as of NPP v8.6.6), which was originally based on PCRE (Perl Compatible Regular Expression) syntax, only departing from it in very minor ways. Complete documentation on the precise implementation is to be found on the Boost pages for search syntax and replacement syntax. (Some users have misunderstood this paragraph to mean that they can use one of the regex-explainer websites that accepts PCRE and expect anything that works there to also work in Notepad++; this is not accurate. There are many different “PCRE” implimentations, and Boost itself does not claim to be “PCRE”, though both Boost and PCRE variants have the same origins in an early version of Perl’s regex engine. If your regex-explainer does not claim to use the same Boost engine as Notepad++ uses, there will be differences between the results from your chosen website and the results that Notepad++ gives.) The Notepad++ Community has a FAQ on other resources for regular expressions. Note: Regular expression “backward” search is disallowed due to sometimes surprising results. (For example, in the text to the test they travelled, a forward regex t\w+ will find 5 results; the same regex searching backward will find 17 matches.) If you really need this feature, please see Allow regex backward search to learn how to activate this option. Important Note: Syntax that works in the Find What: box for searching will not always work in the Replace with: box for replacement. There are different syntaxes. The Control Characters and Match by character code syntax work in both; other than that, see the individual sections for Searches vs Substitutions for which syntaxes are valid in which fields. Regex Special Characters for Searches In a regular expression (shortened into regex throughout), special characters interpreted are: Single-character matches . or \C ⇒ Matches any character. 
If you check the box which says . matches newline, or use the (?s) search modifier, then . or \C will match any character, including newline characters (\r or \n). With the option unchecked, or using the (?-s) search modifier, . or \C only match characters within a line, and do not match the newline characters. Any Unicode character within the Basic Multilingual Plane (BMP) (with a codepoint from U+0000 through U+FFFF) will be matched per these rules. Any Unicode character that is beyond the BMP (with a codepoint from U+10000 through U+10FFFF) will be matched as two separate characters instead, since the “surrogate code” uses two characters. (See the Match by Character Code section for more on how surrogate codes work.) \X ⇒ Matches a single non-combining character followed by any number (zero or more) combining characters. You can think of \X as a “. on steroids”: it matches the whole grapheme as a unit, not just the base character itself. This is useful if you have a Unicode encoded text with accents as separate, combining characters. For example, the letter ǭ̳̚, with four combining characters after the o, can be found either with the regex (?-i)o\x{0304}\x{0328}\x{031a}\x{0333} or with the shorter regex \X (the latter, being generic, matches more than just ǭ̳̚, inluding but not limited to ą̳̄̚ or o alone); if you want to limit the \X in this example to just match a possibly-modified o (so “o followed by 0 or more modifiers”), use a lookahead before the \X: (?=o)\X, which would match o alone or ǭ̳̚, but not ą̳̄̚. \$ , \( , \) , \* , \+ , \. , \? , \[ , \] , \\ , \| ⇒ Prefixing a special character with \ to “escape” the character allows you to search for a literal character when the regular expression syntax would otherwise have that character have a special meaning as a regex meta-character. The characters $ ( ) * + . ? 
[ ] \ | all have special meaning to the regex engine in normal circumstances; to get them to match as a literal (or to show up as a literal in the substitution), you will have to prefix them with the \ character. There are also other characters which are special only in certain circumstances (any time a character is used with a non-literal meaning throughout the Regular Expression section of this manual); if you want to match one of those sometimes-special characters as a literal character in those situations, those sometimes-special characters will also have to be escaped in those situations by putting a \ before it. Please note: if you escape a normal character, it will sometimes gain a special meaning; this is why so many of the syntax items listed in this section have a \ before them. Match by character code It is possible to match any character using its character code. This allows searching for any character, even if you cannot type it into the Find box, or the Find box doesn’t seem to match your emoji that you want to search for. If you are using an ANSI encoding in your document (that is, using a character set like Windows 1252), you can use any character code with a decimal codepoint from 0 to 255. If you are using Unicode (one of the UTF-8 or UTF-16 encodings), you can actually match any Unicode character. These notations require knowledge of hexadecimal or octal versions of the character code. (You can find such character code information on most web pages about ASCII, or about your selected character set, and about UTF-8 and UTF-16 representations of Unicode characters.) \0ℕℕℕ ⇒ A single byte character whose code in octal is ℕℕℕ, where each ℕ is an octal digit. (That’s the number 0, not the letter o or O.) This notation works for codepoints 0-255 (\0000 - \0377), which covers the full ANSI character set range, or the first 256 Unicode characters. 
For example, \0101 looks for the letter A, as 101 in octal is 65 in decimal, and 65 is the character code for A in ASCII, in most of the character sets, and in Unicode. \xℕℕ ⇒ Specify a single character with code ℕℕ, where each ℕ is a hexadecimal digit. What this stands for depends on the text encoding. This notation works for codepoints 0-255 (\x00 - \xFF), which covers the full ANSI character set range, or the first 256 Unicode characters. For instance, \xE9 may match an é or a θ depending on the character set (also known as the “code page”) in an ANSI encoded document. These next two only work with Unicode encodings (so the various UTF-8 and UTF-16 encodings): \x{ℕℕℕℕ} ⇒ Like \xℕℕ, but matches a full 16-bit Unicode character, which is any codepoint from U+0000 to U+FFFF. \x{ℕℕℕℕ}\x{ℕℕℕℕ} ⇒ For Unicode characters above U+FFFF, in the range U+10000 to U+10FFFF, you need to break the single 5-digit or 6-digit hex value and encode it into two 4-digit hex codes; these two codes are the “surrogate codes” for the character. For example, to search for the 🚂 STEAM LOCOMOTIVE character at U+1F682, you would search for the surrogate codes \x{D83D}\x{DE82}. If you want to know the surrogate codes for a given character, search the internet for “surrogate codes for character” (where character is the fancy Unicode character you need the codes for); the surrogate codes are equivalent to the two-word UTF-16 encoding for those higher characters, so UTF-16 tables will also work for looking this up. Any site or tool that you are likely to be using to find the U+###### for a given Unicode character will probably already give you the surrogate codes or UTF-16 words for the same character; if not, find a tool or site that does. You can also compute surrogate codes yourself from the character code, but only if you are comfortable with hexadecimal and binary. Skip the following bullets if you are prone to mathematics-based PTSD. 
Start with your Unicode U+######, calling the hexadecimal digits as PPWXYZ. The PP digits indicate the plane. subtract one and convert to the 4 binary bits pppp (so PP=01 becomes 0000, PP=0F becomes 1110, and PP=10 becomes 1111) Convert each of the other digits into 4 bits (W as wwww, X as xxvv, Y as yyyy, and Z as zzzz; you will see in a moment why two different characters are used in xxvv) Write those 20 bits in sequence: ppppwwwwxxvvyyyyzzzz Group into two equal groups: ppppwwwwxx and vvyyyyzzzz (you can see that the X ⇒ xxvv was split between the two groups, hence the notation) Before the first group, insert the binary digits 110110 to get 110110ppppwwwwxx, and split into the nibbles 1101 10pp ppww wwxx. Convert those nibbles to hex: it will give you a value from \x{D800} thru \x{DBFF}; this is the High Surrogate code Before the second group, insert the binary digits 110111 to get 110111vvyyyyzzzz, and split into the nibbles 1101 11vv yyyy zzzz. Convert those nibbles to hex: it will give you a value from \x{DC00} thru \x{DFFF}; this is the Low Surrogate code Combine those into the final \x{ℕℕℕℕ}\x{ℕℕℕℕ} for searching. For more on this, see the Wikipedia article on Unicode Planes, and the discussion in the Notepad++ Community Forum about how to search for non-ASCII characters Collating Sequences [[._col_.]] ⇒ The character the col “collating sequence” stands for. For instance, in Spanish, ch is a single letter, though it is written using two characters. That letter would be represented as [[.ch.]]. This trick also works with symbolic names of control characters, like [[.BEL.]] for the character of code 0x07. See also the discussion on character ranges. Control characters \a ⇒ The BEL control character 0x07 (alarm). \b ⇒ The BS control character 0x08 (backspace). This is only allowed inside a character class definition. Otherwise, this means “a word boundary”. \e ⇒ The ESC control character 0x1B. \f ⇒ The FF control character 0x0C (form feed). 
\n ⇒ The LF control character 0x0A (line feed). This is the regular end of line under Unix systems. \r ⇒ The CR control character 0x0D (carriage return). This is part of the DOS/Windows end of line sequence CR-LF, and was the EOL character on Mac 9 and earlier. OSX and later versions use \n. \t ⇒ The TAB control character 0x09 (tab, or hard tab, horizontal tab). \c☒ ⇒ The control character obtained from character ☒ by stripping all but its 5 lowest order bits. For instance, \cA and \ca both stand for the SOH control character 0x01. You can think of this as “\c means ctrl”, so \cA is the character you would get from hitting Ctrl+A in a terminal. (Note that \c☒ will not work if ☒ is outside of the Basic Multilingual Plane (BMP) – that is, it only works if ☒ is in the Unicode character range U+0000 - U+FFFF. The intention of \c☒ is to mnemonically escape the ASCII control characters obtained by typing Ctrl+☒, it is expected that you will use a simple ASCII alphanumeric for the ☒, like \cA or \ca.) Special Control escapes \R ⇒ Any newline sequence. Specifically, the atomic group (?>\r\n|\n|\x0B|\f|\r|\x85|\x{2028}|\x{2029}). Please note, this sequence might match one or two characters, depending on the text. Because its length is variable-width, it cannot be used in lookbehinds. Because it expands to a parentheses-based group with an alternation sequence, it cannot be used inside a character class. If you accidentally attempt to put it in a character class, it will be interpreted like any other literal-character escape (where \☒ is used to make sure that the next character is literal) meaning that the R will be taken as a literal R, without any special meaning. 
For example, if you try [\t\R]: you may be intending to say, “match any single character that’s a tab or a newline”, but what you are actually saying is “match the tab or a literal R”; to get what you probably intended, use [\t\v] for “a tab or any vertical spacing character”, or [\t\r\n] for “a tab or carriage return or newline but not any of the weird verticals”. Ranges or kinds of characters Character Classes [_set_] ⇒ This indicates a set of characters, for example, [abc] means any of the literal characters a, b or c. You can also use ranges by putting a hyphen between characters, for example [a-z] for any character from a to z. You can use a collating sequence in character ranges, like in [[.ch.]-[.ll.]] (these are collating sequences in Spanish). Certain characters require special treatment inside character classes: To use a literal - in a character class: Use it directly as the first or last character in the enclosing class notation, like [-abc] or [abc-]; OR use it “escaped” at any position, like [\-abc] or [a\-bc] . To use a literal ] in a character class: Use it directly right after the opening [ of the class notation, like []abc]; OR use it “escaped” at any position, like [\]abc] or [a\]bc] . To use a literal [ in a character class: Use it directly like any other character, like [ab[c]; “escaping” is not necessary, but is permissible, like [ab\[c] . This character is not special when used alone inside a class; however, there are cases where it is special in combination with another: If used with a colon in the order [: inside a class, it is the opening sequence for a named class (described below); if you want to include both a [ and a : inside the same character class, do not use them unescaped right next to each other; either change the order, like [:[], or escape one or both, like [\[:] or [[\:] or [\[\:] . 
If used with an equals sign in the order [= inside a class, it is the opening sequence for an equivalence class (described below); if you want to include both a [ and a = inside the same character class, do not use them unescaped right next to each other; either change the order, like [=[], or escape one or both, like [\[=] or [[\=] or [\[\=] . To use a literal \ in a character class, it must be doubled (i.e., \\) inside the enclosing class notation, like [ab\\c] . To use a literal ^ in a character class: Use it directly as any character but the first, such as [a^b] or [ab^]; OR use it “escaped” at any position, such as [\^ab] or [a\^b] or [ab\^] . [^_set_] ⇒ The complement of the characters in the set. For example, [^A-Za-z] means any character except an alphabetic character. Care should be taken with a complement list, as regular expressions are always multi-line, and hence [^ABC]* will match until the first A, B or C (or a, b or c if match case is off), including any newline characters. To confine the search to a single line, include the newline characters in the exception list, e.g. [^ABC\r\n]. [[:_name_:]] or [[:☒:]] ⇒ The whole character class named name. For many, there is also a single-letter “short” class name, ☒. Please note: the [:_name_:] and [:☒:] must be inside a character class [...] to have their special meaning. 
short | full name | description | equivalent character class
 | alnum | letters and digits | 
 | alpha | letters | 
h | blank | spacing which is not a line terminator | [\t\x20\xA0]
 | cntrl | control characters | [\x00-\x1F\x7F\x81\x8D\x8F\x90\x9D]
d | digit | digits | 
 | graph | graphical character, so essentially any character except for control chars, \0x7F, \x80 | 
l | lower | lowercase letters | 
 | print | printable characters | [\s[:graph:]]
 | punct | punctuation characters | [!"#$%&'()*+,\-./:;<=>?@\[\\\]^_{\|}~]
s | space | whitespace (word or line separator) | [\t\n\x0B\f\r\x20\x85\xA0\x{2028}\x{2029}]
u | upper | uppercase letters | 
 | unicode | any character with code point above 255 | [\x{0100}-\x{FFFF}]
w | word | word characters | [_\d\l\u]
 | xdigit | hexadecimal digits | [0-9A-Fa-f]
Note that letters include any unicode letters (ASCII letters, accented letters, and letters from a variety of other writing systems); digits include ASCII numeric digits, and anything else in Unicode that’s classified as a digit (like superscript numbers ¹²³…). Note that those character class names may be written in upper or lower case without changing the results. So [[:alnum:]] is the same as [[:ALNUM:]] or the mixed-case [[:AlNuM:]]. As stated earlier, the [:_name_:] and [:☒:] (note the single brackets) must be a part of a surrounding character class. However, you may combine them inside one character class, such as [_[:d:]x[:upper:]=], which is a character class that would match any digit, any uppercase, the lowercase x, and the literal _ and = characters. These named classes won’t always appear with the double brackets, but they will always be inside of a character class. If the [:_name_:] or [:☒:] are accidentally not contained inside a surrounding character class, they will lose their special meaning. 
For example, [:upper:] is the character class matching :, u, p, e, and r; whereas [[:upper:]] is similar to [A-Z] (plus other unicode uppercase letters) [^[:_name_:]] or [^[:☒:]] ⇒ The complement of character class named name or ☒ (matching anything not in that named class). This uses the same long names, short names, and rules as mentioned in the previous description. Character classes may not contain parentheses-based groups of any kind, including the special escape \R (which expands to a parentheses-based group when evaluated, even though \R doesn’t look like it contains parentheses). Character Properties These properties behave similar to named character classes, but cannot be contained inside a character class. \p☒ or \p{_name_} ⇒ Same as [[:☒:]] or [[:_name_:]], where ☒ stands for one of the short names from the table above, and name stands for one of the full names from above. For instance, \pd and \p{digit} both stand for a digit, just like the escape sequence \d does. \P☒ or \P{_name_} ⇒ Same as [^[:☒:]] or [^[:_name_:]] (not belonging to the class name). Character escape sequences \☒ ⇒ Where ☒ is one of d, w, l, u, s, h, v, described below. These single-letter escape sequences are each equivalent to a class from above. The lower-case escape sequence means it matches that class; the upper-case escape sequence means it matches the negative of that class. (Unlike the properties, these can be used both inside or outside of a character class.) 
Description | Escape Sequence | Positive Class | Negative Escape Sequence | Negative Class
digits | \d | [[:digit:]] | \D | [^[:digit:]]
word chars | \w | [[:word:]] | \W | [^[:word:]]
lowercase | \l | [[:lower:]] | \L | [^[:lower:]]
uppercase | \u | [[:upper:]] | \U | [^[:upper:]]
word/line separators | \s | [[:space:]] | \S | [^[:space:]]
horizontal space | \h | [[:blank:]] | \H | [^[:blank:]]
vertical space | \v | see below | \V | 
Vertical space: This encompasses all the [[:space:]] characters that aren’t [[:blank:]] characters: The LF, VT, FF, CR, NEL control characters and the LS and PS format characters: 0x000A (line feed), 0x000B (vertical tabulation), 0x000C (form feed), 0x000D (carriage return), 0x0085 (next line), 0x2028 (line separator) and 0x2029 (paragraph separator). There isn’t a named class which matches. Note: despite its similarity to \v, even though \R matches certain vertical space characters, it is not a character-class-equivalent escape sequence (because it evaluates to a parentheses()-based expression, not a class-based expression). So while \d, \l, \s, \u, \w, \h, and \v are all equivalent to a character class and can be included inside another bracket[]-based character class, the \R is not equivalent to a character class, and cannot be included inside a bracketed[] character-class. Equivalence Classes [[=_char_=]] ⇒ All characters that differ from char by case, accent or similar alteration only. For example [[=a=]] matches any of the characters: A, À, Á, Â, Ã, Ä, Å, a, à, á, â, ã, ä and å. Multiplying operators + ⇒ This matches 1 or more instances of the previous character, as many as it can. For example, Sa+m matches Sam, Saam, Saaam, and so on. [aeiou]+ matches consecutive strings of vowels. * ⇒ This matches 0 or more instances of the previous character, as many as it can. For example, Sa*m matches Sm, Sam, Saam, and so on. ? ⇒ Zero or one of the last character. Thus Sa?m matches Sm and Sam, but not Saam. *? 
⇒ Zero or more of the previous group, but minimally: the shortest matching string, rather than the longest string as with the “greedy” operator. Thus, m.*?o applied to the text margin-bottom: 0; will match margin-bo, whereas m.*o will match margin-botto. +? ⇒ One or more of the previous group, but minimally. {ℕ} ⇒ Matches ℕ copies of the element it applies to (where ℕ is any decimal number). {ℕ,} ⇒ Matches ℕ or more copies of the element it applies to. {ℕ,ℙ} ⇒ Matches ℕ to ℙ copies of the element it applies to, as much as it can (where ℙ ≥ ℕ). {ℕ,}? or {ℕ,ℙ}? ⇒ Like the above, but minimally. *+ or ?+ or ++ or {ℕ,}+ or {ℕ,ℙ}+ ⇒ These so-called “possessive” variants of greedy repeat marks do not backtrack. This allows failures to be reported much earlier, which can boost performance significantly. But they will eliminate matches that would require backtracking to be found. As an example, see how the matching engine handles the following two regexes:
When regex “.*” is run against the text “abc”x :
`“` matches `“`
`.*` matches `abc”x`
`”` doesn't match ( End of line ) => Backtracking
`.*` matches `abc”`
`”` doesn't match letter `x` => Backtracking
`.*` matches `abc`
`”` matches `”` => 1 overall match `“abc”`
When regex “.*+”, with a possessive quantifier, is run against the text “abc”x :
`“` matches `“`
`.*+` matches `abc”x` ( catches all remaining characters )
`”` doesn't match ( End of line )
Notice there is no match at all in this version, because the possessive quantifier prevents backtracking to a possible solution. Anchors Anchors match a zero-length position in the line, rather than a particular character. ^ ⇒ This matches the start of a line (except when used inside a set, see above). $ ⇒ This matches the end of a line. \< ⇒ This matches the start of a word using Boost’s definition of words. \> ⇒ This matches the end of a word using Boost’s definition of words. \b ⇒ Matches either the start or end of a word. \B ⇒ Not a word boundary. 
It represents any location between two word characters or between two non-word characters. \A or \` ⇒ Matches the start of the file. \z or \' ⇒ Matches the end of the file. \Z ⇒ Matches like \z with an optional sequence of newlines before it. This is equivalent to (?=\v*\z), which departs from the traditional Perl meaning for this escape. \G ⇒ This “Continuation Escape” matches the end of the previous match, or matches the start of the text being matched if no previous match was found. In Find All or Replace All circumstances, this will allow you to anchor your next match at the end of the previous match. If it is the first match of a Find All or Replace All, and any time you use a single Find Next or Replace, the “end of previous match” is defined to be the start of the search area – the beginning of the document, or the current caret position, or the start of the highlighted text. Because of that, if you are using it in an alternation, where you want to say “find any occurrence of something after some prefix, or after a previous match”, you will want to make sure that your prefix includes the start-of-file \A, otherwise the \G portion may accidentally match start-of-file when you don’t want that to occur. Capture Groups and Backreferences (_subset_) ⇒ Numbered Capture Group: Parentheses mark a part of the regular expression, also known as a subset expression or capture group. The string matched by the contents of the parentheses (indicated by subset in this example) can be re-used with a backreference or as part of a replace operation; see Substitutions, below. Groups may be nested. (?<name>_subset_) or (?'name'_subset_) ⇒ Named Capture Group: Names the value matched by subset as the group name. Please note that group names are case-sensitive. \ℕ, \gℕ, \g{ℕ}, \g<ℕ>, \g'ℕ', \kℕ, \k{ℕ}, \k<ℕ> or \k'ℕ' ⇒ Numbered Backreference: These syntaxes match the ℕth capture group earlier in the same expression. 
(Backreferences are used to refer to the capture group contents only in the search/match expression; see the Substitution Escape Sequences for how to refer to capture groups in substitutions/replacements.) A regex can have multiple subgroups, so \2, \3, etc. can be used to match others (numbers advance left to right with the opening parenthesis of the group). You can have as many capture groups as you need, and are not limited to only 9 groups (though some of the syntax variants can only reference groups 1-9; see the notes below, and use the syntaxes that explicitly allow multi-digit ℕ if you have more than 9 groups). Example: ([Cc][Aa][Ss][Ee]).*\1 would match a line such as Case matches Case but not Case doesn't match cASE. \ℕ ⇒ This form can only have ℕ as digits 1-9, so if you have more than 9 capture groups, you will have to use one of the other numbered backreference notations, listed in the next bullet point. Example: the expression \10 matches the contents of the first capture group \1 followed by the literal character 0, not the contents of the 10th group. \gℕ, \g{ℕ}, \g<ℕ>, \g'ℕ', \kℕ, \k{ℕ}, \k<ℕ> or \k'ℕ' ⇒ These forms can handle any non-zero ℕ. For positive ℕ, it matches the ℕth subgroup, even if ℕ has more than one digit. \g10 matches the contents from the 10th capture group, not the contents from the first capture group followed by the literal 0. If you want to match a literal number after the contents of the ℕth capture group, use one of the forms that has braces, brackets, or quotes, like \g{ℕ} or \k'ℕ' or \k<ℕ>: For example, \g{2}3 matches the contents of the second capture group, followed by a literal 3, whereas \g23 would match the contents of the twenty-third capture group. For clarity, it is highly recommended to always use the braces or brackets form for multi-digit ℕ. For negative ℕ, groups are counted backwards relative to the last group, so that \g{-1} is the last matched group, and \g{-2} is the next-to-last matched group. 
Please note the difference between absolute and relative backreferences. For instance, an exact four-letter palindrome word can be matched with: the regex (?-i)\b(\w)(\w)\g{2}\g{1}\b, when using absolute (positive) coordinates; or the regex (?-i)\b(\w)(\w)\g{-1}\g{-2}\b, when using relative (negative) coordinates. \g{name}, \g<name>, \g'name', \k{name}, \k<name> or \k'name' ⇒ Named Backreference: The string matching the subexpression named name. (As with the Numbered Backreferences above, these Named Backreferences are used to refer to the capture group contents only in the search/match expression; see the Substitution Escape Sequences for how to refer to capture groups in substitutions/replacements.)
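
The surrogate-code steps described in the Match by character code section above (subtract one from the plane, lay out the 20 bits, split them into two 10-bit groups, then prefix 110110 and 110111) can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of Notepad++ or Boost; the function names `surrogate_pair` and `npp_search_escape` are hypothetical helpers invented for this example.

```python
def surrogate_pair(codepoint: int) -> tuple[int, int]:
    """Return the (high, low) UTF-16 surrogate codes for a codepoint above U+FFFF."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("surrogates only exist for U+10000 through U+10FFFF")
    # Subtracting 0x10000 is the "plane minus one" step; it leaves 20 bits:
    # ppppwwwwxxvvyyyyzzzz
    offset = codepoint - 0x10000
    # Top 10 bits (ppppwwwwxx) prefixed with 110110 give 0xD800..0xDBFF.
    high = 0xD800 | (offset >> 10)
    # Bottom 10 bits (vvyyyyzzzz) prefixed with 110111 give 0xDC00..0xDFFF.
    low = 0xDC00 | (offset & 0x3FF)
    return high, low


def npp_search_escape(codepoint: int) -> str:
    """Format the surrogate pair in the \\x{ℕℕℕℕ}\\x{ℕℕℕℕ} search notation."""
    high, low = surrogate_pair(codepoint)
    return f"\\x{{{high:04X}}}\\x{{{low:04X}}}"


if __name__ == "__main__":
    # U+1F682 is the STEAM LOCOMOTIVE example used in the text above.
    print(npp_search_escape(0x1F682))
```

For U+1F682 this yields the same pair as the STEAM LOCOMOTIVE example in the text, \x{D83D}\x{DE82}, which can be pasted into the Find what box of a Unicode-encoded document.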

      regular expression

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      SMC5/6 is a highly conserved complex able to dynamically alter chromatin structure, thereby playing critical roles in genome stability and integrity that include homologous recombination and telomere maintenance. In recent years, a number of studies have revealed the importance of SMC5/6 in restricting viral expression, which is in part related to its ability to repress transcription from circular DNA. In this context, Oravcova and colleagues recently reported how SMC5/6 is recruited by two mutually exclusive complexes (orthologs of yeast Nse5/6) to SV40 LT-induced PML nuclear bodies (SIMC/SLF2) and DNA lesions (SLF1/2). In this current work, the authors extend this study, providing some new results. However, as a whole, the story lacks unity and does not delve into the molecular mechanisms responsible for the silencing process. One has the feeling that the story is somewhat incomplete, putting together results that are not directly connected.

      Please see the introductory overview above.

      (1) In the first part of the work, the authors confirm previous conclusions about the relevance of a conserved domain defined by the interaction of SIMC and SLF2 for their binding to SMC6, and extend the structural analysis to the modelling of the SIMC/SLF2/SMC complex by AlphaFold. Their data support a model where this conserved surface of SIMC/SLF2 interacts with SMC at the backside of SMC6's head domain, confirming the relevance of this interaction site with specific mutations. These results are interesting but confirmatory of a previous and more complete structural analysis in yeast (Li et al. NSMB 2024). In any case, they reveal the conservation of the interaction. My major concern is the lack of connection with the rest of the article. This structure does not help to understand the process of transcriptional silencing reported later beyond its relevance to recruit SMC5/6 to its targets, which was already demonstrated in the previous study.

      Demonstrating the existence of a conserved interface between the Nse5/6-like complexes and SMC6 in both yeast and human is foundationally important, not confirmatory, and was not revealed in our previous study. It remains unclear how this interface regulates SMC5/6 function, but yeast studies suggest a potential role in inhibiting the SMC5/6 ATPase cycle. Nevertheless, the precise function of Nse5/6 and its human orthologs in SMC5/6 regulation remains undefined, largely due to technical limitations in available in vivo analyses. The SIMC1/SLF2/SMC6 complex structure likely extends to the SLF1/2/SMC6 complex, suggesting a unifying function of the Nse5/6-like complexes in SMC5/6 regulation, albeit in the distinct processes of ecDNA silencing and DNA repair. There have been no studies to date (including this one) showing that SIMC1-SLF2 is required for SMC5/6 recruitment to ecDNA. Our previous study showed that SIMC1 was needed for SMC5/6 to colocalize with SV40 LT antigen at PML NBs. Here we show that SIMC1 is required for ecDNA repression, in the absence of PML NBs, which was not anticipated.

      (2) In the second part of the work, the authors focus on the functionality of the different complexes. The authors demonstrate that SMC5/6's role in transcription silencing is specific to its interaction with SIMC/SLF2, whereas SMC5/6's role in DNA repair depends on SLF1/2. These results are quite expected according to previous results. The authors already demonstrated that SLF1/2, but not SIMC/SLF2, are recruited to DNA lesions. Accordingly, they observe here that SMC5/6 recruitment to DNA lesions requires SLF1/2 but not SIMC/SLF2. Likewise, the authors already demonstrated that SIMC/SLF2, but not SLF1/2, targets SMC5/6 to PML NBs. Taking into account the evidence that connects SMC5/6's viral resistance at PML NBs with transcription repression, the observed requirement of SIMC/SLF2 but not SLF1/2 in plasmid silencing is somewhat expected. This does not mean that the expectation need not be experimentally confirmed. However, the study falls short in advancing the mechanistic process, despite some interesting results such as the dispensability of the PML NBs or the antagonistic role of the SV40 large T antigen. It would have been interesting to explore how LT overcomes SMC5/6-mediated repression: Does LT prevent SIMC/SLF2 from interacting with SMC5/6? Or does it prevent SMC5/6 from binding the plasmid? Is the transcription-dependent plasmid topology altered in cells lacking SIMC/SLF2? And in cells expressing LT? In its current form, the study is confirmatory and preliminary. In agreement with this, the cartoons modelling results here and in the previous work look basically the same.

      Our previous study only examined the localization of SLF1 and SIMC1 at DNA lesions. The localization of these subcomplexes alone should not be used to define their roles in SMC5/6 localization. Indeed, the field is split in terms of whether Nse5/6-like complexes are required for ecDNA binding/loading, or regulation of SMC5/6 once bound. 

      We agree that determining the potential mechanism of action of LT in overcoming SMC5/6-based repression is an important next step. We believe it is unlikely to be due to blocking of the SMC5/6-SIMC1/SLF2 interface, since SIMC1-SLF2 is required for SMC5/6 to localize at LT-induced foci. It will require the identification of any direct interactions with SMC5/6 subunits, and better methods for assessing SMC5/6 loading and activity on ecDNAs. Unlike HBx, Vpr, and BNRF1, LT does not appear to induce degradation of SMC5/6, making it a more complex and interesting challenge. Also, the dispensability of PML NBs in plasmid silencing versus viral silencing raises multiple important questions about SMC5/6’s repression mechanism.

      (3) There are some points about the presented data that need to be clarified.

      Thank you, we have addressed these points below, within the Recommendations for authors section.

      Reviewer #2 (Public review):

      Oravcová et al. present data supporting a role for SIMC1/SLF2 in silencing plasmid DNA via the SMC5/6 complex. Their findings are of interest, and they provide further mechanistic detail of how the SMC5/6 complex is recruited to disparate DNA elements. In essence, the present report builds on the authors' previous paper in eLife in 2022 (PMID: 36373674, "The Nse5/6-like SIMC1-SLF2 complex localizes SMC5/6 to viral replication centers") by showing the role of SIMC1/SLF2 in localisation of the SMC5/6 complex to plasmid DNA, and the distinct requirements as compared to recruitment to DNA damage foci. Although the findings of the manuscript are of interest, we are not yet convinced that the new data presented here represent a compelling new body of work; they would better fit the format of a "research advance" article. In their previous paper, Oravcová et al. show that the recruitment of SMC5/6 to SV40 replication centres is dependent on SIMC1, and specifically, that it is dependent on SIMC1 residues adjacent to neighbouring SLF2.

      We agree. We submitted this manuscript as a “Research Advance”, not as a standalone research article, given that it is an extension of our previous “Research Article” (1).

      Other comments

      (1) The mutations chosen in Figure 1 are quite extensive - 5 amino acids per mutant. In addition, they are in many cases 'opposite' changes, e.g., positive charge to negative charge. Is the effect lost if single mutations to an alanine are made?

      The mutations were chosen to test and validate the predicted SIMC1-SLF2-SMC6 structure i.e. the contact point between the conserved patch of SIMC1-SLF2 and SMC6. Multiple mutations and charge inversions increased the chance of disrupting the extensive interface. In this respect, the mutations were successful and informative, confirming the requirement of this region in specifically contacting SMC6. Whilst alanine scanning mutations are possible, we believe that they would not add to, or detract from, our validation of the predicted SIMC1-SLF2-SMC6 interface.

      (2) In Figure 2c, it isn't clear from the data shown that the 'SLF2-only' mutations in SMC6 result in a substantial reduction in SIMC1/SLF2 binding.

      To clarify the difference between wild-type and SLF2-only mutations in SIMC1-SLF2 interaction, we have performed an image volume analysis. This shows that the SLF2-facing SMC6 mutant reduces its interaction with SIMC1 (to 44% of WT) and SLF2 (to 21% of WT). The reduction in both SIMC1 and SLF2 interaction with SMC6 SLF2-facing mutant is expected, since SIMC1 and SLF2 are an interdependent heterodimer.  

      Author response table 1.

      (3) In the GFP reporter assays (e.g. Figure 3), median fluorescence is reported - was there any observed difference in the percentage of cells that are GFP positive?

      Yes. As expected when the GFP plasmid is not actively repressed, the percentage of GFP-positive cells differs in each cell line, following the same trend as GFP intensity.

      (4) The potential role of the large T antigen as an SMC5/6 evasion factor is intriguing. However, given the role of the large T antigen as a transcriptional activator, caution is required when interpreting enhanced GFP fluorescence. Antagonism of the SMC5/6 complex in this context might be further supported by ChIP experiments in the presence or absence of large T. Can large T functionally substitute for HBx or HIV-Vpr?

      We agree, the potential role of LT in SMC5/6 antagonism is interesting. We did state in the text “While LT is known to be a promiscuous transcriptional activator (2,3) that does not rule out a co-existing role in antagonizing SMC5/6. Indeed, these findings are reminiscent of HBx from HBV and Vpr of HIV-1, both of which are known promiscuous transcriptional activators that also directly antagonize SMC5/6 to relieve transcriptional repression (4-10).” We have tried ChIP experiments, but found these to be unreliable in assessing SMC5/6 association with plasmid DNA. Given the many disparate targets of LT, HBx and Vpr (other than SMC5/6), it seems unlikely that LT could functionally substitute for HBx and Vpr in supporting HBV and HIV-1 infections. Whilst certainly an interesting future question, we believe it is beyond the scope of this study.

      (5) In Figure 5c, the apparent molecular weight of large T and SMC6 appears to change following transfection of GFP-SMC5 - is there a reason for this?

      We are not certain as to what causes the molecular weight shift, but it is not specifically related to GFP-SMC5 transfection. Rather, it appears to be a general effect of the pulldown. Indeed, a very weak “background” band of LT is seen in the GFP only pulldown, which also runs at a “higher” molecular weight, as in the GFP-SMC5 pulldown. We believe that the effect is instead related to gel mobility in the wells that contain post pulldown proteins and different buffers. We have also seen similar effects using different protein-protein interaction pairs.

      Reviewer #3 (Public review):

      Summary:

      This study by the Boddy and Otomo laboratories further characterizes the roles of SMC5/6 loader proteins and related factors in SMC5/6-mediated repression of extrachromosomal circular DNA. The work shows that mutations engineered at an AlphaFold-predicted protein-protein interface formed between the loader SLF2/SIMC1 and SMC6 (similar to the interface in the yeast counterparts observed by cryo-EM) prevent co-IP of the respective proteins. The mutations in SLF2 also hinder plasmid DNA silencing when expressed in SLF2-/- cell lines, suggesting that this interface is needed for silencing. SIMC1 is dispensable for recruitment of SMC5/6 to sites of DNA damage, while SLF1 is required, thus separating the functions of the two loader complexes. Preventing SUMOylation (with a chemical inhibitor) increases transcription from plasmids but does not in SLF2-deleted cell lines, indicating that SMC5/6 silences plasmids in a SUMOylation-dependent manner. Expression of LT is sufficient for increased expression, and again, not additive or synergistic with SIMC1 or SLF2 deletion, indicating that LT prevents silencing by directly inhibiting SMC5/6. In contrast, PML bodies appear dispensable for plasmid silencing.

      Strengths:

      The manuscript defines the requirements for plasmid silencing by SMC5/6 (an interaction of Smc6 with the loader complex SLF2/SIMC1, SUMOylation activity) and shows that SLF1 and PML bodies are dispensable for silencing. Furthermore, the authors show that LT can overcome silencing, likely by directly binding to (but not degrading) SMC5/6.

      Weaknesses:

      (1) Many of the findings were expected based on recent publications.

      There have been no manuscripts describing the role of SIMC1-SLF2 in ecDNA silencing. There have been studies describing SLF2’s roles in ecDNA silencing, but these suggested SLF2 had an SLF1 independent role, with no mention of an alternate Nse5-like cofactor. Our earlier study in eLife (1) described the identification of SIMC1 as an Nse5-like cofactor for SLF2 but did not test potential roles of the complex in ecDNA silencing. Also, the apparent dispensability of PML NBs in plasmid silencing (in U2OS cells) was unexpected based on recent publications. Finally, SV40 LT has not previously been implicated in SMC5/6 inhibition, which may occur through novel mechanisms.

      (2) While the data are consistent with SIMC1 playing the main function in plasmid silencing, it is possible that SLF1 contributes to silencing, especially in the absence of SIMC1. This would potentially explain the discrepancy with the data reported in ref. 50. SLF2 deletion has a stronger effect on expression than SIMC1 deletion in many but not all experiments reported in this manuscript. A double mutant/deletion experiment would be useful to explore this possibility.

      It is interesting to note that the data in ref. 50 (11) is also at odds with that in ref. 45 (8) in terms of defining a role for SLF1 in the silencing of unintegrated HIV-1 DNA. The Irwan study showed that SLF1 deficient cells exhibit increased expression of a reporter gene from unintegrated HIV-1, whereas the Dupont study found that SLF1 deletion, unlike SLF2 deletion, has no effect. It is unclear what the basis of this discrepancy is. In line with the Dupont study, we found no effect of SLF1 deletion on plasmid expression (Figure 4B), whereas SLF2 deletion increased reporter expression (Figure 3A/B). It is possible that SLF1 could support some plasmid silencing in the absence of SIMC1, especially considering the gross structural similarity in their C-terminal Nse5-like domains. However, we have been unable to generate double-knockout SIMC1 and SLF1 cells to test such a possibility, and shSLF1 has been ineffective. 

      (3) SLF2 is part of both types of loaders, while SLF1 and SIMC1 are specific to their respective loaders. Did the authors observe differences in phenotypes (growth, sensitivities to DNA damage) when comparing the mutant cell lines or their construction? This should be stated in the manuscript.

      We have not observed significant differences in the growth rates of each cell line, and DNA damage sensitivities are as yet untested.   

      (4) It would be desirable to have control reporter constructs located on the chromosome for several experiments, including the SUMOylation inhibition (Figures 5A and 5-S2) and LT expression (Figure 5D) to exclude more general effects on gene expression.

      We have repeated all GFP reporter assays using integrated versus episomal plasmid DNA. A seminal study by Decorsière et al. (6) showed that SMC5/6 degradation by HBx of HBV increased transcription of episomal but not chromosomally integrated reporters. In line with this data, the deletion of SLF2 does not notably impact the expression of our GFP reporter construct when it is genomically integrated (Figure 3—figure supplement 1C).  

      Somewhat surprisingly, given the generally transcriptionally repressive roles of SUMO, inhibition of the SUMO pathway with SUMOi did not significantly impact the expression of our genomically integrated GFP reporter, versus the episomal plasmid (Figure 5—figure supplement 1C). Finally, the expression of SV40 LT, which enhances plasmid reporter expression (Figure 5D), also did not notably affect expression of the same reporter when located in the genome (Figure 5—figure supplement 3B). This is an interesting result, which is in line with an early study showing that HBx of HBV induces transcription from episomal, but not chromosomally integrated reporters (12). This further suggests that SV40 LT acts similarly to other early viral proteins like HBx and Vpr to counteract or bypass SMC5/6 restriction, amongst their multifaceted functions. Clearly, further analyses are needed to define mechanisms of LT in counteracting SMC5/6, but they do not appear to include complex degradation as seen with HBx and Vpr.  

      (5) Figure 5A: There appears to be an increase in GFP in the SLF2-/- cells with SUMOi? Is this a significant increase?

      No significant difference was found between WT, SIMC1-/- or SLF2-/- when treated with SUMOi (p>0.05). The p-value is 0.0857 when comparing SLF2-/- to WT in the SUMOi condition. This is described in the figure legend to Figure 5.

      (6) The expression level of SLF2 mut1 should be tested (Figure 3B).

      Full length SLF2 (WT or mutants) has been undetectable by western analyses. However, truncated SLF2 mut1 expresses well and binds SIMC1 but not SMC6 (Figure 1C). Moreover, full length SLF2 mut1 expression was confirmed by qPCR – showing a somewhat higher expression level than SLF2 WT (Figure 3—figure supplement 1B).  

      Reviewer #1 (Recommendations for the authors):

      There are some points about the presented data that need to be clarified.

      (1) Figures 3, 4B, and 5. The authors should rule out the possibility that the reported effects on transcription were due to alterations in plasmid number. This is particularly important, taking into account the importance of SMC5/6 in DNA replication.

      We used qPCR to assess plasmid copy number versus genomic DNA in our cell lines, testing at 72 hours post transfection to avoid any impact of cytosolic DNA (13). Our qPCR data show that there is no significant impact on plasmid copy number across our cell lines i.e. WT and SLF2 null.  SMC5/6 has a positive role in DNA replication progression on the genome (e.g. (14)), so loss of SMC5/6 “targeting” in SIMC1 and SLF2 null cells would be unlikely to promote replication fork progression per se. 
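
      For readers unfamiliar with this kind of normalization, the following is a minimal sketch of one common way to express plasmid signal relative to a genomic reference amplicon from qPCR Ct values. The response does not state which quantification method was used, so the delta-Ct formula, the assumed amplification efficiency of 2, the function names, and the Ct values below are all illustrative assumptions rather than the authors' procedure or data.

```python
from typing import Tuple


def relative_copy_number(ct_plasmid: float, ct_genomic: float,
                         efficiency: float = 2.0) -> float:
    """Plasmid signal relative to a genomic reference amplicon.

    Assumes both amplicons amplify with the same per-cycle efficiency
    (2.0 means perfect doubling); this is an assumption, not a measured value.
    """
    delta_ct = ct_plasmid - ct_genomic
    return efficiency ** (-delta_ct)


def fold_change(sample: Tuple[float, float],
                control: Tuple[float, float]) -> float:
    """Ratio of relative plasmid copy number in a sample versus a control.

    Each argument is a (ct_plasmid, ct_genomic) pair.
    """
    return relative_copy_number(*sample) / relative_copy_number(*control)


# Placeholder Ct values chosen only to exercise the functions;
# they are NOT measurements from the study.
wt = (20.0, 24.0)
knockout = (20.0, 24.0)
print(fold_change(knockout, wt))
```

      A fold change close to 1 would correspond to the "no significant impact on plasmid copy number" outcome described above; in practice the comparison would be made across biological replicates with an appropriate statistical test.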

      (2) Figure S1A. In contrast to the statement in the text, the SIMC1-combo control is affected in its binding to SLF2; however, it is not affected in its binding to SMC6. This is somehow unexpected because it suggests that the solenoid-like structure is not required for SMC6 binding, just specific patches at either SIMC or SLF2. This should be commented on.

      We appreciate the reviewer’s observation regarding the discrepancy between Figure S1A and the text. This was our oversight. The data show that SLF2 recovery was reduced in the pull-down with the SIMC1 combo control mutant, while SLF2 expression was unchanged. Because SLF2 or SIMC1 variants that fail to associate typically show poor expression (1), these findings suggest that the SIMC1 combo control mutant associates with SLF2, albeit more weakly. Since the mutations were introduced into surface residues of SIMC1, it is not immediately clear how they would weaken the interaction or destabilize the complex. In contrast, SMC6 was fully recovered with the SIMC1 combo control mutant, indicating that the SIMC1–SMC6 interaction remains stable without stoichiometric SLF2. This may reflect direct recognition of a SIMC1 binding epitope or stabilization of its solenoid structure by SMC6, although this interpretation remains uncertain given the unstable nature of free SIMC1 and SLF2. Alternatively, SMC6 may have co-sedimented with the SIMC1 combo control mutant together with SLF2, which was initially retained but subsequently lost during washing, whereas SMC6 remained due to its limited solubility in the absence of other SMC5/6 subunits. While further mechanistic analysis will require purified SMC5/6 components, our data support the AlphaFold-based model by demonstrating that SIMC1 mutations on the non–SMC6-contacting surface retain association with SMC6. The text has been revised accordingly.

      (3) The SLF2-only mutant has alterations that affect interactions with both SLF2 and SIMC1. Is it not another Mixed mutant?

      We appreciate the reviewer’s observation regarding the discrepancy between the mutant name (“SLF2-only”) and its description (“while N947 forms salt bridges with SIMC1”). The previous statement was inaccurate due to a misinterpretation of several AlphaFold models. Across these models, the SIMC1–SLF2 interface residues remain largely consistent, but the SIMC1 residue R470 exhibits positional variability, contacting N947 in some models but not in others. Given this variability and the absence of an experimental structure, we have revised the text to avoid overinterpretation. Because the N947 side chain is oriented toward SLF2 and consistently forms polar contacts with the H1148 side chain and G1149 backbone, we have renamed this mutant “SLF2-facing,” which more accurately describes its modeled environment. The other mutants are likewise renamed “SIMC1-facing” and “SIMC1–SLF2-groove-facing,” providing a clearer and more consistent description of the interface.

      (4) The SLF2-only mutant still displays clear interactions with SMC6. Can this be explained with the AlphaFold model?

      SIMC1 may contribute more substantially to SMC6 binding than SLF2, consistent with our mutagenesis results. However, the energetic contributions of individual residues or proteins cannot be quantitatively inferred from structural models alone. Comprehensive experimental and computational analyses would be required to address this point.

      (5) The conclusions about the role of SUMOylation are vague; its general effect on transcription repression is already known, and the authors already demonstrated that SIMC interacts with SUMO pathway factors. Concerning the epistatic effect, the experiment should be done at a lower inhibitor concentration; at 100 nM there is not much margin to augment according to the kinetics analysis in Figure S5.

      The SUMO pathway is indeed thought to be generally repressive for transcription. Notably, in response to a suggestion from Reviewer 3 (public review point 4), we have repeated several of our GFP expression assays using cells with the GFP reporter plasmid integrated into the genome (please see Figure 3—figure supplement 1C; Figure 5—figure supplement 1C; Figure 5—figure supplement 3B). This type of integrated reporter does not show elevated expression following inhibition of the SMC5/6 complex, unlike ecDNAs (6,10). Interestingly, SUMOi, LT expression, and SLF2 knockout also did not notably impact the expression of our integrated GFP reporter (Figure 3—figure supplement 1C; Figure 5—figure supplement 1C; Figure 5—figure supplement 3B), unlike that of the plasmid (ecDNA) reporter. Given the “general” inhibitory effect of SUMO on transcription, the SUMOi result was not expected, and it opens further interesting avenues for study.

      In Figure 5—figure supplement 1A, 100 nM SUMOi increases reporter expression well below the highest SUMOi dose. We believe that the ~3-4 fold induction of GFP expression in SLF2 null cells, if independent of SUMOylation, should further increase GFP expression. The impact of SUMOylation on GFP reporter expression remains “vague”, but our data indicate that SMC5/6 operates within SUMO’s “umbrella” function and provides a starting point for more mechanistic dissection. 

      (6) Figure 5C. Why is the size different between Input versus GFP-PD?

      Please see our response to this question above: reviewer 2, point (5)

      Reviewer #2 (Recommendations for the authors):

      If further data could be provided to extend on that which is presented, then publication as a 'standalone research article' may be appropriate, but not in its present form.

      We submitted this manuscript as a “Research Advance” not as a standalone research article, given that it was an extension of our previous research article (1).

      Reviewer #3 (Recommendations for the authors):

      (1) The term 'LT' should be defined in the title

      We have updated the title accordingly.  

      (2) This reviewer found the nomenclature of the SMC6 mutants confusing (SIMC1-only...). Either rephrase or define more clearly in the text and the figures.

      We agree with the reviewer and have renamed the mutants as “SIMC1-facing”, “SLF2-facing”, and “SIMC1–SLF2-groove-facing”.

      (3) The authors could better emphasize that LT blocks silencing in trans (not only on its cognate target sequence in cis). This is consistent with the observed direct binding to SMC5/6.

      We appreciate the suggestion to further emphasize the impact of LT on plasmid silencing. We did not want to overstate its impact at this time because we do not know if it directly binds SMC5/6 or indeed affects SMC5/6 function more broadly. LT expression, like HBx, does cause induction of a DNA damage response, but we cannot at this point tie that response to SMC5/6 inhibition alone.

      (4) Figure 5 S1: the merge looks drastically different. Is DAPI omitted in the wt merge image?

      Thank you for noting this issue. We have corrected the image, which was impacted by the use of an underexposed DAPI image.  

      (5) Figure 1: how is the structure in B oriented relative to A? A visual guide would be helpful.

      We have added arrows to indicate the view orientation and rotational direction to turn A to B.

      (6) Line 126, unclear what "specificity" here means.

      We have revised the sentence without this word, which now starts with “To confirm the SIMC1-SMC6 interface, we introduced….”

      (7) Line 152, The statement implies that the conserved residues are needed for loader subunits interactions ('mediating the SIMC1-SLF2 interaction"). Does Figure 1C not show that the residues are not important? Please clarify.

      Thank you for noting this writing error. We have corrected the sentence to provide the intended meaning. It now reads “Collectively, these results confirm that the conserved surface patch of SIMC1-SLF2 is essential for SMC6 binding.”

      References

      (1) Oravcova M, Nie M, Zilio N, Maeda S, Jami-Alahmadi Y, Lazzerini-Denchi E, Wohlschlegel JA, Ulrich HD, Otomo T, Boddy MN. The Nse5/6-like SIMC1-SLF2 complex localizes SMC5/6 to viral replication centers. Elife. 2022;11. PMCID: PMC9708086

      (2) Sullivan CS, Pipas JM. T antigens of simian virus 40: molecular chaperones for viral replication and tumorigenesis. Microbiol Mol Biol Rev. 2002;66(2):179-202. PMCID: PMC120785

      (3) Gilinger G, Alwine JC. Transcriptional activation by simian virus 40 large T antigen: requirements for simple promoter structures containing either TATA or initiator elements with variable upstream factor binding sites. J Virol. 1993;67(11):6682-8. PMCID: PMC238107

      (4) Qadri I, Conaway JW, Conaway RC, Schaack J, Siddiqui A. Hepatitis B virus transactivator protein, HBx, associates with the components of TFIIH and stimulates the DNA helicase activity of TFIIH. Proc Natl Acad Sci U S A. 1996;93(20):10578-83. PMCID: PMC38195

      (5) Aufiero B, Schneider RJ. The hepatitis B virus X-gene product trans-activates both RNA polymerase II and III promoters. EMBO J. 1990;9(2):497-504. PMCID: PMC551692

      (6) Decorsiere A, Mueller H, van Breugel PC, Abdul F, Gerossier L, Beran RK, Livingston CM, Niu C, Fletcher SP, Hantz O, Strubin M. Hepatitis B virus X protein identifies the Smc5/6 complex as a host restriction factor. Nature. 2016;531(7594):386-9. 

      (7) Murphy CM, Xu Y, Li F, Nio K, Reszka-Blanco N, Li X, Wu Y, Yu Y, Xiong Y, Su L. Hepatitis B Virus X Protein Promotes Degradation of SMC5/6 to Enhance HBV Replication. Cell Rep. 2016;16(11):2846-54. PMCID: PMC5078993

      (8) Dupont L, Bloor S, Williamson JC, Cuesta SM, Shah R, Teixeira-Silva A, Naamati A, Greenwood EJD, Sarafianos SG, Matheson NJ, Lehner PJ. The SMC5/6 complex compacts and silences unintegrated HIV-1 DNA and is antagonized by Vpr. Cell Host Microbe. 2021;29(5):792-805 e6. PMCID: PMC8118623

      (9) Felzien LK, Woffendin C, Hottiger MO, Subbramanian RA, Cohen EA, Nabel GJ. HIV transcriptional activation by the accessory protein, VPR, is mediated by the p300 co-activator. Proc Natl Acad Sci U S A. 1998;95(9):5281-6. PMCID: PMC20252

      (10) Diman A, Panis G, Castrogiovanni C, Prados J, Baechler B, Strubin M. Human Smc5/6 recognises transcription-generated positive DNA supercoils. Nat Commun. 2024;15(1):7805. PMCID: PMC11379904

      (11) Irwan ID, Bogerd HP, Cullen BR. Epigenetic silencing by the SMC5/6 complex mediates HIV-1 latency. Nat Microbiol. 2022;7(12):2101-13. PMCID: PMC9712108

      (12) van Breugel PC, Robert EI, Mueller H, Decorsiere A, Zoulim F, Hantz O, Strubin M. Hepatitis B virus X protein stimulates gene expression selectively from extrachromosomal DNA templates. Hepatology. 2012;56(6):2116-24. 

      (13) Lechardeur D, Sohn KJ, Haardt M, Joshi PB, Monck M, Graham RW, Beatty B, Squire J, O'Brodovich H, Lukacs GL. Metabolic instability of plasmid DNA in the cytosol: a potential barrier to gene transfer. Gene Ther. 1999;6(4):482-97. 

      (14) Gallego-Paez LM, Tanaka H, Bando M, Takahashi M, Nozaki N, Nakato R, Shirahige K, Hirota T. Smc5/6-mediated regulation of replication progression contributes to chromosome assembly during mitosis in human cells. Mol Biol Cell. 2014;25(2):302-17. PMCID: PMC3890350

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review): 

      Summary: 

      This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, PHG, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though still in this revised paper I have substantive concerns about how the analyses were performed. While scene-specific reinstatement decreased for remote memories in both children and adults, claims about its presence cannot be made given the analyses. Gist-level reinstatement was observed in children but not adults, but I also have concerns about this analysis. Broadly, the behavioral and univariate findings are consistent with the idea memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.

      Strengths: 

      The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.  

      Weaknesses: 

      As noted above and in my review of the original submission, the pattern similarity analysis for both item and category-level reinstatement were performed in a way that is not interpretable given concerns about temporal autocorrelation within scanning run. Unfortunately these issues remain of concern in this revision because they were not rectified. Most of my review focuses on this analytic issue, though I also outline additional concerns.

      (1) The pattern similarity analyses are largely uninterpretable due to how they were performed. 

      (a) First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, and which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, which is not possible given the design. 
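
      The autocorrelation concern raised here can be illustrated with a small simulation. The sketch below is not based on the study's data or on a realistic BOLD noise model: it assumes independent first-order autoregressive (AR(1)) noise in every voxel, with an arbitrary coefficient of 0.8 and no task signal, and all names and parameters are hypothetical. Under those assumptions, spatial patterns taken from adjacent time points correlate strongly while patterns far apart in time do not, which is the kind of lag confound the reviewer describes.

```python
import numpy as np


def simulate_ar1(n_timepoints, n_voxels, phi, rng):
    """Independent AR(1) noise per voxel; contains no task signal at all."""
    data = np.zeros((n_timepoints, n_voxels))
    data[0] = rng.standard_normal(n_voxels)
    innovation_sd = np.sqrt(1.0 - phi ** 2)  # keeps each voxel's variance at 1
    for t in range(1, n_timepoints):
        data[t] = phi * data[t - 1] + innovation_sd * rng.standard_normal(n_voxels)
    return data


def mean_pattern_correlation(data, lag):
    """Average Pearson correlation between spatial patterns `lag` samples apart."""
    corrs = [np.corrcoef(data[t], data[t + lag])[0, 1]
             for t in range(len(data) - lag)]
    return float(np.mean(corrs))


rng = np.random.default_rng(0)
# phi = 0.8 is an arbitrary illustrative value, not an estimate from fMRI data.
noise = simulate_ar1(n_timepoints=200, n_voxels=500, phi=0.8, rng=rng)
adjacent = mean_pattern_correlation(noise, lag=1)
distant = mean_pattern_correlation(noise, lag=20)
print(f"lag 1: {adjacent:.2f}, lag 20: {distant:.2f}")
```

      Because the simulated data contain only noise, any difference between the two values reflects temporal lag alone, which is why comparisons with systematically different lags are hard to interpret as reinstatement.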

      To remedy this, in the revision the authors have said they will refrain from making conclusions about the presence of scene-specific reinstatement (i.e., reinstatement above baseline). While this itself is an improvement from the original manuscript, I still have several concerns. First, this was not done thoroughly and at times conclusions/interpretations still seem to imply or assume the presence of scene reinstatement (e.g., line 979-985, "our research supports the presence of scene-specific reinstatement in 5-to-7-year-old children"; line 1138). 

      We thank the reviewers for pointing out that there are inconsistencies in our writing. We agree that we cannot make any claims about the baseline level of scene-specific reinstatement. To reiterate, our focus is on the changes in reinstatement over time (30 minutes, 24 hours, and two weeks after learning), which showed a robust decrease. Importantly, scene-specific reinstatement indices for recent items — tested on different days — did not significantly differ, as indicated by non-significant main effects of Session (all p > .323) and Session x ROI interactions (all p > .817) in either age group. This supports our claim that temporal autocorrelation is stable and consistent across conditions and that the observed decline in scene-specific reinstatement reflects a time-dependent change in remote retrieval. We have revised the highlighted passages, accordingly, emphasizing the delay-related decrease in scene-specific reinstatement rather than its absolute magnitude.

      Second, the authors' logic for the neural-behavioural correlations in the PLSC analysis involved restricting to regions that showed significant reinstatement for the gist analysis, which cannot be done for the analogous scene-specific reinstatement analysis. This makes it challenging to directly compare these two analyses since one was restricted to a small subset of regions and only children (gist), while scene reinstatement included both groups and all ROIs. 

      We thank the reviewer for pointing this out and want to clarify that it was not our intention to directly compare these analyses. For the neural-behavioral correlations, we included only those regions identified based on gist-like representations baseline, whereas for scene-specific reinstatement, we included all regions due to the absence of such a baseline. The primary aim of the PLSC analysis was to identify a set of regions that, after a stringent permutation and bootstrapping procedure, form a latent variable that explains a significant proportion of variance in behavioral performance across all participants. 

      Third, it is also unclear whether children and adults' values should be directly comparable given pattern similarity can be influenced by many factors like motion, among other things. 

      We thank the reviewer for raising this important point. In our multivariate analysis, we included confounding regressors specifically addressing motion-related artefacts. Following recent best practices for mitigating motion-related confounding factors in both adult and pediatric fMRI data (Ciric et al., 2017; Esteban et al., 2020; Jones et al., 2021; Satterthwaite et al., 2013), we implemented the most effective motion correction strategies. 

      Importantly, our group × session interaction analysis focuses on relative changes in reinstatement over time rather than comparing absolute levels of pattern similarity between children and adults. This approach controls for potential baseline differences and instead examines whether the magnitude of delay-related changes differs across groups. We believe this warrants the comparison and ensures that our conclusions are not driven by group-level differences in baseline similarity or motion artifacts.

      My fourth concern with this analysis relates to the lack of regional specificity of the effects. All ROIs tested showed a virtually identical pattern: "Scene-specific reinstatement" decreased across delays, and was greater in children than adults. I believe control analyses are needed to ensure artifacts are not driving these effects. This would greatly strengthen the authors' ability to draw conclusions from the "clean" comparison of day 1 vs. day 14. (A) The authors should present results from a control ROI that should absolutely not show memory reinstatement effects (e.g., white matter?). Results from the control ROI should look very different - should not differ between children and adults, and should not show decreases over time. 

      (C) If the same analysis was performed comparing the object cue and immediately following fixation (rather than the fixation and the immediately following scene), the results should look very different. I would argue that this should not be an index of reinstatement at all since it involves something presented visually rather than something reinstated (i.e., the scene picture is not included in this comparison). If this control analysis were to show the same effects as the primary analysis, this would be further evidence that this analysis is uninterpretable and hopelessly confounded. 

      We appreciate the reviewer’s suggestion to strengthen the interpretation of our findings by including appropriate control analyses to rule out non-memory-related artifacts. In response, we conducted several control analyses, detailed below, which collectively support the specificity of the observed reinstatement effects. The report of the results is included in the manuscript (line 593-619).

      We checked that item reinstatement for incorrectly remembered trials did not show any session-related decline for any ROI. This indicates that the reinstatement for correctly remembered items is memory-related (see Fig. S5 for details).

      We conducted additional analyses on three subregions of the corpus callosum (the body, genu, and splenium). The results of the linear mixed-effects models revealed no significant group effect (all p > .426), indicating no differences between children and adults. In contrast, all three ROIs showed a significant main effect of Session (all p < .001). However, post hoc analyses indicated that this effect was driven by differences between the recent and the Day 14 remote condition. The main contrasts of interest – recent vs. Day 1 remote and Day 1 remote vs. Day 14 remote – were not significant (all p > .080; see Table S10.4), suggesting that, unlike in other ROIs, there was no delay-related decrease in scene-specific reinstatement in these white matter regions.

      Then we repeated our analysis using the same procedure but replaced the “scene” time window with the “object” time window. The rationale for this control is that comparing the object cue to the immediately following fixation period should not reflect scene reinstatement, as the object and the reinstated scene rely on distinct neural representations. Accordingly, we did not expect a delay-related decrease in the reinstatement index. Consistent with this expectation, the analysis using the object – fixation similarity index – though also influenced by temporal autocorrelation – did not reveal any significant effect of session or delay in any ROI (all p > .059; see Table S9, S9.1).

      Together, these control analyses provide converging evidence that our findings are not driven by global or non-specific signal changes. We believe that these control analyses strengthen our interpretation about delay-related decrease in scene-specific reinstatement index. 

      (B) Do the recent items from day 1 vs. day 14 differ? If so, this could suggest something is different about the later scans (and if not, it would be reassuring). 

      The recent items tested on day 1 and day 14 do not differ (all p > .323). This null result is consistent across all ROIs.

      (b) For the category-based neural reinstatement: (1) This suffers from the same issue of correlations being performed within run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). The authors in their response letter have indicated that because the patterns being correlated are not derived from events in close temporal proximity, they should not suffer from the issue of temporal autocorrelation. This is simply not true. For example, see the paper by Prince et al. (eLife 2022; on GLMsingle). This is not the main point of Prince et al.'s paper, but it includes a nice figure that shows that, using standard modelling approaches, the correlation between (same-run) patterns can be artificially elevated for lags as long as ~120 seconds (and can even be artificially reduced after that; Figure 5 from that paper) between events. This would affect many of the comparisons in the present paper. The cleanest way to proceed is to simply drop the within-run comparisons, which I believe the authors can do and yet they have not. Relatedly, in the response letter the authors say they are focusing mainly on the change over time for reinstatement at both levels including the gist-type reinstatement; however, this is not how it is discussed in the paper. They in fact are mainly relying on differences from zero, as children show some "above baseline" reinstatement while adults do not, but I believe there were no significant differences over time (i.e., the findings the authors said they would lean on primarily, as they are arguably the most comparable).  

      We thank the reviewer for this important comment regarding the potential inflation of similarity values due to within-run comparisons.

      To address the reviewer’s concern, we conducted an additional cross-run analysis for all correctly retrieved trials. The approach restricted comparisons to non-overlapping runs (run1-run2, run2-run3, run1-run3). This analysis revealed robust gist-like reinstatement in children for remote Day 14 memories in the mPFC (p = .035) and vlPFC (p = .0007), in adults’ vlPFC for remote Day 1 memories (p = .029), as well as in LOC for remote Day 1 memories in both children and adults (p < .02). A significant Session effect in both regions (mPFC: p = .026; vlPFC: p = .002) indicated increased reinstatement for the long delay (Day 14) compared to the short-delay and recent sessions (all p < .05). Given that the cross-run results largely replicate and reinforce the effects found previously with the within-run analysis, we believe that combining both sources of information is methodologically justified and statistically beneficial. Specifically, both approaches independently identified significant gist-like reinstatement in children’s mPFC and vlPFC (although the within-run vlPFC effect (short delay: p = .038; long delay: p = .047) did not survive correction for multiple comparisons), particularly for remote memories. Including both within-run and between-run comparisons increases the number of unique, non-repeated trial pairs, improving statistical power without introducing redundancy. While we acknowledge that same-run comparisons may be influenced by residual autocorrelation (as shown by Prince et al., 2022, eLife), we believe that our design mitigates this risk through consistency between within-run and cross-run results, long inter-trial intervals, and trial-wise estimation of activation. We have adjusted the manuscript accordingly, reporting the combined analysis. We also report the cross-run and within-run analyses separately in the supplementary materials (Tables S12.1, S12.2), showing that they converge with the cross-run results and thus strengthen rather than dilute the findings.
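
      The restriction to cross-run pairs described above can be illustrated with a minimal sketch. It is not the analysis code used for the manuscript: the trial structure, the function names (pearson, pair_similarities), the toy patterns, and the Fisher z-transformation of the correlations are all assumptions made for illustration.

```python
from itertools import combinations
from math import atanh, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length activation patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def pair_similarities(trials, cross_run_only=True):
    """Return Fisher z-transformed similarities for trial pairs.

    trials: list of dicts with hypothetical keys 'run', 'category', 'pattern'.
    If cross_run_only is True, pairs from the same run are skipped.
    """
    results = []
    for a, b in combinations(trials, 2):
        if cross_run_only and a["run"] == b["run"]:
            continue  # drop within-run pairs (possible temporal autocorrelation)
        r = pearson(a["pattern"], b["pattern"])
        r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
        results.append({
            "runs": (a["run"], b["run"]),
            "same_category": a["category"] == b["category"],
            "z": atanh(r),
        })
    return results


# Toy data: three runs, two trials per run (values are invented).
trials = [
    {"run": 1, "category": "forest", "pattern": [1.0, 2.0, 3.0, 4.0]},
    {"run": 1, "category": "farm", "pattern": [4.0, 1.0, 3.0, 2.0]},
    {"run": 2, "category": "forest", "pattern": [1.5, 2.5, 2.0, 4.5]},
    {"run": 2, "category": "farm", "pattern": [3.5, 1.0, 3.5, 1.5]},
    {"run": 3, "category": "forest", "pattern": [0.5, 2.0, 3.5, 3.0]},
    {"run": 3, "category": "farm", "pattern": [4.5, 0.5, 2.0, 2.5]},
]
cross_run = pair_similarities(trials, cross_run_only=True)
all_pairs = pair_similarities(trials, cross_run_only=False)
```

      With three runs of two trials each, 12 of the 15 possible trial pairs span different runs; the three same-run pairs are the ones that are dropped.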

      As suggested, we now explicitly highlight the change over time as the central finding. We observe a clear increase in gist-like reinstatement from recent to remote memories in children, particularly in mPFC and vlPFC. These effects, based on combined within- and cross-run comparisons, are now clearly stated in the main results and interpreted in the discussion accordingly.

      (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. In their response letter and the revised paper, the authors do provide a bit of reasoning as to why this is the most sensible. However, it is still not clear to me whether this is really "reinstatement" which (in my mind) entails the re-evoking of a neural pattern initially engaged during perception. Rather, could this be a shared neural state that is category specific? 

      We thank the reviewer for raising this important conceptual point about whether our findings reflect reinstatement in the classical sense — namely, the reactivation of perceptual neural patterns — or a shared, category-specific state.

      While traditional definitions of reinstatement emphasize item-specific reactivation (e.g., Ritchey et al., 2013; Xiao et al., 2017), it is increasingly recognized that memory retrieval can also involve the reactivation of abstracted, generalized, or gist-like representations, especially as memories consolidate. Our analysis follows this view and aims to capture how memory representations evolve over time, particularly in development.

      Several studies support this broader notion of gist-like reinstatement. For instance, Chen et al. (2017) showed that while event-specific patterns were reinstated across the default mode network and medial temporal lobe, inter-subject recall similarity exceeded encoding-retrieval similarity, suggesting transformation and abstraction beyond perceptual reinstatement. Zhuang et al. (2021) further showed that loss of neural distinctiveness in the MTL over time predicted false memories, linking neural similarity to representational instability. This aligns with our finding that greater gist-like reinstatement is associated with lower memory accuracy.

      Ye et al. (2020) discuss how memory representations are reshaped post-encoding — becoming more differentiated, integrated, or weakened depending on task goals and neural resources. While their work focuses on adults, our previous findings (Schommartz et al., 2023) suggest that children’s neural systems (the same sample) are structurally immature, making them more likely to rely on gist-based consolidation (see Fandakova et al., 2019). Adults, by contrast, may retain more item-specific traces.

      Relatedly, St-Laurent & Buchsbaum (2019) show that with repeated encoding, neural memory representations become increasingly distinct from perception, suggesting that reinstatement need not mimic perception. We agree that reinstatement does not always reflect reactivation of low-level sensory patterns, particularly over long delays or in developing brains.

      Finally, while we did not correlate retrieval patterns directly with perceptual encoding patterns, we assessed neural similarity among retrieved items within vs. between categories, based on non-repeated, independently sampled trials. This approach is intended to capture the structure and delay-related transformation of mnemonic representations, especially in terms of how they become more schematic or gist-like over time. Our findings align conceptually with the results of Kuhl et al. (2012), who used MVPA to show that older and newer visual memories can be simultaneously reactivated during retrieval, with greater reactivation of older memories interfering with retrieval accuracy for newer memories. Their work highlights how overlapping category-level representations in ventral temporal cortex can reflect competition among similar memories, even in the absence of item-specific cues. In our developmental context, we interpret the increased neural similarity among category members in children as possibly reflecting such representational overlap or competition, where generalized traces dominate over item-specific ones. This pattern may reflect a shift toward efficient but less precise retrieval, consistent with developmental constraints on memory specificity and consolidation.
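
      The within- versus between-category contrast described above can be written as a simple index. The sketch below is hypothetical: the function name category_similarity_index and the input format (a same-category flag plus a Fisher z-transformed similarity per trial pair) are assumptions, and the values are invented for illustration.

```python
def category_similarity_index(pairs):
    """Mean within-category similarity minus mean between-category similarity.

    pairs: list of (same_category, z) tuples, where z is a Fisher
    z-transformed pattern correlation for one pair of retrieval trials.
    A positive index means trials of the same scene category are more
    similar to each other than trials of different categories.
    """
    within = [z for same, z in pairs if same]
    between = [z for same, z in pairs if not same]
    if not within or not between:
        raise ValueError("need at least one within- and one between-category pair")
    return sum(within) / len(within) - sum(between) / len(between)


# Invented similarity values for four trial pairs.
example_pairs = [(True, 0.4), (True, 0.2), (False, 0.1), (False, -0.1)]
index = category_similarity_index(example_pairs)
```

      An index near zero indicates no category structure in the retrieval patterns, whereas a positive index indicates that category members share a common pattern.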

      In this context, we view our findings as evidence of memory trace reorganization — from differentiated, item-level representations toward more schematic, gist-like neural patterns (Sekeres et al., 2018), particularly in children. Our cross-run analyses further confirm that this is not an artifact of same-run correlations or low-level confounds. We have clarified this distinction and interpretation throughout the revised manuscript (see lines 144-158; 1163-1170).

      In any case, I think additional information should be added to the text to clarify that this definition differs from others in the literature. The authors might also consider using some term other than reinstatement. Again (as I noted in my prior review), the finding of no category-level reinstatement in adults is surprising and confusing given prior work and likely has to do with the operationalization of "reinstatement" here. I was not quite sure about the explanation provided in the response letter, as category-level reinstatement is quite widespread in the brain for adults and is robust to differences in analytic procedures etc. 

      We agree that our operationalization of "reinstatement" differs from more conventional uses of the term, which typically involve direct comparisons between encoding and retrieval phases, often with item-level specificity. As our analysis is based on similarity among retrieval-phase trials (fixation-based activation patterns) and focuses on within- versus between-category neural similarity, we agree that the term reinstatement may suggest a stronger encoding–retrieval mapping than we are claiming.

      To avoid confusion and overstatement, we have revised the terminology throughout the manuscript: we now refer to our measure as “gist-like representations” rather than “gist-like reinstatement.” This change better reflects the nature of our analysis — namely, that we are capturing shared neural patterns among category-consistent memories that may reflect reorganized or abstracted traces, especially after delay and in development.

      As the reviewer rightly points out, category-level reinstatement is well documented in adults (e.g., Kuhl & Chun, 2014; Tompary et al., 2020; Tompary & Davachi, 2017). The absence of such effects in our adult group may indeed reflect differences in study design, particularly our use of non-repeated, cross-trial comparisons based on fixation events. It may also reflect different consolidation strategies, with adults preserving more differentiated or item-specific representations, while children form more schematic or generalizable representations, a pattern consistent with our interpretation and supported by prior work (Fandakova et al., 2019; Sekeres et al., 2018).

      We have updated the relevant sections of the manuscript (Results, Discussion (particularly lines 1163-1184), and figure captions) to clarify this terminology shift and explicitly contrast our approach with more standard definitions of reinstatement. We hope this revision provides the needed conceptual clarity while preserving the integrity of our developmental findings.

      (3) Also from a theoretical standpoint-I'm still a bit confused as to why gist-based reinstatement would involve reinstatement of the scene gist, rather than the object's location (on the screen) gist. Were the locations on the screen similar across scene backgrounds from the same category? It seems like a different way to define memory retrieval here would be to compare the neural patterns when cued to retrieve the same vs. similar (at the "gist" level) vs. different locations across object-scene pairs. This is somewhat related to a point from my review of the initial version of this manuscript, about how scene reinstatement is not necessary. The authors state that participants were instructed to reinstate the scene, but that does not mean they were actually doing it. The point that what is being measured via the reinstatement analyses is actually not necessary to perform the task should be discussed in more detail in the paper. 

      We appreciate the reviewer’s thoughtful theoretical question regarding whether our measure of “gist-like representations” might reflect reinstatement of spatial (object-location) gist, rather than scene-level gist. We would like to clarify several key points about our task design and interpretation:

      (1) Object locations were deliberately varied and context dependent.

      In our stimulus set, each object was embedded in a rich scene context, and the locations were distributed across six distinct possible areas within each scene, with three possible object placements per location. These placements were manually selected to ensure realistic and context-sensitive positioning of objects within the scenes. Importantly, locations were not fixed across scenes within a given category. For example, objects placed in “forest” scenes could appear in different screen locations across different scene exemplars (e.g., one in the bottom-left side, another floating above). Therefore, the task did not introduce a consistent spatial schema across exemplars from the same scene category that could give rise to a “location gist.”

      (2) Scene categories provided consistent high-level contextual information.

      By contrast, the scene categories (e.g., farming, forest, indoor, etc.) provided semantically coherent and visually rich contextual backgrounds that participants could draw upon during retrieval. This was emphasized in the instruction phase, where participants were explicitly encouraged to recall the whole scene based on the stories they created during learning (not just the object or its position). While we acknowledge that we cannot directly verify the reinstated content, this instruction aligns with prior studies showing that scene and context reinstatement can occur even without direct task relevance (e.g., Kuhl & Chun, 2014; Ritchey et al., 2013).

      (3) Our results are unlikely to reflect location-based reinstatement.

      If participants had relied on a “location gist” strategy, we would have expected greater neural similarity across scenes with similar spatial layouts, regardless of category. However, our design avoids this confound by deliberately varying locations across exemplars within categories. Additionally, our categorical neural similarity measure contrasted within-category vs. between-category comparisons — making it sensitive to shared contextual or semantic structure, not simply shared screen positions.

      Considering this, we believe that the neural similarity observed in the mPFC and vlPFC in children at long delay reflects the emergence of scene-level, gist-like representations, rather than low-level spatial regularities. Nevertheless, we now clarify this point in the manuscript and explicitly discuss the limitation that reinstatement of scene context was encouraged but not required for successful task performance.

      Future studies could dissociate spatial and contextual components of reinstatement more directly by using controlled spatial overlap or explicit location recall conditions. However, given the current task structure, location-based generalization is unlikely to account for the category-level similarity patterns we observe.

      (2) Inspired by another reviewer's comment, it is unclear to me the extent to which age group differences can be attributed to differences in age/development versus memory strength. I liked the other reviewer's suggestions about how to identify and control for differences in memory strength, which I don't think the authors actually did in the revision. They instead showed evidence that memory strength does seem to be lower in children, which indicates this is an interpretive confound. For example, I liked the reviewer's suggestion of performing analyses on subsets of participants who were actually matched in initial learning/memory performance would have been very informative. As it is, the authors didn't really control for memory strength adequately in my opinion, and as such their conclusions about children vs. adults could have been reframed as people with weak vs. strong memories. This is obviously a big drawback given what the authors want to conclude. Relatedly, I'm not sure the DDM was incorporated as the reviewer was suggesting; at minimum I think the authors need to do more work in the paper to explain what this means and why it is relevant. (I understand putting it in the supplement rather than the main paper, but I still wanted to know more about what it added from an interpretive perspective.)

      We appreciate the reviewer’s thoughtful concerns regarding potential confounding effects of memory strength on the observed age group differences. This is indeed a critical issue when interpreting developmental findings.

      While we agree that memory strength differs between children and adults — and our own DDM-based analysis confirms this, mirroring differences observed in accuracy — we would like to emphasize that these differences are not incidental but rather reflect developmental changes in the underlying memory system. Given the known maturation of both structural and functional memory-related brain regions, particularly the hippocampus and prefrontal cortex, we believe it would be theoretically inappropriate to control for memory strength entirely, as doing so would remove variance that is central to the age-related neural effects we aim to understand.

      To address the reviewer's concern empirically, we conducted an additional control analysis in which we subsampled children to include only those who reached the learning criterion after two cycles (N = 28 out of 49 children; see Table S1.1, S1.2, Figure S1, Table S9.1), thereby selecting a high-performing subgroup. Importantly, this subsample replicated the behavioral and neural results of the full group. This further suggests that the observed age group differences are not merely driven by differences in memory strength.

      As mentioned above, the results of the DDM support our behavioral findings, showing that children have lower drift rates for evidence accumulation, consistent with weaker or less accessible memory representations. While these results are reported in the Supplementary Materials (section S2.1, Figure S2, Table S2), we agree that their interpretive relevance should be more clearly explained in the main text. We have therefore updated the Discussion section to explicitly state how the DDM results provide converging evidence for our interpretation that developmental differences in memory quality, not merely strategy or task performance, underlie the observed neural differences (see lines 904-926).

      In sum, we view memory strength not as a confound to be removed, but as a meaningful and theoretically relevant factor in understanding the emergence of gist-like representations in children. We have clarified this interpretive stance in the revised manuscript and now discuss the role of memory strength more explicitly in the Discussion.

      (3) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). Precuneus also interestingly seems to show numerically recent>remote (values mostly negative), whereas most other regions show the opposite. This difference from zero (in either direction) or lack thereof seems important to the message. In response to this comment on the original manuscript, the authors seem to have confirmed that hippocampal activity was greater during retrieval than implicit baseline. But this was not really my question - I was asking whether hippocampus is (and other ROIs in this same figure are) differently engaged for recent vs. remote memories.

      We thank the reviewer for bringing up this important point. Our previous analysis showed that both anterior and posterior regions of the hippocampus, the anterior parahippocampal gyrus, and the precuneus exhibited activation significantly different from zero in children and adults for correctly remembered items (see Fig. S2, Table S7 in Supplementary Materials). Based on your suggestion, our additional analysis showed:

      (i) The linear mixed-effects model for correctly remembered items showed no significant interaction effects (group x session x memory age (recent, remote)) for the anterior hippocampus (all p > .146; see Table S7.1).

      (ii) For the posterior hippocampus, we observed a significant main effect of group (F(1,85) = 5.62, p = .038), showing significantly lower activation in children compared to adults (b = .03, t = -2.34, p = .021). No other main or interaction effects were significant (all p > .08; see Table S7.1).

      (iii) For the anterior PHG, which also showed no significant remote > recent difference, the model confirmed that there was no difference between remote and recent items across age groups and delays (all p > .194; Table S7.1).

      Moreover, when comparing recent and remote hippocampal activation directly, there were no significant differences in either group (all FDR-adjusted p > .116; Table S7.2), supporting the conclusion that hippocampal involvement was stable across delays for successfully retrieved items. 

      In contrast, analysis of unsuccessfully remembered items showed that hippocampal activation was not significantly different from zero in either group (all FDR-adjusted p > .052; Fig. S2.1, Table S7.1), indicating that hippocampal engagement was specific to successful memory retrieval.
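
      A minimal sketch of how FDR-adjusted p-values of this kind can be obtained, assuming the Benjamini-Hochberg step-up procedure (the specific FDR method is not stated here). The function name bh_adjust and the example p-values are illustrative only and are not taken from the reported data.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented p-values for four tests.
raw = [0.01, 0.04, 0.03, 0.20]
adj = bh_adjust(raw)
```

      Each adjusted value is the raw p-value scaled by the number of tests divided by its rank, capped so that adjusted values never decrease as raw p-values increase.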

      To formally test whether hippocampal activation differs between remembered and forgotten items, we ran a linear mixed-effects model with Group, Memory Success (remembered vs. forgotten), and ROI (anterior vs. posterior hippocampus) as fixed effects. This model revealed a robust main effect of memory success (F(1,1198) = 128.27, p < .001), showing that hippocampal activity was significantly higher for remembered compared to forgotten items (b = .06, t(1207) = 11.29, p < .001; Table S7.3). 

      As the reviewer noted, precuneus activation was numerically higher for recent vs. remote items, and this was confirmed in our analysis. While both recent and remote retrieval elicited significantly above-zero activation in the precuneus (Table S7.2), activation for recent items was significantly higher than for remote items, consistent across both age groups.

      Taken together, these analyses support the conclusion that hippocampal involvement in successful retrieval is sustained across delays, while other ROIs such as the precuneus may show greater engagement for more recent memories. We have now updated the manuscript text (lines 370-390) and supplementary materials to reflect these findings more clearly, as well as to clarify the distinction between activation relative to baseline and memory-age-related modulation.

      (4) Related to point 3, the claims about hippocampus with respect to multiple trace theory feel very unsupported by the data. I believe the authors want to conclude that children's memory retrieval shows reliance on hippocampus irrespective of delay, presumably because this is a detailed memory task. However the authors have not really shown this; all they have shown is that hippocampal involvement (whatever it is) does not vary by delay. But we do not have compelling evidence that the hippocampus is involved in this task at all. That hippocampus is more active during retrieval than implicit baseline is a very low bar and does not necessarily indicate a role in memory retrieval. If the authors want to make this claim, more data are needed (e.g., showing that hippocampal activity during retrieval is higher when the upcoming memory retrieval is successful vs. unsuccessful). In the absence of this, I think all the claims about multiple trace theory supporting retrieval similarly across delays and that this is operational in children are inappropriate and should be removed. 

      We thank the reviewer for pointing this out. We agree that additional analysis of hippocampal activity during successful and unsuccessful memory retrieval is warranted. Such an analysis provides stronger support for our claim that strong, detailed memories during retrieval rely on the hippocampus in both children and adults. Our previously presented results on the remote > recent univariate signal difference in the hippocampus (p. 14-18; lines 433-376, Fig. 3A) show that this difference does not vary between children and adults, or between Day 1 and Day 14. Our further analysis showed that both anterior and posterior regions of the hippocampus exhibited activation significantly different from zero in children and adults for correctly remembered items (see Fig. S2, Table S7 in Supplementary Materials). Based on your suggestion, our recent additional analysis showed:

      (i) For forgotten items, we did not observe any activation significantly higher than zero in either the anterior or posterior hippocampus for recent and remote memory on Day 1 and Day 14 in either age group (all p > .052 FDR corrected; see Table S7.1, Fig. S2.1).

      (ii) After establishing no difference between recent and remote activation across and between sessions (Day 1, Day 14), we conducted another linear mixed-effects model with group x memory success (remembered, forgotten) x region (anterior hippocampus, posterior hippocampus), with subject as a random effect. The model showed no significant effects for the memory success x region interaction (F(1,1198) = 1.12, p = .289) and no significant group x memory success x region interaction (F(1,1198) = .017, p = .895). However, we observed a significant main effect of memory success (F(1,1198) = 128.27, p < .001), indicating significantly higher hippocampal activation for remembered compared to forgotten items (b = .06, t = 11.29, p < .001; see Table S7.3).

      (iii) Considering the comparatively low number of incorrect trials for recent items in the adult group, we reran this analysis only for remote items. Similarly, the model showed no significant effects for the memory success x region interaction (F(1,555) = .72, p = .398) and no significant group x memory success x region interaction (F(1,555) = .14, p = .705). However, we observed a significant main effect of memory success (F(1,555) = 68.03, p < .001), indicating significantly higher hippocampal activation for remote remembered compared to forgotten items (b = .07, t = 8.20, p < .001; see Table S7.3).
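
      For clarity, the model in (ii) and (iii) can be written out as follows. This is a sketch under assumptions: a random intercept per subject is assumed (the text specifies only "subject as a random effect"), the factors are assumed to be numerically coded, and all symbols are introduced here for illustration.

```latex
\begin{aligned}
y_{it} ={}& \beta_0 + \beta_G G_i + \beta_M M_{it} + \beta_R R_{it} \\
&+ \beta_{GM}\, G_i M_{it} + \beta_{GR}\, G_i R_{it} + \beta_{MR}\, M_{it} R_{it} \\
&+ \beta_{GMR}\, G_i M_{it} R_{it} + u_i + \varepsilon_{it}, \\
&u_i \sim \mathcal{N}(0, \sigma_u^2), \qquad \varepsilon_{it} \sim \mathcal{N}(0, \sigma^2)
\end{aligned}
```

      Here y_{it} is the hippocampal activation estimate for observation t of subject i, G_i codes the age group, M_{it} codes memory success (remembered vs. forgotten), R_{it} codes the region (anterior vs. posterior hippocampus), and u_i is the subject-specific intercept. The main effect of memory success reported above corresponds to the coefficient \beta_M.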

      Taken together, our results indicate that significant hippocampal activation was observed only for correctly remembered items in both children and adults, regardless of memory age and session. For forgotten items, we did not observe any significant hippocampal activation in either group or delay. Moreover, hippocampal activation was significantly higher for remembered compared to forgotten memories. This evidence supports our conclusions regarding the Multiple Trace and Trace Transformation Theories, suggesting that the hippocampus supports retrieval similarly across delays, and provides novel evidence that this process is operational in both children and adults. This also aligns with Contextual Binding Theory, as well as empirical evidence by Sekeres, Winokur, & Moscovitch (2018), among others. We have added this information to the manuscript.

      (5) There are still not enough methodological details in the main paper to make sense of the results. Some of these problems were addressed in the revision but others remain. For example, a couple of things that were unclear: that initially learned locations were split, where half were tested again at day 1 and the other half at day 14; what specific criterion was used to determine to pick the 'well-learned' associations that were used for comparisons at different delay periods (object-scene pairs that participants remembered accurately in the last repetition of learning? Or across all of learning?). 

      We thank the reviewer for pointing this out. The initially learned object-scene associations on Day 0 were split into two halves based on their categories before testing. Specifically, half of the pairs from the first set and half of the pairs from the second set of 30 object-scene associations were used to create the set of 30 remote pairs for Day 1 testing. A similar procedure was repeated for the remaining pairs to create a set of remote object-scene associations for Day 14 retrieval. We tried to distribute the categories of pairs equally between the testing sets. We added this information to the methods section of the manuscript (see p. 47, lines 1237-1243). In addition, the sets of associations for the delayed tests on Day 1 and Day 14 were not based on their learning accuracy. Of note, the analysis of variance revealed that there was no difference in learning accuracy between the two sets created for the delayed tests in either age group (children: p = .23; adults: p = .06). These results indicate that the sets comprised items learned with comparable accuracy in both age groups.
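
      The category-balanced split described above can be sketched as follows. This is an illustration under assumptions, not the procedure actually used: random assignment within each learning set and category is assumed, and the function name split_by_category, the dictionary keys, the number of categories, and the placeholder category names are all hypothetical.

```python
import random
from collections import defaultdict


def split_by_category(pairs, seed=0):
    """Split pairs into two test sets, balancing learning set and category.

    pairs: list of dicts with hypothetical keys 'learning_set' and 'category'.
    Within each (learning_set, category) stratum, half of the pairs go to
    the Day 1 set and the rest to the Day 14 set.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for pair in pairs:
        strata[(pair["learning_set"], pair["category"])].append(pair)

    day1, day14 = [], []
    for key in sorted(strata):
        items = strata[key]
        rng.shuffle(items)  # random assignment is an assumption
        half = len(items) // 2
        day1.extend(items[:half])
        day14.extend(items[half:])
    return day1, day14


# Toy example: two learning sets of 30 pairs, five placeholder categories.
categories = ["cat_a", "cat_b", "cat_c", "cat_d", "cat_e"]
pairs = [
    {"id": f"set{s}_{c}_{k}", "learning_set": s, "category": c}
    for s in (1, 2)
    for c in categories
    for k in range(6)
]
day1, day14 = split_by_category(pairs)
```

      In this toy example each test set receives 30 pairs, 15 from each learning set, with every category represented equally in both sets.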

      (6) I still find the revised Introduction a bit unclear. I appreciated the added descriptions of different theories of consolidation, though the order of presented points is still a bit hard to follow. Some of the predictions I also find a bit confusing as laid out in the introduction. (1) As noted in the paper multiple trace theory predicts that hippocampal involvement will remain high provided memories retained are sufficiently high detail. The authors however also predict that children will rely more on gist (than detailed) memories than adults, which would seem to imply (combined with the MTT idea) that they should show reduced hippocampal involvement over time (while in adults, it should remain high). However, the authors' actual prediction is that hippocampus will show stable involvement over time in both kids and adults. I'm having a hard time reconciling these points. (2) With respect to the extraction of gist in children, I was confused by the link to Fuzzy Trace Theory given the children in the present study are a bit young to be showing the kind of gist extraction shown in the Brainerd & Reyna data. Would 5-7 year olds not be more likely to show reliance on verbatim traces under that framework? Also from a phrasing perspective, I was confused about whether gist-like information was something different from just gist in this sentence: "children may be more inclined to extract gist information at the expense of detailed or gist-like information." (p. 8) - is this a typo?

      We thank the reviewer for this thoughtful observation. 

      Our hypothesis of stable hippocampal engagement over time was primarily based on Contextual Binding Theory (Yonelinas et al., 2019) and the MTT, supported by the evidence provided by Sekeres et al. (2018). These accounts posit that the hippocampus continues to support retrieval when contextual information is preserved, even for older, consolidated memories. Given that our object-location associations were repeatedly encoded and tied to specific scene contexts, we believe that retrieval success for both recent and remote memories likely involved contextual reinstatement, leading to sustained hippocampal activity. Also in accordance with the MTT and the related TTT, different memory representations may coexist, including detailed and gist-like memories. Therefore, we suggest that children may not rely on highly detailed item-specific memory, but rather on sufficiently contextualized schematic traces, which still engage the hippocampus. This distinction is now made clearer in the Introduction (see lines 223-236).

      We appreciate the reviewer’s point regarding Fuzzy Trace Theory (Brainerd & Reyna, 2002). Indeed, in classic FTT, young children are thought to rely more on verbatim traces due to immature gist extraction mechanisms (primarily from verbal material). However, we use the term “gist-like representations” to refer to schematic or category-level retrieval that emerges through structured, repeated learning (as in our task). This form of abstraction may not require full semantic gist extraction in the FTT sense but may instead reflect consolidation-driven convergence onto shared category-level representations, especially when strategic resources are limited. We now clarify this distinction and have revised the ambiguous sentence containing the typo (“at the expense of detailed or gist-like information”) to better reflect our intended meaning (see p. 8).

      (7) For the PLSC, if I understand this correctly, the profiles were defined for showing associations with behaviour across age groups. (1) As such, is it not "double dipping" to then show that there is an association between brain profile and behaviour-must this not be true by definition? If I am mistaken, it might be helpful to clarify this in the paper. (2) In addition, I believe for the univariate and scene-specific reinstatement analyses these profiles were defined across both age groups. I assume this doesn't allow for separate definition of profiles across the two group (i.e., a kind of "interaction"). If this is the case, it makes sense that there would not be big age differences... the profiles were defined for showing an association across all subjects. If the authors wanted to identify distinct profiles in children and adults they may need to run another analysis. 

      We thank the reviewer for this thoughtful comment. 

      (1) We agree that showing the correlation between the latent variable and behavior may be redundant, as the relationship is already embedded in the PLSC solution and quantified by the explained variance. Our intention was merely to visualize the strength of this relationship. In hindsight, we agree that this could be misinterpreted, and we have removed the additional correlation figure from the manuscript.

      We also see the reviewer’s point that, given the shared latent profile across groups, it is expected that the strength of the brain-behavior relationship does not differ between age groups. Instead, to investigate group differences more appropriately, we examined whether children and adults differed in their expression of the shared latent variable (i.e., brain scores). This analysis revealed that children showed significantly lower brain scores than adults both in short delay, t(83) = -4.227, p = .0001, and long delay, t(74) = -5.653, p < .001, suggesting that while the brain-behavior profile is shared, its expression varies by group. We have added this clarification to the Results section (p. 19-20) of the revised manuscript. 

      (2) Regarding the second point, we agree with the reviewer that defining the PLS profiles across both age groups inherently limits the ability to detect group-specific associations, as the resulting latent variables represent a shared pattern across the full sample. To address this, we conducted additional PLS analyses separately within each age group to examine whether distinct neural upregulation profiles (remote > recent) emerge for short and long delay conditions.

      These within-group analyses, however, were based on smaller subsamples, which reduced statistical power, especially when using bootstrapping to assess the stability of the profiles. For the short delay, although some regions reached significance, the overall latent variables did not reach conventional thresholds for stability (all p > .069), indicating that the profiles were not robust. This suggests that within-group PLS analyses may be underpowered to detect subtle effects, particularly when modelling neural upregulation (remote > recent), which may be inherently small.

      Nonetheless, when we applied PLSC in an exploratory manner separately within each group, relating recent and remote activity levels against the implicit baseline (rather than the contrast remote > recent) to memory performance, we observed significant and stable latent variables in both children and adults. This implies that such contrasts (vs. baseline) may be more sensitive and better suited to detect meaningful brain–behavior relationships within age groups. We have added this clarification to the Results sections of the manuscript to highlight the limitations of within-group contrasts for neural upregulation. 

      Author response image 1.
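
      The PLSC logic discussed in points (1) and (2) above can be illustrated with a minimal sketch. It assumes a single behavioral variable (memory performance), in which case the singular value decomposition of the brain-behavior correlation matrix reduces to normalizing one row of correlations. All names (`plsc_single_behavior`, `pooled_t`) and all numbers are hypothetical and chosen for illustration; this is not the authors' pipeline, permutation and bootstrap testing are omitted, and whether the reported t-tests used pooled variances is an assumption here.

```python
import math
from statistics import mean, stdev, variance


def zscore(values):
    """Standardize a list of numbers using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def plsc_single_behavior(brain, behavior):
    """brain: one list of ROI values per subject; behavior: one score per subject."""
    n, n_rois = len(behavior), len(brain[0])
    z_beh = zscore(behavior)
    z_cols = [zscore([subj[r] for subj in brain]) for r in range(n_rois)]
    # Brain-behavior correlation for each ROI (a single row, since there is one behavior).
    corr = [sum(b * x for b, x in zip(z_beh, col)) / (n - 1) for col in z_cols]
    # With one behavioral variable the SVD reduces to normalizing this row.
    singular_value = math.sqrt(sum(c * c for c in corr))
    saliences = [c / singular_value for c in corr]
    # Brain score: projection of each subject's z-scored ROI values onto the saliences.
    brain_scores = [sum(z_cols[r][i] * saliences[r] for r in range(n_rois)) for i in range(n)]
    return saliences, singular_value, brain_scores


def pooled_t(group_a, group_b):
    """Student's t statistic and degrees of freedom for two independent groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se, df


# Hypothetical values for illustration only; not data from the study.
brain = [[0.1, 0.5, 0.3], [0.4, 0.2, 0.6], [0.3, 0.9, 0.1],
         [0.8, 0.4, 0.7], [0.6, 0.7, 0.2], [0.9, 0.3, 0.8]]
behavior = [0.55, 0.70, 0.60, 0.95, 0.80, 0.90]
groups = ["child", "child", "child", "adult", "adult", "adult"]

saliences, singular_value, brain_scores = plsc_single_behavior(brain, behavior)
child_scores = [s for s, g in zip(brain_scores, groups) if g == "child"]
adult_scores = [s for s, g in zip(brain_scores, groups) if g == "adult"]
t_stat, dof = pooled_t(child_scores, adult_scores)
```

      In this single-behavior case, the covariance between the brain scores and the standardized behavior equals the singular value by construction, which is why a separate brain-behavior correlation adds no independent evidence, whereas a group comparison of brain scores asks a different question.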

      (3) Also, as for differences between short delay brain profile and long delay brain profile for the scene-specific reinstatement - there are 2 regions that become significant at long delay that were not significant at a short delay (PC, and CE). However, given there are ceiling effects in behaviour at the short but not long delay, it's unclear if this is a meaningful difference or just a difference in sensitivity. Is there a way to test whether the profiles are statistically different from one another?

      We thank the reviewer for this comment. To better illustrate differential profiles under high memory accuracy as well, we added the immediate condition (30-minute delay) as a third reference point, given the availability of scene-specific reinstatement data at this time point. Interestingly, the immediate reinstatement profile revealed a different set of significant regions, with distinct expression patterns compared to both the short and long delay conditions. This supports the view that scene-specific reinstatement is not static but dynamically reorganized over time.

      Regarding the ceiling effect at short delay, we acknowledge this as a potential limitation. However, we note that our primary analyses were conducted across both age groups combined, and not solely within high-performing individuals. As such, the grouping may mitigate concerns that ceiling-level performance in a subset of participants unduly influenced the overall reinstatement profile. Moreover, we observed variation in neural reinstatement despite ceiling-level behavior, suggesting that the neural signal retains sensitivity to consolidation-related processes even when behavioral accuracy is near-perfect.

      While we agree that formal statistical comparisons of reinstatement profiles across delays (e.g., using representational profile similarity or interaction tests) could be an informative direction, we feel that this goes beyond the scope of the current manuscript. 

      (4) As I mentioned above, it also was not ideal in my opinion that all regions were included for the scene-specific reinstatement due to the authors' inability to have an appropriate baseline and therefore define above-chance reinstatement. It makes these findings really challenging to compare with the gist reinstatement ones. 

      We appreciate the reviewer’s comment and agree that the lack of a clearly defined baseline for scene-specific reinstatement limits our ability to determine whether these values reflect above-chance reinstatement. However, we would like to clarify that we do not directly compare the magnitude of scene-specific reinstatement to that of gist-like reinstatement in our analyses or interpretations. These two analyses serve complementary purposes: the scene-specific analysis captures trial-unique similarity (within-item reinstatement), while the gist-like analysis captures category-level representational structure (across items). Because they differ not only in baseline assumptions but also in analytical scope and theoretical interpretation, our goal was not to compare them directly, but rather to explore distinct but co-existing representational formats that may evolve differently across development and delay.

      (8) I would encourage the authors to be specific about whether they are measuring/talking about memory representations versus reinstatement, unless they think these are the same thing (in which case some explanation as to why would be helpful). For example, especially under the Fuzzy Trace framework, couldn't someone maintain both verbatim and gist traces of a memory yet rely more on one when making a memory decision? 

      We thank the reviewer for pointing out the importance of conceptual clarity when referring to memory representations versus reinstatement. We agree that these are distinct but related concepts: in our framework, memory representations refer to the neural content stored as a result of encoding and consolidation, whereas reinstatement refers to the reactivation of those representations during retrieval. Thus, reinstatement serves as a proxy for the underlying memory representation — it is how we measure or infer the nature (e.g., specificity, abstraction) of the stored content.

      Under Fuzzy Trace Theory, it is indeed possible for both verbatim and gist representations to coexist. Our interpretation is not that children lack verbatim traces, but rather that they are more likely to rely on schematic or gist-like representations during retrieval, especially after a delay. Our use of neural pattern similarity (reinstatement) reflects which type of representation is being accessed, not necessarily which traces exist in parallel.

      To avoid ambiguity, we have revised the manuscript to more explicitly distinguish between reinstatement (neural reactivation) and the representational format (verbatim vs. gist-like), especially in the framing of our hypotheses and interpretation of age group differences.

      (9) With respect to the learning criteria - it is misleading to say that "children needed between two to four learning-retrieval cycles to reach the criterion of 83% correct responses" (p. 9). Four was the maximum, and looking at the Figure 1C data it appears as though there were at least a few children who did not meet the 83% minimum. I believe they were included in the analysis anyway? Please clarify. Was there any minimum imposed for inclusion?

      We thank the reviewer for pointing this out. As stated in the Methods section (p. 50, lines 1326-1338): “These cycles ranged from a minimum of two to a maximum of four.<…> The cycles ended when participants provided correct responses to 83% of the trials or after the fourth cycle was reached.” We have corrected the corresponding wording in the Results section (lines 286-289) to reflect this more accurately. Indeed, five children did not reach the 83% criterion but achieved final performance between 70 and 80% after the fourth learning cycle. These participants were included in this analysis for two main reasons:

      (1) The 83% threshold was established during piloting as a guideline for how many learning-retrieval cycles to allow, not a strict learning criterion. It served to standardize task continuation, rather than to exclude participants post hoc.

      (2) The performance of these five children was still well above chance level (33%), indicating meaningful learning. Excluding them would have biased the sample toward higher-performing children and reduced the ecological validity of our findings. Including them ensures a more representative view of children’s performance under extended learning conditions.

      (10) For the gist-like reinstatement PLSC analysis, results are really similar at short and long delays and yet some of the text seems to imply specificity to the long delay. One is a trend and one is significant (p. 31), but surely these two associations would not be statistically different from one another?  

      We agree with the reviewer that the associations at short and long delays appeared similar. While a formal comparison (e.g., using a Z-test for dependent correlations) would typically be warranted, in the reanalyzed dataset only the long delay profile remains statistically significant, which limits the interpretability of such a comparison. 

      (11) As a general comment, I had a hard time tying all of the (many) results together. For example adults show more mature neocortical consolidation-related engagement, which the authors say is going to create more durable detailed memories, but under multiple trace theory we would generally think of neocortical representations as providing more schematic information. If the authors could try to make more connections across the different neural analyses, as well as tie the neural findings in more closely with the behaviour & back to the theoretical frameworks, that would be really helpful.  

      We thank the reviewer for this valuable suggestion. We have revised the discussion section to more clearly link the behavioral and neural findings and to interpret them in light of existing consolidation theories for better clarity. 

      Reviewer #2 (Public Review): 

      Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval of middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children. 

      We thank the reviewer for the positive evaluation.

      Comments on the revised version: 

      I carefully reviewed not only the responses to my own reviews but also those raised by the other reviewers. While they addressed some of the concerns raised in the process, I think many substantive concerns remain. 

      Regarding Reviewer 1: 

      The authors point out that the retrieval procedure is the same over time and similarly influenced by temporal autocorrelations, which makes their analysis okay. However, there is a fundamental problem as to whether they are actually measuring reinstatement or they are only measuring differences in temporal autocorrelation (or some non-linear combination of both). The authors further argue that the stimuli are being processed more memory wise rather than perception wise, however, I think there is no evidence for that and that perception-memory processes should be considered on a continuum rather than as discrete processes. Thus, I agree with reviewer 1 that these analyses should be removed. 

      We thank the reviewer for raising this important question. We would like to clarify a few key points regarding temporal autocorrelation and reinstatement.

      During the fixation window, participants were instructed to reinstate the scene and location associated with the cued object from memory. This task was familiar to them, as they had been trained in retrieving locations within scenes. Our analysis aims to compare the neural representations during this retrieval phase with those when participants view the scene, in order to assess how these representations change in similarity over time, as memories become less precise.

      We acknowledge that temporal proximity can lead to temporal autocorrelation. However, evidence suggests that temporal autocorrelation is consistent and stable across conditions (Gautama & Van Hulle, 2004; Woolrich et al., 2004). Shinn & Lagalwar (2021) further demonstrated that temporal autocorrelation is highly reliable at both the subject and regional levels. Given that we analyze regions of interest (ROIs) separately, potential spatial variability in temporal autocorrelation is not a major concern.

      The absence of a difference in item-specific reinstatement between recent items on day 1 and day 14 (which were merged for further delay-related comparisons) also suggests that the reinstatement measure was stable for recent items, even when sampled on two different testing days. 

      Importantly, we interpret the relative change in the reinstatement index rather than its absolute value.

      In addition, when we conducted the same analysis for incorrectly retrieved memories, we did not observe any delay-related decline in reinstatement (see p. 25, lines 623-627). This suggests that the delay-related changes in reinstatement are specific to correctly retrieved memories. 

      Finally, our control analysis examining reinstatement between object and fixation time points (as suggested by Reviewer 1) revealed no delay-related effects in any ROI (see p.24, lines 605-612), further highlighting the specificity of the observed delay-related change in item reinstatement.

      We emphasize that temporal autocorrelation should be similar across all retrieval delays due to the identical task design and structure. Therefore, any observed decrease in reinstatement with increasing delay likely reflects a genuine change in the reinstatement index, rather than differences in temporal autocorrelation. Since our analysis includes only correctly retrieved items, and there is no perceptual input during the fixation window, this process is inherently memory-based, relying on mnemonic retrieval rather than sensory processing.

      We respectfully disagree with the reviewer's assertion that retrieval during the fixation period cannot be considered more memory-driven than perception-driven. At this time point, participants had no access to actual images of the scene, making it necessary for them to rely on mnemonic retrieval. The object cue likely triggered pattern completion for the learned object-scene association, forming a unique memory if remembered correctly (Horner & Burgess, 2013). This process is inherently mnemonic, as it is based on reconstructing the original neural representation of the scene (Kuhl et al., 2012; Staresina et al., 2013).

      While perception and memory processes can indeed be viewed as a continuum, some cognitive processes are predominantly memory-based, involving reconstruction rather than reproduction of previous experiences (Bartlett, 1932; Ranganath & Ritchey, 2012). In our task, although the retrieved material is based on previously encoded visual information, the process of recalling this information during the fixation period is fundamentally mnemonic, as it does not involve visual input. Our findings indicate that the similarity between memory-based representations and those observed during actual perception decreases over time, suggesting a relative change in the quality of the representations. However, this does not imply that detailed representations disappear; they may still be robust enough to support correct memory recall. Previous studies examining encoding-retrieval similarity have shown similar findings (Pacheco Estefan et al., 2019; Ritchey et al., 2013).

      We do not claim that perception and memory processes are entirely discrete, nor do we suggest that only perception is involved when participants see the scene. Viewing the scene indeed involves recognition processes, updating retrieved representations from the fixation period, and potentially completing missing or unclear information. This integrative process demonstrates the interrelation of perception and memory, especially in complex tasks like the one we employed.

      In conclusion, our task design and analysis support the interpretation that the fixation period is primarily characterized by mnemonic retrieval, facilitated by cue-triggered pattern completion, rather than perceptual processing. We believe this approach aligns with the current understanding of memory retrieval processes as supported by the existing literature.

      The authors seem to have a design that would allow for across run comparisons, however, they did not include these additional analyses. 

      Thank you for pointing this out. We ran an additional cross-run comparison. The results and the subsequent steps are reported in the response to Reviewer 1. 

      To address the reviewer’s concern, we conducted an additional cross-run analysis for all correctly retrieved trials. The approach restricted comparisons to non-overlapping runs (run1-run2, run2-run3, run1-run3). This analysis revealed robust gist-like reinstatement in children for remote Day 14 memories in the mPFC (p = .035) and vlPFC (p = .0007), in adults’ vlPFC for remote Day 1 memories (p = .029), as well as in children’s and adults’ LOC for remote Day 1 memories (p < .02). A significant Session effect in both regions (mPFC: p = .026; vlPFC: p = .002) indicated increased reinstatement for the long delay (Day 14) compared to the short-delay and recent sessions (all p < .05). Given that the cross-run results largely replicate and reinforce the effects found previously with the within-run analysis, we believe that combining both sources of information is methodologically justified and statistically beneficial. Specifically, both approaches independently identified significant gist-like reinstatement in children’s mPFC and vlPFC (although the within-run vlPFC effect (short delay: p = .038; long delay: p = .047) did not survive correction for multiple comparisons), particularly for remote memories. Including both within-run and between-run comparisons increases the number of unique, non-repeated trial pairs, improving statistical power without introducing redundancy. While we acknowledge that same-run comparisons may be influenced by residual autocorrelation (Prince et al., 2022), we believe that our design mitigates this risk through consistency between within-run and cross-run results, long inter-trial intervals, and trial-wise estimation of activation. We have adjusted the manuscript accordingly, reporting the combined analysis. We also report the cross-run and within-run analyses separately in the supplementary materials (Tables S12.1, S12.2), showing that the two sets of results converge and thus strengthen rather than dilute the findings. 
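
      The restriction to non-overlapping runs described above can be sketched as follows. This is a minimal illustration under stated assumptions: each trial is a hypothetical record with a run label, a category label, and a voxel pattern; similarity is a Fisher z-transformed Pearson correlation; and only same-category pairs are averaged. The function names and the toy patterns are invented for this sketch and do not reproduce the authors' actual reinstatement index.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_run_similarity(trials):
    """Mean Fisher-z similarity over same-category trial pairs from different runs.

    Returns (mean_z, number_of_pairs). Pairs from the same run are skipped,
    because their similarity may be inflated by temporal autocorrelation.
    """
    z_values = []
    for a, b in combinations(trials, 2):
        if a["run"] == b["run"]:
            continue  # exclude within-run pairs
        if a["category"] != b["category"]:
            continue  # keep only same-category pairs (assumed definition)
        z_values.append(math.atanh(pearson(a["pattern"], b["pattern"])))
    if not z_values:
        return float("nan"), 0
    return sum(z_values) / len(z_values), len(z_values)


# Hypothetical trial patterns for illustration only.
trials = [
    {"run": 1, "category": "A", "pattern": [0.2, 0.5, 0.1, 0.9]},
    {"run": 1, "category": "A", "pattern": [0.3, 0.4, 0.2, 0.8]},
    {"run": 2, "category": "A", "pattern": [0.1, 0.6, 0.3, 0.7]},
    {"run": 2, "category": "B", "pattern": [0.9, 0.1, 0.8, 0.2]},
]
mean_z, n_pairs = cross_run_similarity(trials)
```

      A combined analysis of the kind the response describes would simply drop the first `continue`, which is where the trade-off between more trial pairs and possible residual autocorrelation arises.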

      As suggested, we now explicitly highlight the change over time as the central finding. We observe a clear increase in gist-like reinstatement from recent to remote memories in children, particularly in mPFC and vlPFC. These effects, based on combined within- and cross-run comparisons, are now clearly stated in the main results and interpreted in the discussion accordingly. 

      (1) The authors did not satisfy my concerns about different amounts of re-exposures to stimuli as a function of age, which introduces a serious confound in the interpretation of the neural data. 

      (2) Regarding Reviewer 1's point about different number of trials being entered into analysis, I think a more formal test of sub-sampling the adult trials is warranted. 

      (1) We thank the reviewer for pointing this out. Overall, children needed 2 to 4 learning cycles to improve their performance and reach the learning criteria, compared to 2 learning cycles in adults. To address the different amounts of re-exposure to stimuli between the age groups, we subsampled the child group to only those children who reached the learning criteria after 2 learning cycles. For this purpose, we excluded 21 children from the analysis who needed 3 or 4 learning cycles. This resulted in 39 young adults and 28 children being included in the subsequent analysis. 

      (i) We reran the behavioral analysis with the subsampled dataset (see Supplementary Materials, Table S1.1, Fig. S1, Table S1.2). This analysis replicated the previous findings of less robust memory consolidation in children across all time delays. 

      (ii) We reran the univariate analysis (see in Supplementary Materials, Table S9.1). This analysis also replicated fully the previous findings. This indicates that the inclusion of child participants with greater material exposure during learning in the analysis of neural retrieval patterns did not affect the group differences in univariate neural results. 

      These subsampled results demonstrated that the amount of re-exposure to stimuli during encoding does not affect consolidation-related changes in memory retrieval at the behavioral and neural levels in children and adults across all time delays. We have added this information to the manuscript (line 343-348, 420-425). 

      (2) We appreciate Reviewer 1's suggestion to perform a formal test by sub-sampling the adult trials to match the number of trials in the child group. However, we believe that this approach may not be optimal for the following reasons:

      (i) Loss of Statistical Power: Sub-sampling the adult trials would result in a reduced sample size, potentially leading to a significant loss of statistical power and the ability to detect meaningful effects, particularly in a context where the adult group is intended to serve as a robust control or comparison group.

      (ii) Sub-sampling could introduce variability that complicates the interpretation of results, particularly if the trial sub-sampling process does not fully capture the variability inherent in the original adult data.

      (iii) Robustness of Existing Findings: We have already addressed potential concerns about unequal trial numbers by conducting analyses that control for the number of learning cycles, as detailed in our supplementary materials. These analyses have shown that the observed effects are consistent, suggesting that the differences in trial numbers do not critically influence our findings.

      Given these considerations, we hope the reviewer understands our rationale and agrees that the current analysis is robust and appropriate for addressing the research questions.

      I also still fundamentally disagree with the use of global signals when comparing children to adults, and think this could very much skew the results. 

      We thank the reviewer for raising this important issue. To address this concern comprehensively, we have taken the following steps:

      (1) Overview of the literature support for global signal regression (GSR). A growing body of methodological and empirical research supports the inclusion of global signal regression as part of best-practice denoising pipelines, particularly when analyzing pediatric fMRI data. Studies such as Ciric et al. (2017), Parkes et al. (2018), Power et al. (2012, 2014), and Thompson et al. (2016) show that GSR improves motion-related artifact removal. Critically, pediatric-specific studies (Disselhoff et al., 2025; Graff et al., 2022) conclude that pipelines including GSR are most effective for signal recovery and artifact removal in younger children. Graff et al. (2021) demonstrated that among various pipelines, GSR yielded the best noise reduction in 4–8-year-olds. Additionally, Li et al. (2019) and Qing et al. (2015) emphasized that GSR reduces artifactual variance without distorting the spatial structure of neural signals. Ofoghi et al. (2021) demonstrated that global signal regression helps mitigate non-neuronal noise sources, including respiration, cardiac activity, motion, vasodilation, and scanner-related artifacts. Based on this and other recent findings, we consider GSR particularly beneficial for denoising pediatric fMRI data in our study.

      (2) Empirical comparison of pipelines with and without GSR. We re-ran the entire first-level univariate analysis using a pipeline that excluded global signal regression. The resulting activation maps (see Supplementary Figures S3.2, S4.2, S5.2, S9.2) differed notably from those of the original pipeline. Specifically, group differences in regions such as mPFC, cerebellum, and posterior PHG no longer reached significance, and the overall pattern of results appeared noisier. 

      (3) Evaluation of the pipeline differences. To further evaluate the impact of GSR, we conducted the following analyses:

      (a) Global signal is stable across groups and sessions. A linear mixed-effects model showed no significant main effects or interactions involving group or session on the global signal (F-values < 2.62, p > .11), suggesting that the global signal was not group- or session-dependent in our sample. 

      (b) Noise Reduction Assessment via Contrast Variability. We compared the variability (standard deviation and IQR) of contrast estimates across pipelines. Both SD (b = .070, p < .001) and IQR (b = .087, p < .001) were significantly reduced in the GSR pipeline, especially in children (p < .001) compared to adults (p = .048). This suggests that GSR reduces inter-subject variability in children, likely reflecting improved signal quality.

      (c) Residual Variability After Regressing Global Signal. We regressed out global signal post hoc from both pipelines and compared the residual variance. Residual standard deviation was significantly lower for the GSR pipeline (F = 199, p < .001), with no interaction with session or group, further indicating that GSR stabilizes the signal and attenuates non-neuronal variability.
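
      The checks in (b) and (c) can be sketched descriptively as follows. The sketch assumes one contrast estimate per participant for each pipeline and a single region time series alongside a global signal; the helper names (`spread`, `residualize`) and all numbers are hypothetical. It only computes the descriptive quantities (SD, IQR, residual SD) and does not reproduce the statistical models behind the reported b and F values.

```python
from statistics import mean, pstdev, quantiles, stdev


def spread(values):
    """Return (sample SD, interquartile range) of a list of estimates."""
    q1, _, q3 = quantiles(values, n=4)
    return stdev(values), q3 - q1


def residualize(signal, regressor):
    """Remove the linear effect of `regressor` (with intercept) from `signal`."""
    mx, my = mean(regressor), mean(signal)
    slope = (sum((x - mx) * (y - my) for x, y in zip(regressor, signal))
             / sum((x - mx) ** 2 for x in regressor))
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(regressor, signal)]


# Hypothetical contrast estimates per participant; not data from the study.
with_gsr = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10]
without_gsr = [0.05, 0.20, -0.02, 0.15, 0.25, 0.08]
sd_gsr, iqr_gsr = spread(with_gsr)
sd_raw, iqr_raw = spread(without_gsr)

# Hypothetical region time series and global signal.
global_signal = [1.0, 1.2, 0.9, 1.1, 1.3, 0.8]
roi_signal = [2.1, 2.5, 1.8, 2.2, 2.7, 1.7]
residuals = residualize(roi_signal, global_signal)
residual_sd = pstdev(residuals)
```

      Because ordinary least squares residuals are uncorrelated with the regressor, any variance remaining in `residuals` cannot be attributed to a linear contribution of the global signal.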

      Conclusion

      In summary, while we understand the reviewer’s concern, we believe the empirical and theoretical support for GSR, especially in pediatric samples, justifies its use in our study. Nonetheless, to ensure full transparency, we provide full results from both pipelines in the Supplementary Materials and have clarified our reasoning in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      (1) Some figures are still missing descriptions of what everything on the graph means; please clarify in captions. 

      We thank the reviewer for pointing this out. We undertook the necessary adjustments in the graph annotations. 

      (2) The authors conclude they showed evidence of neural reorganization of memory representations in children (p. 41). But the gist is not greater in children than adults, and also does not differ over time-so, I was confused about what this claim was based on? 

      We thank the reviewer for raising this question. Our results suggest that gist-like reinstatement was significantly higher in children than in adults in the mPFC, and that the children's gist-like reinstatement indices were significantly higher than zero (see p. 27-28). These results support our claim of neural reorganization of memory representations in children. We hope this clarifies the issue. 

      References

      Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge University Press.

      Brainerd, C. J., & Reyna, V. F. (2002). Fuzzy-Trace Theory: Dual Processes in Memory, Reasoning, and Cognitive Neuroscience (pp. 41–100). https://doi.org/10.1016/S0065-2407(02)80062-3

      Chen, J., Leong, Y. C., Honey, C. J., Yong, C. H., Norman, K. A., & Hasson, U. (2017). Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1), 115–125. https://doi.org/10.1038/nn.4450

      Ciric, R., Wolf, D. H., Power, J. D., Roalf, D. R., Baum, G. L., Ruparel, K., Shinohara, R. T., Elliott, M. A., Eickhoff, S. B., Davatzikos, C., Gur, R. C., Gur, R. E., Bassett, D. S., & Satterthwaite, T. D. (2017). Benchmarking of participant-level confound regression strategies for the control of motion artifact in studies of functional connectivity. NeuroImage, 154, 174–187. https://doi.org/10.1016/j.neuroimage.2017.03.020

      Disselhoff, V., Jakab, A., Latal, B., Schnider, B., Wehrle, F. M., Hagmann, C. F., Held, U., O’Gorman, R. T., Fauchère, J.-C., & Hüppi, P. (2025). Inhibition abilities and functional brain connectivity in school-aged term-born and preterm-born children. Pediatric Research, 97(1), 315–324. https://doi.org/10.1038/s41390-024-03241-0

      Esteban, O., Ciric, R., Finc, K., Blair, R. W., Markiewicz, C. J., Moodie, C. A., Kent, J. D., Goncalves, M., DuPre, E., Gomez, D. E. P., Ye, Z., Salo, T., Valabregue, R., Amlien, I. K., Liem, F., Jacoby, N., Stojić, H., Cieslak, M., Urchs, S., … Gorgolewski, K. J. (2020). Analysis of task-based functional MRI data preprocessed with fMRIPrep. Nature Protocols, 15(7), 2186–2202. https://doi.org/10.1038/s41596-020-0327-3

      Fandakova, Y., Leckey, S., Driver, C. C., Bunge, S. A., & Ghetti, S. (2019). Neural specificity of scene representations is related to memory performance in childhood. NeuroImage, 199, 105–113. https://doi.org/10.1016/j.neuroimage.2019.05.050

      Gautama, T., & Van Hulle, M. M. (2004). Optimal spatial regularisation of autocorrelation estimates in fMRI analysis. NeuroImage, 23(3), 1203–1216.  https://doi.org/10.1016/j.neuroimage.2004.07.048

      Graff, K., Tansey, R., Ip, A., Rohr, C., Dimond, D., Dewey, D., & Bray, S. (2022). Benchmarking common preprocessing strategies in early childhood functional connectivity and intersubject correlation fMRI. Developmental Cognitive Neuroscience, 54, 101087. https://doi.org/10.1016/j.dcn.2022.101087

      Horner, A. J., & Burgess, N. (2013). The associative structure of memory for multi-element events. Journal of Experimental Psychology: General, 142(4), 1370–1383. https://doi.org/10.1037/a0033626

      Jones, J. S., the CALM Team, & Astle, D. E. (2021). A transdiagnostic data-driven study of children’s behaviour and the functional connectome. Developmental Cognitive Neuroscience, 52, 101027. https://doi.org/10.1016/j.dcn.2021.101027

      Kuhl, B. A., Bainbridge, W. A., & Chun, M. M. (2012). Neural Reactivation Reveals Mechanisms for Updating Memory. Journal of Neuroscience, 32(10), 3453–3461. https://doi.org/10.1523/JNEUROSCI.5846-11.2012

      Kuhl, B. A., & Chun, M. M. (2014). Successful Remembering Elicits Event-Specific Activity Patterns in Lateral Parietal Cortex. Journal of Neuroscience, 34(23), 8051–8060. https://doi.org/10.1523/JNEUROSCI.4328-13.2014

      Li, J., Kong, R., Liégeois, R., Orban, C., Tan, Y., Sun, N., Holmes, A. J., Sabuncu, M. R., Ge, T., & Yeo, B. T. T. (2019). Global signal regression strengthens association between resting-state functional connectivity and behavior. NeuroImage, 196, 126–141. https://doi.org/10.1016/j.neuroimage.2019.04.016

      Ofoghi, B., Chenaghlou, M., Mooney, M., Dwyer, D. B., & Bruce, L. (2021). Team technical performance characteristics and their association with match outcome in elite netball. International Journal of Performance Analysis in Sport, 21(5), 700–712. https://doi.org/10.1080/24748668.2021.1938424

      Pacheco Estefan, D., Sánchez-Fibla, M., Duff, A., Principe, A., Rocamora, R., Zhang, H., Axmacher, N., & Verschure, P. F. M. J. (2019). Coordinated representational reinstatement in the human hippocampus and lateral temporal cortex during episodic memory retrieval. Nature Communications, 10(1), 2255. https://doi.org/10.1038/s41467-019-09569-0

      Parkes, L., Fulcher, B., Yücel, M., & Fornito, A. (2018). An evaluation of the efficacy, reliability, and sensitivity of motion correction strategies for resting-state functional MRI. NeuroImage, 171, 415–436. https://doi.org/10.1016/j.neuroimage.2017.12.073

      Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2012). Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage, 59(3), 2142–2154. https://doi.org/10.1016/j.neuroimage.2011.10.018

      Power, J. D., Mitra, A., Laumann, T. O., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2014). Methods to detect, characterize, and remove motion artifact in resting state fMRI. NeuroImage, 84, 320–341. https://doi.org/10.1016/j.neuroimage.2013.08.048

      Power, S. D., Kushki, A., & Chau, T. (2012). Intersession Consistency of Single-Trial Classification of the Prefrontal Response to Mental Arithmetic and the No-Control State by NIRS. PLoS ONE, 7(7), e37791. https://doi.org/10.1371/journal.pone.0037791

      Prince, J. S., Charest, I., Kurzawski, J. W., Pyles, J. A., Tarr, M. J., & Kay, K. N. (2022). Improving the accuracy of single-trial fMRI response estimates using GLMsingle. ELife, 11. https://doi.org/10.7554/eLife.77599

      Qing, Z., Dong, Z., Li, S., Zang, Y., & Liu, D. (2015). Global signal regression has complex effects on regional homogeneity of resting state fMRI signal. Magnetic Resonance Imaging, 33(10), 1306–1313. https://doi.org/10.1016/j.mri.2015.07.011

      Ranganath, C., & Ritchey, M. (2012). Two cortical systems for memory-guided behaviour. Nature Reviews Neuroscience, 13(10), 713–726. https://doi.org/10.1038/nrn3338

      Ritchey, M., Wing, E. A., LaBar, K. S., & Cabeza, R. (2013). Neural Similarity Between Encoding and Retrieval is Related to Memory Via Hippocampal Interactions. Cerebral Cortex, 23(12), 2818–2828. https://doi.org/10.1093/cercor/bhs258

      Satterthwaite, T. D., Elliott, M. A., Gerraty, R. T., Ruparel, K., Loughead, J., Calkins, M. E., Eickhoff, S. B., Hakonarson, H., Gur, R. C., Gur, R. E., & Wolf, D. H. (2013). An improved framework for confound regression and filtering for control of motion artifact in the preprocessing of resting-state functional connectivity data. NeuroImage, 64, 240–256. https://doi.org/10.1016/j.neuroimage.2012.08.052

      Schommartz, I., Lembcke, P. F., Pupillo, F., Schuetz, H., de Chamorro, N. W., Bauer, M., Kaindl, A. M., Buss, C., & Shing, Y. L. (2023). Distinct multivariate structural brain profiles are related to variations in short- and long-delay memory consolidation across children and young adults. Developmental Cognitive Neuroscience, 59. https://doi.org/10.1016/J.DCN.2022.101192

      Sekeres, M. J., Winocur, G., & Moscovitch, M. (2018). The hippocampus and related neocortical structures in memory transformation. Neuroscience Letters, 680, 39–53. https://doi.org/10.1016/j.neulet.2018.05.006

      Shinn, L. J., & Lagalwar, S. (2021). Treating Neurodegenerative Disease with Antioxidants: Efficacy of the Bioactive Phenol Resveratrol and Mitochondrial-Targeted MitoQ and SkQ. Antioxidants, 10(4), 573. https://doi.org/10.3390/antiox10040573

      Staresina, B. P., Alink, A., Kriegeskorte, N., & Henson, R. N. (2013). Awake reactivation predicts memory in humans. Proceedings of the National Academy of Sciences, 110(52), 21159–21164. https://doi.org/10.1073/pnas.1311989110

      St-Laurent, M., & Buchsbaum, B. R. (2019). How Multiple Retrievals Affect Neural Reactivation in Young and Older Adults. The Journals of Gerontology: Series B, 74(7), 1086–1100. https://doi.org/10.1093/geronb/gbz075

      Thompson, G. J., Riedl, V., Grimmer, T., Drzezga, A., Herman, P., & Hyder, F. (2016). The Whole-Brain “Global” Signal from Resting State fMRI as a Potential Biomarker of Quantitative State Changes in Glucose Metabolism. Brain Connectivity, 6(6), 435–447. https://doi.org/10.1089/brain.2015.0394

      Tompary, A., & Davachi, L. (2017). Consolidation Promotes the Emergence of Representational Overlap in the Hippocampus and Medial Prefrontal Cortex. Neuron, 96(1), 228-241.e5. https://doi.org/10.1016/j.neuron.2017.09.005

      Tompary, A., Zhou, W., & Davachi, L. (2020). Schematic memories develop quickly, but are not expressed unless necessary. PsyArXiv.

      Woolrich, M. W., Behrens, T. E. J., Beckmann, C. F., Jenkinson, M., & Smith, S. M. (2004). Multilevel linear modelling for FMRI group analysis using Bayesian inference. NeuroImage, 21(4), 1732–1747. https://doi.org/10.1016/j.neuroimage.2003.12.023

      Xiao, X., Dong, Q., Gao, J., Men, W., Poldrack, R. A., & Xue, G. (2017). Transformed Neural Pattern Reinstatement during Episodic Memory Retrieval. The Journal of Neuroscience, 37(11), 2986–2998. https://doi.org/10.1523/JNEUROSCI.2324-16.2017

      Ye, Z., Shi, L., Li, A., Chen, C., & Xue, G. (2020). Retrieval practice facilitates memory updating by enhancing and differentiating medial prefrontal cortex representations. ELife, 9, 1–51. https://doi.org/10.7554/ELIFE.57023

      Yonelinas, A. P., Ranganath, C., Ekstrom, A. D., & Wiltgen, B. J. (2019). A contextual binding theory of episodic memory: systems consolidation reconsidered. Nature Reviews Neuroscience, 20(6), 364–375. https://doi.org/10.1038/s41583-019-0150-4

      Zhuang, L., Wang, J., Xiong, B., Bian, C., Hao, L., Bayley, P. J., & Qin, S. (2021). Rapid neural reorganization during retrieval practice predicts subsequent long-term retention and false memory. Nature Human Behaviour, 6(1), 134–145. https://doi.org/10.1038/s41562-021-01188-4

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: 

      In this manuscript, the authors identified that

      (1) CDK4/6i treatment attenuates the growth of drug-resistant cells by prolongation of the G1 phase;

      (2) CDK4/6i treatment results in an ineffective Rb inactivation pathway and suppresses the growth of drug-resistant tumors;

      (3) Addition of endocrine therapy augments the efficacy of CDK4/6i maintenance; 

      (4) Addition of CDK2i with CDK4/6 treatment as second-line treatment can suppress the growth of resistant cells;

      (5) The role of cyclin E as a key driver of resistance to CDK4/6 and CDK2 inhibition.

      Strengths: 

      To support their complex proposal, the authors combined several kinds of live-cell markers, timed in situ hybridization, IF, and immunoblotting. The authors clearly recognize the problem of resistance to CDK4/6i + ET therapy and demonstrate how to overcome it.

      Weaknesses: 

      The authors need to distinguish their own results from those already reported by other researchers.

      Reviewer #2 (Public review): 

      Summary: 

      This study elucidated the mechanism underlying drug resistance induced by CDK4/6i as a single agent and proposed a novel and efficacious second-line therapeutic strategy. It highlighted the potential of combining CDK2i with CDK4/6i for the treatment of HR+/HER2- breast cancer.

      Strengths: 

      The study demonstrated that CDK4/6 induces drug resistance by impairing Rb activation, which results in diminished E2F activity and a delay in G1 phase progression. It suggests that the synergistic use of CDK2i and CDK4/6i may represent a promising second-line treatment approach. Addressing critical clinical challenges, this study holds substantial practical implications.

      Weaknesses: 

      (1) Drug-resistant cell lines: Was a drug concentration gradient treatment employed to establish drug-resistant cell lines? If affirmative, this methodology should be detailed in the materials and methods section. 

      We greatly appreciate the reviewer for raising this important question. In the revised manuscript, we have updated the methods section (“Drug-resistant cell lines”) to more precisely describe how the drug-resistant cell lines were established. 

      (2) What rationale informed the selection of MCF-7 cells for the generation of CDK6 knockout cell lines? Supplementary Figure 3. A indicates that CDK6 expression levels in MCF-7 cells are not notably elevated. 

      We appreciate the reviewer’s insightful question about the rationale for selecting MCF-7 cells to generate CDK6 knockout cell lines. This choice was guided by prior studies highlighting the significant role of CDK6 in mediating resistance to CDK4/6 inhibitors (21-24). Moreover, we observed a 4.6-fold increase in CDK6 expression in CDK4/6i resistant MCF-7 cells compared to their drug-naïve counterparts (Supplementary Figure 3A). While we did not detect notable differences in CDK4/6 activity between wild-type and CDK6 knockout cells under CDK4/6 inhibitor treatment, these findings point to a potential non-canonical function of CDK6 in conferring resistance to CDK4/6 inhibitors.  

      (3) For each experiment, particularly those involving mice, the author must specify the number of individuals utilized and the number of replicates conducted, as detailed in the materials and methods section. 

      We sincerely thank the reviewer for bringing this to our attention. In the revised manuscript, we have explicitly stated the number of replicates and mice used for each experiment as appropriate in figure legends and relevant text to ensure transparency and clarity. 

      (4) Could this treatment approach be extended to triple-negative breast cancer?

      We greatly appreciate the reviewer’s inquiry about extending our findings to triple-negative breast cancer (TNBC). Based on the data presented in Figure 1 and Supplementary Figure 2, which include the TNBC cell line MDA-MB-231, we expect that the benefits of maintaining CDK4/6 inhibitors could indeed be applicable to TNBC with an intact Rb/E2F pathway. Additionally, our recent paper (25) indicates a similar mechanism in TNBC.

      Reviewer #3 (Public review):

      Summary: 

      In their manuscript, Armand and colleagues investigate the potential of continuing CDK4/6 inhibitors or combining them with CDK2 inhibitors in the treatment of breast cancer that has developed resistance to initial therapy. Utilizing cellular and animal models, the research examines whether maintaining CDK4/6 inhibition or adding CDK2 inhibitors can effectively control tumor growth after resistance has set in. The key findings from the study indicate that the sustained use of CDK4/6 inhibitors can slow down the proliferation of cancer cells that have become resistant, and the combination of CDK2 inhibitors with CDK4/6 inhibitors can further enhance the suppression of tumor growth. Additionally, the study identifies that high levels of Cyclin E play a significant role in resistance to the combined therapy. These results suggest that continuing CDK4/6 inhibitors along with the strategic use of CDK2 inhibitors could be an effective strategy to overcome treatment resistance in hormone receptor-positive breast cancer.

      Strengths: 

      (1) Continuous CDK4/6 Inhibitor Treatment Significantly Suppresses the Growth of Drug-Resistant HR+ Breast Cancer: The study demonstrates that the continued use of CDK4/6 inhibitors, even after disease progression, can significantly inhibit the growth of drug-resistant breast cancer. 

      (2) Potential of Combined Use of CDK2 Inhibitors with CDK4/6 Inhibitors: The research highlights the potential of combining CDK2 inhibitors with CDK4/6 inhibitors to effectively suppress CDK2 activity and overcome drug resistance. 

      (3) Discovery of Cyclin E Overexpression as a Key Driver: The study identifies overexpression of cyclin E as a key driver of resistance to the combination of CDK4/6 and CDK2 inhibitors, providing insights for future cancer treatments. 

      (4) Consistency of In Vitro and In Vivo Experimental Results: The study obtained supportive results from both in vitro cell experiments and in vivo tumor models, enhancing the reliability of the research. 

      (5) Validation with Multiple Cell Lines: The research utilized multiple HR+/HER2- breast cancer cell lines (such as MCF-7, T47D, CAMA-1) and triple-negative breast cancer cell lines (such as MDA-MB-231), validating the broad applicability of the results.

      Weaknesses: 

      (1) The manuscript presents intriguing findings on the sustained use of CDK4/6 inhibitors and the potential incorporation of CDK2 inhibitors in breast cancer treatment. However, I would appreciate a more detailed discussion of how these findings could be translated into clinical practice, particularly regarding the management of patients with drug-resistant breast cancer. 

      We thank the reviewer for this crucial comment. In the revised Discussion, we have broadened our exploration of clinical translation. Specifically, we emphasize that ongoing CDK4/6 inhibition, although not fully stopping resistant tumors, significantly slows their growth and may offer a therapeutic window when combined with ET and CDK2 inhibition. We also note that these approaches may work best for patients without Rb loss or newly acquired resistance-driving mutations, and that cyclin E overexpression could be a biomarker to inform patient selection. These points together highlight that our findings provide a mechanistic understanding and potential framework for clinical trials testing maintenance CDK4/6i with selective addition of CDK2i as a second-line strategy in drug-resistant HR+/HER2- breast cancer.

      (2) While the emergence of resistance is acknowledged, the manuscript could benefit from a deeper exploration of the molecular mechanisms underlying resistance development. A more thorough understanding of how CDK2 inhibitors may overcome this resistance would be valuable. 

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have expanded our Discussion to more explicitly synthesize the molecular mechanisms of resistance and how CDK2 inhibitors counteract them. Specifically, we describe how sustained CDK4/6 inhibition drives a non-canonical route of Rb degradation, resulting in inefficient E2F activation and prolonged G1 phase progression. We also highlight the role of c-Myc in amplifying E2F activity and promoting resistance, and we show that continued ET mitigates this effect by suppressing c-Myc. Importantly, we demonstrate that CDK2 inhibition alone cannot fully suppress the growth of resistant cells, but when combined with CDK4/6 inhibition, it produces durable repression of E2F and Myc target gene programs and significantly delays the G1/S transition. Finally, we identify cyclin E overexpression as a key mechanism of escape from dual CDK4/6i + CDK2i therapy, suggesting its potential as a biomarker for patient stratification. Together, these findings provide a detailed mechanistic rationale for how CDK2 inhibition can overcome specific pathways of resistance in HR+/HER2- breast cancer.

      (3) The manuscript supports the continued use of CDK4/6 inhibitors, but it lacks a discussion on the long-term efficacy and safety of this approach. Additional studies or data to support the safety profile of prolonged CDK4/6 inhibitor use would strengthen the manuscript. 

      We appreciate the reviewer’s insightful comment. In the revised manuscript, we emphasize the long-term efficacy and safety considerations of sustained CDK4/6 inhibition. Clinical trial and retrospective data have shown that continued CDK4/6i therapy can extend progression-free survival in selected patients, while maintaining a favorable safety profile (26-28). We have updated the Discussion to highlight these findings more explicitly, underscoring that while prolonged CDK4/6 inhibition slows but does not fully arrest tumor growth, it remains a clinically viable strategy when balanced against its manageable toxicity profile.

      Reviewer #1 (Recommendations for the authors): 

      It is well known that the combination therapy of CDK4/6i and ET has therapeutic benefits in ER(+) HER2(-) advanced breast cancer. However, drug resistance is a problem, and second-line therapy to solve this problem has not been established. Although some parts of the research results are already reported, the authors confirmed them by employing live cell markers, and further proved and suggested how to overcome this resistance in detail. This part is considered novel. 

      Overall, this research manuscript is eligible to be accepted with the appropriate addressing of questions.

      (1) The effects and biochemical changes of combination therapy of CDK4/6i and CDK2i are already known in several papers. The authors need to highlight the differences between their research and that of other researchers.

      We thank the reviewer for the opportunity to clarify the novelty of our findings in the context of prior studies on CDK4/6i and CDK2i combination therapy. In the revised manuscript, we have updated the Discussion section to more clearly delineate how our work extends and differs from existing research.

      Specifically, we now state:

      Page 12: The combination of CDK4/6i and ET has reshaped treatment for HR+/HER2- breast cancer (1-8). However, resistance commonly emerges, and no consensus second-line standard is established. Our data show that continued CDK4/6i treatment in drug-resistant cells engages a non-canonical, proteolysis-driven route of Rb inactivation, yielding attenuated E2F output and a pronounced delay in G1 progression (Figure 7G). Concurrent ET further deepens this blockade by suppressing c-Myc-mediated E2F amplification, thereby prolonging G1 and slowing population growth. Importantly, CDK2 inhibition alone was insufficient to control resistant cells. Robust suppression of CDK2 activity and resistant-cell growth required CDK2i in combination with CDK4/6i, consistent with prior reports supporting dual CDK targeting (9-16). Moreover, cyclin E, and in some contexts cyclin A, blunted the efficacy of the CDK4/6i and CDK2i combination by reactivating CDK2. Together, these findings provide a mechanistic rationale for maintaining CDK4/6i beyond progression and support testing ET plus CDK4/6i with the strategic addition of CDK2i, as evidenced by concordant in vitro and in vivo results.

      (2) Regarding Figures 3H and 3I, I wonder whether these are live-cell imaging results or whether the authors counted each signal on timed IF-stained slides. If live-cell imaging was used, the authors need to present the methods.

      We appreciate the reviewer’s question. Figures 3H and 3I derive from a live–fixed correlative pipeline rather than purely live imaging or independently timed IF slides. We first imaged asynchronously proliferating cells live for ≥48 h to (i) segment/track nuclei with H2B fluorescence, (ii) define mitotic exit (t = 0 at anaphase), and (iii) record CDK2 activity using a CDK2 KTR in the last live frame. Immediately after the live acquisition, we pulsed EdU (10 µM, 15 min) and fixed the same wells, photobleached fluorescent proteins (3% H₂O₂ + 20 mM HCl, 2 h, RT) to prevent crosstalk, and then performed click-chemistry EdU detection, IF for phospho-Rb (Ser807/811) and total Rb, and RNA FISH for E2F1. Fixed-cell readouts (p-Rb positivity, EdU incorporation, E2F1 mRNA puncta) were mapped back to each single cell’s live-derived time since mitosis and/or CDK2 activity, enabling the kinetic plots shown in Fig. 3H–I.

      To ensure transparency and reproducibility, we added detailed methods describing this workflow in the “Immunofluorescence and mRNA fluorescence in situ hybridization (FISH)” section under a dedicated “live–fixed pipeline” paragraph, and we cross-referenced acquisition and analysis parameters in “Live- and fixed-cell image acquisition” and “Image processing and analysis.” These updates specify: EdU pulse/fix conditions, photobleaching, antibodies/probes, imaging hardware and channels, segmentation/tracking, mitosis alignment, background correction, and how fixed readouts were binned/quantified as functions of time after mitosis and CDK2 activity.
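
      The binning step described above (fixed-cell readouts quantified as a function of time after mitosis) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name, the 2 h bin width, and the example cells are hypothetical and chosen only to show the idea of grouping single cells by their live-derived time since anaphase and reporting the fraction that scores positive for a fixed-cell marker such as p-Rb.

```python
from collections import defaultdict


def fraction_positive_by_time(cells, bin_width_h=2.0):
    """Return, per time bin, the fraction of cells positive for a fixed-cell readout.

    cells: iterable of (time_since_mitosis_h, is_positive) pairs, where the time
    comes from live tracking and the boolean from the fixed-cell stain.
    bin_width_h: hypothetical bin width in hours (an assumption, not from the source).
    """
    counts = defaultdict(lambda: [0, 0])  # bin start -> [n_positive, n_total]
    for time_since_mitosis_h, is_positive in cells:
        bin_start = (time_since_mitosis_h // bin_width_h) * bin_width_h
        counts[bin_start][0] += int(is_positive)
        counts[bin_start][1] += 1
    # Sort by bin start so the result reads as a time course.
    return {b: pos / total for b, (pos, total) in sorted(counts.items())}


# Hypothetical example data: (hours since anaphase, p-Rb positive?)
example_cells = [
    (0.5, False), (1.5, False),
    (2.5, True), (3.0, False),
    (4.2, True), (5.9, True),
]
example_fractions = fraction_positive_by_time(example_cells)
print(example_fractions)
```

      The same grouping could be keyed on binned CDK2 activity instead of time since mitosis; only the first element of each pair would change.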

      (3) Regarding Figure 3F, were the seven images obtained from the same field? The authors need to describe in detail the meaning of the white image and of the yellow and blue images at the bottom.

      Thank you for raising this point. All seven panels in Fig. 3F are from the same field of view. The top row shows the raw channels (Hoechst, p-Rb, total Rb, and E2F1 RNA FISH). The bottom row shows the corresponding processed outputs from that field: (i) nuclear segmentation, (ii) phosphorylated Rb-status classification, and (iii) cell boundaries used for single-cell RNA-FISH quantification. We have revised the figure legend to make this explicit.

      (4) The author showed E2F mRNA by ISH, but in fact, RB does not suppress E2F mRNA but suppresses protein, so the author needs to confirm E2F at the protein level.

      We sincerely appreciate the reviewer’s thoughtful suggestion to examine E2F1 at the protein level. In our study, we focused on E2F1 mRNA expression because it is a well-established and biologically meaningful readout of E2F1 transcriptional activity. Due to its autoregulatory nature (17), the release of active E2F1 protein from Rb induces the transcription of E2F1 itself, creating a positive feedback loop. As a result, E2F1 mRNA abundance serves as a direct and reliable proxy for E2F1 protein activity (18-20). Thus, quantifying E2F1 mRNA provides a biologically relevant and mechanistic indicator of Rb-E2F pathway status. To clarify this rationale, we have updated the Results section and added references supporting our use of E2F1 mRNA as a readout for E2F1 activity.

      (5) Is it possible to synchronize cells (nocodazole shake-off, Double thymidine block) under the presence of cdk4/6i? If so, then the authors need to demonstrate the delay of G1 progression via immunoblotting. 

      We thank the reviewer for this constructive suggestion. To address it, we performed nocodazole synchronization followed by release and monitored cell-cycle progression in the presence or absence of CDK4/6 inhibition.

      Specifically, we added the following new datasets to the revised manuscript:

      Fig. 3L: Live single-cell trajectories of CDK4/6 and CDK2 activities alongside the Cdt1-degron reporter after 14 hours of nocodazole (250 nM) treatment and release. We compared the averaged traces of CDK4/6 and CDK2 activities and Cdt1 intensity in parental cells (gray) and resistant cells with (red) and without (blue) CDK4/6i maintenance. These data show suppressed and delayed CDK2 activation, as well as a right-shifted S-phase entry, particularly under continuous CDK4/6 inhibition.

      Fig. 3M: Fixed-cell EdU pulse-labeling at 4, 6, 8, 12, 16, and 24 h post-release further confirms a significant delay in S-phase entry and prolonged G1 duration in CDK4/6i-maintained cells compared with naïve and withdrawn conditions.

      Together, these results directly demonstrate the delay in G1 progression following synchronized mitotic exit under CDK4/6 inhibition.

      (6) In Figure 5C the authors showed a violin plot of c-Myc level. Is this Immunohistochemical staining? The authors need to clarify the methods.

      Thank you for flagging this. The c-Myc measurements in Fig. 5C are from immunofluorescence (IF), not IHC. We now state this explicitly in the legend.

      (7) Regarding Live cell immunofluorescence tracing of live-cell reporters, the author needs to clarify the methods (excitation, emission), name of instruments, and software used.

      To address this, we have expanded the “Live-cell, fixed-cell, and tumor tissue image acquisition” section in the Materials and Methods.

      (8) Lines 475 SF1A, the authors need to correct typos. Naïve Naïve.

      We greatly appreciate the reviewer’s attention to this detail and have ensured all typos have been addressed.  

      (9) The authors need to unify Cdt1-degron(legends) Vs Cdt1 degron (figures). 

      We greatly appreciate your attention to this discrepancy. Language referring to the Cdt1 degron has been unified between figures and legends. 

      Reviewer #3 (Recommendations for the authors):

      (1) While the manuscript discusses the selection of doses for CDK4/6 inhibitors and CDK2 inhibitors, there is a lack of detailed data on the dose-response relationship. Additional data on the effects of different doses would be beneficial. 

      We appreciate the reviewer’s important comment. To address it, we performed additional dose–response experiments testing a range of CDK4/6i and CDK2i concentrations. These analyses revealed a clear synergistic interaction between the two inhibitors. The new data are now presented in Figure 6G and Supplementary Figure 8F of the revised manuscript.
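
      For readers unfamiliar with how synergy between two inhibitors is typically scored, the sketch below shows one common reference model, Bliss independence. The response does not state which synergy model was used, so the choice of Bliss here is an assumption made purely for illustration, and the function name and example values are hypothetical.

```python
def bliss_excess(effect_a, effect_b, effect_combo):
    """Excess of the observed combination effect over the Bliss expectation.

    All inputs are fractional inhibitions in [0, 1] (0 = no effect, 1 = full
    inhibition). Under Bliss independence the expected combined effect is
    E_a + E_b - E_a * E_b. A positive return value means the combination
    inhibits more than expected (consistent with synergy); a negative value
    means it inhibits less (consistent with antagonism).
    """
    for value in (effect_a, effect_b, effect_combo):
        if not 0.0 <= value <= 1.0:
            raise ValueError("effects must be fractional inhibitions in [0, 1]")
    expected = effect_a + effect_b - effect_a * effect_b
    return effect_combo - expected


# Hypothetical values: each drug alone inhibits growth by 50%,
# and the combination inhibits growth by 90%.
example_excess = bliss_excess(0.5, 0.5, 0.9)
print(round(example_excess, 3))
```

      In a full dose–response matrix, this excess would be computed at every pair of concentrations; other reference models (for example Loewe additivity) can give different verdicts for the same data.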

      (2) In clinical trials, the criteria for patient selection are crucial for interpreting study outcomes. A detailed description of the patient selection criteria should be provided.  

      We thank the reviewer for bringing this important point to our attention. In the revised manuscript, we have clarified the patient selection criteria relevant to the interpretation of clinical outcomes. Specifically, we note that retrospective analyses suggest patients with indolent disease and no prior chemotherapy may benefit most from continued CDK4/6i plus ET. Moreover, our data and others’ indicate that clinical benefit is expected in tumors retaining an intact Rb/E2F axis, while resistance-driving alterations (e.g., Rb loss, PIK3CA, ESR1, FGFR1–3, HER2, FAT1 mutations) are likely to limit efficacy. Finally, we highlight cyclin E overexpression as a potential biomarker of resistance to combined CDK4/6i and CDK2i, underscoring the need for biomarker-guided patient stratification. These additions provide a more detailed framework for patient selection in future clinical applications.

      References

      (1) Finn RS, Crown JP, Lang I, Boer K, Bondarenko IM, Kulyk SO, et al. The cyclin-dependent kinase 4/6 inhibitor palbociclib in combination with letrozole versus letrozole alone as first-line treatment of oestrogen receptor-positive, HER2-negative, advanced breast cancer (PALOMA-1/TRIO-18): a randomised phase 2 study. Lancet Oncol 2015;16:25-35

      (2) Finn RS, Martin M, Rugo HS, Jones S, Im S-A, Gelmon K, et al. Palbociclib and Letrozole in Advanced Breast Cancer. New England Journal of Medicine 2016;375:1925-36

      (3) Turner NC, Slamon DJ, Ro J, Bondarenko I, Im S-A, Masuda N, et al. Overall Survival with Palbociclib and Fulvestrant in Advanced Breast Cancer. New England Journal of Medicine 2018;379:1926-36

      (4) Dickler MN, Tolaney SM, Rugo HS, Cortés J, Diéras V, Patt D, et al. MONARCH 1, A Phase II Study of Abemaciclib, a CDK4 and CDK6 Inhibitor, as a Single Agent, in Patients with Refractory HR(+)/HER2(-) Metastatic Breast Cancer. Clin Cancer Res 2017;23:5218-24

      (5) Johnston S, Martin M, Di Leo A, Im S-A, Awada A, Forrester T, et al. MONARCH 3 final PFS: a randomized study of abemaciclib as initial therapy for advanced breast cancer. npj Breast Cancer 2019;5:5

      (6) Hortobagyi GN, Stemmer SM, Burris HA, Yap Y-S, Sonke GS, Hart L, et al. Overall Survival with Ribociclib plus Letrozole in Advanced Breast Cancer. New England Journal of Medicine 2022;386:942-50

      (7) Slamon DJ, Neven P, Chia S, Fasching PA, De Laurentiis M, Im S-A, et al. Overall Survival with Ribociclib plus Fulvestrant in Advanced Breast Cancer. New England Journal of Medicine 2019;382:514-24

      (8) Im S-A, Lu Y-S, Bardia A, Harbeck N, Colleoni M, Franke F, et al. Overall Survival with Ribociclib plus Endocrine Therapy in Breast Cancer. New England Journal of Medicine 2019;381:307-16

      (9) Pandey K, Park N, Park KS, Hur J, Cho YB, Kang M, et al. Combined CDK2 and CDK4/6 Inhibition Overcomes Palbociclib Resistance in Breast Cancer by Enhancing Senescence. Cancers (Basel) 2020;12

      (10) Freeman-Cook K, Hoffman RL, Miller N, Almaden J, Chionis J, Zhang Q, et al. Expanding control of the tumor cell cycle with a CDK2/4/6 inhibitor. Cancer Cell 2021;39:1404-21 e11

      (11) Dietrich C, Trub A, Ahn A, Taylor M, Ambani K, Chan KT, et al. INX-315, a selective CDK2 inhibitor, induces cell cycle arrest and senescence in solid tumors. Cancer Discov 2023

      (12) Al-Qasem AJ, Alves CL, Ehmsen S, Tuttolomondo M, Terp MG, Johansen LE, et al. Co-targeting CDK2 and CDK4/6 overcomes resistance to aromatase and CDK4/6 inhibitors in ER+ breast cancer. NPJ Precis Oncol 2022;6:68

      (13) Kudo R, Safonov A, Jones C, Moiso E, Dry JR, Shao H, et al. Long-term breast cancer response to CDK4/6 inhibition defined by TP53-mediated geroconversion. Cancer Cell 2024

      (14) Arora M, Moser J, Hoffman TE, Watts LP, Min M, Musteanu M, et al. Rapid adaptation to CDK2 inhibition exposes intrinsic cell-cycle plasticity. Cell 2023;186:2628-43 e21

      (15) Kumarasamy V, Wang J, Roti M, Wan Y, Dommer AP, Rosenheck H, et al. Discrete vulnerability to pharmacological CDK2 inhibition is governed by heterogeneity of the cancer cell cycle. Nature Communications 2025;16:1476

      (16) Dommer AP, Kumarasamy V, Wang J, O'Connor TN, Roti M, Mahan S, et al. Tumor Suppressors Condition Differential Responses to the Selective CDK2 Inhibitor BLU-222. Cancer Res 2025

      (17) Johnson DG, Ohtani K, Nevins JR. Autoregulatory control of E2F1 expression in response to positive and negative regulators of cell cycle progression. Genes & Development 1994;8:1514-25

      (18) Chung M, Liu C, Yang HW, Koberlin MS, Cappell SD, Meyer T. Transient Hysteresis in CDK4/6 Activity Underlies Passage of the Restriction Point in G1. Mol Cell 2019;76:562-73 e4

      (19) Kim S, Leong A, Kim M, Yang HW. CDK4/6 initiates Rb inactivation and CDK2 activity coordinates cell-cycle commitment and G1/S transition. Sci Rep 2022;12:16810

      (20) Yang HW, Chung M, Kudo T, Meyer T. Competing memories of mitogen and p53 signalling control cell-cycle entry. Nature 2017;549:404-8

      (21) Yang C, Li Z, Bhatt T, Dickler M, Giri D, Scaltriti M, et al. Acquired CDK6 amplification promotes breast cancer resistance to CDK4/6 inhibitors and loss of ER signaling and dependence. Oncogene 2017;36:2255-64

      (22) Li Q, Jiang B, Guo J, Shao H, Del Priore IS, Chang Q, et al. INK4 Tumor Suppressor Proteins Mediate Resistance to CDK4/6 Kinase Inhibitors. Cancer Discov 2022;12:356-71

      (23) Ji W, Zhang W, Wang X, Shi Y, Yang F, Xie H, et al. c-myc regulates the sensitivity of breast cancer cells to palbociclib via c-myc/miR-29b-3p/CDK6 axis. Cell Death & Disease 2020;11:760

      (24) Wu X, Yang X, Xiong Y, Li R, Ito T, Ahmed TA, et al. Distinct CDK6 complexes determine tumor cell response to CDK4/6 inhibitors and degraders. Nature Cancer 2021;2:429-43

      (25) Kim S, Son E, Park HR, Kim M, Yang HW. Dual targeting CDK4/6 and CDK7 augments tumor response and anti-tumor immunity in breast cancer models. J Clin Invest 2025

      (26) Ravani LV, Calomeni P, Vilbert M, Madeira T, Wang M, Deng D, et al. Efficacy of Subsequent Treatments After Disease Progression on CDK4/6 Inhibitors in Patients With Hormone Receptor-Positive Advanced Breast Cancer. JCO Oncol Pract 2025;21:832-42

      (27) Martin JM, Handorf EA, Montero AJ, Goldstein LJ. Systemic Therapies Following Progression on Firstline CDK4/6-inhibitor Treatment: Analysis of Real-world Data. Oncologist 2022;27:441-6

      (28) Kalinsky K, Bianchini G, Hamilton E, Graff SL, Park KH, Jeselsohn R, et al. Abemaciclib Plus Fulvestrant in Advanced Breast Cancer After Progression on CDK4/6 Inhibition: Results From the Phase III postMONARCH Trial. J Clin Oncol 2025;43:1101-12

    1. Do you have your magazine?

      1. Yes, I do. I have my magazine. 2. Yes, I like my room. 3. Yes, he talks with his friend. 4. Yes, she phones her doctor. 5. Yes, we have new shirts. 6. Yes, she lives with her cousin. 7. Yes, I always come by bus. 8. Yes, we have lunch with my aunt every day.

    1. to continually navigate in intellectual spaces where the issue is one of both ‘thinking globally’ and examining processes of individual and collective subjectivation and de-subjectivation

      But how?

    1. The Empire Within: Postcolonial Thought and Political Activism in Sixties Montreal by Sean Mills Call Number: F1054.5 M8 M44 ISBN: 0773536957 Publication Date: 2010-03-25 Generations of intellectuals have debated Canada’s national question. Rather than join the debate, Multicultural Nationalism challenges its logic. The national question is self-defeating: attempts to constitute a Canadian political community generate polarizing and depoliticizing deliberations. The Ethics of Cultural Appropriation by Conrad G. Brunk (Editor); James O. Young (Editor) Call Number: GN496 E74 ISBN: 1444350838 Publication Date: 2012-02-14 A comprehensive and systematic investigation of the moral and aesthetic questions that arise from the practice of cultural appropriation. Living as Form by Nato Thompson (Editor) Call Number: N72.5 P6 L47 ISBN: 0262017342 Publication Date: 2012-02-17 Over the past twenty years, an abundance of art forms have emerged that use aesthetics to affect social dynamics. These works are often produced by collectives or come out of a community context; they emphasize participation, dialogue, and action, and appear in situations ranging from theater to activism to urban planning to visual art to health care. Rethinking a Lot by Eran Ben-Joseph Call Number: TL175 B35 ISBN: 0262017334 Publication Date: 2012-02-17 There are an estimated 600,000,000 passenger cars in the world, and that number is increasing every day. So too is Earth's supply of parking spaces. In some cities, parking lots cover more than one-third of the metropolitan footprint. It's official: we have paved paradise and put up a parking lot.

      out of date

    1. HERO IMAGE & GALLERY

      Facts: • 75% of consumers base their decision on photos • Lifestyle photos increase conversion by 25-50% • Without lifestyle photos: “This is a product”; with lifestyle photos: “This could be me” What the gallery must include (minimum):

    1. We live in a current moment where, to get things done, we have to deploy terms in ways that capture the imagination of decision makers and the public in ways that affect change

      This connects to my Dante topic because medieval manuscripts also had to be presented in ways that grabbed attention. The way a manuscript looked could influence how seriously people took the text. So marketing knowledge is not just digital, it existed back then too.

    1. While tempting to store meaningful information in formatting like color codes or bolded text, this is a very bad idea. Formatting gets easily broken between software versions and applications.

      This reminds me of Dante manuscripts because from what I have been learning so far, the pages can be really decorative and visually interesting, but that does not automatically make them easy to analyze. Good looking does not always mean simple to understand.

    2. There’s no perfect format choice that applies to every project, but there are some trade offs to keep in mind.

      This connects to my Dante manuscript topic because there is no perfect manuscript copy either. Each copy of Dante is different and each version has trade offs. There were choices about spelling, commentary, and what sources or references to include. Choosing a database format today is basically the modern version of choosing a manuscript format back then.

    1. Digital archaeology resists the (digital) neo-colonialism of Google, Facebook, and similar tech giants that typically promote disciplinary silos and closed code and data repositories.

      This quote highlights an important ethical dimension of digital archaeology, showing how open access and collaborative tools challenge corporate control over knowledge. It aligns with the open GIS platforms and open data policy that I will use for my projects.

    1. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study provides a valuable contribution to understanding how negative affect influences food-choice decision making in bulimia nervosa, using a mechanistic approach with a drift diffusion model (DDM) to examine the weighting of tastiness and healthiness attributes. The solid evidence is supported by a robust crossover design and rigorous statistical methods, although concerns about low trial counts, possible overfitting, and the absence of temporally aligned binge-eating measures limit the strength of causal claims. Addressing modeling transparency, sample size limitations, and the specificity of mood induction effects, would enhance the study's impact and generalizability to broader populations.

      We thank the Editor and Reviewers for their summary of the strengths of our study, and for their thoughtful review and feedback on our manuscript. We apologize for the confusion in how we described the multiple steps performed to ensure that the hierarchical model reported in the main text was the best fit for the data but was not overfitted. Regarding “model transparency,” as described in our response to Reviewer 1 below, we have now more clearly explained (with references) that the use of hierarchical estimation procedures allows for information sharing across participants, which improves the reliability and stability of parameter estimates—even when the number of trials per individual is small. We have clarified for the less familiar reader how our Bayesian model selection criterion penalizes models with more parameters (e.g., more complex models).

      Details about model diagnostics, recoverability, and posterior predictive checks are all provided in the Supplementary Materials. We have clarified how these steps ensure that the parameters we estimate are identifiable and interpretable, while confirming that the model can reproduce key patterns in the data, ultimately supporting the validity of the winning model. Additionally, we have provided all scripts for estimating the models by linking to our public Github repository. Furthermore, we have edited language throughout to eliminate any implication of causal claims and acknowledged the limitation of the small sample size. Given these efforts, we are concerned that the current wording about “modeling transparency” in the public eLife Assessment may inadvertently misrepresent the modeling practices in our paper. Would it be possible to revise or remove that particular phrase to better reflect the steps we have taken? We believe this would help avoid confusion for readers.

      We have also taken additional steps to ensure that we have used “appropriate and validated methodology in line with current state-of-the-art," and we have added references to recent papers supporting our approaches.

      All changes in the revised text are marked in blue.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the drift diffusion model (DDM) introduced by Ratcliff and McKoon in 2008, the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options between individuals with bulimia nervosa (BN) and healthy participants.

      (2) The weighting of the perceived healthiness and tastiness of these options.

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has the potential to improve the understanding of pathological food choices. The article is based on secondary research data.

      Weaknesses:

      I have two major concerns and a major improvement point.

      The major concerns deal with the reliability of the results of the DDM (first two sections of the Results, pages 6 and 7), which are central to the manuscript, and the consistency of the results with regards to the identification of mechanisms related to binge eating in BN patients (i.e. last section of the results, page 7).

      (1) Ratcliff and McKoon in 2008 used tasks involving around 1000 trials per participant. The Chen et al. experiment the authors refer to involves around 400 trials per participant. On the other hand, Shevlin and colleagues ask each participant to make two sets of 42 choices with two times fewer participants than in the Chen et al. experiment. Shevlin and colleagues also fit a DDM with additional parameters (e.g. a drift rate that varies according to subjective rating of the options) as compared to the initial version of Ratcliff and McKoon. With regards to the number of parameters estimated in the DDM within each group of participants and each emotional condition, the 5- to 10-fold ratio in the number of trials between the Shevlin and colleagues' experiment and the experiments they refer to (Ratcliff and McKoon, 2008; Chen et al. 2022) raises serious concerns about a potential overfitting of the data by the DDM. This point is not highlighted in the Discussion. Robustness and sensitivity analyses are critical in this case.

      We thank the Reviewer for their thoughtful critique. We agree that a limited number of trials can impede reliable estimation, which we acknowledge in the Discussion section. However, we used a hierarchical estimation approach, which leverages group information to constrain individual-level estimates. This use of group-level parameters to inform individual-level estimates reduces overfitting and noise that can arise when trial counts are low, and the regularization inherent in hierarchical fitting prevents extreme parameter estimates that could arise from noisy or limited data (Rouder & Lu, 2005). As a result, hierarchical estimation has been repeatedly shown to work well in settings with low trial counts, including as few as 40 trials per condition (Lerche et al., 2017; Ratcliff & Childers, 2015; Wiecki et al., 2013). In addition, previous applications of the time-varying DDM to food choice task data have included experiments with as few as 60 trials per condition (Maier et al., 2020). We have added references to these more recent approaches and specifically note their advantages for the modeling of tasks with fewer trials. Finally, our successful parameter recovery described in the Supplementary Materials supports the robustness of the estimation procedure and the reliability of our results.
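
      The shrinkage argument above can be illustrated with a minimal sketch. It is not the authors' hierarchical DDM: it assumes a much simpler normal-normal setup in which each participant has one parameter estimated from 42 noisy trials, and it uses an empirical-Bayes weight to pull each individual estimate toward the group mean. The function name `partial_pooling`, the noise level, and the simulated values are all hypothetical.

```python
import numpy as np

def partial_pooling(subject_means, n_trials, noise_sd):
    """Empirical-Bayes shrinkage of per-subject means toward the group mean."""
    subject_means = np.asarray(subject_means, dtype=float)
    se2 = noise_sd**2 / n_trials                 # sampling variance of each subject mean
    group_mean = subject_means.mean()
    # Between-subject variance: observed variance minus sampling variance, floored at 0
    tau2 = max(subject_means.var(ddof=1) - se2, 0.0)
    weight = tau2 / (tau2 + se2)                 # 0 = complete pooling, 1 = no pooling
    shrunk = group_mean + weight * (subject_means - group_mean)
    return shrunk, weight

# Hypothetical data: 25 participants, 42 trials each (values are illustrative only)
rng = np.random.default_rng(0)
n_subjects, n_trials, noise_sd = 25, 42, 2.0
true_params = rng.normal(1.0, 0.3, size=n_subjects)
trial_data = rng.normal(true_params[:, None], noise_sd, size=(n_subjects, n_trials))

unpooled = trial_data.mean(axis=1)               # each participant estimated alone
pooled, weight = partial_pooling(unpooled, n_trials, noise_sd)

rmse_unpooled = np.sqrt(np.mean((unpooled - true_params) ** 2))
rmse_pooled = np.sqrt(np.mean((pooled - true_params) ** 2))
print(f"weight={weight:.2f}  RMSE unpooled={rmse_unpooled:.3f}  RMSE pooled={rmse_pooled:.3f}")
```

      The weight falls as the trial count drops, so participants with little data are pulled more strongly toward the group mean; this is the regularization the response refers to, shown here in its simplest closed form rather than through full Bayesian sampling.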

      The authors compare different DDMs to show that the DDM they used to report statistical results in the main text is the best according to the WAIC criterion. This may be viewed as a robustness analysis. However, the other DDM models (i.e. M0, M1, M2 in the supplementary materials) they used to make the comparison have fewer parameters to estimate than the one they used in the main text. Fits are usually expected to follow the rule that the more there are parameters to estimate in a model, the better it fits the data. Additionally, a quick plot of the data in supplementary table S12 (i.e. WAIC as a function of the number of parameters varying by food type in the model - i.e. 0 for M0, 2 for M1, 1 for M2 and 3 for M3) suggests that models M1 and potentially M2 may be also suitable: there is a break in the improvement of WAIC between model M0 and the three other models. I would thus suggest checking how the results reported in the main text differ when using models M1 and M2 instead of M3 (for the taste and health weights when comparing M3 with M1, for τS when comparing M3 with M2). If the differences are important, the results currently reported in the main text are not very reliable.

      We thank the Reviewer for highlighting that it would be helpful to explicitly note that we specifically selected WAIC as one of two methods to assess model fit because it penalizes for model complexity. We now explicitly state that, in addition to being more robust than other metrics like AIC or BIC when comparing hierarchical Bayesian models like those in the current study, model fit metrics like WAIC penalize for model complexity based on the number of parameters (Watanabe, 2010). Therefore, more complex models (i.e., those with more parameters) do not automatically have lower WAIC. Additionally, we now more clearly note that our second method to assess model fit, posterior predictive checks, demonstrates that only model M3 can reproduce key behavioral patterns present in the empirical data. As described in the Supplementary Materials, M1 and M2 miss key patterns in the data. In summary, we used best practices to assess model fit and reliability (Wilson & Collins, 2019): results from the WAIC comparison (which penalizes models with more parameters) and results from posterior predictive checks align in showing that M3 provided the best fit to our data. We have added a sentence to the manuscript to state this explicitly.
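
      To make the complexity penalty concrete, the following is a minimal sketch of how WAIC is commonly computed from a matrix of pointwise log-likelihoods (posterior draws by observations), using the variance-based penalty. Whether the authors computed WAIC exactly this way is an assumption; the function name `waic` and the toy data are hypothetical.

```python
import numpy as np

def waic(log_lik):
    """Return (WAIC on the deviance scale, lppd, p_waic).

    log_lik: array of shape (n_posterior_draws, n_observations) holding the
    log-likelihood of each observation under each posterior draw.
    """
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density, using a numerically stable log-mean-exp
    max_ll = log_lik.max(axis=0)
    lppd_i = max_ll + np.log(np.exp(log_lik - max_ll).sum(axis=0)) - np.log(n_draws)
    lppd = lppd_i.sum()
    # Effective number of parameters: posterior variance of the log-likelihood
    p_waic = log_lik.var(axis=0, ddof=1).sum()
    return -2.0 * (lppd - p_waic), lppd, p_waic

# Toy example: normal data with known unit variance and a flat prior on the mean
rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=84)
mu_draws = rng.normal(y.mean(), 1.0 / np.sqrt(y.size), size=2000)
log_lik = -0.5 * np.log(2 * np.pi) - 0.5 * (y[None, :] - mu_draws[:, None]) ** 2

waic_value, lppd, p_waic = waic(log_lik)
print(f"WAIC={waic_value:.1f}  lppd={lppd:.1f}  p_waic={p_waic:.2f}")
```

      Because `p_waic` is subtracted from the fit term, a model with more free parameters only obtains a lower WAIC if its gain in predictive density outweighs the added posterior variance.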

      (2) The second main concern deals with the association reported between the DDM parameters and binge eating episodes (i.e. last paragraph of the results section, page 7). The authors claim that the DDM parameters "predict" binge eating episodes (in the Abstract among other places) while the binge eating frequency does not seem to have been collected prospectively. Besides this methodological issue, the interpretation of this association is exaggerated: during the task, BN patients did not make binge-related food choices in the negative emotional state. Therefore, it is impossible to draw clear conclusions about binge eating, as other explanations seem equally plausible. For example, the results the authors report with the DDM may be a marker of a strategy of the patients to cope with food tastiness in order to make restrictive-like food choices. A comparison of the authors' results with restrictive AN patients would be of interest. Moreover, correlating results of a nearly instantaneous behavior (i.e. a couple of minutes to perform the task with the 42 food choices) with an observation made over several months (i.e. binge eating frequency collected over three months) is questionable: the negative emotional state of patients varies across the day without systematically leading patients to engage in a binge eating episode in such states.

      I would suggest in such an experiment to collect the binge craving elicited by each food and the overall binge craving of patients immediately before and after the task. Correlating the DDM results with these ratings would provide more compelling results. Without these data, I would suggest removing the last paragraph of the Results.

      We thank the Reviewer for these interesting and important suggestions, and we agree that claims about causal connections between our decision parameters and symptom severity metrics would be inappropriate. Per the Reviewer’s suggestions, we have eliminated the use of the word “predict” to describe the tested association with symptom metrics. We also agree that more time-locked associations with craving ratings and near-instantaneous behavior would be useful, and we have added this as an important direction for future research in the discussion. However, associating task-based behavior with validated self-report measures that assess symptom severity over long periods of time that precede the task visit (e.g., over the past 2 weeks in depression, over the past month in eating disorders) is common practice in computational psychiatry, psychiatric neuroimaging, and clinical cognitive neuroscience (Hauser et al., 2022; Huys et al., 2021; Wise et al., 2023), and this approach has been used several times specifically with food choice tasks (Dalton et al., 2020; Steinglass et al., 2015). We have revised the language throughout the manuscript to clarify: the results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic, but the results do not allow us to determine whether this reactivity causes the symptoms.

      In response to this Reviewer’s important point about negative affect not always producing loss-of-control eating in individuals with BN, we now explicitly note that while several studies employing ecological momentary assessments (EMA) have repeatedly shown that increases in negative affect significantly increase the likelihood of subsequent loss-of-control eating (Alpers & Tuschen-Caffier, 2001; Berg et al., 2013; Haedt-Matt & Keel, 2011; Hilbert & Tuschen-Caffier, 2007; Smyth et al., 2007), not all loss-of-control eating occurs in the context of negative affect. We further note that future studies should integrate food choice task data pre and post-affect inductions with measures capturing the specific frequency of loss of control eating episodes that occur during states of high negative affect.

      (3) My major improvement point is to tone down as much as possible any claim of a link with binge eating across the entire manuscript and to focus more on the restrictive behavior of BN patients in between binge eating episodes (see my second major concern about the methods). Additionally, since this article is a secondary research paper and since some of the authors have already used the task with AN patients, if possible I would run the same analyses with AN patients to test whether there are differences between AN (provided they were of the restrictive subtype) and BN.

      We appreciate the Reviewer’s very helpful suggestions. We have adjusted our language linking loss-of-control eating frequency with decision parameters, and we have added sentences focusing on the implications for the restrictive behavior of patients with BN between binge eating episodes. In the Supplementary Materials, we have added an analysis of the restraint subscale of the EDE-Q and confirmed no relationship with parameters of interest. While we agree additional analyses with AN patients would be of interest, this is outside the scope of the paper. Our team have collected data from individuals with AN using this task, but not with any affect induction or measure of affect. Therefore, we have added this important direction for future research to the discussion.

      Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant and methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      The sample size was relatively small and may have been underpowered to find differences in outcomes (i.e., food choice behaviors). Participants were all women with BN, which limits the generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. Moreover, it is unclear how long the negative affect persisted during the actual task. It is possible that any increases in negative affect would have dissipated by the time participants were engaged in the decision-making task.

      We thank the Reviewer for their comments on the strengths of the paper, and for highlighting these important considerations regarding the sample demographics and the negative affect induction. As in the original paper that focused only on ultimate food choice behaviors, we now specifically acknowledge that the study was only powered to detect small to medium group differences in the effect of negative emotion on these final choice behaviors.

      Regarding the sample demographics, we agree that the study’s inclusion of only female participants is a limitation. Although the original decision for this sampling strategy was informed by data suggesting that bulimia nervosa is roughly six times more prevalent among females than males (Udo & Grilo, 2018), we now note in the discussion that our female-only sample limits the generalizability of the findings.

      We also agree with the Reviewer’s noted limitations of the negative mood induction, and based on the reviewer’s suggestions, we have expanded our original description of these limitations in the Discussion. Specifically, we now note that although the task was completed immediately after the affect induction, the study did not include intermittent mood assessments throughout the choice task, so it is unclear how long the negative affect persisted during the actual task.

      Reviewer #3 (Public review):

      Summary:

      The study uses the food choice task, a well-established method in eating disorder research, particularly in anorexia nervosa. However, it introduces a novel analytical approach - the diffusion decision model - to deconstruct food choices and assess the influence of negative affect on how and when tastiness and healthiness are considered in decision-making among individuals with bulimia nervosa and healthy controls.

      Strengths:

      The introduction provides a comprehensive review of the literature, and the study design appears robust. It incorporates separate sessions for neutral and negative affect conditions and counterbalances tastiness and healthiness ratings. The statistical methods are rigorous, employing multiple testing corrections.

      A key finding - that negative affect induction biases individuals with bulimia nervosa toward prioritizing tastiness over healthiness - offers an intriguing perspective on how negative affect may drive binge eating behaviors.

      Weaknesses:

      A notable limitation is the absence of a sample size calculation, which, combined with the relatively small sample, may have contributed to null findings. Additionally, while the affect induction method is validated, it is less effective than alternatives such as image or film-based stimuli (Dana et al., 2020), potentially influencing the results.

      We agree that the limited sample size and specific affect induction method may have contributed to the null model-agnostic behavioral findings. Based on this Reviewer’s and Reviewer 2’s comments, we have added these factors to our acknowledgements of limitations in the discussion.

      Another concern is the lack of clarity regarding which specific negative emotions were elicited. This is crucial, as research suggests that certain emotions, such as guilt, are more strongly linked to binge eating than others. Furthermore, recent studies indicate that negative affect can lead to both restriction and binge eating, depending on factors like negative urgency and craving (Leenaerts et al., 2023; Wonderlich et al., 2024). The study does not address this, though it could explain why, despite the observed bias toward tastiness, negative affect did not significantly impact food choices.

      We thank the Reviewer for raising these important points and possibilities. In the Supplementary Materials, we have added an additional analysis of the specific POMS subscales that comprise the total negative affect calculation that was reported in the original paper (Gianini et al., 2019). We also report total negative affect scores from the POMS in the main text. Ultimately, we found that, across both groups, the negative affect induction increased responses related to anger, confusion, depression, and tension while reducing vigor.

      We agree with the Reviewer that factors like negative urgency and cravings are relevant here. The study did not collect any measures of craving, and in response to Reviewer 1 and this Reviewer, we now note in the discussion that replication studies including momentary craving assessments will be important. While we do not have any measurements of cravings, we did measure negative urgency. The original paper (Gianini et al., 2019) did not find that negative urgency was related to restrictive food choices. We have now repeated those analyses, and we also were unable to find any meaningful patterns related to negative urgency. Nonetheless, we have added an analysis of negative urgency scores and decision parameters to the Supplementary Materials.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Please improve the description of the computational methods: the fit of the DDM, the difference between the models used in the DDM, and the difference between the DDM model and the models used in the linear mixed models (the word "model" is at the end confusing as it may refer either to the DDM or to the statistical analysis of the DDM parameters).

      We thank the Reviewer for highlighting the unclear language. We have updated the main text to clarify when the term “model” refers to the DDM itself versus the regression models assessing DDM parameters. As described above, we have clarified that both tests of model fit (WAIC and posterior predictive checks) suggest that Model 3 was the best fit to the data. We have also clarified the differences between the tested models in the Supplementary Materials.

      Please avoid reporting estimates of main effects in statistical models when an interaction is included: the estimates of the main effects may be heavily biased by the interaction term (this can be checked by re-running the model without the interaction term).

      We sincerely appreciate the Reviewer’s comment regarding the interpretation of main effects in the presence of significant interaction terms. In the revised manuscript, we no longer discuss significant main effects and instead focus on interpreting the interaction terms.

      Additionally, to help unpack interaction effects, we now include exploratory simple effects analyses in the supplementary materials. Simple effects analyses allow us to examine the effects of one independent variable at specific values of other independent variables (Aiken et al., 1991; Brambor et al., 2006; Jaccard & Turrisi, 2003; Winer et al., 1991).
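
      A minimal sketch of what a simple effects analysis computes is given below. It assumes an ordinary least-squares regression with dummy-coded group and condition, and it ignores the random effects of the mixed models used in the paper; the variable names, coding, and simulated effect sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
group = rng.integers(0, 2, size=n)        # hypothetical coding: 0 = control, 1 = BN
condition = rng.integers(0, 2, size=n)    # hypothetical coding: 0 = neutral, 1 = negative
y = 0.2 + 0.1 * group + 0.5 * group * condition + rng.normal(0.0, 1.0, size=n)

# Design matrix: intercept, group, condition, group x condition
X = np.column_stack([np.ones(n), group, condition, group * condition])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b_group, b_cond, b_inter = beta

def simple_effect_of_condition(g):
    """Effect of condition (negative vs. neutral) at a fixed level g of group."""
    return b_cond + b_inter * g

for g, label in [(0, "control"), (1, "BN")]:
    print(f"{label}: effect of condition = {simple_effect_of_condition(g):.3f}")
```

      With this coding, `b_cond` on its own is the condition effect in the reference group only, not an average effect across groups, which is why lower-order coefficients should not be read as main effects once the interaction term is in the model.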

      Supplementary tables S5 and S6 are excessive: there is no third-level interaction (supplementary tables S3 and S4) to justify a split between BN and healthy participants. Please perform rather a descending regression. Accordingly, the results reported in the second paragraph of page 7 should be entirely rewritten.

      We agree with the Reviewer’s suggestion that these tables are unnecessary. We have updated them to include details about simple effects analyses described above. We have revised the main text to reflect these changes.

      The words such as "predictive" indicating a causality link is used in several places in the manuscript including the supplementary materials while the experimental design does not allow such claims. This should be rephrased.

      We agree with the Reviewer that the term “predicted” in the main text improperly suggested a causal relationship between symptom severity and DDM parameters that our methods cannot evaluate. We have updated the main text with more appropriate language. However, our use of the term “predicted” in the Supplementary Materials refers to predicting the probability of a choice based on trial-level features which is standard use of the term in the computational cognitive modeling literature (Piray et al., 2019; Wilson & Collins, 2019; Zhang et al., 2020).

      The word "evaluated" appears twice in line 42 of the supplementary materials. Same with "in" at line 50.

      Thank you very much for highlighting this. We have removed the repeated words.

      Reviewer #2 (Recommendations for the authors):

      (1) I think it would be helpful if the authors noted in the Methods how long the food-choice task took. Prior research has suggested that in-lab mood inductions are very short-lasting (e.g., max 7 minutes) and it is likely that the task itself may have impacted the mood states of participants. Expanding on this in the Discussion/limitations seems important.

      The Reviewer raises an important point regarding the duration of our affect manipulation. Since we did not measure mood during or after the Food Choice Task, we cannot determine how long these effects persisted. We have added this limitation to the discussion section, noting that the absence of continuous affect measures following mood induction is a widespread limitation in the field.

      (2) Personally, I was a bit confused about what data the researchers were using to extrapolate information on whether or not participants were considering healthiness or tastiness. How was this operationalized? Is this an assumption being made based on how quickly someone chose a low-fat vs. high-fat food?

      We thank this Reviewer for highlighting that our models’ complexity warrants a more thorough explanation.

      Since we collected tastiness and healthiness attribute ratings during the first phase of the Food Choice Task, we can use those values to determine how these attributes influence decision-making. Independently, foods were classified as low-fat or high-fat based on their objective properties (i.e., the percentage of calories from fat). However, the primary data we used to estimate model parameters were participants’ attribute ratings, choices, and response times.

      In these models, the drift rate parameter captures the speed and direction of evidence accumulation. As the unsigned magnitude of the drift rate increases, the decision-maker is making up their mind more quickly. Once the evidence accumulates to a response boundary, the option associated with that boundary is selected. A positive drift rate means they are moving toward choosing one option (i.e., upper boundary), and a negative drift rate means they are moving toward choosing the other (i.e., lower boundary). In these decisions, decision-makers often consider multiple attributes, such as perceived healthiness and tastiness. Each of these attributes can influence the evidence accumulation process with different strengths, or weights.

      In addition, decision-makers do not consider all attributes at the same time. Inspired by earlier work on multi-attribute decision-making (Maier et al., 2020; Sullivan & Huettel, 2021), our modeling approach computes a parameter (i.e., relative attribute onset) which captures the time delay between when each attribute starts influencing the evidence accumulation process. This parameter gives us a way to estimate when decision-makers are considering different attributes, and tells us how much influence each attribute has, because if the attribute starts late, it has less time to influence the decision. These models use a piecewise drift rate function to describe how evidence changes over time within a trial: sometimes the decision maker only considers taste, sometimes only health, and other times both. Importantly, models with a relative attribute onset parameter can produce key behavioral patterns observed in mouse-tracking studies that models without this parameter are unable to replicate (Maier et al., 2020).
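
      The piecewise drift rate described above can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the fitted model: the function names, parameter values, symmetric boundaries at ±`boundary`, the sign convention for `rel_onset`, and the omission of non-decision time and starting-point bias are all assumptions made for the example.

```python
import math
import random


def drift_at(t, w_taste, w_health, d_taste, d_health, rel_onset):
    """Piecewise drift rate at time t (seconds since accumulation began).

    Assumed convention: rel_onset > 0 means health enters rel_onset seconds
    after taste; rel_onset < 0 means taste enters |rel_onset| seconds after
    health. d_taste and d_health are attribute value differences between options.
    """
    taste_on = t >= max(0.0, -rel_onset)
    health_on = t >= max(0.0, rel_onset)
    drift = 0.0
    if taste_on:
        drift += w_taste * d_taste
    if health_on:
        drift += w_health * d_health
    return drift


def simulate_trial(w_taste, w_health, d_taste, d_health, rel_onset,
                   boundary=1.0, noise=1.0, dt=0.001, max_t=5.0, rng=None):
    """Euler simulation of one trial; returns (choice, decision_time)."""
    if rng is None:
        rng = random.Random()
    x, t = 0.0, 0.0
    while t < max_t:
        v = drift_at(t, w_taste, w_health, d_taste, d_health, rel_onset)
        x += v * dt + noise * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
        if x >= boundary:
            return "upper", t   # evidence reached the upper boundary
        if x <= -boundary:
            return "lower", t   # evidence reached the lower boundary
    return None, max_t          # no boundary reached within max_t


# Example: taste favors the upper option, health favors the lower option,
# and health starts influencing the process 0.3 s after taste.
choice, rt = simulate_trial(w_taste=1.0, w_health=0.5, d_taste=2.0,
                            d_health=-1.0, rel_onset=0.3,
                            rng=random.Random(1))
print(choice, round(rt, 3))
```

      Because an attribute that enters late contributes to the drift for a shorter portion of the trial, the onset parameter changes that attribute's effective influence even when its weight is held fixed.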

      In summary, the computational model describes decision-makers’ behaviors (what they would choose, and how fast they would choose) using different potential values of the drift weights and relative start time parameters. We then used Bayesian estimation methods to compare the model's predictions to the actual data. By examining how reaction times and choices change depending on the attribute values of the presented options, the model allows us to infer when each attribute is considered, and how strongly it influences the final choice.

      We have clarified this in the main text.

      Reviewer #3 (Recommendations for the authors):

      I wonder whether there were any measures concerning negative affect before and after the mood induction? This would make it clearer whether there was a significant change before and after. If different emotions were assessed, which emotion showed the strongest change?

      We thank the Reviewer for flagging this point. We realize that the main text did not make it clear that mood was assessed before and after the mood induction using the POMS (McNair et al., 1989). While these analyses were conducted and the results were reported in the original manuscript (Gianini et al., 2019), we now report them in the main text for completeness. Additionally, we added more details about how specific emotions changed by analyzing the subscales of the POMS in the Supplementary Materials. As mentioned above, we found that, across both groups, the negative affect induction increased responses related to anger, confusion, depression, and tension while reducing vigor.

      Thank you again for your consideration and for the reviewers’ comments and suggestions. We believe their incorporation has significantly strengthened the paper. In addition, thank you for the opportunity to publish our work in eLife. We look forward to hearing your response.

      References

      Aiken, L. S., West, S. G., & Reno, R. R. (1991). Multiple regression: Testing and interpreting interactions. Sage Publications, Inc.

      Alpers, G. W., & Tuschen-Caffier, B. (2001). Negative feelings and the desire to eat in bulimia nervosa. Eating Behaviors, 2(4), 339–352. https://doi.org/10.1016/S1471-0153(01)00040-X

      Berg, K. C., Crosby, R. D., Cao, L., Peterson, C. B., Engel, S. G., Mitchell, J. E., & Wonderlich, S. A. (2013). Facets of negative affect prior to and following binge-only, purge-only, and binge/purge events in women with bulimia nervosa. Journal of Abnormal Psychology, 122(1), 111–118. https://doi.org/10.1037/a0029703

      Brambor, T., Clark, W. R., & Golder, M. (2006). Understanding Interaction Models: Improving Empirical Analyses. Political Analysis, 14(1), 63–82. https://doi.org/10.1093/pan/mpi014

      Dalton, B., Foerde, K., Bartholdy, S., McClelland, J., Kekic, M., Grycuk, L., Campbell, I. C., Schmidt, U., & Steinglass, J. E. (2020). The effect of repetitive transcranial magnetic stimulation on food choice-related self-control in patients with severe, enduring anorexia nervosa. International Journal of Eating Disorders, 53(8), 1326–1336. https://doi.org/10.1002/eat.23267

      Gianini, L., Foerde, K., Walsh, B. T., Riegel, M., Broft, A., & Steinglass, J. E. (2019). Negative affect, dietary restriction, and food choice in bulimia nervosa. Eating Behaviors, 33, 49–54. https://doi.org/10.1016/j.eatbeh.2019.03.003

      Haedt-Matt, A. A., & Keel, P. K. (2011). Revisiting the affect regulation model of binge eating: A meta-analysis of studies using ecological momentary assessment. Psychological Bulletin, 137(4), 660–681. https://doi.org/10.1037/a0023660

      Hauser, T. U., Skvortsova, V., Choudhury, M. D., & Koutsouleris, N. (2022). The promise of a model-based psychiatry: Building computational models of mental ill health. The Lancet Digital Health, 4(11), e816–e828. https://doi.org/10.1016/S2589-7500(22)00152-2

      Hilbert, A., & Tuschen-Caffier, B. (2007). Maintenance of binge eating through negative mood: A naturalistic comparison of binge eating disorder and bulimia nervosa. International Journal of Eating Disorders, 40(6), 521–530. https://doi.org/10.1002/eat.20401

      Huys, Q. J. M., Browning, M., Paulus, M. P., & Frank, M. J. (2021). Advances in the computational understanding of mental illness. Neuropsychopharmacology, 46(1), 3–19. https://doi.org/10.1038/s41386-020-0746-4

      Jaccard, J., & Turrisi, R. (2003). Interaction effects in multiple regression (2nd ed.). Sage Publications, Inc.

      Lerche, V., Voss, A., & Nagler, M. (2017). How many trials are required for parameter estimation in diffusion modeling? A comparison of different optimization criteria. Behavior Research Methods, 49(2), 513–537. https://doi.org/10.3758/s13428-016-0740-2

      Maier, S. U., Raja Beharelle, A., Polanía, R., Ruff, C. C., & Hare, T. A. (2020). Dissociable mechanisms govern when and how strongly reward attributes affect decisions. Nature Human Behaviour, 4(9), Article 9. https://doi.org/10.1038/s41562-020-0893-y

      McNair, D., Lorr, M., & Droppleman, L. (1989). Profile of mood states (POMS).

      Piray, P., Dezfouli, A., Heskes, T., Frank, M. J., & Daw, N. D. (2019). Hierarchical Bayesian inference for concurrent model fitting and comparison for group studies. PLOS Computational Biology, 15(6), e1007043. https://doi.org/10.1371/journal.pcbi.1007043

      Ratcliff, R., & Childers, R. (2015). Individual differences and fitting methods for the two-choice diffusion model of decision making. Decision, 2(4), 237–279. https://doi.org/10.1037/dec0000030

      Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. https://doi.org/10.3758/BF03196750

      Smyth, J. M., Wonderlich, S. A., Heron, K. E., Sliwinski, M. J., Crosby, R. D., Mitchell, J. E., & Engel, S. G. (2007). Daily and momentary mood and stress are associated with binge eating and vomiting in bulimia nervosa patients in the natural environment. Journal of Consulting and Clinical Psychology, 75(4), 629–638. https://doi.org/10.1037/0022-006X.75.4.629

      Steinglass, J., Foerde, K., Kostro, K., Shohamy, D., & Walsh, B. T. (2015). Restrictive food intake as a choice—A paradigm for study. International Journal of Eating Disorders, 48(1), 59–66. https://doi.org/10.1002/eat.22345

      Sullivan, N., & Huettel, S. A. (2021). Healthful choices depend on the latency and rate of information accumulation. Nature Human Behaviour, 5(12), Article 12. https://doi.org/10.1038/s41562-021-01154-0

      Udo, T., & Grilo, C. M. (2018). Prevalence and Correlates of DSM-5–Defined Eating Disorders in a Nationally Representative Sample of U.S. Adults. Biological Psychiatry, 84(5), 345–354. https://doi.org/10.1016/j.biopsych.2018.03.014

      Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.

      Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7. https://doi.org/10.3389/fninf.2013.00014

      Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547. https://doi.org/10.7554/eLife.49547

      Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design (3rd ed.). McGraw-Hill.

      Wise, T., Robinson, O. J., & Gillan, C. M. (2023). Identifying Transdiagnostic Mechanisms in Mental Health Using Computational Factor Modeling. Biological Psychiatry, 93(8), 690–703. https://doi.org/10.1016/j.biopsych.2022.09.034

      Zhang, L., Lengersdorff, L., Mikus, N., Gläscher, J., & Lamm, C. (2020). Using reinforcement learning models in social neuroscience: Frameworks, pitfalls and suggestions of best practices. Social Cognitive and Affective Neuroscience, 15(6), 695–707. https://doi.org/10.1093/scan/nsaa089

    1. Symptom: Ghosting lines overlaying typed text:

      This is assuredly not a type slug cleaning issue; the giveaway is that the loops in letters like "a", "e", "o", etc. are clear. The lines are caused by the paper not being held against the platen, so when the slug hits, ink from other parts of the slug transfers to the paper. The remedy is to tuck the paper underneath the paper bail and its rollers.

      If the issue persists, check your manual to ensure that the ribbon is properly threaded, then check that the ribbon vibrator isn't bent too far from the typing point and too close to the platen, which would cause the ribbon to rub against the paper.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      In the article "Non-Invasive Mechanical-Functional Analysis of Individual Liver Mitochondria by Atomic Force Microscopy", O. Zorikova and colleagues propose the use of Atomic Force Microscopy (AFM) as a tool for characterizing the biophysical properties of individual mitochondria. By analyzing parameters such as height, membrane fluctuation power spectra, and Young's modulus under various drug treatments and genetic mutations, the authors aim to provide a novel, label-free method for assessing mitochondrial functionality.

      While the manuscript presents an interesting approach, the introduction would benefit from a clearer and more cohesive narrative. The authors highlight the need to monitor the function of individual mitochondria, which is indeed an important challenge, but the rationale for doing so should be more explicitly stated. A stronger emphasis on the biological importance of mitochondrial biophysical parameters and the added value of using AFM would enhance the motivation for the study. Additionally, the symbol Δψ, referring to mitochondrial membrane potential, should be defined and briefly explained in the introduction for clarity.

      In the results section, a schematic diagram of the experiment would aid comprehension, especially for readers less familiar with this technique. In general, in the figures it would be good to find the individual data points. The integration of the results into the main text could also be improved. Currently, several findings are presented in a descriptive manner, but the biological interpretation or relevance is not always clear. For example, the sentence "Figure 2 presents a comprehensive analysis of the height and elastic properties of mitochondria" could be expanded to explain what those findings actually mean and how they help support the main goal of the study. Similarly, the statement that "the integrated power of mitochondrial membrane fluctuations decreased significantly upon valinomycin treatment" is presented without explanation of what this metric represents or why valinomycin was chosen. When discussing MTH2, the authors refer to "mechanical alterations in mitochondria lacking this protein" without explaining what MTH2 is, where it is localized, or why it is biologically relevant.

      Finally, in the discussion, the interpretation of results could be expanded. For example, the statement "MKO/MLM exhibited increased integrated power/potential, increased modulus/stiffness, and decreased height" would benefit from more biological context - what do these changes imply about mitochondrial function or physiology? Adding this kind of interpretation would help the reader better understand the broader significance of the findings.

      Methods: The authors say they record the piezo movement, but it is not clear to the reviewer whether they perform a closed-loop force-feedback experiment. If so, this will introduce noise into the measurement, which can be avoided by performing an open-loop measurement. Why did the authors not record the cantilever fluctuation at a constant piezo height? This gives enough bandwidth and sufficiently low noise to record Angstrom-scale deflections. Likewise, it is unclear to this reviewer why the power spectrum is given in V and not in nm, as is typical in AFM measurements. I assume the authors calibrated the deflection sensitivity and spring constant of the cantilever; hence, if possible, the authors should convert the PSD into nm/Hz.
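
      The requested unit conversion can be sketched as follows, assuming the spectrum is stored as a power spectral density in V²/Hz and that a deflection sensitivity in nm/V has been calibrated. The function name and numerical values are hypothetical; an amplitude spectrum in V/√Hz would instead scale linearly with the sensitivity.

```python
def psd_volts_to_nm(psd_v2_per_hz, sensitivity_nm_per_v):
    """Convert a PSD from V^2/Hz to nm^2/Hz.

    Power scales with the square of the amplitude, so each PSD value is
    multiplied by the squared deflection sensitivity (nm/V)^2.
    """
    factor = sensitivity_nm_per_v ** 2
    return [p * factor for p in psd_v2_per_hz]


# Hypothetical PSD values (V^2/Hz) and a hypothetical sensitivity of 50 nm/V.
psd_nm = psd_volts_to_nm([1e-6, 4e-6], 50.0)
print(psd_nm)
```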

      During the elasticity measurements, did the authors correct for the finite thickness of the mitochondria? What were the contact force and indentation depth, and how thick were the mitochondria to begin with? If the indentation is larger than 20% of the thickness, I suggest performing a correction to account for the effectively infinite stiffness of the substrate. Given that the mitochondrial stiffness is in the tens of kPa, this seems to be important (perhaps not for relative values, but for absolute stiffness measurements).
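
      As a rough sketch of the check suggested above, the snippet below computes the indentation-to-thickness ratio and an uncorrected Hertz modulus. It assumes a spherical indenter and an incompressible sample (Poisson ratio 0.5); the tip geometry, function names, and all numbers are hypothetical and not taken from the manuscript.

```python
import math


def indentation_ratio(indentation_nm, height_nm):
    """Fraction of the sample thickness that is indented."""
    return indentation_nm / height_nm


def hertz_modulus_sphere_kpa(force_nn, indentation_nm, tip_radius_nm, poisson=0.5):
    """Young's modulus (kPa) from the Hertz model for a spherical indenter.

    F = (4/3) * E / (1 - nu^2) * sqrt(R) * delta^(3/2), solved for E.
    No finite-thickness (substrate) correction is applied here.
    """
    e_nn_per_nm2 = (0.75 * force_nn * (1.0 - poisson ** 2)
                    / (math.sqrt(tip_radius_nm) * indentation_nm ** 1.5))
    return e_nn_per_nm2 * 1e6  # 1 nN/nm^2 = 1e9 Pa = 1e6 kPa


# Hypothetical measurement: 1 nN force, 100 nm indentation, 100 nm tip radius,
# on an object that is 400 nm tall.
ratio = indentation_ratio(100.0, 400.0)
modulus = hertz_modulus_sphere_kpa(1.0, 100.0, 100.0)
needs_correction = ratio > 0.20  # threshold suggested in the comment above
print(ratio, round(modulus, 2), needs_correction)
```

      When the ratio exceeds the threshold, the uncorrected Hertz value overestimates the absolute modulus, because the rigid substrate contributes to the measured resistance.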

      Figures. The figures are well constructed and guide the reader through the important messages of the paper. The authors, however, should not overuse bar charts without explicitly stating the number of measurements for each condition. In essence, I strongly recommend plotting individual data points to show the distribution and replacing the stars with actual p-values.

      Significance

      The premise of the study is compelling and could have important clinical implications for distinguishing dysfunctional mitochondria in pathological contexts. However, the manuscript in its current form should be improved. First of all, "non-invasive" is more than a euphemism, as the mitochondria need to be taken out of the cell, which is highly invasive. The authors should delete "non-invasive" from the title.

      As the work presents an orthogonal and non-standard approach, the authors introduced a novel assay that can guide future investigations into the biophysics of mitochondrial physiology. Thus the paper is of high interest, timely and cutting edge.

      In summary, the study presents a promising approach with potentially high relevance for mitochondrial research.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #3 (Public review):

      The central issue for evaluating the overfilling hypothesis is the identity of the mechanism that causes the very potent (>80% when inter pulse is 20 ms), but very quickly reverting (< 50 ms) paired pulse depression (Fig 1G, I). To summarize: the logic for overfilling at local cortical L2/3 synapses depends critically on the premise that probability of release (pv) for docked and fully primed vesicles is already close to 100%. If so, the reasoning goes, the only way to account for the potent short-term enhancement seen when stimulation is extended beyond 2 pulses would be by concluding that the readily releasable pool overfills. However, the conclusion that pv is close to 100% depends on the premise that the quickly reverting depression is caused by exocytosis dependent depletion of release sites, and the evidence for this is not strong in my opinion. Caution is especially reasonable given that similarly quickly reverting depression at Schaffer collateral synapses, which are morphologically similar, was previously shown to NOT depend on exocytosis (Dobrunz and Stevens 1997). Note that the authors of the 1997 study speculated that Ca2+-channel inactivation might be the cause, but did not rule out a wide variety of other types of mechanisms that have been discovered since, including the transient vesicle undocking/re-docking (and subsequent re-priming) reported by Kusick et al (2020), which seems to have the correct timing.

      Thank you for your comments on an alternative possibility besides Ca<sup>2+</sup> channel inactivation. Kusick et al. (2020) showed that the transient destabilization of the docked vesicle pool recovers within 14 ms after stimulation. This rapid recovery implies that post-stimulation undocking events are largely resolved before the 20 ms inter-stimulus interval (ISI) used in our paired-pulse ratio (PPR) experiments, arguing against the possibility that post-AP undocking/re-docking events significantly influence the PPR measured at 20 ms ISI. Furthermore, Vevea et al. (2021) showed that post-stimulus undocking is facilitated in synaptotagmin-7 (Syt7) knockout synapses. In our study, Syt7 knockdown did not affect PPR at 20 ms ISI, suggesting that the undocking process described by Kusick et al. is not a major contributor to the paired-pulse depression (PPD) observed at the 20 ms interval. Taken together, the undocking/re-docking dynamics reported by Kusick et al. are too rapid to affect PPR at 20 ms ISI, and our Syt7 knockdown data further argue against a significant role of this process, so it is unlikely that transient vesicle undocking primarily underlies the strong PPD in our experiments.

      In an earlier round of review, I suggested raising extracellular Ca<sup>2+</sup>, to see if this would increase synaptic strength. This is a strong test of the authors' model because there is essentially no room for an increase in synaptic strength. The authors have now done experiments along these lines, but the result is not clear cut. On one hand, the new results suggest an increase in synaptic strength that is not compatible with the authors' model; technically the increase does not reach statistical significance, but, likely, this is only because the data set is small and the variation between experiments is large. Moreover, a more granular analysis of the individual experiments seems to raise more serious problems, even supporting the depletion-independent counter hypothesis to some extent. On the other hand, the increase in synaptic strength that is seen in the newly added experiments does seem to be less at local L2/3 cortical synapses compared to other types of synapses, measured by other groups, which goes in the general direction of supporting the critical premise that pv is unusually high at L2/3 cortical synapses. Overall, I am left wishing that the new data set were larger, and that reversal experiments had been included as explained in the specific points below.

      Specific Points:

      (1) One of the standard methods for distinguishing between depletion-dependent and depletion-independent depression mechanisms is by analyzing failures during paired pulses of minimal stimulation. The current study includes experiments along these lines showing that pv would have to be extremely close to 1 when Ca<sup>2+</sup> is 1.25 mM to preserve the authors' model (Section "High double failure rate ..."). Lower values for pv are not compatible with their model because the k<sub>1</sub> parameter already had to be pushed a bit beyond boundaries established by other types of experiments.

      It should be noted that we did not arbitrarily push the k<sub>1</sub> parameter beyond boundaries, but estimated the range of k<sub>1</sub> based on the fast time constant for recovery from paired pulse depression, as shown in Fig. 3-S2-Ab.

      The authors now report a mean increase in synaptic strength of 23% after raising Ca to 2.5 mM. The mean increase is not quite statistically significant, but this is likely because of the small sample size. I extracted a 95% confidence interval of [-4%, +60%] from their numbers, with a 92% probability that the mean value of the increase in the full population is > 5%. I used the 5% value as the greatest increase that the model could bear because 5% implies pv < 0.9 using the equation from Dodge and Rahamimoff referenced in the rebuttal. My conclusion from this is that the mean result, rather than supporting the model, actually undermines it to some extent. It would have likely taken 1 or 2 more experiments to get above the 95% confidence threshold for statistical significance, but this is ultimately an arbitrary cut off.

      Our key claim in Fig. 3-S3 is not the statistical non-significance of the EPSC change, but its small magnitude (1.23-fold). This increase is far smaller than the 3.24-fold increase predicted by the fourth-power relationship (D&R equation; Dodge & Rahamimoff, 1967), which would hold if the fusion probability of docked vesicles (p<sub>v</sub>) were not saturated. We do not believe that adding new experiments would raise the magnitude of the EPSC change to the level predicted by the Dodge & Rahamimoff equation, even if a larger n yielded statistical significance. In other words, even a small but statistically significant EPSC change would still contradict what is expected of low-p<sub>v</sub> synapses. Our main point is the extent of the EPSC increase induced by high external [Ca<sup>2+</sup>], not a p-value. For this reason, it is hard for us to accept the Reviewer’s request for a larger sample size in expectation of a lower p-value.

      Although we agree with the Reviewer’s assertion that our data may indicate a 92% probability that high Ca<sup>2+</sup> increases the EPSC by more than 5%, we do not agree with the Reviewer’s interpretation that the EPSC increase necessarily implies an increase in p<sub>v</sub>. We are sorry that we could not clearly understand the Reviewer’s inference that a 5% increase in EPSC implies p<sub>v</sub> < 0.9. Please note that release probability (p<sub>r</sub>) is the product of p<sub>v</sub> and the occupancy of docked vesicles in an active zone (p<sub>occ</sub>). We imagine that this inference rests on the premise that p<sub>occ</sub> is constant irrespective of external [Ca<sup>2+</sup>]. Contrary to this premise, Figure 2c in Kusick et al. (2020) showed that the number of docked SVs increased by ca. 20% upon increasing external [Ca<sup>2+</sup>] to 2 mM. Moreover, Figure 7F in Lin et al. (2025) demonstrated that the number of TS vesicles, equivalent to p<sub>occ</sub>, increased by 23% at high external [Ca<sup>2+</sup>]. These increases in p<sub>occ</sub> are similar in magnitude to the increase in EPSC that we observed at high external Ca<sup>2+</sup> (1.23-fold). Of course, it is possible that increases in both p<sub>occ</sub> and p<sub>v</sub> contributed to the high [Ca<sup>2+</sup>]<sub>o</sub>-induced increase in EPSC. The low PPR and the failure rate analysis, however, suggest that p<sub>v</sub> is already saturated under baseline conditions of 1.3 mM [Ca<sup>2+</sup>]<sub>o</sub>, and thus it is more likely that an increase in p<sub>occ</sub> is primarily responsible for the 1.23-fold increase. Moreover, the 1.23-fold increase does not match the prediction of the D&R equation, which would be valid at synapses with low p<sub>v</sub>. Therefore, interpreting our observation (1.23-fold increase) as a slight increase in p<sub>occ</sub> is consistent with recent papers (Kusick et al., 2020; Lin et al., 2025) as well as with our other results supporting the baseline saturation of p<sub>v</sub>, as shown in Figure 2 and the associated supplement figures (Fig. 2-S1 and Fig. 2-S2).

      (2) The variation between experiments seems to be even more problematic, at least as currently reported. The plot in Figure 3-figure supplement 3 (left) suggests that the variation reflects true variation between synapses, not measurement error.

      Note that previous studies also reported substantial variance in the number of docked or TS vesicles at baseline and in its fold change under high external Ca<sup>2+</sup> (Lin et al., 2025; Kusick et al., 2020). Our study did not focus on this heterogeneity but on the mean dynamics of short-term plasticity at L2/3 recurrent synapses. Acknowledging this, the short-term plasticity of these synapses is best explained by assuming that vesicular fusion probability (p<sub>v</sub>) is close to unity and that release probability is regulated by p<sub>occ</sub>. In other words, even though p<sub>v</sub> is close to unity, synaptic strength can increase at high external [Ca<sup>2+</sup>] if the baseline occupancy of release sites (p<sub>occ</sub>) is low and p<sub>occ</sub> is increased by high [Ca<sup>2+</sup>]. Lin et al. (2025) showed that high external [Ca<sup>2+</sup>] increases the number of TS vesicles (equivalent to p<sub>occ</sub>) by 23% at the calyx synapses. Unlike at our synapses, the baseline p<sub>v</sub> (denoted as p<sub>fusion</sub> in Lin et al., 2025) of the calyx synapse is not saturated (= 0.22) at 1.5 mM external [Ca<sup>2+</sup>], and thus the calyx synapses displayed a 2.36-fold increase in EPSC at 2 mM external [Ca<sup>2+</sup>], to which increases in p<sub>occ</sub> as well as in p<sub>v</sub> (from 0.22 to 0.42) contributed. Therefore, the small increase in EPSC (= 23%) supports the view that p<sub>v</sub> is already saturated at L2/3 recurrent synapses.
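
      The arithmetic behind this comparison can be sketched as follows, assuming that the EPSC scales with p<sub>r</sub> = p<sub>v</sub> × p<sub>occ</sub> and that the number of release sites and the quantal size do not change with external [Ca<sup>2+</sup>]. The input numbers are those quoted above; the function name is illustrative.

```python
def epsc_fold_change(pv_before, pv_after, pocc_fold):
    """Predicted EPSC fold change when EPSC is proportional to p_v * p_occ."""
    return (pv_after / pv_before) * pocc_fold


# Calyx synapse, using the values quoted from Lin et al. (2025):
# p_v rises from 0.22 to 0.42 and p_occ rises 1.23-fold.
calyx_fold = epsc_fold_change(0.22, 0.42, 1.23)

# Synapse with saturated p_v (assumed unchanged at ~1): only p_occ contributes.
saturated_fold = epsc_fold_change(1.0, 1.0, 1.23)

print(round(calyx_fold, 2), round(saturated_fold, 2))
```

      Under these assumptions the product of the two fold changes for the calyx is close to the reported 2.36-fold EPSC increase, whereas a synapse with saturated p<sub>v</sub> would show only the p<sub>occ</sub> contribution.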

      And yet, synaptic strength increased almost 2-fold in 2 of the 8 experiments, which back extrapolates to pv < 0.2.

      We are sorry that we could not understand the first comment in this paragraph. Could you explain in detail why a two-fold increase implies pv < 0.2?

      If all of the depression is caused by depletion as assumed, these individuals would exhibit paired pulse facilitation, not depression. And yet, from what I can tell, the individuals depressed, possibly as much as the synapses with low sensitivity to Ca<sup>2+</sup>, arguing against the critical premise that depression equals depletion, and even arguing - to some extent - for the counter hypothesis that a component of the depression is caused by a mechanism that is independent of depletion.

      For the first statement in this paragraph, we assume that ‘the depression’ means paired pulse depression (PPD). If so, we cannot understand why depletion-dependent PPD should lead to PPF. If the paired pulse interval is too short for docked vesicles to be replenished, the vesicle depletion induced by the first pulse would result in PPD. We are very sorry that we could not follow the Reviewer’s subsequent inference, because we could not understand the first statement.

      I would strongly recommend adding an additional plot that documents the relationship between the amount of increase in synaptic strength after increasing extracellular Ca<sup>2+</sup> and the paired pulse ratio as this seems central.

      We found no clear correlation of EPSC<sub>1</sub> with PPR changes (ΔPPR) as shown in the figure below.

      Author response image 1.

      Plot of PPR changes as a function of EPSC<sub>1</sub>.

      (3) Decrease in PPR. The authors recognize that the decrease in the paired-pulse ratio after increasing Ca<sup>2+</sup> seems problematic for the overfilling hypothesis by stating: "Although a reduction in PPR is often interpreted as an increase in pv, under conditions where pv is already high, it more likely reflects a slight increase in p<sub>occ</sub> or in the number of TS vesicles, consistent with the previous estimates (Lin et al., 2025)."

      We admit that there is a logical jump in our statement you mentioned here. We appreciate your comment. We re-wrote that part in the revised manuscript (line 285) as follows:

      “Recent morphological and functional studies revealed that elevation of [Ca<sup>2+</sup>]<sub>o</sub> induces an increase in the number of TS or docked vesicles to a similar extent as our observation (Kusick et al., 2020; Lin et al., 2025), raising a possibility that an increase in p<sub>occ</sub> is responsible for the 1.23-fold increase in EPSC at high [Ca<sup>2+</sup>]<sub>o</sub>. A slight but significant reduction in PPR was observed under high [Ca<sup>2+</sup>]<sub>o</sub> too. An increase in p<sub>occ</sub> is thought to be associated with that in the baseline vesicle refilling rate. While PPR is always reduced by an increase in p<sub>v</sub>, the effects of refilling rate to PPR is complicated. For example, PPR can be reduced by both a decrease (Figure 2—figure supplement 1) and an increase (Lin et al., 2025) in the refilling rate induced by EGTA-AM and PDBu, respectively. Thus, the slight reduction in PPR is not contradictory to the possible contribution of p<sub>occ</sub> to the high [Ca<sup>2+</sup>]<sub>o</sub> effects.”

      I looked quickly, but did not immediately find an explanation in Lin et al 2025 involving an increase in pocc or number of TS vesicles, much less a reason to prefer this over the standard explanation that reduced PPR indicates an increase in pv.

      Fig. 7F of Lin et al. (2025) shows a 1.23-fold increase in the number of TS vesicles at high external [Ca<sup>2+</sup>]. Panel E of the same figure (Fig. 7E) shows an approximately two-fold increase in p<sub>fusion</sub> (equivalent to p<sub>v</sub> in our study) at high external [Ca<sup>2+</sup>] (from 0.22 to 0.42). Because p<sub>occ</sub> is the occupancy of TS vesicles in a limited number of slots in an active zone, the fold change in the number of TS vesicles should be similar to that of p<sub>occ</sub>.

      The authors should explain why the most straightforward interpretation is not the correct one in this particular case to avoid the appearance of cherry picking explanations to fit the hypothesis.

      The results of Lin et al. (2025) indicate that high external [Ca<sup>2+</sup>] induces a milder increase in p<sub>occ</sub> (23%) than in p<sub>v</sub> (about 90%, from 0.22 to 0.42) at the calyx synapses. Because the extent of the p<sub>occ</sub> increase is much smaller than that of p<sub>v</sub>, and multiple lines of evidence in our study support that the baseline p<sub>v</sub> is already saturated, we raised a possibility that an increase in p<sub>occ</sub> would primarily contribute to the unexpectedly low increase of EPSC at 2.5 mM [Ca<sup>2+</sup>]<sub>o</sub>. As mentioned above, our interpretation is also consistent with the EM study of Kusick et al. (2020). Nevertheless, the reduction of PPR at 2.5 mM Ca<sup>2+</sup> seems to support an increase in p<sub>v</sub>, arguing against this possibility. On the other hand, because p<sub>occ</sub> = k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) under the simple vesicle refilling model (Fig. 3-S2Aa), a change in p<sub>occ</sub> should be associated with changes in k<sub>1</sub> and/or b<sub>1</sub>. While PPR is always reduced by an increase in p<sub>v</sub>, the effect of the refilling rate on PPR is complicated. For example, although EGTA-AM would not increase p<sub>v</sub>, it reduced PPR, probably by reducing the refilling rate (Fig. 2-S1). In contrast, PDBu is thought to increase k<sub>1</sub> because it induces a two-fold increase of p<sub>occ</sub> (Fig. 7L of Lin et al., 2025). Such a marked increase of p<sub>occ</sub>, rather than p<sub>v</sub>, seems to be responsible for the PDBu-induced marked reduction of PPR (Fig. 7I of Lin et al., 2025), because PDBu induced only a slight increase in p<sub>v</sub> (Fig. 7K of Lin et al., 2025). Therefore, the slight reduction of PPR is not contradictory to our interpretation that an increase in p<sub>occ</sub> might be responsible for the slight increase in EPSC induced by high [Ca<sup>2+</sup>]<sub>o</sub>.
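      As a rough numerical illustration of the occupancy relation p<sub>occ</sub> = k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>), the sketch below computes occupancy for two values of k<sub>1</sub> and compares the resulting fold change with the p<sub>v</sub> fold change implied by the quoted values 0.22 and 0.42. The rate constants and the function names `occupancy` and `fold_change` are hypothetical and chosen only for illustration; they are not estimates from the manuscript or from Lin et al. (2025).

```python
def occupancy(k1: float, b1: float) -> float:
    """Steady-state occupancy p_occ = k1 / (k1 + b1) for a two-state slot."""
    if k1 < 0 or b1 < 0 or k1 + b1 == 0:
        raise ValueError("rates must be non-negative and not both zero")
    return k1 / (k1 + b1)


def fold_change(before: float, after: float) -> float:
    """Ratio of a quantity after a manipulation to its baseline value."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return after / before


# Hypothetical rate constants (arbitrary units), for illustration only.
B1 = 10.0
p_occ_base = occupancy(k1=20.0, b1=B1)    # 20 / 30
p_occ_raised = occupancy(k1=40.0, b1=B1)  # 40 / 50

# Doubling k1 raises occupancy by only a factor of 1.2 here, because
# occupancy saturates as k1 grows relative to b1.
occ_fold = fold_change(p_occ_base, p_occ_raised)

# Values quoted in the response for p_fusion in Lin et al. (2025), Fig. 7E.
pv_fold = fold_change(0.22, 0.42)

print(f"p_occ: {p_occ_base:.3f} -> {p_occ_raised:.3f} ({occ_fold:.2f}-fold)")
print(f"p_v:   0.22 -> 0.42 ({pv_fold:.2f}-fold)")
```

      Under these assumed rates, a large change in k<sub>1</sub> yields only a modest change in occupancy, which is the sense in which a small fold change in p<sub>occ</sub> can coexist with a substantial change in the refilling rate.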

      (4) The authors concede in the rebuttal that mean pv must be < 0.7, but I couldn't find any mention of this within the manuscript itself, nor any explanation for how the new estimate could be compatible with the value of > 0.99 in the section about failures.

      We have never stated in the rebuttal or elsewhere that the mean p<sub>v</sub> must be < 0.7. On the contrary, both our manuscript and our previous rebuttals consistently argued that the baseline p<sub>v</sub> is already saturated, based on our observations including low PPR, tight coupling, high double failure rate and the minimal effect of external Ca<sup>2+</sup> elevation.

      (5) Although not the main point, comparisons to synapses in other brain regions reported in other studies might not be accurate without directly matching experiments.

      Please understand that it is not trivial to establish optimal experimental settings for studying other synapses using the same methods employed in this study. We think that this should be done in a separate study. Furthermore, we have already shown in the manuscript that action potentials (APs) evoked by oChIEF activation occur in a physiologically natural manner, and the STP induced by these oChIEF-evoked APs is indistinguishable from the STP elicited by APs evoked by dual-patch electrical stimulation. Therefore, we believe that our use of optogenetic stimulation did not introduce any artificial bias in measuring STP.

      As it is, 2 of 8 synapses got weaker instead of stronger, hinting at possible rundown, but this cannot be assessed because reversibility was not evaluated. In addition, comparing axons with and without channel rhodopsins might be problematic because the channel rhodopsins might widen action potentials.

      We continuously monitored series resistance and baseline EPSC amplitude throughout the experiments. The figure below shows the mean time course of EPSCs at two different [Ca<sup>2+</sup>]<sub>o</sub>. As it shows, we observed no tendency for run-down of EPSCs during experiments. Any recordings that did show run-down were discarded from analysis. In addition, please understand that there is a substantial variance in the number of docked vesicles at both baseline and high external Ca<sup>2+</sup> (Lin et al., 2025; Kusick et al., 2020), as well as in the short-term dynamics of EPSCs at our synapses.

      Author response image 2.

      Time course of normalized amplitudes of the first EPSCs during paired-pulse stimulation at 20 ms ISI in control and in the elevated external Ca<sup>2+</sup> (n = 8).

      (6) Perhaps authors could double check with Schotten et al about whether PDBu does/does not decrease the latency between osmotic shock and transmitter release. This might be an interesting discrepancy, but my understanding is that Schotten et al didn't acquire information about latency because of how the experiments were designed.

      Schotten et al. (2015) directly compared experimental and simulation data for hypertonicity-induced vesicle release. They showed that the latency shortens markedly as the tonicity increases (Fig. 2-S2), but this tonicity-dependent shortening was not reproduced by reducing the activation energy barrier for fusion (ΔEa) in their simulations (Fig. 2-S1). Thus, the authors mentioned that an unknown compensatory mechanism counteracting the osmotic perturbation might be responsible for the tonicity-dependent changes in the latency. Importantly, their modeling demonstrated that reducing ΔEa, which would correspond to increasing p<sub>v</sub>, results in larger peak amplitudes and a shorter time-to-peak, but does not shorten the latency. Therefore, there is currently no direct support for the notion that PDBu or similar manipulations shorten latency via an increase in p<sub>v</sub>.

      (7) The authors state: "These data are difficult to reconcile with a model in which facilitation is mediated by Ca2+-dependent increases in pv." However, I believe that discarding the premise that depression is always caused by depletion would open up wide range of viable possibilities.

      We hope that the Reviewer understands the reasons why we reached the conclusion that the baseline p<sub>v</sub> is saturated at our synapses. First, strong paired-pulse depression (PPD) cannot be attributed to Ca<sup>2+</sup> channel inactivation because Ca<sup>2+</sup> influx at the axon terminal remained constant during 40 Hz train stimulation (Fig. 2-S2). Moreover, even if Ca<sup>2+</sup> channel inactivation were responsible for the strong PPD, this view cannot explain the delayed facilitation that emerges at subsequent pulses (third EPSC and so on) in the 40 Hz train stimulation (Fig. 1-4), because Ca<sup>2+</sup> channel inactivation gradually accumulates during train stimulation, as directly shown by Wykes et al. (2007) in chromaffin cells. Second, the strong PPD and the very fast recovery from PPD indicate a very fast refilling rate constant (k<sub>1</sub>). Under this high k<sub>1</sub>, the failure rates were best explained by p<sub>v</sub> close to unity. Third, the extent of the EPSC increase induced by high external Ca<sup>2+</sup> was much smaller than at other synapses, such as calyx synapses, at which p<sub>v</sub> is not saturated (Lin et al., 2025), and rather similar to the increases in p<sub>occ</sub> estimated at calyx synapses or in the EM study (Kusick et al., 2020; Lin et al., 2025).

      Reference

      Wykes et al. (2007). Differential regulation of endogenous N-and P/Q-type Ca<sup>2+</sup> channel inactivation by Ca<sup>2+</sup>/calmodulin impacts on their ability to support exocytosis in chromaffin cells. Journal of Neuroscience, 27(19), 5236-5248.

      Reviewer #3 (Recommendations for the authors):

      I continue to think that measuring changes in synaptic strength when raising extracellular Ca<sup>2+</sup> is a good experiment for evaluating the overfilling hypothesis. Future experiments would be better if the authors would include reversibility criteria to rule out rundown, etc. Also, comparisons to other types of synapses would be stronger if the same experimenter did the experiments at both types of synapses.

      We observed no systematic tendency for run-down of EPSCs during these experiments (Author response image 2). Furthermore, the observed variability is well within the expected variance range in the number of docked vesicles at both baseline and high external Ca²⁺ (Lin et al., 2025; Kusick et al., 2020) and reflects biological variability rather than experimental artifact. Therefore, we believe that additional reversibility experiments are not warranted. However, we are open to further discussion if the Reviewer has specific methodological concerns not resolved by our present data.

      For the second issue, as mentioned above, we think that experiments on other synapse types should be done in a separate study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Gu et al. employed novel viral strategies, combined with in vivo two-photon imaging, to map the tone response properties of two groups of cortical neurons in A1. The thalamocortical recipient (TR neurons) and the corticothalamic (CT neurons). They observed a clear tonotopic gradient among TR neurons but not in CT neurons. Moreover, CT neurons exhibited high heterogeneity of their frequency tuning and broader bandwidth, suggesting increased synaptic integration in these neurons. By parsing out different projecting-specific neurons within A1, this study provides insight into how neurons with different connectivity can exhibit different frequency response-related topographic organization.

      Strengths:

      This study reveals the importance of studying neurons with projection specificity rather than layer specificity since neurons within the same layer have very diverse molecular, morphological, physiological, and connectional features. By utilizing a newly developed rabies virus CSN-N2c GCaMP-expressing vector, the authors can label and image specifically the neurons (CT neurons) in A1 that project to the MGB. To compare, they used an anterograde trans-synaptic tracing strategy to label and image neurons in A1 that receive input from MGB (TR neurons).

      Weaknesses:

      Perhaps as cited in the introduction, it is well known that tonotopic gradient is well preserved across all layers within A1, but I feel if the authors want to highlight the specificity of their virus tracing strategy and the populations that they imaged in L2/3 (TR neurons) and L6 (CT neurons), they should perform control groups where they image general excitatory neurons in the two depths and compare to TR and CT neurons, respectively. This will show that it's not their imaging/analysis or behavioral paradigms that are different from other labs. 

      We thank the reviewer for these constructive suggestions. As recommended, we have performed control experiments in which we imaged the general excitatory neurons in superficial layers (shown below). The results showed a clear tonotopic gradient, consistent with previous findings (Bandyopadhyay et al., 2010; Romero et al., 2020; Rothschild et al., 2010; Tischbirek et al., 2019), thereby validating the reliability of our imaging/analysis approach. The results are presented in a new supplemental figure (Figure 2-figure supplement 3).

      Related publications:

      (1) Gu M, Li X, Liang S, Zhu J, Sun P, He Y, Yu H, Li R, Zhou Z, Lyu J, Li SC, Budinger E, Zhou Y, Jia H, Zhang J, Chen X. 2023. Rabies virus-based labeling of layer 6 corticothalamic neurons for two-photon imaging in vivo. iScience 26: 106625. DOI: https://doi.org/10.1016/j.isci.2023.106625, PMID: 37250327

      (2) Bandyopadhyay S, Shamma SA, Kanold PO. 2010. Dichotomy of functional organization in the mouse auditory cortex. Nat Neurosci 13: 361-8. DOI: https://doi.org/10.1038/nn.2490, PMID: 20118924

      (3) Romero S, Hight AE, Clayton KK, Resnik J, Williamson RS, Hancock KE, Polley DB. 2020. Cellular and Widefield Imaging of Sound Frequency Organization in Primary and Higher Order Fields of the Mouse Auditory Cortex. Cerebral Cortex 30: 1603-1622. DOI: https://doi.org/10.1093/cercor/bhz190, PMID: 31667491

      (4) Rothschild G, Nelken I, Mizrahi A. 2010. Functional organization and population dynamics in the mouse primary auditory cortex. Nat Neurosci 13: 353-60. DOI: https://doi.org/10.1038/nn.2484, PMID: 20118927

      (5) Tischbirek CH, Noda T, Tohmi M, Birkner A, Nelken I, Konnerth A. 2019. In Vivo Functional Mapping of a Cortical Column at Single-Neuron Resolution. Cell Rep 27: 1319-1326 e5. DOI: https://doi.org/10.1016/j.celrep.2019.04.007, PMID: 31042460

      Figures 1D and G, the y-axis is Distance from pia (%). I'm not exactly sure what this means. How does % translate to real cortical thickness?

      We thank the reviewer for this question. The distance of each labeled cell from the pia was normalized to the entire distance from the pia to the L6/WM border for each mouse, following a previous study (Chang and Kawai, 2018). For all mice tested, the entire distance from the pia to the L6/WM border was 826.5 ± 23.4 μm (range, 752.9 to 886.1 μm).
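      The normalization described above can be sketched as follows. This is a minimal illustration, assuming that both distances are measured in the same length unit along the same axis; the function names are hypothetical and the example uses only the mean pia-to-L6/WM distance quoted above (826.5).

```python
def normalized_depth_percent(cell_depth: float, pia_to_wm: float) -> float:
    """Express a cell's depth below the pia as a percentage of the
    pia-to-L6/WM distance measured in the same animal."""
    if pia_to_wm <= 0:
        raise ValueError("pia-to-WM distance must be positive")
    if not 0 <= cell_depth <= pia_to_wm:
        raise ValueError("cell depth must lie between pia and WM border")
    return 100.0 * cell_depth / pia_to_wm


def percent_to_depth(percent: float, pia_to_wm: float) -> float:
    """Convert a normalized depth (%) back to an absolute depth."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return percent / 100.0 * pia_to_wm


# Using the mean pia-to-L6/WM distance quoted above as the reference,
# a cell plotted at 50% sits at half of that distance.
MEAN_PIA_TO_WM = 826.5
mid_depth = percent_to_depth(50.0, MEAN_PIA_TO_WM)
print(mid_depth)
```

      Because the reference distance differs between animals, the same percentage corresponds to slightly different absolute depths in different mice.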

      Related publications:

      Chang M, Kawai HD. 2018. A characterization of laminar architecture in mouse primary auditory cortex. Brain Structure and Function 223: 4187-4209. DOI: https://doi.org/10.1007/s00429-018-1744-8, PMID: 30187193

      For Figure 2G and H, is each circle a neuron or an animal? Why are they staggered on top of each other on the x-axis? If the x-axis is the distance from caudal to rostral, each neuron should have a different distance? Also, it seems like it's because Figure 2H has more circles, which is why it has more variation, thus not significant (for example, at 600 or 900um, 2G seems to have fewer circles than 2H). 

      We sincerely appreciate the reviewer’s careful attention to the details of our figures. Each circle in Figure 2G and H represents an individual imaging focal plane, with focal planes taken from different animals. The median BFs of some focal planes may be similar, leading to partial overlap. Where circles overlap, their brightness is additive.

      Since fewer CT neurons than TR neurons responded to pure tones within each focal plane, as shown in Figure 2-figure supplement 2, a larger number of focal planes were imaged to ensure a consistent and robust analysis of the pure tone response characteristics. The higher variance and lack of correlation in CT neurons is a key biological finding, not an artifact of sample size. The data clearly show a wide spread of median BFs at any given location for CT neurons, a feature absent in the TR population.

      Similarly, in Figures 2J and L, why are the circles staggered on the y-axis now? And is each circle now a neuron or a trial? It seems they have many more circles than Figure 2G and 2H. Also, I don't think doing a correlation is the proper stats for this type of plot (this point applies to Figures 3H and 3J).

      We regret any confusion this may have caused. Figure 2 illustrates the tonotopic gradient of CT and TR neurons at different scales. Specifically, Figure 2E-H present the imaging from the focal plane perspective (23 focal planes in Figure 2G, 40 focal planes in Figure 2H), whereas Figure 2I-L provide a more detailed view at the single-cell level (481 neurons in Figure 2J, 491 neurons in Figure 2L). Figure 2J and L therefore do have more circles than Figure 2G and H. The analysis at both scales consistently reveals a tonotopic gradient in TR neurons, whereas such a gradient is absent in CT neurons.

      We used Pearson correlation as a standard and direct method to quantify the linear relationship between a neuron's anatomical position and its frequency preference, which is widely used in the field to provide a quantitative measure (R-value) and a significance level (p-value) for the strength of a tonotopic gradient. The same statistical logic applies to testing for spatial gradients in local heterogeneity in Figure 3. We are confident that this is an appropriate and informative statistical approach for these data.
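      A minimal sketch of this kind of analysis is given below: it computes the Pearson correlation coefficient between position along an axis and BF. It assumes, as an illustration only, that BF is expressed on a log2 (octave) scale before correlating; the positions and BFs are made-up values, not data from the study, and `pearson_r` is a hypothetical helper. A p-value is not computed here; `scipy.stats.pearsonr` returns both the coefficient and a p-value if SciPy is available.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two samples of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example: position along the caudal-rostral axis and BF (kHz).
position = [0.0, 150.0, 300.0, 450.0, 600.0, 750.0, 900.0]
bf_khz = [2.0, 4.0, 4.0, 8.0, 16.0, 16.0, 32.0]

# Assumption: correlate position with BF in octaves (log2 of kHz).
bf_octaves = [math.log2(f) for f in bf_khz]
r = pearson_r(position, bf_octaves)
print(f"r = {r:.3f}")
```

      In this toy example BF rises steadily with position, so r is close to 1; a population without a gradient would give r near 0.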

      What does the inter-quartile range of BF (IQRBF, in octaves) imply? What's the interpretation of this analysis? I am confused as to why TR neurons show high IQR in HF areas compared to LF areas, which means homogeneity among TR neurons (lines 213 - 216). On the same note, how is this different from the BF variability?  Isn't higher IQR equal to higher variability?

      We thank the reviewer for raising this important point. IQRBF is a measure of local tuning heterogeneity. It quantifies the diversity of BFs among neighboring neurons. A small IQRBF means neighbors are similarly tuned (an orderly, homogeneous map), while a large IQRBF means neighbors have very different BFs (a disordered, heterogeneous map) (Winkowski and Kanold, 2013; Zeng et al., 2019).

      From the BF position reconstruction of all TR neurons (Figure 2I), most TR neurons in the high-frequency (HF) region respond to high-frequency sounds, but some respond to low frequencies such as 2 kHz, which contributes to the high IQR in HF areas. This does not contradict our main conclusion that the TR population is significantly more homogeneous than the CT population. BF variability represents the stability of a neuron's BF over time, while IQR represents the variability of BF among different neurons within a certain range (Chambers et al., 2023).
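      To make the IQRBF measure concrete, the sketch below computes the interquartile range of BFs, in octaves, for a group of neighboring neurons. It assumes that the caller has already selected the neurons in a local neighborhood (how that neighborhood is defined is not specified here) and it uses the inclusive quartile method of Python's `statistics.quantiles`, which is one of several possible quartile conventions. The BF lists are hypothetical.

```python
import math
import statistics


def iqr_bf_octaves(bfs_khz: list[float]) -> float:
    """Interquartile range of best frequencies, in octaves.

    bfs_khz holds the BFs (kHz) of the neurons in one local neighborhood.
    """
    if len(bfs_khz) < 2:
        raise ValueError("need at least two neurons")
    octaves = [math.log2(f) for f in bfs_khz]
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(octaves, n=4, method="inclusive")
    return q3 - q1


# Hypothetical neighborhoods: one similarly tuned, one widely spread.
homogeneous = [8.0, 8.0, 8.0, 8.0, 16.0]
heterogeneous = [2.0, 4.0, 8.0, 16.0, 32.0]

print(iqr_bf_octaves(homogeneous))
print(iqr_bf_octaves(heterogeneous))
```

      The first neighborhood yields an IQR of 0 octaves even though one neuron differs by an octave, whereas the second spans 2 octaves between its quartiles. This is a spread across neurons at one time point, which is distinct from the stability of a single neuron's BF over time.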

      Related publications:

      (1) Chambers AR, Aschauer DF, Eppler JB, Kaschube M, Rumpel S. 2023. A stable sensory map emerges from a dynamic equilibrium of neurons with unstable tuning properties. Cerebral Cortex 33: 5597-5612. DOI: https://doi.org/10.1093/cercor/bhac445, PMID: 36418925

      (2) Winkowski DE, Kanold PO. 2013. Laminar transformation of frequency organization in auditory cortex. Journal of Neuroscience 33: 1498-508. DOI: https://doi.org/10.1523/JNEUROSCI.3101-12.2013, PMID: 23345224

      (3) Zeng HH, Huang JF, Chen M, Wen YQ, Shen ZM, Poo MM. 2019. Local homogeneity of tonotopic organization in the primary auditory cortex of marmosets. Proceedings of the National Academy of Sciences of the United States of America 116: 3239-3244. DOI: https://doi.org/10.1073/pnas.1816653116, PMID: 30718428

      Figure 4A-B, there are no clear criteria on how the authors categorize V, I, and O shapes. The descriptions in the Methods (lines 721 - 725) are also very vague.

      We apologize for the initial vagueness and have replaced the descriptions in the Methods section. “V-shaped”: Neurons whose FRAs show decreasing frequency selectivity with increasing intensity. “I-shaped”: Neurons whose FRAs show constant frequency selectivity with increasing intensity. “O-shaped”: Neurons responsive to a small range of intensities and frequencies, with the peak response not occurring at the highest intensity level.

      To provide better visual intuition, we show multiple representative examples of each FRA type for both TR and CT neurons below. We are confident that these provide the necessary clarity and reproducibility for our analysis of receptive field properties.

      Author response image 1.

      Different FRA types within the dataset of TR and CT neurons. Each row shows 6 representative FRAs from a specific type. Types are V-shaped (‘V’), I-shaped (‘I’), and O-shaped (‘O’). The X-axis represents 11 pure tone frequencies, and the Y-axis represents 6 sound intensities.

      Reviewer #2 (Public Review):

      Summary:

      Gu and Liang et al. investigated how auditory information is mapped and transformed as it enters and exits the auditory cortex. They use anterograde transsynaptic tracers to label and perform calcium imaging of thalamorecipient neurons in A1 and retrograde tracers to label and perform calcium imaging of corticothalamic output neurons. They demonstrate a degradation of tonotopic organization from the input to output neurons.

      Strengths:

      The experiments appear well executed, well described, and analyzed.

      Weaknesses:

      (1) Given that the CT and TR neurons were imaged at different depths, the question as to whether or not these differences could otherwise be explained by layer-specific differences is still not 100% resolved. Control measurements would be needed either by recording (1) CT neurons in upper layers, (2) TR in deeper layers, (3) non-CT in deeper layers and/or (4) non-TR in upper layers.

      We appreciate these constructive suggestions. To address this, we performed new experiments and analyses.

      Comparison of TR neurons across superficial layers: we analyzed our existing TR neuron dataset to see if response properties varied by depth within the superficial layers. We found no significant differences in the fraction of tuned neurons, field IQR, or maximum bandwidth (BWmax) between TR neurons in L2/3 and L4. This suggests a degree of functional homogeneity within the thalamorecipient population across these layers. The results are presented in a new supplemental figure (Figure 2-figure supplement 4).

      Necessary control experiments.

      (1) CT neurons in upper layers. CT neurons are thalamic projection neurons that only exist in the deeper cortex, so CT neurons do not exist in upper layers (Antunes and Malmierca, 2021).

      (2) TR neurons in deeper layers. As we mentioned in the manuscript, high-titer AAV1-Cre can label neurons both anterogradely and retrogradely, so it is challenging to identify TR neurons unambiguously in deeper layers.

      (3) non-CT in deeper layers and/or (4) non-TR in upper layers.

      To directly test if projection identity confers distinct functional properties within the same cortical layers, we performed the crucial control of comparing TR neurons to their neighboring non-TR neurons. We injected AAV1-Cre in MGB and a Cre-dependent mCherry into A1 to label TR neurons red. We then co-injected AAV-CaMKII-GCaMP6s to label the general excitatory population green. In merged images, this allowed us to functionally image and directly compare TR neurons (yellow) and adjacent non-TR neurons (green). We separately recorded the responses of these neurons to pure tones using two-photon imaging. The results show that TR neurons are significantly more likely to be tuned to pure tones than their neighboring non-TR excitatory neurons. This finding provides direct evidence that a neuron's long-range connectivity, and not just its laminar location, is a key determinant of its response properties. The results are presented in a new supplemental figure (Figure 2-figure supplement 5).

      Related publications:

      Antunes FM, Malmierca MS. 2021. Corticothalamic Pathways in Auditory Processing: Recent Advances and Insights From Other Sensory Systems. Front Neural Circuits 15: 721186. DOI: https://doi.org/10.3389/fncir.2021.721186, PMID: 34489648

      (2) What percent of the neurons at the depths are CT neurons? Similar questions for TR neurons?

      We thank the reviewer for the comments. We performed histological analysis on brain slices from our experimental animals to quantify the density of these projection-specific populations. Our analysis reveals that CT neurons constitute approximately 25.47% (22.99%–36.50%) of all neurons in Layer 6 of A1. In the superficial layers (L2/3 and L4), TR neurons comprise approximately 10.66% (10.53%–11.37%) of the total neuronal population.

      Author response image 2.

      The fraction of CT and TR neurons. (A) Boxplots showing the fraction of CT neurons. N = 11 slices from 4 mice. (B) Boxplots showing the fraction of TR neurons. N = 11 slices from 4 mice.

      (3) V-shaped, I-shaped, or O-shaped is not an intuitively understood nomenclature, consider changing. Further, the x/y axis for Figure 4a is not labeled, so it's not clear what the heat maps are supposed to represent.

      The terms "V-shaped," "I-shaped," and "O-shaped" are an established nomenclature in the auditory neuroscience literature for describing frequency response areas (FRAs), and we use them for consistency with prior work (Rothschild et al., 2010). V-shaped: Neurons whose FRAs show decreasing frequency selectivity with increasing intensity. I-shaped: Neurons whose FRAs show constant frequency selectivity with increasing intensity. O-shaped: Neurons responsive to a small range of intensities and frequencies, with the peak response not occurring at the highest intensity level. We have included a more detailed description in the Methods.

      The X-axis represents 11 pure tone frequencies, and the Y-axis represents 6 sound intensities. So, the heat map represents the FRA of neurons in A1, reflecting the responses for different frequencies and intensities of sound stimuli. In the revised manuscript, we have provided clarifications in the figure legend.

      (4) Many references about projection neurons and cortical circuits are based on studies from visual or somatosensory cortex. Auditory cortex organization is not necessarily the same as other sensory areas. Auditory cortex references should be used specifically, and not sources reporting on S1, and V1.

      We thank the reviewers for their valuable comments. We have made a concerted effort to ensure that claims about cortical circuit organization are supported by findings specifically from the auditory cortex wherever possible, strengthening the focus and specificity of our discussion.

      Reviewer #3 (Public Review):

      Summary:

      The authors performed wide-field and 2-photon imaging in vivo in awake head-fixed mice, to compare receptive fields and tonotopic organization in thalamocortical recipient (TR) neurons vs corticothalamic (CT) neurons of mouse auditory cortex. TR neurons were found in all cortical layers while CT neurons were restricted to layer 6. The TR neurons at nominal depths of 200-400 microns have a remarkable degree of tonotopy (as good if not better than tonotopic maps reported by multiunit recordings). In contrast, CT neurons were very heterogenous in terms of their best frequency (BF), even when focusing on the low vs high-frequency regions of the primary auditory cortex. CT neurons also had wider tuning.

      Strengths:

      This is a thorough examination using modern methods, helping to resolve a question in the field with projection-specific mapping.

      Weaknesses:

      There are some limitations due to the methods, and it's unclear what the importance of these responses are outside of behavioral context or measured at single timepoints given the plasticity, context-dependence, and receptive field 'drift' that can occur in the cortex.

      (1) Probably the biggest conceptual difficulty I have with the paper is comparing these results to past studies mapping auditory cortex topography, mainly due to differences in methods. Conventionally, the tonotopic organization is observed for characteristic frequency maps (not best frequency maps), as tuning precision degrades and the best frequency can shift as sound intensity increases. The authors used six attenuation levels (30-80 dB SPL) and reported that the background noise of the 2-photon scope is <30 dB SPL, which seems very quiet. The authors should at least describe the sound-proofing they used to get the noise level that low, and some sense of noise across the 2-40 kHz frequency range would be nice as a supplementary figure. It also remains unclear just what the 2-photon dF/F response represents in terms of spikes. Classic mapping using single-unit or multi-unit electrodes might be sensitive to single spikes (as might be emitted at characteristic frequency), but this might not be as obvious for Ca2+ imaging. This isn't a concern for the internal comparison here between TR and CT cells as conditions are similar, but is a concern for relating the tonotopy or lack thereof reported here to other studies.

      We sincerely thank the reviewer for the thoughtful evaluation of our manuscript and for your positive assessment of our work.

      (1)  Concern regarding Best Frequency (BF) vs. Characteristic Frequency (CF)

      Our use of BF, defined as the frequency eliciting the highest response averaged across all sound levels, is a standard and practical approach in 2-photon Ca²⁺ imaging studies (Issa et al., 2014; Rothschild et al., 2010; Schmitt et al., 2023; Tischbirek et al., 2019). This method is well-suited for functionally characterizing large numbers of neurons simultaneously, where determining a precise firing threshold for each individual cell can be challenging.
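      The BF definition above can be sketched as follows. This is an illustrative implementation only: it assumes the frequency response area is stored as a list of rows (one per sound level) with one response value per tested frequency, and the small FRA, the frequency list, and the function name `best_frequency` are hypothetical.

```python
def best_frequency(fra: list[list[float]], freqs_khz: list[float]) -> float:
    """Return the frequency with the highest response averaged over levels.

    fra[i][j] is the response at sound level i and frequency j.
    If two frequencies tie, the lower-index frequency is returned.
    """
    n_freq = len(freqs_khz)
    if not fra or n_freq == 0 or any(len(row) != n_freq for row in fra):
        raise ValueError("FRA must be non-empty and match the frequency list")
    # Average each frequency column across all sound levels.
    mean_by_freq = [sum(row[j] for row in fra) / len(fra) for j in range(n_freq)]
    best_index = max(range(n_freq), key=lambda j: mean_by_freq[j])
    return freqs_khz[best_index]


# Hypothetical FRA: 3 sound levels (rows, low to high) x 4 frequencies.
freqs_khz = [2.0, 4.0, 8.0, 16.0]
fra = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.5, 0.2, 0.0],
    [0.2, 0.6, 0.9, 0.1],
]

bf = best_frequency(fra, freqs_khz)
print(bf)
```

      In this toy FRA the single largest response occurs at 8 kHz at the highest level, but the level-averaged response is largest at 4 kHz, so the BF under this definition is 4 kHz. This illustrates how BF can differ from both the global maximum and a threshold-based characteristic frequency.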

      (2) Concern regarding background noise of the 2-photon setup

      We have expanded the Methods section ("Auditory stimulation") to include a detailed description of the sound-attenuation strategies used during the experiments. The use of a custom-built, double-walled sound-proof enclosure lined with wedge-shaped acoustic foam was implemented to significantly reduce external noise interference. These strategies ensured that auditory stimuli were delivered under highly controlled, low-noise conditions, thereby enhancing the reliability and accuracy of the neural response measurements obtained throughout the study.

      (3) Concern regarding the relationship between dF/F and spikes

      While Ca²⁺ signals are an indirect and filtered representation of spiking activity, they are a powerful tool for assessing the functional properties of genetically-defined cell populations. As you note, the properties and limitations of Ca²⁺ imaging apply equally to both the TR and CT neuron groups we recorded. Therefore, the profound difference we observed—a clear tonotopic gradient in one population and a lack thereof in the other—is a robust biological finding and not a methodological artifact.

      Related publications:

      (1) Issa JB, Haeffele BD, Agarwal A, Bergles DE, Young ED, Yue DT. 2014. Multiscale optical Ca2+ imaging of tonal organization in mouse auditory cortex. Neuron 83: 944-59. DOI: https://doi.org/10.1016/j.neuron.2014.07.009, PMID: 25088366

      (2) Rothschild G, Nelken I, Mizrahi A. 2010. Functional organization and population dynamics in the mouse primary auditory cortex. Nat Neurosci 13: 353-60. DOI: https://doi.org/10.1038/nn.2484, PMID: 20118927

      (3) Schmitt TTX, Andrea KMA, Wadle SL, Hirtz JJ. 2023. Distinct topographic organization and network activity patterns of corticocollicular neurons within layer 5 auditory cortex. Front Neural Circuits 17: 1210057. DOI: https://doi.org/10.3389/fncir.2023.1210057, PMID: 37521334

      (4) Tischbirek CH, Noda T, Tohmi M, Birkner A, Nelken I, Konnerth A. 2019. In Vivo Functional Mapping of a Cortical Column at Single-Neuron Resolution. Cell Rep 27: 1319-1326 e5. DOI: https://doi.org/10.1016/j.celrep.2019.04.007, PMID: 31042460

      (2) It seems a bit peculiar that while 2721 CT neurons (N=10 mice) were imaged, less than half as many TR cells were imaged (n=1041 cells from N=5 mice). I would have expected there to be many more TR neurons even mouse for mouse (normalizing by number of neurons per mouse), but perhaps the authors were just interested in a comparison data set and not being as thorough or complete with the TR imaging?

      As shown in Figure 2-figure supplement 2, a much higher fraction of TR neurons was "tuned" to pure tones (46% of 1041 neurons) compared to CT neurons (only 18% of 2721 neurons). To obtain a statistically robust and comparable number of tuned neurons for our core analysis (481 tuned TR neurons vs. 491 tuned CT neurons), it was necessary to sample a larger total population of CT neurons, which required imaging from more animals.

      (3) The authors' definitions of neuronal response type in the methods need more quantitative detail. The authors state: "Irregular" neurons exhibited spontaneous activity with highly variable responses to sound stimulation. "Tuned" neurons were responsive neurons that demonstrated significant selectivity for certain stimuli. "Silent" neurons were defined as those that remained completely inactive during our recording period (> 30 min). For tuned neurons, the best frequency (BF) was defined as the sound frequency associated with the highest response averaged across all sound levels.". The authors need to define what their thresholds are for 'highly variable', 'significant', and 'completely inactive'. Is best frequency the most significant response, the global max (even if another stimulus evokes a very close amplitude response), etc.

      We appreciate the reviewer's suggestions. We have added a more detailed description to the Methods.

      Tuned neurons: A responsive neuron was further classified as "Tuned" if its responses showed significant frequency selectivity. We determined this using a one-way ANOVA on the neuron's response amplitudes across all tested frequencies (at the sound level that elicited the maximal response). If the ANOVA yielded a p-value < 0.05, the neuron was considered "Tuned". Irregular neurons: Responsive neurons that did not meet the statistical criterion for being "Tuned" (i.e., ANOVA p-value ≥ 0.05) were classified as "Irregular". This provides a clear, mutually exclusive category for sound-responsive but broadly-tuned or non-selective cells. Silent neurons: Neurons that were not responsive were classified as "Silent". This quantitatively defines them as cells that showed no significant stimulus-evoked activity during the entire recording session. Best frequency (BF): It is the frequency that elicited the maximal mean response, averaged across all sound levels.
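      The classification rules above can be written as a short decision procedure. The following is a minimal sketch, not the authors' code: it assumes the responsiveness flag and the ANOVA p-value have already been computed elsewhere, and the function names, the FRA layout (one row per sound level, one column per frequency), and the tie-breaking rule (lowest frequency wins) are all assumptions made for illustration.

```python
from typing import Sequence

ALPHA = 0.05  # significance threshold stated in the response


def classify_neuron(responsive: bool, anova_p: float) -> str:
    """Map the two criteria onto the three mutually exclusive classes."""
    if not responsive:
        return "Silent"
    return "Tuned" if anova_p < ALPHA else "Irregular"


def best_frequency(fra: Sequence[Sequence[float]],
                   freqs_khz: Sequence[float]) -> float:
    """Return the frequency with the largest response averaged over levels.

    fra[i][j] is the mean response at sound level i and frequency j.
    Ties are resolved in favour of the lowest index (an assumption).
    """
    n_levels = len(fra)
    mean_by_freq = [sum(row[j] for row in fra) / n_levels
                    for j in range(len(freqs_khz))]
    best_index = max(range(len(freqs_khz)), key=lambda j: mean_by_freq[j])
    return freqs_khz[best_index]


# Toy example: 2 sound levels x 3 frequencies (hypothetical values).
toy_fra = [[0.1, 0.5, 0.2],
           [0.2, 0.9, 0.3]]
print(classify_neuron(True, 0.01), best_frequency(toy_fra, [4.0, 8.0, 16.0]))
```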

      To provide greater clarity, we showed examples in the following figures.

      Author response image 3.

      Reviewer #1 (Recommendations For The Authors):

      (1) A1 and AuC were used exchangeably in the text.

      Thank you for pointing out this issue. Our terminological strategy was to remain faithful to the original terms used in the literature we cite, where "AuC" is often used more broadly. In the revised manuscript, we have performed a careful edit to ensure that we use the specific term "A1" (primary auditory cortex) when describing our own results and recording locations, which were functionally and anatomically confirmed.

      (2) Grammar mistakes throughout.

      We are grateful for the reviewer’s suggested improvement to our wording. The entire manuscript has undergone a thorough professional copyediting process to correct all grammatical errors and improve overall readability.

      (3) The discussion should talk more about how/why L6 CT neurons don't possess the tonotopic organization and what are the implications. Currently, it only says 'indicative of an increase in synaptic integration during cortical processing'...

      Thanks for this suggestion. We have substantially revised and expanded the Discussion section to explore the potential mechanisms and functional implications of the lack of tonotopy in L6 CT neurons.

      Broad pooling of inputs: We propose that the lack of tonotopy is an active computation, not a passive degradation. CT neurons likely pool inputs from a wide range of upstream neurons with diverse frequency preferences. This broad synaptic integration, reflected in their wider tuning bandwidth, would actively erase the fine-grained frequency map in favor of creating a different kind of representation.

      A shift from topography to abstract representation: This transformation away from a classic sensory map may be critical for the function of corticothalamic feedback. Instead of relaying "what" frequency was heard, the descending signal from CT neurons may convey more abstract, higher-order information, such as the behavioral relevance of a sound, predictions about upcoming sounds, or motor-related efference copy signals that are not inherently frequency-specific.

      Modulatory role of the descending pathway: The descending A1-to-MGB pathway is often considered to be modulatory, shaping thalamic responses rather than driving them directly. A modulatory signal designed to globally adjust thalamic gain or selectivity may not require, and may even be hindered by, a fine-grained topographical organization.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given that the CT and TR neurons were imaged at different depths, the question as to whether or not these differences could otherwise be explained by layer-specific differences is still not 100% resolved. Control measurements would be needed either by recording (1) CT neurons in upper layers (2) TR in deeper layers (3) non-CT in deeper layers and/or (4) non-TR in upper layers.

      We appreciate these constructive suggestions. To address this, we performed new experiments and analyses.

      Comparison of TR neurons across superficial layers: we analyzed our existing TR neuron dataset to see if response properties varied by depth within the superficial layers. We found no significant differences in the fraction of tuned neurons, field IQR, or maximum bandwidth (BWmax) between TR neurons in L2/3 and L4. This suggests a degree of functional homogeneity within the thalamorecipient population across these layers.

      Necessary control experiments.

      (1) CT neurons in upper layers. CT neurons are thalamic projection neurons that only exist in the deeper cortex, so CT neurons do not exist in upper layers (Antunes and Malmierca, 2021).

      (2) TR neurons in deeper layers. As we mentioned in the manuscript, high-titer AAV1-Cre can label neurons both anterogradely and retrogradely, so it is challenging to identify TR neurons unambiguously in deeper layers.

      (3) non-CT in deeper layers and/or (4) non-TR in upper layers.

      To directly test if projection identity confers distinct functional properties within the same cortical layers, we performed the crucial control of comparing TR neurons to their neighboring non-TR neurons. We injected AAV1-Cre in MGB and a Cre-dependent mCherry into A1 to label TR neurons red. We then co-injected AAV-CaMKII-GCaMP6s to label the general excitatory population green. In merged images, this allowed us to functionally image and directly compare TR neurons (yellow) and adjacent non-TR neurons (green). We separately recorded the responses of these neurons to pure tones using two-photon imaging. The results show that TR neurons are significantly more likely to be tuned to pure tones than their neighboring non-TR excitatory neurons. This finding provides direct evidence that a neuron's long-range connectivity, and not just its laminar location, is a key determinant of its response properties.

      Related publications:

      Antunes FM, Malmierca MS. 2021. Corticothalamic Pathways in Auditory Processing: Recent Advances and Insights From Other Sensory Systems. Front Neural Circuits 15: 721186. DOI: https://doi.org/10.3389/fncir.2021.721186, PMID: 34489648

      (3) V-shaped, I-shaped, or O-shaped is not an intuitively understood nomenclature, consider changing. Further, the x/y axis for Figure 4a is not labeled, so it's not clear what the heat maps are supposed to represent.

      The terms "V-shaped," "I-shaped," and "O-shaped" are an established nomenclature in the auditory neuroscience literature for describing frequency response areas (FRAs), and we use them for consistency with prior work. V-shaped: Neurons whose FRAs show decreasing frequency selectivity with increasing intensity. I-shaped: Neurons whose FRAs show constant frequency selectivity with increasing intensity. O-shaped: Neurons responsive to a small range of intensities and frequencies, with the peak response not occurring at the highest intensity level (Rothschild et al., 2010). We have included a more detailed description in the Methods.

      The X-axis represents 11 pure tone frequencies, and the Y-axis represents 6 sound intensities. So, the heat map represents the FRA of neurons in A1, reflecting the responses for different frequencies and intensities of sound stimuli. In the revised manuscript, we have provided clarifications in the figure legend.

      (4) Many references about projection neurons and cortical circuits are based on studies from visual or somatosensory cortex. Auditory cortex organization is not necessarily the same as other sensory areas. Auditory cortex references should be used specifically, and not sources reporting on S1, V1.

      We thank the reviewers for their valuable comments. We have made a concerted effort to ensure that claims about cortical circuit organization are supported by findings specifically from the auditory cortex wherever possible, strengthening the focus and specificity of our discussion.

      Reviewer #3 (Recommendations For The Authors):

      I suggest showing some more examples of how different neurons and receptive field properties were quantified and statistically analyzed. Especially in Figure 4, but really throughout.

      We thank the reviewer for this valuable suggestion. To provide greater clarity, we have added more examples in the following figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The study by Klug et al. investigated the pathway specificity of corticostriatal projections, focusing on two cortical regions. Using a G-deleted rabies system in D1-Cre and A2a-Cre mice to retrogradely deliver channelrhodopsin to cortical inputs, the authors found that M1 and MCC inputs to direct and indirect pathway spiny projection neurons (SPNs) are both partially segregated and asymmetrically overlapping. In general, corticostriatal inputs that target indirect pathway SPNs are likely to also target direct pathway SPNs, while inputs targeting direct pathway SPNs are less likely to also target indirect pathway SPNs. Such asymmetric overlap of corticostriatal inputs has important implications for how the cortex itself may determine striatal output. Indeed, the authors provide behavioral evidence that optogenetic activation of M1 or MCC cortical neurons that send axons to either direct or indirect pathway SPNs can have opposite effects on locomotion and different effects on action sequence execution. The conclusions of this study add to our understanding of how cortical activity may influence striatal output and offer important new clues about basal ganglia function. 

      The conceptual conclusions of the manuscript are supported by the data, but the details of the magnitude of afferent overlap and causal role of asymmetric corticostriatal inputs on behavioral outcomes were not yet fully resolved. 

      We appreciate the reviewer’s thoughtful understanding and acknowledgment that the conceptual conclusion of asymmetric projections from the cortex to the striatum is well supported by our data. We also recognize the importance of further elucidating the extent of afferent overlap and the causal contributions of asymmetric corticostriatal inputs to behavioral outcomes. However, we respectfully note that current technical limitations pose significant challenges to addressing these questions with high precision.

      In response to the reviewer’s comments, we have now clarified the sample size, added proper analysis and elaborated on the experimental design to ensure that our conclusions are presented more transparently and are more accessible to the reader.

      After virally labeling either direct pathway (D1) or indirect pathway (D2) SPNs to optogenetically tag pathway-specific cortical inputs, the authors report that a much larger number of "non-starter" D2-SPNs from D2-SPN labeled mice responded to optogenetic stimulation in slices than "non-starter" D1 SPNs from D1-SPN labeled mice did. Without knowing the relative number of D1 or D2 SPN starters used to label cortical inputs, it is difficult to interpret the exact meaning of the lower number of responsive D2-SPNs in D1 labeled mice (where only ~63% of D1-SPNs themselves respond) compared to the relatively higher number of responsive D1-SPNs (and D2-SPNs) in D2 labeled mice. While relative differences in connectivity certainly suggest that some amount of asymmetric overlap of inputs exists, differences in infection efficiency and ensuing differences in detection sensitivity in slice experiments make determining the degree of asymmetry problematic. 

      Thank you for highlighting this point. As it lies at the core of our manuscript, we agree that it is essential to present it clearly and convincingly. As shown by the statistics (Fig. 2B-F), non-starter D1- and D2-SPNs appear to receive fewer projections from D1-projecting cortical neurons (Input D1-record D1, 0.63; Input D1-record D2, 0.40) compared to D2-projecting cortical neurons (Input D2 - record D1, 0.73; Input D2 -record D2, 0.79).

      While it is not technically feasible to quantify the number of infected cells in brain slices following electrophysiological recordings, we addressed this limitation by collecting data from multiple animals and restricting recordings to cells located within the injection sites. In Figure 2D, we used 7 mice in the D1-projecting to D1 EGFP(+) group, 8 mice in the D1-projecting to D2 EGFP(-) group, 10 mice in the D2-projecting to D2 EGFP(+) group, and 8 mice in the D2-projecting to D1 EGFP(-) group. In Figure 2G, the group sizes were as follows: 8 mice in the D1-projecting to D2 EGFP(+) group, 7 mice in the D1-projecting to D1 EGFP(-) group, 8 mice in the D2-projecting to D1 EGFP(+) group, and 10 mice in the D2-projecting to D2 EGFP(-) group. In both panels, connection ratios were compared using Fisher’s exact test. Comparisons were then made across experimental groups. Furthermore, as detailed in our Methods section (page 20, line 399-401), we assessed cortical expression levels prior to performing whole-cell recordings. Taken together, these precautions help ensure that the calculated connection ratios are unlikely to be confounded by differences in infection efficiency.
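      For readers who want to see how a comparison of two connection ratios with Fisher's exact test works, here is a minimal, standard-library sketch. It is not the authors' analysis: the response reports ratios (for example 0.63 vs. 0.40) but not the underlying cell counts, so the counts in the usage example are hypothetical placeholders chosen only to give similar ratios, and the function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups; columns are (connected, not connected).
    The p-value sums the hypergeometric probabilities of all tables with
    the same margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: 19 of 30 recorded D1-SPNs responsive (about 0.63)
# versus 12 of 30 recorded D2-SPNs responsive (0.40).
p_value = fisher_exact_two_sided(19, 11, 12, 18)
print(f"two-sided p = {p_value:.4f}")
```

      Because the test conditions on the table margins, the result depends on the number of recorded cells per group as well as on the ratios themselves, which is why reporting sample sizes alongside the ratios matters.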

      It is also unclear if retrograde labeling of D1-SPN- vs D2-SPN- targeting afferents labels the same densities of cortical neurons. This gets to the point of specificity in the behavioral experiments. If the target-based labeling strategies used to introduce channelrhodopsin into specific SPN afferents label significantly different numbers of cortical neurons, might the difference in the relative numbers of optogenetically activated cortical neurons itself lead to behavioral differences? 

      Thank you for bringing this concern to our attention. While optogenetic manipulation has become a widely adopted tool in functional studies of neural circuits, it remains subject to several technical limitations due to the nature of its implementation. Factors such as opsin expression efficiency, optic fiber placement, light intensity, stimulation spread, and other variables can all influence the specificity and extent of neuronal activation or inhibition. As such, rigorous experimental controls are essential when interpreting the outcomes of optogenetic experiments.

      In our study, we verified both the expression of channelrhodopsin in D1- or D2-projecting cortical neurons and the placement of the optic fiber following the completion of behavioral testing. To account for variability, we compared the behavioral effects of optogenetic stimulation within the same animals, stimulated versus non-stimulated conditions, as shown in Figures 3 and 4. Moreover, Figure S3 includes important controls that rule out the possibility that the behavioral effects observed were due to direct activation of D1- or D2-SPNs in striatum or to light alone in the cortex.

      An additional point worth emphasizing is that the behavioral effects observed in the open field and ICSS tests cannot be attributed to differences in the number of neurons activated. Specifically, activation of D1-projecting cortical neurons promoted locomotion in the open field, whereas activation of D2-projecting cortical neurons did not. However, in the ICSS test, activation of both D1- and D2-projecting cortical neurons reinforced lever pressing. Given that only D1-SPN activation, but not D2-SPN activation, supports ICSS behavior, these effects are unlikely to result merely from differences in the number of neurons recruited.

      This rationale underlies our use of multiple behavioral paradigms to examine the functions of D1- and D2-projecting cortical neurons. By assessing behavior across distinct tasks, we aimed to approach the question from multiple angles and reduce the likelihood of spurious or confounding effects influencing our interpretation.

      In general, the manuscript would also benefit from more clarity about the statistical comparisons that were made and sample sizes used to reach their conclusions.

      We thank the reviewer for the valuable suggestion to improve the manuscript. In response, we have made the following changes and provided additional clarification:

      (1) In Figure 2D, we used 7 mice in the D1-projecting to D1 EGFP(+) group, 8 mice in the D1-projecting to D2 EGFP(-) group, 10 mice in the D2-projecting to D2 EGFP(+) group, and 8 mice in the D2-projecting to D1 EGFP(-) group. In Figure 2G, the group sizes were as follows: 8 mice in the D1-projecting to D2 EGFP(+) group, 7 mice in the D1-projecting to D1 EGFP(-) group, 8 mice in the D2-projecting to D1 EGFP(+) group, and 10 mice in the D2-projecting to D2 EGFP(-) group. In both panels, connection ratios were compared using Fisher’s exact test.

      (2) In Figure 3, we reanalyzed the data in panels O, P, R, and S using permutation tests to assess whether each individual group exhibited a significant ICSS learning effect. The figure legend has been revised accordingly as follows:

      (O-P) D1-SPN (red) but not D2-SPN stimulation (black) drives ICSS behavior in both the DMS (O: D1, n = 6, permutation test, slope = 1.5060, P = 0.0378; D2, n = 5, permutation test, slope = -0.2214, P = 0.1021; one-tailed Mann-Whitney test, Day 7 D1 vs. D2, P = 0.0130) and the DLS (P: D1, n = 6, permutation test, slope = 28.1429, P = 0.0082; D2, n = 5, permutation test, slope = -0.3429, P = 0.0463; one-tailed Mann-Whitney test, Day 7 D1 vs. D2, P = 0.0390). *, P < 0.05. (Q) Timeline of helper virus injections, rabies-ChR2 injections and optogenetic stimulation for ICSS behavior. (R-S) Optogenetic stimulation of the cortical neurons projecting to either D1- or D2-SPNs induces ICSS behavior in both the MCC (R: MCC-D1, n = 5, permutation test, Day1-Day7, slope = 2.5857, P = 0.0034; MCC-D2, n = 5, Day2-Day7, permutation test, slope = 1.4229, P = 0.0344; no significant effect on Day7, MCC-D1 vs. MCC-D2, two-tailed Mann-Whitney test, P = 0.9999) and the M1 (S: M1-D1, n = 5, permutation test, Day1-Day7, slope = 1.8214, P = 0.0259; M1-D2, n = 5, Day1-Day7, permutation test, slope = 1.8214, P = 0.0025; no significant effect on Day7, M1-D1 vs. M1-D2, two-tailed Mann-Whitney test, P = 0.3810). n.s., not statistically significant.

      (3) In Figure 4, we have added a comparison against a theoretical percentage change of zero to better evaluate the net effect of each manipulation. The results showed that in Figure 4D, optogenetic stimulation of D1-projecting MCC neurons significantly increased the pressing rate, whereas stimulation of D2-projecting MCC neurons did not (MCC-D1: n = 8, one-sample two-tailed t-test, t = 2.814, P = 0.0131; MCC-D2: n = 7, t = 0.8481, P = 0.4117). In contrast, in Figure 4H, optogenetic stimulation of both D1- and D2-projecting M1 neurons significantly increased the sequence press rate (M1-D1: n = 6, one-sample two-tailed Wilcoxon signed-rank test, P = 0.0046; M1-D2: n = 7, P = 0.0479).
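      The response does not spell out how the permutation tests on the learning slopes were constructed. One common construction, shown below purely as an illustrative sketch, fits a least-squares slope to presses versus training day and compares it with the slopes obtained after shuffling the day labels. The one-sided alternative, the exhaustive enumeration, the function names, and the toy data are all assumptions rather than the authors' procedure.

```python
from itertools import permutations
from typing import Sequence, Tuple


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def permutation_slope_test(days: Sequence[float],
                           presses: Sequence[float]) -> Tuple[float, float]:
    """Exact one-sided permutation test for a positive slope.

    Enumerates every reordering of `presses` (feasible for about 7 days)
    and returns (observed slope, fraction of reorderings whose slope is
    at least as large as the observed one).
    """
    observed = ols_slope(days, presses)
    at_least_as_large = 0
    total = 0
    for shuffled in permutations(presses):
        total += 1
        if ols_slope(days, shuffled) >= observed - 1e-12:
            at_least_as_large += 1
    return observed, at_least_as_large / total


# Toy data: mean lever presses on training days 1-7 (hypothetical values).
days = list(range(1, 8))
presses = [3, 5, 7, 9, 11, 13, 15]
slope, p = permutation_slope_test(days, presses)
print(f"slope = {slope:.4f}, one-sided p = {p:.5f}")
```

      With seven time points there are only 5040 orderings, so the smallest attainable one-sided p-value is 1/5040; tests that shuffle across animals or use random resampling would behave differently.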

      Reviewer #2 (Public Review):

      Summary: 

      Klug et al. use monosynaptic rabies tracing of inputs to D1- vs D2-SPNs in the striatum to study how separate populations of cortical neurons project to D1- and D2-SPNs. They use rabies to express ChR2, then patch D1-or D2-SPNs to measure synaptic input. They report that cortical neurons labeled as D1-SPN-projecting preferentially project to D1-SPNs over D2-SPNs. In contrast, cortical neurons labeled as D2-SPN-projecting project equally to D1- and D2-SPNs. They go on to conduct pathway-specific behavioral stimulation experiments. They compare direct optogenetic stimulation of D1- or D2-SPNs to stimulation of MCC inputs to DMS and M1 inputs to DLS. In three different behavioral assays (open field, intra-cranial self-stimulation, and a fixed ratio 8 task), they show that stimulating MCC or M1 cortical inputs to D1-SPNs is similar to D1-SPN stimulation, but that stimulating MCC or M1 cortical inputs to D2-SPNs does not recapitulate the effects of D2-SPN stimulation (presumably because both D1- and D2-SPNs are being activated by these cortical inputs). 

      Strengths: 

      Showing these same effects in three distinct behaviors is strong. Overall, the functional verification of the consequences of the anatomy is very nice to see. It is a good choice to patch only from mCherry-negative non-starter cells in the striatum.

      Thank you for your profound understanding and appreciation of our manuscript’s design and the methodologies employed. In the realm of neuroscience, quantifying synaptic connections is a formidable challenge. While the roles of the direct and indirect pathways in motor control have long been explored, the mechanism by which upstream cortical inputs govern these pathways remains shrouded in mystery at the circuitry level.

      In the ‘Go/No-Go’ model, the direct and indirect pathways operate antagonistically; in contrast, the ‘Co-activation’ model suggests that they work cooperatively to orchestrate movement. These distinct theories raise a compelling question: Do these two pathways receive inputs from the same upstream cortical neurons, or are they modulated by distinct subpopulations? Answering this question could provide vital clues as to whether these pathways collaborate or operate independently.

      Previous studies have revealed both differences and similarities in the cortical inputs to the direct and indirect pathways at the population level. Our investigation goes further and asks whether a single cortical input drives both pathways simultaneously, or whether distinct cortical subpopulations regulate each pathway separately. To address this, we employed rabies virus–mediated retrograde tracing from D1- or D2-SPNs and recorded non-starter SPNs to determine if they receive the same inputs as the starter SPNs. This approach allowed us to calculate the connection ratio and estimate the probable connection properties.

      Weaknesses: 

      One limitation is that all inputs to SPNs are expressing ChR2, so they cannot distinguish between different cortical subregions during patching experiments. Their results could arise because the same innervation patterns are repeated in many cortical subregions or because some subregions have preferential D1-SPN input while others do not.

      Thank you for raising this thoughtful concern. It is indeed not feasible to restrict ChR2 expression to a specific cortical region using the first-generation rabies-ChR2 system alone. A more refined approach would involve injecting Cre-dependent TVA and RG into the striatum of D1- or A2A-Cre mice, followed by rabies-Flp infection. Subsequently, a Flp-dependent ChR2 virus could be injected into the MCC or M1 to selectively label D1- or D2-projecting cortical neurons. This strategy would allow for more precise targeting and address many of the current limitations.

      However, a significant challenge lies in the cytotoxicity associated with rabies virus infection. Neuronal health begins to deteriorate substantially around 10 days post-infection, which provides an insufficient window for robust Flp-dependent ChR2 expression. We have tested several new rabies virus variants with extended survival times (Chatterjee et al., 2018; Jin et al., 2024), but unfortunately, they did not perform effectively or suitably in the corticostriatal systems we examined.

      In our experimental design, the aim is to delineate the connection probabilities from cortical neurons to D1- or D2-SPNs. The hypotheses we considered include the possibility that similar innervation patterns occur across multiple cortical subregions, that some subregions show preferential input to D1-SPNs while others do not, or that both scenarios hold. This led us to perform a series of behavioral tests using optogenetic activation of the D1- or D2-projecting cortical populations to determine which is the case.

      In the cortical areas we examined, MCC and M1, the behavioral results are consistent with our electrophysiological results. Specifically, when we stimulated the D1-projecting cortical neurons in either MCC or M1, mice exhibited facilitated locomotion in the open field test, the same effect produced by activating D1-SPNs in the striatum alone (MCC: Fig 3C & D vs. I; M1: Fig 3F & G vs. L). Conversely, stimulation of D2-projecting MCC or M1 cortical neurons resulted in behavioral effects that appeared to combine characteristics of both D1- and D2-SPN activation in the striatum (MCC: Fig 3C & D vs. J; M1: Fig 3F & G vs. M). Similar results were observed in the ICSS test. Our interpretation of these results is that the activation of D1-projecting neurons in the cortex induces behavioral changes akin to D1-SPN activation, while activation of D2-projecting neurons in the cortex leads to a combined effect of both D1- and D2-SPN activation. This suggests that at least some cortical regions, namely the ones we tested, follow the hypothesis we proposed.

      There are also some caveats with respect to the efficacy of rabies tracing. Although they only patch non-starter cells in the striatum, only 63% of D1-SPNs receive input from D1-SPN-projecting cortical neurons. It's hard to say whether this is "high" or "low," but one question is how far from the starter cell region they are patching. Without this spatial indication of where the cells that are being patched are relative to the starter population, it is difficult to interpret if the cells being patched are receiving cortical inputs from the same neurons that are projecting to the starter population. Convergence of cortical inputs onto SPNs may vary with distance from the starter cell region quite dramatically, as other mapping studies of corticostriatal inputs have shown specialized local input regions can be defined based on cortical input patterns (Hintiryan et al., Nat Neurosci, 2016, Hunnicutt et al., eLife 2016, Peters et al., Nature, 2021).

      This is a valid concern regarding anatomical studies. Investigating cortico-striatal connectivity at the single-cell level remains technically challenging due to current methodological limitations. At present, we rely on rabies virus-mediated trans-synaptic retrograde tracing to identify D1- or D2-projecting cortical populations. This anatomical approach is coupled with ex vivo slice electrophysiology to assess the functional connectivity between these projection-defined cortical neurons and striatal SPNs. This enables us to quantify connection ratios, for example, the proportion of D1-projecting cortical neurons that functionally synapse onto non-starter D1-SPNs.

      To ensure the robustness of our conclusions, it is essential that both the starter cells and the recorded non-starter SPNs receive comparable topographical input from the cortex and other brain regions. Therefore, we carefully designed our experiments so that all recorded cells were located within the injection site, were mCherry-negative (i.e., non-starter cells), and were surrounded by ChR2-mCherry-positive neurons. This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.

      These methodological details are also described in the section on ex vivo brain slice electrophysiology, specifically in the Methods section, lines 396–399:

      “D1-SPNs (eGFP-positive in D1-eGFP mice, or eGFP-negative in D2-eGFP mice) or D2-SPNs (eGFP-positive in D2-eGFP mice, or eGFP-negative in D1-eGFP mice) that were ChR2-mCherry-negative, but in the injection site and surrounded by cells expressing ChR2-mCherry were targeted for recording.”

      This experimental strategy was implemented to control for potential spatial biases and to enhance the interpretability of our connectivity measurements.

      A caveat for the optogenetic behavioral experiments is that these optogenetic experiments did not include fluorophore-only controls.

      Thank you for bringing this to our attention. A fluorophore-only control is indeed a valuable negative control, commonly used to rule out effects caused by light exposure independent of optogenetic manipulation. In this study, however, comparisons were made between light-on and light-off conditions within the same animal. This within-subject design, as employed in recent studies (Geddes et al., 2018; Zhu et al., 2025), is considered sufficient to isolate the effects of optogenetic manipulation.

      Furthermore, as shown in Figure S3, we conducted an additional control experiment in which optogenetic stimulation was applied to M1, while ensuring that ChR2 expression was restricted to the striatum via targeted viral infection. This approach serves as a functional equivalent to the control you suggested. Importantly, we observed no effects that could be attributed solely to light exposure, further supporting the conclusion that the observed outcomes in our main experiments are due to targeted optogenetic manipulation, rather than confounding effects of illumination.

      Lastly, by employing an in-animal comparison, measuring changes between stimulated and non-stimulated trials, we account for subject-specific variability and strengthen the interpretability of our findings.

      Another point of confusion is that other studies (Cui et al, J Neurosci, 2021) have reported that stimulation of D1-SPNs in DLS inhibits rather than promotes movement.

      Thank you for bringing the study by Cui and colleagues to our attention. While that study has generated some controversy, other independent investigations have demonstrated that activation of D1-SPNs in DLS facilitates locomotion and lever-press behaviors (Dong et al., 2025; Geddes et al., 2018; Kravitz et al., 2010).

      The discrepancy is still worth clarifying. The differences in behavioral outcomes observed between our study and that of Cui et al. may be attributable to several methodological factors, including differences in both the stereotaxic targeting coordinates and the optical fiber specifications used for stimulation.

      Specifically, in our experiments, the dorsomedial striatum (DMS) was targeted at coordinates AP +0.5 mm, ML ±1.5 mm, DV –2.2 mm, and the DLS at AP +0.5 mm, ML ±2.5 mm, DV –2.2 mm. In contrast, Cui et al. targeted the DMS at AP +0.9 mm, ML ±1.4 mm, DV –3.0 mm and the DLS at AP +0.7 mm, ML ±2.3 mm, DV –3.0 mm. These coordinates correspond to sites that are slightly more rostral and ventral compared to our own. Even subtle differences in anatomical targeting can result in activation of distinct neuronal subpopulations, which may account for the differing behavioral effects observed during optogenetic stimulation.
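      To make the size of the targeting difference concrete, the coordinates quoted above can be compared directly. This is a simple arithmetic sketch: the dictionary and function names are illustrative, ML is treated as an absolute value because targeting was bilateral, and the straight-line distance ignores differences in atlas or skull reference between laboratories.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]  # (AP, |ML|, DV) in mm, as quoted above

this_study: Dict[str, Coord] = {"DMS": (0.5, 1.5, -2.2), "DLS": (0.5, 2.5, -2.2)}
cui_et_al: Dict[str, Coord] = {"DMS": (0.9, 1.4, -3.0), "DLS": (0.7, 2.3, -3.0)}


def offset(ours: Coord, theirs: Coord) -> Coord:
    """Per-axis difference (theirs minus ours) in mm."""
    return tuple(t - o for o, t in zip(ours, theirs))


def separation(ours: Coord, theirs: Coord) -> float:
    """Straight-line distance between the two target sites in mm."""
    return math.dist(ours, theirs)


for region in ("DMS", "DLS"):
    d_ap, d_ml, d_dv = offset(this_study[region], cui_et_al[region])
    dist = separation(this_study[region], cui_et_al[region])
    print(f"{region}: dAP = {d_ap:+.1f}, dML = {d_ml:+.1f}, "
          f"dDV = {d_dv:+.1f}, distance = {dist:.2f} mm")
```

      A positive AP offset and a negative DV offset correspond to the "more rostral and ventral" description, and the separation in both regions is a little under 1 mm.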

      In addition, the optical fibers used in the two studies varied considerably. We employed fibers with a 200 µm core diameter and a numerical aperture (NA) of 0.37, whereas Cui et al. used fibers with a 250 µm core diameter and a higher NA of 0.66. The combination of a larger core and higher NA in their setup implies a broader spatial spread and deeper tissue penetration of light, likely resulting in activation of a larger neural volume. This expanded volume of stimulation may have engaged additional neural circuits not recruited in our experiments, further contributing to the divergent behavioral outcomes. Taken together, these differences in targeting and photostimulation parameters are likely key contributors to the distinct effects reported between the two studies.
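
      The effect of numerical aperture on light spread can be estimated geometrically from the relation theta = asin(NA / n). The sketch below is a rough illustration, not a model used in either study: the tissue refractive index of 1.36 and the 500 µm evaluation depth are assumptions introduced here, and scattering and absorption are ignored entirely.

```python
import math

TISSUE_INDEX = 1.36  # assumed refractive index of brain tissue (not from the response)


def half_angle_deg(na, n=TISSUE_INDEX):
    """Divergence half-angle in a medium of index n: theta = asin(NA / n)."""
    return math.degrees(math.asin(na / n))


def geometric_spot_diameter_um(core_um, na, depth_um, n=TISSUE_INDEX):
    """Diameter of the geometric light cone at a given depth (no scattering)."""
    theta = math.asin(na / n)
    return core_um + 2 * depth_um * math.tan(theta)


# Fiber specifications as quoted in the response; the depth is hypothetical.
for label, core_um, na in (("this study", 200, 0.37), ("Cui et al.", 250, 0.66)):
    angle = half_angle_deg(na)
    spot = geometric_spot_diameter_um(core_um, na, depth_um=500)
    print(f"{label}: half-angle {angle:.1f} deg, cone diameter at 500 um {spot:.0f} um")
```

      With these assumed values, the higher-NA fiber has a half-angle of about 29 degrees compared with about 16 degrees for the 0.37 NA fiber. This matches the broader lateral spread described above, although a purely geometric estimate says nothing about penetration depth.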

      Reviewer #3 (Public Review): 

      In the manuscript by Klug and colleagues, the investigators use a rabies virus-based methodology to explore potential differences in connectivity from cortical inputs to the dorsal striatum. They report that the connectivity from cortical inputs onto D1 and D2 MSNs differs in terms of their projections onto the opposing cell type, and use these data to infer that there are differences in cross-talk between cortical cells that project to D1 vs. D2 MSNs. Overall, this manuscript adds to the overall body of work indicating that there are differential functions of different striatal pathways which likely arise at least in part by differences in connectivity that have been difficult to resolve due to difficulty in isolating pathways within striatal connectivity and several interesting and provocative observations were reported. Several different methodologies are used, with partially convergent results, to support their main points.

      However, I have significant technical concerns about the manuscript as presented that make it difficult for me to interpret the results of the experiments. My comments are below.

      Major:

      There is generally a large caveat to the rabies studies performed here, which is that both TVA and the ChR2-expressing rabies virus have the same fluorophore. It is thus essentially impossible to determine how many starter cells there are, what the efficiency of tracing is, and which part of the striatum is being sampled in any given experiment. This is a major caveat given the spatial topography of the cortico-striatal projections. Furthermore, the authors make a point in the introduction about previous studies not having explored absolute numbers of inputs, yet this is not at all controlled in this study. It could be that their rabies virus simply replicates better in D1-MSNs than D2-MSNs. No quantifications are done, and these possibilities do not appear to have been considered. Without a greater standardization of the rabies experiments across conditions, it is difficult to interpret the results.

      We thank the reviewer for raising these questions, which merit further discussion.

      Firstly, the primary aim of our study is to investigate the connectivity of the corticostriatal pathway. Given the current technical limitations, it is not feasible to trace all the striatal SPNs connected to a single cortical neuron. Therefore, we approached this from the opposite direction, starting from D1- or D2-SPNs to retrogradely label upstream cortical neurons, and then identifying their connected SPNs via functional synaptic recordings. To achieve this, we employed the only available transsynaptic retrograde method: rabies virus-mediated tracing. Because we crossed D1- or D2-GFP mice with D1- or A2A-Cre mice to identify SPN subtypes during electrophysiological recordings, the conventional rabies-GFP system could not be used to distinguish starter cells without conflicting with the GFP labeling of SPNs. To overcome this, we tagged ChR2 expression with mCherry. In this setup, we recorded from mCherry-negative D1- or D2-SPNs that lay within the injection site and were surrounded by mCherry-positive neurons. This ensures that the recorded neurons are topographically matched to the starter cell population and receive input from the same cortical regions. We acknowledge that TVA-only and ChR2-expressing cells are both mCherry-positive and therefore indistinguishable in our system. As such, mCherry-positive cells likely comprise a mixture of starter cells and TVA-only cells, representing a somewhat broader population than starter cells alone. Nevertheless, by restricting recordings to mCherry-negative SPNs within the injection site, we ensure that our conclusions about functional connectivity remain valid and aligned with the primary objective of this study.

      Secondly, if rabies virus replication were significantly more efficient in D1-SPNs than in D2-SPNs, this would likely result in a higher observed connection probability in the D1-projecting group. However, we used consistent genetic strategies across all groups: D1-SPNs were defined as GFP-positive in D1-GFP mice and GFP-negative in D2-GFP mice, with D2-SPNs defined analogously. Recordings from both D1- and D2-SPNs were performed using the same methodology and under the same injection conditions within the same animals. This internal control helps mitigate the possibility that differential rabies infection efficiency biased our results.

      With these experimental safeguards in place, we found that 40% of D2-SPNs received input from D1-SPN-projecting cortical neurons, while 73% of D1-SPNs received input from D2-SPN-projecting cortical neurons. Although the ideal scenario would involve an even larger sample size to refine these estimates, the technical demands of post-rabies-infection electrophysiological recordings inherently limit throughput. Nonetheless, our approach represents the most feasible and accurate method currently available, and provides a significant advance in characterizing the functional connectivity within corticostriatal circuits.

      The authors claim using a few current clamp optical stimulation experiments that the cortical cells are healthy, but this result was far from comprehensive. For example, membrane resistance, capacitance, general excitability curves, etc are not reported. In Figure S2, some of the conditions look quite different (e.g., S2B, input D2-record D2, the method used yields quite different results that the authors write off as not different). Furthermore, these experiments do not consider the likely sickness and death that occurs in starter cells, as has been reported elsewhere. The health of cells in the circuit is overall a substantial concern that alone could invalidate a large portion, if not all, of the behavioral results. This is a major confound given those neurons are thought to play critical roles in the behaviors being studied. This is a major reason why first-generation rabies viruses have not been used in combination with behavior, but this significant caveat does not appear to have been considered, and controls e.g., uninfected animals, infected with AAV helpers, etc, were not included.

      We understand and appreciate the reviewer’s concern regarding the potential cytotoxicity of rabies virus infection. Indeed, this is a critical consideration when interpreting functional connectivity data. We have tested several newer rabies virus variants reported to support extended survival times (Chatterjee et al., 2018; Jin et al., 2024), but unfortunately, these variants did not perform reliably in the corticostriatal circuits we examined.

      Given these limitations, we relied on the rabies virus approach originally developed by Osakada et al. (Osakada et al., 2011), which demonstrated that neurons infected with rabies virus expressing ChR2 remain both viable and functional up to at least 10 days post-infection (Fig. 3, cited below). In our own experiments, we further validated the health and viability of cortical neurons, the presynaptic partners of SPNs, particularly around day 7 post-infection.

      To minimize the risk of viral toxicity, we performed ex vivo slice recordings within a conservative time window, between 4 and 8 days after infection, when the health of labeled neurons is well maintained. Moreover, the recorded SPNs were consistently mCherry-negative, indicating they were not directly infected by rabies virus, thus further reducing the likelihood of recording from compromised cells.

      Taken together, these steps help ensure that our synaptic recordings reflect genuine functional connectivity, rather than artifacts of viral toxicity. We hope this clarifies the rationale behind our experimental design.

      For the behavioral tests, including a naïve uninfected group and an AAV helper virus-only group as negative controls could help isolate the specific impact of rabies virus infection. However, our primary focus is on the optogenetic activation of selected presynaptic inputs to D1- or D2-SPNs. Therefore, comparing stimulated versus non-stimulated trials within the same animal offers more direct and relevant results for our study objectives.

      It is also important to note that the ICSS test is particularly susceptible to the potential cytotoxic effects of rabies virus, as it spans a relatively extended period, from Day 4 to Day 12 post-infection. To mitigate this issue, we focused our analysis on the first 7 days of ICSS testing, thereby keeping the behavioral observations within 10 days post-rabies injection. This approach minimizes potential confounds from rabies-induced neurotoxicity while still capturing the relevant behavioral dynamics. Accordingly, we have revised Figure 3 and updated the statistical analyses to reflect this adjustment.

      The overall purity (e.g., EnvA-pseudotyping efficiency) of the RABV prep is not shown. If there was a virus that was not well EnvA-pseudotyped and thus could directly infect cortical (or other) inputs, it would degrade specificity.

      We agree that anatomical specificity is crucial for accurately labeling inputs to defined SPN populations in our study. The rabies virus strain employed here has been rigorously validated for its specificity in numerous previous studies from our group and others (Aoki et al., 2019; Klug et al., 2018; Osakada et al., 2011; Smith et al., 2016; Wall et al., 2013; Wickersham et al., 2007). For example, in a recent study by Aoki et al. (Aoki et al., 2019), we tested the same rabies virus strain by co-injecting the glycoprotein-deleted rabies virus and the TVA-expressing helper virus, without glycoprotein expressing AAV, into the SNr. As shown in Figure S1 (related to Figure 2), GFP expression was restricted to starter cells within the SNr, with no evidence of transsynaptic labeling in upstream regions such as the striatum, EPN, GPe, or STN (see panels F–H). These findings provide strong evidence that the rabies virus used in our experiments is properly pseudotyped and exhibits high specificity for starter cell labeling without off-target spread.

      We appreciate the reviewer’s emphasis on specificity, and we hope this clarification further supports the reliability of our anatomical tracing approach.

      While most of the study focuses on the cortical inputs, in slice recordings, inputs from the thalamus are not considered, yet likely contribute to the observed results. Related to this, in in vivo optogenetic experiments, technically, if the thalamic or other inputs to the dorsal striatum project to the cortex, their method will not only target cortical neurons but also terminals of other excitatory inputs. If this cannot be ruled out, stating that the authors are able to selectively activate the cortical inputs to one or the other population should be toned down.

      We agree with the reviewer that the thalamus is also a significant source of excitatory input to the striatum. However, current techniques do not allow for precise and exclusive labeling of upstream neurons in a given brain region, such as the cortex or thalamus. This technical limitation indeed makes it difficult to definitively determine whether inputs from these regions follow the same projection rules. Despite this, our findings show that stimulation of defined cortical populations, specifically, D1- or D2-projecting neurons in MCC and M1, elicits behavioral outcomes that closely mirror those observed in our ex vivo slice recordings, providing strong support for the cortical origin of the effects we observed.

      In our in vivo optogenetic experiments, we acknowledge that stimulating a specific cortical region may also activate axonal terminals from rabies-infected cortical or thalamic neurons. While somatic stimulation is generally more effective than terminal stimulation, we recognize the possibility that terminals on non-rabies-traced cortical neurons could be activated through presynaptic connections. To address this, we considered the findings of a previous study (Cruikshank et al., 2010), which demonstrated that while brief optogenetic stimulation (0.05 ms) of thalamo-cortical terminals can elicit a few action potentials in postsynaptic cortical neurons, sustained terminal stimulation (500 ms) also results in only transient postsynaptic firing rather than prolonged activation (Fig. 3C, cited below). This suggests that cortical neurons exhibit only short-lived responses to continuous presynaptic stimulation of thalamic origin.

      In comparison, our behavioral paradigms employed prolonged optogenetic stimulation protocols (20 Hz, 10 ms pulses for 15 s in the open-field test, 1 s in ICSS, and 8 s in FR4/8), which more closely resemble sustained stimulation conditions. Given these parameters and the robust behavioral responses observed, the effects are most likely mediated primarily by activation of rabies-labeled, ChR2-expressing D1- or D2-projecting cortical neurons rather than by indirect activation through thalamic input.
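
      To make the stimulation parameters quoted above concrete, the short sketch below converts each protocol into a pulse count, cumulative light-on time, and duty cycle. It is simple arithmetic on the stated values; the function name `pulse_train_summary` is introduced here for illustration.

```python
def pulse_train_summary(freq_hz, pulse_ms, duration_s):
    """Return (number of pulses, cumulative light-on time in s, duty cycle)."""
    n_pulses = round(freq_hz * duration_s)
    light_on_s = n_pulses * pulse_ms / 1000.0
    duty_cycle = freq_hz * pulse_ms / 1000.0
    return n_pulses, light_on_s, duty_cycle


# Train durations per assay, as stated in the response (20 Hz, 10 ms pulses).
protocols = {"open field": 15, "ICSS": 1, "FR4/8": 8}

for assay, duration_s in protocols.items():
    n, on_s, duty = pulse_train_summary(freq_hz=20, pulse_ms=10, duration_s=duration_s)
    print(f"{assay}: {n} pulses, {on_s:.1f} s light on, duty cycle {duty:.0%}")
```

      Each train therefore runs at a 20% duty cycle, with 300, 20, and 160 pulses for the open-field, ICSS, and FR4/8 protocols, respectively.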

      We appreciate the reviewer’s valuable comment, and we have now incorporated this point into the revised manuscript (page 13, line 265 to 275) to more clearly address the potential contribution of thalamic inputs in our experimental design.

      The statements about specificity of connectivity are not well-founded. It may be that in the specific case where they are assessing outside of the area of injections, their conclusions may hold (e.g., excitatory inputs onto D2s have more inputs onto D1s than vice versa). However, how this relates to the actual site of injection is not clear. At face value, if such a connectivity exists, it would suggest that D1-MSNs receive substantially more overall excitatory inputs than D2s. It is thus possible that this observation would not hold over other spatial intervals. This was not explored and thus the conclusions are over-generalized. e.g., the distance from the area of red cells in the striatum to recordings was not quantified, what constituted a high level of cortical labeling was not quantified, etc. Without more rigorous quantification of what was being done, it is difficult to interpret the results. 

      We sincerely thank the reviewer for the thoughtful comments and critical insights into our interpretation of connectivity data. These concerns are valid and provide an important opportunity to clarify and reinforce our experimental design and conclusions.

      Firstly, as described in our previous response, all patched neurons were carefully selected to be within the injection site and in close proximity to ChR2-mCherry-positive cells. Specifically, the estimated distance from each recorded neuron to the nearest starter cells did not exceed 100 µm. This design choice was made to minimize variability associated with spatial distance or heterogeneity in viral expression, thereby allowing for a more consistent sampling of putatively connected neurons.

      Secondly, quantifying both the number of starter and input neurons would, in principle, provide a more comprehensive picture of connectivity. However, given the technical limitations of the current approach, particularly when combining rabies tracing with functional recordings, it is not feasible to obtain such precise cell counts. Instead, we focused on connection ratios derived from targeted electrophysiological recordings, which offer a reliable and practical means of estimating connectivity within these defined circuits.

      Thirdly, regarding the potential influence of rabies-labeled neurons beyond the immediate recording site: while we acknowledge that rabies tracing labels a broad set of upstream neurons, our analysis was confined to a well-defined and localized area. The analogy we find helpful here is that of a spotlight - our recordings were restricted to the illuminated region directly under the beam, where the projection pattern is fixed and interpretable, regardless of what lies outside that area. Although we cannot fully account for all possible upstream connections, our methodology was designed to minimize variability and maintain consistency in the region of interest, which we believe supports the robustness of our conclusions in the ex vivo slice recording experiment.

      We hope this additional explanation addresses the reviewer’s concerns and helps clarify the rationale of our experimental strategy.

      The results in figure 3 are not well controlled. The authors show contrasting effects of optogenetic stimulation of D1-MSNs and D2-MSNs in the DMS and DLS, results which are largely consistent with the canon of basal ganglia function. However, when stimulating cortical inputs, stimulating the inputs from D1-MSNs gives the expected results (increased locomotion) while stimulating putative inputs to D2-MSNs had no effect. This is not the same as showing a decrease in locomotion - showing no effect here is not possible to interpret.

      We apologize for any confusion and appreciate the opportunity to clarify this point. Our electrophysiological recordings demonstrated that D1-projecting cortical neurons preferentially innervate D1-SPNs in the striatum, whereas D2-projecting cortical neurons provide input to both D1- and D2-SPNs, without a clear preference. These synaptic connectivity patterns are further supported by our behavioral experiments: optogenetic stimulation of D1-projecting neurons in cortical areas such as MCC and M1 led to behavioral effects consistent with direct D1-SPN activation. In contrast, stimulation of D2-projecting cortical neurons produced behavioral outcomes that appeared to reflect a mixture of both D1- and D2-SPN activation.

      We acknowledge that interpreting negative behavioral findings poses inherent challenges, as it is difficult to distinguish between a true lack of effect and insufficient experimental manipulation. To mitigate this, we ensured that all animals included in the analysis exhibited appropriate viral expression and correctly placed optic fibers in the targeted regions. These controls help to confirm that the observed behavioral effects - or lack thereof - are indeed due to the activation of the intended neuronal populations rather than technical artifacts such as weak expression or fiber misplacement.

      As shown in Author response image 1 below, our verification of virus expression and fiber positioning confirms effective targeting in MCC and M1 of A2A-Cre mice. Therefore, we interpret the negative behavioral outcomes as meaningful consequences of specific neural circuit activation.

      Author response image 1.

      Confocal image from A2A-Cre mouse showing targeted optogenetic stimulation of D2-projecting cortical neurons in MCC or M1. ChR2-mCherry expression highlights D2-projecting neurons, selectively labeled via rabies-mediated tracing. Optic fiber placement is confirmed above the cortical region of interest. Image illustrates robust expression and anatomical specificity necessary for pathway-selective stimulation in behavioral assays.

      In light of their circuit model, the result showing that inputs to D2-MSNs drive ICSS is confusing. How can the authors account for the fact that these cells are not locomotor-activating, stimulation of their putative downstream cells (D2-MSNs) does not drive ICSS, yet the cortical inputs drive ICSS? Is the idea that these inputs somehow also drive D1s? If this is the case, how do D2s get activated, if all of the cortical inputs tested net activate D1s and not D2s? Same with the results in figure 4 - the inputs and putative downstream cells do not have the same effects. Given the potential caveats of differences in viral efficiency, spatial location of injections, and cellular toxicity, I cannot interpret these experiments.

      We apologize for any confusion in our previous explanation. In our behavioral experiments, the primary objective was to determine whether activation of D1- or D2-projecting cortical neurons would produce behavioral outcomes distinct from those observed with pure D1 or D2 activation.

      Our findings show that stimulation of D1-projecting cortical neurons produced behavioral effects closely resembling those of selective D1 activation in both open field and ICSS tests. This is consistent with our slice recording data, which revealed that D1-projecting cortical neurons exhibit a higher connection probability with D1-SPNs than with D2-SPNs.

      In contrast, interpreting the effects of D2-projecting cortical neuron stimulation is inherently more nuanced. In the open field test, activation of these neurons did not significantly modulate locomotion. This could reflect a balanced influence of D1 activation, which facilitates movement, and D2 activation, which suppresses it, resulting in a net neutral behavioral outcome. In the ICSS test, the absence of a strong reinforcement effect typically associated with D2 activation, combined with partial reinforcement likely due to concurrent D1 activation, suggests that stimulation of D2-projecting neurons produces a mixed behavioral signal. This outcome supports the interpretation that these neurons synapse onto both D1- and D2-SPNs, leading to a blended behavioral response that differs from selective D1 or D2 activation alone.

      Together, these two behavioral assays offer complementary perspectives, providing a more complete view of how projection-specific cortical inputs influence striatal output and behavior.

      In Figure 4 of the current manuscript (as cited below), we show that optogenetic activation of MCC neurons projecting to D1-SPNs facilitates sequence lever pressing, whereas activation of MCC neurons projecting to D2-SPNs does not induce significant behavioral changes. Conversely, activation of M1 neurons projecting to either D1- or D2-SPNs enhances lever pressing sequences. These observations align with our prior findings (Geddes et al., 2018; Jin et al., 2014), where we demonstrated that in the striatum, D1-SPN activation facilitates ongoing lever pressing, whereas D2-SPN activation is more involved in suppressing ongoing actions and promoting transitions between sub-sequences, as shown in Fig. 4 from (Geddes et al., 2018; Jin et al., 2014) and Fig. 5K from (Jin et al., 2014). Taken together, the facilitation of lever pressing by D1-projecting MCC and M1 neurons is consistent with their preferential connectivity to D1-SPNs and their established behavioral role.

      What is particularly intriguing, though admittedly more complex, is the behavioral divergence observed upon activation of D2-SPN-projecting cortical neurons. Activation of D2-projecting MCC neurons does not alter lever pressing, possibly reflecting a counterbalancing effect from concurrent D1- and D2-SPN activation. In contrast, stimulation of D2-projecting M1 neurons facilitates lever pressing, albeit less robustly than their D1-projecting counterparts. This discrepancy may reflect regional differences in striatal targets, DMS for MCC versus DLS for M1, as also supported by our open field test results. Furthermore, our recent findings (Zhang et al., 2025) show that synaptic strength from Cg to D2-SPNs is stronger than to D1-SPNs, whereas the M1 pathway exhibits the opposite pattern. These data suggest that beyond projection ratios, synaptic strength also shapes cortico-striatal functional output. Thus, stronger D2-SPN synapses in the DMS may offset D1-SPN activation during MCC-D2 stimulation, dampening the increase in lever pressing. Conversely, weaker D2 synapses in the DLS may permit M1-D2 projections to facilitate behavior more readily.

      In summary, the behavioral outcomes of our optogenetic manipulations support the proposed asymmetric cortico-striatal connectivity model. While the effects of D2-projecting neurons are not uniform, they reflect varying balances of D1 and D2-SPN influence, which further underscores the asymmetrical connections of cortical inputs to the striatum.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) What are the sample sizes for Fig S2? Some trends that are listed as nonsignificant look like they may just be underpowered. Related to this point, S2C indicates that PPR is statistically similar in all conditions. The traces shown in Figure 2 suggest that PPR is quite different in "Input D1"- vs "Input D2" projections. If there is indeed no difference, the exemplar traces should be replaced with more representative ones to avoid confusion. 

      Thank you for your suggestion. The sample size reported in Figure S2 corresponds to the neurons identified as connected in Figure 2. The representative traces shown in Figure 2 were selected based on their close alignment with the amplitude statistics and are intended to reflect typical responses. Given this, it is appropriate to retain the current examples as they accurately illustrate the underlying data.

      (2) Previous studies have described that SPN-SPN collateral inhibition is also asymmetric, with D2->D1 SPN connectivity stronger than the other direction. While cortical inputs to D2-SPNs may also strongly innervate D1-SPNs, it would be helpful to speculate on how collateral inhibition may further shape the biases (or lack thereof) reported here. 

      This would indeed be an interesting topic to explore. SPN-SPN mutual inhibition and/or interneuron inhibition may also play a role in the functional organization and output of the striatum. In the present study, we focused on the primary layer of cortico-striatal connectivity to examine how cortical neurons selectively connect to the striatal direct and indirect pathways, as these pathways have been shown to have distinct yet cooperative functions. To achieve this, we applied a GABAA receptor inhibitor to isolate only excitatory synaptic currents in SPNs, yielding the relevant results.

      To investigate additional circuit organization involving SPN-SPN mutual inhibition, the currently available technique would be single-cell-initiated rabies tracing. This approach would help identify the starter SPN and the upstream SPNs that provide input to the starter cell, thereby offering a clearer understanding of the local circuit.

      (3) In Fig 3N-S there are no stats confirming that optogenetic stimulation does indeed increase lever pressing in each group (though it obviously looks like it does). It would be helpful to add statistics for this comparison, in addition to the between-group comparisons that are shown. 

      We thank the reviewer for this thoughtful suggestion. To assess whether optogenetic stimulation increases lever pressing in each group shown in Figures 3O, 3P, 3R, and 3S, we employed a permutation test (10,000 permutations). This non-parametric statistical method does not rely on assumptions about the underlying data distribution and is particularly appropriate for our analysis given the relatively small sample sizes.
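
      A permutation test on the slope of lever presses across days can be sketched as follows. This is a minimal illustration of the general technique, not the exact procedure used for Figure 3: the choice to shuffle day order within each animal, the pooled least-squares slope as the test statistic, the one-sided P value, and the toy data are all assumptions introduced here.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def permutation_slope_test(presses_by_animal, n_perm=10_000, seed=0):
    """One-sided permutation test for a positive slope across days.

    presses_by_animal: list of per-animal lists, one value per day.
    Day order is shuffled within each animal (an assumed null model).
    """
    rng = random.Random(seed)
    n_days = len(presses_by_animal[0])
    xs = list(range(1, n_days + 1)) * len(presses_by_animal)

    def pooled_slope(rows):
        return ols_slope(xs, [value for row in rows for value in row])

    observed = pooled_slope(presses_by_animal)
    hits = 0
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in presses_by_animal]
        if pooled_slope(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimated P value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical lever-press counts (3 animals x 7 days), for illustration only.
toy_data = [
    [2, 3, 5, 6, 8, 9, 11],
    [1, 2, 4, 5, 7, 8, 10],
    [3, 3, 6, 7, 8, 10, 12],
]
slope, p_value = permutation_slope_test(toy_data)
print(f"slope = {slope:.4f}, P = {p_value:.4f}")
```

      Because the null distribution is built from the data themselves, no distributional assumption is needed, which is the property that motivates this test for small groups.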

      Additionally, in response to Reviewer 3’s concern regarding the potential cytotoxicity of rabies virus affecting behavioral outcomes during in vivo optogenetic stimulation experiments, we focused our analysis on Days 1 through 7 of the ICSS test. This time window remains within 10 days post-rabies infection, a period during which previous studies have reported minimal cytopathic effects (Osakada et al., 2011).

      Accordingly, we have updated Figure 3N-S and revised the associated statistical analyses in the figure legend as follows:

      (O-P) D1-SPN (red) but not D2-SPN stimulation (black) drives ICSS behavior in both the DMS (O: D1, n = 6, permutation test, slope = 1.5060, P = 0.0378; D2, n = 5, permutation test, slope = -0.2214, P = 0.1021; one-tailed Mann Whitney test, Day 7 D1 vs. D2, P = 0.0130) and the DLS (P: D1, n = 6, permutation test, slope = 28.1429, P = 0.0082; D2, n = 5, permutation test, slope = -0.3429, P = 0.0463; one-tailed Mann Whitney test, Day 7 D1 vs. D2, P = 0.0390). *, P < 0.05. (Q) Timeline of helper virus injections, rabies-ChR2 injections and optogenetic stimulation for ICSS behavior. (R-S) Optogenetic stimulation of the cortical neurons projecting to either D1- or D2-SPNs induces ICSS behavior in both the MCC (R: MCC-D1, n = 5, permutation test, Day1-Day7, slope = 2.5857, P = 0.0034; MCC-D2, n = 5, Day2-Day7, permutation test, slope = 1.4229, P = 0.0344; no significant effect on Day7, MCC-D1 vs. MCC-D2,  two-tailed Mann Whitney test, P = 0.9999) and the M1 (S: M1-D1, n = 5, permutation test, Day1-Day7, slope = 1.8214, P = 0.0259; M1-D2, n = 5, Day1-Day7, permutation test, slope = 1.8214, P = 0.0025; no significant effect on Day7, M1-D1 vs. M1-D2, two-tailed Mann Whitney test, P = 0.3810). n.s., not statistically significant.

      We believe this updated analysis and additional context further strengthen the validity of our conclusions regarding the reinforcement effects.

      (4) Line 206: mice were trained for "a few more days" is not a very rigorous description. It would be helpful to state the range of additional days of training. 

      We thank the reviewer for the suggestion. In accordance with the Methods section, we have now specified the number of days, which is 4 days, in the main text (line 207).

      (5) In Fig 4D,H, the statistical comparison is relative modulation (% change) by stimulation of D1- vs D2- projecting inputs. Please show statistics comparing the effect of stimulation on lever presses for each individual condition. For example, is the effect of MCC-D2 stimulation in panel D negative or not significant? 

      Thank you for your suggestion. Below are the statistical results, which we have also incorporated into the figure legend for clarity. To assess the net effects of each manipulation, we compared the observed percentage changes with a theoretical value of zero.

      In Figure 4D, optogenetic stimulation of D1-projecting MCC neurons significantly increased the pressing rate (MCC-D1, n = 8, one-sample two-tailed t-test, t = 2.814, P = 0.0131), whereas stimulation of D2-projecting MCC neurons did not produce a significant effect (MCC-D2, n = 7, one-sample two-tailed t-test, t = 0.8481, P = 0.4117).

      In contrast, Figure 4H shows that optogenetic stimulation of both D1- and D2-projecting M1 neurons significantly increased the sequence press rate (M1-D1, n = 6, one-sample two-tailed Wilcoxon signed-rank test, P = 0.0046; M1-D2, n = 7, one-sample two-tailed Wilcoxon signed-rank test, P = 0.0479).

      These analyses help clarify the distinct behavioral effects of manipulating different corticostriatal projections.
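
      The comparison against a theoretical value of zero corresponds to a one-sample t statistic, which can be sketched with the standard library alone. The values below are hypothetical percentage changes chosen for illustration, not data from Figure 4, and the function name `one_sample_t` is introduced here. Converting t to a P value requires the t distribution with n - 1 degrees of freedom, which is left to a statistics package.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical percentage changes in press rate, one value per animal.
pct_change = [12.0, 5.0, 20.0, -3.0, 15.0, 8.0]
t_stat, df = one_sample_t(pct_change)
print(f"t({df}) = {t_stat:.3f}")
```

      When the normality assumption is doubtful, the Wilcoxon signed-rank test used for the M1 groups plays the same role by ranking the absolute changes instead of using their raw magnitudes.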

      (6) Are data in Fig 1G-H from a D1- or A2a- cre mouse? 

      The data in Fig 1G-H are from a D1-Cre mouse.

      (7) In Fig S3 it looks like there may actually be an effect of 20 Hz stimulation of D2-SPNs. Though it probably doesn't affect the interpretation.

      As indicated by the statistics, there is a slight, but not statistically significant, decrease in locomotion when 20 Hz stimulation is delivered to the motor cortex with ChR2 expression in D2-SPNs in the striatum.

      Reviewer #2 (Recommendations For The Authors): 

      The rabies tracing is referred to on several occasions as "new" but the reference papers are from 2011, 2013, and 2018. It is unclear what is new about the system used in the paper and what new feature is relevant to the experiments that were performed. Either clarify or remove "new" terminology. 

      Thank you for bringing this to our attention. We have revised the relevant text accordingly at line 20 in the Abstract, line 31 in the In Brief, line 69 in the Introduction, line 83 in the Results, and line 226 in the Discussion to improve clarity and accuracy.

      In Figure 2 D and G, D1 eGFP (+) and D2 eGFP(-) are plotted separately. These are the same cell type; therefore it may work best to combine that data. This could also be done for 'input to D2- Record D2' in panel D as well as 'input D1-Record D2' and 'input D2-Record D1' in panel G. Combining the information in panel D and G and comparing all 4 conditions to each other would give a better understanding of the comparison of functional connectivity between cortical neurons and D1 and D2 SPNs. 

      We thank the reviewer for the thoughtful suggestion. While presenting single bars for each condition (e.g., ‘input D1 - record D1’) might improve visual simplicity, it would obscure an important aspect of our experimental design. Specifically, we aimed to highlight that the comparisons between D1- and D2-projecting neurons to D1 and D2 SPNs were counterbalanced within the same animals, not just across different groups. By showing both D1-eGFP(+) and D2-eGFP(-), or vice versa, within each group and at similar proportions, we provide a more complete picture of the internal control built into our design. This format helps assure the audience that our conclusions are not biased by group-level differences, but are supported by within-subject comparisons. Therefore, we believe that the current presentation better communicates the rigor and balance of our experimental approach.

      The findings in Figure 2 are stated as D1 projecting excitatory inputs have a higher probability of targeting D1 SPNs while D2 projecting excitatory inputs target both D1 SPNs and D2 SPNs. It may be more clear to say that some cortical neurons project specifically to D1 SPNs while other cortical neurons project to both D1 and D2 SPNs equally. A better summary diagram could also help with clarity. 

      Thank you for bringing this up. The data we present reflect the connection probabilities of D1- or D2-projecting cortical neurons to D1 or D2 SPNs. One possible interpretation, as the reviewer suggests, is that a subset of cortical neurons preferentially targets D1 SPNs, while others exhibit more balanced projections to both D1 and D2 SPNs. However, we cannot rule out alternative explanations: for example, that some D2-projecting neurons preferentially target D2 SPNs, or that the observed differences arise from the overall proportions of D1- and D2-projecting cortical neurons connecting to each striatal subtype.

      There are multiple possible patterns of connectivity that could give rise to the observed differences in connection ratios. Based on our current data, we can confidently conclude the existence of asymmetric cortico-striatal projections to the direct and indirect pathways, but the precise nature of this asymmetry will require further investigation.

      Figure 4 introduces the FR8 task, but there are similar takeaways to the findings from Figure 3. Is there another justification for the FR8 task or interesting way of interpreting that data that could add richness to the manuscript?

      The FR8 task is a self-initiated operant sequence task that relies on motor learning mechanisms, whereas the open field test solely assesses spontaneous locomotion. Furthermore, the sequence task enables us to dissect the functional role of specific neuronal populations in the initiation, maintenance, and termination of sequential movements through closed-loop optogenetic manipulations integrated into the task design. These methodological advantages underscore the rationale for including Figure 4 in the manuscript, as it highlights the unique insights afforded by this experimental paradigm.

      I am somewhat surprised to see that D1-SPN stimulation in DLS gave the results in Figure 3 F and P, as mentioned in the public review. These contrast with some previous results (Cui et al, J Neurosci, 2021). Any explanation? Would be useful to speculate or compare parameters as this could have important implications for DLS function.

      Thank you for raising this point. While Cui’s study has generated some debate, several independent investigations have consistently demonstrated that stimulation of D1-SPNs in the dorsolateral striatum (DLS) facilitates local motion and lever-press behaviors (Dong et al., 2025; Geddes et al., 2018; Kravitz et al., 2010). These findings support the functional role of D1-SPNs in promoting movement and motivated actions.

      The differences in behavioral outcomes observed between our study and that of Cui et al. may stem from several methodological factors, particularly related to anatomical targeting and optical stimulation parameters.

      Specifically, our experiments targeted the DMS at AP +0.5 mm, ML ±1.5 mm, DV –2.2 mm, and the DLS at AP +0.5 mm, ML ±2.5 mm, DV –2.2 mm. In contrast, Cui’s study targeted the DMS at AP +0.9 mm, ML ±1.4 mm, DV –3.0 mm, and the DLS at AP +0.7 mm, ML ±2.3 mm, DV –3.0 mm. These differences indicate that their targeting was slightly more rostral and more ventral than ours, which could have led to stimulation of distinct neuronal populations within the striatum, potentially accounting for variations in behavioral effects observed during optogenetic activation.

      In addition, the optical fibers used in the two studies differed markedly. We employed optical fibers with a 200 µm core diameter and a numerical aperture (NA) of 0.37. Cui’s study used fibers with a larger core diameter (250 µm) and a higher NA (0.66), which would produce a broader spread and deeper penetration of light. This increased photostimulation volume may have recruited a more extensive network of neurons, possibly including off-target circuits, thus influencing the behavioral outcomes in a manner not seen in our more spatially constrained stimulation paradigm.
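
      To make the comparison concrete, the sketch below computes the coordinate offsets between the two studies and a first-order estimate of the light-cone half-angle from NA = n·sin(θ). The coordinates and fiber values are those quoted above; the tissue refractive index (1.36) and the purely geometric cone model, which ignores scattering and absorption, are assumptions introduced here for illustration.

```python
from math import asin, degrees

# (AP, |ML|, DV) in mm, as quoted in the response above.
TARGETS = {
    "DMS": {"this study": (0.5, 1.5, -2.2), "Cui et al.": (0.9, 1.4, -3.0)},
    "DLS": {"this study": (0.5, 2.5, -2.2), "Cui et al.": (0.7, 2.3, -3.0)},
}
FIBER_NA = {"this study": 0.37, "Cui et al.": 0.66}
TISSUE_INDEX = 1.36  # ASSUMPTION: approximate refractive index of brain tissue


def offset_mm(region):
    """Cui et al. minus this study; +AP is more rostral, more negative DV is more ventral."""
    ours = TARGETS[region]["this study"]
    theirs = TARGETS[region]["Cui et al."]
    return tuple(round(t - o, 2) for o, t in zip(ours, theirs))


def half_angle_deg(na, n_medium=TISSUE_INDEX):
    """Geometric half-angle of the emitted light cone, from NA = n * sin(theta)."""
    return degrees(asin(na / n_medium))


for region in TARGETS:
    print(region, "offset (AP, ML, DV) mm:", offset_mm(region))
for study, na in FIBER_NA.items():
    print(f"{study}: NA {na} -> half-angle {half_angle_deg(na):.1f} deg")
```

      Under these assumptions the higher-NA fiber emits a noticeably wider cone, which is consistent with the argument that a larger tissue volume may have been illuminated.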

      Taken together, these methodological differences, both in anatomical targeting and optical stimulation parameters, likely contribute to the discrepancies in behavioral results observed between the two studies. Our findings, consistent with other independent reports, support the role of D1-SPNs in facilitating movement and reinforcement behaviors under more controlled and localized stimulation conditions.

      Reviewer #3 (Recommendations For The Authors): 

      Minor: 

      The authors repeatedly state that they are using a new rabies virus system, but the system has been in widespread use for 16 years, including in the exact circuits the authors are studying, for over a decade. I would not consider this new. 

      Thank you for bringing this to our attention. We have revised the relevant text accordingly at line 20 in the Abstract, line 31 in the In Brief, line 69 in the Introduction, line 83 in the Results, and line 226 in the Discussion to improve clarity and accuracy.

      Figure 2G, how many mice were used for recordings?

      In Fig. 2G, we used 8 mice in the D1-projecting to D2 EGFP(+) group, 7 mice in the D1-projecting to D1 EGFP(-) group, 8 mice in the D2-projecting to D1 EGFP(+) group, and 10 mice in the D2-projecting to D2 EGFP(-) group.

      The amplitude of inputs was not reported in figure 2. This is important, as the strength of the connection matters. This is reported in Figure S2, but how exactly this relates to the presence or absence of connections should be made clearer.

      The amplitude data presented in Figure S2 summarize all recorded currents from confirmed connections, as detailed in the Methods section. A connection is defined by the presence of a detectable and reliable postsynaptic current with an onset latency of less than 10 ms following laser stimulation.
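
      A minimal sketch of how such a criterion could be applied is given below. Only the 10 ms latency cutoff comes from the response; the amplitude threshold, the reliability threshold, the field names, and the example cells are hypothetical placeholders, since the exact definitions of "detectable" and "reliable" are given in the Methods rather than here.

```python
LATENCY_CUTOFF_MS = 10.0  # from the response: onset latency < 10 ms
MIN_AMPLITUDE_PA = 5.0    # HYPOTHETICAL detection threshold
MIN_SUCCESS_RATE = 0.5    # HYPOTHETICAL reliability threshold (fraction of trials)


def is_connected(cell):
    """Classify one recorded cell as connected or not under the assumed criteria."""
    return (
        cell["latency_ms"] < LATENCY_CUTOFF_MS
        and abs(cell["amplitude_pa"]) >= MIN_AMPLITUDE_PA
        and cell["success_rate"] >= MIN_SUCCESS_RATE
    )


def summarize(cells):
    """Return (connection probability, amplitudes of connected cells only)."""
    connected = [c for c in cells if is_connected(c)]
    probability = len(connected) / len(cells)
    amplitudes = [abs(c["amplitude_pa"]) for c in connected]
    return probability, amplitudes


# Hypothetical recordings (inward currents are negative).
cells = [
    {"latency_ms": 3.2, "amplitude_pa": -85.0, "success_rate": 1.0},
    {"latency_ms": 4.1, "amplitude_pa": -40.0, "success_rate": 0.9},
    {"latency_ms": 14.0, "amplitude_pa": -30.0, "success_rate": 0.8},  # too slow
    {"latency_ms": 3.5, "amplitude_pa": -2.0, "success_rate": 0.2},    # not detectable
]
probability, amplitudes = summarize(cells)
print(f"connection probability = {probability:.2f}; amplitudes (pA) = {amplitudes}")
```

      Separating the two outputs mirrors the distinction drawn above: connection probability counts all recorded cells, whereas the amplitude summary includes only confirmed connections.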

      References in the reply-to-review comments:

      Aoki, S., Smith, J.B., Li, H., Yen, X.Y., Igarashi, M., Coulon, P., Wickens, J.R., Ruigrok, T.J.H., and Jin, X. (2019). An open cortico-basal ganglia loop allows limbic control over motor output via the nigrothalamic pathway. Elife 8, e49995.

      Chatterjee, S., Sullivan, H.A., MacLennan, B.J., Xu, R., Hou, Y.Y., Lavin, T.K., Lea, N.E., Michalski, J.E., Babcock, K.R., Dietrich, S., et al. (2018). Nontoxic, double-deletion-mutant rabies viral vectors for retrograde targeting of projection neurons. Nat Neurosci 21, 638-646.

      Cruikshank, S.J., Urabe, H., Nurmikko, A.V., and Connors, B.W. (2010). Pathway-Specific Feedforward Circuits between Thalamus and Neocortex Revealed by Selective Optical Stimulation of Axons. Neuron 65, 230-245.

      Dong, J., Wang, L.P., Sullivan, B.T., Sun, L.X., Smith, V.M.M., Chang, L.S., Ding, J.H., Le, W.D., Gerfen, C.R., and Cai, H.B. (2025). Molecularly distinct striatonigral neuron subtypes differentially regulate locomotion. Nat Commun 16, 2710.

      Geddes, C.E., Li, H., and Jin, X. (2018). Optogenetic Editing Reveals the Hierarchical Organization of Learned Action Sequences. Cell 174, 32-43.

      Jin, L., Sullivan, H.A., Zhu, M., Lavin, T.K., Matsuyama, M., Fu, X., Lea, N.E., Xu, R., Hou, Y.Y., Rutigliani, L., et al. (2024). Long-term labeling and imaging of synaptically connected neuronal networks in vivo using double-deletion-mutant rabies viruses. Nat Neurosci 27, 373-383.

      Jin, X., Tecuapetla, F., and Costa, R.M. (2014). Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences. Nat Neurosci 17, 423-430.

      Klug, J.R., Engelhardt, M.D., Cadman, C.N., Li, H., Smith, J.B., Ayala, S., Williams, E.W., Hoffman, H., and Jin, X. (2018). Differential inputs to striatal cholinergic and parvalbumin interneurons imply functional distinctions. Elife 7, e35657.

      Kravitz, A.V., Freeze, B.S., Parker, P.R.L., Kay, K., Thwin, M.T., Deisseroth, K., and Kreitzer, A.C. (2010). Regulation of parkinsonian motor behaviours by optogenetic control of basal ganglia circuitry. Nature 466, 622-626.

      Osakada, F., Mori, T., Cetin, A.H., Marshel, J.H., Virgen, B., and Callaway, E.M. (2011). New Rabies Virus Variants for Monitoring and Manipulating Activity and Gene Expression in Defined Neural Circuits. Neuron 71, 617-631.

      Smith, J.B., Klug, J.R., Ross, D.L., Howard, C.D., Hollon, N.G., Ko, V.I., Hoffman, H., Callaway, E.M., Gerfen, C.R., and Jin, X. (2016). Genetic-Based Dissection Unveils the Inputs and Outputs of Striatal Patch and Matrix Compartments. Neuron 91, 1069-1084.

      Wall, N.R., De La Parra, M., Callaway, E.M., and Kreitzer, A.C. (2013). Differential Innervation of Direct- and Indirect-Pathway Striatal Projection Neurons. Neuron 79, 347-360.

      Wickersham, I.R., Lyon, D.C., Barnard, R.J.O., Mori, T., Finke, S., Conzelmann, K.K., Young, J.A.T., and Callaway, E.M. (2007). Monosynaptic restriction of transsynaptic tracing from single, genetically targeted neurons. Neuron 53, 639-647.

      Zhang, B.B., Geddes, C.E., and Jin, X. (2025). Complementary corticostriatal circuits orchestrate action repetition and switching. Sci Adv, in press.

      Zhu, Z.G., Gong, R., Rodriguez, V., Quach, K.T., Chen, X.Y., and Sternson, S.M. (2025). Hedonic eating is controlled by dopamine neurons that oppose GLP-1R satiety. Science 387, eadt0773.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):

      Weaknesses: 

      Overall I find the data presented compelling, but I feel that the number of observations is quite low (typically n=3-7 neurons, typically one per animal). While I understand that only a few slices can be obtained for the IPN from each animal, the strength of the novel findings would be more convincing with more frequent observations (larger n, more than one per animal). The findings here suggest that the authors have identified a novel mechanism for the normal function of neurotransmission in the IPN, so it would be expected to be observable in almost any animal. Thus,  it is not clear to me why the authors investigated so few neurons per slice and chose to combine different treatments into one group (e.g. Figure 2f), even if the treatments have the same expected effect.  

      This is a well-taken suggestion. However, we must point out that we do perform statistical analyses on the original datasets and we believe that our conclusions are justified, as acknowledged by the Reviewer. As the Reviewer is aware, the IPN is a small nucleus and, with the slicing protocol used, we typically obtain 1-2 slices per mouse that are suitable for recordings. Since most of the experiments in the manuscript deal with some form of pharmacological interrogation, we were reluctant to use slices that are not naïve and therefore in general did not perform more than 1 cell recording per slice. Having said this, to comply with the Reviewer’s suggestion we have now performed additional experiments to increase the n number for certain experiments. We have amended all figures and legends to incorporate the additional data. We must point out that during the replotting of the data in the summary Figure 8i (previously Figure 7i) we noticed an error with the data representation of the TAC IPL data and have now corrected this oversight.

      Figure 2b,c. 

      500nM DAMGO effect on TAC IPL AMPAR EPSC – n increased from 5 to 9

      Figure 3g. 

      500nM DAMGO effect on CHAT IPR AMPAR EPSC – n increased from 8 to 16

      Effect of CTAP on DAMGO on CHAT IPR AMPAR EPSC – n increased from 4 to 7

      Figure 3i. 

      500nM DAMGO or Met-enk effect in “silent” CHAT IPR AMPAR EPSC – n increased from 7 to 9

      Figure 4e. 

      500nM DAMGO effect on ES coupling – Note: in the original version the n number was 5 and not 7 as written in the figure legend. We have now increased the n from 5 to 9.

      Figure 5e,f. 

      500nM DAMGO effect on TAC IPR AMPAR EPSC – n increased from 5 to 9

      Figure 7f.

      Effect of DHE on EPSC amplitude after application of DNQX/APV/4-AP or DTX-α – n increased from 7-9.

      Figure 7g.

      Emergence of nAChR EPSC after DTX – n increased from 4 to 7

      Figure 7i. 

      Effect of ambenonium on nAChR amplitude and charge – n increased from 4 to 7

      Supplementary Figure 3c and h

      Effect of DAMGO after DNQX – n increased from 4 to 7

      Effect of DNQX after DAMGO mediated potentiation – n increased from 3 to 5.

      Throughout the study (Figs. 3i, 7f and 8h in the revised manuscript) we do indeed pool datasets that were amassed from different conditions, since we were not directly investigating the possibility of any deviation in the extent of response between said treatments. For example, and as pointed out by the Reviewer, in Fig. 2f (now Fig. 3i) DAMGO and met-ENK were merely employed to ascertain whether light-evoked synaptic transmission (ChATCre:ai32 mice) in cells that had no measurable EPSC could be pharmacologically “unsilenced” by mOR activation. Thus, the means by which mOR was activated was not relevant to this specific question. Note: 2 more recordings are now added to this dataset (Fig. 3i) that were taken from ChATChR2/SSTCre:ai9 mice in response to the comment by this Reviewer below (“Are there baseline differences in the electrophysiological or morphological properties of these "silent" neurons compared to the responsive neurons?”). Similarly, in the revised Fig. 7f we pooled data investigating the pharmacological block of the EPSC that emerged following application of either DNQX/APV/4-AP or DNQX/APV/DTX. Low concentrations of 4-AP or DTX were interchangeably employed to reveal the DNQX-insensitive EPSC that we go on to show is indeed the nAChR response. Finally, in Fig. 8h, we pooled data demonstrating a lack of effect of DAMGO in potentiating both the glutamatergic and cholinergic arms of synaptic transmission in the OPRM1 KO mice. Again, here we were only interested in determining whether removal of mOR expression prevented potentiation of transmission mediated by mHB ChAT neurons irrespective of neurotransmitter modality. Thus, overall we were careful to only pool data in those instances where it would not change the interpretation and hence conclusions reached.

      There are also significant sex differences in nAChR expression in the IPN that might not be functionally apparent using the low n presented here. It would be helpful to know which of the recorded neurons came from each sex, rather than presenting only the pooled data.  

      As the reviewer correctly states, there is a body of literature concerning sex-based divergence not only in nicotinic receptor expression but also in behaviors associated with nicotine addiction. We have therefore reanalyzed our datasets, focusing on the extent of the mOR potentiation of glutamatergic and cholinergic transmission mediated by mHB ChAT neurons in IPR in male versus female mice (see Author response image 1 below). Although there is a possible trend towards a higher potentiation of nAChR in female mice, this was not statistically significant. We therefore chose not to split our data in the manuscript based on sex.

      Author response image 1.

      Comparison of the mOR (500nM DAMGO) mediated potentiation on evoked (a) AMPAR and (b) nAChR EPSCs in IPR between male and female mice.

      There are also some particularly novel observations that are presented but not followed up on, and this creates a somewhat disjointed story. For example, in Figure 2, the authors identify neurons in which no response is elicited by light stimulation of ChAT-neurons, but the application of DAMGO (mOR agonist) un-silences these neurons. Are there baseline differences in the electrophysiological or morphological properties of these "silent" neurons compared to the responsive neurons?  

      Unfortunately, we did not routinely measure intrinsic properties of the recorded postsynaptic neurons, nor did we systematically recover biocytin fills to assess morphology. Therefore, it remains unclear whether the neurons in which there were no or minimal AMPAR-mediated EPSCs are distinct from the ones displaying measurable responses. The IPR is home to GABAergic SST neurons, which comprise the most numerous neuron type in this IPN subdivision. Although heavily outnumbered by the SST neurons, there are additionally VGluT3+ glutamatergic neurons in IPN. The Reviewer is likely referring to a recent study investigating synaptic transmission specifically onto SST+ and VGluT3+ neurons in IPN, demonstrating that mHB cholinergic mediated glutamatergic input is “weaker” onto the glutamatergic neurons. Furthermore, in some instances synaptic transmission onto this latter population can be “unsilenced” by GABAB receptor activation in a similar manner to that seen with mOR activation in this manuscript when IPR neurons are blindly targeted (Stinson & Ninan, 2025). Using a similar strategy as in this recent study (Stinson & Ninan, 2025), we now include experiments in which the ChATChR2 mouse was crossed with a SSTCre:Ai14 mouse. This allowed for recording of postsynaptic EPSCs in directly identified SST IPR neurons. We demonstrate that DAMGO can indeed increase glutamatergic EPSCs and, in 2 of the cells where maximal LED light activation evoked no appreciable AMPAR EPSC, DAMGO clearly “unsilenced” transmission. Thus, our additional analyses directly demonstrate that our original observations concerning mOR modulation extend to the mHb cholinergic AMPAR mediated input onto IPR SST neurons. These additional data are in the revised manuscript (Figure 3D-F, I).

      Future experimentation will be required to determine if the propensity of encountering a “silent” input that can be converted to robust synaptic transmission by mOR differs between these two cell types. Furthermore, it will be of interest to investigate if any differences exist in the magnitude of the cholinergic input or the mOR mediated potentiation of co-transmission between postsynaptic SST GABA and glutamatergic neuronal subtypes.

      Reviewer #2 (Public review)

      Weaknesses: 

      The genetic strategy used to target the mHb-IPN pathway (constitutive expression in all ChAT+ and Tac1+ neurons) is not specific to this projection.  

      This is an important point. We are acutely aware that synaptic input in IPN driven by conditional expression of ChR2 using transgenic Cre driver lines is not specific to mHB. This is particularly relevant considering that one of the novel observations here relates to a previously unidentified functional input from TAC1 neurons to the IPR. At this juncture we would like to point the Reviewer to the publicly available Connectivity Atlas provided by the Allen Brain Institute (https://connectivity.brain-map.org/). With reference to mHB TAC1 neuronal output, targeted viral injection into the habenula of Tac1Cre mice allows conditional expression of EGFP in SP neurons, as evidenced by the predominant expression of reported fluorescence in dorsal mHB (see Author response image 2 a,b below). Tracing the axonal projections to the IPN clearly demonstrates dense fibers in IPL as expected but also arborization in IPR (Author response image 2 a,c). This pattern is reminiscent of that seen in the transgenic Tac1Cre:ai9 or ai32 mice used in the current study (Figs. 1c, 2a, 5c). Closer inspection of the fibers in the IPR reveals putative synaptic bouton-like structures as we have shown in Fig. 5a,b (Author response image 2 d below).

      Author response image 2.

      Stereotaxic viral injection into mHB of Tac1Cre mice taken from Allen Brain connectivity atlas (Link to Connectivity Atlas for mHb SP neuronal projection pattern)

      These anatomical data suggest that part of the synaptic input to the IPR originates from mHB TAC1 neurons, although we cannot fully discount additional synaptic input from other brain areas that may impinge on the IPR. Indeed, as the Reviewer points out, it is evident that other regions including the nucleus incertus send outputs to the IPN (Bueno et al., 2019; Liang et al., 2024; Lima et al., 2017). However, it is unclear if neuronal inputs from these alternate sources are glutamatergic in nature and mediated by a TAC1/OPRM1-expressing neuronal population. Nevertheless, we have now modified text in the discussion to highlight the limitations of using a transgenic strategy (pg 12, para 1).

      In addition, a braking mechanism involving Kv1.2 has not been identified.

      It is unclear what the Reviewer is referring to here. Most of our experiments pertaining to the brake on cholinergic transmission by potassium channels use low concentrations of 4-AP (50-100 µM), which have been used to block Shaker Kv1 channels, although at these concentrations there are additional actions at other K+ channels such as Kv3. However, we essentially demonstrate that dendrotoxin, a selective Kv1.1 and Kv1.2 antagonist, replicates the 4-AP effects. We have now also included RNAseq data demonstrating the relative expression levels of Kv1 channel mRNA in mHb ChAT neurons (KCNA1 through KCNA6; Figure 6b). The complete absence of KCNA1, together with a high expression level of KCNA2 transcripts, strongly suggests a central role of Kv1.2 in unmasking nAChR mediated synaptic transmission.

      Reviewer #3 (Public review)

      Weaknesses:  

      The significance of the ratio of AMPA versus nACh EPSCs shown in Figure 6 is unclear since nAChR EPSCs measured in the K+ channel blockers are compared to AMPA EPSCs in control (presumably 4-AP would also increase AMPA EPSCs). 

      We understand the Reviewer’s concern regarding the calculation of nicotinic/AMPA ratios since they are measured under differing conditions, i.e. the presence and absence of 4-AP, respectively. As the reviewer correctly points out, 4-AP likely increases the amplitude of the AMPA receptor mediated EPSC. However, our intention in calculating this ratio was not to ascertain a measure of relative strengths of fast glutamatergic vs cholinergic transmission onto a given postsynaptic IPN neuron per se. Rather, we used the ratio as a means to normalize the size of the nicotinic receptor EPSC to the strength of the light stimulation (using the AMPA EPSC as the normalizing factor) in each individual recording. This permits a more meaningful comparison across cells/slices/mice. We apologize for the confusion and have amended the text in the results section to reflect this (pg 9; para 2).
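
      The normalization described above can be sketched as follows. The amplitudes and names are hypothetical and serve only to show why dividing by the AMPA EPSC removes differences in effective light stimulation between recordings; this is not the authors' analysis code.

```python
def normalized_nachr(nachr_epsc_pa, ampa_epsc_pa):
    """Express the nAChR EPSC as a fraction of the AMPA EPSC from the same cell."""
    if ampa_epsc_pa == 0:
        raise ValueError("AMPA EPSC amplitude must be non-zero to normalize")
    return abs(nachr_epsc_pa) / abs(ampa_epsc_pa)


# Hypothetical recordings: cell B received stronger effective stimulation,
# so both of its EPSCs are twice as large as those of cell A.
recordings = {
    "cell A": {"ampa_pa": -200.0, "nachr_pa": -20.0},
    "cell B": {"ampa_pa": -400.0, "nachr_pa": -40.0},
}

for name, rec in recordings.items():
    ratio = normalized_nachr(rec["nachr_pa"], rec["ampa_pa"])
    print(f"{name}: raw nAChR = {abs(rec['nachr_pa']):.0f} pA, normalized = {ratio:.2f}")
```

      The raw nAChR amplitudes differ twofold between the two hypothetical cells, whereas the normalized values are identical, which is the sense in which the ratio permits comparison across recordings.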

      The mechanistic underpinnings of the most novel results are not pursued. For example, the experiments do not provide new insight into the differential effects of evoked and spontaneous glutamate/Ach release by Gi/o coupled mORs, nor the differential threshold for glutamate versus Ach release. 

      Our major goal in the current manuscript was to provide a much-needed roadmap outlining the effects of opioids in the habenulo-interpeduncular axis. Of course, a full understanding of the mechanisms underlying such complex opioid actions at the molecular level will be of great value. We feel that this is beyond the scope of this already quite result-dense manuscript, but it will be essential if directed manipulation of the circuit is to be leveraged to alter maladaptive behaviors associated with addiction/emotion during adolescence and in adults.

      The authors note that blocking Kv1 channels typically enhances transmitter release by slowing action potential repolarization. The idea that Kv1 channels serve as a brake for Ach release in this system would be strengthened by showing that these channels are the target of neuromodulators or that they contribute to activity-dependent regulation that allows the brake to be released. 

      Understanding the exact mechanisms that can potentially titrate Kv1.2 availability, and hence nAChR transmission, would be essential to shed light on potential in vivo conditions under which this arm of neurotransmission can be modulated. However, we feel that such detailed mechanistic interrogation constitutes significant work, and one that future studies should aim to achieve. Thus, it presently remains unclear which physiological or pathological scenarios result in attenuation of Kv1.2 to subsequently promote nAChR mediated transmission but, as mentioned in the existing discussion, future work to decipher such mechanisms would be of great value.

      Reviewer #1 (Recommendations for the authors): 

      Overall I find this to be a very interesting and exciting paper, presenting novel findings that provide clarity for a problem that has persisted in the IPN field: that of the conundrum that light-evoked cholinergic signaling was challenging to observe despite the abundance of nAChRs in the IPN. 

      Major concerns: 

      (1) The n is quite low in most cases, and in many instances, data from one figure are replotted in another figure. Given that the findings presented here are expected in the normal condition, it should not be difficult to increase the n. A more robust number of observations would strengthen the novel findings presented here. 

      Please refer to the response to the public review above.

      (2) In general, I find the organization of the figures somewhat disjointed. Sometimes it feels as if parts of the information presented in the results are split between figures, where it would make more sense to be together in a figure. For example, all the histology for each of the lines is in Figure 1, but only ephys data for one line is included there. It would be more logical to include the histology and ephys data for each line in its own figure. It would also be helpful to show the overlap of mOR expression with Tac1-Cre and ChAT-Cre terminals in the IPN. Likewise, the summarized Tac1Cre:Ai32 IPR data is in Figure 4, but the individual data is in Figure 5. 

      We introduce both ChAT and TAC1 Cre lines in Figure 1 as an overview, particularly for those readers who are not entirely familiar with the distinct afferent systems operating within the habenulo-interpeduncular pathway. However, in compliance with the Reviewer’s suggestion we have now restructured the Figures. In the revised manuscript, the functional data pertaining to the various transmission modalities mediated by the distinct afferent systems impinging on the subdivisions of the IPN tested are now split into their own dedicated figures as follows:

      Figure 2. 

      mOR effect on TAC1 neuronal glutamatergic output in IPL.

      Figure 3. 

      mOR effect on CHAT neuronal glutamatergic output in IPR.

      Figure 5. 

      mOR effect on TAC1 neuronal glutamatergic output in IPR.

      Figure 8.

      mOR effect on CHAT neuronal cholinergic output in IPC.

      Supp. Fig. 1 mOR effect on CHAT neuronal glutamatergic output in IPC.

      We thank the Reviewer for their suggestions regarding the style of the manuscript. The restructuring has now resulted in a much better flow of the presented data.

      (3) The discussion is largely satisfactory. However, a little more discussion of the integrative function of the IPN is warranted given the opposing effects of MOR activation in the Tac vs ChAT terminals, particularly in the context of both opioids and natural rewards. 

      We thank the reviewer for this comment. However, we feel the discussion is rather lengthy as is and therefore we refrained from including additional text.  

      Minor concerns: 

      (1)  The methods are missing key details. For example, the stock numbers of each of the strains of mice appear to have been left out. This is of particular importance for this paper as there are key differences between the ChAT-Cre lines that are available that would affect observed electrophysiological properties. As the authors indicate, the ChAT-ChR2 mice overexpress VAChT, while the ChAT-IRES-Cre mice do not have this problem. However, as presented it is unclear which mice are being used. 

      We apologize for the omission - the catalog numbers of the mice employed have now been included in the methods section.

      We have now clearly indicated in each figure panel (single trace examples and pooled data) which mice the data are taken from; in some instances the pooled data are from the two CHAT mouse strains employed. Despite the tendency of the ChATChR2 mice to demonstrate more pronounced nAChR mediated transmission (Fig. 7h), we justify pooling the data since we see no statistically significant difference in the effect of mOR activation on potentiating either AMPA or nAChR EPSCs (please refer to the response to Reviewer 2, Minor Concern point 2).

      (2) Likewise, antibody dilutions used for staining are presented as both dilution and concentration, which is not typical. 

      We thank the reviewer for pointing out this inconsistency. We have amended the text in the methods to include only the working dilution for all antibodies employed in the study.

      (3) There are minor typos throughout the manuscript. 

      All typos have been corrected.

      Reviewer #2 (Recommendations for the authors): 

      The authors provide a thorough investigation into the subregion, and cell-type effect of mu opioid receptor (MOR) signaling on neurotransmission in the medial habenula to interpeduncular nucleus circuit (mHb-IPN). This circuit largely comprises two distinct populations of neurons: mHb substance P (Tac1+) and cholinergic (ChAT+) neurons. Corroborating prior work, the authors report that Tac1+ neurons preferentially innervate the lateral IPN (IPL) and rostral IPN (IPR), while ChAT+ neurons preferentially innervate the central IPN (IPC) and IPR. The densest expression of MOR is observed in the IPL and MOR agonists produce a canonical presynaptic depression of glutamatergic neurotransmission in this region. Interestingly, MOR signaling in the ChAT+ mHb projection to the IPR potentiates light-evoked glutamate and acetylcholine-mediated currents (EPSC), and this effect is mediated by a MOR-induced inhibition of Kv2.1 channels. 

      Major concerns: 

      (1) The method used for expressing channelrhodopsin (ChR2) into cholinergic and neurokinin neurons in the mHb (Ai32 mice crossed with Cre-driver lines) has limitations because all Tac1+/ChAT+ inputs to the IPN express ChR2 in this mouse. Importantly, the IPN receives inputs from multiple brain regions besides the IPN-containing neurons capable of releasing these neurotransmitters (PMID: 39270652). Thus, it would be important to isolate the contributions of the mHb-IPN pathway using virally expressed ChR2 in the mHb of Cre driver mice. 

      Please refer to the response to the public review above. 

      (2) Figure 4: The authors conclude that the sEPSC recorded from IPR originate from Tac1+ mHbIPR projections. However, this cannot be stated conclusively without additional experimentation. For instance, an optogenetic asynchronous release experiment. For these experiments it would also be important to express ChR2 virus in the mHb in Tac1- and ChAT-Cre mice since glutamate originating from other brain regions could contribute to a change in asynchronous EPSCs induced by DAMGO. 

      This is a well taken point. The incongruent effect of DAMGO on evoked ChAT neuronal EPSC amplitude and sEPSC frequency prompted us to consider the possibility of a differing effect of DAMGO on a secondary input. We agree that we do not show directly whether the sEPSCs originate from a TAC1 neuronal population. Therefore, we have tempered our wording with regard to the origin of the sEPSCs and have also restructured the Figure in question, moving the sEPSC data into supplemental data (Supplemental Fig. 2).

      (3) Figure 5D: It would be useful to provide a quantitative measure in a few mice of mOR fluorescence across development (e.g. integrated density of fluorescence in IPR). 

      We have now included mOR expression density across development (Fig. 6). Interestingly, the adult expression levels of mOR in the IPR are essentially reached at a very early developmental age (P10), yet we see stark differences in the role of mOR activation in modulating glutamatergic transmission mediated by mHb cholinergic neurons. Note: since we processed adult tissue (i.e. >P40) for these developmental analyses, we utilized these slices to also include an analysis of the relative mOR expression density specifically in adults between the subdivisions of IPN in Fig. 1.

      (4) Figure 6B: It would be useful to quantify the expression of Kcna2 in ChAT and Tac1 neurons (e.g. using FISH). 

      We thank the Reviewer for this suggestion. We have now included mRNA expression levels from a publicly available 10X RNA sequencing dataset provided by the Allen Brain Institute (Figure 7b).

      (5) It would be informative to examine what the effects of MOR activation are on mHb projections to the (central) . 

      In response to this suggestion, we have now included additional data from putative IPC cells that clearly demonstrate a DAMGO-elicited potentiation of AMPAR EPSCs similar to that seen in IPR. These data are now included in the revised manuscript (Supplemental Fig. 1; Fig. 8i).

      (6) What is the proposed link between MOR activation and the inhibition of Kv1.2 (e.g. beta-Arrestin signaling, G beta-gamma interaction with Kv1.2, PKA inhibition?) 

      We apologize for any confusion. We do not directly test whether the potentiation of EPSCs upon mOR activation occurs via inhibition of Kv1.2. Although we have not directly tested this possibility, we find it an unlikely underlying cellular mechanism, especially for the potentiation of the cholinergic arm of neurotransmission, since in the presence of DNQX/APV the activation of mOR does not result in the emergence of any nAChR EPSC (see Supplementary Fig. 3a-c).

      Minor concerns: 

      (1) Methods: Jackson lab ID# for used mouse strains is missing. 

      We apologize for this omission and have now included the mouse strain catalog numbers.

      (2) The authors use data from both ChAT-Cre x Ai32 and ChAT-ChR2 mice. It would be helpful to show some comparisons between the lines to justify merging data sets for some of the analyses as there appear to be differences between the lines (e.g. Figure 6G). 

      This is a well taken point. We have now provided a figure for the Reviewer (see below) that illustrates the lack of a significant difference in the mOR-mediated potentiation of both mHb ChAT neuronal AMPAR and nAChR transmission between the two mouse lines employed, despite a divergence in the extent of glutamatergic vs cholinergic transmission shown in Fig. 7g (previously Figure 6g). We have chosen not to include these data in the revised manuscript.

      Author response image 3.

      Comparison of the mOR (500 nM DAMGO) mediated potentiation of evoked AMPAR (a) and nAChR (b) EPSCs in IPR between ChATCre:Ai32 and ChATChR2 mice.

      (3)  Line 154: How was it determined that the EPSC is glutamatergic? 

      We apologize for any confusion. In the revised manuscript we now clearly point to the relevant figures (see Supplementary Figs. 2a and 3) in the Results section (pg 4, para 2; pg 7, para 1; pg 8, para 2), where we determine that both the sEPSCs and the ChAT-mediated light-evoked EPSCs recorded under baseline conditions are totally blocked by DNQX and hence are exclusively AMPAR events.

      (4) It would be helpful to discuss the differences between GABA-B mediated potentiation of mHbIPN signaling and the current data in more detail. 

      We are unclear as to what differences the Reviewer is referring to. At least from the perspective of ChAT neuronal mediated synaptic transmission, other groups (and in the current study; Fig. 7h) have clearly shown that GABA<sub>B</sub> activation markedly potentiates synaptic transmission like mOR activation. Nevertheless, based on our novel findings it would be of interest to determine whether the influence of GABA<sub>B</sub> is inhibitory onto the TAC mediated input in IPR and whether there is a developmental regulation of this effect as we demonstrate upon mOR activation. These additional comparisons between the effect of the two Gi-linked receptors may shed light onto the similarity, or lack thereof, regarding the underlying cellular mechanisms. We now have included a few sentences in the discussion to highlight this (pg 11, para 1).

      Reviewer #3 (Recommendations for the authors): 

      The abstract was confusing at first read due to the complex language, particularly the sentence starting with... Further, specific potassium channels... 

      The authors might want to consider simplifying the description of the experiments and the results to clarify the content of the manuscript for readers who may only read the abstract. 

      We have altered the wording of the abstract and hope it is now more reader friendly.

      The opposite effect of mOR activation on spontaneous EPSCs versus electrical or ChR2-evoked EPSCs is very interesting and raises the issue of which measure is most physiologically relevant. For example, it is unclear whether sEPSCs arise primarily from cholinergic neurons (that are spontaneously active in the slice, Figure 3), and if so, does mOR activation suppress or enhance cholinergic neuron excitability and/or recruitment by ChR2? While a full analysis of this question is beyond the scope of this manuscript, the assumption that glutamate release assayed by electrical/ChR2 evoked transmission is the most physiologically relevant might merit some discussion since sEPSCs presumably also reflect action-potential dependent glutamate release. One wonders whether mORs hyperpolarize cholinergic neurons to reduce spontaneous spiking yet enhance fiber recruitment by ChR2 or an electrical stimulus (i.e. by removing Na channel inactivation). The authors have clearly stated that they do not know where the mORs are located, and that the effects arising from disinhibition are likely complex. But they also might discuss whether glutamate release following synchronous activation of a fiber pathway by ChR2 or electrode is more or less physiologically relevant than glutamate release assayed during spontaneous activity. It seems likely that an equivalent experiment to Figure 3D, E using spontaneous spiking of IPR neurons would show that spiking is reduced by mOR activation. 

      We thank the Reviewer for this comment. As pointed out, it would be of interest to dissect the “network” effect of mOR activation, but as the Reviewer acknowledges, this is beyond the scope of the current manuscript. The Reviewer is correct in postulating that mOR activation results in hyperpolarization of mHb ChAT neurons. A recent study (Singhal et al., 2025) demonstrates that a subpopulation of ChAT neurons undergoes a reduction in firing frequency following DAMGO application. This is corroborated by our own observations, although we chose not to include these data in our current manuscript (but see below).

      Additionally, the Reviewer questions whether ChR2/electrical stimulation is physiological. This is a well taken point, and of course the simultaneous activation of potentially all possible axonal release sites is not the mode under which the circuit operates. Nevertheless, our data clearly demonstrate the ability of mORs to modulate release under these circumstances, which must reflect an impact on spontaneous action potential driven evoked release. Although the suggested experiment could shed light on the synaptic outcomes of mOR activation on ES coupling of downstream IPN neurons, interpretation of the outcome would be confounded by the fact that postsynaptic IPN neurons also express mORs. Thus, we would not be able to isolate the effects of presynaptic changes in modulating ES coupling from any direct postsynaptic effect on the recorded cell when in current clamp. 

      Together, these additional sites of action of mOR (i.e. mHb ChAT somatodendritic and postsynaptic IPN neuron) only serve to further highlight the complex nature of the actions of opioids on the habenulo-interpeduncular axis, warranting future work to fully understand their physiological and pathological effects on the axis as a whole.

      The idea that Kv2.1 channels serve as a brake raises the question of whether they contribute to activity-dependent action potential broadening to facilitate Ach release during trains of stimuli. 

      This is an interesting suggestion and one that we had considered ourselves. Indeed, as the Reviewer is likely aware and as mentioned in the manuscript, previous studies have shown that nAChR signaling can be revealed under conditions of multiple stimulations given at relatively high frequencies. We therefore attempted to perform high frequency stimulation (20 stimulations at 25 Hz and 50 Hz) in the presence of the ionotropic glutamatergic receptor antagonists DNQX and APV. We have now included these data in the revised manuscript (Supplementary Fig 3b). As shown, this failed to engage nAChR-mediated synaptic transmission in our hands. Interestingly, there is evidence from reduced expression systems demonstrating that Kv1.2 channels undergo use-dependent potentiation (Baronas et al., 2015), in contrast to that seen with other K+ channels. Whether this is the case for the axonal Kv1.2 channels on mHb axonal terminals in situ is not known, but this may explain the inability to reveal nAChR EPSCs upon delivery of such stimulation paradigms.

      References 

      Baronas, V. A., McGuinness, B. R., Brigidi, G. S., Gomm Kolisko, R. N., Vilin, Y. Y., Kim, R. Y., … Kurata, H. T. (2015). Use-dependent activation of neuronal Kv1.2 channel complexes. J Neurosci, 35(8), 3515-3524. doi:10.1523/JNEUROSCI.4518-13.2015

      Bueno, D., Lima, L. B., Souza, R., Goncalves, L., Leite, F., Souza, S., … Metzger, M. (2019). Connections of the laterodorsal tegmental nucleus with the habenular-interpeduncular-raphe system. J Comp Neurol, 527(18), 3046-3072. doi:10.1002/cne.24729

      Liang, J., Zhou, Y., Feng, Q., Zhou, Y., Jiang, T., Ren, M., … Luo, M. (2024). A brainstem circuit amplifies aversion. Neuron. doi:10.1016/j.neuron.2024.08.010

      Lima, L. B., Bueno, D., Leite, F., Souza, S., Goncalves, L., Furigo, I. C., … Metzger, M. (2017). Afferent and efferent connections of the interpeduncular nucleus with special reference to circuits involving the habenula and raphe nuclei. J Comp Neurol, 525(10), 2411-2442. doi:10.1002/cne.24217

      Singhal, S. M., Szlaga, A., Chen, Y. C., Conrad, W. S., & Hnasko, T. S. (2025). Mu-opioid receptor activation potentiates excitatory transmission at the habenulo-peduncular synapse. Cell Rep, 44(7), 115874. doi:10.1016/j.celrep.2025.115874

      Stinson, H. E., & Ninan, I. (2025). GABA(B) receptor-mediated potentiation of ventral medial habenula glutamatergic transmission in GABAergic and glutamatergic interpeduncular nucleus neurons. bioRxiv. doi:10.1101/2025.01.03.631193

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors use anatomical tracing and slice physiology to investigate the integration of thalamic (ATN) and retrosplenial cortical (RSC) signals in the dorsal presubiculum (PrS). This work will be of interest to the field, as the postsubiculum is thought to be a key region for integrating internal head direction representations with external landmarks. The main result is that ATN and RSC inputs drive the same L3 PrS neurons, which exhibit superlinear summation to near-coincident inputs. Moreover, this activity can induce bursting in L4 PrS neurons, which can pass the signals to LMN (perhaps gated by cholinergic input).

      Strengths:

      The slice physiology experiments are carefully done. The analyses are clear and convincing, and the figures and results are well-composed. Overall, these results will be a welcome addition to the field.

      We thank this reviewer for the positive comment on our work.

      Weaknesses:

      The conclusions about the circuit-level function of L3 PrS neurons sometimes outstrip the data, and their model of the integration of these inputs is unclear. I would recommend some revision of the introduction and discussion. I also had some minor comments about the experimental details and analysis.

      Specific major comments:

      (1) I found that the authors' claims sometimes outstrip their data, given that there were no in vivo recordings during behavior. For example, in the abstract, their results indicate "that layer 3 neurons can transmit a visually matched HD signal to medial entorhinal cortex", and in the conclusion they state "[...] cortical RSC projections that carry visual landmark information converge on layer 3 pyramidal cells of the dorsal presubiculum". However, they never measured the nature of the signals coming from ATN and RSC to L3 PrS (or signals sent to downstream regions). Their claim is somewhat reasonable with respect to ATN, where the majority of neurons encode HD, but neurons in RSC encode a vast array of spatial and non-spatial variables other than landmark information (e.g., head direction, egocentric boundaries, allocentric position, spatial context, task history to name a few), so making strong claims about the nature of the incoming signals is unwarranted.

      We agree of course that RSC does not only encode landmark information. We have clarified this point in the introduction (line 69-70) and formulated more carefully in the abstract (removed the word ‘landmark’ in line 17) and in the introduction (line 82-83). In the discussion we explicitly state that ‘In our slice work we are blind to the exact nature of the signal that is carried by ATN and RSC axons’ (line 522-523).

      (2) Related to the first point, the authors hint at, but never explain, how coincident firing of ATN and RSC inputs would help anchor HD signals to visual landmarks. Although the lesion data (Yoder et al. 2011 and 2015) support their claims, it would be helpful if the proposed circuit mechanism was stated explicitly (a schematic of their model would be helpful in understanding the logic). For example, how do neurons integrate the "right" sets of landmarks and HD signals to ensure stable anchoring? Moreover, it would be helpful to discuss alternative models of HD-to-landmark anchoring, including several studies that have proposed that the integration may (also?) occur in RSC (Page & Jeffrey, 2018; Yan, Burgess, Bicanski, 2021; Sit & Goard, 2023). Currently, much of the Discussion simply summarizes the results of the study, this space could be better used in mapping the findings to the existing literature on the overarching question of how HD signals are anchored to landmarks.

      We agree with the reviewer on the importance of the question: how do neurons integrate the “right” sets of landmarks and HD signals to ensure stable anchoring? Based on our results, we provide a schematic to illustrate possible scenarios, and we include it as a supplementary figure (Figure 1, to be included in the ms as Figure 7—figure supplement 2), as well as a new paragraph in the discussion section (line 516-531). We point out that critical information on the convergence and divergence of functionally defined inputs is still lacking, both for principal cells and interneurons.

      Interestingly, recent evidence from functional ultrasound imaging and electrical single cell recording demonstrated that visual objects may refine head direction coding, specifically in the dorsal presubiculum (Siegenthaler et al. bioRxiv 2024.10.21.619417; doi: https://doi.org/10.1101/2024.10.21.619417). The increase in firing rate for HD cells whose preferred firing direction corresponds to a visual landmark could be supported by the supralinear summation of thalamic HD signals and retrosplenial input described in our study. We include this point in the discussion (line 460-462), and hope that our work will spur further investigations.

      Reviewer #2 (Public Review):

      Richevaux et al investigate how anterior thalamic (AD) and retrosplenial (RSC) inputs are integrated by single presubicular (PrS) layer 3 neurons. They show that these two inputs converge onto single PrS layer 3 principal cells. By performing dual-wavelength photostimulation of these two inputs in horizontal slices, the authors show that in most layer 3 cells, these inputs summate supra-linearly. They extend the experiments by focusing on putative layer 4 PrS neurons, and show that they do not receive direct anterior thalamic nor retrosplenial inputs; rather, they are (indirectly) driven to burst firing in response to strong activation of the PrS network.

      This is a valuable study, that investigates an important question - how visual landmark information (possibly mediated by retrosplenial inputs) converges and integrates with HD information (conveyed by the AD nucleus of the thalamus) within PrS circuitry. The data indicate that near-coincident activation of retrosplenial and thalamic inputs leads to non-linear integration in target layer 3 neurons, thereby offering a potential biological basis for landmark + HD binding.

      The main limitations relate to the anatomical annotation of 'putative' PrS L4 neurons, and to the presentation of retrosplenial/thalamic input modularity. Specifically, more evidence should be provided to convincingly demonstrate that the 'putative L4 neurons' of the PrS are not distal subicular neurons (as the authors' anatomy and physiology experiments seem to indicate). The modularity of thalamic and retrosplenial inputs could be better clarified in relation to the known PrS modularity.

      We thank the reviewer for their important feedback. We discuss what defines presubicular layer 4 in horizontal slices, cite relevant literature, and provide new and higher resolution images. See below for detailed responses to the reviewer’s comments, in the section ‘recommendations to authors’.

      Reviewer #3 (Public Review):

      Summary:

      The authors sought to determine, at the level of individual presubiculum pyramidal cells, how allocentric spatial information from the retrosplenial cortex was integrated with egocentric information from the anterior thalamic nuclei. Employing a dual opsin optogenetic approach with patch clamp electrophysiology, Richevaux and colleagues found that around three-quarters of layer 3 pyramidal cells in the presubiculum receive monosynaptic input from both brain regions. While some interesting questions remain (e.g. the role of inhibitory interneurons in gating the information flow and through different layers of presubiculum), this paper provides valuable insights into the microcircuitry of this brain region and the role that it may play in spatial navigation.

      Strengths:

      One of the main strengths of this manuscript was that the dual opsin approach allowed the direct comparison of different inputs within an individual neuron, helping to control for what might otherwise have been an important source of variation. The experiments were well-executed and the data was rigorously analysed. The conclusions were appropriate to the experimental questions and were well-supported by the results. These data will help to inform in vivo experiments aimed at understanding the contribution of different brain regions in spatial navigation and could be valuable for computational modelling.

      Weaknesses:

      Some attempts were made to gain mechanistic insights into how inhibitory neurotransmission may affect processing in the presubiculum (e.g. Figure 5) but these experiments were a little underpowered and the analysis carried out could have been more comprehensively undertaken, as was done for other experiments in the manuscript.

      We agree that the role of interneurons for landmark anchoring through convergence in Presubiculum requires further investigation. In our latest work on the recruitment of VIP interneurons we begin to address this point in slices (Nassar et al., 2024 Neuroscience. doi: 10.1016/j.neuroscience.2024.09.032.); more work in behaving animals will be needed.

      Reviewer #1 (Recommendations For The Authors):

      Full comments below. Beyond the (mostly minor) issues noted below, this is a very well-written paper and I look forward to seeing it in print.

      Major comments:

      (1) I found that the authors' claims sometimes outstrip their data, given that there were no in vivo recordings during behavior. For example, in the abstract, their results indicate "that layer 3 neurons can transmit a visually matched HD signal to medial entorhinal cortex", and in the conclusion they state "[...] cortical RSC projections that carry visual landmark information converge on layer 3 pyramidal cells of the dorsal presubiculum". However, they never measured the nature of the signals coming from ATN and RSC to L3 PrS (or signals sent to downstream regions). Their claim is somewhat reasonable with respect to ATN, where the majority of neurons encode HD, but neurons in RSC encode a vast array of spatial and non-spatial variables other than landmark information (e.g., head direction, egocentric boundaries, allocentric position, spatial context, task history to name a few), so making strong claims about the nature of the incoming signals is unwarranted.

      Our study was motivated by the seminal work from Yoder et al., 2011 and 2015, indicating that visual landmark information is processed in PoS and from there transmitted to the LMN. Based on that, and in the interest of readability, we may have used an oversimplified shorthand for the type of signal carried by RSC axons. There are numerous studies indicating a role for RSC in encoding visual landmark information (Auger et al., 2012; Jacob et al., 2017; Lozano et al., 2017; Fischer et al., 2020; Keshavarzi et al., 2022; Sit and Goard, 2023); we agree of course that this is certainly not the only variable that is represented. Therefore, we changed the text to make this point clear:

      Abstract, line 17: removed the word ‘landmark’

      Introduction, line 69: added “...and supports an array of cognitive functions including memory, spatial and non-spatial context and navigation (Vann et al., 2009; Vedder et al., 2017). ”

      Introduction, line 82: changed “...designed to examine the convergence of visual landmark information, that is possibly integrated in the RSC, and vestibular based thalamic head direction signals”.

      Discussion, line 522-523: added “In our slice work we are blind to the exact nature of the signal that is carried by ATN and RSC axons.”

      (2) Related to the first point, the authors hint at, but never explain, how coincident firing of ATN and RSC inputs would help anchor HD signals to visual landmarks. Although the lesion data (Yoder et al., 2011 and 2015) support their claims, it would be helpful if the proposed circuit mechanism was stated explicitly (a schematic of their model would be helpful in understanding the logic). For example, how do neurons integrate the "right" sets of landmarks and HD signals to ensure stable anchoring? Moreover, it would be helpful to discuss alternative models of HD-to-landmark anchoring, including several studies that have proposed that the integration may (also?) occur in RSC (Page & Jeffrey, 2018; Yan, Burgess, Bicanski, 2021; Sit & Goard, 2023). Currently, much of the Discussion simply summarizes the results of the study, this space could be better used in mapping the findings to the existing literature on the overarching question of how HD signals are anchored to landmarks.

      We suggest a physiological mechanism for inputs to be selectively integrated and amplified, based on temporal coincidence. Of course there are still many unknowns, including the divergence of connections from a single thalamic or retrosplenial input neuron. The anatomical connectivity of inputs will be critical, as well as the subcellular arrangement of synaptic contacts. Neuromodulation and changes in the balance of excitation and inhibition will need to be factored in. While it is premature to provide a comprehensive explanation for landmark anchoring of HD signals in PrS, our results have led us to include a schematic, to illustrate our thinking (Figure 1, see below).

      Do HD tuned inputs from thalamus converge on similarly tuned HD neurons only? Is divergence greater for the retrosplenial inputs? If so, thalamic input might pre-select a range of HD neurons, and converging RSC input might narrow down the precise HD neurons that become active (Figure 1). In the future, the use of activity dependent labeling strategies might help to tie together information on the tuning of pre-synaptic neurons, and their convergence or divergence onto functionally defined postsynaptic target cells. This critical information is still lacking, for principal cells, and also for interneurons. 

      Interneurons may have a key role in HD-to-landmark anchoring. SST interneurons support stability of HD signals (Simonnet et al., 2017) and VIP interneurons flexibly disinhibit the system (Nassar et al., 2024). Could disinhibition be a necessary condition to create a window of opportunity for updating the landmark anchoring of the attractor? Single PV interneurons might receive thalamic and retrosplenial inputs non-specifically. We need to distinguish the conditions for when the excitation-inhibition balance in pyramidal cells may become tipped towards excitation, and the case of coincident, co-tuned thalamic and retrosplenial input may be such a condition. Elucidating the principles of hardwiring of inputs, as for example, selective convergence, will be necessary. Moreover, neuromodulation and oscillations may be critical for temporal coordination and precise temporal matching of HD-to-landmark signals.

      We note that matching directional with visual landmark information based on temporal coincidence as described here does not require synaptic plasticity. Algorithms for dynamic control of cognitive maps without synaptic plasticity have been proposed (Whittington et al., 2025, Neuron): information may be stored in neural attractor activity, and the idea that working memory may rely on recurrent updates of neural activity might generalize to the HD system. We include these considerations in the discussion (line 497-501; 521-531) and hope that our work will spur further experimental investigations and modeling work.

      While the focus of our work has been on PrS, we agree that RSC also treats HD and landmark signals. Possibly the RSC registers a direction to a landmark rather than comparing it with the current HD (Sit & Goard, 2023). We suggest that this integrated information then reaches PrS. In contrast to RSC, PrS is uniquely positioned to update the signal in the LMN (Yoder et al., 2011), cf. discussion (line 516-520).

      Minor comments:

      (1) Fig 1 - Supp 1: It appears there is a lot of input to PrS from higher visual regions, could this be a source of landmark signals?

      Yes, higher visual regions projecting to PrS may also be a source of landmark information, even if the visual signal is not integrated with HD at that stage (Sit & Goard 2023). The anatomical projection from the visual cortex was first described by Vogt & Miller (1983), but not studied on a functional level so far.

      (2) Fig 2F, G: Although the ATN and RSC measurements look quite similar, there are no stats included. The authors should use an explicit hypothesis test.

      We now compare the distributions of amplitudes and of latencies, using the Mann-Whitney U test. No significant difference between the two groups was found. Added in the figure legend: 2F, “Mann-Whitney U test revealed no significant difference (p = 0.95)”. 2G, “Mann-Whitney U test revealed no significant difference (p = 0.13)”.

      (3) Fig 2 - Supp 2A, C: Again, no statistical tests. This is particularly important for panel A, where the authors state that the latencies are similar but the populations appear to be different.

      Inputs from ATN and RSC have a similar ‘jitter’ (latency standard deviation) and ‘tau decay’. We added in the Fig 2 - Supp 2 figure legend: A, “Mann-Whitney U test revealed no significant difference (p = 0.26)”. C, “Mann-Whitney U test revealed no significant difference (p = 0.87)”.

      As a complementary measure for the reviewer, we performed the Kolmogorov-Smirnov test which confirmed that the populations’ distributions for ‘jitter’ were not significantly different, p = 0.1533.
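
      For readers unfamiliar with the two tests named above, the following is a minimal Python sketch of what each one measures. It is not the authors' analysis code: the function names, the normal approximation (without tie or continuity correction), and the jitter values are all assumptions made for illustration.

```python
import math

def mann_whitney_u(x, y):
    """Return (U_x, approximate two-sided p).

    U_x counts pairs (xi, yj) with xi > yj; ties count as 0.5.
    The p-value uses the large-sample normal approximation with no
    tie or continuity correction, so it is only a rough guide.
    """
    n1, n2 = len(x), len(y)
    u_x = sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
              for xi in x for yj in y)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return u_x, p

def ks_statistic(x, y):
    """Return D, the largest gap between the two empirical CDFs."""
    def ecdf(sample, t):
        return sum(v <= t for v in sample) / len(sample)
    return max(abs(ecdf(x, t) - ecdf(y, t)) for t in list(x) + list(y))

# Hypothetical latency jitter values (ms); not data from the study.
atn_jitter = [0.21, 0.35, 0.28, 0.40, 0.31]
rsc_jitter = [0.25, 0.38, 0.30, 0.45, 0.33]

u, p = mann_whitney_u(atn_jitter, rsc_jitter)
d = ks_statistic(atn_jitter, rsc_jitter)
print(f"U = {u}, approximate p = {p:.3f}, KS D = {d:.2f}")
```

      The Mann-Whitney U test is sensitive mainly to a shift in ranks between groups, whereas the Kolmogorov-Smirnov statistic responds to any difference in distribution shape, which is why the two can usefully complement each other. For small samples an exact p-value from a statistics package would be preferable to the approximation used here.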

      (4) Fig 4E, F: The statistics reporting is confusing, why are asterisks above the plots and hashmarks to the side?

      Asterisks refer to a comparison between ‘dual’ and ‘sum’ for each of the 5 stimulations in a Sidak multiple comparison test. Hashmarks refer to comparison of the nth stimulation to the 1st one within dual stimulation events (Friedman + Dunn’s multiple comparison test). We mention the two-way ANOVA p-value in the legend (Sum v Dual, for both Amplitude and Surface).
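
      As a rough illustration of how a Sidak correction treats the five per-stimulation comparisons, the sketch below applies the standard single-step Sidak adjustment, p_adj = 1 - (1 - p)^m. The function name and the raw p-values are hypothetical, and the statistics package used by the authors may apply the correction in a different form.

```python
def sidak_adjust(p_values):
    """Single-step Sidak adjustment for m comparisons.

    Each raw p-value p becomes 1 - (1 - p)**m, capped at 1.0.
    """
    m = len(p_values)
    return [min(1.0, 1.0 - (1.0 - p) ** m) for p in p_values]

# Hypothetical raw p-values for 'dual' vs 'sum' at stimulations 1 to 5.
raw_p = [0.200, 0.004, 0.010, 0.030, 0.060]

for stim, (raw, adj) in enumerate(zip(raw_p, sidak_adjust(raw_p)), start=1):
    flag = "*" if adj < 0.05 else "ns"
    print(f"stimulation {stim}: raw p = {raw:.3f}, adjusted p = {adj:.3f} {flag}")
```

      With five comparisons, a raw p-value has to fall below roughly 0.01 to remain under 0.05 after adjustment, which is what keeps the family-wise error rate controlled across the train.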

      (5) Fig 5C: I was confused by the 2*RSC manipulation. How do we know if there is amplification unless we know what the 2*RSC stim alone looks like?

      We now label the right panel in Fig 5C as “high light intensity” or “HLI”. Increasing the activation of Chrimson increases the amplitude of the summed EPSP that now exceeds the threshold for amplification of synaptic events. Amplification refers to the shape of the plateau-like prolongation of the peak, most pronounced on the second EPSP, now indicated with an arrow. We clarify this also in the text (line 309-310).

      (6) Fig 6D (supplement 1): Typo, "though" should be "through"

      Yes, corrected (line 1015).

      (7) Fig 6G (supplement 1): Typo, I believe this refers to the dotted area in panel F, not panel A.

      Yes, corrected (line 1021).

      (8) Fig 7: The effect of muscarine was qualitatively described in the Results, but there is no quantification and it is not shown in the Figure. The results should either be reported properly or removed from the Results.

      We remove the last sentence in the Results.

      (9) Methods: The age and sex of the mice should be reported. Transgenic mouse line should be reported (along with stock number if applicable).

      We used C57BL6 mice with transgenic background (Ai14 mice, Jax n007914  reporter line) or C57BL6 wild type mice. This is now indicated in the Methods (lines 566-567).

      (10) Methods: If the viruses are only referred to with their plasmid number, then the capsid used for the viruses should be specified. For example, I believe the AAV-CAG-tomato virus used the retroAAV capsid, which is important to the experiment.

      Thank you for pointing this out. Indeed, the AAV-CAG-tdTom virus used the retroAAV capsid; this is now specified (line 575).

      (11) Data/code availability: I didn't see any sort of data/code availability statement, will the data and code be made publicly available?

      Data are stored on local servers at the SPPIN, Université Paris Cité, and are made available upon reasonable request. Code for intrinsic properties analysis is available on github (https://github.com/schoki0710/Intrinsic_Properties). This information is now included (line 717-720).

      (12) Very minor (and these might be a matter of opinion), but I believe "records" should be "recordings", and "viral constructions" should be "viral constructs".

      The text benefited from proofreading by Richard Miles, who always preferred “records” to “recordings” in his writings. We choose to keep the current wording.

      Reviewer #2 (Recommendations For The Authors):

      Below are two major points that require clarification.

      (1) In the last set of experiments presented by the authors (Figs 6 onwards) they focus on 'putative L4' PrS cells. For several lines of evidence (outlined below), I am convinced that these neurons are not presubicular, but belong to the subiculum. I think this is a major point that requires substantial clarification, in order to avoid confusion in the field (see also suggestions on how to address this comment at the end of this section).

      Several lines of evidence support the interpretation that, what the authors call 'L4 PrS neurons', are distal subicular cells:

      (1.1) The anatomical location of the retrogradely-labelled cells (from mammillary bodies injections), as shown in Figs 6B, C, and Fig. 6_1B, very clearly indicates that they belong to the distal subiculum. The subicular-to-PrS boundary is a sharp anatomical boundary that follows exactly the curvature highlighted by the authors' red stainings. The authors could also use specific subicular/PrS markers to visualize this border more clearly - e.g. calbindin, Wfs-1, Zinc (though I believe this is not strictly necessary, since from the pattern of AD fibers, one can already draw very clear conclusions, see point 1.3 below).

      Our criteria to delimit the presubiculum are the following: First and foremost, we rely on the defining presence of antero-dorsal thalamic fibers that target specifically the presubiculum and not the neighbouring subiculum (Simonnet et al., 2017; Nassar et al., 2018; Simonnet and Fricker, 2018; Jiayan Liu et al., 2021). This provides the precise outline of the presubicular superficial layers 1 to 3. It may have been confusing to the reviewer that our slicing angle gives horizontal sections. In fact, horizontal sections are favourable for identifying the layer structure of the PrS, based on DAPI staining and the variations in cell body size. The work by Ishihara and Fukuda (2016) illustrates in their Figure 12 that the presubicular layer 4 lies below the presubicular layer 3, and forms a continuation with the subiculum (Sub1). Their Figure 4 indicates with a dotted line the “generally accepted border between the (distal) subiculum and PreS”, and it runs from the proximal tip of superficial cells of the PrS toward the white matter, along the radial direction of the cortical tissue. We agree with this definition. Others have sliced coronally (Cembrowski et al., 2018), which renders a different visualization of the border region with the subiculum.

      Second, let me explain the procedure for positioning the patch electrode in electrophysiological experiments on horizontal presubicular slices. Louis Richevaux, the first author, who carried out the layer 4 cell recordings, took great care to stay very close (<50 µm) to the lower limit of the zone where the GFP labeled thalamic axons can be seen. He was extremely meticulous about the visualization under the microscope, using LED illumination, for targeting. The electrophysiological signature of layer 4 neurons with initial bursts (but not repeated bursting, in mice) is another criterion to confirm their identity (Huang et al., 2017). Post-hoc morphological revelation showed their apical dendrites, running toward the pia, sometimes crossing through the layer 3, sometimes going around the proximal tip, avoiding the thalamic axons (Figure 6D). For example the cell in Figure 6, suppl. 1 panel D, has an apical dendrite that runs through layer 3 and layer 1. 

      Third, retrograde labeling following stereotaxic injection into the LMN is another criterion to define PrS layer 4. This approach is helpful for visualization, and is based on the defining axonal projection of layer 4 neurons (Yoder and Taube, 2011; Huang et al., 2017). Due to the technical challenge to stereotaxically inject only into LMN, the resultant labeling may not be limited to PrS layer 4. We cannot entirely exclude some overflow of retrograde tracers (B) or retrograde virus (C) to the neighboring MMN. This would then lead to co-labeling of the subiculum. In the main Figure 6, panels B and C, we agree that for this reason the red labelled cell bodies likely include also subicular neurons, on the proximal side, in addition to L4 presubicular neurons. We now point out this caveat in the main text (line 324-326) and in the methods (line 591-592).

      (1.2) Consistent with their subicular location, neuronal morphologies of the 'putative L4 cells' are selectively constrained within the subicular boundaries, i.e. they do not cross to the neighboring PrS (maybe a minor exception in Figs. 6_1D2,3). By definition, a neuron whose morphology is contained within a structure belongs to that structure.

      From a functional point of view, for the HD system, the most important criterion for defining presubicular layer 4 neurons is their axonal projection to the LMN (Yoder and Taube 2011). From an electrophysiological standpoint, it is the capacity of layer 4 neurons to fire initial bursts (Simonnet et al., 2013; Huang et al., 2017).  Anatomically, we note that the expectation that the apical dendrite should go straight up into layer 3 might not be a defining criterion in this curved and transitional periarchicortex. Presubicular layer 4 apical dendrites may cross through layer 3 and exit to the side, towards the subiculum (This is the red dendritic staining at the proximal end of the subiculum, at the frontier with the subiculum, Figure 6 C).

      (1.3) As acknowledged by the authors in the discussion (line 408): the PrS is classically defined by the innervation domain of AD fibers. As Figure 6B clearly indicates, the retrogradely-labelled cells ('putative L4') are convincingly outside the input domain of the AD; hence, they do not belong to the PrS.

      The reviewer is mistaken here: the deep layers 4 and 5/6 indeed do not lie in the zone innervated by the thalamic fibers (Simonnet et al., 2017; Nassar et al., 2018; Simonnet and Fricker, 2018) but still belong to the presubiculum. The presubicular deep layers are located below the superficial layers, next to, and in continuation of the subiculum. This is in agreement with work by Yoder and Taube 2011; Ishihara and Fukuda 2016; Boccara, … Witter, 2015; Peng et al., 2017 (Fig 2D); Yoshiko Honda et al., (Marmoset, Fig 2A) 2022; Balsamo et al., 2022 (Figure 2B).

      (1.4) Along with the above comment: in my view, the optogenetic stimulation experiments are an additional confirmation that the 'putative L4 cells' are subicular neurons, since they do not receive AD inputs at all (hence, they are outside of the PrS); they are instead only indirectly driven upon strong excitation of the PrS. This indirect activation is likely to occur via PrS-to-Subiculum 'back-projections', the existence of which is documented in the literature and also nicely shown by the authors (see Figure 1_1 and line 109).

      See above. Only superficial layers 1-3 of the presubiculum receive direct AD input.

      (1.5) The electrophysiological properties of the 'putative L4 cells' are consistent with their subicular identity, i.e. they show a sag current and they are intrinsically bursty.

      Presubicular layer 4 cells also show bursting behaviour and a sag current (Simonnet et al., 2013; Huang et al., 2017).

      From the above considerations, and the data provided by the authors, I believe that the most parsimonious explanation is that these retrogradely-labelled neurons (from mammillary body injections), referred to by the authors as 'L4 PrS cells', are indeed pyramidal neurons from the distal subiculum.

      We agree that the retrograde labeling is likely not limited to the presubicular layer 4 cells, and we now indicate this in the text (line 324-326). However, the portion of retrogradely labeled neurons that is directly below the layer 3 should be considered as part of the presubiculum.

      I believe this is a fundamental issue that deserves clarification, in order to avoid confusion/misunderstandings in the field. Given the evidence provided, I believe that it would be inaccurate to call these cells 'L4 PrS neurons'. However, I acknowledge the fact that it might be difficult to convincingly and satisfactorily address this issue within the framework of a revision. For example, it is possible that these 'putative L4 cells' might be retrogradely-labelled from the Medial Mammillary Body (a major subicular target) since it is difficult to selectively restrict the injection to the LMN, unless a suitable driver line is used (if available). The authors should also consider the possibility of removing this subset of data (referring to putative L4), and instead focus on the rest of the story (referring to L3)- which I think by itself, still provides sufficient advance.

      We agree with the reviewer that it is difficult to provide a satisfactory answer. To some extent, the reviewer’s comments target the nomenclature of the subicular region. This transitional region between the hippocampus and the entorhinal cortex has been notoriously ill defined, and the criteria are somewhat arbitrary for determining exactly where to draw the line. Based on the thalamic projection, presubicular layers 1-3 can now be precisely outlined, thanks to the use of viral labeling. But the presubicular layer 4 had been considered to be cell-free in early works, and termed ‘lamina dissecans’ (Boccara 2010), as the limit between the superficial and deep layers. Then it became of great interest to us and to the field, when the PrS layer 4 cells were first identified as LMN projecting neurons (Yoder and Taube 2011). This unique back-projection to the upstream region of the HD system is functionally very important, closing the loop of the Papez circuit (mammillary bodies - thalamus - hippocampal structures).

      We note that the reviewer does not doubt our results, but rather questions the naming conventions. We therefore maintain our data. We agree that in the future a genetically defined mouse line would help to better pin down this specific neuronal population.

      We thank the reviewer for sharing their concerns and giving us the opportunity to clarify our experimental approach to target the presubicular layer 4. We hope that these explanations will be helpful to the readers of eLife as well.

      (2) The PrS anatomy could be better clarified, especially in relation to its modular organization (see e.g. Preston-Ferrer et al., 2016; Ray et al., 2017; Balsamo et al., 2022). The authors present horizontal slices, where cortical modularity is difficult to visualize and assess (tangential sections are typically used for this purpose, as in classical work from e.g. barrel cortex). I am not asking the authors to validate their observations in tangential sections, but just to be aware that cortical modules might not be immediately (or clearly) apparent, depending on the section orientation and thickness. The authors state that AD fibers were 'not homogeneously distributed' in L3 (line 135) and refer to 'patches of higher density in deep L3' (line 136). These statements are difficult to support unless more convincing anatomy and  . I see some L3 inhomogeneity in the green channel in Fig. 1G (last two panels) and also in Fig. 1K, but this seems to be rather upper L3. I wonder how consistent the pattern is across different injections and at what dorsoventral levels this L3 modularity is observed (I think sagittal sections might be helpful). If validated, these observations could point to the existence of non-homogeneous AD innervation domains in L3 - hinting at possible heterogeneity among the L3 pyramidal cell targets. Notably, modularity in L2 and L1 is not referred to. The authors state that AD inputs 'avoid L2' (line 131) but this statement is not in line with recent work (cited above) and is also not in line with their anatomy data in Fig. 1G, where modularity is already quite apparent in L2 (i.e. there are territories avoided by the AD fibers in L2) and in L1 (see for example the last image in Fig. 1G). This is the case also for the RSC axons (Fig. 1H) where a patchy pattern is quite clear in L1 (see the last image in panel H). Higher-mag pictures might be helpful here. 
      These qualitative observations imply that AD and RSC axons probably bear a precise structural relationship relative to each other, and relative to the calbindin patch/matrix PrS organization that has been previously described. I am not asking the authors to address these aspects experimentally, since the main focus of their study is on L3, where RSC/AD inputs largely converge. Better anatomy pictures would be helpful, or at least a better integration of the authors' (qualitative) observations within the existing literature. Moreover, the authors' calbindin staining in Fig. 1K is not particularly informative. Subicular, PaS, MEC, and PrS borders should be annotated, and higher-resolution images could be provided. The authors should also check the staining: MEC appears to be blank but is known to strongly express calb1 in L2 (see 'island' by Kitamura et al., Ray et al., Science 2014; Ray et al., frontiers 2017). As additional validation for the staining: I would expect that the empty L2 patches in Figs. 1G (last two panels) would stain positive for Calbindin, as in previous work (Balsamo et al. 2022).

      We now provide a new figure showing the pattern of AD innervation in PrS superficial layers 1 to 3, with different dorso-ventral levels and higher magnification (Figure 2). Because our work was aimed at identifying connectivity between long-range inputs and presubicular neurons, we chose to work with horizontal sections that preserve well the majority of the apical dendrites of presubicular pyramidal neurons. We feel it is enriching for the presubicular literature to show the cytoarchitecture from different angles and to show patchiness in horizontal sections. The non-homogeneous AD innervation domains (‘microdomains’) in L3 were consistently observed across different injections in different animals.

      Author response image 1.

      Thalamic fiber innervation pattern. A, ventral, and B, dorsal horizontal section of the Presubiculum containing ATN axons expressing GFP. Patches of high density of ATN axonal ramifications in L3 are indicated as “ATN microdomains”. Layers 1, 2, 3, 4, 5/6 are indicated. C, High magnification image (63x optical section; different animal).

      We also provide a supplementary figure with images of horizontal sections of calbindin staining in PrS, with a larger crop, for the reviewer to check (Figure 3, see below). We thank the reviewer for pointing out recent studies using tangential sections. Our results agree with the previous observation that AD axons are found in calbindin negative territories (cf Fig 1K). Calbindin+ labeling is visible in the PrS layer 2 as well as in some patches in the MEC (Figure 3 panel A). Calbindin staining tends to not overlap with the territories of ATN axonal ramification. We indicate the inhomogeneities of anterior thalamic innervation that form “microdomains” of high density of green labeled fibers, located in layer 1 and layer 3 (Figure 3, Panel A, middle). Panel B shows another view of a more dorsal horizontal section of the PrS, with higher magnification, with a big Calbindin+ patch near the parasubiculum.

      The “ATN+ microdomains” possess a high density of axonal ramifications from ATN, and have been previously documented in the literature. They are consistently present. Our group had shown them in the article by Nassar et al., 2018, at different dorsoventral levels (Fig 1 C (dorsal) and 1D (ventral) PrS). See also Simonnet et al., 2017, Fig 2B, for an illustration of the typical variations in densities of thalamic fibers, and supplementary Figure 1D. Also Jiayan Liu et al., 2021 (Figure 2 and Fig 5) show these characteristic microzones of dense thalamic axonal ramifications, with more or less intense signals across layers 1, 2, and 3.  While it is correct that thalamic axons can be seen to cross layer 2 to ramify in layer 1, we maintain that AD axons typically do not ramify in layer 2. We modify the text to say, “mostly” avoiding L2 (line 130).

      The reviewer is correct in pointing out that the 'patches of higher density in deep L3' are not only in the deep L3, as in the first panel in Fig 1G, but in the more dorsal sections they are also found in the upper L3. We change the text accordingly (line 135-136) and we provide the layer annotation in Figure 1G. We further agree with the reviewer that RSC axons also present a patchy innervation pattern. We add this observation in the text (line 144).

      It is yet unclear whether anatomical microzones of dense ATN axon ramifications in L3 might fulfill the criteria of a functional modularity, as it is the case for the calbindin patch/matrix PrS organization (Balsamo et al., 2022). As the reviewer points out, this will require more information on the precise structural relationship of AD and RSC axons relative to each other, as well as functional studies. Interestingly, we note a degree of variation in the amplitudes of oEPSC from different L3 neurons (Fig. 2F, discussion line 420; 428), which might be a reflection of the local anatomo-functional micro-organization.

      Minor points:

      (1) The pattern of retrograde labelling, or at least the way it is referred to in the results (lines 104ff), seems to imply some topography of AD-to-PreS projections. Is it the case? How consistent are these patterns across experiments, and individual injections? Was there variability in injection sites along the dorso-ventral and possibly antero-posterior PrS axes, which could account for a possibly topographical AD-to-PrS input pattern? It would be nice to see a DAPI signal in Fig. 1B since the AD stands out quite clearly in DAPI (Nissl) alone.

      Yes, we find a consistent topography for the AD-to-PrS projection, for similar injection sites in the presubiculum. The coordinates for retrograde labeling were, as indicated, -4.06 (AP), 2.00 (ML) and -2.15 mm (DV), so we cannot report on possible variations for different injection sites.

      (2) Fig. 2_2KM: this figure seems to show the only difference the authors found between AD and RS input properties. The authors could consider moving these data into main Fig. 2 (or exchanging them with some of the panels in F-O, which instead show no difference between AD and RSC). Asterisks/stats significance is not visible in M.

      For space reasons we leave the panels of Fig. 2_2KM in the supplementary section. We increased the size of the asterisk in M.

      (3) The data in Fig. 1_1 are quite interesting, since some of the PrS projection targets are 'non-canonical'. Maybe the authors could consider showing some injection sites, and some fluorescence images, in addition to the schematics. Maybe the authors could acknowledge that some of these projection targets are 'putative' unless independently verified by e.g. retrograde labeling. Unspecific white matter labelling and/or spillover is always a potential concern.

      We now include the image of the injection site for data in Fig. 1_1 as a supplementary Fig. 1_2. The Figure 1_1 shows the retrogradely labeled upstream areas of Presubiculum.

      Author response image 2.

      Retrobeads were injected in the right Presubiculum.

      (4) The authors speculate that the near-coincident summation of RS + AD inputs in L3 cells could be a potential mechanism for the binding of visual + HD information in PrS. However, landmarks are learned, and learning typically implies long-term plasticity. As the authors acknowledge in the discussion (lines 493ff) GluR1 is not expressed in PrS cells. What alternative mechanisms could the authors envision? How could the landmark-update process occur in PrS, if it is not locally stored? RSC could also be involved (Jakob et al) as acknowledged in the introduction - the authors should keep this possibility open also in the discussion.

      A similar point has been raised by Reviewer 1, please check our answer to their point 2. Briefly, our results indicate that HD-to-landmark updating is a multi-step process. RSC may be one of the places where landmarks are learned. The subsequent temporal mapping of HD to landmark signals in PrS might be plasticity-free, as matching directional with visual landmark information based on temporal coincidence does not necessarily require synaptic plasticity. It seems likely that there is no local storage and no change in synaptic weights in PrS. The landmark-anchored HD signals reach LMN via L4 neurons, sculpting network dynamics across the Papez circuit. One possibility is that the trace of a landmark that matches HD may be stored as patterns of neural activity that could guide navigation (cf. El-Gaby et al., 2024, Nature). Clearly, more work is needed to understand how the HD attractor is updated on a mechanistic level. Recent work in prefrontal cortex mentions “activity slots” and delineates algorithms for dynamic control of cognitive maps without synaptic plasticity (Whittington et al., 2025, Neuron): information may be stored in neural attractor activity, and the idea that working memory may rely on recurrent updates of neural activity might generalize to the HD system. We include these considerations in the discussion (line 499-503; 523-533) and also point to alternative models (line 518-522) including modeling work in the retrosplenial cortex.

      (5) The authors state that (lines 210ff) their cluster analysis 'provided no evidence for subpopulations of layer 3 cells (but see Balsamo et al., 2022)' implying an inconsistency; however, Balsamo et al also showed that the (in vivo) ephys properties of the two HD cell 'types' are virtually identical, which is in line with the 'homogeneity' of L3 ephys properties (in slice) in the authors' data. Regarding the possible heterogeneity of L3 cells: the authors report inhomogeneous AD innervation domains in L3 (see also main comment 2) and differences in input summation (some L3 cells integrate linearly, some supra-linearly; lines 272) which by itself might already imply some heterogeneity. I would therefore suggest rewording the statements to clarify what the lack of heterogeneity refers to.

      We agree. In line 212 we now state “cluster analysis (Figure 2D) provided no evidence for subpopulations of layer 3 cells in terms of intrinsic electrophysiological properties (see also Balsamo et al., 2022).”

      (6) n=6 co-recorded pairs are mentioned at line 348, but n=9 at line 366. Are these numbers referring to the same dataset? Please correct or clarify

      Line 349 refers to a set of 6 co-recorded pairs (n=12 neurons) in double injected mice with Chronos injected in ATN and Chrimson in RSC (cf. Fig. 7E). The 9 pairs mentioned in line 367 refer to another type of experiment where we stimulated layer 3 neurons by depolarizing them to induce action potential firing while recording neighboring layer 4 neurons to assess connectivity. Line 367  now reads: “In n = 9 paired recordings, we did not detect functional synapses between layer 3 and layer 4 neurons.”

      Reviewer #3 (Recommendations For The Authors):

      Questions for the authors/points for addressing:

      I found that the slice electrophysiology experiments were not reported with sufficient detail. For example, in Figure 2, I am assuming that the voltage clamp experiments were carried out using the Cs-based recording solution, while the current clamp experiments were carried out using the K-Gluc intracellular solution. However, this is not explicitly stated and it is possible that all of these experiments were performed using the K-Gluc solution, which would give slightly odd EPSCs due to incomplete space/voltage clamp. Furthermore, the method states that gabazine was used to block GABA(A) receptor-mediated currents, but not when this occurred. Was GABAergic neurotransmission blocked for all measurements of EPSC magnitude/dynamics? If so, why not block GABA(B) receptors? If not blocking GABAergic transmission for measuring EPSCs, why not? This should be stated explicitly either way.

      The addition of drugs or difference of solution is indicated in the figure legend and/or in the figure itself, as well as in the methods. We now state explicitly: “In a subset of experiments, the following drugs were used to modulate the responses to optogenetic stimulations; the presence of these drugs is indicated in the figure and figure legend, whenever applicable.” (line 632). A Cs-based internal solution and gabazine were used in Figure 5, this is now indicated in the Methods section (line 626). All other experiments were performed using K-Gluc as an internal solution and ACSF.

      Methods: The experiments involving animals are incompletely reported. For example, were both sexes used? The methods state "Experiments were performed on wild‐type and transgenic C57Bl6 mice" - what transgenic mice were used and why is this not reported in detail (strain, etc)? I would refer the authors to the ARRIVE guidelines for reporting in vivo experiments in a reproducible manner (https://arriveguidelines.org/).

      We now added this information in the methods section, subsection “Animals” (line 566-567). Animals of both sexes were used. The only transgenic mouse line used was the Ai14 reporter line (no phenotype), depending on the availability in our animal facility.

      For experiments comparing ATN and RSC inputs onto the same neuron (e.g. Figure 2 supplement 2 G - J), are the authors certain that the observed differences (e.g. rise time and paired-pulse facilitation on the ATN input) are due to differences in the synapses and not a result of different responses of the opsins? Refer to https://pubmed.ncbi.nlm.nih.gov/31822522/ from Jess Cardin's lab. This could easily be tested by switching which opsin is injected into which nucleus (a fair amount of extra work) or comparing the Chrimson synaptic responses with those evoked using Chronos on the same projection, as used in Figure 2 (quite easy as authors should already have the data).

      We actually did switch the opsins across the two injection sites. In Figure 2 - supplement 2G-J, the values linked by a dashed line result from recordings in the switched configuration with respect to the original configuration (in full lines, Chronos injected in RSC and Chrimson in ATN). The values from switched configuration followed the trend of the main configuration and were not statistically different (Mann-Whitney U test).

      Statistical reporting: While the number of cells is generally reported for experiments, the number of slices and animals is not. While slice ephys often treat cells as individual biological replicates, this is not entirely appropriate as it could be argued that multiple cells from a single animal are not independent samples (some sort of mixed effects model that accounts for animals as a random effect would be better). For the experiments in the manuscript, I don't think this is necessary, but it would certainly reassure the reader to report how many animals/slices each dataset came from. At a bare minimum, one would want any dataset to be taken from at least 3 animals from 2 different litters, regardless of how many cells are in there.

      Our slice electrophysiology experiments include data from 38 successfully injected animals: 14 animals injected in ATN, 20 animals injected in RSC, and 4 double injected animals. Typically, we recorded 1 to 3 cells per slice. We now include this information in the text or in the figure legends (line 159, 160, 297, 767, 826, 831, 832, 839, 845, 901, 941).
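
      Reporting cells and animals side by side is straightforward to automate. The sketch below groups per-cell measurements by animal and returns the number of cells, the number of animals and the per-animal means, which can serve as a conservative unit of analysis when cells from one animal are not considered independent. The record layout, identifiers and values are hypothetical; this is not the bookkeeping used for the manuscript, and it does not replace a mixed-effects model.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_animal(records):
    """Group (animal_id, value) records and summarize them per animal.

    Returns (n_cells, n_animals, animal_means), where animal_means maps each
    animal identifier to the mean of the values recorded in that animal.
    """
    by_animal = defaultdict(list)
    for animal_id, value in records:
        by_animal[animal_id].append(value)
    animal_means = {animal: mean(values) for animal, values in by_animal.items()}
    n_cells = sum(len(values) for values in by_animal.values())
    return n_cells, len(by_animal), animal_means


# Hypothetical oEPSC amplitudes (pA), one record per recorded cell.
cells = [
    ("mouse_01", 120.0),
    ("mouse_01", 95.0),
    ("mouse_02", 140.0),
    ("mouse_03", 80.0),
    ("mouse_03", 110.0),
    ("mouse_03", 101.0),
]
n_cells, n_animals, per_animal = summarize_by_animal(cells)
print(f"n = {n_cells} cells from {n_animals} animals")
```

      The per-animal means can then be passed to the same nonparametric tests used at the level of cells.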

      For the optogenetic experiments looking at the summation of EPSPs (e.g. figure 4), I have two questions: why were EPSPs measured and not EPSCs? The latter would be expected to give a better readout of AMPA receptor-mediated synaptic currents. And secondly, why was 20 Hz stimulation used for these experiments? One might expect theta stimulation to be a more physiologically-relevant frequency of stimulation for comparing ATN and RSC inputs to single neurons, given the relevance with spatial navigation and that the paper's conclusions were based around the head direction system. Similarly, gamma stimulation may also have been informative. Did the authors try different frequencies of stimulation?

      Question 1. The current clamp configuration allows us to measure EPSP amplification/prolongation by NMDA or persistent Na currents (cf. Fricker and Miles 2000), which might contribute to supralinearity.

      Question 2. In a previous study from our group about the AD to PrS connection (Nassar et al., 2018), no significant difference was observed in the dynamics of EPSCs between stimulations at 10 Hz versus 30 Hz. Therefore we chose 20 Hz. This value is in the range of HD cell firing (Taube 1995, 1998: peak firing rates of 18 to 24 spikes/sec in RSC and 41 spikes/sec in AD, although mean firing rates might be lower; Blair and Sharp 1995). In hindsight, we agree that it would have been useful to include 8 Hz or 40 Hz stimulations.

      The GABA(A) antagonist experiments in Figure 5 are interesting but I have concerns about the statistical power of these experiments - n of 3 is absolutely borderline for being able to draw meaningful conclusions, especially if this small sample of cells came from just 1 or 2 animals. The number of animals used should be stated and/or caution should be applied when considering the potential mechanisms of supralinear summation of EPSPs. It looks like the slight delay in RSC input EPSP relative to ATN that was in earlier figures is not present here - could this be the loss of feedforward inhibition?

      The current clamp experiments in the presence of QX314 and a Cs gluconate based internal solution were preceded by initial experiments using puff applications of glutamate to the recorded neurons (not shown). Results from those experiments had pointed towards a role for TTX resistant sodium currents and for NMDA receptor activation as a factor favoring the amplification and prolongation of glutamate induced events. They inspired the design of the dual wavelength stimulation experiments shown in Figure 5, and oriented our discussion of the results. We agree of course that more work is required to dissect the role of disinhibition for EPSP amplification. This is however beyond the present study.

      Concerning the EPSP onset delays following RSC input stimulation:  In this set of experiments, we compensated for the notoriously longer delay to EPSP onset, following RSC axon stimulation, by shifting the photostimulation (red) of RSC fibers to -2 ms, relative to the onset of photostimulation of ATN fibers (blue). This experimental trick led to an improved  alignment of the onset of the postsynaptic response, as shown in the figure below for the reviewer.

      Author response image 3.

      In these experiments, the onset of RSC photostimulation was shifted forward in time by -2 ms, in an attempt to better align the EPSP onset to the one evoked by ATN stimulation.

      We insert in the results a sentence to indicate that experiments illustrated in Figure 5 were performed in only a small sample of 3 cells that came from 2 mice (line 297), so caution should be applied. In the discussion we  formulate more carefully, “From a small sample of cells it appears that EPSP amplification may be facilitated by a reduction in synaptic inhibition (n = 3; Figure 5)” (line 487).

      Figure 7: I appreciate the difficulties in making dual recordings from older animals, but no conclusion about the RSC input can legitimately be made with n=1.

      Agreed. We want to avoid any overinterpretation, and point out in the results section that the RSC stimulation data is from a single cell pair. The sentence now reads: “... layer 4 neurons occurred after firing in the layer 3 neuron, following ATN afferent stimuli, in 4 out of 5 cell pairs. We also observed this sequence when RSC input was activated, in one tested pair.” (lines 347-349)

      Minor points:

      Line 104: 'within the two subnuclei that form the anterior thalamus' - the ATN actually has three subdivisions (AD, AV, AM) so this should state 'two of the three nuclei that form the anterior thalamus...'

      Corrected, line 103

      Line 125: should read "figure 1F" and not "figure 2F".

      Corrected, line 124

      Line 277-280: Why were two different posthoc tests used on the same data in Figures 3E & F?

      We used Sidak’s multiple comparisons test to compare each event, Sum vs. Dual (two different configurations at each time point; asterisks), and Friedman’s and Dunn’s tests to compare the nth EPSP amplitude to the first one for Dual events (same configuration between time points; hashmarks). We give two-way ANOVA results in the legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1:

      Point 1.1

      Summary: This paper describes a reanalysis of data collected by Gagne et al. (2020), who investigated how human choice behaviour differs in response to changes in environmental volatility. Several studies to date have demonstrated that individuals appear to increase their learning rate in response to greater volatility and that this adjustment is reduced amongst individuals with anxiety and depression. The present authors challenge this view and instead describe a novel Mixture of Strategies (MOS) model, that attributes individual differences in choice behaviour to different weightings of three distinct decision-making strategies. They demonstrate that the MOS model provides a superior fit to the data and that the previously observed differences between patients and healthy controls may be explained by patients opting for a less cognitively demanding, but suboptimal, strategy. 

      Strengths: 

      The authors compare several models (including the original winning model in Gagne et al., 2020) that could feasibly fit the data. These are clearly described and are evaluated using a range of model diagnostics. The proposed MOS model appears to provide a superior fit across several tests. 

      The MOS model output is easy to interpret and has good face validity. This allows for the generation of clear, testable, hypotheses, and the authors have suggested several lines of potential research based on this. 

      We appreciate the efforts in understanding our manuscript. This is a good summary.

      Point 1.2

      The authors justify this reanalysis by arguing that learning rate adjustment (which has previously been used to explain choice behaviour on volatility tasks) is likely to be too computationally expensive and therefore unfeasible. It is unclear how to determine how "expensive" learning rate adjustment is, and how this compares to the proposed MOS model (which also includes learning rate parameters), which combines estimates across three distinct decision-making strategies. 

      We are sorry for this confusion. Our actual motivation is that previous models only consider the possibility of learning rate adaptation to different levels of environmental volatility. The drawback of previous computational models is that they require a large number of parameters in multi-context experiments. We feel that learning rate adaptation may not be the only mechanism, or at least that alternative explanations may exist. Understanding the true mechanisms is particularly important for rehabilitation purposes, especially in our case of anxiety and depression. To clarify, we have removed all claims that learning rate adaptation is “too complex to understand”.

      Point 1.3

      As highlighted by the authors, the model is limited in its explanation of previously observed learning differences based on outcome value. It's currently unclear why there would be a change in learning across positive/negative outcome contexts, based on strategy choice alone. 

      Thanks for mentioning this limitation. We want to highlight two aspects of our work.

      First, we developed the MOS6 model primarily to account for the learning rate differences between stable and volatile contexts, and between healthy controls and patients, not for those between positive and negative outcomes. In other words, our model does not eliminate the possibility of different learning rates for positive and negative outcomes.

      Second, Figure 3A shows that FLR (containing different learning parameters for positive/negative outcomes) performed even worse than MOS6 (which sets an identical learning rate for positive/negative outcomes). This result calls into question whether learning rate differences between positive/negative outcomes exist in our dataset.

      Action: We now include this limitation in lines 784-793 in discussion:

      “The MOS model is developed to offer context-free interpretations for the learning rate differences observed both between stable and volatile contexts and between healthy individuals and patients. However, we also recognize that the MOS account may not justify other learning rate effects based solely on strategy preferences. One such example is the valence-specific learning rate differences, where learning rates for better-than-expected outcomes are higher than those for worse-than-expected outcomes (Gagne et al., 2020). When fitted to the behavioral data, the context-dependent MOS22 model does not reveal valence-specific learning rates (Supplemental Note 4). Moreover, the valence-specific effect was not replicated in the FLR22 model when fitted to the synthesized data of MOS6.”

      Point 1.4

      Overall the methods are clearly presented and easy to follow, but lack clarity regarding some key features of the reversal learning task.

      Throughout the method the stimuli are referred to as "right" and "left". It's not uncommon in reversal learning tasks for the stimuli to change sides on a trial-by-trial basis or counterbalanced across stable/volatile blocks and participants. It is not stated in the methods whether the shapes were indeed kept on the same side throughout. If this is the case, please state it. If it was not (and the shapes did change sides throughout the task) this may have important implications for the interpretation of the results. In particular, the weighting of the habitual strategy (within the Mixture of Strategies model) could be very noisy, as participants could potentially have been habitual in choosing the same side (i.e., performing the same motor movement), or in choosing the same shape. Does the MOS model account for this? 

      We are sorry for the confusion. Yes, the two shapes indeed changed sides throughout the task. We replaced “left” and “right” with “stimulus 1” and “stimulus 2”. We also acknowledge the possibility that participants may develop a habitual preference for a particular side, rather than a shape. Due to the counterbalanced design, a habit for one side will introduce random selection noise in choices, which should be captured by the MOS model through the inverse temperature parameter.

      Point 1.5

      Line 164: "Participants received points or money in the reward condition and an electric shock in the punishment condition." What determined whether participants received points or money, and did this differ across participants? 

      Thanks! We have clarified the design in lines 187-188:

      “Each participant was instructed to complete two blocks of the volatile reversal learning task, one in the reward context and the other in the aversive context”,

      and in lines:

      “A total of 79 participants completed tasks in both feedback contexts. Four participants only completed the task in the reward context, while three participants only completed the aversive task.”

      Point 1.6

      Line 167: "The participant received feedback only after choosing the correct stimulus and received nothing else" Is this correct? In Figure 1a it appears the participant receives feedback irrespective of the stimulus they chose, by either being shown the amount 1-99 they are being rewarded/shocked, or 0. Additionally, what does the "correct stimulus" refer to across the two feedback conditions? It seems intuitive that in the reward version, the correct answer would be the rewarding stimulus - in the loss version is the "correct" answer the one where they are not receiving a shock? 

      Thanks for raising this issue. We removed the term “correct stimulus” and revised the lines 162-166 accordingly:

      “Only one of the two stimuli was associated with actual feedback (0 for the other one). The feedback magnitude, ranged between 1-99, is sampled uniformly and independently for each shape from trial to trial. Actual feedback was delivered only if the stimulus associated with feedback was chosen; otherwise, a number “0” was displayed on the screen, signifying that the chosen stimulus returns nothing.”
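      The feedback rule quoted above can be illustrated with a minimal sketch. The function names, the 0/1 stimulus indexing, and the use of Python's `random` module are assumptions made for illustration only; this is not the original task code.

```python
import random

def sample_magnitudes(rng):
    """Draw one magnitude per stimulus, uniformly and independently from 1..99."""
    return (rng.randint(1, 99), rng.randint(1, 99))

def trial_feedback(chosen, feedback_stimulus, magnitudes):
    """Return the number displayed after a choice.

    chosen, feedback_stimulus: hypothetical 0/1 indices of the two stimuli.
    The chosen stimulus's magnitude is delivered only if that stimulus is
    the one associated with feedback on this trial; otherwise 0 is shown.
    """
    return magnitudes[chosen] if chosen == feedback_stimulus else 0

rng = random.Random(0)
magnitudes = sample_magnitudes(rng)
shown = trial_feedback(chosen=0, feedback_stimulus=1, magnitudes=magnitudes)
```

      In this sketch a participant who picks the stimulus not associated with feedback always sees 0, regardless of the magnitudes on screen.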

      Point 1.7

      Line 176: "The whole experiment included two runs each for the two feedback conditions." Does this mean participants completed the stable and volatile blocks twice, for each feedback condition? (i.e., 8 blocks total, 4 per feedback condition). 

      Thanks! We have removed the term “block”, and now we refer to it as “context”. In particular, we removed phrases like “stable block” and “volatile block” and used “context” instead.

      Action: See lines 187-189 for the revised version.

      “Each participant was instructed to complete two runs of the volatile reversal learning task, one in the reward context and the other in the aversive context. Each run consisted of 180 trials, with 90 trials in the stable context and 90 in the volatile context (Fig. 1B).”

      Point 1.8

      In the expected utility (EU) strategy of the Mixture of Strategies model, the expected value of the stimulus on each trial is produced by multiplying the magnitude and probability of reward/shock. In Gagne et al.'s original paper, they found that an additive mixture of these components better-captured participant choice behaviour - why did the authors not opt for the same strategy here? 

      Thanks for asking this. Their strategy basically amounts to a mixture of PF+MO+HA, where PF stands for the feedback probability (e.g., 0.3 or 0.7) without multiplying by feedback magnitude. Ours, however, is EU+MO+HA, where EU stands for feedback probability x feedback magnitude. We did compare these two strategies, and the model using their strategy performed much worse than ours (see the red box below).

      Author response image 1.

      Thorough model comparison.
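      The difference between an EU-based and a PF-based mixture can be sketched as follows. This is an illustrative sketch under stated assumptions, not the fitted MOS6 or FLR implementation: the function names, the rescaling of magnitudes to [0, 1], the single shared inverse temperature, the reward framing (larger values are preferred), and the representation of habit as a running choice frequency are all assumptions.

```python
import math

def softmax2(v1, v2, beta):
    """Probability of choosing stimulus 1 in a two-option softmax."""
    return 1.0 / (1.0 + math.exp(-beta * (v1 - v2)))

def mixture_policy(p1, m1, m2, habit1, weights, beta, use_eu=True):
    """P(choose stimulus 1) as a weighted mixture of three strategies.

    p1      : learned probability that stimulus 1 yields feedback
              (stimulus 2 is assumed to have probability 1 - p1)
    m1, m2  : magnitudes rescaled to [0, 1] (assumed scaling)
    habit1  : running estimate of how often stimulus 1 was chosen
    weights : (w_value, w_mo, w_ha), assumed to sum to 1
    use_eu  : True  -> value strategy uses probability x magnitude (EU)
              False -> value strategy uses probability only (PF)
    """
    if use_eu:
        value = softmax2(p1 * m1, (1.0 - p1) * m2, beta)
    else:
        value = softmax2(p1, 1.0 - p1, beta)
    mo = softmax2(m1, m2, beta)   # magnitude-only strategy
    ha = habit1                   # habitual strategy
    w_value, w_mo, w_ha = weights
    return w_value * value + w_mo * mo + w_ha * ha

# A likely-but-small option versus an unlikely-but-large option.
eu_choice = mixture_policy(0.7, 0.1, 0.9, 0.5, (1.0, 0.0, 0.0), beta=8.0, use_eu=True)
pf_choice = mixture_policy(0.7, 0.1, 0.9, 0.5, (1.0, 0.0, 0.0), beta=8.0, use_eu=False)
```

      With these illustrative inputs the EU variant favours stimulus 2 (0.07 versus 0.27 in expected utility), while the PF variant favours stimulus 1, which is the kind of trial on which the two accounts make different predictions.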

      Point 1.9

      How did the authors account for individuals with poor/inattentive responding, my concern is that the habitual strategy may be capturing participants who did not adhere to the task (or is this impossible to differentiate?). 

      The current MOS6 model distinguishes between the HA strategy and inattentive responding. Due to the counterbalanced design, the HA strategy requires participants to actively track the stimuli on the screen. In contrast, inattentive responding, like the repeated motor movement mentioned in Point 1.4, should produce random selection in the behavioral data, which should be accounted for by the inverse temperature parameter.

      Point 1.10

      The authors provide a clear rationale for, and description of, each of the computational models used to capture participant choice behaviour. 

      • Did the authors compare different combinations of strategies within the MOS model (e.g., only including one or two strategies at a time, and comparing fit?) I think more explanation is needed as to why the authors opted for those three specific strategies. 

      We appreciate this great advice. Following your advice, we conducted a thorough model comparison. Please refer to Figure R1 above. The detailed text descriptions of all the models in Figure R1 are included in Supplemental Note 1.

      Point 1.11

      Please report the mean and variability of each of the strategy weights, per group. 

      Thanks. We now report the mean and variability of the strategy weights in lines 490-503:

      “We first focused on the fitted parameters of the MOS6 model. We compared the weight parameters of the three strategies across groups and conducted statistical tests on their logits. The patient group showed a ~37% preference towards the EU strategy, which is significantly weaker than the ~50% preference in healthy controls (EU logit, healthy controls: M = 0.991, SD = 1.416; patients: M = 0.196, SD = 1.736; t(54.948) = 2.162, p = 0.035, Cohen’s d = 0.509; Fig. 4A). Meanwhile, the patients exhibited a weaker preference (~27%) for the HA strategy compared to healthy controls (~36%) (HA logit, healthy controls: M = 0.657, SD = 1.313; patients: M = -0.162, SD = 1.561; t(56.311) = 2.455, p = 0.017, Cohen’s d = 0.574), but a stronger preference for the MO strategy (36% vs. 14%; MO logit, healthy controls: M = -1.647, SD = 1.930; patients: M = -0.034, SD = 2.091; t(63.746) = -3.510, p = 0.001, Cohen’s d = 0.801). Most importantly, we also examined the learning rate parameter in the MOS6 but found no group differences (t(68.692) = 0.690, p = 0.493, Cohen’s d = 0.151). These results strongly suggest that the differences in decision strategy preferences can account for the learning behaviors in the two groups without necessitating any differences in learning rate per se.”
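      For readers who want to see how statistics of this form are obtained, the following is a minimal sketch. The fractional degrees of freedom in the quoted passage suggest an unequal-variance (Welch) t-test, but that is an inference rather than something stated here; the pooled-SD definition of Cohen's d is likewise an assumption, and the two samples are made-up numbers, not the study's data.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)      # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb                # squared standard error of the difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation (one common definition)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

# Hypothetical logit values for two groups (illustrative only).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
t_stat, dof = welch_t(group_a, group_b)
effect = cohens_d(group_a, group_b)
```

      The Welch degrees of freedom are generally non-integer and lie between the smaller group's n - 1 and the pooled n1 + n2 - 2, which is why values such as 54.948 can appear.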

      Point 1.12

      The authors compare the strategy weights of patients and controls and conclude that patients favour simpler strategies (see Line 417), based on the fact that they had higher weights for the MO, and lower on the EU.

      (1) However, the finding that control participants were more likely to use the habitual strategy was largely ignored. Within the control group, were the participants significantly more likely to opt for the EU strategy, over the HA? (2) Further, on line 467 the authors state "Additionally, there was a significant correlation between symptom severity and the preference for the HA strategy (Pearson's r = -0.285, p = 0.007)." Apologies if I'm mistaken, but does this negative correlation not mean that the greater the symptoms, the less likely they were to use the habitual strategy?

      I think more nuance is needed in the interpretation of these results, particularly in the discussion. 

      Thanks. The healthy participants seemed more likely to opt for the EU strategy, although this difference did not reach significance (paired-t(53) = 1.258, p = 0.214, Cohen’s d = 0.242). We have now systematically explored the role of HA. Compared to the MO, the HA also saves cognitive resources but yields a significantly higher hit rate (Fig. 4A). Therefore, a preference for the HA over the MO strategy may reflect a more sophisticated balance between reward and complexity within an agent: when healthier subjects run out of cognitive resources for the EU strategy, they cleverly resort to the HA strategy, adopting a simpler strategy while still achieving a certain level of hit rate. This explains the negative symptom-HA correlation. Given how effective the HA strategy is, it is not surprising that the healthy control participants opt more for the HA during decision-making.

      However, we are cautious about drawing strong conclusions from (1) the non-significant difference between EU and HA within healthy controls and (2) the negative symptom-HA correlation. The reason is that the MOS22, the context-dependent variant, (1) exhibited a significantly higher preference for EU over HA (paired-t(53) = 4.070, p < 0.001, Cohen’s d = 0.825) and (2) did not replicate this negative correlation (Supplemental Information Figure S3).

      Action: Simulation analysis on the effects of HA was introduced in lines 556-595 and Figure 4. We discussed the effects of HA in lines 721-733:

      “Although many observed behavioral differences can be explained by a shift in preference from the EU to the MO strategy among patients, we also explore the potential effects of the HA strategy. Compared to the MO, the HA strategy also saves cognitive resources but yields a significantly higher hit rate (Fig. 4A). Therefore, a preference for the HA over the MO strategy may reflect a more sophisticated balance between reward and complexity within an agent (Gershman, 2020): when healthier participants exhaust their cognitive resources for the EU strategy, they may cleverly resort to the HA strategy, adopting a simpler strategy but still achieving a certain level of hit rate. This explains the stronger preference for the HA strategy in the HC group (Fig. 3A) and the negative correlation between HA preferences and symptom severity  (Fig. 5). Apart from shedding light on the cognitive impairments of patients, the inclusion of the HA strategy significantly enhances the model’s fit to human behavior (see examples in Daw et al. (2011); Gershman (2020); and also Supplemental Note 1 and Supplemental Figure S3).”

      Point 1.13

      Line 513: "their preference for the slowest decision strategy" - why is the MO considered the slowest strategy? Is it not the least cognitively demanding, and therefore, the quickest? 

      Sorry for the confusion. In Fig. 5C, we conducted simulations to estimate the learning speed for each strategy. As shown below, the MO strategy exhibits a flat learning curve. Our claim on the learning speed was based solely on simulation outcomes without referring to cognitive demands. Note that our analysis did not aim to compare the cognitive demands of the MO and HA strategies directly.

      Action: We explain the learning speed of the three strategies in lines 571-581.

      Point 1.14

      The authors argue that participants chose suboptimal strategies, but do not actually report task performance. How does strategy choice relate to the performance on the task (in terms of number of rewards/shocks)? Did healthy controls actually perform any better than the patient group? 

      Thanks for the suggestion. The answers are: 1) EU is the most rewarding > the HA > the MO (Fig. 5A), and 2) yes healthy controls did actually perform better than patients in terms of hit rate (Fig. 2).

      Action: We included additional sections on the above analyses in lines 561-570 and lines 397-401.

      Point 1.15

      The authors speculate that Gagne et al. (2020) did not study the relationship between the decision process and anxiety and depression, because it was too complex to analyse. It's unclear why the FLR model would be too complex to analyse. My understanding is that the focus of Gagne's paper was on learning rate (rather than noise or risk preference) due to this being the main previous finding. 

      Thanks! Yes, our previous arguments were vague and confusing. We have removed all arguments of this kind.

      Point 1.16

      Minor Comments: 

      • Line 392: Modeling fitting > Model fitting 

      • Line 580 reads "The MO and HA are simpler heuristic strategies that are cognitively demanding."

      - should this read as less cognitively demanding? 

      • Line 517: health > healthy 

      • Line 816: Desnity > density 

      Sorry for the typos! They have all been fixed.

      Reviewer #2:

      Point 2.1

      Summary: Previous research shows that humans tend to adjust learning in environments where stimulus-outcome contingencies become more volatile. This learning rate adaptation is impaired in some psychiatric disorders, such as depression and anxiety. In this study, the authors reanalyze previously published data on a reversal-learning task with two volatility levels. Through a new model, they provide some evidence for an alternative explanation whereby the learning rate adaptation is driven by different decision-making strategies and not learning deficits. In particular, they propose that adjusting learning can be explained by deviations from the optimal decision-making strategy (based on maximizing expected utility) due to response stickiness or focus on reward magnitude. Furthermore, a factor related to the general psychopathology of individuals with anxiety and depression negatively correlated with the weight on the optimal strategy and response stickiness, while it correlated positively with the magnitude strategy (a strategy that ignores the probability of outcome). 

      Thanks for evaluating our paper. This is a good summary.

      Point 2.2

      My main concern is that the winning model (MOS6) does not have an error term (inverse temperature parameter beta is fixed to 8.804). 

      (1) It is not clear why the beta is not estimated and how were the values presented here chosen. It is reported as being an average value but it is not clear from which parameter estimation. Furthermore, with an average value for participants that would have lower values of inverse temperature (more stochastic behaviour) the model is likely overfitting.

      (2) In the absence of a noise parameter, the model will have to classify behaviour that is not explained by the optimal strategy (where participants simply did not pay attention or were not motivated) as being due to one of the other two strategies.

      We apologize for any confusion caused by our writing. We did set the inverse temperature as a free parameter and quantitatively estimated it during model fitting and comparison. We also created a table to show the free parameters for each model. In the previous manuscript, we did mention “temperature parameter beta is fixed to 8.804”, but only for the model simulation part, which was conducted to interpret some model behaviors.

      We agree with the concern that using the average of the inverse temperature could lead to overfitting to more stochastic behaviors. To mitigate this issue, we now use the median as a more representative value for the population during simulation. Nonetheless, this change does not affect our conclusion (see simulation results in Figures 4&6).

      Action: We now use the term “free parameter” to emphasize that the inverse temperature was fitted rather than fixed. We also created a new table, “Table 1”, in line 458 to show all the free parameters within a model. We also updated the simulation details in lines 363-391 for further clarification.

      Point 2.3

      (3) A model comparison among models with inverse temperature and variable subsets of the three strategies (EU + MO, EU + HA) would be interesting to see. Similarly, a comparison of the MOS6 model to other models where the inverse temperature parameter is fixed to 8.804.

      This is an important limitation because the same simulation as with the MOS model in Figure 3b can be achieved by a more parsimonious (but less interesting) manipulation of the inverse temperature parameter.

      Thanks, we added a comparison between the MOS6 and the two lesion models (EU + MO, EU + HA). Please refer to the figure below and Point 1.8.

      We also realize that the MO strategy could exhibit averaged learning curves similar to random selection. To confirm that patients' slower learning rates are due to a preference for the MO strategy, we compared the MOS6 model with a variant (see the red box below) in which the MO strategy is replaced by Random (RD) selection that assigns a 0.5 probability to both choices. This comparison showed that the original MOS6 model with the MO strategy better fits human data.

      Author response image 2.

      Point 2.4

      Furthermore, the claim that the EU represents an optimal strategy is a bit overstated. The EU strategy is the only one of the three that assumes participants learn about the stimulus-outcomes contingencies. Higher EU strategy utilisation will include participants that are more optimal (in maximum utility maximisation terms), but also those that just learned better and completely ignored the reward magnitude.

      Thank you for your feedback. We have now revised the paper to remove all statements that the “EU strategy is optimal” and replaced them with “EU strategy is rewarding but complex”. We agree that both the EU strategy and the strategy focusing only on feedback probability (i.e., ignoring the reward magnitude, referred to as the PF strategy) are rewarding but more complex than the two simple heuristics. We also included the latter strategy in our model comparisons (see the next section, Point 2.5).

      Point 2.5

      The mixture strategies model is an interesting proposal, but seems to be a very convoluted way to ask: to what degree are decisions of subjects affected by reward, what they've learned, and response stickiness? It seems to me that the same set of questions could be addressed with a simpler model that would define choice decisions through a softmax with a linear combination of the difference in rewards, the difference in probabilities, and a stickiness parameter. 

      Thanks for suggesting this model. We did include the proposed linear combination model (see “linear comb.” in the red box below) and found that it performed significantly worse than the MOS6.

      Action: We justified our model selection criterion in the Supplemental Note 1.

      Author response image 3.
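      For concreteness, the reviewer's proposed alternative (a softmax over a linear combination of the reward difference, the probability difference, and a stickiness term) can be sketched as below. With two options the softmax reduces to a logistic function of the weighted differences. The function name, the coefficient names, and the +1/-1 coding of the previous choice are hypothetical; this is not necessarily how the “linear comb.” model in the comparison was implemented.

```python
import math

def linear_comb_choice_prob(m1, m2, p1, p2, prev_choice, b_mag, b_prob, b_stick):
    """P(choose stimulus 1) from a linear combination of three terms.

    m1, m2      : outcome magnitudes of the two stimuli
    p1, p2      : learned feedback probabilities of the two stimuli
    prev_choice : 1 or 2 for the previously chosen stimulus, None on the first trial
    b_*         : hypothetical regression-style coefficients
    """
    if prev_choice is None:
        stick = 0.0
    else:
        stick = 1.0 if prev_choice == 1 else -1.0
    logit = b_mag * (m1 - m2) + b_prob * (p1 - p2) + b_stick * stick
    return 1.0 / (1.0 + math.exp(-logit))

# Example: stimulus 1 is more likely but smaller, and was chosen on the last trial.
prob_1 = linear_comb_choice_prob(0.2, 0.8, 0.7, 0.3, prev_choice=1,
                                 b_mag=2.0, b_prob=4.0, b_stick=0.5)
```

      Unlike a mixture of strategy-specific policies, this form combines the three influences additively inside a single logistic function.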

      Point 2.6

      Learning rate adaptation was also shown with tasks where decision-making strategies play a less important role, such as the Predictive Inference task (see for instance Nassar et al, 2010). When discussing the merit of the findings of this study on learning rate adaptation across volatility blocks, this work would be essential to mention. 

      Thanks for mentioning this great experimental paradigm, which provides an ideal solution for dissociating probability learning from the decision process. We have discussed this paradigm as well as the associated papers in the discussion (lines 749-751, 763-765, and 796-801).

      Point 2.7

      Minor mistakes that I've noticed:

      Equation 6: The learning rate for response stickiness is sometimes defined as alpha_AH or alpha_pi.

      Supplementary material (SM) Contents are lacking in Note1. SM talks about model MOS18, but it is not defined in the text (I am assuming it is MOS22 that should be talked about here).

      Thanks! Fixed.

      Reviewer #3:

      Point 3.1

      Summary: This paper presents a new formulation of a computational model of adaptive learning amid environmental volatility. Using a behavioral paradigm and data set made available by the authors of an earlier publication (Gagne et al., 2020), the new model is found to fit the data well. The model's structure consists of three weighted controllers that influence decisions on the basis of (1) expected utility, (2) potential outcome magnitude, and (3) habit. The model offers an interpretation of psychopathology-related individual differences in decision-making behavior in terms of differences in the relative weighting of the three controllers.

      Strengths: The newly proposed "mixture of strategies" (MOS) model is evaluated relative to the model presented in the original paper by Gagne et al., 2020 (here called the "flexible learning rate" or FLR model) and two other models. Appropriate and sophisticated methods are used for developing, parameterizing, fitting, and assessing the MOS model, and the MOS model performs well on multiple goodness-of-fit indices. The parameters of the model show decent recoverability and offer a novel interpretation for psychopathology-related individual differences. Most remarkably, the model seems to be able to account for apparent differences in behavioral learning rates between high-volatility and low-volatility conditions even with no true condition-dependent change in the parameters of its learning/decision processes. This finding calls into question a class of existing models that attribute behavioral adaptation to adaptive learning rates. 

      Thanks for evaluating our paper. This is a good summary.

      Point 3.2

      (1) Some aspects of the paper, especially in the methods section, lacked clarity or seemed to assume context that had not been presented. I found it necessary to set the paper down and read Gagne et al., 2020 in order to understand it properly.

      (3) Clarification-related suggestions for the methods section:

      - Explain earlier that there are 4 contexts (reward/shock crossed with high/low volatility). Lines 252-307 contain a number of references to parameters being fit separately per context, but "context" was previously used only to refer to the two volatility levels. 

      Action: We have placed the explanation as well as the table about the 4 contexts (stable-reward/stable-aversive/volatile-reward/volatile-aversive) earlier in the section that introduces the experiment paradigm (lines 177-186):

      “Participants were supposed to complete this learning and decision-making task in four experimental contexts (Fig. 1A), two feedback contexts (reward or aversive) × two volatility contexts (stable or volatile). Participants received points in the reward context and an electric shock in the aversive context. The reward points in the reward context were converted into a monetary bonus by the end of the task, ranging from £0 to £10. In the stable context, the dominant stimulus (i.e., a certain stimulus induces the feedback with a higher probability) provided a feedback with a fixed probability of 0.75, while the other one yielded a feedback with a probability of 0.25. In the volatile context, the dominant stimulus’s feedback probability was 0.8, but the dominant stimulus switched between the two every 20 trials. Hence, this design required participants to actively learn and infer the changing stimulus-feedback contingency in the volatile context.”
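      The contingency structure described in this passage can be written out as a per-trial schedule. The sketch below is illustrative only: the function name, the stable-then-volatile ordering, the choice of which stimulus is dominant first, and the complementary probability (0.2) for the non-dominant stimulus in the volatile context are assumptions; the trial counts (90 per context), the probabilities 0.75 and 0.8, and the 20-trial switch interval come from the response text.

```python
def make_schedule(n_stable=90, n_volatile=90, p_stable=0.75,
                  p_volatile=0.8, switch_every=20, stable_first=True):
    """Per-trial probability that stimulus 1 is the one yielding feedback."""
    stable = [p_stable] * n_stable           # stimulus 1 dominant throughout (assumed)
    volatile = []
    dominant_is_1 = True                     # assumed starting dominance
    for t in range(n_volatile):
        if t > 0 and t % switch_every == 0:  # dominance flips every 20 trials
            dominant_is_1 = not dominant_is_1
        volatile.append(p_volatile if dominant_is_1 else 1.0 - p_volatile)
    return stable + volatile if stable_first else volatile + stable

schedule = make_schedule()
n_switches = sum(1 for a, b in zip(schedule[90:], schedule[91:]) if a != b)
```

      Under these assumptions a 90-trial volatile context contains four dominance switches, with a final segment of only 10 trials.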

      - It would be helpful to provide an initial outline of the four models that will be described since the FLR, RS, and PH models were not foreshadowed in the introduction. For the FLR model in particular, it would be helpful to give a narrative overview of the components of the model before presenting the notation. 

      Action: We now include an overview paragraph in the computational model section to outline the four models as well as the hypotheses embodied in each model (lines 202-220).

      - The subsection on line 343, describing the simulations, lacks context. There are references to three effects being simulated (and to "the remaining two effects") but these are unclear because there's no statement in this section of what the three effects are.

      - Lines 352-353 give group-specific weighting parameters used for the stimulations of the HC and PAT groups in Figure 4B. A third, non-group-specific set of weighting parameters is given above on lines 348-349. What were those used for?

      - Line 352 seems to say Figure 4A is plotting a simulation, but the figure caption seems to say it is plotting empirical data. 

      These paragraphs have been rewritten, and the issues mentioned above have been clarified. See lines 363-392.

      Point 3.2

      (2) There is little examination of why the MOS model does so well in terms of model fit indices. What features of the data is it doing a better job of capturing? One thing that makes this puzzling is that the MOS and FLR models seem to have most of the same qualitative components: the FLR model has parameters for additive weighting of magnitude relative to probability (akin to the MOS model's magnitude-only strategy weight) and for an autocorrelative choice kernel (akin to the MOS model's habit strategy weight). So it's not self-evident where the MOS model's advantage is coming from.

      An intuitive understanding of the FLR model is that it estimates the stimulus value through a linear combination of the probability of feedback (PF) and the (non-linear) magnitude. See equation:

      Also, the FLR model includes the HA mechanism as:

      In other words, the FLR model combines the probability of feedback (PF), MO, and HA (see Eq. XX in the original study), whereas our MOS model combines EU, MO, and HA. The key qualitative difference between FLR and MOS is the use of the expected utility formula (EU) instead of the probability of feedback (PF). The advantage of our MOS model is supported by our model comparisons, which indicate that human participants multiply probability and magnitude rather than considering probability alone. The EU strategy is also supported by a large body of literature (Gershman et al., 2015; Von Neumann & Morgenstern, 1947).

      Making decisions based on the multiplication of feedback probability and magnitude can often yield very different results compared to decisions based on a linear combination of the two, especially when the two magnitudes have a small absolute difference but a large ratio. Let’s consider two cases:

      (1) Stimulus 1: vs. Stimulus 2:

      (2) Stimulus 1: vs. Stimulus 2:

      The EU strategy may opt for stimulus 2 in both cases, since stimulus 2 always has a larger expected value. However, the PF+MO strategy is very likely to choose stimulus 1 in the first case. For example, when .  If we want PF+MO to also choose stimulus 2, in line with the EU strategy, we need to increase the weight on magnitude . Note that in this example we divided the magnitude value by 100 so that probability and magnitude are on the same scale, which simplifies the illustration.
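      The contrast between the two decision rules can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used in the study: the probabilities and magnitudes are hypothetical, the function names `eu_choice` and `pf_mo_choice` are ours, and the additive rule uses a plain linear magnitude term rather than the non-linear magnitude of the FLR model.

```python
# Hypothetical illustration: multiplicative (EU) vs. additive (PF+MO) valuation.
# All numbers are made up; magnitudes are divided by 100 to share the
# probability scale, as in the text above.

def eu_choice(p1, m1, p2, m2):
    """Pick the stimulus with the larger expected utility p * m."""
    return 1 if p1 * m1 > p2 * m2 else 2

def pf_mo_choice(p1, m1, p2, m2, w_mag):
    """Pick the stimulus with the larger weighted sum of probability and magnitude.

    w_mag is the weight on magnitude; (1 - w_mag) is the weight on probability.
    """
    v1 = (1 - w_mag) * p1 + w_mag * m1
    v2 = (1 - w_mag) * p2 + w_mag * m2
    return 1 if v1 > v2 else 2

# Small absolute difference in magnitude (1 vs. 5 points), large ratio (1:5).
p1, m1 = 0.6, 1 / 100
p2, m2 = 0.4, 5 / 100

print(eu_choice(p1, m1, p2, m2))          # EU rule
print(pf_mo_choice(p1, m1, p2, m2, 0.5))  # additive rule, equal weights
print(pf_mo_choice(p1, m1, p2, m2, 0.9))  # additive rule, heavy magnitude weight
```

      With these hypothetical values, the additive rule agrees with the EU rule only when the magnitude weight is large (here above roughly 0.83), which is the qualitative pattern described above.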

      In the dataset reported by Gagne et al. (2020), the described scenario seems to occur more often in the aversive context than in the reward context. To accurately capture human behavior, the FLR22 model requires a significantly larger weight for magnitude in the aversive context than in the reward context. Interestingly, when the weights for magnitude in different contexts are forced to be equal, the model (FLR6) fails, exhibiting almost chance-level performance throughout learning (Fig. 3E, G). In contrast, the MOS6 model, and even the RS3 model, perform well using one identical set of parameters across contexts. Both MOS6 and RS3 include the EU strategy during decision-making. These findings suggest that humans make decisions using the EU strategy rather than PF+MO.

      The focus of our paper is to show that a good-enough model can interpret the same dataset from a completely different perspective, not necessarily to explore improvements to the FLR model.

      Point 3.3

      One of the paper's potentially most noteworthy findings (Figure 5) is that when the FLR model is fit to synthetic data generated by the expected utility (EU) controller with a fixed learning rate, it recovers a spurious difference in learning rate between the volatile and stable environments. Although this is potentially a significant finding, its interpretation seems uncertain for several reasons: 

      - According to the relevant methods text, the result is based on a simulation of only 5 task blocks for each strategy. It would be better to repeat the simulation and recovery multiple times so that a confidence interval or error bar can be estimated and added to the figure. 

      - It makes sense that learning rates recovered for the magnitude-oriented (MO) strategy are near zero, since behavior simulated by that strategy would have no reason to show any evidence of learning. But this makes it perplexing why the MO learning rate in the volatile condition is slightly positive and slightly greater than in the stable condition. 

      - The pure-EU and pure-MO strategies are interpreted as being analogous to the healthy control group and the patient group, respectively. However, the actual difference in estimated EU/MO weighting between the two participant groups was much more moderate. It's unclear whether the same result would be obtained for a more empirically plausible difference in EU/MO weighting. 

      - The fits of the FLR model to the simulated data "controlled all parameters except for the learning rate parameters across the two strategies" (line 522). If this means that no parameters except learning rate were allowed to differ between the fits to the pure-EU and pure-MO synthetic data sets, the models would have been prevented from fitting the difference in terms of the relative weighting of probability and magnitude, which better corresponds to the true difference between the two strategies. This could have interfered with the estimation of other parameters, such as learning rate. 

      - If, after addressing all of the above, the FLR model really does recover a spurious difference in learning rate between stable and volatile blocks, it would be worth more examination of why this is happening. For example, is it because there are more opportunities to observe learning in those blocks?

      I would recommend performing a version of the Figure 5 simulations using two sets of MOS-model parameters that are identical except that they use healthy-control-like and patient-like values of the EU and MO weights (similar to the parameters described on lines 346-353, though perhaps with the habit controller weight equated). Then fit the simulated data with the FLR model, with learning rate and other parameters free to differ between groups. The result would be informative as to (1) whether the FLR model still misidentifies between-group strategy differences as learning rate differences, and (2) whether the FLR model still identifies spurious learning rate differences between stable and volatile conditions in the control-like group, which become attenuated in the patient-like group. 

      Many thanks for this great advice. Following your suggestions, we now conduct simulations using the medians of the fitted parameters. The representative parameter sets for healthy controls and patients are identical except for the three preference parameters; the habit weights are not constrained to be equal. We ran 20 simulations for each representative parameter set, each comprising 4 task sequences sampled from the behavioral data. This allowed us to create error bars and perform statistical tests. We found that the differences in learning rates between stable and volatile conditions, as well as the learning rate adaptation differences between healthy controls and patients, still persisted.

      Combined with the discussion in Point 3.2, we explain why a mixture of strategies can account for learning rate adaptation as follows. Due to (unknown) differences in task sequences, the MOS6 model exhibits more MO-like behavior because of its use of the EU strategy. To capture this behavior pattern, the FLR22 model has to increase its weighting parameter 1-λ for magnitude, which could ultimately drive FLR22 to adjust the fitted learning rate parameters, exhibiting a learning rate adaptation effect. Our simulations suggest that estimating learning rates by model fitting alone may not be the only way to interpret the data.

      Action: We included the simulation details in the method section (lines 381-391).

      “In one simulated experiment, we sampled the four task sequences from the real data. We simulated 20 experiments with the parameters of to mimic the behavior of the healthy control participants. The first three are the medians of the fitted parameters across all participants; the latter three were chosen to approximate the strategy preferences of real healthy control participants (Figure 4A). Similarly, we also simulated 20 experiments for the patient group with the identical values of , and , but different strategy preferences   . In other words, the only difference in the parameters of the two groups is the switched and . We then fitted the FLR22 to the behavioral data generated by the MOS6 and examined the learning rate differences across groups and volatile contexts (Fig. 6).”

      Point 3.4

      Figure 4C shows that the habit-only strategy is able to learn and adapt to changing contingencies, and some of the interpretive discussion emphasizes this. (For instance, line 651 says the habit strategy brings more rewards than the MO strategy.) However, the habit strategy doesn't seem to have any mechanism for learning from outcome feedback. It seems unlikely it would perform better than chance if it were the sole driver of behavior. Is it succeeding in this example because it is learning from previous decisions made by the EU strategy, or perhaps from decisions in the empirical data?

      Yes, intuitively the HA strategy seems to have no learning mechanism. In practice, however, it yields a higher hit rate than MO simply by learning from previous decisions made by the EU strategy. We ran simulations to confirm this (Figure 4B).

      Point 3.5

      For the model recovery analysis (line 567), the stated purpose is to rule out the possibility that the MOS model always wins (line 552), but the only result presented is one in which the MOS model wins. To assess whether the MOS and FLR models can be differentiated, it seems necessary also to show model recovery results for synthetic data generated by the FLR model. 

      We conducted a model recovery analysis that includes all models, and it demonstrates that MOS and FLR can be fully differentiated. The results of the new model recovery analysis are shown in Fig. 7.

      Point 3.6

      To the best of my understanding, the MOS model seems to implement valence-specific learning rates in a qualitatively different way from how they were implemented in Gagne et al., 2020, and other previous literature. Line 246 says there were separate learning rates for upward and downward updates to the outcome probability. That's different from using two learning rates for "better"- and "worse"-than-expected outcomes, which will depend on both the direction of the update and the valence of the outcome (reward or shock). Might this relate to why no evidence for valence-specific learning rates was found even though the original authors found such evidence in the same data set? 

      Thanks. Following the suggestion, we have corrected our implementation of valence-specific learning rate in all models (see lines 261-268).

      “To remain consistent with Gagne et al. (2020), we also explored the valence-specific learning rate,

      is the learning rate for better-than-expected outcome, and for worse-than-expected outcome. It is important to note that Eq. 6 was only applied to the reward context, and the definitions of “better-than-expected” and “worse-than-expected” should change accordingly in the aversive context, where we defined for and for .

      No main effect of valence on learning rate was found (see Supplemental Information Note 3).
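      To make the distinction raised by the reviewer concrete, the following is a minimal sketch of a valence-specific update, written under our own assumptions rather than copied from the manuscript: `p` is the estimated feedback probability, `outcome` is 1 if the feedback occurred and 0 otherwise, and the names `alpha_better` and `alpha_worse` are hypothetical. The point it illustrates is that "better than expected" depends on both the sign of the prediction error and the feedback context.

```python
# Hypothetical sketch of a valence-specific learning rate (not the study code).

def valence_specific_update(p, outcome, alpha_better, alpha_worse, context):
    """Update the estimated feedback probability p after one trial.

    context is "reward" or "aversive". In the reward context, more feedback
    than expected (positive prediction error) is better than expected; in the
    aversive context, more feedback (a shock) is worse than expected.
    """
    delta = outcome - p  # prediction error on feedback occurrence
    if context == "reward":
        better = delta > 0
    elif context == "aversive":
        better = delta < 0
    else:
        raise ValueError("context must be 'reward' or 'aversive'")
    alpha = alpha_better if better else alpha_worse
    return p + alpha * delta

# Same outcome and prior estimate, different learning rate by context.
print(valence_specific_update(0.5, 1, 0.4, 0.1, "reward"))
print(valence_specific_update(0.5, 1, 0.4, 0.1, "aversive"))
```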

      Point 3.7

      The discussion (line 649) foregrounds the finding of greater "magnitude-only" weights with greater "general factor" psychopathology scores, concluding it reflects a shift toward simplifying heuristics. However, the picture might not be so straightforward because "habit" weights, which also reflect a simplifying heuristic, correlated negatively with the psychopathology scores. 

      Thanks. In contrast to the detrimental effects of “MO”, “habit” is actually beneficial for the task. Please refer to Point 1.12.

      Point 3.8

      The discussion section contains some pejorative-sounding comments about Gagne et al. 2020 that lack clear justification. Line 611 says that the study "did not attempt to connect the decision process to anxiety and depression traits." Given that linking model-derived learning rate estimates to psychopathology scores was a major topic of the study, this broad statement seems incorrect. If the intent is to describe a more specific step that was not undertaken in that paper, please clarify. Likewise, I don't understand the justification for the statement on line 615 that the model from that paper "is not understandable" - please use more precise and neutral language to describe the model's perceived shortcomings. 

      Sorry for the confusion. We have removed all abovementioned pejorative-sounding comments.

      Point 3.9

      4. Minor suggestions: 

      - Line 114 says people with psychiatric illness "are known to have shrunk cognitive resources" - this phrasing comes across as somewhat loaded. 

      Thanks. We have removed this argument.

      - Line 225, I don't think the reference to "hot hand bias" is correct. I understand hot hand bias to mean overestimating the probability of success after past successes. That's not the same thing as habitual repetition of previous responses, which is what's being discussed here. 

      Response: Thanks for mentioning this. We have removed all discussions about “hot hand bias”.

      - There may be some notational inconsistency if alpha_pi on line 248 and alpha_HA on line 253 are referring to the same thing. 

      Thanks! Fixed!

      - Check the notation on line 285 - there may be some interchanging of decimals and commas.

      Thanks! Fixed!

      Also, would the interpretation in terms of risk seeking and risk aversion be different for rewarding versus aversive outcomes? 

      Thanks for asking. If we understand correctly, risk-seeking and risk-aversion mechanisms are only present in the RS models, which show clearly worse fitting performance. We therefore chose not to over-interpret the fitted parameters of the RS models.

      - Line 501, "HA and PAT groups" looks like a typo. 

      - In Figure 5, better graphical labeling of the panels and axes would be helpful. 

      Response: Thanks! Fixed!

      REFERENCES

      Daw, N. D., Gershman, S. J., Seymour, B., Dayan, P., & Dolan, R. J. (2011). Model-based influences on humans' choices and striatal prediction errors. Neuron, 69(6), 1204-1215.

      Gagne, C., Zika, O., Dayan, P., & Bishop, S. J. (2020). Impaired adaptation of learning to contingency volatility in internalizing psychopathology. eLife, 9.

      Gershman, S. J. (2020). Origin of perseveration in the trade-off between reward and complexity. Cognition, 204, 104394.

      Gershman, S. J., Horvitz, E. J., & Tenenbaum, J. B. (2015). Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349(6245), 273-278.

      Von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior, 2nd rev.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Major comments (Public Reviews)

      Generality of grid cells

      We appreciate the reviewers’ concern regarding the generality of our approach, and in particular for analogies in nonlinear spaces. In that regard, there are at least two potential directions that could be pursued. One is to directly encode nonlinear structures (such as trees, rings, etc.) with grid cells, to which DPP-A could be applied as described in our model. The TEM model [1] suggests that grid cells in the medial entorhinal cortex may form a basis set that captures structural knowledge for such nonlinear spaces, such as social hierarchies and transitive inference when formalized as a connected graph. Another would be to use eigen-decomposition of the successor representation [2], a learnable predictive representation of possible future states that has been shown by Stachenfeld et al. [3] to provide an abstract structured representation of a space that is analogous to the grid cell code. This general-purpose mechanism could be applied to represent analogies in nonlinear spaces [4], for which there may not be a clear factorization in terms of grid cells (i.e., distinct frequencies and multiple phases within each frequency). Since the DPP-A mechanism, as we have described it, requires representations to be factored in this way, it would need to be modified for such purposes. Either of these approaches, if successful, would allow our model to be extended to domains containing nonlinear forms of structure. To the extent that different coding schemes (i.e., basis sets) are needed for different forms of structure, the question of how these are identified and engaged for use in a given setting is clearly an important one that is not addressed by the current work. We imagine that this is likely subserved by monitoring and selection mechanisms proposed to underlie the capacity for selective attention and cognitive control [5], though the specific computational mechanisms that underlie this function remain an important direction for future research.
We have added a discussion of these issues in Section 6 of the updated manuscript.

      (1) Whittington, J.C., Muller, T.H., Mark, S., Chen, G., Barry, C., Burgess, N. and Behrens, T.E., 2020. The Tolman-Eichenbaum machine: unifying space and relational memory through generalization in the hippocampal formation. Cell, 183(5), pp.1249-1263.

      (2) Dayan, P., 1993. Improving generalization for temporal difference learning: The successor representation. Neural computation, 5(4), pp.613-624.

      (3) Stachenfeld, K.L., Botvinick, M.M. and Gershman, S.J., 2017. The hippocampus as a predictive map. Nature neuroscience, 20(11), pp.1643-1653.

      (4) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      (5) Shenhav, A., Botvinick, M.M. and Cohen, J.D., 2013. The expected value of control: an integrative theory of anterior cingulate cortex function. Neuron, 79(2), pp.217-240.

      Biological plausibility of DPP-A

      We appreciate the reviewers’ interest in the biological plausibility of our model, and in particular the question of whether and how DPP-A might be implemented in a neural network. In that regard, Bozkurt et al. [1] recently proposed a biologically plausible neural network algorithm using a weighted similarity matrix approach to implement a determinant maximization criterion, which is the core idea underlying the objective function we use for DPP-A, suggesting that the DPP-A mechanism we describe may also be biologically plausible. This could be tested experimentally by exposing individuals (e.g., rodents or humans) to a task that requires consistent exposure to a subregion, and evaluating the distribution of activity over the grid cells. Our model predicts that high frequency grid cells should increase their firing rate more than low frequency cells, since the high frequency grid cells maximize the determinant of the covariance matrix of the grid cell embeddings. It is also worth noting that Frankland et al. [2] have suggested that the use of DPPs may also help explain a mutual exclusivity bias observed in human word learning and reasoning. While this is not direct evidence of biological plausibility, it is consistent with the idea that the human brain selects representations for processing that maximize the volume of the representational space, which can be achieved by maximizing the DPP-A objective function defined in Equation 6. We have added a comment to this effect in Section 6 of the updated manuscript.

      (1) Bozkurt, B., Pehlevan, C. and Erdogan, A., 2022. Biologically-plausible determinant maximization neural networks for blind separation of correlated sources. Advances in Neural Information Processing Systems, 35, pp.13704-13717.

      (2) Frankland, S. and Cohen, J., 2020. Determinantal Point Processes for Memory and Structured Inference. In CogSci.

      Simplicity of analogical problem and comparison to other models using this task

      First, we would like to point out that analogical reasoning is a signature feature of human cognition, which supports flexible and efficient adaptation to novel inputs that remains a challenge for most current neural network architectures. While humans can exhibit complex and sophisticated forms of analogical reasoning [1, 2, 3], here we focused on a relatively simple form that was inspired by Rumelhart’s parallelogram model of analogy [4,5], which has been used to explain traditional human verbal analogies (e.g., “king is to what as man is to woman?”). Our model, like that one, seeks to explain analogical reasoning in terms of the computation of simple Euclidean distances (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript. It is worth noting that, despite the seeming simplicity of this construction, we show that standard neural network architectures (e.g., LSTMs and transformers) struggle to generalize on such tasks without the use of the DPP-A mechanism.
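      The parallelogram relation A - B = C - D can be sketched directly. The snippet below is only an illustration of the geometric constraint, not the trained model: it solves for D = C - A + B and picks the nearest candidate by Euclidean distance. The function name `solve_analogy` and the example coordinates are hypothetical.

```python
import math

# Hypothetical sketch of the parallelogram model of analogy in 2D.

def solve_analogy(a, b, c, candidates):
    """Return the candidate D that best satisfies A - B = C - D.

    Rearranging gives the ideal answer D* = C - (A - B); the chosen answer is
    the candidate with the smallest Euclidean distance to D*.
    """
    target = (c[0] - (a[0] - b[0]), c[1] - (a[1] - b[1]))
    return min(candidates, key=lambda d: math.dist(d, target))

a, b, c = (1, 1), (3, 2), (4, 4)
choices = [(6, 5), (5, 6), (2, 3)]
print(solve_analogy(a, b, c, choices))
```

      Because the rule depends only on differences between vectors, translating all four points by the same offset leaves the correct answer unchanged, which is the kind of out-of-distribution regularity the analogy task probes.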

      Second, we are not aware of any previous work other than Frankland et al. [6] cited in the first paragraph of Section 2.2.1, that has examined the capacity of neural network architectures to perform even this simple form of analogy. The models in that study were hardcoded to perform analogical reasoning, whereas we trained models to learn to perform analogies. That said, clearly a useful line of future work would be to scale our model further to deal with more complex forms of representation and analogical reasoning tasks [1,2,3]. We have noted this in Section 6 of the updated manuscript.

      (1) Holyoak, K.J., 2012. Analogy and relational reasoning. The Oxford handbook of thinking and reasoning, pp.234-259.

      (2) Webb, T., Fu, S., Bihl, T., Holyoak, K.J. and Lu, H., 2023. Zero-shot visual reasoning through probabilistic analogical mapping. Nature Communications, 14(1), p.5144.

      (3) Lu, H., Ichien, N. and Holyoak, K.J., 2022. Probabilistic analogical mapping with semantic relation networks. Psychological review.

      (4) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (5) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

      (6) Frankland, S., Webb, T.W., Petrov, A.A., O'Reilly, R.C. and Cohen, J., 2019. Extracting and Utilizing Abstract, Structured Representations for Analogy. In CogSci (pp. 1766-1772).

      Clarification of DPP-A attentional modulation

      We would like to clarify several concerns regarding the DPP-A attentional modulation. First, we would like to make it clear that ω is not meant to correspond to synaptic weights, and thank the reviewer for noting the possibility for confusion on this point. It is also distinct from a biasing input, which is often added to the product of the input features and weights. Rather, in our model ω is a vector, and diag(ω) converts it into a matrix with ω on the diagonal and zeros elsewhere. In Equation 6, diag(ω) is matrix multiplied with the covariance matrix V, which results in elementwise multiplication of ω with the column vectors of V, and hence acts more like a set of gates. We have noted this in Section 2.2.2 and have changed all instances of “weights (ω)” to “gates (ɡ)” in the updated manuscript. We have also rewritten the definition of Equation 6 and uses of it (as in Algorithm 1) to depict the use of sigmoid nonlinearity (σ) to , so that the resulting values are always between 0 and 1.
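      The gating interpretation can be checked on a toy example. The sketch below is an assumption-laden illustration, not Equation 6 itself: it applies a sigmoid to a vector of unconstrained values, then forms diag(g) V, which scales row i of V by g_i (equivalently, multiplies every column of V elementwise by g). The names `sigmoid` and `gate_matrix` and the toy matrix are ours.

```python
import math

# Hypothetical sketch: diag(g) @ V acts as a set of multiplicative gates.

def sigmoid(x):
    """Logistic function; maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gate_matrix(omega, V):
    """Return diag(sigmoid(omega)) @ V for a square matrix V (list of rows).

    Entry (i, j) of the result is g_i * V[i][j], so each column of V is
    multiplied elementwise by the gate vector g.
    """
    g = [sigmoid(w) for w in omega]
    return [[g[i] * V[i][j] for j in range(len(V[i]))] for i in range(len(V))]

V = [[2.0, 1.0],
     [1.0, 4.0]]
print(gate_matrix([0.0, 0.0], V))   # both gates at 0.5
print(gate_matrix([10.0, -10.0], V))  # first gate nearly open, second nearly closed
```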

      Second, we would like to clarify that we don’t compute the inner product between the gates ɡ and the grid cell embeddings x anywhere in our model. The gates within each frequency were optimized (independent of the task inputs), according to Equation 6, to compute the approximate maximum log determinant of the covariance matrix over the grid cell embeddings individually for each frequency. We then used the grid cell embeddings belonging to the frequency that had the maximum within-frequency log determinant for training the inference module, which always happened to be grid cells within the top three frequencies. Author response image 1 (also added to the Appendix, Section 7.10 of the updated manuscript) shows the approximate maximum log determinant (on the y-axis) for the different frequencies (on the x-axis).

      Author response image 1.

      Approximate maximum log determinant of the covariance matrix over the grid cell embeddings (y-axis) for each frequency (x-axis), obtained after maximizing Equation 6.

      Third, we would like to clarify our interpretation of why DPP-A identified grid cell embeddings corresponding to the highest spatial frequencies, and why this produced the best OOD generalization (i.e., extrapolation on our analogy tasks). It is because those grid cell embeddings exhibited greater variance over the training data than the lower frequency embeddings, while at the same time the correlations among those grid cell embeddings were lower than the correlations among the lower frequency grid cell embeddings. The determinant of the covariance matrix of the grid cell embeddings is maximized when the variances of the grid cell embeddings are high (they are “expressive”) and the correlation among the grid cell embeddings is low (they “cover the representational space”). As a result, the higher frequency grid cell embeddings more efficiently covered the representational space of the training data, allowing them to efficiently capture the same relational structure across training and test distributions which is required for OOD generalization. We have added some clarification to the second paragraph of Section 2.2.2 in the updated manuscript. Furthermore, to illustrate this graphically, Author response image 2 (added to the Appendix, Section 7.10 of the updated manuscript) shows the results after the summation of the multiplication of the grid cell embeddings over the 2d space of 1000x1000 locations, with their corresponding gates for 3 representative frequencies (left, middle and right panels showing results for the lowest, middle and highest grid cell frequencies, respectively, of the 9 used in the model), obtained after maximizing Equation 6 for each grid cell frequency. The color code indicates the responsiveness of the grid cells to different X and Y locations in the input space (lighter color corresponding to greater responsiveness). 
Note that the dark blue area (denoting regions of least responsiveness to any grid cell) is greatest for the lowest frequency and nearly zero for the highest frequency, illustrating that grid cell embeddings belonging to the highest frequency more efficiently cover the representational space which allows them to capture the same relational structure across training and test distributions as required for OOD generalization.
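      The relationship between variance, correlation, and the determinant can be verified in the simplest case of two embeddings. For a 2x2 covariance matrix with variances s1 and s2 and correlation r, the determinant is s1 * s2 * (1 - r^2), so it grows with the variances and shrinks as the correlation increases. The following sketch uses hypothetical numbers and a function name of our own; it is not the optimization used in the model.

```python
import math

# Hypothetical sketch: log determinant of a 2x2 covariance matrix.

def log_det_cov_2d(var1, var2, corr):
    """Log determinant of [[var1, cov], [cov, var2]] with cov = corr * sqrt(var1 * var2)."""
    cov = corr * math.sqrt(var1 * var2)
    det = var1 * var2 - cov ** 2  # equals var1 * var2 * (1 - corr**2)
    return math.log(det)

# Same variances, low vs. high correlation.
print(log_det_cov_2d(1.0, 1.0, 0.1))
print(log_det_cov_2d(1.0, 1.0, 0.9))
# Same correlation, higher variances.
print(log_det_cov_2d(4.0, 4.0, 0.1))
```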

      Author response image 2.

      Each panel shows the results after summation of the multiplication of the grid cell embeddings over the 2d space of 1000x1000 locations, with their corresponding gates for a particular frequency, obtained after maximizing Equation 6 for each grid cell frequency. The left, middle, and right panels show results for the lowest, middle, and highest grid cell frequencies, respectively, of the 9 used in the model. Lighter color in each panel corresponds to greater responsiveness of grid cells at that particular location in the 2d space.

      Finally, we would like to clarify how the DPP-A attentional mechanism is different from the attentional mechanism in the transformer module, and why both are needed for strong OOD generalization. Use of the standard self-attention mechanism in transformers over the inputs (i.e., A, B, C, and D for the analogy task) in place of DPP-A would lead to weightings of grid cell embeddings over all frequencies and phases. The objective function for the DPP-A represents an inductive bias, that selectively assigns the greatest weight to all grid cell embeddings (i.e., for all phases) of the frequency for which the determinant of the covariance matrix is greatest computed over the training space. The transformer inference module then attends over the inputs with the selected grid cell embeddings based on the DPP-A objective. We have added a discussion of this point in Section 6 of the updated manuscript.

      We would like to thank the reviewers for their recommendations. We have tried our best to incorporate them into our updated manuscript. Below we provide a detailed response to each of the recommendations grouped for each reviewer.

      Reviewer #1 (Recommendations for the authors)

      (1) It would be helpful to see some equations for R in the main text.

      We thank the reviewer for this suggestion. We have now added some equations explaining the working of R in Section 2.2.3 of the updated manuscript.

      (2) Typo: p 11 'alongwith' -> 'along with'

      We have changed all instances of ‘alongwith’ to ‘along with’ in the updated manuscript.

      (3) Presumably, this is related to equivariant ML - it would be helpful to comment on this.

      Yes, this is related to equivariant ML, since the properties of equivariance hold for our model. Specifically, the probability distribution after applying softmax remains the same when the transformation (translation or scaling) is applied to the scores for each of the answer choices obtained from the output of the inference module, and when the same transformation is applied to the stimuli for the task and all the answer choices before presenting as input to the inference module to obtain the scores. We have commented on this in Section 2.2.3 of the updated manuscript.

      Reviewer #2 (Recommendations for the authors)

      (1) Page 2 - "Webb et al." temporal context - they should also cite and compare this to work by Marc Howard on generalization based on multi-scale temporal context.

      While we appreciate the important contributions that have been made by Marc Howard and his colleagues to temporal coding and its role in episodic memory and hippocampal function, we would like to clarify that his temporal context model is unrelated to the temporal context normalization developed by Webb et al. (2020) and mentioned on Page 2. The former (Temporal Context Model) is a computational model that proposes a role for temporal coding in the functions of the medial temporal lobe in support of episodic recall and spatial navigation. The latter (temporal context normalization) is a normalization procedure proposed for use in training a neural network, similar to batch normalization [1], in which tensor normalization is applied over the temporal instead of the batch dimension, which is shown to help with OOD generalization. We apologize for any confusion engendered by the similarity of these terms, and for our failure to clarify the difference between them, which we have now attempted to do in a footnote on Page 2.

      (1) Ioffe, S. and Szegedy, C., 2015, June. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456). PMLR.
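      To make the distinction from batch normalization concrete, the following is a minimal sketch, not the implementation of Webb et al. (2020). It assumes inputs of shape (batch, time, features), omits any learnable gain and shift, and uses the hypothetical function names `temporal_context_norm` and `batch_norm`. It illustrates that normalizing over the temporal axis leaves a sequence's representation unchanged when that sequence alone is shifted and rescaled, whereas normalizing over the batch axis does not.

```python
import numpy as np

def temporal_context_norm(x, eps=1e-8):
    """Z-score each feature over the time axis of a (batch, time, features) array."""
    mean = x.mean(axis=1, keepdims=True)  # statistics per sequence, over time
    std = x.std(axis=1, keepdims=True)
    return (x - mean) / (std + eps)

def batch_norm(x, eps=1e-8):
    """Z-score each feature over the batch axis, for comparison."""
    mean = x.mean(axis=0, keepdims=True)  # statistics shared across sequences
    std = x.std(axis=0, keepdims=True)
    return (x - mean) / (std + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 3))        # 4 sequences, 6 time steps, 3 features
shifted = x.copy()
shifted[0] = 3.0 * x[0] + 5.0         # move one sequence to a different range

# Temporal normalization is unaffected by the shift; batch normalization is not.
tcn_same = np.allclose(temporal_context_norm(x), temporal_context_norm(shifted), atol=1e-6)
bn_same = np.allclose(batch_norm(x), batch_norm(shifted), atol=1e-6)
```

      Here `tcn_same` is true and `bn_same` is false, which is the property that makes this kind of normalization relevant to out-of-distribution inputs.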

      (2) page 3 - "known to be implemented in entorhinal" - It's odd that they seem to avoid citing the actual biology papers on grid cells. They should cite more of the grid cell recording papers when they mention the entorhinal cortex (i.e. Hafting et al., 2005; Barry et al., 2007; Stensola et al., 2012; Giocomo et al., 2011; Brandon et al., 2011).

      We have now cited the references listed below on page 3, after the phrase “known to be implemented in entorhinal cortex”.

      (1) Barry, C., Hayman, R., Burgess, N. and Jeffery, K.J., 2007. Experience-dependent rescaling of entorhinal grids. Nature neuroscience, 10(6), pp.682-684.

      (2) Stensola, H., Stensola, T., Solstad, T., Frøland, K., Moser, M.B. and Moser, E.I., 2012. The entorhinal grid map is discretized. Nature, 492(7427), pp.72-78.

      (3) Giocomo, L.M., Hussaini, S.A., Zheng, F., Kandel, E.R., Moser, M.B. and Moser, E.I., 2011. Grid cells use HCN1 channels for spatial scaling. Cell, 147(5), pp.1159-1170.

      (4) Brandon, M.P., Bogaard, A.R., Libby, C.P., Connerney, M.A., Gupta, K. and Hasselmo, M.E., 2011. Reduction of theta rhythm dissociates grid cell spatial periodicity from directional tuning. Science, 332(6029), pp.595-599.

      (3) To enhance the connection to biological systems, they should cite more of the experimental and modeling work on grid cell coding (for example on page 2 where they mention relational coding by grid cells). Currently, they tend to cite studies of grid cell relational representations that are very indirect in their relationship to grid cell recordings (i.e. indirect fMRI measures by Constantinescu et al., 2016 or the very abstract models by Whittington et al., 2020). They should cite more papers on actual neurophysiological recordings of grid cells that suggest relational/metric representations, and they should cite more of the previous modeling papers that have addressed relational representations. This could include work on using grid cell relational coding to guide spatial behavior (e.g. Erdem and Hasselmo, 2014; Bush, Barry, Manson, Burgess, 2015). This could also include other papers on the grid cell code beyond the paper by Wei et al., 2015 - they could also cite work on the efficiency of coding by Sreenivasan and Fiete and by Mathis, Herz, and Stemmler.

      We thank the reviewer for bringing the additional references to our attention. We have cited the references mentioned below on page 2 of the updated manuscript.

      (1) Erdem, U.M. and Hasselmo, M.E., 2014. A biologically inspired hierarchical goal directed navigation model. Journal of Physiology-Paris, 108(1), pp.28-37.

      (2) Sreenivasan, S. and Fiete, I., 2011. Grid cells generate an analog error-correcting code for singularly precise neural computation. Nature neuroscience, 14(10), pp.1330-1337.

      (3) Mathis, A., Herz, A.V. and Stemmler, M., 2012. Optimal population codes for space: grid cells outperform place cells. Neural computation, 24(9), pp.2280-2317.

      (4) Bush, D., Barry, C., Manson, D. and Burgess, N., 2015. Using grid cells for navigation. Neuron, 87(3), pp.507-520.

      (4) Page 3 - "Determinantal Point Processes (DPPs)" - it is rather annoying that DPP is defined after DPP-A is defined. There ought to be a spot where the definition of DPP-A is clearly stated in a single location.

      We agree it makes more sense to define Determinantal Point Process (DPP) before DPP-A. We have now rephrased the sentences accordingly. In the “Abstract”, the sentence now reads “Second, we propose an attentional mechanism that operates over the grid cell code using Determinantal Point Process (DPP), which we call DPP attention (DPP-A) - a transformation that ensures maximum sparseness in the coverage of that space.” We have also modified the second paragraph of the “Introduction”. The modified portion now reads “b) an attentional objective inspired from Determinantal Point Processes (DPPs), which are probabilistic models of repulsion arising in quantum physics [1], to attend to abstract representations that have maximum variance and minimum correlation among them, over the training data. We refer to this as DPP attention or DPP-A.” Due to this change, we removed the last sentence of the fifth paragraph of the “Introduction”.

      (1) Macchi, O., 1975. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7(1), pp.83-122.

      (5) Page 3 - "the inference module R" - there should be some discussion about how this component using LSTM or transformers could relate to the function of actual brain regions interacting with entorhinal cortex. Or if there is no biological connection, they should state that this is not seen as a biological model and that only the grid cell code is considered biological.

      While we agree that the model is not specific about the implementation of the R module, we assume that, as a standard deep learning component, it is likely to map onto neocortical structures that interact with the entorhinal cortex and, in particular, regions of the prefrontal-posterior parietal network widely believed to be involved in abstract relational processes [1,2,3,4]. In particular, the role of the prefrontal cortex in the encoding and active maintenance of abstract information needed for task performance (such as rules and relations) has often been modeled using gated recurrent networks, such as LSTMs [5,6], and the posterior parietal cortex has long been known to support “maps” that may provide an important substrate for computing complex relations [4]. We have added some discussion about this in Section 2.2.3 of the updated manuscript.

      (1) Waltz, J.A., Knowlton, B.J., Holyoak, K.J., Boone, K.B., Mishkin, F.S., de Menezes Santos, M., Thomas, C.R. and Miller, B.L., 1999. A system for relational reasoning in human prefrontal cortex. Psychological science, 10(2), pp.119-125.

      (2) Christoff, K., Prabhakaran, V., Dorfman, J., Zhao, Z., Kroger, J.K., Holyoak, K.J. and Gabrieli, J.D., 2001. Rostrolateral prefrontal cortex involvement in relational integration during reasoning. Neuroimage, 14(5), pp.1136-1149.

      (3) Knowlton, B.J., Morrison, R.G., Hummel, J.E. and Holyoak, K.J., 2012. A neurocomputational system for relational reasoning. Trends in cognitive sciences, 16(7), pp.373-381.

      (4) Summerfield, C., Luyckx, F. and Sheahan, H., 2020. Structure learning and the posterior parietal cortex. Progress in neurobiology, 184, p.101717.

      (5) Frank, M.J., Loughry, B. and O’Reilly, R.C., 2001. Interactions between frontal cortex and basal ganglia in working memory: a computational model. Cognitive, Affective, & Behavioral Neuroscience, 1, pp.137-160.

      (6) Braver, T.S. and Cohen, J.D., 2000. On the control of control: The role of dopamine in regulating prefrontal function and working memory. Control of cognitive processes: Attention and performance XVIII, (2000).

      (6) Page 4 - "Learned weighting w" - it is somewhat confusing to use "w" as that is commonly used for synaptic weights, whereas I understand this to be an attentional modulation vector with the same dimensionality as the grid cell code. It seems more similar to a neural network bias input than a weight matrix.

      We refer to the first paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (7) Page 4 - "parameterization of w... by two loss functions over the training set." - I realize that this has been stated here, but to emphasize the significance to a naïve reader, I think they should emphasize that the learning is entirely focused on the initial training space, and there is NO training done in the test spaces. It's very impressive that the parameterization is allowing generalization to translated or scaled spaces without requiring ANY training on the translated or scaled spaces.

      We have added the sentence “Note that learning of parameter occurs only over the training space and is not further modified during testing (i.e. over the test spaces)” to the updated manuscript.

      (8) Page 4 - "The first," - This should be specific - "The first loss function"

      We have changed it to “The first loss function” in the updated manuscript.

      (9) Page 4 - The analogy task seems rather simplistic when first presented (i.e. just a spatial translation to different parts of a space, which has already been shown to work in simulations of spatial behavior such as Erdem and Hasselmo, 2014 or Bush, Barry, Manson, Burgess, 2015). To make the connection to analogy, they might provide a brief mention of how this relates to the analogy space created by word2vec applied to traditional human verbal analogies (i.e. king-man+woman=queen).

      We agree that the analogy task is simple, and recognize that grid cells can be used to navigate to different parts of space over which the test analogies are defined when those are explicitly specified, as shown by Erdem and Hasselmo (2014) and Bush, Barry, Manson, and Burgess (2015). However, for the analogy task, the appropriate set of grid cell embeddings must be identified that capture the same relational structure between training and test analogies to demonstrate strong OOD generalization, and that is achieved by the attentional mechanism DPP-A. As suggested by the reviewer’s comment, our analogy task is inspired by Rumelhart’s parallelogram model of analogy [1,2] (and is therefore similar to traditional human verbal analogies) inasmuch as it involves differences (i.e., A - B = C - D, where A, B, C, D are vectors in 2D space). We have now noted this in Section 2.1.1 of the updated manuscript.

      (1) Rumelhart, D.E. and Abrahamson, A.A., 1973. A model for analogical reasoning. Cognitive Psychology, 5(1), pp.1-28.

      (2) Mikolov, T., Chen, K., Corrado, G. and Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
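      The parallelogram relation A - B = C - D can be illustrated with a short sketch. The function names `is_parallelogram_analogy` and `complete_analogy` and the example points are hypothetical and are not taken from the manuscript; the sketch only checks the vector-difference relation stated above.

```python
import numpy as np

def is_parallelogram_analogy(a, b, c, d):
    """Return True when A - B equals C - D for 2D integer points."""
    a, b, c, d = (np.asarray(p) for p in (a, b, c, d))
    return bool(np.array_equal(a - b, c - d))

def complete_analogy(a, b, c):
    """Solve A - B = C - D for D, i.e. D = C - (A - B)."""
    a, b, c = (np.asarray(p) for p in (a, b, c))
    return c - (a - b)

# Illustrative points only: A is to B as C is to D.
A, B, C = (10, 20), (13, 24), (50, 60)
D = complete_analogy(A, B, C)
valid = is_parallelogram_analogy(A, B, C, D)
```

      The same arithmetic underlies the familiar word-vector example (king - man + woman ≈ queen), except that here the vectors are points in a 2D integer space.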

      (10) Page 5 - The variable "KM" is a bit confusing when it first appears. It would be good to re-iterate that K and M are separate points and KM is the vector between these points.

      We apologize for the confusion on this point. KM is meant to refer to an integer value, obtained by multiplying K and M, which is added to both dimensions of A, B, C and D, which are points in ℤ², to translate them to a different region of the space. K is an integer value ranging from 1 to 9 and M is also an integer value denoting the size of the training region, which in our implementation is 100. We have clarified this in Section 2.1.1 of the updated manuscript.
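      As a minimal sketch of this translation (the helper name `translate_analogy` and the example points are assumptions made for illustration), adding the scalar K*M to both coordinates of every point moves an analogy out of the training region while leaving the differences between its points unchanged:

```python
import numpy as np

M = 100  # size of the training region, as stated in the response

def translate_analogy(points, k, m=M):
    """Add the integer k*m to both dimensions of each point (rows of `points`)."""
    return np.asarray(points) + k * m

# Illustrative analogy inside the training region [0, M): rows are A, B, C, D.
train = np.array([[10, 20], [13, 24], [50, 60], [53, 64]])

for k in range(1, 10):  # K ranges from 1 to 9
    test = translate_analogy(train, k)
    # The relation A - B = C - D survives the translation.
    assert np.array_equal(test[0] - test[1], test[2] - test[3])
    # Every translated point lies in the region [k*M, (k+1)*M).
    assert test.min() >= k * M and test.max() < (k + 1) * M

far_region = translate_analogy(train, 9)
```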

      (11) Page 5 - "two continuous dimensions (Constantinescu et al.)" - this ought to give credit to the original study showing the abstract six-fold rotational symmetry for spatial coding (Doeller, Barry and Burgess).

      We have now cited the original work by Doeller et al. [1] along with Constantinescu et al. (2016) in the updated manuscript after the phrase “two continuous dimensions” on page 5.

      (1) Doeller, C.F., Barry, C. and Burgess, N., 2010. Evidence for grid cells in a human memory network. Nature, 463(7281), pp.657-661.

      (12) Page 6 - Np=100. This is done later, but it would be clearer if they right away stated that Np*Nf=900 in this first presentation.

      We have now added this sentence after Np=100. “Hence Np*Nf=900, which denotes the number of grid cells.”

      (13) Page 6 - They provide theorem 2.1 on the determinant of the covariance matrix of the grid code, but they ought to cite this the first time this is mentioned.

      We have cited Gillenwater et al. (2012) before mentioning theorem 2.1. The sentence just before that reads “We use the following theorem from Gillenwater et al. (2012) to construct :”

      (14) Page 6 - It would greatly enhance the impact of the paper if they could give neuroscientists some sense of how the maximization of the determinant of the covariance matrix of the grid cell code could be implemented by a biological circuit. OR at least to show an example of the output of this algorithm when it is used as an inner product with the grid cell code. This would require plotting the grid cell code in the spatial domain rather than the 900 element vector.

      We refer to our response above to the topic “Biological plausibility of DPP-A” and second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contain our responses to this issue.

      (15) Page 6 - "That encode higher spatial frequencies..." This seems intuitive, but it would be nice to give a more intuitive description of how this is related to the determinant of the covariance matrix.

      We refer to the third paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (16) Page 7 - log of both sides... Nf is number of frequencies... Would be good to mention here that they are referring to equation 6 which is only mentioned later in the paragraph.

      As suggested, we now refer to Equation 6 in the updated manuscript. The sentence now reads “This is achieved by maximizing the determinant of the covariance matrix over the within frequency grid cell embeddings of the training data, and Equation 6 is obtained by applying the log on both sides of Theorem 2.1, and in our case where refers to grid cells of a particular frequency.”
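      The intuition behind maximizing a log-determinant of a covariance matrix can be shown with a small numerical sketch. This is not Equation 6 or Theorem 2.1 from the manuscript; it only assumes synthetic embeddings and a hypothetical helper `subset_log_det`, and shows that a set of units with high variance and low mutual correlation has a larger log-determinant than a set containing a nearly redundant unit.

```python
import numpy as np

def subset_log_det(embeddings, idx):
    """Log-determinant of the covariance matrix of the selected columns."""
    cov = np.cov(embeddings[:, idx], rowvar=False)
    sign, logdet = np.linalg.slogdet(cov)  # numerically stable log|det|
    return logdet

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 3))                    # three roughly independent units
near_copy = base[:, 0] + 0.05 * rng.normal(size=500)  # unit 3 almost duplicates unit 0
x = np.column_stack([base, near_copy])              # shape (500, 4)

diverse = subset_log_det(x, [0, 1, 2])    # weakly correlated units
redundant = subset_log_det(x, [0, 1, 3])  # includes the near-duplicate
```

      Because the determinant shrinks toward zero as columns become linearly dependent, `redundant` is far smaller than `diverse`, which is why a determinant-based objective favors diverse, weakly correlated embeddings.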

      (17) Page 7 - Equation 6 - They should discuss how this is proposed to be implemented in brain circuits.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (18) Page 9 - "egeneralize" - presumably this is a typo?

      Yes. We have corrected it to “generalize” in the updated manuscript.

      (19) Page 9 - "biologically plausible encoding scheme" - This is valid for the grid cell code, but they should be clear that this is not valid for other parts of the model, or specify how other parts of the model such as DPP-A could be biologically plausible.

      We refer to our response above to the topic “Biological plausibility of DPP-A” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (20) Page 12 - Figure 7 - comparison to one-hots or smoothed one-hots. The text should indicate whether the smoothed one-hots are similar to place cell coding. This is the most relevant comparison of coding for those knowledgeable about biological coding schemes.

      Yes, smoothed one-hots are similar to place cell coding. We now mention this in Section 5.3 of the updated manuscript.
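      For readers less familiar with these baselines, the sketch below contrasts a one-hot code with a smoothed one-hot code for a position on a 1D axis. The Gaussian kernel, the `sigma` value, and the function names are assumptions chosen for illustration; the manuscript's exact smoothing procedure is not reproduced here.

```python
import numpy as np

def one_hot(position, size):
    """Single active unit at the encoded position."""
    code = np.zeros(size)
    code[position] = 1.0
    return code

def smoothed_one_hot(position, size, sigma=2.0):
    """Gaussian bump centered on the position, loosely like a place field."""
    centers = np.arange(size)
    bump = np.exp(-0.5 * ((centers - position) / sigma) ** 2)
    return bump / bump.sum()  # normalize so the activities sum to 1

sharp = one_hot(7, 20)
smooth = smoothed_one_hot(7, 20)
```

      In the smoothed code, neighboring units respond in a graded way to nearby positions, which is the sense in which it resembles a population of place cells with overlapping fields.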

      (21) Page 12 - They could compare to a broader range of potential biological coding schemes for the overall space. This could include using coding based on the boundary vector cell coding of the space, band cell coding (one dimensional input to grid cells), or egocentric boundary cell coding.

      We appreciate these useful suggestions, which we now mention as potentially valuable directions for future work in the second paragraph of Section 6 of the updated manuscript.

      (22) Page 13 - "transformers are particularly instructive" - They mention this as a useful comparison, but they might discuss further why a much better function is obtained when attention is applied to the system twice (once by DPP-A and then by a transformer in the inference module).

      We refer to the last paragraph of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (23) Page 13 - "Section 5.1 for analogy and Section 5.2 for arithmetic" - it would be clearer if they perhaps also mentioned the specific figures (Figure 4 and Figure 6) presenting the results for the transformer rather than the LSTM.

      We have now rephrased to also refer to the figures in the updated manuscript. The phrase now reads “a transformer (Figure 4 in Section 5.1 for analogy and Figure 6 in Section 5.2 for arithmetic tasks) failed to achieve the same level of OOD generalization as the network that used DPP-A.”

      (24) Page 14 - "statistics of the training data" - The most exciting feature of this paper is that learning during the training space analogies can so effectively generalize to other spaces based on the right attention DPP-A, but this is not really made intuitive. Again, they should illustrate the result of the xT w inner product to demonstrate why this works so effectively!

      We refer to the second, third, and fourth paragraphs of our response above to the topic “Clarification of DPP-A attentional modulation” under “Major comments (Public Reviews)”, which contains our response to this issue.

      (25) Bibliography - Silver et al., go paper - journal name "nature" should be capitalized. There are other journal titles that should be capitalized. Also, I believe eLife lists family names first.

      We have made the changes to the bibliography of the updated manuscript suggested by the reviewer.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Preliminary note from the Reviewing Editor:

      The evaluations of the two Reviewers are provided for your information. As you can see, their opinions are very different.

      Reviewer #1 is very harsh in his/her evaluation. Clearly, we don't expect you to be able to affect one type of actin network without affecting the other, but rather to change the balance between the two. However, he/she also raises some valid points, in particular that more rationale should be added for the perturbations (also mentioned by Reviewer #2). Both Reviewers have also excellent suggestions for improving the presentation of the data.

      We sincerely appreciate your and the reviewers’ suggestions. We have revised the manuscript accordingly.

      On another point, I was surprised when reading your manuscript that a molecular description of chirality change in cells is presented as a completely new one. Alexander Bershadsky's group has identified several factors (including alpha-actinin) as important regulators of the direction of chirality. The articles are cited, but these important results are not specifically mentioned. Highlighting them would not call into question the importance of your work, but might even provide additional arguments for your model.

      We appreciate the editor’s comment. Alexander Bershadsky's group has done marvelous work in cell chirality. They introduced the stair-stepping and screw theory, which suggested how radial fiber polymerization generates ACW force and drives the actin cytoskeleton into the ACW pattern. Moreover, they have identified chiral regulators like alpha-actinin 1, mDia1, capZB, and profilin 1, which can reverse or neutralize the chiral expression.

      It is worth noting that Bershadsky's group primarily focuses on radial fibers. In our manuscript, instead, we primarily focused on the contractile unit in the transverse arcs and CW chirality in our investigation. Our manuscript incorporates our findings in the transverse arcs and the radial fibers theory by Bershadsky's group into the chirality balance hypothesis, providing a more comprehensive understanding of the chirality expression.

      We have included relevant articles from Alexander Bershadsky's group, and we agree that highlighting these important results on chiral regulators further strengthens our manuscript. The manuscript was revised as follows:

      “ACW chirality can be explained by the right-handed axial spinning of radial fibers during polymerization, i.e. ‘stair-stepping' mode proposed by Tee et al. (Tee et al. 2015) (Figure 8A; Video 4). As actin filament is formed in a right-handed double helix, it possesses an intrinsic chiral nature. During the polymerization of radial fiber, the barbed end capped by formin at focal adhesion was found to recruit new actin monomers to the filament. The tethering by formin during the recruitment of actin monomers contributes to the right-handed tilting of radial fibers, leading to ACW rotation. Supporting this model, Jalal et al. (Jalal et al. 2019) showed that the silencing of mDia1, capZB, and profilin 1 would abolish the ACW chiral expression or reverse the chirality into CW direction. Specifically, the silencing of mDia1, capZB or profilin-1 would attenuate the recruitment of actin monomer into the radial fiber, with mDia1 acting as the nucleator of actin filament (Tsuji et al. 2002), CapZB promoting actin polymerization as capping protein (Mukherjee et al. 2016), and profilin-1 facilitating ATP-bound G-actin to the barbed ends (Haarer and Brown 1990; Witke 2004). The silencing resulted in a decrease in the elongation velocity of radial fiber, driving the cell into neutral or CW chirality. These results support our finding that reduction of radial fiber elongation can invert the balance of chirality expression, changing the ACW-expressing cell into a neutral or CW-expressing cell.”

      By incorporating their findings into our revision and discussion, we provide additional support for our radial fiber-transverse arc balance model for chirality expression. The revision is made on pages 8 to 9, 13, lines 253 to 256, 284, 312 to 313, 443, 449 to 459.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Kwong et al. present evidence that two actin-filament based cytoskeletal structures regulate the clockwise and anticlockwise rotation of the cytoplasm. These claims are based on experiments using cells plated on micropatterned substrates (circles). Previous reports have shown that the actomyosin network that forms on the dorsal surface of a cell plated on a circle drives a rotational or swirling pattern of movement in the cytoplasm. This actin network is composed of a combination of non-contractile radial stress fibers (AKA dorsal stress fibers) which are mechanically coupled to contractile transverse actin arcs (AKA actin arcs). The authors claim that directionality of the rotation of the cytoplasm (i.e., clockwise or anticlockwise) depends on either the actin arcs or radial fibers, respectively. While this would be interesting, the authors are not able to remove either actin-based network without affecting the other. This is not surprising, as it is likely that the radial fibers require the arcs to elongate them, and the arcs require the radial fibers to stop them from collapsing. As such, it is difficult to make simple interpretations such as the clockwise bias is driven by the arcs and anticlockwise bias is driven by the radial fibers.

      Weaknesses:

      (1) There are also multiple problems with how the data is displayed and interpreted. First, it is difficult to compare the experimental data with the controls as the authors do not include control images in several of the figures. For example, Figure 6 has images showing myosin IIA distribution, but Figure 5 has the control image. Each figure needs to show controls. Otherwise, it will be difficult for the reader to understand the differences in localization of the proteins shown. This could be accomplished by either adding different control examples or by combining figures.

      We appreciate the reviewer’s comment. We agree with the reviewer that it is difficult to compare our results in the current arrangement. The controls are included in the new Figure 6.

      (2) It is important that the authors should label the range of gray values of the heat maps shown. It is difficult to know how these maps were created. I could not find a description in the methods, nor have previous papers laid out a standardized way of doing it. As such, the reader needs some indication as to whether the maps showing different cells were created the same and show the same range of gray levels. In general, heat maps showing the same protein should have identical gray levels. The authors already show color bars next to the heat maps indicating the range of colors used. It should be a simple fix to label the minimum (blue on the color bar) and the maximum (red on the color bar) gray levels on these color bars. The profiles of actin shown in Figure 3 and Figure 3- figure supplement 3 were useful for interpreting the distribution of actin filaments. Why did the authors not show the same for the myosin IIa distributions?

      We appreciate the reviewer’s comment. To generate the distribution heatmaps, the images were taken under the same settings (e.g., fluorescent staining procedure, excitation intensity, and exposure time). Cells were included in the image stacking only if they were fully spread on either the 2500 µm² or the 750 µm² circular pattern. The location for image stacking was then determined by identifying the center of each cell spread in a perfect circle. Finally, the images were aligned at the cell center and the averaged intensity was calculated to show the distribution heatmap on the circular pattern. Revision is made on pages 19 to 20, lines 668 to 677.
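      The stacking procedure described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the analysis code used for the manuscript: the helper names, the synthetic images, and the fixed crop window are hypothetical, and each crop is assumed to lie fully inside its image.

```python
import numpy as np

def crop_centered(image, center, half_size):
    """Cut a square window of side 2*half_size + 1 around the cell center."""
    row, col = center
    return image[row - half_size: row + half_size + 1,
                 col - half_size: col + half_size + 1]

def average_heatmap(images, centers, half_size):
    """Align images at their cell centers and average the intensity per pixel."""
    crops = [crop_centered(img, c, half_size) for img, c in zip(images, centers)]
    return np.mean(np.stack(crops), axis=0)

# Synthetic example: three "cells" whose bright spot sits at different centers.
centers = [(20, 20), (25, 18), (22, 27)]
images = []
for row, col in centers:
    img = np.zeros((50, 50))
    img[row, col] = 1.0
    images.append(img)

heatmap = average_heatmap(images, centers, half_size=10)
```

      After alignment, the bright spots coincide at the middle of the averaged map, which is the behavior one would expect from centering each cell before averaging.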

      It is important to note that the individual heatmaps represent the normalized distribution generated using unique color intensity ranges. This approach was chosen to emphasize the proportional distribution of protein within cells and its variations among samples, especially for samples with generally lower expression levels. Additionally, a differential heatmap with its own range was employed to demonstrate the normalized differences compared to the control sample. Furthermore, to provide additional insight, we plotted the intensity profile of the same protein with the same size for comparative analysis. Revision is made on page 20, lines 679 to 682.

      The labels of the heatmap are included to show the intensity in the revised Figure 3, Figure 5, Figure 6, and Figure 3 —figure supplement 4.

      To better illustrate the myosin IIa distribution, the myosin intensity profiles were plotted for Y27 treatment and gene silencing. The figures are included as Figure 5—figure supplement 2 and Figure 6—figure supplement 2. Revisions are made on page 10, lines 332 to 334 and page 11, lines 377 to 379.

      (3) Line 189 "This absence of radial fibers is unexpected". The authors should clarify what they mean by this statement. The claim that the cell in Figure 3B has reduced radial stress fiber is not supported by the data shown. Every actin structure in this cell is reduced compared to the cell on the larger micropattern in Figure 3A. It is unclear if the radial stress fibers are reduced more than the arcs. Are the authors referring to radial fiber elongation?

      We appreciate the reviewer’s comment. We calculated the structures' pixel number and the percentage in the image to better illustrate the reduction of radial fiber or transverse arc. As radial fibers emerge from the cell boundary and point towards the cell center and the transverse arcs are parallel to the cell edge, the actin filament can be identified by their angle with respect to the cell center. We found that the pixel number of radial fiber is reduced by 91.98 % on the 750 µm² pattern compared to the 2500 µm² pattern, while the pixel number of transverse arc is reduced by 70.58 % (Figure 3- figure supplement 3A). Additionally, we compared the percentage of actin structures on different pattern sizes (Figure 3- figure supplement 3B). On the 2500 µm² pattern, radial fibers account for 61.76 ± 2.77 % of the actin structure, but they account for only 31.13 ± 2.76 % on the 750 µm² pattern. These results provide evidence of the structural reduction on a smaller pattern.
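      The angle-based classification described above can be sketched as below. The 45° threshold, the function name, and the input format (pixel coordinates plus a local filament orientation in radians) are assumptions made for illustration; the manuscript's actual criteria are given in its Methods and are not reproduced here.

```python
import numpy as np

def classify_filament_pixels(xy, orientation, center, threshold_deg=45.0):
    """Return a boolean mask: True for radial-fiber pixels, False for transverse arcs.

    A pixel counts as radial when its filament orientation lies within
    `threshold_deg` of the line joining the pixel to the cell center.
    """
    xy = np.asarray(xy, dtype=float)
    orientation = np.asarray(orientation, dtype=float)
    radial_angle = np.arctan2(xy[:, 1] - center[1], xy[:, 0] - center[0])
    # Orientations are axial (theta and theta + pi are the same filament),
    # so fold the angular difference into the range [0, pi/2].
    diff = np.abs(orientation - radial_angle) % np.pi
    diff = np.minimum(diff, np.pi - diff)
    return np.degrees(diff) < threshold_deg

# Five illustrative pixels around a cell centered at the origin.
xy = [(10, 0), (10, 0), (0, 10), (0, 10), (-10, 0)]
orientation = [0.0, np.pi / 2, np.pi / 2, 0.0, 0.0]
is_radial = classify_filament_pixels(xy, orientation, center=(0.0, 0.0))
percent_radial = 100.0 * is_radial.mean()
```

      The resulting mask can be summed to give pixel counts per structure, or averaged to give the percentage of radial-fiber pixels among all classified actin pixels.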

      Regarding the radial fiber elongation, in this part we only discussed the reduction of radial fiber on the 750 µm² pattern compared to the 2500 µm² pattern. To better understand the contribution of radial fibers to chirality, we compared the radial fiber elongation rate in the LatA treatment and control on the 2500 µm² pattern (Figure 4). This result suggests the potential role of radial fiber in cell chirality. Revisions are made on page 6, lines 186 to 194; pages 17 to 18, lines 601 to 606; and the new Figure 3- figure supplement 3.

      (4) The choice of the small molecule inhibitors used in this study is difficult to understand, and their results are also confusing. For example, sequestering G actin with Latrunculin A is a complicated experiment. The authors use a relatively low concentration (50 nM) and show that actin filament-based structures are reduced and there are more in the center of the cell than in controls (Figure 3E). What was the logic of choosing this concentration?

      We appreciate the reviewer’s comment. The concentrations of the drugs were selected based on the literature and their known effects on actin arrangement or chiral expression.

      For example, Latrunculin A was used at 50 nM, a concentration at or below which it has been shown to reverse chirality (Bao et al., 2020; Chin et al., 2018; Kwong et al., 2019; Wan et al., 2011). Similarly, the 2 µM A23187 treatment concentration was selected to initiate actin remodeling (Shao et al., 2015). Furthermore, NSC23766 at 100 µM was found to efficiently inhibit Rac1 activation and to produce a distinct change in actin structure (Chen et al., 2011; Gao et al., 2004), enhancing ACW chiral expression. The revision is made on pages 6 to 7, lines 202 to 211.

      (5) Using a small molecule that binds the barbed end (e.g., cytochalasin) could conceivably be used to selectively remove longer actin filaments, which the radial fibers have compared to the lamellipodia and the transverse arcs. The authors should articulate how the actin cytoskeleton is being changed by latrunculin treatment and the impact on chirality. Is it just that the radial stress fibers are not elongating? There seems to be more radial stress fibers than in controls, rather than an absence of radial stress fibers.

      We appreciate the reviewer’s comment. Our results showed that Latrunculin A treatment reversed the cell chirality. To compare the amount of radial fiber and transverse arc, we calculated the structures' pixel percentage. We found that the percentage of radial fiber pixels with LatA treatment was reduced compared to that of the control, while the percentage of transverse arc pixels increased (Figure 3— figure supplement 5). This result suggests that radial fibers are inhibited under Latrunculin A treatment.

      Furthermore, the elongation rate of radial fibers is reduced by Latrunculin A treatment (Figure 4). This result, along with the reduction of radial fiber percentage under Latrunculin A treatment, suggests a significant impact of radial fibers on the ACW chirality. Revisions are made on pages 7 to 8, lines 244 to 250 and the new Figure 3— figure supplement 5 and Figure 3— figure supplement 6.

      (6) Similar problems arise from the other small molecules as well. LPA has more effects than simply activating RhoA. Additionally, many of the quantifiable effects of LPA treatment are apparent only after the cells are serum starved, which does not seem to be the case here.

      We appreciate the reviewer’s comment. The reviewer mentioned that the quantifiable effects of LPA treatments were seen after the cells were serum-starved. LPA is known to be a serum component and has an affinity to albumin in serum (Moolenaar, 1995). Serum starvation is often employed to better observe the effects of LPA by comparing conditions with and without LPA. We agree with the reviewer that the effect of LPA cannot be fully seen under the current setting. Based on the reviewer’s comment and after careful consideration, we have decided to remove the data related to LPA from our manuscript. Revisions are made on pages 6 to 7, 17 and Figure 3— figure supplement 4.

      (7) Furthermore, inhibiting ROCK with Y-27632 affects myosin light chain phosphorylation and is not specific to myosin IIA. Are the two other myosin II paralogs expressed in these cells (myosin IIB and myosin IIC)? If so, the authors’ statements about this experiment should refer to myosin II, not myosin IIa.

      We appreciate the reviewer’s comment. We agree that ensuring accuracy and clarity in our statements is important. The terminology regarding the Y27632 experiment is revised to myosin II for a more precise description. Revisions are made on pages 9 to 10 and 29, lines 317 to 341, 845 and 848.

      (8) None of the uses of the small molecules above have supporting data using a different experimental method. For example, backing up the LPA experiment by perturbing RhoA tho.

      We appreciate the reviewer’s comment. After careful consideration, we have decided to remove the data related to LPA from our manuscript. Revisions are made on pages 6 to 7 and 17, and in Figure 3—figure supplement 4.

      (9) The use of SMIFH2 as a "formin inhibitor" is also problematic. SMIFH2 also inhibits myosin II contractility, making interpreting its effects on cells difficult to impossible. The authors present data of mDia2 knockdown, which would be a good control for this SMIFH2.

      We appreciate the reviewer’s comment. We agree that there is potential interference of SMIFH2 with myosin II contractility, which could introduce confounding factors to the results. Based on the reviewer’s comment and further consideration, we have decided to remove the data related to SMIFH2 from our manuscript. Revisions are made on pages 6 to 7, 10 and 17, and in Figure 3—figure supplement 4.

      (10) However, the authors claim that mDia2 "typically nucleates tropomyosin-decorated actin filaments, which recruit myosin II and anneal endwise with α-actinin-crosslinked actin filaments."

      There is no reference for this statement, and the authors’ own data show that both arcs and radial fibers are reduced by mDia2 knockdown. Overall, the formin data do not support the conclusions the authors report.

      We appreciate the reviewer’s comment. We apologize for the lack of citation for this claim. To address this, we have added a reference to support this claim in the revised manuscript (Tojkander et al., 2011). Revision is made on page 10, lines 345 to 347.

      Regarding the actin structure under mDia2 gene silencing, our results showed that myosin II was dissociated from the actin filaments compared to the control. At the same time, there are no considerable differences in the actin structure of radial fibers and transverse arcs between mDia2 gene silencing and the control.

      (11) The data in Figure 7 does not support the conclusion that myosin IIa is exclusively on top of the cell. There are clear ventral stress fibers in A (actin) that have myosin IIa localization. The authors simply chose to not draw a line over them to create a height profile.

      We appreciate the reviewer’s comment. To better illustrate myosin IIa distribution in a cell, we have included a video showing the myosin IIa staining from the base to the top of the cell (Video 7). At the cell base, the intensity of myosin IIa is relatively low at the center. However, as the focal plane is raised, we can clearly see that myosin II localizes near the top of the cell (Figure 7B and Video 7). Revision is made on page 12, lines 421 to 424, and the new Video 7.

      Reviewer #2 (Public Review):

      Summary:

      Chirality of cells, organs, and organisms can stem from the chiral asymmetry of proteins and polymers at a much smaller lengthscale. The intrinsic chirality of actin filaments (F-actin) is implicated in the chiral arrangement and movement of cellular structures including F-actin-based bundles and the nucleus. It is unknown how opposite chiralities can be observed when the chirality of F-actin is invariant. Kwong, Chen, and co-authors explored this problem by studying chiral cell-scale structures in adherent mammalian cultured cells. They controlled the size of adhesive patches, and examined chirality at different timepoints. They made various molecular perturbations and used several quantitative assays. They showed that forces exerted by antiparallel actomyosin bundles on parallel radial bundles are responsible for the chirality of the actomyosin network at the cell scale.

      Strengths:

      Whereas previously, most effort has been put into understanding radial bundles, this study makes an important distinction that transverse or circumferential bundles are made of antiparallel actomyosin arrays. A minor point that was nice for the paper to make is that between the co-existing chirality of nuclear rotation and radial bundle tilt, it is the F-actin driving nuclear rotation and not the other way around. The paper is clearly written.

      Weaknesses:

      The paper could benefit from grammatical editing. Once the following Major and Minor points are addressed, which may not require any further experimentation and do not entail additional conditions, this manuscript would be appropriate for publication in eLife.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      Major:

      (1) The binary classification of cells as exhibiting clockwise or anticlockwise F-actin structures does not capture the instances where there is very little chirality, such as in the mDia2-depleted cells on small patches (Figure 6B). Cell chirality throughout the cell population needs to be reported as the average angle of F-actin structures on a per-cell basis, as a rose plot or scatter plot of angle. These changes to cell scoring and data display will be important to distinguish conditions where chirality is random (50% CW, 50% ACW) from conditions where chirality is low (radial bundles are radial and transverse arcs are circumferential).

      We appreciate the reviewer’s comment. We apologize if we did not convey our analysis method clearly enough. Throughout the manuscript, unless mentioned otherwise, the chirality analysis was based on the chiral nucleus rotation within a period of observation. The only exception is the F-actin structure chirality in Figure 3—figure supplement 1, for which we analyzed the angle of the radial fibers of control cells on the 2500 µm² pattern. This is described on pages 5 to 6, lines 169-172, and in the Methods section “Analysis of fiber orientation and actin structure on circular pattern” on page 17.

      Based on the feedback, we attempted to use a scatter plot to present the mDia2 overexpression and silencing data and to show the randomness of the result. However, because scatter plots primarily focus on visualizing the distribution, they become cluttered and visually overwhelming, as shown below.

      Author response image 1.

      (A) Percentage of ACW nucleus rotational bias on 2500 µm² with untreated control (reused data from Figure 3D, n = 57), mDia2 silencing (n = 48), and overexpression (n = 25). (B) Probability of ACW/CW rotation on 750 µm² pattern with untreated control (reused data from Figure 3E, n = 34), mDia2 silencing (n = 53), and overexpression (n = 22). Mean ± SEM. Two-sample equal variance two-tailed t-test.

      Therefore, in our manuscript, the presentation primarily used column bar charts with a statistical analysis, Student’s t-test. A column bar chart makes it easier to understand and compare values. In brief, Student’s t-test is commonly used to evaluate whether the means of two groups are significantly different, assuming equal variance. As such, Student’s t-test is able to discern the randomness of the chirality.
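      As a rough illustration of the test named above, the following Python sketch computes the statistic of a two-sample, equal-variance (pooled) t-test. The function name and the input values are hypothetical and are not data from the manuscript; a two-tailed p-value would then be read from the Student t distribution with the returned degrees of freedom.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, df) for a two-sample t-test that assumes equal variances.

    Hypothetical helper for illustration; each group needs at least two values.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two group means.
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, df


# Made-up percentages of ACW-biased cells from repeated experiments.
control = [62.0, 58.0, 65.0]
treated = [49.0, 52.0, 47.0]
t_value, dof = pooled_t_statistic(control, treated)
print(f"t = {t_value:.3f}, df = {dof}")
```

      A large absolute t value indicates that the two group means differ by more than their pooled variability would suggest. Note that the statistic compares means only; it does not by itself describe how angles are distributed within each group.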

      (2) The authors need to discuss the likely nucleator of F-actin in the radial bundles, since it is apparently not mDia2 in these cells.

      We appreciate the reviewer’s comment. In our manuscript, we originally focused on mDia2 and Tpm4 as they are the transverse arc nucleator and the mediator of myosin II motion. However, we agree with the reviewer that discussing the radial fiber nucleator would provide more insight into radial fiber polymerization in ACW chirality and improve the completeness of the story.

      Radial fibers polymerize at the focal adhesions. Several proteins are involved in actin nucleation or stress fiber formation at the focal adhesion, such as the Arp2/3 complex (Serrels et al., 2007), Ena/VASP (Applewhite et al., 2007; Gateva et al., 2014), and formins (Dettenhofer et al., 2008; Sahasrabudhe et al., 2016; Tsuji et al., 2002). Within the formin family, mDia1 is the likely nucleator of F-actin in the radial bundle. The presence of mDia1 facilitates the elongation of actin bundles at focal adhesions (Hotulainen and Lappalainen, 2006). Studies by Jalal et al. (2019) and Tee et al. (2023) have demonstrated that silencing of mDia1 abolished the ACW actin expression. Silencing of other nucleation proteins, such as the Arp2/3 complex or Ena/VASP, would only reduce the ACW actin expression without abolishing it.

      Based on these findings, the attenuation of radial fiber elongation would abolish the ACW chiral expression, providing more support for our model in explaining chirality expression.

      This part is incorporated into the Discussion. The revision is made on page 13, lines 443, 449 to 459.

      Minor:

      (1) In the introduction, additional observations of handedness reversal need to be referenced (line 79), including Schonegg, Hyman, and Wood 2014 and Zaatri, Perry, and Maddox 2021.

      We appreciate the reviewer’s comment. The references for these observations of handedness reversal are now cited on page 3, lines 78 to 79.

      (2) For clarity of logic, the authors should share the rationale for choosing, and results from administering, the collection of compounds as presented in Figure 3 one at a time instead of as a list.

      We appreciate the reviewer’s comment. The concentration of drugs was determined based on existing literature and their known outcomes on actin arrangement or chiral expression.

      To elucidate, the use of Latrunculin A was based on previous studies, which demonstrated that it reverses the chirality at or below 50 nM (Bao et al., 2020; Chin et al., 2018; Kwong et al., 2019; Wan et al., 2011). Because inhibiting F-actin assembly can lead to the expression of CW chirality, we hypothesized that the opposite treatment might enhance ACW chirality. Therefore, we chose A23187 treatment at a 2 µM concentration, as it could initiate actin remodeling and stress fiber formation (Shao et al., 2015).

      Furthermore, in an attempt to replicate the reversal of chirality by inhibiting F-actin assembly through other pathways, we explored NSC23766 at 100 µM, which was found to inhibit Rac1 activation (Chen et al., 2011; Gao et al., 2004) and reduce cortical F-actin assembly (Head et al., 2003). However, it failed to reverse the chirality and instead enhanced the ACW chirality of the cell.

      We carefully selected the drugs and the applied concentrations to investigate various pathways and mechanisms that influence actin arrangement and might affect the chiral expression. We believe that this clarification strengthens the rationale behind our choice of drugs. The revision is made on pages 6 to 7, lines 202 to 211.

      (3) "Image stacking" isn't a common term to this referee. Its first appearance in the main text (line 183) should be accompanied with a call-out to the Methods section. The authors could consider referring to this approach more directly. Related issue: Image stacking fails to report the prominent enrichment of F-actin at the very cell periphery (see Figure 3 A and F) except for with images of cells on small islands (Figure 3H). Since this data display approach seems to be adding the intensity from all images together, and since cells on circular adhesive patches are relatively radially symmetric, it is unclear how to align cells, but perhaps cells could be aligned based on a slight asymmetry such as the peripheral location with highest F-actin intensity or the apparent location of the centrosome.

      We appreciate the reviewer’s comment. We fully acknowledge that “image stacking” is an uncommon term and that its description in the Methods section was insufficient. First, we have added a call-out to the Methods section at its first appearance (page 6, lines 182 to 183). The method of image stacking is as follows. For the distribution heatmap, all images were taken under the same settings (e.g., staining procedure, fluorescent intensity, exposure time, etc.). To be included in image stacking, cells had to be fully spread on either the 2500 µm² or the 750 µm² circular pattern. A consistent position for image stacking could then be found by identifying the center of each cell, which spreads into a perfect circle. Finally, the images were aligned at the center, and the averaged intensity was calculated to show the distribution heatmap on the circular pattern.

      We agree with the reviewer that our image alignment and stacking are based on cells that are radially symmetric. As such, the intensity distribution of the stacked image is intended to compare differences in F-actin along the radial direction. Revision is made on page 19, lines 668 to 682.
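      The alignment-and-averaging step described above might be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the code used for the manuscript: each image is assumed to be a 2D grid of intensities, the center of each cell is assumed to be already known, and the function name and toy values are hypothetical.

```python
def stack_centered(images, centers, half_width):
    """Average square crops of side 2*half_width + 1 centered on each cell.

    images:  list of 2D lists of pixel intensities (hypothetical input format).
    centers: list of (row, col) cell centers, one per image.
    Assumes every crop lies fully inside its image.
    """
    if not images or len(images) != len(centers):
        raise ValueError("need one center per image and at least one image")
    size = 2 * half_width + 1
    total = [[0.0] * size for _ in range(size)]
    for image, (row_c, col_c) in zip(images, centers):
        for dr in range(size):
            for dc in range(size):
                # Shift so that the cell center lands on the crop center.
                total[dr][dc] += image[row_c - half_width + dr][col_c - half_width + dc]
    count = len(images)
    return [[value / count for value in row] for row in total]


# Two toy 5x5 "cells" whose bright pixel sits at different positions.
cell_a = [[0] * 5 for _ in range(5)]
cell_a[2][3] = 10
cell_b = [[0] * 5 for _ in range(5)]
cell_b[1][1] = 20
heatmap = stack_centered([cell_a, cell_b], [(2, 3), (1, 1)], half_width=1)
print(heatmap[1][1])
```

      Because the crops are aligned only at the center and not by rotation, the averaged map preserves differences along the radial direction while averaging out any angular asymmetry between cells.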

      (4) The authors need to be consistent with wording about chirality, avoiding "right" and left (e.g. lines 245-6) since if the cell periphery were oriented differently in the cropped view, the tilt would be a different direction side-to-side but the same chirality. This section is confusing since the peripheral radial bundles are quite radial, and the inner ones are pointing from upper left to lower right, pointing (to the right) more downward over time, rather than more right-ward, in the cropped images.

      We appreciate the reviewer’s comment. We apologize for the confusion caused by our description of the tilting direction. For consistency with our later description, we define the “right” or “left” direction of a radial fiber with reference to its direction of elongation, so that “rightward tilting” corresponds to the ACW rotation of the chiral pattern. To retain the term “rightward tilting”, we added this description to ensure accurate communication in our writing. We also rearranged the images in the new Figure 4A and Video 2 for better observation. Revision is made on page 8, lines 262 to 263.

      (5) Why are the cells in Figure 4A dominated by radial (and more-central, tilting) fibers, while control cells in 4D show robust circumferential transverse arcs? Have these cells been plated for different amounts of time or is a different optical section shown?

      We appreciate the reviewer’s comment. The cells in Figure 4A and Figure 4D were prepared under similar conditions, such as incubation time and optical setting. Actin organization is a dynamic process, and cells can exhibit varied actin arrangements, transitioning between different forms such as circular, radial, chordal, chiral, or linear patterns, as they spread on a circular island (Tee et al., 2015). In Figure 4A, the actin is arranged in a chiral pattern, whereas in Figure 4D, the actin exhibits a radial pattern. These variations reflect the natural dynamics of actin organization within cells during the imaging process.

      (6) All single-color images (such as Fig 5 F-actin) need to be black-on-white, since it is far more difficult to see F-actin morphology with red on black.

      We appreciate the reviewer’s comment. We have changed all F-actin images (single color) into black and white for better image clarity. Revisions are made in the new Figure 5, Figure 6 and Figure 7.

      (7) Figure 5A, especially the F-actin staining, is quite a bit blurrier than other micrographs. These images should be replaced with images of comparable quality to those shown throughout.

      We appreciate the reviewer’s comment. We agree that the F-actin staining in Figure 5 is difficult to observe. To improve image clarity, the F-actin staining images are replaced with more zoomed-in images. Revision is made in the new Figure 5.

      (8) F-actin does not look unchanged by Y27632 treatment, as the authors state in line 306. This may be partially due to image quality and the ambiguities of communicating with the blue-to-red colormap. Similarly, I don't agree that mDia2 depletion did not change F-actin distribution (line 330) as cells in that condition had a prominent peripheral ring of F-actin missing from cells in other conditions.

      We appreciate the reviewer’s comment. We agree with the reviewer’s observation that the F-actin distribution is indeed changed under Y27632 treatment compared to the control in Figure 5A-B. Here, we would like to emphasize that the actin ring persists despite the actin structure being altered under the Y27632 treatment. The actin ring refers to the darker red circle in the distribution heatmap. It represents the condensed actin structure, including radial fibers and transverse arcs. This important structure remains unaffected despite the disruption of myosin II, the key component in radial fiber.

      Furthermore, we agree with the reviewer that mDia2 depletion does change F-actin distribution. Similar to the Y27632 treatment, the actin ring persists despite the actin structure being altered under mDia2 gene silencing. Moreover, compared to other treatments, mDia2 depletion has a less significant impact on actin distribution. To address these points more comprehensively, we have revised the Y27632 treatment and mDia2 sections. The revisions for Y27632 and mDia2 are made on page 10, lines 324-327 and 352-353, respectively.

      (9) The colormap shown for intensity coding should be reconsidered, as dark red is harder to see than the yellow that is sub-maximal. Viridis is a colormap ranging from cooler and darker blue, through green, to warmer and lighter yellow as the maximum. Other options likely exist as well.

      We appreciate the reviewer’s comment. We carefully considered the reviewer’s concern and explored other color scale choices in the colormap function in Matlab. After evaluating different options, including the “Viridis” color scale, we found that “jet” provides a wide range of colors, allowing effective visual presentation of the intensity variation in our data. The use of “jet” allows us to appropriately visualize the actin ring distribution, which is represented in red or dark red. While we understand that dark red could be harder to see than the sub-maximal yellow, we believe that “jet” serves our purpose of presenting the intensity information.

      (10) For Figure 6, why doesn't average distribution of NMMIIa look like the example with high at periphery, low inside periphery, moderate throughout lamella, low perinuclear, and high central?

      We appreciate the reviewer’s comment. We understand the reviewer’s concern that the average distribution of NMMIIa does not look the same as the example. The chosen image is the best representation of the NMMIIa disruption from the transverse arcs after mDia2 silencing. Additionally, it is important to note that the average distribution is a stacked image that includes other images. As such, the NMMIIa example and the distribution heatmap might not necessarily appear identical.

      (11) In 2015, Tee, Bershadsky and colleagues demonstrated that transverse bundles are dorsal to radial bundles, using correlative light and electron microscopy. While it is important for Kwong and colleagues to show that this is true in their cells, they should reference Tee et al. in the rationale section of text pertaining to Figure 7.

      We appreciate the reviewer’s comment. Tee et al. (2015) demonstrated that the transverse fiber is at the same height as the radial fiber based on correlative light and electron microscopy. Here, using the position of myosin IIa, a transverse arc component, our results show the dorsal positioning of transverse arcs with connection to the extension of radial fibers (Figure 7C), which is consistent with their findings. This is included in our manuscript on page 12, lines 421 to 424, and page 14, lines 477 to 480.

      Reference

      Applewhite, D.A., Barzik, M., Kojima, S.-i., Svitkina, T.M., Gertler, F.B., and Borisy, G.G. (2007). Ena/Vasp Proteins Have an Anti-Capping Independent Function in Filopodia Formation. Mol. Biol. Cell. 18, 2579-2591. DOI: https://doi.org/10.1091/mbc.e06-11-0990

      Bao, Y., Wu, S., Chu, L.T., Kwong, H.K., Hartanto, H., Huang, Y., Lam, M.L., Lam, R.H., and Chen, T.H. (2020). Early Committed Clockwise Cell Chirality Upregulates Adipogenic Differentiation of Mesenchymal Stem Cells. Adv. Biosyst. 4, 2000161. DOI: https://doi.org/10.1002/adbi.202000161

      Chen, Q.-Y., Xu, L.-Q., Jiao, D.-M., Yao, Q.-H., Wang, Y.-Y., Hu, H.-Z., Wu, Y.-Q., Song, J., Yan, J., and Wu, L.-J. (2011). Silencing of Rac1 Modifies Lung Cancer Cell Migration, Invasion and Actin Cytoskeleton Rearrangements and Enhances Chemosensitivity to Antitumor Drugs. Int. J. Mol. Med. 28, 769-776. DOI: https://doi.org/10.3892/ijmm.2011.775

      Chin, A.S., Worley, K.E., Ray, P., Kaur, G., Fan, J., and Wan, L.Q. (2018). Epithelial Cell Chirality Revealed by Three-Dimensional Spontaneous Rotation. Proc. Natl. Acad. Sci. U.S.A. 115, 12188-12193. DOI: https://doi.org/10.1073/pnas.1805932115

      Dettenhofer, M., Zhou, F., and Leder, P. (2008). Formin 1-Isoform IV Deficient Cells Exhibit Defects in Cell Spreading and Focal Adhesion Formation. PLoS One 3, e2497. DOI: https://doi.org/10.1371/journal.pone.0002497

      Gao, Y., Dickerson, J.B., Guo, F., Zheng, J., and Zheng, Y. (2004). Rational Design and Characterization of a Rac GTPase-Specific Small Molecule Inhibitor. Proc. Natl. Acad. Sci. U.S.A. 101, 7618-7623. DOI: https://doi.org/10.1073/pnas.0307512101

      Gateva, G., Tojkander, S., Koho, S., Carpen, O., and Lappalainen, P. (2014). Palladin Promotes Assembly of Non-Contractile Dorsal Stress Fibers through Vasp Recruitment. J. Cell Sci. 127, 1887-1898. DOI: https://doi.org/10.1242/jcs.135780

      Haarer, B., and Brown, S.S. (1990). Structure and Function of Profilin.

      Head, J.A., Jiang, D., Li, M., Zorn, L.J., Schaefer, E.M., Parsons, J.T., and Weed, S.A. (2003). Cortactin Tyrosine Phosphorylation Requires Rac1 Activity and Association with the Cortical Actin Cytoskeleton. Mol. Biol. Cell. 14, 3216-3229. DOI: https://doi.org/10.1091/mbc.e02-11-0753

      Hotulainen, P., and Lappalainen, P. (2006). Stress Fibers are Generated by Two Distinct Actin Assembly Mechanisms in Motile Cells. J. Cell Biol. 173, 383-394. DOI: https://doi.org/10.1083/jcb.200511093

      Jalal, S., Shi, S., Acharya, V., Huang, R.Y., Viasnoff, V., Bershadsky, A.D., and Tee, Y.H. (2019). Actin Cytoskeleton Self-Organization in Single Epithelial Cells and Fibroblasts under Isotropic Confinement. J. Cell Sci. 132. DOI: https://doi.org/10.1242/jcs.220780

      Kwong, H.K., Huang, Y., Bao, Y., Lam, M.L., and Chen, T.H. (2019). Remnant Effects of Culture Density on Cell Chirality after Reseeding. ACS Biomater. Sci. Eng. 5, 3944-3953. DOI: https://doi.org/10.1021/acsbiomaterials.8b01364

      Moolenaar, W.H. (1995). Lysophosphatidic Acid, a Multifunctional Phospholipid Messenger. J. Biol. Chem. 270, 12949-12952. DOI: https://doi.org/10.1074/jbc.270.22.12949

      Mukherjee, K., Ishii, K., Pillalamarri, V., Kammin, T., Atkin, J.F., Hickey, S.E., Xi, Q.J., Zepeda, C.J., Gusella, J.F., and Talkowski, M.E. (2016). Actin Capping Protein Capzb Regulates Cell Morphology, Differentiation, and Neural Crest Migration in Craniofacial Morphogenesis. Hum. Mol. Genet. 25, 1255-1270. DOI: https://doi.org/10.1093/hmg/ddw006

      Sahasrabudhe, A., Ghate, K., Mutalik, S., Jacob, A., and Ghose, A. (2016). Formin 2 Regulates the Stabilization of Filopodial Tip Adhesions in Growth Cones and Affects Neuronal Outgrowth and Pathfinding In Vivo. Development 143, 449-460. DOI: https://doi.org/10.1242/dev.130104

      Serrels, B., Serrels, A., Brunton, V.G., Holt, M., McLean, G.W., Gray, C.H., Jones, G.E., and Frame, M.C. (2007). Focal Adhesion Kinase Controls Actin Assembly via a Ferm-Mediated Interaction with the Arp2/3 Complex. Nat. Cell Biol. 9, 1046-1056. DOI: https://doi.org/10.1038/ncb1626

      Shao, X., Li, Q., Mogilner, A., Bershadsky, A.D., and Shivashankar, G. (2015). Mechanical Stimulation Induces Formin-Dependent Assembly of a Perinuclear Actin Rim. Proc. Natl. Acad. Sci. U.S.A. 112, E2595-E2601. DOI: https://doi.org/10.1073/pnas.1504837112

      Tee, Y.H., Goh, W.J., Yong, X., Ong, H.T., Hu, J., Tay, I.Y.Y., Shi, S., Jalal, S., Barnett, S.F., and Kanchanawong, P. (2023). Actin Polymerisation and Crosslinking Drive Left-Right Asymmetry in Single Cell and Cell Collectives. Nat. Commun. 14, 776. DOI: https://doi.org/10.1038/s41467-023-35918-1

      Tee, Y.H., Shemesh, T., Thiagarajan, V., Hariadi, R.F., Anderson, K.L., Page, C., Volkmann, N., Hanein, D., Sivaramakrishnan, S., Kozlov, M.M., and Bershadsky, A.D. (2015). Cellular Chirality Arising from the Self-Organization of the Actin Cytoskeleton. Nat. Cell Biol. 17, 445-457. DOI: https://doi.org/10.1038/ncb3137

      Tojkander, S., Gateva, G., Schevzov, G., Hotulainen, P., Naumanen, P., Martin, C., Gunning, P.W., and Lappalainen, P. (2011). A Molecular Pathway for Myosin II Recruitment to Stress Fibers. Curr. Biol. 21, 539-550. DOI: https://doi.org/10.1016/j.cub.2011.03.007

      Tsuji, T., Ishizaki, T., Okamoto, M., Higashida, C., Kimura, K., Furuyashiki, T., Arakawa, Y., Birge, R.B., Nakamoto, T., Hirai, H., and Narumiya, S. (2002). Rock and mdia1 Antagonize in Rho-Dependent Rac Activation in Swiss 3T3 Fibroblasts. J. Cell Biol. 157, 819-830. DOI: https://doi.org/10.1083/jcb.200112107

      Wan, L.Q., Ronaldson, K., Park, M., Taylor, G., Zhang, Y., Gimble, J.M., and Vunjak-Novakovic, G. (2011). Micropatterned Mammalian Cells Exhibit Phenotype-Specific Left-Right Asymmetry. Proc. Natl. Acad. Sci. U.S.A. 108, 12295-12300. DOI: https://doi.org/10.1073/pnas.1103834108

      Witke, W. (2004). The Role of Profilin Complexes in Cell Motility and Other Cellular Processes. Trends Cell Biol. 14, 461-469. DOI: https://doi.org/10.1016/j.tcb.2004.07.003

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important study provides solid evidence that both psychiatric dimensions (e.g. anhedonia, apathy, or depression) and chronotype (i.e., being a morning or evening person) influence effort-based decision-making. Notably, the current study does not elucidate whether there may be interactive effects of chronotype and psychiatric dimensions on decision-making. This work is of importance to researchers and clinicians alike, who may make inferences about behaviour and cognition without taking into account whether the individual may be tested or observed out-of-sync with their phenotype.

      We thank the three reviewers for their comments, and the Editors at eLife. We have taken the opportunity to revise our manuscript considerably from its original form, not least because we feel a number of the reviewers’ suggested analyses strengthen our manuscript considerably (in one instance even clarifying our conclusions, leading us to change our title)—for which we are very appreciative indeed. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study uses an online cognitive task to assess how reward and effort are integrated in a motivated decision-making task. In particular the authors were looking to explore how neuropsychiatric symptoms, in particular apathy and anhedonia, and circadian rhythms affect behavior in this task. Amongst many results, they found that choice bias (the degree to which integrated reward and effort affects decisions) is reduced in individuals with greater neuropsychiatric symptoms, and late chronotypes (being an 'evening person').

      Strengths:

      The authors recruited participants to perform the cognitive task both in and out of sync with their chronotypes, allowing for the important insight that individuals with late chronotypes show a more reduced choice bias when tested in the morning. Overall, this is a well-designed and controlled online experimental study. The modelling approach is robust, with care being taken to both perform and explain to the readers the various tests used to ensure the models allow the authors to sufficiently test their hypotheses.

      Weaknesses:

      This study was not designed to test the interactions of neuropsychiatric symptoms and chronotypes on decision making, and thus can only make preliminary suggestions regarding how symptoms, chronotypes and time-of-assessment interact.

      We appreciate the Reviewer’s positive view of our research and agree with their assessment of its weaknesses; the study was not designed to assess chronotype-mental health interactions. We hope that our new title and contextualisation makes this clearer. We respond in more detail point-by-point below.

      Reviewer #2 (Public Review):

      Summary:

      The study combines computational modeling of choice behavior with an economic, effort-based decision-making task to assess how willingness to exert physical effort for a reward varies as a function of individual differences in apathy and anhedonia, or depression, as well as chronotype. They find an overall reduction in effort selection that scales with apathy and anhedonia and depression. They also find that later chronotypes are less likely to choose effort than earlier chronotypes and, interestingly, an interaction whereby later chronotypes are especially unwilling to exert effort in the morning versus the evening.

      Strengths:

      This study uses state-of-the-art tools for model fitting and validation and regression methods which rule out multicollinearity among symptom measures and Bayesian methods which estimate effects and uncertainty about those estimates. The replication of results across two different kinds of samples is another strength. Finally, the study provides new information about the effects not only of chronotype but also chronotype by timepoint interactions which are previously unknown in the subfield of effort-based decision-making.

      Weaknesses:

      The study has few weaknesses. One potential concern is that the range of models which were tested was narrow, and other models might have been considered. For example, the Authors might have also tried to fit models with an overall inverse temperature parameter to capture decision noise. One reason for doing so is that some variance in the bias parameter might be attributed to noise, which was not modeled here. Another concern is that the manuscript discusses effort-based choice as a transdiagnostic feature - and there is evidence in other studies that effort deficits are a transdiagnostic feature of multiple disorders. However, because the present study does not investigate multiple diagnostic categories, it doesn't provide evidence for transdiagnosticity, per se.

      We appreciate Reviewer 2’s assessment of our research and agree generally with its weaknesses. We have now addressed the Reviewer’s comments regarding transdiagnosticity in the discussion of our revised version and have addressed their detailed recommendations below (see point-by-point responses).

      In addition to the below specific changes, in our Discussion section, we now have also added the following (lines 538 – 540):

      “Finally, we would like to note that as our study is based on a general population sample, rather than a clinical one. Hence, we cannot speak to transdiagnosticity on the level of multiple diagnostic categories.”

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, Mehrhof and Nord study a large dataset of participants collected online (n=958 after exclusions) who performed a simple effort-based choice task. They report that the level of effort and reward influence choices in a way that is expected from prior work. They then relate choice preferences to neuropsychiatric syndromes and, in a smaller sample (n<200), to people's circadian preferences, i.e., whether they are a morning-preferring or evening-preferring chronotype. They find relationships between the choice bias (a model parameter capturing the likelihood to accept effort-reward challenges, like an intercept) and anhedonia and apathy, as well as chronotype. People with higher anhedonia and apathy and an evening chronotype are less likely to accept challenges (more negative choice bias). People with an evening chronotype are also more reward sensitive and more likely to accept challenges in the evening, compared to the morning.

      Strengths:

      This is an interesting and well-written manuscript which replicates some known results and introduces a new consideration related to potential chronotype relationships which have not been explored before. It uses a large sample size and includes analyses related to transdiagnostic as well as diagnostic criteria. I have some suggestions for improvements.

      Weaknesses:

      (1) The novel findings in this manuscript are those pertaining to transdiagnostic and circadian phenotypes. The authors report two separate but "overlapping" effects: individuals high on anhedonia/apathy are less willing to accept offers in the task, and similarly, individuals tested off their chronotype are less willing to accept offers in the task. The authors claim that the latter has implications for studying the former. In other words, because individuals high on anhedonia/apathy predominantly have a late chronotype (but might be tested early in the day), they might accept fewer offers, which could spuriously look like a link between anhedonia/apathy and choices but might in fact be an effect of the interaction between chronotype and time-of-testing. The authors therefore argue that chronotype needs to be accounted for when studying links between depression and effort tasks.

      The authors argue that, if X is associated with Y and Z is associated with Y, X and Z might confound each other. That is possible, but not necessarily true. It would need to be tested explicitly by having X (anhedonia/apathy) and Z (chronotype) in the same regression model. Does the effect of anhedonia/apathy on choices disappear when accounting for chronotype (and time-of-testing)? Similarly, when adding the interaction between anhedonia/apathy, chronotype, and time-of-testing, within the subsample of people tested off their chronotype, is there a residual effect of anhedonia/apathy on choices or not?

      If the effect of anhedonia/apathy disappeared (or got weaker) while accounting for chronotype, this result would suggest that chronotype mediates the effect of anhedonia/apathy on effort choices. However, I am not sure it renders the direct effect of anhedonia/apathy on choices entirely spurious. Late chronotype might be a feature (induced by other symptoms) of depression (such as fatigue and insomnia), and the association between anhedonia/apathy and effort choices might be a true and meaningful one. For example, if the effect of anhedonia/apathy on effort choices was mediated by altered connectivity of the dorsal ACC, we would not say that ACC connectivity renders the link between depression and effort choices "spurious", but we would speak of a mechanism that explains this effect. The authors should discuss in a more nuanced way what a significant mediation by the chronotype/time-of-testing congruency means for interpreting effects of depression in computational psychiatry.

      We thank the Reviewer for pointing out this crucial weakness in the original version of our manuscript. We have now thought deeply about this and agree with the Reviewer that our original results did not warrant our interpretation that reported effects of anhedonia and apathy on measures of effort-based decision-making could potentially be spurious. At the Reviewer’s suggestion, we decided to test this explicitly in our revised version—a decision that has now deepened our understanding of our results, and changed our interpretation thereof.  

      To investigate how the effects of neuropsychiatric symptoms and the effects of circadian measures relate to each other, we have followed the Reviewer’s advice and conducted an additional series of analyses (see below). Surprisingly (to us, but perhaps not the Reviewer), we discovered that all three symptom measures (two of anhedonia, one of apathy) have separable effects from circadian measures on the decision to expend effort (note we have also re-named our key parameter ‘motivational tendency’ to address this Reviewer’s next comment that the term ‘choice bias’ was unclear). In model comparisons (based on the leave-one-out information criterion, which penalises for model complexity), the models including both circadian and psychiatric measures always win against the models including either circadian or psychiatric measures. In essence, this strengthens our claims about the importance of measuring circadian rhythm in effort-based tasks generally, as circadian rhythm clearly plays an important role even when considering neuropsychiatric symptoms, but crucially does not support the idea of spurious effects: statistically, circadian measures contribute separably from neuropsychiatric symptoms to the variance in effort-based decision-making. We think this is very interesting indeed, and it certainly clarifies (and corrects the inaccuracy in) our original interpretation; we can only express our thanks to the Reviewer for helping us understand our effect more fully.

      In response to these new insights, we have made numerous edits to our manuscript. First, we changed the title from “Overlapping effects of neuropsychiatric symptoms and circadian rhythm on effort-based decision-making” to “Both neuropsychiatric symptoms and circadian rhythm alter effort-based decision-making”. In the remaining manuscript we now refrain from using the word ‘overlapping’ (which could be interpreted as overlapping in explained variance), and instead opted to describe the effects as parallel. We hope our new analyses, title, and clarified/improved interpretations together address the Reviewer’s valid concern about our manuscript’s main weakness.

      We detail these new analyses in the Methods section as follows (lines 800 – 814):

      “4.5.2. Differentiating between the effects of neuropsychiatric symptoms and circadian measures on motivational tendency

      To investigate how the effects of neuropsychiatric symptoms on motivational tendency (2.3.1) relate to effects of chronotype and time-of-day on motivational tendency we conducted exploratory analyses. In the subsamples of participants with an early or late chronotype (including additionally collected data), we first ran Bayesian GLMs with neuropsychiatric questionnaire scores (SHAPS, DARS, AES respectively) predicting motivational tendency, controlling for age and gender. We next added an interaction term of chronotype and time-of-day into the GLMs, testing how this changes previously observed neuropsychiatric and circadian effects on motivational tendency. Finally, we conducted a model comparison using LOO, comparing between motivational tendency predicted by a neuropsychiatric questionnaire, motivational tendency predicted by chronotype and time-of-day, and motivational tendency predicted by a neuropsychiatric questionnaire and time-of-day (for each neuropsychiatric questionnaire, and controlling for age and gender).”
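
      The comparison logic described above can be illustrated with a simplified stand-in. The sketch below is not the Bayesian LOO information criterion used in the analyses: it swaps in ordinary least squares with exact leave-one-out squared prediction error, on synthetic made-up data, purely to show how candidate predictor sets are ranked by out-of-sample fit. All names and values are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, ys):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def loo_mse(rows, ys):
    """Leave-one-out mean squared prediction error (lower is better)."""
    errors = []
    for i in range(len(ys)):
        coef = ols(rows[:i] + rows[i + 1:], ys[:i] + ys[i + 1:])
        pred = sum(c * x for c, x in zip(coef, rows[i]))
        errors.append((ys[i] - pred) ** 2)
    return sum(errors) / len(errors)


# Synthetic, made-up data (not study data): 20 hypothetical participants.
n = 20
symptom = [(2 * i) % 5 for i in range(n)]      # stand-in questionnaire score
circadian = [(3 * i) % 4 for i in range(n)]    # stand-in circadian predictor
noise = [((4 * i) % 7 - 3) * 0.05 for i in range(n)]
tendency = [-0.5 * s + 0.4 * c + e for s, c, e in zip(symptom, circadian, noise)]

models = {
    "symptom only": [[1.0, s] for s in symptom],
    "circadian only": [[1.0, c] for c in circadian],
    "symptom + circadian": [[1.0, s, c] for s, c in zip(symptom, circadian)],
}
scores = {name: loo_mse(rows, tendency) for name, rows in models.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

      Because each prediction is made for a held-out observation, an extra predictor only improves the score if it generalises, which is the sense in which leave-one-out comparison penalises needless complexity.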

      Results of the outlined analyses are reported in the results section as follows (lines 356 – 383):

      “2.5.2.1 Neuropsychiatric symptoms and circadian measures have separable effects on motivational tendency

      Exploratory analyses testing for the effects of neuropsychiatric questionnaires on motivational tendency in the subsamples of early and late chronotypes confirmed the predictive value of the SHAPS (M=-0.24, 95% HDI=[-0.42,-0.06]), the DARS (M=-0.16, 95% HDI=[-0.31,-0.01]), and the AES (M=-0.18, 95% HDI=[-0.32,-0.02]) on motivational tendency.

      For the SHAPS, we find that when adding the measures of chronotype and time-of-day back into the GLMs, the main effect of the SHAPS (M=-0.26, 95% HDI=[-0.43,-0.07]), the main effect of chronotype (M=-0.11, 95% HDI=[-0.22,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remain. Model comparison by LOOIC reveals motivational tendency is best predicted by the model including the SHAPS, chronotype and time-of-day as predictors, followed by the model including only the SHAPS. Note that this approach to model comparison penalizes models for increasing complexity.

      Repeating these steps with the DARS, the main effect of the DARS is found numerically, but the 95% HDI just includes 0 (M=-0.15, 95% HDI=[-0.30,0.002]). The main effect of chronotype (M=-0.11, 95% HDI=[-0.21,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.18, 95% HDI=[0.05,0.33]) on motivational tendency remain. Model comparison identifies the model including the DARS and circadian measures as the best model, followed by the model including only the DARS.

      For the AES, the main effect of the AES is found (M=-0.19, 95% HDI=[-0.35,-0.04]). For the main effect of chronotype, the 95% HDI narrowly includes 0 (M=-0.10, 95% HDI=[-0.21,0.002]), while the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remains. Model comparison identifies the model including the AES and circadian measures as the best model, followed by the model including only the AES.”
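
      For readers less familiar with the highest density interval (HDI) used in these results, the following is a minimal sketch of how such an interval can be computed from posterior draws and checked against zero. The draws are synthetic and the function names are hypothetical; this is not the authors' analysis code.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval that contains a fraction `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = max(1, math.ceil(mass * n - 1e-9))  # number of draws inside the interval
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


def excludes_zero(interval):
    """True when the whole interval lies on one side of zero."""
    lo, hi = interval
    return lo > 0 or hi < 0


# Synthetic, made-up posterior draws for a regression coefficient.
draws = [-0.30, -0.25, -0.22, -0.20, -0.18, -0.15, -0.12, -0.10, -0.05, 0.40]
interval = hdi(draws, mass=0.9)
print(interval, excludes_zero(interval))
```

      An interval such as [-0.30, 0.002] would fail this check, which corresponds to the case described above where the 95% HDI "just includes 0".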

      We have now edited parts of our Discussion to discuss and reflect these new insights, including the following.

      Lines 399 – 402:

      “Various neuropsychiatric disorders are marked by disruptions in circadian rhythm, such as a late chronotype. However, research has rarely investigated how transdiagnostic mechanisms underlying neuropsychiatric conditions may relate to inter-individual differences in circadian rhythm.”

      Lines 475 – 480:

      “It is striking that the effects of neuropsychiatric symptoms on effort-based decision-making largely are paralleled by circadian effects on the same neurocomputational parameter. Exploratory analyses predicting motivational tendency by neuropsychiatric symptoms and circadian measures simultaneously indicate the effects go beyond recapitulating each other, but rather explain separable parts of the variance in motivational tendency.”

      Lines 528 – 532:

      “Our reported analyses investigating neuropsychiatric and circadian effects on effort-based decision-making simultaneously are exploratory, as our study design was not ideally set out to examine this. Further work is needed to disentangle separable effects of neuropsychiatric and circadian measures on effort-based decision-making.”

      Lines 543 – 550:

      “We demonstrate that neuropsychiatric effects on effort-based decision-making are paralleled by effects of circadian rhythm and time-of-day. Exploratory analyses suggest these effects account for separable parts of the variance in effort-based decision-making. It is unlikely that the neuropsychiatric effects on effort-based decision-making reported here and in previous literature are a spurious result due to multicollinearity with chronotype. Yet, not accounting for chronotype and time of testing, which is the predominant practice in the field, could affect results.”

      (2) It seems that all key results relate to the choice bias in the model (as opposed to reward or effort sensitivity). It would therefore be helpful to understand what fundamental process the choice bias is really capturing in this task. This is not discussed, and the direction of effects is not discussed either, but potentially quite important. It seems that the choice bias captures how many effortful reward challenges are accepted overall which maybe captures general motivation or task engagement. Maybe it is then quite expected that this could be linked with questionnaires measuring general motivation/pleasure/task engagement. Formally, the choice bias is the constant term or intercept in the model for p(accept), but the authors never comment on what its sign means. If I'm not mistaken, people with higher anhedonia but also higher apathy are less likely to accept challenges and thus engage in the task (more negative choice bias). I could not find any discussion or even mention of what these results mean. This similarly pertains to the results on chronotype. In general, "choice bias" may not be the most intuitive term and the authors may want to consider renaming it. Also, given that the sign of the choice bias could be flipped with a simple sign flip in the model equation (i.e., equating to accepting more vs accepting fewer offers), it would be helpful to show some basic plots to illustrate the identified differences (e.g., plotting the % accepted for people in the upper and lower tertile for the SHAPS score etc).

      We apologise that this was not made clear previously: the meaning and directionality of “choice bias” is indeed central to our results. We also thank the Reviewer for pointing out that the previously used term “choice bias” itself might not be intuitive. We have now changed this to ‘motivational tendency’ (see below) as well as added substantial details on this parameter to the manuscript, including additional explanations and visualisations of the model as suggested by the Reviewer (new Figure 3) and model-agnostic results to aid interpretation (new Figure S3). Note the latter is complex due to our staircasing procedure (see new figure panel D further detailing our staircasing procedure in Figure 2). This shows that participants with more pronounced anhedonia are less likely to accept offers than those with low anhedonia (Fig. S3A), a model-agnostic version of our central result.

      Our changes are detailed below:

      After careful evaluation we have decided to term the parameter “motivational tendency”, hoping that this will present a more intuitive description of the parameter.

      To aid with the understanding and interpretation of the model parameters, and motivational tendency in particular, we have added the following explanation to the main text:

      Lines 149 – 155:

      “The models posit efforts and rewards are joined into a subjective value (SV), weighted by individual effort and reward sensitivity parameters. The subjective value is then integrated with an individual motivational tendency (α) parameter to guide decision-making. Specifically, the motivational tendency parameter determines the range at which subjective values are translated to acceptance probabilities: the same subjective value will translate to a higher acceptance probability the higher the motivational tendency.”
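
      The role of the motivational tendency parameter can be illustrated with a minimal sketch. The linear form of the subjective value, the logistic (two-option softmax) link, and all function and parameter names below are assumptions made for illustration; the quoted text only states that effort and reward are weighted into a subjective value, which is then combined with the motivational tendency to give an acceptance probability.

```python
import math


def subjective_value(reward, effort, reward_sens, effort_sens):
    # Assumed linear cost function: reward adds value, effort subtracts it.
    return reward_sens * reward - effort_sens * effort


def p_accept(sv, motivational_tendency):
    # Logistic link with the motivational tendency acting as an intercept.
    return 1.0 / (1.0 + math.exp(-(motivational_tendency + sv)))


# Same offer and same sensitivities, so the subjective value is identical.
sv = subjective_value(reward=3, effort=2, reward_sens=1.0, effort_sens=1.5)

low = p_accept(sv, motivational_tendency=-1.0)
high = p_accept(sv, motivational_tendency=1.0)
print(sv, low, high)
```

      Holding the subjective value fixed, the only thing that differs between the two calls is the intercept, so the higher motivational tendency yields the higher acceptance probability.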

      Further, we have included a new figure, visualizing the model. This demonstrates how the different model parameters contribute to the model (A), and how different values on each parameter affect the model (B-D).

      We agree that plotting model agnostic effects in our data may help the reader gain intuition of what our task results mean. We hope to address this with our added section on “Model agnostic task measures relating to questionnaires”. We first followed the reviewer’s suggestion of extracting subsamples with high and low anhedonia (as measured with the SHAPS, highest and lowest quartile) and plotted the acceptance proportion across effort and reward levels (panel A in figure below). However, due to our implemented task design, this only shows part of the picture: the staircasing procedure individualises which effort-reward combination a participant is presented with. Therefore, group differences in choice behaviour will lead to differences in the development of the staircases implemented in our task. Thus, we plotted the count of offered effort-reward combinations for the subsamples of participants with high vs. low SHAPS scores by the end of the task, averaged across staircases and participants.

      As the aspect of task development due to the implemented staircasing may not have been explained sufficiently in the main text, we have included panel (D) in figure 2.

      Further, we have added the following figure reference to the main text (lines 189 – 193):

      “The development of offered effort and reward levels across trials is shown in figure 2D; this shows that as participants generally tend to accept challenges rather than reject them, the implemented staircasing procedure develops toward higher effort and lower reward challenges.”
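
      To make the dependence of offers on choices concrete, here is a minimal sketch of a staircase. The update rule (alternating between adjusting effort and reward by one level), the four levels, the starting offer, and all names are hypothetical, since the exact staircase rules are not given in this response. The sketch only illustrates why a participant who mostly accepts is pushed toward high-effort, low-reward offers.

```python
def run_staircase(accepts, n_trials=20, levels=(1, 2, 3, 4), start=(2, 3)):
    """Return the (effort, reward) offer shown on each trial."""
    lo, hi = levels[0], levels[-1]
    effort, reward = start
    history = []
    for trial in range(n_trials):
        history.append((effort, reward))
        accepted = accepts(effort, reward)
        step = 1 if accepted else -1  # accept -> less attractive next offer
        if trial % 2 == 0:
            effort = min(hi, max(lo, effort + step))  # even trials adjust effort
        else:
            reward = min(hi, max(lo, reward - step))  # odd trials adjust reward
    return history


def always_accept(effort, reward):
    return True


def always_reject(effort, reward):
    return False


print(run_staircase(always_accept)[-1])
print(run_staircase(always_reject)[-1])
```

      Under this assumed rule, two participants with different acceptance tendencies end up being shown different sets of offers, which is why raw acceptance proportions need to be read alongside the distribution of offered effort-reward combinations.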

      To statistically test effects of model-agnostic task measures on the neuropsychiatric questionnaires, we performed Bayesian GLMs with the proportion of accepted trials predicted by SHAPS and AES. This is reported in the text as follows.

      Supplement, lines 172 – 189:

      “To explore the relationship between model agnostic task measures and questionnaire measures of neuropsychiatric symptoms, we conducted Bayesian GLMs, with the proportion of accepted trials predicted by SHAPS scores, controlling for age and gender. The proportion of accepted trials averaged across effort and reward levels was predicted by the Snaith-Hamilton Pleasure Scale (SHAPS) sum scores (M=-0.07; 95%HDI=[-0.12,-0.03]) and the Apathy Evaluation Scale (AES) sum scores (M=-0.05; 95%HDI=[-0.10,-0.002]). Note that this was not driven only by higher effort levels; even confining data to the lowest two effort levels, SHAPS has a predictive value for the proportion of accepted trials: M=-0.05; 95%HDI=[-0.07,-0.02].

      A visualisation of model agnostic task measures relating to symptoms is given in Fig. S4, comparing subgroups of participants scoring in the highest and lowest quartile on the SHAPS. This shows that participants with a high SHAPS score (i.e., more pronounced anhedonia) are less likely to accept offers than those with a low SHAPS score (Fig. S4A). Due to the implemented staircasing procedure, group differences can also be seen in the effort-reward combinations offered per trial. While for both groups, the staircasing procedure seems to devolve towards high effort – low reward offers, this is more pronounced in the subgroup of participants with a lower SHAPS score (Fig S4B).”
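
      The quartile-based comparison described above can be sketched as follows. The scores and acceptance proportions are synthetic, made-up values and the function name is hypothetical; the sketch only shows the mechanics of splitting participants at the lowest and highest quartile of a questionnaire score and comparing their mean acceptance proportions.

```python
from statistics import mean, quantiles


def acceptance_by_extreme_quartiles(scores, accept_props):
    """Mean acceptance proportion in the lowest and highest score quartile."""
    q1, _, q3 = quantiles(scores, n=4)  # quartile cut points
    low = [p for s, p in zip(scores, accept_props) if s <= q1]
    high = [p for s, p in zip(scores, accept_props) if s >= q3]
    return mean(low), mean(high)


# Synthetic, made-up values for eight hypothetical participants (not study data).
shaps = [0, 1, 2, 3, 4, 5, 6, 7]
prop_accepted = [0.95, 0.90, 0.92, 0.85, 0.80, 0.82, 0.70, 0.65]

low_group, high_group = acceptance_by_extreme_quartiles(shaps, prop_accepted)
print(low_group, high_group)
```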

      (3) None of the key effects relate to effort or reward sensitivity which is somewhat surprising given the previous literature and also means that it is hard to know if choice bias results would be equally found in tasks without any effort component. (The only analysis related to effort sensitivity is exploratory and in a subsample of N=56 per group looking at people meeting criteria for MDD vs matched controls.) Were stimuli constructed such that effort and reward sensitivity could be separated (i.e., are uncorrelated/orthogonal)? Maybe it would be worth looking at the % accepted in the largest or two largest effort value bins in an exploratory analysis. It seems the lowest and 2nd lowest effort level generally lead to accepting the challenge pretty much all the time, so including those effort levels might not be sensitive to individual difference analyses?

      We too were initially surprised by the lack of effect of neuropsychiatric symptoms on reward and effort sensitivity. To address the Reviewer’s first comment, the nature of the ‘choice bias’ parameter (now motivational tendency) is its critical importance in the context of effort-based decision-making: it is not modelled or measured explicitly in tasks without effort (such as typical reward tasks), so it would be impossible to test this in tasks without an effort component. 

      For the Reviewer’s second comment, the exploratory MDD analysis is not our only one related to effort sensitivity: the effort sensitivity parameter is included in all of our central analyses, and (like reward sensitivity), does not relate to our measured neuropsychiatric symptoms (e.g., see page 15). Note most previous effort tasks do not include a ‘choice bias’/motivational tendency parameter, potentially explaining this discrepancy. However, our model was quantitatively superior to models without this parameter, for example with only effort- and reward-sensitivity (page 11, Fig. 3).

      Our three model parameters (reward sensitivity, effort sensitivity, and choice bias/motivational tendency) were indeed uncorrelated/orthogonal to one another (see parameter orthogonality analyses below), making it unlikely that the variance and effect captured by our motivational tendency parameter (previously termed “choice bias”) should really be attributed to reward sensitivity. As per the Reviewer’s suggestion, we also examined whether the lowest two effort levels might not be sensitive to individual differences; in fact, we found that the proportion of accepted trials on the lowest effort levels alone was nevertheless predicted by anhedonia (see ceiling effect analyses below).

      Specifically, in terms of parameter orthogonality:

      When developing our task design and computational modelling approach we were careful to ensure that meaningful neurocomputational parameters could be estimated and that no spurious correlations between parameters would be introduced by modelling. By conducting parameter recoveries for all models, we showed that our modelling approach could reliably estimate parameters, and that estimated parameters are orthogonal to the other underlying parameters (as can be seen in Figure S1 in the supplement). It is thus unlikely that the variance and effect captured by our motivational tendency parameter (previously termed “choice bias”) should really be attributed to reward sensitivity.
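
      One simple way to check this kind of orthogonality is to correlate the estimates of one parameter with those of another across simulated agents. The sketch below uses synthetic, made-up estimates and a hand-written Pearson correlation; the names and values are hypothetical and this is not the authors' recovery procedure.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Synthetic, made-up recovered estimates for four simulated agents.
reward_sens = [1.0, 2.0, 3.0, 4.0]
tendency = [1.0, -1.0, -1.0, 1.0]

r = pearson(reward_sens, tendency)
print(r)
```

      A correlation near zero between recovered parameters is what one would hope to see; a strong correlation would suggest that the model trades one parameter off against another.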

      And finally, regarding the possibility of a ceiling effect for low effort levels:

      We agree that visual inspection of the proportion of accepted results across effort and reward values can lead to the belief that a ceiling effect prevents the two lowest effort levels from capturing any inter-individual differences. To test whether this is the case, we ran a Bayesian GLM with the SHAPS sum score predicting the proportion of accepted trials (controlling for age and gender), in a subset of the data including only trials with an effort level of 1 or 2. We found the SHAPS has a predictive value for the proportion of accepted trials in the lowest two effort levels (M=-0.05; 95%HDI=[-0.07,-0.02]). This is noted in the text as follows.

      Supplement, lines 175 – 180:

      “The proportion of accepted trials averaged across effort and reward levels was predicted by the Snaith-Hamilton Pleasure Scale (SHAPS) sum scores (M=-0.07; 95%HDI=[-0.12,-0.03]) and the Apathy Evaluation Scale (AES) sum scores (M=-0.05; 95%HDI=[-0.10,-0.002]). Note that this was not driven only by higher effort levels; even confining data to the lowest two effort levels, SHAPS has a predictive value for the proportion of accepted trials: M=-0.05; 95%HDI=[-0.07,-0.02].”

      (4) The abstract and discussion seem overstated (implications for the school system and statements on circadian rhythms which were not measured here). They should be toned down to reflect conclusions supported by the data.

      We thank the Reviewer for pointing this out, and have now removed these claims from the abstract and Discussion; we hope they now better reflect conclusions supported by these data directly.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Suggestions for improved or additional experiments, data or analyses.

      - For a non-computational audience, it would be useful to unpack the influence of the choice bias on behavior, as it is less clear how this would affect decision-making than sensitivity to effort or reward. Perhaps a figure showing accept/reject decisions when sensitivities are held constant and choice bias is high would be beneficial.

      We thank the Reviewer for suggesting additional explanations of the choice bias parameter to aid interpretation for non-computational readers; as per the Reviewer’s suggestion, we have now included additional explanations and visualisations (Figure 3) to make this as clear as possible. Please note also that, in response to one of the other Reviewers and after careful considerations, we have decided to rename the “choice bias” parameter to “motivational tendency”, hoping this will prove more intuitive.

      To aid with the understanding and interpretation of this and the other model parameters, we have added the following explanation to the main text.

      Lines 149 – 155:

      “The models posit efforts and rewards are joined into a subjective value (SV), weighted by individual effort and reward sensitivity parameters. The subjective value is then integrated with an individual motivational tendency (α) parameter to guide decision-making. Specifically, the motivational tendency parameter determines the range at which subjective values are translated to acceptance probabilities: the same subjective value will translate to a higher acceptance probability the higher the motivational tendency.”

      Additionally, we add the following explanation to the Methods section.

      Lines 698 – 709:

      First, a cost function transforms costs and rewards associated with an action into a subjective value (SV), with separate sensitivity parameters for reward and effort, and ℛ and 𝐸 for reward and effort. Higher effort and reward sensitivity mean the SV is more strongly influenced by changes in effort and reward, respectively (Fig. 3B-C). Hence, low effort and reward sensitivity mean the SV, and with that decision-making, is less guided by effort and reward offers, as would be in random decision-making.

      This SV is then transformed to a predicted acceptance probability by a softmax function, with 𝛼 for the intercept representing motivational tendency. A high motivational tendency means a subject has a tendency, or bias, to accept rather than reject offers (Fig. 3D).
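
      As a sketch of the two steps just described, the model can be written in the following general form. The symbols β_R and β_E for reward and effort sensitivity, the generic effort cost f(E), and the assumption that rejecting an offer has a value of zero (so that the two-option softmax reduces to a logistic function) are all assumptions made here for illustration; only ℛ, 𝐸, SV and 𝛼 are named in the text.

```latex
\begin{align}
\mathrm{SV} &= \beta_R \, \mathcal{R} - \beta_E \, f(E) \\
p(\text{accept}) &= \frac{1}{1 + e^{-(\alpha + \mathrm{SV})}}
\end{align}
```

      In this form, β_R and β_E scale how strongly the offer moves SV, while 𝛼 shifts the acceptance probability up or down for every offer alike.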

      Our new figure (panels A-D in figure 3) visualizes the model. This demonstrates how the different model parameters come into play in the model (A), and how different values on each parameter affect the model (B-D).

      - The early and late chronotype groups have significant differences in ages and gender. Additional supplementary analysis here may mitigate any concerns from readers.

      The Reviewer is right to notice that our subsamples of early and late chronotypes differ significantly in age and gender, but it is important to note that all our analyses comparing these two groups take this into account, statistically controlling for age and gender. We regret that this was previously only mentioned in the Methods section, so this information was not accessible where most relevant. To remedy this, we have amended the Results section as follows.

      Lines 317 – 323:

      “Bayesian GLMs, controlling for age and gender, predicting task parameters by time-of-day and chronotype showed effects of chronotype on reward sensitivity (i.e. those with a late chronotype had a higher reward sensitivity; M= 0.325, 95% HDI=[0.19,0.46]) and motivational tendency (higher in early chronotypes; M=-0.248, 95% HDI=[-0.37,-0.11]), as well as an interaction between chronotype and time-of-day on motivational tendency (M=0.309, 95% HDI=[0.15,0.48]).”

      (2) Recommendations for improving the writing and presentation.

      - I found the term 'overlapping' a little jarring. I think the authors use it to mean both neuropsychiatric symptoms and chronotypes affect task parameters, but they are not tested to be 'separable', nor is an interaction tested. Perhaps being upfront about how interactions are not being tested here (in the introduction, and not waiting until the discussion) would give an opportunity to operationalize this term.

      We agree with the Reviewer that our previously-used term “overlapping” was not ideal: it may have been misleading, and was not necessarily reflective of the nature of our findings. We now state explicitly that we are not testing an interaction between neuropsychiatric symptoms and chronotypes in our primary analyses. Additionally, following suggestions made by Reviewer 3, we ran new exploratory analyses to investigate how the effects of neuropsychiatric symptoms and circadian measures on motivational tendency relate to one another. These results in fact show that all three symptom measures have separable effects from circadian measures on motivational tendency. This supports the Reviewer’s view that ‘overlapping’ was entirely the wrong word—although it nevertheless shows the important contribution of circadian rhythm as well as neuropsychiatric symptoms in effort-based decision-making. We have changed the manuscript throughout to better describe this important, more accurate interpretation of our findings, including replacing the term “overlapping”. We changed the title from “Overlapping effects of neuropsychiatric symptoms and circadian rhythm on effort-based decision-making” to “Both neuropsychiatric symptoms and circadian rhythm alter effort-based decision-making”.

      To clarify the intention of our primary analyses, we have added the following to the last paragraph of the introduction.

      Lines 107 – 112:

      “Next, we pre-registered a follow-up experiment to directly investigate how circadian preference interacts with time-of-day on motivational decision-making, using the same task and computational modelling approach. While this allows us to test how circadian effects on motivational decision-making compare to neuropsychiatric effects, we do not test for possible interactions between neuropsychiatric symptoms and chronobiology.”

      We detail our new analyses in the Methods section as follows.

      Lines 800 – 814:

      “4.5.2 Differentiating between the effects of neuropsychiatric symptoms and circadian measures on motivational tendency

      To investigate how the effects of neuropsychiatric symptoms on motivational tendency (2.3.1) relate to effects of chronotype and time-of-day on motivational tendency we conducted exploratory analyses. In the subsamples of participants with an early or late chronotype (including additionally collected data), we first ran Bayesian GLMs with neuropsychiatric questionnaire scores (SHAPS, DARS, AES respectively) predicting motivational tendency, controlling for age and gender. We next added an interaction term of chronotype and time-of-day into the GLMs, testing how this changes previously observed neuropsychiatric and circadian effects on motivational tendency. Finally, we conducted a model comparison using LOO, comparing between motivational tendency predicted by a neuropsychiatric questionnaire, motivational tendency predicted by chronotype and time-of-day, and motivational tendency predicted by a neuropsychiatric questionnaire and time-of-day (for each neuropsychiatric questionnaire, and controlling for age and gender).”

      Results of the outlined analyses are reported in the Results section as follows.

      Lines 356 – 383:

      “2.5.2.1 Neuropsychiatric symptoms and circadian measures have separable effects on motivational tendency

      Exploratory analyses testing for the effects of neuropsychiatric questionnaires on motivational tendency in the subsamples of early and late chronotypes confirmed the predictive value of the SHAPS (M=-0.24, 95% HDI=[-0.42,-0.06]), the DARS (M=-0.16, 95% HDI=[-0.31,-0.01]), and the AES (M=-0.18, 95% HDI=[-0.32,-0.02]) on motivational tendency.

      For the SHAPS, we find that when adding the measures of chronotype and time-of-day back into the GLMs, the main effect of the SHAPS (M=-0.26, 95% HDI=[-0.43,-0.07]), the main effect of chronotype (M=-0.11, 95% HDI=[-0.22,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remain. Model comparison by LOOIC reveals motivational tendency is best predicted by the model including the SHAPS, chronotype and time-of-day as predictors, followed by the model including only the SHAPS. Note that this approach to model comparison penalizes models for increasing complexity.

      Repeating these steps with the DARS, the main effect of the DARS is found numerically, but the 95% HDI just includes 0 (M=-0.15, 95% HDI=[-0.30,0.002]). The main effect of chronotype (M=-0.11, 95% HDI=[-0.21,-0.01]), and the interaction effect of chronotype and time-of-day (M=0.18, 95% HDI=[0.05,0.33]) on motivational tendency remain. Model comparison identifies the model including the DARS and circadian measures as the best model, followed by the model including only the DARS.

      For the AES, the main effect of the AES is found (M=-0.19, 95% HDI=[-0.35,-0.04]). For the main effect of chronotype, the 95% HDI narrowly includes 0 (M=-0.10, 95% HDI=[-0.21,0.002]), while the interaction effect of chronotype and time-of-day (M=0.20, 95% HDI=[0.07,0.34]) on motivational tendency remains. Model comparison identifies the model including the AES and circadian measures as the best model, followed by the model including only the AES.”

      In addition to the title change, we edited our Discussion to discuss and reflect these new insights, including the following.

      Lines 399 – 402:

      “Various neuropsychiatric disorders are marked by disruptions in circadian rhythm, such as a late chronotype. However, research has rarely investigated how transdiagnostic mechanisms underlying neuropsychiatric conditions may relate to inter-individual differences in circadian rhythm.”

      Lines 475 – 480:

      “It is striking that the effects of neuropsychiatric symptoms on effort-based decision-making largely are paralleled by circadian effects on the same neurocomputational parameter. Exploratory analyses predicting motivational tendency by neuropsychiatric symptoms and circadian measures simultaneously indicate the effects go beyond recapitulating each other, but rather explain separable parts of the variance in motivational tendency.”

      Lines 528 – 532:

      “Our reported analyses investigating neuropsychiatric and circadian effects on effort-based decision-making simultaneously are exploratory, as our study design was not ideally set out to examine this. Further work is needed to disentangle separable effects of neuropsychiatric and circadian measures on effort-based decision-making.”

      Lines 543 – 550:

      “We demonstrate that neuropsychiatric effects on effort-based decision-making are paralleled by effects of circadian rhythm and time-of-day. Exploratory analyses suggest these effects account for separable parts of the variance in effort-based decision-making. It is unlikely that the neuropsychiatric effects on effort-based decision-making reported here and in previous literature are a spurious result due to multicollinearity with chronotype. Yet, not accounting for chronotype and time of testing, which is the predominant practice in the field, could affect results.”

      - A minor point, but it could be made clearer that many neurotransmitters have circadian rhythms (and not just dopamine).

      We agree this should have been made clearer, and have added the following to the Introduction.

      Lines 83 – 84:

      “Bi-directional links between chronobiology and several neurotransmitter systems have been reported, including dopamine47.

      (47) Kiehn, J.-T., Faltraco, F., Palm, D., Thome, J. & Oster, H. Circadian Clocks in the Regulation of Neurotransmitter Systems. Pharmacopsychiatry 56, 108–117 (2023).”

      - Making reference to other studies which have explored circadian rhythms in cognitive tasks would allow interested readers to explore the broader field. One such paper is: Bedder, R. L., Vaghi, M. M., Dolan, R. J., & Rutledge, R. B. (2023). Risk taking for potential losses but not gains increases with time of day. Scientific reports, 13(1), 5534, which also includes references to other similar studies in the discussion.

      We thank the Reviewer for pointing out that we failed to cite this relevant work. We have now included it in the Introduction as follows.

      Lines 97 – 98:

      “A circadian effect on decision-making under risk is reported, with the sensitivity to losses decreasing with time-of-day66.

      (66) Bedder, R. L., Vaghi, M. M., Dolan, R. J. & Rutledge, R. B. Risk taking for potential losses but not gains increases with time of day. Sci Rep 13, 5534 (2023).”

      (3) Minor corrections to the text and figures.

      None, clearly written and structured. Figures are high quality and significantly aid understanding.

      Reviewer #2 (Recommendations For The Authors):

      I did have a few more minor comments:

      - The manuscript doesn't clarify whether trials had time limits - so that participants might fail to earn points - or instead they did not and participants had to continue exerting effort until they were done. This is important to know since it impacts on decision-strategies and behavioral outcomes that might be analyzed. For example, if there is no time limit, it might be useful to examine the amount of time it took participants to complete their effort - and whether that had any relationship to choice patterns or symptomatology. Or, if they did, it might be interesting to test whether the relationship between choices and exerted effort depended on symptoms. For example, someone with depression might be less willing to choose effort, but just as, if not more likely to successfully complete a trial once it is selected.

      We thank the Reviewer for pointing out this important detail in the task design, which we should have made clearer. The trials did indeed have a time limit which was dependent on the effort level. To clarify this in the manuscript, we have made changes to Figure 2 and the Methods section. We agree it would be interesting to explore whether the exerted effort in the task related to symptoms. We explored this in our data by predicting the participant average proportion of accepted but failed trials by SHAPS score (controlling for age and gender). We found no relationship: M=0.01, 95% HDI=[-0.001,0.02]. However, it should be noted that the measure of proportion of failed trials may not be suitable here, as there are only few accepted but failed trials (M = 1.3% trials failed, SD = 3.50). This results from several task design characteristics aimed at preventing subjects from failing accepted trials, to avoid confounding of effort discounting with risk discounting. As an alternative measure, we explored the extent to which participants went “above and beyond” the target in accepted trials. Specifically, considering only accepted and succeeded trials, we computed the factor by which the required number of clicks was exceeded (i.e., if a subject clicked 15 times when 10 clicks were required the factor would be 1.5), averaging across effort and reward level. We then conducted a Bayesian GLM to test whether this subject-wise click-exceedance measure can be predicted by apathy or anhedonia, controlling for age and gender. We found neither the SHAPS (M=-0.14, 95% HDI=[-0.43,0.17]) nor the AES (M=0.07, 95% HDI=[-0.26,0.41]) had a predictive value for the amount to which subjects exert “extra effort”. We have now added this to the manuscript.

      In Figure 2, which explains the task design in the results section, we have added the following to the figure description.

      Lines 161 – 165:

      “Each trial consists of an offer with a reward (2,3,4, or 5 points) and an effort level (1,2,3, or 4, scaled to the required clicking speed and time the clicking must be sustained for) that subjects accept or reject. If accepted, a challenge at the respective effort level must be fulfilled for the required time to win the points.”

      In the Methods section, we have added the following.

      Lines 617 – 622:

      “We used four effort-levels, corresponding to a clicking speed at 30% of a participant’s maximal capacity for 8 seconds (level 1), 50% for 11 seconds (level 2), 70% for 14 seconds (level 3), and 90% for 17 seconds (level 4). Therefore, in each trial, participants had to fulfil a certain number of mouse clicks (dependent on their capacity and the effort level) in a specific time (dependent on the effort level).”
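
      As an illustration only, the mapping from effort level to a click target described above could be sketched as follows. The percentages and durations come from the quoted text; treating maximal capacity as a rate in clicks per second and rounding the target up to a whole click are assumptions, and `required_clicks` is a hypothetical helper name.

```python
import math

# Effort levels from the text: (percent of maximal capacity, duration in seconds).
EFFORT_LEVELS = {
    1: (30, 8),
    2: (50, 11),
    3: (70, 14),
    4: (90, 17),
}


def required_clicks(max_clicks_per_second: float, level: int) -> int:
    """Number of clicks needed to succeed at a given effort level.

    Assumption: capacity is a rate (clicks per second) and the target is
    rounded up to a whole click; the manuscript does not specify either.
    """
    percent, duration_s = EFFORT_LEVELS[level]
    target = percent * max_clicks_per_second * duration_s / 100
    # Round away tiny floating-point noise before taking the ceiling.
    return math.ceil(round(target, 6))
```

      Under these assumptions, a participant with a maximal rate of 5 clicks per second would need 12 clicks within 8 seconds at level 1.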

      In the Supplement, we have added the additional analyses suggested by the Reviewer.

      Lines 195 – 213:

      “3.2 Proportion of accepted but failed trials

      For each participant, we computed the proportion of trials in which an offer was accepted but the required effort was then not fulfilled (i.e., failed trials). There was no relationship between average proportion of accepted but failed trials and SHAPS score (controlling for age and gender): M=0.01, 95% HDI=[-0.001,0.02]. However, there are intentionally few accepted but failed trials (M = 1.3% trials failed, SD = 3.50). This results from several task design characteristics aimed at preventing subjects from failing accepted trials, to avoid confounding of effort discounting with risk discounting.”

      “3.3 Exertion of “extra effort”

      We also explored the extent to which participants went “above and beyond” the target in accepted trials. Specifically, considering only accepted and succeeded trials, we computed the factor by which the required number of clicks was exceeded (i.e., if a subject clicked 15 times when 10 clicks were required the factor would be 1.5), averaging across effort and reward level. We then conducted a Bayesian GLM to test whether this subject-wise click-exceedance measure can be predicted by apathy or anhedonia, controlling for age and gender. We found neither the SHAPS (M=-0.14, 95% HDI=[-0.43,0.17]) nor the AES (M=0.07, 95% HDI=[-0.26,0.41]) had a predictive value for the amount to which subjects exert “extra effort”.”
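
      A minimal sketch of how such a click-exceedance measure could be computed is given below. It is one possible reading of "averaging across effort and reward level" (mean within each effort-by-reward cell, then mean across cells); the trial dictionary keys and function names are hypothetical and not taken from the manuscript.

```python
from collections import defaultdict
from statistics import mean


def exceedance_factor(clicks_made: int, clicks_required: int) -> float:
    """Ratio of clicks made to clicks required (e.g. 12 made, 10 required -> 1.2)."""
    return clicks_made / clicks_required


def subject_exceedance(trials):
    """Average exceedance for one subject over accepted and succeeded trials.

    `trials` is an iterable of dicts with the hypothetical keys
    effort, reward, accepted, succeeded, clicks_made, clicks_required.
    Returns None if the subject has no accepted and succeeded trials.
    """
    cells = defaultdict(list)
    for t in trials:
        if t["accepted"] and t["succeeded"]:
            factor = exceedance_factor(t["clicks_made"], t["clicks_required"])
            cells[(t["effort"], t["reward"])].append(factor)
    if not cells:
        return None
    # Mean within each effort-by-reward cell, then mean across cells.
    return mean(mean(factors) for factors in cells.values())
```

      The resulting subject-wise value is the kind of quantity that would then enter a regression as the outcome variable.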

      - Perhaps relatedly, there is evidence that people with depression show less of an optimism bias in their predictions about future outcomes. As such, they show more "rational" choices in probabilistic decision tasks. I'm curious whether the Authors think that a weaker choice bias among those with stronger depression/anhedonia/apathy might be related. Also, are choices better matched with actual effort production among those with depression?

      We think this is a very interesting comment, but unfortunately feel our manuscript cannot properly speak to it: as in our response to the previous comment, our exploratory analysis linking the proportion of accepted but failed trials to anhedonia symptoms (i.e. less anhedonic people making more optimistic judgments of their likelihood of success) did not show a relationship between the two. However, this null finding may be the result of our task design which is not laid out to capture such an effect (in fact to minimize trials of this nature). We have added to the Discussion section.

      Lines 442 – 445:

      “It is possible that a higher motivational tendency reflects a more optimistic assessment of future task success, in line with work on the optimism bias95; however our task intentionally minimized unsuccessful trials by titrating effort and reward; future studies should explore this more directly.

      (95) Korn, C. W., Sharot, T., Walter, H., Heekeren, H. R. & Dolan, R. J. Depression is related to an absence of optimistically biased belief updating about future life events. Psychological Medicine 44, 579–592 (2014).”

      - The manuscript does not clarify: How did the Authors ensure that each subject received each effort-reward combination at least once if a given subject always accepted or always rejected offers?

      We have made the following edit to the Methods section to better explain this aspect of our task design.

      Lines 642 – 655:

      “For each subject, trial-by-trial presentation of effort-reward combinations was made semi-adaptively by 16 randomly interleaved staircases. Each of the 16 possible offers (4 effort-levels x 4 reward-levels) served as the starting point of one of the 16 staircases. Within each staircase, after a subject accepted a challenge, the next trial’s offer on that staircase was adjusted (by increasing effort or decreasing reward). After a subject rejected a challenge, the next offer on that staircase was adjusted by decreasing effort or increasing reward. This ensured subjects received each effort-reward combination at least once (as each participant completed all 16 staircases), while individualizing trial presentation to maximize the trials’ informative value. Therefore, in practice, even in the case of a subject rejecting all offers (and hence the staircasing procedures always adapting by decreasing effort or increasing reward), the full range of effort-reward combinations will be represented in the task across the starting points of all staircases (and therefore before adaptation takes place).”
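
      The staircase logic described above can be sketched roughly as follows. This is not the task code: the number of trials per staircase, the random choice between adjusting effort or reward, the clamping at the ends of each scale, and all function names are assumptions made for illustration.

```python
import random

EFFORT_LEVELS = (1, 2, 3, 4)
REWARD_LEVELS = (2, 3, 4, 5)


def clamp(value, levels):
    """Keep a level inside the valid range of the scale."""
    return min(max(value, levels[0]), levels[-1])


def next_offer(effort, reward, accepted, rng):
    """Adapt one staircase: harder offer after accept, easier after reject.

    Assumption: effort or reward is adjusted with equal probability.
    """
    step = 1 if accepted else -1
    if rng.random() < 0.5:
        effort = clamp(effort + step, EFFORT_LEVELS)  # accept -> more effort
    else:
        reward = clamp(reward - step, REWARD_LEVELS)  # accept -> less reward
    return effort, reward


def run_staircases(respond, trials_per_staircase=4, seed=0):
    """Interleave 16 staircases, one starting at each effort-reward offer.

    `respond(effort, reward)` returns True (accept) or False (reject).
    """
    rng = random.Random(seed)
    offers = [(e, r) for e in EFFORT_LEVELS for r in REWARD_LEVELS]
    remaining = [trials_per_staircase] * len(offers)
    presented = []
    while any(remaining):
        idx = rng.choice([i for i, n in enumerate(remaining) if n > 0])
        effort, reward = offers[idx]
        accepted = respond(effort, reward)
        presented.append((effort, reward, accepted))
        offers[idx] = next_offer(effort, reward, accepted, rng)
        remaining[idx] -= 1
    return presented
```

      Because the first trial on every staircase is its unadapted starting point, a simulated subject who rejects every offer still sees all 16 combinations, which is the property the response relies on.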

      - The word "metabolic" is misspelled in Table 1

      - Figure 2 is missing panel label "C"

      - The word "effort" is repeated on line 448.

      We thank the Reviewer for their attentive reading of our manuscript and have corrected the mistakes mentioned.

      Reviewer #3 (Recommendations For The Authors):

      It is a bit difficult to get a sense of people's discounting from the plots provided. Could the authors show a few example individuals and their fits (i.e., how steep was effort discounting on average and how much variance was there across individuals; maybe they could show the mean discount function or some examples etc)

      We appreciate very much the Reviewer's suggestion to visualise our parameter estimates within and across individuals. We have implemented this in Figure S2.

      It would be helpful if correlations between the various markers used as dependent variables (SHAPS, DARS, AES, chronotype etc) could be plotted as part of each related figure (e.g., next to the relevant effects shown).

      We agree with the Reviewer that a visual representation of the various correlations between dependent variables would be a better and more accessible communication than our current paragraph listing the correlations. We have implemented this by adding a new figure plotting all correlations in a heat map, with asterisks indicating significance.

      The authors use the term "meaningful relationship" - how is this defined? If undefined, maybe consider changing (do they mean significant?)

      We understand how our use of the term “(no) meaningful relationship” was confusing here. As we conducted most analyses in a Bayesian fashion, this term has a formal definition: the 95% highest density interval does not span across 0. However, we do not want this to be misunderstood as frequentist “significance” and agree clarity can be improved here. To avoid confusion, we have amended the manuscript where relevant (i.e., we now state “we found a (/no) relationship / effect” rather than “we found a meaningful relationship”).
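
      To make the decision rule concrete, the sketch below computes a 95% highest density interval (HDI) from posterior samples as the narrowest interval containing 95% of them, and checks whether it excludes 0. It is an illustrative implementation under the assumption of a unimodal posterior, not the software used in the study; the function names are hypothetical.

```python
import math


def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (unimodal case)."""
    ordered = sorted(samples)
    n = len(ordered)
    window = max(1, math.ceil(mass * n))  # samples the interval must cover
    best_lo, best_hi = ordered[0], ordered[window - 1]
    for start in range(1, n - window + 1):
        lo, hi = ordered[start], ordered[start + window - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


def excludes_zero(interval):
    """True if the whole interval lies on one side of 0."""
    lo, hi = interval
    return lo > 0 or hi < 0
```

      Under this rule an interval such as [-0.30, 0.002] does not exclude 0, even though nearly all of its mass is negative.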

      The authors do not include an inverse temperature parameter in their discounting models-can they motivate why? If a participant chose nearly randomly, which set of parameter values would they get assigned?

      Our decision to not include an inverse temperature parameter was made after an extensive simulation-based investigation of different models and task designs. A series of parameter recovery studies including models with an inverse temperature parameter revealed the inverse temperature parameter could not be distinguished from the reward sensitivity parameter. Specifically, inverse temperature seemed to capture the variance of the true underlying reward sensitivity parameter, leading to confounding between the two. Hence, including both reward sensitivity and inverse temperature would not have allowed us to reliably estimate either parameter. As our pre-registered hypotheses related to the reward sensitivity parameter, we opted to include models with the reward sensitivity parameter rather than the inverse temperature parameter in our model space. We have now added these simulations to our supplement.

      Nevertheless, we believe our models can capture random decision-making. The parameters of effort and reward sensitivity capture how sensitive one is to changes in effort/reward level. Hence, random decision-making can be interpreted as low effort and reward sensitivity, such that one’s decision-making is not guided by changes in effort and reward magnitude. With low effort/reward sensitivity, the motivational tendency parameter (previously “choice bias”) would capture to what extent this random decision-making is biased toward accepting or rejecting offers.

      The simulation results are now detailed in the Supplement.

      Lines 25 – 46:

      “1.2.1 Parameter recoveries including inverse temperature

      In the process of task and model space development, we also considered models incorporating an inverse temperature parameter. To this end, we conducted parameter recoveries for four models, defined in Table S3.

      Parameter recoveries indicated that parameters can be recovered reliably in model 1, which includes only effort sensitivity and inverse temperature as free parameters (on-diagonal correlations: .98 > r > .89, off-diagonal correlations: .04 > |r| > .004). However, as a reward sensitivity parameter is added to the model (model 2), parameter recovery seems to be compromised, as parameters are estimated less accurately (on-diagonal correlations: .80 > r > .68), and spurious correlations between parameters emerge (off-diagonal correlations: .40 > |r| > .17). This issue remains when motivational tendency is added to the model (model 4; on-diagonal correlations: .90 > r > .65; off-diagonal correlations: .28 > |r| > .03), but not when inverse temperature is modelled with effort sensitivity and motivational tendency, but not reward sensitivity (model 3; on-diagonal correlations: .96 > r > .73; off-diagonal correlations: .05 > |r| > .003).

      As our pre-registered hypotheses related to the reward sensitivity parameter, we opted to include models with the reward sensitivity parameter rather than the inverse temperature parameter in our model space.”
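
      For readers less familiar with parameter recovery, the on- and off-diagonal correlations referred to above can be obtained by correlating each simulated ("true") parameter with each recovered parameter. The sketch below shows this bookkeeping only; the simulation and model-fitting steps are omitted, and the dictionary layout and function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def recovery_matrix(true_params, recovered_params):
    """Correlate every true parameter with every recovered parameter.

    Both arguments map a parameter name to a list of values, one per
    simulated subject. Keys of the result are (true_name, recovered_name):
    matching names are the on-diagonal entries, all others off-diagonal.
    """
    return {
        (t_name, r_name): pearson(t_vals, r_vals)
        for t_name, t_vals in true_params.items()
        for r_name, r_vals in recovered_params.items()
    }
```

      High on-diagonal and near-zero off-diagonal values indicate that parameters are recovered accurately and are not traded off against each other.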

      And we now discuss random decision-making specifically in the Methods section.

      Lines 698 – 709:

      “First, a cost function transforms costs and rewards associated with an action into a subjective value (SV):

      with and for reward and effort sensitivity, and  and  for reward and effort. Higher effort and reward sensitivity mean the SV is more strongly influenced by changes in effort and reward, respectively (Fig. 3B-C). Hence, low effort and reward sensitivity mean the SV, and with that decision-making, is less guided by effort and reward offers, as would be in random decision-making.

      This SV is then transformed to an acceptance probability by a softmax function:

      with for the predicted acceptance probability and  for the intercept representing motivational tendency. A high motivational tendency means a subject has a tendency, or bias, to accept rather than reject offers (Fig. 3D).”
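
      Since the equations did not survive in this copy of the response, the sketch below illustrates the general structure described in the text: a cost function mapping reward and effort to a subjective value, followed by a logistic (two-option softmax) transform with an intercept for motivational tendency. The specific parabolic form (reward scaled linearly, effort squared) and all parameter names are assumptions for illustration and may differ from the manuscript's exact parameterization.

```python
import math


def subjective_value(reward, effort, reward_sens, effort_sens):
    """Assumed parabolic cost function: linear reward minus squared effort cost."""
    return reward_sens * reward - effort_sens * effort ** 2


def p_accept(reward, effort, motivational_tendency, reward_sens, effort_sens):
    """Acceptance probability: logistic transform of intercept plus SV."""
    sv = subjective_value(reward, effort, reward_sens, effort_sens)
    return 1.0 / (1.0 + math.exp(-(motivational_tendency + sv)))
```

      With both sensitivities at zero, the acceptance probability no longer depends on the offer and is set by motivational tendency alone, which matches the interpretation of near-random choice given in the response.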

      The pre-registration mentions effects of BMI and risk of metabolic disease-those are briefly reported in the factor loadings, but not discussed afterwards-although the authors stated hypotheses regarding these measures in their preregistration. Were those hypotheses supported?

      We reported these results (albeit only briefly) in the factor loadings resulting from our PLS regression and results from follow-up GLMs (see below). We have now amended the Discussion to enable further elaboration on whether they confirmed our hypotheses (this evidence was unclear, but we have subsequently followed up in a sample with type-2 diabetes, who also show reduced motivational tendency).

      Lines 258 – 261:

      “For the MEQ (95%HDI=[-0.09,0.06]), MCTQ (95%HDI=[-0.17,0.05]), BMI (95%HDI=[-0.19,0.01]), and FINDRISC (95%HDI=[-0.09,0.03]) no relationship with motivational tendency was found, consistent with the smaller magnitude of reported component loadings from the PLS regression.”

      We have added the following paragraph to our discussion.

      Lines 491 – 502:

      “To our surprise, we did not find statistical evidence for a relationship between effort-based decision-making and measures of metabolic health (BMI and risk for type-2 diabetes). Our analyses linking BMI to motivational tendency reveal a numeric effect in line with our hypothesis: a higher BMI relating to a lower motivational tendency. However, the 95% HDI for this effect narrowly included zero (95%HDI=[-0.19,0.01]). Possibly, our sample did not have sufficient variance in metabolic health to detect dimensional metabolic effects in a current general population sample. A recent study by our group investigates the same neurocomputational parameters of effort-based decision-making in participants with type-2 diabetes and non-diabetic controls matched by age, gender, and physical activity105. We report a group effect on the motivational tendency parameter, with type-2 diabetic patients showing a lower tendency to exert effort for reward.”

      “(105) Mehrhof, S. Z., Fleming, H. A. & Nord, C. A cognitive signature of metabolic health in effort-based decision-making. Preprint at https://doi.org/10.31234/osf.io/4bkm9 (2024).”

      R-values are indicated as a range (e.g., from 0.07-0.72 for the last one in 2.1 which is a large range). As mentioned above, the full correlation matrix should be reported in figures as heatmaps.

      We agree with the Reviewer that a heatmap is a better way of conveying this information – see Figure 1 in response to their previous comment.  

      The answer on whether data was already collected is missing on the second preregistration link. Maybe this is worth commenting on somewhere in the manuscript.

      This question appears missing because, as detailed in the manuscript, we felt that technically some data *was* already collected by the time our second pre-registration was posted. This is because the second pre-registration detailed an additional data collection, with the goal of extending data from the original dataset to include extreme chronotypes and increase precision of analyses. To avoid any confusion regarding the lack of reply to this question in the pre-registration, we have added the following disclaimer to the description of the second pre-registration:

      “Please note the lack of response to the question regarding already collected data. This is because the data collection in the current pre-registration extends data from the original dataset to increase the precision of analyses. While this original data is already collected, none of the data collection described here has taken place.”

      Some referencing is not reflective of the current state of the field (e.g., for effort discounting: Sugiwaka et al., 2004 is cited). There are multiple labs that have published on this since then including Philippe Tobler's and Sven Bestmann's groups (e.g., Hartmann et al., 2013; Klein-Flügge et al., Plos CB, 2015).

      We agree absolutely, and have added additional, more recent references on effort discounting.

      Lines 67 – 68:

      “Higher costs devalue associated rewards, an effect referred to as effort-discounting33–37.”

      (33) Sugiwaka, H. & Okouchi, H. Reformative self-control and discounting of reward value by delay or effort1. Japanese Psychological Research 46, 1–9 (2004).

      (34) Hartmann, M. N., Hager, O. M., Tobler, P. N. & Kaiser, S. Parabolic discounting of monetary rewards by physical effort. Behavioural Processes 100, 192–196 (2013).

      (35) Klein-Flügge, M. C., Kennerley, S. W., Saraiva, A. C., Penny, W. D. & Bestmann, S. Behavioral Modeling of Human Choices Reveals Dissociable Effects of Physical Effort and Temporal Delay on Reward Devaluation. PLOS Computational Biology 11, e1004116 (2015).

      (36) Białaszek, W., Marcowski, P. & Ostaszewski, P. Physical and cognitive effort discounting across different reward magnitudes: Tests of discounting models. PLOS ONE 12, e0182353 (2017).

      (37) Ostaszewski, P., Bąbel, P. & Swebodziński, B. Physical and cognitive effort discounting of hypothetical monetary rewards. Japanese Psychological Research 55, 329–337 (2013).

      There are lots of typos throughout (e.g., Supplementary martial, Mornignness etc)

      We thank the Reviewer for their attentive reading of our manuscript and have corrected our mistakes.

      In Table 1, it is not clear what the numbers given in parentheses are. The figure note mentions SD, IQR, and those are explicitly specified for some rows, but not all.

      After reviewing Table 1 we understand the comment regarding the clarity of the number in parentheses. In our original manuscript, for some variables, numbers were given per category (e.g. for gender and ethnicity), rather than per row, in which case the parenthetical statistic was indicated in the header row only. However, we now see that the clarity of the table would have been improved by adding the reported statistic for each row—we have corrected this.

      In Figure 1C, it would be much more helpful if the different panels were combined into one single panel (using differently coloured dots/lines instead of bars).

      We agree visualizing the proportion of accepted trials across effort and reward levels in one single panel aids interpretability. We have implemented it in the following plot (now Figure 2C).

      In Sections 2.2.1 and 4.2.1, the authors mention "mixed-effects analysis of variance (ANOVA) of repeated measures" (same in the preregistration). It is not clear if this is a standard RM-ANOVA (aggregating data per participant per condition) or a mixed-effects model (analysing data on a trial-by-trial level). This model seems to only include within-subjects variables, so it isn't a "mixed ANOVA" mixing within and between subjects effects.

      We apologise that our use of the term "mixed-effects analysis of variance (ANOVA) of repeated measures" is indeed incorrectly applied here. We aggregate data per participant and effort-by-reward combination, meaning there are no between-subject effects tested. We have corrected this to “repeated measures ANOVA”.

      In Section 2.2.2, the authors write "R-hats>1.002" but probably mean "R-hats < 1.002". ESS is hard to evaluate unless the total number of samples is given.

      We thank the Reviewer for noticing this mistake and have corrected it in the manuscript.

      In Section 2.3, the inference criterion is unclear. The authors first report "factor loadings" and then perform a permutation test that is not further explained. Which of these factors are actually needed for predicting choice bias out of chance? The permutation test suggests that the null hypothesis is just "none of these measures contributes anything to predicting choice bias", which is already falsified if only one of them shows an association with choice bias. It would be relevant to know for which measures this is the case. Specifically, it would be relevant to know whether adding circadian measures into a model that already contains apathy/anhedonia improves predictive performance.

      We understand the Reviewer’s concerns regarding the detail of explanation we have provided for this part of our analysis, but we believe there may have been a misunderstanding regarding the partial least squares (PLS) regression. Rather than identifying a number of factors to predict the outcome variable, a PLS regression identifies a model with one or multiple components, with various factor loadings of differing magnitude. In our case, the PLS regression identified a model with one component to best predict our outcome variable (motivational tendency, which in our previous version we called choice bias). This one component had factor loadings of our questionnaire-based measures, with measures of apathy and anhedonia having highest weights, followed by lesser weighted factor loadings by measures of circadian rhythm and metabolic health. The permutation test tests whether this component (consisting of the combination of factor loadings) can predict the outcome variable out of sample.

      We hope we have improved clarity on this in the manuscript by making the following edits to the Results section.

      Lines 248 – 251:

      “Permutation testing indicated the predictive value of the resulting component (with factor loadings described above) was significant out-of-sample (root-mean-squared error [RMSE]=0.203, p=.001).”

      Further, we hope to provide a more in-depth explanation of these results in the Methods section.

      Lines 755 – 759:

      “Statistical significance of obtained effects (i.e., the predictive accuracy of the identified component and factor loadings) was assessed by permutation tests, probing the proportion of root-mean-squared errors (RMSEs) indicating stronger or equally strong predictive accuracy under the null hypothesis.”
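
      As a rough illustration of the permutation logic described above: the model is refit many times on data in which the outcome has been shuffled, yielding a null distribution of out-of-sample RMSEs, and the p-value is the proportion of those null RMSEs that are as low as or lower than the observed one. The sketch below covers only the final step; the refitting procedure is omitted and the function names are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def permutation_p_value(observed_rmse, null_rmses):
    """Proportion of null RMSEs with equal or better (lower) error."""
    as_good_or_better = sum(1 for r in null_rmses if r <= observed_rmse)
    return as_good_or_better / len(null_rmses)
```

      Some implementations add 1 to both numerator and denominator so that the p-value can never be exactly 0; whether that correction was applied here is not stated in the response.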

      In Section 2.5, the authors simply report "that chronotype showed effects of chronotype on reward sensitivity", but the direction of the effect (higher reward sensitivity in early vs. late chronotype) remains unclear.

      We thank the Reviewer for pointing this out. While we did report the direction of effect, this was only presented in the subsequent parentheticals and could have been made much clearer. To assist with this, we have made the following addition to the text.

      Lines 317 – 320:

      “Bayesian GLMs, controlling for age and gender, predicting task parameters by time-of-day and chronotype showed effects of chronotype on reward sensitivity (i.e. those with a late chronotype had a higher reward sensitivity; M= 0.325, 95% HDI=[0.19,0.46])”

      In Section 4.2, the authors write that they "implemented a previously-described procedure using Prolific pre-screeners", but no reference to this previous description is given.

      We thank the Reviewer for bringing our attention to this missing reference, which has now been added to the manuscript.

      In Supplementary Table S2, only the "on-diagonal correlations" are given, but off-diagonal correlations (indicative of trade-offs between parameters) would also be informative.

      We agree with the Reviewer that off-diagonal correlations between underlying and recovered parameters are crucial to assess confounding between parameters during model estimation. We reported this in Figure S1D, where we present the full correlation matrix between underlying and recovered parameters in a heatmap. We have now noticed that this plot was missing axis labels, which have been added now.

      I found it somewhat difficult to follow the results section without having read the methods section beforehand. At the beginning of the Results section, could the authors briefly sketch the outline of their study? Also, given they have a pre-registration, could the authors introduce each section with a statement of what they expected to find, and close with whether the data confirmed their expectations? In the current version of the manuscript, many results are presented without much context of what they mean.

      We agree a brief outline of the study procedure before reporting the results would be beneficial to following the subsequent text and have added the following to the end of our Introduction.

      Lines 101 – 106:

      “Here, we tested the relationship between motivational decision-making and three key neuropsychiatric syndromes: anhedonia, apathy, and depression, taking both a transdiagnostic and categorical (diagnostic) approach. To do this, we validate a newly developed effort-expenditure task, designed for online testing, and gamified to increase engagement. Participants completed the effort-expenditure task online, followed by a series of self-report questionnaires.”

      We have added references to our pre-registered hypotheses at multiple points in our manuscript.

      Lines 185 – 187:

      “In line with our pre-registered hypotheses, we found significant main effects for effort (F(1,14367)=4961.07, p<.0001) and reward (F(1,14367)=3037.91, p<.001), and a significant interaction between the two (F(1,14367)=1703.24, p<.001).”

      Lines 215 – 221:

      “Model comparison by out-of-sample predictive accuracy identified the model implementing three parameters (motivational tendency a, reward sensitivity , and effort sensitivity ), with a parabolic cost function (subsequently referred to as the full parabolic model) as the winning model (leave-one-out information criterion [LOOIC; lower is better] = 29734.8; expected log posterior density [ELPD; higher is better] = -14867.4; Fig. 3D). This was in line with our pre-registered hypotheses.”
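
      The two fit indices quoted above are related by a simple rescaling: LOOIC is conventionally defined as -2 times the leave-one-out ELPD, which is why -14867.4 corresponds to 29734.8. A small sketch of that conversion and of ranking candidate models is given below; the function names and the example model labels in the test are hypothetical.

```python
def looic_from_elpd(elpd_loo: float) -> float:
    """Convert a leave-one-out ELPD to the deviance-scale LOOIC."""
    return -2.0 * elpd_loo


def rank_by_looic(elpd_by_model):
    """Return model names ordered from best (lowest LOOIC) to worst."""
    return sorted(elpd_by_model, key=lambda name: looic_from_elpd(elpd_by_model[name]))
```

      Ranking by lowest LOOIC and ranking by highest ELPD therefore always give the same order.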

      Lines 252 – 258:

      “Bayesian GLMs confirmed evidence for psychiatric questionnaire measures predicting motivational tendency (SHAPS: M=-0.109; 95% highest density interval (HDI)=[-0.17,-0.04]; AES: M=-0.096; 95%HDI=[-0.15,-0.03]; DARS: M=-0.061; 95%HDI=[-0.13,-0.01]; Fig. 4A). Post-hoc GLMs on DARS sub-scales showed an effect for the sensory subscale (M=-0.050; 95%HDI=[-0.10,-0.01]). This result of neuropsychiatric symptoms predicting a lower motivational tendency is in line with our pre-registered hypothesis.”

      Lines 258 – 263:

      “For the MEQ (95%HDI=[-0.09,0.06]), MCTQ (95%HDI=[-0.17,0.05]), BMI (95%HDI=[-0.19,0.01]), and FINDRISC (95%HDI=[-0.09,0.03]) no meaningful relationship with motivational tendency was found, consistent with the smaller magnitude of reported component loadings from the PLS regression. This null finding for dimensional measures of circadian rhythm and metabolic health was not in line with our pre-registered hypotheses.”

      Lines 268 – 270:

      “For reward sensitivity, the intercept-only model outperformed models incorporating questionnaire predictors based on RMSE. This result was not in line with our pre-registered expectations.”

      Lines 295 – 298:

      “As in our transdiagnostic analyses of continuous neuropsychiatric measures (Results 2.3), we found evidence for a lower motivational tendency parameter in the MDD group compared to HCs (M=-0.111, 95% HDI=[ -0.20,-0.03]) (Fig. 4B). This result confirmed our pre-registered hypothesis.”

      Lines 344 – 355:

      “Late chronotypes showed a lower motivational tendency than early chronotypes (M=-0.11, 95% HDI=[-0.22,-0.02])—comparable to effects of transdiagnostic measures of apathy and anhedonia, as well as diagnostic criteria for depression. Crucially, we found motivational tendency was modulated by an interaction between chronotype and time-of-day (M=0.19, 95% HDI=[0.05,0.33]): post-hoc GLMs in each chronotype group showed this was driven by a time-of-day effect within late, rather than early, chronotype participants (M=0.12, 95% HDI=[0.02,0.22], such that late chronotype participants showed a lower motivational tendency in the morning testing sessions, and a higher motivational tendency in the evening testing sessions; early chronotype: 95% HDI=[-0.16,0.04]) (Fig. 5A). These results of a main effect and an interaction effect of chronotype on motivational tendency confirmed our pre-registered hypothesis.”

      Lines 390 – 393:

      “Participants with an early chronotype had a lower reward sensitivity parameter than those with a late chronotype (M=0.27, 95% HDI=[0.16,0.38]). We found no effect of time-of-day on reward sensitivity (95%HDI=[-0.09,0.11]) (Fig. 5B). These results were in line with our pre-registered hypotheses.”

  7. milenio-nudos.github.io milenio-nudos.github.io
    1. ‘identify app

      Write out the full wording of the item, or be more specific about the type of task, since for a reader without context it is not clear what "identify app" refers to (we do not even present them this way in the methods table).

    1. Reviewer #2 (Public review):

      This is an innovative and technically strong study that integrates dual-gas respirometry with LC-MS metabolomics to examine how sleep and circadian disruption shape metabolism in Drosophila. The combination of continuous O₂/CO₂ measurements with high-temporal-resolution metabolite profiling is novel and provides fresh insight into how wild-type flies maintain anticipatory fuel alignment, while mutants shift to reactive or misaligned metabolism. The use of lag-shift correlation analysis is particularly clever, as it highlights temporal coordination rather than static associations. Together, the findings advance our understanding of how circadian clocks and sleep contribute to metabolic efficiency and redox balance.

      However, there are several areas where the manuscript could be strengthened. The authors should acknowledge that their findings may be gene-specific. Because sleep deprivation was not performed, it remains uncertain whether the observed metabolic shifts generalize to sleep loss broadly or are restricted to the fmn and sss mutants. This concern also connects to the finding of metabolic misalignment under constant darkness despite an intact clock. The conclusion that external entrainment is essential for maintaining energy homeostasis in flies may not translate to mammals. It would help to reference supporting data for the finding and discuss differences across species. Ideally, complementary circadian (light-dark cycle disruption) or sleep deprivation (for several hours) experiments, or citation of comparable studies, would strengthen the generality of the findings. Figures 1-4 are straightforward and clear, but when the manuscript transitions to the metabolite-respiration correlations, there is little description of the metabolomics methods or datasets, which should be clarified. The Discussion is at times repetitive and could be tightened, with the main message (i.e., wild-type flies align metabolism in advance, while mutants do not) kept front and center. Terms such as "anticipatory" and "reactive" should be defined early and used consistently throughout.

      Overall, this is a strong and novel contribution. With clarification of scope, refinement of presentation, and a more focused Discussion, the paper will make a significant impact.

    1. The k-means algorithm has a complexity of O(n), meaning that the algorithm scales linearly with n. This algorithm will be the focus of this course.

      k-means is the main focus; it's O(n), so it's much more efficient. I have yet to learn how.

    2. clustering algorithms compute the similarity between all pairs of examples, which means their runtime increases as the square of the number of examples n, denoted as O(n^2) in complexity notation

      Algorithms that operate in O(n^2) are VERY INEFFICIENT, specifically ones that calculate the similarity between every pair of data objects, out of MILLIONS of data points
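
      To make the two complexity notes above concrete, here is a minimal sketch that counts distance evaluations for an all-pairs comparison versus a simple k-means loop. It assumes 1-D points, hand-picked starting centroids, and a fixed number of iterations; the function names are hypothetical and not taken from the course. Treating k-means as O(n) assumes the number of clusters and iterations stay fixed.

```python
def pairwise_distances(points):
    """Compare every pair of points once: n * (n - 1) / 2 distance evaluations."""
    evaluations = 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            _ = abs(points[i] - points[j])  # the similarity computation itself
            evaluations += 1
    return evaluations


def kmeans_1d(points, centroids, iterations):
    """Basic k-means on 1-D data; returns (final centroids, distance evaluations).

    Each iteration compares every point with every centroid, so the work is
    n * k per iteration: linear in n when k and the iteration count are fixed.
    """
    centroids = list(centroids)
    evaluations = 0
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [abs(p - c) for c in centroids]
            evaluations += len(centroids)
            clusters[distances.index(min(distances))].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, evaluations


if __name__ == "__main__":
    data = [float(i) for i in range(1000)]
    print("all-pairs evaluations:", pairwise_distances(data))
    print("k-means evaluations:", kmeans_1d(data, [0.0, 999.0], 3)[1])
```

      Doubling the number of points doubles the k-means count but roughly quadruples the all-pairs count, which is the practical difference between O(n) and O(n^2).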

    1. Eu estou perto da parede

      Não ,você não perto da parede Não,eu não gosta de ir ao clube Não,eu não portugês Não,ele não prepara a lição tudos os dias Não ele não pronuncia bem a palavra Não, ele não na igreja Não,vocês não alunos de inglês Não,nós não visitam os parentes Não, vocês não estão longe da famácia Não, eles não vão à loja Não,nós não somos bons alunos Não, eles não repetem o vocabulário com satisfação

    1. Open notebook science is a gathering movement across a number of fields to make the entire research process transparent by sharing materials online as they are generated.

      The idea of transparency aligns with my goal of including a digital dataset without over-claiming. Adding onto a previous annotation I made, I believe I should continue with the idea of creating a secondary journal to log my processes and potential failures.

    1. Merely making material available online - however well intentioned - is not sufficient to democratize learning or to give actual access.

      Adding onto this quote and relating it to my project: if I share my results on a small webpage, accessibility (such as headings) matters, so that those who do not specialize in my specific topic can understand my evidence about incubation, substances and cures.

    1. Excel is a black box. When we use it, we have to take on faith that its statistics do what they say they are doing

      If I do my dataset only in a spreadsheet, it doesn't show or record the steps I made to create my "incubation + Inscription + cure" accounts. I think it's necessary to keep a secondary journal to log the processes I took.

    1. It is extremely important to have a vision or purpose before starting research, since it provides clarity, makes interpretation easier, and helps avoid errors.

    2. To carry out research there must be a topic or, more than that, a purpose, whether a need or a practical problem, since this will serve as a guide throughout the process.

    1. Digital archaeology should exist to assist us in the performance of archaeology as a whole. It should not be a secret knowledge, nor a distinct school of thought, but rather simply seen as archaeology done well, using all of the tools available to aid in better recovering, understanding and presenting the past.

      This expresses the idea that digital tools should be acting as extensions of thought rather than replacements. It connects to my final project because GIS and digital mapping help interpret archaeological data more effectively without detaching from human analysis.

  8. social-media-ethics-automation.github.io social-media-ethics-automation.github.io
    1. 21] Star Wars Kid. December 2008. URL: https://knowyourmeme.com/memes/star-wars-kid (visited on 2023-12-08).

      Ghyslain Raza became known as the "Star Wars Kid" after he recorded a video of himself pretending to fight as in the Star Wars movies. A classmate posted the video online and it went viral. He was bullied severely and had to finish school in a psychiatric ward. Going viral can have many negative side effects, since it subjects you to unwanted and negative attention.

    1. In the absence of a sample, the light intensity at the detector is at a maximum when the second (movable) polarizer is set parallel to the first polarizer (α = 0º). If the analyzer is turned 90º to the plane of initial polarization, all the light will be blocked from reaching the detector.

      Start with Special Light: The machine uses a special light (monochromatic) that is passed through a fixed filter (the polarizer). This filter acts like a vertical slot, forcing all the light to vibrate in only one direction (e.g., up-and-down).

      The "Empty" Test: Before you add a sample, you have a second, movable filter (the analyzer) at the other end.

      Max Light (0°): If you line up this second filter perfectly with the first one (both vertical), all the light passes through.

      No Light (90°): If you turn the second filter sideways (to 90°, making a "+"), it completely blocks all the "up-and-down" light from the first filter. No light gets to the detector.

      This "no light" position is the starting point. When you add a sample (like sugar water), if it's "optically active," it will twist the light. The "up-and-down" light might become "diagonal." This twisted light can now sneak past the 90° filter, and the detector will once again see light. You then have to turn the analyzer to find the new "no light" angle, and that angle tells you exactly how much the sample twisted the light.
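
      The behaviour described above (maximum light at 0°, none at 90°, and a shifted "no light" angle once a sample rotates the light) can be sketched with Malus's law, I = I0·cos²(θ), where θ is the angle between the light's polarization and the analyzer axis. The passage does not name this law; it is the standard result for an ideal polarizer pair. The function name, the ideal lossless filters, and the sign convention for the sample's rotation are assumptions made for illustration.

```python
import math


def transmitted_intensity(i0, analyzer_deg, sample_rotation_deg=0.0):
    """Intensity reaching the detector for an ideal polarizer/analyzer pair.

    i0                  -- intensity after the first (fixed) polarizer
    analyzer_deg        -- analyzer angle relative to the first polarizer
    sample_rotation_deg -- how far the sample twists the plane of polarization
                           (0 when no sample is present); sign is an assumed convention
    """
    theta = math.radians(analyzer_deg - sample_rotation_deg)
    return i0 * math.cos(theta) ** 2


if __name__ == "__main__":
    # Empty instrument: sweep the analyzer from parallel to crossed.
    for angle in (0, 30, 60, 90):
        print(angle, round(transmitted_intensity(1.0, angle), 3))
    # With a sample that rotates the light by 30 degrees, the crossed analyzer
    # at 90 degrees lets some light through; darkness returns at 120 degrees.
    print(round(transmitted_intensity(1.0, 90, 30), 3))
    print(round(transmitted_intensity(1.0, 120, 30), 3))
```

      In this idealized picture, the extra angle the analyzer must be turned to restore darkness (30° in the example) equals the rotation caused by the sample.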

    1. Data that are sold are often purchased by data brokers who consolidate multiple streams of data, repackage them into new products, and offer data services, such as microtargeted advertising, demographic profiling of individuals and places, assessing creditworthiness and risk, and business and bespoke data analytics

      exocap?

    1. n sum, from infancy, as humans processacoustic, lexical, and syntactic information anindexed to these. From the pair of linguistic sihumans identify diverse in-group and o

      Language is used in many ways, and even subconsciously can affect people's biases

    1. ♖/indy/0/pad/index.html

      Gyuri Lajos, [03/11/2025 09:53]

      Hyper Plex Mark In Editor - indy Pad Plex - transitional logs

      reimagining html as hpmi

      Universal Hyper Plex Marked In named networks of intentionally deeply interconnected documents people and capabilities

      It's a new month, a new week

      Over the weekend I started to work on turning indy 0 Pad to be the next level indy 0 Pad Plex HyperPlex Mark In document editor

      Doing it by making Peergos Custom App development self-versioning and self-documenting. So I am in a double transition, trying to get two mutually interdependent things right.

      Hence I resort to documenting the work, again in Telegram. This points to the urgent need to create indy 0 gram, exapting indy 0 Plex, which in turn is an exaptation through mixins of Indy 0 Pad, which is already an exaptation of CK Editor Peergos Custom App

    1. Reviewer #1 (Public review):

      Summary:

      This study set out to investigate potential pharmacological drug-drug interactions between the two most common antimalarial classes, the artemisinins and quinolines. There is a strong rationale for this aim, because drugs from these classes are already widely used in Artemisinin Combination Therapies (ACTs) in the clinic, and drug combinations are an important consideration in the development of new medicines. Furthermore, whilst there is ample literature proposing many diverse mechanisms of action and resistance for the artemisinins and quinolines, it is generally accepted that the mechanisms for both classes involve heme metabolism in the parasite, and that artemisinin activity is dependent on activation by reduced heme. The study was designed to measure drug-drug interactions associated with a short pulse exposure (4 h) that is reminiscent of the short duration of artemisinin exposure obtained after in vivo dosing. Clear antagonism was observed between dihydroartemisinin (DHA) and chloroquine, which became even more extensive in chloroquine-resistant parasites. Antagonism was also observed in this assay for the more clinically-relevant ACT partner drugs piperaquine and amodiaquine, but not for other ACT partners mefloquine and lumefantrine, which don't share the 4-aminoquinoline structure or mode of action. Interestingly, chloroquine induced an artemisinin resistance phenotype in the standard in vitro Ring-stage Survival Assay, whereas this effect was not apparent for piperaquine.

      The authors also utilised a heme-reactive probe to demonstrate that the 4-aminoquinolines can inhibit heme-mediated activation of the probe within parasites, which suggests that the mechanism of antagonism involves the inactivation of heme, rendering it unable to activate the artemisinins. Measurement of protein ubiquitination showed reduced DHA-induced protein damage in the presence of chloroquine, which is also consistent with decreased heme-mediated activation, and/or with decreased DHA activity more generally.

      Overall, the study clearly demonstrates a mechanistic antagonism between DHA and 4-aminoquinoline antimalarials in vitro. It is interesting that this combination is successfully used to treat millions of malaria cases every year, which may raise questions about the clinical relevance of this finding. However, the conclusions in this paper are supported by multiple lines of evidence, and the data are clearly and transparently presented, leaving no doubt that DHA activity is compromised by the presence of chloroquine in vitro. It is perhaps fortunate that the clinical dosing regimens of 4-aminoquinoline-based ACTs have been sufficient to maintain clinical efficacy despite the non-optimal combination. Nevertheless, optimisation of antimalarial combinations and dosing regimens is becoming more important in the current era of increasing resistance to artemisinins and 4-aminoquinolines. Therefore, these findings should be considered when proposing new treatment regimens (including Triple-ACTs) and the assays described in this study should be performed on new drug combinations that are proposed for new or existing antimalarial medicines.

      Strengths:

      This manuscript is clearly written, and the data presented are clear and complete. The key conclusions are supported by multiple lines of evidence, and most findings are replicated with multiple drugs within a class, and across multiple parasite strains, thus providing more confidence in the generalisability of these findings across the 4-aminoquinoline and peroxide drug classes.

      A key strength of this study was the focus on short pulse exposures to DHA (4 h in trophs and 3 h in rings), which is relevant to the in vivo exposure of artemisinins. Artemisinin resistance has had a significant impact on treatment outcomes in South-East Asia, and is now emerging in Africa, but is not detected using a 'standard' 48 or 72 h in vitro growth inhibition assay. It is only in the RSA (a short pulse of 3-6 h treatment of early ring stage parasites) that the resistance phenotype can be detected in vitro. Therefore, assays based on this short pulse exposure provide the most relevant approach to determine whether drug-drug interactions are likely to have a clinically relevant impact on DHA activity. These assays clearly showed antagonism between DHA and 4-aminoquinolines (chloroquine, piperaquine, amodiaquine, and ferroquine) in trophozoite stages. Interestingly, whilst chloroquine clearly induced an artemisinin-resistant phenotype in the RSA, piperaquine did not appear to impact the early ring stage activity of DHA, which may be fortunate considering that piperaquine is a currently recommended DHA partner drug in ACTs, whereas chloroquine is not!

      The evaluation of additional drug combinations at the end of this paper is a valuable addition, which increases the potential impact of this work. The finding of antagonism between piperaquine and OZ439 in trophozoites is consistent with the general interactions observed between peroxides and 4-aminoquinolines, and it would be interesting to see whether piperaquine impacts the ring-stage activity of OZ439.

      The evaluation of reactive heme in parasites using a fluorescent sensor, combined with the measurement of K48-linked ubiquitin, further supports the findings of this study, providing independent read-outs for the chloroquine-induced antagonism.

      The in-depth discussion of the interpretation and implications of the results is an additional strength of this manuscript. Whilst the discussion section is rather lengthy, there are important caveats to the interpretation of some of these results, and clear relevance to the future management of malaria that require these detailed explanations.

      Overall, this is a high-quality manuscript describing an important study that has implications for the selection of antimalarial combinations for new and existing malaria medicines.

      Weaknesses:

      This study is an in vitro study of parasite cultures, and therefore, caution should be taken when applying these findings to decisions about clinical combinations. The drug concentrations and exposure durations in these assays are intended to represent clinically relevant exposures, although it is recognised that the in vitro system is somewhat simplified and there may be additional factors that influence in vivo activity. I think this is reasonably well acknowledged in the manuscript.

      It is also important to recognise that the majority of the key findings regarding antagonism are based on trophozoite-stage parasites, and one must show caution when generalising these findings to other stages or scenarios. For example, piperaquine showed clear antagonism in trophozoite stages, but not in ring stages under these assay conditions.

      The key weakness in this manuscript is the over-interpretation of the mechanistic studies that implicate heme-mediated artemisinin activation as the mechanism underpinning antagonism by chloroquine. In particular, the manuscript title focuses on heme-mediated activation of artemisinins, but this study did not directly measure the activation of artemisinins. The data obtained from the activation of the fluorescent probe are generally supportive of chloroquine suppressing the heme-mediated activation of artemisinins, and I think this is the most likely explanation, but there are significant caveats that undermine this conclusion. Primarily, the inconsistency between the fluorescence profile in the chemical reactions and the cell-based assay raises questions about the accuracy of this readout. In the chemical reaction, mefloquine and chloroquine showed identical inhibition of fluorescence, whereas piperaquine had minimal impact. On the contrary, in the cell, chloroquine and piperaquine had similar impacts on fluorescence, but mefloquine had minimal impact. This inconsistency indicates that the cellular fluorescence based on this sensor does not give a simple direct readout of the reactivity of ferrous heme, and therefore, these results should be interpreted with caution. Indeed, the correlation between fluorescence and antagonism for the tested drugs is a correlation, not causation. There could be several reasons for the disconnect between the chemical and biological results, either via additional mechanisms that quench fluorescence, or the presence of biomolecules that alter the oxidation state or coordination chemistry of heme or other potential catalysts of this sensor. It is possible that another factor that influences the H-FluNox fluorescence in cells also influences the DHA activity in cells, leading to the correlation with activity. It should be noted that H-FluNox is not a chemical analogue of artemisinins. Its activation relies on Fenton-like chemistry, but with an N-O rather than O-O bond, and it possesses very different steric and electronic substituents around the reactive centre, which are known to alter reactivity to different iron sources. Despite these limitations, the authors have provided reasonable justification for the use of this probe to directly visualise heme reactivity in cells, and the results are still informative, but additional caution should be provided in the interpretation, and the results are not conclusive enough to justify the current title of the paper.

      Another interesting finding that was not elaborated by the authors is the impact of chloroquine on the DHA dose-response curves from the ring stage assays. Detection of artemisinin resistance in the RSA generally focuses on the % survival at high DHA concentrations (700 nM) as there is minimal shift in the IC50 (see Figure 2), however, chloroquine clearly induces a shift in the IC50 (~5-fold), where the whole curve is shifted to the right, whereas the increase in % survival is relatively small. This different profile suggests that the mechanism of chloroquine-induced antagonism is different from the mechanism of artemisinin resistance. Current evidence regarding the mechanism of artemisinin resistance generally points towards decreased heme-mediated drug activation due to a decrease in hemoglobin uptake, which should be analogous to the decrease in heme-mediated drug activation caused by chloroquine. However, these different dose-response curves suggest different mechanisms are primarily responsible. Additional mechanisms have been proposed for artemisinin resistance, involving redox or heat stress responses, proteostatic responses, mitochondrial function, dormancy, and PI3K signaling, among others. Whilst the H-FluNox probe generally supports the idea that chloroquine suppresses heme-mediated DHA activation, it remains plausible that chloroquine could induce these, or other, cellular responses that suppress DHA activity.

      The other potential weakness in the current manuscript is the interpretation of the OZ439 clinical data. Whilst the observed interaction with piperaquine and ferroquine may have been a contributing factor, it should also be recognised that the low pharmacokinetic exposure in these studies was the primary reason for treatment failure (Macintyre 2017).

      Impact:

      This study has important implications for the selection of drugs to form combinations for the treatment of malaria. The overall findings of antagonism between peroxide antimalarials and 4-aminoquinolines in the trophozoite stage are robust, and this carries across to the ring stage for chloroquine (but not piperaquine).

      The manuscript also provides a plausible mechanism to explain the antagonism, although future work will be required to further explore the details of this mechanism and to rule out alternative factors that may contribute.

      Overall, this is an important contribution to the field and provides a clear justification for the evaluation of potential drug combinations in relevant in vitro assays before clinical testing.

    1. To use Ellin Scholnick's example (from the Introduction), "Calling roses and daisies 'flowers' induces children to search for their similarities" (p. 14). The kinds of similarities recognized, however, vary over time and across domain. For example, when asked to interpret the statement "A tape recorder is like a camera," 6-year-olds tended to identify similar surface attributes (e.g., noting that they are the same color), whereas 9-year-old children and adults tended to identify similarities in function, that is, that they both can record something for later use (Gentner, 1988, as cited in the chapter, pp. 96-97).

      a great example and something to remember

    1. o, I call myself a theocratic libertarian, and theocratic means, if we outlaw something, I want a Bible verse, ideally the Ten Commandments, if we make something against the law. But if it has to do with the manufacturing and sale of widgets, or the thoughts a person thinks, or the beliefs that they have, I’m a libertarian

      laws need to be biblically sourced and enforced, but those who practice another religion should be free to do so

    1. FINAL CONSIDERATIONS

      They are consistent with the article's proposal and address the main results achieved, including pointing to a direction for future studies on the topic. However, they could emphasize the contribution of the work itself to the use of portfolios supported by artificial intelligence.

    2. DISCUSSION

      It is consistent with the results. It uses a critical discourse, well structured around 5 thematic axes. It is grounded in recent literature (between 2023 and 2025), covering sources in Portuguese, Spanish, and English, which makes the review comprehensive. However, it does not state the limitations of the study, and so fails to make clear the reach of the conclusions obtained.

    3. INTRODUCTION

      Well structured. It provides relevant information on the proposed topic. It includes a concise review of the literature. It states the objective of the work. However, to strengthen the justification for the study, it could better highlight the gaps in the literature.

    1. LXXIX
      • Informativo 1068
      • ADI 6649 / DF
      • Judging body: Tribunal Pleno (full court)
      • Rapporteur: Min. GILMAR MENDES
      • Judgment: 15/09/2022 (in person)
      • Branch of law: Constitutional
      • Subject: Fundamental rights and guarantees

      Sharing of data within the federal Public Administration

      Summary - The sharing of personal data among bodies and entities of the federal Public Administration is legitimate, provided certain parameters are observed, and without any prejudice to full compliance with the general principles and protection mechanisms set out in the Lei Geral de Proteção de Dados Pessoais (LGPD, Lei 13.709/2018) and with the constitutional rights to privacy and data protection.

      • According to this Court's recent understanding, the protection of personal data and informational self-determination are <u>autonomous</u> fundamental rights, from which specific legal protection and their own normative dimension derive. It is therefore necessary to establish effective and transparent control over the collection, storage, use, transfer, and sharing of such data, as well as control over public policies that may substantially affect the fundamental right to data protection (1).

      • In this case, Decreto 10.046/2019, issued by the Presidency of the Republic, governs data-sharing governance within the federal Public Administration and establishes the Cadastro Base do Cidadão (Citizen Base Registry) and the Comitê Central de Governança de Dados (Central Data Governance Committee).

      • For it to be fully valid, its content must be interpreted in conformity with the Federal Constitution, removing from the semantic scope of the rule any applications or interpretations that <u>conflict</u> with the fundamental right to the protection of personal data.

      • On the basis of this understanding, the Court, by majority, held the actions partially well-founded, in order to give Decreto 10.046/2019 an interpretation in conformity with the Federal Constitution, in the following terms:

      • 1. The sharing of personal data among bodies and entities of the Public Administration presupposes: a) the choice of legitimate, specific, and explicit purposes for the data processing (art. 6, item I, of Lei 13.709/2018); b) compatibility of the processing with the stated purposes (art. 6, item II); c) limitation of the sharing to the <u>minimum necessary</u> to meet the stated purpose (art. 6, item III); as well as full compliance with the requirements, guarantees, and procedures established in the Lei Geral de Proteção de Dados, insofar as compatible with the public sector.

      • 2. The sharing of personal data among public bodies presupposes strict observance of art. 23, item I, of Lei 13.709/2018, which requires that due publicity be given to the cases in which each governmental entity shares or has access to a personal database, ‘providing clear and up-to-date information on the legal basis, the purpose, the procedures, and the practices used to carry out these activities, in easily accessible media, preferably on their websites’.

      • 3. Access by governmental bodies and entities to the Cadastro Base do Cidadão is conditional on full compliance with the guidelines listed above, and it falls to the Comitê Central de Governança de Dados, in exercising the powers referred to in art. 21, items VI, VII, and VIII, of Decreto 10.046/2019, to: 3.1. provide for strict mechanisms controlling access to the Cadastro Base do Cidadão, which shall be limited to bodies and entities that demonstrate a real need for access to the personal data gathered in it. Accordingly, permission for access may be granted only to achieve legitimate, specific, and explicit purposes, and shall be limited to information that is indispensable to serving the public interest, pursuant to art. 7, item III, and art. 23, caput and item I, of Lei 13.709/2018; 3.2. justify <u>formally</u>, <u>in advance</u>, and <u>in detail</u>, in light of the postulates of proportionality and reasonableness and of the general protection principles of the LGPD, both the need to include new personal data in the integrating database (art. 21, item VII) and the choice of the thematic databases that will make up the Cadastro Base do Cidadão (art. 21, item VIII); 3.3. institute security measures compatible with the protection principles of the LGPD, in particular the creation of an electronic access-logging system, for purposes of accountability in case of abuse.

      • 4. The sharing of personal information in intelligence activities shall comply with the specific legislation and with the parameters set in the judgment of ADI 6.529, Rel. Min. Cármen Lúcia, namely: <u>(i)</u> adoption of measures that are proportional and strictly necessary to serve the public interest; <u>(ii)</u> opening of a formal administrative procedure, accompanied by prior and exhaustive reasoning, to allow review of legality by the Judiciary; <u>(iii)</u> use of electronic security and access-logging systems, including for purposes of accountability in case of abuse; and <u>(iv)</u> observance of the general protection principles and of the data subject's rights provided for in the LGPD, insofar as compatible with the exercise of this state function.

      • 5. The processing of personal data by public bodies in disregard of the legal and constitutional parameters shall give rise to the civil liability of the State for the damage suffered by private parties, under arts. 42 et seq. of Lei 13.709/2018, together with the exercise of the right of recourse against the public servants and political agents responsible for the unlawful act, in cases of fault or intent.

        1. Intentional violation of the duty of publicity established in art. 23, item I, of the LGPD, outside the constitutional hypotheses of secrecy, shall render the state agent liable for an act of administrative improbity, under art. 11, item IV, of Law 8.429/1992, without prejudice to the disciplinary sanctions provided for in the statutes governing federal, municipal and state public servants.”
      • Finally, the Court declared, with prospective (pro futuro) effect, the unconstitutionality of art. 22 of Decree 10.046/2019, preserving the current structure of the Central Data Governance Committee for 60 (sixty) days from the date of publication of the minutes of the judgment, in order to give the Head of the Executive Branch sufficient time to (i) give the body an independent and plural profile, open to the effective participation of representatives of other democratic institutions; and (ii) grant its members minimum guarantees against undue influence. Justices André Mendonça, Nunes Marques and Edson Fachin dissented in part, in the terms of their respective opinions.

      (1) Precedent cited: ADI 6.387 Ref-MC.

      Legislation: Law 13.709/2018; Decree 10.046/2019

      Precedents: ADI 6.387 Ref-MC

      Note: Joint judgment: ADI 6649/DF and ADPF 695/DF (rapporteur Justice Gilmar Mendes)


      • Informativo 1033
      • ADI 6529 / DF
      • Adjudicating body: Full Court (Tribunal Pleno)
      • Rapporteur: Justice CÁRMEN LÚCIA
      • Judgment: 08/10/2021 (Virtual)
      • Area of law: Constitutional, Administrative
      • Subject: Protection of privacy and data secrecy; Intelligence activity

      Provision of data to the Brazilian Intelligence Agency (ABIN) and judicial review of legality

      Summary - The bodies that make up the Brazilian Intelligence System may provide specific data and knowledge to ABIN only when the public interest of the measure is demonstrated.

      • Every decision to provide such data must be duly and formally reasoned, to allow for possible review of legality by the Judiciary.

      • The bodies that make up the Brazilian Intelligence System may provide specific data and knowledge to ABIN only when the public interest of the measure is demonstrated.

      • The legal mechanisms for sharing data and information provided for in the sole paragraph of art. 4 of Law 9.883/1999 (1) exist to serve the public interest. Sharing data and specific knowledge to serve the private interest of the body or of a public agent is not legally permitted and constitutes misuse of purpose and abuse of right.

      • The provision of information between public bodies for the defense of the institutions and of national interests is a legitimate act. It is prohibited, however, for these purposes to become pretexts for serving or benefiting private or personal interests.

      • Every decision to provide such data must be duly and formally reasoned, to allow for possible review of legality by the Judiciary.

      • It should be noted that the nature of intelligence activity, which is at times carried out under secrecy or restricted publicity, <u>does not remove the obligation to state reasons for administrative acts</u>. Reasoning for these requests is indispensable so that the Judiciary, if called upon, can review their legality, examining their conformity with the principles of proportionality and reasonableness.

      • Moreover, even when public interest and reasoning are present, the national legal order provides for situations subject to the clause of reservation of jurisdiction, that is, the need for prior analysis and authorization by the Judiciary. In those situations, the Federal Constitution (CF) makes the prior intervention of the State-judge essential, without which any action by a state authority will be illegitimate, except in cases of flagrante delicto.

      • On the basis of this understanding, the Court partially admitted the direct action and gave the sole paragraph of art. 4 of Law 9.883/1999 an interpretation in conformity with the Constitution, establishing that:

      a) the bodies that make up the Brazilian Intelligence System may provide specific data and knowledge to ABIN only when the public interest of the measure is demonstrated, ruling out any possibility that the provision of such data serves personal or private interests;

      b) every decision to provide such data must be duly and formally reasoned, to allow for possible review of legality by the Judiciary;

      c) even when the public interest is present, data relating to telephone communications or data subject to the reservation of jurisdiction may not be shared under the provision, because of that limitation, which stems from respect for fundamental rights;

      d) where the provision of information and data to ABIN is permissible, a formally opened proceeding and the existence of electronic security and access-logging systems are indispensable, including for purposes of accountability in the event of omission, deviation or abuse.

      (1) Law 9.883/1999: “Art. 4. In addition to what the preceding article prescribes, ABIN is responsible for: (...) Sole paragraph. The bodies that make up the Brazilian Intelligence System shall provide ABIN, under terms and conditions to be approved by presidential act, for purposes of integration, with specific data and knowledge related to the defense of the institutions and of national interests.”

      Legislation: Law 9.883/1999, art. 4, sole paragraph


    1. Request descriptive statistics: o Most visits (visits) o Longest dwell time (duration)

      The cells with the most visits and the longest duration are the most likely to be attractor states.
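
      A minimal sketch of how these two descriptive statistics could be computed, assuming the observations are a time-ordered list of (cell, seconds) pairs. The data layout and the names `events` and `cell_statistics` are illustrative assumptions, not taken from the annotated source. A visit is counted each time the trajectory enters a cell from a different cell.

```python
from collections import defaultdict

def cell_statistics(events):
    """Return {cell: (visits, total_duration)} for a time-ordered event list.

    events: list of (cell, seconds) pairs (hypothetical layout).
    Consecutive events in the same cell count as a single visit.
    """
    visits = defaultdict(int)
    duration = defaultdict(float)
    previous = None
    for cell, seconds in events:
        if cell != previous:      # entering a different cell starts a new visit
            visits[cell] += 1
        duration[cell] += seconds
        previous = cell
    return {cell: (visits[cell], duration[cell]) for cell in visits}

# Illustrative data only.
events = [("A1", 4.0), ("B2", 10.0), ("B2", 6.0), ("A1", 2.0),
          ("B2", 8.0), ("C3", 1.0), ("B2", 3.0)]
stats = cell_statistics(events)
most_visited = max(stats, key=lambda c: stats[c][0])
longest_stay = max(stats, key=lambda c: stats[c][1])
```

      In this toy input the same cell leads on both measures, which is the pattern the note treats as a candidate attractor state.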

    1. HERO IMAGE & GALLERY

      Facts:
      • 75% of consumers base their decision on photos
      • Lifestyle photos increase conversion by 25-50%
      • Without lifestyle photos: “This is a product”; with lifestyle photos: “This could be me”
      What the gallery must contain (minimum):

    1. If it were a war for the purpose of making democracy safe for the world, we would say that democracy must first be safe for America before it can be safe for the world.

      She talking :o

  9. drive.google.com drive.google.com
  10. Oct 2025
    1. Every spot of the old world is overrun with oppression. Freedom hath been hunted round the globe. Asia, and Africa, have long expelled her. — Europe regards her like a stranger, and England hath given her warning to depart. O! receive the fugitive, and prepare in time an asylum for mankind

      Paine shows the revolution has global significance and urges America to protect liberty, emphasizing the importance of action.

    2. O ye that love mankind! Ye that dare oppose, not only the tyranny, but the tyrant, stand forth!

      How does Paine's call to action compare to petitions, letters, or other revolutionary writings we've read in the course?

    1. Art. 158

      Title to the revenue collected as withholding income tax levied on amounts paid by them, their autonomous agencies and foundations to individuals or legal entities contracted to supply goods or services belongs to the Municipality, the States and the Federal District, as provided in arts. 158, I, and 157, I, of the Federal Constitution. To that effect:


      • RE 1293453 - Theme 1.130
      • Adjudicating body: Full Court (Tribunal Pleno)
      • Rapporteur: Justice ALEXANDRE DE MORAES
      • Judgment: 11/10/2021
      • Publication: 22/10/2021

      EXTRAORDINARY APPEAL. GENERAL REPERCUSSION. INCIDENT FOR RESOLUTION OF REPETITIVE CLAIMS (IRDR). TAX LAW. PUBLIC FINANCE LAW. SHARING OF REVENUE AMONG THE ENTITIES OF THE FEDERATION. TITLE TO INCOME TAX WITHHELD AT SOURCE ON INCOME PAID, ON ANY BASIS, BY MUNICIPALITIES TO INDIVIDUALS OR LEGAL ENTITIES CONTRACTED TO SUPPLY GOODS OR SERVICES. ART. 158, ITEM I, OF THE FEDERAL CONSTITUTION. EXTRAORDINARY APPEAL DENIED. THESIS ESTABLISHED.

      • 1. The 1988 Federal Constitution broke with the previous paradigm, in which there was a tendency to concentrate economic power in the central entity (the Union), by decentralizing powers and revenue to the subnational entities in order to guarantee them the autonomy needed to carry out their duties.

      • 2. The analysis of the constitutional provisions on the sharing of revenue among the federated entities, considering the historical context in which they were drafted, must bear in mind the tendency toward decentralization of resources and the values of cooperative federalism, with a view to strengthening the subnational entities and their autonomy.

      • 3. The Federal Constitution, in providing in art. 158, I, that “the proceeds of the Union's tax on income and earnings of any nature, levied at source, on income paid, on any basis, by them, their autonomous agencies and the foundations they establish and maintain” belong to the Municipalities, chose not to expressly restrict the term ‘income paid’; in turn, the expression ‘on any basis’ clearly shows the intention to broaden the situations covered by that term. Accordingly, the concept of income in that constitutional provision must not be interpreted restrictively.

      • 4. The constitutional provision for sharing tax revenue does not alter the distribution of taxing powers, since it does not affect the exclusive power of each federative entity to institute and collect its own taxes; it affects only the distribution of the revenue collected, and in the present case there is no violation of art. 153, III, of the Federal Constitution.

      • 5. The subjective right of the federative entity that benefits from a share in the proceeds of the Withholding Income Tax (IRRF), under arts. 157, I, and 158, I, of the Federal Constitution, exists only from the moment the competent federative entity creates the tax and its taxable event occurs. However, once the tax has been duly instituted, the Union, which holds the legislative power, cannot inhibit or restrict the access of the entities constitutionally granted a share of revenue to the amounts that correspond to them.

      • 6. The judgment under appeal, in establishing the thesis that “Article 158, I, of the 1988 Federal Constitution defines municipal title to the revenue collected as withholding income tax levied on amounts paid by Municipalities to individuals or legal entities contracted to supply goods or services”, adhered to the wording and to the purpose (decentralization of revenue) of art. 158, I, of the Constitution.

      • 7. Even though at a given moment some federated entities, including the Union, adopted a restrictive understanding of art. 158, I, of the Federal Constitution, that understanding runs counter to the wording of that constitutional provision and must be removed from the national legal order.

      • 8. The limitation imposed by art. 64 of Law 9.430/1996, which allows income tax to be withheld only by the federal Administration, is clearly unconstitutional, in that it creates unjustified discrimination among the federative entities, with a clear advantage for the Federal Union and exclusion of the subnational entities.

      • 9. Extraordinary Appeal denied. The following thesis is established for THEME 1130: “Title to the revenue collected as withholding income tax levied on amounts paid by them, their autonomous agencies and foundations to individuals or legal entities contracted to supply goods or services belongs to the Municipality, the States and the Federal District, as provided in arts. 158, I, and 157, I, of the Federal Constitution.”

      Theme 1130 - Title to the revenue collected as withholding income tax levied on amounts paid by Municipalities, their autonomous agencies and foundations to individuals or legal entities contracted to supply goods or services.

      Thesis - Title to the revenue collected as withholding income tax levied on amounts paid by them, their autonomous agencies and foundations to individuals or legal entities contracted to supply goods or services belongs to the Municipality, the States and the Federal District, as provided in arts. 158, I, and 157, I, of the Federal Constitution.


    1. Itamar Franco's final years. After the presidency, Itamar Franco did not leave politics. Between 1995 and 1996, he served as Brazil's ambassador to Portugal. In 1998, he ran for governor of Minas Gerais for the PMDB and won in the runoff with more than 57% of the vote. This time he served only one term. In 2010, Itamar Franco again ran for senator for Minas Gerais and was elected with almost 27% of the vote. He spent only a few months in office, as he died of leukemia on July 2, 2011. The seat he left vacant was filled by Zezé Perrella.

      Latest news about Itamar Franco

    1. Scaling Context Requires Rethinking Attention

      Core Thesis

      • Neither transformers nor sub-quadratic architectures are well-suited for long-context training

        "the cost of processing the context is too expensive in the former, too inexpensive in the latter"

      • Power attention introduced as solution: A linear-cost sequence modeling architecture with independently adjustable state size

        "an architectural layer for linear-cost sequence modeling whose state size can be adjusted independently of parameters"

      Three Requirements for Long-Context Architectures

      1. Balanced Weight-State FLOP Ratio (WSFR)

      • Weight-state FLOP ratio should approach 1:1 for compute-optimal models

        "for compute-optimal models, the WSFR should be somewhat close to 1:1"

      • Exponential attention becomes unbalanced at long contexts

      • At 65,536 context: WSFR is 1:8
      • At 1,000,000 context: WSFR is 1:125

        "exponential attention is balanced for intermediate context lengths, but unbalanced for long context lengths, where it does far more state FLOPs than weight FLOPs"

      • Linear attention remains unbalanced at all context lengths

      • WSFR stays at 30:1 regardless of context length

        "Linear attention...is unbalanced at all context lengths in the opposite direction: far more weight FLOPs than state FLOPs"

      2. Hardware-Aware Implementation

      • Must admit efficient implementation on tensor cores
      • Power attention achieves 8.6x faster throughput than Flash Attention at 64k context (head size 32)
      • 3.3x speedup at head size 64

      3. Strong In-Context Learning (ICL)

      • Large state size improves ICL performance

        "state scaling improves performance"

      • Windowed attention fails ICL beyond window size

        "no in-context learning occurs beyond 100 tokens for window-32 attention"

      • Linear attention maintains ICL across entire sequence

        "linear attention...demonstrate consistent in-context learning across the entire sequence"

      Power Attention Technical Details

      Mathematical Foundation

      • Power attention formula: Uses p-th power instead of exponential

        "$\mathrm{attn}^{p}_{\mathrm{pow}}(Q, K, V)_i = \sum_{j=1}^{i} (Q_i^{\top} K_j)^{p} V_j$"

      • Symmetric power expansion (SPOW) reduces state size vs tensor power (TPOW)

      • At p=2, d=64: SPOW uses 2,080 dimensions vs TPOW's 4,096 (49% savings)
      • At p=4, d=64: 95% size reduction

        "$\mathrm{SPOW}_p$ is a state expansion that increases the state size by a factor of $\binom{d+p-1}{p}/d$ without introducing any parameters"
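
      The dimension counts above can be checked with a short calculation. This is a sketch under the assumption that TPOW keeps all d^p entries of the tensor power while SPOW keeps one entry per distinct degree-p monomial; the function names `tpow_dim` and `spow_dim` are illustrative, not from the paper.

```python
from math import comb

def tpow_dim(d, p):
    """Entries in the full p-th tensor power of a d-dimensional vector."""
    return d ** p

def spow_dim(d, p):
    """Distinct degree-p monomials in d variables: C(d + p - 1, p)."""
    return comb(d + p - 1, p)

d = 64
for p in (2, 4):
    saving = 1 - spow_dim(d, p) / tpow_dim(d, p)
    print(p, spow_dim(d, p), tpow_dim(d, p), f"{saving:.0%}")
# prints:
# 2 2080 4096 49%
# 4 766480 16777216 95%
```

      For p=2 and d=64 the expansion factor C(d+p-1, p)/d is 2080/64 = 32.5, matching the quoted formula.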

      Implementation Innovation

      • Fused expand-MMA kernel: Expands tiles on-the-fly during matrix multiplication

        "a matrix multiplication where the tiles of one operand are expanded on-the-fly"

      • Tiled symmetric power expansion (TSPOW): Interpolates between TPOW and SPOW

      • Provides GPU-friendly structure while reducing data duplication
      • Optimal tile size: d-tile = 8 for p=2, d-tile = 4 for p=3

      • Chunked form enables practical efficiency

        "The chunked form interpolates between the recurrent form and the attention form, capturing benefits of both"

      • Cost: O(tDv + tcd) where c is chunk size
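
      To make the two endpoints that the chunked form interpolates between concrete, here is a minimal pure-Python sketch of the attention form and the recurrent form of unnormalized power attention. It is not the paper's kernel: it uses the full tensor power (TPOW) rather than SPOW or TSPOW, omits chunking and normalization, and all function names are illustrative assumptions. The equivalence rests on the identity (q·k)^p = φ(q)·φ(k), where φ is the flattened p-th tensor power.

```python
from itertools import product
from math import prod

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tpow(x, p):
    """Flattened p-th tensor power of x (d**p entries)."""
    return [prod(c) for c in product(x, repeat=p)]

def power_attention(Q, K, V, p):
    """Attention form: out_i = sum over j <= i of (Q_i . K_j)**p * V_j."""
    out = []
    for i, q in enumerate(Q):
        acc = [0.0] * len(V[0])
        for j in range(i + 1):
            w = dot(q, K[j]) ** p
            acc = [a + w * v for a, v in zip(acc, V[j])]
        out.append(acc)
    return out

def power_attention_recurrent(Q, K, V, p):
    """Recurrent form: a (d**p, dv) state updated once per token."""
    D, dv = len(Q[0]) ** p, len(V[0])
    S = [[0.0] * dv for _ in range(D)]
    out = []
    for q, k, v in zip(Q, K, V):
        for row, phi_k in zip(S, tpow(k, p)):
            for c in range(dv):
                row[c] += phi_k * v[c]          # S += phi(k) v^T
        phi_q = tpow(q, p)
        out.append([sum(phi_q[r] * S[r][c] for r in range(D))
                    for c in range(dv)])
    return out

# Illustrative inputs: 3 tokens, key dimension 2, value dimension 2.
Q = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
a = power_attention(Q, K, V, p=2)
b = power_attention_recurrent(Q, K, V, p=2)
```

      The attention form costs time quadratic in sequence length with no persistent state, while the recurrent form costs time linear in sequence length with a state of size D times the value dimension; a chunked implementation would apply the first within a chunk and the second across chunks.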

      Experimental Results

      In-Context Learning Performance

      • Power attention dominates windowed attention at equal state sizes across all context lengths
      • All scaling axes improve ICL: gradient updates, batch size, parameter count, context length

        "In all cases, the ICL curve becomes steeper as we scale the respective axis"

      Long-Context Training (65,536 tokens)

      • Power attention (p=2) outperforms both exponential and linear attention in loss-per-FLOP
      • RWKV without power attention shows near-zero ICL benefit beyond 2,000 tokens
      • Power attention enables RWKV to ICL "nearly as well as exponential attention"

      Compute-Optimal Under Latency Constraints

      • When inference latency constrains parameter count and state size:
      • Window-1k attention: loss 1.638
      • Standard attention: loss 1.631
      • Power attention (p=2): loss 1.613 (best)

      Dataset and Experimental Setup

      LongCrawl64

      • 6.66M documents, each 65,536 tokens (435B total tokens)
      • Sourced from Common Crawl, filtered for long sequences
      • Critical for ICL research

        "Most sequences in OpenWebText have length less than 1k"

      Architectures Tested

      • Base architectures: GPT-2, RWKV (RWKV7), GLA, RetNet
      • Attention variants: Exponential, linear, windowed, power (p=2)
      • Training: LongCrawl64, AdamW, bf16, learning rate 3e-4 with warmup and cosine decay

      Key Limitations and Future Work

      Current Limitations

      1. Experiments limited to natural language NLL - no other domains/modalities tested
      2. Compute-optimal context grows slowly in natural language

        "autoregressive prediction of natural language is largely dominated by short-context dependencies"

      3. p=2 only - normalization requires positive inner products (even powers only)
      4. Triton implementation - not yet optimized to CUDA level

      Future Directions

      • Explore domains with long-term dependencies: chain-of-thought reasoning, audio, video
      • Scaling laws research for state size, context size, and ICL
      • CUDA implementation for further speedups beyond current Triton kernels
      • Alternative normalization to support odd powers
      • Comprehensive comparison to hybrid models, sparse attention, MQA, latent attention

      Key References and Tools

      Implementations

      Related Techniques

      • Flash Attention [Dao, 2023]: Operator fusion to avoid materializing attention matrix
      • Linear attention [Katharopoulos et al., 2020]: Enables recurrent formulation
      • Gating [Lin et al., 2025]: Learned mechanism to avoid attending to old data
      • Sliding window attention [Child et al., 2019]: Truncates KV cache

      Key Papers

      • Transformers [Vaswani et al., 2023]
      • Mamba [Gu and Dao, 2024]: Modern RNN architecture
      • RWKV [Peng et al., 2023]: Reinventing RNNs for transformer era
      • Scaling laws [Kaplan et al., 2020]

      Technical Contributions

      1. Framework for evaluating long-context architectures (balance, efficiency, ICL)
      2. Power attention architecture with parameter-free state size adjustment
      3. Symmetric power expansion theory and implementation
      4. Hardware-efficient kernels with operation fusion
      5. Empirical validation on 435B token dataset
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Lu & Golomb combined EEG, artificial neural networks, and multivariate pattern analyses to examine how different visual variables are processed in the brain. The conclusions of the paper are mostly well supported, but some aspects of methods and data analysis would benefit from clarification and potential extensions.

      The authors find that not only real-world size is represented in the brain (which was known), but both retinal size and real-world depth are represented, at different time points or latencies, which may reflect different stages of processing. Prior work has not been able to answer the question of real-world depth due to the stimuli used. The authors made this possible by assessing real-world depth and testing it with appropriate methodology, accounting for retinal and real-world size. The methodological approach combining behavior, RSA, and ANNs is creative and well thought out to appropriately assess the research questions, and the findings may be very compelling if backed up with some clarifications and further analyses.

      The work will be of interest to experimental and computational vision scientists, as well as the broader computational cognitive neuroscience community as the methodology is of interest and the code is or will be made available. The work is important as it is currently not clear what the correspondence between many deep neural network models and the brain is, and this work pushes our knowledge forward on this front. Furthermore, the availability of methods and data will be useful for the scientific community.

      Reviewer #2 (Public Review):

      Summary:

      This paper aims to test if neural representations of images of objects in the human brain contain a 'pure' dimension of real-world size that is independent of retinal size or perceived depth. To this end, they apply representational similarity analysis on EEG responses in 10 human subjects to a set of 200 images from a publicly available database (THINGS-EEG2), correlating pairwise distinctions in evoked activity between images with pairwise differences in human ratings of real-world size (from THINGS+). By partialling out correlations with metrics of retinal size and perceived depth from the resulting EEG correlation time courses, the paper claims to identify an independent representation of real-world size starting at 170 ms in the EEG signal. Further comparisons with artificial neural networks and language embeddings lead the authors to claim this correlation reflects a relatively 'high-level' and 'stable' neural representation.

      Strengths:

      The paper features insightful illustrations and clear figures.

      The limitations of prior work motivating the current study are clearly explained and seem reasonable (although the rationale for why using 'ecological' stimuli with backgrounds matters when studying real-world size could be made clearer; one could also argue the opposite, that to get a 'pure' representation of the real-world size of an 'object concept', one should actually show objects in isolation).

      The partial correlation analysis convincingly demonstrates how correlations between feature spaces can affect their correlations with EEG responses (and how taking into account these correlations can disentangle them better).

      The RSA analysis and associated statistical methods appear solid.

      Weaknesses:

      The claim of methodological novelty is overblown. Comparing image metrics, behavioral measurements, and ANN activations against EEG using RSA is a commonly used approach to study neural object representations. The dataset size (200 test images from THINGS) is not particularly large, and neither is comparing pre-trained DNNs and language models, or using partial correlations.

      Thanks for your feedback. We agree that the methods used in our study – such as RSA, partial correlations, and the use of pretrained ANN and language models – are indeed well-established in the literature. We therefore revised the manuscript to more carefully frame our contribution: rather than emphasizing methodological novelty in isolation, we now highlight the combination of techniques, the application to human EEG data with naturalistic images, and the explicit dissociation of real-world size, retinal size, and depth representations as the primary strengths of our approach. Corresponding language in the Abstract, Introduction, and Discussion has been adjusted to reflect this more precise positioning:

      (Abstract, line 34 to 37) “our study combines human EEG and representational similarity analysis to disentangle neural representations of object real-world size from retinal size and perceived depth, leveraging recent datasets and modeling approaches to address challenges not fully resolved in previous work.”

      (Introduction, line 104 to 106) “we overcome these challenges by combining human EEG recordings, naturalistic stimulus images, artificial neural networks, and computational modeling approaches including representational similarity analysis (RSA) and partial correlation analysis …”

      (Introduction, line 108) “We applied our integrated computational approach to an open EEG dataset…”

      (Introduction, line 142 to 143) “The integrated computational approach by cross-modal representational comparisons we take with the current study…”

      (Discussion, line 550 to 552) “our study goes beyond the contributions of prior studies in several key ways, offering both theoretical and methodological advances: …”

      The claims also seem too broad given the fairly small set of RDMs that are used here (3 size metrics, 4 ANN layers, 1 Word2Vec RDM): there are many aspects of object processing not studied here, so it's not correct to say this study provides a 'detailed and clear characterization of the object processing process'.

      Thanks for pointing this out. We softened language in our manuscript to reflect that our findings provide a temporally resolved characterization of selected object features, rather than a comprehensive account of object processing:

      (line 34 to 37) “our study combines human EEG and representational similarity analysis to disentangle neural representations of object real-world size from retinal size and perceived depth, leveraging recent datasets and modeling approaches to address challenges not fully resolved in previous work.”

      (line 46 to 48) “Our research provides a temporally resolved characterization of how certain key object properties – such as object real-world size, depth, and retinal size – are represented in the brain, …”

      The paper lacks an analysis demonstrating the validity of the real-world depth measure, which is here computed from the other two metrics by simply dividing them. The rationale and logic of this metric is not clearly explained. Is it intended to reflect the hypothesized egocentric distance to the object in the image if the person had in fact been 'inside' the image? How do we know this is valid? It would be helpful if the authors provided a validation of this metric.

      We appreciate the comment regarding the real-world depth metric. Specifically, this metric was computed as the ratio of real-world size (obtained via behavioral ratings) to measured retinal size. The rationale behind this computation is grounded in the basic principles of perspective projection: for two objects subtending the same retinal size, the physically larger object is presumed to be farther away. This ratio thus serves as a proxy for perceived egocentric depth under the simplifying assumption of consistent viewing geometry across images.

      We acknowledge that this is a derived estimate and not a direct measurement of perceived depth. While it provides a useful approximation that allows us to analytically dissociate the contributions of real-world size and depth in our RSA framework, we agree that future work would benefit from independent perceptual depth ratings to validate or refine this metric. We added more discussions about this to our revised manuscript:

      (line 652 to 657) “Additionally, we acknowledge that our metric for real-world depth was derived indirectly as the ratio of perceived real-world size to retinal size. While this formulation is grounded in geometric principles of perspective projection and served the purpose of analytically dissociating depth from size in our RSA framework, it remains a proxy rather than a direct measure of perceived egocentric distance. Future work incorporating behavioral or psychophysical depth ratings would be valuable for validating and refining this metric.”
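
      As a small illustration of the ratio described above, the sketch below computes the depth proxy and a representational dissimilarity matrix from it. The numbers are invented for illustration, and the use of pairwise absolute differences as the dissimilarity measure is an assumption, as are the names `depth_proxy` and `rdm`.

```python
def depth_proxy(real_world_size, retinal_size):
    """Depth proxy as described: real-world size divided by retinal size."""
    return [s / r for s, r in zip(real_world_size, retinal_size)]

def rdm(values):
    """Pairwise absolute-difference dissimilarity matrix (assumed metric)."""
    return [[abs(a - b) for b in values] for a in values]

# Illustrative values only: three images.
real_world_size = [100.0, 300.0, 300.0]   # e.g. behavioral size ratings
retinal_size = [0.5, 0.5, 0.25]           # e.g. proportion of the image
depth = depth_proxy(real_world_size, retinal_size)
depth_rdm = rdm(depth)
```

      Images 1 and 2 subtend the same retinal size, so the physically larger object gets the larger depth value; images 2 and 3 share a real-world size, so the one that appears smaller on the retina is treated as farther away.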

      Given that there is only 1 image/concept here, the factor of real-world size may be confounded with other things, such as semantic category (e.g. buildings vs. tools). While the comparison of the real-world size metric appears to be effectively disentangled from retinal size and (the author's metric of) depth here, there are still many other object properties that are likely correlated with real-world size and therefore will confound identifying a 'pure' representation of real-world size in EEG. This could be addressed by adding more hypothesis RDMs reflecting different aspects of the images that may correlate with real-world size.

      We thank the reviewer for this thoughtful and important point. We agree that semantic category and real-world size may be correlated, and that semantic structure is one of the plausible sources of variance contributing to real-world size representations. However, we would like to clarify that our original goal was to isolate real-world size from two key physical image features — retinal size and inferred real-world depth — which have been major confounds in prior work on this topic. We acknowledge that although our analysis disentangled real-world size from depth and retinal size, this does not imply a fully “pure” representation; therefore, we now refer to the real-world size representations as “partially disentangled” throughout the manuscript to reflect this nuance.

      Interestingly, after controlling for these physical features, we still found a robust and statistically isolated representation of real-world size in the EEG signal. This motivated the idea that real-world size may be more than a purely perceptual or image-based property: it may be at least partially semantic. Supporting this interpretation, both the late layers of ANN models and the non-visual semantic model (Word2Vec) also captured real-world size structure. Rather than treating semantic information as an unwanted confound, we propose that semantic structure may be an inherent component of how the brain encodes real-world size.

      To directly address your concern, we conducted an additional variance partitioning analysis, in which we decomposed the variance in EEG RDMs explained by four RDMs: real-world depth, retinal size, real-world size, and semantic information (from Word2Vec). Specifically, for each EEG timepoint, we quantified (1) the unique variance of real-world size, after controlling for semantic similarity, depth, and retinal size; (2) the unique variance of semantic information, after controlling for real-world size, depth, and retinal size; (3) the shared variance jointly explained by real-world size and semantic similarity, controlling for depth and retinal size. This analysis revealed that real-world size explained unique variance in EEG even after accounting for semantic similarity. There was also substantial shared variance, indicating partial overlap between semantic structure and size. Semantic information also contributed unique explanatory power, as expected. These results suggest that real-world size is indeed partially semantic in nature, but also has independent neural representation not fully explained by general semantic similarity. This strengthens our conclusion that real-world size functions as a meaningful, higher-level dimension in object representation space.

      We now include this new analysis and a corresponding figure (Figure S9) in the revised manuscript:

      (line 532 to 539) “Second, we conducted a variance partitioning analysis, in which we decomposed the variance in EEG RDMs explained by three hypothesis-based RDMs and the semantic RDM (Word2Vec RDM), and we still found that real-world size explained unique variance in EEG even after accounting for semantic similarity (Figure S9). And we also observed a substantial shared variance jointly explained by real-world size and semantic similarity and a unique variance of semantic information. These results suggest that real-world size is indeed partially semantic in nature, but also has independent neural representation not fully explained by general semantic similarity.”

      The choice of ANNs lacks a clear motivation. Why these two particular networks? Why pick only 2 somewhat arbitrary layers? If the goal is to identify more semantic representations using CLIP, the comparison between CLIP and vision-only ResNet should be done with models trained on the same training datasets (to exclude the effect of training dataset size & quality; cf Wang et al., 2023). This is necessary to substantiate the claims on page 19 which attributed the differences between models in terms of their EEG correlations to one of them being a 'visual model' vs. 'visual-semantic model'.

      We agree that the choice and comparison of models should be better contextualized.

      First, our motivation for selecting ResNet-50 and CLIP ResNet-50 was not to make a definitive comparison between model classes, but rather to include two widely used representatives of their respective categories—one trained purely on visual information (ResNet-50 on ImageNet) and one trained with joint visual and linguistic supervision (CLIP ResNet-50 on image–text pairs). These models are both highly influential and commonly used in computational and cognitive neuroscience, allowing for relevant comparisons with existing work (line 181-187).

      Second, we recognize that limiting the EEG × ANN correlation analyses to only early and late layers may be viewed as insufficiently comprehensive. To address this point, we have computed the EEG correlations with multiple layers in both ResNet and CLIP models (ResNet: ResNet.maxpool, ResNet.layer1, ResNet.layer2, ResNet.layer3, ResNet.layer4, ResNet.avgpool; CLIP: CLIP.visual.avgpool, CLIP.visual.layer1, CLIP.visual.layer2, CLIP.visual.layer3, CLIP.visual.layer4, CLIP.visual.attnpool). The results, now included in Figure S4, show a consistent trend: early layers exhibit higher similarity to early EEG time points, and deeper layers show increased similarity to later EEG stages. We chose to highlight early and late layers in the main text to simplify interpretation.

      Third, we appreciate the reviewer’s point that differences in training datasets (ImageNet vs. CLIP's dataset) may confound any attribution of differences in brain alignment to the models' architectural or learning differences. We agree that the comparisons between models trained on matched datasets (e.g., vision-only vs. multimodal models trained on the same image–text corpus) would allow for more rigorous conclusions. Thus, we explicitly acknowledged this limitation in the text:

      (line 443 to 445) “However, it is also possible that these differences between ResNet and CLIP reflect differences in training data scale and domain.”

      The first part of the claim on page 22 based on Figure 4 'The above results reveal that real-world size emerges with later peak neural latencies and in the later layers of ANNs, regardless of image background information' is not valid since no EEG results for images without backgrounds are shown (only ANNs).

      We revised the sentence to clarify that this is a hypothesis based on the ANN results, not an empirical EEG finding:

      (line 491 to 495) “These results show that real-world size emerges in the later layers of ANNs regardless of image background information, and – based on our prior EEG results – although we could not test object-only images in the EEG data, we hypothesize that a similar temporal profile would be observed in the brain, even for object-only images.”

      While we only had the EEG data of human subjects viewing naturalistic images, the ANN results suggest that real-world size representations may still emerge at later processing stages even in the absence of background, consistent with what we observed in EEG under with-background conditions.

      The paper is likely to impact the field by showcasing how using partial correlations in RSA is useful, rather than providing conclusive evidence regarding neural representations of objects and their sizes.

      Additional context important to consider when interpreting this work:

      Page 20, the authors point out similarities of peak correlations between models ('Interestingly, the peaks of significant time windows for the EEG × HYP RSA also correspond with the peaks of the EEG × ANN RSA timecourse (Figure 3D,F)'. Although not explicitly stated, this seems to imply that they infer from this that the ANN-EEG correlation might be driven by their representation of the hypothesized feature spaces. However this does not follow: in EEG-image metric model comparisons it is very typical to see multiple peaks, for any type of model, this simply reflects specific time points in EEG at which visual inputs (images) yield distinctive EEG amplitudes (perhaps due to stereotypical waves of neural processing?), but one cannot infer the information being processed is the same. To investigate this, one could for example conduct variance partitioning or commonality analysis to see if there is variance at these specific timepoints that is shared by a specific combination of the hypothesis and ANN feature spaces.

      Thanks for your thoughtful observation! Upon reflection, we agree that the sentence – "Interestingly, the peaks of significant time windows for the EEG × HYP RSA also correspond with the peaks of the EEG × ANN RSA timecourse" – was speculative and risked implying a causal link that our data do not warrant. As you rightly point out, observing coincident peak latencies across different models does not necessarily imply shared representational content, given the stereotypical dynamics of evoked EEG responses. We also think that even a variance partitioning analysis would not suffice to infer that ANN-EEG correlations are driven specifically by the hypothesized feature spaces. Accordingly, we have removed this sentence from the manuscript to avoid overinterpretation.

      Page 22 mentions 'The significant time-window (90-300ms) of similarity between Word2Vec RDM and EEG RDMs (Figure 5B) contained the significant time-window of EEG x real-world size representational similarity (Figure 3B)'. This is not particularly meaningful given that the Word2Vec correlation is significant for the entire EEG epoch (from the time-point of the signal 'arriving' in visual cortex around ~90 ms) and is thus much less temporally specific than the real-world size EEG correlation. Again a stronger test of whether Word2Vec indeed captures neural representations of real-world size could be to identify EEG time-points at which there are unique Word2Vec correlations that are not explained by either ResNet or CLIP, and see if those timepoints share variance with the real-world size hypothesized RDM.

      We appreciate your insightful comment. Upon reflection, we agree that the sentence – "The significant time-window (90-300ms) of similarity between Word2Vec RDM and EEG RDMs (Figure 5B) contained the significant time-window of EEG x real-world size representational similarity (Figure 3B)" – was speculative. We have removed this sentence from the manuscript to avoid overinterpretation.

      Additionally, we conducted two analyses as you suggested in the supplement. First, we calculated the partial correlation between EEG RDMs and the Word2Vec RDM while controlling for four ANN RDMs (ResNet early/late and CLIP early/late) (Figure S8). Even after regressing out these ANN-derived features, we observed significant correlations between Word2Vec and EEG RDMs in the 100–190 ms and 250–300 ms time windows. This result suggests that Word2Vec captures semantic structure in the neural signal that is not accounted for by ResNet or CLIP. Second, we conducted an additional variance partitioning analysis, in which we decomposed the variance in EEG RDMs explained by four RDMs: real-world depth, retinal size, real-world size, and semantic information (from Word2Vec) (Figure S9). We found significant shared variance between Word2Vec and real-world size at 130–150 ms and 180–250 ms. These results indicate a partially overlapping representational structure between semantic content and real-world size in the brain.

      We also added these in our revised manuscript:

      (line 525 to 539) “To further probe the relationship between real-world size and semantic information, and to examine whether Word2Vec captures variances in EEG signals beyond that explained by visual models, we conducted two additional analyses. First, we performed a partial correlation between EEG RDMs and the Word2Vec RDM, while regressing out four ANN RDMs (early and late layers of both ResNet and CLIP) (Figure S8). We found that semantic similarity remained significantly correlated with EEG signals across sustained time windows (100-190ms and 250-300ms), indicating that Word2Vec captures neural variance not fully explained by visual or visual-language models. Second, we conducted a variance partitioning analysis, in which we decomposed the variance in EEG RDMs explained by three hypothesis-based RDMs and the semantic RDM (Word2Vec RDM), and we still found that real-world size explained unique variance in EEG even after accounting for semantic similarity (Figure S9). And we also observed a substantial shared variance jointly explained by real-world size and semantic similarity and a unique variance of semantic information. These results suggest that real-world size is indeed partially semantic in nature, but also has independent neural representation not fully explained by general semantic similarity.”

      Reviewer #3 (Public Review):

      The authors used an open EEG dataset of observers viewing real-world objects. Each object had a real-world size value (from human rankings), a retinal size value (measured from each image), and a scene depth value (inferred from the above). The authors combined the EEG and object measurements with extant, pre-trained models (a deep convolutional neural network, a multimodal ANN, and Word2vec) to assess the time course of processing object size (retinal and real-world) and depth. They found that depth was processed first, followed by retinal size, and then real-world size. The depth time course roughly corresponded to the visual ANNs, while the real-world size time course roughly corresponded to the more semantic models.

      The time course result for the three object attributes is very clear and a novel contribution to the literature. However, the motivations for the ANNs could be better developed, the manuscript could better link to existing theories and literature, and the ANN analysis could be modernized. I have some suggestions for improving specific methods.

      (1) Manuscript motivations

      The authors motivate the paper in several places by asking " whether biological and artificial systems represent object real-world size". This seems odd for a couple of reasons. Firstly, the brain must represent real-world size somehow, given that we can reason about this question. Second, given the large behavioral and fMRI literature on the topic, combined with the growing ANN literature, this seems like a foregone conclusion and undermines the novelty of this contribution.

      Thanks for your helpful comment. We agree that asking whether the brain represents real-world size is not a novel question, given the existing behavioral and neuroimaging evidence supporting this. Our intended focus was not on the existence of real-world size representations per se, but on the nature of these representations, particularly the relationship between the temporal dynamics and potential mechanisms of representations of real-world size versus other related perceptual properties (e.g., retinal size and real-world depth). We revised the relevant sentence to better reflect our focus, shifting from a binary framing (“whether or not size is represented”) to a more mechanistic and time-resolved inquiry (“how and when such representations emerge”):

      (line 144 to 149) “Unraveling the internal representations of object size and depth features in both human brains and ANNs enables us to investigate how distinct spatial properties—retinal size, real-world depth, and real-world size—are encoded across systems, and to uncover the representational mechanisms and temporal dynamics through which real-world size emerges as a potentially higher-level, semantically grounded feature.”

      While the introduction further promises to "also investigate possible mechanisms of object real-world size representations.", I was left wishing for more in this department. The authors report correlations between neural activity and object attributes, as well as between neural activity and ANNs. It would be nice to link the results to theories of object processing (e.g., a feedforward sweep, such as DiCarlo and colleagues have suggested, versus a reverse hierarchy, such as suggested by Hochstein, among others). What is semantic about real-world size, and where might this information come from? (Although you may have to expand beyond the posterior electrodes to do this analysis).

      We thank the reviewer for this insightful comment. We agree that understanding the mechanisms underlying real-world size representations is a critical question. While our current study does not directly test specific theoretical frameworks such as the feedforward sweep model or the reverse hierarchy theory, our results do offer several relevant insights: The temporal dynamics revealed by EEG—where real-world size emerges later than retinal size and depth—suggest that such representations likely arise beyond early visual feedforward stages, potentially involving higher-level semantic processing. This interpretation is further supported by the fact that real-world size is strongly captured by late layers of ANNs and by a purely semantic model (Word2Vec), suggesting its dependence on learned conceptual knowledge.

      While we acknowledge that our analyses were limited to posterior electrodes and thus cannot directly localize the cortical sources of these effects, we view this work as a first step toward bridging low-level perceptual features and higher-level semantic representations. We hope future work combining broader spatial sampling (e.g., anterior EEG sensors or source localization) and multimodal recordings (e.g., MEG, fMRI) can build on these findings to directly test competing models of object processing and representation hierarchy.

      We also added these to the Discussion section:

      (line 619 to 638) “Although our study does not directly test specific models of visual object processing, the observed temporal dynamics provide important constraints for theoretical interpretations. In particular, we find that real-world size representations emerge significantly later than low-level visual features such as retinal size and depth. This temporal profile is difficult to reconcile with a purely feedforward account of visual processing (e.g., DiCarlo et al., 2012), which posits that object properties are rapidly computed in a sequential hierarchy of increasingly complex visual features. Instead, our results are more consistent with frameworks that emphasize recurrent or top-down processing, such as the reverse hierarchy theory (Hochstein & Ahissar, 2002), which suggests that high-level conceptual information may emerge later and involve feedback to earlier visual areas. This interpretation is further supported by representational similarities with late-stage artificial neural network layers and with a semantic word embedding model (Word2Vec), both of which reflect learned, abstract knowledge rather than low-level visual features. Taken together, these findings suggest that real-world size is not merely a perceptual attribute, but one that draws on conceptual or semantic-level representations acquired through experience. While our EEG analyses focused on posterior electrodes and thus cannot definitively localize cortical sources, we see this study as a step toward linking low-level visual input with higher-level semantic knowledge. Future work incorporating broader spatial coverage (e.g., anterior sensors), source localization, or complementary modalities such as MEG and fMRI will be critical to adjudicate between alternative models of object representation and to more precisely trace the origin and flow of real-world size information in the brain.”

      Finally, several places in the manuscript tout the "novel computational approach". This seems odd because the computational framework and pipeline have been the most common approach in cognitive computational neuroscience in the past 5-10 years.

      We have revised relevant statements throughout the manuscript to avoid overstating novelty and to better reflect the contribution of our study.

      (2) Suggestion: modernize the approach

      I was surprised that the computational models used in this manuscript were all 8-10 years old. Specifically, because there are now deep nets that more explicitly model the human brain (e.g., Cornet) as well as more sophisticated models of semantics (e.g., LLMs), I was left hoping that the authors had used more state-of-the-art models in the work. Moreover, the use of a single dCNN, a single multi-modal model, and a single word embedding model makes it difficult to generalize about visual, multimodal, and semantic features in general.

      Thanks for your suggestion. Indeed, our choice of ResNet and CLIP was motivated by their widespread use in cognitive and computational neuroscience. These models have served as standard benchmarks in many studies exploring correspondence between ANNs and human brain activity. To address your concern, we have now added results from a more biologically inspired model, CORnet, in the supplementary materials (Figure S10). The results for CORnet show similar patterns to those observed for ResNet and CLIP, providing converging evidence across models.

      Regarding semantic modeling, we intentionally chose Word2Vec rather than large language models (LLMs), because our goal was to examine concept-level, context-free semantic representations. Word2Vec remains the most widely adopted approach for obtaining non-contextualized embeddings that reflect core conceptual similarity, as opposed to the context-dependent embeddings produced by LLMs, which are less directly suited for capturing stable concept-level structure across stimuli.

      (3) Methodological considerations

      (a) Validity of the real-world size measurement

      I was concerned about a few aspects of the real-world size rankings. First, I am trying to understand why the scale goes from 100-519. This seems very arbitrary; please clarify. Second, are we to assume that this scale is linear? Is this appropriate when real-world object size is best expressed on a log scale? Third, the authors provide "sand" as an example of the smallest real-world object. This is tricky because sand is more "stuff" than "thing", so I imagine it leaves observers wondering whether the experimenter intends a grain of sand or a sandy scene region. What is the variability in real-world size ratings? Might the variability also provide additional insights in this experiment?

      We now clarify the origin, scaling, and interpretation of the real-world size values obtained from the THINGS+ dataset.

      In their experiment, participants first rated the size of a single object concept (word shown on the screen) by clicking on a continuous slider of 520 units, which was anchored by nine familiar real-world reference objects (e.g., “grain of sand,” “microwave oven,” “aircraft carrier”) that spanned the full expected size range on a logarithmic scale. Importantly, participants were not shown any numerical values on the scale—they were guided purely by the semantic meaning and relative size of the anchor objects. After the initial response, the scale zoomed in around the selected region (covering 160 units of the 520-point scale) and presented finer anchor points between the previous reference objects. Participants then refined their rating by dragging from the lower to upper end of the typical size range for that object. If the object was standardized in size (e.g., “soccer ball”), a single click sufficed. These size judgments were collected across at least 50 participants per object, and final scores were derived from the central tendency of these responses. Although the final size values numerically range from 0 to 519 (after scaling), this range is not known to participants and is only applied post hoc to construct the size RDMs.

      Regarding the term “sand”: the THINGS+ dataset distinguished between object meanings when ambiguity was present. For “sand,” participants were instructed to treat it as “a grain of sand”— consistent with the intended meaning of a discrete, minimal-size reference object. 

      Finally, we acknowledge that real-world size ratings may carry some degree of variability across individuals. However, the dataset includes ratings from 2010 participants across 1854 object concepts, with each object receiving at least 50 independent ratings. Given this large and diverse sample, the mean size estimates are expected to be stable and robust across subjects. While we did not include variability metrics in our main analysis, we believe the aggregated ratings provide a reliable estimate of perceived real-world size.

      We added these details in the Materials and Method section:

      (line 219 to 230) “In the THINGS+ dataset, 2010 participants (different from the subjects in THINGS EEG2) did an online size rating task and completed a total of 13024 trials corresponding to 1854 object concepts using a two-step procedure. In their experiment, first, each object was rated on a 520-unit continuous slider anchored by familiar reference objects (e.g., “grain of sand,” “microwave oven,” “aircraft carrier”) representing a logarithmic size range. Participants were not shown numerical values but used semantic anchors as guides. In the second step, the scale zoomed in around the selected region to allow for finer-grained refinement of the size judgment. Final size values were derived from aggregated behavioral data and rescaled to a range of 0–519 for consistency across objects, with the actual mean ratings across subjects ranging from 100.03 (‘grain of sand’) to 423.09 (‘subway’).”

      (b) This work has no noise ceiling to establish how strong the model fits are, relative to the intrinsic noise of the data. I strongly suggest that these are included.

      We have now computed noise ceiling estimates for the EEG RDMs across time. The noise ceiling was calculated by correlating each participant’s EEG RDM with the average EEG RDM across the remaining participants (leave-one-subject-out), at each time point. This provides an upper-bound estimate of the explainable variance, reflecting the maximum similarity that any model—no matter how complex—could potentially achieve, given the intrinsic variability in the EEG data.
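
      The leave-one-subject-out procedure described above can be sketched as follows. This is an illustrative sketch rather than the study's code: it assumes the EEG RDMs are stored as a vectorized array of shape (subjects, timepoints, stimulus pairs), uses Pearson correlation because the correlation type is not specified here, and the function name `loso_noise_ceiling` is hypothetical.

```python
import numpy as np

def loso_noise_ceiling(rdm_vectors):
    """Leave-one-subject-out noise ceiling per timepoint.

    rdm_vectors: array of shape (n_subjects, n_timepoints, n_pairs) holding each
    subject's vectorized EEG RDM at each timepoint.
    Returns an array of shape (n_timepoints,).
    """
    n_sub, n_time, _ = rdm_vectors.shape
    total = rdm_vectors.sum(axis=0)
    ceiling = np.zeros(n_time)
    for s in range(n_sub):
        # Average RDM of all remaining subjects
        others_mean = (total - rdm_vectors[s]) / (n_sub - 1)
        for t in range(n_time):
            # Assumption: Pearson correlation between the held-out subject and the group mean
            ceiling[t] += np.corrcoef(rdm_vectors[s, t], others_mean[t])[0, 1]
    return ceiling / n_sub
```

      The returned time course can be plotted alongside the EEG–model similarity time courses to judge how much of the reliable signal a given model captures.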

      Importantly, the observed EEG–model similarity values are substantially below this upper bound. This outcome is fully expected: Each of our model RDMs (e.g., real-world size, ANN layers) captures only a specific aspect of the neural representational structure, rather than attempting to account for the totality of the EEG signal. Our goal is not to optimize model performance or maximize fit, but to probe which components of object information are reflected in the spatiotemporal dynamics of the brain’s responses.

      For clarity and accessibility of the main findings, we present the noise ceiling time courses separately in the supplementary materials (Figure S7). Including them directly in the EEG × HYP or EEG × ANN plots would conflate distinct interpretive goals: the model RDMs are hypothesis-driven probes of specific representational content, whereas the noise ceiling offers a normative upper bound for total explainable variance. Keeping these separate ensures each visualization remains focused and interpretable. 

      Reviewer #1 (Recommendations For The Authors):

      Some analyses are incomplete, which would be improved if the authors showed analyses with other layers of the networks and various additional partial correlation analyses.

      Clarity

      (1) Partial correlations methods incomplete - it is not clear what is being partialled out in each analysis. It is possible to guess sometimes, but it is not entirely clear for each analysis. This is important as it is difficult to assess if the partial correlations are sensible/correct in each case. Also, the Figure 1 caption is short and unclear.

      For example, ANN-EEG partial correlations - "Finally, we directly compared the timepoint-by-timepoint EEG neural RDMs and the ANN RDMs (Figure 3F). The early layer representations of both ResNet and CLIP were significantly correlated with early representations in the human brain" What is being partialled out? Figure 3F says partial correlation

      We apologize for the confusion. We made several key clarifications and corrections in the revised version.

      First, we identified and corrected a labeling error in both Figure 1 and Figure 3F. Specifically, our EEG × ANN analysis used Spearman correlation, not partial correlation as mistakenly indicated in the original figure label and text. We conducted partial correlations for EEG × HYP and ANN × HYP. For EEG × ANN, however, we directly calculated the correlation between the EEG RDMs and the ANN RDM of each layer. We corrected these errors: (1) In Figure 1, we removed the erroneous “partial” label from the EEG × ANN path and updated the caption to clearly outline which comparisons used partial correlation. (2) In Figure 3F, we corrected the Y-axis label to “(correlation)”.

      Second, to improve clarity, we have now revised the Materials and Methods section to explicitly describe what is partialled out in each partial correlation analysis:

      (line 284 to 286) “In EEG × HYP partial correlation (Figure 3D), we correlated EEG RDMs with one hypothesis-based RDM (e.g., real-world size), while controlling for the other two (retinal size and real-world depth).”

      (line 303 to 305) “In ANN (or W2V) × HYP partial correlation (Figure 3E and Figure 5A), we correlated ANN (or W2V) RDMs with one hypothesis-based RDM (e.g., real-world size), while partialling out the other two.”
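
      As an illustration of what "partialling out" means in these analyses, a minimal sketch is given below. It is not the code used in the study: it assumes a rank-based (Spearman) partial correlation computed by residualizing ranked RDM vectors, and the helper names `rank_average`, `lower_triangle`, and `partial_spearman` are hypothetical.

```python
import numpy as np

def rank_average(x):
    """Ranks starting at 1, with tied values receiving their average rank."""
    x = np.asarray(x, dtype=float)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    starts = ends - counts
    return ((starts + ends + 1) / 2.0)[inverse]

def lower_triangle(rdm):
    """Vectorize a square RDM by taking the lower triangle without the diagonal."""
    i, j = np.tril_indices(rdm.shape[0], k=-1)
    return rdm[i, j]

def partial_spearman(x, y, covariates):
    """Spearman partial correlation between x and y, controlling for the covariates."""
    rx, ry = rank_average(x), rank_average(y)
    Z = np.column_stack([np.ones(len(rx))] + [rank_average(c) for c in covariates])
    # Remove the part of each ranked vector that the covariates explain linearly
    res_x = rx - Z @ np.linalg.lstsq(Z, rx, rcond=None)[0]
    res_y = ry - Z @ np.linalg.lstsq(Z, ry, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])
```

      For the EEG × HYP comparison, `x` would be the vectorized EEG RDM at one timepoint, `y` the real-world size RDM, and `covariates` the retinal size and real-world depth RDMs.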

      Finally, the caption of Figure 1 has been expanded to clarify the full analysis pipeline and explicitly specify the partial correlation or correlation in each comparison.

      (line 327 to 332) “Figure 1 Overview of our analysis pipeline including constructing three types of RDMs and conducting comparisons between them. We computed RDMs from three sources: neural data (EEG), hypothesized object features (real-world size, retinal size, and real-world depth), and artificial models (ResNet, CLIP, and Word2Vec). Then we conducted cross-modal representational similarity analyses between: EEG × HYP (partial correlation, controlling for other two HYP features), ANN (or W2V) × HYP (partial correlation, controlling for other two HYP features), and EEG × ANN (correlation).”

      We believe these revisions now make all analytic comparisons and correlation types fully clear and interpretable.

      Issues / open questions

      (2) Semantic representations vs hypothesized (hyp) RDMs (real-world size, etc) - are the representations explained by variables in hyp RDMs or are there semantic representations over and above these? E.g., For ANN correlation with the brain, you could partial out hyp RDMs - and assess whether there is still semantic information left over, or is the variance explained by the hyp RDMs?

      Thank you for this suggestion. As you suggested, we conducted the partial correlation analysis between EEG RDMs and ANN RDMs, controlling for the three hypothesis-based RDMs. The results (Figure S6) revealed that the EEG×ANN representational similarity remained largely unchanged, indicating that ANN representations capture substantial additional representational structure not accounted for by the current hypothesized features. This is also consistent with the observation that the EEG×HYP partial correlations were themselves small, whereas the EEG×ANN correlations were much larger.

      We also added this statement to the main text:

      (line 446 to 451) “To contextualize how much of the shared variance between EEG and ANN representations is driven by the specific visual object features we tested above, we conducted a partial correlation analysis between EEG RDMs and ANN RDMs controlling for the three hypothesis-based RDMs (Figure S6). The EEG×ANN similarity results remained largely unchanged, suggesting that ANN representations capture rich additional representational structure beyond these features.”

      (3) Why only early and late layers? I can see how it's clearer to present the EEG results. However, the many layers in these networks are an opportunity - we can see how simple/complex linear/non-linear the transformation is over layers in these models. It would be very interesting and informative to see if the correlations do in fact linearly increase from early to later layers, or if the story is a bit more complex. If not in the main text, then at least in the supplement.

      Thank you for the thoughtful suggestion. To address this point, we have computed the EEG correlations with multiple layers in both ResNet and CLIP models (ResNet: ResNet.maxpool, ResNet.layer1, ResNet.layer2, ResNet.layer3, ResNet.layer4, ResNet.avgpool; CLIP: CLIP.visual.avgpool, CLIP.visual.layer1, CLIP.visual.layer2, CLIP.visual.layer3, CLIP.visual.layer4, CLIP.visual.attnpool). The results, now included in Figure S4 and S5, show a consistent trend: early layers exhibit higher similarity to early EEG time points, and deeper layers show increased similarity to later EEG stages. We chose to highlight early and late layers in the main text to simplify interpretation, but now provide the full layerwise profile for completeness.
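
      A layerwise profile of this kind can be sketched as below. The sketch is illustrative and is not the study's code: it assumes the layer RDMs have already been computed and vectorized, uses a plain Spearman correlation (ranks followed by Pearson), and the function names are hypothetical.

```python
import numpy as np

def rank_average(x):
    """Ranks starting at 1, with tied values receiving their average rank."""
    x = np.asarray(x, dtype=float)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)
    starts = ends - counts
    return ((starts + ends + 1) / 2.0)[inverse]

def layerwise_timecourse(eeg, layer_rdms):
    """Spearman correlation between every layer RDM and the EEG RDM at every timepoint.

    eeg: array of shape (n_timepoints, n_pairs) with vectorized EEG RDMs.
    layer_rdms: dict mapping a layer name (e.g. "ResNet.layer1") to a vector of n_pairs values.
    Returns a dict mapping each layer name to an array of shape (n_timepoints,).
    """
    eeg_ranks = np.array([rank_average(v) for v in eeg])
    result = {}
    for name, vec in layer_rdms.items():
        layer_ranks = rank_average(vec)
        result[name] = np.array(
            [np.corrcoef(r, layer_ranks)[0, 1] for r in eeg_ranks]
        )
    return result
```

      Comparing the timepoint of the maximum across layers gives a simple view of whether deeper layers align with later EEG stages.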

      (4) Peak latency analysis - Estimating peaks per ppt is presumably noisy, so it seems important to show how reliable this is. One option is to find the bootstrapped mean latencies per subject.

      Thanks for your suggestion. To estimate the robustness of peak latency values, we implemented a bootstrap procedure by resampling the pairwise entries of the EEG RDM with replacement. For each bootstrap sample, we computed a new EEG RDM and recalculated the partial correlation time course with the hypothesis RDMs. We then extracted the peak latency within the predefined significant time window. Repeating this process 1000 times allowed us to obtain the bootstrapped mean latency per subject as a more stable peak latency estimate. Notably, the bootstrapped results showed minimal deviation from the original latency estimates, confirming the robustness of our findings. Accordingly, we updated Figure 3D and added the following to the Materials and Methods section:

      (line 289 to 298) “To assess the stability of peak latency estimates for each subject, we performed a bootstrap procedure across stimulus pairs. At each time point, the EEG RDM was vectorized by extracting the lower triangle (excluding the diagonal), resulting in 19,900 unique pairwise values. For each bootstrap sample, we resampled these 19,900 pairwise entries with replacement to generate a new pseudo-RDM of the same size. We then computed the partial correlation between the EEG pseudo-RDM and a given hypothesis RDM (e.g., real-world size), controlling for other feature RDMs, and obtained a time course of partial correlations. Repeating this procedure 1000 times and extracting the peak latency within the significant time window yielded a distribution of bootstrapped latencies, from which we got the bootstrapped mean latencies per subject.”
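
      The bootstrap described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the study's code: the same resampled pair indices are applied to the EEG, hypothesis, and control RDM vectors and are shared across timepoints; a Pearson partial correlation on raw values stands in for the rank-based version for brevity; and the function names are hypothetical.

```python
import numpy as np

def partial_corr(x, y, covariates):
    """Pearson partial correlation between x and y, controlling for the covariates."""
    Z = np.column_stack([np.ones(len(x))] + list(covariates))
    res_x = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    res_y = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]

def bootstrap_peak_latency(eeg, target, controls, times, window, n_boot=1000, seed=0):
    """Bootstrapped peak latency for one subject.

    eeg: (n_timepoints, n_pairs) vectorized EEG RDMs; target: (n_pairs,) hypothesis RDM;
    controls: list of (n_pairs,) RDM vectors to partial out; times: (n_timepoints,) in ms;
    window: (start_ms, end_ms) significant time window.
    Returns the mean bootstrapped latency and the full distribution.
    """
    rng = np.random.default_rng(seed)
    in_window = np.flatnonzero((times >= window[0]) & (times <= window[1]))
    n_pairs = eeg.shape[1]
    peaks = np.empty(n_boot)
    for b in range(n_boot):
        # Resample stimulus pairs with replacement (assumption: same indices for all RDMs)
        idx = rng.integers(0, n_pairs, size=n_pairs)
        ctrl = [c[idx] for c in controls]
        r = [partial_corr(eeg[t, idx], target[idx], ctrl) for t in in_window]
        peaks[b] = times[in_window[int(np.argmax(r))]]
    return peaks.mean(), peaks
```

      With 200 stimuli, `n_pairs` is 19,900, matching the number of lower-triangle entries mentioned above.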

      (5) "Due to our calculations being at the object level, if there were more than one of the same objects in an image, we cropped the most complete one to get a more accurate retinal size. " Did EEG experimenters make sure everyone sat the same distance from the screen? and remain the same distance? This would also affect real-world depth measures.

      Yes, the EEG dataset we used (THINGS EEG2; Gifford et al., 2022) was collected under carefully controlled experimental conditions. We have confirmed that all participants were seated at a fixed distance of 0.6 meters from the screen throughout the experiment. We also added this information in the method (line 156 to 157).

      Minor issues/questions - note that these are not raised in the Public Review

      (6) Title - less about rigor/quality of the work but I feel like the title could be improved/extended. The work tells us not only about real object size, but also retinal size and depth. In fact, isn't the most novel part of this the real-world depth aspect? Furthermore, it feels like the current title restricts its relevance and impact... Also doesn't touch on the temporal aspect, or processing stages, which is also very interesting. There may be something better, but simply adding something like"...disentangled features of real-world size, depth, and retinal size over time OR processing stages".

      Thanks for your suggestion! We changed our title – “Human EEG and artificial neural networks reveal disentangled representations and processing timelines of object real-world size and depth in natural images”.

      (7) "Each subject viewed 16740 images of objects on a natural background for 1854 object concepts from the THINGS dataset (Hebart et al., 2019). For the current study, we used the 'test' dataset portion, which includes 16000 trials per subject corresponding to 200 images." Why test images? Worth explaining.

      We chose to use the “test set” of the THINGS EEG2 dataset for the following two reasons:

      (1) Higher trial count per condition: In the test set, each of the 200 object images was presented 80 times per subject, whereas in the training set, each image was shown only 4 times. This much higher trial count per condition in the test set allows for a substantially higher signal-to-noise ratio in the EEG data.

      (2) Improved decoding reliability: Our analysis relies on constructing EEG RDMs based on pairwise decoding accuracy using linear SVM classifiers. Reliable decoding estimates require a sufficient number of trials per condition. The test set design is thus better suited to support high-fidelity decoding and robust representational similarity analysis.

      We also added these explanations to our revised manuscript (line 161 to 164).

      (8) "For Real-World Size RDM, we obtained human behavioral real-world size ratings of each object concept from the THINGS+ dataset (Stoinski et al., 2022).... The range of possible size ratings was from 0 to 519 in their online size rating task..." How were the ratings made? What is this scale - do people know the numbers? Was it on a continuous slider?

      We should clarify how the real-world size values were obtained from the THINGS+ dataset.

      In their experiment, participants first rated the size of a single object concept (word shown on the screen) by clicking on a continuous slider of 520 units, which was anchored by nine familiar real-world reference objects (e.g., “grain of sand,” “microwave oven,” “aircraft carrier”) that spanned the full expected size range on a logarithmic scale. Importantly, participants were not shown any numerical values on the scale—they were guided purely by the semantic meaning and relative size of the anchor objects. After the initial response, the scale zoomed in around the selected region (covering 160 units of the 520-point scale) and presented finer anchor points between the previous reference objects. Participants then refined their rating by dragging from the lower to upper end of the typical size range for that object. If the object was standardized in size (e.g., “soccer ball”), a single click sufficed. These size judgments were collected across at least 50 participants per object, and final scores were derived from the central tendency of these responses. Although the final size values numerically range from 0 to 519 (after scaling), this range is not known to participants and is only applied post hoc to construct the size RDMs.

      We added these details in the Materials and Methods section:

      (line 219 to 230) “In the THINGS+ dataset, 2010 participants (different from the subjects in THINGS EEG2) did an online size rating task and completed a total of 13024 trials corresponding to 1854 object concepts using a two-step procedure. In their experiment, first, each object was rated on a 520-unit continuous slider anchored by familiar reference objects (e.g., “grain of sand,” “microwave oven,” “aircraft carrier”) representing a logarithmic size range. Participants were not shown numerical values but used semantic anchors as guides. In the second step, the scale zoomed in around the selected region to allow for finer-grained refinement of the size judgment. Final size values were derived from aggregated behavioral data and rescaled to a range of 0–519 for consistency across objects, with the actual mean ratings across subjects ranging from 100.03 (‘grain of sand’) to 423.09 (‘subway’).”

      (9) "For Retinal Size RDM, we applied Adobe Photoshop (Adobe Inc., 2019) to crop objects corresponding to object labels from images manually... " Was this by one person? Worth noting, and worth sharing these values per image if not already for other researchers as it could be a valuable resource (and increase citations).

      Yes, all object cropping was performed consistently by one of the authors to ensure uniformity across images. We agree that this dataset could be a useful resource to the community. We have now made the cropped object images publicly available at https://github.com/ZitongLu1996/RWsize.

      We also updated the manuscript accordingly to note this (line 236 to 239).

      (10) "Neural RDMs. From the EEG signal, we constructed timepoint-by-timepoint neural RDMs for each subject with decoding accuracy as the dissimilarity index " Decoding accuracy is presumably a similarity index. Maybe 1-accuracy (proportion correct) for dissimilarity?

      Decoding accuracy is a dissimilarity index instead of a similarity index, as higher decoding accuracy between two conditions indicates that they are more distinguishable – i.e., less similar – in the neural response space. This approach aligns with prior work using classification-based representational dissimilarity measures (Grootswagers et al., 2017; Xie et al., 2020), where better decoding implies greater dissimilarity between conditions. Therefore, there is no need to invert the decoding accuracy values (e.g., using 1 - accuracy).

      Grootswagers, T., Wardle, S. G., & Carlson, T. A. (2017). Decoding dynamic brain patterns from evoked responses: A tutorial on multivariate pattern analysis applied to time series neuroimaging data. Journal of Cognitive Neuroscience, 29(4), 677-697.

      Xie, S., Kaiser, D., & Cichy, R. M. (2020). Visual imagery and perception share neural representations in the alpha frequency band. Current Biology, 30(13), 2621-2627.
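
      The point that decoding accuracy already behaves as a dissimilarity can be illustrated with a small synthetic sketch. For a self-contained, dependency-free example, a leave-one-out nearest-class-mean classifier is swapped in for the linear SVM used in the study; the function names, the 17-value trial vectors, and the simulated data are all assumptions made for illustration.

```python
import random
from itertools import combinations


def nearest_mean_accuracy(trials_a, trials_b):
    """Leave-one-out accuracy of a nearest-class-mean classifier for two conditions."""
    def centroid(trials):
        return [sum(col) / len(trials) for col in zip(*trials)]

    def sqdist(u, v):
        return sum((p - q) ** 2 for p, q in zip(u, v))

    labelled = [(t, 0) for t in trials_a] + [(t, 1) for t in trials_b]
    correct = 0
    for held_out, (x, label) in enumerate(labelled):
        train = [pair for k, pair in enumerate(labelled) if k != held_out]
        means = [centroid([t for t, lab in train if lab == c]) for c in (0, 1)]
        pred = 0 if sqdist(x, means[0]) <= sqdist(x, means[1]) else 1
        correct += pred == label
    return correct / len(labelled)


def decoding_rdm(data):
    """data[c] is a list of trials (each a list of channel values) for condition c."""
    n = len(data)
    rdm = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Higher pairwise accuracy = more distinguishable = more dissimilar.
        rdm[i][j] = rdm[j][i] = nearest_mean_accuracy(data[i], data[j])
    return rdm


rng = random.Random(0)
offsets = [0.0, 0.1, 3.0]  # conditions 0 and 1 are nearly identical; 2 is distinct
data = [[[off + rng.gauss(0, 1) for _ in range(17)] for _ in range(20)]
        for off in offsets]
rdm = decoding_rdm(data)
print("0 vs 1:", rdm[0][1], " 0 vs 2:", rdm[0][2])
```

      The nearly identical pair decodes near chance while the distinct pair decodes far better, so the accuracy matrix can be used directly as an RDM without computing 1 - accuracy.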

      (11) Figure 1 caption is very short - Could do with a more complete caption. Unclear what the partial correlations are (what is being partialled out in each case), what are the comparisons "between them" - both in the figure and the caption. Details should at least be in the main text.

      This relates to your comment (1); we revised the caption and the corresponding text accordingly.

      Reviewer #2 (Recommendations For The Authors):

      (1) Intro:

      Quek et al., (2023) is referred to as a behavioral study, but it has EEG analyses.

      We corrected this – “…, one recent study (Quek et al., 2023) …”

      The phrase 'high temporal resolution EEG' is a bit strange - isn't all EEG high temporal resolution? Especially when down-sampling to 100 Hz (40 time points/epoch) this does not qualify as particularly high-res.

      We removed this phrasing in our manuscript.

      (2) Methods:

      It would be good to provide more details on the EEG preprocessing. Were the data low-pass filtered, for example?

      We added more details to the manuscript:

      (line 167 to 174) “The EEG data were originally sampled at 1000 Hz and online-filtered between 0.1 Hz and 100 Hz during acquisition, with recordings referenced to the Fz electrode. For preprocessing, no additional filtering was applied. Baseline correction was performed by subtracting the mean signal during the 100 ms pre-stimulus interval from each trial and channel separately. We used already preprocessed data from 17 channels with labels beginning with “O” or “P” (O1, Oz, O2, PO7, PO3, POz, PO4, PO8, P7, P5, P3, P1, Pz, P2) ensuring full coverage of posterior regions typically involved in visual object processing. The epoched data were then down-sampled to 100 Hz.”
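
      As a rough illustration of the two preprocessing steps named above, the sketch below baseline-corrects one single-channel epoch and reduces it from 1000 Hz to 100 Hz. The helper names are hypothetical, the epoch is assumed to run from -100 ms to +300 ms (consistent with the 40 time points per epoch mentioned by the reviewer), and plain decimation is assumed because the exact resampling routine is not stated here.

```python
import math


def baseline_correct(epoch, sfreq, baseline_s=0.1):
    """Subtract the mean of the pre-stimulus samples from every sample.

    Assumes the epoch starts exactly baseline_s seconds before stimulus onset.
    """
    n_base = round(baseline_s * sfreq)
    baseline = sum(epoch[:n_base]) / n_base
    return [v - baseline for v in epoch]


def downsample(epoch, sfreq, new_sfreq):
    """Keep every k-th sample (plain decimation, an assumption of this sketch)."""
    step = round(sfreq / new_sfreq)
    return epoch[::step]


sfreq = 1000
# Synthetic 400 ms epoch: a constant 5.0 offset plus a 10 Hz oscillation.
epoch = [5.0 + math.sin(2 * math.pi * 10 * (n / sfreq)) for n in range(400)]

corrected = baseline_correct(epoch, sfreq)
resampled = downsample(corrected, sfreq, 100)
print(len(resampled))  # 40
```

      In the study this would be applied to each trial and channel separately, as the quoted Methods text specifies.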

      It is important to provide more motivation about the specific ANN layers chosen. Were these layers cherry-picked, or did they truly represent a gradual shift over the course of layers?

      We appreciate the reviewer’s concern and fully agree that it is important to ensure transparency in how ANN layers were selected. The early and late layers reported in the main text were not cherry-picked to maximize effects, but rather intended to serve as illustrative examples representing the lower and higher ends of the network hierarchy. To address this point directly, we have computed the EEG correlations with multiple layers in both ResNet and CLIP models (ResNet: ResNet.maxpool, ResNet.layer1, ResNet.layer2, ResNet.layer3, ResNet.layer4, ResNet.avgpool; CLIP: CLIP.visual.avgpool, CLIP.visual.layer1, CLIP.visual.layer2, CLIP.visual.layer3, CLIP.visual.layer4, CLIP.visual.attnpool). The results, now included in Figure S4, show a consistent trend: early layers exhibit higher similarity to early EEG time points, and deeper layers show increased similarity to later EEG stages.

      It is important to provide more specific information about the specific ANN layers chosen. 'Second convolutional layer': is this block 2, the ReLu layer, the maxpool layer? What is the 'last visual layer'?

      We apologize for the confusion. We added more details about the chosen layers:

      (line 255 to 257) “The early layer in ResNet refers to ResNet.maxpool layer, and the late layer in ResNet refers to ResNet.avgpool layer. The early layer in CLIP refers to CLIP.visual.avgpool layer, and the late layer in CLIP refers to CLIP.visual.attnpool layer.”

      Again the claim 'novel' is a bit overblown here since the real-world size ratings were also already collected as part of THINGS+, so all data used here is available.

      We removed this phrasing in our manuscript.

      Real-world size ratings ranged 'from 0 - 519'; it seems unlikely this was the actual scale presented to subjects, I assume it was some sort of slider?

      You are correct. We should clarify how the real-world size values were obtained from the THINGS+ dataset.

      In their experiment, participants first rated the size of a single object concept (word shown on the screen) by clicking on a continuous slider of 520 units, which was anchored by nine familiar real-world reference objects (e.g., “grain of sand,” “microwave oven,” “aircraft carrier”) that spanned the full expected size range on a logarithmic scale. Importantly, participants were not shown any numerical values on the scale—they were guided purely by the semantic meaning and relative size of the anchor objects. After the initial response, the scale zoomed in around the selected region (covering 160 units of the 520-point scale) and presented finer anchor points between the previous reference objects. Participants then refined their rating by dragging from the lower to upper end of the typical size range for that object. If the object was standardized in size (e.g., “soccer ball”), a single click sufficed. These size judgments were collected across at least 50 participants per object, and final scores were derived from the central tendency of these responses. Although the final size values numerically range from 0 to 519 (after scaling), this range is not known to participants and is only applied post hoc to construct the size RDMs.

      We added these details in the Materials and Methods section:

      (line 219 to 230) “In the THINGS+ dataset, 2010 participants (different from the subjects in THINGS EEG2) did an online size rating task and completed a total of 13024 trials corresponding to 1854 object concepts using a two-step procedure. In their experiment, first, each object was rated on a 520-unit continuous slider anchored by familiar reference objects (e.g., “grain of sand,” “microwave oven,” “aircraft carrier”) representing a logarithmic size range. Participants were not shown numerical values but used semantic anchors as guides. In the second step, the scale zoomed in around the selected region to allow for finer-grained refinement of the size judgment. Final size values were derived from aggregated behavioral data and rescaled to a range of 0–519 for consistency across objects, with the actual mean ratings across subjects ranging from 100.03 (‘grain of sand’) to 423.09 (‘subway’).”

      Why is conducting a one-tailed (p<0.05) test valid for EEG-ANN comparisons? Shouldn't this be two-tailed?

      Our use of one-tailed tests was based on the directional hypothesis that representational similarity between EEG and ANN RDMs would be positive, as supported by prior literature showing correspondence between hierarchical neural networks and human brain representations (e.g., Cichy et al., 2016; Kuzovkin et al., 2018). This is consistent with a large number of RSA studies which conduct one-tailed tests (i.e., testing the hypothesis that coefficients were greater than zero: e.g., Kuzovkin et al., 2018; Nili et al., 2014; Hebart et al., 2018; Kaiser et al., 2019; Kaiser et al., 2020; Kaiser et al., 2022). Thus, we specifically tested whether the similarity was significantly greater than zero.

      Cichy, R. M., Khosla, A., Pantazis, D., Torralba, A., & Oliva, A. (2016). Comparison of deep neural networks to spatio-temporal cortical dynamics of human visual object recognition reveals hierarchical correspondence. Scientific Reports, 6(1), 27755.

      Kuzovkin, I., Vicente, R., Petton, M., Lachaux, J. P., Baciu, M., Kahane, P., ... & Aru, J. (2018). Activations of deep convolutional neural networks are aligned with gamma band activity of human visual cortex. Communications Biology, 1(1), 107.

      Nili, H., Wingfield, C., Walther, A., Su, L., Marslen-Wilson, W., & Kriegeskorte, N. (2014). A toolbox for representational similarity analysis. PLoS Computational Biology, 10(4), e1003553.

      Hebart, M. N., Bankson, B. B., Harel, A., Baker, C. I., & Cichy, R. M. (2018). The representational dynamics of task and object processing in humans. eLife, 7, e32816.

      Kaiser, D., Turini, J., & Cichy, R. M. (2019). A neural mechanism for contextualizing fragmented inputs during naturalistic vision. eLife, 8, e48182.

      Kaiser, D., Inciuraite, G., & Cichy, R. M. (2020). Rapid contextualization of fragmented scene information in the human visual system. NeuroImage, 219, 117045.

      Kaiser, D., Jacobs, A. M., & Cichy, R. M. (2022). Modelling brain representations of abstract concepts. PLoS Computational Biology, 18(2), e1009837.

      Importantly, we note that using a two-tailed test instead would not change the significance of our results. However, we believe the one-tailed test remains more appropriate given our theoretical prediction of positive similarity between ANN and brain representations.

      The sentence on the partial correlation description (page 11 'we calculated partial correlations with one-tailed test against the alternative hypothesis that the partial correlation was positive (greater than zero)') didn't make sense to me; are you referring to the null hypothesis here?

      We revised this sentence to clarify that we tested against the null hypothesis that the partial correlation was less than or equal to zero, using a one-tailed test to assess whether the correlation was significantly greater than zero.

      (line 281 to 284) “…, we calculated partial correlations and used a one-tailed test against the null hypothesis that the partial correlation was less than or equal to zero, testing whether the partial correlation was significantly greater than zero.”
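
      To make the direction of the test concrete, the sketch below computes an exact one-tailed p-value for the null hypothesis that the group-level partial correlation is less than or equal to zero. It uses a sign-flip permutation test across subjects, which is an illustrative choice and not necessarily the test used in the manuscript; the function name and the ten subject values are hypothetical.

```python
from itertools import product


def sign_flip_p_one_tailed(values):
    """Exact one-tailed p-value for H1: mean > 0.

    Under the null, each subject's value is equally likely to be positive or
    negative, so every sign pattern is enumerated and the share of patterns
    whose sum is at least as large as the observed sum is returned.
    """
    observed = sum(values)
    count = total = 0
    for signs in product((1, -1), repeat=len(values)):
        total += 1
        if sum(s * v for s, v in zip(signs, values)) >= observed:
            count += 1
    return count / total


# Hypothetical partial correlations for 10 subjects at one time point.
r_partial = [0.04, 0.06, 0.02, 0.05, 0.03, 0.07, 0.01, 0.05, 0.04, 0.06]
p = sign_flip_p_one_tailed(r_partial)
print(p)  # 0.0009765625, i.e. 1/1024: only the unflipped pattern reaches the observed sum
```

      Only large positive values count as evidence against the null here; a strongly negative group mean would give a p-value close to 1, which is what distinguishes this from a two-tailed test.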

      (3) Results:

      I would prevent the use of the word 'pure', your measurement is one specific operationalization of this concept of real-world size that is not guaranteed to result in unconfounded representations. This is in fact impossible whenever one is using a finite set of natural stimuli and calculating metrics on those - there can always be a factor or metric that was not considered that could explain some of the variance in your measurement. It is overconfident to claim to have achieved some form of Platonic ideal here and to have taken into account all confounds.

      Your point is well taken. Our original use of the term “pure” was intended to reflect statistical control for known confounding factors, but we recognize that this wording may imply a stronger claim than warranted. In response, we revised all relevant language in the manuscript to instead describe the statistically isolated or relatively unconfounded representation of real-world size, clarifying that our findings pertain to the unique contribution of real-world size after accounting for retinal size and real-world depth.

      Figure 2C: It's not clear why peak latencies are computed on the 'full' correlations rather than the partial ones.

      No. The peak latency results in Figure 2C were computed on the partial correlation results – we mentioned this in the figure caption – “Temporal latencies for peak similarity (partial Spearman correlations) between EEG and the 3 types of object information.”

      SEM = SEM across the 10 subjects?

      Yes. We added this in the figure caption.

      Figure 3F y-axis says it's partial correlations but not clear what is partialled out here.

      We identified and corrected a labeling error in both Figure 1 and Figure 3F. Specifically, our EEG × ANN analysis used Spearman correlation, not partial correlation as mistakenly indicated in the original figure label and text. We conducted partial correlations for EEG × HYP and ANN × HYP, but for EEG × ANN we directly calculated the correlation between the EEG RDMs and the ANN RDM of each layer. We corrected these errors: (1) In Figure 1, we removed the erroneous “partial” label from the EEG × ANN path and updated the caption to clearly outline which comparisons used partial correlation. (2) In Figure 3F, we corrected the Y-axis label to “(correlation)”.

      Reviewer #3 (Recommendations For The Authors):

      (1) Several methodologies should be clarified:

      (a) It's stated that EEG was sampled at 100 Hz. I assume this was downsampled? From what original frequency?

      Yes. We added more details about the EEG data:

      (line 167 to 174) “The EEG data were originally sampled at 1000 Hz and online-filtered between 0.1 Hz and 100 Hz during acquisition, with recordings referenced to the Fz electrode. For preprocessing, no additional filtering was applied. Baseline correction was performed by subtracting the mean signal during the 100 ms pre-stimulus interval from each trial and channel separately. We used already preprocessed data from 17 channels with labels beginning with “O” or “P” (O1, Oz, O2, PO7, PO3, POz, PO4, PO8, P7, P5, P3, P1, Pz, P2) ensuring full coverage of posterior regions typically involved in visual object processing. The epoched data were then down-sampled to 100 Hz.”

      (b) Why was decoding accuracy used as the human RDM method rather than the EEG data themselves?

      Thanks for your question! We would like to address why we used decoding accuracy for EEG RDMs rather than correlation. While fMRI RDMs are typically calculated using 1 minus the correlation coefficient, decoding accuracy is more commonly used for EEG RDMs (Grootswagers et al., 2017; Xie et al., 2020). The primary reason is that EEG signals are more susceptible to noise than fMRI data. Correlation-based methods are particularly sensitive to noise and may not reliably capture the functional differences between EEG patterns for different conditions. Decoding accuracy, by training classifiers to focus on task-relevant features, can effectively mitigate the impact of noisy signals and capture the representational difference between two conditions.

      Grootswagers, T., Wardle, S. G., & Carlson, T. A. (2017). Decoding dynamic brain patterns from evoked responses: A tutorial on multivariate pattern analysis applied to time series neuroimaging data. Journal of Cognitive Neuroscience, 29(4), 677-697.

      Xie, S., Kaiser, D., & Cichy, R. M. (2020). Visual imagery and perception share neural representations in the alpha frequency band. Current Biology, 30(13), 2621-2627.

      We added this explanation to the manuscript:

      (line 204 to 209) “Since EEG has a low SNR and includes rapid transient artifacts, Pearson correlations computed over very short time windows yield unstable dissimilarity estimates (Kappenman & Luck, 2010; Luck, 2014) and may thus fail to reliably detect differences between images. In contrast, decoding accuracy - by training classifiers to focus on task-relevant features - better mitigates noise and highlights representational differences.”

      (c) How were the specific posterior electrodes selected?

      The 17 posterior electrodes used in our analyses were pre-selected and provided in the THINGS EEG2 dataset, and correspond to standard occipital and parietal sites based on the 10-10 EEG system. Specifically, we included all 17 electrodes with labels beginning with “O” or “P”, ensuring full coverage of posterior regions typically involved in visual object processing (Page 7).

      (d) The specific layers should be named rather than the vague ("last visual")

      We apologize for the confusion. We added more details about the layer information:

      (line 255 to 257) “The early layer in ResNet refers to ResNet.maxpool layer, and the late layer in ResNet refers to ResNet.avgpool layer. The early layer in CLIP refers to CLIP.visual.avgpool layer, and the late layer in CLIP refers to CLIP.visual.attnpool layer.”

      (line 420 to 434) “As shown in Figure 3F, the early layer representations of both ResNet and CLIP (ResNet.maxpool layer and CLIP.visual.avgpool) showed significant correlations with early EEG time windows (early layer of ResNet: 40-280ms, early layer of CLIP: 50-130ms and 160-260ms), while the late layers (ResNet.avgpool layer and CLIP.visual.attnpool layer) showed correlations extending into later time windows (late layer of ResNet: 80-300ms, late layer of CLIP: 70-300ms). Although there is substantial temporal overlap between early and late model layers, the overall pattern suggests a rough correspondence between model hierarchy and neural processing stages.

      We further extended this analysis across intermediate layers of both ResNet and CLIP models (from early to late, ResNet: ResNet.maxpool, ResNet.layer1, ResNet.layer2, ResNet.layer3, ResNet.layer4, ResNet.avgpool; from early to late, CLIP: CLIP.visual.avgpool, CLIP.visual.layer1, CLIP.visual.layer2, CLIP.visual.layer3, CLIP.visual.layer4, CLIP.visual.attnpool).”

      (e) p19: please change the reporting of t-statistics to standard APA format.

      Thanks for the suggestion. We changed the reporting format accordingly:

      (line 392 to 394) “The representation of real-world size had a significantly later peak latency than that of both retinal size, t(9)=4.30, p=.002, and real-world depth, t(9)=18.58, p<.001. And retinal size representation had a significantly later peak latency than real-world depth, t(9)=3.72, p=.005.”
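
      For readers checking the reporting format: t(9) is what a paired comparison across 10 subjects produces, since the degrees of freedom equal the number of subjects minus one. The sketch below assumes a paired t-test on per-subject peak latencies, which this response does not state explicitly, and uses made-up latencies rather than the study's data.

```python
from statistics import mean, stdev


def paired_t(a, b):
    """Return the paired-samples t statistic and its degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / n ** 0.5)  # mean difference over its standard error
    return t, n - 1


# Hypothetical peak latencies in ms for 10 subjects (not the study's data).
size_latency = [151, 152, 153, 154, 155, 156, 157, 158, 159, 160]
depth_latency = [150] * 10

t, df = paired_t(size_latency, depth_latency)
print(f"t({df}) = {t:.2f}")  # t(9) = 5.74
```

      The p-value would then come from the t distribution with df degrees of freedom, which is left out here because the Python standard library has no t-distribution function.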

      (2) "early layer of CLIP: 50-130ms and 160-260ms), while the late layer representations of two ANNs were significantly correlated with later representations in the human brain (late layer of ResNet: 80-300ms, late layer of CLIP: 70-300ms)."

      This seems a little strong, given the large amount of overlap between these models.

      We agree that our original wording may have overstated the distinction between early and late layers, given the substantial temporal overlap in their EEG correlations. We revised this sentence to soften the language to reflect the graded nature of the correspondence, and now describe the pattern as a general trend rather than a strict dissociation:

      (line 420 to 427) “As shown in Figure 3F, the early layer representations of both ResNet and CLIP (ResNet.maxpool layer and CLIP.visual.avgpool) showed significant correlations with early EEG time windows (early layer of ResNet: 40-280ms, early layer of CLIP: 50-130ms and 160-260ms), while the late layers (ResNet.avgpool layer and CLIP.visual.attnpool layer) showed correlations extending into later time windows (late layer of ResNet: 80-300ms, late layer of CLIP: 70-300ms). Although there is substantial temporal overlap between early and late model layers, the overall pattern suggests a rough correspondence between model hierarchy and neural processing stages.”

      (3) "Also, human brain representations showed a higher similarity to the early layer representation of the visual model (ResNet) than to the visual-semantic model (CLIP) at an early stage. "

      This has been previously reported by Greene & Hansen, 2020 J Neuro.

      Thanks! We added this reference.

      (4) "ANN (and Word2Vec) model RDMs"

      Why not just "model RDMs"? Might provide more clarity.

      We chose to use the phrasing “ANN (and Word2Vec) model RDMs” to maintain clarity and avoid ambiguity. In the literature, the term “model RDMs” is sometimes used more broadly to include hypothesis-based feature spaces or conceptual models, and we wanted to clearly distinguish our use of RDMs derived from artificial neural networks and language models. Additionally, explicitly referring to ANN or Word2Vec RDMs improves clarity by specifying the model source of each RDM. We hope this clarification justifies our choice to retain the original phrasing for clarity.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      In this manuscript, Hoon Cho et al. present a novel investigation into the role of PexRAP, an intermediary in ether lipid biosynthesis, in B cell function, particularly during the Germinal Center (GC) reaction. The authors profile lipid composition in activated B cells both in vitro and in vivo, revealing the significance of PexRAP. Using a combination of animal models and imaging mass spectrometry, they demonstrate that PexRAP is specifically required in B cells. They further establish that its activity is critical upon antigen encounter, shaping B cell survival during the GC reaction. Mechanistically, they show that ether lipid synthesis is necessary to modulate reactive oxygen species (ROS) levels and prevent membrane peroxidation.

      Highlights of the Manuscript:

      The authors perform exhaustive imaging mass spectrometry (IMS) analyses of B cells, including GC B cells, to explore ether lipid metabolism during the humoral response. This approach is particularly noteworthy given the challenge of limited cell availability in GC reactions, which often hampers metabolomic studies. IMS proves to be a valuable tool in overcoming this limitation, allowing detailed exploration of GC metabolism.

      The data presented is highly relevant, especially in light of recent studies suggesting a pivotal role for lipid metabolism in GC B cells. While these studies primarily focus on mitochondrial function, this manuscript uniquely investigates peroxisomes, which are linked to mitochondria and contribute to fatty acid oxidation (FAO). By extending the study of lipid metabolism beyond mitochondria to include peroxisomes, the authors add a critical dimension to our understanding of B cell biology.

      Additionally, the metabolic plasticity of B cells poses challenges for studying metabolism, as genetic deletions from the beginning of B cell development often result in compensatory adaptations. To address this, the authors employ an acute loss-of-function approach using two conditional, cell-type-specific gene inactivation mouse models: one targeting B cells after the establishment of a pre-immune B cell population (Dhrs7b^f/f, huCD20-CreERT2) and the other during the GC reaction (Dhrs7b^f/f; S1pr2-CreERT2). This strategy is elegant and well-suited to studying the role of metabolism in B cell activation.

      Overall, this manuscript is a significant contribution to the field, providing robust evidence for the fundamental role of lipid metabolism during the GC reaction and unveiling a novel function for peroxisomes in B cells. 

      Comments on revisions:

      There are still some discrepancies in gating strategies. In Fig. 7B legend (lines 1082-1083), they show representative flow plots of GL7+ CD95+ GC B cells among viable B cells, so it is not clear if they are IgDneg, as the rest of the GC B cells aforementioned in the text.

      We apologize for missing this item in need of correction in the revision and sincerely thank the reviewer for the stamina and care in picking this up. The data shown in Fig. 7B represented cells (events) in the IgD<sup>neg</sup> Dump<sup>neg</sup> viable lymphoid gate. We will correct this omission/blemish in the final revision that becomes the version of record.

      Western blot confirmation: We understand the limitations the authors enumerate. Perhaps an RT-qPCR analysis of the Dhrs7b gene in sorted GC B cells from the S1PR2-CreERT2 model could be feasible, as it requires a smaller number of cells. In any case, we agree with the authors that the results obtained using the huCD20-CreERT2 model are consistent with those from the S1PR2-CreERT2 model, which adds credibility to the findings and supports the conclusion that GC B cells in the S1PR2-CreERT2 model are indeed deficient in PexRAP.

      We will make efforts to go back through the manuscript and highlight this limitation to readers, i.e., that we were unable to get genetic evidence to assess what degree of "counter-selection" applied to GC B cells in our experiments.

      We agree with the referee that, to optimally support the Imaging Mass Spectrometry (IMS) data showing perturbations of various ether lipids within GC after depletion of PexRAP, it would have been best to have a qRT2-PCR that allowed quantitation of the Dhrs7b-encoded mRNA in flow-purified GC B cells, or of the extent to which the genomic DNA of these cells was in the deleted rather than the 'floxed' configuration.

      While the short half-life of ether lipid species leads us to infer that the enzymatic function remains reduced/absent, it definitely is unsatisfying that the money for experiments ran out in June and the lab members had to move to new jobs.

      Lines 222-226: We believe the correct figure is 4B, whereas the text refers to 4C.

      As for the 1st item, we apologize and will correct this error.

      Supplementary Figure 1 (line 1147): The figure title suggests that the data on T-cell numbers are from mice in a steady state. However, the legend indicates that the mice were immunized, which means the data are not from steady-state conditions. 

      We will change the wording both on line 1147 and 1152.

      Reviewer #2 (Public review):

      Summary:

      In this study, Cho et al. investigate the role of ether lipid biosynthesis in B cell biology, particularly focusing on GC B cell, by inducible deletion of PexRAP, an enzyme responsible for the synthesis of ether lipids.

      Strengths:

      Overall, the data are well-presented, the paper is well-written and provides valuable mechanistic insights into the importance of PexRAP enzyme in GC B cell proliferation.

      Weaknesses:

      More detailed mechanisms of the impaired GC B cell proliferation by PexRAP deficiency remain to be further investigated. In the minor part, there are issues with the interpretation of the data which might cause confusion for the readers.

      Comments on revisions:

      The authors improved the manuscript appropriately according to my comments.

      To re-summarize, we very much appreciate the diligence of the referees and Editors in re-reviewing this work at each cycle and helping via constructive peer review, along with their favorable comments and overall assessments. The final points will be addressed with minor edits since there no longer is any money for further work and the lab people have moved on.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      In this manuscript, Sung Hoon Cho et al. present a novel investigation into the role of PexRAP, an intermediary in ether lipid biosynthesis, in B cell function, particularly during the Germinal Center (GC) reaction. The authors profile lipid composition in activated B cells both in vitro and in vivo, revealing the significance of PexRAP. Using a combination of animal models and imaging mass spectrometry, they demonstrate that PexRAP is specifically required in B cells. They further establish that its activity is critical upon antigen encounter, shaping B cell survival during the GC reaction.

      Mechanistically, they show that ether lipid synthesis is necessary to modulate reactive oxygen species (ROS) levels and prevent membrane peroxidation.

      Highlights of the Manuscript:

      The authors perform exhaustive imaging mass spectrometry (IMS) analyses of B cells, including GC B cells, to explore ether lipid metabolism during the humoral response. This approach is particularly noteworthy given the challenge of limited cell availability in GC reactions, which often hampers metabolomic studies. IMS proves to be a valuable tool in overcoming this limitation, allowing detailed exploration of GC metabolism.

      The data presented is highly relevant, especially in light of recent studies suggesting a pivotal role for lipid metabolism in GC B cells. While these studies primarily focus on mitochondrial function, this manuscript uniquely investigates peroxisomes, which are linked to mitochondria and contribute to fatty acid oxidation (FAO). By extending the study of lipid metabolism beyond mitochondria to include peroxisomes, the authors add a critical dimension to our understanding of B cell biology.

      Additionally, the metabolic plasticity of B cells poses challenges for studying metabolism, as genetic deletions from the beginning of B cell development often result in compensatory adaptations. To address this, the authors employ an acute loss-of-function approach using two conditional, cell-type-specific gene inactivation mouse models: one targeting B cells after the establishment of a pre-immune B cell population (Dhrs7b^f/f, huCD20-CreERT2) and the other during the GC reaction (Dhrs7b^f/f; S1pr2-CreERT2). This strategy is elegant and well-suited to studying the role of metabolism in B cell activation.

      Overall, this manuscript is a significant contribution to the field, providing robust evidence for the fundamental role of lipid metabolism during the GC reaction and unveiling a novel function for peroxisomes in B cells.

      We appreciate these positive reactions and response, and agree with the overview and summary of the paper's approaches and strengths.

      However, several major points need to be addressed:

      Major Comments:

      Figures 1 and 2

      The authors conclude, based on the results from these two figures, that PexRAP promotes the homeostatic maintenance and proliferation of B cells. In this section, the authors first use a tamoxifen-inducible full Dhrs7b knockout (KO) and afterwards Dhrs7bΔ/Δ-B model to specifically characterize the role of this molecule in B cells. They characterize the B and T cell compartments using flow cytometry (FACS) and examine the establishment of the GC reaction using FACS and immunofluorescence. They conclude that B cell numbers are reduced, and the GC reaction is defective upon stimulation, showing a reduction in the total percentage of GC cells, particularly in the light zone (LZ).

      The analysis of the steady-state B cell compartment should also be improved. This includes a more detailed characterization of MZ and B1 populations, given the role of lipid metabolism and lipid peroxidation in these subtypes.

      Suggestions for Improvement:

      B Cell compartment characterization: A deeper characterization of the B cell compartment in non-immunized mice is needed, including analysis of Marginal Zone (MZ) maturation and a more detailed examination of the B1 compartment. This is especially important given the role of specific lipid metabolism in these cell types. The phenotyping of the B cell compartment should also include an analysis of immunoglobulin levels on the membrane, considering the impact of lipids on membrane composition.

      Although the manuscript is focused on post-ontogenic B cell regulation in Ab responses, we believe we will be able to polish a revised manuscript through addition of results of analyses suggested by this point in the review: measurement of surface IgM on and phenotyping of various B cell subsets, including MZB and B1 B cells, to extend the data in Supplemental Fig 1H and I. Depending on the level of support, new immunization experiments to score Tfh and analyze a few of their functional molecules as part of a B cell paper may be feasible.   

      Addendum / update of Sept 2025: We added new data with more on MZB and B1 B cells, surface IgM, and on Tfh populations. 

      GC Response Analysis Upon Immunization: The GC response characterization should include additional data on the T cell compartment, specifically the presence and function of Tfh cells. In Fig. 1H, the distribution of the LZ appears strikingly different. However, the authors have not addressed this in the text. A more thorough characterization of centroblasts and centrocytes using CXCR4 and CD86 markers is needed.

      The gating strategy used to characterize GC cells (GL7+CD95+ in IgD− cells) is suboptimal. A more robust analysis of GC cells should be performed in total B220+CD138− cells.

      We first want to apologize for the mislabeling of LZ and DZ in Fig 1H. The greenish-yellow colored region (GL7<sup>+</sup> CD35<sup>+</sup>) indicates the DZ and the cyan-colored region (GL7<sup>+</sup> CD35<sup>+</sup>) indicates the LZ.

      Addendum / update of Sept 2025: We corrected the mistake, and added new experimental data using the CD138 marker to exclude preplasmablasts.

      As a technical note, we experienced high background noise with GL7 staining uniquely with PexRAP-deficient (Dhrs7b<sup>f/f</sup>; Rosa26-CreER<sup>T2</sup>) mice (i.e., not WT control mice). The high background noise of GL7 staining was not observed in the B cell-specific KO of PexRAP (Dhrs7b<sup>f/f</sup>; huCD20-CreER<sup>T2</sup>). Two formal possibilities to account for this staining issue would be if either the expression of the GL7 epitope were repressed by PexRAP or the proper positioning of GL7<sup>+</sup> cells in the germinal center region were defective in PexRAP-deficient mice (e.g., due to an effect on positioning cues from cell types other than B cells). In a revised manuscript, we will fix the labeling error and further discuss the GL7 issue, while taking care not to be thought to conclude that there is a positioning problem or derepression of GL7 (an activation antigen on T cells as well as B cells).

      While the gating strategy for an overall population of GC B cells is fairly standard even in the current literature, the question about using CD138 staining to exclude early plasmablasts (i.e., analyze B220<sup>+</sup> CD138<sup>neg</sup> vs B220<sup>+</sup> CD138<sup>+</sup>) is interesting. In addition, some papers like to use GL7<sup>+</sup> CD38<sup>neg</sup> for GC B cells instead of GL7<sup>+</sup> Fas (CD95)<sup>+</sup>, and we thank the reviewer for suggesting the analysis of centroblasts and centrocytes. For the revision, we will try to secure resources to revisit the immunizations and analyze them for these other facets of GC B cells (including CXCR4/CD86) and for their GL7<sup>+</sup> CD38<sup>neg</sup>, B220<sup>+</sup> CD138<sup>-</sup>, and B220<sup>+</sup> CD138<sup>+</sup> cell populations.

      We agree that comparison of the Rosa26-CreERT2 results to those with B cell-specific loss-of-function raises a tantalizing possibility that Tfh cells also are influenced by PexRAP. Although the manuscript is focused on post-ontogenic B cell regulation in Ab responses, we hope that a new immunization experiment that scores Tfh cells and analyzes a few of their functional molecules could be added to this B cell paper, depending on the ability to wheedle enough support / fiscal resources.

      Addendum / update of Sept 2025: Within the tight time until lab closure, and limited $$, we were able to do experiments that further reinforced the GC B cell data - including stains for DZ vs LZ sub-subsetting - and analyzed Tfh cells. We were not able to explore changes in functional antigenic markers on the GC B or Tfh cells. 

      The authors claim that Dhrs7b supports the homeostatic maintenance of quiescent B cells in vivo and promotes effective proliferation. This conclusion is primarily based on experiments where CTV-labeled PexRAP-deficient B cells were adoptively transferred into μMT mice (Fig. 2D-F). However, we recommend reviewing the flow plots of CTV in Fig. 2E, as they appear out of scale. More importantly, the low recovery of PexRAP-deficient B cells post-adoptive transfer weakens the robustness of the results and is insufficient to conclusively support the role of PexRAP in B cell proliferation in vivo.

      In the revision, we will edit the text and try to adjust the digitized cytometry data to allow more dynamic range to the right side of the upper panels in Fig. 2E, and otherwise to improve the presentation of the in vivo CTV result. However, we feel impelled to push back respectfully on some of the concern raised here. First, it seems to gloss over the presentation of multiple facets of evidence. The conclusion about maintenance derives primarily from Fig. 2C, which shows a rapid, statistically significant decrease in B cell numbers (extending the finding of Fig. 1D, a more substantial decrease after a bit longer a period). As noted in the text, the rate of de novo B cell production does not suffice to explain the magnitude of the decrease. 

      In terms of proliferation, we will improve presentation of the Methods but the bottom line is that the recovery efficiency is not bad (compared with prior published work) inasmuch as transferred B cells do not uniformly home to spleen. In a setting where BAFF is in ample supply in vivo, we transferred equal numbers of cells that were equally labeled with CTV and counted B cells. Although the CTV result might be affected by the lower recovery of PexRAP-deficient B cells, the frequencies of the CTV<sup>low</sup> divided population generally are not changed very much. However, it is precisely because of the pitfalls of in vivo analyses that we included complementary data with survival and proliferation in vitro. The proliferation was attenuated in PexRAP-deficient B cells in vitro; this evidence supports the conclusion that proliferation of PexRAP knockout B cells is reduced. It is likely that PexRAP-deficient B cells also have a defect in viability in vivo, as we observed a reduced B cell number in PexRAP-deficient mice. As the reviewer noticed, the presence of a defect in cycling does, in the transfer experiments, limit the ability to interpret a lower yield of B cell population after adoptive transfer into µMT recipient mice as evidence pertaining to death rates. We will edit the text of the revision with these points in mind.

      In vitro stimulation experiments: These experiments need improvement. The authors have used anti-CD40 and BAFF for B cell stimulation; however, it would be beneficial to also include anti-IgM in the stimulation cocktail. In Fig. 2G, CTV plots do not show clear defects in proliferation, yet the authors quantify the percentage of cells with more than three divisions. These plots should clearly display the gating strategy. Additionally, details about histogram normalization and potential defects in cell numbers are missing. A more in-depth analysis of apoptosis is also required to determine whether the observed defects are due to impaired proliferation or reduced survival.

      As suggested by the reviewer, testing additional forms of B cell activation can help explore the generality (or lack thereof) of findings. We plan to test anti-IgM stimulation together with anti-CD40 + BAFF as well as anti-IgM + TLR7/8, and add the data to a revised and final manuscript.

      Addendum / update of Sept 2025: The revision includes results of new experiments in which anti-IgM was included in the stimulation cocktail, as well as further data on apoptosis and distinguishing impaired cycling / divisions from reduced survival.

      With regards to Fig. 2G (and 2H), in the revised manuscript we will refine the presentation (add a demonstration of the gating, and explicate histogram normalization of FlowJo). 

      It is an interesting issue in bioscience, but in our presentation 'representative data' really are pretty representative, so a senior author is reminded of a comment Tak Mak made about a reduction (of proliferation, if memory serves) to 0.7x control. [His point in a comment to referees at a symposium related that to a salary reduction by 30% :) A mathematical alternative is to point out that across four rounds of division for WT cells, a reduction to 0.7x efficiency at each cycle means about 1/4 as many progeny.]
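
      The arithmetic behind the "about 1/4" figure can be checked with a minimal sketch. It assumes, as a simplification, that a 0.7x efficiency applies uniformly and independently at each of four division rounds; the function name is illustrative only and does not come from the manuscript.

```python
def relative_progeny(efficiency: float, rounds: int) -> float:
    """Progeny relative to control when every division round yields
    `efficiency` times the control output (assumed uniform across rounds)."""
    return efficiency ** rounds


# Four rounds at 0.7x per round, the scenario described in the text.
ratio = relative_progeny(0.7, 4)
print(f"{ratio:.4f}")  # 0.2401, i.e. roughly one quarter of control
```

      Under this assumption a modest per-cycle deficit compounds multiplicatively, which is why a 30% reduction per round translates into roughly a fourfold difference after four rounds.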

      We will try to edit the revision (Methods, Legends, Results, Discussion) to address better the points of the last two sentences of the comment, and improve the details that could assist in replication or comparisons (e.g., if someone develops a PexRAP inhibitor as potential therapeutic).

      For the present, please note that the cell numbers at the end of the cultures are currently shown in Fig 2, panel I. Analogous culture results are shown in Fig 8, panels I, J, albeit with harvesting at day 5 instead of day 4. So, a difference of ≥ 3x needs to be explained. As noted above, a division efficiency reduced to 0.7x normal might account for such a decrease, but in practice the data of Fig. 2I show that the number of PexRAP-deficient B cells at day 4 is similar to the number plated before activation, and yet there has been a reasonable number of divisions. So cell numbers in the culture of mutant B cells are constant because cycling is active but decreased and insufficient to allow increased numbers ("proliferation" in the true sense) as programmed death is increased. In line with this evidence, Fig 8G-H document higher death rates [i.e., frequencies of cleaved caspase-3<sup>+</sup> cells and Annexin V<sup>+</sup> cells] of PexRAP-deficient B cells compared to controls. Thus, the in vitro data lead to the conclusion that both decreased division rates and increased death operate after this form of stimulation.
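
      The reasoning above, that a population can divide actively yet stay flat in number when death rises, can be illustrated with a toy calculation. This is only a sketch under stated assumptions: cells are treated as passing through discrete rounds in which a fixed fraction dies and a fixed fraction of survivors divides once. The function and all parameter values are hypothetical and are not taken from the data in Fig. 2 or Fig. 8.

```python
def fold_change(divide_fraction: float, death_fraction: float, rounds: int) -> float:
    """Fold change in live-cell number after `rounds` discrete rounds.

    In each round a fraction `death_fraction` of cells dies, and a fraction
    `divide_fraction` of the survivors divides once (one cell becomes two).
    """
    per_round = (1.0 - death_fraction) * (1.0 + divide_fraction)
    return per_round ** rounds


# Hypothetical illustration only: control cells divide often and die rarely;
# mutant cells divide at 0.7x the control fraction and die more often.
control = fold_change(divide_fraction=0.80, death_fraction=0.10, rounds=4)
mutant = fold_change(divide_fraction=0.56, death_fraction=0.36, rounds=4)
print(f"control fold change: {control:.2f}")
print(f"mutant fold change:  {mutant:.2f}")
```

      With these made-up values the mutant population stays close to its starting number even though more than half of the surviving cells divide in each round, which is the qualitative pattern the response describes: reduced cycling combined with increased death yields little or no net expansion.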

      An inference is that this is the case in vivo as well - note that recoveries differed by ~3x (Fig. 2D), and the decrease in divisions (presentation of which will be improved) was meaningful but of lesser magnitude (Fig. 2E, F). 

      Reviewer #2 (Public review):

      Summary:

      In this study, Cho et al. investigate the role of ether lipid biosynthesis in B cell biology, particularly focusing on GC B cell, by inducible deletion of PexRAP, an enzyme responsible for the synthesis of ether lipids.

      Strengths:

      Overall, the data are well-presented, the paper is well-written and provides valuable mechanistic insights into the importance of PexRAP enzyme in GC B cell proliferation.

      We appreciate this positive response and agree with the overview and summary of the paper's approaches and strengths. 

      Weaknesses:

      More detailed mechanisms of the impaired GC B cell proliferation by PexRAP deficiency remain to be further investigated. In the minor part, there are issues with the interpretation of the data which might cause confusion for the readers.

      Issues about contributions of cell cycling and divisions on the one hand, and susceptibility to death on the other, were discussed above, amplifying on the current manuscript text. The aggregate data support a model in which both processes are impacted for mature B cells in general, and mechanistically the evidence and work focus on the increased ROS and modes of death. Although the data in Fig. 7 do provide evidence that GC B cells themselves are affected, we agree that resource limitations had militated against developing further evidence about cycling specifically for GC B cells. We will hope to be able to obtain sufficient data from some specific analysis of proliferation in vivo (e.g., Ki67 or BrdU) as well as ROS and death ex vivo when harvesting new samples from mice immunized to analyze GC B cells for CXCR4/CD86, CD38, CD138 as indicated by Reviewer 1. As suggested by Reviewer 2, we will further discuss the possible mechanism(s) by which proliferation of PexRAP-deficient B cells is impaired. We also will edit the text of a revision where needed to enhance clarity of data interpretation - at a minimum, to be very clear that caution is warranted in assuming that GC B cells will exhibit the same mechanisms as cultures of in vitro-stimulated B cells.

      Addendum / update of Sept 2025: We were able to obtain results of intravital BrdU incorporation into GC B cells to measure cell cycling rates. The revised manuscript includes these results as well as other new data on apoptosis / survival, while deleting the data about CD138 populations whose interpretation was reasonably questioned by the referees.  

      Reviewer #1 (Recommendations for the authors):

      We believe the evidence presented to support the role of PexRAP in protecting B cells from cell death and promoting B cell proliferation is not sufficiently robust and requires further validation in vivo. While the study demonstrates an increase in ether lipid content within the GC compartment, it also highlights a reduction in mature B cells in PexRAP-deficient mice under steady-state conditions. However, the IMS results (Fig. 3A) indicate that there are no significant differences in ether lipid content in the naïve B cell population. This discrepancy raises an intriguing point for discussion: why is PexRAP critical for B cell survival under steady-state conditions?

      We thank the referee for all their care and input, and we agree that further intravital analyses could strengthen the work by providing more direct evidence of impairment of GC B cells in vivo. To revise and improve this manuscript before creation of a contribution of record, we performed new experiments to the limit of available funds and have both (i) added these new data and (ii) sharpened the presentation to correct what we believe to be one inaccurate point raised in the review. 

      (A) Specifically, we immunized mice with a B cell-specific depletion of PexRAP (Dhrs7b<sup>Δ/Δ-B</sup> mice) and measured a variety of readouts of the GC B cells' physiology in vivo: proliferation by intravital incorporation of BrdU, ROS in the viable GC B cell gate, and their cell death by annexin V staining directly ex vivo. Consistent with the data with in vitro activated B cells, these analyses showed increased ROS (new - Fig. 7D) and higher frequencies of Annexin V<sup>+</sup> 7AAD<sup>+</sup> in GC B cells (GL7<sup>+</sup> CD38<sup>-</sup> B cell-gate) of immunized Dhrs7b<sup>Δ/Δ-B</sup> mice compared with WT controls (huCD20-CreERT2<sup>+/-</sup>, Dhrs7b<sup>+/+</sup>) (new - Fig. 7E). Collectively, these results indicate that PexRAP aids (directly or indirectly) in controlling ROS in GC B cells and reduces B cell death, likely contributing to the substantially decreased overall GC B cell population. These new data are added to the revised manuscript in Figure 7.

      Moreover, in each of two independent experiments (each comprising 3 vs 3 immunized mice), BrdU<sup>+</sup> events among GL7<sup>+</sup> CD38<sup>-</sup> (GC B cell)-gated cells were reduced in the B cell-specific PexRAP knockouts compared with WT controls (new, Fig. 7F and Supplemental Fig 6E). This result on cell cycle rates in vivo is presented with caution in the revised manuscript text because the absolute labeling fractions were somewhat different in Expt 1 vs Expt 2. This situation affords a useful opportunity to comment on the culture of "P values" and statistical methods. It is intriguing to consider how many successful drugs are based on research published back when the standard was to interpret a result of this sort more definitively despite a merged "P value" that was not a full 2 SD different from the mean. In the optimistic spirit of the eLife model, it can be for the attentive reader to decide from the data (new, Fig. 7F and Supplemental Fig 6E) whether to interpret the BrdU results more strongly than what we state in the revised text.

      (B) On the issue of whether or not the loss of PexRAP led to perturbations of the lipidome of B cells prior to activation, we have edited the manuscript to do a better job making this point more clear.  

      We point out to readers that in the resting, pre-activation state abnormalities were detected in naive B cells, not just in activated and GC B cells. In brief, the IMS analysis and LC-MS-MS analysis detected statistically significant differences in some, but not all, of the ether phospholipid species in PexRAP-deficient cells (some of which were in Supplemental Figure 2 of the original version).

      With this appropriate and helpful concern having been raised, we realize that this important point merited inclusion in the main figures. We point specifically to a set of phosphatidyl choline ions shown in Fig. 3 (revised - panels A, B, D) of the revised manuscript (PC O-36:5; PC O-38:5; PC O-40:6 and -40:7). 

      For this ancillary record (because a discourse on the limitations of each analysis), we will note issues such as the presence of many non-B cells in each pixel of the IMS analyses (so that some or many "true positives" will fail to achieve a "significant difference") and for the naive B cells, differential rates of synthesis, turnover, and conversion (e.g., addition of another 2-carbon unit or saturation / desaturation of one side-chain). To the extent the concern reflects some surprise and perhaps skepticism about what seem relatively limited differences (many species appear unaffected, etc.), we share in the sentiment. But the basic observation is that there are differences, and a reasonable connection between the altered lipid profile and evidence of effects on survival or proliferation (i.e., integration of survival and cell cycling / division).

      Additionally, it would be valuable to evaluate the humoral response in a T-independent setting. This would clarify whether the role of PexRAP is restricted to GC B cells or extends to activated B cells in general. 

      We agree that this additional set of experiments would be nice and would extend work incrementally by testing the generality of the findings about Ab responses. The practical problem is that money and time ran out while testing important items that strengthen the evidence about GC B cells. 

      Finally, the manuscript would benefit from a thorough revision to improve its readability and clarity. Including more detailed descriptions of technical aspects, such as the specific stimuli and time points used in analyses, would greatly enhance the flow and comprehension of the study. Furthermore, the authors should review figure labeling to ensure consistency throughout the manuscript, and carefully cite the relevant references. For instance, the S1PR2-CreERT2 mouse was established by Okada and Kurosaki (Shinnakasu et al., Nat. Immunol., 2016).

      We appreciate this feedback and comment, inasmuch as both the clarity and scholarship matter greatly to us for a final item of record. For the revision, we have given our best shot to editing the text in the hopes of improved clarity, reduction of discrepancies (helpfully noted in the Minor Comments), and further detail-rich descriptions of procedures. We also edited the figure labeling to give better consistency. While we note that the appropriate citation of Shinnakasu et al (2016) was ref. #69 of the original and remains as a citation, we have rechecked other referencing and tried to use citations with the best relevant references.

      Minor Comments: The labeling of plots in Fig. 2 should be standardized. For example, in Fig. 2C, D, and G, the same mouse strain is used, yet the Cre+ mouse is labeled differently in each plot. 

      We agree and have tried to tighten up these features in the panels noted as well as more generally (e.g., Fig. 4, 5, 6, 7, 9; consistency of huCD20-CreERT2 / hCD20CreERT2).

      According to the text, the results shown in Fig. 1G and H correspond to a full KO (Dhrs7b^f/f; Rosa26-CreERT2 mice). However, Fig. 1H indicates that the bottom image corresponds to Dhrs7b^f/f, huCD20-CreERT2 mice (Dhrs7bΔ/Δ-B).

      We have corrected Fig. 1H to be labeled as Dhrs7b<sup>Δ/Δ</sup> (with the data on Dhrs7b<sup>Δ/Δ-B</sup> presented in Supplemental Figure 4A, which is correctly labeled). Thank you for picking up this error that crept in while using copy/paste in preparation of figure panels and failing to edit out the "-B"!  

      Similarly, the gating strategy for GC cells in the text mentions IgD− cells, while the figure legend refers to total viable B cells. These discrepancies need clarification.

      We believe we located and have corrected this issue in the revised manuscript.   

      Figures 3 and 4. The authors claim that B cell expression of PexRAP is required to achieve normal concentrations of ether phospholipids.

      Suggestions for Improvement: 

      Lipid Metabolism Analysis: The analysis in Fig. 3 is generally convincing but could be strengthened by including an additional stimulation condition such as anti-IgM plus anti-CD40. In Fig. 4C, the authors display results from the full KO model. It would be helpful to include quantitative graphs summarizing the parameters displayed in the images.

      We have performed new experiments (anti-IgM + anti-CD40) and added the data to the revised manuscript (new - Supplemental Fig. 2H and Supplemental Fig 6, D & F). Conclusions based on the effects are not changed from the original. 

      As a semantic comment and point of scientific process, any interpretation ("claim") can - by definition - only be taken to apply to the conditions of the experiment. Nonetheless, it is inescapable that at least for some ether P-lipids of naive, resting B cells, and for substantially more in B cells activated under the conditions that we outline, B cell expression of PexRAP is required. 

      With regards to the constructive suggestion about a new series of lipidomic analyses, we agree that for activated B cells it would be nice and increase insight into the spectrum of conditions under which the PexRAP-deficient B cells had altered content of ether phospholipids. However, in light of the costs of metabolomic analyses and the lack of funds to support further experiments, and the accuracy of the point as stated, we prioritized the experiments that could fit within the severely limited budget. 

      [One can add that our results provide a premise for later work to analyze a time course after activation, and to perform isotopomer (SIRM) analyses with <sup>13</sup>C-labeled acetate or glucose, so as to understand activation-induced increases in the overall

      To revise the manuscript, we did however extrapolate from the point about adding BCR cross-linking to anti-CD40 as a variant form of activating the B cells for measurements of ROS, population growth, and rates of division (CTV partitioning). The results of these analyses, which align with and thereby strengthen the conclusions about these functional features from experiments with anti-CD40 but no anti-IgM, are added to Supplemental Fig 2H and Supplemental Fig 6D, F.

      Figures 5, 6, and 7

      The authors claim that Dhrs7b in B cells shapes antibody affinity and quantity. They use two mouse models for this analysis: huCD20-CreERT2 and Dhrs7b f/f; S1pr2-CreERT2 mice. 

      Suggestions for Improvement:

      Adaptive immune response characterization: A more comprehensive characterization of the adaptive immune response is needed, ideally using the Dhrs7b f/f; S1pr2-CreERT2 model. This should include: Analysis of the GC response in B220+CD138− cells. Class switch recombination analysis. A detailed characterization of centroblasts, centrocytes, and Tfh populations. Characterization of effector cells (plasma cells and memory cells).

      Within the limits of time and money, we have performed new experiments prompted by this constructive set of suggestions. 

      Specifically, we analyzed the suggested read-outs in the huCD20-CreERT2, Dhrs7b<sup>f/f</sup> model after immunization, recognizing that it trades greater signal-noise for the fact that effects are due to a mix of the impact on B cells during clonal expansion before GC recruitment and activities within the GC. In brief, the results showed that 

      (a) the GC B cell population - defined as CD138<sup>neg</sup> GL7<sup>+</sup> CD38<sup>lo/neg</sup> IgD<sup>neg</sup> B cells - was about half as large for PexRAP-deficient B cells net of any early- or preplasmablasts (CD138<sup>+</sup> events) (new - Fig 5G); 

      (b) the frequencies of pre- / early plasmablast (CD138<sup>+</sup> GL7<sup>+</sup> CD38<sup>neg</sup>) events (see new - Fig. 6H, I; also, new Supplemental Fig 5D) were so low as to make it unlikely that our data with the S1pr2-CreERT2 model (in Fig 7B, C) would be affected meaningfully by analysis of the CD138 levels;

      (c) there was a modest decrease in centrocytes (LZ) but not centroblasts (DZ) (new - Fig 5H, I), consistent with the immunohistochemical data of Supplemental Fig. 5A-C.

      Because of time limitations (the "shelf life" of funds and the lab) and insufficient stock of the S1pr2-CreERT2, Dhrs7b<sup>f/f</sup> mice as well as those that would be needed as adoptive transfer recipients because of S1PR2 expression in (GC-)Tfh, the experiments were performed instead with the huCD20-CreERT2, Dhrs7b<sup>f/f</sup> model. We would also note that using this Cre transgene better harmonizes the centrocyte/centroblast and Tfh data with the existing data on these points in Supplemental Fig. 4. 

      (d) Of note, the analyses of Tfh and GC-Tfh phenotype cells using the huCD20-CreERT2 B cell type-specific inducible Cre system to inactivate Dhrs7b (new - Supplemental Fig 1G-I, along with new - Supplemental Fig 5E) provide evidence of an abnormality that must stem from a function or functions of PexRAP in B cells, most likely GC B cells. Specifically, it is known that the GC-Tfh population proliferates and is supported by the GC B cells, and the results of B cell-specific deletion show substantial reductions in Tfh cells (both the GC-Tfh gating and the wider gate for plots of CXCR5/PD-1/ fluorescence of CD4 T cells

      Timepoint Consistency: The NP response (Fig. 5) is analyzed four weeks postimmunization, whereas SRBC (Supp. Fig. 4) and Fig. 7 are analyzed one week or nine days post-immunization. The NP system analysis should be repeated at shorter timepoints to match the peak GC reaction.

      This comment may stem from a misunderstanding. As diagrammed in Fig. 5A, the experiments involving the NP system were in fact measured at 7 d after a secondary (booster) immunization. That timing is approximately the peak period and harmonizes with the 7 d used for harvesting SRBC-immunized mice. So in fact the data with each system were obtained at a similar time point. Of course the NP experiments involved a second immunization so that many plasma cell and Ab responses derived from memory B cells generated by the primary immunization. However, the field at present is dominated by the view that the vast majority of the GC B cells after this second immunization (which historically we perform with alum adjuvant) are recruited from the naive rather than the memory B cell pool. For the revised manuscript, we have taken care that the Methods, Legend, and Figure provide the information to readers, and expanded the statement of a rationale. 

      It may seem a technicality but under NIH regulations we are legally obligated to try to minimize mouse usage. It also behooves researchers to use funds wisely. In line with those imperatives, we used systems that would simultaneously allow analyses of GC B cells, identification of affinity maturation (which is minimal in our hands at a 7 d time point after primary NP-carrier immunization), and a switched repertoire (also minimal), and where with each immunogen the GC were scored at 7-9 d after immunization (9 d refers to the S1pr2-CreERT2 experiments). Apart from the end of funding, we feel that what little might be learned from performing a series of experiments that involve harvests 7 d after a primary immunization with NP-ovalbumin cannot well be justified. 

      In vitro plasma cell differentiation: Quantification is missing for plasma cell differentiation in vitro (Supp. Fig. 4). The stimulus used should also be specified in the figure legend. Given the use of anti-CD40, differentiation towards IgG1 plasma cells could provide additional insights.

      As suggested by the reviewer, we have added the results of quantifying the in vitro plasma cell differentiation in Supplemental Fig 6B. We also edited the Methods and Supplemental Figure Legend to give detailed information on the in vitro stimulation. 

      Proliferation and apoptosis analysis: The observed defects in the humoral response should be correlated with proliferation and apoptosis analyses, including Ki67 and Caspase markers.

      As suggested by the reviewer, we performed new experiments and analyzed the frequencies of cell death by annexin V staining, and elected to use intravital uptake of BrdU as a more direct measurement of the S phase / cell cycling component of net proliferation. The new results are now displayed in Figure 5 and Supplemental Fig. 5. 

      Western blot confirmation: While the authors have demonstrated the absence of PexRAP protein in the huCD20-CreERT2 model, this has not been shown in GC B cells from the Dhrs7b f/f; S1pr2-CreERT2 model. This confirmation is necessary to validate the efficiency of Dhrs7b deletion.

      We were unable to do this for technical reasons expanded on below. For the revision, we have added text that more explicitly alerts readers to the potential impact of counter-selection on interpretation of the findings with GC B cells. Before entering the GC, B cells have undergone many divisions, so if there were major pre-GC counter-selection, in all likelihood the GC B cells would be PexRAP-sufficient. To recap from the original manuscript and the new data we have added, IMS shows altered lipid profiles in the GC B cells and the literature indicates that the lipids are short-lived, requiring de novo resynthesis. The BrdU, ROS, and annexin V data show that GC B cells are abnormal. Accordingly, abnormal GC B cells represent the parsimonious or straightforward interpretation of the new results with GC-Tfh cell prevalence. 

      While we take these findings together to suggest that counterselection (i.e., a Western result showing normal levels of PexRAP in the GC B cells) seems unlikely, it is formally possible and would mean that the in situ defects of GC B cells arose due to environmental influences of the PexRAP-deficient B cells during the developmental history of the WT B cells observed in the GC. 

      Having noted all that, we understand that concerns about counter-selection are an issue if a reader accepts the data showing that mutant (PexRAP-deficient) B cells tend to proliferate less and die more readily. Indeed, one can speculate that were we also to perform competition experiments in which the Ighb, Cd45.2 B cells (WT or Dhrs7b D/D) are mixed with equal numbers of Igha, Cd45.1 competitors, the differences would become much greater. With this in mind, Western blotting of flow-purified GC B cells might give a sense of how much counter-selection has occurred. 

      That said, the Westerns need at least 2.5 x 10<sup>6</sup> B cells (those in the manuscript used five million, 5  x 10<sup>6</sup>) and would need replication. Taken together with the observation that ~200,000 GC B cells (on average) were measured in each B cell-specific knockout mouse after immunization (Fig. 1, Fig 5) and taking into account yields from sorting, each Western would require some 20-25 tamoxifen-injected ___-CreERT2, Dhrs7b f/f mice, and about half again that number as controls. The expiry of funds prohibited the time and costs of generating that many mice (>70) and flow-purified GC B cells. 
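
      The estimate above can be reproduced with simple arithmetic. The sketch below is illustrative only: the sort yields are hypothetical values (the response does not state a yield) and the function name is ours. With roughly 200,000 GC B cells per mouse and 2.5 x 10<sup>6</sup> cells per blot, assumed yields of 50% to 62.5% give the quoted 20-25 knockout mice per Western.

```python
import math

def mice_needed(cells_required, cells_per_mouse, sort_yield):
    """Number of mice needed to recover `cells_required` sorted cells."""
    recovered_per_mouse = cells_per_mouse * sort_yield
    return math.ceil(cells_required / recovered_per_mouse)

CELLS_PER_BLOT = 2.5e6   # minimum input stated in the response
GC_B_PER_MOUSE = 2.0e5   # ~200,000 GC B cells per mouse (stated average)

# The yields below are assumptions for illustration, not measured values.
for assumed_yield in (0.5, 0.625):
    knockouts = mice_needed(CELLS_PER_BLOT, GC_B_PER_MOUSE, assumed_yield)
    controls = math.ceil(knockouts / 2)  # "about half again that number"
    print(assumed_yield, knockouts, controls, knockouts + controls)
```

      Under these assumptions, two replicate blots at the upper estimate come to more than 70 mice, in line with the figure quoted above.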

      Figure 8

      The authors claim that Dhrs7b contributes to the modulation of ROS, impacting B cell proliferation.

      Suggestions for Improvement:

      GC ROS Analysis: The in vitro ROS analysis should be complemented by characterizing ROS and lipid peroxidation in the GC response using the Dhrs7b f/f; S1pr2-CreERT2 model. Flow cytometry staining with H2DCFDA, MitoSOX, Caspase-3, and Annexin V would allow assessment of ROS levels and cell death in GC B cells. 

      While subject to some of the same practical limits noted above, we have performed new experiments in line with this helpful input of the reviewer, and added the new data to the revised manuscript. Specifically, in addition to the BrdU and phenotyping analyses after immunization of huCD20-CreER<sup>T2</sup>, Dhrs7b<sup>f/f</sup> mice, DCFDA (ROS), MitoSox, and annexin V signals were measured for GC B cells. Although the MitoSox signals did not significantly differ for PexRAP-deficient GC B cells, the ROS and annexin V signals were substantially increased. We added the new data to Figure 5 and Supplemental Figure 5. Together with the decreased in vivo BrdU incorporation in GC B cells from Dhrs7b<sup>D/D-B</sup> mice, these results are consistent with and support our hypothesis that PexRAP regulates B cell population growth and GC physiology in part by regulating ROS detoxification, survival and proliferation of B cells.  

      Quantification is missing in Fig. 8E, and Fig. 8F should use clearer symbols for better readability. 

      We added quantification for Fig 8E in Supplemental Fig 6E, and edited the symbols in Fig 8F for better readability.

      Figure 9

      The authors claim that Dhrs7b in B cells affects oxidative metabolism and ER mass. The results in this section are well-performed and convincing.

      Suggestion for Improvement:

      Based on the results, the discussion should elaborate on the potential role of lipids in antigen presentation, considering their impact on mitochondria and ER function.

      We very much appreciate the praise of the tantalizing findings about oxidative metabolism and ER mass, and will accept the encouragement that we add (prudently) to the Discussion section to make note of the points mentioned by the Reviewer, particularly now that (with their encouragement) we have the evidence that B cell-specific loss of PexRAP (with the huCD20-CreERT2 deletion prior to immunization) resulted in decreased (GC-)Tfh and somewhat lower GC B cell proliferation.  

      Reviewer #2 (Recommendations for the authors):

      The authors should investigate whether PexRAP-deficient GC B cells exhibit increased mitochondrial ROS and cell death ex vivo, as observed in in vitro cultured B cells.

      We very much appreciate the work of the referee and their input. We addressed this helpful recommendation, in essence aligned with points from Reviewer 1, via new experiments (until the money ran out) and addition of data to the manuscript. To recap briefly, we found increased ROS in GC B cells along with higher fractions of annexin V positive cells; intriguingly, increased mtROS (MitoSox signal) was not detected, which contrasts with the results in activated B cells in vitro in a small way. To keep the text focused and not stray too far outside the foundation supported by data, this point may align with papers that provide evidence of differences between pre-GC and GC B cells (for instance with lack of Tfam or LDHA in B cells).    

      It remains unclear whether the impaired proliferation of PexRAP-deficient B cells is primarily due to increased cell death. Although NAC treatment partially rescued the phenotype of reduced PexRAP-deficient B cell number, it did not restore them to control levels. Analysis of the proliferation capacity of PexRAP-deficient B cells following NAC treatment could provide more insight into the cause of impaired proliferation.

      To add to the data permitting an assessment of this issue, we performed new experiments in which B cells were activated (BCR and CD40 cross-linking), cultured, and both the change in population and the CTV partitioning were measured in the presence or absence of NAC. The results, added to the revision as Supplemental Fig 6FH, show that although NAC improved cell numbers for PexRAP-deficient cells relative to controls, this compound did not increase divisions at all. We infer that the more powerful effect of this lipid synthesis enzyme is to promote survival rather than division capacity. 

      Primary antibody responses were assessed at only one time point (day 20). It would be valuable to examine the kinetics of antibody response at multiple time points (0, 1w, 2w, 3w, for example) to better understand the temporal impact of PexRAP on antibody production.

      We thank the reviewer for this suggestion. While kinetic measurement of Ag-specific antibody levels across multiple time points might provide an additional mechanistic clue to the impact of PexRAP on antibody production, the end of sponsored funding and imminent lab closure precluded performing such experiments.   

      CD138+ cell population includes both GC-experienced and GC-independent plasma cells (Fig. 7). Enumeration of plasmablasts, which likely consists of both PexRAP-deleted and undeleted cells (Fig. 7D and E), may mislead the readers such that PexRAP is dispensable for plasmablast generation. I would suggest removing these data and instead examining the number of plasmablasts in the experimental setting of Fig. 4A (huCD20-CreERT2-mediated deletion) to address whether PexRAP-deficiency affects plasmablast generation. 

      We have eliminated the figure panels in question, since it is accurate that in the absence of a time-stamping or marking approach we have a limited ability to distinguish plasma cells that arose prior to inactivation of the Dhrs7b gene in B cells. In addition, we performed new experiments that were used to analyze the "early plasmablast" phenotype and added those data to the revision (Supplemental Fig 5D).

    1. o

      For the legend, it would also be good to switch to a capital first letter so that it is the same everywhere. I use these categories:

      typ_bydl_mlada_dom <- typ_bydl_mlada_dom |>
        mutate(
          byt_upr = case_when(
            byt_upr == "vlastnické" ~ "Vlastnické",
            byt_upr == "nájemní" ~ "Nájemní",
            byt_upr == "družstevní" ~ "Družstevní",
            byt_upr == "bydleni u příbuzných,\nznámých apod." ~ "Bydleni u příbuzných,\nznámých apod.",
            TRUE ~ byt_upr  # leaves the other values unchanged, just in case
          ),
          byt_upr = factor(byt_upr, levels = c(
            "Vlastnické", "Nájemní", "Družstevní",
            "Bydleni u příbuzných,\nznámých apod."
          ))
        )

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Specifically, the authors need to define the DFG conformation using criteria accepted in the field, for example, see https://klifs.net/index.php.

      We thank the reviewer for this suggestion. In the manuscript, we use pseudodihedral and bond angle-based DFG definitions that have been previously established by literature cited in the study (re-iterated below) to unambiguously define the side-chain conformational states of the DFG motif. As we are interested in the specific mechanics of DFG flips under different conditions, we’ve found that the descriptors defined below are sufficient to distinguish between DFG states and allow a more direct comparison with previously-reported results in the literature using different methods.

      We amended the text to clarify those definitions and why we chose them:

      DFG angle definitions:

      Phe382/Cg, Asp381/OD2, Lys378/O

      Source: Structural Characterization of the Aurora Kinase B "DFG-flip" Using Metadynamics. Lakkaniga NR, Balasubramaniam M, Zhang S, Frett B, Li HY. AAPS J. 2019 Dec 18;22(1):14. doi: 10.1208/s12248-019-0399-6. PMID: 31853739; PMCID: PMC7905835.

      “Finally, we chose the angle formed by Phe382's gamma carbon, Asp381's protonated side chain oxygen (OD2), and Lys378's backbone oxygen as PC3 based on observations from a study that used a similar PC to sample the DFG flip in Aurora Kinase B using metadynamics \cite{Lakkaniga2019}. This angular PC3 should increase or decrease (based on the pathway) during the DFG flip, with peak differences at intermediate DFG configurations, and then revert to its initial state when the flip concludes.”
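
      As an illustration of the angular coordinate quoted above, the following minimal sketch computes the angle defined by three atom positions. It assumes the middle atom of the definition (Asp381/OD2) is the vertex, which the excerpt does not state explicitly. The function name and the coordinates are hypothetical and are not taken from any structure.

```python
import math

def bond_angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui * ui for ui in u))
    norm_v = math.sqrt(sum(vi * vi for vi in v))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

# Made-up coordinates (angstroms) standing in for the three atoms.
phe382_cg = (1.0, 0.0, 0.0)
asp381_od2 = (0.0, 0.0, 0.0)   # assumed vertex
lys378_o = (0.0, 1.0, 0.0)
print(bond_angle_deg(phe382_cg, asp381_od2, lys378_o))
```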

      DFG pseudodihedral definitions:

      Ala380/Cb, Ala380/Ca, Asp381/Ca, Asp381/Cg

      Ala380/Cb, Ala380/Ca, Phe382/Ca, Phe382/Cg

      Source: Computational Study of the “DFG-Flip” Conformational Transition in c-Abl and c-Src Tyrosine Kinases. Yilin Meng, Yen-lin Lin, and Benoît Roux The Journal of Physical Chemistry B 2015 119 (4), 1443-1456 DOI: 10.1021/jp511792a

      “For downstream analysis, we used two pseudodihedrals previously defined in the existing Abl1 DFG flip simulation literature \cite{Meng2015} to identify and discriminate between DFG states. The first (dihedral 1) tracks the flip state of Asp381, and is formed by the beta carbon of Ala380, the alpha carbon of Ala380, the alpha carbon of Asp381, and the gamma carbon of Asp381. The second (dihedral 2) tracks the flip state of Phe382, and is formed by the beta carbon of Ala380, the alpha carbon of Ala380, the alpha carbon of Phe382, and the gamma carbon of Phe382. These pseudodihedrals, when plotted in relation to each other, clearly distinguish between the initial DFG-in state, the target DFG-out state, and potential intermediate states in which either Asp381 or Phe382 has flipped.”
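
      A pseudodihedral of this kind is an ordinary four-point torsion angle evaluated on atoms that are not consecutively bonded. The sketch below shows one standard way to compute it in plain Python. It is not the authors' analysis code, the function names are ours, and the sign convention is simply that of this particular formula.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle (degrees, in [-180, 180]) about the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the axis
    # Components of b0 and b2 perpendicular to the central axis.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Made-up coordinates standing in for Ala380/Cb, Ala380/Ca, Asp381/Ca, Asp381/Cg.
print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

      Applying the same function to the second atom quadruple (ending in Phe382/Ca and Phe382/Cg) gives dihedral 2, and the two values can then be plotted against each other per frame.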

      Convergence needs to be demonstrated for estimating the population difference between different conformational states.

      We agree that demonstrating convergence is important for accurate estimations of population differences between conformational states. However, as the DFG flip is a complex and concerted conformational change with an energy barrier of 30 kcal/mol [1], and considering the traditional limitations of methods like weighted ensemble molecular dynamics (WEMD), it would take an unrealistic amount of GPU time (months) to observe convergence in our simulations. As discussed in the text (see examples below), we caveat our energy estimations by explicitly mentioning that the state populations we report are not converged and are indicative of a much larger energy barrier in the mutant.

      “These relative probabilities qualitatively agree with the large expected free energy barrier for the DFG-in to DFG-out transition (~32 kcal/mol), and with our observation of a putative metastable DFG-inter state that is missed by NMR experiments due to its low occupancy.”

      “As an important caveat, it is unlikely that the DFG flip free energy barriers of over 70 kcal/mol estimated for the Abl1 drug-resistant variants quantitatively match the expected free energy barrier for their inactivation. Rather, our approximate free energy barriers are a symptom of the markedly increased simulation time required to sample the DFG flip in the variants relative to the wild-type, which is a strong indicator of the drastically reduced propensity of the variants to complete the DFG flip. Although longer WE simulations could allow us to access the timescales necessary for more accurately sampling the free energy barriers associated with the DFG flip in Abl1's drug-resistant compound mutants, the computational expense of running WE for 200 iterations is already large (three weeks with 8 NVIDIA RTX3900 GPUs for one replicate); this poses a logistical barrier to attempting to sample sufficient events to be able to fully characterize how the reaction path and free energy barrier change for the flip associated with the mutations. Regardless, the results of our WE simulations resoundingly show that the Glu255Lys/Val and Thr315Ile compound mutations drastically reduce the probability for DFG flip events in Abl1.”

      (1) Conformational states dynamically populated by a kinase determine its function. Tao Xie et al., Science 370, eabc2754 (2020). DOI:10.1126/science.abc2754

      The DFG flip needs to be sampled several times to establish free energy difference.

      Our simulations have captured thousands of correlated and dozens of uncorrelated DFG flip events. The per-replicate free energy differences are computed based on the correlated transitions. Please consult the WEMD literature (referenced below and in the manuscript, references 34 and 36) for more information on how WEMD allows the sampling of multiple such events and subsequent estimation of probabilities:

      Zuckerman et al (2017) 10.1146/annurev-biophys-070816-033834

      Chong et al (2021) 10.1021/acs.jctc.1c01154

      The free energy plots do not appear to show an intermediate state as claimed.

      Both the free energy plots and the representative/anecdotal trajectories analyzed in the study show a saddle point when Asp381 has flipped but Phe382 has not (which defines the DFG-inter state), and we observe a distinct change in probability when going from the pseudodihedral values associated with DFG-inter to those of DFG-up or DFG-out. We removed references to the putative state S1, as we agree with the reviewer that its presence is unlikely given the data we show.

      The trajectory length of 7 ns in both Figure 2 and Figure 4 needs to be verified, as it is extremely short for a DFG flip that has a high free energy barrier.

      We appreciate this point. To clarify, the 7 ns segments correspond to a collated trajectory extracted from the tens of thousands of walkers that compose the WEMD ensemble, and represent just the specific moment at which the dihedral flips occur rather than the entire flip process. On average, our WEMD simulations sample over 3 μs of aggregate simulation time before the first DFG flip event is observed, in line with a high energy barrier. This is made clear in the manuscript excerpt below: “Over an aggregate simulation time of over 20 $\mu$s, we have collected dozens of uncorrelated and unbiased inactivation events, starting from the lowest energy conformation of the Abl1 kinase core (PDB 6XR6) \cite{Xie2020}.”

      The free energy scale (100 kT) appears to be one order of magnitude too large.

      As discussed in the text and quoted in response to comment 2, the exponential splitting nature of WEMD simulations (where the probability of an individual walker is split upon crossing each bin threshold) often leads to unrealistically high energy barriers for rare events. This is not unexpected, and as discussed in the text, we consider that value to be a qualitative measurement of the decreased probability of a DFG flip in Abl1 mutants, and not a direct measurement of energy barriers.
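
      To put the 100 kT scale next to the kcal/mol values quoted elsewhere in this response, the sketch below converts between the two and turns a probability into a free energy in kT. The temperature of 300 K is an assumption (the response does not state one), and the function names are ours. At 300 K, 30 kcal/mol is roughly 50 kT and 70 kcal/mol is roughly 117 kT.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def kcal_per_mol_to_kT(energy_kcal, temperature_k=300.0):
    """Express an energy in kcal/mol in units of kT (300 K assumed by default)."""
    return energy_kcal / (R_KCAL * temperature_k)

def probability_to_kT(probability):
    """Free energy in kT of a state with the given probability: -ln(p)."""
    return -math.log(probability)

for barrier in (30.0, 32.0, 70.0):
    print(barrier, round(kcal_per_mol_to_kT(barrier), 1))
```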

      Setting the DFG-Asp to the protonated state is not justified, because in the DFG-in state, the DFG-Asp is clearly deprotonated.

      According to previous publications, DFG-Asp is frequently protonated in the DFG-in state of Abl1 kinase. For instance, as quoted from Hanson, Chodera, et al., Cell Chem Bio (2019), “Consistent with previous simulations on the DFG-Asp-out/in interconversion of Abl kinase we only observe the DFG flip with protonated Asp747 (Shan et al., 2009). We showed previously that the pKa for the DFG-Asp in Abl is elevated at 6.5.”

      Finally, the authors should discuss their work in the context of the enormous progress made in theoretical studies and mechanistic understanding of the conformational landscape of protein kinases in the last two decades, particularly with regard to the DFG flip. The study is not very rigorous. The major conclusions do not appear to be supported. The claim that it is the first unbiased simulation to observe DFG flip is not true. For example, Hanson, Chodera et al (Cell Chem Biol 2019), Paul, Roux et al (JCTC 2020), and Tsai, Shen et al (JACS 2019) have also observed the DFG flip.

      We thank the reviewer for pointing out these issues. We have revised the manuscript to better contextualize our claims within the limitations of the method and to acknowledge previous work by Hanson, Chodera et al., Paul, Roux et al., and Tsai, Shen et al.

      The updated excerpt is shown below:

      “Through our work, we have simulated an ensemble of DFG flip pathways in a wild-type kinase and its variants with atomistic resolution and without the use of biasing forces, also reporting the effects of inhibitor-resistant mutations in the broader context of kinase inactivation likelihood with such level of detail.”

      Reviewer #2:

      I appreciated the discussion of the strengths/weaknesses of weighted ensemble simulations. Am I correct that this method doesn't do anything to explicitly enhance sampling along orthogonal degrees of freedom? Maybe a point worth mentioning if so.

      Yes, this is correct. We added a sentence addressing this to the WEMD summary section of Results and Discussion.

      “As a supervised enhanced sampling method, WE employs progress coordinates (PCs) to track the time-dependent evolution of a system from one or more basis states towards a target state. Although weighted ensemble simulations are unbiased in the sense that no biasing forces are added over the course of the simulations, the selection of progress coordinates and the bin definitions can potentially bias the results towards specific pathways \cite{Zuckerman2017}. Additionally, traditional WEMD simulations do not explicitly enhance sampling along orthogonal degrees of freedom (those not captured by the progress coordinates). In practice, this means that insufficient PC definitions can lead to poor sampling.”
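
      The split/merge resampling that WE performs within a bin can be sketched as follows. This is a toy illustration under stated assumptions, not the WESTPA implementation: `resample_bin`, the representation of a walker as a `(position, weight)` tuple, and the target count are all hypothetical. It only shows that splitting and merging change the number of walkers in a bin while conserving total probability, and that no force is added to the dynamics.

```python
import random

def resample_bin(walkers, target, rng):
    """Return exactly `target` walkers with the same total weight.

    walkers: list of (position, weight) tuples with positive weights.
    """
    walkers = list(walkers)
    if not walkers:
        return []
    # Split: replace the heaviest walker with two copies of half its weight.
    while len(walkers) < target:
        walkers.sort(key=lambda w: w[1])
        pos, wt = walkers.pop()
        walkers += [(pos, wt / 2), (pos, wt / 2)]
    # Merge: combine the two lightest walkers; the survivor's position is
    # chosen with probability proportional to weight, and weights are summed.
    while len(walkers) > target:
        walkers.sort(key=lambda w: w[1])
        (p1, w1), (p2, w2) = walkers[0], walkers[1]
        survivor = p1 if rng.random() < w1 / (w1 + w2) else p2
        walkers = [(survivor, w1 + w2)] + walkers[2:]
    return walkers

rng = random.Random(0)
bin_walkers = [(0.1, 0.5), (0.2, 0.25), (0.3, 0.25)]  # made-up positions/weights
print(resample_bin(bin_walkers, 4, rng))
print(resample_bin(bin_walkers, 2, rng))
```

      Because resampling acts only on the bins defined along the progress coordinates, motions orthogonal to those coordinates receive no extra sampling, which is the limitation noted in the excerpt above.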

      I don't understand Figure 3C. Could the authors instead show structures corresponding to each of the states in 3B, and maybe also a representative structure for pathways 1 and 2?

      We have remade Figure 3. We removed 3B and the accompanying discussion because, upon review, we were not confident in the significance of the LPATH results as they pertain to the probability of intermediate states. We replaced 3B with a summary of pathways 1 and 2 with regard to the Phe382 flip (which is the most contrasting difference).

      Why introduce S1 and DFG-inter? And why suppose that DFG-inter is what corresponds to the excited state seen by NMR?

      As a consequence of dropping the LPATH analysis, we also removed mentions of S1, as further analysis made it hard to distinguish from DFG-in. We mention the DFG-inter conformation because (a) it is shared by both flipping mechanisms that we have found, and (b) it seems relevant for pharmacology: it has been observed in other kinases such as Aurora B (PDB 2WTV), and Asp381 flipping before Phe382 creates space in the orthosteric kinase pocket that could potentially be targeted by an inhibitor.

      It would be nice to have error bars on the populations reported in Figure 3.

      Agreed. Upon review, we decided to drop the populations because we were not confident in the significance of the LPATH results as they pertain to the probability of intermediate states.

      I'm confused by the attempt to relate the relative probabilities of states to the 32 kcal/mol barrier previously reported between the states. The barrier height should be related to the probability of a transition. The DFG-out state could be equiprobable with the DFG-in state and still have a 32 kcal/mol barrier separating them.

      Thanks for the correction, we agree with the reviewer and have amended the discussion to reflect this. Since we are starting our simulations in the DFG-in state, the probability of walkers arriving in DFG-out in our steady state WEMD simulations should (assuming proper sampling) represent the probability of the transition. We incorrectly associated the probability of the DFG-out state itself with the probability of the transition.

      How do the relative probabilities of the DFG-in/out states compare to experiments, like NMR?

      Previous NMR work has found the population of apo DFG-in (PDB 6XR6) in solution to be around 88% for wild-type ABL1, and 6% for DFG-out (PDB 6XR7). The remaining 6% represents a post-DFG-out state (PDB 6XRG) in which the activation loop has folded in near the hinge, which we did not simulate due to the associated computational cost. The same study estimates the barrier height from DFG-in to DFG-out at around 30 kcal/mol.
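
      The distinction between state populations and barrier height can be made concrete with the NMR populations quoted above. The sketch below applies the Boltzmann relation dG = -RT ln(p_out / p_in); the temperature of 300 K is an assumption and the function name is ours. The resulting free energy difference between the two states is on the order of 1.6 kcal/mol, far smaller than the barrier of about 30 kcal/mol, which relates to the probability of the transition rather than to the equilibrium populations.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol*K)

def free_energy_difference(p_from, p_to, temperature_k=300.0):
    """Free energy (kcal/mol) of state `to` relative to state `from`."""
    return -R_KCAL * temperature_k * math.log(p_to / p_from)

# Populations quoted above for wild-type ABL1: 88% DFG-in, 6% DFG-out.
print(round(free_energy_difference(0.88, 0.06), 2))
```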

      (1) Conformational states dynamically populated by a kinase determine its function. Tao Xie et al., Science 370, eabc2754 (2020). DOI:10.1126/science.abc2754

      (we already have that in the text, just need to quote here)

      “Do the staggered and concerted DFG flip pathways mentioned correspond to pathways 1 and 2 in Figure 3B, or is that a concept from previous literature?”

      Yes, we have amended Figure 3B to be clearer. In previous literature both pathways have been observed [1], although not specifically defined.

      Source: Computational Study of the “DFG-Flip” Conformational Transition in c-Abl and c-Src Tyrosine Kinases. Yilin Meng, Yen-lin Lin, and Benoît Roux The Journal of Physical Chemistry B 2015 119 (4), 1443-1456 DOI: 10.1021/jp511792a

    1. Abstract. Background: Technological advances in sequencing and computation have allowed deep exploration of the molecular basis of diseases. Biological networks have proven to be a useful framework for interrogating omics data and modeling regulatory gene and protein interactions. Large collaborative projects, such as The Cancer Genome Atlas (TCGA), have provided a rich resource for building and validating new computational methods resulting in a plethora of open-source software for downloading, pre-processing, and analyzing those data. However, for an end-to-end analysis of regulatory networks a coherent and reusable workflow is essential to integrate all relevant packages into a robust pipeline. Findings: We developed tcga-data-nf, a Nextflow workflow that allows users to reproducibly infer regulatory networks from the thousands of samples in TCGA using a single command. The workflow can be divided into three main steps: multi-omics data, such as RNA-seq and methylation, are downloaded, preprocessed, and lastly used to infer regulatory network models with the netZoo software tools. The workflow is powered by the NetworkDataCompanion R package, a standalone collection of functions for managing, mapping, and filtering TCGA data. Here we show how the pipeline can be used to study the differences between colon cancer subtypes that could be explained by epigenetic mechanisms. Lastly, we provide pre-generated networks for the 10 most common cancer types that can be readily accessed. Conclusions: tcga-data-nf is a complete yet flexible and extensible framework that enables the reproducible inference and analysis of cancer regulatory networks, bridging a gap in the current universe of software tools.

      This work has been peer reviewed in GigaScience (see https://doi.org/10.1093/gigascience/giaf126), which carries out open, named peer-review. These reviews are published under a CC-BY 4.0 license and were as follows:

      Reviewer 2: Jérôme Salignon

      This manuscript presents tcga-data-nf, a Nextflow-based pipeline for downloading, preprocessing, and analyzing TCGA multi-omic data, with a focus on gene regulatory network (GRN) inference. The workflow integrates established bioinformatics tools (PANDA, DRAGON, and LIONESS) and adheres to best practices for reproducibility through containerization (Docker, Conda, and Nextflow profiles). The authors demonstrate the utility of their pipeline by applying it to colorectal cancer subtypes, identifying potential regulatory interactions in TGF-β signaling. The manuscript is well-written and well-structured and provides sufficient methodological details, as well as Jupyter notebooks, for reproducibility. However, there are some areas that require clarification and improvement for acceptance in GigaScience, particularly regarding the scope of the tool, the quality of the inferred regulatory networks, the case study figure, benchmarking, statistical validation, and parameters.

      Major comments:

      • While the pipeline is well designed and executed, the overall impact of the tool feels somewhat limited, especially for a journal like GigaScience, due to its rather specific application to building GRNs in TCGA, the relatively small number of parameters, the support of only two omics types, and the lack of novel algorithms. To increase the impact of this tool I would recommend adding functionalities, such as:

      o Supporting additional tools. A great strength of the pipeline is the integration with the Network Zoo (NetZoo) ecosystem. However, only three tools are included from NetZoo. Including additional tools would likely increase the scope of users interested in using the pipeline. In particular, an important weakness of the current pipeline is that it is not possible to conduct differential analysis between different networks, which prevents users from identifying the most significant differences between two networks of interest (e.g., CMS2 vs CMS4). The NetZoo contains different tools to conduct such analyses, such as Alpaca [1] or Crane [2], thus this may be implemented to make the pipeline more useful to a broader user base.

      o Adding parameters. A strength of the pipeline is the ability to customize it using various parameters. However, as such the pipeline does not offer many parameters. It would be beneficial to make the pipeline a bit more customizable. For example, novel parameters could be: adding options for excluding selected samples, using different batch correction methods, different methods to map CpGs to genes, additional normalization methods, and additional quality controls (e.g., PCA for methylation samples, md5sum checks). These are just examples and do not need to be all implemented but adding some extra parameters would help make the pipeline more appealing and customizable to various users.

      • The quality of the inferred regulatory networks is hard to judge. There are no direct comparisons with any other tools.

      o For instance, it is mentioned in the text that GRAND networks were derived using a fixed set of parameters, but it could be helpful to show a direct comparison between GRNs built from your tools with those from GRAND. This could reveal how the ability to customize GRNs using the pipeline's parameters helps in getting better biological insights.

      o Alternatively, or in addition, one could compare how networks built by your method fare in comparison to networks built from other methods, like RegEnrich 3 or NetSeekR 4, in terms of biological insights, accuracy, scalability, speed, functionalities and/or memory usage.

      o Another angle to judge the regulatory networks would be to check in a case study whether the gene interactions that differ between disease and control networks are enriched in disease and gene-gene interaction databases, such as DisGeNet 5.

      • Figure 2 needs re-work:

      o Panel A and C: text is too small. "tf" should be written TF. "oi" should have another name. These panels might be moved to the supplements.

      o Panel D is confusing. Without significance it is hard to understand what the point of this panel is. I can see that certain TFs are cited in the main text, but without information about significance, the selection may seem like cherry-picking. The legend states: Annotation of all TFs in cluster D (columns) to the Reactome parent term. "Immune system" and "Cellular respondes to stimuli" are more consistenly involved in cluster D, in comparison to cluster A. However, this is a key result which should be shown in a main figure, not in Figure S6. I would also recommend using a -log scale when displaying the p-values to highlight the most significant entries.

      o Panel E is quite confusing. First, the color coding is unclear: what do the blue, purple, and red colors represent? Second, what do the edge widths represent? I would recommend using different shapes for the methylation and expression nodes to reduce the number of colors, and adding a color legend. I would also consider merging the two graphs and using color to represent the difference in the edge values, so the reader can directly see the key differences.

      • A benchmarking analysis could be included to show the runtime and memory requirements of each pipeline step. It would also be beneficial to analyze a dataset larger than the colon cancer one to assess scalability.

      • Statistical analysis: If computationally feasible, permutation testing could be implemented to quantify the robustness of the inferred regulatory interactions. Also, in the Methods section, it should be clarified that FDR correction was applied for the pathway enrichment analysis.
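
      To make the statistical suggestion above concrete, here is a minimal sketch of what I have in mind. It is only an illustration under stated assumptions: Pearson correlation is used as a stand-in edge score (it is not the PANDA or DRAGON estimator), and the function names are hypothetical and not taken from the pipeline. The sample pairing is shuffled to build a null distribution for one edge, and the resulting p-values are adjusted with the Benjamini-Hochberg procedure.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation, used here only as a stand-in edge score.

    Assumes neither x nor y is constant (otherwise the denominator is zero).
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Empirical two-sided p-value for the edge score between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break the pairing between samples
        if abs(pearson(x, y_perm)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      In practice one would apply permutation_pvalue to each edge of interest (or to a subsample of edges if the full network is too large) and pass the collected p-values to benjamini_hochberg. The same adjustment function could be used for the pathway enrichment p-values.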

      Minor comments:

      • I am not sure why duplicate samples are discarded in the pipeline. Why not sum the counts for RNA-Seq and average the beta values for methylation? I would expect that to yield more robust results.

      • It is a bit unclear in what context the NetworkDataCompanion tool could be used outside the workflow. It is also unclear how it helps with quality controls. Please clarify these aspects.

      • The manuscript is well written, but words are sometimes missing or misspelled; it needs a careful re-read.

      • The expression "same-same" is unclear to me.

      • In this sentence: "Some of "same-same" genes (STAT5A, CREB3L1"…, I am not sure in which table or figure this result can be found.

      • The text is too small in the Directed Acyclic Graphs, especially in Figure S4. Also, I would recommend adding the Directed Acyclic Graphs from Figures S1-S4 to the online documentation.

      • Regarding the code, I was puzzled to see a copyConfigFiles process. Also, there are files in bin/r/local_assets; these should be located in assets. Finally, the container for the Singularity and Docker profiles is likely the same; this should be clarified in the code.

      • It is recommended to remove the "defaults" channel from the list of channels declared in the containers/conda_envs/analysis.yml file. Please see information about that here https://www.anaconda.com/blog/is-conda-free and here https://www.theregister.com/2024/08/08/anaconda_puts_the_squeeze_on/.
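
      Regarding the duplicate samples mentioned in the first minor comment, the alternative I suggest (summing RNA-Seq counts and averaging beta values) could be sketched as follows. This is a minimal illustration, not the pipeline's code: the function name, the aliquot-to-sample mapping, and the identifiers in the example are hypothetical, and missing beta values are assumed to have been handled beforehand.

```python
from collections import defaultdict


def merge_duplicates(values_by_aliquot, sample_of, how):
    """Collapse duplicate aliquots into one profile per sample.

    values_by_aliquot: {aliquot_id: [value per feature]}
    sample_of:         {aliquot_id: sample_id}
    how:               "sum" for RNA-Seq counts, "mean" for beta values
    """
    if how not in ("sum", "mean"):
        raise ValueError("how must be 'sum' or 'mean'")
    # Group the profiles of all aliquots that belong to the same sample.
    groups = defaultdict(list)
    for aliquot, values in values_by_aliquot.items():
        groups[sample_of[aliquot]].append(values)
    merged = {}
    for sample, profiles in groups.items():
        # zip(*profiles) iterates feature by feature across the duplicates.
        combined = [sum(column) for column in zip(*profiles)]
        if how == "mean":
            combined = [value / len(profiles) for value in combined]
        merged[sample] = combined
    return merged


# Hypothetical example: sample "A" has two aliquots, sample "B" has one.
counts = {"A-01": [10, 0, 5], "A-02": [4, 1, 5], "B-01": [7, 7, 7]}
sample_of = {"A-01": "A", "A-02": "A", "B-01": "B"}
merged_counts = merge_duplicates(counts, sample_of, how="sum")
# merged_counts == {"A": [14, 1, 10], "B": [7, 7, 7]}
```

      Whether merging is preferable to discarding may depend on how the duplicates arose (e.g., technical replicates versus different portions of a tumor), so exposing this as a parameter would be one option.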

      Additional comments (which do not need to be addressed):

      • Future work may consider enabling the use of the pipeline to build GRNs from data sources other than TCGA (i.e., nf-netzoo). recount3 data is already being parsed for GTEx and TCGA samples, so it might be relatively easy to adapt the pipeline so that it can be used on any arbitrary recount3 dataset. Similarly, it could be useful if one could specify a dataset from the recountmethylation database 6 to build GRNs. While these unimodal datasets could not be used with the DRAGON method, they would still benefit from all other features of the pipeline.

      • Using an nf-core template would enable a better structure of the code and increase the visibility of the tool. Also, using multiple containers is usually easier to maintain and update than a single large container, especially when a single tool needs to be updated or when part of the pipeline is modified. Finally, the code contains many comments that do not explain the code but read more like quick draft notes, which makes the code harder for others to read.

      References

      1. Padi, M., and Quackenbush, J. (2018). Detecting phenotype-driven transitions in regulatory network structure. npj Syst Biol Appl 4, 1-12. https://doi.org/10.1038/s41540-018-0052-5.
      2. Lim, J.T., Chen, C., Grant, A.D., and Padi, M. (2021). Generating Ensembles of Gene Regulatory Networks to Assess Robustness of Disease Modules. Front. Genet. 11. https://doi.org/10.3389/fgene.2020.603264.
      3. Tao, W., Radstake, T.R.D.J., and Pandit, A. (2022). RegEnrich gene regulator enrichment analysis reveals a key role of the ETS transcription factor family in interferon signaling. Commun Biol 5, 1-12. https://doi.org/10.1038/s42003-021-02991-5.
      4. Srivastava, H., Ferrell, D., and Popescu, G.V. (2022). NetSeekR: a network analysis pipeline for RNA-Seq time series data. BMC Bioinformatics 23, 54. https://doi.org/10.1186/s12859-021-04554-1.
      5. Hu, Y., Guo, X., Yun, Y., Lu, L., Huang, X., and Jia, S. (2025). DisGeNet: a disease-centric interaction database among diseases and various associated genes. Database 2025, baae122. https://doi.org/10.1093/database/baae122.
      6. Maden, S.K., Walsh, B., Ellrott, K., Hansen, K.D., Thompson, R.F., and Nellore, A. (2023). recountmethylation enables flexible analysis of public blood DNA methylation array data. Bioinformatics Advances 3, vbad020. https://doi.org/10.1093/bioadv/vbad020.