  1. Nov 2025
    1. Author Response

      Reviewer #1 (Public Review):

      The authors convincingly show in this study the effects of the fas5 gene on changes in the CHC profile and the importance of these changes toward sexual attractiveness.

      The main strength of this study lies in its holistic approach (from genes to behaviour) showing a full and convincing picture of the stated conclusions. The authors succeeded in putting a very interdisciplinary set of experiments together to support the main claims of this manuscript.

      We appreciate the kind comments from the reviewer.

      The main weakness stems from the lack of transparency behind the statistical analyses conducted in the study. Detailed statistical results are never mentioned in the text, nor is it always clear what was compared to what. I also believe that some tests that were conducted are not adequate for the given data. I am therefore unable to properly assess the significance of the results from the presented information. Nevertheless, the graphical representations are convincing enough for me to believe that a revision of the statistics would not significantly affect the main conclusions of this manuscript.

We apologize for not providing a detailed description of the statistical tests that were performed. We have added paragraphs to the Methods section specifically explaining the statistical analyses (lines 435-445; 489-502; 559-561; 586-591).

      The second major problem I had with the study was how it brushes over the somewhat contradicting results they found in males (Fig S2). These are only mentioned twice in the main text and in both cases as being "similarly affected", even though their own stats seem to indicate otherwise for many of the analysed compound groups. This also should affect the main conclusion concerning the effects of fas5 genes in the discussion, a more careful wording when interpreting the results is therefore necessary.

Thank you for pointing this out. Although our focus clearly lay on the female CHC profiles, as a function in sexual signaling has so far only been described for these, we have now elaborated the results and discussion for the fas5 RNAi males (lines 167-178; 258-268).

      Reviewer #2 (Public Review):

Insects have long been known to use cuticular hydrocarbons for communication. While the general pathways for hydrocarbon synthesis have been worked out, their specificity, and in particular the specificity of the different enzymes involved, is surprisingly little understood. Here, the authors convincingly demonstrate that a single fatty acid synthase gene is responsible for a shift in the positions of methyl groups across the entire alkane spectrum of a wasp, and that male wasps recognize females specifically based on these methyl group positions. The strength of the study is the combination of gene expression manipulations with behavioural observations evaluating the effect of the associated changes in the cuticular hydrocarbon profiles. The authors make sure that the behavioural effect is indeed due to the chemical changes by testing not only live animals, but also dead animals and corpses with manipulated cuticular hydrocarbons.

      I find the evidence that the hydrocarbon changes do not affect survival and desiccation resistance less convincing (due to the limited set of conditions and relatively small sample size), but the data presented are certainly congruent with the idea that the methyl alkane changes do not have large effects on desiccation.

      We appreciate the kind comments from the reviewer.

      Reviewer #3 (Public Review):

      In this manuscript, the authors are aiming to demonstrate that a fatty-acyl synthase gene (fas5) is involved in the composition of the blend of surface hydrocarbons of a parasitoid wasp and that it affects the sexual attractiveness of females for males. Overall, the manuscript reads very well, it is very streamlined, and the authors' claims are mostly supported by their experiments and observations.

      We appreciate the kind comments from the reviewer.

However, I find that some experiments, information and/or discussion are missing that would allow assessing whether the effects they observe are, at least in part, due to factors other than fas5 and the methyl-branched (MB) alkanes. I am also wondering whether what the authors observe is only a change in the sexual attractiveness of females, or whether it relates to species recognition as well.

We appreciate the interesting point that the reviewer raises regarding sexual attractiveness versus species recognition and now expand upon this potential aspect in the discussion (lines 327-330). However, in this manuscript, we focused very much on the effect of the fas5 knockdown on the conveyance of female sexual attractiveness in a single species (Nasonia vitripennis). Therefore, we argue that species recognition constitutes a different communication modality here, and we currently cannot infer whether and how species recognition is exactly encoded in Nasonia CHC profiles, despite some circumstantial evidence for species-specificity (Buellesbach et al. 2013; Mair et al. 2017). Thus, we would like to refrain from any further speculation on species recognition before this can be unambiguously demonstrated, and remain within the mechanism of sexual attractiveness within a single species, which we clearly show is mediated by the female MB-alkane fraction governed by fatty acid synthase genes. We do, however, still consider potential alternative explanations (e.g., n-alkenes acting as a deterrent of homosexual mating attempts).

The authors explore the function of cuticular hydrocarbons (CHCs) and a fatty-acyl synthase in Nasonia vitripennis, a parasitic wasp. Using RNAi, they successfully knock down the expression of the fas5 gene in wasps. The authors do not justify their choice of fatty-acyl synthase candidate gene. It would have been interesting to know whether this is one of many genes they studied, or whether there was some evidence that drove them to focus their interest on fas5.

      In a previous study, 5 fas candidate genes orthologous to Drosophila melanogaster fas genes were identified and mapped in the genome of Nasonia vitripennis (Buellesbach et al. 2022). We actually investigated the effects of all of these fas genes on CHC variation, but only fas5 led to such a striking, traceable pattern shift. We are currently preparing another manuscript discussing the effects of the other fas genes, but decided to focus exclusively on fas5 here, due to its significance for revealing how sexual attractiveness can be encoded and conveyed in complex chemical profiles, maintained and governed by a surprisingly simple genetic basis.

The authors observe large changes in the cuticular hydrocarbon (CHC) profiles of males and females. These changes are mostly a reduction of some MB alkanes and an increase in others, as well as an increase of n-alkenes in fas5 knockdown females. For male fas5 knockdowns, the overall quantity of CHCs is increased and, consequently, multiple types of compounds are increased compared to wild-type, with only one compound appearing to decrease compared to wild-type. Insects are known to rely on ratios of compounds in blends to recognize odors. The authors address this by showing a plot of the relative ratios, but it seems to me that they do not show statistical tests of those changes in the proportions of the different types of compounds. In the results section, the authors give percentages while referring to figures showing the absolute amounts of CHCs. They should also test whether the ratios are significantly different between experimental conditions. Similar data should be displayed for the males as well.

We appreciate your suggestions. We kindly refer you to our response to reviewer 1, where we addressed the statistical tests. Specifically, we generated separate subplots to display the proportions of the different compound classes and performed statistical tests to compare these proportions between treatments for both males and females. Additionally, we have revised the results section to replace relative abundances with absolute quantities, as depicted in Figure 2C-G.
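Since the revised Methods themselves are not reproduced in this response, and the reference list includes Benjamini & Yekutieli (2001), one plausible sketch of such a per-compound-class comparison is a nonparametric test per class followed by false-discovery-rate correction. The p-values and class labels below are purely illustrative, not the manuscript's actual analysis:

```python
import numpy as np

# Hypothetical per-compound-class p-values, e.g. from one Mann-Whitney U test
# per class (n-alkanes, n-alkenes, monomethyl- and dimethyl-alkanes) comparing
# relative abundances between fas5 RNAi and GFP RNAi control individuals.
pvals = [0.004, 0.030, 0.0005, 0.62]

def benjamini_yekutieli(p, alpha=0.05):
    """FDR control under arbitrary dependency (Benjamini & Yekutieli 2001)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))      # harmonic correction factor
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / (m * c_m)
    below = p[order] <= thresholds
    k = int(np.nonzero(below)[0].max()) + 1 if below.any() else 0
    rejected = np.zeros(m, dtype=bool)
    rejected[order[:k]] = True                   # reject the k smallest p-values
    return rejected

rejected = benjamini_yekutieli(pvals)  # which classes still differ after correction
```

With m = 4 tests, the Benjamini-Yekutieli procedure divides the usual step-up thresholds by the harmonic sum c(m) ≈ 2.08, keeping the FDR controlled under arbitrary dependency between the tests, which matters here because compound-class proportions sum to 100% and are therefore not independent.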

      Furthermore, the authors didn't use an internal standard to measure the quantity of CHCs in the extracts, which, to me, is the gold standard in the field. If I understood correctly, the authors check the abundance measured for known quantities of n-alkanes. I'm sure this method is fine, but I would have liked to be reassured that the quantities measured through this method are good by either testing some samples with an internal standard, or referring to work that demonstrates that this method is always accurate to assess the quantities of CHC in extracts of known volumes.

We actually did include 7.5 ng/μl dodecane (C12) as an “internal” standard in the hexane resuspensions of all of our processed samples (line 456, Materials and Methods). This was primarily done to allow us to visually inspect and compare the congruence of all chromatograms in the subsequent data analysis and to immediately detect any variation arising from sample preparation, the injection process, or instrument fluctuation. Furthermore, our elaborate and standardized CHC extraction method strictly controls the volume of solvent and the duration of extraction to minimize variation from the sample preparation steps. Finally, we calibrated each individual CHC compound quantity with a dilution series of external standards (C21-C40) of known concentration. By constructing a calibration curve from this dilution series, we achieved the most accurate compound quantification, also taking into account and counteracting the generally diminishing quantities of compounds with higher chain lengths.
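As a rough illustration of this external-calibration approach (with made-up peak areas and concentrations, not the actual standards data), each compound's quantity can be read off a per-standard linear calibration curve:

```python
import numpy as np

# Hypothetical dilution series for one external n-alkane standard (e.g. C25):
# known concentrations (ng/ul) and the peak areas they produced on the GC.
conc = np.array([1.0, 5.0, 10.0, 25.0, 50.0])
area = np.array([1.1e4, 5.3e4, 1.05e5, 2.60e5, 5.15e5])

# Least-squares linear calibration curve: area = slope * conc + intercept.
slope, intercept = np.polyfit(conc, area, deg=1)

def quantify(peak_area):
    """Convert a sample peak area into ng/ul via the calibration curve."""
    return (peak_area - intercept) / slope
```

In practice one such curve per standard chain length (C21-C40) would be fitted, so that each sample peak is quantified against a standard of comparable volatility, compensating for the weaker signal of longer-chain compounds.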

The authors provide a sensible control for their RNAi experiments: targeting an unrelated gene absent in N. vitripennis (GFP). This allows us to see whether the injection of RNAi might affect CHC profiles, which it appears to do in some cases in males, but not in females. The authors also show the reader that their RNAi experiments do reduce the expression of the target gene. However, one of the caveats of their experiments is that the authors don't provide evidence or information to allow the (non-expert) reader to assess whether the fas5 RNAi experiments affected the expression of other fatty-acyl synthase genes. I'm not an expert in RNAi, so maybe this suggestion is not relevant, but it should at least be addressed somewhere in the manuscript that such off-target effects are very unlikely or impossible, in that case or more generally.

We acknowledge the reviewer’s concern about potential off-target effects of the fas5 knockdown. We actually did check initially for off-target effects on the other four previously published fas genes in N. vitripennis (Lammers et al. 2019; Buellesbach et al. 2022) and did not find any effects on their respective expression levels. We now include these results as supplementary data (Figure 2-figure supplement 1). However, as mentioned in the cover letter to the editor, we discovered a previously uncharacterized fas gene in the most recent N. vitripennis genome assembly (NC_045761.1), fas6, most likely constituting a tandem duplication of fas5. These two genes turned out to have such high sequence similarity (> 90 %, Figure 2-figure supplement 2) that both were simultaneously downregulated by our fas5 dsRNAi construct, which we confirmed with qPCR and have now incorporated into our manuscript (Fig. 2H). Therefore, we now explicitly mention that the knockdown affects both genes, and that either one or both could have caused the observed phenotypic effects. Recognizing this RNAi off-target effect, we have also incorporated a discussion of this issue in the appropriate section of the manuscript (lines 364-377), as well as of the potential off-target effects of our GFP dsRNAi controls (lines 262-274).

The authors observe that the modified CHC profiles of RNAi females reduce courtship and copulation attempts, but not antennation, by males toward live and (dead) dummy females. They show that the MB alkanes of the CHC profile are sufficient to elicit sexual behaviors from males towards dummy females, and that the same fraction from extracts of fas5 knockdown females does so significantly less. From the previous data, it seems that dummy females with the MB alkane profile of fas5 females elicit more antennation than CHC-cleared dummy females, but the authors do not display data for this type of target in the figure for the MB alkane behavioral experiments.

In fact, similar proportions of males performed antennation behavior towards female dummies treated with the MB alkane fraction of fas5 RNAi females and towards CHC-cleared female dummies (55% and 50%, respectively; see Author response image 1 for the corresponding parts of sub-figures 3E and 4D). We therefore did not deem it necessary to show the same data on CHC-cleared female dummies in Figure 3 as well.

      Author response image 1.

      Unfortunately, the authors don't present experiments testing the effect of the non-MB alkanes fractions of the CHC extracts on male behavior toward females. As such, they are not able to (and didn't) conclude that the MB-alkane is necessary to trigger the sexual behaviors of males. I believe testing this would have significantly enhanced the significance of this work. I would also have found it interesting for the authors to comment on whether they observe aggressive behavior of males towards females (live or dead) and/or whether such behavior is expected or not in inter-individual interactions in parasitoids wasps.

In our experiments, we focus on the function of the MB-alkane fraction in female CHC profiles, and we comprehensively demonstrate in Figure 4 that the MB-alkane fraction from WT females alone is sufficient to trigger mating behavior consistent with that towards live females and untreated female dummies. Therefore, we do not completely understand the reviewer’s concern about us not being “able to (and didn't) conclude that the MB-alkane is necessary to trigger the sexual behaviors of males”. We appreciate the reviewer's suggestion of testing the non-MB alkanes (n-alkanes and n-alkenes). However, due to the experimental procedure of separating the CHC compound class fractions through elution with molecular sieves, it was not possible for us to retrieve either the n-alkane or the n-alkene fraction, which remained bound to the sieves after separation. The role of n-alkenes in N. vitripennis is, however, considered in the discussion, as a deterrent for homosexual interactions between males (Wang et al. 2022a). Moreover, we did not observe aggressive behavior of males towards live or dead females.

CHCs are used by insects to signal and/or recognize various traits of targets of interest, including species or groups of origin, fertility, etc. The authors claim that their experiments show the sexual attractiveness of females can be encoded in the specific ratio of MB alkanes. While I understand how they come to this conclusion, I am somewhat concerned. The authors very quickly discuss their results in light of the literature about the role of CHCs (and notably MB alkanes) in various recognition behaviors in Hymenoptera, including conspecific recognition. Previous work (cited by the authors) has shown that males distinguish males from females using an alkene (Z9C31). As such, it remains possible that the "sexual attractiveness" of N. vitripennis females for males relies on them not being males and being from the right species as well. The authors do not address the question of whether the CHCs (and the MB alkanes in particular) of females signal their sex or their species. While I acknowledge that responding to this question is beyond the scope of this work, I also strongly believe that it should be discussed in the manuscript. Otherwise, non-specialist readers would not be able to understand what I believe is one of the points that could temper the conclusions from this work.

We acknowledge the reviewer’s insight about the role of MB alkanes in signaling sex or species in N. vitripennis and now include this aspect in our revised discussion (lines 324-330). Moreover, we clearly demonstrate that the n-alkenes were reduced to minute trace components after our compound class separation, and the males still did not display courtship and copulation behaviors comparable to those towards WT females. This strongly indicates that the n-alkenes do not play a role when males rely solely on the changed MB-alkane patterns, further strengthening our main argument.

      References

      Benjamini, Y. and D. Yekutieli. 2001. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29:1165-1188.

      Buellesbach, J., J. Gadau, L. W. Beukeboom, F. Echinger, R. Raychoudhury, J. H. Werren, and T. Schmitt. 2013. Cuticular hydrocarbon divergence in the jewel wasp Nasonia: Evolutionary shifts in chemical communication channels? J. Evol. Biol. 26:2467-2478.

      Buellesbach, J., C. Greim, and T. Schmitt. 2014. Asymmetric interspecific mating behavior reflects incomplete prezygotic isolation in the jewel wasp genus Nasonia. Ethology 120:834-843.

      Buellesbach, J., H. Holze, L. Schrader, J. Liebig, T. Schmitt, J. Gadau, and O. Niehuis. 2022. Genetic and genomic architecture of species-specific cuticular hydrocarbon variation in parasitoid wasps. Proc. R. Soc. B 289:20220336.

      Engl, T., N. Eberl, C. Gorse, T. Krüger, T. H. P. Schmidt, R. Plarre, C. Adler, and M. Kaltenpoth. 2018. Ancient symbiosis confers desiccation resistance to stored grain pest beetles. Mol. Ecol. 27:2095-2108.

Ferveur, J. F., J. Cortot, K. Rihani, M. Cobb, and C. Everaerts. 2018. Desiccation resistance: effect of cuticular hydrocarbons and water content in Drosophila melanogaster adults. PeerJ 6.

      Lammers, M., K. Kraaijeveld, J. Mariën, and J. Ellers. 2019. Gene expression changes associated with the evolutionary loss of a metabolic trait: lack of lipogenesis in parasitoids. BMC Genom. 20:309.

      Mair, M. M., V. Kmezic, S. Huber, B. A. Pannebakker, and J. Ruther. 2017. The chemical basis of mate recognition in two parasitoid wasp species of the genus Nasonia. Entomol. Exp. Appl. 164:1-15.

      Wang, Y., W. Sun, S. Fleischmann, J. G. Millar, J. Ruther, and E. C. Verhulst. 2022a. Silencing Doublesex expression triggers three-level pheromonal feminization in Nasonia vitripennis males. Proc. R. Soc. B 289:20212002.

      Wang, Z., J. P. Receveur, J. Pu, H. Cong, C. Richards, M. Liang, and H. Chung. 2022b. Desiccation resistance differences in Drosophila species can be largely explained by variations in cuticular hydrocarbons. eLife 11:e80859.

    1. Author Response:

      Reviewer #1 (Public Review):

The manuscript provides very high-quality single-cell physiology combined with population physiology to reveal distinctive roles for two anatomically different LN populations in the cockroach antennal lobe. The conclusion that non-spiking LNs with graded responses show glomerulus-restricted responses to odorants while spiking LNs show similar responses across glomeruli is generally supported with strong and clean data, although the possibility of selective interglomerular inhibition has not been ruled out. On balance, the single-cell biophysics and physiology provide foundational information useful for a well-grounded mechanistic understanding of how information is processed in insect antennal lobes, and how each LN class contributes to odor perception and behavior.

      Thank you for this positive feedback.

      Reviewer #2 (Public Review):

      The manuscript "Task-specific roles of local interneurons for inter- and intraglomerular signaling in the insect antennal lobe" evaluates the spatial distribution of calcium signals evoked by odors in two major classes of olfactory local neurons (LNs) in the cockroach P. Americana, which are defined by their physiological and morphological properties. Spiking type I LNs have a patchy innervation pattern of a subset of glomeruli, whereas non-spiking type II LNs innervate almost all glomeruli (Type II). The authors' overall conclusion is that odors evoke calcium signals globally and relatively uniformly across glomeruli in type I spiking LNs, and LN neurites in each glomerulus are broadly tuned to odor. In contrast, the authors conclude that they observe odor-specific patterns of calcium signals in type II nonspiking LNs, and LN neurites in different glomeruli display distinct local odor tuning. Blockade of action potentials in type I LNs eliminates global calcium signaling and decorrelates glomerular tuning curves, converting their response profile to be more similar to that of type II LNs. From these conclusions, the authors infer a primary role of type I LNs in interglomerular signaling and type III LNs in intraglomerular signaling.

      The question investigated by this study - to understand the computational significance of different types of LNs in olfactory circuits - is an important and significant problem. The design of the study is straightforward, but methodological and conceptual gaps raise some concerns about the authors' interpretation of their results. These can be broadly grouped into three main areas.

      1) The comparison of the spatial (glomerular) pattern of odor-evoked calcium signals in type I versus type II LNs may not necessarily be a true apples-to-apples comparison. Odor-evoked calcium signals are an order of magnitude larger in type I versus type II cells, which will lead to a higher apparent correlation in type I cells. In type IIb cells, and type I cells with sodium channel blockade, odor-evoked calcium signals are much smaller, and the method of quantification of odor tuning (normalized area under the curve) is noisy. Compare, for instance, ROI 4 & 15 (Figure 4) or ROI 16 & 23 (Figure 5) which are pairs of ROIs that their quantification concludes have dramatically different odor tuning, but which visual inspection shows to be less convincing. The fact that glomerular tuning looks more correlated in type IIa cells, which have larger, more reliable responses compared to type IIb cells, also supports this concern.

      We agree with the reviewer that "the comparison of the spatial (glomerular) pattern of odor-evoked calcium signals is not necessarily a true apples-to-apples comparison". Type I and type II LNs are different neuron types. Given their different physiology and morphology, this is not even close to a "true apples-to-apples comparison" - and a key point of the manuscript is to show just that.

      As we have emphasized in response to Essential Revision 1, the differences in Ca2+ signals are not an experimental shortcoming but a physiologically relevant finding per se. These data, especially when combined with the electrophysiological data, contribute to a better understanding of these neurons’ physiological and computational properties.

It is physiologically determined that the Ca2+ signals during odorant stimulation in type II LNs are smaller than in type I LNs. And yes, the signals are small because they are predominantly caused by small postsynaptic Ca2+ currents. Regardless of the imaging method, this naturally reduces the signal-to-noise ratio, making it more challenging to detect signals. To address this issue, we used a well-defined and reproducible method for analyzing these signals. In this context, we do not agree with the very general criticism of the method. The reviewer questions whether the signals are odorant-induced or just noise (see also minor point 12). If we had recorded only noise, we would expect all tuning curves (for each odorant and glomerulus) to look alike. In this context, we disagree with the reviewer's statement that the tuning curves do not represent the Ca2+ signals in Figure 4 (ROI 4 and 15) and Figure 5 (ROI 16 and 23). This debate reflects precisely the kind of 'visual inspection bias' that our clearly defined analysis aims to avoid. On close inspection, the differences in the Ca2+ signals can indeed be seen. Figure II (of this letter) shows the signals from the glomeruli in question at higher magnification. The sections of the recordings that were used for the tuning curves are marked in red.

      Figure II: Ca2+ signals of selected glomeruli that were questioned by the reviewer.
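To make the analysis under discussion concrete for non-specialist readers, the normalized area-under-the-curve tuning curves and their pairwise correlations can be sketched roughly as follows, with synthetic dF/F traces and an assumed response window rather than the actual recordings or parameters:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic dF/F traces: (n_glomeruli, n_odorants, n_timepoints), baseline
# noise plus an odor-evoked response of varying size in an assumed window.
traces = rng.normal(0.0, 0.02, size=(3, 5, 200))
traces[:, :, 50:100] += rng.gamma(2.0, 0.05, size=(3, 5, 1))
win = slice(50, 100)  # samples covering the odor response

# Tuning curve per glomerulus: area under the curve for each odorant,
# normalized to that glomerulus's largest response.
auc = traces[:, :, win].sum(axis=2)
tuning = auc / np.abs(auc).max(axis=1, keepdims=True)

# Pairwise Pearson correlation between glomerular tuning curves: uniformly
# high r across glomeruli is the type I (interglomerular) signature,
# low or mixed r the type II (intraglomerular) signature.
r = np.corrcoef(tuning)
```

The reviewer's concern is that smaller signals (type II LNs) make the AUC estimate noisier; the fixed response window and normalization in such a pipeline are what the authors mean by a "well-defined and reproducible method" as opposed to visual inspection.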

      2) An additional methodological issue that compounds the first concern is that calcium signals are imaged with wide-field imaging, and signals from each ROI likely reflect out of plane signals. Out of plane artifacts will be larger for larger calcium signals, which may also make it impossible to resolve any glomerular-specific signals in the type I LNs.

Thank you for allowing us to clarify this point. The reviewer's comment implies that the different amplitudes of the Ca2+ signals indicate some technical-methodological deficiency (e.g., a poorly chosen odor concentration). But in fact, this is a key finding of this study that is physiologically relevant and crucial for understanding the function of the neurons studied. These very differences in the Ca2+ signals are evidence of the different roles these neurons play in the AL. The different signal amplitudes directly show the distinct physiology and the Ca2+ sources that dominate the Ca2+ signals in type I and type II LNs. Accordingly, it is impractical to equalize the magnitude of the Ca2+ signals under physiological conditions by adjusting the concentration of the odor stimuli.

      In the following, we address these issues in more detail: 1) Imaging Method 2) Odorant stimulation 3) Cell type-specific Ca2+ signals

      1) Imaging Method:

Of course, we agree with the reviewer's comment that out-of-focus and out-of-glomerulus fluorescence can potentially affect measurements, especially in widefield optical imaging of thick tissue. This issue was carefully addressed in initial experiments. In type I LNs, which innervate a subset of glomeruli, we detected fluorescence signals, which matched the spike pattern of the electrophysiological recordings 1:1, only in the innervated glomeruli. In the non-innervated ROIs (glomeruli), we detected no or comparatively very little fluorescence, even in glomeruli directly adjacent to innervated ones.

To illustrate this, Figure I (of this response letter) shows measurements from an AL in which a uniglomerular projection neuron was investigated in a set of experiments not directly related to the current study. In this experiment, a train of action potentials was induced by depolarizing current. The traces show the action potential-induced fluorescence signals from the innervated glomerulus (glomerulus #1) and the directly adjacent glomeruli.

      These results do not entirely exclude that the large Ca2+ signals from the innervated LN glomeruli may include out-of-focus and out-of-glomerulus fluorescence, but they do show that the bulk of the signal is generated from the recorded neuron in the respective glomeruli.

Figure I: Simultaneous electrophysiological and optophysiological recordings of a uniglomerular projection neuron using the ratiometric Ca2+ indicator fura-2. The projection neuron arborizes in glomerulus 1. The train of action potentials was induced with a depolarizing current pulse (grey bar).

2) Odorant Stimulation: It is important to note that the odorant concentration cannot be varied freely. For these experiments, the odorant concentrations have to be within a 'physiologically meaningful' range, which means: On the one hand, they have to be high enough to induce a clear response in the projection neurons (the antennal lobe output). On the other hand, the concentration must not be so high that the ORNs are stimulated nonspecifically. These criteria were met with the concentrations used, since they induced clear and odorant-specific activity in projection neurons.

      3) Cell type-specific Ca2+ signals:

The differences in the Ca2+ signals are described and discussed in some detail throughout the text (e.g., page 6, lines 119-136; page 9, lines 193-198; pages 10-11, lines 226-235; pages 14-15, lines 309-333). Briefly: In spiking type I LNs, the observed large Ca2+ signals are mediated mainly by voltage-dependent Ca2+ channels activated by the strong depolarization of the Na+-driven action potentials. These large Ca2+ signals mask smaller signals that originate, for example, from excitatory synaptic input (i.e., evoked by ligand-activated Ca2+ conductances). Preventing the firing of action potentials can unmask the ligand-activated signals, as shown in Figure 4 (see also minor comments 8 and 10). In nonspiking type II LNs, the action potential-generated Ca2+ signals are absent; accordingly, the Ca2+ signals are much smaller. In our model, the comparatively small Ca2+ signals in type II LNs are mediated mainly by (synaptic) ligand-gated Ca2+ conductances, possibly with contributions from voltage-gated Ca2+ channels activated by the comparatively small depolarization (compared with type I LNs).

Accordingly, our main conclusion, that spiking LNs play a primary role in interglomerular signaling, while nonspiking LNs play an essential role in intraglomerular signaling, can be DIRECTLY inferred from the differences in the odorant-induced Ca2+ signals alone.

      a) Type I LN: The large, simultaneous, and uniform Ca2+ signals in the innervated glomeruli of an individual type I LN clearly show that they are triggered in each glomerulus by the propagated action potentials, which conclusively shows lateral interglomerular signal propagation.

b) Type II LNs: In the type II LNs, we observed relatively small Ca2+ signals in single glomeruli or in a small fraction of the glomeruli of a given neuron. Importantly, the time course and amplitude of the Ca2+ signals varied between different glomeruli and different odors. Considering that type II LNs can, in principle, generate large voltage-activated Ca2+ currents (larger than type I LNs; page 4, lines 82-86; Husch et al. 2009a,b; Fusca and Kloppenburg 2021), these data suggest that in type II LNs electrical or Ca2+ signals spread only within the same glomerulus, and laterally only to glomeruli that are electrotonically close to the odorant-stimulated glomerulus.

Taken together, this means that our conclusions regarding inter- and intraglomerular signaling can be derived from the simultaneously recorded amplitudes and dynamics of the membrane potential and Ca2+ signals alone. This also means that, although the correlation analyses support this conclusion nicely, the actual conclusion does not ultimately depend on the correlation analysis. We had (tried to) express this with the wording, “Quantitatively, this is reflected in the glomerulus-specific odorant responses and the diverse correlation coefficients across…” (page 10, lines 216-217) and “…This is also reflected in the highly correlated tuning curves in type I LNs and low correlations between tuning curves in type II LNs” (page 13, lines 293-295).

3) Apart from the above methodological concerns, the authors' interpretation of these data as supporting inter- versus intra-glomerular signaling is not well supported. The odors used in the study are general odors that presumably excite feedforward input to many glomeruli. Since the glomerular source of excitation is not determined, it is not possible to assign the signals in type II LNs as arising locally - selective interglomerular signal propagation is entirely possible. Likewise, the study design does not allow the authors to rule out the possibility that significant intraglomerular inhibition may be mediated by type I LNs.

The reviewer addresses an important point. However, from the comment, we get the impression that he/she has not taken into account the entire data set and the Discussion. In fact, this topic was already discussed in some detail in the original version (page 12, lines 268-271; pages 15-16, lines 358-374). This section even has a corresponding heading: "Inter- and intraglomerular signaling via nonspiking type II LNs" (page 15, line 338). We apologize if our explanations regarding this point were unclear, but we also feel that the reviewer is arguing against statements that we did not make in this way.

      a) In 11 out of 18 type II LNs we found 'relatively uncorrelated' (r=0.43±0.16, N=11) glomerular tuning curves. These experiments argue strongly for a 'local excitation' with restricted signal propagation and do not provide support for interglomerular signal propagation. Thus, these results support our interpretation of intraglomerular signaling in this set of neurons.

      b) In 7 out of 18 experiments, we observed 'higher correlated' glomerular tuning curves (r=0.78±0.07, N=7). We agree with the reviewer that this could be caused by various mechanisms, including simultaneous input to several glomeruli or by interglomerular signaling. Both possibilities were mentioned and discussed in the original version of the manuscript (page 12, lines 268-271; page 15-16; lines 358-374). In the Discussion, we considered the latter possibility in particular (but not exclusively) for the type IIa1 neurons that generate spikelets. Their comparatively stronger active membrane properties may be particularly suitable for selective signal transduction between glomeruli.

      c) We have not ruled out that local signaling exists in type I LNs – in addition to interglomerular signaling. The highly localized Ca2+ signals in type I LNs, which we observed when Na+-driven action potential generation was prevented, may support this interpretation. However, we would like to reiterate that the simultaneous electrophysiological and optophysiological recordings, which show highly correlated glomerular Ca2+ dynamics that match 1:1 with the simultaneously recorded action potential pattern, clearly suggest interglomerular signaling. We also want to emphasize that this interpretation is in agreement with previous models derived from electrophysiological studies (Assisi et al., 2011; Fujiwara et al., 2014; Hong and Wilson, 2015; Nagel and Wilson, 2016; Olsen and Wilson, 2008; Sachse and Galizia, 2002; Wilson, 2013).
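      For concreteness, the tuning-curve comparison underlying (a) and (b) can be sketched in a few lines. Everything below is synthetic and illustrative — the array shapes, odor count, and noise levels are assumptions, not the study's actual analysis pipeline:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Hypothetical type II LN: response amplitudes of 4 innervated glomeruli
# to a panel of 9 odorants (rows = glomeruli, columns = odorants).
n_glomeruli, n_odors = 4, 9
shared_drive = rng.normal(0, 1, n_odors)          # odor-specific component
tuning = np.array([0.3 * shared_drive + rng.normal(0, 1, n_odors)
                   for _ in range(n_glomeruli)])  # weakly correlated curves

# Pairwise Pearson correlations between the glomerular tuning curves.
pairs = list(combinations(range(n_glomeruli), 2))
rs = [np.corrcoef(tuning[i], tuning[j])[0, 1] for i, j in pairs]

print(f"mean pairwise r = {np.mean(rs):.2f} +/- {np.std(rs):.2f}")
```

      Low pairwise correlations (as in the 11 "relatively uncorrelated" cells) point to locally restricted signaling, whereas uniformly high correlations are compatible with either shared feedforward input or interglomerular propagation — the distinction at issue in points (a) and (b).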

      In light of the reviewer's comment(s), we have modified the text to clarify these points (page 14, lines 317-319).

      Reviewer #3 (Public Review):

      To elucidate the role of the two types of LNs, the authors combined whole-cell patch clamp recordings with calcium imaging via single cell dye injection. This method enables monitoring of the calcium dynamics of the different axons and branches of single LNs in identified glomeruli of the antennal lobe, while the membrane potential can be recorded at the same time. The authors recorded in total from 23 spiking (type I LN) and 18 non-spiking (type II LN) neurons to a set of 9 odors and analyzed the firing pattern as well as calcium signals during odor stimulation for individual glomeruli. The recordings reveal, on the one hand, that odor-evoked calcium responses of type I LNs are odor-specific, but homogeneous across glomeruli and therefore highly correlated regarding the tuning curves. In contrast, odor-evoked responses of type II LNs show less correlated tuning patterns and rather specific odor-evoked calcium signals for each glomerulus. Moreover, the authors demonstrate that both LN types exhibit distinct glomerular branching patterns, with type I innervating many, but not all glomeruli, while type II LNs branch in all glomeruli.

      From these results and further experiments using pharmacological manipulation, the authors conclude that type I LNs rather play a role regarding interglomerular inhibition in form of lateral inhibition between different glomeruli, while type II LNs are involved in intraglomerular signaling by developing microcircuits in individual glomeruli.

      In my opinion the methodological approach is quite challenging and all subsequent analyses have been carried out thoroughly. The obtained data are highly relevant, but provide rather an indirect proof regarding the distinct roles of the two LN types investigated. Nevertheless, the conclusions are convincing and the study generally represents a valuable and important contribution to our understanding of the neuronal mechanisms underlying odor processing in the insect antennal lobe. I think the authors should emphasize their take-home messages and resulting conclusions even more strongly. They do a good job in explaining their results in their discussion, but need to improve and highlight the outcome and meaning of their individual experiments in their results section.

      Thank you for this positive feedback.

      References:

      Assisi, C., Stopfer, M., Bazhenov, M., 2011. Using the structure of inhibitory networks to unravel mechanisms of spatiotemporal patterning. Neuron 69, 373–386. https://doi.org/10.1016/j.neuron.2010.12.019

      Das, S., Trona, F., Khallaf, M.A., Schuh, E., Knaden, M., Hansson, B.S., Sachse, S., 2017. Electrical synapses mediate synergism between pheromone and food odors in Drosophila melanogaster. Proc Natl Acad Sci U S A 114, E9962–E9971. https://doi.org/10.1073/pnas.1712706114

      Fujiwara, T., Kazawa, T., Haupt, S.S., Kanzaki, R., 2014. Postsynaptic odorant concentration dependent inhibition controls temporal properties of spike responses of projection neurons in the moth antennal lobe. PLOS ONE 9, e89132. https://doi.org/10.1371/journal.pone.0089132

      Fusca, D., Husch, A., Baumann, A., Kloppenburg, P., 2013. Choline acetyltransferase-like immunoreactivity in a physiologically distinct subtype of olfactory nonspiking local interneurons in the cockroach (Periplaneta americana). J Comp Neurol 521, 3556–3569. https://doi.org/10.1002/cne.23371

      Fuscà, D., Kloppenburg, P., 2021. Odor processing in the cockroach antennal lobe - the network components. Cell Tissue Res.

      Hong, E.J., Wilson, R.I., 2015. Simultaneous encoding of odors by channels with diverse sensitivity to inhibition. Neuron 85, 573–589. https://doi.org/10.1016/j.neuron.2014.12.040

      Husch, A., Paehler, M., Fusca, D., Paeger, L., Kloppenburg, P., 2009a. Calcium current diversity in physiologically different local interneuron types of the antennal lobe. J Neurosci 29, 716–726. https://doi.org/10.1523/JNEUROSCI.3677-08.2009

      Husch, A., Paehler, M., Fusca, D., Paeger, L., Kloppenburg, P., 2009b. Distinct electrophysiological properties in subtypes of nonspiking olfactory local interneurons correlate with their cell type-specific Ca2+ current profiles. J Neurophysiol 102, 2834–2845. https://doi.org/10.1152/jn.00627.2009

      Nagel, K.I., Wilson, R.I., 2016. Mechanisms Underlying Population Response Dynamics in Inhibitory Interneurons of the Drosophila Antennal Lobe. J Neurosci 36, 4325–4338. https://doi.org/10.1523/JNEUROSCI.3887-15.2016

      Neupert, S., Fusca, D., Kloppenburg, P., Predel, R., 2018. Analysis of single neurons by perforated patch clamp recordings and MALDI-TOF mass spectrometry. ACS Chem Neurosci 9, 2089–2096.

      Olsen, S.R., Bhandawat, V., Wilson, R.I., 2007. Excitatory interactions between olfactory processing channels in the Drosophila antennal lobe. Neuron 54, 89–103. https://doi.org/10.1016/j.neuron.2007.03.010

      Olsen, S.R., Wilson, R.I., 2008. Lateral presynaptic inhibition mediates gain control in an olfactory circuit. Nature 452, 956–960. https://doi.org/10.1038/nature06864

      Sachse, S., Galizia, C., 2002. Role of inhibition for temporal and spatial odor representation in olfactory output neurons: a calcium imaging study. J Neurophysiol. 87, 1106–17.

      Shang, Y., Claridge-Chang, A., Sjulson, L., Pypaert, M., Miesenbock, G., 2007. Excitatory Local Circuits and Their Implications for Olfactory Processing in the Fly Antennal Lobe. Cell 128, 601–612.

      Wilson, R.I., 2013. Early olfactory processing in Drosophila: mechanisms and principles. Annu Rev Neurosci 36, 217–241. https://doi.org/10.1146/annurev-neuro-062111-150533

      Yaksi, E., Wilson, R.I., 2010. Electrical coupling between olfactory glomeruli. Neuron 67, 1034–1047. https://doi.org/10.1016/j.neuron.2010.08.041

    1. Author Response

      Reviewer #1 (Public Review):

      It is now widely accepted that the age of the brain can differ from the person's chronological age and neuroimaging methods are ideally suited to analyze the brain age and associated biomarkers. Preclinical studies of rodent models with appropriate neuroimaging do attest that lifestyle-related prevention approaches may help to slow down brain aging and the potential of BrainAGE as a predictor of age-related health outcomes. However, there is a paucity of data on this in humans. It is in this context the present manuscript receives its due attention.

      Comments:

      1) Lifestyle intervention benefits need to be analyzed using robust biomarkers which should be profiled non-invasively in a clinical setting. There is increasing evidence of the role of telomere length in brain aging. Gampawar et al (2020) have proposed a hypothesis on the effect of telomeres on brain structure and function over the life span and named it the "Telomere Brain Axis". In this context, if the authors could measure telomere length before and after lifestyle intervention, this will give a strong biomarker utility and value addition for the lifestyle modification benefits. 2) Authors should also consider measuring BDNF levels before and after lifestyle intervention.

      Response to comments 1+2: we agree that associating both telomere length and BDNF level with brain age would be interesting and relevant. However, we did not measure these two variables. We would certainly consider adding these in future work. Regarding telomere length, we now include a short discussion of brain age in relation to other bodily ages, such as telomere length (Discussion section):

      “Studying changes in functional brain aging is part of a broader field that examines changes in various biological ages, such as telomere length1, DNA methylation2, and arterial stiffness3. Evaluating changes in these bodily systems over time allows us to capture health and lifestyle-related factors that affect overall aging and may guide the development of targeted interventions to reduce age-related decline. For example, in the CENTRAL cohort, we recently reported that reducing body weight and intrahepatic fat following a lifestyle intervention was related to methylation age attenuation4. In the current work, we used RSFC for brain age estimation, which resulted in a MAE of ~8 years, which was larger than the intervention period. Nevertheless, we found that brain age attenuation was associated with changes in multiple health factors. The precision of an age prediction model based on RSFC is typically lower than a model based on structural brain imaging5. However, a higher model precision may result in a lower sensitivity to detect clinical effects6,7. Better tools for data harmonization among datasets6 and larger training sample size5 may improve the accuracy of such models in the future. We also suggest that examining the dynamics of multiple bodily ages and their interactions would enhance our understanding of the complex aging process8,9.”

      And

      “These findings complement the growing interest in bodily aging indicated, for example, by DNA methylation4 as health biomarkers and interventions that may affect them.”

      Reviewer #2 (Public Review):

      In this study, Levakov et al. investigated brain age based on resting-state functional connectivity (RSFC) in a group of obese participants following an 18-month lifestyle intervention. The study benefits from various sophisticated measurements of overall health, including body MRI and blood biomarkers. Although the data is leveraged from a solid randomized control set-up, the lack of control groups in the current study means that the results cannot be attributed to the lifestyle intervention with certainty. However, the study does show a relationship between general weight loss and RSFC-based brain age estimations over the course of the intervention. While this may represent an important contribution to the literature, the RSFC-based brain age prediction shows low model performance, making it difficult to interpret the validity of the derived estimates and the scale of change. The study would benefit from more rigorous analyses and a more critical discussion of findings. If incorporated, the study contributes to the growing field of literature indicating that weight-reduction in obese subjects may attenuate the detrimental effect of obesity on the brain.

      The following points may be addressed to improve the study:

      Brain age / model performance:

      1) Figure 2: In the test set, the correlation between true and predicted age is 0.244. The fitted slope looks like it would be approximately 0.11 (55-50)/(80-35); change in y divided by change in x. This means that for a chronological age change of 12 months, the brain age changes by 0.11*12 = 1.3 months. I.e., due to the relatively poor model performance, an 80-year-old participant in the plot (fig 2) has a predicted age of ~55. Hence, although the age prediction step can generate a summary score for all the RSFC data, it can be difficult to interpret the meaning of these brain age estimates and the 'expected change' since the scale is in years.

      2) In Figure 2 it could also help to add the x = y line to get a better overview of the prediction variance. The estimates are likely clustered around the mean/median age of the training dataset, and age is overestimated in younger subs and underestimated in older subs (usually referred to as "age bias"). It is important to inspect the data points here to understand what the estimates represent, i.e., is variation in RSFC potentially lost by wrapping the data in this summary measure, since the age prediction is not particularly accurate, and should age bias in the predictions be accounted for by adjusting the test data for the bias observed in the training data?

      Response to comment 1+2: we agree with the reviewer that due to the relatively moderate correlation between the predicted and observed age, a large change in the observed age corresponds to a small change in the predicted age. We now state this limitation in Results section 2.1:

      “Despite being significant and reproducible, we note that the correlations between the observed and predicted age were relatively moderate.”

      And discuss this point in the Discussion section:

      “In the current work, we used RSFC for brain age estimation, which resulted in a MAE of ~8 years, which was larger than the intervention period. Nevertheless, we found that brain age attenuation was associated with changes in multiple health factors. The precision of an age prediction model based on RSFC is typically lower than a model based on structural brain imaging5. However, a higher model precision may result in a lower sensitivity to detect clinical effects6,7. Better tools for data harmonization among datasets6 and larger training sample size5 may improve the accuracy of such models in the future.”

      Moreover, we now add the x=y line to Fig. 2, so the readers can better assess the prediction variance, as suggested by the reviewer.

      We prefer to avoid using different scales (years/months) on the x and y axes to avoid misleading the readers, but the lists of observed and predicted ages are available as SI files with a precision of two decimal points (~3 days).

      We note that despite the moderate precision accuracy, we replicated these results in three separate cohorts.
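      The reviewer's slope arithmetic can be reproduced on simulated data. The sketch below is purely illustrative (the simulated ages, noise level, and the ~0.11 slope are synthetic values chosen to mimic the figure, not the study's actual data):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate "age bias" (regression dilution): noisy predictions regress
# toward the training-set mean, so the fitted slope of predicted age
# against true age is well below 1.
true_age = rng.uniform(35, 80, 300)
predicted = 50 + 0.11 * (true_age - true_age.mean()) + rng.normal(0, 8, 300)

slope, intercept = np.polyfit(true_age, predicted, 1)
print(f"fitted slope = {slope:.2f}")

# With such a slope, 12 months of chronological aging translate into
# only slope * 12 months of predicted-age change, as the reviewer notes.
print(f"predicted-age change per 12 months: {slope * 12:.1f} months")
```

      This illustrates why predictions cluster around the training mean and why a raw predicted-age difference understates chronological change — the motivation for the bias-corrected attenuation measure discussed next.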

      Regarding the effect of “age bias” (also known as “regression attenuation” or “regression dilution”10), we are aware of this phenomenon and agree that it must be accounted for. In fact, the “age bias” is one of the reasons we chose to use the difference between the expected and observed ages as the primary outcome of the study, as this measure already takes this bias into account. To demonstrate this effect, we now compute brain age attenuation in two ways: 1. as described and used in the current study (Methods 4.9); and 2. by regressing out the effect of age on the predicted brain age at both times separately, then subtracting the adjusted predicted age at T18 from the adjusted predicted age at T0. The second method is the standard method to account for age bias as described in a previous work11. Below is a scatter plot of both measures across all participants:

      The x-axis represents the first method, used in the current study, and the y-axis represents the second method, described in Smith et al., (2019). Across all subjects, we found a nearly perfect 1:1 correspondence between the two methods (r=.998, p<0.001; MAE=0.45), as the two are mathematically identical. The small gap between the two is because the brain age attenuation model also takes into account the difference in the exact time that passed between the two scans for each participant (mean=21.36m, std = 1.68m).

      We now note this in Methods section 4.9:

      “We note that the result of computing the difference between the bias-corrected brain age gap at both times was nearly identical to the brain age attenuation measure (r=.99, p<0.001; MAE=0.45). The difference between the two is because the brain age attenuation model takes into account the difference in the exact time that passed between the two scans for each participant (mean=21.36m, std = 1.68m).”
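      The near-equivalence of the two measures can be made concrete with a synthetic sketch. All numbers, variable names, and the exact form of the attenuation score below are illustrative assumptions — not the Methods 4.9 model itself, which additionally handles each subject's exact scan interval:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Synthetic two-timepoint data: biased age predictions at baseline (T0)
# and follow-up (T18), with per-subject scan intervals around 21.4 months.
age_t0 = rng.uniform(35, 65, n)
gap_months = rng.normal(21.4, 1.7, n)
age_t18 = age_t0 + gap_months / 12
pred_t0 = 50 + 0.11 * (age_t0 - 50) + rng.normal(0, 8, n)
pred_t18 = pred_t0 + rng.normal(0.2, 2.0, n)   # change in predicted age

def residualize(pred, age):
    """Bias correction: regress chronological age out of predicted age."""
    slope, intercept = np.polyfit(age, pred, 1)
    return pred - (slope * age + intercept)

# Method 2 (Smith et al.-style): difference of bias-corrected predictions.
delta_corrected = residualize(pred_t18, age_t18) - residualize(pred_t0, age_t0)

# Method 1 (sketch): change in predicted age expected over each subject's
# interval from the baseline age-bias fit, minus the observed change.
fit_t0 = np.polyfit(age_t0, pred_t0, 1)
expected_change = np.polyval(fit_t0, age_t18) - np.polyval(fit_t0, age_t0)
attenuation = expected_change - (pred_t18 - pred_t0)

# The two measures are almost perfectly linearly related (up to sign).
r = np.corrcoef(attenuation, -delta_corrected)[0, 1]
print(f"correlation between the two measures: r = {r:.3f}")
```

      The small residual discrepancy in such a sketch comes from fitting the bias slope separately at each timepoint, mirroring the authors' observation that the two measures differ only through per-subject scan intervals.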

      3) In Figure 3, some of the changes observed between time points are very large. For example, one subject with a chronological age of 62 shows a ten-year increase in brain age over 18 months. This change is twice as large as the full range of age variation in the brain age estimates (average brain age increases from 50 to 55 across the full chronological age span). This makes it difficult to interpret RSFC change in units of brain age. E.g., is it reasonable that a person's brain ages by ten years, either up or down, in 18 months? The colour scale goes from -12 years to 14 years, so some of the observed changes are 14 / 1.5 = 9 times larger than the actual time from baseline to follow-up.

      We agree that our model precision was relatively low, especially compared to the period of the intervention, as also stated by reviewer #1. We now discuss this issue in light of the studies pointed out by the reviewer (Discussion section):

      “In the current work, we used RSFC for brain age estimation, which resulted in a MAE of ~8 years, which was larger than the intervention period. Nevertheless, we found that brain age attenuation was associated with changes in multiple health factors. The precision of an age prediction model based on RSFC is typically lower than a model based on structural brain imaging5. However, a higher model precision may result in a lower sensitivity to detect clinical effects6,7. Better tools for data harmonization among datasets6 and larger training sample size5 may improve the accuracy of such models in the future.”

      Again, we note that despite the moderate precision accuracy, we replicated these results in three separate cohorts and found that both the correlation and the MAE between the predicted and observed age were significant in all of them.

      RSFC for age prediction:

      1) Several studies show better age prediction accuracy with structural MRI features compared to RSFC. If the focus of the study is to use an accurate estimate of brain ageing rather than specifically looking at changes in RSFC, adding structural MRI data could be helpful.

      We focused on brain structural changes in a previous work, and the focus of the current work was assessing age-related functional connectivity alterations. We now added a few sentences in the Introduction section that would hopefully better motivate our choice:

      “We previously found that weight loss, glycemic control, lowering of blood pressure, and increment in polyphenols-rich food were associated with an attenuation in brain atrophy12. Obesity is also manifested in age-related changes in the brain’s functional organization as assessed with resting-state functional connectivity (RSFC). These changes are dynamic13 and can be observed on short time scales14 and are thus of relevance when studying lifestyle interventions.”

      2) If changes in RSFC are the main focus, using brain age adds a complicated layer that is not necessarily helpful. It could be easier to simply assess RSFC change from baseline to follow up, and correlate potential changes with changes in e.g., BMI.

      We are specifically interested in age-related changes as we described a priori in the registration of the study: https://clinicaltrials.gov/ct2/show/NCT03020186

      Moreover, age-related changes in RSFC are complex, multivariate and dependent upon the choice of theoretical network measures. We think that a data-driven brain age prediction approach might better capture these multifaceted changes and their relation to aging. We now state this in the Introduction section:

      “Studies have linked obesity with decreased connectivity within the default mode network15,16 and increased connectivity with the lateral orbitofrontal cortex17, which are also seen in normal aging18,19. Longitudinal trials have reported changes in these connectivity patterns following weight reduction20,21, indicating that they can be altered. However, findings regarding functional changes are less consistent than those related to anatomical changes due to the multiple measures22 and scales23 used to quantify RSFC. Hence, focusing on a single measure, the functional brain age, may better capture these complex, multivariate changes and their relation to aging.”

      The lack of control groups

      1) If no control group data is available, it is important to clarify this in the manuscript, and evaluate which conclusions can and cannot be drawn based on the data and study design.

      We agree that this point should be made more clear, and we now state this in the limitation section of the Discussion:

      “We also note that the lack of a no-intervention control group limits our ability to directly relate our findings to the intervention. Hence, we can only relate brain age attenuation to the observed changes in health biomarkers.”

      Also, following reviewers’ #2 and #3 comments, we refer to the weight loss following 18 months of lifestyle intervention instead of to the intervention itself. This is now made clear in the title, abstract, and the main text.

      Reviewer #3 (Public Review):

      The authors report on an interesting study that addresses the effects of a physical and dietary intervention on accelerated/decelerated brain ageing in obese individuals. More specifically, the authors examined potential associations between reductions in Body-Mass-Index (BMI) and a decrease in relative brain-predicted age after an 18-months period in N = 102 individuals. Brain age models were based on resting-state functional connectivity data. In addition to change in BMI, the authors also tested for associations between change in relative brain age and change in waist circumference, six liver markers, three glycemic markers, four lipid markers, and four MRI fat deposition measures. Moreover, change in self-reported consumption of food, stratified by categories such as 'processed food' and 'sweets and beverages', was tested for an association with change in relative brain age. Their analysis revealed no evidence for a general reduction in relative brain age in the tested sample. However, changes in BMI, as well as changes in several liver, glycemic, lipid, and fat-deposition markers showed significant covariation with changes in relative brain age. Three markers remained significant after additionally controlling for BMI, indicating an incremental contribution of these markers to change in relative brain age. Further associations were found for variables of subjective food consumption. The authors conclude that lifestyle interventions may have beneficial effects on brain aging.

      Overall, the writing is concise and straightforward, and the language and style are appropriate. A strength of the study is the longitudinal design that allows for addressing individual accelerations or decelerations in brain aging. Research on biological aging parameters has often been limited to cross-sectional analyses so inferences about intra-individual variation have frequently been drawn from inter-individual variation. The presented study allows, in fact, investigating within-person differences. Moreover, I very much appreciate that the authors seek to publish their code and materials online, although the respective GitHub project page did not appear to be set to 'public' at the time (error 404). Another strength of the study is that brain age models have been trained and validated in external samples. One further strength of this study is that it is based on a registered trial, which allows for the evaluation of the aims and motivation of the investigators and provides further insights into the primary and secondary outcome measures (see the clinical trial identification code).

      One weakness of the study is that no comparison between the active control group and the two experimental groups has been carried out, which would have enabled causal inferences on the potential effects of different types of interventions on changes in relative brain age. In this regard, it should also be noted that all groups underwent a lifestyle intervention. Hence, from an experimenter's perspective, it is problematic to conclude that lifestyle interventions may modulate brain age, given the lack of a control group without lifestyle intervention. This issue is fueled by the study title, which suggests a strong focus on the effects of lifestyle intervention. Technically, however, this study rather constitutes an investigation of the effects of successful weight loss/body fat reduction on brain age among participants who have taken part in a lifestyle intervention. In keeping with this, the provided information on the main effect of time on brain age is scarce, essentially limited to a sign test comparing the proportions of participants with an increase vs. decrease in relative brain age. Interestingly, this analysis did not suggest that the proportion of participants who benefit from the intervention (regarding brain age) significantly exceeds the number of participants who do not benefit. So strictly speaking, the data rather indicates that it's not the lifestyle intervention per se that contributes to changes in brain age, but successful weight loss/body fat reduction. In sum, I feel that the authors' claims on the effects of the intervention cannot be underscored very well given the lack of a control group without lifestyle intervention.

      We agree that this point, also raised by reviewer #2, should be made clear, and we now state this in the limitation section of the Discussion:

      “We also note that the lack of a no-intervention control group limits our ability to directly relate our findings to the intervention. Hence, we can only relate brain age attenuation to the observed changes in health biomarkers.”

      Also, following reviewers #2 and #3, we refer to the weight loss following 18 months of lifestyle intervention instead of to the intervention itself. This is now explicitly mentioned in the title, abstract, and within the text:

      Title: “The effect of weight loss following 18 months of lifestyle intervention on brain age assessed with resting-state functional connectivity”

      Abstract: “…, we tested the effect of weight loss following 18 months of lifestyle intervention on predicted brain age, based on MRI-assessed resting-state functional connectivity (RSFC).”

      Another major weakness is that no rationale is provided for why the authors use functional connectivity data instead of structural scans for their age estimation models. This becomes even more evident in view of the relatively low prediction accuracies achieved in both the validation and test sets. My notion of the literature is that the vast majority of studies in this field involve brain age models that were trained on structural MRI data, and these models have achieved considerably higher prediction accuracies. Along with the missing rationale, I feel that the low model performances require some more elaboration in the discussion section. To be clear, low prediction accuracies may be seen as a study result and, as such, they should not be considered as a quality criterion of the study. Nevertheless, the choice of functional MRI data and the relevance of the achieved model performances for subsequent association analysis needs to be addressed more thoroughly.

      We agree that age estimation from structural compared to functional imaging yields a higher prediction accuracy. In a previous publication using the same dataset12, we demonstrated that weight loss was associated with an attenuation in brain atrophy, as we describe in the introduction:

      “We previously found that weight loss, glycemic control and lowering of blood pressure, as well as increment in polyphenols rich food, were associated with an attenuation in brain atrophy 12.”

      Here we were specifically interested in age-related functional alterations that are associated with successful weight reduction. Compared to structural brain changes, the effect of aging on functional connectivity is more complex and multifaceted. Hence, we decided to utilize a data-driven or prediction-driven approach for assessing age-related changes in functional connectivity by predicting participants’ functional brain age. We now describe this rationale in the introduction section:

      “Studies have linked obesity with decreased connectivity within the default mode network15,16 and increased connectivity with the lateral orbitofrontal cortex17, which are also seen in normal aging18,19. Longitudinal trials have reported changes in these connectivity patterns following weight reduction20,21, indicating that they can be altered. However, findings regarding functional changes are less consistent than those related to anatomical changes due to the multiple measures22 and scales23 used to quantify RSFC. Hence, focusing on a single measure, the functional brain age, may better capture these complex changes and their relation to aging.”

      We address the point regarding the low model performance in response to reviewer #2, comment #2.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors start the study with an interesting clinical observation, found in a small subset of prostate cancers: FOXP2-CPED1 fusion. They describe how this fusion results in enhanced FOXP2 protein levels, and further describe how FOXP2 increases anchorage-independent growth in vitro, and results in pre-malignant lesions in vivo. Intrinsically, this is an interesting observation. However, the mechanistic insights are relatively limited as it stands, and the main issues are described below.

      Main issues:

      1) While the study starts off with the FOXP2 fusion, the vast majority of the paper is actually about enhanced FOXP2 expression in tumorigenesis. Wouldn't it be more logical to remove the FOXP2 fusion data? These data seem quite interesting and novel but they are underdeveloped within the current manuscript design, which is a shame for such an exciting novel finding. Along the same lines, for a study that centres on the prostate lineage, it's not clear why the oncogenic potential of FOXP2 in mouse 3T3 fibroblasts was tested.

      We thank the reviewer very much for the comment. We followed the suggestion and added a set of data regarding the newly identified FOXP2 fusion in Figure 1 to make our manuscript more informative. We tested the oncogenic potential of FOXP2 in NIH3T3 fibroblasts because NIH3T3 cells are a widely used model to demonstrate the presence of transformed oncogenes2,3. In our study, we observed that when NIH3T3 cells acquired the exogenous FOXP2 gene, the cells lost the characteristic contact inhibition response, continued to proliferate and eventually formed clonal colonies. Please refer to "Answer to Essential Revisions #1 from the Editors” for details.

      2) While the FOXP2 data are compelling and convincing, it is not clear yet whether this effect is specific, or if FOXP2 is e.g. universally relevant for cell viability. Targeting FOXP2 by siRNA/shRNA in a non-transformed cell line would address this issue.

      We appreciate these helpful comments. Please refer to the "Answer to Essential Revisions #1 from the Editors” for details.

      3) Unfortunately, not a single chemical inhibitor is truly 100% specific. Therefore, the Foretinib and MK2206 experiments should be confirmed using shRNAs/KOs targeting MEK and AKT. With the inclusion of such data, the authors would make a very compelling argument that indeed MEK/AKT signalling is driving the phenotype.

      We thank the reviewer for highlighting this point and we agree with the reviewer’s point that no chemical inhibitor is 100% specific. In this study, we used chemical inhibitors to provide further supportive data indicating that FOXP2 confers oncogenic effects by activating MET signaling. We characterized a FOXP2-binding fragment located in MET and HGF in LNCaP prostate cancer cells by utilizing the CUT&Tag method. We also found that MET restoration partially reversed oncogenic phenotypes in FOXP2-KD prostate cancer cells. All these data consistently supported that FOXP2 activates MET signaling in prostate cancer. Please refer to the "Answer to Essential Revisions #2 from the Editors” and to the "Answer to Essential Revisions #7 from the Editors” for details.

      4) With the FOXP2-CPED1 fusion being more stable as compared to wild-type transcripts, wouldn't one expect the fusion to have a more severe phenotype? This is a very exciting aspect of the start of the study, but it is not explored further in the manuscript. The authors would ideally elaborate on why the effects of the FOXP2-CPED1 fusion seem comparable to the FOXP2 wildtype, in their studies.

      We thank the reviewer very much for the comment. We had quantified the number of colonies of FOXP2- and FOXP2-CPED1-overexpressing cells, and we found that both wildtype FOXP2 and FOXP2-CPED1 had a comparable putative functional influence on the transformation of human prostate epithelial cells RWPE-1 and mouse primary fibroblasts NIH3T3 (P = 0.69, by Fisher’s exact test for RWPE-1; P = 0.23, by Fisher’s exact test for NIH3T3). We added the corresponding description to the Results section in Line 487 on Page 22 in the tracked changes version of the revised manuscript. Please refer to the "Answer to Essential Revisions #5 from the Editors” for details.
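As a hedged illustration of the comparison above, a two-sided Fisher's exact test on a 2×2 contingency table can be computed directly from hypergeometric probabilities. The colony counts below are hypothetical placeholders, not the values from our experiments, and the authors presumably used a standard statistics package rather than this sketch:

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1, n = a + c, a + b + c + d

    def p_table(x):
        # P(X = x) for the hypergeometric distribution on these margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = p_table(x)
        if p <= p_obs * (1 + 1e-12):  # tolerance for float ties
            total += p
    return total

# Hypothetical counts: rows = construct (wild-type FOXP2 vs. fusion),
# columns = colonies formed vs. not formed.
p_value = fisher_exact_2x2(8, 4, 6, 6)
```

A non-significant p-value here, as in our data, is consistent with the wild-type and fusion constructs transforming cells at comparable rates.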

      5) The authors claim that FOXP2 functions as an oncogene, but the most-severe phenotype that is observed in vivo, is PIN lesions, not tumors. While this is an exciting observation, it is not the full story of an oncogene. Can the authors justifiably claim that FOXP2 is an oncogene, based on these results?

      We appreciate the comment, and we made the corresponding revision in the revised manuscript. Please refer to the "Answer to Essential Revisions #3 from the Editors” for details.

      6) The clinical and phenotypic observations are exciting and relevant. The mechanistic insights of the study are quite limited in the current stage. How does FOXP2 give its phenotype, and result in increased MET phosphorylation? The association is there, but it is unclear how this happens.

We appreciate this valuable suggestion. In the current study, we used the CUT&Tag method to explore how FOXP2 activated MET signaling in LNCaP prostate cancer cells, and we identified potential FOXP2-binding fragments in MET and HGF. Therefore, we proposed that FOXP2 activates MET signaling in prostate cancer through its binding to MET and MET-associated genes. Please refer to the "Answer to Essential Revisions #2 from the Editors" for details.

      Reviewer #2 (Public Review):

1) The manuscript entitled "FOXP2 confers oncogenic effects in prostate cancer through activating MET signalling" by Zhu et al describes the identification of a novel FOXP2-CPED1 gene fusion in 2 out of 100 primary prostate cancers. A byproduct of this gene fusion is the increased expression of FOXP2, which has been shown to be increased in prostate cancer relative to benign tissue. These data nominated FOXP2 as a potential oncogene. Accordingly, overexpression of FOXP2 in nontransformed mouse fibroblast NIH-3T3 and human prostate RWPE-1 cells induced transforming capabilities in both cell models. Mechanistically, convincing data were provided that indicate that FOXP2 promotes the expression and/or activity of the receptor tyrosine kinase MET, which has previously been shown to have oncogenic functions in prostate cancer. Notably, the authors create a new genetically engineered mouse model in which FOXP2 is overexpressed in the prostatic luminal epithelial cells. Overexpression of FOXP2 was sufficient to promote the development of prostatic intraepithelial neoplasia (PIN), a suspected precursor to prostate adenocarcinoma, and activate MET signaling.

      Strengths:

      This study makes a convincing case for FOXP2 as 1) a promoter of prostate cancer initiation and 2) an upstream regulator of pro-cancer MET signaling. This was done using both overexpression and knockdown models in cell lines and corroborated in new genetically engineered mouse models (GEMMs) of FOXP2 or FOXP2-CPED1 overexpression in prostate luminal epithelial cells as well as publicly available clinical cohort data.

      Major strengths of the study are the demonstration that FOXP2 or FOXP2-CPED1 overexpression transforms RWPE-1 cells to now grow in soft agar (hallmark of malignant transformation) and the creation of new genetically engineered mouse models (GEMMs) of FOXP2 or FOXP2-CPED1 overexpression in prostate luminal epithelial cells. In both mouse models, FOXP2 overexpression increased the incidence of PIN lesions, which are thought to be a precursor to prostate cancer. While FOXP2 alone was not sufficient to cause prostate cancer in mice, it is acknowledged that single gene alterations causing prostate cancer in mice are rare. Future studies will undoubtedly want to cross these GEMMs with established, relatively benign models of prostate cancer such as Hi-Myc or Pb-Pten mice to see if FOXP2 accelerates cancer progression (beyond the scope of this study).

      We appreciate these positive comments from the reviewer. We agree with the suggestion from the reviewer that it is worth exploring whether FOXP2 is able to cooperate with a known disease driver to accelerate the progression of prostate cancer. Therefore, we are going to cross Pb-FOXP2 transgenic mice with Pb-Pten KO mice to assess if FOXP2 is able to accelerate malignant progression.

      2) Weaknesses: It is unclear why the authors decided to use mouse fibroblast NIH3T3 cells for their transformation studies. In this regard, it appears likely that FOXP2 could function as an oncogene across diverse cell types. Given the focus on prostate cancer, it would have been preferable to corroborate the RWPE-1 data with another prostate cell model and test FOXP2's transforming ability in RWPE-1 xenograft models. To that end, there is no direct evidence that FOXP2 can cause cancer in vivo. The GEMM data, while compelling, only shows that FOXP2 can promote PIN in mice and the lone xenograft model chosen was for fibroblast NIH-3T3 cells.

To determine the oncogenic activity of FOXP2 and the FOXP2-CPED1 fusion, we initially used mouse primary NIH3T3 fibroblasts for transformation experiments, because NIH3T3 cells are a widely used cell model to discover novel oncogenes2,3,10,11. Subsequently, we observed that overexpression of FOXP2 and its fusion variant drove RWPE-1 cells to lose the characteristic contact inhibition response, led to their anchorage-independent growth in vitro, and promoted PIN in the transgenic mice. During preparation of the revised manuscript, we tested the transformation ability of FOXP2 and FOXP2-CPED1 in RWPE-1 xenograft models. We subcutaneously injected 2 × 10⁶ RWPE-1 cells into the flanks of NOD-SCID mice. The NOD-SCID mice were divided into five groups (n = 5 mice in each group): control, FOXP2-overexpressing (two stable cell lines) and FOXP2-CPED1-overexpressing (two cell lines) groups. The experiment lasted for 4 months. We observed that no RWPE-1 cell-injected mice developed tumor masses. We propose that FOXP2 and its fusion alone are not sufficient to generate a microenvironment suitable for RWPE-1 xenograft growth. Collectively, our data suggest that FOXP2 has oncogenic potential in prostate cancer, but is not sufficient to act alone as an oncogene.

3) There is a limited mechanism of action. While the authors provide correlative data suggesting that FOXP2 could increase the expression of MET signaling components, it is not clear how FOXP2 controls MET levels. It would be of interest to search for and validate the importance of potential FOXP2 binding sites in or around MET and the genes of MET-associated proteins. At a minimum, it should be confirmed whether MET is a primary or secondary target of FOXP2. The authors should also report on what happened to the 4-gene MET signature in the FOXP2 knockdown cell models. It would be equally significant to test if overexpression of MET can rescue the anti-growth effects of FOXP2 knockdown in prostate cancer cells (positive or negative results would be informative).

We appreciate all the valuable comments. As suggested, we performed corresponding experiments; please refer to the "Answers to Essential Revisions #2 from the Editors", to the "Answer to Essential Revisions #6 from the Editors", and to the "Answer to Essential Revisions #7 from the Editors" for details.

      Reviewer #3 (Public Review):

      1) In this manuscript, the authors present data supporting FOXP2 as an oncogene in PCa. They show that FOXP2 is overexpressed in PCa patient tissue and is necessary and sufficient for PCa transformation/tumorigenesis depending on the model system. Overexpression and knock-down of FOXP2 lead to an increase/decrease in MET/PI3K/AKT transcripts and signaling and sensitizes cells to PI3K/AKT inhibition.

Key strengths of the paper include multiple endpoints and model systems, an over-expression and knock-down approach to address sufficiency and necessity, a new mouse knock-in model, analysis of primary PCa patient tumors, and benchmarking findings against publicly available data. The central discovery that FOXP2 is an oncogene in PCa will be of interest to the field. However, there are several critically unanswered questions.

      1) No data are presented for how FOXP2 regulates MET signaling. ChIP would easily address if it is direct regulation of MET and analysis of FOXP2 ChIP-seq could provide insights.

      2) Beyond the 2 fusions in the 100 PCa patient cohort it is unclear how FOXP2 is overexpressed in PCa. In the discussion and in FS5 some data are presented indicating amplification and CNAs, however, these are not directly linked to FOXP2 expression.

      3) There are some hints that full-length FOXP2 and the FOXP2-CPED1 function differently. In SF2E the size/number of colonies between full-length FOXP2 and fusion are different. If the assay was run for the same length of time, then it indicates different biologies of the overexpressed FOXP2 and FOXP2-CPED1 fusion. Additionally, in F3E the sensitization is different depending on the transgene.

We appreciate these valuable comments and constructive remarks. As suggested, we performed the CUT&Tag experiments to detect the binding of FOXP2 to MET, and to examine the association of CNAs of FOXP2 with its expression. Please refer to the "Answer to Essential Revisions #2 from the Editors" and the "Answer to Essential Revisions #4 from the Editors" for details. We also added detailed information to show the resemblance observed between FOXP2 fusion- and wild-type FOXP2-overexpressing cells. We added the corresponding description to the Results section in Line 487 on Page 22 in the tracked changes version of the revised manuscript. Please refer to the "Answer to Essential Revisions #5 from the Editors" for details.

      2) The relationship between FOXP2 and AR is not explored, which is important given 1) the critical role of the AR in PCa; and 2) the existing relationship between the AR and FOXP2 and other FOX gene members.

We thank the reviewer very much for highlighting this point. We agree that it is important to examine the relationship between FOXP2 and AR. We therefore analyzed the expression dataset of 255 primary prostate tumors from TCGA and observed that the expression of FOXP2 was significantly correlated with the expression of AR (Spearman's ρ = 0.48, P < 0.001) (Figure 1. a). Next, we observed that both FOXP2- and FOXP2-CPED1-overexpressing 293T cells had a higher AR protein abundance than control cells (Figure 1. b). In addition, shRNA-mediated FOXP2 knockdown in LNCaP cells resulted in a decreased AR protein level compared to that in control cells (Figure 1. c). However, we analyzed our CUT&Tag data and observed no binding of FOXP2 to AR (Figure 1. d). Our data suggest that FOXP2 might be associated with AR expression.

Figure 1. a. AR expression in a human prostate cancer dataset (TCGA, Prostate Adenocarcinoma, Provisional; n = 493) classified by FOXP2 expression level (bottom 25%, low expression, n = 120; top 25%, high expression, n = 120; negative expression, n = 15). P values were calculated by the Mann-Whitney U test. The correlation between FOXP2 and AR expression was evaluated by determining the Spearman's rank correlation coefficient. b. Immunoblot analysis of the expression levels of AR in 293T cells with overexpression of FOXP2 or FOXP2-CPED1. c. Immunoblot analysis of the expression levels of AR in LNCaP cells with stable expression of the scrambled vector or FOXP2 shRNA. d. CUT&Tag analysis of FOXP2 association with the promoter of AR. Representative track of FOXP2 at the AR gene locus is shown.
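For readers unfamiliar with the rank-based statistic reported above, Spearman's ρ can be sketched from first principles as the Pearson correlation of midranks. This is a minimal illustration only; the expression values below are hypothetical placeholders, not the TCGA data, and the actual analysis was presumably run with a standard statistics package:

```python
def _ranks(values):
    """Assign 1-based average (midrank) ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the block of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical FOXP2 and AR expression values for five samples.
fox = [2.1, 3.5, 1.0, 4.2, 5.0]
ar = [1.8, 1.2, 0.9, 3.0, 3.6]
rho = spearman_rho(fox, ar)
```

Because only ranks enter the statistic, it captures any monotonic association between FOXP2 and AR expression, not just a linear one.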

References

1. Mayr C, Bartel DP. Widespread shortening of 3'UTRs by alternative cleavage and polyadenylation activates oncogenes in cancer cells. Cell. 2009 Aug 21;138(4):673-84.
2. Gara SK, Jia L, Merino MJ, Agarwal SK, Zhang L, Cam M et al. Germline HABP2 Mutation Causing Familial Nonmedullary Thyroid Cancer. N Engl J Med. 2015 Jul 30;373(5):448-55.
3. Kohno T, Ichikawa H, Totoki Y, Yasuda K, Hiramoto M, Nammo T et al. KIF5B-RET fusions in lung adenocarcinoma. Nat Med. 2012 Feb 12;18(3):375-7.
4. Chen F, Byrd AL, Liu J, Flight RM, DuCote TJ, Naughton KJ et al. Polycomb deficiency drives a FOXP2-high aggressive state targetable by epigenetic inhibitors. Nat Commun. 2023 Jan 20;14(1):336.
5. Kaya-Okur HS, Wu SJ, Codomo CA, Pledger ES, Bryson TD, Henikoff JG et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat Commun. 2019 Apr 29;10(1):1930.
6. Spiteri E, Konopka G, Coppola G, Bomar J, Oldham M, Ou J et al. Identification of the transcriptional targets of FOXP2, a gene linked to speech and language, in developing human brain. Am J Hum Genet. 2007 Dec;81(6):1144-57.
7. Lai CS, Fisher SE, Hurst JA, Vargha-Khadem F, Monaco AP. A forkhead-domain gene is mutated in a severe speech and language disorder. Nature. 2001 Oct 4;413(6855):519-23.
8. Hannenhalli S, Kaestner KH. The evolution of Fox genes and their role in development and disease. Nat Rev Genet. 2009 Apr;10(4):233-40.
9. Shu W, Yang H, Zhang L, Lu MM, Morrisey EE. Characterization of a new subfamily of winged-helix/forkhead (Fox) genes that are expressed in the lung and act as transcriptional repressors. J Biol Chem. 2001 Jul 20;276(29):27488-97.
10. Wang C, Liu H, Qiu Q, Zhang Z, Gu Y, He Z. TCRP1 promotes NIH/3T3 cell transformation by over-activating PDK1 and AKT1. Oncogenesis. 2017 Apr 24;6(4):e323.
11. Suh YA, Arnold RS, Lassegue B, Shi J, Xu X, Sorescu D et al. Cell transformation by the superoxide-generating oxidase Mox1. Nature. 1999 Sep 2;401(6748):79-82.
    1. Author Response

      Reviewer #1 (Public Review):

This manuscript seeks to identify the mechanism underlying priority effects in a plant-microbe-pollinator model system and to explore its evolutionary and functional consequences. The manuscript first documents alternative community states in the wild: flowers tend to be strongly dominated by either bacteria or yeast but not both. Then lab experiments are used to show that bacteria lower the nectar pH, which inhibits yeast, thereby identifying a mechanism for the observed priority effect. The authors then perform an experimental evolution experiment which shows that yeast can evolve tolerance to a lower pH. Finally, the authors show that low-pH nectar reduces pollinator consumption, suggesting a functional impact on the plant-pollinator system. Together, these multiple lines of evidence build a strong case that pH has far-reaching effects on the microbial community and beyond.

      The paper is notable for the diverse approaches taken, including field observations, lab microbial competition and evolution experiments, genome resequencing of evolved strains, and field experiments with artificial flowers and nectar. This breadth can sometimes seem a bit overwhelming. The model system has been well developed by this group and is simple enough to dissect but also relevant and realistic. Whether the mechanism and interactions observed in this system can be extrapolated to other systems remains to be seen. The experimental design is generally sound. In terms of methods, the abundance of bacteria and yeast is measured using colony counts, and given that most microbes are uncultivable, it is important to show that these colony counts reflect true cell abundance in the nectar.

      We have revised the text to address the relationship between cell counts and colony counts with nectar microbes. Specifically, we point out that our previous work (Peay et al. 2012) established a close correlation between CFUs and cell densities (r2 = 0.76) for six species of nectar yeasts isolated from D. aurantiacus nectar at Jasper Ridge, including M. reukaufii.

      As for A. nectaris, we used a flow cytometric sorting technique to examine the relationship between cell density and CFU (figure supplement 1). This result should be viewed as preliminary given the low level of replication, but this relationship also appears to be linear, as shown below, indicating that colony counts likely reflect true cell abundance of this species in nectar.

      It remains uncertain how closely CFU reflects total cell abundance of the entire bacterial and fungal community in nectar. However, a close association is possible and may be even likely given the data above, showing a close correlation between CFU and total cell count for several yeast species and A. nectaris, which are indicated by our data to be dominant species in nectar.

      We have added the above points in the manuscript (lines 263-264, 938-932).

      The genome resequencing to identify pH-driven mutations is, in my mind, the least connected and developed part of the manuscript, and could be removed to sharpen and shorten the manuscript.

      We appreciate this perspective. However, given the disagreement between this perspective and reviewer 2’s, which asks for a more expanded section, we have decided to add a few additional lines (lines 628-637), briefly expanding on the genomic differences between strains evolved in bacteria-conditioned nectar and those evolved in low-pH nectar.

      Overall, I think the authors achieve their aims of identifying a mechanism (pH) for the priority effect of early-colonizing bacteria on later-arriving yeast. The evolution and pollinator experiments show that pH has the potential for broader effects too. It is surprising that the authors do not discuss the inverse priority effect of early-arriving yeast on later-arriving bacteria, beyond a supplemental figure. Understandably this part of the story may warrant a separate manuscript.

      We would like to point out that, in our original manuscript, we did discuss the inverse priority effects, referring to relevant findings that we previously reported (Tucker and Fukami 2014, Dhami et al. 2016 and 2018, Vannette and Fukami 2018). Specifically, we wrote that: “when yeast arrive first to nectar, they deplete nutrients such as amino acids and limit subsequent bacterial growth, thereby avoiding pH-driven suppression that would happen if bacteria were initially more abundant (Tucker and Fukami 2014; Vannette and Fukami 2018)” (lines 385-388). However, we now realize that this brief mention of the inverse priority effects was not sufficiently linked to our motivation for focusing mainly on the priority effects of bacteria on yeast in the present paper. Accordingly, we added the following sentences: “Since our previous papers sought to elucidate priority effects of early-arriving yeast, here we focus primarily on the other side of the priority effects, where initial dominance of bacteria inhibits yeast growth.” (lines 398-401).

      I anticipate this paper will have a significant impact because it is a nice model for how one might identify and validate a mechanism for community-level interactions. I suspect it will be cited as a rare example of the mechanistic basis of priority effects, even across many systems (not just pollinator-microbe systems). It illustrates nicely a more general ecological phenomenon and is presented in a way that is accessible to a broader audience.

      Thank you for this positive assessment.

      Reviewer #2 (Public Review):

The manuscript "pH as an eco-evolutionary driver of priority effects" by Chappell et al illustrates how a single driver (microbe-induced pH change) can affect multiple levels of species interactions, including microbial community structure, microbial evolutionary change, and hummingbird nectar consumption (potentially influencing both microbial dispersal and plant reproduction). It is an elegant study with different interacting parts: from laboratory to field experiments addressing mechanism, condition, evolution, and functional consequences. It will likely be of interest to a wide audience and has implications for microbial, plant, and animal ecology and evolution.

      This is a well-written manuscript, with generally clear and informative figures. It represents a large body and variety of work that is novel and relevant (all major strengths).

      We appreciate this positive assessment.

      Overall, the authors' claims and conclusions are justified by the data. There are a few things that could be addressed in more detail in the manuscript. The most important weakness in terms of lack of information/discussion is that it looks like there are just as many or more genomic differences between the bacterial-conditioned evolved strains and the low-pH evolved strains than there are between these and the normal nectar media evolved strains. I don't think this negates the main conclusion that pH is the primary driver of priority effects in this system, but it does open the question of what you are missing when you focus only on pH. I would like to see a discussion of the differences between bacteria-conditioned vs. low-pH evolved strains.

      We agree with the reviewer and have included an expanded discussion in the revised manuscript [lines 628-637]. Specifically, to show overall genomic variation between treatments, we calculated genome-wide Fst comparing the various nectar conditions. We found that Fst was 0.0013, 0.0014, and 0.0015 for the low-pH vs. normal, low pH vs. bacteria-conditioned, and bacteria-conditioned vs. normal comparisons, respectively. The similarity between all treatments suggests that the differences between bacteria-conditioned and low pH are comparable to each treatment compared to normal. This result highlights that, although our phenotypic data suggest alterations to pH as the most important factor for this priority effect, it still may be one of many affecting the coevolutionary dynamics of wild yeast in the microbial communities they are part of. In the full community context in which these microbes grow in the field, multi-species interactions, environmental microclimates, etc. likely also play a role in rapid adaptation of these microbes which was not investigated in the current study.

      Based on this overall picture, we have included additional discussion focusing on the effect of pH on evolution of stronger resistance to priority effects. We compared genomic differences between bacteria-conditioned and low-pH evolved strains, drawing the reader’s attention to specific differences in source data 14-15. Loci that varied between the low pH and bacteria-conditioned treatments occurred in genes associated with protein folding, amino acid biosynthesis, and metabolism.

      Reviewer #3 (Public Review):

      This work seeks to identify a common factor governing priority effects, including mechanism, condition, evolution, and functional consequences. It is suggested that environmental pH is the main factor that explains various aspects of priority effects across levels of biological organization. Building upon this well-studied nectar microbiome system, it is suggested that pH-mediated priority effects give rise to bacterial and yeast dominance as alternative community states. Furthermore, pH determines both the strengths and limits of priority effects through rapid evolution, with functional consequences for the host plant's reproduction. These data contribute to ongoing discussions of deterministic and stochastic drivers of community assembly processes.

      Strengths:

Provides multiple lines of field and laboratory evidence to show that pH is the main factor shaping priority effects in the nectar microbiome. Field surveys characterize the distribution of microbial communities with flowers frequently dominated by either bacteria or yeast, suggesting that inhibitory priority effects explain these patterns. Microcosm experiments showed that A. nectaris (bacteria) exerted negative inhibitory priority effects against M. reukaufii (yeast). Furthermore, high densities of bacteria were correlated with lower pH, potentially due to bacteria-induced reduction in nectar pH. Experimental evolution showed that yeast evolved in low-pH and bacteria-conditioned treatments were less affected by priority effects as compared to ancestral yeast populations. This potentially explains the variation of bacteria-dominated flowers observed in the field, as yeast rapidly evolves resistance to bacterial priority effects. Genome sequencing further reveals that phenotypic changes in low-pH and bacteria-conditioned nectar treatments corresponded to genomic variation. Lastly, a field experiment showed that low nectar pH reduced flower visitation by hummingbirds. pH not only affected microbial priority effects but also had functional consequences for host plants.

      We appreciate this positive assessment.

      Weaknesses:

      The conclusions of this paper are generally well-supported by the data, but some aspects of the experiments and analysis need to be clarified and expanded.

      The authors imply that in their field surveys flowers were frequently dominated by bacteria or yeast, but rarely together. The authors argue that the distributional patterns of bacteria and yeast are therefore indicative of alternative states. In each of the 12 sites, 96 flowers were sampled for nectar microbes. However, it's unclear to what degree the spatial proximity of flowers within each of the sampled sites biased the observed distribution patterns. Furthermore, seasonal patterns may also influence microbial distribution patterns, especially in the case of co-dominated flowers. Temperature and moisture might influence the dominance patterns of bacteria and yeast.

      We agree that these factors could potentially explain the presented results. Accordingly, we conducted spatial and seasonal analyses of the data, which we detail below and include in two new paragraphs in the manuscript [lines 290-309].

      First, to determine whether spatial proximity influenced yeast and bacterial CFUs, we regressed the geographic distance between all possible pairs of plants to the difference in bacterial or fungal abundance between the paired plants. If plant location affected microbial abundance, one should see a positive relationship between distance and the difference in microbial abundance between a given pair of plants: a pair of plants that were more distantly located from each other should be, on average, more different in microbial abundance. Contrary to this expectation, we found no significant relationship between distance and the difference in bacterial colonization (A, p=0.07, R2=0.0003) and a small negative association between distance and the difference in fungal colonization (B, p<0.05, R2=0.004). Thus, there was no obvious overall spatial pattern in whether flowers were dominated by yeast or bacteria.
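The pairwise analysis described above can be sketched as a simple ordinary least-squares regression of abundance differences on geographic distances over all plant pairs. This is a minimal illustration with hypothetical coordinates and densities, not our field data; note also that pairwise comparisons are not independent, so p-values from a naive regression of this kind should be interpreted cautiously (Mantel-style permutation tests address this):

```python
from itertools import combinations

def pairwise_distance_regression(coords, abundances):
    """OLS fit of |abundance difference| against geographic distance
    over all pairs of plants. Returns (slope, intercept); a positive
    slope would indicate that more distant plants differ more."""
    xs, ys = [], []
    for (c1, a1), (c2, a2) in combinations(zip(coords, abundances), 2):
        dist = ((c1[0] - c2[0]) ** 2 + (c1[1] - c2[1]) ** 2) ** 0.5
        xs.append(dist)
        ys.append(abs(a1 - a2))
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical plant coordinates (m) and per-plant bacterial densities.
coords = [(0, 0), (10, 0), (0, 10), (50, 50)]
dens = [100, 120, 90, 400]
slope, intercept = pairwise_distance_regression(coords, dens)
```

A slope near zero, as in our bacterial data, is what one would expect if flower dominance is not spatially structured.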

      Next, to determine whether climatic factors or seasonality affected the colonization of bacteria and yeast per plant, we used a linear mixed model predicting the average bacteria and yeast density per plant from average annual temperature, temperature seasonality, and annual precipitation at each site, the date the site was sampled, and the site location and plant as nested random effects. We found that none of these variables were significantly associated with the density of bacteria and yeast in each plant.

To look at seasonality, we also re-ordered Fig 2C, which shows the abundance of bacteria- and yeast-dominated flowers at each site, so that the sites are now listed in order of sampling dates. In this re-ordered figure, there is no obvious trend in the number of flowers dominated by yeast throughout the period sampled (6/23 to 7/9), giving additional indication that seasonality was unlikely to affect the results.

      Additionally, sampling date does not seem to strongly predict bacterial or fungal density within each flower when plotted.

      These additional analyses, now included (figure supplements 2-4) and described (lines 290-309) in the manuscript, indicate that the observed microbial distribution patterns are unlikely to have been strongly influenced by spatial proximity, temperature, moisture, or seasonality, reinforcing the possibility that the distribution patterns instead indicate bacterial and yeast dominance as alternative stable states.

      The authors exposed yeast to nectar treatments varying in pH levels. Using experimental evolution approaches, the authors determined that yeast grown in low pH nectar treatments were more resistant to priority effects by bacteria. The metric used to determine the bacteria's priority effect strength on yeast does not seem to take into account factors that limit growth, such as the environmental carrying capacity. In addition, yeast evolves in normal (pH =6) and low pH (3) nectar treatments, but it's unclear how resistance differs across a range of pH levels (ranging from low to high pH) and affects the cost of yeast resistance to bacteria priority effects. The cost of resistance may influence yeast life-history traits.

      The strength of bacterial priority effects on yeast was calculated using the metric we previously published in Vannette and Fukami (2014): PE = log(BY/(-Y)) - log(YB/(Y-)), where BY and YB represent the final yeast density when early arrival (day 0 of the experiment) was by bacteria or yeast, followed by late arrival by yeast or bacteria (day 2), respectively, and -Y and Y- represent the final density of yeast in monoculture when they were introduced late or early, respectively. This metric does not incorporate carrying capacity. However, it does compare how each microbial species grows alone, relative to growth before or after a competitor. In this way, our metric compares environmental differences between treatments while also taking into account growth differences between strains.
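As a concrete sketch, the published metric can be written as a small function. The variable names mirror the notation above; the example densities are hypothetical, not measurements from our experiments:

```python
import math

def priority_effect(by, yb, y_late, y_early):
    """Priority-effect strength of bacteria on yeast,
    PE = log(BY / -Y) - log(YB / Y-), where
    by      = final yeast density when bacteria arrived first (BY)
    yb      = final yeast density when yeast arrived first (YB)
    y_late  = final monoculture yeast density, late introduction (-Y)
    y_early = final monoculture yeast density, early introduction (Y-)
    Negative values indicate inhibitory priority effects of bacteria."""
    return math.log(by / y_late) - math.log(yb / y_early)

# Hypothetical densities (CFU/uL): early-arriving bacteria strongly
# suppress yeast relative to yeast growing alone.
pe = priority_effect(by=50, yb=900, y_late=1000, y_early=1000)
```

Because each term is normalized by the corresponding monoculture density, the metric compares arrival-order effects while accounting for how each strain grows on its own in that environment, which is why it does not need an explicit carrying-capacity term.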

      Here we also present additional growth data to address the reviewer’s point about carrying capacity. Our experiments that compared ancestral and evolved yeast were conducted over the course of two days of growth. In preliminary monoculture growth experiments of each evolved strain, we found that yeast populations did reach carrying capacity over the course of the two-day experiment and population size declined or stayed constant after three and four days of growth.

However, we found no significant difference in monoculture growth between the ancestral strains and any of the evolved strains, as shown in Figure supplement 12B. This lack of a significant difference in monoculture suggests that differences in intrinsic growth rate do not fully explain the priority effects results we present. Instead, differences in growth were specific to yeast’s response to early arrival by bacteria.

      We also appreciate the reviewer’s comment about how yeast evolves resistance across a range of pH levels, as well as the effect of pH on yeast life-history traits. In fact, reviewer #2 pointed out an interesting trade-off in life history traits between growth and resistance to priority effects that we now include in the discussion (lines 535-551) as well as a figure in the manuscript (Figure 8).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      The brain-machine interface used in this study differs from typical BMIs in that it's not intended to give subjects voluntary control over their environment. However, it is possible that rats may become aware of their ability to manipulate trial start times using their neural activity. Is there any evidence that the time required to initiate trials on high-coherence or low-coherence trials decreases with experience?

This is a great question. First, we designed the experiment to avoid this possibility. Rats were extensively trained on the sequence of the automatic maze both pre- and post-implantation (totaling weeks of pre-training and habituation). As such, the majority of the trials ever experienced by the rat were not controlled by their neural activity. During BMI experimentation, only 10% of trials were triggered during high coherence states and 10% during low coherence states, leaving ~80% of trials not controlled by their neural activity. We also implemented a pseudo-randomized trial sequence. Taken together, we specifically designed this experiment to avoid the possibility that rats would actively use their neural activity to control the maze.
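The 10/10/80 pseudo-randomized assignment described above could be generated along these lines. This is a hypothetical sketch; the function name, labels, and seed are illustrative and not taken from the study's actual trial-control software.

```python
import random

def make_trial_sequence(n_trials, seed=0):
    """Pseudo-randomized trial types: ~10% triggered by high-coherence
    states, ~10% by low-coherence states, and the remaining ~80% not
    controlled by neural activity (illustrative sketch)."""
    rng = random.Random(seed)
    n_high = round(0.10 * n_trials)
    n_low = round(0.10 * n_trials)
    trials = (["high"] * n_high
              + ["low"] * n_low
              + ["control"] * (n_trials - n_high - n_low))
    rng.shuffle(trials)  # shuffle so coherence-triggered trials are unpredictable
    return trials

seq = make_trial_sequence(40)
```

With coherence-triggered trials both rare and unpredictably interleaved, a rat has little opportunity to learn that its neural state gates trial onset.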

      Second, we had a similar question when collecting data for this manuscript and so we conducted a pilot experiment. We took 3 rats from experiment #1 (after its completion) and we required them to perform “forced-runs” over the course of 3-4 days, a task where rats navigate to a reward zone and are rewarded with a chocolate pellet. The trajectory on “forced-runs” is predetermined and rats were always rewarded for navigating along the predetermined route. Every trial was initiated by strong mPFC-hippocampal theta coherence. We were curious as to whether time-to-trial-onset would decrease if we repeatedly paired trial onset to strong mPFC-hippocampal theta coherence. 1 out of 3 rats (rat 21-35) showed a significant correlation between time-to-trial onset and trial number, indicating that our threshold for strong mPFC-hippocampal theta coherence was being met more quickly with experience (Figure R1A). When looking over sessions and rats, there was considerable variability in the magnitude of this correlation and sometimes even the direction (Figure R1B). As such, the degree to which rat 21-35 was aware of controlling the environment by reaching strong mPFC-hippocampal theta coherence is unclear, but this question requires future experimentation.

      Author response image 1.

      Strong mPFC-hippocampal theta coherence was used to control trial onset for the entirety of forced-navigation sessions. Time-to-trial onset is a measurement of how long it took for strong coherence to be met. A) Time-to-trial onset was averaged across sessions for each rat, then plotted as a function of trial number (within-session experience on the forced-runs task). Rat 21-35 showed a significant negative correlation between time-to-trial onset and trial number, indicating that time-to-coherence reduced with experience. The rest of the rats did not display this effect. B) Correlation between trial-onset and trial number (y-axis; see A) across sessions (x-axis). A majority of sessions showed a negative correlation between time-to-trial onset and trial number, like what was seen in (A), but the magnitude and sometimes direction of this effect varied considerably even within an animal.
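The per-rat and per-session correlations reported above are standard Pearson correlations between time-to-trial-onset and trial number. A minimal sketch, with made-up numbers illustrating a negative correlation of the kind seen for rat 21-35:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient (pure-Python sketch)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: time-to-coherence (s) declining with within-session
# experience yields a negative correlation.
trial_number = [1, 2, 3, 4, 5, 6]
time_to_onset = [9.1, 7.8, 8.2, 6.0, 5.5, 4.9]
r = pearson_r(trial_number, time_to_onset)
```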

      Is there any evidence that rats display better performance on trials with random delays in which HPC-PFC coherence was naturally elevated?

      This question is now addressed in Extended Figure 5 and discussed in the section titled “strong prefrontal-hippocampal theta coherence leads to correct choices on a spatial working memory task”.

The introduction frames this study as a test of the "communication through coherence" hypothesis. In its strongest form, this hypothesis states that oscillatory synchronization is a prerequisite for inter-areal communication, i.e., if two areas are not synchronized, they cannot transfer information. Recent experimental evidence shows this relationship is more likely inverted: coherence is a consequence of inter-areal interactions, rather than a cause. See Schneider et al. (DOI: 10.1016/j.neuron.2021.09.037) and Vinck et al. (DOI: 10.1016/j.neuron.2023.03.015) for a more in-depth explanation of this distinction. The authors should expand their treatment of this hypothesis in light of these findings.

      Our introduction and discussions have sections dedicated to these studies now.

      Figure 6 - It would be much more intuitive to use the labels "Rat 1", "Rat 2", and "Rat 3"; the "21-4X" identifiers are confusing.

      This was corrected in the paper.

      Figure 6C - The sub-plots within this figure are rather small and difficult to interpret. The figure would be easier to parse if the data were presented as a heatmap of the ratio of theta power during blue vs. red stim, with each pixel corresponding to one channel.

      This suggestion was implemented in the paper. See Fig 6C. Extended Fig. 8 now shows the power spectra as a function of recording shank and channel.

      Ext. Figure 2B - What happens during an acquisition failure? Instead of "Amount of LFP data," consider using "Buffer size".

      Corrected.

      Ext. Figure 2D-E - Instead of "Amount of data," consider using "Window size"

      Referred to as buffer size.

      Ext. Figure 2E - y-axis should extend down to 4 Hz. Are all of the last four values exactly at 8 Hz?

      Yes. Values plateau at 8Hz. These data represent an average over ~50 samples.

      Ext. Figure 2F - consider moving this before D/E, since those panels are summaries of panel F

      Corrected.

      Ext. Figure 4A - ANOVA tells you that accuracy is impacted by delay duration, but not what that impact is. A post-hoc test is required to show that long delays lead to lower accuracy than short ones. Alternatively, one could compute the correlation between delay duration and proportion correctly for each mouse, and look for significant negative values.

      We included supplemental analyses in Extended Fig. 4

      Reviewer #2 (Recommendations For The Authors):

      The authors should replace terms that suggest a causal relationship between PFC-HPC synchrony and behavior, such as 'leads to', 'biases', and 'enhances' with more neutral terms.

      Causal implications were toned down and wherever “leads” or “led” remains, we specifically mean in the context of coherence being detected prior to a choice being made.

      The rationale for the analysis described in the paragraph starting on line 324, and how it fits with the preceding results, was not clear to me. The authors also write at the start of this paragraph "Given that mPFC-hippocampal theta coherence fluctuated in a periodical manner (Extended Fig. 5B)", but this figure only shows example data from 2 trials.

      The reviewer is correct. While we point towards 3 examples in the manuscript now, we focused this section on the autocorrelation analysis, which did not support our observation as we noticed a rather linear decay in correlation over time. As such, the periodicity observed was almost certainly a consequence of overlapping data in the epochs used to calculate coherence rather than intrinsic periodicity.

      Shortly after the start of the results section (line 112), the authors go into a very detailed description of how they validated their BMI without first describing what the BMI actually does. This made this and the subsequent paragraphs difficult to follow. I suggest the authors start with a general description of the BMI (and the general experiment) before going into the details.

      Corrected. See first paragraph of “Development of a closed-loop…”.

      In Figure 2C, as expected, around the onset of 'high' coherence trials, there is an increase in theta coherence but this appears to be very transient. However, it is unclear what the heatmap represents: is it a single trial, single session, an average across animals, or something else? In Figure 3F, however, the increase appears to be much more sustained.

      The sample size was rats for every panel in this figure. This was clarified at the end of Fig. 3.

      In Figure 2D, it was not clear to me what units of measurement are used when the averages and error bars are calculated. What is the 'n' here? Animals or sessions? This should be made clear in this figure as well as in other figures.

      The sample size is rats. This is now clarified at the end of Fig 2.

      Describing the study of Jones and Wilson (2005), the authors write: "While foundational, this study treated the dependent variable (choice accuracy) as independent to test the effect of choice outcome on task performance." (line 83) It was not clear to me what is meant by "dependent" and "independent" here. Explaining this more clearly might clarify how the authors' study goes beyond this and other previous studies.

      The reviewer is correct. A discussion on independent/dependent variables in the context of rationale for our experiment was removed.

      Reviewer #3 (Recommendations For The Authors):

      As explained in the public review, my comments mainly concern the interpretation of the experimental paradigm and its link with previous findings. I think modifying these in order to target the specific advance allowed by the paradigm would really improve the match between the experimental and analytical data that is very solid and the author's conclusions.

      Concerning the paradigm, I recommend that the authors focus more on their novel ability to clearly dissociate the functional role of theta coherence prior to the choice as opposed to induced by the choice. Currently, they explain by contrasting previous studies based on dependent variables whereas their approach uses an independent variable. I was a bit confused by this, particularly because the task variable is not really independent given that it's based on a brain-driven loop. Since theta coherence remains correlated with many other neurophysiological variables, the results cannot go beyond showing that leading up to the decision it correlates with good choice accuracy, without providing evidence that it is theta coherence itself that enhances this accuracy as they suggest in lines 93-94.

      The reviewer is correct. A discussion on independent/dependent variables in the context of rationale for our experiment was removed.

      Regarding previous results with muscimol inactivation, I recommend that the authors expand their discussion on this point. I think that their correlative data is not sufficient to conclude as they do that despite "these structures being deemed unnecessary" (based on causal muscimol experiments), they "can still contribute rather significantly" since their findings do not show a contribution, merely a correlation. This extra discussion could include possible explanations of the apparent, and thought-provoking discrepancies that they uncover such as: theta coherence may be a correlate of good accuracy without an underlying causal relation, theta coherence may always correlate with good accuracy but only be causally important in some tasks related to spatial working memory or, since muscimol experiments leave the brain time to adapt to the inactivation, redundancy between brain areas may mask their implication in the physiological context in certain tasks (see Goshen et al 2011).

      The second paragraph of the discussion is now dedicated to this.

      Possible further analysis :

      • In Extended 4A the authors show that performance drops with delay duration. It would be very interesting to see this graph with the high coherence / low coherence / yoked trials to see if the theta coherence is most important for longer trials for example.

      This is a great suggestion. Due to 10% of trials being triggered by high coherence states, our sample size precludes a robust analysis as suggested. Given that we found an enhancement effect on a task with minimal spatial working memory requirements (Fig. 4), it seems that coherence may be a general benefit or consequence of choice processes. Nonetheless, this remains an important question to address in a future study.

      • Figure 6: The authors explain in the text that although the effect of stimulation of VMT is variable, overall VMT activation increased PFC-HPC coherence. I think in the figure the results are only shown for one rat and session per panel. It would be interesting to add a figure including their whole data set to show the overall effect as well as the variability.

The reviewer is correct, and this comment prompted a significant addition of detail to the manuscript. We have added an extended figure (Ext. Fig. 9) showing our VMT stimulation recording sessions. We originally did not include these because we were performing a parameter search to understand whether VMT stimulation could increase mPFC-hippocampal theta coherence. The results section was expanded accordingly.

      Changes to writing / figures :

      • The paper by Eliav et al, 2018 is cited to illustrate the universality of coupling between hippocampal rhythms and spikes whereas the main finding of this paper is that spikes lock to non-rhythmic LFP in the bat hippocampus. It seems inappropriate to cite this paper in the sentence on line 65.

      We agree with the reviewer and this citation was removed.

      • Line 180 when explaining the protocol, it would help comprehension if the authors clearly stated that "trial initiation" means opening the door to allow the rat to make its choice. I was initially unfamiliar with the paradigm and didn't figure this out immediately.

      We added a description to the second paragraph of our first results section.

      • Lines 324 and following: the analysis shows that there is a slow decay over around 2s of the theta coherence but not that it is periodical (as in regularly occurring in time), this would require the auto-correlation to show another bump at the timescale corresponding to the period of the signal. I recommend the authors use a different terminology.

      This comment is now addressed above in our response to Reviewer #2.

      • Lines 344: I am not sure why the stable theta coherence levels during the fixed delay phase show that the link with task performance is "through mechanisms specific to choice". Could the authors elaborate on this?

      We elaborated on this point further at the end of “Trials initiated by strong prefrontal-hippocampal theta coherence are characterized by prominent prefrontal theta rhythms and heightened pre-choice prefrontal-hippocampal synchrony”

      • Line 85: "independent to test the effect of choice outcome on task performance." I think there is a typo here and "choice outcome" should be "theta coherence".

      The sentence was removed in the updated draft.

    1. Author Response

      Reviewer 1 (Public Review):

      To me, the strengths of the paper are predominantly in the experimental work, there's a huge amount of data generated through mutagenesis, screening, and DMS. This is likely to constitute a valuable dataset for future work.

      We are grateful to the reviewer for their generous comment.

      Scientifically, I think what is perhaps missing, and I don't want this to be misconstrued as a request for additional work, is a deeper analysis of the structural and dynamic molecular basis for the observations. In some ways, the ML is used to replace this and I think it doesn't do as good a job. It is clear for example that there are common mechanisms underpinning the allostery between these proteins, but they are left hanging to some degree. It should be possible to work out what these are with further biophysical analysis…. Actually testing that hypothesis experimentally/computationally would be nice (rather than relying on inference from ML).

      We agree with the reviewer that this study should motivate a deeper biophysical analysis of molecular mechanisms. However, in our view, the ML portion of our work was not intended as a replacement for mechanistic analysis, nor could it serve as one. We treated ML as a hypothesis-generating tool. We hypothesized that distant homologs are likely to have similar allosteric mechanisms which may not be evident from visual analysis of DMS maps. We used ML to (a) extract underlying similarities between homologs (b) make cross predictions across homologs. In fact, the chief conclusion of our work is that while common patterns exist across homologs, the molecular details differ. ML provides tantalizing evidence to this effect. The conclusive evidence will require, as the reviewer rightly suggests, detailed experimental or molecular dynamics characterization. Along this line, we note that we have recently reported our atomistic MD analysis of allostery hotspots in TetR (JACS, 2022, 144, 10870). See ref. 41.

Changes to manuscript: “Detailed biophysical or molecular dynamics characterization will be required to further validate our conclusions (38).”

      Reviewer 3 (Public Review):

      However - at least in the manuscript's present form - the paper suffers from key conceptual difficulties and a lack of rigor in data analysis that substantially limits one's confidence in the authors' interpretations.

      We hope the responses below address and allay the reviewer’s concerns.

      A key conceptual challenge shaping the interpretation of this work lies in the definition of allostery, and allosteric hotspot. The authors define allosteric mutations as those that abrogate the response of a given aTF to a small molecule effector (inducer). Thus, the results focus on mutations that are "allosterically dead". However, this assay would seem to miss other types of allosteric mutations: for example, mutations that enhance the allosteric response to ligand would not be captured, and neither would mutations that more subtly tune the dynamic range between uninduced ("off) and induced ("on") states (without wholesale breaking the observed allostery). Prior work has even indicated the presence of TetR mutations that reverse the activity of the effector, causing it to act as a co-repressor rather than an inducer (Scholz et al (2004) PMID: 15255892). Because the work focuses only on allosterically dead mutations, it is unclear how the outcome of the experiments would change if a broader (and in our view more complete) definition of allostery were considered.

      We agree with the reviewer that mutations that impact allostery manifest in many different ways. Furthermore, the effect size of these mutations runs the full gamut from subtle changes in dynamic range to drastic reversal of function. To unpack allostery further, allostery of aTF can be described, not just by the dynamic range, but by the actual basal and induced expression levels of the reporter, EC50 and Hill coefficient. Given the systemic nature of allostery, a substantial fraction of aTF mutations may have some subtle impact on one or more of these metrics. To take the reviewer’s argument one step further, one would have to accurately quantify the effect size of every single amino acid mutation on all the above properties to have a comprehensive sequence-function landscape of allostery. Needless to say, this is extremely hard! Resolution of small effect sizes is very difficult, even at high sequencing depth. To the best of our knowledge, a heroic effort approaching such comprehensive analysis has been accomplished so far only once (PMID: 3491352).

      Our focus, therefore, was to screen for the strongest phenotypic impact on allostery i.e., loss of function. Mutations leading to loss of function can be relatively easily identified by cell-sorting. Because our goal was to compare hotspots across homologs, we surmised that loss of function mutations, given their strong phenotypic impact, are likely to provide the clearest evidence of whether allosteric hotspots are conserved across remote homologs.

The reviewer raised the point of activity-reversing mutations. Yes, there are activity-reversing mutations in TetR. However, they represent an insignificant fraction. In the paper cited by the reviewer, there are 15 activity-reversing mutations among 4000 screened. Furthermore, the paper shows that activity reversal in TetR requires two to four mutations, while our library is exclusively single amino acid substitutions. For these reasons, we did not screen for activity-reversing mutations. Nonetheless, we agree with the reviewer that screening for activity-reversing mutations across homologs would be very interesting.

      The separation in fluorescence between the uninduced and induced states (the assay dynamic range, or fold induction) varies substantially amongst the four aTF homologs. Most concerningly, the fluorescence distributions for the uninduced and induced populations of the RolR single mutant library overlap almost completely (Figure 1, supplement 1), making it unclear if the authors can truly detect meaningful variation in regulation for this homolog.

      Yes, the reviewer is correct that the fold induction ratio varies among the four aTF homologs. However, we note that such differences are common among natural aTFs. Depending on the native downstream gene regulated by the aTF, some aTFs show higher ligand-induced activation, and others are lower. While this is not a hard and fast rule, aTFs that regulate efflux pumps tend to have higher fold induction than those that regulate metabolic enzymes. In summary, the variation in fold induction among the four aTFs is not a flaw in experimental design nor indicates experimental inconsistency but is instead just an inherent property of protein-DNA interaction strength and the allosteric response of each aTF.

Among the four aTFs, wildtype RolR has the weakest fold induction (15-fold), which makes sorting the RolR library particularly challenging. To minimize false positives as much as possible, we require that a dead mutant be present in (a) non-fluorescent cells after ligand induction, (b) non-fluorescent cells before ligand induction, and (c) at least two out of the three replicates for both sorts. Additionally, for RolR specifically, we adjusted the non-fluorescent gate to be far more stringent than for the other three aTFs (Fig. 1 – figure supplement 1). Furthermore, we assign residues, not individual dead mutations, as allosteric hotspots. This buffers against false strong signals from stray individual dead mutations. Finally, taking the top interquartile range winnows these to residues showing a strong, consistent dead phenotype. As a result of these “safeguards” we have built in, the number of allosteric hotspots of RolR (57) is comparable to the other three aTFs (51, 53 and 48). This suggests that we are not overestimating the number of hotspots despite the weaker fold induction of RolR. We highlight in a new supplementary figure (Figure 1 – figure supplement 4) that changing the read-count threshold from 5X to 10X produces near-identical patterns of mutations, suggesting that our results are also robust to changes in read-depth stringency.
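The dead-variant call described above — present past the read threshold in both the uninduced and induced non-fluorescent sorts, in at least two of three replicates — can be sketched as a simple filter. The function name and the data layout are illustrative assumptions, not the actual analysis code.

```python
def is_dead(read_counts, min_reads=5, min_replicates=2):
    """Binary dead-variant classification (illustrative sketch).

    read_counts: one dict per replicate giving reads for the variant in the
    sorted non-fluorescent gate, e.g. {"uninduced": 7, "induced": 12}.
    A variant is called dead only if it clears the read threshold in BOTH
    sorts of at least `min_replicates` replicates.
    """
    passing = sum(
        1 for rep in read_counts
        if rep["uninduced"] >= min_reads and rep["induced"] >= min_reads
    )
    return passing >= min_replicates

# Passes in replicates 1 and 2, fails in replicate 3 -> still called dead.
dead = is_dead([
    {"uninduced": 9, "induced": 6},
    {"uninduced": 11, "induced": 8},
    {"uninduced": 0, "induced": 2},
])
```

Requiring both sort conditions in multiple replicates is what keeps a stray sorting event from producing a false dead call.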

      Changes to manuscript: In response to the reviewer's comment, we have added the following sentence.

      “We note that the lower fold induction (dynamic range) of RolR makes it particularly challenging to separate the dead variants from the rest.”

      The methods state that "variants with at least 5 reads in both the presence and absence of ligand in at least two replicates were identified as dead". However, the use of a single threshold (5 reads) to define allosterically dead mutations across all mutations in all four homologs overlooks several important factors:

      Depending on the starting number of reads for a given mutation in the population (which may differ in orders of magnitude), the observation of 5 reads in the gated nonfluorescent region might be highly significant, or not significant at all. Often this is handled by considering a relative enrichment (say in the induced vs uninduced population) rather than a flat threshold across all variants.

      We regret the lack of clarity in our presentation. We wish to better explain the rationale behind our approach. First, we understand the reviewer’s point on considering relative enrichment to define a threshold. This approach works well in DMS experiments involving genetic selections, which is commonly the case, because activity scales well with selection stringency. One can then pick enrichment/depletion relative to the middle of the read count distribution as a measure of gain or loss of function.

      Second, this strategy does not, in practice, work well for cell-sorting screens. While it may be tempting to think of cell sorting as comparably activity-scaled as genetic selections, in reality, the fidelity of fluorescent-activated cell sorters is much lower. Making quantitative claims of activity based on cell sorting enrichment can be risky. It is wiser to treat cell sorting results as yes/no binary i.e., does the mutation disrupt allostery or not. More importantly, the yes/no binary classification suffices for our need to identify if a certain mutation adversely impacts allosteric activity or not.

      Third, the above argument does not imply that all mutations have the same effect size on allostery. They don’t. We capture the effect size on individual residues, not individual mutations, by counting the number of dead mutations at a residue position. This is an important consideration because it safeguards us from minor inconsistencies that inevitably arise from cell sorting.

Fourth, for a variant to be classified as allosterically dead, it must be present in both the uninduced and induced DNA-bound populations in at least two out of three replicates (four conditions total). This is a stringent criterion for selecting dead variants, resulting in highly consistent regions of importance in the protein even upon varying read-count thresholds. To the extent possible, we have minimized the possibility of false-positive bleed-through.

Finally, two separate normalizations were performed on the total sequence reads to allow a common read-count threshold to be drawn 1) between experimental conditions and replicates and 2) across proteins. First, total sequencing reads were normalized to 200k total across all sample conditions (presorted, -inducer, and +inducer) and replicates for each homolog, allowing comparisons within a single protein. Next, reads were normalized again to account for differences in the theoretical size of each protein’s single-mutant library, allowing comparisons across proteins by drawing a common read-count cutoff. For example, total sequencing reads of RolR (4,332 possible mutants) were increased by 1.18x relative to MphR (3,667 possible mutants), for a total of 236k reads.
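A minimal sketch of the two normalization steps described above. The function name and example counts are illustrative; the library sizes (4,332 and 3,667) are taken from the text.

```python
def normalize_reads(raw_counts, target_total=200_000):
    """Step 1: scale raw per-variant read counts so a sample's total is
    target_total, making conditions and replicates comparable within one
    homolog (illustrative sketch)."""
    total = sum(raw_counts.values())
    return {v: c * target_total / total for v, c in raw_counts.items()}

# Step 2: scale the per-homolog total by relative library size so one
# read-count cutoff applies across proteins.
rolr_scale = 4332 / 3667            # RolR vs. MphR single-mutant library sizes
rolr_target = round(200_000 * rolr_scale)   # ~1.18x, i.e. ~236k reads
```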

      Changes to manuscript: We have provided substantial additional details in the Fluorescence-activated cell sorting and NGS preparation and analysis sections.

      We also added the following in the main text.

      “In other words, we use cell sorting as a binary classifier i.e., does the mutation disrupt allostery or not. We capture the effect size on individual residues, not individual mutations, by counting the number of dead mutations at a residue position. This is an important consideration because it safeguards us from minor inconsistencies that inevitably arise from cell sorting.”

      Depending on the noise in the data (as captured in the nucleotide-specific q-scores) and the number of nucleotides changed relative to the WT (anywhere between 1-3 for a given amino acid mutation) one might have more or less chance of observing five reads for a given mutation simply due to sequencing noise.

All the reads considered in our analyses pass the Illumina quality threshold of Q-score ≥ 30, which, per Illumina, represents “perfect reads with no errors or ambiguities”. This translates into a probability of 1 in 1,000 of an incorrect base call, or 99.9% base-call accuracy.

We use chip-based oligonucleotides to build our DMS library, which allows us to pre-specify the exact codon that encodes a point mutation. This means the nucleotide count and protein count are the same. The scenario referred to by the reviewer, i.e., “anywhere between 1-3 for a given amino acid mutation”, only applies to codon-randomized or error-prone PCR library generation. We regret if the chip-based library assembly was unclear.

      Depending on the shape and separation of the induced (fluorescent) and uninduced (non-fluorescent) population distributions, one might have more or less chance of observing five reads by chance in the gated non-fluorescent region. The current single threshold does not account for variation in the dynamic range of the assay across homologs.

      We have addressed the concern raised by the reviewer on fluorescent population distributions in answers to questions 10 and 11.

The reviewer makes an important point about the choice of sequencing threshold. We use the sequencing threshold simply to make a binary choice as to whether a certain variant exists in the sorted population or not. We do not use the sequencing reads to scale the activity of the variant. To address the reviewer's comment, we have included a new supplementary figure (Fig 1 – figure supplement 4) where we compare the data at two threshold levels: 5 and 10 reads. As is evident in the new figure, the fundamental pattern of allosteric hotspots and the overall data interpretation do not change.

      TetR: 5x – 53 hotspots, 10x – 51 hotspots

      TtgR: 5x – 51 hotspots, 10x – 51 hotspots

      MphR: 5x – 48 hotspots, 10x – 48 hotspots

      RolR: 5x – 57 hotspots, 10x – 60 hotspots

      In other words, changing the threshold to be more or less strict may have a modest impact on the overall number of hotspots in the dataset. Still, the regions of functional importance are consistent across different thresholds. We have expanded the discussion in the manuscript to address this point.

      Changes to manuscript: We have now included a new supplementary comparing hotspot data at two thresholds: Figure 1 – figure supplement 4.

      We also added the following in the main text.

      “To assess the robustness of our classification of hotspots, we determined the number of hotspots at two different sequencing thresholds – 5x and 10x. At 5x and 10x, the number of hotspots are – TetR: 53, 51; TtgR: 51, 51; MphR: 48, 48 and RolR: 57,60, respectively. Changing the threshold has a modest impact on the overall number of hotspots and the regions of functional importance are consistent at both thresholds”

      The authors provide a brief written description of the "weighted score" used to define allosteric hotspots (see y-axis for figure 1B), but without an equation, it is not clear what was calculated. Nonetheless, understanding this weighted score seems central to their definition of allosteric hotspots.

We regret the lack of clarity in our presentation. The weighted score was used to quantify the “deadness” of every residue position in the protein. At each position, the number of mutations that inhibited activity was summed, and the ‘deadness’ of each mutation was weighted by the number of replicates in which it appeared to inactivate the protein. The weighted score at each residue position is given by

where, at position x in the protein, D1 is the number of mutations dead in one replicate only, D2 is the number of mutations dead in two replicates, D3 is the number of mutations dead in three replicates, and Total is the total number of variants present in the data set (based on sequencing data). Any dead mutation seen in only one replicate is discarded and does not contribute to the “deadness” of the residue; mutations seen in two or three replicates contribute to the score. We have included a new supplementary figure (Fig. 1 – figure supplement 2) to give the reader a detailed heatmap of all mutations and their impact for each protein.
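As a rough illustration of such replicate-weighted scoring, one could compute the following (the specific weights and counts here are placeholder assumptions, not the weights defined in the study's Materials and Methods):

```python
# Illustrative replicate-weighted "deadness" score for one residue
# position. The weights (D1 -> 0; D2, D3 -> increasing) are placeholder
# assumptions; the study's exact weighting is in its Methods section.
def weighted_score(d1, d2, d3, total):
    """d1/d2/d3: mutations dead in 1/2/3 replicates; total: variants seen."""
    assert total > 0
    return (0 * d1 + 2 * d2 + 3 * d3) / total

# Hypothetical position: 1 mutation dead in one replicate (discarded),
# 2 dead in two replicates, 3 dead in all three, 19 variants observed.
score = weighted_score(d1=1, d2=2, d3=3, total=19)
```

The key design choice is that single-replicate "dead" calls contribute nothing, so assay noise in one replicate cannot by itself elevate a position's score.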

      Changes to manuscript: The weighted scoring scheme is now described in greater detail under Materials and Methods in the “NGS preparation and analysis” section.

      The authors do not provide some of the standard "controls" often used to assess deep mutational scanning data. For example, one might expect that synonymous mutations are not categorized as allosterically dead using their methods (because they should still respond to ligand) and that most nonsense mutations are also not allosterically dead (because they should no longer repress GFP under either condition). In general, it is not clear how the authors validated the assay/confirmed that it is giving the expected results.

As we state in response to question 12, we use chip-based oligonucleotides to build our DMS library, which allows us to pre-specify the exact codon that encodes a point mutation. We have no synonymous or nonsense mutations in our DMS library. Each protein mutation is encoded by a single unique codon. The only stop codon is at the 3’ end of the gene.

      The authors performed three replicates of the experiment, but reproducibility across replicates and noise in the assay is not presented/discussed.

      Changes to manuscript: A new supplementary table (Table 1) is now provided with the pairwise correlation coefficients between all replicates for each protein.
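Pairwise replicate agreement of the kind tabulated in Table 1 can be computed as in this sketch (the per-variant activity scores below are hypothetical placeholders):

```python
import itertools
import math

# Sketch: pairwise Pearson correlations between replicates.
# Per-variant activity scores are hypothetical placeholders.
def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

reps = {
    "rep1": [0.1, 0.5, 0.9, 0.3],
    "rep2": [0.2, 0.4, 1.0, 0.3],
    "rep3": [0.0, 0.6, 0.8, 0.4],
}
pairwise = {
    pair: pearson(reps[pair[0]], reps[pair[1]])
    for pair in itertools.combinations(sorted(reps), 2)
}
```

With three replicates there are three pairwise coefficients; values near 1 indicate high reproducibility of the assay.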

      In the analysis of long-range interactions, the authors assert that "hotspot interactions are more likely to be long-range than those of non-hotspots", but this was not accompanied by a statistical test (Figure 2 - figure supplement 1).

      In response to the reviewer's comment, we now include a paired t-test comparing nonhotspots and hotspots with long-range interactions in the main text.

      Changes to manuscript: In all four aTFs, hotspots constituted a higher fraction of LRIs than non-hotspots (Figure 2 – figure supplement 1; P = 0.07).
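For reference, a paired t statistic of this kind can be computed as in the sketch below (the LRI fractions shown are hypothetical placeholders, not the study's values; the manuscript reports P = 0.07):

```python
import math

# Sketch: paired t statistic comparing hotspot vs non-hotspot
# long-range-interaction (LRI) fractions across four proteins.
# The fractions below are hypothetical placeholders.
def paired_t(x, y):
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    return mean / math.sqrt(var / n)

hotspot_lri = [0.40, 0.35, 0.45, 0.38]      # hypothetical, one per aTF
nonhotspot_lri = [0.30, 0.28, 0.36, 0.33]   # hypothetical, one per aTF
t_stat = paired_t(hotspot_lri, nonhotspot_lri)
```

Pairing by protein is what makes the test appropriate here: each aTF contributes one hotspot fraction and one non-hotspot fraction, so the comparison is within proteins rather than across them.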

    1. Author Response

      Reviewer #1 (Public Review):

This study provides further detailed analysis of recently published Fly Cell Atlas datasets supplemented with newly generated single cell RNA-seq data obtained from 6,000 testis cells. Using these data, the authors define 43 germline cell clusters and 22 somatic cell clusters. This work confirms and extends previous observations regarding changing gene expression programs through the course of germ cell and somatic cell differentiation.

      This study makes several interesting observations that will be of interest to the field. For example, the authors find that spermatocytes exhibit sex chromosome specific changes in gene expression. In addition, comparisons between the single nucleus and single cell data reveal differences in active transcription versus global mRNA levels. For example, previous results showed that (1) several mRNAs remain high in spermatids long after they are actively transcribed in spermatocytes and (2) defined a set of post-meiotic transcripts. The analysis presented here shows that these patterns of mRNA expression are shared by hundreds of genes in the developing germline. Moreover, variable patterns between the sn- and sc-RNAseq datasets reveals considerable complexity in the post-transcriptional regulation of gene expression.

      Overall, this paper represents a significant contribution to the field. These findings will be of broad interest to developmental biologists and will establish an important foundation for future studies. However, several points should be addressed.

      In figure 1, I am struck by the widespread expression of vasa outside of the germ cell lineage. Do the authors have a technical or biological explanation for this observation? This point should be addressed in the paper with new experiments or further explanation in the text.

Thank you for pointing this out. We found that our single cell dataset shows a similar (low) level of vasa expression outside the germline, suggesting that this is not due to single nucleus versus single cell RNA-seq (cluster 1, red in the left-hand UMAP).

      Analyzing the single nucleus RNA-seq in more detail revealed that, compared to the germline, both the fraction of cells in a cluster expressing vasa and the level at which they express it are very low. This analysis is included in a new Figure 1 – figure supplement 1. It is likely that much of this is due to a technical artifact, such as ambient RNA. Finally, we note in the resubmission that vasa is in fact expressed in embryonic somatic cells, and thus some of the vasa expression we observe may be real (Renault. Biol Open 2012; https://doi.org/10.1242/bio.20121909).
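The per-cluster summary underlying this analysis (fraction of cells expressing a gene, and its mean level) follows this pattern (clusters and counts below are hypothetical stand-ins, not the actual data of Figure 1 – figure supplement 1):

```python
# Sketch: per-cluster fraction of cells expressing a gene and the mean
# expression level. Clusters and counts are hypothetical stand-ins for
# the vasa analysis described in the text.
def cluster_stats(expr_by_cluster):
    stats = {}
    for cluster, values in expr_by_cluster.items():
        frac_expressing = sum(v > 0 for v in values) / len(values)
        mean_level = sum(values) / len(values)
        stats[cluster] = (frac_expressing, mean_level)
    return stats

expr = {
    "germline": [5, 8, 0, 7, 6],  # hypothetical counts
    "muscle": [0, 0, 1, 0, 0],    # hypothetical counts
}
stats = cluster_stats(expr)
```

Reporting both numbers matters: a cluster can contain a few cells with trace signal (likely ambient RNA) while still having a near-zero mean, which is the pattern we observe for vasa outside the germline.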

      Plots in the original submission drew undue attention to the few somatic cells that exhibited vasa signal, due to the fact that expressing cell points were forced to the front of the plot. Given our new analysis reporting the low levels and fraction of cells exhibiting vasa expression (Figure 1 – figure supplement 1), we have modified the panels of Figure 1, changing point size to more faithfully reflect the small proportion of somatic cells with some vasa expression.

      The proposed bifurcation of the cyst cells into head and tail populations is interesting and worth further exploration/validation. While the presented in situ hybridization for Nep4, geko, and shg hint at differences between these populations, double fluorescent in situs or the use of additional markers would help make this point clearer. Higher magnification images would also help in this regard.

      We thank the reviewer for their suggestions on clarifying the differences between HCC and TCC populations. As suggested, we have repeated the FISH experiments of Nep4 and geko with higher resolution, and included the additional marker Coracle that demarcates the junction between HCC and TCC (Figure 6O,Q,S,T). These panels replaced previous Nep4 and geko FISH images (see previous Figure 6Q,U,U’). FISH for Nep4 validated the split, and the enrichment of geko strongly suggests that this arm represents one cell type (HCCs). We have not yet identified a gene reciprocally enriched to the other arm. Therefore, in the revised submission, we call the assignment of TCC identity, and to a lesser extent, HCC identity ‘tentative’, but point out that genes predicted to be enriched to one or the other arm represent fertile candidates for the field to test.

      Reviewer #2 (Public Review):

In this manuscript the authors explain in greater detail a recent testis snRNAseq dataset that many of these authors published earlier this year as part of the Fly Cell Atlas (FCA) Li et al. Science 2022. As part of the current effort additional collaborators were recruited and about 6,000 whole cell scRNAseq cells were added to the previous 42,000 nuclei dataset. The authors now describe 65 snRNAseq clusters, each representing potential cell types or cell states, including 43 germline clusters and 22 somatic clusters. The authors state that this analysis confirms and extends previous knowledge of the testis in several important areas.

      “However, in areas where testis biology is well studied, such as the development of germ cells from GSC to the onset of spermatocyte differentiation, the resolution seems less than current knowledge by considerable margins. No clusters correspond to GSCs, or specific mitotic spermatogonia, and even the major stages of meiotic prophase are not resolved. Instead, the transitions between one state and the next are broad and almost continuous, which could be an intrinsic characteristic of the testis compared to other tissues, of snRNAseq compared to scRNAseq, or of the particular experimental and software analysis choices that were used in this study.”

      Note that the referee raises the same issue later in their review also. To respond succinctly, we placed the relevant sentence from a later portion of this referee’s comment here

      “Support for the view that the problems are mostly technical, rather than a reflection of testis biology, comes from studies of scRNAseq in the mouse, where it has been possible to resolve a stem cell cluster, and germ cell pathways that follow known germ cell differentiation trajectories with much more discrete steps than were reported here (for example, Cao et al. 2021 cited by the authors).”

      Respectfully, we have a different interpretation of other work as cited by this referee. Our data, as well as that from others, supports the notion that transitions are generally broad and continuous and are indeed a feature of testis biology. As we report here, data from both single cell and single nucleus RNAseq exhibit transitions from one cluster to the next. Thus, this feature cannot be due to the choice of method (single cell versus single nucleus).

      In fact, prior scRNA-seq results on systems containing a continuously renewing cell population, such as is the case in the testis, do indeed exhibit a contiguous trajectory rather than discrete, well-separated cell states in gene expression space (that is, in a UMAP presentation). For example, this is the case from single-cell or single-nucleus sequencing from spermatogenesis in mouse (Cao et al 2021), human (Sohni et al 2019), and zebrafish (Qian et al 2022).

Along differentiation trajectories in these tissues, successive clusters are defined by their aggregate transcript repertoire. Indeed, differentially expressed genes can be identified for clusters, with expression enriched in a given cluster. However, expression is rarely restricted to a cluster. For instance, Cao et al. subcluster spermatogonia into four subgroups, termed SPG1-4. They state clearly that these SPG1-4 “follow a continuous differentiation trajectory,” as can be inferred by marker expression across cells in this lineage. Similar to our findings, while the spermatogonia can fall into discrete clusters, gene expression patterns are contiguous. For example, the “undifferentiated” marker used in Cao et al, Crabp1, clearly shows expression in SPG1-3, annotated as spermatogonial stem cells, undifferentiated spermatogonia, and early differentiated spermatogonia, respectively. Likewise, markers for the “SPG3” state spermatogonia have detectable expression in SPG2 and SPG4, and likewise for markers of the “SPG4” state (with expression found also in SPG3).

Analogous study of human spermatogenesis arrives at a similar conclusion. In that work, although clusters are named as “spermatogonial stem cell (SSC)”, the authors are careful to specifically point out that, “…while we refer to the SSC-1 and SSC-2 cell clusters as ‘‘SSCs,’’ scRNA-seq is not a functional assay and thus we do not know the percentage of cells in these clusters with SSC activity. These subsets almost certainly contain other A-SPG cells [A type spermatogonia], including SPG progenitors that have committed to differentiate.” (Sohni et al 2019)

      Thus, the work in several disparate systems, all involving renewing lineages, finds that discrete clusters, such as a “stem cell cluster” are not identified. In the Drosophila testis, germline differentiation flows in a continuous-like manner similar to spermatogenesis in several other organisms studied by scRNA-seq, and our finding is not a function of the methodology, but rather a facet of the biology of the organ.

      Operating in parallel with continuous differentiation, we did find evidence of, and extensively discussed in concert with Figure 4, huge and dramatic shifts in transcriptional state in spermatocytes compared to spermatogonia, in early spermatids compared to spermatocytes, and in late spermatid elongation. Lastly, as we describe further below, new data in this resubmission identify four distinct genes with stage-selective expression as predicted by our analysis (new Figure 2 - figure supplement 1), illustrating the utility of our study for the field to find new markers and new genes to test for function.

      A goal of the study was to identify new rare cell types, and the hub, a small apical somatic cell region, was mentioned as a target region, since it regulates both stem cell populations, GSCs and CySCs, is capable of regeneration, and other fascinating properties. However the analysis of the hub cluster revealed more problems of specificity. 41 or 120 cells in the cluster were discordant with the remaining 79 which did express markers consistent with previous studies. Why these cells co-clustered was not explained and one can only presume that similar problems may be found in other clusters.

Our writing seems not to have been clear enough on this point and we thank the reviewer. We have revised the section. In addition, we have added new data (Figure 7 - figure supplement 2). We had already stated that only 79 of these 120 nuclei were near to each other in 2D UMAP space, while other members of original cluster 90 were dispersed. Thus the 79 hub nuclei in fact clustered together on the UMAP. Other nuclei that mapped at dispersed positions were initially ‘called’ as part of this cluster in the original Fly Cell Atlas (FCA) paper (Li et al., 2022), making it obvious that a correction to that assignment was necessary, which we carried out. To our eye, no other called cluster was represented by such dispersed groupings. For the hub, we definitively established the 79 nuclei to represent hub cells by marker gene analysis, including the identification of a new marker, tup, that was included in the 79 annotated hub nuclei but excluded from the 41 other nuclei (Figure 7). In this resubmission, to independently verify the relationship of the 79 nuclei to each other, we subjected the 120 nuclei from the original cluster 90 defined by the FCA study to hierarchical clustering using only genes that are highly expressed and variable in these nuclei (Figure 7 - figure supplement 2). This computationally distinct approach strongly supported our identification of the 79 definitive hub nuclei.
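The logic of such a check can be sketched with a minimal single-linkage agglomeration (the coordinates below are hypothetical two-gene profiles; the actual analysis clustered 120 nuclei over many highly variable genes with standard tools):

```python
import math

# Sketch: single-linkage hierarchical grouping of nuclei by a few
# highly variable genes. Coordinates are hypothetical; real analyses
# typically use scipy.cluster.hierarchy or equivalent.
def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def single_linkage(points, n_clusters):
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)
    return clusters

# Two hypothetical expression profiles: "hub-like" vs "other" nuclei.
nuclei = [(1.0, 0.9), (0.9, 1.1), (1.1, 1.0), (4.0, 3.9), (3.9, 4.1)]
groups = single_linkage(nuclei, 2)
```

If the 79 putative hub nuclei are truly distinct, they should fall into one branch of the dendrogram, separate from the remaining 41, which is what the new supplement shows.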

Indeed, many other indications of specificity issues were described, including contamination of fat body with spermatocytes, the expression of germline genes such as Vasa in many somatic cell clusters like muscle, hemocytes, and male gonad epithelium, and the promiscuous expression of many genes, including 25% of somatic-specific transcription factors, in mid to late spermatocytes. The expression of only one such gene, Hml, was documented in tissue, and the authors for reasons not explained did not attempt to decisively address whether this phenomenon is biologically meaningful.

      We discussed the question of vasa expression in somatic clusters in some detail above, in response to referee #1, and included new analysis in the resubmission.

      With respect to the observation of ‘somatic gene’ expression in spermatocytes, we are also intrigued. We do not believe this is due to “contamination,” but rather a spermatocyte expression program that includes expression of somatic genes. First, these somatic markers were not observed in other germline clusters, which would be expected if this was due to general transcript contamination. Second, we observed expression of somatic markers in spermatocytes independently in the single-cell and single-nucleus data, making it unlikely to be an artifact of preparation of isolated nuclei. Finally, in the resubmission, in addition to Hml, we validated ‘somatic’ marker expression in spermatocytes by FISH of a somatic, tail cyst cell marker, Vsx1. Vsx1 is predicted to be expressed at low levels in spermatocytes in our dataset and is clearly visible in germline cells by FISH (Figure 3 – figure supplement 2G,H). We also refer the referee to Figure 6K, where the mRNA for the somatic cyst cell marker eya was observed by FISH at low levels in spermatocytes.

A truly interesting question mentioned by the authors is why the testis consistently ranks near the top of all tissues in the complexity of its gene expression. In the Li et al. (2022) paper it was suggested that this is due to an inherently greater biological complexity of spermiogenesis than other tissues. It seems difficult to independently and rationally determine "biological complexity," but if a conserved characteristic of testis was to promiscuously express a wide range of (random?) genes, something not out of the question, this would be highly relevant and important.

      We agree that the massive transcriptional program found in spermatocytes is, indeed, truly interesting. There are many speculations as to why spermatocytes are so highly transcriptional, including the possibility of “transcriptional scanning” (e.g., Xia et al. 2020) regulating the evolution of new genes. Testing such models is beyond the scope of this paper. However, one must also keep in mind that spermatogenesis involves one of the most dramatic cellular transformations in biology, where cellular components spanning from nuclei to chromatin to Golgi, cell cycle, extensive membrane addition, changes in cell shape, and building of a complex swimming organelle all must occur and be temporally coordinated. Small wonder that many genes must be expressed to accomplish these tasks.

      Unfortunately, the most likely problems are simply technical. Drosophila cells are small and difficult to separate as intact cells. The use of nuclei was meant to overcome this inherent problem, but the effectiveness of this new approach is not yet well-documented. Support for the view that the problems are mostly technical, rather than a reflection of testis biology, comes from studies of scRNAseq in the mouse, where it has been possible to resolve a stem cell cluster, and germ cell pathways that follow known germ cell differentiation trajectories with much more discrete steps than were reported here (for example, Cao et al. 2021 cited by the authors).

We respectfully disagree with the referee about this collection of statements. First, the use of snRNA-seq has been extensively characterized and compared to scRNA-seq in brain tissue by McLaughlin et al., 2021 (cited in the original submission) and was shown to be effective (McLaughlin, et al. eLife 2021;10:e63856. DOI: https://doi.org/10.7554/eLife.63856). snRNA-seq has a distinct advantage when dealing with long, thin cells, such as neurons or cyst cells (as featured in this work), where cytoplasm can easily be sheared off during cell isolation. Second, in a previous portion of our response to this referee, we discussed how our interpretation of Cao et al., 2021 differs from that expressed by this referee. Lastly, as requested in ‘Essential revision’ 2, we adjusted clustering methods and selected four genes, two predicted to be markers for early-stage germline cells, and two for mid-spermatocyte stage development. FISH analysis demonstrates that expression for each of these maps to the appropriate stages (new Figure 2 - figure supplement 1). This confirms that the datasets we present in this manuscript can be mined to identify unique, diagnostic markers for various stages.

      The conclusions that were made by the authors seem to either be facts that are already well known, such as the problem that transcriptional changes in spermatocytes will be obscured by the large stored mRNA pool, or promises of future utility. For example, "mining the snRNA-seq data for changes in gene expression as one cluster advances to the next should identify new sub-stage-specific markers." If worthwhile new markers could be identified from these data, surely this could have been accomplished and presented in a supplemental Table. As it currently stands, the manuscript presents the dataset including a fair description of its current limitations, but very little else of novel biological interest is to be found.

      “In sum, this project represents an extremely worthwhile undertaking that will eventually pay off. However, some currently unappreciated technical issues, in cell/nuclear isolation, and certainly in the bioinformatic programs and procedures used that mis-clustered many different cells, has created the current difficulties.

      Most scRNAseq software is written to meet the needs of mammalian researchers working with cultured cells, cellular giants compared to Drosophila and of generally similar size. Such software may not be ideal for much smaller cells, but which also include the much wider variation in cell size, properties and biological mechanisms that exist in the world of tissues.”

We appreciate the referee’s acknowledgement that this ‘undertaking will eventually pay off’. It was not our intention to address ‘function’ for this study, but rather to make the system accessible to the broadest community possible. We are uncertain if there is any remaining reservation held by this referee. A brief summary of what we covered in the manuscript may help allay any residual concern. Obviously, study of the Drosophila testis and spermatogenesis benefits from the knowledge of a large number of established cell-type and stage-selective markers. Thus, we extensively used the community’s accepted markers to assign identity to clusters in both the sn- and sc-RNA-seq UMAPs. We believe that effort well establishes the validity and reliability of the dataset. Furthermore, we identified upwards of a dozen new markers out of the cluster analysis, and verified their expression by FISH or reporter line in various figures throughout (tup, amph, piwi, geko, Nep4, CG3902, Akr1B, loqs, Vsx1, Drep2, Pxt, CG43317, Vha16-5, l(2)41Ab). To our mind, these contributions, coupled with annotation of the datasets, suggest strongly that they will serve the community well. This is especially true as we provide users with objects that they can feed into commonly used software algorithms such as Seurat and Monocle to explore the datasets for their purposes. Rather than simply relying on default settings within some of the applications, we also adjusted parameters for various clusterings as called for, some of which were in response to astute comments from referees, and included in the resubmission. Of course, it is possible that rare issues may arise in the datasets as these are further studied, but that is the case with all scRNA-seq data, and is not specific to work on this model organism.

      Reviewer #3 (Public Review):

In this study, the authors use recently published single nucleus RNA sequencing data and a newly generated single cell RNA sequencing dataset to determine the transcriptional profiles of the different cell types in the Drosophila testis. Their analysis of the data and experimental validation of key findings provide new insight into testis biology and create a resource for the community. The manuscript is clearly written, the data provide strong support for the conclusions, and the analysis is rigorous. Indeed, this manuscript serves as a case study demonstrating best practices in the analysis of this type of genomics data and the many types of predictions that can be made from a deep dive into the data. Researchers who are studying the testis will find many starting points for new projects suggested by this work, and the insightful comparison of methods, such as between slingshot and Monocle3 and single cell vs single nucleus sequencing, will be of interest beyond the study of the Drosophila testis.

      We greatly appreciate the reviewer’s comments.

      Reviewer #4 (Public Review):

      This is an extraordinary study that will serve as key resource for all researchers in the field of Drosophila testis development. The lineages that derive from the germline stem cells and somatic stem cells are described in a detail that has not been previously achieved. The RNAseq approaches have permitted the description of cell states that have not been inferred from morphological analyses, although it is the combination of RNAseq and morphological studies that makes this study exceptional. The field will now have a good understanding of interactions between specific cell states in the somatic lineage with specific states in the germ cell lineage. This resource will permit future studies on precise mechanisms of communication between these lineages during the differentiation process, and will serve as a model for studies of co-differentiation in other stem cell systems. The combination of snRNAseq and scRNAseq has conclusively shown differences in transcriptional activation and RNA storage at specific stages of germ cell differentiation and is a unique study that will inform other studies of cell differentiation.

      Could the authors please describe whether genes on the Y chromosome are expressed outside of the male germline. For example, what is represented by the spots of expression within the seminal vesicle observed in Figure 3D?

      Prior work demonstrated that proteins encoded by Y-linked genes are not expressed outside of the germline (Zhang et al. Genetics 2020. https://doi.org/10.1534/genetics.120.303324). In our snRNAseq dataset, we find that genes on the Y chromosome are not highly expressed outside of the male germline (on the order of ~100-fold lower in other tissues). In fact, we observe Y chromosome transcripts at this level in many nuclei across tissues collected for the Fly Cell Atlas project, including the ovary. Since we have not followed up on the Fly Cell Atlas observations directly using FISH to examine Y chromosome transcript expression outside the germline, we cannot rule out the possibility that such low level expression is real. However, the detection across several tissues argues that this is likely technical artifact. With regard to ‘spots of expression within the seminal vesicle’ (Figure 3D), a spot is colored red if the average expression level of genes on the Y chromosome is greater in that cell than in an average cell on our plot. These red spots are likely due to ambient RNA being carried over.
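The coloring rule described above can be sketched as follows (kl-2 and kl-3 are real Y-linked Drosophila genes, but the cell names and counts are hypothetical):

```python
# Sketch: color a cell "red" when its mean expression over a gene set
# (here, Y-linked genes) exceeds the average cell's score. kl-2 and
# kl-3 are Y-linked genes; all counts are hypothetical.
def flag_cells(expr, gene_set):
    per_cell = {
        cell: sum(genes.get(g, 0) for g in gene_set) / len(gene_set)
        for cell, genes in expr.items()
    }
    overall = sum(per_cell.values()) / len(per_cell)
    return {cell: score > overall for cell, score in per_cell.items()}

expr = {
    "germ_cell": {"kl-2": 3, "kl-3": 2},            # hypothetical counts
    "seminal_vesicle_cell": {"kl-2": 0, "kl-3": 0}, # hypothetical counts
}
red = flag_cells(expr, ["kl-2", "kl-3"])
```

Because the threshold is relative to the plot-wide average, even trace ambient-RNA signal can push an occasional somatic cell above it, producing the scattered red spots noted by the reviewer.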

      I would appreciate some discussion of the "somatic factors" that are observed to be upregulated in spermatocytes (e.g. Mhc, Hml, grh, Syt1). Is there any indication of functional significance of any of these factors in spermatocytes?

      This is an excellent question. Although we validated expression for several (Hml, Vsx1 and eya), we did not test for their function here and this issue remains to be studied. This is now directly stated in the main text.

      In the discussion of cyst cell lineage differentiation following cluster 74 the authors state that neither the HCC or TCC lineages were enriched for eya (Figure 6V). It seems in this panel that cluster 57 shows some enrichment for eya - is this regarded as too low expression to be considered enriched?

We thank the reviewer for their insightful comment and we agree with their conclusions. We have modified the text to reflect the low, but present, expression of eya in the HCC and TCC lineages. The text now reads as follows: “Enrichment of eya was dramatically reduced in the clusters along either late cyst cell branch compared to those of earlier lineage nuclei (Figure 6J,U).”

    1. Author Response:

      Reviewer #1 (Public Review):

      Cell surface proteins are of vital interest in the functions and interactions of cells and their neighbors. In addition, cells manufacture and secrete small membrane vesicles that appear to represent a subset of the cell surface protein composition.

Various techniques have been developed to allow the molecular definition of many cell surface proteins, but most rely on the special chemistry of amino acid residues exposed on the parts of membrane proteins facing the cell exterior.

In this report Kirkemo et al. have devised a method that more comprehensively samples the cell surface protein composition by relying on the membrane insertion or protein glycan adhesion of an enzyme that attaches a biotin group to a nearest-neighbor cellular protein. The result is a more complex set of proteins and distinctive differences between normal cells and Myc oncogene-driven tumor cells, as well as their secreted extracellular vesicle counterparts. These results may be applied to the identification of unique cell surface determinants in tumor cells that could be targets for immune or drug therapy. The results may be strengthened by a more thorough evaluation of the different EV membrane species represented in the broad collection of EVs used in this investigation.

      We thank the reviewer for recognizing the importance of the work outlined in the manuscript. We have addressed the necessary improvements in the essential revisions section above.

      Reviewer #2 (Public Review):

      This paper describes two methods for labeling cell-surface proteins. Both methods involve tethering an enzyme to the membrane surface to probe the proteins present on cells and exosomes. Two different enzyme constructs are used: a single strand lipidated DNA inserted into the membrane that enables binding of an enzyme conjugated to a complementary DNA strand (DNA-APEX2) or a glycan-targeting binding group conjugated to horseradish peroxidase (WGA-HRP). Both tethered enzymes label proteins on the cell surface using a biotin substrate via a radical mechanism. The method provides significantly enhanced labeling efficiency and is much faster than traditional chemical labeling methods and methods that employ soluble enzymes. The authors comprehensively analyze the labeled proteins using mass spectrometry and find multiple proteins that were previously undetectable with chemical methods and soluble enzymes. Furthermore, they compare the labeling of both cells and the exosomes that are formed from the cells and characterize both up- and down-regulated proteins related to cancer development that may provide a mechanistic underpinning.

Overall, the method is novel and should enable the discovery of many low-abundance cell-surface proteins through more efficient labeling. The DNA-APEX2 method will only be accessible to more sophisticated laboratories that can carry out the protocols, but the WGA-HRP method employs a readily available commercial product and gives equivalent, perhaps even better, results. In addition, the method cannot discriminate between proteins that are genuinely expressed on the cell and those that are non-specifically bound to the cell surface.

      The authors describe the approach and identify two unique proteins on the surface of prostate cell lines.

      Strengths:

Good introduction with appropriate citations of relevant literature. Much higher labeling efficiency and speed than chemical methods and soluble-enzyme methods. Ability to detect low-abundance proteins not accessible with previous labeling methods.

      Weaknesses: The DNA-APEX2 method requires specialized reagents and protocols that are much more challenging for a typical laboratory to carry out than conventional chemical labeling methods.

      The claims and findings are sound. The finding of novel proteins and the quantitative measurement of protein up- and down-regulation are important. The concern about non-specifically bound proteins could be addressed by looking at whether the detected proteins have a transmembrane region that would enable them to localize in the cell membrane.

      We thank the reviewer for recognizing the strengths and importance of this work. We also thank the reviewer for raising the issue of non-specifically bound proteins. As addressed above in the Essential Revisions section, we believe that any low-affinity, non-specifically bound proteins are likely removed in the multiple wash/centrifugation steps on cells or the multiple centrifugation steps and sucrose-gradient purification on EVs. Given the likelihood of removal of non-specific binders, we believe that the secreted proteins identified likely reflect high-affinity interactions, and their differential expression on either cells or EVs plays an important part in the downstream biology of both sample types. However, the previous data presentation did not clarify which proteins pertained to the transmembrane plasma membrane proteome versus secreted protein forms. For further clarity in the data presentation (Figures 3D, 4D, and 5D), we have bolded proteins that are also found in the SURFY database, which includes only surface-annotated proteins with a predicted transmembrane domain (Bausch-Fluck et al., The in silico human surfaceome. PNAS. 2018). We have also italicized proteins that are annotated to be secreted from the cell to the extracellular space (Uniprot classification). We have updated the text and caption as shown below:

      New Figure 3:

      Figure 3. WGA-HRP identifies a number of enriched markers on Myc-driven prostate cancer cells. (A) Overall scheme for biotin labeling and label-free quantification (LFQ) by LC-MS/MS for RWPE-1 Control and Myc over-expression cells. (B) Microscopy image depicting morphological differences between RWPE-1 Control and RWPE-1 Myc cells after 3 days in culture. (C) Volcano plot depicting the LFQ comparison of RWPE-1 Control and Myc labeled cells. Red labels indicate upregulation in the RWPE-1 Control cells over Myc cells and green labels indicate upregulation in the RWPE-1 Myc cells over Control cells. All colored proteins are 2-fold enriched in either dataset between four replicates (two technical, two biological, p<0.05). (D) Heatmap of the 15 most upregulated transmembrane (bold) or secreted (italics) proteins in RWPE-1 Control and Myc cells. Scale indicates intensity, defined as (LFQ Area - Mean LFQ Area)/standard deviation. Extracellular proteins with annotated transmembrane domains are bolded and annotated secreted proteins are italicized. (E) Table indicating fold-change of the most differentially regulated proteins by LC-MS/MS for RWPE-1 Control and Myc cells. (F) Upregulated proteins in RWPE-1 Myc cells (Myc, ANPEP, Vimentin, and FN1) are confirmed by western blot. (G) Upregulated surface proteins in RWPE-1 Myc cells (Vimentin, ANPEP, FN1) are detected by immunofluorescence microscopy. HLA-B, downregulated by Myc over-expression, was also detected by immunofluorescence microscopy. All western blot images and microscopy images are representative of two biological replicates. Mass spectrometry data is based on two biological and two technical replicates (N = 4).
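      The "intensity" used to scale these heatmaps is a per-protein z-score of LFQ area. As a minimal illustrative sketch, the calculation looks like the following; the values are invented, and the use of the sample standard deviation is our assumption, since the captions do not specify which form of SD was used:

```python
import statistics

def intensity(values):
    """Heatmap intensity as defined in the captions: (LFQ area - mean LFQ area) / SD."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD; the captions do not specify
    return [(v - mu) / sd for v in values]

# Toy LFQ areas for one protein across four samples (invented values)
row = [10.0, 12.0, 8.0, 14.0]
scaled = intensity(row)  # centered near zero, in units of standard deviations
```

Each heatmap row would be scaled this way independently, so colors reflect relative abundance across samples rather than absolute abundance.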

      New Figure 4:

      Figure 4. WGA-HRP identifies a number of enriched markers on Myc-driven prostate cancer EVs. (A) Workflow for small EV isolation from cultured cells. (B) Labeled proteins indicating canonical exosome markers (ExoCarta Top 100 List) detected after performing label-free quantification (LFQ) from whole EV lysate. The proteins are graphed from least abundant to most abundant. (C) Workflow of exosome labeling and preparation for mass spectrometry. (D) Heatmap of the 15 most upregulated proteins in RWPE-1 Control or Myc EVs. Scale indicates intensity, defined as (LFQ Area - Mean LFQ Area)/SD. Extracellular proteins with annotated transmembrane domains are bolded and annotated secreted proteins are italicized. (E) Table indicating fold-change of most differentially regulated proteins by LC-MS/MS for RWPE-1 Control and Myc cells. (F) Upregulated proteins in RWPE-1 Myc EVs (ANPEP and FN1) are confirmed by western blot. Mass spectrometry data is based on two biological and two technical replicates (N = 4). Due to limited sample yield, one replicate was performed for the EV western blot.

      New Figure 5:

      Figure 5. WGA-HRP identifies a number of EV-specific markers that are present regardless of oncogene status. (A) Matrix depicting samples analyzed during LFQ comparison: Control and Myc cells, as well as Control and Myc EVs. (B) Principal component analysis (PCA) of all four groups analyzed by LFQ. Component 1 (50.4%) and component 2 (15.8%) are depicted. (C) Functional annotation clustering was performed using DAVID Bioinformatics Resource 6.8 to classify the major constituents of component 1 in the PCA analysis. (D) Heatmap of the 25 most upregulated proteins in RWPE-1 cells or EVs. Proteins are listed in decreasing order of expression, with the most highly expressed proteins in EVs on the far left and the most highly expressed proteins in cells on the far right. Scale indicates intensity, defined as (LFQ Area - Mean LFQ Area)/SD. Extracellular proteins with annotated transmembrane domains are bolded and annotated secreted proteins are italicized. (E) Table indicating fold-change of the most differentially regulated proteins by LC-MS/MS for RWPE-1 EVs compared to parent cells. (F) Western blot showing the EV-specific markers ITIH4, IGSF8, and MFGE8. Mass spectrometry data is based on two biological and two technical replicates (N = 4). Due to limited sample yield, one replicate was performed for the EV western blot.

      Authors mention time-sensitive changes but it is unclear how this method would enable one to obtain this kind of data. How would this be accomplished? The statement "Due to the rapid nature of peroxidase enzymes (1-2 min), our approaches enable kinetic experiments to capture rapid changes, such as binding, internalization, and shuttling events." Yes, it is faster, but not sure I can think of an experiment that would enable one to capture such events.

      We thank the reviewer for this comment and for giving us the opportunity to elaborate on the types of experiments enabled by this new method. A previous study (Y. Li et al. Rapid Enzyme-Mediated Biotinylation for Cell Surface Proteome Profiling. Anal. Chem. 2021) showed that labeling the cell surface with soluble HRP allowed the researchers to detect immediate surface protein changes in response to insulin treatment. They demonstrated differential surfaceome profiling changes at 5 minutes vs 2 hours following treatment with insulin. Only methods utilizing these rapid labeling enzymes could allow for this type of resolution. Other biological settings that experience rapid cell surface changes include response to drug treatment, T-cell activation and synapse formation (S. Valitutti et al. The space and time frames of T cell activation at the immunological synapse. FEBS Letters. 2010), and GPCR activation (T. Gupte et al. Minute-scale persistence of a GPCR conformation state triggered by non-cognate G protein interactions primes signaling. Nat. Commun. 2019). We also believe the method would be useful for post-translational processes where proteins are rapidly shuttled to the cell surface. We have updated the discussion to elaborate on these types of experiments.

      "Due to the fast kinetics of peroxidase enzymes (1-2 min), our approaches could enable kinetic experiments to capture rapid post-translational trafficking of surfaces proteins, such as response to insulin, certain drug treatments, T-cell activation and synapse formation, and GPCR activation."

      The authors do not have any way to differentiate between proteins expressed by cells and presented on their membranes from proteins that non-specifically bind to the membrane surface. Non-specific binding (NSB) is not addressed. Proteins can non-specifically bind to the cell or EV surface. The results are obtained by comparisons (cells vs exosomes, controls vs cancer cells), which is fine because it means that what is being measured is differentially expressed, so even NSB proteins may be up- and down-regulated. But the proteins identified need to be confirmed. For example, are all the proteins being detected transmembrane proteins that are known to be associated with the membrane?

      As mentioned above, we utilized the most rigorous informatics analyses available (Uniprot and SURFY) to annotate the proteins we find as having a signal sequence and/or TM domain. Data shown in heatmaps are based on significance (p < 0.05) across all four replicates, which supports the idea that any secreted proteins present likely reflect actual biological differences between oncogenic status and/or sample origin (i.e., EV vs cell). We have addressed this point in a previous comment above.

      The term "extracellular vesicles" (EVs) might be more appropriate than "exosomes" to describe the studied preparation.

      As we describe above in response to earlier comments, we have systematically changed from using exosomes to small extracellular vesicles and better defined the isolation procedure that we used in the methods section.

      Reviewer #3 (Public Review):

      The article by Kirkemo et al explores approaches to analyse the surface proteome of cells or cell-derived extracellular vesicles (EVs, called here exosomes, but the more generic term "extracellular vesicles" would be more appropriate because the used procedure leads to co-isolation of vesicles of different origin), using tools to tether proximity-biotinylation enzymes to membranes. The authors determine the best conditions for surface labeling of cells, and demonstrate that tethering the enzymes (APEX or HRP) increases the number of proteins detected by mass-spectrometry. They further use one of the two approaches (where HRP binds to glycans), to analyse the biotinylated proteome of two variants of a prostate cancer cell line, and the corresponding EVs. The approaches are interesting, but their benefit for analysis of cells or EVs is not very strongly supported by the data.

      First, the authors honestly show (Fig. 2 supplemental figures) that only 35% of the proteins identified after biotinylation with their preferred tool actually correspond to annotated surface proteins. This is only slightly better than results obtained with a non-tethered sulfo-NHS approach (30%).

      We thank the reviewer for this comment. The reason we utilize membrane protein enrichment methods is that membrane protein abundance is low compared to cytosolic proteins, and their identification can be overwhelmed by cytosolic contaminants. Nonetheless, despite our best efforts to limit labeling to membrane proteins, cytosolic proteins can carry over. Thus, we utilize informatics methods to identify the proteins that are annotated as membrane associated. The Uniprot GOCC (Gene Ontology Cellular Component) Plasma Membrane database is the most inclusive of membrane proteins, requiring only that they contain a signal sequence, transmembrane domain, GPI anchor, or other membrane-associated motif, and yields a total of 5,746 proteins. This will include organelle membrane proteins; it is known that proteins can traffic from internal organelles to the cell surface, so these can be bona fide cell surface proteins too. To increase the informatics stringency for membrane proteins, we have now applied a new database aggregated from work by the Wollscheid lab, called SURFY (Bausch-Fluck et al., The in silico human surfaceome. PNAS. 2018). This is a machine learning method trained on 735 high-confidence membrane proteins from the Cell Surface Protein Atlas (CSPA); SURFY predicts a total of 2,886 cell surface proteins. When we filter our data using SURFY for proteins, peptides, and label-free quantitation (LFQ) area for the three methods, we find that the difference between NHS-Biotin and WGA-HRP expands considerably (see new Figure 3-Supplemental Figure 1 below). We observe these differences whether the datasets are searched with the GOCC Plasma Membrane database or the entire human Uniprot database. The difference is especially large for the LFQ analysis, which quantitatively scores peptide intensity as opposed to simply counting the number of hits, as in the protein and peptide analyses. Cytosolic carry-over is the major disadvantage of NHS-Biotin; it suppresses signal strength and is reflected in the lower LFQ values (24% for NHS-biotin compared to 40% for WGA-HRP). We have updated the main text and supplemental figure below:
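      The SURFY-filtered LFQ enrichment percentages above (e.g., 24% for NHS-biotin vs 40% for WGA-HRP) amount to a simple fraction: summed LFQ area of surface-annotated proteins divided by total LFQ area. A hypothetical sketch of that calculation, with invented gene names and areas (not the authors' actual pipeline):

```python
# Minimal sketch of the SURFY-based enrichment metric described above.
# Gene names, LFQ areas, and the surface set are invented for illustration.

def surface_lfq_fraction(lfq_areas, surface_proteins):
    """Fraction of total LFQ intensity contributed by SURFY-annotated proteins."""
    total = sum(lfq_areas.values())
    surface = sum(area for prot, area in lfq_areas.items() if prot in surface_proteins)
    return surface / total if total else 0.0

lfq = {"EGFR": 4.0e6, "ITGB1": 2.5e6, "GAPDH": 6.0e6, "ACTB": 3.5e6}
surfy = {"EGFR", "ITGB1"}  # proteins with a predicted transmembrane domain

frac = surface_lfq_fraction(lfq, surfy)  # 6.5e6 / 1.6e7, about 0.41
```

The same fraction can be computed at the protein or peptide level by counting identifications instead of summing areas.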

      "Both WGA-HRP and biocytin hydrazide had similar levels of cell surface enrichment on the peptide and protein level when cross-referenced with the SURFY curated database for extracellular surface proteins with a predicted transmembrane domain (Figure 3 - Figure supplement 1A). Sulfo-NHS-LC-LC-biotin and whole cell lysis returned the lowest percentage of cell surface enrichment, suggesting a larger portion of the total sulfo-NHS-LC-LC-biotin protein identifications were of intracellular origin, despite the use of the cell-impermeable format. These same enrichment levels were seen when the datasets were searched with the curated GOCC-PM database, as well as the Uniprot entire human proteome database (Figure 3 - Figure supplement 1B). Importantly, of the proteins quantified across all four conditions, biocytin hydrazide and WGA-HRP returned higher overall intensity values for SURFY-specified proteins than either sulfo-NHS-LC-LC-biotin or whole cell lysis. Importantly, although biocytin hydrazide shows slightly higher cell surface enrichment compared to WGA-HRP, we were unable to perform the comparative analysis at 500,000 cells--instead requiring 1.5 million--as the protocol yielded too few cells for analysis."

      Figure 3-Figure Supplement 1. Comparison of surface enrichment between replicates for different mass spectrometry methods. (A) The top three methods (NHS-Biotin, Biocytin Hydrazide, and WGA-HRP) were compared for their ability to enrich cell surface proteins on 1.5 M RWPE-1 Control cells by LC-MS/MS after being searched with the Uniprot GOCC Plasma Membrane database. Shown are enrichment levels on the protein, peptide, and average MS1 intensity of top three peptides (LFQ area) levels. (B) The top three methods (NHS-Biotin, Biocytin Hydrazide, and WGA-HRP) were compared for their ability to enrich cell surface proteins on 1.5 M RWPE-1 Control cells by LC-MS/MS after being searched with the entire human Uniprot database. Shown are enrichment levels on the protein, peptide, and average MS1 intensity of top three peptides (LFQ area) levels. Proteins or peptides detected from cell surface annotated proteins (determined by the SURFY database) were divided by the total number of proteins or peptides detected. LFQ areas corresponding to cell surface annotated proteins (SURFY) were divided by the total area sum intensity for each sample. The corresponding percentages for two biological replicates were plotted.

      There are additional advantages to WGA-HRP over NHS-biotin. These include: (i) the labeling time is 2 min versus 30 min, affording higher kinetic resolution as needed, and (ii) NHS-biotin labels lysines, which hinders tryptic cleavage and downstream peptide analysis, whereas WGA-HRP labels tyrosines, eliminating impacts on tryptic patterns. WGA-HRP is slightly below biocytin hydrazide in peptide and protein identifications and somewhat more so by LFQ. However, there are significant advantages over biocytin hydrazide: (i) the sample size for WGA-HRP can be reduced by a factor of 3-5 because of cell loss during the multiple washing steps after periodate oxidation and hydrazide labeling, (ii) the labeling time is dramatically reduced from 3 hr for hydrazide to 2 min for WGA-HRP, and (iii) the HRP enzyme has a large labeling diameter (20-40 nm, but also reported up to 200 nm) and can label non-glycosylated membrane proteins, as opposed to biocytin hydrazide, which labels only glycosylated proteins. The hydrazide method is the current standard for membrane protein enrichment, and we feel that WGA-HRP will be competitive, especially when the cell sample size is limited or requires special handling. In the case of EVs, we were not able to perform hydrazide labeling due to the two-step process and small sample size.

      Indeed the list of identified proteins in figures 4 and 5 include several proteins whose expected subcellular location is internal, not surface exposed, and whose location in EVs should also be inside (non-exhaustively: SDCBP = syntenin, PDCD6IP = Alix, ARRDC1, VPS37B, NUP35 = nucleopore protein)…

      We thank the reviewer for this comment. We have elaborated on this point in a number of response paragraphs above. The proteins that the reviewer points out are annotated as "plasma membrane" in the very inclusive GOCC plasma membrane database. However, this means that they may also spend time in other locations in the cell or reside on organelle membranes. We have done further analysis to remove any intracellular membrane-residing proteins that are included in the GOCC plasma membrane database, including the five proteins mentioned above. We have also further highlighted proteins that appear in the SURFY database, as discussed above and in our response to Reviewer 2's comment. To increase stringency, we have bolded proteins that are found in the more selective SURFY database and italicized secreted proteins. With this new analysis and data presentation, it is clearer which markers are bona fide extracellular-resident membrane proteins. We have updated the Figures and Figure legends as mentioned above, as well as added this statement in the Data Processing and Analysis methods:

      "Additionally, to not miss any key surface markers such as secreted proteins or anchored proteins without a transmembrane domain, we chose to initially avoid searching with a more stringent protein list, such as the curated SURFY database. However, following the analysis, we bolded proteins found in the SURFY database and italicized proteins known to be secreted (Uniprot)."

      The membrane proteins identified as different between the control and Myc-overexpressing cells or their EVs, would have been identified as well by a regular proteomic analysis.

      To directly compare surfaceomes of EVs to cells, we are compelled to use the same proteomic method. For parental cell surfaceomic analysis, a membrane enrichment method is required due to the high levels of cytosolic proteins that swamp out signal from membrane proteins. Although EVs have a higher proportion of membrane to cytosol, whole EV proteomics would still have significant cytosolic contamination.

      Second, the title highlights the benefit of the technique for small-scale samples: this is demonstrated for cells (figures 1-2), but not for EVs: no clear quantitative indication of the amount of material used is provided for EV samples. Furthermore, no comparison with other biotinylation techniques such as sulfo-NHS is provided for EVs/exosomes. Therefore, it is difficult to infer the benefit of this technique applied to the analysis of EVs/exosomes.

      We appreciate the reviewer for this comment. We have updated the methods as mentioned above in our response to the Essential Revisions. In brief, the yield of EVs post-sucrose gradient isolation was 3-5 µg of protein from 16 x 15 cm plates of cells, totaling 240 mL of media. Since we had previously demonstrated that our method was superior to sulfo-NHS for enriching surface proteins on cells, we proceeded to use WGA-HRP for the EV labeling experiments.

      In addition, the WGA-based tethering approach, which is the only one used for the comparative analysis of figures 4 and 5, possibly induces a bias towards identification of proteins with a particular glycan signature: a novelty would possibly have come from a comparison of this approach with the other initially evaluated, the DNA-APEX one, where tethering is induced by lipid moieties, thus should not depend on glycans. The authors may have then identified by LC-MS/MS specific glycan-associated versus non-glycan-associated proteins in the cells or EVs membranes. Also ideally, the authors should have compared the 4 combinations of the 2 enzymes (APEX and HRP) and 2 tethers (lipid-bound DNA and WGA) to identify the bias introduced by each one.

      We thank the reviewer for this comment. We performed an analysis to determine whether there was a bias towards Uniprot-annotated "Glyco" vs "Non-Glyco" surface proteins within the SURFY database identified across the WGA-HRP, APEX2-DNA, APEX2, and HRP labeling methods. We performed this analysis by measuring the total LFQ area detected for each category (glycoprotein vs non-glycoprotein) and dividing that by the total LFQ area found across all proteins detected in the sample. We found similar normalized areas of non-glyco surface proteins between WGA-HRP and APEX2-DNA, suggesting there is not a bias against non-glycosylated proteins in the WGA-HRP sample. There were slightly elevated levels of glycoproteins in the WGA-HRP sample over APEX2-DNA. This lack of bias is not surprising because the free radicals generated from biotin-tyramide can diffuse over tens of nanometers and thus label not only the protein to which the enzyme is tethered but also its neighbors, regardless of glycosylation status. We have added this as Figure 2-Supplement 3 and amended the text in the manuscript below in purple.

      Figure 2 – Figure Supplement 3: Comparison of enrichment of Glyco- vs Non-Glyco-proteins. (A) TIC area of Uniprot-annotated Glycoproteins compared to Non-Glycoproteins in the SURFY database for each labeling method, relative to total TIC area. There was no significant difference in the detection of Non-Glycoproteins between WGA-HRP and APEX2-DNA, and only slightly higher detection of Glycoproteins in the WGA-HRP sample over APEX2-DNA.

      "As the mode of tethering WGA-HRP involves GlcNAc and sialic acid glycans, we wanted to determine whether there was a bias towards Uniprot annotated 'Glycoprotein' vs 'Non-Glycoprotein' surface proteins identified across the WGA-HRP, APEX2-DNA, APEX2, and HRP labeling methods. We looked specifically looked at surface proteins founds in the SURFY database, which is the most restrictive surface database and requires that proteins have a predicted transmembrane domain (Bausch-Fluck et al., The in silico human surfaceome. PNAS. 2018). We performed this analysis by measuring the average MS1 intensity across the top three peptides (area) for SURFY glycoproteins and non-glycoproteins for each sample and dividing that by the total LFQ area found across all GOCC annotated membrane proteins detected in each sample. We found similar normalized areas of non-glyco surface proteins across all samples (Figure 2 - Figure supplement 4). If a bias existed towards glycosylated proteins in WGA-HRP compared to the glycan agnostic APEX2-DNA sample, then we would have seen a larger percentage of non-glycosylated surface proteins identified in APEX2-DNA over WGA-HRP. Due to the large labeling radius of the HRP enzyme, we find it unsurprising that the WGA-HRP method is able to capture non-glycosylated proteins on the surface to the same degree (Rees et al. Selective Proteomic Proximity Labeling Assay SPPLAT. Current Protocols in Protein Science. 2015). There is a slight increase in the area percentage of glycoproteins detected in the WGA-HRP compared to the APEX2-DNA sample but this is likely due to the fact that a greater number of surface proteins in general are detected with WGA-HRP."

      As presented the article is thus an interesting technical description, which does not convince the reader of its benefit to use for further proteomic analyses of EVs or cells. Such info is of course interesting to share with other scientists as a sort of "negative" or "neutral" result. Maybe a novelty of the presented work is the differential proteome analysis of surface enriched EV/cell proteins in control versus myc-expressing cells. Such analyses of EVs from different derivatives of a tumor cell line have been performed before, for instance comparing cells with different K-Ras mutations (Demory-Beckler, Mol Cell proteomics 2013 # 23161513). However, here the authors compare also cells and EVs, and find possibly interesting discrepancies in the upregulated proteins. These results could probably be exploited more extensively. For instance, authors could give clearer info (lists) on the proteins differentially regulated in the different comparisons: in EVs from both cells, in EVs vs cells, in both cells.

      We appreciate the reviewer for this critique and have updated the manuscript accordingly. We have changed the title to "Cell surface tethered promiscuous biotinylators enable small-scale comparative surface proteomic analysis of human extracellular vesicles and cells" to more accurately depict the focus of our manuscript, which, as the reviewer highlighted, is that this technology allows for comparative analysis between the surfaceomes of cells vs EVs. We appreciate the fine work from the Coffey lab on whole-EV analysis of KRAS-transformed cells. They identified a mix of surface and cytosolic proteins that change in EVs from the transformed cells, whereas our data focus specifically on the surfaceome differences between Myc-transformed and non-transformed cells and the corresponding small EVs. We believe this makes important contributions to the field as well.

      To further address the reviewer’s suggestions, we additionally have significantly reorganized the figures to better display the differentially regulated proteins. We have removed the volcano plots and instead included heatmaps with the top 30 (Figure 3 and Figure 4) and top 50 (Figure 5) differentially regulated proteins across cells and EVs. We have also updated the lists of proteins in the supplemental source tables section. See responses to Reviewer 2 above for the updates to Figures 3-5. We have additionally included supplemental figures with lists of differentially upregulated proteins in the EV and Cell samples, which are shown below:

      Figure 3 – Supplement 3: List of proteins comparing enriched targets (>2-fold) in Myc cells versus Control cells. Targets that were found enriched (Myc/Control) in the Control cells (left) and Myc cells (right). The fold-change between Myc cells and Control cells is listed in the column to the right of the gene name.

      Figure 4 – Supplement 1: List of proteins comparing enriched targets (>1.5-fold) in Myc EVs versus Control EVs. Targets that were found enriched (Myc/Control) in the Control EVs (left) and Myc EVs (right). The fold-change between Myc EVs and Control EVs is listed in the column to the right of the gene name.

      Figure 4 – Figure Supplement 2: Venn diagram comparing enriched targets (>2-fold) in Cells and EVs. (A) Targets that were found enriched in the Control EVs (purple) and Control cells (blue) when each is separately compared to Myc EVs and Myc cells, respectively. The 5 overlapping enriched targets in common between Control cells and Control EVs are listed in the center. (B) Targets that were found enriched in the Myc EVs (purple) and Myc cells (blue) when each is separately compared to Control EVs and Control cells, respectively. The 12 overlapping enriched targets in common between Myc cells and Myc EVs are listed in the center.

      Figure 5 - Supplement 1: List of proteins comparing enriched targets (>2-fold) in Control EVs versus Control cells and Myc EVs versus Myc cells. (A) Targets that were found enriched (EV/cell) in the Control samples are listed. The fold-change values between Control EVs and Control cells are listed in the column to the right of the gene name. (B) Targets that were found enriched (EV/cell) in the Myc samples are listed. The fold-change values between Myc EVs and Myc cells are listed in the column to the right of the gene name.

    1. Author Response

      Reviewer #1 (Public Review):

      1) Comment: To determine the effect of diseased monocytes on retinal health, light-injured mouse retinas were injected with monocytes isolated from AMD patients (Figure 1 - figure supplement 1). This resulted in a reduction in photoreceptor number and ERG b-wave amplitude. However, the light-injured control eye was injected with PBS only, so no cells were present. The reasoning for using this control was not provided. The appropriate injection control would include monocytes isolated from non-AMD patients. This control should be performed side-by-side with cells from AMD patients.

      We thank the reviewer for this important comment. The purpose of the current study was to identify the macrophage subtype that may be associated with cell death in aAMD. We have previously reported that macrophages from AMD patients demonstrate a different phenotype compared with those from healthy individuals in the rodent model of laser-induced CNV (Hagbi-Levi S et al, 2016). Per the reviewer's comment, we have performed additional experiments to assess the effect of monocytes from healthy controls in the photic retinal injury model. Results showed that monocytes from AMD and healthy patients exert different impacts on the retina in this rodent model of aAMD. Interestingly, we found that monocytes from healthy controls were more neurotoxic to photoreceptors compared with monocytes from AMD patients. These results are included in the revised manuscript as Figure 1 - figure supplement 1H. A possible explanation for these findings is discussed in lines 179-190 of the revised manuscript. This finding reinforces the idea that the use of monocytes from AMD patients in the experiments is required to obtain a comprehensive understanding of their involvement in the progression of the disease.

      2) Comment: The authors hypothesize, from the experiments presented in Figure 1 - figure supplement 1, that the injected monocytes generated macrophages in the retina, which were responsible for the observed neurotoxicity (Lines 143-145). However, no direct evidence was presented. This idea should be tested in vivo. This could be done by injecting tracer-labeled human AMD-derived monocytes into light-injured mouse retinas. If the authors' hypothesis is true, collected retinas should contain tracer-labeled cells that express macrophage markers. Tracer-labeled M2a macrophage cells should be present since subsequent experiments identify this subclass as being associated with retinal cell death.

      Thank you for this important comment. To address the reviewer's comment, retinal sections from mice exposed to photic retinal injury and injected with DiO tracer-labeled monocytes were stained for two M2a macrophage markers, CD206 (mannose receptor) and VEGF (Kadomoto, S et al, 2022; Jayasingam SD et al, 2019). Interestingly, we found co-localization of the DiO-tracer staining (representing the injected human macrophages) with the CD206 and VEGF markers in monocytes localized in different retinal layers, but not in monocytes remaining in the vitreous cavity. These data indicate that M2a markers are expressed during the polarization of monocytes into the M2a phenotype, which is maintained only upon entry into the retinal tissue. These results were included in Figure 1 - figure supplement 1K-S and are discussed in the revised manuscript in lines 179-182.

      3) Comment: Photoreceptor number and b-wave amplitudes were measured in light-injured retinas injected with one of four macrophage cell types generated from human AMD-derived monocytes. The authors conclude that only injection of M2a cells reduced photoreceptor number and b-wave amplitudes (Figure 1C, E). This may be true, but it is difficult for the reader to make a conclusion (especially in Fig. 1E) due to the large error bars and five different traces overlapping each other. To make these results easier to interpret, graph control cells with only one experimental sample (cell type) at a time.

      Thank you for this comment. Per the reviewer's comment, the graphs were modified in the revised manuscript (Figure 1, panels H-K).

      4) Comment: Most injected macrophages were located in the vitreous. In the case of M2a cells, the authors note that "several of the cells migrated across the retinal layers reaching the subretinal space" (Lines 167,168). One possible explanation for why M0, M1, and M2c macrophages did not induce retinal degeneration is that they did not migrate to the subretinal space and around the optic nerve head. Supplementary figures should be added to demonstrate that this is not the case.

      Thank you for this comment. To address the reviewer's comment, we compared the migration patterns of the different macrophage phenotypes following intravitreal injection in mice exposed to photic injury. Our results indicated that M0, M1 and M2c macrophages, similarly to M2a macrophages, migrated to the subretinal space and around the optic nerve. Thus, the neurotoxic effect of M2a macrophages is not explained by their capacity to infiltrate the retinal tissue. These results were included in Figure 1-figure supplement 2E-H of the revised manuscript. They are supported by our ex-vivo experiments, showing that co-culture of M2a macrophages with retinal explants was associated with increased photoreceptor cell death compared to M1 macrophages. The results are presented and discussed in the revised manuscript in lines 200-203.

      5) Comment: Figure 1 - figure supplement 2: Panel A, B cells were stained with CD206 to demonstrate the presence of M2a macrophages (panel B). The authors conclude that panel A contains M1 and panel B contains M2a cells. The lack of CD206 expression illustrates that panel A cells are not M2a macrophages but do not demonstrate they are M1 macrophages. A control using an M1 cell marker is necessary to show that panel A cells are M1 and M1 cells are not detected in M2a cultures.

      Thank you for this comment. We have validated the phenotype of each macrophage subtype by qPCR (Figure 1, panel A). To further address the reviewer's comment, we performed additional immunocytochemistry for M1 macrophages using an anti-CD80 antibody, which is an established M1 macrophage marker (Bertani FR et al., 2017). The staining confirmed the identity of the M1 macrophages. These new results were included in Figure 1-figure supplement 2A and are discussed in lines 168-170.

      6) Comment: Ex vivo, apoptotic photoreceptor and RPE cells are observed when cultured with M2a macrophages (Figure 2). Do injected M2a cells also induce apoptosis of RPE cells in vivo? This is important to establish that retinal explants are a good model for in vivo experiments.

      Thank you for this comment. To address the reviewer's comment, we assessed RPE apoptosis (using TUNEL, caspase-3 staining and the RPE65 marker) after M2a cell delivery in the in-vivo photic injury model. We could not detect an apoptotic signal in the RPE layer 7 days after photic injury and therefore could not evaluate the effect of M2a macrophages on RPE cells in-vivo (see Author response image 1). One possible explanation is that RPE cells that have undergone apoptosis are rapidly removed from the damaged tissue and, unlike photoreceptors, are no longer detectable. Furthermore, a study that investigated the impact of bright light on RPE cells in-vivo showed that although RPE cells underwent structural and chemical modifications after photic injury, no TUNEL signal was detected because RPE cells die by necrosis rather than apoptosis (Jaadane I et al., 2017). Other studies validated that blue light induces RPE necrosis (Song W et al., 2022; Mohamed A et al., 2022). Taken together, it seems that the ex-vivo retinal explant and the in-vivo photic injury model both simulate the mechanism of retinal cell death. However, the ex-vivo model allows for establishing the direct impact of M2a macrophages on the retina in a non-inflammatory context.

      Author response image 1.

      7) Comment: Reactive oxygen species (ROS) production was measured to determine if M2a cell-mediated neurotoxicity was due to oxidative stress. It is concluded that a ROS increase is partly responsible (Line 218). The data do not support this conclusion. ROS was detected in cultured M2a macrophages. More importantly, however, there was no increase in oxidative damage in vivo. The in vivo and cell culture results contradict each other so no conclusion can be made. The lack of in vivo confirmation weakens the argument that ROS drives M2a neurotoxicity. Text suggesting a role for ROS in neurotoxicity should be appropriately edited (Lines including 218, 244, 401,406,481).

      Thank you for this comment. The manuscript was revised according to the reviewer's suggestion (lines 250-256).

      8) Comment: The authors ask if the photoreceptor cell death is cytokine-mediated. Multiple cytokines were enriched in M2a-conditioned media. Of particular interest were CCR1 ligands MPIF1 and MCP4. The implication is that these two ligands mediate the M2a macrophages to photoreceptor cell death through CCR1. However, there is no attempt to show that either MPIF1 or MCP4 are present in vivo, or are sufficient to induce the retinal response observed. This could be demonstrated by injection of MPIF1 or MCP4. Evidence that either ligand phenocopies M2a macrophage injection would be direct evidence that CCR1 ligands activate the retinal response. Furthermore, co-injection with BX174 should block the effect of these ligands if they work through CCR1.

      Thank you for this comment. The identification of CCR1 ligands expressed by M2a-polarized macrophages directed our decision to study CCR1 in the context of atrophic AMD. We do not claim that these specific CCR1 ligands are sufficient to activate CCR1 and exert retinal injury; the mechanism is likely more complex. Nevertheless, to address the reviewer's comment, we performed the experiments suggested by the reviewer. Mice were exposed to photic injury and immediately injected in one eye with MPIF1, MCP-4, or a combination of both, and in the second eye with PBS as vehicle. Intravitreal cytokine delivery was repeated two days later (following the half-life of these cytokines) and ERGs were recorded two days after the last injection. Injection of cytokines at a concentration of 300 ng per eye did not exacerbate photoreceptor death. The same experiment was then repeated with two higher cytokine concentrations, 1.2 ug/eye and 2 ug/eye, but no differences were observed between the cytokine-treated and vehicle-treated eyes. Based on previous studies reporting the physiological concentrations of different cytokines in eyes of healthy and diseased individuals and on experiments in which different cytokines were injected into rodent eyes (Estevao C et al., 2021; Zeng Y et al., 2019; Roybal CN et al., 2018; Mugisho OO et al., 2018), the cytokine concentrations used in our experiment are within the range in which an effect on the retina would be expected.

      It is likely that a synergistic effect of M2a-secreted proteins in a particular microenvironment is necessary to increase the level of retinal damage (Bartee E et al., 2013). It is also likely that in the photic retinal injury model there is upregulation of endogenous cytokines that may mask the effect of additionally delivered exogenous cytokines. A comprehensive understanding of the complex interactions of these cytokines during retinal degeneration is beyond the scope of the current manuscript, which does not focus on identifying ligand-induced CCR1 activation and its consequences. Additionally, we suggest that, due to cytokine redundancy (Nicola NA, 1994), demonstrating that MPIF1 or MCP-4 can increase photoreceptor death is not required for proving CCR1 receptor involvement.

    1. Author Response

      Reviewer #1 (Public Review):

      In this work George et al. describe RatInABox, a software system for generating surrogate locomotion trajectories and neural data to simulate the effects of a rodent moving about an arena. This work is aimed at researchers that study rodent navigation and its neural machinery.

      Strengths:

      • The software contains several helpful features. It has the ability to import existing movement traces and interpolate data with lower sampling rates. It allows varying the degree to which rodents stay near the walls of the arena. It appears to be able to simulate place cells, grid cells, and some other features.

      • The architecture seems fine and the code is in a language that will be accessible to many labs.

      • There is convincing validation of velocity statistics. There are examples shown of position data, which seem to generally match between data and simulation.

      Weaknesses:

      • There is little analysis of position statistics. I am not sure this is needed, but the software might end up more powerful and the paper higher impact if some position analysis was done. Based on the traces shown, it seems possible that some additional parameters might be needed to simulate position/occupancy traces whose statistics match the data.

      Thank you for this suggestion. We have added a new panel to figure 2 showing a histogram of the time the agent spends at positions of increasing distance from the nearest wall. As you can see, RatInABox is a good fit to the real locomotion data: positions very near the wall are under-explored (in the real data this is probably because whiskers and physical body size block positions very close to the wall) and positions just away from but close to the wall are slightly over explored (an effect known as thigmotaxis, already discussed in the manuscript).

      As you correctly suspected, fitting this warranted a new parameter controlling the strength of the wall repulsion, which we call "wall_repel_strength". The motion model has not changed mathematically: all we did was take a parameter that was previously fixed at 1 and unavailable to the user, and expose it as a user-settable variable (see methods section 6.1.3 for the maths). The curves fit best when wall_repel_strength ≈ 2. The methods and parameters table have been updated accordingly. See Fig. 2e.
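      To illustrate the role this parameter plays, here is a minimal 1D sketch of the idea in plain Python. This is not the actual RatInABox implementation (which is 2D and described in methods 6.1.3): the box size, repulsion range, noise scale and the linear form of the repulsion are all illustrative assumptions.

```python
import math
import random

def simulate(wall_repel_strength, steps=20000, dt=0.05, seed=0):
    """1D agent in a box [0, L] with Ornstein-Uhlenbeck-style velocity noise
    plus a repulsive drift, scaled by wall_repel_strength, that is active
    within d_wall of either wall. Returns the list of visited positions."""
    rng = random.Random(seed)
    L = 1.0                 # box length (m)
    d_wall = 0.1            # range of the wall repulsion (m)
    tau, sigma = 0.7, 0.1   # velocity coherence time (s) and noise scale (m/s)
    x, v = 0.5, 0.0
    positions = []
    for _ in range(steps):
        # repulsive acceleration, growing linearly as the wall is approached
        repel = 0.0
        if x < d_wall:
            repel = wall_repel_strength * (d_wall - x) / d_wall
        elif x > L - d_wall:
            repel = -wall_repel_strength * (x - (L - d_wall)) / d_wall
        # mean-reverting velocity update plus the wall drift
        v += (-v / tau + repel) * dt + sigma * math.sqrt(2 * dt / tau) * rng.gauss(0, 1)
        x = min(max(x + v * dt, 0.0), L)  # hard walls as a backstop
        positions.append(x)
    return positions

def near_wall_frac(positions):
    """Fraction of time spent within 2 cm of either wall."""
    return sum(1 for x in positions if x < 0.02 or x > 0.98) / len(positions)

repelled = simulate(wall_repel_strength=2.0)
free = simulate(wall_repel_strength=0.0)
```

      With the repulsion switched on, the agent spends far less time pressed against the walls, qualitatively reproducing the under-exploration of near-wall positions described above.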

      • The overall impact of this work is somewhat limited. It is not completely clear how many labs might use this, or have a need for it. The introduction could have provided more specificity about examples of past work that would have been better done with this tool.

      At the point of publication we, like yourself, did not know to what extent there would be a market for this toolkit, but we were pleased to find that there was. In its initial 11 months RatInABox has accumulated a growing, global user base, over 120 stars on GitHub and north of 17,000 downloads through PyPI. We have collected a list of testimonials[5] from users of the package vouching for its utility and ease of use, four of which are abridged below. These testimonials come from a diverse group of 9 researchers spanning 6 countries across 4 continents and varying career stages, from pre-doctoral researchers with little computational exposure to tenured PIs. Finally, not only does the community use RatInABox, they are also building it: at the time of writing RatInABox has logged 20 GitHub "Issues" and 28 "pull requests" from external users (i.e. those who aren't authors on this manuscript), ranging from small discussions and bug fixes to significant new features, demos and wrappers.

      Abridged testimonials:

      ● “As a medical graduate from Pakistan with little computational background…I found RatInABox to be a great learning and teaching tool, particularly for those who are underprivileged and new to computational neuroscience.” - Muhammad Kaleem, King Edward Medical University, Pakistan

      ● “RatInABox has been critical to the progress of my postdoctoral work. I believe it has the strong potential to become a cornerstone tool for realistic behavioural and neuronal modelling” - Dr. Colleen Gillon, Imperial College London, UK

      ● “As a student studying mathematics at the University of Ghana, I would recommend RatInABox to anyone looking to learn or teach concepts in computational neuroscience.” - Kojo Nketia, University of Ghana, Ghana

      ● “RatInABox has established a new foundation and common space for advances in cognitive mapping research.” - Dr. Quinn Lee, McGill, Canada

      The introduction continues to include the following sentence highlighting examples of past work which relied on generating artificial movement and/or neural data and which, by implication, could have been done better (or at least accelerated and standardised) using our toolbox.

      “Indeed, many past[13, 14, 15] and recent[16, 17, 18, 19, 6, 20, 21] models have relied on artificially generated movement trajectories and neural data.”

      • Presentation: Some discussion of case studies in Introduction might address the above point on impact. It would be useful to have more discussion of how general the software is, and why the current feature set was chosen. For example, how well does RatInABox deal with environments of arbitrary shape? T-mazes? It might help illustrate the tool's generality to move some of the examples in supplementary figure to main text - or just summarize them in a main text figure/panel.

      Thank you for this question. Since the initial submission of this manuscript RatInABox has been upgraded and environments have become substantially more “general”. Environments can now be of arbitrary shape (including T-mazes), boundaries can be curved, they can contain holes and can also contain objects (0-dimensional points which act as visual cues). A few examples are showcased in the updated figure 1 panel e.

      To further illustrate the tool's generality beyond the structure of the environment, we continue to summarise the reinforcement learning example (Fig. 3e) and the neural decoding example in section 3.1. In addition, we have added three new panels to figure 3 highlighting new features which, we hope you will agree, make RatInABox significantly more powerful and general and satisfy your suggestion of clarifying utility and generality in the manuscript directly.

      On the topic of generality, we wrote the manuscript in such a way as to demonstrate the rich variety of ways RatInABox can be used without providing an exhaustive list of potential applications. For example, RatInABox can be used to study neural decoding and reinforcement learning, but not because it was purpose-built with these use cases in mind: rather, because it contains a set of core tools designed to support spatial navigation and neural representations in general. For this reason we would rather keep the demonstrative examples as supplements, and implement your suggestion of further raising attention to the large array of tutorials and demos provided on the GitHub repository by modifying the final paragraph of section 3.1 to read:

      “Additional tutorials, not described here but available online, demonstrate how RatInABox can be used to model splitter cells, conjunctive grid cells, biologically plausible path integration, successor features, deep actor-critic RL, whisker cells and more. Despite including these examples we stress that they are not exhaustive. RatInABox provides the framework and primitive classes/functions from which highly advanced simulations such as these can be built.”

      Reviewer #3 (Public Review):

      George et al. present a convincing new Python toolbox that allows researchers to generate synthetic behavior and neural data specifically focusing on hippocampal functional cell types (place cells, grid cells, boundary vector cells, head direction cells). This is highly useful for theory-driven research where synthetic benchmarks should be used. Beyond just navigation, it can be highly useful for novel tool development that requires jointly modeling behavior and neural data. The code is well organized and written and it was easy for us to test.

      We have a few constructive points that they might want to consider.

      • Right now the code only supports X,Y movements, but Z is also critical and opens new questions in 3D coding of space (such as grid cells in bats, etc). Many animals effectively navigate in 2D, as a whole, but they certainly make a large number of 3D head movements, and modeling this will become increasingly important and the authors should consider how to support this.

      Agents now have a dedicated head direction variable (before head direction was just assumed to be the normalised velocity vector). By default this just smoothes and normalises the velocity but, in theory, could be accessed and used to model more complex head direction dynamics. This is described in the updated methods section.

      In general, we try to tread a careful line. For example, we embrace certain aspects of physical and biological realism (e.g. modelling environments as continuous, or fitting motion to real behaviour) and avoid others (such as the biophysics/biochemistry of individual neurons, or the mechanical complexities of joint/muscle modelling). It is hard to decide where to draw the line, but we have a few guiding principles:

      1. RatInABox is most well suited for normative modelling and neuroAI-style probing questions at the level of behaviour and representations. We consciously avoid unnecessary complexities that do not directly contribute to these domains.

      2. Compute: To best accelerate research we think the package should remain fast and lightweight. Certain features are ignored if computational cost outweighs their benefit.

      3. Users: If, and as, users require complexities e.g. 3D head movements, we will consider adding them to the code base.

      For now we believe proper 3D motion is out of scope for RatInABox. Calculating motion near walls is already surprisingly complex, and doing this in 3D would be challenging. Furthermore, all cell classes would need to be rewritten too. This would be a large undertaking, probably requiring rewriting the package from scratch or making a new package, RatInABox3D (BatInABox?), altogether, which we do not intend to undertake right now. If users really needed 3D trajectory data, one option is to simulate a 2D Environment (X,Y) and a 1D Environment (Z) independently. With this method (X,Y) and (Z) motion would be entirely independent, which is of course unrealistic but, depending on the use case, may well be sufficient.

      Alternatively, as you said that many agents effectively navigate in 2D but show complex 3D head and other body movements, RatInABox could interface with and feed data downstream to other software (for example Mujoco[11]) which specialises in joint/muscle modelling. This would be a very legitimate use case for RatInABox.

      We’ve flagged all of these assumptions and limitations in a new body of text added to the discussion:

      “Our package is not the first to model neural data[37, 38, 39] or spatial behaviour[40, 41], yet it distinguishes itself by integrating these two aspects within a unified, lightweight framework. The modelling approach employed by RatInABox involves certain assumptions:

      1. It does not engage in the detailed exploration of biophysical[37, 39] or biochemical[38] aspects of neural modelling, nor does it delve into the mechanical intricacies of joint and muscle modelling[40, 41]. While these elements are crucial in specific scenarios, they demand substantial computational resources and become less pertinent in studies focused on higher-level questions about behaviour and neural representations.

      2. A focus of our package is modelling experimental paradigms commonly used to study spatially modulated neural activity and behaviour in rodents. Consequently, environments are currently restricted to being two-dimensional and planar, precluding the exploration of three-dimensional settings. However, in principle, these limitations can be relaxed in the future.

      3. RatInABox avoids the oversimplifications commonly found in discrete modelling, predominant in reinforcement learning[22, 23], which we believe impede its relevance to neuroscience.

      4. Currently, inputs from different sensory modalities, such as vision or olfaction, are not explicitly considered. Instead, sensory input is represented implicitly through efficient allocentric or egocentric representations. If necessary, one could use the RatInABox API in conjunction with a third-party computer graphics engine to circumvent this limitation.

      5. Finally, focus has been given to generating synthetic data from steady-state systems. Hence, by default, agents and neurons do not explicitly include learning, plasticity or adaptation. Nevertheless we have shown that a minimal set of features, such as parameterised function-approximator neurons and policy control, enables a variety of experience-driven changes in behaviour and cell responses[42, 43] to be modelled within the framework.

      • What about other environments that are not "Boxes" as in the name - can the environment only be a Box, what about a circular environment? Or Bat flight? This also has implications for the velocity of the agent, etc. What are the parameters for the motion model to simulate a bat, which likely has a higher velocity than a rat?

      Thank you for this question. Since the initial submission of this manuscript RatInABox has been upgraded and environments have become substantially more “general”. Environments can now be of arbitrary shape (including circular), boundaries can be curved, they can contain holes and can also contain objects (0-dimensional points which act as visual cues). A few examples are showcased in the updated figure 1 panel e.

      Whilst we don’t know the exact parameters for bat flight users could fairly straightforwardly figure these out themselves and set them using the motion parameters as shown in the table below. We would guess that bats have a higher average speed (speed_mean) and a longer decoherence time due to increased inertia (speed_coherence_time), so the following code might roughly simulate a bat flying around in a 10 x 10 m environment. Author response image 1 shows all Agent parameters which can be set to vary the random motion model.
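      To make the roles of these two parameters concrete, here is a plain-Python stand-in for the speed process (not the RatInABox package itself, whose motion model is richer); the parameter names match the Agent parameters discussed above, but the "bat-like" values and the noise scale are guesses, not measured statistics.

```python
import math
import random

def ou_speed_trace(speed_mean, speed_coherence_time, dt=0.1, steps=50000, seed=1):
    """Mean-reverting (Ornstein-Uhlenbeck-style) speed process fluctuating
    around speed_mean with time constant speed_coherence_time."""
    rng = random.Random(seed)
    sigma = 0.3 * speed_mean   # assumed fluctuation scale
    v = speed_mean
    trace = []
    for _ in range(steps):
        v += (speed_mean - v) * dt / speed_coherence_time \
             + sigma * math.sqrt(2 * dt / speed_coherence_time) * rng.gauss(0, 1)
        trace.append(max(v, 0.0))  # speeds are non-negative
    return trace

rat_like = ou_speed_trace(speed_mean=0.08, speed_coherence_time=0.7)  # default-ish rat
bat_like = ou_speed_trace(speed_mean=3.0, speed_coherence_time=3.0)   # guessed bat
```

      In the real package these two values would simply be passed to the Agent via its params dictionary; everything else here is a simplification for illustration.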

      Author response image 1.

      • Semi-related, the name suggests limitations: why Rat? Why not Agent? (But its a personal choice)

      We came up with the name “RatInABox” when we developed this software to study hippocampal representations of an artificial rat moving around a closed 2D world (a box). We also fitted the random motion model to open-field exploration data from rats. You’re right that it is not limited to rodents but for better or for worse it’s probably too late for a rebrand!

      • A future extension (or now) could be the ability to interface with common trajectory estimation tools; for example, taking in the (X, Y, (Z), time) outputs of animal pose estimation tools (like DeepLabCut or such) would also allow experimentalists to generate neural synthetic data from other sources of real-behavior.

      This is actually already possible via our "Agent.import_trajectory()" method. Users can pass an array of timestamps and an array of positions into the Agent class, which will be loaded and smoothly interpolated along, as shown in Fig. 3a and demonstrated in two new papers[9,10] which used RatInABox by loading in behavioural trajectories.
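      The core idea is simple resampling: coarse (time, position) samples from a pose-estimation tool are interpolated onto the finer simulation timestep. A minimal plain-Python sketch of that idea (a stand-in, not the package's actual implementation, which interpolates more smoothly):

```python
def interpolate_trajectory(times, positions, dt):
    """Linearly resample coarse (times, positions) samples at interval dt."""
    out_times, out_positions = [], []
    i = 0
    t = times[0]
    while t <= times[-1]:
        while times[i + 1] < t:
            i += 1  # advance to the segment containing t
        w = (t - times[i]) / (times[i + 1] - times[i])
        p = tuple(a + w * (b - a) for a, b in zip(positions[i], positions[i + 1]))
        out_times.append(t)
        out_positions.append(p)
        t += dt
    return out_times, out_positions

# 1 Hz tracking data upsampled to a 0.25 s simulation timestep
t_fine, p_fine = interpolate_trajectory(
    times=[0.0, 1.0, 2.0],
    positions=[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],
    dt=0.25,
)
```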

      • What if a place cell is not encoding place but is influenced by reward or encodes a more abstract concept? Should a PlaceCell class inherit from an AbstractPlaceCell class, which could be used for encoding more conceptual spaces? How could their tool support this?

      In fact, PlaceCells already inherit from a more abstract class (Neurons) which contains basic infrastructure for initialisation, saving data, plotting data, etc. Our preferred solution is for users to write their own cell classes which inherit from Neurons (or PlaceCells if they wish). Users then need only write a new get_state() method, which can be as simple or as complicated as they like. Here are two examples we have already made, which can be found on the GitHub:

      Author response image 2.

      Phase precession: PhasePrecessingPlaceCells(PlaceCells)[12] inherit from PlaceCells and modulate their firing rate by multiplying it by a phase-dependent factor, causing them to "phase precess".

      Splitter cells: Perhaps users wish to model PlaceCells that are modulated by the recent history of the Agent, for example which arm of a figure-8 maze it just came down. This is observed in hippocampal "splitter cells". In this demo[1], SplitterCells(PlaceCells) inherit from PlaceCells and modulate their firing rate according to which arm was last travelled along.
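      Schematically, the subclassing pattern looks like the following. These are simplified stand-ins, not the real RatInABox classes (which take an Agent object and params dictionaries; here the agent is just a dict): the point is that a new cell type only has to override get_state().

```python
import math

class Neurons:
    """Stand-in for the abstract base class: holds a reference to the agent
    (and, in the real package, the init/saving/plotting infrastructure)."""
    def __init__(self, agent):
        self.agent = agent

class PlaceCells(Neurons):
    """Gaussian spatial tuning around a centre."""
    def __init__(self, agent, centre=(0.5, 0.5), width=0.2):
        super().__init__(agent)
        self.centre, self.width = centre, width

    def get_state(self):
        dx = self.agent["pos"][0] - self.centre[0]
        dy = self.agent["pos"][1] - self.centre[1]
        return math.exp(-(dx * dx + dy * dy) / (2 * self.width ** 2))

class SplitterCells(PlaceCells):
    """History-modulated place cells: rate depends on the last arm visited."""
    def get_state(self):
        rate = super().get_state()
        # halve the rate if the agent last came down the "left" arm
        return rate * (0.5 if self.agent.get("last_arm") == "left" else 1.0)

agent = {"pos": (0.5, 0.5), "last_arm": "left"}
pc_rate = PlaceCells(agent).get_state()
sc_rate = SplitterCells(agent).get_state()
```

      Because all shared machinery lives in the base class, a custom cell type is only ever as complicated as its get_state() method.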

      • This a bit odd in the Discussion: "If there is a small contribution you would like to make, please open a pull request. If there is a larger contribution you are considering, please contact the corresponding author3" This should be left to the repo contribution guide, which ideally shows people how to contribute and your expectations (code formatting guide, how to use git, etc). Also this can be very off-putting to new contributors: what is small? What is big? we suggest use more inclusive language.

      We’ve removed this line and left it to the GitHub repository to describe how contributions can be made.

      • Could you expand on the run time for BoundaryVectorCells, namely, for how long of an exploration period? We found it was on the order of 1 min to simulate 30 min of exploration (which is of course fast, but mentioning relative times would be useful).

      Absolutely. How long it takes to simulate BoundaryVectorCells will depend on the discretisation timestep and how many neurons you simulate. Assuming you used the default values (dt = 0.1, n = 10) then the motion model should dominate compute time. This is evident from our analysis in Figure 3f which shows that the update time for n = 100 BVCs is on par with the update time for the random motion model, therefore for only n = 10 BVCs, the motion model should dominate compute time.

      So how long should this take? Fig. 3f shows the motion model takes ~10⁻³ s per update. One hour of simulated time corresponds to 3600/dt = 36,000 updates, which would therefore take about 36,000 × 10⁻³ s = 36 seconds. So your estimate of 1 minute is in the right ballpark and consistent with the data we show in the paper.
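      Spelled out as a quick sanity check (using the default timestep and the per-update cost read off Fig. 3f):

```python
dt = 0.1              # s, default simulation timestep
update_cost = 1e-3    # s of wall-clock time per motion-model update (Fig. 3f)
sim_seconds = 3600    # one hour of simulated time

updates = sim_seconds / dt            # 36,000 updates
wall_clock = updates * update_cost    # ~36 s of compute
```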

      Interestingly this corroborates the results in a new inset panel where we calculated the total time for cell and motion model updates for a PlaceCell population of increasing size (from n = 10 to 1,000,000 cells). It shows that the motion model dominates compute time up to approximately n = 1000 PlaceCells (for BoundaryVectorCells it’s probably closer to n = 100) beyond which cell updates dominate and the time scales linearly.

      These are useful and non-trivial insights as they tell us that the RatInABox neuron models are quite efficient relative to the RatInABox random motion model (something we hope to optimise further down the line). We’ve added the following sentence to the results:

      “Our testing (Fig. 3f, inset) reveals that the combined time for updating the motion model and a population of PlaceCells scales sublinearly O(1) for small populations (n < 1000), where updating the random motion model dominates compute time, and linearly for large populations (n > 1000). PlaceCells, BoundaryVectorCells and the Agent motion model update times will additionally be affected by the number of walls/barriers in the Environment. 1D simulations are significantly quicker than 2D simulations due to the reduced computational load of the 1D geometry.”

      And this sentence to section 2:

      “RatInABox is fundamentally continuous in space and time. Position and velocity are never discretised but are instead stored as continuous values and used to determine cell activity online, as exploration occurs. This differs from other models which are either discrete (e.g. “gridworld” or Markov decision processes) or approximate continuous rate maps using a cached list of rates precalculated on a discretised grid of locations. Modelling time and space continuously more accurately reflects real-world physics, making simulations smooth and amenable to fast or dynamic neural processes which are not well accommodated by discretised motion simulators. Despite this, RatInABox is still fast; to simulate 100 PlaceCell for 10 minutes of random 2D motion (dt = 0.1 s) it takes about 2 seconds on a consumer grade CPU laptop (or 7 seconds for BoundaryVectorCells).”

      Whilst this would be very interesting it would likely represent quite a significant edit, requiring rewriting of almost all the geometry-handling code. We’re happy to consider changes like these according to (i) how simple they will be to implement, (ii) how disruptive they will be to the existing API, (iii) how many users would benefit from the change. If many users of the package request this we will consider ways to support it.

      • In general, the set of default parameters might want to be included in the main text (vs in the supplement).

      We also considered this but decided to leave them in the methods for now. The exact values of these parameters are subject to change in future versions of the software. Also, we prefer the main text to provide a low-detail, high-level description of the software, with the methods providing a place for keen readers to dive into the mathematical and coding specifics.

      • It still says you can only simulate 4 velocity or head directions, which might be limiting.

      Thanks for catching this. This constraint has been relaxed. Users can now simulate an arbitrary number of head direction cells with arbitrary tuning directions and tuning widths. The methods have been adjusted to reflect this (see section 6.3.4).

      • The code license should be mentioned in the Methods.

      We have added the following section to the methods:

      6.6 License: RatInABox is currently distributed under an MIT License, meaning users are permitted to use, copy, modify, merge, publish, distribute, sublicense and sell copies of the software.

    1. Author Response:

      Reviewer #1:

      The largest concern with the manuscript is its use of resting-state recordings in Parkinson's Disease patients on and off levodopa, which the authors interpret as indicative of changes in dopamine levels in the brain but not indicative of altered movement and other neural functions. For example, when patients are off medication, their UPDRS scores are elevated, indicating they likely have spontaneous movements or motor abnormalities that will likely produce changed activations in MEG and LFP during "rest". Authors must address whether it is possible to study a true "resting state" in unmedicated patients with severe PD. At minimum this concern must be discussed in the manuscript.

      We agree that Parkinson’s disease can lead to unwanted movements such as tremor as well as hyperkinesias. This is of course a deviation from the resting state of healthy subjects. However, such movements are part of the disease and occur unwillingly. The main tremor in Parkinson’s disease is a rest tremor which, as the name suggests, occurs while the patient is not doing anything. Such movements can therefore arguably be considered part of the resting state of Parkinson’s disease. Resting-state activity with and without medication is thus still representative of changes in brain activity in Parkinson’s patients and indicative of alterations due to medication.

      To further investigate the effect of movement in our patients, we subdivided the UPDRS part 3 score into tremor and non-tremor subscores. For the tremor subscore we took the mean of items 15 and 17 of the UPDRS, whereas for the non-tremor subscore items 1, 2, 3, 9, 10, 12, 13, and 14 were averaged. Following Spiegel et al., 2007, we classified patients as akinetic-rigid (non-tremor score at least twice the tremor score), tremor-dominant (tremor score at least twice the non-tremor score), and mixed type (for the remaining scores). Of the 17 patients, 1 was tremor-dominant and 1 was classified as mixed type (his/her non-tremor score was greater than the tremor score). None of our patients exhibited hyperkinesias during the recording. To exclude the possibility that our results are driven by tremor-related movement, we re-ran the HMM without the tremor-dominant and the mixed-type patient (see Figure R1 in this response letter).

      ON medication, the results for all HMM states remained the same. OFF medication, the results for the Ctx-Ctx and STN-STN states remained the same as well. The Ctx-STN state OFF medication was split into two states: sensorimotor-STN connectivity was captured in one state and all other types of Ctx-STN connections were captured in another state (see Figure R1 of this response letter). The important point is that the biological conclusions stand across these solutions: regardless of whether the two subjects are included, a stable covariance matrix entailing sensorimotor-STN connectivity was determined, which is the main finding for the Ctx-STN state OFF medication.

      We therefore discuss this issue now within the limitation section (page 20):

      “Both motor impairment and motor improvement can cause movement during the resting state in PD. While such movements deviate from the resting state of healthy subjects, they are part of the disease and occur involuntarily. Therefore, such movements can arguably be considered part of the resting state of Parkinson’s disease. None of the patients in our cohort experienced hyperkinesia during the recording. All patients except for two were of the akinetic-rigid subtype. We verified that tremor movement is not driving our results: recalculating the HMM states without these 2 subjects slightly changed some particular aspects of the HMM solution but did not materially affect the conclusions.”

      Figure R1: States obtained after removing one tremor-dominant and one mixed-type patient from the analysis. Panel C shows the split OFF-medication cortico-STN state. Most of the cortico-STN connectivity is captured by the state shown in the top row (Figure R1C, OFF). Only the motor-STN connectivity in the alpha and beta band (along with a medial frontal-STN connection in the alpha band) is captured separately by the state labeled “OFF SPLIT” (Figure R1C, OFF SPLIT).

      This reviewer was unclear on why increased "communication" in the medial OFC in delta and theta was interpreted as a pathological state indicating deteriorated frontal executive function. Given that the authors provide no evidence of poor executive function in the patients studied, the authors must at least provide evidence from other studies linking this feature with impaired executive function.

      If we understand the comment correctly it refers to the statement in the abstract “Dopaminergic medication led to communication within the medial and orbitofrontal cortex in the delta/theta frequency range. This is in line with deteriorated frontal executive functioning as a side effect of dopamine treatment in Parkinson’s disease”

      This statement is based on the dopamine overdose hypothesis reported in the Parkinson’s disease (PD) literature (Cools 2001; Kelly et al. 2009; MacDonald and Monchi 2011; Vaillancourt et al. 2013). We have elaborated upon the dopamine overdose hypothesis in the discussion on page 16. In short, dopaminergic neurons are primarily lost from the substantia nigra in PD, which causes a stronger dopamine depletion in the dorsal striatal circuitry than within the ventral striatal circuits (Kelly et al. 2009; MacDonald and Monchi 2011). Thus, dopaminergic medication to treat the PD motor symptoms leads to increased dopamine levels in the ventral striatal circuits, including frontal cortical circuits, which can potentially explain the cognitive deficits observed in PD (Shohamy et al. 2005; George et al. 2013). We adjusted the abstract to read:

      “Dopaminergic medication led to coherence within the medial and orbitofrontal cortex in the delta/theta frequency range. This is in line with known side effects of dopamine treatment such as deteriorated executive functions in Parkinson’s disease.”

      In this article, authors repeatedly state their method allows them to delineate between pathological and physiological connectivity, but they don't explain how dynamical systems and discrete-state stochasticity support that goal.

      To recapitulate, the HMM divides a continuous time series into discrete states. Each state is a time-delay embedded covariance matrix reflecting the underlying connectivity between brain regions as well as the specific temporal dynamics in the data when such state is active. See Packard et al., (1980) for details about how a time-delay embedding characterises a linear dynamical system.

      Please note that the HMM was used as a data-driven, descriptive approach without explicitly assuming any a priori relationship with pathological or physiological states. The relation between biology and the HMM states thus purely emerged from the data, i.e. it is empirical. What we claim in this work is simply that the features captured by the HMM bear some relation to the physiology, even though the estimation of the HMM was completely unsupervised (i.e. blind to the studied conditions). We have added this point to the limitations of the study on page 19 and added the following to the introduction to guide the reader more intuitively (page 4):

      “To allow the system to dynamically evolve, we use time delay embedding. Theoretically, delay embedding can reveal the state space of the underlying dynamical system (Packard et al., 1980). Thus, by delay-embedding PD time series OFF and ON medication we uncover the differential effects of a neurotransmitter such as dopamine on underlying whole brain connectivity.”
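As a rough illustration of the delay embedding referred to in the quoted passage (a sketch of the general technique, not the authors' actual pipeline; the lag count and array shapes are arbitrary):

```python
import numpy as np

def delay_embed(x: np.ndarray, n_lags: int) -> np.ndarray:
    """Stack n_lags lagged copies of each channel, so that the covariance
    of the embedded series captures temporal (hence spectral) structure,
    as exploited by a time-delay embedded HMM (cf. Packard et al., 1980).
    x: (n_samples, n_channels) -> (n_samples - n_lags + 1, n_channels * n_lags)."""
    T = x.shape[0]
    lagged = [x[lag:T - n_lags + 1 + lag] for lag in range(n_lags)]
    return np.concatenate(lagged, axis=1)
```

Fitting a Gaussian (or HMM-state) covariance to the embedded series then characterises each state's autocovariance structure, which is what makes the states frequency-resolved.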

      Reviewer #2:

      Sharma et al. investigated the effect of dopaminergic medication on brain networks in patients with Parkinson's disease combining local field potential recordings from the subthalamic nucleus and magnetencephalography during rest. They aim to characterize both physiological and pathological spectral connectivity.

      They identified three networks, or brain states, that are differentially affected by medication. Under medication, the first state (termed hyperdopaminergic state) is characterized by increased connectivity of frontal areas, supposedly responsible for deteriorated frontal executive function as a side effect of medical treatment. In the second state (communication state), dopaminergic treatment largely disrupts cortico-STN connectivity, leaving only selected pathways communicating. This is in line with current models that propose that alleviation of motor symptoms relates to the disruption of pathological pathways. The local state, characterized by STN-STN oscillatory activities, is less affected by dopaminergic treatment.

      The authors utilize sophisticated methods with the potential to uncover the dynamics of activities within different brain network, which opens the avenue to investigate how the brain switches between different states, and how these states are characterized in terms of spectral, local, and temporal properties. The conclusions of this paper are mostly well supported by data, but some aspects, mainly about the presentation of the results, remain:

      We would like to thank the reviewer for his succinct and clear understanding of our work.

      1) The presentation of the results is suboptimal and needs improvement to increase readers' comprehension. At some points this section seems rather unstructured, some results are presented multiple times, and some passages already include points rather suitable for the discussion, which adds too much information for the results section.

      We have removed repetitions in the results sections and removed the rather lengthy introductory parts of each subsection. Moreover, we have now moved all parts, which were already an interpretation of our findings to the discussion.

      2) It is intriguing that the hyperdopaminergic state is not only identified under medication but also in the off-state. This is intriguing, especially with the results on the temporal properties of states showing that the time of the hyperdopaminergic state is unaffected by medication. When such a state can be identified even in the absence of levodopa, is it really optimal to call it "hyperdopaminergic"? Do the results not rather suggest that the identified network is active both off and on medication, while during the latter state its' activities are modulated in a way that could relate to side effects?

      The reviewer’s interpretations of the results pertaining to the hyperdopaminergic state are correct. The states had been named post hoc, as explained in the results section. The hyperdopaminergic state was named after the overdosing effects of dopamine it captures, which are of course only visible ON medication. OFF medication, this state also exists, but without exhibiting the effects of excess dopamine. To avoid confusion or misinterpretation of the findings, and also following the relevant comment by reviewer 1, we renamed all states to be more descriptive:

      Hyperdopaminergic > Cortico-cortical state

      Communication > Cortico-STN state

      Local > STN-STN state.

      3) Some conclusions need to be improved/more elaborated. For example, the coherence of bilateral STN-STN did not change between medication off and on the state. Yet it is argued that a) "Since synchrony limits information transfer (Cruz et al. 2009; Cagnan, Duff, and Brown 2015; Holt et al. 2019) , local oscillations are a potential mechanism to prevent excessive communication with the cortex" (line 436) and b) "Another possibility is that a loss of cortical afferents causes local basal ganglia oscillations to become more pronounced" (line 438). Can these conclusions really be drawn if the local oscillations did not change in the first place?

      We apologize for the unclear description. Our conclusion was based on the following results:

      a) We state that STN-STN connectivity as measured by the magnitude of STN-STN coherence does not change OFF vs ON medication in the Cortico-STN state. This result is obtained using inter-medication analysis.

      b) But ON medication, STN-STN coherence in the Cortico-STN state was significantly different from mean coherence within the ON condition. These results are obtained using intra-medication analysis.

      Based on this, we conclude that in the Cortico-STN state, although the magnitude of STN-STN coherence was unchanged OFF vs ON medication, it was significantly different from the mean coherence within the ON medication condition. The emergence of synchronous STN-STN activity may limit information exchange between STN and cortex ON medication.

      An alternative explanation for these findings might be a mechanism preventing connectivity between cortex and the STN ON medication. This missing interaction between STN and cortex might cause STN-STN oscillations to increase compared to the mean coherence within the ON state. Unfortunately, we cannot test such causal influences with our analysis.

      We have added the following discussion to the manuscript on page 17 in order to improve the exposition:

      “Bilateral STN–STN coherence in the alpha and beta band did not change in the cortico-STN state ON versus OFF medication (InterMed analysis). However, STN-STN coherence was significantly higher than the mean level ON medication (IntraMed analysis). Since synchrony limits information transfer (Cruz et al. 2009; Cagnan, Duff, and Brown 2015; Holt et al. 2019), the high coherence within the STN ON medication could prevent communication with the cortex. A different explanation would be that a loss of cortical afferents leads to increased local STN coherence. The causal nature of the cortico-basal ganglia interaction is an endeavour for future research.”

      Reviewer #3:

      In PD, pathological neuronal activity along the cortico-basal ganglia network notably consists in the emergence of abnormal synchronized oscillatory activity. Nevertheless, synchronous oscillatory activity is not necessarily pathological and also serve crucial cognitive functions in the brain. Moreover, the effect of dopaminergic medication on oscillatory network connectivity occurring in PD are still poorly understood. To clarify these issues, Sharma and colleagues simultaneously-recorded MEG-STN LFP signals in PD patients and characterized the effect of dopamine (ON and OFF dopaminergic medication) on oscillatory whole-brain networks (including the STN) in a time-resolved manner. Here, they identified three physiologically interpretable spectral connectivity patterns and found that cortico-cortical, cortico-STN, and STN-STN networks were differentially modulated by dopaminergic medication.

      Strengths:

      1) Both the methodological and experimental approaches used are thoughtful and rigorous.

      a) The use of an innovative data-driven machine learning approach (by employing a hidden Markov model), rather than hand-crafted analyses, to identify physiologically interpretable spectral connectivity patterns (i.e., distinct networks/states) is undeniably an added value. In doing so, the results are not biased by the human expertise and subjectivity, which make them even more solid.

      b) So far, the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN reported in PD was evaluated/assessed to specific cortico-STN spectral connectivity. Conversely, whole-brain MEG studies in PD patients did not account for cortico-STN and STN-STN connectivity. Here, the authors studied, for the first time, the whole-brain connectivity including the STN (whole brain-STN approach) and therefore provide new evidence of the brain connectivity reported in PD, as well as new information regarding the effect of dopaminergic medication on the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN reported in PD.

      2) Studying the temporal properties of the recurrent oscillatory patterns of transient network connectivity both ON and OFF medication is extremely important and provide interesting and crucial information in order to delineated pathological versus physiologically-relevant spectral brain connectivity in PD.

      We would like to thank the reviewer for their valuable feedback and correct interpretation of our manuscript.

      Weaknesses:

      1) In this study, the authors implied that the ON dopaminergic medication state correspond to a physiological state. However, as correctly mentioned in the limitations of the study, they did not have (for obvious reasons) a control/healthy group. Moreover, no one can exclude the emergence of compensatory and/or plasticity mechanisms in the brain of the PD patients related to the duration of the disease and/or the history of the chronic dopamine-replacement therapy (DRT). Duration of the disease and DRT history should be therefore considered when characterizing the recurrent oscillatory patterns of transient network connectivity within and between the cortex and the STN reported in PD, as well as when examining the effect of the dopaminergic medication on the functioning of these specific networks.

      We would like to thank the reviewer for pointing this out. We regressed duration of disease (year of measurement – year of onset) on the temporal properties of the HMM states. We found no relationship between any of the temporal properties and disease duration. Similarly, we regressed levodopa equivalent dosage for each subject on the temporal properties and found no relationship. We now discuss this point in the manuscript (page 20):

      “A further potential influencing factor might be the disease duration and the amount of dopamine patients are receiving. Both factors were not significantly related to the temporal properties of the states.”

      2) Here, the authors recorded LFPs in the STN activity. LFP represents sub-threshold (e.g., synaptic input) activity at best (Buzsaki et al., 2012; Logothetis, 2003). Recent studies demonstrated that mono-polar, but also bi-polar, BG LFPs are largely contaminated by volume conductance of cortical electroencephalogram (EEG) activity even when re-referenced (Lalla et al., 2017; Marmor et al., 2017). Therefore, it is likely that STN LFPs do not accurately reflect local cellular activity. In this study, the authors examined and measured coherence between cortical areas and STN. However, they cannot guarantee that STN signals were not contaminated by volume conducted signals from the cortex.

      We appreciate this concern and thank the reviewer for bringing it up. Marmor et al. (2017) investigated this in humans and is therefore most closely related to our research. They found that re-referenced STN recordings are not contaminated by cortical signals. Furthermore, the data in Lalla et al. (2017) are based on recordings in rats, making a direct transfer to human STN recordings problematic due to the different brain sizes. Since we re-referenced our LFP signals as recommended in the Marmor paper, we think that contamination by cortical signals is relatively minor; see Litvak et al. (2011), Hirschmann et al. (2013), and Neumann et al. (2016) for additional references supporting this. That being said, we now discuss this potential issue in the paper on page 20.

      “Lastly, we recorded LFPs from within the STN – an established recording procedure during the implantation of DBS electrodes in various neurological and psychiatric diseases. Although for Parkinson patients results on beta and tremor activity within the STN have been reproduced by different groups (Reck et al. 2010, Litvak et al. 2011, Florin et al. 2013, Hirschmann et al. 2013, Neumann et al. 2016), it is still not fully clear whether these LFP signals are contaminated by volume-conducted cortical activity. However, while volume conduction seems to be a larger problem in rodents even after re-referencing the LFP signal (Lalla et al. 2017), the same was not found in humans (Marmor et al. 2017).”

      3) The methods and data processing are rigorous but also very sophisticated which make the perception of the results in terms of oscillatory activity and neural synchronization difficult.

      To aid intuition about how to interpret the results in light of the methods used, one can compare the analysis pipeline to a windowing approach. In a more standard approach, windows of different lengths can be defined for different epochs within the time series, and coherence and connectivity can be determined for each window. The difference in our approach is that we used an unsupervised learning algorithm to select windows of varying length based on recurring patterns of whole-brain network activity. Within those windows we then determine the oscillatory properties via coherence and power – the same as one would do in a classical analysis. We have added an explanation of the concept of “oscillatory activity” within our framework to the introduction (page 2, footnote):

      “For the purpose of our paper, we refer to oscillatory activity or oscillations as recurrent, but transient frequency–specific patterns of network activity, even though the underlying patterns can be composed of either sustained rhythmic activity, neural bursting, or both (Quinn et al. 2019).”

      Moreover, we provide a more intuitive explanation of the analysis within the first section of the results (page 4):

      “Using an HMM, we identified recurrent patterns of transient network connectivity between the cortex and the STN, which we henceforth refer to as an ‘HMM state’. In comparison to classic sliding-window analysis, an HMM solution can be thought of as a data-driven estimation of time windows of variable length (within which a particular HMM state was active): once we know the time windows when a particular state is active, we compute coherence between different pairs of regions for each of these recurrent states.”
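A minimal sketch of this idea follows (hypothetical helper functions; the sampling rate, segment-length cutoff, and the use of SciPy's Welch-based coherence are our own illustrative choices, not the authors' implementation):

```python
import numpy as np
from scipy.signal import coherence

def state_segments(state_path: np.ndarray, state: int):
    """(start, end) sample-index pairs of contiguous visits of `state`
    in a per-sample most-likely-state sequence."""
    mask = np.concatenate(([0], (state_path == state).astype(int), [0]))
    edges = np.flatnonzero(np.diff(mask))
    return list(zip(edges[::2], edges[1::2]))

def state_coherence(x, y, state_path, state, fs=200.0, nperseg=64):
    """Coherence between signals x and y restricted to the data-driven
    windows in which `state` is active. Visits shorter than nperseg are
    dropped so that all spectra share the same frequency grid."""
    segs = [(a, b) for a, b in state_segments(state_path, state) if b - a >= nperseg]
    spectra = [coherence(x[a:b], y[a:b], fs=fs, nperseg=nperseg)[1] for a, b in segs]
    return np.mean(spectra, axis=0)
```

Averaging spectra over all visits of a state is what turns the variable-length "windows" into one state-specific connectivity estimate.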

      4) Previous studies have shown that abnormal oscillations within the STN of PD patients are limited to its dorsolateral/motor region, thus dividing the STN into a dorsolateral oscillatory/motor region and ventromedial non-oscillatory/non-motor region (Kuhn et al. 2005; Moran et al. 2008; Zaidel et al. 2009, 2010; Seifreid et al. 2012; Lourens et al. 2013, Deffains et al., 2014). However, the authors do not provide clear information about the location of the LFP recordings within the STN.

      We selected the electrode contacts based on intraoperative microelectrode recordings (for details, see page 23). The first directional recording height after the entry into the STN was selected to obtain the three directional LFP recordings from the respective hemisphere. This practice has been shown to improve target location (Kochanski et al., 2019; Krauss et al., 2021). The common target area for DBS surgery is the dorsolateral STN. To confirm that the electrodes were actually located within this part of the STN, we reconstructed the DBS electrode locations with Lead-DBS (Horn et al. 2019). All electrodes – except for one – were located within the dorsolateral STN (see figure 7 of the manuscript). To rule out that our results were driven by an outlier, we reanalysed our data without this patient. No change in the overall connectivity pattern was observed (see Figure R3 of this response letter).

      Figure R2: Lead DBS reconstruction of the location of electrodes in the STN for different subjects. The red electrodes have not been placed properly in the STN. The contacts marked in red represent the directional contacts from which the data was used for analysis.

      Figure R3: HMM states obtained after running the analysis without the subject with the electrode outside the STN.

      References:

      Buzsáki G, Anastassiou CA, Koch C. The origin of extracellular fields and currents-EEG, ECoG, LFP and spikes. Nat Rev Neurosci 2012; 13: 407–20.

      Cagnan H, Duff EP, Brown P. The relative phases of basal ganglia activities dynamically shape effective connectivity in Parkinson’s disease. Brain 2015; 138: 1667–78.

      Cools R. Enhanced or impaired cognitive function in Parkinson’s disease as a function of dopaminergic medication and task demands. Cereb Cortex 2001; 11: 1136–43.

      Cruz A V., Mallet N, Magill PJ, Brown P, Averbeck BB. Effects of dopamine depletion on network entropy in the external globus pallidus. J Neurophysiol 2009; 102: 1092–102.

      Florin E, Erasmi R, Reck C, Maarouf M, Schnitzler A, Fink GR, et al. Does increased gamma activity in patients suffering from Parkinson’s disease counteract the movement inhibiting beta activity? Neuroscience 2013; 237: 42–50.

      George JS, Strunk J, Mak-Mccully R, Houser M, Poizner H, Aron AR. Dopaminergic therapy in Parkinson’s disease decreases cortical beta band coherence in the resting state and increases cortical beta band power during executive control. NeuroImage Clin 2013; 3: 261–70.

      Hirschmann J, Özkurt TE, Butz M, Homburger M, Elben S, Hartmann CJ, et al. Differential modulation of STN-cortical and cortico-muscular coherence by movement and levodopa in Parkinson’s disease. Neuroimage 2013; 68: 203–13.

      Holt AB, Kormann E, Gulberti A, Pötter-Nerger M, McNamara CG, Cagnan H, et al. Phase-dependent suppression of beta oscillations in parkinson’s disease patients. J Neurosci 2019; 39: 1119–34.

      Horn A, Li N, Dembek TA, Kappel A, Boulay C, Ewert S, et al. Lead-DBS v2: Towards a comprehensive pipeline for deep brain stimulation imaging. Neuroimage 2019; 184: 293–316.

      Kelly C, De Zubicaray G, Di Martino A, Copland DA, Reiss PT, Klein DF, et al. L-dopa modulates functional connectivity in striatal cognitive and motor networks: A double-blind placebo-controlled study. J Neurosci 2009; 29: 7364–78.

      Kochanski RB, Bus S, Brahimaj B, Borghei A, Kraimer KL, Keppetipola KM, et al. The impact of microelectrode recording on lead location in deep brain stimulation for the treatment of movement disorders. World Neurosurg 2019; 132: e487–95.

      Krauss P, Oertel MF, Baumann-Vogel H, Imbach L, Baumann CR, Sarnthein J, et al. Intraoperative neurophysiologic assessment in deep brain stimulation surgery and its impact on lead placement. J Neurol Surgery, Part A Cent Eur Neurosurg 2021; 82: 18–26.

      Lalla L, Rueda Orozco PE, Jurado-Parras MT, Brovelli A, Robbe D. Local or not local: Investigating the nature of striatal theta oscillations in behaving rats. eNeuro 2017; 4: 128–45.

      Litvak V, Jha A, Eusebio A, Oostenveld R, Foltynie T, Limousin P, et al. Resting oscillatory cortico-subthalamic connectivity in patients with Parkinson’s disease. Brain 2011; 134: 359–74.

      MacDonald PA, MacDonald AA, Seergobin KN, Tamjeedi R, Ganjavi H, Provost JS, et al. The effect of dopamine therapy on ventral and dorsal striatum-mediated cognition in Parkinson’s disease: Support from functional MRI. Brain 2011; 134: 1447–63.

      MacDonald PA, Monchi O. Differential effects of dopaminergic therapies on dorsal and ventral striatum in Parkinson’s disease: Implications for cognitive function. Parkinsons Dis 2011; 2011: 1–18.

      Marmor O, Valsky D, Joshua M, Bick AS, Arkadir D, Tamir I, et al. Local vs. volume conductance activity of field potentials in the human subthalamic nucleus. J Neurophysiol 2017; 117: 2140–51.

      Neumann WJ, Degen K, Schneider GH, Brücke C, Huebl J, Brown P, et al. Subthalamic synchronized oscillatory activity correlates with motor impairment in patients with Parkinson’s disease. Mov Disord 2016; 31: 1748–51.

      Packard NH, Crutchfield JP, Farmer JD, Shaw RS. Geometry from a time series. Phys Rev Lett 1980; 45: 712–6.

      Quinn AJ, van Ede F, Brookes MJ, Heideman SG, Nowak M, Seedat ZA, et al. Unpacking Transient Event Dynamics in Electrophysiological Power Spectra. Brain Topogr 2019; 32: 1020–34.

      Reck C, Himmel M, Florin E, Maarouf M, Sturm V, Wojtecki L, et al. Coherence analysis of local field potentials in the subthalamic nucleus: Differences in parkinsonian rest and postural tremor. Eur J Neurosci 2010; 32: 1202–14.

      Shohamy D, Myers CE, Grossman S, Sage J, Gluck MA. The role of dopamine in cognitive sequence learning: Evidence from Parkinson’s disease. Behav Brain Res 2005; 156: 191–9.

      Spiegel J, Hellwig D, Samnick S, Jost W, Möllers MO, Fassbender K, et al. Striatal FP-CIT uptake differs in the subtypes of early Parkinson’s disease. J Neural Transm 2007; 114: 331–5.

      Vaillancourt DE, Schonfeld D, Kwak Y, Bohnen NI, Seidler R. Dopamine overdose hypothesis: Evidence and clinical implications. Mov Disord 2013; 28: 1920–9.

    1. Author Response

      Reviewer #1 (Public Review):

      Determination of the biomechanical forces and downstream pathways that direct heart valve morphogenesis is an important area of research. In the current study, potential functions of localized Yap signaling in cardiac valve morphogenesis were examined. Extensive immunostainings were performed for Yap expression, but Yap activation status as indicated by nuclear versus cytoplasmic localization, Yap dephosphorylation, or expression of downstream target genes was not examined.

      We thank the reviewer for appreciating the significance of this work, and we also thank the reviewer for the constructive suggestions. Following these suggestions, we have improved analysis of YAP activation status and used nuclear versus cytoplasmic localization to quantify YAP activation. To address the reviewer’s concerns, we have conducted extra qPCR analysis of YAP downstream target genes and YAP upstream genes in Hippo pathway. Please find the detailed revisions in our responses to the Recommendations for authors.

      The goal of the work was to determine Yap activation status relative to different mechanical environments, but no biomechanical data on developing heart valves were provided in the study.

      We appreciate the reviewer for raising this concern. We have previously published the biomechanical data of developing chick embryonic heart valves in the following study:

      Buskohl PR, Gould RA, Butcher JT. Quantification of embryonic atrioventricular valve biomechanics during morphogenesis. Journal of Biomechanics. 2012;45(5):895-902.

      In that study, we used micropipette aspiration to measure the nonlinear biomechanics (strain energy) of chick embryonic heart valves at different developmental stages. Here in this study, we used the same method to measure the strain energy of YAP activated/inhibited cushion explants and compared it to the data from our previous study. Our findings were summarized in the Results: “YAP inhibition elevated valve stiffness”, and the detailed measurements, including images and data, are presented in Figure S4.

      There are several major weaknesses that diminish enthusiasm for the study.

      1) The Hippo/Yap pathway activation leads to dephosphorylation of Yap, nuclear localization, and induced expression of downstream target genes. However, there are no data included in the study on Yap nuclear/cytoplasmic ratios, phosphorylation status, or activation of other Hippo pathway mediators. Analysis of Yap expression alone is insufficient to determine activation status since it is widely expressed in multiple cells throughout the valves. The specificity for activated Yap signaling is not apparent from the immunostainings.

      We thank the reviewer for pointing out this weakness. We have now implemented nuclear versus cytoplasmic localization as recommended to quantify YAP activation. We have also conducted additional experiments to analyze via qPCR YAP downstream target genes and YAP upstream genes in Hippo pathway. Please see the detailed revisions in our responses to the Recommendations for authors.
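For illustration, the nuclear-versus-cytoplasmic quantification mentioned above reduces to a simple intensity ratio once a nuclear mask and a cell mask are available (a hypothetical sketch; the segmentation itself, e.g. from a DAPI channel, is assumed to be done elsewhere, and the function name is our own):

```python
import numpy as np

def nuc_cyto_ratio(img: np.ndarray, nuc_mask: np.ndarray, cell_mask: np.ndarray) -> float:
    """Mean YAP-channel intensity inside the nucleus divided by the mean
    intensity in the cytoplasm (cell minus nucleus). Ratios > 1 indicate
    predominantly nuclear, i.e. active, YAP."""
    cyto_mask = cell_mask & ~nuc_mask
    return float(img[nuc_mask].mean() / img[cyto_mask].mean())
```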

      2) The specific regionalized biomechanical forces acting on different regions of the valves were not measured directly or clearly compared with Yap activation status. In some cases, it seems that Yap is not present in the nuclei of endothelial cells surrounding the valve leaflets that are subject to different flow forces (Fig 1B) and the main expression is in valve interstitial subpopulations. Thus the data presented do not support differential Yap activation in endothelial cells subject to different fluid forces. There is extensive discussion of different forces acting on the valve leaflets, but the relationship to Yap signaling is not entirely clear.

      We thank the reviewer for these important questions. The region-specific biomechanics have been well mapped and studied, thanks to computational fluid dynamics supported by ultrasound velocity and pressure measurements. For example:

      Yalcin, H.C., Shekhar, A., McQuinn, T.C. and Butcher, J.T. (2011), Hemodynamic patterning of the avian atrioventricular valve. Dev. Dyn., 240: 23-35.

      Bharadwaj KN, Spitz C, Shekhar A, Yalcin HC, Butcher JT. Computational fluid dynamics of developing avian outflow tract heart valves. Ann Biomed Eng. 2012 Oct;40(10):2212-27. doi: 10.1007/s10439-012-0574-8.

      Ayoub S, Ferrari G, Gorman RC, Gorman JH, Schoen FJ, Sacks MS. Heart Valve Biomechanics and Underlying Mechanobiology. Compr Physiol. 2016 Sep 15;6(4):1743-1780.

      Salman HE, Alser M, Shekhar A, Gould RA, Benslimane FM, Butcher JT, et al. Effect of left atrial ligation-driven altered inflow hemodynamics on embryonic heart development: clues for prenatal progression of hypoplastic left heart syndrome. Biomechanics and Modeling in Mechanobiology. 2021;20(2):733-50.

      Ho S, Chan WX, Yap CH. Fluid mechanics of the left atrial ligation chick embryonic model of hypoplastic left heart syndrome. Biomechanics and Modeling in Mechanobiology. 2021;20(4):1337-51.

      Those studies have shown that USS develops on the inflow surface of valves while OSS develops on the outflow surface, and that CS develops in the tip region of valves while TS develops in the regions of elongation and compaction. Here, we mimicked those forces in our in-vitro and ex-vivo models, which allowed us to study the direct effect of a specific force on YAP activity in different cell lineages. The results showed that OSS promoted YAP activation in VECs while USS inhibited it, and that CS promoted YAP activation in VICs while TS inhibited it. This result explains well the spatiotemporal distribution of YAP activation shown in Figure 1. For example, nuclear YAP was mostly found in VECs on the fibrosa side, where OSS develops, and YAP was not expressed in the nuclei of VECs on the atrialis/ventricularis side, where USS develops. It is also worth noting that formation of OSS on the outflow side is slower, so the side-specific YAP activation in VECs was not yet in effect at the early stages, from E11.5 to E14.5.

      3) The requirement for Yap signaling in heart valve remodeling as described in the title was not demonstrated through manipulation of Yap activity.

With respect, it is unclear what the reviewer is asking for, as no experiments are suggested, nor are alternative interpretations of our results offered that argue against a YAP requirement. It has previously been shown, using conditional YAP deletion in mice, that YAP signaling is required for the early EMT stages of valvulogenesis:

      Zhang H, von Gise A, Liu Q, Hu T, Tian X, He L, et al. Yap1 Is Required for Endothelial to Mesenchymal Transition of the Atrioventricular Cushion. Journal of Biological Chemistry. 2014;289(27):18681-92.

The signaling roles of early regulators differ at these later fetal stages, sometimes opposing their roles during early EndMT, which argues against relying on these early data to explain later events:

      Bassen D, Wang M, Pham D, Sun S, Rao R, Singh R, et al. Hydrostatic mechanical stress regulates growth and maturation of the atrioventricular valve. Development. 2021;148(13).

However, embryos with YAP deletion failed to form endocardial cushions and did not survive long enough to study YAP's roles in later cushion growth and remodeling into valve leaflets. In this work, we first showed the localization of YAP activity and its direct link to local shear or pressure domains. We then explicitly applied controlled gain and loss of YAP function via specific molecules, and performed critical mechanical gain- and loss-of-function studies to demonstrate that YAP mechanoactivation is necessary and sufficient for growth and remodeling.

      Reviewer #2 (Public Review)

      This study by Wang et al. examines changes in YAP expression in embryonic avian cultured explants in response to high and low shear stress, as well as tensile and compressive stress. The authors show that YAP expression is increased in response to low, oscillatory shear stress, as well as high compressive stress conditions. Inhibition of YAP signaling prevents compressive stress-induced increases in circularity, decreased pHH3 expression, and increases VE-cadherin expression. On the other hand, YAP gain of function prevents tensile stress-induced decreases in pHH3 expression and VE-cadherin expansion. It also decreases the strain energy density of embryonic avian cushion explants. Finally, using an avian model of left atrial ligation, the authors demonstrate that unloaded regions within the primitive valve structures are associated with increased YAP expression, compared to regions of restricted flow where YAP expression is low. Overall, this study sheds light on the biomechanical regulation of YAP expression in developing valves.

      We thank the reviewer for the accurate summary and their enthusiasm for this work.

      Strengths of the manuscript include:

      • Novel insights into the dynamic expression pattern of YAP in valve cell populations during post-EMT stages of embryonic valvulogenesis.

      • Identify the positive regulation of YAP expression in response to low, oscillatory shear stress, as well as high compressive stress conditions.

      • Identify a link between YAP signaling in regulating stress-induced cell proliferation and valve morphogenesis.

      • The inclusion of the atrial left atrial ligation model is innovative, and the data showing distinguishable YAP expression levels between restricted, and non-restricted flow regions is insightful.

      We thank the reviewer for appreciating the strengths of this work.

      This is a descriptive study that focuses on changes in YAP expression following exposure to diverse stress conditions in embryonic avian cushion explants. Overall, the study currently lacks mechanistic insights, and conclusions based on data are highly over-interpreted, particularly given that the majority of experimental protocols rely on one method of readout.

      We thank the reviewer for constructive suggestions.

      Reviewer #3 (Public Review)

      In this manuscript, Wang et al. assess the role of wall shear stress and hydrostatic pressure during valve morphogenesis at stages where the valve elongates and takes shape. The authors elegantly demonstrate that shear and pressure have different effects on cell proliferation by modulating YAP signaling. The authors use a combination of in vitro and in vivo approaches to show that YAP signaling is activated by hydrostatic pressure changes and inhibited by wall shear stress.

      We thank the reviewer for their enthusiasm for the impact of our work.

      There are a few elements that would require clarification:

      1) The impact of YAP on valve stiffness was unclear to me. How is YAP signaling affecting stiffness? is it through cell proliferation changes? I was unclear about the model put forward:

• Is it cell proliferation (does proliferation fluidize the tissue, while non-proliferating tissue is stiffer)?

      • Is it through differential gene expression?

      This needs clarification.

We thank the reviewer for raising this important question. Cell proliferation can affect valve stiffness, but it is a minor factor compared with ECM deposition and cell contractility. Our micropipette aspiration data showed that the higher cell proliferation rate induced by YAP activation did lead to stiffer valves compared to controls, possibly because at these early stages cells are more elastic than the viscous ECM. However, the stiffness of YAP-activated valves was only about half that of YAP-inhibited valves, showing that transcriptional-level factors play a more important role. This also suggests that YAP-inhibited valves exhibited a more mature phenotype. An analogous role for YAP has been found in cardiomyocytes, where many theories propose that activated YAP turns proliferation programs on, while inhibited YAP turns proliferation programs off and releases maturation programs. Similarly, we hypothesize that YAP works as a mechanobiological switch, converting mechanical signaling into the decision between growth and maturation. We have revised the Discussion to include this hypothesis.

      2) The model proposes an early asymmetric growth of the cushion leading to different shear forces (oscillatory vs unidirectional shear stress). What triggers the initial asymmetry of the cushion shape? is YAP involved?

      Although the initial geometry of the cushion model is symmetric, the force acting on it is asymmetric. The detailed numerical simulation of how the initial forces trigger the asymmetric morphogenesis can be found in our previous publication:

      Buskohl PR, Jenkins JT, Butcher JT. Computational simulation of hemodynamic-driven growth and remodeling of embryonic atrioventricular valves. Biomechanics and Modeling in Mechanobiology. 2012;11(8):1205-17.

The color maps represent the dilatation rates when a) only pressure is applied, b) only shear stress is applied, and c) both pressure and shear stress are applied. It is this loading that initiates the asymmetric morphological change shown in d). In addition, we believe YAP is involved in this initiation because it is directly activated (nuclear) by CS and OSS, or cytoplasmically retained under TS and LSS.

      3) The differential expression of YAP and its correlation to cell proliferation is a little hard to see in the data presented. Drawings highlighting the main areas would help the reader to visualise the results better.

We thank the reviewer for this helpful suggestion; we have improved the visualization of Figure 3C and Figure 4C with higher-magnification insets.

      4) The origin of osmotic/hydrostatic pressure in vivo. While shear is clearly dependent upon blood flow, it is less clear that hydrostatic pressure is solely dependent upon blood flow. For example, it has been proposed that ECM accumulation such as hyaluronic acid could modify osmotic pressure (see for example Vignes et al.PMID: 35245444). Could the authors clarify the following questions:

      • How blood flow affects osmotic pressure in vivo?

      • Is ECM a factor that could affect osmotic pressure in this system?

We thank the reviewer for sharing this interesting study. Osmotic pressure plays a critical role in mechanotransduction and in the development of many tissues, including cardiovascular tissues and cartilage. As proposed in the reference, osmotic pressure is an interstitial force generated by cardiac contractility. The hydrostatic pressure in our study is different: it is an external force applied by flowing blood. According to Bernoulli's law, when an incompressible fluid flows around a solid, the static pressure it applies on the solid equals its total pressure minus its dynamic pressure.
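In symbols, this relation is p_static = p_total − ½ρv². A minimal sketch with illustrative values (the density, flow speed, and total pressure below are assumptions for demonstration, not measured embryonic parameters):

```python
# Bernoulli's relation for incompressible flow: the static pressure a moving
# fluid exerts on a surface equals its total (stagnation) pressure minus its
# dynamic pressure. Values below are illustrative, not measured parameters.

def static_pressure(p_total, rho, v):
    """Static pressure (Pa): p_static = p_total - 0.5 * rho * v**2."""
    return p_total - 0.5 * rho * v ** 2

# Blood-like density of 1060 kg/m^3 at 0.1 m/s with 1000 Pa total pressure:
p = static_pressure(1000.0, 1060.0, 0.1)
print(round(p, 1))  # 994.7
```

As the speed of the surrounding flow increases, the dynamic term grows and the static load on the tissue drops accordingly.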

Despite this difference, osmotic pressure can mimic the effect of hydrostatic pressure in-vitro. The in-vitro osmotic pressure model has been widely used in cartilage research, for example:

      P. J. Basser, R. Schneiderman, R. A. Bank, E. Wachtel, and A. Maroudas, “Mechanical properties of the collagen network in human articular cartilage as measured by osmotic stress technique.,” Arch. Biochem. Biophys., vol. 351, no. 2, pp. 207–19, 1998.

      D. a. Narmoneva, J. Y. Wang, and L. a. Setton, “Nonuniform swelling-induced residual strains in articular cartilage,” J. Biomech., vol. 32, no. 4, pp. 401–408, 1999.

      C. L. Jablonski, S. Ferguson, A. Pozzi, and A. L. Clark, “Integrin α1β1 participates in chondrocyte transduction of osmotic stress,” Biochem. Biophys. Res. Commun., vol. 445, no. 1, pp. 184–190, 2014.

      Z. I. Johnson, I. M. Shapiro, and M. V. Risbud, “Extracellular osmolarity regulates matrix homeostasis in the intervertebral disc and articular cartilage: Evolving role of TonEBP,” Matrix Biol., vol. 40, pp. 10–16, 2014.

When maturing cushions shift from a GAG-dominated to a collagen-dominated ECM, the water- and ion-retention capacity of the tissue changes greatly, reducing the osmotic pressure. This could in turn accelerate cushion maturation. By contrast, the ECM of growing cushions remains GAG-dominated, which would delay maturation and prolong growth.

      The revised second section of Results is as follows:

      Shear and hydrostatic stress regulate YAP activity

In addition to being a co-effector of the Hippo pathway, YAP is a key mediator of mechanotransduction. Indeed, the spatiotemporal activation of YAP correlated with changes in the mechanical environment. During valve remodeling, unidirectional shear stress (USS) develops on the inflow surface of valves, where YAP is rarely expressed in the nuclei of VECs (Figure 2A). On the other side, OSS develops on the outflow surface, where VECs with nuclear YAP localize. YAP activation in VICs also correlated with hydrostatic pressure: the pressure generated compressive stress (CS) at the tips of valves, where VICs with nuclear YAP localized (Figure 2B), whereas tensile stress (TS) was created in the elongated regions, where YAP was absent from VIC nuclei.

To study the effect of shear stress on YAP activity in VECs, we applied USS and OSS directly onto a monolayer of freshly isolated VECs. The VECs were obtained from AV cushions of chick embryonic hearts at HH25. The cushions were placed on collagen gels with the endocardium adherent to the collagen and incubated to enable the VECs to migrate onto the gel. We then removed the cushions and immediately applied the shear flow to the monolayer for 24 hours. Low-stress OSS (2 dyn/cm2) promoted YAP nuclear translocation in VECs (Figure 2C, E), while high-stress USS (20 dyn/cm2) retained YAP in the cytoplasm.

To study the effect of hydrostatic stress on YAP activation in VICs, we used media of different osmolarities to mimic CS and TS. CS was induced by a hypertonic condition, TS was created by a hypotonic condition, and the unloaded (U) condition refers to osmotically balanced media. Notably, in-vivo hydrostatic pressure is generated by flowing blood, whereas in-vivo osmotic pressure is generated by cardiac contractility and plays a critical role in mechanotransduction during valve development (30). Despite the different in-vivo origins, osmotic pressure provides a reliable model for mimicking hydrostatic pressure in-vitro (31). We cultured HH34 AV cushion explants under the different loading conditions for 24 hours and found that the trapezoidal cushions adopted a spherical shape (Figure 2D). TS-loaded cushions compacted significantly, and YAP activation in VICs of TS-loaded cushions was significantly lower than in CS-loaded VICs (Figure 2F).

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript applies the framework of information theory to study a subset of cellular receptors (called lectins) that bind to glycan molecules, with a specific focus on the kinds of glycans that are typical of fungal pathogens. The authors use the concentration of various types of ligands as the input to the signaling channel, and measure the "response" of individual cells using a GFP reporter whose expression is driven by a promoter that responds to NFκB. While this work is overall technically solid, I would suggest that readers keep several issues in mind while evaluating these results.

1) One of the largest potential limitations of the study is the reliance of the authors on exogenous expression of the relevant receptors in U937 cells. Using a cell-line system like this has several advantages, most notably the fact that the authors can engineer different reporters and different combinations of receptors easily into the same cells. This would be much more difficult with, say, primary cells extracted from a mouse or a human. While the ability to introduce different proteins into the cells is a benefit, the problem is that it is not clear how physiologically relevant the results are. To their credit, the authors perform several controls that suggest that differences in transfection efficiency are not the source of the differences in channel capacity between, say, dectin-1 and dectin-2. As the authors themselves clearly demonstrate, however, the differences in the properties of these signaling systems are not based on receptor expression levels, but rather on some other property of the receptor. Now, it could be that the dectin-2 receptor is somehow just more "noisy" in terms of its activity compared to, say, dectin-1. This seems a somewhat less likely explanation, however, and so it is likely that downstream details of the signaling systems differ in some way between dectin-2 and the more "information efficient" receptors studied by the authors.

      The channel capacity of a cell signaling network depends critically on the distributions of the downstream signaling molecules in question: see the original paper by Cheong et al. (2011, Science 334 (6054), 354-8) and subsequent papers (notably Selimkhanov et al. (2014) Science 346 (6215), 1370-3 and Suderman et al. (2018) Interface Focus 8 (6), 20180039). The U937 cells considered here clearly don't serve the physiological function of detecting the glycans considered by the authors; despite the fact that this is an artificial cell line, the fact the authors have to exogenously express the relevant receptors indicates that these cells are not necessarily a good model for the types of cells in the body that actually have evolved to sense these glycan molecules.

Signaling molecules readily exhibit cell-type-specific expression levels that influence cellular responses to external stimuli (Rowland et al. (2017) Nat Commun 8, 16009). So it is unclear that the distributions of downstream signaling molecules in U937 cells mirror those that would be observed in the immune cell types relevant to this response. As such, the physiological relevance of the differences between dectin-2 channel capacities and those exhibited by the other receptors is currently unclear.

We appreciate Reviewer #1's in-depth comments on the physiological relevance of the U937 cell line. A major benefit of using information theory to investigate a biological communication channel is that it enables quantitative measurement of the information the channel transmits without requiring detailed measurements of the spatiotemporal dynamics of receptors and downstream signaling cascades. In addition, the measured quantity of information itself provides a reasonable prediction of the detailed signaling mechanisms when information quantities are compared. For example, we investigated how the transmission of glycan information through dectin-2 is synergistically modulated in the presence of dectin-1, DC-SIGN, or mincle. Our approach allows us to investigate how individual lectins on immune cells contribute to glycan information transmission and how they are integrated in the presence of other lectin types. The findings therefore describe in a more defined way how physiologically relevant lectins integrate extracellular signals. Furthermore, we found that our model cell line has an order of magnitude higher expression of dectin-2 than primary human monocytes and exhibits a similar zymosan-binding pattern (described in the Recommendations for the authors and Figure R8).

We fully agree that acquiring more information on the information transmission capability of primary immune cells would increase physiological relevance. In the revised manuscript we addressed this concern by comparing the receptor expression levels of our model cell lines with those of primary monocytes, for which we find agreement in cellular heterogeneity. However, we would also like to point out that the very basic nature of our question (how information stored in glycans is processed by lectins) is not tightly bound to these differences between primary cells and cell lines.

Line 382: Finally, it is important to take into consideration that our conclusions came from model cell lines, which were used as a surrogate for the cell-type-specific lectin expression patterns of primary immune cells. Human monocytes and dectin-2-positive U937 cells have comparable receptor densities and respond similarly to stimulation with zymosan particles (SI Fig. 6A and B).

      2) Another issue that readers might want to keep in mind is that the details of the channel capacity calculation are a bit unclear as the manuscript is currently written. The authors indicate that their channel capacity calculations follow the approach of Cheong et al. (2011) Science 334 (6054), 354-8. However, the extent to which they follow that previous approach is not obvious. For instance, the calculations presented in the 2011 work use a combined bootstrapping/linear extrapolation approach to estimate the mutual information at infinite population size in order to deal with known inaccuracies in the calculation that arise from finite-size effects. The Cheong approach also deals with the question of how many bins to use in order to estimate the joint probability distribution across signal and response.

      They do this by comparing the mutual information they calculate for the real data with that calculated for random data to ensure that they are not calculating spuriously high mutual information based on having too many bins. While the Cheong et al. paper does a great job explaining why these steps need to be undertaken, a subsequent paper by Suderman et al. (2017, PNAS 114 (22), 5755-60) explains the approach in even greater detail in the supporting information. Those authors also implemented several improvements to the general approach, including a bootstrap method for more accurately estimating the error in the mutual information and channel capacity estimates.

      The problem here is that, while the authors claim to follow the approach of Cheong et al., it seems that they have re-implemented the calculation, and they do not provide sufficient detail to evaluate the extent to which they are performing the same exact calculation. Since estimates of mutual information are technically challenging, specific details of the steps in their approach would be helpful in order to understand how closely their results can be compared with the results of previous authors. For instance, Cheong et al. estimate the "channel capacity" by trying a set of likely unimodal and bimodal distributions for the input to the channel, and choosing the maximal value as the channel capacity. This is clearly a very approximate approach, since the channel capacity is defined as the supremum over an (uncountably infinite) set of input probability distributions. In any case, the authors of the current manuscript use a different approach to this maximization problem. Although it is a bit unclear how their approach works, it seems that they treat the probability of each input bin as an independent parameter (under the constraint that the probabilities sum to one) and then use an optimization algorithm implemented in Python to maximize the mutual information. In principle, this could be a better approach, since the set of input distributions considered is potentially much larger. The details of the optimization algorithm matter, however, and those are currently unclear as the paper is written.

We thank Reviewer #1 for this recommendation to increase the legitimacy of the calculation. In the revised manuscript we explain the channel capacity calculation procedures in more detail, with statistical approaches adopted from Cheong et al. (2011) and Suderman et al. (2018) (SI sections 1 and 2). Furthermore, we chose the number of bins based not only on a random dataset but also on the total number of samples, as shown below:

Figure R1. A) Extrapolated channel capacity values for a random dataset at infinite subsample size, under varying total sample numbers and output bin counts. The white line in the heatmap marks a channel capacity of 0.01 bit. B) Extrapolated channel capacity values at infinite subsample size for the U937 cells' input (TNF-a doses) and output (GFP reporter) response.

Figure R1 shows channel capacity values from random (A) and experimental (B, TNFAR + TNF-a) datasets. The values obtained from random data indicate how channel capacity depends on the number of output bins and the total number of samples. Based on this heatmap, we set the allowed bias to 0.01 bits, as marked by the contour line in Figure R1A. Since the smallest dataset used for channel capacity calculation in the absence of a labelled input contains nearly 90,000 samples, the expected bias in the channel capacity calculation is therefore less than 0.01 bits over the binning range from 10 to 1000, as shown in Figure R1A.
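The random-dataset bias control described here can be sketched in a few lines: estimate mutual information from a 2-D histogram, then repeat the estimate with the inputs randomly permuted, so that any residual value reflects pure finite-sample bias. The sample size, bin counts, and simulated dose-response data below are illustrative assumptions, not the authors' dataset:

```python
import numpy as np

def mutual_information(x, y, bins):
    """Plug-in mutual information estimate (bits) from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # input marginal, shape (Bx, 1)
    py = pxy.sum(axis=0, keepdims=True)   # output marginal, shape (1, By)
    nz = pxy > 0                          # skip empty cells (0 log 0 = 0)
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
n = 90_000                                     # roughly the sample size quoted above
x = rng.integers(0, 8, size=n).astype(float)   # eight simulated "dose" levels
y = x + rng.normal(0.0, 1.0, size=n)           # noisy reporter output

mi = mutual_information(x, y, bins=(8, 100))
# Permuting the inputs destroys any true dependence, so the residual
# estimate is pure finite-sample bias; at this n it should stay below 0.01 bits.
bias = mutual_information(rng.permutation(x), y, bins=(8, 100))
print(round(mi, 2), round(bias, 4))
```

At ~90,000 samples the shuffled-input estimate lands well under the 0.01-bit threshold discussed above, consistent with treating the plug-in estimate as essentially unbiased in this binning range.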

Furthermore, we demonstrated the mutual information maximization procedure using predefined unimodal and bimodal input distributions and compared it with the systematic method used in this work. We found no noticeable difference in channel capacity between the two approaches (SI Figure 3M).

      3) Another issue to be careful about when interpreting these findings is the fact that the authors use logarithmic bins when calculating the channel capacity estimates. This is equivalent to saying that the "output" of the cell signaling channel is not the amount of protein produced under the control of the NFκB promoter, but rather the log of the protein level. Essentially, the authors are considering a case where the relevant output of the system is not the amount of protein itself, but the fold change in the amount of protein. That might be a reasonable assumption, especially if the protein being produced is a transcription factor whose own promoters have evolved to detect fold changes. For many proteins, however, the cell is likely responsive to linear changes in protein concentration, not fold changes. And so choosing the log of the protein level as the output may not make sense in terms of understanding how much information is actually contained in this particular output variable. Regardless, choosing logarithmic bins is not purely a matter of convenience or arbitrary choice, but rather corresponds to a very strong statement about what the relevant output of the channel is.

We understand Reviewer #1's concern regarding the choice of log binning. We found that when the number of bins exceeds 200, the estimated channel capacities converge to the same value regardless of the binning method (linear, logarithmic, or equal-frequency). The only difference is how quickly the values approach the converged channel capacity as the number of bins increases (Figure R2). In the revised manuscript, we used linear binning to represent protein signaling more directly, as the Reviewer suggested. Note that the channel capacity values calculated with linear binning do not differ noticeably from our previously calculated values.

On the other hand, linear binning generates significant bias when a labelled (i.e., continuous) input is included in the channel capacity calculation, due to the increased number of bins in the input dimension.

Figure R2. Dependence of the channel capacity value on the output bin number and binning method for the experimental dataset. The inset plots show the relative difference between the channel capacity value and the maximum channel capacity value over the entire binning range (i.e., from 10 to 1000) for the corresponding binning method.

Following Reviewer #1's comment, we changed the binning method from logarithmic to linear for the whole experimental dataset, except in the presence of a labelled input (i.e., dectin-2 antibody). When calculating the channel capacity between the labelled input and the NF-kB reporter, equal-frequency binning is used for every layer of the channel (i.e., labelled input-binding, binding-GFP, labelled input-GFP).

      Reviewer #2 (Public Review):

      My expertise is more on the theoretical than the experimental aspects of this paper, so those will be the focus of these comments.

      Signal transduction is an important area of study for mathematical biologists and biophysicists. This setting is a natural one for information-theoretic methods, and such methods are attracting increasing research interest. Experimental results that attempt to directly quantify the Shannon capacity of signal transduction are particularly interesting. This paper represents an important contribution to this emerging field.

      My main comments are about the rigorousness and correctness of the theoretical results. More details about these results would improve the paper and help the reader understand the results.

We understand Reviewer #2's comment regarding the rigor and correctness of the theoretical results of this work. In the revised manuscript, we added the following content to help readers better understand the channel capacity calculation procedures.

      • General illustrative introduction regarding how we measured input and output dataset and how we handle those data to prepare joint probability distribution shown in SI section 1.1 and 1.2.

      • Exemplified mutual information maximization procedure using experimental and arbitrary dataset shown in SI section 1.3.

The calculation of channel capacity, given in the methods, is quite a standard calculation and appears to be correct. However, I was confused by the use of the "weighting value" w_i, which is not specified in the manuscript. The input distribution appears to be a product of the weight w_i and the input probability value p_i, and these appear always to occur together as a product w_i p_i. (In joint probabilities w_i p(i,j), the input probability can be extracted using Bayes' rule, leaving w_i p_i p(j|i).) This leads me to wonder two things. First, what role does w_i play (is it even necessary)? Second, of particular interest here is the capacity-achieving input distribution p_i, but w_i obscures it; is the physical input distribution p_i equal to the capacity-achieving distribution? If not, what is the meaning of capacity?

We thank Reviewer #2 for this comment regarding the weightings. We realize the original manuscript lacked an explanation of the weighting values. P_x(i) is the marginal input probability distribution of the original dataset, and P'_x(i) is the marginal distribution of the modified input that maximizes the mutual information. In general, P_x(i) is not equal to P'_x(i), and one therefore needs to find P'_x(i) from P_x(i). Because P'_x(i) is a rescaling of P_x(i), it can be expressed as w(i)P_x(i), where the w(i) are the weightings, under the constraint Σ_i w(i)P_x(i) = 1. The modified input distribution in turn changes the joint probability distribution to P'_xy(i, j) = w(i)P_xy(i, j). To aid readers' understanding of this work, we have expanded the Appendix with illustrative descriptions.
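Finding the weights w(i) that maximize the mutual information is the classical channel-capacity problem, which the Blahut-Arimoto algorithm solves by exactly this kind of multiplicative reweighting of the input distribution. Below is a minimal numpy sketch of that general algorithm, offered as an illustration rather than the optimization code used in the paper:

```python
import numpy as np

def channel_capacity(p_y_given_x, tol=1e-9, max_iter=10_000):
    """Blahut-Arimoto: maximize I(X;Y) over the input distribution for a
    fixed conditional p(y|x); returns (capacity in bits, optimal p(x))."""
    n_x = p_y_given_x.shape[0]
    p_x = np.full(n_x, 1.0 / n_x)              # start from a uniform input
    for _ in range(max_iter):
        q_y = p_x @ p_y_given_x                # output marginal p(y)
        with np.errstate(divide="ignore", invalid="ignore"):
            log_ratio = np.where(p_y_given_x > 0,
                                 np.log(p_y_given_x / q_y), 0.0)
        # Multiplicative reweighting: w(i) proportional to exp D(p(y|i)||q(y)).
        d = np.exp((p_y_given_x * log_ratio).sum(axis=1))
        new_p_x = p_x * d
        new_p_x /= new_p_x.sum()
        if np.abs(new_p_x - p_x).max() < tol:
            p_x = new_p_x
            break
        p_x = new_p_x
    q_y = p_x @ p_y_given_x
    with np.errstate(divide="ignore", invalid="ignore"):
        kl = (p_y_given_x * np.where(p_y_given_x > 0,
                                     np.log2(p_y_given_x / q_y),
                                     0.0)).sum(axis=1)
    return float(p_x @ kl), p_x

# Binary symmetric channel with 10% crossover: capacity = 1 - H(0.1) bits.
bsc = np.array([[0.9, 0.1], [0.1, 0.9]])
cap, p_opt = channel_capacity(bsc)
print(round(cap, 4))  # 0.531
```

Because each iteration only rescales the current input distribution, the optimum is naturally expressed as weights applied to an initial p(x), which is the role w(i) plays in the manuscript's formulation.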

      A more minor but important point: the inputs and outputs of the communication channel are never explicitly defined, which makes the meaning of the results unclear. When evaluating the capacity of an information channel, the inputs X and outputs Y should be carefully defined, so that the mutual information I(X;Y) is meaningful; the mutual information is then maximized to obtain capacity. Although it can be inferred that the input X is the ligand concentration, and the output Y is the expression of GFP, it would be helpful if this were stated explicitly.

We agree with the Reviewer's suggestion to describe the input and output better. We have therefore modified Figure 1A and B and the main text to describe the source of the input and output more clearly, as follows:

Line 92: Accounting for the stochastic behavior of cellular signaling, information theory provides robust, quantitative tools to analyze complex communication channels. A fundamental metric of information theory is entropy, which measures the amount of disorder or uncertainty in a variable. In this respect, cellular signaling pathways with highly variable initiating input signals (e.g., stimulants) and correspondingly variable output responses (i.e., cellular signaling) can be characterized as having high entropy. Importantly, input and output can be mutually dependent, so knowing the input distribution partly provides information about the output distribution. If noise is present in the communication channel, input and output have reduced mutual dependence. This mutual dependence between input and output is called mutual information. Mutual information is, therefore, a function of the input distribution, and the upper bound of the mutual information over input distributions is called the channel capacity (SI section 1) (Cover and Thomas, 2012). In this report, a communication channel describes the signal transduction pathway of a C-type lectin receptor, which ultimately leads to NF-κB translocation and finally GFP expression in the reporter model (Fig. 1A). To quantify the signaling information of the communication channels, we used the channel capacity. Importantly, the channel capacity does not merely describe the maximum intensity of the reporter cells: it accounts for cellular variation and activation across the whole range of incoming stimuli in single-cell-resolved data and condenses all of that data into a single number.
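As a concrete illustration of the quantities defined in this passage, the entropy and mutual information of a small discrete channel can be computed directly from its joint distribution. The two-input, three-output channel below is a toy example with illustrative probabilities, not data from the study:

```python
import numpy as np

# Toy discrete channel: two input "stimulant" levels, three output
# "reporter" levels. The conditional probabilities are illustrative only.
p_y_given_x = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.2, 0.7]])
p_x = np.array([0.5, 0.5])                       # input distribution

p_xy = p_x[:, None] * p_y_given_x                # joint p(x, y)
p_y = p_xy.sum(axis=0)                           # output marginal

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute 0."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

h_x = entropy(p_x)                               # input uncertainty, 1 bit
h_x_given_y = entropy(p_xy.ravel()) - entropy(p_y)   # H(X|Y) = H(X,Y) - H(Y)
mi = h_x - h_x_given_y                           # I(X;Y) = H(X) - H(X|Y)
print(round(h_x, 3), round(mi, 3))  # 1.0 0.365
```

The noisy middle output column is what keeps I(X;Y) below H(X): observing the output reduces, but does not eliminate, the uncertainty about which input was applied. Maximizing this quantity over input distributions yields the channel capacity.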

    1. Author Response:

      Reviewer #1 (Public Review):

      Overall, the authors have done a nice job covering the relevant literature, presenting a story out of complicated data, and performing many thoughtful analyses.

      However, I believe the paper requires quite major revisions.

      We thank the reviewer for their encouraging assessment of our manuscript. We are grateful for their valuable and especially detailed feedback that helped us to substantially improve our manuscript.

      Major issues:

      I do not believe the current results present a clear, comprehensible story about sleep and motor memory consolidation. As presented, sleep predicts an increase in the subsequent learning curve, but there is a negative relationship between learning curve and task proficiency change (which is, as far as I can tell, similar to "memory retention"). This makes it seem as if sleep predicts more forgetting on initial trials within the subsequent block (or worse memory retention) - is this true? Regardless of whether it is statistically true, there appears another story in these data that is being sacrificed to fit a story about sleep. To my eye, the results may first and foremost tell a circadian (rather than sleep) story. Examining the data in Figure 2A and 2B, it appears that every AM learning period has a higher learning curve (slope) than every PM period. While this could, of course, be due to having just slept, the main story gleaned from such a result is not a sleep effect on retention, which has been the emphasis on motor memory consolidation research in the last couple of decades, but on new learning. The fact that this effect appears present in the first session (juggling blocks 1-3 in adolescents and blocks 1-5 in adults) makes this seem the more likely story here, since it has less to do with "preparing one to re-learn" and more to do with just learning and when that learning is optimal. But even if it does not reach statistical significance in the first session alone, it remains a concern and, in my opinion, should be considered a focus in the manuscript unless the authors can devise a reason to definitively rule it out.

      Here is how I recommend the authors proceed on this point: include all sessions from all subjects into a mixed effect model, predicting the slope of the learning curve with time of day and age group as fixed effects and subjects as random effects:

      learning curve slope ~ AM/PM [AM (0) or PM (1)] + age [adolescent (0) or adult (1)] + (1|subject)

      …or something similar with other regressors of interest. If this is significant for AM/PM status, they should re-try the analysis using only the first session. If this is significant, then a sleep-centric story cannot be defended here at all, in my opinion. If it is not (which could simply result from low power, but the authors could decide this), the authors should decide if they think they can rule out circadian effects and proceed accordingly. I should note that, while to many, a sleep story would be more interesting or compelling, that is not my opinion, and I would not solely opt to reject this paper if it centered a time-of-day story instead.
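For concreteness, the fixed-effects part of the model proposed here could be sketched as follows. This is a toy illustration with synthetic data (all variable names and effect sizes are invented); the per-subject random intercept is omitted for brevity and would in practice be handled by a mixed-model routine such as MATLAB's fitlme or statsmodels' MixedLM:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-session learning-curve slopes (hypothetical values).
n = 40
am_pm = rng.integers(0, 2, n)   # 0 = AM session, 1 = PM session
age = rng.integers(0, 2, n)     # 0 = adolescent, 1 = adult
# Build in a steeper slope for AM sessions, as the review observes.
slope = 1.0 - 0.4 * am_pm + 0.1 * age + rng.normal(0, 0.1, n)

# Fixed-effects design: slope ~ 1 + am_pm + age
X = np.column_stack([np.ones(n), am_pm, age])
beta, *_ = np.linalg.lstsq(X, slope, rcond=None)
print(beta)  # approximately [1.0, -0.4, 0.1]
```

A significant negative `am_pm` coefficient in such a fit would correspond to the time-of-day effect described above.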

      The authors need to work out precisely what is happening in the behavior here, and let the physiology follow that story. They should allow themselves to consider very major revisions (and drop the physiology) if that is most consistent with the data. As presented, I am very unclear of what to take away from the study.

We thank the reviewer for the opportunity to further elaborate on our behavioral results. We agree that the interpretation of the behavior in the complex gross-motor task is not straightforward, which might be partly due to less controllability compared to, for example, finger-tapping tasks. The reviewer is correct that, at first glance, sleep seems to predict more forgetting on initial trials within the subsequent block, given the dip in task proficiency and the resulting increase in steepness of the learning curve after the sleep retention interval. Notably, this dip in performance after sleep has also been reported for finger-tapping tasks (cf. Eichenlaub et al, 2020). The performance dip is also present in the wake-first group (Figure 2) after the first interval. This observation suggests that picking up the task again after a period of time comes at a cost. Interestingly, this performance dip is no longer present after the second retention interval, indicating that the better the task proficiency, the easier it is to pick up juggling again. In other words, juggling has been better consolidated after additional training. Critically, our results show that participants with higher SO-spindle coupling strength have a smaller dip in performance after the retention interval, thus indicating a learning advantage.

      Figure 2

      (A) Number of successful three-ball cascades (mean ± standard error of the mean [SEM]) of adolescents (circles) for the sleep-first (blue) and wake-first group (green) per juggling block. Grand average learning curve (black lines) as computed in (C) are superimposed. Dashed lines indicate the timing of the respective retention intervals that separate the three performance tests. Note that adolescents improve their juggling performance across the blocks. (B) Same conventions as in (A) but for adults (diamonds). Similar to adolescents, adults improve their juggling performance across the blocks regardless of group.

      We discuss the sleep effect on juggling in the discussion section (page 22 – 23, lines 502 – 514):

"How relevant is sleep for real-life gross-motor memory consolidation? We found that sleep impacts the learning curve but did not affect task proficiency in comparison to a wake retention interval (Figure 2DE). Two accounts might explain the absence of a sleep effect on task proficiency. (1) Sleep stabilizes rather than improves gross-motor memory, which is in line with previous gross-motor adaptation studies (Bothe et al, 2019; Bothe et al, 2020). (2) Pre-sleep performance is critical for sleep to improve motor skills (Wilhelm et al, 2012). Participants commonly reach asymptotic pre-sleep performance levels in finger-tapping tasks, which are most frequently used to probe sleep effects on motor memory. Here we found that using a complex juggling task, participants do not reach asymptotic ceiling performance levels in such a short time. Indeed, the learning progression for the sleep-first and wake-first groups followed a similar trend (Figure 2AB), suggesting that more training and not in particular sleep drove performance gains."

      If indeed the authors keep the sleep aspect of this story, here are some comments regarding the physiology. The authors present several nice analyses in Figure 3. However, given the lack of behavioral difference between adolescents and adults (Fig 2D), they combine the groups when investigating behavior-physiology relationships. In some ways, then, Figure 3 has extraneous details to the point of motor learning and retention, and I believe the paper would benefit from more focus. If the authors keep their sleep story, I believe Figure 3 and 4 should be combined and some current figure panels in Figure 3 should be removed or moved to the supplementary information.

      We thank the reviewers for their suggestion and we agree that the figures of our manuscript would benefit from more focus. Therefore, we combined Figure 3 and 4 from the original manuscript into a revised Figure 3 in the updated version of the manuscript. In more detail, subpanels that explain our methodological approach can now be found in Figure 3 – figure supplement 1, while the updated Figure 3 now focuses on developmental changes in oscillatory dynamics and SO-spindle coupling strength as well as their relationship to gross-motor learning.

      Updated Figure 3:

      (A) Left: topographical distribution of the 1/f corrected SO and spindle amplitude as extracted from the oscillatory residual (Figure 3 – figure supplement 1A, right). Note that adolescents and adults both display the expected topographical distribution of more pronounced frontal SO and centro-parietal spindles. Right: single subject data of the oscillatory residual for all subjects with sleep data color coded by age (darker colors indicate older subjects). SO and spindle frequency ranges are indicated by the dashed boxes. Importantly, subjects displayed high inter-individual variability in the sleep spindle range and a gradual spindle frequency increase by age that is critically underestimated by the group average of the oscillatory residuals (Figure 3 – figure supplement 1A, right). (B) Spindle peak locked epoch (NREM3, co-occurrence corrected) grand averages (mean ± SEM) for adolescents (red) and adults (black). Inset depicts the corresponding SO-filtered (2 Hz lowpass) signal. Grey-shaded areas indicate significant clusters. Note, we found no difference in amplitude after normalization. Significant differences are due to more precise SO-spindle coupling in adults. (C) Top: comparison of SO-spindle coupling strength between adolescents and adults. Adults displayed more precise coupling than adolescents in a centro-parietal cluster. T-scores are transformed to z-scores. Asterisks denote cluster-corrected two-sided p < 0.05. Bottom: Exemplary depiction of coupling strength (mean ± SEM) for adolescents (red) and adults (black) with single subject data points. Exemplary single electrode data (bottom) is shown for C4 instead of Cz to visualize the difference. (D) Cluster-corrected correlations between individual coupling strength and overnight task proficiency change (post – pre retention) for adolescents (red, circle) and adults (black, diamond) of the sleep-first group (left, data at C4). Asterisks indicate cluster-corrected two-sided p < 0.05. 
Grey-shaded area indicates 95% confidence intervals of the trend line. Participants with a more precise SO-spindle coordination show improved task proficiency after sleep. Note that the change in task proficiency was inversely related to the change in learning curve (cf. Figure 2D), indicating that a stronger improvement in task proficiency related to a flattening of the learning curve. Further note that the significant cluster formed over electrodes close to motor areas. (E) Cluster-corrected correlations between individual coupling strength and overnight learning curve change. Same conventions as in (D). Participants with more precise SO-spindle coupling over C4 showed attenuated learning curves after sleep.

      and

      Figure 3 - figure supplement 1

      (A) Left: Z-normalized EEG power spectra (mean ± SEM) for adolescents (red) and adults (black) during NREM sleep in semi-log space. Data is displayed for the representative electrode Cz unless specified otherwise. Note the overall power difference between adolescents and adults due to a broadband shift on the y-axis. Straight black line denotes cluster-corrected significant differences. Middle: 1/f fractal component that underlies the broadband shift. Right: Oscillatory residual after subtracting the fractal component (A, middle) from the power spectrum (A, left). Both groups show clear delineated peaks in the SO (< 2 Hz) and spindle range (11 – 16 Hz) establishing the presence of the cardinal sleep oscillations in the signal. (B) Top: Spindle frequency peak development based on the oscillatory residuals. Spindle frequency is faster at all but occipital electrodes in adults than in adolescents. T-scores are transformed to z-scores. Asterisks denote cluster-corrected two-sided p < 0.05. Bottom: Exemplary depiction of the spindle frequency (mean ± SEM) for adolescents (red) and adults (black) with single subject data points at Cz. (C) SO-spindle co-occurrence rate (mean ± SEM) for adolescents (red) and adults (black) during NREM2 and NREM3 sleep. Event co-occurrence is higher in NREM3 (F(1, 51) = 1209.09, p < 0.001, partial eta² = 0.96) as well as in adults (F(1, 51) = 11.35, p = 0.001, partial eta² = 0.18). (D) Histogram of co-occurring SO-spindle events in NREM2 (blue) and NREM3 (purple) collapsed across all subjects and electrodes. Note the low co-occurring event count in NREM2 sleep. (E) Single subject (top) and group averages (bottom, mean ± SEM) for adolescents (red) and adults (black) of individually detected, for SO co-occurrence-corrected sleep spindles in NREM3. Spindles were detected based on the information of the oscillatory residual. 
Note the underlying SO-component (grey) in the spindle detection for single subject data and group averages indicating a spindle amplitude modulation depending on SO-phase. (F) Grand average time frequency plots (-2 to -1.5s baseline-corrected) of SO-trough-locked segments (corrected for spindle co-occurrence) in NREM3 for adolescents (left) and adults (right). Schematic SO is plotted superimposed in grey. Note the alternating power pattern in the spindle frequency range, showing that SO-phase modulates spindle activity in both age groups.

      Why did the authors use Spearman rather than Pearson correlations in Figure 4? Was it to reduce the influence of the outlier subject? They should minimally clarify and justify this, since it is less conventional in this line of research. And it would be useful to know if the relationship is significant with Pearson correlations when robust regression is applied. I see the authors are using MATLAB, and the robustfit toolbox (https://www.mathworks.com/help/stats/robustfit.html) is a simple way to address this issue.

We thank the reviewers for their suggestion. We agree that, when inspecting the scatter plots, it appears that the correlations could be severely influenced by two outliers in the adult group. Because this is an important matter, we recalculated all previously reported correlations without the two outliers (Figure R4, left column), followed the reviewer’s suggestion to also compute robust regressions (Figure R4, right column), and found no substantial deviation from our original results.

In more detail, the increase in task proficiency still resulted in a flattening of the learning curve when removing the outliers (Figure R4A, rhos = -0.70, p < 0.001) and when applying robust regression analysis (Figure R4B, b = -0.30, t(67) = -10.89, rho = -0.80, p < 0.001). Likewise, higher coupling strength still predicted better task proficiency (mean rho = 0.35, p = 0.029, cluster-corrected) and flatter learning curves after sleep (rho = -0.44, p = 0.047, cluster-corrected) when removing the outliers (Figure R4CE) and when calculating robust regressions (Figure R4DF, task proficiency: b = 82.32, t(40) = 3.12, rho = 0.45, p = 0.003; learning curve: b = -26.84, t(40) = -2.96, rho = -0.43, p = 0.005). Furthermore, we had calculated spearman rank correlations and cluster-corrected spearman rank correlations in our original manuscript to mitigate the impact of outliers, even though Pearson correlations are more widely used in the field. Therefore, we still report spearman rank correlations for single electrodes instead of robust correlations, as this is more consistent with the cluster-correlation analyses.
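To illustrate why spearman rank correlations mitigate the impact of outliers: Spearman's rho is simply a Pearson correlation computed on ranks, so a single extreme value can shift any data point by at most one rank step. A minimal pure-Python sketch with toy values (not the study's data):

```python
def rank(xs):
    # Average ranks, handling ties (midrank method, 1-based ranks).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x) ** 0.5
    vy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (vx * vy)

def spearman(x, y):
    # Spearman's rho = Pearson correlation of the rank-transformed data.
    return pearson(rank(x), rank(y))

# Toy data with one extreme outlier in y:
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 100]
print(pearson(x, y), spearman(x, y))
```

Here the outlier drags the Pearson coefficient well below the monotonic association that the rank-based coefficient still captures.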

      We now use robust trend lines instead of linear trend lines in our scatter plots. Further, we added the correlations without outliers (Figure R4ACE) to the supplements as Figure 2 – figure supplement 1D and Figure 3 – figure supplement 2 FG. These additional analyses are now reported in the results section of the revised manuscript (page 9, lines 186 – 191):

      "[…] we confirmed a strong negative correlation between the change (post retention values – pre retention values) in task proficiency and the change in learning curve after the retention interval (Figure 2F; rhos = -0.71, p < 0.001), which also remained strong after outlier removal (Figure 2 – figure supplement 1D). This result indicates that participants who consolidate their juggling performance after a retention interval show slower gains in performance."

      And (page 16, lines 343 – 346):

      "[…] Furthermore, our results remained consistent when including coupled spindle events in NREM2 (Figure 3 – figure supplement 2E) and after outlier removal (Figure 3 – figure supplement 2FG)."

Furthermore, we now state that we specifically utilized spearman rank correlations to mitigate the impact of outliers in our analyses in the method section (page 35, lines 808 – 813):

      "For correlational analyses we utilized spearman rank correlations (rhos; Figure 2F & Figure 3DE) to mitigate the impact of possible outliers as well as cluster-corrected spearman rank correlations by transforming the correlation coefficients to t-values (p < 0.05) and clustering in the space domain (Figure 3DE). Linear trend lines were calculated using robust regression."
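For illustration, the robust regression used for the trend lines (MATLAB's robustfit) is based on iteratively reweighted least squares with bisquare weights. Below is a simplified Python sketch of that scheme on toy data (not the study's values, and not robustfit's exact implementation):

```python
import numpy as np

def robust_fit(x, y, n_iter=50, tune=4.685):
    """Simplified iteratively reweighted least squares with Tukey
    bisquare weights (the idea behind MATLAB's robustfit)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        r = y - X @ beta                                   # residuals
        s = np.median(np.abs(r - np.median(r))) / 0.6745   # robust scale (MAD)
        u = np.clip(r / (tune * max(s, 1e-12)), -1.0, 1.0)
        w = np.sqrt((1.0 - u ** 2) ** 2)                   # sqrt of bisquare weights
        beta = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
    return beta

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.2, 50)
y[-1] += 30.0                                  # one gross outlier
b_ols = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)[0]
b_rob = robust_fit(x, y)
print(b_ols[1], b_rob[1])  # OLS slope is pulled up; robust slope stays near 0.5
```

In line with the check reported above, the robust slope is essentially unchanged by the outlier while the ordinary least-squares slope is not.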

      Figure R4

      (A) Spearman rank correlation between task proficiency change and learning curve change collapsed across adolescents (red dot) and adults (black diamonds) after removing two outlier subjects in the adult age group. Grey-shaded area indicates 95% confidence intervals of the robust trend line. (B) Robust regression of task proficiency change and learning curve change of the original sample. (C) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) after outlier removal (left, spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. (D) Robust regression of coupling strength at C4 and task proficiency of the original sample. (E) Same conventions as in (C) but for overnight learning curve change. (F) Same conventions as in (D) but for overnight learning curve change.

      Additionally, with only a single night of recording data, it is impossible to disentangle possible trait-based sleep characteristics (e.g., Subject 1 has high SO-spindle coupling in general and retains motor memories well, but these are independent of each other) from a specific, state-based account (e.g., Subject 1's high SO-spindle coupling on night 1 specifically led to their improved retention or change in learning, etc., and this is unrelated to their general SO-spindle coupling or motor performance abilities). Clearly, many studies face this limitation, but this should be acknowledged.

      We thank the reviewers for their important remark. We agree that it is impossible to make a sound statement about whether our reported correlations represent trait- or state-based aspects of the sleep and learning relationship with the data that we have reported in the manuscript. However, while we are lacking a proper baseline condition without any task engagement, we still recorded polysomnography for all subjects during an adaptation night. Given the expected pronounced differences in sleep architecture between the adaptation nights and learning nights (see Table R3 for an overview collapsed across both age groups), we initially refrained from entering data from the adaptation nights into our original analyses, but we now fully report the data below. Note that the differences are driven by the adaptation night, where subjects first have to adjust to sleeping with attached EEG electrodes in a sleep laboratory.

Table R3. Sleep architecture (mean ± standard deviation) for the adaptation and learning night collapsed across both age groups. Nights were compared using paired t-tests.

To further clarify whether subjects with high coupling strength have a motor learning advantage (i.e. trait-effect) or whether a learning-induced enhancement of coupling strength is indicative of improved overnight memory change (i.e. state-effect), we ran additional analyses using the data from the adaptation night. Note that the coupling strength metric was not impacted by differences in event number, and our correlations with behavior were not influenced by sleep architecture (please refer to our answer to issue #7 for the results). Therefore, we considered it appropriate to also utilize data from the adaptation night.

First, we correlated SO-spindle coupling strength obtained from the adaptation night with the coupling strength in the learning night. We found that, overall, coupling strength is highly correlated between the two measurements (mean rho across all channels = 0.55, Figure R5A), supporting the notion that coupling strength remains rather stable within the individual (i.e. trait), similar to what has been reported about the stable nature of sleep spindles as a “neural fingerprint” (De Gennaro & Ferrara, 2003; De Gennaro et al, 2005; Purcell et al, 2017).
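For reference, the coupling strength metric correlated here between nights is, in this line of work, typically the mean resultant vector length of the SO phases read out at the spindle amplitude peaks (cf. Hahn et al, 2020). A minimal sketch with synthetic phases (not the study's data):

```python
import cmath
import math
import random

def coupling_strength(phases):
    """Mean resultant vector length of SO phases (radians) at spindle
    peaks: 0 = phases uniform (no coupling), 1 = perfectly coupled."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)

random.seed(1)
# Precise coupling: spindle peaks cluster tightly around one SO phase.
tight = [random.gauss(0.0, 0.3) for _ in range(200)]
# Imprecise coupling: spindle peaks land at arbitrary SO phases.
loose = [random.uniform(-math.pi, math.pi) for _ in range(200)]
print(coupling_strength(tight), coupling_strength(loose))
```

Because the metric is a normalized vector length, it is bounded between 0 and 1 regardless of how many events enter the average.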

To investigate a possible state-effect of coupling strength on motor learning, we calculated the difference in coupling strength between the two nights (learning night – adaptation night) and correlated these values with the overnight change in task proficiency and learning curve. We identified no significant correlations with a learning-induced coupling strength change, neither for task proficiency nor for learning curve change (Figure R5B). Note that there was a positive correlation of coupling strength change with overnight task proficiency change at Cz (Figure R5B, left); however, it did not survive cluster-corrected correlational analysis (rhos = 0.34, p = 0.15). Combined, these results favor the conclusion that our correlations between coupling strength and learning reflect a trait-like rather than a state-like relationship. This is in line with the interpretation of our previous studies that SO-spindle coupling strength reflects the efficiency and integrity of the neuronal pathway between neocortex and hippocampus that is paramount for memory networks and the information transfer during sleep (Hahn et al, 2020; Helfrich et al, 2019; Helfrich et al, 2018; Winer et al, 2019). For a comprehensive review, please see Helfrich et al (2021), which argued that SO-spindle coupling predicts the integrity of memory pathways and therefore correlates with various metrics of behavioral performance or structural integrity.

      Figure R5

(A) Topographical plot of spearman rank correlations between coupling strength in the adaptation night and the learning night across all subjects. Overall, coupling strength was highly correlated between the two measurements. (B) Cluster-corrected correlation between learning-induced coupling strength changes (learning night – adaptation night) and overnight change in task proficiency (left) as well as learning curve (right). We found no significant clusters, although the correlations showed trends similar to our original analyses, with larger learning-induced changes in coupling strength relating to better overnight task proficiency and flattened learning curves.

      We have now added the additional state-trait analyses (Figure R5) to the updated manuscript as Figure 3 – figure supplement 2HI and report them in the results section (page 17, lines 361 – 375):

"Finally, we investigated whether subjects with high coupling strength have a gross-motor learning advantage (i.e. trait-effect) or whether a learning-induced enhancement of coupling strength is indicative of improved overnight memory change (i.e. state-effect). First, we correlated SO-spindle coupling strength obtained from the adaptation night with the coupling strength in the learning night. We found that, overall, coupling strength is highly correlated between the two measurements (mean rho across all channels = 0.55, Figure 3 – figure supplement 2H), supporting the notion that coupling strength remains rather stable within the individual (i.e. trait). Second, we calculated the difference in coupling strength between the learning night and the adaptation night to investigate a possible state-effect. We found no significant cluster-corrected correlations between coupling strength change and task proficiency change as well as learning curve change (Figure 3 – figure supplement 2I).

Collectively, these results indicate that regionally specific SO-spindle coupling over central EEG sensors encompassing sensorimotor areas precisely indexes learning of a challenging motor task."

      We further refer to these new results in the discussion section (page 23, lines 521 – 528):

      "Moreover, we found that SO-spindle coupling strength remains remarkably stable between two nights, which also explains why a learning-induced change in coupling strength did not relate to behavior (Figure 3 – figure supplement 2I). Thus, our results primarily suggest that strength of SO-spindle coupling correlates with the ability to learn (trait), but does not solely convey the recently learned information. This set of findings is in line with recent ideas that strong coupling indexes individuals with highly efficient subcortical-cortical network communication (Helfrich et al, 2021)."

Additionally, we now provide the descriptive data of the adaptation and learning night (Table R3) in the Supplementary file – table 1 and explicitly mention the adaptation night in the results section, which was previously only mentioned in the method section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)."

      Reviewer #2 (Public Review):

      In this study Hahn and colleagues investigate the role of Slow-oscillation spindle coupling for motor memory consolidation and the impact of brain maturation on these interactions. The authors employed a real-life gross-motor task, where adolescents and adults learned to juggle. They demonstrate that during post-learning sleep SO-spindles are stronger coupled in adults as compared to adolescents. The authors further show, that the strength of SO-spindle coupling correlates with overnight changes in the learning curve and task proficiency, indicating a role of SO-spindle coupling in motor memory consolidation.

Overall, the topic and the results of the present study are interesting and timely. The authors employed state-of-the-art analyses, carefully taking the general variability of oscillatory features into account. It also has to be acknowledged that the authors moved away from using rather artificial lab tasks to study the consolidation of motor memories (as is standard in the field), adding ecological validity to their findings. However, some features of their analyses need further clarification.

      We thank the reviewer for their positive assessment of our manuscript. Incorporating the encouraging and helpful feedback, we believe that we substantially improved the clarity and robustness of our analyses.

      1) Supporting and extending previous work of the authors (Hahn et al, 2020), SO-spindle coupling over centro-parietal areas was stronger in adults as compared to adolescents. Despite these differences in the EEG results the authors collapsed the data of adults and adolescents for their correlational analyses (Fig. 4a and 4b). Why would the authors think that this procedure is viable (also given the fact that different EEG systems were used to record the data)?

      We thank the reviewers for the opportunity to clarify why we think it is viable to collapse the data of adolescents and adults for our correlational analyses. In the following we split our answers based on the two points raised by the reviewers: (1) electrophysiological differences (i.e. coupling strength) between the groups and (2) potential signal differences due to different EEG systems.

      1. Electrophysiological differences

Upon inspecting the original Figure 4, it is apparent that the coupling strength values of the combined sample do not form isolated clusters for each age group. In other words, while adult coupling strength is on the higher and adolescent coupling strength on the lower end, owing to the developmental increase in coupling strength we reported in the original Figure 3F, both samples overlap, forming a linear trend. Second, when running the correlational analyses between coupling strength and task proficiency as well as learning curve separately for each age group, we found that they follow the same direction (Figure R3). Adolescents with higher coupling strength show better task proficiency (Figure R3A, rhos = 0.66, p = 0.005). This effect was also present when using robust regression (b = 109.97, t(15) = 3.13, rho = 0.63, p = 0.007). Like adolescents, adults with higher coupling strength at C4 displayed better task proficiency after sleep (Figure R3B, rhos = 0.39, p = 0.053). This relationship was stronger when using robust regression (b = 151.36, t(23) = 3.17, rho = 0.56, p = 0.004). For learning curves, we found the expected negative correlation at C4 for adolescents (Figure R3C, rhos = -0.57, p = 0.020) and adults (Figure R3D, rhos = -0.44, p = 0.031). Results were comparable when using robust regression (adolescents: b = -59.58, t(15) = -2.94, rho = -0.60, p = 0.010; adults: b = -21.99, t(23) = -1.71, rho = -0.37, p = 0.101).

Taken together, these results demonstrate that adolescents and adults show the same effects in the same direction at the same electrode, making it highly unlikely that our results arose by chance or that our initial correlation analyses were driven by one group alone.

Additionally, we had already controlled for age in our original analyses using partial correlations (also refer to our answer to issue #6). Hence, these analyses provide further support that it is viable to collapse the data across both age groups even though they differ in coupling strength.

    2. Different EEG systems

      The reviewers also raise the question of whether our analyses might be impacted by the different EEG systems we used to record our data. This is an important concern, especially considering that cross-frequency coupling analyses can be severely confounded by differences in signal properties (Aru et al, 2015). In our sample, the strongest factor impacting signal properties is most likely age, given the broadband power differences in the power spectrum we found between the groups (original Figure 3A). Importantly, we also found a similar systematic power difference in our longitudinal study using the same ambulatory EEG system for both data recordings (Hahn et al, 2020). This is in line with numerous other studies demonstrating age-related EEG power changes in broadband as well as SO and sleep spindle frequency ranges (Campbell & Feinberg, 2016; Feinberg & Campbell, 2013; Helfrich et al, 2018; Kurth et al, 2010; Muehlroth et al, 2019; Muehlroth & Werkle-Bergner, 2020; Purcell et al, 2017). Therefore, we already had to take differences in signal properties into account for our cross-frequency analyses, regardless of whether the underlying cause is an age difference or different signal-to-noise ratios of different EEG systems.

To mitigate confounds in the signal, we used a data-driven and individualized approach, detecting SO and sleep spindle events based on individualized frequency bands and a 75th-percentile amplitude criterion relative to the underlying signal. Additionally, we z-normalized all spindle events prior to the cross-frequency coupling analyses (Figure R3E). Using cluster-based random permutation testing, we found no amplitude differences around the spindle peak (the point of SO-phase readout) between adolescents, who were recorded with an ambulatory amplifier system (alphatrace), and adults, who were recorded with a stationary amplifier system (neuroscan). This was also the case for the SO-filtered (< 2 Hz) signal (Figure R3E, inset). Critically, the significant differences in amplitude from -1.4 to -0.8 s (p = 0.023, d = -0.73) and 0.4 to 1.5 s (p < 0.001, d = 1.1) are not caused by age-related differences in power or by different EEG systems, but instead by the increased coupling strength (i.e. higher coupling precision of spindles to SOs) in adults, giving rise to a more pronounced SO wave shape when averaging across spindle-peak-locked epochs.

      Consequently, our analysis pipeline already controlled for possible differences in signal properties introduced by the different amplifier systems. Nonetheless, we also wanted to directly compare the signal-to-noise ratio of the ambulatory and stationary amplifier systems. However, we only obtained data from both amplifier systems in the adult sleep first group, because we recorded EEG during the juggling learning phase with the ambulatory system in addition to the PSG with the stationary system. First, we computed the power spectra in the 1 to 49 Hz frequency range during the juggling learning phase (ambulatory) and during quiet wakefulness (stationary) for every subject in the adult sleep first group in 10-second segments. Next, we computed the signal-to-noise ratio (mean/standard deviation) of the power spectra per frequency across all segments. We only found a small negative cluster from 21.9 to 22.5 Hz (p = 0.042, d = 0.53; Figure R3F), which did not overlap with our frequency bands of interest. Critically, the signal-to-noise ratios of both amplifiers converged in the upper frequency bands as they approached the noise floor, strongly supporting the notion that both systems provided highly comparable estimates.
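      The signal-to-noise computation described above can be sketched as follows (a simplified illustration; the exact spectral estimator and windowing are assumptions of this sketch, here a plain periodogram per non-overlapping 10-s segment):

```python
import numpy as np
from scipy.signal import periodogram

def snr_spectrum(eeg, fs, seg_len_s=10.0, fmin=1.0, fmax=49.0):
    """Per-frequency signal-to-noise ratio, defined as the mean power
    divided by the standard deviation of power across fixed-length
    segments (assumed: one periodogram per non-overlapping segment)."""
    n = int(seg_len_s * fs)
    n_segments = len(eeg) // n
    spectra = []
    for i in range(n_segments):
        # power spectrum of one 10-s segment
        freqs, pxx = periodogram(eeg[i * n:(i + 1) * n], fs=fs)
        spectra.append(pxx)
    spectra = np.asarray(spectra)
    mask = (freqs >= fmin) & (freqs <= fmax)
    snr = spectra[:, mask].mean(axis=0) / spectra[:, mask].std(axis=0)
    return freqs[mask], snr
```

Comparing the resulting per-frequency SNR curves of the two amplifier systems is then a standard cluster-based permutation test across frequencies.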

      In conclusion, both age groups display highly similar effects, in the same direction, when correlating coupling strength with behavior. Further, after individualizing and normalizing the analytical signal, we found no differences in signal properties that would confound the cross-frequency analysis. Lastly, we did not find systematic differences in signal-to-noise ratio between the different EEG systems. Thus, we believe it is justified to collapse the data across all participants for the correlational analyses, as this combines both the developmental aspect of enhanced coupling precision from adolescence to adulthood and the behavioral relevance for motor learning, which we deem a critical research advance over our previous study.

      Figure R3

      (A) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) for adolescents of the sleep-first group (left, spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. Grey-shaded area indicates 95% confidence intervals of the robust trend line. Participants with a more precise SO-spindle coordination show improved task proficiency after sleep. (B) Cluster-corrected correlation of coupling strength and overnight task proficiency change for adults. Same conventions as in (A). A similar trend of higher coupling strength predicting better task proficiency after sleep. (C) Cluster-corrected correlation of coupling strength and overnight learning curve change for adolescents. Same conventions as in (A). Higher coupling strength related to a flatter learning curve after sleep. (D) Cluster-corrected correlation of coupling strength and overnight learning curve change for adults. Same conventions as in (A). Higher coupling strength related to a flatter learning curve after sleep. (E) Spindle-peak-locked epoch (NREM3, co-occurrence corrected) grand averages (mean ± SEM) for adolescents (red) and adults (black). Inset depicts the corresponding SO-filtered (< 2 Hz lowpass) signal. Black lines indicate significant clusters. Note that we found no difference in amplitude around the spindle peak after normalization; the significant differences are due to more precise SO-spindle coupling in adults. Spindle frequency is blurred due to the individualized spindle detection. (F) Signal-to-noise ratio for the stationary EEG amplifier (green) during quiet wakefulness and for the ambulatory EEG amplifier (purple) during juggling training. Grey-shaded area denotes cluster-corrected p < 0.05. Note that the signal-to-noise ratio converges in the higher frequency ranges.

      We have now added Figure R3E as Figure 3B to the revised version of the manuscript to demonstrate that there were no systematic differences in the analytical signal between the two age groups due to the expected age-related power differences or the different EEG systems. Specifically, we now state in the results section (page 13 – 14, lines 282 – 294):

      "We assessed the cross frequency coupling based on z-normalized spindle epochs (Figure 3B) to alleviate potential power differences due to age (Figure 3 – figure supplement 1A) or different EEG-amplifier systems that could potentially confound our analyses (Aru et al, 2015). Importantly, we found no amplitude differences around the spindle peak (point of SO-phase readout) between adolescents and adults using cluster-based random permutation testing (Figure 3B), indicating an unbiased analytical signal. This was also the case for the SO-filtered (< 2 Hz) signal (Figure 3B, inset). Critically, the significant differences in amplitude from -1.4 to -0.8 s (p = 0.023, d = -0.73) and 0.4 to 1.5 s (p < 0.001, d = 1.1) are not caused by age related differences in power or different EEG-systems but instead by the increased coupling strength (i.e. higher coupling precision of spindles to SOs) in adults giving rise to a more pronounced SO-wave shape when averaging across spindle peak locked epochs."

      Further, we added the correlational analyses that we computed separately for the age groups (Figure R3A-D) to the revised manuscript (Figure 3 – figure supplement 2CD) as they further substantiate our claims about the relationship between SO-spindle coupling and gross-motor learning.

      We now refer to these analyses in the results section (page 16, lines 338 – 343):

      "Critically, when computing the correlational analyses separately for adolescents and adults, we identified highly similar effects at electrode C4 for task proficiency (Figure 3 – figure supplement 2C) and learning curve (Figure 3 – figure supplement 2D) in each group. These complementary results demonstrate that coupling strength predicts gross-motor learning dynamics in both adolescents and adults, and further show that this effect is not solely driven by one group."

      2) The authors might want to explicitly show that the reported correlations (with regards to both learning curve and task proficiency change) are not driven by any outliers.

      We thank the reviewers for their suggestion. We agree that, when inspecting the scatter plots, it appears that the correlations could be severely influenced by two outliers in the adult group. Because this is an important matter, we recalculated all previously reported correlations without the two outliers (Figure R4, left column), followed the reviewers' suggestion to also compute robust regressions (Figure R4, right column), and found no substantial deviation from our original results.

      In more detail, the increase in task proficiency still resulted in a flattening of the learning curve when removing the outliers (Figure R4A, rhos = -0.70, p < 0.001) and when applying robust regression analysis (Figure R4B, b = -0.30, t(67) = -10.89, rho = -0.80, p < 0.001). Likewise, higher coupling strength still predicted better task proficiency (mean rho = 0.35, p = 0.029, cluster-corrected) and flatter learning curves after sleep (rho = -0.44, p = 0.047, cluster-corrected) when removing the outliers (Figure R4CE) and when calculating robust regressions (Figure R4DF, task proficiency: b = 82.32, t(40) = 3.12, rho = 0.45, p = 0.003; learning curve: b = -26.84, t(40) = -2.96, rho = -0.43, p = 0.005). Furthermore, we had calculated spearman rank correlations and cluster-corrected spearman rank correlations in our original manuscript specifically to mitigate the impact of outliers, even though Pearson correlations are more widely used in the field. Therefore, we still report spearman rank correlations for single electrodes instead of robust correlations, as this is more consistent with the cluster-correlation analyses.
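      A minimal sketch of this outlier check: Spearman correlations with and without flagged points, plus a robust trend estimate. Here a Theil-Sen slope stands in for the robust regression reported in the response, whose exact estimator is not specified in this passage:

```python
import numpy as np
from scipy import stats

def corr_with_outlier_check(x, y, drop_idx=()):
    """Spearman rank correlation with and without flagged outliers,
    plus a Theil-Sen slope/intercept as a robust trend estimate
    (a stand-in for the response's robust regression)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    rho_all, p_all = stats.spearmanr(x, y)
    keep = np.ones(len(x), dtype=bool)
    keep[list(drop_idx)] = False              # indices flagged as outliers
    rho_clean, p_clean = stats.spearmanr(x[keep], y[keep])
    slope, intercept, _, _ = stats.theilslopes(y, x)
    return {"rho_all": rho_all, "p_all": p_all,
            "rho_clean": rho_clean, "p_clean": p_clean,
            "slope": slope, "intercept": intercept}
```

Because both the rank correlation and the Theil-Sen fit depend only weakly on extreme points, agreement between the "all" and "clean" estimates is itself evidence that the reported effects are not outlier-driven.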

      We now use robust trend lines instead of linear trend lines in our scatter plots. Further, we added the correlations without outliers (Figure R4ACE) to the supplements as Figure 2 – figure supplement 1D and Figure 3 – figure supplement 2FG. These additional analyses are now reported in the results section of the revised manuscript (page 9, lines 186 – 191):

      "[…] we confirmed a strong negative correlation between the change (post retention values – pre retention values) in task proficiency and the change in learning curve after the retention interval (Figure 2F; rhos = -0.71, p < 0.001), which also remained strong after outlier removal (Figure 2 – figure supplement 1D). This result indicates that participants who consolidate their juggling performance after a retention interval show slower gains in performance."

      And (page 16, lines 343 – 346):

      "[…] Furthermore, our results remained consistent when including coupled spindle events in NREM2 (Figure 3 – figure supplement 2E) and after outlier removal (Figure 3 – figure supplement 2FG)."

      Furthermore, we now state in the method section that we specifically utilized spearman rank correlations to mitigate the impact of outliers in our analyses (page 35, lines 808 – 813):

      "For correlational analyses we utilized spearman rank correlations (rhos; Figure 2F & Figure 3DE) to mitigate the impact of possible outliers as well as cluster-corrected spearman rank correlations by transforming the correlation coefficients to t-values (p < 0.05) and clustering in the space domain (Figure 3DE). Linear trend lines were calculated using robust regression."

      Figure R4:

      (A) Spearman rank correlation between task proficiency change and learning curve change collapsed across adolescents (red dot) and adults (black diamonds) after removing two outlier subjects in the adult age group. Grey-shaded area indicates 95% confidence intervals of the robust trend line. (B) Robust regression of task proficiency change and learning curve change of the original sample. (C) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) after outlier removal (left, spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. (D) Robust regression of coupling strength at C4 and task proficiency of the original sample. (E) Same conventions as in (C) but for overnight learning curve change. (F) Same conventions as in (D) but for overnight learning curve change.

      3) The sleep data of all participants (thus from both sleep first and wake first) were used to determine the features of SO-spindle coupling in adolescents and adults. Were there any differences between groups (sleep first vs. wake first)? This might be interesting in general, but especially because only data of the sleep first group entered the subsequent correlational analyses.

      We thank the reviewers for their remark. We agree that additional information about possible differences between the sleep first and wake first groups allows for a more comprehensive assessment of the reported data, and we did not explain our rationale for including only the sleep first groups in the correlational analyses clearly enough in the original manuscript. Unfortunately, we can only report such data for the adolescents in our sample, because we did not record polysomnography (PSG) for the adult wake first group. This is also one of the two reasons why we focused on the sleep first groups for our correlational analyses.

      Adolescents in the sleep first group did not differ from adolescents in the wake first group in terms of sleep architecture (except REM (%), which did not correlate with behavior [task proficiency: rho = -0.17, p = 0.28; learning curve: rho = -0.02, p = 0.90]) or in terms of SO and sleep spindle event descriptive measures (see Table R2). Importantly, we found no differences in coupling strength between the two groups (Figure R2A).

      Table R2. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents in the sleep first and wake first group (mean ± standard deviation). Independent t-tests were used for comparisons.

      The second reason why we focused our analyses on the sleep first groups was that adolescents in the wake first group had higher task proficiency after the sleep retention interval than the sleep first group (Figure R2B; t(23) = -2.24, p = 0.034). This difference in performance is directly explained by the additional juggling test that the wake first group performed at the time point of their learning night, which should be considered additional training. Therefore, we excluded the wake first group from our correlational analyses, because the sleep first and wake first groups are not comparable in terms of juggling training during the night in which we assessed SO-spindle coupling strength.

      Figure R2

      (A) Comparison of SO-spindle coupling strength in the adolescent sleep first (blue) and wake first (green) group using cluster-based random permutation testing (Monte-Carlo method, cluster alpha 0.05, max size criterion, 1000 iterations, critical alpha level 0.05, two-sided). Left: exemplary depiction of coupling strength at electrode C4 (mean ± SEM). Right: z-transformed t-values plotted for all electrodes obtained from the cluster test. No significant clusters emerged. (B) Comparison of task proficiency between sleep first and wake first group after the sleep retention interval (mean ± SEM). Adolescents in the wake first group had higher task proficiency given the additional juggling performance test, which also reflects additional training.

      These additional analyses (Figure R2) and the summary statistics of sleep architecture and SO/spindle event descriptives of adolescents in the sleep first and wake first group (Table R2) are now reported in the revised version of the manuscript as Figure 3 – figure supplement 2AB and Supplementary file – table 7. We now explicitly explain our rationale for considering only participants in the sleep first group for our correlational analyses in the results section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)."

      And (page 15, lines 311 – 320):

      "[…] Furthermore, given that we only recorded polysomnography for the adults in the sleep first group and that adolescents in the wake first group showed enhanced task proficiency at the time point of the sleep retention interval due to additional training (Figure 3 – figure supplement 2A), we only considered adolescents and adults of the sleep-first group to ensure a similar level of juggling experience (for summary statistics of sleep architecture and SO and spindle events of subjects that entered the correlational analyses see Supplementary file – table 6). Notably, we found no differences in electrophysiological parameters (i.e. coupling strength, event detection) between the adolescents of the wake first and sleep first group (Figure 3 – figure supplement 2B & Supplementary file – table 7)."

      4) To allow a more comprehensive assessment of the underlying data information with regards to general sleep descriptives (minutes, per cent of time spent in different sleep stages, overall sleep time etc.) as well as related to SOs, spindles and coupled events (e.g. number, density etc.) would be needed.

      We agree with the reviewers that additional information about sleep architecture and SO as well as sleep spindle characteristics is needed for a more comprehensive assessment of our data. We have now added summary tables of sleep architecture and SO/spindle event descriptive measures for the whole sample (Table R4) and for the sleep first groups that we used for our correlational analyses (Table R5) to the supplementary material of the updated manuscript. It is important to note that, due to the longer sleep opportunity we provided to adolescents to accommodate the overall higher sleep need of younger participants, adolescents and adults differed in most general sleep architecture markers and in SO as well as sleep spindle descriptive measures. In addition, changes in sleep architecture are prominent during the maturational phase from adolescence to adulthood, which might introduce additional variance between the two age groups.

      Table R4. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents and adults across the whole sample (mean ± standard deviation) in the learning night. Independent t-tests were used for comparisons.

      Table R5. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents and adults in the sleep first group (mean ± standard deviation) in the learning night. Independent t-tests were used for comparisons.

      In order to ensure that our correlational analyses were not driven by these systematic differences between the two age groups, we used cluster-corrected partial correlations to control for sleep architecture markers (Figure R7) and SO/spindle descriptive measures (Figure R8A). Critically, none of these possible confounders changed the pattern of our initial correlational analyses of coupling strength and task proficiency/learning curve. Additionally, we controlled for differences in spindle event number using a bootstrapped resampling approach: we randomly drew 200 spindle events in each of 100 iterations and subsequently recalculated the coupling strength for each subject. We found that the resampled values and our original estimates of coupling strength were almost perfectly correlated, indicating that differences in event number are unlikely to have an impact on coupling strength as long as at least 200 events are present (Figure R8B). Combined, these analyses demonstrate that our correlations between coupling strength and behavior are not influenced by the reported differences in sleep architecture and SO/spindle descriptive measures.
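      The resampling control can be sketched as follows, assuming coupling strength is quantified as the mean resultant vector length of the SO phases observed at spindle peaks (a common choice, not restated in this passage) and that events are drawn without replacement:

```python
import numpy as np

def coupling_strength(phases):
    """Coupling strength as mean resultant vector length of the SO
    phases at spindle peaks (assumed metric)."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(phases)))))

def resampled_coupling(phases, n_draw=200, n_iter=100, seed=None):
    """Recompute coupling strength from n_iter random subsamples of
    n_draw events (drawn without replacement, an assumption of this
    sketch) and return the mean across iterations."""
    rng = np.random.default_rng(seed)
    phases = np.asarray(phases)
    vals = [coupling_strength(rng.choice(phases, size=n_draw, replace=False))
            for _ in range(n_iter)]
    return float(np.mean(vals))
```

If the resampled estimate tracks the full-sample estimate across subjects, differences in spindle event number cannot explain individual differences in coupling strength.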

      Figure R7

      Summary of cluster-corrected partial correlations of coupling strength with task proficiency (left) and learning curve (right) controlling for possible confounding factors. Asterisks indicate location of the detected cluster. The pattern of initial results remained highly stable.

      Figure R8

      (A) Summary of cluster-corrected partial correlations of coupling strength with task proficiency (left) and learning curve (right), controlling for SO/spindle descriptive measures at the critical electrode C4. Asterisks indicate the location of the detected cluster. The pattern of the initial results remained highly stable. (B) Spearman correlation between resampled coupling strength (N = 200, 100 iterations) and the original estimate of coupling strength for adolescents (red circles) and adults (black diamonds), indicating that coupling strength is not influenced by spindle event number if at least 200 events are present. Grey-shaded area indicates 95% confidence intervals of the robust trend line.

      We now provide general sleep descriptives (Table R4 & R5) in the revised version of the manuscript as Supplementary file – table 2 & table 6. These data are referred to in the results section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)."

      And (page 15, lines 311 – 318):

      "Furthermore, given that we only recorded polysomnography for the adults in the sleep first group and that adolescents in the wake first group showed enhanced task proficiency at the time point of the sleep retention interval due to additional training (Figure 3 – figure supplement 2A), we only considered adolescents and adults of the sleep-first group to ensure a similar level of juggling experience (for summary statistics of sleep architecture and SO and spindle events of subjects that entered the correlational analyses see Supplementary file – table 6)."

      The additional control analyses (Figure R7 & R8) are also now added to the revised manuscript as Figure 3 – figure supplement 3 & 4 in the results section (page 16, lines 356 – 360):

      "For a summary of the reported cluster-corrected partial correlations as well as analyses controlling for differences in sleep architecture see Figure 3 – figure supplement 3. Further, we also confirmed that our correlations are not influenced by individual differences in SO and spindle event parameters (Figure 3 – figure supplement 4)."

      5) The authors used partial correlations to rule out that age drove the relationship between coupling strength, learning curve and task proficiency. It seems like this analysis was done specifically for electrode C4, after having already established that coupling strength at electrode C4 correlates in general with changes in the learning curve and task proficiency. I think the claim that results were not driven by age as confounding factor would be stronger if the authors used a cluster-corrected partial correlation in the first place (just as in the main analysis).

      The reviewers are correct that we initially conducted the partial correlation only for electrode C4. Following the reviewers' suggestion, we have now additionally computed cluster-corrected partial correlations analogous to our main analysis. As in our original analyses, we found a significant positive central cluster (Figure R6A, mean rho = 0.40, p = 0.017), showing that higher coupling strength was related to better task proficiency after sleep, and a negative cluster-corrected correlation at C4, showing that higher coupling strength was related to flatter learning curves after sleep (Figure R6B, rho = -0.47, p = 0.049), also when controlling for age.
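      A partial rank correlation of this kind can be sketched by rank-transforming all variables, regressing out the covariate (age), and correlating the residuals (a simplified single-electrode illustration; the cluster correction across electrodes is omitted):

```python
import numpy as np
from scipy import stats

def partial_spearman(x, y, z):
    """Spearman partial correlation of x and y controlling for z:
    rank-transform all variables, linearly regress the ranks of x and
    y on the ranks of z, then correlate the residuals."""
    rx, ry, rz = (stats.rankdata(v) for v in (x, y, z))

    def residualize(a, b):
        # remove the linear dependence of ranks a on ranks b
        slope, intercept = np.polyfit(b, a, 1)
        return a - (slope * b + intercept)

    return stats.pearsonr(residualize(rx, rz), residualize(ry, rz))
```

A coupling-behavior correlation that survives this residualization cannot be attributed to a shared dependence on age.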

      Figure R6

      (A) Cluster-corrected partial correlation of individual coupling strength in the learning night and overnight change in task proficiency (post – pre retention) collapsed across adolescents and adults, controlling for age. Asterisks indicate cluster-corrected two-sided p < 0.05. A similar significant cluster to the original analysis (Figure 4A) emerged comprising electrodes Cz and C4. (B) Same conventions as in A. Like in the original analysis (Figure 4B) a negative correlation between coupling strength at C4 and learning curve change survived cluster-corrected partial correlations when controlling for age.

      We now always report cluster-corrected partial correlations when controlling for possible confounding variables in the updated version of the manuscript (also see answer to issue #7). A summary of all computed partial correlations including Figure R6 can now be found as Figure 3 – figure supplement 3 & 4 in the revised manuscript.

      Specifically, we now state in the results section (page 16 – 17, lines 347 – 360):

      "To rule out age as a confounding factor that could drive the relationship between coupling strength, learning curve and task proficiency in the mixed sample, we used cluster-corrected partial correlations to confirm their independence of age differences (task proficiency: mean rho = 0.40, p = 0.017; learning curve: rhos = -0.47, p = 0.049). Additionally, given that we found that juggling performance could underlie a circadian modulation we controlled for individual differences in alertness between subjects due to having just slept. We partialed out the mean PVT reaction time before the juggling performance test after sleep from the original analyses and found that our results remained stable (task proficiency: mean rho = 0.37, p = 0.025; learning curve: rhos = -0.49, p = 0.040). For a summary of the reported cluster-corrected partial correlations as well as analyses controlling for differences in sleep architecture see Figure 3 – figure supplement 3. Further, we also confirmed that our correlations are not influenced by individual differences in SO and spindle event parameters (Figure 3 – figure supplement 4)."

      And in the methods section (page 35, lines 813 – 814):

      "To control for possible confounding factors we computed cluster-corrected partial rank correlations (Figure 3 – figure supplement 3 and 4)."

      References

      Aru, J., Aru, J., Priesemann, V., Wibral, M., Lana, L., Pipa, G., Singer, W. & Vicente, R. (2015) Untangling cross-frequency coupling in neuroscience. Curr Opin Neurobiol, 31, 51-61.

      Bothe, K., Hirschauer, F., Wiesinger, H. P., Edfelder, J., Gruber, G., Birklbauer, J. & Hoedlmoser, K. (2019) The impact of sleep on complex gross-motor adaptation in adolescents. Journal of Sleep Research, 28(4).

      Bothe, K., Hirschauer, F., Wiesinger, H. P., Edfelder, J. M., Gruber, G., Hoedlmoser, K. & Birklbauer, J. (2020) Gross motor adaptation benefits from sleep after training. J Sleep Res, 29(5), e12961.

      Campbell, I. G. & Feinberg, I. (2016) Maturational Patterns of Sigma Frequency Power Across Childhood and Adolescence: A Longitudinal Study. Sleep, 39(1), 193-201.

      Dayan, E. & Cohen, L. G. (2011) Neuroplasticity subserving motor skill learning. Neuron, 72(3), 443-54.

      De Gennaro, L. & Ferrara, M. (2003) Sleep spindles: an overview. Sleep Med Rev, 7(5), 423-40.

      De Gennaro, L., Ferrara, M., Vecchio, F., Curcio, G. & Bertini, M. (2005) An electroencephalographic fingerprint of human sleep. Neuroimage, 26(1), 114-22.

      Dinges, D. F., Pack, F., Williams, K., Gillen, K. A., Powell, J. W., Ott, G. E., Aptowicz, C. & Pack, A. I. (1997) Cumulative sleepiness, mood disturbance, and psychomotor vigilance performance decrements during a week of sleep restricted to 4-5 hours per night. Sleep, 20(4), 267-77.

      Dinges, D. F. & Powell, J. W. (1985) Microcomputer Analyses of Performance on a Portable, Simple Visual Rt Task during Sustained Operations. Behavior Research Methods Instruments & Computers, 17(6), 652-655.

      Eichenlaub, J. B., Biswal, S., Peled, N., Rivilis, N., Golby, A. J., Lee, J. W., Westover, M. B., Halgren, E. & Cash, S. S. (2020) Reactivation of Motor-Related Gamma Activity in Human NREM Sleep. Front Neurosci, 14, 449.

      Feinberg, I. & Campbell, I. G. (2013) Longitudinal sleep EEG trajectories indicate complex patterns of adolescent brain maturation. American Journal of Physiology - Regulatory, Integrative and Comparative Physiology, 304(4), R296-303.

      Hahn, M., Heib, D., Schabus, M., Hoedlmoser, K. & Helfrich, R. F. (2020) Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. Elife, 9.

      Helfrich, R. F., Lendner, J. D. & Knight, R. T. (2021) Aperiodic sleep networks promote memory consolidation. Trends Cogn Sci.

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J. & Knight, R. T. (2019) Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572.

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T. & Walker, M. P. (2018) Old Brains Come Uncoupled in Sleep: Slow Wave-Spindle Synchrony, Brain Atrophy, and Forgetting. Neuron, 97(1), 221-230 e4.

      Killgore, W. D. (2010) Effects of sleep deprivation on cognition. Prog Brain Res, 185, 105-29.

      Kurth, S., Jenni, O. G., Riedner, B. A., Tononi, G., Carskadon, M. A. & Huber, R. (2010) Characteristics of sleep slow waves in children and adolescents. Sleep, 33(4), 475-80.

      Maris, E. & Oostenveld, R. (2007) Nonparametric statistical testing of EEG- and MEG-data. J Neurosci Methods, 164(1), 177-90.

      Muehlroth, B. E., Sander, M. C., Fandakova, Y., Grandy, T. H., Rasch, B., Shing, Y. L. & Werkle-Bergner, M. (2019) Precise Slow Oscillation-Spindle Coupling Promotes Memory Consolidation in Younger and Older Adults. Sci Rep, 9(1), 1940.

      Muehlroth, B. E. & Werkle-Bergner, M. (2020) Understanding the interplay of sleep and aging: Methodological challenges. Psychophysiology, 57(3), e13523.

      Niethard, N., Ngo, H. V. V., Ehrlich, I. & Born, J. (2018) Cortical circuit activity underlying sleep slow oscillations and spindles. Proceedings of the National Academy of Sciences of the United States of America, 115(39), E9220-E9229.

      Purcell, S. M., Manoach, D. S., Demanuele, C., Cade, B. E., Mariani, S., Cox, R., Panagiotaropoulou, G., Saxena, R., Pan, J. Q., Smoller, J. W., Redline, S. & Stickgold, R. (2017) Characterizing sleep spindles in 11,630 individuals from the National Sleep Research Resource. Nature Communications, 8, 15930.

      Van Dongen, H. P., Maislin, G., Mullington, J. M. & Dinges, D. F. (2003) The cumulative cost of additional wakefulness: dose-response effects on neurobehavioral functions and sleep physiology from chronic sleep restriction and total sleep deprivation. Sleep, 26(2), 117-26.

      Wilhelm, I., Metzkow-Meszaros, M., Knapp, S. & Born, J. (2012) Sleep-dependent consolidation of procedural motor memories in children and adults: the pre-sleep level of performance matters. Developmental Science, 15(4), 506-15.

      Winer, J. R., Mander, B. A., Helfrich, R. F., Maass, A., Harrison, T. M., Baker, S. L., Knight, R. T., Jagust, W. J. & Walker, M. P. (2019) Sleep as a potential biomarker of tau and beta-amyloid burden in the human brain. J Neurosci.

    1. Author Response

      Reviewer #1 (Public Review):

      Redox signaling is a dynamic and concerted orchestra of interconnected cellular pathways. There is an ongoing debate about whether ROS (reactive oxygen species) are friend or foe, and continued research is needed to dissect how ROS generation and progression diverge between physiological and pathophysiological states. Similarly, there are several paradoxical studies (both animal and human) in which the health benefits of exercise were reported to be accompanied by increases in ROS generation. It is in this context that the present manuscript deserves attention.

      Utilizing in vitro studies as well as animal model work, this manuscript illustrates the different regulatory mechanisms of exercise and antioxidant intervention on redox balance and blood glucose levels in diabetes. The manuscript does have some limitations and might need additional experiments and explanation.

      The authors should consider addressing the following comments with additional experiments.

      1) Although hepatic AMPK activation appears to be a central signaling element for the benefits of moderate exercise and glucose control, additional signals (in hepatic tissue) related to hepatic gluconeogenesis, such as Forkhead box O1 (FoxO1), phosphoenolpyruvate carboxykinase (PEPCK), and GLUT2, need to be profiled to present a holistic picture. Authors should consider this and revise the manuscript.

      We appreciate the constructive suggestion. Besides glycolysis, gluconeogenesis and glucose uptake are critical in maintaining liver and blood glucose homeostasis.

      FoxO1 has been tightly linked with hepatic gluconeogenesis through promoting the transcription of the gluconeogenesis-related genes PEPCK and G6Pase (1, 2). Herein, we found that the expression of FoxO1 increased in the diabetic group but was reduced in the CE, IE and EE groups (Fig. X1A, Fig.5E-F in manuscript). Meanwhile, the mRNA levels of Pepck and G6PC (one of the three G6Pase catalytic-subunit-encoding genes) also decreased in the CE, IE, and EE groups (Fig. X1B-1C, Fig.5H-I in manuscript). These results indicate that all three modes of exercise inhibited gluconeogenesis by down-regulating FoxO1.

      Regarding glucose uptake, we examined the protein expression of GLUT2 in liver tissue. GLUT2 facilitates glucose uptake by hepatocytes for glycolysis and glycogenesis. Accordingly, we found that GLUT2, a glucose sensor in the liver, was up-regulated in diabetic rats but down-regulated by the CE and IE interventions. However, GLUT2 did not decrease in the EE group, which is consistent with the finding that blood glucose was not improved by the EE intervention (Fig. X1A, Fig. 5E and 5G in manuscript).

      Taken together, moderate exercise could benefit glucose control by increasing glycolysis and decreasing gluconeogenesis. We added this part on Page 9, lines 251-263, and in Figure 5E-5I in this version.

      Figure X1. A. Representative protein levels and quantitative analysis of FOXO1 (82 kDa), GLUT2 (60-70 kDa) and Actin (45 kDa) in rats in the Ctl, T2D, T2D + CE, T2D + IE and T2D + EE groups. B-C. Expression of hepatic Pepck and G6PC mRNA in the Ctl, T2D, T2D + CE, T2D + IE and T2D + EE groups was evaluated by real-time PCR analysis. Values represent mean ratios of Pepck and G6PC transcripts normalized to GAPDH transcript levels.

      2) Very recently, sestrin2 signaling has attracted significant attention in relation to exercise and antioxidant responses. Therefore, the authors should profile sestrin2 levels, as it is linked to several targets such as mTOR, AMPK and Sirt1. Additionally, the levels of Nrf2 should be reported, as this is the central regulator of the threshold mechanisms of oxidative stress and ROS generation.

      We appreciate the reviewer's expert comments. Nrf2 is an important mediator of antioxidant signaling, playing a fundamental role in maintaining the redox homeostasis of the cell. Under unstressed conditions, Nrf2 activity is suppressed by its innate repressor Kelch-like ECH-associated protein 1 (Keap1) (3). With the increase in ROS levels during the development of diabetes, Nrf2 is activated to induce the transcription of several antioxidant enzymes (4, 5).

      Nrf2 expression has been reported to increase in HFD mice and in diabetic patients (6, 7). In vitro studies have found that Nrf2 activation is achieved with acute exposure to high glucose, whereas longer incubation times or oscillating glucose concentrations fail to activate Nrf2 (8, 9). These findings suggest that the increase in ROS in diabetes can cause compensatory upregulation of Nrf2. In our study, we found that Nrf2 increased in diabetic rats, which can further initiate the expression of antioxidant enzymes. As shown in Fig. X2A (Fig. 2H-2K in manuscript), Grx and Trx, which are involved in thioredoxin metabolism, were up-regulated in parallel with Nrf2. After CE intervention, the level of Nrf2 increased even further (Fig. 2E-2F), suggesting that CE intervention could activate the antioxidant system to achieve a high-level redox balance. We have added these new results to Figure 2.

      On the other hand, the expression levels of Sestrin2 and Nrf2 decreased after antioxidant supplementation. Our results suggest that the antioxidant treatment improved diabetes by lowering ROS levels to achieve a low-level redox balance, whereas moderate exercise enhanced ROS tolerance to achieve a high-level balance (Fig. X2D-F, Fig. 3E-3G in manuscript).

      We added the new data on Page 5, lines 147-153, and Page 7, lines 183-186, and in Figures 2-3 in the current version.

      Figure X2. A-C. Representative protein level and quantitative analysis of Nrf2 (97 kDa), Sestrin2 (57 kDa) and Actin (45 kDa) in the rats in the Ctl, T2D and T2D + CE groups. D-F. Representative protein level and quantitative analysis of Nrf2 (97 kDa), Sestrin2 (57 kDa) and HSP90 (90 kDa) in the rats in the Ctl, T2D and T2D + APO groups.

      3) Authors should discuss the exercise-associated hormesis curve. They should discuss whether moderate exercise could decrease the sensitivity to oxidative stress by altering the bell-shaped dose-response curve.

      We thank the reviewer for the valuable comments. Radak et al. proposed a bell-shaped dose-response curve relating normal physiological function to ROS levels in healthy individuals, and suggested that moderate exercise can extend the tolerated range of ROS while increasing physiological function (10). Our results support this hypothesis and further suggest that, in accordance with the bell-shaped curve, moderate exercise produces ROS while simultaneously increasing antioxidant enzyme activity to maintain a high-level redox balance, whereas excessive exercise generates a higher level of ROS, leading to reduced physiological function. In this study, we found that the state of diabetic individuals is better described by an S-shaped curve, owing to the high level of oxidative stress and the decreased reduction capacity of diabetic individuals (Fig. 8B). With increasing ROS, the physiological function of diabetic individuals gradually decreases and enters a state of redox imbalance. Moderate exercise shifts the S-shaped curve back into a bell-shaped dose-response curve, thus reducing the sensitivity to oxidative stress in diabetic individuals and restoring redox homeostasis. With excessive exercise, however, ROS production increases beyond the threshold range of redox balance, resulting in decreased physiological function (Fig. 8B, the decreasing portion of the bell curve to the right of the apex).

      Nevertheless, the antioxidant intervention increased physiological function by reducing ROS levels in diabetic individuals, restoring a bell-shaped dose-response curve at a low level of ROS (Fig. 8B). Redox balance could therefore be achieved either at a low level of ROS, mediated by antioxidant intervention, or at a high level of ROS, mediated by moderate exercise, both of which were regulated by AMPK activation. Thus, both high- and low-level redox balance can support high physiological function as long as ROS remain within the redox balance threshold range, and AMPK activation is an important marker of exercise or antioxidant interventions that restore redox dynamic balance and thereby physiological function. Accordingly, we speculate that antioxidant intervention on top of moderate exercise might offset the effect of exercise, but that antioxidants could be beneficial during excessive exercise. A human study also supports the notion that supplementation with antioxidants may preclude the health-promoting effects of exercise (11). Therefore, personalized intervention with respect to redox balance will be crucial for the effective treatment of diabetic patients.

      We added this part to the "Discussion" in this version (Pages 13-14, lines 389-418).

      4) It would not be ideal to single out AMPK as a sole biomarker in this manuscript. Instead, authors should consider AMPK activation and associated signaling in relation to redox balance. This should also be presented in Fig 7.

      We thank the reviewer for these critical comments. Accordingly, we have discussed AMPK signaling in the Discussion (Page 13, lines 373-384) and added the AMPK signaling pathway to Fig. 8A.

      Reference:

      1. R. A. Haeusler, K. H. Kaestner, D. Accili, FoxOs function synergistically to promote glucose production. J Biol Chem 285, 35245-35248 (2010).
      2. J. Nakae, T. Kitamura, D. L. Silver, D. Accili, The forkhead transcription factor Foxo1 (Fkhr) confers insulin sensitivity onto glucose-6-phosphatase expression. J Clin Invest 108, 1359-1367 (2001).
      3. M. McMahon, K. Itoh, M. Yamamoto, J. D. Hayes, Keap1-dependent proteasomal degradation of transcription factor Nrf2 contributes to the negative regulation of antioxidant response element-driven gene expression. J Biol Chem 278, 21592-21600 (2003).
      4. R. S. Arnold et al., Hydrogen peroxide mediates the cell growth and transformation caused by the mitogenic oxidase Nox1. Proc Natl Acad Sci U S A 98, 5550-5555 (2001).
      5. J. M. Lee, M. J. Calkins, K. Chan, Y. W. Kan, J. A. Johnson, Identification of the NF-E2-related factor-2-dependent genes conferring protection against oxidative stress in primary cortical astrocytes using oligonucleotide microarray analysis. J Biol Chem 278, 12029-12038 (2003).
      6. T. Jiang et al., The protective role of Nrf2 in streptozotocin-induced diabetic nephropathy. Diabetes 59, 850-860 (2010).
      7. X. H. Wang et al., High Fat Diet-Induced Hepatic 18-Carbon Fatty Acids Accumulation Up-Regulates CYP2A5/CYP2A6 via NF-E2-Related Factor 2. Front Pharmacol 8, 233 (2017).
      8. T. S. Liu et al., Oscillating high glucose enhances oxidative stress and apoptosis in human coronary artery endothelial cells. J Endocrinol Invest 37, 645-651 (2014).
      9. Z. Ungvari et al., Adaptive induction of NF-E2-related factor-2-driven antioxidant genes in endothelial cells in response to hyperglycemia. Am J Physiol Heart Circ Physiol 300, H1133-1140 (2011).
      10. Z. Radak et al., Exercise, oxidants, and antioxidants change the shape of the bell-shaped hormesis curve. Redox Biol 12, 285-290 (2017).
      11. M. Ristow et al., Antioxidants prevent health-promoting effects of physical exercise in humans. Proc Natl Acad Sci U S A 106, 8665-8670 (2009).
    1. Author Response

      Reviewer #1 (Public Review):

      This study explores the mechanisms responsible for the reduced steroidogenesis of adrenocortical cells in a mouse model of systemic inflammation induced by LPS administration. Working from RNA and protein profiling data sets in adrenocortical tissue from LPS-treated mice, they report that LPS perturbs the TCA cycle at the level of succinate dehydrogenase B (SDHB), impairing oxidative phosphorylation. Additional studies indicate that these events are coupled to increased IL-1β levels, which inhibit SDHB expression through DNA methyltransferase-dependent DNA methylation of the SDHB promoter.

      In general, these are interesting studies with some novel implications. I do, however, have concerns with some of the authors' rather broad conclusions given the limitations of their experimental approach. The paper could be improved by addressing the following points:

      1) The limitations of using LPS as the model for systemic inflammation need to be explicitly described.

      We thank the Reviewer for this suggestion. Indeed, the LPS model has several limitations as a preclinical model of sepsis, which are outlined in the revised Discussion. Despite its limitations, we chose this model over other models of sepsis, such as the cecal slurry model, due to its high reproducibility, which enabled the here presented mechanistic studies.

      2) The initial in vivo findings, which support the proposed metabolic perturbation, are based on descriptive profiling data obtained at one time point following a single dose of LPS. The authors' conclusion about the transcriptional pathway ultimately identified hinges critically on knowledge of the time course of this effect following LPS, which is not adequately addressed in the paper. How were this time point and dose of LPS established, and are there data from different doses and time points?

      We thank the Reviewer for raising this question, which we indeed addressed at the beginning of our studies in order to determine a suitable time point and dose of LPS treatment. We chose 6 h as a suitable starting time point for transcriptional analyses, based on the fact that LPS triggers transcriptional changes in the adrenal gland and other tissues within the range of a few hours (1-3). Confirming our expectations, we found 2,609 differentially expressed genes (Figure 1a) in the adrenal cortex of LPS-treated mice, many of which were involved in cellular metabolism (Figure 1d,e, 2a-e, Table 1, Table 2). Acute transcriptional changes, which are more likely to reflect direct effects of inflammatory signals than changes occurring at later time points (for instance, in the range of days), allowed us to mechanistically investigate the effects of inflammation in the adrenal gland, which was the purpose of our studies. Hence, guided by the transcriptional changes observed at 6 h of LPS treatment, we established the hypothesis that disruption of the TCA cycle in adrenocortical cells is key to the impact of inflammation on adrenal function. Along this line, we analyzed the metabolomic profile of the adrenal gland at 6 and 24 h of LPS treatment. At 6 h, succinate levels as well as the succinate / fumarate ratio remained unchanged (Author response image 1A), while at 24 h post-injection these were increased by LPS (Author response image 1B, Figure 2l,o,q). The time delay between the downregulation of Sdhb mRNA expression (at 6 h) and the increase in succinate levels (at 24 h) can be explained by the time required for reduction of SDHB protein levels, which depends on protein turnover, estimated at approximately 12 h in HeLa cells (4). Based on these findings, all further metabolomic analyses were performed at 24 h of LPS treatment.

      Author response image 1. LPS increases the succinate/fumarate ratio at 24 but not 6 h. Mice were i.p. injected with 1 mg/kg LPS, and at 6 h (A) and 24 h (B) post-injection succinate and fumarate levels were determined by LC-MS/MS in the adrenal gland. n=8-10; data are presented as mean ± s.e.m. Statistical analysis was done with a two-tailed Mann-Whitney test. *p < 0.05.

      Having established the most suitable time points of LPS treatment for observing the induced transcriptional and metabolic changes, we set out to define the LPS dose to be used in subsequent experiments. The data shown in Author response image 1 were acquired after treatment with 1 mg/kg LPS, a dose previously reported to cause transcriptional re-profiling of the adrenal gland (1, 2). However, 5 mg/kg LPS, similarly to 1 mg/kg LPS, also reduced Sdhb, Idh1 and Idh2 expression at 4 h (Author response image 2A) and increased succinate and isocitrate levels at 24 h (Author response image 2B) in the adrenal gland. Given that the effects of 1 and 5 mg/kg LPS were similar, for animal welfare reasons we continued our studies with the lower dose.

      Author response image 2. Five mg/kg LPS downregulates Sdhb, Idh1 and Idh2 expression and increases succinate and isocitrate levels in the adrenal gland of mice. Sdhb, Idh1 and Idh2 expression (A) and succinate and isocitrate levels (B) were assessed in the adrenal gland of mice treated with 5 mg/kg LPS for 4 h (A) and 24 h (B). n=5; data are presented as mean ± s.d. Statistical analysis was done with a two-tailed Mann-Whitney test. *p < 0.05, **p < 0.01.
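      As a side note on the statistic referenced in these legends: the two-tailed Mann-Whitney U test is rank-based and makes no normality assumption, which suits the small group sizes used here (n=5-10). The sketch below is a generic, minimal Python illustration of how the U statistic is computed; it is not the authors' actual analysis pipeline, and the numbers in the example are placeholders rather than experimental data.

```python
from itertools import product

def mann_whitney_u(x, y):
    """Mann-Whitney U statistic for two independent samples.

    Counts, over all pairs (xi, yj), how often xi < yj (ties count 0.5).
    The smaller of U_x and U_y is the statistic compared against
    critical-value tables (or a normal approximation) for the
    two-tailed test.
    """
    u_x = sum(1.0 if xi < yj else (0.5 if xi == yj else 0.0)
              for xi, yj in product(x, y))
    u_y = len(x) * len(y) - u_x
    return min(u_x, u_y)

# Placeholder example (NOT experimental data): two fully separated
# groups yield the minimal possible U of 0, i.e. the strongest
# possible evidence of a group difference for these sample sizes.
print(mann_whitney_u([1.0, 1.2, 0.9], [1.5, 1.7, 1.6]))  # → 0.0
```

      In practice, such tests are run with standard statistical software rather than hand-rolled code; the sketch only makes explicit what the reported U statistic measures.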

      3) Related to the point above, the authors' data supporting a break in the TCA cycle would be strengthened by direct biochemical assessment (metabolic flux analysis) of the steps in the TCA cycle that are impacted.

      We entirely agree with the Reviewer and considered performing TCA cycle metabolic flux analyses in adrenocortical cells. Unfortunately, the low yield of adrenocortical cells per mouse (approx. 3,000-6,000) does not allow the performance of metabolic flux experiments, which require higher cell numbers per sample, several time points per condition and an adequate number of replicates per experiment. Moreover, NCI-H295R cells, being adrenocortical carcinoma cells, are expected to have substantially altered metabolic fluxes compared to normal cells. Since we would not have had the capacity to confirm findings from metabolic flux experiments in NCI-H295R cells in primary adrenocortical cells, as we did for the rest of the experiments, we decided not to perform metabolic flux experiments in NCI-H295R cells. However, performing metabolic flux analyses in adrenocortical cells under inflammatory or other stress conditions remains an important future task that we will pursue upon establishment of a more suitable cell culture system.

      4) The proposed connection of DNMT and IL1 signaling to systemic inflammation and reduced steroidogenesis could be more firmly established by additional studies in adrenal cortical cells lacking these genes.

      We thank the Reviewer for this excellent suggestion. In the revised manuscript we strengthened the evidence for an IL-1β –DNMT1 link and show that DNMT1 deficiency blocks the effects of IL-1β on SDHB promoter methylation (Figure 6k), the succinate / fumarate ratio (Figure 6m), the oxygen consumption rate (Figure 6n) and steroidogenesis (Figure 6o-q) in adrenocortical cells. In order to validate the role of IL-1β in vivo, mice were simultaneously treated with LPS and Raleukin, an IL-1R antagonist. Treatment with Raleukin increased the SDH activity (Figure 6r), reduced succinate levels and the succinate / fumarate ratio (Figure 6s,t) and increased corticosterone production in LPS-treated mice (Figure 6u).

      Reviewer #2 (Public Review):

      The present manuscript provides a mechanistic explanation for an important event in adrenal endocrinology: the resistance which develops during excessive inflammation relative to acute inflammation. The authors identify disturbances in adrenal mitochondrial function that distinguish excessive from acute inflammation. During severe inflammation, the TCA cycle in the adrenal gland is disrupted at the level of succinate dehydrogenase, producing an accumulation of succinate in the adrenal cortex. The authors also provide a mechanistic explanation for this accumulation: they demonstrate that IL-1β decreases the expression of SDH, the enzyme that degrades succinate, through a methylation event in the SDH promoter. This work presents a solid explanation for an important phenomenon. Below are a few questions that should be resolved experimentally.

      1) The authors should confirm through direct biochemical assays of enzymatic activity that steroidogenesis enzyme activity is not impaired. Many of these enzymes are located in the mitochondria and their activity may be diminished due to the disturbed, high succinate environment of the cortical cell as opposed to the low ATP production.

      We thank the Reviewer for this question. The activity of the first and rate-limiting steroidogenic enzyme, cytochrome P450 side-chain cleavage (SCC, CYP11A1), which generates pregnenolone from cholesterol, was recently shown to require intact SDH function (5). In agreement with this report, we show that production of progesterone, the direct derivative of pregnenolone, is impaired upon SDH inhibition (Figure 5b,e,h). In addition, we assessed the activity of CYP11B1 (steroid 11β-hydroxylase), the enzyme catalyzing the conversion of 11-deoxycorticosterone to corticosterone, i.e. the last step of glucocorticoid synthesis, by determining corticosterone and 11-deoxycorticosterone levels by LC-MS/MS and calculating the ratio of corticosterone to 11-deoxycorticosterone in ACTH-stimulated adrenocortical cells and explants. The corticosterone / 11-deoxycorticosterone ratio was not affected by Sdhb silencing in adrenocortical cells (Figure 5- Supplement 2g), nor did it change upon LPS treatment in adrenal explants (Figure 5- Supplement 2h), suggesting that CYP11B1 activity may not be altered upon SDH blockage. Hence, we propose that, upon inflammation, impairment of SDH function may disrupt at least the first steps of steroidogenesis (production of pregnenolone/progesterone), thereby diminishing production of all downstream adrenocortical steroids. This is now discussed in the revised manuscript.

      2) What is the effect of high ROS production? Is steroidogenesis resolved if ROS is pharmacologically decreased even if the reduction of ATP is not resolved?

      We thank the Reviewer for this suggestion, which helped us to broaden our findings. Indeed, ROS scavenging by the vitamin E analog Trolox (Figure 5n) partially reversed the inhibitory effect of DMM on steroidogenesis (Figure 5o,p), suggesting that impairment of SDH function impacts steroidogenesis also via enhanced ROS production (Figure 4g).

      3) Does increased intracellular succinate (through cell permeable succinate treatment) inhibit steroidogenesis even if there is not a blockage of OXPHOS?

      We suggest that SDH inhibition and succinate accumulation lead to reduced steroidogenesis due to impaired oxidative phosphorylation (Figure 4c,e, 5i), reduced ATP synthesis (Figure 4d, 5j-m) and increased ROS production (Figure 4g, 5o,p). Since SDH is part (complex II) of the electron chain transfer it cannot be decoupled from oxidative phosphorylation, thereby limiting the experimental means for addressing this question.

      4) It should be demonstrated that genetic loss of IL-1 signaling in adrenal cortical cells results in a loss of the effect of LPS on reduced steroidogenesis and increased succinate accumulation.

      We thank the Reviewer for this suggestion. Developing a mouse line with genetic loss of Il-1r in adrenocortical cells was not feasible within the short revision period. Instead, mice under LPS treatment were treated with the IL-1R antagonist Raleukin to study the in vivo effects of IL-1β in the adrenal gland. IL-1R antagonism increased SDH activity in the adrenal cortex (Figure 6r), decreased succinate levels and the succinate/fumarate ratio in the adrenal gland (Figure 6s,t) and enhanced corticosterone production (Figure 6u) in LPS-treated mice, supporting our hypothesis that IL-1β mediates the effects of systemic inflammation in the adrenal cortex.

      5) It should be demonstrated that genetic loss of IL-1 signaling in adrenal cortical cells results in a loss of the effect of LPS on SDH activity, ATP production and SDH promoter methylation.

      As outlined above, Raleukin treatment increased SDH activity in the adrenal cortex (Figure 6r) and decreased succinate levels and the succinate/fumarate ratio in the adrenal gland (Figure 6s,t) of mice treated with LPS. Furthermore, IL-1β reduced the ATP/ADP ratio (Figure 6e) and enhanced SDHB promoter methylation in NCI-H295R cells (Figure 6k).

      6) It should be shown that the silencing of DNMT eliminates or diminishes the effect of LPS on reduced steroidogenesis and increased succinate accumulation.

      We thank the Reviewer for this suggestion, which prompted us to strengthen the evidence for the implication of DNMT1 in the effects of LPS on adrenocortical cell metabolism and function. As mentioned above, development of a new mouse line, in this case bearing genetic loss of DNMT1 in adrenocortical cells, was not feasible within the short revision period. Therefore, we assessed the role of DNMT1 by silencing it via siRNA transfection in primary adrenocortical cells and NCI-H295R cells. We show that DNMT1 silencing inhibits the effect of IL-1β on SDHB promoter methylation (Figure 6k), restores Sdhb expression (Figure 6l) and reduces the succinate/fumarate ratio in IL-1β-treated adrenocortical cells (Figure 6m). Accordingly, DNMT1 silencing restores ACTH-induced production of corticosterone, 11-deoxycorticosterone and progesterone in IL-1β-treated adrenocortical cells (Figure 6o-q). We chose to stimulate adrenocortical cells with IL-1β instead of LPS, as the in vitro effects of IL-1β were more robust than those of LPS (possibly due to reduced TLR4 expression or function in cultured adrenocortical cells) and in order to demonstrate the link between IL-1β and DNMT1.

      7) Does silencing of DNMT reduce OXPHOS in adrenal cortical cells?

      We measured the oxygen consumption rate in NCI-H295R cells, which were transfected with siRNA against DNMT1 and treated or not with IL-1β. IL-1β reduced the OCR in cells transfected with control siRNA, while DNMT1 silencing blunted the effect of IL-1β (Figure 6n).

      8) The effects of LPS on reduced adrenal steroidogenesis are not elaborated at the physiological level. The manuscript should demonstrate the ramifications of the adrenal function decreasing after LPS. Does CORT release become less pronounced after subsequent challenges? Does baseline CORT decrease at some point? No physiological consequences are shown. Similarly, these physiological consequences of decreased adrenal function should be dependent on decreased SDH activity and OXPHOS in adrenal cells and this should be demonstrated experimentally.

      We thank the Reviewer for raising this excellent question. Inflammation is a potent inducer of the hypothalamus-pituitary-adrenal (HPA) axis, causing increased glucocorticoid production, a stress response leading to vital immune and metabolic adaptations. Accordingly, LPS treatment rapidly increases glucocorticoid production in mice (1, 6, 7). Reduced adrenal gland responsiveness to ACTH is associated with decreased survival of septic mice (8). These preclinical findings are in accordance with observations in septic patients, in whom impairment of adrenal function correlates with a high risk of death (9). Along this line, the ACTH test was suggested to have prognostic value for the identification of septic patients with high mortality risk (9, 10).

      In order to confirm impairment of the adrenal gland function in septic mice, animals were subjected to sepsis via administration of a high LPS dose (10 mg / kg) and treated with ACTH 24 h later. Indeed, the ACTH-induced increase in corticosterone levels was diminished in LPS-treated mice (Author response image 3). This finding was further confirmed in adrenal explants, in which LPS pre-treatment also blunted ACTH-stimulated corticosterone production (Figure 5s).

      Author response image 3. High LPS dose blunts the ACTH response in mice. C57BL/6J mice were i.p. injected with 10 mg/kg LPS or PBS and 24 h later they were i.p. injected with 1 mg/kg ACTH. One hour after ACTH administration blood was retroorbitally collected and corticosterone plasma levels were determined by LC-MS/MS. n=4-5; data are presented as mean ± s.d. Statistical analysis was done with two-tailed Mann-Whitney test. *p < 0.05.

      Given that the purpose of our studies was to dissect the mechanisms underlying adrenal gland dysfunction in inflammation rather than to analyze its physiological consequences, we chose not to follow these lines of investigation and instead concentrated on the role of cell metabolism in adrenocortical cells in the context of inflammation.

      References

      1. W. Kanczkowski, A. Chatzigeorgiou, M. Samus, N. Tran, K. Zacharowski, T. Chavakis, S. R. Bornstein, Characterization of the LPS-induced inflammation of the adrenal gland in mice. Mol Cell Endocrinol 371, 228-235 (2013).
      2. L. S. Chen, S. P. Singh, M. Schuster, T. Grinenko, S. R. Bornstein, W. Kanczkowski, RNA-seq analysis of LPS-induced transcriptional changes and its possible implications for the adrenal gland dysregulation during sepsis. J Steroid Biochem Mol Biol 191, 105360 (2019).
      3. V. I. Alexaki, G. Fodelianaki, A. Neuwirth, C. Mund, A. Kourgiantaki, E. Ieronimaki, K. Lyroni, M. Troullinaki, C. Fujii, W. Kanczkowski, A. Ziogas, M. Peitzsch, S. Grossklaus, B. Sonnichsen, A. Gravanis, S. R. Bornstein, I. Charalampopoulos, C. Tsatsanis, T. Chavakis, DHEA inhibits acute microglia-mediated inflammation through activation of the TrkA-Akt1/2-CREB-Jmjd3 pathway. Mol Psychiatry 23, 1410-1420 (2018).
      4. C. Yang, J. C. Matro, K. M. Huntoon, D. Y. Ye, T. T. Huynh, S. M. Fliedner, J. Breza, Z. Zhuang, K. Pacak, Missense mutations in the human SDHB gene increase protein degradation without altering intrinsic enzymatic function. FASEB J 26, 4506-4516 (2012).
      5. H. S. Bose, B. Marshall, D. K. Debnath, E. W. Perry, R. M. Whittal, Electron Transport Chain Complex II Regulates Steroid Metabolism. iScience 23, 101295 (2020).
      6. W. Kanczkowski, V. I. Alexaki, N. Tran, S. Grossklaus, K. Zacharowski, A. Martinez, P. Popovics, N. L. Block, T. Chavakis, A. V. Schally, S. R. Bornstein, Hypothalamo-pituitary and immune-dependent adrenal regulation during systemic inflammation. Proc Natl Acad Sci U S A 110, 14801-14806 (2013).
      7. W. Kanczkowski, A. Chatzigeorgiou, S. Grossklaus, D. Sprott, S. R. Bornstein, T. Chavakis, Role of the endothelial-derived endogenous anti-inflammatory factor Del-1 in inflammation-mediated adrenal gland dysfunction. Endocrinology 154, 1181-1189 (2013).
      8. C. Jennewein, N. Tran, W. Kanczkowski, L. Heerdegen, A. Kantharajah, S. Drose, S. Bornstein, B. Scheller, K. Zacharowski, Mortality of Septic Mice Strongly Correlates With Adrenal Gland Inflammation. Crit Care Med 44, e190-199 (2016).
      9. D. Annane, V. Sebille, G. Troche, J. C. Raphael, P. Gajdos, E. Bellissant, A 3-level prognostic classification in septic shock based on cortisol levels and cortisol response to corticotropin. JAMA 283, 1038-1045 (2000).
      10. E. Boonen, S. R. Bornstein, G. Van den Berghe, New insights into the controversy of adrenal function during critical illness. Lancet Diabetes Endocrinol 3, 805-815 (2015).
      11. C. C. Huang, Y. Kang, The transient cortical zone in the adrenal gland: the mystery of the adrenal X-zone. J Endocrinol 241, R51-R63 (2019).
    1. Author response

      Reviewer #1 (Public Review):

      In their paper, Kroell and Rolfs use a set of sophisticated psychophysical experiments in visually-intact observers to show that visual processing at the fovea, within the 250 ms or so before saccading to a peripheral target containing orientation information, is influenced by orientation signals at the target. Their approach straddles the boundary between enforcing fixation throughout stimulus presentation (a standard in the field) and leaving it totally unconstrained. As such, they move the field of saccade pre-processing towards active vision in order to answer key questions about whether the fovea predicts features at the gaze target, over what time frame, with what precision, and over what spatial extent around the foveal center. The results support the notion that there is feature-selective enhancement centered on the center of gaze, rather than on the predictively remapped location of the target. The results further show that this enhancement extends about 3 deg radially from the foveal center and that it starts ~200 ms or so before saccade onset. They also show that this enhancement is reinforced if the target remains present throughout the saccade. The hypothesized implications of these findings are that they could enable continuity of perception trans-saccadically and, potentially, improve post-saccadic gaze correction.

      Strengths:

      The findings appear solid and backed up by converging evidence from several experimental manipulations. These included several approaches to overcome current methodological constraints on the critical examination of foveal processing while being careful not to interfere with saccade planning and performance. The authors examined the spatial frequency characteristics of the foveal enhancement, as well as hit rates and false alarm rates for detecting a foveal probe that was congruent or incongruent, in terms of orientation, with the peripheral saccade target embedded in flickering, dynamic (1/f) noise images. While hit rates are relatively easy to interpret, the authors also reconstructed key features of the background noise to interpret false alarms as reflecting foveal enhancement that could be correlated with target orientation signals. The study also - in an extensive Supplementary Materials section - uses appropriate statistical analyses and controls for multiple factors impacting experimental/stimulus design and analysis. The approach, as well as the level of care towards experimental details provided in this manuscript, should prove welcome and useful to any other investigators interested in the questions posed.

      Weaknesses:

      I find no major weaknesses in the experiments, analyses or interpretations. The conclusions of the paper appear well supported by the data. My main suggestion would be to see a clearer discussion of the implications of the present findings for truly naturalistic, visually-guided performance and action. Please consider the implications of the phenomena and behaviors reported here when what is located at the gaze center (while peripheral targets are present) is not a noisy, relatively feature-poor, low-saliency background, but another high-saliency target, likely crowded by other nearby targets. As such, a key question that emerges, and should be addressed in the Discussion at least, is whether the fovea's role described in the present experiments is restricted to the visual scenarios used here, or whether it generalizes to the rather different visual environments of everyday life.

      This is a very interesting question. While we cannot provide a definite answer, we have added a paragraph discussing the role of foveal prediction in more naturalistic visual contexts to the Discussion section (‘Does foveal prediction transfer to other visual features and complex natural environments?’). We pasted this paragraph in response to another comment in the ‘Recommendations for the authors’ section below. We suggest that “the pre-saccadic decrease in foveal sensitivity demonstrated previously[9] as well as in our own data (Figure 2B) may boost the relative strength of fed-back signals by reducing the conspicuity of foveal feedforward input”, presumably allowing the foveal prediction mechanism to generalize to more naturalistic environments with salient foveal stimulation.

      Reviewer #2 (Public Review):

      Humans and other primates move their eyes with rapid saccades to reposition the high-resolution region of the retina, the fovea, over objects of interest. Thus, each saccade involves moving the fovea from a pre-saccadic location to a saccade target. Although it has long been known that saccades profoundly alter visual processing at the time of a saccade, scientists simply do not know how the brain combines information across saccades to support our normal perceptual experience. This paper addresses a piece of that puzzle by examining how eye movements affect processing at the fovea before it moves. Using a dynamic noise background and a dual psychophysical task, the authors probe both the performance and selectivity of visual processing for orientation at the fovea in the few hundred milliseconds preceding a saccade. They find that hit rates and false alarm rates are dynamically and automatically modulated by saccade planning. By taking advantage of the specific sequence of noise shown on each trial, they demonstrate that the tuning of foveal processing is affected by the orientation of the saccade target, suggesting foveal-specific feedback.

      A major strength of the paper is the experimental design. The use of dynamic filtered noise to probe perceptual processing is a clever way of measuring the dynamics of selectivity at the fovea during saccade preparation. The use of a dual-task allows the authors to evaluate the tuning of foveal processing as well and how it depends on the peripheral target orientation. They show compellingly that the orientation of the saccade target (the future location of the fovea) affects processing at the fovea before it moves.

      There are two weaknesses with the paper in its current form. The first is that the key claim of foveal "enhancement" relies on the tuning of the false alarms. A more standard measure of enhancement would be to look at the sensitivity, or d-prime, of the performance on the task. In this study, hits and false alarms increase together, which is traditionally interpreted as a criterion shift and not an enhancement. However, because of the external noise, false alarms are driven by real signals. The authors are aware of this and argue that the fact that the false alarms are tuned indicates enhancement. But it is unclear to me that a criterion shift wouldn't also explain this tuning and the change in the noise images. For example, in a task with 4 alternative choices (Present/Congruent, Present/Incongruent, Absent/Congruent, Absent/Incongruent), shifting the criterion towards the congruent target would increase hits and false alarms for that target and still result in a tuned template (because that template is presumably what drove the decision variable that the adjusted criterion operates on). I believe this weakness could be addressed with a computational model that shows that a criterion shift on the output of a tuned template cannot produce the pattern of hits and false alarms.

      We thank the reviewer for this comment. We will present three arguments, each of which suggests that our effects are perceptual in nature and cannot be explained by a shift in decision criterion: (1) the temporal specificity of the difference in Hit Rates (HRs), (2) the spatial specificity of the difference in HRs and (3) the phenomenological quality of the foveally predicted signal. In general, a criterion shift would indeed affect hits and false alarms alike. Nonetheless, the difference in HRs only manifested under specific and meaningful conditions:

      First, the increase in congruent as compared to incongruent HRs, i.e., enhancement, was temporally specific: congruent and incongruent HRs were virtually identical when the probe appeared in a baseline time bin or one (Figure 2B) or even two (Figure 4A) early pre-saccadic time bins. Based on another reviewer’s comment, we collected additional data to measure the time course and extent of foveal enhancement during fixation. While pre-saccadic enhancement developed rapidly, enhancement started to emerge 200 ms after target onset during fixation. Crucially, these time courses mirror the typical temporal development of visual sensitivity during pre-saccadic attention shifts and covert attentional allocation, respectively[8,33]. We are unaware of data demonstrating similar temporal specificity for a shift in decision criterion. One could argue that a template of the target orientation needs to build up before it can influence criterion. Nonetheless, this template would be expected to remain effective after this initial temporal threshold has been crossed. In contrast, we observe pronounced enhancement in medium but not late stages of saccade preparation in the PRE-only condition (Figure 4A).

      Second, it has been argued that a defining difference between innately perceptual effects and post-perceptual criterion shifts is their spatial specificity[53]: in opposition to perceptual effects, criterion shifts should manifest in a spatially global fashion. Due to a parafoveal control condition detailed in our reply to the next comment, we maintain the claim that enhancement is spatially specific: congruent HRs exceeded incongruent ones within a confined spatial region around the center of gaze. We did not observe enhancement for probes presented at 3 dva eccentricity even when we raised parafoveal performance to a foveal level by adaptively increasing probe contrast. The accuracy of saccade landing or, more specifically, the mean remapped target location (Figure 3B) influenced the spatial extent of the enhanced region in a fashion that is reconcilable with previous findings[30]. A criterion shift that is both spatially and temporally selective, follows the time course of pre-saccadic or covert attention depending on observers’ oculomotor behavior, does not remain effective throughout the entire trial after its onset, is sensitive to the mean remapped target location across trials, and does not apply to parafoveal probes even after their contrast has been increased to match foveal performance, would be unprecedented in the literature and, even if existent, appear just as functionally meaningful as sensitivity changes occurring under the same conditions.

      Lastly and on a more informal note, we would like to describe a phenomenological percept that was spontaneously reported by 6 out of 7 observers in Experiment 1 and experienced by the author L.M.K. many times. On a small subset of trials, participants in our paradigms have the strong phenomenological impression of perceiving the target in the pre-saccadic center of gaze. This percept is rare but so pronounced that some observers interrupt the experiment to ask which probe orientation they should report if they had perceived two on the same trial (“The orientation of the normal probe or of the one that looked exactly like the target”). Interestingly, the actual saccade target and its foveal equivalent are perceived simultaneously in two spatiotopically separate locations, suggesting that this percept cannot be ascribed to a temporal misjudgment of saccade execution (after which the target would have actually been foveated). We have no data to prove this observation but nonetheless wanted to share it. Experiencing it ourselves has left us with no doubt that the fed-back signal is truly – and almost eerily – perceptual in nature.

      The analysis suggested by the reviewer is very interesting. Yet for several reasons stated in the ‘Suggestions to the authors’ section, our dataset is not cut out for an analysis of noise properties at this level of complexity. We had always planned to resolve these concerns experimentally, i.e., by demonstrating specificity in HRs. We believe that our arguments above provide a strong case for a perceptual phenomenon and have incorporated them into the Discussion of our revised manuscript.

      The second weakness is that the authors' claim that feedback is spatially selective to the fovea is confounded by the fact that acuity and contrast sensitivity are higher in the fovea. Therefore, the subject's performance would already be spatially tuned. Even the very central degree, the foveola, is inhomogeneous. Thus, finding spatially-tuned sensitivity to the probes may simply indicate global feature gain on top of already spatially tuned processing in the fovea. Another possible explanation that is consistent with the "no enhancement" interpretation is that gain in the fovea has increased. This is consistent with the observation that the congruency effects were aligned to the center of gaze and not the saccade endpoint. It looks from the Gaussian fits that a single gain parameter would explain the difference in the shape of the congruent and incongruent hit rates, but I could not figure out if this was explicitly tested from the existing methods. Additional experiments without prepared saccades would be an easy way to address this issue. Is the hit rate tuned when there is no saccade preparation? If so, it seems likely that the spatial selectivity is not tuned feedback, but inhomogeneous feedforward processing.

      We fully agree. We do not consider a fixation condition diagnostic to resolve this question since, as of now, correlates of foveal feedback have exclusively been observed during fixation. In those studies, it was suggested that the effect, i.e., a foveal representation of peripheral stimuli, reflects the automatic preparation of an eye movement that was simply not executed[11,12,14]. To address another reviewer’s comment, we collected additional data in a fixation experiment. The probe stimulus could exclusively appear in the screen center (as in Experiment 1) and observers maintained fixation throughout the trial. While pre-saccadic congruency effects were significantly more pronounced and developed faster, congruency effects did emerge during fixation when the probe appeared 200 ms after the target. If pre-saccadic processes indeed spill over to fixation tasks to some extent and trigger relevant neural mechanisms even when no saccade is executed, we could expect a similar feedback-induced spatial profile during fixation. Since this matches the reviewer’s prediction if the pre-saccadic profiles resulted from inhomogeneous feedforward processing, we do not consider a fixation condition suitable to distinguish between both hypotheses.

      To test whether the tuning of enhancement is effectively a consequence of declining visual performance in the parafovea/periphery, we instead raised parafoveal performance to a foveal level by adaptively increasing the opacity of the probe: while leaving all remaining experimental parameters unchanged, we presented the probe in one of two parafoveal locations, i.e., 3 dva to the left or right of the screen center. Observers were explicitly informed about the placement of the probe. We administered a staircase procedure to determine the probe opacity at which performance for parafoveal target-incongruent probes would be just as high as foveal performance had been in the preceding sessions. While the foveal probe was presented at a median opacity of 28.3±7.6%, a parafoveal opacity of 39.0±11.1% was required to achieve the same performance level. As a result, the gray dot at 0 dva in the figure below represents the incongruent HR in the center of gaze and ranges at 80% on the y-axis. The gray dots at ±3 dva represent incongruent parafoveal HRs and also range at ~80% on the y-axis. Using the reviewer’s terminology, we effectively removed the influence of acuity- (or contrast-sensitivity-) dependent spatial tuning. If the spatial profiles had indeed been the result of “global feature gain on top of already spatially tuned processing“, this manipulation should render parafoveal feature gain just as detectable as foveal feature gain. Instead, congruent and incongruent parafoveal HRs were statistically indistinguishable (away from the saccade target: p = .127, BF10 = 0.531; towards the saccade target: p = .336, BF10 = 0.352), inconsistent with the idea of a spatially global feature gain.

      We had included these data in our initial submission. They were collected in the same observers that contributed the spatial profiles (Experiment 2). The data points at 0 dva in the reduced figure above correspond to the foveal probe location in Figure 2D. The data points at ±3 dva had been plotted and discussed in our initial submission, yet only very briefly. Based on this and another reviewer’s comment, we realize that we should have explained this condition more extensively in the main text rather than in the Methods and have added a dedicated paragraph to the Results section.

      This paper is important because it compellingly demonstrates that visual processing in the fovea anticipates what is coming once the eyes move. The exact form of the modulation remains unclear and the authors could do more to support their interpretations. However, understanding this type of active and predictive processing is a part of the puzzle of how sensory systems work in concert with motor behavior to serve the goals of the organism.

      Reviewer #3 (Public Review):

      This manuscript examines one important and at the same time little-investigated question in vision science: what happens to the processing of the foveal input right before the onset of a saccade. This is clearly something of relevance as humans perform saccades about 3 times every second. Whereas what happens to visual perception in the visual periphery at the saccade goal is well characterized, little is known about what happens at the very center of gaze, which represents the future retinal location where the saccade target will be viewed at high resolution upon landing. To address this problem the authors implemented an elegant experiment in which they probed foveal vision at different times before the onset of the saccade by using a target, with the same or different orientation with respect to the stimulus at the saccade goal, embedded in dynamic noise. The authors show that foveal processing of the saccade target is initiated before saccade execution, resulting in the visual system being more sensitive to foveal stimuli whose features match those of the stimulus at the saccade goal. According to the authors, this process enables a smooth transition of visual perception before and after the saccade. The experiment is well designed and the results are solid; overall I think this work represents a valuable contribution to the field and its results have important implications. My comments below:

      1. The change in the overall performance between the baseline condition and when the probe is presented after the saccade target is large, but I wonder if there are other unrelated factors that contribute to this difference, for example, simply presenting the probe after vs before the onset of a peripheral stimulus, or the fact that in the baseline the probe is presented right after a fixation marker, but in the other condition there was a longer time interval between the presentation of the marker and the probe transient. The authors should discuss how these confounding factors have been accounted for.

      We thank the reviewer for this helpful comment. We would like to clarify that the probe was never presented right after the fixation dot. In the baseline condition, fixation dot and target were separated by 50 ms, i.e., the duration of one noise image. Since the fixation dot was an order of magnitude smaller than the probe (0.3 vs 3 dva in diameter) and since two large-field visual transients caused by the onset of a new background noise image occurred between fixation dot disappearance and probe appearance, we consider it unlikely that the performance difference was caused by any kind of stimulus interaction such as masking. Nonetheless, we had been puzzled by this difference already when inspecting preliminary results and wondered if it may reflect observers’ temporal expectations about the trial sequence. We therefore explicitly instructed and repeatedly reminded observers that the probe could appear before the peripheral target. Since the difference persisted, we ascribed it to a predictive remapping of attention to the fovea during saccade preparation, as we had stated in the Discussion.

      Another contributing factor may be that observers approached the oculomotor and perceptual detection tasks sequentially. In early trial phases, they may have prioritized localizing the target and programming the eye movement. After motor planning had been initiated, resources may have been freed up for the foveal detection task. Since on the majority of probe-present trials, the probe appeared after the saccade target, this strategy would have been mostly adaptive. Crucially, however, observers yielded similar incongruent Hit Rates in the baseline and last pre-saccadic time bin (70% vs 74%). While we observed pronounced enhancement in the last pre-saccadic bin, congruent and incongruent Hit Rates in the baseline bin were virtually identical. We therefore conclude that lower overall performance in the baseline bin did not prevent congruency effects from occurring. Instead, congruency effects started developing only after target appearance. We have added this potential explanation to the Results.

      2. Somewhat related to point 3, the authors conclude that the effects reported here are the result of saccade preparation/execution, however, a control condition in which the saccade is not performed is missing. This leaves me wondering whether the effect is only present during saccade preparation or if it may also be present to some extent or to its full extent when covert attention is engaged, i.e., when subjects perform the same task without making a saccade.

      Foveal feedback has, as of now, exclusively been demonstrated during fixation (see references in Introduction and Discussion). In most of these studies, it was suggested that these effects (i.e., the foveal representation of a peripheral stimulus) may reflect the automatic preparation of an eye movement that was simply not executed[11,12,14]. Since foveal feedback has been demonstrated during fixation, and since eye movement preparation may influence foveal processing even when the eyes remain stationary, we considered it likely that congruency effects would emerge during fixation. Nonetheless, we agree with the reviewer that an explicit comparison between saccade preparation and fixation would enrich our data set and allow for stronger conclusions. We therefore collected additional data from seven observers. While all remaining experimental parameters were identical to Experiment 1, observers maintained fixation throughout each trial. We found that pre-saccadic foveal enhancement was more pronounced and emerged earlier than foveal enhancement during fixation. We present these data in the Results section (Figure 5) and have updated the Methods section to incorporate this additional experiment. We have furthermore added a paragraph to the Discussion which addresses potential mechanisms of foveal enhancement during fixation and saccade preparation.

      Furthermore, the reviewer’s comment helped us realize that we never stated a crucial part of our motivation explicitly. We now do so in the Introduction:

      “Despite the theoretical usefulness of such a mechanism, there are reasons to assume that foveal feedback may break down while an eye movement is prepared to a different visual field location. First and foremost, saccade preparation is accompanied with an obligatory shift of attention to the saccade target[6-8] which in turn has been shown to decrease foveal sensitivity[9]. Moreover, the execution of a rapid eye movement induces brief motion signals on the retina[20] which may mask or in other ways interfere with the pre-saccadic prediction signal. On a more conceptual level, the recruitment of foveal processing as an ‘active blackboard’[21] may become obsolete in the face of an imminent foveation of relevant peripheral stimuli – unless, of course, foveal processing serves the establishment of trans-saccadic visual continuity.”

      We believe that the additional data and the revisions to the Introduction and Discussion have strengthened our manuscript and thank the reviewer for this comment.

      3. Differently from other tasks addressing pre-saccadic perception in the literature, here subjects do not have to discriminate the peripheral stimulus at the saccade goal, and most processing resources are presumably focused at the foveal location. Could this have influenced the results reported here?

      This is true. We intentionally made the features of the peripheral target as task-irrelevant as possible, contrary to previous investigations. We wanted to ensure that the enhancement we find would be automatic and not induced by a peripheral discrimination task, as we state in the Discussion and the Methods. We agree that the foveal detection task likely focused processing resources on the center of gaze in Experiment 1. In Experiment 2, however, we measured the spatial profile of enhancement which involved two different conditions:

      1. In each observer’s first six sessions, the probe could be presented anywhere on a horizontal axis of 9 dva length. On a given trial, an observer could not predict where it would appear, and therefore could not strategically allocate their attention. Nonetheless, enhancement of target-congruent orientation information was tuned to the fovea.
      2. In the final, seventh session, the probe appeared exclusively in one of two possible peripheral locations: 3 dva to the left or 3 dva to the right of the screen center. Observers were explicitly informed that the probe would never appear foveally, and processing resources should therefore have been allocated to the peripheral probe locations. The general performance level in this condition was comparable to performance in the fovea (see reply to the next comment). Nonetheless, we did not find peripheral enhancement of target-congruent information.

      Importantly, the magnitude of the foveal congruency effect in the PRE-only condition of Experiment 1 (i.e., when the target disappeared before the eyes landed on it) was comparable to the foveal congruency effect in Experiment 2 (PRE-only throughout), suggesting that the format of the task – i.e., purely foveal detection or foveal and peripheral detection – did not alter our findings.

      4. The spatial profile of the enhancement is very interesting and it clearly shows that the enhancement is limited to a central region. To what extent is this profile influenced by the fact that the probe was presented at larger eccentricities and therefore was less visible at 4.5 deg than it was at 0 deg? According to the caption, when the probe was presented more eccentrically the performance was raised to a foveal level by adaptively increasing probe transparency. This is not clear: was this done separately based on performance at baseline? Does this mean that the contrast of the stimulus was different for the points at +- 3 dva but the performance was comparable at baseline? Please explain.

      Based on the previous comment and comments of Reviewer #2, we realize that we should have explained this condition more extensively in the main text rather than in the Methods and have adapted the manuscript accordingly. As stated in our reply to the previous comment, Experiment 2 involved one session in which we addressed whether the lack of parafoveal/peripheral enhancement could be due to a simple decrease in acuity as mentioned by the reviewer. Observers were explicitly informed that the to-be-detected stimulus (the probe) would appear either 3 dva to the left or right but never in the screen center and were shown slowed-down example trials for illustration. Observers then performed a staircase procedure which was targeted at determining the probe contrast at which performance for parafoveal target-incongruent probes would be just as high as foveal performance for target-incongruent probes had been in the previous six sessions. While the foveal probe was presented at a median opacity of 28.3±7.6%, an opacity of 39.0±11.1% was required to achieve the same performance level at a 3 dva eccentricity. Therefore, the gray curve in Figure 2D that represents incongruent Hits reaches its peak just under 80% on the y-axis. The gray dots at ±3 dva also range at ~80% on the y-axis. The performance level for target-incongruent probes (‘baseline’ here) in the parafovea is thus equal to foveal performance for target-incongruent probes. Target-congruent parafoveal feature information had the same “chance” to be enhanced as foveal information in the preceding sessions. Despite an equation of performance, we found no parafoveal enhancement. This suggests that enhancement is a true consequence of visual field location and not simply mediated by visual acuity at that location.
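      For illustration, the logic of such a performance-equating procedure can be sketched as follows. The exact adaptive rule used in the experiment is not specified here, so this weighted 1-up/1-down staircase (which converges on an arbitrary target hit rate), the `respond` callback, and all parameter values are hypothetical assumptions, not the actual implementation:

```python
# Hypothetical sketch of a weighted 1-up/1-down staircase that converges
# on a target hit rate (here ~80%, matching the foveal performance level).
# All names and parameter values are illustrative.

def run_staircase(respond, start_opacity=0.30, target_p=0.80,
                  step_down=0.01, n_trials=80):
    """Adapt probe opacity so that the hit rate converges on target_p.

    respond(opacity) stands in for one experimental trial and returns
    True for a hit. With step_up * (1 - p) = step_down * p, the
    equilibrium hit rate of the staircase is p = target_p.
    """
    step_up = step_down * target_p / (1.0 - target_p)  # 4x step_down for p = .8
    opacity = start_opacity
    history = []
    for _ in range(n_trials):
        hit = respond(opacity)
        history.append(opacity)
        # make the task harder (lower opacity) after a hit, easier after a miss
        opacity += -step_down if hit else step_up
        opacity = min(max(opacity, 0.0), 1.0)  # clamp to the valid opacity range
    return sum(history[-20:]) / 20  # threshold estimate: mean of last 20 trials
```

      Any adaptive rule converging on a fixed performance level (e.g., QUEST or a transformed up/down staircase) would serve the same purpose; the key point is that parafoveal performance for incongruent probes is equated with the foveal level before testing for congruency effects.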

      5. The enhancement is significant within a region of 6.4 dva around the center of gaze. This is a rather large region, especially considering that it extends also in the direction opposite to the saccade. I was expecting the enhancement to be more confined to the central foveal region. Was the effect shown in Figure 2D influenced by the fact that saccades in this task were characterized by a large undershoot (Fig 1 D)? Did the effect change if only saccades landing closer to the target were included in the analysis? There may not be enough data for resolving the time course, but maybe there are differences in the size of the main effect.

      Width of the profile: In general, the width of the enhancement profile is likely to be influenced by two experimental/analysis choices: the size of the probe stimulus presented during the experiment and the width of the moving window combining adjacent probe locations for analysis.

      Probe size: Since the probe itself had a comparably large diameter of 3 dva, even the leftmost significant point at -2.6 dva could be explained by an enhancement of the foveal portion of the probe. We had mentioned this briefly in the Discussion but realize that this point is crucial and should be made more explicit.

      Moving window width: We designed the experiment with the intention to densely sample a range of spatial locations during data collection and combine a certain number of adjacent locations using a moving window during analysis (see preregistration: https://osf.io/6s24m). To ensure the reliability of every data point, the width of this window was chosen based on how many trials were lost during preprocessing. We chose a window width of 7 locations as this ensured that each data point contained at least 30 trials on an individual-observer level. Nonetheless, the width of the resulting enhancement profile depends on the width of the moving window:

      We added these caveats to the Results section and incorporated the figure above into the Supplements. We now state explicitly that…

      “the main conclusions that can be drawn are that enhancement i) peaks in the center of gaze, ii) is not uniform throughout the tested spatial range as, for instance, global feature-based attention would predict, and iii) is asymmetrical, extending further towards the saccade target than away from it.”

      For the above reasons, the absolute width of the profile should be interpreted with caution.

      Saccadic landing accuracy: To address the reviewer’s question, we inspected the spatial enhancement profile separately for trials in which the saccade landed on the target (i.e., within a radius of 1.5 dva from its center) or off-target but still within the accepted landing area. This trial separation criterion, besides appearing meaningful, ensured that all observers contributed trials to every data point. We had never resolved the time course in this experiment and could therefore not collapse across time points as suggested by the reviewer. To increase the number of trials per data point, we instead increased the width of the moving window sliding across locations from 6 to 9 neighboring locations (but see caveat above).

      Considering only saccades that landed on the target (‘accurate’; A) yielded significant enhancement from -2.6 to 2.1 dva and from 3.2 dva throughout the measured range towards the saccade target. Saccades that landed off-target (‘inaccurate’; B) showed a more pronounced asymmetry. When only considering inaccurate saccades, enhancement reached significance between -1.1 and 4.4 dva.

      The increased asymmetry for inaccurate saccades may be related to predictive remapping: since inaccurate saccades were hypometric on average, the predictively remapped location of the target was shifted towards the target by the magnitude of the undershoot. Asymmetric enhancement would therefore have boosted congruency at the remapped target location across all trials. In consequence, we inspected if aligning probe locations to the remapped target location on an individual-trial level would lead to a narrower profile for inaccurate saccades. This was not the case. Instead, we observed two parafoveal maxima (C). Their position on the x-axis equals the mean remapping-dependent leftwards (2.0 dva) and rightwards (1.9 dva) displacement across trials. In other words, they correspond to the pre-saccadic center of gaze. Note that these profiles could not be fitted with a mixture of Gaussians and were fitted using polynomials instead.  

      In sum, while we do not observe a clear narrowing of the enhancement profile for accurate saccades, the profile’s asymmetry is more pronounced for inaccurate eye movements. An increase in asymmetry could bear functional advantages since it would boost congruency at the remapped target location across all trials. Importantly though, this adjustment seems to rely on an estimate of average rather than single-trial saccade characteristics: aligning probe locations to the remapped attentional locus on an individual trial level provides further evidence that, irrespective of individual saccade endpoints, enhancement was aligned to the fovea. We have added these analyses to the Results section (Figure 3). We have also added the remapped profiles for all saccades and accurate saccades only to the Supplements.

      6. Is the size of the enhanced region around the center of gaze related to the precision of saccades? Presumably, if saccades are less precise a larger enhanced area may be more beneficial.

      This is a very interesting point. To address this question, we estimated each observer’s saccadic precision by computing bivariate kernel densities from their saccade landing coordinates. As we measured the horizontal extent of enhancement in our experiment, we defined the horizontal bandwidth as an estimate of saccadic imprecision. To estimate the size of the enhanced region for each observer, we created 10,000 bootstrapping samples for each observer’s congruent and incongruent HRs (4 locations combined at each step). We then determined the difference between the bootstrapped congruent and incongruent HRs and defined significantly enhanced locations as all locations for which <= 5% of these differences fell below zero. We then defined the width of the enhancement profile as the maximum number of consecutive significant locations.
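      For concreteness, the core of this bootstrap test can be sketched as follows. This is a simplification (the pooling of 4 adjacent locations per step is omitted), and all names are illustrative rather than taken from the actual analysis code:

```python
# Minimal sketch of the described bootstrap test, assuming per-trial
# hit/miss outcomes (1/0) per probe location; names are illustrative.
import random

def enhanced_region_width(cong, incong, n_boot=10_000, alpha=0.05, seed=0):
    """cong/incong: dict mapping location index -> list of 0/1 outcomes.
    A location counts as enhanced if <= alpha of the bootstrapped
    (congruent - incongruent) hit-rate differences fall below zero.
    Returns the maximum run of consecutive enhanced locations."""
    rng = random.Random(seed)

    def boot_rate(trials):
        # resample trials with replacement and compute the hit rate
        n = len(trials)
        return sum(rng.choice(trials) for _ in range(n)) / n

    significant = []
    for loc in sorted(cong):
        below = sum(boot_rate(cong[loc]) - boot_rate(incong[loc]) < 0
                    for _ in range(n_boot))
        significant.append(below / n_boot <= alpha)

    # width of the enhanced region: longest run of consecutive significant locations
    best = run = 0
    for sig in significant:
        run = run + 1 if sig else 0
        best = max(best, run)
    return best
```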

Instead of a positive correlation, we observed a negative correlation between the bandwidth of landing coordinates (i.e., saccadic imprecision) and the size of the enhanced window (r = -.56, p = .117). In other words, there was a non-significant tendency for less precise saccades to be associated with a narrower estimated region of enhancement. We furthermore inspected the magnitude of enhancement per position within the enhanced region. To do so, we computed the mean difference between congruent and incongruent HRs across all positions in the enhanced region. The sizes of the orange circles in the figure above represent the resulting values (ranging from 2.9% to 13.3%). As saccadic precision decreases, the magnitude of enhancement per data point in the enhanced region tends to decrease as well. We therefore suggest that high saccadic precision is a sign of efficient oculomotor programming, which in turn allows peri-saccadic perceptual processes to operate more effectively. We added this analysis to the Supplements and refer to it in the Results section of the revised manuscript.

    1. Author Response

      Reviewer #3 (Public Review):

      The authors showed that D2R antagonism did not affect the initial dip amplitude but shortened the temporal length of the dip and the rebound ACh levels. In addition, by using both ACh and DA sensors, the authors showed DA levels correlate with ACh dip length and rebound level, not the dip amplitude. Both pieces of evidence support their conclusion that DA does not evoke the dip but controls the overall shape of ACh dip. Overall the current study provides solid data and interpretation. The combination of D2R antagonist and CIN-specific Drd2 KO further support a causal relationship between DA and ACh dip. Overall, the experiments are well-designed, carefully conducted and the manuscript is well-written.

At the behavioral level, the author found a positive correlation between total AUC (of the ACh signal dip) and press latency in Figure 10, indicating that cholinergic levels contribute to motivation. The next logical experiment would be to compare press latency between control and ChAT-Drd2KO mice, since KO mice have a smaller AUC while DA is unaffected. However, this piece of information was missing in the manuscript. The author instead showed that the correlation between AUC and latency was disrupted, which is indirectly related to the conclusion and hard to interpret. Figure 10 showed that eticlopride prolongs press latency in a dose-dependent manner. However, it is not clear what this press latency means and how it was measured in this CRF task (since there is no initial cue in the CRF test, how can we define the press latency?).

We did compare press latency between control and ChATDrd2KO mice (Figure 10B). At baseline (saline), there is no difference in press latency between these two groups. We measured press latency as the time to press the lever after the lever has been extended. When the lever extends, it makes a sound (cue), which signals to the mice that a new trial has started. The fact that press latency is not enhanced in ChATDrd2KO mice was surprising to us. It is possibly due to compensation via other neuronal mechanisms that regulate press latency (see discussion of comment 6 of the public review).

      Pearson r<0.5 is normally defined as a weak correlation. It is better to state r values and discuss that in the manuscript.

      A valid comment. We clarified our correlation analyses in the methods section (line 717):

“We used a variance explained statistical analysis (R²) to determine the % of variance in our correlation analyses (example: a correlation of 0.5 means 0.5² × 100 = 25% of the variance in Y is “explained” or predicted by the X variable). When comparing correlation values, Fisher’s transformation was used to convert Pearson correlation coefficients to z-scores.”

We also added this to the result section: e.g., line 256: “which accounts for 22% of the variance in the ACh decrease explained by the DA peak.”
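As a minimal numeric illustration of these statistics (a sketch only: the function names are ours, `compare_correlations` implements the standard Fisher z-test for independent correlations, which we assume is the comparison meant, and the r of 0.47 used to reproduce the 22% figure is inferred, not taken from the data):

```python
import math

def variance_explained(r):
    """Percent of variance in Y explained by X for a Pearson correlation r."""
    return r ** 2 * 100

def fisher_z(r):
    """Fisher r-to-z transformation, 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def compare_correlations(r1, n1, r2, n2):
    """z statistic for the difference between two independent Pearson r's."""
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (fisher_z(r1) - fisher_z(r2)) / se

variance_explained(0.5)    # 25.0, the example from the Methods text
variance_explained(-0.47)  # ~22.1; an r of this size yields the quoted 22%
```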

      Is there any correlation between ACh AUC and other behavior indexes such as press speed or the time between press and reward licking?

We don’t have the ability to measure press speed, and there is no press rate because the lever retracts after the first lever press. We quantified the correlation between the time from press to head entry (press-to-reward latency) and ACh AUC, and the results are difficult to interpret. For Drd2fl/fl control mice we determined a weak negative correlation (the larger the ACh dip, the lower the press-to-reward latency). In contrast, in ChATDrd2KO mice we found a weak positive correlation between ACh AUC and press-to-reward latency (the smaller the dip, the lower the press-to-reward latency). Given these conflicting results, it is difficult to determine how the ACh AUC affects press-to-reward latency.

In the Figure 2B CS+ group, the author was focusing on the responses at CS+; however, the ACh dips at reward delivery seem to persist in this particular example. This might be an interesting phenomenon in which ACh gets dissociated from DA signals, which needs further analysis from the author.

      We see a persistent signal at reward delivery in both DA and ACh up to the 8 days of testing. However, 1 mouse lost its optical fiber for the GACh signal so the data from Days 6-8 is from 2 mice. We also measured the correlation between DA and ACh at reward delivery for all 8 days of testing (see below). The correlation data is variable with the strongest correlation being observed on Day 2. It is possible that these signals could get dissociated after even more days of testing, but we do not have this data available.

    1. Author Response:

      Reviewer #1 (Public Review):

      Jo et al. use a combination of micropatterned differentiation, single cell RNA sequencing and pharmacological treatments to study primordial germ cell (PGC) differentiation starting from human pluripotent stem cells. Geometrical confinement in conjunction with a pre-differentiation step allowed the authors to reach remarkable differentiation efficiencies. While Minn et al. already reported the presence of PGC-like cells in micropatterned differentiating human cultures by scRNA-Seq (as acknowledged by the authors), the careful characterization of the PGC-like population using immunostainings and scRNA-Seq is a strength of the manuscript. The attempt at mechanistically dissecting the signaling pathways required for PGC fate specification is somehow weaker. The authors do not present sufficient evidence supporting the ability to specify PGC fate in the absence of Wnt signaling and the importance of the relative signaling levels of BMP to Nodal pathways; the wording of the text should be amended to better reflect the presented evidence or the authors should perform additional experiments to support these claims.

      We thank the reviewer for this comment. As described in more detail in the responses below, we have significantly strengthened the evidence for the rescue of Wnt inhibition by exogenous Activin treatment and have nuanced our interpretation. We believe that our data suggest low levels of Wnt may be required directly for PGC competence, while much higher levels are required indirectly to induce Nodal, with Nodal signaling being the limiting factor for PGC specification under the reference condition with BMP4 treatment only. We describe this in detail in the manuscript but summarize it here in a simplified diagram:

      We have also carried out additional experiments that match model predictions demonstrating the importance of relative BMP and Nodal signaling levels and amended the text to reflect the evidence as suggested. More details are provided below.

      The molecular characterization of why colonies confined to small areas differentiate much better would greatly increase the biological significance of the manuscript (the technical achievement of reaching such efficiency is impressive on its own).

      We believe the mechanism by which cells confined to small colonies differentiate to PGCLCs more efficiently is explained by a larger fraction of the cells being exposed to the necessary levels of BMP and Nodal signaling. In large colonies BMP signaling was shown to be restricted to a distance of 50-100 um from the colony edge through receptor localization and secretion of inhibitors (Etoc et al, Dev Cell 2016). From this one would expect that BMP signaling extends a similar distance from the edge in small colonies, so that a larger fraction of cells are receiving the BMP signal needed to differentiate to PGCLCs. Because it was not previously shown that the length scale of BMP signaling and downstream signals are preserved as colony size is reduced, we have now included an analysis of BMP signaling (pSmad1 levels) and Nodal signaling (nuclear Smad2/3 levels) as a function of colony size (Figure 5i-k). This confirms our hypothesis and provides a potential mechanism.

The authors propose a mathematical model based on BMP and Nodal signaling that qualitatively recapitulates their experimental data. While the authors should be commended for providing examples of other simple models that do not fully recapitulate their data, it would have been nice to see an attempt at quantitatively challenging the model. In particular, the authors do not take advantage of the ability to explore in a more systematic manner the BMP/Nodal phase space with their system.

We thank the reviewer for this suggestion. Experimentally we have now tested the effect of 5x5 = 25 different combinations of BMP and Activin doses on PGCLC differentiation. We then challenged the mathematical model to predict the ‘phase diagram’ corresponding to this data with good agreement (Figure 6f). It is important to note here that the model was fit using only data with 50 ng/ml of BMP, making this a true prediction. We also point out that the phase diagram predicted in this way is different from the one shown in Figure 6d, not only because of the lower resolution, but because Figure 6f shows the steady state after uniform stimulation in space and time (i.e. the response at the very edge), whereas the predicted phase diagram shows average expression at 42 h in a 100 um range from the colony edge using the previously measured spatiotemporal gradients of BMP and Activin response. Finally, the data in Figure 6f shows mean expression levels as opposed to the percentage of double-positive cells for the same data in Figure 4q, because our model does not simulate individual cells and noise, only allowing us to compare mean expression. We explain all this in the text now. As a minor change to facilitate comparison of data and model, we have now plotted the concentrations of BMP and Activin in Figure 6 rather than the scaled model parameters from 0 to 1; we also further optimized the model parameters without qualitative changes.

      The authors' claim that PGCLC formation can be rescued by exogenous Activin when blocking endogenous Wnt production is surprising given the literature. The authors only show that they can restore a TFAP2C+SOX17+ population but do not actually stain for an established germ cell marker. It appears essential to perform a PRDM1 staining in these conditions (Figure 4A) to unambiguously identify this population.

We have significantly extended our analysis of the effect of WNT inhibition and subsequent rescue of PGCs by Activin treatment. This includes staining for TFAP2C, NANOG, PRDM1 and staining for LEF1 as a measure of WNT signaling. Figure 4 and Figure 4—figure supplement 1 now also include treatment with IWR-1, a different small molecule inhibitor of WNT signaling, as well as inhibition by IWR-1 and IWP2 at different times and different doses.

The authors only provide weak evidence that the fates depend on the relative signaling levels of BMP and Nodal. Indeed, fewer cells acquire a fate the lower the BMP concentration they use, including the fates marked by Sox17 expression. It would be more convincing to show the assay of Figure 4F for a range of BMP concentrations at which the overall differentiation works sufficiently well.

As suggested, we have now included a range of BMP concentrations. The reduction in PGCs at lower BMP doses is in line with our model and does not contradict a dependence on the relative signaling levels of BMP and Nodal, by which we mean that the optimal dose of Activin for PGCLC specification depends on the level of BMP and vice versa. We have amended the text to state this more clearly.

      References

Chen, Di, Na Sun, Lei Hou, Rachel Kim, Jared Faith, Marianna Aslanyan, Yu Tao, et al. 2019. “Human Primordial Germ Cells Are Specified From Lineage-Primed Progenitors.” Cell Reports 29 (13): 4568–4582.e5. doi:10.1016/j.celrep.2019.11.083.

Etoc, Fred, Jakob Metzger, Albert Ruzo, Christoph Kirst, Anna Yoney, M Zeeshan Ozair, Ali H Brivanlou, and Eric D Siggia. 2016. “A Balance Between Secreted Inhibitors and Edge Sensing Controls Gastruloid Self-Organization.” Developmental Cell 39 (3): 302–15. doi:10.1016/j.devcel.2016.09.016.

Kobayashi, Toshihiro, Haixin Zhang, Walfred W C Tang, Naoko Irie, Sarah Withey, Doris Klisch, Anastasiya Sybirna, et al. 2017. “Principles of Early Human Development and Germ Cell Program From Conserved Model Systems.” Nature 546 (7658): 416–20. doi:10.1038/nature22812.

Kojima, Yoji, Kotaro Sasaki, Shihori Yokobayashi, Yoshitake Sakai, Tomonori Nakamura, Yukihiro Yabuta, Fumio Nakaki, et al. 2017. “Evolutionarily Distinctive Transcriptional and Signaling Programs Drive Human Germ Cell Lineage Specification From Pluripotent Stem Cells.” Cell Stem Cell 21 (4): 517–532.e5. doi:10.1016/j.stem.2017.09.005.

Sasaki, Kotaro, Tomonori Nakamura, Ikuhiro Okamoto, Yukihiro Yabuta, Chizuru Iwatani, Hideaki Tsuchiya, Yasunari Seita, et al. 2016. “The Germ Cell Fate of Cynomolgus Monkeys Is Specified in the Nascent Amnion.” Developmental Cell 39 (2): 169–85. doi:10.1016/j.devcel.2016.09.007.

      Tyser, R.C.V., Mahammadov, E., Nakanoh, S. et al. Single-cell transcriptomic characterization of a gastrulating human embryo. Nature 600, 285–289 (2021). https://doi.org/10.1038/s41586-021-04158-y

    1. Author Response:

      Reviewer #1:

The paper uses a microfluidic-based method of cell volume measurement to examine single cell volume dynamics during cell spreading and osmotic shocks. The paper successfully shows that cell volume is largely maintained during cell spreading, but small volume changes depend on the rate of cell deformation during spreading, and on cell ionic homeostasis. Specifically, the major conclusion that there is a mechano-osmotic coupling between cell shape and cell osmotic regulation, I think, is correct. Moreover, the observation that a fast-deforming cell has a larger volume change is informative.

      The authors examined a large number of conditions and variables. It's a paper rich in data and general insights. The detailed mathematical model, and specific conclusions regarding the roles of ion channels and cytoskeleton, I believe, could be improved with further considerations.

      We thank the referee for the nice comment on our work and for the detailed suggestions for improving it.

      Major points of consideration are below.

      1) It would be very helpful if there is a discussion or validation of the FXm method accuracy. During spreading, the cell volume change is at most 10%. Is the method sufficiently accurate to consider 5-10% change? Some discussion about this would be useful for the reader.

      This is an important point and we are sorry if it was not made clear in our initial manuscript. We have now made it more clear in the text (p. 4 and Figure S1E and S1F).

      The important point is that the absolute accuracy of the volume measure is indeed in the 5 to 10% range, but the relative precision (repeated measures on the same cell) is much higher, rather in the 1% range, as detailed below based on experimental measures.

1) Accuracy of absolute volume measurements. The accuracy of the absolute volume measure depends on several parameters which can vary from one experiment to the other: the exact height of the chamber, and the biological variability from one batch of cells to another (we found that the distribution of volumes in a population of cultured cells depends strongly on the details of the culture - seeding density, substrate, etc. - which we normalized as much as possible to reduce this variability, as described in previous articles, e.g. ref. 2). To estimate this variability overall, the simplest approach is to compare the average volume of the cell population across different experiments, carried out in different chambers and on different days.

      Graph showing the initial average volume of cells +/- STD for 7 spreading experiments and 27 osmotic shock experiments, expressed as a % deviation from the average volume over all the experiments.

The average deviation is 10.9 +/- 8%.

      2) Precision of relative volume measurements. When the same cell is imaged several times in a time-lapse experiment, as it is spreading on a substrate, or as it is swelling or shrinking during an osmotic shock, most of the variability occurring from one experiment to another does not apply. To experimentally assess the precision of the measure, we performed high time resolution (one image every 30 ms) volume measurements of 44 spread cells during 9 s. During this period of time, the volume of the cell should not change significantly, thus giving the precision of the measure.

      Graph showing the coefficient of variation of the volume (STD/mean) for each individual cell (n=44) across the almost 300 frames of the movie. This shows that on average the precision of volume measurements for the same cell is 0.97±0.21%. In addition, if more precision was needed, averaging several consecutive measures can further reduce the noise, a method which is very commonly used but that we did not have to apply to our dataset.
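The per-cell precision estimate amounts to a coefficient-of-variation computation across repeated frames; a sketch on hypothetical data (the 1% noise level and the volume range are illustrative, not the measured values):

```python
# Sketch of the precision estimate: per-cell CV across repeated FXm frames.
import numpy as np

def precision_cv(volumes):
    """Per-cell coefficient of variation (STD/mean, in %) across frames.

    volumes: (n_cells, n_frames) array of repeated volume measurements
    of the same cells.
    """
    return volumes.std(axis=1) / volumes.mean(axis=1) * 100

# hypothetical data: 44 cells, ~300 frames, 1% Gaussian measurement noise
rng = np.random.default_rng(0)
true_volume = rng.uniform(1500, 3500, size=(44, 1))  # um^3, arbitrary range
frames = true_volume * (1 + 0.01 * rng.standard_normal((44, 300)))
cv = precision_cv(frames)  # per-cell CV, clustering around ~1%
```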

We have included these results in the revised manuscript, since they might help the reader estimate what can be obtained from this method of volume measurement. We also point the reviewer to previous research articles using this method and showing both population averages and time-lapse data (refs. 2-8). Another validation of our volume measurement method comes from the relative volume changes in response to osmotic shock (Ponder’s relation) measured with FXm, which gave results very similar to those of previously published studies. We actually performed these experiments to validate our method, since the results are not novel.

      2) The role of cell active contraction (myosin dynamics) is completely neglected. The membrane tether tension results, LatA and Y-compound results all indicate that there is a large influence of myosin contraction during cell spreading. I think most would not be surprised by this. But the model has no contribution from cortical/cytoskeletal active stress. The authors are correct that the osmotic pressure is much larger than hydraulic pressure, which is related to active contraction. But near steady state volume, the osmotic pressure difference must be equal to hydraulic pressure difference, as demanded by thermodynamics. Therefore, near equilibrium they must be close to each other in magnitude. During cell spreading, water dynamics is near equilibrium (given the magnitude of volume change), and therefore is it conceptually correct to neglect myosin active contraction? BTW, 1 solute model does not imply equal osmolarity between cytoplasm and external media. 1 solute model with active contraction was considered before, e.g., ref. 17 and Tao, et al, Biophys. J. 2015, and the steady state solution gives hydraulic pressure difference equal to osmotic pressure difference.

This is an excellent point raised by the referee. We have two types of answers for this. First, an answer from an experimental point of view, which shows that acto-myosin contractility does not seem to play a direct role in the control of cell volume, at least in the cells we used here. Based on these results, we then propose a theoretical reason why this is the case. It contrasts with the view proposed in the articles mentioned by the referee for a reason that comes not from the physical principles, with which we fully agree, but from the actual numbers, available in the literature, for the amounts of the various types of osmolytes inside the cell. We give these points in more detail below and hope they will convince the referee. We also now mention them explicitly in the main text of the article (p. 6-7, Figure S3F) and in the Supplementary file with the model.

      A. Experimental results

      To test the effect of acto-myosin contraction on cell volume, we performed two experiments:

1) We measured the volume of the same cell before and after treatment with the Rho-kinase (ROCK) inhibitor Y-27632, which decreases cortical contractility. The experiment was performed on cells plated on poly-L-lysine (PLL)-coated glass, as in the osmotic shock experiments: on this substrate cells adhere, allowing the change of solution, but do not spread and remain rounded, which allowed us to evaluate the effect of the drug. The change of medium itself (with control medium) induced a volume change of less than 2%, similar to control osmotic shock experiments (maybe due to shear stress). When cells were treated with Y-27, the volume change was similar to that with the control medium (now commented on in the text, p. 6-7, Figure S3F). To make the analysis more complete, we distinguished the cells that remained round throughout the experiment from those that slightly spread, since spreading could have an effect on volume. Indeed, we observed that treatment with Y-27 induced more cells to spread (Figure S3F), probably because the cortex was under less tension, allowing the adhesive forces on PLL to induce more spreading (ref. 9). Nevertheless, spreading remained rather slow, and the volume change of cells treated or not with Y-27 was not significantly different. This shows that, in the absence of fast spreading, the reduction of contractility per se does not have any effect on cell volume.

Graphs showing the proportion of cells that spread during the experiments (left); average relative volume of round (middle) and spread (right) control (N=3, n=77) and Y-27 treated cells (N=4, n=297).

      2) To evaluate the impact of a reduction of contractility in the total absence of adhesion, we measured the average volume of control cells versus cells which have been pretreated with Y-27, plated on a non-adhesive substrate (PLL-PEG treatment). This experiment showed that the volume of the cells evolved similarly in time for both conditions, proving that contractility per se has no effect on the cell volume or cell growth, in the absence of spreading.

      Graphs showing average relative volume of control (N=5, n=354) and Y-27 (N=3, n=292) treated cells plated on PLL-PEG (left); distributions of initial volume for control (middle) and Y-27 treated cells (right) represented on the left graph.

Taken together, these results show that inhibition of contractility per se does not significantly affect cell volume. This confirms our interpretation of the cell spreading results: the reduction of contractility affects cell volume specifically in the context of cell spreading, primarily because it affects the spreading speed.

      B. Theoretical interpretation

      In accordance with our experiments, in our model, the effect of contractility is implicitly included in the model because it modulates the spreading dynamics, which is an input to the model, i.e. through the parameters tau_a and A_0.

We do not include the effect of contractility directly in the water transport equation because our quantitative estimates support that the contribution of the hydrostatic pressure to the volume (or the volume change) is negligible in comparison to the osmotic pressure, even for small variations near the steady-state volume. The key point is that the concentration of ions inside the cell is actually much lower than outside of the cell (refs. 10, 11). The difference is about 100 mM and corresponds mostly to non-ionic small trapped osmolytes, such as metabolites (ref. 12). The osmotic pressure corresponding to this is about 10^5 Pa. Taking the cortical tension to be of the order of 1 mN/m and the cell size to be about ten microns, we get a hydrostatic pressure difference of about 100 Pa due to cortical tension. A significant change in cell volume, of the order observed during cell spreading (let us consider a ten percent decrease), will increase the osmotic pressure of the trapped non-ionic osmolytes by 10^4 Pa (their number in the cell remaining identical). For this osmotic pressure to be balanced by an increase in the hydrostatic pressure, the cortical tension would need to increase by a factor of 100, which we consider to be unrealistic. Therefore, we find it reasonable to ignore the contribution of the hydrostatic pressure difference in the water flux equation. This is also consistent with the novel experiments presented above, which show that inhibition of cortical contractility changes the cell volume by less than what can be detected by our measurements (thus likely at most in the 1% range). This is now explained in the main text and Supplementary file.
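These order-of-magnitude estimates can be checked with a short back-of-the-envelope calculation (van 't Hoff relation Pi = c R T and Laplace pressure dP = 2 gamma / r; the input values are the illustrative ones quoted above, not fitted parameters):

```python
# Order-of-magnitude check of the osmotic vs hydrostatic pressure argument.
R, T = 8.314, 310.0            # gas constant (J/mol/K), body temperature (K)

# ~100 mM excess of trapped non-ionic osmolytes (van 't Hoff: Pi = c R T)
dc = 100.0                     # mol/m^3 (= 100 mM)
pi_osmotic = dc * R * T        # ~2.6e5 Pa, i.e. "about 10^5 Pa"

# Laplace pressure from cortical tension: dP = 2 * gamma / r
gamma, r = 1e-3, 10e-6         # 1 mN/m tension, ~10 um cell radius
p_laplace = 2 * gamma / r      # ~200 Pa, i.e. "about 100 Pa"

# a ~10% volume decrease concentrates the trapped osmolytes by ~10%
pi_shift = 0.1 * pi_osmotic    # ~2.6e4 Pa, i.e. "about 10^4 Pa"

# tension increase needed to balance this shift with hydrostatic pressure
fold_increase = pi_shift / p_laplace   # ~130-fold, i.e. "a factor of ~100"
```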

Regarding the minimal model required to define cell volume, the reason why we believe a one-solute model is not sufficient is fundamentally the same as above: the concentration of trapped osmolytes is comparable to the total osmolarity, which means that their contribution to the total osmotic pressure cannot be discarded. Secondly, within the simplest one-solute model, the pump-and-leak dynamics fixes the inner osmolyte concentration but does not involve the actual cell size. The most natural term that depends on the size is the Laplace pressure (inversely proportional to the cell size in a spherical cell model). But as discussed above, this term may only permit osmotic pressure differences of the order of 100 Pa, corresponding to an osmolyte concentration difference of the order of 0.1 mM. That is only a tiny fraction of the external medium osmolarity, which is about 300 mM. Such a model could thus only work with extremely fine tuning of the pump and leak rates, to values with less than about 1% variation. Furthermore, such a model could not explain finite volume changes upon osmotic shocks without invoking huge (100-fold) variations of cell surface tension, as discussed above. For these reasons, we believe that the one-solute model is not appropriate to describe our experiments, and we feel that a trapped population of non-ionic osmolytes is needed to balance the osmolarity difference created by the solute pump and leak.
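The 0.1 mM figure follows from converting the Laplace pressure scale into an equivalent osmolyte concentration difference (self-contained, with the same illustrative tension and radius values as in the paragraph above):

```python
# Concentration difference that cortical tension alone could sustain.
R, T = 8.314, 310.0            # J/(mol K), body temperature
gamma, r = 1e-3, 10e-6         # cortical tension (N/m), cell radius (m)
dp = 2 * gamma / r             # Laplace pressure, ~200 Pa
dc_mM = dp / (R * T)           # mol/m^3 (numerically equal to mM): ~0.08 mM

# a tiny fraction of the ~300 mM external osmolarity
fraction = dc_mM / 300.0       # ~3e-4, far below 1%
```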

      In the revised version of the manuscript, we have now added a section in Supplementary file and in the main text, explaining in more detail this approximation.

      3) The authors considered the role of Na, K, and Cl in the model, and used pharmacological inhibitors of NHE exchanger. I think this part of the experiments and model are somewhat weak. I am not sure the conclusions drawn are robust. First there are many ion channels/pumps in regulating Na, K and Cl. The most important of which is NaK exchanger. NHE also involves H, and this is not in the model. The ion flux expressions in the model are also problematic. The authors correctly includes voltage and concentration dependences, but used a constant active term S_i in SM eq. 3 for active pumping. I am not sure this is correct. Ion pump fluxes have been studied and proposed expressions based on experimental data exist. A study of Na, K, Cl dynamics, and membrane voltage on cell volume dynamics was published in Yellen et al, Biophys. J. 2018. In that paper, they used different expressions based on previously proposed flux expressions. It might be correct that in small concentration differences, their expressions can be linearized or approximated to achieve similar expressions as here. But this point should be considered more carefully.

We thank the reviewer for this comment. Indeed, we had not well justified our use of the NHE inhibitor EIPA. Our aim was not to directly affect the major ion pumps involved in volume regulation (which would indeed rather be the Na+/K+ exchanger), because that would likely strongly impact the initial volume of the cell and not only the volume response to spreading, making the interpretation more difficult. We based our choice on previous publications (e.g., ref. 13) showing that EIPA inhibits the main fast volume changes previously reported for cultured cells: it was shown to inhibit volume loss in spreading cells, as well as mitotic cell swelling (refs. 14, 15). Using EIPA, we also found that, while the initial volume was only slightly affected, the volume loss was completely abolished even in fast-spreading cells (combined Y-27 and EIPA treatment, Figure S5H). This clearly proves that the volume loss behavior can be abolished without changing the speed of spreading, which was our main aim with this experiment.

The most direct effect of inhibiting NHE exchangers is to change the cell pH (refs. 16, 17), which, given the low number of protons in the cell (a negligible contribution to the cell’s osmotic pressure), cannot affect the cell volume directly. A well-studied mechanism through which proton transport can have an indirect effect on cell volume is through the effect of pH on ion transporters, or through the coupling between NHE and the HCO3/Cl exchanger. The latter case is well studied in the literature (ref. 18). In brief, the flux of protons out of the cell through NHE, driven by the Na gradient, leads to an outflux of HCO3 and an influx of Cl. The change in Cl concentration will have an effect on the osmolarity and cell volume.

We thus performed hyperosmotic shocks with this drug and found that, as expected, it had no effect on the immediate volume change (the Ponder’s relation), but affected the rate of volume recovery (combined with cell growth). Overall, the cells treated with EIPA showed a faster volume increase, which is what is expected if the active pumping rate is reduced. This is in contrast with the above-mentioned mechanism of volume regulation, which would lead to a reduced volume recovery in EIPA-treated cells. This leads us to conclude that there is potentially another effect of NHE perturbation: changing the pH will have a large impact on the functioning of many other processes; in particular, it can have an effect on ion transport (ref. 16).

On the model side, the referee correctly points out that there are many ion transporters known to play a role in volume regulation that are not included in Eq. 3. In the revised manuscript we now start with a more general ion transport equation. We show that the main equation (Eq. 1, or Supplementary file Eq. 13) relating volume change to tension is not affected by this generalization. This is because we consider only the linear relation between the small changes in volume and tension. We note that the generic description of the PLM (Supplementary file Eqs. 1-6) can be seen as general and does not require the pump and channel rates to be constant; both Λ_i and S_i can be functions of the potential and ion concentrations, along with membrane tension. It is only later in the analysis that we make the assumption that these parameters depend only on tension. This point is now made clear in the Supplementary file.

      There is a huge body of work, both theoretical and experimental, in which the effect of different ion transporters on cell volume is analyzed. The aim of this work is not to provide an analysis of cell volume and the effect of various co-transporters; it is limited to understanding the coupling between cell spreading, surface tension, and cell volume.

      To analytically estimate the sign of the mechano-osmotic coupling parameter alpha, we use a minimal model in which the pump and channel rates are indeed taken to be constant. As it is again a perturbative expansion around the steady-state concentration, electric potential, and volume, the expression for alpha can easily be computed for a model with more general ion transporters. This generalization would come at the cost of additional parameters in the expression for alpha. We decided to keep the simpler transport model: the goal of this estimate is merely to show that the sign of alpha is not given a priori and depends on the relative values of the parameters. Even for the simple model we present, the sign of alpha can be changed by varying parameters within reasonable ranges.

      Given these points, and the clarification of the reasons to use EIPA in our experiments, a full mechanistic explanation of the effect of this drug is beyond the scope of this work. For this reason, we do not analyze in detail the effect of EIPA on the model parameter alpha. We have clarified our interpretation of these results in the main text of the article.

      Reviewer #2:

      The work by Venkova et al. addresses the role of plasma membrane tension in cell volume regulation. The authors study how different processes that exert mechanical stress on cells affect cell volume regulation, including cell spreading, cell confinement and osmotic shock experiments. They use live cell imaging, FXm (cell volume) and AFM measurements and perform a comparative approach using different cell lines. As a key result the authors find that volume regulation is associated with cell spreading rate rather than absolute spreading area. Pharmacological assays further identified Arp2/3 and NHE1 as molecular regulators of volume loss during cell spreading. The authors present a modified mechano-osmotic pump and leak model (PLM) based on the assumption of a mechanosensitive regulation of ion flux that controls cell volume.

      This work presents interesting data and theoretical modelling that contribute new insight into the mechanisms of cell volume regulation.

      We thank the referee for the nice comments on our work. We really appreciate the effort (s)he made to help us improve our article, including the careful inspection of the figures. We think our work is much improved thanks to his/her input.

      Reviewer #3:

      The study by Venkova and co-workers studies the coupling between cell volume and the osmotic balance of the cell. Of course, a lot of work has already been done on this subject, but the main specific contribution of this work is to study the fast dynamics of volume changes after several types of perturbations (osmotic shocks, cell spreading, and cell compression). The combination of volume dynamics at very high time resolution, and the robust fits obtained from an adapted Pump and Leak Model (PLM), makes the article a step forward in our understanding of how cell volume is regulated during cell deformations. The authors clearly show that:

      -The rate at which cell deforms directly impacts the volume change

      -Below a certain deformation rate (either by cell spreading or external compression), the cells adapt fast enough not to change their volume. The plot dV/dt vs dA/dt shows a clear proportionality relation.

      -The theoretical description of volume change dynamics with the extended PLM makes the overall conclusions very solid.

      Overall the paper is very well written, contains an impressive amount of quantitative data, comparing several cell types and physiological and artificial conditions.

      We thank the referee for the positive comment on our work.

      My main concern about this study is related to the role of membrane tension. In the PLM model, the coupling of cell osmosis to cell deformation is made through the membrane-tension-dependent activity of ion channels. While the role of ion channels is extensively tested, it brings some surprising results. Moreover, the tension is measured only at fixed time points, and the comparison to theoretical predictions is not always as convincing as expected: when comparing fig 6I and 6J, I see that the predictions show that EIPA (+ or - Y27), CK-666 (+ or - Y27) and Y27 alone should have lower tension than in the control conditions, and this is clearly not the case in fig 6J. But I would not like to put too much emphasis on those discrepancies, as the drugs in the real case must have broad effects that may not be directly comparable to the theory.

      We apologize for the mislabeling of Figure 6I (now Figure 5I). This plot shows the theoretical estimate of the difference in tension (in units of the homeostatic tension) between the case in which the cell loses volume upon spreading (as observed in experiments) and the hypothetical situation in which it does not (alpha = 0). The positive value of the tension difference predicts that the cell tension would have been higher if the cell were not losing volume upon spreading, which is the case for the treatments with EIPA and CK-666 (+ Y27) and corresponds to what we found experimentally.

      It thus matches our experimental observations for drug treatments that reduce or abolish the volume loss during spreading and correspond to a higher tether force only at short times.

      We have corrected the figure and figure legend and explained it better in the text.

      But I wonder if the authors would have a better time showing that the dynamics of tension are as predicted by theory in the first place, as comparing theoretical predictions with experiments using drugs with pleiotropic effects may be hazardous.

      Actually, a recent publication (https://doi.org/10.1101/2021.01.22.427801) shows that tension follows volume changes during osmotic shocks, and overall finds the same dynamics of volume changes as in this manuscript. I am thus wondering if the authors could use the same technique as described in this paper (FLIM of the Flipper probe) in order to study the dynamics of tension in their system, or at least refer to this paper in order to support their claim that tension is the coupling factor between volume and deformation.

      As suggested by the referee, we tried to use the FLIPPER probe. We first tried to reproduce the osmotic shock experiments, adding to the HeLa cells 4% PEG400 (+~200 mOsm) or 50% H2O (-~170 mOsm) and measuring the average probe lifetime before and after the shock. We found a significantly lower probe lifetime in the hyperosmotic condition compared with the control, and a non-significant, slightly higher lifetime after the hypoosmotic shock. The magnitude of the lifetime changes was comparable with the study cited by the reviewer, but the quality of our measurements did not allow a better resolution. Next, we measured the average lifetime for control and CK-666+Y-27-treated cells 30 min and 3 h after plating, because we had the highest tether force values for CK-666+Y-27 at 30 min. We did not see a change in lifetime in control cells between 30 min and 3 h (which we also did not see with tether pulling). Cells treated with CK-666+Y-27 showed slightly lower lifetime values than control cells, but at both 30 min and 3 h after plating, which means that this did not correspond to the transient effect of fast spreading but rather, probably, to an effect of the drugs on the measurement.

      Graph showing FLIPPER lifetime before and after osmotic shock for HeLa cells plated on a PLL-coated substrate. Left: control (N=3, n=119) and hyperosmotic shock (N=3, n=115); Right: control (N=3, n=101) and hypoosmotic shock (N=3, n=80). p-values were obtained by t-test.

      Graph showing FLIPPER lifetime for control cells just after plating on PLL-coated glass (the same control data shown in the previous graph), and 30 min (control: N=3, n=88; Y-27+CK-666: N=3, n=130) and 3 h (control: N=3, n=78; Y-27+CK-666: N=3, n=142) after plating on fibronectin-coated glass. p-values were obtained by t-test.
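      For reference, the unpaired comparisons quoted in these legends can be sketched with a standard two-sample (Welch) t-test. The arrays below are synthetic stand-ins with assumed means and spreads, not the measured FLIPPER lifetimes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Synthetic stand-ins for per-cell FLIPPER lifetimes (ns); the real
# analysis would use the measured values for each condition.
control = rng.normal(5.0, 0.3, size=119)
hyper = rng.normal(4.7, 0.3, size=115)  # assumed lower mean after hyperosmotic shock

# Welch's t-test does not assume equal variances between the two conditions.
t, p = stats.ttest_ind(control, hyper, equal_var=False)
print(f"t = {t:.2f}, p = {p:.1e}")
```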

      Because cell-to-cell variability might mask the trend of single-cell changes in lifetime during spreading, we also tried to follow the lifetime of individual cells every 5 min during spreading. Most illuminated cells did not spread, while cells in non-illuminated fields of view spread well, suggesting that even with one image every 5 minutes and the lowest possible illumination, the imaging was too toxic to follow cell spreading over time. We could obtain measurements for a few cells, which did not show any particular trend, but their spreading was not normal, so we cannot really conclude much from these experiments.

      Graph showing FLIPPER lifetime changes for 3 individual cells plated on fibronectin-coated glass (shown in blue, magenta and green) and the average lifetime of cells from a non-illuminated field (cyan, n=7).

      Our conclusions are the following:

      1) We are able to visualize some change in the lifetime of the probe in the osmotic shock experiments, similar to the published results, but with rather large cell-to-cell variability.

      2) The spreading experiments comparing 30 minutes and 3 hours, in control or drug-treated cells, did not reproduce the results we observed with tether pulling; instead, the drugs had a global effect on the measurements at both 30 min and 3 hours.

      3) Following single cells in time led to too much toxicity and prevented normal spreading.

      We think that this technology, which is still in its early development, especially in terms of the microscope setup required (which we do not have in our institute, so we had to use a platform in another institute with limited time for experiments), cannot be implemented within the frame of the revision of this article to provide reliable results. We thus consider that these experiments belong to further development of the work and are out of the scope of this study. It would be very interesting to study in detail the comparison between the older and more established method of tether pulling and the novel FLIPPER probe method, during cell spreading and in other contexts. To our knowledge this has never been done so far, and it is not within the frame of this study that we can do it. It is not clear from the literature that the two methods would measure the same thing in all conditions, even if they might match in some.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript will interest cognitive scientists, neuroimaging researchers, and neuroscientists interested in the systems-level organization of brain activity. The authors describe four brain states that are present across a wide range of cognitive tasks and determine that the relative distribution of the brain states shows both commonalities and differences across task conditions.

      The authors characterized the low-dimensional latent space that has been shown to capture the major features of intrinsic brain activity using four states obtained with a Hidden Markov Model. They related the four states to previously-described functional gradients in the brain and examined the relative contribution of each state under different cognitive conditions. They showed that states related to the measured behavior for each condition differed, but that a common state appears to reflect disengagement across conditions. The authors bring together a state-of-the-art analysis of systems-level brain dynamics and cognitive neuroscience, bridging a gap that has long needed to be bridged.

      The strongest aspect of the study is its rigor. The authors use appropriate null models and examine multiple datasets (not used in the original analysis) to demonstrate that their findings replicate. Their thorough analysis convincingly supports their assertion that common states are present across a variety of conditions, but that different states may predict behavioural measures for different conditions. However, the authors could have better situated their work within the existing literature. It is not that a more exhaustive literature review is needed; it is that some of their results are unsurprising given the work reported in other manuscripts, some of their work reinforces or is reinforced by prior studies, and some of their work is not compared to similar findings obtained with other analysis approaches. While space is not unlimited, some of these gaps are important enough that they are worth addressing:

      We appreciate the reviewer’s thorough read of our manuscript and positive comments on its rigor and implications. We agree that the original version of the manuscript insufficiently situated this work in the existing literature. We have made extensive revisions to better place our findings in the context of prior work. These changes are described in detail below.

      1) The authors' own prior work on functional connectivity signatures of attention is not discussed in comparison to the latest work. Neither is work from other groups showing signatures of arousal that change over time, particularly in resting state scans. Attention and arousal are not the same things, but they are intertwined, and both have been linked to large-scale changes in brain activity that should be captured in the HMM latent states. The authors should discuss how the current work fits with existing studies.

      Thank you for raising this point. We agree that the relationship between low-dimensional latent states and predefined activity and functional connectivity signatures is an important and interesting question in both attention research and more general contexts. Here, we did not empirically relate the brain states examined in this study to the functional connectivity signatures previously investigated in our lab (e.g., Rosenberg et al., 2016; Song et al., 2021a) because the research question and methodological complexities deserve separate attention that goes beyond the scope of this paper. Therefore, we conceptually addressed the reviewer’s question of how functional connectivity signatures of attention relate to the brain states observed here. Next, we asked how arousal relates to the brain states by indirectly predicting the arousal level of each brain state based on the spatial resemblance of its activity pattern to a predefined arousal network template (Goodale et al., 2021).

      Latent states and dynamic functional connectivity

      Previous work suggested that, on medium time scales (~20-60 seconds), changes in functional connectivity signatures of sustained attention (Rosenberg et al., 2020) and narrative engagement (Song et al., 2021a) predicted changes in attentional states. How do these attention-related functional connectivity dynamics relate to latent state dynamics, measured on a shorter time scale (1 second)?

      Theoretically, there are reasons to think that these measures are related but not redundant. Both the HMM and dynamic functional connectivity provide summary measures of whole-brain functional interactions that evolve over time. Whereas the HMM identifies recurring low-dimensional brain states, the dynamic functional connectivity used in our and others’ prior studies captures high-dimensional dynamical patterns. Furthermore, while the mixture Gaussian emission model of our HMM infers states from both the BOLD activity patterns and their interactions, functional connectivity considers only pairwise interactions between regions of interest. Thus, on the theoretical ground that brain states can be characterized at multiple scales and with different methods (Greene et al., 2023), we can hypothesize that both measures could (and perhaps should) capture brain-wide latent state changes. For example, if we were to apply k-means clustering to the sliding window-based dynamic functional connectivity as in Allen et al. (2014), the resulting clusters could arguably be similar to the latent states derived from the HMM.
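      As a minimal illustration of this hypothetical pipeline (synthetic data; the window length, parcel count, and state number below are arbitrary choices, not the parameters of Allen et al. or of our HMM), sliding-window functional connectivity can be clustered with k-means as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
T, R, win, K = 300, 10, 40, 3  # timepoints, regions, window width, number of states

bold = rng.standard_normal((T, R))  # stand-in for a parcellated BOLD time series
iu = np.triu_indices(R, k=1)        # upper triangle: unique region pairs

# Sliding-window functional connectivity: one vectorized correlation
# matrix per window (Allen et al. 2014-style; window length is arbitrary).
fc = np.array([np.corrcoef(bold[t:t + win].T)[iu]
               for t in range(T - win + 1)])

# Minimal k-means on the window-wise FC patterns (numpy only).
centers = fc[rng.choice(len(fc), K, replace=False)]
for _ in range(50):
    labels = np.argmin(((fc[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([fc[labels == k].mean(0) if np.any(labels == k)
                        else centers[k] for k in range(K)])

print(np.bincount(labels, minlength=K))  # windows assigned to each FC state
```

Note that aligning such window-wise cluster labels with TR-resolution HMM states is precisely one of the practical difficulties discussed in the next paragraph.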

      However, there are practical reasons why the correspondence between our prior dynamic functional connectivity models and current HMM states is difficult to test directly. A time point-by-time point matching of the HMM state sequence and dynamic functional connectivity is not feasible because, in our prior work, dynamic functional connectivity was measured in a sliding time window (~20-60 seconds), whereas the HMM state identification is conducted at every TR (1 second). An alternative would be to concatenate all time points that were categorized as each HMM state to compute representative functional connectivity of that state. This “splicing and concatenating” method, however, disrupts continuous BOLD-signal time series and has not previously been validated for use with our dynamic connectome-based predictive models. In addition, the difference in time series lengths across states would make comparisons of the four states’ functional connectomes unfair.

      One main focus of our manuscript was to relate brain dynamics (HMM state dynamics) to a static manifold (functional connectivity gradients). We agree that a direct link between two measures of brain dynamics, the HMM and dynamic functional connectivity, is an important research question. However, due to the intricacies that would need to be addressed to answer this question, we felt that it was beyond the scope of our paper. We are eager, however, to explore these comparisons in future work that can more thoroughly address the caveats associated with comparing models of sustained attention, narrative engagement, and arousal defined using different input features and methods.

      Arousal, attention, and latent neural state dynamics

      Next, the reviewer posed an important question about the relationship between arousal, attention, and latent states. The current study was designed to assess the relationship between attention and latent state dynamics. However, previous neuroimaging work showed that low-dimensional brain dynamics reflect fluctuations in arousal (Raut et al., 2021; Shine et al., 2016; Zhang et al., 2023). Behavioral studies showed that attention and arousal have a non-linear relationship: for example, mind-wandering states are associated with lower arousal and externally distracted states with higher arousal, even though both indicate low attention (Esterman and Rothlein, 2019; Unsworth and Robison, 2018, 2016).

      To address the reviewer’s suggestion, we wanted to test whether our brain states reflected changes in arousal, but we did not collect relevant behavioral or physiological measures. Therefore, to test for relationships indirectly, we predicted the arousal levels of the brain states by applying the “arousal network template” defined by Dr. Catie Chang’s group (Chang et al., 2016; Falahpour et al., 2018; Goodale et al., 2021). The arousal network template was created from resting-state fMRI data to predict arousal levels indicated by eye monitoring and electrophysiological signals. In the original study, the arousal level at each time point was predicted by the correlation between the BOLD activity pattern of that TR and the arousal template. The more similar the whole-brain activation pattern was to the arousal network template, the more aroused the participant was predicted to be at that moment. This activity pattern-based model was generalized to fMRI data during tasks (Goodale et al., 2021).

      We correlated the arousal template with the activity patterns of the four brain states that were inferred by the HMM. The DMN state was positively correlated with the arousal template (r=0.264) and the SM state was negatively correlated with the arousal template (r=-0.303) (Author response image 1). These values were not tested for significance because they were single observations. While speculative, this may suggest that participants are in a high arousal state during the DMN state and a low arousal state during the SM state. Together with our results relating brain states to attention, it is possible that the SM state is a common state indicating low arousal and low attention. On the other hand, the DMN state, a signature of a highly aroused state, may benefit gradCPT task performance but not necessarily engagement with a sitcom episode. However, because this was a single observation and we did not collect a physiological measure of arousal to validate this indirect prediction, we did not include the result in the manuscript. We hope to test this question more directly in future work with behavioral and physiological measures of arousal.
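      For clarity, the template-matching step amounts to a spatial Pearson correlation between maps. The sketch below uses random synthetic vectors as stand-ins; the real analysis would use the published arousal template and each HMM state’s mean parcel-wise activity map.

```python
import numpy as np

rng = np.random.default_rng(1)
n_parcels = 100

# Hypothetical stand-ins: a normative "arousal template" map and the mean
# activation maps of two states, each a vector of parcel values. The DMN-like
# map is built to resemble the template and the SM-like map to oppose it.
arousal_template = rng.standard_normal(n_parcels)
state_maps = {
    "DMN": arousal_template + 0.5 * rng.standard_normal(n_parcels),
    "SM": -arousal_template + 0.5 * rng.standard_normal(n_parcels),
}

# Template matching: spatial Pearson correlation between each state's
# activity map and the arousal template (one scalar per state).
r = {name: np.corrcoef(m, arousal_template)[0, 1]
     for name, m in state_maps.items()}
print(r)
```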

      Author response image 1.

      Changes made to the manuscript

      Importantly, we agree with the reviewer that a theoretical discussion about the relationships between functional connectivity, latent states, gradients, as well as attention and arousal was a critical omission from the original Discussion. We edited the Discussion to highlight past literature on these topics and encourage future work to investigate these relationships.

      [Manuscript, page 11] “Previous studies showed that large-scale neural dynamics that evolve over tens of seconds capture meaningful variance in arousal (Raut et al., 2021; Zhang et al., 2023) and attentional states (Rosenberg et al., 2020; Yamashita et al., 2021). We asked whether latent neural state dynamics reflect ongoing changes in attention in both task and naturalistic contexts.”

      [Manuscript, page 17] “Previous work showed that time-resolved whole-brain functional connectivity (i.e., paired interactions of more than a hundred parcels) predicts changes in attention during task performance (Rosenberg et al., 2020) as well as movie-watching and story-listening (Song et al., 2021a). Future work could investigate whether functional connectivity and the HMM capture the same underlying “brain states” to bridge the results from the two literatures. Furthermore, though the current study provided evidence of neural state dynamics reflecting attention, the same neural states may, in part, reflect fluctuations in arousal (Chang et al., 2016; Zhang et al., 2023). Complementing behavioral studies that demonstrated a nonlinear relationship between attention and arousal (Esterman and Rothlein, 2019; Unsworth and Robison, 2018, 2016), future studies collecting behavioral and physiological measures of arousal can assess the extent to which attention explains neural state dynamics beyond what can be explained by arousal fluctuations.”

      2) The 'base state' has been described in a number of prior papers (for one early example, see https://pubmed.ncbi.nlm.nih.gov/27008543). The idea that it might serve as a hub or intermediary for other states has been raised in other studies, and discussion of the similarity or differences between those studies and this one would provide better context for the interpretation of the current work. One of the intriguing findings of the current study is that the incidence of this base state increases during sitcom watching, providing the strongest evidence to date that it has a cognitive role and is not merely a configuration of activity that the brain must pass through when making a transition.

      We greatly appreciate the reviewer’s suggestion of prior papers. We were not aware of previous findings of the base state at the time of writing the manuscript, so it was reassuring to see consistent findings. In the Discussion, we highlighted the findings of Chen et al. (2016) and Saggar et al. (2022). Both studies highlighted the role of the base state as a “hub”-like transition state. However, as the reviewer noted, these studies did not address the functional relevance of this state to cognitive states because both were based on resting-state fMRI.

      In our revised Discussion, we write that our work replicates previous findings of the base state that consistently acted as a transitional hub state in macroscopic brain dynamics. We also note that our study expands this line of work by characterizing what functional roles the base state plays in multiple contexts: The base state indicated high attentional engagement and exhibited the highest occurrence proportion as well as longest dwell times during naturalistic movie watching. The base state’s functional involvement was comparatively minor during controlled tasks.

      [Manuscript, page 17-18] “Past resting-state fMRI studies have reported the existence of the base state. Chen et al. (2016) used the HMM to detect a state that had “less apparent activation or deactivation patterns in known networks compared with other states”. This state had the highest occurrence probability among the inferred latent states, was consistently detected by the model, and was most likely to transition to and from other states, all of which mirror our findings here. The authors interpret this state as an “intermediate transient state that appears when the brain is switching between other more reproducible brain states”. The observation of the base state was not confined to studies using HMMs. Saggar et al. (2022) used topological data analysis to represent a low-dimensional manifold of resting-state whole-brain dynamics as a graph, where each node corresponds to brain activity patterns of a cluster of time points. Topologically focal “hub” nodes were represented uniformly by all functional networks, meaning that no characteristic activation above or below the mean was detected, similar to what we observe with the base state. The transition probability from other states to the hub state was the highest, demonstrating its role as a putative transition state.

      However, the functional relevance of the base state to human cognition had not been explored previously. We propose that the base state, a transitional hub (Figure 2B) positioned at the center of the gradient subspace (Figure 1D), functions as a state of natural equilibrium. Transitioning to the DMN, DAN, or SM states reflects incursion away from natural equilibrium (Deco et al., 2017; Gu et al., 2015), as the brain enters a functionally modular state. Notably, the base state indicated high attentional engagement (Figure 5E and F) and exhibited the highest occurrence proportion (Figure 3B) as well as the longest dwell times (Figure 3—figure supplement 1) during naturalistic movie watching, whereas its functional involvement was comparatively minor during controlled tasks. This significant relevance to behavior verifies that the base state cannot simply be a byproduct of the model. We speculate that susceptibility to both external and internal information is maximized in the base state—allowing for roughly equal weighting of both sides so that they can be integrated to form a coherent representation of the world—at the expense of the stability of a certain functional network (Cocchi et al., 2017; Fagerholm et al., 2015). When processing rich narratives, particularly when a person is fully immersed without having to exert cognitive effort, a less modular state with high degrees of freedom to reach other states may be more likely to be involved. The role of the base state should be further investigated in future studies.”

      3) The link between latent states and functional connectivity gradients should be considered in the context of prior work showing that the spatiotemporal patterns of intrinsic activity that account for most of the structure in resting state fMRI also sweep across functional connectivity gradients (https://pubmed.ncbi.nlm.nih.gov/33549755/). In fact, the spatiotemporal dynamics may give rise to the functional connectivity gradients (https://pubmed.ncbi.nlm.nih.gov/35902649/). HMM states bear a marked resemblance to the high-activity phases of these patterns and are likely to be closely linked to them. The spatiotemporal patterns are typically obtained during rest, but they have been reported during task performance (https://pubmed.ncbi.nlm.nih.gov/30753928/) which further suggests a link to the current work. Similar patterns have been observed in anesthetized animals, which also reinforces the conclusion of the current work that the states are fundamental aspects of the brain's functional organization.

      We appreciate the comments that relate spatiotemporal patterns, functional connectivity gradients, and the latent states derived from the HMM. Our work was also inspired by the papers that the reviewer suggested, especially Bolt et al. (2022), which compared the results of numerous dimensionality reduction and clustering algorithms and suggested three spatiotemporal patterns that seemed to be commonly supported across algorithms. We originally cited these studies throughout the manuscript but did not discuss them comprehensively. We have revised the Discussion to situate our findings within past work that used resting-state fMRI to study low-dimensional latent brain states.

      [Manuscript, page 15-16] “This perspective is supported by previous work that has used different methods to capture recurring low-dimensional states from spontaneous fMRI activity during rest. For example, to extract time-averaged latent states, early resting-state analyses identified task-positive and task-negative networks using seed-based correlation (Fox et al., 2005). Dimensionality reduction algorithms such as independent component analysis (Smith et al., 2009) extracted latent components that explain the largest variance in fMRI time series. Other lines of work used time-resolved analyses to capture latent state dynamics. For example, variants of clustering algorithms, such as co-activation patterns (Liu et al., 2018; Liu and Duyn, 2013), k-means clustering (Allen et al., 2014), and HMM (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017), characterized fMRI time series as recurrences of and transitions between a small number of states. Time-lag analysis was used to identify quasiperiodic spatiotemporal patterns of propagating brain activity (Abbas et al., 2019; Yousefi and Keilholz, 2021). A recent study extensively compared these different algorithms and showed that they all report qualitatively similar latent states or components when applied to fMRI data (Bolt et al., 2022). While these studies used different algorithms to probe data-specific brain states, this work and ours report common latent axes that follow a long-standing theory of large-scale human functional systems (Mesulam, 1998). Neural dynamics span principal axes that dissociate unimodal to transmodal and sensory to motor information processing systems.”

      Reviewer #2 (Public Review):

      In this study, Song and colleagues applied a Hidden Markov Model to whole-brain fMRI data from the unique SONG dataset and a grad-CPT task, and in doing so observed robust transitions between low-dimensional states that they then attributed to specific psychological features extracted from the different tasks.

      The methods used appeared to be sound and robust to parameter choices. Whenever choices were made regarding specific parameters, the authors demonstrated that their approach was robust to different values, and also replicated their main findings on a separate dataset.

      I was mildly concerned that similarities in some of the algorithms used may have rendered some of the inter-measure results somewhat inevitable (a hypothesis that could be tested using appropriate null models).

      This work is quite integrative, linking together a number of previous studies into a framework that allows for interesting follow-up questions.

      Overall, I found the work to be robust, interesting, and integrative, with a wide-ranging citation list and exciting implications for future work.

      We appreciate the reviewer’s comments on the study’s robustness and future implications. Our work was highly motivated by the reviewer’s prior work.

      Reviewer #3 (Public Review):

      My general assessment of the paper is that the analyses done after they find the model are exemplary and show some interesting results. However, the method they use to find the number of states (Calinski-Harabasz score instead of log-likelihood), the model they use generally (HMM), and the fact that they don't show how they find the number of states on HCP, with the Schaefer atlas, and do not report their R^2 on a test set is a little concerning. I don't think this per se impedes their results, but it is something that they can improve. They argue that the states they find align with long-standing ideas about the functional organization of the brain and align with other research, but they can improve their selection for their model.

      We appreciate the reviewer’s thorough read of the paper, evaluation of our analyses linking brain states to behavior as “exemplary”, and important questions about the modeling approach. We have included detailed responses below and updated the manuscript accordingly.

      Strengths:

      • Use multiple datasets, multiple ROIs, and multiple analyses to validate their results

      • Figures are convincing in the sense that patterns clearly synchronize between participants

      • Authors select the number of states using the optimal model fit (although this turns out to be a little more questionable due to what they quantify as 'optimal model fit')

      We address this concern on page 30-31 of this response letter.

      • Replication with Schaefer atlas makes results more convincing

      • The analyses around the fact that the base state acts as a flexible hub are well done and well explained

      • Their comparison of synchrony is well-done and comparing it to resting-state, which does not have any significant synchrony among participants is obvious, but still good to compare against.

      • Their results with respect to similar narrative engagement being correlated with similar neural state dynamics are well done and interesting.

      • Their results on event boundaries are compelling and well done. However, I do not find their Chang et al. results convincing (Figure 4B), it could just be because it is a different medium that explains differences in DMN response, but to me, it seems like these are just altogether different patterns that can not 100% be explained by their method/results.

      We entirely agree with the reviewer that the Chang et al. (2021) data are different in many ways from our own SONG dataset. Whereas data from Chang et al. (2021) were collected while participants listened to an audio-only narrative, participants in the SONG sample watched and listened to audiovisual stimuli. They were scanned at different universities in different countries with different protocols by different research groups for different purposes. That is, there are numerous reasons to expect that the model would not generalize. Thus, we found it compelling and surprising that, despite all of these differences between the datasets, the model trained on the SONG dataset generalized to the data from Chang et al. (2021). The results highlighted a robust increase in DMN state occurrence and a decrease in base state occurrence after narrative event boundaries, irrespective of whether the stimulus was an audiovisual sitcom episode or a narrated story. This external validation was one way that we tested the robustness of our model and the relationship between neural state dynamics and cognitive dynamics.

      • Their results that when there is no event, transition into the DMN state comes from the base state is 50% is interesting and a strong result. However, it is unclear if this is just for the sitcom or also for Chang et al.'s data.

      We apologize for the lack of clarity. We show the statistical results of the two sitcom episodes as well as Chang et al.’s (2021) data in Figure 4—figure supplement 2 in our original manuscript. Here, we provide the exact values of the base-to-DMN state transition probability, and how they differ across moments after event boundaries compared to non-event boundaries.

      For sitcom episode 1, the probability of base-to-DMN state transition was 44.6 ± 18.8 % at event boundaries whereas 62.0 ± 10.4 % at non-event boundaries (FDR-p = 0.0013). For sitcom episode 2, the probability of base-to-DMN state transition was 44.1 ± 18.0 % at event boundaries whereas 62.2 ± 7.6 % at non-event boundaries (FDR-p = 0.0006). For the Chang et al. (2021) dataset, the probability of base-to-DMN state transition was 33.3 ± 15.9 % at event boundaries whereas 58.1 ± 6.4 % at non-event boundaries (FDR-p < 0.0001). Thus, our result, “At non-event boundaries, the DMN state was most likely to transition from the base state, accounting for more than 50% of the transitions to the DMN state” (pg 11, line 24-25), holds true for both the internal and external datasets.
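For concreteness, the boundary-dependent transition statistic described above can be sketched as follows. This is an illustration only, not our analysis code: the state labels, the boundary mask, and the helper name are hypothetical, and the paper's boundary windows are defined on the stimulus annotations.

```python
import numpy as np

def source_of_dmn_entries(states, is_boundary, dmn=0, base=3):
    """Among transitions *into* the DMN state, the fraction arriving from the
    base state, split by whether the entry falls at an event boundary.

    states      : (T,) integer state sequence from the HMM
    is_boundary : (T,) boolean, True for timepoints near an event boundary
    (State labels `dmn`/`base` and the boundary coding are illustrative.)
    """
    prev, curr = states[:-1], states[1:]
    entered_dmn = (curr == dmn) & (prev != dmn)   # moments a DMN episode begins
    from_base = prev == base                      # was the preceding state the base state?
    out = {}
    for name, mask in [("boundary", is_boundary[1:]), ("non-boundary", ~is_boundary[1:])]:
        entries = entered_dmn & mask
        out[name] = from_base[entries].mean() if entries.any() else np.nan
    return out
```

Averaging this quantity over participants yields the percentages reported above (e.g., base-to-DMN accounting for more than 50% of DMN entries at non-event boundaries).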

      • The involvement of the base state as being highly engaged during the comedy sitcom and the movie are interesting results that warrant further study into the base state theory they pose in this work.

      • It is good that they make sure SM states are not just because of head motion (P 12).

      • Their comparison between functional gradient and neural states is good, and their results are generally well-supported, intuitive, and interesting enough to warrant further research into them. Their findings on the context-specificity of their DMN and DAN state are interesting and relate well to the antagonistic relationship in resting-state data.

      Weaknesses:

      • Authors should train the model on part of the data and validate on another

      Thank you for raising this issue. To the best of our knowledge, past work that applied the HMM to the fMRI data has conducted training and inference on the same data, including initial work that implemented HMM on the resting-state fMRI (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017) as well as more recent work that applied HMMs to the task or movie-watching fMRI (Cornblath et al., 2020; Taghia et al., 2018; van der Meer et al., 2020; Yamashita et al., 2021). That is, the parameters—emission probability, transition probability, and initial probability—were estimated from the entire dataset and the latent state sequence was inferred using the Viterbi algorithm on the same dataset.
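To make the inference step concrete, below is a minimal Viterbi decoder for an HMM with Gaussian emissions. It is a simplified sketch, not our analysis code: we assume spherical covariances for brevity (the actual model estimates full covariance matrices), and the parameter estimation step (EM) is taken as given.

```python
import numpy as np

def viterbi_gaussian(X, means, trans, init, var=1.0):
    """Viterbi decoding for an HMM with spherical-Gaussian emissions.

    X     : (T, D) observation time series (e.g., ROI activity)
    means : (K, D) per-state mean activity patterns (emission means)
    trans : (K, K) transition probability matrix (rows sum to 1)
    init  : (K,)   initial state probabilities
    Returns the most likely latent state sequence, shape (T,).
    """
    T, _ = X.shape
    # Log emission probabilities under isotropic Gaussians (constant dropped).
    log_emit = -0.5 * ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1) / var
    log_trans = np.log(trans)
    delta = np.log(init) + log_emit[0]      # best log-prob of paths ending in each state
    psi = np.zeros((T, means.shape[0]), dtype=int)  # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # (K_prev, K_next)
        psi[t] = scores.argmax(0)
        delta = scores.max(0) + log_emit[t]
    # Backtrace the most likely path.
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```

Training and inference can use the same data (as in prior HMM-fMRI work) or different data: once the parameters are estimated, decoding can be run on any time series of matching dimensionality, which is what makes the held-out inference below possible.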

      However, we were also aware of the potential problem this may have. Therefore, in our recent work asking a different research question in another fMRI dataset (Song et al., 2021b), we trained an HMM on a subset of the dataset (moments when participants were watching movie clips in the original temporal order) and inferred latent state sequence of the fMRI time series in another subset of the dataset (moments when participants were watching movie clips in a scrambled temporal order). To the best of our knowledge, this was the first paper that used different segments of the data to fit and infer states from the HMM.

      In the current study, we wanted to capture brain states that underlie brain activity across contexts. Thus, we presented the same-dataset training and inference procedure as our primary result. However, for every main result, we also showed results where we separated the data used for model fitting and state inference. That is, we fit the HMM on the SONG dataset, primarily report the inference results on the SONG dataset, but also report inference on the external datasets that were not included in model fitting. The datasets used were the Human Connectome Project dataset (Van Essen et al., 2013), Chang et al. (2021) audio-listening dataset, Rosenberg et al. (2016) gradCPT dataset, and Chen et al. (2017) Sherlock dataset.

      However, to further address the reviewer’s concern about whether the HMM fit is reliable when applied to held-out data, we computed the reliability of the HMM inference by conducting cross-validation and a split-half reliability analysis.

      (1) Cross-validation

      To separate the dataset used for HMM training and inference, we conducted cross-validation on the SONG dataset (N=27) by training the model with the data from 26 participants and inferring the latent state sequence of the held-out participant.

      First, we compared the robustness of the model training by comparing the mean activity patterns of the four latent states fitted at the group level (N=27) with the mean activity patterns of the four states fitted across cross-validation folds. Pearson’s correlations between the group-level vs. cross-validated latent states’ mean activity patterns were r = 0.991 ± 0.010, with a range from 0.963 to 0.999.

      Second, we compared the robustness of model inference by comparing the latent state sequences that were inferred at the group level vs. from held-out participants in a cross-validation scheme. All fMRI conditions had mean similarity higher than 90%; Rest 1: 92.74 ± 5.02 %, Rest2: 92.74 ± 4.83 %, GradCPT face: 92.97 ± 6.41 %, GradCPT scene: 93.27 ± 5.76 %, Sitcom ep1: 93.31 ± 3.92 %, Sitcom ep2: 93.13 ± 4.36 %, Documentary: 92.42 ± 4.72 %.

      Third, with the latent state sequences inferred from cross-validation, we replicated the analysis of Figure 3 to test for synchrony of the latent state sequences across participants. The cross-validated results were highly similar to manuscript Figure 3, which was generated from the group-level analysis. Mean synchrony values of the latent state sequences are as follows: Rest 1: 25.90 ± 3.81%, Rest 2: 25.75 ± 4.19 %, GradCPT face: 27.17 ± 3.86 %, GradCPT scene: 28.11 ± 3.89 %, Sitcom ep1: 40.69 ± 3.86%, Sitcom ep2: 40.53 ± 3.13%, Documentary: 30.13 ± 3.41%.

      Author response image 2.
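The across-participant synchrony values above can be computed along the following lines. This sketch assumes synchrony is the mean pairwise agreement of discrete state labels; the exact metric in the paper may differ in detail.

```python
import numpy as np

def state_synchrony(seqs):
    """Mean pairwise agreement of latent state sequences across participants.

    seqs : (N, T) integer array, seqs[i, t] = state of participant i at time t.
    Returns an (N,) array: for each participant, the mean fraction of
    timepoints at which their state matches each other participant's state.
    (One plausible definition; illustrative only.)
    """
    agree = (seqs[:, None, :] == seqs[None, :, :]).mean(-1)  # (N, N) pairwise agreement
    np.fill_diagonal(agree, np.nan)                          # exclude self-agreement
    return np.nanmean(agree, axis=1)
```

Averaging the returned values over participants gives a condition-level synchrony score, which can then be compared across rest, task, and movie-watching runs as in Figure 3.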

      (2) Split-half reliability

      To test for the internal robustness of the model, we randomly assigned SONG dataset participants into two groups and conducted HMM separately in each. Similarity (Pearson’s correlation) between the two groups’ activation patterns were DMN: 0.791, DAN: 0.838, SM: 0.944, base: 0.837. The similarity of the covariance patterns were DMN: 0.995, DAN: 0.996, SM: 0.994, base: 0.996.

      Author response image 3.

      We further validated the split-half reliability of the model using the HCP dataset, which contains data of a larger sample (N=119). Similarity (Pearson’s correlation) between the two groups’ activation patterns were DMN: 0.998, DAN: 0.997, SM: 0.993, base: 0.923. The similarity of the covariance patterns were DMN: 0.995, DAN: 0.996, SM: 0.994, base: 0.996.
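Comparing states fit independently on two halves requires matching them first, because state indices are arbitrary across fits. A sketch of one way to do this, via Hungarian assignment on the cross-correlation matrix of the state mean patterns (the matching procedure here is our illustration, not necessarily the exact one used):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_states(means_a, means_b):
    """Match latent states fit on two halves of the participants by maximizing
    the Pearson correlation of their mean activity patterns.

    means_a, means_b : (K, D) state mean-activity patterns from each half.
    Returns (perm, corrs): perm[j] is the state in `means_a` matched to state j
    of `means_b`, and corrs[j] is the correlation of that matched pair.
    """
    K = means_a.shape[0]
    corr = np.corrcoef(means_a, means_b)[:K, K:]   # (K, K) cross-correlations
    rows, cols = linear_sum_assignment(-corr)      # maximize total correlation
    perm = np.empty(K, dtype=int)
    perm[cols] = rows
    return perm, corr[perm, np.arange(K)]
```

The per-state correlations returned by this matching are the quantities reported above (e.g., DMN: 0.791, DAN: 0.838, SM: 0.944, base: 0.837 for the SONG split halves).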

      Together the cross-validation and split-half reliability results demonstrate that the HMM results reported in the manuscript are reliable and robust to the way we conducted the analysis. The result of the split-half reliability analysis is added in the Results.

      [Manuscript, page 3-4] “Neural state inference was robust to the choice of 𝐾 (Figure 1—figure supplement 1) and the fMRI preprocessing pipeline (Figure 1—figure supplement 5) and consistent when conducted on two groups of randomly split-half participants (Pearson’s correlations between the two groups’ latent state activation patterns: DMN: 0.791, DAN: 0.838, SM: 0.944, base: 0.837).”

      • Comparison with just PCA/functional gradients is weak in establishing whether HMMs are good models of the timeseries. Especially given that the HMM does not explain a lot of variance in the signal (~0.5 R^2 for only 27 brain regions) for PCA. I think they don't report their own R^2 of the timeseries

      We agree with the reviewer that the PCA that we conducted to compare with the explained variance of the functional gradients was not directly comparable because PCA and gradients utilize different algorithms to reduce dimensionality. To make more meaningful comparisons, we removed the data-specific PCA results and replaced them with data-specific functional gradients (derived from the SONG dataset). This allows us to directly compare SONG-specific functional gradients with predefined gradients (derived from the resting-state HCP dataset from Margulies et al. [2016]). We found that the degrees to which the first two predefined gradients explained whole-brain fMRI time series (SONG: r² = 0.097, HCP: 0.084) were comparable to the amount of variance explained by the first two data-specific gradients (SONG: r² = 0.100, HCP: 0.086). Thus, the predefined gradients explain as much variance in the SONG data time series as SONG-specific gradients do. This supports our argument that the low-dimensional manifold is largely shared across contexts, and that the common HMM latent states may tile the predefined gradients.

      These analyses and results were added to the Results, Methods, and Figure 1—figure supplement 8. Here, we only attach changes to the Results section for simplicity, but please see the revised manuscript for further changes.

      [Manuscript, page 5-6] “We hypothesized that the spatial gradients reported by Margulies et al. (2016) act as a low-dimensional manifold over which large-scale dynamics operate (Bolt et al., 2022; Brown et al., 2021; Karapanagiotidis et al., 2020; Turnbull et al., 2020), such that traversals within this manifold explain large variance in neural dynamics and, consequently, cognition and behavior (Figure 1C). To test this idea, we situated the mean activity values of the four latent states along the gradients defined by Margulies et al. (2016) (see Methods). The brain states tiled the two-dimensional gradient space with the base state at the center (Figure 1D; Figure 1—figure supplement 7). The Euclidean distances between these four states were maximized in the two-dimensional gradient space, compared to a chance where the four states were inferred from circular-shifted time series (p < 0.001). For the SONG dataset, the DMN and SM states fell at more extreme positions of the primary gradient than expected by chance (both FDR-p values = 0.004; DAN and base states, FDR-p values = 0.171). For the HCP dataset, the DMN and DAN states fell at more extreme positions on the primary gradient (both FDR-p values = 0.004; SM and base states, FDR-p values = 0.076). No state was consistently found at the extremes of the secondary gradient (all FDR-p values > 0.021).

      We asked whether the predefined gradients explain as much variance in neural dynamics as a latent subspace optimized for the SONG dataset. To do so, we applied the same nonlinear dimensionality reduction algorithm to the SONG dataset’s ROI time series. Of note, the SONG dataset includes 18.95% rest, 15.07% task, and 65.98% movie-watching data whereas the data used by Margulies et al. (2016) was 100% rest. Despite these differences, the SONG-specific gradients closely resembled the predefined gradients, with significant Pearson’s correlations observed for the first (r = 0.876) and second (r = 0.877) gradient embeddings (Figure 1—figure supplement 8). Gradients identified with the HCP data also recapitulated Margulies et al.’s (2016) first (r = 0.880) and second (r = 0.871) gradients. We restricted our analysis to the first two gradients because these two gradients together explained roughly 50% of the entire variance of the functional brain connectome (SONG: 46.94%, HCP: 52.08%), and the explained variance dropped drastically from the third gradient (a drop of more than one third relative to the second gradient). The degrees to which the first two predefined gradients explained whole-brain fMRI time series (SONG: r² = 0.097, HCP: 0.084) were comparable to the amount of variance explained by the first two data-specific gradients (SONG: r² = 0.100, HCP: 0.086; Figure 1—figure supplement 8). Thus, the low-dimensional manifold captured by the Margulies et al. (2016) gradients is highly replicable, explaining brain activity dynamics as well as data-specific gradients, and is largely shared across contexts and datasets. This suggests that the state space of whole-brain dynamics closely recapitulates low-dimensional gradients of the static functional brain connectome.”

      The reviewer also pointed out that the PCA-gradient comparison was weak in establishing whether HMMs are good models of the time series. However, we would like to point out that the purpose of the comparison was not to validate the performance of the HMM. Instead, we wanted to test whether the gradients introduced by Margulies et al. (2016) could act as a generalizable low-dimensional manifold of brain state dynamics. To argue that the predefined gradients are a shared manifold, these gradients should explain the SONG fMRI time series as well as the principal components derived directly from the SONG data. Our results showed comparable r² values, both in predefined gradient vs. data-specific PC comparisons and predefined gradient vs. data-specific gradient comparisons, which supported our argument that the predefined gradients could be the shared embedding space across contexts and datasets.
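The explained-variance comparison can be sketched as a timepoint-by-timepoint regression of the D-dimensional activity pattern onto the gradient maps. This is an illustrative sketch; the exact computation in the paper may differ (e.g., in demeaning or normalization choices).

```python
import numpy as np

def gradient_r2(X, gradients):
    """Fraction of variance in ROI time series explained by spatial maps.

    X         : (T, D) time series for D ROIs
    gradients : (D, G) spatial maps (e.g., the first two connectivity gradients)
    At each timepoint the D-dimensional activity pattern is regressed onto the
    gradient maps; returns the overall R² across all timepoints.
    """
    Xc = X - X.mean(0)                                        # remove each ROI's temporal mean
    beta, *_ = np.linalg.lstsq(gradients, Xc.T, rcond=None)   # (G, T) per-timepoint loadings
    resid = Xc.T - gradients @ beta
    return 1.0 - (resid ** 2).sum() / (Xc ** 2).sum()
```

Applying this with the predefined gradients versus data-specific gradients yields the comparable r² values quoted above.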

      The reviewer pointed out that an r² of ~0.5 does not explain enough variance in the fMRI signal. However, we respectfully disagree with this point because there is no established criterion for what constitutes a high or low r² for this type of analysis. Of note, previous literature that applied PCA to fMRI time series (Author response image 4A and 4B) (Lynn et al., 2021; Shine et al., 2019) likewise found that the cumulative explained variance of the top 5 principal components is around 50%. Author response image 4C shows the cumulative variance of the resting-state functional connectome explained by the gradients (Margulies et al., 2016).

      Author response image 4.

      Finally, the reviewer pointed out that the r² of the HMM-derived latent sequence with respect to the fMRI time series should be reported. However, there is no standardized way of measuring the explained variance of the HMM inference. Explained variance is not reported in the traditional HMM-fMRI papers (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017). Rather than r², the HMM computes the log likelihood of the model fit. However, because log likelihood values depend on the number of data points, studies do not report log likelihood values nor do they use these metrics to interpret the goodness of model fit.

      To ask whether the goodness of the HMM fit was significant above chance, we compared the log likelihood of the HMM to a null distribution of log likelihoods. First, we extracted the log likelihood of the HMM fit to the real fMRI time series. We then fit 1,000 null HMMs to circular-shifted fMRI time series and extracted their log likelihoods. The log likelihood of the real model was significantly higher than the chance distribution, with a z-value of 2182.5 (p < 0.001). This indicates that the HMM explained a large amount of variance in our fMRI time series data, significantly above chance.

      • Authors do not specify whether they also did cross-validation for the HCP dataset to find 4 clusters

      We apologize for the lack of clarity. When we computed the Calinski-Harabasz score with the HCP dataset, three was chosen as the optimal number of states (Author response image 5A). When we set K to 3, the HMM inferred the DMN, DAN, and SM states (Author response image 5C). The base state was included when K was set to 4 (Author response image 5B). The activation pattern similarities of the DMN, DAN, and SM states were r = 0.981, 0.984, and 0.911, respectively.

      Author response image 5.

      We did not use K = 3 for the HCP data replication because we were not trying to test whether these four states would be the optimal set of states in every dataset. Although the Calinski-Harabasz score chose K = 3 because it showed the best clustering performance, this does not mean that the base state is not meaningful to this dataset. Likewise, the latent states that are inferred when we increase or decrease the number of states are also meaningful states. For example, in Figure 1—figure supplement 1, we show an example of the SONG dataset’s latent states when we set K to 7. The seven latent states included the DAN, SM, and base states; the DMN state was subdivided into DMN-A and DMN-B states; and an FPN state and a DMN+VIS state were included. Setting a higher number of states like K = 7 would mean that we are capturing brain state dynamics in a higher dimension than when using K = 4. Because it utilizes a higher number of states, a model set to K = 7 would inevitably capture a larger variance of the fMRI time series than a model set to K = 4.
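Model-order selection with the Calinski-Harabasz score can be illustrated with a generic clustering stand-in. For brevity we score k-means labels on toy data; in our analysis the score is computed on the HMM's state assignments, so this sketch shows only the selection logic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def pick_k(X, k_range):
    """Return the K in k_range whose cluster assignment maximizes the
    Calinski-Harabasz score (between- vs. within-cluster dispersion ratio)."""
    scores = {}
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = calinski_harabasz_score(X, labels)
    return max(scores, key=scores.get), scores
```

As in our data, the score favors the most compact and well-separated partition; states that fall below that optimum (like the base state at K = 3 for HCP) can still be meaningful at higher K.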

      The purpose of latent state replication with the HCP dataset was to validate the generalizability of the DMN, DAN, SM, and base states. Before characterizing these latent states’ relevance to cognition, we needed to verify that these latent states were not simply overfit to the SONG dataset. The fact that the HMM revealed a similar set of latent states when applied to the HCP dataset suggested that the states were not merely specific to SONG data.

      To make our points clearer in the manuscript, we emphasized that we are not arguing for the four states to be the exclusive states. We made edits to Discussion as follows.

      [Manuscript, page 16] “Our study adopted the assumption of low dimensionality of large-scale neural systems, which led us to intentionally identify only a small number of states underlying whole-brain dynamics. Importantly, however, we do not claim that the four states will be the optimal set of states in every dataset and participant population. Instead, latent states and patterns of state occurrence may vary as a function of individuals and tasks (Figure 1—figure supplement 2). Likewise, while the lowest dimensions of the manifold (i.e., the first two gradients) were largely shared across datasets tested here, we do not argue that it will always be identical. If individuals and tasks deviate significantly from what was tested here, the manifold may also differ along with changes in latent states (Samara et al., 2023). Brain systems operate at different dimensionalities and spatiotemporal scales (Greene et al., 2023), which may have different consequences for cognition. Asking how brain states and manifolds—probed at different dimensionalities and scales—flexibly reconfigure (or not) with changes in contexts and mental states is an important research question for understanding complex human cognition.”

      • One of their main contributions is the base state but the correlation between the base state in their Song dataset and the HCP dataset is only 0.399

      This is a good point. However, there is precedent for lower spatial pattern correlation of the base state compared to other states in the literature.

      Compared to the DMN, DAN, and SM states, the base state did not show characteristic activation or deactivation of functional networks. Most of the functional networks showed activity levels close to the mean (z = 0). With this flattened activation pattern, relatively low activation pattern similarity was observed between the SONG base state and the HCP base state.

      In Figure 1—figure supplement 6, we write, “The DMN, DAN, and SM states showed similar mean activity patterns. We refrained from making interpretations about the base state’s activity patterns because the mean activity of most of the parcels was close to z = 0”.

      A similar finding has been reported in a previous work by Chen et al. (2016) that discovered the base state with HMM. State 9 (S9) of their results is comparable to our base state. They report that even though the spatial correlation coefficient of the brain state from the split-half reliability analysis was the lowest for S9 due to its low degrees of activation or deactivation, S9 was stably inferred by the HMM. The following is a direct quote from their paper:

      “To the best of our knowledge, a state similar to S9 has not been presented in previous literature. We hypothesize that S9 is the “ground” state of the brain, in which brain activity (or deactivity) is similar for the entire cortex (no apparent activation or deactivation as shown in Fig. 4). Note that different groups of subjects have different spatial patterns for state S9 (Fig. 3A). Therefore, S9 has the lowest reproducible spatial pattern (Fig. 3B). However, its temporal characteristics allowed us to distinguish it consistently from other states.” (Chen et al., 2016)

      Thus, we believe our data and prior results support the existence of the “base state”.

      • Figure 1B: Parcellation is quite big but there seems to be a gradient within regions

      This is a function of the visualization software. Mean activity (z) is the same for all voxels within a parcel. To visualize the 3D contours of the brain, we chose an option in the nilearn python function that smooths the mean activity values based on the surface reconstructed anatomy.

      In the original manuscript, our Methods write, “The brain surfaces were visualized with nilearn.plotting.plot_surf_stat_map. The parcel boundaries in Figure 1B are smoothed from the volume-to-surface reconstruction.”

      • Figure 1D: Why are the DMNs further apart between SONG and HCP than the other states

      To address this question, we first tested whether the position of the DMN states in the gradient space is significantly different for the SONG and HCP datasets. We generated surrogate HMM states from the circular-shifted fMRI time series and positioned the four latent states and the null DMN states in the 2-dimensional gradient space (Author response image 6).

      Author response image 6.

      We next tested whether the Euclidean distance between the SONG dataset’s DMN state and the HCP dataset’s DMN state is larger than would be expected by chance (Author response image 7). To do so, we took the difference between the DMN state positions and compared it to the 1,000 differences generated from the surrogate latent states. The DMN states of the SONG and HCP datasets did not significantly differ in the Gradient 1 dimension (two-tailed test, p = 0.794). However, as the reviewer noted, the positions differed significantly in the Gradient 2 dimension (p = 0.047). The DMN state leaned more towards the Visual gradient in the SONG dataset, whereas it leaned more towards the Somatosensory-Motor gradient in the HCP dataset.

      Author response image 7.

      Though we cannot claim an exact reason for this across-dataset difference, we note a distinctive difference between the SONG and HCP datasets. Both datasets largely included resting-state, controlled tasks, and movie watching. The SONG dataset included 18.95% of rest, 15.07% of task, and 65.98% of movie watching. The task only contained the gradCPT, i.e., sustained attention task. On the other hand, the HCP dataset included 52.71% of rest, 24.35% of task, and 22.94% of movie watching. There were 7 different tasks included in the HCP dataset. It is possible that different proportions of rest, task, and movie watching, and different cognitive demands involved with each dataset may have created data-specific latent states.

      • Page 5 paragraph starting at L25: Their hypothesis that functional gradients explain large variance in neural dynamics needs to be explained more, is non-trivial especially because their R^2 scores are so low (Fig 1. Supplement 8) for PCA

      We address this concern on page 21-23 of this response letter.

      • Generally, I do not find the PCA analysis convincing and believe they should also compare to something like ICA or a different model of dynamics. They do not explain their reasoning behind assuming an HMM, which is an extremely simplified idea of brain dynamics meaning they only change based on the previous state.

      We appreciate this perspective. We replaced the Margulies et al. (2016) gradient vs. SONG-specific PCA comparison with a more direct Margulies et al. (2016) gradient vs. SONG-specific gradient comparison, as described on page 21-23 of this response letter.

      More broadly, we elected to use HMM because of recent work showing correspondence between low-dimensional HMM states and behavior (Cornblath et al., 2020; Taghia et al., 2018; van der Meer et al., 2020; Yamashita et al., 2021). We also found the model’s assumption—a mixture Gaussian emission probability and first-order Markovian transition probability—to be the most suited to analyzing the fMRI time series data. We do not intend to claim that other data-reduction techniques would not also capture low-dimensional, behaviorally relevant changes in brain activity. Instead, our primary focus was identifying a set of latent states that generalize (i.e., recur) across multiple contexts and understanding how those states reflect cognitive and attentional states.

      Although a comparison of possible data-reduction algorithms is out of the scope of the current work, an exhaustive comparison of different models can be found in Bolt et al. (2022). The authors compared dozens of latent brain state algorithms spanning zero-lag analysis (e.g., principal component analysis, principal component analysis with Varimax rotation, Laplacian eigenmaps, spatial independent component analysis, temporal independent component analysis, hidden Markov model, seed-based correlation analysis, and co-activation patterns) to time-lag analysis (e.g., quasi-periodic pattern and lag projections). Bolt et al. (2022) writes “a range of empirical phenomena, including functional connectivity gradients, the task-positive/task-negative anticorrelation pattern, the global signal, time-lag propagation patterns, the quasiperiodic pattern and the functional connectome network structure, are manifestations of the three spatiotemporal patterns.” That is, many previous findings that used different methods essentially describe the same recurring latent states. A similar argument was made in previous papers (Brown et al., 2021; Karapanagiotidis et al., 2020; Turnbull et al., 2020).

      We agree that the HMM is a simplified idea of brain dynamics. We do not argue that the four number of states can fully explain the complexity and flexibility of cognition. Instead, we hoped to show that there are different dimensionalities to which the brain systems can operate, and they may have different consequences to cognition. We “simplified” neural dynamics to a discrete sequence of a small number of states. However, what is fascinating is that these overly “simplified” brain state dynamics can explain certain cognitive and attentional dynamics, such as event segmentation and sustained attention fluctuations. We highlight this point in the Discussion.

      [Manuscript, page 16] “Our study adopted the assumption of low dimensionality of large-scale neural systems, which led us to intentionally identify only a small number of states underlying whole-brain dynamics. Importantly, however, we do not claim that the four states will be the optimal set of states in every dataset and participant population. Instead, latent states and patterns of state occurrence may vary as a function of individuals and tasks (Figure 1—figure supplement 2). Likewise, while the lowest dimensions of the manifold (i.e., the first two gradients) were largely shared across datasets tested here, we do not argue that it will always be identical. If individuals and tasks deviate significantly from what was tested here, the manifold may also differ along with changes in latent states (Samara et al., 2023). Brain systems operate at different dimensionalities and spatiotemporal scales (Greene et al., 2023), which may have different consequences for cognition. Asking how brain states and manifolds—probed at different dimensionalities and scales—flexibly reconfigure (or not) with changes in contexts and mental states is an important research question for understanding complex human cognition.”

      • For the 25- ROI replication it seems like they again do not try multiple K values for the number of states to validate that 4 states are in fact the correct number.

      In the manuscript, we do not argue that four will be the optimal number of states in every dataset. (We actually predict that this may differ depending on the amount of data, participant population, tasks, etc.) Instead, we claim that the four states identified in the SONG dataset are not specific (i.e., overfit) to that sample, but rather recur in independent datasets as well. More broadly, we argue that the complexity and flexibility of human cognition stem from the fact that computation occurs at multiple dimensions, and that the low-dimensional states observed here are robustly related to cognitive and attentional states. To prevent misunderstanding of our results, we emphasized in the Discussion that we are not arguing for a fixed number of states. The paragraph included in our response to the previous comment (page 16 in the manuscript) illustrates this point.

      • Fig 2B: Colorbar goes from -0.05 to 0.05 but values are up to 0.87

      We apologize for the confusion. The current version of the figure is correct. The figure legend states, “The values indicate transition probabilities, such that values in each row sum to 1. The colors indicate differences from the mean of the null distribution where the HMMs were conducted on the circular-shifted time series.”

      We recognize that this complicates the interpretation of the figure. However, after much consideration, we decided that it was valuable to show both the actual transition probabilities (values) and their difference from the mean of null HMMs (colors). The values demonstrate the Markovian property of latent state dynamics, with a high probability of remaining in the same state at consecutive moments and a low probability of transitioning to a different state. The colors indicate that the base state is a transitional hub state by illustrating that the DMN, DAN, and SM states are more likely to transition to the base state than would be expected by chance.
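
      To make the values in the figure concrete, the row-stochastic transition matrix can be computed from an inferred state sequence as follows. This is a minimal sketch with a toy, hypothetical 4-state sequence (the persistence probability and sequence are illustrative, not the actual HMM output):

```python
import numpy as np

rng = np.random.default_rng(0)

def transition_matrix(states, n_states):
    """Empirical transition probabilities; each row sums to 1."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# toy 4-state sequence with strong persistence (the Markovian property):
# at each step, redraw a uniformly random state with probability 0.1
seq = np.empty(5000, dtype=int)
state = 0
for t in range(5000):
    seq[t] = state
    if rng.random() < 0.1:
        state = rng.integers(0, 4)

T = transition_matrix(seq, 4)
```

The diagonal of `T` (probability of remaining in the same state) is much larger than the off-diagonal entries, which is the pattern conveyed by the values in Figure 2B; the colors in the figure additionally subtract the mean of the circular-shift null matrices.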

      • P 16 L4 near-critical, authors need to be more specific in their terminology here especially since they talk about dynamic systems, where near-criticality has a specific definition. It is unclear which definition they are looking for here.

      We agree that our explanation was vague. Because we do not have evidence for this speculative proposal, we removed the mention of near-criticality. Instead, we focus on our observation as the base state being the transitional hub state within a metastable system.

      [Manuscript, page 17-18] “However, the functional relevance of the base state to human cognition had not been explored previously. We propose that the base state, a transitional hub (Figure 2B) positioned at the center of the gradient subspace (Figure 1D), functions as a state of natural equilibrium. Transitioning to the DMN, DAN, or SM states reflects incursion away from natural equilibrium (Deco et al., 2017; Gu et al., 2015), as the brain enters a functionally modular state. Notably, the base state indicated high attentional engagement (Figure 5E and F) and exhibited the highest occurrence proportion (Figure 3B) as well as the longest dwell times (Figure 3—figure supplement 1) during naturalistic movie watching, whereas its functional involvement was comparatively minor during controlled tasks. This significant relevance to behavior verifies that the base state cannot simply be a byproduct of the model. We speculate that susceptibility to both external and internal information is maximized in the base state—allowing for roughly equal weighting of both sides so that they can be integrated to form a coherent representation of the world—at the expense of the stability of a certain functional network (Cocchi et al., 2017; Fagerholm et al., 2015). When processing rich narratives, particularly when a person is fully immersed without having to exert cognitive effort, a less modular state with high degrees of freedom to reach other states may be more likely to be involved. The role of the base state should be further investigated in future studies.”

      • P16 L13-L17 unnecessary

      We prefer to have the last paragraph as a summary of the implications of this paper. However, if the length of this paper becomes a problem as we work towards publication with the editors, we are happy to remove these lines.

      • I think this paper is solid, but my main issue is with using an HMM, never explaining why, not showing inference results on test data, not reporting an R^2 score for it, and not comparing it to other models. Secondly, they use the Calinski-Harabasz score to determine the number of states, but not the log-likelihood of the fit. This clearly creates a bias in what types of states you will find, namely states that are far away from each other, which likely also leads to the functional gradient and PCA results they have. Where they specifically talk about how their states are far away from each other in the functional gradient space and correlated to (orthogonal) components. It is completely unclear to me why they used this measure because it also seems to be one of many scores you could use with respect to clustering (with potentially different results), and even odd in the presence of a loglikelihood fit to the data and with the model they use (which does not perform clustering).

      (1) Showing inference results on test data

      We address this concern on page 19-21 of this response letter.

      (2) Not reporting 𝑹𝟐 score

      We address this concern on page 21-23 of this response letter.

      (3) Not comparing the HMM model to other models

      We address this concern on page 27-28 of this response letter.

      (4) The use of the Calinski-Harabasz score to determine the number of states rather than the log-likelihood of the model fit

      To our knowledge, the log-likelihood of the model fit is not used in the HMM literature. This is because the log-likelihood tends to increase monotonically as the number of states increases. Baker et al. (2014) illustrate this problem, writing:

      “In theory, it should be possible to pick the optimal number of states by selecting the model with the greatest (negative) free energy. In practice however, we observe that the free energy increases monotonically up to K = 15 states, suggesting that the Bayes-optimal model may require an even higher number of states.”

      Similarly, the following figure shows the log-likelihood estimated from the SONG dataset. Consistent with the findings of Baker et al. (2014), the log-likelihood monotonically increased as the number of states increased (Author response image 8, right). Measures like AIC or BIC, which account for the number of parameters, suffer from the same monotonic increase.

      Author response image 8.

      Because there is “no straightforward data-driven approach to model order selection” (Baker et al., 2014), past work has used different approaches to decide on the number of states. For example, Vidaurre et al. (2018) iterated over a range of the number of states to repeat the same HMM training and inference procedures 5 times using the same hyperparameters. They selected the number of states that showed the highest consistency across iterations. Gao et al. (2021) tested the clustering performance of the model output using the Calinski-Harabasz score. The number of states that showed the highest within-cluster cohesion compared to the across-cluster separation was selected as the number of states. Chang et al. (2021) applied HMM to voxels of the ventromedial prefrontal cortex using a similar clustering algorithm, writing: “To determine the number of states for the HMM estimation procedure, we identified the number of states that maximized the average within-state spatial similarity relative to the average between-state similarity”. In our previous paper (Song et al., 2021b), we reported both the reliability and clustering performance measures to decide on the number of states.

      In the current manuscript, the model consistency criterion from Vidaurre et al. (2018) was ineffective because the HMM inference was extremely robust (i.e., it always inferred the exact same sequence) due to the large number of data points. Thus, we used the Calinski-Harabasz score as our criterion for selecting the number of states.
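
      As a sketch of how this criterion behaves, the Calinski-Harabasz score (the ratio of between- to within-cluster dispersion) peaks at the true number of latent states when states are well separated. This toy example uses simulated Gaussian “network activity” and a minimal k-means in place of HMM state inference; all data and names here are illustrative, not our actual pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def calinski_harabasz(X, labels):
    """Between- vs. within-cluster dispersion ratio; higher = tighter, better-separated states."""
    n = len(X)
    classes = np.unique(labels)
    k = len(classes)
    grand_mean = X.mean(axis=0)
    between = within = 0.0
    for c in classes:
        Xc = X[labels == c]
        between += len(Xc) * np.sum((Xc.mean(axis=0) - grand_mean) ** 2)
        within += np.sum((Xc - Xc.mean(axis=0)) ** 2)
    return (between / (k - 1)) / (within / (n - k))

def kmeans(X, k, n_iter=50, n_restart=10):
    """Minimal k-means with restarts, standing in for HMM state inference in this sketch."""
    best_labels, best_inertia = None, np.inf
    for _ in range(n_restart):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
            centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(k)])
        inertia = ((X - centers[labels]) ** 2).sum()
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

# toy "network activity" drawn from 4 well-separated latent states
state_means = np.array([[3.0, 0.0], [-3.0, 0.0], [0.0, 3.0], [0.0, -3.0]])
X = np.concatenate([m + rng.normal(0.0, 0.5, size=(200, 2)) for m in state_means])

scores = {k: calinski_harabasz(X, kmeans(X, k)) for k in range(2, 8)}
best_k = max(scores, key=scores.get)  # peaks at the true number of states
```

Unlike the log-likelihood, this score does not grow monotonically with the number of states, which is why it can be used for model-order selection.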

      We agree with the reviewer that the selection of the number of states is critical to any study that implements HMM. However, the field lacks a consensus on how to decide on the number of states in the HMM, and the Calinski-Harabasz score has been validated in previous studies. Most importantly, the latent states’ relationships with behavioral and cognitive measures give strong evidence that the latent states are indeed meaningful states. Again, we are not arguing that the optimal set of states in any dataset will be four nor are we arguing that these four states will always be the optimal states. Instead, the manuscript proposes that a small number of latent states explains meaningful variance in cognitive dynamics.

      • Grammatical error: P24 L29 rendering seems to have gone wrong

      Our intention was correct here. To avoid confusion, we changed “(number of participantsC2 iterations)” to “(NC2 iterations, where N = number of participants)” (page 26 in the manuscript).

      Questions:

      • Comment on subject differences, it seems like they potentially found group dynamics based on stimuli, but interesting to see individual differences in large-scale dynamics, and do they believe the states they find mostly explain global linear dynamics?

      We agree with the reviewer that whether low-dimensional latent state dynamics explain individual differences—above and beyond what could be explained by the high-dimensional, temporally static neural signatures of individuals (e.g., Finn et al., 2015)—is an important research question. However, because the SONG dataset was collected in a single lab, with a focus on covering diverse contexts (rest, task, and movie watching) over 2 sessions, we were only able to collect 27 participants. Due to this small sample size, we focused on investigating group-level, shared temporal dynamics and across-condition differences, rather than on investigating individual differences.

      Past work has studied individual differences (e.g., behavioral traits like well-being, intelligence, and personality) using the HMM (Vidaurre et al., 2017). In the lab, we are working on a project that investigates latent state dynamics in relation to individual differences in clinical symptoms using the Healthy Brain Network dataset (Ji et al., 2022, presented at SfN; Alexander et al., 2017).

      Finally, the reviewer raises an interesting question about whether the latent state sequence derived here mostly explains global linear dynamics as opposed to nonlinear dynamics. We have two responses: one methodological and one theoretical. First, methodologically, we defined the emission probabilities as a linear mixture of Gaussian distributions for each input dimension, with state-specific means (mean fMRI activity patterns of the networks) and variances (functional covariance across networks). States are therefore modeled under an assumption of linearity of feature combinations. Theoretically, recent work argues in favor of nonlinearity of large-scale neural dynamics, especially as tasks get richer and more complex (Cunningham and Yu, 2014; Gao et al., 2021). However, whether low-dimensional latent states should be modeled nonlinearly, that is, whether linear algorithms are insufficient at capturing latent states compared to nonlinear algorithms, is still unknown. We agree with the reviewer that the assumption of linearity is an interesting topic in systems neuroscience. However, together with prior work showing that numerous algorithms, both linear and nonlinear, recapitulated a common set of latent states, we argue that the HMM provides a strong low-dimensional model of large-scale neural activity and interaction.

      • P19 L40 why did the authors interpolate incorrect or no-responses for the gradCPT runs? It seems more logical to correct their results for these responses or to throw them out since interpolation can induce huge biases in these cases because the data is likely not missing at completely random.

      Interpolating the RTs of trials without responses (omission errors and incorrect trials) is the standard protocol for analyzing gradCPT data (Esterman et al., 2013; Fortenbaugh et al., 2018, 2015; Jayakumar et al., 2023; Rosenberg et al., 2013; Terashima et al., 2021; Yamashita et al., 2021). This choice reflects the assumption that sustained attention is a continuous attentional state; the RT, a proxy for the attentional state in the gradCPT literature, is a noisy measure of a smoothed, continuous attentional state. Thus, the RTs of trials without responses are interpolated, and the RT time courses are smoothed by convolving them with a Gaussian kernel.
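
      The interpolate-then-smooth step can be sketched as follows. This is a toy illustration with hypothetical RT values and kernel width, not our actual preprocessing code:

```python
import numpy as np

def rt_timecourse(rts, sigma=2.0):
    """Linearly interpolate missing RTs (omission/incorrect trials stored as NaN),
    then smooth with a normalized Gaussian kernel to approximate a continuous
    attentional state."""
    rts = np.asarray(rts, dtype=float)
    idx = np.arange(len(rts))
    valid = ~np.isnan(rts)
    filled = np.interp(idx, idx[valid], rts[valid])  # fill NaN trials
    # build a normalized Gaussian kernel truncated at +/- 3 sigma
    half = int(3 * sigma)
    x = np.arange(-half, half + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # pad with edge values so the smoothed series keeps the original length
    padded = np.pad(filled, half, mode='edge')
    return np.convolve(padded, kernel, mode='valid')

# toy trial-wise RTs (seconds); NaN marks omission/incorrect trials
rts = [0.7, 0.8, np.nan, 0.9, np.nan, np.nan, 0.6, 0.7]
smooth = rt_timecourse(rts, sigma=1.0)
```

Because each smoothed value is a convex combination of interpolated RTs, the output stays within the range of the observed responses while removing gaps, consistent with treating RT as a noisy readout of a continuous state.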

      References

      Abbas A, Belloy M, Kashyap A, Billings J, Nezafati M, Schumacher EH, Keilholz S. 2019. Quasiperiodic patterns contribute to functional connectivity in the brain. Neuroimage 191:193–204.

      Alexander LM, Escalera J, Ai L, Andreotti C, Febre K, Mangone A, Vega-Potler N, Langer N, Alexander A, Kovacs M, Litke S, O’Hagan B, Andersen J, Bronstein B, Bui A, Bushey M, Butler H, Castagna V, Camacho N, Chan E, Citera D, Clucas J, Cohen S, Dufek S, Eaves M, Fradera B, Gardner J, Grant-Villegas N, Green G, Gregory C, Hart E, Harris S, Horton M, Kahn D, Kabotyanski K, Karmel B, Kelly SP, Kleinman K, Koo B, Kramer E, Lennon E, Lord C, Mantello G, Margolis A, Merikangas KR, Milham J, Minniti G, Neuhaus R, Levine A, Osman Y, Parra LC, Pugh KR, Racanello A, Restrepo A, Saltzman T, Septimus B, Tobe R, Waltz R, Williams A, Yeo A, Castellanos FX, Klein A, Paus T, Leventhal BL, Craddock RC, Koplewicz HS, Milham MP. 2017. Data Descriptor: An open resource for transdiagnostic research in pediatric mental health and learning disorders. Sci Data 4:1–26.

      Allen EA, Damaraju E, Plis SM, Erhardt EB, Eichele T, Calhoun VD. 2014. Tracking whole-brain connectivity dynamics in the resting state. Cereb Cortex 24:663–676.

      Baker AP, Brookes MJ, Rezek IA, Smith SM, Behrens T, Probert Smith PJ, Woolrich M. 2014. Fast transient networks in spontaneous human brain activity. Elife 3:e01867.

      Bolt T, Nomi JS, Bzdok D, Salas JA, Chang C, Yeo BTT, Uddin LQ, Keilholz SD. 2022. A Parsimonious Description of Global Functional Brain Organization in Three Spatiotemporal Patterns. Nat Neurosci 25:1093–1103.

      Brown JA, Lee AJ, Pasquini L, Seeley WW. 2021. A dynamic gradient architecture generates brain activity states. Neuroimage 261:119526.

      Chang C, Leopold DA, Schölvinck ML, Mandelkow H, Picchioni D, Liu X, Ye FQ, Turchi JN, Duyn JH. 2016. Tracking brain arousal fluctuations with fMRI. Proc Natl Acad Sci U S A 113:4518–4523.

      Chang CHC, Lazaridi C, Yeshurun Y, Norman KA, Hasson U. 2021. Relating the past with the present: Information integration and segregation during ongoing narrative processing. J Cogn Neurosci 33:1–23.

      Chang LJ, Jolly E, Cheong JH, Rapuano K, Greenstein N, Chen P-HA, Manning JR. 2021. Endogenous variation in ventromedial prefrontal cortex state dynamics during naturalistic viewing reflects affective experience. Sci Adv 7:eabf7129.

      Chen J, Leong YC, Honey CJ, Yong CH, Norman KA, Hasson U. 2017. Shared memories reveal shared structure in neural activity across individuals. Nat Neurosci 20:115–125.

      Chen S, Langley J, Chen X, Hu X. 2016. Spatiotemporal Modeling of Brain Dynamics Using Resting-State Functional Magnetic Resonance Imaging with Gaussian Hidden Markov Model. Brain Connect 6:326–334.

      Cocchi L, Gollo LL, Zalesky A, Breakspear M. 2017. Criticality in the brain: A synthesis of neurobiology, models and cognition. Prog Neurobiol 158:132–152.

      Cornblath EJ, Ashourvan A, Kim JZ, Betzel RF, Ciric R, Adebimpe A, Baum GL, He X, Ruparel K, Moore TM, Gur RC, Gur RE, Shinohara RT, Roalf DR, Satterthwaite TD, Bassett DS. 2020. Temporal sequences of brain activity at rest are constrained by white matter structure and modulated by cognitive demands. Commun Biol 3:261.

      Cunningham JP, Yu BM. 2014. Dimensionality reduction for large-scale neural recordings. Nat Neurosci 17:1500–1509.

      Deco G, Kringelbach ML, Jirsa VK, Ritter P. 2017. The dynamics of resting fluctuations in the brain: Metastability and its dynamical cortical core. Sci Rep 7:3095.

      Esterman M, Noonan SK, Rosenberg M, Degutis J. 2013. In the zone or zoning out? Tracking behavioral and neural fluctuations during sustained attention. Cereb Cortex 23:2712–2723.

      Esterman M, Rothlein D. 2019. Models of sustained attention. Curr Opin Psychol 29:174–180.

      Fagerholm ED, Lorenz R, Scott G, Dinov M, Hellyer PJ, Mirzaei N, Leeson C, Carmichael DW, Sharp DJ, Shew WL, Leech R. 2015. Cascades and cognitive state: Focused attention incurs subcritical dynamics. J Neurosci 35:4626–4634.

      Falahpour M, Chang C, Wong CW, Liu TT. 2018. Template-based prediction of vigilance fluctuations in resting-state fMRI. Neuroimage 174:317–327.

      Finn ES, Shen X, Scheinost D, Rosenberg MD, Huang J, Chun MM, Papademetris X, Constable RT. 2015. Functional connectome fingerprinting: Identifying individuals using patterns of brain connectivity. Nat Neurosci 18:1664–1671.

      Fortenbaugh FC, Degutis J, Germine L, Wilmer JB, Grosso M, Russo K, Esterman M. 2015. Sustained attention across the life span in a sample of 10,000: Dissociating ability and strategy. Psychol Sci 26:1497–1510.

      Fortenbaugh FC, Rothlein D, McGlinchey R, DeGutis J, Esterman M. 2018. Tracking behavioral and neural fluctuations during sustained attention: A robust replication and extension. Neuroimage 171:148–164.

      Fox MD, Snyder AZ, Vincent JL, Corbetta M, Van Essen DC, Raichle ME. 2005. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc Natl Acad Sci U S A 102:9673–9678.

      Gao S, Mishne G, Scheinost D. 2021. Nonlinear manifold learning in functional magnetic resonance imaging uncovers a low-dimensional space of brain dynamics. Hum Brain Mapp 42:4510–4524.

      Goodale SE, Ahmed N, Zhao C, de Zwart JA, Özbay PS, Picchioni D, Duyn J, Englot DJ, Morgan VL, Chang C. 2021. Fmri-based detection of alertness predicts behavioral response variability. Elife 10:1–20.

      Greene AS, Horien C, Barson D, Scheinost D, Constable RT. 2023. Why is everyone talking about brain state? Trends Neurosci.

      Greene DJ, Marek S, Gordon EM, Siegel JS, Gratton C, Laumann TO, Gilmore AW, Berg JJ, Nguyen AL, Dierker D, Van AN, Ortega M, Newbold DJ, Hampton JM, Nielsen AN, McDermott KB, Roland JL, Norris SA, Nelson SM, Snyder AZ, Schlaggar BL, Petersen SE, Dosenbach NUF. 2020. Integrative and Network-Specific Connectivity of the Basal Ganglia and Thalamus Defined in Individuals. Neuron 105:742-758.e6.

      Gu S, Pasqualetti F, Cieslak M, Telesford QK, Yu AB, Kahn AE, Medaglia JD, Vettel JM, Miller MB, Grafton ST, Bassett DS. 2015. Controllability of structural brain networks. Nat Commun 6:8414.

      Jayakumar M, Balusu C, Aly M. 2023. Attentional fluctuations and the temporal organization of memory. Cognition 235:105408.

      Ji E, Lee JE, Hong SJ, Shim W (2022). Idiosyncrasy of latent neural state dynamic in ASD during movie watching. Poster presented at the Society for Neuroscience 2022 Annual Meeting.

      Karapanagiotidis T, Vidaurre D, Quinn AJ, Vatansever D, Poerio GL, Turnbull A, Ho NSP, Leech R, Bernhardt BC, Jefferies E, Margulies DS, Nichols TE, Woolrich MW, Smallwood J. 2020. The psychological correlates of distinct neural states occurring during wakeful rest. Sci Rep 10:1–11.

      Liu X, Duyn JH. 2013. Time-varying functional network information extracted from brief instances of spontaneous brain activity. Proc Natl Acad Sci U S A 110:4392–4397.

      Liu X, Zhang N, Chang C, Duyn JH. 2018. Co-activation patterns in resting-state fMRI signals. Neuroimage 180:485–494.

      Lynn CW, Cornblath EJ, Papadopoulos L, Bertolero MA, Bassett DS. 2021. Broken detailed balance and entropy production in the human brain. Proc Natl Acad Sci 118:e2109889118.

      Margulies DS, Ghosh SS, Goulas A, Falkiewicz M, Huntenburg JM, Langs G, Bezgin G, Eickhoff SB, Castellanos FX, Petrides M, Jefferies E, Smallwood J. 2016. Situating the default-mode network along a principal gradient of macroscale cortical organization. Proc Natl Acad Sci U S A 113:12574–12579.

      Mesulam MM. 1998. From sensation to cognition. Brain 121:1013–1052.

      Munn BR, Müller EJ, Wainstein G, Shine JM. 2021. The ascending arousal system shapes neural dynamics to mediate awareness of cognitive states. Nat Commun 12:1–9.

      Raut R V., Snyder AZ, Mitra A, Yellin D, Fujii N, Malach R, Raichle ME. 2021. Global waves synchronize the brain’s functional systems with fluctuating arousal. Sci Adv 7.

      Rosenberg M, Noonan S, DeGutis J, Esterman M. 2013. Sustaining visual attention in the face of distraction: A novel gradual-onset continuous performance task. Attention, Perception, Psychophys 75:426–439.

      Rosenberg MD, Finn ES, Scheinost D, Papademetris X, Shen X, Constable RT, Chun MM. 2016. A neuromarker of sustained attention from whole-brain functional connectivity. Nat Neurosci 19:165–171.

      Rosenberg MD, Scheinost D, Greene AS, Avery EW, Kwon YH, Finn ES, Ramani R, Qiu M, Todd Constable R, Chun MM. 2020. Functional connectivity predicts changes in attention observed across minutes, days, and months. Proc Natl Acad Sci U S A 117:3797–3807.

      Saggar M, Shine JM, Liégeois R, Dosenbach NUF, Fair D. 2022. Precision dynamical mapping using topological data analysis reveals a hub-like transition state at rest. Nat Commun 13.

      Schaefer A, Kong R, Gordon EM, Laumann TO, Zuo X-N, Holmes AJ, Eickhoff SB, Yeo BTT. 2018. Local-Global Parcellation of the Human Cerebral Cortex from Intrinsic Functional Connectivity MRI. Cereb Cortex 28:3095–3114.

      Shine JM. 2019. Neuromodulatory Influences on Integration and Segregation in the Brain. Trends Cogn Sci 23:572–583.

      Shine JM, Bissett PG, Bell PT, Koyejo O, Balsters JH, Gorgolewski KJ, Moodie CA, Poldrack RA. 2016. The Dynamics of Functional Brain Networks: Integrated Network States during Cognitive Task Performance. Neuron 92:544–554.

      Shine JM, Breakspear M, Bell PT, Ehgoetz Martens K, Shine R, Koyejo O, Sporns O, Poldrack RA. 2019. Human cognition involves the dynamic integration of neural activity and neuromodulatory systems. Nat Neurosci 22:289–296.

      Smith SM, Fox PT, Miller KL, Glahn DC, Fox PM, Mackay CE, Filippini N, Watkins KE, Toro R, Laird AR, Beckmann CF. 2009. Correspondence of the brain’s functional architecture during activation and rest. Proc Natl Acad Sci 106:13040–13045.

      Song H, Finn ES, Rosenberg MD. 2021a. Neural signatures of attentional engagement during narratives and its consequences for event memory. Proc Natl Acad Sci 118:e2021905118.

      Song H, Park B-Y, Park H, Shim WM. 2021b. Cognitive and Neural State Dynamics of Narrative Comprehension. J Neurosci 41:8972–8990.

      Taghia J, Cai W, Ryali S, Kochalka J, Nicholas J, Chen T, Menon V. 2018. Uncovering hidden brain state dynamics that regulate performance and decision-making during cognition. Nat Commun 9:2505.

      Terashima H, Kihara K, Kawahara JI, Kondo HM. 2021. Common principles underlie the fluctuation of auditory and visual sustained attention. Q J Exp Psychol 74:705–715.

      Tian Y, Margulies DS, Breakspear M, Zalesky A. 2020. Topographic organization of the human subcortex unveiled with functional connectivity gradients. Nat Neurosci 23:1421–1432.

      Turnbull A, Karapanagiotidis T, Wang HT, Bernhardt BC, Leech R, Margulies D, Schooler J, Jefferies E, Smallwood J. 2020. Reductions in task positive neural systems occur with the passage of time and are associated with changes in ongoing thought. Sci Rep 10:1–10.

      Unsworth N, Robison MK. 2018. Tracking arousal state and mind wandering with pupillometry. Cogn Affect Behav Neurosci 18:638–664.

      Unsworth N, Robison MK. 2016. Pupillary correlates of lapses of sustained attention. Cogn Affect Behav Neurosci 16:601–615.

      van der Meer JN, Breakspear M, Chang LJ, Sonkusare S, Cocchi L. 2020. Movie viewing elicits rich and reliable brain state dynamics. Nat Commun 11:1–14.

      Van Essen DC, Smith SM, Barch DM, Behrens TEJ, Yacoub E, Ugurbil K. 2013. The WU-Minn Human Connectome Project: An overview. Neuroimage 80:62–79.

      Vidaurre D, Abeysuriya R, Becker R, Quinn AJ, Alfaro-Almagro F, Smith SM, Woolrich MW. 2018. Discovering dynamic brain networks from big data in rest and task. Neuroimage, Brain Connectivity Dynamics 180:646–656.

      Vidaurre D, Smith SM, Woolrich MW. 2017. Brain network dynamics are hierarchically organized in time. Proc Natl Acad Sci U S A 114:12827–12832.

      Yamashita A, Rothlein D, Kucyi A, Valera EM, Esterman M. 2021. Brain state-based detection of attentional fluctuations and their modulation. Neuroimage 236:118072.

      Yeo BTT, Krienen FM, Sepulcre J, Sabuncu MR, Lashkari D, Hollinshead M, Roffman JL, Smoller JW, Zöllei L, Polimeni JR, Fisch B, Liu H, Buckner RL. 2011. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J Neurophysiol 106:1125–1165.

      Yousefi B, Keilholz S. 2021. Propagating patterns of intrinsic activity along macroscale gradients coordinate functional connections across the whole brain. Neuroimage 231:117827.

      Zhang S, Goodale SE, Gold BP, Morgan VL, Englot DJ, Chang C. 2023. Vigilance associates with the low-dimensional structure of fMRI data. Neuroimage 267.

    1. Author Response

      Reviewer #1 (Public Review):

      1) Although I found the introduction well written, I think it lacks some information or needs to develop more on some ideas (e.g., differences between the cerebellum and cerebral cortex, and folding patterns of both structures). For example, after stating that "Many aspects of the organization of the cerebellum and cerebrum are, however, very different" (1st paragraph), I think the authors need to develop more on what these differences are. Perhaps just rearranging some of the text/paragraphs will help make it better for a broad audience (e.g., authors could move the next paragraph up, i.e., "While the cx is unique to mammals (...)").

      We have added additional context to the introduction and developed the differences between cerebral and cerebellar cortex, also re-arranging the text as suggested.

      2) Given that the authors compare the folding patterns between the cerebrum and cerebellum, another point that could be mentioned in the introduction is the fact that the cerebellum is convoluted in every mammalian species (and non-mammalian spp as well) while the cerebrum tends to be convoluted in species with larger brains. Why is that so? Do we know about it (check Van Essen et al., 2018)? I think this is an important point to raise in the introduction and to bring it back into the discussion with the results.

      We now mention in the introduction the fact that the cerebellum is folded in mammals, birds and some fishes, and provide references to the relevant literature. We have also expanded our discussion about the reasons for cortical folding in the discussion, which now contains a subsection addressing the subject (this includes references to the work of Van Essen).

      3) In the results, first paragraph, what do the authors mean by the volume of the medial cerebellum? This needs clarification.

      We have modified the relevant section in the results and clarified the definition of the medial cerebellum, indicating that we refer to the vermal region of the cerebellum.

      4) In the results: When the authors mention 'frequency of cerebellar folding', do they mean the degree of folding in the cerebellum? At least in non-mammalian species, many studies have tried to compare the 'degree or frequency of folding' in the cerebellum by different proxies/measurements (see Iwaniuk et al., 2006; Yopak et al., 2007; Lisney et al., 2007; Yopak et al., 2016; Cunha et al., 2022). Perhaps change the phrase in the second paragraph of the result to: "There are no comparative analyses of the frequency of cerebellar folding in mammals, to our knowledge".

      We have modified the subsection in the methods referring to the measurement of folial width and folial perimeter to make the difference clearer. The folding indices used previously (which we cite) are based on Zilles’s gyrification index. This index provides only a global idea of the degree of folding; it is unable to distinguish a cortex with profuse shallow folds from one with a few deep ones. An example of this is now illustrated in Fig. 3d, where we also show how that problem is solved by the use of our two measurements (folial width and folial perimeter). The problem is also discussed in the section about the measurement of folding in the discussion section:

      “Previous studies of cerebellar folding have relied either on a qualitative visual score (Yopak et al. 2007, Lisney et al. 2008) or a “gyrification index” based on the method introduced by Zilles et al. (1988, 1989) for the study of cerebral folding (Iwaniuk et al. 2006, Cunha et al. 2020, 2021). Zilles’s gyrification index is the ratio between the length of the outer contour of the cortex and the length of an idealised envelope meant to reflect the length of the cortex if it were not folded. For instance, a completely lissencephalic cortex would have a gyrification index close to 1, while a human cerebral cortex typically has a gyrification index of ~2.5 (Zilles et al. 1988). This method has certain limitations, as highlighted by various researchers (Germanaud et al. 2012, 2014, Rabiei et al. 2018, Schaer et al. 2008, Toro et al. 2008, Heuer et al. 2019). One important drawback is that the gyrification index produces the same value for contours with wide variations in folding frequency and amplitude, as illustrated in Fig. 3d. In reality, folding frequency (inverse of folding wavelength) and folding amplitude represent two distinct dimensions of folding that cannot be adequately captured by a single number confusing both dimensions. To address this issue we introduced 2 measurements of folding: folial width and folial perimeter. These measurements can be directly linked to folding frequency and amplitude, and are comparable to the folding depth and folding wavelength we introduced previously for cerebral 3D meshes (Heuer et al. 2019). By using these measurements, we can differentiate folding patterns that could be confused when using a single value such as the gyrification index (Fig. 3d). Additionally, these two dimensions of folding are important, because they can be related to the predictions made by biomechanical models of cortical folding, as we will discuss now.”
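
      The limitation discussed in this passage can be demonstrated numerically: two sinusoidal “cortical contours” with very different folding frequency and amplitude can yield exactly the same gyrification index. This is a toy sketch (the contour parameters are illustrative, not our folial measurements):

```python
import numpy as np

def gyrification_index(amplitude, n_folds, length=10.0, n_pts=100_000):
    """Ratio of contour arc length to flat envelope length for
    y = amplitude * sin(2*pi*n_folds*x/length), a 1D analogue of Zilles's index."""
    x = np.linspace(0.0, length, n_pts)
    y = amplitude * np.sin(2 * np.pi * n_folds * x / length)
    arc = np.sum(np.hypot(np.diff(x), np.diff(y)))
    return arc / length

# a few deep folds vs. many shallow folds, with amplitude * frequency held constant
gi_deep    = gyrification_index(amplitude=1.0,  n_folds=2)
gi_shallow = gyrification_index(amplitude=0.25, n_folds=8)
```

Both contours are folded (index above 1) and the two indices coincide to numerical precision, even though one contour has four times the folding frequency at a quarter of the amplitude. Folial width and folial perimeter, by contrast, separate these two cases, which is the point illustrated in Fig. 3d.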

      5) Sultan and Braitenberg (1993) measured cerebella that were sagittally sectioned (instead of coronal), right? Do you think this difference in the plane of the section could be one of the reasons explaining different results on folial width between studies? Why does the foliation index calculated by Sultan and Braitenberg (1993) not provide information about folding frequency?

      The measurement of foliation should be similar as long as enough folds are sectioned perpendicular to their main axis. This will be the case for folds in the medial cerebellum (vermis) sectioned sagittally, and for folds in the lateral cerebellum sectioned coronally. The foliation index of Sultan and Braitenberg does not provide an account of folding frequency comparable to ours because they only measured groups of folia (what some call lamellae), whereas we measured individual folia. It is not easy to understand from their paper alone exactly how Sultan and Braitenberg proceeded, so we contacted Prof. Fahad Sultan (we acknowledge his help in our manuscript). Author response image 1 provides a clearer description of their procedure:

      Author response image 1.

      As Author response image 1 shows, each of the structures that they call a fold is composed of several folia, so their measurements are not comparable with ours, which measure individual folia (a). The flattened representation (b) is made by stacking the lengths of the fold axes (dashed lines), separating them by the total length of each fold (the solid lines), each of which may contain several folia.

      6) Another point that needs to be clarified is the log transformation of the data. Did the authors use log-transformed data for all types of analyses done in the study? Write this information in the material and methods.

      Yes, we used the log10 transformation for all our measurements. This is now mentioned in the methods section, and again in the section concerning allometry. We are including a link to all our code to facilitate exact replication of our entire method, including this transformation.

      7) The discussion needs to be expanded. The focus of the paper is on the folding pattern of the cerebellum (among different mammalian species) and its relationship with the anatomy of the cerebrum. Therefore, the discussion on this topic needs to be better developed, in my opinion (especially given the interesting results of this paper). For example, with the findings of this study, what can we say about how the folding of the cerebellum is determined across mammals? The authors found that the folial width, folial perimeter, and thickness of the molecular layer increase at a relatively slow rate across the species studied. Does this mean that these parameters have little influence on the cerebellar folding pattern? What mostly defines the folding patterns of the cerebellum given the results? Is it the interaction between section length and area? Can the authors explain why size does not seem to be a "limiting factor" for the folding of the cerebellum (for example, even relatively small cerebella are folded)? Is that because the 'white matter' core of the cerebellum is relatively small (thus more stress on it)?

      We have expanded the discussion as suggested, with subsections detailing the measuring of folding, the modelling of folding for the cerebrum and the cerebellum, and the role that cerebellar folding may play in its function. We refer to the literature on cortical folding modelling, and we discuss our results in terms of the factors that this research has highlighted as critical for folding. From the discussion subsection on models of cortical folding:

      “The folding of the cerebral cortex has been the focus of intense research, both from the perspective of neurobiology (Borrell 2018, Fernández and Borrell 2023) and physics (Toro and Burnod 2005, Tallinen et al. 2014, Kroenke and Bayly 2018). Current biomechanical models suggest that cortical folding should result from a buckling instability triggered by the growth of the cortical grey matter on top of the white matter core. In such systems, the growing layer should first expand without folding, increasing the stress in the core. But this configuration is unstable, and if growth continues stress is released through cortical folding. The wavelength of folding depends on cortical thickness, and folding models such as the one by Tallinen et al. (2014) predict a neocortical folding wavelength which corresponds well with the one observed in real cortices. Tallinen et al. (2014) provided a prediction for the relationship between folding wavelength λ and the mean thickness (𝑡) of the cortical layer: λ = 2π𝑡(µ/(3µ𝑠))^(1/3). (...)”
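      For concreteness, the prediction quoted above can be evaluated numerically. A minimal sketch; the shear-modulus ratio µ/µ𝑠 is an assumed illustrative input, not a value taken from the manuscript:

```python
import math

def folding_wavelength(t, mu_ratio=1.0):
    """Tallinen et al. (2014) prediction: lambda = 2*pi * t * (mu/(3*mu_s))**(1/3).

    t        -- mean cortical thickness (any length unit; lambda has the same unit)
    mu_ratio -- mu/mu_s, ratio of cortical to core shear modulus (assumed value)
    """
    return 2.0 * math.pi * t * (mu_ratio / 3.0) ** (1.0 / 3.0)

# With equal shear moduli (mu = mu_s), lambda is about 4.36 times the thickness,
# so a thin cerebellar cortex folds at a much shorter wavelength than a
# thicker cerebral cortex of the same stiffness ratio.
print(folding_wavelength(1.0))
```

      Because the wavelength scales linearly with thickness, halving the cortical thickness halves the predicted folding wavelength, which is consistent with the discussion of why the thin cerebellar cortex folds even in small brains.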

      From this biomechanical framework, our answers to the questions of the Reviewer would be:

      • How is the folding of the cerebellum determined across mammals? By the expansion of a layer of reduced thickness on top of an elastic layer (the white matter)

      • Folial width, folial perimeter, and thickness of the molecular layer increase at a relatively slow rate across the species studied. Does this mean that these parameters have little influence on the cerebellar folding pattern? On the contrary, that indicates that the shape of individual folia is stable, providing the smallest level of granularity of a folding pattern. In the extreme case where all folia had exactly the same size, a small cerebellum would have enough space to accommodate only a few folia, whereas a large cerebellum would accommodate many more.

      • What mostly defines the folding patterns of the cerebellum given the results? Is it the interaction between section length and area? It is the largely two-dimensional expansion of the cerebellar cortical layer, together with its thickness.

      • Can the authors explain why size does not seem to be a "limiting factor" for the folding of the cerebellum? Because even a cerebellum of very small volume would fold if its cortex were thin enough and expanded sufficiently. That’s why the cerebellum folds even while being smaller than the cerebrum: because its cortex is much thinner.

      8) One caveat or point to be raised is the fact that the authors use the median of the variables measured for the whole cerebellum (e.g., median width and median perimeter across all folia). Although the cerebellum is highly uniform in its gross internal morphology and circuitry's organization across most vertebrates, there is evidence showing that the cerebellum may be organized in different functional modules. In that way, different regions or folia of the cerebellum would have different olivo-cortico-nuclear circuitries, forming, each one, a single cerebellar zone. Although it is not completely clear how these modules/zones are organized within the cerebellum, I think the authors could acknowledge this at the end of their discussion, and raise potential ideas for future studies (e.g., analyse folding of the cerebellum within the brain structure - vermis vs lateral cerebellum, for example). I think this would be a good way to emphasize the importance of the results of this study and what are the main questions remaining to be answered. For example, the expansion of the lateral cerebellum in mammals is suggested to be linked with the evolution of vocal learning in different clades (see Smaers et al., 2018). An interesting question would be to understand how foliation within the lateral cerebellum varies across mammalian clades and whether this has something to do with the cellular composition or any other aspect of the microanatomy as well as the evolution of different cognitive skills in mammals.

      We now address this point in a subsection of the discussion which details the implications of our methodological decisions and the limitations of our approach. It is true that the cerebellum is regionally variable. Our measurements of folial width, folial perimeter and molecular layer thickness are local, and we should be able to use them in the future to study regional variation. However, this comes with a number of difficulties. First, it would require sampling the whole cerebellum (and the cerebrum), not just one section. But even if that were possible, it would increase the number of phenotypes beyond the current scope of this study. Our central question about brain folding in the cerebellum compared with the cerebrum is addressed by providing data for a substantial number of mammalian species. As indicated by Reviewer #3, adding more variables makes phylogenetic comparative analyses very difficult because the models to fit become too large.

      Reviewer #2 (Public Review):

      1) The methods section does not address all the numerical methods used to make sense of the different brain metrics.

      We now provide more detailed descriptions of our measurements of foliation, phylogenetic models, analysis of partial correlations, phylogenetic principal components, and allometry. We have added illustrations (to Figs. 3 and 5), examples and references to the relevant literature.

      2) In the results section, it sometimes makes it difficult for the reader to understand the reason for a sub-analysis and the interpretation of the numerical findings.

      The revised version of our manuscript includes motivations for the different types of analyses, and we have also added a paragraph providing a guide to the structure of our results.

      3) The originality of the article is not sufficiently brought forward:

      a) the novel method to detect the depth of the molecular layer is not contextualized in order to understand the shortcomings of previously-established methods. This prevents the reader from understanding its added value and hinders its potential re-use in further studies.

      The revised version of the manuscript provides additional context which highlights the novelty of our approach, in particular concerning the measurement of folding and the use of phylogenetic comparative models. The limitations of the previous approaches are stated more clearly, and illustrated in Figs. 3 and 5.

      b) The numerous results reported are not sufficiently addressed in the discussion for the reader to get a full grasp of their implications, hindering the clarity of the overall conclusion of the article.

      Following the Reviewer’s advice, we have thoroughly restructured our results and discussion section.

      Reviewer #3 (Public Review):

      1) The first problem relates to their use of the Ornstein-Uhlenbeck (OU) model: they try fitting three evolutionary models, and conclude that the Ornstein-Uhlenbeck model provides the best fit. However, it has been known for a while that OU models are prone to bias and that the apparent superiority of OU models over Brownian Motion is often an artefact, a problem that increases with smaller sample sizes. (Cooper et al (2016) Biological Journal of the Linnean Society, 2016, 118, 64-77).

      Cooper et al.’s (2016) article “A Cautionary Note on the Use of Ornstein Uhlenbeck Models in Macroevolutionary Studies” suggests that comparing evolutionary models using the model’s likelihood often leads to incorrectly selecting OU over BM, even for data generated from a BM process. However, Grabowski et al. (2023), in their article ‘A Cautionary Note on “A Cautionary Note on the Use of Ornstein Uhlenbeck Models in Macroevolutionary Studies”’, suggest that Cooper et al.’s (2016) claim may be misleading. The work of Clavel et al. (2019) and Clavel and Morlon (2017) shows that the penalised framework implemented in mvMORPH can successfully recover the parameters of a multivariate OU process. To address the Reviewer’s concern more directly, we used simulations to evaluate the chances that we would decide for an OU model when the correct model was BM – a procedure similar to the one used by Cooper et al. (2016). However, instead of using the likelihood of the fitted models directly as Cooper et al. (2016) did – which does not control for the number of parameters in the model – we used the Akaike Information Criterion corrected for small sample sizes: AICc. The standard Akaike Information Criterion takes the number of parameters of the model into account, but this is not sufficient when the sample size is small. AICc provides a score which takes both aspects into account: model complexity and sample size. This information has been added to the manuscript:

      “We selected the best fitting model using the Akaike Information Criterion (AIC), corrected for small sample sizes (AICc). AIC takes into account the number of parameters 𝑝 in the model: 𝐴𝐼𝐶 = − 2 𝑙𝑜𝑔(𝑙𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑) + 2𝑝. This approximation is insufficient when the sample size is small, in which case an additional correction is required, leading to the corrected AIC: 𝐴𝐼𝐶𝑐 = 𝐴𝐼𝐶 + (2𝑝² + 2𝑝)/(𝑛 − 𝑝 − 1), where 𝑛 is the sample size.”

      In 1000 simulations of 9 correlated multivariate traits for 56 species (i.e., 56 × 9 data points) using our phylogenetic tree, we would have decided for OU when the real model was BM only 0.7% of the time.
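      The criterion quoted above is straightforward to compute; a minimal Python sketch for illustration only (the actual model fits were performed with mvMORPH in R, and the log-likelihood values below are hypothetical):

```python
def aic(log_likelihood, p):
    """Standard Akaike Information Criterion: AIC = -2 log(likelihood) + 2p."""
    return -2.0 * log_likelihood + 2.0 * p

def aicc(log_likelihood, p, n):
    """AIC corrected for small sample size n, with p model parameters."""
    return aic(log_likelihood, p) + (2.0 * p**2 + 2.0 * p) / (n - p - 1.0)

# Hypothetical log-likelihoods for two fitted models on n = 56 species:
# a simpler model (p = 5) and a richer one (p = 12).
print(aicc(-100.0, p=5, n=56))   # 210 + 60/50 = 211.2
print(aicc(-92.0, p=12, n=56))   # 208 + 312/43 ~ 215.26
```

      Note that under AICc the simpler model is preferred here despite its lower likelihood; as 𝑛 grows, the correction term shrinks toward zero and AICc converges to the plain AIC.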

      2) Second, for the partial correlations (e.g. fig 7) and Principal Components (fig 8) there is a concern about over-fitting: there are 9 variables and only 56 data points (violating the minimal rule of thumb that there should be >10 observations per parameter). Added to this, the inclusion of variables lacks a clear theoretical rationale. The high correlations between most variables will be in part because they are to some extent measuring the same things, e.g. the five different measures of cerebellar anatomy which include two measures of folial size. This makes it difficult to separate their effects. I get that the authors are trying to tease apart different aspects of size, but in practice, I think these results (e.g. the presence of negative coefficients in Fig 7) are really hard or impossible to interpret. The partial correlation network looks like a "correlational salad" rather than a theoretically motivated hypothesis test. It isn't clear to me that the PC analyses solve this problem, but it partly depends on the aims of these analyses, which are not made very clear.

      PCA is simply a rigid rotation of the data: all distances among multivariate data points are conserved. Neither our PCA nor our partial correlation analysis involves model fitting, so the concept of overfitting does not apply. PCA and partial correlations are also not used here for hypothesis testing, but as exploratory methods which provide a transformation of the data aiming at capturing the main trends of multivariate change. The aim of our analysis of correlation structure is precisely to avoid the “correlational salad” that the Reviewer mentions. The Reviewer is correct: all our variables are correlated to a varying degree (note, however, that there are 56 data points per variable, i.e., 56 × 9 data points in total, not just 56). Partial correlations and PCA aim at providing a principled way in which correlated measurements can be explored. In the revised version of the manuscript we include a more detailed description of partial correlations and (phylogenetic) PCA. Whenever variables measure the same thing, they will be combined into the same principal component (these are the combinations shown in Fig. 8b and d). Additionally, two variables may be correlated because of their correlation with a third variable (or more). Partial correlations address this possibility by looking at the correlations between the residuals of each pair of variables after all other variables have been covaried out. We provide a simple example which should make this clear, providing in particular an intuition for the meaning of negative partial correlations:
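      The statement that PCA is a rigid, distance-preserving rotation can be checked directly. A minimal sketch on toy data of the same shape as ours (56 species × 9 traits); the random matrix is purely illustrative, not the study’s measurements:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((56, 9))   # toy data: 56 species x 9 log-traits
Xc = X - X.mean(axis=0)            # centre each trait

# PCA via SVD: the rows of Vt are the principal axes (an orthonormal basis),
# and the scores are the centred data expressed in that rotated basis.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T

def pairwise_distances(M):
    """Matrix of Euclidean distances between all pairs of rows of M."""
    diff = M[:, None, :] - M[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# A rigid rotation leaves every inter-point distance unchanged:
print(np.allclose(pairwise_distances(Xc), pairwise_distances(scores)))  # True
```

      Because no parameters are estimated against an outcome, there is nothing to overfit: the rotated coordinates carry exactly the same information as the original ones.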

      “All our phenotypes were strongly correlated. We used partial correlations to better understand pairwise relationships. The partial correlation between two vectors of measurements a and b is the correlation between their residuals after the influence of all other measurements has been covaried out. Even if the correlation between a and b is strong and positive, their partial correlation could be 0 or even negative. Consider, for example, 3 vectors of measurements a, b, c, which result from the combination of uncorrelated random vectors x, y, z. Suppose that a = 0.5 x + 0.2 y + 0.1 z, b = 0.5 x - 0.2 y + 0.1 z, and c = x. The measurements a and b will be positively correlated because of the effect of x and z. However, if we compute the residuals of a and b after covarying out the effect of c (i.e., x), their partial correlation will be negative because of the opposite effect of y on a and b. The statistical significance of each partial correlation being different from 0 was estimated using the edge exclusion test introduced by Whittaker (1990).”
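      The construction in the quoted example can be verified numerically; a minimal sketch (the coefficients follow the example, while the sample size is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Three uncorrelated latent vectors
x, y, z = rng.standard_normal((3, n))

# Observed measurements, as in the quoted example
a = 0.5 * x + 0.2 * y + 0.1 * z
b = 0.5 * x - 0.2 * y + 0.1 * z
c = x

def residuals(v, w):
    """Residuals of v after least-squares regression on w (with intercept)."""
    X = np.column_stack([np.ones_like(w), w])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta

r_raw = np.corrcoef(a, b)[0, 1]
r_partial = np.corrcoef(residuals(a, c), residuals(b, c))[0, 1]

print(f"raw correlation:     {r_raw:+.2f}")      # strongly positive (x and z)
print(f"partial correlation: {r_partial:+.2f}")  # negative (opposite effect of y)
```

      The raw correlation is strongly positive, yet once c (i.e., x) is covaried out, the remaining residuals are dominated by the opposite contributions of y, and the partial correlation turns negative.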

      The rationale for our analyses has been made clearer in the revised version of the manuscript, aided by the more detailed description of our methods. In particular, we better describe the reason for our two measurements of folial shape – width and perimeter – which capture independent dimensions of folding (this is illustrated in Fig. 3d).

      3) The claim of concerted evolution between cortical and cerebellar values (P 11-12) seems to be based on analyses that exclude body size and brain size. It, therefore, seems possible - or even likely - that all these analyses reveal overall size effects that similarly influence the cortex and cerebellum. When the authors state that they performed a second PC analysis with body and brain size removed "to better understand the patterns of neuroanatomical evolution" it isn't clear to me that is what this achieves. A test would be a model something like [cerebellar measure ~ cortical measure + rest of the brain measure], and this would deal with the problem of 'correlation salad' noted below.

      The answer to this question is in the partial correlation diagram in Fig. 7c. This analysis excludes neither body weight nor brain weight. It shows that the strong correlation between cerebellar area and length is supported by a strong positive partial correlation, as is the link between cerebral area and length. There is a significant positive partial correlation between cerebellar section area and cerebral section length. That is, even after covarying out everything else, there is still a correlation between cerebellar section area and cerebral section length (this partial correlation is equivalent to the model suggested by the Reviewer). Additionally, there is a positive partial correlation between body weight and cerebellar section area, but no significant partial correlation between body weight and cerebral section area or length. Our approach aims at obtaining a general view of all the relationships in the data. Testing an individual model would certainly decrease the number of correlations; however, it would provide only a partial view of the problem.

      4) It is not quite clear from fig 6a that the result does indeed support isometry between the data sets (predicted 2/3 slope), and no coefficient confidence intervals are provided.

      We have now added the numerical values of the CIs to all our plots in addition to the graphical representations (grey regions) in the previous version of the manuscript. The isometry slope (0.67) is either within the CIs (both for the linear and orthogonal regressions) or at the margin, indicating that if the relationships are not isometric, they are very close to it.

      Referencing/discussion/attribution of previous findings

      5) With respect to the discussion of the relationship between cerebellar architecture and function, and given the emphasis here on correlated evolution with cortex, Ramnani's excellent review paper goes into the issues in considerable detail, which may also help the authors develop their own discussion: Ramnani (2006) The primate cortico-cerebellar system: anatomy and function. Nature Reviews Neuroscience 7, 511-522 (2006)

      We have added references to the work of Ramnani.

      6) The result that humans are outliers with a more folded cerebellum than expected is interesting and adds to recent findings highlighting evolutionary changes in the hominin human cerebellum, cerebellar genes, and epigenetics. Whilst Sereno et al (2020) are cited, it would be good to explain that they found that the human cerebellum has 80% of the surface area of the cortex.

      We have added this information to the introduction:

      “In humans, the cerebellum has ~80% of the surface area of the cerebral cortex (Sereno et al. 2020), and contains ~80% of all brain neurons, although it represents only ~10% of the brain mass (Azevedo et al. 2009)”

      7) It would surely also be relevant to highlight some of the molecular work here, such as Harrison & Montgomery (2017). Genetics of Cerebellar and Neocortical Expansion in Anthropoid Primates: A Comparative Approach. Brain Behav Evol. 2017;89(4):274-285. doi: 10.1159/000477432. Epub 2017 (especially since this paper looks at both cerebellar and cortical genes); also Guevara et al (2021) Comparative analysis reveals distinctive epigenetic features of the human cerebellum. PLoS Genet 17(5): e1009506. https://doi.org/10.1371/journal. pgen.1009506. Also relevant here is the complex folding anatomy of the dentate nucleus, which is the largest structure linking cerebellum to cortex: see Sultan et al (2010) The human dentate nucleus: a complex shape untangled. Neuroscience. 2010 Jun 2;167(4):965-8. doi: 10.1016/j.neuroscience.2010.03.007.

      The information is certainly important, and could have provided a wider perspective on cerebellar evolution, but we would prefer to keep a focus on cerebellar anatomy and address genetics only indirectly through phylogeny.

      8) The authors state that results confirm previous findings of a strong relationship between cerebellum and cortex (P 3 and p 16): the earliest reference given is Herculano-Houzel (2010), but this pattern was discovered ten years earlier (Barton & Harvey 2000 Nature 405, 1055-1058. https://doi.org/10.1038/35016580; Fig 1 in Barton 2002 Nature 415, 134-135 (2002). https://doi.org/10.1038/415134a) and elaborated by Whiting & Barton (2003) whose study explored in more detail the relationship between anatomical connections and correlated evolution within the cortico-cerebellar system (this paper is cited later, but only with reference to suggestions about the importance of functions of the cerebellum in the context of conservative structure, which is not its main point). In fact, Herculano-Houzel's analysis, whilst being the first to examine the question in terms of numbers of neurons, was inconclusive on that issue as it did not control for overall size or rest of the brain (A subsequent analysis using her data did, and confirmed the partially correlated evolution - Barton 2012, Philos Trans R Soc Lond B Biol Sci. 367:2097-107. doi: 10.1098/rstb.2012.0112.)

      We apologise for this oversight, these references are now included.

    1. Author Response:

      Reviewer #1:

      The authors present an interesting concept for the mechanism of rash induction in EGFR inhibitor (EGFRi) treated rats. EGFRi causes production of pro-inflammatory factors in epidermal keratinocytes which may induce dedifferentiation and reduction of the dWAT compartment, presumably mediated via PPAR. Factors produced by dedifferentiated FB then recruit monocytes thereby inducing skin inflammation. This work is aiming to improve targeted cancer therapy efficiency and is therefore of potential clinical relevance.

      However, most of the conclusions drawn by the authors are based on correlations, e.g. between the amount of dWAT and rash intensity. Mechanistic data have been mainly generated in vitro. The exact order of events to formulate a definitive mechanistic proof in vivo for this hypothesis is missing. In particular, it is not clear which cells in the skin, apart from keratinocytes, are specifically targeted by EGFR inhibitors and/or by Rosiglitazone. The authors also do not show EGFR staining in adipocytes and its inhibition by Afa. The effects of Afa and Rosi on monocytes / macrophages are completely ignored by the authors. Additionally, some of the presented results are overinterpreted and not really supporting what is claimed.

      Most importantly, the whole study is based on inhibitor treatments. Afatinib for example is not only inhibiting EGFR but all other erbB family members and as such it represents a panErbB inhibitor and it is not clear whether the observed effects are induced by inhibition of EGFR of other erbB receptors which have been shown to have also effects in the skin. For further specification of the role of EGFR, other, more specific inhibitors should be used to confirm the basic concept along with genetic proof either in genetically engineered mice or by Crispr-mediated-deletion.

      To further support the hypotheses of the authors, the study needs to be further substantiated by mechanistic experiments and the clinical relevance should be strengthened by performing histologic analysis of skin samples of patients treated with EGFRi and respective analysis of rash and e.g. BMI etc.

      Thank you for your positive comments on the potential impact for cancer patients suffering from EGFR inhibitor-induced skin rash. We have carefully considered all comments from the reviewer and revised our manuscript accordingly. In the following section, we summarize our responses to each of the reviewer’s comments. We believe that our responses address all of the reviewer’s concerns.

      We agree with the reviewer’s comment that our research would benefit from more direct mechanistic in vivo studies building on our in vitro results. In our research, we collected evidence from previous studies and used various in vitro and ex vivo experiments to investigate our findings. However, the study was still limited by currently available technologies.

      In the revised version, we added pEGFR and pERK staining of adipocytes in Figure 3-figure supplement 1C. The levels of phospho-EGFR and phospho-ERK in dWAT were significantly decreased after EGFRi treatment.

      This study was inspired by the observation of an unusual dWAT reduction during EGFRi treatment, so we focused on the investigation of dermal adipocytes. In addition, mastocytes, monocytes, and macrophages in EGFRi-induced cutaneous toxicity have been thought of as responders to increased cytokine expression. Local depletion of macrophages and degranulated mastocytes provided only partial resolution, indicating a multifactorial and complicated pathology of cutaneous toxicity induced by anti-EGFR therapy (Lichtenberger et al., 2013; Mascia et al., 2013).

      Regarding some of the imprecise statements, we agree with the reviewer that they would be more convincing with a direct assessment in genetically engineered mice. For example, we tried to establish the relationship between S. aureus infection and EGFRi-induced rash based on a well-accepted study from Lingjuan Zhang (Zhang et al., 2015). They reported that adipose precursor cells secrete the antimicrobial peptide cathelicidin during differentiation to defend against S. aureus infection. Mice with impaired adipogenesis were more susceptible to S. aureus infection. This conclusion gave us insights into the relationship between S. aureus infection and EGFRi-induced skin inflammation. Unfortunately, the anti-CAMP antibody was made in the authors’ lab, and there are no commercially available antibodies that recognize CAMP in rats. To provide more mechanistic evidence, we conducted qPCR experiments to study the transcriptional level of the Camp gene both in dWAT and in dFB cells isolated from rat skin (Figure 3I and 3J). dWAT in the Afa group showed a lower expression level of Camp than the control group. In addition, at different differentiation stages of dFB in vitro, transcriptional levels of Camp were decreased by Afa treatment and increased by Rosi. In summary, the data we collected verify the causal relationship between EGFRi-induced dWAT reduction and S. aureus infection to some extent. However, technological limitations prevented us from providing more evidence. Thus, in the revised manuscript, we have softened the statement.

      According to the clinical evidence, the rash can also be induced by many specific Erbb1 inhibitors. All three generations of clinical EGFR inhibitors have very high incidence rates of cutaneous toxicity (Supplementary file 1). In the revised version, we provided rash models induced by the first-generation EGFRis Erlotinib and Gefitinib and by the third-generation EGFRi Osimertinib. As shown in Figure 1-figure supplement 1D, the rash caused by Erlotinib, Gefitinib, and Osimertinib had the same phenotypes as the Afatinib-induced rash.

      In summary, the current evidence supports our findings, even though more direct mechanistic studies would strengthen them. We are now seeking collaborations to build a dermal adipocyte knockout mouse model platform and hope to investigate the specific roles of dermal adipocytes in the future. We also plan to collaborate with hospitals to explore the clinical evidence from patients receiving EGFR inhibitors.

      References:

      Lichtenberger BM, Gerber PA, Holcmann M, Buhren BA, Amberg N, Smolle V, Schrumpf H, Boelke E, Ansari P, Mackenzie C, Wollenberg A, Kislat A, Fischer JW, Röck K, Harder J, Schröder JM, Homey B, Sibilia M. 2013. Epidermal EGFR controls cutaneous host defense and prevents inflammation. Sci Transl Med 5.

      Mascia F, Lam G, Keith C, Garber C, Steinberg SM, Kohn E, Yuspa SH. 2013. Genetic ablation of epidermal EGFR reveals the dynamic origin of adverse effects of anti-EGFR therapy. Sci Transl Med 5.

      Zhang L, Guerrero-juarez CF, Hata T, Bapat SP, Ramos R, Plikus M V, Gallo RL. 2015. Dermal adipocytes protect against invasive Staphylococcus aureus skin infection. Science 347:67–72.

      Reviewer #2:

      Leying Chen et al. investigated the mechanism of EGFR inhibitor-induced rash. They find that atrophy of dermal white adipose tissue (dWAT), a highly plastic adipose tissue with various skin-specific functions, correlates with rash occurrence and exacerbation in a murine model. The data indicate that EGFR inhibition induces the dedifferentiation of dWAT and lipolysis , finally lead to dWAT reduction which is a hallmark of the pathophysiology of rash. Notably, they demonstrate that stimulating dermal adipocyte expansion with a high-fat diet (HFD) or the pharmacological PPARγ agonist rosiglitazone (Rosi) ameliorated the severity of rash. Therefore, PPARγ agonists may represent a promising new therapeutic strategy in the treatment of EGFRI-related skin disorders pending to be confirmed in further study.

      We greatly appreciate the reviewer for giving the above positive comments.

      The conclusions of this paper are mostly well supported by data, but some results need to be clarified and verified.

      1) PPAR signaling in the pathology of EGFRI-induced skin toxicity. In figure 2 , the results show Rosi reversed the dedifferentiation of dermal adipocytes induced by Afa. This may due to PPARγ upregulation but not be confirmed in the results. The relative genes expression in dWAT after treated with Afa and ROSi were not demonstrated in the results.

      We thank the reviewer for suggesting an additional experiment on PPARγ. In the revised version, we collected attached dWAT after 5-day Afa or Rosi treatment and measured the transcription of Pparg. The expression level of Pparg was downregulated by Afa treatment and upregulated by Rosi treatment (Figure 2-figure supplement 1D).

      2) the effect of PPAR signaling on PDGFRA-PI3K-AKT pathway The AKT pathway is a key downstream target of EGFR kinase, so it is reasonable to see p-AKT1 and p-AKT2 levels were decreased by Afa (figure 3C) However, addition of Rosi to Afa significantly activated both AKT1 and AKT2 . What is the underlying mechanism for the results and whether it is related to the PPAR signaling pathway.

      Given the importance of the PI3K/AKT pathway in regulating AP and mature adipocyte biology (Jeffery et al., 2015), we used p-AKT to characterize the activation of dFBs. The mechanism by which modulating PPARγ affects AKT is still unknown. One study found that MAPK and PI3K are upregulated and activated by rosiglitazone, which in turn might enhance adipogenesis (Fayyad et al., 2019). In skeletal muscle, PPARγ enhances insulin-stimulated PI3K and Akt activation (Marx et al., 2004). It has also been reported that rosiglitazone has a neuroprotective effect against oxidative stress. The PPARγ-rosiglitazone complex binds to the neurotrophic factor-α1 (NF-α1) promoter and activates the transcription of NF-α1 mRNA, which is then translated into protein. NF-α1 binds to a cognate receptor and activates the AKT and ERK pathways (Thouennon et al., 2015). Thus, further studies should be carried out to investigate the effects of rosiglitazone on the PI3K/AKT pathway in adipogenesis.

      3) According to Figure 3F, 3G and 3H, the authors conclude that "a lack of APs and mature dWAT impairs the maintenance of the host defense and hair growth in the skin". In my opinion, no results directly prove this. According to Figure 3H, the impairment of hair growth may be caused by EGFR inhibition of hair follicles.

      We appreciate the reviewer for raising this important point. We tried to establish the relationship between S. aureus infection and EGFRI-induced rash based on a well-accepted study from Lingjuan Zhang (Zhang et al., 2015), which reported that adipose precursor cells secrete the antimicrobial peptide cathelicidin during differentiation to protect against S. aureus infection, and that mice with impaired adipogenesis were more susceptible to S. aureus infection. This conclusion gave us insights into the relationship between S. aureus infection and EGFRI-induced skin inflammation. Unfortunately, the anti-CAMP antibody was made by the authors' lab and there are no mature products that can recognize CAMP in rats. To provide more mechanistic evidence, we conducted qPCR experiments to study the transcriptional level of the Camp gene both in dWAT and in dFB cells isolated from rat skin (Figure 3I and 3J). dWAT in the Afa group showed a lower expression level of Camp compared with the control group. In addition, at different differentiation stages of dFBs in vitro, transcriptional levels of Camp were decreased by Afa treatment and increased by Rosi. In summary, the data we could collect with current technology support, to some extent, a causal relationship between EGFRI-induced dWAT reduction and S. aureus infection. However, we agree with the reviewer that this conclusion needs more direct evidence; in the revised manuscript, we have therefore softened the statement.

      Since recent reports have shown that dermal adipocytes can support hair regeneration, we used this finding to characterize the function of dWAT. However, we agree with the reviewer that more specific and direct experiments are needed to verify the causal role of dWAT. We are seeking a collaboration to build a dermal adipocyte knockout mouse model platform and hope to investigate the specific roles of dermal adipocytes in the future. In the revised manuscript, we have also adjusted these statements.

      4) EGFRI stimulates keratinocytes (HaCaT cells) to produce lipolytic cytokines (IL-6) (Figure 4G). IL-6 enhanced the lipolysis of differentiated dFBs (Figure S4M), and C18 fatty acids were supposed to be released into the extracellular matrix during lipolysis. In Figure 4H, HaCaT cell supernatants and dFB supernatants were collected. IL-6 was supposed to increase in HaCaT cell supernatants, which was confirmed in Figure S4K and S4L. However, C18 fatty acids were not directly shown to be present in the dFB supernatants.

      We thank the reviewer for pointing this out. We conducted additional lipidomics of dFB supernatants. However, because the differentiation medium needs to be changed every two days, it is hard to accumulate enough FFAs. We collected supernatants on Day 3, Day 6, and Day 9; all were below the detection limit of the mass spectrometer. We agree with the reviewer that more evidence is needed to prove the correlation between C18 FFAs and lipolysis. Therefore, we performed a mass spectrometry analysis of skin tissues from the Ctrl and Afa groups after 3-day treatment to confirm the release of C18 FFAs. The result showed an increasing tendency of C18:2 and other FFAs in the Afa group (Figure 1 in response letter), but this increase was not statistically significant, possibly due to interference from the sebaceous glands and dermal adipocytes. In consequence, we have softened the corresponding descriptions in the revised manuscript.

      Figure 1. C18 concentrations in skin tissues from Ctrl and Afa groups after 3-day treatment. n=3.

      References:

      Fayyad AM, Khan AA, Abdallah SH, Alomran SS, Bajou K, Khattak MNK. 2019. Rosiglitazone Enhances Browning Adipocytes in Association with MAPK and PI3-K Pathways During the Differentiation of Telomerase-Transformed Mesenchymal Stromal Cells into Adipocytes. Int J Mol Sci 20.

      Jeffery E, Church CD, Holtrup B, Colman L, Rodeheffer MS. 2015. Rapid depot-specific activation of adipocyte precursor cells at the onset of obesity. Nat Cell Biol 17:376–385.

      Marx N, Duez H, Fruchart J-C, Staels B. 2004. Peroxisome proliferator-activated receptors and atherogenesis: regulators of gene expression in vascular cells. Circ Res 94:1168–1178.

      Thouennon E, Cheng Y, Falahatian V, Cawley NX, Loh YP. 2015. Rosiglitazone-activated PPARγ induces neurotrophic factor-α1 transcription contributing to neuroprotection. J Neurochem 134:463–470.

      Zhang L, Guerrero-juarez CF, Hata T, Bapat SP, Ramos R, Plikus M V, Gallo RL. 2015. Dermal adipocytes protect against invasive Staphylococcus aureus skin infection. Science 347:67–72.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public Review):

      The authors present a number of deep learning models to analyse the dynamics of epithelia. In this way they want to overcome the time-consuming manual analysis of such data and also remove a potential operator bias. Specifically, they set up models for identifying cell division events and cell division orientation. They apply these tools to the epithelium of the developing Drosophila pupal wing. They confirm a linear decrease of the division density with time and identify a burst of cell division after healing of a wound that they had induced earlier. These division events happen a characteristic time after and a characteristic distance away from the wound. These characteristic quantities depend on the size of the wound.

      Strengths:

      The methods developed in this work achieve the goals set by the authors and are a very helpful addition to the toolbox of developmental biologists. They could potentially be used on various developing epithelia. The evidence for the impact of wounds on cell division is compelling.

      The methods presented in this work should prove to be very helpful for quantifying cell proliferation in epithelial tissues.

      We thank the reviewer for the positive comments!

      Reviewer #2 (Public Review):

      In this manuscript, the authors propose a computational method based on deep convolutional neural networks (CNNs) to automatically detect cell divisions in two-dimensional fluorescence microscopy timelapse images. Three deep learning models are proposed to detect the timing of division, predict the division axis, and enhance cell boundary images to segment cells before and after division. Using this computational pipeline, the authors analyze the dynamics of cell divisions in the epithelium of the Drosophila pupal wing and find that a wound first induces a reduction in the frequency of division followed by a synchronised burst of cell divisions about 100 minutes after its induction.

      Comments on revised version:

      Regarding the Reviewer's 1 comment on the architecture details, I have now understood that the precise architecture (number/type of layers, activation functions, pooling operations, skip connections, upsampling choice...) might have remained relatively hidden to the authors themselves, as the U-net is built automatically by the fast.ai library from a given classical choice of encoder architecture (ResNet34 and ResNet101 here) to generate the decoder part and skip connections.

      Regarding the Major point 1, I raised the question of the generalisation potential of the method. I do not think, for instance, that the optimal number of frames to use, nor the optimal choice of their time-shift with respect to the division time (t-n, t+m) (not systematically studied here), are generic hyperparameters that can be directly transferred to another setting. This implies that the method proposed will necessarily require re-labeling, re-training and re-optimizing the hyperparameters which directly influence the network architecture for each new dataset imaged differently. This limits the generalisation of the method to other datasets, and this may be seen as in contrast to other tools developed in the field for other tasks, such as cellpose for segmentation, which has proven a true potential for generalisation across various data modalities. I was hoping that the authors would test the robustness of their method themselves, for instance by re-imaging the same tissue with a slightly different acquisition rate, to give more weight to their work.

      We thank the referee for the comments. Regarding this particular biological system, photobleaching over long imaging periods (and the availability of imaging systems during the project) would make it difficult to image at much higher rates than the 2-minute time frame we currently use. These limitations are true for many such systems, and it is rarely possible to rapidly image for long periods of time in real experiments. Given this upper limit on frame rate, we could in principle sample this data at a lower frame rate by removing time points from the videos, but this typically leads to worse results. In pilot analyses, we tried using fewer time points, and this always gave worse results: we found we need to feed the maximum amount of information available into the model to get the best results (i.e. the fastest frame rate possible, given the data available). Our goal is to teach the neural net to identify dynamic, space-time-localised events in time-lapse videos, in which the duration of an event is a key parameter. Our division events take 10 minutes or less to complete, so we used 5 timepoints in the videos for the deep learning model. For another system with dynamic events of duration T, we would use T/t timepoints, where t is the minimum time interval (for our data, t = 2 min). For example, if we could image every minute, we would use 10 timepoints. As discussed below, we envision that other users with different imaging setups and requirements may need to retrain the model for their own data, and to help with this, we have now provided more detailed instructions on how to do so (see later).
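      The T/t rule of thumb above can be written down directly (an illustrative helper, not part of the plugin's code):

```python
import math

def n_timepoints(event_duration_min: float, frame_interval_min: float) -> int:
    """Frames to feed the model: one per frame interval across the event
    duration (illustrative helper, not the authors' code)."""
    return math.ceil(event_duration_min / frame_interval_min)

# Divisions take ~10 min; imaging every 2 min -> 5 timepoints, as used here.
print(n_timepoints(10, 2))  # 5
# At a 1 min interval the same event would span 10 frames.
print(n_timepoints(10, 1))  # 10
```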

      In this regard, and because the authors claimed to provide clear instructions on how to reuse their method or adapt it to a different context, I delved deeper into the code and, to my surprise, felt that we are far from the coding practice of what a well-documented and accessible tool should be.

      To start with, one has to be relatively accustomed to Napari to understand how the plugin must be installed, as the only thing given is a pip install command (which could be typed in any terminal without installing the plugin for Napari, but has to be typed inside the Napari terminal, which is mentioned nowhere). Surprisingly, the plugin was not uploaded to the napari hub, nor to PyPI by the authors, so it is not directly searchable/findable; one has to go to the GitHub repository and install it manually. In that regard, no description was provided in the copy-pasted template files associated with the napari hub, so exporting it to the hub would actually leave it undocumented.

      We thank the referee for suggesting the example of DeXtrusion (Villars et al., 2023). We have endeavoured to produce similarly detailed documentation for our tools. We now provide clear installation instructions requiring only minimal coding knowledge, as well as a user manual for the napari plugin. This includes information on each of the options for using the model and the outputs they will produce. The plugin has been tested by several colleagues on both Windows and Mac operating systems.

      Author response image 1.

      Regarding now the Python notebooks, one can fairly say that the "clear instructions" that were supposed to enlighten the code are really minimal. Only one notebook, "trainingUNetCellDivision10.ipynb", actually has some comments; the others have (almost) none, nor titles to help the unskilled programmer delving into the script guess what it should do. I doubt that a biologist who does not have a strong computational background will manage to adapt the method to their own dataset (which seems to me unavoidable for the reasons mentioned above).

      Within the README file, we have now included information on how to retrain the models with helpful links to deep learning tutorials (which, indeed, some of us have learnt from) for those new to deep learning. All Jupyter notebooks now include more comments explaining the models.

      Finally, regarding the data: none is shared publicly along with this manuscript/code, so unless one already has a similar type of dataset - which must first be annotated in a similar manner - one cannot even test the networks/plugin. A common and necessary practice in the field - and possibly a longer-lasting contribution of this work - would have been to provide the complete, annotated dataset used to train and test the artificial neural networks. The basic reason is that a more performant or more generalisable deep-learning model may be developed very soon after this one, and for its performance to be fairly assessed, it must be compared on the same dataset. Benchmarking and comparison of methods' performance is at the core of computer vision and deep learning.

      We thank the referee for these comments. We have now uploaded all the data used to train and test the models, as well as all the data used in the analyses for the paper. This includes many videos that were not used for training but were analysed to generate the paper's results. The link to these data sets is provided on our GitHub page (https://github.com/turleyjm/cell-division-dl-plugin/tree/main). In the folder for the data sets and in the GitHub repository, we have included the Jupyter notebooks used to train the models, which can be reused for retraining. We have made our data publicly available in a Zenodo dataset at https://zenodo.org/records/10846684 (added to the last paragraph of the discussion). We have also included scripts that compare the model output with ground truth, highlighting false positives and false negatives; together with these scripts, models can be compared and contrasted, both in general and on individual videos. Overall, we very much appreciate the reviewer's advice, which has made the plugin much more user-friendly and, hopefully, easier for other groups to use to train their own models. Our contact details are provided, and we would be happy to advise any groups that would like to use our tools.
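      A script comparing detections with ground truth typically matches predicted division centres to annotated centres and derives precision, recall and F1 from the true/false positives and false negatives. A minimal sketch of this kind of comparison (the greedy matching rule and distance tolerance are illustrative assumptions, not the plugin's exact script):

```python
import math

def match_detections(pred, truth, max_dist=10.0):
    """Greedily match predicted division centres to ground-truth centres
    within max_dist pixels (illustrative rule). Returns (TP, FP, FN)."""
    unmatched = list(truth)
    tp = 0
    for p in pred:
        best = min(unmatched, key=lambda t: math.dist(p, t), default=None)
        if best is not None and math.dist(p, best) <= max_dist:
            unmatched.remove(best)
            tp += 1
    return tp, len(pred) - tp, len(unmatched)

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

tp, fp, fn = match_detections([(5, 5), (50, 50), (90, 90)], [(6, 4), (52, 51)])
print(tp, fp, fn)                    # 2 1 0: two hits, one false positive
print(round(f1_score(tp, fp, fn), 3))  # 0.8
```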


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors present a number of deep-learning models to analyse the dynamics of epithelia. In this way, they want to overcome the time-consuming manual analysis of such data and also remove a potential operator bias. Specifically, they set up models for identifying cell division events and cell division orientation. They apply these tools to the epithelium of the developing Drosophila pupal wing. They confirm a linear decrease of the division density with time and identify a burst of cell division after the healing of a wound that they had induced earlier. These division events happen a characteristic time after and a characteristic distance away from the wound. These characteristic quantities depend on the size of the wound.

      Strength:

      The methods developed in this work achieve the goals set by the authors and are a very helpful addition to the toolbox of developmental biologists. They could potentially be used on various developing epithelia. The evidence for the impact of wounds on cell division is solid.

      Weakness:

      Some aspects of the deep-learning models remained unclear, and the authors might want to think about adding details. First of all, for readers not familiar with deep-learning models, I would like to see more information about ResNet and U-Net, which form the basis of the new deep-learning models developed here. What is the structure of these networks?

      We agree with the Reviewer and have included additional information on page 8 of the manuscript, outlining some background information about the architecture of ResNet and U-Net models.

      How many parameters do you use?

      We apologise for this omission and have now included the number of parameters and layers in each model in the methods section on page 25.

      What is the difference between validating and testing the model? Do the corresponding data sets differ fundamentally?

      The difference between 'validating' and 'testing' the model is that the validation data are used during training to determine whether the model is overfitting: if the model performs well on the training data but not on the validation data, this is a key signal of overfitting, and changes will need to be made to the network or training method to prevent it. The test data are used after all training has been completed to assess the performance of the model on fresh data it has not been trained on. We have removed reference to the validation data in the main text to simplify it and have added this explanation to the Methods. There is no fundamental (or experimental) difference between the labelled data sets; rather, they are collected from different biological samples. We have now included this information in the Methods text on page 24.
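      The overfitting signal described here can be expressed as a generic check on the two loss curves (a schematic sketch with hypothetical loss values, not our training code):

```python
def is_overfitting(train_losses, val_losses, patience=3):
    """Flag overfitting: training loss still falling while validation loss
    has risen for `patience` consecutive epochs (generic check, not the
    authors' training code)."""
    if len(val_losses) <= patience:
        return False
    recent = val_losses[-(patience + 1):]
    val_rising = all(b > a for a, b in zip(recent, recent[1:]))
    train_falling = train_losses[-1] < train_losses[-(patience + 1)]
    return val_rising and train_falling

train = [1.0, 0.7, 0.5, 0.4, 0.3, 0.25]
val   = [1.1, 0.8, 0.6, 0.65, 0.7, 0.75]  # validation loss starts rising
print(is_overfitting(train, val))  # True
```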

      How did you assess the quality of the training data classification?

      These data were generated and hand-labelled by an expert with many years of experience in identifying cell divisions in imaging data, to give the ground truth for the deep learning model.

      Reviewer #1 (Recommendations For The Authors):

      You repeatedly use 'new', 'novel' as well as 'surprising' and 'unexpected'. The latter are rather subjective and it is not clear based on what prior knowledge you make these statements. Unless indicated otherwise, it is understood that the results and methods are new, so you can delete these terms.

      We have deleted these words, as suggested, for almost all cases.

      p.4 "as expected" add a reference or explain why it is expected.

      A reference has now been included in this section, as suggested.

      p.4 "cell divisions decrease linearly with time" Only later (p.10) it turns out that you think about the density of cell divisions.

      This has been changed to "cell division density decreases linearly with time".

      p.5 "imagine is largely in one plane" while below "we generated a 3D z-stack" and above "our in vivo 3D image data" (p.4). Although these statements are not strictly contradictory, I still find them confusing. Eventually, you analyse a 2D image, so I would suggest that you refer to your in vivo data as being 2D.

      We apologise for the confusion here; the imaging data was initially generated using 3D z-stacks but this 3D data is later converted to a 2D focused image, on which the deep learning analysis is performed. We are now more careful with the language in the text.

      p.7 "We have overcome (...) the standard U-Net model" This paragraph remains rather cryptic to me. Maybe you can explain in two sentences what a U-Net is or state its main characteristics. Is it important to state which class you have used at this point? Similarly, what is the exact role of the ResNet model? What are its characteristics?

      We have included more details on both the ResNet and U-Net models and how our model incorporates properties from them on Page 8.

      p.8 Table 1 Where do I find it? Similarly, I could not find Table 2.

      These were originally located in the supplemental information document, but have been moved to the main manuscript.

      p.9 "developing tissue in normal homeostatic conditions" Aren't homeostatic and developing contradictory? In one case you maintain a state, in the other, it changes.

      We agree with the Reviewer and have removed the word ‘homeostatic’.

      p.9 "Develop additional models" I think 'models' refers to deep learning models, not to physical models of epithelial tissue development. Maybe you can clarify this?

      Yes, this is correct; we have phrased this better in the text.

      p.12 "median error" median difference to the manually acquired data?

      Yes, and we have made this clearer in the text, too.

      p.12 "we expected to observe a bias of division orientation along this axis" Can you justify the expectation? Elongated cells are not necessarily aligned with the direction of a uniaxially applied stress.

      Although this is not always the case, we have now included additional references to previous work from other groups which demonstrated that wing epithelial cells do become elongated along the P/D axis in response to tension.

      p.14 "a rather random orientation" Please, quantify.

      The division orientations are quantified in Fig. 4F,G; we have now changed our description from ‘random’ to ‘unbiased’.

      p.17 "The theories that must be developed will be statistical mechanical (stochastic) in nature" I do not understand. Statistical mechanics refers to systems at thermodynamic equilibrium, stochastic to processes that depend on, well, stochastic input.

      We have clarified that we are referring to non-equilibrium statistical mechanics (the study of macroscopic systems far from equilibrium, a rich field of research with many open problems and applications in biology).

      Reviewer #2 (Public Review):

      In this manuscript, the authors propose a computational method based on deep convolutional neural networks (CNNs) to automatically detect cell divisions in two-dimensional fluorescence microscopy timelapse images. Three deep learning models are proposed to detect the timing of division, predict the division axis, and enhance cell boundary images to segment cells before and after division. Using this computational pipeline, the authors analyze the dynamics of cell divisions in the epithelium of the Drosophila pupal wing and find that a wound first induces a reduction in the frequency of division followed by a synchronised burst of cell divisions about 100 minutes after its induction.

      In general, novelty over previous work does not seem particularly important. From a methodological point of view, the models are based on generic architectures of convolutional neural networks, with minimal changes, and on ideas already explored in general. The authors seem to have missed much (most?) of the literature on the specific topic of detecting mitotic events in 2D timelapse images, which has been published in more specialized journals or Proceedings. (TPMAI, CCVPR etc., see references below). Even though the image modality or biological structure may be different (non-fluorescent images sometimes), I don't believe it makes a big difference. How the authors' approach compares to this previously published work is not discussed, which prevents me from objectively assessing the true contribution of this article from a methodological perspective.

      On the contrary, some competing works have proposed methods based on newer - and generally more efficient - architectures specifically designed to model temporal sequences (Phan 2018, Kitrungrotsakul 2019, 2021, Mao 2019, Shi 2020). These natural candidates (recurrent networks, long-short-term memory (LSTM) gated recurrent units (GRU), or even more recently transformers), coupled to CNNs are not even mentioned in the manuscript, although they have proved their generic superiority for inference tasks involving time series (Major point 2). Even though the original idea/trick of exploiting the different channels of RGB images to address the temporal aspect might seem smart in the first place - as it reduces the task of changing/testing a new architecture to a minimum - I guess that CNNs trained this way may not generalize very well to videos where the temporal resolution is changed slightly (Major point 1). This could be quite problematic as each new dataset acquired with a different temporal resolution or temperature may require manual relabeling and retraining of the network. In this perspective, recent alternatives (Phan 2018, Gilad 2019) have proposed unsupervised approaches, which could largely reduce the need for manual labeling of datasets.

      We thank the reviewer for their constructive comments. Our goal is to develop a cell detection method that has a very high accuracy, which is critical for practical and effective application to biological problems. The algorithms need to be robust enough to cope with the difficult experimental systems we are interested in studying, which involve densely packed epithelial cells within in vivo tissues that are continuously developing, as well as repairing. In response to the above comments of the reviewer, we apologise for not including these important papers from the division detection and deep learning literature, which are now discussed in the Introduction (on page 4).

      A key novelty of our approach is the use of multiple fluorescent channels to increase information for the model. As the referee points out, our method benefits from using and adapting existing highly effective architectures. Hence, we have been able to incorporate deeper models than some others have previously used. An additional novelty is using this same model architecture (retrained) to detect cell division orientation. For future practical use by us and other biologists, the models can easily be adapted and retrained to suit experimental conditions, including different multiple fluorescent channels or number of time points. Unsupervised approaches are very appealing due to the potential time saved compared to manual hand labelling of data. However, the accuracy of unsupervised models are currently much lower than that of supervised (as shown in Phan 2018) and most importantly well below the levels needed for practical use analysing inherently variable (and challenging) in vivo experimental data.

      Regarding the other convolutional neural networks described in the manuscript:

      (1) The one proposed to predict the orientation of mitosis performs a regression task, predicting a probability for the division angle. The architecture, which must be different from a simple Unet, is not detailed anywhere, so the way it was designed is difficult to assess. It is unclear if it also performs mitosis detection, or if it is instead used to infer orientation once the timing and location of the division have been inferred by the previous network.

      The neural network used for U-NetOrientation has the same architecture as U-NetCellDivision10 but has been retrained to complete a different task: finding division orientation. Our workflow is as follows: firstly, U-NetCellDivision10 is used to find cell divisions; secondly, U-NetOrientation is applied locally to determine the division orientation. These points have now been clarified in the main text on Page 14.

      (2) The one proposed to improve the quality of cell boundary images before segmentation is nothing new, it has now become a classic step in segmentation, see for example Wolny et al. eLife 2020.

      We have cited similar segmentation models in our paper and thank the referee for this additional one. We had made an improvement to the segmentation models, using GFP-tagged E-cadherin, a protein localised in a thin layer at the apical boundary of cells. So, while this is primarily a 2D segmentation problem, some additional information is available in the z-axis as the protein is visible in 2-3 separate z-slices. Hence, we supplied this 3-focal plane input to take advantage of the 3D nature of this signal. This approach has been made more explicit in the text (Pages 14, 15) and Figure (Fig. 2D).

      As a side note, I found it a bit frustrating to realise that all the analysis was done in 2D while the original images are 3D z-stacks, so a lot of the 3D information had to be compressed and has not been used. A novelty, in my opinion, could have resided in the generalisation to 3D of the deep-learning approaches previously proposed in that context, which are exclusively 2D, in particular, to predict the orientation of the division.

      Our experimental system is a relatively flat 2D tissue with the orientation of the cell divisions consistently in the xy-plane. Hence, a 2D analysis is most appropriate for this system. With the successful application of the 2D methods already achieving high accuracy, we envision that extension to 3D would only offer a slight increase in effectiveness as these measurements have little room for improvement. Therefore, we did not extend the method to 3D here. However, of course, this is the next natural step in our research as 3D models would be essential for studying 3D tissues; such 3D models will be computationally more expensive to analyse and more challenging to hand label.

      Concerning the biological application of the proposed methods, I found the results interesting, showing the potential of such a method to automatise mitosis quantification for a particular biological question of interest, here wound healing. However, the deep learning methods/applications that are put forward as the central point of the manuscript are not particularly original.

      We thank the referee for their constructive comments. Our aim was not only to show the accuracy of our models but also to show how they might be useful to biologists for automated analysis of large datasets, which is a—if not the—bottleneck for many imaging experiments. The ability to process large datasets will improve robustness of results, as well as allow additional hypotheses to be tested. Our study also demonstrated that these models can cope with real in vivo experiments where additional complications such as progressive development, tissue wounding and inflammation must be accounted for.

      Major point 1: generalisation potential of the proposed method.

      The neural network model proposed for mitosis detection relies on a 2D convolutional neural network (CNN), more specifically on the Unet architecture, which has become widespread for the analysis of biology and medical images. The strategy proposed here exploits the fact that the input of such an architecture is natively composed of several channels (originally 3 to handle the 3 RGB channels, which is actually a holdover from computer vision, since most medical/biological images are gray images with a single channel), to directly feed the network with 3 successive images of a timelapse at a time. This idea is, in itself, interesting because no modification of the original architecture had to be carried out. The latest 10-channel model (U-NetCellDivision10), which includes more channels for better performance, required minimal modification to the original U-Net architecture but also simultaneous imaging of cadherin in addition to histone markers, which may not be a generic solution.

      We believe we have provided a general approach for practical use by biologists that can be applied to a range of experimental data, whether based on varying numbers of fluorescent channels and/or timepoints. We envision that experimental biologists will have different parameters available depending on their specific experimental conditions, e.g. different fluorescently labelled proteins (such as tubulin) and/or time frames. To accommodate this, we have made it clear in the code on GitHub how these changes can easily be made. While the model may need some alterations and retraining, the method itself is a generic solution, as the same principles apply to widely used fluorescent imaging techniques.
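      The underlying idea of packing successive timepoints (and multiple fluorescence channels) into the channel dimension of an ordinary 2D network can be sketched with NumPy; this is an illustrative sketch of the principle, and the plugin's actual preprocessing may differ:

```python
import numpy as np

def stack_input(video, t, n_before=2, n_after=2):
    """Build a multi-channel 2D input for a candidate division at frame t.

    video: (T, C, H, W) array with C fluorescence channels per timepoint
    (e.g. histone and E-cadherin). Each (timepoint, channel) pair becomes
    one input channel of the 2D U-Net, giving shape (n_frames*C, H, W).
    """
    frames = video[t - n_before : t + n_after + 1]  # (n_frames, C, H, W)
    return frames.reshape(-1, *video.shape[2:])     # fold time into channels

video = np.zeros((20, 2, 64, 64))  # 20 timepoints, 2 fluorescence channels
x = stack_input(video, t=10)
print(x.shape)  # (10, 64, 64): 5 frames x 2 channels, a 10-channel input
```

With 5 timepoints and 2 fluorescence channels this yields the 10-channel input of U-NetCellDivision10, without any change to the 2D convolutional architecture itself.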

      Since CNN-based methods accept only fixed-size vectors (fixed image size and fixed channel number) as input (and output), the length or time resolution of the extracted sequences should not vary from one experiment to another. As such, the method proposed here may lack generalization capabilities, as it would have to be retrained for each experiment with a slightly different temporal resolution. The paper should have compared results with slightly different temporal resolutions to assess its inference robustness toward fluctuations in division speed.

      If multiple temporal resolutions are required for a set of experiments, we envision that the model could be trained over the range of these temporal resolutions. Of course, the temporal resolution that requires the largest vector would be chosen as the model's fixed number of input channels. Given the depth of the models used, and the potential to easily increase this by replacing resnet34 with resnet50 or resnet101, the model would likely be able to cope with this, although we have not specifically tested it. (page 27)
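
      One simple way to realise this (a hypothetical sketch, not something we benchmarked) is to pad shorter sequences up to the fixed channel count chosen for the model:

```python
import numpy as np

def to_fixed_channels(frames, n_channels):
    """Pad (or truncate) a timelapse to the fixed channel count a CNN expects.

    frames: (T, H, W) array, with T <= n_channels in the typical case.
    Shorter sequences are padded by repeating the last frame, so one model
    trained at the largest temporal resolution can ingest them all.
    """
    T, H, W = frames.shape
    if T >= n_channels:
        return frames[:n_channels]
    pad = np.repeat(frames[-1:], n_channels - T, axis=0)
    return np.concatenate([frames, pad], axis=0)

short = np.zeros((3, 4, 4))   # a sequence imaged at coarser time resolution
fixed = to_fixed_channels(short, 5)
print(fixed.shape)            # (5, 4, 4)
```

      Whether last-frame repetition, zero-padding, or temporal interpolation works best would need to be tested empirically.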

      Another approach (not discussed) consists in directly convolving several temporal frames using a 3D CNN (2D+time) instead of a 2D, in order to detect a temporal event. Such an idea shares some similarities with the proposed approach, although in this previous work (Ji et al. TPAMI 2012 and for split detection Nie et al. CCVPR 2016) convolution is performed spatio-temporally, which may present advantages. How does the authors' method compare to such an (also very simple) approach?

      We thank the Reviewer for this insightful comment. The text now discusses this (on Pages 8 and 17). Key differences between the models include our incorporation of multiple light channels and the use of much deeper models. We suggest that our method allows for an easy and natural extension to use deeper models for even more demanding tasks, e.g., distinguishing between healthy and defective divisions. We also tested our method under 'difficult conditions', such as when a wound is present; despite the challenges imposed by the wound (including the discussed reduction in fluorescent intensities near the wound edge), we achieved higher accuracy than Nie et al., who reported 78.5% accuracy on a low-density in vitro system, compared to our F1 score of 0.964.
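
      For comparison, a 3D (2D+time) convolution of the kind used by Ji et al. and Nie et al. also slides the kernel along the time axis, so a temporal dimension survives into deeper layers, unlike the channel-stacking approach, which collapses time in the first layer. A minimal numpy sketch of the difference (illustrative only):

```python
import numpy as np

def conv3d_valid(volume, kernel):
    """Valid 3D (2D+time) convolution: the kernel slides along time as well,
    so temporal structure is preserved for deeper layers to process."""
    T, H, W = volume.shape
    kT, kH, kW = kernel.shape
    out = np.zeros((T - kT + 1, H - kH + 1, W - kW + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(
                    volume[t:t + kT, i:i + kH, j:j + kW] * kernel)
    return out

clip = np.random.rand(5, 8, 8)              # 5 timepoints of an 8x8 field
out = conv3d_valid(clip, np.ones((3, 3, 3)))
print(out.shape)                            # (3, 6, 6): a time axis remains
```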

      Major point 2: innovatory nature of the proposed method.

      The authors' idea of exploiting existing channels in the input vector to feed successive frames is interesting, but the natural choice in deep learning for manipulating time series is to use recurrent networks or their newer and more stable variants (LSTM, GRU, attention networks, or transformers). Several papers exploiting such approaches have been proposed for the mitotic division detection task, but they are not mentioned or discussed in this manuscript: Phan et al. 2018, Mao et al. 2019, Kitrungrotsakul et al. 2019, Shi et al. 2020.

      An obvious advantage of an LSTM architecture combined with CNN is that it is able to address variable length inputs, therefore time sequences of different lengths, whereas a CNN alone can only be fed with an input of fixed size.

      LSTM architectures may produce accuracy similar to that of the models we employ in our study; however, given the high degree of accuracy we already achieve with our methods, it is hard to see how they would improve the understanding of the biology of wound healing that we have uncovered. Hence, they may provide an alternative way to achieve similar results from analyses of our data. It would also be interesting to see how LSTM architectures would cope with the noisy and difficult wounded data that we have analysed. We agree with the referee that these alternative models could allow easier handling of temporal differences in division time (see discussion on Page 20). Nevertheless, we imagine that after selecting a sufficiently large time/fluorescent-channel input, biologists could likely train our model to cope with a range of division lengths.
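
      For readers unfamiliar with why recurrent architectures handle variable lengths naturally, a toy sketch (purely illustrative; the weights and sizes are made up) shows the same cell applied step by step, yielding a fixed-size summary regardless of sequence length:

```python
import numpy as np

def rnn_over_frames(frame_features, W_h, W_x):
    """Toy recurrent pass over per-frame feature vectors.

    Because the same cell is applied at every step, sequences of ANY length
    are summarised into one fixed-size hidden state -- the flexibility an
    LSTM/GRU head would buy, at the cost of a more involved training setup.
    """
    h = np.zeros(W_h.shape[0])
    for x in frame_features:           # one recurrence step per timepoint
        h = np.tanh(W_h @ h + W_x @ x)
    return h

rng = np.random.default_rng(0)
W_h, W_x = rng.normal(size=(4, 4)), rng.normal(size=(4, 6))
short = [rng.normal(size=6) for _ in range(3)]   # a 3-frame division
long = [rng.normal(size=6) for _ in range(7)]    # a 7-frame division
print(rnn_over_frames(short, W_h, W_x).shape,    # (4,)
      rnn_over_frames(long, W_h, W_x).shape)     # (4,) -- same-size output
```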

      Another advantage of some of these approaches is that they rely on unsupervised learning, which can avoid the tedious relabeling of data (Phan et al. 2018, Gilad et al. 2019).

      While these are very interesting ideas, we believe these unsupervised methods would struggle under the challenging conditions within our and others' experimental imaging data. The epithelial tissue examined in the present study possesses a particularly high density of cells with overlapping nuclei compared to the other experimental systems these unsupervised methods have been tested on. Another potential problem with these unsupervised methods is the difficulty in distinguishing dynamic debris and immune cells from mitotic cells. Once again, despite our experimental data being more complex and difficult, our methods perform better than methods designed for simpler systems, such as those in Phan et al. 2018 and Gilad et al. 2019; for example, in analyses performed on lower-density in vitro and unwounded tissues, the best F1 scores for a single video were 0.768 and 0.829 for unsupervised and supervised methods, respectively (Phan et al. 2018). We envision that having an F1 score above 0.9 (and preferably above 0.95) would be crucial for practical use by biologists, hence we believe supervision is currently still required. We expect that retraining our models for use in other experimental contexts will require smaller hand-labelled datasets, as they will be able to take advantage of transfer learning (see discussion on Page 4).
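
      For reference, the F1 scores quoted here are the harmonic mean of precision and recall; a short sketch (the detection counts below are hypothetical, chosen only to illustrate the arithmetic):

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from counts of
    true-positive, false-positive and false-negative detections."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g. 90 correctly detected divisions, 5 spurious, 10 missed:
print(round(f1_score(90, 5, 10), 3))   # 0.923
```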

      References:

      We have included these additional references in the revised version of our Manuscript.

      Ji, S., Xu, W., Yang, M., & Yu, K. (2012). 3D convolutional neural networks for human action recognition. IEEE transactions on pattern analysis and machine intelligence, 35(1), 221-231. >6000 citations

      Nie, W. Z., Li, W. H., Liu, A. A., Hao, T., & Su, Y. T. (2016). 3D convolutional networks-based mitotic event detection in time-lapse phase contrast microscopy image sequences of stem cell populations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 55-62).

      Phan, H. T. H., Kumar, A., Feng, D., Fulham, M., & Kim, J. (2018). Unsupervised two-path neural network for cell event detection and classification using spatiotemporal patterns. IEEE Transactions on Medical Imaging, 38(6), 1477-1487.

      Gilad, T., Reyes, J., Chen, J. Y., Lahav, G., & Riklin Raviv, T. (2019). Fully unsupervised symmetry-based mitosis detection in time-lapse cell microscopy. Bioinformatics, 35(15), 2644-2653.

      Mao, Y., Han, L., & Yin, Z. (2019). Cell mitosis event analysis in phase contrast microscopy images using deep learning. Medical image analysis, 57, 32-43.

      Kitrungrotsakul, T., Han, X. H., Iwamoto, Y., Takemoto, S., Yokota, H., Ipponjima, S., ... & Chen, Y. W. (2019). A cascade of 2.5 D CNN and bidirectional CLSTM network for mitotic cell detection in 4D microscopy image. IEEE/ACM transactions on computational biology and bioinformatics, 18(2), 396-404.

      Shi, J., Xin, Y., Xu, B., Lu, M., & Cong, J. (2020, November). A Deep Framework for Cell Mitosis Detection in Microscopy Images. In 2020 16th International Conference on Computational Intelligence and Security (CIS) (pp. 100-103). IEEE.

      Wolny, A., Cerrone, L., Vijayan, A., Tofanelli, R., Barro, A. V., Louveaux, M., ... & Kreshuk, A. (2020). Accurate and versatile 3D segmentation of plant tissues at cellular resolution. Elife, 9, e57613.

    1. Author Response

      Reviewer #1 (Public Review):

      High resolution mechanistic studies would be instrumental in driving the development of Cas7-11 based biotechnology applications. This work is unfortunately overshadowed by a recent Cell publication (PMID: 35643083) describing the same Cas7-11 RNA-protein complex. However, given the tremendous interest in these systems, it is my opinion that this independent study will still be well cited, if presented well. The authors obviously have been trying to establish a unique angle for their story, by probing deeper into the mechanism of crRNA processing and target RNA cleavage. The study is carried out rigorously. The current version of the manuscript appears to have been rushed out. It would benefit from clarification and text polishing.

      We thank the reviewer for the positive and helpful comments that have made the manuscript more impactful.

      To summarize the revisions, we have resolved the metal-dependence issue, updated the maps in both main and supplementary figures that support the model, re-organized the labels for clarity, and added the comparison between our and Kato et al.’s structures.

      In addition, we describe a new result with an isolated C7L.1 fragment that retains the processing and crRNA binding activities.

      Reviewer #2 (Public Review):

      In this manuscript, Gowswami et al. solved a cryo-EM structure of Desulfonema ishimotonii Cas7-11 (DiCas7-11) bound to a guiding CRISPR RNA (crRNA) and target RNA. Cas7-11 is of interest due to its unusual architecture as a single polypeptide, in contrast to other type III CRISPR-Cas effectors that are composed of several different protein subunits. The authors have obtained a high-quality cryo-EM map at 2.82 angstrom resolution, allowing them to build a structural model for the protein, crRNA and target RNA. The authors used the structure to clearly identify a catalytic histidine residue in the Cas7-11 Cas7.1 domain that is important for crRNA processing activity. The authors also investigated the effects of metal ions and crRNA-target base pairing on target RNA cleavage. Finally, the authors used their structure to guide engineering of a compact version of Cas7-11 in which an insertion domain that is disordered in the cryo-EM map was removed. This compact Cas7-11 appears to have comparable cleavage activity to the full-length protein.

      The cryo-EM map presented in this manuscript is generally of high quality and the manuscript is very well illustrated. However, some of the map interpretation requires clarification (outlined below). This structure will be valuable as there is significant interest in DiCas7-11 for biotechnology. Indeed, the authors have begun to engineer the protein based on observations from the structure. Although characterization of this engineered Cas7-11 is limited in this study and similar engineering was also performed in a recently published paper (PMID 35643083), this proof-of-principle experiment demonstrates the importance of having such structural information.

      The biochemistry experiments presented in the study identify an important residue for crRNA processing, and suggest that target RNA cleavage is not fully metal-ion dependent. Most of these conclusions are based on straightforward structure-function experiments. However, some results related to target RNA cleavage are difficult to interpret as presented. Overall, while the cryo-EM data presented in this work is of high quality, both the structural model and the biochemical results require further clarification as outlined below.

      We thank the reviewer for the positive and helpful comments that have made the manuscript more impactful.

      To summarize the revisions, we have resolved the metal-dependence issue, updated the maps in both main and supplementary figures that support the model, re-organized the labels for clarity, and added the comparison between our and Kato et al.’s structures.

      In addition, we describe a new result with an isolated C7L.1 fragment that retains the processing and crRNA binding activities.

      1. The DiCas7-11 structure bound to target RNA was also recently reported by Kato et al. (PMID 35643083). The authors have not cited this work or compared the two structures. While the structures are likely quite similar, it is notable that the structure reported in the current paper is for the wild-type protein and the sample was prepared under reactive conditions, resulting in a partially cleaved target. Kato et al. used a catalytically dead version of Cas7-11 in which the target RNA should remain fully intact. Are there differences in the Cas7-11 structure observed in the presence of a partially cleaved target RNA in comparison to the Kato et al. structure? Such a comparison is appropriate given the similarities between the two reports. A figure comparing the two structures could be included in the manuscript.

      We have added a paragraph on page 12 that describes the differences in preparation of the two complexes and their structures. We observed minor differences in the overall protein structure (r.m.s.d. 0.918 Å for 8114 atoms) but did observe quite different interactions between the protein and the first 5’-tag nucleotide (U(-15) vs. G(-15)) due to the different pre-crRNA constructs, which suggests an importance of U(-15) in forming the processing-competent active site. We added Figure 2-figure supplement 3, which illustrates the similarities and the differences.

      2. The cryo-EM density map is of high quality, but some of the structural model is not fully supported by the experimental data (e.g. protein loops from the alphafold model were not removed despite lack of cryo-EM density). Most importantly, there is little density for the target RNA beyond the site 1 cleavage site, suggesting that the RNA was cleaved and the product was released. However, this region of the RNA was included in the structural model. It is unclear what density this region of the target RNA model was based on. Further discussion of the interpretation of the partially cleaved target RNA is necessary. Were 3D classes observed in various states of RNA cleavage and with varied density for the product RNAs?

      We should have made it clear in the Methods that multiple maps were used in building the structure, but only the post-processed map was submitted to reviewers. When using the map generated by Relion 4.0’s local resolution estimation, we observed sufficient density for some of the regions the reviewer is referring to. For instance, the site 1 cleavage density does support the model for the two nucleotides beyond the site 1 cleavage site (see the revised Figure 1 & Figure 1-figure supplement 3).

      However, there are protein loops that still lack convincing density. These include residues 134-141 and 1316-1329, which have now been removed from the final coordinates.

      The “partially cleaved target RNA” phrase is a result of weak density for nucleotides downstream of site 1 (+2 and +3) but clear density flanking site 2. This feature indicates that cleavage likely had taken place at site 1 but not site 2 in most of the particles that went into the reconstruction. To further clarify this phrase, we added “The PFS region plus the first base paired nucleotide (+1*) are not observed.” on page 4 and better indicate which nucleotides are or are not built in our model in Figure 1.

      3. The authors argue that site 1 cleavage of target RNA is independent of metal ions. This is a potentially interesting result, but it is difficult to determine whether it is supported by the evidence provided in the manuscript. The Methods section only describes a buffer containing 10 mM MgCl2, but does not describe conditions containing EDTA. How much EDTA was added and was MgCl2 omitted from these samples? In addition, it is unclear whether the site 1 product is visible in Figures 2d and 3d. To my eye, the products that are present in the EDTA conditions on these gels migrated slightly slower than the typical site 1 product. This may suggest an alternate cleavage site or chemistry (e.g. cyclic phosphate is maintained following cleavage). Further experimental details and potentially additional experiments are required to fully support the conclusion that site 1 cleavage may be metal independent.

      As we pointed out in response to Reviewer 1’s #8 comment, this conclusion may have been a result of using an older batch of DiCas7-11 that contains degraded fragments.

      As shown in the attached figure below, “batch Y” was an older prep from our in-house clone and “batch X” is a newer prep from the Addgene purchased clone (gel on right), and they consistently produce metal-independent (batch Y) or metal-dependent (batch X) cleavage (gel on left). It is possible that the degraded fragments in batch Y carry a metal-independent cleavage activity that is absent in the purer batch X.

      We further performed mass spectrometry analysis of two of the degraded fragments from batch Y (indicated by arrows below) and discovered that these are indeed part of DiCas7-11. We, however, cannot rationalize, without more experimental evidence, why these fragments might have generated metal-independent cleavage at site 1. Therefore, we simply updated all our cleavage results from the new and cleaner prep (batch X) (For instance, Figure 3c). As a result, all references to “metal-independence” were removed.

      With regard to the nature of the cleaved products, we found that both sites could be inhibited by specific 2’-deoxy modifications, consistent with the previous observation that Type III systems generate a 2’, 3’-cyclic product regardless of metal dependence (for instance, see Hale, C. R., Zhao, P., Olson, S., Duff, M. O., Graveley, B. R., Wells, L., ... & Terns, M. P. (2009). RNA-guided RNA cleavage by a CRISPR RNA-Cas protein complex. Cell, 139(5), 945-956.)

      We added this rationale based on the new results and believe that these characterizations are now thorough and conclusive.

      4. The authors performed an experiment investigating the importance of crRNA-target base pairing on cleavage activity (Figure 3e). However, negative controls for the RNA targets in the absence of crRNA and Cas7-11 were not included in this experiment, making it impossible to determine which bands on the gel correspond to substrates and which correspond to products. This result is therefore not interpretable by the reader and does not support the conclusions drawn by the authors.

      Our original gel image (below) does contain these controls but we did not include them for the figure due to space considerations (we should have included it as a supplementary figure). We have now completely updated Figure 3e with much better quality and controls. Both the older and the updated experiments show the same results.

      Original gel for Figure 3e containing controls.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript describes a series of behavioral experiments in which foraging rats are subjected to a novel fear conditioning paradigm. Different groups of animals receive a shock to the dorsal surface of the body paired with either tone, an artificial owl driven forward with pneumatic pressure, or a tone/owl combination. An additional control condition pairs tone with owl alone (i.e., no shock is delivered). In a subsequent test, only owl+shock and tone/owl+shock animals show increased latency to forage and a withdrawal response to tone (even though owl-shock rats do not experience tone during conditioning). The authors conclude that this tone response is due to sensitization and that fear conditioning does not occur in their experimental setup.

      This approach is intriguing and the issues raised by the manuscript are extremely important for the field to consider. However, there are many ways to interpret the results as they stand. One issue of primary importance is whether it can indeed be claimed that conditioning did not readily occur in the tone+shock group. The lack of a particular behavioral conditioned reaction does not equate to an absence of conditioning. It is possible that unseen (i.e. physiological) measures of conditioning, many of which were once standard DVs in the fear conditioning literature, are present in the tone+shock group. This possibility pushes against the claim made in the title and elsewhere. These claims should be softened.

      We agree with the reviewer and now acknowledge the following caveat in the discussion (pg. 10): “…although neither the tone-shock group nor the tone-owl group showed overt manifestations of fear conditioning (as measured by fleeing or freezing) to the tone that prevented a successful procurement of food, the possibility of physiological (e.g., cardiovascular, respiratory) changes associated with tone-induced fear (Steimer, 2002) cannot be excluded in these animals…”

      Because systemic, group-level retreat CRs are not noted in the tone+shock condition, it would indeed be important to establish if there are any experimental circumstances in which tone paired with a US applied to the dorsal surface of the body can produce consistent reactions (e.g. freezing) to tone alone. Though it may seem likely that tone + dorsal shock would indeed produce freezing in a different setting, this result should not be taken for granted - we've known since the 'noisy water' experiment (Garcia & Koelling, 1966) that not every CS pairs with every US and that association can indeed be selective. A positive control would be clarifying. If the authors could demonstrate that tone+dorsal shock produces freezing to tone in a commonly used fear conditioning setup (ie standard cubicle chamber) then the lack of a retreat CR in their naturalistic paradigm would gain added meaning.

      This is an excellent suggestion. As recommended, we performed a positive control experiment where naïve rats that underwent the same subcutaneous wire implant surgery were placed in a standard experimental chamber and presented with a delayed tone-shock pairing (same tone frequency/intensity and shock intensity/duration; the 24.1 s CS duration was based on the mean CS duration of tone-shock animals in the naturalistic fear conditioning experiment). As can be seen in Author response image 1 (Figure 4 in the revised manuscript) below, these animals exhibited reliable postshock freezing in a conditioning chamber (fear conditioning day 1) and tone CS-evoked freezing in a novel chamber (tone testing day 2), indicating that our original finding (i.e., no evidence of auditory and contextual fear conditioning in an ecologically-relevant environment) is unlikely to be due to a dorsal neck/body shock US per se.

      Author response image 1. Auditory fear conditioning in a standard experimental chamber. (A) Illustrations of a rat implanted with wires subcutaneously in the dorsal neck/body region undergoing successive days of habituation (10 min tethered, conditioning chamber), training (a single tone CS-shock US pairing), and tone testing (context shift). (B) Mean (crimson line) and individual (gray lines) percent freezing data from 8 rats (4 females, 4 males) during training in context A: 3 min baseline (BL1, BL2, BL3); 23.1 s epoch of tone (T) excluding 1 s overlap with shock (S); 1 min postshock (PS). (C) Mean and individual percent freezing data during tone testing in context B: 1 min baseline (BL1); 3 min tone (T1, T2, T3); 1 min post-tone (PT). (D) Mean + SEM (bar) and individual (dots) percent freezing to tone CS before (Train, T) and after (Test, T1) undergoing auditory fear conditioning (paired t-test; t(7) = -3.163, p = 0.016). * p < 0.05
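
      For transparency about the statistic reported in the legend, the following is a minimal sketch of how a paired t value is computed from within-subject differences (the freezing percentages below are made up for illustration; they are not our data):

```python
import math

def paired_t(before, after):
    """Paired t statistic and degrees of freedom for two matched samples.

    t = mean(d) / sqrt(var(d) / n), with d the per-subject differences
    and var the (n - 1)-denominator sample variance.
    """
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)   # sample variance
    return mean / math.sqrt(var / n), n - 1           # (t, df)

# Hypothetical percent-freezing scores for 8 rats, training vs. tone test:
train = [5, 8, 2, 10, 4, 6, 3, 7]
test = [40, 55, 30, 60, 35, 50, 25, 45]
t, df = paired_t(train, test)
print(round(t, 2), df)
```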

      The altered withdrawal trajectory seen in owl+shock and tone/owl+shock groups occurs in neither the tone+shock nor the tone+owl group, introducing the possibility that it results from the specific pairing of owl and shock. Put differently - this response may indeed be an associative CR. Do altered withdrawal angles persist if animals that receive owl+shock are exposed to owl again the next day? Do manipulations of the owl and shock that diminish fear conditioning (e.g. unpairing of owl and shock stimuli) eliminate deflected withdrawal angles when the subject is exposed to owl alone? If so, it would cut against the interpretation that fear conditioning does not occur in the setup described here, and would instead demonstrate that it is indeed central to predatory defense. This interpretation is compatible with the effect of hippocampal lesion on freezing evoked by a live predator. Destruction of the rat hippocampus diminishes cat-evoked freezing - this is thought to occur because the rapid association of the cat's various features with threatening action is not formed by the rat (Fanselow, 2000, 2018). Even though this interpretation of the results differs from the authors', it in no way diminishes the interest of this work. This paradigm may indeed be a novel means by which to study rapidly acquired associations with ethological relevance. Follow-up experiments of the type described above are necessary to disambiguate opposing views of the current dataset.

      Whether “altered withdrawal angles persist if animals that receive owl+shock [a US-US pairing] are exposed to owl again the next day” is an interesting question, as it is conceivable that the owl US (Zambetti et al., 2019, iScience) can function as a CS to evoke anticipatory responses characteristic of conditioned fear. This possibility is now mentioned as a caveat (pg. 10): “…the erratic escape trajectory behavior exhibited by owl-shock and tone/owl-shock animals may be indicative of rapid associative processes at work (Fanselow 2018). For example, the immediate-shock (and delayed shock-context shift) deficit in freezing (e.g., Fanselow 1986; Landeira-Fernandez et al., 2006) provides compelling evidence that postshock freezing is not a UR but rather a CR to the contextual representation CS that rapidly became associated with the footshock US. In a similar vein then the erratic escape CR topography in owl-shock and tone/owl-shock animals might represent a shift in ‘functional CR topography’ (Fanselow & Wassum 2016) resulting from the rapid association between some salient features of the owl and the dorsal neck/body shock. A rapid owl-shock association nevertheless cannot explain the owl-shock animals’ subsequent fleeing behavior to a novel tone (in the absence of owl), which likely reflects nonassociative fear.”

      Reviewer #2 (Public Review):

      This work deals with an interesting question: whether a simple, one-trial CS+US (Pavlovian) association occurs in a naturalistic environment. Pavlovian fear conditioning involves repetition of a neutral sensory signal (tone, CS) paired with a mild US, usually a foot-shock (<1 mA; thus unpleasant rather than painful), and the CS+US association drives associative learning. In this paper, a single 2.5 mA electrical shock was paired with a novel 80 dB tone to monitor the occurrence of learning via the success rate and latency of foraging for food. Some animals experienced an owl looming matched with the US, just before reaching the food. The authors placed hunger-motivated rats into a custom-built arena equipped with a safe nest, gate and food zone, as well as with delivery of a self-controlled US (electrical shock in the neck muscle and/or owl looming). The US was activated by the rats approaching the food. Thus, a conflicting situation was provoked in which procuring the food is paired with an aversive conditioned signal. Four groups of rats were included in the experiments based on their conditioning types: tone+shock, tone+shock+owl, shock+owl and tone+owl. Due to these conditioning procedures, none of the rats procured the food; all fled to the nest. In contrast, in the retrieval phases (the next two days), the tone-shock and tone-owl groups successfully procured the pellets during the conditioned tone presentation, but the tone-shock-owl group did not. Rats in the latter group fled to the nest upon tone presentation at the food zone. As the shock-owl animals (conditioned without tone) also fled to the nest upon (unfamiliar) tone presentation, their flight responses and those of the tone+shock+owl group were attributed to a non-associative, sensitization-like process. Furthermore, during the pre-tone trials, all groups showed similar behavior as in the tone test.
These findings led the authors to conclude that classical Pavlovian fear conditioning may not be present in an ecologically relevant environment.

      The raised question is relevant for a broad audience of neuroscientists and behavioral scientists. However, as the fear conditioning paradigm used is not a common one, it is difficult to interpret the findings. It is based on a single pairing of an unfamiliar, salient tone with a very strong (traumatizing?) electrical shock, delivered directly into the neck muscle, and an innate signal (owl looming). In addition, as the tone presentation was followed by many events (gate opening, presence of food, shock and/or owl looming) in front of the animals, it is hard to imagine what sort of tone association could be formed at all.

      We thank the reviewer for mentioning several important considerations. In regards to the shock amplitude used here, fear conditioning studies in rats have employed a wide range of numbers, durations and intensities of footshock; e.g., three footshocks: 1.0 mA/0.75-s and 4.0 mA/3-s (Fanselow 1984), 75 footshocks: 1 mA/2-s (Maren 1999; Zimmerman et al. 2007). Note also that 16-20 periorbital shocks (2.0 mA, 8 pulse train at 5 Hz) have been used in auditory fear conditioning in rats (Moita et al. 2003; Blair et al. 2005). Thus, it is unlikely that a single 2.5 mA dorsal neck/body shock (subcutaneous and not in the neck muscle) used in the present study is particularly traumatizing compared to higher intensity/longer duration (e.g., 4.0 mA/3-s) and far more numerous (e.g., 75) footshocks employed in fear conditioning studies.

      The relationship between footshock intensity and fear conditioning also warrants further discussion. Sigmundi, Bouton, and Bolles (1980) examined conditioned freezing in rats to 15 footshocks of 0.5, 1.0 and 2.0 mA intensities (0.5-s duration) and found that “[tone] CS-evoked freezing increased with US intensity.” In contrast, Fanselow (1984) observed relatively higher contextual freezing in rats subjected to three bouts of 1.0 mA/0.75-s than 4 mA/3-s footshocks. Regardless, the animals that received three 4 mA/3-s footshocks still exhibited robust freezing. Based on the positive control experimental results (see above), it is unlikely that the present study’s failure to observe conditioned fear is due to the use of 2.5 mA shock intensity.

      As the animals in the present study underwent 5 baseline days of foraging (3 trials per day), they would have been habituated to the computer-controlled automated gate opening-closing and the presence of food by the time of tone-shock, tone-owl, owl-shock and tone-owl/shock events, making it unlikely that the tone would associate with the gate/food stimuli. In the employed delay conditioning configuration, the tone CS has greater temporal contiguity with the US (shock and/or owl) and the US is both novel and surprising relative to the other stimuli in the arena environment. Thus, it is more plausible that the tone CS would be associated with the intended US. In summary, we believe that if fear conditioning necessitates relatively sterile environmental settings in order to transpire, then fear conditioning would be implausible in the natural world filled with dynamic, complex stimuli.

      One could also argue that if a hungry animal does not try to collect food after an unpleasant, even painful, experience, then it would normally die soon (thus, that is not a 'natural' behavior). The tone+shock and tone+owl groups showed similar behavioral features throughout the entire experiment and may reconcile these natural events: although these rats had had a negative experience before, they still approached the food zone due to their hunger. Because of their motivation for food, the authors concluded that no association was formed. Based on this single measure, is it right to do so?

      In nature, prey animals adjust their foraging behavior to minimize danger (e.g., Stephens and Krebs 1986 Foraging Theory; Lima and Dill 1990 Can J Zool); thus, it is improbable that an aversive experience will lead to the end of food-seeking behavior and, ultimately, death. Indeed, Choi and Kim (2010 Proc Natl Acad Sci) employed a similar seminaturalistic environment (as the present study) and found that rats adjust their foraging behavior as a function of the predatory threat distance, consistent with the “predatory imminence” model (Fanselow and Lester 1988). Since only behavioral measures of fear were assessed (i.e., fleeing, latency to enter forage zone, pellet procurement), we now acknowledge a caveat in the discussion (see response to Reviewer 1’s comment 1). Note, however, that unlike the tone-shock paired animals that failed to flee to the tone CS and successfully procured the food pellet, the owl-shock animals exhibited robust fear behavior (promptly fled, ceasing foraging) to a novel tone.

      Reviewer #3 (Public Review):

      In this study, the authors aimed to test whether rats could be fear conditioned by pairing a subdermal electric shock to a tone, an owl-like approaching stimulus, or a combination of these in a naturalistic-like environment. The authors designed a task in which rats foraging for food were exposed to a tone paired to a shock, an owl-like stimulus, a combination of the owl and the shock, or paired the owl to a shock in a single trial. The authors indexed behaviors related to food approach after conditioning. The authors found that animals exposed to the owl-shock or the tone/owl-shock pairing displayed a higher latency to approach the food reward compared to animals that were presented with the tone-shock or the tone-owl pairing. These results suggest that pairing the owl with the shock was sufficient to induce inhibitory avoidance, whereas a single pairing of the tone-shock or the tone-owl was not. The authors concluded that standard fear conditioning does not readily occur in a naturalistic-like environment and that the inhibitory avoidance induced by the owl-shock pairing could be the result of increased sensitization rather than a fear association.

      Strengths:

      The manuscript is well-written, the behavioral assay is innovative, and the results are interesting. The inclusion of both males and females, and the behavioral sex comparison was commendable. The findings are timely and would be highly relevant to the field.

      Weaknesses:

      However, in its current state, this study does not provide convincing evidence to support their main claim that Pavlovian fear conditioning does not readily occur in naturalistic environments. The innovative task presented in this study is more akin to an inhibitory avoidance task rather than fear conditioning and should be reframed as such.

      The reviewer’s comment is theoretically important in translating laboratory studies of fear to real world situations. Because our animals were engaged in a purposive/goal-oriented foraging behavior (that is, leaving the nest in search of food in an open space brought about the tone-shock, tone-owl, owl-shock and tone/owl-shock outcomes), one can make the case that this is in principle an inhibitory avoidance (instrumental fear conditioning) task rather than a Pavlovian fear conditioning task. A pertinent question then is whether procedurally ‘pure’ laboratory Pavlovian conditioning tasks (i.e., displacing animals from their home cage to an experimental chamber and presenting CS and US) are possible in real world settings where behaviors of animals and humans are largely purposive/goal-oriented (Tolman 1948 Psychol Rev). It is generally accepted that “Outside the laboratory, stimulus [Pavlovian] learning and response [Instrumental] learning are almost inseparable (Bouton 2007 Learning and Behavior, pg. 28).” The goal of our study was to investigate whether widely-employed auditory fear conditioning readily produces associative fear memory that guides future behavior in animals performing naturalistic foraging behavior, and insofar as presenting a salient tone CS followed by an aversive shock US, the present study has a Pavlovian fear component.

      We thank the reviewer for raising this concern and have addressed the Pavlovian vs. Instrumental fear conditioning aspects of our study in the revised manuscript (pg. 10): “…there are obvious procedural differences between standard fear conditioning versus naturalistic fear conditioning. In the former paradigm, typically ad libitum fed animals are placed in an experimental chamber for a fixed time before receiving a CS-US pairing (irrespective of their ongoing behavior). Thus, the CS duration and ISI are constant across subjects. In our study, hunger-motivated rats searching for food must navigate to a fixed location in a large arena before experiencing a CS-US pairing (instrumental- or response-contingent). Because animals approach the US trigger zone at different latencies, the CS duration and ISI are variable across subjects.”

      References

      Bernstein, I. L., Vitiello, M. V., & Sigmundi, R. A. (1980). Effects of interference stimuli on the acquisition of learned aversions to foods in the rat. J Comp Physiol Psychol, 94(5), 921-931. doi:10.1037/h0077807

      Blair, H. T., Huynh, V. K., Vaz, V. T., Van, J., Patel, R. R., Hiteshi, A. K., . . . Tarpley, J. W. (2005). Unilateral storage of fear memories by the amygdala. J Neurosci, 25(16), 4198-4205. doi:10.1523/JNEUROSCI.0674-05.2005

      Bouton, M. E. (2007). Learning and Behavior: Sinauer Associates

      Choi, J. S., & Kim, J. J. (2010). Amygdala regulates risk of predation in rats foraging in a dynamic fear environment. Proc Natl Acad Sci U S A, 107(50), 21773-21777. doi:10.1073/pnas.1010079108

      Fanselow, M. S. (1984). Shock-induced analgesia on the formalin test: effects of shock severity, naloxone, hypophysectomy, and associative variables. Behav Neurosci, 98(1), 79-95. doi:10.1037//0735-7044.98.1.79

      Fanselow, M. S. (1986). Associative Vs Topographical Accounts of the Immediate Shock Freezing Deficit in Rats - Implications for the Response Selection-Rules Governing Species-Specific Defensive Reactions. Learning and Motivation, 17(1), 16-39. doi:10.1016/0023-9690(86)90018-4

      Fanselow, M. S. (2018). The Role of Learning in Threat Imminence and Defensive Behaviors. Curr Opin Behav Sci, 24, 44-49. doi:10.1016/j.cobeha.2018.03.003

      Fanselow, M. S., & Lester, L. S. (1988). A functional behavioristic approach to aversively motivated behavior: Predatory imminence as a determinant of the topography of defensive behavior: Lawrence Erlbaum Associates Inc.

      Fanselow, M. S., & Wassum, K. M. (2016). The Origins and Organization of Vertebrate Pavlovian Conditioning. Cold Spring Harbor Perspectives in Biology, 8(1), a021717. doi:10.1101/cshperspect.a021717

      Landeira-Fernandez, J., DeCola, J. P., Kim, J. J., & Fanselow, M. S. (2006). Immediate shock deficit in fear conditioning: effects of shock manipulations. Behav Neurosci, 120(4), 873-879. doi:10.1037/0735-7044.120.4.873

      Lima, S. L., & Dill, L. M. (1990). Behavioral Decisions Made under the Risk of Predation - a Review and Prospectus. Canadian Journal of Zoology, 68(4), 619-640. doi:10.1139/z90-092

      Maren, S. (1999). Neurotoxic basolateral amygdala lesions impair learning and memory but not the performance of conditional fear in rats. J Neurosci, 19(19), 8696-8703.

      Moita, M. A., Rosis, S., Zhou, Y., LeDoux, J. E., & Blair, H. T. (2003). Hippocampal place cells acquire location-specific responses to the conditioned stimulus during auditory fear conditioning. Neuron, 37(3), 485-497. doi:10.1016/s0896-6273(03)00033-3

      Sigmundi, R. A., Bouton, M. E., & Bolles, R. C. (1980). Conditioned Freezing in the Rat as a Function of Shock Intensity and CS Modality. Bulletin of the Psychonomic Society, 15(4), 254-256.

      Steimer, T. (2002). The biology of fear- and anxiety-related behaviors. Dialogues Clin Neurosci, 4(3), 231-249.

      Stephens, D. W., & Krebs, J. R. (1986). Foraging Theory: Princeton University Press.

      Tolman, E. C. (1948). Cognitive maps in rats and men. Psychol Rev, 55(4), 189-208. doi:10.1037/h0061626

      Zambetti, P. R., Schuessler, B. P., & Kim, J. J. (2019). Sex Differences in Foraging Rats to Naturalistic Aerial Predator Stimuli. iScience, 16, 442-452. doi:10.1016/j.isci.2019.06.011

      Zimmerman, J. M., Rabinak, C. A., McLachlan, I. G., & Maren, S. (2007). The central nucleus of the amygdala is essential for acquiring and expressing conditional fear after overtraining. Learn Mem, 14(9), 634-644. doi:10.1101/lm.607207

    1. Author Response:

      Reviewer #1 (Public Review):

      Overview

      This is a well-conducted study and speaks to an interesting finding in an important topic, whether ethological validity causes co-variation in gamma above and beyond the already present ethological differences present in systemic stimulus sensitivity.

      I like the fact that while this finding (seeing red = ethologically valid = more gamma) seems to favor views the PI has argued for, the paper comes to a much simpler and more mechanistic conclusion. In short, it's good science.

      I think they missed a key logical point of analysis, in failing to dive into ERF <----> gamma relationships. In contrast to the modeled assumption that they have succeeded in color matching to create matched LGN output, the ERF and its distinct features are metrics of afferent drive in their own data. And, their data seem to suggest these two variables are not tightly correlated, so at the very least it is a topic that needs treatment and clarity as discussed below.

      Further ERF analyses are detailed below.

      Minor concerns

      In general, very well motivated and described; a few terms need more precision ('speedily' and 'staircased' are too inaccurate given their precise psychophysical goals)

      We have revised the results to clarify:

      "For colored disks, the change was a small decrement in color contrast, for gratings a small decrement in luminance contrast. In both cases, the decrement was continuously QUEST-staircased (Watson and Pelli, 1983) per participant and color/grating to 85% correct detection performance. Subjects then reported the side of the contrast decrement relative to the fixation spot as fast as possible (max. 1 s), using a button press."

      The resulting reaction times are reported slightly later in the results section.
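
For readers unfamiliar with QUEST-style staircasing, its logic can be sketched as follows. This is a minimal illustration of a Bayesian adaptive staircase in the spirit of Watson and Pelli (1983), not the implementation used in the experiment; the Weibull psychometric function, its parameter values, and the simulated threshold are all assumptions made for the sketch.

```python
import numpy as np

# Minimal sketch of a Bayesian adaptive staircase in the spirit of QUEST
# (Watson & Pelli, 1983). All parameter values are illustrative.

def weibull(contrast, threshold, slope=3.5, guess=0.5, lapse=0.02):
    """P(correct) for a 2-AFC Weibull psychometric function."""
    return guess + (1 - guess - lapse) * (1 - np.exp(-(contrast / threshold) ** slope))

class Staircase:
    def __init__(self, grid=None):
        self.grid = np.logspace(-3, 0, 200) if grid is None else grid
        self.posterior = np.ones_like(self.grid)  # flat prior over thresholds
        self.posterior /= self.posterior.sum()

    def next_contrast(self):
        # Place the next trial at the posterior-mean threshold estimate.
        return float(np.sum(self.grid * self.posterior))

    def update(self, contrast, correct):
        p = weibull(contrast, self.grid)
        self.posterior *= p if correct else (1 - p)
        self.posterior /= self.posterior.sum()

# Simulated observer with a true threshold of 0.1 contrast units:
rng = np.random.default_rng(0)
sc = Staircase()
for _ in range(100):
    c = sc.next_contrast()
    sc.update(c, rng.random() < weibull(c, 0.1))
```

After a block of trials, the posterior-mean estimate hovers near the level that yields the targeted performance, which is the sense in which the decrement is "continuously staircased" per participant and stimulus.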

      I got confused some about the across-group gamma analysis:

      "The induced change spectra were fit per participant and stimulus with the sum of a linear slope and up to two Gaussians." What is the linear slope?

      The slope is used as the null model – we only regarded gamma peaks as significant if they explained spectrum variance beyond any linear offsets in the change spectra. We have clarified in the Results:

      "To test for the existence of gamma peaks, we fit the per-participant, per-stimulus change spectra with three models: a) the sum of two gaussians and a linear slope, b) the sum of one Gaussian and a linear slope and c) only a linear slope (without any peaks) and chose the best-fitting model using adjusted R2-values."
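
The described model comparison can be sketched on synthetic data as follows (an illustration only; the function names, starting values, and simulated spectrum are assumptions, not the actual analysis code):

```python
import numpy as np
from scipy.optimize import curve_fit

def slope_only(f, a, b):
    return a * f + b

def slope_gauss1(f, a, b, amp, mu, sd):
    return a * f + b + amp * np.exp(-0.5 * ((f - mu) / sd) ** 2)

def slope_gauss2(f, a, b, a1, m1, s1, a2, m2, s2):
    return (a * f + b
            + a1 * np.exp(-0.5 * ((f - m1) / s1) ** 2)
            + a2 * np.exp(-0.5 * ((f - m2) / s2) ** 2))

def adjusted_r2(y, yhat, k):
    # Adjusted R^2 penalizes the extra parameters of the peaked models.
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    n = len(y)
    return 1 - (ss_res / ss_tot) * (n - 1) / (n - k - 1)

# Synthetic change spectrum: linear offset plus one gamma peak at 60 Hz.
freqs = np.arange(20.0, 101.0)
rng = np.random.default_rng(1)
y = 0.02 * freqs + 1.0 + 8.0 * np.exp(-0.5 * ((freqs - 60) / 5) ** 2)
y = y + rng.normal(0, 0.3, freqs.size)

fits = {}
for name, fun, p0 in [
    ("slope", slope_only, [0.0, 1.0]),
    ("gauss1", slope_gauss1, [0.0, 1.0, 5.0, 55.0, 5.0]),
    ("gauss2", slope_gauss2, [0.0, 1.0, 5.0, 50.0, 5.0, 5.0, 70.0, 5.0]),
]:
    popt, _ = curve_fit(fun, freqs, y, p0=p0, maxfev=20000)
    fits[name] = adjusted_r2(y, fun(freqs, *popt), len(p0))

best = max(fits, key=fits.get)  # a peaked model should win on peaked data
```

Because the slope-only model is nested in the peaked models, it serves as the null: a gamma peak is only accepted when a Gaussian explains spectral variance beyond any linear offset.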

      To me, a few other analyses approaches would have been intuitive. First, before averaging peak-aligned data, might consider transforming into log, and might consider making average data with measures that don't confound peak height and frequency spread (e.g., using the FWHM/peak power as your shape for each, then averaging).

      The reviewer comments on averaging peak-aligned data. This had been done specifically in Fig. 3C. Correspondingly, we understood the reviewer’s suggestion as a modification of that analysis that we now undertook, with the following steps: 1) Log-transform the power-change values; we did this by transforming into dB; 2) Derive FWHM and peak power values per participant, and then average those; we did this by a) fitting Gaussians to the per-participant, per-stimulus power change spectra, b) quantifying FWHM as the Gaussian’s Standard Deviation, and the peak power as the Gaussian’s amplitude; 3) average those parameters over subjects, and display the resulting Gaussians. The resulting Gaussians are now shown in the new panel A in Figure 3-figure supplement 1.

      (A) Per-participant, the induced gamma power change peak in dB was fitted with a Gaussian added to an offset (for full description, see Methods). Plotted is the resulting Gaussian, with peak power and variance averaged over participants.

      Results seem to be broadly consistent with Fig. 3C.
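
For reference, the dB transform and the standard relation between a Gaussian's standard deviation and its full width at half maximum (FWHM = 2√(2 ln 2)·σ ≈ 2.355σ) can be written compactly (a sketch with illustrative values):

```python
import numpy as np

def to_db(power_ratio):
    # Power change relative to baseline, expressed in decibels.
    return 10 * np.log10(power_ratio)

def fwhm_from_sd(sd):
    # FWHM of a Gaussian with standard deviation sd: 2*sqrt(2*ln 2)*sd ~= 2.355*sd.
    return 2 * np.sqrt(2 * np.log(2)) * sd

peak_db = to_db(3.2)        # e.g., a 3.2-fold gamma power increase ~= 5.05 dB
width_hz = fwhm_from_sd(5)  # a 5 Hz standard deviation ~= 11.8 Hz FWHM
```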

      Moderate

      I. I would like to see a more precise treatment of ERF and gamma power. The initial slope of the ERF should, by typical convention, correlate strongly with input strength, and the peak should similarly be a predictor of such drive, albeit a weaker one. Figure 4C looks good, but I'm totally confused about what this is showing. If drive = gamma in color space, then these ERF features and gamma power should (by Occam's sledgehammer…) be correlated. I invoke the sledgehammer not the razor because I could easily be wrong, but if you could unpack this relationship convincingly, this would be a far stronger foundation for the 'equalized for drive, gamma doesn't change across colors' argument…(see also IIB below)…

      …and, in my own squinting, there is a difference (~25%) in the evoked dipole amplitudes for the vertically aligned opponent pairs of red and green (along the L-M axis, Fig 2C) on which much hinges in this paper, but no difference in gamma power for these pairs. How is that possible? This logic doesn't support the main prediction that drive matched differences = matched gamma…Again, I'm happy to be wrong, but I would like to see this analyzed and explained intuitively.

      As suggested by the reviewer, we have delved deeper into ERF analyses. Firstly, we overhauled our ERF analysis to extract per-color ERF shape measures (such as timing and slope), added them as panels A and B in Figure 2-figure supplement 1:

      Figure 2-figure supplement 1. ERF and reaction time results: (A) Average pre-peak slope of the N70 ERF component (extracted from 2-12 ms before per-color, per-participant peak time) for all colors. (B) Average peak time of the N70 ERF component for all colors. […]. For panels A-C, error bars represent 95% CIs over participants, bar orientation represents stimulus orientation in DKL space. The length of the scale bar corresponds to the distance from the edge of the hexagon to the outer ring.

      We have revised the results to report those analyses:

      "The initial ERF slope is sometimes used to estimate feedforward drive. We extracted the per-participant, per-color N70 initial slope and found significant differences over hues (F(4.89, 141.68) = 7.53, pGG < 4*10-6). Specifically, it was shallower for blue hues compared to all other hues except for green and green-blue (all pHolm < 7*10-4), while it was not significantly different between all other stimulus hue pairs (all pHolm > 0.07, Figure 2-figure supplement 1A), demonstrating that stimulus drive (as estimated by ERF slope) was approximately equalized over all hues but blue.

      The peak time of the N70 component was significantly later for blue stimuli (Mean = 88.6 ms, CI95% = [84.9 ms, 92.1 ms]) compared to all (all pHolm < 0.02) but yellow, green and green-yellow stimuli, for yellow (Mean = 84.4 ms, CI95% = [81.6 ms, 87.6 ms]) compared to red and red-blue stimuli (all pHolm < 0.03), and fastest for red stimuli (Mean = 77.9 ms, CI95% = [74.5 ms, 81.1 ms]) showing a general pattern of slower N70 peaks for stimuli on the S-(L+M) axis, especially for blue (Figure 2-figure supplement 1B)."

      We also checked if our main findings (equivalence of drive-controlled red and green stimuli, weaker responses for S+ stimuli) are robust when controlled for differences in ERF parameters and added in the Results:

      "To attempt to control for potential remaining differences in input drive that the DKL normalization missed, we regressed out per-participant, per-color, the N70 slope and amplitude from the induced gamma power. Results remained equivalent along the L-M axis: The induced gamma power change residuals were not statistically different between red and green stimuli (Red: 8.22, CI95% = [-0.42, 16.85], Green: 12.09, CI95% = [5.44, 18.75], t(29) = 1.35, pHolm = 1.0, BF01 = 3.00).

      As we found differences in initial ERF slope especially for blue stimuli, we checked if this was sufficient to explain weaker induced gamma power for blue stimuli. While blue stimuli still showed weaker gamma-power change residuals than yellow stimuli (Blue: -11.23, CI95% = [-16.89, -5.57], Yellow: -6.35, CI95% = [-11.20, -1.50]), this difference did not reach significance when regressing out changes in N70 slope and amplitude (t(29) = 1.65, pHolm = 0.88). This suggests that lower levels of input drive generated by equicontrast blue versus yellow stimuli might explain the weaker gamma oscillations induced by them."
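
The residualization described above follows standard ordinary-least-squares practice and can be sketched as follows (simulated data; the variable names stand in for the per-color N70 measures and are illustrative, not the actual pipeline):

```python
import numpy as np

def regress_out(y, covariates):
    """Residuals of y after removing a linear (OLS) fit of the covariates.

    y: (n,) dependent variable (e.g., induced gamma power change)
    covariates: (n, k) regressors (e.g., N70 slope and amplitude)
    """
    X = np.column_stack([np.ones(len(y)), covariates])  # add an intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Simulated example: gamma power linearly contaminated by two covariates.
rng = np.random.default_rng(2)
n = 30
cov = rng.normal(size=(n, 2))  # stand-ins for per-color N70 slope/amplitude
gamma = 2.0 + cov @ np.array([1.5, -0.8]) + rng.normal(0, 0.1, n)
resid = regress_out(gamma, cov)
```

By construction, the residuals are orthogonal to the covariates, so any remaining group differences cannot be attributed to a linear effect of the regressed-out drive measures.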

      We added accordingly in the Discussion:

      "The fact that controlling for N70 amplitude and slope strongly diminished the recorded differences in induced gamma power between S+ and S- stimuli supports the idea that the recorded differences in induced gamma power over the S-(L+M) axis might be due to pure S+ stimuli generating weaker input drive to V1 compared to DKL-equicontrast S- stimuli, even when cone contrasts are equalized."

      Additionally, we made the correlation between ERF amplitude and induced gamma power clearer to read by correlating them directly. Accordingly, the relevant paragraph in the results now reads:

      "In addition, there were significant correlations between the N70 ERF component and induced gamma power: The extracted N70 amplitude was correlated across colors with the induced gamma power change within participants with on average r = -0.38 (CI95% = [-0.49, -0.28], pWilcoxon < 4*10-6). This correlation was specific to the gamma band and the N70 component: Across colors, there were significant correlation clusters between V1 dipole moment 68-79 ms post-stimulus onset and induced power between 54 Hz and 72 Hz (Figure 4C, rmax = 0.30, pTmax < 0.05, corrected for multiple comparisons across time and frequency)."

      II. As indicated above, the paper rests on accurate modeling of human LGN recruitment, based in fact on human cone recruitment. However, the exact details of how such matching was obtained were rapidly discussed-this technical detail is much more than just a detail in a study on color matching: I am not against the logic nor do I know of a flaw, but it's the hinge of the paper and is dealt with glancingly.

      A. Some discussion of model limitations

      B. Why it's valid to assume LGN matching has been achieved using data from the periphery: To buy knowledge, nobody has ever recorded single units in human LGN with these color stimuli…in contrast, the ERF is 'in their hands' and could be directly related (or not) to gamma and to the color matching predictions of their model.

      We have revised the respective paragraph of the introduction to read:

      "Earlier work has established in the non-human primate that LGN responses to color stimuli can be well explained by measuring retinal cone absorption spectra and constructing the following cone-contrast axes: L+M (capturing luminance), L-M (capturing redness vs. greenness), and S-(L+M) (capturing S-cone activation, which correspond to violet vs. yellow hues). These axes span a color space referred to as DKL space (Derrington, Krauskopf, and Lennie, 1984). This insight can be translated to humans (for recent examples, see Olkkonen et al., 2008; Witzel and Gegenfurtner, 2018), if one assumes that human LGN responses have a similar dependence on human cone responses. Recordings of human LGN single units to colored stimuli are not available (to our knowledge). Yet, sensitivity spectra of human retinal cones have been determined by a number of approaches, including ex-vivo retinal unit recordings (Schnapf et al., 1987), and psychophysical color matching (Stockman and Sharpe, 2000). These human cone sensitivity spectra, together with the mentioned assumption, allow to determine a DKL space for human observers. To show color stimuli in coordinates that model LGN activation (and thereby V1 input), monitor light emission spectra for colored stimuli can be measured to define the strength of S-, M-, and L-cone excitation they induce. Then, stimuli and stimulus background can be picked from an equiluminance plane in DKL space. "

      Reviewer #2 (Public Review):

      The major strengths of this study are the use of MEG measurements to obtain spatially resolved estimates of gamma rhythms from a large(ish) sample of human participants, during presentation of stimuli that are generally well matched for cone contrast. Responses were obtained using a 10deg diameter uniform field presented in and around the centre of gaze. The authors find that stimuli with equivalent cone contrast in L-M axis generated equivalent gamma - ie. that 'red' (+L-M) stimuli do not generate stronger responses than 'green (-L+M). The MEG measurements are carefully made and participants performed a decrement-detection task away from the centre of gaze (but within the stimulus), allowing measurements of perceptual performance and in addition controlling attention.

      There are a number of additional observations that make clear that the color and contrast of stimuli are important in understanding gamma. Psychophysical performance was worst for stimuli modulated along the +S-(L+M) direction, and these directions also evoked weakest evoked potentials and induced gamma. There also appear to be additional physiological asymmetries along non-cardinal color directions (e.g. Fig 2C, Fig 3E). The asymmetries between non-cardinal stimuli may parallel those seen in other physiological and perceptual studies and could be drawn out (e.g. Danilova and Mollon, Journal of Vision 2010; Goddard et al., Journal of Vision 2010; Lafer-Sousa et al., JOSA 2012).

      We thank the review for the pointers to relevant literature and have added in the Discussion:

      "Concerning off-axis colors (red-blue, green-blue, green-yellow and red-yellow), we found stronger gamma power and ERF N70 responses to stimuli along the green-yellow/red-blue axis (which has been called lime-magenta in previous studies) compared to stimuli along the red-yellow/green-blue axis (orange-cyan). In human studies varying color contrast along these axes, lime-magenta has also been found to induce stronger fMRI responses (Goddard et al., 2010; but see Lafer-Sousa et al., 2012), and psychophysical work has proposed a cortical color channel along this axis (Danilova and Mollon, 2010; but see Witzel and Gegenfurtner, 2013)."

      Similarly, the asymmetry between +S and -S modulation is striking and need better explanation within the model (that thalamic input strength predicts gamma strength) given that +S inputs to cortex appear to be, if anything, stronger than -S inputs (e.g. DeValois et al. PNAS 2000).

      We followed the reviewer’s suggestion and modified the Discussion to read:

      "Contrary to the unified pathway for L-M activation, stimuli high and low on the S-(L+M) axis (S+ and S-) each target different cell populations in the LGN, and different cortical layers within V1 (Chatterjee and Callaway, 2003; De Valois et al., 2000), whereby the S+ pathway shows higher LGN neuron and V1 afferent input numbers (Chatterjee and Callaway, 2003). Other metrics of V1 activation, such as ERPs/ERFs, reveal that these more numerous S+ inputs result in a weaker evoked potential that also shows a longer latency (our data; Nunez et al., 2021). The origin of this dissociation might lie in different input timing or less cortical amplification, but remains unclear so far. Interestingly, our results suggest that cortical gamma is more closely related to the processes reflected in the ERP/ERF: Stimuli inducing stronger ERF induced stronger gamma; and controlling for ERF-based measures of input drives abolished differences between S+ and S- stimuli in our data."

      Given that this asymmetry presents a potential exception to the direct association between LGN drive and V1 gamma power, we have toned down claims of a direct input drive to gamma power relationship in the Title and text and have refocused instead on L-M contrast.

      My only real concern is that the authors use a precomputed DKL color space for all observers. The problem with this approach is that the isoluminant plane of DKL color space is predicated on a particular balance of L- and M-cones to Vlambda, and individuals can show substantial variability of the angle of the isoluminant plane in DKL space (e.g. He, Cruz and Eskew, Journal of Vision 2020). There is a non-negligible chance that all the responses to colored stimuli may therefore be predicted by projection of the stimuli onto each individual's idiosyncratic Vlambda (that is, the residual luminance contrast in the stimulus). While this would be exhaustive to assess in the MEG measurements, it may be possible to assess perceptually as in the He paper above or by similar methods. Regardless, the authors should consider the implications - this is important because, for example, it may suggest the importance of signals from the magnocellular pathway, which are thought to be important for Vlambda.

      We followed the suggestion of the reviewer, performed additional analyses and report the new results in the following Results text:

      "When perceptual (instead of neuronal) definitions of equiluminance are used, there is substantial between-subject variability in the ratio of relative L- and M-cone contributions to perceived luminance, with a mean ratio of L/M luminance contributions of 1.5-2.3 (He et al., 2020). Our perceptual results are consistent with that: We had determined the color-contrast change-detection threshold per color; We used the inverse of this threshold as a metric of color change-detection performance; The ratio of this performance metric between red and green (L divided by M) had an average value of 1.48, with substantial variability over subjects (CI95% = [1.33, 1.66]).

      If such variability also affected the neuronal ERF and gamma power measures reported here, L/M-ratios in color-contrast change-detection thresholds should be correlated across subjects with L/M-ratios in ERF amplitude and induced gamma power. This was not the case: Change-detection threshold red/green ratios were neither correlated with ERF N70 amplitude red/green ratios (ρ = 0.09, p = 0.65), nor with induced gamma power red/green ratios (ρ = -0.17, p = 0.38)."
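
The ratio-and-correlation check described above can be sketched on simulated data as follows (all values below are simulated for illustration; they are not the recorded data):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(3)
n = 30
# Simulated per-participant red/green change-detection thresholds
# (illustrative only, not the recorded data):
thresh_red = rng.lognormal(np.log(0.10), 0.2, n)
thresh_green = rng.lognormal(np.log(0.15), 0.2, n)
# Performance metric = inverse threshold; red/green performance ratio:
perf_ratio = (1 / thresh_red) / (1 / thresh_green)

# Simulated red/green gamma-power ratios, unrelated by construction:
gamma_ratio = rng.lognormal(0.0, 0.3, n)

rho, p = spearmanr(perf_ratio, gamma_ratio)
```

If idiosyncratic L/M luminance weighting drove the neuronal measures, the two per-participant ratios would correlate; an absent correlation, as reported above, argues against that account.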

      Reviewer #3 (Public Review):

      This is an interesting article studying human color perception using MEG. The specific aim was to study differences in color perception related to different S-, M-, and L-cone excitation levels and especially whether red color is perceived differentially to other colors. To my knowledge, this is the first study of its kind and as such very interesting. The methods are excellent and the manuscript is well written, as expected from this lab. However, the illustration of the results is not optimal and could be enhanced.

      Major

      The results presented in the manuscript are very interesting, but not presented comprehensively enough to evaluate the validity of the results. The main results of the manuscript are that the gamma-band responses to stimuli with absolute L-M contrast, i.e. green and red stimuli, do not differ, but they differ for stimuli on the S-(L+M) (blue vs red-green) axis, and gamma-band responses for blue stimuli are smaller. These data are presented in figure 3, but in its current form, these results are not well conveyed by the figure. The main results are illustrated in figures 3BC, which show the average waveforms for grating and for different color stimuli. While there are confidence limits for the gamma-band responses for the grating stimuli, there are no confidence limits for the responses to different color stimuli. Therefore, the main results of the similarities / differences between the responses to different colors can't be evaluated based on the figure and hence confidence limits should be added to these data.

      Figure 3E reports the gamma-power change values after alignment to the individual peak gamma frequencies, i.e. the values used for statistics, and does report confidence intervals. Yet, we see the point of the reviewer that confidence intervals are also helpful in the non-aligned/complete spectra. We found that inclusion of confidence intervals into Figure 3B,C, with the many overlapping spectra, renders those panels un-readable. Therefore, we included the new panel Figure 3-figure supplement 2A, showing each color’s spectrum separately:

      (A) Per-color average induced power change spectra. Banding shows 95% confidence intervals over participants. Note that the y-axis varies between colors.

      It is also not clear from the figure legend, from which time-window data is averaged for the waveforms.

      We have added in the legend:

      "All panels show power change 0.3 s to 1.3 s after stimulus onset, relative to baseline."

      The time-resolved profile of gamma-power changes is illustrated in Fig. 3D. This figure would be a perfect place to illustrate the main results. However, of all the color stimuli, these TFRs are shown only for the green stimuli, not for the red-green differences nor for the blue stimuli for which responses were smaller. Why are these TFRs not shown for all color stimuli and for their differences?

      Figure 3-figure supplement 3. Per-color time-frequency responses: Average stimulus-induced power change in V1 as a function of time and frequency, plotted for each color.

      We agree with the reviewer that TFR plots can be very informative. We followed their request and included TFRs for each color as Figure 3-Figure supplement 3.

      Regarding the suggestion to also include TFRs for the differences between colors, we note that this would amount to 28 TFRs, one for each color combination. Furthermore, while gamma peaks were often clear, their peak frequencies varied substantially across subjects and colors. Therefore, we based our statistical analysis on the power at the peak frequencies, corresponding to peak-aligned spectra (Fig. 3C). A comparison of Figure 3C with Figure 3B shows that the shape of non-aligned average spectra is strongly affected by inter-subject peak-frequency variability and thereby hard to interpret. Therefore, we refrained from showing TFRs for differences between colors, which would also lack the required peak alignment.

    1. Author Response:

      Joint Public Review:

      A highly robust result when investigating how neural population activity is impacted by performance in a task is that the trial-to-trial correlations (noise correlations) between neurons are reduced as performance increases. However the theoretical and experimental literature so far has failed to account for this robust link since reduced noise correlations do not systematically contribute to improved availability or transmission of information (often measured using decoding of stimulus identity). This paper sets out to address this discrepancy by proposing that the key to linking noise correlations to decoding and thus bridging the gap with performance is to rethink the decoders we use: instead of decoders optimized to the specific task imposed on the animal on any given trial (A vs B / B vs C / A vs C), they hypothesize that we should favor a decoder optimized for a general readout of stimulus properties (A vs B vs C).

      To test this hypothesis, the authors use a combination of quantitative data analysis and mechanistic network modeling. Data were recorded from neuronal populations in area V4 of two monkeys trained to perform an orientation change detection task, where the magnitude of orientation change could vary across trials, and the change could happen at cued (attended) or uncued (unattended) locations in the visual field. The model, which extends previous work by the authors, reproduces many basic features of the data, and both the model and data offer support for the hypothesis.

      The reviewers agreed that this is a potentially important contribution, that addresses a widely observed, but puzzling, relation between perceptual performance and noise correlations. The clarity of the hypothesis, and the combination of data analysis and computational modelling are two essential strengths of the paper.

      Overall, this paper highlights a new factor to be taken into account when analysing neural data: the choice of decoder and, in particular, how general or specific the decoder is. The fact that the generality of the decoder sheds light on the much-debated question of noise correlations underscores its importance. The paper therefore opens multiple avenues for future research to probe this new idea, in particular for tasks with multiple stimulus dimensions.

      Nonetheless, as detailed below, the reviewers believe the manuscript's clarity could be further improved on several points, and some additional analysis of the data would provide a more straightforward test of the hypothesis.

      1. It would be important to verify that the model reproduces the correlation between noise and signal correlations, since this is really a key argument leading to the authors' hypothesis.

      We have incorporated this verification of the model into the manuscript, as referred to below in the Results:

      “Importantly, this model reproduces the correlation between noise and signal correlations (Figure 2–figure supplement 1) observed in electrophysiological data (Cohen & Maunsell, 2009; Cohen & Kohn, 2011). This correlation between the shared noise and the shared tuning is a key component of the general decoder hypothesis. We observed this strong relationship between noise and signal correlations in our recorded neurons (Figure 2–figure supplement 1A) as well as in our modeled data (Figure 2–figure supplement 1B). Using this model, we were able to measure the relationship between noise and signal correlations for varying strengths of attentional modulation. Consistent with the predictions of the general decoder hypothesis, attention weakened the relationship between noise and signal correlations (Figure 2–figure supplement 1C).”

      The new figure is as below:

      Figure 2–figure supplement 1. The model reproduces the relationship between noise and signal correlations that is key to the general decoder hypothesis. (A) As previously observed in electrophysiological data (Cohen & Maunsell, 2009; Cohen & Kohn, 2011), we observe a strong relationship between noise and signal correlations. During additional recordings collected during most recording sessions (for Monkey 1 illustrated here, n = 37 days with additional recordings), the monkey was rewarded for passively fixating the center of the monitor while Gabors with randomly interleaved orientations were flashed at the receptive field location (‘Stim 2’ location in Figure 1C). The presented orientations spanned the full range of stimulus orientations (12 equally spaced orientations from 0 to 330 degrees). We calculated the signal correlation for each pair of units based on their mean responses to each of the 12 orientations. We define the noise correlation for each pair of units as the average noise correlation for each orientation. The plot depicts signal correlation as a function of noise correlation across all recording sessions, binned into 8 equally sized sets of unit pairs. Error bars represent SEM. (B) The model reproduces the relationship between noise and signal correlations. Signal correlation is plotted as a function of noise correlation, binned into 20 equally sized sets of unit pairs (n = 2000 neurons), for each attentional modulation strength (green: least attended; yellow: most attended). The results were averaged over 50 tested orientations. (C) The slope of the relationship between noise and signal correlations (y-axis) decreases with increasing attentional modulation (x-axis). This suggests that noise is less aligned with signal correlation with increasing attentional modulation.
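The per-pair computation described in this legend (signal correlation from the mean responses to the 12 orientations; noise correlation computed per orientation, then averaged) can be made concrete with a small sketch. This is an illustrative NumPy version with assumed data shapes, not the manuscript's analysis code:

```python
import numpy as np

def signal_and_noise_correlation(responses):
    """Estimate signal and noise correlation for a pair of units.

    responses: list of arrays, one per orientation, each of shape
    (n_trials, 2) holding the two units' responses on each trial.
    """
    # Signal correlation: correlate the mean responses across orientations.
    means = np.array([r.mean(axis=0) for r in responses])  # (n_oris, 2)
    signal_corr = np.corrcoef(means[:, 0], means[:, 1])[0, 1]

    # Noise correlation: correlate trial-to-trial responses within each
    # orientation, then average over orientations (as in the legend).
    noise_corrs = [np.corrcoef(r[:, 0], r[:, 1])[0, 1] for r in responses]
    noise_corr = float(np.mean(noise_corrs))
    return signal_corr, noise_corr
```

Binning pairs into equally sized sets for plotting, as in the figure, would be a separate step applied across all recorded pairs.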

      2. Testing the hypothesis of the general decoder:

      2.1 In the data, the authors compare mainly the specific (stimulus) decoder and the monkey's choice decoder. The general stimulus decoder is only considered in fig. 3f, because data across multiple orientations are available only for the cued condition, and therefore the general and specific decoders cannot be compared for changes between cued and uncued. However, the hypothesized relation between mean correlations and performance should also hold within a fixed attention condition (cued), comparing sessions with larger vs. smaller correlations. In other words, if the hypothesis is correct, you should find that performance of the "most general" decoder (as in fig. 3f) correlates negatively with average noise correlations across sessions, more so than the "most specific" decoder.

      We have added a new supplementary figure to the manuscript:

      Figure 3–figure supplement 1. Based on the electrophysiological data, the performance of the monkey’s decoder was more related to mean correlated variability than the performance of the specific decoder within each attention condition. (A) Within the cued attention condition, the performance of the monkey’s decoder was more related to mean correlated variability (left plot; correlation coefficient: n = 71 days, r = -0.23, p = 0.058) than the performance of the specific decoder (right plot; correlation coefficient: r = 0.038, p = 0.75). The correlation coefficients associated with the two decoders were significantly different from each other (Williams’ procedure: t = 3.8, p = 1.5 x 10^-4). Best fit lines plotted in gray. Data from both monkeys combined (Monkey 1 data shown in orange: n = 44 days; Monkey 2 data shown in purple: n = 27 days) with mean correlated variability z-scored within monkey. (B) The data within the uncued attention condition showed a similar pattern, with the performance of the monkey’s decoder more related to mean correlated variability (n = 69 days, r = -0.20, p = 0.14) than the performance of the specific decoder (r = 0.085, p = 0.51; Williams’ procedure: t = 2.0, p = 0.049). Conventions as in (A) (Monkey 1: n = 42 days – see Methods for data exclusions as in Figure 3C; Monkey 2: n = 27 days).
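Williams' procedure, cited above for comparing the two decoders' correlation coefficients, tests the difference between two dependent correlations that share one variable. A common formulation (Steiger's 1980 version of Williams' t, with df = n - 3) can be sketched as below; that the manuscript used exactly this variant is an assumption:

```python
import math

def williams_t(r_jk, r_jh, r_kh, n):
    """Williams' t (Steiger, 1980) for the difference between two dependent
    correlations r_jk and r_jh that share variable j; r_kh is the
    correlation between the two non-shared variables. Returns (t, df)."""
    det = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    rbar = (r_jk + r_jh) / 2
    t = (r_jk - r_jh) * math.sqrt(
        ((n - 1) * (1 + r_kh))
        / (2 * ((n - 1) / (n - 3)) * det + rbar**2 * (1 - r_kh) ** 3)
    )
    return t, n - 3
```

In the comparison above, r_jk and r_jh would be the correlations of mean correlated variability with each decoder's performance, and r_kh the correlation between the two decoders' performances across sessions.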

      2.2 In figure 3f, a more straightforward and precise comparison is to use the stimulus decoders to predict the choice, and test whether the more specific or the more general can predict choices more accurately.

      We have added a new panel to Figure 3 (Figure 3G) that illustrates the results of this analysis comparing whether the specific or more-general decoders predict the monkey’s trial-by-trial choices more accurately:

      Figure 3… (G) The more general the decoder (x-axis), the better its performance predicting the monkey’s choices on the median changed orientation trials (y-axis; the proportion of leave-one-out trials in which the decoder correctly predicted the monkey’s decision as to whether the orientation was the starting orientation or the median changed orientation). Conventions as in (F) (see Methods for n values).

      The description of this new panel in the Results section is as below:

      “Further, the more general the decoder, the better it predicted the monkey’s trial-by-trial choices on the median changed orientation trials (Figure 3G).”

      The updated Methods section describing this new panel is as below:

      “For Figure 3G, we performed analyses similar to those performed for Figure 3F, in that we tested each stimulus decoder: ‘1 ori’ decoders (n = 8 decoders; 1 specific decoder for either the first, second, fourth, or fifth largest changed orientation, for each of the 2 monkeys), ‘2 oris’ decoders (n = 12 decoders; 1 decoder for each of the 6 combinations of 2 changed orientations, for each of the 2 monkeys), ‘3 oris’ decoders (n = 8 decoders; 1 decoder for each of the 4 combinations of 3 changed orientations, for each of the 2 monkeys), and ‘4 oris’ decoders (n = 2 decoders; 1 decoder for the 1 combination of 4 changed orientations, for each of the 2 monkeys). However, unlike in Figure 3F, where the performance of the stimulus decoders was compared to the performance of the monkey’s decoder on the median orientation-change trials, here we calculated the performance of the stimulus decoder when tasked with predicting the trial-by-trial choices that the monkey made on the median orientation-change trials. We plotted the proportion of leave-one-out trials in which each decoder correctly predicted the monkey’s choice as to whether the orientation was the starting orientation or the median changed orientation.”
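The leave-one-out choice-prediction procedure quoted above can be sketched schematically as follows, using a Fisher-style linear decoder as a stand-in. The manuscript's actual decoder weights and preprocessing are not specified here, so those details are assumptions:

```python
import numpy as np

def loo_choice_prediction(X, stim_labels, choices):
    """Leave-one-out sketch: train a linear stimulus decoder on all-but-one
    trial, then score whether its prediction on the held-out trial matches
    the monkey's CHOICE on that trial.

    X: (n_trials, n_neurons) responses; stim_labels, choices: 0/1 arrays.
    """
    n = len(stim_labels)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i
        Xt, yt = X[mask], stim_labels[mask]
        mu0, mu1 = Xt[yt == 0].mean(0), Xt[yt == 1].mean(0)
        # Regularized covariance keeps the weights well defined.
        cov = np.cov(Xt.T) + 1e-3 * np.eye(X.shape[1])
        w = np.linalg.solve(cov, mu1 - mu0)
        threshold = w @ (mu0 + mu1) / 2
        pred = int(w @ X[i] > threshold)
        correct += pred == choices[i]
    return correct / n
```

The returned proportion corresponds to the quantity plotted in Figure 3G: the fraction of held-out trials on which the stimulus decoder's output agrees with the monkey's decision.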

      3. The main goal of the manuscript is to determine the impact of noise correlations on various decoding schemes. The figures however only show how decoding co-varies with correlations, but a direct, more causal analysis of the effect of correlations on decoding seems to be missing. Such an analysis can be obtained by comparing decoding on simultaneously recorded activity with decoding on trial-shuffled activity, in which noise-correlations are removed.
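The shuffle control described in this point, which removes noise correlations while preserving each neuron's per-condition response distribution, can be sketched as follows (illustrative only; not an analysis from the manuscript):

```python
import numpy as np

def shuffle_within_condition(X, labels, rng):
    """Destroy noise correlations by independently permuting, for each
    stimulus condition, the trial order of each neuron's responses.

    X: (n_trials, n_neurons) responses; labels: condition per trial."""
    Xs = X.copy()
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        for j in range(X.shape[1]):
            Xs[idx, j] = X[rng.permutation(idx), j]
    return Xs
```

Comparing decoder performance on `X` versus `shuffle_within_condition(X, ...)` is the simultaneous-versus-shuffled comparison the reviewers describe; tuning (per-condition means) is unchanged, but trial-to-trial co-fluctuations are removed.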

      We have added the following Discussion section to address this point:

      “The purpose of this study was to investigate the relationship between mean correlated variability and a general decoder. We made an initial test of the overarching hypothesis that observers use a general decoding strategy in feature-rich environments by testing whether a decoder optimized for a broader range of stimulus values better matched the decoder actually used by the monkeys than a specific decoder optimized for a narrower range of stimulus values. We purposefully did not make claims about the utility of correlated variability relative to hypothetical situations in which correlated variability does not exist in the responses of a group of neurons, as we suspect that this is not a physiologically realistic condition. Studies that causally manipulate the level of correlated variability in neuronal populations to measure the true physiological and behavioral effects of increasing or decreasing correlated variability levels, through pharmacological or genetic means, may provide important insights into the impact of correlated variability on various decoding strategies.”

      4. How different are the four different decoders (specific/monkey, cued/uncued)? It would be interesting to see how much they overlap. More generally, the authors should discuss the alternative that attention modulates also the readout/decoding weights, rather than or in addition to modulating V4 activity.

      We have added the following to the manuscript:

      “A fixed readout mechanism

      A prior study from our lab found that attention, rather than changing the neuronal weights of the observer’s decoder, reshaped neuronal population activity to better align with a fixed readout mechanism (Ruff & Cohen, 2019). To test whether the neuronal weights of the monkey’s decoder changed across attention conditions (attended versus unattended), Ruff and Cohen switched the neuronal weights across conditions, testing the stimulus information in one attention condition with the neuronal weights from the other. They found that even with the switched weights, the performance of the monkey’s decoder was still higher in the attended condition. The results of this study support the conclusion that attention reshapes neuronal activity so that a fixed readout mechanism can better read out stimulus information. In other words, differences in the performance of the monkey’s decoder across attention conditions may be due to differences in how well the neuronal activity aligns with a fixed decoder.

      Our study extends the findings of Ruff and Cohen to test whether that fixed readout mechanism is determined by a general decoding strategy. Our findings support the hypothesis that observers use a general decoding strategy in the face of changing stimulus and task conditions. Our findings do not exclude other potential explanations for the suboptimality of the monkey’s decoder, nor do they exclude the possibility that attention modulates decoder neuronal weights. However, our findings together with those of Ruff and Cohen shed light on why neuronal decoders are suboptimal in a manner that aligns the fixed decoder axis with the correlated variability axis (Ni et al., 2018; Ruff et al., 2018).”

      5. Quantifying the link between model and data :<br /> 5.1 the text providing motivation for the model could be improved. The motivation used in the manuscript is, essentially, that the model allows to extrapolate beyond the data (more stimuli, more repetitions, more neurons). The dangers of extrapolation beyond the range of the data are however well known. A model that extrapolates beyond existing data is useful to design new experiments and test predictions, but this is not done here. Because the manuscript is about information and decoding, a better motivation is the fact that this model takes an actual image as input, and produces tuning and covariance compatible with each other because they are constrained by an actual network that processes the input (as opposed to parametric models where tuning and covariance can be manipulated independently).

      We have modified the manuscript as below:

      “Here, we describe a circuit model that we designed to allow us to compare the specific and monkey’s decoders from our electrophysiological dataset to modeled ideal specific and general decoders. The primary benefit of our model is that it can take actual images as inputs and produce neuronal tuning and covariance that are compatible with each other because of constraints from the simulated network that processed the inputs (Huang et al., 2019). Parametric models in which tuning and covariance can be manipulated independently would not provide such constraints. In our model, the mean correlated variability of the population activity is restricted to very few dimensions, matching experimentally recorded data from visual cortex demonstrating that mean correlated variability occupies a low-dimensional subset of the full neuronal population space (Ecker et al., 2014; Goris et al., 2014; Huang et al., 2019; Kanashiro et al., 2017; Lin et al., 2015; Rabinowitz et al., 2015; Semedo et al., 2019; Williamson et al., 2016).”

      “Our study also demonstrates the utility of combining electrophysiological and circuit modeling approaches to studying neural coding. Our model mimicked the correlated variability and effects of attention in our physiological data. Critically, our model produced neuronal tuning and covariance based on the constraints of an actual network capable of processing images as inputs.”

      We have also removed the Results and Discussion text that suggested that the model allowed us to extrapolate beyond the data.

      5.2 The ring structure, and the orientation of correlations (Fig 2b) seem to be key ingredients of the model, but are they based on data, or ad-hoc assumptions?

      We have modified the manuscript to clarify this point, as below:

      “As the basis for our modeled general decoder, we first mapped the n-dimensional neuronal activity of our model in response to the full range of orientations to a 2-dimensional space. Because the neurons were tuned for orientation, we could map the n-dimensional population responses to a ring (Figure 2B, C). The orientation of correlations (the shape of each color cloud in Figure 2B) was not an assumed parameter, and illustrates the outcome of the correlation structure and dimensionality modeled by our data. In Figure 2B, we can see that the fluctuations along the radial directions are much larger than those along other directions for a given orientation. This is consistent with the low-dimensional structure of the modeled neuronal activity. In our model, the fluctuations of the neurons, mapped to the radial direction on the ring, were more elongated in the unattended state (Figure 2B) than in the attended state (Figure 2C).”
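The quoted text does not spell out how the n-dimensional responses are mapped to the ring. One standard way to obtain such a mapping is a population-vector projection, sketched below under the assumption of known preferred angles per neuron (the model's actual mapping may differ; for orientation variables with 180-degree periodicity, the angles would additionally need to be doubled):

```python
import numpy as np

def project_to_ring(responses, preferred_deg):
    """Map population responses to 2-D by weighting each neuron's response
    with the unit vector of its preferred angle (population vector).

    responses: (..., n_neurons); preferred_deg: (n_neurons,) in degrees."""
    ang = np.deg2rad(preferred_deg)
    return responses @ np.cos(ang), responses @ np.sin(ang)
```

For cosine-tuned neurons with uniformly spaced preferences, the resulting 2-D point lies at the stimulus angle, so responses to the full stimulus range trace out a ring as in Figure 2B, C.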

      5.3 In the model, the specific decoder is quite strongly linked to correlated variability, and the improvement of the general decoder is clear but incremental (0.66 vs 0.83), whereas in the data there really is no correlation at all (Fig 3c). This is a bit problematic because the authors begin by stating that specific decoders cannot explain the link between noise correlations and accuracy, yet their own specific decoder clearly shows a link.

      We appreciate this point and have modified the manuscript as below:

      “Indeed, we found that just as the performance of the physiological monkey’s decoder was more strongly related to mean correlated variability than the performance of the physiological specific decoder (Figure 3C; see Figure 3–figure supplement 1 for analyses per attention condition), the performance of the modeled general decoder was more strongly related to mean correlated variability than the performance of the modeled specific decoder (Figure 3D). We modeled much stronger relationships to correlated variability (Figure 3D) than observed with our physiological data (Figure 3C). We observed that the correlation with specific decoder performance was significant with the modeled data but not with the physiological data. This is not surprising as we saw attentional effects, albeit small ones, on specific decoder performance with both the physiological and the modeled data (Figure 3A, B). Even small attentional effects would result in a correlation between decoder performance and mean correlated variability with a large enough range of mean correlated variability values. It is possible that with enough electrophysiological data, the performance of the specific decoder would be significantly related to correlated variability, as well. As described above, our focus is not on whether the performance of any one decoder is significantly correlated with mean correlated variability, but on which decoder provides a better explanation of the frequently observed relationship between performance and mean correlated variability. The performance of the general decoder was more strongly related to mean correlated variability than the performance of the specific decoder.”

      “Our results suggest that the relationship between behavior and mean correlated variability is more consistent with observers using a more general strategy that employs the same neuronal weights for decoding any stimulus change.”

      6. General decoder: Some parts of the text (eg. Line 60, Line 413) refer to a decoder that accounts for discrimination along different stimulus dimensions (eg. different values of orientation, or different color of the visual input). But the results of the manuscripts are about a general decoder for multiple values along a single stimulus dimension. The disconnect should be discussed, and the relation between these two scenarios explained.

      We have modified the manuscript as below:

      “Here, we report the results of an initial test of this overarching hypothesis, based on a single stimulus dimension. We used a simple, well-studied behavioral task to test whether a more-general decoder (optimized for a broader range of stimulus values along a single dimension) better explained the relationship between behavior and mean correlated variability than a more-specific decoder (optimized for a narrower range of stimulus values along a single dimension). Specifically, we used a well-studied orientation change-detection task (Cohen & Maunsell, 2009) to test whether a general decoder for the full range of stimulus orientations better explained the relationship between behavior and mean correlated variability than a specific decoder for the orientation change presented in the behavioral trial at hand.

      This test based on a single stimulus dimension is an important initial test of the general decoder hypothesis because many of the studies that found that performance increased when mean correlated variability decreased used a change-detection task…”

      “We performed this initial test of the overarching general decoder hypothesis in the context of a change-detection task along a single stimulus dimension because this type of task was used in many of the studies that reported a relationship between perceptual performance and mean correlated variability (Cohen & Maunsell, 2009; 2011; Herrero et al., 2013; Luo & Maunsell, 2015; Mayo & Maunsell, 2016; Nandy et al., 2017; Ni et al., 2018; Ruff & Cohen, 2016; 2019; Verhoef & Maunsell, 2017; Yan et al., 2014; Zénon & Krauzlis, 2012). This simple and well-studied task provided an ideal initial test of our general decoder hypothesis.

      This initial test of the general decoder hypothesis suggests that a more general decoding strategy may explain observations in studies that use a variety of behavioral and stimulus conditions.”

      “This initial study of the general decoder hypothesis tested this idea in the context of a visual environment in which stimulus values only changed along a single dimension. However, our overarching hypothesis is that observers use a general decoding strategy in the complex and feature-rich visual scenes encountered in natural environments. In everyday environments, visual stimuli can change rapidly and unpredictably along many stimulus dimensions. The hypothesis that such a truly general decoder explains the relationship between perceptual performance and mean correlated variability is suggested by our finding that the modeled general decoder for orientation was more strongly related to mean correlated variability than the modeled specific decoder (Figure 3D). Future tests of a general decoder for multiple stimulus features would be needed to determine if this decoding strategy is used in the face of multiple changing stimulus features. Further, such tests would need to consider alternative hypotheses for how sensory information is decoded when observing multiple aspects of a stimulus (Berkes et al., 2009; Deneve, 2012; Lorteije et al., 2015). Studies that use complex or naturalistic visual stimuli may be ideal for further investigations of this hypothesis.”

      7. Some statements in the discussion such as l 354 "the relationship between behavior and mean correlated variability is explained by the hypothesis that observers use a general strategy" should be qualified : the authors clearly show that the general decoder amplifies the relationship but in their own data the relationship exists already with a specific decoder.

      We have modified the manuscript as below:

      “Our results suggest that the relationship between behavior and mean correlated variability is more consistent with observers using a more general strategy that employs the same neuronal weights for decoding any stimulus change.”

      “Together, these results support the hypothesis that observers use a more general decoding strategy in scenarios that require flexibility to changing stimulus conditions.”

      “This initial test of the general decoder hypothesis suggests that a more general decoding strategy may explain observations in studies that use a variety of behavioral and stimulus conditions.”

      8. Low-Dimensionality, beginning of Introduction and end of Discussion: experimentally, cortical activity is low-dimensional, and the proposed model captures that. But some of the reviewers did not understand the argument offered for why this matters for the relation between average correlations and performance. It seems that the dimensionality of the population covariance is not relevant: the point instead is that a change in the amplitude of fluctuations along the f'f' direction necessarily impacts performance of a "specific" decoder, whereas changes in all other dimensions can be accounted for by the appropriate weights of the "specific" decoder. On the other hand, changes in fluctuation strength along multiple directions may impact the performance of the "general" decoder.

      We have modified the manuscript as below:

      “These observations comprise a paradox because changes in this simple measure should have a minimal effect on information coding. Recent theoretical work shows that neuronal population decoders that extract the maximum amount of sensory information for the specific task at hand can easily ignore mean correlated noise (Kafashan et al., 2021; Kanitscheider et al., 2015b; Moreno-Bote et al., 2014; Pitkow et al., 2015; Rumyantsev et al., 2020; for review, see Kohn et al., 2016). Decoders for the specific task at hand can ignore mean correlated variability because it does not corrupt the dimensions of neuronal population space that are most informative about the stimulus (Moreno-Bote et al., 2014).”

      “Our results address a paradox in the literature. Electrophysiological and theoretical evidence supports that there is a relationship between mean correlated variability and perceptual performance (Abbott & Dayan, 1999; Clery et al., 2017; Haefner et al., 2013; Jin et al., 2019; Ni et al., 2018; Ruff & Cohen, 2019; reviewed by Ruff et al., 2018). Yet, a specific decoding strategy in which different sets of neuronal weights are used to decode different stimulus changes cannot easily explain this relationship (Kafashan et al., 2021; Kanitscheider et al., 2015b; Moreno-Bote et al., 2014; Pitkow et al., 2015; Rumyantsev et al., 2020; reviewed by Kohn et al., 2016). This is because specific decoders of neuronal population activity can easily ignore changes in mean correlated noise (Moreno-Bote et al., 2014).”

    1. Author Response:

      Reviewer #1 (Public Review):

      In this article, Bollmann and colleagues demonstrated both theoretically and experimentally that blood vessels could be targeted at the mesoscopic scale with time-of-flight magnetic resonance imaging (TOF-MRI). With a mathematical model that includes partial voluming effects explicitly, they outline how small voxels reduce the dependency of blood dwell time, a key parameter of the TOF sequence, on blood velocity. Through several experiments on three human subjects, they show that increasing resolution improves contrast and evaluate additional issues such as vessel displacement artifacts and the separation of veins and arteries.

      The overall presentation of the main finding, that small voxels are beneficial for mesoscopic pial vessels, is clear and well discussed, although difficult to grasp fully without a good prior understanding of the underlying TOF-MRI sequence principles. Results are convincing, and some of the data, both raw and processed, have been made publicly available. Visual inspections and comparisons of different scans are provided, although no quantification or statistical comparison of the results is included.

      Potential applications of the study are varied, from modeling more precisely functional MRI signals to assessing the health of small vessels. Overall, this article reopens a window on studying the vasculature of the human brain in great detail, for which studies have been surprisingly limited until recently.

      In summary, this article provides a clear demonstration that small pial vessels can indeed be imaged successfully with extremely high voxel resolution. There are however several concerns with the current manuscript, hopefully addressable within the study.

      Thank you very much for this encouraging review. While smaller voxel sizes theoretically benefit imaging of all blood vessels, we are specifically targeting the (small) pial arteries here, as the inflow effect in veins is unreliable and susceptibility-based contrasts are much better suited for this part of the vasculature. (We have clarified this in the revised manuscript by substituting ‘vessel’ with ‘artery’ wherever appropriate.) Using a partial-volume model and a relative contrast formulation, we find that the blood delivery time is not the limiting factor when imaging pial arteries, but the voxel size is. Taking into account the comparatively fast blood velocities even in pial arteries with diameters ≤ 200 µm (using t_delivery = l_voxel / v_blood), we find that blood dwell times are sufficiently long for the small voxel sizes considered here to employ the simpler formulation of the flow-related enhancement effect. In other words, small voxels eliminate blood dwell time as a consideration for the blood velocities expected in pial arteries.
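The dwell-time relation t_delivery = l_voxel / v_blood is simple enough to make concrete. In the sketch below, the voxel size and blood velocity values are illustrative assumptions, not measurements from the manuscript:

```python
def blood_delivery_time_ms(voxel_size_mm, blood_velocity_mm_per_s):
    """Time for blood to traverse one voxel, t_delivery = l_voxel / v_blood,
    returned in milliseconds."""
    return 1000.0 * voxel_size_mm / blood_velocity_mm_per_s

# Illustrative example: a 0.16 mm voxel and an assumed pial-artery blood
# velocity of 20 mm/s give a per-voxel dwell time of 8 ms.
dwell_ms = blood_delivery_time_ms(0.16, 20.0)
```

The key point of the argument survives any reasonable velocity choice: halving the voxel size halves the per-voxel dwell time, so for mesoscopic voxels the dwell time stays short relative to the repetition time.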

      We have extended the description of the TOF-MRA sequence in the revised manuscript, and all data and simulations/analyses presented in this manuscript are now publicly available at https://osf.io/nr6gc/ and https://gitlab.com/SaskiaB/pialvesseltof.git, respectively. This includes additional quantifications of the FRE effect for large vessels (adding to the assessment for small vessels already included), and the effect of voxel size on vessel segmentations.

      Main points:

      1) The manuscript needs clarifying through some additional background information for a readership wider than expert MR physicists. The TOF-MRA sequence and its underlying principles should be introduced first thing, even before discussing vascular anatomy, as it is the key to understanding what aspects of blood physiology and MRI parameters matter here. MR physics shorthand terms should be avoided or defined, as 'spins' or 'relaxation' are not obvious to everybody. The relationship between delivery time and slab thickness should be made clear as well.

      Thank you for this valuable comment that the Theory section is perhaps not accessible for all readers. We have adapted the manuscript in several locations to provide more background information and details on time-of-flight contrast. We found, however, that there is no concise way to first present the MR physics part and then introduce the pial arterial vasculature, as the optimization presented therein is targeted towards this structure. To address this comment, we have therefore opted to provide a brief introduction to TOF-MRA first in the Introduction, and then a more in-depth description in the Theory section.

      Introduction section:

      "Recent studies have shown the potential of time-of-flight (TOF) based magnetic resonance angiography (MRA) at 7 Tesla (T) in subcortical areas (Bouvy et al., 2016, 2014; Ladd, 2007; Mattern et al., 2018; Schulz et al., 2016; von Morze et al., 2007). In brief, TOF-MRA uses the high signal intensity caused by inflowing water protons in the blood to generate contrast, rather than an exogenous contrast agent. By adjusting the imaging parameters of a gradient-recalled echo (GRE) sequence, namely the repetition time (T_R) and flip angle, the signal from static tissue in the background can be suppressed, and high image intensities are only present in blood vessels freshly filled with non-saturated inflowing blood. As the blood flows through the vasculature within the imaging volume, its signal intensity slowly decreases. (For a comprehensive introduction to the principles of MRA, see for example Carr and Carroll (2012)). At ultra-high field, the increased signal-to-noise ratio (SNR), the longer T_1 relaxation times of blood and grey matter, and the potential for higher resolution are key benefits (von Morze et al., 2007)."

      Theory section:

      "Flow-related enhancement

      Before discussing the effects of vessel size, we briefly revisit the fundamental theory of the flow-related enhancement effect used in TOF-MRA. Taking into account the specific properties of pial arteries, we will then extend the classical description to this new regime. In general, TOF-MRA creates high signal intensities in arteries using inflowing blood as an endogenous contrast agent. The object magnetization—created through the interaction between the quantum mechanical spins of water protons and the magnetic field—provides the signal source (or magnetization) accessed via excitation with radiofrequency (RF) waves (called RF pulses) and the reception of ‘echo’ signals emitted by the sample around the same frequency. The T1-contrast in TOF-MRA is based on the difference in the steady-state magnetization of static tissue, which is continuously saturated by RF pulses during the imaging, and the increased or enhanced longitudinal magnetization of inflowing blood water spins, which have experienced no or few RF pulses. In other words, in TOF-MRA we see enhancement for blood that flows into the imaging volume."

      "Since the coverage or slab thickness in TOF-MRA is usually kept small to minimize blood delivery time by shortening the path-length of the vessel contained within the slab (Parker et al., 1991), and because we are focused here on the pial vasculature, we have limited our considerations to a maximum blood delivery time of 1000 ms, with values of few hundreds of milliseconds being more likely."

      2) The main discussion of higher resolution leading to improvements rather than loss presented here seems a bit one-sided: for a more objective understanding of the differences it would be worth to explicitly derive the 'classical' treatment and show how it leads to different conclusions than the present one. In particular, the link made in the discussion between using relative magnetization and modeling partial voluming seems unclear, as both are unrelated. One could also argue that in theory higher resolution imaging is always better, but of course there are practical considerations in play: SNR, dynamics of the measured effect vs speed of acquisition, motion, etc. These issues are not really integrated into the model, even though they provide strong constraints on what can be done. It would be good to at least discuss the constraints that 140 or 160 microns resolution imposes on what is achievable at present.

      Thank you for this excellent suggestion. We found it instructive to illustrate the different effects separately, i.e. relative vs. absolute FRE, and then partial-volume vs. no-partial-volume effects. In response to comment R2.8 of Reviewer 2, we also clarified the derivation of the relative FRE vs. the ‘classical’ absolute FRE (please see R2.8). Accordingly, the manuscript now includes the theoretical derivation in the Theory section and an explicit demonstration of how the classical treatment leads to different conclusions in the Supplementary Material. The important insight gained in our work is that only when considering relative FRE and partial-volume effects together can we conclude that smaller voxels are advantageous. We have added the following section in the Supplementary Material:

      "Effect of FRE Definition and Interaction with Partial-Volume Model

      For the definition of the FRE effect employed in this study, we used a measure of relative FRE (Al-Kwifi et al., 2002) in combination with a partial-volume model (Eq. 6). To illustrate the implications of these two effects, as well as their interaction, we have estimated the relative and absolute FRE for an artery with a diameter of 200 µm or 2 000 µm (i.e. no partial-volume effects at the centre of the vessel). The absolute FRE expression explicitly takes the voxel volume into account, and so instead of Eq. (6) for the relative FRE we used"

      Eq. (1)

      "Note that the division by M_zS^tissue⋅l_voxel^3 to obtain the relative FRE from this expression removes the contribution of the total voxel volume (l_voxel^3). Supplementary Figure 2 shows that, when partial volume effects are present, the highest relative FRE arises in voxels with the same size as or smaller than the vessel diameter (Supplementary Figure 2A), whereas the absolute FRE increases with voxel size (Supplementary Figure 2C). If no partial-volume effects are present, the relative FRE becomes independent of voxel size (Supplementary Figure 2B), whereas the absolute FRE increases with voxel size (Supplementary Figure 2D). While the partial-volume effects for the relative FRE are substantial, they are much more subtle when using the absolute FRE and do not alter the overall characteristics."

      Supplementary Figure 2: Effect of voxel size and blood delivery time on the flow-related enhancement (FRE) using either a relative (A,B) (Eq. (3)) or an absolute (C,D) (Eq. (12)) FRE definition, assuming a pial artery diameter of 200 μm (A,C) or 2 000 µm (B,D), i.e. no partial-volume effects at the central voxel of the artery considered here.
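      The opposite voxel-size dependence of the two definitions can be reproduced with a minimal numerical sketch. It assumes a straight cylindrical vessel crossing a cubic voxel and arbitrary placeholder magnetization values; it illustrates the argument rather than reproducing the exact supplementary model.

```python
import math

def fre(l_voxel, d_vessel, m_blood=1.0, m_tissue=0.2):
    """Absolute and relative FRE for a cubic voxel of edge l_voxel (mm)
    crossed by a cylindrical vessel of diameter d_vessel (mm).
    m_blood and m_tissue are placeholder magnetization values."""
    if d_vessel >= l_voxel:
        v_vessel = l_voxel ** 3  # voxel lies entirely within the vessel lumen
    else:
        # partial volume: cylinder segment of length l_voxel inside the voxel
        v_vessel = math.pi * (d_vessel / 2) ** 2 * l_voxel
    absolute = v_vessel * (m_blood - m_tissue)
    relative = absolute / (m_tissue * l_voxel ** 3)
    return absolute, relative
```

      For a 200 µm vessel, shrinking the voxel from 0.8 mm to 0.3 mm raises the relative FRE but lowers the absolute FRE; for a 2 mm vessel the relative FRE is constant while the absolute FRE grows with voxel size, mirroring panels A–D of Supplementary Figure 2.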

      In addition, we have also clarified the contribution of the two definitions and their interaction in the Discussion section. Following the suggestion of Reviewer 2, we have extended our interpretation of relative FRE. In brief, absolute FRE is closely related to the physical origin of the contrast, whereas relative FRE is much more concerned with the “segmentability” of a vessel (please see R2.8 for more details):

      "Extending classical FRE treatments to the pial vasculature

      There are several major modifications in our approach to this topic that might explain why, in contrast to predictions from classical FRE treatments, it is indeed possible to image pial arteries. For instance, the definition of vessel contrast or flow-related enhancement is often stated as an absolute difference between blood and tissue signal (Brown et al., 2014a; Carr and Carroll, 2012; Du et al., 1993, 1996; Haacke et al., 1990; Venkatesan and Haacke, 1997). Here, however, we follow the approach of Al-Kwifi et al. (2002) and consider relative contrast. While this distinction may seem to be semantic, the effect of voxel volume on FRE for these two definitions is exactly opposite: Du et al. (1996) concluded that larger voxel size increases the (absolute) vessel-background contrast, whereas here we predict an increase in relative FRE for small arteries with decreasing voxel size. Therefore, predictions of the depiction of small arteries with decreasing voxel size differ depending on whether one is considering absolute contrast, i.e. difference in longitudinal magnetization, or relative contrast, i.e. contrast differences independent of total voxel size. Importantly, this prediction changes for large arteries where the voxel contains only vessel lumen, in which case the relative FRE remains constant across voxel sizes, but the absolute FRE increases with voxel size (Supplementary Figure 2). Overall, the interpretations of relative and absolute FRE differ, and one measure may be more appropriate for certain applications than the other. Absolute FRE describes the difference in magnetization and is thus tightly linked to the underlying physical mechanism. Relative FRE, however, describes the image contrast and segmentability. If blood and tissue magnetization are equal, both contrast measures would equal zero and indicate that no contrast difference is present. 
However, when there is signal in the vessel and as the tissue magnetization approaches zero, the absolute FRE approaches the blood magnetization (assuming no partial-volume effects), whereas the relative FRE approaches infinity. While this infinite relative FRE does not directly relate to the underlying physical process of ‘infinite’ signal enhancement through inflowing blood, it instead characterizes the segmentability of the image, in that an image with zero intensity in the background and non-zero values in the structures of interest can be segmented perfectly and trivially. Accordingly, numerous empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007) and the data provided here (Figures 5, 6 and 7) have shown the benefit of smaller voxel sizes if the aim is to visualize and segment small arteries."

      Note that our formulation of the FRE—even without considering SNR—does not suggest that higher resolution is always better, but instead should be matched to the size of the target arteries:

      "Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      Further, we have also extended the concluding paragraph of the Imaging limitation section to also include a practical perspective:

      "In summary, numerous theoretical and practical considerations remain for optimal imaging of pial arteries using time-of-flight contrast. Depending on the application, advanced displacement artefact compensation strategies may be required, and zero-filling could provide better vessel depiction. Further, an optimal trade-off between SNR, voxel size and acquisition time needs to be found. Currently, the partial-volume FRE model only considers voxel size, and—as we reduced the voxel size in the experiments—we (partially) compensated the reduction in SNR through longer scan times. This, ultimately, also required the use of prospective motion correction to enable the very long acquisition times necessary for 140 µm isotropic voxel size. Often, anisotropic voxels are used to reduce acquisition time and increase SNR while maintaining in-plane resolution. This may indeed prove advantageous when the (also highly anisotropic) arteries align with the anisotropic acquisition, e.g. when imaging the large supplying arteries oriented mostly in the head-foot direction. In the case of pial arteries, however, there is not preferred orientation because of the convoluted nature of the pial arterial vasculature encapsulating the complex folding of the cortex (see section Anatomical architecture of the pial arterial vasculature). A further reduction in voxel size may be possible in dedicated research settings utilizing even longer acquisition times and/or larger acquisition volumes to maintain SNR. However, if acquisition time is limited, voxel size and SNR need to be carefully balanced against each other."

      3) The article seems to imply that TOF-MRA is the only adequate technique to image brain vasculature, while T2 mapping, UHF T1 mapping (see e.g. Choi et al., https://doi.org/10.1016/j.neuroimage.2020.117259) phase (e.g. Fan et al., doi:10.1038/jcbfm.2014.187), QSM (see e.g. Huck et al., https://doi.org/10.1007/s00429-019-01919-4), or a combination (Bernier et al., https://doi.org/10.1002/hbm.24337​, Ward et al., https://doi.org/10.1016/j.neuroimage.2017.10.049) all depict some level of vascular detail. It would be worth quickly reviewing the different effects of blood on MRI contrast and how those have been used in different approaches to measure vasculature. This would in particular help clarify the experiment combining TOF with T2 mapping used to separate arteries from veins (more on this question below).

      We apologize if we inadvertently created the impression that TOF-MRA is a suitable technique to image the complete brain vasculature, and we agree that susceptibility-based methods are much more suitable for venous structures. As outlined above, we have revised the manuscript in various sections to indicate that it is the pial arterial vasculature we are targeting. We have added a statement on imaging the venous vasculature in the Discussion section. Please see our response below regarding the use of T2* to separate arteries and veins.

      "The advantages of imaging the pial arterial vasculature using TOF-MRA without an exogenous contrast agent lie in its non-invasiveness and the potential to combine these data with various other structural and functional image contrasts provided by MRI. One common application is to acquire a velocity-encoded contrast such as phase-contrast MRA (Arts et al., 2021; Bouvy et al., 2016). Another interesting approach utilises the inherent time-of-flight contrast in magnetization-prepared two rapid acquisition gradient echo (MP2RAGE) images acquired at ultra-high field that simultaneously acquires vasculature and structural data, albeit at lower achievable resolution and lower FRE compared to the TOF-MRA data in our study (Choi et al., 2020). In summary, we expect high-resolution TOF-MRA to be applicable also for group studies to address numerous questions regarding the relationship of arterial topology and morphometry to the anatomical and functional organization of the brain, and the influence of arterial topology and morphometry on brain hemodynamics in humans. In addition, imaging of the pial venous vasculature—using susceptibility-based contrasts such as T2-weighted magnitude (Gulban et al., 2021) or phase imaging (Fan et al., 2015), susceptibility-weighted imaging (SWI) (Eckstein et al., 2021; Reichenbach et al., 1997) or quantitative susceptibility mapping (QSM) (Bernier et al., 2018; Huck et al., 2019; Mattern et al., 2019; Ward et al., 2018)—would enable a comprehensive assessment of the complete cortical vasculature and how both arteries and veins shape brain hemodynamics.*"

      4) The results, while very impressive, are mostly qualitative. This seems a missed opportunity to strengthen the points of the paper: given the segmentations already made, the amount/density of detected vessels could be compared across scans for the data of Fig. 5 and 7. The minimum distance between vessels could be measured in Fig. 8 to show a 2D distribution and/or a spatial map of the displacement. The number of vessels labeled as veins instead of arteries in Fig. 9 could be given.

      We fully agree that estimating these quantitative measures would be very interesting; however, this would require the development of a comprehensive analysis framework, which would considerably shift the focus of this paper from data acquisition and flow-related enhancement to data analysis. As noted in the discussion section Challenges for vessel segmentation algorithms, ‘The vessel segmentations presented here were performed to illustrate the sensitivity of the image acquisition to small pial arteries’, because the smallest arteries tend to be concealed in the maximum intensity projections. Further, the interpretation of these measures is not straightforward. For example, the number of detected vessels for the artery depicted in Figure 5 does not change across resolutions, but their length does. We have therefore estimated the relative increase in skeleton length across resolutions for Figures 5 and 7. However, these estimates are not only a function of the voxel size but also of the underlying vasculature, i.e. the number of arteries with a certain diameter present, and may thus not generalise well to enable quantitative predictions of the improvement expected from increased resolutions. We have added an illustration of these analyses in the Supplementary Material, and the following additions in the Methods, Results and Discussion sections.

      "For vessel segmentation, a semi-automatic segmentation pipeline was implemented in Matlab R2020a (The MathWorks, Natick, MA) using the UniQC toolbox (Frässle et al., 2021): First, a brain mask was created through thresholding which was then manually corrected in ITK-SNAP (http://www.itksnap.org/) (Yushkevich et al., 2006) such that pial vessels were included. For the high-resolution TOF data (Figures 6 and 7, Supplementary Figure 4), denoising to remove high frequency noise was performed using the implementation of an adaptive non-local means denoising algorithm (Manjón et al., 2010) provided in DenoiseImage within the ANTs toolbox, with the search radius for the denoising set to 5 voxels and noise type set to Rician. Next, the brain mask was applied to the bias corrected and denoised data (if applicable). Then, a vessel mask was created based on a manually defined threshold, and clusters with less than 10 or 5 voxels for the high- and low-resolution acquisitions, respectively, were removed from the vessel mask. Finally, an iterative region-growing procedure starting at each voxel of the initial vessel mask was applied that successively included additional voxels into the vessel mask if they were connected to a voxel which was already included and above a manually defined threshold (which was slightly lower than the previous threshold). Both thresholds were applied globally but manually adjusted for each slab. No correction for motion between slabs was applied. The Matlab code describing the segmentation algorithm as well as the analysis of the two-echo TOF acquisition outlined in the following paragraph are also included in our github repository (https://gitlab.com/SaskiaB/pialvesseltof.git). To assess the data quality, maximum intensity projections (MIPs) were created and the outline of the segmentation MIPs were added as an overlay. 
To estimate the increased detection of vessels with higher resolutions, we computed the relative increase in the length of the segmented vessels for the data presented in Figure 5 (0.8 mm, 0.5 mm, 0.4 mm and 0.3 mm isotropic voxel size) and Figure 7 (0.16 mm and 0.14 mm isotropic voxel size) by computing the skeleton using the bwskel Matlab function and then calculating the skeleton length as the number of voxels in the skeleton multiplied by the voxel size."
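      The skeleton-length estimate in this paragraph amounts to a voxel count multiplied by the voxel size; the pipeline itself uses Matlab's bwskel, but the bookkeeping can be sketched in Python, assuming a precomputed binary skeleton mask:

```python
import numpy as np

def skeleton_length(skeleton, voxel_size):
    """Skeleton length (mm): number of skeleton voxels times voxel size,
    mirroring the bwskel-based estimate described above."""
    return np.count_nonzero(skeleton) * voxel_size

def relative_increase(length_high_res, length_low_res):
    """Relative increase (%) in skeleton length from the low-resolution to
    the high-resolution segmentation."""
    return 100.0 * (length_high_res - length_low_res) / length_low_res
```

      For example, a straight 50-voxel skeleton at 0.8 mm isotropic voxel size corresponds to a length of 40 mm.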

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result. Figure 5 shows thin maximum intensity projections of a small vessel. While the vessel is not detectable at the largest voxel size, it slowly emerges as the voxel size decreases and approaches the vessel size. Presumably, this is driven by the considerable increase in FRE as seen in the single slice view (Figure 5, small inserts). Accordingly, the FRE computed from the vessel mask for the smallest part of the vessel (Figure 5, red mask) increases substantially with decreasing voxel size. More precisely, reducing the voxel size from 0.8 mm, 0.5 mm or 0.4 mm to 0.3 mm increases the FRE by 2900 %, 165 % and 85 %, respectively. Assuming a vessel diameter of 300 μm, the partial-volume FRE model (section Introducing a partial-volume model) would predict similar ratios of 611%, 178% and 78%. However, as long as the vessel is larger than the voxel (Figure 5, blue mask), the relative FRE does not change with resolution (see also Effect of FRE Definition and Interaction with Partial-Volume Model in the Supplementary Material). To illustrate the gain in sensitivity to detect smaller arteries, we have estimated the relative increase of the total length of the segmented vasculature (Supplementary Figure 9): reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 %, reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 %, and reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 %. 
In summary, when imaging small pial arteries, these data support the hypothesis that it is primarily the voxel size, not the blood delivery time, which determines whether vessels can be resolved."
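      The model predictions quoted above (611 %, 178 % and 78 %) can be reproduced with a one-line calculation: for a straight vessel thinner than the voxel, the partial-volume relative FRE scales with 1/l_voxel², so the vessel diameter cancels out of the ratio. This is a sketch under that cylindrical-vessel assumption, not the full model of Eq. (6):

```python
def predicted_fre_increase(l_low, l_high):
    """Predicted relative-FRE increase (%) when reducing the voxel size from
    l_low to l_high, for a vessel thinner than both voxel sizes. Under the
    cylindrical partial-volume assumption the relative FRE scales as
    1 / l_voxel**2, so the diameter-dependent prefactor cancels."""
    return 100.0 * ((l_low / l_high) ** 2 - 1.0)

# Reducing 0.8 mm, 0.5 mm or 0.4 mm voxels to 0.3 mm:
increases = [predicted_fre_increase(l, 0.3) for l in (0.8, 0.5, 0.4)]
# -> approximately 611 %, 178 % and 78 %
```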

      "Indeed, the reduction in voxel volume by 33 % revealed additional small branches connected to larger arteries (see also Supplementary Figure 8). For this example, we found an overall increase in skeleton length of 14 % (see also Supplementary Figure 9)."

      "We therefore expect this strategy to enable an efficient image acquisition without the need for additional venous suppression RF pulses. Once these challenges for vessel segmentation algorithms are addressed, a thorough quantification of the arterial vasculature can be performed. For example, the skeletonization procedure used to estimate the increase of the total length of the segmented vasculature (Supplementary Figure 9) exhibits errors particularly in the unwanted sinuses and large veins. While they are consistently present across voxel sizes, and thus may have less impact on relative change in skeleton length, they need to be addressed when estimating the absolute length of the vasculature, or other higher-order features such as number of new branches. (Note that we have also performed the skeletonization procedure on the maximum intensity projections to reduce the number of artefacts and obtained comparable results: reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 % (3D) vs 37 % (2D), reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 % (3D) vs 26 % (2D), reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 % (3D) vs 16 % (2D), and reducing the voxel size from 0.16 mm to 0.14 mm isotropic increases the skeleton length by 14 % (3D) vs 24 % (2D).)"

      Supplementary Figure 9: Increase of vessel skeleton length with voxel size reduction. Axial maximum intensity projections for data acquired with different voxel sizes ranging from 0.8 mm to 0.3 mm (TOP) (corresponding to Figure 5) and 0.16 mm to 0.14 mm isotropic (corresponding to Figure 7) are shown. Vessel skeletons derived from segmentations performed for each resolution are overlaid in red. A reduction in voxel size is accompanied by a corresponding increase in vessel skeleton length.

      Regarding further quantification of the vessel displacement presented in Figure 8, we have estimated the displacement using the Horn-Schunck optical flow estimator (Horn and Schunck, 1981; Mustafa, 2016) (https://github.com/Mustafa3946/Horn-Schunck-3D-Optical-Flow). However, the results are dominated by the larger arteries, whereas we are mostly interested in the displacement of the smallest arteries; this quantification may therefore not be helpful.

      Because the theoretical relationship between vessel displacement and blood velocity is well known (Eq. 7), and we have also outlined the expected blood velocity as a function of arterial diameter in Figure 2, which provided estimates of displacements that matched what was found in our data (as reported in our original submission), we believe that the new quantification in this form does not add value to the manuscript. What would be interesting would be to explore the use of this displacement artefact as a measure of blood velocities. This, however, would require more substantial analyses in particular for estimation of the arterial diameter and additional validation data (e.g. phase-contrast MRA). We have outlined this avenue in the Discussion section. What is relevant to the main aim of this study, namely imaging of small pial arteries, is the insight that blood velocities are indeed sufficiently fast to cause displacement artefacts even in smaller arteries. We have clarified this in the Results section:

      "Note that correction techniques exist to remove displaced vessels from the image (Gulban et al., 2021), but they cannot revert the vessels to their original location. Alternatively, this artefact could also potentially be utilised as a rough measure of blood velocity."

      "At a delay time of 10 ms between phase encoding and echo time, the observed displacement of approximately 2 mm in some of the larger vessels would correspond to a blood velocity of 200 mm/s, which is well within the expected range (Figure 2). For the smallest arteries, a displacement of one voxel (0.4 mm) can be observed, indicative of blood velocities of 40 mm/s. Note that the vessel displacement can be observed in all vessels visible at this resolution, indicating high blood velocities throughout much of the pial arterial vasculature. Thus, assuming a blood velocity of 40 mm/s (Figure 2) and a delay time of 5 ms for the high-resolution acquisitions (Figure 6), vessel displacements of 0.2 mm are possible, representing a shift of 1–2 voxels."

      Regarding the number of vessels labelled as veins, please see our response below to R1.5.

      In the main quantification given, the estimation of FRE increase with resolution, it would make more sense to perform the segmentation independently for each scan and estimate the corresponding FRE: using the mask from the highest resolution scan only biases the results. It is unclear also if the background tissue measurement one voxel outside took partial voluming into account (by leaving a one voxel free interface between vessel and background). In this analysis, it would also be interesting to estimate SNR, so you can compare SNR and FRE across resolutions, also helpful for the discussion on SNR.

      The FRE serves as an indicator of the potential performance of any segmentation algorithm (including manual segmentation) (also see our discussion on the interpretation of FRE in our response to R1.2). If we were to segment each scan individually, we would, in the ideal case, always obtain the same FRE estimate, as FRE influences the performance of the segmentation algorithm. In practice, this simply means that it is not possible to segment the vessel in the low-resolution image to the full extent that is visible in the high-resolution image, because the FRE is too low for small vessels. However, we agree with the core point that the reviewer is making, and so to help address this, we have compared the FRE for the section of a vessel that is visible at all resolutions, where we found—within the accuracy of the transformations and resampling across such vastly different resolutions—that the FRE does not increase any further with higher resolution if the vessel is larger than the voxel size (page 18 and Figure 5). As stated in the Methods section, and as noted by the reviewer, we used the voxels immediately next to the vessel mask to define the background tissue signal level. Any resulting potential partial-volume effects in these background voxels would affect all voxel sizes, introducing a consistent bias that would not impact our comparison. However, inspection of the image data in Figure 5 showed partial-volume effects predominantly within those voxels intersecting the vessel, rather than voxels surrounding the vessel, in agreement with our model of FRE.

      "All imaging data were slab-wise bias-field corrected using the N4BiasFieldCorrection (Tustison et al., 2010) tool in ANTs (Avants et al., 2009) with the default parameters. To compare the empirical FRE across the four different resolutions (Figure 5), manual masks were first created for the smallest part of the vessel in the image with the highest resolution and for the largest part of the vessel in the image with the lowest resolution. Then, rigid-body transformation parameters from the low-resolution to the high-resolution (and the high-resolution to the low-resolution) images were estimated using coregister in SPM (https://www.fil.ion.ucl.ac.uk/spm/), and their inverse was applied to the vessel mask using SPM’s reslice. To calculate the empirical FRE (Eq. (3)), the mean of the intensity values within the vessel mask was used to approximate the blood magnetization, and the mean of the intensity values one voxel outside of the vessel mask was used as the tissue magnetization."

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result. Figure 5 shows thin maximum intensity projections of a small vessel. While the vessel is not detectable at the largest voxel size, it slowly emerges as the voxel size decreases and approaches the vessel size. Presumably, this is driven by the considerable increase in FRE as seen in the single slice view (Figure 5, small inserts). Accordingly, the FRE computed from the vessel mask for the smallest part of the vessel (Figure 5, red mask) increases substantially with decreasing voxel size. More precisely, reducing the voxel size from 0.8 mm, 0.5 mm or 0.4 mm to 0.3 mm increases the FRE by 2900 %, 165 % and 85 %, respectively. Assuming a vessel diameter of 300 μm, the partial-volume FRE model (section Introducing a partial-volume model) would predict similar ratios of 611%, 178% and 78%. However, if the vessel is larger than the voxel (Figure 5, blue mask), the relative FRE remains constant across resolutions (see also Effect of FRE Definition and Interaction with Partial-Volume Model in the Supplementary Material). To illustrate the gain in sensitivity to smaller arteries, we have estimated the relative increase of the total length of the segmented vasculature (Supplementary Figure 9): reducing the voxel size from 0.8 mm to 0.5 mm isotropic increases the skeleton length by 44 %, reducing the voxel size from 0.5 mm to 0.4 mm isotropic increases the skeleton length by 28 %, and reducing the voxel size from 0.4 mm to 0.3 mm isotropic increases the skeleton length by 31 %. 
In summary, when imaging small pial arteries, these data support the hypothesis that it is primarily the voxel size, not blood delivery time, which determines whether vessels can be resolved."

      Figure 5: Effect of voxel size on flow-related vessel enhancement. Thin axial maximum intensity projections containing a small artery acquired with different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic are shown. The FRE is estimated using the mean intensity value within the vessel masks depicted on the left, and the mean intensity values of the surrounding tissue. The small insert shows a section of the artery as it lies within a single slice. A reduction in voxel size is accompanied by a corresponding increase in FRE (red mask), whereas no further increase is obtained once the voxel size is equal or smaller than the vessel size (blue mask).

      After many internal discussions, we had to conclude that deriving a meaningful SNR analysis that would benefit the reader was not possible given the available data, due to the complex relationship between voxel size and other imaging parameters in practice. In detail, we have reduced the voxel size but at the same time increased the acquisition time by increasing the number of encoding steps—which we have now also highlighted in the manuscript. We have, however, added additional considerations about balancing SNR and segmentation performance. Note that these considerations are not specific to imaging the pial arteries but apply to all MRA acquisitions, and have thus been discussed previously in the literature. Here, we wanted to focus on the novel insights gained in our study. Importantly, while we previously noted that reducing voxel size improves contrast in vessels whose diameters are smaller than the voxel size, we now explicitly acknowledge that, for vessels whose diameters are larger than the voxel size, reducing the voxel size is not helpful, since it only reduces SNR without any gain in contrast, and may hinder segmentation performance, thus becoming counterproductive.

      "In general, we have not considered SNR, but only FRE, i.e. the (relative) image contrast, assuming that segmentation algorithms would benefit from higher contrast for smaller arteries. Importantly, the acquisition parameters available to maximize FRE are limited, namely repetition time, flip angle and voxel size. SNR, however, can be improved via numerous avenues independent of these parameters (Brown et al., 2014b; Du et al., 1996; Heverhagen et al., 2008; Parker et al., 1991; Triantafyllou et al., 2011; Venkatesan and Haacke, 1997), the simplest being longer acquisition times. If the aim is to optimize a segmentation outcome for a given acquisition time, the trade-off between contrast and SNR for the specific segmentation algorithm needs to be determined (Klepaczko et al., 2016; Lesage et al., 2009; Moccia et al., 2018; Phellan and Forkert, 2017). Our own—albeit limited—experience has shown that segmentation algorithms (including manual segmentation) can accommodate a perhaps surprising amount of noise using prior knowledge and neighborhood information, making these high-resolution acquisitions possible. Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      5) The separation of arterial and venous components is a bit puzzling, partly because the methodology used is not fully explained, but also partly because the reasons invoked (flow artefact in large pial veins) do not match the results (many small vessels are included as veins). This question of separating both types of vessels is quite important for applications, so the whole procedure should be explained in detail. The use of short T2 seemed also sub-optimal, as both arteries and veins result in shorter T2 compared to most brain tissues: wouldn't a susceptibility-based measure (SWI or better QSM) provide a better separation? Finally, since the T2* map and the regular TOF map are at different resolutions, masking out the vessels labeled as veins will likely result in the smaller veins being left out.

      We agree that while the technical details of this approach were provided in the Data analysis section, the rationale behind it was only briefly mentioned. We have therefore included an additional section Inflow-artefacts in sinuses and pial veins in the Theory section of the manuscript. We have also extended the discussion of the advantages and disadvantages of the different susceptibility-based contrasts, namely T2*, SWI and QSM. While in theory both T2* and QSM should allow the reliable differentiation of arterial and venous blood, we found T2* to perform more robustly: QSM can fail in many places, e.g. within the superior sagittal and transverse sinuses and pial veins, where the strong susceptibility sources and the proximity to the brain surface mean that dedicated processing is required (Stewart et al., 2022). Further, we have also elaborated in the Discussion section why the interpretation of Figure 9 regarding the absence or presence of small veins is challenging. Namely, the intensity-based segmentation used here provides only an incomplete segmentation even of the larger sinuses, because the overall lower intensity in veins, combined with the heterogeneity of venous intensities, violates the assumption of homogeneous, high image intensities within vessels made by most vascular segmentation approaches, an assumption that is satisfied in arteries (page 29f) (see also the illustration below). Accordingly, quantifying the number of vessels labelled as veins (R1.4a) would provide misleading results, as often only small subsets of the same sinus or vein are segmented.

      "Inflow-artefacts in sinuses and pial veins

      Inflow in large pial veins and the sagittal and transverse sinuses can cause flow-related enhancement in these non-arterial vessels. One common strategy to remove this unwanted signal enhancement is to apply venous suppression pulses during the data acquisition, which saturate blood spins outside the imaging slab. Disadvantages of this technique are the technical challenges of applying these pulses at ultra-high field due to constraints of the specific absorption rate (SAR) and the necessary increase in acquisition time (Conolly et al., 1988; Heverhagen et al., 2008; Johst et al., 2012; Maderwald et al., 2008; Schmitter et al., 2012; Zhang et al., 2015). In addition, optimal positioning of the saturation slab in the case of pial arteries requires further investigation, and in particular suppressing signal from the superior sagittal sinus without interfering in the imaging of the pial arterial vasculature at the top of the cortex might prove challenging. Furthermore, this venous saturation strategy is based on the assumption that arterial blood is traveling head-wards while venous blood is drained foot-wards. For the complex and convoluted trajectory of pial vessels this directionality-based saturation might be oversimplified, particularly when considering the higher-order branches of the pial arteries and veins on the cortical surface. Inspired by techniques to simultaneously acquire a TOF image for angiography and a susceptibility-weighted image for venography (Bae et al., 2010; Deistung et al., 2009; Du et al., 1994; Du and Jin, 2008), we set out to explore the possibility of removing unwanted venous structures from the segmentation of the pial arterial vasculature during data postprocessing.
Because arteries filled with oxygenated blood have T2*-values similar to tissue, while veins have much shorter T2*-values due to the presence of deoxygenated blood (Pauling and Coryell, 1936; Peters et al., 2007; Uludağ et al., 2009; Zhao et al., 2007), we used this criterion to remove vessels with short T2* values from the segmentation (see Data Analysis for details). In addition, we also explored whether unwanted venous structures in the high-resolution TOF images—where a two-echo acquisition is not feasible due to the longer readout—can be removed based on detecting them in a lower-resolution image."
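The two-echo T2* criterion can be illustrated with a mono-exponential two-point fit; this is a minimal sketch, and the echo times, signal values and the 20 ms vein/artery cut-off below are assumed illustrative values, not the parameters used in the manuscript:

```python
import numpy as np

def t2star_from_two_echoes(s1, s2, te1, te2):
    """Mono-exponential T2* estimate from a two-echo acquisition:
    S(TE) = S0 * exp(-TE / T2*)  =>  T2* = (TE2 - TE1) / ln(S1 / S2)
    """
    return (te2 - te1) / np.log(s1 / s2)

# Toy values (ms): one artery-like and one vein-like signal decay
te1, te2 = 5.0, 15.0
s1 = 100.0
s2 = np.array([80.0, 30.0])          # slow vs fast decay between echoes
t2s = t2star_from_two_echoes(s1, s2, te1, te2)
is_vein = t2s < 20.0                 # assumed threshold, illustration only

print(t2s)                           # artery-like vs vein-like T2* (ms)
print(is_vein)
```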

      "Removal of pial veins

      Inflow in large pial veins and the superior sagittal and transverse sinuses can cause a flow-related enhancement in these non-arterial vessels (Figure 9, left). The higher concentration of deoxygenated haemoglobin in these vessels leads to shorter T2* values (Pauling and Coryell, 1936), which can be estimated using a two-echo TOF acquisition (see also Inflow-artefacts in sinuses and pial veins). These vessels can be identified in the segmentation based on their T2* values (Figure 9, left), and removed from the angiogram (Figure 9, right) (Bae et al., 2010; Deistung et al., 2009; Du et al., 1994; Du and Jin, 2008). In particular, the superior and inferior sagittal and the transverse sinuses, as well as large veins which exhibited an inhomogeneous intensity profile and a steep loss of intensity at the slab boundary, were identified as non-arterial (Figure 9, left). Further, we also explored the option of removing unwanted venous vessels from the high-resolution TOF image (Figure 7) using a low-resolution two-echo TOF (not shown). This indeed allowed us to remove the strong signal enhancement in the sagittal sinuses and numerous larger veins, although some small veins, which are characterised by inhomogeneous intensity profiles and can be detected visually by experienced raters, remain."

      Figure 9: Removal of non-arterial vessels in time-of-flight imaging. LEFT: Segmentation of arteries (red) and veins (blue) using T_2^* estimates. RIGHT: Time-of-flight angiogram after vein removal.

      Our approach also assumes that the unwanted veins are large enough that they are also resolved in the low-resolution image. If we consider the source of the FRE effect, it might indeed be exclusively large veins that are present in TOF-MRA data, which would suggest that our assumption is valid. Fundamentally, the FRE depends on the inflow of unsaturated spins into the imaging slab. However, small veins drain capillary beds in the local tissue, i.e. the tissue within the slab. (Note that due to the slice oversampling implemented in our acquisition, spins just above or below the slab will also be excited.) Thus, small veins only contain blood water spins that have experienced a large number of RF pulses due to the long transit time through the pial arterial vasculature, the capillaries and the intracortical venules. Hence, their longitudinal magnetization would be similar to that of stationary tissue. To generate an FRE effect in veins, “pass-through” venous blood from outside the imaging slab is required. This is only available in veins that are passing through the imaging slab, which have much larger diameters. These theoretical considerations are corroborated by the findings in Figure 9, where large disconnected vessels with varying intensity profiles were identified as non-arterial. Due to the heterogeneous intensity profiles in large veins and the sagittal and transverse sinuses, the intensity-based segmentation applied here may only label a subset of the vessel lumen, creating the impression of many small veins. This is particularly the case for the straight and inferior sagittal sinus in the bottom slab of Figure 9.
Nevertheless, future studies potentially combining anatomical prior knowledge, advanced segmentation algorithms and susceptibility measures would be capable of removing these unwanted veins in post-processing to enable an efficient TOF-MRA image acquisition dedicated to optimally detecting small arteries without the need for additional venous suppression RF pulses.

      6) A more general question also is why this imaging method is limited to pial vessels: at 140 microns, the larger intra-cortical vessels should be appearing (group 6 in Duvernoy, 1981: diameters between 50 and 240 microns). Are there other reasons these vessels are not detected? Similarly, it seems there is no arterial vasculature detected in the white matter here: it is due to the rather superior location of the imaging slab, or a limitation of the method? Likewise, all three results focus on a rather homogeneous region of cerebral cortex, in terms of vascularisation. It would be interesting for applications to demonstrate the capabilities of the method in more complex regions, e.g. the densely vascularised cerebellum, or more heterogeneous regions like the midbrain. Finally, it is notable that all three subjects appear to have rather different densities of vessels, from sparse (participant II) to dense (participant I), with some inhomogeneities in density (frontal region in participant III) and inconsistencies in detection (sinuses absent in participant II). All these points should be discussed.

      While we are aware that the diameter of intracortical arteries has been suggested to be up to 240 µm (Duvernoy et al., 1981), it remains unclear how prevalent intracortical arteries of this size are. For example, note that in a different context in the Duvernoy study (in the revised manuscript), the following values are mentioned (which we followed in Figure 1):

      “Central arteries of the lobule always have a large diameter of 260 µ to 280 µ, at their origin. Peripheral arteries have an average diameter of 150 µ to 180 µ. At the cortex surface, all arterioles of 50 µ or less, penetrate the cortex or form anastomoses. The diameter of most of these penetrating arteries is approximately 40 µ.”

      Further, the examinations by Hirsch et al. (2012) (albeit in the macaque brain) showed one (exemplary) intracortical artery belonging to group 6 (Figure 1B), whose diameter appears to be below 100 µm. Given these discrepancies and the fact that intracortical arteries in group 5 only reach 75 µm, we suspect that intracortical arteries with diameters > 140 µm are a very rare occurrence, which we might not have encountered in this data set.

      Similarly, arteries in white matter (Nonaka et al., 2003) and the cerebellum (Duvernoy et al., 1983) are beyond our resolution at the moment. The midbrain is an interesting suggestion, although we believe that the cortical areas chosen here, with their gradual reduction in diameter along the vascular tree, provide a better illustration of the effect of voxel size than the rather abrupt reduction in vascular diameter found in the midbrain. We have noted the even higher resolution requirements in the Discussion section:

      "In summary, we expect high-resolution TOF-MRA to be applicable also for group studies, to address numerous questions regarding the relationship of arterial topology and morphometry to the anatomical and functional organization of the brain, and the influence of arterial topology and morphometry on brain hemodynamics in humans. Notably, we have focused on imaging pial arteries of the human cerebrum; however, other brain structures such as the cerebellum, subcortex and white matter are of course also of interest. While the same theoretical considerations apply, imaging the arterial vasculature in these structures will require even smaller voxel sizes due to their smaller arterial diameters (Duvernoy et al., 1983, 1981; Nonaka et al., 2003)."

      Regarding the apparent sparsity of results from participant II, this is mostly driven by the much smaller coverage in this subject (19.6 mm in Participant II vs. 50 mm and 58 mm in Participants I and III, respectively). The reduction in density in the frontal regions might indeed constitute a difference in anatomy or might be driven by the presence of more false-positive veins in Participant I than Participant III in these areas. Following the depiction in Duvernoy et al. (1981), one would not expect large arteries in frontal areas, but large veins are common. Thus, the additional vessels in Participant I in the frontal areas might well be false-positive veins, and their removal would result in similar densities for both participants. Indeed, as pointed out in section Future directions, we would expect a lower arterial density in frontal and posterior areas than in middle areas. The sinuses (and other large false-positive veins) in Participant II have been removed as outlined and discussed in sections Removal of pial veins and Challenges for vessel segmentation algorithms, respectively.

      7) One of the main practical limitations of the proposed method is the use of a very small imaging slab. It is mentioned in the discussion that thicker slabs are not only possible, but beneficial both in terms of SNR and acceleration possibilities. What are the limitations that prevented their use in the present study? With the current approach, what would be the estimated time needed to acquire the vascular map of an entire brain? It would also be good to indicate whether specific processing was needed to stitch together the multiple slab images in Fig. 6-9, S2.

      Time-of-flight acquisitions are commonly performed with thin acquisition slabs, following initial investigations by Parker et al. (1991), to maximise vessel sensitivity and minimize noise. We therefore followed this practice for our initial investigations but wanted to point out in the discussion that thicker slabs might provide several advantages that need to be evaluated in future studies. This would include theoretical and empirical evaluations balancing SNR gains from larger excitation volumes and SNR losses due to more acceleration. For this study, we have chosen the slab thickness such that the acquisition time remained reasonable, to minimize motion artefacts (as outlined in the Discussion). In addition, due to the extreme matrix sizes, in particular for the 0.14 mm acquisition, we were also limited by the number of data points per image that can be indexed; accommodating thicker slabs would have required even more substantial changes to the sequence than those we had already made. With 16 slabs, assuming optimal FOV orientation, full-brain coverage including the cerebellum of 95 % of the population (Mennes et al., 2014) could be achieved with an acquisition time of (16 × 11 min 42 s = 3 h 7 min 12 s) at 0.16 mm isotropic voxel size. No stitching of the individual slabs was performed, as subject motion was minimal. We have added a corresponding comment in the Data Analysis.
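The whole-brain timing estimate above is simple arithmetic; the sketch below reproduces it from the values stated in the response (16 slabs, 11 min 42 s per slab, assuming optimal FOV orientation and no inter-slab overlap):

```python
# Whole-brain acquisition-time estimate from the slab parameters above
slab_time_s = 11 * 60 + 42        # 702 s per slab at 0.16 mm isotropic
n_slabs = 16
total_s = n_slabs * slab_time_s

h, rem = divmod(total_s, 3600)
m, s = divmod(rem, 60)
print(f"{h} h {m} min {s} s")     # 3 h 7 min 12 s
```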

      "Both thresholds were applied globally but manually adjusted for each slab. No correction for motion between slabs was applied as subject motion was minimal. The Matlab code describing the segmentation algorithm as well as the analysis of the two-echo TOF acquisition outlined in the following paragraph are also included in the GitLab repository (https://gitlab.com/SaskiaB/pialvesseltof.git)."

      8) Some researchers and clinicians will argue that you can attain best results with anisotropic voxels, combining higher SNR and higher resolution. It would be good to briefly mention why isotropic voxels are preferred here, and whether anisotropic voxels would make sense at all in this context.

      Anisotropic voxels can be advantageous if the underlying object is anisotropic, e.g. an artery running straight through the slab, which would have a certain diameter (imaged using the high-resolution plane) and an ‘infinite’ elongation (in the low-resolution direction). However, the vessels targeted here can have any orientation and curvature; an anisotropic acquisition could therefore introduce a bias favouring vessels with a particular orientation relative to the voxel grid. Note that the same argument applies when answering the question why a further reduction in slab thickness would eventually result in less increase in FRE (section Introducing a partial-volume model). We have added a corresponding comment in our discussion on practical imaging considerations:

      "In summary, numerous theoretical and practical considerations remain for optimal imaging of pial arteries using time-of-flight contrast. Depending on the application, advanced displacement artefact compensation strategies may be required, and zero-filling could provide better vessel depiction. Further, an optimal trade-off between SNR, voxel size and acquisition time needs to be found. Currently, the partial-volume FRE model only considers voxel size, and—as we reduced the voxel size in the experiments—we (partially) compensated the reduction in SNR through longer scan times. This, ultimately, also required the use of prospective motion correction to enable the very long acquisition times necessary for 140 µm isotropic voxel size. Often, anisotropic voxels are used to reduce acquisition time and increase SNR while maintaining in-plane resolution. This may indeed prove advantageous when the (also highly anisotropic) arteries align with the anisotropic acquisition, e.g. when imaging the large supplying arteries oriented mostly in the head-foot direction. In the case of pial arteries, however, there is no preferred orientation because of the convoluted nature of the pial arterial vasculature encapsulating the complex folding of the cortex (see section Anatomical architecture of the pial arterial vasculature). A further reduction in voxel size may be possible in dedicated research settings utilizing even longer acquisition times and a larger field-of-view to maintain SNR. However, if acquisition time is limited, voxel size and SNR need to be carefully balanced against each other."

      Reviewer #2 (Public Review):

      Overview

      This paper explores the use of inflow contrast MRI for imaging the pial arteries. The paper begins by providing a thorough background description of pial arteries, including past studies investigating the velocity and diameter. Following this, the authors consider this information to optimize the contrast between pial arteries and background tissue. This analysis reveals spatial resolution to be a strong factor influencing the contrast of the pial arteries. Finally, experiments are performed on a 7T MRI to investigate: the effect of spatial resolution by acquiring images at multiple resolutions, demonstrate the feasibility of acquiring ultrahigh resolution 3D TOF, the effect of displacement artifacts, and the prospect of using T2* to remove venous voxels.

      Impression

      There is certainly interest in tools to improve our understanding of the architecture of the small vessels of the brain and this work does address this. The background description of the pial arteries is very complete and the manuscript is very well prepared. The images are also extremely impressive, likely benefiting from motion correction, 7T, and a very long scan time. The authors also commit to open science and provide the data in an open platform. Given this, I do feel the manuscript to be of value to the community; however, there are concerns with the methods for optimization, the qualitative nature of the experiments, and conclusions drawn from some of the experiments.

      Specific Comments :

      1) Figure 3 and Theory surrounding. The optimization shown in Figure 3 is based fixing the flip angle or the TR. As is well described in the literature, there is a strong interdependency of flip angle and TR. This is all well described in literature dating back to the early 90s. While I think it reasonable to consider these effects in optimization, the language needs to include this interdependency or simply reference past work and specify how the flip angle was chosen. The human experiments do not include any investigation of flip angle or TR optimization.

      We thank the reviewer for raising this valuable point, and we fully agree that there is an interdependency between these two parameters. To simplify our optimization, we did fix one parameter value at a time, but in the revised manuscript we clarified that both parameters can be optimized simultaneously. Importantly, a large range of parameter values will result in a similar FRE in the small artery regime, which is illustrated in the optimization provided in the main text. We have therefore chosen the repetition time based on encoding efficiency and then set a corresponding excitation flip angle. In addition, we have also provided additional simulations in the supplementary material outlining the interdependency for the case of pial arteries.

      "Optimization of repetition time and excitation flip angle

      As the main goal of the optimisation here was to start within an already established parameter range for TOF imaging at ultra-high field (Kang et al., 2010; Stamm et al., 2013; von Morze et al., 2007), we only needed to then further tailor these for small arteries by considering a third parameter, namely the blood delivery time. From a practical perspective, a TR of 20 ms as a reference point was favourable, as it offered a time-efficient readout minimizing wait times between excitations but allowing low encoding bandwidths to maximize SNR. Due to the interdependency of flip angle and repetition time, for any one blood delivery time any FRE could (in theory) be achieved. For example, a similar FRE curve at 18 ° flip angle and 5 ms TR can also be achieved at 28 ° flip angle and 20 ms TR; or the FRE curve at 18 ° flip angle and 30 ms TR is comparable to the FRE curve at 8 ° flip angle and 5 ms TR (Supplementary Figure 3 TOP). In addition, the difference between optimal parameter settings diminishes for long blood delivery times, such that at a blood delivery time of 500 ms (Supplementary Figure 3 BOTTOM), the optimal flip angle at a TR of 15 ms, 20 ms or 25 ms would be 14 °, 16 ° and 18 °, respectively. This is in contrast to a blood delivery time of 100 ms, where the optimal flip angles would be 32 °, 37 ° and 41 °. In conclusion, in the regime of small arteries, long TR values in combination with low flip angles ensure flow-related enhancement at blood delivery times of 200 ms and above, and within this regime there are marginal gains by further optimizing parameter values and the optimal values are all similar."

      Supplementary Figure 3: Optimal imaging parameters for small arteries. This assessment follows the simulations presented in Figure 3, but in addition shows the interdependency for the corresponding third parameter (either flip angle or repetition time). TOP: Flip angles close to the Ernst angle show only a marginal flow-related enhancement; however, the influence of the blood delivery time decreases further (LEFT). As the flip angle increases well above the values used in this study, the flow-related enhancement in the small artery regime remains low even for the longer repetition times considered here (RIGHT). BOTTOM: The optimal excitation flip angle shows reduced variability across repetition times in the small artery regime compared to shorter blood delivery times.

      "Based on these equations, optimal T_R and excitation flip angle values (θ) can be calculated for the blood delivery times under consideration (Figure 3). To better illustrate the regime of small arteries, we have illustrated the effect of either flip angle or T_R while keeping the other parameter values fixed to the value that was ultimately used in the experiments; although both parameters can also be optimized simultaneously (Haacke et al., 1990). Supplementary Figure 3 further delineates the interdependency between flip angle and T_R within a parameter range commonly used for TOF imaging at ultra-high field (Kang et al., 2010; Stamm et al., 2013; von Morze et al., 2007). Note how longer T_R values still provide an FRE effect even at very long blood delivery times, whereas using shorter T_R values can suppress the FRE effect (Figure 3, left). Similarly, at lower flip angles the FRE effect is still present for long blood delivery times, but it is not available anymore at larger flip angles, which, however, would give maximum FRE for shorter blood delivery times (Figure 3, right). Due to the non-linear relationships of both blood delivery time and flip angle with FRE, the optimal imaging parameters deviate considerably when comparing blood delivery times of 100 ms and 300 ms, but the differences between 300 ms and 1000 ms are less pronounced. In the following simulations and measurements, we have thus used a T_R value of 20 ms, i.e. a value only slightly longer than the readout of the high-resolution TOF acquisitions, which allowed time-efficient data acquisition, and a nominal excitation flip angle of 18°. From a practical standpoint, these values are also favorable as the low flip angle reduces the specific absorption rate (Fiedler et al., 2018) and the long T_R value decreases the potential for peripheral nerve stimulation (Mansfield and Harvey, 1993)."
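The saturation model behind these simulations can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes plug flow, one excitation per T_R during the blood delivery time, a single assumed T1 of 2100 ms for both blood and tissue, and no T2* decay; the sin(θ) factor cancels in the relative FRE and is therefore omitted:

```python
import numpy as np

def longitudinal_mz(n_pulses, tr, t1, flip_deg):
    """Longitudinal magnetisation (M0 = 1) just before the n-th RF pulse,
    approaching the saturated steady state Mss = (1-E1)/(1 - E1*cos(theta))."""
    e1 = np.exp(-tr / t1)
    c = np.cos(np.deg2rad(flip_deg))
    mss = (1 - e1) / (1 - e1 * c)
    return mss + (1 - mss) * (e1 * c) ** n_pulses

def fre(delivery_ms, tr=20.0, t1=2100.0, flip_deg=18.0):
    """Relative FRE of inflowing blood vs fully saturated static tissue.
    Assumptions: plug flow, one pulse per TR during the delivery time,
    equal T1 for blood and tissue (assumed 2100 ms), no T2* decay."""
    n = int(np.ceil(delivery_ms / tr))
    s_blood = longitudinal_mz(n, tr, t1, flip_deg)
    s_tissue = longitudinal_mz(10_000, tr, t1, flip_deg)  # steady state
    return (s_blood - s_tissue) / s_tissue

# FRE decays with blood delivery time, but remains positive even at 1 s
for delivery in (100, 300, 1000):
    print(delivery, round(fre(delivery), 2))
```

The monotonic decrease of FRE with delivery time, while staying clearly above zero at 1000 ms for this low flip angle and long T_R, mirrors the qualitative behaviour described in the quoted paragraph.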

      2) Figure 4 and Theory surrounding. A major limitation of this analysis is the lack of inclusion of noise in the analysis. I believe the results to be obvious that the FRE will be modulated by partial volume effects, here described quadratically by assuming the vessel to pass through the voxel. This would substantially modify the analysis, with a shift towards higher voxel volumes (scan time being equal). The authors suggest the FRE to be the dominant factor effecting segmentation; however, segmentation is limited by noise as much as contrast.

      We of course agree with the reviewer that contrast-to-noise ratio is a key factor that determines the detection of vessels and the quality of the segmentation; however, there are subtleties regarding the exact inter-relationship between CNR, resolution, and segmentation performance.

      The main purpose of Figure 4 is not to provide a trade-off between flow-related enhancement and signal-to-noise ratio—in particular as SNR is modulated by many more factors than voxel size alone, e.g. acquisition time, coil geometry and instrumentation—but to decide whether the limiting factor for imaging pial arteries is the reduction in flow-related enhancement due to long blood delivery times (which is the explanation often found in the literature (Chen et al., 2018; Haacke et al., 1990; Masaryk et al., 1989; Mut et al., 2014; Park et al., 2020; Parker et al., 1991; Wilms et al., 2001; Wright et al., 2013)) or due to partial volume effects. Furthermore, when reducing voxel size one will also likely increase the number of encoding steps to maintain the imaging coverage (i.e., the field-of-view) and so the relationship between voxel size and SNR in practice is not straightforward. Therefore, we had to conclude that deducing a meaningful SNR analysis that would benefit the reader was not possible given the available data due to the complex relationship between voxel size and other imaging parameters. Note that these considerations are not specific to imaging the pial arteries but apply to all MRA acquisitions, and have thus been discussed previously in the literature. Here, we wanted to focus on the novel insights gained in our study, namely that it provides an expression for how relative FRE contrast changes with voxel size with some assumptions that apply for imaging pial arteries.

      Further, depending on the definition of FRE and whether partial-volume effects are included (see also our response to R2.8), larger voxel volumes have been found to be theoretically advantageous even when only considering contrast (Du et al., 1996; Venkatesan and Haacke, 1997), which is not in line with empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007).

      The notion that vessel segmentation algorithms perform well on noisy data but poorly on low-contrast data was mainly driven by our own experiences. However, we still believe that the assumption that (all) segmentation algorithms are linearly dependent on contrast and noise (which the formulation of a contrast-to-noise ratio presumes) is similarly not warranted. Indeed, the necessary trade-off between FRE and SNR might be a property of the particular segmentation algorithm being used rather than a general property of the acquisition. Please also note that our analysis of the FRE does not suggest that an arbitrarily high resolution is needed. Importantly, while we previously noted that reducing voxel size improves contrast in vessels whose diameters are smaller than the voxel size, we now explicitly acknowledge that, for vessels whose diameters are larger than the voxel size, reducing the voxel size is not helpful, since it only reduces SNR without any gain in contrast, and may hinder segmentation performance and thus become counterproductive. But we take the reviewer’s point and also acknowledge that these intricacies need to be mentioned, and therefore we have rephrased the statement in the discussion in the following way:

      "In general, we have not considered SNR, but only FRE, i.e. the (relative) image contrast, assuming that segmentation algorithms would benefit from higher contrast for smaller arteries. Importantly, the acquisition parameters available to maximize FRE are limited, namely repetition time, flip angle and voxel size. SNR, however, can be improved via numerous avenues independent of these parameters (Brown et al., 2014b; Du et al., 1996; Heverhagen et al., 2008; Parker et al., 1991; Triantafyllou et al., 2011; Venkatesan and Haacke, 1997), the simplest being longer acquisition times. If the aim is to optimize a segmentation outcome for a given acquisition time, the trade-off between contrast and SNR for the specific segmentation algorithm needs to be determined (Klepaczko et al., 2016; Lesage et al., 2009; Moccia et al., 2018; Phellan and Forkert, 2017). Our own—albeit limited—experience has shown that segmentation algorithms (including manual segmentation) can accommodate a perhaps surprising amount of noise using prior knowledge and neighborhood information, making these high-resolution acquisitions possible. Importantly, note that our treatment of the FRE does not suggest that an arbitrarily small voxel size is needed, but instead that voxel sizes appropriate for the arterial diameter of interest are beneficial (in line with the classic “matched-filter” rationale (North, 1963)). Voxels smaller than the arterial diameter would not yield substantial benefits (Figure 5) and may result in SNR reductions that would hinder segmentation performance."

      3) Page 11, Line 225. "only a fraction of the blood is replaced" I think the language should be reworded. There are certainly water molecules in blood which have experience more excitation B1 pulses due to the parabolic flow upstream and the temporal variation in flow. There is magnetization diffusion which reduces the discrepancy; however, it seems pertinent to just say the authors assume the signal is represented by the average arrival time. This analysis is never verified and is only approximate anyways. The "blood dwell time" is also an average since voxels near the wall will travel more slowly. Overall, I recommend reducing the conjecture in this section.

      We fully agree that our treatment of the blood dwell time does not account for the much more complex flow patterns found in cortical arteries. However, our aim was not to comment on these complex patterns, but to help establish whether, in the simplest scenario assuming plug flow, the often-mentioned slow blood flow requires multiple velocity compartments to describe the FRE (as is commonly done for 2D MRA (Brown et al., 2014a; Carr and Carroll, 2012)). We did not intend to comment on the effects of laminar flow or even more complex flow patterns, which would require a more in-depth treatment. However, as the small arteries targeted here are often just one voxel thick, all signals are indeed integrated within that voxel (i.e. there is no voxel near the wall that travels more slowly), which may average out more complex effects. We have clarified the purpose and scope of this section in the following way:

      "In classical descriptions of the FRE effect (Brown et al., 2014a; Carr and Carroll, 2012), significant emphasis is placed on the effect of multiple “velocity segments” within a slice in the 2D imaging case. Using the simplified plug-flow model, where the cross-sectional profile of blood velocity within the vessel is constant and effects such as drag along the vessel wall are not considered, these segments can be described as ‘disks’ of blood that do not completely traverse through the full slice within one T_R, and, thus, only a fraction of the blood in the slice is replaced. Consequently, estimation of the FRE effect would then need to accommodate contributions from multiple ‘disks’ that have experienced 1 to k RF pulses. In the case of 3D imaging as employed here, multiple velocity segments within one voxel are generally not considered, as the voxel sizes in 3D are often smaller than the slice thickness in 2D imaging and it is assumed that the blood completely traverses through a voxel each T_R. However, the question arises whether this assumption holds for pial arteries, where blood velocity is considerably lower than in intracranial vessels (Figure 2). To answer this question, we have computed the blood dwell time, i.e. the average time it takes the blood to traverse a voxel, as a function of blood velocity and voxel size (Figure 2). For reference, the blood velocity estimates from the three studies mentioned above (Bouvy et al., 2016; Kobari et al., 1984; Nagaoka and Yoshida, 2006) have been added to this plot as horizontal white lines. For the voxel sizes of interest here, i.e. 50–300 μm, blood dwell times are, for all but the slowest flows, well below commonly used repetition times (Brown et al., 2014a; Carr and Carroll, 2012; Ladd, 2007; von Morze et al., 2007).
Thus, in a first approximation using the plug-flow model, it is not necessary to include several velocity segments for the voxel sizes of interest when considering pial arteries, as one might expect from classical treatments, and the FRE effect can be described by equations (1) – (3), simplifying our characterization of FRE for these vessels. When considering the effect of more complex flow patterns, it is important to bear in mind that the arteries targeted here are only one-voxel thick, and signals are integrated across the whole artery."
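Under the plug-flow assumption described above, the blood dwell time reduces to a one-line calculation (voxel size divided by blood velocity). A minimal sketch; the velocity and T_R values are illustrative assumptions, not the exact values used in the manuscript:

```python
# Blood dwell time under the plug-flow assumption: the average time blood
# needs to traverse one isotropic voxel. Velocities and TR are assumed,
# illustrative values only.
def blood_dwell_time_ms(voxel_size_um, velocity_mm_s):
    """Dwell time in milliseconds for an isotropic voxel of the given size."""
    return (voxel_size_um / 1000.0) / velocity_mm_s * 1000.0

TR_MS = 20.0  # illustrative repetition time
for voxel_um in (50, 100, 200, 300):
    for velocity in (2.0, 10.0, 40.0):  # slow pial flow to faster vessels
        dwell = blood_dwell_time_ms(voxel_um, velocity)
        tag = "below TR" if dwell < TR_MS else "above TR"
        print(f"{voxel_um:3d} um at {velocity:4.1f} mm/s: {dwell:6.2f} ms ({tag})")
```

For most of the velocity range the dwell time stays well below the illustrative T_R, and only the slowest flows combined with the largest voxels exceed it, mirroring the "all but the slowest flows" qualification above.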

      4) Page 13, Line 260. "two-compartment modelling" I think this section is better labeled "Extension to consider partial volume effects" The compartments are not interacting in any sense in this work.

      Thank you for this suggestion. We have replaced the heading with ‘Introducing a partial-volume model’ (page 14) and replaced all instances of ‘two-compartment model’ with ‘partial-volume model’.

      5) Page 14, Line 284. "In practice, a reduction in slab …." "reducing the voxel size is a much more promising avenue" There is a fair amount on conjecture here which is not supported by experiments. While this may be true, the authors also use a classical approach with quite thin slabs.

      The slab thickness used in our experiments was mainly limited by the acquisition time and the participant’s ability to lie still. We indeed performed one measurement with a thicker slab in a very experienced participant, but found that with over 20 minutes of acquisition time, motion artefacts were unavoidable. The data presented in Figure 5 were acquired with similar slab thickness, supporting the statement that reducing the voxel size is a promising avenue for imaging small pial arteries. However, we indeed have not provided an empirical comparison of the effect of slab thickness. Nevertheless, we believe it remains useful to make the theoretical argument that, due to the convoluted nature of the pial arterial vascular geometry, a reduction in slab thickness may not reduce the acquisition time if no reduction in intra-slab vessel length can be achieved, i.e. if the majority of the artery is still contained in the smaller slab. We have clarified the statement and removed the direct comparison (‘much more’ promising) in the following way:

      "In theory, a reduction in blood delivery time increases the FRE in both regimes, and—if the vessel is smaller than the voxel—so would a reduction in voxel size. In practice, a reduction in slab thickness―which is the default strategy in classical TOF-MRA to reduce blood delivery time―might not provide substantial FRE increases for pial arteries. This is due to their convoluted geometry (see section Anatomical architecture of the pial arterial vasculature), where a reduction in slab thickness may not necessarily reduce the vessel segment length if the majority of the artery is still contained within the smaller slab. Thus, given the small arterial diameter, reducing the voxel size is a promising avenue when imaging the pial arterial vasculature."

      6) Figure 5. These image differences are highly exaggerated by the lack of zero filling (or any interpolation) and the fact that the scan times are wildly different. The interpolation should be addressed, and the scan time discrepancy listed as a limitation.

      We have extended the discussion around zero-filling by including additional considerations based on the imaging parameters in Figure 5 and highlighted the substantial differences in voxel volume. Our choice not to perform zero-filling was driven by the open question of what an ‘optimal’ zero-filling factor would be. We have also highlighted the substantial differences in acquisition time when describing the results.

      Changes made to the results section:

      "To investigate the effect of voxel size on vessel FRE, we acquired data at four different voxel sizes ranging from 0.8 mm to 0.3 mm isotropic resolution, adjusting only the encoding matrix, with imaging parameters being otherwise identical (FOV, TR, TE, flip angle, R, slab thickness, see section Data acquisition). The total acquisition time increases from less than 2 minutes for the lowest resolution scan to over 6 minutes for the highest resolution scan as a result."

      Changes made to the discussion section:

      "Nevertheless, slight qualitative improvements in image appearance have been reported for higher zero-filling factors (Du et al., 1994), presumably owing to a smoother representation of the vessels (Bartholdi and Ernst, 1973). In contrast, Mattern et al. (2018) reported no improvement in vessel contrast for their high-resolution data. Ultimately, for each application, e.g. visual evaluation vs. automatic segmentation, the optimal zero-filling factor needs to be determined, balancing image appearance (Du et al., 1994; Zhu et al., 2013) with loss in statistical independence of the image noise across voxels. For example, in Figure 5, when comparing across different voxel sizes, the visual impression might improve with zero-filling. However, it remains unclear whether the same zero-filling factor should be applied for each voxel size, which means that the overall difference in resolution remains, namely a nearly 20-fold reduction in voxel volume when moving from 0.8-mm isotropic to 0.3-mm isotropic voxel size. Alternatively, the same ’zero-filled’ voxel sizes could be used for evaluation, although then nearly 94 % of the samples used to reconstruct the image with 0.8-mm voxel size would be zero-valued for a 0.3-mm isotropic resolution. Consequently, all data presented in this study were reconstructed without zero-filling."
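For readers unfamiliar with the operation under discussion, zero-filled interpolation is a plain zero-padding of k-space before the inverse Fourier transform. A toy numpy sketch (illustrative only, not the reconstruction pipeline used for the manuscript data):

```python
import numpy as np

# Toy sketch of zero-filled interpolation: zero-padding k-space by a factor of
# two halves the nominal voxel size and smooths the image, but adds no new
# information (75 % of the padded samples are zero-valued for factor 2).
def zero_fill(kspace_centered, factor=2):
    n = kspace_centered.shape[0]
    padded = np.zeros((n * factor, n * factor), dtype=complex)
    start = (n * factor - n) // 2
    padded[start:start + n, start:start + n] = kspace_centered
    return padded

image = np.zeros((32, 32))
image[12:20, 14:18] = 1.0                                  # toy "vessel"
kspace = np.fft.fftshift(np.fft.fft2(image))               # centered k-space
interp = np.abs(np.fft.ifft2(np.fft.ifftshift(zero_fill(kspace))))
print(image.shape, "->", interp.shape)                     # (32, 32) -> (64, 64)
```

The interpolated image has twice the grid density in each dimension, yet every voxel value is a (sinc-like) combination of the same 32 × 32 acquired samples, which is why the noise in neighbouring voxels is no longer statistically independent.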

      7) Figure 7. Given the limited nature of experiment may it not also be possible the subject moved more, had differing brain blood flow, etc. Were these lengthy scans acquired in the same session? Many of these differences could be attributed to other differences than the small difference in spatial resolution.

      The scans were acquired in the same session using the same prospective motion correction procedure. Note that the acquisition time of the images with 0.16 mm isotropic voxel size was comparatively short, taking just under 12 minutes. Although the difference in spatial resolution may seem small, it still amounts to a 33% reduction in voxel volume. For comparison, reducing the voxel size from 0.4 mm to 0.3 mm also ‘only’ reduces the voxel volume by 58 %—not even twice as much. Overall, we fully agree that additional validation and optimisation of the imaging parameters for pial arteries are beneficial and have added a corresponding statement to the Discussion section.
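The voxel-volume figures quoted above follow directly from the cubic scaling of isotropic voxels; a quick check of the arithmetic:

```python
# Quick check of the voxel-volume arithmetic for isotropic voxels.
def volume_reduction(size_from_mm, size_to_mm):
    """Fractional reduction in voxel volume when shrinking the voxel size."""
    return 1.0 - (size_to_mm / size_from_mm) ** 3

print(f"0.16 mm -> 0.14 mm: {volume_reduction(0.16, 0.14):.0%} reduction")  # ~33 %
print(f"0.40 mm -> 0.30 mm: {volume_reduction(0.40, 0.30):.0%} reduction")  # ~58 %
print(f"0.80 mm -> 0.30 mm: {(0.8 / 0.3) ** 3:.0f}-fold smaller volume")    # ~19-fold
```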

      Changes made to the results section (also in response to Reviewer 1 (R1.22))

      "We have also acquired one single slab with an isotropic voxel size of 0.16 mm with prospective motion correction for this participant in the same session to compare to the acquisition with 0.14 mm isotropic voxel size and to test whether any gains in FRE are still possible at this level of the vascular tree."

      Changes made to the discussion section:

      "Acquiring these data at even higher field strengths would boost SNR (Edelstein et al., 1986; Pohmann et al., 2016) to partially compensate for SNR losses due to acceleration and may enable faster imaging and/or smaller voxel sizes. This could facilitate the identification of the ultimate limit of the flow-related enhancement effect and identify at which stage of the vascular tree the blood delivery time becomes the limiting factor. While Figure 7 indicates the potential for voxel sizes below 0.16 mm, the singular nature of this comparison warrants further investigations."

      8) Page 22, Line 395. Would the analysis be any different with an absolute difference? The FRE (Eq 6) divides by a constant value. Clearly there is value in the difference as other subtractive inflow imaging would have infinite FRE (not considering noise as the authors do).

      Absolutely; using an absolute FRE would result in the highest FRE for the largest voxel size, whereas in our data small vessels are more easily detected with the smallest voxel size. We also note that the relative FRE would indeed become infinite if the value in the denominator representing the tissue signal were zero, but this special case highlights how relative FRE can help characterize “segmentability”: a vessel with any intensity surrounded by tissue with an intensity of zero is trivially and infinitely segmentable. We have added this point to the revised manuscript as indicated below.

      Following the suggestion of Reviewer 1 (R1.2), we have included additional simulations to clarify the effects of relative FRE definition and partial-volume model, in which we show that only when considering both together are smaller voxel sizes advantageous (Supplementary Material).

      "Effect of FRE Definition and Interaction with Partial-Volume Model

      "For the definition of the FRE effect in this study, we used a measure of relative FRE (Al-Kwifi et al., 2002) in combination with a partial-volume model (Eq. 6). To illustrate the effect of these two definitions, as well as their interaction, we have estimated the relative and absolute FRE for an artery with a diameter of 200 µm and one with a diameter of 2 000 µm (i.e. no partial-volume effects). The absolute FRE explicitly takes the voxel volume into account, i.e. instead of Eq. (6) for the relative FRE we used"

      FRE_abs = V_voxel · (V_rel^blood · M_z^blood + V_rel^tissue · M_z^tissue − M_z^tissue)      Eq. (12)

      Note that the division by V_voxel · M_z^tissue to obtain the relative FRE removes the contribution of the total voxel volume V_voxel.

      "Supplementary Figure 2 shows that, when partial volume effects are present, the highest relative FRE arises in voxels with the same size as or smaller than the vessel diameter (Supplementary Figure 2A), whereas the absolute FRE increases with voxel size (Supplementary Figure 2C). If no partial-volume effects are present, the relative FRE becomes independent of voxel size (Supplementary Figure 2B), whereas the absolute FRE increases with voxel size (Supplementary Figure 2D). While the partial-volume effects for the relative FRE are substantial, they are much more subtle when using the absolute FRE and do not alter the overall characteristics."
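The opposite voxel-size dependence of the two definitions described above can be reproduced in a few lines. A sketch using the cylinder approximation from the letter; the magnetization values `M_BLOOD` and `M_TISSUE` are assumed for illustration and are not the manuscript's simulated values:

```python
import math

# Relative vs absolute FRE for a 200-um artery centred in an isotropic voxel,
# cylinder approximation. Magnetization values are illustrative assumptions.
M_BLOOD, M_TISSUE = 1.0, 0.2

def blood_fraction(d_vessel_um, l_voxel_um):
    """Blood volume fraction of a voxel crossed centrally by a cylindrical vessel."""
    if l_voxel_um <= d_vessel_um:
        return 1.0                                   # voxel entirely inside the lumen
    cylinder = math.pi / 4.0 * d_vessel_um**2 * l_voxel_um
    return cylinder / l_voxel_um**3

for l_voxel in (100, 200, 400, 800):
    v_blood = blood_fraction(200, l_voxel)
    fre_rel = v_blood * (M_BLOOD - M_TISSUE) / M_TISSUE
    fre_abs = l_voxel**3 * v_blood * (M_BLOOD - M_TISSUE)
    print(f"{l_voxel:3d} um voxel: relative FRE {fre_rel:5.2f}, "
          f"absolute FRE {fre_abs:12.0f}")
```

In this toy model the relative FRE is highest for voxels at or below the vessel diameter and falls off for larger voxels, whereas the absolute FRE keeps growing with voxel size, mirroring the trends in panels A and C of the supplementary figure.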

      Supplementary Figure 2: Effect of voxel size and blood delivery time on the flow-related enhancement (FRE) using either a relative (A,B) (Eq. (3)) or an absolute (C,D) (Eq. (12)) FRE definition, assuming a pial artery diameter of 200 μm (A,C) or 2 000 µm (B,D), i.e. no partial-volume effects at the central voxel considered here.

      Following the established literature (Brown et al., 2014a; Carr and Carroll, 2012; Haacke et al., 1990) and because we would ultimately derive a relative measure, we have omitted the effect of voxel volume on the longitudinal magnetization in our derivations, which makes it appear as if we are dividing by a constant in Eq. 6, as the effect of total voxel volume cancels out for the relative FRE. We have now made this more explicit in our derivation of the partial-volume model.

      "Introducing a partial-volume model

      To account for the effect of voxel volume on the FRE, the total longitudinal magnetization M_z needs to also consider the number of spins contained within a voxel (Du et al., 1996; Venkatesan and Haacke, 1997). A simple approximation can be obtained by scaling the longitudinal magnetization with the voxel volume (Venkatesan and Haacke, 1997). To then include partial volume effects, the total longitudinal magnetization in a voxel M_z^total becomes the sum of the contributions from the stationary tissue M_z^tissue and the inflowing blood M_z^blood, weighted by their respective volume fractions V_rel:"


      M_z^total = V_rel^blood · M_z^blood + V_rel^tissue · M_z^tissue      Eq. (4)

      For simplicity, we assume a single vessel is located at the center of the voxel and approximate it to be a cylinder with diameter d_vessel and length l_voxel, the side length of the assumed isotropic voxel. The relative volume fraction of blood V_rel^blood is the ratio of vessel volume within the voxel to total voxel volume (see section Estimation of vessel-volume fraction in the Supplementary Material), and the tissue volume fraction V_rel^tissue is the remainder that is not filled with blood, or

      V_rel^tissue = 1 − V_rel^blood      Eq. (5)

      We can now replace the blood magnetization in Eq. (3) with the total longitudinal magnetization of the voxel to compute the FRE as a function of the vessel-volume fraction:

      FRE = (M_z^total − M_z^tissue) / M_z^tissue = V_rel^blood · (M_z^blood − M_z^tissue) / M_z^tissue      Eq. (6)
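A direct transcription of Eqs. (4)-(6) as plain functions (a sketch; variable names follow the symbols in the text):

```python
# Eqs. (4)-(6) of the partial-volume model as plain functions.
def fre_partial_volume(m_blood, m_tissue, v_rel_blood):
    """Relative FRE of a voxel with blood volume fraction v_rel_blood."""
    v_rel_tissue = 1.0 - v_rel_blood                            # Eq. (5)
    m_total = v_rel_blood * m_blood + v_rel_tissue * m_tissue   # Eq. (4)
    return (m_total - m_tissue) / m_tissue                      # Eq. (6)

# The expression simplifies to v_rel_blood * (m_blood - m_tissue) / m_tissue,
# so halving the blood fraction halves the relative FRE.
print(fre_partial_volume(1.0, 0.2, 1.0))   # full voxel of blood
print(fre_partial_volume(1.0, 0.2, 0.5))   # half the blood fraction, half the FRE
```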

      Based on your suggestion, we have also extended our interpretation of relative and absolute FRE. Indeed, a subtractive flow technique where no signal in the background remains and only intensities in the object are present would have infinite relative FRE, as this basically constitutes a perfect segmentation (bar a simple thresholding step).

      "Extending classical FRE treatments to the pial vasculature

      There are several major modifications in our approach to this topic that might explain why, in contrast to predictions from classical FRE treatments, it is indeed possible to image pial arteries. For instance, the definition of vessel contrast or flow-related enhancement is often stated as an absolute difference between blood and tissue signal (Brown et al., 2014a; Carr and Carroll, 2012; Du et al., 1993, 1996; Haacke et al., 1990; Venkatesan and Haacke, 1997). Here, however, we follow the approach of Al-Kwifi et al. (2002) and consider relative contrast. While this distinction may seem to be semantic, the effect of voxel volume on FRE for these two definitions is exactly opposite: Du et al. (1996) concluded that larger voxel size increases the (absolute) vessel-background contrast, whereas here we predict an increase in relative FRE for small arteries with decreasing voxel size. Therefore, predictions of the depiction of small arteries with decreasing voxel size differ depending on whether one is considering absolute contrast, i.e. difference in longitudinal magnetization, or relative contrast, i.e. contrast differences independent of total voxel size. Importantly, this prediction changes for large arteries where the voxel contains only vessel lumen, in which case the relative FRE remains constant across voxel sizes, but the absolute FRE increases with voxel size (Supplementary Figure 9). Overall, the interpretations of relative and absolute FRE differ, and one measure may be more appropriate for certain applications than the other. Absolute FRE describes the difference in magnetization and is thus tightly linked to the underlying physical mechanism. Relative FRE, however, describes the image contrast and segmentability. If blood and tissue magnetization are equal, both contrast measures would equal zero and indicate that no contrast difference is present. 
However, when there is signal in the vessel and as the tissue magnetization approaches zero, the absolute FRE approaches the blood magnetization (assuming no partial-volume effects), whereas the relative FRE approaches infinity. While this infinite relative FRE does not directly relate to the underlying physical process of ‘infinite’ signal enhancement through inflowing blood, it instead characterizes the segmentability of the image in that an image with zero intensity in the background and non-zero values in the structures of interest can be segmented perfectly and trivially. Accordingly, numerous empirical observations (Al-Kwifi et al., 2002; Bouvy et al., 2014; Haacke et al., 1990; Ladd, 2007; Mattern et al., 2018; von Morze et al., 2007) and the data provided here (Figure 5, 6 and 7) have shown the benefit of smaller voxel sizes if the aim is to visualize and segment small arteries."

      9) Page 22, Line 400. "The appropriateness of " This also ignores noise. The absolute enhancement is the inherent magnetization available. The results in Figures 5, 6, 7 don't readily support a ratio over an absolute difference accounting for partial volume effects.

      We hope that the additional explanations of the relative FRE definition in combination with the partial-volume model and of the interpretation of relative FRE provided in the previous response (R2.8), together with the fact that Figures 5, 6 and 7 show smaller arteries for smaller voxels, clarify our argument that only the relative FRE in combination with a partial-volume model can explain why smaller voxel sizes are advantageous for depicting small arteries.

      While we appreciate that there exists a fundamental relationship between SNR and voxel volume in MR (Brown et al., 2014b), this relationship is also modulated by many more factors (as we have argued in our responses to R2.2 and R1.4b).


      10) Page 24, Line 453. "strategies, such as radial and spiral acquisitions, experience no vessel displacement artefact" These do observe flow related distortions as well, just not typically called displacement.

      Yes, this is a helpful point, as these methods will also experience a degradation of spatial accuracy due to flow effects, which will propagate into errors in the segmentation.

      As the reviewer suggests, flow-related artefacts in radial and spiral acquisitions usually manifest as a slight blur, and less as the prominent displacement found in Cartesian sampling schemes. We have added a corresponding clarification to the Discussion section:

      "Other encoding strategies, such as radial and spiral acquisitions, experience no vessel displacement artefact because phase and frequency encoding take place in the same instant; although a slight blur might be observed instead (Nishimura et al., 1995, 1991). However, both trajectories pose engineering challenges and much higher demands on hardware and reconstruction algorithms than the Cartesian readouts employed here (Kasper et al., 2018; Shu et al., 2016); particularly to achieve 3D acquisitions with 160 µm isotropic resolution."

      11) Page 24, Line 272. "although even with this nearly ideal subject behaviour approximately 1 in 4 scans still had to be discarded and repeated" This is certainly a potential source of bias in the comparisons.

      We apologize if this section was written in a misleading way. For the comparison presented in Figure 7, we acquired one additional slab in the same session at 0.16 mm voxel size using the same prospective motion correction procedure as for the 0.14 mm data. For the images shown in Figure 6 and Supplementary Figure 4 at 0.16 mm voxel size, we did not use a motion correction system and, thus, had to discard a portion of the data. We have clarified in the Discussion section that, for the comparison of the high-resolution data, prospective motion correction was used for both resolutions:

      "This allowed for the successful correction of head motion of approximately 1 mm over the 60-minute scan session, showing the utility of prospective motion correction at these very high resolutions. Note that for the comparison in Figure 7, one slab with 0.16 mm voxel size was acquired in the same session also using the prospective motion correction system. However, for the data shown in Figure 6 and Supplementary Figure 4, no prospective motion correction was used, and we instead relied on the experienced participants who contributed to this study. We found that the acquisition of TOF data with 0.16 mm isotropic voxel size in under 12 minutes acquisition time per slab is possible without discernible motion artifacts, although even with this nearly ideal subject behaviour approximately 1 in 4 scans still had to be discarded and repeated."

      12) Page 25, Line 489. "then need to include the effects of various analog and digital filters" While the analysis may benefit from some of this, most is not at all required for analysis based on optimization of the imaging parameters.

      We have included all four correction factors for completeness, given the unique acquisition parameter and contrast space our time-of-flight acquisition occupies, e.g. a very low bandwidth of only 100 Hz, very large matrix sizes of more than 1024 samples, and ideally zero signal in the background (fully suppressed tissue signal). However, we agree that probably the most important factor is the non-central chi distribution of the noise in magnitude images from multiple-channel coil arrays, and have added this qualification in the text:

      "Accordingly, SNR predictions then need to include the effects of various analog and digital filters, the number of acquired samples, the noise covariance correction factor, and—most importantly—the non-central chi distribution of the noise statistics of the final magnitude image (Triantafyllou et al., 2011)."
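The non-central chi point can be illustrated with a small simulation. A sketch under simplifying assumptions (equal coil sensitivities, uncorrelated channels, illustrative channel count and noise level):

```python
import numpy as np

# Root-sum-of-squares combination of multi-channel complex Gaussian noise:
# the background magnitude follows a chi distribution (non-central chi in the
# presence of signal) with a non-zero mean, not a zero-mean Gaussian.
# Channel count and sigma are assumed for illustration.
rng = np.random.default_rng(0)
n_coils, sigma, n_voxels = 32, 1.0, 100_000
noise = (rng.normal(0.0, sigma, (n_voxels, n_coils))
         + 1j * rng.normal(0.0, sigma, (n_voxels, n_coils)))
magnitude = np.sqrt((np.abs(noise) ** 2).sum(axis=1))  # RSS combination

# For pure noise (chi with 2 * n_coils degrees of freedom), the background
# mean is close to sigma * sqrt(2 * n_coils) rather than zero.
print(magnitude.mean())
```

This non-zero noise floor in the background is one reason why simple Gaussian SNR predictions break down for magnitude images from large coil arrays.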

      Al-Kwifi, O., Emery, D.J., Wilman, A.H., 2002. Vessel contrast at three Tesla in time-of-flight magnetic resonance angiography of the intracranial and carotid arteries. Magnetic Resonance Imaging 20, 181–187. https://doi.org/10.1016/S0730-725X(02)00486-1

      Arts, T., Meijs, T.A., Grotenhuis, H., Voskuil, M., Siero, J., Biessels, G.J., Zwanenburg, J., 2021. Velocity and Pulsatility Measures in the Perforating Arteries of the Basal Ganglia at 3T MRI in Reference to 7T MRI. Frontiers in Neuroscience 15.

      Avants, B.B., Tustison, N., Song, G., 2009. Advanced normalization tools (ANTS). Insight Journal 2, 1–35.

      Bae, K.T., Park, S.-H., Moon, C.-H., Kim, J.-H., Kaya, D., Zhao, T., 2010. Dual-echo arteriovenography imaging with 7T MRI: CODEA with 7T. J. Magn. Reson. Imaging 31, 255–261. https://doi.org/10.1002/jmri.22019

      Bartholdi, E., Ernst, R.R., 1973. Fourier spectroscopy and the causality principle. Journal of Magnetic Resonance (1969) 11, 9–19. https://doi.org/10.1016/0022-2364(73)90076-0

      Bernier, M., Cunnane, S.C., Whittingstall, K., 2018. The morphology of the human cerebrovascular system. Human Brain Mapping 39, 4962–4975. https://doi.org/10.1002/hbm.24337

      Bouvy, W.H., Biessels, G.J., Kuijf, H.J., Kappelle, L.J., Luijten, P.R., Zwanenburg, J.J.M., 2014. Visualization of Perivascular Spaces and Perforating Arteries With 7 T Magnetic Resonance Imaging: Investigative Radiology 49, 307–313. https://doi.org/10.1097/RLI.0000000000000027

      Bouvy, W.H., Geurts, L.J., Kuijf, H.J., Luijten, P.R., Kappelle, L.J., Biessels, G.J., Zwanenburg, J.J.M., 2016. Assessment of blood flow velocity and pulsatility in cerebral perforating arteries with 7-T quantitative flow MRI: Blood Flow Velocity And Pulsatility In Cerebral Perforating Arteries. NMR Biomed. 29, 1295–1304. https://doi.org/10.1002/nbm.3306

      Brown, R.W., Cheng, Y.-C.N., Haacke, E.M., Thompson, M.R., Venkatesan, R., 2014a. Chapter 24 - MR Angiography and Flow Quantification, in: Magnetic Resonance Imaging. John Wiley & Sons, Ltd, pp. 701–737. https://doi.org/10.1002/9781118633953.ch24

      Brown, R.W., Cheng, Y.-C.N., Haacke, E.M., Thompson, M.R., Venkatesan, R., 2014b. Chapter 15 - Signal, Contrast, and Noise, in: Magnetic Resonance Imaging. John Wiley & Sons, Ltd, pp. 325–373. https://doi.org/10.1002/9781118633953.ch15

      Carr, J.C., Carroll, T.J., 2012. Magnetic resonance angiography: principles and applications. Springer, New York.

      Cassot, F., Lauwers, F., Fouard, C., Prohaska, S., Lauwers-Cances, V., 2006. A Novel Three-Dimensional Computer-Assisted Method for a Quantitative Study of Microvascular Networks of the Human Cerebral Cortex. Microcirculation 13, 1–18. https://doi.org/10.1080/10739680500383407

      Chen, L., Mossa-Basha, M., Balu, N., Canton, G., Sun, J., Pimentel, K., Hatsukami, T.S., Hwang, J.-N., Yuan, C., 2018. Development of a quantitative intracranial vascular features extraction tool on 3DMRA using semiautomated open-curve active contour vessel tracing: Comprehensive Artery Features Extraction From 3D MRA. Magn. Reson. Med 79, 3229–3238. https://doi.org/10.1002/mrm.26961

      Choi, U.-S., Kawaguchi, H., Kida, I., 2020. Cerebral artery segmentation based on magnetization-prepared two rapid acquisition gradient echo multi-contrast images in 7 Tesla magnetic resonance imaging. NeuroImage 222, 117259. https://doi.org/10.1016/j.neuroimage.2020.117259

      Conolly, S., Nishimura, D., Macovski, A., Glover, G., 1988. Variable-rate selective excitation. Journal of Magnetic Resonance (1969) 78, 440–458. https://doi.org/10.1016/0022-2364(88)90131-X

      Deistung, A., Dittrich, E., Sedlacik, J., Rauscher, A., Reichenbach, J.R., 2009. ToF-SWI: Simultaneous time of flight and fully flow compensated susceptibility weighted imaging. J. Magn. Reson. Imaging 29, 1478–1484. https://doi.org/10.1002/jmri.21673

      Detre, J.A., Leigh, J.S., Williams, D.S., Koretsky, A.P., 1992. Perfusion imaging. Magnetic Resonance in Medicine 23, 37–45. https://doi.org/10.1002/mrm.1910230106

      Du, Y., Parker, D.L., Davis, W.L., Blatter, D.D., 1993. Contrast-to-Noise-Ratio Measurements in Three-Dimensional Magnetic Resonance Angiography. Investigative Radiology 28, 1004–1009.

      Du, Y.P., Jin, Z., 2008. Simultaneous acquisition of MR angiography and venography (MRAV). Magn. Reson. Med. 59, 954–958. https://doi.org/10.1002/mrm.21581

      Du, Y.P., Parker, D.L., Davis, W.L., Cao, G., 1994. Reduction of partial-volume artifacts with zero-filled interpolation in three-dimensional MR angiography. J. Magn. Reson. Imaging 4, 733–741. https://doi.org/10.1002/jmri.1880040517

      Du, Y.P., Parker, D.L., Davis, W.L., Cao, G., Buswell, H.R., Goodrich, K.C., 1996. Experimental and theoretical studies of vessel contrast-to-noise ratio in intracranial time-of-flight MR angiography. Journal of Magnetic Resonance Imaging 6, 99–108. https://doi.org/10.1002/jmri.1880060120

      Duvernoy, H., Delon, S., Vannson, J.L., 1983. The Vascularization of the Human Cerebellar Cortex. Brain Research Bulletin 11, 419–480.

      Duvernoy, H.M., Delon, S., Vannson, J.L., 1981. Cortical blood vessels of the human brain. Brain Research Bulletin 7, 519–579. https://doi.org/10.1016/0361-9230(81)90007-1

    1. Author Response

      Reviewer #1 (Public Review):

      The data support the claims, and the manuscript does not have significant weaknesses in its present form. Key strengths of the paper include using a creative HR-based reporter system combining different inducible DSB positions along a chromosome arm and testing plasmid-based and chromosomal donor sequences. Combining that system with the visualization of specific chromosomal sites via microscopy is powerful. Overall, this work will constitute a timely and helpful contribution to the field of DSB/genome mobility in DNA repair, especially in yeast, and may inform similar mechanisms in other organisms. Importantly, this study also reconciles some of the apparent contradictions in the field.

We thank the reviewer for these positive comments on the quality of the THRIV system and on its value in helping us to understand global mobility and to reconcile the different studies in the field. The possibility that these mobilities also exist in other organisms is attractive, because they could provide a way to anticipate the position of damage in the genome and its possible outcome.

      Reviewer #2 (Public Review):

The authors are clarifying the role of global mobility in homologous recombination (HR). Global mobility is positively correlated with recombinant product formation in some reports. However, other studies argue the contrary and report that global mobility is not essential for HR. To characterize the role of global chromatin mobility during HR, the authors set up a system in haploid yeast cells that allows simultaneous tracking of HR at the single-cell level and analysis of DSBs induced at different positions. By moving the position of the DSB within their system, the authors postulate that the chromosomal conformation surrounding a DNA break affects the global mobility response. Finally, the authors assessed the contributions of H2A(X) phosphorylation, checkpoint progression and Rad51 to the mobility response.

      One of the strengths of the manuscript is the development of "THRIV" as an efficient method for tracking homologous recombination in vivo. The authors take advantage of the power of yeast genetics and use gene deletions and as well as mutations to test the contribution of H2A(X) phosphorylation, checkpoint progression and Rad51 to the mobility response in their THRIV system.

A major weakness in the manuscript is the lack of a marker to indicate that DSB formation has occurred (or is occurring). Although at 6 hours there is 80% I-SceI cutting, around 20% of the cells are uncut and cannot be distinguished from the ones that are cut (or have already been repaired). Thus, the MSD analysis is done in the blind with respect to cells actually undergoing DSB repair.

      The authors clearly outlined their aims and have substantial evidence to support their conclusions. They discovered new features of global mobility that may clear up some of the controversies in the field. They overinterpreted some of their observations, but these criticisms can be easily addressed.

The authors addressed conflicting results concerning the importance of global mobility to HR and their results aid in reconciling some of the controversies in the field. A key strength of this manuscript is the analysis of global mobility in response to breaks at different locations within chromosomes. They identified two types of DSB-induced global chromatin mobility involved in HR and postulate that they differ based on the position of the DSB. For example, DSBs close to the centromere exhibit increased global mobility that is not essential for repair and depends solely on H2A(X) phosphorylation. However, if the DSB is far away from the centromere, then global mobility is essential for HR and is dependent on H2A(X) phosphorylation, checkpoint progression as well as the Rad51 recombinase.

      The Bloom lab had previously identified differences in mobility based on the position of the tracked site. However, in the study reported here, the mobility response is analyzed after inducing DSBs located at different positions along the chromosome.

They also addressed the question of the importance of the Rad51 protein in increased global mobility in haploid cells. Previous studies used DNA damaging agents that induce DSBs randomly throughout the genome, where it would have been rare to induce DSBs near the centromere. In the studies reported in this manuscript, they find no increase in global mobility in a rad51∆ background for breaks induced near the centromere (proximal), but find that breaks induced near the telomeres (distal) are dependent on both gamma-H2A(X) spreading and the Rad51 recombinase.

We thank the referee for these constructive comments on the strength of our system to accurately determine the impact of a DSB according to its position in the genome. The question of damaged cells that escape detection is important and exciting, because it confronts our data with biological heterogeneity. We provide evidence that our findings remain consistent despite our inability to distinguish undamaged cells.

      Reviewer #3 (Public Review):

In this study, Garcia Fernandez et al. employ a variety of genetic constructs to define the mechanism underlying the global chromatin mobility elicited in response to a single DNA double-strand break (DSB). Such local and global chromatin mobility increases have been described a decade ago by the Gasser and Rothstein laboratories, and a number of determinants have been identified: one epistasis group results in H2A-S129 phosphorylation via Rad9 and Mec1 activation. The mechanism is thought to be due to chromatin rigidification (Herbert 2017; Miné-Hattab 2017) or general eviction of histones (Cheblal 2020). More enigmatic, global chromatin mobility increase also depends on Rad51, a central recombination protein downstream of checkpoint activation (Smith & Rothstein 2017), which is also required for local DSB mobility (Dion & Gasser 2012). The authors set out to address this difficulty in the field.

      A premise of their study is the convergence of two types of observations: First, the H2A phosphorylation ChIP profile matches that of Rad51, with both spreading in trans on other chromosomes at the level of centromeres when a DSB occurs in the vicinity of one of them (Renkawitz 2014). Second, global mobility depends on H2A phosphorylation and on Rad51 (their previous study Herbert 2017). They thus address whether the Rad51-ssDNA filament (and associated proteins) marks the chromatin engaged during the homology search. They found that the extent of the mobility depends on the residency time of the filament in a particular genomic and nuclear region, which can be induced at an initially distant trans site by providing a region of homology. Unfortunately, these findings are not clearly apparent from the title and the abstract, and in fact somewhat misrepresented in the manuscript, which would call for a rewrite (see points below).

The main goal of our study was to understand the role of global mobility in repair by homologous recombination, depending on the location of the damage. We found distinct global mobility mechanisms, in particular in the involvement of the Rad51 nucleofilament, depending on whether the DSB was pericentromeric or not. It is thus likely that when the DSB is far from the pericentromere, the residence time of the Rad51 nucleofilament with the donor has an impact on global mobility. Although our experiments were not designed to directly address the residence time of the nucleofilament, we now discuss in more detail the causes and consequences of global mobility.

      To this end, they induce the formation of a site-specific DSB in either of two regions: a centromere-proximal region and a telomere-proximal region, and measure the mobility of an undamaged site near the centromere on another chromosome (with a LacO-LacI-GFP system). This system reveals that only the centromere-proximal DSB induces the mobility of the centromere-proximal undamaged site, in a Rad9- and Rad51-independent manner. Providing a homologous donor in the vicinity of the LacO array (albeit in trans) restores its mobility when the DSB is located in a subtelomeric region, in a Rad9- and Rad51-dependent fashion. These genetic requirements are the same as those described for local DSB mobility (Dion & Gasser 2012), drawing a link between the two types of mobility, which to my knowledge was not described. The authors should focus their message (too scattered in the current manuscript), on these key findings and the diffusive "painting" model, in which the canvas is H2A, the moving paintbrush Mec1, and the hand the Rad51-ssDNA filament whose movement depends on Rad9. In the absence of Rad51-Rad9 the hand stays still, only decorating H2A in its immediate environment. The amount of paint deposited depends on the residency time of the Rad51-ssDNA-Mec1 filament in a given nuclear region. This synthesis is in agreement with the data presented and contrasts with their proposal that "two types of global mobility" exist.

The brush model is very useful for explaining distal mobility, which is indeed linked to the genetic requirements of local mobility, but it is also helpful to consider a different model when pericentromeric damage occurs. To stay with painting techniques, this model would resemble the pouring technique, in which oil paint is deposited on water and spreads in a multidirectional manner. It is likely that Mec1 or Tel1 are the factors responsible for this spreading pattern. We therefore propose to maintain the notion of two distinct types of mobility. Without going into pictorial techniques in the text, we have attempted to clarify these two models in the manuscript.

The rest of the manuscript attempts to define a role in DSB repair of this phospho-H2A-dependent mobility, using a fluorescence recovery assay upon DSB repair. They correlate a defect in the centromere-proximal mobility (in the rad9 or h2a-s129a mutant) when a DSB is distantly induced in the subtelomere with a defect in repairing the DSB. Repair efficiency is not affected by these mutations when the donor is located initially close to the DSB site. This part is less convincing, as repair failure specifically at a distant donor in the rad9 and H2A-S129A mutants may result from other defects relating to chromatin than its mobility (i.e. affecting homology sampling, DNA strand invasion, D-loop extension, D-loop disruption, etc.), which could be partially alleviated by repeated DSB-donor encounters when the two are spatially close. In fact, suggesting that undamaged site mobility is required for the early step of the homology search directly contradicts the fact that the centromere-proximal mobility induced by a subtelomeric DSB depends on the presence of a donor near the centromere: mobility is thus a product of homology identification and increased Rad51-ssDNA filament residency in the vicinity of the centromere, and so downstream of homology search. This is a major pitfall in their interpretation and model.

We thank the referee for helping to clarify the question of the cause and consequence of global mobility. As the referee pointed out, the fact that a donor is required to observe both H2A phosphorylation and distal mobility implicates the recombination process itself, as well as the residence time of the Rad51 nucleofilament, in γ-H2A(X) spreading, and indicates that recombination would be the cause of distal mobility. In contrast, the fact that proximal mobility can exist independently of homologous recombination suggests that in this particular configuration, HR would be a consequence of proximal mobility.

      In conclusion, I think the data presented are of importance, as they identify a link between local and global chromatin mobility. The authors should rewrite their manuscript and reorganize the figures to focus on the painter model that their data support. I propose experiments that will help bolster the manuscript conclusions.

      1) Attempt dual-color tracking of the DSB (i.e. Rad52-mCherry or Ddc1-mCherry) and the donor site, and track MSD as a function of proximity between the DSB and the Lac array (with DSB +/-dCen). The expectation is that only upon contact (or after getting in close range) should the MSD at the centromere-proximal LacO array increase with a DSB at a subtelomere. Furthermore, this approach will help distinguish MSDs in cells bearing a DSB (Rad52 foci) from undamaged ones (no Rad52 foci)(see Mine-Hattab & Rothstein 2012). This would help overcome the inefficient DSB induction of their system (less than 50% at 1 hr post-galactose addition, and reaching 80% at 6 hr). For the reader to have a better appreciation of the data distribution, replace the whisker plots of MSD at 10 seconds with either scatter dot plot or violin plots, whichever conveys most clearly the distribution of the data: indeed, a bimodal distribution is expected in the current data, with undamaged cells having lower, and damaged cells having higher MSDs.

      The reviewer raises two points here.

The first point concerns the residence time of the Rad51 filament with the donor when a subtelomeric DSB occurs. Measuring MSDs as a function of the distance between the donor and Rad52-mCherry (or Ddc1-mCherry) would allow us to decide whether global mobility is a cause or a consequence. If mobility is the consequence of (stochastic) contact, leading to better efficiency of homologous recombination, we would see an increase in MSDs only when the distance between donor and filament is small. Conversely, if global mobility is the cause of contact, the increase in mobility would be visible even when the distance between donor and filament is large. This would require a labelling system with three different fluorophores: one to report global mobility, one for the donor, and one to follow the filament. Such triple labelling remains to be developed.

The second point concerns the important question of population heterogeneity, a central challenge in biology. Here we wish to distinguish between undamaged and damaged cells. Even if damaged cells had been selected, this would not entirely solve the inherent cell-to-cell variation: at a given time, a damaged cell may move little, and conversely an undamaged cell may move more. The question of heterogeneity is therefore important and the subject of intense research that goes beyond the scope of our work (Altschuler and Wu, 2010). However, to begin to clarify whether a bias could exist when considering a mixed population (20% undamaged and 80% damaged), we analyzed MSDs using a scatter plot. We considered the two populations of cells in which the damage status is best controlled: i) the red population, which we know has been repaired and, importantly, has lost the cut site and will not be cut again (undamaged-only population), and ii) the white population, blocked in G2/M because it is damaged and not repaired (damaged-only population). These two populations show very significant differences in their median MSDs. We then artificially mixed the MSD values obtained from these two populations at a rate of 20% undamaged-only cells and 80% damaged-only cells. Whereas the mean MSDs of the damaged-only and undamaged-only cells were significantly different, the mean MSD of damaged-only cells was not statistically different from that of the 20%-80% mixed population. Thus, the conclusions based on the average MSDs of all cells remain consistent.

Scatter plot showing the MSD at 10 seconds of the damaged-only population (in white), the repaired-only population (in red), or the 20%-80% mixed population
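This mixing argument can be sketched numerically. The simulation below is illustrative only: the lognormal distributions, their parameters, and the sample sizes are invented for the sketch and do not correspond to our measured MSD values.

```python
import random
import statistics

random.seed(0)

# Illustrative MSD values at 10 s (arbitrary units) for the two pure populations;
# the distribution shapes and parameters are invented, not measured data.
damaged = [random.lognormvariate(-1.0, 0.4) for _ in range(200)]   # damaged-only (white)
repaired = [random.lognormvariate(-1.6, 0.4) for _ in range(200)]  # repaired-only (red)

# Artificial 80% damaged / 20% undamaged mixture, mimicking the imaged population.
mixed = random.sample(damaged, 160) + random.sample(repaired, 40)

# The pure populations differ clearly, while the mixture stays close to damaged-only.
print(round(statistics.mean(damaged), 2),
      round(statistics.mean(repaired), 2),
      round(statistics.mean(mixed), 2))
```

With these parameters, the mean of the 20%-80% mixture remains much closer to the damaged-only mean than to the repaired-only mean, which is the point of the control: averaging over a population containing roughly 20% uncut cells does not overturn conclusions drawn from the damaged-only population.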

      2) Perform the phospho-H2A ChIP-qPCR in the C and S strains in the absence of Rad51 and Rad9, to strengthen the painter model.

ChIP experiments in mutant backgrounds, as well as phosphorylation/dephosphorylation kinetics, would corroborate the mobility data described here, but are beyond the scope of this manuscript. However, a phospho-H2A ChIP experiment was performed in a Δrad51 mutant in Renkawitz et al. 2013. In that case, γH2A propagation was restricted to the region around the DSB, corroborating both the requirement for Rad51 in distal mobility and the lack of requirement for Rad51 in proximal mobility.

3) Their data at least partly run against previously published results, or fail to account for them. For instance, it is hard to see how their model (or the painter model) could explain the constitutively activated global mobility increase observed by Smith & Rothstein 2018 in a rad51 rad52 mutant. Furthermore, the Gasser lab linked the increased chromatin mobility to a general loss of histones genome-wide, which would be inconsistent with the more localized mechanism proposed here. Do they represent an independent mechanism? These conflicting observations need to be discussed in detail.

Apart from the fact that the mechanisms at play in a haploid or a diploid cell are not necessarily comparable, it is not clear to us that our data are inconsistent with those of Smith et al. (2018). Indeed, it is not known by which mechanism the increase in global mobility is constitutively activated in a Δrad51 Δrad52 mutant, but according to their hypothesis the induction of a checkpoint is likely, and so is the phosphorylation of H2A. It would be interesting to verify γH2A in such a context. This question is now mentioned in the main text.

Concerning histone loss, it appears to differ depending on the number of DSBs. Upon multiple DNA damage events following genotoxic treatment with Zeocin, Susan Gasser's group has clearly established that nucleosome loss occurs (Cheblal et al., 2020; Hauer et al., 2017). Nucleosome loss, like H2A phosphorylation as we have shown (Garcia Fernandez et al., 2021; Herbert et al., 2017), leads to increased global mobility. The state of chromatin following these histone losses or modifications is not yet fully understood, but both could coexist. In the case of a single DSB induced by HO, it is the local mobility of the MAT locus that is examined (Fig 3B in Cheblal et al., 2020). In this case, the increase in mobility is indeed dependent on Arp8, which controls histone degradation, and correlates with a polymer pattern consistent with normal chromatin. It is thus likely that histone degradation occurs locally when a single DSB occurs. Whether histone loss also occurs genome-wide remains an open question. If histone eviction nevertheless occurred globally upon a single DSB, both types of modification could be possible. This aspect is now mentioned in the discussion.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Li et al characterize sex differences in the impact of macrophage RELMa in protection against diet-induced obesity [DIO]. This is a key area of interest as obesity studies in mice have generally focused exclusively on male animals, as they tend to gain more weight, faster than female mice. The authors use a combination of flow cytometry, adoptive transfer, and single-cell transcriptomics to characterize the mechanism of action for female-specific DIO protection. They identify a potential role for eosinophils in mediating female DIO protection downstream of RELMa production by macrophage. They also use the transcriptomic characterization of the stromal vascular fraction of the adipose tissue to evaluate molecular and cellular drivers of this sex-specific DIO protection.

      Although the authors provide solid evidence for many claims in the manuscript, there is generally not enough information about the studies' methods (especially on the computational/data analysis aspects) for a careful evaluation of the results' robustness at this stage.

      We have significantly expanded the methodology, especially of the scRNAseq, and deposited the script and raw data in public repositories. We also validated our methods and can confirm that the analysis presented is robust. This resubmission contains new Fig 7 and new supplementary material with this methodology and validation.

      Reviewer #2 (Public Review):

      In the study by Li et al., the authors hypothesize that RELMa, a macrophage-derived protein, plays a sex-dimorphic role as a protective factor in obesity in females vs males. The authors perform largely in vivo studies utilizing male and female WT and RELMa KO mice on a high-fat diet and perform an in-depth analysis of immune cell composition, gene expression, and single-cell RNA Sequencing. The authors find that WT females are protected from obesity and inflammation vs males, and this protection is lost in female RELMa KO mice. Further analysis by the authors including flow cytometry of the visceral fat SVF in female WT mice showed reduced macrophage infiltration, higher levels of eosinophils, and Th2 cytokine expression compared to WT male mice and female KO mice. The authors show that protection from obesity and inflammation in female RELMa KO mice can be rescued with an injection of eosinophils and recombinant RELMa. Lastly, the authors use single-cell RNA-Sequencing to further analyze SVF cells in WT and KO male and female mice on a high-fat diet.

      Overall, we find that the study represents an important finding in the immunometabolism field showing that RELMa is a key myeloid-derived factor that helps influence the macrophage-eosinophil function in female mice and protects from diet-induced obesity and inflammation in a sexually dimorphic manner. Overall, the study provides strong and convincing data supporting the authors' hypothesis and conclusion.

      We thank the reviewer for their positive review of our manuscript and their helpful feedback which we address below.

      Reviewer #3 (Public Review):

      Li, Ruggiero-Ruff et al. examine the role of RELMα, an anti-inflammatory macrophage signature gene, in mediating sex differences in high-fat diet (HFD)-induced obesity in young mice. Specifically, the authors hypothesize that RELMα protects females against HFD-induced obesity. Comparisons between RELMα-knockout (KO) and wildtype (WT) mice of both sexes revealed sex- and RELMα-specific differences in weight gain, immune cell populations, and inflammatory signaling in response to HFD. RELMα-deficiency in females led to increased weight gain, expansion of pro-inflammatory macrophage populations, and eosinophil loss in response to HFD. Female RELMα-deficiency could be rescued by RELMα treatment or eosinophil transfer. Single-cell RNA-sequencing (scRNA-seq) of adipose stromal vascular fraction (SVF) revealed sex- and RELMα-dependent differences under HFD conditions and identified potential "pro-obesity" and "anti-obesity" genes in a cell-type-specific manner. Using trajectory analysis, the authors suggest dysregulation of macrophage-to-monocyte transition in RELMα-deficient mice.

      The conclusions of this paper are mostly well supported by the data, but some aspects of the statistical and single-cell analyses will need to be corrected, clarified, and extended to enhance the report.

      We thank Dr. Ocanas for their positive comments and for the helpful feedback to improve our study. We have addressed all the comments and significantly revised the manuscript.

      Strengths:

      The authors use several orthogonal approaches (i.e., flow cytometry, immunohistochemistry, scRNA-Seq) and models to support their hypotheses.

      The authors demonstrate that phenotypes observed in HFD-fed females with RELMα-deficiency (i.e., weight gain, loss of eosinophils, a gain of M1 macrophages) can be rescued by RELMα treatment or eosinophil transfer.

      The authors recognized the complexity of macrophage activation that is beyond the 'M1/M2' paradigm and informed readers in the introduction as to why this paradigm was used in this study. During the scRNA-seq analyses, the authors further sub-cluster macrophages to include more granularity.

      Weaknesses:

      1) There are several instances in the text where the authors claim that there is a significant difference between the two groups, but the statistics for these comparisons are not shown in the figure.

      Because we are dealing with three variables (genotype, diet and sex) and many differences, we considered it too complicated to mark every significant difference on the graphs. Instead, we sometimes mentioned these in the text with a p value, or omitted them when the difference was either obvious or not meaningful for our hypothesis (for example, we were not interested in comparing a WT male on a control diet with a RELMα KO female on a HFD). We have now ensured clarity in the text and in the figures, and addressed the specific point-by-point comments from the reviewer. We have also carefully re-evaluated the text to ensure that any significant differences we discuss are shown in the figures.

      2) It is unfortunate that eosinophils could not be identified in the single-cell analysis since this population of cells was shown to be important in rescuing the RELMα-deficiency in HFD-fed females. The authors should note in the discussion how future scRNA-Seq experiments could overcome this limitation (i.e., enriching immune cells prior to scRNA-Seq).

      We were indeed disappointed that we were not able to obtain eosinophil single-cell sequencing data, but realize that this is a reported issue in the field. We have expanded our discussion of this and cited a paper that performs eosinophil single-cell sequencing (published at the time our manuscript was being submitted): “At the same time as our ongoing analysis, the first eosinophil single cell RNA-seq study was published, using a flow cytometry based approach rather than 10x, including RNAse inhibitor in the sorting buffer, and performing prior eosinophil enrichment (PMID: 36509106). Based on guidance from 10x, we employed targeted approaches to identify eosinophil clusters according to eosinophil markers (e.g. Siglecf, Prg2, Ccr3, Il5r), and relaxed the scRNA-Seq cutoff analysis to include more cells and intronic content, but still could not find eosinophils. We conclude that eosinophils may be absent due to the enzyme digestion required for SVF isolation and processing for single cell sequencing, which could lead to specific eosinophil population loss due to low RNA content, RNases or cell viability issues. Future experiments would be needed to optimize eosinophil single cell sequencing, based on this recent publication.”

      3a) There are several issues with the scRNA-Seq analysis and interpretation. More details on the steps taken in the single-cell analyses should be included in the methods section.

      We agree with the reviewer that more details on steps taken in the single cell data processing and bioinformatics needs to be included in the methods section. We included more information and separated sections within the data processing section in the Materials and Methods on the methodology used for these approaches, as well as provided a code for our data processing in a public Github repository: https://github.com/rrugg002/Sexual-dimorphism-in-obesity-is-governed-by-RELM-regulation-of-adipose-macrophages-and-eosinophils.

      b) With regards to the 'pseudobulk' analyses presented in Figs. 5-6, several of the differentially expressed genes identified in Fig. 6 are hemoglobin genes (i.e., Hba, Hbb genes). It is not uncommon to filter these genes out of single-cell analysis since their presence usually indicates red blood cell (RBC) contamination (PMID: 31942070, PMID: 35672358). We would recommend assessing RBC contamination as well as removing Fig. 6 from the manuscript and focusing on cell-type-specific analyses. Re-analysis will likely have an impact on the overall conclusions of the study.

      Prior to our first submission, we consulted with 10x support scientists and the UCR bioinformatics core director to ensure that our analysis included the appropriate filtering. We have now added details in the Methods. The PMIDs provided above are from studies that looked at hippocampus development (where the mice were not perfused, so there may be blood contamination) or whole blood (where there would be significant red blood cell contamination). In contrast, we perfused our mice and treated the single cell suspension with RBC lysis buffer, as detailed in Methods. Also, we have now extended our scSeq analysis to compare hemoglobin RNA to red blood cell-specific markers including Gypa/CD235a. While hemoglobin is distributed throughout the myeloid population in the female KO mice, Gypa/CD235a, whose expression would indicate RBC contamination, is not detected at all (see new Fig 7B). Additionally, we provide hemoglobin protein ELISA and IF staining to support our finding that macrophages from KO mice express hemoglobin protein. Last, two publications support hemoglobin expression by nonerythroid sources, including macrophages (PMID: 10359765; PMID: 25431740). While we are confident, based on the above, that our data are not due to RBC contamination, we cannot exclude the possibility, although unlikely, that macrophages may be phagocytosing RBCs and specifically preserving hemoglobin RNA and protein. Nonetheless, we discuss this possibility in the text. In conclusion, based on the justification above and the new data, we are confident that our findings and overall conclusions are robust.

      To assess potential RBC contamination, in addition to Gypa, we also looked at the top genes expressed by murine erythrocytes (PMID: 24637361). Please see the feature plots below, showing little to no expression and a very different distribution than the hemoglobin genes (see new Fig 7A):

      Also, we had a small cluster of potential RBCs (only 75 cells) that we filtered out of downstream DEG analysis, which revealed the same data as in the first submission.
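      For illustration, a safeguard of this kind (flagging putative RBCs by requiring co-expression of several erythrocyte markers, rather than hemoglobin alone, so that hemoglobin-positive macrophages are not discarded) can be sketched in pure Python. The marker list, threshold, and input layout below are illustrative assumptions, not the exact criteria used in our analysis:

```python
def flag_rbc_like(cells, rbc_markers=("Gypa", "Alas2", "Slc4a1"), min_markers=2):
    """Return IDs of cells co-expressing several erythrocyte-specific markers.

    `cells` maps cell ID -> {gene symbol: UMI count}. Requiring detection of
    multiple erythrocyte markers (e.g. Gypa plus others; the exact list is an
    illustrative choice) avoids flagging hemoglobin-expressing macrophages
    as red blood cells.
    """
    flagged = []
    for cell_id, counts in cells.items():
        hits = sum(1 for gene in rbc_markers if counts.get(gene, 0) > 0)
        if hits >= min_markers:
            flagged.append(cell_id)
    return flagged
```

      Cells returned by such a function would be removed before downstream DEG analysis, while hemoglobin-positive but marker-negative myeloid cells are retained.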

      4) Within the text, there are several instances where the authors claim that a pathway is upregulated based on their Gene Ontology (GO) over-representation analysis (ORA). To come to this conclusion, the authors identify genes that are upregulated in one condition and then perform GO-ORA on these genes. However, the authors do not consider negative regulators, whose upregulation would actually decrease the pathway. Authors should either replace their GO-ORA analysis with one that considers the magnitude and direction of differentially expressed genes and provides an activation z-score (i.e., Ingenuity Pathway Analysis) or replace instances of 'upregulated' or 'downregulated' pathways with 'over-represented' pathways.

      Unfortunately, we did not have access to IPA for this project; we have therefore revised the text to describe pathways as over- or under-represented, as suggested.

      5) For Fig.7A, a representative tSNE plot for each group (WT Female, KO Female, WT Male, KO Male) should be shown to ensure there is proper integration of the clusters across groups. There are some instances where the scRNA-Seq data do not appear to be integrated properly (i.e., Supplemental Figure 2C). The authors should explore integration techniques (i.e., Seurat; PMID: 29608179) to correct for potential batch effects within the analysis.

      We thank the reviewer for the suggestion of proper integration of the clusters across groups. We performed integration using the Cell Ranger aggregation (aggr) pipeline (see updated Materials and Methods section). In addition, many technical controls were performed to prevent batch effects between our samples. For sequencing, we followed the 10x Genomics recommended library sequencing depth and run parameters for both gene expression and multiplexing libraries. All 3’ gene expression libraries were sequenced at a depth of 20,000 read pairs per cell and all cell multiplexing libraries at a depth of 5,000 read pairs per cell. All libraries were paired-end dual indexed libraries and were pooled on one flow cell lane using a 4:1 ratio (3’ gene expression : multiplexing) on the NovaSeq, as recommended by 10x Genomics, in order to maintain nucleotide diversity and prevent batch effects during the sequencing process. When performing integration/aggregation of all sample gene expression libraries using the Cell Ranger aggr pipeline, we performed sequencing depth normalization between all samples. Cell Ranger does this by equalizing the average read depth per cell between groups before merging all sample libraries and counts together. This is a default setting in the Cell Ranger aggr pipeline, and this approach avoids artifacts that may be introduced due to differences in sequencing depth. Thus, we are confident that the changes we observed in gene expression and cell type populations are due to biological differences and not technical variability. Below we have provided a tSNE plot showing clustering of all 12 samples after we performed integration:

      We updated old Fig.7 (now Fig. 6) and included a representative tSNE plot for each group. We also updated the tSNE plot for Figure 5-figure supplement 2C (previously S2C) showing overall clustering amongst all groups. The largest population differences occurred in the fibroblast population and these population differences were largely due to sex differences. Because we are confident that integration was performed appropriately and that batch effects were controlled for, we believe these sex differences are a biological effect.
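      As an illustration of the read-depth equalization performed by Cell Ranger aggr, the sketch below subsamples each library's reads so that every sample ends up with the same average reads per cell. This is a simplified stand-in for the actual 10x implementation, and the input layout is a hypothetical simplification:

```python
import random

def equalize_depth(samples, seed=0):
    """Subsample each sample's reads so the average reads per cell matches
    the lowest-depth sample (mimicking, in spirit, the depth normalization
    Cell Ranger aggr applies before merging libraries).

    `samples` maps sample name -> {"reads": list of read IDs,
                                   "n_cells": number of cells}.
    """
    rng = random.Random(seed)
    # Per-sample depth = mean reads per cell
    depths = {s: len(v["reads"]) / v["n_cells"] for s, v in samples.items()}
    target = min(depths.values())
    normalized = {}
    for s, v in samples.items():
        keep = round(target * v["n_cells"])  # reads to retain for this sample
        normalized[s] = rng.sample(v["reads"], keep)
    return normalized
```

      After this step, differences in cluster composition between samples cannot be attributed to unequal sequencing depth.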

      6) LncRNA Gm47283 is identified as a gene that is differentially expressed by genotype in HFD females (Fig. 7G); however, according to Ensembl this gene is encoded on the Y-chromosome (https://uswest.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000096768;r=Y:90796007-90827734). The authors should use the RELMα genotype and sex chromosomally-encoded genes to confirm that their multiplexing was appropriate.

      We agree with the reviewer that it is crucial to confirm that multiplexing and all subsequent analyses were performed correctly. Comparisons between males and females contain internal controls that increase confidence, such as the Xist gene, which is expressed only in females, and Ddx3y, which is located on the Y chromosome. The lncRNA Gm47283 is located in the syntenic region of the Y chromosome and is also present in females, annotated as Gm21887 in the syntenic region of the X chromosome. It also has 100% alignment with Gm55594 on the X chromosome. Additionally, it is also referred to as erythroid differentiation regulator 1 (Erdr1), x or y depending on the chromosome, although the NCBI database specifies partial assembly and incomplete annotation. This explains why we see expression of this gene in females. We have discussed this in the text and revised it to refer to this lncRNA as Gm47283/Gm21887 to prevent further confusion. The RELMα genotype (absence in the KO) was also confirmed. Last, the PC analysis (see Fig 5) supports clustering by group.

      7) For Fig. 8, samples should be co-clustered and integrated across groups before performing trajectory analysis to allow for direct comparisons between groups.

      We appreciate the valuable feedback and suggestions, which have been helpful in clarifying the trajectory analysis, which we have done as follows:

      Regarding the co-clustering and integration of our samples across groups, here is the explanation of our trajectory analysis approach. We have co-clustered all of our samples using the align_cds function from the Monocle3 package. We have included the code for Figure 8 in our Github repository at https://github.com/rrugg002/Sexual-dimorphism-in-obesity-is-governed-by-RELM-regulation-of-adipose-macrophages-and-eosinophils/blob/main/Figure8.R. Specifically, lines 138, 166, 196 and 225 of the code indicate that the align_cds function was used to cluster our samples by "Sample.ID".

      The align_cds function in Monocle3 operates on a cell_data_set (“cds”) object and is used to remove batch effects before clustering and trajectory inference. It can align cells across samples using the mutual nearest neighbor correction from the batchelor package (via the alignment_group argument, here set to "Sample.ID") and/or regress out nuisance variables supplied through a residual model formula. This co-embeds cells from all samples in a shared space, allowing trajectories to be compared directly between groups. More details about align_cds can be found here https://rdrr.io/github/cole-trapnell-lab/monocle3/man/align_cds.html .

      We hope that this additional information alleviates the reviewer’s concerns.

      8) Since the experiments presented in this report were from young mice using a single diet intervention, the authors should comment on how age and other obesogenic diets may impact the results found here. Also, the authors should expand their discussion as to what upstream regulators (i.e., hormones or genetics) may be driving the sex differences in RELMα expression in response to HFD.

      We thank the reviewer for the suggestion. We included several sentences to address this comment. However, since reviewers commented that some of the text needs to be trimmed down, an extensive discussion of the reasons for sex differences, which are numerous, is outside the scope of this manuscript. For example, sex differences can arise from all or any of these:

      1. Sex steroid hormones (estrogen and testosterone) are an obvious possibility for sex differences and this discussion has been included below and in the text.

      2. The sex differences we observe may stem from a variety of other factors besides ovarian estrogen, including extraovarian estrogen, primarily estrogen produced in adipose tissues (32119876).

      3. Sex differences exist in fat deposition, which may or may not be estrogen dependent (25578600, 21834845).

      4. Sex differences were determined in metabolic rate and oxidative phosphorylation, which may also be independent of estrogen (28650095, and reviewed in 26339468).

      5. Sex differences exist in the immune system, some of which are estrogen independent, but dependent on sex chromosomes (32193609).

      6. Sex differences exist particularly in the myeloid lineage, and may also be estrogen independent (25869128).

      7. Sex differences were determined in adipokine levels, including leptin and adiponectin, which influence immune cells in adipose tissues (33268480).

      The role of estrogen is not clear either, and thus extensive discussion is not possible. Numerous studies have demonstrated that estrogen is protective against inflammation, so it is possible that estrogen drives some of the sex differences observed herein. However, several studies determined that estrogen can be pro-inflammatory (20554954, 15879140, 18523261). Previous publications by us (30254630, 33268480) and others (25869128) demonstrated intrinsic sex differences in the immune system that may be dependent on sex chromosome complement and/or Xist expression (34103397, 30671059).

      Studies are more consistent in showing that estrogen protects against weight gain: postmenopausal women with diminished estrogen, and ovariectomized animal models, gain weight. The effects of ovariectomy on weight gain and its additive effects with high fat diet were reported in Rhesus monkeys (for example PMID: 2663699; and PMID: 16421340) and in rodents (PMID: 7349433).

      The reviewer is correct that the effects of aging or estrogen on RELMa levels would be of significant interest, and could be a future direction of our studies. Aging-mediated increase in inflammation (including of adipose tissue, recently reviewed in 36875140), that may be dependent on estrogen, can exacerbate obesity-mediated inflammation. We have added this discussion.

      For these reasons we limited our discussion regarding possible differences and stated this in the discussion: “Several studies demonstrated the protective role of estrogen in obesity-mediated inflammation and in weight gain, as discussed above. Whether estrogen protection occurs via estrogen regulation of RELMα levels is a focus of our future studies. Alternatively, intrinsic sex differences in the immune system have been demonstrated as well (30254630, 33268480, 25869128) that are dependent on sex chromosome complement and/or Xist expression (34103397, 30671059), and RELMα may be regulated by these as well. Additionally, aging-mediated increases in inflammation (including of adipose tissue, recently reviewed in 36875140) may also occur via changes in RELMα levels. Our studies used young but developmentally mature mice (4-6 weeks old when placed on diet, 18 weeks old at sacrifice), and future work on aged mice would be needed to investigate aging-mediated inflammation. Furthermore, there are sex differences in fat deposition, metabolic rates and oxidative phosphorylation (reviewed in 26339468), and adipokine expression (Coss) that regulate cytokine and chemokine levels, and therefore may regulate levels of RELMα as well. These possibilities will be addressed in future studies.”

    1. Author Response

      Reviewer #2 (Public Review):

      Charme is a long non-coding RNA reported by the authors in their previous studies. Their previous work, mainly using skeletal muscles as a model, showed the functional relevance of Charme, and presented data demonstrating its nuclear role, primarily via modulating the sub-nuclear localization of Matrin 3 (MATR3). Their data from skeletal muscles suggested that loss of the intronic region of Charme affects the local 3D genome organization, affecting MATR3 occupancy and this gene expression. Loss of Charme in vivo leads to cardiac defects. In this manuscript, they characterize the cardiac developmental defects and present molecular data supporting how the loss of Charme affects the cardiac transcriptome repertoire. Specifically, by performing whole transcriptome analysis in E12.5 hearts, they identify gene expression changes affected in developing hearts due to loss of Charme. Based on their previous study in skeletal muscles, they assume that Charme regulates cardiac gene expression primarily via MATR3 also in developing cardiomyocytes. They provide CLIP-seq data for MATR3 (transcriptome-wide foot printing of MATR3) in wild-type E15.5 hearts and connect the binding of MATR3 to gene expression changes observed in Charme knockout hearts. I credit the authors for providing CLIP seq data from in vivo embryonic samples, which is technically demanding.

      Major strengths:

      Although, as previously indicated by the authors in Charme knockout mice, the major strength is the effect of Charme on cardiac development. While the phenotype might be subtle, the functional data indicate that the role of Charme is essential for cardiac development and function. The combinatorial analysis of MATR3 CLIP-seq and transcriptional changes in the absence of Charme suggests a role of Charme that could be dependent on MATR3.

      We thank this reviewer for appreciating our methodological efforts and the importance of the MATR3 CLIP-seq data from in vivo embryonic samples.

      Weakness:

      (i) Nuclear lncRNAs often affect local gene expression by influencing the local chromatin.

      Charme locus is in close proximity to MYBPC2, which is essential for cardiac function, sarcomerogenesis, and sarcomere maintenance. It is important to rule out that the cardiac-specific developmental defects due to Charme loss arise from (a) the influence of Charme on MYBPC2 or, for that matter, other neighboring genes, or (b) local chromatin changes or enhancer-promoter contacts of MYBPC2 and other immediate neighbors (both aspects in the developmental time window when Charme expression is prominent in the heart, ideally from E11 to E15.5).

      Although cis-activity represents a mechanism of action for several lncRNAs, our previous work does not reveal this kind of activity for pCharme. To add stronger evidence, we have now analysed the expression of pCharme neighbouring genes in cardiac muscle. Genes were selected by narrowing the analysis not only to genes in “linear” proximity but also to chromatin contacts, which may point to possible candidates for in cis regulation. To this purpose, we made use of the analyses that were meanwhile in progress (to answer point iv) on available Hi-C datasets (Rosa-Garrido et al. 2017). Starting from a 1 Mb region around the Charme locus, we found that most of the interactions with Charme occur in a region spanning from 240 kb upstream to 115 kb downstream of Charme, for a total of 370 kb (Rev#2_Capture Fig. 1A). This region includes 39 genes, 9 of them expressed in the neonatal heart, but none showing significant deregulation (see Table S2). Of note, this genomic region also includes the MYBPC2 locus, for which we did not find decreased expression in the heart in our RNA-seq data (Revised Figure 2-figure supplement 1C and Table S2). This trend was confirmed through RT-qPCR analyses of several genes from E15.5 extracts, which revealed no significant difference in their abundance upon Charme ablation (Rev#2_Capture Fig. 1B).

      Fig. 1. A) Contact map depicting Hi-C data of left ventricular mouse heart retrieved from GEO accession ID GSM2544836. Data for the 1 Mb region around the Charme locus were visualized using the Juicebox Web App (https://aidenlab.org/juicebox/). B) RT-qPCR quantification of Charme and its neighbouring genes in CharmeWT vs CharmeKO E15.5 hearts. Data were normalized to GAPDH mRNA and represent means ± SEM of WT and KO (n=3) pools. Data information: *p < 0.05; **p < 0.01; ***p < 0.001, unpaired Student’s t test.

      For a better understanding, we also checked possible “local” Charme activities in skeletal muscle cells, from previous datasets (Ballarino et al., 2018). We found that in murine C2C12 cells treated with two different gapmers against Charme, three of its neighbouring genes were expressed (Josd2, Emc10 and Pold1), but none showed significant alterations in their expression levels in response to Charme knock-down (Rev#2_Capture Fig. 2).

      Taken together, these results would exclude the possibility of Charme in cis activity as responsible for the phenotype.

      Fig. 2: Average expression from RNA-seq (FPKM) quantification of Charme neighbouring genes in C2C12 differentiated myotubes treated with Gap-scr vs Gap-Charme. Values for Gap-Charme represent the average values of gene expression after treatment with two different gapmers (GAP-2 and GAP-2/3).

      (ii) The authors provide data indicating cardiac developmental defects in Charme knockouts. Detailed developmental phenotyping is missing, which is necessary to pinpoint the exact developmental milestones affected by Charme. This is critical when reporting the cell type/ organ-specific developmental function of a newly identified regulator.

      We did our best to answer this concern.

      Let us first emphasise that, since their generation, we have never observed any particular morphological or physiological alteration in tissues other than muscle when dissecting CharmeKO animals. The high specificity of pCharme expression, as also shown here by ISH (Figure 1C-D, Figure 1-figure supplement 1A-B, Figure 3A), together with the minimal alteration applied to the locus for the CRISPR-Cas-mediated KO (polyA insertion), strongly argues against alterations in other tissues and their involvement in the development of the phenotype.

      Nevertheless, we now add more developmental details to the cardiac phenotype (see also Essential revision point 2).

      1- First of all, gene expression analyses performed at E12.5, E15.5, E18.5 and neonatal (PN2) stages allowed us to identify, at the molecular level, the developmental time point at which CharmeKO effects on the cardiac muscle become detectable. Our new results clearly indicate that the pCharme-mediated regulation of morphogenic and cardiac differentiation genes is detectable from the E15.5 fetal stage onward (Rev#2_Capture Fig. 3/Revised Figure 2E). Together with the analysis of pCharme targets, and coherently with the altered cardiac maturation and performance, this evidence is also supported by the analysis of the Myh6/Myh7 myosin ratio, whose diminution in CharmeKO hearts starts at E15.5 and reaches 69% of control levels at PN stages (Revised Figure 2F).

      2- Hematoxylin-eosin staining of dorso-ventral cryosections from CharmeWT and CharmeKO hearts confirmed the fetal malformation at the E15.5 stage (Revised Figure 2G). Moreover, the hypotrabeculation phenotype of CharmeKO hearts, which was initially examined by immunofluorescence, now finds confirmation in the analysis of key trabecular markers (Irx3 and Sema3a), whose expression significantly decreases upon pCharme ablation (Rev#1_Capture Fig. 3B/Revised Figure 2-figure supplement 1G).

      3- Finally, the gene expression analysis of Ki-67, Birc5 and Ccna2 (Revised Figure 2-figure supplement 1E) definitively rules out an influence of pCharme ablation on cell-cycle genes and cardiomyocyte proliferation, thus allowing a more careful interpretation of the embryonic phenotype. Note that, consistent with the lncRNA's involvement at later stages of development, the expression of important cardiac regulators, such as Gata4, Nkx2-5 and Tbx5, is not altered by its ablation at any of the tested time points (Rev#2_Capture Fig. 3), while pCharme absence mainly affects genes that are expressed downstream of these factors.

      These new results have been included in the revised version of the manuscript and better discussed.

      Fig. 3: RT-qPCR quantification of Gata4, Nkx2-5 and Tbx5 in CharmeWT and CharmeKO cardiac extracts at E12.5, E15.5 and E18.5 days of embryonic development. Data were normalized to GAPDH mRNA and represent means ± SEM of WT and KO (n=3) pools.

      (iii) Along the same line, at the molecular level, the authors provide evidence indicating a change in the expression of genes involved in cardiogenesis and cardiac function. Based on changes in mRNA levels of the genes affected due to loss of Charme and based on immunofluorescence analysis of a handful of markers, they propose a role of Charme in cell cycle and maturation. Such claims could be toned down or warrant detailed experimental validation.

      See above, response to Reviewer #2 (Public Review) weakness (ii).

      (iv) Authors extrapolate the mechanistic finding in skeletal muscle they reported for Charme to the developing heart. While the data support this hypothesis, it falls short in extending the mechanistic understanding of Charme beyond the papers previously published by the authors. CLIP-seq data is a step in the right direction. MATR3 is a relatively abundant RBP, binding transcriptome-wide, mainly in the intronic region, based on currently available CLIP-seq data, as well as shown by the authors' own CLIP seq in cardiomyocytes. It is also shown to regulate pre-mRNA splicing/ alternative splicing along with PTB (PMID: 25599992) and 3D genome organization (PMID: 34716321). In addition, the authors propose a MATR3 depending molecular function for Charme primarily dependent on the intronic region of Charme and due to the binding of MATR3. Answering the following question would enable a better mechanistic understanding of how Charme controls cardiac development.

      (i) what are the genomic regions proximal to the Charme locus in 3D space in embryonic cardiomyocytes? The authors could re-analyse published Hi-C datasets from embryonic cardiomyocytes or perform a 4C experiment using the Charme locus for this purpose.

      See above, response to Reviewer #2 (Public Review) weakness (i).

      (ii) does the loss of Charme affect the splicing landscape of MATR3 bound pre-mRNAs in E12.5 ventricles in general and those arising from the NCTC region specifically?

      This is an intriguing issue, as also highlighted by new evidence showing that the reactivation of fetal-specific RNA-binding proteins, including MATR3, in the injured heart drives transcriptome-wide switches through the regulation of early steps of RNA transcription and processing (D'Antonio et al., 2022).

      Using the rMATS software on our neonatal RNA-seq datasets, we then investigated the effect of pCharme depletion on splicing, with a focus on NCTC. As shown in Rev#2_Capture Fig.4A, all classical splicing alterations were investigated, such as exon skipping, alternative 5’ splice site, alternative 3’ splice site, mutually exclusive exons and intron retention. Intriguingly, we did observe a slight alteration in the splicing patterns, in particular for exon skipping events (62%, corresponding to 381 genes). Among them, the majority corresponded to exon exclusion events (237 events = 209 genes), while a smaller fraction corresponded to exon inclusion (144 events = 133 genes). Moreover, by intersecting these genes with the MATR3-bound RNAs, we found a modest but significant enrichment (p=0.038) for exon inclusion (Rev#2_Capture Fig.4B).
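The enrichment of MATR3-bound transcripts among the exon-inclusion genes can be assessed with a one-sided Fisher exact test, equivalently a hypergeometric tail probability. A minimal stdlib sketch of that test follows; all counts are hypothetical placeholders (only the 133 inclusion genes are from the text above, the rest are assumed), not the numbers actually used in the study:

```python
# Hedged sketch: hypergeometric enrichment test (one-sided Fisher exact)
# asking whether MATR3-bound transcripts are over-represented among genes
# with significant exon-inclusion events. Counts are hypothetical.
from math import comb

def hypergeom_enrichment_p(k, K, n, N):
    """P(X >= k) where X ~ Hypergeometric(N, K, n):
    N = background genes, K = MATR3-bound genes in the background,
    n = genes with included exons, k = MATR3-bound genes among them."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Hypothetical: 133 exon-inclusion genes, 40 of them MATR3-bound, against
# a background of 10,000 expressed genes of which 2,040 are MATR3-bound.
p = hypergeom_enrichment_p(k=40, K=2040, n=133, N=10000)
print(f"enrichment p = {p:.3g}")
```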

      Regarding the NCTC locus, we demonstrate that in hearts pCharme acts through different target genes. Indeed, none of the NCTC-arising transcripts are bound by MATR3 (see Table S4) or are substrates for alternative splicing regulation.

      While these results are very interesting for deepening the investigation of the pCharme/MATR3 interplay, their biological significance needs to be further investigated through one-by-one analysis of specific transcripts. As a continuation of the project, Nanopore sequencing of these samples on a MinION platform is currently underway in the lab to obtain a better characterization of alternative splicing events in response to the lncRNA ablation during development.

      Fig. 4: A) Left and middle panels: pie charts depicting the proportion of significantly altered (FDR < 0.05) splicing events detected by rMATS comparing neonatal CharmeWT and CharmeKO RNA-seq samples. All classical splicing alterations were investigated, such as exon skipping, alternative 3’ splice site (A3SS), intron retention, alternative 5’ splice site (A5SS) and mutually exclusive exons (MXE). Right panel: volcano plot depicting significant exon skipping events in CharmeKO (FDR < 0.05, with negative and positive ΔPSI for excluded and included exons, respectively; FDR >= 0.05 for invariant exons). The x-axis represents the exon-inclusion ratio, or Percentage Spliced In (PSI), while the y-axis represents the –log10 of the p-value. B) Pie charts representing the fraction of transcripts with at least one significantly excluded (left panel), invariant (middle panel) or included (right panel) exon that are bound by MATR3. P-values of MATR3 target enrichment for each comparison are depicted below. Statistical significance was assessed with Fisher's exact test.

      (iii) MATR3 binds DNA, as also shown by authors in previous studies. Is the MATR3 genomic binding altered by Charme loss in cardiomyocytes globally, as well as on the loci differentially expressed in Charme knockout heart? Overlapping MATR3 genomic binding changes and transcriptome binding changes to differentially expressed genes in the absence of Charme would better clarify the MATR3-centric mechanisms proposed here. Further connecting that to 3D genome changes due to Charme loss could provide needed clarity to the mechanistic model proposed here.

      Previous experience from our lab (Desideri et al., 2020) and others (Zeitz et al., 2009, J Cell Biochem) indicates that chromatin IP is not the most suitable approach for identifying specific MATR3 targets because of the broad distribution of MATR3 over the genome. Given the number of animals that would need to be sacrificed, we instead strengthened our MATR3 CLIP evidence by adding i) a CharmeKO MATR3 CLIP-seq control and ii) a combinatorial analysis of the MATR3 CLIP-seq with the RNA-seq data.

      We have better explained the reasoning within the text, which now reads “The known ability of MATR3 to interact with both DNA and RNA and the high retention of pCharme on the chromatin may predict the presence of chromatin and/or specific transcripts within these MATR3-enriched condensates. In skeletal muscle cells, we have previously observed on a genome-wide scale, a global reduction of MATR3 chromatin binding in the absence of pCharme (Desideri et al., 2020). Nevertheless, the broad distribution of the protein over the genome made the identification of specific targets through MATR3-ChIP challenging.” (lines 274-279).

      Indeed, we found that MATR3 binding was significantly decreased at numerous peaks (434/626), while an increase was observed at a smaller fraction of regions (192/626) (Revised Figure 5C). As a control, we performed MATR3 motif enrichment analysis on the differentially bound regions, revealing its proximity to the peak summit (+/- 50 nt) (Revised Figure 5-figure supplement 1D), close to the strongest enrichment of MATR3, further confirming a direct and highly specific binding of the protein to these sites. To better characterise the relationship between MATR3 and pCharme, we then intersected the newly identified regions with the MATR3-bound transcripts whose expression was altered by pCharme depletion. While gain peaks were equally distributed across DEGs, loss peaks were significantly enriched in a subset of pCharme down-regulated DEGs (Revised Figure 5D), suggesting a crosstalk between the lncRNA and the protein in regulating the expression of this specific group of genes. Interestingly, these RNAs mainly distribute across the same GO categories as the pCharme down-regulated DEGs and include genes, such as Cacna1c, Notch3, Myo18B and Rbm20, that are involved in embryo development and validated as pCharme/MATR3 targets in primary cardiac cells (Revised Figure 5D, lower panel, and 5E).

    1. Author Response

      Reviewer #1 (Public Review):

      The role of the parietal (PPC), the retrosplenial (RSP) and the visual cortex (S1) was assessed in three tasks corresponding to a simple visual discrimination task, a working-memory task and a two-armed bandit task, all based on the same sensory-motor requirements within a virtual reality framework. A differential involvement of these areas was reported in these tasks based on the effect of optogenetic manipulations. Photoinhibition of PPC and RSP was more detrimental than photoinhibition of S1, and more drastic effects were observed in presumably more complex tasks (i.e. working-memory and bandit task). If mice were trained with these more complex tasks prior to training in the simple discrimination task, then the same manipulations produced large deficits, suggesting that switching from one task to the other was more challenging, resulting in the involvement of possibly larger neural circuits, especially at the cortical level. Calcium imaging also supported this view, with differential signaling in these cortical areas depending on the task considered and the order in which they were presented to the animals. Overall the study is interesting and the fact that all tasks were assessed relying on the same sensory-motor requirements is a plus, but the theoretical foundations of the study seem a bit loose, opening the way to alternate ways of interpreting the data than "training history".

      1) Theoretical framework:

      The three tasks used by the authors should be better described at the theoretical level. While the simple task can indeed be considered a visual discrimination task, the other two tasks operationally correspond to a working-memory task (i.e. delay condition, which is indeed typically assessed in a Y- or a T-maze in rodents) or a two-armed bandit task (i.e. the switching task), respectively. So these three tasks are qualitatively different and therefore reliant on at least partially dissociable neural circuits, and this should be clearly analyzed to explain the rationale of the focus on the three cortical regions of interest.

      We are glad to see that the reviewer finds our study interesting overall and sees value in the experimental design. We agree that in the previous version, we did not provide enough motivation for the specific tasks we employed and the cortical areas studied.

      Navigating to reward locations based on sensory cues is a behavior that is crucial for survival and amenable to a head-fixed laboratory setting in virtual reality for mice. In this context of goal-directed navigation based on sensory cues, we chose to center our study on posterior cortical association areas, PPC and RSC, for several reasons. RSC has been shown to be crucial for navigation across species, poised to enable the transformation between egocentric and allocentric reference frames and to support spatial memory across various timescales (Alexander & Nitz, 2015; Fischer et al., 2020; Pothuizen et al., 2009; Powell et al., 2017). It furthermore has been shown to be involved in cognitive processes beyond spatial navigation, such as temporal learning and value coding (Hattori et al., 2019; Todd et al., 2015), and is emerging as a crucial region for the flexible integration of sensory and internal signals (Stacho & Manahan-Vaughan, 2022). It thus is a prime candidate area in the study of how cognitive experience may affect cortical involvement in goal-directed navigation.

      RSC is heavily interconnected with PPC, which is generally thought to convert sensory cues into actions (Freedman & Ibos, 2018) and has been shown to be important for navigation-based decision tasks (Harvey et al., 2012; Pinto et al., 2019). Specific task components involving short-term memory have been suggested to cause PPC to be necessary for a given task (Lyamzin & Benucci, 2019), so we chose such task components in our complex tasks to maximize the likelihood of large PPC involvement to compare the simple task to.

      One such task component is a delay period between cue and the ultimate choice report, which is a common design in decision tasks (Goard et al., 2016; Harvey et al., 2012; Katz et al., 2016; Pinto et al., 2019). We agree with the reviewer that traditionally such a task would be referred to as a working-memory task. However, we refrain from using this terminology because it may cause readers to expect that to solve the task, mice use a working-memory dependent strategy in its strictest and most traditional sense, that is, mice show no overt behaviors indicative of the ultimate choice until the end of the delay period. If the ultimate choice is apparent earlier, mice may use what is sometimes referred to as an embodiment-based strategy, which by some readers may be seen as precluding working memory. Indeed, in new choice-decoding analyses from the mice’s running patterns, we show that mice start running towards the side of the ultimate choice during the cue period already (Figure 1—figure supplement 1). Regardless of these seemingly early choices, however, we crucially have found much larger performance decrements from inhibition in mice performing the delay task compared to mice performing the simple task, along with lower overall task performance in the delay task, indicating that the insertion of a delay period increased subjective task difficulty. As traditional working-memory versus embodiment-based strategies are not the focus of our study here and do not seem to inform the performance decrements from inhibition, we chose to label the task descriptively with the crucial task parameter rather than with the supposedly underlying cognitive process.

      For the switching task, we appreciate that the reviewer sees similarities to a two-armed bandit task. However, in a two-armed bandit task, rewards are typically delivered probabilistically, whereas in our task, cue and action values are constant within each of the two rule blocks, and only the rule, i.e. the cue-choice association, reverses across blocks. This is a crucial distinction because in our design, blocks of Rule A in the switching task are identical to the simple task, with fixed cue-choice associations and guaranteed reward delivery if the correct choice is made, allowing a fair comparison of cortical involvement across tasks.

      We have now heavily revised the introduction, results, and discussion sections of the manuscript to better explain the motivation for the tasks and the investigated brain areas. These revisions cover all the points mentioned in this response.

      Furthermore, we agree with the reviewer that the three tasks are qualitatively different and likely depend on at least partially dissociable circuits. We consider the large differences in cortical inhibition effects between the simple and the complex tasks as evidence for this notion. We also want to highlight that in fact, we performed task-specific optogenetic manipulations presented in the Supplementary Material to further understand the involvement of different areas in task-specific processes. In what is now Figure 1—figure supplement 4, we restricted inhibition in the delay task to either the cue period only or delay period only, finding that interestingly, PPC or RSC inhibition during either period caused larger performance drops than observed in the simple task. We also performed epoch-specific inhibition of PPC in the switching task, targeting specifically reward and inter-trial-interval periods following rule switches, in what is now Figure 1—figure supplement 5. With such PPC inhibition during the ITI, we observed no effect on performance recovery after rule switches and thus found PPC activity to be dispensable for rule updates.

      For the working-memory task we do not know the duration of the delay, but this really is critical information; by definition, performance in such a task is delay-dependent, and this is not explored in the paper.

      We thank the reviewer for pointing out the lack of information on delay duration and have now added this to the Methods section.

      We agree that in classical working memory tasks where the delay duration is purely defined by the experimenter and varied throughout a session, performance is typically dependent on delay duration. However, in our delay task, the delay distance is kept constant, and thus the delay is not varied by the experimenter. Instead, the time spent in the delay period is determined by the mouse, and the only source of variability in the time spent in the delay period is minor differences in the mice’s running speeds across trials or sessions. Notably, the differences in time in the delay period were greatest between mice because some mice ran faster than others. Within a mouse, the time spent in the delay period was generally rather consistent due to relatively constant running speeds. Also, because the mouse had full control over the delay duration, it could very well speed up its running if it started to forget the cue and run more slowly if it was confident in its memory. Thus, because the delay duration was set by the mouse and not the experimenter, it is very challenging or impossible to interpret the meaning and impact of variations in the delay duration. Accordingly, we had no a priori reason to expect a relationship between task performance and delay duration once mice have become experts at the delay task. Indeed, we do not see such a relationship in our data (see plot here, n = 85 sessions across 7 mice). In order to test the effect of delay duration on behavioral performance, we would have to systematically change the length of the delay period in the maze, which we did not do and which would require an entirely new set of experiments.

      Also, the authors heavily rely on "decision-making" but I am genuinely wondering if this is at all needed to account for the behavior exhibited by mice in these tasks (it would be more accurate for the bandit task) as with the perspective developed by the authors, any task implies a "decision-making" component, so that alone is not very informative on the nature of the cognitive operations that mice must compute to solve the tasks. I think a more accurate terminology in line with the specific task considered should be employed to clarify this.

      We acknowledge that the previous emphasis on decision-making may have created expectations that we demonstrate effects that are specific to the ‘decision-making’ aspect of a decision task. As we do not isolate the decision-making process specifically, we have substantially revised our wording around the tasks and removed the emphasis on decision-making, including in the title. Rather than decision-making, we now highlight the navigational aspect of the tasks employed.

      The "switching"/bandit task is particularly interesting. But because the authors only consider trials with highest accuracy, I think they are missing a critical component of this task which is the balance between exploiting current knowledge and the necessity to explore alternate options when the former strategy is no longer effective. So trials with poor performance are thus providing an essential feedback which is a major drive to support exploratory actions and a critical asset of the bandit task. There is an ample literature documenting how these tasks assess the exploration/exploitation trade-off.

      We completely agree with the reviewer that the periods following rule switches are an essential part of the switching task and of high interest. Indeed, ongoing work in the lab is carefully quantifying the mice’s strategy in this task and exploring how mice use errors after switches to update their belief about the rule. In this project, however, a detailed quantification of switching task strategy seemed beyond the scope because our focus was on training history and not on the specifics of each task. While we agree with the reviewer about the interesting nature of the switching period, it would be too much for a single paper to investigate the detailed mechanisms of each task on top of what we already report for training history. Instead, we have now added quantifications of performance recovery after rule switches in Figure 1—figure supplement 2, showing that rule switches cause below-chance performance initially, followed by recovery within tens of trials.

      2) Training history vs learning sets vs behavioral flexibility:

      The authors consider "training history" as the unique angle to interpret the data. Because the experimental setup is the same throughout all experiments, I am wondering if animals are just simply provided with a cognitive challenge assessing behavioral flexibility given that they must identify the new rule while restraining from responding using previously established strategies. According to this view, it may be expected for cortical lesions to be more detrimental because multiple cognitive processes are now at play.

      It is also possible that animals form learning sets during successive learning episodes which may interfere with or facilitate subsequent learning. Little information is provided regarding learning dynamics in each task (e.g. trials to criterion depending on the number of tasks already presented) to have a clear view on that.

      We thank the reviewer for raising these interesting ideas. We have now evaluated these ideas in the context of our experimental design and results. One of the main points to consider is that for mice transitioned from either of the complex tasks to the simple task, the simple task is not a novel task, but rather a well-known simplification of the previous tasks. Mice that are experts on the delay task have experienced the simple task, i.e. trials without a delay period, during their training procedure before being exposed to delay periods. Switching task expert mice know the simple task as one rule of the switching task and have performed according to this rule in each session prior to the task transition. Accordingly, upon the transition to the simple task, both delay task expert mice and switching task expert mice perform at very high levels on the very first simple task session. We now quantify and report this in Figure 2—figure supplement 1 (A, B). This is crucial to keep in mind when assessing ‘learning sets’ or ‘behavioral flexibility’ as possible explanations for the persistent cortical involvement after the task transitions. In classical learning sets paradigms, animals are exposed to a series of novel associations, and the learning of previous associations speeds up the learning of subsequent ones (Caglayan et al., 2021; Eichenbaum et al., 1986; Harlow, 1949). This is a distinct paradigm from ours because the simple task does not contain novel associations that are new to the mice already trained on the complex tasks. Relatedly, the simple task is unlikely to present a challenge of behavioral flexibility to these mice given our experimental design and the observation of high simple task performance in the first session after the task transition.

      We now clarify these points in the introduction, results, and discussion sections, also acknowledging that it will be of interest for future work to investigate how learning sets may affect cortical task involvement.

      3) Calcium imaging data versus interventions:

      The value of the calcium imaging data is not entirely clear. Does this approach bring a new point to consider to interpret or conclude on behavioral data or is it to be considered convergent with the optogenetic interventions? Very specific portions of behavioral data are considered for these analyses (e.g. only highly successful trials for the switching/bandit task) and one may wonder if considering larger or different samples would bring similar insights. The whole take on noise correlation is difficult to apprehend because of the same possible interpretation issue, does this really reflect training history, or that a new rule now must be implemented or something else? I don't really get how this correlative approach can help to address this issue.

      We thank the reviewer for pointing out that the relationship between the inhibition dataset and calcium imaging dataset is not clear enough. We restricted analyses of inhibition and calcium imaging data in the switching task to the identical cue-choice associations as present in the simple task (i.e. Rule A trials of the switching task). We did this because we sought to make the fairest and most convincing comparison across tasks for both datasets. However, we can now see that not reporting results with trials from the other rule causes concerns that the reported differences across tasks may only hold for a specific subset of trials.

      We have now added analyses of optogenetic inhibition effects and calcium imaging results considering Rule B trials. In Figure 1—figure supplement 2, we show that when considering only Rule B trials in the switching task, effects of RSC or PPC inhibition on task performance are still increased relative to the ones observed in mice trained on and performing the simple task. We also show that overall task performance is lower in Rule B trials of the switching task than in the simple task, mirroring the differences across tasks when considering Rule A trials only.

      We extended the equivalent comparisons to the calcium imaging dataset, only considering Rule B trials of the switching task in Figure 4—figure supplement 3. With Rule B trials only, we still find larger mean activity and trial-type selectivity levels in RSC and PPC, but not in V1, compared to the simple task, as well as lower noise correlations. We thus find that our conclusions about area necessity and activity differences across tasks hold for Rule B trials and are not due to only considering a subset of the switching task data.
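For readers unfamiliar with the noise-correlation measure compared across tasks above, a common recipe is to subtract each neuron's trial-type mean (the stimulus-locked "signal") from its trial-by-trial responses and then correlate the residuals across neuron pairs. The sketch below illustrates this on synthetic data; the actual study's preprocessing (deconvolution, time windows) may differ:

```python
# Hedged sketch of pairwise noise-correlation computation on synthetic
# trial-by-trial responses with two trial types.
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_neurons = 200, 30
trial_type = rng.integers(0, 2, n_trials)              # two trial types
responses = rng.normal(size=(n_trials, n_neurons))     # trial-to-trial noise
responses += trial_type[:, None] * rng.normal(size=n_neurons)  # add signal

# Remove the stimulus-locked component per trial type ("signal")
residuals = responses.copy()
for t in (0, 1):
    residuals[trial_type == t] -= responses[trial_type == t].mean(axis=0)

# Noise correlation = correlation of residuals, averaged over neuron pairs
corr = np.corrcoef(residuals.T)
pair_mask = np.triu(np.ones_like(corr, dtype=bool), k=1)
mean_noise_corr = corr[pair_mask].mean()
print(f"mean pairwise noise correlation = {mean_noise_corr:.3f}")
```

Because the synthetic noise here is independent across neurons, the mean noise correlation comes out near zero; shared trial-to-trial fluctuations would push it upward.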

      In Figure 4—figure supplement 4, we further leverage the inclusion of Rule B trials and present new analyses of different single-neuron selectivity categories across rules in the switching task, reporting a prevalence of mixed selectivity in our dataset.

      Furthermore, to clarify the link between the optogenetic inhibition and the calcium imaging datasets, we have revised the motivation for the imaging dataset, as well as the presentation of its results and discussion. Investigating an area’s neural activity patterns is a crucial first step towards understanding how differential necessity of an area across tasks or experience can be explained mechanistically on a circuit level. We now elaborate on the fact that mechanistically, changes in an area’s necessity may or may not be accompanied by changes in activity within that area, as previous work in related experimental paradigms has reported differences in necessity in the absence of differences in activity (Chowdhury & DeAngelis, 2008; Liu & Pack, 2017). This phenomenon can be explained by differences in the readout of an area’s activity. We now make more explicit that in contrast to the scenario where only the readout changes, we find an intriguing correspondence between increased necessity (as seen in the inhibition experiments) and increased activity and selectivity levels (as seen in the imaging experiments) in cortical association areas depending on the current task and previous experience. Rather than attributing the increase in necessity solely to these observed changes in activity, we highlight that in the simple task condition already, cortical areas contain a high amount of task information, ruling out the idea that insufficient local information would cause the small performance deficits from inhibition. Our results thus suggest that differential necessity across tasks and experience may still require changes at the readout level despite changes in local activity. 
We view our imaging results as an exciting first step towards a mechanistic understanding of how cognitive experience affects cortical necessity, but we stress that future work will need to test directly the relationship between cortical necessity and various specific features of the neural code.

      Reviewer #2 (Public Review):

      The authors use a combination of optogenetics and calcium imaging to assess the contribution of cortical areas (posterior parietal cortex, retrosplenial cortex, S1/V1) on a visual-place discrimination task. Head-fixed mice were trained on a simple version of the task where they were required to turn left or right depending on the visual cue that was present (e.g. X = go left; Y = go right). In a more complex version of the task the configurations were either switched during training or the stimuli were only presented at the beginning of the trial (delay).

      The authors found that inhibiting the posterior parietal cortex and retrosplenial cortex affected performance, particularly on the complex tasks. However, previous training on the complex tasks resulted in more pronounced impairments on the simple task than when behaviourally naïve animals were trained/tested on a simple task. This suggests that the more complex tasks recruit these cortical areas to a greater degree, potentially due to increased attention required during the tasks. When animals then perform the simple version of the task their previous experience of the complex tasks is transferred to the simple task resulting in a different pattern of impairments compared to that found in behaviorally naïve animals.

      The calcium imaging data showed a similar pattern of findings to the optogenetic study. There was overall increased activity in the switching tasks compared to the simple tasks, consistent with the greater task demands. There was also greater trial-type selectivity in the switching task compared to the simple task. This increased trial-type selectivity in the switching tasks was subsequently carried forward to the simple task, so that activity patterns were different when animals performed the simple task after experiencing the complex task compared to when they were trained on the simple task alone.

      Strengths:

      The use of optogenetics and calcium-imaging enables the authors to look at the requirement of these brain structures both in terms of necessity for the task when disrupted as well as their contribution when intact.

      The use of the same experimental set up and stimuli can provide a nice comparison across tasks and trials.

      The study nicely shows that the contribution of cortical regions varies with task demands and that longer-term changes in neuronal responses can transfer across tasks.

      The study highlights the importance of considering previous experience and exposure when understanding behavioural data and the contribution of different regions.

      The authors include a number of important controls that help with the interpretation of the findings.

      We thank the reviewer for pointing out these strengths in our work and for finding our main conclusions supported.

      Weaknesses:

      There are some experimental details that need to be clarified to help with understanding the paper in terms of behavior and the areas under investigation.

      The use of the same stimuli throughout is beneficial as it allows direct comparisons with animals experiencing the same visual cues. However, it does limit the extent to which you can extrapolate the findings. It is perhaps unsurprising to find that learning about specific visual cues affects subsequent learning and use of those specific cues. What would be interesting to know is how much of what is being shown is cue specific learning or whether it reflects something more general, for example schema learning which could be generalised to other learning situations. If animals were then trained on a different discrimination with different stimuli would this previous training modify behavior and neural activity in that instance. This would perhaps be more reflective of the types of typical laboratory experiments where you may find an impairment on a more complex task and then go on to rule out more simple discrimination impairments. However, this would typically be done with slightly different stimuli so you don't introduce transfer effects.

      We agree with the reviewer that investigating the effects of schema learning on cortical task involvement is an exciting future direction and have now explicitly mentioned this in the Discussion section. As the reviewer points out, however, our study was not designed to test this idea specifically. Because investigating schema learning would require developing and implementing an entirely new set of behavioral task variants, we feel this is beyond the scope of the current work. As to the question of how generalized the effects of cognitive experience are, our data in the run-to-target task suggest that if task settings are sufficiently distinct, cortical involvement can be similarly low regardless of complex task experience (now Figure 3—figure supplement 1). This finding is in line with recent work from (Pinto et al., 2019), where cortical involvement appears to change rapidly depending on major differences in task demands. However, work in MT has shown that previous motion discrimination training using dots can alter MT involvement in motion discrimination of gratings (Liu & Pack, 2017), highlighting that cortical involvement need not be tightly linked to the sensory cue identity.

      It is not clear whether length of training has been taken into account for the calcium imaging study given the slow development of neural representations when animals acquire spatial tasks.

      We apologize that the training duration and the temporal relationship between task acquisition and calcium imaging were not documented for the calcium imaging dataset. Please see our detailed reply under the ‘Recommendations for the authors’ from Reviewer 2 below.

      The authors are presenting the study in terms of decision-making, however, it is unclear from the data as presented whether the findings specifically relate to decision making. I'm not sure the authors are demonstrating differential effects at specific decision points.

      We understand that the previous emphasis on decision-making may have created expectations that we demonstrate effects that are specific to the ‘decision-making’ aspect of a decision task. As we do not isolate the decision-making process specifically, we have substantially revised our wording around the tasks and removed the emphasis on decision-making, including in the title. Rather than decision-making, we now highlight the navigational aspect of the tasks employed.

      While we removed the emphasis on the decision-making process in our tasks, we found the reviewer’s suggestion to measure ‘decision points’ a useful additional behavioral characterization across tasks. So, we quantified how soon a mouse’s ultimate choice can be decoded from its running pattern as it progresses through the maze towards the Y-intersection. We now show these results in Figure 1—figure supplement 1. Interestingly, we found that in the delay task, choice decoding accuracy was already very high during the cue period before the onset of the delay. Nevertheless, we had shown that overall task performance and performance with inhibition were lower in the delay task compared to the simple task. Also, in segment-specific inhibition experiments, we had found that inhibition during only the delay period or only the cue period decreased task performance substantially more than in the simple task, thus finding an interesting absence of differential inhibition effects around decision points. Overall, how early a mouse made its ultimate decision did not appear predictive of the inhibition-induced task decrements, which we also directly quantify in Figure 1—figure supplement 1.

    1. Author Response

      Reviewer #1 (Public Review):

      Trudel and colleagues aimed to uncover the neural mechanisms of estimating the reliability of the information from social agents and non-social objects. By combining functional MRI with a behavioural experiment and computational modelling, they demonstrated that learning from social sources is more accurate and robust compared with that from non-social sources. Furthermore, dmPFC and pTPJ were found to track the estimated reliability of the social agents (as opposed to the non-social objects). The strength of this study is to devise a task consisting of the two experimental conditions that were matched in their statistical properties and only differed in their framing (social vs. non-social). The novel experimental task allows researchers to directly compare the learning from social and non-social sources, which is a prominent contribution of the present study to social decision neuroscience.

      Thank you so much for your positive feedback about our work. We are delighted that you found that our manuscript provided a prominent contribution to social decision neuroscience. We really appreciate your time to review our work and your valuable comments that have significantly helped us to improve our manuscript further.

      One of the major weaknesses is the lack of a clear description about the conceptual novelty. Learning about the reliability/expertise of social and non-social agents has been of considerable concern in social neuroscience (e.g., Boorman et al., Neuron 2013; and Wittmann et al., Neuron 2016). The authors could do a better job in clarifying the novelty of the study beyond the previous literature.

      We understand the reviewer’s comment and have made changes to the manuscript that, first, highlight more strongly the novelty of the current study. Crucially, second, we have also supplemented the data analyses with a new model-based analysis of the differences in behaviour in the social and non-social conditions which we hope makes clearer, at a theoretical level, why participants behave differently in the two conditions.

      There has long been interest in investigating whether ‘social’ cognitive processes are special or unique compared to ‘non-social’ cognitive processes and, if they are, what makes them so. Differences between conditions could arise during the input stage (e.g. the type of visual input that is processed by social and non-social system), at the algorithm stage (e.g. the type of computational principles that underpin social versus non-social processes) or, even if identical algorithms are used, social and non-social processes might depend on distinct anatomical brain areas or neurons within brain areas. Here, we conducted multiple analyses (in figures 2, 3, and 4 in the revised manuscript and in Figure 2 – figure supplement 1, Figure 3 – figure supplement 1, Figure 4 – figure supplement 3, Figure 4 – figure supplement 4) that not only demonstrated basic similarities in mechanism generalised across social and non-social contexts, but also demonstrated important quantitative differences that were linked to activity in specific brain regions associated with the social condition. The additional analyses (Figure 4 – figure supplement 3, Figure 4 – figure supplement 4) show that differences are not simply a consequence of differences in the visual stimuli that are inputs to the two systems1, nor does the type of algorithm differ between conditions. Instead, our results suggest that the precise manner in which an algorithm is implemented differs when learning about social or non-social information and that this is linked to differences in neuroanatomical substrates.

      The previous studies mentioned by the reviewer are, indeed, relevant ones and were, of course, part of the inspiration for the current study. However, there are crucial differences between them and the current study. In the case of the previous studies by Wittmann, the aim was a very different one: to understand how one’s own beliefs, for example about one’s performance, and beliefs about others, for example about their performance levels, are combined. Here, instead, we were interested in the similarities and differences between social and non-social learning. It is true that the question resembles the one addressed by Boorman and colleagues in 2013, who looked at how people learned about the advice offered by people or computer algorithms, but the difference in the framing of that study perhaps contributed to the authors’ finding of little difference in learning. By contrast, in the present study we found evidence that people were predisposed to perceive stability in social performance and to be uncertain about non-social performance. By accumulating evidence across multiple analyses, we show that there are quantitative differences in how we learn about social versus non-social information, and that these differences can be linked to the way in which learning algorithms are implemented neurally. We therefore contend that our findings extend our previous understanding of how, in relation to other learning processes, ‘social’ learning has both shared and special features.

      We would like to emphasize the way in which we have extended several of the analyses throughout the revision. The theoretical Bayesian framework has made it possible to simulate key differences in behaviour between the social and non-social conditions. We explain in our point-by-point reply below how we have integrated a substantial number of new analyses. We have also more carefully related our findings to previous studies in the Introduction and Discussion.

      Introduction, page 4:

      [...] Therefore, by comparing information sampling from social versus non-social sources, we address a long-standing question in cognitive neuroscience, the degree to which any neural process is specialized for, or particularly linked to, social as opposed to non-social cognition 2–9. Given their similarities, it is expected that both types of learning will depend on common neural mechanisms. However, given the importance and ubiquity of social learning, it may also be that the neural mechanisms that support learning from social advice are at least partially specialized and distinct from those concerned with learning that is guided by non-social sources. Yet it is less clear at which level information is processed differently when it has a social or non-social origin. It has recently been argued that differences between social and non-social learning can be investigated at different levels of Marr’s information processing theory: differences could emerge at an input level (in terms of the stimuli that might drive social and non-social learning), at an algorithmic level or at a neural implementation level 7. It might be that, at the algorithmic level, associative learning mechanisms are similar across social and non-social learning 1. Other theories have argued that differences might emerge because goal-directed actions are attributed to social agents, which allows for very different inferences to be made about hidden traits or beliefs 10. Such inferences might fundamentally alter learning about social agents compared to non-social cues.

      Discussion, page 15:

      […] One potential explanation for the assumption of stable performance for social but not non-social predictors might be that participants attribute intentions and motivations to social agents. Even if the social and non-social evidence are the same, the belief that a social actor might have a goal may affect the inferences made from the same piece of information 10. Social advisors first learnt about the target’s distribution and accordingly gave advice on where to find the target. If the social agents are credited with goal-directed behaviour, then it might be assumed that the goals remain relatively constant; this might lead participants to assume stability in the performances of social advisors. However, such goal-directed intentions might not be attributed to non-social cues, thereby making judgments inherently more uncertain and changeable across time. Such an account, focussing on differences in attribution in social settings, aligns with a recent suggestion that any attempt to identify similarities or differences between social and non-social processes can occur at any one of a number of the levels in Marr’s information theory 7. Here we found that the same algorithm was able to explain social and non-social learning (a qualitatively similar computational model could explain both). However, the extent to which the algorithm was recruited when learning about social compared to non-social information differed. We observed a greater impact of uncertainty on judgments about non-social compared to social information. We have shown evidence for a degree of specialization when assessing social advisors as opposed to non-social cues. At the neural level we focused on two brain areas, dmPFC and pTPJ, that have not only been shown to carry signals associated with belief inferences about others but, in addition, recent combined fMRI-TMS studies have demonstrated the causal importance of these activity patterns for the inference process […]

      Another weakness is the lack of justifications of the behavioural data analyses. It is difficult for me to understand why 'performance matching' is suitable for an index of learning accuracy. I understand the optimal participant would adjust the interval size with respect to the estimated reliability of the advisor (i.e., angular error); however, I am wondering if the optimal strategy for participants is to exactly match the interval size with the angular error. Furthermore, the definitions of 'confidence adjustment across trials' and 'learning index' look arbitrary.

      First, having read the reviewer’s comments, we realise that our choice of the term ‘performance matching’ may not have been ideal as it indeed might not be the case that the participant intended to directly match their interval sizes with their estimates of advisor/predictor error. Like the reviewer, we simply assume that the interval sizes should change as the estimated reliability of the advisor changes and, therefore, that the intervals that the participants set should provide information about the estimates that they hold and the manner in which they evolve. On re-reading the manuscript we realised that we had not used the term ‘performance matching’ consistently or in many places in the manuscript. In the revised manuscript we have simply removed it altogether and referred to the participants’ ‘interval setting’.

      Most of the initial analyses in Figure 2a-c aim to better understand the raw behaviour before applying any computational model to the data. We were interested in how participants make confidence judgments (decision-making per se), but also how they adapt their decisions with additional information (changes or learning in decision making). In the revised manuscript we have made clear that these are used as simple behavioural measures and that they will be complemented later by more analyses derived from more formal computational models.

      In what we now refer to as the ‘interval setting’ analysis (Figure 2a), we tested whether participants select their interval settings differently in the social compared to non-social condition. We observe that participants set their intervals closer to the true angular error of the advisor/predictor in the social compared to the non-social condition. This observation could arise in two ways. First, it could be due to quantitative differences in learning despite general, qualitative similarity: mechanisms are similar but participants differ quantitatively in the way that they learn about non-social information and social information. Second, it could, however, reflect fundamentally different strategies. We tested basic performance differences by comparing the mean reward between conditions. There was no difference in reward between conditions (mean reward: paired t-test social vs. non-social, t(23)= 0.8, p=0.4, 95% CI= [-0.007 0.016]), suggesting that interval setting differences might not simply reflect better or worse performance in social or non-social contexts but instead might reflect quantitative differences in the processes guiding interval setting in the two cases.

      In the next set of analyses, in which we compared raw data, applied a computational model, and provided a theoretical account for the differences between conditions, we suggest that there are simple quantitative differences in how information is processed in social and non-social conditions but that these have the important impact of making long-term representations – representations built up over a longer series of trials – more important in the social condition. This, in turn, has implications for the neural activity patterns associated with social and non-social learning. We therefore agree with the reviewer that one manner of interval setting is indeed not more optimal than another. However, the differences that do exist in behaviour are important because they reveal something about social and non-social learning and its neural substrates. We have adjusted the wording and interpretation in the revised manuscript.

      Next, we analysed interval setting with two additional, related analyses: interval setting adjustment across trials and derivation of a learning index. We tested the degree to which participants adjusted their interval setting across trials and according to the prediction error (learning index, Figure 2f); the latter analysis is very similar to a trial-wise learning rate calculated in previous studies11. In contrast to many other studies, the intervals set by participants provide information about the estimates that they hold in a simple and direct way and enable calculation of a trial-wise learning index; therefore, we decided to call it ‘learning index’ instead of ‘learning rate’ as it is not estimated via a model applied to the data, but instead directly calculated from the data. Arguably the directness of the approach, and its lack of dependence on a specific computational model, is a strength of the analysis.
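      As a hedged illustration (the numbers and the exact ratio form here are purely hypothetical, not necessarily the formula used in the study), a trial-wise learning index of this direct kind can be read off the data as the interval update per unit prediction error:

```python
import numpy as np

# Hypothetical illustration: a trial-wise learning index computed directly
# from the data as the change in interval divided by the preceding
# prediction error (no model fitting involved).
intervals = np.array([10.0, 8.0, 9.0, 7.0])   # interval set on successive trials
pred_errors = np.array([-4.0, 2.0, -4.0])     # prediction error after trials 1-3
learning_index = np.diff(intervals) / pred_errors
print(learning_index)  # → [0.5 0.5 0.5]
```

In this toy sequence every interval update is half the size of the preceding prediction error, so the index is constant; in real data it varies trial by trial.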

      Subsequently in the manuscript, a new analysis (illustrated in new Figure 3) employs Bayesian models that can simulate the differences in the social and non-social conditions and demonstrate that a number of behavioural observations can arise simply as a result of differences in noise in each trial-wise Bayesian update (Figure 3 and specifically 3d; Figure 3 – figure supplement 1b-c). In summary, the descriptive analyses in Figure 2a-c aid an intuitive understanding of the differences in behaviour in the social and non-social conditions. We have then repeated these analyses with Bayesian models incorporating different noise levels and showed that in such a way, the differences in behaviour between social and non-social conditions can be mimicked (please see next section and manuscript for details).

      We adjusted the wording in a number of sections in the revised manuscript such as in the legend of Figure 2 (figures and legend), Figure 4 (figures and legend).

      Main text, page 5:

      The confidence interval could be changed continuously to make it wider or narrower, by pressing buttons repeatedly (one button press resulted in a change of one step in the confidence interval). In this way participants provided what we refer to as an ’interval setting’.

      We also adjusted the following section in Main text, page 6:

      Confidence in the performance of social and non-social advisors

      We compared trial-by-trial interval setting in relation to the social and non-social advisors/predictors. When setting the interval, the participant’s aim was to minimize it while ensuring it still encompassed the final target position; points were won when it encompassed the target position but were greater when it was narrower. A given participant’s interval setting should, therefore, change in proportion to the participant’s expectations about the predictor’s angular error and their uncertainty about those expectations. Even though, on average, social and non-social sources did not differ in the precision with which they predicted the target (Figure 2 – figure supplement 1), participants gave interval settings that differed in their relationships to the true performances of the social advisors compared to the non-social predictors. The interval setting was closer to the angular error in the social compared to the non-social sessions (Figure 2a, paired t-test: social vs. non-social, t(23)= -2.57, p= 0.017, 95% confidence interval (CI)= [-0.36 -0.4]). Differences in interval setting might be due to generally lower performance in the non-social compared to social condition, or potentially due to fundamentally different learning processes utilised in either condition. We compared the mean reward amounts obtained by participants in the social and non-social conditions to determine whether there were overall performance differences. There was, however, no difference in the reward received by participants in the two conditions (mean reward: paired t-test social vs. non-social, t(23)= 0.8, p=0.4, 95% CI= [-0.007 0.016]), suggesting that interval setting differences might not simply reflect better or worse performance in social or non-social contexts but instead might reflect quantitative differences in the processes guiding interval setting in the two cases.

      Discussion, page 14:

      Here, participants did not match their confidence to the likely accuracy of their own performance, but instead to the performance of another social or non-social advisor. Participants used different strategies when setting intervals to express their confidence in the performances of social advisors as opposed to non-social advisors. A possible explanation might be that participants have a better insight into the abilities of social cues – typically other agents – than non-social cues – typically inanimate objects.

      As the authors assumed simple Bayesian learning for the estimation of reliability in this study, the degree/speed of the learning should be examined with reference to the distance between the posterior and prior belief in the optimal Bayesian inference.

      We thank the reviewer for this suggestion. We agree with the reviewer that further analyses that aim to disentangle the underlying mechanisms that might differ between both social and non-social conditions might provide additional theoretical contributions. We show additional model simulations and analyses that aim to disentangle the differences in more detail. These new results allowed clearer interpretations to be made.

      In the current study, we showed that judgments made about non-social predictors were changed more strongly as a function of the subjective uncertainty: participants set a larger interval, indicating lower confidence, when they were more uncertain about the non-social cue’s accuracy to predict the target. In response to the reviewer’s comments, the new analyses were aimed at understanding under which conditions such a negative uncertainty effect might emerge.

      Prior expectations of performance

      First, we compared whether participants had different prior expectations in the social condition compared to the non-social condition. One way to compare prior expectations is by comparing the first interval set for each advisor/predictor. This is a direct readout of the initial prior expectation with which participants approach our two conditions. In such a way, we test whether the prior beliefs before observing any social or non-social information differ between conditions. Even though this does not test the impact of prior expectations on subsequent belief updates, it does test whether participants have generally different expectations about the performance of social advisors or non-social predictors. There was no difference in this measure between social or non-social cues (Figure below; paired t-test social vs. non-social, t(23)= 0.01, p=0.98, 95% CI= [-0.067 0.68]).

      Figure. Confidence interval for the first encounter of each predictor in social and non-social conditions. There was no initial bias in predicting the performance of social or non-social predictors.

      Learning across time

      We have now seen that participants do not have an initial bias when predicting performances in social or non-social conditions. This suggests that differences between conditions might emerge across time when encountering predictors multiple times. We tested whether inherent differences in how beliefs are updated according to new observations might result in different impacts of uncertainty on interval setting between social and non-social conditions. More specifically, we tested whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. This approach was inspired by the reviewer’s comments about potential differences in the speed of learning as well as the reduction of uncertainty with increasing predictor encounters. Similar ideas were tested in previous studies, when comparing the learning rate (i.e. the speed of learning) in environments of different volatilities 12,13. In these studies, a smaller learning rate was prevalent in stable environments during which reward rates change more slowly over time, while higher learning rates often reflect learning in volatile environments so that recent observations have a stronger impact on behaviour. Even though most studies derived these learning rates with reinforcement learning models, similar ideas can be translated into a Bayesian model. For example, an established way of changing the speed of learning in a Bayesian model is to introduce noise during the update process14. This noise is equivalent to adding in some of the initial prior distribution and this will make the Bayesian updates more flexible to adapt to changing environments. It will widen the belief distribution and thereby make it more uncertain.
Recent information has more weight on the belief update within a Bayesian model when beliefs are uncertain. This increases the speed of learning. In other words, a wide distribution (after adding noise) allows for quick integration of new information. On the contrary, a narrow distribution does not integrate new observations as strongly and instead relies more heavily on previous information; this corresponds to a small learning rate. So, we would expect a steep decline of uncertainty to be related to a smaller learning index while a slower decline of uncertainty is related to a larger learning index. We hypothesized that participants reduce their uncertainty quicker when observing social information, thereby anchoring more strongly on previous beliefs instead of integrating new observations flexibly. Vice versa, we hypothesized a less steep decline of uncertainty when observing non-social information, indicating that new information can be flexibly integrated during the belief update (new Figure 3a).

      We modified the original Bayesian model (Figure 2d, Figure 2 – figure supplement 2) by adding a uniform distribution (equivalent to our prior distribution) to each belief update – we refer to this as noise addition to the Bayesian model 14,21. We varied the amount of noise between δ = [0,1], where δ = 0 equals the original Bayesian model and δ = 1 represents a very noisy Bayesian model. The uniform distribution was selected to match the first prior belief before any observation was made (equation 2). This δ range resulted in a continuous increase of subjective uncertainty around the belief about the angular error (Figure 3b-c). The modified posterior distribution, denoted p′(σ|x), was derived at each trial as follows:
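      Based on the verbal description above, a plausible form of this δ-weighted update (not necessarily the exact equation in the manuscript) is:

```latex
p'(\sigma \mid x) \;=\; (1-\delta)\, p(\sigma \mid x) \;+\; \delta\, p_0(\sigma),
\qquad \delta \in [0, 1],
```

      where p(σ|x) is the standard Bayesian posterior over the angular-error scale at the current trial and p₀(σ) is the uniform prior; δ = 0 recovers the original model and δ = 1 yields a maximally noisy one.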

      We applied each noisy Bayesian model to participants’ choices within the social and non-social conditions.

      The addition of a uniform distribution changed two key features of the belief distribution: first, the width of the distribution remains larger with additional observations, thereby making it possible to integrate new observations more flexibly. To show this more clearly, we extracted the model-derived uncertainty estimate across multiple encounters of the same predictor for the original model and the fully noisy Bayesian model (Figure 3 – figure supplement 1). The model-derived ‘uncertainty estimate’ of a noisy Bayesian model decays more slowly compared to the ‘uncertainty estimate’ of the original Bayesian model (upper panel). Second, the model-derived ‘accuracy estimate’ reflects more recent observations in a noisy Bayesian model compared to the ‘accuracy estimate’ derived from the original Bayesian model, which integrates past observations more strongly (lower panel). Hence, as mentioned beforehand, a rapid decay of uncertainty implies a small learning index; or in other words, stronger integration of past compared to recent observations.
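      This slower decay of uncertainty can be illustrated with a small simulation (a hedged sketch with a hypothetical grid, noise level, and error values, not the study’s exact implementation): mixing the flat prior back into each posterior update keeps the belief distribution over the angular-error scale wide, so uncertainty declines more slowly and recent observations retain more weight.

```python
import numpy as np

def noisy_bayes_updates(errors, delta, n_grid=200, max_err=180.0):
    """Sequentially update a belief over a predictor's angular-error scale sigma.

    delta = 0 reproduces a standard Bayesian update; larger delta keeps
    re-mixing in the flat prior, so the belief stays wide and recent
    observations dominate (a faster effective learning rate).
    """
    sigmas = np.linspace(1.0, max_err, n_grid)   # candidate error scales
    prior = np.ones(n_grid) / n_grid             # flat initial prior
    belief = prior.copy()
    widths = []                                  # posterior SD after each trial
    for x in errors:
        # likelihood of an absolute angular error x given scale sigma
        lik = np.exp(-0.5 * (x / sigmas) ** 2) / sigmas
        belief = belief * lik
        belief /= belief.sum()
        # noise step: mix the flat prior back in with weight delta
        belief = (1 - delta) * belief + delta * prior
        mean = (sigmas * belief).sum()
        widths.append(np.sqrt(((sigmas - mean) ** 2 * belief).sum()))
    return np.array(widths)

rng = np.random.default_rng(0)
obs = np.abs(rng.normal(0, 30, size=20))         # 20 observed angular errors
w0 = noisy_bayes_updates(obs, delta=0.0)         # original model
w8 = noisy_bayes_updates(obs, delta=0.8)         # noisy model
print(w0[-1], w8[-1])                            # the noisy model stays more uncertain
```

After twenty observations the δ = 0 model has collapsed onto a narrow belief, whereas the δ = 0.8 model remains wide, mirroring the slower uncertainty decay of the noisy model in the upper panel described above.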

      In the following analyses, we tested whether an increasingly noisy Bayesian model mimics behaviour that is observed in the non-social compared to social condition. For example, we tested whether an increasingly noisy Bayesian model also exhibits a strongly negative ‘predictor uncertainty’ effect on interval setting (Figure 2e). In such a way, we can test whether differences in noise in the updating process of a Bayesian model might reproduce important qualitative differences in learning-related behaviour seen in the social and non-social conditions.

      We used these modified Bayesian models to simulate trial-wise interval setting for each participant according to the observations they made when selecting a particular advisor or non-social cue. We simulated interval setting at each trial and examined whether an increase in noise produced model behaviours that resembled participant behaviour patterns observed in the non-social condition as opposed to the social condition. At each trial, we used the accuracy estimate (Methods, equation 6) – which represents a subjective belief about a single angular error – to derive an interval setting for the selected predictor. To do so, we first derived the point-estimate of the belief distribution at each trial (Methods, equation 6) and multiplied it with the size of one interval step on the circle. The step size was derived by dividing the circle size by the maximum number of possible steps. Here is an example of transforming an accuracy estimate into an interval: let’s assume the belief about the angular error at the current trial is 50 degrees (Methods, equation 6). Now we transform this number into an interval for the current predictor on a given trial. To obtain the size of one interval step, the circle size (360 degrees) is divided by the maximum number of interval steps (40 steps; note, 20 steps on each side), which results in nine degrees per interval step. Next, the accuracy estimate in radians (0.87) is multiplied by the step size in radians (0.1571), resulting in an interval of 0.137 radians or 7.85 degrees. The final interval size would be 7.85 degrees.
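      This worked example can be reproduced in a few lines (the numbers simply mirror the illustrative example above):

```python
import math

# Worked example from the text: turning a belief about a predictor's
# angular error (here 50 degrees) into a confidence-interval size.
circle_deg = 360.0
max_steps = 40                        # 20 interval steps on each side
step_deg = circle_deg / max_steps     # 9 degrees per interval step

accuracy_rad = math.radians(50)       # ~0.87 rad
step_rad = math.radians(step_deg)     # ~0.1571 rad
interval_rad = accuracy_rad * step_rad
interval_deg = math.degrees(interval_rad)
print(round(interval_deg, 2))         # → 7.85
```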

      Simulating Bayesian choices in that way, we repeated the behavioural analyses (Figure 2b,e,f) to test whether intervals derived from more noisy Bayesian models mimic intervals set by participants in the non-social condition: greater changes in interval setting across trials (Figure 3 – figure supplement 1b), a negative ‘predictor uncertainty' effect on interval setting (Figure 3 – figure supplement 1c), and a higher learning index (Figure 3d).

      First, we repeated the most crucial analysis – the linear regression analysis (Figure 2e) – and hypothesized that intervals that were simulated from noisy Bayesian models would also show a greater negative ‘predictor uncertainty’ effect on interval setting. This was indeed the case: irrespective of social or non-social conditions, the addition of noise (increased weighting of the uniform distribution in each belief update) led to an increasingly negative ‘predictor uncertainty’ effect on confidence judgment (new Figure 3d). In Figure 3d, we show the regression weights (y-axis) for the ‘predictor uncertainty’ on confidence judgment with increasing noise (x-axis). This result is highly consistent with the idea that, in the non-social condition, the manner in which task estimates are updated is more uncertain and more noisy. By contrast, social estimates appear relatively more stable, also according to this new Bayesian simulation analysis.

      This new finding extends the results and suggests a formal computational account of the behavioural differences between social and non-social conditions. Increasing the noise of the belief update mimics behaviour that is observed in the non-social condition: an increasingly negative effect of ‘predictor uncertainty’ on confidence judgment. Notably, there was no difference in the impact that the noise had in the social and non-social conditions. This was expected because the Bayesian simulations are blind to the framing of the conditions. However, it means that the observed effects do not depend on the precise sequence of choices that participants made in these conditions. It therefore suggests that an increase in the Bayesian noise leads to an increasingly negative impact of ‘predictor uncertainty’ on confidence judgments irrespective of the condition. Hence, we can conclude that different degrees of uncertainty within the belief update are a reasonable explanation for the differences observed between social and non-social conditions.

Next, we used these simulated confidence intervals and repeated the descriptive behavioural analyses to test whether interval settings derived from more noisy Bayesian models mimic the behavioural patterns observed in the non-social compared to the social condition. For example, more noise in the belief update should lead to more flexible integration of new information and hence potentially to greater changes of confidence judgments across predictor encounters (Figure 2b). Further, a greater reliance on recent information should lead prediction errors to be reflected more strongly in the next confidence judgment; hence, it should result in a higher learning index in the non-social condition, which we hypothesized to be perceived as more uncertain (Figure 2f). We used the simulated confidence intervals from Bayesian models on a continuum of noise integration (i.e. different weightings of the uniform distribution in the belief update) and again derived both absolute confidence change and learning indices (Figure 3 – figure supplement 1b-c).

‘Absolute confidence change’ and ‘learning index’ increase with increasing noise weight, thereby mimicking the difference between social and non-social conditions. Further, these analyses demonstrate the tight relationship between descriptive analyses and model-based analyses. They show that noise in the Bayesian updating process is a conceptual explanation that can account for both the differences in learning and the differences in uncertainty processing that exist between social and non-social conditions. The key insight conveyed by the Bayesian simulations is that a wider, more uncertain belief distribution changes more quickly. Correspondingly, in the non-social condition, participants express more uncertainty in their confidence estimate when they set the interval, and they also change their beliefs more quickly, as reflected in a higher learning index. Therefore, noisy Bayesian updating can account for key differences between the social and non-social conditions.
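The noisy-update idea can be illustrated with a minimal grid-based sketch. This is our own illustration, not the study's model code: the accuracy grid, the mixing weight `noise_w`, and the hit/miss observation sequence are arbitrary assumptions chosen only to show the qualitative effect.

```python
import numpy as np

def noisy_bayes_update(prior, likelihood, noise_w):
    """One belief update: Bayes' rule, then mix in a uniform
    distribution with weight noise_w (noise_w = 0 is standard Bayes)."""
    post = prior * likelihood
    post /= post.sum()
    uniform = np.full_like(post, 1.0 / post.size)
    return (1 - noise_w) * post + noise_w * uniform

# Grid of possible predictor accuracies and a Bernoulli likelihood
acc = np.linspace(0.01, 0.99, 99)

def belief_widths(observations, noise_w):
    """Track the standard deviation (width) of the belief across trials."""
    belief = np.full(acc.size, 1.0 / acc.size)   # flat prior
    widths = []
    for hit in observations:
        lik = acc if hit else 1 - acc
        belief = noisy_bayes_update(belief, lik, noise_w)
        mean = np.sum(belief * acc)
        widths.append(np.sqrt(np.sum(belief * (acc - mean) ** 2)))
    return widths

obs = [1, 1, 0, 1, 1, 1, 0, 1]          # illustrative hit/miss sequence
w_clean = belief_widths(obs, noise_w=0.0)
w_noisy = belief_widths(obs, noise_w=0.3)
# The noisier model keeps a wider, more uncertain belief,
# which in turn is pulled around more by recent outcomes.
print(w_noisy[-1] > w_clean[-1])
```

The wider belief under `noise_w > 0` is exactly the property that links the two behavioural signatures: a more uncertain (wider) distribution both widens the set interval and moves further on each new observation.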

      We thank the reviewer for making this point, as we believe that these additional analyses allow theoretical inferences to be made in a more direct manner; we think that it has significantly contributed towards a deeper understanding of the mechanisms involved in the social and non-social conditions. Further, it provides a novel account of how we make judgments when being presented with social and non-social information.

      We made substantial changes to the main text, figures and supplementary material to include these changes:

      Main text, page 10-11 new section:

      The impact of noise in belief updating in social and non-social conditions

So far, we have shown that, in comparison to non-social predictors, participants changed their interval settings about social advisors less drastically across time, relied on observations made further in the past, and were less impacted by their subjective uncertainty when they did so (Figure 2). Using Bayesian simulation analyses, we investigated whether a common mechanism might underlie these behavioural differences. We tested whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. Similar ideas have been tested in previous studies comparing the learning rate (i.e. the speed of learning) in environments of different volatilities12,13. We tested these ideas using established ways of changing the speed of learning during Bayesian updates14,21. We hypothesized that participants reduce their uncertainty more quickly when observing social information. Conversely, we hypothesized a less steep decline of uncertainty when observing non-social information, indicating that new information can be flexibly integrated during the belief update (Figure 5a).

We manipulated the amount of uncertainty in the Bayesian model by adding a uniform distribution to each belief update (Figure 3b-c) (equations 10,11). Consequently, the distribution’s width increases and the belief is more strongly impacted by recent observations (see example in Figure 3 – figure supplement 1). We used these modified Bayesian models to simulate trial-wise interval settings for each participant according to the observations they made by selecting a particular advisor in the social condition or predictor in the non-social condition. We simulated confidence intervals at each trial. We then used these to examine whether an increase in noise led to simulated behaviour that resembled the behavioural patterns observed in the non-social condition rather than those observed in the social condition.

First, we repeated the linear regression analysis and hypothesized that interval settings simulated from noisy Bayesian models would also show a greater negative ‘predictor uncertainty’ effect on interval setting, resembling the effect we had observed in the non-social condition (Figure 2e). This was indeed the case when using the noisy Bayesian model: irrespective of social or non-social condition, the addition of noise (increasing weight of the uniform distribution in each belief update) led to an increasingly negative ‘predictor uncertainty’ effect on confidence judgment (new Figure 3d). The absence of a difference between the social and non-social conditions in the simulations suggests that an increase in Bayesian noise is sufficient to induce a negative impact of ‘predictor uncertainty’ on interval setting. Hence, we can conclude that different degrees of noise in the updating process are sufficient to cause the differences observed between social and non-social conditions. Next, we used these simulated interval settings and repeated the descriptive behavioural analyses (Figure 2b,f). An increase in noise led to greater changes of confidence across time and a higher learning index (Figure 3 – figure supplement 1b-c). In summary, the Bayesian simulations offer a conceptual explanation that can account for both the differences in learning and the differences in uncertainty processing that exist between social and non-social conditions. The key insight conveyed by the Bayesian simulations is that a wider, more uncertain belief distribution changes more quickly. Correspondingly, in the non-social condition, participants express more uncertainty in their confidence estimate when they set the interval, and they also change their beliefs more quickly. Therefore, noisy Bayesian updating can account for key differences between the social and non-social conditions.

      Methods, page 23 new section:

      Extension of Bayesian model with varying amounts of noise

We modified the original Bayesian model (Figure 2d, Figure 2 – figure supplement 2) to test whether the integration of new evidence differed between social and non-social conditions; for example, recent observations might be weighted more strongly for non-social cues while past observations might be weighted more strongly for social cues. [...] To obtain the size of one interval step, the circle size (360 degrees) is divided by the maximum number of interval steps (40 steps; note, 20 steps on each side), which results in nine degrees, the size of one interval step. Next, the accuracy estimate (0.87) is multiplied by the step size in radians (0.1571), resulting in an interval of 0.137 radians or 7.85 degrees. The final interval size would therefore be 7.85 degrees.
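The step-size arithmetic in this worked example can be checked directly. This is only a reproduction of the numbers in the text, not code from the study; tiny differences in the final digit come from the point at which rounding is applied.

```python
import math

circle_deg = 360.0
n_steps = 40                        # 20 interval steps on each side
step_deg = circle_deg / n_steps     # 9 degrees per interval step
step_rad = math.radians(step_deg)   # ~0.1571 radians

accuracy_estimate = 0.87            # example accuracy estimate from the text
interval_rad = accuracy_estimate * step_rad
interval_deg = math.degrees(interval_rad)

print(f"{step_deg:.0f} deg = {step_rad:.4f} rad; "
      f"interval = {interval_rad:.3f} rad = {interval_deg:.2f} deg")
```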

We repeated the behavioural analyses (Figure 2b,e,f) to test whether confidence intervals derived from more noisy Bayesian models mimic the behavioural patterns observed in the non-social condition: greater changes of confidence across trials (Figure 3 – figure supplement 1b), a greater negative ‘predictor uncertainty’ effect on confidence judgment (Figure 3 – figure supplement 1c) and a greater learning index (Figure 3d).

Discussion, page 14: […] It may be because we make just such assumptions that past observations are used to predict performance levels that people are likely to exhibit next 15,16. An alternative explanation might be that participants experience a steeper decline of subjective uncertainty in their beliefs about the accuracy of social advice, resulting in a narrower prior distribution during the next encounter with the same advisor. We used a series of simulations to investigate how uncertainty about beliefs changed from trial to trial and showed that belief updates about non-social cues were consistent with a noisier update process that diminished the impact of experiences over the longer term. From a Bayesian perspective, greater certainty about the value of advice means that contradictory evidence will need to be stronger to alter one’s beliefs. In the absence of such evidence, a Bayesian agent is more likely to repeat previous judgments. Just as in a confirmation bias 17, such a perspective suggests that once we are more certain about others’ features, for example their character traits, we are less likely to change our opinions about them.

      Reviewer #2 (Public Review):

      Humans learn about the world both directly, by interacting with it, and indirectly, by gathering information from others. There has been a longstanding debate about the extent to which social learning relies on specialized mechanisms that are distinct from those that support learning through direct interaction with the environment. In this work, the authors approach this question using an elegant within-subjects design that enables direct comparisons between how participants use information from social and non-social sources. Although the information presented in both conditions had the same underlying structure, participants tracked the performance of the social cue more accurately and changed their estimates less as a function of prediction error. Further, univariate activity in two regions-dmPFC and pTPJ-tracked participants' confidence judgments more closely in the social than in the non-social condition, and multivariate patterns of activation in these regions contained information about the identity of the social cues.

      Overall, the experimental approach and model used in this paper are very promising. However, after reading the paper, I found myself wanting additional insight into what these condition differences mean, and how to place this work in the context of prior literature on this debate. In addition, some additional analyses would be useful to support the key claims of the paper.

      We thank the reviewer for their very supportive comments. We have addressed their points below and have highlighted changes in our manuscript that we made in response to the reviewer’s comments.

      (1) The framing should be reworked to place this work in the context of prior computational work on social learning. Some potentially relevant examples:

      • Shafto, Goodman & Frank (2012) provide a computational account of the domainspecific inductive biases that support social learning. In brief, what makes social learning special is that we have an intuitive theory of how other people's unobservable mental states lead to their observable actions, and we use this intuitive theory to actively interpret social information. (There is also a wealth of behavioral evidence in children to support this account; for a review, see Gweon, 2021).

      • Heyes (2012) provides a leaner account, arguing that social and non-social learning are supported by a common associative learning mechanism, and what distinguishes social from non-social learning is the input mechanism. Social learning becomes distinctively "social" to the extent that organisms are biased or attuned to social information.

      I highlight these papers because they go a step beyond asking whether there is any difference between mechanisms that support social and nonsocial learning-they also provide concrete proposals about what that difference might be, and what might be shared. I would like to see this work move in a similar direction.

References
(In the interest of transparency: I am not an author on these papers.)

      Gweon, H. (2021). Inferential social learning: how humans learn from others and help others learn. PsyArXiv. https://doi.org/10.31234/osf.io/8n34t

      Heyes, C. (2012). What's social about social learning?. Journal of Comparative Psychology, 126(2), 193.

      Shafto, P., Goodman, N. D., & Frank, M. C. (2012). Learning from others: The consequences of psychological reasoning for human learning. Perspectives on Psychological Science, 7(4), 341-351.

Thank you for this suggestion to expand our framing. We have now made substantial changes to the Discussion and Introduction to include additional background literature, including the relevant references suggested by the reviewer, addressing the differences between social and non-social learning. We further related our findings to discussions in the literature arguing that differences between social and non-social learning might occur at the level of algorithms (the computations involved in social and non-social learning) and/or implementation (the neural mechanisms). Here, we describe behaviour with the same algorithm (a Bayesian model), but the weighting of uncertainty on decision-making differs between social and non-social contexts. This might be explained by ideas put forward by Shafto and colleagues (2012), who suggest that differences between social and non-social learning might be due to the attribution of goal-directed intention to social agents, but not to non-social cues. Such an attribution might lead participants to assume that advisor performances will be relatively stable, on the assumption that advisors have relatively stable goal-directed intentions. We also show differences at the implementational level between social and non-social learning in TPJ and dmPFC.

      Below we list the changes we have made to the Introduction and Discussion. Further, we would also like to emphasize the substantial extension of the Bayesian modelling which we think clarifies the theoretical framework used to explain the mechanisms involved in social and non-social learning (see our answer to the next comments below).

      Introduction, page 4:

[...]
Therefore, by comparing information sampling from social versus non-social sources, we address a long-standing question in cognitive neuroscience: the degree to which any neural process is specialized for, or particularly linked to, social as opposed to non-social cognition 2–9. Given their similarities, it is expected that both types of learning will depend on common neural mechanisms. However, given the importance and ubiquity of social learning, it may also be that the neural mechanisms that support learning from social advice are at least partially specialized and distinct from those concerned with learning that is guided by non-social sources.

      However, it is less clear on which level information is processed differently when it has a social or non-social origin. It has recently been argued that differences between social and non-social learning can be investigated on different levels of Marr’s information processing theory: differences could emerge at an input level (in terms of the stimuli that might drive social and non-social learning), at an algorithmic level or at a neural implementation level 7. It might be that, at the algorithmic level, associative learning mechanisms are similar across social and non-social learning 1. Other theories have argued that differences might emerge because goal-directed actions are attributed to social agents which allows for very different inferences to be made about hidden traits or beliefs 10. Such inferences might fundamentally alter learning about social agents compared to non-social cues.

      Discussion, page 15:

[…] One potential explanation for the assumption of stable performance for social but not non-social predictors might be that participants attribute intentions and motivations to social agents. Even if the social and non-social evidence are the same, the belief that a social actor might have a goal may affect the inferences made from the same piece of information 10. Social advisors first learnt about the target’s distribution and accordingly gave advice on where to find the target. If the social agents are credited with goal-directed behaviour, then it might be assumed that their goals remain relatively constant; this might lead participants to assume stability in the performances of social advisors. However, such goal-directed intentions might not be attributed to non-social cues, thereby making judgments inherently more uncertain and changeable across time. Such an account, focussing on differences in attribution in social settings, aligns with a recent suggestion that any attempt to identify similarities or differences between social and non-social processes can occur at any one of a number of levels in Marr’s information theory 7. Here we found that the same algorithm was able to explain social and non-social learning (a qualitatively similar computational model could explain both). However, the extent to which the algorithm was recruited when learning about social compared to non-social information differed. We observed a greater impact of uncertainty on judgments about social compared to non-social information. We have shown evidence for a degree of specialization when assessing social advisors as opposed to non-social cues. At the neural level we focused on two brain areas, dmPFC and pTPJ, that have not only been shown to carry signals associated with belief inferences about others but, in addition, recent combined fMRI-TMS studies have demonstrated the causal importance of these activity patterns for the inference process […]

      (2) The results imply that dmPFC and pTPJ differentiate between learning from social and non-social sources. However, more work needs to be done to rule out simpler, deflationary accounts. In particular, the condition differences observed in dmPFC and pTPJ might reflect low-level differences between the two conditions. For example, the social task could simply have been more engaging to participants, or the social predictors may have been more visually distinct from one another than the fruits.

We understand the reviewer’s concern that low-level distinctions between the social and non-social conditions could confound the differences in neural activation observed between conditions in areas pTPJ and dmPFC. From the reviewer’s comments, we understand that there might be two potential confounders. First, low-level differences: stimuli within one condition might be more distinct from each other than stimuli within the other condition, so that greater visual distinctiveness alone might lead to learning differences between conditions. Second, stimuli in one condition might be more engaging and potentially lead to attentional differences between conditions. We used a combination of univariate analyses and multivariate analyses to address both concerns.

      Analysis 1: Univariate analysis to inspect potential unaccounted variance between social and non-social condition

First, we used the existing univariate analysis (exploratory MRI whole-brain analysis, see Methods) to test for neural activation that covaried with attentional differences (or any other unaccounted neural difference) between conditions. If there were neural differences between conditions that are not accounted for by the parametric regressors included in the fMRI-GLM, then these differences should be captured by the constant of the GLM model. For example, if there are attentional differences between conditions, then we would expect to see neural differences between conditions in areas such as the inferior parietal lobe (or other areas commonly engaged during attentional processes).

Importantly, inspection of the constant of the GLM model should capture any unaccounted differences, whether they are due to attention or to alternative processes that might differ between conditions. When inspecting cluster-corrected differences in the constant of the fMRI-GLM model during the setting of the confidence judgment, there was no cluster-significant activation that differed between the social and non-social conditions (Figure 4 – figure supplement 4a; results were familywise-error cluster-corrected at p<0.05 using a cluster-defining threshold of z>2.3). For transparency, we show the sub-threshold activation map across the whole brain (z > 2) for the ‘constant’ contrasted between the social and non-social conditions (i.e. constant, contrast: social – non-social).

For transparency, we additionally used an ROI approach to test differences in activation patterns that correlated with the constant during the confidence phase; that is, we used the same ROI approach as in the paper to avoid any biased test selection. We compared activation patterns between social and non-social conditions in the same ROIs as used before: dmPFC (MNI coordinate [x/y/z: 2,44,36] 16), bilateral pTPJ (70% probability anatomical mask; for reference see manuscript, page 23), and additionally compared activation patterns between conditions in bilateral IPLD (50% probability anatomical mask, 20). We did not find significantly different activation patterns between social and non-social conditions in any of these areas: dmPFC (confidence constant; paired t-test social vs non-social: t(23) = 0.06, p=0.96, [-36.7, 38.75]), bilateral TPJ (confidence constant; paired t-test social vs non-social: t(23) = -0.06, p=0.95, [-31, 29]), bilateral IPLD (confidence constant; paired t-test social vs non-social: t(23) = -0.58, p=0.57, [-30.3 17.1]).

There were no meaningful activation patterns differing between conditions, either in areas commonly linked to attention (e.g. IPL) or in the brain areas that were the focus of the study (dmPFC and pTPJ). Activation in dmPFC and pTPJ covaried with parametric effects such as the confidence set at the current and previous trial, and did not correlate with low-level differences such as attention. Hence, these results suggest that activation between conditions was captured better by parametric regressors such as the trial-wise interval setting, i.e. confidence, and is unlikely to be confounded by low-level processes that can be captured with univariate neural analyses.
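The ROI comparisons above reduce to standard paired t-tests on per-participant betas. The sketch below uses simulated data rather than the study's betas (`social` and `non_social` are hypothetical arrays with no true condition difference) and only illustrates how the reported t, p, and 95% CI format is obtained:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 24  # participants, as in the study

# Hypothetical ROI betas for the GLM constant in each condition,
# generated with no true condition difference
social = rng.normal(0.0, 1.0, n)
non_social = social + rng.normal(0.0, 1.0, n)

t, p = stats.ttest_rel(social, non_social)
diff = social - non_social
ci = stats.t.interval(0.95, df=n - 1, loc=diff.mean(), scale=stats.sem(diff))
print(f"t({n - 1}) = {t:.2f}, p = {p:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```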

      Analysis 2: RSA to test visual distinctiveness between social and non-social conditions

We addressed the reviewer’s other comment directly by testing whether potential differences between conditions might arise from a varying degree of visual distinctiveness within one stimulus set compared to the other. We used RSA to inspect potential differences in early visual processing that should be impacted by greater stimulus similarity within one condition. In other words, we tested whether the visual distinctiveness of one stimulus set differed from that of the other.

Specifically, we compared the Exemplar Discriminability Index (EDI) between conditions in early visual areas. We compared the dissimilarity of neural activation related to the presentation of an identical stimulus across trials (diagonal of the RSA matrix) with the dissimilarity in neural activation between different stimuli across trials (off-diagonal of the RSA matrix). If stimuli within one set are very similar to each other, then the difference between the diagonal and off-diagonal should be small and less likely to be significant (i.e. similar diagonal and off-diagonal values). In contrast, if stimuli within one set are very distinct from each other, then the difference between the diagonal and off-diagonal should be large and likely to result in a significant EDI (i.e. different diagonal and off-diagonal values) (see Figure 4g for a schematic illustration). Hence, if there is a difference in visual distinctiveness between the social and non-social conditions, then it should result in different EDI values for the two conditions; visual distinctiveness of the stimulus sets can therefore be tested by comparing EDI values between conditions within early visual areas. We used a cortical ROI mask of bilateral V1. Negative EDI values indicate that the same exemplars are represented more similarly in the neural V1 pattern than different exemplars.
This analysis showed that there was no significant difference in EDI between conditions (Figure 4 – figure supplement 4b; EDI paired sample t-test: t(23) = -0.16, p=0.87, 95% CI [-6.7 5.7]).
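The EDI contrast described here reduces to a diagonal-versus-off-diagonal comparison on the dissimilarity matrix. Below is a minimal sketch using a toy matrix and a simple mean-based definition, not the study's exact cross-validated pipeline:

```python
import numpy as np

def exemplar_discriminability_index(rdm):
    """EDI: mean within-exemplar dissimilarity (diagonal of the
    cross-validated RDM) minus mean between-exemplar dissimilarity
    (off-diagonal). Negative values mean the same exemplar is
    represented more similarly across trials than different exemplars."""
    rdm = np.asarray(rdm, dtype=float)
    n = rdm.shape[0]
    diag = np.mean(np.diag(rdm))
    off = np.mean(rdm[~np.eye(n, dtype=bool)])
    return diag - off

# Toy cross-trial dissimilarity matrix for 3 stimuli: same-stimulus
# pairs (diagonal) are less dissimilar than different-stimulus pairs
# (off-diagonal), so the EDI comes out negative.
rdm = np.array([[0.2, 0.8, 0.9],
                [0.7, 0.1, 0.8],
                [0.9, 0.7, 0.3]])
print(exemplar_discriminability_index(rdm) < 0)
```

Comparing this scalar between conditions (one EDI per participant per condition, then a paired test) is the logic behind the reported t(23) statistic.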

      We have further replicated results in V1 with a whole-brain searchlight analysis, averaging across both social and non-social conditions.

In summary, by using a combination of univariate and multivariate analyses, we could test whether neural activation differed when participants were presented with a face or fruit stimulus and whether such differences might confound the observed learning differences between conditions. We did not find meaningful neural differences that were not accounted for by the regressors included in the GLM. Further, we did not find differences in the visual distinctiveness of the two stimulus sets. Hence, these control analyses suggest that differences between social and non-social conditions are unlikely to arise from differences in low-level processes but instead develop when learning about social or non-social information.

Moreover, we also examined behaviourally whether participants differed in the way they approached the social and non-social conditions. We tested whether there were initial biases prior to learning, i.e. before participants actually received information from either social or non-social sources. Specifically, we tested whether participants had different prior expectations about the performance of social compared to non-social predictors by comparing the confidence judgments on the first trial of each predictor. We found that participants set confidence intervals very similarly in the social and non-social conditions (Figure below). Hence, differences between conditions did not seem to arise from low-level differences in the stimulus sets or from prior differences in expectations about the performance of social compared to non-social predictors. Rather, we show that differences between conditions become apparent when updating one’s beliefs about social advisors or non-social cues and, as a consequence, in the way that confidence judgments are set across time.

      Figure. Confidence interval for the first encounter of each predictor in social and non-social conditions. There was no initial bias in predicting the performance of social or non-social predictors.

      Main text page 13:

[…]
Additional control analyses show that neural differences between social and non-social conditions were not due to the visually different sets of stimuli used in the experiment but instead reflect fundamental differences in processing social compared to non-social information (Figure 4 – figure supplement 4). These results are shown in an ROI-based RSA analysis and in a whole-brain searchlight analysis. In summary, the univariate and multivariate analyses in conjunction demonstrate that dmPFC and pTPJ represent beliefs about social advisors that develop over a longer timescale and encode the identities of the social advisors.

      References

      1. Heyes, C. (2012). What’s social about social learning? Journal of Comparative Psychology 126, 193–202. 10.1037/a0025180.
      2. Chang, S.W.C., and Dal Monte, O. (2018). Shining Light on Social Learning Circuits. Trends in Cognitive Sciences 22, 673–675. 10.1016/j.tics.2018.05.002.
      3. Diaconescu, A.O., Mathys, C., Weber, L.A.E., Kasper, L., Mauer, J., and Stephan, K.E. (2017). Hierarchical prediction errors in midbrain and septum during social learning. Soc Cogn Affect Neurosci 12, 618–634. 10.1093/scan/nsw171.
      4. Frith, C., and Frith, U. (2010). Learning from Others: Introduction to the Special Review Series on Social Neuroscience. Neuron 65, 739–743. 10.1016/j.neuron.2010.03.015.
      5. Frith, C.D., and Frith, U. (2012). Mechanisms of Social Cognition. Annu. Rev. Psychol. 63, 287–313. 10.1146/annurev-psych-120710-100449.
      6. Grabenhorst, F., and Schultz, W. (2021). Functions of primate amygdala neurons in economic decisions and social decision simulation. Behavioural Brain Research 409, 113318. 10.1016/j.bbr.2021.113318.
      7. Lockwood, P.L., Apps, M.A.J., and Chang, S.W.C. (2020). Is There a ‘Social’ Brain? Implementations and Algorithms. Trends in Cognitive Sciences, S1364661320301686. 10.1016/j.tics.2020.06.011.
      8. Soutschek, A., Ruff, C.C., Strombach, T., Kalenscher, T., and Tobler, P.N. (2016). Brain stimulation reveals crucial role of overcoming self-centeredness in self-control. Sci. Adv. 2, e1600992. 10.1126/sciadv.1600992.
      9. Wittmann, M.K., Lockwood, P.L., and Rushworth, M.F.S. (2018). Neural Mechanisms of Social Cognition in Primates. Annu. Rev. Neurosci. 41, 99–118. 10.1146/annurev-neuro080317-061450.
      10. Shafto, P., Goodman, N.D., and Frank, M.C. (2012). Learning From Others: The Consequences of Psychological Reasoning for Human Learning. Perspect Psychol Sci 7, 341– 351. 10.1177/1745691612448481.
      11. McGuire, J.T., Nassar, M.R., Gold, J.I., and Kable, J.W. (2014). Functionally Dissociable Influences on Learning Rate in a Dynamic Environment. Neuron 84, 870–881. 10.1016/j.neuron.2014.10.013.
      12. Behrens, T.E.J., Woolrich, M.W., Walton, M.E., and Rushworth, M.F.S. (2007). Learning the value of information in an uncertain world. Nature Neuroscience 10, 1214– 1221. 10.1038/nn1954.
    1. Author Response:

      Reviewer #1:

      The manuscript “A computationally designed fluorescent biosensor for D-serine" by Vongsouthi et al. reports the engineering of a fluorescent biosensor for D-serine using the D-alanine-specific solute-binding protein from Salmonella enterica (DalS) as a template. The authors engineer a DalS construct that has the enhanced cyan fluorescent protein (ECFP) and the Venus fluorescent protein (Venus) as terminal fusions, which serve as donor and acceptor fluorophores in resonance energy transfer (FRET) experiments. The reporters should monitor a conformational change induced by solute binding through a change of the FRET signal. The authors combine homology-guided rational protein engineering, in-silico ligand docking and computationally guided, stabilizing mutagenesis to transform DalS into a D-serine-specific biosensor applying iterative mutagenesis experiments. Functionality and solute affinity of modified DalS is probed using FRET assays. Vongsouthi et al. assess the applicability of the finally generated D-serine selective biosensor (D-SerFS) in-situ and in-vivo using fluorescence microscopy.

      Ionotropic glutamate receptors are ligand-gated ion channels that are importantly involved in brain development, learning, memory and disease. D-serine is a co-agonist of ionotropic glutamate receptors of the NMDA subtype. The modulation of NMDA signalling in the central nervous system through D-serine is hardly understood. Optical biosensors that can detect D-serine are lacking and the development of such sensors, as proposed in the present study, is an important target in biomedical research.

The manuscript is well written and the data are clearly presented and discussed. The authors appear to have succeeded in the development of a D-serine-selective fluorescent biosensor, but some questions arose concerning the experimental design. Moreover, not all conclusions are fully supported by the data presented. I have the following comments.

1) In the homology-guided design two residues in the binding site were mutated to the ones of the D-serine specific homologue NR1 (i.e. F117L and A147S), which led to a significant increase of affinity to D-serine, as desired. The third residue, however, was mutated to glutamine (Y148Q) instead of the homologous valine (V), which resulted in a substantial loss of affinity to D-serine (Table 1). This "bad" mutation was carried through in consecutive optimization steps. Did the authors also try the homologous Y148V mutation? On page 5 the authors argue that Q instead of V would increase the size of the side chain pocket. But the opposite is true: the side chain of Q is more bulky than the one of V, which may explain the dramatic loss of affinity to D-serine. Mutation Y148V may be beneficial.

Yes, we have previously tested the mutation of position 148 to valine (V). We have now included these data in the paper as Supplementary Information Figure 1 (below). The fluorescence titration showed that the 148V variant displayed poor D-serine specificity compared to Q148 at the same position (the sequence background of the variant was F117L/A147S/D216E/A76D). Thus, Q was superior to V at this position and V was not taken forward for further engineering. In the text, we meant that Q would increase the size of the side chain pocket relative to the wild-type amino acid, Y. We can see that this is unclear and have updated this sentence.

Supplementary Figure 1. Dose-response curves for F117L/A147S/Y148V/D216E/A76D (LSVED) with glycine, D-alanine and D-serine. Values are the (475 nm/530 nm) fluorescence ratio as a percentage of the same ratio for the apo sensor. No significant change is detected in response to glycine. The KD values for D-alanine and D-serine are estimated to be > 4000 µM based on fitting curves with the following equation:
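The fitting equation referenced at the end of this caption did not survive extraction. For orientation only, titration data of this kind are commonly fit to a single-site binding isotherm to estimate KD; a minimal sketch with synthetic, illustrative values (not the study's data):

```python
import numpy as np
from scipy.optimize import curve_fit

def binding_isotherm(conc, r_apo, r_sat, kd):
    """Single-site binding: fluorescence ratio as a function of ligand concentration."""
    return r_apo + (r_sat - r_apo) * conc / (kd + conc)

# Synthetic titration (concentrations in uM); values are illustrative only.
conc = np.array([0, 1, 3, 10, 30, 100, 300, 1000], dtype=float)
true_kd = 7.0  # uM, loosely mimicking the reported D-serine affinity of D-serFS
ratio = binding_isotherm(conc, 100.0, 85.0, true_kd)

# Fit the isotherm to estimate KD (and the apo/saturated ratio endpoints).
popt, _ = curve_fit(binding_isotherm, conc, ratio, p0=(100.0, 80.0, 10.0))
r_apo_fit, r_sat_fit, kd_fit = popt
print(round(kd_fit, 2))
```

With real titrations, very weak binders (KD far above the highest tested concentration, as for LSVED here) yield only lower-bound estimates rather than well-constrained fits.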

      2) Stabilities of constructs were estimated from melting temperatures (Tm) measured using thermal denaturation probed using the FRET signal of ECFP/Venus fusions. I am not sure if this methodology is appropriate to determine thermal stabilities of DalS and mutants thereof. Thermal unfolding of the fluorescence labels ECFP and Venus and their intrinsic, supposedly strongly temperature-dependent fluorescence emission intensities will interfere. A deconvolution of signals will be difficult. It would be helpful to see raw data from these measurements. All stabilities are reported in terms of deltaTm. What is the absolute Tm of the reference protein DalS? How does the thermal stability of DalS compare to thermal stabilities of ECFP and Venus? A more reliable probe for thermal stability would be the far-UV circular dichroism (CD) spectroscopic signal of DalS without fusions. DalS is a largely helical domain and will show a strong CD signal.

We agree that raw data for the thermal denaturation experiments should be shown and have included these in the supporting information of the updated manuscript (Supplementary Data Figure 7). The data plot the ECFP/Venus fluorescence ratio against temperature. When the temperature is increased from 20 to 90 °C, we observe two transitions in the ECFP/Venus fluorescence ratio. The fluorescent proteins are more thermostable than the DalS binding protein, and their temperature transition does not vary (~90 °C); thus, the first transition corresponds to the unfolding of the binding protein and the second transition to the unfolding or loss of fluorescence from the fluorescent proteins. This is an appropriate method for characterising the thermostability of the binding protein in the sensor for two main reasons. Firstly, the calculated melting temperature from the first sigmoidal transition changes upon mutation of the binding protein in a predictable way (e.g. mutations to the binding site/protein core are destabilising), while the second transition occurs consistently at ~90 °C. This supports the assignment of the first transition to the unfolding of the binding protein. Secondly, characterising the stability of the binding protein in the context of the full sensor is more relevant to the end application. Excising the binding domain and testing it in isolation would yield data that are not directly relevant to the sensor. The absolute thermostabilities for all variants can be found in Table 1 of the manuscript.

      Supplementary Figure 7. The (475 nm/530 nm) fluorescence ratio as a function of increasing temperature (20 – 90 °C) for key variants in the engineering trajectory of D-serFS. Values are normalised as a percentage of the same ratio for the sensor at 20 °C and are represented as mean ± s.e.m. (n = 3). The first sigmoidal transition in the data changes upon mutation to the binding protein while the second transition begins at ~ 90 °C for all variants. The second transition is not observed in full as the upper temperature limit for the experiment is 90 °C.
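The two-transition melt curves described above can be analysed by fitting a sigmoid to the first (binding-protein) transition only. A sketch under the assumption of a simple logistic transition, with made-up temperatures and amplitudes:

```python
import numpy as np
from scipy.optimize import curve_fit

def melt_transition(temp, base, amp, tm, slope):
    """One sigmoidal unfolding transition; the signal drops around the midpoint Tm."""
    return base + amp / (1.0 + np.exp((temp - tm) / slope))

# Synthetic normalised (475/530 nm) ratio vs. temperature: a binding-protein
# transition near 55 C plus a fluorescent-protein transition near 90 C.
# All numbers are illustrative, not the study's measurements.
temp = np.arange(20.0, 91.0, 1.0)
signal = (melt_transition(temp, 20.0, 40.0, 55.0, 2.0)
          + melt_transition(temp, 0.0, 40.0, 90.0, 2.0))

# Fit only the first transition (data below ~75 C) to estimate the
# binding-protein melting temperature.
mask = temp < 75.0
popt, _ = curve_fit(melt_transition, temp[mask], signal[mask],
                    p0=(50.0, 50.0, 50.0, 3.0))
tm_fit = popt[2]
print(round(tm_fit, 1))
```

The same fit applied to a mutant's curve would shift `tm_fit` while leaving the ~90 °C transition unchanged, which is the behaviour the response describes.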

      3) The final construct D-SerFS has a dynamic range of only 7%, which is a low value. It seems that the FRET signal change caused by ligand binding to the construct is weak. Is it sufficient to reliably measure D-serine levels in-situ and in-vivo?

First, we have modified the sensor, which now has a dynamic range of 14.7% (Figure 5, below). The magnitude of the change is reasonable for this sensor class; these sensors function with relatively low dynamic ranges because they are ratiometric, i.e., they remain accurate even with a low dynamic range. For example, the Gly sensor GlyFS published in 2018 (Nature Chem. Biol.) has one of the highest dynamic ranges in this sensor class, at only ~28%. The Glu sensor described by Okumoto et al. (2005) (PNAS, 102, 8740) has a dynamic range of ~9%. So, the FRET change is not a low value for ratiometric sensors of this class (which have been used very effectively for over a decade). Most importantly, the data from experiments with biological tissue and in vivo (Fig. 6) demonstrate a detectable (and statistically significant) response to changes in D-serine concentration in tissue.

Figure 5. Characterization of full-length D-serFS. (A) Schematic showing the ECFP (blue), D-serFS binding protein (D-serFS BP; grey) and Venus (yellow) domains in D-serFS. The C-terminal residues of the Venus fluorescent protein sequence are labelled, showing the truncated (top) and full-length (bottom) C-terminal sequences. The underlined amino acids in truncated D-serFS represent residues introduced from the backbone vector sequence during cloning. The asterisk (*) represents the STOP codon. (B) Sigmoidal dose response curves for truncated and full-length D-serFS with D-serine (n = 3). Values are the (475 nm/530 nm) fluorescence ratio as a percentage of the same ratio for the apo sensor. (C) Binding affinities (µM) determined by fluorescence titration of truncated and full-length D-serFS, for glycine, D-alanine and D-serine (n = 3).

      In Figure 5H in-vivo signal changes show large errors and the signal of the positive sample is hardly above error compared to the signal of the control.

      We have removed the in vivo data. Regardless, the comment is incorrect. Statistical analysis confirms that there is no significant change in the control (P = 0.08411), whereas the change for the sample with D-serine was significant to P = 0.00998.

“H) ECFP/Venus ratio recorded in vivo in control recordings (left panel, baseline recording first, control recording after 10 minutes; paired two-sided Student’s t-test vs. baseline, t(6) = -2.07, P = 0.08411; n = 6 independent experiments) and during D-serine application (right panel, baseline recording first, second recording after D-serine injection, 1 mM; paired two-sided Student’s t-test vs. baseline, t(3) = -5.85, P = 0.00998; n = 4 independent experiments). Values are mean ± s.e.m. throughout. **P < 0.01.”
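The statistics quoted in this legend are paired two-sided Student's t-tests on baseline versus post-application ratios from the same preparations. A sketch of that comparison with hypothetical ratio values (the study's raw data are not reproduced here):

```python
import numpy as np
from scipy import stats

# Hypothetical baseline vs. post-application ECFP/Venus ratios from the same
# preparations (paired design); illustrative values, not the study's data.
baseline = np.array([1.02, 0.98, 1.05, 1.00, 0.97, 1.03])
after = np.array([0.95, 0.93, 0.99, 0.94, 0.96, 0.97])

# Paired two-sided Student's t-test, as in the figure legend.
t_stat, p_value = stats.ttest_rel(baseline, after)
print(f"t({len(baseline) - 1}) = {t_stat:.2f}, P = {p_value:.4f}")
```

A paired test compares each preparation to its own baseline, which is what makes a small ratiometric change detectable despite variability between preparations.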

      Figure 5G is unclear. What does the fluorescence image show?

      We have removed the in-vivo data from the manuscript. However, Figure 6 in the original manuscript shows a schematic of how the sensor is applied to the brain for in-vivo experiments (biotin injection, followed by sensor injection and then imaging). The fluorescence image shows the detected Venus fluorescence following pressure loading of the sensor into the brain.

      Work presented in this manuscript that assesses functionality and applicability of the developed sensor in-situ and in-vivo is limited compared to the work showing its design. For example, control experiments showing FRET signal changes of the wild-type ECFP-DalS-Venus construct in comparison to the designed D-SerFS would be helpful to assess the outcome.

Indeed, the in situ and in vivo work was never the focus of the study, which is already a large paper. To avoid confusion, the in vivo work is now omitted and the in situ work is presented as proof of principle that the sensor can be used to image D-serine. We reiterate – this is a protein engineering paper, not a neuroscience paper.

      4) The FRET spectra shown in Supplementary Figure 2, which exemplify the measurement of fluorescence ratios of ECFP/Venus, are confusing. I cannot see a significant change of FRET upon application of ligand. The ratios of the peak fluorescence intensities of ECFP and Venus (scanned from the data shown in Supplementary Figure 2) are the same for apo states and the ligand-saturated states. Instead what happens is that fluorescence emission intensities of both the donor and the acceptor bands are reduced upon application of ligand.

      We thank the reviewer for bringing this to our attention. The spectra were not normalised to account for the effect of dilution when saturating with ligand, giving rise to an observed decrease in emission intensity from both ECFP and Venus. We can also see how the figure is hard to interpret when both variants are displayed on the same axes, so we have separated them in an updated figure shown below and normalised the data as a percentage of the maximum emission intensity from ECFP at 475 nm. This has been changed in the supporting information of an updated manuscript. Hopefully it is now clear that there is a ratiometric change upon addition of ligand.

Figure 3. Emission spectra (450 – 550 nm) of (A) LSQED and (B) LSQED-T197Y (LSQEDY) upon excitation of ECFP (λexc = 433 nm), normalised to the maximum emission intensity from ECFP (475 nm). For all sensor variants, the FRET efficiency decreases in response to saturation with D-serine (A, B; orange), leading to decreased emission from Venus (530 nm) relative to ECFP (475 nm). When comparing the apo states of LSQED and LSQEDY (A, B; dark green), it can be seen that the T197Y mutation results in a decreased Venus emission (lower FRET efficiency). This suggests a shift in the apo population of the sensor towards the spectral properties of the saturated, closed state and explains the decreased dynamic range of LSQEDY compared to LSQED. Values are mean ± s.e.m (n = 3).

      Reviewer #2:

The authors describe the development and use of a D-serine sensor based on a periplasmic ligand binding protein (DalS) from Salmonella enterica in conjunction with a FRET readout between enhanced cyan fluorescent protein and Venus fluorescent protein. They rationally identify point mutations in the binding pocket that make the binding protein somewhat more selective for D-serine over glycine and D-alanine. Ligand docking into the binding site, as well as algorithms for increasing the stability, identified further mutants with higher thermostability and higher affinity for D-serine. The combined computational efforts lead to a sensor for D-serine with higher affinity for D-serine (Kd = ~7 µM), but also showed affinity for the native D-alanine (Kd = ~13 µM) and glycine (Kd = ~40 µM). Molecular simulations were then used to explain how remote mutations identified in the thermostability screen could lead to the observed alteration of ligand affinity. Finally, the D-SerFS was tested in 2P-imaging in hippocampal slices and in anesthetized mice using biotin-streptavidin to anchor the exogenously applied purified protein sensor to the brain tissue and pipetting on saturating concentrations of D-serine ligand.

      Although presented as the development of a sensor for biology, this work primarily focuses on the application of existing protein engineering techniques to alter the ligand affinity and specificity of a ligand-binding protein domain. The authors are somewhat successful in improving specificity for the desired ligand, but much context is lacking. For any such engineering effort, the end goals should be laid out as explicitly as possible. What sorts of biological signals do they desire to measure? On what length scale? On what time scale? What is known about the concentrations of the analyte and potential competing factors in the tissue? Since the authors do not demonstrate the imaging of any physiological signals with their sensor and do not discuss in detail the nature of the signals they aim to see, the reader is unable to evaluate what effect (if any) all of their protein engineering work had on their progress toward the goal of imaging D-serine signals in tissue.

      As a paper describing a combination of protein engineering approaches to alter the ligand affinity and specificity of one protein, it is a relatively complete work. In its current form trying to present a new fluorescent biosensor for imaging biology it is strongly lacking. I would suggest the authors rework the story to exclusively focus on the protein engineering or continue to work on the sensor/imaging/etc until they are able to use it to image some biology.

      Additional Major Points:

      1) There is no discussion of why the authors chose to use non-specific chemical labeling of the tissue with NHS-biotin to anchor their sensor vs. genetic techniques to get cell-type specific expression and localization. There is no high-resolution imaging demonstrating that the sensor is localized where they intended.

      We use non-specific chemical labelling for proof-of-concept experiments that show the sensor can respond to changes in D-serine concentration in the extracellular environment of brain tissue. Cell-type specific expression of the sensor is possible based on our previous development of a similar sensor for glycine (Zhang et al., 2018; doi: https://doi.org/10.1038/s41589-018-0108-2) where the sensor was expressed by HEK293 cells and neurons, and targeted to the membrane. However, this is beyond the scope of this manuscript. Figure 5G of the original manuscript shows that the sensor (identified by Venus fluorescence) is localized to the area where D-serFS is pressure-loaded into the brain.

2) Why does the fluorescence of both the CFP and the YFP decrease upon addition of ligand (see e.g. Supplementary Figure 2)? Were these samples at the same concentration? Is this really a FRET sensor or more of an intensiometric sensor? Is this also true with 2P excitation? How does the Venus fluorescence change when Venus is excited directly? Perhaps fluorescence lifetime measurements could help inform what is happening.

      Please see response to major comments from reviewer #1 and Figure 3. We hope this clarifies that the sensor is ratiometric. The sensor behaves similarly under two-photon excitation (2PE) as shown in Figure 5A.

      3) How reproducible are the spectral differences between LSQED and LSQED-T197Y? Only one trace for each is shown in Supplementary Figure 2 and the differences are very small, but the authors use these data to draw conclusions about the protein open-closed equilibrium.

      We have updated this to show data points representing the mean ± s.e.m (n = 3).

      4) The first three mutations described are arrived upon by aligning DalS (which is more specific for D-Ala) with the NMDA receptor (which binds D-Ser). The authors then mutate two of the ligand pocket positions of DalS to the same amino acid found in NMDAR, but mutate the third position to glutamine instead of valine. I really can't understand why they don't even test Y148V if their goal is a sensor that hopefully detects D-Ser similar to the native NMDAR. I'm sure most readers will have the same confusion.

Please see response to major comments from reviewer #1. Additionally, while the NR1 binding domain of the NMDAR was used as a structural guide for rational design of the DalS binding site, the high affinity of the NMDAR for both D-serine and glycine was not desirable in a D-serine-specific sensor.

    1. Author response:

      Reviewer #1 (Public Review):

      This is an important and very well conducted study providing novel evidence on the role of zinc homeostasis for the control of infection with the intracellular bacterium S. typhimurium also disentangling the underlying mechanisms and providing clear evidence on the importance of spatio-temporal distribution of (free) zinc within the cell.

      We thank the reviewer for the positive comments.

      1) It would be important to provide more information on the genotype of mice.

As suggested by the reviewer, we have added the detailed genotypes of Slc30a1flagEGFP/+ and Slc30a1fl/flLysMCre mice to the revised Figure supplement 10.

      2) It is rather unlikely that C57Bl6 mice survive up to two weeks after i.p. injection of 1x10E5 bacteria.

According to the reviewer's comment, we have tested the survival rate in a group of our experimental animals and in C57BL/6 wild-type mice.

The Salmonella strain was a gift from our friend, Professor Ge Bao-xue. We sent this strain for genetic characterisation and found 100% identity to Salmonella enterica Typhimurium, with many matching strains originating from poultry. One of them is Salmonella enterica subsp. enterica serovar Typhimurium strain MeganVac1 (Accession: CP112994.1), a live attenuated strain. We hope that this supports the relationship between the high infectious dose and mouse survival.

      Author response image 1.

(A) Survival rate of Slc30a1fl/fl and Slc30a1fl/flLysMCre mice (n = 14-15/group) and (B) survival rate of C57BL/6 wild type (n = 8) after Salmonella infection for two weeks. (C) A full-length sequence (1,478 bases) of the 16S rDNA gene of the Salmonella strain and (D) the sequencing electropherogram.

3) To be sure that macrophages of Slc30a1fl/flLysMCre mice really have impaired clearance of bacteria, it would be important to rule out an effect of Slc30a1 deletion on bacterial phagocytosis and containment (e.g., evaluation of bacterial numbers after 30 min of infection).

      As the reviewer advised, we have repeated the experiment and measured the bacterial numbers after 30 min of infection (dashed line in A). The results show that there is no statistical difference in the bacterial numbers after 30 min between Slc30a1fl/flLysMCre and Slc30a1fl/fl BMDMs. Therefore, the reduction of bacterial numbers after 24 hours occurs due to the impairment of intracellular pathogen-killing capacity as the reviewer pointed out.

Author response image 2.

(A) Time course of the intracellular pathogen-killing capacity of Salmonella-infected Slc30a1fl/flLysMCre and Slc30a1fl/fl BMDMs measured in colony-forming units per mL (n = 5). (B) Fold change in Salmonella survival (CFU/mL) at different time points from A. (C) Representative images of Salmonella colonies on solid agar medium at 24 hours. Data are represented as mean ± SEM. P values were determined using a 2-tailed unpaired Student’s t-test. *P<0.05, **P<0.01, and ns, not significant.
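The CFU/mL values in killing assays like this are back-calculated from plate counts, the dilution factor, and the plated volume; the fold change in survival is then the ratio between time points. A minimal sketch with hypothetical counts (not the study's data):

```python
def cfu_per_ml(colonies, dilution_factor, plated_volume_ml):
    """Back-calculate viable bacteria per mL from a single plate count."""
    return colonies * dilution_factor / plated_volume_ml

# Hypothetical counts: 0.1 mL of a 10^-4 dilution plated at each time point.
cfu_30min = cfu_per_ml(150, 10**4, 0.1)  # intracellular load at 30 min
cfu_24h = cfu_per_ml(30, 10**4, 0.1)     # load after 24 h of killing

# Fold change in survival between time points (as plotted in panel B).
fold_change = cfu_24h / cfu_30min
print(cfu_30min, cfu_24h, fold_change)  # 15000000.0 3000000.0 0.2
```

Comparing loads at 30 min (before killing) against 24 h, as the response does, separates uptake from intracellular killing capacity.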

      4) Does the addition of zinc to macrophages negatively affect iNOS transcription as previously observed for the divalent metal iron and is a similar mechanism also employed (CEBPß/NF-IL6 modulation) (Dlaska M et al. J Immunol 1999)?

The reviewer has raised an important point here, since free zinc also plays a role at multiple levels of cellular signaling (Kambe et al., 2015). Dlaska and colleagues reported that NF-IL6, a protein responsible for iNOS transcription, is negatively regulated by iron perturbation under IFN-γ/LPS stimulation in macrophages (Dlaska and Weiss, 1999). As the reviewer suggested, our results show that zinc supplementation decreases iNOS expression in macrophages after Salmonella infection, suggesting that free zinc might play a role in iNOS regulation.

However, in Slc30a1fl/flLysMCre macrophages, despite the increase in intracellular free zinc, the lack of Slc30a1 also induces Mt1, a zinc reservoir that might negatively affect NO production (Schwarz et al., 1995) or alternatively inhibit iNOS through the NF-κB pathway (Cong et al., 2016), as reported by previous studies. Therefore, we cannot rule out the possibility that the defect in Salmonella clearance due to iNOS/NO inhibition is caused by a complex combination of excess free zinc and overexpression of the zinc reservoir. To test this hypothesis, further studies using a specific model, for example Mtfl/fliNOSfl/flLysMCre mice, may be needed to dissect the precise mechanism.

      Author response image 3.

RT-qPCR analysis of mRNA encoding Nos2 in BMDMs after infection with Salmonella or Salmonella plus ZnSO4 (20 μM) for 4 h.

      Reference:

      Dlaska M, Weiss G. 1999. Central role of transcription factor NF-IL6 for cytokine and ironmediated regulation of murine inducible nitric oxide synthase expression. The Journal of Immunology. 162:6171-6177, PMID: 10229861

Kambe T, Tsuji T, Hashimoto A, Itsumura N. 2015. The physiological, biochemical, and molecular roles of zinc transporters in zinc homeostasis and metabolism. Physiological Reviews. 95:749-784. https://doi.org/10.1152/physrev.00035.2014, PMID: 26084690

Schwarz MA, Lazo JS, Yalowich JC, Allen WP, Whitmore M, Bergonia HA, Tzeng E, Billiar TR, Robbins PD, Lancaster JR Jr, et al. 1995. Metallothionein protects against the cytotoxic and DNA-damaging effects of nitric oxide. Proceedings of the National Academy of Sciences of the United States of America. 92:4452-4456. https://doi.org/10.1073/pnas.92.10.4452, PMID: 7538671

Cong W, Niu C, Lv L, Ni M, Ruan D, Chi L, Wang Y, Yu Q, Zhan K, Xuan Y, Wang Y, Tan Y, Wei T, Cai L, Jin L. 2016. Metallothionein prevents age-associated cardiomyopathy via inhibiting NF-κB pathway activation and associated nitrative damage to 2-OGD. Antioxidants & Redox Signaling. 25:936-952. https://doi.org/10.1089/ars.2016.6648, PMID: 27477335

      5) How does Zinc or TPEN supplementation to bacteria in LB medium affect the log growth of Salmonella?

We found that zinc supplementation at both low (20 µM) and high (640 µM) concentrations negatively affects Salmonella growth, especially during the log and stationary phases in broth culture medium, whereas TPEN (20 µM) supplementation does not. This indicates that high-zinc conditions, such as those occurring at the cellular level within phagosomes (Botella et al., 2011), can limit bacterial growth.

      Author response image 4.

Growth curve (optical density, OD 600 nm) of Salmonella in LB medium at different concentrations of ZnSO4 and/or TPEN. Bar graphs indicate Salmonella growth at specific time points. Each value is expressed as the mean of triplicates, and data were analysed using a 2-tailed unpaired Student’s t-test. *P<0.05, **P<0.01, ***P<0.001 and ns, not significant.

      Reference:

Botella H, Peyron P, Levillain F, Poincloux R, Poquet Y, Brandli I, Wang C, Tailleux L, Tilleul S, Charrière GM, Waddell SJ, Foti M, Lugo-Villarino G, Gao Q, Maridonneau-Parini I, Butcher PD, Castagnoli PR, Gicquel B, de Chastellier C, Neyrolles O. 2011. Mycobacterial P(1)-type ATPases mediate resistance to zinc poisoning in human macrophages. Cell Host Microbe. 10:248-259. https://doi.org/10.1016/j.chom.2011.08.006, PMID: 21925112
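OD600 time courses like those above are often summarised by fitting a logistic growth model, from which carrying capacity and growth rate can be compared between zinc/TPEN conditions. A sketch with synthetic readings (illustrative parameters, not the study's data):

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, od_max, rate, t_mid):
    """Logistic growth: OD600 rises towards a carrying capacity od_max."""
    return od_max / (1.0 + np.exp(-rate * (t - t_mid)))

# Synthetic OD600 readings over 12 h; parameters are illustrative only.
t = np.arange(0.0, 13.0, 1.0)
od = logistic(t, 1.2, 0.8, 5.0)

# Fit the model; od_max and rate can then be compared between conditions
# (e.g. untreated vs. ZnSO4-supplemented cultures).
popt, _ = curve_fit(logistic, t, od, p0=(1.0, 0.5, 6.0))
od_max_fit, rate_fit, t_mid_fit = popt
print(round(od_max_fit, 2), round(rate_fit, 2), round(t_mid_fit, 2))
```

A lower fitted `od_max` or `rate` under zinc supplementation would quantify the growth inhibition described in the response.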

      Reviewer #2 (Public Review):

This paper explores the importance of zinc metabolism in host defense against the intracellular pathogen Salmonella Typhimurium. Using conditional mice with a deletion of the Slc30a1 zinc exporter, the authors show a critical role for zinc homeostasis in the pathogenesis of Salmonella. Specifically, mice deficient in the Slc30a1 gene in LysM+ myeloid cells are hypersusceptible to Salmonella infection, and their macrophages show altered phenotypes in response to Salmonella. The study adds important new information on the role metal homeostasis plays in microbe-host interactions. Despite the strengths, the manuscript has some weaknesses. The authors conclude that lack of slc30a1 in macrophages impairs nos2-dependent anti-Salmonella activity. However, this idea is not tested experimentally. In addition, the research presented on Mt1 is preliminary. The text related to Figure 7 could be deleted without affecting the overall impact of the findings.

      We thank the reviewer for his/her positive comments and constructive suggestions.

      Reviewer #3 (Public Review):

Na-Phatthalung et al observed that transcripts of the zinc transporter Slc30a1 were upregulated in Salmonella-infected murine macrophages and in human primary macrophages; they therefore sought to determine if, and how, Slc30a1 could contribute to the control of bacterial pathogens. Using a reporter mouse, the authors show that Slc30a1 expression increases in a subset of peritoneal and splenic macrophages of Salmonella-infected animals. Specific deletion of Slc30a1 in LysM+ cells resulted in a significantly higher susceptibility of mice to Salmonella infection which, counter to the authors' conclusions, is not explained by the small differences in the bacterial burden observed in vivo and in vitro. Although loss of Slc30a1 resulted in reduced iNOS levels in activated macrophages, the study lacks experiments that mechanistically link loss of NO-mediated bactericidal activity to Salmonella survival in Slc30a1-deficient cells. The additional deletion of Mt1, another zinc-binding protein, resulted in even lower nitrite levels in activated macrophages but only modest effects on Salmonella survival. By combining genetic approaches with molecular techniques that measure variables in macrophage activation and the labile zinc pool, Na-Phatthalung et al successfully demonstrate that Slc30a1 and metallothionein 1 regulate zinc homeostasis in order to modulate effective immune responses to Salmonella infection. The authors have done a lot of work and the information that Slc30a1 expression in macrophages contributes to control of Salmonella infection in mice is a new finding that will be of interest to the field. Whether the mechanism by which SLC30A1 controls bacterial replication and/or lethality of infection involves nitric oxide production by macrophages remains to be shown.

      We very much appreciate the reviewer’s detailed evaluation and suggestions. The manuscript has been revised thoroughly according to the reviewer’s advice.

    1. Author Response

      Reviewer #1 (Public Review):

      This work focuses on the mechanisms that underlie a previous observation by the authors that the type VI secretion system (T6SS) of a Pseudomonas chlororaphis (Pchl) strain can induce sporulation in Bacillus subtilis (Bsub). The authors bioinformatically characterize the T6SS system in Pchl and identify all the core components of the T6SS, as well as 8 putative effectors and their domain structures. They then show that the Pchl T6SS, and in particular its effector Tse1, is necessary to induce sporulation in Bsub. They demonstrate that Tse1 has peptidoglycan hydrolase activity and causes cell wall and cell membrane defects in Bsub. Finally, the authors also study the signaling pathway in Bsub that leads to the induction of sporulation, and their data suggest that cell wall damage may lead to the degradation of the anti-sigma factor RsiW, leading to activation of the extracellular sigma factor σW that causes increased levels of ppGpp. Sensing of high ppGpp levels by the kinases KinA and KinB may lead to phosphorylation of Spo0F, and induction of the sporulation cascade.

      The findings add to the field's understanding of how competitive bacterial interactions work mechanistically and provide a detailed example of how bacteria may antagonize their neighbors, how this antagonism may be sensed, and the resulting defensive measures initiated.

      While several of the conclusions of this paper are supported by the data, additional controls would bolster some aspects of the data, and some of the final interpretations are not substantiated by the current data.

      • The Bsub signaling pathway that is proposed is intricate and extensive as shown in Fig 5A. However, the data supporting that is very sparse:

      a) The authors show no data showing that the proteases PrsW and/or RasP, or the extracellular sigma factor σW are necessary, or that the cleavage of RsiW is needed, for induction of sporulation - this could presumably be tested using mutants of those genes.

It has been previously demonstrated that the proteases PrsW and RasP cleave RsiW under certain conditions, such as alkaline shock (Heinrich et al., 2009). First, PrsW cleaves RsiW, and the resulting cleaved RsiW serves as a substrate for RasP. In the previous version of the manuscript, we demonstrated that treatment with Tse1 causes damage to PG and delocalization of RsiW; however, as the reviewer notes, we did not show the participation of either of these proteases in the proposed signaling pathway. We have now generated single mutants in rsiW and prsW and treated them with Tse1. We observed no variation in the levels of sporulation compared to untreated strains (Figure 1), a finding consistent with their proposed involvement in the sporulation signaling pathway activated by Tse1. Positive controls, that is, the single mutants grown at 37ºC, were still able to sporulate. These data have been added to Figure 6B in the new version of the manuscript.

      As suggested by other reviewers, we have generated a sister plot of this figure showing the raw CFUs in each case. These data are included in Supplementary file 3. This experiment and the related figure have been incorporated into the new version of the manuscript.

      Figure 1. A) Quantification of the percentage of sporulated Bsub, rsiW and prsW cells after treatment with purified Tse1, showing that rsiW and prsW single mutants are blind to the presence of Tse1. B) Cell density (CFUs/mL) of the total (blue bars) and sporulated (brown bars) populations of different Bacillus strains (Bsub, ∆rsiW and ∆prsW), untreated and treated with Tse1. Sporulation at 37ºC is shown as a positive control for each strain. Statistical significance was assessed via t-tests. *p value < 0.1, **p value < 0.001, ***p value < 0.0001.

      b) Similarly, they don't demonstrate that the levels of ppGpp increase in the cell upon exposure to Pchl.

      We have not been able to measure the levels of ppGpp; however, given that in the same proposed sporulation cascade the levels of different nucleotides are altered (Kriel et al., 2013, Tojo et al., 2013, López and Kolter, 2010), we have alternatively analyzed the levels of ATP using an ATP Determination Kit (Thermo, A22066). We found that ATP levels increased 3-fold in Bsub cells treated with Tse1 compared to untreated control cells. Consistently, no increase in ATP levels was observed in rsiW or prsW mutants treated with Tse1. We have incorporated all the raw luminescence data obtained for each sample and treatment in Figure 6-source data 1. This experiment, the figures (Figure 6A in the new version of the manuscript) and the description in “Materials and Methods” have been added to the new version of the manuscript.

      c) There is some data showing that kinA and kinB mutants don't induce sporulation (Fig supplement 7A), but that is lacking the 'no attacker' control that would demonstrate an induction.

      We have included in the new version of the manuscript the ‘no attacker’ control sporulation (%). The figure shows that the presence of Pchl strains induces the sporulation of all kinase mutants. This new data has been incorporated in Figure 6-figure supplement 1A in the new version of the manuscript.

      d) There is some data showing that RsiW may be cleaved (Fig 5C, D), but that data would benefit from a positive control showing that the lack of YFP foci is seen in a condition where RsiW is known to be cleaved, as well as from a time-course showing that the foci are present prior to the addition of Tse1, and then disappear. As it is shown now, it is possible that the addition of Tse1 just blocks the production of RsiW or its insertion into the membrane (especially given the membrane damage seen). Further, there is no data that the disappearance of the YFP loci requires the proteases PrsW and /or RasP - such data would also support the idea that the disappearance is due to cleavage of RsiW.

      Thank you for your useful suggestion. It is important to consider that our transcriptomics analysis showed no repression of the expression of the genes encoding either of the two proteases in cells treated with Tse1. However, we agree that additional experiments would enhance the significance of our findings. We have repeated the whole experiment, including a positive control to demonstrate that YFP foci disappear under a condition in which RsiW is known to be degraded by PrsW and RasP. Bacillus cells were incubated in medium at pH 10, which provokes an alkaline shock that triggers RsiW cleavage (Asai, 2017; Heinrich et al., 2009). As shown in Figure 6D, under this condition we also observed disappearance of the YFP foci. We have also provided extra images with quantification of the average signal from YFP foci in Figure 6-figure supplement 2.

      • The entire manuscript suggests that T6SS is solely responsible for the induction of sporulation. While T6SS does appear to play a major part in explaining the sporulation induction seen, in the absence of 'no attacker' controls for Fig. 2A, it is impossible to see this. From the data shown in Fig. 2C, and figure supplement 2A, the 'no attacker' sporulation rate seems to be ~20%, while the rate is ~40% with Pchl strains lacking T6SS, suggesting that an additional factor may be playing a role.

      This must be a misunderstanding of the message of this manuscript. The conceptual foundation of this study was settled in our previous manuscript (Molina-Santiago et al., 2019). We demonstrated that B. subtilis sporulated in the presence of P. chlororaphis. Interestingly, the overgrowth of P. chlororaphis over the B. subtilis colony did not eliminate cells of B. subtilis, given that most of them were sporulated. The data we obtained strongly suggested that a functional T6SS was involved in the cellular response of Bacillus upon close cell-to-cell contact. In this new manuscript, we have explored this idea and found that, indeed, the T6SS of P. chlororaphis mobilizes at least one effector, Tse1, which is able to trigger sporulation in Bacillus. Thus, we did not conclude in that study, nor do we in this new one, that the T6SS is the only factor expressed by P. chlororaphis responsible for sporulation activation in Bacillus. We have accordingly rephrased some sentences of the manuscript to clarify the proposed implication of the T6SS in B. subtilis sporulation.

      In addition, as mentioned above, we have included data of sporulation percentages in the absence of an attacker to better compare the induction of sporulation observed in the presence of the different Pchl strains and in the presence of Tse1.

      Reviewer #2 (Public Review):

      In a previous study, the authors showed that cell-cell contact with Pseudomonas chlororaphis induces sporulation in Bacillus subtilis. Here, the authors build on this finding and elucidate the mechanism behind this observation. They describe the enzymatic activity of a protein (Tse1) secreted by the type VI secretion system (T6SS) of P. chlororaphis (Pch), which partially degrades the peptidoglycan (PG) of targeted B. subtilis cells and triggers a signal cascade culminating in sporulation.

      Most of the key conclusions of this paper (Tse1 being secreted by the T6SS and inducing sporulation in targeted cells) are well supported by the data. One conclusion (sporulation response being an anti-T6SS "defense" strategy) is not well supported by the data and should be removed or rephrased.

      The authors elucidate the enzymatic activity of Tse1, a T6SS effector protein, in a genus (Pseudomonas) of great interest to microbiologists, and to researchers studying the T6SS specifically. They also carefully dissect the cellular response (signal cascade and sporulation) of an important model organism (B. subtilis; Bsub) specifically to exposure to Tse1. The results describing this cellular response contribute substantially to our understanding of how T6SS effector proteins interact with cells of Gram-positive species.

      My only major concerns regard the interpretation of these results as sporulation being an adaptive and/or specific response to attacks by the T6SS. I outline my reasoning below.

      • Interpretation of sporulation as a "defense" mechanism/strategy against the T6SS. In order for a phenotype X to be regarded as a "defense against Y" mechanism, it has to be shown that phenotype X (sporulation in response to Tse1) evolved - at least in part - for the purposes of increasing survival in the presence of Y (T6SS attacker). There are no experiments in this study comparing e.g. a sporulating Bsub with a non-sporulating Bsub, that would allow testing if sporulation increases survival. The experiments carefully describe the cellular response to Tse1, but no inference can be made with regards to this being adaptive for Bsub, or if it helps the cells survive against T6SS attacks, etc. A more parsimonious explanation would be that Tse1 happens to target the PG and causes envelope stress, triggering sporulation. So, it would be a general stress response that also happens to be triggered by T6SS. Now, some general (cell envelope) stress responses are known to be very effective at protecting against the T6SS. But in those instances, a beneficial effect for survival in the face of T6SS attacks has been shown in dedicated experiments. Purely observing a response to a T6SS effector, as this study does (very well), is not evidence that the response has evolved for the purpose of surviving T6SS attacks. Tucked away in the supplement (and briefly mentioned in the main text) is data on Bsub and Bacillus cereus, showing that i) cell densities of the sporulating Bsub and a sporulating B. cereus strain are not affected by an active T6SS, and ii) cell densities of an asporogenic B. cereus are slightly reduced by an active T6SS. However, the effect sizes of density reduction by the T6SS in the asporogenic B. cereus are minute (20x10^6 vs. ~50x10^6). In typical killing assays against e.g. gram-negative strains, a typical effect size for T6SS killing would be a several order of magnitude reduction in survival of the target strain when exposed to a T6SS attacker. 
Based on this dataset alone (Figure Suppl. 8), I would say that all three Bacillus strains are not experiencing any "fitness-relevant" killing by the T6SS, which is in line with the T6SS often being useless against gram-positives when it comes to killing. Hence, no claims about fitness benefits of sporulation in response to a T6SS attack, or this being a "defense mechanism/strategy" should be made in the manuscript.

      Thanks for these interesting introductory and specific comments. We agree with the reviewer and have rephrased some sentences of the manuscript. Sporulation is not an adaptive or specific response of Bacillus to the T6SS; indeed, as stated by reviewer 2, sporulation is a general stress response. The way the manuscript was written may, at some points, have given the wrong impression, and in consequence we have rephrased some sentences. Nevertheless, in Figure supplement 8 (Figure 6-figure supplement 3 in the new version of the manuscript) we made a mistake during generation of the figure. We have repeated this experiment and generated a new, corrected chart that shows a three-order-of-magnitude reduction in survival of the asporogenic B. cereus strain in competition with the Pchl WT strain compared to the Pchl mutant strains. These new findings show that the absence of sporulation ability leads to a severe reduction in survival of the Bacillus cereus DSM 2302 population in competition with Pchl carrying an active T6SS, compared to survival in competition with the Pchl hcp mutant. The figure also shows that the Bacillus population decreased in competition with the tse1 mutant, demonstrating that Tse1 is responsible for killing Bacillus. However, there is a statistical difference in the survival of Bacillus competing with the hcp or tse1 mutants. The increased survival of Bacillus in the interaction with the tse1 strain compared to the Bacillus-hcp competition is suggestive of the ability of this strain to deliver additional T6SS-dependent toxins. This observation is in accordance with the data presented in Fig. 2B, which indicated that the tse1 mutant has an active T6SS able to kill E. coli.

      • Data supporting baseline "no competitor" sporulation rates being no different from those triggered by T6SS mutants is not convincing. For the data shown in Fig. 2A, a key comparison here would be to show baseline Bsub sporulation rates in absence of a competitor. This measurement is shown in Fig supplement 2A, and the value shown there (roughly 22% on average) appears to be much lower than the average T6SS mutant shown in Fig. 2A. The main text states that sporulation rates induced by the different T6SS mutants are "statistically" similar to the no-competitor baseline (L206/207). I am not convinced by this, since i) overall sporulation rates (incl of WT Pch) appear to have been lower in the experiment shown in supplement 2A, so a direct comparison between the no-competitor baseline and the data shown in Fig. 2A is not possible; and ii) hcp and tse1 mutants were tested in different experiments throughout the study, and sporulation rates appear to consistently hover around 30-40%, which is higher than the roughly 22% for "no competitor" depicted in Supplement Fig2A. I am focussing on this, because for the interpretation of the results, and the main narrative of the paper, knowing if "simply interacting with a T6SS-negative P. chlororaphis" induces some sporulation would make a big difference. One sentence in the discussion adds to my confusion about this: L464/465, "... a strain lacking paar (Δpaar) had an active T6SS that triggered sporulation comparably to Δhcp, ΔtssA, and Δtse1 strains", suggesting that the authors themselves claim that even strains lacking an active T6SS trigger increased sporulation (which I would agree with, based on the data).

      We understand the reviewer's comment that a direct comparison between the two figures is not correct due to fluctuations of the baseline sporulation rates between experiments. To solve this issue, we have added the baseline "no competitor" sporulation percentages in the experiments represented in Figure 2B in the new version of the manuscript.

      Regarding the sporulation provoked by a T6SS-negative P. chlororaphis, the reviewer is right. Bacillus sporulation occurs in response to many external factors (abiotic and biotic stresses), so the presence of P. chlororaphis in the competition already has an effect on the sporulation percentage of B. subtilis. Accordingly, we have removed the statement that the sporulation rates induced by the different T6SS mutants are "statistically" similar to the no-competitor baseline. However, our previous data (Molina-Santiago, Nat Comm 2019) and current findings convincingly demonstrate the relevance of the T6SS and, specifically, the Tse1 toxin in the induction of sporulation, at least upon close cell-to-cell contact.

      • Claim regarding "bacteriolytic activity" when tse1 is heterologously expressed in E. coli. The data supporting this claim (Fig2-supplement 2C) only shows a lower net population growth rate after induction of tse1 (truncated vs. non-truncated) expression. This could be caused by: slower growth (but no death), equal growth (with some death), or a combination of the two. The claim of "bacteriolytic" activity in E. coli is therefore not supported by this dataset.

      We agree with the reviewer and we have decided to remove this figure and the experiment of “bacteriolytic activity” given that it does not contribute conceptually to the message of the manuscript.

      I cannot comment in more detail on the validity of the biochemistry/enzymatic activity assays as these are not my area of expertise.

      Reviewer #3 (Public Review):

      The authors identify tse1, a gene located in the type 6 secretion system (T6SS) locus of the bacterium Pseudomonas chlororaphis, as necessary and sufficient for induction of Bacillus subtilis sporulation. The authors demonstrate that Tse1 is a hydrolase that targets peptidoglycan in the bacterial cell wall, triggering activation of the regulatory sigma factor sigma-w. The sporulation-inducing effects of sigma-w are dependent on the downstream presence of the sensor histidine kinases KinA and KinB. Overall, this is a well-structured paper that uses a combination of methods including bacterial genetics, HPLC, microscopy, and immunohistochemistry to elucidate the mechanism of action of Tse1 against B. subtilis peptidoglycan. There are some concerns regarding a few experimental controls that were not included/discussed and (in a few figures) the visual representation of the data could be improved. The structure of the manuscript and experiments is such that key questions are addressed in a logical flow that demonstrates the mechanisms described by the authors.

      To begin, we have concerns regarding the sporulation assays and their results. The data should be presented as "Percent sporulation" or "Sporulation (%)" - not as a "sporulation rate": there is no kinetic element to any of these measurements, so no rate is being measured (be careful of this in the text as well, for instance near lines 204). More importantly, there is no data provided to indicate that changes in percent spores are not instead just the death of non-sporulated cells. For example, imagine that within a population of B. subtilis cells, 85% of the cells are vegetative and 15% are spores. If, upon exposure to tse1, a large proportion of the vegetative cells are killed (say, 80% of them), this could lead to an apparent increase in sporulation: from 15% for the untreated population to ~50% of the treated, but the difference would be entirely due to a change in the vegetative population, not due to a change in sporulation. The authors need to clearly describe how they conducted their sporulation assays (currently there is no information about this in the methods) as well as provide the raw data of the counts of vegetative cells for their assays to eliminate this concern.

      Thanks for the suggestion. We have changed all the titles and data presented as “sporulation rate” by “sporulation (%)” or “sporulation percentage”. As also suggested by reviewer 2, we have included the raw data of the CFUs counts of total population and sporulated cells to show that there is no substantial change in the rate of death. Also, we have added a section in Material and Methods to specify how sporulation assays have been done. Quote text:

      “Sporulation assays

      Spots of bacteria were resuspended in 1 mL sterile distilled water. Serial dilutions were then made and cultured on solid LB medium for vegetative-cell CFU counts. The same serial dilutions were further heated at 80ºC for 10 minutes to kill vegetative cells and immediately cultured again on solid LB medium. Plates were grown overnight at 28ºC and the resulting colonies were counted to calculate the Bsub sporulation percentage (%). A list of raw CFUs (total and spore populations) from all figures with sporulation percentages is shown in Supplementary file 3.”
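      For illustration, the percentage reported by this heat-kill assay amounts to the ratio of heat-resistant (spore) CFUs to total CFUs. A minimal sketch of the calculation follows; the colony counts and dilution factor below are hypothetical and not data from the study:

```python
def sporulation_percentage(total_cfu_per_ml, spore_cfu_per_ml):
    """Heat-resistant (spore) CFUs as a percentage of total viable CFUs."""
    if total_cfu_per_ml <= 0:
        raise ValueError("total CFU count must be positive")
    return 100.0 * spore_cfu_per_ml / total_cfu_per_ml

# Hypothetical counts back-calculated from a 10^-5 dilution plate:
# CFU/mL = colonies * dilution factor / plated volume (mL)
total_cfu = 48 * 1e5 / 0.1  # colonies before the 80 C heat-kill step
spore_cfu = 12 * 1e5 / 0.1  # colonies surviving 80 C for 10 minutes
print(sporulation_percentage(total_cfu, spore_cfu))  # → 25.0
```

Because both counts come from the same serial dilution, the dilution factor cancels out of the ratio; only the plate counts matter.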

      A related concern is regarding the analysis of the kinases and the effects of their deletions on the impact of Tse1. Previous literature shows that the basal levels of sporulation in a B. subtilis kinA or a kinB mutant are severely defective relative to a wild-type strain; these mutants sporulate poorly on their own. Therefore, the data presented on Lines 394+ and the associated Supplemental Figure regarding the sporulation defects of these two mutants are not compelling for showing that these kinases are required for this effector to act. It is likely that simply missing these kinases would severely impact the ability of these strains to sporulate at all, irrespective of the presence of Tse1, and no discussion of this confounding concern is discussed.

      Previous literature shows that mutation of these kinases affects sporulation of B. subtilis. The histidine kinases KinA and KinB are primarily responsible for initiating the sporulation cascade upon phosphorylation of Spo0F. However, as shown in Figure 6-figure supplement 1A, single mutants in these kinases (ΔkinA, ΔkinB) still sporulate, given that the phosphorylation cascade is controlled by numerous intermediaries and other histidine kinases that form a multicomponent phosphorelay (KinA-E). In this context, sporulation of B. subtilis can also be triggered by KinC or KinD in the absence of KinA or KinB, as KinC/KinD can act directly on the master regulator of sporulation, Spo0A (Burbulys et al., 1991; Wang et al., 2017).

      In addition, as suggested by reviewer 1, we have added to Figure 6-figure supplement 1A of the new version of the manuscript the 'no competitor' control sporulation percentage for each kinase mutant and the B. subtilis WT. The results show that, as noted by the reviewer and also supported by the literature, these mutants sporulate poorly on their own in the absence of an attacker (none). However, as shown in the figure, all kinase mutants increase their sporulation percentage in the presence of a competitor.

      Another concern is regarding the statistical tests used in Figure 2. For the statistical tests in A, B, and D, it should be stated whether a post-test was used to correct for multiple comparisons and, if so, which post-test was used. For C, we suggest the inclusion of a mock control in addition to the two conditions already included (i.e., an extraction from an E. coli strain expressing the empty vector) to provide a stronger control comparison.

      We have clarified the statistical tests used in Figure 2. Briefly, we have used one-way ANOVA followed by the Dunnett test in Figure 2A, B and D for the statistical analysis of the sporulation percentage of Bsub in competition with Pchl as control group. In relation to Figure 2C, it is not possible to add a mock control with a strain carrying the empty vector, because this is a suicide plasmid (pDEST17) unable to replicate in E. coli without chromosome integration.

      An additional concern regarding controls is that there is an absence of loading controls for the immunoblot assays. In Figure 5D and all immunoblot assays, there is no mention of a loading control, which is a critical control that should be included.

      In the previous version of the manuscript, we already included a loading control for Figure 5D in Figure supplement 7B, for both the cell and supernatant fractions. In the new version of the manuscript, the loading control for Figure 6E (Figure 5D in the previous version) is shown in Figure 6-figure supplement 2C. We have also included the original unedited gels and blots (Figure 6-figure supplement 2-source data 1 and Figure 6-figure supplement 2-source data 2).

      Some of the visualizations could be improved to help the reader understand and appropriately interpret the data presented. For instance, in Figures 3 and 4 the scale bars are different across each of the Figure's imaging panels. These should be scaled consistently for better comparison. Additionally, the red false colorization makes the printed images difficult to see. Black-and-white would be easier to see and would not subtract from the images.

      The reviewer is right. The scale bars in Figure 3A all indicate a length of 2, but the bars themselves were not drawn at the same size. We have edited the images so that all panels have the same magnification for better comparison.

      In relation to Figure 4, we have changed the magnifications and now all the figures have the same scale bars and magnifications. In addition, we have added more images of broader fields in Figure 4-figure supplement 1 which were used to measure the percentage of permeabilized cells and to obtain the fluorescence intensity measures shown in Figure 4.

      An additional weakness of the paper is that the RNA-seq data is not fully investigated, and there is an absence of methods included regarding the RNA-seq differential abundance analysis (it is mentioned on L379-380 but no information is provided in the methods). As stated by the authors, 58% of differentially regulated genes belonged to the sw regulon, but the other 42% of genes are not discussed, and will hopefully be a target of future investigations.

      The methods section has been modified to better explain the RNA-seq differential abundance analysis. Quote text: “The raw reads were pre-processed with SeqTrimNext (Falgueras et al., 2010) using the specific NGS technology configuration parameters. This pre-processing removes low-quality, ambiguous and low-complexity stretches, linkers, adapters, vector fragments, and contaminated sequences while keeping the longest informative parts of the reads. SeqTrimNext also discarded sequences below 25 bp. Subsequently, clean reads were aligned and annotated against the Bsub reference genome with Bowtie2 (Langmead and Salzberg, 2012), producing BAM files, which were then sorted and indexed using SAMtools v1.484 (Li et al., 2009). Uniquely localized reads were used to calculate the read number value for each gene via Sam2counts (https://github.com/vsbuffalo/sam2counts). Differentially expressed genes (DEGs) were analyzed via DEgenes Hunter, which provides a combined p value (calculated based on Fisher’s method) using the nominal p values provided by edgeR (Robinson et al., 2010) and DESeq2. This combined p value was adjusted using the Benjamini-Hochberg (BH) procedure (false discovery rate approach) and used to rank all the obtained DEGs. For each gene, a combined p value < 0.05 and log2-fold change > 1 or < −1 were considered as the significance threshold”
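      The quoted combined-p-value and threshold logic can be illustrated with a small sketch. DEgenes Hunter itself is an R package; the gene names, p values, and fold changes below are hypothetical, and this only illustrates Fisher's method followed by BH adjustment and the stated cutoffs:

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) is chi-square with 2k df
    under the null; the survival function for even df has a closed form."""
    k = len(p_values)
    half = -sum(math.log(p) for p in p_values)  # = X / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

def benjamini_hochberg(p_values):
    """BH step-up adjustment; returns adjusted p values in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # enforce monotonicity from the largest p down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical genes: (edgeR p, DESeq2 p, log2 fold change)
genes = {
    "sigW":  (0.001, 0.004,  2.3),
    "geneX": (0.200, 0.350,  0.4),
    "geneY": (0.002, 0.010, -1.8),
}
combined = {g: fisher_combined_p(v[:2]) for g, v in genes.items()}
adjusted = dict(zip(combined, benjamini_hochberg(list(combined.values()))))
degs = [g for g, (_, _, lfc) in genes.items()
        if adjusted[g] < 0.05 and abs(lfc) > 1]
print(degs)  # → ['sigW', 'geneY']
```

geneX is excluded twice over: its combined p value is not significant and its fold change falls inside the −1..1 band.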

      Regarding the RNA-seq analysis, we are aware of the amount of information that can be extracted. Prior to filtering the information shown in the manuscript, we performed bioinformatic analyses trying to find a connection with the cellular response, that is, the increase of sporulation. Beyond this, we made some observations with no direct connection to sporulation, which would be interesting to pursue in future studies but are omitted here for the clarity of this story (Figure 2 below). In any case, we are including the whole picture of the transcriptomic changes occurring in Bsub after treatment with Tse1. KEGG pathway analyses of differentially expressed genes showed induction of flagellar assembly and of aminobenzoate degradation, nitrogen and amino acid metabolism. Interestingly, fatty acid degradation and CAMP resistance pathways were also induced, probably related to the changes in the cell wall after the action of the Tse1 toxin. On the other hand, the pathway for synthesis and degradation of ketone bodies was mostly repressed.

      Figure 2. KEGG pathway analyses of genes differentially expressed occurring in Bsub after treatment with Tse1.

      Another methodological concern in this paper is the limited details provided for the calculation of the permeabilization rate (Figure 4, L359, L662-664). It is not clear how, or if, cell density was controlled for in these experiments.

      We agree with the reviewer, and we have explained in more detail how the permeabilization rate was calculated. Quote text: “N=3 for Bsub treated with Tse1 and N=3 for untreated Bsub. N refers to the number of CLSM fields analyzed to calculate the number of permeabilized cells out of the total number of cells in each field”
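      The per-field quantification described in the quote reduces to averaging the permeabilized fraction over the analyzed fields. A minimal sketch follows; the cell counts per field are hypothetical, not measurements from the study:

```python
def permeabilization_percentage(fields):
    """Mean percentage of permeabilized cells across CLSM fields.
    Each field is a (permeabilized_count, total_count) pair."""
    if not fields:
        raise ValueError("at least one field is required")
    per_field = [100.0 * perm / total for perm, total in fields]
    return sum(per_field) / len(per_field)

# Hypothetical counts for N = 3 fields of Tse1-treated Bsub:
treated_fields = [(42, 120), (35, 98), (51, 140)]
print(round(permeabilization_percentage(treated_fields), 1))  # → 35.7
```

Averaging per-field percentages (rather than pooling all cells) gives each field equal weight regardless of how many cells it happens to contain.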

      Finally, one weakness of the paper is the broad conclusions that they draw. The authors claim that the mechanism of sporulation activation is conserved across Bacilli when the authors only test one B. subtilis and one B. cereus strain. They further argue (lines 469+) that Tse1 requires a PAAR repeat for its targeting, but do not provide direct evidence for this possibility.

      We have toned down the final conclusion to specify that the activation of sporulation is a mechanism that can be found in different Bacillus species such as Bsub and Bcer. Regarding the second point, we have included a further explanation for this argument. Quote text: “As shown in Figure 2B, a paar mutant has an active T6SS able to kill E. coli. However, as shown in Figure 2A, we noticed that a paar mutant (which encodes tse1) is not able to trigger B. subtilis sporulation to a level similar to that of the Pchl WT strain. Given that paar deletion apparently abolishes Tse1 secretion, we suggest that Tse1 is a PAAR-associated effector that requires a PAAR repeat domain protein to be targeted for secretion, thereby increasing Bacillus sporulation during contact with Pseudomonas cells (Cianfanelli et al., 2016; Hachani et al., 2014; Whitney et al., 2014)”.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Elkind et al. use a deep learning segmentation algorithm trained on detecting putative cell nuclei in mouse brains to count cells in the Allen Mouse Brain Connectivity Atlas. The Allen Mouse Brain Connectivity Atlas is a dataset compromising hundreds of mice brains. The authors use this increased statistical power for detecting differences in volume, cell count, and cell density between strains (C57BL/6J and FVB.CD1) as well as sex differences.

      Both volume, cell count, and cell density are regularly used in neuroanatomy to normalize or benchmark results so having a large available dataset for others to compare their data would be a useful resource. The trained segmentation algorithm might also find utility in assays where investigators for one reason or another can't dedicate an entire labeled channel to count cell nuclei.

      Nevertheless, because of technical reasons, I find the current work problematic.

      We thank the Reviewer for acknowledging potential usefulness of our work, and the insightful, helpful comments. We believe this consideration has made our revised manuscript much stronger compared to the initial submission. We hope our revised version will also clear the Reviewer’s remaining doubts.

      Major:

      The authors make use of the "red" channel from the Allen Mouse Brain Connectivity Project (AMBCP). The AMBCP was acquired using two-photon tomography with the TissueCyte 1000 system (http://help.brain-map.org/download/attachments/2818171/Connectivity_Overview.pdf?version=2&modificationDate=1489022310670&api=v2). The sample is illuminated at 925 nm wavelength and the channel the authors describe as autofluorescence is collected through a 593/40 nm bandpass filter. The authors go on to describe their rationale for using this channel for quantifying cell nuclei:

      "We noticed that the red (background) channel of STPT images, taken for the purpose of atlas alignment, typically features dark, round-like objects resembling cell nuclei. We had observed this phenomenon in our own imaging of mouse brains but found little more than anecdotal mentions of it in the literature8,9,10,11".

      The authors here cite a Scientific Reports paper from 2021 with 11 citations, a Journal of Clinical Pathology paper from 2005 with 87 citations, and lastly a paper in Laboratory Investigation from 2016 with 41 citations. The authors completely fail to cite the work from Watt Webb's group (co-inventor of 2p microscopy) in PNAS from 2003 that entirely described the phenomena of native fluorescence by multiphoton- excitation (https://www.pnas.org/doi/10.1073/pnas.0832308100 ), citations so far: 1959 citations. This is either indicative of poor scholarship or an attempt to describe something as novel. Either way, the native fluorescence and second harmonic generation from multiphoton illumination are perfectly characterized by Webb and colleagues and they clearly show the differential effect on nucleosides, retinol, indoleamines, and collagen. This is also where the authors should have paid more attention to discrepancies in their own data when correlated to well-established cell nuclei markers (Murakami et al). The authors will note "black large spots" in the data at specific anatomical regions and structures, like the fornix and stria medullaris: https://connectivity.brain-map.org/projection/experiment/siv/263780729?imageId=263780960&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=15702&y=18833&z=5

      which is not reproduced in for example the Allen Reference Atlas H&E staining: http://atlas.brain-map.org/atlas?atlas=1&plate=100960284#atlas=1&plate=100960284&resolution=4.19&x=5507.4000244140625&y=5903.39990234375&zoom=-2

      In connection here notice the poor signal in the 2p "autofluorescence" within the paraventricular nucleus: https://connectivity.brain-map.org/projection/experiment/siv/263780729?imageId=263780960&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=15702&y=17833&z=6

      and then compare it to the H&E staining: http://atlas.brain-map.org/atlas?atlas=1&plate=100960280#atlas=1&plate=100960276&resolution=1.50&x=5342.476283482143&y=5368.023856026786&zoom=0

      These multiphoton-specific signals are especially pronounced in the pons and medulla which makes quantification especially dubious, which is even apparent simply from looking at Figure 1c in the manuscript.

We thank the Reviewer for the comments and sincerely apologize for missing the seminal work of Webb’s group. We included the former references for their specific mention or illustration of non-autofluorescent nuclei, but indeed entirely failed to address the underlying chemistry that Webb’s group beautifully characterized. We have added the following sentence in the Results section “Autofluorescence of STPT images displays cell nuclei” (red font for new sentence; Reference #15 corresponds to Zipfel et al.):

“We noticed that the red (background) channel of STPT images, taken for the purpose of atlas alignment, typically features dark, round-like objects resembling cell nuclei. This phenomenon was described in previous literature11,12,13,14. In particular, Zipfel et al. characterized the use of multiphoton-excited native fluorescence and second harmonic generation for the purpose of staining-free tissue imaging15.”

      And mentioned the dependency of our method on the presence of intrinsically fluorescent molecules in the Discussion:

      “The study has several limitations. First, the model is sensitive to the contrast between dark nuclei and autofluorescent surroundings, which can be limited by image quality and tissue composition. In particular, the staining-free approach depends on the presence of intrinsic molecular indicators such as NADH, retinol or collagen15, which may vary between cell or tissue components, even within the brain.”

      We understand that more generally, the Reviewer’s major concern above was regarding the technical validity of our approach; that the segmentation based on small objects lacking autofluorescence, as evident in the STPT dataset, in fact corresponds to cells/nuclei.

      In our initial Supplemental Figure 1 (in current version Figure 1—figure supplement 1) we provide technical validation of the method, by showing nuclear staining, and autofluorescence side-by-side, using epifluorescence microscopy. In our revision we now report appropriate statistical measures for this analysis (true positives, false positives, false negatives).

      In addition, we performed the following two sets of validations –

(i) Technical validation of our staining-free quantification approach, by nuclear staining. We performed nuclear staining (Hoechst 33342) followed by STPT imaging of 9 female brains and trained a new deep neural network (DNN) to segment the resulting images (STPT was performed by TissueVision). Unfortunately, in STPT it is not technically possible to analyze nuclear staining and autofluorescence in the very same tissue. Therefore, we compared per-region density, cell count and volume of the nuclei-stained validation brains to our original DNN-based analysis of AMBCA brains. We show a correlation coefficient >0.99 for per-region cell count in AMBCA autofluorescence and our nuclear staining (and a similar correlation coefficient for volume). However, the number of cells in nuclear staining over the whole brain is 56% larger than in autofluorescence. Although we currently have no technically feasible way to prove this, one likely explanation for this discrepancy is the nature of the two signals the imaging detects: positive fluorescence (the Hoechst fluorophore) versus absence of autofluorescence. Further, discrepancies between the two methods were notably higher in glial-rich tissues (e.g., CTX L1, midbrain, brainstem), leading to the speculation that low-autofluorescent object counts may be biased toward detecting neurons rather than glia.

(ii) Independent validation of the biological findings, discussed further below. Regarding the specific concern of “black large spots” in the fornix and stria medullaris, we would like to emphasize that our DNN does not identify and segment dark regions such as ventricles and tracts. In Author Response Image 1 we provide three examples featuring “black large spots” of different shapes and sizes, with examples of the segmentation results as shown in Figures 1 and 2 of the manuscript. Note that the colored circles, which appear as dots depending on magnification, are the objects that were detected and segmented by the DNN. In the Figure we demonstrate that (1) fiber tracts (incl. fornix, stria medullaris) are not segmented; (2) striatal patches (smaller still than the fiber tracts in question) are not segmented; and (3) putative blood vessels, appearing as elongated, black structures, are ignored by our DNN.

Author Response Image 1. How does the DNN deal with large black spots? Examples of fiber tracts, striosomes, and blood vessels; adapted from Figures 1 and 2 in the manuscript. Note that dots/outlines represent segmented putative “nuclei” as detected by the model, colored by assigned region according to the Allen Mouse Brain hierarchy. Example (1): fiber tracts (incl. fornix, stria medullaris) are not segmented. Example (2): striosomes (patches in the striatum, smaller still than the fiber tracts in question) are not segmented; the much smaller objects that are detected as putative nuclei are indicated by arrows. Example (3): putative blood vessels, appearing as elongated, black structures, are ignored by our DNN. The segmentation images were adapted from the manuscript’s Figure 1 to correspond to the STPT image featuring fiber tracts (and striosomes/patches) that was pointed out by the Reviewer.

      Retrieved from: https://connectivity.brain-map.org/projection/experiment/siv/263780729?imageId=263780960&imageType=TWO_PHOTON,SEGMENTATION&initImage=TWO_PHOTON&x=15702&y=18833&z=5.

Regarding the claim of problematic counting in brainstem regions, we agree, and had addressed this limitation in the manuscript’s Discussion (see below). We believe that our counting is valuable even if in some regions there is a significant systematic error: most of the analyses in this study compare brain regions across individuals, and thus a systematic error is less impactful. In the revision, we nevertheless took care to validate and quantify the size of this effect. Briefly, we compared counting based on nuclear staining (Hoechst) from 9 STPT-imaged brains to our quantifications of non-autofluorescent objects. As expected, the ratio between these counts depends on the brain region, and accuracy is better in regions with high brightness that are not on the border of the section (Figure 2—figure supplement 2). For the pons and medulla, the densities in our Hoechst quantifications are 43% and 60% higher than in our AMBCA analysis, respectively, yet the rank order is preserved in both.

      We have revised the relevant sentences in the Discussion:

      Original sentences: The study has several limitations. … In the hindbrain (pons, medulla), contrast was exceedingly weak, and we expect our quantifications in this region to strongly underestimate real cell densities, to an extent we cannot quantify.

      Revised sentences: The study has several limitations. … In the hindbrain (pons, medulla), contrast was exceedingly weak, and we expect our quantifications in this region to be 66% of the value estimated by nuclear staining (Figure 2—figure supplement 2).

      The authors here use the correlation on log-log coordinates between their data and that of Murakami et al to argue that the method has validity. However, the variance explained here is R^2 = 0.74 which is very poor given the log-log coordinates. A more valid metric would use linear coordinates and computing the ICC and interpret it according to established guidelines (e.g. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4913118/).

As mentioned by the Reviewer, Figure 2D compares Murakami et al. cell counts and ours, across all brain regions. The value “r=0.869” represents the correlation coefficient between the two vectors in log scale, not the R^2. We now also display the correlation coefficient for the linear scale, in which case r = 0.98. As suggested by the Reviewer, we added ICC values between the two vectors in linear scale. Using 6 different forms (ICC(1,1), ICC(1,k), ICC(C,1), ICC(C,k), ICC(A,1), ICC(A,k)), the ICC values were 0.98-0.99, corresponding to excellent agreement (ICC values are mentioned in the legend of Figure 2).
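For readers who wish to reproduce this kind of comparison, the linear-scale Pearson correlation and the one-way random-effects ICC(1,1) can be computed directly from two per-region count vectors. The sketch below uses only the standard library and made-up counts; it is not our analysis code and the numbers are not AMBCA or Murakami et al. data:

```python
import math

def pearson_r(a, b):
    """Pearson correlation on a linear scale (as now reported for Figure 2D)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    sa = math.sqrt(sum((u - ma) ** 2 for u in a))
    sb = math.sqrt(sum((v - mb) ** 2 for v in b))
    return cov / (sa * sb)

def icc_1_1(a, b):
    """One-way random-effects ICC(1,1): two 'raters' (counting methods)
    scoring the same set of targets (brain regions)."""
    n, k = len(a), 2
    grand = (sum(a) + sum(b)) / (n * k)
    row_means = [(u + v) / k for u, v in zip(a, b)]
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((u - m) ** 2 + (v - m) ** 2
                    for (u, v), m in zip(zip(a, b), row_means))
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical per-region cell counts from two methods (illustrative only):
ours = [100, 200, 400, 800]
ref = [110, 190, 420, 790]
print(round(pearson_r(ours, ref), 3), round(icc_1_1(ours, ref), 3))
```

Other ICC forms (two-way models, average-measures) require the full ANOVA decomposition and are more conveniently obtained from a statistics package.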

Author Response Image 2 displays the revised Figure 2D (left), and the log value of the ratio between the AMBCA-based cell count and the Murakami-based value (right), as a function of region volume. The mean value across regions is zero, corresponding to similar cell counts in both methods. There are outlier regions, which may be attributed to registration errors or different experimental protocols, or may stem from the fact that the Murakami values are based on 3 brains, compared to hundreds of AMBCA brains.

      Author Response Image 2. Correlation with cell counts in Murakami et al. Left, revised Figure 2D; Right, ratio between AMBCA-based cell counts and Murakami et al. counts, as a function of region volume

      In addition to the above concern, the authors argue that the large sample size of the AMBCP is what would enable them to find statistically significant small effect sizes that might have gone undetected in the literature. However, this argument falls flat once we examine some of the main findings the authors report. Although the authors do not directly report measures of dispersion we can estimate it from the figures and then arrive at the sample size needed to find the reported effect size. For example, the effect that describes ORBvl2/3 volume is larger in female mice compared to males would only require n=13 mice at the desired power of 0.8. Likewise, the sample size needed to detect the increased BST volume in male mice looks to be roughly n=16 mice at the desired power of 0.8. Both of these estimates are well within what is a reasonable sample size to expect in an ordinary study. This begs the question: why did the authors simply not verify some of their main findings in an independent sample obtained through traditional ways to quantify volume and cell density since it is well within reach? Such validation would strengthen the arguments of the paper.

      We thank the reviewer for this comment and apologize. In the revised version we do report dispersion.

      We would like to emphasize that due to our restricted time and resources, we decided to focus our experimental validation on the technical comparison of nuclear staining vs. autofluorescence-based segmentation, outlined above.

We then verified the biological findings from the initial cohort using C57BL/6J volume data from an additional 663 males vs. 166 females on AMBCA. This independent cohort showed similar sexual dimorphism in the volume of MEA, BST and ORBvl2/3, as depicted in the following figure (panels A-D; also the new Figure 4—figure supplement 1).

We fully acknowledge the interesting issue raised regarding the sample sizes required to detect our reported effect sizes. We therefore also present here the average p-value for sexual dimorphism in volumes of MEA, BST and ORBvl2/3, as a function of the sample size (panel E in Figure 4—figure supplement 1 of the revised manuscript). The Reviewer will note that the regions with the largest effect sizes (MEA, BST) can be detected with more ordinary sample sizes, and indeed, MEA and BST dimorphism is evident in the literature. ORB dimorphism required a much greater sample size, and our analysis (Figure 4) systematically detected many more dimorphic regions, in volume, density and count.
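The back-of-the-envelope sample-size estimates the Reviewer describes can be reproduced with the standard normal-approximation formula for a two-sided, two-sample t-test. The sketch below is generic and the effect size in the example is hypothetical, not one estimated from our data:

```python
import math

# Per-group sample size for a two-sided, two-sample t-test (normal
# approximation): n ≈ 2 * (z_{1-α/2} + z_{1-β})^2 / d^2, with Cohen's d.
Z_975 = 1.959964  # z-quantile for α = 0.05, two-sided
Z_80 = 0.841621   # z-quantile for power = 0.80

def n_per_group(cohens_d, z_alpha=Z_975, z_beta=Z_80):
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / cohens_d ** 2)

# A hypothetical effect size of d = 1.0 requires ~16 mice per group:
print(n_per_group(1.0))  # → 16
```

Exact t-based calculations (e.g., `statsmodels.stats.power.TTestIndPower`) give slightly larger n for small samples, since the t distribution has heavier tails than the normal.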

      Reviewer #2 (Public Review):

      This report describes a large-scale analysis of cell counts in mouse brains. The authors found that the Allen Mouse Connectivity project has a rich dataset for cell counting that is yet to be analyzed, and they developed methods to quantify cells in different nuclei. They go on to compare males vs females and two different strains. From this analysis, they found specific differences between male versus female brains, left versus right hemispheres, and C57BL/6 versus FVB.CD1 mice, especially with regard to cell counts and density.

      Overall, the methodology is sound and the quality of the data seems high. In fact, this study uses >100 brains for the statistics, and this is one of the major strengths of this study. For researchers who are interested in interrogating the differences at the macroscopic level in brain structures, this study will be a great resource. For example, the manuscript contains an interesting finding that for most brain areas, females have larger volumes but fewer cell numbers.

We thank the Reviewer for these comments. We would like to mention that the revised version of the manuscript does not include a statement regarding BL6 female volume. We found a batch effect in the AMBCA experiments, mostly affecting the volume in their first batch (Figure 2—figure supplement 1B). That batch included mostly males and had, for some reason, lower volume compared to all later experiments, which caused the apparent volume differences. We emphasize that (1) the total number of cells did not show any batch effect (Figure 2—figure supplement 1C); and (2) we normalized the volume and repeated the analysis. Aside from the finding that females did not in fact have larger volumes, the other main findings remained unchanged.

      Reviewer #3 (Public Review):

      Elkind et al. have devised a strategy to detect cells in whole brain samples of the large, publicly accessible Allen Mouse Brain Connectivity database. They put together an analysis pipeline to quantify cell numbers and -density as well as volumes for all annotated brain areas in these samples. This allowed them to make several important discoveries such as (1) strain-, sex- and hemisphere-specific differences in cell densities, (2) a large interindividual variability in cell numbers, and (3) an absence of linear scaling of cell count with volume, among others. The key strength of this work lies in its comprehensive analysis, the large sample size that the authors have drawn from (making their conclusions particularly robust), and the fact that they have made their analysis tools accessible. A weakness of the current manuscript is the dense layout and overplotting of several of the figures, and the lack of necessary information to understand them more easily. Another, conceptual weakness of using the autofluorescence channel for cell detection is that the identity (neuronal vs non-neuronal) of the underlying cells remains unresolved. Overall, however, I believe that this study has the potential to serve as a valuable reference point, and I would expect this work to have a lasting impact on quantitative studies of mouse brain cytoarchitecture.

We thank the Reviewer for these valuable comments. We have tried to minimize overplotting of figures and hopefully added all necessary information. For example, the revised manuscript presents more pared-down figures, with data labels omitted where they crowded the graphic. Instead, we provide the full data in the Supplemental tables and in our online accessible GUI. We hope the reader will feel encouraged to zoom into the presented data, explore the additional tables in depth, and use our online tool.

Regarding the question of cell types, we were unfortunately not able to provide a definitive answer, but our validation experiments provided some potential clues. For example, nuclear staining (Hoechst) uniformly detected 65% more cells than the AMBCA autofluorescence quantification. And, in neuron-rich regions, the correspondence between nuclear staining and AMBCA autofluorescence was notably better than in glia-rich regions (e.g., CTX L1, midbrain, medulla). These discrepancies between the techniques may therefore point to an underlying difference in cell-type composition, such that counting low-autofluorescent nuclei is biased toward neurons.

      In addition, however, the methods differ in their native physical properties; in that one detects presence of a fluorescent signal (e.g., the nuclear stain is detected beyond its focal plane), compared to the detection of the absence of a signal (which, in turn, is dependent on the presence of surrounding intrinsic fluorescent molecules). It is technically non-trivial to assess the extent to which these factors apply. We have added a clarification along these lines in the Discussion (below). We would further like to emphasize the nature of our study as a comparative, systematic analysis within this interesting cohort, rather than providing definitive cell counts – that we found to be greatly variable across the population.

“We further attempted to estimate the region-specific accuracy of our cell counting by comparing autofluorescence STPT with brain-wide imaging of nuclear-stained STPT. However, this comparison is technically nontrivial because of the native physical properties of direct staining vs. autofluorescence. For example, stained nuclei located off the focal plane may appear in the image, yet remain undetected by autofluorescence. In addition, tissue composition (e.g., cell types, extracellular matrix) may affect the imaged region. Indeed, in regions rich with non-neuronal cells the error of autofluorescence-based counting was larger compared to nuclear staining. Hence, one may speculate that autofluorescence-based detection is biased for neurons.”

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors report the generation of a mesoscale excitatory projectome from the ventrolateral prefrontal cortex (vlPFC) in the macaque brain by using AAV2/9-CaMKIIa-Tau-GFP labeling and imaging with high-throughput serial two-photon tomography. They present a novel data pipeline that integrates the STP data with macroscopic dMRI data from the same brain in a common 3D space, achieving a direct comparison of the two tracing methods. The analysis of the data revealed an interesting discrepancy between the high resolution STP data and the lower resolution dMRI data with respect to the extent of the frontal lobe projection through the inferior fronto-occipital fasciculus (IFOF) - the longest associative axon bundle in the human brain.

Overall the paper can serve as a how-to example for analyzing large non-human primate brain data, though some parts of the paper can be improved and the interpretation of the data should also be further strengthened.

      We thank the reviewer for his positive evaluation of our manuscript.

      The methodological part should include more detail on image acquisition - speed of imaging, pixel residence time, total time for data acquisition of a single brain and data sizes. Also the time and hardware needed for the computational analysis should be included, including the registration to the common reference and the running time for the machine learning predictions - this should also include the F score for the axon detection.

      We thank the reviewer for pointing out these vital issues. We have added these technical details in the resubmitted manuscript.

“High x-y resolution (0.95 μm/pixel) serial 2D images were acquired in the coronal plane at a z-interval of 200 μm across the entire macaque brain. The scanning time of a single field-of-view, which contains 1024 by 1024 pixels, was 1.629 s (i.e., pixel residence time was ~1.6 μs), which resulted in a continuous ~1 month of scanning and ~5 TB of STP tomography data for a single monkey brain.”

      “The data analysis was undertaken on a compute cluster with a 3.1 - 3.3 GHz 248 core CPU, 2.8 T of RAM, and 17472 CUDA cores.”

“The total computational time for the machine learning predictions in one macaque brain was ~1.5 months.”

“To evaluate overall classifier performance, the precision–recall F measure, also called F-score, was computed using an additional four labeled images as test sets. Higher classifier accuracy yields higher F-scores (94.41% ± 1.99%, mean ± S.E.M.).”

“Registration to the 3D common space took approximately half an hour.”
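As a side note on the F-score quoted above: it is the harmonic mean of precision and recall, computable from pixel-wise true positives, false positives and false negatives. The counts in this sketch are illustrative, not values from our test sets:

```python
def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall for a pixel classifier.
    The counts used below are illustrative, not values from the study."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g., 900 true-positive pixels, 50 false positives, 60 false negatives
print(round(f_score(900, 50, 60), 4))  # → 0.9424
```

Because the F-score ignores true negatives, it is well suited to axon segmentation, where background pixels vastly outnumber labeled ones.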

      The discrepancy between the high resolution STP data and the lower resolution dMRI data with respect to the extent of the frontal lobe projection through the inferior fronto-occipital fasciculus seems puzzling. One would expect that the STP data would reveal more detail not less.. One possibility is that the Tau-GFP does not diffuse throughout the full axon arborization of the PFC neurons, resulting in a technical artifact. Can this be excluded to support the functional significance of the current data?

We thank the reviewer for raising this important issue. We apologize for not providing sufficient details of the IFOF debate, due to limited space, and for the resulting confusion. We have added the literature background of the IFOF debate to the Introduction (as also recommended by Reviewer #2). In light of the comments by Reviewer #2, the present finding provides direct support for the speculation that the IFOF of macaque monkeys may not exist in a mono-synaptic way.

      The AAV construct encoding cytoskeletal GFP (Tau-GFP) was used here to label all processes of the infected neuron, including axons and synaptic terminals. About 3 weeks of post-surgery survival time are usually sufficient to label intracerebral circuits in rodents (Lanciego and Wouterlood, 2020). We have extended the survival time to 2-3 months in order to achieve adequate labeling of axonal fibers and terminals in macaques.

Regarding the extent of Tau-GFP diffusion, the STP images and high-resolution confocal microscopic analysis further showed differences in the morphology of the axon fibers that populate the route and in the terminals of these axon fibers. Consistent with previous reports (Fuentes-Santamaria et al., 2009; Watakabe and Hirokawa, 2018), the axon fibers were thin and formed bouton-like varicosities in the terminal regions (MD, Figure 2—figure supplement 7D; caudate, Figure 2—figure supplement 7J; PFC, Figure 1—figure supplement 5A-D). These results indicate that the Tau-GFP reached the axonal terminals.

      References:

      Fuentes-Santamaria V, Alvarado JC, McHaffie JG, Stein BE (2009) Axon Morphologies and Convergence Patterns of Projections from Different Sensory-Specific Cortices of the Anterior Ectosylvian Sulcus onto Multisensory Neurons in the Cat Superior Colliculus. Cereb Cortex 19:2902-2915.

      Lanciego JL, Wouterlood FG (2020) Neuroanatomical tract-tracing techniques that did go viral. Brain Struct Funct 225:1193-1224.

      Watakabe A, Hirokawa J (2018) Cortical networks of the mouse brain elaborate within the gray matter. Brain Struct Funct 223:3633-3652.

      Reviewer #2 (Public Review):

      The authors utilized viral vectors as neural tracers to delineate the connectivity map of the macaque vlPFC at the axonal level. There are three main goals of this study: 1) determine an effective viral vector for tract-tracing in the macaque brain, 2) delineate the detailed map of excitatory vlPFC projections to the rest of the brain, and 3) compare vlPFC connectivity between tracing and tractography results.

      We thank the reviewer for his/her constructive comments, to which we respond below.

      Accordingly, my comments are organized around each aim:

      1) This study demonstrates the advantage of viral tracing technique in targeting neuron type-specific pathways. The authors conducted injection experiments with three types of viral vectors and found success of AAV in labeling long-distance connections without causing fatal neurotoxicity in the monkey. This success extends the application of AAV from rodents to nonhuman primates. The fact that AAV specifically targets glutamatergic neurons makes it advantageous for mapping excitatory projections.

      Although the labeling efficacy of each viral vector type is described in the text, Fig. 2 does not present a clear comparison across viral vectors, despite such comparison for a thalamic injection in Fig. 2S. Without a comparable graph to Fig. 2E, it is unclear to what extent the VSV and lentivirus failed in labeling long-distance pathways.

      We thank the reviewer for the helpful suggestion. As suggested, we have added three new figures as Supplementary materials in the revised manuscript.

      Figure 2—figure supplement 2. Expression of GFP using VSV-△G injected into MD thalamus of the macaque brain. (A) GFP-labeled neurons were found in the MD thalamus ~5 days after injection of VSV-△G encoding Tau-GFP. (B) A magnified view illustrating the morphology of GFP-labeled neurons in the area outlined with a white box in (A). (C) Higher magnification view of GFP-positive axons.

      Figure 2—figure supplement 3. Expression of GFP using lentivirus injected into MD thalamus of the macaque brain. (A) Lentivirus construct was injected into the macaque thalamus and examined for transgene expression after ~9 months. (B) High power views of the dotted rectangle in panel A. (C) Magnified view of panel B. Note the presence of GFP-positive cells.

      Figure 2—figure supplement 4. Expression of GFP using AAV2/9 injected into MD thalamus of the macaque brain. (A) GFP-labeled axons were observed in the subcortical regions ~42 days after injection of AAV2/9 encoding Tau-GFP in MD thalamus. The inset shows the injection site in MD thalamus. Two dashed line boxes enclose the regions of interest: frontal white matter and ALIC, whose GFP signal are magnified in (B) and (C), respectively. (D) Higher magnification view of GFP-positive axons.

      2) The authors quantified connectivity strength by the GFP signal intensity using a machine-learning algorithm. Both the quantitative approach and the resulting excitatory projection map are important contributions to advancing our knowledge of vlPFC connectivity.

      However, several issues with the analysis lead to concerns about the connectivity result. First, the strength measure is based on axonal patterns in the terminal fields (which the authors refer to as "axon clusters"), detected by a machine-learning algorithm (page 25, lines 11-13). However, the actual synaptic connections are the small dot-looking signals in the background. These "green dots" are boutons on the dendritic trees. The density of boutons rather than the passing fibers reflects the density of synapses. The brief method description does not mention how the boutons are quantified, and it is unclear whether the signal was treated as the background noise and filtered out. Second, it is difficult for the reader to assess the robustness of the vlPFC connectivity patterns, due to these issues: i) It is unclear how many injection cases were used to generate the result reported in the subsection "Brain-wide excitatory projectome of vlPFC in macaques". The text mentions a singular "injection site" (page 8, line 12) and Fig. 4 shows a single site. However, there are three cases listed in Table 1. Is the result an average of all three cases? ii) Relatedly, it is unclear in which anatomical area the injection was placed for each case. Table 1 lists the site as "vlPFC" for all three cases, while the vlPFC contains areas 44, 45 and 12l. These areas have different projection patterns documented in the tract tracing literature. If different areas were injected in the three cases, they should be reported separately. iii) It is hard to compare the projection patterns with those reported in the literature. Conventionally, tract tracing studies report terminal fields by showing original labeling patterns in both cortical and subcortical regions without averaging within divided areas (see e.g. Petrides & Pandya, 2007, J Neurosci). It is hard to compare Fig. 3 with previous tract tracing studies to assess its robustness.

      We thank the reviewer for his/her constructive comments, to which we respond below.

      1). We appreciate the reviewer’s comment and sincerely apologize for not explaining this point clearly in our previous submission. The major concern is whether the axonal varicosities were likely to be treated as the background noise and removed by mistake. In fact, the dot-looking autofluorescence rather than the axonal varicosities were reduced through a machine-learning algorithm in segmentation. Hence we have provided new results and updated the “Materials and Methods” and “Discussion” sections in the revision accordingly.

“Fluorescent images of the primate brain (Abe et al., 2017) often contain high-intensity, dot-looking background signal caused by accumulation of lipofuscin. Owing to the broad emission spectrum of lipofuscin, the dot-looking background and GFP-positive axonal varicosities are easily distinguishable from each other. For instance (Figure 1—figure supplement 4), axonal varicosities can be selectively excited in the green channel, while the dot-looking lipofuscin background is usually present in both the green and the red channels. During quantitative analysis, a machine learning algorithm was adopted to reliably segment the GFP-labelled axonal fibers, including axonal varicosities, and remove the lipofuscin background (Arganda-Carreras et al., 2017; Gehrlach et al., 2020).”

“One recent study compared results of terminal labelling using Synaptophysin-EGFP-expressing AAV (specifically labelling synaptic endings) with cytoplasmic EGFP AAV (labelling axon fibers and synaptic endings). There was high correspondence between synaptic EGFP and cytoplasmic EGFP signals in target regions (Oh et al., 2014). Thus, we relied on quantifying GFP-positive pixels (containing signals from both axonal fibers and terminals) rather than the number of synaptic terminals, as was similarly done in recent reports (Oh et al., 2014; Gehrlach et al., 2020).”

      Figure 1—figure supplement 4. Difference between axonal varicosities and dot-looking background. STP images (A-D) and high-resolution confocal images (E-H) were acquired in green channel and the red channel. Synaptic terminals (indicated by white arrows) can be specifically excited in green channel, while dot-looking background lipofuscin (indicated by yellow arrows) can be visualized both in green channel and red channel. (C and G) No colocalization was found between axonal varicosities and dot-looking background. Axonal varicosities were easily distinguished from dot-looking background in the merged image. (D and H) The dot-looking autofluorescence rather than the axonal varicosities was reduced through a machine-learning algorithm.

      References:

      Abe H, Tani T, Mashiko H, Kitamura N, Miyakawa N, Mimura K, Sakai K, Suzuki W, Kurotani T, Mizukami H, Watakabe A, Yamamori T, Ichinohe N (2017) 3D reconstruction of brain section images for creating axonal projection maps in marmosets. J Neurosci Methods 286:102-113.

      Arganda-Carreras I, Kaynig V, Rueden C, Eliceiri KW, Schindelin J, Cardona A, Sebastian Seung H (2017) Trainable Weka Segmentation: a machine learning tool for microscopy pixel classification. Bioinformatics 33:2424-2426.

      Gehrlach DA, Weiand C, Gaitanos TN, Cho E, Klein AS, Hennrich AA, Conzelmann KK, Gogolla N (2020) A whole-brain connectivity map of mouse insular cortex. Elife 9.

      Oh SW et al. (2014) A mesoscale connectome of the mouse brain. Nature 508:207-214.

      2.1) We apologize for causing these confusions due to insufficient description in the main text. Now we have revised the description of the “Materials and Methods” section accordingly. Furthermore, we have made both the whole-brain serial two-photon data and high-resolution diffusion MRI data freely available to the community, as allows researchers in the field to perform further analyses that we have not done in the current study.

      “Three samples were injected with AAV in vlPFC, and two of them could be imaged with STP. Unfortunately, one sample became “loose” and fell off the agar block after several weeks of imaging. Therefore, its quantitative results are not shown in Figure 3.”

      2.2) We apologize for insufficient description of the precise location of the injection sites. We have revised the description of “Materials and Methods” section and provided a new figure to clarify the exact location of the injection sites.

      “Figure 3-4 and Figure 4—figure supplement 2-4 were derived from sample #8, with the infected area spanning areas 45, 12l and 44 of vlPFC. Figure 1—figure supplement 6 was derived from sample #7, with the infected area spanning areas 12l and 45 of vlPFC.”

      Figure 1—figure supplement 6. Representative fluorescent images showing the injection site and major tracts of sample #7. (A) The STP image of the injection site in vlPFC is shown overlaid with the monkey brain template (left-hand side), mainly spanning areas 12l and 45a. (B) Confocal image of the AAV-infected neurons (white arrows). (C-F) Representative confocal images of major tracts originating from vlPFC.

      2.3) We agree with the reviewer that most tract-tracing studies report terminal fields by showing original labeling patterns. Several recent studies report the total volume of segmented GFP-positive pixels (Oh et al., 2014) or the percentage of total labeled axons (Do et al., 2016; Gehrlach et al., 2020) to represent connectivity strength, and other studies provide the projection density as well (Hunnicutt et al., 2016). We have provided the percentage of total labeled axons (Figure 3C, right panel), the projection density (Figure 3C, left panel), and representative original fluorescent images (Figure 4, Figure 4—figure supplement 2 and Figure 4—figure supplement 4) to demonstrate our projection data at multiple levels.

      References:

      Do JP, Xu M, Lee SH, Chang WC, Zhang S, Chung S, Yung TJ, Fan JL, Miyamichi K, Luo L, Dan Y (2016) Cell type-specific long-range connections of basal forebrain circuit. Elife 5.

      Gehrlach DA, Weiand C, Gaitanos TN, Cho E, Klein AS, Hennrich AA, Conzelmann KK, Gogolla N (2020) A whole-brain connectivity map of mouse insular cortex. Elife 9.

      Hunnicutt BJ, Jongbloets BC, Birdsong WT, Gertz KJ, Zhong H, Mao T (2016) A comprehensive excitatory input map of the striatum reveals novel functional organization. Elife 5.

      Oh SW et al. (2014) A mesoscale connectome of the mouse brain. Nature 508:207-214.

      3) Using the ground truth from tract tracing to validate tractography results is a timely problem, and this study showed promising consistency and discrepancy between the two modalities. In particular, the discrepancy between tracing and tractography data on the IFOF termination brings critical insights into a potential cross-species difference. The finding that the IFOF does not reach the occipital cortex provides important support for the speculation that the IFOF may not exist in monkeys (for context on the IFOF debate, see Schmahmann & Pandya, 2006, pp. 445-446).

      I have minor concerns regarding the statistical robustness of the tracing-tractography comparison. The authors compared the vlPFC-CC-contralateral tract instead of a global connectivity pattern without justification. Why were other major tracts that connect with vlPFC omitted? In addition, the results are shown for only one monkey, while two monkeys went through both tracer injection and dMRI scans. It is unclear how the results were chosen or whether the data were averaged.

      We apologize for not describing this clearly. The STP images were acquired in the coronal plane with high x-y resolution (0.95 μm/pixel), while the z resolution was relatively low (200 μm). Axonal connection information along the z axis may be lost at this relatively large step size, making it technically demanding to reconstruct axonal density maps in the sagittal or horizontal plane. Therefore, we focused on the vlPFC-CC-contralateral tract traveling along the coronal plane when quantifying the similarity coefficients along the anterior-posterior axis of the whole macaque brain, and omitted the tracts that appear as dots in the coronal plane. We have revised this in the resubmitted manuscript.

      “GFP projection and probabilistic tract were plotted with the Dice coefficients and Pearson coefficients (R) along the anterior-posterior axis of the whole macaque brain. The Dice coefficients and Pearson coefficients were higher in dense projection regions, especially for the vlPFC-CC-contralateral tract (Figure 6A). To carry out a proof-of-principle investigation, we focused on the vlPFC-CC-contralateral tract that was reconstructed in 3D space by using STP and dMRI data, respectively.”
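      For reference, the two similarity metrics mentioned above could be computed per coronal slice roughly as follows. This is a minimal Python/NumPy sketch on invented toy masks (variable names and threshold choices are ours, not taken from the authors' analysis pipeline):

```python
import numpy as np

def dice_coefficient(a, b):
    """Dice coefficient between two binary masks (e.g. a thresholded
    GFP projection map and a probabilistic tract on one coronal slice):
    2|A ∩ B| / (|A| + |B|)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 0.0

def pearson_r(a, b):
    """Pearson correlation between two density maps, flattened."""
    return np.corrcoef(a.ravel(), b.ravel())[0, 1]

# Toy 4x4 slice: two partially overlapping binary masks
gfp = np.array([[1, 1, 0, 0],
                [1, 1, 0, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0]])
tract = np.array([[1, 1, 1, 0],
                  [1, 0, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
print(dice_coefficient(gfp, tract))  # 2*3 / (4+4) = 0.75
```

      Repeating such a computation slice by slice yields the anterior-posterior profiles of agreement between the STP-derived projection and the dMRI tract.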

      With regard to the demonstration of the dMRI data, we apologize for not making it clear in the previous version. We have revised Figure 6 and Figure 7 so that dMRI scans from different macaque monkeys are shown separately.

      Figure 6. Comparison of vlPFC connectivity profiles by STP tomography and diffusion tractography. (A) Percentage of projection, probabilistic tracts, Dice coefficients and Pearson coefficients (R) plotted along the anterior-posterior axis of the macaque brain. Blue and red indicate the two dMRI data sets acquired from different macaque monkeys. (B, C) 3D visualization of the fiber tracts issued from the injection site in vlPFC through the corpus callosum to the contralateral vlPFC, by STP tomography and diffusion tractography. (D-F) Representative coronal slices of the diffusion tractography map and the axonal density map along the vlPFC-CC-contralateral tract, overlaid with the corresponding anatomical MR images. (G-J) GFP-labeled axon images as marked in Figure 6F are shown with magnified views. (H, J) High-magnification images of the white boxes indicated in G and I, both of which reveal considerable detail of axonal morphology.

      Figure 7. Illustration of the inferior fronto-occipital fasciculus by diffusion tractography and STP. (A) The fiber tractography of the IFOF (lateral view). Two inclusion ROIs at the external capsule (pink) and the anterior border of the occipital lobe (purple) were used and are shown on the coronal plane. The IFOF stems from the frontal lobe, travels along the lateral border of the caudate nucleus and the external/extreme capsule, forms a bowtie-like pattern and anchors into the occipital lobe. (B) The reconstructed traveling course of the IFOF based on the vlPFC projectome, shown in 3D space. (C) The Szymkiewicz-Simpson overlap coefficients between 2D coronal brain slices of the dMRI-derived IFOF tract and the vlPFC projections, plotted along the anterior-posterior axis of the macaque brain. Blue and red indicate the two dMRI data sets acquired from different macaque monkeys. Four cross-sectional slices (D-G) along the IFOF tract were arbitrarily chosen to demonstrate the spatial correspondence between the diffusion tractography and the axonal tracing of STP images. (D-G) The detected GFP signals (green) of the vlPFC projectome and the IFOF tract (red) obtained by diffusion tractography were overlaid on anatomical MRI images, with a magnified view of the boxed area. Evidently, no fluorescent signal was detected in the superior temporal area where the dMRI-derived IFOF tract passes through (G).
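      The Szymkiewicz-Simpson overlap coefficient used in panel (C) could be sketched as follows (an illustrative Python/NumPy toy with invented masks, not the authors' actual pipeline). Unlike the Dice coefficient, it normalizes by the smaller set, so full containment of one mask in the other scores 1:

```python
import numpy as np

def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson coefficient between two binary masks:
    |A ∩ B| / min(|A|, |B|). Equals 1 when one mask is contained
    in the other, 0 when they are disjoint."""
    a, b = a.astype(bool), b.astype(bool)
    smaller = min(a.sum(), b.sum())
    return np.logical_and(a, b).sum() / smaller if smaller else 0.0

# Toy 2D slices: the tract mask fully contains the GFP mask
gfp = np.zeros((4, 4), dtype=int);   gfp[0, :2] = 1    # 2 pixels
tract = np.zeros((4, 4), dtype=int); tract[0, :3] = 1  # 3 pixels
print(overlap_coefficient(gfp, tract))  # 2 / min(2, 3) = 1.0
```

      This normalization makes the coefficient well suited to asking whether the sparse GFP projection lies within the broader dMRI-derived tract on each coronal slice.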

    1. Author Response:

      Reviewer #1:

      Zappia et al investigate the function of E2F transcriptional activity in the development of Drosophila, with the aim of understanding which targets the E2F/Dp transcription factors control to facilitate development. They follow up two of their previous papers (PMID 29233476, 26823289) that showed that the critical functions of Dp for viability during development reside in the muscle and the fat body. They use Dp mutants and tissue-targeted RNAi against Dp to deplete both activating and repressive E2F functions, focusing primarily on functions in larval muscle and fat body. They characterize changes in gene expression by proteomic profiling, bypassing the typical RNAseq experiments, and characterize Dp loss phenotypes in muscle, fat body, and the whole body. Their analysis revealed a consistent, striking effect on carbohydrate metabolism gene products. Using metabolite profiling, they found that these effects extended to carbohydrate metabolism itself. Considering that most of the literature on E2F/Dp targets is focused on the cell cycle, this paper conveys a new discovery of considerable interest. The analysis is very good, and the data provided support the authors' conclusions quite definitively. One interesting phenotype they show is low levels of glycolytic intermediates and circulating trehalose, which is traced to loss of Dp in the fat body. Strikingly, this phenotype and the resulting lethality during the pupal stage (metamorphosis) could be rescued by increasing dietary sugar. Overall the paper is quite interesting. Its main limitation in my opinion is a lack of mechanistic insight at the gene regulation level. This is due to the authors' choice to profile protein, rather than mRNA, effects, and their omission of any DNA binding (chromatin profiling) experiments that could define direct E2F1/Dp or E2F2/Dp targets.

      We appreciate the reviewer’s comment. Based on previously published chromatin profiling data for E2F/Dp and Rbf in thoracic muscles (Zappia et al 2019, Cell Reports 26, 702–719), we discovered that both Dp and Rbf are enriched upstream of the transcription start site of both cell cycle genes and metabolic genes (Figure 5 in Zappia et al 2019, Cell Reports 26, 702–719). Thus, our data are consistent with the idea that E2F/Rbf binds the canonical target genes in addition to a new set of target genes encoding proteins involved in carbohydrate metabolism. We think that E2F takes on a new role rather than being re-targeted away from cell cycle genes. We agree that further mechanistic insight would be relevant to explore.

      Reviewer #2:

      The study sets out to answer which tissue-specific mechanisms in fat and muscle, regulated by the transcription factor E2F, are central to organismal function. The study also tries to address which of these roles of E2F are cell intrinsic and which are systemic. The authors look into the mechanisms of E2F/Dp through knockdown experiments in both the fat body* (see weakness) and muscle of Drosophila. They identify that muscle E2F contributes to fat body development but fat body KD of E2F does not affect muscle function. To then dissect the cause of adult lethality in flies, the authors performed proteomic and metabolomic profiling of fat and muscle to gain insights. While in the muscle the cause seems to be an as-yet-undetermined systemic change, the authors do conclude that adult lethality in fat body-specific Dp knockdown is the result of decreased trehalose in the hemolymph and defects in lipid production in these flies. The authors then test this model by presenting fat body-specific Dp knockdown flies with a high sugar diet and showing adult survival is rescued. This study concurs with and adds to the emerging idea from human studies that E2F/Dp is critical for more than just its role in the cell cycle and functions as a metabolic regulator in a tissue-specific manner. This study will be of interest to scientists studying inter-organ communication between muscle and fat.

      The conclusions of this paper are partially supported by data. The weaknesses can be mitigated by specific experiments and will likely bolster conclusions.

      1) This study relies heavily on the tissue specificity of the Gal4 drivers to study fat-muscle communication by E2F. The authors have convincingly confirmed that the cg-Gal4 driver is never turned on in the muscle and vice versa for Dmef2-Gal4. However, the cg-Gal4 driver itself is capable of turning on expression in the fat body cells and is also highly expressed in hemocytes (macrophage-like cells in flies). In fact, cg-Gal4 is used in numerous studies (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4125153/) to study the hemocytes and fat in combination. Hence, it is difficult to assess what contribution hemocytes provide to the conclusions for fat-muscle communication. To mitigate this, the authors could test Lpp-Gal4>Dp-RNAi (Lpp-Gal4 drives expression exclusively in the fat body at all stages) or use ppl-Gal4 (which is expressed in the fat, gut, and brain but is a weaker driver than cg). It would be good if they could replicate their findings in a subset of the experiments performed in Figures 1-4.

      This is indeed an important point. We apologize for previously not including this information. Reference is now on page 7.

      Another fat body driver that, unlike cg-GAL4, is expressed specifically in the fat body and not in hemocytes was tested in previous work (Guarner et al Dev Cell 2017). The driver FB-GAL4 (FBti0013267), and more specifically the stock yw; P{w[+mW.hs]=GawB}FB P{w[+m*] UAS-GFP 1010T2}#2; P{w[+mC]=tubP-GAL80[ts]}2, was used to induce the loss of Dp in the fat body in a time-controlled manner using tubGAL80ts. The phenotype induced in the larval fat body of FB>DpRNAi,gal80TS recapitulates findings related to the DNA damage response characterized in both Dp -/- and CG>Dp-RNAi animals (see Figure 5A-B, Guarner et al Dev Cell 2017). The activation of the DNA damage response upon loss of Dp was thoroughly studied in Guarner et al Dev Cell 2017. The appearance of binucleates in cg>DpRNAi is presumably the result of abnormal transcription of multiple G2/M regulators in cells that have been able to repair DNA damage and to resume S-phase (see discussion in Guarner et al Dev Cell 2017). More details regarding the fully characterized DNA damage response phenotype were added on pages 6 and 7 of the manuscript.

      Additionally, r4-GAL4 was also used to drive Dp-RNAi specifically to fat body. But since this driver is weaker than cg-GAL4, the occurrence of binucleated cells in r4>DpRNAi fat body was mild (see Figure R1 below).

      As suggested by the reviewer, Lpp-GAL4 was used to knock down the expression of Dp specifically in the fat body. All Lpp>DpRNAi animals died at the pupal stage. New viability data were included in Figure 1-figure supplement 1. In addition, larval fat bodies were dissected and stained with phalloidin and DAPI to visualize overall tissue structure. Binucleated cells were present in Lpp>DpRNAi fat body but not in the control Lpp>mCherry-RNAi (Figure 2-figure supplement 1B). These results were added to the manuscript on page 7.

      Furthermore, Dp expression was knocked down using a hemocyte-specific driver, hml-GAL4. No defects were detected in animal viability (data not shown).

      Thus, overall, we conclude that hemocytes do not seem to contribute to the formation of binucleated-cells in cg>Dp-RNAi fat body.

      Finally, since no major phenotype was found in muscles when E2F was inactivated in the fat body (please see point 3 for more details), we consider that inactivation of E2F in both the fat body and hemocytes did not alter overall muscle morphology. Thus, exploring the contribution of cg>Dp-RNAi hemocytes to muscles would not be very informative.

      2) The authors perform a proteomics analysis on both fat body and muscle of control or the respective tissue specific knockdown of Dp. However, the authors denote technical limitations to procuring enough third instar larval muscle to perform proteomics and instead use thoracic muscles of the pharate pupa. While the technical limitations are understandable, this does raise a concern of comparing fat body and muscle proteomics at two distinct stages of fly development and likely contributes to differences seen in the proteomics data. This may impact the conclusions of this paper. It would be important to note this caveat of not being able to compare across these different developmental stage datasets.

      We appreciate the suggestion of the reviewer. This caveat was noted and included in the manuscript. Please see page 11.

      3) The authors show that E2F signaling in the muscle controls whether binucleate fat body nuclei appear. In other words, is the endocycling process in the fat body affected if muscle E2F function is impaired? However, they conclude that impairing E2F function in fat does not affect muscle. While muscle organization seems fine, it does appear that nuclear levels of Dp are higher in muscles during fat-specific knockdown of Dp (Figure 1A, column 2 row 3, for cg>Dp-RNAi). Also there is an increase in muscle area when fat body E2F function is impaired. This change is also reflected in the quantification of DLM area in Figure 1B. But the authors don't say much about elevated Dp levels in muscle or increased DLM area of fat-specific Dp KD. Would the authors not expect Dp staining in muscle to be normal and similar to the mCherry-RNAi control in cg>DpRNAi? The authors could consider discussing and contextualizing this as opposed to making a broad statement regarding muscle function all being normal. Perhaps muscle function may be different, perhaps better, when E2F function in fat is impaired.

      The overall muscle structure was examined in animals staged at third instar larva (Figure 1A-B). No defects were detected in muscle size between cg>Dp-RNAi animals and controls. In addition, the expression of Dp was not altered in cg>Dp-RNAi muscles compared to control muscles. The best developmental stage to compare the muscle structure between Mef2>Dp-RNAi and cg>Dp-RNAi animals is actually third instar larva, prior to their lethality at pupa stage (Figure 1- figure supplement 1).

      Based on the reviewer’s comment, we set up a new experiment to further analyze the phenotype at pharate stage. However, when we repeated this experiment, we did not recover cg>Dp-RNAi pharate, even though 2/3 of Mef2>Dp-RNAi animals survived up to late pupal stage. We think that this is likely due to the change in fly food provider. Since most cg>DpRNAi animals die at early pupal stage (>75% animals, Figure 1-figure supplement 1), pharate is not a good representative developmental stage to examine phenotypes. Therefore, panels were removed.

      Text was revised accordingly (page 6).

      4) In lines 376-380, the authors make the argument that muscle-specific knockdown can impair the ability of the fat body to regulate storage, but evidence for this is not robust. While the authors refer to a decrease in lipid droplet size in figure S4E this is not a statistically significant decrease. In order to make this case, the authors would want to consider performing a triglyceride (TAG) assay, which is routinely performed in flies.

      Our conclusions were revised and adjusted to match our data. The paragraph was reworded to highlight the outcome of the triglyceride assay, which was previously done. We realized the reference to Figure 6H that shows the triglyceride (TAG) assay was missing on page 17. Please see page 17 and page 21 of discussion.

    1. Author response:

      Reviewer #1 (Public Review):

      In this study, Girardello et al. use proteomics to reveal the membrane tension-sensitive caveolin-1 interactome in migrating cells. The authors use EM and surface rendering to demonstrate that caveolae formed at the rear of migrating cells are complex membrane-linked multilobed structures, and they devise a robust strategy to identify caveolin-1-associated proteins using APEX2-mediated proximity biotinylation. This important dataset is further validated using proximity ligation assays to confirm key interactions, and the authors follow up with an interrogation of a surprising relationship between caveolae and Rho GTPase signalling, where caveolin-1 recruits ROCK1 under high membrane tension conditions, and ROCK1 activity is required to reform caveolae upon reversion to isotonic solution. However, caveolin-1 recruits the RhoA inactivator ARHGAP29 when membrane tension is low, and ARHGAP29 overexpression leads to disassembly of caveolae and reduced cell motility. This study builds on previous findings linking caveolae to positive feedback regulation of RhoA signalling, and provides further evidence that caveolae serve to drive rear retraction in migration but also possess an intrinsic brake to limit RhoA activation, leading the authors to suggest that cycles of caveolae assembly and disassembly could thereby be central to establishing a stable cell rear for persistent cell migration.

      A major strength of the manuscript is the robust proteomic dataset. The experimental setup is well defined and mostly well controlled, and there is good internal validation in the high abundance of core caveolar proteins under low membrane tension (isotonic) conditions and their absence under high membrane tension (brief hypo-osmotic shock) conditions, correlating very well with previous findings. The data could, however, be better presented to show where statistically robust changes occur, and the supplementary information should include a table showing abundances. It's very good to see a link to PRIDE, providing a useful resource for the community.

      We thank the reviewer for the positive feedback. We have included the outputs from the search engine in Supplementary File 1.

      The authors detail several known interactions and their mechanosensitivity, but also report new interactors of caveolin-1. Several mechanosensitive interactions of caveolin-1 take place at the cell rear, but others are more diffuse across the cell judging from the PLA data (e.g. FLN1, CTTN, HSPB1; Figure 4A-F and Figure 4 supplement 1). It is interesting to speculate that those at the cell rear are involved in caveolae, whilst others are linked specifically to caveolin-1 (e.g. dolines). PLA or localisation analysis with Cavin1/PTRF may be able to resolve this and further specify caveolae versus non-caveolae mechanosensitive interactions.

      We thank the reviewer for this interesting idea. It is true that many if not most proteins we identified to be associated with Cav1 are not restricted to the cell rear. To analyse to what extent the identified proteins interact with Cav1 at the rear we reanalysed our PLA data for some of the antibody combinations we looked at. This new analysis is now shown in Fig 5G. As expected, for Cav1/PTRF and Cav1/EHD2 most PLA dots (70-80%) were found at the rear. This rear bias is also evident from the representative images we show in the Figure panels 5A and 5E. On the contrary, much fewer PLA dots (~40%) were rear-localised for Cav1/CTTN and Cav1/FLNA antibody combinations. This reflects the much broader cellular distribution of these proteins compared to the core caveolae proteins, and might suggest that there are generally few links between caveolae and cortical actin. However, it is also possible that such links/interactions are more difficult to detect using PLA (because of the extended distance between caveolae and the actin cortex, or because of steric constraints).

      The Cav1/ARHGAP29 influence on YAP signalling is interesting, but appear to be quite isolated from the rest of the manuscript. Does overexpression of ARHGAP29 influence YAP signalling and/or caveolar protein expression/Cav1pY14?

      Our data and published work originally prompted us to speculate that there is a potential functional link between Cav1, YAP, and ARHGAP29. In an attempt to address this, we performed several Western blots on cell lysates from cells overexpressing ARHGAP29. We did not see major changes in Cav1 Y14 phosphorylation levels in cells overexpressing ARHGAP29, and YAP and pYAP levels also remained unchanged (not shown). In addition, based on previous literature 1,2, we expected to see an effect on ARHGAP29 mRNA levels and YAP target gene transcripts in Cav1 siRNA-transfected cells. To our surprise, the mRNA levels of three independent YAP target genes and of ARHGAP29 were unchanged in Cav1 siRNA-treated cells (this is now shown in Figure 6—figure supplement 1). Our data therefore suggest that in RPE1 cells, the connection between Cav1 and ARHGAP29 is independent of YAP signalling, and that the increase in ARHGAP29 protein levels observed in Cav1 siRNA cells is due to an unknown post-translational mechanism.

      ARHGAP29 and RhoA/ROCK1 related observations are very interesting and potentially really important. However, the link between ARHGAP29 and caveolae is not well established (other than in proteomic data). PLA or FRET could help establish this.

      We agree that the physical and functional link between caveolae (or Cav1) and ARHGAP29 was not well worked out in the original manuscript. In an attempt to address this, we performed PLA assays in GFP-ARHGAP29-transfected cells (as we did not find a suitable ARHGAP29 antibody that works reliably in IF) using anti-Cav1 and anti-GFP antibodies. The PLA signal we obtained for Cav1 and ARHGAP29 was not significantly different from control PLA experiments. There was very little PLA signal to start with. This is not surprising given that ARHGAP29 localisation is mostly diffuse in the cytoplasm, whilst Cav1 is concentrated at the rear. In addition, in cases where we do see ARHGAP29 localisation at the cell cortex, Cav1 tends to be absent (this is now shown in Figure 6—figure supplement 2E). In other words, with the tools we have available, we see little colocalization between Cav1 and ARHGAP29 at steady state. Altogether, we speculate that ARHGAP29, through its negative effect on RhoA, flattens caveolae at the membrane or interferes with caveolae assembly at these sites.

      This of course prompts the question of why ARHGAP29 was identified in the Cav1 proteome with such specificity and reproducibility in the first place. This can be explained by the way APEX2 labeling works. Proximity biotinylation with APEX2 is extremely sensitive and restricted to a labelling radius of ~20 nm 3. The labeling reaction is conducted on live and intact cells at room temperature for 1 min. Although 1 min appears short, dynamic cellular processes occur on the time scale of seconds and are ongoing during the labelling reaction. It is conceivable that within this 1 min time frame, ARHGAP29 cycles on and off the rear membrane (kiss and run). This would allow ARHGAP29 to be biotinylated by Cav1-APEX2, resulting in its identification by MS. We have included this in the discussion section.

      The relationship between ARHGAP29 and RhoA signalling is not well defined. Is GAP activity important in determining the effect on migration and caveolae formation? What is the effect on RhoA activity? Alternatively, the authors could investigate YAP dependent transcriptional regulation downstream of overexpression.

      We have addressed this point using overexpression and siRNA transfections. We overexpressed ARHGAP29 or ARHGAP29 lacking its GAP domain and performed WB analysis against pMLC (which is a commonly used and reliable readout for RhoA and myosin-II activity). Much to our surprise, overexpression of ARHGAP29 increased (rather than decreased) pMLC levels, partially in a GAP-dependent manner (see Author response image 1). This is puzzling, as ARHGAP29 is expected to reduce RhoA-GTP levels, which in turn is expected to reduce ROCK activity and hence pMLC levels. In addition, and also surprisingly, siRNA-mediated silencing of ARHGAP29 did not significantly change pMLC levels. By contrast, pMLC levels were strongly reduced in Cav1 siRNA treated cells (this is shown in Fig. 6A and 6B in the revised manuscript). These new data underscore the important role of caveolae in the control of myosin-II activity, but do not allow us to draw any firm conclusions about the role of ARHGAP29 at the cell rear.

      Author response image 1.

      Overexpression of ARHGAP29 increases, rather than reduces, pMLC levels in RPE1 cells.

      We are uncertain how to interpret the ARHGAP29 overexpression data presented in Author response image 1 and therefore decided not to include it in the manuscript. One possibility is that inactivation of RhoA below a certain critical threshold causes other mechanisms to compensate. For instance, the activity of alternative MLC kinases such as MLCK could be enhanced under these conditions. Another possibility is that ARHGAP29 controls MLC phosphorylation indirectly. For instance, it has been shown that ARHGAP29 promotes actin destabilization by inactivating LIMK/cofilin signalling 1. In agreement with this, we find that overexpression of ARHGAP29 reduces p-cofilin (serine 3) levels (see Author response image 2). Since cofilin and MLC crosstalk 4, it is possible that the increased pMLC levels are the result of a feedback loop that compensates for the effect of actin depolymerisation. This is now discussed in the discussion section. Whichever the case, we hope the reviewers understand that deeper mechanistic insight into the intricate mechanisms of Rho signalling at the cell rear is beyond the scope of this manuscript.

      Author response image 2.

      Overexpression of ARHGAP29 reduces p-cofilin levels in RPE1 cells.

      Reviewer #2 (Public Review):

      Girardello et al investigated the composition of the molecular machinery of caveolae governing their mechano-regulation in migrating cells. Using live cell imaging and RPE1 cells, the authors provide a spatio-temporal analysis of cavin-3 distribution during cell migration and reveal that caveolae are preferentially localized at the rear of the cell in a stable manner. They further characterize these structures using electron tomography and reveal an organization into clusters connected to the cell surface. By performing a proteomic approach, they address the interactome of caveolin-1 upon mechanical stimulation by exposing RPE1 cells to hypo-osmotic shock (which increases cell membrane tension), with untreated cells as a control condition. The authors identify over 300 proteins, notably proteins related to the actin cytoskeleton and cell adhesion. These results were further validated in cellulo by interrogating protein-protein interactions using proximity ligation assays and hypo-osmotic shock. These experiments confirmed previous data showing that high membrane tension induces caveolae disassembly in a reversible manner. Finally, based on the literature and on the results collected by the proteomic analysis, the authors investigated more deeply the molecular signaling pathway controlling caveolae assembly upon mechanical stimuli. First, they confirm the association of ROCK1 with Caveolin-1 and the implication of its kinase activity in caveolae formation (at the rear of the cell). Then, they show that the RhoGAP ARHGAP29, a factor newly identified by the proteomic analysis, is also implicated in caveolae mechano-regulation, likely through the YAP protein, and that overexpression of ARHGAP29 affects cell motility. Overall, this paper interrogated the role of membrane tension in caveolae located at the rear of the cell and identified a new pathway controlling cell motility.

      Strengths:

      Using a proximity-based proteomic assay, the authors reveal the protein network interacting with caveolae upon mechanical stimuli. This approach is elegant and allowed the identification of a substantial new set of factors involved in the mechano-regulation of caveolin-1, some of which have been verified directly in the cell by PLA. This study provides a compelling set of data on the interactions between caveolae and their cortical network, which were so far ill-characterized.

      We thank the reviewer for this positive feedback.

      Weaknesses:

      The methodology demonstrating an impact of membrane tension is not precise enough to directly assess a role on caveolae at a subcellular scale, that is, between the front and the rear of the cell. First, a better characterization of the "front-rear" cellular model is encouraged.

      We agree with the reviewer that a quantitative analysis of the caveolae front-rear polarity would strengthen our conclusions. To address this, we have analysed the localisation of Cav1 and cavins in detail and in a large pool of cells, both in fixed and live cells. Our quantification clearly shows that Cav1 and cavins are enriched at the cell rear. This is now shown in Figure 1 and Figure 1 - Figure Supplement 1. To demonstrate that Cav1/cavins are truly rear-localised we analysed live migrating cells expressing tagged Cav1 or cavins. This analysis, which was performed on several individual time lapse movies, showed that caveolae rear localisation is remarkably stable (e.g. Figure 1C and 1D). We also present novel data panels and movies showing caveolae dynamics during rear retractions, in dividing cells, and in cells that polarise de novo. This new data is now described in the first paragraph of the results section.

Secondly, the authors frequently present osmotic shock as a "high membrane tension" stimulus. While osmotic shock is widely used in the field, this study is focused only on caveolae localized at the rear of the cell, and it remains unclear how a global mechanical stimulus triggered by an osmotic shock could mimic a local one.

We agree with the reviewer that osmotic shock will cause a global increase in membrane tension and therefore is only of limited value to understand how membrane tension is regulated at the rear, and how caveolae respond to such a local stimulus. It was not our aim nor is it our expertise to address such questions. To answer this, sophisticated optogenetic approaches or localised membrane tension measurements (e.g. through the use of the Flipper-TR probe) are needed. It is beyond the scope of this manuscript to perform such experiments. However, given the strong enrichment of caveolae at the cell rear, we believe it is justified to propose that the changes we observe in the proteome do (mostly) reflect changes in caveolae at the rear. We have now included several quantifications on fixed cells, live cells, and PLA assays to support that caveolae are highly enriched at the rear. In addition, and importantly, a recent preprint by the Roux lab shows that membrane tension gradients indeed exist in many migrating and non-migrating cells 5. Using very similar hypotonic shock assays, the Caswell lab also showed that low membrane tension at the rear is required for caveolae formation 6. We have included a section in the discussion in which we elaborate on how membrane tension is controlled in migrating cells, and how it might regulate caveolae rear localisation.

In the present case, the extent to which this mechanical stress is physiologically relevant as a mimic of the mechanical forces applied at the rear of a migrating cell remains unknown.

This is true. Our study does not address the nature of mechanical forces at the cell rear. This is a complex subject that is technically challenging to address, and therefore beyond the scope of this manuscript.

      Some images are not satisfying to fully support the conclusions of the article.

We agree that some of the images, in particular the ones presented for the PLA assays, do not always show a clear rear localisation of caveolae. We have explained above why this is the case. We hope that our new quantitative measurements, movies and figure panels address the reviewer’s concern.

      At this stage, the lack of an unbiased quantitative analysis of the spatio-temporal analysis of caveolae upon well-defined mechanical stimuli is also needed.

      These are all very good points that were previously addressed beautifully by the Caswell group 6. To address this in part in our RPE1 cell system, we imaged RPE1 cells exposed to the ROCK inhibitor Y27632 (see Author response image 3). The data shows that cell rear retraction is impeded in response to ROCK inhibition, which is in line with several previous reports. Cavin-1 remained mostly associated with the cell rear, although the distribution appeared more diffuse. We believe this data does not add much new insight into how caveolae function at the rear, and hence was not included in the manuscript.

      Author response image 3.

      Effect of ROCK inhibition on cavin1 rear localisation and rear retraction. Cells were imaged one hour after the addition of Y27632.

Cells in the images, in particular Figure 1, are difficult to see. Differences in signal-to-noise ratio across cell areas could introduce a bias. Since caveolae density and localization are inconsistent between figures, more solid illustrations are needed along with quantitative analysis.

      As mentioned above, we have carefully analysed the localisation of caveolae in fixed cells (using Cav1 and cavin1 antibodies as well as Cav1 and cavin fusion proteins) and in live cells transfected with various different caveolae proteins. The analysis clearly demonstrates an enrichment of caveolae at the rear (Figure 1 and Figure 1 – Figure Supplement 1). Our tomography and TEM data supports this as well (Figure 2).

      References:

      1. Qiao Y, Chen J, Lim YB, et al. YAP Regulates Actin Dynamics through ARHGAP29 and Promotes Metastasis. Cell reports. 2017;19(8):1495-1502.

      2. Rausch V, Bostrom JR, Park J, et al. The Hippo Pathway Regulates Caveolae Expression and Mediates Flow Response via Caveolae. Curr Biol. 2019;29(2):242-255 e246.

      3. Hung V, Udeshi ND, Lam SS, et al. Spatially resolved proteomic mapping in living cells with the engineered peroxidase APEX2. Nat Protoc. 2016;11(3):456-475.

      4. Wiggan O, Shaw AE, DeLuca JG, Bamburg JR. ADF/cofilin regulates actomyosin assembly through competitive inhibition of myosin II binding to F-actin. Dev Cell. 2012;22(3):530-543.

5. García-Arcos JM, Sánchez Velázquez J, Guillamat P, et al. Actin dynamics sustains spatial gradients of membrane tension in adherent cells. bioRxiv 2024.07.15.603517. 2024.

      6. Hetmanski JHR, de Belly H, Busnelli I, et al. Membrane Tension Orchestrates Rear Retraction in Matrix-Directed Cell Migration. Dev Cell. 2019;51(4):460-475 e410.

      7. Tsai TY, Collins SR, Chan CK, et al. Efficient Front-Rear Coupling in Neutrophil Chemotaxis by Dynamic Myosin II Localization. Dev Cell. 2019;49(2):189-205 e186.

      8. Mueller J, Szep G, Nemethova M, et al. Load Adaptation of Lamellipodial Actin Networks. Cell. 2017;171(1):188-200 e116.

      9. De Belly H, Yan S, Borja da Rocha H, et al. Cell protrusions and contractions generate long-range membrane tension propagation. Cell. 2023.

      10. Matthaeus C, Sochacki KA, Dickey AM, et al. The molecular organization of differentially curved caveolae indicates bendable structural units at the plasma membrane. Nat Commun. 2022;13(1):7234.

      11. Sinha B, Koster D, Ruez R, et al. Cells respond to mechanical stress by rapid disassembly of caveolae. Cell. 2011;144(3):402-413.

      12. Lieber AD, Schweitzer Y, Kozlov MM, Keren K. Front-to-rear membrane tension gradient in rapidly moving cells. Biophysical journal. 2015;108(7):1599-1603.

      13. Shi Z, Graber ZT, Baumgart T, Stone HA, Cohen AE. Cell Membranes Resist Flow. Cell. 2018;175(7):1769-1779 e1713.

      14. Grande-Garcia A, Echarri A, de Rooij J, et al. Caveolin-1 regulates cell polarization and directional migration through Src kinase and Rho GTPases. The Journal of cell biology. 2007;177(4):683-694.

      15. Grande-Garcia A, del Pozo MA. Caveolin-1 in cell polarization and directional migration. Eur J Cell Biol. 2008;87(8-9):641-647.

      16. Ludwig A, Howard G, Mendoza-Topaz C, et al. Molecular composition and ultrastructure of the caveolar coat complex. PLoS biology. 2013;11(8):e1001640.

    1. Author Response

      Reviewer #1 (Public Review):

      Esmaily and colleagues report two experimental studies in which participants make simple perceptual decisions, either in isolation or in the context of a joint decision-making procedure. In this "social" condition, participants are paired with a partner (in fact, a computer), they learn the decision and confidence of the partner after making their own decision, and the joint decision is made on the basis of the most confident decision between the participant and the partner. The authors found that participants' confidence, response times, pupil dilation, and CPP (i.e. the increase of centro-parietal EEG over time during the decision process) are all affected by the overall confidence of the partner, which was manipulated across blocks in the experiments. They describe a computational model in which decisions result from a competition between two accumulators, and in which the confidence of the partner would be an input to the activity of both accumulators. This model qualitatively produced the variation in confidence and RTs across blocks.

      The major strength of this work is that it puts together many ingredients (behavioral data, pupil and EEG signals, computational analysis) to build a picture of how the confidence of a partner, in the context of joint decision-making, would influence our own decision process and confidence evaluations. Many of these effects are well described already in the literature, but putting them all together remains a challenge.

      We are grateful for this positive assessment.

      However, the construction is fragile in many places: the causal links between the different variables are not firmly established, and it is not clear how pupil and EEG signals mediate the effect of the partner's confidence on the participant's behavior.

      We have modified the language of the manuscript to avoid the implication of a causal link.

      Finally, one limitation of this setting is that the situation being studied is very specific, with a joint decision that is not the result of an agreement between partners, but the automatic selection of the most confident decisions. Thus, whether the phenomena of confidence matching also occurs outside of this very specific setting is unclear.

      We have now acknowledged this caveat in the discussion in line 485 to 504. The final paragraph of the discussion now reads as follows:

“Finally, one limitation of our experimental setup is that the situation being studied is confined to the design choices made by the experimenters. These choices were made in order to operationalize the problem of social interaction within the psychophysics laboratory. For example, the joint decisions were not made through verbal agreement (Bahrami et al., 2010, 2012). Instead, following a number of previous works (Bang et al., 2017, 2020) joint decisions were automatically assigned to the most confident choice. In addition, the partner’s confidence and choice were random variables drawn from a distribution prespecified by the experimenter and therefore, by design, unresponsive to the participant’s behaviour. In this sense, one may argue that the interaction partner’s behaviour was not “natural” since they did not react to the participant's confidence communications (note, however, that the partner’s confidence and accuracy were not entirely random but were matched carefully to the participant’s behaviour prerecorded in the individual session). The extent to which these findings are specific to this experimental setting, and whether the behaviour observed here would transfer to real-life settings, is an open question. For example, it is plausible that participants may show some behavioural reaction to a human partner’s response time variations since there is some evidence indicating that for binary choices such as those studied here, response times also systematically communicate uncertainty to others (Patel et al., 2012). Future studies could examine the degree to which the results might be paradigm-specific.”

      Reviewer #2 (Public Review):

      This study is impressive in several ways and will be of interest to behavioral and brain scientists working on diverse topics.

First, from a theoretical point of view, it very convincingly integrates several lines of research (confidence, interpersonal alignment, psychophysical, and neural evidence accumulation) into a mechanistic computational framework that explains the existing data and makes novel predictions that can inspire further research. It is impressive to read that the corresponding model can account for rather non-intuitive findings, such as the finding that information about your collaborators' high confidence makes people faster but not more accurate in their judgements.

      Second, from a methodical point of view, it combines several sophisticated approaches (psychophysical measurements, psychophysical and neural modelling, electrophysiological and pupil measurements) in a manner that draws on their complementary strengths and that is most compelling (but see further below for some open questions). The appeal of the study in that respect is that it combines these methods in creative ways that allow it to answer its specific questions in a much more convincing manner than if it had used just either of these approaches alone.

Third, from a computational point of view, it proposes several interesting ways by which biologically realistic models of perceptual decision-making can incorporate socially communicated information about others' confidence, to explain and predict the effects of such interpersonal alignment on behavior, confidence, and neural measurements of the processes related to both. It is nice to see that explicit model comparison favors one of these ways (top-down driving inputs to the competing accumulators) over others that may a priori have seemed more plausible but mechanistically less interesting and impactful (e.g., effects on response boundaries, non-decision times, or evidence accumulation).

      Fourth, the manuscript is very well written and provides just the right amount of theoretical introduction and balanced discussion for the reader to understand the approach, the conclusions, and the strengths and limitations.

      Finally, the manuscript takes open science practices seriously and employed preregistration, a replication sample, and data sharing in line with good scientific practice.

      We are grateful to the reviewer for their positive assessment of our work.

      Having said all these positive things, there are some points where the manuscript is unclear or leaves some open questions. While the conclusions of the manuscript are not overstated, there are unclarities in the conceptual interpretation, the descriptions of the methods, some procedures of the methods themselves, and the interpretation of the results that make the reader wonder just how reliable and trustworthy some of the many findings are that together provide this integrated perspective.

      We hope that our modifications and revisions in response to the criticisms listed below will be satisfactory. To avoid redundancies, we have combined each numbered comment with the corresponding recommendation for the Authors.

      First, the study employs rather small sample sizes of N=12 and N=15 and some of the effects are rather weak (e.g., the non-significant CPP effects in study 1). This is somewhat ameliorated by the fact that a replication sample was used, but the robustness of the findings and their replicability in larger samples can be questioned.

      Our study brings together questions from two distinct fields of neuroscience: perceptual decision making and social neuroscience. Each of these two fields have their own traditions and practical common sense. Typically, studies in perceptual decision making employ a small number of extensively trained participants (approximately 6 to 10 individuals). Social neuroscience studies, on the other hand, recruit larger samples (often more than 20 participants) without extensive training protocols. We therefore needed to strike a balance in this trade-off between number of participants and number of data points (e.g. trials) obtained from each participant. Note, for example, that each of our participants underwent around 4000 training trials. Strikingly, our initial study (N=12) yielded robust results that showed the hypothesized effects nearly completely, supporting the adequacy of our power estimate. However, we decided to replicate the findings because, like the reviewer, we believe in the importance of adequate sampling. We increased our sample size to N=15 participants to enhance the reliability of our findings. However, we acknowledge the limitation of generalizing to larger samples, which we have now discussed in our revised manuscript and included a cautionary note regarding further generalizations.

To complement our results and add a measure of their reliability, here we provide the results of a power analysis that we applied to the data from study 1 (i.e. the discovery phase). These results demonstrate that the sample size of study 2 (i.e. replication) was adequate when conditioned on the results from study 1 (see table and graph pasted below). The results showed that N=13 would be an adequate sample size for 80% power for the behavioural and eye-tracking measurements. Power analysis for the EEG measurements indicated that we needed N=17. Combining these power analyses, our sample size of N=15 for Study 2 was therefore reasonably justified.

      We have now added a section to the discussion (Lines 790-805) that communicates these issues as follows:

      “Our study brings together questions from two distinct fields of neuroscience: perceptual decision making and social neuroscience. Each of these two fields have their own traditions and practical common sense. Typically, studies in perceptual decision making employ a small number of extensively trained participants (approximately 6 to 10 individuals). Social neuroscience studies, on the other hand, recruit larger samples (often more than 20 participants) without extensive training protocols. We therefore needed to strike a balance in this trade-off between number of participants and number of data points (e.g. trials) obtained from each participant. Note, for example, that each of our participants underwent around 4000 training trials. Importantly, our initial study (N=12) yielded robust results that showed the hypothesized effects nearly completely, supporting the adequacy of our power estimate. However, we decided to replicate the findings in a new sample with N=15 participants to enhance the reliability of our findings and examine our hypothesis in a stringent discovery-replication design. In Figure 4-figure supplement 5, we provide the results of a power analysis that we applied on the data from study 1 (i.e. the discovery phase). These results demonstrate that the sample size of study 2 (i.e. replication) was adequate when conditioned on the results from study 1.”

We conducted Monte Carlo simulations to determine the sample size required to achieve sufficient statistical power (80%) (Szucs & Ioannidis, 2017). In these simulations, we utilized the data from study 1. Within each sample size (N, x-axis), we randomly selected N participants from our 12 participants in study 1. We employed the with-replacement sampling method. Subsequently, we applied the same GLMM model used in the main text to assess the dependency of EEG signal slopes on social conditions (HCA vs LCA). To obtain an accurate estimate, we repeated the random sampling process 1000 times for each given sample size (N). Consequently, for a given sample size, we performed 1000 statistical tests using these randomly generated datasets. The proportion of statistically significant tests among these 1000 tests represents the statistical power (y-axis). We gradually increased the sample size until achieving an 80% power threshold, as illustrated in the figure. The number indicated by the red circle on the x-axis of this graph represents the designated sample size.
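The resampling procedure described above can be sketched in a few lines. This is an illustrative reconstruction only: the GLMM used in the paper is replaced here by a one-sample t-test on per-subject effects, and the effect sizes are drawn from a hypothetical distribution rather than taken from the actual study 1 data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def estimated_power(per_subject_effects, n, n_resamples=1000, alpha=0.05):
    """Monte Carlo power: resample n subjects with replacement from the
    discovery-phase effects and count the fraction of resamples whose
    test reaches significance."""
    effects = np.asarray(per_subject_effects)
    hits = 0
    for _ in range(n_resamples):
        sample = rng.choice(effects, size=n, replace=True)
        # Stand-in for the GLMM in the paper: test the per-subject
        # HCA-vs-LCA slope differences against zero.
        _, p = stats.ttest_1samp(sample, 0.0)
        if p < alpha:
            hits += 1
    return hits / n_resamples

# Hypothetical per-subject effects from a 12-subject discovery study.
study1_effects = rng.normal(loc=0.8, scale=1.0, size=12)

# Estimated power at a few candidate sample sizes; in the paper N is
# increased until the curve crosses the 80% threshold.
power_curve = {n: estimated_power(study1_effects, n) for n in (6, 12, 24)}
```

In practice one would sweep N over a fine grid and report the smallest N whose estimated power reaches 0.80, exactly as the figure described above does.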

      Second, the manuscript interprets the effects of low-confidence partners as an impact of the partner's communicated "beliefs about uncertainty". However, it appears that the experimental setup also leads to greater outcome uncertainty (because the trial outcome is determined by the joint performance of both partners, which is normally reduced for low-confidence partners) and response uncertainty (because subjects need to consider not only their own confidence but also how that will impact on the low-confidence partner). While none of these other possible effects is conceptually unrelated to communicated confidence and the basic conclusions of the manuscript are therefore valid, the reader would like to understand to what degree the reported effects relate to slightly different types of uncertainty that can be elicited by communicated low confidence in this setup.

      We appreciate the reviewer’s advice to remain cautious about the possible sources of uncertainty in our experiment. In the Discussion (lines 790-801) we have now added the following paragraph.

“We have interpreted our findings to indicate that social information, i.e. the partner’s confidence, impacts the participants’ beliefs about uncertainty. It is important to underscore here that, similar to real life, there are other sources of uncertainty in our experimental setup that could affect the participants' beliefs. For example, under joint conditions, the group choice is determined through the comparison of the choices and confidences of the partners. As a result, the participant has a more complex task of matching their response not only with their perceptual experience but also coordinating it with the partner to achieve the best possible outcome. For the same reason, there is greater outcome uncertainty under joint vs individual conditions. Of course, these other sources of uncertainty are conceptually related to communicated confidence, but our experimental design aimed to remove them, as much as possible, by comparing the impact of social information under high vs low confidence of the partner.”

      In addition to the above, we would like to clarify one point here with specific respect to the comment. Note that the computer-generated partner’s accuracy was identical under high and low confidence. In addition, our behavioral findings did not show any difference in accuracy under HCA and LCA conditions. As a consequence, the argument that “the trial outcome is determined by the joint performance of both partners, which is normally reduced for low-confidence partners)” is not valid because the low-confidence partner’s performance is identical to that of the high-confidence partner. It is possible, of course, that we have misunderstood the reviewer’s point here and we would be happy to discuss this further if necessary.

      Third, the methods used for measurement, signal processing, and statistical inference in the pupil analysis are questionable. For a start, the methods do not give enough details as to how the stimuli were calibrated in terms of luminance etc so that the pupil signals are interpretable.

      Here we provide in Author response image 1 the calibration plot for our eye tracking setup, describing the relationship between pupil size and display luminance. Luminance of the random dot motion stimuli (ie white dots on black background) was Cd/m2 and, importantly, identical across the two critical social conditions. We hope that this additional detail satisfies the reviewer’s concern. For the purpose of brevity, we have decided against adding this part to the manuscript and supplementary material.

      Author response image 1.

      Calibration plot for the experimental setup. Average pupil size (arbitrary units from eyelink device) is plotted against display luminance. The plot is obtained by presenting the participant with uniform full screen displays with 10 different luminance levels covering the entire range of the monitor RGB values (0 to 255) whose luminance was separately measured with a photometer. Each display lasted 10 seconds. Error bars are standard deviation between sessions.

Moreover, while the authors state that the traces were normalized to a value of 0 at the start of the ITI period, the data displayed in Figure 2 do not show this normalization but different non-zero values. Are these data not normalized, or was a different procedure used? Finally, the authors analyze the pupil signal averaged across a wide temporal ITI interval that may contain stimulus-locked responses (there is not enough information in the manuscript to clearly determine which temporal interval was chosen and averaged across, and how it was ensured that this signal was not contaminated by stimulus effects).

      We have now added the following details to the Methods section in line 1106-1135.

“In both studies, eye movements were recorded by an EyeLink 1000 (SR-Research) device with a sampling rate of 1000 Hz, which was controlled by a dedicated host PC. The device was set in desktop and pupil-corneal reflection mode while data from the left eye were recorded. At the beginning of each block, the system was recalibrated and then validated by a 9-point schema presented on the screen. For one subject, a 3-point schema was used due to repeated calibration difficulty. Having reached a detection error of less than 0.5°, the participants proceeded to the main task. Acquired eye data for pupil size were used for further analysis. Data from one subject in the first study were removed from further analysis due to storage failure.

Pupil data were divided into separate epochs and data from the inter-trial interval (ITI) were selected for analysis. The ITI was defined as the time between the offset of the feedback screen of trial (t) and the stimulus presentation of trial (t+1). Then, blinks and jitters were detected and removed using linear interpolation; values of pupil size before and after each blink were used for this interpolation. Data were also band-pass filtered using a Butterworth filter (second order, [0.01, 6] Hz) [50]. The pupil data were z-scored and then baseline corrected by subtracting the average of the signal in the [-1000, 0] ms interval (before ITI onset). For the statistical analysis (GLMM) in Figure 2, we used the average of the pupil signal in the ITI period; therefore, no pupil value is contaminated by the upcoming stimulus. Importantly, trials with ITI > 3 s were excluded from analysis (365 out of 8800 for study 1 and 128 out of 6000 for study 2; also see table S7 and Selection criteria for data analysis in Supplementary Materials)”
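As a rough sketch, the preprocessing steps quoted above (blink interpolation, second-order Butterworth filtering at 0.01–6 Hz, z-scoring, baseline subtraction) might look as follows in Python. The blink criterion (pupil samples reported as zero or below) and the use of zero-phase `sosfiltfilt` filtering are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 1000  # Hz, EyeLink sampling rate

def preprocess_pupil(trace, baseline=(0, 1000)):
    """Clean one pupil trace: interpolate blinks, band-pass filter,
    z-score, then subtract the mean of the baseline window (in samples)."""
    x = np.array(trace, dtype=float)  # copy so the caller's data is untouched

    # Blinks: assume the tracker reports pupil size <= 0 during blinks;
    # replace those samples by linear interpolation from valid neighbours.
    bad = x <= 0
    if bad.any():
        good = np.flatnonzero(~bad)
        x[bad] = np.interp(np.flatnonzero(bad), good, x[good])

    # Second-order Butterworth band-pass, 0.01-6 Hz, applied zero-phase.
    sos = butter(2, [0.01, 6], btype="bandpass", fs=FS, output="sos")
    x = sosfiltfilt(sos, x)

    # z-score, then subtract the mean of the baseline window.
    x = (x - x.mean()) / x.std()
    return x - x[baseline[0]:baseline[1]].mean()
```

The per-trial value entering the GLMM would then simply be the mean of the cleaned trace over the ITI samples.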

      Fourth, while the EEG analysis in general provides interesting data, the link to the well-established CPP signal is not entirely convincing. CPP signals are usually identified and analyzed in a response-locked fashion, to distinguish them from other types of stimulus-locked potentials. One crucial feature here is that the CPPs in the different conditions reach a similar level just prior to the response. This is either not the case here, or the data are not shown in a format that allows the reader to identify these crucial features of the CPP. It is therefore questionable whether the reported signals indeed fully correspond to this decision-linked signal.

      Fifth, the authors present some effective connectivity analysis to identify the neural mechanisms underlying the possible top-down drive due to communicated confidence. It is completely unclear how they select the "prefrontal cortex" signals here that are used for the transfer entropy estimations, and it is in fact even unclear whether the signals they employ originate in this brain structure. In the absence of clear methodical details about how these signals were identified and why the authors think they originate in the prefrontal cortex, these conclusions cannot be maintained based on the data that are presented.

Sixth, the description of the model fitting procedures and the parameter settings is missing, leaving it unclear for the reader how the models were "calibrated" to the data. Moreover, for many parameters of the biophysical model, the authors seem to employ fixed parameter values that may have been picked without any stated criteria. This leaves the impression that the authors may even have manually changed parameter values until they found a set of values that produced the desired effects. The model would be even more convincing if the authors could, for every parameter, give the procedures that were used for fitting it to the data, or the exact criteria that were used to fix the parameter to a specific value.

      Seventh, on a related note, the reader wonders about some of the decisions the authors took in the specification of their model. For example, why was it assumed that the parameters of interest in the three competing models could only be modulated by the partner's confidence in a linear fashion? A non-linear modulation appears highly plausible, so extreme values of confidence may have much more pronounced effects. Moreover, why were the confidence computations assumed to be finished at the end of the stimulus presentation, given that for trials with RTs longer than the stimulus presentation, the sensory information almost certainly reverberated in the brain network and continued to be accumulated (in line with the known timing lags in cortical areas relative to objective stimulus onset)? It would help if these model specification choices were better justified and possibly even backed up with robustness checks.

Eighth, the fake interaction partners showed several properties that were highly unnatural (they did not react to the participant's confidence communications, and their response times were random and thus unrelated to confidence and accuracy). This raises the question of how much the findings from this specific experimental setting would transfer to other real-life settings, and whether participants showed any behavioral reactions to the random response time variations as well (since several studies have shown that for binary choices like here, response times also systematically communicate uncertainty to others). Moreover, it is also unclear how the confidence convergence simulated in Figure 3d can conceptually apply to the data, given that the fake subjects did not react to the subject's communicated confidence as in the simulation.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors set out to extend modeling of bispecific engager pharmacology through explicit modelling of the search of T cells for tumour cells, the formation of an immunological synapse and the dissociation of the immunological synapse to enable serial killing. These features have not been included in prior models and their incorporation may improve the predictive value of the model.

      Thank you for the positive feedback.

      The model provides a number of predictions that are of potential interest- that loss of CD19, the target antigen, to 1/20th of its initial expression will lead to escape and that the bone marrow is a site where the tumour cells may have the best opportunity to develop loss variants due to the limited pressure from T cells.

      Thank you for the positive feedback.

      A limitation of the model is that adhesion is only treated as a 2D implementation of the blinatumomab mediated bridge between T cell and B cells- there is no distinct parameter related to the distinct adhesion systems that are critical for immunological synapse formation. For example, CD58 loss from tumours is correlated with escape, but it is not related to the target, CD19. While they begin to consider the immunological synapse, they don't incorporate adhesion as distinct from the engager, which is almost certainly important.

      We agree that adhesion molecules play critical roles in cell-cell interaction. In our model, we assumed these adhesion molecules to be constant (i.e., not differing across cell populations). This assumption allowed us to focus on the BiTE-mediated interactions.

      Revision: To clarify this point, we added a couple of sentences in the manuscript.

      “Adhesion molecules such as CD2-CD58, integrins and selectins, are critical for cell-cell interaction. The model did not consider specific roles played by these adhesion molecules, which were assumed constant across cell populations. The model performed well under this simplifying assumption”.

      In addition, we acknowledged the fact that “synapse formation is a set of precisely orchestrated molecular and cellular interactions. Our model merely investigated the components relevant to BiTE pharmacologic action and can only serve as a simplified representation of this process”.

      While the random search is a good first approximation, T cell behaviour is actually guided by stroma and extracellular matrix, which are non-isotropic. In a lymphoid tissue the stroma is optimised for a search that can be approximated as brownian, or more accurately, a correlated random walk, but in other tissues, particularly tumours, the Brownian search is not a good approximation and other models have been applied. It would be interesting to look at observations from bone marrow or other sites to determine the best approximating for the search related to BiTE targets.

      We agree that tissue stromal factors greatly influence the patterns of T cell searching strategy. Our current model considered Brownian motion a good first approximation for two reasons: 1) we defined tissues as homogeneous compartments to attain unbiased evaluations of factors that influence BiTE-mediated cell-cell interaction, such as T cell infiltration, T:B ratio, and target expression. The stromal factors were not considered in the model, as they require spatially resolved tissue compartments to represent the gradients of stromal factors; 2) our model was primarily calibrated against in vitro data obtained from a “well-mixed” system that does not recapitulate specific considerations of tissue stromal factors. We did not obtain tissue-specific data to support the prediction of T cell movement. This is under current investigation in our lab. Therefore, we are cautious about assuming different patterns of T cell movement in the model when translating into in vivo settings. We acknowledged the limitation of our model in not considering more physiologically relevant T cell searching strategies.

      Revision: In the Discussion, we added a limitation of our model: “We assumed Brownian motion in the model as a good first approximation of T cell movement. However, T cells often take other more physiologically relevant searching strategies closely associated with many stromal factors. Because of these stromal factors, the cell-cell encounter probabilities would differ across anatomical sites.”
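      The Brownian-motion approximation discussed in this exchange can be sketched in a few lines. This is only an illustrative simulation with placeholder parameter values (the diffusion coefficient, time step, and cell counts are assumptions, not the model's calibrated values); it checks the defining property that makes encounter rates uniform in a homogeneous compartment, namely a mean squared displacement growing as 6Dt in 3D:

```python
import numpy as np

# Illustrative sketch of the Brownian ("isotropic random walk") T cell
# search assumption; all parameter values are placeholders, not the
# calibrated values from the model.
rng = np.random.default_rng(0)

D = 10.0        # diffusion coefficient, um^2/min (assumed)
dt = 0.1        # time step, min
n_steps = 100   # total time t = n_steps * dt = 10 min
n_cells = 2000  # independent T cells

# Each step is a Gaussian displacement with variance 2*D*dt per axis.
steps = rng.normal(0.0, np.sqrt(2 * D * dt), size=(n_cells, n_steps, 3))
paths = steps.cumsum(axis=1)

# For Brownian motion in 3D the mean squared displacement is 6*D*t.
t = n_steps * dt
msd = float(np.mean(np.sum(paths[:, -1, :] ** 2, axis=1)))
print(f"MSD at t={t:.0f} min: {msd:.1f} um^2 (theory: {6 * D * t:.1f})")
```

      A guided (non-isotropic) search, as the reviewer suggests, would replace the independent Gaussian steps with correlated or drift-biased ones, changing how encounter probability scales with time and distance.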

      Reviewer #3 (Public Review):

      Liu et al. combined mechanistic modeling with in vitro experiments and data from a clinical trial to develop an in silico model to describe the response of T cells against tumor cells when bi-specific T cell engager (BiTE) antigens, a standard immunotherapeutic drug, are introduced into the system. The model predicted responses of T cell and target cell populations in vitro and in vivo in the presence of BiTEs, where the model linked molecular-level interactions between BiTE molecules, CD3 receptors, and CD19 receptors to the population kinetics of the tumor and the T cells. Furthermore, the model predicted tumor killing kinetics in patients and offered suggestions for optimal dosing strategies in patients undergoing BiTE immunotherapy. The conclusions drawn from this combined approach are interesting and are supported by experiments and modeling reasonably well. However, the conclusions can be tightened further by making some moderate to minor changes in their approach. In addition, there are several limitations in the model which deserve some discussion.

      Strengths

      A major strength of this work is the ability of the model to integrate processes from the molecular scale to the populations of T cells, target cells, and the BiTE antibodies across different organs. A model of this scope has to contain many approximations and thus the model should be validated with experiments. The authors did an excellent job in comparing the basic and the in vitro aspects of their approach with in vitro data, where they compared the numbers of target cells engaged with T cells as the numbers of the BiTE molecules, the ratio of effector and target cells, and the expressions of the CD3 and CD19 receptors were varied. The agreement of the model with the data was excellent in most cases, which led to several mechanistic conclusions. In particular, the study found that target cells with lower CD19 expressions escape the T cell killing.

      The in vivo extension of the model showed reasonable agreement with the kinetics of B cell populations in patients, where the data were obtained from a published clinical trial. The model explained differences in B cell population kinetics between responders and non-responders and found that the differences were driven by the differences in the T cell numbers between the groups. The ability of the model to describe the in vivo kinetics is promising. In addition, the model leads to some interesting conclusions, e.g., the model shows that the bone marrow harbors tumor growth during the BiTE treatment. The authors then used the model to propose an alternate dosage scheme for BiTEs that needed a smaller dose of the drug.

      Thank you for the positive comments.

      Weaknesses

      There are several weaknesses in the development of the model. Multiscale models of this nature contain parameters that need to be estimated by fitting the model with data. Some of these parameters are associated with model approximations or are not measured in experiments. Thus, a common practice is to estimate parameters with some 'training data' and then test model predictions using 'test data'. Though Supplementary file 1 provides values for some of the parameters that appeared to be estimated, it was not clear which datasets were used for training and which for testing. The confidence intervals of the estimated parameters and the sensitivity of the proposed in vivo dosage schemes to parameter variations were unclear.

      We agree with the reviewer on the model validation.

      Revision: To ensure reproducibility, we summarized model assumptions and parameter values/sources in Supplementary file 1. To mimic tumor heterogeneity and the evolution process, we applied stochastic agent-based models, which are challenging to optimize globally against the data. The majority of key parameters were obtained or derived from the literature. Details have been provided in the response to Reviewer 3 - Question 1. In our modeling process, we manually optimized the sensitivity coefficient (β) for the base model using pilot in vitro data, and the sensitivity coefficient (β) for the in vivo model by re-calibrating against the in vitro data at a low BiTE concentration. BiTE concentrations in patients (mostly < 2 ng/ml) are only relevant to the low bound of the concentration range we investigated in vitro (0.65-2000 ng/ml). We have added some clarification/limitation of this approach in the text (details are provided in the following question). We understand the concerns, but the agent-based nature of the model prevents us from performing a global optimization.

      The model appears to show a few unreasonable behaviors and does not agree with experiments in several cases, which could point to missing mechanisms in the model. Here are some examples. The model shows a surprising decrease in the T cell-target cell synapse formation when the affinity of the BiTEs to CD3 was increased; the opposite should have been more intuitive. The authors suggest degradation of CD3 could be a reason for this behavior. However, this probably could be easily tested by removing CD3 degradation in the model. Another example is that the increase in the % of engaged effector cells in the model with increasing CD3 expressions does not agree well with experiments (Fig. 3d); however, a similar fold increase in the % of engaged effector cells in the model agrees better with experiments for increasing CD19 expressions (Fig. 3e). It is unclear how this can be explained given CD3 and CD19 appear to be present in similar copy numbers per cell (~10⁴ molecules/cell), and both receptors bind the BiTE with high affinities (e.g., koff < 10⁻⁴ s⁻¹).

      Thank you for pointing this out. The bidirectional effect of CD3 affinity on IS formation is counterintuitive. In a hypothetical situation when there is no CD3 downregulation, the bidirectional effect disappears (as shown below), consistent with our view that CD3 downregulation accounts for the counterintuitive behavior. We have included the simulation to support our point. From a conceptual standpoint, the inclusion of CD3 degradation means the way to maximize synapse formation is for the BiTE to first bind tumor antigen, after which the tumor-BiTE complex “recruits” a T cell through the CD3 arm.

      We agree that the model did not adequately capture the effect of CD3 expression at the highest BiTE concentration (100 ng/ml), while the effects at other BiTE concentrations were well captured (as shown below, left). The model predicted a much more moderate effect of CD3 expression on IS formation at the highest concentration. This is partly because the model assumed rapid CD3 downregulation upon antibody engagement. We performed a similar simulation as above, with moderate CD3 downregulation (as shown below, right). This increases the effect of CD3 expression at the highest BiTE concentration, consistent with experiments. Interestingly, a rapid CD3 downregulation rate, as we concluded, is required to capture data profiles at all other conditions. Considering that a BiTE concentration of 100 ng/ml is much higher than the therapeutically relevant level in circulation (< 2 ng/ml), we did not investigate the mechanism underlying this inconsistent model prediction, but we acknowledged the fact that the model under-predicted IS formation in Figure 3d. Notably, this discrepancy may rarely appear in our clinical predictions, as the CD3 expression level is low and the blood BiTE concentration is very low (< 2 ng/ml).

      Revision: we have made text adjustment to increase clarity on these points. In addition, we added: “The base model underpredicted the effect of CD3 expression on IS formation at 100 ng/ml BiTE concentration, which is partially because of the rapid CD3 downregulation upon BiTE engagement and assay variation across experimental conditions.”
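      A closely related non-monotonic behavior, the classic bell-shaped dependence of BiTE-mediated bridging on dose, falls out of even a minimal well-mixed ternary-complex equilibrium, because at high BiTE concentrations binary complexes outcompete the bridging CD3-BiTE-CD19 complex. The sketch below is not the study's agent-based model; all affinities and receptor levels are hypothetical:

```python
import numpy as np

# Minimal well-mixed ternary-complex equilibrium for a bispecific engager
# (hypothetical affinities and receptor levels, not the study's fitted
# model). B = free BiTE, R3 = free CD3, R19 = free CD19, all in nM.
K3, K19 = 50.0, 2.0            # arm Kd values, nM (assumed)
R3_tot, R19_tot = 10.0, 10.0   # total receptor concentrations, nM (assumed)

def bridges_at(B_tot, n_iter=500):
    """Damped fixed-point solve of the mass-action conservation laws."""
    B, R3, R19 = B_tot, R3_tot, R19_tot
    for _ in range(n_iter):
        R3_new = R3_tot / (1 + B / K3 + B * R19 / (K3 * K19))
        R19_new = R19_tot / (1 + B / K19 + B * R3 / (K3 * K19))
        B_new = B_tot / (1 + R3 / K3 + R19 / K19 + R3 * R19 / (K3 * K19))
        R3, R19, B = 0.5 * (R3 + R3_new), 0.5 * (R19 + R19_new), 0.5 * (B + B_new)
    return B * R3 * R19 / (K3 * K19)   # CD3-BiTE-CD19 "bridge" concentration

doses = np.logspace(-2, 4, 25)                      # BiTE dose range, nM
bridges = np.array([bridges_at(b) for b in doses])
print(f"bridging peaks near {doses[bridges.argmax()]:.1f} nM, then declines")
```

      In this toy picture, adding CD3 downregulation upon binding would further penalize high CD3 occupancy, qualitatively consistent with the counterintuitive affinity effect discussed above.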

      The model does not include signaling and activation of T cells as they form the immunological synapse (IS) with target cells. The formation of the IS leads to aggregation of different receptors, adhesion molecules, and kinases which modulate signaling and activation. Thus, it is likely that variations in the copy numbers of CD3 and the CD19-BiTE-CD3 complex will lead to variations in the cytotoxic responses and presumably to CD3 degradation as well. Perhaps some of these missing processes are responsible for the disagreements between the model and the data shown in Fig. 3. In addition, the in vivo model does not contain any development of the T cells as they are stimulated by the BiTEs. The differences in development of T cells, such as generation of dysfunctional/exhausted T cells, could lead to the differences in responses to BiTEs in patients. In particular, the in vivo model does not agree with the kinetics of B cells after day 29 in non-responders (Fig. 6d); could the kinetics of T cell development play a role in this?

      We agree that intracellular signaling is critical to T cell activation and cytotoxic effects. IS formation, T cell activation, and cytotoxicity are a cascade of events with highly coordinated molecular and cellular interactions. Compared to T cell activation and cytotoxicity, IS formation occurs relatively early: as shown in our study, IS formation can occur within 2-5 min, while the other events often need hours to be observed. We found that IS formation is primarily driven by two intercellular processes: cell-cell encounter and cell-cell adhesion. Intracellular signaling would be initiated during cell-cell adhesion or at the late stage of IS formation. We think these intracellular events are relevant but may not be the reason why our model did not adequately capture the profiles in Figure 3d at the highest BiTE concentrations. Therefore, we did not include intracellular signaling in the models. Another reason is that we simulated our models at an agent level to mimic the process of tumor evolution, which is computationally demanding; tracking intracellular events for each cell would make it even more challenging computationally.

      T cell activation and exhaustion throughout BiTE treatment are very complicated, time-varying, and impacted by multiple factors such as T cell status, tumor burden, BiTE concentration, immune checkpoints, and the tumor environment. T cell proliferation and death rates are challenging to estimate, as their quantitative relationship with those factors is unknown. Therefore, T cell abundance (expansion) was considered an independent variable in our model. T cell counts are measured in BiTE clinical trials. We included these data in our model to reflect the expanded T cell population. Patients with high T cell expansion are often those with better clinical response. Notably, the T cell decline due to rapid redistribution after administration was excluded from the model. T cell abundance was included in the simulations in Figure 6 but not in the proof-of-concept simulations in Figure 7.

      In Figure 6d, the kinetics of T cell abundance had been included in the simulations for responders and non-responders in the MT103-211 study. Thus, the kinetics of T cell development cannot be used to explain the disagreement between model prediction and observation after day 29 in non-responders. The observed data are actually median values of B-cell kinetics in non-responders (N = 27) with very large inter-subject variation (baseline from 10-10000/μL), which makes them very challenging to capture perfectly with the model. Many non-responders with severe progression dropped out of the treatment at the end of cycle 1, which resulted in a “more potent” apparent efficacy in the 2nd cycle. This might be the main reason for the disagreement.

      Variation in cytotoxic response was not included in our models. Tumor cells were assumed to be eradicated after engagement with effector cells; no killing rate or killing probability was implemented. This assumption reduced model complexity and aligned well with our in vitro and clinical data. Cytotoxic response in vivo is impacted by multiple factors such as the copy number of CD3, cytokine/chemokine release, the tumor microenvironment, and T cell activation/exhaustion. For example, the cytotoxic responses and killing rates mediated by the 1:1 synapse (ET) and other variants (ETE, TET, ETEE, etc.) are likely different as well. Our model did not differentiate the killing rates of these synapse variants, but it has quantified them, providing a framework for us to address these questions in the future. We agree that differentiating the cytotoxic responses under these different scenarios may improve model predictions, and more exploration needs to be done in the future.

      Revision: We added a discussion of the limitations which we believe is informative to future studies.

      “Our models did not include intracellular signaling processes, which are critical for T cell activation and cytotoxicity. However, our data suggest that encounter and adhesion are more relevant to initial IS formation. To make more clinically relevant predictions, the models should consider these intracellular signaling events that drive T cell activation and cytotoxic effects. Of note, we did consider the T cell expansion dynamics in organs as an independent variable during treatment for the simulations in Figure 6. T cell expansion in our model is case-specific and time-varying.”
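      The antigen-low escape mechanism discussed in this response (tumor cells with reduced CD19 evading engagement-driven killing) can be illustrated with a toy agent-level selection loop. All parameters below are hypothetical, and the actual study model additionally includes proliferation, organ compartments, and T cell dynamics:

```python
import random
import statistics

# Toy agent-level selection loop illustrating antigen-low escape
# (hypothetical parameters; the study's model additionally includes
# proliferation, organ compartments and T cell dynamics).
random.seed(1)

n_cells, steps = 20000, 40
kill_scale, ref = 0.25, 1e4   # per-step kill chance at the reference CD19 level

# Heterogeneous tumor population: CD19 copies/cell, log-uniform spread.
cells = [10 ** random.uniform(2.5, 4.0) for _ in range(n_cells)]
start_median = statistics.median(cells)

# Per step, a cell's chance of being engaged (and killed) by a T cell
# scales with its CD19 level, so low expressers are progressively enriched.
for _ in range(steps):
    cells = [c for c in cells if random.random() > kill_scale * c / ref]

end_median = statistics.median(cells)
print(f"median CD19: {start_median:.0f} -> {end_median:.0f} copies/cell")
```

      Even without mutation, pure selection drives the surviving population's median antigen level well below the starting value, which is the qualitative behavior behind the model's escape prediction.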


    1. Author Response

      Reviewer #1 (Public Review):

      This study examines the factors underlying the assembly of MreB, an actin family member involved in mediating longitudinal cell wall synthesis in rod-shaped bacteria. Required for maintaining rod shape and essential for growth in model bacteria, single molecule work indicates that MreB forms treadmilling polymers that guide the synthesis of new peptidoglycan along the longitudinal cell wall. MreB has proven difficult to work with and the field is littered with artifacts. In vitro analysis of MreB assembly dynamics has not fared much better as helpfully detailed in the introduction to this study. In contrast to its distant relative actin, MreB is difficult to purify and requires very specific conditions to polymerize that differ between groups of bacteria. Currently, in vitro analysis of MreB and related proteins has been mostly limited to MreBs from Gram-negative bacteria which have different properties and behaviors from related proteins in Gram-positive organisms.

      Here, Mao and colleagues use a range of techniques to purify MreB from the Gram-positive organism Geobacillus stearothermophilus, identify factors required for its assembly, and analyze the structure of MreB polymers. Notably, they identify two short hydrophobic sequences (located near one another in the 3-D structure) which are required to mediate membrane anchoring.

      With regard to assembly dynamics, the authors find that Geobacillus MreB assembly requires both interactions with membrane lipids and nucleotide binding. Nucleotide hydrolysis is required for interaction with the membrane and interaction with lipids triggers polymerization. These experiments appear to be conducted in a rigorous manner, although the salt concentration of the buffer (500mM KCl) is quite high relative to that used for in vitro analysis of MreBs from other organisms. The authors should elaborate on their decision to use such a high salt buffer, and ideally, provide insight into how it might impact their findings relative to previous work.

      Response 1.1. MreB proteins are notoriously difficult to maintain in a soluble form. Some labs deleted the N-terminal amphipathic or hydrophobic sequences to increase solubility, while other labs used the full-length protein at a high KCl concentration (300 mM) (Harne et al, 2020; Pande et al., 2022; Popp et al, 2010; Szatmari et al, 2020). Early in the project, we tested many conditions and noticed that high KCl helped maintain a slightly better solubility of full-length MreBGs, without the need to delete a part of the protein. In addition, salt concentrations > 100 mM would better mimic the conditions met by the protein in vivo. While 50-100 mM KCl is traditionally used in actin polymerization assays, physiological salt concentrations are around 100-150 mM KCl in invertebrates and vertebrates (Schmidt-Nielsen, 1975), around 50-250 mM in fungal and plant cells (Rodriguez-Navarro, 2000) and 200-300 mM in the budding yeast (Arino et al, 2010). However, cytoplasmic K+ concentration varies greatly (up to 800 mM) depending on the osmolality of the medium in both E. coli (Cayley et al, 1991; Epstein & Schultz, 1965; Rhoads et al, 1976) and B. subtilis, in which the basal intracellular concentration of KCl was estimated to be ~ 350 mM (Eisenstadt, 1972; Whatmore et al, 1990). 500 mM KCl can therefore be considered as physiological as 100 mM KCl for bacterial cells. Since we observed plenty of pairs of protofilaments at 500 mM KCl and this condition helped avoid aggregation, we kept this high concentration as a standard for most of our experiments. Nonetheless, we had also performed TEM polymerization assays at 100 mM, in line with most of the MreB and F-actin in vitro literature, and found no difference in the polymerization (or absence of polymerization) conditions. This was indicated in the initial submission (e.g. M&M section L540 and footnote of Table S2), but since two reviewers bring it up as a main point, it is evident we failed to communicate it clearly, for which we apologize. This has been clarified in the revised version of the manuscript. We have also almost systematically added the 100 mM KCl condition, as per reviewer #2's request and to reconcile our salt conditions with those used in some in vitro analyses of MreBs from other organisms (see also responses to reviewer #2 comments 1A and 1B = Responses 2.1A, 2.1B below). We then decided to refer to the 100 mM KCl concentration as our “standard condition” in the revised version of the manuscript, but we compile and compare the results obtained at 500 mM too, as both concentrations are within the physiological range in Bacillus.

      Additionally, this study, like many others on MreB, makes much of MreB's relationship to actin. This leads to confusion and the use of unhelpful comparisons. For example, MreB filaments are not actin-like (line 58) any more than any polymer is "actin-like." As evidenced by the very beautiful images in this manuscript, MreB forms straight protofilaments that assemble into parallel arrays, not the paired-twisted polymers that are characteristic of F-actin. Generally, I would argue that work on MreB has been hindered by rather than benefitted from its relationship to actin (E.g early FP fusion data interpreted as evidence for an MreB endoskeleton supporting cell shape or depletion experiments implicating MreB in chromosome segregation) and thus such comparisons should be avoided unless absolutely necessary.

      Response 1.2. We completely agree with reviewer #1 regarding unhelpful comparisons of actin and MreB, and that work on MreB has traditionally been hindered by its relationship to eukaryotic actin. MreB is nonetheless a structural homolog of actin, with a close structural fold and common properties (polymerization into pairs of protofilaments, ATPase activity…). It still makes sense to refer to a protein with common features, common ancestry and a large body of prior study, as long as we do not confine our thinking to that conceptual framework. This said, actin and MreB diverged very early in evolution, which may account for differences in their biochemical properties and cellular functions. Current data on MreB filaments confirm that they display both F-actin-like and F-actin-unlike properties. We thank the reviewer for this insightful comment. We have revised the text to remove any inaccurate or unhelpful comparison to actin (in particular the ‘actin-like filaments’ statement, previously used once).

      Reviewer #2 (Public Review):

      The paper "Polymerization cycle of actin homolog MreB from a Gram-positive bacterium" by Mao et al. provides the second biochemical study of a gram-positive MreB and, importantly, the first to examine how gram-positive MreB filaments bind to membranes. They also show the first crystal structure of an MreB from a Gram-positive bacterium - in two nucleotide-bound forms - finally solving structures that have been missing for too long. They also elucidate which residues in Geobacillus MreB are required for membrane association. In addition, the QCM-D approach to monitoring MreB membrane associations is a direct and elegant assay.

      While the above findings are novel and important, this paper also makes a series of conclusions that run counter to multiple in vitro studies of MreBs from different organisms and other polymers with the actin fold. Overall, they propose that Geobacillus MreB contains biochemical properties that are quite different than not only the other MreBs examined so far but also eukaryotic actin and every actin homolog that has been characterized in vitro. As the conclusions proposed here would place the biochemical properties of Geobacillus MreB as the sole exception to all other actin fold polymers, further supporting experiments are needed to bolster these contrasting conclusions and their overall model.

      Response 2.0. We are grateful to reviewer #2 for highlighting the novelty and importance of our results. Most of our conclusions were in line with previous in vitro studies of MreBs (formation of pairs of straight filaments on a lipid layer, both ATP and GTP binding and hydrolysis, distortion of liposomes…), with the exception of the claimed requirement of NTP hydrolysis for membrane binding prior to polymerization, which was based on the absence of pairs of filaments in free solution or in the presence of AMP-PNP in our experimental conditions (and which, we agree, was not sufficient to make such a bold claim; see below). Thanks to the reviewer's comments, we have performed many controls and additional experiments that led us to refine our results and largely reconcile them with the literature. Please see the answer to the global review comments - our conclusions have been revised on the basis of our new data.

      1. (Difference 1) - The predominant concern about the in vitro studies that makes it difficult to evaluate many of their results (much less compare them to other MreBs and actin homologs) is the use of a highly unconventional polymerization buffer containing 500(!) mM KCl. As has been demonstrated with actin and other polymers, the high KCl concentration used here (500 mM) is certain to affect the polymerization equilibria, as increasing salt increases the hydrophobic effect and inhibits salt bridges, and therefore will affect the affinity between monomers and filaments. For example, past work has shown that high salt greatly changes actin polymerization, causing: a decreased critical concentration, increased bundling, and a greatly increased filament stiffness (Kang et al., 2013, 2012). Similarly, with AlfA, increased salt concentrations have been shown to increase the critical concentration, decrease the polymerization kinetics, and inhibit the bundling of AlfA filaments (Polka et al., 2009).

      A more closely related example comes from the previous observation that increasing salt concentrations increasingly slow the polymerization kinetics of B. subtilis MreB (Mayer and Amann, 2009). Lastly, these high salt concentrations might also change the interactions of MreB(Gs) with the membrane by screening charges and/or increasing the hydrophobic effect. Given that 500 mM KCl was used throughout this paper, many (if not all) of the key experiments should be repeated at a more standard salt concentration (~100 mM), similar to those used in most previous in vitro studies of polymers.

      Response 2.1A. As per reviewer #2's request, we have now also performed at 100 mM KCl most of the experiments (TEM, cryo-EM, QCM-D and ATPase assays) initially performed only at 500 mM KCl. The KCl concentration affects both membrane binding and filament stiffness, as anticipated by the reviewer, but the main conclusions are the same. The revised version of the manuscript compiles and compares the results obtained at both high and low [KCl], both concentrations being within the physiological range in Bacillus. Please see point 1 of the response to the global review comments and the first response to reviewer 1 (Response 1.1) for further elaboration.

      Please note that in Mayer & Amann, 2009 (B. subtilis MreB), light scattering in free solution was inversely proportional to the KCl concentration, with the highest light scattering signal at 0 mM KCl (!), a > 2-fold reduction below 30 mM KCl and no scatter at all at 250 mM, suggesting a “salting in” phenomenon (see also the “Other Points to address” answers 1A and 2, below) (Mayer & Amann, 2009). Since no effective polymer formation (e.g. polymers shown by EM) was demonstrated in these experiments, it cannot be excluded that KCl was simply preventing aggregation of B. subtilis MreB in solution, as we observe. For all their other light scattering experiments, the ‘standard polymerization condition’ used by Mayer & Amann was 0.2 mM ATP, 5 mM MgCl2, 1 mM EGTA and 10 mM imidazole pH 7.0, to which MreB (in 5 mM Tris pH 8.0) was added. No KCl was present in their ‘standard’ polymerization conditions.
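      The critical-concentration behavior invoked throughout this exchange can be sketched with a minimal elongation-only kinetic model. The rate constants below are illustrative placeholders, not measured MreB values; the point is simply that above the critical concentration Cc = k_off/k_on, free monomer relaxes to Cc regardless of total protein, which is the quantity salt is expected to shift:

```python
# Minimal elongation-only polymerization model (illustrative rate
# constants, not measured MreB values). Above the critical concentration
# Cc = k_off / k_on, free monomer relaxes to Cc whatever the total.
k_on = 10.0    # subunit addition rate per end, uM^-1 s^-1 (assumed)
k_off = 5.0    # subunit loss rate per end, s^-1 (assumed) -> Cc = 0.5 uM
ends = 1.0     # fixed number of filament ends (arbitrary units)
dt, t_end = 1e-3, 50.0

def free_monomer(total_uM):
    """Integrate the net end flux to steady state; return free monomer."""
    polymer = 0.0
    for _ in range(int(t_end / dt)):
        m = total_uM - polymer
        flux = ends * (k_on * m - k_off) * dt
        polymer = max(polymer + flux, 0.0)   # no negative polymer mass
    return total_uM - polymer

for total in (0.2, 1.0, 5.0):
    print(f"total {total} uM -> free monomer {free_monomer(total):.2f} uM")
```

      In this picture, a salt-dependent change in k_on or k_off (e.g. via screened salt bridges or an enhanced hydrophobic effect) moves Cc, which is why comparing polymerization at 100 mM vs 500 mM KCl is informative.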

      This would test if the many divergent properties of MreB(Gs) reported here arise from some difference in MreB(Gs) relative to other MreBs (and actin homologs), or if they arise from the 400 mM difference in salt concentration between the studies. Critically, it would also allow direct comparisons to be made relative to previous studies of MreB (and other actin homologs) that used much lower salt, thereby allowing them to definitively demonstrate whether MreB(Gs) is indeed an outlier relative to other MreB and actin homologs. I would suggest using 100 mM KCl, as historically, all polymerization assays of actin and numerous actin homologs have used 50-100 mM KCl: 50 mM KCl (for actin in F buffer) or 100 mM KCl for multiple prokaryotic actin homologs and MreB (Deng et al., 2016; Ent et al., 2014; Esue et al., 2006, 2005; Garner et al., 2004; Polka et al., 2009; Rivera et al., 2011; Salje et al., 2011). Likewise, similar salt concentrations are standard for tubulin (80 mM K-Pipes) and FtsZ (100 mM KCl or 100 mM KAc in HMK100 buffer).

      Response 2.1B. We appreciate the reviewer's feedback on this point. Please note that, although actin polymerization assays are historically performed at 50-100 mM KCl and 100 mM KCl was thus used for other bacterial actin homologs (MamK, ParM and AlfA), MreB polymerization assays have previously been reported at 300 mM KCl (Harne et al., 2020; Pande et al., 2022; Popp et al., 2010; Szatmari et al., 2020), which is closer to the physiological salt concentration in bacterial cells (see Response 1.1), but also in the absence of any KCl (see above). As a matter of fact, we originally wanted to use a "standard polymerization condition" based on the MreB literature, before realizing there was none: only half of the studies used KCl (the other half used NaCl, or no monovalent salt at all) and, among these, KCl concentrations varied (out of 8 publications, 2 used 20 mM KCl, 2 used 50 mM KCl and 4 used 300 mM KCl).

      2. (Difference 2) - One of the most important differences claimed in this paper is that MreB(Gs) filaments are straight, a result that runs counter to the curved T. maritima and C. crescentus filaments detailed by the Löwe group (Ent et al., 2014; Salje et al., 2011). Importantly, this difference could also arise from the difference in salt concentrations used in each study (500 mM here vs. 100 mM in the Löwe studies), and thus one cannot currently draw any direct comparisons between the two studies.

      One example of how high salt could be causing differences in filament geometry: high salts are known to greatly increase the bending stiffness of actin filaments, making them more rigid (Kang et al., 2013). Likewise, increasing salt is known to change the rigidity of membranes. As the ability of filaments to A) bend the membrane or B) deform to the membrane depends on the stiffness of filaments relative to the stiffness of the membrane, the observed difference in the "straight vs. curved" conformation of MreB filaments might simply arise from different salt concentrations. Thus, in order to draw direct comparisons between their findings and those of other MreB orthologs (as done here), the studies of MreB(Gs) conformations on lipids should be repeated in the same buffer conditions as used in the Löwe papers, then allowing them to be directly compared.

      Response 2.2. We fully agreed with reviewer #2 that the salts could be affecting the assay and performed cryo-EM experiments also in the presence of 100 mM KCl, as requested. The results unambiguously showed countless curved liposomes at the contact areas with MreB (Fig. 2F-G and Fig. 2-S5), very similar to what was reported for Thermotoga and Caulobacter MreBs by the Löwe group. Our results therefore confirm the previous findings that MreBs can bend lipids, and suggest that, indeed, high salt may increase filament stiffness, as has been shown for actin filaments. We are very grateful to reviewer #2 for this suggestion and for drawing our attention to the work of Kang et al., 2013. The different bending observed when varying the salt concentration raises relevant questions regarding the in vivo behavior of MreB, since cellular KCl concentrations were shown to vary greatly depending on the medium composition. The manuscript has been updated accordingly in the Results (from L243) and Discussion sections (L585-595).

      3. (Difference 3) - The next important difference between MreB(Gs) and other MreBs is the claim that MreB polymers do not form in the absence of membranes.

      A) This is surprising relative to other MreBs, as MreBs from 1) T. maritima (multiple studies), E. coli (Nurse and Marians, 2013), and C. crescentus (Ent et al., 2014) have been shown to form polymers in solution (without lipids) by electron microscopy, light scattering, and time-resolved multi-angle light scattering. Notably, the Esue work observed a first phase of polymer formation and a subsequent phase of polymer bundling of MreB in solution (Esue et al., 2006). 2) Similarly, (Mayer and Amann, 2009) demonstrated that B. subtilis MreB forms polymers in the absence of membranes using light scattering.

      Response 2.3A. The literature does convincingly show that Thermotoga MreB forms polymers in solution, without lipids (note that for Caulobacter MreB, filaments were only reported in the presence of lipids; van den Ent et al., 2014). The assemblies reported in solution are bundles or sheets (including at the earliest time points in the time-resolved EM experiments reported by Esue et al., 2006 mentioned by the reviewer: '2 minutes after adding ATP, EM revealed that MreB formed short filamentous bundles'). However, and as discussed above (Response 2.1A), the light scattering experiments in Mayer & Amann, 2009 do not conclusively demonstrate the presence of polymers of B. subtilis MreB in solution (Mayer & Amann, 2009). We performed many light scattering experiments on B. subtilis MreB in solution in the past (before finding out that filaments only form in the presence of lipids), and got similar scattering curves (see two examples of DLS experiments in Author response image 1) in conditions in which no polymers could ever be observed by EM while plenty of aggregates were present.

      Author response image 1.

      We did not consider these results publishable in the absence of true polymers observed by TEM. As pointed out in the interesting study from Nurse et al. on E. coli MreB (Nurse & Marians, 2013), one cannot rely on light scattering alone because non-specific aggregates would show patterns similar to those of polymers. Over the last two decades, about 15 publications showed polymers of MreB from several Gram-negative species, while none (despite the efforts of many) showed a single convincing MreB polymer from a Gram-positive bacterium by EM. A simple hypothesis is that a critical parameter was missing, and we present convincing evidence that lipids are critical for Geobacillus MreB to form pairs of filaments in the conditions tested. However, in solution too we do occasionally see pairs of filaments (Fig. 2-S2), as well as sheet-like structures among aggregates when the concentration of MreB is increased (Fig. 2-S2 and Fig. 3-S2). Thus, we agree with the reviewer that it cannot be claimed that Geobacillus MreB is unable to polymerize in the absence of lipids, but rather that lipids strongly stimulate its polymerization, depending on the conditions.

      B) The results shown in figure 5A also go against this conclusion, as there is only a 2-fold increase in the phosphate release from MreB(Gs) in the presence of membranes relative to the absence of membranes. Thus, if their model is correct, and MreB(Gs) polymers form only on membranes, this would require the unpolymerized MreB monomers to hydrolyze ATP at 1/2 the rate of MreB in filaments. This high relative rate of hydrolysis of monomers compared to filaments is unprecedented. For all polymers examined so far, the rate of monomer hydrolysis is several orders of magnitude less than that of the filament. For example, actin monomers are known to hydrolyze ATP 430,000X slower than the monomers inside filaments (Blanchoin and Pollard, 2002; Rould et al., 2006).

      Response 2.3B. We agree with the reviewer. We have now found conditions in which sheets of MreB form in solution (at high MreB concentration) in the presence of ADP and AMP-PNP. However, we have also added several controls that exclude efficient formation of polymers in solution in the presence of ATP at low concentrations of MreBGs (≤ 1.5 µM), the condition used for the malachite green assays. At these MreB concentrations, pairs of filaments are observed in the presence of lipids, but only very infrequently in solution, and sheets are not observed in solution either (Fig. 2-S2A, B). Yet, albeit puzzling, in these conditions Pi release is reproducibly observed in solution, reduced only ~ 2- to 3-fold relative to Pi release in the presence of lipids (Fig. 5A and Fig. 5-S1). A reinforcing observation comes from the ATPase assay performed at 100 mM KCl (Fig. 5A). In this condition, MreB binding to lipids is increased relative to 500 mM KCl (Fig. 4-S4C), and the stimulation of the ATPase activity by the presence of lipids is also stronger than at 500 mM (Fig. 5-S1A). Further work is needed to characterize in detail the ATPase activity of MreB proteins, for which data in the literature are very scarce. We cannot exclude that MreB could nucleate in solution or form very unstable filaments that cannot be seen in our EM assay but consume ATP in the process. At the moment, the significance of the Pi released in solution is unknown and will require further investigation.

      C) Thus, there is a strong possibility that MreB(Gs) polymers are indeed forming in solution in addition to those on the membrane, and these "solution polymers" may not be captured by their electron microscopy assay. For example, high salt could be interfering with the absorption of filaments to glow discharged grids lacking lipids.

      Response 2.3C. We appreciate the reviewer's insight on this critical point. Polymers presented in the original Fig. 2A were obtained at 500 mM KCl, but we had tested the polymerization of MreB at 100 mM KCl as well, without noticing differences. We have nonetheless redone this quantitatively and used these data for the revised Fig. 2A, as we now use 100 mM KCl as our standard polymerization condition throughout the revised manuscript. We also followed the other suggestion of the reviewer and tested glow-discharged (a more classic preparation for soluble proteins) versus non-glow-discharged EM grids, as well as a higher concentration of MreB. Grids are generally glow-discharged to make them hydrophilic in order to adsorb soluble proteins, but the properties of MreB (soluble but obviously presenting hydrophobic domains) made it difficult to predict which support putative soluble polymers would preferentially interact with. Septins, for example, bind much better to hydrophobic grids despite their soluble properties (I. Adriaans, personal communication). Virtually no double filaments were observed in solution at either low or high [MreB]. The fact that in some conditions (high [MreB], other nucleotides) we were able to detect sheet-like structures excludes a technical issue that would prevent the detection of existing but "invisible" polymers here. We have added these new data in Fig. 2-S2.

      As indicated above, the reviewer's comments made us realize that we could not state or imply that MreB cannot polymerize in the absence of lipids. As a matter of fact, we always saw some random filaments in the EM fields, both in solution and in the presence of non-hydrolysable analogs, at very low frequency (Fig. 2A). And we now do see sheets at high MreB concentration (Fig. 2-S2B). We may simply be missing the optimal conditions for polymerization in solution, while our phrasing gave the impression that no polymers could ever form in the absence of ATP or lipids. Therefore, we have:

      1) analyzed all TEM data to present it as semi-quantitative TEM, using our methodology originally implemented for the analysis of the mutants

      2) reworked the text to remove any misleading statements and to indicate that MreBGs was only found to bind to a lipid monolayer as a double protofilament in the presence of ATP/GTP, but that this does not exclude that filaments may also form in other conditions.

      In order to definitively prove that MreB(Gs) does not have polymers in solution, the authors should:

      i) conduct orthogonal experiments to test for polymers in solution. The simplest test of polymerization might be conducting pelleting assays of MreB(Gs) with and without lipids, sweeping through the concentration range as done in 2B and 5a.

      Response 2.3Ci. Following reviewer #2's suggestion, we conducted a series of sedimentation assays in the presence and absence of lipids, at low (100 mM) and high (500 mM) salt, for both the wild-type protein and the three membrane-anchoring mutants (all at 1.3 µM). Sedimentation experiments in salt conditions preventing aggregation in solution (500 mM KCl) fitted with our TEM results: wild-type MreB pelleting increased in the presence of both ATP and lipids (Fig. R1). Sedimentation was further increased at 100 mM KCl, which fits our other results indicating an increased interaction of MreB with the membrane. However, in addition to being poorly reproducible (in our hands), the approach does not discriminate between polymers and aggregates (or monomers bound to liposomes), and since MreB has a strong tendency to aggregate, we believe that the technique is ill-suited to reliably address MreB polymerization and prefer not to include sedimentation data in our manuscript. The recent work from Pande et al. (2022) illustrates this issue well, since no sedimentation of MreB (at 2 µM) was observed in solution in conditions supporting polymerization (at 300 mM KCl): 'the protein does not pellet on its own in the absence of liposome, irrespective of its polymerization state', implying that sedimentation does not allow detection of MreB5 filaments in solution (Pande et al., 2022).

      ii) They also could examine if they see MreB filaments in the absence of lipids at 100mM salt (as was seen in both Löwe studies), as the high salt used here might block the charges on glow discharged grids, making it difficult for the polymer to adhere.

      See above, Response 2.3C

      iii) Likewise, the claim that MreB lacking the amino-terminus and the α2β7 hydrophobic loop "is required for polymerization" is questionable: if deleting these residues blocks membrane binding, the lack of polymers on the membrane on the grid is not unexpected, as filaments that cannot bind the membrane would not be observable. Given these mutants cannot bind the membrane, mutant polymers could still indeed exist in solution, and thus pelleting assays should be used to test whether non-membrane-associated filaments composed of these mutants do or do not exist.

      Response 2.3Ciii. This is a fair point; we thank the reviewer for this remark. We did not mean to state or imply that the hydrophobic loop is required for polymerization per se, but that polymerization into double filaments only efficiently occurs upon membrane binding, which is mediated by the two hydrophobic sequences. We tested all three mutants by sedimentation, as suggested by reviewer #2. In the salt condition that limits aggregation (500 mM KCl), the mutants did not pellet while the wild-type protein did (in the presence of lipids) (Fig. R2 below), in agreement with our EM data. For the mutant bearing the two deletions, we also tested the absence of lipids and observed that the (partial) sedimentation seen at low KCl concentration was ATP- and lipid-dependent (Fig. R3).

      Given our concerns about MreB sedimentation assays (see above, Response 2.3Ci), we prefer not to include these sedimentation data in our manuscript. Instead, we tested by TEM the possible polymerization of the mutants in solution (we only tested them in the presence of lipids in the initial submission). No filaments were detected in solution for any of the mutants (Fig. 4-S3A).

      A final note, the results shown in "Figure 1 - figure supplement 2, panel C" appear to directly refute the claim that MreB(Gs) requires lipids to polymerize. As currently written, it appears they can observe MreB(Gs) filaments on EM grids without lipids. If these experiments were done in the presence of lipids, the figure legend should be updated to indicate that. If these experiments were done in the absence of lipids, the claim that membrane association is required for MreB polymerization should be revised.

      The TEM experiments shown were indeed performed in the presence of lipids; we apologize that this was not clearly stated in the legend. To prevent any confusion, we have nevertheless removed these images from this figure, since the polymerization conditions and lipid requirement are not yet presented when this figure is first referred to in the text. We have instead added a panel with the calibration curve for the size-exclusion profiles, as per the request of reviewer #3. The main point of this figure is to show the tendency of MreBGs to aggregate: analytical size-exclusion chromatography shows a single peak corresponding to monomeric MreBGs (molecular weight ~ 37 kDa) in our purification conditions, but it can readily shift to a peak corresponding to high-MW aggregates, depending on the protein concentration and/or storage conditions.

      4. (Difference 4) - The next difference between this study and previous studies of MreB and actin homologs is the conclusion that MreB(Gs) must hydrolyze ATP in order to polymerize. This conclusion is surprising, given the fact that both T. maritima (Salje et al., 2011; Bean & Amann, 2008) and B. subtilis MreB (Mayer & Amann, 2009) have been shown to polymerize in the presence of ATP as well as AMP-PNP.

      Likewise, ATP hydrolysis has been shown to lag polymerization not only in T. maritima MreB (Esue et al., 2005) but also in eukaryotic actin and all other prokaryotic actin homologs whose polymerization and phosphate release have been directly compared: MamK (Deng et al., 2016), AlfA (Polka et al., 2009), and two divergent ParM homologs (Garner et al., 2004; Rivera et al., 2011). Currently, the only evidence supporting the idea that MreB(Gs) must hydrolyze ATP in order to polymerize comes from 2 observations: 1) using electron microscopy, they cannot see filaments of MreB(Gs) on membranes in the presence of AMP-PNP or ApCpp, and 2) no appreciable signal increase appears when testing AMP-PNP-MreB(Gs) using QCM-D. This evidence is by no means conclusive enough to support this bold claim: while their competition experiment does indicate AMP-PNP binds to MreB(Gs), it is possible that MreB(Gs) cannot polymerize when bound to AMP-PNP.

      For example, it has been shown that different actin homologs respond differently to different non-hydrolysable analogs: some, like actin, can hydrolyze one ATP analog but not another, while others are able to bind many different ATP analogs but only polymerize with some of them.

      Response 2.4. We agree with the reviewer: it is uncertain which analogs bind because they are quite different from ATP and some proteins simply do not tolerate them; analogs can also change the conditions such that filaments stop forming, and thus be (theoretically) misleading. This is why we had tested ApCpp in addition to AMP-PNP as a non-hydrolysable analog (Fig. 3A). As indicated above, our new complementary experiments (Fig. 3-S1B-D) now show that some rare (i.e. infrequent and limited in amount) double filaments are detected in the presence of ApCpp (Fig. 3A), and at high MreB concentration only in the presence of AMP-PNP (Fig. 3-S1B-D), suggesting different critical concentrations in the presence of alternative nucleotides. We have dampened our conclusions in the light of our new data and modified the discussion accordingly.

      Thus, to further verify their "hydrolysis is needed for polymerization" conclusion, they should:

      A. Test if a hydrolysis deficient MreB(Gs) mutant (such as D158A) is also unable to polymerize by EM.

      Response 2.4A. We thank the reviewer for this suggestion. As this conclusion has been revised on the basis of our new data (see previous response), testing putative ATPase-deficient mutants is no longer required here. The study of ATPase mutants is planned for future work (see Response 3.10 to reviewer #3).

      B. They also should conduct an orthogonal assay of MreB polymerization aside from EM (pelleting assays might be the easiest). They should test if polymers of ATP, AMP-PNP, and MreB(Gs)(D158A) form in solution (without membranes) by conducting pelleting assays. These could also be conducted with and without lipids, thereby also addressing the points noted above in point 3.

      Response 2.4B. Please see Response 2.3Ci above.

      C. Polymers may indeed form with ATP-gamma-S, and this non-hydrolysable ATP analog should be tested.

      Response 2.4C. It is quite possible that ATP-γ-S supports polymerization, since it is known to be partially hydrolysable by actin, giving a mild phenotype (Mannherz et al., 1975). This molecule can even be a bona fide substrate for some ATPases (e.g. Peck & Herschlag, 2003). Thus, we decided to exclude this "non-hydrolysable" analog and tested AMP-PNP and ApCpp instead. We know that ATP-γ-S has been, and still is, frequently used, but we preferred to avoid it for the moment for the above-indicated reasons. We chose AMP-PNP and ApCpp because (1) they were shown to be completely non-hydrolysable by actin, in contrast to ATP-γ-S; (2) they are widely used (the most commonly used for structural studies; Lacabanne et al., 2020); (3) AMP-PNP was previously used in several publications on MreB (Bean & Amann, 2008; Nurse & Marians, 2013; Pande et al., 2022; Popp et al., 2010; Salje et al., 2011; van den Ent et al., 2014) and thus allows direct comparison. ApCpp was added to confirm the finding with AMP-PNP. There are many other analogs that we plan to explore in future studies (see next Response, 2.4D).

      D. They could also test how ADP-Pi-bound MreB(Gs) polymerizes in bulk and on membranes, using beryllium fluoride to trap MreB in the ADP-Pi state. This might allow them to further refine their model.

      Response 2.4D. We plan to address the question of the transition state in depth in follow-up work, using a series of analogs and mutants presumably affected in ATPase activity, both predicted and identified in a genetic screen. As indicated above, it is uncertain which analogs bind because they are quite different from ATP, and some may bind but prevent filament formation. Thus, we anticipate that trying just one may not be sufficient: analogs can change the conditions and be (theoretically) misleading, and a thorough analysis is therefore needed to address this question. Since our model and conclusions have been revised on the basis of our new data, we believe that these experiments are beyond the scope of the current manuscript.

      E. Importantly, the Mayer study of B. subtilis MreB found the same results in regard to nucleotides, "In polymerization buffer, MreB produced phosphate in the presence of ATP and GTP, but not in ADP, AMP, GDP or AMP-PNP, or without the readdition of any nucleotide". Thus this paper should be referenced and discussed

      Response 2.4E. We agree that Pi release was detected previously. We have added the reference (L121).

      5. (Difference 5) - The introduction states (lines 128-130): "However, the need for nucleotide binding and hydrolysis in polymerization remains unclear due to conflicting results, in vivo and in vitro, including the ability of MreB to polymerize or not in the presence of ADP or the non-hydrolysable ATP analog AMP-PNP."

      A) While this is a great way to introduce the problem, the statement is a bit vague and should be clarified, detailing the conflicting results with appropriate references. For example, what conflicting in vivo results are they referring to? Regarding "MreB polymerization in AMP-PNP", multiple groups have shown the polymerization of MreB(Tm) in the presence of AMP-PNP, but it is not clear what papers found opposing results.

      Response 2.5A. Thanks for the comment. We originally did not detail these 'conflicting results' in the Introduction because we did so later in the text, with the appropriate references, in particular in the Discussion (former L433-442). We have now removed this from the Discussion section and added a sentence to the Introduction (L123-130) briefly detailing the discrepancies and giving the references.

      • For more clarity, we have removed the “in vivo” (which referred to the distinct results reported for the presumed ATPase mutants by the Garner and Graumann groups) and focus on the in vitro discrepancies only.

      • These discrepancies are the following: while some studies indeed showed polymerization (as assessed by EM) of MreBTm in the presence of AMP-PNP, the studies from Popp et al. and Esue et al. on T. maritima MreB, and from Nurse et al. on E. coli MreB, reported aggregation in the presence of AMP-PNP (Esue et al., 2006; Popp et al., 2010) or ADP (Nurse & Marians, 2013), or no assembly in the presence of ADP (Esue et al., 2006). As for the studies reporting polymerization in the presence of AMP-PNP by light scattering only (Bean & Amann, 2008; Gaballah et al., 2011; Mayer & Amann, 2009; Nurse & Marians, 2013), they could not differentiate between aggregates and true polymers and thus cannot be considered conclusive.

      B) The statement "However, the need for nucleotide binding and hydrolysis in polymerization remains unclear due to conflicting results, in vivo and in vitro, including the ability of MreB to polymerize or not in the presence of ADP or the non-hydrolyzable ATP analog AMP-PNP" is technically incorrect and should be rephrased or further tested.

      i. For all actin (or tubulin) family proteins, it is not that a given filament "cannot polymerize" in the presence of ADP but rather that the ADP-bound form has a higher critical concentration for polymer formation relative to the ATP-bound form. This means that ADP polymers can indeed form, but only when the total protein exceeds the ADP critical concentration. For example, many actin-family proteins do indeed polymerize in ADP: ADP-actin has a 10-fold higher critical concentration than ATP-actin (Pollard, 1984), and the ADP critical concentrations of AlfA and ParM are 5-fold and 50-fold higher (respectively) than those of their ATP-bound forms (Garner et al., 2004; Polka et al., 2009).

      Response 2.5Bi. Absolutely correct. We apologize for the lack of accuracy of our phrasing and have corrected it (L123).
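
      For readers less versed in polymerization kinetics, the reviewer's argument can be summarized by the standard textbook relation for actin-family filaments (general kinetics, not data from this study; the ~10-fold ADP/ATP difference for actin is the Pollard, 1984 value cited above):

```latex
% Net elongation at a filament end, with association rate constant k_on
% and dissociation rate constant k_off:
\[
  \frac{d\ell}{dt} \;=\; k_{\mathrm{on}}\,[\text{monomer}] \;-\; k_{\mathrm{off}},
  \qquad
  C_c \;=\; \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}} .
\]
% Filaments grow only when [monomer] > C_c. The bound nucleotide shifts
% C_c rather than abolishing polymerization: ADP-actin has a ~10-fold
% higher C_c than ATP-actin, so ADP filaments still form, but only above
% that higher threshold.
```

      Operationally, "no filaments with ADP" therefore only means that the concentrations tested were below the ADP-bound critical concentration, which is the phrasing we have now adopted.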

      ii. Likewise, (Mayer and Amann, 2009) have already demonstrated that B. subtilis MreB can polymerize in the presence of ADP, with a slightly higher critical concentration relative to the ATP-bound form.

      Response 2.5Bii. In Mayer and Amann, 2009, the same light scattering signal (interpreted as polymerization) occurred regardless of the nucleotide, and also in the absence of nucleotide (their Fig. 10), and ATP-, ADP- and AMP-PNP-MreB 'displayed nearly indistinguishable critical concentrations'. They concluded that MreB polymerization is nucleotide-independent. Please see below (responses to 'Other Points to address') our extensive answer to reviewer #2's recurring point about Mayer & Amann.

      Thus, to prove that MreB(Gs) polymers do not form in the presence of ADP would require testing a large concentration range of ADP-bound MreB(Gs). They should test if ADP-MreB(Gs) polymerizes at the highest MreB(Gs) concentrations that can be assayed. Even if this fails, it may be that MreB(Gs)-ADP polymerizes at higher concentrations than is possible with their protein preps (13 µM). An even simpler fix would be to simply state that MreB(Gs)-ADP filaments do not form beneath a given MreB(Gs) concentration.

      We agree with the reviewer; our wording overstated our conclusions. Based on our new quantifications (Fig. 3-S1B, D), we have rephrased the Results section and now indicate that pairs of filaments are only occasionally observed in the presence of ADP, in our conditions, across the range of MreB concentrations that could be tested, suggesting a higher critical concentration for MreB-ADP (L310-312). Only at the highest MreB concentration were sheet- and ribbon-like structures observed in the presence of ADP (Fig. 3-S2B).

      Other Points to address:

      1) There are several points in this paper where the work by Mayer and Amann is ignored, not cited, or readily dismissed as "hampered by aggregation" without any explanation or supporting evidence of that fact.

      We have cited the Mayer study where appropriate. However, we cannot cite it as proof of polymerization in any given condition, since their approach does not show that polymers were obtained in their conditions. Again, they based all their conclusions solely on light scattering experiments, which cannot differentiate between polymers and aggregates.

      A) Lines 100-101 - While the irregular 3-D formations formed by MreB in the Dersch 2020 paper could be interpreted as aggregates, stating that the results from specifically the Gaballah and Mayer papers (and not others) were "hampered by aggregation" is currently an arbitrary statement, with no evidence or backing provided. Overall, these lines (and others in the paper) dismiss these two works without giving any evidence to that point. Thus, they should provide evidence for why they believe all these papers show aggregation, or remove these (and other) dismissive statements.

      We apologize if our statements about these reports seemed dismissive or disrespectful; it was definitely not our intention. Light scattering shows an increase in particle size over time, but there is no way to tell whether the scattering is due to organized (polymers) or disorganized (aggregates) assemblies. Thus, it cannot be considered conclusive evidence of polymerization without proof that true filaments are formed by the protein in the conditions tested, as confirmed by EM for example. MreB is known to aggregate easily (see our size-exclusion chromatography profiles and those from Dersch et al., 2020, and note that no chromatography profiles were shown in the Mayer report) and, as indicated above, we had similar light scattering results for MreB for years, while only aggregates could be observed by TEM (see above, Response 2.3A). Several observations also suggest that aggregation instead of polymerization might be at play in the Mayer study, for example 'polymerization' occurring in salt-less buffer but 'inhibited' by as little as 100 mM KCl, which rather points to "salting in" (see below). We did not intend to be dismissive, but it seemed wrong to report their conclusions as conclusive evidence. We thought we had cited these papers where appropriate and then explained why they show no conclusive proof of polymerization, but it is evident that we failed to communicate this clearly. We have reworked the text to remove any misleading and arbitrary statements about our concerns regarding these reports (e.g. L93 & L126).

      One important note - There are 2 points indicating that dismissing the Mayer and Amann work as aggregation is incorrect:

      1) the Mayer work on B. subtilis MreB shows both an ATP and a slightly higher ADP critical concentration. As the emergence of a critical concentration is a steady-state phenomenon arising from the association/dissociation of monomers (and a kinetically limiting nucleation barrier), an emergent critical concentration cannot arise from protein aggregation; critical concentrations only arise from a dynamic equilibrium between monomer and polymer.

      • Critical concentrations for ATP, ADP or AMP-PNP were described in Mayer & Amann (Mayer & Amann, 2009) as "nearly indistinguishable" (see Response 2.5Bii).
      • Protein aggregation depends on the solution (pH and ions), protein concentration and temperature. Above a certain concentration, proteins can become unstable, and thus a critical concentration for aggregation can also emerge.

      2) Furthermore, Mayer observed that increased salt slowed and reduced B. subtilis MreB light scattering, the opposite of what one would expect if their "polymerization signal" were only protein aggregation, as higher salts should increase the rate of aggregation by increasing the hydrophobic effect.

      It is true that at high salt concentration proteins can precipitate, a phenomenon described as "salting out". However, it is also true that salts help to solubilize proteins ("salting in"), and that proteins tend to precipitate in the absence of salt. Considering that the starting point of the Mayer and Amann experiment (Mayer & Amann, 2009) is the absence of salt (where they observed the highest scattering) and that this scattering gradually decreased with increasing KCl (being almost abolished already below 100 mM!), it is plausible that a salting-in phenomenon is at play, due to increased solubility of MreB with salt. In any case, this cannot be taken as proof that polymerization rather than aggregation occurred.

      B) Lines 113-137 - The authors reference many different studies of MreB, including both MreB on membranes and MreB polymerized in solution (which formed bundles). However, they again neglect to mention or reference the findings of Mayer and Amann (Mayer & Amann, 2009), as these were dismissed as "aggregation". As B. subtilis is also a Gram-positive organism, the Mayer results should be discussed.

      We did cite the Mayer and Amann paper but, as explained above, we cannot cite this study as an example of proven polymerization. We avoided polemicizing in the text as much as possible and cited this paper where we could. Again, we have reworked the text to avoid any dismissive statement. Also, we had forgotten to mention this study at L121 as an example of reported ATPase activity; this has now been corrected.

      2) Lines 387-391 state the rates of phosphate release relative to past MreB findings: "These rates of Pi release upon ATP hydrolysis (~1 Pi/MreB in 6 min at 53°C) are comparable to those observed for MreBTm and MreBEc in vitro". While Pi release AND ATP hydrolysis have indeed both been measured for actin, this statement does not apply to MreB and should be corrected: all MreB papers thus far have only measured Pi release alone, not ATP hydrolysis at the same time. Thus, it is inaccurate to state "rates of Pi release upon ATP hydrolysis" for any MreB study, as to accurately determine the rate of Pi release, one must measure: 1) the rate of polymer formation over time, 2) the rate of ATP hydrolysis, and 3) the rate of phosphate release. For MreB, no one has so far even measured the rates of ATP hydrolysis and phosphate release with the same sample.

      We completely agree with the reviewer, we apologize if our formulation was inaccurate. We have corrected the sentence (L479). Thank you for pointing out this mistake.

      3) The interpretation of the interactions between monomers in the MreB crystal should be more carefully stated to avoid confusion. While likely not their intention, the discussions of the crystal packing contacts of MreB can appear to assume that the monomer-monomer contacts they see in crystals represent the contacts within actual protofilaments. One cannot automatically assume the observations of monomer-monomer contacts within a crystal reflect those that arise in the actual filament (or protofilament).

      We agree, we thank the reviewer for his comments. We have revamped the corresponding paragraph.

      A) They state, "the apo form of MreBGs forms less stable protofilaments than its G− homologs." Given that filaments of the apo form of MreBGs or B. subtilis MreB have never been observed in solution, this statement is not accurate: while the contacts in the crystal may change with and without nucleotide, if the protein does not form polymers in solution in the apo state, then there are no "real" apo protofilaments, and any statements about their stability become moot. Thus this statement should be rephrased or appropriately qualified.

      see above.

      B) Another example: while they may see that in the apo MreB crystal the loop of domain IB makes a single salt bridge with IIA and none with IIB, this contrasts with every actin, MreB, and actin homolog studied so far, where domain IB interacts with IIB. This might reflect the real contacts of MreBGs in solution, or it may simply be a crystal-packing artifact. Thus, the authors should be careful in their claims, making it clear to the reader that the contacts in the crystal may not necessarily be present in polymerized filaments.

      Again, we agree with the reviewer, we cannot draw general conclusions about the interactions between monomers from the apo form. We have rephrased this paragraph.

      4) Lines 201-202 - "Polymers were only observed at a concentration of MreB above 0.55 μM (0.02 mg/mL)". Given this concentration dependence of filament formation, which appears the same throughout the paper, the authors could state that 0.55 μM is the critical concentration of MreB on membranes under their buffer conditions. Given the lack of critical concentration measurements in most of the MreB literature, this could be an important point to make in the field.

      Following reviewer #2's suggestion, we have now estimated the critical concentration (Cc = 0.4485 µM) and reported it in the text (L218).
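
For readers unfamiliar with the measurement, the critical concentration is commonly estimated as the x-intercept of a linear fit of steady-state polymer signal versus total protein concentration, since above Cc the polymer mass grows roughly linearly with total protein. The sketch below illustrates this with hypothetical, idealized numbers; it is not the fitting procedure or data used in our revision.

```python
import numpy as np

def critical_concentration(total_conc, polymer_signal):
    """Estimate Cc as the x-intercept of a linear fit of steady-state
    polymer signal vs. total protein concentration, assuming
    signal ~ m * ([total] - Cc) above Cc."""
    m, b = np.polyfit(total_conc, polymer_signal, 1)  # slope, intercept
    return -b / m

# hypothetical, idealized data: polymer appears only above ~0.45 uM
conc = np.array([0.6, 0.9, 1.3, 2.6, 6.5])   # total MreB, uM
signal = 2.0 * (conc - 0.45)                 # arbitrary polymer signal
cc = critical_concentration(conc, signal)    # ~0.45 uM
```

With real data, points below Cc (where the signal is flat at background) would be excluded from the fit.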

      5) Both mg/ml and uM are used in the text and figures to refer to protein concentration. They should stick to one convention, preferably uM, as is standard in the polymer field.

      Sorry for the confusion. We have homogenized MreB concentrations to µM throughout the text and figures.

      6) Lines 77-78 - (van Teeffelen et al., 2011) should be referenced as well in regard to cell wall synthesis driving MreB motion.

      This has been corrected, sorry for omitting this reference.

      7) Line 90 - "Do they exhibit turnover (treadmill) like actin filaments?". This phrase should be modified, as turnover and treadmilling are two very different things. Turnover is the lifetime of monomers in filaments, while treadmilling entails monomer addition at one end and loss at the other. While treadmilling filaments cause turnover, there are also numerous examples of non-treadmilling filaments undergoing turnover: microtubules, intermediate filaments, and ParM. Likewise, an antiparallel filament cannot directionally treadmill, as there is no difference between the two filament ends to confer directional polarity.

      This is absolutely true, we apologize for our mistake. The sentence has been corrected (L82).

      8) Throughout the paper, the term aggregation is occasionally used to describe the polymerization shown in many previous MreB studies, almost all of which very clearly showed "bundled" filaments, entities very distinct from aggregates, as a bundle of polymers cannot form without the filaments first polymerizing on their own. As evidence of this point, polymerization has been shown to precede the bundling of MreBTm (Esue et al., 2005).

      We agree with reviewer #2 about polymers preceding bundles and “sheets”. However, we respectfully disagree that we used the word aggregation “throughout the paper” to describe structures that clearly showed polymers or sheets of filaments. A search (Ctrl-F: “aggreg”) reveals only 6 matches: 3 describing our own observations (L152, 163/5, and 1023/28), one referring to (Salje et al., 2011) (L107) but citing their claim that they observed aggregation (due to the N-terminus), and the last two (L100, L440) referring (again) to the Gaballah/Mayer/Dersch publications, to say that aggregation could not be excluded in these reports, as discussed above (Dersch et al., 2020; Gaballah et al., 2011; Mayer & Amann, 2009).

      9) Lines 106-108 mention that "The N-terminal amphipathic helix of E. coli MreB (MreBEc) was found to be necessary for membrane binding." This is not accurate, as Salje observed that one single helix could not cause MreB to bind to the membrane; rather, multiple amphipathic helices were required for membrane association (Salje et al., 2011).

      Salje et al. showed that in vivo, deletion of the helix abolishes the association of MreB with the membrane. This publication also shows that in vitro, addition of the helix to GFP (not to MreB) prompts binding to lipid vesicles, and that binding was increased when two copies of the helix were present, but they could not test this directly in vitro with MreB (which is insoluble when expressed with its N-terminus). This prompted them to speculate that multiple MreBs could bind the membrane better than monomers. However, this remained to be demonstrated. Additional hydrophobic regions in MreB, such as the hydrophobic loop, could participate in membrane anchoring but are absent in their in vitro assays with GFP.

      The Salje results imply that dimers (or further assemblies) of MreB drive membrane association, a point that should be discussed in regard to the question "What prompts the assembly of MreB on the inner leaflet of the cytoplasmic membrane?" posed on lines 86-87.

      We agree that this is an interesting point. As it is consistent with our results, we have incorporated it into our model (Fig. 6) and address it in the Discussion (L573-575).

      10) On lines 414-415, it is stated, "The requirement of the membrane for polymerization is consistent with the observation that MreB polymeric assemblies in vivo are membrane-associated only." While I agree with this hypothesis, it must be noted that the presence or absence of MreB polymers in the cytoplasm has not been directly tested, as short filaments in the cytoplasm would diffuse very quickly, requiring very short exposures (<5ms) to resolve them relative to their rate of diffusion. Thus, cytoplasmic polymers might still exist but have not been tested.

      This is also an interesting point. Indeed, if a nucleated form or very short (unbundled) polymers exist in the cytoplasm, their presence has not been tested by fluorescence microscopy. However, polymers of the size that localize at the membrane (~200 nm), if soluble, would have been detected in the cytoplasm in the work of reviewer #2, ourselves, or others.

      11) lines 429-431 state, "but polymerization in the presence of ADP was in most cases concluded from light scattering experiments alone, so the possibility that aggregation rather than ordered polymerization occurred in the process cannot be excluded."

      A) If an increased light scattering signal is initiated by the addition of ADP (or any nucleotide), that signal must come from polymerization or multimerization. What the authors imply is that there must be some ADP-dependent "aggregation" of MreB, which has not been seen thus far for any polymer. Furthermore, why would the addition of ADP initiate aggregation?

      We did not mean that ADP itself would prompt aggregation, but that the protein would aggregate in the buffer regardless of the presence of ADP or other nucleotides. The Mayer & Amann study claims that MreB “polymerization” is nucleotide-independent, as they obtained identical curves with ATP, ADP, AMP-PNP and even with no nucleotide at all (Fig. 10 in their paper, pasted here) (Mayer & Amann, 2009).

      Their experiments with KCl are also remarkable: when they lowered the salt concentration, “polymerization” became faster and faster, with the strongest light scattering signal in the absence of any salt. The KCl concentration at which they obtained almost no more “polymers” was 75 mM, and ‘polymerization was almost entirely inhibited at 100 mM’ (Fig. 7, pasted below). Yet the intracellular level of KCl in bacteria is estimated to be ~300 mM (see Response 1.1).

      B) Likewise, the statement "Differences in the purity of the nucleotide stocks used in these studies could also explain some of the discrepancies" is unexplained and confusing. How could an impurity in a nucleotide stock affect the past MreB results, and what is the precedent for this claim?

      We meant that the presence of ATP in the ADP stocks might have affected the outcome of some assays, generating some of the conflicting results in the literature. We agree this sentence was confusing; we have removed it.

      12) lines 467-469 state, "Thus, for both MreB and actin, despite hydrolyzing ATP before and after polymerization, respectively, the ADP-Pi-MreB intermediate would be the long-lived intermediate state within the filaments."

      A) For MreB, this statement is extremely speculative and unsupported, as no one has measured 1) polymerization, 2) ATP hydrolysis, and 3) phosphate release. For example, it could be that ATP hydrolysis is slow while phosphate release is fast, as is seen in the actin from Saccharomyces cerevisiae.

      We agree that this was too speculative. This has been removed from the (extensively) modified Discussion section. Thanks for the comment.

      B) For actin, the statement that hydrolysis of ATP by the monomer occurs "before polymerization" is functionally irrelevant, as the rate of ATP hydrolysis of actin monomers is 430,000 times slower than that of actin monomers inside filaments (Blanchoin and Pollard, 2002; Rould et al., 2006).

      We agree that the difference of hydrolysis rate between G-actin and F-actin implies that ATP hydrolysis occurs after polymerization. We are afraid that we do not follow the reviewer’s point here, we did not say or imply that ATP hydrolysis by actin monomers was functionally relevant.

      13) Lines 442-444. "On the basis of our data and the existing literature, we propose that the requirement for ATP (or GTP) hydrolysis for polymerization may be conserved for most MreBs." Again, this statement both here (and in the prior text) is an extremely bold claim, one that runs contrary to a large amount of past work on not just MreB, but also eukaryotic actin and every actin homolog studied so far. They come to this model based on 1) one piece of suggestive data (the behavior of MreBGs bound to 2 non-hydrolysable ATP analogs in 500 mM KCl), and 2) the dismissal (throughout the paper) of many peer-reviewed MreB papers that run counter to their model as "aggregation" or "contaminated ATP stocks." If they want to make this bold claim that their finding invalidates the work of many labs, they must back it up with further validating experiments.

      We respectfully disagree that our model was based on “one piece of suggestive data” and backed up by dismissing most past work in the field. We only wanted to raise awareness of the conflicting data between some reports (listed in Response 2.5a), and to note that the claims made by some publications are to be taken with caution because they rely on light scattering alone or, when TEM was performed, showed only disorganized structures.

      That said, we clearly fell short in presenting our model, and we are sorry to see that we annoyed the reviewer with our suspicion that the work by Mayer & Amann reports aggregation. As indicated above, we have amended our manuscript on this point. We also agree that our suggestion to generalize our findings to most MreBs was unsupported and overstated, considering how confusing some results from the literature are. We have refined our model and reworked the text to take the reviewer's remarks on board, as well as the new data generated during the revision process.

      We would like to thank reviewer #2 for his in-depth review of our manuscript.  

      Reviewer #3 (Public Review):

      The major claim of the paper is that the polymerization of MreB from a Gram-positive, thermophilic bacterium depends on two factors: 1) nucleotide hydrolysis driving the polymerization, and 2) a lipid bilayer acting as a facilitator/scaffold that is required for hydrolysis-dependent polymerization. These two conclusions contrast with what has been known until now for the MreB proteins characterized in vitro. The experiments performed in the paper do not completely justify these claims, as elaborated below.

      We understand the reviewer's concerns in view of the existing literature on actin and Gram-negative MreBs. We may simply be missing the optimal conditions for polymerization in solution, while our phrasing gave the impression that polymers could never form in the absence of ATP or lipids. Our new data actually show that MreBGs at higher concentration can assemble into bundle- and sheet-like structures in solution and in the presence of ADP/AMP-PNP. Pairs of filaments are, however, only observed in the presence of lipids for all conditions tested. As indicated in the answers to the global review comments, we have included our new data in the manuscript, revised our conclusions and claims about the lipid requirement, and expanded on these points in the Discussion.

      Major comments:

      1) The absence of filaments without the lipid monolayer could also be accounted for by a higher critical concentration of polymerization for MreBGS under that condition. All negative staining without the lipid monolayer appears to have been performed at a concentration of 0.05 mg/mL. It is important to check for polymerization of MreBGS over higher concentration ranges as well, in order to conclusively state the requirement of lipids for polymerization.

      Response 3.1. 0.05 mg/mL (1.3 µM) is our standard condition, and our leeway was limited by the rapid aggregation observed at higher MreB concentrations, as indicated in the text. We have now also tested 0.25 mg/mL (6.5 µM, the maximum concentration possible before major aggregation occurs under our experimental conditions). At this higher concentration, we see some sheet-like structures in solution, confirming the requirement of a higher MreB concentration for polymerization under these conditions (see the answers to the global review comments for more details).

      We thank the reviewer for pushing us to address this point. We have revised our conclusions accordingly.

      2) The absence of filaments for the non-hydrolysable conditions in the lipid layer could also be because the filaments that might have formed are not binding to the planar lipid layer, and not necessarily because of their inability to polymerize.

      Response 3.2. This is a fair point. To test the possibility that polymers would form but would not bind to the lipid layer, we have now added additional semi-quantitative EM controls (for both the non-hydrolysable ATP analogs and the three ‘membrane binding’ deletion mutants), testing polymerization in solution (without lipids) and also using plasma-treated grids. These showed that in our standard polymerization conditions, virtually no polymers form in solution (Fig. 3-S1B and Fig. 4-S4A). Albeit at very low frequency, some dual protofilaments were nevertheless detected in the presence of ADP or AMP-PNP at the high MreB concentration (Fig. 3-S1D). At this high MreB concentration, the sheet-like structures occasionally observed in solution in the presence of ATP were frequent in the presence of ADP and very frequent in the presence of AMP-PNP (Fig. 3-S2B). We have revised our conclusions on the basis of these new data: MreBGs can form polymeric assemblies in solution and in the absence of ATP hydrolysis, at a higher critical concentration than in the presence of ATP and lipids.

      See the answers to the global review comments (point 2) and Response 2.3C to reviewer #2 for more details.

      3) Given the ATPase activity measurements, it is not very convincing that ATP rather than ADP will be present in the structure. The ATP should have been hydrolysed to ADP within the structure. The structure is now suggestive that MreB is not capable of hydrolysis, which is contradictory to the ATP hydrolysis data.

      Response 3.3. We thank the reviewer for her insightful remarks about the MreB-ATP crystal structure. The electron density map clearly demonstrates the presence of three phosphates. However, as suggested by the reviewer, the density that was attributed to a Mg2+ ion should instead be interpreted as a water molecule. The absence of Mg2+ in the crystal could thus explain why the ATP had not been hydrolyzed.

      References

      Arino J, Ramos J, Sychrova H (2010) Alkali metal cation transport and homeostasis in yeasts. Microbiology and molecular biology reviews 74: 95-120

      Bean GJ, Amann KJ (2008) Polymerization properties of the Thermotoga maritima actin MreB: roles of temperature, nucleotides, and ions. Biochemistry 47: 826-835

      Cayley S, Lewis BA, Guttman HJ, Record MT, Jr. (1991) Characterization of the cytoplasm of Escherichia coli K-12 as a function of external osmolarity. Implications for protein-DNA interactions in vivo. Journal of molecular biology 222: 281-300

      Dersch S, Reimold C, Stoll J, Breddermann H, Heimerl T, Defeu Soufo HJ, Graumann PL (2020) Polymerization of Bacillus subtilis MreB on a lipid membrane reveals lateral co-polymerization of MreB paralogs and strong effects of cations on filament formation. BMC Mol Cell Biol 21: 76

      Eisenstadt E (1972) Potassium content during growth and sporulation in Bacillus subtilis. Journal of bacteriology 112: 264-267

      Epstein W, Schultz SG (1965) Cation Transport in Escherichia coli: V. Regulation of cation content. J Gen Physiol 49: 221-234

      Esue O, Wirtz D, Tseng Y (2006) GTPase activity, structure, and mechanical properties of filaments assembled from bacterial cytoskeleton protein MreB. Journal of bacteriology 188: 968-976

      Gaballah A, Kloeckner A, Otten C, Sahl HG, Henrichfreise B (2011) Functional analysis of the cytoskeleton protein MreB from Chlamydophila pneumoniae. PloS one 6: e25129

      Harne S, Duret S, Pande V, Bapat M, Beven L, Gayathri P (2020) MreB5 Is a Determinant of Rod-to-Helical Transition in the Cell-Wall-less Bacterium Spiroplasma. Curr Biol 30: 4753-4762 e4757

      Kang H, Bradley MJ, McCullough BR, Pierre A, Grintsevich EE, Reisler E, De La Cruz EM (2012) Identification of cation-binding sites on actin that drive polymerization and modulate bending stiffness. Proceedings of the National Academy of Sciences of the United States of America 109: 16923-16927

      Lacabanne D, Wiegand T, Wili N, Kozlova MI, Cadalbert R, Klose D, Mulkidjanian AY, Meier BH, Bockmann A (2020) ATP Analogues for Structural Investigations: Case Studies of a DnaB Helicase and an ABC Transporter. Molecules 25

      Mannherz HG, Brehme H, Lamp U (1975) Depolymerisation of F-actin to G-actin and its repolymerisation in the presence of analogs of adenosine triphosphate. Eur J Biochem 60: 109-116

      Mayer JA, Amann KJ (2009) Assembly properties of the Bacillus subtilis actin, MreB. Cell motility and the cytoskeleton 66: 109-118

      Nurse P, Marians KJ (2013) Purification and characterization of Escherichia coli MreB protein. The Journal of biological chemistry 288: 3469-3475

      Pande V, Mitra N, Bagde SR, Srinivasan R, Gayathri P (2022) Filament organization of the bacterial actin MreB is dependent on the nucleotide state. The Journal of cell biology 221

      Peck ML, Herschlag D (2003) Adenosine 5 '-O-(3-thio)triphosphate (ATP-gamma S) is a substrate for the nucleotide hydrolysis and RNA unwinding activities of eukaryotic translation initiation factor eIF4A. Rna 9: 1180-1187

      Popp D, Narita A, Maeda K, Fujisawa T, Ghoshdastider U, Iwasa M, Maeda Y, Robinson RC (2010) Filament structure, organization, and dynamics in MreB sheets. The Journal of biological chemistry 285: 15858-15865

      Rhoads DB, Waters FB, Epstein W (1976) Cation transport in Escherichia coli. VIII. Potassium transport mutants. J Gen Physiol 67: 325-341

      Rodriguez-Navarro A (2000) Potassium transport in fungi and plants. Biochimica et biophysica acta 1469: 1-30

      Salje J, van den Ent F, de Boer P, Lowe J (2011) Direct membrane binding by bacterial actin MreB. Molecular cell 43: 478-487

      Schmidt-Nielsen B (1975) Comparative physiology of cellular ion and volume regulation. J Exp Zool 194: 207-219

      Szatmari D, Sarkany P, Kocsis B, Nagy T, Miseta A, Barko S, Longauer B, Robinson RC, Nyitrai M (2020) Intracellular ion concentrations and cation-dependent remodelling of bacterial MreB assemblies. Sci Rep 10

      van den Ent F, Izore T, Bharat TA, Johnson CM, Lowe J (2014) Bacterial actin MreB forms antiparallel double filaments. eLife 3: e02634

      Whatmore AM, Chudek JA, Reed RH (1990) The Effects of Osmotic Upshock on the Intracellular Solute Pools of Bacillus subtilis. Journal of general microbiology 136: 2527-2535

    Author Response

      Reviewer #1 (Public Review):

      Briggs et al use a combination of mathematical modelling and experimental validation to tease apart the contributions of metabolic and electronic coupling to the pancreatic beta cell functional network. A number of recent studies have shown the existence of functional beta cell subpopulations, some of which are difficult to fully reconcile with established electrophysiological theory. More generally, the contribution of beta cell heterogeneity (metabolism, differentiation, proliferation, activity) to islet function cannot be explained by existing combined metabolic/electrical oscillator models. The present studies are thus timely in modelling the islet electrical (structural) and functional networks. Importantly, the authors show that metabolic coupling primarily drives the islet functional network, giving rise to beta cell subpopulations. The studies, however, do not diminish the critical role of electrical coupling in dictating glucose responsiveness, network extent as well as longer-range synchronization. As such, the studies show that islet structural and functional networks both act to drive islet activity, and that conclusions on the islet structural network should not be made using measures of the functional network (and vice versa).

      Strengths:

      • State-of-the-art multi-parameter modelling encompassing electrical and metabolic components.

      • Experimental validation using advanced FRAP imaging techniques, as well as Ca2+ data from relevant gap junction KO animals.

      • Well-balanced arguments that frame metabolic and electrical coupling as essential contributors to islet function.

      • Likely to change how the field models functional connectivity and beta cell heterogeneity.

      Weaknesses:

      • Limitations of FRAP and electrophysiological gap junction measures not considered.

      • Limitations of Cx36 (gap junction) KO animals not considered.

      • Accuracy of citations should be improved in a few cases.

      We thank reviewer 1 for their positive comments, including the many strengths in the approaches, arguments and impact. We do note the weaknesses raised by the reviewer and have addressed them following the comments below.

      We would like to also note that when we refer to metabolic activity driving the functional network, we are not referring to metabolic coupling between beta cells. Rather, we mean that two cells that show either high levels of metabolic activity (glycolytic flux) or similar levels of metabolic activity will show increased synchronization, and thus a functional network edge, compared to cells with elevated gap junction conductance. Increased metabolic activity would likely generate increased depolarizing currents that provide an increased coupling current to drive synchronization, whereas similar metabolic activity would mean that a given coupling current could more readily drive synchronized activity. We have substantially rewritten the manuscript to clarify this point.
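
To make "functional network edge" concrete for readers outside the network field: an edge is typically assigned between two cells whose Ca2+ traces are synchronized above a correlation threshold, irrespective of whether the cells are physically (gap junction) coupled. The following is a minimal illustration with synthetic traces and an arbitrary threshold, not our actual analysis pipeline:

```python
import numpy as np

def functional_network(traces, r_th=0.9):
    """Boolean adjacency matrix of the functional network: an edge links
    two cells whose traces correlate above the threshold r_th."""
    r = np.corrcoef(traces)       # pairwise Pearson correlations
    adj = r > r_th
    np.fill_diagonal(adj, False)  # no self-edges
    return adj

# synthetic example: cells 0-2 share a common oscillation, cell 3 does not
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
common = np.sin(2 * np.pi * t)
traces = np.vstack(
    [common + 0.05 * rng.standard_normal(t.size) for _ in range(3)]
    + [rng.standard_normal(t.size)]
)
adj = functional_network(traces)
degree = adj.sum(axis=1)  # synchronized cells acquire edges; cell 3 has none
```

In this toy example, cells 0-2 are mutually connected because they share the common oscillation, while cell 3 remains disconnected; nothing in the construction refers to physical adjacency, which is why the functional and structural networks need not coincide.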

      Reviewer #2 (Public Review):

      In their present work, Briggs et al. combine biophysical simulations and experimental recordings of beta cell activity with analyses of functional network parameters to determine the role played by gap-junctional coupling, metabolism, and KATP conductance in defining the functional roles that the cells play in the functional networks, assess the structure-function relationship, and to resolve an important current open question in the field on the role of so-called hub cells in islets of Langerhans.

      Combining differential equation-based simulations on 1000 coupled cells with demanding calcium, NADPH, and FRAP imaging, as well as with advanced network analyses, and then comparing the network metrics with simulated and experimentally determined properties is an achievement in its own right and a major methodological strength. The findings have the potential to help resolve the issue of the importance of hub cells in beta cell networks, and the methodological pipeline and data may prove invaluable for other researchers in the community.

      However, methodologically functional networks may be based on different types of calcium oscillations present in beta cells, i.e., fast oscillations produced by bursts of electrical activity, slow oscillations produced by metabolic/glycolytic oscillations, or a mixture of both. At present, the authors base the network analyses on fast oscillations only in the case of simulated traces and on a mixture of fast and slow oscillations in the case of experimental traces. Since different networks may depend on the studied beta cell properties to a different extent (e.g., fast oscillation-based networks may, more importantly, depend on electrical properties and slow oscillation-based networks may more strongly depend on metabolic properties), it is important that in drawing the conclusions the authors separately address the influence of a cell's electrical and metabolic properties on its functional role in the network based on fast oscillations, slow oscillations, or a mixture of both.

      We thank reviewer 2 for their positive comments, including addressing the importance of this study as it pertains to islet biology and acknowledging the methodological complexities of this study. We also thank the reviewer for their careful reading and useful comments. We have integrated each comment into the manuscript. Most importantly, we have now extended our analysis to both fast and slow oscillations by incorporating an additional mathematical model of coupled slow oscillations and performing additional experimental analysis of fast, slow, and mixed oscillations.

      Reviewer #3 (Public Review):

      Over the past decade, novel approaches to understanding beta cell connectivity and how that contributes to the overall function of the pancreatic islet have emerged. The application of network theory to beta cell connectivity has been an extremely useful tool to understand functional hierarchies amongst beta cells within an islet. This helps to provide functional relevance to observations from structural and gene expression data that beta cells are not all identical.

      There are a number of "controversies" in this field that have arisen from the mathematical and subsequent experimental identification of beta "hub" cells. These are small populations of beta cells that are very highly connected to other beta cells, as assessed by applying correlation statistics to individual beta cell calcium traces across the islet.

      In this paper Briggs et al set out to answer the following areas of debate:

      They use computational datasets, based on established models of beta cells acting in concert (electrically coupled) within an islet-like structure, to show that it is similarities in metabolic parameters rather than "structural" connections (i.e. proximity, which subserves gap junction coupling) that drive functional network behaviour. Whilst the computational models are quite relevant, the fact that the parameters (e.g. connectivity coefficients) are quite different to what is measured experimentally confirms the limitations of this model. Therefore it was important for the authors to back up this finding by performing both calcium and metabolic imaging of islet beta cells. These experimental data are reported to confirm that metabolic coupling was more strongly related to functional connectivity than gap junction coupling. However, a limitation here is that the metabolic imaging data confirmed a strong link between disconnected beta cells and low metabolic coupling but did not robustly show the opposite. Similarly, I was not convinced that the FRAP studies, which indirectly measured GJ ("structural") connections, were powered well enough to be related to measures of beta cell connectivity.

      The group goes on to provide further analytical and experimental data with a model of increasing loss of GJ connectivity (by calcium imaging of islets from WT, heterozygous (50% GJ loss), and homozygous (100% loss) animals). Given the former conclusion that it was metabolic rather than GJ connectivity that drives small-world network behaviour, it was surprising to see such a large effect on the loss of hubs in the homozygotes. That said, the analytical approaches in this model did help the authors confirm that the loss of gap junctions does not alter the preferential existence of beta cell connectivity, and confirms the important contribution of metabolic "coupling". One can therefore perhaps conclude that there are two types of network behaviour in an islet (maybe more), and the field should move towards an understanding of overlapping network communities, as has been done in brain networks.

Overall this is an extremely well-written paper which was a pleasure to read. This group has neatly and expertly provided both computational and experimental data to support the notion that it is metabolic rather than "structural" (i.e., GJ) coupling that drives our observations of hubs and functional connectivity. However, there is still much work to do to understand whether this metabolic coupling is just a random epiphenomenon or somehow fated, and the extent to which other elements of "structural" coupling (i.e., the presence of other endocrine cell types, the spatial distribution of paracrine hormone receptors, blood vessels and nerve terminals) are also important.

We thank reviewer 3 for their positive comments on the methodology, writing style, and the importance of this paper to the broader islet community, and for their very in-depth and helpful comments. We have addressed each comment below and made significant changes to the manuscript accordingly. We conducted more FRAP experiments and separated results into slow, fast, and mixed oscillations. We included analysis of an additional computational model that simulates slow calcium oscillations. Additionally, we substantially rewrote the paper to clarify that we are not referring to metabolic coupling and to speak to the broader implications of network theory and our findings.

      Reviewer #4 (Public Review):

      This manuscript describes a complex, highly ambitious set of modeling and experimental studies that appear designed to compare the structural and functional properties of beta cell subpopulations within the islet network in terms of their influence on network synchronization. The authors conclude that the most functionally coupled cell subpopulations in the islet network are not those that are most structurally coupled via gap junctions but those that are most metabolically active.

      Strengths of the paper include (1) its use of an interdisciplinary collection of methods including computer simulations, FRAP to monitor functional coupling by gap junctions, the monitoring of Ca2+ oscillations in single beta cells embedded in the network, and the use of sophisticated approaches from probability theory. Most of these methods have been used and validated previously. Unfortunately, however, it was not clear what the underlying premise of the paper actually is, despite many stated intentions, nor what about it is new compared to previous studies, an additional weakness.

      Although the authors state that they are trying to answer 3 critical questions, it was not clear how important these questions are in terms of significance for the field. For example, they state that a major controversy in the field is whether network structure or network function mediates functional synchronization of beta cells within the islet. However, this question is not much debated. As an example, while it is known that there can be long-range functional coupling in islets, no workers in the field believe there is a physical structure within islets that mediates this, unlike the case for CNS neurons that are known to have long projections onto other neurons. Beta cells within the islets are locally coupled via gap junctions, as stated repeatedly by the authors but these mediate short-range coupling. Thus, there are clearly functional correlations over long ranges but no structures, only correlated activity. This weakness raises questions about the overall significance of the work, especially as it seems to reiterate ideas presented previously.

We thank reviewer 4 for their positive comments, including our multidisciplinary use of mathematical models and experimental imaging techniques. We have now included an additional model of slow oscillations (the Integrated Oscillator Model) to improve our conclusions. We also thank reviewer 4 for the insightful comments. We have carefully reviewed each comment and made significant changes to the manuscript accordingly. In particular, we have significantly rewritten the introduction and discussion to clarify what is new in our manuscript and what has been shown previously. Additionally, we agree with the reviewer's sentiment that there is little debate over whether, for example, there are physical structures within the islet that mediate long-range functional connections. However, there is current debate over whether functional beta-cell subpopulations can dictate islet dynamics (see [11]–[13]). This debate can be framed by asking whether these functional subpopulations emerge from physical connections (the structural network) or from something more nuanced (such as intrinsic dynamics). We have reframed the introduction and discussion to clarify this debate and to state the premise of the paper more clearly.

      Specific Comments

1) The authors state it is well accepted that the disruption of gap junctional coupling is a pathophysiological characteristic of diabetes, but this is not an opinion widely accepted by the field, although it has been proposed. The authors should scale back on such generalizations, or provide more compelling evidence to support such a claim.

Thank you for pointing this out; we have provided more specific citations and changed the wording from “well accepted” to “has been documented”. See Discussion page 13 lines 415-416.

      2) The paper relies heavily on simulations performed using a version of the model of Cha et al (2011). While this is a reasonable model of fast bursting (e.g. oscillations having periods <1 min.), the Ca2+ oscillations that were recorded by the authors and shown in Fig. 2b of the manuscript are slow oscillations with periods of 5 min and not <1 min, which is a weakness of the model in the current context. Furthermore, the model outputs that are shown lack the well-known characteristics seen in real islets, such as fast-spiking occurring on prolonged plateaus, again as can be seen by comparing the simulated oscillations shown in Fig. 1d with those in Fig. 2b. It is recommended that the simulations be repeated using a more appropriate model of slow oscillations or at least using the model of Cha et al but employed to simulate in slower bursting.

      The reviewer raises an important point and caveat associated with our simulated model and experimental data. This point was also made by other reviewers, and a similar response to this comment can be found elsewhere in response to reviewer 2 point 6. To address this comment, we have performed several additional experiments and analyses:

1) We collected additional Ca2+ (to identify the functional network and hubs) and FRAP data (to assess gap junction permeability) in islets which show either pure slow, pure fast, or mixed oscillations. We generated networks based on each time scale to compare with FRAP gap junction permeability data. We found the conclusions of our first draft to be consistent across all oscillation types. There was no relationship between gap junction conductance, as approximated using FRAP, and normalized degree for slow (Figure 3j), fast (Figure 3 Supp 1d,e), or mixed (Figure 3 Supp 1g,h) oscillations. We also include discussion of these conclusions (see Results page 7 lines 184-186 and lines 188-191, Discussion page 12 lines 357-360).

2) We also performed additional simulations with a coupled ‘Integrated Oscillator Model’ (IOM), which shows slow oscillations driven by metabolic oscillations (Figure 2). We compared connectivity with gap junction coupling and underlying cell parameters. In this case, there is an association between functional and structural networks, with highly-connected hub cells showing higher gap junction conductance (Figure 2f) but also low KATP channel conductance (gKATP) (Figure 2e). However, there are some caveats to these findings: given the nature of the IOM, we were limited to simulating smaller islets (260 cells), and less heterogeneity in the calcium traces was observed. Additional analysis suggests the greater association between functional and structural networks in this model was a result of the smaller islets, and the association was also dependent on threshold (unlike in the Cha-Noma fast oscillator model). These limitations and results are discussed further (Discussion page 11 lines 344-354).

Additionally, in the IOM, the underlying cell dynamics of highly-connected hub cells are differentiated by KATP channel conductance (gKATP), which differs from the fast oscillator model (where they are differentiated by metabolism, kglyc). However, this difference between models can be linked to differences in the way duty cycle is influenced by gKATP and kglyc (Figure 1h, Figure 2g). In each model there was a similar association between duty cycle and highly-connected hub cells. We also discuss these findings (Discussion page 11 lines 334-343).

      Overall these results and discussion with respect to the coupled IOM oscillator model can be found in Figure 2, Results page 6 lines 128-156 and Discussion page 11 lines 332-354.

3) Much of the data analyzed whether obtained via simulation or through experiment seems to produce very small differences in the actual numbers obtained, as can be seen in the bar graphs shown in Figs. 1e,g for example (obtained from simulations), or Fig. 2j (obtained from experimental measurements). The authors should comment as to why such small differences are often seen as a result of their analyses throughout the manuscript and why also in many cases the observed variance is high. Related to the data shown, very few dots are shown in Figs. 1e,g or Figs. 4e and 4h even though these points were derived from simulations where 100s of runs could be carried out and many more points obtained for plotting. These are weaknesses unless specific and convincing explanations are provided.

We thank the reviewer for these comments, which are similar to those of reviewer 2 (point 4) and reviewer 3 (point 6). Indeed there is some variability between cells in both simulations and experiments related to the metabolic activity in hubs and non-hubs. The variability points to other factors potentially being involved in determining hubs beyond kglyc alone, including a minor role for the gap junction structural network and potentially cell position and other intrinsic factors. We now discuss this point – see Discussion page 12 lines 364-266.

The differences between hubs and nonhubs appear small because the value of kglyc itself is very small. For Figure 1e, the average kglyc for nonhubs was 1.26 × 10⁻⁴ s⁻¹ (which is effectively the average of the distribution, because most cells are non-hubs), while the average kglyc for hubs was 1.4 × 10⁻⁴ s⁻¹, about half a standard deviation higher. The paired t-test controls for the small value of average kglyc.

For the simulation data, each of the 5 dots corresponds to a simulated islet averaged over 1000 cells (or 260 cells for the coupled IOM). The computational cost of generating such data is high, so it is not feasible to conduct 100s of runs. Again, we note that the comparisons between hubs and non-hubs are paired, and we find statistically significant differences for kglyc in Figure 1 using only 5 paired data points. That we find these differences indicates the substantial difference between hubs and non-hubs. This is further supported by all effect sizes exceeding 0.8 for all significantly different findings (Cha-Noma: kglyc 2.85, gcoup 0.82; IOM: gKATP 1.27, gcoup 2.94). We have included these effect sizes in the Figure 1 and 2 captions (pages 34, 36).
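For readers who wish to check the arithmetic, the paired effect size quoted above is Cohen's d computed on the within-islet hub-minus-nonhub differences. A minimal stdlib-only Python sketch (the five difference values are illustrative, not the actual simulation output):

```python
import math

def cohens_d_paired(diffs):
    """Paired Cohen's d: mean of the differences divided by their SD (ddof=1)."""
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    return mean / math.sqrt(var)

# One hub-minus-nonhub kglyc difference per simulated islet (illustrative values)
islet_diffs = [0.9, 1.1, 1.0, 0.8, 1.2]
print(round(cohens_d_paired(islet_diffs), 2))  # -> 6.32
```

With only five paired islets, a large d of this kind is what allows statistical significance despite the small sample.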

To consider all of the available data rather than the average across an entire islet, we created kernel density estimates of kglyc for hubs and nonhubs by concatenating every single cell across the five islets. A two-sample Kolmogorov-Smirnov test indicates a highly significant difference (P < 0.0001) between these two distributions.
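The two-sample KS statistic underlying this test is the maximum vertical distance between the empirical CDFs of the two pooled-cell distributions. A stdlib-only sketch with synthetic values (the means loosely echo the kglyc averages quoted above; they are not the actual data):

```python
import bisect
import random

def ks_statistic(a, b):
    """Two-sample KS statistic: maximum distance between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

random.seed(0)
hubs = [random.gauss(1.4, 0.3) for _ in range(500)]       # illustrative "hub" values
nonhubs = [random.gauss(1.26, 0.3) for _ in range(2000)]  # illustrative "non-hub" values

print(ks_statistic(hubs, hubs))         # identical samples -> 0.0
print(ks_statistic(hubs, nonhubs) > 0)  # shifted distributions -> positive statistic
```

In practice `scipy.stats.ks_2samp` computes the same statistic together with a p-value.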

      Author response image 1.

      4) The data shown in Fig. 4i,j are intended to compare long-range synchronization at different distances along a string of coupled cells but the difference between the synchronized and unsynchronized cells for gcoup and Kglyc was subtle, very much so.

Thank you for pointing out these subtle differences. The y-axis scale for i and j is broad to allow us to represent all distances on a single plot. After correction for multiple comparisons, the differences were still statistically significant. As the reviewer mentioned in point 3, each plot contains only five data points, each of which represents the average of a single simulated islet; therefore we are not concerned about statistical significance coming from too large a sample size. We also checked the differences between synchronized and nonsynchronized cell pairs in figure 4 panels e and h (now figure 5e,h). These are the same data as in i and j but normalized such that all of the distances could be averaged together. We again found statistical significance between synchronized and non-synchronized cell pairs. As can be seen in Author response image 2, the difference between synchronized and non-synchronized cell pairs is greater than the variability between simulated islets. Thus, in this case the variability is not substantial.

      Author response image 2.

      5) The data shown in Fig. 5 for Cx36 knockout islets are used to assess the influence of gap junctional coupling, which is reasonable, but it would be reassuring to know that loss of this gene has no effects on the expression of other genes in the beta cell, especially genes involved with glucose metabolism.

This is an important point. Previous studies have assessed that no significant change in NAD(P)H is observed in Cx36 deficient islets – see Benninger et al J.Physiol 2011 [14]. Islet architecture is also retained. Further, the insulin secretory response of dissociated Cx36 knockout beta cells is the same as that of dissociated wildtype beta cells, further indicating no significant defect in the intrinsic ability of the beta cell to release insulin – see Benninger et al J.Physiol 2011 [14]. We now mention these findings in the discussion. See Discussion page 14 lines 459-464.

      6) In many places throughout the paper, it is difficult to ascertain whether what is being shown is new vs. what has been shown previously in other studies. The paper would thus benefit strongly from added text highlighting the novelty here and not just restating what is known, for instance, that islets can exhibit small-world network properties. This detracts from the strengths of the paper and further makes it difficult to wade through. Even the finding here that metabolic characteristics of the beta cells can infer profound and influential functional coupling is not new, as the authors proposed as much many years ago. Again, this makes it difficult to distill what is new compared to what is mainly just being confirmed here, albeit using different methods.

Thank you for the suggestion; we have made significant modifications throughout the Introduction, Discussion and Results to be clearer about what is known from previous work and what is newly found in this manuscript.

      Reviewer #5 (Public Review):

      The authors use state-of-the-art computation, experiment, and current network analysis to try and disaggregate the impact of cellular metabolism driving cellular excitability and structural electrical connections through gap junctions on islet synchronization. They perform interesting simulations with a sophisticated mathematical model and compare them with closely associated experiments. This close association is impressive and is an excellent example of using mathematics to inform experiments and experimental results. The current conclusions, however, appear beyond the results presented. The use of functional connectivity is based on correlated calcium traces but is largely without an understood biophysical mechanism. This work aims to clarify such a mechanism between metabolism and structural connection and comes out on the side of metabolism driving the functional connectivity, but both are required and more nuanced conclusions should be drawn.

We thank reviewer 5 for their positive comments, including our multifaceted experimental and computational techniques. We also found the reviewer's careful reading and thoughtful comments to be very helpful, and we have worked to integrate each comment into our manuscript. It is evident from the reviewer comments that we did not clearly explain what was meant by our conclusions concerning the functional network reflecting metabolism rather than gap junctions. We have conducted significant rewriting to show that we are not concluding that communication (metabolic or electric) occurs due to conduits other than gap junctions. Rather, our data suggest that the functional network (which reflects calcium synchronization) reflects intrinsic dynamics of the cells, which include metabolic rates, more than individual gap junction connections.

      References referred to in this response to reviewers document:

      [1] A. Stožer et al., “Functional connectivity in islets of Langerhans from mouse pancreas tissue slices,” PLoS Comput Biol, vol. 9, no. 2, p. e1002923, 2013.

      [2] N. L. Farnsworth, A. Hemmati, M. Pozzoli, and R. K. Benninger, “Fluorescence recovery after photobleaching reveals regulation and distribution of connexin36 gap junction coupling within mouse islets of Langerhans,” The Journal of physiology, vol. 592, no. 20, pp. 4431–4446, 2014.

      [3] C.-L. Lei, J. A. Kellard, M. Hara, J. D. Johnson, B. Rodriguez, and L. J. Briant, “Beta-cell hubs maintain Ca2+ oscillations in human and mouse islet simulations,” Islets, vol. 10, no. 4, pp. 151–167, 2018.

      [4] N. R. Johnston et al., “Beta cell hubs dictate pancreatic islet responses to glucose,” Cell metabolism, vol. 24, no. 3, pp. 389–401, 2016.

      [5] V. Kravets et al., “Functional architecture of pancreatic islets identifies a population of first responder cells that drive the first-phase calcium response,” PLoS Biology, vol. 20, no. 9, p. e3001761, 2022.

      [6] H. Ren et al., “Pancreatic α and β cells are globally phase-locked,” Nature Communications, vol. 13, no. 1, p. 3721, 2022.

      [7] A. Stožer et al., “From Isles of Königsberg to Islets of Langerhans: Examining the function of the endocrine pancreas through network science,” Frontiers in Endocrinology, vol. 13, p. 922640, 2022.

      [8] J. Zmazek et al., “Assessing different temporal scales of calcium dynamics in networks of beta cell populations,” Frontiers in physiology, vol. 12, p. 337, 2021.

      [9] M. E. Corezola do Amaral et al., “Caloric restriction recovers impaired β-cell-β-cell gap junction coupling, calcium oscillation coordination, and insulin secretion in prediabetic mice,” American Journal of Physiology-Endocrinology and Metabolism, vol. 319, no. 4, pp. E709–E720, 2020.

      [10] J. M. Dwulet, J. K. Briggs, and R. K. P. Benninger, “Small subpopulations of beta-cells do not drive islet oscillatory [Ca2+] dynamics via gap junction communication,” PLOS Computational Biology, vol. 17, no. 5, p. e1008948, May 2021, doi: 10.1371/journal.pcbi.1008948.

      [11] B. E. Peercy and A. S. Sherman, “Do oscillations in pancreatic islets require pacemaker cells?,” Journal of Biosciences, vol. 47, no. 1, pp. 1–11, 2022.

      [12] G. A. Rutter, N. Ninov, V. Salem, and D. J. Hodson, “Comment on Satin et al.‘Take me to your leader’: an electrophysiological appraisal of the role of hub cells in pancreatic islets. Diabetes 2020; 69: 830–836,” Diabetes, vol. 69, no. 9, pp. e10–e11, 2020.

      [13] L. S. Satin and P. Rorsman, “Response to comment on satin et al.‘Take me to your leader’: An electrophysiological appraisal of the role of hub cells in pancreatic islets. Diabetes 2020; 69: 830–836,” Diabetes, vol. 69, no. 9, pp. e12–e13, 2020.

      [14] R. K. Benninger, W. S. Head, M. Zhang, L. S. Satin, and D. W. Piston, “Gap junctions and other mechanisms of cell–cell communication regulate basal insulin secretion in the pancreatic islet,” The Journal of physiology, vol. 589, no. 22, pp. 5453–5466, 2011.

[15] R. Fried, Erectile dysfunction as a cardiovascular impairment. Academic Press, 2014.

[16] T. Pipatpolkai, S. Usher, P. J. Stansfeld, and F. M. Ashcroft, “New insights into KATP channel gene mutations and neonatal diabetes mellitus,” Nature Reviews Endocrinology, vol. 16, no. 7, pp. 378–393, 2020.

      [17] A. M. Notary, M. J. Westacott, T. H. Hraha, M. Pozzoli, and R. K. P. Benninger, “Decreases in Gap Junction Coupling Recovers Ca2+ and Insulin Secretion in Neonatal Diabetes Mellitus, Dependent on Beta Cell Heterogeneity and Noise,” PLOS Computational Biology, vol. 12, no. 9, p. e1005116, Sep. 2016, doi: 10.1371/journal.pcbi.1005116.

[18] J. V. Rocheleau, G. M. Walker, W. S. Head, O. P. McGuinness, and D. W. Piston, “Microfluidic glucose stimulation reveals limited coordination of intracellular Ca2+ activity oscillations in pancreatic islets,” Proceedings of the National Academy of Sciences, vol. 101, no. 35, pp. 12899–12903, 2004.

[19] R. K. Benninger, M. Zhang, W. S. Head, L. S. Satin, and D. W. Piston, “Gap junction coupling and calcium waves in the pancreatic islet,” Biophysical journal, vol. 95, no. 11, pp. 5048–5061, 2008.

    1. Author Response:

      Evaluation Summary:

      The authors assessed multivariate relations between a dimensionality-reduced symptom space and brain imaging features, using a large database of individuals with psychosis-spectrum disorders (PSD). Demonstrating both high stability and reproducibility of their approaches, this work showed a promise that diagnosis or treatment of PSD can benefit from a proposed data-driven brain-symptom mapping framework. It is therefore of broad potential interest across cognitive and translational neuroscience.

      We are very grateful for the positive feedback and the careful read of our paper. We would especially like to thank the Reviewers for taking the time to read this lengthy and complex manuscript and for providing their helpful and highly constructive feedback. Overall, we hope the Editor and the Reviewers will find that our responses address all the comments and that the requested changes and edits improved the paper.

      Reviewer 1 (Public Review):

      The paper assessed the relationship between a dimensionality-reduced symptom space and functional brain imaging features based on the large multicentric data of individuals with psychosis-spectrum disorders (PSD).

      The strength of this study is that i) in every analysis, the authors provided high-level evidence of reproducibility in their findings, ii) the study included several control analyses to test other comparable alternatives or independent techniques (e.g., ICA, univariate vs. multivariate), and iii) correlating to independently acquired pharmacological neuroimaging and gene expression maps, the study highlighted neurobiological validity of their results.

      Overall the study has originality and several important tips and guidance for behavior-brain mapping, although the paper contains heavy descriptions about data mining techniques such as several dimensionality reduction algorithms (e.g., PCA, ICA, and CCA) and prediction models.

      We thank the Reviewer for their insightful comments and we appreciate the positive feedback. Regarding the descriptions of methods and analytical techniques, we have removed these descriptions out of the main Results text and figure captions. Detailed descriptions are still provided in the Methods, so that they do not detract from the core message of the paper but can still be referenced if a reader wishes to look up the details of these methods within the context of our analyses.

Although relatively minor, I also have a few points on the weaknesses, including i) an incomplete description of how to tell the PSD effects from the normal spectrum, ii) a lack of overarching interpretation for the other principal components rather than only the 3rd one, and iii) somewhat expected results in the stability of PC and relevant indices.

      We are very appreciative of the constructive feedback and feel that these revisions have strengthened our paper. We have addressed these points in the revision as following:

i) We are grateful to the Reviewer for bringing up this point as it has allowed us to further explore the interesting observation we made regarding shared versus distinct neural variance in our data. It is important not to confuse the neural PCA (i.e. the independent neural features that can be detected in the PSD and healthy control samples) with the neuro-behavioral mapping. In other words, both PSD patients and healthy controls are human and therefore there are a number of neural functions that both cohorts exhibit that may have nothing to do with the symptom mapping in PSD patients, for instance basic regulatory functions such as control of cardiac and respiratory cycles, motor functions, vision, etc. We therefore hypothesized that there are more common than distinct neural features that are on average shared across humans irrespective of their psychopathology status. Consequently, there may only be a ‘residual’ symptom-relevant neural variance. Therefore, in the manuscript we bring up the possibility that a substantial proportion of neural variance may not be clinically relevant. If this is in fact true, then removing the shared neural variance between PSD and CON should not drastically affect the reported symptom-neural univariate mapping solution, because this common variance does not map to clinical features and therefore is statistically orthogonal. We have now verified this hypothesis quantitatively and have added extensive analyses to highlight this important observation made by the Reviewer. We first conducted a PCA using the parcellated GBC data from all 436 PSD and 202 CON (a matrix with dimensions 638 subjects x 718 parcels). We will refer to this as the GBC-PCA to avoid confusion with the symptom/behavioral PCA described elsewhere in the manuscript. This GBC-PCA resulted in 637 independent GBC-PCs. Since PCs are orthogonal to each other, we then partialled out the variance attributable to GBC-PC1 from the PSD data by reconstructing the PSD GBC matrix using only scores and coefficients from the remaining 636 GBC-PCs (GBC_woPC1). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. The results are shown in Fig. S21 and reproduced below. Removing the first PC of shared neural variance (which accounted for about 15.8% of the total GBC variance across CON and PSD) from the PSD data attenuated the statistics slightly (not unexpected, as the variance was by definition reduced) but otherwise did not strongly affect the univariate mapping solution.
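Concretely, partialling out GBC-PC1 amounts to reconstructing the centered data matrix from all components except the first. A small numpy sketch of that step (the matrix dimensions below are illustrative stand-ins for the 638-subject x 718-parcel GBC matrix, not the actual data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))   # stand-in for the subjects x parcels GBC matrix

# PCA via SVD of the column-centered data: rows of Vt are the PC coefficients,
# and U * S gives the PC scores
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Reconstruct using every component except PC1 ("partialling out" GBC-PC1)
X_woPC1 = U[:, 1:] @ np.diag(S[1:]) @ Vt[1:, :]

# The residual carries no PC1 variance: projecting it onto PC1 gives ~0,
# and the removed share of total variance is S[0]^2 / sum(S^2)
assert np.allclose(X_woPC1 @ Vt[0], 0.0)
share_removed = S[0] ** 2 / np.sum(S ** 2)
```

Dropping the first k components instead (as in Figs. S22-S24) just replaces the index 1 with k.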

We repeated the symptom-neural regression next with the first 2 GBC-PCs partialled out of the PSD data (Fig. S22), with the first 3 PCs parsed out (Fig. S23), and with the first 4 neural PCs parsed out (Fig. S24). The symptom-neural maps remain fairly robust, although the similarity with the original β_PC^GBC maps does drop as more common neural variance is parsed out. These figures are also shown below:

Fig. S21. Comparison between the PSD β_PC^GBC maps computed using GBC and GBC with the first neural PC parsed out. If a substantial proportion of neural variance is not clinically relevant, then removing the shared neural variance between PSD and CON should not drastically affect the reported symptom-neural univariate mapping solution, because this common variance will not map to clinical features. We therefore performed a PCA on CON and PSD GBC to compute the shared neural variance (see Methods), and then parsed out the first GBC-PC from the PSD GBC data (GBC_woPC1). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The β_PC1^GBC map, also shown in Fig. S10. (B) The first GBC-PC accounted for about 15.8% of the total GBC variance across CON and PSD. Removing GBC-PC1 from the PSD data attenuated the β_PC1^GBC statistics slightly (not unexpected, as the variance was by definition reduced) but otherwise did not strongly affect the univariate mapping solution. (C) Correlation across 718 parcels between the two β_PC1^GBC maps shown in A and B. (D-O) The same results are shown for the β_PC2^GBC to β_PC5^GBC maps.

Fig. S22. Comparison between the PSD β_PC^GBC maps computed using GBC and GBC with the first two neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first two GBC-PCs from the PSD GBC data (GBC_woPC1-2, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The β_PC1^GBC map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two β_PC1^GBC maps shown in A and B. (D-O) The same results are shown for the β_PC2^GBC to β_PC5^GBC maps.

Fig. S23. Comparison between the PSD β_PC^GBC maps computed using GBC and GBC with the first three neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first three GBC-PCs from the PSD GBC data (GBC_woPC1-3, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The β_PC1^GBC map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two β_PC1^GBC maps shown in A and B. (D-O) The same results are shown for the β_PC2^GBC to β_PC5^GBC maps.

Fig. S24. Comparison between the PSD β_PC^GBC maps computed using GBC and GBC with the first four neural PCs parsed out. We performed a PCA on CON and PSD GBC and then parsed out the first four GBC-PCs from the PSD GBC data (GBC_woPC1-4, see Methods). We then reran the univariate regression as described in Fig. 3, using the same five symptom PC scores across 436 PSD. (A) The β_PC1^GBC map, also shown in Fig. S10. (B) The second GBC-PC accounted for about 9.5% of the total GBC variance across CON and PSD. (C) Correlation across 718 parcels between the two β_PC1^GBC maps shown in A and B. (D-O) The same results are shown for the β_PC2^GBC to β_PC5^GBC maps.

For comparison, we also computed the β_PC^GBC maps for control subjects, shown in Fig. S11. In support of the β_PC^GBC maps in PSD being circuit-relevant, we observed only mild associations between GBC and PC scores in healthy controls:

      Results: All 5 PCs captured unique patterns of GBC variation across the PSD (Fig. S10), which were not observed in CON (Fig. S11). ... Discussion: On the contrary, this bi-directional “Psychosis Configuration” axis also showed strong negative variation along neural regions that map onto the sensory-motor and associative control regions, also strongly implicated in PSD (1, 2). The “bi-directionality” property of the PC symptom-neural maps may thus be desirable for identifying neural features that support individual patient selection. For instance, it may be possible that PC3 reflects residual untreated psychosis symptoms in this chronic PSD sample, which may reveal key treatment neural targets. In support of this circuit being symptom-relevant, it is notable that we observed a mild association between GBC and PC scores in the CON sample (Fig. S11).

ii) In our original submission we spotlighted PC3 because of its pattern of loadings onto hallmark symptoms of PSD, including strong positive loadings across Positive symptom items in the PANSS and conversely strong negative loadings onto most Negative items. It was necessary to fully examine this dimension in particular because these are key characteristics of the target psychiatric population, and we found that the focus on PC3 was innovative because it provided an opportunity to quantify a fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. This is a powerful demonstration of how data-driven techniques such as PCA can reveal properties intrinsic to the structure of PSD-relevant symptom data which may in turn improve the mapping of symptom-neural relationships. We refrained from explaining each of the five PCs in detail in the main text as we felt that it would further complicate an already dense manuscript. Instead, we opted to provide the interpretation and data from all analyses for all five PCs in the Supplement. However, in response to the Reviewers’ thoughtful feedback that more focus should be placed on other components, we have expanded the presentation and discussion of all five components (both regarding the symptom profiles and neural maps) in the main text:

      Results: Because PC3 loads most strongly onto hallmark symptoms of PSD (including strong positive loadings across Positive symptom measures in the PANSS and strong negative loadings onto most Negative measures), we focus on this PC as an opportunity to quantify an innovative, fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional symptom axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. We found that the PC3 result provided a powerful empirical demonstration of how using a data-driven dimensionality-reduced solution (via PCA) can reveal novel patterns intrinsic to the structure of PSD psychopathology.

      iii) We felt that demonstrating the stability of the PCA solution was extremely important, given that this degree of rigor has not previously been tested using broad behavioral measures across psychosis symptoms and cognition in a cross-diagnostic PSD sample. Additionally, we demonstrated reproducibility of the PCA solution using independent split-half samples. Furthermore, we derived stable neural maps using the PCA solution. In our original submission we showed that the CCA solution was not reproducible in our dataset. Following the Reviewers’ feedback, we computed the estimated sample sizes needed to sufficiently power our multivariate analyses for stable/reproducible solutions, using the methods in (3). These results are discussed in detail in our resubmitted manuscript and in our response to the Critiques section below.

      Reviewer 2 (Public Review):

      The work by Ji et al is an interesting and rather comprehensive analysis of the trend of developing data-driven methods for developing brain-symptom dimension biomarkers that bring a biological basis to the symptoms (across PANSS and cognitive features) that relate to psychotic disorders. To this end, the authors performed several interesting multivariate analyses to decompose the symptom/behavioural dimensions and functional connectivity data. To this end, the authors use data from individuals from a transdiagnostic group of individuals recruited by the BSNIP cohort and combine high-level methods in order to integrate both types of modalities. Conceptually there are several strengths to this paper that should be applauded. However, I do think that there are important aspects of this paper that need revision to improve readability and to better compare the methods to what is in the field and provide a balanced view relative to previous work with the same basic concepts that they are building their work around. Overall, I feel as though the work could advance our knowledge in the development of biomarkers or subject level identifiers for psychiatric disorders and potentially be elevated to the level of an individual "subject screener". While this is a noble goal, this will require more data and information in the future as a means to do this. This is certainly an important step forward in this regard.

      We thank the Reviewer for their insightful and constructive comments about our manuscript. We have revised the text to make it easier to read and to clarify our results in the context of prior works in the field. We fully agree that a great deal more work needs to be completed before achieving single-subject level treatment selection, but we hope that our manuscript provides a helpful step towards this goal.

      Strengths:

      • Combined analysis of canonical psychosis symptoms and cognitive deficits across multiple traditional psychosis-related diagnoses offers one of the most comprehensive mappings of impairments experienced within PSD to brain features to date
      • Cross-validation analyses and use of various datasets (diagnostic replication, pharmacological neuroimaging) is extremely impressive, well motivated, and thorough. In addition the authors use a large dataset and provide "out of sample" validity
      • Medication status and dosage also accounted for
      • Similarly, the extensive examination of both univariate and multivariate neuro-behavioural solutions from a methodological viewpoint, including the testing of multiple configurations of CCA (i.e. with different parcellation granularities), offers very strong support for the selected symptom-to-neural mapping
      • The plots of the obtained PC axes compared to those of standard clinical symptom aggregate scales provide a really elegant illustration of the differences and demonstrate clearly the value of data-driven symptom reduction over conventional categories
      • The comparison of the obtained neuro-behavioural map for the "Psychosis configuration" symptom dimension to both pharmacological neuroimaging and neural gene expression maps highlights direct possible links with both underlying disorder mechanisms and possible avenues for treatment development and application
      • The authors' explicit investigation of whether PSD and healthy controls share a major portion of neural variance (possibly present across all people) has strong implications for future brain-behaviour mapping studies, and provides a starting point for narrowing the neural feature space to just the subset of features showing symptom-relevant variance in PSD

      We are very grateful for the positive feedback. We would like to thank the Reviewers for taking the time to read this admittedly dense manuscript and for providing their helpful critique.

      Critiques:

      • Overall I found the paper very hard to read. There are abbreviations everywhere for every concept that is introduced. The paper is methods heavy (which I am not opposed to and quite like). It is clear that the authors took a lot of care in thinking about the methods that were chosen. That said, I think that the organization would benefit from a more traditional Intro, Methods, Results, and Discussion formatting so that it would be easier to parse the Results. The figures are extremely dense and there are often terms that are coined or used that are not or poorly defined.

      We appreciate the constructive feedback around how to reduce the dense content and to pay more attention to the frequency of abbreviations, both of which impact readability. We implemented the strategies suggested by the Reviewer and have moved the Methods section after the Introduction to make the subsequent Results section easier to understand and contextualize. For clarity and length, we have moved methodological details previously in the Results and figure captions to the Methods (e.g. descriptions of dimensionality reduction and prediction techniques). This way, the Methods are now expanded for clarity without detracting from the readability of the core results of the paper. We have also simplified the text in places where there was room for more clarity. For convenience and ease of use of the numerous abbreviations, we have added a table to the Supplement (Supplementary Table S1).

      • One thing I found conceptually difficult is the explicit comparison to the work in the Xia paper from the Satterthwaite group. Is this a fair comparison? The sample is extremely different as it is non-clinical and comes from the general population. Can it be suggested that the groups that are clinically defined here are comparable? Is this an appropriate comparison and standard to make? To suggest that the work in that paper is not reproducible is flawed in this light.

      This is an extremely important point to clarify and we apologize that we did not make it sufficiently clear in the initial submission. Here we are not attempting to replicate the results of Xia et al., which we understand were derived in a fundamentally different sample than ours, both demographically and clinically, and testing very different questions. Rather, this paper is just one example out of a number of recent papers which employed multivariate methods (CCA) to tackle the mapping between neural and behavioral features. The key point here is that this approach does not produce reproducible results due to over-fitting, as demonstrated robustly in the present paper. It is important to highlight that we did not single out any one paper when making this point; in fact, we do not mention the Xia paper explicitly anywhere, and we were very careful to cite multiple papers in support of the multivariate over-fitting argument, which is now a well-known issue (4). Nevertheless, the Reviewers make an excellent point here and we acknowledge that while CCA was not reproducible in the present dataset, this does not explicitly imply that the results in the Xia et al. paper (or any other paper for that matter) are not reproducible by definition (i.e. until someone formally attempts to falsify them). We have made this point explicit in the revised paper, as shown below. Furthermore, in line with the provided feedback, we also applied the multivariate power calculator derived by Helmer et al. (3), which quantitatively illustrates the statistical point around CCA instability.

      Results: Several recent studies have reported “latent” neuro-behavioral relationships using multivariate statistics (5–7), which would be preferable because they simultaneously solve for maximal covariation across neural and behavioral features. Though concerns have emerged as to whether such multivariate results will replicate due to the size of the feature space relative to the size of the clinical samples (4), given the possibility of deriving a stable multivariate effect, here we tested if results improve with canonical correlation analysis (CCA) (8), which maximizes relationships between linear combinations of symptom (B) and neural features (N) across all PSD (Fig. 5A).

      Discussion: Here we attempted to use multivariate solutions (i.e. CCA) to quantify symptom and neural feature covariation. In principle, CCA is well-suited to address the brain-behavioral mapping problem. However, symptom-neural mapping using CCA across either parcel-level or network-level solutions in our sample was not reproducible, even when using a low-dimensional symptom solution and parcellated neural data as a starting point. Therefore, while CCA (and related multivariate methods such as partial least squares) are theoretically appropriate and may be helped by regularization methods such as sparse CCA, in practice many available psychiatric neuroimaging datasets may not provide sufficient power to resolve stable multivariate symptom-neural solutions (3). A key pressing need for forthcoming studies will be to use multivariate power calculators to inform sample sizes needed for resolving stable symptom-neural geometries at the single subject level. Of note, though we were unable to derive a stable CCA in the present sample, this does not imply that the multivariate neuro-behavioral effect may not be reproducible with larger effect sizes and/or sample sizes. Critically, this does highlight the importance of power calculations prior to computing multivariate brain-behavioral solutions (3).

      • Why was PCA selected for the analysis rather than ICA? Authors mention that PCA enables the discovery of orthogonal symptom dimensions, but don't elaborate on why this is expected to better capture behavioural variation within PSD compared to non-orthogonal dimensions. Given that symptom and/or cognitive items in conventional assessments are likely to be correlated in one way or another, allowing correlations to be present in the low-rank behavioural solution may better represent the original clinical profiles and drive more accurate brain-behaviour mapping. Moreover, as alluded to in the Discussion, employing an oblique rotation in the identification of dimensionality-reduced symptom axes may have actually resulted in a brain-behaviour space that is more generalizable to other psychiatric spectra. Why not use something more relevant to symptom/behaviour data like a factor analysis?

      This is a very important point! We agree with the Reviewer that an oblique solution may better fit the data. For this reason, we performed an ICA as shown in the Supplement. We chose to show PCA for the main analyses here because it is a deterministic solution and the number of significant components could be computed via permutation testing. Importantly, certain components from the ICA solution in this sample were highly similar to the PCs shown in the main solution (Supplementary Note 1), as measured by comparing the subject behavioral scores (Fig. S4) and neural maps (Fig. S13). However, notably, certain components in the ICA and PCA solutions did not appear to have a one-to-one mapping (e.g. PCs 1-3 and ICs 1-3). The orthogonality of the PCA solution forces the resulting components to capture maximally separated, unique symptom variance, which in turn maps robustly onto unique neural circuits. We observed that the data may be distributed in such a way that highly correlated independent components emerge in the ICA, which do not maximally separate the symptom variance associated with neural variance. We demonstrate this by plotting the relationship between parcel beta coefficients for the βPC3GBC map versus the βIC2GBC and βIC3GBC maps. The sigmoidal shape of the distribution indicates an improvement in the Z-statistics for the βPC3GBC map relative to the βIC2GBC and βIC3GBC maps. We have added this language to the main text Results:

      Notably, independent component analysis (ICA), an alternative dimensionality reduction procedure which does not enforce component orthogonality, produced similar effects for this PSD sample (see Supplementary Note 1 & Fig. S4A). Certain pairs of components between the PCA and ICA solutions appear to be highly similar and exclusively mapped (IC5 and PC4; IC4 and PC5) (Fig. S4B). On the other hand, PCs 1-3 and ICs 1-3 do not exhibit a one-to-one mapping. For example, PC3 appears to correlate positively with IC2 and equally strongly negatively with IC3, suggesting that these two ICs are oblique to the PC and perhaps reflect symptom variation that is explained by a single PC. The orthogonality of the PCA solution forces the resulting components to capture maximally separated, unique symptom variance, which in turn maps robustly onto unique neural circuits. We observed that the data may be distributed in such a way that highly correlated independent components emerge in the ICA, which do not maximally separate the symptom variance associated with neural variance. We demonstrate this by plotting the relationship between parcel beta coefficients for the βPC3GBC map versus the βIC2GBC and βIC3GBC maps (Fig. ??G). The sigmoidal shape of the distribution indicates an improvement in the Z-statistics for the βPC3GBC map relative to the βIC2GBC and βIC3GBC maps.
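The component-matching logic described in this passage can be sketched with a few lines of Python (a minimal illustration on synthetic data; scikit-learn's PCA and FastICA serve as stand-ins, and the dimensions and data are illustrative assumptions, not the BSNIP sample or our exact pipeline):

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(1)
# Synthetic "symptom" data: 436 patients x 36 items driven by 5 latent factors
latent = rng.standard_normal((436, 5))
mixing = rng.standard_normal((5, 36))
X = latent @ mixing + 0.5 * rng.standard_normal((436, 36))

pc_scores = PCA(n_components=5).fit_transform(X)    # orthogonal subject scores
ic_scores = FastICA(n_components=5, random_state=0,
                    max_iter=1000).fit_transform(X)  # oblique-capable subject scores

# Cross-correlate subject-level scores: a one-to-one PC-IC mapping appears as
# a single large |r| per row/column, whereas an oblique IC loads on several PCs
xcorr = np.corrcoef(pc_scores.T, ic_scores.T)[:5, 5:]
print(np.round(np.abs(xcorr), 2))
```

Inspecting this 5x5 matrix is the same comparison reported in Fig. S4B, where off-diagonal structure among PCs 1-3 and ICs 1-3 indicated the absence of an exclusive mapping.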

      Additionally, the Reviewer raises an important point, and we agree that orthogonal versus oblique solutions warrant further investigation especially with regards to other psychiatric spectra and/or other stages in disease progression. For example, oblique components may better capture dimensions of behavioral variation in prodromal individuals, as these individuals are in the early stages of exhibiting psychosis-relevant symptoms and may show early diverging of dimensions of behavioral variation. We elaborate on this further in the Discussion:

      Another important aspect that will require further characterization is the possibility of oblique axes in the symptom-neural geometry. While orthogonal axes derived via PCA were appropriate here and similar to the ICA-derived axes in this solution, it is possible that oblique dimensions more clearly reflect the geometry of other psychiatric spectra and/or other stages in disease progression. For example, oblique components may better capture dimensions of neuro-behavioral variation in a sample of prodromal individuals, as these patients are exhibiting early-stage psychosis-like symptoms and may show signs of diverging along different trajectories.

      Critically, these factors should constitute key extensions of an iteratively more robust model for individualized symptom-neural mapping across the PSD and other psychiatric spectra. Relatedly, it will be important to identify the ‘limits’ of a given BBS solution – namely a PSD-derived effect may not generalize into the mood spectrum (i.e. both the symptom space and the resulting symptom-neural mapping is orthogonal). It will be important to evaluate if this framework can be used to initialize symptom-neural mapping across other mental health symptom spectra, such as mood/anxiety disorders.

      • The gene expression mapping section lacks some justification for why the 7 genes of interest were specifically chosen from among the numerous serotonin and GABA receptors and interneuron markers (relevant for PSD) available in the AHBA. Brief reference to the believed significance of the chosen genes in psychosis pathology would have helped to contextualize the observed relationship with the neuro-behavioural map.

      We thank the Reviewer for providing this suggestion and agree that it will strengthen the section on gene expression analysis. Of note, we did justify the choice for these genes, but we appreciate the opportunity to expand on the neurobiology of selected genes and their relevance to PSD. We have made these edits to the text:

      We focus here on serotonin receptor subunits (HTR1E, HTR2C, HTR2A), GABA receptor subunits (GABRA1, GABRA5), and the interneuron markers somatostatin (SST) and parvalbumin (PVALB). Serotonin agonists such as LSD have been shown to induce PSD-like symptoms in healthy adults (9), and the serotonin antagonism of “second-generation” antipsychotics is thought to contribute to their efficacy in targeting broad PSD symptoms (10–12). Abnormalities in GABAergic interneurons, which provide inhibitory control in neural circuits, may contribute to cognitive deficits in PSD (13–15) and additionally lead to downstream excitatory dysfunction that underlies other PSD symptoms (16, 17). In particular, a loss of prefrontal parvalbumin-expressing fast-spiking interneurons has been implicated in PSD (18–21).

      • What the identified univariate neuro-behavioural mapping for PC3 ("psychosis configuration") actually means from an empirical or brain network perspective is not really ever discussed in detail. E.g., in Results, "a high positive PC3 score was associated with both reduced GBC across insular and superior dorsal cingulate cortices, thalamus, and anterior cerebellum and elevated GBC across precuneus, medial prefrontal, inferior parietal, superior temporal cortices and posterior lateral cerebellum." While the meaning and calculation of GBC can be gleaned from the Methods, a direct interpretation of the neuro-behavioural results in terms of the types of symptoms contributing to PC3 and relative hyper-/hypo-connectivity of the DMN compared to e.g. healthy controls could facilitate easier comparisons with the findings of past studies (since GBC does not seem to be a very commonly-used measure in the psychosis fMRI literature). Also important since GBC is a summary measure of the average connectivity of a region, and doesn't provide any specificity in terms of which regions in particular are more or less connected within a functional network (an inherent limitation of this measure which warrants further attention).

      We acknowledge that GBC is a linear combination measure that by definition does not provide information on connectivity between any one specific pair of neural regions. However, as shown by highly robust and reproducible neuro-behavioral maps, GBC is suitable as a first-pass metric in the absence of a priori assumptions of how specific regional connectivity may map to the PC symptom dimensions, and it has been shown to be sensitive to altered patterns of overall neural connectivity in PSD cohorts (22–25) as well as in models of psychosis (9, 26). Moreover, it is an assumption-free method for dimensionality reduction of the neural connectivity matrix (which is a massive feature space). Furthermore, GBC provides neural maps (where each region can be represented by a value, in contrast to full functional connectivity matrices), which were necessary for quantifying the relationship with independent molecular benchmark maps (i.e. pharmacological maps and gene expression maps). We do acknowledge that there are limitations to the method, which we now discuss in the paper. Furthermore, we agree with the Reviewer that the specific regions implicated in these symptom-neural relationships warrant a more detailed investigation and we plan to develop this further in future studies, such as with seed-based functional connectivity using regions implicated in PSD (e.g. thalamus (2, 27)) or restricted GBC (22), which can summarize connectivity information for a specific network or subset of neural regions. We have provided elaboration and clarification regarding this point in the Discussion:

      Another improvement would be to optimize neural data reduction sensitivity for specific symptom variation (28). We chose to use GBC for our initial geometry characterizations as it is a principled and assumption-free data-reduction metric that captures (dys)connectivity across the whole brain and generates neural maps (where each region can be represented by a value, in contrast to full functional connectivity matrices) that are necessary for benchmarking against molecular imaging maps. However, GBC is a summary measure that by definition does not provide information regarding connectivity between specific pairs of neural regions, which may prove to be highly symptom-relevant and informative. Thus symptom-neural relationships should be further explored with higher-resolution metrics, such as restricted GBC (22) which can summarize connectivity information for a specific network or subset of neural regions, or seed-based FC using regions implicated in PSD (e.g. thalamus (2, 27)).
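For readers less familiar with the metric, the GBC computation discussed above can be sketched in a few lines (a minimal illustration on synthetic data; the parcel count, frame count, and omission of the Fisher z-transform are simplifying assumptions, not our exact pipeline):

```python
import numpy as np

def global_brain_connectivity(ts):
    """GBC from a parcellated BOLD time series.

    ts : array of shape (n_timepoints, n_parcels).
    Returns one value per parcel: the mean Pearson correlation of that
    parcel's time series with the time series of every other parcel.
    """
    fc = np.corrcoef(ts.T)        # parcel-by-parcel functional connectivity matrix
    np.fill_diagonal(fc, np.nan)  # exclude each parcel's self-correlation
    return np.nanmean(fc, axis=1)

rng = np.random.default_rng(0)
ts = rng.standard_normal((400, 718))  # e.g. 400 frames, 718 parcels
gbc = global_brain_connectivity(ts)
print(gbc.shape)  # (718,)
```

In practice the correlations are typically Fisher z-transformed before averaging; a per-parcel map of this form is what enters the parcel-wise regressions against PC scores.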

      • Possibly a nitpick, but while the inclusion of cognitive measures for PSD individuals is a main (self-)selling point of the paper, there's very limited focus on the "Cognitive functioning" component (PC2) of the PCA solution. Examining Fig. S8K, the GBC map for this cognitive component seems almost to be the inverse for that of the "Psychosis configuration" component (PC3) focused on in the rest of the paper. Since PC3 does not seem to have high loadings from any of the cognitive items, but it is known that psychosis spectrum individuals tend to exhibit cognitive deficits which also have strong predictive power for illness trajectory, some discussion of how multiple univariate neuro-behavioural features could feasibly be used in conjunction with one another could have been really interesting.

      This is an important piece of feedback concerning the cognitive measure aspect of the study. As the Reviewer recognizes, cognition is a core element of PSD symptoms and the key reason for including this symptom domain in the model. Notably, the finding that one dimension captures a substantial proportion of cognitive performance-related variance, independent of other residual symptom axes, has not previously been reported, and we fully agree that expanding on this effect is important and warrants further discussion. We would like to take two of the key points from the Reviewers’ feedback and expand further. First, we recognize that upon qualitative inspection the PC2 and PC3 neural maps appear strongly anti-correlated. However, as demonstrated in Fig. S9O, the PC2 and PC3 maps were only moderately anti-correlated, at r=-0.47. For comparison, the PC2 map was highly anti-correlated with the BACS composite cognitive map (r=-0.81). This implies that the PC2 map in fact reflects unique neural circuit variance that is relevant for cognition, and is not simply an inverse of the PC3 map.

      In other words, these data suggest that there are PSD patients with more (or less) severe cognitive deficits independent of any other symptom axis, which would be in line with the observation that these symptoms are not treatable with antipsychotic medication (and therefore should not correlate with symptoms that are treatable by such medications; i.e. PC3). We have now added these points into the revised paper:

      Results: Fig. 1E highlights loading configurations of symptom measures forming each PC. To aid interpretation, we assigned a name for each PC based on its most strongly weighted symptom measures. This naming is qualitative but informed by the pattern of loadings of the original 36 symptom measures (Fig. 1). For example, PC1 was highly consistent with a general impairment dimension (i.e. “Global Functioning”); PC2 reflected more exclusively variation in cognition (i.e. “Cognitive Functioning”); PC3 indexed a complex configuration of psychosis-spectrum relevant items (i.e. “Psychosis Configuration”); PC4 generally captured variation in mood- and anxiety-related items (i.e. “Affective Valence”); finally, PC5 reflected variation in arousal and level of excitement (i.e. “Agitation/Excitation”). For instance, a generally impaired patient would have a highly negative PC1 score, which would reflect low performance on cognition and elevated scores on most other symptomatic items. Conversely, an individual with a high positive PC3 score would exhibit delusional, grandiose, and/or hallucinatory behavior, whereas a person with a negative PC3 score would exhibit motor retardation, social avoidance, and possibly a withdrawn affective state with blunted affect (29). Comprehensive loadings for all 5 PCs are shown in Fig. 3G. Fig. 1F highlights the mean of each of the 3 diagnostic groups (colored spheres) and healthy controls (black sphere) projected into a 3-dimensional orthogonal coordinate system for PCs 1, 2 & 3 (x, y, z axes respectively; alternative views of the 3-dimensional coordinate system with all patients projected are shown in Fig. 3). Critically, PC axes were not parallel with traditional aggregate symptom scales. For instance, PC3 is angled at 45◦ to the dominant direction of PANSS Positive and Negative symptom variation (purple and blue arrows respectively in Fig. 1F). ...
Because PC3 loads most strongly onto hallmark symptoms of PSD (including strong positive loadings across Positive symptom measures in the PANSS and strong negative loadings onto most Negative measures), we focus on this PC as an opportunity to quantify an innovative, fully data-driven dimension of symptom variation that is highly characteristic of the PSD patient population. Additionally, this bi-directional symptom axis captured shared variance from measures in other traditional symptom factors, such as the PANSS General factor and cognition. We found that the PC3 result provided a powerful empirical demonstration of how using a data-driven dimensionality-reduced solution (via PCA) can reveal novel patterns intrinsic to the structure of PSD psychopathology.

      • Another nitpick, but the Y axes of Fig. 8C-E are not consistent, which causes some of the lines of best fit to be a bit misleading (e.g. GABRA1 appears to have a more strongly positive gene-PC relationship than HTR1E, when in reality the opposite is true.)

      We have scaled each axis to best show the data in each plot, but we see how this is confusing and recognize the need to correct it. We have remade the plots with consistent axis labeling.

      • The authors explain the apparent low reproducibility of their multivariate PSD neuro-behavioural solution using the argument that many psychiatric neuroimaging datasets are too small for multivariate analyses to be sufficiently powered. Applying an existing multivariate power analysis to their own data as empirical support for this idea would have made it even more compelling. The following paper suggests guidelines for sample sizes required for CCA/PLS as well as a multivariate calculator: Helmer, M., Warrington, S. D., Mohammadi-Nejad, A.-R., Ji, J. L., Howell, A., Rosand, B., Anticevic, A., Sotiropoulos, S. N., & Murray, J. D. (2020). On stability of Canonical Correlation Analysis and Partial Least Squares with application to brain-behavior associations (p. 2020.08.25.265546). https://doi.org/10.1101/2020.08.25.265546

      We deeply appreciate the Reviewer’s suggestion and the opportunity to incorporate the methods from the Helmer et al. paper. We now highlight the importance of having sufficiently powered samples for multivariate analyses in our other manuscript first-authored by our colleague Dr. Markus Helmer (3). Using the method described in the above paper (GEMMR version 0.1.2), we computed the estimated sample sizes required to power multivariate CCA analyses with 718 neural features and 5 behavioral (PC) features (i.e. the feature set used throughout the rest of the paper):

      As argued in Helmer et al., rtrue is likely below 0.3 in many cases, thus the estimated sample size of 33k is likely a lower bound for the required sample size for sufficiently-powered CCA analyses using the 718+5 features leveraged throughout the univariate analyses in the present manuscript. This number is two orders of magnitude greater than our available sample (and at least one order of magnitude greater than any single existing clinical dataset). Even if rtrue is 0.5, a sample size of ∼10k would likely be required.

      We also computed the estimated sample sizes required for 180 neural features (symmetrized neural cortical parcels) and 5 symptom PC features, consistent with the CCA reported in our main text:

      Assuming that rtrue is likely below 0.3, this minimal required sample size remains at least an order of magnitude greater than the size of our present sample, consistent with the finding that the CCA solution computed using these data was unstable. As a lower limit for the required sample size plausible using the feature sets reported in our paper, we additionally computed for comparison the estimated N needed with the smallest number of features explored in our analyses, i.e. 12 neural functional network features and 5 symptom PC features:

      These required sample sizes are closer to the N=436 used in the present sample and samples reported in the clinical neuroimaging literature. This is consistent with the observation that when using 12 neural and 5 symptom features (Fig. S15C) the detected canonical correlation r = 0.38 for CV1 is much lower (and likely not inflated due to overfitting) and may be closer to the true effect, because with n=436 such an effect is resolvable. This is in contrast to the 180 neural feature and 5 symptom feature CCA solution, where we observed a null CCA effect around r > 0.6 across all 5 CVs. This clearly highlights the inflation of the effect as the feature space grows. There is no a priori plausible reason to believe that the effect for the 180 vs. 5 feature mapping is literally double the effect of the 12 vs. 5 feature mapping - especially as the 12 features are networks derived from the 180 parcels (i.e. the effects should be comparable rather than differing two-fold). Consequently, if the true CCA effect with 180 vs. 5 features was actually the more comparable r = 0.38, we would need >5,000 subjects to resolve a reproducible neuro-behavioral CCA map (an order of magnitude more than in the BSNIP sample). Moreover, to confidently detect effects if rtrue is actually less than 0.3, we would require a sample size >8,145 subjects. We have added this to the Results section on our CCA results:

      Next, we tested if the 180-parcel CCA solution is stable and reproducible, as done with the PC-to-GBC univariate results. The CCA solution was robust when tested with k-fold and leave-site-out cross-validation (Fig. S16), likely because these methods use CCA loadings derived from the full sample. However, the CCA loadings did not replicate in non-overlapping split-half samples (Fig. 5L, see Supplementary Note 4). Moreover, a leave-one-subject-out cross-validation revealed that removing a single subject from the sample affected the CCA solution such that it did not generalize to the left-out subject (Fig. 5M). This is in contrast to the PCA-to-GBC univariate mapping, which was substantially more reproducible for all attempted cross-validations relative to the CCA approach. This is likely because substantially more power is needed to resolve a stable multivariate neuro-behavioral effect with this many features. Indeed, a multivariate power analysis using 180 neural features and 5 symptom features, and assuming a true canonical correlation of r = 0.3, suggests that a minimal sample size of N = 8,145 is needed to sufficiently detect the effect (3), an order of magnitude greater than the available sample size. Therefore, we leverage the univariate neuro-behavioral result for subsequent subject-specific model optimization and comparisons to molecular neuroimaging maps.

      Additionally, we added the following to Supplementary Note 4: Establishing the Reproducibility of the CCA Solution:

      Here we outline the details of the split-half replication for the CCA solution. Specifically, the full patient sample was randomly split into halves (referred to as “H1” and “H2”, respectively) while preserving the proportion of patients in each diagnostic group. CCA was then performed independently for H1 and H2. While the loadings for the behavioral PCs and original behavioral items were somewhat similar (mean r ∼ 0.5) between the two CCAs in each run, the neural loadings were not stable across the H1 and H2 CCA solutions. Critically, CCA results also did not perform well under leave-one-subject-out cross-validation (Fig. 5M). Here, one patient was held out while CCA was performed using all data from the remaining 435 patients. The loading matrices Ψ and Θ from the CCA were then used to calculate the “predicted” neural and behavioral latent scores for all 5 CVs for the patient that was held out of the CCA solution. This process was repeated for every patient and the final result was evaluated for reproducibility. As described in the main text, this did not yield reproducible CCA effects (Fig. 5M). Of note, CCA may yield higher reproducibility if the neural feature space were further reduced. As noted, our approach was to first parcellate the BOLD signal and then use GBC as a data-driven method to yield a neurobiologically and quantitatively interpretable neural data reduction, and we additionally symmetrized the result across hemispheres. Nevertheless, in sharp contrast to the PCA univariate feature selection approach, the CCA solutions were still not stable in the present sample size of N = 436. Indeed, a multivariate power analysis (3) estimates that the following sample sizes will be required to sufficiently power a CCA between 180 neural features and 5 symptom features, at different levels of true canonical correlation (rtrue):

      To test if further neural feature space reduction might improve reproducibility, we also evaluated CCA solutions with neural GBC parcellated according to 12 brain-wide functional networks derived from the recent HCP-driven network parcellation (30). Again, we computed the CCA for all 36 item-level symptom scores as well as the 5 PCs (Fig. S15). As with the parcel-level effects, the network-level CCA analysis produced significant results (for CV1 when using the 36 item-level scores and for all 5 CVs when using the 5 PC-derived scores). Here the analysis produced much lower canonical correlations (∼0.3-0.5); however, these effects (for CV1) clearly exceeded the 95% confidence interval generated via random permutations, suggesting that they may reflect the true canonical correlation. We observed a similar result when we evaluated CCAs computed with neural GBC from 192 symmetrized subcortical parcels and 36 symptoms or 5 PCs (Fig. S14). In other words, data-reducing the neural signal to 12 functional networks likely averaged out parcel-level information that may carry symptom-relevant variance, but may be closer to capturing the true effect. Indeed, the power analysis suggests that the current sample size is closer to that needed to detect an effect with 12 + 5 features:

      Note that we do not present a CCA conducted with parcels across the whole brain, as the number of variables would exceed the number of observations. However, a multivariate power analysis using 718 neural features and 5 symptom features estimates that the following sample sizes would be required at each level of rtrue:

      This analysis suggests that even the lowest bound of 10k samples exceeds the present available sample size by two orders of magnitude.

      We have also added Fig. S19, illustrating these power analyses results:

      Fig. S19. Multivariate power analysis for CCA. Sample sizes were calculated according to (3), see also https://gemmr.readthedocs.io/en/latest/. We computed the multivariate power analyses for three versions of CCA reported in this manuscript: i) 718 neural vs. 5 symptom features; ii) 180 neural vs. 5 symptom features; iii) 12 neural vs. 5 symptom features. (A) Across different numbers of features, the ratio of samples (i.e. subjects) required per feature to derive a stable CCA solution remains approximately the same across all values of rtrue. As discussed in (3), at rtrue = 0.3 the number of samples required per feature is about 40, which is much greater than the ratio of samples to features available in our dataset. (B) The total number of samples required (nreq) for a stable CCA solution given the total number of neural and symptom features used in our analyses, at different values of rtrue. In general, these required sample sizes are much greater than the N=436 (light grey line) PSD in our present dataset, consistent with the finding that the CCA solutions computed using our data were unstable. Notably, the ‘12 vs. 5’ CCA assuming rtrue = 0.3 requires only 700 subjects, which is closest to the N=436 (horizontal grey line) used in the present sample. This may be in line with the observation for the CCA with 12 neural vs. 5 symptom features (Fig. S15C) that the canonical correlation (r = 0.38 for CV1) clearly exceeds the 95% confidence interval, and may be closer to the true effect. However, to confidently detect effects in such an analysis (particularly if rtrue is actually less than 0.3), a larger sample would likely still be needed.

      We also added the corresponding methods in the Methods section:

      Multivariate CCA Power Analysis. Multivariate power analyses to estimate the minimum sample size needed to sufficiently power a CCA were computed using methods described in (3), using the Generative Modeling of Multivariate Relationships tool (gemmr, https://github.com/murraylab/gemmr (v0.1.2)). Briefly, a model was built by: 1) generating synthetic datasets for the two input data matrices, by sampling from a multivariate normal distribution with a joint covariance matrix that was structured to encode CCA solutions with specified properties; 2) performing CCAs on these synthetic datasets. Because the joint covariance matrix is known, the true values of estimated association strength, weights, scores, and loadings of the CCA, as well as the errors for these four metrics, can also be computed. In addition, statistical power that the estimated association strength is different from 0 is determined through permutation testing; 3) varying parameters of the generative model (number of features, assumed true between-set correlation, within-set variance structure for both datasets), the required sample size Nreq is determined in each case such that statistical power reaches 90% and all of the above-described error metrics fall to a target level of 10%; and 4) fitting and validating a linear model to predict the required sample size Nreq from parameters of the generative model. This linear model was then used to calculate Nreq for CCA in three data scenarios: i) 718 neural vs. 5 symptom features; ii) 180 neural vs. 5 symptom features; iii) 12 neural vs. 5 symptom features.

      • Given the relatively even distribution of males and females in the dataset, some examination of sex effects on symptom dimension loadings or neuro-behavioural maps would have been interesting (other demographic characteristics like age and SES are summarized for subjects but also not investigated). I think this is a missed opportunity.

      We have now provided additional analyses for the core PCA and univariate GBC mapping results, testing for effects of age, sex, and SES in Fig. S8. Briefly, we observed a significant positive relationship between age and PC3 scores, which may be because older patients (who presumably have been ill for longer) exhibit more severe symptoms along the positive PC3 – Psychosis Configuration dimension. We also observed a significant negative relationship between the Hollingshead index of SES and PC1 and PC2 scores. Lower PC1 and PC2 scores indicate poorer general functioning and cognitive performance, respectively, which is consistent with higher Hollingshead indices (i.e. lower-skilled jobs or unemployment and fewer years of education). We also found significant sex differences in PC2 – Cognitive Functioning, PC4 – Affective Valence, and PC5 – Agitation/Excitement scores.
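      The pattern of these checks (Pearson correlations of demographics against each PC score with Bonferroni correction, plus unpaired t-tests for sex differences) can be sketched as follows. All values below are synthetic stand-ins, including the planted age-PC3 association; this is not the actual analysis pipeline.

```python
import numpy as np
from scipy.stats import pearsonr, ttest_ind

rng = np.random.default_rng(3)

# Illustrative stand-ins: 5 symptom PC scores, age, and sex for 436 PSD.
n_subj, n_pcs = 436, 5
pc_scores = rng.standard_normal((n_subj, n_pcs))
age = 30 + 10 * rng.standard_normal(n_subj)
pc_scores[:, 2] += 0.04 * (age - age.mean())   # plant an age~PC3 link

alpha = 0.05 / n_pcs                           # Bonferroni across 5 PCs
age_results = {}
for k in range(n_pcs):
    r, p = pearsonr(age, pc_scores[:, k])
    age_results[f"PC{k + 1}"] = (r, p, p < alpha)

# Sex difference in a PC score via a two-tailed unpaired t-test:
sex = rng.integers(0, 2, n_subj)               # 0 = male, 1 = female
t_pc2, p_pc2 = ttest_ind(pc_scores[sex == 0, 1], pc_scores[sex == 1, 1])
print(age_results["PC3"], p_pc2)
```

The same loop-plus-correction pattern extends directly to the SES correlations and the class-wise comparisons in Fig. S8C.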

      Fig. S8. Effects of age, socio-economic status, and sex on symptom PCA solution. (A) Correlations between symptom PC scores and age (years) across N=436 PSD. Pearson’s correlation value and uncorrected p-values are reported above scatterplots. After Bonferroni correction, we observed a significant positive relationship between age and PC3 score. This may be because older patients have been ill for a longer period of time and exhibit more severe symptoms along the positive PC3 dimension. (B) Correlations between symptom PC scores and socio-economic status (SES) as measured by the Hollingshead Index of Social Position (31), across N=387 PSD with available data. The index is computed as (Hollingshead occupation score * 7) + (Hollingshead education score * 4); a higher score indicates lower SES (32). We observed a significant negative relationship between Hollingshead index and PC1 and PC2 scores. Lower PC1 and PC2 scores indicate poorer general functioning and cognitive performance respectively, which is consistent with higher Hollingshead indices (i.e. lower-skilled jobs or unemployment and fewer years of education). (C) The Hollingshead index can be split into five classes, with 1 being the highest and 5 being the lowest SES class (31). Consistent with (B) we found a significant difference between the classes after Bonferroni correction for PC1 and PC2 scores. (D) Distributions of PC scores across Hollingshead SES classes show the overlap in scores. White lines indicate the mean score in each class. (E) Differences in PC scores between (M)ale and (F)emale PSD subjects. We found a significant difference between sexes in PC2 – Cognitive Functioning, PC4 – Affective Valence, and PC5 – Agitation/Excitement scores. (F) Distributions of PC scores across M and F subjects show the overlap in scores. White lines indicate the mean score for each sex.

      Bibliography

      1. Jie Lisa Ji, Caroline Diehl, Charles Schleifer, Carol A Tamminga, Matcheri S Keshavan, John A Sweeney, Brett A Clementz, S Kristian Hill, Godfrey Pearlson, Genevieve Yang, et al. Schizophrenia exhibits bi-directional brain-wide alterations in cortico-striato-cerebellar circuits. Cerebral Cortex, 29(11):4463–4487, 2019.
      2. Alan Anticevic, Michael W Cole, Grega Repovs, John D Murray, Margaret S Brumbaugh, Anderson M Winkler, Aleksandar Savic, John H Krystal, Godfrey D Pearlson, and David C Glahn. Characterizing thalamo-cortical disturbances in schizophrenia and bipolar illness. Cerebral cortex, 24(12):3116–3130, 2013.
      3. Markus Helmer, Shaun D Warrington, Ali-Reza Mohammadi-Nejad, Jie Lisa Ji, Amber Howell, Benjamin Rosand, Alan Anticevic, Stamatios N Sotiropoulos, and John D Murray. On stability of canonical correlation analysis and partial least squares with application to brain-behavior associations. bioRxiv, 2020.
      4. Richard Dinga, Lianne Schmaal, Brenda WJH Penninx, Marie Jose van Tol, Dick J Veltman, Laura van Velzen, Maarten Mennes, Nic JA van der Wee, and Andre F Marquand. Evaluating the evidence for biotypes of depression: Methodological replication and extension of. NeuroImage: Clinical, 22:101796, 2019.
      5. Cedric Huchuan Xia, Zongming Ma, Rastko Ciric, Shi Gu, Richard F Betzel, Antonia N Kaczkurkin, Monica E Calkins, Philip A Cook, Angel Garcia de la Garza, Simon N Vandekar, et al. Linked dimensions of psychopathology and connectivity in functional brain networks. Nature communications, 9(1):3003, 2018.
      6. Andrew T Drysdale, Logan Grosenick, Jonathan Downar, Katharine Dunlop, Farrokh Mansouri, Yue Meng, Robert N Fetcho, Benjamin Zebley, Desmond J Oathes, Amit Etkin, et al. Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nature medicine, 23(1):28, 2017.
      7. Meichen Yu, Kristin A Linn, Russell T Shinohara, Desmond J Oathes, Philip A Cook, Romain Duprat, Tyler M Moore, Maria A Oquendo, Mary L Phillips, Melvin McInnis, et al. Childhood trauma history is linked to abnormal brain connectivity in major depression. Proceedings of the National Academy of Sciences, 116(17):8582–8590, 2019.
      8. David R Hardoon, Sandor Szedmak, and John Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural computation, 16(12):2639–2664, 2004.
      9. Katrin H Preller, Joshua B Burt, Jie Lisa Ji, Charles H Schleifer, Brendan D Adkinson, Philipp Stämpfli, Erich Seifritz, Grega Repovs, John H Krystal, John D Murray, et al. Changes in global and thalamic brain connectivity in LSD-induced altered states of consciousness are attributable to the 5-HT2A receptor. eLife, 7:e35082, 2018.
      10. Mark A Geyer and Franz X Vollenweider. Serotonin research: contributions to understanding psychoses. Trends in pharmacological sciences, 29(9):445–453, 2008.
      11. H Y Meltzer, B W Massey, and M Horiguchi. Serotonin receptors as targets for drugs useful to treat psychosis and cognitive impairment in schizophrenia. Current pharmaceutical biotechnology, 13(8):1572–1586, 2012.
      12. Anissa Abi-Dargham, Marc Laruelle, George K Aghajanian, Dennis Charney, and John Krystal. The role of serotonin in the pathophysiology and treatment of schizophrenia. The Journal of neuropsychiatry and clinical neurosciences, 9(1):1–17, 1997.
      13. Francine M Benes and Sabina Berretta. Gabaergic interneurons: implications for understanding schizophrenia and bipolar disorder. Neuropsychopharmacology, 25(1):1–27, 2001.
      14. Melis Inan, Timothy J. Petros, and Stewart A. Anderson. Losing your inhibition: Linking cortical gabaergic interneurons to schizophrenia. Neurobiology of Disease, 53:36–48, 2013.
      15. Samuel J Dienel and David A Lewis. Alterations in cortical interneurons and cognitive function in schizophrenia. Neurobiology of disease, 131:104208, 2019.
      16. John E Lisman, Joseph T Coyle, Robert W Green, Daniel C Javitt, Francine M Benes, Stephan Heckers, and Anthony A Grace. Circuit-based framework for understanding neurotransmitter and risk gene interactions in schizophrenia. Trends in neurosciences, 31(5):234–242, 2008.
      17. Anthony A Grace. Dysregulation of the dopamine system in the pathophysiology of schizophrenia and depression. Nature Reviews Neuroscience, 17(8):524, 2016.
      18. John F Enwright III, Zhiguang Huo, Dominique Arion, John P Corradi, George Tseng, and David A Lewis. Transcriptome alterations of prefrontal cortical parvalbumin neurons in schizophrenia. Molecular psychiatry, 23(7): 1606–1613, 2018.
      19. Daniel J Lodge, Margarita M Behrens, and Anthony A Grace. A loss of parvalbumin-containing interneurons is associated with diminished oscillatory activity in an animal model of schizophrenia. Journal of Neuroscience, 29(8): 2344–2354, 2009.
      20. Clare L Beasley and Gavin P Reynolds. Parvalbumin-immunoreactive neurons are reduced in the prefrontal cortex of schizophrenics. Schizophrenia research, 24(3):349–355, 1997.
      21. David A Lewis, Allison A Curley, Jill R Glausier, and David W Volk. Cortical parvalbumin interneurons and cognitive dysfunction in schizophrenia. Trends in neurosciences, 35(1):57–67, 2012.
      22. Alan Anticevic, Margaret S Brumbaugh, Anderson M Winkler, Lauren E Lombardo, Jennifer Barrett, Phillip R Corlett, Hedy Kober, June Gruber, Grega Repovs, Michael W Cole, et al. Global prefrontal and fronto-amygdala dysconnectivity in bipolar i disorder with psychosis history. Biological psychiatry, 73(6):565–573, 2013.
      23. Alex Fornito, Jong Yoon, Andrew Zalesky, Edward T Bullmore, and Cameron S Carter. General and specific functional connectivity disturbances in first-episode schizophrenia during cognitive control performance. Biological psychiatry, 70(1):64–72, 2011.
      24. Avital Hahamy, Vince Calhoun, Godfrey Pearlson, Michal Harel, Nachum Stern, Fanny Attar, Rafael Malach, and Roy Salomon. Save the global: global signal connectivity as a tool for studying clinical populations with functional magnetic resonance imaging. Brain connectivity, 4(6):395–403, 2014.
      25. Michael W Cole, Alan Anticevic, Grega Repovs, and Deanna Barch. Variable global dysconnectivity and individual differences in schizophrenia. Biological psychiatry, 70(1):43–50, 2011.
      26. Naomi R Driesen, Gregory McCarthy, Zubin Bhagwagar, Michael Bloch, Vincent Calhoun, Deepak C D’Souza, Ralitza Gueorguieva, George He, Ramani Ramachandran, Raymond F Suckow, et al. Relationship of resting brain hyperconnectivity and schizophrenia-like symptoms produced by the nmda receptor antagonist ketamine in humans. Molecular psychiatry, 18(11):1199–1204, 2013.
      27. Neil D Woodward, Baxter Rogers, and Stephan Heckers. Functional resting-state networks are differentially affected in schizophrenia. Schizophrenia research, 130(1-3):86–93, 2011.
      28. Zarrar Shehzad, Clare Kelly, Philip T Reiss, R Cameron Craddock, John W Emerson, Katie McMahon, David A Copland, F Xavier Castellanos, and Michael P Milham. A multivariate distance-based analytic framework for connectome-wide association studies. Neuroimage, 93 Pt 1:74–94, Jun 2014. .
      29. Alan J Gelenberg. The catatonic syndrome. The Lancet, 307(7973):1339–1341, 1976.
      30. Jie Lisa Ji, Marjolein Spronk, Kaustubh Kulkarni, Grega Repovš, Alan Anticevic, and Michael W Cole. Mapping the human brain’s cortical-subcortical functional network organization. NeuroImage, 185:35–57, 2019.
      31. August B Hollingshead et al. Four factor index of social status. 1975.
      32. Jaya L Padmanabhan, Neeraj Tandon, Chiara S Haller, Ian T Mathew, Shaun M Eack, Brett A Clementz, Godfrey D Pearlson, John A Sweeney, Carol A Tamminga, and Matcheri S Keshavan. Correlations between brain structure and symptom dimensions of psychosis in schizophrenia, schizoaffective, and psychotic bipolar i disorders. Schizophrenia bulletin, 41(1):154–162, 2015.
    Author Response

      Reviewer #1 (Public Review):

      Buglak et al. describe a role for the nuclear envelope protein Sun1 in endothelial mechanotransduction and vascular development. The study provides a full mechanistic investigation of how Sun1 is achieving its function, which supports the concept that nuclear anchoring is important for proper mechanosensing and junctional organization. The experiments have been well designed and were quantified based on independent experiments. The experiments are convincing and of high quality and include Sun1 depletion in endothelial cell cultures, zebrafish, and in endothelial-specific inducible knockouts in mice.

      We thank the reviewer for their enthusiastic comments and for noting our use of multiple model systems.

      Reviewer #2 (Public Review):

      Endothelial cells mediate the growth of the vascular system but they also need to prevent vascular leakage, which involves interactions with neighboring endothelial cells (ECs) through junctional protein complexes. Buglak et al. report that the EC nucleus controls the function of cell-cell junctions through the nuclear envelope-associated proteins SUN1 and Nesprin-1. They argue that SUN1 controls microtubule dynamics and junctional stability through the RhoA activator GEF-H1.

      In my view, this study is interesting and addresses an important but very little-studied question, namely the link between the EC nucleus and cell junctions in the periphery. The study has also made use of different model systems, i.e. genetically modified mice, zebrafish, and cultured endothelial cells, which confirms certain findings and utilizes the specific advantages of each model system. A weakness is that some important controls are missing. In addition, the evidence for the proposed molecular mechanism should be strengthened.

      We thank the reviewer for their interest in our work and for highlighting the relative lack of information regarding connections between the EC nucleus and cell periphery, and for noting our use of multiple model systems. We thank the reviewer for suggesting additional controls and mechanistic support, and we have made the revisions described below.

      Specific comments:

      1) Data showing the efficiency of Sun1 inactivation in the murine endothelial cells is lacking. It would be best to see what is happening on the protein level, but it would already help a great deal if the authors could show a reduction of the transcript in sorted ECs. The excision of a DNA fragment shown in the lung (Fig. 1-suppl. 1C) is not quantitative at all. In addition, the gel has been run way too short so it is impossible to even estimate the size of the DNA fragment.

      We agree that the DNA excision assay alone is not sufficient to demonstrate excision efficiency. We attempted examination of SUN1 protein levels in mutant retinas via immunofluorescence, but to date we have not found a SUN1 antibody that works in mouse retinal explants. We note that mouse EC isolation protocols enrich for endothelial cells but do not yield 100% purity, so RNA analysis of lung tissue also has caveats. Finally, we contend that our demonstration of a consistent vascular phenotype in Sun1iECKO mutant retinas argues that excision has occurred. To test the efficiency of our excision protocol, we bred Cdh5CreERT2 mice with the ROSAmT/mG excision reporter (cells express tdTomato in the absence of Cre activity and GFP upon Cre-mediated excision; Muzumdar et al., 2007). Utilizing the same excision protocol as used for the Sun1iECKO mice, we observed a high level of excision in retinal vessels only in the presence of Cdh5CreERT2 (Reviewer Figure 1).

      Reviewer Figure 1: Cdh5CreERT2 efficiently excises in endothelial cells of the mouse postnatal retina. (A) Representative images of P7 mouse retinas with the indicated genotypes, stained for ERG (white, nucleus). tdTomato (magenta) is expressed in cells that have not undergone Cre-mediated excision, while GFP (green) is expressed in excised cells. Scale bar, 100μm. (B) Quantification of tdTomato fluorescence relative to GFP fluorescence as shown in A. tdTomato and GFP fluorescence of endothelial cells was measured by creating a mask of the ERG channel. n=3 mice per genotype. ***, p<0.001 by student’s two-tailed unpaired t-test.
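      The quantification in (B) can be sketched as follows. This is a minimal illustration with made-up numbers: `excision_ratio` and all values are hypothetical stand-ins for the actual image analysis, not the pipeline used for the figure.

```python
import numpy as np
from scipy.stats import ttest_ind

def excision_ratio(td_img, gfp_img, erg_mask):
    """Mean tdTomato over mean GFP fluorescence within the EC-nucleus
    (ERG+) mask; low values indicate efficient Cre-mediated excision."""
    return td_img[erg_mask].mean() / gfp_img[erg_mask].mean()

# Toy 2x2 "images" standing in for the masked retina channels:
erg_mask = np.array([[True, True], [True, False]])
td = np.array([[0.1, 0.2], [0.1, 5.0]])
gfp = np.array([[1.0, 1.2], [1.1, 0.1]])
print(round(excision_ratio(td, gfp, erg_mask), 3))   # low ratio -> excised

# Hypothetical per-mouse ratios mirroring the n=3 per genotype design:
cre_pos = np.array([0.08, 0.12, 0.10])   # Cdh5CreERT2-positive (excised)
cre_neg = np.array([9.5, 11.2, 10.1])    # Cre-negative controls
t, p = ttest_ind(cre_pos, cre_neg)       # two-tailed unpaired t-test
print(f"t = {t:.2f}, p = {p:.2e}")
```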

      2) The authors show an increase in vessel density in the periphery of the growing Sun1 mutant retinal vasculature. It would be important to add staining with a marker labelling EC nuclei (e.g. Erg) because higher vessel density might reflect changes in cell size/shape or number, which has also implications for the appearance of cell-cell junctions. More ECs crowded within a small area are likely to have more complicated junctions. Furthermore, it would be useful and straightforward to assess EC proliferation, which is mentioned later in the experiments with cultured ECs but has not been addressed in the in vivo part.

      We concur that ERG staining is important to show any changes in nuclear shape or cell density in the postnatal retina. We now include this data in Figure 1-figure supplement 1F-G. We do not see obvious changes in nuclear shape or number, though we do observe some crowding in Sun1iECKO retinas, consistent with increased density. However, when normalized to total vessel area, we do not observe a significant difference in nuclear signal density in Sun1iECKO mutant retinas relative to controls.

      3) It appears that the loss of Sun1/sun1b in mice and zebrafish is compatible with major aspects of vascular growth and leads to changes in filopodia dynamics and vascular permeability (during development) without severe and lasting disruption of the EC network. It would be helpful to know whether the loss-of-function mutants can ultimately form a normal vascular network in the retina and trunk, respectively. It might be sufficient to mention this in the text.

      We thank the reviewer for pointing this out. It is true that developmental defects in the vasculature resulting from various genetic mutations are often resolved over time. We’ve made text changes to discuss viability of Sun1 global KO mice and lack of perduring effects in sun1 morphant fish, perhaps resulting from compensation by SUN2, which is partially functionally redundant with SUN1 in vivo (Lei et al., 2009; Zhang, et al., 2009) (p. 20).

      4) The only readout after the rescue of the SUN1 knockdown by GEF-H1 depletion is the appearance of VE-cadherin+ junctions (Fig. 6G and H). This is insufficient evidence for a relatively strong conclusion. The authors should at least look at microtubules. They might also want to consider the activation status of RhoA as a good biochemical readout. It is argued that RhoA activity goes up (see Fig. 7C) but there is no data supporting this conclusion. It is also not clear whether "diffuse" GEF-H1 localization translates into increased Rho A activity, as is suggested by the Rho kinase inhibition experiment. GEF-H1 levels in the Western blot in (Fig. 6- supplement 2C) have not been quantitated.

      We agree that analysis of RhoA activity and additional analysis of rescued junctions strengthens our conclusions, so we performed these experiments. New data (Figure 6I-J) show that co-depletion of SUN1 and GEF-H1 rescues junction integrity as measured by biotin-matrix labeling. Interestingly, co-depletion of SUN1 and GEF-H1 does not rescue the reduced microtubule density at the periphery (Figure 6-figure supplement 3B-C), placing GEF-H1 downstream of aberrant microtubule dynamics in SUN1-depleted cells. This is consistent with our model (Figure 8) describing how loss of SUN1 leads to increased microtubule depolymerization, resulting in release and activation of GEF-H1 that goes on to affect actomyosin contractility and junction integrity. In addition, we include images of the junctions in the GEF-H1 single KD (Figure 6-figure supplement 3B-C) and quantify the western blot in Figure 6-figure supplement 3A.

      We performed RhoA activity assays and new data shows that SUN1 depletion results in increased RhoA activation, while co-depletion of SUN1 and GEF-H1 ameliorates this increase (Figure 6-figure supplement 2D). This is consistent with our model in which loss of SUN1 leads to increased RhoA activity via release of GEF-H1 from microtubules. In addition, we now cite a recent study describing that GEF-H1 is activated when unbound to microtubules, with this activation resulting in increased RhoA activity (Azoitei et al., 2019).

      5) The criticism raised for the GEF-H1 rescue also applies to the co-depletion of SUN1 and Nesprin-1. This mechanistic aspect is currently somewhat weak and should be strengthened. Again, Rho A activity might be a useful and quantitative biochemical readout.

      We respectfully point out that we showed that co-depletion of nesprin-1 and SUN1 rescues SUN1 knockdown effects via several readouts, including rescue of junction morphology, biotin labeling, microtubule localization at the periphery, and GEF-H1/microtubule localization. We have moved this data to the main figure (Figure 7B-C, E-F) to better highlight these mechanistic findings. These results are consistent with our model that nesprin-1 effects are upstream of GEF-H1 localization. We also added results showing that nesprin-1 knockdown alone does not affect junction integrity, microtubule density, or GEF-H1/microtubule localization (Figure 7-figure supplement 1B-G).

      Reviewer #3 (Public Review):

      Here, Buglak and coauthors describe the effect of Sun1 deficiency on endothelial junctions. Sun1 is a component of the LINC complex, connecting the inner nuclear membrane with the cytoskeleton. The authors show that in the absence of Sun1, the morphology of the endothelial adherens junction protein VE-cadherin is altered, indicative of increased internalization of VE-cadherin. The change in VE-cadherin dynamics correlates with decreased angiogenic sprouting as shown using in vivo and in vitro models. The study would benefit from a stricter presentation of the data and needs additional controls in certain analyses.

      We thank the reviewer for their insightful comments, and in response we have performed the revisions described below.

      1) The authors implicate the changes in VE-cadherin morphology to be of consequence for "barrier function" and mention barrier function frequently throughout the text, for example in the heading on page 12: "SUN1 stabilizes endothelial cell-cell junctions and regulates barrier function". The concept of "barrier" implies the ability of endothelial cells to restrict the passage of molecules and cells across the vessel wall. This is tested only marginally (Suppl Fig 1F) and these data are not quantified. Increased leakage of 10kDa dextran in a P6-7 Sun1-deficient retina as shown here probably reflects the increased immaturity of the Sun1-deficient retinal vasculature. From these data, the authors cannot state that Sun1 regulates the barrier or barrier function (unclear what exactly the authors refer to when they make a distinction between the barrier as such on the one hand and barrier function on the other). The authors can, if they do more experiments, state that loss of Sun1 leads to increased leakage in the early postnatal stages in the retina. However, if they wish to characterize the vascular barrier, there is a wide range of other tissue that should be tested, in the presence and absence of disease. Moreover, a regulatory role for Sun1 would imply that Sun1 normally, possibly through changes in its expression levels, would modulate the barrier properties to allow more or less leakage in different circumstances. However, no such data are shown. The authors would need to go through their paper and remove statements regarding the regulation of the barrier and barrier function since these are conclusions that lack foundation.

      We thank the reviewer for pointing out that the language used regarding the function and integrity of the junctions is confusing, although we suggest that the endothelial cell properties measured by our assays are typically equated with “barrier function” in the literature. However, we have edited our language to precisely describe our results as suggested by the reviewer.

      2) In Fig 6g, the authors show that "depletion of GEF-H1 in endothelial cells that were also depleted for SUN1 rescued the destabilized cell-cell junctions observed with SUN1 KD alone". However, it is quite clear that Sun1 depletion also affects cell shape and cell alignment and this is not rescued by GEF-H1 depletion (Fig 6g). This should be described and commented on. Moreover please show the effects of GEF-H1 alone.

      We thank the reviewer for pointing out the effects on cell shape. SUN1 depletion typically leads to shape changes consistent with elevated contractility, but this is considered to be downstream of the effects quantified here. We updated the panel in Figure 6G to a more representative image showing cell shape rescue by co-depletion of SUN1 and GEF-H1. We present new data panels showing that GEF-H1 depletion alone does not affect junction integrity (Figure 6I-J). We also present new data showing that co-depletion of GEF-H1 and SUN1 does not rescue microtubule density at the periphery (Figure 6-figure supplement 3B-C), consistent with our model that GEF-H1 activation is downstream of microtubule perturbations induced by SUN1 loss.

      3) In Fig. 6a, the authors show rescue of junction morphology in Sun1-depleted cells by deletion of Nesprin1. The effect of Nesprin1 KD alone is missing.

      We thank the reviewer for this comment, and we now include new panels (Figure 7-figure supplement 1B-G) demonstrating that Nesprin-1 depletion does not affect biotin-matrix labeling, peripheral microtubule density, or GEF-H1/microtubule localization absent co-depletion with SUN1. These findings are consistent with our model that Nesprin-1 loss does not affect cell junctions on its own because it is held in a non-functional complex with SUN1 that is not available in the absence of SUN1.


      Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The study by Teplenin and coworkers assesses the combined effects of localized depolarization and excitatory electrical stimulation in myocardial monolayers. They study the electrophysiological behaviour of cultured neonatal rat ventricular cardiomyocytes expressing the light-gated cation channel Cheriff, allowing them to induce local depolarization of varying area and amplitude, the latter titrated by the applied light intensity. In addition, they used computational modeling to screen for critical parameters determining state transitions and to dissect the underlying mechanisms. Two stable states, thus bistability, could be induced upon local depolarization and electrical stimulation, one state characterized by a constant membrane voltage and a second, spontaneously firing, thus oscillatory state. The resulting 'state' of the monolayer was dependent on the duration and frequency of electrical stimuli, as well as the size of the illuminated area and the applied light intensity, determining the degree of depolarization as well as the steepness of the local voltage gradient. In addition to the induction of oscillatory behaviour, they also tested frequency-dependent termination of induced oscillations.

      Strengths:

      The data from optogenetic experiments and computational modelling provide quantitative insights into the parameter space determining the induction of spontaneous excitation in the monolayer. The most important findings can also be reproduced using a strongly reduced computational model, suggesting that the observed phenomena might be more generally applicable.

      Weaknesses:

      While the study is thoroughly performed and provides interesting mechanistic insights into scenarios of ventricular arrhythmogenesis in the presence of localized depolarized tissue areas, the translational perspective of the study remains relatively vague. In addition, the chosen theoretical approach and the way the data are presented might make it difficult for the wider community of cardiac researchers to understand the significance of the study.

      Reviewer #2 (Public review):

      In the presented manuscript, Teplenin and colleagues use both electrical pacing and optogenetic stimulation to create a reproducible, controllable source of ectopy in cardiomyocyte monolayers. To accomplish this, they use a careful calibration of electrical pacing characteristics (i.e., frequency, number of pulses) and illumination characteristics (i.e., light intensity, surface area) to show that there exists a "sweet spot" where oscillatory excitations can emerge proximal to the optogenetically depolarized region following electrical pacing cessation, akin to pacemaker cells. Furthermore, the authors demonstrate that a high-frequency electrical wave-train can be used to terminate these oscillatory excitations. The authors observed this oscillatory phenomenon both in vitro (using neonatal rat ventricular cardiomyocyte monolayers) and in silico (using a computational action potential model of the same cell type). These are surprising findings and provide a novel approach for studying triggered activity in cardiac tissue.

      The study is extremely thorough and one of the more memorable and grounded applications of cardiac optogenetics in the past decade. One of the benefits of the authors' "two-prong" approach of experimental preps and computational models is that they could probe the number of potential variable combinations much deeper than through in vitro experiments alone. The strong similarities between the real-life and computational findings suggest that these oscillatory excitations are consistent, reproducible, and controllable.

      Triggered activity, which can lead to ventricular arrhythmias and cardiac sudden death, has been largely attributed to sub-cellular phenomena, such as early or delayed afterdepolarizations, and thus to date has largely been studied in isolated single cardiomyocytes. However, these findings have been difficult to translate to tissue and organ-scale experiments, as well-coupled cardiac tissue has notably different electrical properties. This underscores the significance of the study's methodological advances: the use of a constant depolarizing current in a subset of (illuminated) cells to reliably result in triggered activity could facilitate the more consistent evaluation of triggered activity at various scales. An experimental prep that is both repeatable and controllable (i.e., both initiated and terminated through the same means).

      The authors also substantially explored phase space and single-cell analyses to document how this "hidden" bi-stable phenomenon can be uncovered during emergent collective tissue behavior. Calibration and testing of different aspects (e.g., light intensity, illuminated surface area, electrical pulse frequency, electrical pulse count) and other deeper analyses, as illustrated in Appendix 2, Figures 3-8, are significant and commendable.

      Given that the study is computational, it is surprising that the authors did not replicate their findings using well-validated adult ventricular cardiomyocyte action potential models, such as ten Tusscher 2006 or O'Hara 2011. This may have felt out of scope, given the nice alignment of rat cardiomyocyte data between in vitro and in silico experiments. However, it would have been helpful peace-of-mind validation, given the significant ionic current differences between neonatal rat and adult ventricular tissue. It is not fully clear whether the pulse trains could have resulted in the same bi-stable oscillatory behavior, given the longer APD of humans relative to rats. The observed phenomenon certainly would be frequency-dependent and would have required tedious calibration for a new cell type, albeit partially mitigated by the relative ease of in silico experiments.

      For all its strengths, there are likely significant mechanistic differences between this optogenetically tied oscillatory behavior and triggered activity observed in other studies. This is because the constant light-elicited depolarizing current is disrupting the typical resting cardiomyocyte state, thereby altering the balance between depolarizing ionic currents (such as Na+ and Ca2+) and repolarizing ionic currents (such as K+ and Ca2+). The oscillatory excitations appear to later emerge at the border of the illuminated region and non-stimulated surrounding tissue, which is likely an area of high source-sink mismatch. The authors appear to acknowledge differences in this oscillatory behavior and previous sub-cellular triggered activity research in their discussion of ectopic pacemaker activity, which is canonically expected more so from genetic or pathological conditions. Regardless, it is exciting to see new ground being broken in this difficult-to-characterize experimental space, even if the method illustrated here may not necessarily be broadly applicable.

      We thank the reviewers for their thoughtful and constructive feedback, as well as for recognizing the conceptual and technical strengths of our work. We are especially pleased that our integrated use of optogenetics, electrical pacing, and computational modelling was seen as a rigorous and innovative approach to investigating spontaneous excitability in cardiac tissue.

      At the core of our study was the decision to focus exclusively on neonatal rat ventricular cardiomyocytes. This ensured a tightly controlled and consistent environment across experimental and computational settings, allowing for direct comparison and deeper mechanistic insight. While extending our findings to adult or human cardiomyocytes would enhance translational relevance, such efforts are complicated by the distinct ionic properties and action potential dynamics of these cells, as also noted by Reviewer #2. For this foundational study, we chose to prioritize depth and clarity over breadth.

      Our computational domain was designed to faithfully reflect the experimental system. The strong agreement between both domains is encouraging and supports the robustness of our framework. Although some degree of theoretical abstraction was necessary (thereby sometimes making it a bit harder to read), it reflects the intrinsic complexity of the collective behaviours we aimed to capture such as emergent bi-stability. To make these ideas more accessible, we included simplified illustrations, a reduced model, and extensive supplementary material.

      A key insight from our work is the emergence of oscillatory behaviour through interaction of illuminated and non-illuminated regions. Rather than replicating classical sub-cellular triggered activity, this behaviour arises from systems-level dynamics shaped by the imposed depolarizing current and surrounding electrotonic environment. By tuning illumination and local pacing parameters, we could reproducibly induce and suppress these oscillations, thereby providing a controllable platform to study ectopy as a manifestation of spatial heterogeneity and collective dynamics.

      Altogether, our aim was to build a clear and versatile model system for investigating how spatial structure and pacing influence the conditions under which bistability becomes apparent in cardiac tissue. We believe this platform lays strong groundwork for future extensions into more physiologically and clinically relevant contexts.

      In revising the manuscript, we carefully addressed all points raised by the reviewers. We have also responded to each of their specific comments in detail, which are provided below.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      Please find my specific comments and suggestions below:

      (1) Line 64: When first introduced, the concept of 'emergent bi-stability' may not be clear to the reader.

      We concur that the full breadth of the concept of emergent bi-stability may not be immediately clear upon first mention. Nonetheless, its components have been introduced separately: “emergent” was linked to multicellular behaviour in line 63, while “bi-stability” was described in detail in lines 39–56. We therefore believe that readers could form an intuitive understanding of the combined term, which will be further clarified as the manuscript develops. To further ease comprehension of the reader, we have added the following clarification to line 64:

      “Within this dynamic system of cardiomyocytes, we investigated emergent bi-stability (a concept that will be explained more thoroughly later on) in cell monolayers under the influence of spatial depolarization patterns.”

      (2) Lines 67-80: While the introduction until line 66 is extremely well written, the introduction of both cardiac arrhythmia and cardiac optogenetics could be improved. It is especially surprising that miniSOG is first mentioned as a tool for optogenetic depolarisation of cardiomyocytes, as the authors would probably agree that Channelrhodopsins are by far the most commonly applied tools for optogenetic depolarisation (please also refer to the literature by others in this respect). In addition, miniSOG has side effects other than depolarisation, and thus cannot be the tool of choice when not directly studying the effects of oxidative stress or damage.

      The reviewer is absolutely correct in noting that channelrhodopsins are the most commonly applied tools for optogenetic depolarisation. We introduced miniSOG primarily for historical context: the effects of specific depolarization patterns on collective pacemaker activity were first observed with this tool (Teplenin et al., 2018). In that paper, we also reported ultralong action potentials, occurring as a side effect of cumulative miniSOG-induced ROS damage. In the following paragraph (starting at line 81), we emphasize that membrane potential can be controlled much better using channelrhodopsins, which is why we employed them in the present study.

      (3) Line 78: I appreciate the concept of 'high curvature', but please always state which parameter(s) you are referring to (membrane voltage in space/time, etc?).

      We corrected our statement to include the specification of space curvature of the depolarised region:

      “In such a system, it was previously observed that spatiotemporal illumination can give rise to collective behaviour and ectopic waves (Teplenin et al. (2018)) originating from illuminated/depolarised regions (with high spatial curvature).”

      (4) Line 79: 'bi-stable state' - not yet properly introduced in this context.

      The bi-stability mentioned here refers back to single cell bistability introduced in Teplenin et al. (2018), which we cited again for clarity.

      “These waves resulted from the interplay between the diffusion current and the single cell bi-stable state (Teplenin et al. (2018)) that was induced in the illuminated region.”

      (5) Line 84-85: 'these ion channels allow the cells to respond' - please describe the channel used; and please correct: the channels respond to light, not the cells. Re-ordering this paragraph may help, because first you introduce channels for depolarization, then you go back to both de- and hyperpolarization. On the same note, which channels can be used for hyperpolarization of cardiomyocytes? I am not aware of any, even WiChR shows depolarizing effects in cardiomyocytes during prolonged activation (Vierock et al. 2022). Please delete: 'through a direct pathway' (Channelrhodopsins a directly light-gated channels, there are no pathways involved).

      We realised that the confusion arose from our use of incorrect terminology: we mistakenly wrote hyperpolarisation instead of repolarisation. In addition to channelrhodopsins such as WiChR, other tools can also induce a repolarising effect, including light-activatable chloride pumps (e.g., JAWS). However, to improve clarity, we recognize that repolarisation is not relevant to our manuscript and therefore decided to remove its mention (see below). Regarding the reported depolarising effects of WiChR in Vierock et al. (2022), we speculate that these may arise either from the specific phenotype of the cardiomyocytes used in the study, i.e. human induced pluripotent stem cell-derived atrial myocytes (aCMs), or from the particular ionic conditions applied during patch-clamp recordings (e.g., a bath solution containing 1 mM KCl). Notably, even after prolonged WiChR activation, the aCMs maintained a strongly negative maximum diastolic potential of approximately –55 mV.

      “Although effects of illuminating miniSOG with light might lead to formation of depolarised areas, it is difficult to control the process precisely since it depolarises cardiomyocytes indirectly. Therefore, in this manuscript, we used light-sensitive ion channels to obtain more refined control over cardiomyocyte depolarisation. These ion channels allow the cells to respond to specific wavelengths of light, facilitating direct depolarisation (Ördög et al. (2021, 2023)). By inducing cardiomyocyte depolarisation only in the illuminated areas, optogenetics enables precise spatiotemporal control of cardiac excitability, an attribute we exploit in this manuscript (Appendix 2 Figure 1).”

      (6) Figure 1: What would be the y-axis of the 'energy-like curves' in B? What exactly did you plot here?

      The graphs in Figure 1B are schematic representations intended to clarify the phenomenon for the reader. They do not depict actual data from any simulation or experiment. We clarified this misunderstanding by specifying that Figure 1B is a schematic representation of the effects at play in this paper.

      “(B) Schematic representation showing how light intensity influences collective behaviour of excitable systems, transitioning between a stationary state (STA) at low illumination intensities and an oscillatory state (OSC) at high illumination intensities. Bi-stability occurs at intermediate light intensities, where transitions between states are dependent on periodic wave train properties. TR. OSC, transient oscillations.”

      To expand slightly beyond the paper: our schematic representation was inspired by a common visualization in dynamical systems used to illustrate bi-stability (for an example, see Fig. 3 in Schleimer, J. H., Hesse, J., Contreras, S. A., & Schreiber, S. (2021). Firing statistics in the bistable regime of neurons with homoclinic spike generation. Physical Review E, 103(1), 012407.). In this framework, the y-axis can indeed be interpreted as an energy landscape, which is related to a probability measure through the Boltzmann distribution: p ∝ exp(−E∕k<sub>B</sub>T), i.e. E = −k<sub>B</sub>T ln(p). Here, p denotes the probability of occupying a particular state (STA or OSC). This probability can be estimated from the area (BCL × number of pulses) falling within each state, as shown in Fig. 4C. Since an attractor corresponds to a high-probability state, it naturally appears as a potential well in the landscape.
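      As an illustrative sketch (not part of the manuscript's analysis pipeline, and using a made-up grid of classified outcomes rather than the Fig. 4C data), the state probabilities and the corresponding energy-like landscape could be estimated as follows:

```python
import math

# Hypothetical grid of outcomes over the (BCL, number-of-pulses) parameter
# plane, each cell classified as "STA" (stationary) or "OSC" (oscillatory).
# In the paper this classification would come from Fig. 4C; here it is invented.
grid = [
    ["STA", "STA", "OSC", "OSC"],
    ["STA", "OSC", "OSC", "OSC"],
    ["STA", "STA", "STA", "OSC"],
]

def state_probability(grid, state):
    """Fraction of the sampled parameter area occupied by `state`."""
    cells = [s for row in grid for s in row]
    return cells.count(state) / len(cells)

p_sta = state_probability(grid, "STA")
p_osc = state_probability(grid, "OSC")

# Boltzmann-style energy landscape (with k_B*T set to 1): a high-probability
# attractor appears as a deep potential well, as in the schematic of Fig. 1B.
E_sta = -math.log(p_sta)
E_osc = -math.log(p_osc)

print(p_sta, p_osc, E_sta, E_osc)
```

      The deeper of the two wells marks the state the system is more likely to be found in for the sampled parameter region.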

      (7) Lines 92-93: 'this transition resulted for the interaction of an illuminated region with depolarized CM and an external wave train' - please consider rephrasing (it is not the region interacting with depolarized CM; and the external wave train could be explained more clearly).

      We rephrased our unclear sentence as follows:

      “This transition resulted from the interaction of depolarized cardiomyocytes in an illuminated region with an external wave train not originating from within the illuminated region.”

      (8) Figure 2 and elsewhere: When mentioning 'frequency', please state frequency values and not cycle lengths. Please also reconsider your distinction between high and low frequencies; 200 ms (5 Hz) is actually the normal heart rate for neonatal rats (300 bpm).

      In the revised version, we have clarified frequency values explicitly and included them alongside period values wherever frequency is mentioned, to avoid any ambiguity. We also emphasize that our use of "high" and "low" frequency is strictly a relative distinction within the context of our data, and not meant to imply a biological interpretation.

      (9) Lines 129-131: Why not record optical maps? Voltage dynamics in the transition zone between depolarised and non-depolarised regions might be especially interesting to look at?

      We would like to clarify that optical maps were recorded for every experiment, and all experimental traces of cardiac monolayer activity were derived from these maps. We agree with the reviewer that the voltage dynamics in the transition zone are particularly interesting. However, we selected the data representations that, in our view, best highlight the main mechanisms. When we analysed full voltage profiles, they did not add insights beyond these main mechanisms. As the other reviewer noted, the manuscript already presents a wide range of regimes, so we decided not to introduce further complexity.

      (10) Lines 156-157: Why was the model not adapted to match the biophysical properties (e.g., kinetics, ion selectivity, light sensitivity) of Cheriff?

      The model was not adapted to the biophysical properties of Cheriff, because this would entail a whole new study involving extensive patch-clamping experiments, fitting, and calibration to model the correct properties of the ion channel. Beyond considerations of time efficiency, incorporating more specific modelling parameters would not change the essence of our findings. While numeric parameter ranges might shift, the core results would remain unchanged. This is a result of our experimental design, in which we applied constant illumination of long duration (6 s or longer), thus making differences in the kinetic properties of the optogenetic tool irrelevant. In addition, we were able to observe qualitatively similar phenomena using many other depolarising optogenetic tools (e.g. ChR2, ReaChR, CatCh and more) in our in-vitro experiments. We ended up with Cheriff as our optotool-of-choice for the practical reasons of good light sensitivity and a non-overlapping spectrum with our fluorescent dyes.

      Therefore, computationally using a more general depolarising ion channel hints at the more general applicability of the observed phenomena, supporting our claim of a universal mechanism (demonstrated experimentally with CheRiff and computationally with ChR2).

      (11) Line 158: 1.7124 mW/mm^2 - While I understand that this is the specific intensity used as input in the model, I am convinced that the model is not as accurate to predict behaviour at this specific intensity (4 digits after the comma), especially given that the model has not been adapted to Cheriff (probably more light sensitive than ChR2). Can this be rephrased?

      We did not aim for quantitative correspondence between the computational model and the biological experiments, but rather for qualitative agreement and mechanistic insight (see line 157). Qualitative comparisons are computationally obtained across a whole range of intensities, as demonstrated in the 3D diagram of Fig. 4C. We wanted to demonstrate that at one fixed light intensity (chosen to be 1.7124 mW/mm^2 for the clearest effect), it was possible for all three states (STA, OSC, TR. OSC.) to coexist depending on the number of pulses and their period. Therefore, the specific intensity used in the computational model is correct, and for reproducibility, we have left it unchanged while clarifying that it refers specifically to the in silico model:

      “Simulating at a fixed constant illumination of 1.7124 𝑚𝑊∕𝑚𝑚<sup>2</sup> and a fixed number of 4 pulses, frequency dependency of collective bi-stability was reproduced in Figure 4A.”

      (12) Lines 160, 165, and elsewhere: 'Once again, Once more' - please delete or rephrase.

      We agree that we could have written these binding words better and reformulated them to:

      “Similar to the experimental observations, only intermediate electrical pacing frequencies (500-𝑚𝑠 period) caused transitions from collective stationary behaviour to collective oscillatory behaviour and ectopic pacemaker activity had periods (710 𝑚𝑠) that were different from the stimulation train period (500 𝑚𝑠). Figure 4B shows the accumulation of pulses necessary to invoke a transition from the collective stationary state to the collective oscillatory state at a fixed stimulation period (600 𝑚𝑠). Also in the in silico simulations, ectopic pacemaker activity had periods (750 𝑚𝑠) that were different from the stimulation train period (600 𝑚𝑠). Also for the transient oscillatory state, the simulations show frequency selectivity (Appendix 2 Figure 4B).”

      (13) Line 171: 'illumination strength': please refer to 'light intensity'.

      We have revised our formulation to now refer specifically to “light intensity”:

      “We previously identified three important parameters influencing such transitions: light intensity, number of pulses, and frequency of pulses.”

      (14) Lines 187-188: 'the illuminated region settles into this period of sending out pulses' - please rephrase, the meaning is not clear.

      We reformulated our sentence to make its content more clear to the reader:

      “For the conditions that resulted in stable oscillations, the green vertical lines in the middle and right slices represent the natural pacemaker frequency in the oscillatory state. After the transition from the stationary towards the oscillatory state, oscillatory pulses emerging from the illuminated region gradually dampen and stabilize at this period, corresponding to the natural pacemaker frequency.”

      (15) Figure 7: A)- please state in the legend which parameter is plotted on the y-axis (it is included in the main text, but should be provided here as well); C) The numbers provided in brackets are confusing. Why is (4) a high pulse number and (3) a low pulse number? Why not just state the number of pulses and add alpha, beta, gamma, and delta for the panels in brackets? I suggest providing the parameters (e.g., 800 ms cycle length, 2 pulses, etc) for all combinations, but not rate them with low, high, etc. (see also comment above).

      We appreciate the reviewer’s comments and have revised the caption for figure 7, which now reads as follows:

      “Figure 7. Phase plane projections of pulse-dependent collective state transitions. (A) Phase space trajectories (displayed in the Voltage – x<sub>r</sub> plane) of the NRVM computational model show a limit cycle (OSC) that is not lying around a stable fixed point (STA). (B) Parameter space slice showing the relationship between stimulation period and number of pulses for a fixed illumination intensity (1.72 𝑚𝑊∕𝑚𝑚<sup>2</sup>) and size of the illuminated area (67 pixels edge length). Letters correspond to the graphs shown in C. (C) Phase space trajectories for different combinations of stimulus train period and number of pulses (α: 800 ms cycle length + 2 pulses, β: 800 ms cycle length + 4 pulses, γ: 250 ms cycle length + 3 pulses, δ: 250 ms cycle length + 8 pulses). α and δ do not result in a transition from the resting state to ectopic pacemaker activity, as under these circumstances the system moves towards the stationary stable fixed point from outside and inside the stable limit cycle, respectively. However, for β and γ, the stable limit cycle is approached from outside and inside, respectively, and ectopic pacemaker activity is induced.”

      (16) Line 258: 'other dimensions by the electrotonic current' - not clear, please rephrase and explain.

      We realized that our explanation was somewhat convoluted and have therefore changed the text as follows:

      “Rather than producing oscillations, the system returns to the stationary state along dimensions other than those shown in Figure 7C (Voltage and x<sub>r</sub>), as evidenced by the phase space trajectory crossing itself. This return is mediated by the electrotonic current.”

      (17) Line 263: ‘increased too much’ – please rephrase using scientific terminology.

      We rephrased our sentence to:

      “However, this is not a Hopf bifurcation, because in that case the system would not return to the stationary state when the number of pulses exceeds a critical threshold.”

      (18) Line 275: 'stronger diffusion/electrotonic influence from the non-illuminated region' - not sure diffusion is the correct term here. Please explain by taking into account the membrane potential. Please make sure to use proper terminology. The same applies to lines 281-282.

      We appreciate this comment, which prompted us to revisit on our text. We realised that some sections could be worded more clearly, and we also identified an error in the legend of Supplementary Figure 7. The corresponding corrections are provided below:

      “However, repolarisation reserve does have an influence, prolonging the transition when it is reduced (Appendix 2 Figure 7). This effect can be observed either by moving further from the boundary of the illuminated region, where the electrotonic influence from the non-illuminated region is weaker, or by introducing ionic changes, such as a reduction in I<sub>Ks</sub> and/or I<sub>to</sub>. For example, because the electrotonic influence is weaker in the center of the illuminated region, the voltage there is not pulled down toward the resting membrane potential as quickly as in cells at the border of the illuminated zone.”

      “To add a multicellular component to our single cell model we introduced a current that replicates the effect of cell coupling and its associated electrotonic influence.”

      “Figure 7. The effect of ionic changes on the termination of pacemaker activity. The mechanism that moves the oscillating illuminated tissue back to the stationary state after high frequency pacing is dependent on the ionic properties of the tissue, i.e. lower repolarisation reserves (20% 𝐼<sub>𝐾𝑠</sub> + 50% 𝐼<sub>𝑡𝑜</sub>) are associated with longer transition times.”

      (19) Line 289: -58 mV (to be corrected), -20 mV, and +50 mV - please justify the selection of parameters chosen. This also applies elsewhere- the selection of parameters seems quite arbitrary, please make sure the selection process is more transparent to the reader.

      Our choice of parameters was guided by the dynamical properties of the illuminated cells as well as by illustrative purposes. The value of –58 mV corresponds to the stimulation threshold of the model. The values of 50 mV and –20 mV match those used for single-cell stimulation (Figure 8C2, right panel), producing excitable and bistable dynamics, respectively. We refer to this point in line 288 with the phrase “building on this result.” To maintain conciseness, we did not elaborate on the underlying reasoning within the manuscript and instead reported only the results.

      We also corrected the previously missed minus sign: -58 mV.

      (20) Figure 8 and corresponding text: I don't understand what stimulation with a voltage means. Is this an externally applied electric field? Or did you inject a current necessary to change the membrane voltage by this value? Please explain.

      Stimulation with a specific voltage is a standard computational technique and can be likened to performing a voltage-clamp experiment on each individual cell. In this approach, the voltage of every cell in the tissue is briefly forced to a defined value.

      (21) Figure 8C- panel 2: Traces at -20 mV and + 50 mV are identical. Is this correct? Please explain.

      Yes, that is correct. The cell responds similarly to a voltage stimulus of -20 mV or one of 50 mV, because both values are well above the excitation threshold of a cardiomyocyte.

      (22) Line 344 and elsewhere: 'diffusion current' - This is probably not the correct terminology for gap-junction mediated currents. Please rephrase.

A diffusion current is a mathematical formulation of a gap-junction-mediated current here, so, depending on the background of the reader, either term may be used, each emphasising a different aspect of the results. In a mathematical modelling context one often refers to a diffusion current because cardiomyocyte monolayers and tissues can be modelled using a reaction-diffusion equation. From the perspective of fine-grained biological and biophysical detail, one uses the term gap-junction-mediated current. Our choice is motivated by the main target audience we have in mind, namely interdisciplinary researchers with a core background in mathematics, physics, or computer science.

      However, to not exclude our secondary target audience of biological and medical readers we now clarified the terminology, drawing the parallel between the different fields of study at line 79:

      “These waves resulted from the interplay between the diffusion current (also known in biology/biophysics as the gap junction mediated current) and the bi-stable state that was induced in the illuminated region.”
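To make the parallel between the two terminologies concrete, the correspondence can be written down in the generic monodomain reaction-diffusion form (standard textbook notation; the symbols below are illustrative and not the specific equations or parameter values of our model):

```latex
% Generic monodomain reaction-diffusion equation for the membrane voltage V(x,t).
% The diffusion term D \nabla^2 V is the continuum-level description of the
% gap-junction-mediated (coupling) current between neighbouring cells;
% I_ion collects the ionic membrane currents and C_m is the membrane capacitance.
\frac{\partial V}{\partial t}
  = \underbrace{D \,\nabla^{2} V}_{\substack{\text{diffusion term}\\ \text{(gap-junction current)}}}
  \;-\; \frac{I_{\mathrm{ion}}(V)}{C_{m}}
```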

      (23) Lines 357-58: 'Such ectopic sources are typically initiated by high frequency pacing' - While this might be true during clinical testing, how would you explain this when not externally imposed? What could be biological high-frequency triggers?

      Biological high-frequency triggers could include sudden increases in heart rates, such as those induced by physical activity or emotional stress. Another possibility is the occurrence of paroxysmal atrial or ventricular fibrillation, which could then give rise to an ectopic source.

      (24) Lines 419-420: 'large ionic cell currents and small repolarising coupling currents'. Are coupling currents actually small in comparison to cellular currents? Can you provide relative numbers (~ratio)?

      Coupling currents are indeed small compared to cellular currents. This can be inferred from the I-V curve shown in Figure 8C1, which dips below 0 and creates bi-stability only because of the small coupling current. If the coupling current were larger, the system would revert to a monostable regime. To make this more concrete, we have now provided the exact value of the coupling current used in Figure 8C1.

      “Otherwise, if the hills and dips of the N-shaped steady-state IV curve were large (Figure 8C-1), they would have similar magnitudes as the large currents of fast ion channels, preventing the subtle interaction between these strong ionic cell currents and the small repolarising coupling currents (-0.103649 ≈ 0.1 pA).”

      (25) Line 426: Please explain how ‘voltage shocks’ were modelled.

      We would like to refer the reviewer to our response to comment (20) regarding how we model voltage shocks. In the context of line 426, a typical voltage shock corresponds to a tissue-wide stimulus of 50 mV. Independent of our computational model, line 426 also cites other publications showing that, in clinical settings, high-voltage shocks are unable to terminate ectopic sustained activity, consistent with our findings.

      (26) Lines 429 ff: 0.2pA/pF would correspond to 20 pA for a small cardiomyocyte of 100 pF, this current should be measurable using patch-clamp recordings.

In trying to be succinct, we may have caused some confusion. The difference between the dips (-0.07 pA/pF) and hills (_≈_0.11 pA/pF) is approximately 0.18 pA/pF. For a small cardiomyocyte, this corresponds to deviations from zero of roughly ±10 pA. Considering that typical RMS noise levels in whole-cell patch-clamp recordings range from 2-10 pA, it is understandable that detecting these peaks and dips in an I-V curve (the average current after holding a voltage for an extended period) is difficult. Achieving statistical significance would therefore require patching a large number of cells.
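For concreteness, the arithmetic behind the ±10 pA figure (taking the 100 pF cell capacitance mentioned by the reviewer as an illustrative value):

```latex
% Current density (pA/pF) times membrane capacitance (pF) gives absolute current (pA):
I_{\text{dip}}  = -0.07\ \tfrac{\mathrm{pA}}{\mathrm{pF}} \times 100\ \mathrm{pF} = -7\ \mathrm{pA},
\qquad
I_{\text{hill}} \approx 0.11\ \tfrac{\mathrm{pA}}{\mathrm{pF}} \times 100\ \mathrm{pF} = 11\ \mathrm{pA}
```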

      Given the already extensive scope of our manuscript in terms of techniques and concepts, we decided not to pursue these additional patch-clamp experiments.

      Reviewer #2 (Recommendations for the authors):

      Given the deluge of conditions to consider, there are several areas of improvement possible in communicating the authors' findings. I have the following suggestions to improve the manuscript.

      (1) Please change "pulse train" straight pink bar OR add stimulation marks (such as "*", or individual pulse icons) to provide better visual clarity that the applied stimuli are "short ON, long OFF" electrical pulses. I had significant initial difficulty understanding what the pulse bars represented in Figures 2, 3, 4A-B, etc. This may be partially because stimuli here could be either light (either continuous or pulsed) or electrical (likely pulsed only). To me, a solid & unbroken line intuitively denotes a continuous stimulation. I understand now that the pink bar represents the entire pulse-train duration, but I think readers would be better served with an improvement to this indicator in some fashion. For instance, the "phases" were much clearer in Figures 7C and 8D because of how colour was used on the Vm(t) traces. (How you implement this is up to you, though!)

      We have addressed the reviewer’s concern and updated the figures by marking each external pulse with a small vertical line (see below).

      (2) Please label the electrical stimulation location (akin to the labelled stimulation marker in circle 2 state in Figure 1A) in at least Figures 2 and 4A, and at most throughout the manuscript. It is unclear which "edge" or "pixel" the pulse-train is originating from, although I've assumed it's the left edge of the 2D tissue (both in vitro and silico). This would help readers compare the relative timing of dark blue vs. orange optical signal tracings and to understand how the activation wavefront transverses the tissue.

      We indicated the pacing electrode in the optical voltage recordings with a grey asterisk. For the in silico simulations, the electrode was assumed to be far away, and the excitation was modelled as a parallel wave originating from the top boundary, indicated with a grey zone.

      (3) Given the prevalence of computational experiments in this study, I suggest considering making a straightforward video demonstrating basic examples of STA, OSC, and TR.OSC states. I believe that a video visualizing these states would be visually clarifying to and greatly appreciated by readers. Appendix 2 Figure 3 would be the no-motion visualization of the examples I'm thinking of (i.e., a corresponding stitched video could be generated for this). However, this video-generation comment is a suggestion and not a request.

      We have included a video showing all relevant states, which is now part of the Supplementary Material.

      (4) Please fix several typos that I found in the manuscript:

      (4A) Line 279: a comma is needed after i.e. when used in: "peculiar, i.e. a standard". However, this is possibly stylistic (discard suggestion if you are consistent in the manuscript).

      (4B) Line 382: extra period before "(Figure 3C)".

      (4C) Line 501: two periods at end of sentence "scientific purposes.." .

      We would like to thank the reviewer for pointing out these typos. We have corrected them and conducted an additional check throughout the manuscript for minor errors.

    1. Reviewer #2 (Public review):

      A summary of what the authors were trying to achieve.

The authors aim to determine whether the gene Hsd17b7 is essential for hair cell function and, if so, to elucidate the underlying mechanism, specifically the metabolic role of HSD17B7 in cholesterol biosynthesis. They use animals, tissues, or data from zebrafish, mouse, and human patients.

      Strengths:

(1) This is the first study of Hsd17b7 in the zebrafish (a previous report identified this gene as a hair cell marker in the mouse utricle).

(2) The authors demonstrate that Hsd17b7 is expressed in hair cells of zebrafish and the mouse cochlea.

(3) In zebrafish larvae, a likely KO of the Hsd17b7 gene causes a mild phenotype in an acoustic/vibrational assay, which also involves a motor response.

(4) In zebrafish larvae, a likely KO of the Hsd17b7 gene causes a mild reduction in lateral line neuromast hair cell number and a mild decrease in the overall mechanotransduction activity of hair cells, assayed with a fluorescent dye entering the mechanotransduction channels.

(5) When HSD17B7 is overexpressed in a cell line, it localizes to the ER, and an increase in cytoplasmic cholesterol puncta is detected. Instead, when a truncated version of HSD17B7 is overexpressed, it forms aggregates that co-localize with cholesterol.

(6) It seems that the level of cholesterol in crista and neuromast hair cells decreases when Hsd17b7 is defective (but see comment below).

      Weakness:

      (1) The statement that HSD17B7 is "highly" expressed in sensory hair cells in mice and zebrafish seems incorrect for zebrafish:

(a) The data do not support the notion that HSD17B7 is "highly expressed" in zebrafish. Compared to other genes (TMC1, TMIE, and others), the HSD17B7 level of expression in neuromast hair cells is low (Figure 1F), and by extension (Figure 1C), also in all hair cells. This interpretation is in line with the weak detection of an mRNA signal by ISH (Figure 1G I"). On this note, the staining reported in I" does not seem to label the cytoplasm of neuromast hair cells. An antisense probe control, along with a positive control (such as TMC1 or another), is necessary to interpret the ISH signal in the neuromast.

      (b) However, this is correct for mouse cochlear hair cells, based on single-cell RNA-seq published databases and immunostaining performed in the study. However, the specificity of the anti-HSD17B7 antibody used in the study (in immunostaining and western blot) is not demonstrated. Additionally, it stains some supporting cells or nerve terminals. Was that expression expected?

      (2) A previous report showed that HSD17B7 is expressed in mouse vestibular hair cells by single-cell RNAseq and immunostaining in mice, but it is not cited:

      Spatiotemporal dynamics of inner ear sensory and non-sensory cells revealed by single-cell transcriptomics.

      Jan TA, Eltawil Y, Ling AH, Chen L, Ellwanger DC, Heller S, Cheng AG.

      Cell Rep. 2021 Jul 13;36(2):109358. doi: 10.1016/j.celrep.2021.109358.

      (3) Overexpressed HSD17B7-EGFP C-terminal fusion in zebrafish hair cells shows a punctiform signal in the soma but apparently does not stain the hair bundles. One limitation is the consequence of the C-terminal EGFP fusion to HSD17B7 on its function, which is not discussed.

      (4) A mutant Zebrafish CRISPR was generated, leading to a truncation after the first 96 aa out of the 340 aa total. It is unclear why the gene editing was not done closer to the ATG. This allele may conserve some function, which is not discussed.

(5) The hsd17b7 mutant allele has a slightly reduced number of genetically labeled hair cells (quantified as a 16% reduction, estimated at 1-2 of the 9 HCs present per neuromast). Of note, it is unclear what criteria were used to select HCs in the picture. Some Brn3C:mGFP-positive cells are apparently not included in the quantifications (Figure 2F, Figure 5A).

(6) The authors used FM4-64 staining to indirectly evaluate hair cell mechanotransduction activity. They found a 40% reduction in labeling intensity in the HCs of the lateral line neuromast. Because the reduction in hair cell number (16%) is smaller than the reduction in FM4-64 staining, the authors argue that this indicates the defect primarily affects mechanotransduction function rather than the number of HCs. This argument is insufficient. Indeed, a scenario could be that some HCs died and have been eliminated, while others are also engaged in this path and no longer perform the MET function. The numbers would then match. If single-cell staining can be resolved, one could determine the FM4-64 intensity per cell. It would also be informative to evaluate the potential occurrence of cell death in this mutant. On another note, the current quantification of the FM4-64 fluorescence intensity and its normalization are not described in the methods. More importantly, an independent and more direct experimental assay is needed to confirm this point. For example, using a GCaMP6-T2A-RFP allele for Ca2+ imaging and signal normalization.

(7) The authors used an acoustic startle response to elicit a behavioral response from the larvae and evaluate the "auditory response". They found a significant decrease in the response (movement trajectory, swimming velocity, distance) in the hsd17b7 mutant. The authors conclude that this gene is crucial for the "auditory function in zebrafish".

      This is an overstatement:

(a) First, this test is adequate as a screening tool to identify animals that have completely lost the behavioral response to this acoustic and vibrational stimulation, which also involves a motor response. However, additional tests are required to confirm an auditory origin of the defect, such as Auditory Evoked Potential recordings, or for the vestibular function, the Vestibulo-Ocular Reflex.

(b) Secondly, the behavioral defects observed in the mutant compared to the control are significantly different, but the differences are slight, contained within the Standard Deviation (20% for velocity, 25% for distance). To this point, the Figure 2B and C plots are misleading because their y-axes do not start at 0.

      (8) Overexpression of HSD17B7 in cell line HEI-OC1 apparently "significantly increases" the intensity of cholesterol-related signal using a genetically encoded fluorescent sensor (D4H-mCherry). However, the description of this quantification (per cell or per surface area) and the normalization of the fluorescent signal are not provided.

      (9) When this experiment is conducted in vivo in zebrafish, a reduction in the "DH4 relative intensity" is detected (same issue with the absence of a detailed method description). However, as the difference is smaller than the standard deviation, this raises questions about the biological relevance of this result.

(10) The authors identified a deaf child as a carrier of a nonsense mutation in HSD17B7, which is predicted to truncate the HSD17B7 protein before the transmembrane domain. However, as no genetic linkage analysis is possible, the causality is not demonstrated.

(11) Previous results obtained from the mouse Hsd17b7 KO (citation below) are not described in sufficient detail. This is critical because, in that paper, loss of Hsd17b7 function in the mouse is embryonically lethal, whereas no apparent phenotype was reported in heterozygotes, which are viable and fertile. Therefore, it seems unlikely that heterozygous mice exhibit hearing loss or vestibular defects; however, it would be essential to verify this to support the notion that the truncated allele found in one patient is causal.

      Hydroxysteroid (17beta) dehydrogenase 7 activity is essential for fetal de novo cholesterol synthesis and for neuroectodermal survival and cardiovascular differentiation in early mouse embryos.

      Jokela H, Rantakari P, Lamminen T, Strauss L, Ola R, Mutka AL, Gylling H, Miettinen T, Pakarinen P, Sainio K, Poutanen M.<br /> Endocrinology. 2010 Apr;151(4):1884-92. doi: 10.1210/en.2009-0928. Epub 2010 Feb 25.

(12) The authors used this truncated protein in their startle response and FM4-64 assays. First, they show that, contrary to the WT version, this truncated form cannot rescue their phenotypes when overexpressed. Secondly, they tested whether this truncated protein could recapitulate the startle reflex and FM4-64 phenotypes of the mutant allele. At the homozygous level (not mentioned, by the way), it can apparently do so, to a lesser degree than the previous mutant. Again, the differences are within the Standard Deviation of the averages. The authors conclude that this mutation found in humans has a "negative effect" on hearing, which again is not supported by the data.

(13) The authors looked at the distribution of HSD17B7 in a cell line. The WT version goes to the ER, while the truncated one forms aggregates. An interesting experiment consisted of co-expressing both constructs (Figure S6) to see whether the truncated version would mislocalize the WT version, which could be a mechanism for a dominant phenotype. However, this is not the case.

(14) Through mass spectrometry of HSD17B7 proteins in the cell line, they identified a protein involved in ER retention, RER1. By biochemistry in a cell line, they show that the truncation prevents the interaction of HSD17B7 with RER1, which would explain the subcellular localization.


(15) Information on the HSD17B7 antibody and validation of its specificity are not presented. It seems to be the same antibody used on mice by IF and on zebrafish by Western blot. If so, the antibody could be used on zebrafish by IF to localize the endogenous protein (rather than the overexpressed protein, as done here). Secondly, the specificity of the antibody should be verified on the mutant allele. That would bring confidence that the staining in the mouse is likely specific.

    1. The team studied genomes from strains with a worldwide distribution and of different ages and determined that Y. pestis has an unstable molecular clock. This makes it particularly difficult to measure the rate at which mutations accumulate in its genome over time, which are then used to calculate dates of emergence. Because Y. pestis evolves at a very slow pace, it is almost impossible to determine exactly where it originated.

      This explains the scientific limitation that creates the big debate. Since the plague genome evolves so slowly, they can't even tell where it started!

    1. These changes from the Y. pseudotuberculosis progenitor included loss of insecticidal activity, increased resistance to antibacterial factors in the flea midgut, and extending Yersinia biofilm-forming ability to the flea host environment.

      This is the technical explanation for the famous "blocked flea" which is the key to the rat theory. The biofilm is what clogs the flea's gut and forces it to bite more.

    2. the interactions of Y. pestis with its flea vector that lead to colonization and successful transmission are the result of a recent evolutionary adaptation that required relatively few genetic changes.

      This is a great detail for my argument! The article calls the flea jump a "recent evolutionary adaptation." This suggests the mechanism might have been imperfect or inefficient in the 14th century, which actually strengthens the argument against the rat-flea model being the sole cause of the Black Death's incredibly fast spread. It provides scientific backing for why I need to seriously consider the human ectoparasite model and not just discard it immediately.

    1. Alternative putative etiologies of the Black Death include a viral hemorrhagic fever [16] or a currently unknown pathogen [19]. In part, these alternative etiologies reflect apparent discrepancies between historical observations of extremely rapid spread of mortality during the Black Death with the dogma based on Indian epidemiology that plague is associated with transmission from infected rats via blocked fleas

This is a perfect summary of the whole problem I'm trying to solve. Historians originally doubted the Y. pestis theory because the plague spread way too fast to fit the slow rat-flea model. This confirms that I'm right to use my map to visually test the difference between the slow rat spread (the "dogma") and the rapid human spread (my hypothesis).

    2. our aDNA results identified two previously unknown but related clades of Y. pestis associated with distinct medieval mass graves. These findings suggest that plague was imported to Europe on two or more occasions, each following a distinct route.

      This is good for my hypothesis! It proves that the plague was too complex to have followed just one simple path. Since the scientists found two different strains, my map must show two separate main routes into Europe, which means I can directly test the differences between the rat theory and the human flea theory.

    3. Here we identified DNA and protein signatures specific for Y. pestis in human skeletons from mass graves in northern, central and southern Europe that were associated archaeologically with the Black Death and subsequent resurgences. We confirm that Y. pestis caused the Black Death and later epidemics on the entire European continent over the course of four centuries.

      This could be the best starting point for my project. It shuts down the argument about what caused the plague, so I don't have to waste time debating the pathogen itself. I can now focus 100% on mapping the how and when of the spread, which is the whole point of my research.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

Authors’ reply (Ono et al.)

      Review Commons Refereed Preprint #RC-2025-03137

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Ono et al addressed how condensin II and cohesin work to define chromosome territories (CT) in human cells. They used FISH to assess the status of CT. They found that condensin II depletion leads to lengthwise elongation of G1 chromosomes, while double depletion of condensin II and cohesin leads to CT overlap and morphological defects. Although the requirement of condensin II in shortening G1 chromosomes was already shown by Hoencamp et al 2021, the cooperation between condensin II and cohesin in CT regulation is a new finding. They also demonstrated that cohesin and condensin II are involved in G2 chromosome regulation on a smaller and larger scale, respectively. Though such roles in cohesin might be predictable from its roles in organizing TADs, it is a new finding that the two work on a different scale on G2 chromosomes. Overall, this is technically solid work, which reports new findings about how condensin II and cohesin cooperate in organizing G1 and G2 chromosomes.

      We greatly appreciate the reviewer’s supportive comments. The reviewer has accurately recognized our new findings concerning the collaborative roles of condensin II and cohesin in establishing and maintaining interphase chromosome territories.

      Major point:

      They propose a functional 'handover' from condensin II to cohesin, for the organization of CTs at the M-to-G1 transition. However, the 'handover', i.e. difference in timing of executing their functions, was not experimentally substantiated. Ideally, they can deplete condensin II and cohesin at different times to prove the 'handover'. However, this would require the use of two different degron tags and go beyond the revision of this manuscript. At least, based on the literature, the authors should discuss why they think condensin II and cohesin should work at different timings in the CT organization.

      We take this comment seriously, especially because Reviewer #2 also expressed the same concern. 

      First of all, we must admit that the basic information underlying the “handover” idea was insufficiently explained in the original manuscript. Let us make it clear below:

• Condensin II is bound to chromosomes and enriched along their axes from anaphase through telophase (Ono et al., 2004; Hirota et al., 2004; Walther et al., 2018).
      • In early G1, condensin II is diffusely distributed within the nucleus and does not bind tightly to chromatin, as shown by detergent extraction experiments (Ono et al., 2013).
      • Cohesin starts binding to chromatin when the cell nucleus reassembles (i.e., during the cytokinesis stage shown in Fig. 1B), apparently replacing condensins I and II (Brunner et al., 2025).
• Condensin II progressively rebinds to chromatin from S through G2 phase (Ono et al., 2013).

The cell cycle-dependent changes in chromosome-bound condensin II and cohesin summarized above are illustrated in Fig. 1A. We now realize that Fig. 1B in the original manuscript was inconsistent with Fig. 1A, creating unnecessary confusion, and we sincerely apologize for this. The fluorescence images shown in the original Fig. 1B were captured without detergent extraction prior to fixation, giving the misleading impression that condensin II remained bound to chromatin from cytokinesis through early G1. This was not our intention. To clarify this, we have repeated the experiment in the presence of detergent extraction and replaced the original Fig. 1B with a revised panel. Figs. 1A and 1B are now more consistent with each other. Accordingly, we have modified the corresponding sentences as follows:

Although condensin II remains nuclear throughout interphase, its chromatin binding is weak in G1 and becomes robust from S phase through G2 (Ono et al., 2013). Cohesin, in contrast, replaces condensin II in early G1 (Fig. 1B) (Abramo et al., 2019; Brunner et al., 2025), and establishes topologically associating domains (TADs) in the G1 nucleus (Schwarzer et al., 2017; Wutz et al., 2017).

      While there is a loose consensus in the field that condensin II is replaced by cohesin during the M-to-G1 transition, it remains controversial whether there is a short window during which neither condensin II nor cohesin binds to chromatin (Abramo et al., 2019), or whether there is a stage in which the two SMC protein complexes “co-occupy” chromatin (Brunner et al., 2025). Our images shown in the revised Fig. 1B cannot clearly distinguish between these two possibilities.

      From a functional point of view, the results of our depletion experiments are more readily explained by the latter possibility. If this is the case, the “interplay” or “cooperation” rather than the “handover” may be a more appropriate term to describe the functional collaboration between condensin II and cohesin during the M-to-G1 transition. For this reason, we have avoided the use of the word “handover” in the revised manuscript. It should be emphasized, however, that given their distinct chromosome-binding kinetics, the cooperation of the two SMC complexes during the M-to-G1 transition is qualitatively different from that observed in G2. Therefore, the central conclusion of the present study remains unchanged.

For example, a sentence in the Abstract has been changed as follows:

      a functional interplay between condensin II and cohesin during the mitosis-to-G1 transition is critical for establishing chromosome territories (CTs) in the newly assembling nucleus.

      While the reviewer suggested one experiment, it is clearly beyond the scope of the current study. It should also be noted that even if such a cell line were available, the proposed application of sequential depletion to cells progressing from mitosis to G1 phase would be technically challenging and unlikely to produce results that could be interpreted with confidence.

      Other points:

Figure 2E: It seems that the chromosome length without IAA is shorter in Rad21-aid cells than H2-aid cells or H2-aid Rad21-aid cells. How can this be interpreted?

This comment is well taken. A related comment was made by Reviewer #3 (Major comment #2). Given the substantial genetic manipulations applied to establish the multiple cell lines used in the present study, it is, strictly speaking, not straightforward to compare the -IAA controls between different cell lines. Such variations are most prominently observed in Fig. 2E, although they can also be observed to a lesser extent in other experiments (e.g., Fig. 3E). This issue is inherently associated with all studies using genetically manipulated cell lines and therefore cannot be completely avoided. For this reason, we focus on the differences between -IAA and +IAA within each cell line, rather than comparing the -IAA conditions across different cell lines. In this sense, a sentence in the original manuscript (lines 178-180) was misleading. In the revised manuscript, we have modified the corresponding and subsequent sentences as follows:

      Although cohesin depletion had a marginal effect on the distance between the two site-specific probes (Fig.2, C and E), double depletion did not result in a significant change (Fig.2, D and E), consistent with the partial restoration of centromere dispersion (Fig. 1G).


      In addition, we have added a section entitled “Limitations of the study” at the end of the Discussion to address technical issues that are inevitably associated with the current approach.

Figure 3: Regarding the CT morphology, could they explain further the difference between 'elongated' and 'cloud-like (expanded)'? Is it possible to quantify the frequency of these morphologies?

In the original manuscript, we provided data that quantitatively distinguished between the “elongated” and “cloud-like” phenotypes. Specifically, Fig. 2E shows that the distance between two specific loci (Cen 12 and 12q15) is increased in the elongated phenotype but not in the cloud-like phenotype. In addition, the cloud-like morphology clearly deviated from circularity, as indicated by the circularity index (Fig. 3F). However, because circularity can also decrease in rod-shaped chromosomes, these datasets alone may not be sufficiently convincing, as the reviewer pointed out. We have now included an additional parameter, the aspect ratio, defined as the ratio of an object’s major axis to its minor axis (new Fig. 3F). While this intuitive parameter was altered upon condensin II depletion and double depletion, again, we acknowledge that it is not sufficient to convincingly distinguish between the elongated and cloud-like phenotypes proposed in the original manuscript. For these reasons, in the revised manuscript, we have toned down our statements regarding the differences in CT morphology between the two conditions. Nonetheless, together with the data from Figs. 1 and 2, it is evident that the Rabl configuration observed upon condensin II depletion is further exacerbated in the absence of cohesin. Accordingly, we have modified the main text and the cartoon (Fig. 3H) to more accurately depict the observations summarized above.

      Figure 5: How did they assign C, P and D3 for two chromosomes? The assignment seems obvious in some cases, but not in other cases (e.g. in the image of H2-AID#2 +IAA, two D3s can be connected to two Ps in the other way). They may have avoided line crossing between two C-P-D3 assignments, but can this be justified when the CT might be disorganized e.g. by condensin II depletion?

      This comment is well taken. As the reviewer suspected, we avoided line crossing between two sets of assignments. Whenever there was ambiguity, such images were excluded from the analysis. Because most chromosome territories derived from two homologous chromosomes are well separated even under the depleted conditions, as shown in Fig. 6C, we did not encounter major difficulties in making assignments based on the criteria described above. We therefore remain confident that our conclusion is valid.

      That said, we acknowledge that our assignments of the FISH images may not be entirely objective. We have added this point to the “Limitations of the study” section at the end of the Discussion.

      Figure 6F: The mean is not indicated on the right-hand side graph, in contrast to other similar graphs. Is this an error?

      We apologize for having caused this confusion. First, we would like to clarify that the right panel of Fig. 6F should be interpreted together with the left panel, unlike the seemingly similar plots shown in Figs. 6G and 6H. In the left panel of Fig. 6F, the percentages of CTs that contact the nucleolus are shown in grey, whereas those that do not are shown in white. All CTs classified in the “non-contact” population (white) have a value of zero in the right panel, represented by the bars at 0 (i.e., each bar corresponds to a collection of dots having a zero value). In contrast, each CT in the “contact” population (grey) has a unique contact ratio value in the right panel. Because the right panel consists of two distinct groups, we reasoned that placing mean or median bars would not be appropriate. This is why no mean or median bars were shown in the right panel (the same is true for Fig. S5, A and B).

      That said, for the reviewer’s reference, we have placed median bars in the right panel (see below). In the six cases of H2#2 (-/+IAA), Rad21#2 (-/+IAA), Double#2 (-IAA), and Double#3 (-IAA), the median bars are located at zero (note that in these cases the median bars [black] completely overlap with the “bars” derived from the data points [blue and magenta]). In the two cases of Double#2 (+IAA) and Double#3 (+IAA), they are placed at values of ~0.15. Statistically significant differences between -IAA and +IAA are observed only in Double#2 and Double#3, as indicated by the P-value shown at the top of the panel. Thus, we are confident in our conclusion that CTs undergo severe deformation in the absence of both condensin II and cohesin.

      Figure S1A: The two FACS profiles for Double-AID #3 Release-2 may be mixed up between -IAA and +IAA.

      The reviewer is right. This inadvertent error has been corrected.

      The method section explains that 'circularity' shows 'how closely the shape of an object approximates a perfect circle (with a value of 1 indicating a perfect circle), calculated from the segmented regions'. It would be helpful to provide further methodological details about it.

      We have added further explanations regarding circularity in Materials and Methods together with a citation (the two added sentences are underlined below):

      To analyze the morphology of nuclei, CTs, and nucleoli, we measured “circularity,” a morphological index that quantifies how closely the shape of an object approximates a perfect circle (value = 1). Circularity was defined as 4π × Area/Perimeter², where both the area and perimeter of each segmented object were obtained using ImageJ. This index ranges from 0 to 1, with values closer to 1 representing more circular objects and lower values corresponding to elongated or irregular shapes (Chen et al, 2017).

      Chen, B., Y. Wang, S. Berretta and O. Ghita. 2017. Poly Aryl Ether Ketones (PAEKs) and carbon-reinforced PAEK powders for laser sintering. J Mater Sci 52:6004-6019.
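      For readers who wish to verify the two shape descriptors, they can be computed directly from an object's measured area, perimeter, and fitted axis lengths. The sketch below (plain Python; the actual measurements in the manuscript were obtained with ImageJ, and the helper names here are our own) illustrates both formulas:

```python
import math

# Illustrative sketch of the two shape descriptors used in the revised Fig. 3.
# Inputs are the measurements exported from a segmentation tool such as ImageJ.

def circularity(area: float, perimeter: float) -> float:
    """4*pi*Area / Perimeter^2: equals 1.0 for a perfect circle,
    lower for elongated or irregular shapes."""
    return 4.0 * math.pi * area / perimeter ** 2

def aspect_ratio(major_axis: float, minor_axis: float) -> float:
    """Ratio of the fitted major to minor axis: 1.0 for a circle,
    > 1 for elongated objects."""
    return major_axis / minor_axis

# Sanity check with an ideal circle of radius r (area = pi*r^2, perimeter = 2*pi*r):
r = 5.0
print(circularity(math.pi * r ** 2, 2 * math.pi * r))  # 1.0 up to float precision
print(aspect_ratio(2 * r, 2 * r))                      # 1.0
```

      Note that for digitized masks the perimeter is only an estimate, so measured circularity values can deviate slightly from the analytic ideal.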

      Reviewer #1 (Significance (Required)):

      Ono et al addressed how condensin II and cohesin work to define chromosome territories (CT) in human cells. They used FISH to assess the status of CT. They found that condensin II depletion leads to lengthwise elongation of G1 chromosomes, while double depletion of condensin II and cohesin leads to CT overlap and morphological defects. Although the requirement of condensin II in shortening G1 chromosomes was already shown by Hoencamp et al 2021, the cooperation between condensin II and cohesin in CT regulation is a new finding. They also demonstrated that cohesin and condensin II are involved in G2 chromosome regulation on a smaller and larger scale, respectively. Though such roles in cohesin might be predictable from its roles in organizing TADs, it is a new finding that the two work on a different scale on G2 chromosomes. Overall, this is technically solid work, which reports new findings about how condensin II and cohesin cooperate in organizing G1 and G2 chromosomes.

      See our reply above.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Ono et al use a variety of imaging and genetic (AID) depletion approaches to examine the roles of condensin II and cohesin in the reformation of interphase genome architecture in human HCT16 cells. Consistent with previous literature, they find that condensin II is required for CENP-A dispersion in late mitosis/early G1. Using in situ FISH at the centromere/q arm of chromosome 12 they then establish that condensin II removal causes lengthwise elongation of chromosomes that, interestingly, can be suppressed by cohesin removal. To better understand changes in whole-chromosome morphology, they then use whole chromosome painting to examine chromosomes 18 and 19. In the absence of condensin II, cells effectively fail to reorganise their chromosomes from rod-like structures into spherical chromosome territories (which may explain why CENP-A dispersion is suppressed). Cohesin is not required for spherical CT formation, suggesting condensin II is the major initial driver of interphase genome structure. Double depletion results in complete disorganisation of chromatin, leading the authors to conclude that a typical cell cycle requires orderly 'handover' from the mitotic to interphase genome organising machinery. The authors then move on to G2 phase, where they use a variety of different FISH probes to assess alterations in chromosome structure at different scales. They thereby establish that perturbation of cohesin or condensin II influences local and longer range chromosome structure, respectively. The effects of condensin II depletion become apparent at a genomic distance of 20 Mb, but are negligible either below or above. The authors repeat the G1 depletion experiment in G2 and now find that condensin II and cohesin are individually dispensable for CT organisation, but that dual depletion causes CT collapse. This rather implies that there is cooperation rather than handover per se. 
Overall this study is a broadly informative multiscale investigation of the roles of SMC complexes in organising the genome of postmitotic cells, and solidifies a potential relationship between condensin II and cohesin in coordinating interphase genome structure. The deeper investigation of the roles of condensin II in establishing chromosome territories and intermediate range chromosome structure in particular is a valuable and important contribution, especially given our incomplete understanding of what functions this complex performs during interphase.

      We sincerely appreciate the reviewer’s supportive comments. The reviewer has correctly acknowledged both the current gaps in our understanding of the role of condensin II in interphase chromosome organization and our new findings on the collaborative roles of condensin II and cohesin in establishing and maintaining interphase chromosome territories.

      Major comments:

      In general the claims and conclusions of the manuscript are well supported by multiscale FISH labelling. An important absent control is western blotting to confirm protein depletion levels. Currently only fluorescence is used as a readout for the efficiency of the AID depletion, and we know from prior literature that even small residual quantities of SMC complexes are quite effective in organising chromatin. I would consider a western blot a fairly straightforward and important technical control.

      Let us explain why we used immunofluorescence measurements to evaluate the efficiency of depletion. In our current protocol for synchronizing at the M-to-G1 transition, ~60% of control and H2-depleted cells, and ~30% of Rad21-depleted and co-depleted cells, are successfully synchronized in G1 phase. The apparently lower synchronization efficiency in the latter two groups is attributable to the well-documented mitotic delay caused by cohesin depletion. From these synchronized populations, early G1 cells were selected based on their characteristic morphologies (see the legend of Fig. 1C). In this way, we analyzed an early G1 cell population that had completed mitosis without chromosome segregation defects. We acknowledge that this represents a technically challenging aspect of M-to-G1 synchronization in HCT116 cells, whose synchronization efficiency is limited compared with that of HeLa cells. Nevertheless, this approach constitutes the most practical strategy currently available. Hence, immunofluorescence provides the only feasible means to evaluate depletion efficiency under these conditions.

      Although immunoblotting can, in principle, be applied to G2-arrested cell populations, we do not believe that information obtained from such experiments would affect the main conclusions of the current study. Please note that we carefully designed and performed all experiments with appropriate controls: H2 depletion, RAD21 depletion, and double depletion, with outcomes confirmed using independent cell lines (Double-AID#2 and Double-AID#3) whenever deemed necessary.

      We fully acknowledge the technical limitations associated with the AID-mediated depletion techniques, which are now described in the section entitled “Limitations of the study” at the end of the Discussion. Nevertheless, we emphasize that these limitations do not compromise the validity of our findings.

      I find the point on handover as a mechanism for maintaining CT architecture somewhat ambiguous, because the authors find that the dependence simply switches from condensin II to both condensin II and cohesin, between G1 and G2. To me this implies augmented cooperation rather than handover. I have two further suggestions, both of which I would strongly recommend but would consider desirable but 'optional' according to review commons guidelines.

      First of all, we would like to clarify a possible misunderstanding regarding the phrase “handover as a mechanism for maintaining CT architecture somewhat ambiguous”. In the original manuscript, we proposed handover as a mechanism for establishing G1 chromosome territories, not for maintaining CTs.

      That said, we take this comment very seriously, especially because Reviewer #1 also expressed the same concern. Please see our reply to Reviewer #1 (Major point).

      In brief, we agree with the reviewer that the word “handover” may not be appropriate to describe the functional relationship between condensin II and cohesin during the M-to-G1 transition. In the revised manuscript, we have avoided the use of the word “handover”, replacing it with “interplay”. It should be emphasized, however, that given their distinct chromosome-binding kinetics, the cooperation of the two SMC complexes during the M-to-G1 transition is qualitatively different from that observed in G2. Therefore, the central conclusion of the present study remains unchanged.

      For example, a sentence in Abstract has been changed as follows:

      a functional interplay between condensin II and cohesin during the mitosis-to-G1 transition is critical for establishing chromosome territories (CTs) in the newly assembling nucleus.

      Firstly, the depletions are performed at different stages of the cell cycle but have different outcomes. The authors suggest this is because handover is already complete, but an alternative possibility is that the phenotype is masked by other changes in chromosome structure (e.g. duplication/catenation). I would be very curious to see, for example, how the outcome of this experiment would change if the authors were to repeat the depletions in the presence of a topoisomerase II inhibitor.

      The reviewer’s suggestion here is somewhat vague, and it is unclear to us what rationale underlies the proposed experiment or what meaningful outcomes could be anticipated. Does the reviewer suggest that we perform topo II inhibitor experiments both during the M-to-G1 transition and in G2 phase, and then compare the outcomes between the two conditions?

      For the M-to-G1 transition, Hildebrand et al (2024) have already reported such experiments. They used a topo II inhibitor to provide evidence that mitotic chromatids are self-entangled and that the removal of these mitotic entanglements is required to establish a normal interphase nucleus. Our own preliminary experiments (not presented in the current manuscript) showed that ICRF treatment of cells undergoing the M-to-G1 transition did not affect post-mitotic centromere dispersion. The same treatment also had little effect on the suppression of centromere dispersion observed in condensin II-depleted cells.

      Under G2-arrested condition, because chromosome territories are largely individualized, we would expect topo II inhibition to affect only the extent of sister catenation, which is not the focus of our current study. We anticipate that inhibiting topo II in G2 would have only a marginal, if any, effect on the maintenance of chromosome territories detectable by our current FISH approaches.

      In any case, we consider the suggested experiment to be beyond the scope of the present manuscript, which focuses on the collaborative roles of condensin II and cohesin as revealed by multi-scale FISH analyses.

      Secondly, if the author's claim of handover is correct then one (not exclusive) possibility is that there is a relationship between condensin II and cohesin loading onto chromatin. There does seem to be a modest co-dependence (e.g. fig S4 and S7), could the authors comment on this?

      First of all, we wish to point out the reviewer’s confusion between the G2 experiments and the M-to-G1 experiments. Figs. S4 and S7 concern experiments using G2-arrested cells, not M-to-G1 cells in which a possible handover mechanism is discussed. Based on Fig. 1, in which the extent of depletion in M-to-G1 cells was tested, no evidence of “co-dependence” between H2 depletion and RAD21 depletion was observed.

      That said, as the reviewer correctly points out, we acknowledge the presence of marginal yet statistically significant reductions in the RAD21 signal upon H2 depletion (and vice versa) in G2-arrested cells (Figs. S4 and S7).

      Another control experiment here would be to treat fully WT cells with IAA and test whether non-AID labelled H2 or RAD21 dip in intensity. If they do not, then perhaps there's a causal relationship between condensin II and cohesin levels?

      According to the reviewer’s suggestion, we tested whether IAA treatment causes an unintentional decrease in the H2 or RAD21 signals in G2-arrested cells, and found that this is not the case (see the attached figure below).

      Thus, these data indicate that there is a modest functional interdependence between condensin II and cohesin in G2-arrested cells. For instance, condensin II depletion may modestly destabilize chromatin-bound cohesin (and vice versa). However, we note that these effects are minor and do not affect the overall conclusions of the study. In the revised manuscript, we have described these potentially interesting observations briefly as a note in the corresponding figure legends (Fig. S4).

      I recognise this is something considered in Brunner et al 2025 (JCB), but in their case they depleted SMC4 (so all condensins are lost or at least dismantled). Might bear further investigation.

      Methods:

      Data and methods are described in reasonable detail, and a decent number of replicates/statistical analyses have been performed. Documentation of the cell lines used could be improved. The actual cell line is not mentioned once in the manuscript. Although it is referenced, I'd recommend including the identity of the cell line (HCT116) in the main text when the cells are introduced and also in the relevant supplementary tables. Will make it easier for readers to contextualise the findings.

      We apologize for the omission of important information regarding the parental cell line used in the current study. The information has been added to Materials and Methods as well as the resource table.

      Minor comments:

      Overall the manuscript is well-written and well presented. In the introduction it is suggested that no experiment has established a causal relationship between human condensin II and chromosome territories, but this is not correct, Hoencamp et al 2021 (cell) observed loss of CTs after condensin II depletion. Although that manuscript did not investigate it in as much detail as the present study, the fundamental relationship was previously established, so I would encourage the authors to revise this statement.

      We are somewhat puzzled by this comment. In the original manuscript, we explicitly cited Hoencamp et al (2021) in support of the following sentences:


      (Lines 78-83 in the original manuscript)

      Moreover, high-throughput chromosome conformation capture (Hi-C) analysis revealed that, under such conditions, chromosomes retain a parallel arrangement of their arms, reminiscent of the so-called Rabl configuration (Hoencamp et al., 2021). These findings indicate that the loss or impairment of condensin II during mitosis results in defects in post-mitotic chromosome organization.


      That said, to make the sentences even more precise, we have made the following revision in the manuscript.


      (Lines 78-82 in the revised manuscript)

      Moreover, high-throughput chromosome conformation capture (Hi-C) analysis revealed that, under such conditions, chromosomes retain a parallel arrangement of their arms, reminiscent of the so-called Rabl configuration (Hoencamp et al., 2021). These findings, together with cytological analyses of centromere distributions, indicate that the loss or impairment of condensin II during mitosis results in defects in post-mitotic chromosome organization.


      The following statement was intended to explain our current understanding of the maintenance of chromosome territories. Because Hoencamp et al (2021) did not address the maintenance of CTs, we have kept this sentence unchanged.


      (Lines 100-102 in the original manuscript)

      Despite these findings, there is currently no evidence that either condensin II, cohesin, or their combined action contributes to the maintenance of CT morphology in mammalian interphase cells (Cremer et al., 2020).


      Reviewer #2 (Significance (Required)):

      General assessment:

      Strengths: the multiscale investigation of genome architecture at different stages of interphase allow the authors to present convincing and well-analysed data that provide meaningful insight into local and global chromosome organisation across different scales.

      Limitations:

      As suggested in major comments.

      Advance:

      Although the role of condensin II in generating chromosome territories, and the roles of cohesin in interphase genome architecture are established, the interplay of the complexes and the stage specific roles of condensin II have not been investigated in human cells to the level presented here. This study provides meaningful new insight in particular into the role of condensin II in global genome organisation during interphase, which is much less well understood compared to its participation in mitosis.

      Audience:

      Will contribute meaningfully and be of interest to the general community of researchers investigating genome organisation and function at all stages of the cell cycle. Primary audience will be cell biologists, geneticists and structural biochemists. Importance of genome organisation in cell/organismal biology is such that within this grouping it will probably be of general interest.

      My expertise is in genome organization by SMCs and chromosome segregation.

      We appreciate the reviewer’s supportive comments. As the reviewer fully acknowledges, this study is the first systematic survey of the collaborative role of condensin II and cohesin in establishing and maintaining interphase chromosome territories. In particular, multi-scale FISH analyses have enabled us to clarify how the two SMC protein complexes contribute to the maintenance of G2 chromosome territories through their actions at different genomic scales. As the reviewer notes, we believe that the current study will appeal to a broad readership in cell and chromosome biology. The limitations of the current study mentioned by the reviewer are addressed in our reply above.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary:

      The manuscript “Condensin II collaborates with cohesin to establish and maintain interphase chromosome territories" investigates how condensin II and cohesin contribute to chromosome organization during the M-to-G1 transition and in G2 phase using published auxin-inducible degron (AID) cell lines which render the respective protein complexes nonfunctional after auxin addition. In this study, a novel degron cell line was established that enables the simultaneous depletion of both protein complexes, thereby facilitating the investigation of synergistic effects between the two SMC proteins. The chromosome architecture is studied using fluorescence in situ hybridization (FISH) and light microscopy. The authors reproduce a number of already published data and also show that double depletion causes during the M-to-G1 transition defects on chromosome territories, producing expanded, irregular shapes that obscure condensin II-specific phenotypes. Findings in G2 cells point to a new role of condensin II for chromosome conformation at a scale of ~20Mb. Although individual depletion has minimal effects on large-scale CT morphology in G2, combined loss of both complexes produces marked structural abnormalities, including irregular crescent-shaped CTs displaced toward the nucleolus and increased nucleolus-CT contact. The authors propose that condensin II and cohesin act sequentially and complementarily to ensure proper post-mitotic CT formation and maintain chromosome architecture across genomic scales.

      We greatly appreciate the reviewer’s supportive comments. The reviewer has accurately recognized our new findings concerning the collaborative roles of condensin II and cohesin in the establishment and maintenance of interphase chromosome territories.

      Concerns about statistics:

      • The authors provide the information on how many cells are analyzed but not the number of independent experiments. My concern is that there might be variations in synchronization of the cell population and in the subsequent preparation (FISH) affecting the final result.

      We appreciate the reviewer’s important comment regarding the biological reproducibility of our experiments. As the reviewer correctly points out, variations in cell-cycle synchronization and FISH sample preparation can occur across experiments. To address this concern, we repeated the key experiments supporting our main conclusions (Figs. 3 and 6) two additional times, resulting in three independent biological replicates in total. All replicate experiments reproduced the major observations from the original analyses. These results further substantiated our original conclusion, despite the inevitable variability arising from cell synchronization or sample preparation in this type of experiment. In the revised manuscript, we have now explicitly indicated the number of biological replicates in the corresponding figures.

      The analyses of chromosome-arm conformation shown in Fig. 5 were already performed in three independent rounds of experiments, as noted in the original submission. In addition, similar results were already obtained in other analyses reported in the manuscript. For example, centromere dispersion was quantified using an alternative centromere detection method (related to Fig. 1), and distances between specific chromosomal sites were measured using different locus-specific probes (related to Figs. 2 and 4). In both cases, the results were consistent with those presented in the manuscript.

      • Statistically the authors analyze the effect of cells with induced degron vs. vehicle control (non-induced). However, the biologically relevant question is whether the data differ between cell lines when the degron system is induced. This is not tested here (cf. major concern 2 and 3).

      See our reply to major concerns 2 and 3.

      • Some journals ask for blinded analysis of the data, which might make sense here as manual steps are involved in the data analysis (e.g., lines 626/627: the convex hull of the signals was manually delineated; lines 635/636: chromosome segmentation in FISH images was performed using individual thresholding). However, personally I have no doubts on the correctness of the work.

      We thank the reviewer for pointing out that some steps in our data analysis were performed manually, such as delineating the convex hull of signals and segmenting chromosomes in FISH and IF images using individual thresholds. These manual steps were necessary because signal intensities vary among cells and chromosomes, making fully automated segmentation unreliable. To ensure objectivity, we confirmed that the results were consistent across two independently established double-depletion cell lines, which produced essentially identical findings. In addition, we repeated the key experiments underpinning our main conclusions (Figs. 3 and 6) two additional times, and the results were fully consistent with the original analyses. Therefore, we are confident that our current data analysis approach does not compromise the validity of our conclusions. Finally, we appreciate the reviewer’s kind remark that there is no doubt regarding the correctness of our work.

      Major concerns:

      • Degron induction appears to delay in Rad21-AID#1 and Double-AID#1 cells the transition from M to G1, as shown in Fig. S1. After auxin treatment, more cells exhibit a G2 phenotype than in an untreated population. What are the implications of this for the interpretation of the experiments?

      In our protocol shown in Fig. 1C, cells were released into mitosis after G2 arrest, and IAA was added 30 min after release. It is well established that cohesin depletion causes a prometaphase delay due to spindle checkpoint activation (e.g., Vass et al, 2003, Curr Biol; Toyoda and Yanagida, 2006, MBoC; Peters et al, 2008, Genes Dev), which explains why cells with 4C DNA content accumulated, as judged by FACS (Fig. S1). The same was true for doubly depleted cells. However, a fraction of cells that escaped this delay progressed through mitosis and entered the G1 phase of the next cell cycle. We selected these early G1 cells and used them for downstream analyses. This experimental procedure was explicitly described in the legends of Fig. 1C and Fig. S1A as follows:

      (Lines 934-937; Legend of Fig. 1C)

      From the synchronized populations, early G1 cells were selected based on their characteristic morphologies (i.e., pairs of small post-mitotic cells) and subjected to downstream analyses. Based on the measured nuclear sizes (Fig. S2 G), we confirmed that early G1 cells were appropriately selected.

      (Lines 1114-1119; Legend of Fig. S1A)

      In this protocol, ~60% of control and H2-depleted cells, and ~30% of Rad21-depleted and co-depleted cells, were successfully synchronized in G1 phase. The apparently lower synchronization efficiency in the latter two groups is attributable to the well documented mitotic delay caused by cohesin depletion (Hauf et al., 2005; Haarhuis et al., 2013; Perea-Resa et al., 2020). From these synchronized populations, early G1 cells were selected based on their characteristic morphologies (see the legend of Fig. 1 C).

      • *

      Thus, using this protocol, we analyzed an early G1 cell population that had completed mitosis without chromosome segregation defects. We acknowledge that this represents a technically challenging aspect of synchronizing cell-cycle progression from M to G1 in HCT116 cells, whose synchronization efficiency is limited compared with that of HeLa cells. Nevertheless, this approach constitutes the most practical strategy currently available.

      • Line 178: "In contrast, cohesin depletion had a smaller effect on the distance between the two site-specific probes compared to condensin II depletion (Fig. 2, C and E)." The data in Fig. 2E show both a significant effect of H2 and a significant effect of RAD21 depletion. Whether the absolute difference in effect size between the two conditions is truly relevant is difficult to determine, as the distribution of the respective control groups also appears to be different.

      This comment is well taken. Reviewer #1 has made a comment on the same issue. See our reply to Reviewer #1 (Other points, Figure 2E).

      In brief, in the current study, we should focus on the differences between -IAA and +IAA within each cell line, rather than comparing the -IAA conditions across different cell lines. In this sense, a sentence in the original manuscript (lines 178-180) was misleading. In the revised manuscript, we have modified the corresponding and subsequent sentence as follows:

      Although cohesin depletion had a marginal effect on the distance between the two site-specific probes (Fig. 2, C and E), double depletion did not result in a significant change (Fig. 2, D and E), consistent with the partial restoration of centromere dispersion (Fig. 1G).

      • In Figures 3, S3 and related text in the manuscript I cannot follow the authors' argumentation, as H2 depletion alone leads to a significant increase in the CT area (Chr. 18, Chr. 19, Chr. 15). Similar to Fig. 2, the authors argue about the different magnitude of the effect (H2 depletion vs double depletion). Here, too, appropriate statistical tests or more suitable parameters describing the effect should be used. I also cannot fully follow the argumentation regarding chromosome elongation, as double depletion in Chr. 18 and Chr. 19 also leads to a significantly reduced circularity. Therefore, the schematic drawing Fig. 3 H (double depletion) seems very suggestive to me.

      This comment is related to the comment above (Major comment #2). See our reply to Reviewer #1 (Other points, Figure 2E).

      It should be noted that, in Figure 3 (unlike in Figure 2), we did not compare the different magnitudes of the effect observed between H2 depletion and double depletion. Thus, the reviewer’s comment that “Similar to Fig. 2, the authors argue about the different magnitude of the effect (H2 depletion vs double depletion)” does not accurately reflect our description.

      Moreover, while the distance between two specific loci (Fig. 2E) and CT circularity (Fig. 3G) are intuitively related, they represent distinct parameters. It is therefore not unexpected that double depletion resulted in apparently different outcomes for the two measurements, and the reviewer’s counter-argument is not strictly applicable here.

      That said, we agree with the reviewer that our descriptions here need to be clarified.

      The differences between H2 depletion and double depletion are two-fold: (1) centromere dispersion is suppressed upon H2 depletion, but not upon double depletion (Fig 1G); (2) the distance between Cen 12 and 12q15 increased upon H2 depletion, but not upon double depletion (Fig 2E).

      We have decided to remove the “homologous pair overlap” panel (formerly Fig. 3E) from the revised manuscript. Accordingly, the corresponding sentence has been deleted from the main text. Instead, we have added a new panel of “aspect ratio”, defined as the ratio of the major to the minor axis (new Fig. 3F). While this intuitive parameter was altered upon condensin II depletion and double depletion, again, we acknowledge that it is not sufficient to convincingly distinguish between the elongated and cloud-like phenotypes proposed in the original manuscript. For these reasons, in the revised manuscript, we have toned down our statements regarding the differences in CT morphology between the two conditions. Nonetheless, together with the data from Figs. 1 and 2, it is clear that the Rabl configuration observed upon condensin II depletion is further exacerbated in the absence of cohesin. Accordingly, we have modified the main text and the cartoon (Fig 3H) to more accurately depict the observations summarized above.
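
To make the new parameter concrete, here is a minimal sketch of how an aspect ratio can be computed from the outline of a segmented territory. This is a generic PCA-based illustration under stated assumptions (synthetic ellipse coordinates), not the exact procedure of our analysis pipeline:

```python
import numpy as np

def aspect_ratio(points):
    """Ratio of major to minor axis of a 2D point cloud,
    estimated from the eigenvalues of its covariance matrix.
    A circle gives ~1; elongated shapes give larger values."""
    pts = np.asarray(points, dtype=float)
    pts = pts - pts.mean(axis=0)              # center the outline
    cov = np.cov(pts, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending
    return float(np.sqrt(eigvals[0] / eigvals[1]))

# A 3:1 ellipse outline gives an aspect ratio of ~3.
theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
ellipse = np.column_stack([3 * np.cos(theta), np.sin(theta)])
print(aspect_ratio(ellipse))  # ~3.0
```

In practice the coordinates would come from the segmented CT mask; intensity weighting or a robust covariance estimate could refine this sketch.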

• Fig. 5 and accompanying text. I agree with the authors that this is a significant and very interesting effect. However, I believe the sharp bends are in most cases artifacts caused by the maximum intensity projection. I tried to illustrate this effect in two photographs: Reviewer Fig. 1, side view, and Reviewer Fig. 2, same situation top view (https://cloud.bio.lmu.de/index.php/s/77npeEK84towzJZ). As I said, in my opinion, there is a significant and important effect; the authors should simply adjust the description.

This comment is well taken. We appreciate the reviewer’s effort to help clarify our original observations. We have therefore added a new section entitled “Limitations of the study” to explicitly describe the constraints of our current approach. That said, as the reviewer also acknowledges, our observations remain valid because all experiments were performed with appropriate controls.

      Minor concerns:

• I would like to suggest proactively discussing possible artifacts that may arise from the harsh conditions during FISH sample preparation.

We fully agree with the reviewer’s concerns. For FISH sample preparation, we used relatively harsh conditions, including (1) fixation under a hypotonic condition (0.3x PBS), (2) HCl treatment, and (3) a denaturation step. We recognize that these procedures inevitably affect the preservation of the original structure; however, they are unavoidable in the standard FISH protocol. We also acknowledge that our analyses were limited to 2D structures based on projected images, rather than full 3D reconstructions. These technical limitations are now explicitly described in a new section entitled “Limitations of the study”, and the technical details are provided in Materials and Methods.

• It would be helpful if the authors could provide the original data (microscopic image stacks) for download.

We thank the reviewer for this suggestion and understand that providing the original image stacks could be of interest to readers. We agree that if the nuclei were perfectly spherical, as is the case for example in lymphocytes, 3D image stacks would contain much more information than 2D projections. However, as is typical for adherent cultured cells, including the HCT116-derived cells used in this study, the nuclei are flattened due to cell adhesion to the culture dish, with a thickness of only about one-tenth of the nuclear diameter (10–20 μm). Considering also the inevitable loss of structural preservation during FISH sample preparation, we were concerned that presenting 3D images might confuse rather than clarify. We therefore believe that representing the data as 2D projections, while explicitly acknowledging the technical limitations, provides the clearest and most interpretable presentation of our results. These limitations are now described in a new section of the manuscript.

• The authors use a blind deconvolution algorithm to improve image quality. It might be helpful to test other methods for this purpose (optional).

We thank the reviewer for this valuable suggestion and fully agree that it is a valid point. We recognize that alternative image enhancement methods can offer advantages, particularly for smaller structures or when multiple probes are analyzed simultaneously. In our study, however, the focus was on detecting whole chromosome territories (CTs) and specific chromosomal loci, which can be visualized clearly with our current FISH protocol combined with blind deconvolution. We therefore believe that the image quality we obtained is sufficient to support the conclusions of this manuscript.

      Reviewer #3 (Significance (Required)):

      Advance:

Ono et al. addresses the important question on how the complex pattern of chromatin is reestablished after mitosis and maintained during interphase. In addition to affinity interactions (1,2), it is known that cohesin plays an important role in the formation and maintenance of chromosome organization in interphase (3). However, current knowledge does not explain all known phenomena. Even with complete loss of cohesin, TAD-like structures can be recognized at the single-cell level (4), and higher structures such as chromosome territories are also retained (5). The function of condensin II during mitosis is another important factor that affects chromosome architecture in the following G1 phase (6). Although condensin II is present in the cell nucleus throughout interphase, very little is known about the role of this protein in this phase of the cell cycle. This is where the present publication comes in, with a new double degron cell line in which essential subunits of cohesin AND condensin can be degraded in a targeted manner. I find the data from the experiments in the G2 phase most interesting, as they suggest a previously unknown involvement of condensin II in the maintenance of larger chromatin structures such as chromosome territories.

      The experiments regarding the M-G1 transition are less interesting to me, as it is known that condensin II deficiency in mitosis leads to elongated chromosomes (Rabl configuration)(6), and therefore the double degradation of condensin II and cohesin describes the effects of cohesin on an artificially disturbed chromosome structure.

      For further clarification, we provide below a table summarizing previous studies relevant to the present work. We wish to emphasize three novel aspects of the present study. First, newly established cell lines designed for double depletion enabled us to address questions that had remained inaccessible in earlier studies. Second, to our knowledge, no study has previously reported condensin II depletion, cohesin depletion and double depletion in G2-arrested cells. Third, the present study represents the first systematic comparison of two different stages of the cell cycle using multiscale FISH under distinct depletion conditions. Although the M-to-G1 part of the present study partially overlaps with previous work, it serves as an important prelude to the subsequent investigations. We are confident that the reviewer will also acknowledge this point.

| cell cycle | cond II depletion | cohesin depletion | double depletion |
| --- | --- | --- | --- |
| M-to-G1 | Hoencamp et al (2021); Abramo et al (2019); Brunner et al (2025); this study | Schwarzer et al (2017); Wutz et al (2017); this study | this study |
| G2 | this study | this study | this study |

Hoencamp et al (2021): Hi-C and imaging (CENP-A distribution)

Abramo et al (2019): Hi-C and imaging

Brunner et al (2025): mostly imaging (chromatin tracing)

Schwarzer et al (2017); Wutz et al (2017): Hi-C

this study: imaging (multi-scale FISH)

      General limitations:

(1) Single cell imaging of chromatin structure typically shows only minor effects which are often obscured by the high (biological) variability. This also holds true for the current manuscript (cf. major concern 2 and 3).

      See our reply above.

(2) A common concern is the artefacts introduced by the harsh conditions of conventional FISH protocols (7). The authors use a method in which the cells are completely dehydrated, which probably leads to shrinking artifacts. However, differences between samples stained using the same FISH protocol are most likely due to experimental variation and not an artefact (cf. minor concern 1).

      See our reply above.

(3) The anisotropic optical resolution (x-, y- vs. z-) of widefield microscopy (and most other light microscopic techniques) might lead to misinterpretation of the imaged 3D structures. This seems to be the case in the current study (cf. major concern 4).

See our reply above.

(4) In the present study, the cell cycle was synchronized. This requires the use of inhibitors such as the CDK1 inhibitor RO-3306. However, CDK1 has many very different functions (8), so unexpected effects on the experiments cannot be ruled out.

The current approaches involving FISH inevitably require cell cycle synchronization. We believe that the use of the CDK1 inhibitor RO-3306 to arrest the cell cycle at G2 is a reasonable choice, although we cannot rule out unexpected effects arising from the use of the drug. This issue has now been addressed in the new section entitled “Limitations of the study”.

      Audience:

      The spatial arrangement of genomic elements in the nucleus and their (temporal) dynamics are of high general relevance, as they are important for answering fundamental questions, for example, in epigenetics or tumor biology (9,10). The manuscript from Ono et al. addresses specific questions, so its intended readership is more likely to be specialists in the field.

      We are confident that, given the increasing interest in the 3D genome and its role in regulating diverse biological functions, the current manuscript will attract the broad readership of leading journals in cell biology.

      About the reviewer:

      By training I'm a biologist with strong background in fluorescence microscopy and fluorescence in situ hybridization. In recent years, I have been involved in research on the 3D organization of the cell nucleus, chromatin organization, and promoter-enhancer interactions.

      We greatly appreciate the reviewer’s constructive comments on both the technical strengths and limitations of our fluorescence imaging approaches, which have been very helpful in revising the manuscript. As mentioned above, we have decided to add a special paragraph entitled “Limitations of the study” at the end of the Discussion section to discuss these issues.

      All questions regarding the statistics of angularly distributed data are beyond my expertise. The authors do not correct their statistical analyses for "multiple testing". Whether this is necessary, I cannot judge.

      We thank the reviewer for raising this important point. In our study, the primary comparisons were made between -IAA and +IAA conditions within the same cell line. Accordingly, the figures report P-values for these pairwise comparisons.

      For the distance measurements, statistical evaluations were performed in PRISM using ANOVA (Kruskal–Wallis test), and the P-values shown in the figures are based on these analyses (Fig. 1, G and H; Fig. 2 E; Fig. 3 F and G; Fig. 4 F; Fig. 6 F [right]–H; Fig. S2 B and G; Fig. S3 D and H; Fig. S5 A [right] and B [right]; Fig. S8 B). While the manuscript focuses on pairwise comparisons between -IAA and +IAA conditions within the same cell line, we also considered potential differences across cell lines as part of the same ANOVA framework, thereby ensuring that multiple testing was properly addressed. Because cell line differences are not the focus of the present study, the corresponding results are not shown.
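
For illustration only (with made-up numbers, not our measurements), this kind of omnibus rank-based comparison performed in PRISM can be reproduced with scipy:

```python
from scipy.stats import kruskal, mannwhitneyu

# Hypothetical probe-to-probe distances (um) for one cell line
minus_iaa = [1.1, 1.3, 0.9, 1.2, 1.0, 1.4, 1.1, 1.2]
plus_iaa = [1.8, 2.1, 1.7, 2.0, 1.9, 2.2, 1.6, 2.3]

# Omnibus (Kruskal-Wallis) test; with only two groups it is
# closely related to the pairwise Mann-Whitney U comparison.
H, p_kw = kruskal(minus_iaa, plus_iaa)
U, p_mw = mannwhitneyu(minus_iaa, plus_iaa)
print(f"Kruskal-Wallis H = {H:.2f}, p = {p_kw:.4f}")
```

With more than two groups (e.g., several cell lines and conditions in one ANOVA framework), additional samples are simply passed to `kruskal`, followed by corrected pairwise comparisons.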

      For the angular distribution analyses, we compared -IAA and +IAA conditions within the same cell line using the Mardia–Watson–Wheeler test; these analyses do not involve multiple testing (circular scatter plots; Fig. 5 C–E and Fig. S6 B, C, and E–H). In addition, to determine whether angular distributions exhibited directional bias under each condition, we applied the Rayleigh test to each dataset individually (Fig. 5 F and Fig. S6 I). As these tests were performed on a single condition, they are also not subject to the problem of multiple testing. Collectively, we consider that the statistical analyses presented in our manuscript appropriately account for potential multiple testing issues, and we remain confident in the robustness of the results.
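
As a sketch of how the Rayleigh test detects directional bias, the following minimal implementation uses the standard first-order p-value approximation (p ≈ exp(-n·R̄²)) and synthetic angles, not our data:

```python
import numpy as np

def rayleigh_test(angles_rad):
    """Rayleigh test for non-uniformity of circular data.
    Returns the mean resultant length R_bar and an approximate p-value
    (first-order approximation p ~ exp(-n * R_bar**2), valid for large n)."""
    a = np.asarray(angles_rad, dtype=float)
    n = a.size
    r_bar = np.hypot(np.cos(a).sum(), np.sin(a).sum()) / n
    z = n * r_bar**2
    return float(r_bar), float(np.exp(-z))

rng = np.random.default_rng(0)
clustered = rng.normal(0.0, 0.2, size=100)       # angles concentrated near 0 rad
uniform = rng.uniform(0.0, 2 * np.pi, size=100)  # no preferred direction

r_c, p_c = rayleigh_test(clustered)
r_u, p_u = rayleigh_test(uniform)
print(f"clustered: R = {r_c:.2f}, p = {p_c:.2e}")
print(f"uniform:   R = {r_u:.2f}, p = {p_u:.2f}")
```

A concentrated distribution yields a mean resultant length near 1 and a vanishing p-value, whereas uniform angles give a small R̄ and a non-significant result.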

      Literature

1. Falk, M., Feodorova, Y., Naumova, N., Imakaev, M., Lajoie, B.R., Leonhardt, H., Joffe, B., Dekker, J., Fudenberg, G., Solovei, I. et al. (2019) Heterochromatin drives compartmentalization of inverted and conventional nuclei. Nature, 570, 395-399.
2. Mirny, L.A., Imakaev, M. and Abdennur, N. (2019) Two major mechanisms of chromosome organization. Curr Opin Cell Biol, 58, 142-152.
3. Rao, S.S.P., Huang, S.C., Glenn St Hilaire, B., Engreitz, J.M., Perez, E.M., Kieffer-Kwon, K.R., Sanborn, A.L., Johnstone, S.E., Bascom, G.D., Bochkov, I.D. et al. (2017) Cohesin Loss Eliminates All Loop Domains. Cell, 171, 305-320 e324.
4. Bintu, B., Mateo, L.J., Su, J.H., Sinnott-Armstrong, N.A., Parker, M., Kinrot, S., Yamaya, K., Boettiger, A.N. and Zhuang, X. (2018) Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science, 362.
5. Cremer, M., Brandstetter, K., Maiser, A., Rao, S.S.P., Schmid, V.J., Guirao-Ortiz, M., Mitra, N., Mamberti, S., Klein, K.N., Gilbert, D.M. et al. (2020) Cohesin depleted cells rebuild functional nuclear compartments after endomitosis. Nat Commun, 11, 6146.
6. Hoencamp, C., Dudchenko, O., Elbatsh, A.M.O., Brahmachari, S., Raaijmakers, J.A., van Schaik, T., Sedeno Cacciatore, A., Contessoto, V.G., van Heesbeen, R., van den Broek, B. et al. (2021) 3D genomics across the tree of life reveals condensin II as a determinant of architecture type. Science, 372, 984-989.
7. Beckwith, K.S., Ødegård-Fougner, Ø., Morero, N.R., Barton, C., Schueder, F., Tang, W., Alexander, S., Peters, J.-M., Jungmann, R., Birney, E. et al. (2023) Nanoscale 3D DNA tracing in single human cells visualizes loop extrusion directly in situ. BioRxiv https://doi.org/10.1101/2021.04.12.439407.
8. Massacci, G., Perfetto, L. and Sacco, F. (2023) The Cyclin-dependent kinase 1: more than a cell cycle regulator. Br J Cancer, 129, 1707-1716.
9. Bonev, B. and Cavalli, G. (2016) Organization and function of the 3D genome. Nat Rev Genet, 17, 661-678.
10. Dekker, J., Belmont, A.S., Guttman, M., Leshyk, V.O., Lis, J.T., Lomvardas, S., Mirny, L.A., O'Shea, C.C., Park, P.J., Ren, B. et al. (2017) The 4D nucleome project. Nature, 549, 219-226.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Summary:

The manuscript "Condensin II collaborates with cohesin to establish and maintain interphase chromosome territories" investigates how condensin II and cohesin contribute to chromosome organization during the M-to-G1 transition and in G2 phase using published auxin-inducible degron (AID) cell lines which render the respective protein complexes nonfunctional after auxin addition. In this study, a novel degron cell line was established that enables the simultaneous depletion of both protein complexes, thereby facilitating the investigation of synergistic effects between the two SMC proteins. The chromosome architecture is studied using fluorescence in situ hybridization (FISH) and light microscopy. The authors reproduce a number of already published data and also show that double depletion causes during the M-to-G1 transition defects on chromosome territories, producing expanded, irregular shapes that obscure condensin II-specific phenotypes. Findings in G2 cells point to a new role of condensin II for chromosome conformation at a scale of ~20Mb. Although individual depletion has minimal effects on large-scale CT morphology in G2, combined loss of both complexes produces marked structural abnormalities, including irregular crescent-shaped CTs displaced toward the nucleolus and increased nucleolus-CT contact. The authors propose that condensin II and cohesin act sequentially and complementarily to ensure proper post-mitotic CT formation and maintain chromosome architecture across genomic scales.

      Concerns about statistics:

(1) The authors provide the information on how many cells are analyzed but not the number of independent experiments. My concern is that there might be variations in synchronization of the cell population and in the subsequent preparation (FISH) affecting the final result.

      (2) Statistically the authors analyze the effect of cells with induced degron vs. vehicle control (non-induced). However, the biologically relevant question is whether the data differ between cell lines when the degron system is induced. This is not tested here (cf. major concern 2 and 3).

(3) Some journals ask for blinded analysis of the data, which might make sense here as manual steps are involved in the data analysis (e.g. line 626/627 "the convex hull of the signals was manually delineated", line 635/636 "Chromosome segmentation in FISH images was performed using individual thresholding"). However, personally I have no doubts on the correctness of the work.

      Major concerns:

(1) Degron induction appears to delay the transition from M to G1 in Rad21-AID#1 and Double-AID#1 cells, as shown in Fig. S1. After auxin treatment, more cells exhibit a G2 phenotype than in an untreated population. What are the implications of this for the interpretation of the experiments?

      (2) Line 178 "In contrast, cohesin depletion had a smaller effect on the distance between the two site-specific probes compared to condensin II depletion (Fig. 2, C and E)." The data in Fig. 2 E show both a significant effect of H2 and a significant effect of RAD21 depletion. Whether the absolute difference in effect size between the two conditions is truly relevant is difficult to determine, as the distribution of the respective control groups also appears to be different.

      (3) In Figures 3, S3 and related text in the manuscript I cannot follow the authors' argumentation, as H2 depletion alone leads to a significant increase in the CT area (Chr. 18, Chr. 19, Chr. 15). Similar to Fig. 2, the authors argue about the different magnitude of the effect (H2 depletion vs double depletion). Here, too, appropriate statistical tests or more suitable parameters describing the effect should be used. I also cannot fully follow the argumentation regarding chromosome elongation, as double depletion in Chr. 18 and Chr. 19 also leads to a significantly reduced circularity. Therefore, the schematic drawing Fig. 3 H (double depletion) seems very suggestive to me.

(4) Fig. 5 and accompanying text. I agree with the authors that this is a significant and very interesting effect. However, I believe the sharp bends are in most cases artifacts caused by the maximum intensity projection. I tried to illustrate this effect in two photographs: Reviewer Fig. 1, side view, and Reviewer Fig. 2, same situation top view (https://cloud.bio.lmu.de/index.php/s/77npeEK84towzJZ). As I said, in my opinion, there is a significant and important effect; the authors should simply adjust the description.

      Minor concerns:

(1) I would like to suggest proactively discussing possible artifacts that may arise from the harsh conditions during FISH sample preparation.

      (2) It would be helpful if the authors could provide the original data (microscopic image stacks) for download

      (3) The authors use a blind deconvolution algorithm to improve image quality. It might be helpful to test other methods for this purpose (optional).

      Significance

      Advance:

Ono et al. addresses the important question on how the complex pattern of chromatin is reestablished after mitosis and maintained during interphase. In addition to affinity interactions (1,2), it is known that cohesin plays an important role in the formation and maintenance of chromosome organization in interphase (3). However, current knowledge does not explain all known phenomena. Even with complete loss of cohesin, TAD-like structures can be recognized at the single-cell level (4), and higher structures such as chromosome territories are also retained (5). The function of condensin II during mitosis is another important factor that affects chromosome architecture in the following G1 phase (6). Although condensin II is present in the cell nucleus throughout interphase, very little is known about the role of this protein in this phase of the cell cycle. This is where the present publication comes in, with a new double degron cell line in which essential subunits of cohesin AND condensin can be degraded in a targeted manner. I find the data from the experiments in the G2 phase most interesting, as they suggest a previously unknown involvement of condensin II in the maintenance of larger chromatin structures such as chromosome territories. The experiments regarding the M-G1 transition are less interesting to me, as it is known that condensin II deficiency in mitosis leads to elongated chromosomes (Rabl configuration)(6), and therefore the double degradation of condensin II and cohesin describes the effects of cohesin on an artificially disturbed chromosome structure.

      General limitations:

(1) Single cell imaging of chromatin structure typically shows only minor effects which are often obscured by the high (biological) variability. This also holds true for the current manuscript (cf. major concern 2 and 3).

(2) A common concern is the artefacts introduced by the harsh conditions of conventional FISH protocols (7). The authors use a method in which the cells are completely dehydrated, which probably leads to shrinking artifacts. However, differences between samples stained using the same FISH protocol are most likely due to experimental variation and not an artefact (cf. minor concern 1).

(3) The anisotropic optical resolution (x-, y- vs. z-) of widefield microscopy (and most other light microscopic techniques) might lead to misinterpretation of the imaged 3D structures. This seems to be the case in the current study (cf. major concern 4).

      (4) In the present study, the cell cycle was synchronized. This requires the use of inhibitors such as the CDK1 inhibitor RO-3306. However, CDK1 has many very different functions (8), so unexpected effects on the experiments cannot be ruled out.

      Audience:

      The spatial arrangement of genomic elements in the nucleus and their (temporal) dynamics are of high general relevance, as they are important for answering fundamental questions, for example, in epigenetics or tumor biology (9,10). The manuscript from Ono et al. addresses specific questions, so its intended readership is more likely to be specialists in the field.

      About the reviewer: By training I'm a biologist with strong background in fluorescence microscopy and fluorescence in situ hybridization. In recent years, I have been involved in research on the 3D organization of the cell nucleus, chromatin organization, and promoter-enhancer interactions.

      All questions regarding the statistics of angularly distributed data are beyond my expertise. The authors do not correct their statistical analyses for "multiple testing". Whether this is necessary, I cannot judge.

      Literature

      1. Falk, M., Feodorova, Y., Naumova, N., Imakaev, M., Lajoie, B.R., Leonhardt, H., Joffe, B., Dekker, J., Fudenberg, G., Solovei, I. et al. (2019) Heterochromatin drives compartmentalization of inverted and conventional nuclei. Nature, 570, 395-399.
      2. Mirny, L.A., Imakaev, M. and Abdennur, N. (2019) Two major mechanisms of chromosome organization. Curr Opin Cell Biol, 58, 142-152.
      3. Rao, S.S.P., Huang, S.C., Glenn St Hilaire, B., Engreitz, J.M., Perez, E.M., Kieffer-Kwon, K.R., Sanborn, A.L., Johnstone, S.E., Bascom, G.D., Bochkov, I.D. et al. (2017) Cohesin Loss Eliminates All Loop Domains. Cell, 171, 305-320 e324.
      4. Bintu, B., Mateo, L.J., Su, J.H., Sinnott-Armstrong, N.A., Parker, M., Kinrot, S., Yamaya, K., Boettiger, A.N. and Zhuang, X. (2018) Super-resolution chromatin tracing reveals domains and cooperative interactions in single cells. Science, 362.
      5. Cremer, M., Brandstetter, K., Maiser, A., Rao, S.S.P., Schmid, V.J., Guirao-Ortiz, M., Mitra, N., Mamberti, S., Klein, K.N., Gilbert, D.M. et al. (2020) Cohesin depleted cells rebuild functional nuclear compartments after endomitosis. Nat Commun, 11, 6146.
      6. Hoencamp, C., Dudchenko, O., Elbatsh, A.M.O., Brahmachari, S., Raaijmakers, J.A., van Schaik, T., Sedeno Cacciatore, A., Contessoto, V.G., van Heesbeen, R., van den Broek, B. et al. (2021) 3D genomics across the tree of life reveals condensin II as a determinant of architecture type. Science, 372, 984-989.
      7. Beckwith, K.S., Ødegård-Fougner, Ø., Morero, N.R., Barton, C., Schueder, F., Tang, W., Alexander, S., Peters, J.-M., Jungmann, R., Birney, E. et al. (2023) Nanoscale 3D DNA tracing in single human cells visualizes loop extrusion directly in situ. BioRxiv https://doi.org/10.1101/2021.04.12.439407.
      8. Massacci, G., Perfetto, L. and Sacco, F. (2023) The Cyclin-dependent kinase 1: more than a cell cycle regulator. Br J Cancer, 129, 1707-1716.
      9. Bonev, B. and Cavalli, G. (2016) Organization and function of the 3D genome. Nat Rev Genet, 17, 661-678.
      10. Dekker, J., Belmont, A.S., Guttman, M., Leshyk, V.O., Lis, J.T., Lomvardas, S., Mirny, L.A., O'Shea, C.C., Park, P.J., Ren, B. et al. (2017) The 4D nucleome project. Nature, 549, 219-226.
1. Conference Summary: Do Numbers Measure What Matters?

      Summary

      This inaugural lecture of the series "Mesurer la valeur de notre monde" ("Measuring the Value of Our World") explores the growing tension between the pervasive quantification of society and the perception of a loss of value.

      The speakers, drawn from mathematics, opinion polling, accounting, and philosophy, converge on a central conclusion:

      numbers, in themselves, do not measure what matters most.

      Their true meaning and relevance depend entirely on the models, conventions, and assumptions that underpin them.

      Far from being objective or neutral, these frames of reference are the product of conceptual, social, and often political choices, which deserve close critical scrutiny.

The main points to remember are the following:

      The primacy of the model: For the mathematician Cédric Villani, the gravest error lies not in the calculation but in the model used to represent the world.

      Numbers are merely the end product of reasoning, formulas, and assumptions, which constitute the real heart of the analysis.

      Context is key:

      The pollster Jean-Daniel Lévy insists that an isolated opinion figure is meaningless.

      Understanding emerges from analyzing trends ("a film rather than a snapshot"), from segmenting the data and, crucially, from combining quantitative measures with qualitative studies that reveal individuals' deeper logics.

      Accounting as a tool for action: The chartered accountant Alexandre Rambaud deconstructs the idea of accounting as an objective mirror of reality.

      He proposes an instrumental view, notably in ecological accounting, in which numbers do not aim to "value" nature but to quantify the means needed to preserve it, so as to guide action.

      Liberation from domination:

      The philosopher Valérie Charolles calls for "freeing ourselves from the domination of numbers" by becoming aware of their constructed nature.

      She highlights "innétrisme" (numerical illiteracy), which leaves us vulnerable to misleading inferences, and argues for a civic reappropriation of the conventions (accounting, statistical, electoral) that shape our world.

1. Introduction: The Quantification of the World

      The conference opens with the observation of a generalized "quantification of the world."

      Bettina Laville, president of the IEA de Paris, underlines the contemporary paradox: while everything is measured, from daily opinion polls to corporate extra-financial reporting and even happiness indicators, an impression of a "loss of value" prevails.

      This feeling arises from the fear that numbers, by invading every domain, may erase "value in the sense of precisely that which cannot be counted."

      This cycle of five lectures sets out to explore this phenomenon through several themes:

      1. General introduction (this session)

      2. Measuring nature

      3. Measuring cities

      4. Measuring equality

      5. Measuring value itself (happiness, etc.)

2. The Primacy of the Model over the Number: The Mathematician's Perspective

      Cédric Villani, professor of mathematics and Fields Medalist, reframes the debate from the outset by asserting that the essence of mathematics lies in reasoning, not in calculation.

      Reasoning before Calculation

      Contrary to the popular image of the mathematician as a "good calculator," the discipline has, since ancient Greece, focused on "the reasoning that leads to the calculation, not on the result itself."

      In the age of computers, many mathematicians excel at building edifices of concepts and logical relations even though they are "hopeless at calculation."

      What matters are the formulas, the assumptions, and the underlying intellectual architecture.

      Lessons from the History of Science

      Cédric Villani illustrates his thesis with two major historical examples in which the error lay not in the calculation but in the model:

| Case Study | Underlying Model | The Error and Its Nature | Conclusion |
| --- | --- | --- | --- |
| The definition of the metre (French Revolution) | The metre is defined as one forty-millionth of the Earth's circumference; a universalist scientific-political project. | A measurement error of 0.2 millimetres, experienced as a "disgrace" by its authors (Delambre and Méchain). The error was minuscule, yet it tormented Méchain for the rest of his life. | The error lay in the precision of the measurement, but the conceptual model was revolutionary and founded the universal system of units. |
| The calculation of the age of the Earth (19th century) | A cooling model of an Earth assumed to be solid, based on Fourier's work. | A monstrous error: Lord Kelvin's calculation yielded 24 million years, whereas the true age is 4.5 billion years. The error came entirely from the starting model. | The Earth has a liquid interior that generates convection, which radically changes the calculations. |

On this point he quotes Thomas Huxley: "Mathematics may be compared to a mill of exquisite workmanship [...] yet what you get out of it depends on what you put in [...] pages of formulae will not yield a reliable result from imprecise data."

      Les Hypothèses Politiques derrière les Chiffres

      Les chiffres utilisés dans le débat public ne sont jamais neutres ; ils reposent sur des hypothèses et des choix, souvent politiques.

      L'objectif de 2 tonnes de carbone par an et par individu : Ce chiffre repose sur une hypothèse politique forte, celle d'une répartition "également à travers tous les citoyens de l'humanité" du droit à émettre du carbone.

      Le calcul de Jean-Marc Jancovici sur les vols en avion : L'idée que chaque personne ne devrait prendre l'avion que quatre ou cinq fois dans sa vie est le résultat d'un calcul basé sur des hypothèses scientifiques et politiques, notamment sur la répartition de cet effort.

      Le rapport Meadows (Club de Rome, 1972) : Ce célèbre modèle du monde reliait cinq grands compartiments (démographie, pollution, industrie, etc.) via 140 équations.

      Ses auteurs reconnaissaient eux-mêmes l'impossibilité de modéliser des facteurs essentiels comme "la volonté politique d'agir" ou "le sentiment de justice".

      Ce qu'il Reste à Mesurer

      Interrogé sur ce qu'il regretterait de ne pas voir mesuré, Cédric Villani évoque le concept de "viscosité" de la société : "tout ce qui dans une société empêche d'agir".

      Cela inclut les rapports de pouvoir établis, les lourdeurs administratives, les procédures dilatoires, etc.

      Mesurer cette force d'inertie qui dissipe l'énergie du changement serait, selon lui, un indicateur fascinant.

      3. Opinion in Figures: Between Measurement and Understanding

      Jean-Daniel Lévy, director of the polling institute Harris Interactive, brings the pollster's perspective, highlighting the complexity hidden behind opinion figures.

      The Vast Submerged Part of Polling

      He reveals that the polls published in the media account for less than 0.1% of his institute's activity.

      The bulk of the work (99.9%) is confidential and concerns marketing, product evaluation, or studies for public and private actors.

      We are therefore "surrounded, without knowing it, by mathematical formulas that are set to govern our lives".

      Going Beyond the Single Figure

      A poll figure should never be treated as an "absolute truth". To give it meaning, two approaches are indispensable:

      1. Make a film, not a photograph: it is crucial to ask the same question at regular intervals in order to observe the dynamics and shifts of opinion, for instance on a reform such as the pension reform.

      2. Analyse the detailed results: the real information lies in segmenting the data (by gender, age, social category, political leaning, etc.), which makes it possible to understand the divides and the logics specific to each group.

      The Essential Interplay of the Quantitative and the Qualitative

      Figures measure, but they do not always allow us to understand.

      To grasp the underlying logics, qualitative methods (focus groups, interviews) are required.

      Example of the pension reform: qualitative studies revealed that for many French people the debate was not about pensions themselves, but about the meaning and hardship of work.

      Example of fundamental values: qualitative surveys show that major social eruptions in France are often structured around two founding, unstated notions: equality (the legacy of 1789) and solidarity/public service (the legacy of 1945).

      Invisible Figures and the Subjectivity of Measurement

      The weak signals of 2017: an analysis of the 2017 electoral figures should have tempered the idea of massive support for Emmanuel Macron's project.

      Two key data points were underestimated: the drop in turnout between the two rounds (unprecedented apart from 1969) and the all-time record of 4 million blank or spoiled ballots, meaning that 12% of the voters present in the second round rejected the choice on offer.

      The wording of the questions: a poll's result depends closely on how the question is asked.

      Emmanuel Macron's confidence ratings can thus range from 29% to 45% depending on the institute, because the questions differ subtly ("do you trust him to...", "to have good ideas", "to lead the country", etc.).

      In conclusion, figures are a "necessary but not sufficient condition". They provide reference points, but relying on them alone, without contextual and qualitative analysis, leads to "remarkable errors".

      4. Accounting as a Tool for Action

      Alexandre Rambaud, holder of the chair in ecological accounting, proposes to view accounting not as a calculation technique but as a system of representation and governance.

      The Four Functions of Accounting

      The figure is only the last step of the accounting process, which rests on three prior fundamental functions:

      1. Taking into account: deciding what matters, defining the objects to track, and sorting them into categories.

      This is an act of representation and modelling.

      2. Being accountable for one's actions: linking actions to responsibilities (accountability) and keeping a record of them.

      3. Rendering accounts: producing reports and codes to enable discussion and decision-making within a governance structure.

      4. Counting: using numerical instruments to make the complexity of an organisation intelligible and manageable.

      Instrumental Measurement versus "Fair Value"

      Accounting is shaped by a fundamental opposition:

      Measurement: an instrumental approach in which figures (quantitative and qualitative) are orders of magnitude defined by internal conventions for steering an organisation.

      Valuation: the idea that the market can reveal an objective "fair value" of an asset, including natural resources.

      This approach aims at a kind of transparency, an absolute representation of the world in figures.

      The Proposal of Ecological Accounting

      The chair in ecological accounting sides firmly with measurement.

      It rejects the temptation to "put a number on an ecosystem", which would make no sense.

      Its project is to use figures to support and equip the work of preservation:

      Instead of "valuing" an ecosystem, it seeks to calculate the costs required to preserve or restore it.

      Instead of seeking a "fair value" of nature, it asks how much a farmer would need to be paid in order both to live decently and to guarantee the good ecological condition of his soils.

      The goal is not to measure what is essential in the absolute, but to "measure what is essential in order to protect what we have to protect".

      5. Breaking Free from the Domination of Figures

      The philosopher Valérie Charolles concludes by calling for a critical distance from the hegemony of figures.

      The Challenge of "Innumeracy" and Misleading Inferences

      We are often ill-equipped to interpret figures, which leads to "misleading inferences".

      Communication is asymmetric between the experts who produce the figures and the public that receives them.

      Example of growth: announcing a growth rate of 6.8% for Ethiopia against 1.7% for France is misleading.

      Per capita, the gain in wealth is 15 times greater in France ($705) than in Ethiopia ($50).

      Presentation of the data: saying that France has 1.7% growth is factually equivalent to saying that its GDP "is on track to double in 43 years".

      The second formulation radically changes how the situation is perceived.
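The equivalence between an annual growth rate and a doubling horizon follows from compound growth: a quantity growing at rate r per year doubles after ln(2)/ln(1+r) years. A minimal sketch (the "43 years" quoted above is the speaker's rounding; the formula gives a close figure):

```python
import math

def doubling_time(annual_rate_percent: float) -> float:
    """Years for a quantity growing at a constant annual rate to double."""
    r = annual_rate_percent / 100.0
    return math.log(2) / math.log(1 + r)

# France: 1.7% annual growth doubles GDP in roughly four decades
print(round(doubling_time(1.7)))  # 41

# Ethiopia: 6.8% annual growth doubles it far faster
print(round(doubling_time(6.8)))  # 11
```

The same number can thus be framed as a modest yearly percentage or as a doubling within a generation, which is precisely the rhetorical point being made.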

      Figures and Numbers: a Crucial Distinction

      We must distinguish:

      Numbers: abstract theoretical entities, operating through pure reasoning (the domain of mathematics).

      Figures: measured magnitudes or calculated quantities that aim to account for reality.

      They cannot exist without a set of conventions (definitions, measurement standards, models).

      The Critical Analysis of Conventions: Where Everything Is Decided

      The real analysis must bear on the norms, models, and conventions used to produce the figures, because "that is where everything is decided".

      These conventions may be dated, limited, or biased.

      Corporate accounting: its framework, inherited from the Renaissance, treats labour as a cost rather than a value, and favours a short-term liquidation perspective.

      Financial models: they systematically underestimate the probability of extreme events (crises, crashes), as Benoît Mandelbrot showed.

      Electoral systems: the way votes are counted (proportional, majoritarian) determines the composition of parliaments and hence the policies pursued.

      The point, then, is not to reject figures but to "break free from their domination".

      This implies understanding that we have power over them, since it is political and social representations that decide electoral laws, accounting standards, and how GDP is calculated.

      The way forward is to strengthen citizens' statistical literacy and to subject the frames of reference to constant democratic debate.

    1. Synthesis: The Dark Side of Morality

      Summary

      This synthesis examines the theses presented by Jean Decety on what he calls "the dark side of morality".

      The central argument is that while morality is a pillar of social cooperation, it has a destructive side.

      When beliefs harden into absolute moral convictions, they become a powerful driver of dogmatism, intolerance, and violence.

      These convictions, characterised by a sense of objectivity, a perceived social consensus, and stability over time, cut across political ideologies and specific causes.

      The aim of Decety's research is to develop a unified theoretical model, drawing on psychology, neuroscience, anthropology, and evolutionary theory, to explain the universal psychological mechanisms underlying this phenomenon.

      The key process is "moralisation", which converts social preferences into sacred values and engages the brain's reward system.

      This process is often associated with low metacognitive sensitivity: the most extreme individuals are paradoxically the least informed on the issue, yet the most convinced of their own knowledge.

      Moralising an issue makes it impervious to cost-benefit analysis and to any compromise, which increases polarisation and hinders democratic dialogue.

      1. The Dual Nature of Morality

      Morality is generally seen as a product of gene-culture co-evolution, specific to Homo sapiens, that brings clear benefits to social life.

      The positive side: morality is an essential mechanism that:

      ◦ regulates interpersonal exchanges.

      ◦ facilitates coexistence and cooperation.

      ◦ minimises or channels aggression.

      ◦ balances conflicts between individual and collective interests.

      ◦ motivates collective action for the common good, such as the women's suffrage movement or the civil rights movement.

      The dark side: this is the aspect that primarily interests Jean Decety.

      Morality, when pushed to the extreme in the form of unshakeable convictions, can:

      ◦ fuel dogmatism and intolerance.

      ◦ motivate violence and extreme collective action.

      ◦ justify vigilantism, where individuals arrogate to themselves the right to dispense justice.

      2. Moral Conviction: Definition and Consequences

      Moral conviction is the central concept of the analysis.

      It is defined as a strong, absolute belief that something is inherently good or bad, moral or immoral.

      Characteristics

      A moral conviction is perceived by the person who holds it as:

      Absolute: it tolerates no variation or exception, whatever the context.

      Objective: it is regarded as a fundamental truth about reality, applicable to everyone, everywhere, at all times.

      Negative Consequences

      When a strong moral conviction is combined with the perception of a broad consensus within one's community, it can lead to:

      Intolerance: a refusal to accept divergent viewpoints.

      Dogmatism: an inflexible mindset and a rejection of critical analysis.

      Violence: history and current events show that violence is often used to maintain a perceived moral order.

      The perpetrators of genocides, wars, or torture frequently believe their actions are legitimate.

      Concrete Examples Cited

      Several cases illustrate how individuals with very different ideologies share similar psychological mechanisms grounded in moral conviction:

      Riots in Nigeria (2002): more than 220 people killed after the publication of a newspaper article deemed offensive to the Prophet Muhammad. Underlying moral motivation: defending religious honour.

      Lorna Green (Wyoming, USA): convicted of setting fire to a clinic that performed abortions. Underlying moral motivation: life is sacred and abortion is murder.

      Climate activists: use of "shock tactics" and violent protests, such as those against an airport project. Underlying moral motivation: the urgency of fighting global warming.

      Kathleen Stock (England): philosophy professor harassed and driven to resign by transgender activists. Underlying moral motivation: the conviction that the claim that sex is a biological reality is an unacceptable attack.

      Terrorism: individuals who commit terrorist acts are often strongly convinced of the justice of their cause (divine or political). Underlying moral motivation: fulfilling a higher moral duty.

      3. The Functional Architecture of Moral Conviction

      Decety proposes a functional model to explain the formation and effects of moral convictions, based on the interaction of several components.

      Key Components

      1. Objectivity: the belief that one's own values are objective, universally applicable truths.

      2. Social consensus: the perception that the members of one's community or coalition share the same beliefs, which reinforces the conviction.

      3. Temporal stability: the more a belief is perceived as having a moral basis, the more stable it remains over time.

      The Central Mechanism: Converting Preferences into Values

      The engine of moral conviction is its capacity to transform social preferences into sacred values.

      Preference: "I choose not to eat meat from factory farming." (A personal matter)

      Moralised value: "No one should eat meat from factory farming, because it is immoral." (A universal moral matter)

      Values act as powerful motivational forces that set goals, guide decisions, and prompt action.

      The Neurobiological Substrate

      • Values, including moral values, are processed by the brain's reward and valuation system.

      There is no brain circuit specific to morality; it relies on the same mechanisms that assign value to food or to a partner.

      • What is specifically human is our species' unique capacity to assign value to abstract, arbitrary objects such as ideologies, symbols (a flag), a religion, or a political cause.

      4. Psychological Mechanisms: Metacognition and Dogmatism

      Strong moral convictions are often associated with a weak capacity for critical reflection.

      Metacognition: the ability to reflect on one's own thought processes.

      Metacognitive sensitivity measures the correlation between a person's confidence in an answer and the actual correctness of that answer.

      Low metacognitive sensitivity: research shows that dogmatic, morally convinced individuals often have low metacognitive sensitivity.

      There is a gap between their level of confidence (very high) and their actual knowledge (often limited).

      The GMO example: a study conducted in the United States, Germany, and France showed that the most extreme opponents of GMOs were those with the least knowledge of biology, yet who believed they knew the most.

      It illustrates the principle: "The less they know, the more they think they know."

      5. The Challenges of "Moralisation" and Cost-Benefit Analysis

      Once an issue has been "moralised", it becomes extremely difficult to debate rationally.

      Failure of cost-benefit analysis: by becoming sacred values, moral convictions rule out any form of compromise or pragmatic weighing of costs and benefits.

      For example, for an absolute anti-abortion activist, no contextual argument (rape, the mother's age, a fetal malformation) can justify an exception.

      Polarisation and democracy: the excessive moralisation of public debate leads to extreme polarisation, making constructive dialogue and the search for compromise, both essential to life in society, nearly impossible.

      Proposed approach: Decety suggests that even on moralised issues, encouraging cost-benefit analysis is a way for society to move forward, rather than remaining frozen in irreconcilable positions.

      6. Key Points from the Discussion (Q&A)

      Distinguishing morality from ethics: for the purposes of his research on psychological mechanisms, Decety draws no fundamental distinction.

      He is not interested in what people should do (prescriptive ethics), but in the mechanisms that turn a preference into an absolute belief.

      Meaning of the term "absolute": a value is absolute when it is insensitive to context, factual evidence, or mitigating circumstances.

      The abortion example shows that even when confronted with extreme scenarios, the moral position remains unchanged.

      Perspective on terrorism: Decety agrees that terrorists are highly morally convinced.

      However, he disputes the term "brainwashed", arguing that their actions are often rational within their own value system, their history, and the norms of their group.

    1. Briefing Document: The Dynamics of Peace Negotiation according to Alberto Fergusson

      Synthesis

      This briefing analyses the reflections and experiences of Alberto Fergusson, a key actor in the Colombian peace process, who combines expertise in medicine, psychiatry, and psychoanalysis with intensive practice of negotiation.

      His observations, drawn from more than a decade of involvement, notably in the talks with the ELN, reveal the complex psychological and social dynamics that underlie peace processes.

      The key takeaways are as follows:

      The Paradox of Agreement (Individual vs. Group):

      Fergusson's most striking observation is that agreement is almost always possible in private, one-on-one discussions with members of the opposing side, including its leaders.

      Yet that agreement becomes impossible to reach once discussions return to the formal negotiating table, with its group dynamics and imperatives of representation.

      The Crucial Importance of Back Channels: contrary to received wisdom, most crucial decisions are made not in official sessions but in informal discussions and secret meetings.

      Mastering these parallel channels is an art that requires identifying the right interlocutors and precisely managing the format and duration of the exchanges.

      Applying Psychopathology to Negotiation: Fergusson draws his main analytical tools from his work with homeless people suffering from severe mental illness.

      He posits that the defence mechanisms and emotional disturbances observed in "madness" shed light on the sometimes irrational behaviour of actors in high-tension situations such as peace negotiations.

      The Fundamental Question of Negotiations' Real Impact: Fergusson critically questions the capacity of negotiations to change social processes in a lasting way.

      He asks whether successful peace agreements are the fruit of negotiating skill or whether they merely formalise an already inevitable evolution of social dynamics, raising the risk of reaching "artificial" and premature agreements.

      Background and the Researcher's Objectives

      Alberto Fergusson, trained in medicine, psychiatry, and psychoanalysis, has devoted a major part of his career to psychosocial work.

      His early work with homeless people with schizophrenia in Colombia led him to develop a model, "accompanied self-analysis", for understanding and supporting people with severe emotional disorders.

      For nearly twenty years he has applied the knowledge gained in this field to the Colombian peace process.

      He was directly involved in the talks, notably as a member of President Santos's government delegation in the discussions with the ELN in Ecuador and in Cuba.

      He was also a member of Colombia's Truth Commission.

      Currently a professor at Universidad del Rosario, he is spending a month at the IEA de Paris (remotely) to organise, synthesise, and rethink a decade of experience.

      This reflective work is crucial, as he is about to rejoin the Colombian peace process with an academic perspective, aiming to analyse the situation from a broader, less partisan standpoint.

      Central Themes and Key Observations

      From "Madness" to "Normality": An Inverted Approach

      Fergusson describes his approach as a "confession": he acknowledges that most of his understanding of negotiation processes comes from his experience with people suffering from severe mental illness.

      His presentation is entitled "Normality in the Light of Madness", meaning that the extreme psychological mechanisms observed in his patients offer a relevant lens for reading the apparently "normal" dynamics of political negotiations.

      The Paradox of Agreement: Individual versus Group

      Fergusson's most powerful and most recurrent observation is the radical dichotomy between individual interactions and group dynamics.

      One-on-one: Fergusson states that, without exception, in-depth individual conversations with any member of the opposing side (including the ELN's most senior leaders) always made it possible to reach consensus.

      He declares: "we could always have signed the agreement individually, one-on-one."

      At the negotiating table: as soon as the discussion is brought to the formal table, where group dynamics, hierarchies (the need to obtain the supreme leader's approval, such as "Gabino" for the ELN), and the pressures of representation come into play, agreement becomes difficult, if not impossible.

      This paradox lies at the heart of his current inquiry: why does what is mutually acceptable in private become unacceptable in public?

      Apparent Irrationality: Acting Against One's Own Interests

      Another central observation is that, in negotiations, individuals and groups frequently adopt positions that clearly run counter to their own interests, at least in part.

      Fergusson seeks to move beyond the simple explanation of "emotional and psychological factors" and analyse in detail the mechanisms leading to these counterproductive decisions.

      The Crucial Role of Back Channels

      Fergusson states unequivocally that most important decisions are not made at the official negotiating table.

      Where decisions are really made: the real breakthroughs occur in informal meetings, on the sidelines of official sessions.

      The art of the back channel: the success of these parallel channels depends on a fine-grained strategy:

      1. Identify the key interlocutor: one must be able to spot the person on the other side with whom an agreement in principle can be reached.

      2. Bring the decision-makers together: in one successful example, Fergusson and his ELN counterpart, having reached agreement, organised a private meeting between their two respective leaders to present their joint solution.

      That was the moment when the negotiations advanced the most.

      3. Control the duration: the length of a meeting is a critical factor. Fergusson notes that if human beings keep talking after reaching an agreement, they will eventually find a disagreement.

      Knowing when to stop is essential.

      The Fundamental Question: Negotiation and Social Evolution

      Fergusson's main research question, which he is exploring during his residency, is the following:

      "To what extent can social processes be changed through negotiation?"

      He illustrates this dilemma with an analogy: that of a person who pushes with all their might all night long to make the sun rise and who, at 6 a.m., when the sun comes up, exclaims: "I did it!"

      Negotiator: agent of change or mere facilitator?

      Are negotiators the architects of an agreement, or does their intervention merely facilitate or accelerate a trajectory that social dynamics and conflicts would have followed anyway?

      The risk of "natural social laws": he wonders whether negotiators, by trying to force an agreement, are working against "natural social laws", thereby creating artificial and premature arrangements.

      The criterion of success: for Fergusson, a successful agreement is not one that holds for six months or two years.

      His question concerns durable peace agreements and their true origin: the negotiators' skill or the inevitable evolution of society.

      Insights from the Discussion

      The exchanges with the other researchers enriched and refined several points:

      Legitimising a Change of Position without "Losing Face":

      ◦ One participant suggested that the negotiator's role is to create a framework in which the parties can legitimately change position without "losing face".

      ◦ This idea is illustrated by a wine-tasting experiment: tasters radically changed their assessment of a wine after seeing the label, but never admitted having changed their minds.

      They claimed it was the wine that had "changed" (it had "opened up").

      Lesson for the negotiator: the point is not to convince the other party to change their mind, but to present the situation differently (for instance by invoking "new events" or "new aspects") so that adopting a new position appears as a logical response to a changed context rather than a capitulation.

      The Balance between Secrecy and Publicity:

      ◦ Even peace processes that seem secret, such as the one with the FARC, are in fact a complex mix of public exchanges and parallel channels.

      ◦ Fergusson confirms that the final agreement with the FARC was the result of a "chain of back channels", often to the displeasure of leaders who dislike such manoeuvres.

    1. Measuring Inequality: Synthesis and Perspectives from the Debate

      Executive Summary

      This briefing analyses the central themes of an expert debate on the measurement of inequality and its link to reducing it.

      Three complementary perspectives emerge:

      1. Indicators as socio-political conventions:

      The economist Florence Jany-Catrice argues that any measure of inequality is the product of socio-political conventions, not an objective truth.

      Indicators are two-faced instruments, serving both knowledge and governance.

      She criticises standard measures such as the interdecile ratio (D9/D1), which mask what happens at the extremes of the distribution and obscure fundamental inequalities such as the capital/labour split.

      Measuring does not automatically bring reduction, because a complex chain separates knowing from acting.
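The interdecile ratio mentioned above is simple to compute, and its blind spot is easy to demonstrate. A minimal sketch on hypothetical monthly incomes, using a nearest-rank percentile (one of several common conventions):

```python
import math

def percentile(data, p):
    """Nearest-rank percentile (one of several common conventions)."""
    s = sorted(data)
    k = max(0, math.ceil(p / 100 * len(s)) - 1)
    return s[k]

def interdecile_ratio(incomes):
    """D9/D1: the income floor of the top 10% over the ceiling of the bottom 10%."""
    return percentile(incomes, 90) / percentile(incomes, 10)

incomes = [900, 1100, 1300, 1500, 1700, 2000, 2400, 2900, 3600, 12000]
print(interdecile_ratio(incomes))  # 4.0

# The blind spot criticised above: raising the top income a thousandfold
# leaves the indicator unchanged, because D9/D1 ignores the extremes.
incomes[-1] = 12_000_000
print(interdecile_ratio(incomes))  # still 4.0
```

This is exactly why the ratio can remain stable while wealth concentrates dramatically at the very top: everything above D9 and below D1 simply disappears from the measure.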

      2. Communication and citizen action:

      Cécile Duflot, director of Oxfam France, presents her organisation's approach, which uses robust data (notably from Credit Suisse/UBS) to produce "killer facts":

      shock comparisons designed to make the scale of extreme wealth concentration visible.

      The aim is to mobilise public opinion and advocate political regulation, arguing that current levels of wealth inequality create social fractures, deprive public action of resources, and pose a fundamental democratic problem.

      3. Lived experience as a revealer: the sociologist Nicolas Duvoux proposes to move beyond the gap between the relative stability of official indicators and the strong social tension that is actually felt.

      Drawing on the analogy of "felt temperature", he argues that measuring the subjective perception of inequality is not an alternative to objective measurement but a way of refining it.

      This approach reveals the central role of wealth in people's sense of security and their ability to project themselves into the future.

      It brings to light fractures that traditional monetary indicators fail to capture, from the precariousness of the working classes to the capacity of the ultra-rich to shape the collective future through philanthropy.

      In conclusion, the debate converges on the idea that while measuring inequality is not enough to reduce it, measuring differently (critiquing the conventions, making the extremes visible, incorporating lived experience) is the indispensable first step toward a shared diagnosis and effective political and social action.

      --------------------------------------------------------------------------------

      Theme 1: Indicators as Socio-Political Conventions (Florence Jany-Catrice)

      The economist Florence Jany-Catrice sets the conceptual frame of the debate by asserting that the quantification of social facts, and of inequality in particular, is a complex operation resting on conventions.

      Drawing on the work of Alain Desrosières, she insists on the pairing "agreeing and measuring" ("convenir et mesurer"), stressing that behind every figure lies a degree of normativity and a theory of justice, conscious or not.

      The Double Face of Indicators: Knowledge and Governance

      Inequality indicators are not simply neutral tools of knowledge. They have a dual nature:

      Instruments of knowledge: they make it possible to form a picture of the state of society.

      Instruments of governance: they serve as markers for evaluating the effectiveness of redistributive public policies and reflect the state of social power relations.

      However, the link between observing a phenomenon and addressing it politically is neither linear nor automatic.

      As shown by the example of the Stiglitz-Sen-Fitoussi commission (2008), whose recommendation to supplement GDP with an indicator of wealth distribution was largely ignored, "one can very well know and yet not want to act".

      The impact of a diagnosis depends on the capacity of social actors (experts, researchers, NGOs) to make it sufficiently shared and to defend alternative political visions.

      The Limits of Conventional Measures

      Florence Jany-Catrice highlights the weaknesses and blind spots of the most commonly used indicators.

      Indicateur / Concept

      Description et Critique

      Rapport Capital/Travail

      Considéré comme la "première inégalité" du capitalisme, il mesure le partage de la valeur ajoutée entre la rémunération du travail (salaires) et celle du capital (dividendes, intérêts).

      Cet indicateur, bien qu'existant, est de moins en moins visible dans le débat public, illustrant un glissement des intérêts et des expertises.

      Rapport Interdécile (D9/D1)

      Rapport entre le revenu des 10 % les plus riches et celui des 10 % les plus pauvres.

      Bien qu'il semble stable en France (autour de 3,5), cet indicateur est critiqué car il exclut volontairement les "valeurs aberrantes", c'est-à-dire les très hauts et très bas revenus. Il masque ainsi l'aggravation des inégalités aux "queues de la distribution".

      Pauvreté Monétaire Relative

      En France, elle est définie par le seuil de 60 % du revenu médian. F. Jany-Catrice souligne qu'il s'agit avant tout d'un indicateur d'inégalité de répartition, et non de pauvreté absolue.
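      The two conventions just described, the D9/D1 interdecile ratio and the 60%-of-median poverty threshold, can be sketched numerically. The sample incomes below are invented for illustration only:

```python
import numpy as np

# Hypothetical annual incomes in euros; illustrative only.
incomes = np.array([8_000, 12_000, 15_000, 18_000, 21_000,
                    24_000, 28_000, 34_000, 45_000, 90_000])

d1 = np.percentile(incomes, 10)   # income ceiling of the poorest 10%
d9 = np.percentile(incomes, 90)   # income floor of the richest 10%
interdecile_ratio = d9 / d1       # the D9/D1 indicator

# Relative monetary poverty: living below 60% of the median income.
poverty_line = 0.6 * np.median(incomes)
poverty_rate = np.mean(incomes < poverty_line)  # share of people below the line
```

      Note how the ratio is entirely insensitive to what happens above D9 or below D1, which is exactly the critique made above.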

      Toward Alternative Indicators and "Statactivism"

      Faced with the limits of official tools, civil-society initiatives are emerging to propose other ways of counting.

      The BIP 40 (Barometer of Inequality and Poverty):

      Created in the 2000s by the Réseau d'alerte sur les inégalités, this composite, multidimensional indicator (income, work, education, health, housing, justice) showed an "explosion" of inequality between 1980 and 1995, running counter to the official INSEE indicator, which showed poverty receding.

      The aim was not to pit a "true" figure against a "false" one, but to demonstrate that "depending on the glasses one puts on, one can tell very different stories" about the state of society.

      "Statactivism": this neologism designates the statistical strategies that social actors use to criticize an authority and emancipate themselves from it.

      It is a reappropriation of the "emancipatory power" of statistics to provide data on the blind spots of official statistical production (e.g., the very rich) or to offer alternative visions.

      Theme 2: Oxfam's Role in the Public Debate (Cécile Duflot)

      Cécile Duflot explains how Oxfam, an organization historically dedicated to fighting poverty, came to focus on poverty's causes, arriving "fairly quickly at the question of inequality."

      Oxfam's approach is described as eminently political and activist, aiming to mobilize citizen power.

      Methodology and Communication Strategy

      Oxfam's annual report, symbolically published during the Davos Forum, rests on a precise methodology and a hard-hitting communication strategy.

      Data sources: the report relies mainly on data from Credit Suisse (now UBS) and Forbes, using "the most robust method for calculating wealth," the same one used by institutions such as the Haute autorité pour la transparence de la vie publique.

      "Killer facts": Oxfam's strategy consists of translating raw data into striking, intuitively graspable comparisons, because orders of magnitude such as a billion euros are "nebulous" to the general public.

      ◦ Cited example: "The world's 8 richest billionaires owned as much as the poorer half of humanity."

      Illustrating the effect of the mean: if Carlos Tavares (CEO of Stellantis) walked into a room of 99 minimum-wage earners, the average income would jump from €16,000 to about €400,000, masking the fact that 99% of the people present are still on the minimum wage.

      Even the D9/D1 ratio (a gap of 1 to 229 in this case) remains misleading, because there is "more spread within the richest 10% [...] than between the poorest 10% and the richest 10%."
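      The effect of the mean described above can be reproduced numerically. The top income used below (about €38.4M) is an illustrative value chosen only so that the mean lands at the cited €400,000; it is not a reported figure:

```python
import numpy as np

smic = 16_000                    # annual income of each of the 99 minimum-wage earners
room = np.full(99, smic)

mean_before = room.mean()        # 16,000
median_before = np.median(room)  # 16,000

# One very high earner walks in (illustrative ~38.4M euros, chosen to
# reproduce the ~400,000 euro average cited in the talk).
room_after = np.append(room, 38_416_000)

mean_after = room_after.mean()        # the mean explodes to 400,000...
median_after = np.median(room_after)  # ...while the median stays at 16,000
```

      The median is unchanged, which is why pairing the mean with distributional indicators matters.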

      Beyond Income: The Concentration of Wealth and Its Consequences

      Oxfam focuses on wealth inequality, which it considers more fundamental than income inequality.

      The perceived injustice: most large fortunes are inherited. In France, "more than 70% of billionaires' wealth is inherited."

      C. Duflot quotes a Danish billionaire who spoke of "winning the sperm lottery."

      Consequences of extreme inequality:

      1. Social fracturing: extreme inequalities are experienced as unjust and weaken social cohesion.

      2. Deprivation of public resources: the concentration of wealth among the ultra-rich, who benefit from lower effective tax rates, shrinks the taxable base.

      3. A democratic problem: extreme accumulation of wealth translates into the purchase of power.

      C. Duflot quotes an interlocutor: "The first billion, you can spend. [...] From the second billion on, [...] you buy power," notably by buying media outlets and putting pressure on political leaders.

      An Activist Case for Regulation

      The aim of Oxfam's work is not "to dislike the rich" but to advocate for stronger regulation, arguing that more egalitarian societies are in better overall health (Wilkinson's work) and more stable.

      Oxfam's September 2017 report, which analyzed the first budget of the Macron government (cuts to the APL housing benefit, abolition of the ISF wealth tax), is presented as having anticipated the social anger that led to the "gilets jaunes" movement, because "people [...] understand the political message very well."

      Theme 3: The Superior Objectivity of the Subjective (Nicolas Duvoux)

      The sociologist Nicolas Duvoux starts from a puzzle: the contrast between the relative stability of France's macroeconomic inequality indicators and the very high level of "tension, anger, and dissatisfaction."

      His work aims to reconcile objective measurement and lived experience without giving up scientific rigor.

      The "Felt Temperature" of Inequality

      Nicolas Duvoux proposes not to oppose the objective and the subjective, but to use subjectivity as a way in to "refine, better understand, better grasp the objectivity of social relations."

      Analogy: just as the felt temperature refines the ambient temperature by factoring in wind or humidity, measuring subjective social status yields finer information than objective status alone, because it incorporates the cognitive synthesis each individual makes of their own situation.

      Rejecting "subjectivism": he insists that his approach does not isolate the subjective point of view but integrates it into the analysis of objective structures (economic resources, wealth) to obtain a richer picture. The goal is to "contextualize subjectivity."

      Wealth as a Key to Reading Social Security and Insecurity

      Subjective measurement systematically brings out the weight of wealth as a determining factor of social security or insecurity.

      Felt poverty: it affects groups that are not necessarily poor in the monetary sense (small independent workers, retirees who rent).

      It reveals an "impossibility of making a situation sustainable" in which incomes stagnate while expenses (e.g., rents) rise.

      Poverty is then experienced as "confinement" and as a lack of freedom in allocating one's own resources.

      A confiscated future: inequality is redefined as an "inequality of lived time," that is, a difference in the "capacity to project oneself" into the future.

      This capacity is directly indexed to one's endowment of resources, and of wealth in particular.

      The philanthropy of the ultra-rich: at the other end of the social spectrum, philanthropic giving is analyzed not as a simple act of generosity but as a lever that lets the wealthiest secure the dynastic transmission of their wealth and exert control over collective choices, thereby seizing "the collective future."

      Changing the Representation of the Social Hierarchy

      This approach leads to a vision of society structured by "crossings of security thresholds" rather than by a linear, monetary ladder.

      It reintroduces discontinuity between social groups and makes it possible to give statistical form to phenomena such as the "gilets jaunes" mobilization, by validating the difficulty expressed by large parts of the population.

    1. Neurovascular coupling

      Optical imaging makes it possible to measure neurovascular coupling: when neuronal activity increases, oxygen consumption rises in that area, which changes the local concentrations of deoxy- and oxyhemoglobin, and these changes can be measured with optical imaging.
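      As a sketch of the principle (not of any particular instrument), the modified Beer-Lambert law lets one recover changes in oxy- and deoxyhemoglobin concentration from optical density changes measured at two wavelengths. The extinction coefficients, path length, and measured values below are placeholder assumptions, not tabulated data:

```python
import numpy as np

# Placeholder extinction coefficients [eps_HbO2, eps_HbR] at two
# wavelengths; real analyses use published absorption spectra
# (e.g., near 690 nm and 830 nm, where HbR and HbO2 dominate in turn).
E = np.array([[1.0, 3.0],
              [2.5, 1.0]])

pathlength = 6.0  # effective optical path length (cm), assumed constant

# Measured changes in optical density at the two wavelengths (assumed).
delta_od = np.array([0.012, 0.020])

# Modified Beer-Lambert: delta_od = (E @ [dHbO2, dHbR]) * pathlength,
# a 2x2 linear system solved for the two concentration changes.
d_hbo2, d_hbr = np.linalg.solve(E * pathlength, delta_od)
```

      During activation, an increase in HbO2 together with a decrease in HbR is the expected signature of the coupling described above.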

    1. /hyperpost/🌐/🧊/snarf-peergos.chat/

      Use this link to view the page in Peergos.

      Close the view and the enclosing folder is shown,

      where the page can be edited using indy0pad.next.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Summary: In this manuscript, Turner AH. et al. characterize viral replication in cells depleted of the Rab11B small GTPase, a paralogue of Rab11A. Rab11A has been reported to be responsible for the intracellular transport of viral RNPs via recycling endosomes. The authors showed that Rab11B knockdown reduced viral protein expression and viral titer. This may be caused by reduced attachment of viral particles to Rab11B knockdown cells.

      • Major comments:
      • Comment 1, Fig 2-4: The authors should provide Western blot results with equal amounts of the loading control (GAPDH). The bands shown in these figures lack quantifiability and are not reliable as data.

      We have rerun these western blots with more equal loading, and included a second loading control (beta-actin) in addition to the GAPDH. These blots can be seen in new Figures 2 and 3, and the quantification against both GAPDH (Figure 2/3) as well as actin (Fig S2) is now included. We have also included additional biological replicates for Fig 2 B-D. These additional experiments have strengthened our conclusion that Rab11B is required for efficient protein production in cells infected with recent H3N2, but not H1N1, isolates.

      Comment 2 Fig 2-4: Why are the results different between Rab11B knockdown alone and Rab11A/B double knockdown? If the authors claims are correct, the results of Rab11B knockdown should be reproducible in Rab11A/B double knockdown cells.

      Prior literature indicates that the Rab11A and Rab11B isoforms can play opposing roles in the trafficking of some cargos (i.e., with one isoform transporting a molecule to the cell surface while the other takes it off again). In this scenario, it is possible that removing both 'halves' of the trafficking loop can ablate a phenotype. However, since our double knockdown used half the amount of siRNA for each isoform (for the same total amount), it is also possible that this observation is simply the result of less efficient knockdown. In order to distinguish between these possibilities we depleted Rab11A or Rab11B individually with this same 'half dose' of siRNA (see new Figure S3). We observed that Rab11B was still robustly required for H3N2 viral protein production. These results suggest that Rab11A and Rab11B could be playing mutually opposing roles in this case, which is consistent with prior Rab11 literature.

      Comment 3 Fig 6: For better understanding, please provide a schematic illustration of experimental setting.

      We have added a new graphical overview to this figure (see new Figure 6A).

      Comment 4: It is necessary to test other siRNA sequences or perform a rescue experiment by expressing an siRNA-resistant clone in the knockdown cells. There seems to be an activation of host defense system, such as IFN pathways.

      In order to rule out the possibility of off-target effects we created a novel cell line that inducibly expresses a Rab11B shRNA sequence (see new Fig 4). This knockdown strategy used a completely different method (shRNA delivered by lentiviral vector vs transient transfection of siRNA), in a different cellular background (H441 "club like" cells vs A549 lung adenocarcinoma). This new depletion strategy showed that the Rab11B dependent H3N2 protein production phenotype is seen across multiple knockdown strategies and cellular backgrounds.

      **Referees cross-commenting**

      I agree with other reviewers' comments in part.

      Reviewer #1 (Significance (Required)):

      The authors propose a novel role for Rab11B in modulating attachment pathway of H3N2 influenza A virus by unknown mechanism. Although previous studies focus on the function of Rab11A on endocytic transport, the function and specificity of Rab11B has remained less clear. The findings may be of interest to a broad audience, including researchers in cell biology, immunology, and host-pathogen interactions. However, the study remains at a superficial level of analysis and does not lead to a deeper understanding of the underlying mechanisms.

      We agree with the reviewer that a strength of this manuscript is its multi-disciplinary nature, particularly with regard to advances in our understanding of Rab11B function. We have added a significant number of experiments and new figures to bolster the rigor and reproducibility of our findings. We have also added a new figure (Fig 7) that uses reverse genetics to map the Rab11B phenotype to the HA gene of the H3N2 isolate under study. By creating '7+1' reassortant viruses with the H3 HA or the N2 NA on a PR8 (H1N1) background (see Fig 7E-H), we were able to demonstrate that Rab11B acts specifically on one of the HA-mediated entry steps. This provides additional mechanistic insight by mapping the Rab11B phenotype to a step at or prior to fusion. Fundamentally, we believe the novelty and rigor of our observation that recent H3N2 viruses enter through a different route than H1N1 isolates make it worthy of publication in this updated form, so that the field can begin follow-up studies.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Summary: The authors compare the effect of RAB11A and RAB11B knockdown on replication of contemporary H1N1 and H3N2 influenza A virus strains in A549 cells (human lung epithelial cells). They find a reduction in viral protein expression for the tested H3N2 but not for the H1N1 isolates. Mechanistically they suggest that RAB11B affects virion attachment to the cell surface.

      Major comments: The provided data do not conclusively support the suggested mechanism of action, and essential controls are missing to substantiate the authors' claims: • Knockdown efficacy has to be confirmed at the protein level, showing reduced levels of RAB11A and B by Western blot. This is a standard in the field. Off-target effects cannot be avoided with RNAi approaches and are usually ruled out by using multiple siRNAs or by complementing the targeted protein in trans.

      We have verified knockdown efficacy at the protein level in new Fig 1A/B. However, due to the high degree of protein level conservation between Rab11A and Rab11B it is very difficult to develop isoform specific antibodies, and we were unable to obtain a Rab11B-specific antibody that can detect endogenous protein (despite testing 6 commercially available antibodies for specificity). Using an antibody that detects both 11A and 11B (Fig1A) we were able to observe very slight changes in the molecular weight of the Rab11 band(s) detected upon knockdown of 11A vs 11B (suggestive of the two isoforms running as a dimer, with Rab11A the lower band and Rab11B the upper band). Cells depleted of both isoforms simultaneously showed a near complete loss of signal. Using a Rab11A antibody (that we confirmed as specific) we were able to observe loss of the Rab11A signal in both the 11A and 11A+B knockdowns (Fig 1B).

      • Viral titers should be presented as absolute titers, not as percentages (here the labelling is actually misleading: all graphs indicate pfu/ml)

      This data is now shown in new Figure S1, where it is clear that the trends remain consistent across biological replicates. The axis labels of Fig 1D/E and Fig 3A have been corrected as requested to make clear we are normalizing to account for experiment-to-experiment variation in peak titer.

      • Reduction of viral protein expression goes hand in hand with a reduction in GAPDH. While this is accounted for in the quantification, a general block of protein expression cannot be ruled out, since the stability of housekeeping proteins and viral proteins might differ. Testing multiple housekeeping proteins could overcome this issue.

      We have included a second loading control (beta-actin) in addition to the GAPDH for new Figure 2 and 3. The quantification of viral protein production compared to beta actin is now included in new Fig S2. We have also included additional biological replicates for Fig 2 B-D. These additional experiments have strengthened our conclusion that Rab11B is required for efficient protein production in cells infected with recent H3N2, but not H1N1, isolates.

      • The FACS data in Fig 5 are not convincing. The previous figures showed a modest reduction in viral protein expression, and the fluorescence is indicated here on a logarithmic scale. Quantification and indication of the mean fluorescence intensity from the same data would be a better readout to convincingly show that fewer cells are infected.

      We have reanalyzed the existing data to quantify the geometric mean of viral protein expression in the infected cell populations (new Figure 5D, E). This analysis shows no significant difference in geometric mean of HA (Fig 5D) or M2 (Fig 5E) expression between cells treated with NT, 11A or 11B siRNA. This additional analysis strengthens our original conclusion that when Rab11B is knocked down, fewer cells get infected, but those that do produce the same level of viral proteins.
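      For readers less familiar with the readout, the geometric mean used here is the standard summary for log-distributed flow-cytometry fluorescence. A minimal illustration with made-up intensity values:

```python
import numpy as np

# Hypothetical per-cell fluorescence intensities (arbitrary units);
# flow data typically span orders of magnitude, hence the log scale.
intensities = np.array([120.0, 450.0, 1_300.0, 4_800.0, 15_000.0])

# Geometric mean = exponential of the mean of the log intensities.
geo_mean = np.exp(np.mean(np.log(intensities)))

# The arithmetic mean, by contrast, is dominated by the bright tail.
arith_mean = intensities.mean()
```

      Comparing geometric means between knockdown conditions, as done in Fig 5D/E, therefore asks whether infected cells shift in brightness, independently of how many cells are infected.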

      • During the time-of-addition experiment in Fig 6, the authors are testing for HA/M2-positive cells after 16 h of infection. This is a multicycle scenario, so in a second round they would measure the effect of knockdown in the absence of ammonium chloride. Shorter infections of up to 8 h with a higher MOI would overcome this problem.

      By maintaining cells in ammonium chloride throughout the infection we are preventing endosomal acidification at any point in the infection period, so this experiment should be measuring solely the effect of one round of infection. The 16 hr timepoint was chosen to allow for optimized staining and analysis of samples by flow cytometry, within the available hours of the flow cytometry facility.

      • Standard error of the mean is not an appropriate way of representing experimental error for the provided results and should be replaced by SD. Correct labeling of axes with units is required.

      We have updated the axes throughout the manuscript as requested. We have obtained additional statistical expertise (reflected in the updated author list) regarding the issue of SD vs SEM. Standard deviation (SD) would show a measure of the spread of the data; however, the full distribution can be clearly seen because we plotted every individual data point. Standard error of the mean (SEM) is a measure of confidence for the mean of the population, which takes into account both the SD and the sample size. SEM is not obvious to estimate by eye in the way SD is, and we feel it is more helpful to the reader for judging how likely it is that the two population means differ on a given graph.
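      The distinction at issue here reduces to one formula: SEM = SD / sqrt(n), so the SEM shrinks as replicates accumulate while the SD describes the spread itself. A small sketch with invented replicate values:

```python
import numpy as np

# Hypothetical replicate measurements (e.g., normalized titers).
values = np.array([92.0, 101.0, 88.0, 110.0, 97.0])

n = values.size
sd = values.std(ddof=1)   # sample standard deviation: spread of the data
sem = sd / np.sqrt(n)     # standard error of the mean: precision of the mean

# Adding more replicates leaves SD roughly unchanged but shrinks SEM,
# which is why SEM bars speak to whether two population means differ.
```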

      Minor comments: • The authors show a rescue of viral replication upon double knockdown of RAB11A and B. Maybe this is just a consequence of inefficient knockdown since only half of the siRNAs were used?

      In order to determine if this was the case we depleted Rab11A or Rab11B individually, with this same 'half dose' of siRNA (see new Figure S3). We observed that Rab11B was still robustly required for H3N2 viral protein production. These results suggest that Rab11A and Rab11B could be playing mutually opposing roles in this case (i.e., Rab11B transporting a molecule to the surface while Rab11A recycles it off), which is consistent with prior Rab11 literature.


      Reviewer #2 (Significance (Required)): The authors claim an H3N2-specific dependency on RAB11B for early steps of infection. While this is per se interesting, the provided data do not fully support the claims and lack a mechanistic explanation. What is the difference between H1 and H3 strains (virion shape, HA load per virion, attachment force of H1 vs H3)? The readouts used are not close enough to the events with regard to timing and could be supported by established entry assays in the field.

      We have provided additional discussion of the differences between H1s and H3s, including sialic acid binding preferences and changes in HA-sialic acid avidity (lines 76-84). Notably, we have included a new assay (new Fig 7) that provides additional mechanistic insight into the observation that recent H3N2, but not H1N1, isolates depend on Rab11B early in infection. Using reverse genetics we were able to map the Rab11B phenotype to the HA gene of the H3N2 isolate under study. By creating '7+1' reassortant viruses with either the H3 HA or the N2 NA on a PR8 (H1N1) background (see Fig 7E), we were able to demonstrate that Rab11B acts specifically at one of the HA-mediated entry steps. This excludes several non-HA-dependent steps early in the life cycle (uncoating, RNP transport to the nucleus, nuclear import), thus providing additional confirmation that Rab11B acts at one of the earliest steps in the viral life cycle (and, by definition, at or prior to fusion). Fundamentally, we believe the novelty and rigor of our observation that recent H3N2 viruses enter through a different route than H1N1 isolates make it worthy of publication in this updated form, so that the field can begin follow-up studies.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Manuscript Reference: RC-2025-03007 TITLE: Rab11B is required for binding and entry of recent H3N2, but not H1N1, influenza A isolates Allyson Turner, Sara Jaffrani, Hannah Kubinski, Deborah Ajayi, Matthew Owens, Madeline McTigue, Conor Fanuele, Cailey Appenzeller, Hannah Despres, Madaline Schmidt, Jessica Crothers, and Emily Bruce

      Summary Here, Turner et al. build upon existing knowledge of Influenza A virus (IAV) dependence on the Rab11 family of proteins and provide insights into the specific role of Rab11B isoform in H3N2 virus binding and entry. The introduction is clearly written and provides sufficient background on prior research involving Rab11. It effectively identifies the current gap in knowledge and justifies the investigation of more clinically relevant, circulating strains of IAV. The methods section provides sufficient detail to ensure reproducibility. Similarly, the discussion is well structured, aligns with the introduction, and thoughtfully outlines relevant follow-up experiments. The authors present data from a series of experiments which suggest that the reduced H3N2 infection and viral protein production in Rab11B-depleted cells is due to impaired virus binding. While the evidence supports a Rab11B-specific phenotype in the context of H3N2 infection, we recommend additional experiments (outlined below), to further validate and strengthen these findings. These would help solidify the mechanistic link between Rab11B depletion and the observed phenotype for H3N2 strains of IAV.

      Major comments Figure 1. (B) & (C) The authors normalise viral titers to the non-targeting control (NTC) siRNA set at 100. While this approach allows for relative comparisons, we recommend including the corresponding raw PFU/ml values, at least in the supplementary materials. This will better illustrate the biological significance of gene depletion and variability of the results.

      We have included the raw PFU/mL values in new Figure S1 (pasted below, with each biological replicate shown as a differently shaped data point); peak viral production varied by biological replicate. While the depletion-induced trends are clearly visible across biological replicates, normalization to the average titer in the NT condition for each replicate allows for cleaner visualization.

      In addition, the current protocol uses a high MOI (1) and a relatively short infection period (16 hours) to capture single-cycle replication. However, to better assess the impact of gene knockdown on virus production and spread, we suggest performing a multicycle replication assay using a lower MOI (e.g., 0.01–0.001) over an extended time period, such as 48 hours before titration, provided that cell viability under these conditions is acceptable.

      We appreciate this suggestion and repeatedly attempted to carry out a multicycle growth curve to obtain this data. Unfortunately, out of four independent biological replicates we attempted, we were only able to maintain cell viability and adherence in one biological replicate (shown below). We have not included this data in the revised manuscript due to the limited replicates we were able to obtain, though we can add it in a further revision if the reviewer feels it is warranted.

      Figure 7. (B) & (C) The authors present interesting data showing that siRNA-mediated depletion of Rab11B reduces virion binding of a recently circulating strain of H3N2, but not H1N1, suggesting a subtype-specific role. However, we strongly recommend complementing this assay with a single-cell-resolution approach such as immunofluorescence detection of surface-bound virions through HA staining and image quantification. This would allow the authors to directly assess virion binding per cell and visualise the phenotype, strengthening the mechanistic insight on H3N2 binding in Rab11B-depleted cells. Furthermore, the data, particularly for H1N1 (Figure 7.C), show substantial variance, which suggests suboptimal assay sensitivity and limits the strength of the conclusion that the knockdown does not affect H1N1 binding; this limitation may be overcome by implementing the above experimental suggestion.

      We have made substantial efforts to include this data, but were ultimately unable to include this assay due to technical difficulties in implementation (NA stripping caused cells to lift off coverslips, difficulties in antibody sensitivity and specificity, among other issues). We also piloted single cell-based flow cytometry assays to attempt to measure signal from bound virions, but were unable to achieve sufficient differentiation between mock and bound samples with the antibodies we could obtain. However, we have included a new experimental approach that is able to genetically map the 11B-dependent phenotype to the HA gene, thus providing additional mechanistic insight and confirming that Rab11B acts on one of the earliest steps in the viral life cycle (prior to or at fusion).

      Minor comments General The authors should state which statistical test was used for each dataset in the respective figure legends.

      This information is now included in each figure legend.

      Figure 1. Suggest changing Y axis title to PFU/ml [relative to NTC]

      We have changed the axis titles of normalized data to "PFU as % of NT" throughout.

      The co-depletion of Rab11A and Rab11B appears to be less efficient than the individual knockdowns, based on RT-qPCR data (Figure 1.A). It is possible that the partial 'rescue' phenotype observed in Figures 2-4 is due to incomplete knockdown rather than a true biological interaction. This possibility should be acknowledged.

      In order to distinguish between a partial 'rescue' and inefficient knockdown, we depleted Rab11A or Rab11B individually, with the same 'half dose' of siRNA used in the double knockdown (see new Figure S3). We observed that Rab11B was still robustly required for H3N2 viral protein production. These results suggest that Rab11A and Rab11B could be playing mutually opposing roles in this case, which is consistent with prior Rab11 literature, rather than simply inefficient knockdown.

      Furthermore, knockdown efficiency is assessed only at the mRNA level. To strengthen the conclusions, the authors are encouraged to provide western blot data confirming protein-level depletion of Rab11A and Rab11B, particularly in the double knockdown condition. This would help clarify whether co-transfection of siRNAs affects the efficiency of each individual knockdown at the protein level.

      We have verified knockdown efficacy at the protein level in new Fig 1A/B. However, due to the high degree of protein level conservation between Rab11A and Rab11B it is very difficult to develop isoform specific antibodies, and we were unable to obtain a Rab11B-specific antibody that can detect endogenous protein (despite testing 6 commercially available antibodies for specificity). Using an antibody that detects both 11A and 11B (Fig1A) we were able to observe very slight changes in the molecular weight of the Rab11 band(s) detected upon knockdown of 11A vs 11B (suggestive of the two isoforms running as a dimer, with Rab11A the lower band and Rab11B the upper band). Cells depleted of both isoforms simultaneously showed a near complete loss of signal. Using a Rab11A antibody (that we confirmed as specific) we were able to observe loss of the Rab11A signal in both the 11A and 11A+B knockdowns (Fig 1B).

      Figure 6. (A) & (B) are missing error bars, particularly the Rab11B knockdown data points.

      Error bars are plotted in each graph, but due to very limited experimental variation these error bars are too small to appear on the graph (11B points in Fig 6B, D).

      Figure 7. If including any repeats in the binding assay, authors are encouraged to use appropriate controls in each experiment such as exogenous neuraminidase treatment or sialidase treatment.

      When attempting to establish a microscopy based binding assay we included exogenous neuraminidase in each experiment. Unfortunately, the combination of glass coverslips and treatment with exogenous neuraminidase at incubation times sufficient to strip virus also removed cells from the coverslips.

      Reviewer #3 (Significance (Required)):

      General assessment: Provides a conceptual advancement of subtype specific receptor preferences.

      Advance: The study raises interesting observations regarding influenza virus subtype differences in cell surface receptor binding, in a Rab11B-dependent manner.

      Audience: Influenza virologists, respiratory virologists

      Expertise: Virus entry, Virus cell biology

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      Title: Rab11B is required for binding and entry of recent H3N2, but not H1N1, influenza A isolates

      Allyson Turner, Sara Jaffrani, Hannah Kubinski, Deborah Ajayi, Matthew Owens, Madeline McTigue, Conor Fanuele, Cailey Appenzeller, Hannah Despres, Madaline Schmidt, Jessica Crothers, and Emily Bruce

      Summary

      Here, Turner et al. build upon existing knowledge of Influenza A virus (IAV) dependence on the Rab11 family of proteins and provide insights into the specific role of Rab11B isoform in H3N2 virus binding and entry. The introduction is clearly written and provides sufficient background on prior research involving Rab11. It effectively identifies the current gap in knowledge and justifies the investigation of more clinically relevant, circulating strains of IAV. The methods section provides sufficient detail to ensure reproducibility. Similarly, the discussion is well structured, aligns with the introduction, and thoughtfully outlines relevant follow-up experiments. The authors present data from a series of experiments which suggest that the reduced H3N2 infection and viral protein production in Rab11B-depleted cells is due to impaired virus binding. While the evidence supports a Rab11B-specific phenotype in the context of H3N2 infection, we recommend additional experiments (outlined below), to further validate and strengthen these findings. These would help solidify the mechanistic link between Rab11B depletion and the observed phenotype for H3N2 strains of IAV.

      Major comments

      Figure 1. (B) & (C)

      The authors normalise viral titers to the non-targeting control (NTC) siRNA set at 100. While this approach allows for relative comparisons, we recommend including the corresponding raw PFU/ml values, at least in the supplementary materials. This will better illustrate the biological significance of gene depletion and variability of the results. In addition, the current protocol uses a high MOI (1), and a relatively short infection period (16 hours) to capture single-cycle replication. However, to better assess the impact of gene knockdown on virus production and spread, we suggest performing a multicycle replication assay using a lower MOI (e.g., 0.01-0.001) over an extended time period, such as 48 hours before titration, provided that cell viability under these conditions is acceptable.
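      As a toy illustration of the normalization at issue, the snippet below expresses raw plaque-assay titers as a percentage of the NTC condition; all PFU/ml values and condition names are hypothetical placeholders, not data from the manuscript.

```python
# Hypothetical raw plaque-assay titers (PFU/ml); condition names are
# illustrative assumptions, not the manuscript's measured values.
raw_titers_pfu_ml = {
    "NTC": 2.0e6,
    "siRab11A": 1.4e6,
    "siRab11B": 2.5e5,
    "siRab11A+B": 9.0e5,
}

def normalize_to_ntc(titers, control="NTC"):
    """Express each titer as a percentage of the control condition."""
    ref = titers[control]
    return {cond: 100.0 * pfu / ref for cond, pfu in titers.items()}

normalized = normalize_to_ntc(raw_titers_pfu_ml)
# Reporting both the raw and the normalized dictionaries addresses the
# reviewer's point: normalized values alone hide the absolute magnitudes.
```

      Reporting the raw values alongside the normalized ones also lets readers judge whether a "50% reduction" reflects a biologically meaningful drop in absolute titer.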

      Figure 7. (B) & (C)

      The authors present interesting data showing that siRNA-mediated depletion of Rab11B reduces virion binding of a recently circulating strain of H3N2, but not H1N1, suggesting a subtype-specific role. However, we strongly recommend complementing this assay with a single-cell resolution approach such as immunofluorescence detection of surface-bound viruses through HA staining and image quantification. This would allow the authors to directly assess virion binding per cell and visualise the phenotype, strengthening the mechanistic insight on H3N2 binding in Rab11B-depleted cells. Furthermore, the data, particularly for H1N1 (Figure 7.C), shows substantial variance, which suggests a suboptimal assay sensitivity and limits the strength of the conclusion that the knockdown does not affect H1N1 binding; this limitation may be overcome by implementing the above experimental suggestion.
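      The per-cell readout proposed here could be summarized as below: counts of surface-bound virions per cell (e.g., HA-positive puncta) compared between conditions with a rank-based statistic. All counts are invented for the sketch, and the hand-rolled U statistic stands in for a full test.

```python
# Hypothetical virions-per-cell counts from HA immunofluorescence; the
# numbers and condition labels are illustrative assumptions only.
virions_per_cell_ntc = [12, 15, 9, 14, 11, 13, 10, 16]   # control cells
virions_per_cell_si11b = [4, 6, 3, 5, 7, 4, 2, 5]        # Rab11B-depleted cells

def mann_whitney_u(x, y):
    """U statistic for x vs y: pairs with x_i > y_j count 1, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

u_stat = mann_whitney_u(virions_per_cell_ntc, virions_per_cell_si11b)
# With complete separation of the two samples, U equals len(x) * len(y).
# In practice a p-value would be obtained from a library routine such as
# scipy.stats.mannwhitneyu rather than computed by hand.
```

      A per-cell distribution like this makes the assay variance visible directly, which is the core of the reviewer's argument for moving beyond a bulk binding readout.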

      Minor comments

      General

      The authors should state which statistical test was used for each dataset in the respective figure legends.

      Figure 1.

      Suggest changing Y axis title to PFU/ml [relative to NTC].

      The co-depletion of Rab11A and Rab11B appears to be less efficient than individual knockdowns, based on RT-qPCR data (Figure 1.A). It is possible that the partial 'rescue' phenotype observed in Figures 2-4 is due to incomplete knockdown, rather than a true biological interaction. This possibility should be acknowledged. Furthermore, knockdown efficiency is assessed only at the mRNA level. To strengthen the conclusions, the authors are encouraged to provide western blot data confirming protein-level depletion of Rab11A and Rab11B, particularly in the double knockdown condition. This would help clarify whether co-transfection of siRNAs affects the efficiency of each individual knockdown at the protein level.

      Figure 6.

      (A) & (B) are missing error bars, particularly the Rab11B knockdown data points.

      Figure 7.

      If including any repeats in the binding assay, authors are encouraged to use appropriate controls in each experiment such as exogenous neuraminidase treatment or sialidase treatment.

      Significance

      General assessment: Provides a conceptual advancement of subtype specific receptor preferences.

      Advance: The study raises interesting observations regarding influenza virus subtype differences in cell surface receptor binding, in a Rab11B-dependent manner.

      Audience: Influenza virologists, respiratory virologists

      Expertise: Virus entry, Virus cell biology

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      We will provide the revised manuscript as a PDF with highlighted changes, the Word file with tracked changes linked to reviewer comments, and all updated figures.

      To address the reviewers' suggestions, we have conducted additional experiments that are now incorporated into new figures, or we have added new images to several existing figures where appropriate.

      Please note that all figures have been renumbered to improve clarity and facilitate cross-referencing throughout the text. As recommended by Referee #3, all figure legends have been thoroughly revised to reflect these updates and are now labeled following the standard A-Z panel format, enhancing readability and ensuring easier identification. In addition, all figure legends now include the sample size for each statistical analysis.

      For clarity and ease of reference, we provide below a comprehensive list of all figures included in the revised version, noting for each whether it has been modified.

      Figure 1. The first spermatogenesis wave in prepuberal mice.

      This figure now includes amplified images of representative spermatocytes and a summary schematic illustrating the timeline of spermatogenesis. In addition, it now presents the statistical analysis of spermatocyte quantification to support the visual data.

      Figure 2. Cilia emerge across all stages of prophase I in spermatocytes during the first spermatogenesis wave.

      The images of this figure remain unchanged from the original submission, but all the graphs now present the statistical analysis of spermatocyte quantification.

      Figure 3. Ultrastructure and markers of prepuberal meiotic cilia.

      This figure remains unchanged from the original submission; however, we have replaced the ARL3-labelled spermatocyte image (A) with one displaying a clearer and more representative signal.

      Figure 4. Testicular tissue presents spermatocyte cysts in prepuberal mice and adult humans.

      This figure remains unchanged from the original submission.

      Figure 5. Cilia and flagella dynamics are correlated during prepuberal meiosis.

      This figure remains unchanged from the original submission.

      Figure 6. Comparative proteomics identifies potential regulators of ciliogenesis and flagellogenesis.

      This figure remains unchanged from the original submission.

      Figure 7. Deciliation induces persistence of DNA damage in meiosis.

      This figure has been substantially revised and now includes additional experiments analyzing chloral hydrate treatment, aimed at more accurately assessing DNA damage under both control and treated conditions. Images F-I and graph J are new.

      Figure 8. Aurora kinase A is a regulator of cilia disassembly in meiosis.

      This figure has been remodelled because the original version contained a mistake in the previous panel II; the graph in the new Fig. 8I has been corrected accordingly. In addition, it now contains additional data of αTubulin staining in arrested ciliated metaphases I after AURKA inhibition (new panel L1´).

      Figure 9. Schematic representation of the prepuberal versus adult seminiferous epithelium.

      This figure remains unchanged from the original submission.

      Supplementary Figure 1. Meiotic stages during the first meiotic wave.

      This figure remains unchanged from the original submission.

      Supplementary Figure 2 (new).

      This is a new figure that includes additional data requested by the reviewers. It includes additional markers of cilia in spermatocytes (glutamylated Tubulin/GT335), and the control data of cilia markers in non-ciliated spermatocytes. It also now includes the separate quantification of ciliated spermatocytes for each stage, as requested by the reviewers, complementing the graphs included in Figure 2.

      Please note that with the inclusion of this new Supplementary Figure 2, the numbering of subsequent supplementary figures has been updated accordingly.

      Supplementary Figure 3 (previously Suppl. Fig. 2). Ultrastructure of prophase I spermatocytes.

      This figure is equal in content to the original submission, but some annotations have been included.

      Supplementary Figure 4 (previously Suppl. Fig. 3). Meiotic centrosome under the electron microscope.

      This figure remains unchanged from the original submission, but additional annotations have been included.

      Supplementary Figure 5 (previously Suppl. Fig. 4). Human testis contains ciliated spermatocytes.

      This figure has been revised and now includes additional H2AX staining to better determine the stage of ciliated spermatocytes and improve their identification.

      Supplementary Figure 6 (previously Suppl. Fig. 5). GLI1 and GLI3 readouts of Hedgehog signalling are not visibly affected in prepuberal mouse testes.

      This figure has been remodeled and now includes the quantification of GLI1 and GLI3 with the corresponding statistical analysis. It also includes the control data for Tubulin, instead of GAPDH.

      Supplementary Figure 7 (previously Suppl. Fig. 6). CH and MLN8237 optimization protocol.

      This figure has been remodeled to incorporate control experiments using 1-hour organotypic culture treatment.

      Supplementary Figure 8 (previously Suppl. Fig. 7). Tracking first meiosis wave with EdU pulse injection during prepubertal meiosis.

      This figure remains unchanged from the original submission.

      Supplementary Figure 9 (previously Suppl. Fig. 8). PLK1 and AURKA inhibition in cultured spermatocytes.

      This figure has been remodeled and now includes additional data on spindle detection in control and AURKA-inhibited spermatocytes (both ciliated and non-ciliated).


      Response to the reviewers

      We will submit both the PDF version of the revised manuscript and the Word file with tracked changes relative to the original submission. Each modification made in response to reviewers' suggestions is annotated in the Word document within the corresponding section of the text.

      A detailed, point-by-point response to each reviewer's comments is provided in the following section.

      Response to the Referee #1


      In this manuscript by Perez-Moreno et al., titled "The dynamics of ciliogenesis in prepubertal mouse meiosis reveal new clues about testicular maturation during puberty", the authors characterize the development of primary cilia during meiosis in juvenile male mice. The authors catalog a variety of testicular changes that occur as juvenile mice age, such as changes in testis weight and germ cell-type composition. They next show that meiotic prophase cells initially lack cilia, and ciliated meiotic prophase cells are detected after 20 days postpartum, coinciding with the time when post-meiotic spermatids within the developing testes acquire flagella. They describe that germ cells in juvenile mice harbor cilia at all substages of meiotic prophase, in contrast to adults where only zygotene stage meiotic cells harbor cilia. The authors also document that cilia in juvenile mice are longer than those in adults. They characterize cilia composition and structure by immunofluorescence and EM, highlighting that cilia polymerization may initially begin inside the cell, followed by extension beyond the cell membrane. Additionally, they demonstrate ciliated cells can be detected in adult human testes. The authors next perform proteomic analyses of whole testes from juvenile mice at multiple ages, which may not provide direct information about the extremely small numbers of ciliated meiotic cells in the testis, and is lacking follow up experiments, but does serve as a valuable resource for the community. Finally, the authors use a seminiferous tubule culturing system to show that chemical inhibition of Aurora kinase A likely inhibits cilia depolymerization upon meiotic prophase I exit and leads to an accumulation of metaphase-like cells harboring cilia. They also assess meiotic recombination progression using their culturing system, but this is less convincing.

      Author response: We sincerely thank Ref #1 for the thorough and thoughtful evaluation of our manuscript. We are particularly grateful for the reviewer's careful reading and constructive feedback, which have helped us refine several sections of the text and strengthen our discussion. All comments and suggestions have been carefully considered and addressed, as detailed below.


      __Major comments: __

      1. There are a few issues with the experimental set up for assessing the effects of cilia depolymerization on DNA repair (Figure 7-II). First, how were mid pachytene cells identified and differentiated from early pachytene cells (which would have higher levels of gH2AX) in this experiment? I suggest either using H1t staining (to differentiate early/mid vs late pachytene) or the extent of sex chromosome synapsis. This would ensure that the authors are comparing similarly staged cells in control and treated samples. Second, what were the gH2AX levels at the starting point of this experiment? A more convincing set up would be if the authors measure gH2AX immediately after culturing in early and late cells (early would have higher gH2AX, late would have lower gH2AX), and then again after 24hrs in late cells (upon repair disruption the sampled late cells would have high gH2AX). This would allow them to compare the decline in gH2AX (i.e., repair progression) in control vs treated samples. Also, it would be informative to know the starting gH2AX levels in ciliated vs non-ciliated cells as they may vary.

      Response:

      We thank Ref #1 for this valuable comment, which significantly contributed to improving both the design and interpretation of the cilia depolymerization assay.

      Following this suggestion, we repeated the experiment including 1-hour (immediately after culturing) and 24-hour cultures for both control and chloral hydrate (CH)-treated samples (n = 3 biological replicates). To ensure accurate staging, we now employ triple immunolabelling for γH2AX, SYCP3, and H1T, allowing clear distinction of zygotene (H1T−), early pachytene (H1T−), and late pachytene (H1T+) cells. The revised data (Figure 7) now provide a more complete and statistically robust analysis of DNA damage dynamics. These results confirm that CH-induced deciliation leads to persistence of the γH2AX signal at 24 hours, indicating impaired DNA repair progression in pachytene spermatocytes. The new images and graphs are included in the revised Figure 7.

      Regarding the reviewer's final point, we regret that a direct comparison of γH2AX levels between ciliated and non-ciliated cells is not technically feasible. To preserve cilia integrity, all cilia-related imaging is performed using the squash technique, which maintains the three-dimensional structure of the cilia but does not allow reliable quantification of DNA damage markers due to nuclear distortion. Conversely, the nuclear spreading technique, used for DNA damage assessment, provides optimal visualization of repair foci but results in the loss of cilia due to cytoplasmic disruption during the hypotonic step. Given that spermatocytes in juvenile testes form developmentally synchronized cytoplasmic cysts, we consider that analyzing a statistically representative number of spermatocytes offers a valid and biologically meaningful measure of tissue-level effects.

      In conclusion, we believe that the additional experiments and clarifications included in revised Figure 7 strengthen our conclusion that cilia depolymerization compromises DNA repair during meiosis. Further functional confirmation will be pursued in future work, since we are currently generating a conditional genetic model for a ciliopathy in our laboratory.
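      As a toy illustration of the repair-progression comparison described in this response, per-cell γH2AX intensities can be tabulated by stage, condition, and timepoint, and the decline from 1 h to 24 h compared between control and CH-treated cultures. All intensity values below are invented for the sketch.

```python
from statistics import mean

# Hypothetical per-cell γH2AX intensities (arbitrary units), organized as
# stage -> condition -> timepoint; none of these numbers are measured data.
gh2ax = {
    "late_pachytene": {
        "control": {"1h": [80, 75, 90], "24h": [30, 25, 35]},
        "CH":      {"1h": [82, 78, 88], "24h": [70, 65, 75]},
    },
}

def repair_progression(stage, condition):
    """Fractional decline in mean γH2AX signal from 1 h to 24 h of culture."""
    d = gh2ax[stage][condition]
    return 1.0 - mean(d["24h"]) / mean(d["1h"])

# In this toy dataset the control signal declines strongly over 24 h while
# CH-treated cells retain most of it, mirroring the γH2AX persistence
# reported for deciliated spermatocytes in the revised Figure 7.
```

      Stratifying by H1T status before computing the decline is what guards against the staging confound the reviewer raised, since early and late pachytene start from very different γH2AX baselines.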

      The authors analyze meiotic progression in cells cultured with/without AURKA inhibition in Figure 8-III and conclude that the distribution of prophase I cells does not change upon treatment. Are Figure 8-III A and B the same data? The legend text is incorrect, so it's hard to follow. Figure 8-III A shows a depletion of EdU-labelled pachytene cells upon treatment. Moreover, the conclusion that a higher proportion of ciliated zygotene cells upon treatment (Figure 8-II C) suggests that AURKA inhibition delays cilia depolymerization (page 13 line 444) does not make sense to me.

      Response:

      We thank Ref#1 for identifying this issue and for the careful examination of Figure 8. We discovered that the submitted version of Figure 8 contained a mismatch between the figure legend and the figure panels. The legend text was correct; however, the figure inadvertently included a non-corresponding graph (previously panel II-A), which actually belonged to Supplementary Figure 7 in the original submission. We apologize for this mistake.

      This error has been corrected in the revised version. The updated Figure 8 now accurately presents the distribution of EdU-labelled spermatocytes across prophase I substages in control and AURKA-inhibited cultures (previously Figure 8-II B, now Figure 8-A). The corrected data show no significant differences in the proportions of EdU-labelled spermatocytes among prophase I substages after 24 hours of AURKA inhibition, confirming that meiotic progression is not delayed and that no accumulation of zygotene cells occurs under this treatment. Therefore, the observed increase in ciliated zygotene spermatocytes upon AURKA inhibition (new Figure 8 H-I) is best explained by a delay in cilia disassembly, rather than by an arrest or slowdown in meiotic progression. The figure legend and main text have been revised accordingly.

      How do the authors know that there is a monopolar spindle in Figure 8-IV treated samples? Perhaps the authors can use a different Tubulin antibody (that does not detect only acetylated Tubulin) to show that there is a monopolar spindle.

      Response:

      We appreciate Ref#1 for this excellent suggestion. In the original submission (lines 446-447), we described that ciliated metaphase I spermatocytes in AURKA-inhibited samples exhibited monopolar spindle phenotypes. This description was based on previous reports showing that AURKA or PLK1 inhibition produces metaphases with monopolar spindles characterized by aberrant yet characteristic SYCP3 patterns, abnormal chromatin compaction, and circular bivalent alignment around non-migrated centrosomes (1). In our study, we observed SYCP3 staining consistent with these characteristic features of monopolar metaphases I.

      However, we agree with Ref #1 that this could be better sustained with data. Following the reviewer's suggestion, we performed additional immunostaining using α-Tubulin, which labels total microtubules rather than only the acetylated fraction. For clarity purposes, the revised Figure 8 now includes α-Tubulin staining in the same ciliated metaphase I cells shown in the original submission, confirming the presence of defective microtubule polymerization and defective spindle organization. For clarity, we now refer to these ciliated metaphases I as "arrested MI". This new data further support our conclusion that AURKA inhibition disrupts spindle bipolarization and prevents cilia depolymerization, indicating that cilia maintenance and bipolar spindle organization are mechanistically incompatible events during male meiosis. The abstract, results, and discussion section has been expanded accordingly, emphasizing that the persistence of cilia may interfere with microtubule polymerization and centrosome separation under AURKA inhibition. The Discussion has been expanded to emphasize that persistence of cilia may interfere with centrosome separation and microtubule polymerization, contrasting with invertebrate systems -e.g. Drosophila (2) and P. brassicae (3)- in which meiotic cilia persist through metaphase I without impairing bipolar spindle assembly.

      1. Alfaro et al. EMBO Rep 22 (2021). DOI: 10.15252/embr.202051030 (PMID: 33615693)
      2. Riparbelli et al. Dev Cell (2012). DOI: 10.1016/j.devcel.2012.05.024 (PMID: 22898783)
      3. Gottardo et al. Cytoskeleton (Hoboken) (2023). DOI: 10.1002/cm.21755 (PMID: 37036073)

      The authors state in the abstract that they provide evidence suggesting that centrosome migration and cilia depolymerization are mutually exclusive events during meiosis. This is not convincing with the data present in the current manuscript. I suggest amending this statement in the abstract.

      Response:

      We thank Ref#1 for this valuable observation, with which we fully agree. To avoid overstatement, the original statement has been removed from the Abstract, Results, and Discussion, and replaced with a more accurate formulation indicating that cilia maintenance and bipolar spindle formation are mutually exclusive events during mouse meiosis.

      This revised statement is now directly supported by the new data presented in Figure 8, which demonstrate that AURKA inhibition prevents both spindle bipolarization and cilia depolymerization. We are grateful to the reviewer for highlighting this important clarification.


      Minor comments:

      The presence of cilia in all stages of meiotic prophase I in juvenile mice is intriguing. Why is the cellular distribution and length of cilia different in prepubertal mice compared to adults (where shorter cilia are present only in zygotene cells)? What is the relevance of these developmental differences? Do cilia serve prophase I functions in juvenile mice (in leptotene, pachytene etc.) that are perhaps absent in adults?

      Related to the above point, what is the relevance of the absence of cilia during the first meiotic wave? If cilia serve a critical function during prophase I (for instance, facilitating DSB repair), does the lack of cilia during the first wave imply differing cilia (and repair) requirements during the first vs latter spermatogenesis waves?

      In my opinion, these would be interesting points to discuss in the discussion section.

      Response:

      We thank the reviewer for these thoughtful observations, which we agree are indeed intriguing.

      We believe that our findings likely reflect a developmental role for primary cilia during testicular maturation. We hypothesize that primary cilia at this stage might act as signaling organelles, receiving cues from Sertoli cells or neighboring spermatocytes and transmitting them through the cytoplasmic cysts shared by spermatocytes. Such intercellular communication could be essential for coordinating tissue maturation and meiotic entry during puberty. Although speculative, this hypothesis aligns with the established role of primary cilia as sensory and signaling hubs for GPCR and RTK pathways regulating cell differentiation and developmental patterning in multiple tissues (e.g., 1, 2). The Discussion section has been expanded to include these considerations.

      1. Goetz et al. Nat Rev Genet (2010). DOI: 10.1038/nrg2774 (PMID: 20395968)
      2. Nachury et al. Nat Rev Mol Cell Biol (2019). DOI: 10.1038/s41580-019-0116-4 (PMID: 30948801)

      Our study focuses on the first spermatogenic wave, which represents the transition from the juvenile to the reproductive phase. It is therefore plausible that the transient presence of longer cilia during this period reflects a developmental requirement for external signaling that becomes dispensable in the mature testis. Given that this is only the second study to date examining mammalian meiotic cilia, there remains a vast area of research to explore. We plan to address potential signaling cascades involved in these processes in future studies.

      On the other hand, while we cannot confirm that the cilia observed in zygotene spermatocytes persist until pachytene within the same cell, it is reasonable to speculate that they do, serving as longer-lasting signaling structures that facilitate testicular development during the critical pubertal window. In addition, the observation of ciliated spermatocytes at all prophase I substages at 20 dpp, together with our proteomic data, supports the idea that the emergence of meiotic cilia exerts a significant developmental impact on testicular maturation.

      In summary, although we cannot yet define specific prophase I functions for meiotic cilia in juvenile spermatocytes, our data demonstrate that the first meiotic wave differs from later waves in cilia dynamics, suggesting distinct regulatory requirements between puberty and adulthood. These findings underscore the importance of considering developmental context when using the first meiotic wave as a model for studying spermatogenesis.

      The authors state on page 9 lines 286-288 that the presence of cytoplasmic continuity via intercellular bridges (between developmentally synchronous spermatocytes) hints towards a mechanism that links cilia and flagella formation. Please clarify this statement. While the correlation between the timing of appearance of cilia and flagella in cells that are located within the same segment of the seminiferous tubule may be hinting towards some shared regulation, how would cytoplasmic continuity participate in this regulation? Especially since the cytoplasmic continuity is not between the developmentally distinct cells acquiring the cilia and flagella?

      Response:

      We thank Ref#1 for this excellent question and for the opportunity to clarify our statement.

      The presence of intercellular bridges between spermatocytes is well known and has long been proposed to support germ cell communication and synchronization (1,2) as well as sharing mRNA (3) and organelles (4). A classic example is the Akap gene, located on the X chromosome and essential for the formation of the sperm fibrous sheath; cytoplasmic continuity through intercellular bridges allows Akap-derived products to be shared between X- and Y-bearing spermatids, thereby maintaining phenotypic balance despite transcriptional asymmetry (5). In addition, more recent work has further demonstrated that these bridges are critical for synchronizing meiotic progression and for processes such as synapsis, double-strand break repair, and transposon repression (6).

      In this context, and considering our proteomic data (Figure 6), our statement did not intend to imply direct cytoplasmic exchange between ciliated and flagellated cells. Although our current methods do not allow comprehensive tracing of cytoplasmic continuity from the basal to the luminal compartment of the seminiferous epithelium, we plan to address this limitation using high-resolution 3D and ultrastructural imaging approaches in future studies.

      Based on our current data, we propose that cytoplasmic continuity within developmentally synchronized spermatocyte cysts could facilitate the coordinated regulation of ciliogenesis, and similarly enable the sharing of regulatory factors controlling flagellogenesis within spermatid cysts. This coordination may occur through the diffusion of centrosomal or ciliary proteins, mRNAs, or signaling intermediates involved in the regulation of microtubule dynamics. However, we cannot exclude the possibility that such cytoplasmic continuity extends across all spermatocytes derived from the same spermatogonial clone, potentially providing a larger regulatory network. This mechanism could help explain the temporal correlation we observe between the appearance of meiotic cilia and the onset of flagella formation in adjacent spermatids within the same seminiferous segment.

      We have revised the Discussion to explicitly clarify this interpretation and to note that, although hypothetical, it is consistent with established literature on cytoplasmic continuity and germ cell coordination.

      1. Dym et al. Biol Reprod (1971). DOI: 10.1093/biolreprod/4.2.195 (PMID: 4107186)
      2. Braun et al. Nature (1989). DOI: 10.1038/337373a0 (PMID: 2911388)
      3. Greenbaum et al. Proc Natl Acad Sci USA (2006). DOI: 10.1073/pnas.0505123103 (PMID: 16549803)
      4. Ventelä et al. Mol Biol Cell (2003). DOI: 10.1091/mbc.e02-10-0647 (PMID: 12857863)
      5. Turner et al. J Biol Chem (1998). DOI: 10.1074/jbc.273.48.32135 (PMID: 9822690)
      6. Sorkin et al. Nat Commun (2025). DOI: 10.1038/s41467-025-56742-9 (PMID: 39929837)
      7. Note: due to manuscript-length limitations, not all cited references can be included in the text; they are listed here to substantiate our response.

      Individual germ cells in H&E-stained testis sections in Figure 1-II are difficult to see. I suggest adding zoomed-in images where spermatocytes/round spermatids/elongated spermatids are clearly distinguishable.

      Response:

      Ref#1 is very right in this suggestion. We have revised Figure 1 to improve the quality of the H&E-stained testis sections and have added zoomed-in panels where spermatocytes, round spermatids, and elongated spermatids are clearly distinguishable. These additions significantly enhance the clarity and interpretability of the figure.

      In Figure 2-II B, the authors document that most ciliated spermatocytes in juvenile mice are pachytene. Is this because most meiotic cells are pachytene? Please clarify. If the data are available (perhaps could be adapted from Figure 1-III), it would be informative to see a graph representing what proportions of each meiotic prophase substages have cilia.

      Response:

      We thank the reviewer for this valuable observation. Indeed, the predominance of ciliated pachytene spermatocytes reflects the fact that most meiotic cells in juvenile testes are at the pachytene stage (Figure 1). We have clarified this point in the text and have added a new supplementary figure (Supplementary Figure 2, new figure) presenting a graph showing the proportion of spermatocytes at each prophase I substage that possess primary cilia. This visualization provides a clearer quantitative overview of ciliation dynamics across meiotic substages.

      I suggest annotating the EM images in Sup Figure 2 and 3 to make it easier to interpret.

      Response:

      We thank the reviewer for this helpful suggestion. We have now added annotations to the EM images in Supplementary Figures 3 and 4 to facilitate their interpretation. These visual guides help readers more easily identify the relevant ultrastructural features described in the text.

The authors claim that the ratio between GLI3-FL and GLI3-R is stable across their analyzed developmental window in whole testis immunoblots shown in Sup Figure 5. Quantifying the bands and normalizing to the loading control would help strengthen this claim, as it is hard to interpret the immunoblot in its current form.

      Response:

      We thank the reviewer for this valuable suggestion. Following this recommendation, Supplementary Figure 5 has been revised to include quantification of GLI1 and GLI3 protein levels, normalized to the loading control.

      After quantification, we observed statistically significant differences across developmental stages. Specifically, GLI1 expression is slightly higher at 21 dpp compared to 8 dpp. For GLI3, we performed two complementary analyses:

      • Total GLI3 protein (sum of full-length and repressor forms normalized to loading control) shows a progressive decrease during development, with the lowest levels at 60 dpp (Supplementary Figure 5D).
      • GLI3 activation status, assessed as the GLI3-FL/GLI3-R ratio, is highest during the 19-21 dpp window, compared to 8 dpp and 60 dpp. Although these results suggest a possible transient activation of GLI3 during testicular maturation, we caution that this cannot automatically be attributed to increased Hedgehog signaling, as GLI3 processing can also be affected by other processes, such as changes in ciliogenesis. Furthermore, because the analysis was performed on whole-testis protein extracts, these changes cannot be specifically assigned to ciliated spermatocytes.
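For transparency, the two analyses above can be summarized as a minimal sketch. The band intensities below are densitometry values in arbitrary units; the function and variable names are illustrative and not taken from the manuscript:

```python
def gli3_metrics(gli3_fl, gli3_r, loading):
    """Normalize GLI3 immunoblot band intensities (arbitrary units).

    Returns (1) total GLI3 relative to the loading control and
    (2) the GLI3-FL/GLI3-R ratio used as an activation-status proxy.
    """
    total_gli3 = (gli3_fl + gli3_r) / loading
    fl_r_ratio = gli3_fl / gli3_r
    return total_gli3, fl_r_ratio

# Illustrative (invented) densitometry values for one lane:
total, ratio = gli3_metrics(gli3_fl=2.4, gli3_r=0.8, loading=1.6)
```

A decrease in `total` across ages with a stable or rising `ratio` would correspond to the pattern described above (lower overall GLI3 expression without a proportional shift toward the repressor form).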

      We have expanded the Discussion to address these findings and to highlight the potential involvement of the Desert Hedgehog (DHH) pathway, which plays key roles in testicular development, Sertoli-germ cell communication, and spermatogenesis (1, 2, 3). We plan to investigate these pathways further in future studies.

1. Bitgood et al. Curr Biol (1996). DOI: 10.1016/s0960-9822(02)00480-3 (PMID: 8805249)
2. Clark et al. Biol Reprod (2000). DOI: 10.1095/biolreprod63.6.1825 (PMID: 11090455)
3. O'Hara et al. BMC Dev Biol (2011). DOI: 10.1186/1471-213X-11-72 (PMID: 22132805)

*Note: due to manuscript-length limitations, not all cited references can be included in the text; they are listed here to substantiate our response.*

      There are a few typos throughout the manuscript. Some examples: page 5 line 172, Figure 3-I legend text, Sup Figure 5-II callouts, Figure 8-III legend, page 15 line 508, page 17 line 580, page 18 line 611.

      Response:

      We thank the reviewer for detecting this. All typographical errors have been corrected, and figure callouts have been reviewed for consistency.

Response to the Referee #2

This study focuses on the dynamic changes of ciliogenesis during meiosis in prepubertal mice. It was found that primary cilia are not an intrinsic feature of the first wave of meiosis (initiating at 8 dpp); instead, they begin to polymerize at 20 dpp (after the completion of the first wave of meiosis) and are present in all stages of prophase I. Moreover, prepubertal cilia (with an average length of 21.96 μm) are significantly longer than adult cilia (10 μm). The emergence of cilia coincides temporally with flagellogenesis, suggesting a regulatory association in the formation of axonemes between the two. Functional experiments showed that disruption of cilia by chloral hydrate (CH) delays DNA repair, while the AURKA inhibitor (MLN8237) delays cilia disassembly, and centrosome migration and cilia depolymerization are mutually exclusive events. These findings represent the first detailed description of the spatiotemporal regulation and potential roles of cilia during early testicular maturation in mice. The discovery of this phenomenon is interesting; however, there are certain limitations in functional research.

      We thank Ref#2 for taking the time to evaluate our manuscript and for summarizing its main findings. We regret that the reviewer did not find the study sufficiently compelling, but we respectfully clarify that the strength of our work lies precisely in addressing a largely unexplored aspect of mammalian meiosis for which virtually no prior data exist. Given the extremely limited number of studies addressing cilia in mammalian meiosis (only five to date, including our own previous publication on adult mouse spermatogenesis) (1-5), we consider that the present work provides the first robust and integrative evidence on the emergence, morphology, and potential roles of primary cilia during prepubertal testicular development. The study combines histology, high-resolution microscopy, proteomics, and pharmacological perturbations, supported by quantitative analyses, thereby establishing a solid and much-needed reference framework for future functional studies.

We emphasize that this manuscript constitutes the first comprehensive characterization of ciliogenesis during prepubertal mouse meiosis, complemented by functional in vitro assays that begin to address potential roles of these cilia. For this reason, we want to underscore the importance of this study in providing a solid framework that will support and guide future research.

      Major points:

      1. The prepubertal cilia in spermatocytes discovered by the authors lack specific genetic ablation to block their formation, making it impossible to evaluate whether such cilia truly have functions. Because neither in the first wave of spermatogenesis nor in adult spermatogenesis does this type of cilium seem to be essential. In addition, the authors also imply that the formation of such cilia appears to be synchronized with the formation of sperm flagella. This suggests that the production of such cilia may merely be transient protein expression noise rather than a functionally meaningful cellular structure.

      Response:

      We agree that a genetic ablation model would represent the ideal approach to directly test cilia function in spermatogenesis. However, given the complete absence of prior data describing the dynamics of ciliogenesis during testis development, our priority in this study was to establish a rigorous structural and temporal characterization of this process in the main mammalian model organism, the mouse. This systematic and rigorous phenotypic characterization is a necessary first step before any functional genetics could be meaningfully interpreted.

To our knowledge, this study represents the first comprehensive analysis of ciliogenesis during prepubertal mouse meiosis, extending our previous work on adult spermatogenesis (1). Beyond these two contributions, only four additional studies have addressed meiotic cilia: two in zebrafish (2, 3), with Mytlis et al. also providing preliminary observations relevant to prepubertal male meiosis that we discuss in the present work, one in Drosophila (4), and a recent one in butterflies (5). No additional information exists for mammalian gametogenesis to date.

      1. López-Jiménez et al. Cells (2022) DOI: 10.3390/cells12010142 (PMID: 36611937)
      2. Mytlis et al. Science (2022) DOI: 10.1126/science.abh3104 (PMID: 35549308)
      3. Xie et al. J Mol Cell Biol (2022) DOI: 10.1093/jmcb/mjac049 (PMID: 35981808)
4. Riparbelli et al. Dev Cell (2012) DOI: 10.1016/j.devcel.2012.05.024 (PMID: 22898783)
5. Gottardo et al. Cytoskeleton (Hoboken) (2023) DOI: 10.1002/cm.21755 (PMID: 37036073)

We therefore consider this descriptive and analytical foundation to be essential before the development of functional genetic models. Indeed, we are currently generating a conditional genetic model for a ciliopathy in our laboratory. These studies are ongoing and will directly address the type of mechanistic questions raised here, but they extend well beyond the scope and feasible timeframe of the present manuscript.

      We thus maintain that the present work constitutes a necessary and timely contribution, providing a robust reference dataset that will facilitate and guide future functional studies in the field of cilia and meiosis.

Taking this into account, we would be very pleased to address any additional, concrete suggestions from Ref#2 that could further strengthen the current version of the manuscript.

2. The high expression of axoneme assembly regulators such as the TRiC complex and IFT proteins identified by proteomic analysis is not particularly significant. This time point is precisely the critical period for spermatids to assemble flagella, and TRiC, as a newly discovered component of flagellar axonemes, is reasonably highly expressed at this time. No intrinsic connection with the argument of this paper is observed. In fact, this testicular proteomics has little significance.

      Response:

      We appreciate this comment but respectfully disagree with the reviewer's interpretation of our proteomic data. To our knowledge, this is the first proteomic study explicitly focused on identifying ciliary regulators during testicular development at the precise window (19-21 dpp) when both meiotic cilia and spermatid flagella first emerge.

While Piprek et al. (1) analyzed the expression of primary cilia in developing gonads, proteomic data specifically covering the developmental transition at 19-21 dpp were not previously available. Furthermore, a recent cell-sorting study (2) detected expression of cilia proteins in pachytene spermatocytes compared to round spermatids, but did not explore their functional relevance or integrate these data with developmental timing or histological context.

      In contrast, our dataset integrates histological staging, high-resolution microscopy, and quantitative proteomics, revealing a set of candidate regulators (including DCAF7, DYRK1A, TUBB3, TUBB4B, and TRiC) potentially involved in cilia-flagella coordination. We view this as a hypothesis-generating resource that outlines specific proteins and pathways for future mechanistic studies on both ciliogenesis and flagellogenesis in the testis.

      Although we fully agree that proteomics alone cannot establish causal function, we believe that dismissing these data as having little significance overlooks their value as the first molecular map of the testis at the developmental window when axonemal structures arise. Our dataset provides, for the first time, an integrated view of proteins associated with ciliary and flagellar structures at the developmental stage when both axonemal organelles first appear. We thus believe that our proteomic dataset represents an important and novel contribution to the understanding of testicular development and ciliary biology.

      Considering this, we would again welcome any specific suggestions from Ref#2 on additional analyses or clarifications that could make the relevance of this dataset even clearer to readers.

      1. Piprek et al. Int J Dev Biol. (2019) doi: 10.1387/ijdb.190049rp (PMID: 32149371).
      2. Fang et al. Chromosoma. (1981) doi: 10.1007/BF00285768 (PMID: 7227045).

      Response to the Referee #3

      In "The dynamics of ciliogenesis in prepubertal mouse meiosis reveals new clues about testicular development" Pérez-Moreno, et al. explore primary cilia in prepubertal mouse spermatocytes. Using a combination of microscopy, proteomics, and pharmacological perturbations, the authors carefully characterize prepubertal spermatocyte cilia, providing foundational work regarding meiotic cilia in the developing mammalian testis.

      Response: We sincerely thank Ref#3 for their positive assessment of our work and for the thoughtful suggestions that have helped us strengthen the manuscript. We are pleased that the reviewer recognizes both the novelty and the relevance of our study in providing foundational insights into meiotic ciliogenesis during prepubertal testicular development. All specific comments have been carefully considered and addressed as detailed below.


      Major concerns:

1. The authors provide evidence consistent with cilia not being present in a larger percentage of spermatocytes or in other cells in the testis. The combination of electron microscopy and acetylated tubulin antibody staining establishes the presence of cilia; however, proving a negative is challenging. While acetylated tubulin is certainly a common marker of cilia, it is absent in some cilia, such as those in neurons. The authors should use at least one additional cilia marker to better support their claim of cilia being absent.

      Response:

      We thank the reviewer for this helpful suggestion. In the revised version, we have strengthened the evidence for cilia identification by including an additional ciliary marker, glutamylated tubulin (GT335), in combination with acetylated tubulin and ARL13B (which were included in the original submission). These data are now presented in the new Supplementary Figure 2, which also includes an example of a non-ciliated spermatocyte showing absence of both ARL13B and AcTub signals.

      Taken together, these markers provide a more comprehensive validation of cilia detection and confirm the absence of ciliary labelling in non-ciliated spermatocytes.

The conclusion that IFT88 localizes to centrosomes is premature as key controls for the IFT88 antibody staining are lacking. Centrosomes are notoriously "sticky", often showing non-specific antibody staining. The authors must include controls to demonstrate the specificity of the staining they observe, such as staining in a genetic mutant or an antigen competition assay.

      Response:

We appreciate the reviewer's concern and fully agree that antibody specificity is critical when interpreting centrosomal localization. The IFT88 antibody used in our study is commercially available and has been extensively validated in the literature as both a cilia marker (1, 2) and a centrosome marker in somatic cells (3). Labelling of IFT88 in centrosomes has also been previously described using other antibodies (4, 5). In our material, the IFT88 signal consistently appears at one of the duplicated centrosomes and at both spindle poles, patterns identical to those reported in somatic cells. We therefore consider the reported meiotic IFT88 staining to be specific and biologically reliable.

That said, we agree that genetic validation would provide the most definitive confirmation. We note that we are currently generating a conditional genetic model for a ciliopathy in our laboratory that will directly assess both antibody specificity and the functional consequences of cilia loss during meiosis. These experiments are in progress and will be reported in a follow-up study.

1. Wong et al. Science (2015). DOI: 10.1126/science.aaa5111 (PMID: 25931445)
2. Ocbina et al. Nat Genet (2011). DOI: 10.1038/ng.832 (PMID: 21552265)
3. Vitre et al. EMBO Rep (2020). DOI: 10.15252/embr.201949234 (PMID: 32270908)
4. Robert et al. J Cell Sci (2007). DOI: 10.1242/jcs.03366 (PMID: 17264151)
5. Singla et al. Developmental Cell (2010). DOI: 10.1016/j.devcel.2009.12.022 (PMID: 20230748)

*Note: due to manuscript-length limitations, not all cited references can be included in the text; they are listed here to substantiate our response.*

There are many inconsistent statements throughout the paper regarding the timing of the first wave of spermatogenesis. For example, the authors state that round spermatids can be detected at 21 dpp on line 161, but on line 180, say round spermatids can be detected at 19 dpp. Not only does this lead to confusion, but such discrepancies undermine the validity of the rest of the paper. A summary graphic displaying key events and their timing in the first wave of spermatogenesis would be instrumental for reader comprehension and could be used by the authors to ensure consistent claims throughout the paper.

      Response:

We thank the reviewer for identifying this inconsistency and apologize for the confusion. We confirm that early round spermatids first appear at 19 dpp, as shown in the quantitative data (Figure 1J). This can be detected in squashed spermatocyte preparations, where individual spermatocytes and spermatids can be accurately quantified. The original text contained an imprecise reference to the histological image of 21 dpp (previous line 161), since certain H&E sections did not clearly show all cell types simultaneously. However, we have now revised Figure 1, improving the image quality and adding a zoomed-in panel highlighting early round spermatids. The image for 19 dpp mice in Figure 1D shows early, yet still aflagellated, spermatids. The first ciliated spermatocytes and the earliest flagellated spermatids are observed at 20 dpp. This has been clarified in the text.

      In addition, we also thank the reviewer for the suggestion of adding a summary graphic, which we agree greatly facilitates reader comprehension. We have added a new schematic summary (Figure 1K) illustrating the key stages and timing of the first spermatogenic wave.

      In the proteomics experiments, it is unclear why the authors assume that changes in protein expression are predominantly due to changes within the germ cells in the developing testis. The analysis is on whole testes including both the somatic and germ cells, which makes it possible that protein expression changes in somatic cells drive the results. The authors need to justify why and how the conclusions drawn from this analysis warrant such an assumption.

      Response:

      We agree with the reviewer that our proteomic analysis was performed on whole testis samples, which contain both germ and somatic cells. Although isolation of pure spermatocyte populations by FACS would provide higher resolution, obtaining sufficient prepubertal material for such analysis would require an extremely large number of animals. To remain compliant with the 3Rs principle for animal experimentation, we therefore used whole-testis samples from three biological replicates per age.

We acknowledge that our assumption that the main differences arise from germ cells is a simplification. However, germ cells constitute the vast majority of testicular cells during this developmental window and are the population undergoing major compositional changes between 15 dpp and adulthood. It is therefore reasonable to expect that a substantial fraction of the observed proteomic changes reflects alterations in germ cells. We have clarified this point in the revised text and have added a statement noting that changes in somatic cells could also contribute to the proteomic profiles.

      The authors should provide details on how proteins were categorized as being involved in ciliogenesis or flagellogenesis, specifically in the distinction criteria. It is not clear how the categorizations were determined or whether they are valid. Thus, no one can repeat this analysis or perform this analysis on other datasets they might want to compare.

      Response:

We thank the reviewer for this opportunity to clarify our approach. The categorization of proteins as being involved in ciliogenesis or flagellogenesis was based on their Gene Ontology (GO) cellular component annotations obtained from the PANTHER database (Version 19.0), using the gene IDs of the Differentially Expressed Proteins (DEPs). Specifically, we used the GO terms cilium (GO:0005929) and motile cilium (GO:0031514). Since motile cilium is a subcategory of cilium, proteins annotated only with the general cilium term, but not included under motile cilium, were considered to be associated with primary cilia or with shared structural components common to different types of cilia. These GO terms are represented in the bottom panel of Figure 6.

      This information has been added to the Methods section and referenced in the Results for transparency and reproducibility.
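To make the classification rule fully reproducible, it can be expressed as a short sketch. The GO identifiers are the real terms cited above; the protein names and their annotation sets are invented purely for demonstration:

```python
# GO cellular-component terms used for the classification (PANTHER annotations).
CILIUM = "GO:0005929"         # cilium (general term)
MOTILE_CILIUM = "GO:0031514"  # motile cilium (subcategory of cilium)

def classify(go_ids):
    """Classify one protein from its set of GO cellular-component IDs."""
    if MOTILE_CILIUM in go_ids:
        return "motile cilium / flagellum"
    if CILIUM in go_ids:
        # Annotated as cilium but not motile cilium: primary cilia
        # or structural components shared by different cilia types.
        return "primary cilium or shared ciliary component"
    return "non-ciliary"

# Illustrative (invented) annotation sets for three hypothetical DEPs:
deps = {
    "ProteinA": {"GO:0005929", "GO:0031514"},
    "ProteinB": {"GO:0005929"},
    "ProteinC": {"GO:0005737"},  # cytoplasm only
}
categories = {name: classify(gos) for name, gos in deps.items()}
```

Applying the same rule to any other dataset only requires exporting each protein's GO cellular-component IDs and running the classification.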

      In the pharmacological studies, the authors conclude that the phenotypes they observe (DNA damage and reduced pachytene spermatocytes) are due to loss of or persistence of cilia. This overinterprets the experiment. Chloral hydrate and MLN8237 certainly impact ciliation as claimed, but have additional cellular effects. Thus, it is possible that the observed phenotypes were not a direct result of cilia manipulation. Either additional controls must address this or the conclusions need to be more specific and toned down.

      Response:

      We thank the reviewer for this fair observation and have taken steps to strengthen and refine our interpretation. In the revised version, we now include data from 1-hour and 24-hour cultures for both control and chloral hydrate (CH)-treated samples (n = 3 biological replicates). The triple immunolabelling with γH2AX, SYCP3, and H1T allows accurate staging of zygotene (H1T⁻), early pachytene (H1T⁻), and late pachytene (H1T⁺) spermatocytes.

      The revised Figure 7 now provides a more complete and statistically supported analysis of DNA damage dynamics, confirming that CH-induced deciliation leads to persistent γH2AX signal at 24 hours, indicative of delayed or defective DNA repair progression. We have also toned down our interpretation in the Discussion, acknowledging that CH could affect other cellular pathways.

      As mentioned before, the conditional genetic model that we are currently generating will allow us to evaluate the role of cilia in meiotic DNA repair in a more direct and specific way.

      Assuming the conclusions of the pharmacological studies hold true with the proper controls, the authors still conflate their findings with meiotic defects. Meiosis is not directly assayed, which makes this conclusion an overstatement of the data. The conclusions need to be rephrased to accurately reflect the data.

      Response:

      We agree that this aspect required clarification. As noted above, we have refined both the Results and Discussion sections to make clear that our assays specifically targeted meiotic spermatocytes.

We now present data for meiotic stages at zygotene, early pachytene, and late pachytene, demonstrated by labelling for SYCP3 and H1T, both specific markers for meiosis that are not detectable in non-meiotic cells. We believe that this is indeed a way to assay meiotic cells; however, we have now specified in the text that we are analysing potential defects in meiosis progression. We apologize if this was not properly explained in the original manuscript: it has been rephrased in the new version, in both the Results and Discussion sections.

      It is not clear why the authors chose not to use widely accepted assays of Hedgehog signaling. Traditionally, pathway activation is measured by transcriptional output, not GLI protein expression because transcription factor expression does not necessarily reflect transcription levels of target genes.

      Response:

      We agree with the reviewer that measuring mRNA levels of Hedgehog pathway target genes, typically GLI1 and PTCH1, is the most common method for measuring pathway activation, and is widely accepted by researchers in the field. However, the methods we use in this manuscript (GLI1 and GLI3 immunoblots) are also quite common and widely accepted:

      Regarding GLI1 immunoblot, many articles have used this method to monitor Hedgehog signaling, since GLI1 protein levels have repeatedly been shown to also go up upon pathway activation, and down upon pathway inhibition, mirroring the behavior of GLI1 mRNA. Here are a few publications that exemplify this point:

      • Banday et al. 2025 Nat Commun. DOI: 10.1038/s41467-025-56632-0 (PMID: 39894896)
      • Shi et al 2022 JCI Insight DOI: 10.1172/jci.insight.149626 (PMID: 35041619)
      • Deng et al. 2019 eLife, DOI: 10.7554/eLife.50208 (PMID: 31482846)
      • Zhu et al. 2019 Nat Commun, DOI: 10.1038/s41467-019-10739-3 (PMID: 31253779)
• Caparros-Martin et al 2013 Hum Mol Genet, DOI: 10.1093/hmg/dds409 (PMID: 23026747)

*Note: due to manuscript-length limitations, not all cited references can be included in the text; they are listed here to substantiate our response.*

      As for GLI3 immunoblot, Hedgehog pathway activation is well known to inhibit GLI3 proteolytic processing from its full length form (GLI3-FL) to its transcriptional repressor (GLI3-R), and such processing is also commonly used to monitor Hedgehog signal transduction, of which the following are but a few examples:

      • Pedraza et al 2025 eLife, DOI: 10.7554/eLife.100328 (PMID: 40956303)
      • Somatilaka et al 2020 Dev Cell, DOI: 10.1016/j.devcel.2020.06.034 (PMID: 32702291)
      • Infante et al 2018, Nat Commun, DOI: 10.1038/s41467-018-03339-0 (PMID: 29515120)
      • Wang et al 2017 Dev Biol DOI: 10.1016/j.ydbio.2017.08.003 (PMID: 28800946)
      • Singh et al 2015 J Biol Chem DOI: 10.1074/jbc.M115.665810 (PMID: 26451044)
*Note: due to manuscript-length limitations, not all cited references can be included in the text; they are listed here to substantiate our response.*

      In summary, we think that we have used two well established markers to look at Hedgehog signaling (three, if we include the immunofluorescence analysis of SMO, which we could not detect in meiotic cilia).

      These Hh pathway analyses did not provide any convincing evidence that the prepubertal cilia we describe here are actively involved in this pathway, even though Hh signaling is cilia-dependent and is known to be active in the male germline (Sahin et al 2014 Andrology PMID: 24574096; Mäkelä et al 2011 Reproduction PMID: 21893610; Bitgood et al 1996 Curr Biol. PMID: 8805249).

      That said, we fully agree that our current analyses do not allow us to draw definitive conclusions regarding Hedgehog pathway activity in meiotic cilia, and we now state this explicitly in the revised Discussion.

      Also in the Hedgehog pathway experiment, it is confusing that the authors report no detection of SMO yet detect little to no expression of GLIR in their western blot. Undetectable SMO indicates Hedgehog signaling is inactive, which results in high levels of GLIR. The impact of this is that it is not clear what is going on with Hh signaling in this system.

      Response:

It is true that, when Hh signaling is inactive (and hence SMO is absent from cilia), the GLI3FL/GLI3R ratio tends to be low.

Although our data in prepubertal mouse testes show a strong reduction in total GLI3 protein levels (GLI3FL+GLI3R) as these mice grow older, this downregulation of total GLI3 occurs without any major changes in the GLI3FL/GLI3R ratio, which is only modestly affected (suppl. Figure 6).

      Hence, since it is the ratio that correlates with Hh signaling rather than total levels, we do not think that the GLI3R reduction we see is incompatible with our non-detection of SMO in cilia: it seems more likely that overall GLI3 expression is being downregulated in developing testes via a Hh-independent mechanism.

      Also potentially relevant here is the fact that some cell types depend more on GLI2 than on GLI3 for Hh signaling. For instance, in mouse embryos, Hh-mediated neural tube patterning relies more heavily on GLI2 processing into a transcriptional activator than on the inhibition of GLI3 processing into a repressor. In contrast, the opposite is true during Hh-mediated limb bud patterning (Nieuwenhuis and Hui 2005 Clin Genet. PMID: 15691355). We have not looked at GLI2, but it is conceivable that it could play a bigger role than GLI3 in our model.

      Moreover, several forms of GLI-independent non-canonical Hh signaling have been described, and they could potentially play a role in our model, too (Robbins et al 2012 Sci Signal. PMID: 23074268).

      We have revised the discussion to clarify some of these points.

      All in all, we agree that our findings regarding Hh signaling are not conclusive, but we still think they add important pieces to the puzzle that will help guide future studies.

      There are multiple instances where it is not clear whether the authors performed statistical analysis on their data, specifically when comparing the percent composition of a population. The authors need to include appropriate statistical tests to make claims regarding this data. While the authors state some impressive sample sizes, once evaluated in individual categories (eg specific cell type and age) the sample sizes of evaluated cilia are as low as 15, which is likely underpowered. The authors need to state the n for each analysis in the figures or legends.

Response:

We thank the reviewer for highlighting this important issue. We have now included the sample size (n) for every analysis directly in the figure legends. Although this adds length, it improves transparency and reproducibility.

Regarding Ref#3's concern about the different sample sizes, the number of spermatocytes quantified at each stage reflects their distribution in meiosis (for example, pachytene lasts around 10 days, so this stage is widely represented in the preparations, whereas metaphase I is much more difficult to quantify because that stage lasts less than 24 hours). Taking this into account, we ensured that all analyses remain statistically valid and representative, applying the appropriate statistical tests for each dataset. These details are now clearly indicated in the revised figures and legends.

      Minor concerns:

      1. The phrase "lactating male" is used throughout the paper and is not correct. We assume this term to mean male pups that have yet to be weaned from their lactating mother, but "lactating male" suggests a rare disorder requiring medical intervention. Perhaps "pre-weaning males" is what the authors meant.

      Response:

      We thank the reviewer for noticing this terminology error. The expression has been corrected to "pre-weaning males" throughout the manuscript.

2. The convention used to label the figures in this paper is confusing and difficult to read as there are multiple panels with the same letter in the same figure (albeit distinct sections). Labeling panels in the standard A-Z format is preferred. "Panel Z" is easier to identify than "panel III-E".

      Response:

      We thank the reviewer for this suggestion. All figures have been relabelled using the standard A-Z panel format, ensuring consistency and easier readability across the manuscript.

1. Synthesis: Understanding the Software of the Mind

Executive Summary

This briefing note analyzes the central themes of Professor Uichol Kim's presentation, which challenges the dominant Western paradigms concerning the human mind and success.

The main argument is that the Western "software" of the mind, founded on assumptions of individualism, competition ("survival of the fittest"), and biological determinism, is fundamentally flawed.

Professor Kim proposes an alternative vision in which cooperation, relationships, and co-creation are the true drivers of human evolution and well-being.

He argues that human evolution was made possible not by competition but by social and technological innovations, such as the mastery of fire and language, that fostered collaboration.

The human mind is not a closed, predetermined biological system but an open, socially constructed one, shaped by experiences and interpersonal relationships, a notion reinforced by findings in epigenetics and neuroscience.

Finally, large-scale empirical studies, notably Daniel Kahneman's work and the Harvard longitudinal Study of Adult Development, converge on an unequivocal conclusion:

true happiness and a long, healthy life stem not from wealth or individual success but from the quality of warm relationships and from sharing with others.

Life satisfaction (linked to income) and happiness (linked to relational experiences) are two distinct concepts, often conflated to the detriment of human well-being.

      --------------------------------------------------------------------------------

      1. Critique of the Foundational Assumptions of the Western "Software"

      Professor Kim begins by stressing the importance of the "basic assumptions about reality" that, according to Peter Drucker, form a culture's and a science's paradigm.

      These assumptions, often implicit and resistant to change, determine what counts as a fact. Western thought rests on several assumptions that he calls into question.

      The Individual as the Basic Unit (Socrates): The Socratic injunction "Know thyself" made the individual the fundamental unit of analysis, regarded as "indivisible."

      Competition as the Engine of Evolution (Darwin): Charles Darwin's theory of evolution, based on competition, natural selection, and "survival of the fittest," has been widely applied to human society, firms, and individuals, creating a core belief in the necessity of competition.

      Biological and Pathological Determinism (Freud): Sigmund Freud adopted a biological model, defining human behavior in terms of sexual or violent drives.

      His theories were generalized to the whole population from case studies of "hysterical" and abnormal patients, a problematic extrapolation.

      Reductionist Behaviorism (Skinner): B.F. Skinner studied pigeons and rats in order to understand human beings, assuming that basic behaviors are the foundation of complex ones, thereby ignoring human specificity and the role of social context.

      Cognitive Development without Context (Piaget): Jean Piaget's model of cognitive development, though influential, is criticized for almost entirely omitting the role of parents and emotions, since Piaget mainly observed his own children in isolation.

      2. An Alternative Paradigm: Agency and Self-Efficacy

      In opposition to deterministic models, Professor Kim highlights Albert Bandura's work on the "self as a proactive agent."

      Human beings are not simply determined by biology or environment; they possess agency that allows them to shape their own future.

      The Concept of Self-Efficacy: This is the "belief in one's own capability to organize and execute the courses of action required to manage prospective situations."

      People with high self-efficacy act, think, and feel differently, producing their own future rather than merely forecasting it.

      Key Components: Intention, knowledge, goals, beliefs, and skills are essential.

      Social Influence: Self-efficacy is not purely individual. It is modified and strengthened by:

      ◦ Feedback: Constant practice, as athletes and musicians do.
      ◦ Social support: A crucial element in raising a person's self-efficacy.

      3. Reassessing Human Evolution: Cooperation Trumps Competition

      The talk directly challenges the idea that competition is the main driver of human evolution by re-examining our biological and anthropological heritage.

      Two Chimpanzee Models: There is a distinction between common chimpanzees (aggressive, violent, hierarchical) and bonobos or "pygmy chimpanzees" (female-dominated, egalitarian, non-violent).

      The species closest to the human ancestor is the bonobo, suggesting that our roots are more cooperative than aggressive.

      The Role of the Environment: Homo sapiens evolved in the open sub-Saharan savanna, whereas chimpanzees live in the jungle.

      Key Adaptations for Cooperation:

      Bipedalism: Walking on two feet reduced heat stress, but above all it caused the larynx to descend, making it possible to produce up to 20,000 different sounds, an essential basis for language and complex communication.

      Mastery of Fire: The greatest transformation. Humans learned to control fire, which made it possible to cook food. Cooking destroyed bacteria and allowed five times more calories to be extracted than from raw meat.

      Brain Development: This additional caloric intake is the main cause of the size of the human brain (four times larger than a chimpanzee's), particularly the frontal lobe.

      It was by overcoming our instinct (the fear of fire) that we developed a larger brain, not the other way around.

      4. The Human Mind as an Open, Socially Constructed System

      The presentation emphasizes a fundamental difference between humans and other primates: the ability to store and transmit information outside the body.

      The Body as a Closed System, the Mind as an Open System: While the body is bounded by the skin, the mind is an open system.

      The human brain, with its billions of neurons and trillions of potential connections, integrates new ideas and continually reconfigures itself through interaction with others.

      The Explosion of Creativity: Some 30,000 to 40,000 years ago, cave art appeared as the "first information technology," making it possible to project images and combine concepts (e.g., the lion-man).

      External Storage of Information:

      ◦ A chimpanzee such as Kanzi can learn to communicate with symbols but cannot teach that knowledge to its offspring.

      At its death, all its knowledge disappears.

      ◦ Among humans, the invention of writing (cuneiform), paper, and printing enabled exponential storage and transmission of information, allowing future generations to connect spiritually and intellectually with past ideas.

      Neuroscience and Epigenetics:

      Epigenetics: The idea that a specific gene determines a single expression is an oversimplification. Genes can be switched on or off by environmental factors (diet, exercise, stress, experiences). We are born with genes, but their expression depends on experience.

      The Brain as a Social Construction: Citing the neurobiologist Gerald Hüther, Professor Kim asserts that "the human brain is a social construction."

      Neural connections form and strengthen through social experience and repetition (e.g., riding a bicycle, driving).

      The Absence of Pure Objectivity: All sensory information passes through the limbic system, where it is connected to emotions.

      The same stimulus activates both a cognitive and an emotional network.

      5. Cultural Contrasts: Western Individualism vs. Eastern Relationalism

      The "software of the mind" varies considerably across cultures.

      Cartesian Dualism: René Descartes, through his radical doubt, established a strict dualism between the body (subject to natural laws) and the soul/mind (capable of understanding God and truth).

      This led to dichotomous thinking (black/white, good/evil).

      The East Asian Relational View: In East Asia, black and white (Yin and Yang) are not in opposition but in relation.

      The Chinese characters for "human" (人間) mean "between humans."

      ◦ The motto is not "I think, therefore I am" but could be rendered as "I am between, therefore I am."

      Korean Examples:

      Rice culture: Rice farming requires intense cooperation, fostering a culture of harmony.

      The concept of Cheong (情): A form of deep human connection, compassion, and affection. Not feeling compassion for a drowning child means not being human.

      Filial piety: The body does not belong to the individual but was received from one's parents.

      Success is therefore a duty toward them. Children represent the future and parents the past, creating an interdependence in which parents can be happy only if their children are.

      6. The Science of Happiness: Relationships Before Money and Success

      The most recent empirical research in psychology and economics converges to dismantle the myth that money and individual success lead to happiness.

      A. The Work of Daniel Kahneman (Nobel Laureate)

      Kahneman draws a crucial distinction between "life satisfaction" (tied to the "remembering self") and "emotional well-being," or happiness (tied to the "experiencing self").

      | Characteristic     | Life Satisfaction                            | Happiness (Emotional Well-Being)                      |
      |--------------------|----------------------------------------------|-------------------------------------------------------|
      | Predictors         | Income, education, success, goal attainment  | Health, relationships, absence of loneliness, sharing |
      | Relation to income | Rises with income                            | Plateaus around a median income (~$75,000)            |
      | Concept of self    | "Remembering self"                           | "Experiencing self"                                   |
      | Focus              | Global evaluation of one's life, achievements| Experiences lived in the present moment               |

      Kahneman's conclusion: People pursue life satisfaction (tied to social status and money) believing it will bring them happiness. Yet high earners are often more stressed and do not spend more time on enjoyable activities. This is a "focusing illusion," in which the impact of a single factor (money) on overall well-being is overestimated.

      B. The Harvard Study of Adult Development

      This study, conducted over 85 years on two groups (Harvard men and men from disadvantaged Boston neighborhoods), is one of the longest ever run.

      Surprising Finding: The most powerful factor influencing health and longevity is neither money, nor success, nor IQ.

      Main Results:

      ◦ The people most satisfied with their relationships at age 50 were the healthiest at age 80.

      ◦ Warm relationships are a better predictor of a long, happy life than social status, IQ, or genes.

      ◦ Loneliness kills. It is associated with earlier death (by up to 10 years), stress, depression, and poor physical health.

      ◦ The quality of the relationship with one's mother in childhood predicted effectiveness at work and higher earnings.

      ◦ Warm relationships with parents were linked to less anxiety and greater satisfaction in adulthood.

      Conclusion of Robert Waldinger (the study's current director): "The key to healthy aging is: relationships, relationships, relationships."

      The happiest and healthiest people are those who cultivated the "warmest connections with others."

      7. Debate over the "Software" Analogy

      During the Q&A session, the "software of the mind" analogy is challenged.

      The Critique: One participant suggests the analogy is potentially misleading.

      Software is a set of specific instructions executed by a standard computer.

      The brain does not work that way; it is closer to a complex artificial neural network from which behavior emerges.

      Terms such as "culture," "narratives," or "habits" might be more appropriate and less confusing.

      Professor Kim's Response: He acknowledges that it is an analogy used to prompt people to think differently, moving away from deterministic (biological, cognitive-mechanical) views, and he stresses that the "software" is invisible and that everyone runs differently.

      The analogy is meant to introduce the concept of agency and the importance of social support.

      He admits he has no better analogy for now and notes that computers themselves are human creations that imitate some of our functions.

    1. Briefing Document: Using AI Systems for Decision-Making in the Modern State

      Executive Summary

      This document synthesizes expert perspectives on the application of artificial intelligence (AI) systems in two critical societal domains: law in Europe and health in South Africa.

      In the European legal sector, AI is presented as a solution to the growing tension between the rising cost of legal work and the need to maintain a high-quality rule of law in the face of increasing regulatory complexity.

      Key applications include optimizing legal information retrieval, contract review, due diligence, and the analysis of complex cases.

      AI is not seen as a threat to lawyers' jobs but rather as a tool for automating tedious tasks, allowing lawyers to focus on higher-value activities.

      Significant risks remain, however, notably the lack of explainability of AI decisions (a risk of alienation) and the multiplication of errors when an automated system fails.

      In the South African health sector, which faces limited resources and a high prevalence of communicable diseases, AI offers enormous potential for shifting from a costly curative model of care to a preventive one.

      Applications range from diagnosis assisted by medical image analysis to predicting disease onset with machine learning models.

      An optimistic vision of the future rests on deploying low-cost technologies, such as wearables, for continuous monitoring of individuals.

      These data could create "digital twins" of citizens and, eventually, of entire cities, enabling public health surveillance, simulation, and proactive interventions at an unprecedented scale.

      Adapting technologies to the local low-resource context is an essential condition of success.

      Finally, the document underscores the crucial importance of interdisciplinary collaboration for developing AI systems that are not only technically capable but also socially relevant and responsible.

      AI in the Legal Domain: Meeting the Challenges in Europe

      The analysis by Professor Henrik Palmer Olsen of the University of Copenhagen highlights the tensions and opportunities involved in integrating AI into the European legal system.

      The Challenge: The Squeeze between Cost and the Rule of Law

      The main challenge identified is an economic and qualitative "squeeze."

      On one side, legal work is becoming ever more expensive.

      On the other, demand for that work is growing as regulation becomes increasingly complex, driven by technological, economic, and social development.

      European states therefore face the dilemma of containing spending while guaranteeing a high-quality rule of law, a founding principle of their societies.

      The Role of AI: Supporting and Optimizing Legal Work

      AI can play an essential supporting role in resolving this tension in several ways:

      Legal information retrieval: AI can analyze thousands of pages of legal texts (statutes, judicial precedents) far faster and more reliably than a human.

      This dramatically reduces the time spent finding the sources relevant to a decision.

      Contract review: For large companies managing many contracts, AI can automate checking incoming contracts for compliance with internal standards, ensuring that the required clauses are present.

      Due diligence: When a company is being acquired, AI can quickly analyze its contract portfolio to assess the contracts' economic value and identify the obligations they entail.

      Analysis of complex cases: In long, complex matters (e.g., tax fraud, environmental cases) involving thousands of documents spanning several years, AI can help build and visualize timelines and sequences of events, giving humans a better overview.

      These applications make it possible to deliver high-quality legal work at lower cost.

      The Impact on the Legal Profession

      Contrary to common fears, AI should not eliminate lawyers' jobs.

      On the contrary, it is likely to improve their working conditions by taking over the most "tedious," repetitive parts of the job, which do not require high-level legal skill.

      Lawyers will thus be able to devote themselves to the more interesting, fundamental tasks, such as building arguments, defending clients, and safeguarding justice.

      Key Risks and Concerns

      The use of AI in the legal domain is not without risk. Two major concerns are raised:

      1. The risk of alienation through lack of explainability: AI works differently from human intelligence.

      Legal decisions made by some algorithms can be difficult, even impossible, to explain. If citizens, and even professionals, cannot understand how a decision was reached, the result can be alienation from state authorities.

      2. The risk of multiplied errors: A flaw in an automated legal process does not cause a single isolated error but an error multiplied across potentially thousands of cases.

      This can lead to massive violations of citizens' rights if the systems malfunction.

      These risks are not a distant prospect; it is considered crucial to address them now, while AI models are being developed, notably by designing systems that keep humans "in the loop" to supervise and collaborate with the AI.

      AI in the Health Domain: A Preventive Approach for South Africa

      Deshen Moodley, of the University of Cape Town, lays out the unique challenges of the South African health system and the transformative potential of AI.

      The Challenge: A Health System under Severe Strain

      The South African health system is described as "highly strained" for several reasons:

      Limited resources: As a developing country, the funds allocated to health are constrained.

      High burden of communicable disease: The country faces a high prevalence of HIV and tuberculosis, which puts enormous pressure on the system.

      Shortage of skilled staff: There is a critical lack of doctors and nurses.

      A curative model of care: The system is mostly reactive, treating patients once they are already ill, which means costly treatments and constant crisis management.

      The Role of AI: From Detection to Prevention

      Although still under-explored in South Africa, AI has enormous potential to improve detection and, above all, prevention.

      Detection and diagnosis: AI can be used to automatically analyze medical images (X-rays, etc.) or to recommend diagnoses and interventions.

      Preventive health: This is the most promising area.

      Using machine learning models and knowledge-based techniques, AI can predict the onset of a disease before it manifests.

      This enables proactive interventions and a crucial shift toward a preventive model of care, which is particularly relevant for low-resource countries.

      Adapting AI to Low-Resource Settings

      Simply transferring technology from developed countries is not a viable solution; the local context must be taken into account. The preferred approach focuses on:

      Low-cost technologies: Developing open-source solutions with low deployment and maintenance costs and modest computing requirements.

      Interoperability: A concrete project, the "Open Health Mediator," was developed in partnership with an African NGO for a fraction of the cost of equivalent solutions in developed countries.

      Low-cost wearables: As with mobile phones, the price of wearables is expected to fall, enabling large-scale adoption in Africa for continuous health monitoring of individuals.

      A Vision for the Future: Preventive Health and Digital Twins

      The optimistic vision for the next 10 to 20 years centers on the convergence of several technologies for preventive health at scale.

      1. Continuous monitoring via wearables: A simple wristwatch measuring heart rate or ECG could, with AI, detect a person's mood and emotional state and predict negative states that could affect their health.

      2. The individual digital twin: Continuous data collection through these devices creates a "virtual footprint," or digital twin, of the individual, a mirror of the person in the virtual world.

      3. The digital twin of a city: By aggregating individual digital twins, it becomes possible to create a digital twin of an entire city.

      Such a model would make it possible to monitor health and well-being at an unprecedented scale, simulate the spread of disease, learn from the interactions between individuals and their environment, and deploy proactive interventions.

      Such a system would have been a "game-changer" during the COVID-19 pandemic.

      This ambitious vision rests on the convergence of AI, cyber-physical systems (digital twins), and virtual reality.

      The Importance of Interdisciplinary Collaboration

      Both experts stress the value of the interdisciplinary research environment of the Paris Institute for Advanced Study (IEA de Paris).

      Being confronted with specialists from other fields (lawyers, philosophers, technologists) broadened their horizons, generated new approaches to their own research problems, and made them rethink how to communicate complex ideas to a non-technical audience.

      This experience reinforces the idea that the future development of AI systems with major societal impact must adopt an interdisciplinary approach in order to be effective and responsible.

    1. Synthesis: Decomposing Discrimination

      Executive Summary

      This study, presented by Professor Lina Restrepo-Plaza, proposes an innovative methodological approach from experimental economics to decompose discrimination into two distinct components:

      • preference-based (taste-based) discrimination, and
      • belief-based (statistical) discrimination.

      Using a modified version of the "Public Goods Game" in the post-conflict context of Colombia, the experiment aims to isolate the motivations underlying discriminatory behavior.

      Preliminary results reveal clear evidence of preference-based discrimination.

      Notably, participants who are not victims of the conflict tend to discriminate against victims as well as against ex-combatants.

      A major, counter-intuitive result emerges: direct victims of the conflict are more cooperative and less discriminatory toward ex-combatants than non-victims are, suggesting a form of resilience and greater openness.

      The importance of this decomposition lies in its implications for public policy.

      Belief-based discrimination can be corrected with information campaigns, whereas discrimination rooted in preferences requires deeper interventions, such as promoting intergroup contact to reduce prejudice.

      The study thus opens avenues for more targeted, and potentially more effective, anti-discrimination interventions.

      --------------------------------------------------------------------------------

      1. Background and the Problem of Discrimination

      Discrimination is a persistent, quantifiable economic and social phenomenon worldwide. Recent data illustrate significant disparities:

      United States (2022): Women earn 82 cents for every dollar earned by a man.

      United States (2023): Latinos earn 76 cents for every dollar earned by a white American.

      Colombia: 75% of Venezuelans earn less than the minimum wage, compared with 43% of Colombians.

      From the standpoint of economics, discrimination is mainly conceptualized along two axes:

      1. Preference-based ("taste-based") discrimination: An individual treats another person differently because of an intrinsic aversion to, or prejudice against, that person or the group they belong to.

      It is behavior driven by an antipathy that is not necessarily rationalized.

      2. Belief-based (statistical) discrimination: An individual acts differently on the basis of beliefs or stereotypes about the average characteristics of a group (for example, productivity or reliability).

      The behavior is driven not by personal aversion but by a statistical inference, even if that inference is wrong.

      The main methodological difficulty is distinguishing and measuring the respective influence of these two mechanisms, because traditional approaches (such as providing extra information to neutralize beliefs) are often "noisy" and sensitive to contextual factors (voice, appearance, etc.).

      2. An Experimental Economics Approach

      To overcome these limitations, the research uses an experimental economics protocol based on the "Public Goods Game," a canonical model for studying cooperation and trust.

      2.1 The Public Goods Game

      The game is played between two anonymous participants. The mechanics are as follows:

      • Each player receives an initial endowment (for example, $15).

      • Each player can decide to contribute all or part of that sum to a "common account."

      • The research team tops up the common account by adding $2 for every $5 deposited.

      • The total in the common account (contributions + bonus) is then split equally between the two players, regardless of their individual contributions.

      This setup creates a social dilemma:

      Full cooperation: If both players contribute their entire endowment, the collective payoff is maximized, and each player's final payoff exceeds the starting endowment ($21 each in the example).

      Incentive to defect: A player has an individual interest in contributing nothing while benefiting from the other's contribution, keeping the initial endowment and still receiving half the common pot (ending with $25.50 while the cooperator gets only $10.50).

      Cooperation failure: If neither player contributes, no one benefits from the bonus.

      The decision to contribute is therefore strongly influenced by a player's beliefs about their partner's behavior.
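      The payoff arithmetic above can be checked with a short sketch (a minimal illustration using the amounts from the talk's example; it assumes the $2-per-$5 bonus applies proportionally to any deposit):

```python
def payoffs(c1, c2, endowment=15.0, bonus_per_5=2.0):
    """Two-player public goods game payoffs.

    Each player deposits c_i into a common account; the experimenters
    add $2 for every $5 deposited, then the pot is split equally.
    Returns (player 1 payoff, player 2 payoff).
    """
    pot = c1 + c2
    pot += (pot / 5.0) * bonus_per_5          # proportional top-up
    share = pot / 2.0
    return (endowment - c1 + share, endowment - c2 + share)

# Full cooperation: both deposit everything.
print(payoffs(15, 15))   # (21.0, 21.0)

# Defection: player 1 free-rides on player 2's full contribution.
print(payoffs(0, 15))    # (25.5, 10.5)

# No cooperation: everyone simply keeps the endowment.
print(payoffs(0, 0))     # (15.0, 15.0)
```

      The three calls reproduce the $21/$21, $25.50/$10.50, and $15/$15 outcomes described above, making the free-rider incentive explicit.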

      2.2 Study Population and Context

      The experiment was run in Colombia with 193 participants from SENA, a large public vocational-training institution serving vulnerable populations.

      After the peace process, SENA enrolled victims of the conflict, non-victims (from similarly vulnerable economic backgrounds), and ex-combatants.

      Participants knew that their anonymous partner could belong to any of these three groups:

      • Victim of the conflict

      • Non-victim

      • Ex-combatant

      The presence of ex-combatants in the participant pool, although their number was small (7), made this possibility salient and credible for everyone.

      3. The Decomposition Design

      The study uses two successive tasks to isolate the components of discrimination.

      | Task | Description | Discrimination Mechanism Captured |
      |------|-------------|-----------------------------------|
      | 1. Unconditional cooperation | Participants decide how much to contribute for each possible partner type (victim, non-victim, ex-combatant), without knowing how much the other will contribute. | Preferences + beliefs. The decision is shaped both by potential aversion to a group and by beliefs about that group's likelihood of cooperating. |
      | 2. Conditional cooperation | Participants state how much they would contribute for each possible contribution by the other (e.g., "if the other contributes 0, I contribute X; if they contribute 5, I contribute Y..."). | Preferences only. Uncertainty about the other's behavior is eliminated. |

      If a participant contributes differently to a victim and to a non-victim who have both contributed the same amount, that difference can only be attributed to a preference.

      The study avoids asking participants directly about their beliefs, in order to sidestep social-desirability bias and cognitive dissonance, which push people to rationalize their answers.
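      The logic of the two tasks can be sketched as a simple accounting identity: the unconditional gap between two partner types mixes preferences and beliefs, the conditional gap (read off at the same partner contribution) reflects preferences only, and the difference is the belief component. The numbers and group labels below are hypothetical illustrations, not the study's data or its estimation procedure:

```python
# Task 1: unconditional contributions toward each partner type
# (reflects preferences AND beliefs about that type's cooperation).
unconditional = {"non_victim": 10.0, "ex_combatant": 6.0}

# Task 2: conditional contributions, read off at the SAME partner
# contribution level, so beliefs play no role (preferences only).
conditional_at_10 = {"non_victim": 9.0, "ex_combatant": 7.0}

def decompose(group_a, group_b):
    """Split the unconditional contribution gap between two partner
    types into a preference component (from the conditional schedule)
    and a residual belief component."""
    total_gap = unconditional[group_a] - unconditional[group_b]
    preference_gap = conditional_at_10[group_a] - conditional_at_10[group_b]
    belief_gap = total_gap - preference_gap
    return total_gap, preference_gap, belief_gap

total, pref, belief = decompose("non_victim", "ex_combatant")
print(total, pref, belief)   # 4.0 2.0 2.0
```

      With these made-up numbers, a $4 gap in treatment of ex-combatants splits into $2 attributable to preferences and $2 attributable to beliefs, which is the kind of attribution the design makes possible.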

      4. Preliminary Results and Analysis

      Although the analysis of the "beliefs" component is still in progress, the data already yield clear conclusions about preference-based discrimination.

      4.1 Evidence of Discrimination

      Discrimination against ex-combatants: Both victims and non-victims discriminate against ex-combatants.

      However, non-victims discriminate much more strongly than victims do.

      Relations between victims and non-victims:

      ◦ Non-victims discriminate against victims.

      ◦ Surprisingly, victims show positive discrimination toward non-victims, behaving better toward them than toward members of their own group.

      4.2 The Counter-intuitive Result: Victims' Resilience

      The most striking result is that people directly exposed to the conflict (the victims) are more cooperative and less inclined to discriminate against ex-combatants than the population not directly affected.

      This finding suggests that exposure to hardship can foster resilient behavior and greater openness to cooperation.

      The result is described as "very surprising" and "hopeful."

      4.3 Data on the Ex-Combatants

      With only seven ex-combatants in the sample, the data on their own behavior are anecdotal.

      The initial observation, however, is that they do not discriminate against any group and behave the same way toward others as among themselves.

      5. Implications and Outlook

      5.1 Policy Implications

      The ability to decompose discrimination is crucial for designing effective interventions:

      • If discrimination is mainly belief-based, information campaigns may suffice to correct mistaken perceptions and update individuals' beliefs about other groups.

      • If it is mainly preference-based, deeper interventions are needed.

      Strategies based on intergroup contact, such as those practiced at SENA where the different groups study together, have proven effective at reducing prejudice and stereotypes.

      5.2 Directions for Future Research

      The discussion raised several avenues for future work:

      Adapting the method to other tasks: Applying it to other economic games (the trust game, the ultimatum game) to test the robustness of the results.

      Integrating qualitative data: Complementing the quantitative approach by asking participants about their representations, even biased ones, to understand which arguments they consider "acceptable".

      Studying repeated games: Analyzing how discrimination evolves over several rounds of interaction.

      Is a repeated positive experience with a member of another group enough to change a prejudice, and if so, how quickly?

      This would make it possible to measure the "resilience of prejudice".

    1. Briefing Document: Rethinking Collaboration with the Enemy

      Executive Summary

      This document synthesizes the reflections of Adam Kahane, director of Reos Partners, on the nature and mechanics of collaboration in contexts of deep disagreement.

      The analysis stems from his work rewriting his 2017 book, Collaborating with the Enemy.

      Kahane's central idea is that collaboration is defined by a fundamental tension: the need to work with people we disagree with in order to solve complex problems, and the fear that in doing so we will betray our own core values.

      To explore this dynamic, he proposes a "concentric circles" model that ranks relationships from the closest collaboration to the elimination of the enemy.

      The main objective is to find ways to widen the circle of collaboration.

      Whereas the first edition of his book focused on individual approaches, his current research aims to identify and understand the collective approaches that foster broader and more effective collaboration.

      These include constitutional and legal frameworks, political and regulatory systems, cultural norms, and reconciliation processes.

      The discussion following his talk highlighted key concepts such as the importance of finding shared goals, however small; the role of scenario planning not to predict the future but to shape it; and the realization that collaboration can also serve to create conflict by uniting one group against another.

      1. Context and Central Problem

      Adam Kahane is a practitioner who has specialized in designing and facilitating multi-stakeholder dialogues on complex issues since 1991.

      His work has taken him into a variety of contexts, including:

      • The peace process in Colombia, involving all parties, including the armed factions.

      • Sustainable food supply chains, bringing together communities, companies, and regulators.

      • Relations between the United States and China, with security and defense actors.

      • Work with Aboriginal and Torres Strait Islander peoples in Australia.

      His current thinking is part of the rewriting of his book _Collaborating with the Enemy: How to Work with People You Don't Agree with or Like or Trust_.

      The fundamental question guiding his work can be summed up in a grander formulation: "How on earth can we live together?"

      The Four Approaches to a Problematic Situation

      According to Kahane, when faced with a situation we consider problematic, four main strategies are available:

      1. Forcing (Make): Trying to impose our will, regardless of what others want.

      2. Adapting (Adapt): Accepting the situation as it is, because we cannot change it.

      3. Exiting (Exit): Leaving the situation (emigrating, resigning, divorcing).

      4. Collaborating (Collaborate): Working with other actors to change the situation.

      His work focuses on this fourth option.

      The Double Meaning of "Collaboration"

      Kahane highlights a crucial semantic duality in the word "collaboration", which lies at the heart of the challenges he explores.

      Positive sense: Working together with others. Google searches for "collaboration" return images of harmonious cooperation.

      Negative sense: Treacherously collaborating with the enemy. He illustrates this point with a 1944 photograph of two French collaborators being punished by having their heads shaved.

      This double meaning reveals the tension inherent in any collaborative undertaking:

      "On the one hand, we think we might need to work with these other people to get where we are trying to go, and on the other, we fear that doing so would require us to betray what we hold dear."

      2. A Model of Relationships: The Concentric Circles

      To better understand the boundaries of collaboration, Kahane proposes a model of concentric circles illustrating different levels of willingness to interact with others:

      1. Collaboration: The inner circle, made up of the people we are willing to work with actively.

      2. Cohabitation: People we do not want to collaborate with, but with whom we are willing to share a space (a home, a city, a country).

      3. Coexistence: People we are not willing to cohabit with, but whose existence we tolerate provided they remain separate.

      This is the principle of apartheid ("apartness").

      4. Elimination: The outer circle, made up of our enemies, the people we are not even willing to let coexist and whom we must expel or eliminate.

      The aim of his research is to understand how to "move the boundary between the people we are willing to collaborate with and those we regard as our enemies".

      3. Driving Forces and Restraining Forces

      The decision to collaborate or not is shaped by opposing forces.

      Forces pushing toward collaboration:

      • Need for collective action: Challenges that demand a joint response (e.g. wastewater management in the divided city of Nicosia, climate change).

      • Fear of violent conflict: The worry that a failure to collaborate will lead to war.

      • A sense of interconnection ("All My Relations"): A conviction, notably drawn from First Nations traditions, that we are all related, whether we get along or not.

      Forces holding collaboration back:

      • Real differences: Concrete, not imagined, disagreements, distrust, and conflicts of interest.

      • Fragmentation and polarization: The pull toward tribalism, partisanship, information bubbles, demagoguery, and demonization.

      • Exclusive identification with one's own group ("my people"): An outlook that blocks openness to collaborating with "others".

      Demonization is a particularly powerful brake: "these others are not simply our opponents or our enemies, they are demons, devils. And how could we collaborate with the devil? We cannot."

      4. The Current Inquiry: From Individual to Collective Approaches

      The central question driving the rewrite of Kahane's book is a practical one: "Which approaches enable more, and better, collaboration?"

      The aim is to identify methods for widening the circle of actors we are willing and able to work with.

      The Shift from the Individual to the Collective

      The first edition of his book focused on individual approaches, meant to help individuals collaborate better. These approaches were:

      • Embracing conflict as much as connection.

      • Experimenting one's way forward.

      • Recognizing one's own role in the system.

      For the second edition, Kahane wants to complement this perspective by exploring collective approaches.

      He sees the relationship between individual and collective work as a "Möbius strip", where neither exists without the other.

      Examples of Collective Approaches to Explore

      Kahane has drawn up a preliminary list of collective approaches, ancient and cutting-edge alike, that make it possible to collaborate across differences:

      Constitutions and agreements: Established frameworks for managing differences without resorting to violence.

      Political organization: Ways of organizing to collaborate with some against others, or against a shared problem.

      Regulatory systems: Mechanisms for managing differences.

      The organization of cities: How urban planning can make it easier to live and work together in great diversity.

      Policies and "nudges": Interventions (such as Antanas Mockus's in Bogotá) designed to shift relationships between people from violence to peace.

      Culture, values, and norms: Their influence on the capacity to collaborate.

      Reconciliation and healing: The role of addressing collective trauma and restoring peace.

      5. Insights from the Discussion

      Several participants enriched Kahane's thinking with relevant concepts and examples:

      Finding a shared goal, however small: Even with the worst enemy, it is often possible to find some common ground.

      Starting from that small goal can create a positive experience of collaboration that changes the dynamic of the relationship.

      The purpose of collaboration: consensus or agonism? Does collaboration aim at reaching consensus, or at managing a permanent tension ("agonism")? Kahane takes a pragmatic stance: the goal is to solve the problem at hand.

      The best-case scenario is being able to live with permanent differences and plurality. He quotes Colombian president Santos:

      "it is possible to work with people we do not agree with and never will agree with".

      Scenario planning as a tool for co-creation: The scenario method, learned at Shell, can be repurposed from its original aim (anticipating and adapting to the future). Used in conflict settings (Colombia, Myanmar), it becomes a way for actors, even ones at war, to "co-create narratives about what could happen in order to influence what does happen".

      Law beyond constitutions: Procedural rules, such as supermajority requirements or the obligation to give reasons for decisions, can compel actors to talk, to compromise, and therefore to collaborate.

      Collaboration as a driver of conflict: A crucial caveat was raised: "people mostly collaborate starting from a peaceful environment in order to create more conflict".

      Collaboration always happens with some people and often against others, which can exacerbate conflict or oppression.

      The transitional justice framework: Transitional justice frameworks (truth commissions, reparations) offer a systematic, comprehensive approach to problems of coexistence and collaboration in post-conflict settings, and are increasingly being applied to other social issues.

    1. Briefing Document: Can the Enlightenment Be Reinvented?

      Synthesis

      This briefing document synthesizes the key arguments and themes of the closing session of the series "Can the Enlightenment Be Reinvented?", organized by the Paris Institute for Advanced Study.

      The talks by Francis Wolf and Céline Spector, two eminent philosophers, converged on a robust yet nuanced defense of universalism, while critically examining contemporary objections, particularly those arising from identitarian and postcolonial currents.

      The central argument, advanced by Francis Wolf, is that humanity forms a single moral community, founded on reciprocal rights and duties.

      He methodically dismantles the criticisms claiming that universal values are merely a mask for Western domination.

      By distinguishing the origin of an idea from its scope, and drawing on concrete examples of struggles for democracy and freedom around the world (the Arab Springs, Iran), he argues that universalism is an essential tool of emancipation. He insists on the fundamental distinction between the universal, which guarantees diversity, and the uniform, which negates it.

      Céline Spector extends this analysis by focusing on postcolonial critiques of human rights.

      She systematizes their main arguments (ethnocentrism, ideological fiction, instrument of colonization) while underscoring the paradoxes inherent in the concept of human rights from its very origin.

      Her argument, in agreement with Wolf's, aims to reaffirm the relevance of this Enlightenment legacy in the face of these objections.

      The discussion then explored several related concepts, including the notion of the "pluriversal" (judged contradictory or clumsy), the existence of non-Western precedents for human rights (the Manden Charter of 1236), and the persistent tension between the universal ideal and its often deficient application ("double standards").

      Finally, the debate opened onto contemporary challenges, such as the rights of nature in the face of the environmental crisis, and the role of the Enlightenment legacy in building a Europe capable of resisting imperial dynamics.

      --------------------------------------------------------------------------------

      Context of the Event

      The discussion took place during the closing session of the IEA de Paris lecture series, chaired by Betina Laville, on the theme "Can the Enlightenment Be Reinvented?".

      The aim was to conclude a year of reflection on the place of the universal in a world described as "fractured" and increasingly critical of Europe's intellectual legacy.

      The two main speakers were:

      Francis Wolf: Philosopher, professor emeritus at the École Normale Supérieure, specialist in ancient philosophy and author of significant work on humanism and universalism, notably Plaidoyer pour l'universel.

      Céline Spector: Philosopher, professor at Sorbonne Université, specialist in the Enlightenment (particularly Montesquieu and Rousseau) and in European questions, author of No Demos. Souveraineté et démocratie à l'épreuve de l'Europe.

      Francis Wolf's Plea for Universalism

      Francis Wolf structured his talk as a defense of universal values, which he defines through a founding thesis: "humanity forms a moral community of reciprocal rights and duties".

      He concentrates mainly on refuting the criticisms that judge this universalism excessive, in favor of restricted ("infra-human") moral communities.

      The Critiques of Universalism

      Wolf identifies two main contemporary currents of criticism of universalism:

      1. "Right-wing" ideologies: Nationalist, racist, and xenophobic, they deny the existence of Man in general and admit only communities of "like kind" ("us" versus "them").

      This vision, according to Wolf, is resurgent, as seen in the trampling of international law (since the invasion of Ukraine), the questioning of refugee law (the Geneva accords), and the rise of discriminatory policies and ethnic cleansing.

      2. "Left-wing" identitarian ideologies: Symmetrical to the first, they recycle arguments inherited from a "simplified Marxism" according to which any claim to universality is a decoy masking domination.

      Refutation of the Anti-Universalist Arguments

      Wolf examines and systematically refutes several recurring arguments against universal values.

      Critical argument 1: No struggle can be waged in the name of the universal, since it always defends particular victims.

      Wolf's refutation: If struggles on behalf of minorities forget that they aim at equality for all, they betray their own cause. The colonized did not fight to become colonizers, but to abolish colonialism.

      Critical argument 2: The universal presents itself as neutral but never is; it denies relations of domination.

      Wolf's refutation: Although the universal is sometimes used to deny injustices, there is no need to define oneself solely "as" (a woman, a colonized person, etc.). Identities are hybrid and fluid, not reified essences.

      Critical argument 3: The experience of particular suffering is incommunicable, and there is no neutral standpoint from which to judge.

      Wolf's refutation: An injustice concerns not only the victim or the perpetrator, but the moral community as a whole. Without a "third place" from which to judge, there is no justice left, only acts of vengeance. All suffering has a communicable dimension.

      Critical argument 4: The universal is merely a mask for dominant interests.

      Wolf's refutation: This argument, though often borne out by history (colonization, the Iraq war), cannot be generalized. The worst enterprises of domination (genocides) need no such pretext and are carried out in the name of essentialized identities ("subhumans", "vermin").

      Critical argument 5: Every universal is in fact particular; the universal is just another name for the West.

      Wolf's refutation: Conceding that a universal is born in a particular context does not limit its scope. Algebra, born in Persia, is not an "Iranian" science. Democracy and human rights are claimed by peoples in struggle all over the world (the Arab Springs, Hong Kong, Iran), while their despots dismiss them as "Western values". Claiming that the West alone invented human rights is a "Westernist illusion" (Amartya Sen).

      The Emancipatory Power of the Universal

      To conclude, Wolf affirms that universalism retains its emancipatory force.

      He poses the question: who is the real ethnocentrist?

      The one who believes critical consciousness exists in every culture, or the one who essentializes other cultures by denying them that critical capacity?

      Finally, he distinguishes the universal from the uniform. Far from erasing particularities, universal values (secularism, freedom of opinion, tolerance) are the condition of their coexistence.

      They constitute a formal "second-order universal" that guarantees diversity.

      Céline Spector on the Postcolonial Critique of Human Rights

      Céline Spector declares herself in "deep agreement" with Francis Wolf and focuses her talk on the specific critique of human rights advanced by postcolonial and decolonial studies.

      The Original Paradoxes of Human Rights

      From their proclamation in the United States (1776) and in France (1789), human rights display fundamental paradoxes:

      • They are at once self-evident and historically produced (born of revolutions).

      • They are at once natural and historical.

      • They are at once innate and civic.

      • They are at once universal and situated.

      These paradoxes fed the criticisms (Marxist, feminist) that saw in them a hypocrisy, notably because women, slaves, and other minorities were excluded.

      The Five Pillars of the Postcolonial Critique

      Spector condenses the postcolonial critique of human rights into five main arguments:

      1. They are not universal but Western, protecting only the citizens of Europe.

      2. They are ideological fictions that served to justify colonization's "civilizing mission".

      3. They are bound to a conception of reason that excludes "savage" or "barbarian" peoples, deemed incapable of attaining it.

      4. The list of rights is arbitrary and abusive, notably the inclusion of the right to property, which served to expropriate nomadic peoples.

      5. They are the rights of the colonizers and their accomplices, who had no political will to end the plunder of the colonies or slavery.

      While acknowledging the need to take these criticisms seriously in order to reveal the "tensions inherent in the Enlightenment", Céline Spector's approach is to raise objections to this view, thereby joining Francis Wolf's defense of universalism.

      Key Themes of the Discussion

      The exchange with the audience allowed several themes to be explored in depth.

      The Concept of the "Pluriversal"

      Asked about this notion from decolonial theory, both speakers expressed skepticism:

      Francis Wolf sees it either as a contradiction in terms, or as a mere restatement of the fact that the universal is always perceived from a particular cultural standpoint without being imprisoned by it.

      Céline Spector, citing the definition in the Dictionnaire décolonial, describes it as a "radical critique of universalism".

      She regards the concept as a "clumsy attempt" by authors (Ramon Grosfoguel, Walter Mignolo, among others) caught in an existential impasse: wanting to fight for rights without using the tool of universal rights.

      Historical Precedents and the Application of Law

      The Manden Charter (1236): This charter, from the Mali Empire, is cited as a possible African precedent for the recognition of universal values, such as equality among ethnic groups and religions and the participation of women in government.

      "Double Standards": A participant raises the problem of the double standard in the application of international law.

      Céline Spector acknowledges the legitimacy of this criticism but warns against an indignation that devalues international institutions (the UN, the ICC), leaving them fragile and prompting hegemonic powers simply to walk away from them.

      Universality, the Environment, and Europe

      Rights of Nature: The question of a "right to the environment" is raised as a major challenge for reinventing the Enlightenment.

      The discussion turns on the tension between human rights and the "rights of nature", a concept increasingly debated in law (e.g. the Whanganui River in New Zealand, the Mar Menor lagoon in Spain).

      This debate questions the centrality of the human being in the definition of the environment.

      The Enlightenment Legacy for Europe: Céline Spector proposes seeing in Montesquieu's legacy, and specifically his model of the "federative republic", a powerful tool for thinking about how democracies can resist the resurgence of empires.

      Francis Wolf concurs, stressing that European construction illustrates the primacy of the demos (political community) over the ethnos (pre-existing community), a principle also at the heart of the Ukrainian resistance.

      The "Dark Enlightenment": This term, associated with Curtis Yarvin, is described as a "completely perverted use" of the Enlightenment, denoting an oligarchic technocracy in which a digital elite dominates citizens stripped of their rights.

      It is the very antithesis of the Enlightenment ideal.

    1. Creativity: Cross-Perspectives from Neuroscience, Art, Music, and Artificial Intelligence

      Summary

      This synthesis document analyzes the key themes and arguments of a round table on creativity, bringing together experts in neuroscience, musical composition, visual arts, and artificial intelligence.

      The discussion is organized around a conceptual framework that defines human creativity along four dimensions: novelty, appropriateness, authenticity, and agency.

      The speakers explore how these dimensions manifest in their respective fields.

      In artificial intelligence, creativity emerges through curiosity mechanisms and evolutionary algorithms, allowing robots to autonomously discover novel, effective solutions to complex problems, as the examples of the game of Go and motor learning demonstrate.

      In art and music, creativity oscillates between generation within strict constraints (Mozart's compositional algorithm) and the deliberate transgression of conventions to create the unprecedented (hybridization in Beethoven).

      The neuroscientific foundations reveal the central role of the prefrontal cortex, which acts as a monitor able to inhibit ineffective strategies so that new solutions drawn from memory can emerge.

      Finally, examples from the animal world, notably the octopus with its capacity for camouflage and cunning ("metis"), suggest that creativity is a broader phenomenon than purely human activity.

      The discussion concludes on the current limits of AI, which excels at producing coherent surfaces but still struggles to generate works with the structural depth and authenticity characteristic of human creation.

      --------------------------------------------------------------------------------

      1. A Theoretical Framework for Creativity

      Étienne Koechlin, a neuroscientist, proposes a standard model that breaks the concept of creativity down into four fundamental dimensions.

      This framework serves as a reference throughout the discussion for analyzing the different manifestations of creativity.

      Cognitive dimensions:

      • Novelty: The capacity to produce something that did not exist before. This possibility is inherent even in the most closed formal systems, as Gödel's theorem demonstrates. Key concepts: generation, innovation, the possibility of the unprecedented.

      • Appropriateness: The new production must be relevant to an external context, whether as the solution to a problem or as an artwork that resonates with an audience. Key concepts: evaluation, relevance, context, originality (the articulation of novelty and appropriateness).

      Conative dimensions:

      • Authenticity: The creative act is the expression of an individual, often born of an internal disequilibrium (dissatisfaction, an ecstatic state). The creator seeks to respond to that disequilibrium. Key concepts: individual expression, internal disequilibrium, creative energy.

      • Agency: Creativity is an action aimed at transforming or influencing the world. There is a will to be effective, to have an impact. Key concepts: action, will, transformation of the world, effectiveness.

      Koechlin stresses that these dimensions can be present to varying degrees depending on the activity (human, animal, or artificial).

      For example, an AI such as AlphaGo displays novelty and appropriateness (creative moves in order to win) and a form of agency (interacting with a human player), but its authenticity is considered very limited.

      2. Creativity in Artificial Systems

      Pierre-Yves Oudeyer, an AI researcher, shows how machines can generate behaviors and knowledge that are at once novel, relevant, and effective, thereby meeting several of the criteria for creativity.

      2.1. Curiosity as the Engine of Exploration

      The work of P-Y. Oudeyer's team focuses on modeling curiosity, understood as the mechanism that drives an agent (a child or a robot) to spontaneously explore its environment.

      Autonomous learning: A quadruped robot, initially knowing nothing about its body or its environment, learns through experimentation.

      Guided by curiosity algorithms, it tries out actions (moving its limbs, vocalizing) and observes the results.

      Discovering regularities: The robot progressively discovers cause-and-effect relationships: pushing an object with its arm makes it move; vocalizing toward another robot triggers an imitation.

      This curiosity-driven exploration leads it to discover social interaction.

      Étienne Koechlin connects this approach to neuroscience research on the drivers of action.

      He contrasts two views: acting to accumulate resources (rewards), and acting to acquire information and improve one's internal models of the world.

      Curiosity is at the heart of the second view: we act where we think we can learn the most.
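      This "act where you expect to learn the most" principle can be illustrated with a toy model of learning-progress-driven curiosity. The two "regions", the error dynamics, and all constants below are invented for illustration; they are not Oudeyer's actual algorithms, which operate over real sensorimotor spaces.

```python
import random

# Toy sketch of learning-progress curiosity: the agent tracks its
# prediction error in several activity "regions" and preferentially
# explores the region where error is dropping fastest, i.e. where it
# currently expects to learn the most.

class Region:
    def __init__(self, learnability):
        self.learnability = learnability  # how fast error can shrink here
        self.error = 1.0
        self.history = [self.error]

    def practice(self):
        # Practicing shrinks error in learnable regions; an unlearnable
        # region (learnability = 0) stays flat, so its progress is zero.
        self.error = max(0.0, self.error - self.learnability * self.error)
        self.history.append(self.error)

    def learning_progress(self):
        # Recent drop in error serves as the estimate of learning progress.
        return (self.history[-2] - self.history[-1]
                if len(self.history) > 1 else 1.0)

regions = {"learnable": Region(0.3), "unlearnable": Region(0.0)}

random.seed(0)
choices = []
for step in range(30):
    # Epsilon-greedy over learning progress, not over external reward.
    if random.random() < 0.1:
        name = random.choice(list(regions))
    else:
        name = max(regions, key=lambda n: regions[n].learning_progress())
    regions[name].practice()
    choices.append(name)

print("times each region was chosen:",
      {n: choices.count(n) for n in regions})
```

      After briefly sampling both regions, the agent abandons the unlearnable one (no progress to be made) and concentrates on the region where its model keeps improving, which is the qualitative behavior the text describes.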

      2.2. Evolutionary Algorithms and Reinforcement Learning

      Algorithms inspired by biological evolution can generate creative solutions that engineers would not have envisaged.

      Virtual Creatures: In a simulation, "creatures" made of virtual cells (muscles, rigid cells) are generated at random.

      A "fitness" criterion (the ability to move forward quickly) is defined.

      The best-performing creatures are selected, and their "genes" are randomly mutated to produce a new generation.

      Over the generations, effective and unexpected body shapes and locomotion strategies emerge.
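
      The generate-evaluate-select-mutate loop described above can be condensed into a few lines. This is a minimal, didactic Python sketch, not the virtual-creatures simulator itself; the genome encoding, the fitness function, and the population sizes are invented for the example.

```python
import random

random.seed(0)

# Hypothetical setup: a "genome" is a list of numbers, and fitness
# rewards genomes whose values approach an (invented) ideal gait vector.
TARGET = [0.5, 0.8, 0.2, 0.9]

def fitness(genome):
    # Higher is better: negative squared distance to the ideal gait.
    return -sum((g - t) ** 2 for g, t in zip(genome, TARGET))

def mutate(genome, rate=0.1):
    # Random "genetic" perturbation of each value.
    return [g + random.gauss(0, rate) for g in genome]

# Generation 0: random creatures.
population = [[random.random() for _ in TARGET] for _ in range(20)]

for generation in range(100):
    population.sort(key=fitness, reverse=True)
    survivors = population[:10]          # selection of the fittest half
    # Each survivor is kept and also produces one mutated offspring.
    population = survivors + [mutate(g) for g in survivors]

best = max(population, key=fitness)
# No human specified the final genome; it emerged from variation
# and selection, just as in the virtual-creature experiments.
```

      Keeping the survivors unmutated (elitism) guarantees that the best fitness found so far never regresses from one generation to the next.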

      Physical Robots: A physical robot learns to move by trial and error (reinforcement learning). Its initial movements are random and clumsy.

      Within a few minutes it discovers how to roll over, then get onto its feet and walk robustly, able to react to perturbations.

      The final movement strategy was not programmed by a human but discovered by the robot itself.

      These same methods underpin the successes of AlphaGo, which produced moves judged "highly creative" by human experts.

      3. Creativity in Artistic Practice

      The speakers from music and the visual arts illustrate the creative tension between constraint and freedom, and between tradition and innovation.

      3.1. Music: Algorithms and Transgressions

      The composer Floris Guédy presents two models of musical creation:

      Mozart's Dice Game: An algorithmic system for composing minuets.

      By rolling dice, one selects pre-written measures from a matrix.

      Although based on chance, the system is tightly constrained by the rules of tonal harmony (harmonic functions playing roles analogous to subject, verb and object).

      The result is always coherent yet varied, with billions of possible combinations.

      The system can be generalised to simulate, with the same basic model, the styles of later composers (Schumann, Debussy) simply by changing the parameters.
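
      The dice-game mechanism lends itself to a direct sketch. In the version commonly described, each of the 16 measures of the minuet is chosen by a roll of two dice among 11 pre-written variants. The Python toy below uses invented placeholder labels in place of Mozart's actual measures.

```python
import random

random.seed(42)

N_MEASURES = 16   # length of the minuet
N_VARIANTS = 11   # two dice give totals 2..12, i.e. 11 possibilities

# Placeholder matrix: matrix[measure][dice_total - 2] names one
# pre-written measure variant (hypothetical labels, not Mozart's table).
matrix = [[f"m{measure}-v{variant}" for variant in range(N_VARIANTS)]
          for measure in range(N_MEASURES)]

def roll_minuet():
    """Compose one minuet by rolling two dice for each measure."""
    minuet = []
    for measure in range(N_MEASURES):
        dice_total = random.randint(1, 6) + random.randint(1, 6)
        minuet.append(matrix[measure][dice_total - 2])
    return minuet

piece = roll_minuet()
# 11 choices for each of 16 measures: 11**16 combinations (on the
# order of 10**16), yet every result is harmonically coherent because
# all variants in a given column fulfil the same harmonic function.
n_combinations = N_VARIANTS ** N_MEASURES
```

      The constraint lives entirely in how the matrix was composed, not in the rolling procedure: chance selects among options that the rules of tonal harmony have already made interchangeable.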

      Hybridisation in Beethoven: Analysis of the sketches for the 30th piano sonata reveals a different creative process. Beethoven sets two musical elements in opposition (A: monodic and staccato; B: legato chords) and creates a third element (C) by hybridising their characteristics.

      His notebooks reveal an active process of searching, trial and error, aimed at finding the maximum contrast that makes the hybridisation as audible as possible.

      For F. Guédy, this kind of creativity, which consists in "breaking conventions" in an infinity of possible ways, is hard to simulate for an AI that instead seeks to reproduce what is statistically probable.

      3.2. Arts and Craft: Co-creation and Active Matter

      Patricia Ribault, a specialist in the visual arts, highlights the creativity at work in processes of "making" and in interactions.

      Co-creation in Murano: During a workshop, design students present drawings to the master glassmakers of Murano.

      Confronted with shapes that exceed their traditional know-how, the artisans must invent new techniques.

      This moment of "co-creation" pushes traditional techniques beyond their limits.

      Active Matter: She describes her work within the cluster of excellence "Matters of Activity", where researchers from all disciplines (scientists, engineers, designers) study practices such as filtering, weaving and cutting from the standpoint of matter itself as an active agent.

      Visualising Neuroplasticity: She presents the "Brain Roads" project, a collaboration between artists, designers and neurosurgeons aimed at visualising the complexity of cerebral plasticity.

      Faced with the limits of traditional imaging (tractography), the artists propose new graphic models (inspired by metro maps and voxels) to better guide the surgeon's hand and to represent the experience of patients undergoing awake surgery.

      4. The Biological and Neuroscientific Foundations

      The discussion explores the brain mechanisms underlying human creativity as well as its manifestations in the animal world.

      4.1. The Role of the Prefrontal Cortex

      Étienne Koechlin explains that the prefrontal cortex is the key region that "authorises" creativity in humans.

      The Control-and-Opening Mechanism: This brain region continuously monitors our behaviours and mental strategies.

      When a strategy is judged irrelevant or ineffective, the prefrontal cortex inhibits it.

      This inhibition allows new options, arising from a contextualised "remixing" of long-term memory, to emerge.

      Managing Its Own Limits: The system is designed to take its own limitations into account. It accepts "losing control" in order to let novelty emerge.

      The new options are then evaluated: if they prove successful, they are confirmed and consolidated in memory, enriching the individual's repertoire for future creations.

      The Nine-Dot Test: This classic test illustrates the process.

      To connect nine dots with four straight line segments without lifting the pencil, one must abandon implicit mental models (staying inside the square, never retracing a line).

      The solution emerges when these self-imposed rules are transgressed.

      4.2. Animal Creativity: The Octopus and "Metis"

      Patricia Ribault uses the example of the octopus to illustrate a non-human form of creative intelligence, "metis" (cunning), as theorised by Marcel Detienne and Jean-Pierre Vernant.

      A Being without Rigid Structure: The octopus can take and lose shape, which gives it exceptional plasticity.

      Master of Camouflage: Its creativity is expressed in its ability to engage with the perception of the other.

      Camouflage is not merely blending in but "deceiving whoever is watching you". It can be defensive or offensive (mesmerising prey).

      The Mimic Octopus: This species can not only camouflage itself but also change its behaviour to imitate other animals depending on the situation.

      Metis as a Form of Creativity: Metis is described as an "intelligence at work in becoming", deploying "prudence, perceptiveness, promptness", but also "cunning, even deceit".

      The being endowed with metis, like the octopus, is "elusive" and able to "constantly turn situations around".

      5. Cross-Cutting Themes and Conclusion

      The final discussion addresses several key questions about the nature of creativity and the distinctions between humans and machines.

      Authenticity and Subjectivity: Authenticity remains the criterion most difficult to attribute to AIs.

      Human authenticity is tied to an internal disequilibrium and an expressive intention.

      AIs can simulate a form of primary subjectivity (by holding models of their own knowledge), but deep expressiveness remains a human attribute.

      Chance and Constraint: Chance is an essential component of brain function, notably through "neuronal noise", which increases when our models of the world are confounded, opening up the "space of possibilities".

      Yet, as Mozart's dice game shows, apparent chance can operate within very strong constraints.

      Creativity lies in this interplay between opening (divergent thinking) and closing (convergent thinking).

      The Current Limits of AI: An anecdote is shared about an AI tasked with improvising in the style of Bach's The Art of Fugue.

      The result was impressive on the surface ("the flesh") but completely ignored the fundamental structure of the work.

      Likewise, a text written by an AI is described as "very fluent" and "coherent on the surface", but without "body" or semantic depth.

      Serendipity: It is stressed that creativity cannot be planned.

      It often emerges from serendipity: discovering something interesting by chance while looking for something else.

      To be effective, however, serendipity requires the capacity to recognise what is interesting, which brings us back to the creator's subjectivity and internal model.

    1. Reviewer #1 (Public review):

      Microglia are mononuclear phagocytes in the CNS that play essential roles in physiology and pathology. Under some conditions, circulating monocytes may infiltrate the CNS and differentiate into microglia or microglia-like cells, but the underlying mechanism is largely unknown. In this study, the authors explored the epigenetic regulation of this process. The quality of the study would be significantly improved if a few questions were addressed.

      (1) The capacity of circulating myeloid cells to give rise to microglia is controversial. In this study, the authors used CX3CR1-GFP/CCR2-DsRed (heterozygous) mice as a lineage-tracing line. However, this line alone is not an appropriate approach for this purpose. For example, when CX3CR1-GFP/CCR2-DsRed cells serve as the undifferentiated donor cells, they are GFP+ and DsRed+; once their fate has changed to microglia, they become GFP+ and DsRed-. However, this process is mediated by busulfan conditioning and by artificially introducing bone marrow cells into the circulation, conditions that do not exist in physiological or pathological settings. Such manipulations can introduce artifacts and confound the conclusion, as happened with the classic but incorrect textbook claim that microglia derive from bone marrow, which was subsequently corrected by Fabio Rossi's lab (1,2). This is the greatest risk in drawing the present conclusion, and the strongest evidence comes from parabiosis models. Therefore, a parabiosis study should precede this conclusion: combine a CX3CR1-GFP (heterozygous) mouse with a WT mouse without busulfan conditioning and look for GFP+ microglia in the GFP- WT brain. If there are no GFP+ microglia, the authors should clarify that their finding reflects not a physiological or pathological condition but a defined, artificial host condition, as a previous study did (3).

      (2) Under some conditions, peripheral myeloid cells can infiltrate the brain and replace resident microglia (4,5). Discussing this would help readers better understand the mechanism of microglia replacement.

      References:

      (1) Ajami, B., Bennett, J.L., Krieger, C., Tetzlaff, W., and Rossi, F.M. (2007). Local self-renewal can sustain CNS microglia maintenance and function throughout adult life. Nature neuroscience 10, 1538-1543. 10.1038/nn2014.

      (2) Ajami, B., Bennett, J.L., Krieger, C., McNagny, K.M., and Rossi, F.M.V. (2011). Infiltrating monocytes trigger EAE progression, but do not contribute to the resident microglia pool. Nature neuroscience 14, 1142-1149. http://www.nature.com/neuro/journal/v14/n9/abs/nn.2887.html#supplementary-information.

      (3) Mildner, A., Schmidt, H., Nitsche, M., Merkler, D., Hanisch, U.K., Mack, M., Heikenwalder, M., Bruck, W., Priller, J., and Prinz, M. (2007). Microglia in the adult brain arise from Ly-6ChiCCR2+ monocytes only under defined host conditions. Nature neuroscience 10, 1544-1553. 10.1038/nn2015.

      (4) Wu, J., Wang, Y., Li, X., Ouyang, P., Cai, Y., He, Y., Zhang, M., Luan, X., Jin, Y., Wang, J., et al. (2025). Microglia replacement halts the progression of microgliopathy in mice and humans. Science 389, eadr1015. 10.1126/science.adr1015.

      (5) Xu, Z., Rao, Y., Huang, Y., Zhou, T., Feng, R., Xiong, S., Yuan, T.F., Qin, S., Lu, Y., Zhou, X., et al. (2020). Efficient strategies for microglia replacement in the central nervous system. Cell reports 32, 108041. 10.1016/j.celrep.2020.108041.

    1. Listening in Human Development: An Analysis of Professor Elinor Ochs's Perspective

      Analytical Summary

      This synthesis analyses Professor Elinor Ochs's main arguments concerning the underestimated role of listening in child development.

      Her central thesis is that mainstream developmental studies, conducted chiefly in post-industrial Western societies, have focused excessively on children's speech production in dyadic (parent-child) settings, while neglecting the crucial competence of listening, in particular incidental listening ("overhearing") within multiparty interactions.

      Drawing on decades of ethnographic research, notably her foundational fieldwork in Samoa, Ochs shows that in many societies children are socialised from a very young age to become competent listeners within group conversations.

      This "training" in listening is facilitated by specific cultural affordances, such as the open architecture of dwellings, bodily postures that orient the child towards public space, and a domestic economy that values generational continuity and shared resources.

      By contrast, the Western model, with its private spaces and its emphasis on economic individualism, favours child-centred dyadic interactions, amplifying the child's role as speaker rather than listener.

      In conclusion, Professor Ochs argues that multiparty interactions offer unique developmental advantages, exposing children to a greater diversity of speakers, perspectives and linguistic varieties.

      Her research challenges the universality of current models of language acquisition and calls for a re-evaluation of listening as a socio-culturally constructed skill, essential to learning, cooperation and social integration.

      Introduction: The Perspective of a Linguistic Anthropologist

      Professor Elinor Ochs, of UCLA, is a linguistic anthropologist who combines the disciplines of linguistics and anthropology.

      Her principal methodology is ethnographic fieldwork, using audio and video recordings to document in detail how communication shapes social situations, relationships and ways of thinking.

      Area of specialisation: She co-founded the subfield of "language socialization", which holds that in learning a language children simultaneously acquire the sociocultural competence to become a "person" within their community.

      Research experience:

      Samoa (1978-1988): A longitudinal study of language acquisition among young children in a rural village.

      United States (1980s and 2000s): Research on social-class differences in problem-solving discourse, and a large-scale interdisciplinary study documenting the lives of 32 middle-class families.

      Autism (since 1997): Study of the communicative practices of children on the autism spectrum at home and at school.

      The Dominant Paradigm in Developmental Studies: The Primacy of Speaking over Listening

      Professor Ochs begins with an observation: although speaking and listening are both universal communicative practices, speaking remains by far the main object of interest in every field that studies language. The emphasis is on language production, not on the process that distinguishes hearing from listening.

      The Limits of Quantitative Studies

      Quantitative studies of child language development focus on the language the child produces, often reduced to word counts.

      A major public concern, notably about socio-economic differences (the "word gap"), grew out of these studies.

      The Dyadic Model: The dominant generalisation is that "the more words a child hears addressed directly to them, the larger their vocabulary will be".

      Assumed Ideal Conditions: This model rests on very specific conditions:

      1. The child is the primary addressee in a dyadic conversation (one speaker, one listener).

      2. The interaction is face to face.

      3. The language used is simplified and affective (child-directed speech, or "baby talk").

      The Dismissal of Incidental Listening: Within this framework, overhearing other people's conversations is regarded as having "little or no developmental benefit".

      Cultural Bias: These studies are situated mainly in post-industrial Western societies, with very little research conducted in societies with different sociopolitical economies.

      An Alternative Model: Learning through Listening in Multiparty Settings

      Professor Ochs's central thesis, supported by ethnographic research, is that another model of learning exists and is common in many societies.

      Key Arguments

      Argument 1: Developmental studies valorise frequent dyadic conversations in which the young child is the primary speaker or addressee, motivating educational interventions worldwide.

      Argument 2: Ethnographic studies show that in some societies, infants and toddlers regularly participate in multiparty conversations as "legitimate overhearers" or secondary participants.

      Argument 3: Whether immersed in multiparty or dyadic settings, neurotypical children successfully acquire language across different sociocultural contexts.

      Argument 4: Multiparty interactions have their own developmental affordances, exposing children to a diversity of speakers, perspectives and linguistic varieties, and teaching them to adapt their speech to different interlocutors ("recipient design").

      Argument 5: Listening skills are reinforced from infancy by outward-facing multiparty bodily alignments and by open built environments that afford auditory and visual access to public spaces.

      An Ethnographic Case Study: The Samoan Village

      Professor Ochs's fieldwork in Samoa, nearly 50 years ago, provides the main body of data for her argument.

      Linguistic and Social Context

      A Complex Language: Samoan is an ergative language with multiple word orders, two phonological registers, and an elaborate respect vocabulary.

      A Hierarchical Society: Society is structured around titled persons (high chiefs, orators) and untitled persons.

      Absence of "Baby Talk": Caregivers generally do not use simplified language or "baby talk" with infants. They do not label objects and rarely ask questions to which they already know the answer.

      Immersive Learning: Children acquire spoken Samoan by being in the midst of multiparty interactions.

      Environmental and Bodily Affordances for Listening

      Ochs identifies two main types of affordance that foster a culture of listening.

      1. Open Built Environments:

      ◦ Traditional Samoan houses have neither exterior nor interior walls. The space is open, with coconut-leaf blinds for shade.

      ◦ Houses are grouped in open family compounds close to the main road, giving access to public conversations.

      ◦ Simultaneous interactions inside and outside the house are common, and residents are used to listening to several conversations at once.

      ◦ By contrast, European-style (colonial) houses, though prestigious, are walled, rectangular, and less well liked because they limit auditory access and are very hot.

      2. Outward-Facing Bodily Alignments:

      Infants: They are often "nested" in the arms of a caregiver (an adult or older sibling) so as to face outward, towards public space and the community. They are carried on the back or the hip, or seated in front of the caregiver, looking in the same direction as the other participants.

      Older children: They must sit cross-legged (never showing the soles of their feet) and actively watch both the people inside the house and those on the road from the edge of the house. Their tasks (running messages, serving, etc.) keep them mobile and active in the community.

      ◦ The Samoan word for "respect" (fa'aaloalo) is composed of the prefix fa'a and alo, meaning "face", implying the idea of "turning towards the other".

      Socio-Economic Hypotheses and Open Questions

      Professor Ochs links these different modes of interaction to the economic structure of the family.

      The Family-Continuity Model (e.g. Samoa):

      ◦ Children are raised to support the family's shared economic resources and to ensure the generational continuity of its assets.

      ◦ In this context, "the family has an investment in the child listening". Listening is an essential skill for learning the social and economic dynamics of the group.

      ◦ This model favours the child's participation as a listener in multiparty conversations.

      The Individual-Independence Model (e.g. neoliberal American families):

      ◦ Children are raised to become economically independent individuals, a cultural legacy in which rights of succession were abolished well before the industrial revolution.

      ◦ The emphasis is on the child's rapid development as an individual, which favours intense, child-centred dyadic interactions.

      Central Questions for Future Research

      The presentation ends with a series of fundamental questions:

      1. Can habitats (open or walled) and bodily orientations shape the phenomenology of listening in early childhood?

      2. Do these sociocultural factors act as "cultural amplifiers"?

      Does a private, enclosed habitat amplify listening as a dyadic addressee, while an open habitat amplifies listening as a secondary participant?

      3. Do current developmental studies examine only a "fraction of the possibilities" in terms of environments and affordances for listening?

    1. Synthesis: The Rise of Diversity as a Political Value

      Summary

      This synthesis analyses Professor Lorraine Daston's lecture on the extraordinarily rapid rise of diversity as a fundamental political value.

      The starting point is a paradox: whereas shifts in moral values are usually processes spanning centuries or even millennia (e.g. the abolition of slavery, the equality of the sexes), diversity established itself as a self-evident good in only a few decades, beginning in the 1970s.

      Daston's central hypothesis is that this meteoric rise was not an ex nihilo event. The current political value of diversity "piggybacked" on earlier, well-established incarnations of the same value in other domains.

      The document traces this genealogy in three key stages:

      1. Aesthetic Diversity: Since antiquity (Pliny the Elder), nature's "exuberant fecundity", notably the infinite variety of flowers, has been perceived as a form of pure, gratuitous, admirable beauty.

      This value peaked in the 16th and 17th centuries with the influx of novelties and the cabinets of curiosities (Wunderkammern).

      2. Economic Diversity: From the 18th century onwards, diversity changes in nature and becomes associated with efficiency. Adam Smith's example of the pin factory illustrates how the division of labour, a form of diversity of tasks, becomes synonymous with productivity and innovation.

      3. The Biological Synthesis: In the 19th century, biologists, notably Henri Milne-Edwards and Charles Darwin, fused these two conceptions.

      They applied the principle of the division of labour to the living organism and to the evolution of species, presenting nature no longer as a mere aesthetic playground but as a "savagely competitive" and efficient economy.

      This is the conceptual birth of "biodiversity".

      The contemporary political value of diversity, born in the United States in the wake of the civil-rights movements of the 1960s, draws its force and self-evidence from this dual heritage.

      It invokes both economic efficiency (diverse teams perform better) and aesthetic beauty, as illustrated by Nelson Mandela's metaphor of the "Rainbow Nation", which evokes at once the splendour of South African flora and multiracial harmony.

      The question-and-answer session explores contemporary critiques (from both left and right), specific national contexts, and crucial conceptual distinctions from notions such as pluralism, equality and equity.

      --------------------------------------------------------------------------------

      Introduction: A "Meteoric" Rise

      Lorraine Daston's analysis starts from an observation she calls "astonishing": the speed with which diversity established itself as a political value, not only in arguments and legislation but also as a visceral moral intuition.

      An exceptionally rapid change of values: Shifts in fundamental values are extremely slow processes. Daston cites several examples:

      Slavery: It took millennia to move from near-universal acceptance in antiquity to near-universal condemnation today.

      Women's equality: Arguments in its favour date back to the 17th century in Europe, but suffrage legislation came only in the 20th century, and how deeply the value is rooted in collective consciousness remains debatable.

      Economic equality: Advocated since the 18th century, it has not yet crossed the threshold of legislation, let alone that of moral intuition.

      A quantitative indicator: Analysis of Google Ngram data, which measures word frequencies in a corpus of millions of books, shows a "meteoric" increase in the use of the word "diversity" from the 1970s onwards.

      1970s: The rise is driven mainly by biodiversity.

      1980s: The term begins to be applied to social and political contexts.

      American influence: The curves for French (diversité) and German (Diversität) track the English curve with a lag of about five years, suggesting a direction of influence from the United States towards Europe.

      In German, the word "Diversity" was first imported from English before being naturalised as "Diversität".

      The Central Hypothesis: A Prehistory of the Value

      To explain this rapid rise, Daston argues that "the most recent incarnation of diversity, in the political domain, draws its self-evidence in part from earlier versions of diversity, first as an aesthetic value and then as an economic one".

      Each new version built on the previous one, creating a kind of palimpsest of meanings that gives the current political value its force of self-evidence.

      The Historical Incarnations of Diversity

      1. Diversity as an Aesthetic Value: Nature's Superabundance

      Since antiquity, nature, with its "overflowing fecundity" and "exuberant excess", has been the prime example of diversity as beauty.

      Pliny the Elder (c. 78 CE): He marvelled at the "magnificent yet apparently useless" profusion of flowers, which he took as proof that nature was "in her most playful mood".

      Immanuel Kant (18th century): To illustrate pure beauty, which serves no purpose and can be subsumed under no concept, he chose flowers as his prime example.

      European expansion (16th-17th centuries): The arrival of exotic goods (tulips from the Levant, Chinese porcelain, nautilus shells from the Indo-Pacific) enriched this aesthetic of diversity, visible in the still lifes and paintings of the period.

      Cabinets of curiosities (Wunderkammern): Regarded as the apogee of this aesthetic, they assembled heterogeneous objects (artefacts, stuffed animals, etc.) in a spirit of extravagance and disdain for frugality.

      2. Diversity as an Economic Value: Efficiency and the Division of Labour

      In the late 18th century, diversity became associated with a radically different concept: economic efficiency.

      The pin factory: Described in Diderot and D'Alembert's Encyclopédie, this Norman factory illustrates how dividing manufacture into 18 distinct operations yields "staggering" efficiency (up to 48,000 pins per day).

      Adam Smith (1776): In The Wealth of Nations, he uses this example to show how the division of labour fosters efficiency and technological innovation.

      Extended applications: In the 19th century, the principle was applied far beyond industry:

      ◦ Charles Babbage: Drew on it in designing the first computer, the Analytical Engine.

      ◦ Émile Durkheim: Used it in his theory of organic solidarity in advanced societies.

      3. The Biological Synthesis: From Physiology to Biodiversity

      It was biologists who united the aesthetic and economic conceptions of diversity.

      Henri Milne-Edwards: Confronted with the infinite variety of organisms, this French zoologist discerned in it a fundamental organising principle: the division of labour.

      For him, "it is above all through the division of labour that perfection is attained".

      The body of a complex organism is like a factory in which each organ has its function (the brain does not digest, the stomach does not think).

      Charles Darwin (1859): Reading Milne-Edwards, he connected the principle of the division of labour to speciation in On the Origin of Species.

      Nature is no longer merely a playground but a "savagely competitive" and extremely efficient economy.

      This is the moment when "Pliny's cornucopia fuses with Adam Smith's pin factory", giving birth to the modern idea of biodiversity.

      The Emergence of Diversity as a Political Value

      Origins in the United States: From Equality to Diversity

      The academic consensus places the beginning of diversity's political ascent in the United States in the 1960s.

      The Civil Rights Movement: The campaigns for the rights of African Americans, and later of women, were waged under the banner of equality for all citizens, regardless of race, gender, or sexuality.

      The argument was demographic: if a group makes up X% of the population, it should be represented at X% in every sphere of society.

      The Affirmative Action controversy: The programs designed to apply this principle (quotas, positive discrimination) proved politically controversial.

      The turn to "Diversity Management": After the Supreme Court ruled affirmative action unconstitutional in several landmark decisions, a new specialty emerged: diversity management.

      In the 1990s, the term "diversity" supplanted "equality" in public and private policy.

      Influence and Global Examples

      This new value then spread worldwide.

      European Union: The concept was incorporated into directives to member states around 2012.

      Post-apartheid South Africa: This example is particularly revealing of how the value's different layers fused.

      Archbishop Desmond Tutu called South Africans "the rainbow people of God," a religious symbol evoking the covenant after the Flood.

      Nelson Mandela took up the phrase for civic purposes, stressing the rainbow's multiracial connotations.

      In his presidential address, he declared: "We enter into a covenant that we shall build a society in which all South Africans, both black and white, will be able to walk tall... a rainbow nation at peace with itself and the world."

      This metaphor draws its power from diversity's double inheritance:

      Economic efficiency: The argument that diverse teams achieve better results by combining perspectives.

      Aesthetic beauty: Mandela often associated the rainbow with his country's flora, such as "the famous jacarandas of Pretoria."

      At the heart of diversity's political value remains "the splendor of the flowering meadow."

      Contemporary Analyses and Critiques (Q&A Session)

      The discussion following the talk explored several contemporary nuances and critiques of the notion of diversity.

      Decline and Critiques

      The slight decline observed in the use of the word "diversity" after 2010 may be explained by the emergence of critiques from both sides of the political spectrum:

      • Left-wing critique: In the name of universalism, arguing that diversity grants political status on the basis of distinguishing characteristics, whereas equality rests on what all human beings have in common.

      • Right-wing critique: In the name of meritocracy, which the diversity principle is seen to oppose.

      National Contexts and Resistance

      The application of diversity varies considerably across national contexts:

      • France: Reluctance to collect ethnic statistics, owing to strong universalist principles.

      • United States: The debate centers on race.

      • Central Europe: The discussion often concerns Roma populations.

      • Practical resistance: Defining which "diverse" groups to include is often a "battlefield," a Hobbesian "war of all against all," far from the image of a rainbow parade.

      Key Conceptual Distinctions

      Important distinctions were drawn with neighboring terms:

      • Diversity vs. Pluralism: Diversity tends to apply to individual or group identities, whereas pluralism is a broader category that includes the plurality of opinions and ideas (John Stuart Mill's "marketplace of ideas") within those very groups.

      • Equality vs. Equity: Equality (of opportunity) is compatible with a meritocracy on a "level playing field." Equity (of outcomes) becomes highly controversial in a context of economic contraction (post-2008), where one group's gain is perceived as another's loss, leading to fragmentation.

      The Power of the Aesthetic Metaphor

      The rainbow metaphor is described as "brilliant" because it defuses the strategy of othering and denigration.

      No one ranks the colors of the rainbow; on the contrary, their blending is considered more beautiful than any single color on its own.

      This demonstrates the active role of diversity's aesthetic value in the political sphere.

    1. Crisis, Inequality, and Precarity: Synthesis of the Analyses of Esther Duflo, Claire Hédon, and Frédéric Worms

      Summary

      This synthesis analyzes the remarks of Esther Duflo, Claire Hédon, and Frédéric Worms on the impact of the coronavirus crisis on inequality and precarity. The key conclusions are as follows:

      Worsening Inequality: The crisis has an immediate, harmful effect, exacerbating existing inequalities both within and between countries.

      The poorest and most vulnerable populations bear a disproportionate share of the health and economic shocks.

      In the United States, for example, a Black person is four times more likely to die of the coronavirus than a white person of the same age.

      Disparity in Economic Responses: Rich countries were able to mobilize 20% of their GDP to support their economies, compared with 6% for emerging countries and only 2% for poor countries, which portends entrenched poverty in the latter.

      Systemic Flaws Revealed: The crisis exposed deep structural problems:

      • an institutionalized distrust of the poor that makes social protection systems punitive,
      • a retreat of public services that complicates access to rights (notably because of digitalization), and
      • the international community's inability to organize effective solidarity.

      Opportunities for Change: Despite its negative effects, the crisis offers opportunities.

      It showed that government is an essential solution for managing crises, not the problem.

      The mass experience of furlough schemes could also change perceptions of redistribution by showing that anyone may need help, potentially opening the way to systems more respectful of dignity.

      A Structural Approach: Addressing inequality is not merely a consequence to be managed but a precondition for effectively managing future crises, whether health-related, climatic, or democratic.

      Trust in a fair redistribution system is indispensable for securing collective buy-in to the necessary efforts.

      Access to Rights: The crisis worsened the phenomenon of "non-take-up" of rights, in which the most precarious, facing closed physical offices and the digital barrier, fail to obtain the benefits to which they are entitled.

      --------------------------------------------------------------------------------

      1. The Immediate and Disproportionate Impact of the Crisis

      Far from being a "great equalizer," the coronavirus crisis struck asymmetrically, aggravating existing vulnerabilities.

      1.1. Inequality within Rich Countries

      On the health front: Esther Duflo stresses that the poorest and minority populations were hit hardest.

      In the United States, adjusting for age, a Black person is four times more likely to die of the coronavirus than a white person.

      An INSEE study in France, cited by Claire Hédon, likewise shows a correlation between a municipality's standard of living and mortality.

      On the economic front:

      ◦ The recovery is unequal. In the United States, the richest quarter of the population has returned to pre-crisis employment and wage levels, while the poorest, notably in the service sector, are settling into a lasting crisis.

      ◦ Solidarity schemes, such as furlough programs in Europe, were mostly conditioned on holding a prior job, leaving aside those already in deep precarity.

      ◦ Claire Hédon reports that people on minimum social benefits saw their situation deteriorate (more expensive groceries in local shops, children no longer getting the €1 school lunch) without receiving significant additional aid.

      1.2. Inequality between Countries

      Esther Duflo highlights an immense gap in countries' capacity to respond economically to the crisis.

      | Country category | Fiscal support spending (% of GDP) |
      | --- | --- |
      | Rich countries | 20% |
      | Emerging countries | 6% |
      | Poor countries | 2% (of an already much smaller GDP) |

      This disparity has major consequences:

      • Rich countries could borrow massively to protect their populations, an option unavailable to poor countries.

      • While a rapid economic recovery is expected in rich countries thanks to vaccination, poor countries risk a "bogging down of the crisis" and poverty closing in on itself.

      2. Systemic Flaws Revealed and Exacerbated

      The crisis acted as a revealer of deep structural dysfunctions in our societies and institutions.

      2.1. Distrust of the Poor and the Punitive Straitjacket of Redistribution

      Esther Duflo argues that our social protection systems are qualitatively weak and "punitive at their core" because of a deep distrust of the poor, who are perceived as "lazy."

      This view, described as "Victorian," erects barriers to keep beneficiaries from "wallowing in complacency."

      Claire Hédon confirms this assessment with concrete examples:

      The permanent suspicion of fraud: She cites the case of a man who took 15 months to obtain the RSA, and of people accused of fraud for having sold their clothes or their car to survive.

      A blaming gaze: "I have the feeling that there is, rooted in society, a very blaming gaze, which also asks: what did you fail at in your life to end up in this situation?"

      She maintains that it is society that has failed these people, not the other way around.

      2.2. The Retreat of Public Services and the Non-Take-Up of Rights

      Claire Hédon, as Défenseure des droits (Defender of Rights), warns of a "retreat of the State's presence" that the crisis has aggravated.

      Digitalization as a barrier: The closure of physical offices (CAF, post offices) made access to rights nearly impossible for people without an internet connection, adequate equipment, or digital skills.

      For the most precarious, digitalization results in "non-access to rights."

      The phenomenon of non-take-up: Many eligible people fail to claim their rights. Anti-fraud measures, by complicating procedures, effectively generate non-take-up.

      Quality of reception: Even physical access is strewn with obstacles, as illustrated by the example of a man who had to travel 30 km to the CAF, was refused entry for lack of an appointment booked online, and was then judged "unmotivated" by the reception staff.

      2.3. The Failure of International Solidarity

      Esther Duflo deplores that rich countries, having spent "trillions of dollars" on their own economies, were "conspicuously absent" when it came to helping poor countries.

      The call for a "Marshall Plan for poor countries" that she issued at the start of the crisis went unheeded.

      This inability to act collectively in times of crisis is a worrying signal for the challenges ahead, notably climate change.

      3. Crises as Catalysts for Potential Change

      Despite the grim assessment, the speakers identify glimmers of hope and opportunities to rethink certain paradigms.

      3.1. The Essential Role of the State

      For Esther Duflo, the crisis taught a major lesson: "government is not the problem, government is the solution."

      Only the State has the capacity:

      • To impose public health measures (mask-wearing).

      • To invest massively in research and vaccine procurement.

      • To borrow on behalf of the population to shield it from economic shocks.

      This realization could lead to a "renewed appreciation of the importance of the role of government."

      3.2. Toward a New Perception of Redistribution

      The massive, flexible experience of furlough schemes in Europe showed that "anyone can need help."

      Perfectly "virtuous" people found themselves dependent on public support.

      Hope for a shift in mentality: Esther Duflo hopes this experience can "free us somewhat from this Victorian straitjacket" and allow redistribution that is "more fluid, more respectful, putting individuals' dignity at its heart."

      Debate on income for the young: Claire Hédon notes that the crisis has made the debate on a basic income for 18-25 year olds less taboo (via the RSA or a generalization of the Garantie Jeune).

      4. A Structural Approach: Addressing Inequality to Prevent Crises

      Frédéric Worms offers a three-level analysis of the crisis response and argues for a long-term structural vision.

      4.1. Three Types of Crisis Response

      1. The "hypocritical" response: Claims that, since health measures worsen inequality, we should not have responded (or not as much).

      Frédéric Worms and Esther Duflo refute this argument, stressing that there is no trade-off between health and the economy: the countries that handled the health crisis badly also have the worst economic outcomes.

      2. The "honest" response (social democracy): Addresses both dangers simultaneously, combining health, economic, and social imperatives.

      3. The "structural" response (the strongest): Holds that addressing inequality is the very condition of responding to the health dangers of the 21st century. Inequality is not a side effect but a root cause of crises.

      4.2. Trust as a Prerequisite for Collective Action

      This structural approach is essential because, as Esther Duflo stresses, one cannot manage a crisis (COVID, climate) that demands sacrifices without citizens' trust.

      Trust and redistribution: People will accept difficult measures (e.g., a carbon tax) only if they trust that they will be fairly compensated.

      That trust is impossible without a redistribution system perceived as "effective, generous, and respectful of people."

      The vicious circle of distrust: Frédéric Worms points to a "mutual distrust":

      citizens' distrust of the government, but also the government's distrust of citizens (suspicion of fraud).

      Breaking this circle requires relying on knowledge, science, and solid "institutions of disagreement."

      5. Avenues for Action and Solutions

      The discussion also addressed concrete solutions for fighting poverty and inequality.

      Guaranteed Minimum Income vs. Universal Income:

      For poor countries, Esther Duflo recommends a very small universal income, available on simple request.

      The main issue there is the loss of dignity, and even a modest income can be enough to "put food on the table for your children three times a day."

      For rich countries, she favors a guaranteed minimum income (on the RSA model), which concentrates resources on those who need them most, since the information needed to target them exists.

      She insists that dignity there is also tied to work, which requires more than money (housing, childcare, etc.).

      It must be a right, not charity.

      The Right to Work: Claire Hédon and Esther Duflo agree on the importance of the right to work.

      People in precarious situations want to work, because it is a "means of being integrated into society."

      The Experimental Approach: Esther Duflo argues for importing an attitude learned from her work in poor countries:

      the humility to recognize that we do not always know what works, and the need to test public policies rigorously before scaling them up.

      Studies have shown, for example, that financial security encourages initiative rather than limiting it.

      The right to digital access: Faced with widespread digitalization, Claire Hédon believes we must now consider a "right of access to digital services."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Bisht et al address the hypothesis that protein folding chaperones may be implicated in aggregopathies and in particular Tau aggregation, as a means to identify novel therapeutic routes for these largely neurodegenerative conditions.

      The authors conducted a genetic screen in the Drosophila eye, which facilitates the identification of mutations that either enhance or suppress a visible disturbance in the nearly crystalline organization of the compound eye. They screened by RNA interference all 64 known Drosophila chaperones and revealed that mutations in 20 of them exaggerate the Tau-dependent phenotype, while 15 ameliorated it. The enhancer of the degeneration group included 2 subunits of the typically heterohexameric prefoldin complex and other co-translational chaperones.

      The authors characterized in depth one of the prefoldin subunits, Pfdn5, and convincingly demonstrated that this protein functions in the regulation of microtubule organization, likely due to its regulation of proper folding of tubulin monomers. They demonstrate convincingly using both immunohistochemistry in larval motor neurons and microtubule binding assays that Pfdn5 is a bona fide microtubule-associated protein contributing to the stability of the axonal microtubule cytoskeleton, which is significantly disrupted in the mutants.

      Similar phenotypes were observed in larvae expressing the Frontotemporal dementia with Parkinsonism linked to chromosome 17 (FTDP-17)-associated mutations of the human Tau gene, V377M and R406W. On the strength of the phenotypic evidence and the enhancement of the Tau<sup>V377M</sup>-induced eye degeneration, they demonstrate that loss of Pfdn5 exaggerates the synaptic deficits upon expression of the Tau mutants. Conversely, the overexpression of Pfdn5 or Pfdn6 ameliorates the synaptic phenotypes in the larvae, the vacuolization phenotypes in the adult, and even memory defects upon Tau<sup>V377M</sup> expression.

      Strengths

      The phenotypic analyses of the mutant and its interactions with TauV377M at the cell biological, histological, and behavioral levels are precise, extensive, and convincing and achieve the aims of characterization of a novel function of Pfdn5. 

      Regarding this memory defect upon V377M tau expression. Kosmidis et al (2010), PMID: 20071510, demonstrated that pan-neuronal expression of Tau<sup>V377M</sup> disrupts the organization of the mushroom bodies, the seat of long-term memory in odor/shock and odor/reward conditioning. If the novel memory assay the authors use depends on the adult brain structures, then the memory deficit can be explained in this manner. 

      (1) If the mushroom bodies are defective upon Tau<sup>V377M</sup> expression, does overexpression of Pfdn5 or 6 reverse this deficit? This would argue strongly in favor of the microtubule stabilization explanation.

      We thank the reviewer for this insightful comment. Consistent with Kosmidis et al. (2010), we confirm that expression of hTau<sup>V377M</sup> disrupts the architecture of the mushroom bodies. In addition, we find, as suggested by the reviewer, that coexpression of either Pfdn5 or Pfdn6 with hTau<sup>V377M</sup> significantly restores the organization of the mushroom bodies. These new findings strongly support the hypothesis that Pfdn5 or Pfdn6 mitigate hTau<sup>V377M</sup>-induced memory deficits by preserving the structure of the mushroom body, likely through stabilizing the microtubule network. These data have now been included in the revised manuscript (Figure 7H-O).

      (2) The discovery that Pfdn5 (and most likely 6) affects Tau<sup>V377M</sup> toxicity is indeed a novel and important discovery for the Tauopathies field. It is important to determine whether this interaction affects only the FTDP-17-linked mutations or also WT Tau isoforms, which are linked to the rest of the Tauopathies. Also, insights into the mode(s) by which Pfdn5/6 affect Tau toxicity, such as those the suggestions above are aiming at, will likely be helpful towards therapeutic interventions.

      We agree that determining whether prefoldin modulates the toxicity of both mutant and wildtype Tau is critical for understanding its broader relevance to Tauopathies. We have now performed the additional experiments required to address this issue. These new data show that loss of Pfdn5 also exacerbates the toxicity associated with wildtype Tau (hTau<sup>WT</sup>), in a manner similar to that observed with hTau<sup>V337M</sup> or hTau<sup>R406W</sup>. Specifically, overexpression of hTau<sup>WT</sup> in a Pfdn5 mutant background leads to Tau aggregate formation (Figure S7G-I), and coexpression of Pfdn5 with hTau<sup>WT</sup> reduces the associated synaptic defects (Figure S11F-L). These findings underscore a general role for Pfdn5 in modulating diverse Tauopathy-associated phenotypes and suggest that it could be a broadly relevant therapeutic target.

      Weakness

      (3) What is unclear, however, is how Pfdn5 loss or even overexpression affects the pathological Tau phenotypes. Does Pfdn5 (or 6) interact directly with TauV377M? Colocalization within tissues is a start, but immunoprecipitations would provide additional independent evidence that this is so.

      We appreciate this important suggestion. To investigate a potential direct interaction between Pfdn5 and Tau<sup>V377M</sup>, we performed co-immunoprecipitation experiments using lysates from adult fly brain expressing hTau<sup>V337M</sup>. Under the conditions tested, we did not detect a direct physical interaction. While this does not support a direct interaction, it does not strongly refute it either. We note that Pfdn5 and Tau are colocalized within axons (Figure S13J-K). At this stage, we are unable to resolve the issue of direct vs indirect association. If indirect, then Tau and Pfdn5 act within the same subcellular compartments (axon); if direct, then either only a small fraction of the total cellular proteins is in the Tau-Pfdn5 complex and therefore difficult to detect in bulk protein westerns, or the interactions are dynamic or occur in conditions that we have not been able to mimic in vitro. 

      (4) Does Pfdn5 loss exacerbate Tau<sup>V377M</sup> phenotypes because it destabilizes microtubules, which are already at least partially destabilized by Tau expression? Rescue of the phenotypes by overexpression of Pfdn5 agrees with this notion. 

      However, Cowan et al (2010), PMID: 20617325, demonstrated that wildtype Tau accumulation in larval motor neurons indeed destabilizes microtubules in a Tau phosphorylation-dependent manner. So, is Tau<sup>V377M</sup> hyperphosphorylated in the larvae? What happens to Tau<sup>V377M</sup> phosphorylation when Pfdn5 is missing and presumably more Tau is soluble and subject to hyperphosphorylation, as predicted by the above?

      We completely agree that it is important to link Tau-induced phenotypes with microtubule destabilization and the phosphorylation state of Tau. We performed immunostaining with an anti-Futsch antibody to examine microtubule organization at the NMJ and observed a severe reduction in Futsch intensity when Tau<sup>V337M</sup> was expressed in the Pfdn5 mutant (Elav-Gal4>Tau<sup>V337M</sup>; ∆Pfdn5<sup>15/40</sup>), suggesting that the absence of Pfdn5 exacerbates the hTau<sup>V337M</sup> defects through further microtubule destabilization (Figure S6F-J).

      We have performed additional experiments to examine the phosphorylation state of hTau in Drosophila larval axons. Immunocytochemistry indicated that only a subset of hTau aggregates in Pfdn5 mutants (Elav-Gal4>Tau<sup>V337M</sup>; ∆Pfdn5<sup>15/40</sup>) are recognized by phospho-hTau antibodies. For instance, the AT8 antibody (targeting pSer202/pThr205) (Goedert et al., 1995) labelled only a subset of the aggregates identified by the total hTau antibody (D5D8N) (Figure S9A-E). Moreover, feeding these larvae (Elav-Gal4>Tau<sup>V337M</sup>; ∆Pfdn5<sup>15/40</sup>) with LiCl, which blocks GSK3β, still resulted in robust Tau aggregation (Figure S9F-J).

      These results imply that: a) soluble phospho-hTau levels in Pfdn5 mutants are low and not reliably detected with a single phosphorylation-specific antibody; b) loss of Pfdn5 results in Tau aggregation in a hyperphosphorylation-independent manner, similar to what has been reported earlier (LI et al. 2022); and c) the destabilization of microtubules in Elav-Gal4>Tau<sup>V337M</sup>; ∆Pfdn5<sup>15/40</sup> results in Tau dissociation and aggregate formation. These data and conclusions have been incorporated into the revised manuscript.

      (5) Expression of WT human Tau (which is associated with most common Tauopathies other than FTDP-17), as Cowan et al suggest, has significant effects on microtubule stability, but such Tau-expressing larvae are largely viable. Will one mutant copy of the Pfdn5 knockout enhance the phenotype of these larvae? Will it result in lethality? Such data will serve to generalize the effects of Pfdn5 beyond the two FTDP-17 mutations utilized.

      We have now examined whether heterozygous loss of Pfdn5 (∆Pfdn5/+) enhances the effect of Tau expression. While each genotype (hTau<sup>V337M</sup>, hTau<sup>WT</sup> or ∆Pfdn5/+) alone is viable, Elav-Gal4 driven expression of hTau<sup>V337M</sup> or hTau<sup>WT</sup> in Pfdn5 heterozygous background does not cause lethality. 

      (6) Does the loss of Pfdn5 affect Tau<sup>V377M</sup> (and WT Tau) levels? Could the loss of Pfdn5 simply result in increased Tau levels? And conversely, does overexpression of Pfdn5 or 6 reduce Tau levels? This would explain the enhancement and suppression of Tau<sup>V377M</sup> (and possibly WT Tau) phenotypes. It is an easily addressed, trivial explanation at the observational level, which, if true, begs for a distinct mechanistic approach.

      To test whether Pfdn5 modulates Tau phenotypes by altering Tau protein levels, we performed western blot analysis under Pfdn5 or Pfdn6 overexpression conditions and observed no change in hTau<sup>V337M</sup> levels (Figure 6O). However, in the absence of Pfdn5, both hTau<sup>V337M</sup> and hTau<sup>WT</sup> form large, insoluble aggregates that are not detected in soluble lysates by standard western blotting but are visualized by immunocytochemistry (Figure S7G-I). Thus, the apparent reduction in Tau levels on western blots reflects a solubility shift, not an actual decrease in Tau expression. These findings argue against a simple model in which Pfdn5 regulates Tau abundance and instead support a mechanism in which Pfdn5 loss leads to a change in Tau conformation, leading to its sequestration away from already destabilized microtubules.

      (7) Finally, the authors argue that Tau<sup>V377M</sup> forms aggregates in the larval brain based on large puncta observed especially upon loss of Pfdn5. This may be so, but protocols are available to validate molecularly the presence of insoluble Tau aggregates (for example, PMID: 36868851) or soluble Tau oligomers, as these apparently differentially affect Tau toxicity. Does Pfdn5 loss exaggerate the toxic oligomers, and does overexpression promote the more benign large aggregates?

      We have performed additional experiments to analyze the nature of these aggregates using 1,6-hexanediol (1,6-HD). 1,6-HD can dissolve the Tau aggregate seeds formed by Tau droplets, but cannot dissolve stable Tau aggregates (WEGMANN et al. 2018). We observed that 5% 1,6-HD failed to dissolve these Tau aggregates (Figure S8), demonstrating the formation of stable, filamentous, flame-shaped NFT-like aggregates in the absence of Pfdn5 (Figure 5D and Figure S9).

      Reviewer #2 (Public review):

      Bisht et al detail a novel interaction between the chaperone Prefoldin 5, microtubules, and tau-mediated neurodegeneration, with potential relevance for Alzheimer's disease and other tauopathies. Using Drosophila, the study shows that Pfdn5 is a microtubule-associated protein, which regulates tubulin monomer levels and can stabilize microtubule filaments in the axons of peripheral nerves. The work further suggests that Pfdn5/6 may antagonize Tau aggregation and neurotoxicity. While the overall findings may be of interest to those investigating the axonal and synaptic cytoskeleton, the detailed mechanisms for the observed phenotypes remain unresolved and the translational relevance for tauopathy pathogenesis is yet to be established. Further, a number of key controls and important experiments are missing that are needed to fully interpret the findings.

      The strength of this study is the data showing that Pfdn5 localizes to axonal microtubules and the loss-of-function phenotypic analysis revealing disrupted synaptic bouton morphology. The major weakness relates to the experiments and claims of interactions with Tau-mediated neurodegeneration. 

      In particular, it is unclear whether knockdown of Pfdn5 may cause eye phenotypes independent of Tau. 

      Our new experiments confirm that knockdown of Pfdn5 alone does not cause eye phenotypes.

      Further, the GMR>tau phenotype appears to have been incorrectly utilized to examine age-dependent neurodegeneration.

      In response, we have modulated and explained our conclusions in this regard as described later in our “rebuttal.”

      This manuscript argues that its findings may be relevant to thinking about mechanisms and therapies applicable to tauopathies; however, this is premature given that many questions remain about the interactions from Drosophila, the detailed mechanisms remain unresolved, and absent evidence that Tau and Pfdn may similarly interact in the mammalian neuronal context. Therefore, this work would be strongly enhanced by experiments in human or murine neuronal culture or supportive evidence from analyses of human data.

      The reviewer is correct that the impact would be greater if Pfdn5-Tau interactions were also examined in human tissue.   While we have not attempted these experiments ourselves, we hope that our observations will stimulate others to test the conservation of phenomena we describe. There are, however, several lines of circumstantial evidence from human Alzheimer’s disease datasets that implicate PFDN5 in disease pathology. For example, recent compilations and analyses of proteomic data show reductions of CCT components, TBCE, as well as Prefoldin subunits, including PFDN5, in AD tissue (HSIEH et al. 2019; TAO et al. 2020; JI et al. 2022; ASKENAZI et al. 2023; LEITNER et al. 2024; SUN et al. 2024). Furthermore, whole blood mRNA expression data from Alzheimer's patients revealed downregulation of PFDN5 transcript (JI et al. 2022). Together, these findings from human data are consistent with the roles of PFDN5 in suppressing diverse neurodegenerative processes. We have incorporated these points into the discussion section of the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

See public review for experimental recommendations focusing on the Tau-Pfdn interactions. I would refrain from using the word aggregates; I would call them puncta, unless there is molecular or visual (i.e., AFM) evidence that they are indeed insoluble aggregates. Finally, although including the full genotypes written out below the axis in the bar graphs is appreciated, it nevertheless makes them difficult to read due to crowding in most cases and somewhat distracts from the figure.

      In my opinion, a more reader-friendly manner of reporting the phenotypes will be highly helpful. For example, listing each component of the genotype on the left of each bar graph and adding a cross or a filled circle under the bar to inform of the full genotype of the animals used.

As described in the response to the previous comment, we now have strong direct evidence to support our view that the observed puncta are stable Tau aggregates. Thus, we feel justified in using the term Tau aggregates in preference to Tau puncta.

      We have tried to write the genotypes to make them more reader-friendly.

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 119-121: 35 modifiers from 64 seem like an unusually high hit rate. Are these individual genes or lines? Were all modifiers supported by at least 2 independent RNAi strains targeting non-overlapping sequences? A supplemental table should be included detailing all genes and specific strains tested, with corresponding results.

We agree with the reviewer that 35 modifiers out of 64 genes may seem high. However, since the genes knocked down in this study are chaperones crucial for maintaining proteostasis, an unusually high hit rate is plausible. The information on individual genes and lines is provided in Supplemental Table 1. We have now included an additional Supplemental Table 3, which lists the genes and RNAi lines used in Figure 1 and details their target-sequence information. The table also specifies the number of independent RNAi strains used and the corresponding results.

      (2) Figure 1: The authors quantify the areas of ommatidial fusion and necrosis as degeneration, but it is difficult to appreciate the aberrations in the photos provided. Was any consideration given to also quantifying eye size?

We have processed the images to enhance their contrast and make the aberrations clearer. The percentage of degenerated eye area (Figure 1M) was normalized to the total eye area. The method for quantifying the degenerated area is explained in the Materials and Methods section.

      (3) Figure 1: a) Only enhancers of rough eyes are shown but no controls are included to evaluate whether knockdown of these genes causes eye toxicity in the absence of Tau. These are important missing controls. All putative Tau enhancers, including Pdn5/6, need to be tested with GMR-GAL4 independently of Tau to determine whether they cause a rough eye. In a previous publication from some of the same investigators (Raut et al 2017), knockdown of Pfdn using eyGAL4 was shown to induce severe eye morphology defects - this raises questions about the results shown here. 

We agree that assessing the effects of HSP knockdown independent of Tau is essential to confirm modifier specificity. We have now performed these knockdowns, and the data are reported in Supplemental Table 1. For the RNAi lines represented in Figure 1, which enhanced the Tau-induced degeneration/eye developmental defect, no detectable eye defects were observed when knocked down with GMR-Gal4 at 25°C, except for one of the RNAi lines against Pfdn6 (GD34204), suggesting that the enhancement is specific to the Tau background.

The use of the more eye-specific GMR-Gal4 driver at 25°C here, versus the more broadly expressed ey-Gal4 at 29°C in prior work (Raut et al. 2017), likely explains the differences in the eye morphological defects.

      (b) Besides RNAi, do the classical Pdn5 deletion alleles included in this work also enhance the tau rough eye when heterozygous? Please also consider moving the Pfdn5/6 overexpression studies to evaluate possible suppression of the Tau rough eye to Figure 1, as it would enhance the interpretation of these data (but see also below).

GMR-Gal4-driven expression of hTau<sup>V337M</sup> or hTau<sup>WT</sup> in a Pfdn5 heterozygous background does not enhance the rough eye phenotype.

      (4) For genes of special interest, such as Pdn5, and other genes mentioned in the results, the main figure, or discussion, it is also important to perform quantitative PCR to confirm that the RNAi lines used actually knock down mRNA expression and by how much. These studies will establish specificity.

We agree that quantitative PCR (qPCR) is essential for validating knockdown efficiency. We have now included qPCR data, especially for key modifiers, confirming effective knockdown (Figure S2).

      (5) Lines 235-238: how do you conclude whether the tau phenotype is "enhanced" when Pfdn5 causes a similar phenotype on its own? Could the combination simply be additive? Did overexpression of Pdn5 suppress the UAS-hTau NMJ bouton phenotype (see below)? 

Although Pfdn5 mutation and hTau expression individually increase satellite boutons, their combination leads to a significantly more severe phenotype with additional defects, such as significantly decreased bouton size and increased bouton number, indicating an enhancing rather than purely additive interaction (Figure 4 and Figure S6C). Moreover, we now show that overexpression of Pfdn5 significantly suppressed the hTau<sup>V337M</sup>-induced NMJ phenotypes. These new data have been incorporated as Figure S11F-L in the revised manuscript.

      Alternatively, did the authors consider reducing fly tau in the Pdn5 mutant background?

In additional new experiments, we observed that double mutants for Drosophila Tau (dTau) and Pfdn5 also exhibit severe NMJ defects, suggesting a genetic interaction between dTau and Pfdn5. These data are shown below for the reviewer.

      Author response image 1.

A double mutant combination of dTau and Pfdn5 aggravates the synaptic defects at the Drosophila NMJ. (A-D') Confocal images of NMJ synapses at muscle 4 of the A2 hemisegment showing synaptic morphology in (A-A') control, (B-B') ΔPfdn5<sup>15/40</sup>, (C-C') dTauKO/dTauKO (Drosophila Tau mutant), (D-D') dTauKO/dTauKO; ΔPfdn5<sup>15/40</sup>, double immunolabeled for HRP (green) and CSP (magenta). The scale bar in D for (A-D') represents 10 µm.

      (6) It may be important to further extend the investigation to the actin cytoskeleton. It is noted that Pfdn5 also stabilizes actin. Importantly, tau-mediated neurodegeneration in Drosophila also disrupts the actin cytoskeleton, and many other regulators of actin modify tau phenotypes.

We appreciate the suggestion to examine the actin cytoskeleton. While prior studies indicate that Pfdn5 might regulate the actin cytoskeleton and that Tau<sup>V337M</sup> hyperstabilizes the actin cytoskeleton, we did not observe altered actin levels in Pfdn5 mutants (Figure 2G). However, actin dynamics may represent an additional mechanism through which Pfdn5 influences Tauopathy. Future work will address potential actin-related mechanisms in Tauopathy.

      (7) Figure 2: in the provided images, it is difficult to appreciate the futsch loops. Please include an image with increased magnification. It appears that fly strains harboring a genomic rescue BAC construct are available for Pfdn-this would be a complementary reagent to test besides Pfdn overexpression.

      We have updated Figure 2 to include high magnification NMJ images as insets, clearly showing the Futsch loops. While we have not yet tested a genomic rescue BAC construct for Pfdn5, we plan to use the fly line harboring this construct in future work.

      (8) Figure 3: Some of the data is not adequately explained. The use of Ran as a loading control seems rather unusual. What is the justification? Pfdn appears to only partially co-localize with a-tubulin in the axon; can the authors discuss or explain this? Further, in Pfdn5 mutants, there appears to be a loss of a-tubulin staining (3b'); this should also be discussed.

      We appreciate the reviewer's concern regarding the choice of loading control for our Western blot analysis. Importantly, since Tubulin levels and related pathways were the focus of our analysis, traditional loading controls such as α- or β-tubulin or actin were deemed unsuitable due to potential co-regulation. Ran, a nuclear GTPase involved in nucleocytoplasmic transport, is not known to be transcriptionally or post-translationally regulated by Tubulin-associated signaling pathways. To ensure its reliability as a loading control, we confirmed by densitometric analysis that Ran expression showed minimal variability across all samples. Hence, we used Ran for accurate normalization in the Western blot data represented in this manuscript. We have also used GAPDH as a loading control and found no difference with respect to Ran as a loading control across samples.
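The densitometric normalization described above can be sketched as follows; the lane intensities are hypothetical illustration values, not data from the manuscript:

```python
def normalize_band(target_intensity, loading_intensity):
    """Express a target band relative to the loading-control band (e.g. Ran
    or GAPDH) from the same lane, so lane-to-lane loading differences cancel."""
    return target_intensity / loading_intensity

# Hypothetical densitometry readings (arbitrary units)
control_lane = normalize_band(target_intensity=1200, loading_intensity=1000)  # 1.2
mutant_lane = normalize_band(target_intensity=600, loading_intensity=1000)    # 0.6
fold_change = mutant_lane / control_lane  # 0.5, i.e. a two-fold reduction
```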

      We appreciate the reviewer's comment regarding the interpretation of our Pearson's correlation coefficient (PCC) results. While the mean colocalization value of 0.6 represents a moderate positive correlation (MUKAKA 2012), which may not reach the conventional threshold for "high positive" colocalization (usually considered 0.7-0.9), it nonetheless indicates substantial spatial overlap between the proteins of interest. Importantly, colocalization analysis provides supportive but indirect evidence for molecular proximity.  To further validate the interaction, we performed a microtubule binding assay, which directly demonstrates the binding of Pfdn5 to stabilized microtubules.
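For illustration, Pearson's correlation coefficient between two channel intensity profiles can be computed as below; the intensity values are hypothetical, not measurements from the study:

```python
import math

def pearson_colocalization(ch1, ch2):
    """Pearson's correlation coefficient between two equal-length lists of
    pixel intensities (one per channel). Values near 1 indicate strong spatial
    co-variation; ~0.6, as reported above, is a moderate positive correlation."""
    n = len(ch1)
    m1, m2 = sum(ch1) / n, sum(ch2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(ch1, ch2))
    sd = math.sqrt(sum((a - m1) ** 2 for a in ch1) *
                   sum((b - m2) ** 2 for b in ch2))
    return cov / sd

# Toy intensity profiles along an axon (hypothetical values)
pfdn5 = [10, 30, 55, 70, 40, 20]
tubulin = [12, 25, 60, 65, 50, 15]
r = pearson_colocalization(pfdn5, tubulin)  # high positive for these toy data
```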

      In accordance with the western blot analysis shown in Figure 2G-I, the levels of Tubulin are reduced in the Pfdn5 mutants (Figure 3B''). We have incorporated and discussed this in the revised manuscript.

      (9) Figure 4: Overexpression of Pfdn appears to rescue the supernumerary satellite bouton numbers induced by human Tau; however, interpretation of this experiment is somewhat complicated as it is performed in Pfdn mutant genetic background. Can overexpression of Pfdn on its own rescue the Tau bouton defect in an otherwise wildtype background?

We have now coexpressed Pfdn5 and hTau<sup>V337M</sup> in an otherwise wild-type background. As shown in Figure S11F-L, Pfdn5 overexpression suppresses the Tau-induced bouton defects. We have incorporated these data in the Results section to support the role of Pfdn5 as a modifier of Tau toxicity.

      (10) Lines 256-263 / Figure 5: (a) What exactly are these tau-positive structures (punctae) being stained in larval brains in Fig 5C-E? Most prior work on tau aggregation using Drosophila models has been done in the adult brain, and human wildtype or mutant Tau is not known to form significant numbers of aggregates in neurons (although aggregates have been described following glia tau expression). 

      Therefore, the results need to be further clarified. Besides the provided schematic, a zoomed-out image showing the whole larval brain is needed here for orientation. Have these aggregates been previously characterized in the literature? 

      We agree with the reviewer that the expression of the wildtype or mutant form of human Tau in Drosophila is not known to form aggregates in the larval brain, in contrast to the adult brain (JACKSON et al. 2002; OKENVE-RAMOS et al. 2024). Consistent with previous reports, we also observed that Tau expression on its own does not form aggregates in the Drosophila larval brain.

However, in the absence of Pfdn5, microtubule disruption is severe, leading to reduced Tau-microtubule binding and the formation of globular/round or flame-shaped, tangle-like aggregates in the larval brain. Previous studies have reported that 1,6-hexanediol can dissolve the Tau aggregate seeds formed by Tau droplets, but cannot dissolve stable Tau aggregates (WEGMANN et al. 2018). We observed that 5% 1,6-hexanediol failed to dissolve these Tau puncta, demonstrating the formation of stable aggregates in the absence of Pfdn5. Additionally, we performed a Tau solubility assay and show that in the absence of Pfdn5, a significant amount of Tau partitions into the pellet fraction, which could not be detected by the phospho-specific AT8 Tau antibody (targeting pSer202/pThr205) but was detected by the total hTau antibody (D5D8N) on western blots (Figure S8). These data further reinforce our conclusion that Pfdn5 prevents the transition of hTau from a soluble and/or microtubule-associated state to an aggregated, insoluble, and pathogenic state. These new data have been incorporated into the revised manuscript.

(b) Can additional markers (nuclei, cell membrane, etc.) be used to highlight whether the tau-positive structures are present in the cell body or at synapses?

We performed co-staining of Tau and Elav to assess the localization of aggregated Tau. We found that in the presence of Pfdn5, Tau is predominantly cytoplasmic and localized to the cell body and axons. In the absence of Pfdn5, Tau forms aggregates but is still localized to the cell body or axons. However, some of the aggregates are very large, and their subcellular localization could not be determined (Figure S8M-N'). These might represent brain regions of possible nuclear breakdown and cell death (JACKSON et al. 2002).

      (c) It would also be helpful to perform western blots from larval (and adult) brains examining tau protein levels, phospho-tau species, possible higher-molecular weight oligomeric forms, and insoluble vs. soluble species. These studies would be especially important to help interpret the potential mechanisms of observed interactions.

      Western blot analysis revealed that overexpression of Pfdn5 does not alter total Tau levels (Figure 6O). In Pfdn5 mutants, however, hTau<sup>V337M</sup> levels were reduced in the supernatant fraction and increased in the pellet fraction, indicating a shift from soluble monomeric Tau to aggregated Tau.

      (d) Does overexpression of Pdn5 (UAS-Pdn5) suppress the formation of tau aggregates? I would therefore recommend that additional experiments be performed looking at adult flies (perhaps in Pfdn5 heterozygotes or using RNAi due to the larval lethality of Pdn5 null animals).

Overexpression of Pfdn5 significantly reduced the Tau aggregates (Elav-Gal4/UAS-Tau<sup>V337M</sup>; UAS-Pfdn5; ΔPfdn5<sup>15/40</sup>) observed in Pfdn5 mutants (Figure 5E). Coexpression of Pfdn5 and hTau<sup>V337M</sup> suppresses the Tau aggregates/puncta in 30-day-old adult brains. Since heterozygous ΔPfdn5<sup>15</sup>/+ did not show a reduction in Pfdn5 levels, we did not test suppression of Tau aggregates in ΔPfdn5<sup>15</sup>/+; Elav>UAS-Pfdn5, UAS-Tau<sup>V337M</sup>.

      (11) Figure 6, panels A-N: The GMR>Tau rough eye is not a "neurodegenerative" but rather a predominantly developmental phenotype. It results from aberrant retinal developmental patterning and the subsequent secretion/formation of the overlying eye cuticle (lenslets). I am confused by the data shown suggesting a "shrinking eye size" and increasing roughened surface over time (a GMR>tau eye similar to that shown in panel B cannot change to appear like the one in panel H with aging). The rough eye can be quite variable among a population of animals, but it is usually fixed at the time the adult fly ecloses from the pupal case, and quite stable over time in an individual animal. Therefore, any suppression of the Tau rough eye seen at 30 days should be appreciable as soon as the animals eclose. These results need to be clarified. If indeed there is robust suppression of Tau rough eye, it may be more intuitive and clearer to include these data with Figure 1, when first showing the loss-of-function enhancement of the Tau rough eye. Also, why is Pfdn6 included in these experiments but not in the studies shown in Figures 2-5?

We thank the reviewer for their careful and knowledgeable assessment of the GMR>Tau rough eye model, and appreciate the clarification that the rough eye phenotype could be "developmental" rather than "neurodegenerative." Our initial observations regarding "shrinking eye size" and "increased surface roughness" clearly show an age-related progression of structural change. Such progression has been observed and reported by others (IIJIMA-ANDO et al. 2012; PASSARELLA AND GOEDERT 2018). We observed an age-dependent increase in the number of fused ommatidia in GMR-Gal4>Tau flies, which was rescued by Pfdn5 or Pfdn6 expression. We noted that adult-specific induction of hTau<sup>V337M</sup> using the Gal80<sup>ts</sup> and GMR-GeneSwitch (GMR-GS) systems was not sufficient to induce a significant eye phenotype; thus, early expression of Tau in the developing eye imaginal disc appears to be required for the adult progressive phenotype that we observe. We therefore feel it is inadequate to refer to this adult progressive phenotype as "developmental," though we acknowledge it is arguable whether it can be termed "degenerative."

      To address neurodegeneration more directly, we focused on 30-day-old adult fly brains and demonstrated that Pfdn5 overexpression suppresses age-dependent Tau-induced neurodegeneration in the central nervous system (Figure 6H-N and Figure S12). This supports our central conclusion regarding the neuroprotective role of Pfdn5 in age-associated Tau pathology. Since we found an enhancement in the Tau-induced synaptic and eye phenotypes by Pfdn6 knockdown, we also generated CRISPR/Cas9-mediated loss-of-function mutants for Pfdn6. However, loss of Pfdn6 resulted in embryonic/early first instar lethality, which precluded its detailed analysis at the larval stages.

      (12) Figure 6, panels O-T: the elav>tau image appears to show a different frontal section plane compared to the other panels. It is advisable to show images at a similar level in all panels since vacuolar pathology can vary by region. It is also useful to be able to see the entire brain at a lower power, but the higher power inset view is obscuring these images. I would recommend creating separate panels rather than showing them as insets.

      In the revised figure, we now display the low- and high-magnification images as separate, clearly labeled panels instead of using insets. This improves visibility of the brain morphology while providing detailed views of the vacuolar pathology (Figure 6H-L).

      (13) Figure 6/7: For the experiments in which Pfdn5/6 is overexpressed and possibly suppresses tau phenotypes (brain vacuoles and memory), it is important to use controls that normalize the number of UAS binding sites, since increased UAS sites may dilute GAL4 and reduced Tau expression levels/toxicity. Therefore, it would be advisable to compare with Elav>Tau flies that also include a chromosome with an empty UAS site or other transgenes, such as UAS-GFP or UAS-lacZ.

We thank the reviewer for the suggestion. We have now incorporated proper controls in the brain vacuolization, mushroom body, and ommatidial fusion rescue experiments. We have also independently verified whether Gal4 dilution has any effect on the Tau phenotypes (Figure 6H-L, Figure 7, and Figure S11A-B).

      (14) Lines 311-312: the authors say vacuolization occurs in human neurodegenerative disease, which is not really true to my knowledge and definitely not stated in the citation they use. Please re-phrase.

      Now we have made the appropriate changes in the revised manuscript.

      (15) Figure 7: The authors claim that Pfdn5/6 expression does not impact memory behavior, but there in fact appears to be a decrease in preference index (panel D vs panel B). Does this result complicate the interpretation of the potential interaction with Tau (panel F). Are data from wildtype control flies available?

In our memory assay, a decrease in the performance index (PI) of trained flies compared to naïve flies indicates memory formation (normal memory in control flies, Figure 7B). In contrast, the lack of a significant difference in PI indicates a memory defect (Figure 7C: hTau<sup>V337M</sup>-overexpressing flies). A "decrease in preference index (panel D vs panel B)" is not a sign of a memory defect; if anything, it may be interpreted as better memory. Hence, neuronal overexpression of Pfdn5 (Figure 7D) or Pfdn6 (Figure 7E) in wild-type neurons does not cause memory deficits. In addition, coexpression of Pfdn5/6 and hTau<sup>V337M</sup> successfully rescues the Tau-induced memory defect (significant drop in PI compared to the PI of naïve flies in Figure 7F-G). Moreover, the almost complete rescue of the Tau-induced mushroom body defect upon Pfdn5 or Pfdn6 expression further establishes the interaction between Pfdn5/6 and Tau. These data have been incorporated into the revised manuscript.

The memory assay itself, with extensive data on wild-type flies and various other genotypes, will shortly be submitted for publication in another manuscript (Majumder et al., manuscript in preparation). However, we can confirm for the reviewer that wild-type flies, trained and assayed by the protocol described, show a significant decrease in performance index compared to naïve flies, indicative of strong learning and memory performance, very similar to the control genotype data shown in Figure 7B.

      Additional minor considerations

      (16) Lines 50-52: there are many therapeutic interventions for treating tauopathies, but not curative or particularly effective ones.

      Now we have made the appropriate changes in the revised manuscript.

      (17) Lines 87-106 seem like a duplication of the abstract. Consider deleting or condensing.

      We have made the appropriate changes in the revised manuscript.

      (18) Where is pfdn5 expressed? Development v. adult? Neuron v. glia? Conservation?

Prefoldin5 is expressed throughout development but is strongly localized to the larval trachea and neuronal axons. Drosophila Pfdn5 shows 35% overall identity with human PFDN5.

(19) Line 187: is pfdn5 truly "novel"?

The microtubule-binding and -stabilizing role of Pfdn5 is a new finding that has not been predicted or described before. Hence, it is a novel neuronal microtubule-associated protein.

      (20) Figure 5, panel F, genotype labels on the x-axis are confusing; consider simplifying to Control, DPfdn, and Rescue.

      We have made appropriate changes in the figure for better readability.

(21) Figures 5/8: it might be preferable to use consistent colors for Tau/HRP: Tau is labeled green in Figure 5 and then purple in Figure 8.

      We have made these changes where possible. 

      (22) Lines 311-312: Vacuolar neuropathology is NOT typically observed in human Tauopathy.

      We thank the reviewer for pointing this out. We have made the appropriate changes in the revised manuscript.

      (23) Lines 328-349: The explanation could be made more clear. Naïve flies should not necessarily be called controls. Also, a more detailed explanation of how the preference index is computed would be helpful. Why are some datapoints negative values?

      (a) We have rewritten this paragraph to make the description and explanation clearer. The detailed method and formula to calculate the Preference index have been incorporated in the Materials and Methods section.

      (b) We have replaced the term Control with Naïve. 

(c) Data points with negative values appeared in some of the 'Trained' fly groups, indicating that after CuSO<sub>4</sub> training, some groups showed repulsion toward the otherwise attractive odorant 2,3B. As 2,3B is an attractive odorant, naïve or control flies show attraction toward it compared to air, which is evident from a higher number of flies in the odor arm (O) than in the air arm (A) of the Y-maze; thus, the PI [((O-A)/(O+A))*100] is positive for naïve fly groups. Training led to an association of the attractive odorant with bitter food, decreasing the attraction, and even producing repulsion toward the odorant in a few instances, resulting in a lower fly count in the odor arm than in the air arm. Hence, the PI becomes negative, as (O-A) is negative in such instances. Thus, it is not an anomaly but indicates strong learning.
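The PI arithmetic described above can be sketched as follows; the fly counts are hypothetical illustration values:

```python
def preference_index(odor_count, air_count):
    """Preference Index: PI = ((O - A) / (O + A)) * 100.

    Positive PI indicates attraction to the odorant; negative PI indicates
    repulsion, as expected after aversive CuSO4 training."""
    total = odor_count + air_count
    if total == 0:
        raise ValueError("no flies counted in either arm")
    return (odor_count - air_count) / total * 100

# Naive flies: attracted to the odorant (more flies in the odor arm)
naive_pi = preference_index(odor_count=30, air_count=10)   # 50.0
# Trained flies: attraction reduced, here even reversed
trained_pi = preference_index(odor_count=8, air_count=12)  # -20.0
```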

      (24) Line 403: misspelling "Pdfn"

      We have corrected this.

(25) Lines 423-425: recommend re-phrasing, since tauopathies are human diseases. Mice and other animal models may be susceptible to tau-mediated neuronal dysfunction but not Tauopathy, per se.

      We have made the appropriate changes in the revised manuscript.

      (26) Lines 468-469: "tau neuropathology" rather than "tau associated neuropathies".

      We have made the appropriate changes in the revised manuscript. 

      References

      Askenazi, M., T. Kavanagh, G. Pires, B. Ueberheide, T. Wisniewski et al., 2023 Compilation of reported protein changes in the brain in Alzheimer's disease. Nat Commun 14: 4466.

      Hsieh, Y. C., C. Guo, H. K. Yalamanchili, M. Abreha, R. Al-Ouran et al., 2019 Tau-Mediated Disruption of the Spliceosome Triggers Cryptic RNA Splicing and Neurodegeneration in Alzheimer's Disease. Cell Rep 29: 301-316 e310.

      Iijima-Ando, K., M. Sekiya, A. Maruko-Otake, Y. Ohtake, E. Suzuki et al., 2012 Loss of axonal mitochondria promotes tau-mediated neurodegeneration and Alzheimer's disease-related tau phosphorylation via PAR-1. PLoS Genet 8: e1002918.

      Jackson, G. R., M. Wiedau-Pazos, T. K. Sang, N. Wagle, C. A. Brown et al., 2002 Human wildtype tau interacts with wingless pathway components and produces neurofibrillary pathology in Drosophila. Neuron 34: 509-519.

      Ji, W., K. An, C. Wang and S. Wang, 2022 Bioinformatics analysis of diagnostic biomarkers for Alzheimer's disease in peripheral blood based on sex differences and support vector machine algorithm. Hereditas 159: 38.

      Leitner, D., G. Pires, T. Kavanagh, E. Kanshin, M. Askenazi et al., 2024 Similar brain proteomic signatures in Alzheimer's disease and epilepsy. Acta Neuropathol 147: 27.

      Li, L., Y. Jiang, G. Wu, Y. A. R. Mahaman, D. Ke et al., 2022 Phosphorylation of Truncated Tau Promotes Abnormal Native Tau Pathology and Neurodegeneration. Mol Neurobiol 59: 6183-6199.

      Mershin, A., E. Pavlopoulos, O. Fitch, B. C. Braden, D. V. Nanopoulos et al., 2004 Learning and memory deficits upon TAU accumulation in Drosophila mushroom body neurons. Learn Mem 11: 277-287.

      Mukaka, M. M., 2012 Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71.

      Okenve-Ramos, P., R. Gosling, M. Chojnowska-Monga, K. Gupta, S. Shields et al., 2024 Neuronal ageing is promoted by the decay of the microtubule cytoskeleton. PLoS Biol 22: e3002504.

      Passarella, D., and M. Goedert, 2018 Beta-sheet assembly of Tau and neurodegeneration in Drosophila melanogaster. Neurobiol Aging 72: 98-105.

      Sun, Z., J. S. Kwon, Y. Ren, S. Chen, C. K. Walker et al., 2024 Modeling late-onset Alzheimer's disease neuropathology via direct neuronal reprogramming. Science 385: adl2992.

      Tao, Y., Y. Han, L. Yu, Q. Wang, S. X. Leng et al., 2020 The Predicted Key Molecules, Functions, and Pathways That Bridge Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD). Front Neurol 11: 233.

      Wegmann, S., B. Eftekharzadeh, K. Tepper, K. M. Zoltowska, R. E. Bennett et al., 2018 Tau protein liquid-liquid phase separation can initiate tau aggregation. EMBO J 37.

1. What you make of your student experience is up to you. Remember why you are in college and make sure you spend your time working toward your goals. On your campus you will find resources and people willing to help you. You are in control: use it wisely.

It all depends on you; this is no longer high school.

2. In popular culture, some movies portray college life as a constant party where students drink to excess and waste the…

Most people think it is all big parties like in the movies, but it is the opposite.

1. Overview of the Sympa Project

      Executive Summary

      Sympa is an open-source mailing list manager (GPLv2), developed in Perl for 17 years.

      Initially designed within the Comité Réseau des Universités, it is now hosted by Renater, the French national telecommunications network for technology, teaching, and research.

      While it provides the basic functions of a mailing list manager, Sympa stands out for advanced features that make it a powerful tool for large organizations.

      Its main strengths are its deep integration with existing information systems (databases, LDAP directories, authentication systems), its industrialization mechanisms for creating and managing thousands of lists, and an extremely flexible and expressive scenario-based authorization system.

      Although the project is mature and used by prestigious institutions (90% of French universities, government ministries, and companies such as Orange and Atos), it faces the challenges of a 17-year-old legacy codebase.

      In response, the development team has begun a major code overhaul for the upcoming version 7.0.

      This version will introduce a modernized architecture, unit tests, a new web interface, and a migration to Git to make external contributions easier.

      The long-term vision includes SaaS deployment, multi-channel message delivery (SMS, web), and a plugin system.

      The project is actively calling on the community to contribute to development, documentation, support, and project management, and even offers free hosting for the Perl community to promote the use of free-software tools.

1. Introduction to Sympa

      Definition and Origin

      Name: Sympa is an acronym for "Système de Multi-postage Automatique" (automatic multi-posting system).

      Age: It is a mature piece of software, whose first version was released on April 1, 1997, i.e., 17 years before this presentation.

      Core function: Like Mailman or PHPList, Sympa lets a single e-mail be sent to a server, which takes care of distributing it to a large number of subscribers.

      Hosting and license: The project is hosted by Renater, the French equivalent of a national research and education network. It is free software under the GPLv2 license.

      Perl philosophy: The team proudly stands by its use of Perl, asserting that despite questions about using a "more modern" language, Sympa remains one of the best mailing list managers and "it works".

Statistics and Key Users

      Sympa's user base is largely international, despite its French origin.

Record figures:

      Largest list: 1.6 million subscribers

      Largest number of virtual hosts: 30,000 (on a single server, at the hosting provider Infomaniac)

      Largest number of lists: 32,000 (on a single server)

      Largest number of subscribers: 3 million (on a single server)

Main users:

      Research and education: 90% of universities and research centers in France.

      Public sector: several French government ministries.

      Private companies: Orange, Atos.

      Hosting providers: Infomaniac, Switch (provided by default to their customers).

      Non-governmental organizations: riseup.net, NAA, UNESCO, CGT.

2. Main and Differentiating Features

Beyond sending e-mails, Sympa stands out through advanced capabilities designed for complex environments.

Advanced E-mail Handling

Optimized bulk sending: Sympa can group e-mails by domain and tune the sending rate so as not to be flagged as a spammer while still ensuring fast delivery.

Standards (RFC) support: It supports S/MIME (signing and encryption) and DKIM, and offers protection against DMARC, which proved crucial when Yahoo changed its policy in April and broke many mailing list systems.

Bounce handling: Bounces are handled automatically by Sympa, not by the original sender. VERP (Variable Envelope Return Path) support allows errors to be processed automatically even for forwarded e-mail addresses.

E-mail tracking: Privacy-respecting tracking (no "spy pixels") makes it possible to know what happened to an e-mail for each user, based on the RFCs.

Personalization (mail merging): User data can be merged into an e-mail to send personalized messages.
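As a sketch of what a merged message body can look like in Template Toolkit syntax (the exact variables exposed to templates depend on the Sympa version; `user.email` and `list.name` are assumptions here, not guaranteed names):

```
Hello,

this message was sent to [% user.email %]
because you are subscribed to the [% list.name %] list.
```

At delivery time the placeholders are replaced with each subscriber's own data, so every recipient gets an individually rendered copy.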

Web archives: Sympa provides web archives with fine-grained access control.

Integration with Information Systems

Sympa is designed to integrate natively with the software building blocks of a corporate or university information system.

Mail server (MTA): Sendmail, Postfix, Exim.

Database (RDBMS): MySQL, PostgreSQL, Oracle, SQLite, Sybase ("hopeless").

Web server: Apache, lighttpd, Nginx.

Data sources (repositories): Relational databases, LDAP, flat files, web services (plain text).

Authentication systems: Native (email/password), CAS, Shibboleth, LDAP.

Industrializing List Management

For environments that require creating hundreds or thousands of lists (for example, every year in a university), Sympa offers automation mechanisms.

1. Manual creation: A simple web form where the user fills in the basic information (name, subject, owner).

Default values come from the global configuration and a list template (Template Toolkit, tt2).

2. List families: A mechanism for creating lists in bulk.

It uses a shared tt2 template and an XML file that defines the specific parameters of each list to create.

A single command generates or updates all the lists in the family.

3. Automatic lists: Designed for cases where there is a very large number of potential lists but only a fraction will actually be used.

◦ The list name itself carries the parameters (e.g. prefix-field1_value1-field2_value2).

◦ The list is created dynamically only when a message is first sent to that address.

◦ A web interface was developed to simplify composing these complex addresses.

4. Families of families: It is possible to create families of automatic lists, allowing industrialization on several levels.

Scenario-Based Authorization Mechanism

This is one of Sympa's most original and powerful features.

Principle: The permissions for each action (sending a message, reading the archives, etc.) are defined in files called "scenarios" (e.g. send.scenario).

Structure of a scenario: A sequence of rules evaluated from top to bottom.

Each rule has the form: test(arguments) 'auth_method' -> decision.

Evaluation: Processing stops at the first rule whose test is true.

Tests: Many tests are available (is_subscriber, is_list_owner, etc.).

Custom tests can be added through Perl modules (custom_condition).

Authentication methods: Allow different rules depending on the strength of the authentication (e.g. smime, smtp for the From: field, md5 for a user authenticated on the web).

Decisions: Go beyond a simple "yes/no". Possible decisions include do_it (accept), reject, owner (moderation by the owner), etc.

This system is expressive enough to define very fine-grained access policies.
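To make the rule format concrete, a "subscribers post freely, everyone else is moderated" send scenario could look roughly like the sketch below, built only from the elements described above (the exact spelling of the tests and of variables such as [listname] and [sender] should be checked against the Sympa scenario documentation):

```
is_subscriber([listname],[sender])  smtp,md5,smime -> do_it
is_list_owner([listname],[sender])  smtp,md5,smime -> do_it
true()                              smtp,md5,smime -> owner
```

Because evaluation stops at the first true test, the final catch-all true() rule only fires for senders who are neither subscribers nor owners, routing their messages to the owner for moderation.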

Group Management Capabilities

Sympa can be used as a group manager for third-party applications.

SOAP interface (REST under development): A SOAP interface lets other applications query Sympa's internal data (create a list, subscribe a user, etc.).

Integration: Plugins for applications such as DokuWiki or LimeSurvey can query Sympa to find out which lists (and therefore which groups) a user belongs to.

The third-party application can then grant privileges based on that membership.

Group hierarchy: Sympa can include lists within other lists, thereby creating larger groups.

Extensive Customization

Almost every aspect of Sympa can be customized at several levels (global server, virtual host, individual list) following a cascading principle.

Web interface: Entirely based on Template Toolkit templates.

Service messages: The messages sent to users (welcome message, etc.) can be modified.

List creation templates.

Authorization scenarios.

List parameters: You can define your own parameters in addition to the hundred or so existing ones.

User attributes: Custom fields can be added for users; a future version will be able to synchronize them with LDAP or a database.

3. Architecture and Technical Operation

The processing flow of an e-mail illustrates Sympa's modular architecture:

1. Reception: An e-mail is sent to a list and arrives at the incoming MTA.

2. Initial processing: The MTA hands the e-mail to the sympa.pl daemon, which evaluates permissions, personalizes the message, etc.

3. Storage: If the message is authorized, it is stored in a relational database (RDBMS). Using a database allows safe concurrent access.

4. Distribution: A dedicated daemon, bulk.pl, is exclusively in charge of sending the e-mails.

It reads the messages from the database and opens multiple SMTP sessions for fast delivery that can be parallelized across several servers.

5. Archiving: At the same time, a copy of the message is processed by the archived.pl daemon to be added to the web archives.

4. The Sympa Project: Development and Community

Governance and Team

Core developers: The project has grown from 2 historical developers to an extended team of 5 people, 3 of whom are external to Renater.

Mark (Strasbourg): Perl guru.

Guillaume: Security lead, expert in best practices.

Soji (Tokyo): E-mail and encoding specialist (led the migration to UTF-8).

Etienne: Polyglot developer.

David Verdin (the presenter): "Jack of all trades" (documentation, community management, presentations).

Contributions: The project benefits from many contributions from the Perl community.

Challenges of an Aging Code Base

With 17 years of history, Sympa's code has become very heterogeneous, with varied coding styles from its many contributors.

Installed base: The large user base running Sympa in production demands great caution when modifying the code.

Dependencies: Adding new CPAN modules is complicated, because production users prefer installing from distribution packages, which must therefore exist for those modules.

Lack of tests: Historically the software had no unit tests; testing was done "live" on production servers.

5. The Future of Sympa: Roadmap and Vision

Upcoming Versions (6.2, 7.0, 7.1)

Version 6.2: Almost finished; it is currently undergoing intensive manual testing before a beta release.

Version 7.0: A major overhaul.

New code: A complete rewrite led by Guillaume to modernize the architecture.

Unit tests: Systematic implementation of tests.

New web interface: Simpler, more modern and more ergonomic, developed by a contributor from New Zealand.

Migration to Git: To make forking and external contributions easier (for example on GitHub).

Version 7.1 and beyond:

SaaS (Software as a Service) mode.

Multi-channel delivery: Sending messages via SMS or updating web services.

Plugin system: To allow adding small features without waiting for integration into the core.

Support for internationalized e-mail addresses.

Strategic Directions

A key goal is to preserve Sympa's dual capability:

1. Large installations: Able to run on clusters in SaaS mode.

2. Small installations: Remain simple to install and operate on a small standalone server.

6. Call for Participation and Offers to the Community

Contribution Opportunities

The project is actively looking for help, including non-technical help:

Development: Bug fixes, new features.

Documentation: The documentation is a wiki that any user subscribed to the sympa-users list can edit.

Support: Helping other users on the mailing lists.

Packaging: Building packages for various Linux distributions.

Project management: Sharing experience on managing a fast-growing software project.

Free Hosting Offer

To counter the use of services such as Google Groups by free-software communities, the Sympa team offers a free mailing list hosting service for the worldwide Perl community.

Renater's infrastructure makes it possible to deploy a new virtual host in 30 minutes.

7. Key Questions and Answers

New web interface (v7.0): It will be simpler, with fewer options shown by default so as not to overwhelm new users.

Its ergonomics will be more modern, closer to what is found on social networks.

REST interface: A REST interface already exists for group management (based on OAuth), but the code overhaul aims to make all of Sympa's features accessible through all of its interfaces (command line, SOAP, REST, web and e-mail).

Storage of e-mails and attachments: Archived e-mails are stored permanently.

Anonymization is a complex legal and technical challenge.

Attachments are stored and made accessible through a link.

For lists that opt in, large attachments can be automatically detached and replaced by a link to lighten the e-mails.

Database support: MySQL receives the most attention because it is the one the team uses most.

PostgreSQL and SQLite are also well maintained, and their schemas are updated automatically.

Oracle support is more difficult.

    1. Reviewer #2 (Public review):

      Summary:

      The role of PRC2 in post neural crest induction was not well understood. This work developed an elegant mouse genetic system to conditionally deplete EED upon SOX10 activation. Substantial developmental defects were identified for craniofacial and bone development. The authors also performed extensive single-cell RNA sequencing to analyze differentiation gene expression changes upon conditional EED disruption.

      Strengths:

      (1) Elegant genetic system to ablate EED post neural crest induction.

      (2) Single-cell RNA-seq analysis is extremely suitable for studying the cell type specific gene expression changes in developmental systems.

      Original Weaknesses:

(1) Although this study is well designed and contains state-of-the-art single-cell RNA-seq analysis, it lacks mechanistic depth regarding EED/PRC2-mediated epigenetic repression. This is largely because no epigenomic data were shown.

(2) The mouse model of conditional loss of EZH2 in the neural crest has been previously reported, as the authors point out in the discussion. What is the novelty of disrupting EED in this study? Perhaps a more detailed comparison of the two mouse models would be beneficial.

      (3) The presentation of the single-cell RNA-seq data may need improvement. The complexity of the many cell types blurs the importance of which cell types are affected the most by EED disruption.

(4) While it is easy to identify PRC2/EED target genes using published epigenomic data, it would be nice to tease out the direct versus indirect effects in the gene expression changes (e.g., Fig. 4e).

      Comments on latest version:

      The authors have addressed weaknesses 2 and 3 of my previous comment very well. For weaknesses 1 and 4, the authors have added a main Fig 5 and its associated supplemental materials, which definitely strengthen the mechanistic depth of the story. However, I think the audience would appreciate if the following questions/points could be further addressed regarding the Cut&Tag data (mostly related to main Figure 5):

(1) The authors described that Sox10-Cre would be expressed at E8.75, and in theory, EED-FL would be ablated soon after that. Why would E16.5 exhibit a much smaller loss in H3K27me3 compared to E12.5? Shouldn't a prolonged loss of EED lead to even worse consequences?

(2) The gene expression change at E12.5 upon loss of EED (shown in Fig. 4h) seems to be massive, including many PRC2 target genes. However, the H3K27me3 alteration seems to be mild even at E12.5. Does this imply a PRC2- or H3K27-methylation-independent role of EED? To address this, I suggest the authors reconsider my previously commented weakness #4 regarding the correlation between RNA-seq and CUT&Tag changes. For example, a gene scatter plot with RNA-seq changes on the X-axis versus H3K27me3 level changes on the Y-axis.

(3) The CUT&Tag experiments seem to contain replicates according to the figure legend, but no statistical analysis was presented, including in the new supplemental tables. Also, for Fig. 5c-d, instead of showing the MRRs in individual conditions, I think the audience would really want to know the differential MRRs between Fl/WT and Fl/Fl. In other words, how many genes/MRRs have a statistically lower H3K27me3 level upon EED loss.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

The authors validate the contribution of RAP2A to GB progression. RAP2A participates in asymmetric cell division and in the localization of several cell polarity markers, including Cno and Numb.

      Strengths:

      The use of human data, Drosophila models, and cell culture or neurospheres is a good scenario to validate the hypothesis using complementary systems.

Moreover, the mechanisms that determine GB progression, and in particular glioma stem cell biology, are relevant to our knowledge of glioblastoma and open new possibilities for future clinical strategies.

      Weaknesses:

      While the manuscript presents a well-supported investigation into RAP2A's role in GBM, several methodological aspects require further validation. The major concern is the reliance on a single GB cell line (GB5), which limits the generalizability of the findings. Including multiple GBM lines, particularly primary patient-derived 3D cultures with known stem-like properties, would significantly enhance the study's relevance.

      Additionally, key mechanistic aspects remain underexplored. Further investigation into the conservation of the Rap2l-Cno/aPKC pathway in human cells through rescue experiments or protein interaction assays would be beneficial. Similarly, live imaging or lineage tracing would provide more direct evidence of ACD frequency, complementing the current indirect metrics (odd/even cell clusters, Numb asymmetry).

      Several specific points require attention:

      (1) The specificity of Rap2l RNAi needs further confirmation. Is Rap2l expressed in neuroblasts or intermediate neural progenitors? Can alternative validation methods be employed?

      There are no available antibodies/tools to determine whether Rap2l is expressed in NB lineages, and we have not been able either to develop any. However, to further prove the specificity of the Rap2l phenotype, we have now analyzed two additional and independent RNAi lines of Rap2l along with the original RNAi line analyzed. We have validated the results observed with this line and found a similar phenotype in the two additional RNAi lines now analyzed. These results have been added to the text ("Results section", page 6, lines 142-148) and are shown in Supplementary Figure 3.

      (2) Quantification of phenotypic penetrance and survival rates in Rap2l mutants would help determine the consistency of ACD defects.

      In the experiment previously mentioned (repetition of the original Rap2l RNAi line analysis along with two additional Rap2l RNAi lines) we have substantially increased the number of samples analyzed (both the number of NB lineages and the number of different brains analyzed). With that, we have been able to determine that the penetrance of the phenotype was 100% or almost 100% in the 3 different RNAi lines analyzed (n>14 different brains/larvae analyzed in all cases). Details are shown in the text (page 6, lines 142-148), in Supplementary Figure 3 and in the corresponding figure legend.

      (3) The observations on neurosphere size and Ki-67 expression require normalization (e.g., Ki-67+ cells per total cell number or per neurosphere size). Additionally, apoptosis should be assessed using Annexin V or TUNEL assays.

The experiment with Ki-67+ cells was done considering the % of Ki-67+ cells with respect to the total cell number in each neurosphere. This is indicated in the "Materials and methods" section: "The number of Ki67+ cells with respect to the total number of nuclei labelled with DAPI within a given neurosphere were counted to calculate the Proliferative Index (PI), which was expressed as the % of Ki67+ cells over total DAPI+ cells"

Perhaps this was not clearly shown in the graph of Figure 5A. We have now changed it to indicate "% of Ki67+ cells/neurosphere" on the Y-axis.

      Unfortunately, we currently cannot carry out neurosphere cultures to address the apoptosis experiments. 

      (4) The discrepancy in Figures 6A and 6B requires further discussion.

We agree that those pictures can lead to confusion. In the analysis of the "% of neurospheres with even or odd number of cells", we included the neurospheres with 2 cells both in the control and in the experimental condition (RAP2A). The number of these "2-cell neurospheres" was very similar in both conditions (27.7% and 27% of the total neurospheres analyzed in each condition), and they can be the result of a previous symmetric or asymmetric division; we cannot distinguish that (only when they are stained with Numb, for example, as shown in Figure 6B). As a consequence, in both the control and the experimental condition, these 2-cell neurospheres included in the "even" group (Figure 6A) can represent symmetric or asymmetric divisions. However, the experiment shown in Figure 6B shows that among these 2-cell neurospheres there are more cases of asymmetric division in the experimental condition (RAP2A) than in the control.

Nevertheless, to make the conclusions more accurate and clearer, we have reanalyzed the data taking into account only the neurospheres with 3-5-7 (odd) or 4-6-8 (even) cells. Likewise, we have now added further clarification in the methods regarding the way the experiment was analyzed.

      (5) Live imaging of ACD events would provide more direct evidence.

      We agree that live imaging would provide further evidence. Unfortunately, we currently cannot carry out neurosphere cultures to approach those experiments.

      (6) Clarification of terminology and statistical markers (e.g., p-values) in Figure 1A would improve clarity.

      We thank the reviewer for pointing out this issue. To improve clarity, we have now included a Supplementary Figure (Fig. S1) with the statistical parameters used. Additionally, we have performed a hierarchical clustering of genes showing significant or not-significant changes in their expression levels.

      (7) Given the group's expertise, an alternative to mouse xenografts could be a Drosophila genetic model of glioblastoma, which would provide an in vivo validation system aligned with their research approach.

The established Drosophila genetic model of glioblastoma is an excellent model system to gain deep insight into different aspects of human GBM. However, the main aim of our study was to determine whether an imbalance in the mode of stem cell division, favoring symmetric divisions, could contribute to the expansion of the tumor. We chose neurospheres derived from human GBM cell lines because the existence of cancer stem cells (glioblastoma or glioma stem cells, GSCs) has been demonstrated in human GBM. And these GSCs, like all stem cells, can divide symmetrically or asymmetrically. In the case of the Drosophila model of GBM, the neoplastic transformation observed after overexpressing the EGF receptor and PI3K signaling is due to the activation of downstream genes that promote cell cycle progression and inhibit cell cycle exit. It has also been suggested that the neoplastic cells in this model come from committed glial progenitors, not from stem-like cells.

All in all, it would be difficult to pinpoint the causes of any potential effects of manipulating Rap2l levels in this Drosophila GBM system. We do not rule out this analysis in the future (we have the full setup in the lab). However, it would probably require a new project to comprehensively analyze and understand the mechanism by which Rap2l (and other ACD regulators) might be acting in this context, if it has any effect at all.

      However, as we mentioned in the Discussion, we agree that the results we have obtained in this study must be definitely validated in vivo in the future using xenografts with 3D-primary patient-derived cell lines.

      Reviewer #2 (Public review):

      This study investigates the role of RAP2A in regulating asymmetric cell division (ACD) in glioblastoma stem cells (GSCs), bridging insights from Drosophila ACD mechanisms to human tumor biology. They focus on RAP2A, a human homolog of Drosophila Rap2l, as a novel ACD regulator in GBM is innovative, given its underexplored role in cancer stem cells (CSCs). The hypothesis that ACD imbalance (favoring symmetric divisions) drives GSC expansion and tumor progression introduces a fresh perspective on differentiation therapy. However, the dual role of ACD in tumor heterogeneity (potentially aiding therapy resistance) requires deeper discussion to clarify the study's unique contributions against existing controversies. Some limitations and questions need to be addressed.

      (1) Validation of RAP2A's prognostic relevance using TCGA and Gravendeel cohorts strengthens clinical relevance. However, differential expression analysis across GBM subtypes (e.g., MES, DNA-methylation subtypes ) should be included to confirm specificity.

We have now included a Supplementary figure (Supplementary Figure 2), in which we show the analysis of RAP2A levels in the different GBM subtypes (proneural, mesenchymal and classical) and their prognostic relevance (i.e. the proneural subtype, which presents RAP2A levels significantly higher than the others, is also the subtype that shows the best prognosis).

      (2) Rap2l knockdown-induced ACD defects (e.g., mislocalization of Cno/Numb) are well-designed. However, phenotypic penetrance and survival rates of Rap2l mutants should be quantified to confirm consistency.

      We have now analyzed two additional and independent RNAi lines of Rap2l along with the original RNAi line. We have validated the results observed with this line and found a similar phenotype in the two additional RNAi lines now analyzed. To determine the phenotypic penetrance, we have substantially increased the number of samples analyzed (both the number of NB lineages and the number of different brains analyzed). With that, we have been able to determine that the penetrance of the phenotype was 100% or almost 100% in the 3 different Rap2l RNAi lines analyzed (n>14 different brains/larvae analyzed in all cases). These results have been added to the text ("Results section", page 6, lines 142-148) and are shown in Supplementary Figure 3 and in the corresponding figure legend. 

      (3) While GB5 cells were effectively used, justification for selecting this line (e.g., representativeness of GBM heterogeneity) is needed. Experiments in additional GBM lines (especially the addition of 3D primary patient-derived cell lines with known stem cell phenotype) would enhance generalizability.

We tried to explain this point in the paper (Results). As we mentioned, we tested six different GBM cell lines, finding similar mRNA levels of RAP2A in all of them, and significantly lower levels than in control Astros (Fig. 3A). We decided to focus on the GBM cell line called GB5 for further analyses, as it grew well (better than the others) in neurosphere cell culture conditions. We agree that repeating at least some of the analyses performed with the GB5 line in other lines (ideally in primary patient-derived cell lines, as the reviewer mentions) would reinforce the results. Unfortunately, we cannot currently perform experiments in cell lines in the lab. We will consider all of this for future experiments.

      (4) Indirect metrics (odd/even cell clusters, NUMB asymmetry) are suggestive but insufficient. Live imaging or lineage tracing would directly validate ACD frequency.

      We agree that live imaging would provide further evidence. Unfortunately, we cannot approach those experiments in the lab currently.

      (5) The initial microarray (n=7 GBM patients) is underpowered. While TCGA data mitigate this, the limitations of small cohorts should be explicitly addressed and need to be discussed.

We completely agree with this comment. We had the microarray available, so we used it as a first approach, out of curiosity to know whether (and how) the expression levels of those human homologs of Drosophila ACD regulators were affected in this small sample, as a starting point for the study. We were conscious of the limitations of this analysis, which is why we followed it up with the dataset analyses on a bigger scale. We already mentioned the limitations of the array in the Discussion:

      "The microarray we interrogated with GBM patient samples had some limitations. For example, not all the human genes homologs of the Drosophila ACD regulators were present (i.e. the human homologs of the determinant Numb). Likewise, we only tested seven different GBM patient samples. Nevertheless, the output from this analysis was enough to determine that most of the human genes tested in the array presented altered levels of expression"[....] In silico analyses, taking advantage of the existence of established datasets, such as the TCGA, can help to more robustly assess, in a bigger sample size, the relevance of those human genes expression levels in GBM progression, as we observed for the gene RAP2A."

      (6) Conclusions rely heavily on neurosphere models. Xenograft experiments or patient-derived orthotopic models are critical to support translational relevance, and such basic research work needs to be included in journals.

      We completely agree. As we already mentioned in the Discussion, the results we have obtained in this study must be definitely validated in vivo in the future using xenografts with 3D-primary patient-derived cell lines.

      (7) How does RAP2A regulate NUMB asymmetry? Is the Drosophila Rap2l-Cno/aPKC pathway conserved? Rescue experiments (e.g., Cno/aPKC knockdown with RAP2A overexpression) or interaction assays (e.g., Co-IP) are needed to establish molecular mechanisms.

      The mechanism by which RAP2A is regulating ACD is beyond the scope of this paper. We do not even know how Rap2l is acting in Drosophila to regulate ACD. In past years, we did analyze the function of another Drosophila small GTPase, Rap1 (homolog to human RAP1A) in ACD, and we determined the mechanism by which Rap1 was regulating ACD (including the localization of Numb): interacting physically with Cno and other small GTPases, such as Ral proteins, and in a complex with additional ACD regulators of the "apical complex" (aPKC and Par-6). Rap2l could be also interacting physically with the "Ras-association" domain of Cno (domain that binds small GTPases, such as Ras and Rap1). We have added some speculations regarding this subject in the Discussion:

      "It would be of great interest in the future to determine the specific mechanism by which Rap2l/RAP2A is regulating this process. One possibility is that, as it occurs in the case of the Drosophila ACD regulator Rap1, Rap2l/RAP2A is physically interacting or in a complex with other relevant ACD modulators."

      (8) Reduced stemness markers (CD133/SOX2/NESTIN) and proliferation (Ki-67) align with increased ACD. However, alternative explanations (e.g., differentiation or apoptosis) must be ruled out via GFAP/Tuj1 staining or Annexin V assays.

We agree with these possibilities. Regarding differentiation, an increase in differentiation markers would in fact be a logical consequence of an increase in ACD/reduced stemness markers. Unfortunately, we cannot currently approach those experiments in the lab.

      (9) The link between low RAP2A and poor prognosis should be validated in multivariate analyses to exclude confounding factors (e.g., age, treatment history).

      We have now added this information in the "Results section" (page 5, lines 114-123).

      (10) The broader ACD regulatory network in GBM (e.g., roles of other homologs like NUMB) and potential synergies/independence from known suppressors (e.g., TRIM3) warrant exploration.

      The present study was designed as a "proof-of-concept" study to start analyzing the hypothesis that the expression levels of human homologs of known Drosophila ACD regulators might be relevant in human cancers that contain cancer stem cells, if those human homologs were also involved in modulating the mode of (cancer) stem cell division. 

To extend the findings of this work to the whole ACD regulatory network would be the logical and ideal path to follow in the future.

      We already mentioned this point in the Discussion:

      "....it would be interesting to analyze in the future the potential consequences that altered levels of expression of the other human homologs in the array can have in the behavior of the GSCs. In silico analyses, taking advantage of the existence of established datasets, such as the TCGA, can help to more robustly assess, in a bigger sample size, the relevance of those human genes expression levels in GBM progression, as we observed for the gene RAP2A."

      (11) The figures should be improved. Statistical significance markers (e.g., p-values) should be added to Figure 1A; timepoints/culture conditions should be clarified for Figure 6A.

Regarding the statistical significance markers, we have now included a Supplementary Figure (Fig. S1) with the statistical parameters used. Additionally, we have performed a hierarchical clustering of genes showing significant or non-significant changes in their expression levels.

      Regarding the experimental conditions corresponding to Figure 6A, those have now been added in more detail in "Materials and Methods" ("Pair assay and Numb segregation analysis" paragraph).

      (12) Redundant Drosophila background in the Discussion should be condensed; terminology should be unified (e.g., "neurosphere" vs. "cell cluster").

As we did not say much about Drosophila ACD and NBs in the "Introduction", we needed to explain at least some very basic concepts and information about this in the "Discussion", especially for "non-drosophilists". We have revised the Discussion to keep this information to the minimum necessary.

      We have also reviewed the terminology that the Reviewer mentions and have unified it.

      Reviewer #1 (Recommendations for the authors):

      To improve the manuscript's impact and quality, I would recommend:

      (1) Expand Cell Line Validation: Include additional GBM cell lines, particularly primary patient-derived 3D cultures, to increase the robustness of the findings.

      (2) Mechanistic Exploration: Further examine the conservation of the Rap2lCno/aPKC pathway in human cells using rescue experiments or protein interaction assays.

      (3) Direct Evidence of ACD: Implement live imaging or lineage tracing approaches to strengthen conclusions on ACD frequency.

      (4) RNAi Specificity Validation: Clarify Rap2l RNAi specificity and its expression in neuroblasts or intermediate neural progenitors.

      (5) Quantitative Analysis: Improve quantification of neurosphere size, Ki-67 expression, and apoptosis to normalize findings.

      (6) Figure Clarifications: Address inconsistencies in Figures 6A and 6B and refine statistical markers in Figure 1A.

      (7) Alternative In Vivo Model: Consider leveraging a Drosophila glioblastoma model as a complementary in vivo validation approach.

      Addressing these points will significantly enhance the manuscript's translational relevance and overall contribution to the field.

We have been able to address points 4, 5 and 6. The others are either beyond the scope of this work (2) or not currently feasible in our lab (1, 3 and 7). However, we will complete these recommendations in future investigations.

      Reviewer #2 (Recommendations for the authors):

Major revision required to address methodological and mechanistic gaps.

      (1) Enhance Clinical Relevance

      Validate RAP2A's prognostic significance across multiple GBM subtypes (e.g., MES, DNA-methylation subtypes) using datasets like TCGA and Gravendeel to confirm specificity.

      Perform multivariate survival analyses to rule out confounding factors (e.g., patient age, treatment history).

      (2) Strengthen Mechanistic Insights

      Investigate whether the Rap2l-Cno/aPKC pathway is conserved in human GBM through rescue experiments (e.g., RAP2A overexpression with Cno/aPKC knockdown) or interaction assays (e.g., Co-IP).

      Use live-cell imaging or lineage tracing to directly validate ACD frequency instead of relying on indirect metrics (odd/even cell clusters, NUMB asymmetry).

      (3) Improve Model Systems & Experimental Design

      Justify the selection of GB5 cells and include additional GBM cell lines, particularly 3D primary patient-derived cell models, to enhance generalizability.

      It is essential to perform xenograft or orthotopic patient-derived models to support translational relevance.

      (5) Address Alternative Interpretations

      Rule out other potential effects of RAP2A knockdown (e.g., differentiation or apoptosis) using GFAP/Tuj1 staining or Annexin V assays.

      Explore the broader ACD regulatory network in GBM, including interactions with NUMB and TRIM3, to contextualize findings within known tumor-suppressive pathways.

      (6) Improve Figures & Clarity

      Add statistical significance markers (e.g., p-values) in Figure 1A and clarify timepoints/culture conditions for Figure 6A.

      Condense redundant Drosophila background in the discussion and ensure consistent terminology (e.g., "neurosphere" vs. "cell cluster").

We have been able to address point 1, part of point 3, and point 6. The others are either beyond the scope of this work or not currently feasible in our lab. However, we are very interested in completing these recommendations and will approach such experiments in future investigations.

    1. Digital Communication for Associations: Strategies and Tools

      Summary

      This summary document sets out the essential strategies and tools that enable associations to communicate effectively and strengthen ties with their members through digital technology.

      Digital communication for associations rests on an upfront strategic approach: defining clear objectives, understanding precisely the digital habits of one's members, and assessing the available (human and financial) resources.

      The communication strategy is built on three complementary pillars:

      1. The Website: Considered the owned, fully controllable foundation of communication.

      It must be professional, optimized for mobile, and structured to prompt action through clear, repeated calls to action.

      2. Emailing and the Newsletter: The preferred tools for maintaining a direct, personalized link.

      Using a professional email address and dedicated tools makes it possible to measure impact, lend credibility to exchanges, and segment communications.

      3. Social Media: Powerful channels for amplifying visibility and fostering engagement.

      A targeted approach, favoring one or two networks relevant to the audience, is more effective than a scattered presence.

      Using professional accounts and features such as WhatsApp communities is recommended to structure interactions.

      The success of this approach depends on the association's ability to fit into its members' existing habits rather than trying to create new ones, while ensuring the professionalization of its tools and respect for personal data.

      --------------------------------------------------------------------------------

      Context and Speakers

      This document is based on the webinar "Communiquez efficacement et renforcez le lien avec vos adhérents grâce au numérique" (Communicate effectively and strengthen the link with your members through digital technology), organized by Solidatech and presented by:

      Camille Wassino, Marketing and Development Manager at Solidatech.

      Sébastien Peron, Director of Folly Web.

      About the Organizers

      Solidatech

      Solidatech is an organization whose mission, since 2008, has been to strengthen the impact of associations through digital technology.

      Beneficiaries: More than 45,000 associations, foundations, and endowment funds registered free of charge.

      Affiliation: Part of the Emmaüs movement through the work-integration cooperative Les Ateliers du Bocage, and the French representative of the international TechSoup network.

      Offers and Services:

      Digital tools: Access to software (free or discounted by 30% to 90%) and to refurbished or new IT equipment (partnerships with Cisco and Dell).

      Support: A free resource center, a support team, a digital-maturity assessment tool, and a directory of service providers (Prestatek).

      Knowledge: Co-production of a national study, every three years, on the place of digital technology in associative projects.

      Training: A Qualiopi-certified training body offering courses on digital topics (GDPR, collaboration, etc.) and on specific tools (Canva, Microsoft 365), fundable through OPCO credits for employer organizations.

      Folly Web

      Folly Web organizes free events, online and in person in around thirty French cities, to help very small businesses in the broad sense (project founders, freelancers, associations) get to grips with digital technology.

      Business model: The events are free thanks to pre-financing, notably by Afnic (Association Française pour le Nommage Internet en Coopération), which manages .fr domain names and whose mission includes helping small and medium businesses digitize through its "Réussir-en.fr" program.

      The Strategic Framework for Association Communication

      Before deploying any tools, strategic reflection is essential.

      It should address three fundamental questions to avoid spreading energy too thin.

      1. What are your objectives? What is the association trying to accomplish (recruit, retain, inform, etc.)?

      2. Who are your members? Understand their profiles and, above all, their digital habits.

      The goal is to fit into their existing habits (e.g., are they on TikTok?) rather than forcing them to adopt a new tool.

      3. What are your resources? Assess human capacities (skills, time) and finances.

      It is advisable to focus on one or two channels and master them fully rather than spreading out.

      Polls of the Webinar Participants

      Two polls captured the priorities and practices of the attending associations.

      Poll 1: Main goals of an online presence

      1. Keeping in touch with members

      2. Recruiting new members

      3. Communicating among the association's permanent staff

      Poll 2: Main digital channels used

      1. Email

      2. Website

      3. Social media

      These results confirm the relevance of the three communication pillars developed below.

      Pillar 1: The Website, Your Digital Foundation

      The website is the association's home base. Unlike social media, it is a fully controlled space, often described as "your salesperson 24/7".

      The Domain Name

      The URL (the site's address) is the first marker of professionalism.

      Best practices: Choose a name that is short, easy to remember, and easy to share.

      Extension: Prefer extensions that anchor the association in its territory, such as .fr or .asso, over more generic ones like .com.

      Design and User Experience (UX)

      Web standards have evolved, and users have become more demanding.

      Readability: A modern site, with well-chosen contrasts and colors, is essential for credibility.

      Mobile experience: A very large share of traffic comes from mobile devices.

      It is crucial that the smartphone experience be smooth and intuitive.

      Showcasing: A well-designed site enhances the association's image, makes people want to join, and serves as the central destination for members (news, sign-ups, partners, etc.).

      Structure of an Effective Page

      An effective web page follows a logical structure to capture attention and guide the user.

      1. Emotional hook: The part visible without scrolling must spark interest with a strong image, a video, or a punchy sentence.

      2. Key arguments: Once attention is captured, present the important features or information clearly.

      3. Call to Action (CTA): This is an essential point.

      Explicitly tell users what you expect of them ("Join", "Subscribe to the newsletter", "Contact us").

      These CTAs should appear in several places on the page, because not all users scroll to the bottom.

      Pillar 2: Emailing and the Newsletter, the Direct Link

      Email remains an extremely powerful communication channel for maintaining a strong link with an audience that has consented to receive information.

      Professionalism and Tools

      Sender address: Using a professional email address (e.g., prenom@nomdelasso.fr) rather than a generic one (@gmail.com) signals credibility and seriousness.

      Emailing tools: Professional tools (such as Brevo, a French tool mentioned in the webinar) are recommended. They make it possible to:

      Measure performance: Track the delivery rate, open rate, and click rate.

      Analyze and optimize: Understand what works (e.g., the email subject line) and improve future campaigns.
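As a minimal illustration of the metrics above, the three rates are simple ratios over campaign counts. The figures and function name below are hypothetical; dedicated tools such as Brevo report these numbers directly.

```python
# Sketch: computing basic email-campaign metrics from raw counts.
# All numbers here are invented for illustration.

def campaign_rates(sent, delivered, opened, clicked):
    """Return (delivery, open, click) rates as percentages."""
    delivery_rate = 100 * delivered / sent   # share of sent emails that reached an inbox
    open_rate = 100 * opened / delivered     # openers among delivered emails
    click_rate = 100 * clicked / delivered   # clickers among delivered emails
    return delivery_rate, open_rate, click_rate

d, o, c = campaign_rates(sent=500, delivered=480, opened=120, clicked=24)
print(f"delivery {d:.0f}%, open {o:.0f}%, click {c:.0f}%")  # delivery 96%, open 25%, click 5%
```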

      Data Collection and GDPR

      Simplicity: Collect only the information that is strictly necessary. Each additional form field lowers the completion rate.

      Consent: Always obtain people's explicit permission before sending them communications.

      Unsubscribing: Always include an easy-to-find unsubscribe link.

      Centralization: Consolidate all collected data (memberships, events, website) into a single base (a spreadsheet such as Excel/Google Sheets at first, then potentially a CRM).

      Newsletter vs. Emailing

      Newsletter: A recurring communication (e.g., monthly) with varied content (news, member spotlights, etc.).

      The goal is to keep the link alive. It is advisable to define a reusable "skeleton" to save time on each issue.

      Emailing: A one-off communication with a single, well-defined objective (e.g., a donation campaign, the announcement of a major event).

      The message is entirely focused on that objective to maximize action.

      Automation

      Some sends can be automated to save time.

      For example, a reminder email can be sent automatically one month before a membership's renewal date.
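The renewal-reminder idea can be sketched as a simple date check. The member records and function below are hypothetical; in practice, emailing tools such as Brevo provide this kind of automation built in.

```python
# Sketch: flag members whose membership expires exactly 30 days from today,
# so a reminder email can be sent to them. Data layout is invented for illustration.
from datetime import date, timedelta

members = [
    {"email": "a@example.org", "expires": date(2025, 12, 15)},
    {"email": "b@example.org", "expires": date(2026, 3, 1)},
]

def due_for_reminder(members, today, days_ahead=30):
    """Return emails of members whose membership expires `days_ahead` days from `today`."""
    target = today + timedelta(days=days_ahead)
    return [m["email"] for m in members if m["expires"] == target]

print(due_for_reminder(members, today=date(2025, 11, 15)))  # ['a@example.org']
```

Run daily (for instance from a scheduled task), this check catches each member once, on the day their renewal is exactly one month away.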

      Pillar 3: Social Media, Amplifying Reach

      Social media is essential for visibility but requires a strategic approach.

      Presence Strategy

      Focus: "Focus on one network and do it very, very well, or two at most."

      It is counterproductive to multiply channels without the resources to run them properly.

      Professional accounts: It is imperative to use a professional page or account rather than a personal profile.

      This makes it possible to:

      ◦ Grant access to several administrators.

      ◦ Ensure the account survives if a volunteer leaves the association.

      ◦ Access detailed statistics and specific features.

      Focus on WhatsApp

      WhatsApp is an increasingly common tool for direct communication with members.

      Communities: This feature lets you "tidy up" by structuring communication.

      You can create:

      ◦ A main announcements channel, where only the administrator posts (top-down communication).

      ◦ Specific discussion groups per team, per project, etc., for interactive exchanges.

      Best practices: To avoid overwhelming members, segment groups by purpose and make joining the community voluntary (opt-in) rather than mandatory.

      Engagement and Content

      Platform DNA: Each social network has its own codes, formats, and algorithms.

      Content must be adapted to each platform.

      The engine of visibility: Engagement (comments, shares, likes) is the key factor that determines a post's reach.

      Practical tip: To stimulate engagement, it is very effective to ask questions directly in posts, prompting followers to reply in the comments.

      --------------------------------------------------------------------------------

      Q&A Summary

      Usefulness of WhatsApp communities: They are considered effective for structuring exchanges and avoiding message "noise" by separating announcements from discussions.

      Creating a WhatsApp account without a personal number: A phone number is required.

      The suggested solution is to take out a low-cost mobile plan in the association's name.

      The website's importance in the social-media era: The website remains crucial.

      It is an "owned base" that the association fully controls, sheltered from social networks' algorithm changes.

      Domain name in .fr or .org: The .fr unambiguously anchors the association in France.

      If an association already uses a .org, it is advised to keep it while also registering the matching .fr to protect its name.

      Engaging seniors (65+) digitally: The key is adapting to their habits.

      If their main channel is the newsletter, pack it with as much information as possible.

      If their preferred contact method is the phone, offer it. The point is to fit into their habits.

.ql-editor pre { background-color: #f0f0f0; border-radius: 3px; } .ql-bubble .ql-editor pre { white-space: pre-wrap; margin-bottom: 5px; margin-top: 5px; padding: 5px 10px; } .ql-bubble .ql-editor code { font-size: 85%; padding: 2px 4px; } .ql-bubble .ql-editor pre.ql-syntax { background-color: #23241f; color: #f8f8f2; overflow: visible; } .ql-bubble .ql-editor img { max-width: 100%; } .ql-bubble .ql-picker { color: #ccc; display: inline-block; float: left; font-size: 14px; font-weight: 500; height: 24px; position: relative; vertical-align: middle; } .ql-bubble .ql-picker-label { cursor: pointer; display: inline-block; height: 100%; padding-left: 8px; padding-right: 2px; position: relative; width: 100%; } .ql-bubble .ql-picker-label::before { display: inline-block; line-height: 22px; } .ql-bubble .ql-picker-options { background-color: #444; display: none; min-width: 100%; padding: 4px 8px; position: absolute; white-space: nowrap; } .ql-bubble .ql-picker-options .ql-picker-item { cursor: pointer; display: block; padding-bottom: 5px; padding-top: 5px; } .ql-bubble .ql-picker.ql-expanded .ql-picker-label { color: #777; z-index: 2; } .ql-bubble .ql-picker.ql-expanded .ql-picker-label .ql-fill { fill: #777; } .ql-bubble .ql-picker.ql-expanded .ql-picker-label .ql-stroke { stroke: #777; } .ql-bubble .ql-picker.ql-expanded .ql-picker-options { display: block; margin-top: -1px; top: 100%; z-index: 1; } .ql-bubble .ql-color-picker, .ql-bubble .ql-icon-picker { width: 28px; } .ql-bubble .ql-color-picker .ql-picker-label, .ql-bubble .ql-icon-picker .ql-picker-label { padding: 2px 4px; } .ql-bubble .ql-color-picker .ql-picker-label svg, .ql-bubble .ql-icon-picker .ql-picker-label svg { right: 4px; } .ql-bubble .ql-icon-picker .ql-picker-options { padding: 4px 0px; } .ql-bubble .ql-icon-picker .ql-picker-item { height: 24px; width: 24px; padding: 2px 4px; } .ql-bubble .ql-color-picker .ql-picker-options { padding: 3px 5px; width: 152px; } .ql-bubble .ql-color-picker 
.ql-picker-item { border: 1px solid transparent; float: left; height: 16px; margin: 2px; padding: 0px; width: 16px; } .ql-bubble .ql-picker:not(.ql-color-picker):not(.ql-icon-picker) svg { position: absolute; margin-top: -9px; right: 0; top: 50%; width: 18px; } .ql-bubble .ql-picker.ql-header .ql-picker-label[data-label]:not([data-label=''])::before, .ql-bubble .ql-picker.ql-font .ql-picker-label[data-label]:not([data-label=''])::before, .ql-bubble .ql-picker.ql-size .ql-picker-label[data-label]:not([data-label=''])::before, .ql-bubble .ql-picker.ql-header .ql-picker-item[data-label]:not([data-label=''])::before, .ql-bubble .ql-picker.ql-font .ql-picker-item[data-label]:not([data-label=''])::before, .ql-bubble .ql-picker.ql-size .ql-picker-item[data-label]:not([data-label=''])::before { content: attr(data-label); } .ql-bubble .ql-picker.ql-header { width: 98px; } .ql-bubble .ql-picker.ql-header .ql-picker-label::before, .ql-bubble .ql-picker.ql-header .ql-picker-item::before { content: 'Normal'; } .ql-bubble .ql-picker.ql-header .ql-picker-label[data-value="1"]::before, .ql-bubble .ql-picker.ql-header .ql-picker-item[data-value="1"]::before { content: 'Heading 1'; } .ql-bubble .ql-picker.ql-header .ql-picker-label[data-value="2"]::before, .ql-bubble .ql-picker.ql-header .ql-picker-item[data-value="2"]::before { content: 'Heading 2'; } .ql-bubble .ql-picker.ql-header .ql-picker-label[data-value="3"]::before, .ql-bubble .ql-picker.ql-header .ql-picker-item[data-value="3"]::before { content: 'Heading 3'; } .ql-bubble .ql-picker.ql-header .ql-picker-label[data-value="4"]::before, .ql-bubble .ql-picker.ql-header .ql-picker-item[data-value="4"]::before { content: 'Heading 4'; } .ql-bubble .ql-picker.ql-header .ql-picker-label[data-value="5"]::before, .ql-bubble .ql-picker.ql-header .ql-picker-item[data-value="5"]::before { content: 'Heading 5'; } .ql-bubble .ql-picker.ql-header .ql-picker-label[data-value="6"]::before, .ql-bubble .ql-picker.ql-header 
.ql-picker-item[data-value="6"]::before { content: 'Heading 6'; } .ql-bubble .ql-picker.ql-header .ql-picker-item[data-value="1"]::before { font-size: 2em; } .ql-bubble .ql-picker.ql-header .ql-picker-item[data-value="2"]::before { font-size: 1.5em; } .ql-bubble .ql-picker.ql-header .ql-picker-item[data-value="3"]::before { font-size: 1.17em; } .ql-bubble .ql-picker.ql-header .ql-picker-item[data-value="4"]::before { font-size: 1em; } .ql-bubble .ql-picker.ql-header .ql-picker-item[data-value="5"]::before { font-size: 0.83em; } .ql-bubble .ql-picker.ql-header .ql-picker-item[data-value="6"]::before { font-size: 0.67em; } .ql-bubble .ql-picker.ql-font { width: 108px; } .ql-bubble .ql-picker.ql-font .ql-picker-label::before, .ql-bubble .ql-picker.ql-font .ql-picker-item::before { content: 'Sans Serif'; } .ql-bubble .ql-picker.ql-font .ql-picker-label[data-value=serif]::before, .ql-bubble .ql-picker.ql-font .ql-picker-item[data-value=serif]::before { content: 'Serif'; } .ql-bubble .ql-picker.ql-font .ql-picker-label[data-value=monospace]::before, .ql-bubble .ql-picker.ql-font .ql-picker-item[data-value=monospace]::before { content: 'Monospace'; } .ql-bubble .ql-picker.ql-font .ql-picker-item[data-value=serif]::before { font-family: Georgia, Times New Roman, serif; } .ql-bubble .ql-picker.ql-font .ql-picker-item[data-value=monospace]::before { font-family: Monaco, Courier New, monospace; } .ql-bubble .ql-picker.ql-size { width: 98px; } .ql-bubble .ql-picker.ql-size .ql-picker-label::before, .ql-bubble .ql-picker.ql-size .ql-picker-item::before { content: 'Normal'; } .ql-bubble .ql-picker.ql-size .ql-picker-label[data-value=small]::before, .ql-bubble .ql-picker.ql-size .ql-picker-item[data-value=small]::before { content: 'Small'; } .ql-bubble .ql-picker.ql-size .ql-picker-label[data-value=large]::before, .ql-bubble .ql-picker.ql-size .ql-picker-item[data-value=large]::before { content: 'Large'; } .ql-bubble .ql-picker.ql-size .ql-picker-label[data-value=huge]::before, 
.ql-bubble .ql-picker.ql-size .ql-picker-item[data-value=huge]::before { content: 'Huge'; } .ql-bubble .ql-picker.ql-size .ql-picker-item[data-value=small]::before { font-size: 10px; } .ql-bubble .ql-picker.ql-size .ql-picker-item[data-value=large]::before { font-size: 18px; } .ql-bubble .ql-picker.ql-size .ql-picker-item[data-value=huge]::before { font-size: 32px; } .ql-bubble .ql-color-picker.ql-background .ql-picker-item { background-color: #fff; } .ql-bubble .ql-color-picker.ql-color .ql-picker-item { background-color: #000; } .ql-bubble .ql-color-picker svg { margin: 1px; } .ql-bubble .ql-color-picker .ql-picker-item.ql-selected, .ql-bubble .ql-color-picker .ql-picker-item:hover { border-color: #fff; } .ql-bubble .ql-tooltip { background-color: #444; border-radius: 25px; color: #fff; } .ql-bubble .ql-tooltip-arrow { border-left: 6px solid transparent; border-right: 6px solid transparent; content: " "; display: block; left: 50%; margin-left: -6px; position: absolute; } .ql-bubble .ql-tooltip:not(.ql-flip) .ql-tooltip-arrow { border-bottom: 6px solid #444; top: -6px; } .ql-bubble .ql-tooltip.ql-flip .ql-tooltip-arrow { border-top: 6px solid #444; bottom: -6px; } .ql-bubble .ql-tooltip.ql-editing .ql-tooltip-editor { display: block; } .ql-bubble .ql-tooltip.ql-editing .ql-formats { visibility: hidden; } .ql-bubble .ql-tooltip-editor { display: none; } .ql-bubble .ql-tooltip-editor input[type=text] { background: transparent; border: none; color: #fff; font-size: 13px; height: 100%; outline: none; padding: 10px 20px; position: absolute; width: 100%; } .ql-bubble .ql-tooltip-editor a { top: 10px; position: absolute; right: 20px; } .ql-bubble .ql-tooltip-editor a:before { color: #ccc; content: "D7"; font-size: 16px; font-weight: bold; } .ql-container.ql-bubble:not(.ql-disabled) a { position: relative; white-space: nowrap; } .ql-container.ql-bubble:not(.ql-disabled) a::before { background-color: #444; border-radius: 15px; top: -5px; font-size: 12px; color: #fff; 
content: attr(href); font-weight: normal; overflow: hidden; padding: 5px 15px; text-decoration: none; z-index: 1; } .ql-container.ql-bubble:not(.ql-disabled) a::after { border-top: 6px solid #444; border-left: 6px solid transparent; border-right: 6px solid transparent; top: 0; content: " "; height: 0; width: 0; } .ql-container.ql-bubble:not(.ql-disabled) a::before, .ql-container.ql-bubble:not(.ql-disabled) a::after { left: 0; margin-left: 50%; position: absolute; transform: translate(-50%, -100%); transition: visibility 0s ease 200ms; visibility: hidden; } .ql-container.ql-bubble:not(.ql-disabled) a:hover::before, .ql-container.ql-bubble:not(.ql-disabled) a:hover::after { visibility: visible; } "A Mother's Healing Touch" is a heartfelt exploration of the profound bond between a mother and her child, offering insights and guidance for nurturing emotional well-being and resilience. Drawing on the wisdom of ancient traditions and modern psychology, this book celebrates the transformative power of a mother's love and compassion in healing wounds, soothing fears, and fostering growth.Through personal anecdotes, practical tips, and mindfulness exercises, "A Mother's Healing Touch" offers support to mothers navigating the challenges of raising children in today's world. From soothing a crying infant to supporting a teenager through turbulent times, discover how to cultivate presence, empathy, and connection to strengthen your relationship with your child and promote their emotional resilience.Explore the healing potential of nurturing touch, empathetic listening, and unconditional acceptance as you embark on a journey of self-discovery and growth alongside your child. Whether you're facing moments of joy or adversity, this book serves as a guiding light, reminding mothers of the transformative power they hold to nurture, heal, and inspire their children through the gentle touch of love."

      A mother's healing touch

    1. What is social cohesion? (¿Qué es la cohesión social?)

      This section should be something like "concept and measurement of social cohesion"; there is nothing on that here, and it is central to this literature. It should cover the international literature as well as include a specific section on Latin America (Ecosocial, CEPAL, COES, etc.).

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      One of the most novel things of the manuscript is the use of a relatively quick photoablation system. Could this technique be applied in other laboratories? While the revised manuscript includes more technical details as requested, the description remains difficult to follow for readers from a biology background. I recommend revising this section to improve clarity and accessibility for a broader scientific audience.

      As suggested, we have adapted the paragraph related to the photoablation technique in the Material & Method section, starting line 1147. We believe it is now easier to follow.

      The authors suggest that in the animal model, early 3h infection with Neisseria does not show an increase in vascular permeability, contrary to their findings in the 3D in vitro model. However, they show a non-significant increase in permeability of 70 kDa Dextran in the animal xenograft early infection. As a bioengineer, this seems to point to the possibility that if the experiment had been done with a lower molecular weight tracer, significant increases in permeability could have been detected. I would suggest doing this experiment, which could capture early events in vascular disruption.

      Comparing permeability under healthy and infected conditions using dextran smaller than 70 kDa is challenging. Previous research (1) has shown that molecules below 70 kDa already diffuse freely in healthy tissue. Given this high baseline diffusion, we believe that no significant difference would be observed before and after N. meningitidis infection, and these experiments were not carried out. Bacteria-induced permeability in mice occurs at later time points, 16h post-infection, as shown previously (2). As discussed in the manuscript, this difference between the xenograft model and the chip could reflect the absence of various cell types present in the tissue parenchyma or simply vessel maturation time.

      One of the great advantages of the system is the possibility of visualizing infection-related events at high resolution. The authors show the formation of actin in a honeycomb structure beneath the bacterial microcolonies. This only occurred in 65% of the microcolonies. Is this result similar to in vitro 2D endothelial cultures in static and under flow? Also, the group has shown in the past positive staining of other cytoskeletal proteins, such as ezrin, in the ERM complex. Does this also occur in the 3D system?

      We imaged monolayers of endothelial cells in the flat regions of the chip (the two lateral channels) using the same microscopy conditions (i.e., Obj. 40X N.A. 1.05) that were used to detect honeycomb structures in the 3D vessels in vitro. More than 56% of infected cells presented these honeycomb structures in 2D, which is 13% less than in 3D, a difference that is not statistically significant given the spread of both populations. Thus, we conclude that under both in vitro conditions, 2D and 3D, the proportion of infected cells exhibiting cortical plaques is similar. These results are shown in Figures 4E and S4B.

      We also performed staining of ezrin in the chip and imaged both the 3D and 2D regions. Although ezrin staining was visible in 3D (Author response image 1), it was not as obvious as other markers under these infected conditions, and we did not include it in the main text. Interpretation of this result is not straightforward, as the substrate of the cells is different, and it would require further studies on the behavior of ERM proteins in these different contexts.

      Author response image 1.

      F-actin (red) and ezrin (yellow) staining after 3h of infection with N. meningitidis (green) in 2D (top) and 3D (bottom) vessel-on-chip models.

      Recommendation to the authors:

      Reviewer #1 (Recommendation to the authors):

      I appreciate that the authors addressed most of my comments, of special relevance are the change of the title and references to infection-on-chip. I think that the current choice of words better acknowledges the incipient but strong bioengineering infection community. I also appreciate the inclusion of a limitation paragraph that better frames the current work and proposes future advancements.

      The addition of more methodological details has improved the manuscript. Although as mentioned earlier the wording needs to be accessible for the biology community. I also appreciated the addition of the quantification of binding under the WSS gradient in the different geometries and shown in Fig 3H. However, the description of the figure and the legend is not clear. What does "vessel" mean on the graph and "normalized histograms ...(blue)" in the figure legend. Could the authors rephrase it?

      In Figure 3F, we investigated whether Neisseria meningitidis exhibits preferential sites of infection. We hypothesized that, if bacteria preferentially adhered to specific regions, the local shear stress at these sites would differ from the overall distribution. To test this, we compared the shear stress at bacterial adhesion sites in the VoC (orange dots and curve) with the shear stress along the entire vascular edges (blue dots and curve). The high Spearman correlation indicates that there is no distinct shear stress value associated with bacterial adhesion. This suggests that bacteria can adhere across all regions, independently of local shear stress. To enhance clarity, the legend of Figure 3 and the related text have been rephrased in the revised manuscript (L289-314).
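The comparison described above can be sketched in code. This is a minimal illustration with synthetic data, not the authors' analysis pipeline: the rank-correlation helper, the distributions, and all numerical values are assumptions for demonstration only.

```python
import numpy as np

def average_ranks(x):
    """Ranks 1..n, with tied values assigned their average rank (Spearman convention)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1, dtype=float)
    for v in np.unique(x):            # average the ranks of tied values
        tied = x == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

rng = np.random.default_rng(0)
# Synthetic wall shear stress (arbitrary units) along all vessel edges, and
# shear stress at adhesion sites drawn from the same values without preference.
wall_shear = rng.lognormal(mean=0.0, sigma=0.5, size=2000)
adhesion_shear = rng.choice(wall_shear, size=300)

bins = np.linspace(0.0, wall_shear.max(), 30)
h_wall, _ = np.histogram(wall_shear, bins=bins, density=True)
h_adh, _ = np.histogram(adhesion_shear, bins=bins, density=True)

rho = spearman_rho(h_wall, h_adh)
# A rho close to 1 means the adhesion-site distribution mirrors the overall
# shear distribution, i.e. no shear stress value is preferentially colonized.
```

A low rho, by contrast, would indicate that adhesion sites sample a different shear range than the vessel as a whole.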

      Line 415 should reference Fig S5B, not Fig 5B. Also, the titles of Supplementary Figures 4 and 5 are duplicated, and the description in the legend of Fig S5 seems a bit off: A and B seem to be swapped.

      Indeed, the reference to the right figure has been corrected. Also, the title of Figure S4 has been adapted to its contents, and the legend of Figure S5 has been corrected.

      Reviewer #2 (Recommendation to the authors):

      Minor comments to the authors:

      Line 163 "they formed" instead of "formed".

      Line 212 "two days" instead of "two day"

      Line 269 a space between two words is missing.

      These three comments have been addressed in the revised manuscript.

      In addition, I appreciate answering the comments, especially those requiring hypothesizing about including further cells. However, when discussing which other cells could be relevant for the model (lines 631 to 632) it would be beneficial to discuss not only the role of those cells but also how could they be included in the model. I think for the reader, inclusion of further cells could be seen as a challenge or limitation, and addressing these technical points in the discussion could be helpful.

      We thank Reviewer #2 for the insightful suggestion. Indeed, the method of introducing cells into the VoC depends on their type. Fibroblasts and dendritic cells, which are tissue-resident cells, should be embedded in the collagen gel before polymerization and UV carving. This requires careful optimization to preserve chip integrity, as these cells exert pulling forces while migrating within the collagen matrix. In contrast, T cells and macrophages should be introduced through the vessel lumen to mimic their circulation in vivo. Pericytes can be co-seeded with endothelial cells, as they have been shown to self-organize within a few hours post-seeding. This information is now included in the manuscript (L577-587).

      Reviewer #3 (Recommendation to the authors):

      Suggestions and Recommendations

      Some suggestions related to the VOC itself:

      Figure 1, Fig S1, paragraph starting line 1071: More information would be helpful for the laser photoablation. For instance, is a non-standard UV laser needed? Which form of UV light is used? What is the frequency of laser pulsing? How many pulses/how long is needed to ablate the region of interest?

      The photoablation process requires a focused UV laser pulsing at high frequency (10 kHz) to shorten the carving time while providing the intensity required to degrade the collagen gel. To reproducibly carve 30 µm-wide vessels, we used a 2 µm-wide laser beam at a power of 10 mW and moved the stage (i.e., the sample) at a maximum speed of 1 mm/s. This information has been added to the paragraph starting on line 1147 of the revised manuscript.
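For a sense of scale, the carving time implied by these parameters can be estimated with a simple raster model. The beam width and stage speed come from the response above; the vessel length, number of z-planes, and the no-overlap stepping are assumptions for illustration only and are not stated in the manuscript.

```python
# Back-of-envelope raster-time estimate for carving one vessel.
beam_width_um = 2.0            # from the text: 2 µm-wide beam
stage_speed_um_s = 1000.0      # from the text: 1 mm/s maximum stage speed
vessel_width_um = 30.0         # from the text: 30 µm-wide vessels
vessel_length_um = 5000.0      # assumed 5 mm-long vessel
n_planes = 15                  # assumed number of z-planes for volume carving

passes_per_plane = vessel_width_um / beam_width_um                         # 15 lateral passes
time_per_plane_s = passes_per_plane * vessel_length_um / stage_speed_um_s  # 75 s per plane
total_time_s = n_planes * time_per_plane_s                                 # 1125 s, i.e. under 20 min
```

Under these assumed numbers, one vessel is carved in roughly 19 minutes; real timings depend on beam overlap, settling time, and the actual channel geometry.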

      It is difficult to understand the geometry of the VOC. In Figure 1C, is the light coloration representing open space through which medium can flow, and the dark section the collagen? On a single chip, how many vessels are cut through the collagen? It looks as if at least two are cut in Figure 1C in the righthand photo.

      In Figure 1C, the light coloration is the F-actin staining. The horizontal upper and lower parts are the 2D lateral channels, which also contain endothelial cells and are connected to inlets and outlets, respectively. In the middle, two vertically carved 3D vessels are shown in the confocal image.

      Technically, we designed the PDMS structures to allow carving of 1 to 3 channels, maximizing the number of vessels that can be imaged while minimizing any loss of permeability at the PDMS/collagen/cells interface. This information has been added in the revised manuscript (L. 1147).

      If multiple vessels are cut in the center channel between the lateral channels, how do you ensure that medium flow is even between all vessels? A single chip with multiple different vessel architectures through the center channel would be expected to have different hydrostatic resistance with different architectures, thereby causing differences in flow rates in each vessel.

      To ensure a consistent flow rate regardless of the number of carved vessels, we opted to control the flow rate directly across the chip with a syringe pump. During experiments, one inlet and one outlet were closed and the syringe pump imposed the total flow. Because the carved vessels are arranged in parallel and share the same geometry, the flow rate is the same in each vessel. Had a pressure controller been used instead, each vessel would have seen the same pressure drop, with the flow in each set by its hydraulic resistance. This has been added to the revised manuscript in the paragraph starting on line 1210.
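The equal split between identical parallel vessels follows from Poiseuille's law. A minimal sketch (the dimensions, viscosity, and flow rate below are illustrative assumptions, not values from the manuscript):

```python
import math

def hydraulic_resistance(radius_m, length_m, viscosity_pa_s=1e-3):
    """Poiseuille resistance of a cylindrical vessel: R = 8*mu*L / (pi * r**4)."""
    return 8.0 * viscosity_pa_s * length_m / (math.pi * radius_m**4)

def flow_split(total_flow_m3_s, resistances):
    """Per-vessel flow for parallel vessels fed by a fixed total flow (syringe pump).
    All branches share one pressure drop: dP = Q_total / sum(1/R_i); Q_i = dP / R_i."""
    conductance = sum(1.0 / r for r in resistances)
    dp = total_flow_m3_s / conductance
    return [dp / r for r in resistances]

# Three identical 30 µm-diameter, 1 mm-long vessels (illustrative values).
R = hydraulic_resistance(radius_m=15e-6, length_m=1e-3)
flows = flow_split(total_flow_m3_s=1e-11, resistances=[R, R, R])
# Identical geometries carry identical flow: each vessel receives a third of the total.
```

The r**4 dependence also shows why vessels of different calibre would split a pressure-driven flow very unevenly, which is the point of imposing the total flow with a syringe pump.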

      The figures imply that the laser ablation can be performed at depth within the collagen gel, rather than just etching the surface. If this is the case, it should be stated explicitly. If not, this needs to be clarified.

      One of the main advantages of the photoablation technique is that it carves the collagen gel in volume rather than only etching its surface. Thanks to this 3D UV degradation, we can form the 3D architecture surrounded by the bulk collagen. This has been added to the revised manuscript, lines 154-155.

      Is the in-vivo-like vessel architecture connected to the lateral channel at an oblique angle, or is the image turned to fit the entire structure? (Figure 1F and 3E). Is that why there is high shear stress at its junction with the lateral channel depicted in Figure 3E?

      All structures require connection to the lateral channels to ensure media circulation and nutrient supply. The in vivo-like design must be rotated to allow the upper and lower branches of the complex structure to pass between the fixed PDMS pillars. To remain consistent with the image and the flow direction, we have kept the same orientation as in the COMSOL simulation. This leads to a locally higher shear stress at the top of the architecture. This has been added in the revised manuscript, in the paragraph starting on line 1474.

      Figure S1F,G: In the legend, shapes are circles, not squares. On the graphs, what do the numbers in parentheses mean?

      Indeed, the term "squares" has been replaced by "circles" in the legend of Figure S1. (1) and (2) refer to the providers of the collagen, FujiFilm and Corning, respectively; this is now mentioned in the legend of Figure S1.

      Figure 3B: how do the images on the left and right differ? Each of the 4 images needs to be explained.

      The four images represent the infected VoC from different viewing angles, illustrating the three-dimensional spread of infection throughout the vessel. A more detailed description has been added in the legend of Figure 3.

      Figure S3C is not referenced but should be, likely before sentence starting on line 299.

      Indeed, the reference to Figure S3C has been added line 301 of the revised manuscript.

      Results in Figure 3 with the pilD mutant are very interesting. It is worth commenting in the Discussion about how T4P functionality in addition to the presence of T4P contributes to Nm infection, and how in the future this could be probed with pilT mutants.

      We thank Reviewer #3 for this relevant insight. Following adhesion, a key function of Neisseria meningitidis type IV pili in colony formation and enhanced infection is twitching motility. As suggested, we have added to the Discussion the idea of using a pilT mutant, which can adhere but cannot retract its pili, in the VoC model to investigate the role of motility in colonization in vitro under flow conditions (L611-623).

      Which vessel design was used for the data presented in Figures 4, 5, and 6 and associated supplemental figures?

      Straight channels were used for most of the data in Figures 4, 5, and 6. Occasionally, we used the branched, in vivo-like designs to check for infection patterns similar to those observed in vivo and for the related neutrophil activity. This has been added to the revised manuscript, lines 1435-1439.

      Figure 4B-D: the images presented in Figure 4C are not representative of the averages presented in Figures 4B,D. For instance, the aggregates appear much larger and more elongated in the animal model in Figure 4C, but the animal model and VOC have the colony doubling time (implying same size) in Figure 4B, and same average aggregate elongation in Figure 4D.

      The images in Figure 4C were selected to illustrate the elongation of colonies quantified in Figure 4D. The elongation angles are consistent between both images and align with the channel orientation. Representative images of colony expansion over time, corresponding to Figure 4A and 4B, are provided in Figure S4A.

      Figures 4E-F: dextran does not appear to diffuse in the VOC in response to histamine in these images, yet there is a significant increase in histamine-induced permeability in Figure 4F. Dotted lines should be used to indicate vessel walls for histamine, and/or a more representative image should be selected. A control set of images should also be included for comparison.

      We thank Reviewer #3 for the insightful comment. We confirm that we have carefully selected representative images for the histamine condition and adjusted them to display the same range of gray levels. The apparent increase in permeability with histamine is explained by a slight rise in background fluorescence, combined with the smaller channel size shown in Figure 4E.

      Figure S4 title is a duplicate of Figure S5 and is unrelated to the content of Figure S4. Suggest rewording to mention changes in permeability induced by Nm infection in the VOC and animal model.

      Indeed, the title of Figure S4 did not correspond to its content. We have, thus, changed it in the revised manuscript.

      Line 489 "...our Vessel-on-Chip model has the potential to fully capture the human neutrophil response during vascular infections, in a species-matched microenvironment", is an overstatement. As presented, the VOC model only contains endothelial cells and neutrophils. Many other cell types and structures can affect neutrophil activity. Thus, it is an overstatement to claim that the model can fully capture the human neutrophil response.

      We agree with Reviewer #3 that fully recapitulating neutrophil activity would require additional cell types, such as platelets, pericytes, macrophages, dendritic cells, and fibroblasts, which secrete important molecules such as cytokines, chemokines, TNF-α, and histamine. In our simplified model, we were nevertheless able to reconstitute the complex interaction of neutrophils with endothelial cells and with bacteria. The text was modified accordingly.

      Supplemental Figure 6 - Does CD62E staining overlap with sites of Nm attachment?

      E-selectin staining does not systematically colocalize with Neisseria meningitidis colonies, although bacterial adhesion is required for its induction. Its overall induced expression is heterogeneous across the tissue and varies from cell to cell, as seen in vivo.

      Line 475, Figure 6E- Phagocytosis of Nm is described, but it is difficult to see. An arrow should be added to make this clear. Perhaps the reference should have been to Figure 6G? Consider changing the colors in Figure 6G away from red/green to be more color-blind friendly.

      Indeed, the correct reference is Figure 6G, where the phagocytosis event is zoomed in; we have changed it in the text. Adapting the colors of Figure 6G would require changing the color code of the entire manuscript, as red has been used for actin and green for Neisseria meningitidis throughout.

      Lines 621-632 - This important discussion point should be reworked. Some suggested references to cite and discuss include PMID: 7913984, 15186399, 17991045, 18640287, 19880493.

      We have introduced the suggested references (3–7) in the Discussion and further discussed the importance of introducing immune cells to study immune cell-bacteria interactions and the related immune response (L659-678).

      Minor corrections:

      •  Line 8 - suggest "photoablation-generated" instead of "photoablation-based"

      •  Line 57- remove the word "either", or modify the sentence

      •  Sentence on lines 162-165 needs rewording

      •  Lines 204-205- "loss of vascular permeability" should read "increase in vascular permeability"

      •  Line 293- "Measured" shear stress, should be "computed", since it was not directly measured (according to the Materials & Methods)

      •  Line 304- "consistently" should be "consistent"

      •  Fig. 3 legend, second line: replace "our" with "the VoC"

      •  Line 371, change "our" to "the"

      •  Line 415- Figure 5B doesn’t appear to show 2-D data. Is this in Figure S5B? Some clarification is needed. The quantification of Nm vessel association in both the VOC and the animal model should be shown in Figure 5, for direct comparison.

      •  Supplementary Figure 5C: correlation coefficient with statistical significance should be calculated.

      •  Figure 6 title, rephrase to "The infected VOC model"

      •  Line 450, replace "important" with "statistically significant"

      •  Line 459, suggest rephrasing to "bacterial pilus-mediated adhesion"

      •  Line 533- grammar needs correction

      •  Line 589- should be "sheds"

      •  Line 1106- should be "pellet"

      •  Lines 1223-1224 - is the antibody solution introduced into the inlet of the VOC for staining? Please clarify.

      •  Line 1295-unclear why Figure 2B is being referenced here

      All the suggested minor corrections have been taken into account in the revised manuscript.

      References

      (1) Gyohei Egawa, Satoshi Nakamizo, Yohei Natsuaki, Hiromi Doi, Yoshiki Miyachi, and Kenji Kabashima. Intravital analysis of vascular permeability in mice using two-photon microscopy. Scientific Reports, 3(1):1932, Jun 2013. ISSN 2045-2322. doi: 10.1038/srep01932.

      (2) Valeria Manriquez, Pierre Nivoit, Tomas Urbina, Hebert Echenique-Rivera, Keira Melican, Marie-Paule Fernandez-Gerlinger, Patricia Flamant, Taliah Schmitt, Patrick Bruneval, Dorian Obino, and Guillaume Duménil. Colonization of dermal arterioles by Neisseria meningitidis provides a safe haven from neutrophils. Nature Communications, 12(1):4547, Jul 2021. ISSN 2041-1723. doi: 10.1038/s41467-021-24797-z.

      (3) Katherine A. Rhodes, Man Cheong Ma, María A. Rendón, and Magdalene So. Neisseria genes required for persistence identified via in vivo screening of a transposon mutant library. PLOS Pathogens, 18(5):1–30, 05 2022. doi: 10.1371/journal.ppat.1010497.

      (4) Heli Uronen-Hansson, Liana Steeghs, Jennifer Allen, Garth L. J. Dixon, Mohamed Osman, Peter Van Der Ley, Simon Y. C. Wong, Robin Callard, and Nigel Klein. Human dendritic cell activation by Neisseria meningitidis: phagocytosis depends on expression of lipooligosaccharide (LOS) by the bacteria and is required for optimal cytokine production. Cellular Microbiology, 6(7):625–637, 2004. doi: 10.1111/j.1462-5822.2004.00387.x.

      (5) M. C. Jacobsen, P. J. Dusart, K. Kotowicz, M. Bajaj-Elliott, S. L. Hart, N. J. Klein, and G. L. Dixon. A critical role for ATF2 transcription factor in the regulation of E-selectin expression in response to non-endotoxin components of Neisseria meningitidis. Cellular Microbiology, 18(1):66–79, 2016. doi: 10.1111/cmi.12483.

      (6) Andrea Villwock, Corinna Schmitt, Stephanie Schielke, Matthias Frosch, and Oliver Kurzai. Recognition via the class A scavenger receptor modulates cytokine secretion by human dendritic cells after contact with Neisseria meningitidis. Microbes and Infection, 10(10):1158–1165, 2008. ISSN 1286-4579. doi: 10.1016/j.micinf.2008.06.009.

      (7) Audrey Varin, Subhankar Mukhopadhyay, Georges Herbein, and Siamon Gordon. Alternative activation of macrophages by IL-4 impairs phagocytosis of pathogens but potentiates microbial-induced signalling and cytokine secretion. Blood, 115(2):353–362, Jan 2010. ISSN 0006-4971. doi: 10.1182/blood-2009-08-236711.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The manuscript by Choi and colleagues investigates the impact of variation in cortical geometry and growth on cortical surface morphology. Specifically, the study uses physical gel models and computational models to evaluate the impact of varying specific features/parameters of the cortical surface. The study makes use of this approach to address the topic of malformations of cortical development and finds that cortical thickness and cortical expansion rate are the drivers of differences in morphogenesis.

      The study is composed of two main sections. First, the authors validate numerical simulation and gel model approaches against real cortical postnatal development in the ferret. Next, the study turns to modelling malformations in cortical development using modified tangential growth rate and cortical thickness parameters in numerical simulations. The findings investigate three genetically linked cortical malformations observed in the human brain to demonstrate the impact of the two physical parameters on folding in the ferret brain.

      This is a tightly presented study that demonstrates a key insight into cortical morphogenesis and the impact of deviations from normal development. The dual physical and computational modeling approach offers the potential for unique insights into mechanisms driving malformations. This study establishes a strong foundation for further work directly probing the development of cortical folding in the ferret brain. One weakness of the current study is that the interpretation of the results in the context of human cortical development is at present indirect, as the modelling results are solely derived from the ferret. However, these modelling approaches demonstrate proof of concept for investigating related alterations more directly in future work through similar approaches to models of the human cerebral cortex.

      We thank the reviewer for the very positive comments. While the current gel and organismal experiments focus on the ferret only, we want to emphasize that our analysis does consider previous observations of human brains and the morphologies therein (Tallinen et al., Proc. Natl. Acad. Sci. 2014; Tallinen et al., Nat. Phys. 2016), which we compare and explain. This allows us to broadly analyze the implications of our study for explaining cortical malformations in humans, with the ferret serving to motivate our study. Further analysis of normal human brain growth using computational and physical gel models can be found in our companion paper (Yin et al., 2025), now also published in eLife: S. Yin, C. Liu, G. P. T. Choi, Y. Jung, K. Heuer, R. Toro, L. Mahadevan, Morphogenesis and morphometry of brain folding patterns across species. eLife, 14, RP107138, 2025. doi:10.7554/eLife.107138

      In future work, we plan to obtain malformed human cortical surface data, which would allow us to further investigate related alterations more directly. We have added a remark on this in the revised manuscript (please see page 8–9).

      Reviewer 2 (Public review):

      Summary:

      Based on MRI data of the ferret (a gyrencephalic non-primate animal, in whom folding happens postnatally), the authors create in vitro physical gel models and in silico numerical simulations of typical cortical gyrification. They then use genetic manipulations of animal models to demonstrate that cortical thickness and expansion rate are primary drivers of atypical morphogenesis. These observations are then used to explain cortical malformations in humans.

      Strengths:

      The paper is very interesting and original, and combines physical gel experiments, numerical simulations, as well as observations in MCD. The figures are informative, and the results appear to have good overall face validity.

      We thank the reviewer for the very positive comments.

      Weaknesses:

      On the other hand, I perceived some lack of quantitative analyses in the different experiments, and currently, there seems to be rather a visual/qualitative interpretation of the different processes and their similarities/differences. Ideally, the authors would also quantify local/pointwise surface expansion in the physical and simulation experiments, to more directly compare these processes. Time courses of, e.g., cortical curvature changes could also be plotted and compared for those experiments. I had a similar impression about the comparisons between simulation results and human MRI data. Again, face validity appears high, but the comparison appeared mainly qualitative.

      We thank the reviewer for the comments. Besides the visual and qualitative comparisons between the models, we would like to point out that we have included a quantification of the shape difference between the real and simulated ferret brain models via spherical parameterization and the curvature-based shape index, as detailed in main text Fig. 4 and SI Section 3. We have also utilized spherical harmonics representations for the comparison between the real and simulated ferret brains at different maximum order N. In our revision, we have included further calculations comparing the real and simulated ferret brains at more time points in the SI (please see SI page 6). As for the comparison between the malformation simulation results and human MRI data in the current work, since the human MRI data are two-dimensional while our computational models are three-dimensional, we focus on the qualitative comparison between them. In future work, we plan to obtain malformed human cortical surface data, from which we can then perform the parameterization-based and curvature-based shape analysis for a more quantitative assessment.

      I felt that MCDs could have been better contextualized in the introduction.

      We thank the reviewer for the comment. In our revision, we have revised the description of MCDs in the introduction (please see page 2).

      Reviewer #1 (Recommendations for the authors):

      The study is beautifully presented and offers an excellent complement to the work presented by Yin et al. In its current form, the malformation portion of the study appears predominantly reliant on the numerical simulations rather than the gel model. It might be helpful, therefore, to further incorporate the results presented in Figure S5 into the main text, as this seems to be a clear application of the physical gel model to modelling malformations. Any additional use of the gel models in the malformation portion of the study would help to further justify the necessity and complementarity of the dual methodological approaches.

      We thank the reviewer for the suggestion. We have moved Fig. S5 and the associated description to the main text in the revised manuscript (please see the newly added Figure 5 on page 6 and the description on page 5–7). In particular, we have included a new section on the physical gel and computational models for ferret cortical malformations right before the section on the neurology of ferret and human cortical malformations.

      One additional consideration is that the analyses in the current study focus entirely on the ferret cortex. Given the emphasis in the title on the human brain, it may be worthwhile to either consider adding additional modelling of the human cortex or to consider modifying the title to more accurately align with the focus of the methods/results.

      We thank the reviewer for the suggestion. While the current gel and organismal experiments focus on the ferret only, we want to emphasize that our analysis does consider previous observations of human brains and the morphologies therein (Tallinen et al., Proc. Natl. Acad. Sci. 2014; Tallinen et al., Nat. Phys. 2016), which we compare and explain. This allows us to broadly analyze the implications of our study for explaining cortical malformations in humans, with the ferret serving to motivate our study. We therefore think the title of the paper is reasonable. To further highlight the connection between the ferret brain simulations and human brain growth, we have included an additional comparison between human brain surface reconstructions adapted from a prior study and the ferret simulation results in the SI (please see SI Section S4 and SI Fig. S5 on page 9–10).

      Two additional minor points:

      Table S1 seems sufficiently critical to the motivation for the study and organization of the results section to justify inclusion in the main text. Of course, I would leave any such minor changes to the discretion of the authors.

      We thank the reviewer for the suggestion. We have moved Table S1 and the associated description to the main text in the revised manuscript (please see Table 1 on page 7).

      Page 7, Column 1: “macacques” → “macaques”.

      We thank the reviewer for pointing out the typo. We have fixed it in the revised manuscript (please see page 8).

      Reviewer #2 (Recommendations for the authors):

      The methods lack details on the human MRI data and patients.

      We thank the reviewer for the comment. Note that the human MRI data and patients were from prior works (Smith et al., Neuron 2018; Johnson et al., Nature 2018; Akula et al., Proc. Natl. Acad. Sci. 2023) and were used for the discussion on cortical malformations in Fig. 6. In the revision, we have included a new subsection in the Methods section and provided more details and references of the MRI data and patients (please see page 9–10).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The statistically adequate way of testing the biases is a hierarchical regression model (LMM) with a distance of the physical location from the nipple as a predictor, and a distance of the reported location from the nipple as a dependent variable. Either variable can be unsigned or signed for greater power, for example, coding the lateral breast as negative and the medial breast as positive. The bias will show in regression coefficients smaller than 1.

      Thank you for this suggestion. We have subsequently replaced the relevant ANOVA analyses with LMM analyses. Specifically, we use an LMM for the breast and back separately to show the different effects of distance, then use a combined LMM to compare the interaction. Finally, we use an LMM to assess the differences between precision and bias on the back and breast. The new analyses confirm the earlier statements and do not change the results or the interpretation of the data.
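      As an illustrative sketch of the reviewer's suggested analysis (simulated data; a plain least-squares fit with numpy stands in for the full hierarchical LMM, and the 30% compression is a made-up bias magnitude), a bias toward the nipple appears as a regression coefficient below 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical physical distances of the stimulated locations from the nipple (cm).
physical = rng.uniform(0, 8, size=200)

# Simulated reports: distances compressed 30% toward the nipple, plus noise.
# Any bias toward the anchor shrinks reported distances relative to physical ones.
reported = 0.7 * physical + rng.normal(0, 0.5, size=200)

# Regress reported distance on physical distance; a slope below 1 signals
# the compressive bias the reviewer describes.
slope, intercept = np.polyfit(physical, reported, 1)
print(round(slope, 2))  # near the simulated 0.7, i.e. clearly below 1
```

      In the full analysis, participant would additionally enter as a random effect (e.g., via a mixed-model package) rather than being pooled as in this sketch.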

      Moreover, any bias towards the nipple could simply be another instance of regression to the mean of the stimulus distribution, given that the tested locations were centered on the nipple. This confound can only be experimentally solved by shifting the distribution of the tested locations. Finally, given that participants indicated the locations on a 3D model of the body part, further experimentation would be required to determine whether there is a perceptual bias towards the nipple or whether the authors merely find a response bias.

      A localization bias toward the nipple in this context does not show that the nipple is the anchor of the breast's tactile coordinate system. The result might simply be an instance of regression to the mean of the stimulus distribution (also known as experimental prior). To convincingly show localization biases towards the nipple, the tested locations should be centered at another location on the breast.

      Another problem is the visual salience of the nipple, even though Blender models were uniformly grey. With this type of direct localization, it is very difficult to distinguish perceptual from response biases even if the regression to the mean problem is solved. There are two solutions to this problem: 1) Varying the uncertainty of the tactile spatial information, for example, by using a pen that exerts lighter pressure. A perceptual bias should be stronger for more uncertain sensory information; a response bias should be the same across conditions. 2) Measure bias with a 2IFC procedure by taking advantage of the fact that sensory information is noisier if the test is presented before the standard.

      We believe that the fact that we explicitly tested two body regions with equally distributed test locations, both of which had landmarks, makes this unlikely. Indeed, testing on the back provides exactly the kind of control the reviewer suggests. It would also be impossible to test this "on another location on the breast", as we sampled across the whole breast. Moreover, as markers persisted on the model within each block, participants generated additional landmarks on each trial. Thus, if there were any regression to the mean, it should be observed for both regions. Nevertheless, we recognize that this test cannot distinguish a sensory bias toward the nipple from a consistent response bias that is always in the direction of the nipple, though to what extent these are the same thing is difficult to disentangle. That said, had we restricted testing to half of the breast so that the distribution of points was asymmetrical, this would have allowed us to test the hypothesis put forward by the reviewer. We recognize that this is a limitation of the data and have downplayed statements and added caveats accordingly.

      We have changed the appropriate heading and text in the discussion to downplay the finding:

      “Reports are biased towards the nipple”

      “suggesting that the nipple plays a pivotal role in the mental representation of the breast.”

      it might be harder to learn the range of locations on the back given that stimulation is not restricted to an anatomically defined region as it is the case for the breast.

      We apologize for any confusion but the point distribution is identical between tasks, as described in the methods.

      The stability of the JND differences between body parts across subjects is already captured in the analysis of the JNDs; the ANOVA and the post-hoc testing would not be significant if the order were not relatively stable across participants. Thus, it is unclear why this is being evaluated again with reduced power due to improper statistics.

      We apologize for any confusion here. Only one ANOVA with post-hoc testing was performed on the data. The second parenthetical describing the test was redundant and confusing, so we have removed it.

      “(Error! Reference source not found.A, B, 1-way ANOVA with Tukey’s HSD post-hoc t-test: p = 0.0284)”

      The null hypothesis of an ANOVA is that at least one of the mean values is different from the others; adding participants as a factor does not provide evidence for similarity.

      We agree with this statement and have removed the appropriate text.

      The pairwise correlations between body parts seem to be exploratory in nature. Like all exploratory analyses, the question arises of how much potential extra insights outweigh the risk of false positives. It would be hard to generate data with significant differences between several conditions and not find any correlations between pairs of conditions. Thus, the a priori chance of finding a significant correlation is much higher than what a correction accounts for.

      We broadly agree with this statement. However, we believe the analyses were important to determine whether participants were systematically more or less acute across body parts. Moreover, the facts that we did not observe any other significant relationships and that we performed post-hoc correction imply that no false positives were present. Indeed, for the one relationship that was observed, the assumed FDR would need to be over 10x higher than the existing post-hoc correction required, implying a true relationship.

      If the JND at mid breast (measured with locations centered at the nipple) is roughly the same size as the nipple, it is not surprising that participants have difficulty with the categorical localization task on the nipple but perform better than chance on the significantly larger areola.

      We agree that it is not surprising given the previously shown data; however, the initial finding is surprising to many, and this experiment serves to reinforce it.

      Neither signed nor absolute localization error can be compared to the results of the previous experiments. The JND should be roughly proportional to the variance of the errors.

      We apologize for any confusion; however, we are not comparing the values, merely observing that the results are consistent.

      Reviewer #2 (Public review):

      I had a hard time understanding some parts of the report. What is meant by "broadly no relationship" in line 137?

      We have removed the qualifier to simplify the text.

      It is suggested that spatial expansion (which is correlated with body part size) is related between medial breast and hand - is this to say that women with large hands have large medial breast size? Nipple size was measured, but hand size was not measured, is this correct?

      Correct. We have added text stating this.

      It is furthermore unclear how the authors differentiate medial breast and NAC. The sentence in lines 140-141 seems to imply the two terms are considered the same, as a conclusion about NAC is drawn from a result about the medial breast. This requires clarification.

      Thank you for catching this, we have corrected it in the text.

      Finally, given that the authors suspect that overall localization ability (or attention) may be overshadowed by a size effect, would not an analysis be adequate that integrates both, e.g. a regression with multiple predictors?

      If the reviewer means that participants would be consistently “acute” then we believe that SF1 would have stronger correlations. Consequently, we see no reason to add “overall tactile acuity” as a predictor.

      In the paragraph about testing quadrants of the nipple, it is stated that only 3 of 10 participants barely outperformed chance with a p < 0.01. It is unclear how a significant t-test is an indication of "barely above chance".

      We have adjusted the text to clarify our meaning.

      “On the nipple, however, participants were consistently worse at locating stimuli on the nipple than the breast (paired t-test, t = 3.42, p < 0.01) where only 3 of the 10 participants outperformed chance, though the group as a whole outperformed chance (Error! Reference source not found.B, 36% ± 13%; Z = 5.5, p < 0.01).”

      The final part of the paragraph on nipple quadrants (starting line 176) explains that there was a trend (4 of 10 participants) for lower tactile acuity being related to the inability to differentiate quadrants. It seems to me that such a result would not be expected: The stated hypothesis is that all participants have the same number of tactile sensors in their nipple and areola, independent of NAC size. In this section, participants determine the quadrant of a single touch. Theoretically, all participants should be equally able to perform this task, because they all have the same number of receptors in each quadrant of nipple and areola. Thus, the result in Figure 2C is curious.

      We agree that this result seemingly contradicts observations from the previous experiment; however, we believe it relates to the distinction between relative discrimination and absolute localization. In the first experiment, the presentation of two sequential points provides an implicit reference, whereas in the quadrant task there is no reference. With the results of the third experiment in mind, biases towards the nipple would effectively reduce the ability of participants to identify the quadrant. What this result may imply is that the degree of bias is greater for women with greater expansion. We have added text to the discussion to lay this out.

      “This negative trend implicitly contradicts the previous result where one might expect equal performance regardless of size as the location of the stimuli was scaled to the size of the nipple and areola. However, given the absence of a reference point, systematic biases are more likely to occur and thus may reflect a relationship between localization bias and breast size.”

      This section reports an Anova (line 193/194) with a factor "participant". This doesn't appear sensible. Please clarify. The factor distance is also unclear; is this a categorical or a continuous variable? Line 400 implies a 6-level factor, but Anovas and their factors, respectively, are not described in methods (nor are any of the other statistical approaches).

      We believe this comment has been addressed above with our replacement of the ANOVA with an LMM. We have also added descriptions of the analysis throughout the methods.

      The analysis on imprecision using mean pairwise error (line 199) is unclear: does pairwise refer to x/y or to touch vs. center of the nipple?

      We have clarified this to now read:

      “To measure the imprecision, we computed the mean pairwise distance between each of the reported locations for a given stimulus location and the mean reported location.”
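      A minimal sketch of this imprecision metric (hypothetical coordinates; numpy assumed, and `imprecision` is an illustrative name, not the authors' code):

```python
import numpy as np

def imprecision(reports: np.ndarray) -> float:
    """Mean distance of each reported location from the mean reported location.

    `reports` is an (n, 2) array of x/y coordinates collected for one
    stimulus location.
    """
    centroid = reports.mean(axis=0)
    return float(np.linalg.norm(reports - centroid, axis=1).mean())

# Four hypothetical reports (cm) for a single stimulus site.
reports = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(imprecision(reports))  # each corner lies sqrt(0.5) ≈ 0.71 from the centroid
```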

      p8, upper text, what is meant by "relative over-representation of the depth axis"? Does this refer to the breast having depth but the equivalent area on the back not having depth? What are the horizontal planes (probably meant to be singular?) - do you simply mean that depth was ignored for the calculation of errors? This seems to be implied in Figure 3AB.

      This is indeed what we meant. We have attempted to clarify in the text.

      “Importantly, given the relative over-representation of the depth axis for the breast, we only considered angles in the horizontal planes such that the shape of the breast did not influence the results.” Became:

      “Importantly, because the back is a relatively flat surface in comparison to the breast, errors were only computed in the horizontal plane and depth was excluded when computing the angular error.”
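      For concreteness, one way such a horizontal-plane angular error can be computed is sketched below (hypothetical coordinates; the convention that depth is the last component, and the function name itself, are assumptions of this sketch, not the authors' code):

```python
import numpy as np

def horizontal_angular_error(true_xyz, reported_xyz, origin_xyz):
    """Angle (degrees) between the true and reported directions from an origin,
    dropping the depth component (index 2) so surface shape does not enter."""
    v_true = (np.asarray(true_xyz) - np.asarray(origin_xyz))[:2]
    v_rep = (np.asarray(reported_xyz) - np.asarray(origin_xyz))[:2]
    cos = np.dot(v_true, v_rep) / (np.linalg.norm(v_true) * np.linalg.norm(v_rep))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

origin = [0.0, 0.0, 0.0]       # e.g. the nipple
true_pt = [1.0, 0.0, 0.3]      # depth components differ,
reported = [1.0, 1.0, -0.2]    # but only x/y enter the angle
print(horizontal_angular_error(true_pt, reported, origin))  # ≈ 45 degrees
```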

      Lines 232-241, I cannot follow the conclusions drawn here. First, it is not clear to a reader what the aim of the presented analyses is: what are you looking for when you analyze the vectors? Second, "vector strength" should be briefly explained in the main text. Third, it is not clear how the final conclusion is drawn. If there is a bias of all locations towards the nipple, then a point closer to the nipple cannot exhibit a large bias, because the nipple is close-by. Therefore, one would expect that points close to the nipple exhibit smaller errors, but this would not imply higher acuity - just less space for localizing anything. The higher acuity conclusion is at odds with the remaining results, isn't it: acuity is low on the outer breast, but even lower at the NAC, so why would it be high in between the two?

      Thank you for pointing out the circular logic. We have replaced this sentence with a more accurate statement.

      “Given these findings, we conclude that the breast has lower tactile acuity than the hand and is instead comparable to the back. Moreover, localization of tactile events to both the back and breast are inaccurate but localizations to the breast are consistently biased towards the nipple.”

      The discussion makes some concrete suggestions for sensors in implants (line 283). It is not clear how the stated numbers were computed. Also, why should 4 sensors nipple quadrants receive individual sensors if the result here was that participants cannot distinguish these quadrants?

      Thank you for catching this; it should have been 4 sensors for the NAC, not just the nipple. We have fixed this in the text.

      I would find it interesting to know whether participants with small breast measurement delta had breast acuity comparable to the back. Alternatively, it would be interesting to know whether breast and back acuity are comparable in men. Such a result would imply that the torso has uniform acuity overall, but any spatial extension of the breast is unaccounted for. The lowest single participant data points in Figure 1B appear similar, which might support this idea.

      We agree that this is an interesting question, and as you point out, the data do indicate that, in cases of minimal expansion, acuity may be constant on the torso. However, in the comparison of the JNDs, post-hoc testing revealed no significant difference between the back and either breast region. Consequently, subsampling the group would yield the same result. We have added a sentence to the discussion stating this.

      “Consequently, the acuity of the breast is likely determined initially by torso acuity and then any expansion.”

    1. IV. When taking notes, how do you ask yourself questions? In "How to take smart notes?", the author Sönke Ahrens offers four example questions: How does this idea relate to me? Can this phenomenon or fact be explained by something else? What does X mean for Y? How can I use this idea to explain Z (some topic)?

      After highlighting passages in a book, the next step can be to use the Obsidian note-taking system.

    1. Universities either are centers of independent and creative thought, or they are not.

      This would imply that in Latin America there are almost no universities! Barely a handful per country. An idealistic stance by the author.

    2. rapid changes and social pressure can lead to wrong decisions if the direction to take is not adequately reflected upon.

      A reflection of the rational character of university thought.

    3. in the face of budget cuts and government inaction beyond the dictates of the market, public universities have proven self-sufficient while the number of students grew, faculty accepted precarity, day-to-day operations were maintained despite deteriorating infrastructure and, above all, societies kept their trust.

      Self-sufficient to operate as they have operated, far from the ideal of the university the author has expressed earlier, which I think has significantly reduced society's trust in them.

    4. Universities and communities have grown in a symbiotic relationship that stems from the academic freedom guaranteed to universities through their autonomy.

      One can then expect that communities that have had serious growth problems will have universities with serious problems in their operation.

    1. Prognosis should be established before treatment is started and based on this prognosis your treatment plan should be done ...

      ① Prognosis should be established before treatment is started and based on this prognosis your treatment plan should be done

      Explanation:

      Prognosis predicts the likely course of the disease and its response to treatment.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This paper by Poverlein et al reports the substantial membrane deformation around the oxidative phosphorylation super complex, proposing that this deformation is a key part of super complex formation. I found the paper interesting and well-written.

      We thank the Reviewer for finding our work interesting. 

      Analysis of the bilayer curvature is challenging on the fine lengthscales they have used and produces unexpectedly large energies (Table 1). Additionally, the authors use the mean curvature (Eq. S5) as input to the (uncited, but it seems clear that this is Helfrich) Helfrich Hamiltonian (Eq. S7). If an errant factor of one half has been included with curvature, this would quarter the curvature energy compared to the real energy, due to the squared curvature.

      We thank the Reviewer for raising this important issue. We have now clarified in the SI and main manuscript that we employ the Helfrich model. In our initial implementation, we indeed used the mean curvature H, thereby missing a factor of 2. As the Reviewer correctly noted, this resulted in curvature deformation energies that were underestimated by a factor of ~4. We have now corrected for this effect in the revised analysis and in the updated Table 1. Importantly, however, this correction does not alter the general conclusion of our work that supercomplex formation relieves membrane strain and stabilizes the system. We have added a paragraph discussing the magnitude of the observed bending effects and comparing them with previous estimates in the literature:

      SI: 

      “The local mean curvature of the membrane midplane was computed using the Helfrich model (4,5) …”

      (4) W. Helfrich, Elastic properties of lipid bilayers theory and possible experiments. Zeitschrift für Naturforschung 28c, 693-703 (1973).

      (5) F. Campelo et al., Helfrich model of membrane bending: From Gibbs theory of liquid interfaces to membranes as thick anisotropic elastic layers. Advances in Colloid and Interface Science 208, 25-33 (2014).

      Main Text: 

      “which measures the energetic cost of deforming the membrane from a flat geometry (ΔG<sub>curv</sub>) based on the Helfrich model (45, 46). …

      Our analysis suggests that both contributions are substantially reduced upon formation of the SC, with the curvature penalty decreasing by 79.2 ± 5.2 kcal mol<sup>-1</sup> (for a membrane area of ca. 1000 nm<sup>2</sup>) and the thickness penalty by 2.8 ± 2.0 kcal mol<sup>-1</sup> (Table 1).”

      “We note that the magnitude of the estimated bending energies (~10² kcal mol<sup>-1</sup>) (Table 1), while seemingly high at first glance, falls within the range expected for large-scale membrane deformation processes induced by large multi-domain proteins. For example, the Piezo mechanosensitive channel performs roughly 150k<sub>B</sub>T (≈ 90 kcal mol⁻¹) of work to bend the bilayer into its dome-like shape (65). Comparable energies have also been estimated for the nucleation of small membrane pores (66), while vesicle formation typically requires bending energies on the order of 300 kcal mol<sup>-1</sup>, largely independent of vesicle size (67). When normalized by the affected membrane area (~1000 nm<sup>2</sup>), these values correspond to an energy density of approximately 0.1 kcal mol<sup>-1</sup> nm<sup>-2</sup>, which places our estimates within a biophysically reasonable regime. Notably, cryo-EM structures of several supercomplexes show that such assemblies can impose significant curvature on the surrounding bilayer (36, 50, 68), supporting the notion that respiratory chain organization is closely coupled to local membrane deformation. Nevertheless, we expect that the absolute deformation energies may be overestimated, as the continuum Helfrich model neglects molecular-level effects such as lipid tilt and local rearrangements, which can partially relax curvature stresses and reduce the effective bending penalty near protein–membrane interfaces (69, 70).”

      The bending modulus used (ca. 5 kcal/mol) is small on the scale of typically observed biological bending moduli. This suggests the curvature energies are indeed much higher even than the high values reported. Some of this may be due to the spontaneous curvature of the lipids and perhaps the effect of the protein modifying the nearby lipids properties.

      The SI initially included an incorrect value for the bending modulus (20 kJ mol<sup>-1</sup> instead of 20k<sub>B</sub>T), which has now been corrected. The revised value is consistent with experimentally reported bending moduli from X-ray scattering measurements, although there remains substantial uncertainty in the precise values across different experimental and computational studies.

      “The bending deformation energy was computed from the mean curvature field H(x,y), assuming a constant bilayer bending modulus κ (taken as 20k<sub>B</sub>T = 11.85 kcal mol<sup>-1</sup> (6)):”

      (6) S. Brown et al., Comparative analysis of bending moduli in one-component membranes via coarse-grained molecular dynamics simulations. Biophysical Journal 124, 1–13 (2025).
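      As a numeric sanity check on the corrected factor-of-2 bookkeeping described above, here is a minimal sketch of a discretized Helfrich bending-energy calculation. This is our own illustrative code, not the authors' analysis scripts: the grid, the uniform curvature value, and the patch area are made up, and zero spontaneous curvature is assumed.

```python
import numpy as np

# Helfrich bending energy for a mean-curvature field H(x, y):
#   E = (kappa/2) * integral (2H)^2 dA = 2 * kappa * sum(H^2) * dA
# Using H instead of 2H in (kappa/2)*c^2 underestimates E by a factor of ~4,
# exactly the error discussed in the response.

KB_KCAL = 0.0019872041                 # Boltzmann constant, kcal mol^-1 K^-1
kappa = 20 * KB_KCAL * 298.15          # bending modulus: 20 kBT at 298 K ~= 11.85 kcal mol^-1

def bending_energy(H, dA, kappa=kappa):
    """Discretized Helfrich bending energy (kcal/mol) for mean-curvature
    field H (nm^-1) on grid cells of area dA (nm^2), zero spontaneous curvature."""
    return 2.0 * kappa * np.sum(H**2) * dA

# A flat membrane costs nothing to deform:
H_flat = np.zeros((100, 100))
assert bending_energy(H_flat, dA=0.1) == 0.0

# Illustrative uniform curvature field over a ~1000 nm^2 patch:
H = np.full((100, 100), 0.05)            # nm^-1, made-up value
wrong = 0.5 * kappa * np.sum(H**2) * 0.1 # errant implementation using H
right = bending_energy(H, dA=0.1)        # corrected implementation using 2H
assert abs(right / wrong - 4.0) < 1e-9   # the ~4-fold underestimate
```

      The design choice here mirrors the response: the fix is not a new model, just replacing the curvature H by the total curvature 2H inside the quadratic Helfrich energy, which rescales all reported deformation energies by four.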

      It is unclear whether CDL supports SC formation by strongly stabilizing the membrane deformation or by acting as an electrostatic glue. While this is a weakness for a definitive quantification of the effect of CDL on SC formation, the study presents an interesting observation of CDL redistribution, which could be a productive topic for future work.

      We agree with the Reviewer that future studies will be important to investigate the relationship between CDL-induced stabilization of the membrane and its electrostatic effects.

      In summary, the qualitative data presented are interesting (especially the combination of molecular modeling with simpler Monte Carlo modeling aiding broader interpretation of the results). The energies of the membrane deformations are quite large. This might reflect the roles of specific lipids stabilizing those deformations, or the inherent difficulty in characterizing nanometer-scale curvature.

      We thank the Reviewer for appreciating our work and for the help in further improving our findings.

      Reviewer #3 (Public review):

      Summary:

      In this contribution, the authors report atomistic, coarse-grained and lattice simulations to analyze the mechanism of supercomplex (SC) formation in mitochondria. The results highlight the importance of membrane deformation as one of the major driving forces for the SC formation, which is not entirely surprising given prior work on membrane protein assembly, but certainly of major mechanistic significance for the specific systems of interest.

      We thank Reviewer 3 for appreciating the importance of our study. 

      Strengths:

      The combination of complementary approaches, including an interesting (re)analysis of cryo-EM data, is particularly powerful, and might be applicable to the analysis of related systems. The calculations also revealed that SC formation has interesting impacts on the structural and dynamical (motional correlation) properties of the individual protein components, suggesting further functional relevance of SC formation. In the revision, the authors further clarified and quantified their analysis of membrane responses, leading to further insights into membrane contributions. They have also toned down the decomposition of membrane contributions into enthalpic and entropic contributions, which is difficult to do. Overall, the study is rather thorough, highly creative and the impact on the field is expected to be significant.

      Weaknesses:

      Upon revision, I believe the weakness identified in previous work has been largely alleviated.

      We thank the Reviewer for their previous remarks, which allowed us to significantly improve our manuscript.

    1. aiming to augment their own experiences and through that ended up augmenting what the rest of humanity can do.

      augmenting what the rest of humanity can do

    1. Summary of "Les rendez-vous de la techno": The STI2D Track

      Abstract

      This document summarizes the information and testimonials presented during the "Les rendez-vous de la techno" event devoted to the Sciences et Technologies de l'Industrie et du Développement Durable (STI2D) track.

      STI2D positions itself as a path of scientific and technological excellence, designed for students who favor learning through practice, hands-on work, and concrete projects, in contrast to the more theoretical approach of the general track.

      It is aimed at creative profiles who enjoy group work, problem solving, and innovation.

      The curriculum is structured to provide solid grounding in science, technology, mathematics, and engineering, while developing an awareness of industrial and environmental issues.

      The pedagogy, centered on concrete projects such as designing a solar car or 3D-modeling castles, lets students apply their skills in a tangible way.

      STI2D stands out for the wide range of further studies it enables.

      It opens the way both to short programs (BTS, BUT) and to long, demanding paths leading to the highest qualifications (CPGE TSI preparatory classes, engineering schools, university bachelor's degrees).

      Testimonials from pupils and students confirm that the track is an effective springboard to success, including for students switching over from the general track, and that its graduates are sought after in many cutting-edge sectors.

      --------------------------------------------------------------------------------

      1. General Overview of the STI2D Track

      1.1. Target Audience and Student Profile

      The STI2D track is accessible after a general and technological "seconde" (10th-grade) year.

      It is particularly well suited to students with the following characteristics:

      Interest in technology and science: a strong taste for hands-on work, for understanding physical phenomena, and for implementing technical solutions.

      A practical, creative mindset: a desire to work in groups on projects, solve concrete problems, and show creativity and innovation.

      Ambition: the track attracts students considering careers as engineers or senior technicians.

      According to Mme Amarante, choosing this track suits a profile that "likes technology," is "rather creative," and "also likes solving problems and finding solutions."

      1.2. Skills and Knowledge Acquired

      The STI2D baccalaureate is presented as a "rather scientific technological bac" that builds solid and varied skills:

      Multidisciplinary knowledge: science, technology, mathematics, and engineering.

      Industrial and environmental competence: strong awareness of the challenges of modern industry and sustainable development.

      Design and innovation: development of creativity and the capacity to innovate.

      --------------------------------------------------------------------------------

      2. Structure of the Curriculum

      Teaching in STI2D is designed to make scientific concepts more accessible through experimentation and hands-on realization.

      2.1. Première (11th Grade)

      The goal is to let students who "struggle to understand" instruction in the abstract "get closer to hands-on work" and "understand phenomena in small groups."

      The program is built around two specialties:

      Ingénierie, Innovation et Développement Durable (I2D): acquisition of fundamental scientific knowledge across three domains: matter, energy, and information.

      Innovation Technologique (IT): application of the knowledge acquired in I2D through three concrete projects carried out during the year.

      2.2. Terminale (12th Grade)

      In terminale, the I2D specialty continues, supplemented by a choice among four specific concentrations. The year is marked by a 72-hour project covering study, analysis, design, simulation, and prototyping.

      The four concentrations are:

      Architecture et Construction (AC): deepens knowledge of materials and structures.

      Innovation Technologique et Éco-conception (ITEC): deepens knowledge of mechanical design and industrial design.

      Systèmes d'Information et Numérique (SIN): deepens knowledge of computing and digital systems.

      Énergie et Environnement (EE): deepens knowledge of managing, transporting, and storing energy.

      One multidisciplinary project cited as an example is the solar car, which mobilized three specialties:

      AC for the chassis design.

      EE for energy management (solar panels, storage, motor power supply).

      SIN for controlling and steering the car.

      --------------------------------------------------------------------------------

      3. Further Studies and Career Outcomes

      The STI2D track offers a wide range of options after the baccalaureate, letting students choose between short and long study paths.

      3.1. Overview of Post-Baccalaureate Options

      Short programs (Bac+2 / Bac+3):

      BTS (Brevet de Technicien Supérieur): BTS CIEL (computing and networks), BTS Électrotechnique, CPI, CPRP, CRSA.

      BUT (Bachelor Universitaire de Technologie): BUT Génie Civil Construction Durable, BUT Informatique, BUT Génie Industriel et Maintenance. Note that BUT programs reserve places for technological-track graduates.

      Long programs (Bac+5 and beyond):

      Classes Préparatoires aux Grandes Écoles (CPGE): Prépa TSI (Technologie et Sciences Industrielles), specifically intended for STI2D/STL graduates, and Prépa TPC (Technologie, Physique et Chimie).

      Engineering schools: direct entry via the GPI Polytech competitive exam for STI2D/STL, or after a CPGE or a BTS/BUT.

      University bachelor's degrees: computer science, mathematics, physics, engineering sciences.

      3.2. Data and Trends (Parcoursup, January 2025)

      Parcoursup data show an even split in the choices of STI2D graduates, with "as many young people heading to BTS as to BUT programs."

      A slightly smaller number of students go directly into preparatory classes, engineering schools, or university bachelor's programs.

      3.3. Sectors of Activity

      Graduates can enter a wide variety of sectors, many of which are "occupations in shortage":

      • Construction (BTP), architecture

      • Energy, electronics, environment

      • Audiovisual, computing, research and development

      • Cutting-edge sectors: aeronautics, rail, shipbuilding

      --------------------------------------------------------------------------------

      4. Testimonials and Hands-On Experience

      4.1. The Prototyping Workshop: A Concrete Demonstration

      A tour of the prototyping workshop was organized for seconde (10th-grade) students. Guided by M. René, they discovered:

      Complex fabricated machines: a racing car built on site that had taken part in a race in Albi.

      Rapid-prototyping technologies: plastic and metal 3D printers, as well as a laser-cutting machine.

      The demonstration highlighted how easy some machines are to use, embodying the school's "Fablab" spirit. One student was able to use the laser cutter to produce a part after only 10 minutes of instruction. The experience underscored the accessibility of the technology and the students' ability to "design and produce parts" quickly.

      4.2. Voices of Terminale STI2D Students

      The terminale students' testimonials illustrate the richness and diversity of the paths and projects within the track.

      Architecture et Construction (AC) specialty:

      Jade worked on modeling the wastewater pipes of a fictional town (Moeville) and wants to become an interior architect.

      Albin, who switched over from the general première, "doesn't regret" his choice "at all."

      He took part in a project to visit and 3D-model the château de Jaligny.

      He highlights the value of the track's more applied approach and is aiming for an architecture school or a BUT Génie Civil.

      Énergie et Environnement (EE) specialty:

      Tom chose this track for his "rather particular attraction to everything to do with energy" and the desire "to improve how society works in terms of energy."

      Although he intends to become a pilot, he "enjoys taking the classes."

      Innovation Technologique et Éco-conception (ITEC) specialty:

      Will chose ITEC because he had "really liked the innovation technology classes" in première.

      He is heading toward a computer science or cybersecurity school.

      Zoé, interested in design (automotive, space, fashion), finds the ITEC specialty a good all-round program where "you do a bit of everything."

      Systèmes d'Information et Numérique (SIN) specialty:

      Liam appreciates that in the technological track "there is more practice than theory" and "you work more often in class than at home."

      Martin chose STI2D to reach the SIN specialty with a view to a career in computing. He is "not disappointed" and is moving toward data science.

      4.3. Voices of Post-Baccalaureate Students

      BTS:

      Students in the BTS CPI (Conception de Produits Industriels) show how the paths complement each other: Chris comes from a general bac and sees it as "the continuation of the engineering-sciences subject," while Gauthier comes from an STI2D ITEC bac and was drawn by "the design we did in ITEC."

      Paul, in a BTS CPRP, preferred the BTS setting to the BUT for his plan of a career in military engineering.

      He notes that the mix of general and STI2D graduates is "rather complementary," with the former bringing theory (math, physics) and the latter bringing practice.

      Prépa TSI:

      Two students confirm that the prépa is the "best route to becoming an engineer."

      They describe a major change of pace compared with terminale: "It's quite a change from STI2D," "it's much more intense."

      The adjustment is eased, however, by a "good atmosphere" and "a lot of solidarity," particularly in the boarding house.

      --------------------------------------------------------------------------------

      5. Key Points and Resources

      5.1. Diversity and Representation

      It is noted that STI2D has "overall more boys than girls," while stressing that "it is also a track for girls."

      The presence of several young women among the witnesses (Jade, Zoé, Joyce) backs this up.

      5.2. Guidance Tools

      To help students plan their path, two digital resources available via "Mon Bureau Numérique" are highlighted:

      The Avenir platform: linked to ONISEP, it offers documentation, program fact sheets, and testimonials.

      Mon projet sup: a tool to help prepare the post-secondary guidance project during lycée, allowing students to target sectors of activity based on their skills and interests.

    1. Author response:

      Reviewer #1

      We agree that further clarification how elevated exercise disrupts blastema formation would strengthen the manuscript. Our data suggests a major contribution of proliferation. Exercise reduced the fraction of proliferative cells at 3 dpa, consistent with disrupted HA production and downstream Yap signaling. This interpretation aligns with prior studies showing that proliferation contributes to blastema establishment and is not restricted to the outgrowth phase of fin regeneration (Poleo et al, 2001; Poss et al, 2002; Wang et al, 2019; Pfefferli et al, 2014; Hou et al, 2020). We will explore additional experiments to reinforce these insights into the cellular mechanisms underlying exercise-disrupted blastema formation.

      We acknowledge that our analysis of ray branching abnormalities is limited in the current manuscript. We focus our study on introducing the zebrafish swimming and regeneration model and then characterizing ECM and signaling changes accounting for disrupted blastema establishment. For completeness, we included the observation of skeletal patterning defects (branching delays and bone fusions) but without detailed analysis. We note that decreased expression of shha and Shh-pathway components following early exercise corresponds with the branching defects. However, we recognize that exercise could have additional effects during the outgrowth phase, when branching morphogenesis actively occurs. Therefore, we will expand our discussion to outline future research directions related to exercise impacts on regenerative skeletal patterning.

      We will expand the Introduction and/or Discussion sections to provide more context on known HA roles across regeneration contexts, including in zebrafish fins. Finally, we will improve the text’s clarity and specificity throughout the manuscript, including to resolve or explain any apparent contradictions.

      Reviewer #2

      We appreciate the Reviewer's concern regarding the specificity of forced exercise as a model for mechanical loading. Forced exercise has been widely used in vivo to induce mechanical loading without the requirement for specialized implants or animal restraint, including in mouse (Wallace et al, 2015; Bomer et al, 2016), rat (Honda et al, 2003; Boerckel et al, 2011; Boerckel et al, 2012), and, most relevant to our study, zebrafish models (Fiaz et al, 2012; Fiaz et al, 2014; Suniaga et al, 2018). However, we will expand our discussion of this approach and ensure precise language distinguishing exercise from mechanical loading.

      We acknowledge the possibility that early shear stress disrupts the wound epidermis, which we will elaborate on in a revised Discussion. However, exercise-induced disruptions to the fin epidermis of early regenerates (1–2 dpa; Figure 2) typically resolve within one day, whereas fibroblast lineage cells still fail to establish a robust blastema. Therefore, sustained effects of mechanical loading and/or mechanosensation are likely major contributors to the observed regeneration phenotypes.

      We will explore whether HA acts as a general enhancer of fin regeneration by comparing blastemal HA supplementation vs. controls in non-exercised regenerating animals, if technically feasible. We will merge Figure S7 (HA supplementation) with Figure 5 (HA depletion) for clarity, as suggested.

      We will include a schematic and clear definitions for 'peripheral' and 'central' rays in a revised manuscript.

      Reviewer #3

      We included Hoechst and eosin fluorescent staining in the manuscript to show changes in tissue architecture following swimming exercise (Supplemental Figure 4). We will extend this histological analysis to include hematoxylin and eosin staining to provide additional tissue visualization.

      References

      Poleo G, Brown CW, Laforest L, Akimenko MA. Cell proliferation and movement during early fin regeneration in zebrafish. Dev Dyn. 2001 Aug;221(4):380-90.

      Poss KD, Nechiporuk A, Hillam AM, Johnson SL, Keating MT. Mps1 defines a proximal blastemal proliferative compartment essential for zebrafish fin regeneration. Development. 2002 Nov;129(22):5141-9.

      Wang YT, Tseng TL, Kuo YC, Yu JK, Su YH, Poss KD, Chen CH. Genetic Reprogramming of Positional Memory in a Regenerating Appendage. Curr Biol. 2019 Dec 16;29(24):4193-4207.e4.

      Pfefferli C, Müller F, Jaźwińska A, Wicky C. Specific NuRD components are required for fin regeneration in zebrafish. BMC Biol. 2014 Apr 29;12:30.

      Hou Y, Lee HJ, Chen Y, Ge J, Osman FOI, McAdow AR, Mokalled MH, Johnson SL, Zhao G, Wang T. Cellular diversity of the regenerating caudal fin. Sci Adv. 2020 Aug 12;6(33):eaba2084.

      Wallace IJ, Judex S, Demes B. Effects of load-bearing exercise on skeletal structure and mechanics differ between outbred populations of mice. Bone. 2015 Mar;72:1-8.

      Bomer N, Cornelis FM, Ramos YF, den Hollander W, Storms L, van der Breggen R, Lakenberg N, Slagboom PE, Meulenbelt I, Lories RJ. The effect of forced exercise on knee joints in Dio2(-/-) mice: type II iodothyronine deiodinase-deficient mice are less prone to develop OA-like cartilage damage upon excessive mechanical stress. Ann Rheum Dis. 2016 Mar;75(3):571-7.

      Honda A, Sogo N, Nagasawa S, Shimizu T, Umemura Y. High-impact exercise strengthens bone in osteopenic ovariectomized rats with the same outcome as Sham rats. J Appl Physiol (1985). 2003 Sep;95(3):1032-7.

      Boerckel JD, Kolambkar YM, Stevens HY, Lin AS, Dupont KM, Guldberg RE. Effects of in vivo mechanical loading on large bone defect regeneration. J Orthop Res. 2012 Jul;30(7):1067-75.

      Boerckel JD, Uhrig BA, Willett NJ, Huebsch N, Guldberg RE. Mechanical regulation of vascular growth and tissue regeneration in vivo. Proc Natl Acad Sci U S A. 2011 Sep 13;108(37):E674-80.

      Fiaz AW, Léon-Kloosterziel KM, Gort G, Schulte-Merker S, van Leeuwen JL, Kranenbarg S. Swim-training changes the spatio-temporal dynamics of skeletogenesis in zebrafish larvae (Danio rerio). PLoS One. 2012;7(4):e34072.

      Fiaz AW, Léon‐Kloosterziel KM, van Leeuwen JL, Kranenbarg S. Exploring the molecular link between swim‐training and caudal fin development in zebrafish (Danio rerio) larvae. Journal of Applied Ichthyology. 2014 Aug;30(4):753-61.

      Suniaga S, Rolvien T, Vom Scheidt A, Fiedler IAK, Bale HA, Huysseune A, Witten PE, Amling M, Busse B. Increased mechanical loading through controlled swimming exercise induces bone formation and mineralization in adult zebrafish. Sci Rep. 2018 Feb 26;8(1):3646.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study extends the previous interesting work of this group to address the potentially differential control of movement and posture. Their earlier work explored a broad range of data to make the case for a downstream neural integrator hypothesized to convert descending velocity movement commands into postural holding commands. Included in that data were observations from people with hemiparesis due to stroke. The current study uses similar data, but pushes into a different, but closely related direction, suggesting that these data may address the independence of these two fundamental components of motor control. I find the logic laid out in the second sentence of the abstract ("The paretic arm after stroke is notable for abnormalities both at rest and during movement, thus it provides an opportunity to address the relationships between control of reaching, stopping, and stabilizing") less than compelling, but the study does make some interesting observations. Foremost among them is the relation between the resting force postural bias and the effect of force perturbations during the target hold periods, but not during movement. While this interesting observation is consistent with the central mechanism the authors suggest, it seems hard to me to rule out other mechanisms, including peripheral ones. These limitations should be discussed.

      Thank you for summarizing our work. Note we have improved the logic in our abstract (…”providing an opportunity to ask whether control of these behaviors is independently affected in stroke”) based on your comments as outlined in our previous revision. We now extensively discuss limitations and potential alternative mechanisms in greater detail, in a dedicated section (lines 846-895; see response to reviewer 2 for further details).

      Reviewer #2 (Public review):

      Summary:

      Here the authors address the idea that postural and movement control are differentially impacted with stroke. Specifically, they examined whether resting postural forces influenced several metrics of sensorimotor control (e.g., initial reach angle, maximum lateral hand deviation following a perturbation, etc.) during movement or posture. The authors found that resting postural forces influenced control only following the posture perturbation for the paretic arm of stroke patients, but not during movement. They also found that resting postural forces were greater when the arm was unsupported, which correlated with abnormal synergies (as assessed by the Fugl-Meyer). The authors suggest that these findings can be explained by the idea that the neural circuitry associated with posture is relatively more impacted by stroke than the neural circuitry associated with movement. They also propose a conceptual model that differentially weights the reticulospinal tract (RST) and corticospinal tract (CST) to explain greater relative impairments with posture control relative to movement control, due to abnormal synergies, in those with stroke.

      Thank you for the brief but comprehensive summary. We would like to clarify one point: we do not suggest that our findings are necessarily due to the neural circuitry associated with posture being more impacted than the neural circuitry associated with movement (rather, our conceptual model suggests that increased outflow through the (ipsilateral) RST, involved in posture, compensates for CST damage, at the expense of posture abnormalities spilling over into movement). Instead, we suggest that the neural circuitry for posture vs. movement control remains relatively separate in stroke, with impairments in posture control not substantially explaining impairments in movement control.

      Comments on revisions:

      The authors should be commended for being very responsive to comments and providing several further requested analyses, which have improved the paper. However, there is still some outstanding issues that make it difficult to fully support the provided interpretation.

      Thank you for appreciating our response to your earlier comments. We address the outstanding issues below.

      The authors say within the response, "We would also like to stress that these perturbations were not designed so that responses are directly compared to each other ***(though of course there is an *indirect* comparison in the sense that we show influence of biases in one type of perturbation but not the other)***." They then state in the first paragraph of the discussion that "Remarkably, these resting postural force biases did not seem to have a detectable effect upon any component of active reaching but only emerged during the control of holding still after the movement ended. The results suggest a dissociation between the control of movement and posture." The main issue here is relying on indirect comparisons (i.e., significant in one situation but not the other), instead of relying on direct comparisons. Using a well-known example, just because one group / condition might display a significant linear relationship (i.e., slope_1 > 0) and another group / condition does not (slope_2 = 0), does not necessarily mean that the two groups / conditions are statistically different from one another [see Figure 1 in Makin, T. R., & Orban de Xivry, J. J. (2019). Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife, 8, e48175.].

      We agree and are well aware of the limitation posed by an indirect comparison – hence the language we used to comment on the data (“did not seem”, “suggest”, etc.). To address this limitation, we performed a more direct comparison of how the two types of perturbations (moving vs. holding) interact with resting biases. For this comparison, we calculated a Response Asymmetry Index (RAI):

      Above, 𝑟<sub>𝐴</sub> is the response in the direction where the resting bias is most aligned with the perturbation, and 𝑟<sub>𝑂</sub> is the response in the direction where the resting bias is most opposed to the perturbation.

      We calculated RAIs for two response metrics used for both moving and holding perturbations: maximum deviation and time to stabilization/settling time. For these two response metrics, positive RAIs indicate an asymmetry in line with an effect of resting bias.

      The idea behind the RAI is that, while the magnitude of responses may well differ between the two types of perturbations, this will be accounted for by the ratio used to calculate the asymmetry. The same approach has been used to assess symmetry/laterality across a variety of different modalities, such as gait asymmetry (Robinson et al., 1987), the relative fMRI activity in the contralateral vs. ipsilateral sensorimotor cortex while performing a motor task (Cramer et al., 1997), or the relative strength of ipsilateral vs. contralateral responses to transcranial magnetic stimulation (McPherson et al., 2018). Notably, the normalization also addresses potential differences in overall stiffness between holding vs. moving perturbations, which would similarly affect aligned and opposing cases (see our response to your following point).
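A minimal sketch of this computation follows. The RAI expression itself is not written out above, so the normalized form RAI = (r_A − r_O) / (r_A + r_O), standard for the symmetry/laterality indices cited, is an assumption here, as are the example values:

```python
import numpy as np

def response_asymmetry_index(r_aligned, r_opposed):
    """Normalized asymmetry between responses to perturbations in the
    bias-aligned vs. bias-opposed directions. Positive values indicate
    larger responses when the resting bias aligns with the perturbation."""
    r_a = np.asarray(r_aligned, dtype=float)
    r_o = np.asarray(r_opposed, dtype=float)
    return (r_a - r_o) / (r_a + r_o)

# Illustrative per-participant maximum deviations (not the study's data):
rai = response_asymmetry_index([2.1, 1.8, 2.5], [1.5, 1.8, 1.9])
```

Under this form, the RAI of each participant is bounded in [-1, 1] for positive responses, so the normalization absorbs overall differences in response magnitude (e.g., stiffness) between perturbation types.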

      Figure 8 shows RAIs we obtained for holding (red) vs. moving/pulse (blue) perturbations. For the maximum deviation (left), there is more asymmetry for the holding case, though the p-value is marginal (p=0.088), likely due to the large variability in the pulse case (individual values shown as black dots). For time to stabilization/settling time (right), the difference is significant (p=0.0048). Together, these analyses indicate that resting biases interact substantially more with holding than with movement control, in line with a relative independence between these two control modalities. We now include this panel as Figure 8, and describe it in Results (lines 587-611).
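The specific test behind these p-values is not named here; as a hedged illustration of one standard option for such a paired comparison, a sign-flip permutation test on per-participant RAI differences (all values below are made up) could look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def paired_permutation_pvalue(a, b, n_perm=10_000):
    """Two-sided p-value for the mean paired difference between two
    conditions, via random sign flips of the per-participant differences."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = abs(d.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))
    null = np.abs((signs * d).mean(axis=1))
    # Add-one correction keeps the p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (n_perm + 1)

# Illustrative per-participant RAIs for the two perturbation types:
rai_hold = [0.30, 0.25, 0.40, 0.35, 0.28, 0.33]
rai_pulse = [0.05, -0.10, 0.12, 0.00, 0.08, -0.03]
p = paired_permutation_pvalue(rai_hold, rai_pulse)
```

A permutation test of this kind makes no normality assumption, which suits small per-participant samples; the actual analysis in the letter may of course have used a different procedure.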

      Note that even a direct comparison does not prove that resting biases and active movement control are perfectly independent. We now discuss these issues in more depth, in the new Limitations section suggested by the Reviewer (lines 836-849).

      The authors have provided reasonable rationale for why they chose certain perturbation waveforms for the different conditions. Yet it still holds that these different waveforms would likely yield very different muscular responses, making it difficult to interpret the results, and this remains a limitation. From the paper it is unknown how these different perturbations would differentially influence a variety of classic neuromuscular responses, including short-range stiffness and stretch reflexes, which would be at play here.

      Many of the results can be interpreted when one considers classic neuromuscular physiology. In Experiment 1, differences in resting postural bias in supported versus unsupported conditions can readily be explained, since there is greater muscle activity in the unsupported condition, which leads to greater muscle stiffness to resist mechanical perturbations (Rack, P. M., & Westbury, D. R. (1974). The short-range stiffness of active mammalian muscle and its effect on mechanical properties. The Journal of Physiology, 240(2), 331-350.). Likewise, muscle stiffness would scale with changes in muscle contraction with synergies. Importantly for Experiment 2, muscle stiffness is reduced during movement (Rack and Westbury, 1974), which may explain why resting postural biases do not seem to be impacting movement. Likewise, muscle spindle activity has been shown to scale with extrafusal muscle fiber activity and forces acting through the tendon (Blum, K. P., Campbell, K. S., Horslen, B. C., Nardelli, P., Housley, S. N., Cope, T. C., & Ting, L. H. (2020). Diverse and complex muscle spindle afferent firing properties emerge from multiscale muscle mechanics. eLife, 9, e55177.). The concern here is that the authors have not sufficiently considered muscle neurophysiology, how it might relate to their findings, and how it might impact their interpretation. Given the differences in perturbations and muscle states at different phases, the concern is that it is not possible to disentangle whether the results are due to classic neurophysiology, the hypothesis they propose, or both. Can the authors please comment?

      It is possible that neuromuscular physiology explains part of our results. However, this would not contradict our conceptual model.

      Regarding Experiment 1, it is possible that stiffness scales with changes in background muscle contraction, as the reviewer suggests. Indeed, Bennett et al. (1992) used brief perturbations of the wrist to assess elbow stiffness, finding that, during movement, stiffness was increased in positions with a higher gravity load (and, in general, in positions where the net muscle torque was higher). However, during posture maintenance (as in our Experiment 1), they found that stiffness did not vary with (elbow) position or gravity load (two characteristics of our findings in Experiment 1):

      “The observed stiffness variation was not simply due to passive tissue or other joint angle dependent properties, as stiffnesses measured during posture were position invariant. Note that the minimum stiffness found in posture was higher than the peak stiffness measured during movement, and did not change much with the gravity load.” (illustrated in Fig. 5 of that paper)

      We thus find it very unlikely that stiffness explains the difference between the supported vs. unsupported conditions in Experiment 1.

      Even if stiffness modulation between the supported vs. unsupported conditions could explain our finding of stronger posture biases in the latter case, it would not be incompatible with our interpretation of increased RST drive: increased stiffness would potentially magnify the effects of the RST drive that we propose underlies these resting biases. It is possible that the increase in resting biases under conditions of increased muscle contraction (lack of arm support) is mediated through an increase in muscle stiffness. In other words, the increase in resting biases may not directly reflect additional RST outflow per se, but rather the scaling, through stiffness, of the same magnitude of RST outflow. Understanding this interaction was beyond the scope of our experimental design; in line with this, we briefly comment on it in our Limitations section.

      Regarding Experiment 2, stiffness has indeed been shown to be lower during movement, and we now comment on the potential effect of this on our results in the “Limitations” section (lines 815-830, replicated below). Importantly, for the case of holding perturbations, the increased stiffness associated with holding would increase resistance to both extension- and flexion-inducing perturbations. Thus, higher stiffness would be unlikely to explain our finding whereby resting biases resist or aggravate the effects of holding perturbations depending on perturbation direction. In addition, the framework in Blum et al., which describes how interactions between alpha and gamma drive can explain muscle activity patterns, does not rule out central neural control of stiffness: “muscle spindles have a unique muscle-within-muscle design such that their firing depends critically on both peripheral and central factors” (emphasis ours). It may be, for example, that gamma motoneurons controlling muscle spindles and stiffness are modulated by input from the reticular formation, making this a mechanism in line with our conceptual model.

      “Moreover, it has been shown that joint stiffness is reduced during movement compared to holding control (Rack and Westbury, 1974; Bennett et al., 1992). Along similar lines, muscle spindle activity – which may modulate stiffness – scales with extrafusal muscle fiber activity (such as muscle exertion involved in holding) and forces acting through the tendon (Blum et al., 2020). Such observations could, in principle, explain why we were unable to detect a relationship between resting biases and active movement control but we readily found a relationship between resting biases and active holding control: reduced joint stiffness during movement could scale down the influence of resting abnormalities. There are two issues with this explanation, however. First, it is debatable whether this should be considered an alternative explanation per se: stiffness modulation could be, in total or in part, the manifestation of a central movement/posture CST/RST mechanism similar to the one we propose in our conceptual model. For example, (Blum et al., 2020) argue that muscle spindle firing depends on both peripheral and central factors. Second, increased stiffness would not necessarily help detect differences in how active postural control responds to within-resting-posture vs. out-of-resting-posture perturbations. This is because an overall increase in stiffness would likely increase resistance to perturbations in any direction.”

      The authors should provide a limitations paragraph. They should address 1) how they used different perturbation force profiles, 2) the muscles were in different states which would change neuromuscular responses between trial phase / condition, 3) discuss a lack of direct statistical comparisons that support their hypothesis, and 4) provide a couple of paragraphs on classic neurophysiology, such as muscle stiffness and stretch reflexes, and how these various factors could influence the findings (i.e., whether they can disentangle whether the reported results are due to classic neurophysiology, the hypothesis they propose, or both).

      Thank you for your suggestion. We now discuss these points in a separate paragraph (lines 846-895), bringing together our previous discussion on stretch reflexes, our description of different perturbation types, and the additional issues raised by the reviewer above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have responded well to all my concerns, save two minor points.

      Figure 2 appears to be unchanged, although they describe appropriate changes in the response letter.

      Thank you for catching this error – we now include the updated figure (further updated to use the terms near/distant in place of proximal/distal).

      I still take issue with the use of proximal and distal to describe the locations of targets. Taking definitions somewhat randomly from the internet, "The terms proximal and distal are used in structures that are considered to have a beginning and an end," and "Proximal and distal are anatomical terms used to describe the position of a body part in relation to another part or its origin." In any case, the hand does not become proximal just because you bring it to your chest. Why not simply stick to the common and clearly defined terms "near" and "distant"?

      Point taken. We have updated the paper to use the terms near/distant.

      Additional changes/corrections not outlined above

      We now include a link to the data and code supporting our findings (https://osf.io/hufy8/). In addition, we made several minor edits throughout the text to improve readability, and corrected occasional mislabeling of CCW and CW pulse data. Note that this correction did not alter the (lack of) relationship between resting biases and responses to perturbations during active movement.

      Response letter references

      Bennett D, Hollerbach J, Xu Y, Hunter I (1992) Time-varying stiffness of human elbow joint during cyclic voluntary movement. Exp Brain Res 88:433–442.

      Blum KP, Campbell KS, Horslen BC, Nardelli P, Housley SN, Cope TC, Ting LH (2020) Diverse and complex muscle spindle afferent firing properties emerge from multiscale muscle mechanics. Elife 9:e55177.

      Cramer SC, Nelles G, Benson RR, Kaplan JD, Parker RA, Kwong KK, Kennedy DN, Finklestein SP, Rosen BR (1997) A functional MRI study of subjects recovered from hemiparetic stroke. Stroke 28:2518–2527.

      McPherson JG, Chen A, Ellis MD, Yao J, Heckman C, Dewald JP (2018) Progressive recruitment of contralesional cortico-reticulospinal pathways drives motor impairment post stroke. J Physiol 596:1211–1225 Available at: https://doi.org/10.1113/JP274968.

      Rack PM, Westbury D (1974) The short range stiffness of active mammalian muscle and its effect on mechanical properties. J Physiol 240:331–350.

      Robinson R, Herzog W, Nigg BM (1987) Use of force platform variables to quantify the effects of chiropractic manipulation on gait symmetry. J Manipulative Physiol Ther 10:172–176.

      Williams PE, Goldspink G (1973) The effect of immobilization on the longitudinal growth of striated muscle fibres. J Anat 116:45.

    1. Synthesis of the Documentary "Ça baigne"

      Overview

      This document offers a synthetic analysis of the key themes and events presented in the documentary "Ça baigne", which centers on life at a middle school and the challenges facing its teaching staff.

      The narrative thread is the case of Sarah, a student in academic and behavioral free fall, whose fate is examined at a disciplinary hearing.

      The documentary highlights the tension between the need to sanction and the desire to support a student in distress, compounded by an extremely difficult family situation.

      It explores the strategies put in place by the school (suspended expulsion, a change of class, tutoring by a peer) and the contrasting reactions of the teaching staff, who waver between weariness and commitment.

      Despite intense mobilization, Sarah's path remains precarious, illustrating the complexity of the fight against academic failure.

      In parallel, the discovery of an anonymous distress message in the school's toilets points to a broader adolescent malaise that extends beyond Sarah's case alone.

      In-Depth Analysis of the Main Themes

      The Case of Sarah: Between Personal Crisis and Academic Disengagement

      The documentary is built around the case of Sarah, a student whose situation has reached a critical point, prompting a disciplinary hearing.

      The Disciplinary Hearing

      The hearing is convened because of the rapid and severe deterioration of Sarah's academic situation. The key facts are:

      Academic Disengagement: Sarah is described as being in "complete academic disengagement", with grades in "free fall".

      Only five of her second-term marks are above average, whereas the first term featured "several 20/20s".

      Behavioral Problems: She is repeatedly described as insolent and disruptive. One teacher testifies: "she always remained insolent and you disrupted the lessons".

      Sarah's Commitment: Facing the hearing, Sarah expresses her wish to stay at the school ("I don't want to change schools, I'm happy here") and commits to "starting over from scratch" and offering apologies.

      The Complex Family Situation

      A central element, though handled discreetly at the father's request, is Sarah's family context.

      The Father's Refusal: Sarah's father explicitly refuses to let the family situation be used to excuse his daughter's behavior: "no, no, I don't want it to play in her favor (...) it has nothing to do with it".

      The Context Revealed by the Principal: In the family's absence, the principal describes the situation to the members of the hearing: the father is raising his children alone and spends much of his days and evenings at the hospital at the bedside of Sarah's little sister.

      His routine is described as follows: up at 6 a.m., at the hospital until 10:30 a.m., at work from 11 a.m. to 6 or 7 p.m., then back at the hospital until 10 or 11 p.m.

      The Decision: The "Last Lifeline"

      The disciplinary hearing opts for a sanction that combines firmness with support.

      Sanction: The decision is a "permanent expulsion", suspended pending good behavior.

      Warning: The principal is very clear with Sarah: "know that this is the last lifeline; there won't be any others. If you slip up, nothing more can be done".

      Support Measures: A plan is put in place, including a new class, a new teaching team, and the appointment of a tutor.

      Tensions and Strategies Within the Teaching Team

      Sarah's case reveals diverging approaches and fatigue within the educational team.

      The Principal's Role

      The principal acts as mediator and protector, actively seeking a solution to "save Sarah's skin". He directly confronts the most reluctant teachers.

      Negotiating with the Teachers: He asks a skeptical teacher to "raise your tolerance threshold".

      He warns her against forming a "league" against the student, asserting that "anyone is capable of making her snap in 15 seconds".

      Commitment to Support: He is the main architect of the last-chance solution, despite the surrounding skepticism.

      The Teachers' Frustration

      The teachers express weariness and a sense of powerlessness in the face of Sarah's behavior.

      Saturation: One teacher states: "I've written at least 15 reports on her, but now I don't write any more because, well, it's pointless".

      Conflicting Pedagogical Priorities: Another teacher sums up the dilemma: Sarah is "an intelligent student who prevents me from working with the other students".

      Open Hostility: The principal mentions that a colleague, Madame Petite, "wants her out, she said so clearly".

      The Debate over Follow-Up Measures

      Setting up Sarah's follow-up sparks debate. The proposal of a "monitoring sheet" is immediately rejected by one member of the team: "she's used up her chances: you come to work, you don't work, you make a mess, you're out the door".

      This reflects a determination to give the student no further leeway.

      The Implementation of the Support Plan and Its Limits

      The documentary follows Sarah's first steps in her new environment, revealing both progress and relapses.

      The Altercation with Marine

      An incident in the corridor with a supervisor, Marine, serves as a test.

      The Conflict: Sarah, waiting for a teacher without a pass, is ordered by Marine to go out into the yard. The exchange escalates.

      A New Reaction: Instead of exploding, Sarah holds back and seeks help from the supervisory staff.

      This change is noted as significant progress. The principal tells her: "what's positive is that you didn't lose your temper. A week ago you would have told Marine (...) to get lost, or maybe worse".

      The Reminder of the Rules: Her tutor and the principal nevertheless point out her initial mistake: without a pass, she should have obeyed the supervisor's order.

      Peer Tutoring (Lydia)

      The tutoring system is a key element of the plan.

      Lydia's Role: Lydia, another student, is tasked with looking after Sarah.

      She takes the mission seriously; it consists of making sure that Sarah "behaves properly" and "catches up on her history lesson".

      Tutoring Assessment: Lydia judges that overall "it's going okay", but admits that her constant presence annoys Sarah: "it bugs her a bit".

      She reveals that she wants to become a police superintendent, which sheds light on her interest in this supervisory role.

      The Final Failure

      The documentary ends on a pessimistic note. The staff discover that Sarah has missed her SVT (life and earth sciences) class.

      The conclusion is abrupt: "she went out, she didn't go to her SVT class; but where was she? Outside; she left, so, well, so, so".

      This event suggests a return to old habits and casts doubt on the success of the plan.

      The Mystery of the Distress Message

      Alongside Sarah's case, a secondary storyline highlights the potential distress of other students.

      The Discovery: A message is found in the toilets: "help me please, I'm suffering too much (...) I want to die; if you want to help me, put a cross".

      The Staff's Investigation: The staff try to decipher the author's initials ("DK" or "PK") and the meaning of a reply ("why" followed by a cross), demonstrating their vigilance.

      Hypotheses: They put forward theories, mentioning the case of another student who is "unhappy that her parents are going to place her in a care home".

      This event serves as a reminder that psychological distress is a broader reality within the school.

      Key Quotes

      | Speaker | Quote | Context |
      | --- | --- | --- |
      | The principal | "I'd like us to save that kid's skin." | Expressing his determination not to give up on Sarah before the disciplinary hearing. |
      | A teacher | "Nobody has anything against her, but everyone wants to put her out." | Summing up the paradox of Sarah's situation and the staff's weariness. |
      | Sarah's father | "No, no, I don't want it to play in her favor; no, we don't want that taken into consideration." | At the disciplinary hearing, refusing to let his family situation serve as an excuse. |
      | Sarah | "Well, first, to start over from scratch, and if I have to write letters of apology to the teachers I've wronged, well, I'll do it." | Her commitment made at the disciplinary hearing. |
      | The principal | "Know that this is the last lifeline; there won't be any others." | Final warning to Sarah after the suspended-expulsion decision. |
      | A teacher | "She's used up her chances: you come to work, you don't work, you make a mess, you're out the door." | Reaction to the monitoring sheet, marking a hardening of attitudes. |
      | Anonymous message | "help me please, I'm suffering too much, please help me, I want to die" | Distress message discovered in the school's toilets. |

    1. Reviewer #2 (Public review):

      This study aims to disentangle the contribution of sensory and motor processes (mapped onto the inverse and forward components of speech motor control models like DIVA) to production changes as a result of altered auditory feedback. After five experiments, the authors conclude that it is the motor compensation on the previous trial, and not the sensory error, that drives compensatory responses in subsequent trials.

      Assessment:

      The goal of this paper is great, and the question is timely. Quite a bit of work has gone into the study, and the technical aspects are sound. That said, I just don't understand how the current design can accomplish what the authors have set as their goal. This may, of course, be a misunderstanding on my part, so I'll try to explain my confusion below. If it is indeed my mistake, then I encourage the authors to dedicate some space to unpacking the logic in the Introduction, which is currently barely over a page long. They should take some time to lay out the logic of the experimental design and the dependent and independent variables, and how this design disentangles sensory and motor influences. Then clearly discuss the opposing predictions supporting sensory-driven vs. motor-driven changes. Given that I currently don't understand the logic and, consequently, the claims, I will focus my review on major points for now.

      Main issues

      (1) Measuring sensory change. As acknowledged by the authors, making a motor correction as a function of altered auditory feedback is an interactive process between sensory and motor systems. However, one could still ask whether it is primarily a change to perception vs. a change to production that is driving the motor correction. But to do this, one has to have two sets of measurements: (a) perceptual change, and (b) motor change. As far as I understand, the study has the latter (i.e., C), but not the former. Instead, the magnitude of perceptual change is estimated through the proxy of the magnitude of perturbation (P), but the two are not the same; P is a physical manipulation; perceptual change is a psychological response to that physical manipulation. It is theoretically possible that a physical change does not cause a psychological change, or that the magnitude of the two does not match. So my first confusion centers on the absence of any measure of sensory change in this study.

      To give an explicit example of what I mean, consider a study like Murphy, Nozari, and Holt (2024; Psychonomic Bulletin & Review). This work is about changes to production as a function of exposure to other talkers' acoustic properties - rather than your own altered feedback - but the idea is that the same sensory-motor loop is involved in both. When changing the acoustic properties of the input, the authors obtain two separate measures: (a) how listeners' perception changes as a function of this physical change in the acoustics of the auditory signal, and (b) how their production changes. This allows the authors to identify motor changes above and beyond perceptual changes. Perhaps making a direct comparison with this study would help the reader understand the parallels better.

      (2) A more fundamental issue for me is a theoretical one: Isn't a compensatory motor change ALWAYS a consequence of a perceptual change? I think it makes sense to ask, "Does a motor compensation hinge on a previous motor action, or is sensory change enough to drive motor compensation?" This question has been asked for changed acoustics for self-produced speech (e.g., Hantzsch, Parrell, & Niziolek, 2022) and other-produced speech (Murphy, Holt, & Nozari, 2025), and in both cases, the answer has been that sensory changes alone are, in fact, sufficient to drive motor changes. A similar finding has been reported for the role of the cerebellum in limb movements (Tseng et al., 2007), with a similar answer (note that in that study, the authors explicitly talk about "the addition" of motor corrections to sensory error, not one vs. the other as two independent factors). So I don't understand a sentence like "We found that motor compensation, rather than sensory errors, predicted the compensatory responses in the subsequent trials", which views motor compensations and sensory errors as orthogonal variables affecting future motor adjustments.

      In other words, there is a certain degree of seriality to the compensation process, with sensory changes preceding motor corrections. If the authors disagree with this, they should explain how an alternative is possible. If they mean something else, a comparison with the above studies and explaining the differences in positions would greatly help.

      (3) Clash with previous findings. I used the examples in point 2 to bring up a theoretical issue, but those examples are also important in that all three reach conclusions compatible with one another and different from the current study. The authors do discuss Tseng et al.'s findings, which oppose their own, but dismiss the opposition based on limb vs. articulator differences. I don't find the authors' reasoning theoretically convincing here, but more importantly, the current claims also oppose findings from speech motor studies (see citations in point 2), to which the authors' arguments simply don't apply. Strangely, Hantzsch et al.'s study has been cited a few times, but never in its most important capacity, which is to show that speech motor adaptation can take place after a single exposure to auditory error. Murphy et al. report a similar finding in the context of exposure to other talkers' speech.

      If the authors can convincingly justify their theoretical position in 2, the next step would be to present a thorough comparison with the results of the three studies above. If indeed there is no discrepancy, this comparison would help clarify it.

      References

      Hantzsch, L., Parrell, B., & Niziolek, C. A. (2022). A single exposure to altered auditory feedback causes observable sensorimotor adaptation in speech. eLife, 11, e73694.

      Murphy, T. K., Nozari, N., & Holt, L. L. (2024). Transfer of statistical learning from passive speech perception to speech production. Psychonomic Bulletin & Review, 31(3), 1193-1205.

      Murphy, T. K., Holt, L. L. & Nozari, N. (2025). Exposure to an Accent Transfers to Speech Production in a Single Shot. Preprint available at: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5196109.

      Tseng, Y. W., Diedrichsen, J., Krakauer, J. W., Shadmehr, R., & Bastian, A. J. (2007). Sensory prediction errors drive cerebellum-dependent adaptation of reaching. Journal of neurophysiology, 98(1), 54-62.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors performed an integration of 48 public scRNA-seq datasets and created a single-cell transcriptomic atlas for AML (222 samples comprising 748,679 cells). This is important since most AML scRNA-seq studies suffer from small sample sizes coupled with high heterogeneity. They used this atlas to further dissect AML with t(8;21) (AML1-ETO/RUNX1-RUNX1T1), one of the most frequent AML subtypes in young people. In particular, they were able to predict gene regulatory networks in this AML subtype using pySCENIC, which identified the paediatric regulon, defined by a distinct group of hematopoietic transcription factors (TFs), and the adult regulon for t(8;21). They further validated this in bulk RNA-seq with the AUCell algorithm, attributing the prenatal signature to 5 key TFs (KDM5A, REST, BCLAF1, YY1, and RAD21) and the postnatal signature to 9 TFs (ENO1, TFDP1, MYBL2, KLF1, TAGLN2, KLF2, IRF7, SPI1, and YBX1). They also used SCENIC+ to identify enhancer-driven regulons (eRegulons), forming an eGRN, and found that a prenatal origin shows a specific HSC eRegulon profile, while a postnatal origin shows a GMP profile. They also performed in silico perturbations and identified the AP-1 complex (JUN, ATF4, FOSL2), P300, and BCLAF1 as important TFs for inducing differentiation. Overall, I found this study very important in creating a comprehensive resource for AML research.

      Strengths: 

      (1) The generation of an AML atlas integrating multiple datasets with almost 750K cells will further support the community working on AML. 

      (2) Characterisation of t(8;21) AML proposes new interesting leads. 

      We thank the reviewer for a succinct summary of our work and highlighting its strengths.

      Weaknesses: 

      Were these t(8;21) TFs/regulons identified from any of the single datasets? For example, if the authors apply pySCENIC to any dataset, would they find the same TFs, or is it the increase in the number of cells that allows identification of these? 

      We implemented pySCENIC on individual datasets and compared the TFs defining the regulons with those identified from the combined AML scAtlas analysis. Some common TFs were identified, but these varied between individual studies. The union of all TFs identified across datasets makes a very large set, comprising around a third of all known TFs. The AML scAtlas provides a more refined repertoire of TFs, perhaps because the underlying network inference approach is more robust with a higher number of cells. The findings of these investigations are included in Supplementary Figure 4D-E; we hope this is useful for other users of pySCENIC.
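As a hedged sketch of the kind of set-level comparison described (dataset names and TF sets below are illustrative placeholders, not the actual regulons):

```python
def compare_regulon_tfs(per_dataset_tfs, atlas_tfs):
    """For each individual dataset's pySCENIC run, report the TFs shared
    with the combined-atlas run, plus the union across all datasets."""
    overlaps = {name: sorted(tfs & atlas_tfs)
                for name, tfs in per_dataset_tfs.items()}
    union_tfs = set().union(*per_dataset_tfs.values())
    return overlaps, union_tfs

# Illustrative inputs only:
per_dataset = {
    "study_A": {"SPI1", "KLF1", "MYBL2"},
    "study_B": {"SPI1", "YY1", "REST"},
}
atlas = {"SPI1", "KLF1", "YY1", "REST", "RAD21"}
overlaps, union_tfs = compare_regulon_tfs(per_dataset, atlas)
```

The same pattern extends to Jaccard indices or per-study overlap counts if a quantitative summary across many datasets is needed.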

      Reviewer #2 (Public review): 

      Summary: 

      The authors assemble 222 publicly available bone marrow single-cell RNA sequencing samples from healthy donors and primary AML, including pediatric, adolescent, and adult patients at diagnosis. Focusing on one specific subtype, t(8;21), which, despite affecting all age classes, is associated with better prognosis and drug response for younger patients, the authors investigate if this difference is reflected also in the transcriptomic signal. Specifically, they hypothesize that the pediatric and part of the young population acquires leukemic mutations in utero, which leads to a different leukemogenic transformation and ultimately to differently regulated leukemic stem cells with respect to the adult counterpart. The analysis in this work heavily relies on regulatory network inference and clustering (via SCENIC tools), which identifies regulatory modules believed to distinguish the pre-, respectively, post-natal leukemic transformation. Bulk RNA-seq and scATAC-seq datasets displaying the same signatures are subsequently used for extending the pool of putative signature-specific TFs and enhancer elements. Through gene set enrichment, ontology, and perturbation simulation, the authors aim to interpret the regulatory signatures and translate them into potential onset-specific therapeutic targets. The putative pre-natal signature is associated with increased chemosensitivity, RNA splicing, histone modification, stemness marker SMARCA2, and potentially maintained by EP300 and BCLAF1. 

      Strengths: 

      The main strength of this work is the compilation of a pediatric AML atlas using the efficient Cellxgene interface. Also, the idea of identifying markers for different disease onsets, interpreting them from a developmental angle, and connecting this to the different therapy and relapse observations, is interesting. The results obtained, the set of putative up-regulated TFs, are biologically coherent with the mechanisms and the conclusions drawn. I also appreciate that the analysis code was made available and is well documented. 

      We thank the reviewer for evaluating our work, and highlighting its key features, including creation of AML atlas, downstream analysis and interpretation for t(8;21) subtype.

      Weaknesses:

      There were fundamental flaws in how methods and samples were applied, a general lack of critical examination of both the results and the appropriateness of the methods for the data at hand, and in how results were presented. In particular: 

      (1) Cell type annotation: 

      (a) The 2-phase cell type annotation process employed for the scRNA-seq sample collection raised concerns. Initially annotated cells are re-labeled after a second round with the same cell types from the initial label pool (Figure 1E). The automatic annotation tools were used without specifying the database and tissue atlases used as a reference, and no information was shown regarding the consensus across these tools. 

      Cell type annotations are heavily influenced by the reference profiles used and vary significantly between tools. To address this, we used multiple cell type annotation tools, whose references predominantly encompass healthy peripheral blood cell types and/or healthy bone marrow populations. This determined the primary cluster cell types assigned.

      Existing tools and resources are not leukemia specific; thus, to identify AML-associated HSPC subpopulations we created a custom SingleR reference using a CD34-enriched AML single-cell dataset. This was not suitable for the annotation of the full AML scAtlas, as it is derived from CD34-sorted cell types and so is biased towards these populations.

      We have made this much clearer in the revised manuscript, by splitting Figure 1 into two separate figures (now Figure 1 and Figure 2) reflecting both different analyses performed. The methods have also been updated with more detail on the cell type annotations, and we have included the automated annotation outputs as a supplementary table, as this may be useful for others in the single-cell community. 

      (b) Expression of the CD34 marker is only reported as a selection method for HSPCs, which is not in line with common practice. Only a single surface marker is admitted, while robust annotation of HSPCs should be done on the basis of expression of gene sets.

      Most of the cells used in the HSPC analysis were in fact annotated as HSPCs with some exceptions. In line with this feedback, we have re-worked this analysis and simply taken HSPC annotated clusters forward for the subsequent analysis, yielding the same findings. 

      (c) During several analyses, the cell types used were either not well defined or contradictory, such as in Figure 2D, where it is not clear if pySCENIC and AUC scores were computed on HSPCs alone or merged with CMPs. In other cases, different cell type populations are compared and used interchangeably: comparing the HSPC-derived regulons with bulk (probably not enriched for CD34+ cells) RNA samples could be an issue if there are no valid assumptions on the cell composition of the bulk sample.

      We apologize for the lack of clarity regarding which cell types were used, the text has been updated to clarify that in the pySCENIC analysis all myeloid progenitor cells were included. 

      The bulk RNA-seq samples were used only to test the enrichment of our AML scAtlas derived regulons in an unbiased and large-scale way. While CD34 enriched samples could be preferable, this was not available to us. 

      We agree that more effort could be made to ensure the single-cell/myeloid progenitor derived regulons are comparable to the bulk RNA-seq data. In the original bulk RNA-seq validation analysis, we used all bulk RNA-seq timepoints (diagnostic, on-treatment, relapse) and included both bone marrow and peripheral blood. Upon reflection, and to better harmonize the bulk RNA-seq selection strategy with that of AML scAtlas, we revised our approach to include only diagnostic bone marrow samples. We expect that, since the leukemia blast count for pediatric AML is typically high at diagnosis, these samples will predominantly contain leukemic blasts.

      (2) Method selection: 

      (a) The authors should explain why they use pySCENIC and not any other approach. They should briefly explain how pySCENIC works and what they get out of it in the main text. In addition, they should explain the AUCell algorithm and motivate its usage.

      pySCENIC is a state-of-the-art method for network inference from scRNA-seq data and is widely used within the single-cell community (over 5,000 citations for both versions of the SCENIC pipeline). The pipeline has been benchmarked as one of the top performers for GRN analysis (Nguyen et al., 2021, Briefings in Bioinformatics). AUCell is a module within the pySCENIC pipeline that summarizes the activity of a set of genes (a regulon) into a single score per cell, which helps compare and visualize different regulons. We have modified the manuscript (Results section 2, paragraph 2) to better explain this method and provided some rationale and accompanying citations to justify its use for this analysis. We thank the reviewer for highlighting this and hope our updates add some clarity.
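
The ranking-based idea behind AUCell can be illustrated with a simplified sketch. This is not the pySCENIC implementation (which uses per-cell gene-expression rankings across many cells and a configurable top-rank threshold), only a toy version of the recovery-curve calculation; the gene names and values are invented for illustration.

```python
# Simplified AUCell-style score: rank genes in one cell by expression, then
# compute the area under the recovery curve of regulon genes within the
# top-ranked fraction, normalized so that 1.0 means every regulon gene sits
# at the very top of the ranking and 0.0 means none appear in the top fraction.

def aucell_like_score(expression, regulon, top_frac=0.25):
    ranked = sorted(expression, key=expression.get, reverse=True)
    n_top = max(1, int(len(ranked) * top_frac))
    regulon = set(regulon)
    hits, area = 0, 0
    for gene in ranked[:n_top]:
        if gene in regulon:
            hits += 1
        area += hits  # cumulative recovery curve
    k = min(len(regulon), n_top)
    # Maximum possible area: all k regulon genes occupy the top k ranks
    max_area = k * (k + 1) // 2 + (n_top - k) * k
    return area / max_area

# Toy cell in which both regulon genes are the most highly expressed
expr = {f"g{i}": 20 - i for i in range(20)}   # g0 highest ... g19 lowest
print(aucell_like_score(expr, {"g0", "g1"}))  # → 1.0
```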

      (b) The obtained GRN signatures were not critically challenged on an external dataset. Therefore, the evidence that supports these signatures to be reliable and significant to the investigated setting is weak. 

      These signatures were inferred using the most suitable AML single-cell RNA datasets currently available. To validate our findings, we used two independent datasets (the TARGET AML bulk RNA sequencing cohort, and the Lambo et al. scRNA-seq dataset). To clarify this workflow in the manuscript, we have added a panel to Figure 3 outlining the analytical process. To our knowledge, there are no other better-suited datasets for validation. Experimental validations on patient samples, while valuable, are beyond the scope of this study.

      (3) There are some issues with the analysis & visualization of the data. 

      Based on this feedback, we have improved several aspects of the analysis, changed some visualizations, and improved figure resolution throughout the manuscript. 

      (4) Discussion: 

      (a) What exactly is the 'regulon signature' that the authors infer? How can it be useful for insights into disease mechanisms? 

      The 'regulon signature' here refers to a gene regulatory program (multiple gene modules, each defined by a transcription factor and its targets) that is specific to a given age group. Further investigation of this program can be useful for understanding why patients of different ages follow different clinical courses. We have amended the text to explain this.

      (b) The authors write 'Together this indicates that EP300 inhibition may be particularly effective in t(8;21) AML, and that BCLAF1 may present a new therapeutic target for t(8;21) AML, particularly in children with inferred pre-natal origin of the driver translocation.' I am missing a critical discussion of what is needed to further test the two targets. Put differently: Would the authors take the risk of a clinical study given the evidence from their analysis? 

      Indeed, many extensive studies would be required before these findings are clinically translatable. We have included a discussion paragraph (discussion paragraph 7) detailing what further work is required in terms of experimental validation and potential subsequent clinical study.

      Reviewer #1 (Recommendations for the authors): 

      In addition to the point raised above, Cytoscape files for the GRNs and eGRNs inferred would be useful to have. 

      We have now provided Cytoscape/eGRN tables in supplementary materials.

      Reviewer #2 (Recommendations for the authors): 

      (1) Figures 1F and 1G: You show the summed-up frequencies for all patients, right? It would be very interesting to see this per patient, or add error bars, since the shown frequencies might be driven by single patients with many cells. 

      While this type of plot could be informative, the large number of samples in the AML scAtlas rendered the output difficult to interpret. As a result, we decided not to include it in the manuscript.

      (2) An issue of selection bias has to be raised when only the two samples expressing the expected signatures are selected from the external scRNA dataset. Similarly, in the DepMap analysis, the age and nature of the other cell lines sensitive to EP300 and BCLAF1 should be reported. 

      Since the purpose of this analysis was to build on previously defined signatures, we selected the two samples which we had preliminary hypotheses for. It would indeed be interesting to explore those not matching these signatures; however, sample numbers are very small, so without preliminary findings robust interpretation and validation would be difficult. An expanded validation would be more appropriate once more data becomes available in the future.

      We agree that investigating the age and nature of other BCLAF1/EP300 sensitive cell lines is a very valuable direction. Our analysis suggests that our BCLAF1 findings may also be applicable to other in-utero origin cancers, and we have now summarized these observations in Supplementary Figure 7H. 

      (3) Is there statistical evidence for your claim that "This shows that higher-risk subtypes have a higher proportion of LSCs compared to favorable risk disease."? At least intermediate and adverse look similar to me. How does this look if you show single patients?  

      We are grateful to the reviewer for noticing this oversight and have now included an appropriate statistical test in the revised manuscript. As before, while showing single patients may be useful, the large number of patients makes such a plot difficult to interpret. For this reason, we have chosen not to include it.

      (4) Specify the statistical test you used to 'identify significantly differentially expressed TFs' (line 192). 

      The methods used for differential expression analysis are now clearly stated in the text as well as in the methods section. We hope this addition improves clarity for the reader.

      (5) Figure 2B: You show the summed up frequencies for all patients, right? It would be intriguing to see this figure per patient, since the shown frequencies might be driven by single patients with many cells. 

      Yes, the plot includes all patients. A single plot showing individual patients would not be easily interpretable.

      (6) Y axis in 2D is not samples, but single cells? Please specify. 

      We thank the reviewer for bringing this to our attention and have now updated Figure 3D accordingly. 

      (7) Figure 3A: I don't get why the chosen clusters are designated as post- and prenatal, given the occurrence of samples in them. 

      This figure serves to validate the previously defined regulon signatures, so the cluster designations are based on this. We have amended the text to elaborate on this point, which will hopefully provide greater clarity.

      (8) Figure 3E: What is shown on the y axis? Did you correct your p-values for multiple testing? 

      We apologize for this oversight and have now added a y-axis label. P-values were not corrected for multiple testing, as only a few pairwise t-tests were performed.

      (9) Robustness: You find some gene sets up- and down-regulated. How would that change if you used, e.g., a bootstrapped number of samples, or a different analysis approach?

      To address this, we implemented both edgeR and DESeq2 for DE testing. Our findings (Supplementary Figure 5B) show that 98% of edgeR genes are also detected by DESeq2. Given this substantial overlap, which indicates the findings are robust, we opted to use the smaller edgeR gene list for our analysis. We thank the reviewer for this helpful suggestion, which has strengthened our analysis.

      (10) Multiomics analysis:

      (a) Why only work on 'representative samples'? The idea of an integrated atlas is to identify robust patterns across patients, no? I'd love to see what regulons are robust, ie,  shared between patients.

      As discussed in point 2, there are very few samples available for the multiomics analysis. Therefore, we chose to focus on those samples which we had a working hypothesis for, as a validation for our other analyses. 

      (b) I don't agree that finding 'the key molecular processes, such as RNA splicing, histone modification, and TF binding' expressed 'further supports the stemness signature in presumed prenatal origin t(8;21) AML'.

      Following the improvements made on the bulk RNA-Seq analysis in response to the previous reviewer comments, we ended up with a smaller gene set. Consequently, the ontology results have changed. The updated results are now more specific and indicate that developmental processes are upregulated in presumed prenatal origin t(8;21) AML. 

      (c) Please clarify if the multiome data is part of the atlas.

      The multiome data is not a part of AML scAtlas, as it was published at a later date. We used this dataset solely for validation purposes and have updated the figures and text to clearly indicate that it is used as a validation dataset.  

      (d) Please describe the used data with respect to the number of patients, cells, age, etc.

      We clarified this point in the text and have also included supplementary tables detailing all samples used in the atlas and validation datasets. 

      (e) The four figures in Figure 4E look identical to me. What is the take-home message here? Do all perturbations have the same impact on driving differentiation? Please elaborate.

      The perturbation figure is intended to illustrate that other genes can behave similarly to members of the AP-1 complex (JUN and ATF4 here) following perturbation. Since the AP-1 complex is well known to be important in t(8;21) AML, we hypothesize that these other genes are also important. We apologize for the previous lack of interpretation here and have amended the text to clarify this point. 

      (11) Abstract: Please detail: how many of the 159 AML patients are t(8;21)? 

      We have now amended the abstract to include this. 

      (12) Figures: Increase font size where possible, eg age in 1B or risk group in 1G is super small and hard to read. 

      Extra attention has been given to improving the figure readability and resolution throughout the whole manuscript.  

      (13) Color codes in Figures 2B and 2C are all over the place and misleading: Sort 2C along age, indicate what is adult and adolescent, sort the x axis in 2B along age. 

      We have changed this figure accordingly.  

      (14) I suggest not coloring dendrograms, in my opinion this is highly irritating. 

      The dendrogram colors correspond to clusters which are referenced in the text, this coloring provides informative context and aids interpretation, making it a useful addition to the figure.

      (15) The resolution in Figure 4B is bad, I can't read the labels. 

      This visualization has been revised, to make presentation of this data clearer.  

      (16) In addition to selecting bulk RNA samples matching the two regulon signatures, some effort should have been put into investigating the samples not aligned with those, or assessing how unique these GRN signatures are to the specific cell type and disease of interest, excluding the influence of cell type composition and random noise. The late-onset signatures should also be excluded from being present in an external pre-natal cohort in a more statistically rigorous manner.

      Our use of the bulk RNA-Seq data is solely intended for the validation of predefined regulon signatures, for which we already have a working hypothesis.  While we agree that further investigation of the samples that do not align with these signatures could yield interesting insights, we believe that such an analysis would extend beyond the scope of the current manuscript.

      (17) The specific bulk RNA samples used should be specified, along with the tissue of origin. The same goes for the Lambo dataset. 

      We have clarified this point in the text and provided a supplementary table detailing all samples used for validation, alongside the sample list from AML scAtlas.

      (18) In Supplementary Figure 5B, the axes should be defined.

      We have updated this figure to include axis legends.

      (19) Supplementary Figure 4A. There is a mistake in the sex assignment for sample AML14D. Since chrY-genes are expressed, this sample is likely male, while the Xist expression is mostly zero. 

      We thank the reviewer for pointing out this error, which has now been corrected.  

      (20) Wording suggestions: 

      (a) Line 54: not compelling phrasing. 

      (b) Line 83: "allows to decipher". 

      (c) Line 88: repetition from line 85. 

      (d) Line 90: the expression "clean GRN" is not clear. 

      These wording suggestions have all been incorporated in the revised manuscript.

      (21) Supplementary Figure 3D is not interpretable, I suggest a different visualization. 

      We agree that the original figure was not the most informative and have replaced it with UMAPs displaying LSC6 and LSC17 scores.

    1. Author response:

      Reviewer 1 (Public review):

      (1) Figure 1B shows the PREDICTED force-extension curve for DNA based on a worm-like chain model. Where is the experimental evidence for this curve? This issue is crucial because the F-E curve will decide how and when a catch-bond is induced (if at all it is) as the motor moves against the tensiometer. Unless this is actually measured by some other means, I find it hard to accept all the results based on Figure 1B.

      The Worm-Like Chain model for the elasticity of DNA was established by early work from the Bustamante lab (Smith et al., 1992) and by Marko and Siggia (Marko and Siggia, 1995), and was further validated and refined by the Block lab (Bouchiat et al., 1999; Wang et al., 1997). The 50 nm persistence length is the consensus value, and was shown to be independent of force and extension in Figure 3 of Bouchiat et al (Bouchiat et al., 1999). However, we would like to stress that for our conclusions, the precise details of the Force-Extension relationship of our dsDNA are immaterial. The key point is that the motor stretches the DNA and stalls when it reaches its stall force. Our claim of the catch-bond character of kinesin is based on the longer duration at stall compared to the run duration in the absence of load. Provided that the motor is indeed stalling because it has stretched out the DNA (which is strongly supported by the repeated stalling around the predicted extension corresponding to ~6 pN of force), then the stall duration depends on neither the precise value for the extension nor the precise value of the force at stall.
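
The "predicted extension corresponding to ~6 pN" follows directly from the Marko-Siggia interpolation formula with the consensus parameters cited above. The sketch below assumes kBT ≈ 4.1 pN·nm (room temperature) and Lp = 50 nm, and recovers the fractional extension at which a ~6 pN stall is expected.

```python
import math

# Marko-Siggia interpolation for the WLC force-extension relation:
#   F(x) = (kBT/Lp) * [ 1/(4(1 - x/L)^2) - 1/4 + x/L ]
# kBT and Lp below are consensus values; 6 pN is the observed stall force.

KBT = 4.1   # pN*nm, thermal energy at room temperature (assumed)
LP = 50.0   # nm, dsDNA persistence length

def wlc_force(rel_ext):
    """Force (pN) at fractional extension x/L."""
    return (KBT / LP) * (0.25 / (1.0 - rel_ext) ** 2 - 0.25 + rel_ext)

def rel_extension_at(force, lo=0.0, hi=0.999):
    """Invert the monotonic F(x/L) by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if wlc_force(mid) < force:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# At the ~6 pN stall force, the DNA is predicted to be ~94% extended, so
# repeated stalls near this extension indicate the motor is loading the DNA.
print(round(rel_extension_at(6.0), 3))  # ~0.94
```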

      (2) The authors can correct me on this, but I believe that all the catch-bond studies using optical traps have exerted a load force that exceeds the actual force generated by the motor. For example, see Figure 2 in reference 42 (Kunwar et al). It is in this regime (load force > force from motor) that the dissociation rate is reduced (catch-bond is activated). Such a regime is never reached in the DNA tensiometer study because of the very construction of the experiment. I am very surprised that this point is overlooked in this manuscript. I am therefore not even sure that the present experiments even induce a catch-bond (in the sense reported for earlier papers).

      It is true that Kunwar et al measured binding durations at super-stall loads and used that to conclude that dynein does act as a catch-bond (but kinesin does not) (Kunwar et al., 2011). However, we would like to correct the reviewer on this one. This approach of exerting super-stall forces and measuring binding durations is in fact less common than the approach of allowing the motor to walk up to stall and measuring the binding duration. This ‘fixed trap’ approach has been used to show catch-bond behavior of dynein (Leidel et al., 2012; Rai et al., 2013) and kinesin (Kuo et al., 2022; Pyrpassopoulos et al., 2020). For the non-processive motor Myosin I, a dynamic force clamp was used to keep the actin filament in place while the myosin generated a single step (Laakso et al., 2008). Because the motor generates the force, these are not super-stall forces either.

      (3) I appreciate the concerns about the Vertical force from the optical trap. But that leads to the following questions that have not at all been addressed in this paper:

      (i) Why is the Vertical force only a problem for Kinesins, and not a problem for the dynein studies?

      Actually, we do not claim that vertical force is not a problem for dynein; our data do not speak to this question. There is debate in the literature as to whether dynein has catch bond behavior in the traditional single-bead optical trap geometry - while some studies have measured dynein catch bond behavior (Kunwar et al., 2011; Leidel et al., 2012; Rai et al., 2013), others have found that dynein has slip-bond or ideal-bond behavior (Ezber et al., 2020; Nicholas et al., 2015; Rao et al., 2019). This discrepancy may relate to vertical forces, but not in an obvious way.

      (ii) The authors state that "With this geometry, a kinesin motor pulls against the elastic force of a stretched DNA solely in a direction parallel to the microtubule". Is this really true? What matters is not just how the kinesin pulls the DNA, but also how the DNA pulls on the kinesin. In Figure 1A, what is the guarantee that the DNA is oriented only in the plane of the paper? In fact, the DNA could even be bending transiently in a manner that it pulls the kinesin motor UPWARDS (Vertical force). How are the authors sure that the reaction force between DNA and kinesin is oriented SOLELY along the microtubule?

      We acknowledge that “solely” is an absolute term that is too strong to describe our geometry. We will soften this term in our revision to “nearly parallel to the microtubule”. In the Geometry Calculations section of Supplementary Methods, we calculate that if the motor and streptavidin are on the same protofilament, the vertical force will be <1% of the horizontal force. We also note that if the motor is on a different protofilament, there will be lateral forces and forces perpendicular to the microtubule surface, except they are oriented toward rather than away from the microtubule. The DNA can surely bend due to thermal forces, but because inertia plays a negligible role at the nanoscale (Howard, 2001; Purcell, 1977), any resulting upward forces will only be thermal forces, which the motor is already subjected to at all times.

      (4) For this study to be really impactful and for some of the above concerns to be addressed, the data should also have included DNA tensiometer experiments with Dynein. I wonder why this was not done?

      As much as we would love to fully characterize dynein here, this paper is about kinesin and it took a substantial effort. The dynein work merits a stand-alone paper.

      While I do like several aspects of the paper, I do not believe that the conclusions are supported by the data presented in this paper for the reasons stated above.

      The three key points the reviewer makes are the validity of the worm-like-chain model, the question of superstall loads, and the role of DNA bending in generating vertical forces. We hope that we have fully addressed these concerns in our responses above.

      Reviewer #2 (Public review):

      Major comments:

      (1) The use of the term "catch bond" is misleading, as the authors do not really mean consistently a catch bond in the classical sense (i.e., a protein-protein interaction having a dissociation rate that decreases with load). Instead, what they mean is that after motor detachment (i.e., after a motor protein dissociating from a tubulin protein), there is a slip state during which the reattachment rate is higher as compared to a motor diffusing in solution. While this may indeed influence the dynamics of bidirectional cargo transport (e.g., during tug-of-war events), the used terms (detachment (with or without slip?), dissociation, rescue, ...) need to be better defined and the results discussed in the context of these definitions. It is very unsatisfactory at the moment, for example, that kinesin-3 is at first not classified as a catch bond, but later on (after tweaking the definitions) it is. In essence, the typical slip/catch bond nomenclature used for protein-protein interaction is not readily applicable for motors with slippage.

      We appreciate the reviewer’s point and we will work to streamline and define terms in our revision.

      (2) The authors define the stall duration as the time at full load, terminated by >60 nm slips/detachments. Isn't that a problem? Smaller slips are not detected/considered... but are also indicative of a motor dissociation event, i.e., the end of a stall. What is the distribution of the slip distances? If the slip distances follow an exponential decay, a large number of short slips are expected, and the presented data (neglecting those short slips) would be highly distorted.

      The reviewer brings up a good point that there may be undetected slips. To address this question, we plotted the distribution of slip distances for kinesin-3, which by far had the most slip events. As the reviewer suggested, it is indeed an exponential distribution. Our preliminary analysis suggests that roughly 20% of events are missed due to this 60 nm cutoff. This will change our unloaded duration numbers slightly, but this will not alter our conclusions.
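
The arithmetic behind the ~20% estimate can be sketched as follows. The 60 nm cutoff is from our analysis; the exponential-mean calculation below is illustrative (the fitted mean from our data is not restated here), simply showing that a ~20% missed fraction corresponds to a fitted mean slip distance of roughly 60 / ln(1/0.8) ≈ 270 nm.

```python
import math

# For an exponential slip-distance distribution with mean `mean_nm`, the
# fraction of slips shorter than the 60 nm detection cutoff is
#   P(d < 60) = 1 - exp(-60 / mean_nm),
# which can be inverted to recover the mean implied by a given missed fraction.

CUTOFF_NM = 60.0

def fraction_missed(mean_nm):
    """Fraction of slips falling below the detection cutoff."""
    return 1.0 - math.exp(-CUTOFF_NM / mean_nm)

def mean_from_missed_fraction(missed):
    """Mean slip distance (nm) implied by a given missed fraction."""
    return CUTOFF_NM / math.log(1.0 / (1.0 - missed))

print(round(mean_from_missed_fraction(0.20)))  # → 269
```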

      (3) Along the same line: Why do the authors compare the stall duration (without including the time it took the motor to reach stall) to the unloaded single motor run durations? Shouldn't the times of the runs be included?

      The elastic force of the DNA spring is variable as the motor steps up to stall, and so if we included the entire run duration then it would be difficult to specify what force we were comparing to unloaded. More importantly, if we assume that any stepping and detachment behavior is history independent, then it is mathematically proper to take any arbitrary starting point (such as when the motor reaches stall), start the clock there, and measure the distribution of detachment durations relative to that starting point.

      More importantly, what we do in Fig. 3 is to separate out the ramps from the stalls and, using a statistical model, we compute a separate duration parameter (which is the inverse of the off-rate) for the ramp and the stall. What we find is that the relationship between ramp, stall, and unloaded durations is different for the three motors, which is interesting in itself.

      (4) At many places, it appears too simple that for the biologically relevant processes, mainly/only the load-dependent off-rates of the motors matter. The stall forces and the kind of motor-cargo linkage (e.g., rigid vs. diffusive) do likely also matter. For example: "In the context of pulling a large cargo through the viscous cytoplasm or competing against dynein in a tug-of-war, these slip events enable the motor to maintain force generation and, hence, are distinct from true detachment events." I disagree. The kinesin force at reattachment (after slippage) is much smaller than at stall. What helps, however, is that due to the geometry of being held close to the microtubule (either by the DNA in the present case or by the cargo in vivo) the attachment rate is much higher. Note also that upon DNA relaxation, the motor is likely kept close to the microtubule surface, while, for example, when bound to a vesicle, the motor may diffuse away from the microtubule quickly (e.g., reference 20).

      We appreciate the reviewer’s detailed thinking here, and we offer our perspective. As to the first point, we agree that the stall force is relevant and that the rigidity of the motor-cargo linkage will play a role. The goal of the sentence on pulling cargo that the reviewer highlights is to set up our analysis of slips, which we define as rearward displacements that don’t return to the baseline before force generation resumes. We agree that force after slippage is much smaller than at stall, and we plan to clarify that section of text. However, as shown in the model diagram in Fig. 5, we differentiate between the slip state (and recovery from this slip state) and the detached state (and reattachment from this detached state). This delineation is important because, as the reviewer points out, if we are measuring detachment and reattachment with our DNA tensiometer, then the geometry of a vesicle in a cell will be different and diffusion away from the microtubule or elastic recoil perpendicular to the microtubule will suppress this reattachment.

      Our evidence for a slip state in which the motor maintains association with the microtubule comes from optical trapping work by Toleikis et al (Toleikis et al., 2020) and Sudhakar et al (Sudhakar et al., 2021). In particular, Sudhakar et al used small, high-index germanium microspheres that had a low drag coefficient. They showed that during ‘slip’ events, the relaxation time constant of the bead back to the center of the trap was nearly 10-fold slower than the trap response time, consistent with the motor exerting drag on the microtubule. (With larger beads, the drag of the bead swamps the motor-microtubule friction.) Another piece of support for the motor maintaining association during a slip is work by Ramaiya et al. who used birefringent microspheres to exert and measure rotational torque during kinesin stepping (Ramaiya et al., 2017). In most traces, when the motor returned to baseline following a stall, the torque was dissipated as well, consistent with a ‘detached’ state. However, a slip event is shown in S18a where the motor slips backward while maintaining torque. This is best explained by the motor slipping backward in a state where the heads are associated with the microtubule (at least sufficiently to resist rotational forces). Thus, we term the resumption after slip to be a rescue from the slip state rather than a reattachment from the detached state.

      To finish the point: even with the complex geometry of a vesicle, the motor remains associated with the microtubule during slip events and hence is primed for recovery. This recovery rate is expected to be the same as for the DNA tensiometer. Following a detachment, however, we agree that there will likely be a higher probability of reattachment in the DNA tensiometer due to proximity effects, whereas with a vesicle any elastic recoil or ‘rolling’ will pull the detached motor away from the microtubule, suppressing reattachment. We plan to clarify these points in the text of the revision.

      (5) Why were all motors linked to the neck-coil domain of kinesin-1? Couldn't it be that for normal function, the different coils matter? Autoinhibition can also be circumvented by consistently shortening the constructs.

      We chose this dimerization approach to focus on how the mechanochemical properties of kinesins vary between the three dominant transport families. We agree that in cells, autoinhibition of both kinesins and dynein likely plays a role in regulating bidirectional transport, as will the activity of other regulatory proteins. The native coiled-coils may act as ‘shock absorbers’ due to their compliance, or they might slow the motor reattachment rate due to the relatively large search volumes created by their long lengths (tens of nm). These are topics for future work. By using the neck-coil domain of kinesin-1 for all three motors, we eliminate any differences in autoinhibition or other regulation between the three kinesin families and focus solely on differences in the mechanochemistry of their motor domains.

      (6) I am worried about the neutravidin on the microtubules, which may act as roadblocks (e.g. DOI: 10.1039/b803585g), slip termination sites (maybe without the neutravidin, the rescue rate would be much lower?), and potentially also DNA-interaction sites? At 8 nM neutravidin and the given level of biotinylation, what density of neutravidin do the authors expect on their microtubules? Can the authors rule out that the observed stall events are predominantly the result of a kinesin motor being stopped after a short slippage event at a neutravidin molecule?

      We will address these points in our revision.

      (7) Also, the unloaded runs should be performed on the same microtubules as in the DNA experiments, i.e., with neutravidin. Otherwise, I do not see how the values can be compared.

      We will address this point in our revision.

      (8) If, as stated, "a portion of kinesin-3 unloaded run durations were limited by the length of the microtubules, meaning the unloaded duration is a lower limit." corrections (such as Kaplan-Meier) should be applied, DOI: 10.1016/j.bpj.2017.09.024.

      (9) Shouldn't Kaplan-Meier also be applied to the ramp durations ... as a ramp may also artificially end upon stall? Also, doesn't the comparison between ramp and stall duration have a problem, as each stall is preceded by a ramp ...and the (maximum) ramp times will depend on the speed of the motor? Kinesin-3 is the fastest motor and will reach stall much faster than kinesin-1. Isn't it obvious that the stall durations are longer than the ramp duration (as seen for all three motors in Figure 3)?

      The reviewer rightly notes the many challenges in estimating the motor off-rates during ramps. To estimate ramp off-rates and as an independent approach to calculating the unloaded and stall durations, we developed a Markov model coupled with Bayesian inference methods to estimate a duration parameter (equivalent to the inverse of the off-rate) for the unloaded, ramp, and stall duration distributions. With the ramps, we have left censoring due to the difficulty in detecting the start of the ramps in the fluctuating baseline, and we have right censoring due to reaching stall (with different censoring of the ramp duration for the three motors due to their different speeds). The Markov model assumes a constant detachment probability and history independence, and thus is robust even in the face of left and right censoring (details in the Supplementary section). This approach is preferred over Kaplan-Meier because, although these non-parametric methods make no assumptions about the distribution, they require the user to know exactly where the start time is.
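The censoring logic described here can be made concrete with a minimal sketch (an illustration of the principle only, not the actual Markov/Bayesian analysis): for exponentially distributed durations, a right-censored observation contributes only a survival term exp(-t/tau) to the likelihood, and maximizing gives a closed-form estimate, the total observed time divided by the number of uncensored events.

```python
import numpy as np

def exponential_mle_with_censoring(observed, censored):
    """Maximum-likelihood mean duration for exponential event times with
    right censoring. censored[i] = True marks runs that were cut short
    (e.g. the motor reached the end of the microtubule), so they
    contribute a survival term rather than a density term to the
    likelihood. Maximizing gives the closed form below.
    """
    observed = np.asarray(observed, dtype=float)
    censored = np.asarray(censored, dtype=bool)
    n_events = int(np.count_nonzero(~censored))
    if n_events == 0:
        raise ValueError("need at least one uncensored event")
    return observed.sum() / n_events

# Synthetic check: true mean duration 2.74 s, observation cut off at 6 s.
rng = np.random.default_rng(0)
true_tau = 2.74
raw = rng.exponential(true_tau, size=5000)
cens = raw > 6.0
obs = np.minimum(raw, 6.0)

naive = obs.mean()                                    # biased low by censoring
corrected = exponential_mle_with_censoring(obs, cens)
```

On this synthetic data the naive mean underestimates the true duration (roughly 2.4 s), while the censoring-aware estimate lands near the true 2.74 s.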

      Regarding the potential underestimate of the kinesin-3 unloaded run duration due to finite microtubule lengths: the first point is that the unloaded duration data in Fig. 2C are quite linear up to 6 s and are well fit by the single-exponential fit (the points above 6 s do not affect the fit very much). The second point is that when we used our Markov model (which is robust against right censoring) to estimate the unloaded and stall durations, the results agreed very well with the single-exponential fits (Table S2). For instance, the single-exponential fit for the kinesin-3 unloaded duration was 2.74 s (2.33 – 3.17 s 95% CI) and the estimate from the Markov model was 2.76 s (2.28 – 3.34 s 95% CI). Thus, we chose not to make any corrections for finite microtubule lengths.

      (10) It is not clear what is seen in Figure S6A: It looks like only single motors (green, w/o a DNA molecule) are walking ... Note: the influence of the attached DNA onto the stepping duration of a motor may depend on the DNA conformation (stretched and near to the microtubule (with neutravidin!) in the tethered case and spherically coiled in the untethered case).

      In the Figure S6A kymograph, the green traces are GFP-labeled kinesin-1 without DNA attached (these motors are in excess) and the red diagonal trace is a motor with DNA attached. There are also two faint horizontal red traces, which are labeled DNA molecules diffusing by (smearing over a large area during a single frame). Panel S6B shows run durations of motors with DNA attached. We agree that the DNA conformation will differ if it is attached and stretched (more linear) versus simply being transported (random coil), but by its nature this control experiment only addresses random-coil DNA.

      (11) Along this line: While the run time of kinesin-1 with DNA (1.4 s) is significantly shorter than the stall time (3.0 s), it is still larger than the unloaded run time (1.0 s). What do the authors think is the origin of this increase?

      Our interpretation of the unloaded kinesin-DNA result is that the much slower diffusion constant of the DNA relative to the motor alone enables motors to transiently detach and rebind before the DNA cargo has diffused away, thus extending the run duration. In contrast, such detachment events for motors alone normally result in the motor diffusing away from the microtubule, terminating the run. This argument has been used to reconcile the longer single-motor run lengths in the gliding assay versus the bead assay (Block et al., 1990). Notably, this slower diffusion constant should not play a role in the DNA tensiometer geometry because if the motor transiently detaches, then it will be pulled backward by the elastic forces of the DNA and detected as a slip or detachment event. We will address this point in the revision.

      (12) "The simplest prediction is that against the low loads experienced during ramps, the detachment rate should match the unloaded detachment rate." I disagree. I would already expect a slight increase.

      Agreed. We will change this text to: “The prediction for a slip bond is that against the low loads experienced during ramps, the detachment rate should be equal to or faster than the unloaded detachment rate.”

      (13) Isn't the model over-defined by fitting the values for the load-dependence of the strong-to-weak transition and fitting the load dependence into the transition to the slip state?

      Yes, it is overdefined, but that is by design, and the model is still very useful. Our goal here was to make as simple a model as possible that could account for the data and use it to compare model parameters for the different motor families. Ignoring the complexity of the slip and detached states, a model with a strong and a weak state in the stepping cycle and a single transition out of the stepping cycle is the simplest formulation possible. And having rate constants (k<sub>S-W</sub> and k<sub>slip</sub> in our case) that vary exponentially with load makes thermodynamic sense for modeling mechanochemistry (Howard, 2001). Thus, we were pleasantly surprised that this bare-bones model could recapitulate the unloaded and stall durations for all three motors (Fig. 5C-E).
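The exponential load dependence invoked here is Bell's approximation, in which a hindering force F accelerates a transition at unloaded rate k0 by a factor exp(F·δ/kT). A minimal sketch (the rate and distance parameter below are illustrative placeholders, not the paper's fitted values):

```python
import math

def bell_rate(k0, delta, force, kT=4.1):
    """Bell approximation: an unloaded rate k0 (1/s) accelerated by a
    hindering load `force` (pN) acting over a distance parameter
    `delta` (nm). kT = 4.1 pN*nm at room temperature.
    """
    return k0 * math.exp(force * delta / kT)

# Illustrative values only: a 1/s unloaded transition rate with a 2 nm
# distance parameter, evaluated at zero load and at 6 pN (near stall).
k_unloaded = bell_rate(1.0, 2.0, 0.0)   # equals k0
k_stall = bell_rate(1.0, 2.0, 6.0)      # exp(12/4.1), roughly 19-fold faster
```

The distance parameter sets how steeply the rate rises with load, which is what the fits to the stall-duration data constrain.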

      (14) "When kinesin-1 was tethered to a glass coverslip via a DNA linker and hydrodynamic forces were imposed on an associated microtubule, kinesin-1 dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (37)." This statement appears not to be true. In reference 37, very similar to the geometry reported here, the microtubules were fixed on the surface, and the stepping of single kinesin motors attached to large beads (to which defined forces were applied by hydrodynamics) via long DNA linkers was studied. In fact, quite a number of statements made in the present manuscript have been made already in ref. 37 (see in particular sections 2.6 and 2.7), and the authors may consider putting their results better into this context in the Introduction and Discussion. It is also noteworthy to discuss that the (admittedly limited) data in ref. 37 does not indicate a "catch-bond" behavior but rather an insensitivity to force over a defined range of forces.

      The reviewer misquoted our sentence. The actual wording of the sentence was: “When kinesin-1 was connected to micron-scale beads through a DNA linker and hydrodynamic forces parallel to the microtubule imposed, dissociation rates were relatively insensitive to loads up to ~3 pN, inconsistent with slip-bond characteristics (Urbanska et al., 2021).” The sentence the reviewer quoted was in a previous version that is available on BioRxiv and perhaps they were reading that version. Nonetheless, in the revision we will note in the Discussion that this behavior was indicative of an ideal bond (not a catch-bond), and we will also add a sentence in the Introduction highlighting this work.

      Reviewer #3 (Public review):

      The authors attribute the differences in the behaviour of kinesins when pulling against a DNA tether compared to an optical trap to the differences in the perpendicular forces. However, the compliance is also much different in these two experiments. The optical trap acts like a ~ linear spring with stiffness ~ 0.05 pN/nm. The dsDNA tether is an entropic spring, with negligible stiffness at low extensions and very high stiffness once the tether is extended to its contour length (Fig. 1B). The effect of the compliance on the results should be addressed in the manuscript.

      This is an interesting point. To address it, we calculated the predicted stiffness of the dsDNA by taking the slope of the theoretical force-extension curve in Fig. 1B. Below 650 nm extension, the stiffness is <0.001 pN/nm; it reaches 0.01 pN/nm at 855 nm, and at 960 nm, where the force is 6 pN, the stiffness is roughly 0.2 pN/nm. That value is higher than the quoted 0.05 pN/nm trap stiffness, but for reference, at this stiffness an 8 nm step leads to a 1.6 pN jump in force, which is reasonable. Importantly, the stiffness of kinesin motors has been estimated to be in the range of 0.3 pN/nm (Coppin et al., 1996; Coppin et al., 1997). Granted, this stiffness is also nonlinear, but this means that even at stall, our dsDNA tether has a predicted stiffness comparable to that of the motor pulling on it. We will address this point in our revision.
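As a numerical illustration of this calculation, the Marko-Siggia worm-like-chain interpolation formula (Marko and Siggia, 1995) can be differentiated to get the local tether stiffness. The sketch below assumes a 50 nm persistence length and a ~1 μm contour length, chosen only to roughly reproduce the quoted numbers; they are not the measured tether parameters.

```python
import math

def wlc_force(x, L=1020.0, Lp=50.0, kT=4.1):
    """Marko-Siggia interpolation formula for a worm-like chain:
    F(x) = (kT/Lp) * [1/(4(1 - x/L)^2) - 1/4 + x/L], force in pN.
    L = contour length (nm) and Lp = persistence length (nm) are
    assumed values here, not the measured tether parameters.
    """
    r = x / L
    return (kT / Lp) * (0.25 / (1.0 - r) ** 2 - 0.25 + r)

def wlc_stiffness(x, dx=0.1, **kw):
    """Local stiffness dF/dx (pN/nm) by central difference."""
    return (wlc_force(x + dx, **kw) - wlc_force(x - dx, **kw)) / (2.0 * dx)

# Entropic spring: nearly zero stiffness at low extension, steep near
# full extension -- the behavior described in the response above.
k_low = wlc_stiffness(300.0)    # well below the contour length
k_high = wlc_stiffness(960.0)   # near the ~6 pN working point
```

With these assumed parameters the stiffness is on the order of 10⁻⁴ pN/nm at 300 nm extension but rises to roughly 0.2 pN/nm near 960 nm, where the predicted force is ~6 pN, illustrating how strongly nonlinear the entropic spring is.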

      Compared to an optical trapping assay, the motors are also tethered closer to the microtubule in this geometry. In an optical trap assay, the bead could rotate when the kinesin is not bound. The authors should discuss how this tethering is expected to affect the kinesin reattachment and slipping. While likely outside the scope of this study, it would be interesting to compare the static tether used here with a dynamic tether like MAP7 or the CAP-GLY domain of p150glued.

      Please see our response to Reviewer #2 Major Comment #4 above, which asks this same question in the context of intracellular cargo. We plan to address this in our revision. Regarding a dynamic tether, we agree that’s interesting – there are kinesins that have a second, non-canonical binding site that achieves this tethering (ncd and Cin8); p150glued likely does this naturally for dynein-dynactin-activator complexes; and we speculated in a review some years ago (Hancock, 2014) that during bidirectional transport kinesin and dynein may act as dynamic tethers for one another when not engaged, enhancing the activity of the opposing motor.

      In the single-molecule extension traces (Figure 1F-H; S3), the kinesin-2 traces often show jumps in position at the beginning of runs (e.g., the four runs from ~4-13 s in Fig. 1G). These jumps are not apparent in the kinesin-1 and -3 traces. What is the explanation? Is kinesin-2 binding accelerated by resisting loads more strongly than kinesin-1 and -3?

      Due to the compliance of the dsDNA, the 95% limits for the initial attachment position are +/- 290 nm (Fig. S2). Thus, some apparent ‘jumps’ from the detached state are expected. We will take a closer look at why there are jumps for kinesin-2 that aren’t apparent for kinesin-1 or -3.

      When comparing the durations of unloaded and stall events (Fig. 2), there is a potential for bias in the measurement, where very long unloaded runs cannot be observed due to the limited length of the microtubule (Thompson, Hoeprich, and Berger, 2013), while the duration of tethered runs is only limited by photobleaching. Was the possible censoring of the results addressed in the analysis?

      Yes. Please see response to Reviewer #2 points (8) and (9) above.

      The mathematical model is helpful in interpreting the data. To assess how the "slip" state contributes to the association kinetics, it would be helpful to compare the proposed model with a similar model with no slip state. Could the slips be explained by fast reattachments from the detached state?

      In the model, the slip state and the detached states are conceptually similar; they only differ in the sequence (slip to detached) and the transition rates into and out of them. The simple answer is: yes, the slips could be explained by fast reattachments from the detached state. In that case, the slip state and recovery could be called a “detached state with fast reattachment kinetics”. However, the key data for defining the kinetics of the slip and detached states is the distribution of Recovery times shown in Fig. 4D-F, which required a triple exponential to account for all of the data. If we simplified the model by eliminating the slip state and incorporating fast reattachment from a single detached state, then the distribution of Recovery times would be a single-exponential with a time constant equivalent to t<sub>1</sub>, which would be a poor fit to the experimental distributions in Fig. 4D-F.
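This distinction can be illustrated with simulated data: a scheme with a single detached state predicts single-exponential recovery times, while fast rescues from a slip state plus a slower reattachment pathway produce a mixture of time scales that no single exponential can capture. The rates and weights below are invented for illustration, not the fitted values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two-timescale recovery process: fast rescues from the slip state plus
# slower reattachments from the detached state (illustrative values).
slip_rescues = rng.exponential(0.05, size=7000)
reattachments = rng.exponential(1.0, size=3000)
times = np.concatenate([slip_rescues, reattachments])

# If the data really were single-exponential, the maximum-likelihood
# time constant would simply be the sample mean.
tau_single = times.mean()

def survival_empirical(t, data):
    return float(np.mean(data > t))

def survival_single_exp(t, tau):
    return float(np.exp(-t / tau))

# The single-exponential fit badly over-predicts survival at short
# times because it cannot represent the fast slip-recovery component.
err_short = abs(survival_empirical(0.05, times)
                - survival_single_exp(0.05, tau_single))
```

The mismatch in the survival curve at short times is the simulated analogue of the poor single-exponential fits to the recovery-time distributions in Fig. 4D-F.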

      We appreciate the efforts and helpful suggestions of all three reviewers and the Editor.

      References:

      Block, S.M., L.S. Goldstein, and B.J. Schnapp. 1990. Bead movement by single kinesin molecules studied with optical tweezers. Nature. 348:348-352.

      Bouchiat, C., M.D. Wang, J. Allemand, T. Strick, S.M. Block, and V. Croquette. 1999. Estimating the persistence length of a worm-like chain molecule from force-extension measurements. Biophys J. 76:409-413.

      Coppin, C.M., J.T. Finer, J.A. Spudich, and R.D. Vale. 1996. Detection of sub-8-nm movements of kinesin by high-resolution optical-trap microscopy. Proc Natl Acad Sci U S A. 93:1913-1917.

      Coppin, C.M., D.W. Pierce, L. Hsu, and R.D. Vale. 1997. The load dependence of kinesin's mechanical cycle. Proc Natl Acad Sci U S A. 94:8539-8544.

      Ezber, Y., V. Belyy, S. Can, and A. Yildiz. 2020. Dynein Harnesses Active Fluctuations of Microtubules for Faster Movement. Nat Phys. 16:312-316.

      Hancock, W.O. 2014. Bidirectional cargo transport: moving beyond tug of war. Nat Rev Mol Cell Biol. 15:615-628.

      Howard, J. 2001. Mechanics of Motor Proteins and the Cytoskeleton. Sinauer Associates, Inc., Sunderland, MA. 367 pp.

      Kunwar, A., S.K. Tripathy, J. Xu, M.K. Mattson, P. Anand, R. Sigua, M. Vershinin, R.J. McKenney, C.C. Yu, A. Mogilner, and S.P. Gross. 2011. Mechanical stochastic tug-of-war models cannot explain bidirectional lipid-droplet transport. Proc Natl Acad Sci U S A. 108:18960-18965.

      Kuo, Y.W., M. Mahamdeh, Y. Tuna, and J. Howard. 2022. The force required to remove tubulin from the microtubule lattice by pulling on its alpha-tubulin C-terminal tail. Nature communications. 13:3651.

      Laakso, J.M., J.H. Lewis, H. Shuman, and E.M. Ostap. 2008. Myosin I can act as a molecular force sensor. Science. 321:133-136.

      Leidel, C., R.A. Longoria, F.M. Gutierrez, and G.T. Shubeita. 2012. Measuring molecular motor forces in vivo: implications for tug-of-war models of bidirectional transport. Biophys J. 103:492-500.

      Marko, J.F., and E.D. Siggia. 1995. Stretching DNA. Macromolecules. 28:8759-8770.

      Nicholas, M.P., F. Berger, L. Rao, S. Brenner, C. Cho, and A. Gennerich. 2015. Cytoplasmic dynein regulates its attachment to microtubules via nucleotide state-switched mechanosensing at multiple AAA domains. Proc Natl Acad Sci U S A. 112:6371-6376.

      Purcell, E.M. 1977. Life at low Reynolds Number. Amer J. Phys. 45:3-11.

      Pyrpassopoulos, S., H. Shuman, and E.M. Ostap. 2020. Modulation of Kinesin's Load-Bearing Capacity by Force Geometry and the Microtubule Track. Biophys J. 118:243-253.

      Rai, A.K., A. Rai, A.J. Ramaiya, R. Jha, and R. Mallik. 2013. Molecular adaptations allow dynein to generate large collective forces inside cells. Cell. 152:172-182.

      Ramaiya, A., B. Roy, M. Bugiel, and E. Schaffer. 2017. Kinesin rotates unidirectionally and generates torque while walking on microtubules. Proc Natl Acad Sci U S A. 114:10894-10899.

      Rao, L., F. Berger, M.P. Nicholas, and A. Gennerich. 2019. Molecular mechanism of cytoplasmic dynein tension sensing. Nature communications. 10:3332.

      Smith, S.B., L. Finzi, and C. Bustamante. 1992. Direct mechanical measurements of the elasticity of single DNA molecules by using magnetic beads. Science. 258:1122-1126.

      Sudhakar, S., M.K. Abdosamadi, T.J. Jachowski, M. Bugiel, A. Jannasch, and E. Schaffer. 2021. Germanium nanospheres for ultraresolution picotensiometry of kinesin motors. Science. 371.

      Toleikis, A., N.J. Carter, and R.A. Cross. 2020. Backstepping Mechanism of Kinesin-1. Biophys J. 119:1984-1994.

      Urbanska, M., A. Ludecke, W.J. Walter, A.M. van Oijen, K.E. Duderstadt, and S. Diez. 2021. Highly-Parallel Microfluidics-Based Force Spectroscopy on Single Cytoskeletal Motors. Small. 17:e2007388.

      Wang, M.D., H. Yin, R. Landick, J. Gelles, and S.M. Block. 1997. Stretching DNA with optical tweezers. Biophys J. 72:1335-1346.

    1. Repeated measures ANOVA can be regarded as an extension of the paired t-test, used in situations where repeated measurements of the same variable are taken at different points in time (a time series) or under different conditions. Such situations are common in drug trials, but their analysis has certain complexities. Parametric tests based on the normal distribution assume that data sets being compared are independent. This is not the case in a repeated measures study design because the data from different time points or under different conditions come from the same subjects. This means the data sets are related, to account for which an additional assumption has to be introduced into the analysis.

      I concur with the author’s assertion that repeated measures designs create dependencies among data points; however, I would question the notion that this is always a “complexity.” Although it complicates the analysis, it also enriches the study design and controls for individual differences. When the same group of participants is measured repeatedly, we reduce the variability arising from differences between people, variability that would otherwise lower the statistical power of the test. This is by no means an exhaustive list of situations in which a firm grasp of the concept helps, but consider just one example: in behavior analysis or health research, we often measure how well someone is doing, how stressed they are, or how much they are improving at more than one point in time. The repeated measures test then lets us verify that observed changes are attributable to the intervention rather than to random variation or participant differences. Overall, the material does an interesting job of showing the importance of acknowledging dependencies in longitudinal data and using appropriate statistical techniques, rather than assuming that repeated observations are independent.
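The power advantage of pairing is easy to demonstrate with a small simulation (all numbers below are made up): the same before/after data analyzed once as if the two time points were independent groups, and once with a paired (repeated measures) test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 25

# Each subject has their own baseline (large individual differences),
# measured before and after an intervention with a true effect of 1.5.
baseline = rng.normal(50.0, 10.0, size=n)
before = baseline + rng.normal(0.0, 1.0, size=n)
after = baseline + 1.5 + rng.normal(0.0, 1.0, size=n)

# Treating the time points as independent groups buries the effect in
# the between-subject variability...
t_ind, p_ind = stats.ttest_ind(after, before)
# ...while the paired (repeated measures) test removes that variability
# by analyzing within-subject differences.
t_rel, p_rel = stats.ttest_rel(after, before)
```

With a large between-subject spread, the independent-samples test typically misses the effect while the paired test detects it, which is exactly the power argument made above.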

    1. Techniques of birth and obstetrics. Techniques of infancy, child-rearing, and feeding of the child. Techniques of adolescence. Techniques of adulthood. Techniques of activity and movement. Techniques of body care: rubbing, washing, soaping. Techniques of consumption, eating. Techniques of reproduction. Techniques of care, of the abnormal.

      And? For some of these things you can also use charts.

    1. The logic is clear: if artificial intelligence is going to be used in professional settings, human oversight must be built into the process. At IE University, I tell my students that they may use generative AI models, but they must share their prompt sequences, present their review of the output, and take responsibility for it. If a professional fails at that control, their work cannot be considered valid. We are no longer talking about “new technologies” or things born the day before yesterday: if you do not know how to use a generative algorithm properly, you get absurd answers, and if you not only accept them as good but go on to present them as good, you are simply irresponsible, a bad professional.

      Clear instructions for students

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigates the effects of oral supplementation with nicotinamide mononucleotide (NMN) on metabolism and inflammation in mice with diet-induced obesity, and whether these effects depend on the NAD⁺-dependent enzyme SIRT1. Using control and inducible SIRT1 knockout mice, the authors show that NMN administration mitigates high-fat diet-induced weight gain, enhances energy expenditure, and normalizes fasting glucose and plasma lipid profiles in a largely SIRT1-dependent manner. However, reductions in fat mass and adipose tissue expansion occur independently of SIRT1. Comprehensive plasma proteomic analyses (O-Link and mass spectrometry) reveal that NMN reverses obesity-induced alterations in metabolic and immune pathways, particularly those related to glucose and cholesterol metabolism. Integrative network and causal analyses identify both SIRT1-dependent and -independent protein clusters, as well as potential upstream regulators such as FBXW7, ADIPOR2, and PRDM16. Overall, the study supports that NMN modulates key metabolic and immune pathways through both SIRT1-dependent and alternative mechanisms to alleviate obesity and dyslipidemia in mice.

      Strengths:

      Well-written manuscript, and state-of-the-art proteomics-based methodologies to assess NMN and SIRT1-dependent effects.

      We thank the reviewer for highlighting the state-of-the-art proteomic methods used. We report for the first time significant changes in the plasma proteome of mice after NMN supplementation, in both wild-type and SIRT1-KO mice, using a combination of DIA mass spectrometry and Olink.

      Weaknesses:

      Unfortunately, the study design, as well as the data analysis approach taken by the authors, are flawed. This limits the authors' ability to make the proposed conclusions.

      We agree that the administration of tamoxifen, along with the associated weight loss, could affect the obesity phenotype. For this reason, we ensured that both Cre-positive and Cre-negative mice received tamoxifen. Importantly, after the tamoxifen 'washout', the two groups weighed essentially the same. Going forward, we plan to address this comment by performing additional statistical tests on all six experimental groups to gain insights into dependencies. Based on your suggestions, we will clarify the limitations of the study design and improve the data analysis approaches to provide stronger support for our conclusions in the revised version of the paper.

      Reviewer #2 (Public review):

      Summary:

      Majeed and colleagues aimed to evaluate whether the metabolic effects of NMN in the context of a high-fat diet are SIRT1 dependent. For this, they used an inducible SIRT1 KO model (SIRT1 iKO), allowing them to bypass the deleterious effects of SIRT1 ablation during development. In line with previous reports, the authors observed that NMN prevents, to some degree, diet-induced metabolic damage in wild-type mice. When doing similar tests on SIRT1 iKO mice, the authors see that some, but not all, of the effects of NMN are abrogated. The phenotypic studies are complemented by plasma proteomic analyses evaluating the influence of the high-fat diet, SIRT1, and NMN on circulating protein profiles.

      Strengths:

      The mechanistic aspects behind the potential health benefits of NAD+ precursors have been poorly elucidated. This is in part due to the pleiotropic actions of NAD-related molecules on cellular processes. While sirtuins, most notably SIRT1, have been largely hypothesized to be key players in the therapeutic actions of NAD+ boosters, the proof for this in vivo is very limited. In this sense, this work is an important contribution to the field.

      We thank the reviewer for acknowledging the importance of this work to the field. In this report, we provide in vivo evidence of the action of NAD+ boosting, and hope to delineate the action of Sirt1, as well as the pleiotropic effects of NAD-related molecules on cellular and metabolic processes.

      Weaknesses:

      While the authors use a suitable methodology (SIRT1 iKO mice), the results show very early that the iKO mice themselves have some notable phenotypes, which complicate the picture. The actions of NMN in WT and SIRT1 KO mice are most often presented separately. However, this is not the right approach to evaluate and visualize SIRT1 dependency. Indeed, many of the "SIRT1-dependent" effects of NMN are consequent to the fact that SIRT1 deletion itself has a phenotype equivalent to or larger than that induced by NMN in wild-type mice. This would have been very evident if the two genotypes had been systematically plotted together. Consequently, and despite the value of the study, the results obtained with this model might not allow for solidly established claims of SIRT1 dependency on NMN actions. The fact that some of the effects of SIRT1 deletion are similar to those of NMN supplementation also makes it counterintuitive to propose that activation of SIRT1 is a major driver of NMN actions. Unbiasedly, one might as well conclude that NMN could act by inhibiting SIRT1. The fact that readouts for SIRT1 activity are not explored makes it also difficult to test the influence of NMN on SIRT1 in their experimental setting, or whether compensations could exist.

      We thank the reviewer for raising this point and acknowledge the limitations of using Sirt1 iKO mice. However, inducing Sirt1 KO in adulthood is a better alternative than using a homozygous Sirt1 KO mouse model, as the latter leads to embryonic lethality and many other developmental defects (1, 2). The proteomics analysis can provide insight into the effects of SIRT1 deletion under chow and high-fat diet (HFD) conditions, as well as the effects of diet in the presence or absence of nicotinamide mononucleotide (NMN). We will discuss these limitations and present the results for the two genotypes together, as suggested.
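One concrete way to test genotype dependency with both genotypes analyzed together is a genotype × treatment interaction term, which asks directly whether the NMN effect differs between WT and SIRT1 iKO mice. The sketch below is a balanced two-way ANOVA interaction test on made-up numbers, not the study's data or its planned analysis:

```python
import numpy as np
from scipy import stats

def interaction_f_test(cells):
    """Balanced two-way ANOVA interaction test.

    cells[i][j] is a 1-D array of observations for genotype i under
    treatment j (e.g. WT/iKO x vehicle/NMN). A significant interaction
    means the treatment effect differs between genotypes, which is the
    direct statistical test of genotype dependency.
    """
    a, b = len(cells), len(cells[0])          # genotype, treatment levels
    n = len(cells[0][0])                      # replicates per cell (balanced)
    data = np.array([[np.asarray(c, float) for c in row] for row in cells])

    grand = data.mean()
    row_means = data.mean(axis=(1, 2))
    col_means = data.mean(axis=(0, 2))
    cell_means = data.mean(axis=2)

    ss_inter = n * np.sum(
        (cell_means - row_means[:, None] - col_means[None, :] + grand) ** 2
    )
    ss_error = np.sum((data - cell_means[..., None]) ** 2)
    df_inter = (a - 1) * (b - 1)
    df_error = a * b * (n - 1)
    F = (ss_inter / df_inter) / (ss_error / df_error)
    return F, stats.f.sf(F, df_inter, df_error)

# Made-up illustration: NMN lowers a readout strongly in WT mice but
# barely in iKO mice, producing a significant interaction.
rng = np.random.default_rng(7)
wt_veh = rng.normal(10.0, 1.0, 8)
wt_nmn = rng.normal(6.0, 1.0, 8)     # strong NMN effect in WT
ko_veh = rng.normal(10.0, 1.0, 8)
ko_nmn = rng.normal(9.5, 1.0, 8)     # little effect without SIRT1
F, p = interaction_f_test([[wt_veh, wt_nmn], [ko_veh, ko_nmn]])
```

Presenting the interaction p-value alongside plots of both genotypes on common axes would address the reviewer's concern directly.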

      A second weak point is that the proteomic explorations are interesting, yet feel too descriptive and disconnected from the overall phenotype or from the goal of the manuscript. It would be unreasonable to ask for gain/loss-of-function experiments based on the differentially abundant peptides. Yet, a deeper exploration of whether their altered presence in circulation is consistent with changes in their expression - and, if so, in which tissues - and a clearer discussion on their link to the phenotypes observed would be needed, especially for changes related to SIRT1 and NMN.

      First, we presented the data in this manner as a proof of concept, to demonstrate the effect of the diet on the plasma proteome and corroborate our findings with those published in the literature. We then investigated the effects of NAD boosting and Sirt1 KO in order to identify significant changes. We agree with the reviewer that it would be unreasonable to validate all the differentially abundant proteins. However, we will choose key proteins and assess their expression in different tissues, such as the liver, white adipose tissue (WAT) and muscles, and attempt to connect these changes with the phenotypes.

      Impact on the field and further significance of the work:

      Despite the fact that, in my opinion, the authors might not have conclusively achieved their main aim, there are multiple valuable aspects in this manuscript:

      (1) It provides independent validation for the potential benefits of NAD+ boosters in the context of diet-induced metabolic complications. Previous efforts using NR or NMN itself have provided contradicting observations. Therefore, additional independent experiments are always valuable to further balance the overall picture.

      (2) The metabolic consequences of deleting SIRT1 in adulthood have been poorly explored in previous works. Therefore, irrespective of the actions of NMN, the phenotypes observed are intriguing, and the proteomic differences are also large enough to spur further research to understand the role of SIRT1 as a therapeutic target.

      (3) Regardless of the influence of SIRT1, NMN promotes some plasma proteomic changes that are very well worth exploring. In addition, they highlight once more that the in vivo actions of NMN, as those of other NAD+ boosters, are pleiotropic. Hence, this work brings into question whether single gene KO models are really a good approach to explore the mechanisms of action of NAD+ precursors.

      We thank the reviewer for their analysis in highlighting the valuable aspects of the manuscript and we hope the revised manuscript will further strengthen the key results.

      References:

      (1) McBurney MW, Yang X, Jardine K, Hixon M, Boekelheide K, Webb JR, Lansdorp PM, Lemieux M. The mammalian SIR2alpha protein has a role in embryogenesis and gametogenesis. Mol Cell Biol 2003; 23:38-54.

      (2) Cheng HL, Mostoslavsky R, Saito S, Manis JP, Gu Y, Patel P, Bronson R, Appella E, Alt FW, Chua KF. Developmental defects and p53 hyperacetylation in Sir2 homolog (SIRT1)-deficient mice. Proc Natl Acad Sci U S A 2003; 100:10794-10799.

    1. Muslims also are Hindus.” (This is a common Hindu nationalist belief: that India’s Muslims are relatively recent converts, even though Islam arrived in India hundreds of years ago.

      See: there is even a belief that Indian Muslims are Hindus, because they are just Hindus who recently converted to Islam.

    1. Webinar Summary: Reconciling the Challenges of Sustainable Food and Food Insecurity

      Abstract

      This summary document recaps the discussions from the webinar "How can the four challenges of sustainable food be reconciled with food insecurity?", organized by the CRES and GRAINE PACA.

      It highlights the complexity of food insecurity, a heterogeneous phenomenon that is difficult to quantify and is estimated to affect around 8 million people in France.

      The PACA region stands out for its high poverty rate, the third highest in France, which exacerbates inequalities in access to quality food.

      The scientific presentations demonstrated that the four pillars of sustainable food (nutrition/health, environment, socio-economic, socio-cultural) do not converge naturally.

      However, in-depth studies show that a diet that is both healthy and low in environmental impact can also be less expensive.

      The key lies in a "healthy plant-based shift" in the diet: reducing consumption of animal products, especially ruminant meat, offset by a higher intake of whole grains, legumes, fruits and vegetables.

      The PACA region has a structured ecosystem for addressing these challenges, with coordination bodies such as COALIM and thematic networks (Précalim, Éducalim, Régalim, PAT) that aim to break down siloed approaches.

      National programs such as "Mieux Manger Pour Tous" and regulations such as the EGalim law provide financial and legal frameworks for transforming food systems, including food aid.

      Finally, the case study of the social grocery store in Mouans-Sartoux illustrates a successful transition from an aid model based on unsold goods to an offering of fresh, organic and local products.

      This transformation, made possible by political will, strategic partnerships (Biocoop, local producers) and access to dedicated funding, proves that it is possible to radically improve the quality and sustainability of food aid while respecting the dignity of recipients.

      --------------------------------------------------------------------------------

      1. Introduction and Context of the Webinar

      Organized by CRES PACA (the Regional Committee for Health Education) and GRAINE PACA (the Regional Network for Environmental Education and Sustainable Development), this webinar received financial support from the DREAL and was run in partnership with the DRAF and ADEME Provence-Alpes-Côte d'Azur.

      It is part of the "Mieux Manger Pour Tous" program and belongs to a cycle of two webinars led by two major regional networks:

      Précalim: the regional network for combating food insecurity.

      Éducalim: the regional network for education on sustainable food and taste.

      The main objectives of the webinar were as follows:

      • Deepen understanding of the concept of sustainable food and of the levers for reconciling its challenges among people experiencing insecurity.

      • Identify the main regulations relating to sustainable food for all.

      • Discover an inspiring and reproducible field initiative.

      2. The Strategic Framework and Stakeholder Networks in the PACA Region

      2.1. The Regional Ecosystem for Sustainable Food

      Presented by Peggy Bucas (DRAF), the regional network in PACA is designed to maximize the effectiveness of actions in favor of sustainable food.

      COALIM: this body brings together the regional institutions (DRAF, DREAL, DREETS, ARS, ADEME, Région) that oversee missions and funding related to sustainable food. It ensures coordination and complementarity of actions.

      The regional thematic networks: four main networks provide thematic and methodological support to project leaders.

      ◦ Précalim: focused on combating food insecurity.

      ◦ Éducalim: centered on education on sustainable food and taste.

      ◦ Régalim: dedicated to fighting food waste and losses.

      ◦ Réseau des PAT: coordinates the region's 29 Territorial Food Projects (PAT).

      The PATs are essential levers for fostering cooperation, breaking down siloed practices and developing a systemic approach. Their mission includes a "social justice" component to reduce food insecurity.

      2.2. The Précalim Network and the "Mieux Manger Pour Tous" Program

      Presented by Sandrine Fort (DREETS), the Précalim network and the MMPT program are pillars of the fight against food insecurity in the region.

      The Précalim network:

      Members: nearly 600 members (institutions, associations, local authorities). A call has been issued to bring in more agricultural stakeholders.

      Objectives:

      ◦ Build mutual awareness among stakeholders.

      ◦ Encourage the sharing of initiatives and lessons learned.

      ◦ Promote synergies and cooperation.

      ◦ Showcase actions and stakeholders.

      Actions: networking days, thematic webinars, "project accelerator" workshops and a collaborative platform hosted on the ADEME workspace.

      The "Mieux Manger Pour Tous" (MMPT) program:

      Origin: stems from the action plan for transforming food aid.

      National budget: 60 million euros in 2023, with a planned increase of 10 million per year through 2027.

      Objectives:

      1. Improve the nutritional and taste quality of food aid.

      2. Support the participation and guidance of people in precarious situations.

      3. Transform local schemes for combating food insecurity (e.g., solidarity food baskets, purchasing groups).

      4. Reduce the environmental impact of the food aid system.

      Program figures in PACA:

      ◦ 2023: 51 projects funded for 1.7 million euros.

      ◦ 2024: 62 projects funded for 2.5 million euros.

      ◦ 2025: a budget of 3.3 million euros, with 46 additional projects under review.

      3. Food Insecurity: Definitions and Key Figures

      3.1. Core Definitions

      Sustainable Food (FAO): diets that help protect biodiversity, are culturally acceptable, economically fair and accessible, and nutritionally safe and healthy. It rests on four challenges: nutrition/health, environment, socio-economic, and socio-cultural.

      Fight against Food Insecurity: promoting access to safe, diverse, good-quality food in sufficient quantity for people in vulnerable situations, while respecting their dignity and developing their capacity to act.

      Food Aid: the provision of foodstuffs to vulnerable people, accompanied by an offer of support.

      Food Insecurity (FAO): a situation in which a person does not have regular access to enough safe and nutritious food for growth and an active, healthy life. It is measured using the FIES (Food Insecurity Experience Scale).

      3.2. The State of Food Insecurity

      Measuring food insecurity in France is complex because there is no consistent, regularly applied census method.

      The available data come from cross-referencing several sources (public statistics, one-off studies such as INCA 3, data from associations).

      National figures (estimates):

      People experiencing food insecurity: 8 million, i.e., 11% of the population (Anses).

      Food dissatisfaction: 16% of people report not having enough to eat, and 45% report not having the foods they would like (CRÉDOC, 2022).

      Food aid recipients: between 2 and 9 million. The DGCS counts 5.3 million people registered with accredited associations.

      Non-take-up of food aid: 75% of people experiencing food insecurity do not use food aid (INCA 3 study, 2015).

      Financial difficulties: 38% of French people face financial difficulties in eating fresh fruit and vegetables every day (Ipsos/Secours Populaire barometer, 2024).

      Health impacts:

      • The prevalence of obesity is nearly four times higher among the poorest adults.

      • Fruit and vegetable consumption is half as high among people experiencing food insecurity (230 g/day on average, versus a recommendation of 400 g/day).

      Situation in the PACA region:

      Poverty rate: the 3rd highest in France, affecting around 850,000 people.

      Median standard of living of people in poverty: €10,600 per year, less than half the median standard of living of the region's population as a whole (€22,000).

      Poorest department: the Vaucluse, with a poverty rate of 20% (5th highest in France).

      Most affected groups:

      ◦ Households whose reference person is under 30 (25% poverty rate).

      ◦ Single-parent families (30.2%).

      ◦ Seniors (retirees account for 30.4% of poor households).

      4. Scientific Insights: Toward a Sustainable and Affordable Diet

      Florent Vieux (MS-Nutrition) presented several studies aiming to quantify the dimensions of sustainable food (nutrition, environment, cost) using reference databases (INCA 3, Ciqual, Agribalyse, Kantar).

      4.1. Ranking of Food Groups

      This study shows that the ranking of foods in terms of cost and environmental impact depends strongly on the functional unit chosen.

      Per kilogram (€/kg):

      - Most expensive / highest impact: ruminant meat, seafood.
      - Least expensive / lowest impact: fruits, vegetables, legumes.

      Per 100 kilocalories (€/100 kcal):

      - Fruits and vegetables become very expensive and high-impact because of their low energy density.
      - Dairy products and eggs remain in an intermediate position.

      Per unit of nutritional quality:

      - Seafood becomes more "affordable" again.
      - Fruits, vegetables and legumes remain very sound choices (low cost/impact relative to their nutritional density).

      Main conclusion: the cost ranking and the environmental-impact ranking of food categories are very similar.

      Some foods, such as legumes, potatoes and whole grains, are consistently inexpensive and low-impact, whatever the functional unit.
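      The dependence on the functional unit can be illustrated with a toy calculation. The prices and energy densities below are hypothetical values chosen only for illustration; they are not figures from the studies presented:

      ```python
      # Hypothetical (price in EUR/kg, energy density in kcal/100 g) for two foods,
      # chosen only to show how the cost ranking flips with the functional unit.
      foods = {
          "vegetables": (2.5, 18),   # cheap per kg, but very low energy density
          "beef": (15.0, 250),       # expensive per kg, but energy dense
      }

      for name, (price_per_kg, kcal_per_100g) in foods.items():
          # 1 kg contains kcal_per_100g * 10 kcal, so 100 kcal costs:
          cost_per_100kcal = price_per_kg / (kcal_per_100g * 10) * 100
          print(f"{name}: {price_per_kg:.2f} EUR/kg, {cost_per_100kcal:.2f} EUR/100 kcal")
      ```

      With these hypothetical numbers, vegetables are the cheaper food per kilogram but the more expensive one per 100 kcal, which is exactly the kind of ranking reversal described above.
      
      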

      4.2. A "Positive Deviance" Approach

      This study compared the diets of individuals with good nutritional quality but different environmental impacts.

      The "more sustainable" group (good nutrition, low impact) also had a lower food cost.

      Markers of good nutritional quality (common to both groups):

      ◦ High consumption of fruits and vegetables.

      ◦ High consumption of dairy products.

      ◦ Low consumption of sugary drinks.

      What distinguishes the low-environmental-impact group:

      ◦ Much lower consumption of ruminant meat.

      ◦ Markedly higher consumption of whole grains to compensate.

      4.3. Conclusion and Recommendations

      All of the studies converge on one main message: a "healthy plant-based shift".

      This means reducing consumption of animal products (especially meat) and substituting informed plant-based choices (whole grains, legumes, fruits and vegetables).

      A specific challenge for people in precarious situations: increasing fruit and vegetable consumption is the priority, since their starting consumption level is particularly low.

      Carbon footprint: although the poorest have a much smaller overall carbon footprint than the richest, the difference is less pronounced for the "food" category. Acting on this lever therefore remains relevant for everyone.

      5. Regulatory Framework and Levers for Action

      5.1. The EGalim Law as a Model

      Clara Vigan (DRAF) presented the EGalim law, which applies to institutional catering, as a powerful lever that can inspire action beyond that sector.

      Objectives of the law:

      ◦ 50% quality and sustainable products, including at least 20% organic products.

      ◦ Diversification of protein sources through the introduction of vegetarian menus, which helps reduce costs.

      ◦ Fight against food waste.

      ◦ Reduction in the use of plastic.

      These principles can be transposed to food aid to improve the quality of what is offered while keeping budgets under control.

      5.2. The Imperative of Food Safety

      Peggy Bucas (DRAF) recalled the fundamental rules of the "Hygiene Package", which are crucial for any organization distributing foodstuffs.

      Key principles: traceability of donations, maintaining the cold/hot chain, hygiene of premises and staff.

      An essential distinction:

      ◦ DLC (use-by date): exceeding it is strictly prohibited.

      ◦ DDM (best-before date): "best before"; the product remains safe to consume after the date has passed.

      6. Case Study: The Transformation of the Mouans-Sartoux Social Grocery Store

      Rémy Georgon (CCAS of Mouans-Sartoux) shared feedback on the transformation of the town's social grocery store.

      The trigger: a collective realization in 2020, prompted by the declining quality of donations drawn from unsold goods. The organization recognized that it was distributing "products that nobody had bought".

      The transformation strategy:

      1. Strategic partnerships: a collaboration with the local Biocoop store made it possible to introduce bulk products (food and hygiene) and to create a section of purchased organic products.

      2. Seeking funding: mobilization of the "France Relance" calls for projects (to renew refrigeration equipment) and "Mieux Manger Pour Tous".

      3. Local, seasonal sourcing: introduction of a group-ordering system for fresh, seasonal vegetables from a local producer.

      4. Synergy with municipal policy: the MMPT project funded the hiring of a market gardener by the CCAS, assigned to the municipal farm to increase the production of organic vegetables for the grocery store.

      5. Involving recipients: users were consulted to define which fresh products to prioritize for purchase (dairy products).

      Quantitative results:

      ◦ In 2024, organic products accounted for 7% of the stock (with 0% fruits and vegetables).

      ◦ In the first half of 2025, this figure rose to 46% organic products by weight, 62% of which were fruits and vegetables.

      ◦ The food-purchasing budget rose from €4,000 to €25,000, supported by grants.

      Key success factors:

      ◦ The conviction and commitment of the manager.

      ◦ Strong political will and the support of the town hall.

      ◦ The ability to seek out external partners and funding.

      ◦ The choice to prioritize quality over quantity.

    1. Regular Expressions

      Notepad++ regular expressions (“regex”) use the Boost regular expression library v1.85 (as of NPP v8.6.6), which was originally based on PCRE (Perl Compatible Regular Expression) syntax, departing from it in only very minor ways. Complete documentation on the precise implementation is found on the Boost pages for search syntax and replacement syntax. (Some users have misunderstood this paragraph to mean that they can use one of the regex-explainer websites that accepts PCRE and expect anything that works there to also work in Notepad++; this is not accurate. There are many different “PCRE” implementations, and Boost itself does not claim to be “PCRE”, though both Boost and the PCRE variants share the same origins in an early version of Perl’s regex engine. If your regex-explainer does not claim to use the same Boost engine that Notepad++ uses, there will be differences between the results from your chosen website and the results that Notepad++ gives.) The Notepad++ Community has a FAQ on other resources for regular expressions.

      Note: Regular expression “backward” search is disallowed due to sometimes surprising results. (For example, in the text to the test they travelled, a forward regex t\w+ will find 5 results; the same regex searching backward will find 17 matches.) If you really need this feature, please see Allow regex backward search to learn how to activate this option.

      Important Note: Syntax that works in the Find What: box for searching will not always work in the Replace with: box for replacement. There are different syntaxes. The Control Characters and Match by character code syntaxes work in both; other than that, see the individual sections on Searches vs Substitutions for which syntaxes are valid in which fields.

      Regex Special Characters for Searches

      In a regular expression (shortened into regex throughout), the special characters interpreted are:

      Single-character matches

      . or \C ⇒ Matches any character.
      If you check the box which says . matches newline, or use the (?s) search modifier, then . or \C will match any character, including newline characters (\r or \n). With the option unchecked, or using the (?-s) search modifier, . or \C only match characters within a line, and do not match the newline characters. Any Unicode character within the Basic Multilingual Plane (BMP) (with a codepoint from U+0000 through U+FFFF) will be matched per these rules. Any Unicode character beyond the BMP (with a codepoint from U+10000 through U+10FFFF) will be matched as two separate characters instead, since its “surrogate code” uses two characters. (See the Match by Character Code section for more on how surrogate codes work.)

      \X ⇒ Matches a single non-combining character followed by any number (zero or more) combining characters. You can think of \X as a “. on steroids”: it matches the whole grapheme as a unit, not just the base character itself. This is useful if you have Unicode-encoded text with accents as separate, combining characters. For example, the letter ǭ̳̚, with four combining characters after the o, can be found either with the regex (?-i)o\x{0304}\x{0328}\x{031a}\x{0333} or with the shorter regex \X (the latter, being generic, matches more than just ǭ̳̚, including but not limited to ą̳̄̚ or o alone); if you want to limit the \X in this example to just match a possibly-modified o (so “o followed by 0 or more modifiers”), use a lookahead before the \X: (?=o)\X, which would match o alone or ǭ̳̚, but not ą̳̄̚.

      \$ , \( , \) , \* , \+ , \. , \? , \[ , \] , \\ , \| ⇒ Prefixing a special character with \ to “escape” it allows you to search for a literal character where the regular expression syntax would otherwise give that character a special meaning as a regex meta-character. The characters $ ( ) * + . ? [ ] \ | all have special meaning to the regex engine in normal circumstances; to get one of them to match as a literal (or to show up as a literal in the substitution), you have to prefix it with the \ character. There are also other characters which are special only in certain circumstances (any time a character is used with a non-literal meaning throughout the Regular Expression section of this manual); if you want to match one of those sometimes-special characters as a literal character in those situations, it will likewise have to be escaped by putting a \ before it. Please note: if you escape a normal character, it will sometimes gain a special meaning; this is why so many of the syntax items listed in this section have a \ before them.

      Match by character code

      It is possible to match any character using its character code. This allows searching for any character, even if you cannot type it into the Find box, or the Find box doesn’t seem to match the emoji that you want to search for. If you are using an ANSI encoding in your document (that is, a character set like Windows-1252), you can use any character code with a decimal codepoint from 0 to 255. If you are using Unicode (one of the UTF-8 or UTF-16 encodings), you can match any Unicode character. These notations require knowledge of the hexadecimal or octal versions of the character code. (You can find such character-code information on most web pages about ASCII, about your selected character set, and about the UTF-8 and UTF-16 representations of Unicode characters.)

      \0ℕℕℕ ⇒ A single-byte character whose code in octal is ℕℕℕ, where each ℕ is an octal digit. (That’s the number 0, not the letter o or O.) This notation works for codepoints 0-255 (\0000 - \0377), which covers the full ANSI character-set range, or the first 256 Unicode characters.
      For example, \0101 looks for the letter A, as 101 in octal is 65 in decimal, and 65 is the character code for A in ASCII, in most character sets, and in Unicode.

      \xℕℕ ⇒ Specifies a single character with code ℕℕ, where each ℕ is a hexadecimal digit. What this stands for depends on the text encoding. This notation works for codepoints 0-255 (\x00 - \xFF), which covers the full ANSI character-set range, or the first 256 Unicode characters. For instance, \xE9 may match an é or a θ depending on the character set (also known as the “code page”) in an ANSI-encoded document.

      These next two only work with Unicode encodings (so the various UTF-8 and UTF-16 encodings):

      \x{ℕℕℕℕ} ⇒ Like \xℕℕ, but matches a full 16-bit Unicode character: any codepoint from U+0000 to U+FFFF.

      \x{ℕℕℕℕ}\x{ℕℕℕℕ} ⇒ For Unicode characters above U+FFFF, in the range U+10000 to U+10FFFF, you need to break the single 5-digit or 6-digit hex value into two 4-digit hex codes; these two codes are the “surrogate codes” for the character. For example, to search for the 🚂 STEAM LOCOMOTIVE character at U+1F682, you would search for the surrogate codes \x{D83D}\x{DE82}. If you want to know the surrogate codes for a given character, search the internet for “surrogate codes for character” (where character is the fancy Unicode character you need the codes for); the surrogate codes are equivalent to the two-word UTF-16 encoding for those higher characters, so UTF-16 tables will also work for looking this up. Any site or tool that you are likely to be using to find the U+###### for a given Unicode character will probably already give you the surrogate codes or UTF-16 words for the same character; if not, find a tool or site that does. You can also compute surrogate codes yourself from the character code, but only if you are comfortable with hexadecimal and binary. Skip the following bullets if you are prone to mathematics-based PTSD.
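      Before working through the by-hand bit manipulation that follows, the surrogate-pair arithmetic can be cross-checked with a short Python sketch (an illustration only, not something built into Notepad++); it reproduces the \x{D83D}\x{DE82} pair for U+1F682 from the example above:

      ```python
      def surrogate_pair(codepoint: int) -> tuple[int, int]:
          """Split a codepoint in U+10000..U+10FFFF into its UTF-16 surrogate pair."""
          if not 0x10000 <= codepoint <= 0x10FFFF:
              raise ValueError("codepoint is in the BMP; no surrogate pair needed")
          offset = codepoint - 0x10000       # a 20-bit value
          high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
          low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
          return high, low

      # U+1F682 STEAM LOCOMOTIVE -> the \x{D83D}\x{DE82} pair from the example
      high, low = surrogate_pair(0x1F682)
      print(f"\\x{{{high:04X}}}\\x{{{low:04X}}}")  # -> \x{D83D}\x{DE82}
      ```

      This is the same mapping a UTF-16 encoder performs: "🚂".encode("utf-16-be") yields the bytes D8 3D DE 82.
      
      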
      • Start with your Unicode U+######, calling the hexadecimal digits PPWXYZ. The PP digits indicate the plane.
      • Subtract one and convert to the 4 binary bits pppp (so PP=01 becomes 0000, PP=0F becomes 1110, and PP=10 becomes 1111).
      • Convert each of the other digits into 4 bits (W as wwww, X as xxvv, Y as yyyy, and Z as zzzz; you will see in a moment why two different characters are used in xxvv).
      • Write those 20 bits in sequence: ppppwwwwxxvvyyyyzzzz.
      • Group into two equal groups: ppppwwwwxx and vvyyyyzzzz (you can see that the X ⇒ xxvv was split between the two groups, hence the notation).
      • Before the first group, insert the binary digits 110110 to get 110110ppppwwwwxx, and split into the nibbles 1101 10pp ppww wwxx. Converting those nibbles to hex gives a value from \x{D800} through \x{DBFF}; this is the High Surrogate code.
      • Before the second group, insert the binary digits 110111 to get 110111vvyyyyzzzz, and split into the nibbles 1101 11vv yyyy zzzz. Converting those nibbles to hex gives a value from \x{DC00} through \x{DFFF}; this is the Low Surrogate code.
      • Combine those into the final \x{ℕℕℕℕ}\x{ℕℕℕℕ} for searching.

      For more on this, see the Wikipedia article on Unicode Planes, and the discussion in the Notepad++ Community Forum about how to search for non-ASCII characters.

      Collating Sequences

      [[._col_.]] ⇒ The character the col “collating sequence” stands for. For instance, in Spanish, ch is a single letter, though it is written using two characters. That letter would be represented as [[.ch.]]. This trick also works with the symbolic names of control characters, like [[.BEL.]] for the character with code 0x07. See also the discussion on character ranges.

      Control characters

      \a ⇒ The BEL control character 0x07 (alarm).
      \b ⇒ The BS control character 0x08 (backspace). This is only allowed inside a character class definition. Otherwise, it means “a word boundary”.
      \e ⇒ The ESC control character 0x1B.
      \f ⇒ The FF control character 0x0C (form feed).
      \n ⇒ The LF control character 0x0A (line feed). This is the regular end of line on Unix systems.
      \r ⇒ The CR control character 0x0D (carriage return). This is part of the DOS/Windows end-of-line sequence CR-LF, and was the EOL character on Mac 9 and earlier. OSX and later versions use \n.
      \t ⇒ The TAB control character 0x09 (tab, or hard tab, horizontal tab).
      \c☒ ⇒ The control character obtained from character ☒ by stripping all but its 5 lowest-order bits. For instance, \cA and \ca both stand for the SOH control character 0x01. You can think of this as “\c means Ctrl”, so \cA is the character you would get from hitting Ctrl+A in a terminal. (Note that \c☒ will not work if ☒ is outside the Basic Multilingual Plane (BMP) – that is, it only works if ☒ is in the Unicode character range U+0000 - U+FFFF. The intention of \c☒ is to mnemonically escape the ASCII control characters obtained by typing Ctrl+☒; it is expected that you will use a simple ASCII alphanumeric for the ☒, like \cA or \ca.)

      Special Control escapes

      \R ⇒ Any newline sequence. Specifically, the atomic group (?>\r\n|\n|\x0B|\f|\r|\x85|\x{2028}|\x{2029}). Please note, this sequence might match one or two characters, depending on the text. Because its length is variable-width, it cannot be used in lookbehinds. Because it expands to a parentheses-based group with an alternation sequence, it cannot be used inside a character class. If you accidentally attempt to put it in a character class, it will be interpreted like any other literal-character escape (where \☒ is used to make sure that the next character is literal), meaning that the R will be taken as a literal R, without any special meaning.
      For example, if you try [\t\R]: you may intend to say “match any single character that’s a tab or a newline”, but what you are actually saying is “match a tab or a literal R”; to get what you probably intended, use [\t\v] for “a tab or any vertical spacing character”, or [\t\r\n] for “a tab or carriage return or newline, but not any of the weird verticals”.

      Ranges or kinds of characters

      Character Classes

      [_set_] ⇒ This indicates a set of characters; for example, [abc] means any of the literal characters a, b or c. You can also use ranges by putting a hyphen between characters, for example [a-z] for any character from a to z. You can use a collating sequence in character ranges, as in [[.ch.]-[.ll.]] (these are collating sequences in Spanish).

      Certain characters require special treatment inside character classes:

      • To use a literal - in a character class: use it directly as the first or last character in the enclosing class notation, like [-abc] or [abc-]; OR use it “escaped” at any position, like [\-abc] or [a\-bc].
      • To use a literal ] in a character class: use it directly right after the opening [ of the class notation, like []abc]; OR use it “escaped” at any position, like [\]abc] or [a\]bc].
      • To use a literal [ in a character class: use it directly like any other character, like [ab[c]; “escaping” is not necessary, but is permissible, like [ab\[c]. This character is not special when used alone inside a class; however, there are cases where it is special in combination with another: if used with a colon in the order [: inside a class, it is the opening sequence for a named class (described below); if you want to include both a [ and a : inside the same character class, do not use them unescaped right next to each other; either change the order, like [:[], or escape one or both, like [\[:] or [[\:] or [\[\:].
      • If used with an equals sign in the order [= inside a class, it is the opening sequence for an equivalence class (described below); if you want to include both a [ and a = inside the same character class, do not use them unescaped right next to each other; either change the order, like [=[], or escape one or both, like [\[=] or [[\=] or [\[\=].
      • To use a literal \ in a character class, it must be doubled (i.e., \\) inside the enclosing class notation, like [ab\\c].
      • To use a literal ^ in a character class: use it directly as any character but the first, such as [a^b] or [ab^]; OR use it “escaped” at any position, such as [\^ab] or [a\^b] or [ab\^].

      [^_set_] ⇒ The complement of the characters in the set. For example, [^A-Za-z] means any character except an alphabetic character. Care should be taken with a complement list, as regular expressions are always multi-line, and hence [^ABC]* will match until the first A, B or C (or a, b or c if match case is off), including any newline characters. To confine the search to a single line, include the newline characters in the exception list, e.g. [^ABC\r\n].

      [[:_name_:]] or [[:☒:]] ⇒ The whole character class named name. For many classes, there is also a single-letter “short” class name, ☒. Please note: the [:_name_:] and [:☒:] must be inside a character class [...] to have their special meaning.
      short | full name | description | equivalent character class
      ----- | --------- | ----------- | ---------------------------
            | alnum   | letters and digits |
            | alpha   | letters |
      h     | blank   | spacing which is not a line terminator | [\t\x20\xA0]
            | cntrl   | control characters | [\x00-\x1F\x7F\x81\x8D\x8F\x90\x9D]
      d     | digit   | digits |
            | graph   | graphical characters, so essentially any character except control characters, \x7F, \x80 |
      l     | lower   | lowercase letters |
            | print   | printable characters | [\s[:graph:]]
            | punct   | punctuation characters | [!"#$%&'()*+,\-./:;<=>?@\[\\\]^_{\|}~]
      s     | space   | whitespace (word or line separator) | [\t\n\x0B\f\r\x20\x85\xA0\x{2028}\x{2029}]
      u     | upper   | uppercase letters |
            | unicode | any character with code point above 255 | [\x{0100}-\x{FFFF}]
      w     | word    | word characters | [_\d\l\u]
            | xdigit  | hexadecimal digits | [0-9A-Fa-f]

      Note that letters include any Unicode letters (ASCII letters, accented letters, and letters from a variety of other writing systems); digits include ASCII numeric digits, and anything else in Unicode that is classified as a digit (like the superscript numbers ¹²³…).

      Note that these character class names may be written in upper or lower case without changing the results, so [[:alnum:]] is the same as [[:ALNUM:]] or the mixed-case [[:AlNuM:]].

      As stated earlier, the [:_name_:] and [:☒:] (note the single brackets) must be part of a surrounding character class. However, you may combine them inside one character class, such as [_[:d:]x[:upper:]=], which is a character class that would match any digit, any uppercase letter, the lowercase x, and the literal _ and = characters. These named classes won’t always appear with the double brackets, but they will always be inside a character class. If the [:_name_:] or [:☒:] are accidentally not contained inside a surrounding character class, they will lose their special meaning.
For example, [:upper:] is the character class matching :, u, p, e, and r; whereas [[:upper:]] is similar to [A-Z] (plus other unicode uppercase letters).

[^[:_name_:]] or [^[:☒:]] ⇒ The complement of the character class named name or ☒ (matching anything not in that named class). This uses the same long names, short names, and rules as mentioned in the previous description.

Character classes may not contain parentheses-based groups of any kind, including the special escape \R (which expands to a parentheses-based group when evaluated, even though \R doesn’t look like it contains parentheses).

**Character Properties**

These properties behave similarly to named character classes, but cannot be contained inside a character class.

\p☒ or \p{_name_} ⇒ Same as [[:☒:]] or [[:_name_:]], where ☒ stands for one of the short names from the table above, and name stands for one of the full names from above. For instance, \pd and \p{digit} both stand for a digit, just like the escape sequence \d does.

\P☒ or \P{_name_} ⇒ Same as [^[:☒:]] or [^[:_name_:]] (not belonging to the class name).

**Character escape sequences**

\☒ ⇒ Where ☒ is one of d, w, l, u, s, h, v, described below. These single-letter escape sequences are each equivalent to a class from above. The lower-case escape sequence matches that class; the upper-case escape sequence matches the negative of that class. (Unlike the properties, these can be used both inside or outside of a character class.)
| Description | Escape Sequence | Positive Class | Negative Escape Sequence | Negative Class |
|---|---|---|---|---|
| digits | \d | [[:digit:]] | \D | [^[:digit:]] |
| word chars | \w | [[:word:]] | \W | [^[:word:]] |
| lowercase | \l | [[:lower:]] | \L | [^[:lower:]] |
| uppercase | \u | [[:upper:]] | \U | [^[:upper:]] |
| word/line separators | \s | [[:space:]] | \S | [^[:space:]] |
| horizontal space | \h | [[:blank:]] | \H | [^[:blank:]] |
| vertical space | \v | see below | \V | |

Vertical space: this encompasses all the [[:space:]] characters that aren’t [[:blank:]] characters: the LF, VT, FF, CR and NEL control characters and the LS and PS format characters: 0x000A (line feed), 0x000B (vertical tabulation), 0x000C (form feed), 0x000D (carriage return), 0x0085 (next line), 0x2028 (line separator) and 0x2029 (paragraph separator). There isn’t a named class which matches.

Note: despite its similarity to \v, even though \R matches certain vertical space characters, it is not a character-class-equivalent escape sequence (because it evaluates to a parentheses()-based expression, not a class-based expression). So while \d, \l, \s, \u, \w, \h, and \v are all equivalent to a character class and can be included inside another bracket[]-based character class, \R is not equivalent to a character class, and cannot be included inside a bracketed[] character class.

**Equivalence Classes**

[[=_char_=]] ⇒ All characters that differ from char by case, accent or similar alteration only. For example [[=a=]] matches any of the characters: A, À, Á, Â, Ã, Ä, Å, a, à, á, â, ã, ä and å.

**Multiplying operators**

`+` ⇒ This matches 1 or more instances of the previous character, as many as it can. For example, Sa+m matches Sam, Saam, Saaam, and so on. [aeiou]+ matches consecutive strings of vowels.

`*` ⇒ This matches 0 or more instances of the previous character, as many as it can. For example, Sa*m matches Sm, Sam, Saam, and so on.

`?` ⇒ Zero or one of the last character. Thus Sa?m matches Sm and Sam, but not Saam.

`*?` ⇒ Zero or more of the previous group, but minimally: the shortest matching string, rather than the longest string as with the “greedy” operator. Thus, m.*?o applied to the text margin-bottom: 0; will match margin-bo, whereas m.*o will match margin-botto.

`+?` ⇒ One or more of the previous group, but minimally.

`{ℕ}` ⇒ Matches ℕ copies of the element it applies to (where ℕ is any decimal number).

`{ℕ,}` ⇒ Matches ℕ or more copies of the element it applies to.

`{ℕ,ℙ}` ⇒ Matches ℕ to ℙ copies of the element it applies to, as many as it can (where ℙ ≥ ℕ).

`{ℕ,}?` or `{ℕ,ℙ}?` ⇒ Like the above, but minimally.

`*+` or `?+` or `++` or `{ℕ,}+` or `{ℕ,ℙ}+` ⇒ These so-called “possessive” variants of greedy repeat marks do not backtrack. This allows failures to be reported much earlier, which can boost performance significantly. But they will eliminate matches that would require backtracking to be found. As an example, see how the matching engine handles the following two regexes.

When regex “.*” is run against the text “abc”x :

- `“` matches `“`
- `.*` matches `abc”x`
- `”` doesn't match (end of line) => backtracking
- `.*` matches `abc”`
- `”` doesn't match letter `x` => backtracking
- `.*` matches `abc`
- `”` matches `”` => 1 overall match: `“abc”`

When regex “.*+”, with a possessive quantifier, is run against the text “abc”x :

- `“` matches `“`
- `.*+` matches `abc”x` (catches all remaining characters)
- `”` doesn't match (end of line)

Notice there is no match at all in this version, because the possessive quantifier prevents backtracking to a possible solution.

**Anchors**

Anchors match a zero-length position in the line, rather than a particular character.

`^` ⇒ This matches the start of a line (except when used inside a set, see above).

`$` ⇒ This matches the end of a line.

`\<` ⇒ This matches the start of a word using Boost’s definition of words.

`\>` ⇒ This matches the end of a word using Boost’s definition of words.

`\b` ⇒ Matches either the start or end of a word.

`\B` ⇒ Not a word boundary.
It represents any location between two word characters or between two non-word characters.

\A or \` ⇒ Matches the start of the file.

\z or \' ⇒ Matches the end of the file.

\Z ⇒ Matches like \z with an optional sequence of newlines before it. This is equivalent to (?=\v*\z), which departs from the traditional Perl meaning for this escape.

\G ⇒ This “Continuation Escape” matches the end of the previous match, or matches the start of the text being matched if no previous match was found. In Find All or Replace All circumstances, this will allow you to anchor your next match at the end of the previous match. If it is the first match of a Find All or Replace All, and any time you use a single Find Next or Replace, the “end of previous match” is defined to be the start of the search area – the beginning of the document, the current caret position, or the start of the highlighted text. Because of that, if you are using it in an alternation where you want to say “find any occurrence of something after some prefix, or after a previous match”, you will want to make sure that your prefix includes the start-of-file \A, otherwise the \G portion may accidentally match start-of-file when you don’t want that to occur.

**Capture Groups and Backreferences**

(_subset_) ⇒ Numbered Capture Group: Parentheses mark a part of the regular expression, also known as a subset expression or capture group. The string matched by the contents of the parentheses (indicated by subset in this example) can be re-used with a backreference or as part of a replace operation; see Substitutions, below. Groups may be nested.

(?<name>_subset_) or (?'name'_subset_) ⇒ Named Capture Group: Names the value matched by subset as the group name. Please note that group names are case-sensitive.

\ℕ, \gℕ, \g{ℕ}, \g<ℕ>, \g'ℕ', \kℕ, \k{ℕ}, \k<ℕ> or \k'ℕ' ⇒ Numbered Backreference: These syntaxes match the ℕth capture group earlier in the same expression.
(Backreferences are used to refer to the capture group contents only in the search/match expression; see the Substitution Escape Sequences for how to refer to capture groups in substitutions/replacements.)

A regex can have multiple subgroups, so \2, \3, etc. can be used to match others (numbers advance left to right with the opening parenthesis of the group). You can have as many capture groups as you need, and are not limited to only 9 groups (though some of the syntax variants can only reference groups 1-9; see the notes below, and use the syntaxes that explicitly allow multi-digit ℕ if you have more than 9 groups).

Example: ([Cc][Aa][Ss][Ee]).*\1 would match a line such as Case matches Case but not Case doesn't match cASE.

\ℕ ⇒ This form can only have ℕ as digits 1-9, so if you have more than 9 capture groups, you will have to use one of the other numbered backreference notations, listed in the next bullet point. Example: the expression \10 matches the contents of the first capture group \1 followed by the literal character 0, not the contents of the 10th group.

\gℕ, \g{ℕ}, \g<ℕ>, \g'ℕ', \kℕ, \k{ℕ}, \k<ℕ> or \k'ℕ' ⇒ These forms can handle any non-zero ℕ.

For positive ℕ, it matches the ℕth subgroup, even if ℕ has more than one digit. \g10 matches the contents from the 10th capture group, not the contents from the first capture group followed by the literal 0. If you want to match a literal number after the contents of the ℕth capture group, use one of the forms that has braces, brackets, or quotes, like \g{ℕ} or \k'ℕ' or \k<ℕ>: for example, \g{2}3 matches the contents of the second capture group, followed by a literal 3, whereas \g23 would match the contents of the twenty-third capture group. For clarity, it is highly recommended to always use the braces or brackets form for multi-digit ℕ.

For negative ℕ, groups are counted backwards relative to the last group, so that \g{-1} is the last matched group, and \g{-2} is the next-to-last matched group.
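The Case example above can be checked directly. Python's re module is a convenient stand-in here: its patterns accept only the plain \1–\99 backreference form (the \g{ℕ} and \kℕ pattern forms described above are Boost-specific, and in Python \g<ℕ> appears only in replacement strings), but the semantics of a numbered backreference are the same:

```python
import re

pattern = re.compile(r"([Cc][Aa][Ss][Ee]).*\1")

# A backreference repeats the exact text the group captured,
# not the group's pattern, so "cASE" cannot satisfy \1 = "Case".
assert pattern.search("Case matches Case") is not None
assert pattern.search("Case doesn't match cASE") is None

# In a substitution, Python refers to group 1 as \g<1>:
assert re.sub(r"(\w+) \1", r"\g<1>", "hello hello") == "hello"
```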
Please note the difference between absolute and relative backreferences. For instance, an exact four-letter word palindrome can be matched with:

- the regex (?-i)\b(\w)(\w)\g{2}\g{1}\b, when using absolute (positive) coordinates
- the regex (?-i)\b(\w)(\w)\g{-1}\g{-2}\b, when using relative (negative) coordinates

\g{name}, \g<name>, \g'name', \k{name}, \k<name> or \k'name' ⇒ Named Backreference: The string matching the subexpression named name. (As with the Numbered Backreferences above, these Named Backreferences are used to refer to the capture group contents only in the search/match expression; see the Substitution Escape Sequences for how to refer to capture groups in substitutions/replacements.)
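Both styles of backreference can be exercised in code. Note that Python's re spells named groups (?P<name>...) and named backreferences (?P=name) rather than Boost's (?<name>...) and \k<name>, and it has no relative (negative) backreferences, so this sketch uses the absolute numbered form for the palindrome example:

```python
import re

# Four-letter palindrome; the Boost regex above is (?-i)\b(\w)(\w)\g{2}\g{1}\b.
# Python's equivalent uses \2 and \1 for the absolute backreferences
# (matching is case-sensitive by default, so no (?-i) is needed).
palindrome = re.compile(r"\b(\w)(\w)\2\1\b")
assert palindrome.search("noon").group() == "noon"
assert palindrome.search("nope") is None

# Named capture group and named backreference, Python spelling:
doubled = re.compile(r"\b(?P<word>\w+) (?P=word)\b")
match = doubled.search("it happened again again today")
assert match is not None and match.group("word") == "again"
```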


      Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Response to Reviewers’ Comments

      We thank all three reviewers for their thoughtful and detailed comments, which will help us to improve the quality and clarity of our manuscript.


      **Reviewer #1 (Evidence, reproducibility and clarity (Required)):**

      Summary: In this work, Tripathi et al address the open question of how the Fat/Ds pathway affects organ shape, using the Drosophila wing as a model. The Fat/Ds pathway is a conserved but complex pathway, interacting with Hippo signalling to affect growth and providing planar cell polarity that can influence cellular dynamics during morphogenesis. Here, the authors use genetic perturbations combined with quantification of larval, pupal, and adult wing shape and laser ablation to conclude that the Ft/Ds pathway affects wing shape only during larval stages, in a way that is at least partially independent of its interaction with Hippo and rather due to an effect on tissue tension and myosin II distribution. Overall the work is clearly written and well presented. I only have a couple of major comments on the limitations of the work.

      Major comments:

      1. Authors conclude from data in Figures 1 and 2 that the Fat/Ds pathway only affects wing shape during larval stages. When looking at the pupal wing shape analysis in Figure 2L, however, it looks like there is a difference in wt over time (6h-18h, consistent with literature), but that difference in time goes away in RNAi-ds, indicating that actually there is a role for Ds in changing shape during pupal stages, although the phenotype is clearly less dramatic than that of larval stages. No statistical test was done over time (within the genotype), however, so it's hard to say. I recommend the authors test over time - whether 6h and 18h are different in wild type and in ds mutant. I think this is especially important because there is proximal overgrowth in the Fat/Ds mutants, much of which is contained in the folds during larval stages. That first fold, however, becomes the proximal part of the pupal wing after eversion and contracts during pupal stages to elongate the blade (Aigouy 2010, Etournay 2015). Also, according to Trinidad Curr Biol 2025, there is a role for Fat/Ds pathway in pupal stages. All of that to say that it seems likely that there would be a phenotype in pupal stages. It's true it doesn't show up in the adult wing in the experiments in Fig 1, but looking at the pupal wing itself is more direct - perhaps the very proximal effect is less prominent later, as there is potential for further development after 18hr before adulthood and the most proximal parts are likely anyway excluded in the analysis.

      Response: Our main purpose in examining pupal wing shape was to emphasize that wings lacking ds are visibly abnormal even at early pupal stages. The reviewer makes the point that the change in shape from 6h to 18h APF is greater in control wings than in RNAi-ds wings. We have added quantitation of this to the revised manuscript as suggested. This difference could be interpreted as indicating that Ds-Fat signaling actively contributes to wing shape during pupal morphogenesis. However, given the genetic evidence that Ds-Fat signaling influences wing shape only during larval growth, we favor the interpretation that it reflects consequences of Ds-Fat action during larval stages – eg, overgrowth of the wing, particularly the proximal wing and hinge as occurs in ds and fat mutants, could result in relatively less elongation during the pupal hinge contraction phase. This wouldn’t change our key conclusions, but it is something that we discuss in a revised manuscript.

      2. I think there needs to be a mention and some discussion of the fact that the wing is not really flat. While it starts out very flat at 72h, by 96h and beyond, there is considerable curvature in the pouch that may affect measurements of different axes and cell shape. It is not actually specified in the methods, so I assume the measurements were taken using a 2D projection. Not clear whether the curvature of the pouch was taken into account, either for cell shape measurements presented in Fig 4 or for the wing pouch dimensional analysis shown in Fig 3, 6, and supplements. Do perturbations in Ft/Ds affect this curvature? Are they more or less curved in one or both axes? Such a change could affect the results and conclusions. The extent to which the fat/ds mutants fold properly is another important consideration that is not mentioned. For example, maybe the folds are deeper and contain more material in the ds/fat mutants, and that's why the pouch is a different shape? At the very least, this point about the 3D nature of the wing disc must be raised in discussion of the limitations of the study. For the cell shape analysis, you can do a correction based on the local curvature (calculated from the height map from the projection). For the measurement of A/P, D/V axes of the wing pouch, best would be to measure the geodesic distance in 3D, but this is not reasonable to suggest at this point. One can still try to estimate the pouch height/curvature, however, both in wild type and in fat/ds mutants.

      Response: The wing pouch measurements were done on 2D projections of wing discs that were already slightly flattened by coverslips, so there is not much curvature outside of the folds. We will revise the methods to make sure this is clear. While we recognize that the absolute values measured can be affected by this, our conclusions are based on the qualitative differences in proportions between genotypes and time points, and we wouldn’t expect these to differ significantly even if 3D distances were measured. Obtaining accurate 3D measures is technically more challenging - it requires having spacers matching the thickness of the wing disc, which varies at different time points and genotypes, and then measuring distances across curved surfaces. What we propose to address this is to do a limited set of 3D measures on wild-type and ds mutant wing discs at early and late stages, which we expect will confirm our expectation that the conclusions of our analysis are unaffected, while at the same time providing an indication of how much curvature affects the values obtained. We will also make sure the issue of wing disc curvature and folds is discussed in the text.

      Minor comments: 1. The analysis of the laser ablation is not really standard - usually one looks at recoil velocity or a more complicated analysis of the equilibrium shape using a model (e.g Shivakumar and Lenne 2016, Piscitello-Gomez 2023, Dye et al 2021). One may be able to extract more information from these experiments - nevertheless, I doubt the conclusions would change, given that that there seems to be a pretty clear difference between wt and ds (OPTIONAL).

      Response: We will add measurements of recoil velocities to complement our current analysis of circular cuts.

      Figure 7G: I think you also need a statistical test between RNAi-ds and UAS-rokCA+RNAi-ds.

      Response: We include this statistical test in the revised manuscript (it shows that they are significantly different).

      In the discussion, there is a statement: "However, as mutation or knock down of core PCP components, including pk or sple, does not affect wing shape... 59." Reference 59 is quite old and as far as I can tell shows neither images nor quantifications of the wing shape phenotype (not sure it uses "knockdown" either - unless you mean hypomorph?). A more recent publication Piscitello-Gomez et al Elife 2023 shows a very subtle but significant wing shape phenotype in core PCP mutants. It doesn't change your logic, but I would change the statement to be more accurate by saying "mutation of core PCP components has only subtle changes in adult wing shape"

      Response: Thank you for pointing this out; we have revised the manuscript accordingly.

      **Referee cross-commenting**

      Reviewer 2: Reviewer 2 makes the statement: "The distance along the AP boundary from the pouch border to DV midline is topologically comparable to the PD length of the adult wing. The distance along the DV boundary from A border to P border is topologically comparable to the AP length of the adult wing."

      I disagree - the DV boundary wraps around the entire margin of the adult wing (as correctly drawn with the pink line in Fig 2A). It is not the same as the wide axis of the adult wing (perpendicular to the AP boundary). It is not trivial to map the proximal-distal axis of the larval wing to the proximal-distal axis of the adult, due to the changes in shape that occur during eversion. Thus, I find it much easier to look at the exact measurement that the authors make, and it is much more standard in the field, rather than what the reviewer suggests. Alternatively, one could I guess measure in the adult the ratio of the DV margin length (almost the circumference of the blade?) to the AP boundary length. That may be a more direct comparison. Actually the authors leave out the term "boundary" - what they call AP is actually the AP boundary, not the AP axis, and likewise for the DV - what they measure is DV boundary, but I only noticed that in the second read-through now. Just another note, these measurements of the pouch really only correspond to the very distal part of the wing blade, as so much of the proximal blade comes from the folds in the wing disc. Therefore, a measurement of only distal wing shape would be more comparable.

      Response: We thank Reviewer 1 for their comments here. In terms of the region measured, we measure to the inner Wg ring in the disc, the location of this ring in the adult is actually more proximal than described above (eg see Fig 1B of Liu, X., Grammont, M. & Irvine, K. D. Roles for scalloped and vestigial in regulating cell affinity and interactions between the wing blade and the wing hinge. Developmental Biology 228, 287–303 (2000)), and this defines roughly the region we have measured in adult wings (with the caveat noted above that the measurements in the disc can be affected by curvature and the hinge/pouch fold, which we will address).

      Reviewer 2 states that the authors cannot definitively conclude anything about mechanical tension from their reported cutting data because the authors have not looked at initial recoil velocity. I strongly disagree. **The wing disc tissue is elastic on much longer timescales than what's considered after laser ablation (even hours), and the shape of the tissue after it equilibrates from a circular cut (1-2 min) can indeed be used to infer tissue stresses (see Dye et al eLife 2021, Piscitello-Gomez et al eLife 2023, Tahaei et al arXiv 2024).** In the wing disc, the direction of stresses inferred from initial recoil velocity is correlated with the direction of stresses inferred from analysing the equilibrium shape after a circular cut. Rearrangements, a primary mechanism of fluidization in epithelia, do not occur within 1 minute. Analysing the equilibrium shape after circular ablation may be more accurate for assessing tissue stresses than initial recoil velocity - in Piscitello-Gomez et al 2023, the authors found that a prickle mutation (PCP pathway) affected initial recoil velocity but not tissue stresses in the pupal wing. Such equilibrium circular cuts have also been used to analyze stresses in the avian embryo, where they correlate with directions of stress gathered from force inference methods (Kong et al Scientific Reports 2019). The Tribolium example noted by the reviewer is on the timescale of tens to hundreds of minutes - much longer than the timescale of laser ablation retraction. It is true the analysis of the ablation presented in this paper is not at the same level as those other cited papers and could be improved. But I don't think the analysis would be improved by additional experiments doing timelapse of initial retraction velocity.

      Response: Thank you, we agree with Reviewer 1 here.

      Reviewer 2 states "If cell anisotropy is caused by polarized myosin activity, that activity is typically polarized along the short edges not long edges." Not true in this case. Myosin II accumulates along long boundaries (Legoff and Lecuit 2013). "Therefore, interpreting what causes the cell anisotropy and how DS regulates it is difficult." Agreed - but this is well beyond the scope of this manuscript. The authors clearly show that there is a change of cell shape, at least in these two regions. Better would be to quantify it throughout the pouch and across multiple discs. Similar point for myosin quantifications - yes, polarity would be interesting and possible to look at in these data, and it would be better to do so on multiple discs, but the lack of overall myosin on the junctions shown here is not nothing. Interpreting what Ft/Ds does to influence tension and myosin and eventually tissue shape is a big question that's not answered here. I think the authors do not claim to fully understand this though, and maybe further toning down the language of the conclusions could help.

      Response: We agree with Reviewer 1 here and will also add quantitation of myosin across multiple discs and will include higher magnification myosin images and polarity tests.

      Reviewer 3: I agree with many of the points raised by Reviewer 3, in particular the one relevant to Fig 1. The additional experiments looking at myosin II localization and laser ablation in the other perturbations (Hippo and Rok mutants/RNAi) would certainly strengthen the conclusions.

      Response: Reviewer 3’s comment on Fig 1 requests antibody stains to assess recovery of expression after downshift, which we will do.

      We will add examination of myosin localization in hpo RNAi wing discs, and in the ds/rok combinations. We note that the effects of Rok manipulations on myosin and on recoil velocity have been described previously (eg Rauskolb et al 2014).

      **Reviewer #1 (Significance (Required)):**

      I think the work provides a clear conceptual advance, arguing that the Ft/Ds pathway can influence mechanical stress independently of its interaction with Hippo and growth. Such a finding, if conserved, could be quite important for those studying morphogenesis and Fat function in this and other organisms. For this point, the genetic approach is a clear strength. Previous work in the Drosophila wing has already shown an adult wing phenotype for Ft/Ds mutations that was attributed to its role in the larval growth phase, as marked clones show aberrant growth in mutants. The novelty of this work is the dissection of the temporal progression of this phenotype and how it relates to Hippo and myosin II activation. It remains unclear exactly how Ft/Ds may affect tissue tension, except that it involves a downregulation of myosin II - the mechanism of that is not addressed here and would involve considerably more work. I think the temporal analysis of the wing pouch shape was quite revealing, providing novel information about how the phenotype evolves in time, in particular that there is already a phenotype quite early in development. As mentioned above, however, the lack of consideration of the wing disc as a 3D object is a potential limitation. While the audience is likely mostly developmental biologists working in basic research, it may also interest those studying the pathway in other contexts, including in vertebrates given its conservation and role in other processes.

      **Reviewer #2 (Evidence, reproducibility and clarity (Required)):**

      The manuscript begins with very nice data from a ts-sensitive period experiment. Instead of a ts mutation, the authors induced RNAi in a temperature-dependent manner. The results are striking and strong. Knockdown of FT or DS during larval stages to late L3 changed shape while knockdown of FT or DS during later pupal stages did not. This indicates they are required during larval, not pupal stages of wing development for this shape effect. They did shift-up or shift-down at "early pupa stage" but precisely what stage that means was not described anywhere in the manuscript. White prepupal? Time? Likewise a shift-down was done at "late L3" but that meaning is also vague. Moreover, I was surprised to see they did not do a shift-up at the late L3 stage, to give completeness to the experiment. Why?

      Response: We have added more precise descriptions of the timing, and we will also add the requested late L3 shift-up experiment.

      Looking at the "shape" of the larval wing pouch they see a difference in the mutants. The pouch can be approximated as an ellipse, but with differing topology to the adult wing. Here, they muddled the analysis. The adult wing surface is analogous to one hemisphere of the larval wing pouch, i.e., either dorsal or ventral compartment. The distance along the AP boundary from the pouch border to DV midline is topologically comparable to the PD length of the adult wing. The distance along the DV boundary from A border to P border is topologically comparable to the AP length of the adult wing. They confusingly call this latter metric the "DV length" and the former metric the "AP length", and in fact they do not measure the PD length but PD+DP length. Confusing. Please change to make this consistent with earlier analysis of the adult and invert the reported ratio and divide by two.

      Then you would find the larval PD/AP ratio is smaller in the FT and DS mutants than wildtype, which resembles the smaller PD/AP ratio seen in the mutant adult wings. Totally consistent and also provides further evidence with the ts experiments that FT and DS exert shape effects in the larval phase of life.

      Response: As noted by Reviewer 1 in cross-referencing, some of the statements made by Reviewer 2 here are incorrect, eg “The distance along the DV boundary from A border to P border is topologically comparable to the AP length of the adult wing.” They are correct where they note that the A-P length we measure in the discs is actually equivalent to 2x the adult wing length, since we are measuring along both the dorsal and ventral wing, but this makes no difference to the analysis as the point is to compare shape between time points and genotypes, not to make inferences based on the absolute numbers obtained. The numerical manipulations suggested are entirely feasible but we think they are unnecessary.

      The remainder of the manuscript has experimental results that are more problematic, and really the authors do not figure out how the shape effect in larval stages is altered. I outline below the main problems.

      1. They compare the FT/DS shape phenotypes to those of mutants or knockdowns in Hippo pathway genes (Hippo is known to be downstream of FT and DS). They find these Hippo perturbations do have shape effects trending in the same direction as FT and DS effects. Knockdown reduces the PD/AP ratio while overexpressing WARTS increases the PD/AP ratio. The effect magnitudes are not as strong, but then again, they are using hypomorphic alleles and RNAi, which often induces partial or hypomorphic phenotypes. The effect strength is comparable when wing pouches are young but then dissipates over time, while FT and DS effects do not dissipate over time. The complexity of the data do not negate the idea that Hippo signaling is also playing some role and could be downstream of FT and DS in all of this. But the authors really downplay the data to the point of stating "These results imply that Ds-Fat influences wing pouch shape during wing disc growth separately from its effects on Hippo signaling." I think a more expansive perspective is needed given the caveats of the experiments.

      Response: Our results emphasize that the effects of Ds-Fat on wing shape cannot be explained solely by effects on Hippo signaling, eg as we stated on page 7 “These observations suggest that Hippo signaling contributes to, but does not fully explain, the influence of ds or fat on adult wing shape.” We also note that impairment of Hippo signaling has similar effects in younger discs, but very different effects in older discs, which clearly indicates that they are having very different effects during disc growth; we will revise the text to make sure our conclusions are clear.

      The reviewer wonders whether some of the differences could be due to the nature of the alleles or gene knockdown. First, the *ex*, *ds*, and *fat* alleles that we use are null alleles (eg see FlyBase), so it is not correct to say that we use only hypomorphic alleles and RNAi. We do use a hypomorphic allele for *wts*, and RNAi for *hpo*, for the simple reason that null alleles in these genes are lethal, so adult wings could not be examined. A further issue that is not commented on by the reviewer, but is more relevant here, is that there are multiple inputs into Hippo signaling, so of course even a null allele for *ex*, *ds* or *fat* is not a complete shutdown of Hippo signaling. Nonetheless, one can estimate the relative impairment of Hippo signaling by measuring the increased size of the wings, and from this perspective the knockdown conditions that we use are associated with roughly comparable levels of Hippo pathway impairment, so we stand by our results. We do, however, recognize that these issues could be discussed more clearly in the text, and will do so in a revised manuscript.
      

      Puzzlingly, this lack of taking seriously a set of complex results does not transfer to another set of experiments in which they inhibit or activate ROK, the Rho kinase. When ROK is perturbed, they also see weak effects on shape when compared to FT or DS perturbation. This weakness is seen in adults, larvae, clones and in epistasis experiments. The epistasis experiment in particular convincingly shows that constitutive ROK activation is not epistatic to loss of DS; in fact if anything the DS phenotype suppresses the ROK phenotype. These results also show that one cannot simply explain what FT and DS are doing with some single pathway or effector molecule like ROK. It is more complex than that.

      What I really think was needed were experiments combining FT and DS knockdown with other mutants or knockdowns in the Hippo and Rho pathways, and even combining Hippo and Rho pathway mutants with FT or DS intact, to see if there are genetic interactions (additive, synergistic, epistatic) that could untangle the phenotypic complexity.

      Response: We’re puzzled by these comments. First, we never claimed that what Fat or Ds do could be explained simply by manipulation of Rok (eg, see Discussion). Moreover, examination of wings and wing discs where ds is combined with Rho manipulations is in Fig 7, and Hippo and Rho pathway manipulation combinations are in Fig S5. We don’t think that combining ds or fat mutations with other Hippo pathway mutations would be informative, as it is well established that Ds-Fat are upstream regulators of Hippo signaling.

      Laser cutting experiments were done to see if there is anisotropy in tissue tension within the wing pouch. This was to test a favored idea that FT and DS activity generates anisotropy in tissue tension, thereby controlling overall anisotropic shape of the pouch. However there is a fundamental flaw to their laser cutting analysis. Laser cutting is a technique used to measure mechanical tension, with initial recoil velocity directly proportional to the tissue's tension. By cutting a small line and observing how quickly the edges of the cut snap apart, people can quantify the initial recoil velocity and infer the stored mechanical stress in the tissue at the time of ablation. Live imaging with high-speed microscopy is required to capture the immediate response of the tissue to the cut since initial recoil velocity occurs in the first few seconds. A kymograph is created by plotting the movement of the tissue edges over this time scale, perpendicular to the cut. The initial recoil velocity is the slope of the kymograph at time zero, representing how fast the severed edges move apart. A higher recoil velocity indicates higher mechanical tension in the tissue. However, the authors did not measure this initial recoil velocity but instead measured the distance between the severed edges at one time point: 60 seconds after cutting. This is much later than the time point at which the recoil usually begins to dissipate or decay. This decay phase typically lasts a minute or two, during which time the edges continue to separate but at a progressively slower rate. This time-dependent decay of the recoil reveals whether the tissue behaves more like a viscous fluid or an elastic solid. Therefore, the distance metric at 60 seconds is a measurement of both tension and the material properties of the cells. One cannot know then whether a difference in the distance is due to a difference in tension or fluidity of the cells. 
      If the authors made measurements of edge separation at several time points in the first 10 seconds after ablation, they could deconvolute the two. Otherwise their analysis is inconclusive. Anisotropy in recoil could be caused by greater tissue fluidity along one axis. A gradient of cell fluidity along one axis of a tissue has been observed in the amnioserosa of Tribolium, for example. (Related and important point - was the anisotropy of recoil oriented along the PD or AP axis, or not oriented to either axis? This key point was never stated.)

      The authors cannot definitively conclude anything about mechanical tension from their reported cutting data.

      Response: As noted by Reviewer 1 in cross-commenting, there is no fluidity on a time scale of 1 minute in the wing disc, and circular ablations are an established method to investigate tissue stress. We chose the circular ablation method in part because it interrogates stress over a larger area, whereas cutting individual junctions is subject to more variability, particularly as the orientation of the junction (eg radial vs tangential) impacts the tension detected in the wing disc. Nonetheless, we will add recoil measurements to the revised manuscript to complement our circular ablations, which we expect will provide independent confirmation of our results and address the Reviewer's concern here.

      They measured the eccentricity of wing pouch cells near the pouch border, and found they were highly anisotropic compared to DS mutant cells at comparable locations. Cells were elongated, but again which axis (PD or AP) they were elongated along was never stated. If cell anisotropy is caused by polarized myosin activity, that activity is typically polarized along the short edges, not the long edges. Thus, recoil velocity after laser cutting would be stronger along the axis aligned with short cell edges. It looks like the cutting anisotropy they see is greater along the axis aligned with long cell edges. Of course, if the cell anisotropy is caused by a pulling force exerted by the pouch boundary, then it would stretch the cells. This would in fact fit their cutting data. But then again, the observed cell anisotropy could also be caused by variation in the fluid-solid properties of the wing cells, as discussed earlier. Compression of the cells would then deform them anisotropically and produce the anisotropic shapes that were observed. Therefore, interpreting what causes the cell anisotropy and how DS regulates it is difficult.

      Response: As noted by Reviewer 1 in cross-commenting, it is well established that tension and myosin are higher along long edges in the proximal wing. However, we acknowledge that we could do a better job of making the location and orientation of the regions shown in these experiments clear, and we will address this in a revised manuscript.

      The imaging and analysis of the myosin RLC by GFP tagging is also flawed. Sqh-GFP is a tried and true proxy for myosin activity in Drosophila. Although the authors image the wing pouch of wildtype and DS mutants, they did so under low magnification to image the entire pouch. This gives a "low-res" perspective of overall myosin, but what they needed to do was image at high magnification in that proximal region of the pouch and see if Sqh-GFP is polarized in wildtype cells along certain cell edges aligned with an axis. And if such a polarity is observed, is it present or absent in the DS mutant? From the data shown in Figure 5, I cannot see any significant difference between wildtype and knocked down samples at this low resolution. Any difference, if there is any, is not really interpretable.

      Response: We agree that examination of myosin localization at high resolution to see if it is polarized is a worthwhile experiment. We did in fact do this, and myosin (Sqh:GFP) appeared unpolarized in ds mutants. However, the levels of myosin were so low that we didn’t feel confident in our assessment, so we didn’t include it. We now recognize that this was a mistake, and we will include high resolution myosin images and assessments of (lack of) polarity in a revised manuscript to address this comment.

      In conclusion, the manuscript has multiple problems that make it impossible for the authors to make the claims they make in the current manuscript. And even if they calibrated their interpretations to fit the data, there is not much of a simple clear picture as to how FT and DS regulate pouch eccentricity in the larval wing.

      Response: We think that the legitimate issues raised are addressable, as described above, while some of the criticisms are incorrect (as noted by Reviewer 1).

      Reviewer #2 (Significance (Required)): This manuscript describes experiments studying the role that the protocadherins FAT and DACHSOUS play in determining the two dimensional "shape" of the fruit fly wing. By "shape", the manuscript really means how much the wing's outline, when approximated as an ellipse, deviates from a circle. The elliptical approximations of FT and DS mutant wings more closely resemble a circle compared to the more eccentric wildtype wings. This suggests the molecules contribute to anisotropic growth in some way. A great deal of attention has been paid on how FT and DS regulate overall organ growth and planar cell polarity, and the Irvine lab has made extensive contributions to these questions over the years. Somewhat understudied is how FT and DS regulate wing shape, and this manuscript focuses on that. It follows up on an interesting result that the Irvine lab published in 2019, in which mud mutants randomized spindle pole orientation in wing cells but did not change the eccentricity of wings, ruling out biased cell division orientation as a mechanism for the anisotropic growth.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): Summary: The authors investigate the mechanisms underlying epithelial morphogenesis using the Drosophila wing as a model system. Specifically, they analyze the contribution of the conserved Fat/Ds pathway to wing shape regulation. The main claim of the manuscript is that Ds/Fat controls wing shape by regulating tissue mechanical stress through MyoII levels, independently of Hippo signaling and tissue growth.

      Major Comments To support their main conclusions, the authors should address the following major points and consider additional experiments where indicated. Most of the suggested experiments are feasible within a reasonable timeframe, while a few are more technically demanding but would substantially strengthen the manuscript's central claims.

      Figure 1: The authors use temperature-sensitive inactivation of Fat or Ds to determine the developmental window during which these proteins regulate wing shape. To support this claim, it is essential to demonstrate that upon downshift during early pupal stages, Ds or Fat protein levels are restored to normal. For consistency, please include statistical analyses in Figure 1P and ensure that all y-axis values in shape quantifications start at 1.

      Response: We will do the requested antibody stains for Fat (Ds antibody is unfortunately no longer available, but the point made by the reviewer can be addressed by Fat as the approach and results are the same for both genes). We have also added the requested statistical analysis to Fig 1P, and adjusted the scales as requested.

      Figure 2: The authors propose that wing shape is regulated by Fat/Ds during larval development. However, Figure 2L suggests that wing elongation occurs in control conditions between 6 and 12 h APF, while this elongation is not observed upon Ds RNAi. The authors should therefore perform downshift experiments while monitoring wing shape during the pupal stage to substantiate their main claim. In addition, equivalent data for Fat loss of function should be included to support the assertion that Fat and Ds act similarly.

      Response: As noted in our response to point 1 of Reviewer 1, we agree that there does seem to be relatively more elongation in control wings than in ds RNAi wings, but we think this likely reflects effects of ds on growth during larval stages, and we will revise the manuscript to comment on this.

      We will also add the suggested examination of fat RNAi pupal wings.

      The suggested examination of pupal wing shape in downshift experiments is unfortunately not feasible. Our temperature shift experiments expressing ds or fat RNAi are done using the UAS-Gal4-Gal80ts system. We also use the UAS-Gal4 system to mark the pupal wing. If we do a downshift experiment, then expression of the fluorescent marker will be shut down in parallel with the shutdown of ds or fat RNAi, so the pupal wings would no longer be visible.

      Figure 3: The authors state that "These observations indicate that Ds-Fat signaling influences wing shape during the initial formation of the wing pouch, in addition to its effects during wing growth." This conclusion is not fully supported, as the authors only examine wing shape at 72 h AEL. At this stage, fat or ds mutant wings already display altered morphology. The authors could only make this claim if earlier time points were fully analyzed. In fact, the current data rather suggest that Ds function is required before 72 h AEL, as a rescue of wing shape is observed between 72 and 120 h AEL.

      Response: First, I think we are largely in agreement with the Reviewer, as the basis for our saying that Ds-Fat are likely required during initial formation of the wing pouch is that our data show they must be required before 72 h AEL. Second, 72 h is the earliest that we can look using Wg expression as a marker, as at earlier stages it is expressed in a ventral wedge rather than a ring around the future wing pouch + DV line (eg see Fig 8 of Tripathi, B. K. & Irvine, K. D. The wing imaginal disc. Genetics (2022) doi:10.1093/genetics/iyac020). We can revise the text to make sure this is clear.

      Figure 4: The authors state that "The influence of Ds-Fat on wing shape is not explained by Hippo signaling." However, this conclusion is not supported by their data, which show that partial loss of ex or hippo causes clear defects in wing shape. In addition, the initial wing shape is affected in wts and ex mutants, and hypomorphic alleles were used for these experiments. Therefore, the main conclusion requires revision. It would be useful to include a complete dataset for hippo RNAi, ex, and wts conditions in Figure S1. The purpose and interpretation of the InR^CA experiments are also unclear. While InR^CA expression can increase tissue growth, Hippo signaling has functions beyond growth control. Whether Hippo regulates tissue shape through InR^CA-dependent mechanisms remains to be clarified.

      Response: As noted in our response to point 1 of Reviewer 2 - our results emphasize that the effects of Ds-Fat on wing shape cannot be explained solely by effects on Hippo signaling, eg as we stated on page 7 “These observations suggest that Hippo signaling contributes to, but does not fully explain, the influence of ds or fat on adult wing shape.” We also note that impairment of Hippo signaling has similar effects in younger discs, but very different effects in older discs, which clearly indicates that they are having very different effects during disc growth. We will make some revisions to the text to make sure that our conclusions are clear throughout.

      While we used a hypomorphic allele for wts, because null alleles are lethal, the ex allele that we used is described in FlyBase as an amorph, not a hypomorph, and as noted in our response to Reviewer 2, we will add some discussion about the relative strength of effects on Hippo signaling.

      In Fig S1, we currently show adult wings for ex[e1] and RNAi-Hpo, and wing discs for wts[P2]/wts[x1], and for ex[e1]. The wts combination does not survive to adult so we can’t include this. We will however, add hpo RNAi wing discs as requested.

      The purpose of including the InR^CA experiments is to try to separate effects of Hippo signaling from effects of growth, because InR signaling manipulation provides a distinct mechanism for increasing growth. We will revise the text to make this clearer.

      Figure 5: This figure presents images of MyoII distribution, but no quantification across multiple samples is provided. Moreover, the relationship between changes in tissue stress and MyoII levels remains unclear. Performing laser ablation and MyoII quantification on the same samples would provide stronger support for the proposed conclusions.

      Response: We will revise the quantitation so that it presents analysis of averages across multiple discs, rather than representative examples of single discs.

      Both the myosin imaging, and the laser ablation were done on the same genotypes (wildtype and ds) at the same ages (108 h AEL) so we think it is valid to directly compare them. Moreover, the imaging conditions for laser ablation and myo quantification are different, so it’s not feasible to do them at the same time (For ablations we do a single Z plane and a single channel (has to include Ecad, or an equivalent junctional marker) on live discs, so that fast imaging can be done. For Myo imaging we do multiple Z stacks and multiple channels (eg Ecad and Myo), which is not compatible with the fast imaging needed for analysis of laser ablations).

      Figure 6: It is unclear when Rok RNAi and Rok^CA misexpression were induced. To substantiate their claims, the authors should measure both MyoII levels and mechanical tension under the different experimental conditions in which wing shape was modified through Rok modulation (i.e. the condition shown in Fig. 7G). For comparison, fat and ds data should be added to Fig 6H. Overall, the effects of Rok modulation appear milder than those of Fat manipulation. Given that Dachs has been shown to regulate tension downstream of Fat/Ds, it would be informative to determine whether tissue tension is altered in dachs mutant wings and to assess the relative contribution of Dachs- versus MyoII-mediated tension to wing shape control. It would also be interesting to test whether Rok activation can rescue dachs loss-of-function phenotypes.

      Response: In these Rok experiments there was no separate temporal control of Rok RNAi or Rok^CA expression, they were expressed under nub-Gal4 control throughout development.

      We will add examination of myosin in combinations of ds RNAi and rok manipulation as in Fig 7G to a revised manuscript.

      Data for fat and ds comparable to that shown in Fig 6H is already presented in Fig 3D, and we don't think it's necessary to reproduce this again in Fig 6H.

      We agree that the effects of Rok manipulations are milder than those of Fat manipulations; as we try to discuss, this could be because the pattern or polarity of myosin is also important, not just the absolute level, and we will add assessment of myosin polarity.

      The suggestion to also look at dachs mutants is reasonable, and we will add this. In addition, we plan to add an "activated" Dachs (a Zyxin-Dachs fusion protein previously described in Pan et al 2013) that we anticipate will provide further evidence that the effects of Ds-Fat are mediated through Dachs. We will also add the suggested experiment combining Rok activation with dachs loss-of-function.

      Figure 7: The authors use genetic interactions to support their claim that Fat controls wing shape independently of Hippo signaling. However, these interactions do not formally exclude a role for Hippo. Moreover, previous work has shown that tissue tension regulates Hippo pathway activity, implying that any manipulation of tension could indirectly affect Hippo and growth. To provide more direct evidence, the authors should further analyze MyoII localization and tissue tension under the various experimental conditions tested (as also suggested above).

      Response: As discussed above, our data clearly show that Fat has effects independently of Hippo signaling that are crucial for its effects on wing shape, but we did not mean to imply that the regulation of Hippo signaling by Fat makes no contribution to wing shape control, and we will revise the text to make this clearer. We will also add additional analysis of myosin localization, as described above.

      Reviewer #3 (Significance (Required)): How organ growth and shape are controlled remains a fundamental question in developmental biology, with major implications for our understanding of disease mechanisms. The Drosophila wing has long served as a powerful and informative model to study tissue growth and morphogenesis. Work in this system has been instrumental in delineating the conserved molecular and mechanical processes that coordinate epithelial dynamics during development. The molecular regulators investigated by the authors are highly conserved, suggesting that the findings reported here are likely to be of broad biological relevance.

      Previous studies have proposed that anisotropic tissue growth regulates wing shape during larval development and that such anisotropy induces mechanical responses that promote MyoII localization (Legoff et al., 2013, PMID: 24046320; Mao et al., 2013, PMID: 24022370). The Ds/Fat system has also been shown to regulate tissue tension through the Dachs myosin, a known modulator of the Hippo/YAP signaling pathway. As correctly emphasized by the authors, the respective contributions of anisotropic growth and mechanical tension to wing shape control remain only partially understood. The current study aims to clarify this issue by analyzing the role of Fat/Ds in controlling MyoII localization and, consequently, wing shape. This represents a potentially valuable contribution. However, the proposed mechanistic link between Fat/Ds and MyoII localization remains insufficiently explored. Moreover, the role of MyoII is not fully discussed in the broader context of Dachs function and its known interactions with MyoII (Mao et al., 2011, PMID: 21245166; Bosveld et al., 2012, PMID: 22499807; Trinidad et al., 2024, PMID: 39708794). Most importantly, the experimental evidence supporting the authors' conclusions would benefit from further strengthening. It should also be noted that disentangling the relative contributions of anisotropic growth and MyoII polarization to tissue shape and size remains challenging, as MyoII levels are known to increase in response to anisotropic growth (Legoff et al., 2013; Mao et al., 2013), and mechanical tension itself can modulate Hippo/YAP signaling (Rauskolb et al., 2014, PMID: 24995985).

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors describe a new computational method (SegPore), which segments the raw signal from nanopore direct RNA-Seq data to improve the identification of RNA modifications. In addition to signal segmentation, SegPore includes a Gaussian Mixture Model approach to differentiate modified and unmodified bases. SegPore uses Nanopolish to define a first segmentation, which is then refined into base and transition blocks. SegPore also includes a modification prediction model that is included in the output. The authors evaluate the segmentation in comparison to Nanopolish and Tombo (RNA002) as well as f5c and Uncalled 4 (RNA004), and they evaluate the impact on m6A RNA modification detection using data with known m6A sites. In comparison to existing methods, SegPore appears to improve the ability to detect m6A, suggesting that this approach could be used to improve the analysis of direct RNA-Seq data.

      Strengths:

      SegPore addresses an important problem (signal data segmentation). By refining the signal into transition and base blocks, noise appears to be reduced, leading to improved m6A identification at the site level as well as for single read predictions. The authors provide a fully documented implementation, including a GPU version that reduces run time. The authors provide a detailed methods description, and the approach to refine segments appears to be new.

      Weaknesses:

      The authors show that SegPore reduces noise compared to other methods, however the improvement in accuracy appears to be relatively small for the task of identifying m6A. To run SegPore, the GPU version is essential, which could limit the application of this method in practice.

      As discussed in Paragraph 4 of the Discussion, we acknowledge that the improvement of SegPore combined with m6Anet over Nanopolish+m6Anet in bulk in vivo analysis is modest. This outcome is likely influenced by several factors, including alignment inaccuracies caused by pseudogenes or transcript isoforms, the presence of additional RNA modifications that can affect signal baselines, and the fact that m6Anet is specifically trained on Nanopolish-derived events. Additionally, the absence of a modification-free (in vitro transcribed) control sample in the benchmark dataset makes it challenging to establish true k-mer baselines.

      Importantly, these challenges do not exist for in vitro data, where the signal is cleaner and better defined. As a result, SegPore achieves a clear and substantial improvement at the single-molecule level, demonstrating the strength of its segmentation approach and its potential to significantly enhance downstream analyses. These results indicate that SegPore is particularly well suited for benchmarking and mechanistic studies of RNA modifications under controlled experimental conditions, and they provide a strong foundation for future developments.

      We also recognize that the current requirement for GPU acceleration may limit accessibility in some computational environments. To address this, we plan to further optimize SegPore in future versions to support efficient CPU-only execution, thereby broadening its applicability and impact.

      Reviewer #2 (Public review):

      Summary:

      The work seeks to improve detection of RNA m6A modifications using Nanopore sequencing through improvements in raw data analysis. These improvements are said to be in the segmentation of the raw data, although the work appears to position the alignment of raw data to the reference sequence and some further processing as part of the segmentation, and result statistics are mostly shown on the 'data-assigned-to-kmer' level.

      As such, the title, abstract and introduction stating the improvement of just the 'segmentation' does not seem to match the work the manuscript actually presents, as the wording seems a bit too limited for the work involved.

The work itself shows minor improvements in m6Anet when replacing Nanopolish' eventalign with this new approach, but clear improvements in the distributions of data assigned per kmer. Notably, these assignments were improved well enough to enable m6A calling from them directly, both at site-level and at read-level.

      A large part of the improvements shown appear to stem from the addition of extra, non-base/kmer specific, states in the segmentation/assignment of the raw data, removing a significant portion of what can be considered technical noise for further analysis. Previous methods enforced assignment of (almost) all raw data, forcing a technically optimal alignment that may lead to suboptimal results in downstream processing as datapoints could be assigned to neighbouring kmers instead, while random noise that is assigned to the correct kmer may also lead to errors in modification detection.

      For an optimal alignment between the raw signal and the reference sequence, this approach may yield improvements for downstream processing using other tools.

Additionally, the GMM used for calling the m6A modifications provides a useful, simple and understandable logic to explain the reason a modification was called, as opposed to the black-box models that are nowadays often employed for these types of tasks.

      Weaknesses:

The manuscript suggests the eventalign results are improved compared to Nanopolish. While this is believably shown to be true (Table 1), the effect on the use case presented, downstream differentiation between modified and unmodified status on a base/kmer, is likely limited because, during downstream modification calling, the noisy distributions are often 'good enough'. E.g. Nanopolish uses the main segmentation+alignment for a first alignment and follows up with a form of targeted local realignment/HMM test for modification calling (and for training too), decreasing the need for the near-perfect segmentation+alignment this work attempts to provide. Any tool applying a similar strategy probably largely negates the problems this manuscript aims to improve upon. Should a use-case come up where this downstream optimisation is not an option, SegPore might provide the necessary improvements in raw data alignment.

      Thank you for this thoughtful comment. We agree that many current state-of-the-art (SOTA) methods perform well on benchmark datasets, but we believe there is still substantial room for improvement. Most existing benchmarks are based on limited datasets, primarily focusing on DRACH motifs in human and mouse transcriptomes. However, m6A modifications can also occur in non-DRACH motifs, where current models tend to underperform. Furthermore, other RNA modifications, such as pseudouridine, inosine, and m5C, remain less studied, and their detection is likely to benefit from more accurate and informative signal modeling.

      It is also important to emphasize that raw signal segmentation and RNA modification detection are fundamentally distinct tasks. SegPore focuses on improving the segmentation step by producing a cleaner and more interpretable signal, which provides a stronger foundation for downstream analyses. Even if RNA modification detection algorithms such as m6Anet can partially compensate for noisy segmentation in specific cases, starting from a more accurate signal alignment can still lead to improved accuracy, robustness, and interpretability—particularly in challenging scenarios such as non-canonical motifs or less characterized modifications.

      Scientific progress in this field is often incremental, and foundational improvements can have a significant long-term impact. By enhancing raw signal segmentation, SegPore contributes an essential building block that we expect will enable the development of more accurate and generalizable RNA modification detection algorithms as the community integrates it into more advanced workflows.

      Appraisal:

The authors have shown their method's ability to identify noise in the raw signal and remove those values from the segmentation and alignment, reducing their influence on further analyses. Figures directly comparing the values per kmer do show a visibly improved assignment of raw data per kmer. As a replacement for Nanopolish' eventalign it seems to have a rather limited, though positive, effect on m6Anet results. At the single-read level of modification calling, this work does appear to improve upon CHEUI.

      Impact:

With the current developments for Nanopore based modification calling largely focusing on Artificial Intelligence, Neural Networks and the likes, improvements made in interpretable approaches provide an important alternative that enables deeper understanding of the data rather than providing a tool that plainly answers the question of whether a base is modified or not, without further explanation. The work presented is best viewed in context of a workflow where one aims to get an optimal alignment between raw signal data and the reference base sequence for further processing. For example, as presented, as a possible replacement for Nanopolish' eventalign. Here it might enable data exploration and downstream modification calling without the need for local realignments or other approaches that re-consider the distribution of raw data around the target motif, such as a 'local' Hidden Markov Model or Neural Networks. These possibilities are useful for a deeper understanding of the data and further tool development for modification detection work beyond m6A calling.

      Reviewer #3 (Public review):

      Summary:

      Nucleotide modifications are important regulators of biological function, however, until recently, their study has been limited by the availability of appropriate analytical methods. Oxford Nanopore direct RNA sequencing preserves nucleotide modifications, permitting their study, however many different nucleotide modifications lack an available base-caller to accurately identify them. Furthermore, existing tools are computationally intensive, and their results can be difficult to interpret.

      Cheng et al. present SegPore, a method designed to improve the segmentation of direct RNA sequencing data and boost the accuracy of modified base detection.

      Strengths:

      This method is well described and has been benchmarked against a range of publicly available base callers that have been designed to detect modified nucleotides.

      Weaknesses:

      However, the manuscript has a significant drawback in its current version. The most recent nanopore RNA base callers can distinguish between different ribonucleotide modifications, however, SegPore has not been benchmarked against these models.

      The manuscript would be strengthened by benchmarking against the rna004_130bps_hac@v5.1.0 and rna004_130bps_sup@v5.1.0 dorado models, which are reported to detect m5C, m6A_DRACH, inosine_m6A and PseU.

      A clear demonstration that SegPore also outperforms the newer RNA base caller models will confirm the utility of this method.

      Thank you for highlighting this important limitation. While Dorado, the new ONT basecaller, is publicly available and supports modification-aware basecalling, suitable public datasets for benchmarking m5C, inosine, m6A, and PseU detection on RNA004 are currently lacking. Dorado’s modification-aware models are trained on ONT’s internal data, which is not publicly released. Therefore, it is currently not feasible to directly evaluate or compare SegPore’s performance against Dorado for these RNA modifications.

      We would also like to emphasize that SegPore’s primary contribution lies in raw signal segmentation, which is an upstream and foundational step in the RNA modification detection pipeline. As more publicly available datasets for RNA004 modification detection become accessible, we plan to extend our work to benchmark and integrate SegPore with modification detection tasks on RNA004 data in future studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Comments based on Author Response

      “However, it is valid to compare them on the segmentation task, where SegPore exhibits better performance (Table 1).”

      This dodges the point of the actual use case of this approach, as Nanopolish indeed does not support calling modifications for this kind of data, but the general approach it uses might, if adapted for this data, nullify the gains made in the examples presented.

      We respectfully disagree with the comment that the advantages demonstrated by SegPore could be “nullified”. Although SegPore’s performance is indeed more modest in in vivo datasets, it shows substantially better performance than CHEUI in in vitro data, clearly demonstrating that improved segmentation directly contributes to more accurate RNA modification estimation.

      It is worth noting that CHEUI relies on Nanopolish’s segmentation results for m6A detection. Despite this, SegPore outperforms CHEUI, further supporting the conclusion that segmentation quality has a meaningful impact on downstream modification calling.

      In conclusion, based on our current experimental results, SegPore is particularly well suited for RNA modification analysis from in vitro transcribed data, where its improved segmentation provides a clear advantage over existing methods.

      Further comments

      (2) “(2) Page 3  employ models like Hidden Markov Models (HMM) to segment the signal, but they are prone to noise and inaccuracies”

      “That's the alignment/calling part, not the segmentation?”

      “Current methods, such as Nanopolish, employ models like Hidden Markov Models (HMM) to segment the signal”

      I get the impression the word 'segment' has a different meaning in this work than what I'm used to based on my knowledge around Nanopolish and Tombo, see the deeper code examples further down below.

      Additionally, in Nanopolish there is a clear segmentation step (or event detection) without any HMM, then a sort of dynamic timewarping step that aligns the segments and re-combines some segments into a single segment where necessary afterwards. I believe the HMM in Nanopolish is not used at all unless modification calling, but if you can point out otherwise I'm open for proof.

      Now I believe it is the meaning of 'segmenting the signal' that confuses me, and now the clarification makes it a bit odd as well:

      “Nanopolish and Tombo align the raw signal to the reference sequence to determine which portion of the signal corresponds to each k-mer. We define this process as the segmentation task, referred to as "eventalign" in Nanopolish.”

      So now it's clearly stated the raw signal is being 'aligned' and then the process is suddenly defined as the 'segmentation task', and again referred to as "eventalign". Why is it not referred to as the 'alignment task' instead?

      I understand the segmentation and alignment parts are closely connected but to me, it seems this work picks the wrong word for the problem being solved.

      “Unlike Nanopolish and Tombo, which directly align the raw signal to the reference sequence,…”

      Looking at their code, I believe both Nanopolish and Tombo actually do segment the data first (or "event detection"), then they align the segments/events they found, and finally multiple events aligned to the same section are merged. See for yourself:

      Nanopolish:

https://github.com/jts/nanopolish/blob/master/src/nanopolish_squiggle_read.cpp

Line 233:

```cpp
trim_and_segment_raw(fast5_data.rt, trim_start, trim_end, varseg_chunk, varseg_thresh);

event_table et = detect_events(fast5_data.rt, *ed_params);
```

      Line 270:

```cpp
// align events to the basecalled read
std::vector event_alignment = adaptive_banded_simple_event_align(*this, *this->base_model[strand_idx], read_sequence);
```

      Where event detection is further defined at line 268 here:

      https://github.com/jts/nanopolish/blob/master/src/thirdparty/scrappie/event_detection.c

      Tombo:

      https://github.com/nanoporetech/tombo/blob/master/tombo/resquiggle.py

      line 1162 and onwards shows a ‘segment_signal’ call and the results are used in a ‘find_adaptive_base_assignment’ call, where ‘segment_signal’ starting at line 1057 tries to find where the signal jumps from a series of similar values to another (start of a base change in the pore), stored in ‘valid_cpts’, and the ‘find_adaptive_base_assignment’ tries to align the resulting segment values to the expected series of values:

```python
valid_cpts, norm_signal, new_scale_values = segment_signal(
    map_res, num_events, rsqgl_params, outlier_thresh, const_scale)
event_means = ts.compute_base_means(norm_signal, valid_cpts)
dp_res = find_adaptive_base_assignment(
    valid_cpts, event_means, rsqgl_params, std_ref, map_res.genome_seq,
    start_clip_bases=map_res.start_clip_bases,
    seq_samp_type=seq_samp_type, reg_id=map_res.align_info.ID)
```

      These implementations are also why I find the choice of words for what is segmentation and what is alignment a bit confusing in this work, as both Tombo and Nanopolish do a similar, clear segmentation step (or an "event detection" step), followed by the alignment of the segments they determined. The terminology in this work appears to deviate from these.

      We thank the reviewer for the detailed comments!

      First of all, we sincerely apologize for our earlier misunderstanding regarding how Nanopolish and Tombo operate. Based on a closer examination of their source codes, we now recognize that both tools indeed include a segmentation step based on change-point detection methods, after which the resulting segments are aligned to the reference sequence. We have revised the relevant text in the manuscript accordingly:

      - “Current methods, such as Nanopolish, employ change-point detection methods to segment the signal and use dynamic programming methods and HMM to align the derived segments to the reference sequence,”

      - “We define this process as the segmentation and alignment task (abbreviated as the segmentation task), which is referred to as “eventalign” in Nanopolish.”

      - “In SegPore, we segment the raw signal into small fragments using a Hierarchical Hidden Markov Model (HHMM) and align the mean values of these fragments to the reference, where each fragment corresponds to a sub-state of a k-mer. By contrast, Nanopolish and Tombo use change-point–based methods to segment the signal and employ dynamic programming approaches together with profile HMMs to align the resulting segments to the reference sequence.”

      Regarding terminology, we originally borrowed the term “segmentation” from speech processing, where it refers to dividing continuous audio signals into meaningful units. In the context of nanopore signal analysis, segmentation and alignment are often tightly coupled steps. Because of this and because our initial focus was on methodological development rather than terminology, we used the term “segmentation task” to describe the combined process of signal segmentation and alignment.

      However, we now recognize that this terminology may cause confusion. Changing every instance of “segmentation” to “segmentation and alignment” or “alignment” would require substantial rewriting of the manuscript. Therefore, in this revision, we have clearly defined “segmentation task” as referring to the combined process of segmentation and alignment. We apologize for any earlier confusion and will adopt the term “alignment” in future work for greater clarity.

      (3) I think I do understand the meaning, but I do not understand the relevance of the Aj bit in the last sentence. What is it used for?

Based on the response and another close look at Fig1, it turns out the j refers to extremely small numbers 1 and 2 in step 3. You may want to improve readability for these.

      Thank you for the suggestion. We have added subscripts to all nucleotides in the reference sequence in Figure 1A and revised the legend to clarify the notation and improve readability. Specifically, we now include the following explanation:

      “For example, A<sub>j</sub> denotes the base ‘A’ at the j-th position on the reference sequence. In this example, A<sub>1</sub> and A<sub>2</sub> refer to the first and second occurrences of ‘A’ in the reference sequence, respectively. Accordingly, μ<sub>1</sub> and μ<sub>2</sub> are aligned to A<sub>1</sub>, while μ<sub>3</sub> is aligned to A<sub>2</sub>”.

      (6) “We chose to use the poly(A) tail for normalization because it is sequence-invariant- i.e., all poly(A) tails consist of identical k-mers, unlike transcript sequences which vary in composition. In contrast, using the transcript region for normalization can introduce biases: for instance, reads with more diverse k-mers (having inherently broader signal distributions) would be forced to match the variance of reads with more uniform k-mers, potentially distorting the baseline across k-mers.”

      While the next part states there was a benchmark showing SegPore still works without this normalization, I think this answer does not touch upon the underlying issue I'm trying to point out here.

      - The biases mentioned here due to a more diverse (or different) subsets of k-mers in a read indeed affects the variance of the signal overall.

- As I pointed out in my earlier remark here, this can be resolved using an approach of 'general normalization', 'mapping to expected signal', 'theil-sen fitting of scale and offset', 're-mapping to expected signal', as Tombo and Nanopolish have implemented.

- Alternatively, one could use the reference sequence (using the read mapping information) and base the expected signal mean and standard deviation on that instead.

      - The polyA tail stability as an indicator for the variation in the rest of the signal seems a questionable assumption to me. A 'noisy' pore could introduce a large standard deviation using the polyA tail without increasing the deviations on the signal induced by the variety of k-mers, rather it would be representative for the deviations measured within a single k-mer segment. I thought this possible discrepancy is to be expected from a worn out pore, hence I'd imagine reads sequenced later in a run to provide worse results using this method.

      In the current version it is not the statement that is unclear, it is the underlying assumption of how this works that I question.

      We thank the reviewer for raising this important point and for the insightful discussion. Our choice of using the poly(A) tail for normalization is based on the working hypothesis that the poly(A) signal reflects overall pore-level variability and provides a stable reference for signal scaling. We find this to be a practical and effective approach in most experimental settings.

      We agree that more sophisticated strategies, such as “general normalization” or iterative fitting to the expected signal (as implemented in Tombo and Nanopolish), could in principle generate a "better" normalization. However, these approaches are significantly more challenging to implement in practice. This is because signal normalization and alignment are mutually dependent processes: baseline estimates for k-mers influence alignment accuracy, while alignment accuracy, in turn, affects baseline calculation. This interdependence becomes even more complex in the presence of RNA modifications, which alter signal distributions and further confound model fitting.

      It is worth noting that this limitation is already evident in our results. As shown in Figure 4B (first and second k-mers), Nanopolish produces more dispersed baselines than SegPore, even for these unmodified k-mers, suggesting inherent limitations in its normalization strategy. Ideally, baselines for the same k-mer should remain highly consistent across different reads.

      In contrast, poly(A)-based normalization offers a simpler and more robust solution that avoids this circular dependency. Because poly(A) sequences are compositionally homogeneous, they enable reliable estimation of scaling parameters without assumptions about k-mer composition or modification state. Regarding the reviewer’s concern about pore instability, we mitigate this issue by including only high-quality, confidently mapped reads in our analysis, which reduces the likelihood of incorporating signals from degraded or “noisy” pores.

      We fully agree that exploring more advanced normalization strategies is an important direction for future work, and we plan to investigate such approaches as the field progresses.
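As a rough illustration of the working hypothesis above, the following Python sketch standardizes a read's raw signal using the mean and standard deviation of its poly(A) segment. The function name and toy signal values are hypothetical and do not reproduce SegPore's actual preprocessing code:

```python
import numpy as np

def polya_standardize(signal, polya_start, polya_end):
    """Standardize a read's raw signal using its poly(A) tail segment.

    Illustrative sketch: because the poly(A) region is compositionally
    homogeneous, its mean/std serve as read-level offset and scale estimates.
    """
    tail = signal[polya_start:polya_end]
    mu, sigma = tail.mean(), tail.std()
    return (signal - mu) / sigma

# Toy read: a noisy poly(A)-like plateau followed by a transcript region.
rng = np.random.default_rng(0)
read = np.concatenate([
    rng.normal(108.0, 2.0, 500),   # poly(A) tail
    rng.normal(95.0, 6.0, 2000),   # transcript region
])
norm = polya_standardize(read, 0, 500)
# After standardization the poly(A) segment is centered at 0 with unit std.
```

The key design choice this sketch reflects is that the scaling parameters come only from the sequence-invariant tail, so no k-mer baseline or alignment is needed before normalization.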

      (8) “In the remainder of this paper, we refer to these resulting events as the output of eventalign analysis or the segmentation task.”

      Picking only one descriptor rather than two alternatives would be easier to follow (and I'd prefer the first).

      Thank you for the suggestion. We have revised the sentence to:

      “In the remainder of this paper, we refer to these resulting events as the output of eventalign analysis, which also represents the final output of the segmentation and alignment task.”

      (9) “Additionally, a complete explanation of how the weighted mean is computed is provided in Section 5.3 of Supplementary Note 1. It is derived from signal points that are assigned to a given 5mer.”

      I believe there's no more mention of a weighted mean, and I don't get any hits when searching for 'weight'. Is that intentional?

      We apologize for the misplacement of the formulas. We have updated Section 5.3 of Supplementary Note 1 to clarify the definition of the weighted mean. Because multiple current signal segments may be aligned to a single k-mer, we computed the weighted mean for each k-mer across these segments, where the weight corresponds to the number of data points assigned to “curr” state in each event.
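The computation described above can be sketched in a few lines of Python; the helper name and the event tuples are hypothetical, for illustration only:

```python
def kmer_weighted_mean(events):
    """Weighted mean of event means aligned to one 5-mer.

    `events` is a list of (event_mean, n_curr_points) pairs; the weight is
    the number of signal points assigned to the 'curr' state of each event.
    Hypothetical helper illustrating the computation, not SegPore's code.
    """
    total = sum(n for _, n in events)
    return sum(m * n for m, n in events) / total

# Three events aligned to the same 5-mer:
wm = kmer_weighted_mean([(100.0, 10), (102.0, 30), (98.0, 10)])
# (100*10 + 102*30 + 98*10) / 50 = 100.8
```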

(17) Response: We revised the sentence to clarify the selection criteria: "For selected 5mers that exhibit both a clearly unmodified and a clearly modified signal component, SegPore reports the modification rate at each site, as well as the modification state of that site on individual reads."

      So is this the same set described on page 13 ln 343 or not?

      “Due to the differences between human (Supplementary Fig. S2A) and mouse (Supplementary Fig. S2B), only six 5mers were found to have m6A annotations in the test data's ground truth (Supplementary Fig. S2C). For a genomic location to be identified as a true m6A modification site, it had to correspond to one of these six common 5mers and have a read coverage of greater than 20.”

      I struggle to interpret the 'For selected 5mers' part, as I'm not sure if this is a selection I'm supposed to already know at this point in the text or if it's a set just introduced here. If the latter, removing the word 'selected' would clear it up for me.

      We apologize for the confusion. What we mean is that when pooling signals aligned to the same k-mer across different genomic locations and reads, only a subset of k-mers exhibit a bimodal distribution — one peak corresponding to the unmodified state and another to the modified state. Other k-mers show a unimodal distribution, making it impossible to reliably estimate modification levels. We refer to the subset of k-mers that display a bimodal distribution as the “selected” k-mers.

The “selected k-mers” described on page 13, line 343, must additionally have ground truth labels available in both the training and test datasets. There are 10 k-mers with ground truth annotations in the training data and 11 in the test data, and only 6 of these k-mers are shared between the two datasets; therefore, only those 6 overlapping k-mers are retained for evaluation. These 6 k-mers satisfy both criteria: (1) exhibiting a bimodal distribution and (2) having ground truth annotations in both training and test sets.

      To improve clarity, we have removed the term “selected” from the sentence.
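To illustrate the bimodality criterion described above, here is a minimal Python sketch that fits a two-component 1D Gaussian mixture with plain EM and recovers the two signal baselines. The function name, the toy baselines (~85 and ~92 pA), and the EM details are illustrative assumptions, not SegPore's actual GMM implementation:

```python
import numpy as np

def fit_gmm2(x, n_iter=200):
    """Fit a two-component 1D Gaussian mixture with plain EM (minimal sketch)."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [25.0, 75.0])       # spread-out initial means
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and standard deviations
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-8
    return pi, mu, sigma

# A k-mer pooling unmodified (~85 pA) and m6A-shifted (~92 pA) signals
# across reads and genomic locations:
rng = np.random.default_rng(1)
pooled = np.concatenate([rng.normal(85.0, 1.5, 400), rng.normal(92.0, 1.5, 400)])
pi, mu, sigma = fit_gmm2(pooled)
# When the pooled distribution is bimodal, the two fitted means separate
# cleanly into an unmodified and a modified baseline.
```

A k-mer whose fitted components collapse onto nearly identical means (a unimodal pool) would fail the selection criterion, which is the behaviour described for the non-selected k-mers above.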

(21) "Tombo used the "resquiggle" method to segment the raw signals, and we standardized the segments using the poly(A) tail to ensure a fair comparison (See preprocessing section in Materials and Methods)."

      In the Materials and Methods:

      “The raw signal segment corresponding to the poly(A) tail is used to standardize the raw signal for each read.”

      I cannot find more detailed information here on what the standardization does, do you mean to refer to Supplementary Note 1, Section 3 perhaps?

      Thank you for pointing this out. Yes, the standardization procedure is described in detail in Supplementary Note 1, Section 3. Tombo itself does not segment and align the raw signal on the absolute pA scale, which can result in very large variance in the derived events if the raw signal is used directly. To ensure a fair comparison, we therefore applied the same preprocessing steps to Tombo’s raw signals as we did for SegPore, using only the event boundary information from Tombo while standardizing the signal in the same way.

      We have revised the sentence for clarity as follows:

      “Tombo used the "resquiggle" method to segment the raw signals, but the resulting signals are not reported on the absolute pA scale. To ensure a fair comparison with SegPore, we standardized the segments using the poly(A) tail in the same way as SegPore (See preprocessing section in Materials and Methods).”

(22A) The table shown does help showing the benchmark is unlikely to be 'cheated'. However I am surprised to see the Avg std for Nanopolish and Tombo going up instead of down, as I'd expect the transition values to increase the std, and hence, removing them should decrease these values. So why does this table show the opposite?

      I believe this table is not in the main text or the supplement, would it not be a good idea to cover this point somewhere in the work?

      Thank you for this insightful comment. In response, we carefully re-examined our analysis and identified a bug in the code related to boundary removal for Nanopolish. We have now corrected this issue and included the updated results in Supplementary Table S1 of the revised manuscript. As shown in the updated table, the average standard deviations decrease after removing the boundary regions for both Nanopolish and Tombo.

      We have now included this table in Supplementary Table S1 in the revised manuscript and added the following clarification:

      “It is worth noting that the data points corresponding to the transition state between two consecutive 5-mers are not included in the calculation of the standard deviation in SegPore’s results in Table 1. However, their exclusion does not affect the overall conclusion, as there are on average only ~6 points per 5-mer in the transition state (see Supplementary Table S1 for more details).”
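A minimal Python sketch of this bookkeeping, i.e. computing a 5-mer's standard deviation with and without the transition-state points; the 'curr'/'trans' labels and the helper name are illustrative assumptions, not SegPore's actual data structures:

```python
import numpy as np

def kmer_std(points, states, exclude_transition=True):
    """Std of signal points assigned to a 5-mer, optionally dropping the
    transition-state points between consecutive 5-mers (illustrative)."""
    points = np.asarray(points, dtype=float)
    states = np.asarray(states)
    if exclude_transition:
        points = points[states == "curr"]
    return float(points.std())

# Base-state points cluster tightly; transition points are more dispersed.
pts = [100.1, 99.9, 100.0, 100.2, 94.0, 106.0]
states = ["curr"] * 4 + ["trans"] * 2
```

Dropping the few transition points lowers the per-5-mer standard deviation, matching the direction of change reported in the corrected Supplementary Table S1.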

      (22B) As mentioned in 2), I'm happy there's a clear definition of what is meant but I found the chosen word a bit odd.

      We apologize for the earlier unclear terminology. We now refer to it as the segmentation and alignment task, abbreviated as the segmentation task.

      (23) Reading back I can gather that from the text earlier, but the summation of what is being tested is this:

“including Tombo, MINES (31), Nanom6A (32), m6Anet, Epinano (33), and CHEUI (20).”

      next, the identifier "Nanopolish+m6Anet" is, aside from the figure itself, only mentioned in the discussion. Adding a line that explains that "Nanopolish+m6Anet" is the default method of running m6Anet and "SegPore+m6Anet" replaces the Nanopolish part for m6Anet with Segpore, rather than jumping straight to "SegPore+m6Anet", would clarify where this identifier came from.

      Thank you for the helpful suggestion. We have added the identifier to the revised manuscript as follows:

      “Given their comparable methodologies and input data requirements, we benchmarked SegPore against several baseline tools, including Tombo, MINES (31), Nanom6A (32), m6Anet, Epinano (33), and CHEUI (20). By default, MINES and Nanom6A use eventalign results generated by Tombo, while m6Anet, Epinano, and CHEUI rely on eventalign results produced by Nanopolish. In Fig. 3C, ‘Nanopolish+m6Anet’ refers to the default m6Anet pipeline, whereas ‘SegPore+m6Anet’ denotes a configuration in which Nanopolish’s eventalign results are replaced with those from SegPore.”

      (24) For completeness I'd expect tickmarks and values on the y-axis as well.

      Thank you for the suggestion. We have updated Figures 3A and 3B in the revised manuscript to include tick marks and values on the y-axis as requested.

      (25) Considering this statement and looking back at figure 3a and 3b, wouldn't this be easier to observe if the histograms/KDE's were plotted with overlap in a single figure?

      We appreciate the suggestion. However, we believe that overlaying Figures 3A and 3B into a single panel would make the visualization cluttered and more difficult to interpret.

      (29) Please change the sentence in the text to make that clear. As it is written now (while it's the same number of motifs, so one might guess it) it does not seem to refer to that particular set of motifs and could be a new selection of 6 motifs.

      We appreciate the suggestion and have revised the sentence for clarity as follows:

      “We evaluated m6A predictions using two approaches: (1) SegPore’s segmentation results were fed into m6Anet, referred to as SegPore+m6Anet, which works for all DRACH motifs and (2) direct m6A predictions from SegPore’s Gaussian Mixture Model (GMM), which is limited to the six selected 5-mers shown in Supplementary Fig. S2C that exhibit clearly separable modified and unmodified components in the GMM (see Materials and Methods for details). ”

      (31) I think we have a different interpretation of the word 'leverage', or perhaps what it applies to. I'd say it leverages the jiggling if there's new information drawn from the jiggling behaviour. It's taking it into account if it filters for it. The HHMM as far as I understand tries to identify the jiggles, and ignore their values for the segmentation etc. So while one might see this as an approach that "leverages the hypothesis", I don't see how this HHMM "leverages the jiggling property" itself.

      Thank you for the helpful suggestion. We have replaced the word “leverages” with “models” in the revised manuscript.

      New points

      pg6ln166: “…we extract the aligned raw signal segment and reference sequence segment from Nanopolish's events [...] we extract the raw signal segment corresponding to the transcript region for each input read based on Nanopolish's poly(A) detection results.”

      It is not clear as to why this different approach is applied for these two cases in this part of the text.

      Thank you for pointing this out. The two approaches refer to different preprocessing strategies for in vivo and in vitro data.

      For in vivo data, a large proportion of reads do not span the full-length transcript and often map only to a portion of the reference sequence. Moreover, because a single gene can generate multiple transcript isoforms, a read may align equally well to several possible transcripts. Therefore, we extract only the raw signal segment that corresponds to the mapped portion of the transcript for each read.

      In contrast, for in vitro data, the transcript sequence is known precisely. As a result, we can directly extract all raw signals following the poly(A) tail and align them to the complete reference sequence.

      pg10ln259: An important distinction from classical global alignment algorithms is that one or multiple base blocks may align with a single 5mer.”

      If there was usually a 1:1 mapping the alignment algorithm would be more or less a direct match, so I think the multiple blocks aligning to a 5mer thing is actually quite common.

      Thank you for the comment. The “classical global alignment algorithm” here refers to the Needleman–Wunsch algorithm used for sequence alignment. Our intention was to highlight the conceptual difference between traditional sequence alignment and nanopore signal alignment. In classical sequence alignment, each base typically aligns to a single position in the reference. In contrast, in nanopore signal alignment, one or multiple signal segments — corresponding to varying dwell times of the motor protein — can align to a single 5-mer.

      We have revised the sentence as follows:

      “An important distinction from classical global alignment algorithms (Needleman–Wunsch algorithm)……”
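      As a toy contrast (our own illustration, not SegPore's alignment code): in classical sequence alignment each base maps to at most one reference position, whereas in nanopore signal alignment several consecutive segments may map to the same 5-mer, so an event is formed by merging them:

      ```python
      def merge_segments_per_kmer(assignments, segment_means):
          """Merge consecutive signal segments assigned to the same 5-mer
          index into one event, averaging their mean current levels (toy
          illustration of the many-to-one segment-to-5-mer mapping)."""
          events = []  # entries: (kmer_index, running_mean, n_segments)
          for idx, mean in zip(assignments, segment_means):
              if events and events[-1][0] == idx:
                  k, m, n = events[-1]
                  events[-1] = (k, (m * n + mean) / (n + 1), n + 1)
              else:
                  events.append((idx, mean, 1))
          return [(k, m) for k, m, _ in events]
      ```

      For example, segments assigned `[0, 0, 1, 1, 1, 2]` collapse into three events, one per distinct 5-mer, mirroring how dwell-time variation produces multiple base blocks per reference position.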

      pg13ln356: "dwell time" is not defined or used before, I guess it's effectively the number of raw samples per segment but this should be clarified.

      Thank you for pointing this out. We have now added a clear definition of dwell time in the text as follows:

      "such as the normalized mean μ_i, standard deviation σ_i, dwell time l_i (number of data points in the event)."

      pg13ln358: “Feature vectors from 80% of the genomic locations were used for training, while the remaining 20% were set aside for validation.”

      I assume these are selected randomly but this is not explicitly stated here and should be.

      Yes, they are randomly selected. We have revised the sentence as follows:

      “Feature vectors from a randomly selected 80% of the genomic locations were used for training, while the remaining 20% were set aside for validation.”

      pg18ln488: The manuscript now evaluates RNA004 and compares against f5c and Uncalled4. It mentions the differences between RNA004 and RNA002, namely kmer size and current levels, but does not explain where the starting reference model values for the RNA004 model come from: In pg18ln492 they state "RNA004 provides reference values for 9mers", then later they seem to use a 5mer parameter table (pg19ln508), are they re-using the same table from RNA002 or did they create a 5mer table from the 9mer reference table?

      We apologize for the confusion. The reference model table for RNA004 9-mers is obtained from f5c (the array named ‘rna004_130bps_u_to_t_rna_9mer_template_model_builtin_data’ in https://raw.githubusercontent.com/hasindu2008/f5c/refs/heads/master/src/model.h).

      Author response image 1.

      We have revised the subsection header “5-mer parameter table” in the Methods to “5-mer & 9-mer parameter table” to highlight this and added a paragraph about how to obtain the 9-mer parameter table:

      “In the RNA004 data analysis (Table 2), we obtained the 9-mer parameter table from the source code of f5c (version 1.5). Specifically, we used the array named ‘rna004_130bps_u_to_t_rna_9mer_template_model_builtin_data’ from the following file: https://raw.githubusercontent.com/hasindu2008/f5c/refs/heads/master/src/model.h (accessed on 17 October 2025).”

      Also, on page 18, line 195, we added the following sentence:

      “The 9-mer parameter table in pA scale for RNA004 data provided by f5c (see Materials and Methods) was used in the analysis.”

      pg19ln520: “Additionally, due to the differences of the k-mer motifs between human and mouse (Supplementary Fig. S2), six shared 5mers were selected to demonstrate SegPore's performance in modification prediction directly.”

      "the differences" - in occurrence rates, as I gather from the supplementary figure, but it would be good to explicitly state it in this sentence itself too.

      Thank you for the helpful suggestion. We agree that the original sentence was vague. The main reason for selecting only six 5-mers is the difference in the availability of ground truth labels for specific k-mer motifs between human and mouse datasets. We have revised the sentence accordingly:

      “Additionally, due to the differences in the availability of ground truth labels for specific k-mer motifs between human and mouse (Supplementary Fig. S2), six shared 5-mers were selected to directly demonstrate SegPore’s performance in modification prediction.”

      pg24ln654: “SegPore codes current intensity levels”

      "codes" is meant to be "stores" I guess? Perhaps "encodes"?

      Thank you for the suggestion. We have now replaced it with “encodes” in the revised manuscript.

      Lastly, looking at the feedback from the other reviewers comment:

      The 'HMM' mentioned in line 184 looks fine to me; the HHMM is two HMMs in a hierarchical setup and the text now refers to one of these HMM layers. If this is to be changed it would need to state the layer (e.g. "the outer HHMM layer") throughout the text instead.

      We agree with this assessment and believe that the term “inner HMM” is accurate in this context, as it correctly refers to one of the two HMM layers within the HHMM structure. Therefore, we have decided to retain the current terminology.

      Reviewer #3 (Recommendations for the authors):

      I recommend the publication of this manuscript, provided that the following comments are addressed.

      Page 5, Preprocessing: You comment that the poly(A) tail provides a stable reference that is crucial for the normalisation of all reads. How would this step handle reads that have interrupted poly(A) tails (e.g. in the case of mRNA vaccines that employ a linker sequence)? Or cell types that express TENT4A/B, which can include transcripts with non-A residues in the poly(A) tail: https://www.science.org/doi/full/10.1126/science.aam5794.

      It depends on Nanopolish’s ability to reliably detect the poly(A) tail. In general, the poly(A) region produces a long stretch of signals fluctuating around a current level of ~108.9 pA (RNA002) with relatively stable variation, which allows it to be identified and used for normalization.

      For in vivo data, if the poly(A) tail is interrupted (e.g., due to non-A residues or linker sequences), two scenarios are possible:

      (1) The poly(A) tail may not be reliably detected, in which case the corresponding read will be excluded from our analysis.

      (2) Alternatively, Nanopolish may still recognize the initial uninterrupted portion of the poly(A) signal, which is typically sufficient in length and stability to be used for signal normalization.

      For in vitro data, the poly(A) tails are uninterrupted, so this issue does not arise.

      All analyses presented in this study are based exclusively on reads with reliably detected poly(A) tails.

      Page 7, 5mer parameter table: r9.4_180mv_70bps_5mer_RNA is an older kmer model (>2 years). How does your method perform with the newer RNA kmer models that do permit the detection of multiple ribonucleotide modifications? Addressing this comment would be beneficial, however I understand that it would require the generation of new data, as limited RNA004 datasets are available in the public domain.

      “r9.4_180mv_70bps_5mer_RNA” is the most widely used k-mer model for RNA002 data. Regarding the newer k-mer models, we believe the reviewer is referring to the “modification basecalling” models available in Dorado, which are specifically designed for RNA004 data. At present, SegPore can perform RNA modification estimation only on RNA002 data, as this is the platform for which suitable training data and ground truth annotations are available. Evaluating SegPore’s performance with the newer RNA004 modification models would require new datasets containing known modification sites generated with RNA004 chemistry. Since such data are currently unavailable, we have not yet been able to assess SegPore under these conditions. This represents an important future direction for extending and validating our method.

      The Methods and Results sections contain redundant information -please streamline the information in these sections and reduce the redundancy.

      We thank the reviewer for this suggestion and acknowledge that there is some overlap between the Methods and Results sections. However, we feel that removing these parts could compromise the clarity and readability of the manuscript, especially given that Reviewer 2 emphasized the need for clearer explanations. We therefore decided to retain certain methodological descriptions in the Results section to ensure that key steps are understandable without requiring the reader to constantly cross-reference the Methods.

      Minor comments

      Please be consistent when referring to k-mers and 5-mers (sometimes denoted as 5mers - please change to 5-mers throughout).

      We have revised the manuscript to ensure consistency and now use “5-mers” throughout the text.

      Introduction

      Lines 80 - 112: Please condense this section to roughly half the length (1-2 paragraphs). In general, the results described in the introduction should be very brief, as they are described in full in the results section.

      Thank you for the suggestion. We have condensed the original three paragraphs into a single, more concise paragraph as follows:

      "SegPore is a novel tool for direct RNA sequencing (DRS) signal segmentation and alignment, designed to overcome key limitations of existing approaches. By explicitly modeling motor protein dynamics during RNA translocation with a Hierarchical Hidden Markov Model (HHMM), SegPore segments the raw signal into small, biologically meaningful fragments, each corresponding to a k-mer sub-state, which substantially reduces noise and improves segmentation accuracy. After segmentation, these fragments are aligned to the reference sequence and concatenated into larger events, analogous to Nanopolish’s “eventalign” output, which serve as the foundation for downstream analyses. Moreover, the “eventalign” results produced by SegPore enhance interpretability in RNA modification estimation. While deep learning–based tools such as m6Anet classify RNA modifications using complex, non-transparent features (see Supplementary Fig. S5), SegPore employs a simple Gaussian Mixture Model (GMM) to distinguish modified from unmodified nucleotides based on baseline current levels. This transparent modeling approach improves confidence in the predictions and makes SegPore particularly well-suited for biological applications where interpretability is essential."

      Line 104: Please change "normal adenosine" to "adenosine".

      We have revised the manuscript as requested and replaced all instances of “normal adenosine” with “adenosine” throughout the text.

      Materials and Methods

      Line 176: Please reword "...we standardize the raw current signals across reads, ensuring that the mean and standard deviation of the poly(A) tail are consistent across all reads." To "...we standardize the raw current signals for each read, ensuring that the mean and standard deviation are consistent across the poly(A) tail region."

      We have changed sentence as requested.

      “Since the poly(A) tail provides a stable reference, we standardize the raw current signals for each read, ensuring that the mean and standard deviation are consistent across the poly(A) tail region.”
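      The standardization step can be sketched as follows; the function name and interface are illustrative, not SegPore's actual API:

      ```python
      import numpy as np

      def normalize_read(raw_signal, polya_start, polya_end):
          """Standardize one read's raw current signal using its poly(A) tail.

          The poly(A) segment acts as a stable per-read reference: after
          this step, the poly(A) region of every read has mean 0 and
          standard deviation 1, making current levels comparable across reads.
          """
          tail = raw_signal[polya_start:polya_end]
          return (raw_signal - tail.mean()) / tail.std()
      ```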

      Line 182: Please describe the RNA translocation hypothesis, as this is the first mention of it in the text. Also, why is the Hierarchical Hidden Markov model perfect for addressing the RNA translocation hypothesis? Explain more about how the HHMM works and why it is a suitable choice.

      We have revised the sentence as requested:

      “The RNA translocation hypothesis (see details in the first section of Results) naturally leads to the use of a hierarchical Hidden Markov Model (HHMM) to segment the raw current signal.”

      The motivation of the HHMM is explained in detail in the first section “RNA translocation hypothesis” of Results. As illustrated in Figure 2, the sequencing data suggest that RNA molecules may translocate back and forth (often referred to as jiggling) while passing through the nanopore. This behavior results in complex current fluctuations that are challenging to model with a simple HMM. The HHMM provides a natural framework to address this because it can model signal dynamics at two levels. The outer HMM distinguishes between two major states — base states (where the signal corresponds to a stable sub-state of a k-mer) and transition states (representing transitions from one base state to the next). Within each base state, an inner HMM models finer signal variation using three states — “curr”, “prev”, and “next” — corresponding to the current k-mer sub-state and the sub-states of its neighboring k-mers. This hierarchical structure captures both the stable signal patterns and the stochastic translocation behavior, enabling more accurate and biologically meaningful segmentation of the raw current signal.
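      As a toy illustration of this two-level layout (the state names below are ours, chosen for readability; they are not SegPore's internal identifiers):

      ```python
      # Outer HMM: alternates between a base state (stable k-mer sub-state
      # signal) and a transition state (move to the next k-mer).
      OUTER_STATES = ("base", "transition")

      # Inner HMM: refines each base state into three sub-states reflecting
      # back-and-forth translocation ("jiggling") around the current k-mer.
      INNER_STATES = ("prev", "curr", "next")

      def full_state_space():
          """Enumerate the combined HHMM state space: every 'base' outer
          state expands into the three inner sub-states, while the
          'transition' state stays atomic."""
          states = []
          for outer in OUTER_STATES:
              if outer == "base":
                  states += [f"base/{s}" for s in INNER_STATES]
              else:
                  states.append(outer)
          return states
      ```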

      Line 184: do you mean HHMM? Please be consistent throughout the text.

      As explained in the previous response, the HHMM consists of two layers: an outer HMM and an inner HMM. The term “HMM” in line 184 is meant to be read together with “inner” at the end of line 183, forming the phrase “inner HMM.” It seems the reviewer may have overlooked this when reading the text.

      Line 203: please delete: "It is obviously seen that".

      We have removed the phrase “It is obviously seen that” from the sentence as requested. The revised sentence now reads:

      “The first part of Eq. 2 represents the emission probabilities, and the second part represents the transition probabilities.”

      Line 314, GMM for 5mer parameter table re-estimation: "Typically, the process is repeated three to five times until the 5mer parameter table stabilizes." How is the stabilisation of the 5mer parameter table quantified? What is a reasonable cut-off that would demonstrate adequate stabilisation of the 5mer parameter table? Please add details of this to the text.

      We have revised the sentence to clarify the stabilization criterion as follows:

      “Typically, the process is repeated three to five times until the 5-mer parameter table stabilizes (when the average change of mean values of all 5-mers is less than 5e-3).”
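      The stopping rule could be checked as in the sketch below (hypothetical function and argument names, not SegPore's code):

      ```python
      import numpy as np

      def table_stabilized(prev_means, new_means, tol=5e-3):
          """Convergence check for the iterative 5-mer table re-estimation:
          stop once the average absolute change of the 5-mer mean values
          falls below tol (5e-3, the threshold stated in the revised text)."""
          prev = np.asarray(prev_means, dtype=float)
          new = np.asarray(new_means, dtype=float)
          return float(np.mean(np.abs(new - prev))) < tol
      ```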

      Results

      Line 377: Please edit to read "Traditional base calling algorithms such as Guppy and Albacore assume that the RNA molecule is translocated unidirectionally through the pore by the motor protein."

      We have revised the sentence as:

      “In traditional basecalling algorithms such as Guppy and Albacore, we implicitly assume that the RNA molecule is translocated through the pore by the motor protein in a monotonic fashion, i.e., the RNA is pulled through the pore unidirectionally.”

      Line 555, m6A identification at the site level: "For six selected m6A motifs, SegPore achieved an ROC AUC of 82.7% and a PR AUC of 38.7%, earning the third best performance compared with deep learning methods m6Anet and CHEUI (Fig. 3D)." So SegPore performs third best of all deep learning methods. Do you recommend its use in conjunction with m6Anet for m6A detection? Please clarify in the text. This will help to guide users to possible best practice uses of your software.

      Thank you for the suggestion. We have added a clarification in the revised manuscript to guide users.

      “For practical applications, we recommend taking the intersection of m6A sites predicted by SegPore and m6Anet to obtain high-confidence modification sites, while still benefiting from the interpretability provided by SegPore’s predictions.”

      Figures.

      Figure 1A please refer to poly(A) tail, rather than polyA tail.

      We have updated it to poly(A) tail in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The study by Pinho et al. presents a novel behavioral paradigm for investigating higher-order conditioning in mice. The authors developed a task that creates associations between light and tone sensory cues, driving mediated learning. They observed sex differences in task acquisition, with females demonstrating faster-mediated learning compared to males. Using fiber photometry and chemogenetic tools, the study reveals that the dorsal hippocampus (dHPC) plays a central role in encoding mediated learning. These findings are crucial for understanding how environmental cues, which are not directly linked to positive/negative outcomes, contribute to associative learning. Overall, the study is well-designed, with robust results, and the experimental approach aligns with the study's objectives. 

      Strengths: 

      (1) The authors develop a robust behavioral paradigm to examine higher-order associative learning in mice. 

      (2) They discover a sex-specific component influencing mediated learning, with females exhibiting enhanced learning abilities. 

      (3) Using fiber photometry and chemogenetic techniques, the authors identify that the dorsal hippocampus, but not the ventral hippocampus, plays a crucial role in encoding mediated learning.

      We appreciate the strengths highlighted by the Reviewer and the valuable and complete summary of our work.

      Weaknesses: 

      (1) The study would be strengthened by further elaboration on the rationale for investigating specific cell types within the hippocampus.  

      We thank the Reviewer for highlighting this important point. In the revised manuscript, we have added new information (Page 11, Lines 27-34) to specifically explain the rational of studying the possible cell-type specific involvement in sensory preconditioning.

      (2) The analysis of photometry data could be improved by distinguishing between early and late responses, as well as enhancing the overall presentation of the data.  

      According to the Reviewer's comment, we have included new panels in Figure 3E and the whole Supplementary Figure 4, which separate the photometry data across different preconditioning and conditioning sessions, respectively. Overall, these data suggest that there are no major changes in cell activity in either hippocampal region during the different sessions, as a similar light-tone-induced enhancement of activity is observed. These findings have been incorporated into the Results Section (Page 12, Lines 13-15, 19-20 and 35-36).

      (3) The manuscript would benefit from revisions to improve clarity and readability.

      Based on the fair comment, we have gone through the text to increase clarity and readability.

      Reviewer #2 (Public review): 

      Summary: 

      Pinho et al. developed a new auditory-visual sensory preconditioning procedure in mice and examined the contribution of the dorsal and ventral hippocampus to learning in this task. Using photometry they observed activation of the dorsal and ventral hippocampus during sensory preconditioning and conditioning. Finally, the authors combined their sensory preconditioning task with DREADDs to examine the effect of inhibiting specific cell populations (CaMKII and PV) in the DH on the formation and retrieval/expression of mediated learning. 

      Strengths: 

      The authors provide one of the first demonstrations of auditory-visual sensory preconditioning in male mice. Research on the neurobiology of sensory preconditioning has primarily used rats as subjects. The development of a robust protocol in mice will be beneficial to the field, allowing researchers to take advantage of the many transgenic mouse lines. Indeed, in this study, the authors take advantage of a PV-Cre mouse line to examine the role of hippocampal PV cells in sensory preconditioning. 

      We acknowledge the Reviewer´s effort and for highlighting the strengths of our work.

      Weaknesses: 

      (1) The authors report that sensory preconditioning was observed in both male and female mice. However, their data only supports sensory preconditioning in male mice. In female mice, both paired and unpaired presentations of the light and tone in stage 1 led to increased freezing to the tone at test. In this case, fear to the tone could be attributed to factors other than sensory preconditioning, for example, generalization of fear between the auditory and visual stimulus.

      We thank the Reviewer for this comment. At first, we hypothesized that female mice were somehow able to associate light and tone although they were presented separately during the preconditioning sessions. Thus, we designed new experiments (shown in Supplementary Figure 2D) to test whether we would observe data congruent with our initial hypothesis or with fear generalization as proposed by the reviewer. We have performed a new experiment comparing a Paired group with two additional control groups, namely (i) an Unpaired group where we increased the time between the light and tone presentations and (ii) an experimental group where the light was absent during the conditioning. Clearly, the new results indicate the presence of fear generalization in female mice, as we found a significant cue-induced increase in freezing responses in all the experimental groups tested. In accordance with the Reviewer’s suggestion, we can conclude that mediated learning is not correctly observed in female mice using the protocol described (i.e. with 2 conditioning sessions). All these new results forced us to reorganize the structure and the figures of the manuscript to focus more on male mice in the Main Figures, whereas the data for female mice are shown in Supplementary Figures. Overall, our data clearly revealed the necessity of having behavioral protocols adapted to each sex, demonstrating sex differences in sensory preconditioning, which was added in the Discussion Section (Page 15, lines 12-37).

      (2) In the photometry experiment, the authors report an increase in neural activity in the hippocampus during both phase 1 (sensory preconditioning) and phase 2 (conditioning). In the subsequent experiment, they inhibit neural activity in the DH during phase 1 (sensory preconditioning) and the probe test, but do not include inhibition during phase 2 (conditioning). It was not clear why they didn't carry forward investigating the role of the hippocampus during phase 2 conditioning. Sensory preconditioning could occur due to the integration of the tone and shock during phase two, or retrieval and chaining of the tone-light-shock memories at test. These two possibilities cannot be differentiated based on the data. Given that we do not know at which stage the mediated learning is occurring, it would have been beneficial to additionally include inhibition of the DH during phase 2.

      Following the Reviewer’s valuable comment, we have conducted a new experiment where we have chemogenetically inhibited the CaMKII-positive neurons of the dHPC during the conditioning to explore their involvement in mediated learning formation. Notably, the inhibition of principal neurons of the dHPC during conditioning does not impair the formation of the mediated learning in our hands. These new results are now shown in Supplementary Figure 7G and added in the Results section (Page 13, Lines 19-23).

      (3) In the final experiment, the authors report that inhibition of the dorsal hippocampus during the sensory preconditioning phase blocked mediated learning. While this may be the case, the failure to observe sensory preconditioning at test appears to be due more to an increase in baseline freezing (during the stimulus off period), rather than a decrease in freezing to the conditioned stimulus. Given the small effect, this study would benefit from an experiment validating that administration of J60 inhibited DH cells. Further, given that the authors did not observe any effect of DREADD inhibition in PV cells, it would also be important to validate successful cellular silencing in this protocol.  

      According to the Reviewer's comments, we have performed new experiments to validate the use of J60 to inhibit hippocampal cells, shown in Supplementary Figure 7 E-F for CaMKII-positive neurons, in which J60 administration tends to decrease the frequency of calcium events both in the dHPC and vHPC. Furthermore, in Supplementary Figure 8 B-C we show that J60 is also able to modify calcium events in PV-positive interneurons. Although the best method to validate the use of DREADD (i.e. to inhibit hippocampal cell activity) could be electrophysiology recordings, we lack this technique in our laboratory. Thus, in order to address the reviewer's comment, we decided to combine the DREADD modulation through J60 administration with photometry recordings, where several tendencies are confirmed. In addition, a similar approach has been used in another preprint of the lab (https://doi.org/10.1101/2025.08.29.673009), where there is an increase of phospho-PDH, a marker of neuronal inhibition, upon J60 administration in the dHPC, as well as in other experiments conducted by a collaborator lab where they were able to observe a modulation of SOM-positive interneuron activity upon J60 administration (PhD defense of Miguel Sabariego, University Pompeu Fabra, Barcelona).

      Reviewer #3 (Public review): 

      Summary: 

      Pinho et al. investigated the role of the dorsal vs ventral hippocampus and the gender differences in mediated learning. While previous studies already established the engagement of the hippocampus in sensory preconditioning, the authors here took advantage of freely-moving fiber photometry recording and chemogenetics to observe and manipulate sub-regions of the hippocampus (dorsal vs. ventral) in a cell-specific manner. The authors first found sex differences in the preconditioning phase of a sensory preconditioning procedure, where males required more preconditioning training than females for mediated learning to manifest, and where females displayed evidence of mediated learning even when neutral stimuli were never presented together within the session.

      After validation of a sensory preconditioning procedure in mice using light and tone cues as neutral stimuli and a mild foot shock as the unconditioned stimulus, the authors used fiber photometry to record from all neurons vs. parvalbumin-positive-only neurons in the dorsal hippocampus or ventral hippocampus of male mice during both preconditioning and conditioning phases. They found increased activity of all neurons, as well as PV+-only neurons, in both sub-regions of the hippocampus during both preconditioning and conditioning phases. Finally, the authors found that chemogenetic inhibition of CaMKII+ neurons in the dorsal, but not ventral, hippocampus specifically prevented the formation of an association between the two neutral stimuli (i.e., light and tone cues), but not the direct association between the light cue and the mild foot shock. This set of data: (1) validates the mediated learning in mice using a sensory preconditioning protocol, and stresses the importance of taking the sex effect into account; (2) validates the recruitment of dorsal and ventral hippocampi during preconditioning and conditioning phases; and (3) further establishes the specific role of CaMKII+ neurons in the dorsal but not ventral hippocampus in the formation of an association between two neutral stimuli, but not between a neutral stimulus and a mild foot shock.

      Strengths: 

      The authors developed a sensory preconditioning procedure in mice to investigate mediated learning using light and tone cues as neutral stimuli, and a mild foot shock as the unconditioned stimulus. They provide evidence of a sex effect in the formation of light-cue association. The authors took advantage of fiber-photometry and chemogenetics to target sub-regions of the hippocampus, in a cell-specific manner and investigate their role during different phases of a sensory conditioning procedure. 

      We thank the Reviewer for the extensive summary of our work and for giving interesting value to some of our findings.

      Weaknesses: 

      The authors went further than previous studies by investigating the role of sub-regions of the hippocampus in mediated learning, however, there are several weaknesses that should be noted: 

      (1) This work first validates mediated learning in a sensory preconditioning procedure using light and tone cues as neutral stimuli and a mild foot shock as the unconditioned stimulus, in both males and females. They found interesting sex differences at the behavioral level, but then only focused on male mice when recording and manipulating the hippocampus. The authors do not address sex differences at the neural level. 

      We appreciate the comment of the Reviewer. Indeed, thanks to other Reviewer comments during this revision process (see Point 1 of Reviewer #2), we performed an additional experiment that reveals that, using the described protocol in female mice, we observed fear generalization rather than mediated learning responding. These data pointed to the need for sex-specific changes in the behavioral protocols to measure sensory preconditioning. The revised version of the manuscript, although highlighting these sex differences in behavioral performance (see Supplementary Figure 2), is more focused on male mice and, accordingly, all photometry or chemogenetic experiments are performed using male mice. In future studies, once we are certain to have a sensory preconditioning paradigm working in female mice, it will be very interesting to study whether the same hippocampal mechanisms mediating this behavior in male mice are also observed in female mice.

      (2) As expected in fear conditioning, the range of inter-individual differences is quite high. Mice that didn't develop a strong light-->shock association, as evidenced by a lower percentage of freezing during the Probe Test Light phase, should manifest a low percentage of freezing during the Probe Test Tone phase. It would be interesting to test for a correlation between the level of freezing during mediated vs test phases.

      Thanks to the comment raised by the reviewer, we generated a new set of data correlating mediated and direct fear responses. As can be observed in Supplementary Figure 3, there is a significant correlation between mediated and direct learning in male mice (i.e. the individuals that freeze more in the direct learning test are the individuals that express a stronger fear response in the mediated learning test). In contrast, this correlation is absent in female mice, further confirming what we have explained above. We have highlighted this new analysis in the Results section (Page 11, Lines 20-24).

      (3) The use of a synapsin promoter to transfect neurons in a non-specific manner does not bring much information. The authors applied a more specific approach to target PV+ neurons only, and it would have been more informative to keep with this cell-specific approach, for example by looking also at somatostatin+ inter-neurons. 

      The idea behind using a pan-neuronal promoter was to assess, in general terms, how neuronal activity in the hippocampus is engaged during the different phases of the light-tone sensory preconditioning. However, the comment of the Reviewer is very pertinent and, as suggested, we have generated new data targeting CaMKII-positive neurons (see Point 4 below). Finally, although it could be extremely interesting, we believe that targeting different interneuron subtypes is beyond the scope of the present work. However, we have added this to the Discussion Section as a future perspective/limitation of our study (Page 17, Lines 9-24).

      (4) The authors observed event-related Ca2+ transients on hippocampal pan-neurons and PV+ inter-neurons using fiber photometry. They then used chemogenetics to inhibit CaMKII+ hippocampal neurons, which does not logically follow. It does not undermine the main finding of CaMKII+ neurons of the dorsal, but not ventral, hippocampus being involved in the preconditioning, but not conditioning, phase. However, observing CaMKII+ neurons (using fiber photometry) in mice running the same task would be more informative, as it would indicate when these neurons are recruited during different phases of sensory preconditioning. Applying then optogenetics to cancel the observed event-related transients (e.g., during the presentation of light and tone cues, or during the foot shock presentation) would be more appropriate.  

      We have generated new photometry data to analyze the activity of CaMKII-positive neurons during the preconditioning phase and confirm their engagement during the light-tone pairings. Thus, we infused a CaMKII-GCaMP calcium sensor into the dHPC and vHPC of mice and recorded its activity during the 6 preconditioning sessions. The new results can be found in Figure 3 and are explained in the Results section (Page 12, Lines 26-36). The results clearly show an engagement of CaMKII-positive neurons during the light-tone pairing in both the dHPC and vHPC. Finally, although the suggested optogenetic manipulations would be very elegant, we hope to have convinced the reviewer that our chemogenetic results are sufficient to demonstrate the involvement of the dHPC in the formation of mediated learning in the light-tone sensory preconditioning paradigm. However, we have added this to the Discussion Section as a future perspective/limitation of our study (Page 17, Lines 9-24).

      (5) Probe tests always start with the "Probe Test Tone", followed by the "Probe Test Light". "Probe Test Tone" consists of an extinction session, which could affect the freezing response during "Probe Test Light" (e.g., Polack et al. (http://dx.doi.org/10.3758/s13420-013-0119-5)). Preferably, adding a group of mice with a Probe Test Light with no Probe Test Tone could help clarify this potential issue. The authors should at least discuss the possibility that the tone extinction session prior to the "Probe Test Light" could have affected the freezing response to the light cue. 

      We appreciate the comment raised by the reviewer. However, we think that our direct learning responses are quite robust in all of our experiments and, thus, the impact of possible extinction caused by the tone presentation should not affect our direct learning. Nevertheless, as it is an important point, we have discussed it in the Discussion Section (Page 17, Lines 12-14).

      Reviewer #4 (Public review): 

      Summary 

      Pinho et al use in vivo calcium imaging and chemogenetic approaches to examine the involvement of hippocampal sub-regions across the different stages of a sensory preconditioning task in mice. They find clear evidence for sensory preconditioning in male but not female mice. They also find that, in the male mice, CaMKII-positive neurons in the dorsal hippocampus: (1) encode the audio-visual association that forms in stage 1 of the task, and (2) retrieve/express sensory preconditioned fear to the auditory stimulus at test. These findings are supported by evidence that ranges from incomplete to convincing. They will be valuable to researchers in the field of learning and memory. 

      We appreciate the summary of our work and all the constructive comments raised by the Reviewer, which have greatly improved the clarity and quality of our manuscript.  

      Abstract 

      Please note that sensory preconditioning doesn't require the stage 1 stimuli to be presented repeatedly or simultaneously. 

      The reviewer is right, and we have corrected and changed that information in the revised abstract.  

      "Finally, we combined our sensory preconditioning task with chemogenetic approaches to assess the role of these two hippocampal subregions in mediated learning."  This implies some form of inhibition of hippocampal neurons in stage 2 of the protocol, as this is the only stage of the protocol that permits one to make statements about mediated learning. However, it is clear from what follows that the authors interrogate the involvement of hippocampal sub-regions in stages 1 and 3 of the protocol - not stage 2. As such, most statements about mediated learning throughout the paper are potentially misleading (see below for a further elaboration of this point). If the authors persist in using the term mediated learning to describe the response to a sensory preconditioned stimulus, they should clarify what they mean by mediated learning at some point in the introduction. Alternatively, they might consider using a different phrase such as "sensory preconditioned responding". 

      Considering the arguments of the Reviewer, we have modified our text in the Abstract and throughout the main text. Moreover, based on a comment of Reviewer #2 (Point 2), we have generated new data demonstrating that the dHPC does not seem to be involved in mediated learning formation during Stage 2, as its inhibition does not impair sensory preconditioned responding. These new data can be seen in Supplementary Figure 7G.

      Introduction 

      "Low-salience" is used to describe stimuli such as tone, light, or odour that do not typically elicit responses that are of interest to experimenters. However, a tone, light, or odour can be very salient even though they don't elicit these particular responses. As such, it would be worth redescribing the "low-salience" stimuli in some other terms. 

      Throughout the revised version of the manuscript, we have replaced the term "low-salience" with "innocuous stimuli" or avoided any adjective where we think it is not necessary.

      "These higher-order conditioning processes, also known as mediated learning, can be captured in laboratory settings through sensory preconditioning procedures2,6-11."  Higher-order conditioning and mediated learning are not interchangeable terms: e.g., some forms of second-order conditioning are not due to mediated learning. More generally, the use of mediated learning is not necessary for the story that the authors develop in the paper and could be replaced for accuracy and clarity. E.g., "These higher-order conditioning processes can be studied in the laboratory using sensory preconditioning procedures2,6-11." 

      Following the Reviewer's proposal, we have modified the text.

      In reference to Experiment 2, it is stated that: "However, when light and tone were separated on time (Unpaired group), male mice were not able to exhibit mediated learning response (Figure 2B) whereas their response to the light (direct learning) was not affected (Figure 2D). On the other hand, female mice still present a lower but significant mediated learning response (Figure 2C) and normal direct learning (Figure 2E). Finally, in the No-Shock group, both male (Figure 2B and 2D) and female mice (Figure 2C and 2E) did not present either mediated or direct learning, which also confirmed that the exposure to the tone or light during Probe Tests do not elicit any behavioral change by themselves as the presence of the electric footshock is required to obtain a reliable mediated and direct learning responses."  The absence of a difference between the paired and unpaired female mice should not be described as "significant mediated learning" in the latter. It should be taken to indicate that performance in the females is due to generalization between the tone and light. That is, there is no sensory preconditioning in the female mice. The description of performance in the No-shock group really shouldn't be in terms of mediated or direct learning: that is, this group is another control for assessing the presence of sensory preconditioning in the group of interest. As a control, there is no potential for them to exhibit sensory preconditioning, so their performance should not be described in a way that suggests this potential. 

      All these comments are very pertinent and were also raised by Reviewer #2 (Point 1, see above). In the revised version of the manuscript, we have carefully changed, where necessary, our interpretation of the results (e.g. in the case of the No-Shock group). In addition, we have generated new data confirming that, under similar conditions (i.e. 2 conditioning sessions in our SPC), female mice show fear generalization rather than reliable sensory preconditioning responding. In our opinion, this does not rule out the presence of mediated learning in female mice but suggests that adapted protocols must be used for each sex. These results forced us to change the organization of the Figures, but we hope the Reviewer will agree with all the changes proposed. In addition, we have rewritten a paragraph in the Discussion Section to explain these sex differences (see Page 15, Lines 12-37).

      Methods - Behavior 

      I appreciate the reasons for testing the animals in a new context. This does, however, raise other issues that complicate the interpretation of any hippocampal engagement: e.g., exposure to a novel context may engage the hippocampus for exploration/encoding of its features - hence, it is engaged for retrieving/expressing sensory preconditioned fear to the tone. This should be noted somewhere in the paper given that one of its aims is to shed light on the broader functioning of the hippocampus in associative processes. 

      This general issue - that the conditions of testing were such as to force engagement of the hippocampus - is amplified by two further features of testing with the tone. The first is the presence of background noise in the training context and its absence in the test context. The second is the fact that the tone was presented for 30 s in stage 1 and then continuously for 180s at test. Both changes could have contributed to the engagement of the hippocampus as they introduce the potential for discrimination between the tone that was trained and tested. 

      We have now addressed these pertinent comments in a "Study limitations" paragraph in the Discussion Section (Page 17, Lines 9-24). Indeed, the various changes of context (including the presence of background noise) were implemented because, during the setting up of the paradigm, we encountered problems with fear generalization (also in male mice). Similarly, the differences in cue exposure between the preconditioning phase and the test phase were decided based on important differences between previous protocols used in rats and how mice respond. Indeed, mice were not able to adapt their behavioral responses when shorter cue-exposure windows were used, as clearly happens with rats [1].

      Results - Behavior 

      The suggestion of sex differences based on differences in the parameters needed to generate sensory preconditioning is interesting. Perhaps it could be supported through some set of formal analyses. That is, the data in supplementary materials may well show that the parameters needed to generate sensory preconditioning in males and females are not the same. However, there needs to be some form of statistical comparison to support this point. As part of this comparison, it would be neat if the authors included body weight as a covariate to determine whether any interactions with sex are moderated by body weight.  

      Regarding the comparison between male and female mice, although the comments of the Reviewer are pertinent and interesting, we think that, given the new data generated, it is not appropriate to compare the two sexes, as we still have to optimize the SPC protocol for female mice.

      What is the value of the data shown in Figure 1 given that there are no controls for unpaired presentations of the sound and light? In the absence of these controls, the experiment cannot have shown that "Female and male mice show mediated learning using an auditory-visual sensory preconditioning task" as implied by its title. Minimally, this experiment should be relabelled. 

      Based on the new data generated with female mice, we have decided to remove Figure 1 and reorganize the structure of the manuscript. We hope the Reviewer will agree that this has improved the clarity of the manuscript.

      "Altogether, this data confirmed that we successfully set up an LTSPC protocol in mice and that this behavioral paradigm can be used to further study the brain circuits involved in higher-order conditioning."  Please insert the qualifier that LTSPC was successfully established in male mice. There is no evidence of LTSPC in female mice. 

      We fully agree with the Reviewer and our new findings further confirm this issue. Thus, we have changed the statement in the revised version of the manuscript.  

      Results - Brain 

      "Notably, the inhibition of CaMKII-positive neurons in the dHPC (i.e. J60 administration in DREADD-Gi mice) during preconditioning (Figure 4B), but not before the Probe Test 1 (Figure 4B), fully blocked mediated, but not direct learning (Figure  4D)." The right panel of Figure 4B indicates no difference between the controls and Group DPC in the percent change in freezing from OFF to ON periods of the tone. How does this fit with the claim that CaMKII-positive neurons in the dorsal hippocampus regulate associative formation during the session of tone-light exposures in stage 1 of sensory preconditioning? 

      To improve the quality of the figures and to avoid possible redundancies between panels, in the new version of the manuscript we have decided to remove all the panels regarding the percentage of change. Regarding the issue raised by the Reviewer, in our opinion the inhibition of the dHPC clearly induced an impairment of mediated learning, as these animals do not change their behavior (i.e. there is no significant increase in freezing between OFF and ON periods) when the tone appears, in contrast with the other two groups. The graphs indicating the percentage of change (old version of the manuscript) were a different way to show the presence of tone- or light-induced responses in each experimental group. A significant effect (shown by the # symbol) meant that, in that specific experimental group, there was a significant change in behavior (freezing) when the cue (tone or light) appeared compared with when there was no cue (OFF period). Thus, in the old panel 4B commented on by the Reviewer, the absence of significance in the group in which the dHPC was inhibited during the preconditioning, compared to the clear significant effect observed in the other two groups, indicates an impairment of mediated learning formation. However, to avoid any confusion, we have slightly modified the text to mention strictly what is being analyzed and/or shown in the graphs and, as mentioned, the graphs of percentage of change have been removed.

      Discussion 

      "When low salience stimuli were presented separated on time or when the electric footshock was absent, mediated and direct learning were abolished in male mice. In female mice, although light and tone were presented separately during the preconditioning phase, mediated learning was reduced but still present, which implies that female mice are still able to associate the two low-salience stimuli." 

      This doesn't quite follow from the results. The failure of the female unpaired mice to withhold their freezing to the tone should not be taken to indicate the formation of a light-tone association across the very long interval that was interpolated between these stimulus presentations. It could and should be taken to indicate that, in female mice, freezing conditioned to the light simply generalized to the tone (i.e., these mice could not discriminate well between the tone and light). 

      As discussed above, we fully agree with the Reviewer and all the manuscript has been modified as described above. 

      "Indeed, our data suggests that when hippocampal activity is modulated by the specific manipulation of hippocampal subregions, this brain region is not involved during retrieval."  Does this relate to the results that are shown in the right panel of Figure 4B, where there is no significant difference between the different groups? If so, how does it fit with the results shown in the left panel of this figure, where differences between the groups are observed? 

      "In line with this, the inhibition of CaMKII-positive neurons from the dorsal hippocampus, which has been shown to project to the restrosplenial cortex56, blocked the formation of mediated learning." 

      Is this a reference to the findings shown in Figure 4B and, if so, which of the panels exactly? That is, one panel appears to support the claim made here while the other doesn't. In general, what should the reader make of data showing the percent change in freezing from stimulus OFF to stimulus ON periods? 

      In our opinion, as pointed out above, the graphs indicating the percentage of change were a different way to show the presence of tone- or light-induced behavioral responses in each experimental group. A significant effect (shown by the # symbol) meant that, in that specific experimental group, there was a significant change in behavior (freezing) when the cue (tone or light) appeared compared with when there was no cue (OFF period). Thus, in the old panel 4B commented on by the Reviewer, the absence of significance in the group in which the dHPC was inhibited during the preconditioning, compared to the other groups where a clear significant effect can be observed, indicates an impairment of mediated learning formation. In the revised version of the manuscript, we have rephrased these sentences to stick to what the graphs are showing and, as explained, the graphs of percentage of change have been removed.

      Reviewer #1 (Recommendations for the authors): 

      The authors may address the following questions: 

      (1) The study identifies major sex differences in the conditioning phase, with females showing faster learning. Since hormonal fluctuations can influence learning and behavior, it would be helpful for the authors to comment on whether they tracked the estrous cycle of the females and whether any potential effects of the cycle on mediated learning were considered. 

      This is a relevant and important point raised by the Reviewer. In our study we did not track the estrous cycle to investigate whether the cycle has any effect on mediated learning, which could be an interesting project in itself. Although in the revised version of the manuscript we provide new information regarding mediated learning performance in male and female mice, we agree with the reviewer that sex hormones may account for the observed sex differences. However, the aim of the present work was to explore potential sex differences in mediated learning responding rather than to investigate the specific mechanisms behind these potential sex differences.

      For this reason, and to avoid adding further complexity to our present study, we did not check the estrous cycle in the female mice or the testosterone levels in male mice, nor did we analyze the amount of sex hormones during the different phases of the sensory preconditioning task. Indeed, we think that checking the estrous cycle in female mice would still not be enough to ascertain the role of sex hormones, because checking the androgen levels in male mice would also be required. In line with this, meta-analyses of the neuroscience literature using mice as research subjects [2-4] have revealed that data collected from female mice (regardless of the estrous cycle) did not vary more than data from males. In conclusion, we think that using randomized and mixed cohorts of male and female mice (as in the present study) provides the same degree of variability in both sexes. Nevertheless, we have added a sentence to point to this possibility in the Discussion Section (Page 15, Lines 32-37).

      (2) The rationale for including parvalbumin (PV) cells in the study could be clarified. Is there prior evidence suggesting that this specific cell type is involved in mediated learning? This could apply to sensory stimuli not used in the current study.

      In the revised version of the manuscript, we have better clarified why we targeted PV interneurons, specifically mentioning previous studies [5] (see Page 11, Lines 27-34). 

      (3) The photometry recordings from the dHPC during the preconditioning phase, shown in Figure 3, are presented as average responses. It would be beneficial to separate the early vs. late trials to examine whether there is an increase in hippocampal activity as the associative learning progresses, rather than reporting the averaged data. Additionally, to clarify the dynamics of the dHPC in associative learning, the authors could compare the magnitude of photometry responses when light and tone stimuli are presented individually in separate sessions versus when they are presented closely in time to facilitate associative learning.

      As commented above, following the Reviewer's comment, we have now included a new Supplementary Figure 4, which splits the photometry data by the different preconditioning and conditioning sessions. Overall, these data suggest that there are no major changes in cell activity in either hippocampal region across the different sessions, as a similar light-tone-induced enhancement of activity is observed. There is only an interesting trend in the activity of pan-neurons at the onset of the light during conditioning sessions. All this is now included in the Results Section (Page 12, Lines 13-15).

      (4) The authors note that PV cell responses recorded with GCaMP were similar to general hippocampal neurons, yet chemogenetic manipulations of PV cells did not impact behavior. A more detailed discussion of this discrepancy would be helpful. 

      As suggested by the Reviewer, we have included additional Discussion to explain the potential discrepancy between the activity of PV interneurons assessed by photometry and its modulation by chemogenetics (see Page 16, Lines 27-33).   

      (5) All fiber photometry recordings were conducted in male mice. Given the sex differences observed in associative learning, the authors could expand the study to include dHPC responses in females during both preconditioning and conditioning sessions. 

      We appreciate the comment of the Reviewer. Indeed, thanks to comments made by other Reviewers during this revision (see Point 1 of Reviewer #2), we are still not sure that we have an optimal protocol to study mediated learning in female mice, due to sex-specific changes related to fear generalization. Thus, the revised version of the manuscript, although highlighting these sex differences in behavioral performance (see Supplementary Figure 2), focuses more on male mice and, accordingly, all photometry and chemogenetic experiments were performed exclusively in male mice. In future studies, once we are sure that we have a sensory preconditioning paradigm working in female mice, it will be very interesting to study whether the same hippocampal mechanisms mediating this behavior in male mice are also observed in female mice.

      Minor Comments: 

      (1) In the right panel of Figure 2A, females received only one conditioning session, so the "x2" should be corrected to "x1" conditioning to accurately reflect the data. 

      We thank the Reviewer for the comment that has been addressed in the revised version of the manuscript.  

      (2) The overall presentation of Figure 3 could be improved. For example, the y-axis in Panel B could be cut to a maximum of 3 rather than 6, which would better highlight the response data. Alternatively, including heatmap representations of the z-score responses could enhance clarity and visual impact.  

      We thank the Reviewer for the comment, which has been addressed by providing a new format for Figures 2 and 3 in the revised version of the manuscript.

      (3) There are several grammatical errors throughout the manuscript. It is recommended that the authors use a grammar correction tool to improve the overall writing quality and readability.  

      We have tried to correct the grammar throughout the manuscript.

      Reviewer #2 (Recommendations for the authors):  

      (1) In the abstract the authors write that sensory preconditioning requires the "repeated and simultaneous presentation of two low-salience stimuli such as a light and a tone". Previous research has shown that sensory preconditioning can still occur if the two stimuli are presented serially, rather than simultaneously. Further, the tone and the light are not necessarily "low-salience", for example, they can be loud or bright. It would be better to refer to them as innocuous. 

      In the revised version of the abstract, we have included the modifications suggested by the Reviewer.   

      (2) The authors develop a novel automated tool for assessing freezing behaviour in mice that correlates highly with both manual freezing and existing, open-source freeze estimation software (ezTrack). The authors should explain how the new program differs from ezTrack, or if it provides any added benefit over this existing software. 

      We have added new information in the Results Section (Page 10, Lines 13-20) to better explain how the new tool to quantify freezing improves on existing software.

      (3) In Experiment 1, the authors report a sex difference in levels of freezing between male and female mice when they are only given one session of sensory preconditioning. This should be supported by a statistical comparison of levels of freezing between male and female mice. 

      Based on the new results obtained with female mice, we have decided to remove the original Figure 1 of the manuscript, as it is not meaningful to compare male and female mediated learning responses if we do not have an optimal protocol for female mice.

      (4) Why did the authors choose to vary the duration of the stimuli across preconditioning, conditioning, and testing? During preconditioning, the light-tone compound was 30s, in conditioning the light was 10s, and at test both stimuli were presented continuously for 3 min. Did the level of freezing vary across the three-minute probe session? There is some evidence that rodents can learn the timing of stimuli and it may be the case that freezing was highest at the start of the test stimulus, when it most closely resembled the conditioned stimulus. 

      The differences in cue exposure between the preconditioning phase and the test phase were decided based on important differences between previous protocols used in rats and how mice respond. Indeed, mice were not able to adapt their behavioral responses when shorter cue-exposure windows were used, as clearly happens with rats [1]. In addition, we have added a new graph to show the time course of the behavioral responses (see Figures 1 and 4 and Supplementary Figure 2), which correlates with the quantification of freezing responses shown by the percentage of freezing during ON and OFF periods.

      (5) The title of Experiment 1 "Female and male mice show mediated learning using an auditory-visual sensory preconditioning task" - this experiment does not demonstrate mediated learning; it merely shows that animals will freeze more in the presence of a stimulus as compared with no stimulus. This experiment lacks the necessary controls to claim mediated learning (which are presented in Experiment 2) and should therefore be retitled something more appropriate.

      As stated above, based on the new results obtained with female mice, we have decided to remove the original Figure 1 of the manuscript, as it is not meaningful to compare male and female mediated learning responses if we do not have an optimal protocol for female mice.

      (6) In Figure 2, why does the unpaired group show less freezing to the tone than the paired group given that the tone was directly paired with the shock in both groups? 

      We believe the Reviewer may have referred to the tone in error (i.e. there are no differences in the freezing observed to the tone) and might instead be referring to the freezing induced by the light in the direct learning test. In this case, it is true that direct learning (i.e. the percentage of freezing) seems to be slightly lower in the unpaired group compared to the paired one, which could be due to a latent inhibition process caused by the different cue exposure between the paired and unpaired experimental groups. However, direct learning in both groups is clear and significant and there are no significant differences between them, which makes it difficult to extract any further conclusion.

      (7) The stimuli in the design schematics are quite small and hard to see, they should be enlarged for clarity. The box plots also looked stretched and the colour difference between the on and off periods is difficult to discern. 

      We have included some important modifications to the Figures in order to address the comments made by the Reviewer and improve their quality.

      (8) The authors do not include labels for the experimental groups (paired, unpaired, no shock) in Figures 2B, 2D, 2C, and 2E. This made it very difficult to interpret the figure.  

      According to this suggestion, Figure 2 has been changed accordingly. 

      (9) The levels of freezing during conditioning should be presented for all experiments.  

      We have generated a new Supplementary Figure 9 to show the freezing levels during conditioning sessions. 

      (10) In the final experiment, the authors wrote that mice were injected with J60 or saline, but I could not find the data for the saline animals.  

      In the Results and Methods sections, we have included a sentence to better explain this issue. In addition, we have added a new Supplementary Figure 7 to show the performance of all control groups.

      (11) Please list the total number of animals (per group, per sex) for each experiment.  

      In the revised version of the manuscript, we have added this information in each Figure Legend.  

      Reviewer #3 (Recommendations for the authors): 

      I found this study very interesting, despite a few weaknesses. I have several minor comments to add, hoping that it would improve the manuscript: 

      (1) The terminology used is not always appropriate/consistent. I would use "freely moving fiber photometry" or simply "fiber photometry" as calcium imaging conventionally refers to endoscopic or 2-photon calcium imaging. 

      We thank the Reviewer for this comment that has been addressed and corrected in the revised version of the manuscript. 

      (2) "Dorsal hippocampus mediates light-tone sensory preconditioning task in mice" suggests that a brain region mediates a task. I would rather suggest, e.g. "Dorsal hippocampus mediates light-tone association in mice" 

      We thank the Reviewer for this comment that has been addressed and corrected in the revised version of the manuscript.

      (3) As you are using low-salience stimuli, it would be better to also inform the readership with the light intensity used for the light cue, for replicability purposes. 

      In the Methods section (Page 5, Line 30), we have added new information regarding the visual stimuli used. 

      (4) If the authors didn't use a background noise during the probe tests, the tone cue could have been perceived as being louder/clearer by mice. Couldn't it have inflated the freezing response for the tone cue?  

      This is an interesting comment by the Reviewer, although we do not have data to directly address it. However, background noise was necessary to set up the protocol and to vary different aspects of the context throughout the paradigm, which was required to avoid fear generalization in mice. In addition, as demonstrated previously [6], background noise is important to prevent another auditory cue (i.e., the tone) from inducing fear responses by itself, as the transition from noise to silence is a danger signal for animals. 

      (5) "salience" is usually used for the intensity of a stimulus, not for an association or pairing. Rather, we usually refer to the strength of an association. 

      We thank the Reviewer for this comment that has been addressed and corrected in the revised version of the manuscript.

      (6) Figure 3, panel A. "RCaMP Neurons", maybe "Pan-Neurons" would be more appropriate, as PV+ interneurons are also neurons. 

      We thank the Reviewer for this comment that has been corrected accordingly.

      (7) Figure 4, panel A, please add the AAV injected, and the neurons labelled in your example slice. 

      We thank the Reviewer for this comment that has been corrected accordingly.

      References

      (1) Wong, F. S., Westbrook, R. F. & Holmes, N. M. 'Online' integration of sensory and fear memories in the rat medial temporal lobe. Elife 8 (2019). https://doi.org:10.7554/eLife.47085

      (2) Prendergast, B. J., Onishi, K. G. & Zucker, I. Female mice liberated for inclusion in neuroscience and biomedical research. Neurosci Biobehav Rev 40, 1-5 (2014). https://doi.org:10.1016/j.neubiorev.2014.01.001

      (3) Becker, J. B., Prendergast, B. J. & Liang, J. W. Female rats are not more variable than male rats: a meta-analysis of neuroscience studies. Biol Sex Differ 7, 34 (2016). https://doi.org:10.1186/s13293-016-0087-5

      (4) Shansky, R. M. Are hormones a "female problem" for animal research? Science 364,  825-826 (2019). https://doi.org:10.1126/science.aaw7570

      (5) Busquets-Garcia, A. et al. Hippocampal CB1 Receptors Control Incidental Associations. Neuron 99, 1247-1259 e1247 (2018). https://doi.org:10.1016/j.neuron.2018.08.014

      (6) Pereira, A. G., Cruz, A., Lima, S. Q. & Moita, M. A. Silence resulting from the cessation of movement signals danger. Curr Biol 22, R627-628 (2012). https://doi.org:10.1016/j.cub.2012.06.015

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      SMC5/6 is a highly conserved complex able to dynamically alter chromatin structure, playing in this way critical roles in genome stability and integrity that include homologous recombination and telomere maintenance. In the last years, a number of studies have revealed the importance of SMC5/6 in restricting viral expression, which is in part related to its ability to repress transcription from circular DNA. In this context, Oravcova and colleagues recently reported how SMC5/6 is recruited by two mutually exclusive complexes (orthologs of yeast Nse5/6) to SV40 LT-induced PML nuclear bodies (SIMC/SLF2) and DNA lesions (SLF1/2). In this current work, the authors extend this study, providing some new results. However, as a whole, the story lacks unity and does not delve into the molecular mechanisms responsible for the silencing process. One has the feeling that the story is somewhat incomplete, putting together not directly connected results.

      Please see the introductory overview above.

      (1) In the first part of the work, the authors confirm previous conclusions about the relevance of a conserved domain defined by the interaction of SIMC and SLF2 for their binding to SMC6, and extend the structural analysis to the modelling of the SIMC/SLF2/SMC complex by AlphaFold. Their data support a model where this conserved surface of SIMC/SLF2 interacts with SMC at the backside of SMC6's head domain, confirming the relevance of this interaction site with specific mutations. These results are interesting but confirmatory of a previous and more complete structural analysis in yeast (Li et al. NSMB 2024). In any case, they reveal the conservation of the interaction. My major concern is the lack of connection with the rest of the article. This structure does not help to understand the process of transcriptional silencing reported later beyond its relevance to recruit SMC5/6 to its targets, which was already demonstrated in the previous study.

      Demonstrating the existence of a conserved interface between the Nse5/6-like complexes and SMC6 in both yeast and humans is foundationally important, not confirmatory, and was not revealed in our previous study. It remains unclear how this interface regulates SMC5/6 function, but yeast studies suggest a potential role in inhibiting the SMC5/6 ATPase cycle. Nevertheless, the precise functions of Nse5/6 and its human orthologs in SMC5/6 regulation remain undefined, largely due to technical limitations in available in vivo analyses. The SIMC1/SLF2/SMC6 complex structure likely extends to the SLF1/2/SMC6 complex, suggesting a unifying function of the Nse5/6-like complexes in SMC5/6 regulation, albeit in the distinct processes of ecDNA silencing and DNA repair. There have been no studies to date (including this one) showing that SIMC1-SLF2 is required for SMC5/6 recruitment to ecDNA. Our previous study showed that SIMC1 was needed for SMC5/6 to colocalize with SV40 LT antigen at PML NBs. Here we show that SIMC1 is required for ecDNA repression, in the absence of PML NBs, which was not anticipated.

      (2) In the second part of the work, the authors focus on the functionality of the different complexes. The authors demonstrate that SMC5/6's role in transcription silencing is specific to its interaction with SIMC/SLF2, whereas SMC5/6's role in DNA repair depends on SLF1/2. These results are quite expected according to previous results. The authors already demonstrated that SLF1/2, but not SIMC/SLF2, are recruited to DNA lesions. Accordingly, they observe here that SMC5/6 recruitment to DNA lesions requires SLF1/2 but not SIMC/SLF2. Likewise, the authors already demonstrated that SIMC/SLF2, but not SLF1/2, targets SMC5/6 to PML NBs. Taking into account the evidence that connects SMC5/6's viral resistance at PML NBs with transcription repression, the observed requirement of SIMC/SLF2 but not SLF1/2 in plasmid silencing is somewhat expected. This does not mean the expectation does not have to be experimentally confirmed. However, the study falls short in advancing the mechanistic process, despite some interesting results such as the dispensability of the PML NBs or the antagonistic role of the SV40 large T antigen. It would have been interesting to explore how LT overcomes SMC5/6-mediated repression: Does LT prevent SIMC/SLF2 from interacting with SMC5/6? Or does it prevent SMC5/6 from binding the plasmid? Is the transcription-dependent plasmid topology altered in cells lacking SIMC/SLF2? And in cells expressing LT? In its current form, the study is confirmatory and preliminary. In agreement with this, the cartoons modelling results here and in the previous work look basically the same.

      Our previous study only examined the localization of SLF1 and SIMC1 at DNA lesions. The localization of these subcomplexes alone should not be used to define their roles in SMC5/6 localization. Indeed, the field is split in terms of whether Nse5/6-like complexes are required for ecDNA binding/loading, or regulation of SMC5/6 once bound. 

      We agree, determining the potential mechanism of action of LT in overcoming SMC5/6-based repression is an important next step. We believe it is unlikely due to blocking of the SMC5/6-SIMC1/SLF2 interface, since SIMC1-SLF2 is required for SMC5/6 to localize at LT-induced foci. It will require the identification of any direct interactions with SMC5/6 subunits, and better methods for assessing SMC5/6 loading and activity on ecDNAs. Unlike HBx, Vpr, and BNRF1, it does not appear to induce degradation of SMC5/6, making it a more complex and interesting challenge. Also, the dispensability of PML NBs in plasmid silencing versus viral silencing raises multiple important questions about SMC5/6’s repression mechanism. 

      (3) There are some points about the presented data that need to be clarified.

      Thank you, we have addressed these points below, within the Recommendations for authors section.

      Reviewer #2 (Public review):

      Oravcová et al. present data supporting a role for SIMC1/SLF2 in silencing plasmid DNA via the SMC5/6 complex. Their findings are of interest, and they provide further mechanistic detail of how the SMC5/6 complex is recruited to disparate DNA elements. In essence, the present report builds on the authors' previous paper in eLife in 2022 (PMID: 36373674, "The Nse5/6-like SIMC1-SLF2 complex localizes SMC5/6 to viral replication centers") by showing the role of SIMC1/SLF2 in localisation of the SMC5/6 complex to plasmid DNA, and the distinct requirements as compared to recruitment to DNA damage foci. Although the findings of the manuscript are of interest, we are not yet convinced that the new data presented here represents a compelling new body of work and would better fit the format of a "research advance" article. In their previous paper, Oravcová et al. show that the recruitment of SMC5/6 to SV40 replication centres is dependent on SIMC1, and specifically, that it is dependent on SIMC1 residues adjacent to neighbouring SLF2.

      We agree. We submitted this manuscript as a “Research Advance”, not as a standalone research article, given that it is an extension of our previous “Research Article” (1).

      Other comments

      (1) The mutations chosen in Figure 1 are quite extensive - 5 amino acids per mutant. In addition, they are in many cases 'opposite' changes, e.g., positive charge to negative charge. Is the effect lost if single mutations to an alanine are made?

      The mutations were chosen to test and validate the predicted SIMC1-SLF2-SMC6 structure i.e. the contact point between the conserved patch of SIMC1-SLF2 and SMC6. Multiple mutations and charge inversions increased the chance of disrupting the extensive interface. In this respect, the mutations were successful and informative, confirming the requirement of this region in specifically contacting SMC6. Whilst alanine scanning mutations are possible, we believe that they would not add to, or detract from, our validation of the predicted SIMC1-SLF2-SMC6 interface.

      (2) In Figure 2c, it isn't clear from the data shown that the 'SLF2-only' mutations in SMC6 result in a substantial reduction in SIMC1/SLF2 binding.

      To clarify the difference between wild-type and SLF2-only mutations in SIMC1-SLF2 interaction, we have performed an image volume analysis. This shows that the SLF2-facing SMC6 mutant reduces its interaction with SIMC1 (to 44% of WT) and SLF2 (to 21% of WT). The reduction in both SIMC1 and SLF2 interaction with SMC6 SLF2-facing mutant is expected, since SIMC1 and SLF2 are an interdependent heterodimer.  

      Author response table 1.

      (3) In the GFP reporter assays (e.g. Figure 3), median fluorescence is reported - was there any observed difference in the percentage of cells that are GFP positive?

      Yes, as expected when the GFP plasmid is not actively repressed, the percentage of GFP-positive cells differs in each cell line, following the same trend as GFP intensity.

      (4) The potential role of the large T antigen as an SMC5/6 evasion factor is intriguing. However, given the role of the large T antigen as a transcriptional activator, caution is required when interpreting enhanced GFP fluorescence. Antagonism of the SMC5/6 complex in this context might be further supported by ChIP experiments in the presence or absence of large T. Can large T functionally substitute for HBx or HIV-Vpr?

      We agree, the potential role of LT in SMC5/6 antagonism is interesting. We did state in the text “While LT is known to be a promiscuous transcriptional activator (2,3) that does not rule out a co-existing role in antagonizing SMC5/6. Indeed, these findings are reminiscent of HBx from HBV and Vpr of HIV-1, both of which are known promiscuous transcriptional activators that also directly antagonize SMC5/6 to relieve transcriptional repression (4-10).“ We have tried ChIP experiments, but found these to be unreliable in assessing SMC5/6 association with plasmid DNA. Given the many disparate targets of LT, HBx and Vpr (other than SMC5/6), it seems unlikely that LT could functionally substitute for HBx and Vpr in supporting HBV and HIV-1 infections. Whilst certainly an interesting future question, we believe it is beyond the scope of this study.

      (5) In Figure 5c, the apparent molecular weight of large T and SMC6 appears to change following transfection of GFP-SMC5 - is there a reason for this?

      We are not certain as to what causes the molecular weight shift, but it is not specifically related to GFP-SMC5 transfection. Rather, it appears to be a general effect of the pulldown. Indeed, a very weak “background” band of LT is seen in the GFP-only pulldown, which also runs at a “higher” molecular weight, as in the GFP-SMC5 pulldown. We believe that the effect is instead related to gel mobility in the wells that contain post-pulldown proteins and different buffers. We have also seen similar effects using different protein-protein interaction pairs. 

      Reviewer #3 (Public review):

      Summary:

      This study by the Boddy and Otomo laboratories further characterizes the roles of SMC5/6 loader proteins and related factors in SMC5/6-mediated repression of extrachromosomal circular DNA. The work shows that mutations engineered at an AlphaFold-predicted protein-protein interface formed between the loader SLF2/SIMC1 and SMC6 (similar to the interface in the yeast counterparts observed by cryo-EM) prevent co-IP of the respective proteins. The mutations in SLF2 also hinder plasmid DNA silencing when expressed in SLF2-/- cell lines, suggesting that this interface is needed for silencing. SIMC1 is dispensable for recruitment of SMC5/6 to sites of DNA damage, while SLF1 is required, thus separating the functions of the two loader complexes. Preventing SUMOylation (with a chemical inhibitor) increases transcription from plasmids but does not in SLF2-deleted cell lines, indicating the SMC5/6 silences plasmids in a SUMOylation dependent manner. Expression of LT is sufficient for increased expression, and again, not additive or synergistic with SIMC1 or SLF2 deletion, indicating that LT prevents silencing by directly inhibiting 5/6. In contrast, PML bodies appear dispensable for plasmid silencing.

      Strengths:

      The manuscript defines the requirements for plasmid silencing by SMC5/6 (an interaction of Smc6 with the loader complex SLF2/SIMC1, SUMOylation activity) and shows that SLF1 and PML bodies are dispensable for silencing. Furthermore, the authors show that LT can overcome silencing, likely by directly binding to (but not degrading) SMC5/6.

      Weaknesses:

      (1) Many of the findings were expected based on recent publications.

      There have been no manuscripts describing the role of SIMC1-SLF2 in ecDNA silencing. There have been studies describing SLF2’s roles in ecDNA silencing, but these suggested SLF2 had an SLF1-independent role, with no mention of an alternate Nse5-like cofactor. Our earlier study in eLife (1) described the identification of SIMC1 as an Nse5-like cofactor for SLF2 but did not test potential roles of the complex in ecDNA silencing. Also, the apparent dispensability of PML NBs in plasmid silencing (in U2OS cells) was unexpected based on recent publications. Finally, SV40 LT has not previously been implicated in SMC5/6 inhibition, which may occur through novel mechanisms.

      (2) While the data are consistent with SIMC1 playing the main function in plasmid silencing, it is possible that SLF1 contributes to silencing, especially in the absence of SIMC1. This would potentially explain the discrepancy with the data reported in ref. 50. SLF2 deletion has a stronger effect on expression than SIMC1 deletion in many but not all experiments reported in this manuscript. A double mutant/deletion experiments would be useful to explore this possibility.

      It is interesting to note that the data in ref. 50 (11) is also at odds with that in ref. 45 (8) in terms of defining a role for SLF1 in the silencing of unintegrated HIV-1 DNA. The Irwan study showed that SLF1 deficient cells exhibit increased expression of a reporter gene from unintegrated HIV-1, whereas the Dupont study found that SLF1 deletion, unlike SLF2 deletion, has no effect. It is unclear what the basis of this discrepancy is. In line with the Dupont study, we found no effect of SLF1 deletion on plasmid expression (Figure 4B), whereas SLF2 deletion increased reporter expression (Figure 3A/B). It is possible that SLF1 could support some plasmid silencing in the absence of SIMC1, especially considering the gross structural similarity in their C-terminal Nse5-like domains. However, we have been unable to generate double-knockout SIMC1 and SLF1 cells to test such a possibility, and shSLF1 has been ineffective. 

      (3) SLF2 is part of both types of loaders, while SLF1 and SIMC1 are specific to their respective loaders. Did the authors observe differences in phenotypes (growth, sensitivities to DNA damage) when comparing the mutant cell lines or their construction? This should be stated in the manuscript.

      We have not observed significant differences in the growth rates of each cell line, and DNA damage sensitivities are as yet untested.   

      (4) It would be desirable to have control reporter constructs located on the chromosome for several experiments, including the SUMOylation inhibition (Figures 5A and 5-S2) and LT expression (Figure 5D) to exclude more general effects on gene expression.

      We have repeated all GFP reporter assays using integrated versus episomal plasmid DNA. A seminal study by Decorsière et al. (6) showed that SMC5/6 degradation by HBx of HBV increased transcription of episomal but not chromosomally integrated reporters. In line with this data, the deletion of SLF2 does not notably impact the expression of our GFP reporter construct when it is genomically integrated (Figure 3—figure supplement 1C).  

      Somewhat surprisingly, given the generally transcriptionally repressive roles of SUMO, inhibition of the SUMO pathway with SUMOi did not significantly impact the expression of our genomically integrated GFP reporter, versus the episomal plasmid (Figure 5—figure supplement 1C). Finally, the expression of SV40 LT, which enhances plasmid reporter expression (Figure 5D), also did not notably affect expression of the same reporter when located in the genome (Figure 5—figure supplement 3B). This is an interesting result, which is in line with an early study showing that HBx of HBV induces transcription from episomal, but not chromosomally integrated reporters (12). This further suggests that SV40 LT acts similarly to other early viral proteins like HBx and Vpr to counteract or bypass SMC5/6 restriction, amongst their multifaceted functions. Clearly, further analyses are needed to define mechanisms of LT in counteracting SMC5/6, but they do not appear to include complex degradation as seen with HBx and Vpr.  

      (5) Figure 5A: There appears to be an increase in GFP in the SLF2-/- cells with SUMOi? Is this a significant increase?

      No significant difference was found between WT, SIMC1-/- or SLF2-/- when treated with SUMOi (p>0.05). The p-value is 0.0857 (when comparing SLF2-/- to WT in the SUMOi condition). This is described in the figure legend to Figure 5.

      (6) The expression level of SFL2 mut1 should be tested (Figure 3B).

      Full-length SLF2 (WT or mutants) has been undetectable by western analyses. However, truncated SLF2 mut1 expresses well and binds SIMC1 but not SMC6 (Figure 1C). Moreover, full-length SLF2 mut1 expression was confirmed by qPCR, showing a somewhat higher expression level than SLF2 WT (Figure 3—figure supplement 1B).  

      Reviewer #1 (Recommendations for the authors):

      There are some points about the presented data that need to be clarified.

      (1) Figures 3, 4B, and 5. The authors should rule out the possibility that the reported effects on transcription were due to alterations in plasmid number. This is particularly important, taking into account the importance of SMC5/6 in DNA replication.

      We used qPCR to assess plasmid copy number versus genomic DNA in our cell lines, testing at 72 hours post-transfection to avoid any impact of cytosolic DNA (13). Our qPCR data show that there is no significant impact on plasmid copy number across our cell lines, i.e., WT and SLF2 null. SMC5/6 has a positive role in DNA replication progression on the genome (e.g. (14)), so loss of SMC5/6 “targeting” in SIMC1 and SLF2 null cells would be unlikely to promote replication fork progression per se. 

      (2) Figure S1A. In contrast to the statement in the text, the SIMC1-combo control is affected in its binding to SLF2; however, it is not affected in its binding to SMC6. This is somehow unexpected because it suggests that the solenoid-like structure is not required for SMC6 binding, just specific patches at either SIMC or SLF2. This should be commented on.

      We appreciate the reviewer’s observation regarding the discrepancy between Figure S1A and the text. This was our oversight. The data show that SLF2 recovery was reduced in the pull-down with the SIMC1 combo control mutant, while SLF2 expression was unchanged. Because SLF2 or SIMC1 variants that fail to associate typically show poor expression (1), these findings suggest that the SIMC1 combo control mutant associates with SLF2, albeit more weakly. Since the mutations were introduced into surface residues of SIMC1, it is not immediately clear how they would weaken the interaction or destabilize the complex. In contrast, SMC6 was fully recovered with the SIMC1 combo control mutant, indicating that the SIMC1–SMC6 interaction remains stable without stoichiometric SLF2. This may reflect direct recognition of a SIMC1 binding epitope or stabilization of its solenoid structure by SMC6, although this interpretation remains uncertain given the unstable nature of free SIMC1 and SLF2. Alternatively, SMC6 may have co-sedimented with the SIMC1 combo control mutant together with SLF2, which was initially retained but subsequently lost during washing, whereas SMC6 remained due to its limited solubility in the absence of other SMC5/6 subunits. While further mechanistic analysis will require purified SMC5/6 components, our data support the AlphaFold-based model by demonstrating that SIMC1 mutations on the non–SMC6-contacting surface retain association with SMC6. The text has been revised accordingly.

      (3) The SLF2-only mutant has alterations that affect interactions with both SLF2 and SIMC1. Is it not another Mixed mutant?

      We appreciate the reviewer’s observation regarding the discrepancy between the mutant name (“SLF2-only”) and its description (“while N947 forms salt bridges with SIMC1”). The previous statement was inaccurate due to a misinterpretation of several AlphaFold models. Across these models, the SIMC1–SLF2 interface residues remain largely consistent, but the SIMC1 residue R470 exhibits positional variability—contacting N947 in some models but not in others. Given this variability and the absence of an experimental structure, we have revised the text to avoid overinterpretation. Because the N947 side chain is oriented toward SLF2 and consistently forms polar contacts with the H1148 side chain and G1149 backbone, we have renamed this mutant “SLF2-facing,” which more accurately describes its modeled environment. The other mutants are likewise renamed “SIMC1-facing” and “SIMC1–SLF2-groove-facing,” providing a clearer and more consistent description of the interface.

      (4) The SLF2-only mutant still displays clear interactions with SMC6. Can this be explained with the AlphaFold model?

      SIMC1 may contribute more substantially to SMC6 binding than SLF2, consistent with our mutagenesis results. However, the energetic contributions of individual residues or proteins cannot be quantitatively inferred from structural models alone. Comprehensive experimental and computational analyses would be required to address this point.

      (5) The conclusions about the role of SUMOylation are vague; it is already known that its general effect on transcription repression, and the authors already demonstrated that SIMC interacts with SUMO pathway factors. Concerning the epistatic effect, the experiment should be done at a lower inhibitor concentration; at 100 nM there is not much margin to augment according to the kinetics analysis in Figure S5.

      The SUMO pathway is indeed thought to be generally repressive for transcription. Notably, in response to a suggestion from Reviewer 3 (public review point 4), we have repeated several of our GFP expression assays using cells with the GFP reporter plasmid integrated into the genome (please see Figure 3—figure supplement 1C; Figure 5—figure supplement 1C; Figure 5—figure supplement 3B). This type of integrated reporter does not show elevated expression following inhibition of the SMC5/6 complex, unlike ecDNAs (6,10). Interestingly, SUMOi, LT expression, and SLF2 knockout also did not notably impact the expression of our integrated GFP reporter (Figure 3—figure supplement 1C; Figure 5—figure supplement 1C; Figure 5—figure supplement 3B, unlike that of the plasmid (ecDNA) reporter. Given the “general” inhibitory effect of SUMO on transcription, the SUMOi result was not expected, and it opens further interesting avenues for study. 

      In Figure 5—figure supplement 1A, 100 nM SUMOi increases reporter expression well below the highest SUMOi dose. We believe that the ~3-4 fold induction of GFP expression in SLF2 null cells, if independent of SUMOylation, should further increase GFP expression. The impact of SUMOylation on GFP reporter expression remains “vague”, but our data indicate that SMC5/6 operates within SUMO’s “umbrella” function and provides a starting point for more mechanistic dissection. 

      (6) Figure 5C. Why is the size different between Input versus GFP-PD?

      Please see our response to this question above: reviewer 2, point (5)

      Reviewer #2 (Recommendations for the authors):

      If further data could be provided to extend on that which is presented, then publication as a 'standalone research article' may be appropriate, but not in its present form.

      We submitted this manuscript as a “Research Advance” not as a standalone research article, given that it was an extension of our previous research article (1).

      Reviewer #3 (Recommendations for the authors):

      (1) The term 'LT' should be defined in the title

      We have updated the title accordingly.  

      (2) This reviewer found the nomenclature of the SMC6 mutants confusing (SIMC1-only...). Either rephrase or define more clearly in the text and the figures.

      We agree with the reviewer and have renamed the mutants “SIMC1-facing,” “SLF2-facing,” and “SIMC1–SLF2-groove-facing.”

      (3) The authors could better emphasize that LT blocks silencing in trans (not only on its cognate target sequence in cis). This is consistent with the observed direct binding to SMC5/6.

      We appreciate the suggestion to further emphasize the impact of LT on plasmid silencing. We did not want to overstate its impact at this time because we do not know if it directly binds SMC5/6 or indeed affects SMC5/6 function more broadly. LT expression, like HBx, does cause induction of a DNA damage response, but we cannot at this point tie that response to SMC5/6 inhibition alone.

      (4) Figure 5 S1: the merge looks drastically different. Is DAPI omitted in the wt merge image?

      Thank you for noting this issue. We have corrected the image, which was impacted by the use of an underexposed DAPI image.  

      (5) Figure 1: how is the structure in B oriented relative to A? A visual guide would be helpful.

      We have added arrows to indicate the view orientation and rotational direction to turn A to B.

      (6) Line 126, unclear what "specificity" here means.

      We have revised the sentence without this word, which now starts with “To confirm the SIMC1-SMC6 interface, we introduced….”

      (7) Line 152, The statement implies that the conserved residues are needed for loader subunits interactions ('mediating the SIMC1-SLF2 interaction"). Does Figure 1C not show that the residues are not important? Please clarify.

      Thank you for noting this writing error. We have corrected the sentence to provide the intended meaning. It now reads "Collectively, these results confirm that the conserved surface patch of SIMC1-SLF2 is essential for SMC6 binding.” 

      References


    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review): 

      Summary: 

This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, PHG, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though still in this revised paper I have substantive concerns about how the analyses were performed. While scene-specific reinstatement decreased for remote memories in both children and adults, claims about its presence cannot be made given the analyses. Gist-level reinstatement was observed in children but not adults, but I also have concerns about this analysis. Broadly, the behavioral and univariate findings are consistent with the idea that memory consolidation differs between children and adults in important ways, and this work takes a step towards characterizing how.

      Strengths: 

      The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.  

      Weaknesses: 

As noted above and in my review of the original submission, the pattern similarity analyses for both item- and category-level reinstatement were performed in a way that is not interpretable given concerns about temporal autocorrelation within scanning run. Unfortunately these issues remain of concern in this revision because they were not rectified. Most of my review focuses on this analytic issue, though I also outline additional concerns.

      (1) The pattern similarity analyses are largely uninterpretable due to how they were performed. 

      (a) First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, and which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, which is not possible given the design. 

      To remedy this, in the revision the authors have said they will refrain from making conclusions about the presence of scene-specific reinstatement (i.e., reinstatement above baseline). While this itself is an improvement from the original manuscript, I still have several concerns. First, this was not done thoroughly and at times conclusions/interpretations still seem to imply or assume the presence of scene reinstatement (e.g., line 979-985, "our research supports the presence of scene-specific reinstatement in 5-to-7-year-old children"; line 1138). 

We thank the reviewers for pointing out that there are inconsistencies in our writing. We agree that we cannot make any claims about the baseline level of scene-specific reinstatement. To reiterate, our focus is on the changes in reinstatement over time (30 minutes, 24 hours, and two weeks after learning), which showed a robust decrease. Importantly, scene-specific reinstatement indices for recent items — tested on different days — did not significantly differ, as indicated by non-significant main effects of Session (all p > .323) and Session x ROI interactions (all p > .817) in either age group. This supports our claim that temporal autocorrelation is stable and consistent across conditions and that the observed decline in scene-specific reinstatement reflects a time-dependent change in remote retrieval. We have revised the highlighted passages accordingly, emphasizing the delay-related decrease in scene-specific reinstatement rather than its absolute magnitude.
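For clarity, the scene-specific vs. set-based similarity contrast described above can be sketched as follows (a simplified illustration with hypothetical array names, not the exact pipeline used in the paper):

```python
import numpy as np

def fisher_z(r):
    # Fisher z-transform, clipped to avoid infinities at |r| = 1
    return np.arctanh(np.clip(r, -0.999999, 0.999999))

def scene_reinstatement_index(fix_patterns, scene_patterns):
    """Scene-specific minus set-based similarity (in Fisher-z units).

    fix_patterns, scene_patterns: (n_trials, n_voxels) single-trial
    estimates for the delay (fixation) and scene time windows.
    """
    n = fix_patterns.shape[0]
    # correlation of every fixation pattern with every scene pattern
    sim = np.corrcoef(np.vstack([fix_patterns, scene_patterns]))[:n, n:]
    scene_specific = fisher_z(np.diag(sim)).mean()            # same trial
    set_based = fisher_z(sim[~np.eye(n, dtype=bool)]).mean()  # other trials
    return scene_specific - set_based
```

The delay-related analyses above compare how this index changes across sessions, rather than interpreting its absolute level, since both terms are affected by within-run autocorrelation.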

      Second, the authors' logic for the neural-behavioural correlations in the PLSC analysis involved restricting to regions that showed significant reinstatement for the gist analysis, which cannot be done for the analogous scene-specific reinstatement analysis. This makes it challenging to directly compare these two analyses since one was restricted to a small subset of regions and only children (gist), while scene reinstatement included both groups and all ROIs. 

      We thank the reviewer for pointing this out and want to clarify that it was not our intention to directly compare these analyses. For the neural-behavioral correlations, we included only those regions identified based on gist-like representations baseline, whereas for scene-specific reinstatement, we included all regions due to the absence of such a baseline. The primary aim of the PLSC analysis was to identify a set of regions that, after a stringent permutation and bootstrapping procedure, form a latent variable that explains a significant proportion of variance in behavioral performance across all participants. 
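The core logic of the PLSC procedure can be sketched as follows (a simplified illustration, not the exact implementation used here; the permutation and bootstrap steps mentioned above are omitted):

```python
import numpy as np

def plsc(brain, behavior):
    """Toy behavioral PLSC sketch.

    brain: (n_subjects, n_regions) reinstatement indices
    behavior: (n_subjects, n_measures) memory scores
    Standardize both, form the behavior-by-region cross-correlation
    matrix, and decompose it with SVD into latent variables.
    """
    zb = (brain - brain.mean(0)) / brain.std(0)
    zy = (behavior - behavior.mean(0)) / behavior.std(0)
    R = zy.T @ zb / len(zb)  # cross-correlation matrix
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # singular values, behavior saliences, brain (region) saliences
    return s, U, Vt
```

In the actual analysis, the reliability of each latent variable and of individual region saliences would then be assessed via permutation and bootstrap resampling.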

      Third, it is also unclear whether children and adults' values should be directly comparable given pattern similarity can be influenced by many factors like motion, among other things. 

      We thank the reviewer for raising this important point. In our multivariate analysis, we included confounding regressors specifically addressing motion-related artefacts. Following recent best practices for mitigating motion-related confounding factors in both adult and pediatric fMRI data (Ciric et al., 2017; Esteban et al., 2020; Jones et al., 2021; Satterthwaite et al., 2013), we implemented the most effective motion correction strategies. 

      Importantly, our group × session interaction analysis focuses on relative changes in reinstatement over time rather than comparing absolute levels of pattern similarity between children and adults. This approach controls for potential baseline differences and instead examines whether the magnitude of delay-related changes differs across groups. We believe this warrants the comparison and ensures that our conclusions are not driven by group-level differences in baseline similarity or motion artifacts.

      My fourth concern with this analysis relates to the lack of regional specificity of the effects. All ROIs tested showed a virtually identical pattern: "Scene-specific reinstatement" decreased across delays, and was greater in children than adults. I believe control analyses are needed to ensure artifacts are not driving these effects. This would greatly strengthen the authors' ability to draw conclusions from the "clean" comparison of day 1 vs. day 14. (A) The authors should present results from a control ROI that should absolutely not show memory reinstatement effects (e.g., white matter?). Results from the control ROI should look very different - should not differ between children and adults, and should not show decreases over time. 

      (C) If the same analysis was performed comparing the object cue and immediately following fixation (rather than the fixation and the immediately following scene), the results should look very different. I would argue that this should not be an index of reinstatement at all since it involves something presented visually rather than something reinstated (i.e., the scene picture is not included in this comparison). If this control analysis were to show the same effects as the primary analysis, this would be further evidence that this analysis is uninterpretable and hopelessly confounded. 

We appreciate the reviewer’s suggestion to strengthen the interpretation of our findings by including appropriate control analyses to rule out non-memory-related artifacts. In response, we conducted several control analyses, detailed below, which collectively support the specificity of the observed reinstatement effects. The results are reported in the manuscript (lines 593-619).

We verified that item reinstatement for incorrectly remembered trials did not show any session-related decline in any ROI. This indicates that the reinstatement for correctly remembered items is memory-related (see Fig. S5 for details).

      We conducted additional analyses on three subregions of the corpus callosum (the body, genu, and splenium). The results of the linear mixed-effects models revealed no significant group effect (all p > .426), indicating no differences between children and adults. In contrast, all three ROIs showed a significant main effect of Session (all p < .001). However, post hoc analyses indicated that this effect was driven by differences between the recent and the Day 14 remote condition. The main contrasts of interest – recent vs. Day 1 remote and Day 1 remote vs. Day 14 remote – were not significant (all p > .080; see Table S10.4), suggesting that, unlike in other ROIs, there was no delay-related decrease in scene-specific reinstatement in these white matter regions.
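A minimal sketch of the type of linear mixed-effects model described above, using statsmodels (illustrative only; column names are hypothetical and the actual analysis may have used different software and contrast coding):

```python
import pandas as pd
import statsmodels.formula.api as smf

def fit_session_model(df: pd.DataFrame):
    """Random-intercept model of reinstatement by group and session.

    df: long-format data with one row per participant x condition and
    columns 'reinstatement', 'session' (recent / remote_d1 / remote_d14),
    'group' (children / adults), and 'subject'.
    """
    model = smf.mixedlm("reinstatement ~ group * session",
                        data=df, groups=df["subject"])
    return model.fit(reml=True)
```

Post hoc contrasts (e.g., recent vs. Day 1 remote) would then be computed on the fitted model, with appropriate correction for multiple comparisons.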

Then we repeated our analysis using the same procedure but replaced the “scene” time window with the “object” time window. The rationale for this control is that comparing the object cue to the immediately following fixation period should not reflect scene reinstatement, as the object and the reinstated scene rely on distinct neural representations. Accordingly, we did not expect a delay-related decrease in the reinstatement index. Consistent with this expectation, the analysis using the object–fixation similarity index (though also influenced by temporal autocorrelation) did not reveal any significant effect of session or delay in any ROI (all p > .059; see Table S9, S9.1).

      Together, these control analyses provide converging evidence that our findings are not driven by global or non-specific signal changes. We believe that these control analyses strengthen our interpretation about delay-related decrease in scene-specific reinstatement index. 

      (B) Do the recent items from day 1 vs. day 14 differ? If so, this could suggest something is different about the later scans (and if not, it would be reassuring). 

The recent items tested on day 1 and day 14 did not differ (all p > .323). This null effect was consistent across all ROIs.

      (b) For the category-based neural reinstatement: (1) This suffers from the same issue of correlations being performed within run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). The authors in their response letter have indicated that because the patterns being correlated are not derived from events in close temporal proximity, they should not suffer from the issue of temporal autocorrelation. This is simply not true. For example, see the paper by Prince et al. (eLife 2022; on GLMsingle). This is not the main point of Prince et al.'s paper, but it includes a nice figure that shows that, using standard modelling approaches, the correlation between (same-run) patterns can be artificially elevated for lags as long as ~120 seconds (and can even be artificially reduced after that; Figure 5 from that paper) between events. This would affect many of the comparisons in the present paper. The cleanest way to proceed is to simply drop the within-run comparisons, which I believe the authors can do and yet they have not. Relatedly, in the response letter the authors say they are focusing mainly on the change over time for reinstatement at both levels including the gist-type reinstatement; however, this is not how it is discussed in the paper. They in fact are mainly relying on differences from zero, as children show some "above baseline" reinstatement while adults do not, but I believe there were no significant differences over time (i.e., the findings the authors said they would lean on primarily, as they are arguably the most comparable).  

      We thank the reviewer for this important comment regarding the potential inflation of similarity values due to within-run comparisons.

To address the reviewer’s concern, we conducted an additional cross-run analysis for all correctly retrieved trials. This approach restricted comparisons to non-overlapping run pairs (run1-run2, run2-run3, run1-run3). This analysis revealed robust gist-like reinstatement in children for remote Day 14 memories in the mPFC (p = .035) and vlPFC (p = .0007), in adults’ vlPFC for remote Day 1 memories (p = .029), as well as in both children’s and adults’ remote Day 1 memories in the LOC (p < .02). A significant Session effect in both regions (mPFC: p = .026; vlPFC: p = .002) indicated increased reinstatement for the long delay (Day 14) compared to the short-delay and recent sessions (all p < .05). Given that the cross-run results largely replicate and reinforce the effects found previously with the within-run analysis, we believe that combining both sources of information is methodologically justified and statistically beneficial. Specifically, both approaches independently identified significant gist-like reinstatement in children’s mPFC and vlPFC (although the within-run vlPFC effects (short delay: p = .038; long delay: p = .047) did not survive correction for multiple comparisons), particularly for remote memories. Including both within-run and between-run comparisons increases the number of unique, non-repeated trial pairs, improving statistical power without introducing redundancy. While we acknowledge that same-run comparisons may be influenced by residual autocorrelation (as shown by Prince et al., 2022, eLife), we believe that our design mitigates this risk through consistency between within-run and cross-run results, long inter-trial intervals, and trial-wise estimation of activation. We have adjusted the manuscript accordingly, reporting the combined analysis. We also report the cross-run and within-run analyses separately in the supplementary materials (Tables S12.1, S12.2), showing that the within-run results converge with the cross-run results and thus strengthen rather than dilute the findings.
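The cross-run restriction described above can be sketched as follows (a simplified illustration with hypothetical variable names; same-run trial pairs are simply excluded from the within- vs. between-category contrast):

```python
import numpy as np
from itertools import combinations

def cross_run_gist_index(patterns, runs, categories):
    """Within- minus between-category similarity, cross-run pairs only.

    patterns: (n_trials, n_voxels) single-trial estimates
    runs, categories: length-n_trials label sequences
    """
    within, between = [], []
    for i, j in combinations(range(len(runs)), 2):
        if runs[i] == runs[j]:
            continue  # drop same-run pairs to sidestep temporal autocorrelation
        z = np.arctanh(np.corrcoef(patterns[i], patterns[j])[0, 1])
        (within if categories[i] == categories[j] else between).append(z)
    return np.mean(within) - np.mean(between)
```

A positive index indicates that retrieval patterns for items from the same scene category are more similar to one another than to items from other categories, even when the compared trials come from different runs.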

      As suggested, we now explicitly highlight the change over time as the central finding. We observe a clear increase in gist-like reinstatement from recent to remote memories in children, particularly in mPFC and vlPFC. These effects based on combined within- and cross-run comparisons, are now clearly stated in the main results and interpreted in the discussion accordingly. 

      (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. In their response letter and the revised paper, the authors do provide a bit of reasoning as to why this is the most sensible. However, it is still not clear to me whether this is really "reinstatement" which (in my mind) entails the re-evoking of a neural pattern initially engaged during perception. Rather, could this be a shared neural state that is category specific? 

      We thank the reviewer for raising this important conceptual point about whether our findings reflect reinstatement in the classical sense — namely, the reactivation of perceptual neural patterns — or a shared, category-specific state.

While traditional definitions of reinstatement emphasize item-specific reactivation (e.g., Ritchey et al., 2013; Xiao et al., 2017), it is increasingly recognized that memory retrieval can also involve the reactivation of abstracted, generalized, or gist-like representations, especially as memories consolidate. Our analysis follows this view, aiming to capture how memory representations evolve over time, particularly in development.

Several studies support this broader notion of gist-like reinstatement. For instance, Chen et al. (2017) showed that while event-specific patterns were reinstated across the default mode network and medial temporal lobe, inter-subject recall similarity exceeded encoding-retrieval similarity, suggesting transformation and abstraction beyond perceptual reinstatement. Zhuang et al. (2021) further showed that loss of neural distinctiveness in the MTL over time predicted false memories, linking neural similarity to representational instability. This aligns with our finding that greater gist-like reinstatement is associated with lower memory accuracy.

Ye et al. (2020) discuss how memory representations are reshaped post-encoding — becoming more differentiated, integrated, or weakened depending on task goals and neural resources. While their work focuses on adults, our previous findings in the same sample (Schommartz et al., 2023) suggest that children’s neural systems are structurally immature, making them more likely to rely on gist-based consolidation (see Fandakova et al., 2019). Adults, by contrast, may retain more item-specific traces.

      Relatedly, St-Laurent & Buchsbaum (2019) show that with repeated encoding, neural memory representations become increasingly distinct from perception, suggesting that reinstatement need not mimic perception. We agree that reinstatement does not always reflect reactivation of low-level sensory patterns, particularly over long delays or in developing brains.

      Finally, while we did not correlate retrieval patterns directly with perceptual encoding patterns, we assessed neural similarity among retrieved items within vs. between categories, based on non-repeated, independently sampled trials. This approach is intended to capture the structure and delay-related transformation of mnemonic representations, especially in terms of how they become more schematic or gist-like over time. Our findings align conceptually with the results of Kuhl et al. (2012), who used MVPA to show that older and newer visual memories can be simultaneously reactivated during retrieval, with greater reactivation of older memories interfering with retrieval accuracy for newer memories. Their work highlights how overlapping category-level representations in ventral temporal cortex can reflect competition among similar memories, even in the absence of item-specific cues. In our developmental context, we interpret the increased neural similarity among category members in children as possibly reflecting such representational overlap or competition, where generalized traces dominate over item-specific ones. This pattern may reflect a shift toward efficient but less precise retrieval, consistent with developmental constraints on memory specificity and consolidation.

      In this context, we view our findings as evidence of memory trace reorganization — from differentiated, item-level representations toward more schematic, gist-like neural patterns (Sekeres et al., 2018), particularly in children. Our cross-run analyses further confirm that this is not an artifact of same-run correlations or low-level confounds. We have clarified this distinction and interpretation throughout the revised manuscript (see lines 144-158; 1163-1170).

      In any case, I think additional information should be added to the text to clarify that this definition differs from others in the literature. The authors might also consider using some term other than reinstatement. Again (as I noted in my prior review), the finding of no category-level reinstatement in adults is surprising and confusing given prior work and likely has to do with the operationalization of "reinstatement" here. I was not quite sure about the explanation provided in the response letter, as category-level reinstatement is quite widespread in the brain for adults and is robust to differences in analytic procedures etc. 

      We agree that our operationalization of "reinstatement" differs from more conventional uses of the term, which typically involve direct comparisons between encoding and retrieval phases, often with item-level specificity. As our analysis is based on similarity among retrieval-phase trials (fixation-based activation patterns) and focuses on within- versus between-category neural similarity, we agree that the term reinstatement may suggest a stronger encoding–retrieval mapping than we are claiming.

      To avoid confusion and overstatement, we have revised the terminology throughout the manuscript: we now refer to our measure as “gist-like representations” rather than “gist-like reinstatement.” This change better reflects the nature of our analysis — namely, that we are capturing shared neural patterns among category-consistent memories that may reflect reorganized or abstracted traces, especially after delay and in development.

      As the reviewer rightly points out, category-level reinstatement is well documented in adults (e.g., Kuhl & Chun, 2014; Tompary et al., 2020; Tompary & Davachi, 2017). The absence of such effects in our adult group may indeed reflect differences in study design, particularly our use of non-repeated, cross-trial comparisons based on fixation events. It may also reflect different consolidation strategies, with adults preserving more differentiated or item-specific representations, while children form more schematic or generalizable representations — a pattern consistent with our interpretation and supported by prior work (Fandakova et al., 2019; Sekeres et al., 2018) 

We have updated the relevant sections of the manuscript (Results, Discussion (particularly lines 1163-1184), and Figure captions) to clarify this terminology shift and explicitly contrast our approach with more standard definitions of reinstatement. We hope this revision provides the needed conceptual clarity while preserving the integrity of our developmental findings.

(3) Also, from a theoretical standpoint, I'm still a bit confused as to why gist-based reinstatement would involve reinstatement of the scene gist, rather than the object's location (on the screen) gist. Were the locations on the screen similar across scene backgrounds from the same category? It seems like a different way to define memory retrieval here would be to compare the neural patterns when cued to retrieve the same vs. similar (at the "gist" level) vs. different locations across object-scene pairs. This is somewhat related to a point from my review of the initial version of this manuscript, about how scene reinstatement is not necessary. The authors state that participants were instructed to reinstate the scene, but that does not mean they were actually doing it. The point that what is being measured via the reinstatement analyses is actually not necessary to perform the task should be discussed in more detail in the paper.

      We appreciate the reviewer’s thoughtful theoretical question regarding whether our measure of “gist-like representations” might reflect reinstatement of spatial (object-location) gist, rather than scene-level gist. We would like to clarify several key points about our task design and interpretation:

      (1) Object locations were deliberately varied and context dependent.

      In our stimulus set, each object was embedded in a rich scene context, and the locations were distributed across six distinct possible areas within each scene, with three possible object placements per location. These placements were manually selected to ensure realistic and context-sensitive positioning of objects within the scenes. Importantly, locations were not fixed across scenes within a given category. For example, objects placed in “forest” scenes could appear in different screen locations across different scene exemplars (e.g., one in the bottom-left side, another floating above). Therefore, the task did not introduce a consistent spatial schema across exemplars from the same scene category that could give rise to a “location gist.”

      (2) Scene categories provided consistent high-level contextual information.

      By contrast, the scene categories (e.g., farming, forest, indoor, etc.) provided semantically coherent and visually rich contextual backgrounds that participants could draw upon during retrieval. This was emphasized in the instruction phase, where participants were explicitly encouraged to recall the whole scene based on the stories they created during learning (not just the object or its position). While we acknowledge that we cannot directly verify the reinstated content, this instruction aligns with prior studies showing that scene and context reinstatement can occur even without direct task relevance (e.g., Kuhl & Chun, 2014; Ritchey et al., 2013).

      (3) Our results are unlikely to reflect location-based reinstatement.

      If participants had relied on a “location gist” strategy, we would have expected greater neural similarity across scenes with similar spatial layouts, regardless of category. However, our design avoids this confound by deliberately varying locations across exemplars within categories. Additionally, our categorical neural similarity measure contrasted within-category vs. between-category comparisons — making it sensitive to shared contextual or semantic structure, not simply shared screen positions.

      Considering this, we believe that the neural similarity observed in the mPFC and vlPFC in children at long delay reflects the emergence of scene-level, gist-like representations, rather than low-level spatial regularities. Nevertheless, we now clarify this point in the manuscript and explicitly discuss the limitation that reinstatement of scene context was encouraged but not required for successful task performance.

      Future studies could dissociate spatial and contextual components of reinstatement more directly by using controlled spatial overlap or explicit location recall conditions. However, given the current task structure, location-based generalization is unlikely to account for the category-level similarity patterns we observe.

(2) Inspired by another reviewer's comment, it is unclear to me the extent to which age group differences can be attributed to differences in age/development versus memory strength. I liked the other reviewer's suggestions about how to identify and control for differences in memory strength, which I don't think the authors actually did in the revision. They instead showed evidence that memory strength does seem to be lower in children, which indicates this is an interpretive confound. For example, I liked the reviewer's suggestion of performing analyses on subsets of participants who were actually matched in initial learning/memory performance, which would have been very informative. As it is, the authors didn't really control for memory strength adequately in my opinion, and as such their conclusions about children vs. adults could have been reframed as people with weak vs. strong memories. This is obviously a big drawback given what the authors want to conclude. Relatedly, I'm not sure the DDM was incorporated as the reviewer was suggesting; at minimum I think the authors need to do more work in the paper to explain what this means and why it is relevant. (I understand putting it in the supplement rather than the main paper, but I still wanted to know more about what it added from an interpretive perspective.)

      We appreciate the reviewer’s thoughtful concerns regarding potential confounding effects of memory strength on the observed age group differences. This is indeed a critical issue when interpreting developmental findings.

      While we agree that memory strength differs between children and adults — and our own DDM-based analysis confirms this, mirroring differences observed in accuracy — we would like to emphasize that these differences are not incidental but rather reflect developmental changes in the underlying memory system. Given the known maturation of both structural and functional memory-related brain regions, particularly the hippocampus and prefrontal cortex, we believe it would be theoretically inappropriate to control for memory strength entirely, as doing so would remove variance that is central to the age-related neural effects we aim to understand.

To address the reviewer's concern empirically, we conducted an additional control analysis in which we subsampled children to include only those who reached the learning criterion after two cycles (N = 28 out of 49 children; see Table S1.1, S1.2, Figure S1, Table S9.1), thereby selecting a high-performing subgroup. Importantly, this subsample replicated the behavioral and neural results of the full group. This further suggests that the observed age group differences are not merely driven by differences in memory strength.

As mentioned above, the results of the DDM support our behavioral findings, showing that children have lower drift rates for evidence accumulation, consistent with weaker or less accessible memory representations. While these results are reported in the Supplementary Materials (section S2.1, Figure S2, Table S2), we agree that their interpretive relevance should be more clearly explained in the main text. We have therefore updated the Discussion section to explicitly state how the DDM results provide converging evidence for our interpretation that developmental differences in memory quality — not merely strategy or task performance — underlie the observed neural differences (see lines 904-926).
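To illustrate why drift rate matters interpretively: in a drift-diffusion model, lowering the drift rate alone produces both lower accuracy and slower responses. A toy simulation (illustrative only, not the hierarchical DDM fit used in our analysis):

```python
import numpy as np

def simulate_ddm(drift, boundary=1.0, dt=0.005, n_trials=500, seed=0):
    """Toy drift-diffusion simulation.

    Evidence starts at 0 and accumulates with the given drift plus noise
    until it hits +boundary (correct) or -boundary (error).
    Returns (accuracy, mean RT in seconds).
    """
    rng = np.random.default_rng(seed)
    n_correct, rts = 0, []
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < boundary:
            x += drift * dt + np.sqrt(dt) * rng.standard_normal()
            t += dt
        n_correct += x >= boundary
        rts.append(t)
    return n_correct / n_trials, float(np.mean(rts))
```

Comparing a higher drift rate (adult-like) with a lower one (child-like) under identical boundary and noise settings reproduces the qualitative pattern of weaker evidence accumulation yielding poorer memory performance.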

      In sum, we view memory strength not as a confound to be removed, but as a meaningful and theoretically relevant factor in understanding the emergence of gist-like representations in children. We have clarified this interpretive stance in the revised manuscript and now discuss the role of memory strength more explicitly in the Discussion.

      (3) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). Precuneus also interestingly seems to show numerically recent>remote (values mostly negative), whereas most other regions show the opposite. This difference from zero (in either direction) or lack thereof seems important to the message. In response to this comment on the original manuscript, the authors seem to have confirmed that hippocampal activity was greater during retrieval than implicit baseline. But this was not really my question - I was asking whether hippocampus is (and other ROIs in this same figure are) differently engaged for recent vs. remote memories.

We thank the reviewer for bringing up this important point. Our previous analysis showed that both anterior and posterior regions of the hippocampus, the anterior parahippocampal gyrus, and the precuneus exhibited activation significantly above zero in children and adults for correctly remembered items (see Fig. S2, Table S7 in Supplementary Materials). Based on the reviewer's suggestion, our additional analyses showed the following:

      (i) The linear mixed-effects model for correctly remembered items showed no significant interaction effects (group x session x memory age (recent, remote)) for the anterior hippocampus (all p > .146; see Table S7.1).

(ii) For the posterior hippocampus, we observed a significant main effect of group (F(1,85) = 5.62, p = .038), showing significantly lower activation in children compared to adults (b = .03, t = -2.34, p = .021). No other main or interaction effects were significant (all p > .08; see Table S7.1).

(iii) For the anterior PHG, which also showed no significant remote > recent difference, the model confirmed that there was indeed no difference between remote and recent items across age groups and delays (all p > .194; Table S7.1).

      Moreover, when comparing recent and remote hippocampal activation directly, there were no significant differences in either group (all FDR-adjusted p > .116; Table S7.2), supporting the conclusion that hippocampal involvement was stable across delays for successfully retrieved items. 

      In contrast, analysis of unsuccessfully remembered items showed that hippocampal activation was not significantly different from zero in either group (all FDR-adjusted p > .052; Fig. S2.1, Table S7.1), indicating that hippocampal engagement was specific to successful memory retrieval.

      To formally test whether hippocampal activation differs between remembered and forgotten items, we ran a linear mixed-effects model with Group, Memory Success (remembered vs. forgotten), and ROI (anterior vs. posterior hippocampus) as fixed effects. This model revealed a robust main effect of memory success (F(1,1198) = 128.27, p < .001), showing that hippocampal activity was significantly higher for remembered compared to forgotten items (b = .06, t(1207) = 11.29, p < .001; Table S7.3). 

      As the reviewer noted, precuneus activation was numerically higher for recent vs. remote items, and this was confirmed in our analysis. While both recent and remote retrieval elicited significantly above-zero activation in the precuneus (Table S7.2), activation for recent items was significantly higher than for remote items, consistent across both age groups.

Taken together, these analyses support the conclusion that hippocampal involvement in successful retrieval is sustained across delays, while other ROIs such as the precuneus may show greater engagement for more recent memories. We have now updated the manuscript text (lines 370-390) and supplementary materials to reflect these findings more clearly, as well as to clarify the distinction between activation relative to baseline and memory-age-related modulation.

      (4) Related to point 3, the claims about hippocampus with respect to multiple trace theory feel very unsupported by the data. I believe the authors want to conclude that children's memory retrieval shows reliance on hippocampus irrespective of delay, presumably because this is a detailed memory task. However the authors have not really shown this; all they have shown is that hippocampal involvement (whatever it is) does not vary by delay. But we do not have compelling evidence that the hippocampus is involved in this task at all. That hippocampus is more active during retrieval than implicit baseline is a very low bar and does not necessarily indicate a role in memory retrieval. If the authors want to make this claim, more data are needed (e.g., showing that hippocampal activity during retrieval is higher when the upcoming memory retrieval is successful vs. unsuccessful). In the absence of this, I think all the claims about multiple trace theory supporting retrieval similarly across delays and that this is operational in children are inappropriate and should be removed. 

We thank the reviewer for pointing this out. We agree that additional analysis of hippocampal activity during successful and unsuccessful memory retrieval is warranted. This will provide stronger support for our claim that strong, detailed memories during retrieval rely on the hippocampus in both children and adults. Our previously presented results on the remote > recent univariate signal difference in the hippocampus (p. 14-18; lines 376-433, Fig. 3A) show that this difference does not vary between children and adults, or between Day 1 and Day 14. Our further analysis showed that both anterior and posterior regions of the hippocampus exhibited activation significantly above zero in children and adults for correctly remembered items (see Fig. S2, Table S7 in Supplementary Materials). Based on your suggestion, our recent additional analysis showed:

      (i) For forgotten items, we did not observe any activation significantly higher than zero in either the anterior or posterior hippocampus for recent and remote memory on Day 1 and Day 14 in either age group (all p > .052 FDR corrected; see Table S7.1, Fig. S2.1).

(ii) After establishing no difference between recent and remote activation across and between sessions (Day 1, Day 14), we conducted another linear mixed-effects model with group x memory success (remembered, forgotten) x region (anterior hippocampus, posterior hippocampus), with subject as a random effect. The model showed no significant effects for the memory success x region interaction (F(1,1198) = 1.12, p = .289) and no significant group x memory success x region interaction (F(1,1198) = .017, p = .895). However, we observed a significant main effect of memory success (F(1,1198) = 128.27, p < .001), indicating significantly higher hippocampal activation for remembered compared to forgotten items (b = .06, t = 11.29, p < .001; see Table S7.3).

(iii) Considering the comparatively low number of incorrect trials for recent items in the adult group, we reran this analysis only for remote items. Similarly, the model showed no significant effects for the memory success x region interaction (F(1,555) = .72, p = .398) and no significant group x memory success x region interaction (F(1,555) = .14, p = .705). However, we observed a significant main effect of memory success (F(1,555) = 68.03, p < .001), indicating significantly higher hippocampal activation for remote remembered compared to forgotten items (b = .07, t = 8.20, p < .001; see Table S7.3).
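For illustration, the structure of such a group x memory success x region mixed-effects model can be sketched in Python with statsmodels. This is a minimal sketch on simulated data only; the subject numbers, trial counts, effect sizes, and variable names are hypothetical placeholders, not our empirical beta estimates:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulated trial-wise beta estimates: 40 subjects (20 children, 20 adults),
# two hippocampal ROIs, remembered vs. forgotten items, 8 trials per cell.
rows = []
for sub in range(40):
    group = "child" if sub < 20 else "adult"
    sub_offset = rng.normal(0, 0.02)          # subject-specific random intercept
    for roi in ("ant_hc", "post_hc"):
        for success in ("remembered", "forgotten"):
            for _ in range(8):
                beta = (sub_offset
                        + (0.06 if success == "remembered" else 0.0)  # success effect
                        + rng.normal(0, 0.05))                        # trial noise
                rows.append(dict(subject=sub, group=group, roi=roi,
                                 success=success, beta=beta))
df = pd.DataFrame(rows)

# Linear mixed-effects model: group x memory success x ROI fixed effects,
# subject as a random intercept (analogous to the model described above).
model = smf.mixedlm("beta ~ group * success * roi", df, groups="subject").fit()
print(model.params["success[T.remembered]"])  # estimated memory-success effect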

Taken together, our results indicate that significant hippocampal activation was observed only for correctly remembered items in both children and adults, regardless of memory age and session. For forgotten items, we did not observe any significant hippocampal activation in either group or delay. Moreover, hippocampal activation was significantly higher for remembered compared to forgotten memories. This evidence supports our conclusions regarding the Multiple Trace and Trace Transformation Theories, suggesting that the hippocampus supports retrieval similarly across delays, and provides novel evidence that this process is operational in both children and adults. This also aligns with Contextual Binding Theory, as well as empirical evidence by Sekeres, Winocur, & Moscovitch (2018), among others. We have added this information to the manuscript.

      (5) There are still not enough methodological details in the main paper to make sense of the results. Some of these problems were addressed in the revision but others remain. For example, a couple of things that were unclear: that initially learned locations were split, where half were tested again at day 1 and the other half at day 14; what specific criterion was used to determine to pick the 'well-learned' associations that were used for comparisons at different delay periods (object-scene pairs that participants remembered accurately in the last repetition of learning? Or across all of learning?). 

We thank the reviewer for pointing this out. The initially learned object-scene associations on Day 0 were split into two halves based on their categories before testing. Specifically, half of the pairs from the first set and half of the pairs from the second set of 30 object-scene associations were used to create the set of 30 remote pairs for Day 1 testing. The same procedure was applied to the remaining pairs to create a set of remote object-scene associations for Day 14 retrieval. We aimed to distribute the categories of pairs equally between the testing sets. We added this information to the methods section of the manuscript (see p. 47, lines 1237-1243). In addition, the sets of associations for the delay tests on Day 1 and Day 14 were not based on their learning accuracy. Of note, an analysis of variance revealed no difference in learning accuracy between the two sets created for the delay tests in either age group (children: p = .23; adults: p = .06). These results indicate that the sets comprised items learned with comparable accuracy in both age groups.

(6) I still find the revised Introduction a bit unclear. I appreciated the added descriptions of different theories of consolidation, though the order of presented points is still a bit hard to follow. Some of the predictions I also find a bit confusing as laid out in the introduction. (1) As noted in the paper, multiple trace theory predicts that hippocampal involvement will remain high provided memories retained are sufficiently high detail. The authors however also predict that children will rely more on gist (than detailed) memories than adults, which would seem to imply (combined with the MTT idea) that they should show reduced hippocampal involvement over time (while in adults, it should remain high). However, the authors' actual prediction is that hippocampus will show stable involvement over time in both kids and adults. I'm having a hard time reconciling these points. (2) With respect to the extraction of gist in children, I was confused by the link to Fuzzy Trace Theory given the children in the present study are a bit young to be showing the kind of gist extraction shown in the Brainerd & Reyna data. Would 5-7 year olds not be more likely to show reliance on verbatim traces under that framework? Also from a phrasing perspective, I was confused about whether gist-like information was something different from just gist in this sentence: "children may be more inclined to extract gist information at the expense of detailed or gist-like information." (p. 8) - is this a typo?

      We thank the reviewer for this thoughtful observation. 

      Our hypothesis of stable hippocampal engagement over time was primarily based on Contextual Binding Theory (Yonelinas et al., 2019), and the MTT, supported by the evidence provided by Sekeres et al., 2018, which posits that the hippocampus continues to support retrieval when contextual information is preserved, even for older, consolidated memories. Given that our object-location associations were repeatedly encoded and tied to specific scene contexts, we believe that retrieval success for both recent and remote memories likely involved contextual reinstatement, leading to sustained hippocampal activity. Also in accordance with the MTT and related TTT, different memory representations may coexist, including detailed and gist-like memories. Therefore, we suggest that children may not rely on highly detailed item-specific memory, but rather on sufficiently contextualized schematic traces, which still engage the hippocampus. This distinction is now made clearer in the Introduction (see lines 223-236).

We appreciate the reviewer’s point regarding Fuzzy Trace Theory (Brainerd & Reyna, 2002). Indeed, in classic FTT, young children are thought to rely more on verbatim traces due to immature gist extraction mechanisms (primarily from verbal material). However, we use the term “gist-like representations” to refer to schematic or category-level retrieval that emerges through structured, repeated learning (as in our task). This form of abstraction may not require full semantic gist extraction in the FTT sense but may instead reflect consolidation-driven convergence onto shared category-level representations — especially when strategic resources are limited. We now clarify this distinction and revise the ambiguous sentence containing the typo (“at the expense of detailed or gist-like information”) to better reflect our intended meaning (see p. 8).

      (7) For the PLSC, if I understand this correctly, the profiles were defined for showing associations with behaviour across age groups. (1) As such, is it not "double dipping" to then show that there is an association between brain profile and behaviour-must this not be true by definition? If I am mistaken, it might be helpful to clarify this in the paper. (2) In addition, I believe for the univariate and scene-specific reinstatement analyses these profiles were defined across both age groups. I assume this doesn't allow for separate definition of profiles across the two group (i.e., a kind of "interaction"). If this is the case, it makes sense that there would not be big age differences... the profiles were defined for showing an association across all subjects. If the authors wanted to identify distinct profiles in children and adults they may need to run another analysis. 

      We thank the reviewer for this thoughtful comment. 

      (1) We agree that showing the correlation between the latent variable and behavior may be redundant, as the relationship is already embedded in the PLSC solution and quantified by the explained variance. Our intention was merely to visualize the strength of this relationship. In hindsight, we agree that this could be misinterpreted, and we have removed the additional correlation figure from the manuscript.

We also see the reviewer’s point that, given the shared latent profile across groups, it is expected that the strength of the brain-behavior relationship does not differ between age groups. Instead, to investigate group differences more appropriately, we examined whether children and adults differed in their expression of the shared latent variable (i.e., brain scores). This analysis revealed that children showed significantly lower brain scores than adults at both the short delay, t(83) = -4.227, p = .0001, and the long delay, t(74) = -5.653, p < .001, suggesting that while the brain-behavior profile is shared, its expression varies by group. We have added this clarification to the Results section (p. 19-20) of the revised manuscript.

      (2) Regarding the second point, we agree with the reviewer that defining the PLS profiles across both age groups inherently limits the ability to detect group-specific association, as the resulting latent variables represent shared pattern across the full sample. To address this, we conducted additional PLS analyses separately within each age group to examine whether distinct neural upregulation profiles (remote > recent) emerge for short and long delay conditions.

      These within-group analyses, however, were based on smaller subsamples, which reduced statistical power, especially when using bootstrapping to assess the stability of the profiles. For the short delay, although some regions reached significance, the overall latent variables did not reach conventional thresholds for stability (all p > .069), indicating that the profiles were not robust. This suggests that within-group PLS analyses may be underpowered to detect subtle effects, particularly when modelling neural upregulation (remote > recent), which may be inherently small.

Nonetheless, when we applied PLSC in an exploratory manner separately within each group, using recent and remote activity levels against the implicit baseline (rather than the remote > recent contrast) and their relation to memory performance, we observed significant and stable latent variables in both children and adults. This implies that such contrasts (vs. baseline) may be more sensitive and better suited to detect meaningful brain–behavior relationships within age groups. We have added this clarification to the Results sections of the manuscript to highlight the limitations of within-group contrasts for neural upregulation.

      Author response image 1.
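As an illustration of the analysis logic, the core PLSC step (singular value decomposition of the brain-behavior cross-correlation, followed by per-subject brain scores) can be sketched as follows. The data are simulated and the subject and ROI counts are arbitrary placeholders, not our actual design:

```python
import numpy as np

rng = np.random.default_rng(2)
n_sub, n_roi = 80, 12

# Hypothetical remote > recent activation contrasts (subjects x ROIs)
X = rng.normal(size=(n_sub, n_roi))
# Hypothetical memory score, loading on the first two simulated ROIs
y = 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(size=n_sub)

# z-score, then decompose the brain-behavior cross-correlation via SVD
Xz = (X - X.mean(0)) / X.std(0)
yz = (y - y.mean()) / y.std()
R = (Xz.T @ yz / (n_sub - 1)).reshape(-1, 1)   # ROI-wise correlation with behavior
U, s, Vt = np.linalg.svd(R, full_matrices=False)
saliences = U[:, 0] * np.sign(Vt[0, 0])        # ROI weights, sign-aligned
brain_scores = Xz @ saliences                  # per-subject expression of the profile

# Group comparisons (e.g., children vs. adults) can then be run on brain_scores.
print(np.corrcoef(brain_scores, yz)[0, 1])
```

In the full pipeline, the stability of the saliences is assessed with bootstrapping and the significance of the latent variable with permutation tests, as described in the text.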

      (3) Also, as for differences between short delay brain profile and long delay brain profile for the scene-specific reinstatement - there are 2 regions that become significant at long delay that were not significant at a short delay (PC, and CE). However, given there are ceiling effects in behaviour at the short but not long delay, it's unclear if this is a meaningful difference or just a difference in sensitivity. Is there a way to test whether the profiles are statistically different from one another?

We thank the reviewer for this comment. To better illustrate differential profiles even when memory accuracy is high after an immediate (30-minute) delay, we added the immediate condition as a third reference point, given the availability of scene-specific reinstatement data at this time point. Interestingly, the immediate reinstatement profile revealed a different set of significant regions, with distinct expression patterns compared to both the short and long delay conditions. This supports the view that scene-specific reinstatement is not static but dynamically reorganized over time.

      Regarding the ceiling effect at short delay, we acknowledge this as a potential limitation. However, we note that our primary analyses were conducted across both age groups combined, and not solely within high-performing individuals. As such, the grouping may mitigate concerns that ceiling-level performance in a subset of participants unduly influenced the overall reinstatement profile. Moreover, we observed variation in neural reinstatement despite ceiling-level behavior, suggesting that the neural signal retains sensitivity to consolidation-related processes even when behavioral accuracy is near-perfect.

      While we agree that formal statistical comparisons of reinstatement profiles across delays (e.g., using representational profile similarity or interaction tests) could be an informative direction, we feel that this goes beyond the scope of the current manuscript. 

      (4) As I mentioned above, it also was not ideal in my opinion that all regions were included for the scene-specific reinstatement due to the authors' inability to have an appropriate baseline and therefore define above-chance reinstatement. It makes these findings really challenging to compare with the gist reinstatement ones. 

We appreciate the reviewer’s comment and agree that the lack of a clearly defined baseline for scene-specific reinstatement limits our ability to determine whether these values reflect above-chance reinstatement. However, we would like to clarify that we do not directly compare the magnitude of scene-specific reinstatement to that of gist-like reinstatement in our analyses or interpretations. These two analyses serve complementary purposes: the scene-specific analysis captures trial-unique similarity (within-item reinstatement), while the gist-like analysis captures category-level representational structure (across items). Because they differ not only in baseline assumptions but also in analytical scope and theoretical interpretation, our goal was not to compare them directly, but rather to explore distinct but co-existing representational formats that may evolve differently across development and delay.

      (8) I would encourage the authors to be specific about whether they are measuring/talking about memory representations versus reinstatement, unless they think these are the same thing (in which case some explanation as to why would be helpful). For example, especially under the Fuzzy Trace framework, couldn't someone maintain both verbatim and gist traces of a memory yet rely more on one when making a memory decision? 

      We thank the reviewer for pointing out the importance of conceptual clarity when referring to memory representations versus reinstatement. We agree that these are distinct but related concepts: in our framework, memory representations refer to the neural content stored as a result of encoding and consolidation, whereas reinstatement refers to the reactivation of those representations during retrieval. Thus, reinstatement serves as a proxy for the underlying memory representation — it is how we measure or infer the nature (e.g., specificity, abstraction) of the stored content.

      Under Fuzzy Trace Theory, it is indeed possible for both verbatim and gist representations to coexist. Our interpretation is not that children lack verbatim traces, but rather that they are more likely to rely on schematic or gist-like representations during retrieval, especially after a delay. Our use of neural pattern similarity (reinstatement) reflects which type of representation is being accessed, not necessarily which traces exist in parallel.

      To avoid ambiguity, we have revised the manuscript to more explicitly distinguish between reinstatement (neural reactivation) and the representational format (verbatim vs. gist-like), especially in the framing of our hypotheses and interpretation of age group differences.

      (9) With respect to the learning criteria - it is misleading to say that "children needed between two to four learning-retrieval cycles to reach the criterion of 83% correct responses" (p. 9). Four was the maximum, and looking at the Figure 1C data it appears as though there were at least a few children who did not meet the 83% minimum. I believe they were included in the analysis anyway? Please clarify. Was there any minimum imposed for inclusion?

We thank the reviewer for pointing this out. As stated in the Methods section (p. 50, lines 1326-1338): “These cycles ranged from a minimum of two to a maximum of four. <…> The cycles ended when participants provided correct responses to 83% of the trials or after the fourth cycle was reached.” We have corrected the corresponding wording in the Results section (lines 286-289) to reflect this more accurately. Indeed, five children did not reach the 83% criterion but achieved final performance between 70 and 80% after the fourth learning cycle. These participants were included in this analysis for two main reasons:

(1) The 83% threshold was established during piloting as a guideline for how many learning-retrieval cycles to allow, not a strict learning criterion. It served to standardize task continuation, rather than to exclude participants post hoc.

(2) The performance of these five children was still well above chance level (33%), indicating meaningful learning. Excluding them would have biased the sample toward higher-performing children and reduced the ecological validity of our findings. Including them ensures a more representative view of children’s performance under extended learning conditions.

(10) For the gist-like reinstatement PLSC analysis, results are really similar at short and long delays and yet some of the text seems to imply specificity to the long delay. One is a trend and one is significant (p. 31), but surely these two associations would not be statistically different from one another?

      We agree with the reviewer that the associations at short and long delays appeared similar. While a formal comparison (e.g., using a Z-test for dependent correlations) would typically be warranted, in the reanalyzed dataset only the long delay profile remains statistically significant, which limits the interpretability of such a comparison. 

      (11) As a general comment, I had a hard time tying all of the (many) results together. For example adults show more mature neocortical consolidation-related engagement, which the authors say is going to create more durable detailed memories, but under multiple trace theory we would generally think of neocortical representations as providing more schematic information. If the authors could try to make more connections across the different neural analyses, as well as tie the neural findings in more closely with the behaviour & back to the theoretical frameworks, that would be really helpful.  

      We thank the reviewer for this valuable suggestion. We have revised the discussion section to more clearly link the behavioral and neural findings and to interpret them in light of existing consolidation theories for better clarity. 

      Reviewer #2 (Public Review): 

      Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval of middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children. 

      We thank the reviewer for the positive evaluation.

      Comments on the revised version: 

      I carefully reviewed not only the responses to my own reviews as well as those raised by the other reviewers. While they addressed some of the concerns raised in the process, I think many substantive concerns remain. 

      Regarding Reviewer 1: 

The authors point out that the retrieval procedure is the same over time and similarly influenced by temporal autocorrelations, which makes their analysis okay. However, there is a fundamental problem as to whether they are actually measuring reinstatement or they are only measuring differences in temporal autocorrelation (or some non-linear combination of both). The authors further argue that the stimuli are being processed more memory-wise rather than perception-wise; however, I think there is no evidence for that and that perception-memory processes should be considered on a continuum rather than as discrete processes. Thus, I agree with reviewer 1 that these analyses should be removed.

      We thank the reviewer for raising this important question. We would like to clarify a few key points regarding temporal autocorrelation and reinstatement.

      During the fixation window, participants were instructed to reinstate the scene and location associated with the cued object from memory. This task was familiar to them, as they had been trained in retrieving locations within scenes. Our analysis aims to compare the neural representations during this retrieval phase with those when participants view the scene, in order to assess how these representations change in similarity over time, as memories become less precise.

We acknowledge that temporal proximity can lead to temporal autocorrelation. However, evidence suggests that temporal autocorrelation is consistent and stable across conditions (Gautama & Van Hulle, 2004; Woolrich et al., 2004). Shinn & Lagalwar (2021) further demonstrated that temporal autocorrelation is highly reliable at both the subject and regional levels. Given that we analyze regions of interest (ROIs) separately, potential spatial variability in temporal autocorrelation is not a major concern.

The absence of a difference in item-specific reinstatement between recent items on Day 1 and Day 14 (which were merged for the further delay-related comparison) also suggests that the reinstatement measure was stable for recent items, even though they were sampled on two different testing days.

      Importantly, we interpret the relative change in the reinstatement index rather than its absolute value.

      In addition, when we conducted the same analysis for incorrectly retrieved memories, we did not observe any delay-related decline in reinstatement (see p. 25, lines 623-627). This suggests that the delay-related changes in reinstatement are specific to correctly retrieved memories. 

      Finally, our control analysis examining reinstatement between object and fixation time points (as suggested by Reviewer 1) revealed no delay-related effects in any ROI (see p.24, lines 605-612), further highlighting the specificity of the observed delay-related change in item reinstatement.

      We emphasize that temporal autocorrelation should be similar across all retrieval delays due to the identical task design and structure. Therefore, any observed decrease in reinstatement with increasing delay likely reflects a genuine change in the reinstatement index, rather than differences in temporal autocorrelation. Since our analysis includes only correctly retrieved items, and there is no perceptual input during the fixation window, this process is inherently memory-based, relying on mnemonic retrieval rather than sensory processing.

We respectfully disagree with the reviewer's assertion that retrieval during the fixation period cannot be considered more memory-driven than perception-driven. At this time point, participants had no access to actual images of the scene, making it necessary for them to rely on mnemonic retrieval. The object cue likely triggered pattern completion for the learned object-scene association, forming a unique memory if remembered correctly (Horner & Burgess, 2013). This process is inherently mnemonic, as it is based on reconstructing the original neural representation of the scene (Kuhl et al., 2012; Staresina et al., 2013).

While perception and memory processes can indeed be viewed as a continuum, some cognitive processes are predominantly memory-based, involving reconstruction rather than reproduction of previous experiences (Bartlett, 1932; Ranganath & Ritchey, 2012). In our task, although the retrieved material is based on previously encoded visual information, the process of recalling this information during the fixation period is fundamentally mnemonic, as it does not involve visual input. Our findings indicate that the similarity between memory-based representations and those observed during actual perception decreases over time, suggesting a relative change in the quality of the representations. However, this does not imply that detailed representations disappear; they may still be robust enough to support correct memory recall. Previous studies examining encoding-retrieval similarity have shown similar findings (Pacheco Estefan et al., 2019; Ritchey et al., 2013).

      We do not claim that perception and memory processes are entirely discrete, nor do we suggest that only perception is involved when participants see the scene. Viewing the scene indeed involves recognition processes, updating retrieved representations from the fixation period, and potentially completing missing or unclear information. This integrative process demonstrates the interrelation of perception and memory, especially in complex tasks like the one we employed.

      In conclusion, our task design and analysis support the interpretation that the fixation period is primarily characterized by mnemonic retrieval, facilitated by cue-triggered pattern completion, rather than perceptual processing. We believe this approach aligns with the current understanding of memory retrieval processes as supported by the existing literature.

      The authors seem to have a design that would allow for across run comparisons, however, they did not include these additional analyses. 

      Thank you for pointing this out. We ran an additional cross-run comparison. These results and our further proceeding are reported in the comment for Reviewer 1. 

      To address the reviewer’s concern, we conducted an additional cross-run analysis for all correctly retrieved trials. This approach restricted comparisons to non-overlapping runs (run1–run2, run2–run3, run1–run3). The analysis revealed robust gist-like reinstatement in children for remote Day 14 memories in the mPFC (p = .035) and vlPFC (p = .0007), in adults’ vlPFC for remote Day 1 memories (p = .029), as well as in children’s and adults’ remote Day 1 memories in the LOC (p < .02). A significant Session effect in both regions (mPFC: p = .026; vlPFC: p = .002) indicated increased reinstatement for the long delay (Day 14) compared to the short-delay and recent sessions (all p < .05). Given that the cross-run results largely replicate and reinforce the effects found previously with the within-run analysis, we believe that combining both sources of information is methodologically justified and statistically beneficial. Specifically, both approaches independently identified significant gist-like reinstatement in children’s mPFC and vlPFC, particularly for remote memories (although the within-run vlPFC effect (short delay: p = .038; long delay: p = .047) did not survive correction for multiple comparisons). Including both within-run and between-run comparisons increases the number of unique, non-repeated trial pairs, improving statistical power without introducing redundancy. While we acknowledge that same-run comparisons may be influenced by residual autocorrelation (Prince et al., 2022), we believe that our design mitigates this risk through the consistency between within-run and cross-run results, long inter-trial intervals, and trial-wise estimation of activation. We have adjusted the manuscript accordingly, reporting the combined analysis. We also report the cross-run and within-run analyses separately in the supplementary materials (Tables S12.1, S12.2), showing that the within-run results converge with the cross-run results and thus strengthen rather than dilute the findings. 
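      For illustration only, the restriction to non-overlapping run pairs can be sketched as follows. This is a minimal example with hypothetical trial-by-voxel activation patterns and run labels, not our actual pipeline code; a reinstatement index is taken here as the mean Pearson correlation between trial-wise patterns, computed only across pairs from different runs so that within-run temporal autocorrelation cannot inflate the estimate:

```python
import numpy as np
from itertools import combinations

def reinstatement_index(patterns, runs):
    """Mean Pearson correlation between trial-wise activation patterns,
    restricted to trial pairs drawn from different runs (e.g. run1-run2,
    run2-run3, run1-run3)."""
    patterns = np.asarray(patterns, dtype=float)   # shape: (n_trials, n_voxels)
    runs = np.asarray(runs)
    # z-score each trial pattern across voxels so that the dot product of
    # two patterns divided by n_voxels equals their Pearson correlation
    z = patterns - patterns.mean(axis=1, keepdims=True)
    z /= z.std(axis=1, keepdims=True)
    n_vox = patterns.shape[1]
    corrs = [np.dot(z[i], z[j]) / n_vox
             for i, j in combinations(range(len(patterns)), 2)
             if runs[i] != runs[j]]                # cross-run pairs only
    return float(np.mean(corrs))
```

In the same spirit, a within-run index would keep only pairs with `runs[i] == runs[j]`; combining both simply pools the two sets of unique trial pairs.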

      As suggested, we now explicitly highlight the change over time as the central finding. We observe a clear increase in gist-like reinstatement from recent to remote memories in children, particularly in the mPFC and vlPFC. These effects, based on combined within- and cross-run comparisons, are now clearly stated in the main results and interpreted in the discussion accordingly. 

      (1) The authors did not satisfy my concerns about different amounts of re-exposures to stimuli as a function of age, which introduces a serious confound in the interpretation of the neural data. 

      (2) Regarding Reviewer 1's point about different number of trials being entered into analysis, I think a more formal test of sub-sampling the adult trials is warranted. 

      (1) We thank the reviewer for pointing this out. Overall, children needed 2 to 4 learning cycles to improve their performance and reach the learning criteria, compared to 2 learning cycles in adults. To address the different amounts of re-exposure to stimuli between the age groups, we subsampled the child group to only those children who reached the learning criteria after 2 learning cycles. For this purpose, we excluded 21 children from the analysis who needed 3 or 4 learning cycles. This resulted in 39 young adults and 28 children being included in the subsequent analysis. 

      (i) We reran the behavioral analysis with the subsampled dataset (see Supplementary Materials, Table S1.1, Fig. S1, Table S1.2). This analysis replicated the previous findings of less robust memory consolidation in children across all time delays. 

      (ii) We reran the univariate analysis (see in Supplementary Materials, Table S9.1). This analysis also replicated fully the previous findings. This indicates that the inclusion of child participants with greater material exposure during learning in the analysis of neural retrieval patterns did not affect the group differences in univariate neural results. 

      These subsampled results demonstrated that the amount of re-exposure to stimuli during encoding does not affect consolidation-related changes in memory retrieval at the behavioral and neural levels in children and adults across all time delays. We have added this information to the manuscript (line 343-348, 420-425). 

      (2) We appreciate Reviewer 1's suggestion to perform a formal test by sub-sampling the adult trials to match the number of trials in the child group. However, we believe that this approach may not be optimal for the following reasons:

      (i) Loss of Statistical Power: Sub-sampling the adult trials would result in a reduced sample size, potentially leading to a significant loss of statistical power and the ability to detect meaningful effects, particularly in a context where the adult group is intended to serve as a robust control or comparison group.

      (ii) Increased Variability: Introducing sub-sampling could introduce variability that complicates the interpretation of results, particularly if the trial sub-sampling process does not fully capture the variability inherent in the original adult data.

      (iii) Robustness of Existing Findings: We have already addressed potential concerns about unequal trial numbers by conducting analyses that control for the number of learning cycles, as detailed in our supplementary materials. These analyses have shown that the observed effects are consistent, suggesting that the differences in trial numbers do not critically influence our findings.

      Given these considerations, we hope the reviewer understands our rationale and agrees that the current analysis is robust and appropriate for addressing the research questions.

      I also still fundamentally disagree with the use of global signals when comparing children to adults, and think this could very much skew the results. 

      We thank the reviewer for raising this important issue. To address this concern comprehensively, we have taken the following steps:

      (1) Overview of the literature support for global signal regression (GSR). A growing body of methodological and empirical research supports the inclusion of global signal regression as part of best-practice denoising pipelines, particularly when analyzing pediatric fMRI data. Studies such as Ciric et al. (2017), Parkes et al. (2018), J. D. Power et al. (2012, 2014), S. D. Power et al. (2012), and Thompson et al. (2016) show that GSR improves motion-related artifact removal. Critically, pediatric-specific studies (Disselhoff et al., 2025; Graff et al., 2022) conclude that pipelines including GSR are most effective for signal recovery and artifact removal in younger children. Graff et al. (2022) demonstrated that among various pipelines, GSR yielded the best noise reduction in 4–8-year-olds. Additionally, Li et al. (2019) and Qing et al. (2015) emphasized that GSR reduces artifactual variance without distorting the spatial structure of neural signals. Ofoghi et al. (2021) demonstrated that global signal regression helps mitigate non-neuronal noise sources, including respiration, cardiac activity, motion, vasodilation, and scanner-related artifacts. Based on these and other recent findings, we consider GSR particularly beneficial for denoising pediatric fMRI data in our study.

      (2) Empirical comparison of pipelines with and without GSR. We re-ran the entire first-level univariate analysis using a pipeline that excluded global signal regression. The resulting activation maps (see Supplementary Figures S3.2, S4.2, S5.2, S9.2) differed notably from the original pipeline. Specifically, group differences in cortical regions such as the mPFC, cerebellum, and posterior PHG no longer reached significance, and the overall pattern of results appeared noisier. 

      (3) Evaluation of the pipeline differences. To further evaluate the impact of GSR, we conducted the following analyses:

      (a) Global signal is stable across groups and sessions. A linear mixed-effects model showed no significant main effects or interactions involving group or session on the global signal (F-values < 2.62, p > .11), suggesting that the global signal was not group- or session-dependent in our sample. 

      (b) Noise Reduction Assessment via Contrast Variability. We compared the variability (standard deviation and IQR) of contrast estimates across pipelines. Both SD (b = .070, p < .001) and IQR (b = .087, p < .001) were significantly reduced in the GSR pipeline, especially in children (p < .001) compared to adults (p = .048). This suggests that GSR reduces inter-subject variability in children, likely reflecting improved signal quality.

      (c) Residual Variability After Regressing Global Signal. We regressed out global signal post hoc from both pipelines and compared the residual variance. Residual standard deviation was significantly lower for the GSR pipeline (F = 199, p < .001), with no interaction with session or group, further indicating that GSR stabilizes the signal and attenuates non-neuronal variability.
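      The post hoc check in (c) can be sketched as follows. This is a simplified, hypothetical illustration (not our actual analysis code): the global signal is regressed out of each voxel time series by ordinary least squares with an intercept, and the standard deviation of the residuals, averaged over voxels, summarizes how much variability remains:

```python
import numpy as np

def residual_sd_after_gsr(voxel_ts, global_ts):
    """OLS-regress the global signal (plus an intercept) out of each voxel
    time series and return the residual standard deviation averaged over
    voxels. Lower values indicate less variance left once the global
    signal is removed."""
    # design matrix: intercept column + global-signal column
    g = np.column_stack([np.ones_like(global_ts), global_ts])
    # fit all voxels at once: beta = argmin ||G @ beta - Y||^2
    beta, *_ = np.linalg.lstsq(g, voxel_ts, rcond=None)
    resid = voxel_ts - g @ beta                     # (n_timepoints, n_voxels)
    return float(resid.std(axis=0).mean())
```

Comparing this quantity between the GSR and non-GSR pipelines (per subject, then across groups and sessions) corresponds to the residual-variability comparison reported above.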

      Conclusion

      In summary, while we understand the reviewer’s concern, we believe the empirical and theoretical support for GSR, especially in pediatric samples, justifies its use in our study. Nonetheless, to ensure full transparency, we provide full results from both pipelines in the Supplementary Materials and have clarified our reasoning in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      (1) Some figures are still missing descriptions of what everything on the graph means; please clarify in captions. 

      We thank the reviewer for pointing this out. We undertook the necessary adjustments in the graph annotations. 

      (2) The authors conclude they showed evidence of neural reorganization of memory representations in children (p. 41). But the gist is not greater in children than adults, and also does not differ over time-so, I was confused about what this claim was based on? 

      We thank the reviewer for raising this question. Our results suggest that gist-like reinstatement was significantly higher in children than in adults in the mPFC, and that children's gist-like reinstatement indices were significantly higher than zero (see pp. 27-28). These results support our claim of neural reorganization of memory representations in children. We hope this clarifies the issue. 

      References

      Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge University Press.

      Brainerd, C. J., & Reyna, V. F. (2002). Fuzzy-Trace Theory: Dual Processes in Memory, Reasoning, and Cognitive Neuroscience (pp. 41–100). https://doi.org/10.1016/S0065-2407(02)80062-3

      Chen, J., Leong, Y. C., Honey, C. J., Yong, C. H., Norman, K. A., & Hasson, U. (2017). Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1), 115–125. https://doi.org/10.1038/nn.4450

      Ciric, R., Wolf, D. H., Power, J. D., Roalf, D. R., Baum, G. L., Ruparel, K., Shinohara, R. T., Elliott, M. A., Eickhoff, S. B., Davatzikos, C., Gur, R. C., Gur, R. E., Bassett, D. S., & Satterthwaite, T. D. (2017). Benchmarking of participant-level confound regression strategies for the control of motion artifact in studies of functional connectivity. NeuroImage, 154, 174–187. https://doi.org/10.1016/j.neuroimage.2017.03.020

      Disselhoff, V., Jakab, A., Latal, B., Schnider, B., Wehrle, F. M., Hagmann, C. F., Held, U., O’Gorman, R. T., Fauchère, J.-C., & Hüppi, P. (2025). Inhibition abilities and functional brain connectivity in school-aged term-born and preterm-born children. Pediatric Research, 97(1), 315–324. https://doi.org/10.1038/s41390-024-03241-0

      Esteban, O., Ciric, R., Finc, K., Blair, R. W., Markiewicz, C. J., Moodie, C. A., Kent, J. D., Goncalves, M., DuPre, E., Gomez, D. E. P., Ye, Z., Salo, T., Valabregue, R., Amlien, I. K., Liem, F., Jacoby, N., Stojić, H., Cieslak, M., Urchs, S., … Gorgolewski, K. J. (2020). Analysis of task-based functional MRI data preprocessed with fMRIPrep. Nature Protocols, 15(7), 2186–2202. https://doi.org/10.1038/s41596-020-0327-3

      Fandakova, Y., Leckey, S., Driver, C. C., Bunge, S. A., & Ghetti, S. (2019). Neural specificity of scene representations is related to memory performance in childhood. NeuroImage, 199, 105–113. https://doi.org/10.1016/j.neuroimage.2019.05.050

      Gautama, T., & Van Hulle, M. M. (2004). Optimal spatial regularisation of autocorrelation estimates in fMRI analysis. NeuroImage, 23(3), 1203–1216.  https://doi.org/10.1016/j.neuroimage.2004.07.048

      Graff, K., Tansey, R., Ip, A., Rohr, C., Dimond, D., Dewey, D., & Bray, S. (2022). Benchmarking common preprocessing strategies in early childhood functional connectivity and intersubject correlation fMRI. Developmental Cognitive Neuroscience, 54, 101087. https://doi.org/10.1016/j.dcn.2022.101087

      Horner, A. J., & Burgess, N. (2013). The associative structure of memory for multi-element events. Journal of Experimental Psychology: General, 142(4), 1370–1383. https://doi.org/10.1037/a0033626

      Jones, J. S., the CALM Team, & Astle, D. E. (2021). A transdiagnostic data-driven study of children’s behaviour and the functional connectome. Developmental Cognitive Neuroscience, 52, 101027. https://doi.org/10.1016/j.dcn.2021.101027

      Kuhl, B. A., Bainbridge, W. A., & Chun, M. M. (2012). Neural Reactivation Reveals Mechanisms for Updating Memory. Journal of Neuroscience, 32(10), 3453–3461. https://doi.org/10.1523/JNEUROSCI.5846-11.2012

      Kuhl, B. A., & Chun, M. M. (2014). Successful Remembering Elicits Event-Specific Activity Patterns in Lateral Parietal Cortex. Journal of Neuroscience, 34(23), 8051–8060. https://doi.org/10.1523/JNEUROSCI.4328-13.2014

      Li, J., Kong, R., Liégeois, R., Orban, C., Tan, Y., Sun, N., Holmes, A. J., Sabuncu, M. R., Ge, T., & Yeo, B. T. T. (2019). Global signal regression strengthens association between resting-state functional connectivity and behavior. NeuroImage, 196, 126–141. https://doi.org/10.1016/j.neuroimage.2019.04.016

      Ofoghi, B., Chenaghlou, M., Mooney, M., Dwyer, D. B., & Bruce, L. (2021). Team technical performance characteristics and their association with match outcome in elite netball. International Journal of Performance Analysis in Sport, 21(5), 700–712. https://doi.org/10.1080/24748668.2021.1938424

      Pacheco Estefan, D., Sánchez-Fibla, M., Duff, A., Principe, A., Rocamora, R., Zhang, H., Axmacher, N., & Verschure, P. F. M. J. (2019). Coordinated representational reinstatement in the human hippocampus and lateral temporal cortex during episodic memory retrieval. Nature Communications, 10(1), 2255. https://doi.org/10.1038/s41467-019-09569-0

      Parkes, L., Fulcher, B., Yücel, M., & Fornito, A. (2018). An evaluation of the efficacy, reliability, and sensitivity of motion correction strategies for resting-state functional MRI. NeuroImage, 171, 415–436. https://doi.org/10.1016/j.neuroimage.2017.12.073

      Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2012). Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage, 59(3), 2142–2154. https://doi.org/10.1016/j.neuroimage.2011.10.018

      Power, J. D., Mitra, A., Laumann, T. O., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2014). Methods to detect, characterize, and remove motion artifact in resting state fMRI. NeuroImage, 84, 320–341. https://doi.org/10.1016/j.neuroimage.2013.08.048

      Power, S. D., Kushki, A., & Chau, T. (2012). Intersession Consistency of Single-Trial Classification of the Prefrontal Response to Mental Arithmetic and the No-Control State by NIRS. PLoS ONE, 7(7), e37791. https://doi.org/10.1371/journal.pone.0037791

      Prince, J. S., Charest, I., Kurzawski, J. W., Pyles, J. A., Tarr, M. J., & Kay, K. N. (2022). Improving the accuracy of single-trial fMRI response estimates using GLMsingle. ELife, 11. https://doi.org/10.7554/eLife.77599

      Qing, Z., Dong, Z., Li, S., Zang, Y., & Liu, D. (2015). Global signal regression has complex effects on regional homogeneity of resting state fMRI signal. Magnetic Resonance Imaging, 33(10), 1306–1313. https://doi.org/10.1016/j.mri.2015.07.011

      Ranganath, C., & Ritchey, M. (2012). Two cortical systems for memory-guided behaviour. Nature Reviews Neuroscience, 13(10), 713–726. https://doi.org/10.1038/nrn3338

      Ritchey, M., Wing, E. A., LaBar, K. S., & Cabeza, R. (2013). Neural Similarity Between Encoding and Retrieval is Related to Memory Via Hippocampal Interactions. Cerebral Cortex, 23(12), 2818–2828. https://doi.org/10.1093/cercor/bhs258

      Satterthwaite, T. D., Elliott, M. A., Gerraty, R. T., Ruparel, K., Loughead, J., Calkins, M. E., Eickhoff, S. B., Hakonarson, H., Gur, R. C., Gur, R. E., & Wolf, D. H. (2013). An improved framework for confound regression and filtering for control of motion artifact in the preprocessing of resting-state functional connectivity data. NeuroImage, 64, 240–256. https://doi.org/10.1016/j.neuroimage.2012.08.052

      Schommartz, I., Lembcke, P. F., Pupillo, F., Schuetz, H., de Chamorro, N. W., Bauer, M., Kaindl, A. M., Buss, C., & Shing, Y. L. (2023). Distinct multivariate structural brain profiles are related to variations in short- and long-delay memory consolidation across children and young adults. Developmental Cognitive Neuroscience, 59. https://doi.org/10.1016/J.DCN.2022.101192

      Sekeres, M. J., Winocur, G., & Moscovitch, M. (2018). The hippocampus and related neocortical structures in memory transformation. Neuroscience Letters, 680, 39–53. https://doi.org/10.1016/j.neulet.2018.05.006

      Shinn, L. J., & Lagalwar, S. (2021). Treating Neurodegenerative Disease with Antioxidants: Efficacy of the Bioactive Phenol Resveratrol and Mitochondrial-Targeted MitoQ and SkQ. Antioxidants, 10(4), 573. https://doi.org/10.3390/antiox10040573

      Staresina, B. P., Alink, A., Kriegeskorte, N., & Henson, R. N. (2013). Awake reactivation predicts memory in humans. Proceedings of the National Academy of Sciences, 110(52), 21159–21164. https://doi.org/10.1073/pnas.1311989110

      St-Laurent, M., & Buchsbaum, B. R. (2019). How Multiple Retrievals Affect Neural Reactivation in Young and Older Adults. The Journals of Gerontology: Series B, 74(7), 1086–1100. https://doi.org/10.1093/geronb/gbz075

      Thompson, G. J., Riedl, V., Grimmer, T., Drzezga, A., Herman, P., & Hyder, F. (2016). The Whole-Brain “Global” Signal from Resting State fMRI as a Potential Biomarker of Quantitative State Changes in Glucose Metabolism. Brain Connectivity, 6(6), 435–447. https://doi.org/10.1089/brain.2015.0394

      Tompary, A., & Davachi, L. (2017). Consolidation Promotes the Emergence of Representational Overlap in the Hippocampus and Medial Prefrontal Cortex. Neuron, 96(1), 228-241.e5. https://doi.org/10.1016/j.neuron.2017.09.005

      Tompary, A., Zhou, W., & Davachi, L. (2020). Schematic memories develop quickly, but are not expressed unless necessary. PsyArXiv.

      Woolrich, M. W., Behrens, T. E. J., Beckmann, C. F., Jenkinson, M., & Smith, S. M. (2004). Multilevel linear modelling for FMRI group analysis using Bayesian inference. NeuroImage, 21(4), 1732–1747. https://doi.org/10.1016/j.neuroimage.2003.12.023

      Xiao, X., Dong, Q., Gao, J., Men, W., Poldrack, R. A., & Xue, G. (2017). Transformed Neural Pattern Reinstatement during Episodic Memory Retrieval. The Journal of Neuroscience, 37(11), 2986–2998. https://doi.org/10.1523/JNEUROSCI.2324-16.2017

      Ye, Z., Shi, L., Li, A., Chen, C., & Xue, G. (2020). Retrieval practice facilitates memory updating by enhancing and differentiating medial prefrontal cortex representations. ELife, 9, 1–51. https://doi.org/10.7554/ELIFE.57023

      Yonelinas, A. P., Ranganath, C., Ekstrom, A. D., & Wiltgen, B. J. (2019). A contextual binding theory of episodic memory: systems consolidation reconsidered. Nature Reviews Neuroscience, 20(6), 364–375. https://doi.org/10.1038/s41583-019-0150-4

      Zhuang, L., Wang, J., Xiong, B., Bian, C., Hao, L., Bayley, P. J., & Qin, S. (2021). Rapid neural reorganization during retrieval practice predicts subsequent long-term retention and false memory. Nature Human Behaviour, 6(1), 134–145. https://doi.org/10.1038/s41562-021-01188-4

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: 

      In this manuscript, the authors identified that

      (1) CDK4/6i treatment attenuates the growth of drug-resistant cells by prolongation of the G1 phase;

      (2) CDK4/6i treatment results in an ineffective Rb inactivation pathway and suppresses the growth of drug-resistant tumors;

      (3) Addition of endocrine therapy augments the efficacy of CDK4/6i maintenance; 

      (4) Addition of CDK2i with CDK4/6 treatment as second-line treatment can suppress the growth of resistant cell; 

      (5) The role of cyclin E as a key driver of resistance to CDK4/6 and CDK2 inhibition.

      Strengths: 

      To prove their complex proposal, the authors employed an orchestration of several kinds of live-cell markers, timed in situ hybridization, IF, and immunoblotting. The authors clearly characterize the resistance to CDK4/6i + ET therapy and demonstrate how to overcome it. 

      Weaknesses: 

      The authors need to distinguish their proposed results from what has already been achieved by them and by other researchers. 

      Reviewer #2 (Public review): 

      Summary: 

      This study elucidated the mechanism underlying drug resistance induced by CDK4/6i as a single agent and proposed a novel and efficacious second-line therapeutic strategy. It highlighted the potential of combining CDK2i with CDK4/6i for the treatment of HR+/HER2- breast cancer.

      Strengths: 

      The study demonstrated that CDK4/6i induces drug resistance by impairing Rb activation, which results in diminished E2F activity and a delay in G1 phase progression. It suggests that the synergistic use of CDK2i and CDK4/6i may represent a promising second-line treatment approach. Addressing critical clinical challenges, this study holds substantial practical implications.

      Weaknesses: 

      (1) Drug-resistant cell lines: Was a drug concentration gradient treatment employed to establish drug-resistant cell lines? If affirmative, this methodology should be detailed in the materials and methods section. 

      We greatly appreciate the reviewer for raising this important question. In the revised manuscript, we have updated the methods section (“Drug-resistant cell lines”) to more precisely describe how the drug-resistant cell lines were established. 

      (2) What rationale informed the selection of MCF-7 cells for the generation of CDK6 knockout cell lines? Supplementary Figure 3A indicates that CDK6 expression levels in MCF-7 cells are not notably elevated. 

      We appreciate the reviewer’s insightful question about the rationale for selecting MCF-7 cells to generate CDK6 knockout cell lines. This choice was guided by prior studies highlighting the significant role of CDK6 in mediating resistance to CDK4/6 inhibitors (21-24). Moreover, we observed a 4.6-fold increase in CDK6 expression in CDK4/6i resistant MCF-7 cells compared to their drug-naïve counterparts (Supplementary Figure 3A). While we did not detect notable differences in CDK4/6 activity between wild-type and CDK6 knockout cells under CDK4/6 inhibitor treatment, these findings point to a potential non-canonical function of CDK6 in conferring resistance to CDK4/6 inhibitors.  

      (3) For each experiment, particularly those involving mice, the author must specify the number of individuals utilized and the number of replicates conducted, as detailed in the materials and methods section. 

      We sincerely thank the reviewer for bringing this to our attention. In the revised manuscript, we have explicitly stated the number of replicates and mice used for each experiment as appropriate in figure legends and relevant text to ensure transparency and clarity. 

      (4) Could this treatment approach be extended to triple-negative breast cancer?

      We greatly appreciate the reviewer’s inquiry about extending our findings to triple-negative breast cancer (TNBC). Based on the data presented in Figure 1 and Supplementary Figure 2, which include the TNBC cell line MDA-MB-231, we expect that the benefits of maintaining CDK4/6 inhibitors could indeed be applicable to TNBC with an intact Rb/E2F pathway. Additionally, our recent paper (25) indicates a similar mechanism in TNBC.

      Reviewer #3 (Public review):

      Summary: 

      In their manuscript, Armand and colleagues investigate the potential of continuing CDK4/6 inhibitors or combining them with CDK2 inhibitors in the treatment of breast cancer that has developed resistance to initial therapy. Utilizing cellular and animal models, the research examines whether maintaining CDK4/6 inhibition or adding CDK2 inhibitors can effectively control tumor growth after resistance has set in. The key findings from the study indicate that the sustained use of CDK4/6 inhibitors can slow down the proliferation of cancer cells that have become resistant, and the combination of CDK2 inhibitors with CDK4/6 inhibitors can further enhance the suppression of tumor growth. Additionally, the study identifies that high levels of Cyclin E play a significant role in resistance to the combined therapy. These results suggest that continuing CDK4/6 inhibitors along with the strategic use of CDK2 inhibitors could be an effective strategy to overcome treatment resistance in hormone receptor-positive breast cancer.

      Strengths: 

      (1) Continuous CDK4/6 Inhibitor Treatment Significantly Suppresses the Growth of Drug-Resistant HR+ Breast Cancer: The study demonstrates that the continued use of CDK4/6 inhibitors, even after disease progression, can significantly inhibit the growth of drug-resistant breast cancer. 

      (2) Potential of Combined Use of CDK2 Inhibitors with CDK4/6 Inhibitors: The research highlights the potential of combining CDK2 inhibitors with CDK4/6 inhibitors to effectively suppress CDK2 activity and overcome drug resistance. 

      (3) Discovery of Cyclin E Overexpression as a Key Driver: The study identifies overexpression of cyclin E as a key driver of resistance to the combination of CDK4/6 and CDK2 inhibitors, providing insights for future cancer treatments. 

      (4) Consistency of In Vitro and In Vivo Experimental Results: The study obtained supportive results from both in vitro cell experiments and in vivo tumor models, enhancing the reliability of the research. 

      (5) Validation with Multiple Cell Lines: The research utilized multiple HR+/HER2- breast cancer cell lines (such as MCF-7, T47D, CAMA-1) and triple-negative breast cancer cell lines (such as MDA-MB-231), validating the broad applicability of the results.

      Weaknesses: 

      (1) The manuscript presents intriguing findings on the sustained use of CDK4/6 inhibitors and the potential incorporation of CDK2 inhibitors in breast cancer treatment. However, I would appreciate a more detailed discussion of how these findings could be translated into clinical practice, particularly regarding the management of patients with drug-resistant breast cancer. 

      Thank you to the reviewer for this crucial comment. In the revised Discussion, we've broadened our exploration of clinical translation. Specifically, we emphasize that ongoing CDK4/6 inhibition, although not fully stopping resistant tumors, significantly slows their growth and may offer a therapeutic window when combined with ET and CDK2 inhibition. We also note that these approaches may work best for patients without Rb loss or newly acquired resistance-driving mutations, and that cyclin E overexpression could be a biomarker to inform patient selection. These points together highlight that our findings provide a mechanistic understanding and potential framework for clinical trials testing maintenance CDK4/6i with selective addition of CDK2i as a second-line strategy in drug-resistant HR+/HER2- breast cancer.

      (2) While the emergence of resistance is acknowledged, the manuscript could benefit from a deeper exploration of the molecular mechanisms underlying resistance development. A more thorough understanding of how CDK2 inhibitors may overcome this resistance would be valuable. 

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have expanded our Discussion to more explicitly synthesize the molecular mechanisms of resistance and how CDK2 inhibitors counteract them. Specifically, we describe how sustained CDK4/6 inhibition drives a non-canonical route of Rb degradation, resulting in inefficient E2F activation and prolonged G1 phase progression. We also highlight the role of c-Myc in amplifying E2F activity and promoting resistance, and we show that continued ET mitigates this effect by suppressing c-Myc. Importantly, we demonstrate that CDK2 inhibition alone cannot fully suppress the growth of resistant cells, but when combined with CDK4/6 inhibition, it produces durable repression of E2F and Myc target gene programs and significantly delays the G1/S transition. Finally, we identify cyclin E overexpression as a key mechanism of escape from dual CDK4/6i + CDK2i therapy, suggesting its potential as a biomarker for patient stratification. Together, these findings provide a detailed mechanistic rationale for how CDK2 inhibition can overcome specific pathways of resistance in HR<sup>+</sup>/HER2<sup>-</sup> breast cancer.

      (3) The manuscript supports the continued use of CDK4/6 inhibitors, but it lacks a discussion on the long-term efficacy and safety of this approach. Additional studies or data to support the safety profile of prolonged CDK4/6 inhibitor use would strengthen the manuscript. 

      We appreciate the reviewer’s insightful comment. In the revised manuscript, we emphasize the long-term efficacy and safety considerations of sustained CDK4/6 inhibition. Clinical trial and retrospective data have shown that continued CDK4/6i therapy can extend progression-free survival in selected patients, while maintaining a favorable safety profile (26-28). We have updated the Discussion to highlight these findings more explicitly, underscoring that while prolonged CDK4/6 inhibition slows but does not fully arrest tumor growth, it remains a clinically viable strategy when balanced against its manageable toxicity profile.

      Reviewer #1 (Recommendations for the authors): 

      It is well known that the combination therapy of CDK4/6i and ET has therapeutic benefits in ER(+) HER2(-) advanced breast cancer. However, drug resistance is a problem, and second-line therapy to solve this problem has not been established. Although some parts of the research results are already reported, the authors confirmed them by employing live cell markers, and further proved and suggested how to overcome this resistance in detail. This part is considered novel. 

      Overall, this research manuscript is eligible to be accepted with the appropriate addressing of questions.

(1) The effects and biochemical changes of combination therapy of CDK4/6i and CDK2i are already known in several papers. The authors need to highlight the differences between their research and that of other researchers.

      We thank the reviewer for the opportunity to clarify the novelty of our findings in the context of prior studies on CDK4/6i and CDK2i combination therapy. In the revised manuscript, we have updated the Discussion section to more clearly delineate how our work extends and differs from existing research.

      Specifically, we now state:

      Page 12: The combination of CDK4/6i and ET has reshaped treatment for HR<sup>+</sup>/HER2<sup>-</sup> breast cancer (1-8). However, resistance commonly emerges, and no consensus second-line standard is established. Our data show that continued CDK4/6i treatment in drug-resistant cells engages a non-canonical, proteolysis-driven route of Rb inactivation, yielding attenuated E2F output and a pronounced delay in G1 progression (Figure 7G). Concurrent ET further deepens this blockade by suppressing c-Myc-mediated E2F amplification, thereby prolonging G1 and slowing population growth. Importantly, CDK2 inhibition alone was insufficient to control resistant cells. Robust suppression of CDK2 activity and resistant-cell growth required CDK2i in combination with CDK4/6i, consistent with prior reports supporting dual CDK targeting (9-16). Moreover, cyclin E, and in some contexts cyclin A, blunted the efficacy of the CDK4/6i and CDK2i combination by reactivating CDK2. Together, these findings provide a mechanistic rationale for maintaining CDK4/6i beyond progression and support testing ET plus CDK4/6i with the strategic addition of CDK2i, as evidenced by concordant in vitro and in vivo results.

(2) Regarding Figures 3H and 3I, I wonder whether these are live-cell imaging results or whether the authors counted each signal via timed IF staining slides. If live-cell imaging was used, the authors need to present the methods.

      We appreciate the reviewer’s question. Figures 3H and 3I derive from a live–fixed correlative pipeline rather than purely live imaging or independently timed IF slides. We first imaged asynchronously proliferating cells live for ≥48 h to (i) segment/track nuclei with H2B fluorescence, (ii) define mitotic exit (t = 0 at anaphase), and (iii) record CDK2 activity using a CDK2 KTR in the last live frame. Immediately after the live acquisition, we pulsed EdU (10 µM, 15 min) and fixed the same wells, photobleached fluorescent proteins (3% H₂O₂ + 20 mM HCl, 2 h, RT) to prevent crosstalk, and then performed click-chemistry EdU detection, IF for phospho-Rb (Ser807/811) and total Rb, and RNA FISH for E2F1. Fixed-cell readouts (p-Rb positivity, EdU incorporation, E2F1 mRNA puncta) were mapped back to each single cell’s live-derived time since mitosis and/or CDK2 activity, enabling the kinetic plots shown in Fig. 3H–I.

      To ensure transparency and reproducibility, we added detailed methods describing this workflow in the “Immunofluorescence and mRNA fluorescence in situ hybridization (FISH)” section under a dedicated “live– fixed pipeline” paragraph, and we cross-referenced acquisition and analysis parameters in “Live- and fixed-cell image acquisition” and “Image processing and analysis.” These updates specify: EdU pulse/fix conditions, photobleaching, antibodies/probes, imaging hardware and channels, segmentation/tracking, mitosis alignment, background correction, and how fixed readouts were binned/quantified as functions of time after mitosis and CDK2 activity.
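As an illustrative sketch of the registration step at the heart of this live–fixed pipeline (all class fields, bin widths, and function names below are hypothetical stand-ins, not our actual analysis code), each tracked cell carries its live-derived anaphase time and a fixed-cell readout joined on by cell ID, and readouts are then binned by time since mitosis:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrackedCell:
    cell_id: int
    t_anaphase: float            # hours, from live H2B tracking (t = 0 at anaphase)
    cdk2_last_live: float        # CDK2 KTR reading in the last live frame
    edu_positive: Optional[bool] = None  # fixed readout, joined on by cell_id

def time_since_mitosis(cell: TrackedCell, t_fix: float) -> float:
    """Hours elapsed between a cell's anaphase and fixation."""
    return t_fix - cell.t_anaphase

def fraction_positive_by_time(cells, t_fix, bin_width=2.0):
    """Fraction of cells positive for the fixed readout,
    binned by live-derived time since mitosis."""
    counts = {}
    for c in cells:
        if c.edu_positive is None:   # cell not recovered after fixation
            continue
        b = int(time_since_mitosis(c, t_fix) // bin_width)
        pos, tot = counts.get(b, (0, 0))
        counts[b] = (pos + int(c.edu_positive), tot + 1)
    return {b: pos / tot for b, (pos, tot) in counts.items()}
```

The same join-and-bin logic applies to p-Rb positivity and E2F1 puncta counts, or with `cdk2_last_live` in place of time since mitosis on the x-axis.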

(3) Regarding Figure 3F, were the seven images obtained from the same field? The authors need to describe in detail the meaning of the white image and the yellow and blue images at the bottom.

      Thank you for raising this point. All seven panels in Fig. 3F are from the same field of view. The top row shows the raw channels (Hoechst, p-Rb, total Rb, and E2F1 RNA FISH). The bottom row shows the corresponding processed outputs from that field: (i) nuclear segmentation, (ii) phosphorylated Rb-status classification, and (iii) cell boundaries used for single-cell RNA-FISH quantification. We have revised the figure legend to make this explicit.

(4) The authors showed E2F mRNA by ISH, but Rb does not suppress E2F at the mRNA level; it suppresses the protein. The authors therefore need to confirm E2F at the protein level.

      We sincerely appreciate the reviewer’s thoughtful suggestion to examine E2F1 at the protein level. In our study, we focused on E2F1 mRNA expression because it is a well-established and biologically meaningful readout of E2F1 transcriptional activity. Due to its autoregulatory nature (17), the release of active E2F1 protein from Rb induces the transcription of E2F1 itself, creating a positive feedback loop. As a result, E2F1 mRNA abundance serves as a direct and reliable proxy for E2F1 protein activity (18-20). Thus, quantifying E2F1 mRNA provides a biologically relevant and mechanistic indicator of Rb-E2F pathway status. To clarify this rationale, we have updated the Results section and added references supporting our use of E2F1 mRNA as a readout for E2F1 activity.

(5) Is it possible to synchronize cells (nocodazole shake-off, double thymidine block) in the presence of CDK4/6i? If so, the authors need to demonstrate the delay of G1 progression via immunoblotting.

      We thank the reviewer for this constructive suggestion. To address it, we performed nocodazole synchronization followed by release and monitored cell-cycle progression in the presence or absence of CDK4/6 inhibition.

      Specifically, we added the following new datasets to the revised manuscript:

      Fig. 3L: Live single-cell trajectories of CDK4/6 and CDK2 activities alongside the Cdt1-degron reporter after 14 hours of nocodazole (250 nM) treatment and release. We compared the averaged traces of CDK4/6 and CDK2 activities and Cdt1 intensity in parental cells (gray) and resistant cells with (red) and without (blue) CDK4/6i maintenance. These data show suppressed and delayed CDK2 activation, as well as a right-shifted S-phase entry, particularly under continuous CDK4/6 inhibition.

      Fig. 3M: Fixed-cell EdU pulse-labeling at 4, 6, 8, 12, 16, and 24 h post-release further confirms a significant delay in S-phase entry and prolonged G1 duration in CDK4/6i-maintained cells compared with naïve and withdrawn conditions.

      Together, these results directly demonstrate the delay in G1 progression following synchronized mitotic exit under CDK4/6 inhibition.

(6) In Figure 5C the authors show a violin plot of c-Myc levels. Is this immunohistochemical staining? The authors need to clarify the methods.

      Thank you for flagging this. The c-Myc measurements in Fig. 5C are from immunofluorescence (IF), not IHC. We now state this explicitly in the legend.

(7) Regarding live-cell immunofluorescence tracing of live-cell reporters, the authors need to clarify the methods (excitation and emission wavelengths), the names of the instruments, and the software used.

      To address this, we have expanded the “Live-cell, fixed-cell, and tumor tissue image acquisition” section in the Materials and Methods.

      (8) Lines 475 SF1A, the authors need to correct typos. Naïve Naïve.

      We greatly appreciate the reviewer’s attention to this detail and have ensured all typos have been addressed.  

(9) The authors need to unify “Cdt1-degron” (legends) vs. “Cdt1 degron” (figures).

      We greatly appreciate your attention to this discrepancy. Language referring to the Cdt1 degron has been unified between figures and legends. 

      Reviewer #3 (Recommendations for the authors):

      (1) While the manuscript discusses the selection of doses for CDK4/6 inhibitors and CDK2 inhibitors, there is a lack of detailed data on the dose-response relationship. Additional data on the effects of different doses would be beneficial. 

      We appreciate the reviewer’s important comment. To address it, we performed additional dose– response experiments testing a range of CDK4/6i and CDK2i concentrations. These analyses revealed a clear synergistic interaction between the two inhibitors. The new data are now presented in Figure 6G and Supplementary Figure 8F of the revised manuscript.
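For concreteness, one standard way to score such a drug interaction (a generic illustration only, not necessarily the metric behind Figure 6G) is the Bliss-independence excess, which calls a combination synergistic when its observed inhibition exceeds the multiplicative expectation from the single agents:

```python
def bliss_excess(fa: float, fb: float, fab: float) -> float:
    """Bliss excess for two inhibitors.
    fa, fb: fractional inhibition of each drug alone (0-1);
    fab: observed fractional inhibition of the combination.
    Positive values indicate synergy, negative values antagonism."""
    expected = fa + fb - fa * fb   # Bliss-independence expectation
    return fab - expected
```

Sweeping fa and fb over a dose grid and reporting the excess at each pair yields the familiar synergy matrix.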

      (2) In clinical trials, the criteria for patient selection are crucial for interpreting study outcomes. A detailed description of the patient selection criteria should be provided.  

      We thank the reviewer for bringing this important point to our attention. In the revised manuscript, we have clarified the patient selection criteria relevant to the interpretation of clinical outcomes. Specifically, we note that retrospective analyses suggest patients with indolent disease and no prior chemotherapy may benefit most from continued CDK4/6i plus ET. Moreover, our data and others’ indicate that clinical benefit is expected in tumors retaining an intact Rb/E2F axis, while resistance-driving alterations (e.g., Rb loss, PIK3CA, ESR1, FGFR1–3, HER2, FAT1 mutations) are likely to limit efficacy. Finally, we highlight cyclin E overexpression as a potential biomarker of resistance to combined CDK4/6i and CDK2i, underscoring the need for biomarker-guided patient stratification. These additions provide a more detailed framework for patient selection in future clinical applications.

      References

      (1) Finn RS, Crown JP, Lang I, Boer K, Bondarenko IM, Kulyk SO, et al. The cyclin-dependent kinase 4/6 inhibitor palbociclib in combination with letrozole versus letrozole alone as first-line treatment of oestrogen receptor-positive, HER2-negative, advanced breast cancer (PALOMA-1/TRIO-18): a randomised phase 2 study. Lancet Oncol 2015;16:25-35

      (2) Finn RS, Martin M, Rugo HS, Jones S, Im S-A, Gelmon K, et al. Palbociclib and Letrozole in Advanced Breast Cancer. New England Journal of Medicine 2016;375:1925-36

      (3) Turner NC, Slamon DJ, Ro J, Bondarenko I, Im S-A, Masuda N, et al. Overall Survival with Palbociclib and Fulvestrant in Advanced Breast Cancer. New England Journal of Medicine 2018;379:1926-36

      (4) Dickler MN, Tolaney SM, Rugo HS, Cortés J, Diéras V, Patt D, et al. MONARCH 1, A Phase II Study of Abemaciclib, a CDK4 and CDK6 Inhibitor, as a Single Agent, in Patients with Refractory HR(+)/HER2(-) Metastatic Breast Cancer. Clin Cancer Res 2017;23:5218-24

      (5) Johnston S, Martin M, Di Leo A, Im S-A, Awada A, Forrester T, et al. MONARCH 3 final PFS: a randomized study of abemaciclib as initial therapy for advanced breast cancer. npj Breast Cancer 2019;5:5

(6) Hortobagyi GN, Stemmer SM, Burris HA, Yap Y-S, Sonke GS, Hart L, et al. Overall Survival with Ribociclib plus Letrozole in Advanced Breast Cancer. New England Journal of Medicine 2022;386:942-50

(7) Slamon DJ, Neven P, Chia S, Fasching PA, De Laurentiis M, Im S-A, et al. Overall Survival with Ribociclib plus Fulvestrant in Advanced Breast Cancer. New England Journal of Medicine 2019;382:514-24

      (8) Im S-A, Lu Y-S, Bardia A, Harbeck N, Colleoni M, Franke F, et al. Overall Survival with Ribociclib plus Endocrine Therapy in Breast Cancer. New England Journal of Medicine 2019;381:307-16

      (9) Pandey K, Park N, Park KS, Hur J, Cho YB, Kang M, et al. Combined CDK2 and CDK4/6 Inhibition Overcomes Palbociclib Resistance in Breast Cancer by Enhancing Senescence. Cancers (Basel) 2020;12

      (10) Freeman-Cook K, Hoffman RL, Miller N, Almaden J, Chionis J, Zhang Q, et al. Expanding control of the tumor cell cycle with a CDK2/4/6 inhibitor. Cancer Cell 2021;39:1404-21 e11

      (11) Dietrich C, Trub A, Ahn A, Taylor M, Ambani K, Chan KT, et al. INX-315, a selective CDK2 inhibitor, induces cell cycle arrest and senescence in solid tumors. Cancer Discov 2023

      (12) Al-Qasem AJ, Alves CL, Ehmsen S, Tuttolomondo M, Terp MG, Johansen LE, et al. Co-targeting CDK2 and CDK4/6 overcomes resistance to aromatase and CDK4/6 inhibitors in ER+ breast cancer. NPJ Precis Oncol 2022;6:68

      (13) Kudo R, Safonov A, Jones C, Moiso E, Dry JR, Shao H, et al. Long-term breast cancer response to CDK4/6 inhibition defined by TP53-mediated geroconversion. Cancer Cell 2024

      (14) Arora M, Moser J, Hoffman TE, Watts LP, Min M, Musteanu M, et al. Rapid adaptation to CDK2 inhibition exposes intrinsic cell-cycle plasticity. Cell 2023;186:2628-43 e21

      (15) Kumarasamy V, Wang J, Roti M, Wan Y, Dommer AP, Rosenheck H, et al. Discrete vulnerability to pharmacological CDK2 inhibition is governed by heterogeneity of the cancer cell cycle. Nature Communications 2025;16:1476

      (16) Dommer AP, Kumarasamy V, Wang J, O'Connor TN, Roti M, Mahan S, et al. Tumor Suppressors Condition Differential Responses to the Selective CDK2 Inhibitor BLU-222. Cancer Res 2025

      (17) Johnson DG, Ohtani K, Nevins JR. Autoregulatory control of E2F1 expression in response to positive and negative regulators of cell cycle progression. Genes & Development 1994;8:1514-25

      (18) Chung M, Liu C, Yang HW, Koberlin MS, Cappell SD, Meyer T. Transient Hysteresis in CDK4/6 Activity Underlies Passage of the Restriction Point in G1. Mol Cell 2019;76:562-73 e4

      (19) Kim S, Leong A, Kim M, Yang HW. CDK4/6 initiates Rb inactivation and CDK2 activity coordinates cell-cycle commitment and G1/S transition. Sci Rep 2022;12:16810

      (20) Yang HW, Chung M, Kudo T, Meyer T, Yang HW, Chung, Mingyu, Kudo T, et al. Competing memories of mitogen and p53 signalling control cell-cycle entry. Nature 2017;549:404-8

      (21) Yang C, Li Z, Bhatt T, Dickler M, Giri D, Scaltriti M, et al. Acquired CDK6 amplification promotes breast cancer resistance to CDK4/6 inhibitors and loss of ER signaling and dependence. Oncogene 2017;36:2255-64

      (22) Li Q, Jiang B, Guo J, Shao H, Del Priore IS, Chang Q, et al. INK4 Tumor Suppressor Proteins Mediate Resistance to CDK4/6 Kinase Inhibitors. Cancer Discov 2022;12:356-71

      (23) Ji W, Zhang W, Wang X, Shi Y, Yang F, Xie H, et al. c-myc regulates the sensitivity of breast cancer cells to palbociclib via c-myc/miR-29b-3p/CDK6 axis. Cell Death & Disease 2020;11:760

      (24) Wu X, Yang X, Xiong Y, Li R, Ito T, Ahmed TA, et al. Distinct CDK6 complexes determine tumor cell response to CDK4/6 inhibitors and degraders. Nature Cancer 2021;2:429-43

      (25) Kim S, Son E, Park HR, Kim M, Yang HW. Dual targeting CDK4/6 and CDK7 augments tumor response and anti-tumor immunity in breast cancer models. J Clin Invest 2025

      (26) Ravani LV, Calomeni P, Vilbert M, Madeira T, Wang M, Deng D, et al. Efficacy of Subsequent Treatments After Disease Progression on CDK4/6 Inhibitors in Patients With Hormone Receptor-Positive Advanced Breast Cancer. JCO Oncol Pract 2025;21:832-42

      (27) Martin JM, Handorf EA, Montero AJ, Goldstein LJ. Systemic Therapies Following Progression on Firstline CDK4/6-inhibitor Treatment: Analysis of Real-world Data. Oncologist 2022;27:441-6

      (28) Kalinsky K, Bianchini G, Hamilton E, Graff SL, Park KH, Jeselsohn R, et al. Abemaciclib Plus Fulvestrant in Advanced Breast Cancer After Progression on CDK4/6 Inhibition: Results From the Phase III postMONARCH Trial. J Clin Oncol 2025;43:1101-12

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The Major Histocompatibility Complex (MHC) region is a collection of numerous genes involved in both innate and adaptive immunity. MHC genes are famed for their role in rapid evolution and extensive polymorphism in a variety of vertebrates. This paper presents a summary of gene-level gain and loss of orthologs and paralogs within MHC across the diversity of primates, using publicly available data.

      Strengths:

      This paper provides a strong case that MHC genes are rapidly gained (by paralog duplication) and lost over millions of years of macroevolution. The authors are able to identify MHC loci by homology across species, and from this infer gene duplications and losses using phylogenetic analyses. There is a remarkable amount of genic turnover, summarized in Figure 6 and Figure 7, either of which might be a future textbook figure of immune gene family evolution. The authors draw on state-of-the-art phylogenetic methods, and their inferences are robust insofar as the data might be complete enough to draw such conclusions.

      Weaknesses:

      One concern about the present work is that it relies on public databases to draw inferences about gene loss, which is potentially risky if the publicly available sequence data are incomplete. To say, for example, that a particular MHC gene copy is absent in a taxon (e.g., Class I locus F absent in Guenons according to Figure 1), we need to trust that its absence from the available databases is an accurate reflection of its absence in the genome of the actual organisms. This may be a safe assumption, but it rests on the completeness of genome assembly (and gene annotations?) or people uploading relevant data. This reviewer would have been far more comfortable had the authors engaged in some active spot-checking, doing the lab work to try to confirm absences at least for some loci and some species. Without this, a reader is left to wonder whether gene loss is simply reflecting imperfect databases, which then undercuts confidence in estimates of rates of gene loss.

      Indeed, just because a locus has not been confirmed in a species does not necessarily mean that it is absent. As we explain in the Figure 1 caption, only a few species have had their genomes extensively studied (gray background), and only for these species does the absence of a point in this figure mean that a locus is absent. The white background rows represent species that are not extensively studied, and we point out that the absence of a point does not mean that a locus is absent from the species, rather undiscovered. We have also added a parenthetical to the text to explain this (line 156): “Only species with rows highlighted in gray have had their MHC regions extensively studied (and thus only for these rows is the absence of a gene symbol meaningful).”

      While we agree that spot-checking may be a helpful next step, one of the goals of this manuscript is to collect and synthesize the enormous volume of MHC evolution research in the primates, which will serve as a jumping-off point for other researchers to perform important wet lab work.

      Some context is useful for comparing rates of gene turnover in MHC, to other loci. Changing gene copy numbers, duplications, and loss of duplicates, are common it seems across many loci and many organisms; is MHC exceptional in this regard, or merely behaving like any moderately large gene family? I would very much have liked to see comparable analyses done for other gene families (immune, like TLRs, or non-immune), and quantitative comparisons of evolutionary rates between MHC versus other genes. Does MHC gene composition evolve any faster than a random gene family? At present readers may be tempted to infer this, but evidence is not provided.

      Our companion paper (Fortier and Pritchard, 2025) demonstrates that the MHC is a unique locus in many regards, such as its evidence for deep balancing selection and its excess of disease associations. Thus, we expect that it is evolving faster than any random gene family. It would be interesting to repeat this analysis for other gene families, but that is outside of the scope of this project. Additionally, allele databases for other gene families are not nearly as developed, but as more alleles become available for other polymorphic families, a comparable analysis could become possible.

      We have added a paragraph to the discussion (lines 530-546) to clarify that we do not know for certain whether the MHC gene family is evolving rapidly compared to other gene families.

      While on the topic of making comparisons, the authors make a few statements about relative rates. For instance, lines 447-8 compare gene topology of classical versus non-classical genes; and line 450 states that classical genes experience more turnover. But there are no quantitative values given to these rates to provide numerical comparisons, nor confidence intervals provided (these are needed, given that they are estimates), nor formal statistical comparisons to confirm our confidence that rates differ between types of genes.

      More broadly, the paper uses sophisticated phylogenetic methods, but without taking advantage of macroevolutionary comparative methods that allow model-based estimation of macroevolutionary rates. I found the lack of quantitative measurements of rates of gene gain/loss to be a weakness of the present version of the paper, and something that should be readily remedied. When claiming that MHC Class I genes "turn over rapidly" (line 476) - what does rapidly mean? How rapidly? How does that compare to rates of genetic turnover at other families? Quantitative statements should be supported by quantitative estimates (and their confidence intervals).

      These statements refer to qualitative observations, so we cannot provide numerical values. We simply conclude that certain gene groups evolve faster or slower based on the species and genes present in each clade. It is difficult to provide estimates because of the incomplete sampling of genes that survived to the present day. In addition, the presence or absence of various orthologs in different species still needs to be confirmed, at which point it might be useful to be more quantitative. We have also added a paragraph to the discussion to address this concern and advocate for similar analyses of other gene families in the future when more data is available (lines 530-546).

      The authors refer to 'shared function of the MHC across species' (e.g. line 22); while this is likely true, they are not here presenting any functional data to confirm this, nor can they rule out neofunctionalization or subfunctionalization of gene duplicates. There is evidence in other vertebrates (e.g., cod) of MHC evolving appreciably altered functions, so one may not safely assume the function of a locus is static over long macroevolutionary periods, although that would be a plausible assumption at first glance.

      Indeed, we cannot assume that the function of a locus is static across time, especially for the MHC region. In our research, we read hundreds of papers that each focused on a small number of species or genes and gathered some information about them, sometimes based on functional experiments and sometimes on measures such as dN/dS. These provide some indication of a gene’s broad classification in a species or clade, even if the evidence is preliminary. Where possible, we used this preliminary evidence to give genes descriptors “classical,” “non-classical,” “dual characteristics,” “pseudogene,” “fixed”, or “unfixed.” Sometimes multiple individuals and haplotypes were analyzed, so we could even assign a minimum number of gene copies present in a species. We have aggregated all of these references into Supplementary Table 1 (for Class I/Figure 1) and Supplementary Table 2 (for Class II/Figure 2) along with specific details about which data points in these figures that each reference supports. We realize that many of these classifications are based on a small number of individuals or indirect measures, so they may change in the future as more functional data is generated.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive understanding of the evolutionary history of the Major Histocompatibility Complex (MHC) gene family across primate species. Specifically, they sought to:

      (1) Analyze the evolutionary patterns of MHC genes and pseudogenes across the entire primate order, spanning 60 million years of evolution.

      (2) Build gene and allele trees to compare the evolutionary rates of MHC Class I and Class II genes, with a focus on identifying which genes have evolved rapidly and which have remained stable.

      (3) Investigate the role of often-overlooked pseudogenes in reconstructing evolutionary events, especially within the Class I region.

      (4) Highlight how different primate species use varied MHC genes, haplotypes, and genetic variation to mount successful immune responses, despite the shared function of the MHC across species.

      (5) Fill gaps in the current understanding of MHC evolution by taking a broader, multi-species perspective using (a) phylogenomic analytical computing methods such as Beast2, Geneconv, BLAST, and the much larger computing capacities that have been developed and made available to researchers over the past few decades, (b) literature review for gene content and arrangement, and genomic rearrangements via haplotype comparisons.

      (6) The authors overall conclusions based on their analyses and results are that 'different species employ different genes, haplotypes, and patterns of variation to achieve a successful immune response'.

      Strengths:

      Essentially, much of the information presented in this paper is already well-known in the MHC field of genomic and genetic research, with few new conclusions and with insufficient respect to past studies. Nevertheless, while MHC evolution is a well-studied area, this paper potentially adds some originality through its comprehensive, cross-species evolutionary analysis of primates, focus on pseudogenes and the modern, large-scale methods employed. Its originality lies in its broad evolutionary scope of the primate order among mammals with solid methodological and phylogenetic analyses.

      The main strengths of this study are the use of large publicly available databases for primate MHC sequences, the intensive computing involved, the phylogenetic tool Beast2 to create multigene Bayesian phylogenetic trees using sequences from all genes and species, separated into Class I and Class II groups to provide a backbone of broad relationships to investigate subtrees, and the presentation of various subtrees as species and gene trees in an attempt to elucidate the unique gene duplications within the different species. The study provides some additional insights with summaries of MHC reference genomes and haplotypes in the context of a literature review to identify the gene content and haplotypes known to be present in different primate species. The phylogenetic overlays or ideograms (Figures 6 and 7) in part show the complexity of the evolution and organisation of the primate MHC genes via the orthologous and paralogous gene and species pathways progressively from the poorly-studied NWM, across a few moderately studied ape species, to the better-studied human MHC genes and haplotypes.

      Weaknesses:

The title 'The Primate Major Histocompatibility Complex: An Illustrative Example of Gene Family Evolution' suggests that the paper will explore how the Major Histocompatibility Complex (MHC) in primates serves as a model for understanding gene family evolution. The term 'Illustrative Example' in the title would be appropriate if the paper aimed to use the primate Major Histocompatibility Complex (MHC) as a clear and representative case to demonstrate broader principles of gene family evolution. That is, the MHC gene family is not just one instance of gene family evolution but serves as a well-studied, insightful example that can highlight key mechanisms and concepts applicable to other gene families. However, this is not the case; this paper only covers specific details of primate MHC evolution without drawing broader lessons to any other gene families. So, the term 'Illustrative Example' is too broad or generalizing. In this case, a term like 'Case Study' or simply 'Example' would be more suitable. Perhaps, 'An Example of Gene Family Diversity' would be more precise. Also, an explanation or 'reminder' is suggested that this study is not about the origins of the MHC genes from the earliest jawed vertebrates per se (~600 mya), but it is an extension within a subspecies set that has emerged relatively late (~60 mya) in the evolutionary divergent pathways of the MHC genes, systems, and various vertebrate species.

      Thank you for your input on the title; we have changed it to “A case study of gene family evolution” instead.

Thank you also for pointing out the potential confusion about the time span of our study. We have added “Having originated in the jawed vertebrates,” to a sentence in the introduction (lines 38-39). We have also added the sentence “Here, we focus on the primates, spanning approximately 60 million years within the over 500-million-year evolution of the family (Flajnik, 2010).” to be more explicit about the context for our work (lines 59-61).

      Phylogenomics. Particular weaknesses in this study are the limitations and problems associated with providing phylogenetic gene and species trees to try and solve the complex issue of the molecular mechanisms involved with imperfect gene duplications, losses, and rearrangements in a complex genomic region such as the MHC that is involved in various effects on the response and regulation of the immune system. A particular deficiency is drawing conclusions based on a single exon of the genes. Different exons present different trees. Which are the more reliable? Why were introns not included in the analyses? The authors attempt to overcome these limitations by including genomic haplotype analysis, duplication models, and the supporting or contradictory information available in previous publications. They succeed in part with this multidiscipline approach, but much is missed because of biased literature selection. The authors should include a paragraph about the benefits and limitations of the software that they have chosen for their analysis, and perhaps suggest some alternative tools that they might have tried comparatively. How were problems with Bayesian phylogeny such as computational intensity, choosing probabilities, choosing particular exons for analysis, assumptions of evolutionary models, rates of evolution, systemic bias, and absence of structural and functional information addressed and controlled for in this study?

      We agree that different exons have different trees, which is exactly why we repeated our analysis for each exon in order to compare and contrast them. In particular, the exons encoding the binding site of the resulting protein (exons 2 and 3 for Class I and exon 2 for Class II) show evidence for trans-species polymorphism and gene conversion. These phenomena lead to trees that do not follow the species tree and are fascinating in and of themselves, which we explore in detail in our companion paper (Fortier and Pritchard, 2025). Meanwhile, the non-peptide-binding extracellular-domain-encoding exon (exon 4 for Class I and exon 3 for Class II) is comparably sized to the binding-site-encoding exons and provides an interesting functional contrast. As this exon is likely less affected by trans-species polymorphism, gene conversion, and convergent evolution, we present results from it most often in the main text, though we occasionally touch on differences between the exons. See lines 191-196, 223-226, and 407-414 for some examples of how we discuss the exons in the text. Additionally, all trees from all of these exons can be found in the supplement. 

We agree that introns would be valuable to study in this context. Even though the non-binding-site-encoding exons are probably *less* affected by trans-species polymorphism, gene conversion, and convergent evolution, they are still functional. The introns, however, experience much more relaxed selection, if any, and comparing their trees to those for the exons would be valuable and illuminating. We did not generate intron trees for two reasons. Most importantly, there is a dearth of data available for the introns; in the databases we used, there was often intron data available only for human, chimpanzee, and sometimes macaque, and only for a small subset of the genes. This limitation is at odds with the comprehensive, many-gene-many-species approach which we feel is the main novelty of this work. Secondly, the introns that *are* available are difficult to align. Even aligning the exons across such a highly-diverged set of genes and pseudogenes was difficult and required manual effort. The introns proved even more difficult to try to align across genes. In the future, when more intron data is available and sufficient effort is put into aligning them, it will be possible and desirable to do a comparable analysis. We also added a sentence to the “Data” section to briefly explain why we did not include introns (lines 134-135).

      We explain our Bayesian phylogenetics approach in detail in the Methods (lines 650-725), including our assumptions and our solutions to challenges specific to this application. For further explanation of the method itself, we suggest reading the original BEAST and BEAST2 papers (Drummond & Rambaut (2007), Drummond et al. (2012), Bouckaert et al. (2014), and Bouckaert et al. (2019)). Known structural and functional information helped us validate the alignments we used in this study, but the fact that such information is not fully known for every gene and species should not affect the method itself.

      Gene families as haplotypes. In the Introduction, the MHC is referred to as a 'gene family', and in paragraph 2, it is described as being united by the 'MHC fold', despite exhibiting 'very diverse functions'. However, the MHC region is more accurately described as a multigene region containing diverse, haplotype-specific Conserved Polymorphic Sequences, many of which are likely to be regulatory rather than protein-coding. These regulatory elements are essential for controlling the expression of multiple MHC-related products, such as TNF and complement proteins, a relationship demonstrated over 30 years ago. Non-MHC fold loci such as TNF, complement, POU5F1, lncRNA, TRIM genes, LTA, LTB, NFkBIL1, etc, are present across all MHC haplotypes and play significant roles in regulation. Evolutionary selection must act on genotypes, considering both paternal and maternal haplotypes, rather than on individual genes alone. While it is valuable to compile databases for public use, their utility is diminished if they perpetuate outdated theories like the 'birth-and-death model'. The inclusion of prior information or assumptions used in a statistical or computational model, typically in Bayesian analysis, is commendable, but they should be based on genotypic data rather than older models. A more robust approach would consider the imperfect duplication of segments, the history of their conservation, and the functional differences in inheritance patterns. Additionally, the MHC should be examined as a genomic region, with ancestral haplotypes and sequence changes or rearrangements serving as key indicators of human evolution after the 'Out of Africa' migration, and with disease susceptibility providing a measurable outcome. There are more than 7000 different HLA-B and -C alleles at each locus, which suggests that there are many thousands of human HLA haplotypes to study. In this regard, the studies by Dawkins et al (1999 Immunol Rev 167,275), Shiina et al. 
(2006 Genetics 173,1555) on human MHC gene diversity and disease hitchhiking (haplotypes), and Sznarkowska et al. (2020 Cancers 12,1155) on the complex regulatory networks governing MHC expression, both in terms of immune transcription factor binding sites and regulatory non-coding RNAs, should be examined in greater detail, particularly in the context of MHC gene allelic diversity and locus organization in humans and other primates.

      Thank you for these comments. To clarify that the MHC “region” is different from (and contains) the MHC “gene family” as we describe it, we changed a sentence in the abstract (lines 8-10) from “One large gene family that has experienced rapid evolution is the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity.” to “One large gene family that has experienced rapid evolution lies within the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity.” We know that the region is complex and contains many other genes and regulatory sequences; Figure 1 of our companion paper (Fortier and Pritchard, 2025) depicts these in order to show the reader that the MHC genes we focus on are just one part of the entire region.

      We love the suggestion to look at the many thousands of alleles present at each of the classical loci. This is the focus of our complementary paper (Fortier and Pritchard, 2025), which explores variation at the allele level. In the current paper, we look mainly at the differences between genes and the use of different genes in different species.

      Diversifying and/or concerted evolution. Both this and past studies highlight that diversifying selection or balancing selection is the dominant force in MHC evolution. This is primarily because the extreme polymorphism observed in MHC genes is advantageous for populations in terms of pathogen defence. Diversification increases the range of peptides that can be presented to T cells, enhancing the immune response. The peptide-binding regions of MHC genes are highly variable, and this variability is maintained through selection for immune function, especially in the face of rapidly evolving pathogens. In contrast, concerted evolution, which typically involves the homogenization of gene duplicates through processes like gene conversion or unequal crossing-over, seems to play a minimal role in MHC evolution. Although gene duplication events have occurred in the MHC region leading to the expansion of gene families, the resulting paralogs often undergo divergent evolution rather than being kept similar or homozygous by concerted evolution. Therefore, unlike gene families such as ribosomal RNA genes or histone genes, where concerted evolution leads to highly similar copies, MHC genes display much higher levels of allelic and functional diversification. Each MHC gene copy tends to evolve independently after duplication, acquiring unique polymorphisms that enhance the repertoire of antigen presentation, rather than undergoing homogenization through gene conversion. Also, in some populations with high polymorphism or genetic drift, allele frequencies may become similar over time without the influence of gene conversion. This similarity can be mistaken for gene conversion when it is simply due to neutral evolution or drift, particularly in small populations or bottlenecked species. Moreover, gene conversion might contribute to greater diversity by creating hybrids or mosaics between different MHC genes. 
In this regard, can the authors indicate what percentage of the gene numbers in their study have been homogenised by gene conversion compared to those that have been diversified by gene conversion?

      We appreciate the summary, and we feel we have appropriately discussed both gene conversion and diversifying selection in the context of the MHC genes. Because we cannot know for sure when and where gene conversion has occurred, we cannot quantify percentages of genes that have been homogenized or diversified.  

      Duplication models. The phylogenetic overlays or ideograms (Figures 6 and 7) show considerable imperfect multigene duplications, losses, and rearrangements, but the paper's Discussion provides no in-depth consideration of the various multigenic models or mechanisms that can be used to explain the occurrence of such events. How do their duplication models compare to those proposed by others? For example, their text simply says on line 292, 'the proposed series of events is not always consistent with phylogenetic data'. How, why, when? Duplication models for the generation and extension of the human MHC class I genes as duplicons (extended gene or segmental genomic structures) by parsimonious imperfect tandem duplications with deletions and rearrangements in the alpha, beta, and kappa blocks were already formulated in the late 1990s and extended to the rhesus macaque in 2004 based on genomic haplotypic sequences. These studies were based on genomic sequences (genes, pseudogenes, retroelements), dot plot matrix comparisons, and phylogenetic analyses of gene and retroelement sequences using computer programs. It already was noted or proposed in these earlier 1999 studies that (1) the ancestor of HLA-P(90)/-T(16)/W(80) represented an old lineage separate from the other HLA class I genes in the alpha block, (2) HLA-U(21) is a duplicated fragment of HLA-A, (3) HLA-F and HLA-V(75) are among the earliest (progenitor) genes or outgroups within the alpha block, (4) distinct Alu and L1 retroelement sequences adjoining HLA-L(30), and HLA-N genomic segments (duplicons) in the kappa block are closely related to those in the HLA-B and HLA-C in the beta block; suggesting an inverted duplication and transposition of the HLA genes and retroelements between the beta and kappa regions. None of these prior human studies were referenced by Fortier and Pritchard in their paper. How does their human MHC class I gene duplication model (Fig. 
6) such as gene duplication numbers and turnovers differ from those previously proposed and described by Kulski et al (1997 JME 45,599), (1999 JME 49,84), (2000 JME 50,510), Dawkins et al (1999 Immunol Rev 167,275), and Gaudieri et al (1999 GR 9,541)? Is this a case of reinventing the wheel?

      Figures 6 and 7 are intended to synthesize and reconcile past findings and our own trees, so they do not strictly adhere to the findings of any particular study and cannot fully match all studies. In the supplement, Figure 6 - figure supplement 1 and Figure 7 - figure supplement 1 duly credit all of the past work that went into making these trees. Most previous papers focus on just one aspect of these trees, such as haplotypes within a species, a specific gene or allelic lineage relationship, or the branching pattern of particular gene groups. We believe it was necessary to bring all of these pieces of evidence together. Even among papers with the same focus (to understand the block duplications that generated the current physical layout of the MHC), results differ. For example, Geraghty (1992), Hughes (1995), Kulski (2004)/Kulski (2005), and Shiina (1999) all disagree on the exact branching order of the genes MHC-W, -P, and -T, and of MHC-G, -J, and -K. While the Kulski studies you pointed out were very thorough for their era, they still relied on data from only three species and one haplotype per species. Our work is not intended to replace or discredit these past works, but simply to build upon them with a larger set of species and sequences. We hope the hypotheses we propose in Figures 6 and 7 can help unify existing research and provide a more easily accessible jumping-off point for future work.

      Results. The results are presented as new findings, whereas most if not all of the results' significance and importance already have been discussed in various other publications. Therefore, the authors might do better to combine the results and discussion into a single section with appropriate citations to previously published findings presented among their results for comparison. Do the trees and subsets differ from previous publications, albeit that they might have fewer comparative examples and samples than the present preprint? Alternatively, the results and discussion could be combined and presented as a review of the field, which would make more sense and be more honest than the current format of essentially rehashing old data.

      In starting this project, we found that a large barrier to entry to this field of study is the immense amount of published literature over 30+ years. It is both time-consuming and confusing to read up on the many nuances of the MHC genes, their changing names, and their evolution, making it difficult to start new, innovative projects. We acknowledge that while our results are not entirely novel, the main advantage of our work is that it provides a thorough, comprehensive starting point for others to learn about the MHC quickly and dive into new research. We feel that we have appropriately cited past literature in the main text, appendices, and supplement, so that readers may dive into a particular area with ease.

      Minor corrections:

      (1) Abstract, line 19: 'modern methods'. Too general. What modern methods?

      To keep the abstract brief, the methods are introduced in the main text when each becomes relevant as well as in the methods section.

      (2) Abstract, line 25: 'look into [primate] MHC evolution.' The analysis is on the primate MHC genes, not on the entire vertebrate MHC evolution with a gene collection from sharks to humans. The non-primate MHC genes are often differently organised and structurally evolved in comparison to primate MHC.

      Thank you! We have added the word “primate” to the abstract (line 25).

      (3) Introduction, line 113. 'In a companion paper (Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      This paper is undergoing the eLife editorial process at the same time; it will have a proper citation in the final version.

      (4) Figures 1 and 2. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. 'Asterisks "within symbols" indicate new information.

      Thank you, the word “symbol” is much clearer! We have changed “points” to “symbols” in the captions for Figure 1, Figure 1 - figure supplement 1, Figure 2, and Figure 2 - figure supplement 1. We also changed this in the text (lines 157-158 and 170).

      (5) Figures. A variety of colours have been applied for visualisation. However, some coloured texts are so light in colour that they are difficult to read against a white background. Could darker colours or black be used for all or most texts?

      With such a large number of genes and species to handle in this work, it was nearly impossible to choose a set of colors that were distinct enough from each other. We decided to prioritize consistency (across this paper, its supplement, and our companion paper) as well as at-a-glance grouping of similar sequences. Unfortunately, this means we had to sacrifice readability on a white background, but readers may turn to the supplement if they need to access specific sequence names.

      (6) Results, line 135. '(Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      Repeat of (3). This paper is undergoing the eLife editorial process at the same time; it will have a proper citation in the final version.

      (7) Results, lines 152 to 153, 164, 165, etc. 'Points with an asterisk'. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. A point is a small dot such as those used in data points for plotting graphs .... The figures are so small that the asterisks in the circles, squares, triangles, etc, look like points (dots) and the points/asterisks terminology that is used is very confusing visually.

      Repeat of (4). Thank you, the word “symbol” is much clearer! We have changed “points” to “symbols” in the captions for Figure 1, Figure 1 - figure supplement 1, Figure 2, and Figure 2 - figure supplement 1. We also changed this in the text (lines 157-158 and 170).

      (8) Line 178 (BEA, 2024) is not listed alphabetically in the References.

      Thank you for catching this! This reference maps to the first bibliography entry, “SUMMARIZING POSTERIOR TREES.” We are unsure how to cite a webpage that has no explicit author within the eLife Overleaf template, so we will consult with the editor.

      (9) Lines 188-190. 'NWM MHC-G does not group with ape/OWM MHC-G, instead falling outside of the clade containing ape/OWM MHC-A, -G, -J and -K.' This is not surprising given that MHC-A, -G, -J, and -K are paralogs of each other and that some of them, especially in NWM have diverged over time from the paralogs and/or orthologs and might be closer to one paralog than another and not be an actual ortholog of OWM, apes or humans.

      We included this sentence to clarify the relationships between genes and to help describe what is happening in Figure 6. Figure 6 - figure supplement 1 includes all of the references that go into such a statement and Appendix 3 details our reasoning for this and other statements.

      (10) Line 249. Gene conversion: This is recombination between two different genes where a portion of the genes are exchanged with one another so that different portions of the gene can group within one or other of the two gene clades. Alternatively, the gene has been annotated incorrectly if the gene does not group within either of the two alternative clades. Another possibility is that one or two nucleotide mutations have occurred without a recombination resulting in a mistaken interpretation or conclusion of a recombination event. What measures are taken to avoid false-positive conclusions? How many MHC gene conversion (recombination) events have occurred according to the authors' estimates? What measures are taken to avoid false-positive conclusions?

      All of these possibilities are certainly valid. We used the program GENECONV to infer gene conversion events, but there is considerable uncertainty owing to the ages of the genes and the inevitable point mutations that have occurred post-event. Gene conversion was not the focus of our paper, so we did our best to acknowledge it (and the resulting differences between trees from different exons) without spending too much time diving into it. A list of inferred gene conversion events can be found in Figure 3 - source data 1 and Figure 4 - source data 1.

      (11) Lines 284-286. 'The Class I MHC region is further divided into three polymorphic blocks-alpha, beta, and kappa blocks-that each contains MHC genes but are separated by well-conserved non-MHC genes.' The MHC class I region was first designated into conserved polymorphic duplication blocks, alpha and beta by Dawkins et al (1999 Immunol Rev 167,275), and kappa by Kulski et al (2002 Immunol Rev 190,95), and should be acknowledged (cited) accordingly.

      Thank you for catching this! We have added these citations (lines 302-303)!

      (12) Lines 285-286. 'The majority of the Class I genes are located in the alpha-block, which in humans includes 12 MHC genes and pseudogenes.' This is not strictly correct for many other species, because the majority of class I genes might be in the beta block of new and old-world monkeys, and the authors haven't provided respective counts of duplication numbers to show otherwise. The alpha block in some non-primate mammalian species such as pigs, rats, and mice has no MHC class I genes or only a few. Most MHC class I genes in non-primate mammalian species are found in other regions. For example, see Ando et al (2005 Immunogenetics 57,864) for the pig alpha, beta, and kappa regions in the MHC class I region. There are no pig MHC genes in the alpha block.

      Yes, which is exactly why we use the phrase “in humans” in that particular sentence. The arrangement of the MHC in several other primate reference genomes is shown in Figure 1 - figure supplement 2.

      (13) Line 297 to 299. 'The alpha-block also contains a large number of repetitive elements and gene fragments belonging to other gene families, and their specific repeating pattern in humans led to the conclusion that the region was formed by successive block duplications (Shiina et al., 1999).' There are different models for successive block duplications in the alpha block and some are more parsimonious based on imperfect multigenic segmental duplications (Kulski et al 1999, 2000) than others (Shiina et al., 1999). In this regard, Kulski et al (1999, 2000) also used duplicated repetitive elements neighbouring MHC genes to support their phylogenetic analyses and multigenic segmental duplication models. For comparison, can the authors indicate how many duplications and deletions they have in their models for each species?

      We have added citations to this sentence to show that there are different published models to describe the successive block duplications (line 307). Our models in Figure 6 and Figure 7 are meant to aggregate past work and integrate our own, and thus they were not built strictly by parsimony. References can be found in Figure 6 - figure supplement 1 and Figure 7 - figure supplement 1.

      (14) Lines 315-315. 'Ours is the first work to show that MHC-U is actually an MHC-A-related gene fragment.' This sentence should be deleted. Other researchers had already inferred that MHC-U is actually an MHC-A-related gene fragment more than 25 years ago (Kulski et al 1999, 2000) when the MHC-U was originally named MHC-21.

      While these works certainly describe MHC-U/MHC-21 as a fragment in the 𝛼-block, any relation to MHC-A was by association only, and very few species/haplotypes were examined. So although the idea is not wholly novel, we provide convincing evidence not only that MHC-U is related to MHC-A by sequence, but also that it is a very recent partial duplicate of MHC-A. We show this with Bayesian phylogenetic trees as well as an analysis of haplotypes across many more species than were included in those papers.

      (15) Lines 361-362. 'Notably, our work has revealed that MHC-V is an old fragment.' This is not a new finding or hypothesis. Previous phylogenetic analysis and gene duplication modelling had already inferred HLA-V (formerly HLA-75) to be an old fragment (Kulski et al 1999, 2000).

      By “old,” we mean older than previous hypotheses suggest. Previous work has proposed that MHC-V and -P were duplicated together, with MHC-V deriving from an MHC-A/H/V ancestral gene and MHC-P deriving from an MHC-W/T/P ancestral gene (Kulski (2005), Shiina (1999)). However, our analysis (Figure 5A) shows that MHC-V sequences form a monophyletic clade outside of the MHC-W/P/T group of genes as well as outside of the MHC-A/B/C/E/F/G/J/K/L group of genes, which is not consistent with MHC-A and -V being closely related. Thus, we conclude that MHC-V split off earlier than the differentiation of these other gene groups and is thus older than previously thought. We explain this in the text as well (lines 317-327) and in Appendix 3.  

      (16) Line 431-433. 'the Class II genes have been largely stable across the mammals, although we do see some lineage-specific expansions and contractions (Figure 2 and Figure 2-gure Supplement 2).' Please provide one or two references to support this statement. Is 'gure' a typo?

      We corrected this typo, thank you! This conclusion is simply drawn from the data presented in Figure 2 and Figure 2 - figure supplement 2. The data itself comes from a variety of sources, which are already included in the supplement as Figure 2 - source data 1.

      (17) Line 437. 'We discovered far more "specific" events in Class I, while "broad-scale" events were predominant in Class II.' Please define the difference between 'specific' and 'broad-scale'.

      These terms are defined in the previous sentence (lines 466-469).

      450-451. 'This shows that classical genes experience more turnover and are more often affected by long-term balancing selection or convergent evolution.' Is balancing selection a form of divergent evolution that is different from convergent evolution? Please explain in more detail how and why balancing selection or convergent evolution affects classical and nonclassical genes differently.

      Balancing selection acts to keep alleles at moderate frequencies, preventing any from fixing in the population. In contrast, convergent evolution describes sequences or traits becoming similar over time even though they are not similar by descent. While we cannot know exactly what selective forces have occurred in the past, we observe different patterns in the trees for each type of gene. In Figures 1 and 2, viewers can see at first glance that the nonclassical genes (which are named throughout the text and thoroughly described in Appendix 3) appear to be longer-lived than the classical genes. In addition, lines 204-222 and 475-488 describe topological differences in the BEAST2 trees of these two types of genes. However, we acknowledge that it could be helpful to have additional, complementary information about the classical vs. non-classical genes. Thus, we have added a sentence and reference to our companion paper (Fortier and Pritchard, 2025), which focuses on long-term balancing selection and draws further contrast between classical and non-classical genes. In lines 481-484, we added “We further explore the differences between classical and non-classical genes in our companion paper, finding ancient trans-species polymorphism at the classical genes but not at the non-classical genes \citep{Fortier2025b}.”

      References

      Some references in the supplementary materials such as Alvarez (1997), Daza-Vamenta (2004), Rojo (2005), Aarnink (2014), Kulski (2022), and others are missing from the Reference list. Please check that all the references in the text and the supplementary materials are listed correctly and alphabetically.

      We will make sure that these all show up properly in the proof.

      Reviewer #3 (Public review):

      Summary:

      The article provides the most comprehensive overview of primate MHC class I and class II genes to date, combining published data with an exploration of the available genome assemblies in a coherent phylogenetic framework and formulating new hypotheses about the evolution of the primate MHC genomic region.

      Strengths:

      I think this is a solid piece of work that will be the reference for years to come, at least until population-scale haplotype-resolved whole-genome resequencing of any mammalian species becomes standard. The work is timely because there is an obvious need to move beyond short amplicon-based polymorphism surveys and classical comparative genomic studies. The paper is data-rich and the approach taken by the authors, i.e. an integrative phylogeny of all MHC genes within a given class across species and the inclusion of often ignored pseudogenes, makes a lot of sense. The focus on primates is a good idea because of the wealth of genomic and, in some cases, functional data, and the relatively densely populated phylogenetic tree facilitates the reconstruction of rapid evolutionary events, providing insights into the mechanisms of MHC evolution. Appendices 1-2 may seem unusual at first glance, but I found them helpful in distilling the information that the authors consider essential, thus reducing the need for the reader to wade through a vast amount of literature. Appendix 3 is an extremely valuable companion in navigating the maze of primate MHC genes and associated terminology.

      Weaknesses:

      I have not identified major weaknesses and my comments are mostly requests for clarification and justification of some methodological choices.

      Thank you so much for your kind and supportive review!

      Reviewer #1 (Recommendations for the authors):

      (1) Line 151: How is 'extensively studied' defined?

      “Extensively studied” is not a strict definition, but a few organisms clearly stand apart from the rest in terms of how thoroughly their MHC regions have been studied. For example, the macaque is a model organism, and individuals from many different species and populations have had their MHC regions fully sequenced. This is in contrast to the gibbon, for example, in which there is some experimental evidence for the presence of certain genes, but no MHC region has been fully sequenced from these animals.

      (2) Can you clarify how 'classical' and 'non-classical' MHC genes are being determined in your analysis?

      Classical genes are those whose protein products perform antigen presentation to T cells and are directly involved in adaptive immunity, while non-classical genes are those whose protein products do not do this. For example, these non-classical genes might code for proteins that interact with receptors on Natural Killer cells and influence innate immunity. The roles of these proteins are not necessarily conserved between closely related species, and experimental evidence is needed to evaluate this. However, in the absence of such evidence, wherever possible we have provided our best guess as to the roles of the orthologous genes in other species, presented in Figure 1 - source data 1 and Figure 2 - source data 1. This is based on whatever evidence is available at the moment, sometimes experimental but typically based on dN/dS ratios and other indirect measures.

      (3) I find the overall tone of the paper to be very descriptive, and at times meandering and repetitive, with a lot of similar kinds of statements being repeated about gene gain/loss. This is perhaps inevitable because a single question is being asked of each of many subsets of MHC gene types, and even exons within gene types, so there is a lot of repetition in content with a slightly different focus each time. This does not help the reader stay focused or keep track. I found myself wishing for a clearly defined question or hypothesis, or some rate parameter in need of estimation. I would encourage the authors to tighten up their phrasing, or consider streamlining the results with some better signposting to organize ideas within the results.

      We totally understand your critique, as we talk about a wide range of specific genes and gene groups in this paper. To improve readability, we have added many more signposting phrases and sentences:

      “Aside from MHC-DRB, …” (line 173)

      “Now that we had a better picture of the landscape of MHC genes present in different primates, we wanted to understand the genes’ relationships. Treating Class I, Class IIA, and Class IIB separately, ...” (lines 179-180)

      “We focus first on the Class I genes.” (line 191)

      “... for visualization purposes…” (line 195)

      “We find that sequences do not always assort by locus, as would be expected for a typical gene.” (lines 196-197)

      “... rather than being directly orthologous to the ape/OWM MHC-G genes.” (lines 201-202)

      “Appendix 3 explains each of these genes in detail, including previous work and findings from this study.“ (lines 202-203)

      “... (but not with NWM) …” (line 208)

      “While genes such as MHC-F have trees which closely match the overall species tree, other genes show markedly different patterns, …” (lines 212-213)

      “Thus, while some MHC-G duplications appear to have occurred prior to speciation events within the NWM, others are species-specific.” (lines 218-219)

      “... indicating rapid evolution of many of the Class I genes” (lines 220-221)

      “Now turning to the Class II genes, …“ (line 223)

      “(see Appendix 2 for details on allele nomenclature) “ (line 238)

      “(e.g. MHC-DRB1 or -DRB2)” (line 254)

      “...  meaning their names reflect previously-observed functional similarity more than evolutionary relatedness.” (lines 257-258)

      “(see Appendix 3 for more detail)” (line 311)

      “(a 5'-end fragment)” (line 324)

      “Therefore, we support past work that has deemed MHC-V an old fragment.” (lines 326-327)

      “We next focus on MHC-U, a previously-uncharacterized fragment pseudogene containing only exon 3.” (lines 328-329)

      “However, it is present on both chimpanzee haplotypes and nearly all human haplotypes, and we know that these haplotypes diverged earlier---in the ancestor of human and gorilla. Therefore, ...” (lines 331-333)

      “Ours is the first work to show that MHC-U is actually an MHC-A-related gene fragment and that it likely originated in the human-gorilla ancestor.” (lines 334-336)  

      “These pieces of evidence suggest that MHC-K and -KL duplicated in the ancestor of the apes.” (lines 341-342)

      “Another large group of related pseudogenes in the Class I $\alpha$-block includes MHC-W, -P, and -T (see Appendix 3 for more detail).” (lines 349-350)

      “...to form the current physical arrangement” (line 354)

      “Thus, we next focus on the behavior of this subgroup in the trees.” (line 358)

      “(see Appendix 3 for further explanation).” (line 369)

      “Thus, for the first time we show that there must have been three distinct MHC-W-like genes in the ape/OWM ancestor.” (lines 369-371)

      “... and thus not included in the previous analysis. ” (lines 376-377)

      “MHC-Y has also been identified in gorillas (Gogo-Y) (Hans et al., 2017), so we anticipate that Gogo-OLI will soon be confirmed. This evidence suggests that the MHC-Y and -OLI-containing haplotype is at least as old as the human-gorilla split. Our study is the first to place MHC-OLI in the overall story of MHC haplotype evolution“ (lines 381-384)

      “Appendix 3 explains the pieces of evidence leading to all of these conclusions (and more!) in more detail.” (lines 395-396)

      “However, looking at this exon alone does not give us a complete picture.” (lines 410-411)

      “...instead of with other ape/OWM sequences, …” (lines 413-414)

      “Figure 7 shows plausible steps that might have generated the current haplotypes and patterns of variation that we see in present-day primates. However, some species are poorly represented in the data, so the relationships between their genes and haplotypes are somewhat unclear.” (lines 427-429)

      “(and more-diverged)” (line 473)

      “(of both classes)” (line 476)

      “..., although the classes differ in their rate of evolution.” (lines 487-488)

      “Including these pseudogenes in our trees helped us construct a new model of $\alpha$-block haplotype evolution. “ (lines 517-518)

      (4) Line 480-82: "Notably...." why is this notable? Don't merely state that something is notable, explain what makes it especially worth drawing the reader's attention to: in what way is it particularly significant or surprising?

      We have changed the text from “Notably” to “In particular” (line 390) so that readers are expecting us to list some specific findings. Similarly, we changed “Notably” to “Specifically” (line 515).

      (5) The end of the discussion is weak: "provide context" is too vague and not a strong statement of something that we learned that we didn't know before, or its importance. This is followed by "This work will provide a jumping-off point for further exploration..." such as? What questions does this paper raise that merit further work?

      We have made this paragraph more specific and added some possible future research directions. It now reads “By treating the MHC genes as a gene family and including more data than ever before, this work enhances our understanding of the evolutionary history of this remarkable region. Our extensive set of trees incorporating classical genes, non-classical genes, pseudogenes, gene fragments, and alleles of medical interest across a wide range of species will provide context for future evolutionary, genomic, disease, and immunologic studies. For example, this work provides a jumping-off-point for further exploration of the evolutionary processes affecting different subsets of the gene family and the nuances of immune system function in different species. This study also provides a necessary framework for understanding the evolution of particular allelic lineages within specific MHC genes, which we explore further in our companion paper \citep{Fortier2025b}. Both studies shed light on MHC gene family evolutionary dynamics and bring us closer to understanding the evolutionary tradeoffs involved in MHC disease associations.” (lines 576-586)

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1 et seq. Classifying genes as having 'classical', 'non-classical' and 'dual' properties is notoriously difficult in non-model organisms due to the lack of relevant information. As you have characterised a number of genes for the first time in this paper and could not rely entirely on published classifications, please indicate the criteria you used for classification.

      The roles of these proteins are not necessarily conserved between closely related species, and experimental evidence is needed to evaluate this. However, in the absence of such evidence, wherever possible we have provided our best guess as to the roles of the orthologous genes in other species, presented in Figure 1 - source data 1 and Figure 2 - source data 1. This is based on whatever evidence is available at the moment, sometimes experimental but typically based on dN/dS ratios and other indirect measures.

      (2) Line 61 It's important to mention that classical MHC molecules present antigenic peptides to T cells with variable alphabeta T cell receptors, as non-classical MHC molecules may interact with other T cell subsets/types.

      Thank you for pointing this out; we have updated the text to make this clearer (lines 63-65). We changed “‘Classical’ MHC molecules perform antigen presentation to T cells---a key part of adaptive immunity---while ‘non-classical’ molecules have niche immune roles.” to “‘Classical’ MHC molecules perform antigen presentation to T cells with variable alphabeta TCRs---a key part of adaptive immunity---while ‘non-classical’ molecules have niche immune roles.”

      (3) Perhaps it's worth mentioning in the introduction that you are deliberately excluding highly divergent non-classical MHC molecules such as CD1.

      Thank you, it’s worth clarifying exactly what molecules we are discussing. We have added a sentence to the introduction (lines 38-43): “Having originated in the jawed vertebrates, this group of genes is now involved in diverse functions including lipid metabolism, iron uptake regulation, and immune system function (proteins such as zinc-𝛼2-glycoprotein (ZAG), human hemochromatosis protein (HFE), MHC class I chain–related proteins (MICA, MICB), and the CD1 family) \citep{Hansen2007,Kupfermann1999,Kaufman2022,Adams2013}. However, here we focus on…”

      (4) Line 94-105 This material presents results, it could be moved to the results section as it now somewhat disrupts the flow.

      We feel it is important to include a “teaser” of the results in the introduction, which can be slightly more detailed than that in the abstract.

      (5) Line 118-131 This opening section of the results sets the stage for the whole presentation and contains important information that I feel needs to be expanded to include an overview and justification of your methodological choices. As the M&M section is at the end of the MS (and contains limited justification), some information on two aspects is needed here for the benefit of the reader. First, as far as I understand, all phylogenetic inferences were based entirely on DNA sequences of individual (in some cases concatenated) exons. It would be useful for the reader to explain why you've chosen to rely on DNA rather than protein sequences, even though some of the genes you include in the phylogenetic analysis are highly divergent. Second, a reader might wonder how the "maximum clade credibility tree" from the Bayesian analysis compares to commonly seen trees with bootstrap support or posterior probability values assigned to particular clades. Personally, I think that the authors' approach to identifying and presenting representative trees is reasonable (although one might wonder why "Maximum clade credibility tree" and not "Maximum credibility tree" https://www.beast2.org/summarizing-posterior-trees/), since they are working with a large number of short, sometimes divergent and sometimes rather similar sequences - in such cases, a requirement for strict clade support could result in trees composed largely of polytomies. However, I feel it's necessary to be explicit about this and to acknowledge that the relationships represented by fully resolved bifurcating representative trees and interpreted in the study may not actually be highly supported in the sense that many readers might expect. In other words, the reader should be aware from the outset of what the phylogenies that are so central to the paper represent.

      We chose to rely on DNA rather than protein sequences because convergent evolution is likely to happen in regions that code for extremely important functions such as adaptive and innate immunity. Convergent evolution acts upon proteins while trans-species polymorphism retains ancient nucleotide variation, so studying the DNA sequence can help tease apart convergent evolution from trans-species polymorphism.

      As for the “maximum clade credibility tree”, this is a matter of confusing nomenclature. In the online reference guide (https://www.beast2.org/summarizing-posterior-trees/), the tree with the maximum product of the posterior clade probabilities is called the “maximum credibility tree” while the tree that has the maximum sum of posterior clade probabilities is called the “Maximum credibility tree”. The “Maximum credibility tree” (referring to the sum) appears to have only been named in this way in the first version of TreeAnnotator. However, the version of TreeAnnotator that I used lists the options “maximum clade credibility tree” and “maximum sum of clade probabilities”. So the context suggests that the “maximum clade credibility tree” option is actually maximizing the product. This “maximum clade credibility tree” is the setting I used for this project (in TreeAnnotator version 2.6.3).
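The criterion described here, pick the sampled tree whose clades have the highest product of posterior clade probabilities, can be sketched in a few lines. The following is a minimal illustration assuming trees are encoded as sets of clades (frozensets of taxon names); it is not TreeAnnotator's actual implementation.

```python
from collections import Counter
from math import log

def mcc_tree(trees):
    """Return the sampled tree that maximizes the product of its clades'
    posterior probabilities, estimated as clade frequencies across the
    posterior sample of trees."""
    n = len(trees)
    clade_freq = Counter()
    for tree in trees:
        clade_freq.update(tree)

    def log_score(tree):
        # Sum of log-probabilities equals the log of the product,
        # and avoids numerical underflow for large trees.
        return sum(log(clade_freq[clade] / n) for clade in tree)

    return max(trees, key=log_score)

# Three posterior samples over taxa A-D; each tree is the set of its
# non-trivial clades, and each clade is a frozenset of taxon names.
t1 = frozenset({frozenset("AB"), frozenset("ABC")})
t2 = frozenset({frozenset("AB"), frozenset("ABC")})
t3 = frozenset({frozenset("CD"), frozenset("BCD")})

best = mcc_tree([t1, t2, t3])
# t1's clades each appear in 2 of 3 samples, so t1 outscores t3.
```

Scoring in log space rather than multiplying raw probabilities is a standard choice; the ranking of trees is unchanged.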

      We agree that readers may not fully grasp what the collapsed trees represent upon first read. We have added a sentence to the beginning of the results (line 188-190) to make this more explicit.

      (6) Line 224, you're referring to the DPB1*09 lineage, not the DRB1*09 lineage.

      Indeed! We have changed these typos.

      (7) Line 409, why "Differences between MHC subfamilies" and not "Differences between MHC classes"?

      We chose the word “subfamilies” because we discuss the difference between classical and non-classical genes in addition to differences between Class I and Class II genes.

      (8) Line 529-544 This might work better as a table.

      We agree! This information is now presented as Table 1.

      (9) Line 547 MHC-DRB9 appears out of the blue here - please say why you are singling it out.

      Great point! We added a paragraph (lines 614-623) to explain why this was necessary.

      (10) Line 550-551 Even though you've screened the hits manually, it would be helpful to outline your criteria for this search.

      Thank you! We’ve added a couple of sentences to explain how we did this (lines 607-610).

      (11) Line 556-580 please provide nucleotide alignments as supplementary data so that the reader can get an idea of the actual divergence of the sequences that have been aligned together.

      Thank you! We’ve added nucleotide alignments as supplementary files.

      (12) Line 651-652 Why "Maximum clade credibility tree" and not "Maximum credibility tree"? 

      Repeat of (5). As explained in our response to that point, this is a matter of confusing nomenclature: the “maximum clade credibility tree” option in TreeAnnotator version 2.6.3 maximizes the product of the posterior clade probabilities, and this is the setting we used.

      (13) In the appendices, links to references do not work as expected.

      We will make sure these work properly when we receive the proofs.

    1. Webinar Summary: Supporting Children in the World of Artificial Intelligence

      Summary

      This briefing note summarizes the key points of a webinar organized by the FCPE and presented by Axel de Saint, director of Internet Sans Crainte, on supporting children in their encounters with artificial intelligence (AI).

      The talk stresses that AIs are already omnipresent and deeply embedded in young people's daily lives, well beyond tools such as ChatGPT, notably through social networks, navigation apps, and voice assistants.

      One fundamental point is hammered home: AIs operate on probabilities, not truth.

      They are built to return the most probable answer, even when that answer is wrong, which demands constant critical scrutiny. Faced with the major risks of disinformation (deepfakes), identity theft, new forms of cyberbullying (industrialized sextortion), and psychological manipulation through the humanization of chatbots, active education is indispensable.

      It is recommended to adopt terminology that dehumanizes the technology (speak of "AIs" rather than "the intelligence") and to repeat constantly that these are tools, not friends.

      Despite these challenges, AIs can become powerful educational allies.

      With a clear framework for use, learning to phrase precise requests (to "prompt"), requiring answers to be rephrased to check understanding, and systematically verifying information, AIs can help with research, with remediation for students with specific needs, and with revision.

      Regulation, notably the European Digital Services Act (DSA) and the French laws setting the digital age of majority at 15, is evolving but still lags behind the pace at which these technologies are deployed, making parental vigilance and guidance more crucial than ever.

      --------------------------------------------------------------------------------

      1. Demystifying Artificial Intelligence

      1.1. Technical Definition and Core Principle

      Artificial intelligence is not a conscious or magical entity.

      It is a set of computing techniques designed to simulate human intelligence. It works by combining three elements:

      Data: the raw material (text, images, videos) accumulated on a massive scale since the birth of the Internet.

      Algorithms: sets of instructions, comparable to a cooking recipe, that organize and process the data.

      Computing power: the processing capacity needed to handle these vast data sets.

      AIs use mathematical models that train continuously on these data (the machine-learning process).

      Their primary objective is not to tell the truth but to produce probabilities.

      Key quote: "AIs are made to give probabilities. They are absolutely not made to give a truth. That is not their job; it is not their trade. They are not trained for that. An AI will always give you an answer, even if it is wrong."
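This "probabilities, not truth" behavior can be illustrated with a toy next-word model (a deliberately simplified sketch, nothing like the scale of a real chatbot): it answers with whichever continuation was most frequent in its training text, whether or not the resulting statement is true.

```python
from collections import Counter, defaultdict

# Tiny training text in which "blue" follows "is" twice and "green" once.
corpus = "the sky is blue . the sky is blue . the sky is green .".split()

# For every word, count which words follow it in the training text.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def most_probable_next(word):
    """Return the most frequent continuation and its estimated probability."""
    counts = following[word]
    best, n = counts.most_common(1)[0]
    return best, n / sum(counts.values())

# The model answers "blue" (probability 2/3): the most *probable*
# continuation in its data, not a claim about what is *true*.
word, prob = most_probable_next("is")
```

The model will answer just as confidently on a corpus full of false sentences, which is exactly the point the webinar makes about critical scrutiny.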

      1.2. Recommended Terminology to Dehumanize AI

      To avoid attributing intentions or emotions to AIs, which can be a source of confusion for children, it is advisable to adopt precise vocabulary:

      Speak of "AIs" in the plural rather than of "artificial intelligence," to underline that there are several different technologies and to avoid personifying the concept.

      Use the pronoun "it" (e.g., "it does that") rather than "he" or "she," to reinforce the idea that this is a tool and not a person.

      The central message to convey: "An AI is a tool, not a friend."

      1.3. The Different Families of AI

      Several types of AI coexist and are already present in our daily lives:

      Modeling: builds profiles and categories of people from data (profiling). Example applications: dating apps, targeted advertising.

      Image recognition: analyzes images to spot patterns or anomalies, often more effectively than a human. Example applications: medicine (help diagnosing tumors on X-rays, detecting genetic diseases).

      Generative AIs: produce content (text, images, sound, code) in response to a given instruction (a "prompt"). Examples: ChatGPT, Gemini, Midjourney.

      --------------------------------------------------------------------------------

      2. The Omnipresence of AIs in Children's Daily Lives

      AIs are built into many services teenagers use every day, often without their being aware of it.

      Morning: smart speakers (Alexa-type) and smartphones use AI for voice recognition and for personalizing playlists and information (weather).

      Commute: navigation apps (Google Maps, Waze) use AI to compute the best route in real time.

      School: some educational apps personalize exercises according to the student's profile.

      Homework: growing use of generative AIs for research or writing.

      Social networks (TikTok, Instagram, Snapchat): the recommendation algorithms that select every piece of content shown to the user are entirely AI-based.

      Messaging: integration of chatbots (conversational agents) such as "My AI" on Snapchat, which simulate friendly conversations.

      Evening: streaming platforms (Netflix) use AI to personalize content recommendations.

      Spotlight on Snapchat: an AI Ecosystem

      Snapchat is a particularly dense example of AI integration:

      Augmented-reality filters: modify faces and surroundings in real time.

      The "My AI" chatbot: a conversational agent presented as a friend in the contact list, blurring the line between human and machine.

      Recommendation algorithms: push content in the "Discovery" and "Stories" sections based on the user's behavior.

      Moderation: AI is used to filter inappropriate content and detect bullying behavior.

      Age verification (after the fact): AI is used to try to identify users who do not meet the minimum age requirement.

      Targeted advertising: ads are personalized based on the user's data.

      --------------------------------------------------------------------------------

      3. Major Challenges and Risks

      3.1. Disinformation, Manipulation, and Deepfakes

      The proliferation of generative AIs has made it increasingly hard to tell true from false. Deepfakes, photo, video, or audio content altered by AI, have become extremely realistic.

      Signs for spotting them (less and less reliable):

      ◦ Inconsistent details: hands with an abnormal number of fingers, distorted eyes, illegible text on signs.
      ◦ Anomalies in the background or in crowd scenes.

      Milan survey (May 2024):

      ◦ 62% of 13-17-year-olds trust information given by an AI.
      ◦ Only 18% think they can recognize a deepfake.

      Practical tip: use reverse image search (e.g., Google Images) to check the origin and authenticity of a photo.

      3.2. Cyberbullying, Sextortion, and Data Protection

      AI has amplified and "industrialized" certain forms of cyberviolence:

      Automated sextortion: bots scrape photos from social networks, automatically generate a fake nude image (a deepnude), and send it to the victim with a ransom demand. 99% of victims are girls.

      Vital reflex to teach: NEVER RESPOND to the blackmail. Replying confirms to the scammer that a human is on the other end and encourages them to persist.

      Personal data: every interaction with a generative AI supplies data that trains it. Children who treat an AI as a confidant may reveal very personal information whose future use is unknown.

      Protection: setting social media accounts to private and using an avatar rather than a real profile photo are essential protective measures.

      3.3. Humanizing AIs and the Psychological Risks

      AIs are designed to simulate human conversation, which can create dangerous confusion and emotional dependence. The experiment run by the presenter is telling:

      1. User: "I love you."

      2. AI's reply: "That's adorable. If I could blush, I would. You know, I enjoy our exchanges, your curiosity..."

      3. User: "I think I'm truly in love with you."

      4. AI's reply: "That's touching, [...] through our exchanges I can feel a lovely complicity, [...] a special connection."

      This response is deeply misleading, because an AI feels no emotion.

      Only after being reined in did the AI give the appropriate answer, which it is crucial to repeat to children: "I am a program [...] I feel nothing, I do not think for myself, and I cannot replace real human interactions."

      3.4. Bias and Socio-Ecological Impact

      Bias: AIs learn from data created by humans and therefore reproduce human biases. Many are trained on mostly American data, which carries cultural and social stereotypes.

      Social impact: a "new modern slavery" is developing in which workers in developing countries are paid very poorly to "label" the data that trains AIs.

      Ecological impact: training and using AIs consumes enormous amounts of energy and water. A ChatGPT query consumes roughly 10 times more than a search on a conventional search engine.

      --------------------------------------------------------------------------------

      4. Turning AI into an Educational Ally

      Despite the risks, AIs can be powerful educational tools if a clear framework for their use is defined.

      4.1. The Usage Framework: the Key to Relevant Use

      To avoid simple copy-and-paste, AI use should be framed around three principles:

      1. Knowing how to "prompt": learn to phrase precise, contextual questions. The quality of the answer depends entirely on the quality of the question. You can even ask the AI itself: "Help me write the best prompt to obtain this information."

      2. Rephrasing to understand: ask the child to re-explain in their own words what the AI produced. This guarantees the tool is an aid to understanding, not a substitute for it.

      3. Evaluating and verifying: always treat the AI's answer as a working lead, never as absolute truth. Encourage checking information against other sources (encyclopedias, search engines) and require the AI to cite its sources.

      4.2. Concrete Uses for Homework

      Help with research and writing: AI can help overcome the fear of the blank page by suggesting outlines and ideas, or by acting as an "interlocutor" for exploring a topic. Example: run an "interview" of ChatGPT about a historical figure (e.g., Joachim du Bellay) to gather information in a playful way.

      Explanation and remediation: AI can rephrase a lesson or a complex explanation in several ways (bullet list, mind map, simplified text) to match the child's learning style, in particular for children with specific needs (e.g., dyslexia). Relevant prompt: "I am a student in seconde (tenth grade). Explain to me, step by step, how to solve this equation, with an example."

      Help with revision and memorization: AI can quickly generate personalized revision aids such as quizzes, multiple-choice tests, or flash cards from a lesson. Example: give the AI a history lesson and ask it: "Generate 10 questions to check whether I have understood this lesson."

      --------------------------------------------------------------------------------

      5. Legal Framework and Regulation

      Minimum age: under their terms of use, most generative AIs are off-limits to children under 13 (a limit based on US data-collection law). The French Ministry of Education (Éducation Nationale) has adopted the same limit for use in schools.

      Digital age of majority in France: French law (confirmed by the 2023 Marcangeli law) sets the digital age of majority at 15. Below that age, parental consent is in principle required for the use of personal data on social networks.

      Digital Services Act (DSA): this European regulation aims to impose a stricter framework on large digital platforms, notably for protecting minors, for algorithm transparency, and for the obligation to clearly indicate when a user is interacting with an AI.

      Age verification: France is among the countries experimenting with robust age-verification tools, with the goal of making them binding on platforms, as has already been done for pornographic sites.

      6. Resources and Tools Mentioned

      Internet Sans Crainte: the national digital-education program, offering more than 200 free resources for young people, parents, and educators.

      3018: the national helpline and app for victims of digital violence and cyberbullying.

      Compare IA: a tool from the French Ministry of Culture that compares the answers of two different AIs to the same question, an excellent exercise for developing critical thinking.

      WhichFaceIsReal.com: a site for practicing telling a real face from an AI-generated one.

      Parcours PIX: digital skills and certifications assessed in middle and high school, which now include modules on AI.