  1. Nov 2025
    1. Reviewer #1 (Public review):

      Summary:

      Dorrego-Rivas et al. investigated two different DA neurons and their neurotransmitter release properties in the main olfactory bulb. They found that the two different DA neurons in mostly glomerular layers have different morphologies as well as electrophysiological properties. The anaxonic DA neurons are able to self-inhibit but the axon-bearing ones are not. The findings are interesting and important to increase the understanding both of the synaptic transmissions in the main olfactory bulb and the DA neuron diversity. However, there are some major questions that the authors need to address to support their conclusions.

      (1) It is known that there are two types of DA neurons in the glomerular layer with different diameters and capacitances (Kosaka and Kosaka, 2008; Pignatelli et al., 2005; Angela Pignatelli and Ottorino Belluzzi, 2017). In this manuscript, the authors need to articulate better which layer the imaging and ephys recordings took place, all glomerular layers or with an exception. Meanwhile, they have to report the electrophysiological properties of their recordings, including capacitances, input resistance, etc.

      (2) It is understandable that recording the DA neurons in the glomerular layer is not easy. However, the authors still need to increase their n's and repeat the experiments at least three times to make their conclusion more solid. For example (but not limited to), Fig 3B, n=2 cells from 1 mouse. Fig.4G, the recording only has 3 cells.

      (3) The statistics also use pseudoreplicates. It might be better to present the biology replicates, too.

      (4) In Figure 4D, the authors report the values in the manuscript. It is recommended to make a bar graph to be more intuitive.

      (5) In Figure 4F and G, although the data with three cells suggest no phenotype, the kinetics looked different. So, the authors might need to explore that aside from increasing the n.

      (6) Similarly, for Figure 4I and J, L and M, it is better to present and analyze it like F and G, instead of showing only the after-antagonist effect.

      Comments on revisions:

      In the rebuttal, the authors argued that it had been extremely hard to obtain recordings stable enough for before-and-after effects on the same cell. Alternatively, they could perform the before-and-after comparison on different cells.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      This Reviewer was positive about the study, stating ‘The findings are interesting and important to increase the understanding both of the synaptic transmissions in the main olfactory bulb and the DA neuron diversity.’ They provided a number of helpful suggestions for improving the paper, which we have incorporated as follows:

      (1) It is known that there are two types of DA neurons in the glomerular layer with different diameters and capacitances (Kosaka and Kosaka, 2008; Pignatelli et al., 2005; Angela Pignatelli and Ottorino Belluzzi, 2017). In this manuscript, the authors need to articulate better which layer the imaging and ephys recordings took place, all glomerular layers or with an exception. Meanwhile, they have to report the electrophysiological properties of their recordings, including capacitances, input resistance, etc.

      We thank the Reviewer for this clarification. Indeed, the two dopaminergic cell types we study here correspond directly to the subtypes previously identified based on cell size. Our previous work showed that axon-bearing OB DA neurons have significantly larger somas than their anaxonic neighbours (Galliano et al. 2018), and we replicate this important result in the present study (Figure 3D). In terms of electrophysiological correlates of cell size, we now provide full details of passive membrane properties in the new Supplementary Figure 4, as requested. Axon-bearing DA neurons have significantly lower input resistance and show a non-significant trend towards higher cell capacitance. Both features are entirely consistent with the larger soma size in this subtype. We apologise for the oversight in not fully describing previous categorisations of OB DA neurons, and have now added this information and the appropriate citations to the Introduction (lines 56 to 59 of the revised manuscript). 

      In terms of cell location, all cells in this study were located in the OB glomerular layer. We sampled the entire glomerular layer in all experiments, including the glomerular/EPL border where the majority of axon-bearing neurons are located (Galliano et al. 2018). This is now clarified in the Materials and Methods section (lines 535 to 537 and 614 to 616 of the revised manuscript).

      (2) It is understandable that recording the DA neurons in the glomerular layer is not easy. However, the authors still need to increase their n's and repeat the experiments at least three times to make their conclusion more solid. For example (but not limited to), Fig 3B, n=2 cells from 1 mouse. Fig.4G, the recording only has 3 cells.

      Despite the acknowledged difficulty of these experiments, we have now added substantial extra data to the study as requested. We have increased the number of cells and animals to further support the following findings:

      Fig 3B: we now have n=5 cells from N=3 mice. We have created a new Supplementary Figure 1 to show all the examples.

      Figure 4G: we now have n=6 cells from N=4 mice.

      Figure 5G: we now have n=3 cells from N=3 mice.

      The new data now provide stronger support for our original conclusions. In the case of auto-evoked inhibition after the application of D1 and D2 receptor antagonists, a nonsignificant trend in the data suggests that, while dopamine is clearly not necessary for the response, it may play a small part in its strength. We have now included this consideration in the Results section (lines 256 to 264 of the revised manuscript).

      (3) The statistics also use pseudoreplicates. It might be better to present the biology replicates, too.

      Indeed, in a study focused on the structural and functional properties of individual neurons, we performed all comparisons with cell as the unit of analysis. This did often (though not always) involve obtaining multiple data points from individual mice, but in these low-throughput experiments n was never hugely bigger than N. The potential impact of pseudoreplicates and their associated within-animal correlations was therefore low. We checked this in response to the Reviewer’s comment by running parallel nested analyses for all comparisons that returned significant differences in the original submission. These are the cases in which we would be most concerned about potential false positive results arising from intra-animal correlations, which nested tests specifically take into account (Aarts et al., 2013). In every instance we found that the nested tests also reported significant differences between anaxonic and axon-bearing cell types, thus fully validating our original statistical approach. We now report this in the relevant section of the Materials and Methods (lines 686 to 691 of the revised manuscript).
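
      The pseudoreplication concern can be illustrated with a minimal sketch. It is not the authors' nested test: as a simpler and more conservative stand-in, it collapses cells to one mean per animal before comparing groups, so that the animal rather than the cell becomes the unit of analysis. The function names, mouse labels and numerical values are all hypothetical and were made up for illustration.

```python
import math
from statistics import mean, variance


def animal_means(cells):
    """Collapse (animal_id, value) cell measurements to one mean per animal."""
    by_animal = {}
    for animal_id, value in cells:
        by_animal.setdefault(animal_id, []).append(value)
    return [mean(values) for values in by_animal.values()]


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical measurements (arbitrary units), NOT data from the study.
anaxonic = [("m1", 50.0), ("m1", 54.0), ("m2", 48.0), ("m3", 52.0), ("m3", 56.0)]
axon_bearing = [("m1", 90.0), ("m2", 96.0), ("m2", 100.0), ("m3", 94.0)]

# Cell as the unit of analysis (n cells), ignoring which mouse each came from.
t_cells, df_cells = welch_t([v for _, v in anaxonic], [v for _, v in axon_bearing])

# Animal as the unit of analysis (N mice), removing within-animal correlation.
t_animals, df_animals = welch_t(animal_means(anaxonic), animal_means(axon_bearing))

print(f"cell-level:   t = {t_cells:.2f}, df = {df_cells:.1f}")
print(f"animal-level: t = {t_animals:.2f}, df = {df_animals:.1f}")
```

      If a difference survives at the animal level, where the degrees of freedom are far fewer, within-animal correlation is unlikely to explain it. A true nested or mixed-effects test keeps the cell-level data while modelling the animal as a grouping factor.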

      (4) In Figure 4D, the authors report the values in the manuscript. It is recommended to make a bar graph to be more intuitive.

      This plot does already exist in the original manuscript. We originally describe these data to support the observation that an auto-evoked inhibition effect exists in anaxonic neurons (now corresponding to lines 240 to 245 of the revised manuscript). We then show them visually in their entirety when we compare them to the lack of response in axon-bearing neurons, depicted in Figure 5C. We still believe that this order of presentation is most appropriate for the flow of information in the paper, so have maintained it in our revised submission.

      (5) In Figure 4F and G, although the data with three cells suggest no phenotype, the kinetics looked different. So, the authors might need to explore that aside from increasing the n.

      We thank the Reviewer for this suggestion. To quantify potential changes in the auto-evoked inhibition response kinetics, we fitted single exponential functions and compared changes in the rate constant (k; Methods, lines 650 to 652 of the revised manuscript). Overall, we observed no consistent or significant change in rate constant values after adding DA receptor antagonists. This finding is now reported in the Results section (lines 260 to 263 of the revised manuscript) and shown in a new Supplementary Figure 3.
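
      As a rough illustration of this kind of kinetic comparison, the sketch below fits a single exponential decay, y ≈ A·exp(−k·t), and reads off the rate constant k. The fitting procedure actually used in the manuscript is not described here, so the log-linear least-squares method, the function name and the synthetic traces are all assumptions made for illustration. The method also assumes a strictly positive signal that decays to zero with no offset.

```python
import math


def fit_rate_constant(times, values):
    """Fit values ~ A * exp(-k * t) by least squares on log(values).

    Returns (A, k). Requires strictly positive values and assumes the
    response decays to zero (no baseline offset).
    """
    logs = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -slope


# Synthetic, noiseless decays standing in for responses recorded before and
# after drug application (hypothetical amplitudes and rate constants).
times = [i * 0.01 for i in range(50)]
before = [100.0 * math.exp(-20.0 * t) for t in times]
after = [80.0 * math.exp(-18.0 * t) for t in times]

a_before, k_before = fit_rate_constant(times, before)
a_after, k_after = fit_rate_constant(times, after)
print(f"before: A = {a_before:.1f}, k = {k_before:.2f}")
print(f"after:  A = {a_after:.1f}, k = {k_after:.2f}")
```

      With real, noisy recordings a nonlinear least-squares fit with a baseline term would usually be preferable, because the log transform gives too much weight to the noisy tail of the decay.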

      (6) Similarly, for Figure 4I and J, L and M, it is better to present and analyze it like F and G, instead of showing only the after-antagonist effect.

      We agree that the ideal scenario would have been to perform the experiments in Figure 4J and 4M the same way as those in Figure 4G, with a before vs after comparison. Unfortunately, however, this was not practically possible. 

      When attempting to apply carbenoxolone to already-patched cells, we found that this drug highly disrupted the overall health and stability of our recordings immediately after its application. This is consistent with previous reports of similar issues with this compound (e.g. Connors 2012, Epilepsy Currents; Tovar et al., 2009, Journal of Neurophysiology). After many such attempts, the total yield of this experiment was one single cell from one animal. Even so, as shown in the traces below, we were able to show that the auto-evoked inhibition response was not eliminated in this specific case:

      Author response image 1.

      Traces of an AEI response recorded before (magenta) and after (green) the application of carbenoxolone (n=1 cell from N=1 mouse).

      In light of these issues, we instead followed published protocols in applying the carbenoxolone directly in the bath without prior recording for 20 minutes (following Samailova et al., 2003, Journal of Neurochemistry) and ran the protocol after that time. Given that our main question was to ask whether gap junctions were strictly necessary for the presence of any auto-evoked inhibition response, our positive findings in these experiments still allowed us to draw clear conclusions.

      In contrast, the issue with the NKCC1 antagonist bumetanide was time. As acknowledged by this Reviewer, obtaining and maintaining high-quality patch recordings from OB DA neurons is technically challenging. Bumetanide is a slow-acting drug when used to modify neuronal chloride concentrations, because in addition to the time it takes to reach the neurons and effectively block NKCC1, the intracellular levels of chloride subsequently change slowly. Studies using this drug in slice physiology experiments typically use an incubation time of at least 20 minutes (e.g. Huberfeld et al., 2007, Journal of Neuroscience), which was incompatible with productive data collection in OB DA neurons. Again, after many unsuccessful efforts, we were forced instead to include bumetanide in the bath without prior recording for 20-30 minutes. As with the carbenoxolone experiment, our goal here was to establish whether auto-evoked inhibition was in any way retained in the presence of this drug, so our positive result again allowed us to draw clear conclusions.

      Reviewer #1 (Recommendations for the authors):

      (1) I suggest the authors reconsider the terminology. For example, they use "strikingly" in their title. The manuscript reported two different transmitter release strategies but not the mechanisms, and the word "strikingly" is not professional, either.

      We appreciate the Reviewer’s attention to clarity and tone in the manuscript title, and have nevertheless decided to retain the original wording. The almost all-or-nothing differences between closely related cell types shown in structural and functional properties here (Figures 3F & 5C) are pronounced, extremely clear and easily spotted – all properties appropriate for the word ‘striking.’ In addition, we note that the use of this term is not at all unprofessional, with a PubMed search for ‘strikingly’ in the title of publications returning over 200 hits.

      (2) Similarly, almost all confocal scopes are 3D because images can be taken at stacks. So "3D confocal" is misleading.

      We understand that this is misleading. We have now replaced the sentence ‘Example snapshot of a 3D confocal stack of…’ by ‘Example confocal images of…’ in all the figure legends that apply.

      (3) It is recommended to present the data in bar graphs with data dots instead of showing the numbers in the manuscript directly.

      We agree entirely, and now present data plots for all comparisons reported in the study (Supplementary Figures 2, 4 and 5).

      Reviewer #2 (Recommendations for the authors):

      (1) Several experiments report notably small sample sizes, such as in Figures 3B and 5G, where data from only 2 cells derived from 1-2 mice are presented. Figures 4E-G also report the experimental result only from 3 cells derived from 3 mice. To enhance the statistical robustness and reliability of the findings, these experiments should be replicated with larger sample sizes.

      As per our response to Reviewer 1’s comment #2 above, and to directly address the concern that some evidence was ‘incomplete’, we have now added significant extra data and analysis to this revised submission (Figures 4 and 5; and Supplementary Figure 1). We believe that this has further enhanced the robustness and reliability of our findings, as requested.

      (2) The authors utilize vGAT-Cre for Figures 1-3 and DAT-tdTomato for Figures 4-5, raising concerns about consistency in targeting the same population of dopaminergic neurons. It remains unclear whether all OB DA neurons express vGAT and release GABA. Clarification and additional evidence are needed to confirm whether the same neuronal population was studied across these experiments.

      Although we indeed used different mouse lines to investigate structural and functional aspects of transmitter release, we can be very confident that both approaches allowed us to study the same two distinct DA cell types being compared in this paper. Existing data to support this position are already clear and strong, so in this revision we have focused on the Reviewer’s suggestion to clarify the approaches we chose.

      First, it is well characterised that in mouse and many other species all OB DA neurons are also GABAergic. This has been demonstrated comprehensively at the level of neurochemical identity and in terms of dopamine/GABA co-release, and is true across both small-soma/anaxonic and large-soma/axon-bearing subclasses (Kosaka & Kosaka 2008; 2016; Maher & Westbrook 2008; Borisovska et al., 2013; Vaaga et al., 2016; Liu et al. 2013). To specifically confirm vGAT expression, we have also now provided additional single-cell RNAseq data and immunohistochemical label in a revised Figure 1 (see also Panzanelli et al., 2007, now referenced in the paper, who confirmed endogenous vGAT colocalisation in TH-positive OB neurons). Most importantly, by using vGAT-cre mice here we were able to obtain sufficient numbers of both anaxonic and axon-bearing DA neurons among the vGAT-cre-expressing OB population. We could unambiguously identify these cells as dopaminergic because of their expression of TH protein which, due to the absence of noradrenergic neurons in the OB, is a specific and comprehensive marker for dopaminergic cells in this brain region (Hokfelt et al., 1975; Rosser et al., 1986; Kosaka & Kosaka 2016). Crucially, both axon-bearing and anaxonic OB DA subtypes strongly express TH (Galliano et al., 2018, 2021). We have now added additional text to the relevant Results section (lines 99 to 108 of the revised manuscript) to clarify these reasons for studying vGAT-cre mice here.

      We were also able to clearly identify and sample both subtypes of OB DA neuron using DAT-tdT mice. Our previous published work has thoroughly characterised this exact mouse line at the exact ages studied in the present paper (Galliano et al., 2018; Byrne et al., 2022). We know that DAT-tdT mice provide rather specific label for TH-expressing OB DA neurons (75% co-localisation; Byrne et al., 2022), but most importantly we know which non-DA neurons are labelled in this mouse line and how to avoid them. All non-TH-expressing but tdT-positive cells in juvenile DAT-tdT mice are small, dimly fluorescent and weakly spiking neurons of the calretinin-expressing glomerular subtype (Byrne et al., 2022). These cells are easily detected during physiological recordings, and were excluded from our study here. This information is now provided in the relevant Methods section (lines 616 to 619 of the revised manuscript, also referenced in lines 236 to 240 of the Results section), and we apologise for its previous omission. Finally, we have shown both structurally and functionally that both axon-bearing and anaxonic OB DA subtypes are labelled in DAT-tdT mice (Galliano et al., 2018; Tufo et al., 2025; present study). Overall, these additional clarifications firmly establish that the same neuronal populations were indeed studied across our experiments.

      (3) The low TH+ signal in Figure 1D raises questions regarding the successful targeting of OB DA neurons. Further validation, such as additional staining, is required to ensure that the targeted neurons are accurately identified.

      As noted in our response to the previous comment, TH is a specific marker for dopaminergic neurons in the mouse OB, and is widely used for this purpose. Labelling for TH in our tissue is extremely reliable, and in fact gives such strong signal that we were forced to reduce the primary antibody concentration to 1:50,000 to prevent bleedthrough into other acquisition channels. Even at this concentration it was extremely straightforward to unambiguously identify TH-positive cells based on somatic immunofluorescence. We recognise, however, that the original example image in Figure 1D was not sufficiently clear, and have now provided a new example which illustrates the TH-based identification of these cells much more effectively. 

      (4) Estimating the total number of dopaminergic neurons in the olfactory bulb, along with the relative proportions of anaxonic and axon-bearing neuron subtypes, would provide valuable context for the study. Presenting such data is crucial to underscore the biological significance of the findings.

      This information has already been well characterised in previous studies. Total dopaminergic cell number in the OB is ~90,000 (Maclean & Shipley, 1988; Panzanelli et al., 2007; Parrish-Aungst et al., 2007). In terms of proportions, anaxonic neurons make up the vast majority of these cells, with axon-bearing neurons representing only ~2.5% of all OB dopaminergic neurons at P28 (Galliano et al., 2018). Of course, the relatively low number of the axon-bearing subtype does not preclude its having a potentially large influence on glomerular networks and sensory processing, as demonstrated by multiple studies showing the functional effects of inter-glomerular inhibition (Kosaka & Kosaka, 2008; Liu et al., 2013; Whitesell et al., 2013; Banerjee et al., 2015). This information has now been added to the Introduction (line 47 and lines 59 to 62 of the revised manuscript).

      (5) The authors report that in-utero injection was performed based on the premise that the two subclasses of dopaminergic neurons in the olfactory bulb are generated during embryonic development. However, it remains unclear whether in-utero injection is essential for distinguishing between these two subclasses. While the manuscript references a relevant study, the explanation provided is insufficient. A more detailed justification for employing in-utero injection would enhance the manuscript's clarity and methodological rigor.

      We apologise for the lack of clarity in explaining the approach. In utero injection is not absolutely essential for distinguishing between the two subclasses, but it does have two major advantages. 1) Because infection happens before cells migrate to their final positions, it produces sparse labelling which permits later unambiguous identification of individual cells’ processes; and 2) Because both subclasses are generated embryonically (compared to the postnatal production of only anaxonic DA neurons), it allows effective targeting of both cell types. We have now expanded the relevant section of the Results to explain the rationale for our approach in more detail (lines 109 to 116 of the revised manuscript).

      (6) In Figures 1A and 4A, it appears that data from previously published studies were utilized to illustrate the differential mRNA expression in dopaminergic neurons of the olfactory bulb. However, the Methods section and the manuscript lack a detailed description of how these dopaminergic neurons were classified or analyzed. Given that these figures contribute to the primary dataset, providing additional explanation and context is essential to ensure clarity of the findings.

      We apologise for the lack of clarity. We have now extended the part of the methods referring to the RNAseq data analysis (lines 666 to 678 of the revised manuscript). 

      (7) In Figure 2C, anaxonic dopamine neurons display considerable variability in the number of neurotransmitter release sites, with some neurons exhibiting sparse sites while others exhibit numerous sites. The authors should address the potential biological or methodological reasons for this variability and discuss its significance.

      We thank the Reviewer for highlighting this feature of our data. We have now outlined potential methodological reasons for the variability, whilst also acknowledging that it is consistent with previous reports of presynaptic site distributions in these cells (Kiyokage et al., 2017; Results, lines 169 to 172 of the revised manuscript). We have also added a brief discussion of the potential biological significance (Discussion, lines 446 to 450).

      (8) In the images used to differentiate anaxonic and axon-bearing neurons, the soma, axons, and dendrites are intermixed, making it difficult to distinguish structures specific to each subclass. Employing subclass-specific labeling or sparse labeling techniques could enhance clarity and accuracy in identifying these structures.

      Distinguishing these structures is indeed difficult, and was the main reason we used viral label to produce sparse labelling (see response to comment #5 above). In all cases we were extremely careful, including cells only when we could be absolutely certain of their anaxonic or axon-bearing identity, and could also be certain of the continuity of all processes. Crucially, while the 2D representations we show in our figures may suggest a degree of intermixing, we performed all analyses on 3D image stacks, significantly improving our ability to accurately assign structures to individual cells. We have now added extra descriptions of this approach in the relevant Methods section (lines 546 to 548 of the revised manuscript).

      (9) In Figure 3, the soma area and synaptophysin puncta density are compared between axon-bearing and anaxonic neurons. However, the figure only presents representative images of axon-bearing neurons. To ensure a fair and accurate comparison, representative images of both neuron subtypes should be included.

      The original figures did include example images of puncta density (or lack of puncta) in both cell types (Figure 2B and Figure 3E). For soma area, we have now included representative images of axon-bearing and anaxonic neurons with an indication of soma area measurement in a new Supplementary Figure 2A.

      (10) In Figure 4B, the authors state that gephyrin and synaptophysin puncta are in 'very close proximity.' However, it is unclear whether this proximity is sufficient to suggest the possibility of self-inhibition. Quantifying the distance between gephyrin and synaptophysin puncta would provide critical evidence to support this claim. Additionally, analyzing the distribution and proportion of gephyrin-synaptophysin pairs in close proximity would offer further clarity and strengthen the interpretation of these findings.

      We thank the Reviewer for raising this issue. We entirely agree that the example image previously shown did not constitute sufficient evidence to claim either close proximity of gephyrin and synaptophysin puncta, nor the possibility of self-inhibition. We are not in a position to perform a full quantitative analysis of these spatial distributions, nor do we think this is necessary given previous direct evidence for auto-evoked inhibition in OB dopaminergic cells (Smith and Jahr, 2002; Murphy et al., 2005; Maher and Westbrook, 2008; Borisovska et al., 2013) and our own demonstration of this phenomenon in anaxonic neurons (Figure 4). We have therefore removed the image and the reference to it in the text. 

      (11) In Figures 4J and 4M, the effects of the drugs are presented without a direct comparison to the control group (baseline control?). Including these baseline control data is essential to provide a clear context for interpreting the drug effects and to validate the conclusions drawn from these experiments.

      We appreciate the Reviewer’s attention to this important point. As this concern was also raised by Reviewer 1 (their point #6), we have provided a detailed response fully addressing it in our replies to Reviewer 1 above. 

      (12) In Lines 342-344, the authors claim that VMAT2 staining is notoriously difficult. However, several studies (e.g., Weihe et al., 2006; Cliburn et al., 2017) have successfully utilized VMAT2 staining. Moreover, Zhang et al., 2015 - a reference cited by the authors - demonstrates that a specific VMAT2 antibody effectively detects VMAT2. Providing evidence of VMAT2 expression in OB DA neurons would substantiate the claim that these neurons are GABA-co-releasing DA neurons and strengthen the study's conclusions.

      As noted in response to this Reviewer’s comment #2 above, there is clear published evidence that OB DA neurons are GABA- and dopamine-releasing cells. These cells are also known to express VMAT2 (Cave et al., 2010; Borisovska et al., 2013; Vergaña-Vera et al., 2015). We do not therefore believe that additional evidence of VMAT2 expression is necessary to strengthen our study’s conclusions. We did make every effort to label VMAT2-positive release sites in our neurons, but unfortunately all commercially available antibodies were ineffective. The successful staining highlighted by the Reviewer was either performed in the context of virally driven overexpression (Zhang et al., 2015) or was obtained using custom-produced antibodies (Weihe et al., 2006; Cliburn et al., 2017). We have now modified the Discussion text to provide more clarification of these points (lines 393 to 395 of the revised manuscript).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This paper investigates the physical mechanisms underlying cell intercalation, which then enables collective cell flows in confluent epithelia. The authors show that T1 transitions (the topological transitions responsible for cell intercalation) correspond to the unbinding of groups of hexatic topological defects. Defect unbinding, and hence cell intercalation and collective cell flows, are possible when active stresses in the tissue are extensile. This result helps to rationalize the observation that many epithelial cell layers have been found to exhibit extensile active nematic behavior.

      Strengths

      The authors obtain their results based on a combination of active hexanematic hydrodynamics and a multiphase field (MPF) model for epithelial layers, whose connection is a strength of the paper. With the hydrodynamic approach, the authors find the active flow fields produced around hexatic topological defects, which can drive defect unbinding. Using the MPF simulations, the authors show that T1 transitions tend to localize close to hexatic topological defects.

      We are grateful to Reviewer #1 for appreciating and highlighting the strengths of our work.

      Weaknesses

      Citations are sometimes not comprehensive. Cases of contractile behavior found in collective cell flows, which would seemingly contradict some of the authors’ conclusions, are not discussed.

      I encourage the authors to address the comments and questions below.

      We are thankful to Reviewer #1 for their questions and comments. We have addressed them point by point below, and have amended the manuscript accordingly.

      (1) In Equation 1, what do the authors mean by the cluster’s size ℓ? How is this quantity defined? The calculations in the Methods suggest that ℓ indicates the distance between the p-atic defects and the center of the T1 cell cluster, but this is not clearly defined.

      We thank Reviewer #1 for their question. We define the cluster size as the initial distance between the center of the quadrupole and any defect (see Methods). In a primary cell cluster, where cells themselves are the defects, the cluster’s size is the distance between the center of the central junction and the center of any cell in the cluster. Hence, this is half the diameter of a cell which, for example in a typical, confluent MDCK epithelial monolayer, would be about 10 µm. We have added this clarification in the definition of the cluster size, above Eq. (1).

      (2) The multiphase field model was developed and reviewed already, before the Loewe et al. 2020 paper that the authors cite. Earlier papers include Camley et al. PNAS 2014, Palmieri et al. Sci. Rep. 2015, Mueller et al. PRL 2019, and Peyret et al. Biophys. J. 2019, as reviewed in Alert and Trepat. Annu. Rev. Condens. Matter Phys. 2020.

      We thank the referee for their suggestion to incorporate further MPF literature. We have done so in the amended manuscript.

      (3) At what time lag is the mean-squared displacement in Figure 3f calculated? How does the choice of a lag time affect these data and the resulting conclusions?

      The scatter plot in Fig. 3f was constructed by dividing the system into square subregions of size ∆ℓ = 35 l.u., each containing approximately 4 cells. For each subregion, we analyzed a time window of ∆t = 25 × 10³ iterations, measuring both the normalized mean square displacement of cells (relative to the subregion area ∆ℓ²) and the average defect density. The normalized displacement is calculated as m.s.d. = ⟨|r(t∗ + ∆t) − r(t∗)|²⟩/∆ℓ², where r is the position of a cell, the average runs over the cells in the subregion, and t∗ denotes the start time of the observation window. We chose the time window ∆t used to compute the mean square displacement to match the characteristic duration of T1 events and defect lifetimes in our simulations. Observation times much longer (∆t > 35 × 10³) than the typical T1 event duration would cause the two sets of data points to merge into a single group, suggesting no correlation between cell motility and defect density beyond the defect lifetime.
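
      The subregion measurement described above can be sketched as follows. This is an illustrative reading of the text, not the authors' analysis code: it assumes each cell is assigned to the subregion that contains it at the start time t∗, that displacements are taken between t∗ and t∗ + ∆t, and that periodic boundaries can be ignored. The function name and the cell positions are hypothetical.

```python
def normalized_msd(start_positions, end_positions, box_size):
    """Mean squared displacement per square subregion, divided by box_size**2.

    Each cell is binned by its position at the start of the time window.
    Returns a dict mapping subregion index (ix, iy) to the normalized m.s.d.
    """
    sums, counts = {}, {}
    for (x0, y0), (x1, y1) in zip(start_positions, end_positions):
        key = (int(x0 // box_size), int(y0 // box_size))
        sums[key] = sums.get(key, 0.0) + (x1 - x0) ** 2 + (y1 - y0) ** 2
        counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] / box_size ** 2 for key in sums}


# Hypothetical cell centers in lattice units at t* and at t* + dt.
start = [(5.0, 5.0), (20.0, 10.0), (40.0, 5.0), (60.0, 30.0)]
end = [(8.0, 9.0), (20.0, 17.0), (40.0, 5.0), (67.0, 30.0)]

result = normalized_msd(start, end, box_size=35.0)
for key in sorted(result):
    print(key, round(result[key], 4))
```

      Pairing each subregion's value with its average defect density over the same window would give one point of a scatter plot like the one described for Fig. 3f.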

      (4) The authors argue that their results provide an explanation for the extensile behavior of cell layers. However, there are also examples of contractile behavior, such as in Duclos et al., Nat. Phys., 2017 and in Pérez-González et al., Nat. Phys., 2019. In both cases, collective cell flows were observed, which in principle require cell intercalations. How would these observations be rationalized with the theory proposed in this paper? Can these experiments and the theory be reconciled?

      The contractile or extensile nature of stress in epithelia depends crucially on the specific tissue type and its biological context. Different cell populations, depending on their position along the epithelial/mesenchymal spectrum, can exhibit either contractile or extensile behaviors. Our theory applies to tissues where hexatic order dominates at the cellular scale, particularly in confluent systems where neighbor exchanges occur primarily through T1 transitions. In contrast, the systems studied by Duclos et al., Nat. Phys. (2018) and Pérez-González et al., Nat. Phys. (2019) exhibit nematic order at the cellular level, meaning their dynamics are governed by fundamentally different mechanisms. Since our framework is derived for hexatic-dominated tissues, it does not directly apply to those cases, though the hybrid hexanematic description previously developed by some of the authors in Armengol-Collado et al. eLife 13:e86400 (2024) could help reconcile these observations. In general, a key distinction must be made between the contractility of individual cells and the extensile/contractile nature of the collective force network. To illustrate this, consider a cell exerting a 6-fold symmetric force distribution: each vertex force arises from an imbalance in junctional tensions with neighboring cells, which are themselves contractile due to actomyosin activity. However, the resulting vertex forces can be either contractile or extensile depending on network geometry and tension distribution. This is captured in our coarse-grained description [see Armengol-Collado et al. eLife 13:e86400 (2024)], where the active stress emerges from higher-order moments of cellular forces. Specifically, the deviatoric part of the hexatic active stress tensor is determined by the cell radius, the cell number density and the intensity of cellular tension. The negative sign of the coefficient of the active stress shows that the active stress is extensile, consistent with observations in various epithelial systems (e.g., Saw et al., Nature 2017; Blanch-Mercader et al., Phys. Rev. Lett. 2018). Finally, we note that the connection between cellular-scale forces and large-scale extensility has been rationalized in other contexts, such as active nematics (Balasubramaniam et al., Nat. Mater. 2021).

      Reviewer #2 (Public Review):

      This paper studies the role of hexatic defects in the collective migration of epithelia. The authors emphasize that epithelial migration is driven by cell intercalation events and not just isolated T1 events, and analyze this through the lens of hexatic topological defects. Finally, the authors study the effect of active and passive forces on the dynamics of hexatic defects using analytical results, and numerical results in both continuum and phase-field models.

      The results are very interesting and highlight new ways of studying epithelial cell migration through the analysis of the binding and unbinding of hexatic defects.

      We are grateful to Reviewer #2 for their interest and for emphasizing the novelty of our work.

      Strengths

      (1) The authors convincingly argue that intercalation events are responsible for collective cell migration, and that these events are accompanied by the formation and unbinding of hexatic topological defects.

      (2) The authors clearly explain the dynamics of hexatic defects during T1 transitions, and demonstrate the importance of active and passive forces during cell migration.

      (3) The paper thoroughly studies the T1 transition through the viewpoint of hexatic defects. A continuum model approach to study T1 transitions in cell layers is novel and can lead to valuable new insights.

      We thank the Reviewer for their kind and supportive words, and for highlighting the clarity, persuasiveness, and thoroughness of our work.

      Weaknesses

      (1) The authors could expand on the dynamics of existing hexatic defects during epithelial cell migration, in addition to how they are created during T1 transitions.

      We thank the referee for their comment. The detailed analysis of dislocation-pair unbinding modes and their statistical impact on the transition to collective migration is comprehensively addressed in our subsequent work Puggioni et al., arXiv:2502.09554. In the present study, we focus specifically on the fundamental mechanism enabling dislocation unbinding: active extensile stresses generate flows that drive dislocation pairs apart, while passive elastic stresses tend to pull them together (Krommydas et al., Phys. Rev. Lett. 2023; Armengol-Collado et al., arXiv:2502.13104). When active forces dominate over passive restoring forces, the dislocations unbind. This represents a crucial distinction from classical Berezinskii–Kosterlitz–Thouless or Kosterlitz–Thouless–Halperin–Nelson–Young transitions, where thermal fluctuations drive defect unbinding. In our system, the process is fundamentally activity-driven. Nevertheless, the resulting state, characterized by unbound defects and collective migration, bears a strong analogy to the melting transition in equilibrium systems. We emphasize that the dynamics of passive defects has been previously examined in Krommydas et al., Phys. Rev. Lett. 2023. A discussion of these aspects can be found in the Appendix “Numerical simulations of defect annihilation and unbinding”.

      (2) The different terms in the MPF model used to study cell layer dynamics are not fully justified. In particular, it is not clear why the model includes self-propulsion and rotational diffusion in addition to nematic and hexatic stresses, and how these quantities are related to each other.

      We thank the referee for their comment. The MPF model’s terms (e.g., self-propulsion, rotational diffusion) reflect the stochastic, deformable nature of cells as active droplets migrating with near-constant speed. We emphasize that self-propulsion is the only non-equilibrium mechanism in our model; no additional active stresses (nematic or hexatic) are imposed. We have clarified this point in the revised manuscript and expanded our discussion of the MPF model.

      (3) The authors could provide some physical intuition on what an active extensile or contractile term in the hexatic order parameter means, and how this is related to extensility and contractility in active nematics and/or for cell layers.

      We thank the referee for their comment. As we explain in the reply to comment (4) of Reviewer #1, the contractile or extensile nature of stress in epithelia depends crucially on the specific tissue type and its biological context. Different cell populations, depending on their position along the epithelial/mesenchymal spectrum, can exhibit either contractile or extensile behaviors. Our theory applies to tissues where hexatic order dominates at the cellular scale, particularly in confluent systems where neighbor exchanges occur primarily through T1 transitions. In contrast, the systems studied by Duclos et al., Nat. Phys. (2018) and Pérez-González et al., Nat. Phys. (2019) exhibit nematic order at the cellular level, meaning their dynamics are governed by fundamentally different mechanisms. Since our framework is derived for hexatic-dominated tissues, it does not directly apply to those cases, though the hybrid hexanematic description previously developed by some of the authors in Armengol-Collado et al. eLife 13:e86400 (2024) could help reconcile these observations. In general, a key distinction must be made between the contractility of individual cells and the extensile/contractile nature of the collective force network. To illustrate this, consider a cell exerting a 6-fold symmetric force distribution: each vertex force arises from an imbalance in junctional tensions with neighboring cells, which are themselves contractile due to actomyosin activity. However, the resulting vertex forces can be either contractile or extensile depending on network geometry and tension distribution. This is captured in our coarse-grained description [see Armengol-Collado et al. eLife 13:e86400 (2024)], where the active stress emerges from higher-order moments of cellular forces. Specifically, the deviatoric part of the hexatic active stress tensor is expressed in terms of the cell radius, the cell number density and the intensity of cellular tension. The negative sign of the coefficient of the active stress shows that the active stress is extensile, consistent with observations in various epithelial systems (e.g., Saw et al., Nature 2017; Blanch-Mercader et al., Phys. Rev. Lett. 2018). Finally, we note that the connection between cellular-scale forces and large-scale extensility has been rationalized in other contexts, such as active nematics (Balasubramaniam et al., Nat. Mater. 2021).

      Recommendations for the Authors: Reviewer #2 (Recommendations for the Authors):

      (1) The authors point out that hexatic topological defects are produced in quadrupoles (L109). Does this also mean that these defects can be annihilated only in quadrupoles as well? In the same vein, are hexatic defects always bound in pairs, as suggested by the schematics, or is it possible to observe an isolated hexatic defect?

      We thank the referee for their question. Hexatic disclinations (the defect monopoles discussed in this work), much like electrons and positrons, can annihilate in any charge-neutral configuration (dipole, quadrupole, octupole, etc.). Unbinding a pair of hexatic disclinations, however, costs much more energy than unbinding a quadrupole into dipoles. Hence isolated defects appear in abundance only in the late, fully disordered phase, where the system has completely “melted”. For more details on how defect unbinding modes affect tissue dynamics, please see our subsequent work Puggioni et al., arXiv:2502.09554.

      (2) Could you clarify if the flows described in Figures 2(a)-(b), panel (i) are driven by a passive backflow term without activity? Could you compare the magnitudes of these flows compared to the typical active terms?

      We thank the referee for their question. In panel 2(b) there is only passive backflow. In panel 2(a), by contrast, both terms are included, in a parameter regime where the active flow overcomes the passive flow (and hence the active force overcomes the passive force, as delineated in the Discussion section). The magnitude of the passive flows is studied in detail in our previous work, Krommydas et al., Phys. Rev. Lett. (2023).

      (3) Could you clarify how the continuum hexatic model and MPF model are related to each other? What are the similarities and differences in the dynamics of these models?

      We thank the referee for this insightful question. A key point of our work is precisely that the continuum hexatic model and the MPF (Multi-Phase Field) model are distinct in nature.

      The MPF model is an established agent-based framework used to simulate tissue dynamics at the cellular level. It captures individual cell behaviors and interactions through phase-field variables. In our work, we use the MPF model as a benchmark to extract statistical features of tissue dynamics, such as defect motion and orientational correlations. In contrast, our continuum hexatic model is a coarse-grained hydrodynamic theory that describes the dynamics of orientational order in active tissues. It is built on symmetry principles and conservation laws, and it does not rely on microscopic cell-level details. Instead, it captures the collective behavior of the system through a hexatic order parameter and its coupling to flow and activity.
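
      For readers less familiar with the quantity, a minimal sketch of a hexatic order parameter is given below. It uses the textbook bond-orientational definition, the average of exp(6iθ) over the bonds joining a cell to its neighbours. This is an assumption made for illustration and not necessarily the definition used in the manuscript; the function name `psi6` and its inputs are hypothetical.

```python
import numpy as np

def psi6(center, neighbors):
    """Local 6-fold bond-orientational order parameter for one cell.

    center:    (2,) position of the cell centre (hypothetical input)
    neighbors: (k, 2) positions of its k neighbouring cell centres
    Returns a complex number whose modulus is 1 for a perfect
    hexagonal arrangement of neighbours.
    """
    center = np.asarray(center, dtype=float)
    neighbors = np.asarray(neighbors, dtype=float)
    bonds = neighbors - center                     # vectors to each neighbour
    theta = np.arctan2(bonds[:, 1], bonds[:, 0])   # bond angles
    return np.exp(6j * theta).mean()               # average of exp(6 i theta)
```

      A modulus close to 1 indicates strong local 6-fold order, and the phase divided by 6 gives the local hexatic orientation (defined modulo 60 degrees).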

      Despite their conceptual differences, the MPF model and our hydrodynamic theory exhibit similar statistical features. This agreement—also observed in the independent study by Jain et al. (Phys. Rev. Res. 2024)—provides strong support for the validity and generality of our continuum description.

      (4) When multiple references by the same author and year are cited using alphabets, the second alphabet is not in bold e.g. Giomi et al., 2022b, a in Line 75, and others.

      We are grateful to the referee for carefully going through the manuscript and pointing out these typos. We have corrected them in the amended manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, the authors discuss epithelial tissue fluidity from a theoretical perspective. They focus on the description of topological transitions whereby cells change neighbors (T1 transitions). They explain how such transitions can be described by following the fate of hexatic defects. They first focus on a single T1 transition and the surrounding cells using a hydrodynamic model of active hexatics. They show that successful T1 intercalations, which promote tissue fluidity, require a sufficiently large extensile hexatic activity in the neighborhood of the cells attempting a T1 transition. If such activity is contractile or not sufficiently extensile, the T1 is reversed, hexatic defects annihilate, and the epithelial network configuration is unchanged. They then describe a large epithelium, using a phase field model to describe cells. They show a correlation between T1 events and hexatic defects unbinding, and identify two populations of T1 cells: one performing T1 cycles (failed T1), and not contributing to tissue migration, and one performing T1 intercalation (successful T1) and leading to the collective cell migration.

      Strengths

      The manuscript is scientifically sound, and the variety of numerical and analytical tools they use is impressive. The approach and results are very interesting and highlight the relevance of hexatic order parameters and their defects in describing tissue dynamics.

      We thank the Reviewer for recognizing the scientific soundness of the manuscript, the breadth of numerical and analytical tools employed, as well as their interest in our work.

      Weaknesses

      (1) Goal and message of the paper. (a) In my opinion, the article is mainly theoretical and should be presented as such. For instance, their conclusions and the consequences of their analysis in terms of biology are not extremely convincing, although they would be sufficient for a theory paper oriented to physicists or biophysicists. The choice of journal and potential readership should be considered, and I am wondering whether the paper structure should be re-organized, in order to have side-by-side the methods and the results, for instance (see also below).

      We thank the referee for their criticism. In response, we have made an effort to reword certain parts of the manuscript. As with any theoretical study, the biological implications of our work can only be fully assessed through experimental validation — a prospect we look forward to. Nevertheless, we have submitted our work to the subsection of Physics of Life, which we believe is perfectly suited to our content.

      (b) Currently, the two main results sections are somewhat disconnected, because they use different numerical models, and because the second section only marginally uses the results from the first section to identify/distinguish T1.

      We thank the referee for their comment. In the second section we use statistics from the MPF model to support the analytical and numerical findings of our hydrodynamic theory of cell intercalation. Since our submission, further qualitative evidence has been brought to light in the work of Jain et al. (Phys. Rev. Res. 2024).

      (2) Quite surprisingly, the authors use a cell-based model to describe the macroscopic tissue-scale behavior, and a hydrodynamic model to describe the cell-based events. In particular, their hydrodynamic description (the active hexatic model) is supposed to be a coarse-grained description, valid to capture the mesoscopic physics, and yet, they use it to describe cell-scale events (T1 transitions). For instance, what is the meaning of the velocity field they are discussing in Figure 2? This makes me question the validity of the results of their first part.

      We thank the referee for their comment. There are many excellent discrete models of epithelial tissues in the literature (e.g., Bi et al., Phys. Rev. X 2016; Pasupalak et al., Soft Matter 2020; Graner et al., Phys. Rev. Lett. 1992), each capturing essential biological features such as cell division, apoptosis and sorting. While these models have provided invaluable insights, our work takes a different approach by developing a continuum theory aimed at describing epithelial dynamics at two levels: (1) mesoscopic intercalation events and (2) macroscopic collective migration. Crucially, our goal is not to replicate a specific discrete model, which would risk constructing a “model of a model”, but rather to derive a hydrodynamic description of tissue dynamics grounded in symmetry principles and conservation laws. Following this logic, the velocity field in our theory should be interpreted as an Eulerian (continuum) velocity, representing the coarse-grained flow of the tissue rather than the Lagrangian motion of individual cells. This distinction is central to our framework, which operates at scales where cellular details are averaged out, yet retains the essential physics of hexatic order and active stresses. We validate our predictions against the Multiphase Field (MPF) model. [We thank Reviewer 1 for their suggestion to incorporate further MPF literature.] Furthermore, Jain et al. (Phys. Rev. Res. 2024) have used the MPF model to predict flow patterns around T1 transitions and obtained results compatible with those of our hydrodynamic theory. From this comparison we can conclude that both the MPF model and our theory are able to capture the same aspect of cell intercalation in epithelial layers. This, however, does not imply that other discrete models of epithelia can reproduce this aspect too, nor that our theory is specifically tailored to the MPF model. We have clarified these points in the revised manuscript and expanded our discussion of the MPF model.

      (3) The quality of the numerical results presented in the second part (phase field model) could be improved. (a) In terms of analysis of the defects. It seems that they have all the tools to compare their cell-resolved simulations and their predictions about how a T1 event translates into defects unbinding. However, their analysis in Figure 3e is relatively minimal: it shows a correlation between T1 cells and defects. But it says nothing about the structure and evolution of the defects, which, according to their first section, should be quite precise.

      We thank the referee for their comment. Further qualitative evidence has been brought to light in the work of Jain et al. (Phys. Rev. Res. 2024), where the exact flow pattern predicted by our hydrodynamic theory is obtained, in the MPF model, around cells undergoing T1 rearrangements.

      (b) In terms of clarity of the presentation. For instance, in Figure 3f, they plot the mean-square displacement as a function of a defect density. I thought that MSD was a time-dependent quantity: they must therefore consider MSD at a given time, or averaged over time. They should be explicit about what their definition of this quantity is.

      We thank the referee for raising this point. As clarified in our response to Reviewer #1, point (3), the mean square displacement (MSD) plotted in Fig. 3f is computed over a fixed time window of Δt = 25 × 10³ iterations, chosen to match the typical duration of T1 events and defect lifetimes. The MSD is normalized by the subregion area and averaged over time within each window. We have now made this explicit in the amended version of the manuscript.
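
      One possible reading of this definition can be sketched as follows. The array layout, the function name `windowed_msd`, and the choice to average over all cells and all start times are assumptions for illustration; only the fixed lag and the normalization by the subregion area come from the response above.

```python
import numpy as np

def windowed_msd(positions, window, area=1.0):
    """Mean square displacement at a single fixed lag.

    positions: (T, N, 2) array of N cell positions over T time steps,
               assumed unwrapped (no periodic-boundary jumps)
    window:    lag in time steps (the fixed time window)
    area:      subregion area used for normalization (assumed input)
    """
    positions = np.asarray(positions, dtype=float)
    if not 0 < window < positions.shape[0]:
        raise ValueError("window must satisfy 0 < window < number of steps")
    # Displacement of every cell between t and t + window, for every start t.
    disp = positions[window:] - positions[:-window]
    squared = (disp ** 2).sum(axis=-1)
    # Average over start times and cells, then normalize by the area.
    return squared.mean() / area
```

      With this definition the MSD is a single number per simulation or subregion, which is what allows it to be plotted against defect density.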

      (c) In terms of statistics. For instance, Figure 3g is used to study the role of rotational diffusion on the average time between T1s. The error bars in this figure are huge and make their claims hardly supported. Their claim of a “monotonic decay” of the average time between intercalations is also not fully supported given their statistics.

      We appreciate the Reviewer’s comment regarding the statistical robustness of Fig. 3g. While we acknowledge that the error bars are substantial, reflecting the inherent variability in cell intercalation dynamics, the yellow curve does exhibit a consistent downward trend in the average time between T1 transitions as rotational diffusion increases. This monotonic decrease is visible across the entire range of the rotational diffusion Dr, and is statistically supported when considering the trend over independent simulations. To address this concern, we have adjusted the wording in the main text: instead of stating that “the former is a monotonically decreasing function of Dr,” we now write that “the former displays a decreasing trend with Dr,” which better reflects the statistical variability while preserving the observed behavior.
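
      A trend of this kind can be quantified with a rank correlation between Dr and the mean time between T1 events, using one value per independent simulation. The sketch below is not the authors' analysis; it implements Spearman's rank correlation with NumPy only, assumes there are no tied values, and uses the hypothetical name `spearman_rho`.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation of two 1-D samples without ties."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape or x.size < 2:
        raise ValueError("x and y must be 1-D arrays of equal length >= 2")
    # Rank of each value (0 = smallest); valid only when there are no ties.
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    # Spearman's rho is the Pearson correlation of the ranks.
    return np.corrcoef(rank_x, rank_y)[0, 1]
```

      A value near -1 would indicate a consistently decreasing relationship, while values near 0 would indicate no monotonic trend; unlike a fit, this makes no assumption about the functional form.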

      Reviewer #3 (Recommendations for the Authors):

      (1) Section 1 is difficult to follow due to multiple reasons: early but delayed definitions, unclear use of T1 intercalation vs. T1 cycles, disconnected figures and unclear simulation descriptions. We recommend including simulation setup details earlier and restructuring the flow of arguments.

      We thank the referee for their comment. We have made an effort to reword and clarify these points in our amended manuscript. We are slightly unsure what is meant by “early but delayed definitions”; if the referee could clarify, we would be happy to amend the position and phrasing of these definitions accordingly.

      (2) It could be useful to have an additional figure early on defining schematically hexatic defects and an illustration showing an epithelium (or a simulation), similar to what the authors have produced in some of their other publications on this topic.

      We thank the referee for their comment. Figures 3c and 3d show what a hexatic defect looks like in a simulation of the epithelium. Following the referee’s recommendation, we have added a note in the caption of Figure 3, citing our work where we show the same defects in MDCK epithelial monolayers (Armengol-Collado et al., Nat. Phys. 2023).

      (3) Minor points and typos:

      Line 88: the bond between vertices shrinks, not the vertices.

      Figure 1: the 1/6 is displayed as 1 6 (fraction bar missing).

      Line 232: “and order” → “one/an order”.

      Line 237: Fig. 3g) → Fig. 3g

      Line 298: “nu” and “v” hard to distinguish in eLife font.

      Methods: define all notation clearly (e.g., tensor product exponent, D/Dt in Eq. 3c).

      Methods: “cell orientation, coarse-graining and topological defects” section is difficult to follow, schematic would help.

      Line 457 onward: unclear how panels (ii-iv) of Fig. 2ab are obtained.

      Line 480 onward: not referenced in main text.

      Figure 2: “avalancHe” typo.

      Figure 2 caption: “cell intercalaTION” typo.

      Movies are neither referenced nor explained.

      Figure 5 and 6 are not referenced in the main text.

      We thank the referee for their detailed read of the paper. We have corrected all typos.

    1. Briefing Document: Can the Enlightenment Be Reinvented?

      Summary

      This briefing document summarizes the key arguments and themes addressed at the closing session of the series "Peut-on réinventer les Lumières ?" ("Can the Enlightenment Be Reinvented?"), organized by the Institut d'Études Avancées de Paris.

      The presentations by Francis Wolff and Céline Spector, two eminent philosophers, converged on a robust and nuanced defense of universalism, while critically examining contemporary objections, particularly those arising from identitarian and postcolonial currents.

      The central argument, advanced by Francis Wolff, is that humanity forms a single moral community, founded on reciprocal rights and duties.

      He methodically dismantles the critiques claiming that universal values are merely a mask for Western domination.

      By distinguishing the origin of an idea from its scope, and by drawing on concrete examples of struggles for democracy and freedom around the world (the Arab Spring, Iran), he argues that universalism is an essential tool of emancipation. He insists on the fundamental distinction between the universal, which guarantees diversity, and the uniform, which denies it.

      Céline Spector extends this analysis by focusing on postcolonial critiques of human rights.

      She systematizes their main arguments (ethnocentrism, ideological fiction, instrument of colonization) while highlighting the paradoxes inherent in the concept of human rights from its very origin.

      Her remarks, in agreement with Wolff's, aim to reaffirm the relevance of this Enlightenment legacy in the face of these objections.

      The discussion then explored several related concepts, including the notion of the "pluriversal" (judged contradictory or clumsy), the existence of non-Western precedents for human rights (the Manden Charter of 1236), and the persistent tension between the universal ideal and its often deficient application ("double standards").

      Finally, the debate turned to contemporary challenges, such as the rights of nature in the face of the environmental crisis and the role of the Enlightenment legacy in building a Europe capable of resisting imperial dynamics.

      --------------------------------------------------------------------------------

      Context of the Event

      The discussion took place during the closing session of the IEA de Paris lecture series, chaired by Bettina Laville, on the theme "Can the Enlightenment Be Reinvented?".

      The aim was to conclude a year of reflection on the place of the universal in a world described as "fractured" and increasingly critical of the European intellectual heritage.

      The two main speakers were:

      Francis Wolff: Philosopher, professor emeritus at the École Normale Supérieure, specialist in ancient philosophy and author of significant works on humanism and universalism, notably Plaidoyer pour l'universel.

      Céline Spector: Philosopher, professor at Sorbonne Université, specialist in the Enlightenment (in particular Montesquieu and Rousseau) and in European questions, author of No Demos. Souveraineté et démocratie à l'épreuve de l'Europe.

      Francis Wolff's Plea for Universalism

      Francis Wolff structured his talk as a defense of universal values, which he defines through a founding thesis: "humanity forms a moral community of reciprocal rights and duties".

      He focuses mainly on refuting the critiques that judge this universalism excessive and favor narrower ("infrahuman") moral communities.

      The Critiques of Universalism

      Wolff identifies two major contemporary currents critical of universalism:

      1. "Right-wing" ideologies: Nationalist, racist and xenophobic, they deny the existence of Man in general and admit only communities of "similar people" ("us" versus "them").

      This vision, according to Wolff, is resurging and manifests itself in the trampling of international law (since the invasion of Ukraine), the calling into question of refugee law (the Geneva Convention) and the rise of discriminatory or ethnic-cleansing policies.

      2. "Left-wing" identitarian ideologies: Symmetrical to the former, they take up arguments inherited from a "simplified Marxism" according to which any claim to universality is a sham masking domination.

      Refutation of the Anti-Universalist Arguments

      Wolff systematically examines and refutes several recurring arguments against universal values.

      Critical Argument

      Refutation by Francis Wolff

      1. No struggle can be waged in the name of the universal, because it always defends particular victims.

      If struggles on behalf of minorities forget that they aim at equality for all, they betray their own cause.

      The colonized did not fight to become colonizers, but to abolish colonialism.

      2. The universal presents itself as neutral, but never is; it denies relations of domination.

      Although the universal is sometimes used to deny injustices, there is no need to define oneself solely "as" (a woman, a colonized person, etc.).

      Identities are mixed and fluid, not reified essences.

      3. The experience of particular sufferings is incommunicable, and there is no neutral place from which to judge.

      An injustice concerns not only the victim or the perpetrator, but the entire moral community. Without a "third place" from which to judge, there is no longer justice, only vengeance. All suffering has a communicable dimension.

      4. The universal is merely the mask of dominant interests.

      This argument, although often borne out by history (colonization, the Iraq war), cannot be generalized.

      The worst enterprises of domination (genocides) do not need this pretext and are carried out in the name of essentialized identities ("subhumans", "vermin").

      5. Every universal is in fact particular; it is another name for the West.

      Conceding that a universal is born in a particular context does not limit its scope. Algebra, born in Persia, is not an "Iranian" science.

      Democracy and human rights are demanded by peoples in struggle all over the world (the Arab Spring, Hong Kong, Iran), and their despots reject them by labeling them "Western values".

      To claim that the West alone invented human rights is a "Westernist illusion" (Amartya Sen).

      The Emancipatory Virtue of the Universal

      In conclusion, Wolff affirms that universalism retains its emancipatory force.

      He asks: who is the real ethnocentrist?

      The one who believes that critical consciousness exists in every culture, or the one who essentializes other cultures by denying them this critical capacity?

      Finally, he distinguishes the universal from the uniform. Far from erasing particularities, universal values (secularism, freedom of opinion, tolerance) are the condition of their coexistence.

      They constitute a formal "second-level universal" that guarantees diversity.

      The Postcolonial Critique of Human Rights according to Céline Spector

      Céline Spector declares herself in "deep agreement" with Francis Wolff and focuses her remarks on the specific critique of human rights by postcolonial and decolonial studies.

      The Original Paradoxes of Human Rights

      From their proclamation in the United States (1776) and in France (1789), human rights have presented fundamental paradoxes:

      • They are both self-evident and the product of events (born of revolutions).

      • They are both natural and historical.

      • They are both innate and civic.

      • They are both universal and situated.

      These paradoxes fueled the critiques (Marxist, feminist) that saw hypocrisy in them, notably because of the exclusion of women, slaves and other minorities.

      The Five Pillars of the Postcolonial Critique

      Spector summarizes the postcolonial critique of human rights in five main arguments:

      1. They are not universal but Western, protecting only the citizens of Europe.

      2. They are ideological fictions that served to justify the "civilizing mission" of colonization.

      3. They are tied to a conception of reason that excludes "savage" or "barbarian" peoples, judged incapable of attaining it.

      4. The list of rights is arbitrary and abusive, notably the inclusion of the right to property, which served to expropriate nomadic peoples.

      5. They are the rights of the colonists and their accomplices, who had no political will to end the plundering of the colonies or slavery.

      While recognizing the need to take these critiques into account in order to reveal the "tensions inherent in the Enlightenment", Céline Spector's approach aims to formulate objections to this view, thereby joining Francis Wolff's defense of universalism.

      Thèmes Clés de la Discussion

      La période d'échange avec le public a permis d'approfondir plusieurs thématiques.

      Le Concept de "Pluriversel"

      Interrogés sur cette notion issue des théories décoloniales, les deux intervenants expriment leur scepticisme :

      Francis Wolf y voit soit une contradiction dans les termes, soit une simple reformulation du fait que l'universel est toujours perçu depuis un point de vue culturel particulier, sans pour autant y être prisonnier.

      Céline Spector, citant la définition du Dictionnaire décolonial, le décrit comme une "critique radicale de l'universalisme".

      Elle considère ce concept comme une "tentative maladroite" de la part d'auteurs (Ramon Grosfoguel, Walter Mignolo, etc.) qui se retrouvent dans une impasse existentielle : vouloir lutter pour les droits sans utiliser l'outil des droits universels.

      Historical Precedents and the Application of Law

      The Manden Charter (1236): This charter, from the Mali Empire, is cited as a possible African precedent for the recognition of universal values, such as equality among ethnic groups and religions and the participation of women in government.

      The "Double Standard": A participant raises the problem of the "double standard" in the application of international law.

      Céline Spector acknowledges the legitimacy of this criticism but warns against an indignation that devalues international institutions (UN, ICC), making them fragile and pushing hegemonic powers simply to leave them.

      Universality, the Environment, and Europe

      Rights of Nature: The question of a "right to the environment" is raised as a major challenge for reinventing the Enlightenment.

      The discussion turns to the tension between human rights and the "rights of nature", a concept increasingly debated in law (e.g., the Whanganui River in New Zealand, the Mar Menor lagoon in Spain).

      This debate calls into question the centrality of humans in the definition of the environment.

      The Legacy of the Enlightenment for Europe: Céline Spector proposes seeing in Montesquieu's legacy, and specifically his model of the "federative republic", a powerful tool for thinking about the resistance of democracies to the resurgence of empires.

      Francis Wolf concurs, stressing that European integration illustrates the primacy of the demos (political community) over the ethnos (pre-existing community), a principle that is also at the heart of the Ukrainian resistance.

      The "Dark Enlightenment": This term, associated with Curtis Yarvin, is described as a "completely perverted use" of the Enlightenment, designating an oligarchic technocracy in which a digital elite dominates citizens stripped of their rights.

      It is the very antithesis of the Enlightenment ideal.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      Summary:

      This study by Park and colleagues uses longitudinal saliva viral load data from two cohorts (one in the US and one in Japan from a clinical trial) in the pre-vaccine era to subset viral shedding kinetics and then use machine learning to attempt to identify clinical correlates of different shedding patterns. The stratification method identifies three separate shedding patterns discriminated by peak viral load, shedding duration, and clearance slope. The authors also assess micro-RNAs as potential biomarkers of severity but do not identify any clear relationships with viral kinetics.

      Strengths:

      The cohorts are well developed, the mathematical model appears to capture shedding kinetics fairly well, the clustering seems generally appropriate, and the machine learning analysis is a sensible, albeit exploratory approach. The micro-RNA analysis is interesting and novel.

      Weaknesses:

      The conclusions of the paper are somewhat supported by the data but there are certain limitations that are notable and make the study's findings of only limited relevance to current COVID-19 epidemiology and clinical conditions.

      We sincerely appreciate the reviewer’s thoughtful and constructive comments, which have been invaluable in improving the quality of our study. We have carefully revised the manuscript to address all points raised.

      (1) The study only included previously uninfected, unvaccinated individuals without the omicron variant. It has been well documented that vaccination and prior infection both predict shorter duration shedding. Therefore, the study results are no longer relevant to current COVID-19 conditions. This is not at all the authors' fault but rather a difficult reality of much retrospective COVID research.

      Thank you for your comment. We agree with the reviewer that some of our results may not provide insight into current COVID-19 conditions, since most people have now either been infected or been vaccinated. We revised our manuscript to discuss this (page 22, lines 364-368). Nevertheless, we believe that our extensive investigation of the relationship between viral shedding patterns in saliva and a wide range of clinical and microRNA data is novel, and that developing a method for such analyses remains important because it can inform early responses to novel emerging viral diseases in the future. Therefore, we still believe that our findings are valuable.

      (2) The target cell model, which appears to fit the data fairly well, has clear mechanistic limitations. Specifically, if such a high proportion of cells were to get infected, then the disease would be extremely severe in all cases. The authors could specify that this model was selected for ease of use and to allow clustering, rather than to provide mechanistic insight. It would be useful to list the AIC scores of this model when compared to the model by Ke.

      Thank you for your feedback and suggestion regarding our mathematical model. As the reviewer pointed out, in this study we adopted a simple model (the target cell-limited model) to focus on reconstructing viral dynamics and stratifying shedding patterns rather than exploring the mechanism of viral infection in detail. Nevertheless, we believe that the target cell-limited model provides reasonable reconstructed viral dynamics, as it has been used in many previous studies. We revised the manuscript to clarify this point (page 10, lines 139-144). We also revised the manuscript to provide a more detailed description of the model comparison, along with information about AIC (page 10, lines 130-135).

      (3) Line 104: I don't follow why including both datasets would allow one model to work better than the other. This requires more explanation. I am also not convinced that non-linear mixed effects approaches can really be used to infer early model kinetics in individuals from one cohort by using late viral load kinetics in another (and vice versa). The approach seems better for making population-level estimates when there is such a high amount of missing data.

      Thank you for your feedback. Your comment made us realize that our explanation was insufficient. Our intended point was not to compare the performance of the two models but to note that, by including both datasets, data fitting can be performed at the same level for both models. We revised the manuscript to clarify this point (page 10, lines 135-139).

      We agree that nonlinear mixed effects models are a useful approach for making population-level estimates when data are missing. In addition, the nonlinear mixed effects model has the advantage of yielding reasonable parameter estimates for individuals with too few data points, by taking into account the distribution of parameters across the other individuals. With these advantages in mind, we adopted a nonlinear mixed effects model in our study. We also revised the manuscript to clarify this (page 27, lines 472-483).

      (4) Along these lines, the three clusters appear to show uniform expansion slopes whereas the NBA cohort, a much larger cohort that captured early and late viral loads in most individuals, shows substantial variability in viral expansion slopes. In Figure 2D: the upslope seems extraordinarily rapid relative to other cohorts. I calculate a viral doubling time of roughly 1.5 hours. It would be helpful to understand how reliable of an estimate this is and also how much variability was observed among individuals.

      We appreciate your detailed feedback on the estimated up-slope of viral dynamics. As the reviewer noted, the pattern differs from that observed in the NBA cohort, which may be due to their measurement of viral load from upper respiratory tract swabs. In our estimation, the mean and standard deviation of the doubling time (defined as ln2/(𝛽𝑇<sub>0</sub>𝑝𝑐<sup>−1</sup> − 𝛿)) were 1.44 hours and 0.49 hours, respectively. Although direct validation of these values is challenging, several previous studies, including our own, have reported that viral loads in saliva increase more rapidly than in the upper respiratory tract swabs, reaching their peak sooner. Thus, we believe that our findings are consistent with those of previous studies. We revised our manuscript to discuss this point with additional references (page 20, lines 303-311).
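
      The doubling-time definition quoted above can be checked numerically. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, and it assumes that the rate parameters are expressed per day, which the response does not state.

```python
import math

HOURS_PER_DAY = 24.0  # assumption: model rates are expressed per day


def net_growth_rate(beta, t0, p, c, delta):
    """Net exponential growth rate, beta * T0 * p / c - delta."""
    return beta * t0 * p / c - delta


def doubling_time_hours(rate_per_day):
    """Doubling time ln(2) / rate, converted from days to hours."""
    if rate_per_day <= 0:
        raise ValueError("doubling time is defined only for a positive growth rate")
    return math.log(2) / rate_per_day * HOURS_PER_DAY


def rate_from_doubling_time(hours):
    """Inverse relation: the per-day growth rate implied by a doubling time."""
    return math.log(2) / (hours / HOURS_PER_DAY)
```

      Under the per-day assumption, `rate_from_doubling_time(1.44)` shows that the reported mean doubling time of 1.44 hours corresponds to a net growth rate of roughly 11.55 per day.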

      (5) A key issue is that a lack of heterogeneity in the cohort may be driving a lack of differences between the groups. Table 1 shows SpO2 values and lab values that all look normal. All infections were mild. This may make identifying biomarkers quite challenging.

      Thank you for your comment regarding heterogeneity in the cohort. Although the NFV cohort was designed for COVID-19 patients who were either mild or asymptomatic, we have addressed this point and revised the manuscript to discuss it (page 21, lines 334-337).

      (6) Figure 3A: many of the clinical variables such as basophil count, Cl, and protein have very low pre-test probability of correlating with virologic outcome.

      Thank you for your comment regarding some clinical information we used in our study. We revised our manuscript to discuss this point (page 21, lines 337-338).

      (7) A key omission appears to be microRNA from pre and early-infection time points. It would be helpful to understand whether microRNA levels at least differed between the two collection timepoints and whether certain microRNAs are dynamic during infection.

      Thank you for your comment regarding the collection of micro-RNA data. As suggested by the reviewer, we compared micro-RNA levels between two time points using pairwise t-tests and Mann-Whitney U tests with FDR correction. As a result, no micro-RNA showed a statistically significant difference. This suggests that micro-RNA levels remain relatively stable during the course of infection, at least for mild or asymptomatic infection, and may therefore serve as a biomarker independent of sampling time. We have revised the manuscript to include this information (page 17, lines 259-262).
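
      The response does not say which FDR procedure was used. As an illustration only, the sketch below implements the Benjamini-Hochberg adjustment, a common choice, in plain Python; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-microRNA p-values from a two-time-point comparison.
example = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

      A microRNA would be called significant only if its adjusted value falls below the chosen FDR level, which is why a large panel can yield no hits even when some raw p-values are small.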

      (8) The discussion could use a more thorough description of how viral kinetics differ in saliva versus nasal swabs and how this work complements other modeling studies in the field.

      We appreciate the reviewer’s thoughtful feedback. As suggested, we have added a discussion comparing our findings with studies that analyzed viral dynamics using nasal swabs, thereby highlighting the differences between viral dynamics in saliva and in the upper respiratory tract. To ensure a fair and rigorous comparison, we referred to studies that employed the same mathematical model (i.e., Eqs.(1-2)). Accordingly, we revised the manuscript and included additional references (page 20, lines 303-311).

      Furthermore, we clarified the significance of our study in two key aspects. First, it provides a detailed analysis of viral dynamics in saliva, reinforcing our previous findings from a single cohort by extending them across multiple cohorts. Second, this study uniquely examines whether viral dynamics in saliva can be directly predicted by exploring diverse clinical data and micro-RNAs. Notably, cohorts that have simultaneously collected and reported both viral load and a broad spectrum of clinical data from the same individuals, as in our study, are exceedingly rare. We revised the manuscript to clarify this point (page 20, lines 302-311).

      (9) The most predictive potential variables of shedding heterogeneity which pertain to the innate and adaptive immune responses (virus-specific antibody and T cell levels) are not measured or modeled.

      Thank you for your comment. We agree that antibody and T cell related markers may serve as the most powerful predictors, as supported by our own study [S. Miyamoto et al., PNAS (2023), ref. 24] as well as previous reports. While this point was already discussed in the manuscript, we have revised the text to make it more explicit (page 21, lines 327-328).

      (10) I am curious whether the models infer different peak viral loads, duration, expansion, and clearance slopes between the 2 cohorts based on fitting to different infection stage data.

      Thank you for your comment. As the reviewer suggested, we compared the features between the two cohorts. A statistically significant difference between the two cohorts (i.e., p-value ≤ 0.05 from the t-test) was observed only for the peak viral load, with overall trends being largely similar. At the peak, the mean value was 7.5 log<sub>10</sub> (copies/mL) in the Japan cohort and 8.1 log<sub>10</sub> (copies/mL) in the Illinois cohort, with variances of 0.88 and 0.87, respectively, indicating comparable variability.

      Reviewer #2 (Public review)

      Summary:

      This study argues that it has stratified viral kinetics for saliva specimens into three groups by the duration of "viral shedding"; the authors could not identify clinical data or microRNAs that correlate with these three groups.

      Strengths:

      The question of whether there is a stratification of viral kinetics is interesting.

      Weaknesses:

      The data underlying this work are not treated rigorously. The work in this manuscript is based on PCR data from two studies, with most of the data coming from a trial of nelfinavir (NFV) that showed no effect on the duration of SARS-CoV-2 PCR positivity. This study had no PCR data before symptom onset, and thus exclusively evaluated viral kinetics at or after peak viral loads. The second study is from the University of Illinois; this data set had sampling prior to infection, so has some ability to report the rate of "upswing." Problems in the analysis here include:

      We are grateful to the reviewer for the constructive feedback, which has greatly enhanced the quality of our study. In response, we have carefully revised the manuscript to address all comments.

      The PCR Ct data from each study is treated as equivalent and referred to as viral load, without any reports of calibration of platforms or across platforms. Can the authors provide calibration data and justify the direct comparison as well as the use of "viral load" rather than "Ct value"? Can the authors also explain on what basis they treat Ct values in the two studies as identical?

      Thank you for your comment regarding the description of the viral load data. The reviewer's comment made us realize that we had not explained how the viral load data were integrated. We calculated viral load from Ct values using a linear regression equation between Ct and viral load, derived separately for each study's measurement method. We revised the manuscript to clarify this point in the "Saliva viral load data" section of the Methods.
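
      A platform-specific linear standard curve of the kind described above can be sketched as follows. This is an illustration under stated assumptions: the calibration points are invented, the function names are hypothetical, and the response does not report the actual regression coefficients.

```python
def fit_standard_curve(cts, log10_copies):
    """Ordinary least-squares fit of log10(viral load) = intercept + slope * Ct."""
    n = len(cts)
    mean_ct = sum(cts) / n
    mean_load = sum(log10_copies) / n
    sxy = sum((x - mean_ct) * (y - mean_load) for x, y in zip(cts, log10_copies))
    sxx = sum((x - mean_ct) ** 2 for x in cts)
    slope = sxy / sxx
    intercept = mean_load - slope * mean_ct
    return slope, intercept


def ct_to_log10_load(ct, slope, intercept):
    """Convert a Ct value to log10 copies/mL with one platform's curve."""
    return intercept + slope * ct


# Hypothetical calibration standards for a single PCR platform.
slope, intercept = fit_standard_curve([15, 20, 25, 30], [7.5, 6.0, 4.5, 3.0])
```

      Each study would get its own `(slope, intercept)` pair, so the same Ct value can map to different viral loads on different platforms.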

      The limit of detection for the NFV PCR data was unclear, so the authors assumed it was the same as the University of Illinois study. This seems a big assumption, as PCR platforms can differ substantially. Could the authors do sensitivity analyses around this assumption?

      Thank you for your comment regarding the detection limit for the viral load data. As the reviewer suggested, we conducted a sensitivity analysis of the assumed detection limit for the NFV dataset. Specifically, we performed data fitting in the same manner for two scenarios, in which the detection limit of the NFV PCR was lower (0 log<sub>10</sub> copies/mL) or higher (2 log<sub>10</sub> copies/mL) than that of the Illinois data (1.08 log<sub>10</sub> copies/mL), and compared the results.

      As a result, we obtained largely comparable viral dynamics in most cases (Supplementary Fig 6). The AIC was 6836 when the same censoring threshold was used, increased to 7403 under the lower censoring threshold, and decreased to 6353 under the higher censoring threshold. However, this difference may be attributable to the varying number of data points treated as below the detection limit. Specifically, when the threshold is set higher, more data are treated as below the detection limit, which may result in a more favorable error calculation. To discuss this point, we have added a new figure (Supplementary Fig 6) and revised the manuscript accordingly (page 25, lines 415-418).
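
      For reference, AIC is defined as 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized likelihood. The short Python sketch below tabulates the differences between the three AIC values reported above; the log-likelihood and parameter count in the last line are hypothetical values chosen only to show the formula.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


# AIC values quoted in the response, keyed by censoring threshold scenario.
reported = {"same": 6836, "lower": 7403, "higher": 6353}

# Differences relative to the scenario that uses the same threshold.
delta = {name: value - reported["same"] for name, value in reported.items()}

# Hypothetical inputs, for illustration of the formula only.
illustrative = aic(log_likelihood=-3400.0, n_params=18)
```

      The differences come to +567 and −483. As the authors caution, they are not a clean model comparison, because each scenario treats a different number of observations as censored.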

      The authors refer to PCR positivity as viral shedding, but it is viral RNA detection (very different from shedding live/culturable virus, as shown in the Ke et al. paper). I suggest updating the language throughout the manuscript to be precise on this point.

      We appreciate the reviewer’s feedback regarding the terminology used for viral shedding. In response, we have revised all instances of “viral shedding” to “viral RNA detection” throughout the manuscript as suggested.

      Eyeballing extended data in Figure 1, a number of the putative long-duration infections appear to be likely cases of viral RNA rebound (for examples, see S01-16 and S01-27). What happens if all the samples that look like rebound are reanalyzed to exclude the late PCR detectable time points that appear after negative PCRs?

      We sincerely thank the reviewer for the valuable suggestion. In response, we established a criterion to remove data that appeared to exhibit rebound and subsequently performed data fitting (see Author response image 1 below). The criterion was defined as: “any data that increase again after reaching the detection limit in two measurements are considered rebound and removed.” As a result, 15 out of 144 cases were excluded due to insufficient usable data, leaving 129 cases for analysis. Using a single detection limit as the criterion would have excluded too many data points, while defining the criterion solely based on the magnitude of increase made it difficult to establish an appropriate “threshold for increase.”

      The fitting result indicates that the removal of rebound data may influence the fitting results; however, direct comparison of subsequent analyses, such as clustering, is challenging due to the reduced sample size. Moreover, the results can vary substantially depending on the criterion used to define rebound, and establishing a consistent standard remains difficult. Accordingly, we retained the current analysis and have added a discussion of rebound phenomena in the Discussion section as a limitation (page 22, lines 355-359). We once again sincerely appreciate the reviewer’s insightful and constructive suggestion.

      Author response image 1.

      Comparison of model fits before and after removing data suspected of rebound. Black dots represent observed measurements, and the black and yellow curves show the fitted viral dynamics for the full dataset and the dataset with rebound data removed, respectively.
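
      One possible reading of the rebound criterion quoted above can be sketched in Python. The interpretation is an assumption rather than the authors' implementation: here a series counts as cleared after two consecutive measurements at or below the detection limit, and any later value above the limit is dropped. The function name and example values are hypothetical.

```python
def drop_rebound(loads, detection_limit, n_required=2):
    """Remove values that rise above the limit after the series has cleared.

    loads: viral loads (log10 copies/mL) in chronological order.
    """
    kept = []
    consecutive_below = 0
    cleared = False
    for value in loads:
        if value <= detection_limit:
            consecutive_below += 1
            if consecutive_below >= n_required:
                cleared = True
            kept.append(value)
        elif cleared:
            # Detectable again after clearance: treated as rebound, dropped.
            continue
        else:
            # Assumption: the below-limit measurements must be consecutive.
            consecutive_below = 0
            kept.append(value)
    return kept


# Hypothetical series with the Illinois detection limit of 1.08.
cleaned = drop_rebound([6.0, 4.0, 1.08, 1.08, 3.0, 1.08], 1.08)
```

      Setting `n_required=1` corresponds to the single-detection-limit variant, which the authors say would have excluded too many data points.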

      There's no report of uncertainty in the model fits. Given the paucity of data for the upslope, there must be large uncertainty in the up-slope and likely in the peak, too, for the NFV data. This uncertainty is ignored in the subsequent analyses. This calls into question the efforts to stratify by the components of the viral kinetics. Could the authors please include analyses of uncertainty in their model fits and propagate this uncertainty through their analyses?

      We sincerely appreciate the reviewer’s detailed feedback on model uncertainty. To address this point, we revised Extended Fig 1 (now renumbered as Supplementary Fig 1) to include 95% credible intervals computed using a bootstrap approach. In addition, to examine the potential impact of model uncertainty on the stratified analyses, we reconstructed the distance matrix underlying the stratification by incorporating feature uncertainty. Specifically, for each individual, we sampled viral dynamics within the credible interval, averaged the resulting features, and built the distance matrix from them. We then compared this uncertainty-adjusted matrix with the original one using the Mantel test, which showed a strong correlation (r = 0.72, p < 0.001). Given this result, we did not replace the current stratification but revised the manuscript to provide this information in the Results and Methods sections (page 11, lines 159-162 and page 28, lines 512-519). Once again, we are deeply grateful for this insightful comment.
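
      For readers unfamiliar with the Mantel test, the sketch below shows a basic one-sided permutation version in plain Python. It is a generic illustration, not the authors' analysis: the function names, the number of permutations, and the toy matrices are all assumptions.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(d1, d2, n_perm=999, seed=0):
    """Correlate two distance matrices; p-value from label permutations of d2."""
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    idx = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        # Permute rows and columns of d2 together to keep it a distance matrix.
        permuted = [[d2[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy example: distances among five points, and the same distances doubled.
points = [0.0, 1.0, 3.0, 6.0, 10.0]
d_a = [[abs(a - b) for b in points] for a in points]
d_b = [[2 * v for v in row] for row in d_a]
r, p_value = mantel(d_a, d_b)
```

      Rows and columns are permuted together because the entries of a distance matrix are not independent, so an ordinary correlation test on the flattened entries would overstate significance.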

      The clinical data are reported as a mean across the course of an infection; presumably vital signs and blood test results vary substantially, too, over this duration, so taking a mean without considering the timing of the tests or the dynamics of their results is perplexing. I'm not sure what to recommend here, as the timing and variation in the acquisition of these clinical data are not clear, and I do not have a strong understanding of the basis for the hypothesis the authors are testing.

      We appreciate the reviewer's feedback on the clinical data. Your comment made us realize that the manuscript did not describe how the clinical data were handled. In this research, we focused on finding “early predictors” that could provide insight into viral shedding patterns. Thus, we used the clinical data measured at the earliest time point (date of admission) for each patient. Another reason is that the date of admission is almost the only time point at which complete clinical data without any missing values are available for all participants. We revised our manuscript to clarify this point (page 5, lines 90-95).

      It's unclear why microRNAs matter. It would be helpful if the authors could provide more support for their claims that (1) microRNAs play such a substantial role in determining the kinetics of other viruses and (2) they play such an important role in modulating COVID-19 that it's worth exploring the impact of microRNAs on SARS-CoV-2 kinetics. A link to a single review paper seems insufficient justification. What strong experimental evidence is there to support this line of research?

      We appreciate the reviewer’s comments regarding microRNA. Based on this feedback, we recognized the need to clarify our rationale for selecting microRNAs as the analyte. The primary reason was that our available specimens were saliva, and microRNAs are among the biomarkers that can be reliably measured in saliva. At the same time, previous studies have reported associations between microRNAs and various diseases, which led us to consider the potential relevance of microRNAs to viral dynamics, beyond their role as general health indicators. To better reflect this context, we have added supporting references (page 17, lines 240-243).

      Reviewer #3 (Public review)

      The article presents a comprehensive study on the stratification of viral shedding patterns in saliva among COVID-19 patients. The authors analyze longitudinal viral load data from 144 mildly symptomatic patients using a mathematical model, identifying three distinct groups based on the duration of viral shedding. Despite analyzing a wide range of clinical data and micro-RNA expression levels, the study could not find significant predictors for the stratified shedding patterns, highlighting the complexity of SARS-CoV-2 dynamics in saliva. The research underscores the need for identifying biomarkers to improve public health interventions and acknowledges several limitations, including the lack of consideration of recent variants, the sparsity of information before symptom onset, and the focus on symptomatic infections. 

      The manuscript is well-written, with the potential for enhanced clarity in explaining statistical methodologies. This work could inform public health strategies and diagnostic testing approaches. However, a thorough development of new statistical analyses is needed, with major revisions to address the following points:

      We sincerely appreciate the thoughtful feedback provided by Reviewer #3, particularly regarding our methodology. In response, we conducted additional analyses and revised the manuscript accordingly. Below, we address the reviewer’s comments point by point.

      (1) Patient characterization & selection: Patient immunological status at inclusion (and if it was accessible at the time of infection) may be the strongest predictor for viral shedding in saliva. The authors state that the patients were not previously infected by SARS-CoV-2. Was Anti-N antibody testing performed? Were other humoral measurements performed or did everything rely on declaration? From Figure 1A, I do not understand the rationale for excluding asymptomatic patients. Moreover, the mechanistic model can handle patients with only three observations, why are they not included? Finally, the 54 patients without clinical data can be used for the viral dynamics fitting and then discarded for the descriptive analysis. Excluding them can create a bias. All the discarded patients can help the virus dynamics analysis as it is a population approach. Please clarify. In Table 1 the absence of sex covariate is surprising.

      We appreciate the detailed feedback from the reviewer regarding patient selection. We relied on the patient's self-declaration to determine the patient's history of COVID-19 infection and revised the manuscript to specify this (page 6, lines 83-84).

      In parameter estimation, we used the date of symptom onset for each patient to establish the baseline of the time axis as clearly as possible, as in our previous work. Accordingly, asymptomatic patients, for whom no date of symptom onset is available, were excluded from the analysis. Additionally, in the cohort we analyzed, most patients excluded because of a limited number of observations (i.e., fewer than 3 points) already had a viral load close to the detection limit at the time of the first measurement. This is a consequence of the clinical trial design: once a negative result was obtained twice in a row, no further follow-up sampling was performed. These patients were excluded from the analysis because it is hard to obtain reasonable fitting results for them. Also, we used 54 patients for the viral dynamics fitting and then only used the NFV cohort for clinical data analysis. We acknowledge that our description may have confused readers. We revised our manuscript to clarify these points regarding patient selection for data fitting (page 6, lines 96-102, page 24, lines 406-407, and page 7, lines 410-412). In addition, we realized, thanks to the reviewer’s comment, that gender information was missing in Table 1. We appreciate this observation and have revised the table to include gender (we used gender in our analysis).

      (2) Exact study timeline for explanatory covariates: I understand the idea of finding “early predictors” of long-lasting viral shedding. I believe it is key and a great question. However, some samples (Figure 4A) seem to be taken at the end of the viral shedding. I am not sure it is really easier to measure micro-RNA in saliva samples than to run a PCR. So I need to be better convinced of the impact of the possible findings. Generally, the timeline of the explanatory covariates is not described in a satisfactory manner in the current manuscript. Also, the evaluation and inclusion of the daily symptoms in the analysis are unclear to me.

      We appreciate the reviewer’s feedback regarding the collection of explanatory variables. As noted, of the two microRNA samples collected from each patient, one was obtained near the end of viral shedding. This was intended to examine potential differences in microRNA levels between the early and late phases of infection. No significant differences were observed between the two time points, and using microRNA from either phase alone or both together did not substantially affect predictive accuracy for stratified groups. Furthermore, microRNA collection was motivated primarily by the expectation that it would be more sensitive to immune responses, rather than by ease of sampling. We have revised the manuscript to clarify these points regarding microRNA (page 17, lines 243-245 and 259-262).

      Furthermore, as suggested by the reviewer, we have also strengthened the explanation regarding the collection schedule of clinical information and the use of daily symptoms in the analysis (page 6, lines 90-95 and page 14, lines 218-220).

      (3) Early Trajectory Differentiation: The model struggles to differentiate between patients' viral load trajectories in the early phase, with overlapping slopes and indistinguishable viral load peaks observed in Figures 2B, 2C, and 2D. The question arises whether this issue stems from the data, the nature of Covid-19, or the model itself. The authors discuss the scarcity of pre-symptom data, primarily relying on Illinois patients who underwent testing before symptom onset. This contrasts earlier statements on pages 5-6 & 23, where they claim the data captures the full infection dynamics, suggesting sufficient early data for pre-symptom kinetics estimation. The authors need to provide detailed information on the number or timing of patient sample collections during each period.

      We thank the reviewer for these thoughtful comments. The model used in this study [Eqs.(1-2)] has been employed in numerous prior studies and has successfully identified viral dynamics at the individual level. In this context, we interpret the rapid viral increase observed across participants as attributable to characteristics of SARS-CoV-2 in saliva, an interpretation that has also been reported by multiple previous studies. We have added the relevant references and strengthened the corresponding discussion in the manuscript (page 20, lines 303-311).

      We acknowledge that our explanation of how the complementary relationship between the two cohorts contributes to capturing infection dynamics was not sufficiently clear. As described in the manuscript, the Illinois cohort provides pre-symptomatic data, whereas the NFV cohort offers abundant end-phase data, thereby compensating for each other’s missing phases. By jointly analyzing the two cohorts with a nonlinear mixed-effects model, we estimated viral dynamics at the individual level. This approach first estimates population-level parameters (fixed effects) using data from all participants and then incorporates random effects to account for individual variability, yielding the most plausible parameter values.

      Thus, even when early-phase data are lacking in the NFV cohort, information from the Illinois cohort allows us to infer the most reasonable dynamics, and the reverse holds true for the end phase. In this context, we argued that combining the two cohorts enables mathematical modeling to capture infection dynamics at the individual level. Recognizing that our earlier description could be misleading, we have carefully reinforced the relevant description (page 27, lines 472-483). In addition, as suggested by the reviewer, we have added information on the number of data samples available for each phase in both cohorts (page 7, lines 106-109).

      (4) Conditioning on the future: Conditioning on the future in statistics refers to the problematic situation where an analysis inadvertently relies on information that would not have been available at the time decisions were made or data were collected. This seems to be the case when the authors create micro-RNA data (Figure 4A). First, the sampling times need to be clarified by the authors (for clinical outcomes as well). Second, proper causal inference relies on the assumption that the cause precedes the effect. This conditioning on the future may result in overestimating the model's accuracy. This happens because the model has been exposed to the outcome it's supposed to predict. This could question the - already weak - relation with mir-1846 level.

      We appreciate the reviewer’s detailed feedback. As noted in our reply to Comment 2, we collected micro-RNA samples at two time points, near the peak of infection dynamics and at the end stage, and found no significant differences between them. This suggests that micro-RNA levels are not substantially affected by sampling time. Indeed, analyses conducted using samples from the peak, late stage, or both yielded nearly identical results in relation to infection dynamics. To clarify this point, we revised the manuscript by integrating this explanation with our reply to Comment 2 (page 17, lines 259-262). In addition, we revised the manuscript to clarify the sampling times of clinical information and micro-RNA (page 6, lines 90-95).

      (5) Mathematical Model Choice Justification and Performance: The paper lacks mention of the practical identifiability of the model (especially for tau regarding the lack of early data information). Moreover, it is expected that the immune effector model will be more useful at the beginning of the infection (for which data are the more parsimonious). Please provide AIC for comparison, saying that they have "equal performance" is not enough. Can you provide at least in a point-by-point response the VPC & convergence assessments?

      We appreciate the reviewer’s detailed feedback regarding the mathematical model. We acknowledge the potential concern regarding the practical identifiability of tau (incubation period), particularly given the limited early-phase data. In our analysis, however, the nonlinear mixed-effects model yielded a population-level estimate of 4.13 days, which is similar to previously reported incubation periods for COVID-19. This concordance suggests that our estimate of tau is reasonable despite the scarcity of early data.

      For model comparison, we have first added the AIC of the two models to the manuscript, as suggested by the reviewer (page 10, lines 130-135). One point we would like to emphasize is that we adopted a simple target cell-limited model in this study, aiming to focus on reconstruction of viral dynamics and stratification of shedding patterns rather than exploring the mechanism of viral infection in detail. Nevertheless, we believe that the target cell-limited model provides reasonable reconstructed viral dynamics, as it has been used in many previous studies. We revised the manuscript to clarify this (page 10, lines 135-144).

      Furthermore, as suggested, we have added the VPC and convergence assessment results for both models, together with explanatory text, to the manuscript (Supplementary Fig 2, Supplementary Fig 3, and page 10, lines 130-135). In the VPC, the observed 5th, 50th, and 95th percentiles were generally within the corresponding simulated prediction intervals across most time points. Although minor deviations were noted in certain intervals, the overall distribution of the observed data was well captured by the models, supporting their predictive performance (Supplementary Fig 2). In addition, the log-likelihood and SAEM parameter trajectories stabilized after the burn-in phase, confirming appropriate convergence (Supplementary Fig 3).

      (6) Selected features of viral shedding: I wonder to what extent the viral shedding area under the curve (AUC) and normalized AUC should be added as selected features.

      We sincerely appreciate the reviewer’s valuable suggestion regarding the inclusion of additional features. Following this recommendation, we considered AUC (or normalized AUC) as an additional feature when constructing the distance matrix used for stratification. We then evaluated the similarity between the resulting distance matrix and the original one using the Mantel test, which showed a very high correlation (r = 0.92, p < 0.001). This indicates that incorporating AUC as an additional feature does not substantially alter the distance matrix. Accordingly, we have decided to retain the current stratification analysis, and we sincerely thank the reviewer once again for this interesting suggestion.
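      The Mantel comparison described above can be sketched as follows. This is a minimal illustration only, assuming a Pearson-based statistic and a one-sided permutation p-value; the function names, the toy distance matrices, and the number of permutations are hypothetical and are not taken from the authors' analysis.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(d1, d2, n_permutations=999, seed=0):
    """Return (r, p) for a one-sided Mantel permutation test.

    Rows and columns of d2 are permuted together, which preserves the
    structure of a distance matrix while breaking its link to d1.
    """
    rng = random.Random(seed)
    n = len(d1)
    v1 = upper_triangle(d1)
    r_obs = pearson(v1, upper_triangle(d2))
    idx = list(range(n))
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(idx)
        permuted = [[d2[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if pearson(v1, upper_triangle(permuted)) >= r_obs:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never zero.
    return r_obs, (exceed + 1) / (n_permutations + 1)


# Toy example: distances between six hypothetical individuals on a line.
points = [0, 1, 3, 6, 10, 15]
original = [[abs(a - b) for b in points] for a in points]
with_extra_feature = [[abs(a - b) ** 1.2 for b in points] for a in points]
r, p = mantel_test(original, with_extra_feature)
```

      A high r with a small p, as reported in the response, indicates that the two distance matrices rank pairs of individuals in nearly the same way.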

      (7) Two-step nature of the analysis: First you fit a mechanistic model, then you use the predictions of this model to perform clustering and prediction of groups (unsupervised then supervised). Thus you do not propagate the uncertainty intrinsic to your first estimation through the second step, ie. all the viral load selected features actually have a confidence bound which is ignored. Did you consider a one-step analysis in which your covariates of interest play a direct role in the parameters of the mechanistic model as covariates? To pursue this type of analysis SCM (Johnson et al. Pharm. Res. 1998), COSSAC (Ayral et al. 2021 CPT PsP), or SAMBA ( Prague et al. CPT PsP 2021) methods can be used. Did you consider sampling on the posterior distribution rather than using EBE to avoid shrinkage?

      We thank the reviewer for the detailed suggestions regarding our analysis. We agree that the current approach does not adequately account for the impact of uncertainty in viral dynamics on the stratified analyses. As a first step, we have revised Extended Data Fig 1 (now renumbered as Supplementary Fig 1) to include 95% credible intervals computed using a bootstrap approach, to present the model-fitting uncertainty more explicitly. Then, to examine the potential impact of model uncertainty on stratified analyses, we reconstructed the distance matrix underlying stratification by incorporating feature uncertainty. Specifically, for each individual, we sampled viral dynamics within the credible interval, averaged the resulting features, and built the distance matrix from them. We then compared this uncertainty-adjusted matrix with the original one using the Mantel test, which showed a strong correlation (r = 0.72, p < 0.001). Given this result, we did not replace the current stratification but revised the manuscript to provide this information (page 11, lines 159-162 and page 28, lines 512-519).

      Furthermore, we carefully considered the reviewer’s proposed one-step analysis. However, implementation was constrained by data-fitting limitations. Concretely, clinical information is available only in the NFV cohort. Thus, if these variables are to be entered directly as covariates on the parameters, the Illinois cohort cannot be included in the data-fitting process. Yet the NFV cohort lacks any pre-symptomatic observations, so fitting the model to that cohort alone does not permit a reasonable (well-identified/robust) fitting result. While we were unable to implement the suggestion under the current data constraints, we sincerely appreciate the reviewer’s thoughtful and stimulating proposal.

      (8) Need for advanced statistical methods: The analysis is characterized by a lack of power. This can indeed come from the sample size, which is determined by the amount of data available in the study. However, I believe the power could be increased using more advanced statistical methods. At least it is worth a try. First, considering the unsupervised clustering, summarizing the viral shedding trajectories with features collapses longitudinal information. I wonder if the R package « LongituRF » (and associated method) could help, see Capitaine et al. 2020 SMMR. Another interesting tool to investigate could be the latent class models R package « lcmm » (and associated method), see Proust-Lima et al. 2017 J. Stat. Softw. But the latter may be more far-reached.

      We thank the reviewer for the thoughtful suggestions regarding our unsupervised clustering approach. The R package “LongituRF” is designed for supervised analysis, requiring a target outcome to guide the calculation of distances between individuals (i.e., between viral dynamics). In our study, however, the goal was purely unsupervised clustering, without any outcome variable, making direct application of “LongituRF” challenging.

      Our current approach (summarizing each dynamic into several interpretable features and then using Random Forest proximities) allows us to construct a distance matrix in an unsupervised manner. Here, the Random Forest is applied in “proximity mode,” focusing on how often dynamics are grouped together in the trees, independent of any target variable. This provides a practical and principled way to capture overall patterns of dynamics while keeping the analysis fully unsupervised.

      Regarding the suggestion to use latent class mixed models (R package “lcmm”), we also considered this approach. In our dataset, each subject has dense longitudinal measurements, and at many time points, trajectories are very similar across subjects, resulting in minimal inter-individual differences. Consequently, fitting multi-class latent class mixed models (ng ≥ 2) with random effects or mixture terms is numerically unstable, often producing errors such as non-positive definite covariance matrices or failure to generate valid initial values. Although one could consider using only the time points with the largest differences, this effectively reduces the analysis to a feature-based summary of dynamics. Such an approach closely resembles our current method and contradicts the goal of clustering based on full longitudinal information.

      Taken together, although we acknowledge that incorporating more longitudinal information is important, we believe that our current approach provides a practical, stable, and informative solution for capturing heterogeneity in viral dynamics. We would like to once again express our sincere gratitude to the reviewer for this insightful suggestion.

      (9) Study intrinsic limitation: All the results cannot be extended to asymptomatic patients and patients infected with recent VOCs. It definitively limits the impact of results and their applicability to public health. However, for me, the novelty of the data analysis techniques used should also be taken into consideration.

      We appreciate your positive evaluation of our research approach and acknowledge that, as noted in the Discussion section as our first limitation, our analysis may not provide valid insights into recent VOCs or all populations, including asymptomatic individuals. Nonetheless, we believe it is novel that we extensively investigated the relationship between viral shedding patterns in saliva and a wide range of clinical and micro-RNA data. Our findings contribute to a deeper and more quantitative understanding of heterogeneity in viral dynamics, particularly in saliva samples. To discuss this point, we revised our manuscript (page 22, lines 364-368).

      Strengths are:

      Unique data and comprehensive analysis.

      Novel results on viral shedding.

      Weaknesses are:

      Limitation of study design.

      The need for advanced statistical methodology.

      Reviewer #1 (Recommendations For The Authors):

      Line 8: In the abstract, it would be helpful to state how stratification occurred.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 2, lines 8-11).

      Line 31 and discussion: It is important to mention the challenges of using saliva as a specimen type for lab personnel.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 3, lines 36-41).

      Line 35: change to "upper respiratory tract".

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 3, line 35).

      Line 37: "Saliva" is not a tissue. Please hazard a guess as to which tissue is responsible for saliva shedding and if it overlaps with oral and nasal swabs.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 3, lines 42-45).

      Line 42, 68: Please explain how understanding saliva shedding dynamics would impact isolation & screening, diagnostics, and treatments. This is not immediately intuitive to me.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 3, lines 48-50).

      Line 50: It would be helpful to explain why shedding duration is the best stratification variable.

      We thank the reviewer for the feedback. We acknowledge that our wording was ambiguous. The clear differences in the viral dynamics patterns pertain to findings observed following the stratification, and we have revised the manuscript to make this explicit (page 4, lines 59-61).

      Line 71: Dates should be listed for these studies.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly (page 6, lines 85-86).

      Reviewer #2 (Recommendations For The Authors):

      Please make all code and data available for replication of the analyses.

      We appreciate the suggestion. Due to ethical considerations, it is not possible to make all data and code publicly available. We have stated this clearly in the manuscript (Data availability section in Methods).

      Reviewer #3 (Recommendations For The Authors):

      Here are minor comments / technical details:

      (1) Figure 1B is difficult to understand.

      Thank you for the comment. We updated Fig 1B to incorporate more information to aid interpretation.

      (2) Did you analyse viral load or the log10 of viral load? The latter is more common. You should consider it. SI Figure 1: please plot in log10 and use a different point shape for censored data. The file quality of this figure should be improved. State in the Materials and Methods whether SEs in Monolix are computed with linearization or importance sampling.

      Thank you for the comment. We conducted our analyses using log10-transformed viral load. Also, we revised Supplementary Fig 1 (now renumbered as Supplementary Fig 4) as suggested. We also added Supplementary Fig 3 and clarified in the Methods that standard errors (SE) were obtained in Monolix from the Fisher information matrix using the linearization method (page 28, lines 498-499).

      (3) Table 1 and Figure 3A could be collapsed.

      Thank you for the comment, and we carefully considered this suggestion. Table 1 summarizes clinical variables by category, whereas Fig 3A visualizes them ordered by p-value of statistical analysis. Collapsing these into a single table would make it difficult to apprehend both the categorical summaries and the statistical ranking at a glance, thereby reducing readability. We therefore decided to retain the current layout. We appreciate the constructive feedback again. 

      (4) Figure 3 legend could be clarified to understand what is 3B and 3C.

      We thank the reviewer for the feedback and have reinforced the description accordingly.

      (5) Why use AIC instead of BICc?

      Thank you for your comment. We also think BICc is a reasonable alternative. However, because our objective is predictive adequacy (reconstruction of viral dynamics), we judged AIC more appropriate. In NLMEM settings, the effective sample size required by BICc is ambiguous, making the penalty somewhat arbitrary. Moreover, since the two models reconstruct very similar dynamics, our conclusions are not sensitive to the choice of criterion.
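      For reference, the standard textbook forms of the two penalties discussed above are written out below. This is an illustrative aside, not taken from the manuscript: \hat{L} denotes the maximized likelihood, k the number of estimated parameters, and n the sample size. In a mixed-effects setting, n could be read as the number of subjects or as the total number of observations, which is the ambiguity the response refers to.

```latex
\begin{align}
\mathrm{AIC} &= -2\log\hat{L} + 2k, \\
\mathrm{BIC} &= -2\log\hat{L} + k\log n .
\end{align}
```

      The AIC penalty does not depend on n, so it is unaffected by how the effective sample size is defined.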

      (6) Bibliography. Most articles are with et al. (which is not standard) and some are with an extended list of names. Provide DOI for all.

      We thank the reviewer for the feedback, and have revised the manuscript accordingly.

      (7) Extended Table 1&2 - maybe provide a color code to better highlight some lower p-values (if you find any interesting).

      We thank the reviewer for the feedback. Since no clinical information and micro-RNAs other than mir-1846 showed low p-values, we highlighted only mir-1846 with color to make it easier to locate.

      (8) Please make the replication code available.

      We appreciate the suggestion. Due to ethical considerations, it is not possible to make all data and code publicly available. We have stated this clearly in the manuscript (Data availability section in Methods).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this work, van Paassen et al. have studied how CD8 T cell functionality and levels predict HIV DNA decline. The article touches on interesting facets of HIV DNA decay, but ultimately comes across as somewhat hastily done and not convincing due to the major issues. 

      (1) The use of only 2 time points to make many claims about longitudinal dynamics is not convincing. For instance, the fact that raw data do not show decay in intact, but do for defective/total, suggests that the present data is underpowered. The authors speculate that rising intact levels could be due to patients who have reservoirs with many proviruses with survival advantages, but this is not the parsimonious explanation vs the data simply being noisy without sufficient longitudinal follow-up. n=12 is fine, or even reasonably good for HIV reservoir studies, but to mitigate these issues would likely require more time points measured per person. 

      (1b) Relatedly, the timing of the first time point (6 months) could be causing a number of issues because this is in the ballpark for when the HIV DNA decay decelerates, as shown by many papers. This unfortunate study design means some of these participants may already have stabilized HIV DNA levels, so earlier measurements would help to observe early kinetics, but also later measurements would be critical to be confident about stability. 

      The main goal of the present study was to understand the relationship of the HIV-specific CD8 T-cell responses early on ART with the reservoir changes across the subsequent 2.5-year period on suppressive therapy. We have revised the manuscript in order to clarify this. We chose these time points because the 24-week time point is past the initial steep decline of HIV DNA, which takes place in the first weeks after ART initiation. It is known that HIV DNA continues to decay for years afterwards (Besson, Lalama et al. 2014, Gandhi, McMahon et al. 2017).

      (2) Statistical analysis is frequently not sufficient for the claims being made, such that overinterpretation of the data is problematic in many places. 

      (2a) First, though it is plausible that CD8s influence reservoir decay, much more rigorous statistical analysis would be needed to assert this directionality; this is an association, which could just as well be inverted (reservoir disappearance drives CD8 T cell disappearance).

      To correlate different reservoir measures between themselves and with CD8+ T-cell responses at 24 and 156 weeks, we now performed non-parametric (Spearman) correlation analyses, as they do not require any assumptions about the normal distribution of the independent and dependent variables. Benjamini-Hochberg corrections for multiple comparisons (false discovery rate, 0.25) were included in the analyses and did not change the results. 

      Following this comment we would like to note that the association between the T-cell response at 24 weeks and the subsequent decrease in the reservoir cannot be bi-directional (that can only be the case when both variables are measured at the same time point). Therefore, to model the predictive value of T-cell responses measured at 24 weeks for the decrease in the reservoir between 24 and 156 weeks, we fitted generalized linear models (GLM), in which we included age and ART regimen, in addition to three different measures of HIV-specific CD8+ T-cell responses, as explanatory variables, and changes in total, intact, and total defective HIV DNA between 24 and 156 weeks ART as dependent variables.

      (2b) Words like "strong" for correlations must be justified by correlation coefficients, and these heat maps indicate many comparisons were made, such that p-values must be corrected appropriately. 

      We have now used Spearman correlation analysis, provided correlation coefficients to justify the wording, and adjusted the p-values for multiple comparisons (Fig. 1, Fig. 3, Table 2). Benjamini-Hochberg corrections for multiple comparisons (false discovery rate, 0.25) were included in the analyses and did not change the results.
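      The Benjamini-Hochberg step-up adjustment mentioned above can be sketched as follows. This is a minimal illustration of the standard procedure, not the authors' code; the function name and the example p-values are hypothetical, and the default threshold of 0.25 simply mirrors the false discovery rate quoted in the response.

```python
def benjamini_hochberg(p_values, fdr=0.25):
    """Return (adjusted_p, rejected) using the Benjamini-Hochberg procedure.

    adjusted_p[i] is the BH-adjusted p-value (q-value) for p_values[i];
    rejected[i] is True when that q-value is at or below the FDR threshold.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    rejected = [q <= fdr for q in adjusted]
    return adjusted, rejected


# Hypothetical p-values from four correlation tests.
example_p = [0.01, 0.04, 0.03, 0.20]
q_values, significant = benjamini_hochberg(example_p)
```

      With a permissive threshold such as 0.25, many nominally significant correlations survive adjustment, which is consistent with the statement that the correction did not change the results.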

      (3) There is not enough introduction and references to put this work in the context of a large/mature field. The impacts of CD8s in HIV acute infection and HIV reservoirs are both deep fields with a lot of complexity. 

      Following this comment we have revised and expanded the introduction to put our work more in the context of the field (CD8s in acute HIV and HIV reservoirs). 

      Reviewer #2 (Public review): 

      Summary: 

      This study investigated the impact of early HIV specific CD8 T cell responses on the viral reservoir size after 24 weeks and 3 years of follow-up in individuals who started ART during acute infection. Viral reservoir quantification showed that total and defective HIV DNA, but not intact, declined significantly between 24 weeks and 3 years post-ART. The authors also showed that functional HIV-specific CD8⁺ T-cell responses persisted over three years and that early CD8⁺ T-cell proliferative capacity was linked to reservoir decline, supporting early immune intervention in the design of curative strategies. 

      Strengths: 

      The paper is well written, easy to read, and the findings are clearly presented. The study is novel as it demonstrates the effect of HIV specific CD8 T cell responses on different states of the HIV reservoir, that is HIV-DNA (intact and defective), the transcriptionally active and inducible reservoir. Although small, the study cohort was relevant and well-characterized as it included individuals who initiated ART during acute infection, 12 of whom were followed longitudinally for 3 years, providing unique insights into the beneficial effects of early treatment on both immune responses and the viral reservoir. The study uses advanced methodology. I enjoyed reading the paper. 

      Weaknesses: 

      All participants were male (acknowledged by the authors), potentially reducing the generalizability of the findings to broader populations. A control group receiving ART during chronic infection would have been an interesting comparison. 

      We thank the reviewer for their appreciation of our study. Although we had indeed acknowledged the fact that all participants were male, we have clarified why this is a limitation of the study (Discussion, lines 296-298). The reviewer raises the point that it would be useful to compare our data to a control group. Unfortunately, these samples are not yet available, but our study protocol allows for the inclusion of a control group (chronic infection) in the future.

      Reviewer #1 (Recommendations for the authors): 

      Minor: 

      On the introduction: 

      (1) One large topic that is mostly missing completely is the emerging evidence of selection on HIV proviruses during ART from the groups of Xu Yu and Matthias Lichterfeld, and Ya Chi Ho, among others. 

      Previously, it was only touched upon in the Discussion. Now we have also included this in the Introduction (lines 77-80).

      (2) References 4 and 5 don't quite match with the statement here about reservoir seeding; we don't completely understand this process, and certainly, the tissue seeding aspect is not known. 

      Line 61-62: references were changed and this paragraph was rewritten to clarify.

      (3) Shelton et al. showed a strong relationship between HIV DNA size and timing of ART initiation across many studies. I believe Ananworanich also has several key papers on this topic.

      References by Ananworanich are included (lines 91-94).

      (4) "the viral levels decline within weeks of AHI", this is imprecise, there is a peak and a decline, and an equilibrium. 

      We agree and have rewritten the paragraph accordingly.

      (5) The impact of CD8 cells on viral evolution during primary infection is complex and likely not relevant for this paper. 

      We have left viral evolution out of the introduction in order to keep a focus on the current subject.

      (6) The term "reservoir" is somewhat polarizing, so it might be worth mentioning somewhere exactly what you think the reservoir is, I think, as written, your definition is any HIV DNA in a person on ART? 

      Indeed, we refer to the reservoir when we talk about the several aspects of the reservoir that we have quantified with our assays (total HIV DNA, unspliced RNA, intact and defective proviral DNA, and replication-competent virus). In most instances we try to specify which measurement we are referring to. We have added additional reservoir explanation to clarify our definition to the introduction (lines 55-58).

      (7) I think US might be used before it is defined. 

      We thank the reviewer for this notification, we have now also defined it in the Results section (line 131).

      (8) In Figure 1 it's also not clear how statistics were done to deal with undetectable values, which can be tricky but important. 

      We have now clarified this in the legend to Figure 2 (former Figure 1). Paired Wilcoxon tests were performed to test the significance of the differences between the time points. Pairs where both values were undetectable were always excluded from the analysis. Pairs where one value was undetectable and its detection limit was higher than the value of the detectable partner were also excluded from the analysis. Pairs where one value was undetectable and its detection limit was lower than the value of the detectable partner were retained in the analysis.
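      The pair-selection rules above can be expressed as a short filter. This is a sketch under stated assumptions: the `(value, detected)` encoding and the function name are hypothetical, and the case where the detection limit exactly equals the detectable partner's value is not covered by the response, so it is excluded here by reading "lower than" strictly.

```python
def keep_pair(first, second):
    """Decide whether a paired observation enters the paired test.

    Each measurement is a (value, detected) tuple. When detected is False,
    value holds the detection limit of that sample (hypothetical encoding).
    """
    (v1, det1), (v2, det2) = first, second
    if det1 and det2:
        # Both time points quantified: always usable.
        return True
    if not det1 and not det2:
        # Both undetectable: the direction of change is unknown.
        return False
    # Exactly one undetectable value: compare its limit with the partner.
    limit, partner = (v1, v2) if not det1 else (v2, v1)
    # Retain only when the limit lies below the detected value, so the
    # direction of change is unambiguous. Equality is excluded (assumption).
    return limit < partner


# Hypothetical pairs: (week 24, week 156), each as (value, detected).
pairs = [
    ((120.0, True), (40.0, True)),
    ((10.0, False), (12.0, False)),
    ((50.0, False), (30.0, True)),
    ((30.0, True), (20.0, False)),
]
retained = [pair for pair in pairs if keep_pair(*pair)]
```

      The retained pairs would then be passed to a paired Wilcoxon signed-rank test in the statistics package of choice.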

      In the discussion: 

      (1) "This confirms that the existence of a replication-competent viral reservoir is linked to the presence of intact HIV DNA." I think this statement is indicative of many of the overinterpretations without statistical justification. There are 4 of 12 individuals with QVOA+ detectable proviruses, which means there are 8 without. What are their intact HIV DNA levels? 

      We thank the reviewer for the question that is raised here. We have now compared the intact DNA levels (measured by IPDA) between participants with positive vs. negative QVOA output, and observed a significant difference. We rephrased the wording as follows: “We compared the intact HIV DNA levels at the 24-week timepoint between the six participants, from whom we were able to isolate replicating virus, and the fourteen participants, from whom we could not. Participants with positive QVOA had significantly higher intact HIV DNA levels than those with negative QVOA (p=0.029, Mann-Whitney test; Suppl. Fig. 3). Five of six participants with positive QVOA had intact DNA levels above 100 copies/10⁶ PBMC, while thirteen of fourteen participants with negative QVOA had intact HIV DNA below 100 copies/10⁶ PBMC (p=0.0022, Fisher’s exact test). These findings indicate that recovery of replication-competent virus by QVOA is more likely in individuals with higher levels of intact HIV DNA in IPDA, reaffirming a link between the two measurements.”
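      The Fisher's exact test quoted above can be checked from the counts given in the response. The sketch below is an illustration, not the authors' code: it assumes a two-sided test that sums the probabilities of all tables no more likely than the observed one, and it assumes the 2×2 layout shown in the comments. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k counts in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Rows: QVOA positive (n=6), QVOA negative (n=14).
# Columns: intact HIV DNA above 100 copies, below 100 copies.
p_value = fisher_exact_two_sided(5, 1, 1, 13)
print(round(p_value, 4))  # 0.0022
```

      Under these assumptions the result agrees with the p=0.0022 reported in the rephrased text.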

      (2) "To determine whether early HIV-specific CD8+ T-cell responses at 24 weeks were predictive for the change in reservoir size". This is a fundamental miss on correlation vs causation... it could be the inverse. 

      We thank the reviewer for the remark. We have calculated the change in reservoir size (the difference between the reservoir size at 24 weeks and 156 weeks ART) and analyzed whether the HIV-specific CD8+ T-cell response at 24 weeks ART is predictive of this change. We do not think the relationship can be inverted, as there is a chronological order (CD8+ responses at week 24 predict the subsequent change in the reservoir).

      (3) "This may suggest that active viral replication drives the CD8+ T-cell response." I think to be precise, you mean viral transcription drives CD8s, we don't know about the full replication cycle from these data. 

      We agree with the reviewer and have changed “replication” to “transcription” (line 280).

      (4) "Remarkably, we observed that the defective HIV DNA levels declined significantly between 24 weeks and 3 years on ART. This is in contrast to previous observations in chronic HIV infection (30)". I don't find this remarkable or in contrast: many studies have analyzed and/or modeled defective HIV DNA decay, most of which have shown some negative slope to defective HIV DNA, especially within the first year of ART. See White et al., Blankson et al., Golob et al., Besson et al., etc In addition, do you mean in long-term suppressed? 

      The point we would like to make is that, in contrast to other studies, we found a significant, prominent decrease in defective DNA (but not in intact DNA) over the course of 3 years; in other studies, the decrease in intact DNA is usually significant and the decrease in defective DNA less prominent. We have rephrased the wording (lines 227-230) as follows:

      “We observed that the defective HIV DNA levels decreased significantly between 24 and 156 weeks of ART. This is different from studies in CHI, where no significant decrease during the first 7 years of ART (Peluso, Bacchetti et al. 2020, Gandhi, Cyktor et al. 2021), or only a significant decrease during the first 8 weeks on ART, but not in the 8 years thereafter, was observed (Nühn, Bosman et al. 2025).”

      Reviewer #2 (Recommendations for the authors): 

      (1) Page 4, paragraph 2 - will be informative to report the statistics here. 

      (2) Page 4, paragraph 4 - "General phenotyping of CD4+ (Suppl. Fig. 3A) and CD8+ (Supplementary Figure 3B) T-cells showed no difference in frequencies of naïve, memory or effector CD8+ T-cells between 24 and 156 weeks." - What did the CD4+ phenotyping show? 

      We thank the reviewer for the remark. Indeed, there were also no differences in frequencies of naïve, memory or effector CD4+ T-cells between 24 and 156 weeks. We have added this to the paragraph (now Suppl. Fig 4), lines 166-168.

      (3) Page 5, paragraph 3 - "Similarly, a broad HIV-specific CD8+ T-cell proliferative response to at least three different viral proteins was observed in the majority of individuals at both time points" - should specify n=? for the majority of individuals. 

      At time point 24 weeks, 6/11 individuals had a response to env, 10/11 to gag, 5/11 to nef, and 4/11 to pol. At 156 weeks, 8/11 to env, 10/11 to gag, 8/11 to nef and 9/11 to pol. We have added this to the text (lines 188-191).

      (4) Seven of 22 participants had non-subtype B infection. Can the authors explain the use of the IPDA designed by Bruner et al. for subtype B HIV, and how this may have affected the quantification in these participants?

      Intact HIV DNA was detectable in all 22 participants. We cannot completely exclude an influence of primer/probe-template mismatches on the quantification results; however, such mismatches could also have occurred in subtype B participants, and droplet digital PCR, on which the IPDA is based, is generally much less sensitive to these mismatches than qPCR.

      (5) Page 7, paragraph 2 - the authors report a difference in findings from a previous study ("a decline in CD8 T cell responses over 2 years" - reference 21), but only provide an explanation for this on page 9. The authors should consider moving the explanation to this paragraph for easier understanding. 

      We agree with the reviewer that this causes confusion. Therefore, we have revised and changed the order in the Discussion.

      (6) Page 7, paragraph 2 - Following from above, the previous study (21) reported this contradicting finding "a decline in CD8 T cell responses over 2 years" in a CHI (chronic HIV) treated cohort. The current study was in an acute HIV treated cohort. The authors should explain whether this may also have resulted in the different findings, in addition to the use of different readouts in each study.

      We thank the reviewer for this careful observation. Indeed, the study by Takata et al. investigates the reservoir and HIV-specific CD8+ T-cell responses in both the RV254/SEARCH010 participants, who initiated ART during AHI, and the RV304/SEARCH013 participants, who initiated ART during CHI. We had not realized that the decline in CD8 T cell responses was found solely in RV304/SEARCH013 (the CHI cohort). It appears that functional HIV-specific immune responses were only measured in AHI at 96 weeks, so we have clarified this in the Discussion.

      Besson, G. J., C. M. Lalama, R. J. Bosch, R. T. Gandhi, M. A. Bedison, E. Aga, S. A. Riddler, D. K. McMahon, F. Hong and J. W. Mellors (2014). "HIV-1 DNA decay dynamics in blood during more than a decade of suppressive antiretroviral therapy." Clin Infect Dis 59(9): 1312-1321.

      Gandhi, R. T., J. C. Cyktor, R. J. Bosch, H. Mar, G. M. Laird, A. Martin, A. C. Collier, S. A. Riddler, B. J. Macatangay, C. R. Rinaldo, J. J. Eron, J. D. Siliciano, D. K. McMahon and J. W. Mellors (2021). "Selective Decay of Intact HIV-1 Proviral DNA on Antiretroviral Therapy." J Infect Dis 223(2): 225-233.

      Gandhi, R. T., D. K. McMahon, R. J. Bosch, C. M. Lalama, J. C. Cyktor, B. J. Macatangay, C. R. Rinaldo, S. A. Riddler, E. Hogg, C. Godfrey, A. C. Collier, J. J. Eron and J. W. Mellors (2017). "Levels of HIV-1 persistence on antiretroviral therapy are not associated with markers of inflammation or activation." PLoS Pathog 13(4): e1006285.

      Nühn, M. M., K. Bosman, T. Huisman, W. H. A. Staring, L. Gharu, D. De Jong, T. M. De Kort, N. Buchholtz, K. Tesselaar, A. Pandit, J. Arends, S. A. Otto, E. Lucio De Esesarte, A. I. M. Hoepelman, R. J. De Boer, J. Symons, J. A. M. Borghans, A. M. J. Wensing and M. Nijhuis (2025). "Selective decline of intact HIV reservoirs during the first decade of ART followed by stabilization in memory T cell subsets." AIDS 39(7): 798-811.

      Peluso, M. J., P. Bacchetti, K. D. Ritter, S. Beg, J. Lai, J. N. Martin, P. W. Hunt, T. J. Henrich, J. D. Siliciano, R. F. Siliciano, G. M. Laird and S. G. Deeks (2020). "Differential decay of intact and defective proviral DNA in HIV-1-infected individuals on suppressive antiretroviral therapy." JCI Insight 5(4).

    1. Dialogic Reflection: A Synthesis of Steve Mann's Ideas

      Executive Summary

      Professor Steve Mann (University of Warwick), during his residency at the Paris Institute for Advanced Study (IEA), presents his research project on "dialogic reflection".

      He defines it as a form of collaborative, mediated conversation designed to examine experiences and ideas, in sharp contrast to the traditional view of reflection as a solitary, individual exercise.

      The central argument of his presentation is that human beings possess an innate "interactional engine", a fundamental capacity for empathy, listening and interaction, which makes dialogic practices not artificial but deeply rooted in our nature.

      Mann suggests that the IEA, whose mission is to foster dialogue, could systematically document and analyze these fertile interactions, or even position dialogic reflection as one of its research methods.

      His own work plan at the institute is to re-examine his data corpora in search of linguistic markers of dialogic reflection, while exploring fields such as neonatal studies to consolidate its theoretical foundations.

      --------------------------------------------------------------------------------

      1. Definition and Foundations of Dialogic Reflection

      Dialogic reflection is presented as a collaborative process that aims to go beyond individual thinking through dynamic interaction and a multiplicity of perspectives.

      Definition: It is a form of inquiry through talk, often structured, that makes it possible to examine experiences, ideas and assumptions.

      It is fundamentally mediated and collaborative.

      Origins of the Concept: Steve Mann's interest in this subject comes from several sources:

      "Cooperative Development": A model developed by his thesis supervisor, Julian Edge, and strongly influenced by the ideas of Carl Rogers (respect, empathy, sincerity).

      This model emphasizes active listening and uses specific linguistic techniques such as "reflecting" and "focusing" to support the emergence of the speaker's ideas.

      Earlier work: A chapter on dialogic reflection co-written with Professor Steve Walsh, which Mann felt had only scratched the surface of the subject.

      Research on Reflexivity: Work on reflexivity in qualitative research interviews, analyzing how researchers reflect on their own identity and methodology.

      2. Challenging the Traditional View of Reflection

      Mann questions the dominant semiotics that presents reflection as a purely individual and solitary practice.

      The image of "The Thinker": Rodin's sculpture "The Thinker" is cited as the archetype of this view of thought as individual and isolated.

      Mann notes Charles Baudelaire's influence on Rodin, underlining the link between physical form and the exploration of inner emotional states.

      The negative connotation: This individualistic view has a "dark side", embodied by the myth of Narcissus.

      Reflective practice is thus often perceived pejoratively as a form of excessive introspection or "navel-gazing".

      The Educational Context: The education system is often described as "monologic", dominated by the voice of the teacher, who provides answers to questions the students have not asked.

      Mann's work aims to "disrupt" or "intervene in" these interaction norms to make them more dialogic.

      3. The Concept of the "Interactional Engine"

      To counter the idea that structured dialogue is artificial, Mann draws on research in neonatal studies, notably that of Stephen Levinson.

      Evidence in newborns: Studies show that newborns interact with their caregivers only a few days after birth.

      Evidence of turn-taking and sequential organization can be observed in their gaze and interactions.

      An Innate Capacity: Levinson proposes the existence of an "interactional engine", a special, innate human capacity for interaction.

      This capacity includes cognitive skills such as joint attention, empathy and the search for common ground.

      Fundamental Implications: If empathy and listening are fundamental aspects of human experience from the very start of life, then practices that foster them are not artificial but draw on a natural disposition.

      Neuroscience and Interaction: Mann cites studies showing that brain and cognitive processes work differently when individuals are interacting.

      For example, an infant's brain responds differently when the infant is addressed directly than when it merely overhears speech peripherally.

      In addition, messages supported by multimodal elements are better assimilated by the brain.

      4. Tools and Methods for Dialogic Practice

      To be effective, dialogic reflection must be mediated by appropriate tools and "scaffolding", in the Vygotskian sense of the term.

      Tool / Approach: Description

      Video tools (Iris Connect, VEO): Allow practitioners (teachers, doctors) to analyze their own interactions.

      E-portfolios and Podcasts: Offer multimodal means of meaning-making and reflection.

      Mentoring and Coaching: Projects that structure reflective practice and integrate it into professional development.

      Action Research: An approach aimed at changing interaction norms within seminars or training courses.

      5. Outlook for the Paris Institute for Advanced Study

      Mann highlights the alignment between his project and the IEA's mission, which is to "promote discussions that encourage reflection".

      Residents' Testimonials: He cites the institute's annual report, in which residents attest to the importance of informal conversations and to how these interactions significantly changed their research projects.

      ◦ "Very enriching to talk informally over lunch and aperitifs too. These conversations helped me both to see my own project from a non-specialist point of view and to get a sense of important developments in other fields."

      ◦ "Thanks to the interaction at the IEA, the initial direction of my research has changed considerably since its inception. This led me to examine questions of power, societal structures and their impact on achieving sustainability goals."

      Proposals for the Institute:

      1. Document the processes: Could the IEA systematically document and analyze the kinds of interactions and dialogic reflection that take place there?

      2. A new research method: Could the institute position dialogic reflection as one of its new research methods, thereby valuing collaborative processes on a par with written outputs?

      6. Steve Mann's Research Plan

      During his residency, Mann plans to focus on several strands:

      Analysis of Existing Data: Re-examine data corpora (his own and those of his students) to identify examples of dialogic reflection.

      Identification of Linguistic Markers: Look for specific linguistic evidence of reflection, such as:

      ◦ Making connections and resonances.

      ◦ The use of metaphors, narratives and anecdotes.

      ◦ Hedging and speculation strategies.

      ◦ Signalling "grey areas" or "third spaces".

      ◦ "Light bulb moments".

      Bakhtin's Influence: Explore the multimodal and intertextual nature of dialogic reflection, drawing on Bakhtin's concept of heteroglossia (the internalized voices, concepts and frameworks that we mobilize in dialogue).

      Centripetal/Centrifugal Tension: Study how the mind oscillates between a desire to focus (centripetal) and a wish to broaden perspectives (centrifugal).

      7. Exchanges with Other Researchers

      The presentation prompted reactions and connections with the work of other residents.

      Dialogue with Sadi:

      ◦ Sadi expresses interest in Mann's approach as a way to improve the IEA's "formats" and mentions Edgar Schein's "humble inquiry" approach.

      ◦ He shares an experiment using micro-cameras that reveal a synchronization of gaze between people solving a problem. This illustrates the "psychosocial triangle": ego, alter and object.

      ◦ He hypothesizes that the IEA's success lies in the absence of hierarchy or competition, which allows researchers to concentrate on the object of discussion rather than on interpersonal relations.

      Dialogue with Eleanor:

      ◦ Eleanor draws a link with the concept of the "co-construction" of meaning (a handshake takes two people).

      ◦ She cites the work of Charles Goodwin ("Co-operation"), who analyzed at a micro-temporal level how thought takes shape while one is speaking.

      ◦ She recommends two French researchers working on these topics: Aude-Marie Morgenstern and Maya Gratier, who study interactions between mothers and infants and their "musical" dimension.

    1. Creativity: Cross-Perspectives from Neuroscience, Art, Music and Artificial Intelligence

      Summary

      This briefing document analyzes the key themes and arguments of a round table on creativity that brought together experts in neuroscience, musical composition, visual arts and artificial intelligence.

      The discussion is organized around a conceptual framework that defines human creativity along four dimensions: novelty, appropriateness, authenticity and agency.

      The speakers explore how these dimensions manifest themselves in their respective fields.

      In artificial intelligence, creativity emerges through curiosity mechanisms and evolutionary algorithms, which allow robots to autonomously discover new and effective solutions to complex problems, as illustrated by the examples of the game of Go and motor learning.

      In art and music, creativity oscillates between generation within strict constraints (Mozart's composition algorithm) and the deliberate transgression of conventions to create something unprecedented (hybridization in Beethoven).

      The neuroscientific foundations reveal the central role of the prefrontal cortex, which acts as a monitor able to inhibit ineffective strategies so that new solutions drawn from memory can emerge.

      Finally, examples from the animal world, notably the octopus and its capacity for camouflage and cunning ("metis"), suggest that creativity is a broader phenomenon than purely human activity.

      The discussion concludes with the current limits of AI, which excels at producing coherent surfaces but still struggles to generate works with the structural depth and authenticity characteristic of human creation.

      --------------------------------------------------------------------------------

      1. A Theoretical Framework for Creativity

      Étienne Koechlin, a neuroscientist, proposes a standard model that breaks the concept of creativity down into four fundamental dimensions.

      This framework serves as a reference throughout the discussion for analyzing the different manifestations of creativity.

      Dimension: Description. Key concepts.

      Cognitive dimensions

      Novelty: The capacity to produce something that did not exist before. This possibility is inherent even in the most closed formal systems, as Gödel's theorem demonstrates. Key concepts: generation, innovation, possibility of the unprecedented.

      Appropriateness: The new production must be relevant to an external context. It may be the solution to a problem, or a work of art that resonates with an audience. Key concepts: evaluation, relevance, context, originality (the combination of novelty and appropriateness).

      Conative dimensions

      Authenticity: The creative act is the expression of an individual, often arising from an internal imbalance (dissatisfaction, an ecstatic state). The creator seeks to respond to this imbalance. Key concepts: individual expression, internal imbalance, creative energy.

      Agency: Creativity is an action aimed at transforming or influencing the world. There is a will to be effective, to have an impact. Key concepts: action, will, transformation of the world, effectiveness.

      Koechlin stresses that these dimensions can be present to varying degrees depending on the activity (human, animal or artificial).

      For example, an AI such as AlphaGo shows novelty and appropriateness (creative moves that win games) and a form of agency (interacting with a human player), but its authenticity is considered very limited.

      2. Creativity in Artificial Systems

      Pierre-Yves Oudeyer, an AI researcher, presents how machines can generate behaviors and knowledge that are at once new, relevant and effective, thereby meeting several criteria of creativity.

      2.1. Curiosity as the Engine of Exploration

      The work of P-Y. Oudeyer's team focuses on modeling curiosity, understood as the mechanism that drives an agent (child or robot) to explore its environment spontaneously.

      Autonomous Learning: A quadruped robot, initially with no knowledge of its body or environment, learns by experimenting.

      Guided by curiosity algorithms, it tries out actions (moving its limbs, vocalizing) and observes the results.

      Discovery of Regularities: The robot gradually discovers cause-and-effect relationships: pushing an object with its arm makes it move, and vocalizing towards another robot triggers an imitation.

      This curiosity-driven exploration leads it to discover social interactions.

      Étienne Koechlin relates this approach to neuroscience research on what drives action.

      He contrasts two views: acting to accumulate resources (rewards) and acting to acquire information and improve one's internal models of the world.

      Curiosity is at the heart of this second view: one acts where one expects to learn the most.
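
      The principle of acting where one expects to learn the most can be sketched in a few lines. The example below is only an illustration under strong assumptions, not the algorithm used by Oudeyer's team: the function `explore`, the three fixed outcomes and the use of the last prediction error as a proxy for expected learning are all hypothetical choices made for this sketch.

```python
def explore(true_outcomes, steps=30, learning_rate=0.5):
    """Greedy error-driven exploration over a fixed set of actions.

    Assumption: the last observed prediction error of an action stands in
    for how much the agent expects to learn by trying that action again.
    """
    n = len(true_outcomes)
    predictions = [0.0] * n            # the agent's internal model of each outcome
    last_error = [float("inf")] * n    # untried actions look maximally informative
    counts = [0] * n                   # how often each action was chosen
    for _ in range(steps):
        # Choose the action with the largest expected learning.
        action = max(range(n), key=lambda a: last_error[a])
        outcome = true_outcomes[action]
        error = outcome - predictions[action]
        predictions[action] += learning_rate * error   # improve the internal model
        last_error[action] = abs(error)
        counts[action] += 1
    return predictions, counts


# Hypothetical environment: three actions with fixed, deterministic outcomes.
predictions, counts = explore([1.0, 4.0, 0.0])
print(predictions, counts)
```

      In this toy setting the agent tries the already-predictable third action only once and spends the rest of its time on the two actions it can still learn from. Published curiosity models typically rely on learning progress rather than raw error, so that unpredictable noise is not mistaken for something learnable.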

      2.2. Evolutionary Algorithms and Reinforcement Learning

      Algorithms inspired by biological evolution can generate creative solutions that engineers would not have considered.

      Virtual Creatures: In a simulation, "creatures" made up of virtual cells (muscles, rigid cells) are generated at random.

      A "fitness" criterion (the ability to move forward quickly) is defined.

      The best-performing creatures are selected, and their "genes" are randomly mutated to create a new generation.

      Over the generations, effective and unexpected body shapes and locomotion strategies emerge.

      Physical Robots: A physical robot learns to move by trial and error (reinforcement learning). Initially, its movements are random and clumsy.

      Within a few minutes, it discovers how to turn itself over, then how to get on its legs and walk robustly, reacting to perturbations.

      The final movement strategy was not programmed by a human but discovered by the robot itself.

      These same methods underlie the successes of AlphaGo, which produced moves judged "highly creative" by human experts.
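
      The select-and-mutate loop described for the virtual creatures can be sketched as follows. This is a minimal illustration, not the simulation shown at the round table: the genome is a plain list of numbers, and the `fitness` function is a hypothetical stand-in for "ability to move forward quickly". Population size, mutation scale and the decision to keep the parents unchanged (elitism) are assumptions of the sketch.

```python
import random


def fitness(genome):
    # Hypothetical stand-in for locomotion speed: genomes closer to 0.7
    # in every gene score higher (the maximum score is 0).
    return -sum((g - 0.7) ** 2 for g in genome)


def evolve(pop_size=20, genome_len=5, generations=40, seed=0):
    rng = random.Random(seed)
    # Random initial population.
    population = [[rng.random() for _ in range(genome_len)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        best_per_generation.append(fitness(population[0]))
        # Selection: keep the top quarter as parents.
        parents = population[: pop_size // 4]
        # Mutation: each child is a noisy copy of a randomly chosen parent.
        children = [
            [g + rng.gauss(0, 0.05) for g in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        # Parents survive unchanged, so the best score can never decrease.
        population = parents + children
    return best_per_generation


history = evolve()
print(history[0], history[-1])
```

      Because nothing in the loop specifies how a good genome should look, any structure that emerges is found by variation and selection alone, which is the sense in which the source calls the resulting solutions unexpected.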

      3. Creativity in Artistic Practice

      The speakers from music and the visual arts illustrate the creative tension between constraint and freedom, and between tradition and innovation.

      3.1. Music: Algorithms and Transgressions

      The composer Floris Guédy presents two models of musical creation:

      Mozart's Dice Game: An algorithmic system for composing minuets.

      By rolling dice, one selects pre-written measures from a matrix.

      Although based on chance, the system is tightly constrained by the rules of tonal harmony (harmonic functions likened to subject, verb and complement).

      The result is always coherent and varied, with billions of possible combinations.

      This system can be generalized to simulate, with the same basic model, the styles of later composers (Schumann, Debussy) simply by changing the parameters.

      Hybridization in Beethoven: Analysis of the drafts of Piano Sonata No. 30 shows a different creative process. Beethoven sets two musical elements against each other (A: monodic and staccato; B: legato chords) and creates a third element (C) by hybridizing their characteristics.

      His sketchbooks reveal a process of active searching, of trial and error, to find the maximal contrast that makes the hybridization as audible as possible.

      For F. Guédy, this kind of creativity, which consists in "breaking conventions" in an infinite number of possible ways, is hard for an AI to simulate, since an AI tends instead to reproduce what is statistically probable.
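
      The dice game described above amounts to a table lookup driven by dice rolls. The sketch below assumes a 16-bar minuet and one pre-written measure per bar for each possible total of two dice; both figures are assumptions of this example, and the measure labels are hypothetical placeholders rather than actual music.

```python
import random

BARS = 16                   # assumed number of bars in the minuet
DICE_SUMS = range(2, 13)    # possible totals of two six-sided dice

# Hypothetical table: one placeholder label per (bar, dice total) pair.
# In a real table each entry would be a pre-written measure that fits
# the harmonic function required at that position.
MEASURE_TABLE = {
    (bar, total): f"measure_{bar:02d}_{total:02d}"
    for bar in range(BARS)
    for total in DICE_SUMS
}


def roll_minuet(rng):
    """Pick one measure per bar by rolling two dice for each bar."""
    minuet = []
    for bar in range(BARS):
        total = rng.randint(1, 6) + rng.randint(1, 6)
        minuet.append(MEASURE_TABLE[(bar, total)])
    return minuet


minuet = roll_minuet(random.Random(1))
n_combinations = len(DICE_SUMS) ** BARS
print(minuet[:4], n_combinations)
```

      Under these assumptions there are 11 choices for each of 16 bars, roughly 4.6 × 10^16 sequences, which is consistent with the "billions of combinations" mentioned above. The constraint lives entirely in the table: chance only chooses among options that are all acceptable at that position.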

      3.2. Arts and Crafts: Co-creation and Active Matter

      Patricia Ribault, a specialist in the visual arts, highlights creativity in processes of "making" and in interactions.

      Co-creation in Murano: During a workshop, design students present drawings to the master glassmakers of Murano.

      The craftsmen, faced with forms that go beyond their traditional know-how, have to invent new techniques.

      This moment of "co-creation" pushes traditional techniques beyond their limits.

      Active Matter: She describes her work within the "Matters of Activity" cluster of excellence, where researchers from all disciplines (scientists, engineers, designers) study practices such as filtering, weaving or cutting from the standpoint of matter itself as an active agent.

      Visualizing Neuroplasticity: She presents the "Brain Roads" project, a collaboration between artists, designers and neurosurgeons that aims to visualize the complexity of brain plasticity.

      Faced with the limits of traditional imaging (tractography), the artists propose new graphical models (inspired by metro maps and voxels) to better guide the surgeon's gesture and to represent the experience of patients undergoing awake surgery.

      4. Biological and Neuroscientific Foundations

      The discussion explores the brain mechanisms underlying human creativity as well as its manifestations in the animal world.

      4.1. The Role of the Prefrontal Cortex

      Étienne Koechlin explains that the prefrontal cortex is the key region that "permits" creativity in humans.

      The Control and Opening Mechanism: This brain region continuously monitors our behaviors and mental strategies.

      When a strategy is judged irrelevant or ineffective, the prefrontal cortex inhibits it.

      This inhibition allows new options, derived from a contextualized "remixing" of long-term memory, to emerge.

      Managing Its Own Limitation: The system is designed to take its own limitation into account. It accepts "losing control" to allow novelty to emerge.

      The new options are then evaluated: if they prove successful, they are confirmed and consolidated in memory, enriching the individual's repertoire for future creations.

      The Example of the Nine-Dot Test: This classic test illustrates the process.

      To connect 9 dots with 4 straight line segments without lifting the pencil, one has to abandon implicit mental models (do not leave the square, do not go back over a line).

      The solution emerges when one transgresses these self-imposed rules.
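
      The nine-dot solution can be checked mechanically. The sketch below places the dots on a 3 × 3 integer grid and verifies one well-known four-segment path; the coordinates and the helper `on_segment` are choices made for this example only.

```python
def on_segment(p, a, b):
    """Return True if point p lies on the closed segment from a to b."""
    # A zero cross product means p is collinear with a and b.
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    within_x = min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
    within_y = min(a[1], b[1]) <= p[1] <= max(a[1], b[1])
    return cross == 0 and within_x and within_y


# The nine dots occupy the integer points of the square [0, 2] x [0, 2].
dots = {(x, y) for x in range(3) for y in range(3)}

# One continuous path of four segments; two of its corners lie outside the square.
path = [(0, 0), (0, 3), (3, 0), (0, 0), (2, 2)]
segments = list(zip(path, path[1:]))

covered = {d for d in dots if any(on_segment(d, a, b) for a, b in segments)}
leaves_square = any(not (0 <= x <= 2 and 0 <= y <= 2) for x, y in path)

print(len(segments), covered == dots, leaves_square)
```

      The path covers all nine dots only because it turns at (0, 3) and (3, 0), outside the square. Staying inside the square is exactly the self-imposed rule that has to be dropped.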

      4.2. Animal Creativity: The Octopus and "Metis"

      Patricia Ribault uses the example of the octopus to illustrate a non-human form of creative intelligence, "metis" (cunning), theorized by Marcel Detienne and Jean-Pierre Vernant.

      A Being without Rigid Structure: The octopus can take on and lose shape, which gives it exceptional plasticity.

      Master of Camouflage: Its creativity is expressed in its ability to play on the perception of others.

      Camouflage is not only about blending in but about "deceiving the one or ones who are watching you". It can be defensive or offensive (hypnotizing prey).

      The "Mimic Octopus": This species is able not only to camouflage itself but also to change its behavior to imitate other animals depending on the situation.

      Metis as a Form of Creativity: Metis is described as an "intelligence at work in becoming", using "prudence, perspicacity, promptness" but also "cunning, even lying".

      The being endowed with metis, like the octopus, is "elusive" and able to "constantly turn situations around".

      5. Cross-Cutting Themes and Conclusion

      The final discussion addresses several key questions about the nature of creativity and the distinctions between human and machine.

      Authenticity and Subjectivity: Authenticity remains the dimension that is hardest to attribute to AI systems.

      Human authenticity is tied to an internal imbalance and an expressive intention.

      AI systems can simulate a primary form of subjectivity (by having models of their own knowledge), but deep expressivity remains a human attribute.

      Chance and Constraint: Chance is an essential component of brain function, notably through "neural noise", which increases when models of the world are found wanting, opening up the "field of possibilities".

      However, as Mozart's game shows, apparent chance can operate within very strong constraints.

      Creativity lies in this interplay between opening (divergent thinking) and closing (convergent thinking).

      The Current Limits of AI: An anecdote is shared about an AI tasked with improvising in the style of Bach's The Art of Fugue.

      The result was impressive on the surface ("the flesh") but completely ignored the fundamental structure of the work.

      Similarly, a text written by an AI is described as "very fluent" and "coherent on the surface", but without "body" or semantic depth.

      Serendipity: It is stressed that creativity cannot be planned.

      It often emerges from serendipity: discovering something interesting by chance while looking for something else.

      To be effective, however, serendipity requires the ability to recognize what is interesting, which points back to the creator's subjectivity and internal model.

    1. Reviewer #1 (Public review):

      Summary:

      The authors attempt to study how oocyte incomplete cytokinesis occurs in the mouse ovary.

      Strengths:

      The finding that UPR components are highly expressed during zygotene is an interesting result that has broad implications for how germ cells navigate meiosis. The findings that proteasome activity increases in germ cells compared to somatic cells suggest that the germline might have a quantitatively different response for protein clearance.

      Weaknesses:

      (1) The microscopy images look saturated, for example, Figure 1a, b, etc. Is this a normal way to present fluorescent microscopy?

      (2) The authors should ensure that all claims regarding enrichment/higher vs lower values have indicated statistical tests.

      (a) In Figure 2f, the authors should indicate which comparison is made for this test. Is it comparing 2 vs 6 cyst numbers?

      (b) Figures 4d and 4e do not have a statistical test indicated.

      (3) Because the system is developmentally dynamic, the major conclusions of the work are somewhat unclear. Could the authors be more explicit about these and enumerate them more clearly in the abstract?

      (4) The references for specific prior literature are mostly missing (lines 184-195, for example).

      (5) The authors should define all acronyms when they are first used in the text (UPR, EGAD, etc).

      (6) The jumping between topics (EMA, into microtubule fragmentation, polarization proteins, UPR/ERAD/EGAD, GCNA, ER, balbiani body, etc) makes the narrative of the paper very difficult to follow.

      (7) The heading title "Visham participates in organelle rejuvenation during meiosis" in line 241 is speculative and/or not supported. Drawing upon the extensive, highly rigorous Drosophila literature, it is safe to extrapolate, but the claim about regeneration is not adequately supported.

    2. Reviewer #3 (Public review):

      This manuscript provides evidence that mice have a fusome, a conserved structure most well studied in Drosophila that is important for oocyte specification. Overall, a myriad of evidence is presented demonstrating the existence of a mouse fusome that the authors term visham. This work is important as it addresses a long-standing question in the field of whether mice have fusomes and sheds light on how oocytes are specified in mammals. Concerns that need to be addressed revolve around several conclusions that are overstated or unclear and are listed below.

      (1) Line 86 - the heading for this section is "PGCs contain a Golgi-rich structure known as the EMA granule" but there is nothing in this section that shows it is Golgi-rich. It does show that the structure is asymmetric and has branches.

      (2) Line 105-106, how do we know if what's seen by EM corresponds to the EMA1 granule?

      (3) Lines 106-107 state "Visham co-stained with the Golgi protein Gm130 and the recycling endosomal protein Rab11a1". This is not convincing as there is only one example of each image, and both appear to be distorted.

      (4) Lines 132-133: while visham formation is disrupted when microtubules are disrupted, I am not convinced that visham moves on microtubules as stated in the heading of this section.

      (5) Line 156 - the heading for this section states that Visham associates with polarity and microtubule genes, including pard3, but only evidence for pard3 is presented.

      (6) Lines 196-210 - it's strange to say that UPR genes depend on DAZ, as they are upregulated in the mutants. I think there are important observations here, but it's unclear what is being concluded.

      (7) Lines 257-259: wave 1 and 2 follicles need to be explained in the introduction, and how this fits with the observations here clarified.

    3. Author response:

      Reviewer #1 (Public Review):

      Summary

      We thank the reviewer for the constructive and thoughtful evaluation of our work. We appreciate the recognition of the novelty and potential implications of our findings regarding UPR activation and proteasome activity in germ cells.

      (1) The microscopy images look saturated, for example, Figure 1a, b, etc. Is this a normal way to present fluorescent microscopy?

      The apparent saturation was not present in the original images; it likely arose from image compression during PDF generation, although the EMA granule remained apparent. In the revised submission, we will provide high-resolution TIFF files to ensure accurate representation of fluorescence intensity and will carefully optimize image display settings to avoid any saturation artifacts.

      (2) The authors should ensure that all claims regarding enrichment/higher vs. lower values have indicated statistical tests.

      We fully agree. In the revised version, we will add the missing statistical tests to all quantitative comparisons, stating the test used and reporting p-values in the figure legends and text.

      (a) In Figure 2f, the authors should indicate which comparison is made for this test. Is it comparing 2 vs. 6 cyst numbers?

      We acknowledge that the description was not sufficiently detailed. The test did not compare 2-cell vs. 6-cell cyst numbers. Instead, it considered all the possible ways in which 8-cell cysts, or the larger cysts studied, could fragment randomly into two pieces, and asked how likely it is that 6-cell cysts would be produced by chance in 13 of 15 observed examples. We will expand the legend and main text to clarify that a binomial test was used to determine that the proportion of cysts producing 6-cell fragments differed very significantly from chance.

      Revised text:

      “A binomial test was used to assess whether the observed frequency of 6-cell cyst products differed from random cyst breakage. Production of 6-cell cysts was strongly preferred (13/15 cysts; ****p < 0.0001).”
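
      For readers who want to see the shape of such a calculation, the sketch below computes a one-sided exact binomial tail for 13 successes in 15 trials. The null probability is not given in the response. The value used here, 2/7, is purely an illustrative assumption: it corresponds to an unbranched 8-cell chain in which each of the 7 bridges is equally likely to break, so that 2 of the 7 breaks yield a 6-cell piece. The authors' actual null model may differ, and whether their test was one-sided is also not stated.

```python
from math import comb


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed null probability of obtaining a 6-cell fragment by chance
# (hypothetical linear-chain model; not taken from the manuscript).
p_null = 2 / 7

# 13 of 15 observed cysts produced a 6-cell fragment.
p_value = binomial_upper_tail(13, 15, p_null)
print(p_value)
```

      Under this assumed null the tail probability falls well below 0.0001, in line with the significance level quoted in the revised text, but the exact value depends on the null probability chosen.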

      (b) Figures 4d and 4e do not have a statistical test indicated.

      We will include the specific statistical test used and report the corresponding p-values directly in the figure legends.

      (3) Because the system is developmentally dynamic, the major conclusions of the work are somewhat unclear. Could the authors be more explicit about these and enumerate them more clearly in the abstract?

      We will revise the abstract to better clarify the findings of this study. We will also replace the term Visham with mouse fusome to reflect its functional and structural analogy to the Drosophila and Xenopus fusomes, making the narrative more coherent and conclusive.

      (4) The references for specific prior literature are mostly missing (lines 184-195, for example).

      We appreciate this observation of a problem that occurred inadvertently when shortening an earlier version. We will add 3–4 relevant references to appropriately support this section.

      (5) The authors should define all acronyms when they are first used in the text (UPR, EGAD, etc).

      We will ensure that all acronyms are spelled out at first mention (e.g., Unfolded Protein Response (UPR), Endosome and Golgi-Associated Degradation (EGAD)).

      (6) The jumping between topics (EMA, into microtubule fragmentation, polarization proteins, UPR/ERAD/EGAD, GCNA, ER, Balbiani body, etc) makes the narrative of the paper very difficult to follow.

      We are not jumping between topics, but following a narrative relevant to the central question of whether female mouse germ cells develop using a fusome. EMA, microtubule fragmentation, polarization proteins, ER, and the Balbiani body are all topics with a known connection to fusomes. This is explained in the general introduction and in relevant subsections. We appreciate the feedback that further explanation of these connections would be helpful. In the revised manuscript, use of the unified term mouse fusome will also help connect the narrative across sections. UPR/ERAD/EGAD are processes that have been studied in the repair and maintenance of somatic cells and in yeast meiosis. We show that the major regulator Xbp1 is found in the fusome, and that the fusome and these rejuvenation pathway genes are expressed and maintained throughout oogenesis, rather than only during limited late stages as suggested in previous literature.

      (7) The heading title "Visham participates in organelle rejuvenation during meiosis" in line 241 is speculative and/or not supported. Drawing upon the extensive, highly rigorous Drosophila literature, it is safe to extrapolate, but the claim about regeneration is not adequately supported.

      We believe this statement is accurate given the broad scope of the term "participates." It is supported by localization of the UPR regulator Xbp1 to the fusome. Xbp1 is the ortholog of Hac1, a key gene mediating UPR-mediated rejuvenation during yeast meiosis. We also showed that rejuvenation pathway genes are expressed throughout most of meiosis (not previously known) and expanded the cytological evidence of stage-specific organelle rejuvenation later in meiosis, such as mitochondrial-ER docking, in regions enriched in fusome antigens. However, we recognize the current limitations of this evidence in the mouse and want to convey them appropriately, without going to what we believe would be the unjustified extreme of saying there is no evidence.

      Reviewer #2 (Public Review):

      We thank the reviewer for the comprehensive summary and for highlighting both the technical achievement and biological relevance of our study. We greatly appreciate the thoughtful suggestions that have helped us refine our presentation and terminology.

      (1) Some titles contain strong terms that do not fully match the conclusions of the corresponding sections.

      (1a) Article title “Mouse germline cysts contain a fusome-like structure that mediates oocyte development”

      We will change the statement to: “Mouse germline cysts contain a fusome that supports germline cyst polarity and rejuvenation.”

      (1b) Result title “Visham overlaps centrosomes and moves on microtubules”

      We acknowledge that “moves” implies dynamics. We will include additional supplementary images showing small vesicular components of the mouse fusome on spindle-derived microtubule tracks.

      (1c) Result title “Visham associates with Golgi genes involved in UPR beginning at the onset of cyst formation”

      We will revise this title to: “The mouse fusome associates with the UPR regulatory protein Xbp1 beginning at the onset of cyst formation” to reflect the specific UPR protein that was immunolocalized. 

      (1d) Result title “Visham participates in organelle rejuvenation during meiosis”

      We will revise this to: “The mouse fusome persists during organelle rejuvenation in meiosis.”

      (2) The authors aim to demonstrate that Visham is a fusome-like structure. I would suggest simply referring to it as a "fusome-like structure" rather than introducing a new term, which may confuse readers and does not necessarily help the authors' goal of showing the conservation of this structure in Drosophila and Xenopus germ cells. Interestingly, in a preprint from the same laboratory describing a similar structure in Xenopus germ cells, the authors refer to it as a "fusome-like structure (FLS)" (Davidian and Spradling, BioRxiv, 2025).

      We appreciate the reviewer’s insightful comment. To maintain conceptual clarity and align with existing literature, we will refer to the structure as the mouse fusome throughout the manuscript, avoiding introduction of a new term.

      Reviewer #3 (Public Review):

      We thank the reviewer for emphasizing the importance of our study and for providing constructive feedback that will help us clarify and strengthen our conclusions.

      (1) Line 86 - the heading for this section is "PGCs contain a Golgi-rich structure known as the EMA granule" 

      We agree that the enrichment of Golgi within the EMA PGCs was not shown until the next section. We will revise this heading to:

      “PGCs contain an asymmetric EMA granule.”

      (2)  Line 105-106, how do we know if what's seen by EM corresponds to the EMA1 granule?

      We will clarify that this identification is based on co-localization with Golgi markers (GM130 and GS28) and response to Brefeldin A treatment, which will be included as supplementary data. These findings support that the mouse fusome is Golgi-derived and can therefore be visualized by EM. The Golgi regions in E13.5 cyst cells move close together and associate with ring canals as visualized by EM (Figure 1E), the same as the mouse fusomes identified by EMA.

      (3) Line 106-107-states "Visham co-stained with the Golgi protein Gm130 and the recycling endosomal protein Rab11a1". This is not convincing as there is only one example of each image, and both appear to be distorted.

      Space is at a premium in these figures, but we have ample data documenting this clear co-localization. We will replace the existing images with high-resolution, non-compressed versions for the final figures to clearly illustrate the co-staining patterns for GM130 and Rab11a1.

      (4) Line 132-133---while visham formation is disrupted when microtubules are disrupted, I am not convinced that visham moves on microtubules as stated in the heading of this section.

      We will include additional supplementary data showing small mouse fusome vesicles aligned along microtubules.

      (5) Line 156 - the heading for this section states that Visham associates with polarity and microtubule genes, including pard3, but only evidence for pard3 is presented.

      We agree and will revise the heading to: “Mouse fusome associates with the polarity protein Pard3.” We are adding data showing association of small fusome vesicles with microtubules.

      (6)  Lines 196-210 - it's strange to say that UPR genes depend on DAZ, as they are upregulated in the mutants. I think there are important observations here, but it's unclear what is being concluded.

      UPR genes are not upregulated in DAZ mutants, in the sense that we have never documented them increasing. We show that during this time UPR genes behave like pluripotency genes and normally decline, but in DAZ mutants their decline is slowed. We will rephrase the paragraph to clarify that Dazl mutation partially decouples developmental processes that are normally linked, which alters UPR gene expression relative to cyst development.

      (7) Line 257-259-wave 1 and 2 follicles need to be explained in the introduction, and how these fit with the observations here clarified.

      Follicle waves are too small a focus of the current study to explain in the introduction, but we will refer readers to the relevant cited literature (Yin and Spradling, 2025) for further details.

      We sincerely thank all reviewers for their insightful and constructive feedback. We believe that the planned revisions—particularly the refined terminology, improved image quality, clarified statistics, and restructured abstract—will substantially strengthen the manuscript and enhance clarity for readers.

    1. Reviewer #1 (Public review):

      Summary:

      In this paper, the authors conduct both experiments and modeling of human cytomegalovirus (HCMV) infection in vitro to study how the infectivity of the virus (measured by cell infection) scales with the viral concentration in the inoculum. A naïve thought would be that this is linear in the sense that doubling the virus concentration (and thus the total virus) in the inoculum would lead to doubling the fraction of infected cells. However, the authors show convincingly that this is not the case for HCMV, using multiple strains, two different target cells, and repeated experiments. In fact, they find that for some regimens (inoculum concentration), infected cells increase faster than the concentration of the inoculum, which they term "apparent cooperativity". The authors then provided possible explanations for this phenomenon and constructed mathematical models and simulations to implement these explanations. They show that these ideas do help explain the cooperativity, but they can't be conclusive as to what the correct explanation is. In any case, this advances our knowledge of the system, and it is very important when quantitative experiments involving MOI are performed.

      Strengths:

      Careful experiments using state-of-the-art methodologies and advancing multiple competing models to explain the data.

      Weaknesses:

      There are minor weaknesses in explaining the implementation of the model. However, some specific assumptions, which to this reviewer were unclear, could have a substantial impact on the results. For example, whether cell infection is independent or not. This is expanded below.

      Suggestions to clarify the study:

      (1) Mathematically, it is clear what "increase linearly" or "increase faster than linearly" (e.g., line 94) means. However, it may be confusing for some readers to then look at plots such as in Figure 2, which appear linear (but on the log-log scale) and about which the authors also say (line 326) "data best matching the linear relationship on a log-log scale".

      (2) One of the main issues that is unclear to me is whether the authors assume that cell infection is independent of other cells. This could be a very important issue affecting their results, both when analyzing the experimental data and running the simulations. One possible outcome of infection could be the generation of innate mediators that could protect (alter the resistance) of nearby cells. I can imagine two opposite results of this: i) one possibility is that resistance would lead to lower infection frequencies and this would result in apparent sub-linear infection (contrary to the observations); or ii) inoculums with more virus lead to faster infection, which doesn't allow enough time for the "resistance" (innate effect) to spread (potentially leading to results similar to the observations, supra-linear infection).

      (3) Another unclear aspect of cell infection is whether each cell only has one chance to be infected or multiple chances, i.e., do the authors run the simulation once over all the cells or more times?

      (4) On the other hand, the authors address the complementary issue of the virus acting independently or not, with their clumping model (which includes nice experimental measurements). However, it was unclear to me what the assumption of the simulation is in this case. In the case of infection by a clump of virus or "viral compensation", when infection is successful (the cell becomes infected), how many viruses "disappear" and what happens to the rest? For example, one of the viruses of the clump is removed by infection, but the others are free to participate in another clump, or they also disappear. The only thing I found about this is the caption of Figure S10, and it seems to indicate that only the infected virus is removed. However, a typical assumption, I think, is that viruses aggregate to improve infection, but then the whole aggregate participates in infection of a single cell, and those viruses in the clump can't participate in other infections. Viral cooperativity with higher inocula in this case would be, perhaps, the result of larger numbers of clumps for higher inocula. This seems in agreement with Figure S8, but was a little unclear in the interpretation provided.

      (5) In algorithm 1, how does P_i, as defined, relate to equation 1?

      (6) In line 228, and several other places (e.g., caption of Table S2), the authors refer to the probability of a single genome infecting a cell p(1)=exp(-lambda), but shouldn't it be p(1)=1-exp(-lambda) according to equation 1?

      (7) In line 304, the accrued damage hypothesis is defined, but it is stated as a triggering of an antiviral response; one would assume that exposure to a virion should increase the resistance to infection. Otherwise, the authors are saying that evolution has come up with intracellular viral resistance mechanisms that are detrimental to the cell. As I mentioned above, this could also be a mechanism for non-independent cell infection. For example, infected cells signal to neighboring cells to "become resistant" to infection. This would also provide a mechanism for saturation at high levels.

      (8) In Figure 3, and likely other places, t-tests are used for comparisons, but with only an n=5 (experiments). Many would prefer a non-parametric test.

    2. Reviewer #3 (Public review):

      Summary:

      The authors dilute fluorescent HCMV stocks in small steps (df ≈ 1.3-1.5) across 23 points, quantify infections by flow cytometry at 3 dpi, and fit a power-law model to estimate a cooperativity parameter n (n > 1 indicates apparent cooperativity). They compare fibroblasts vs epithelial cells and multiple strains/reporters, and explore alternative mechanisms (clumping, accrued damage, viral compensation) via analytical modeling and stochastic simulations. They discuss implications for titer/MOI estimation and suggest a method for detecting "apparent cooperativity," noting that for viruses showing this behavior, MOI estimation may be biased.

      Strengths:

      (1) High-resolution titration & rigor: The small-step dilution design (23 serial dilutions; tailored df) improves dose-response resolution beyond conventional 10× series.

      (2) Clear quantitative signal: Multiple strain-cell pairs show n > 1, with appropriate model fitting and visualization of the linear regime on log-log axes.

      (3) Mechanistic exploration: Side-by-side modeling of clumping vs accrued damage vs compensation frames testable hypotheses for cooperativity.

      Weaknesses:

      (1) Secondary infection control: The authors argue that 3 dpi largely avoids progeny-mediated secondary infection; this claim should be strengthened (e.g., entry inhibitors/control infections) or add sensitivity checks showing results are robust to a small secondary-infection contribution.

      (2) Discriminating mechanisms: At present, simulations cannot distinguish between accrued damage and viral compensation. The authors should propose or add a decisive experiment (e.g., dual-color coinfection to quantify true coinfection rates versus "priming" without coinfection; timed sequential inocula) and outline expected signatures for each mechanism.

      (3) Decline at high genomes/cell: Several datasets show a downturn at high input. Hypotheses should be provided (cytotoxicity, receptor depletion, and measurement ceiling) and any supportive controls.

      (4) Include experimental data: In Figure 6, please include the experimentally measured titers (IU/mL), if available.

      (5) MOI guidance: The practical guidance is important; please add a short "best-practice box" (how to determine titer at multiple genomes/cell and cell densities; when single-hit assumptions fail) for end-users.

    3. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this paper, the authors conduct both experiments and modeling of human cytomegalovirus (HCMV) infection in vitro to study how the infectivity of the virus (measured by cell infection) scales with the viral concentration in the inoculum. A naïve thought would be that this is linear in the sense that doubling the virus concentration (and thus the total virus) in the inoculum would lead to doubling the fraction of infected cells. However, the authors show convincingly that this is not the case for HCMV, using multiple strains, two different target cells, and repeated experiments. In fact, they find that for some regimens (inoculum concentration), infected cells increase faster than the concentration of the inoculum, which they term "apparent cooperativity". The authors then provided possible explanations for this phenomenon and constructed mathematical models and simulations to implement these explanations. They show that these ideas do help explain the cooperativity, but they can't be conclusive as to what the correct explanation is. In any case, this advances our knowledge of the system, and it is very important when quantitative experiments involving MOI are performed.

      Strengths:

      Careful experiments using state-of-the-art methodologies and advancing multiple competing models to explain the data.

      Weaknesses:

      There are minor weaknesses in explaining the implementation of the model. However, some specific assumptions, which to this reviewer were unclear, could have a substantial impact on the results. For example, whether cell infection is independent or not. This is expanded below.

      Suggestions to clarify the study:

      (1) Mathematically, it is clear what "increase linearly" or "increase faster than linearly" (e.g., line 94) means. However, it may be confusing for some readers to then look at plots such as in Figure 2, which appear linear (but on the log-log scale) and about which the authors also say (line 326) "data best matching the linear relationship on a log-log scale". 

      This is a good point. In our revision, we will clarify that a linear relationship on the log-log scale does not imply a linear relationship on the linear scale.
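      The distinction can be checked numerically. The sketch below assumes a pure power law y = c * x**n, with c and n chosen arbitrarily for illustration (they are not fitted values from the paper), and uses a hypothetical helper `loglog_slope`. The log-log plot is a straight line with slope n for any n, while doubling x multiplies y by 2**n, which equals 2 only when n = 1.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) regressed on log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den

# Illustrative (assumed) parameters, not values from the paper.
c, n = 0.01, 1.5
xs = [0.01 * 1.4**i for i in range(10)]   # small-step dilution-like series
ys = [c * x**n for x in xs]

slope = loglog_slope(xs, ys)              # recovers n: straight line on log-log axes
fold = (c * (2 * xs[0]) ** n) / (c * xs[0] ** n)  # effect of doubling x on y

print(f"log-log slope = {slope:.3f}")
print(f"doubling x multiplies y by {fold:.3f}")
```

      A slope above 1 on log-log axes is therefore exactly the "faster than linear" behavior on the original axes.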

      (2) One of the main issues that is unclear to me is whether the authors assume that cell infection is independent of other cells. This could be a very important issue affecting their results, both when analyzing the experimental data and running the simulations. One possible outcome of infection could be the generation of innate mediators that could protect (alter the resistance) of nearby cells. I can imagine two opposite results of this: i) one possibility is that resistance would lead to lower infection frequencies and this would result in apparent sub-linear infection (contrary to the observations); or ii) inoculums with more virus lead to faster infection, which doesn't allow enough time for the "resistance" (innate effect) to spread (potentially leading to results similar to the observations, supra-linear infection). 

      In our models we assumed cells to be independent of each other (see also responses to other similar points). Because we measure infection in individual cells, assuming cells are independent is a reasonable first approximation. However, the reviewer makes an excellent point that there may be some between-cell signaling happening in the culture that “alerts” or “conditions” cells to change their “resistance”. It is also possible that at higher genome/cell numbers, exposure of cells to virions or virion debris may change the state of cells in the culture, so that more cells become “susceptible” to infection. This is a good point that we will list in the Limitations subsection of the Discussion; it is a good hypothesis to test in our future experiments.

      (3) Another unclear aspect of cell infection is whether each cell only has one chance to be infected or multiple chances, i.e., do the authors run the simulation once over all the cells or more times? 

      Each cell has only one chance to be infected. Algorithm 1 clearly states that; we will add an extra sentence in “Agent-based simulations” to indicate this point.

      (4) On the other hand, the authors address the complementary issue of the virus acting independently or not, with their clumping model (which includes nice experimental measurements). However, it was unclear to me what the assumption of the simulation is in this case. In the case of infection by a clump of virus or "viral compensation", when infection is successful (the cell becomes infected), how many viruses "disappear" and what happens to the rest? For example, one of the viruses of the clump is removed by infection, but the others are free to participate in another clump, or they also disappear. The only thing I found about this is the caption of Figure S10, and it seems to indicate that only the infected virus is removed. However, a typical assumption, I think, is that viruses aggregate to improve infection, but then the whole aggregate participates in infection of a single cell, and those viruses in the clump can't participate in other infections. Viral cooperativity with higher inocula in this case would be, perhaps, the result of larger numbers of clumps for higher inocula. This seems in agreement with Figure S8, but was a little unclear in the interpretation provided. 

      This is a good point. We did not remove the clump when one of the virions in the clump managed to infect a cell, and indeed, this could be the reason why in some simulations we observe apparent cooperativity when modeling viral clumping. This is something we will explore in our revision.

      (5) In algorithm 1, how does P_i, as defined, relate to equation 1? 

      These are unrelated because eqn. (1) is a phenomenological model that links infection per cell to genomes per cell. P_i in algorithm 1 is a “physics-inspired” potential barrier.

      (6) In line 228, and several other places (e.g., caption of Table S2), the authors refer to the probability of a single genome infecting a cell p(1)=exp(-lambda), but shouldn't it be p(1)=1-exp(-lambda) according to equation 1?

      Indeed, this was a typo; it should read p(1)=1-exp(-lambda), per eqn. (1). Thank you, it will be corrected in the revised paper.
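      A short numerical check makes the correction concrete. The sketch below assumes that eqn. (1) has the form p(g) = 1 - exp(-lambda * g**n); this form is our guess from the surrounding discussion, not a quotation from the paper, and the function name and the value of lambda are hypothetical. With g = 1 it gives p(1) = 1 - exp(-lambda), whereas exp(-lambda) is the complementary probability that the cell stays uninfected.

```python
import math

def p_infected(genomes_per_cell: float, lam: float, n: float = 1.0) -> float:
    """Probability that a cell is infected, ASSUMING the form
    1 - exp(-lam * g**n); n = 1 corresponds to a single-hit model."""
    return 1.0 - math.exp(-lam * genomes_per_cell**n)

lam = 0.05                      # hypothetical per-genome infectivity
p1 = p_infected(1.0, lam)       # probability of infection at one genome per cell
p_uninfected = math.exp(-lam)   # the quantity the typo reported instead

print(f"p(1)          = {p1:.4f}")
print(f"exp(-lambda)  = {p_uninfected:.4f}")
print(f"sum           = {p1 + p_uninfected:.4f}")
```

      For small lambda, 1 - exp(-lambda) is close to lambda itself, so the two expressions differ greatly: one is near 0 and the other near 1.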

      (7) In line 304, the accrued damage hypothesis is defined, but it is stated as a triggering of an antiviral response; one would assume that exposure to a virion should increase the resistance to infection. Otherwise, the authors are saying that evolution has come up with intracellular viral resistance mechanisms that are detrimental to the cell. As I mentioned above, this could also be a mechanism for non-independent cell infection. For example, infected cells signal to neighboring cells to "become resistant" to infection. This would also provide a mechanism for saturation at high levels.

      We do not know how exposure of a cell to one virion would change its “antiviral state”, i.e., whether it becomes more or less resistant to the next infection. If a cell becomes more resistant, there is no possibility of observing apparent cooperativity in infection of cells, so this hypothesis cannot explain our observations with n>1. Whether this mechanism plays a role in the saturation of the cell infection rate at a value below 1 when genomes/cell is large is unclear, but it is a possibility. We will add this point to the Discussion in revision.

      (8) In Figure 3, and likely other places, t-tests are used for comparisons, but with only an n=5 (experiments). Many would prefer a non-parametric test. 

      We repeated the analyses in Fig 3 with the Mann-Whitney test and the results were the same, so we would like to keep the results from the t-test in the paper.
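      For readers who want to reproduce such a comparison, an exact two-sided Mann-Whitney test for small samples can be sketched with the standard library alone. The two groups below are hypothetical numbers, not data from the paper, the function names are ours, and we assume five experiments per group. The p-value is obtained by enumerating every way of relabelling the pooled observations.

```python
from itertools import combinations

def u_statistic(a, b):
    """Mann-Whitney U for sample a: each (x, y) pair scores 1 if x > y, 0.5 if tied."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)

def exact_mann_whitney_p(a, b):
    """Two-sided exact p-value from all relabellings of the pooled data."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    center = n_a * n_b / 2.0                      # expected U under the null
    observed = abs(u_statistic(a, b) - center)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        grp_a = [pooled[i] for i in idx]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabellings at least as far from the center as the observed one.
        if abs(u_statistic(grp_a, grp_b) - center) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# HYPOTHETICAL measurements from 5 experiments per group (not from the paper).
group_a = [1.1, 1.3, 1.2, 1.5, 1.4]
group_b = [2.0, 2.2, 1.9, 2.4, 2.1]
p_exact = exact_mann_whitney_p(group_a, group_b)
print(f"exact two-sided p = {p_exact:.4f}")
```

      With five observations per group there are 252 relabellings, so even completely separated groups cannot give a two-sided p-value below 2/252 (about 0.008); this floor is worth keeping in mind when comparing the rank test with the t-test.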

      Reviewer #2 (Public review):

      In their article, Peterson et al. wanted to show to what extent the classical "single hit" model of virion infection, where one virion is required to infect a cell, does not match empirical observations based on human cytomegalovirus in vitro infection model, and how this would have practical impacts in experimental protocols.

      They first used a very simple experimental assay, where they infected cells with serially diluted virions and measured the proportion of infected cells with flow cytometry. From this, they could elegantly show how the proportion of infected cells differed from a "single hit" model, which they simulated using a simple mathematical model ("powerlaw model"), and better fit a model where virions need to cooperate to infect cells. They then explore which mechanism could explain this apparent cooperation:

      (1) Stochasticity alone cannot explain the results, although I am unsure how generalizable the results are, because the mathematical model chosen cannot, by design, explain such observations only by stochasticity. 

      Our null model simulations are not just about stochasticity; they also include variability in virion infectivity and cell resistance to infection. We agree that simulations cannot truly prove that such variability cannot result in apparent cooperativity; however, we also provide a mathematical proof that the increase in the frequency of infected cells should be linear with virion concentration at small genome/cell numbers.

      (2) Virion clumping seemed not to be enough either to generally explain such a pattern. For that, they first use a mathematical model showing that the apparent cooperation would be small. However, I am unsure how extreme the scenario of simulated virion clumping is. They then used dynamic light scattering to measure the distribution of the sizes of clumps. From these estimates, they show that virion clumps cannot reproduce the observed virion cooperation in serial dilution assays. However, the authors remain imprecise on how the uncertainty of these clumps' size distribution would impact the results, as most clumps have a size smaller than a single virion, leaving therefore a limited number of clumps truly containing virions.

      As we stated in the paper, clumping may explain apparent cooperativity in simulations depending on how stock dilution impacts the distribution of virions/clump. This could be explored further; however, better experimental measurements of virions/clump would be highly informative (but we do not have resources to do these experiments at present). Our point is that the degree of apparent cooperativity depends on the target cell used (n is smaller on epithelial cells than on fibroblasts), which is difficult to explain by clumping, a virion property. Per the comment by reviewer 1, we will do some more analyses of the clumping model to investigate the importance of clump removal per successful infection for the detected degree of apparent cooperativity.

      The two models remain unidentifiable from each other but could explain the apparent virion cooperativity: either due to an increase in susceptibility of the cell each time a virion tries to infect it, or due to viral compensation, where lesser fit viruses are able to infect cells in co-infection with a better fit virion. Unfortunately, the authors here do not attempt to fit their mathematical model to the experimental data but only show that theoretical models and experimental data generate similar patterns regarding virion apparent cooperation. 

      In the revision we will provide examples of simulations that “match” experimental data with a relatively high degree of apparent cooperativity; we have done those before but excluded them from the current version since they are a bit messy. Fitting simulations to data may be overkill.

      Finally, the authors show that this virion cooperation could make the relationship between the estimated multiplicity of infection and viruses/cell deviate from the 1:1 relationship. Consequently, the dilution of a virion stock would lead to an even stronger decrease in infectivity, as more diluted virions can cooperate less for infection.

      Overall, this work is very valuable as it raises the general question of how the estimate of infectivity can be biased if extrapolated from a single virus titer assay. The observation that HCMV virions often cooperate and that this cooperation varies between contexts seems robust. The putative biological explanations would require further exploration.

      This topic is very well known in the case of segmented viruses and the semi-infectious particles, leading to the idea of studying "sociovirology", but to my knowledge, this is the first time that it was explored for a nonsegmented virus, and in the context of MOI estimation. 

      Thank you.

      Reviewer #3 (Public review): 

      Summary:

      The authors dilute fluorescent HCMV stocks in small steps (df ≈ 1.3-1.5) across 23 points, quantify infections by flow cytometry at 3 dpi, and fit a power-law model to estimate a cooperativity parameter n (n > 1 indicates apparent cooperativity). They compare fibroblasts vs epithelial cells and multiple strains/reporters, and explore alternative mechanisms (clumping, accrued damage, viral compensation) via analytical modeling and stochastic simulations. They discuss implications for titer/MOI estimation and suggest a method for detecting "apparent cooperativity," noting that for viruses showing this behavior, MOI estimation may be biased.

      Strengths:

      (1) High-resolution titration & rigor: The small-step dilution design (23 serial dilutions; tailored df) improves dose-response resolution beyond conventional 10× series.

      (2) Clear quantitative signal: Multiple strain-cell pairs show n > 1, with appropriate model fitting and visualization of the linear regime on log-log axes.

      (3) Mechanistic exploration: Side-by-side modeling of clumping vs accrued damage vs compensation frames testable hypotheses for cooperativity. 

      Thank you.

      Weaknesses:

      (1) Secondary infection control: The authors argue that 3 dpi largely avoids progeny-mediated secondary infection; this claim should be strengthened (e.g., entry inhibitors/control infections) or add sensitivity checks showing results are robust to a small secondary-infection contribution. 

      This is an important point. We do believe that current knowledge of HCMV virion production time is sufficient to justify our experimental design: multiple papers show that it takes 3-4 days to make virions (see Fig 7 in Vonka and Benyesh-Melnick JB 1966; Fig 3B in Stanton et al JCI 2010; and Fig 1A in Li et al. PNAS 2015). We do agree, however, that an additional control that blocks new infections would be useful. We had previously performed experiments with an HCMV TB-gL-KO that cannot make infectious virions (but the stock virions can be made from complemented target cells). We will investigate if our titration experiments with this virus strain have sufficient resolution to detect apparent cooperativity. However, at present we do not have the resources to perform new experiments.

      (2) Discriminating mechanisms: At present, simulations cannot distinguish between accrued damage and viral compensation. The authors should propose or add a decisive experiment (e.g., dual-color coinfection to quantify true coinfection rates versus "priming" without coinfection; timed sequential inocula) and outline expected signatures for each mechanism. 

      Excellent suggestion. Because infection of a cell is a result of the joint viral infectivity and cell resistance, it may be hard to discriminate between these alternatives unless we specify them as particular molecular mechanisms. But we will try our best and list potential future experiments in the revised version of the paper.

      (3) Decline at high genomes/cell: Several datasets show a downturn at high input. Hypotheses should be provided (cytotoxicity, receptor depletion, and measurement ceiling) and any supportive controls. 

      Another good point. We do not have a good explanation, but we do not believe this is because of saturation of available target cells. It seemed to happen only (or was most pronounced) with the ME stocks, which are typically lower in titer, so the higher MOIs were nearly undiluted stock. It may be an effect of the conditioned medium. Or perhaps there are non-infectious particles, such as dense bodies (enveloped particles that lack a capsid and genome) and non-infectious enveloped particles (NIEPs), that compete for receptors or otherwise damage cells and are not diluted out at the higher doses. We plan to include these points in the Discussion of the revised version of the paper.

      (4) Include experimental data: In Figure 6, please include the experimentally measured titers (IU/mL), if available. 

      This is a model-simulated scenario, and as such, there are no measured titers.

      (5) MOI guidance: The practical guidance is important; please add a short "best-practice box" (how to determine titer at multiple genomes/cell and cell densities; when single-hit assumptions fail) for end-users. 

      Good suggestion. In the revised version of the paper, we will include a best-practice box based on guidelines developed in the Ryckman lab over the years.
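
      To make the "single-hit assumption" in point (5) concrete, the following is a minimal sketch, not the authors' model or code. It assumes that the number of productive hits per cell is Poisson-distributed with a mean proportional to the input dose, and that a cell is infected once it receives at least a threshold number of hits. The function names, the dose grid, and the threshold parameter are hypothetical choices made for illustration only. Under these assumptions, the low-dose slope on log-log axes is close to 1 for a single-hit process and close to 2 for a two-hit process, which is the kind of n > 1 signature the reviewers refer to.

```python
import math


def fraction_infected(mean_hits, hits_required=1):
    """Probability that a cell receives at least `hits_required` hits,
    assuming hits per cell are Poisson-distributed with mean `mean_hits`."""
    # Sum the Poisson probabilities of receiving fewer hits than required.
    below_threshold = sum(
        math.exp(-mean_hits) * mean_hits**i / math.factorial(i)
        for i in range(hits_required)
    )
    return 1.0 - below_threshold


def loglog_slope(doses, fractions):
    """Ordinary least-squares slope of log(fraction) against log(dose)."""
    xs = [math.log(d) for d in doses]
    ys = [math.log(f) for f in fractions]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


# Hypothetical low-dose grid (mean hits per cell), chosen so that the
# response stays in the approximately linear regime on log-log axes.
doses = [0.001, 0.002, 0.005, 0.01, 0.02]

single_hit_slope = loglog_slope(doses, [fraction_infected(d, 1) for d in doses])
two_hit_slope = loglog_slope(doses, [fraction_infected(d, 2) for d in doses])

print(f"single-hit slope: {single_hit_slope:.3f}")
print(f"two-hit slope:    {two_hit_slope:.3f}")
```

      In this toy setting, a fitted slope well above 1 at low dose is incompatible with independent single hits, which is why titers estimated at a single dose under a single-hit assumption can be misleading.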

      Overall note to all reviews: We have deposited our code and data on GitHub; yet none of the reviewers commented on it.

    1. Reviewer #1 (Public review):

      Summary:

      The authors use high-resolution ribosome profiling (Ezra-seq) and eRF1 pulldown-based ribosome profiling (eRF1-seq) developed in their lab to identify a GA rich sequence motif located upstream of the stop codon responsible for translation termination pausing. They then perform a massively parallel assay with randomly generated sequences to further characterize this motif. Using mouse tissues, they show that termination pausing signatures can be tissue-specific. They use a series of published ribosome structures and 18S rRNA mutants, and eS26 knockdown experiments to propose that the GA rich sequence interacts with the 3′-end of the 18S rRNA.

      Strengths:

      (1) Robust ribosome profiling data and clear analyses clarify the subtle behavior of terminating ribosomes near the stop codon.

      (2) Novel termination or "false termination" sites revealed by eRF1-seq in the 5′-UTR, 3′-UTR, and CDS highlight a previously underappreciated facet of translation dynamics.

      Weakness:

      (1) Modest effects seen in ABCE1 knockdown do not seem to add up to the level of regulation. The authors state "ABCE1 regulates terminating ribosomes independent of the sequence context" on pg 9, and "ABCE1 modulates termination pausing independent of the mRNA sequence context" in the figure caption for Figure S4. Given the modest effect of the knockdown, such phrasing is most likely not supported. Further clarification of "ABCE1 plays a generic role in translation termination" is necessary.

      (2) The authors propose that the GA rich sequence element upstream of the stop codon on the mRNA could potentially base pair with the 3′-end of the 18S rRNA. In the PDBs the authors reference in their paper and also in 3JAG, 3JAH, 3JAI (structures of terminating ribosomes with the stop codon in the A-site and eRF1), the mRNA exiting the ribosome and the 3′-end of the 18S rRNA are about 25-30 Å apart. In addition, a segment of eS26 is wedged in between these two RNA segments. This reviewer noted this arrangement in a random sampling of 5 other PDBs of mammalian and human ribosome 80S structures. How do the authors anticipate the base pairing they have proposed to occur in light of these steric hindrances? Rps26 is known to be released by Tsr2 in yeast during very specific stresses. Is it their expectation that termination pausing in human/mammalian cells happens during stressful conditions only?

      (3) The authors say, "It is thus likely that mRNA undergoes post-decoding scanning by 18S rRNA." (pg. 10). It is unclear what the authors mean by "scanning." Do they mean that the mRNA gets scanned in a manner similar to scanning during initiation? There is no evidence presented to support that particular conclusion.

      (4) Role of termination pausing in the testis is highly speculative. The authors state: "It is thus conceivable that the wide range of ribosome density at stop codons in testis facilitates functional division of ribosome occupancy beyond the coding region." It is unclear what type of functional division they are referring to.

    2. Reviewer #4 (Public review):

      Summary:

      This manuscript by Qian and colleagues uses ribosome profiling and reporter assays to dissect translation termination. Unfortunately, the data do not support the conclusions of the paper, controls are missing, and several assays are not well validated and do not reproduce previous findings from others.

      Specific comments:

      • Translation termination has been studied in several organisms including mammalian cells and yeast. In those cases what is analyzed is not the peak height at the stop codon, but rather the difference in the ribosome density before and after the stop. Thus, analyzing peak height is not validated. I understand that this is relevant only for the ribosome profiling experiments (and Ezra-seq) not the RF1 profiling. But much of the data was acquired that way.

      • Moreover, the data do not reproduce previous findings and no effort is made to connect them to previous data. Previous data has shown that stop codon efficacy varies. This is not reproduced (S1C). Similarly, an effect from the +1 residue is not reproduced. The data isn't even stratified by different stop codons as previous work has shown that different surrounding residues have different effects in the context of different stop codons. Thus, none of the sequencing data is validated or trusted and does not reproduce previous findings.

      • The GA-rich sequences identified by Ezra-seq and eRF1-seq are not the same, and they differ from previously reported sequences (Wangen & Green).

      • The authors claim that the majority of eRF1 peaks are at stop codons, but that is not true: only about 30% of the peaks are. Also, not all mRNAs have peaks at the stop codons. That is at best problematic. Finally, there are mRNAs that are known to "suffer" from NMD; what do these look like in the Ezra-seq and eRF1-seq data? How about mRNAs that have programmed frameshifts? This raises questions about the validity of the eRF1 data.

      • Figure 4: First, instead of the M/P ratio, one should analyze M/(M+P), to normalize out differences in loading and effects from collisions, which are guaranteed to occur here but are not considered or analyzed. Second, the data are analyzed as if what matters are codons in the P and E site (and beyond, where codons are definitely NOT recognized). While there is evidence for some interactions, one would think that an additional analysis based on sequence would be helpful. Also, the supplemental data indicate that there are very rarely reciprocal changes (as should be the case, and as seen for stop codons).

      • Regarding the HiBit reporter assay: The two sequences clearly have effects on translation without considering stop codon context (Figure 4C), which need to be taken into account. Also, the effect from the sequences varies between the assays in 4C and 4D (2-fold vs. 0.5-fold), further questioning the assay. Moreover, the authors claim that re-initiation cannot account for HiBit levels, but that is clearly incorrect. The western in Figure 4E does not reproduce the data in 4D: while HiBit goes up (as in 4D), the putative GFP fusion goes down. Finally, why the second reading frame should be more efficient is not explained, which further argues for an artifact. Previous work (and work herein) suggests that read-through occurs equally in each reading frame. No controls for these assays are presented, e.g., stimulation by antibiotics, ABCE1 depletion, etc.

      • Figure 5 has similar problems. I don't understand how the Figure in 5A is made, but when you overlay the cited structures on Rps26, the molecules are identical. I guess the authors used some fantasy to build non-existing sequences differently into the structure. There is no basis for that. In panel C and the same in Figure 7, the number of analyzed mRNAs varies. This could influence the outcome and the EXACT same set of mRNAs should be analyzed. But the main problem here is that the authors need to analyze readthrough and not peak height as detailed above. Essential controls are missing that show what fraction of the 18S rRNA is mutated. Previous work has shown that 2 nt truncated 18S rRNA is actively degraded. It is hard to believe how 15% of altered ribosomes can abolish 100% of the effect from the C-rich sequences. Important validation is missing: the authors should analyze rRNA sequences in their ribo-seq dataset to demonstrate that they have the mutated rRNAs, and that these enrich and de-enrich as predicted.

      • In Figure 5-7 the authors develop a model that the sequence selectivity arises from base pairing between 18S rRNA and the mRNA. If so, then they should really stratify the data by number of WC pairs that can be formed. And only WC pairs, as GU pairs have a totally different geometry that will likely be discriminated against in this context. Also, the mutation is in a part of the helix that has no effect (Figure S3G). Thus, the data within the manuscript are inconsistent.

      • Figure 6 does not agree with published data (Li et al., Nature 2022). Previous work did not show testis-depletion of Rps26 in purified ribosomes. This is the critical difference, as the authors here did not purify ribosomes. Also, another Rps is an essential control, even if purified ribosomes are used. The validity of this dataset is thus questionable. Depletion from polysomes is hard to believe, as overall there is less signal in the polysomes.

      • Figure 7 has similar problems as figure 5. Different pools of mRNAs are analyzed; peak height is not validated. Overexpression of Rps26 is not shown, as only Myc is shown, not Rps26. Beyond that, increased occupancy in ribosomes needs to be shown for the effect to come from ribosomes. Given how sick the cells are it is most likely that all effects are secondary and arise from whatever else is going on in the overexpression or depletion of Rps26. No controls are presented to show specific effects from Rps26.

      • The authors need to check Rli1/ABCE levels in their cells. Their data have features that are indicative of low ABCE1 levels. These include a very small effect from ABCE1 depletion. These could be responsible for some of the effects they observe.

    3. Author response:

      We thank the editor and reviewers for their thoughtful feedback. We agree with eLife’s overall assessment that, while profiling terminating ribosomes is informative in revealing termination dynamics, the underlying mechanisms require more evidence. Our revision will focus on three conceptual points.

      (1) We will tone down the statement that putative mRNA:rRNA interaction contributes to sequence-specific termination pausing.

      (2) We will clarify the potential role of Rps26 in regulating translation termination.

      (3) We will expand the discussion of tissue-specific termination pausing.

      Reviewer #1 (Public Review):

      (1) We admit that the modest effects of ABCE1 were partly due to the incomplete ABCE1 knockdown in HEK293 cells. Since the elevated ribosome density occurred at all stop codons, we argue that the action of ABCE1 is likely independent of the sequence context. We will rephrase relevant statements in the revised manuscript.

      (2) In terms of Rps26 structures, we agree the structural rearrangement in the absence of Rps26 is highly speculative. However, we do not believe the Rps26 stoichiometry is solely dependent on stress. We will clarify this important point in the revised manuscript.

      (3) We apologize for the confusion about 18S rRNA “scanning” and will revise the sentence in the main text.

      (4) We agree that the functional significance of testis-specific termination dynamics is unclear. Since other reviewers raised similar concerns, we will expand the discussion of tissue-specific termination pausing in the revised manuscript.

      Reviewer #2 (Public Review):

      We appreciate the Reviewer’s time and efforts in reviewing our manuscript. We are grateful for the insightful comments and many recommendations made by the reviewer to improve our manuscript. We feel that the reviewer may have some misunderstanding in terms of the sequence motif associated with the termination pausing, partly because of the lack of clarity in our original description of the results from MPRA and reporter assays. We will ensure that the reviewer’s points are fully addressed in the revised manuscript.

      Reviewer #3 (Public Review):

      We thank the reviewer for the positive comments on our manuscript. We agree that the tissue-specific termination differences were poorly described in the main text. Notably, other reviewers raised similar concerns. We will expand the relevant discussion in the revised manuscript, outlining this as a limitation and a future direction.

      Reviewer #4 (Public Review):

      We believe the reviewer mixed the public review with recommendation comments. The reviewer appears to be preoccupied with previous studies and questioned some inconsistencies in our results. With the development of new technology such as eRF1-seq, we are encouraged to present “new” and “different” findings. All other reviewers appreciate the development of eRF1-seq to profile terminating ribosomes. In fact, we do not believe our data are fundamentally different from the established principles. Rather, our data provide new perspectives to further our understanding of ribosome dynamics at stop codons. We thank the reviewer for understanding.

      The reviewer is quite confused by our sequencing analysis based on peak height, or read density, which is commonly used to infer ribosome dynamics such as pausing. Regarding the sequencing analysis and reporter assays in cells expressing 18S mutant (Figure 5) and Rps26 (Figure 7), we feel that the reviewer has some misunderstanding. In the revised manuscript, we will do our best to clarify those relevant issues. Finally, the reviewer’s comment on base pairing is well-received and we will thoroughly revise the main text and discussion in the revised manuscript.

    1. Reviewer #1 (Public review):

      Microglia are mononuclear phagocytes in the CNS and play essential roles in physiology and pathology. In some conditions, circulating monocytes may infiltrate the CNS and differentiate into microglia or microglia-like cells. However, the specific mechanism is largely unknown. In this study, the authors explored the epigenetic regulation of this process. The quality of this study would be significantly improved if a few questions were addressed.

      (1) The capacity of circulating myeloid cells to give rise to microglia is controversial. In this study, the authors used CX3CR1-GFP/CCR2-DsRed (hetero) mice as a lineage-tracing line. However, this line is not appropriate for that purpose. For example, undifferentiated CX3CR1-GFP/CCR2-DsRed donor cells are GFP+ and DsRed+; once their fate has changed to microglia, they become GFP+ and DsRed-. However, this process depends on busulfan conditioning and on bone marrow cells artificially introduced into the circulation, which does not occur under physiological or pathological conditions. These manipulations can introduce artifacts and confound the conclusion, as happened with the classical but incorrect textbook theory of bone marrow-derived microglia, which was subsequently corrected by the Fabio Rossi lab (1, 2). This is the greatest risk in drawing this conclusion. The strongest evidence comes from the parabiosis model. Therefore, before drawing this conclusion, the authors should perform a parabiosis study, joining a CX3CR1-GFP (hetero) mouse with a WT mouse without busulfan conditioning and examining whether GFP+ microglia appear in the brain of the GFP- WT mouse. If there are no GFP+ microglia, the authors should clarify that this is not a physiological or pathological condition but a defined artificial host condition, as a previous study did (3).

      (2) In some conditions, peripheral myeloid cells can infiltrate the brain and replace microglia (4, 5). Discussing this would help readers better understand the mechanism of microglia replacement.

      References:

      (1) Ajami, B., Bennett, J.L., Krieger, C., Tetzlaff, W., and Rossi, F.M. (2007). Local self-renewal can sustain CNS microglia maintenance and function throughout adult life. Nature neuroscience 10, 1538-1543. 10.1038/nn2014.

      (2) Ajami, B., Bennett, J.L., Krieger, C., McNagny, K.M., and Rossi, F.M.V. (2011). Infiltrating monocytes trigger EAE progression, but do not contribute to the resident microglia pool. Nature neuroscience 14, 1142-1149. http://www.nature.com/neuro/journal/v14/n9/abs/nn.2887.html#supplementary-information.

      (3) Mildner, A., Schmidt, H., Nitsche, M., Merkler, D., Hanisch, U.K., Mack, M., Heikenwalder, M., Bruck, W., Priller, J., and Prinz, M. (2007). Microglia in the adult brain arise from Ly-6ChiCCR2+ monocytes only under defined host conditions. Nature neuroscience 10, 1544-1553. 10.1038/nn2015.

      (4) Wu, J., Wang, Y., Li, X., Ouyang, P., Cai, Y., He, Y., Zhang, M., Luan, X., Jin, Y., Wang, J., et al. (2025). Microglia replacement halts the progression of microgliopathy in mice and humans. Science 389, eadr1015. 10.1126/science.adr1015.

      (5) Xu, Z., Rao, Y., Huang, Y., Zhou, T., Feng, R., Xiong, S., Yuan, T.F., Qin, S., Lu, Y., Zhou, X., et al. (2020). Efficient strategies for microglia replacement in the central nervous system. Cell reports 32, 108041. 10.1016/j.celrep.2020.108041.

    1. Listening in Human Development: An Analysis of Professor Elinor Ochs's Perspective

      Analytical Summary

      This briefing document analyzes Professor Elinor Ochs's main arguments concerning the underestimated role of listening in child development.

      The central thesis is that mainstream developmental studies, conducted mainly in post-industrial Western societies, have focused excessively on the child's speech production in dyadic (parent-child) contexts while neglecting the crucial skill of listening, in particular overhearing within multiparty interactions.

      Drawing on decades of ethnographic research, notably her foundational work in Samoa, Ochs shows that in many societies children are socialized from an early age to become competent listeners within group conversations.

      This "training" in listening is facilitated by specific cultural "affordances", such as the open architecture of dwellings, body postures that orient the child toward public space, and a domestic economy that values generational continuity and shared resources.

      In contrast, the Western model, with its private spaces and its emphasis on economic individualism, favors child-centered dyadic interactions, amplifying the child's role as speaker rather than listener.

      In conclusion, Professor Ochs argues that multiparty interactions offer unique developmental advantages, exposing children to a greater diversity of speakers, perspectives, and linguistic varieties.

      Her research challenges the universality of current models of language acquisition and calls for a reassessment of listening as a socioculturally constructed skill that is essential to learning, cooperation, and social integration.

      Introduction: The Perspective of a Linguistic Anthropologist

      Professor Elinor Ochs of UCLA is a linguistic anthropologist who combines the disciplines of linguistics and anthropology.

      Her main methodology is ethnographic fieldwork, using audio and video recordings to document in detail how communication shapes social situations, relationships, and ways of thinking.

      Area of specialization: She co-created the subfield of "language socialization", which posits that in learning a language, children simultaneously acquire the sociocultural competence to become a "person" within their community.

      Research experience:

      Samoa (1978-1988): Longitudinal study of language acquisition among young children in a rural village.

      United States (1980s and 2000s): Research on social class differences in problem-solving discourse, and a large-scale interdisciplinary study documenting the lives of 32 middle-class families.

      Autism (since 1997): Study of the communicative practices of children on the autism spectrum at home and at school.

      The Dominant Paradigm in Developmental Studies: The Primacy of Speaking over Listening

      Professor Ochs begins with an observation: although speaking and listening are two universal communicative practices, speaking remains by far the main object of interest in every field that studies language. The emphasis is on language production, not on the process that distinguishes hearing from listening.

      The Limits of Quantitative Studies

      Quantitative studies of child language development focus on the language the child produces, often reduced to a word count.

      A major public concern, notably about socioeconomic differences (the "word gap"), grew out of these studies.

      The Dyadic Model: The dominant generalization is that "the more words a child hears addressed directly to them, the larger their vocabulary will be".

      Assumed Ideal Conditions: This model rests on very specific conditions:

      1. The child is the primary addressee in a dyadic conversation (one speaker, one listener).

      2. The interaction is face to face.

      3. The language used is simplified and affective (child-directed speech, or "baby talk").

      The Dismissal of Overhearing: In this framework, listening to other people's conversations ("overhearing") is considered to have "little or no developmental benefit".

      Cultural Bias: These studies are situated mainly in post-industrial Western societies, with very little research conducted in societies with different sociopolitical economies.

      An Alternative Model: Learning by Listening in Multiparty Contexts

      Professor Ochs's central thesis, supported by ethnographic research, is that another model of learning exists and is common in many societies.

      Key Arguments

      Argument 1: Developmental studies value frequent dyadic conversations in which the young child is the primary speaker or addressee, which motivates educational interventions around the world.

      Argument 2: Ethnographic studies show that in some societies, infants and toddlers regularly take part in multiparty conversations as "legitimate overhearers" or secondary participants.

      Argument 3: Whether immersed in multiparty or dyadic contexts, neurotypical children successfully acquire language across different sociocultural settings.

      Argument 4: Multiparty interactions have their own developmental affordances: they expose children to a diversity of speakers, perspectives, and linguistic varieties, and teach them to adapt their speech to different interlocutors ("recipient design").

      Argument 5: Listening skills are strengthened from infancy by outward-facing multiparty body alignments and by open built environments that give auditory and visual access to public spaces.

      Ethnographic Case Study: The Samoan Village

      Professor Ochs's fieldwork in Samoa, nearly 50 years ago, is the main source of data for her argument.

      Linguistic and Social Context

      Complex Language: Samoan is an ergative language with multiple word orders, two phonological registers, and a complex respect vocabulary.

      Hierarchical Society: Society is structured into titled people (high chiefs, orators) and untitled people.

      Absence of "Baby Talk": Caregivers generally do not use simplified language or "baby talk" with infants. They do not label objects and rarely ask questions to which they know the answer.

      Immersive Learning: Children acquire spoken Samoan by being in the midst of multiparty interactions.

      Environmental and Bodily Affordances for Listening

      Ochs identifies two main types of affordances that foster a culture of listening.

      1. Open Built Environments:

      ◦ Traditional Samoan houses have neither exterior nor interior walls. The space is open, with coconut-leaf mats for shade.

      ◦ Houses are grouped in open family compounds close to the main road, giving access to public conversations.

      ◦ Simultaneous interactions inside and outside the house are common, and residents are used to listening to several conversations at once.

      ◦ By contrast, European-style (colonial) houses, although prestigious, are walled, rectangular, and less well liked because they limit auditory access and are very hot.

      2. Outward-Facing Body Alignments:

      ◦ Infants: They are often "nestled" in the arms of a caregiver (an adult or older child) so that they face outward, toward public space and the community. They are carried on the back, on the hip, or seated in front of the caregiver, looking in the same direction as the other participants.

      ◦ Older children: They must sit cross-legged (without showing the soles of their feet) and, from the edge of the house, actively observe the people inside the house as well as those on the road. Their tasks (carrying messages, serving, etc.) keep them mobile and active in the community.

      ◦ The Samoan word for "respect" (fa'aaloalo) is composed of the prefix fa'a and alo, meaning "face", implying the idea of "turning toward the other".

      Socioeconomic Hypotheses and Open Questions

      Professor Ochs links these different modes of interaction to the economic structure of the family.

      The Family Continuity Model (e.g., Samoa):

      ◦ Children are raised to support the family's shared economic resources and to ensure the generational continuity of property.

      ◦ In this context, "the family has an investment in the child listening". Listening is an essential skill for learning the group's social and economic dynamics.

      ◦ This model favors the child's participation as a listener in multiparty conversations.

      The Individual Independence Model (e.g., neoliberal American families):

      ◦ Children are raised to become economically independent individuals, a cultural legacy in which inheritance rights were abolished well before the industrial revolution.

      ◦ The emphasis is on the child's rapid development as an individual, which favors intense, child-centered dyadic interactions.

      Central Questions for Future Research

      The presentation ends with a series of fundamental questions:

      1. Can habitats (open or walled) and bodily orientations influence the phenomenology of listening in early childhood?

      2. Do these sociocultural factors act as "cultural amplifiers"?

      Does a private, enclosed habitat amplify listening as a dyadic addressee, while an open habitat amplifies listening as a secondary participant?

      3. Do current developmental studies examine only a "fraction of the possibilities" in terms of environments and affordances for listening?

    1. Crisis, Inequality, and Precarity: Summary of the Analyses of Esther Duflo, Claire Hédon, and Frédéric Worms

      Summary

      This briefing document analyzes the remarks of Esther Duflo, Claire Hédon, and Frédéric Worms on the impact of the coronavirus crisis on inequality and precarity. The key conclusions are as follows:

      Worsening Inequality: The crisis has an immediate and harmful effect, exacerbating existing inequalities both within and between countries.

      The poorest and most vulnerable populations bear a disproportionate share of the health and economic shocks.

      In the United States, for example, a Black person is four times as likely to die of the coronavirus as a white person of the same age.

      Disparity in Economic Responses: Rich countries were able to mobilize 20% of their GDP to support their economies, compared with 6% for emerging countries and only 2% for poor countries, which points to entrenched poverty in the latter.

      Exposure of Systemic Flaws: The crisis brought deep structural problems to light:

      • an institutionalized distrust of the poor that makes social protection systems punitive,
      • a retreat of public services that complicates access to rights (notably because procedures have moved online), and
      • an inability of the international community to organize effective solidarity.

      Opportunities for Change: Despite its negative effects, the crisis offers opportunities.

      It showed that government is an essential solution for managing crises, not the problem.

      The massive experience of short-time work (chômage partiel) could also change perceptions of redistribution, by showing that anyone may need help, and could potentially pave the way for systems that are more respectful of dignity.

      Structural Approach: Addressing inequality is not merely a consequence to be managed but a precondition for effectively managing future crises, whether health, climate, or democratic.

      Trust in a fair redistribution system is indispensable for securing collective buy-in to the necessary efforts.

      Access to Rights: The crisis worsened the phenomenon of "non-take-up" of rights, in which the most precarious people, facing the closure of in-person services and the digital barrier, fail to obtain the assistance to which they are entitled.

      --------------------------------------------------------------------------------

      1. The Immediate and Disproportionate Impact of the Crisis

      Far from being a "great equalizer," the coronavirus crisis struck asymmetrically, worsening existing vulnerabilities.

      1.1. Inequalities within Rich Countries

      Health: Esther Duflo points out that the poorest populations and minorities were hit hardest.

      In the United States, after adjusting for age, a Black person is four times as likely to die of the coronavirus as a white person.

      An INSEE study in France, cited by Claire Hédon, likewise shows a correlation between a municipality's standard of living and mortality.

      Economy:

      ◦ The recovery is uneven. In the United States, the richest quarter of the population has returned to its pre-crisis employment and wage levels, while the poorest, particularly in the service sector, are settling into a lasting crisis.

      ◦ Solidarity schemes, such as short-time work in Europe, were based mainly on already having a job, leaving out people who were already in severe hardship.

      ◦ Claire Hédon reports that people on minimum social benefits saw their situation deteriorate (more expensive groceries in local shops, children no longer receiving €1 school canteen meals) without receiving significant additional aid.

      1.2. Inequalities between Countries

      Esther Duflo highlights a huge gap in countries' capacity to respond economically to the crisis.

      Country category       Fiscal support spending (% of GDP)
      Rich countries         20%
      Emerging countries     6%
      Poor countries         2% (of an already much smaller GDP)

      This disparity has major consequences:

      • Rich countries were able to borrow massively to protect their populations, an option unavailable to poor countries.

      • While a rapid economic recovery is expected in rich countries thanks to vaccination, poor countries risk a "bogging down of the crisis" and poverty becoming locked in on itself.

      2. Systemic Failures Revealed and Exacerbated

      The crisis exposed deep structural dysfunctions in our societies and institutions.

      2.1. Distrust of the Poor and the Punitive Straitjacket of Redistribution

      Esther Duflo argues that our social protection systems are qualitatively weak and "punitive at their core" because of a deep distrust of the poor, who are perceived as "lazy."

      This view, described as "Victorian," erects barriers to keep beneficiaries from "wallowing in complacency."

      Claire Hédon confirms this assessment with concrete examples:

      Constant suspicion of fraud: She cites the case of a man who took 15 months to obtain the RSA, and those of people accused of fraud for selling their clothes or their car to survive.

      A guilt-inducing gaze: "I have the feeling that a very guilt-inducing gaze is rooted in society, one that also asks: what did you get wrong in your life to end up in this situation?"

      She maintains that it is society that has failed these people, not the other way around.

      2.2. The Retreat of Public Services and the Non-Take-Up of Rights

      Claire Hédon, as Defender of Rights (Défenseure des droits), warns of a "retreat of the State's presence" that the crisis made worse.

      Digitalization as a barrier: The closure of in-person services (CAF offices, post offices) made access to rights almost impossible for people without an internet connection, suitable equipment, or digital skills.

      For the most vulnerable, digitalization results in "no access to rights."

      Non-take-up: Many eligible people are unable to claim their rights. By making procedures more complex, anti-fraud efforts in effect generate non-take-up.

      Quality of reception: Even in-person access is fraught with obstacles, as illustrated by a man who had to travel 30 km to reach the CAF, was refused entry because he had not booked an appointment online, and was then judged "not motivated" by the reception staff.

      2.3. The Failure of International Solidarity

      Esther Duflo laments that rich countries, which spent "trillions of dollars" on their own economies, were "conspicuously absent" when it came to helping poor countries.

      The call for a "Marshall Plan for poor countries" that she issued at the start of the crisis went unheeded.

      This inability to act collectively in a crisis is a worrying sign for the challenges ahead, particularly climate change.

      3. Crises as Catalysts for Potential Change

      Despite the bleak assessment, the speakers identify glimmers of hope and opportunities to rethink certain paradigms.

      3.1. The Essential Role of the State

      For Esther Duflo, the crisis taught a major lesson: "government is not the problem, government is the solution."

      Only the State has the capacity to:

      • Impose public health measures (mask wearing).

      • Invest massively in vaccine research and procurement.

      • Borrow on behalf of the population to protect it from economic shocks.

      This realization could lead to a "renewed appreciation for the importance of the role of government."

      3.2. Toward a New Perception of Redistribution

      The large-scale, flexible use of short-time work in Europe showed that "everyone may need help."

      People who were "entirely virtuous" found themselves dependent on public support.

      Hope for a change in attitudes: Esther Duflo hopes this experience can "free us a little from this Victorian straitjacket" and allow for redistribution that is "more fluid, more respectful, putting the dignity of individuals at its heart."

      Debate on income for young people: Claire Hédon notes that the crisis made the debate on a basic income for 18-25 year-olds (via the RSA or the extension of the Garantie Jeune to all) less taboo.

      4. A Structural Approach: Addressing Inequalities to Prevent Crises

      Frédéric Worms offers a three-level analysis of the response to the crisis and argues for a long-term structural vision.

      4.1. Three Types of Response to the Crisis

      1. The "hypocritical" response: Holds that, since health measures worsen inequalities, the health crisis should not have been responded to (or not as much).

      Frédéric Worms and Esther Duflo refute this argument, stressing that there is no trade-off between health and the economy: the countries that handled the health crisis poorly also have the worst economic outcomes.

      2. The "honest" response (social democracy): Consists of responding to both dangers at once, combining health, economic, and social imperatives.

      3. The "structural" response (the strongest): Holds that addressing inequalities is the very condition for responding to the health dangers of the 21st century. Inequalities are not a side effect but a root cause of crises.

      4.2. Trust as a Prerequisite for Collective Action

      This structural approach is essential because, as Esther Duflo stresses, a crisis that demands sacrifices (COVID, climate) cannot be managed without citizens' trust.

      Trust and redistribution: People will accept difficult measures (e.g., a carbon tax) only if they trust that they will be fairly compensated.

      That trust is impossible without a redistribution system perceived as "effective, generous, and respectful of people."

      The vicious circle of distrust: Frédéric Worms points to a "mutual distrust":

      that of citizens toward the government, but also that of the government toward citizens (suspicion of fraud).

      Breaking this circle requires relying on knowledge, science, and solid "institutions of disagreement."

      5. Courses of Action and Solutions

      The discussion also covered concrete solutions for fighting poverty and inequality.

      Guaranteed Minimum Income vs. Universal Income:

      For poor countries, Esther Duflo recommends a very low universal income, available on simple request.

      The main issue there is the loss of dignity, and even a modest income can be enough to "put food on the table for your children three times a day."

      For rich countries, she favors a guaranteed minimum income (on the RSA model), which concentrates resources on those who need them most, since the information needed to target them exists.

      She stresses that dignity there is also tied to work, which requires more than money (housing, childcare, etc.).

      It must be a right, not charity.

      The Right to Work: Claire Hédon and Esther Duflo agree on the importance of the right to work.

      People in precarious situations want to work, because it is a "way of being integrated into society."

      The Experimental Approach: Esther Duflo argues for importing an attitude learned from her work in poor countries:

      the humility to acknowledge that we do not always know what works, and the need to test public policies rigorously before scaling them up.

      Studies have shown, for example, that financial security encourages initiative rather than limiting it.

      Right to digital access: Given widespread digitalization, Claire Hédon believes it is now time to consider a "right to digital access."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Bisht et al address the hypothesis that protein folding chaperones may be implicated in aggregopathies and in particular Tau aggregation, as a means to identify novel therapeutic routes for these largely neurodegenerative conditions.

      The authors conducted a genetic screen in the Drosophila eye, which facilitates the identification of mutations that either enhance or suppress a visible disturbance in the nearly crystalline organization of the compound eye. They screened by RNA interference all 64 known Drosophila chaperones and revealed that mutations in 20 of them exaggerate the Tau-dependent phenotype, while 15 ameliorated it. The enhancer of the degeneration group included 2 subunits of the typically heterohexameric prefoldin complex and other co-translational chaperones.

      The authors characterized in depth one of the prefoldin subunits, Pfdn5, and convincingly demonstrated that this protein functions in the regulation of microtubule organization, likely due to its regulation of proper folding of tubulin monomers. They demonstrate convincingly using both immunohistochemistry in larval motor neurons and microtubule binding assays that Pfdn5 is a bona fide microtubule-associated protein contributing to the stability of the axonal microtubule cytoskeleton, which is significantly disrupted in the mutants.

      Similar phenotypes were observed in larvae expressing Frontotemporal dementia with Parkinsonism on chromosome 17-associated mutations of the human Tau gene V377M and R406W. On the strength of the phenotypic evidence and the enhancement of the TauV377M-induced eye degeneration, they demonstrate that loss of Pfdn5 exaggerates the synaptic deficits upon expression of the Tau mutants. Conversely, the overexpression of Pfdn5 or Pfdn6 ameliorates the synaptic phenotypes in the larvae, the vacuolization phenotypes in the adult, and even memory defects upon TauV377M expression.

      Strengths

      The phenotypic analyses of the mutant and its interactions with TauV377M at the cell biological, histological, and behavioral levels are precise, extensive, and convincing and achieve the aims of characterization of a novel function of Pfdn5. 

      Regarding this memory defect upon V377M tau expression. Kosmidis et al (2010), PMID: 20071510, demonstrated that pan-neuronal expression of Tau<sup>V377M</sup> disrupts the organization of the mushroom bodies, the seat of long-term memory in odor/shock and odor/reward conditioning. If the novel memory assay the authors use depends on the adult brain structures, then the memory deficit can be explained in this manner. 

      (1) If the mushroom bodies are defective upon Tau<sup>V377M</sup> expression, does overexpression of Pfdn5 or 6 reverse this deficit? This would argue strongly in favor of the microtubule stabilization explanation.

      We thank the reviewer for this insightful comment. Consistent with Kosmidis et al. (2010), we confirm that expression of hTau<sup>V377M</sup> disrupts the architecture of mushroom bodies. In addition, we find, as suggested by the reviewer, that coexpression of either Pfdn5 or Pfdn6 with hTau<sup>V377M</sup> significantly restores the organization of the mushroom bodies. These new findings strongly support the hypothesis that Pfdn5 or Pfdn6 mitigates hTau<sup>V377M</sup>-induced memory deficits by preserving the structure of the mushroom body, likely through stabilizing the microtubule network. These data have now been included in the revised manuscript (Figure 7H-O).

      (2) The discovery that Pfdn5 (and 6 most likely) affects tauV377M toxicity is indeed a novel and important discovery for the Tauopathies field. It is important to determine whether this interaction affects only the FTDP-17-linked mutations or also WT Tau isoforms, which are linked to the rest of the Tauopathies. Also, insights into the mode(s) by which Pfdn5/6 affect Tau toxicity, which some of the suggestions above aim at, will likely be helpful for therapeutic interventions.

      We agree that determining whether prefoldin modulates the toxicity of both mutant and wild-type Tau is critical for understanding its broader relevance to Tauopathies. We have now performed the additional experiments required to address this issue. These new data show that loss of Pfdn5 also exacerbates toxicity associated with wild-type Tau (hTau<sup>WT</sup>), in a manner similar to that observed with hTau<sup>V337M</sup> or hTau<sup>R406W</sup>. Specifically, overexpression of hTau<sup>WT</sup> in a Pfdn5 mutant background leads to Tau aggregate formation (Figure S7G-I), and coexpression of Pfdn5 with hTau<sup>WT</sup> reduces the associated synaptic defects (Figure S11F-L). These findings underscore a general role for Pfdn5 in modulating diverse Tauopathy-associated phenotypes and suggest that it could be a broadly relevant therapeutic target.

      Weakness

      (3) What is unclear, however, is how Pfdn5 loss or even overexpression affects the pathological Tau phenotypes. Does Pfdn5 (or 6) interact directly with TauV377M? Colocalization within tissues is a start, but immunoprecipitations would provide additional independent evidence that this is so.

      We appreciate this important suggestion. To investigate a potential direct interaction between Pfdn5 and Tau<sup>V377M</sup>, we performed co-immunoprecipitation experiments using lysates from adult fly brain expressing hTau<sup>V337M</sup>. Under the conditions tested, we did not detect a direct physical interaction. While this does not support a direct interaction, it does not strongly refute it either. We note that Pfdn5 and Tau are colocalized within axons (Figure S13J-K). At this stage, we are unable to resolve the issue of direct vs indirect association. If indirect, then Tau and Pfdn5 act within the same subcellular compartments (axon); if direct, then either only a small fraction of the total cellular proteins is in the Tau-Pfdn5 complex and therefore difficult to detect in bulk protein westerns, or the interactions are dynamic or occur in conditions that we have not been able to mimic in vitro. 

      (4) Does Pfdn5 loss exacerbate Tau<sup>V377M</sup> phenotypes because it destabilizes microtubules, which are already at least partially destabilized by Tau expression? Rescue of the phenotypes by overexpression of Pfdn5 agrees with this notion. 

      However, Cowan et al (2010) pmid: 20617325 demonstrated that wildtype Tau accumulation in larval motor neurons indeed destabilizes microtubules in a Tau phosphorylation-dependent manner. So, is Tau<sup>V377M</sup> hyperphosphorylated in the larvae?? What happens to Tau<sup>V377M</sup> phosphorylation when Pfdn5 is missing and presumably more Tau is soluble and subject to hyperphosphorylation as predicted by the above?

      We completely agree that it is important to link Tau-induced phenotypes with microtubule destabilization and the phosphorylation state of Tau. We performed immunostaining with a Futsch antibody to examine microtubule organization at the NMJ and observed a severe reduction in Futsch intensity when Tau<sup>V337M</sup> was expressed in the Pfdn5 mutant (Elav-Gal4>Tau<sup>V337M</sup>; ΔPfdn5<sup>15/40</sup>), suggesting that the absence of Pfdn5 exacerbates the hTau<sup>V337M</sup> defects through further microtubule destabilization (Figure S6F-J).

      We have performed additional experiments to examine the phosphorylation state of hTau in Drosophila larval axons. Immunocytochemistry indicated that only a subset of hTau aggregates in Pfdn5 mutants (Elav-Gal4>Tau<sup>V337M</sup>; ΔPfdn5<sup>15/40</sup>) are recognized by phospho-hTau antibodies. For instance, the AT8 antibody (targeting pSer202/pThr205) (Goedert et al., 1995) labelled only a subset of the aggregates identified by the total hTau antibody (D5D8N) (Figure S9A-E). Moreover, these larvae (Elav-Gal4>Tau<sup>V337M</sup>; ΔPfdn5<sup>15/40</sup>) still showed robust Tau aggregation when fed LiCl, which inhibits GSK3β (Figure S9F-J).

      These results imply that: a) soluble phospho-hTau levels in Pfdn5 mutants are low and not reliably detected with a single phosphorylation-specific antibody; b) loss of Pfdn5 results in Tau aggregation in a hyperphosphorylation-independent manner, similar to what has been reported earlier (LI et al. 2022); and c) the destabilization of microtubules in Elav-Gal4>Tau<sup>V337M</sup>; ΔPfdn5<sup>15/40</sup> results in Tau dissociation and aggregate formation. These data and conclusions have been incorporated into the revised manuscript.

      (5) Expression of WT human Tau (which is associated with most common Tauopathies other than FTDP-17), as Cowan et al suggest, has significant effects on microtubule stability, but such Tau-expressing larvae are largely viable. Will one mutant copy of the Pfdn5 knockout enhance the phenotype of these larvae?? Will it result in lethality? Such data will serve to generalize the effects of Pfdn5 beyond the two FTDP-17 mutations utilized.

      We have now examined whether heterozygous loss of Pfdn5 (∆Pfdn5/+) enhances the effect of Tau expression. Each genotype (hTau<sup>V337M</sup>, hTau<sup>WT</sup> or ∆Pfdn5/+) alone is viable, and Elav-Gal4-driven expression of hTau<sup>V337M</sup> or hTau<sup>WT</sup> in a Pfdn5 heterozygous background also does not cause lethality.

      (6) Does the loss of Pfdn5 affect TauV377M (and WT Tau) levels?? Could the loss of Pfdn5 simply result in increased Tau levels? And conversely, does overexpression of Pfdn5 or 6 reduce Tau levels?? This would explain the enhancement and suppression of Tau<sup>V377M</sup> (and possibly WT Tau) phenotypes. It is an easily addressed, trivial explanation at the observational level, which, if true, begs for a distinct mechanistic approach.

      To test whether Pfdn5 modulates Tau phenotypes by altering Tau protein levels, we performed western blot analysis under Pfdn5 or Pfdn6 overexpression conditions and observed no change in hTau<sup>V337M</sup> levels (Figure 6O). However, in the absence of Pfdn5, both hTau<sup>V337M</sup> and hTau<sup>WT</sup> form large, insoluble aggregates that are not detected in soluble lysates by standard western blotting but are visualized by immunocytochemistry (Figure S7G-I). Thus, the apparent reduction in Tau levels on western blots reflects a solubility shift, not an actual decrease in Tau expression. These findings argue against a simple model in which Pfdn5 regulates Tau abundance and instead support a mechanism in which Pfdn5 loss leads to a change in Tau conformation and thus to its sequestration away from already destabilized microtubules.

      (7) Finally, the authors argue that Tau<sup>V377M</sup> forms aggregates in the larval brain based on large puncta observed especially upon loss of Pfdn5. This may be so, but protocols are available to validate molecularly the presence of insoluble Tau aggregates (for example, pmid: 36868851) or soluble Tau oligomers, as these apparently differentially affect Tau toxicity. Does Pfdn5 loss exaggerate the toxic oligomers, and overexpression promote the more benign large aggregates??

      We have performed additional experiments to analyze the nature of these aggregates using 1,6-hexanediol (1,6-HD). 1,6-HD can dissolve the Tau aggregate seeds formed by Tau droplets, but cannot dissolve stable Tau aggregates (WEGMANN et al. 2018). We observed that 5% 1,6-hexanediol failed to dissolve these Tau aggregates (Figure S8), demonstrating the formation of stable filamentous flame-shaped NFT-like aggregates in the absence of Pfdn5 (Figure 5D and Figure S9).

      Reviewer #2 (Public review):

      Bisht et al detail a novel interaction between the chaperone, Prefoldin 5, microtubules, and tau-mediated neurodegeneration, with potential relevance for Alzheimer's disease and other tauopathies. Using Drosophila, the study shows that Pfdn5 is a microtubule-associated protein, which regulates tubulin monomer levels and can stabilize microtubule filaments in the axons of peripheral nerves. The work further suggests that Pfdn5/6 may antagonize Tau aggregation and neurotoxicity. While the overall findings may be of interest to those investigating the axonal and synaptic cytoskeleton, the detailed mechanisms for the observed phenotypes remain unresolved and the translational relevance for tauopathy pathogenesis is yet to be established. Further, a number of key controls and important experiments are missing that are needed to fully interpret the findings.

      The strength of this study is the data showing that Pfdn5 localizes to axonal microtubules and the loss-of-function phenotypic analysis revealing disrupted synaptic bouton morphology. The major weakness relates to the experiments and claims of interactions with Tau-mediated neurodegeneration. 

      In particular, it is unclear whether knockdown of Pfdn5 may cause eye phenotypes independent of Tau. 

      Our new experiments confirm that knockdown of Pfdn5 alone does not cause eye phenotypes.

      Further, the GMR>tau phenotype appears to have been incorrectly utilized to examine age-dependent neurodegeneration.

      In response, we have moderated and explained our conclusions in this regard, as described later in our "rebuttal."

      This manuscript argues that its findings may be relevant to thinking about mechanisms and therapies applicable to tauopathies; however, this is premature given that many questions remain about the interactions in Drosophila, the detailed mechanisms remain unresolved, and there is no evidence that Tau and Pfdn interact similarly in the mammalian neuronal context. Therefore, this work would be strongly enhanced by experiments in human or murine neuronal culture or supportive evidence from analyses of human data.

      The reviewer is correct that the impact would be greater if Pfdn5-Tau interactions were also examined in human tissue.   While we have not attempted these experiments ourselves, we hope that our observations will stimulate others to test the conservation of phenomena we describe. There are, however, several lines of circumstantial evidence from human Alzheimer’s disease datasets that implicate PFDN5 in disease pathology. For example, recent compilations and analyses of proteomic data show reductions of CCT components, TBCE, as well as Prefoldin subunits, including PFDN5, in AD tissue (HSIEH et al. 2019; TAO et al. 2020; JI et al. 2022; ASKENAZI et al. 2023; LEITNER et al. 2024; SUN et al. 2024). Furthermore, whole blood mRNA expression data from Alzheimer's patients revealed downregulation of PFDN5 transcript (JI et al. 2022). Together, these findings from human data are consistent with the roles of PFDN5 in suppressing diverse neurodegenerative processes. We have incorporated these points into the discussion section of the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      See public review for experimental recommendations focusing on the Tau Pfdn interactions.  I would refrain from using the word aggregates, I would call them puncta, unless there is molecular or visual (ie AFM) evidence that they are indeed insoluble aggregates.  Finally, although including the full genotypes written out below the axis in the bar graphs is appreciated, it nevertheless makes them difficult to read due to crowding in most cases and somewhat distracting from the figure. 

      In my opinion, a more reader-friendly manner of reporting the phenotypes will be highly helpful. For example, listing each component of the genotype on the left of each bar graph and adding a cross or a filled circle under the bar to inform of the full genotype of the animals used.

      As described in the response to the previous comment, we now have strong direct evidence to support our view that the observed puncta are stable Tau aggregates. Thus, we feel justified in using the term Tau aggregates in preference to Tau puncta.

      We have tried to write the genotypes to make them more reader-friendly.

      Reviewer #2 (Recommendations for the authors):

      (1) Lines 119-121: 35 modifiers from 64 seem like an unusually high hit rate. Are these individual genes or lines? Were all modifiers supported by at least 2 independent RNAi strains targeting non-overlapping sequences? A supplemental table should be included detailing all genes and specific strains tested, with corresponding results.

      We agree with the reviewer that 35 modifiers from 64 genes is a high hit rate. However, since the genes knocked down in this study are chaperones, which are crucial for maintaining proteostasis, an unusually high hit rate is plausible. The information related to individual genes and lines is provided in Supplemental Table 1. We have now included an additional Supplemental Table 3, which lists the genes and the RNAi lines used in Figure 1, detailing the sequence target information. The table also specifies the number of independent RNAi strains used and the corresponding results.

      (2) Figure 1: The authors quantify the areas of ommatidial fusion and necrosis as degeneration, but it is difficult to appreciate the aberrations in the photos provided. Was any consideration given to also quantifying eye size?

      We have processed the images to enhance their contrast and make the aberrations clearer. The percentage of degenerated eye area (Figure 1M) was normalized with total eye area. The method for quantifying degenerated area has been explained in the materials and methods section.

      (3) Figure 1: a) Only enhancers of rough eyes are shown but no controls are included to evaluate whether knockdown of these genes causes eye toxicity in the absence of Tau. These are important missing controls. All putative Tau enhancers, including Pfdn5/6, need to be tested with GMR-GAL4 independently of Tau to determine whether they cause a rough eye. In a previous publication from some of the same investigators (Raut et al 2017), knockdown of Pfdn using ey-GAL4 was shown to induce severe eye morphology defects - this raises questions about the results shown here.

      We agree that assessing the effects of HSP knockdown independent of Tau is essential to confirm modifier specificity. We have now performed these knockdowns, and the data are reported in Supplemental Table 1. For RNAi lines represented in Figure 1, which enhanced Tau-induced degeneration/eye developmental defect, except for one of the RNAi lines against Pfdn6 (GD34204), no detectable eye defects were observed when knocked down with GMR-Gal4 at 25°C, suggesting that enhancement is specific to the Tau background. 

      The use here of a more eye-specific GMR-Gal4 driver at 25°C, versus the more broadly expressed ey-Gal4 at 29°C in prior work (Raut et al. 2017), likely explains the differences in eye morphological defects.

      (b) Besides RNAi, do the classical Pfdn5 deletion alleles included in this work also enhance the tau rough eye when heterozygous? Please also consider moving the Pfdn5/6 overexpression studies to evaluate possible suppression of the Tau rough eye to Figure 1, as it would enhance the interpretation of these data (but see also below).

      GMR-Gal4-driven expression of hTau<sup>V337M</sup> or hTau<sup>WT</sup> in a Pfdn5 heterozygous background does not enhance the rough eye phenotype.

      (4) For genes of special interest, such as Pfdn5, and other genes mentioned in the results, the main figure, or discussion, it is also important to perform quantitative PCR to confirm that the RNAi lines used actually knock down mRNA expression and by how much. These studies will establish specificity.

      We agree that confirming RNAi efficiency via quantitative PCR (qPCR) is essential for validating the knockdown efficiency. We have now included qPCR data, especially for key modifiers, confirming effective knockdown (Figure S2).

      (5) Lines 235-238: how do you conclude whether the tau phenotype is "enhanced" when Pfdn5 causes a similar phenotype on its own? Could the combination simply be additive? Did overexpression of Pfdn5 suppress the UAS-hTau NMJ bouton phenotype (see below)?

      Although Pfdn5 mutants and hTau expression individually increase satellite boutons, their combination leads to significantly more severe and additional phenotypes, such as significantly decreased bouton size and increased bouton number, indicating an enhancing rather than purely additive interaction (Figure 4 and Figure S6C). Moreover, we now show that overexpression of Pfdn5 significantly suppressed the hTau<sup>V337M</sup>-induced NMJ phenotypes. These new data have been incorporated as Figure S11F-L in the revised manuscript.

      Alternatively, did the authors consider reducing fly tau in the Pfdn5 mutant background?

      In new additional experiments, we observe that double mutants for Drosophila Tau (dTau) and Pfdn5 also exhibit severe NMJ defects, suggesting genetic interactions between dTau and Pfdn5. These data are shown below for the reviewer.

      Author response image 1.

      A double mutant combination of dTau and Pfdn5 aggravates the synaptic defects at the Drosophila NMJ. (A-D') Confocal images of NMJ synapses at muscle 4 of A2 hemisegment showing synaptic morphology in (A-A') control, (B-B') ΔPfdn5<SUP>15/40</SUP>, (C-C') dTauKO/dTauKO (Drosophila Tau mutant), (D-D') dTauKO/dTauKO; ∆Pfdn5<SUP>15/40</SUP> double immunolabeled for HRP (green), and CSP (magenta). The scale bar in D for (A-D') represents 10 µm. 

      (6) It may be important to further extend the investigation to the actin cytoskeleton. It is noted that Pfdn5 also stabilizes actin. Importantly, tau-mediated neurodegeneration in Drosophila also disrupts the actin cytoskeleton, and many other regulators of actin modify tau phenotypes.

      We appreciate the suggestion to examine the actin cytoskeleton. While prior studies indicate that Pfdn5 might regulate the actin cytoskeleton and that Tau<sup>V337M</sup> hyperstabilizes the actin cytoskeleton, we did not observe altered actin levels in Pfdn5 mutants (Figure 2G). However, actin dynamics may represent an additional mechanism through which Pfdn5 might temporally influence Tauopathy. Future work will address potential actin-related mechanisms in Tauopathy.

      (7) Figure 2: in the provided images, it is difficult to appreciate the futsch loops. Please include an image with increased magnification. It appears that fly strains harboring a genomic rescue BAC construct are available for Pfdn; this would be a complementary reagent to test besides Pfdn overexpression.

      We have updated Figure 2 to include high magnification NMJ images as insets, clearly showing the Futsch loops. While we have not yet tested a genomic rescue BAC construct for Pfdn5, we plan to use the fly line harboring this construct in future work.

      (8) Figure 3: Some of the data is not adequately explained. The use of Ran as a loading control seems rather unusual. What is the justification? Pfdn appears to only partially co-localize with α-tubulin in the axon; can the authors discuss or explain this? Further, in Pfdn5 mutants, there appears to be a loss of α-tubulin staining (3b'); this should also be discussed.

      We appreciate the reviewer's concern regarding the choice of loading control for our Western blot analysis. Importantly, since Tubulin levels and related pathways were the focus of our analysis, traditional loading controls such as α- or β-tubulin or actin were deemed unsuitable due to potential co-regulation. Ran, a nuclear GTPase involved in nucleocytoplasmic transport, is not known to be transcriptionally or post-translationally regulated by Tubulin-associated signaling pathways. To ensure its reliability as a loading control, we confirmed by densitometric analysis that Ran expression showed minimal variability across all samples. Hence, we used Ran for accurate normalization in the Western blot data represented in this manuscript. We have also used GAPDH as a loading control and found no difference with respect to Ran as a loading control across samples.

      We appreciate the reviewer's comment regarding the interpretation of our Pearson's correlation coefficient (PCC) results. While the mean colocalization value of 0.6 represents a moderate positive correlation (MUKAKA 2012), which may not reach the conventional threshold for "high positive" colocalization (usually considered 0.7-0.9), it nonetheless indicates substantial spatial overlap between the proteins of interest. Importantly, colocalization analysis provides supportive but indirect evidence for molecular proximity.  To further validate the interaction, we performed a microtubule binding assay, which directly demonstrates the binding of Pfdn5 to stabilized microtubules.

      In accordance with the western blot analysis shown in Figure 2G-I, the levels of Tubulin are reduced in the Pfdn5 mutants (Figure 3B''). We have incorporated and discussed this in the revised manuscript.

      (9) Figure 4: Overexpression of Pfdn appears to rescue the supernumerary satellite bouton numbers induced by human Tau; however, interpretation of this experiment is somewhat complicated as it is performed in Pfdn mutant genetic background. Can overexpression of Pfdn on its own rescue the Tau bouton defect in an otherwise wildtype background?

      We have now coexpressed Pfdn5 and hTau<SUP>V337M</SUP> in an otherwise wild-type background. As shown in Figure S11F-L, Pfdn5 overexpression suppresses Tau-induced bouton defects. We have incorporated the data in the Results section to support the role of Pfdn5 as a modifier of Tau toxicity.

      (10) Lines 256-263 / Figure 5: (a) What exactly are these tau-positive structures (punctae) being stained in larval brains in Fig 5C-E? Most prior work on tau aggregation using Drosophila models has been done in the adult brain, and human wildtype or mutant Tau is not known to form significant numbers of aggregates in neurons (although aggregates have been described following glia tau expression). 

      Therefore, the results need to be further clarified. Besides the provided schematic, a zoomed-out image showing the whole larval brain is needed here for orientation. Have these aggregates been previously characterized in the literature? 

      We agree with the reviewer that the expression of the wildtype or mutant form of human Tau in Drosophila is not known to form aggregates in the larval brain, in contrast to the adult brain (JACKSON et al. 2002; OKENVE-RAMOS et al. 2024). Consistent with previous reports, we also observed that Tau expression on its own does not form aggregates in the Drosophila larval brain.

      However, in the absence of Pfdn5, microtubule disruption is severe, leading to reduced Tau-microtubule binding and the formation of globular/round or flame-shaped, tangle-like aggregates in the larval brain. Previous studies have reported that 1,6-hexanediol can dissolve the Tau aggregate seeds formed by Tau droplets, but cannot dissolve stable Tau aggregates (WEGMANN et al. 2018). We observed that 5% 1,6-hexanediol failed to dissolve these Tau puncta, demonstrating the formation of stable aggregates in the absence of Pfdn5. Additionally, we have now performed a Tau solubility assay and show that in the absence of Pfdn5, a significant amount of Tau goes into the pellet fraction, which could not be detected by the phospho-specific AT8 Tau antibody (targeting pSer202/pThr205) but was detected by the total hTau antibody (D5D8N) on the western blots (Figure S8). These data further reinforce our conclusion that Pfdn5 prevents the transition of hTau from a soluble and/or microtubule-associated state to an aggregated, insoluble, and pathogenic state. These new data have been incorporated into the revised manuscript.

      (b) Can additional markers (nuclei, cell membrane, etc.) be used to highlight whether the tau-positive structures are present in the cell body or at synapses?

      We performed the co-staining of Tau and Elav to assess the aggregated Tau localization. We found that in the presence of Pfdn5, Tau is predominantly cytoplasmic and localised to the cell body and axons. In the absence of Pfdn5, Tau forms aggregates but is still localized to the cell body or axons. However, some of the aggregates are very large, and the subcellular localization could not be determined (Figure S8M-N'). These might represent brain regions of possible nuclear breakdown and cell death (JACKSON et al. 2002).

      (c) It would also be helpful to perform western blots from larval (and adult) brains examining tau protein levels, phospho-tau species, possible higher-molecular weight oligomeric forms, and insoluble vs. soluble species. These studies would be especially important to help interpret the potential mechanisms of observed interactions.

      Western blot analysis revealed that overexpression of Pfdn5 does not alter total Tau levels (Figure 6O). In Pfdn5 mutants, however, hTau<sup>V337M</sup> levels were reduced in the supernatant fraction and increased in the pellet fraction, indicating a shift from soluble monomeric Tau to aggregated Tau.

      (d) Does overexpression of Pfdn5 (UAS-Pfdn5) suppress the formation of tau aggregates? I would therefore recommend that additional experiments be performed looking at adult flies (perhaps in Pfdn5 heterozygotes or using RNAi due to the larval lethality of Pfdn5 null animals).

      Overexpression of Pfdn5 significantly reduced the Tau aggregates observed in Pfdn5 mutants (Elav-Gal4/UAS-Tau<sup>V337M</sup>; UAS-Pfdn5; ΔPfdn5<sup>15/40</sup>) (Figure 5E). Coexpression of Pfdn5 and hTau<sup>V337M</sup> also suppresses the Tau aggregates/puncta in the 30-day adult brain. Since heterozygous ΔPfdn<sup>15</sup>/+ flies did not show a reduction in Pfdn5 levels, we did not test the suppression of Tau aggregates in ΔPfdn<sup>15</sup>/+; Elav>UAS-Pfdn5, UAS-Tau<sup>V337M</sup>.

      (11) Figure 6, panels A-N: The GMR>Tau rough eye is not a "neurodegenerative" but rather a predominantly developmental phenotype. It results from aberrant retinal developmental patterning and the subsequent secretion/formation of the overlying eye cuticle (lenslets). I am confused by the data shown suggesting a "shrinking eye size" and increasing roughened surface over time (a GMR>tau eye similar to that shown in panel B cannot change to appear like the one in panel H with aging). The rough eye can be quite variable among a population of animals, but it is usually fixed at the time the adult fly ecloses from the pupal case, and quite stable over time in an individual animal. Therefore, any suppression of the Tau rough eye seen at 30 days should be appreciable as soon as the animals eclose. These results need to be clarified. If indeed there is robust suppression of Tau rough eye, it may be more intuitive and clearer to include these data with Figure 1, when first showing the loss-of-function enhancement of the Tau rough eye. Also, why is Pfdn6 included in these experiments but not in the studies shown in Figures 2-5?

      We thank the reviewer for their careful and knowledgeable assessment of the GMR>Tau rough eye model. We appreciate the clarification that the rough eye phenotype could be “developmental” rather than “neurodegenerative.” Our initial observations regarding "shrinking eye size" and "increased surface roughness" clearly show age-related progression of structural change. Such progression has been observed and reported by others (IIJIMA-ANDO et al. 2012; PASSARELLA AND GOEDERT 2018). We observed an age-dependent increase in the number of fused ommatidia in GMR-Gal4>Tau flies, which was rescued by Pfdn5 or Pfdn6 expression. We noted that adult-specific induction of hTau<sup>V337M</sup> using the Gal80<sup>ts</sup> and GMR-GeneSwitch (GMR-GS) systems was not sufficient to induce a significant eye phenotype; thus, early expression of Tau in the developing eye imaginal disc appears to be required for the adult progressive phenotype that we observe. We feel that it is inadequate to refer to this adult progressive phenotype as “developmental,” although it is admittedly arguable whether it can be termed “degenerative.”

      To address neurodegeneration more directly, we focused on 30-day-old adult fly brains and demonstrated that Pfdn5 overexpression suppresses age-dependent Tau-induced neurodegeneration in the central nervous system (Figure 6H-N and Figure S12). This supports our central conclusion regarding the neuroprotective role of Pfdn5 in age-associated Tau pathology. Since we found an enhancement in the Tau-induced synaptic and eye phenotypes by Pfdn6 knockdown, we also generated CRISPR/Cas9-mediated loss-of-function mutants for Pfdn6. However, loss of Pfdn6 resulted in embryonic/early first instar lethality, which precluded its detailed analysis at the larval stages.

      (12) Figure 6, panels O-T: the elav>tau image appears to show a different frontal section plane compared to the other panels. It is advisable to show images at a similar level in all panels since vacuolar pathology can vary by region. It is also useful to be able to see the entire brain at a lower power, but the higher power inset view is obscuring these images. I would recommend creating separate panels rather than showing them as insets.

      In the revised figure, we now display the low- and high-magnification images as separate, clearly labeled panels instead of using insets. This improves visibility of the brain morphology while providing detailed views of the vacuolar pathology (Figure 6H-L).

      (13) Figure 6/7: For the experiments in which Pfdn5/6 is overexpressed and possibly suppresses tau phenotypes (brain vacuoles and memory), it is important to use controls that normalize the number of UAS binding sites, since increased UAS sites may dilute GAL4 and reduced Tau expression levels/toxicity. Therefore, it would be advisable to compare with Elav>Tau flies that also include a chromosome with an empty UAS site or other transgenes, such as UAS-GFP or UAS-lacZ.

      We thank the reviewer for the suggestion. Now we have incorporated proper controls in the brain vacuolization, the mushroom body, and ommatidial fusion rescue experiments. Also, we have independently verified whether Gal4 dilution has any effect on the Tau phenotypes (Figure 6H-L, Figure 7, and Figure S11A-B).

      (14) Lines 311-312: the authors say vacuolization occurs in human neurodegenerative disease, which is not really true to my knowledge and definitely not stated in the citation they use. Please re-phrase.

      Now we have made the appropriate changes in the revised manuscript.

      (15) Figure 7: The authors claim that Pfdn5/6 expression does not impact memory behavior, but there in fact appears to be a decrease in preference index (panel D vs panel B). Does this result complicate the interpretation of the potential interaction with Tau (panel F). Are data from wildtype control flies available?

      In our memory assay, a decrease in performance index (PI) of the trained flies compared to the naïve flies indicates memory formation (normal memory in control flies, Figure 7B). In contrast, a lack of significant difference in PI indicates a memory defect (Figure 7C: hTau<sup>V337M</sup> overexpressed flies). "Decrease in preference index (panel D vs panel B)" is not a sign of memory defect; it may be interpreted as a better memory instead. Hence, neuronal overexpression of Pfdn5 (Figure 7D) or Pfdn6 (Figure 7E) in wildtype neurons does not cause memory deficits. In addition, coexpression of Pfdn5/6 and hTau<sup>V337M</sup> successfully rescues the Tau-induced memory defect (significant drop in PI compared to the PI of naïve flies in Figure 7F-G). Moreover, almost complete rescue of the Tau-induced mushroom body defect on Pfdn5 or Pfdn6 expression further establishes potential interaction between Pfdn5/6 and Tau. This data has been incorporated into the revised manuscript.

      The memory assay itself, with extensive data on wildtype flies and various other genotypes, will shortly be submitted for publication in another manuscript (Majumder et al., manuscript in preparation). However, we can confirm for the reviewer that wildtype flies, trained and assayed by the protocol described, show a significant decrease in performance index compared to the naïve flies, indicative of strong learning and memory performance, very similar to the control genotype data shown in Figure 7B.

      Additional minor considerations

      (16) Lines 50-52: there are many therapeutic interventions for treating tauopathies, but not curative or particularly effective ones.

      Now we have made the appropriate changes in the revised manuscript.

      (17) Lines 87-106 seem like a duplication of the abstract. Consider deleting or condensing.

      We have made the appropriate changes in the revised manuscript.

      (18) Where is pfdn5 expressed? Development v. adult? Neuron v. glia? Conservation?

      Prefoldin5 is expressed throughout development but strongly localized to the larval trachea and neuronal axons. Drosophila Pfdn5 shows 35% overall identity with human PFDN5. 

      (19) Line 187: is pfdn5 truly "novel"?

      The role of Pfdn5 in binding and stabilizing microtubules is a new finding that has not been predicted or described before. Hence, we describe it as a novel neuronal microtubule-associated protein.

      (20) Figure 5, panel F, genotype labels on the x-axis are confusing; consider simplifying to Control, DPfdn, and Rescue.

      We have made appropriate changes in the figure for better readability.

      (21) Figures 5/8: it might be preferable to use consistent colors for Tau/HRP--Tau is labeled green in Figure 5 and then purple in Figure 8.

      We have made these changes where possible. 

      (22) Lines 311-312: Vacuolar neuropathology is NOT typically observed in human Tauopathy.

      We thank the reviewer for pointing this out. We have made the appropriate changes in the revised manuscript.

      (23) Lines 328-349: The explanation could be made more clear. Naïve flies should not necessarily be called controls. Also, a more detailed explanation of how the preference index is computed would be helpful. Why are some datapoints negative values?

      (a) We have rewritten this paragraph to make the description and explanation clearer. The detailed method and formula to calculate the Preference index have been incorporated in the Materials and Methods section.

      (b) We have replaced the term Control with Naïve. 

      (c) Datapoints with negative values appeared in some of the 'Trained' group flies. They indicate that after CuSO<sub>4</sub> training, some groups showed repulsion from the otherwise attractive odor 2,3B. As 2,3B is an attractive odorant, naïve or control flies are attracted to it relative to air, which is evident from the higher number of flies in the Odor arm (O) than in the Air arm (A) of the Y-maze; thus, the PI, [(O − A)/(O + A)] × 100, is positive for naïve fly groups. Training led the flies to associate the attractive odorant with bitter food, which decreased attraction and in a few instances even produced repulsion, resulting in fewer flies in the odor arm than in the air arm. In such instances (O − A) is negative, so the PI becomes negative. Thus, it is not an anomaly but indicates strong learning.
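
The sign behavior of the PI described above can be checked with a short worked example. This is only a minimal sketch: the function name and the fly counts are hypothetical, chosen for illustration, and are not data from the manuscript.

```python
def preference_index(odor_count: int, air_count: int) -> float:
    """PI = [(O - A) / (O + A)] * 100 for one group of flies in the Y-maze."""
    total = odor_count + air_count
    if total == 0:
        raise ValueError("no flies counted in either arm")
    # Multiply before dividing so whole-number results stay exact.
    return 100 * (odor_count - air_count) / total


# Hypothetical counts: a naive group prefers the odor arm (positive PI).
naive_pi = preference_index(odor_count=30, air_count=10)

# Hypothetical counts: a trained group avoids the odor arm (negative PI).
trained_pi = preference_index(odor_count=8, air_count=12)

print(naive_pi, trained_pi)
```

A negative value therefore only requires that more flies choose the air arm than the odor arm in that group.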

      (24) Line 403: misspelling "Pdfn"

      We have corrected this.

      (25) Lines 423-425: recommend re-phrasing, since tauopathies are human diseases. Mice and other animal models may be susceptible to tau-mediated neuronal dysfunction but not Tauopathy, per se.

      We have made the appropriate changes in the revised manuscript.

      (26) Lines 468-469: "tau neuropathology" rather than "tau associated neuropathies".

      We have made the appropriate changes in the revised manuscript. 

      References

      Askenazi, M., T. Kavanagh, G. Pires, B. Ueberheide, T. Wisniewski et al., 2023 Compilation of reported protein changes in the brain in Alzheimer's disease. Nat Commun 14: 4466.

      Hsieh, Y. C., C. Guo, H. K. Yalamanchili, M. Abreha, R. Al-Ouran et al., 2019 Tau-Mediated Disruption of the Spliceosome Triggers Cryptic RNA Splicing and Neurodegeneration in Alzheimer's Disease. Cell Rep 29: 301-316 e310.

      Iijima-Ando, K., M. Sekiya, A. Maruko-Otake, Y. Ohtake, E. Suzuki et al., 2012 Loss of axonal mitochondria promotes tau-mediated neurodegeneration and Alzheimer's disease-related tau phosphorylation via PAR-1. PLoS Genet 8: e1002918.

      Jackson, G. R., M. Wiedau-Pazos, T. K. Sang, N. Wagle, C. A. Brown et al., 2002 Human wildtype tau interacts with wingless pathway components and produces neurofibrillary pathology in Drosophila. Neuron 34: 509-519.

      Ji, W., K. An, C. Wang and S. Wang, 2022 Bioinformatics analysis of diagnostic biomarkers for Alzheimer's disease in peripheral blood based on sex differences and support vector machine algorithm. Hereditas 159: 38.

      Leitner, D., G. Pires, T. Kavanagh, E. Kanshin, M. Askenazi et al., 2024 Similar brain proteomic signatures in Alzheimer's disease and epilepsy. Acta Neuropathol 147: 27.

      Li, L., Y. Jiang, G. Wu, Y. A. R. Mahaman, D. Ke et al., 2022 Phosphorylation of Truncated Tau Promotes Abnormal Native Tau Pathology and Neurodegeneration. Mol Neurobiol 59: 6183-6199.

      Mershin, A., E. Pavlopoulos, O. Fitch, B. C. Braden, D. V. Nanopoulos et al., 2004 Learning and memory deficits upon TAU accumulation in Drosophila mushroom body neurons. Learn Mem 11: 277-287.

      Mukaka, M. M., 2012 Statistics corner: A guide to appropriate use of correlation coefficient in medical research. Malawi Med J 24: 69-71.

      Okenve-Ramos, P., R. Gosling, M. Chojnowska-Monga, K. Gupta, S. Shields et al., 2024 Neuronal ageing is promoted by the decay of the microtubule cytoskeleton. PLoS Biol 22: e3002504.

      Passarella, D., and M. Goedert, 2018 Beta-sheet assembly of Tau and neurodegeneration in Drosophila melanogaster. Neurobiol Aging 72: 98-105.

      Sun, Z., J. S. Kwon, Y. Ren, S. Chen, C. K. Walker et al., 2024 Modeling late-onset Alzheimer's disease neuropathology via direct neuronal reprogramming. Science 385: adl2992.

      Tao, Y., Y. Han, L. Yu, Q. Wang, S. X. Leng et al., 2020 The Predicted Key Molecules, Functions, and Pathways That Bridge Mild Cognitive Impairment (MCI) and Alzheimer's Disease (AD). Front Neurol 11: 233.

      Wegmann, S., B. Eftekharzadeh, K. Tepper, K. M. Zoltowska, R. E. Bennett et al., 2018 Tau protein liquid-liquid phase separation can initiate tau aggregation. EMBO J 37.

    1. The Appeal of the Middle

      There’s a type of game that I don’t think the world will ever have enough of: they’re the pleasant, 45-minute games that I can teach to anyone, but can also play with anyone. I’ve played this with my non-gamer mom, and my hardcore gamer friends, and many in between. The beauty is that 1) it’s easy to learn for new gamers, 2) possesses enough depth that gamers can enjoy it, but 3) also has enough randomness and a forgiving strategic learning curve, so that new gamers will stand a chance against more experienced players, and 4) plays quickly enough that it never overstays its welcome.

      board game: mid-weight, beginner-friendly

    1. Summary of the Sympa Project

      Executive Summary

      Sympa is an open-source (GPLv2) mailing list manager that has been developed in Perl for 17 years.

      Originally designed at the Comète-Résu university, it is now hosted by Renater, the French national telecommunications network for technology, education, and research.

      Although it covers the basic functions of a mailing list manager, Sympa stands out for advanced features that make it a powerful tool for large organizations.

      Its main strengths are deep integration with existing information systems (databases, LDAP directories, authentication systems), industrialization mechanisms for creating and managing thousands of lists, and an extremely flexible and expressive scenario-based authorization system.

      The project is mature and used by prestigious institutions (90% of French universities, ministries, and companies such as Orange and Atos), but it faces the challenges of a 17-year-old legacy codebase.

      In response, the development team has begun a major rewrite of the code for the upcoming version 7.0.

      This version will introduce a modernized architecture, unit tests, a new web interface, and a migration to Git to make outside contributions easier.

      The long-term vision includes SaaS deployment, multi-channel message delivery (SMS, web), and a plugin system.

      The project is actively calling on the community to contribute to development, documentation, support, and project management, and it even offers a free hosting service to the Perl community to promote the use of free software tools.

      1. Introduction to Sympa

      Definition and Origin

      Name: Sympa is an acronym for "Système de Multi-postage Automatique" (automatic multi-mailing system).

      Age: It is mature software whose first version was released on April 1, 1997, that is, 17 years before the presentation.

      Basic function: Like Mailman or PHPList, Sympa lets a sender submit a single email to a server, which then distributes it to a large number of subscribers.

      Hosting and license: The project is hosted by Renater, the French national research and education network. It is free software under the GPLv2 license.

      Perl philosophy: The team proudly stands by its use of Perl, stating that despite questions about switching to a "more modern" language, Sympa remains one of the best mailing list managers and "it works".

      Statistics and Key Users

      Despite its French origin, Sympa has a mostly international user base.

      Metric | Record figure | Context

      Largest list | 1.6 million subscribers |

      Largest number of virtual hosts | 30,000 | On a single server, at the hosting provider Infomaniak

      Largest number of lists | 32,000 | On a single server

      Largest number of subscribers | 3 million | On a single server

      Main users:

      Research and education: 90% of universities and research centers in France.

      Public sector: Several French ministries.

      Private companies: Orange, Atos.

      Hosting providers: Infomaniak, Switch (provided by default to their customers).

      Non-governmental organizations: riseup.net, NAA, UNESCO, CGT.

      2. Main and Distinguishing Features

      Beyond sending email, Sympa stands out for advanced capabilities designed for complex environments.

      Advanced Email Handling

      Optimized bulk sending: Sympa can group emails by domain and tune the sending rate, so that it avoids being flagged as a spammer while still delivering quickly.

      Standards (RFC) support: It supports S/MIME (signing and encryption) and DKIM, and it offers DMARC protection, which proved crucial when Yahoo changed its policy in April and broke many mailing list systems.

      Error handling: Bounces are handled automatically by Sympa, not by the original sender. Support for VERP (Variable Envelope Return Path) makes it possible to process errors automatically even for forwarded email addresses.

      Email tracking: Privacy-respecting tracking (no "spy pixels"), based on the RFCs, shows what happened to an email for each user.

      Personalization (mail merging): User data can be merged into an email to send personalized messages.

      Web archives: Sympa provides web archives with fine-grained access control.
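
To make the grouping-by-domain and VERP ideas above more concrete, here is a minimal sketch. It does not reproduce Sympa's Perl implementation or its actual address format: the function names, the `bounce+local=domain@host` convention, and the example addresses are all assumptions chosen for illustration.

```python
from collections import defaultdict


def group_by_domain(recipients):
    """Bucket recipient addresses by (lower-cased) domain.

    Sending one batch per domain lets a mailer reuse SMTP sessions and
    apply a per-domain sending rate.
    """
    groups = defaultdict(list)
    for address in recipients:
        domain = address.rsplit("@", 1)[1].lower()
        groups[domain].append(address)
    return dict(groups)


def verp_return_path(bounce_local, bounce_host, recipient):
    """Encode the recipient into the envelope sender (generic VERP style).

    A bounce sent back to this address identifies the failing subscriber
    even if the message was forwarded. The exact encoding is hypothetical.
    """
    local, domain = recipient.rsplit("@", 1)
    return f"{bounce_local}+{local}={domain}@{bounce_host}"


batches = group_by_domain(["a@x.org", "b@Y.org", "c@x.org"])
return_path = verp_return_path("bounce", "lists.example.org", "alice@example.com")
print(batches)
print(return_path)
```

The point of the sketch is only that the bounce address itself carries the subscriber identity, so no parsing of the bounce body is needed.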

      Integration with Information Systems (IS)

      Sympa is designed to integrate natively with the software building blocks of a company's or university's information system.

      Component | Supported technologies

      Mail server (MTA) | Sendmail, Postfix, Exim

      Database (RDBMS) | MySQL, PostgreSQL, Oracle, SQLite, Sybase ("hopeless")

      Web server | Apache, lighttpd, Nginx

      Data sources (repositories) | Relational databases, LDAP, flat files, web services (plain text)

      Authentication systems | Native (email/password), CAS, Shibboleth, LDAP

      Industrializing List Management

      For environments that need to create hundreds or thousands of lists (for example, every year at a university), Sympa offers automation mechanisms.

      1. Manual creation: A simple web form in which the user fills in the basic information (name, subject, owner).

      Default values come from the global configuration and from a list template (Template Toolkit, tt2).

      2. List families: A mechanism for creating lists in bulk.

      It uses a shared tt2 template and an XML file that defines the specific parameters of each list to create.

      A single command generates or updates all the lists in the family.

      3. Automatic lists: Designed for cases where a very large number of lists could exist but only a fraction will actually be used.

      ◦ The list name itself contains the parameters (e.g., prefix-field1_value1-field2_value2).

      ◦ The list is created dynamically only when a message is first sent to that address.

      ◦ A web interface was developed to simplify composing these complex addresses.

      4. Families of families: It is possible to create families of automatic lists, which allows industrialization at several levels.
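
As an illustration of how a name of the shape `prefix-field1_value1-field2_value2` can carry its own parameters, the sketch below splits such a name into a prefix and a field/value mapping. It is not Sympa's parser: the function name, the fixed `-` and `_` separators, and the example list name are assumptions based only on the example shape given above.

```python
def parse_automatic_list_name(name):
    """Split 'prefix-field1_value1-field2_value2' into (prefix, params)."""
    prefix, *pairs = name.split("-")
    if not prefix:
        raise ValueError(f"missing prefix in {name!r}")
    params = {}
    for pair in pairs:
        # Each segment after the prefix must look like 'field_value'.
        field, separator, value = pair.partition("_")
        if not separator or not field:
            raise ValueError(f"malformed segment {pair!r} in {name!r}")
        params[field] = value
    return prefix, params


# Hypothetical list name for illustration only.
prefix, params = parse_automatic_list_name("course-year_2014-dept_physics")
print(prefix, params)
```

With parameters recovered from the address alone, a list can be instantiated from its family template the first time a message arrives.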

      Scenario-Based Authorization Mechanism

      This is one of Sympa's most original and powerful features.

      Principle: Permissions for each action (sending a message, viewing the archives, etc.) are defined in files called "scenarios" (e.g., send.scenario).

      Structure of a scenario: It is a sequence of rules evaluated from top to bottom.

      Each rule has the form: test(arguments) 'auth_method' -> decision.

      Evaluation: Processing stops at the first rule whose test is true.

      Tests: Many tests are available (is_subscriber, is_list_owner, etc.).

      Custom tests can be added through Perl modules (custom_condition).

      Authentication methods: These allow different rules to apply depending on how strong the authentication is (e.g., smime; smtp for the From: field; md5 for a user authenticated on the web).

      Decisions: These go beyond a simple yes/no. Possible decisions include do_it (accept), reject (refuse), owner (moderation by the list owner), etc.

      This system is highly expressive and makes it possible to define very fine-grained access policies.
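
The first-match evaluation described above can be sketched as follows. This is an illustrative model in Python, not Sympa's Perl implementation or its scenario file syntax: the `Rule` class, the context dictionary, the catch-all `always` test, and the default `reject` when no rule matches are all assumptions. Only the names `is_subscriber`, `is_list_owner`, `smtp`, `md5`, `do_it`, `owner`, and `reject` come from the text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    test: Callable[[dict], bool]  # condition evaluated against the request
    auth_method: str              # e.g. "smtp", "md5", "smime"
    decision: str                 # e.g. "do_it", "reject", "owner"


def is_list_owner(ctx):
    return ctx["sender"] in ctx["owners"]


def is_subscriber(ctx):
    return ctx["sender"] in ctx["subscribers"]


def always(ctx):
    return True


# Hypothetical "send" policy: owners post freely, subscribers are
# moderated by the owner, everyone else is rejected.
SEND_RULES = [
    Rule(is_list_owner, "smtp", "do_it"),
    Rule(is_subscriber, "smtp", "owner"),
    Rule(always, "smtp", "reject"),
]


def evaluate(rules, ctx, auth_method, default="reject"):
    """Return the decision of the first rule that matches, top to bottom."""
    for rule in rules:
        if rule.auth_method == auth_method and rule.test(ctx):
            return rule.decision
    return default  # assumption: refuse when nothing matches


ctx = {"owners": {"owner@example.org"}, "subscribers": {"sub@example.org"}}
print(evaluate(SEND_RULES, {**ctx, "sender": "sub@example.org"}, "smtp"))
```

Because evaluation stops at the first match, rule order matters: placing the catch-all rule first would reject every sender.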

      Group Management Capabilities

      Sympa can be used as a group manager for third-party applications.

      SOAP interface (REST in development): A SOAP interface lets other applications query and act on Sympa's internal data (create a list, subscribe a user, etc.).

      Integration: Plugins for applications such as DokuWiki or LimeSurvey can query Sympa to find out which lists (and therefore which groups) a user belongs to.

      The third-party application can then grant privileges based on that membership.

      Group hierarchy: Sympa allows lists to be included in other lists, which creates larger groups.

      Extensive Customization

      Almost every aspect of Sympa can be customized at several levels (global server, virtual host, individual list), following a cascading principle.

      Web interface: Entirely based on Template Toolkit templates.

      Service messages: The messages sent to users (welcome, etc.) can be modified.

      List creation templates.

      Authorization scenarios.

      List parameters: Users can define their own parameters in addition to the hundred or so that already exist.

      User attributes: Custom fields can be added for users; a future version should allow them to be synchronized with LDAP or a database.
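
The cascading principle mentioned above (individual list, then virtual host, then global server) amounts to a most-specific-first lookup. The sketch below models it with Python's `collections.ChainMap`; it is only an illustration under stated assumptions, not how Sympa resolves its files, and the function name and setting names are hypothetical.

```python
from collections import ChainMap


def effective_settings(list_cfg, host_cfg, server_cfg):
    """Most specific level wins: list, then virtual host, then server."""
    return ChainMap(list_cfg, host_cfg, server_cfg)


# Hypothetical settings at each level.
server_cfg = {"lang": "en", "welcome_subject": "Welcome", "max_size": 5_000_000}
host_cfg = {"welcome_subject": "Welcome to lists.example.org"}
list_cfg = {"max_size": 1_000_000}

settings = effective_settings(list_cfg, host_cfg, server_cfg)
print(settings["max_size"], settings["welcome_subject"], settings["lang"])
```

A list therefore only needs to define the values it overrides; everything else falls through to the virtual host or server defaults.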

      3. Architecture and Technical Operation

      The path an email takes through the system illustrates Sympa's modular architecture:

      1. Reception: An email is sent to a list and arrives at the incoming MTA.

      2. Initial processing: The MTA hands the email to the sympa.pl daemon, which evaluates authorizations, personalizes the message, etc.

      3. Storage: If the message is authorized, it is stored in a relational database (RDBMS). Using a database allows safe concurrent access.

      4. Distribution: A dedicated daemon, bulk.pl, is solely responsible for sending emails.

      It reads messages from the database and opens multiple SMTP sessions, so distribution is fast and can be parallelized across several servers.

      5. Archiving: At the same time, a copy of the message is processed by the archived.pl daemon and added to the web archives.
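
      The five steps above can be mimicked with a toy in-memory model. The sketch below is an assumption-laden illustration, not Sympa's implementation: the class `ListServer` and its methods are hypothetical, plain Python lists stand in for the relational database and the web archive, and "authorized" is reduced to "the sender is subscribed to the list".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    list_name: str
    body: str


@dataclass
class ListServer:
    subscribers: dict[str, set[str]]
    spool: list[Message] = field(default_factory=list)    # stands in for the RDBMS
    archive: list[Message] = field(default_factory=list)  # stands in for web archives
    outbox: list[tuple[str, str]] = field(default_factory=list)

    def is_authorized(self, msg: Message) -> bool:
        # Simplified rule: only subscribers may post.
        return msg.sender in self.subscribers.get(msg.list_name, set())

    def process_incoming(self, msg: Message) -> bool:
        """Steps 2, 3 and 5: check authorization, then store and archive."""
        if not self.is_authorized(msg):
            return False
        self.spool.append(msg)
        self.archive.append(msg)
        return True

    def distribute(self) -> int:
        """Step 4: drain the spool, queueing one copy per subscriber."""
        sent = 0
        while self.spool:
            msg = self.spool.pop(0)
            for recipient in sorted(self.subscribers[msg.list_name]):
                self.outbox.append((recipient, msg.body))
                sent += 1
        return sent


server = ListServer({"perl": {"a@example.org", "b@example.org"}})
accepted = server.process_incoming(Message("a@example.org", "perl", "hello"))
rejected = server.process_incoming(Message("z@example.org", "perl", "spam"))
delivered = server.distribute()
```

      Keeping storage separate from delivery is what lets a dedicated sender drain the queue independently, which is the role the summary attributes to bulk.pl.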

      4. The Sympa Project: Development and Community

      Governance and Team

      Core developers: The project has grown from 2 original developers to an expanded team of 5 people, 3 of whom are outside Renater.

      Mark (Strasbourg): Perl guru.

      Guillaume: Security lead, expert in best practices.

      Soji (Tokyo): Specialist in email and encoding issues (led the migration to UTF-8).

      Etienne: Polyglot developer.

      David Verdin (the presenter): "Jack of all trades" (documentation, community management, presentations).

      Contributions: The project benefits from many contributions from the Perl community.

      Challenges of Legacy Software

      With 17 years of history, Sympa's code has become very heterogeneous, with varied coding styles from its many contributors.

      Installed base: The large base of production users calls for great caution when changing the code.

      Dependencies: Adding new CPAN modules is complicated because production users prefer to install from distribution packages, which must therefore exist for those modules.

      Lack of tests: Historically, the software had no unit tests; testing was done "live" on production servers.

      5. The Future of Sympa: Roadmap and Vision

      Upcoming Versions (6.2, 7.0, 7.1)

      Version 6.2: Nearly finished, it is currently undergoing intensive manual testing ahead of a beta release.

      Version 7.0: This is a major overhaul.

      New code: A complete rewrite led by Guillaume to modernize the architecture.

      Unit tests: Systematic implementation of tests.

      New web interface: Simpler, more modern, and more user-friendly, developed by a contributor from New Zealand.

      Migration to Git: To make forking and external contributions easier (for example on GitHub).

      Version 7.1 and beyond:

      SaaS (Software as a Service) mode.

      Multi-channel delivery: Sending messages via SMS or updating web services.

      Plugin system: To allow small features to be added without waiting for integration into the software core.

      Support for internationalized email addresses.

      Strategic Directions

      A key goal is to preserve Sympa's dual capability:

      1. Large installations: Able to run on clusters in SaaS mode.

      2. Small installations: Remain simple to install and run on a small standalone server.

      6. Call for Participation and Offers to the Community

      Contribution Opportunities

      The project is actively looking for help, including non-technical help:

      Development: Fixing bugs, adding features.

      Documentation: The documentation is a wiki that any user subscribed to the sympa-users list can edit.

      Support: Helping other users on the mailing lists.

      Packaging: Creating packages for different Linux distributions.

      Project management: Sharing experience of managing a fast-growing software project.

      Free Hosting Offer

      To counter the use of services such as Google Groups by free software communities, the Sympa team offers to provide a free mailing list hosting service for the worldwide Perl community.

      Renater's infrastructure makes it possible to deploy a new virtual host in 30 minutes.

      7. Key Questions and Answers

      New web interface (v7.0): It will be simpler, with fewer options by default so as not to overwhelm new users.

      The look and feel will be more modern and closer to what is found on social networks.

      REST interface: A REST interface already exists for group management (based on OAuth), but the code rewrite aims to make all of Sympa's features accessible through all of its interfaces (command line, SOAP, REST, web, and email).

      Storage of emails and attachments: Archived emails are stored permanently.

      Anonymization is a complex legal and technical challenge.

      Attachments are stored and accessible via a link.

      For lists that want it, large attachments can be automatically detached and replaced with a link to make emails lighter.

      Database support: MySQL receives the most attention because it is the one the team uses most.

      PostgreSQL and SQLite are also very well maintained, and their schemas are updated automatically.

      Oracle support is more difficult.

    1. Reviewer #2 (Public review):

      Summary:

      The role of PRC2 in post neural crest induction was not well understood. This work developed an elegant mouse genetic system to conditionally deplete EED upon SOX10 activation. Substantial developmental defects were identified for craniofacial and bone development. The authors also performed extensive single-cell RNA sequencing to analyze differentiation gene expression changes upon conditional EED disruption.

      Strengths:

      (1) Elegant genetic system to ablate EED post neural crest induction.

      (2) Single-cell RNA-seq analysis is extremely suitable for studying the cell type specific gene expression changes in developmental systems.

      Original Weaknesses:

      (1) Although this study is well designed and contains state-of-the-art single-cell RNA-seq analysis, it lacks mechanistic depth regarding EED/PRC2-mediated epigenetic repression. This is largely because no epigenomic data were shown.

      (2) The mouse model of conditional loss of EZH2 in neural crest has been previously reported, as the authors pointed out in the discussion. What is the novelty of disrupting EED in this study? Perhaps a more detailed comparison of the two mouse models would be beneficial.

      (3) The presentation of the single-cell RNA-seq data may need improvement. The complexity of the many cell types blurs the importance of which cell types are affected the most by EED disruption.

      (4) While it's easy to identify PRC2/EED target genes using published epigenomic data, it would be nice to tease out the direct versus indirect effects in the gene expression changes (e.g Fig. 4e)

      Comments on latest version:

      The authors have addressed weaknesses 2 and 3 of my previous comment very well. For weaknesses 1 and 4, the authors have added a main Fig 5 and its associated supplemental materials, which definitely strengthen the mechanistic depth of the story. However, I think the audience would appreciate if the following questions/points could be further addressed regarding the Cut&Tag data (mostly related to main Figure 5):

      (1) The authors described that Sox10-Cre would be expressed at E8.75, and in theory, EED-FL would be ablated soon after that. Why would E16.5 exhibit a much smaller loss in H3K27me3 compared to E12.5? Shouldn't a prolonged loss of EED lead to even worse consequences?

      (2) The gene expression change at E12.5 upon loss of EED (shown in Fig. 4h) seems to be massive, including many PRC2-target genes. However, the H3K27me3 alteration seems to be mild even at E12.5. Does this imply a PRC2- or H3K27 methylation-independent role of EED? To address this, I suggest the authors reconsider addressing my previously commented weakness #4 regarding the correlation between RNA-seq and CUT&Tag changes. For example, a gene scatter plot with RNA-seq changes on the X-axis versus H3K27me3 level changes on the Y-axis.

      (3) The CUT&Tag experiments seem to contain replicates according to the figure legend, but no statistical analysis was presented including the new supplemental tables. Also, for Fig. 5c-d, instead of showing the MRR in individual conditions, I think the audience would really want to know the differential MRR between Fl/WT and Fl/Fl. In other words, how many genes/ MRR have statistically lower H3K27me3 level upon EED loss.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Epigenetic regulation complex (PRC2) is essential for neural crest specification, and its misregulation has been shown to cause severe craniofacial defects. This study shows that Eed, a core PRC2 component, is critical for craniofacial osteoblast differentiation and mesenchymal proliferation after neural crest induction. Using mouse genetics and single-cell RNA sequencing, the researcher found that conditional knockout of Eed leads to significant craniofacial hypoplasia, impaired osteogenesis, and reduced proliferation of mesenchymal cells in post-migratory neural crest populations.

      Overall, the study is superficial and descriptive. No in-depth mechanism was analyzed and the phenotype analysis is not comprehensive.

      We thank the reviewer for sharing their expertise and for taking the time to provide helpful suggestions to improve our study. We are gratified that the striking phenotypes we report from Eed loss in post-migratory neural crest craniofacial tissues were appreciated. The breadth and depth of our phenotyping techniques, including skeletal staining, micro-CT, echocardiogram, immunofluorescence, histology, and primary craniofacial cell culture provide comprehensive data in support of our hypothesis that PRC2 is required for epigenetic control of craniofacial osteoblast differentiation. To provide mechanistic data in support of this hypothesis, we have now performed CUT&Tag H3K27me3 chromatin profiling on nuclei harvested from E12.5 or E16.5 Sox10-Cre Eed<sup>Fl/WT</sup> and Sox10-Cre Eed<sup>Fl/Fl</sup> craniofacial tissue. These new data, which are presented in Fig. 5, Supplementary Fig. 9, and Supplementary Tables 7-10 of our revised manuscript, validate our hypothesis that epigenetic regulation of chromatin architecture downstream of PRC2 activity underlies craniofacial osteoblast differentiation. In particular, we now show that Eed-dependent H3K27me3 methylation is associated with correct temporal expression of transcription factors that are necessary for craniofacial differentiation and patterning, such as Msx1, Pitx1, and Pax7, which were initially nominated by single-cell RNA sequencing of E12.5 Sox10-Cre Eed<sup>Fl/WT</sup> and Sox10-Cre Eed<sup>Fl/Fl</sup> craniofacial tissues in Fig. 4, Supplementary Fig. 5-7, and Supplementary Tables 1-6.

      Reviewer #2 (Public review):

      Summary:

      The role of PRC2 in post-neural crest induction was not well understood. This work developed an elegant mouse genetic system to conditionally deplete EED upon SOX10 activation. Substantial developmental defects were identified for craniofacial and bone development. The authors also performed extensive single-cell RNA sequencing to analyze differentiation gene expression changes upon conditional EED disruption.

      Strengths:

      (1) Elegant genetic system to ablate EED post neural crest induction.

      (2) Single-cell RNA-seq analysis is extremely suitable for studying the cell type-specific gene expression changes in developmental systems.

      We thank the reviewer for their generous and helpful comments on our study. We are happy that our mouse genetic and single-cell RNA sequencing approaches were appropriate in pairing the craniofacial phenotypes we report with distinct gene expression changes in post-migratory neural crest tissues upon Eed deletion.

      Weaknesses:

      (1) Although this study is well designed and contains state-of-the-art single-cell RNA-seq analysis, it lacks the mechanistic depth in the EED/PRC2-mediated epigenetic repression. This is largely because no epigenomic data was shown.

      Thank you for this suggestion. As described in response to Reviewer #1, we have now performed CUT&Tag H3K27me3 chromatin profiling on nuclei harvested from E12.5 or E16.5 Sox10-Cre Eed<sup>Fl/WT</sup> and Sox10-Cre Eed<sup>Fl/Fl</sup> craniofacial tissues to provide mechanistic epigenomic data in support of our hypothesis that PRC2 is required for craniofacial osteoblast differentiation. These new data, which are presented in Fig. 5, Supplementary Fig. 9, and Supplementary Tables 7-10 of our revised manuscript, integrate genome-wide and targeted metaplot visualizations across genotypes with in-depth analyses of methylation rich regions and genes associated with methylation rich loci. Broadly, these new data reveal that changes in H3K27me3 occupancy correlate with gene expression changes from single-cell RNA sequencing of E12.5 Sox10-Cre Eed<sup>Fl/WT</sup> and Sox10-Cre Eed<sup>Fl/Fl</sup> craniofacial tissues in Fig. 4, Supplementary Fig. 5-7, and Supplementary Tables 1-6.

      (2) The mouse model of conditional loss of EZH2 in neural crest has been previously reported, as the authors pointed out in the discussion. What is novel in this study to disrupt EED? Perhaps a more detailed comparison of the two mouse models would be beneficial.

      We acknowledge and cite the study the reviewer has indicated (Schwarz et al. Development 2014) in our initial and revised manuscripts. This elegant investigation uses Wnt1-Cre to delete Ezh2 and reports a phenotype similar to the one we observed with Sox10-Cre deletion of Eed, but our study adds depth to the understanding of PRC2’s vital role in neural crest development by ablating Eed, which has a unique function in the PRC2 complex by binding to H3K27me3 and allosterically activating Ezh2. In this sense, our study sheds light on whether phenotypes arising from deletion of Eed, the PRC2 “reader”, differ from phenotypes arising from deletion of Ezh2, the PRC2 “writer”, in neural crest derived tissues. Moreover, we provide the first single-cell RNA sequencing and epigenomic investigations of craniofacial phenotypes arising from PRC2 activity in the developing neural crest. Due to limitations associated with the Wnt1-Cre transgene (Lewis et al. Developmental Biology 2013), which targets pre-migratory neural crest cells, our investigations used Sox10-Cre, which targets the migratory neural crest and is completely recombined by E10.5. We have included a detailed comparison of these mouse models in the Discussion section of our revised manuscript, and we thank the reviewer for this thoughtful suggestion.

      (3) The presentation of the single-cell RNA-seq data may need improvement. The complexity of the many cell types blurs the importance of which cell types are affected the most by EED disruption.

      We thank the reviewer for the opportunity to improve the presentation of our single-cell RNA sequencing data. In response, we have added Supplementary Fig. 8 to our revised manuscript, which shows the cell clusters most affected by EED disruption in UMAP space across genotypes. Because we wanted to capture the full diversity of cell types underlying the phenotypes we report, we did not sort Sox10+ cells (via FACS, for example) from craniofacial tissues before single-cell RNA sequencing. Our resulting single-cell RNA sequencing data are therefore inclusive of a diversity of cell types in UMAP space, and the prevalence of many of these cell types was unaffected by epigenetic disruption of neural crest derived tissues. The prevalence of the cell clusters that are most affected across genotypes and which are most relevant to our analyses of the developing neural crest are shown in Fig. 4c (and now also in Supplementary Fig. 8), including C0 (differentiating osteoblasts), C4 (mesenchymal stem cells), C5 (mesenchymal stem cells), and C7 (proliferating mesenchymal stem cells). Marker genes and pseudobulked differential expression analyses across these clusters are shown in Fig. 4d and Fig. 4e-h, respectively.

      (4) While it's easy to identify PRC2/EED target genes using published epigenomic data, it would be nice to tease out the direct versus indirect effects in the gene expression changes (e.g Figure 4e).

      We agree with the reviewer that the single-cell RNA sequencing data in our initial submission do not provide insight into direct versus indirect changes in gene expression downstream of PRC2. In contrast, the CUT&Tag chromatin profiling data that we have generated for this revision provides mechanistic insight into H3K27me3 occupancy and direct effects on gene expression resulting from PRC2 inactivation in our mouse models.

      REVIEWING EDITOR COMMENTS

      The following are recommended as essential revisions

      (1) The study is overall superficial and primarily descriptive, lacking in-depth mechanistic analysis and comprehensive phenotype evaluation.

      Please see responses to Reviewer #1 and Reviewer #2 (weaknesses 1 and 4) above. 

      (2) The authors did not investigate the temporal and spatial expression of Eed during cranial neural crest development, which is crucial for explaining the observed phenotypes.

      The temporal and spatial expression of Eed during embryogenesis is well studied. Eed is ubiquitously expressed starting at E5.5, peaks at E9.5, and is downregulated but maintained at a high basal expression level through E18.5 (Schumacher et al. Nature 1996). Although comprehensive analysis of Eed expression in neural crest tissues has not been reported (to our knowledge), Eed physically and functionally interacts with Ezh2 (Sewalt et al. Mol Cell Biol 1998), which is enriched at a diversity of timepoints throughout all developing craniofacial tissues (Schwarz et al. Development 2014). In our study, we confirmed enrichment of Eed expression in craniofacial tissues throughout development using QPCR, and have provided a more detailed description of these published and new findings in the Discussion section of our revised manuscript. 

      (3) There is no apoptosis analysis provided for any of the samples.

      We evaluated the presence of apoptotic cells in E12.5 craniofacial sections using immunofluorescence for Cleaved Caspase 3 in Supplementary Fig. 3d. Although we found a modest increase in the labeling index of apoptotic cells, there was insufficient evidence to conclude that apoptosis is a substantial factor in craniofacial hypoplasia resulting from Eed loss in post-migratory neural crest craniofacial tissues. We have clarified these findings in the Results and Discussion sections of our revised manuscript. 

      (4) As Eed is a core component of the PRC2 complex, were any other components altered in the Eed cKO mutant? How does Eed regulation influence osteogenic differentiation and proliferation through known pathways?

      We thank the editors for this thoughtful inquiry. Although we did not specifically investigate expression or stability of other PRC2 components in Eed conditional mutants, and little is known about how Eed regulates osteogenic differentiation or proliferation through any pathway, our single-cell RNA sequencing data presented in Fig. 4, Supplementary Fig. 5-7, and Supplementary Tables 1-6 provide a significant conceptual advance with mechanistic implications for understanding bone development downstream of Eed and do not reveal any alterations in the expression of other PRC2 components across genotypes. We have clarified these important details in the Discussion section of our revised manuscript. 

      (5) The authors may compare the Eed cKO phenotype with that of the previous EZH2 cKO mouse model since both Eed and EZH2 are essential subunits of PRC2.

      Please see responses to editorial comment 2 above and the last paragraph of the Discussion section of our revised manuscript for comparisons between Eed and Ezh2 knockout phenotypes.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors validate the contribution of RAP2A to GB progression. RAP2A participates in asymmetric cell division and in the localization of several cell polarity markers, including Cno and Numb.

      Strengths:

      The use of human data, Drosophila models, and cell culture or neurospheres is a good scenario to validate the hypothesis using complementary systems.

      Moreover, the mechanisms that determine GB progression, and in particular glioma stem cell biology, are relevant to our knowledge of glioblastoma and open new possibilities for future clinical strategies.

      Weaknesses:

      While the manuscript presents a well-supported investigation into RAP2A's role in GBM, several methodological aspects require further validation. The major concern is the reliance on a single GB cell line (GB5), which limits the generalizability of the findings. Including multiple GBM lines, particularly primary patient-derived 3D cultures with known stem-like properties, would significantly enhance the study's relevance.

      Additionally, key mechanistic aspects remain underexplored. Further investigation into the conservation of the Rap2l-Cno/aPKC pathway in human cells through rescue experiments or protein interaction assays would be beneficial. Similarly, live imaging or lineage tracing would provide more direct evidence of ACD frequency, complementing the current indirect metrics (odd/even cell clusters, Numb asymmetry).

      Several specific points require attention:

      (1) The specificity of Rap2l RNAi needs further confirmation. Is Rap2l expressed in neuroblasts or intermediate neural progenitors? Can alternative validation methods be employed?

      There are no available antibodies or tools to determine whether Rap2l is expressed in NB lineages, and we have not been able to develop any either. However, to further prove the specificity of the Rap2l phenotype, we have now analyzed two additional and independent Rap2l RNAi lines along with the original RNAi line. We have validated the results observed with the original line and found a similar phenotype in the two additional RNAi lines. These results have been added to the text ("Results section", page 6, lines 142-148) and are shown in Supplementary Figure 3.

      (2) Quantification of phenotypic penetrance and survival rates in Rap2l mutants would help determine the consistency of ACD defects.

      In the experiment previously mentioned (repetition of the original Rap2l RNAi line analysis along with two additional Rap2l RNAi lines) we have substantially increased the number of samples analyzed (both the number of NB lineages and the number of different brains analyzed). With that, we have been able to determine that the penetrance of the phenotype was 100% or almost 100% in the 3 different RNAi lines analyzed (n>14 different brains/larvae analyzed in all cases). Details are shown in the text (page 6, lines 142-148), in Supplementary Figure 3 and in the corresponding figure legend.

      (3) The observations on neurosphere size and Ki-67 expression require normalization (e.g., Ki-67+ cells per total cell number or per neurosphere size). Additionally, apoptosis should be assessed using Annexin V or TUNEL assays.

      The Ki-67 experiment was quantified as the % of Ki-67+ cells relative to the total cell number in each neurosphere. This is indicated in the "Materials and methods" section: "The number of Ki67+ cells with respect to the total number of nuclei labelled with DAPI within a given neurosphere were counted to calculate the Proliferative Index (PI), which was expressed as the % of Ki67+ cells over total DAPI+ cells"

      Perhaps this was not clearly shown in the graph of Figure 5A. We have now changed the Y-axis label to "% of Ki67+ cells/ neurosphere".
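
      For readers who want the normalization spelled out, the Proliferative Index quoted above is simply the Ki67+ count divided by the DAPI+ count, times 100. The short Python sketch below is illustrative only: the function name `proliferative_index` and the example counts are hypothetical and are not taken from the authors' data or analysis code.

```python
def proliferative_index(ki67_positive: int, dapi_total: int) -> float:
    """Percentage of Ki67+ cells over total DAPI+ nuclei in one neurosphere."""
    if dapi_total <= 0:
        raise ValueError("a neurosphere must contain at least one DAPI+ nucleus")
    if not 0 <= ki67_positive <= dapi_total:
        raise ValueError("Ki67+ count must lie between 0 and the DAPI+ total")
    return 100.0 * ki67_positive / dapi_total


# Hypothetical counts for a single neurosphere: 12 Ki67+ nuclei out of 48.
pi = proliferative_index(12, 48)
```

      Because the index is a ratio per neurosphere, it does not grow merely because a neurosphere contains more cells.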

      Unfortunately, we currently cannot carry out neurosphere cultures to address the apoptosis experiments. 

      (4) The discrepancy in Figures 6A and 6B requires further discussion.

      We agree that those pictures can lead to confusion. In the analysis of the "% of neurospheres with even or odd number of cells", we included the neurospheres with 2 cells in both the control and the experimental condition (RAP2A). The proportion of these 2-cell neurospheres was very similar in both conditions (27.7% and 27% of the total neurospheres analyzed in each condition), and they can result from a previous symmetric or asymmetric division; we cannot distinguish between the two unless they are stained with Numb, for example, as shown in Figure 6B. As a consequence, in both the control and the experimental condition, the 2-cell neurospheres included in the "even" group (Figure 6A) can represent symmetric or asymmetric divisions. However, the experiment in Figure 6B shows that, among these 2-cell neurospheres, there are more cases of asymmetric division in the experimental condition (RAP2A) than in the control.

      Nevertheless, to make the conclusions clearer and more accurate, we have reanalyzed the data taking into account only the neurospheres with 3, 5, or 7 cells (odd) or 4, 6, or 8 cells (even). Likewise, we have now added further clarification in the Methods regarding how the experiment was analyzed.
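
      The reanalysis described here, which keeps only neurospheres with 3 to 8 cells and then splits them into odd and even groups, can be expressed as a small Python sketch. It is a hypothetical illustration: the function `parity_percentages`, its parameters, and the example counts are assumptions and do not reproduce the authors' actual pipeline or data.

```python
from collections import Counter
from typing import Dict, Iterable


def parity_percentages(
    cell_counts: Iterable[int], min_cells: int = 3, max_cells: int = 8
) -> Dict[str, float]:
    """Percentage of odd and even neurospheres among those within the size window."""
    # Drop neurospheres outside the window, e.g. ambiguous 2-cell neurospheres.
    kept = [n for n in cell_counts if min_cells <= n <= max_cells]
    if not kept:
        raise ValueError("no neurospheres fall within the requested size window")
    tally = Counter("odd" if n % 2 else "even" for n in kept)
    return {key: 100.0 * tally[key] / len(kept) for key in ("odd", "even")}


# Hypothetical cell counts; the 2-cell and 9-cell neurospheres are excluded.
result = parity_percentages([2, 3, 4, 5, 6, 9])
```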

      (5) Live imaging of ACD events would provide more direct evidence.

      We agree that live imaging would provide further evidence. Unfortunately, we currently cannot carry out neurosphere cultures to approach those experiments.

      (6) Clarification of terminology and statistical markers (e.g., p-values) in Figure 1A would improve clarity.

      We thank the reviewer for pointing out this issue. To improve clarity, we have now included a Supplementary Figure (Fig. S1) with the statistical parameters used. Additionally, we have performed a hierarchical clustering of genes showing significant or not-significant changes in their expression levels.

      (7) Given the group's expertise, an alternative to mouse xenografts could be a Drosophila genetic model of glioblastoma, which would provide an in vivo validation system aligned with their research approach.

      The established Drosophila genetic model of glioblastoma is an excellent system for gaining deep insight into different aspects of human GBM. However, the main aim of our study was to determine whether an imbalance in the mode of stem cell division, favoring symmetric divisions, could contribute to the expansion of the tumor. We chose neurospheres derived from human GBM cell lines because the existence of cancer stem cells (glioblastoma or glioma stem cells, GSCs) has been demonstrated in human GBM, and these GSCs, like all stem cells, can divide symmetrically or asymmetrically. In the case of the Drosophila model of GBM, the neoplastic transformation observed after overexpressing the EGF receptor and PI3K signaling is due to the activation of downstream genes that promote cell cycle progression and inhibit cell cycle exit. It has also been suggested that the neoplastic cells in this model come from committed glial progenitors, not from stem-like cells.

      Taken together, this would make it difficult to determine the causes of any effects of manipulating Rap2l levels in this Drosophila GBM system. We do not rule out this analysis in the future (we have the full setup in the lab). However, it would probably require a new project to comprehensively analyze and understand the mechanism by which Rap2l (and other ACD regulators) might act in this context, if it has any effect.

      However, as we mentioned in the Discussion, we agree that the results we have obtained in this study must be definitely validated in vivo in the future using xenografts with 3D-primary patient-derived cell lines.

      Reviewer #2 (Public review):

      This study investigates the role of RAP2A in regulating asymmetric cell division (ACD) in glioblastoma stem cells (GSCs), bridging insights from Drosophila ACD mechanisms to human tumor biology. The focus on RAP2A, a human homolog of Drosophila Rap2l, as a novel ACD regulator in GBM is innovative, given its underexplored role in cancer stem cells (CSCs). The hypothesis that ACD imbalance (favoring symmetric divisions) drives GSC expansion and tumor progression introduces a fresh perspective on differentiation therapy. However, the dual role of ACD in tumor heterogeneity (potentially aiding therapy resistance) requires deeper discussion to clarify the study's unique contributions against existing controversies. Some limitations and questions need to be addressed.

      (1) Validation of RAP2A's prognostic relevance using TCGA and Gravendeel cohorts strengthens clinical relevance. However, differential expression analysis across GBM subtypes (e.g., MES, DNA-methylation subtypes ) should be included to confirm specificity.

      We have now included a Supplementary figure (Supplementary Figure 2), in which we show the analysis of RAP2A levels in the different GBM subtypes (proneural, mesenchymal and classical) and their prognostic relevance (i.e., the proneural subtype, which presents significantly higher RAP2A levels than the others, is also the subtype with the best prognosis).

      (2) Rap2l knockdown-induced ACD defects (e.g., mislocalization of Cno/Numb) are well-designed. However, phenotypic penetrance and survival rates of Rap2l mutants should be quantified to confirm consistency.

      We have now analyzed two additional and independent RNAi lines of Rap2l along with the original RNAi line. We have validated the results observed with this line and found a similar phenotype in the two additional RNAi lines now analyzed. To determine the phenotypic penetrance, we have substantially increased the number of samples analyzed (both the number of NB lineages and the number of different brains analyzed). With that, we have been able to determine that the penetrance of the phenotype was 100% or almost 100% in the 3 different Rap2l RNAi lines analyzed (n>14 different brains/larvae analyzed in all cases). These results have been added to the text ("Results section", page 6, lines 142-148) and are shown in Supplementary Figure 3 and in the corresponding figure legend. 

      (3) While GB5 cells were effectively used, justification for selecting this line (e.g., representativeness of GBM heterogeneity) is needed. Experiments in additional GBM lines (especially the addition of 3D primary patient-derived cell lines with known stem cell phenotype) would enhance generalizability.

      We tried to explain this point in the paper (Results). As we mentioned, we tested six different GBM cell lines and found similar RAP2A mRNA levels in all of them, significantly lower than in control Astros (Fig. 3A). We decided to focus further analyses on the GBM cell line called GB5 because it grew well (better than the others) in neurosphere cell culture conditions. We agree that repeating at least some of the analyses performed with the GB5 line in other lines (ideally in primary patient-derived cell lines, as the reviewer mentions) would reinforce the results. Unfortunately, we cannot currently perform cell line experiments in the lab. We will consider all of this for future experiments.

      (4) Indirect metrics (odd/even cell clusters, NUMB asymmetry) are suggestive but insufficient. Live imaging or lineage tracing would directly validate ACD frequency.

      We agree that live imaging would provide further evidence. Unfortunately, we cannot approach those experiments in the lab currently.

      (5) The initial microarray (n=7 GBM patients) is underpowered. While TCGA data mitigate this, the limitations of small cohorts should be explicitly addressed and need to be discussed.

      We completely agree with this comment. The microarray was already available to us, so we used it as a first approach, simply to find out whether (and how) the expression levels of the human homologs of Drosophila ACD regulators were affected in this small sample, as a starting point for the study. We were aware of the limitations of this analysis, which is why we followed it up on a larger scale in the datasets. We already mentioned the limitations of the array in the Discussion:

      "The microarray we interrogated with GBM patient samples had some limitations. For example, not all the human genes homologs of the Drosophila ACD regulators were present (i.e. the human homologs of the determinant Numb). Likewise, we only tested seven different GBM patient samples. Nevertheless, the output from this analysis was enough to determine that most of the human genes tested in the array presented altered levels of expression [...] In silico analyses, taking advantage of the existence of established datasets, such as the TCGA, can help to more robustly assess, in a bigger sample size, the relevance of those human genes expression levels in GBM progression, as we observed for the gene RAP2A."

      (6) Conclusions rely heavily on neurosphere models. Xenograft experiments or patient-derived orthotopic models are critical to support translational relevance, and such basic research work needs to be included in journals.

      We completely agree. As we already mentioned in the Discussion, the results obtained in this study must definitely be validated in vivo in the future using xenografts with 3D primary patient-derived cell lines.

      (7) How does RAP2A regulate NUMB asymmetry? Is the Drosophila Rap2l-Cno/aPKC pathway conserved? Rescue experiments (e.g., Cno/aPKC knockdown with RAP2A overexpression) or interaction assays (e.g., Co-IP) are needed to establish molecular mechanisms.

      The mechanism by which RAP2A regulates ACD is beyond the scope of this paper. We do not even know how Rap2l acts in Drosophila to regulate ACD. In past years, we analyzed the function of another Drosophila small GTPase, Rap1 (homolog of human RAP1A), in ACD, and we determined the mechanism by which Rap1 regulates ACD (including the localization of Numb): it interacts physically with Cno and with other small GTPases, such as Ral proteins, and forms a complex with additional ACD regulators of the "apical complex" (aPKC and Par-6). Rap2l could also be interacting physically with the "Ras-association" domain of Cno (a domain that binds small GTPases, such as Ras and Rap1). We have added some speculations regarding this subject in the Discussion:

      "It would be of great interest in the future to determine the specific mechanism by which Rap2l/RAP2A is regulating this process. One possibility is that, as it occurs in the case of the Drosophila ACD regulator Rap1, Rap2l/RAP2A is physically interacting or in a complex with other relevant ACD modulators."

      (8) Reduced stemness markers (CD133/SOX2/NESTIN) and proliferation (Ki-67) align with increased ACD. However, alternative explanations (e.g., differentiation or apoptosis) must be ruled out via GFAP/Tuj1 staining or Annexin V assays.

      We agree with these possibilities. Regarding differentiation, an increase in differentiation markers would in fact be a logical consequence of an increase in ACD divisions and reduced stemness markers. Unfortunately, we cannot approach those experiments in the lab currently.

      (9) The link between low RAP2A and poor prognosis should be validated in multivariate analyses to exclude confounding factors (e.g., age, treatment history).

      We have now added this information in the "Results section" (page 5, lines 114-123).

      (10) The broader ACD regulatory network in GBM (e.g., roles of other homologs like NUMB) and potential synergies/independence from known suppressors (e.g., TRIM3) warrant exploration.

      The present study was designed as a "proof-of-concept" study to start analyzing the hypothesis that the expression levels of human homologs of known Drosophila ACD regulators might be relevant in human cancers that contain cancer stem cells, if those human homologs were also involved in modulating the mode of (cancer) stem cell division. 

      Extending the findings of this work to the whole ACD regulatory network would be the logical and ideal path to follow in the future.

      We already mentioned this point in the Discussion:

      "....it would be interesting to analyze in the future the potential consequences that altered levels of expression of the other human homologs in the array can have in the behavior of the GSCs. In silico analyses, taking advantage of the existence of established datasets, such as the TCGA, can help to more robustly assess, in a bigger sample size, the relevance of those human genes expression levels in GBM progression, as we observed for the gene RAP2A."

      (11) The figures should be improved. Statistical significance markers (e.g., p-values) should be added to Figure 1A; timepoints/culture conditions should be clarified for Figure 6A.

      Regarding the statistical significance markers, we have now included a Supplementary Figure (Fig. S1) with the statistical parameters used. Additionally, we have performed a hierarchical clustering of genes showing significant or non-significant changes in their expression levels.

      Regarding the experimental conditions corresponding to Figure 6A, those have now been added in more detail in "Materials and Methods" ("Pair assay and Numb segregation analysis" paragraph).

      (12) Redundant Drosophila background in the Discussion should be condensed; terminology should be unified (e.g., "neurosphere" vs. "cell cluster").

      As we did not mention much about Drosophila ACD and NBs in the "Introduction", we needed to explain in the "Discussion" at least some very basic concepts and information about this, especially for "non-drosophilists". We have reviewed the Discussion to keep this information to the minimum necessary.

      We have also reviewed the terminology that the Reviewer mentions and have unified it.

      Reviewer #1 (Recommendations for the authors):

      To improve the manuscript's impact and quality, I would recommend:

      (1) Expand Cell Line Validation: Include additional GBM cell lines, particularly primary patient-derived 3D cultures, to increase the robustness of the findings.

      (2) Mechanistic Exploration: Further examine the conservation of the Rap2l-Cno/aPKC pathway in human cells using rescue experiments or protein interaction assays.

      (3) Direct Evidence of ACD: Implement live imaging or lineage tracing approaches to strengthen conclusions on ACD frequency.

      (4) RNAi Specificity Validation: Clarify Rap2l RNAi specificity and its expression in neuroblasts or intermediate neural progenitors.

      (5) Quantitative Analysis: Improve quantification of neurosphere size, Ki-67 expression, and apoptosis to normalize findings.

      (6) Figure Clarifications: Address inconsistencies in Figures 6A and 6B and refine statistical markers in Figure 1A.

      (7) Alternative In Vivo Model: Consider leveraging a Drosophila glioblastoma model as a complementary in vivo validation approach.

      Addressing these points will significantly enhance the manuscript's translational relevance and overall contribution to the field.

      We have been able to address points 4, 5 and 6. Others are either out of the scope of this work (2) or we do not have the possibility to carry them out at this moment in the lab (1, 3 and 7). However, we will complete these requests/recommendations in other future investigations.

      Reviewer #2 (Recommendations for the authors):

      Major Revision /insufficient required to address methodological and mechanistic gaps.

      (1) Enhance Clinical Relevance

      Validate RAP2A's prognostic significance across multiple GBM subtypes (e.g., MES, DNA-methylation subtypes) using datasets like TCGA and Gravendeel to confirm specificity.

      Perform multivariate survival analyses to rule out confounding factors (e.g., patient age, treatment history).

      (2) Strengthen Mechanistic Insights

      Investigate whether the Rap2l-Cno/aPKC pathway is conserved in human GBM through rescue experiments (e.g., RAP2A overexpression with Cno/aPKC knockdown) or interaction assays (e.g., Co-IP).

      Use live-cell imaging or lineage tracing to directly validate ACD frequency instead of relying on indirect metrics (odd/even cell clusters, NUMB asymmetry).

      (3) Improve Model Systems & Experimental Design

      Justify the selection of GB5 cells and include additional GBM cell lines, particularly 3D primary patient-derived cell models, to enhance generalizability.

      It is essential to perform xenograft or orthotopic patient-derived models to support translational relevance.

      (5) Address Alternative Interpretations

      Rule out other potential effects of RAP2A knockdown (e.g., differentiation or apoptosis) using GFAP/Tuj1 staining or Annexin V assays.

      Explore the broader ACD regulatory network in GBM, including interactions with NUMB and TRIM3, to contextualize findings within known tumor-suppressive pathways.

      (6) Improve Figures & Clarity

      Add statistical significance markers (e.g., p-values) in Figure 1A and clarify timepoints/culture conditions for Figure 6A.

      Condense redundant Drosophila background in the discussion and ensure consistent terminology (e.g., "neurosphere" vs. "cell cluster").

      We have been able to address points 1, 3 (partially) and 6. The others are either out of the scope of this work or cannot be carried out in the lab at this moment. However, we are very interested in completing these requests/recommendations and will pursue these types of experiments in future investigations.

    1. Product Description
.ql-bubble .ql-picker.ql-size .ql-picker-item[data-value=huge]::before { content: 'Huge'; } .ql-bubble .ql-picker.ql-size .ql-picker-item[data-value=small]::before { font-size: 10px; } .ql-bubble .ql-picker.ql-size .ql-picker-item[data-value=large]::before { font-size: 18px; } .ql-bubble .ql-picker.ql-size .ql-picker-item[data-value=huge]::before { font-size: 32px; } .ql-bubble .ql-color-picker.ql-background .ql-picker-item { background-color: #fff; } .ql-bubble .ql-color-picker.ql-color .ql-picker-item { background-color: #000; } .ql-bubble .ql-color-picker svg { margin: 1px; } .ql-bubble .ql-color-picker .ql-picker-item.ql-selected, .ql-bubble .ql-color-picker .ql-picker-item:hover { border-color: #fff; } .ql-bubble .ql-tooltip { background-color: #444; border-radius: 25px; color: #fff; } .ql-bubble .ql-tooltip-arrow { border-left: 6px solid transparent; border-right: 6px solid transparent; content: " "; display: block; left: 50%; margin-left: -6px; position: absolute; } .ql-bubble .ql-tooltip:not(.ql-flip) .ql-tooltip-arrow { border-bottom: 6px solid #444; top: -6px; } .ql-bubble .ql-tooltip.ql-flip .ql-tooltip-arrow { border-top: 6px solid #444; bottom: -6px; } .ql-bubble .ql-tooltip.ql-editing .ql-tooltip-editor { display: block; } .ql-bubble .ql-tooltip.ql-editing .ql-formats { visibility: hidden; } .ql-bubble .ql-tooltip-editor { display: none; } .ql-bubble .ql-tooltip-editor input[type=text] { background: transparent; border: none; color: #fff; font-size: 13px; height: 100%; outline: none; padding: 10px 20px; position: absolute; width: 100%; } .ql-bubble .ql-tooltip-editor a { top: 10px; position: absolute; right: 20px; } .ql-bubble .ql-tooltip-editor a:before { color: #ccc; content: "D7"; font-size: 16px; font-weight: bold; } .ql-container.ql-bubble:not(.ql-disabled) a { position: relative; white-space: nowrap; } .ql-container.ql-bubble:not(.ql-disabled) a::before { background-color: #444; border-radius: 15px; top: -5px; font-size: 12px; color: #fff; 
content: attr(href); font-weight: normal; overflow: hidden; padding: 5px 15px; text-decoration: none; z-index: 1; } .ql-container.ql-bubble:not(.ql-disabled) a::after { border-top: 6px solid #444; border-left: 6px solid transparent; border-right: 6px solid transparent; top: 0; content: " "; height: 0; width: 0; } .ql-container.ql-bubble:not(.ql-disabled) a::before, .ql-container.ql-bubble:not(.ql-disabled) a::after { left: 0; margin-left: 50%; position: absolute; transform: translate(-50%, -100%); transition: visibility 0s ease 200ms; visibility: hidden; } .ql-container.ql-bubble:not(.ql-disabled) a:hover::before, .ql-container.ql-bubble:not(.ql-disabled) a:hover::after { visibility: visible; } "A Mother's Healing Touch" is a heartfelt exploration of the profound bond between a mother and her child, offering insights and guidance for nurturing emotional well-being and resilience. Drawing on the wisdom of ancient traditions and modern psychology, this book celebrates the transformative power of a mother's love and compassion in healing wounds, soothing fears, and fostering growth.Through personal anecdotes, practical tips, and mindfulness exercises, "A Mother's Healing Touch" offers support to mothers navigating the challenges of raising children in today's world. From soothing a crying infant to supporting a teenager through turbulent times, discover how to cultivate presence, empathy, and connection to strengthen your relationship with your child and promote their emotional resilience.Explore the healing potential of nurturing touch, empathetic listening, and unconditional acceptance as you embark on a journey of self-discovery and growth alongside your child. Whether you're facing moments of joy or adversity, this book serves as a guiding light, reminding mothers of the transformative power they hold to nurture, heal, and inspire their children through the gentle touch of love."

      A mother's healing touch

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      This study builds on previous work demonstrating that several beta connexins (Cx26, Cx30, and Cx32) have a carbamylation motif which renders them sensitive to CO<sub>2</sub>. In response to CO<sub>2</sub>, hemichannels composed of these connexins open, enabling diffusion of small molecules (such as ATP) between the cytosol and extracellular environment. Here, the authors have identified that an alpha connexin, Cx43, also contains a carbamylation motif, and they demonstrate that CO<sub>2</sub> opens Cx43 hemichannels. Most of the study involves using transfected cells expressing wildtype and mutant Cx43 to define amino acids required for CO<sub>2</sub> sensitivity. Hippocampal tissue slices in culture were used to show that CO<sub>2</sub>-induced synaptic transmission was affected by Cx43 hemichannels, providing a physiological context. The authors point out that the Cx43 gene significantly diverges from the beta connexins that are CO<sub>2</sub> sensitive, suggesting that the conserved carbamylation motif was present before the alpha and beta connexin genes diverged. 

      Strengths: 

      (1) The molecular analysis defining the amino acids that contribute to the CO<sub>2</sub> sensitivity of Cx43 is a major strength of the study. The rigor of analysis was strengthened by using three independent assays for hemichannel opening: dye uptake, patch clamp channel measurements, and ATP secretion. The resulting analysis identified key lysines in Cx43 that were required for CO<sub>2</sub>-mediated hemichannel opening. A double K to E Cx43 mutant produced a construct that produced hemichannels that were constitutively open, which further strengthened the analysis. 

      (2) Using hippocampal tissue sections to demonstrate that CO<sub>2</sub> can influence field excitatory postsynaptic potentials (fEPSPs) provides a native context for CO<sub>2</sub> regulation of Cx43 hemichannels. Cx43 mutations associated with Oculodentodigital Dysplasia (ODDD) inhibited CO<sub>2</sub>-induced hemichannel opening, although the mechanism by which this occurs was not elucidated. 

      Weaknesses: 

      (1) Cx43 channels are sensitive to cytosolic pH, which will be affected by CO<sub>2</sub>. Cytosolic pH was not measured, and how this affects CO<sub>2</sub>-induced Cx43 hemichannel activity was not addressed. 

      We have now addressed this with intracellular pH measurements and by removing the C-terminal pH sensor from Cx43; the hemichannel remains CO<sub>2</sub> sensitive.

      (2) Cultured cells are typically grown in incubators containing 5% CO<sub>2</sub>, which is ~40 mmHg. It is unclear how cells would be viable if Cx43 hemichannels are open at this PCO2. 

      The cells look completely healthy with normal morphology and no sign of excessive cell death in the cultures. Presumably they have ways of compensating for the effects of partially open Cx43 hemichannels.
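
      As a rough check of the figure quoted in the reviewer's comment, the partial pressure of CO<sub>2</sub> in a 5% CO<sub>2</sub> incubator can be estimated from Dalton's law. The sketch below is illustrative only and is not part of the authors' analysis; it assumes sea-level barometric pressure (760 mmHg) and a saturated water vapour pressure of 47 mmHg at 37 °C, and the function name is hypothetical. Under these assumptions the estimate falls at roughly 36 to 38 mmHg, consistent with the reviewer's "~40 mmHg".

```python
def co2_partial_pressure_mmhg(co2_fraction, barometric_mmhg=760.0, water_vapour_mmhg=47.0):
    """Estimate PCO2 (mmHg) of a gas mixture using Dalton's law.

    Assumed defaults: sea-level barometric pressure and saturated
    water vapour pressure at 37 degrees C (humidified incubator).
    """
    return co2_fraction * (barometric_mmhg - water_vapour_mmhg)


# 5% CO2 in a humidified incubator versus the same fraction in dry gas.
humidified = co2_partial_pressure_mmhg(0.05)
dry = co2_partial_pressure_mmhg(0.05, water_vapour_mmhg=0.0)
print(f"humidified: {humidified:.2f} mmHg, dry: {dry:.2f} mmHg")
```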

      (3) Experiments using Gap26 to inhibit Cx43 hemichannels in fEPSP measurements used a scrambled peptide as a control. Analysis should also include Gap peptides specifically targeting Cx26, Cx30, and Cx32 as additional controls. 

      We don’t feel this is necessary given the extensive prior literature in hippocampus showing the effect of ATP release via open Cx43 hemichannels on fEPSP amplitude that used astrocytic specific knockout of Cx43 and Gap26 (doi: 10.1523/jneurosci.0015-14.2014).

      (4) The mechanism by which ODDD mutations impair CO2-mediated hemichannel opening was not addressed. Also, the potential roles for inhibiting Cx43 hemichannels in the pathology of ODDD are unclear. 

      These pathological mutations that alter CO<sub>2</sub> sensitivity are similar to pathological mutations in Cx26 and Cx32, which also remove CO<sub>2</sub> sensitivity. Our cryo-EM studies on Cx26 give clues as to why these mutations have this effect: they alter the conformational mobility of the channel (Brotherton et al 2022, doi: 10.1016/j.str.2022.02.010; Brotherton et al 2024, doi: 10.7554/eLife.93686). We assume that similar considerations apply to Cx43, but confirming this requires improved cryo-EM structures of Cx43 hemichannels at differing levels of PCO<sub>2</sub>.

      We agree that the link between loss of CO<SUB>2</SUB> sensitivity of Cx43 and ODDD is not established and have revised the text to make this clear.

      (5) CO2 has no effect on Cx43-mediated gap junctional communication as opposed to Cx26 gap junctions, which are inhibited by CO2. The molecular basis for this difference was not determined. 

      Cx26 gap junction channels are so far unique amongst CO<sub>2</sub>-sensitive connexins in being closed by CO<sub>2</sub>. We have addressed the mechanism by which this occurs in Nijjar et al 2025 (doi: 10.1113/JP285885): closure of the gap junction channel (GJC) requires carbamylation of K108 in Cx26, in addition to K125.

      (6) Whether there are other non-beta connexins that have a putative carbamylation motif was not addressed. Additional discussion/analysis of how the evolutionary trajectory for Cx43 maintaining a carbamylation motif is unique for non-beta connexins would strengthen the study. 

      We have performed a molecular phylogenetic survey to show that the carbamylation motif occurs across the alpha connexin clade and have shown that Cx50 is indeed CO<SUB>2</SUB> sensitive (doi: 10.1101/2025.01.23.634273). This is now in Fig 12.

      Reviewer #2 (Public review): 

      Summary: 

      This paper examines the CO<sub>2</sub> sensitivity of Cx43 hemichannels and gap junctional channels in transiently transfected HeLa cells using several different assays, including ethidium dye uptake, ATP release, whole cell patch clamp recordings, and an imaging assay of gap junctional dye transfer. The results show that raising pCO<sub>2</sub> from 20 to 70 mmHg (at a constant pH of 7.3) causes an increase in opening of Cx43 hemichannels but does not block Cx43 gap junctions. This study also showed that raising pCO<sub>2</sub> from 20 to 35 mmHg resulted in an increase in synaptic strength in hippocampal rat brain slices, presumably due to downstream ATP release, suggesting that the CO<sub>2</sub> sensitivity of Cx43 may be physiologically relevant. As a further test of the physiological relevance of the CO<sub>2</sub> sensitivity of Cx43, it was shown that two pathological mutations of Cx43 that are associated with ODDD caused loss of Cx43 CO<sub>2</sub> sensitivity. Cx43 has a potential carbamylation motif that is homologous to the motif in Cx26. To understand the structural changes involved in CO<sub>2</sub> sensitivity, a number of mutations were made in Cx43 sites thought to be the equivalent of those known to be involved in the CO<sub>2</sub> sensitivity of Cx26, and the CO<sub>2</sub> sensitivity of these mutants was investigated. 

      Strengths: 

      This study shows that the apparent lack of functional Cx43 hemichannels observed in a number of previous in vitro function studies may be due to the use of HEPES to buffer the external pH. When Cx43 hemichannels were studied in external solutions in which CO<SUB>2</SUB>/bicarbonate was used to buffer pH instead of HEPES, Cx43 hemichannels showed significantly higher levels of dye uptake, ATP release, and ionic conductance. These findings may have major physiological implications since Cx43 hemichannels are found in many organs throughout the body, including the brain, heart, and immune system. 

      Weaknesses: 

      (1) Interpretation of the site-directed mutation studies is complicated. Although Cx43 has a potential carbamylation motif that is homologous to the motif in Cx26, the results of site-directed mutation studies were inconsistent with a simple model in which K144 and K105 interact following carbamylation to cause the opening of Cx43 hemichannels. 

      The mechanism of opening of Cx43 is more complex than that of Cx26, Cx32 and Cx50 and involves more Lys residues. The 4 Lys residues in Cx43 that are involved in opening the hemichannel have their equivalents in Cx26, but in Cx26 these additional residues seem to be involved in the closing of the GJC rather than opening of the hemichannel (see above). Cx50 is simpler and involves only two Lys residues (doi: 10.1101/2025.01.23.634273), which are equivalent to those in Cx26.

      (2) Secondly, although it is shown that two Cx43 ODDD-associated mutations show a loss of CO<sub>2</sub> sensitivity, there is no evidence that the absence of CO<sub>2</sub> sensitivity is involved in the pathology of ODDD.

      We agree, but this is probably because the link has not been directly tested by experiment, as the CO<sub>2</sub> sensitivity of Cx43 was not previously known. As mentioned above, we have revised the text to ensure that this is clear.

      Reviewer #3 (Public review): 

      In this paper, the authors aimed to investigate carbamylation effects on the function of Cx43-based hemichannels. Such effects have previously been characterized for other connexins, e.g., for Cx26, which display increased hemichannel (HC) opening and closure of gap junction channels upon exposure to increased CO<sub>2</sub> partial pressure (accompanied by increased bicarbonate to keep pH constant). 

      The authors used HeLa cells transiently transfected with Cx43 to investigate CO<sub>2</sub> dependent carbamylation effects on Cx43 HC function. In contrast to Cx43-based gap junction channels that are reported here to be insensitive to PCO<sub>2</sub> alterations, they provide evidence that Cx43 HC opening is highly dependent on the PCO2 pressure in the bath solution, over a range of 20 up to 70 mmHg encompassing the physiologically normal resting level of around 40 mmHg. They furthermore identified several Cx43 residues involved in Cx43 HC sensitivity to PCO2: K105, K109, K144 & K234; mutation of 2 or more of these AAs is necessary to abolish CO<sub>2</sub> sensitivity. The subject is interesting and the results indicate that a fraction of HCs is open at a physiological 40 mmHg PCO<sub>2</sub>, which differs from the situation under HEPES buffered solutions where HCs are mostly closed under resting conditions. The mechanism of HC opening with CO<sub>2</sub> gassing is linked to carbamylation, and the authors pinpointed several Lys residues involved in this process. 

      Overall, the work is interesting as it shows that Cx43 HCs have a significant open probability under resting conditions of physiological levels of CO<sub>2</sub> gassing, probably applicable to the brain, heart, and other Cx43 expressing organs. The paper gives a detailed account of various experiments performed (dye uptake, electrophysiology, ATP release to assess HC function) and results concluded from those. They further consider many candidate carbamylation sites by mutating them to negatively charged Glu residues. The paper ends with hippocampal slice work showing evidence for connexin-dependent increases of the EPSP amplitude that could be inhibited by HC inhibition with Gap26 (Figure 10). Another line of evidence comes from the Cx43-linked ODDD genetic disease, whereby L90V as well as the A44V mutations of Cx43 prevented the CO<sub>2</sub>-induced hemichannel opening response (Figure 11). Although the paper is interesting, in its present state, it suffers from (i) a problematic Figure 3, precluding interpretation of the data shown, and (ii) the poor use of hemichannel inhibitors that are necessary to strengthen the evidence in the crucial experiment of Figure 2 and others. 

      The panels in Figure 3 were mislabelled in the accompanying legend possibly leading to some confusion. This has now been corrected.

      We disagree that hemichannel blockers are needed to strengthen the evidence in Figure 2 and other figures. Our controls show that the CO<sub>2</sub>-sensitive responses absolutely require expression of Cx43 and were modified by mutations of Cx43. It is hard to see how this evidence would be strengthened by the use of peptide inhibitors or other blockers of hemichannels that may not be completely selective.

      Reviewing Editor Comments:

      (1) Improve electrophysiological evidence, addressing concerns about the initial experiment and including peptide inhibitor data where applicable. 

      We think the concerns about the electrophysiological evidence arise from a misunderstanding because we gave insufficient information about how we conducted the experiments. We have now provided a much more complete legend, added explanations in the text and given more detail in the Methods. We further respond to the reviewer below.

      We do not agree on the necessity of the peptide inhibitor to demonstrate dependence on Cx43. We have shown that parental HeLa cells do not release ATP in response to changes in PCO<sub>2</sub> or voltage (Fig 2D; Butler & Dale 2023, 10.3389/fncel.2023.1330983; Lovatt et al 2025, 10.1101/2025.03.12.642803, 10.1101/2025.01.23.634273). Our previous papers have shown many times that parental HeLa cells do not load with dye in response to CO<sub>2</sub> or zero Ca<sup>2+</sup> (e.g. Huckstepp et al 2010, 10.1113/jphysiol.2010.192096; Meigh et al 2013, 10.7554/eLife.01213; Meigh et al 2014, 10.7554/eLife.04249), and we have shown that parental HeLa cells do not exhibit the same CO<sub>2</sub>-dependent change in whole cell conductance that the Cx43-expressing cells do (Fig 2B). In addition, we have shown that mutating key residues in Cx43 alters both the CO<sub>2</sub>-sensitive release of ATP and the CO<sub>2</sub>-dependent dye loading without affecting the respective positive control. To bolster this, we have included data for the K144R mutation as a supplement to Fig 3. Given the expense of Gap26, it is impractical to include this as a standard control, and it is unnecessary given the comprehensive controls outlined.

      Collectively, these data show that the responses to CO<sub>2</sub> require expression of Cx43 and can be modified by mutation of Cx43.

      (2) Strengthen the manuscript by measuring the effects of CO<sub>2</sub> on cytosolic pH and Cx43 hemichannel opening. Consider using tail truncation mutants to assess the role of the C-terminal pH sensor in CO<sub>2</sub>-mediated channel opening.

      We agree and have performed the suggested experiments to address this issue.

      (3) Investigate the effect of expressing the K105E/K109E Cx43 double mutant on cell viability.

      In our experiments the cells look completely healthy based on their morphology in brightfield microscopy and growth rates. 

      (4) Discuss and analyze the uniqueness of Cx43 among alpha connexins in maintaining the carbamylation motif.

      We now discuss this: Cx43 is not unique. We have added a molecular phylogenetic survey of the alpha connexin clade in Fig 12. Apart from Cx37, the carbamylation motif appears in all the other members of the clade (but not necessarily in the human orthologue). In a different manuscript, currently posted on bioRxiv, we have documented the CO<sub>2</sub> sensitivity of Cx50 and its dependence on the motif.

      (5) Consider omitting data on ODDD-associated mutations unless there is evidence linking CO<sub>2</sub> sensitivity to disease pathology.

      This experiment is observational, and we are not making claims that there is a direct causal link. Removing the ODDD mutant findings would lose potentially useful information for anyone studying how these mutations alter channel function. We have reworded the text to ensure that we say that the link between loss of CO<sub>2</sub> sensitivity and ODDD remains unproven.

      (6) Justify the choice of high K<sup>+</sup> and low external calcium as a positive control in ATP release experiments.

      These two manipulations can open the hemichannel independently of the CO<sub>2</sub> stimulus. Extracellular Ca<sup>2+</sup> is well known to block all connexin hemichannels, and Cx43 is known to be voltage sensitive. The depolarisation from high K<sup>+</sup> is effective at opening the hemichannel and we preferred this as a more physiological way of opening the Cx43 hemichannel. We have added some explanatory text.

      (7) Clarify whether Cx43A44V or Cx43L90V mutations block gap junctional coupling.

      This is an interesting point. Since Cx43 GJCs are not CO<sub>2</sub> sensitive we feel this is beyond the scope of our paper. 

      (8) Discuss the potential implications of pCO₂ changes on myocardial function through alterations in intracellular pH.

      We have modified the discussion to consider this point.

      Reviewer #1 (Recommendations for the authors):

      (1) Measurements of the effects of CO<sub>2</sub> on cytosolic pH/Cx43 hemichannel opening would strengthen the manuscript. Since the pH sensor of Cx43 is on the C terminus, the authors could consider making tail truncation mutants to see how this affects CO<sub>2</sub>-mediated Cx43 channel opening.

      We have done this (truncating after residue 256): the channel remains highly CO<sub>2</sub> and voltage sensitive. We have also documented the effect of the hypercapnic solutions on intracellular pH measured with BCECF. These new data are now included as figure supplements to Figure 2.

      (2) What is the impact of expressing the K105E / K109E Cx43 double mutant on cell viability?

      There was no obvious impact: cell density was as expected (no evidence of increased cell death), and brightfield and fluorescence visualisation indicated normal, healthy cells. We have added a movie (Fig 9, movie supplement 1) to show the effect of La<sup>3+</sup> on the GRAB<sub>ATP</sub> signal in cells expressing Cx43<sup>K105E, K109E</sup> so readers can appreciate the morphology and its stability during the recording.

      (3) A quick look at other alpha connexins suggested that Cx43 was unique among alpha connexins in maintaining the carbamylation motif. This merits additional discussion/ analysis.

      This is an interesting point. Cx43 is not unique in the alpha clade in having the carbamylation motif: a number of other human alpha connexins (Cx50, Cx59 and Cx62) and non-human alpha connexins (Cx40, Cx59, Cx46) also possess it. We have shown that Cx50 is CO<sub>2</sub> sensitive. We have performed a brief molecular phylogenetic analysis of the alpha connexin clade to highlight the occurrence of the carbamylation motif. This is now presented as Fig 12 to go with the accompanying discussion.

      (4) There were some minor writing issues that should be addressed. For instance, fEPSP is not defined. Also, insets showing positive controls in some experiments were not described in the figure legends.

      We have corrected these issues.

      Reviewer #2 (Recommendations for the authors):

      (1) I would omit the data on the ODDD-associated mutations since there is no evidence that loss of CO<sub>2</sub> sensitivity plays an important role in the underlying disease pathology.

      We are not claiming that loss of CO<sub>2</sub> sensitivity leads to the underlying pathology, and we have reviewed the text to ensure that we clearly express that this is a correlation, not a cause. We think this is worth retaining, as many pathological mutations in other CO<sub>2</sub>-sensitive connexins (Cx26, Cx32 and Cx50) cause loss of CO<sub>2</sub> sensitivity, and this information may be helpful to other researchers.

      (2) Why is high K+ rather than low external calcium used as a positive control in ATP release experiments?

      We used high K<sup>+</sup> and depolarisation as a positive control because we regard this as a more physiological stimulus than low external Ca<sup>2+</sup>.

      (3) Does Cx43A44V or Cx43L90V block gap junctional coupling?

      An interesting question but we have not examined this.

      (4) Provide references for biophysical recordings of Cx43 hemichannels performed in HEPES-buffered salines, which document Cx43 hemichannels as being shut.

      We have added the original and some later references that examine Cx43 hemichannel gating in HEPES buffer and show the need for substantial depolarisation to induce channel opening.

      (5) In the heart muscle, changes in PCO<sub>2</sub> have long been hypothesized to cause changes in myocardial function by changing pHi.

      This is true, and we now add some discussion of this point. Now that we know that Cx43 is directly sensitive to CO<sub>2</sub>, a direct action of CO<sub>2</sub> cannot be ruled out, and careful experimentation is required to test this possibility. 

      Reviewer #3 (Recommendations for the authors):

      (1) Page 3: "... homologs of K125 and R104 ... ": the context is linked to Cx26, so Cx26 needs to be added here.

      Done

      (2) Page 4 text and related Figure 2:

      (a) Figure 2A&B: PCO2-dependent Cx43 HC opening is clearly present in the carboxy-fluorescein dye uptake experiments (Figure 2A) as well as in the electrophysiological experiments (Figure 2B). The curves look quite different between these two distinct readouts: dye uptake doubles from 20 to 70 mmHg in Figure 2A while the electrophysiological data double from 45 to 70 mmHg in Figure 2B. These responses look quite distinct and may be linked to a non-linearity of the dye uptake assay or a problem in the electrophysiological measurements of Figure 2B discussed in the next point.

      Different molecules/ions may have different permeabilities through the channel, which could explain the observed difference. Also, there is some contamination of the whole cell conductance change with another conductance (evident in recordings from parental HeLa cells). This is evident particularly at 70 mmHg. If this contaminating conductance were subtracted from the total conductance in the Cx43 expressing cells, then the dose response relations would be more similar. However, we are reluctant to add this additional data processing step to the paper.

      (b) The traces in Figure 2B show that the HC current is inward at 20 mmHg PCO2, while it switches to an outward current at 55mmHg PCO2. HCs are non-selective channels, so their current should switch direction around 0 mV but not at -50 mV. As such, the -50 mV switching point indicates involvement of another channel distinct from non-selective Cx43 hemichannels.

      We think that our incomplete description in the legend led to this misunderstanding. We used a baseline of 35 mmHg (where the channels will be slightly open) and changed to 20 mmHg to close them (or to higher PCO<sub>2</sub> to open them from this baseline), hence a decrease in conductance and loss of outward current for 20 mmHg. The holding potential for the recordings and voltage steps were the same in all recordings. We have now edited the legend and added more information into the methods to clarify this and how we constructed the dose response curve.

      We agree that Cx43 hemichannels are relatively nonselective and would normally be expected to have a reversal potential around 0 mV, but we are using K-Gluconate and the lowered reversal potential (~-65 mV) is likely due to poor permeation of this anion via Cx43.

      (c) A Hill slope of 6 is reported for this curve, which is extremely steep. The paper does not provide any further consideration, making this an isolated statement without any theoretical framework to understand the present finding in such context (i.e., in relation to the PCO2 dependency of Cx channels).

      Yes, we agree. For all the CO<sub>2</sub>-sensitive connexins that we have looked at, the Hill coefficient versus CO<sub>2</sub> is >4. Hemichannels are of course hexameric, so there is potential for 6 CO<sub>2</sub> molecules to be bound and for extensive cooperativity. We have modified the text to give greater context.
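
      To illustrate what a steep Hill coefficient implies, a generic Hill relation can be sketched as follows. This is a minimal illustration and not the authors' fitting procedure: the function name is hypothetical, the half-maximal PCO<sub>2</sub> of 45 mmHg is an arbitrary assumed value chosen only for the example, and the only number taken from the review is the Hill coefficient of 6. The comparison with a coefficient of 1 shows how a high coefficient compresses most of the response into a narrow range of PCO<sub>2</sub>.

```python
def hill_response(pco2_mmhg, half_max_mmhg, hill_coefficient):
    """Fractional response (0 to 1) of a generic Hill relation.

    response = x^n / (1 + x^n), where x = PCO2 / half-maximal PCO2.
    """
    ratio = (pco2_mmhg / half_max_mmhg) ** hill_coefficient
    return ratio / (1.0 + ratio)


# Hypothetical half-maximal PCO2, assumed purely for illustration.
ASSUMED_HALF_MAX_MMHG = 45.0

# Compare a non-cooperative relation (n = 1) with a steep one (n = 6)
# over the 20 to 70 mmHg range discussed in the reviews.
for n in (1, 6):
    row = [
        f"{hill_response(p, ASSUMED_HALF_MAX_MMHG, n):.3f}"
        for p in (20, 35, 45, 55, 70)
    ]
    print(f"n = {n}: {row}")
```

      With the steeper coefficient, the response is near zero at 20 mmHg and near maximal at 70 mmHg, whereas with a coefficient of 1 the same range spans only part of the curve.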

      (d) A further remark to Figure 2 is that it does not contain any experiment showing the effect of Cx43 hemichannel inhibition with a reliable HC inhibitor such as Gap26, which is only used in the penultimate illustration of Figure 10. Gap26 should be used in Figure 2 and most of the other figures to show evidence of HC contribution. The lanthanum ions used in Figure 9 are a very non-specific hemichannel blocker and should be replaced by experiments with Gap26.

      We have addressed the first part of this comment above.

      We agree that La<sup>3+</sup> blocks all hemichannels, but in the context of our experiments and the controls we have performed it is entirely adequate and supports our conclusions. Our controls (mentioned above and below) show that the expression of Cx43 is absolutely required for CO<sub>2</sub>-dependent ATP release (and dye loading). In Figure 9, our use of La<sup>3+</sup> was to show the presence of a constitutively open Cx43 mutant hemichannel. Gap26 would add little to this. Our further controls show that, with expression of Cx43<sup>WT</sup>, La<sup>3+</sup> did nothing to the ATP signal under baseline conditions (20 mmHg), supporting our conclusion that the mutant channels are constitutively open.

      (e) As the experiments of Figure 2 form the basis of what is to follow, the above remarks cast doubt on the robustness of the experiments and the data produced.

      We disagree; our results are extremely robust: 1) we have used three independent assays to confirm the presence of the response; 2) parental HeLa cells do not release ATP, dye load or show large conductance changes in response to CO<sub>2</sub>, showing the absolute requirement for expression of Cx43; 3) mutations of Cx43 (in the carbamylation motif) alter the CO<sub>2</sub>-evoked ATP release and dye loading, giving further confirmation of Cx43 as the conduit for ATP release and dye loading; and 4) we use standard positive controls (0 Ca<sup>2+</sup>, high K<sup>+</sup>) to confirm that cells still have functional channels for those mutations that modified CO<sub>2</sub> sensitivity.

      (f) The sentence "Cells transfected with GRAB-ATP only, showed ... " should be modified to "In contrast, cells not expressing Cx43 showed no responses to any applied CO2 concentration as concluded from GRAB-ATP experiments"

      We have modified the text.

      (3) Page 5 and Figures 3 & 4:

      (a) Figure 3 illustrates results obtained with mutations of 4 distinct Lys residues. However, the corresponding legend indicates mutations that are different from the ones shown in the corresponding illustrations, making it impossible to reliably understand and interpret the results shown in panels A-E.

      Thanks for pointing this out. Our apologies, we modified the figure so that the order of the images matched the order of the graph (and the legend) but then forgot to put the new version of the figure in the text. We have now corrected this so that Figure and legend match.

      (b) Figure 4 lacks control WT traces!

      The controls for this (showing that parental HeLa cells do not release ATP in response to CO<sub>2</sub> or depolarisation) are shown in Figure 2.

      (c) Figure 4, Supplement 1: High Hill coefficients of 10 are shown here, but they are not discussed anywhere, as is also the case for the remark on p.4. A Hill steepness of 10 is huge and points to many processes potentially involved. As reported above, these data are floating around in the manuscript without any connection.

      Yes, we agree this is very high and surprising. It may reflect as mentioned above the hexameric nature of the channel and that 4 Lys residues seem to be involved. We have used this equation to give some quantitative understanding of the effect of the mutations on CO<sub>2</sub> sensitivity and still think this is useful. We have no further evidence to interpret these values one way or the other.
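
      For orientation, a Hill-type dose-response relation is commonly written as below. This is a generic sketch: the symbols $R_{\max}$, $P_{50}$ and $h$ are illustrative labels, and the exact parameterisation fitted in the manuscript is not reproduced in this response.

```latex
R(P_{\mathrm{CO_2}}) = R_{\max}\,\frac{P_{\mathrm{CO_2}}^{\,h}}{P_{50}^{\,h} + P_{\mathrm{CO_2}}^{\,h}},
\qquad
\frac{P_{90}}{P_{10}} = 81^{1/h}
```

      Under this assumed form, $h = 10$ gives $P_{90}/P_{10} = 81^{0.1} \approx 1.55$, so the response would rise from 10% to 90% of its maximum over roughly a 1.55-fold change in PCO<sub>2</sub>, compared with an 81-fold change for $h = 1$.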

      (4) Page 6: Carbamate bridges are proposed to be formed between K105 and K144, and between K109 and K234. The first three of these Lysine residues are located in the 55aa long cytoplasmic loop of Cx43, while K234 is in the juxta membrane region involved in tubulin interactions. Both K144 and K234 are involved in Cx43 HC inhibition: K144 is the last aa of the L2 peptide (D119-K144 sequence) that inhibits Cx43 hemichannels while K234 is the first aa of the TM2 peptide that reduces hemichannel presence in the membrane (sequence just after TM4, at the start of the C-tail). This context should be added to increase insight and understanding of the CO2 carbamylation effects on Cx43 hemichannel opening.

      Thanks for suggesting this. We have added some discussion of CT to CL interactions in the context of regulation by pH and [Ca<sup>2+</sup>].

      (5) Page 7: The Cx43 ODDD A44V and L90V mutations lead to loss of pCO2 sensitivity in dye loading and ATP assays. However, A44V located in EL1 is reportedly associated with Cx43 HC activation, while L90V in TM2 is associated with HC inhibition. Remarkably, these mutations are focused on non-Lys residues, which brings up the question of how to link this to the paper's main thread.

      This follows the pattern that we have seen for other mutations such as A40V, A88V in Cx26 and several CMTX mutations of Cx32. Our cryoEM structures of Cx26 suggest that these mutations alter the flexibility of the molecule and hence abolish CO<sub>2</sub> sensitivity. We have reworded the text to avoid giving the impression that there is a demonstrated link between loss of CO<sub>2</sub> sensitivity of Cx43 and pathology.

      (6) Page 8: HCs constitutively open - 'constitutively' perhaps does not have the best connotation as it is not related to HC constitution but CO2 partial pressure.

      Yes, we agree and have reworded this.

      (7) Page 9: "in all subtypes" -> not clear what is meant - do you mean "in all cell types"?

      We agree this is unclear; it refers to all astrocytic subtypes. We have amended the text.

      (8) Page 10: Composition of hypocapnic recording solution: bubbling description is incomplete "95%O2/5%" and should be "95%O2/5%CO2".

      Changed.

      (9) Page 11: Composition of zero Ca<sup>2+</sup> hypocapnic recording solution: perhaps better to call this "nominally Ca<sup>2+</sup>-free hypocapnic recording solution" as no Ca<sup>2+</sup> buffer is included in this solution

      Thanks for pointing this out. We did in fact add 1 mM EGTA to the solutions but omitted this from the recipe; this has now been corrected.

      (10) Page 11: in M&M I found that the NaHCO3- is lowered to 10 mM in the zero Ca<sup>2+</sup> condition, while the control experimental condition has 26 mM NaHCO3-. The zero Ca condition should be kept at a physiologically normal 26 mM NaHCO3- concentration, so why was this done? Lowering NaHCO3- during hemichannel stimulation may result in smaller responses and introduce non-linearities.

      For the dye loading we used 20 mmHg as the baseline condition and increased PCO<sub>2</sub> from this. Hence for the zero Ca<sup>2+</sup> positive control we modified the 20 mmHg hypocapnic solution by substituting Mg<sup>2+</sup> for Ca<sup>2+</sup> and adding EGTA. We have modified the text in the Methods to clarify this.

      Further remarks on the figures:

      (1) Figure 2A: Add 20 & 70 mmHg to the images, to improve the readability of this illustration.

      Done

      (2) Figure 3: WT responses are shown in panel F, but experimental data (images and curves) are lacking and should be included in a revised version.

      The wild type data is shown in Fig 2A. We have some sympathy for the comment, but we felt that Fig 2 should document CO<sub>2</sub> sensitivity, and then the subsequent Figs should analyse its basis. Hence the separation of Cx43<sup>WT</sup> data from the mutant data. In panel F, we state that we have recalculated the WT data from Fig 2A to allow the comparison.

      (3) Figures 4, 6, 8: Color codes for mmHg CO<sub>2</sub> pressure make reading these figures difficult; perhaps better to add mmHg values directly in relation to the traces.

      We have considered this suggestion but feel that the figures would become very cluttered with the additional labelling.

      (4) I wouldn't use colored lines when not necessary, e.g., Figure 9 100 µM La3+; Figure 10 (add 20->35 mmHg PCO2 switch; add scrGap26 above blue bars); Figure 11C & D.

      We agree and can see that in Figs 9 and 10 this muddles our colour scheme in other figures so have modified these figures. There was not space to put the suggested labels.

      (5) The mechanism of increased HC opening is not clear.

      We agree and have discussed various options and the analogy with what we know about Cx26. Ultimately new cryo-EM data is required.

      (6) Figure 10: 35G/35S are weird abbreviations for 35 mmHg Gap26 and scrambled Gap26.

      Yes, but we used these to fit into the available space.

      (7) Figure 11, legend: '20 mmHg PCO2 for each transfection for 70 mmHg PCO2'. It is not clear what is meant here.

      Thanks for pointing this out, we have reworded this to ensure clarity.

    1. Webinar summary: "La place du numérique dans le projet associatif en 2025" (The place of digital technology in the associative project in 2025)

      Executive Summary

      This summary presents the key findings of the 5th edition of the barometer on the digital practices of associations, a study conducted jointly by Solidatech and Recherche et Solidarité in spring 2025 among 2,285 association leaders.

      The analysis shows continued growth in the sector's digital maturity: 26% of associations now consider themselves "experienced", up 5 points from 2022.

      Artificial intelligence (AI) makes a notable entrance. It is used by 18% of associations (26% of employer associations, i.e. those with paid staff), mainly for efficiency gains, although ethical concerns and a lack of skills remain significant barriers.

      The main objectives of digital use remain stable and high-priority: improving communication (80%), running the network (75%) and managing activities (70%). Although the share of associations reporting no difficulty has almost doubled since 2019 (from 16% to 29%), human barriers (lack of skills, apprehension) remain the main concern for 44% of organisations.

      Finally, the study highlights growing professionalisation, with greater involvement of employees and governing bodies in digital strategy.

      1. Context and Methodology of the Study

      The study "La place du numérique dans le projet associatif en 2025" is the 5th edition of a barometer launched in 2013. It is the product of a long-standing partnership between Solidatech, a programme that supports the digital transformation of associations, and Recherche et Solidarité, an association specialising in knowledge of the voluntary sector.

      Objectives of the barometer:

      ◦ Track how digital practices in associations evolve.
      ◦ Provide useful lessons to association stakeholders to guide their initiatives.
      ◦ Inform digital-sector players about the realities and specific features of the association sector.
      ◦ Serve as a major resource for organisations that support associations (CRDLA, Guid'Asso).

      Methodology:

      ◦ Sample: 2,285 association leaders responded to the survey.
      ◦ Representativeness: The results were weighted using the quota method so that they are representative of the association sector as a whole and, specifically, of employer associations.
      ◦ Analysis: The data are analysed globally and can be segmented by sector of activity, budget, headcount, geographical context (rural, urban, priority urban neighbourhoods (QPV)) and digital maturity.

      2. State of Digital Maturity in 2025

      Perception of Digital Maturity

      The study shows steady growth in the digital maturity of associations. The share of associations describing themselves as "experienced" has gained 5 points since 2022, mainly at the expense of those rating themselves as "making progress".

      | Maturity level | 2019 | 2022 | 2025 |
      |---|---|---|---|
      | Novice ("Peu initiée") | ~22% | ~22% | ~22% |
      | Making progress ("En progrès") | 52% | 52% | 47% |
      | Experienced ("Expérimentée") | 21% | 21% | 26% |

      Involvement and Governance of Digital Matters

      The study shows professionalisation and a more strategic handling of digital topics within associations.

      Professionalisation: 30% of employer associations now entrust digital management to a dedicated employee, a rising trend.

      Involvement of leadership: The board of directors or executive committee is directly involved in digital matters in 24% of associations, a proportion that has risen continuously since 2022, which suggests a more strategic approach.

      Dependency: A single point of contact (a volunteer in 24% of cases, an employee in 30%) often manages digital matters, creating a risk of dependency and of losing skills if that person leaves.

      Budgets Allocated to Digital

      Half of associations (50%) have a dedicated digital budget for running costs (maintenance, subscriptions, hosting).

      Investment: 21% of associations have an investment budget for purchasing equipment or strategic advice.

      Growing awareness: 24% have no dedicated budget but think it would be a good idea.

      Specific cases: 21% consider a budget unnecessary, often because they are very small organisations relying on volunteers' personal tools.

      3. Objectives, Uses and Digital Tools

      Priority Objectives

      The "top 3" objectives pursued through digital tools is unchanged, but usage is intensifying, with each item up 5 to 7 points compared with 2022.

      1. Raise the association's profile (communication and visibility): 80%

      2. Improve how the network is run (internal and external ties): 75%

      3. Manage activities more efficiently: 70%

      Two practices are growing particularly strongly:

      Working together more efficiently: Reported by 57% of associations, a gain of 18 points since 2019, a trend accelerated by the health crisis.

      Seeking funding / collecting donations: Concerns 33% of associations, up 10 points since 2019, reflecting the need to diversify resources.

      Use of Free and Open-Source Tools

      43% of associations use free and open-source tools. For the first time, in 2025, ethical motivations outweigh practical reasons.

      For ethical reasons: 23% (transparency, sharing, freedom of information).

      For practical reasons: 20%.

      Need for support: 14% do not use them but would like support to do so.

      Don't know / no opinion: 22% of respondents, indicating a persistent lack of awareness of this ecosystem.

      4. Special Focus: Artificial Intelligence (AI)

      Adoption Rate and Potential

      AI is an emerging reality in the association sector, with significant potential for growth.

      Current usage rate:

      ◦ 18% for all associations.
      ◦ 26% for employer associations.

      Short-term potential: 13% of associations are considering using it (18% of employer associations), bringing the total potential to 31% (44% for employer associations).

      Comparison: Employer associations (26%) lag slightly behind SMEs and mid-sized companies (PME and ETI), which show an adoption rate of 32% (source: BPI France, 2025).

      Main Uses of AI

      Associations turn to AI mainly to optimise their operations and communication.

      | Uses of AI (current and potential users) | All associations | Employer associations |
      |---|---|---|
      | Gaining efficiency in everyday tasks (e.g. meeting minutes) | 70% | >70% |
      | Creating internal or external communication materials (e.g. images, videos) | 59% | >59% |
      | Creating educational documents tailored to audiences | 41% | >41% |
      | Facilitating data analysis | 39% | >39% |
      | Facilitating responses to calls for projects / grant applications | 27% | >27% |

      Apprehensions and Identified Risks

      Despite their interest, associations express strong apprehensions, particularly employer associations which, although they use AI more, are also more aware of the risks.

      | Apprehensions about AI | All associations | Employer associations |
      |---|---|---|
      | Ethical concerns (loss of human connection, disinformation) | 47% | >47% |
      | Lack of in-house skills | 45% | >45% |
      | Environmental risks and impact | 36% | >36% |
      | Data confidentiality risks | 36% | >36% |
      | Risk of destabilising the organisation (disappearance of roles, etc.) | 8% | >8% |

      The low score (8%) for organisational risk suggests that uses are still perceived as occasional and that the structural impact of AI is underestimated.

      5. Difficulties Encountered and Levers for Action

      How Difficulties Have Evolved

      There is a clear improvement: in 2025, 29% of leaders report no particular difficulty, compared with only 16% in 2019. For the 71% who do encounter difficulties, the ranking of barriers remains stable.

      1. Human difficulties (44%): Still the main concern (overcoming apprehension, finding skills, maintaining ties).

      2. Technical difficulties (33%): Stable, linked to the rapid evolution of technology and to risks (cybersecurity).

      3. Financial difficulties (24%): Down sharply (vs. 41% in 2019), but this figure should be qualified because 81% of associations fund digital spending from their own resources, which can create cash-flow pressure.

      4. Strategic difficulties (21%): Considered by the study's analysts to be frequently underestimated.

      Testimonies from Association Stakeholders (Verbatims)

      On lack of time: "The problem [is] mostly time: ideas, but no time to implement them, to train and to inform."

      On dependency: "The long-standing volunteer who has the expertise is leaving. The risk is having no one to ensure continuity."

      On funding: "We keep multiplying free or low-cost accounts that are not connected to one another."

      On cybersecurity: "We are subjected to increasingly sophisticated phishing."

      Expectations for Progress

      To overcome these obstacles, associations express several expectations:

      • Better knowledge of existing tools (47%).

      • Upskilling of teams.

      • Sharing experience with other associations.

      • Support in defining a digital strategy or a personalised diagnostic (20%).

      6. Keys to a Successful Digital Transformation

      The study concludes by recalling four fundamental principles for carrying out a digital project:

      1. Do not lose sight of the association's mission: Digital technology must remain a tool serving the association's missions, not an end in itself.

      2. Consider the uniqueness of each project: Take the association's specific features into account (values, budget constraints, stakeholders) to guide the choice of solutions and the management of change.

      3. Establish a shared digital culture: Give all members a minimum grounding to avoid internal digital divides and encourage collective adoption of the tools.

      4. Proceed step by step: Treat the introduction of a new tool as a project in its own right, with a clear methodology (appoint a lead, involve users, test, train, deploy).

      --------------------------------------------------------------------------------

      This document is a summary of the webinar "La place du numérique dans le projet associatif en 2025", broadcast by Solidatech. The data and analyses come exclusively from what the speakers (Lauren Gouin, Cécile Basin, Boris) said during the presentation.

    1. Webinar summary: AI & Associations

      Executive Summary

      This document summarises the key lessons of the webinar "IA & Associations : une bonne idée ?" (AI & Associations: a good idea?), presented by Solidatech in collaboration with experts from the company Advent. Artificial intelligence (AI), and in particular generative conversational agents such as ChatGPT, Claude or Mistral, represents a major opportunity for associations, allowing them to improve their operational efficiency and strategic decision-making. The webinar highlighted three main themes: concrete practical applications (drafting grant applications, organising events), the risks inherent in their use (data leaks, bias, hallucinations) and best practices for writing effective requests ("prompt engineering"). The recommended approach is measured, strategic adoption, using AI for tasks that meet the "3 C" method: time-consuming (Chronophages), complicated (Compliquées) and unmotivating (peu motivantes). Finally, support organisations such as Solidatech and the Cyber Forgood programme, along with specific tools, were presented as key resources to help associations through this transition.

      --------------------------------------------------------------------------------

      1. Context and Support Organisations

      The webinar aimed to demystify the use of AI for the association sector by providing keys to understanding, practical examples and risk-mitigation strategies.

      Solidatech

      Presented by Lauren Guouin, Solidatech is a digital solidarity programme that has supported more than 45,000 associations in their digital transition since 2008. Run by the work-integration cooperative Les Ateliers du Bocage (Emmaüs movement), the programme acts on three fronts:

      Digital equipment: Access to software (Microsoft, Adobe, etc.) and IT hardware (new or refurbished) at solidarity rates.

      Skills development: Provision of resources (articles, newsletters, digital self-assessment), Qualiopi-certified training and personalised support.

      Knowledge production: Publication of studies, such as "La place du numérique dans le projet associatif".

      Cyber Forgood

      Led by Julio from the company Advent, Cyber Forgood is a programme dedicated to protecting and supporting social and solidarity economy organisations against cyber risks. A new platform, cyberforgood.org, will launch on 3 November and will offer, from January:

      • A 5-month online academy on digital hygiene, the GDPR (RGPD) and AI.

      • An in-person "boot camp" in Paris to talk with experts.

      • Pro bono cybersecurity support.

      --------------------------------------------------------------------------------

      2. Understanding Generative Artificial Intelligence

      Léonard Kip, a cybersecurity and AI expert at Advent, defined AI as an autonomous program capable of imitating human actions (prediction, content generation, decision-making). The recent boom concerns generative AI, which creates original content from a request.

      How do conversational agents work? These tools do not "understand" a question in the human sense. They rely on artificial neural networks trained on astronomical quantities of data. Their main function is to predict the most probable next word given the context provided by the user's request. Each newly generated word enriches the context, making it possible to predict the next one, and so on, to build a coherent answer. This mechanism explains why the precision and richness of the initial request are crucial to obtaining a relevant result.
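
      The "predict the next word, append it, repeat" loop described above can be illustrated with a deliberately tiny sketch. It is an assumption-laden toy, not how ChatGPT, Claude or Mistral are built: it replaces the neural network with simple word-pair counts, and the names `train_bigram` and `generate` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: str) -> dict:
    """Count, for each word, which words follow it in the training text."""
    words = corpus.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def generate(model: dict, prompt: str, max_new_words: int = 5) -> str:
    """Repeatedly append the most frequent word seen after the last word."""
    context = prompt.lower().split()
    if not context:
        return ""
    for _ in range(max_new_words):
        candidates = model.get(context[-1])
        if not candidates:
            break  # the toy model has never seen this word followed by anything
        next_word = candidates.most_common(1)[0][0]
        context.append(next_word)  # the new word becomes part of the context
    return " ".join(context)


# Toy training text (illustrative only).
corpus = "the association helps people and the association helps volunteers"
model = train_bigram(corpus)
print(generate(model, "the", max_new_words=2))
```

      Real systems condition on the whole context rather than only the last word, which is why a richer request changes the prediction at every step.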

      --------------------------------------------------------------------------------

      3. Analysis of Major Risks and Mitigation Strategies

      Using AI carries significant risks that must be kept under control. A poll conducted during the webinar showed that leakage of confidential data is the main concern (67% of respondents).

      | Identified risk | Description | Mitigation strategies |
      |---|---|---|
      | Hallucinations | The AI presents factually incorrect information in a very convincing manner, because it tends to try to satisfy the user rather than admit its ignorance. | - Systematically verify answers, especially the most surprising ones.<br>- Ask the AI to confirm or detail its reasoning.<br>- Break a complex request into several simpler tasks. |
      | Cognitive biases | The AI reproduces the stereotypes and prejudices present in its training data (internet, books), which can lead to discriminatory answers. | - Explicitly ask the AI to avoid bias and to be "open-minded".<br>- Reread your own request to make sure it does not introduce bias.<br>- Ask the AI to correct an answer if a bias is identified. |
      | Leakage of confidential data | Conversations may be used by vendors to train future versions of their models. Massive leaks have already occurred (e.g. 370,000 conversations from the Grok AI). | - Never provide sensitive information (medical records, identifiable personal data).<br>- Generalise or approximate data (e.g. "a woman in her forties" instead of an exact age).<br>- Use "ephemeral conversation" modes (available on Claude, Mistral) that delete exchanges.<br>- In the account settings, refuse the use of data for improving the AI and schedule deletion of the history. |
      | Generation of dangerous content | AI can be used to create malicious content, although the major platforms are strengthening their safeguards. | - Report any inappropriate content to the tool's vendor.<br>- For associations offering AI-based services, put moderation systems in place. |
      | Use for illegal purposes | The most publicised risk is the "deepfake": the creation of fake videos, images or audio to impersonate someone, a technique that has become very accessible. | - Raise members' and beneficiaries' awareness of the legal risks.<br>- Monitor usage if the association makes an AI service available. |

      --------------------------------------------------------------------------------

      4. The Art of the Request: How to Converse Effectively with an AI

      To move beyond simple question-and-answer and obtain high-value results, it is necessary to practise prompt engineering. An effective request is made up of several elements.

      The Formula for a Complete Request:

      1. Instruction: The main task to be performed.

      2. Context: The "why" of the request, the target audience, the objectives and the stakes. This element is crucial for guiding the AI.

      3. Format: The structure of the desired answer (table, bullet list, summary, word count). Along with context, this is the addition that brings the most value.

      4. Tone: The expected writing style (formal, creative, empathetic, etc.).

      5. Role/Persona: Ask the AI to take on the role of an expert (e.g. "Act as a fundraising specialist").

      6. Example: Provide one or more examples of the expected result to guide generation.
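
      The six elements above can be assembled mechanically. The sketch below is one possible way to do so, under stated assumptions: `PromptSpec` and `build_prompt` are hypothetical names, and the labels and ordering of the sections are illustrative choices rather than anything prescribed in the webinar.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """The six elements of a complete request; only the instruction is required."""
    instruction: str
    context: str = ""
    output_format: str = ""
    tone: str = ""
    role: str = ""
    examples: list = field(default_factory=list)


def build_prompt(spec: PromptSpec) -> str:
    """Join the non-empty elements into a single text request."""
    parts = []
    if spec.role:
        parts.append(f"Role: {spec.role}")
    parts.append(f"Instruction: {spec.instruction}")
    if spec.context:
        parts.append(f"Context: {spec.context}")
    if spec.output_format:
        parts.append(f"Format: {spec.output_format}")
    if spec.tone:
        parts.append(f"Tone: {spec.tone}")
    for i, example in enumerate(spec.examples, start=1):
        parts.append(f"Example {i}: {example}")
    return "\n".join(parts)


# Illustrative values only.
spec = PromptSpec(
    instruction="Draft an outline for a grant application.",
    context="Small association recycling computers; audience is a public funder.",
    output_format="Bullet list, at most 10 items.",
    role="Act as a fundraising specialist.",
)
print(build_prompt(spec))
```

      Leaving a field empty simply omits that section, which makes it easy to compare the answers obtained with and without context or format.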

      --------------------------------------------------------------------------------

      5. Concrete Use Cases for Associations

      The demonstrations carried out with the Claude tool illustrate AI's potential for complex tasks.

      Help with Drafting Applications (e.g. a Grant Application):

      ◦ Scenario: A computer-recycling association wants to respond to a call for projects to obtain €500,000.
      ◦ Method: The request included the association's context, the objective and the full text of the call for projects.
      ◦ Result: The AI first asked questions to obtain additional information (budget, headcount), then generated a detailed outline of the application, arguments aligned with the themes of the call for projects and a first draft of the content.

      Organising Events:

      ◦ Scenario: The association wants to organise a memorable evening for its 20th anniversary.
      ◦ Method: The request asked for 5 ideas for original activities.
      ◦ Result: The AI proposed creative concepts (e.g. a "wall of 10,000 stories" from beneficiaries). It then helped draw up a backward-planning schedule and budget estimates to implement the chosen ideas.

      Support for Strategic Decisions:

      ◦ Scenario: The association, based in Paris, must choose two new cities in which to open branches.
      ◦ Method: The request asked for 10 cities to be proposed and compared on three criteria: effectiveness against the digital divide, operating cost and potential for recruiting volunteers.
      ◦ Result: The AI provided a quantified comparative analysis and recommended Marseille and Lille, justifying the choice by optimal North-South geographical coverage, which goes beyond a simple analysis of individual scores.

      --------------------------------------------------------------------------------

      6. Recommended Tools and Strategic Approach

      Selection of Relevant Tools

      Conversational Agents:

      ◦ Claude: Recommended for its ethical alignment (developed by Anthropic, a company founded by former OpenAI staff for ethical reasons).
      ◦ Mistral: A leading French/European alternative, preferred where digital sovereignty is at stake.

      Meeting Assistant:

      Nuta: A French solution that integrates with collaborative tools to generate transcripts, minutes and meeting summaries.

      Marketing Creation:

      Canva: Now includes AI features to help create marketing campaigns (vigilance is required on intellectual property issues).

      Defining an Adoption Strategy: The "3 C" Method

      To avoid excessive, energy-intensive use of AI, it is advisable to adopt it in a targeted way. The first step for an association is to identify collectively the tasks that meet the following three criteria:

      1. Time-consuming (Chronophage): A task that takes up a lot of time.

      2. Complicated (Compliquée): A task that requires non-trivial thought or expertise.

      3. Unmotivating (Peu motivante): A repetitive or administrative task that weighs on teams.

      If a task meets all three criteria, then using an AI to assist with it or automate it is justified. This approach makes it possible to start with a high-impact use case and to get teams used to the tools gradually.
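
      As a minimal sketch of this screening step, the snippet below keeps only the tasks that satisfy all three criteria. The `Task` structure, the function name `ai_candidates` and the sample tasks are hypothetical; in practice the yes/no judgements would come from the team's collective discussion.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    time_consuming: bool  # "Chronophage"
    complicated: bool     # "Compliquée"
    unmotivating: bool    # "Peu motivante"


def ai_candidates(tasks):
    """Return the names of tasks meeting all three "3 C" criteria."""
    return [
        t.name
        for t in tasks
        if t.time_consuming and t.complicated and t.unmotivating
    ]


# Illustrative ratings only.
tasks = [
    Task("Writing meeting minutes", True, True, True),
    Task("Welcoming new volunteers", True, False, False),
    Task("Drafting grant applications", True, True, True),
]
print(ai_candidates(tasks))
```

      Requiring all three criteria, rather than any one of them, keeps the shortlist small, which fits the stated aim of avoiding excessive use.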

      Free vs. Paid Versions

      Moving to a paid version is justified if the tool is used very frequently and the limits of the free version are reached. Paid versions generally give access to more capable models, which reduces the risks of bias and hallucination without eliminating them entirely.

      --------------------------------------------------------------------------------

      7. Conclusion: Towards Controlled and Beneficial Use

      AI should be regarded as a powerful assistant and not as a magic solution or a substitute for human expertise. The key lies in keeping control of, and a critical eye on, the content generated. As Léonard Kip puts it: "Mastering AI is for your fulfilment, not your laziness." A gradual approach, focused on real needs and carried out with keen awareness of the risks, will allow associations to get the most out of this technological revolution.

    1. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      One of the most novel aspects of the manuscript is the use of a relatively quick photoablation system. Could this technique be applied in other laboratories? While the revised manuscript includes more technical details as requested, the description remains difficult to follow for readers from a biology background. I recommend revising this section to improve clarity and accessibility for a broader scientific audience.

      As suggested, we have adapted the paragraph on the photoablation technique in the Materials and Methods section, starting at line 1147. We believe it is now easier to follow.

      The authors suggest that in the animal model, early (3 h) infection with Neisseria does not show an increase in vascular permeability, contrary to their findings in the 3D in vitro model. However, they show a non-significant increase in permeability of 70 kDa Dextran in the animal xenograft early infection. As a bioengineer, this seems to indicate that if the experiment had been done with a lower molecular weight tracer, significant increases in permeability could have been detected. I would suggest doing this experiment, which could capture early events in vascular disruption.

      Comparing permeability under healthy and infected conditions using Dextran smaller than 70 kDa is challenging. Previous research (1) has shown that molecules below 70 kDa already diffuse freely in healthy tissue. Given this high baseline diffusion, we believe that no significant difference would be observed before and after N. meningitidis infection, and these experiments were not carried out. Bacteria-induced permeability in mice occurs at later time points, 16 h post-infection, as shown previously (2). As discussed in the manuscript, this difference between the xenograft model and the chip could reflect the absence of various cell types present in the tissue parenchyma or simply vessel maturation time.

      One of the great advantages of the system is the possibility of visualizing infection-related events at high resolution. The authors show the formation of actin in a honeycomb structure beneath the bacterial microcolonies. This only occurred in 65% of the microcolonies. Is this result similar to in vitro 2D endothelial cultures in static and under flow? Also, the group has shown in the past positive staining of other cytoskeletal proteins, such as ezrin, in the ERM complex. Does this also occur in the 3D system?

      We imaged monolayers of endothelial cells in the flat regions of the chip (the two lateral channels) using the same microscopy conditions (i.e., Obj. 40X N.A. 1.05) that have been used to detect honeycomb structures in the 3D vessels in vitro. We showed that more than 56% of infected cells present these honeycomb structures in 2D, which is 13% less than in 3D, and is not significant due to the distributions of both populations. Thus, we conclude that under both in vitro conditions, 2D and 3D, the amount of infected cells exhibiting cortical plaques is similar. These results are in Figure 4E and S4B.

      We also performed staining of ezrin in the chip and imaged both the 3D and 2D regions. Although ezrin staining was visible in 3D (Author response image 1), it was not as obvious as other markers under these infected conditions, and we did not include it in the main text. Interpretation of this result is not straightforward, as the substrate of the cells is different, and it would require further studies on the behavior of ERM proteins in these different contexts.

      Author response image 1.

      F-actin (red) and ezrin (yellow) staining after 3h of infection with N. meningitidis (green) in 2D (top) and 3D (bottom) vessel-on-chip models.

      Recommendation to the authors:

      Reviewer #1 (Recommendation to the authors):

      I appreciate that the authors addressed most of my comments, of special relevance are the change of the title and references to infection-on-chip. I think that the current choice of words better acknowledges the incipient but strong bioengineering infection community. I also appreciate the inclusion of a limitation paragraph that better frames the current work and proposes future advancements.

      The addition of more methodological details has improved the manuscript. Although as mentioned earlier the wording needs to be accessible for the biology community. I also appreciated the addition of the quantification of binding under the WSS gradient in the different geometries and shown in Fig 3H. However, the description of the figure and the legend is not clear. What does "vessel" mean on the graph and "normalized histograms ...(blue)" in the figure legend. Could the authors rephrase it?

      In Figure 3F, we investigated whether Neisseria meningitidis exhibits preferential sites of infection. We hypothesized that, if bacteria preferentially adhered to specific regions, the local shear stress at these sites would differ from the overall distribution. To test this, we compared the shear stress at bacterial adhesion sites in the VoC (orange dots and curve) with the shear stress along the entire vascular edges (blue dots and curve). The high Spearman correlation indicates that there is no distinct shear stress value associated with bacterial adhesion. This suggests that bacteria can adhere across all regions, independently of local shear stress. To enhance clarity, the legend of Figure 3 and the related text have been rephrased in the revised manuscript (L289-314).
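
      The comparison described above can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: the shear stress values are synthetic, the bin range and bin count are arbitrary, and the function names are hypothetical. Adhesion sites are drawn at random from the edge values, which corresponds to the no-preference scenario, so a high rank correlation between the two normalized histograms is expected by construction.

```python
import random


def normalized_histogram(values, low, high, n_bins):
    """Fraction of values per bin; values outside [low, high) are ignored."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if low <= v < high:
            index = min(int((v - low) / width), n_bins - 1)
            counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts]


def average_ranks(xs):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Synthetic shear stress values (arbitrary units), for illustration only.
rng = random.Random(0)
edge_stress = [rng.gauss(1.0, 0.3) for _ in range(5000)]   # whole vessel edge
adhesion_stress = rng.sample(edge_stress, 300)             # no site preference

edge_hist = normalized_histogram(edge_stress, 0.0, 2.0, 8)
adhesion_hist = normalized_histogram(adhesion_stress, 0.0, 2.0, 8)
rho = spearman(edge_hist, adhesion_hist)
print(f"Spearman correlation between histograms: {rho:.2f}")
```

      If bacteria favored a particular shear stress range, the adhesion histogram would peak in different bins than the edge histogram and the rank correlation would drop.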

      Line 415 should refer to Fig S5B, not Fig 5B. Also, the titles of Supplementary Figures 4 and 5 are duplicated, and the legend of Fig S5 seems a bit off: A and B seem to be swapped.

      Indeed, the reference to the right figure has been corrected. Also, the title of Figure S4 has been adapted to its contents, and the legend of Figure S5 has been corrected.

      Reviewer #2 (Recommendation to the authors):

      Minor comments to the authors:

      Line 163 "they formed" instead of "formed".

      Line 212 "two days" instead of "two day"

      Line 269 a space between two words is missing.

      These three comments have been addressed in the revised manuscript.

      In addition, I appreciate answering the comments, especially those requiring hypothesizing about including further cells. However, when discussing which other cells could be relevant for the model (lines 631 to 632) it would be beneficial to discuss not only the role of those cells but also how could they be included in the model. I think for the reader, inclusion of further cells could be seen as a challenge or limitation, and addressing these technical points in the discussion could be helpful.

      We thank Reviewer #2 for the insightful suggestion. Indeed, the method of introducing cells into the VoC depends on their type. Fibroblasts and dendritic cells, which are resident tissue cells, should be embedded in the collagen gel before polymerization and UV carving. This requires careful optimization to preserve chip integrity, as these cells exert pulling forces while migrating within the collagen matrix. In contrast, T cells and macrophages should be introduced through the vessel lumen to mimic their circulation in vivo. Pericytes can be co-seeded with endothelial cells, as they have been shown to self-organize within a few hours post-seeding. This important information is now included in the manuscript (L577-587).

      Reviewer #3 (Recommendation to the authors):

      Suggestions and Recommendations

      Some suggestions related to the VOC itself:

      Figure 1, Fig S1, paragraph starting line 1071: More information would be helpful for the laser photoablation. For instance, is a non-standard UV laser needed? Which form of UV light is used? What is the frequency of laser pulsing? How many pulses/how long is needed to ablate the region of interest?

      The photoablation process requires a focused UV-laser, with high frequency (10 kHz) to lower the carving time while providing the required intensity to degrade collagen gel. To carve a reproducible number of 30 µm-large vessels, we used a 2 µm-large laser beam at an energy of 10 mW and moved the stage (i.e., sample) at a maximum speed of 1 mm/s. This information has been added to the related paragraph starting on line 1147 of the revised manuscript.
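
      As a rough worked example of how these parameters relate, the sketch below computes the spacing between successive laser pulses and the number of pulses delivered per beam width at the maximum stage speed. It uses only the values quoted above and assumes a pulsed laser firing at a constant 10 kHz while the stage moves at constant speed; the function names are hypothetical.

```python
# Values quoted in the response above.
PULSE_RATE_HZ = 10_000       # 10 kHz laser frequency
BEAM_WIDTH_UM = 2.0          # 2 µm-large laser beam
STAGE_SPEED_UM_S = 1_000.0   # 1 mm/s maximum stage speed


def pulse_spacing_um(speed_um_s, rate_hz):
    """Distance travelled by the stage between two successive pulses."""
    return speed_um_s / rate_hz


def pulses_per_beam_width(beam_um, speed_um_s, rate_hz):
    """Number of pulses fired while the stage moves by one beam width."""
    return beam_um * rate_hz / speed_um_s


spacing = pulse_spacing_um(STAGE_SPEED_UM_S, PULSE_RATE_HZ)
overlap = pulses_per_beam_width(BEAM_WIDTH_UM, STAGE_SPEED_UM_S, PULSE_RATE_HZ)
print(f"{spacing:.1f} µm between pulses")      # 0.1 µm between pulses
print(f"{overlap:.0f} pulses per beam width")  # 20 pulses per beam width
```

      Under these assumptions, each point along the carved path is exposed to about 20 pulses even at the fastest stage speed, which is consistent with the stated aim of using a high frequency to shorten carving time.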

      It is difficult to understand the geometry of the VOC. In Figure 1C, is the light coloration representing open space through which medium can flow, and the dark section the collagen? On a single chip, how many vessels are cut through the collagen? It looks as if at least two are cut in Figure 1C in the righthand photo.

      In Figure 1C, the light coloration is the F-actin staining. The horizontal upper and lower parts are the 2D lateral channels that also contain endothelial cells, and are connected to inlets and outlets, respectively. In the middle, two vertically carved 3D vessels are shown in the confocal image.

      Technically, we designed the PDMS structures to allow carving of 1 to 3 channels, maximizing the number of vessels that can be imaged while minimizing any loss of permeability at the PDMS/collagen/cells interface. This information has been added in the revised manuscript (L. 1147).

      If multiple vessels are cut in the center channel between the lateral channels, how do you ensure that medium flow is even between all vessels? A single chip with multiple different vessel architectures through the center channel would be expected to have different hydrostatic resistance with different architectures, thereby causing differences in flow rates in each vessel.

      To ensure a consistent flow rate regardless of the number of carved vessels, we opted to control the flow rate directly across the chip with a syringe pump. During experiments, one inlet and one outlet were closed, and a syringe pump was used. Because the carved vessels are arranged in parallel (derivation), the flow rate remains the same in each vessel. If a pressure controller had been used instead, the flow would have been distributed evenly across the different channels. This has been added to the revised manuscript in the paragraph starting on line 1210.

      The figures imply that the laser ablation can be performed at depth within the collagen gel, rather than just etching the surface. If this is the case, it should be stated explicitly. If not, this needs to be clarified.

      One of the main advantages of the photoablation technique is carving the collagen gel in volume, and not only etching the surface. Thanks to the 3D UV degradation, we can form the 3D architecture surrounded by the bulk collagen. This has been added to the revised manuscript, lines 154-155.

      Is the in-vivo-like vessel architecture connected to the lateral channel at an oblique angle, or is the image turned to fit the entire structure? (Figure 1F and 3E). Is that why there is high shear stress at its junction with the lateral channel depicted in Figure 3E?

      All structures require connection to the lateral channels to ensure media circulation and nutrient supply. The in vivo-like design must be rotated to allow the upper and lower branches of the complex structure to pass between the fixed PDMS pillars. To remain consistent with the image and the flow direction, we have kept the same orientation as in the COMSOL simulation. This leads to a locally higher shear stress at the top of the architecture. This has been added in the revised manuscript, in the paragraph starting on line 1474.

      Figure S1F,G: In the legend, shapes are circles, not squares. On the graphs, what do the numbers in parentheses mean?

      Indeed, the terms "squares" have been replaced by "circles" in Figure 1. (1) and (2) refer to the providers of the collagen, FujiFilm and Corning, respectively. We have added this mention in the legend in Figure S1.

      Figure 3B: how do the images on the left and right differ? Each of the 4 images needs to be explained.

      The four images represent the infected VoC from different viewing angles, illustrating the three-dimensional spread of infection throughout the vessel. A more detailed description has been added in the legend of Figure 3.

      Figure S3C is not referenced but should be, likely before sentence starting on line 299.

      Indeed, the reference to Figure S3C has been added at line 301 of the revised manuscript.

      Results in Figure 3 with the pilD mutant are very interesting. It is worth commenting in the Discussion about how T4P functionality in addition to the presence of T4P contributes to Nm infection, and how in the future this could be probed with pilT mutants.

      We thank Reviewer #3 for this relevant insight. Following adhesion, a key functionality of Neisseria meningitidis for colony formation and enhanced infection is twitching motility. As suggested, we have added in the Discussion the idea of using a PilT mutant, which can adhere but cannot retract its pili, in the VoC model to investigate the role of motility in colonization in vitro under flow conditions (L611–623).

      Which vessel design was used for the data presented in Figures 4, 5, and 6 and associated supplemental figures?

      Straight channels have been mostly used in figures 4, 5, and 6. Rarely, we used the branched in vivo-like designs to observe potential similar infection patterns to in vivo, and related neutrophil activity. This has been added in the revised manuscript, lines 1435-1439.

      Figure 4B-D: the images presented in Figure 4C are not representative of the averages presented in Figures 4B,D. For instance, the aggregates appear much larger and more elongated in the animal model in Figure 4C, but the animal model and VOC have the colony doubling time (implying same size) in Figure 4B, and same average aggregate elongation in Figure 4D.

      The images in Figure 4C were selected to illustrate the elongation of colonies quantified in Figure 4D. The elongation angles are consistent between both images and align with the channel orientation. Representative images of colony expansion over time, corresponding to Figure 4A and 4B, are provided in Figure S4A.

      Figures 4E-F: dextran does not appear to diffuse in the VOC in response to histamine in these images, yet there is a significant increase in histamine-induced permeability in Figure 4F. Dotted lines should be used to indicate vessel walls for histamine, and/or a more representative image should be selected. A control set of images should also be included for comparison.

      We thank Reviewer #3 for the insightful comment. We confirm that we have carefully selected representative images for the histamine condition and adjusted them to display the same range of gray levels. The apparent increase in permeability with histamine is explained by a slight rise in background fluorescence, combined with the smaller channel size shown in Figure 4E.

      Figure S4 title is a duplicate of Figure S5 and is unrelated to the content of Figure S4. Suggest rewording to mention changes in permeability induced by Nm infection in the VOC and animal model.

      Indeed, the title of Figure S4 did not correspond to its content. We have, thus, changed it in the revised manuscript.

      Line 489 "...our Vessel-on-Chip model has the potential to fully capture the human neutrophil response during vascular infections, in a species-matched microenvironment", is an overstatement. As presented, the VOC model only contains endothelial cells and neutrophils. Many other cell types and structures can affect neutrophil activity. Thus, it is an overstatement to claim that the model can fully capture the human neutrophil response.

      We agree with Reviewer #3 that neutrophil activity is fully recapitulated only in the presence of other cell types, such as platelets, pericytes, macrophages, dendritic cells, and fibroblasts, which secrete important molecules such as cytokines, chemokines, TNF-α, and histamine. In our simplified model, we were able to reconstitute the complex interaction of neutrophils with endothelial cells and with bacteria. The text was modified accordingly.

      Supplemental Figure 6 - Does CD62E staining overlap with sites of Nm attachment

      E-selectin staining does not systematically colocalize with Neisseria meningitidis colonies although bacterial adhesion is required. Its overall induced expression is heterogeneous across the tissue and shows heterogeneity from cell to cell as seen in vivo.

      Line 475, Figure 6E- Phagocytosis of Nm is described, but it is difficult to see. An arrow should be added to make this clear. Perhaps the reference should have been to Figure 6G? Consider changing the colors in Figure 6G away from red/green to be more color-blind friendly.

      Indeed, the correct reference is Figure 6G, where the phagocytosis event is shown at higher magnification. We have changed it in the text. Changing the colors of Figure 6G would also require changing the color code of the whole manuscript, as red has been used for actin and green for Neisseria meningitidis.

      Lines 621-632 - This important discussion point should be reworked. Some suggested references to cite and discuss include PMID: 7913984, 15186399, 17991045, 18640287, 19880493.

      As suggested, we have added the following references to the Discussion (3–7) and expanded on the importance of introducing immune cells to study immune cell-bacteria interactions and the related immune response (L659-678).

      Minor corrections:

      •  Line 8 - suggest "photoablation-generated" instead of "photoablation-based"

      •  Line 57- remove the word "either", or modify the sentence

      •  Sentence on lines 162-165 needs rewording

      •  Lines 204-205- "loss of vascular permeability" should read "increase in vascular permeability"

      •  Line 293- "Measured" shear stress, should be "computed", since it was not directly measured (according to the Materials & Methods)

      •  Line 304- "consistently" should be "consistent"

      •  Fig. 3 legend, second line: replace "our" with "the VoC"

      •  Line 371, change "our" to "the"

      •  Line 415- Figure 5B doesn’t appear to show 2-D data. Is this in Figure S5B? Some clarification is needed. The quantification of Nm vessel association in both the VOC and the animal model should be shown in Figure 5, for direct comparison.

      •  Supplementary Figure 5C: correlation coefficient with statistical significance should be calculated.

      •  Figure 6 title, rephrase to "The infected VOC model"

      •  Line 450, replace "important" with "statistically significant"

      •  Line 459, suggest rephrasing to "bacterial pilus-mediated adhesion"

      •  Line 533- grammar needs correction

      •  Line 589- should be "sheds"

      •  Line 1106- should be "pellet"

      •  Lines 1223-1224 - is the antibody solution introduced into the inlet of the VOC for staining? Please clarify.

      •  Line 1295-unclear why Figure 2B is being referenced here

      All the suggested minor corrections have been taken into account in the revised manuscript.

      References

      (1) Gyohei Egawa, Satoshi Nakamizo, Yohei Natsuaki, Hiromi Doi, Yoshiki Miyachi, and Kenji Kabashima. Intravital analysis of vascular permeability in mice using two-photon microscopy. Scientific Reports, 3(1):1932, Jun 2013. ISSN 2045-2322. doi: 10.1038/srep01932.

      (2) Valeria Manriquez, Pierre Nivoit, Tomas Urbina, Hebert Echenique-Rivera, Keira Melican, Marie-Paule Fernandez-Gerlinger, Patricia Flamant, Taliah Schmitt, Patrick Bruneval, Dorian Obino, and Guillaume Duménil. Colonization of dermal arterioles by Neisseria meningitidis provides a safe haven from neutrophils. Nature Communications, 12(1):4547, Jul 2021. ISSN 2041-1723. doi: 10.1038/s41467-021-24797-z.

      (3) Katherine A. Rhodes, Man Cheong Ma, María A. Rendón, and Magdalene So. Neisseria genes required for persistence identified via in vivo screening of a transposon mutant library. PLOS Pathogens, 18(5):1–30, 05 2022. doi: 10.1371/journal.ppat.1010497.

      (4) Heli Uronen-Hansson, Liana Steeghs, Jennifer Allen, Garth L. J. Dixon, Mohamed Osman, Peter Van Der Ley, Simon Y. C. Wong, Robin Callard, and Nigel Klein. Human dendritic cell activation by Neisseria meningitidis: phagocytosis depends on expression of lipooligosaccharide (LOS) by the bacteria and is required for optimal cytokine production. Cellular Microbiology, 6(7):625–637, 2004. doi: 10.1111/j.1462-5822.2004.00387.x.

      (5) M. C. Jacobsen, P. J. Dusart, K. Kotowicz, M. Bajaj-Elliott, S. L. Hart, N. J. Klein, and G. L. Dixon. A critical role for ATF2 transcription factor in the regulation of E-selectin expression in response to non-endotoxin components of Neisseria meningitidis. Cellular Microbiology, 18(1):66–79, 2016. doi: 10.1111/cmi.12483.

      (6) Andrea Villwock, Corinna Schmitt, Stephanie Schielke, Matthias Frosch, and Oliver Kurzai. Recognition via the class A scavenger receptor modulates cytokine secretion by human dendritic cells after contact with Neisseria meningitidis. Microbes and Infection, 10(10):1158–1165, 2008. ISSN 1286-4579. doi: 10.1016/j.micinf.2008.06.009.

      (7) Audrey Varin, Subhankar Mukhopadhyay, Georges Herbein, and Siamon Gordon. Alternative activation of macrophages by IL-4 impairs phagocytosis of pathogens but potentiates microbial-induced signalling and cytokine secretion. Blood, 115(2):353–362, Jan 2010. ISSN 0006-4971. doi: 10.1182/blood-2009-08-236711.

    1. Bristles should be 0.2 mm in diameter, 10 mm in length and have rounded tips. 2. Handle of the brush should be straight and long enough for the palm to grasp. 3. There should be 3-4 rows, each consisting of 5-12 bristle clusters.

      ① : Bristles should be 0.2 mm in diameter, 10 mm in length and have rounded tips.

      ② : Handle of the brush should be straight and long enough for the palm to grasp.

      ③ : There should be 3-4 rows, each consisting of 5-12 bristle clusters.

    Annotators

    1. str

      Running secrets.choice([1, 2, 3, 4, 5]) should return an int, so I think what is returned here is the type of the sequence's elements.
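
      This can be checked directly with the standard library. A minimal sketch: `secrets.choice` returns one element of a non-empty sequence, so the type of the result follows the element type of that sequence rather than always being `str`.

```python
import secrets

# secrets.choice returns one randomly chosen element of a non-empty
# sequence, so the result has the type of the sequence's elements.
number = secrets.choice([1, 2, 3, 4, 5])
letter = secrets.choice("abcde")

print(type(number).__name__)  # int
print(type(letter).__name__)  # str
```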

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The manuscript by Choi and colleagues investigates the impact of variation in cortical geometry and growth on cortical surface morphology. Specifically, the study uses physical gel models and computational models to evaluate the impact of varying specific features/parameters of the cortical surface. The study makes use of this approach to address the topic of malformations of cortical development and finds that cortical thickness and cortical expansion rate are the drivers of differences in morphogenesis.

      The study is composed of two main sections. First, the authors validate numerical simulation and gel model approaches against real cortical postnatal development in the ferret. Next, the study turns to modelling malformations in cortical development using modified tangential growth rate and cortical thickness parameters in numerical simulations. The findings investigate three genetically linked cortical malformations observed in the human brain to demonstrate the impact of the two physical parameters on folding in the ferret brain.

      This is a tightly presented study that demonstrates a key insight into cortical morphogenesis and the impact of deviations from normal development. The dual physical and computational modeling approach offers the potential for unique insights into mechanisms driving malformations. This study establishes a strong foundation for further work directly probing the development of cortical folding in the ferret brain. One weakness of the current study is that the interpretation of the results in the context of human cortical development is at present indirect, as the modelling results are solely derived from the ferret. However, these modelling approaches demonstrate proof of concept for investigating related alterations more directly in future work through similar approaches to models of the human cerebral cortex.

      We thank the reviewer for the very positive comments. While the current gel and organismal experiments focus on the ferret only, we want to emphasize that our analysis does consider previous observations of human brains and morphologies therein (Tallinen et al., Proc. Natl. Acad. Sci. 2014; Tallinen et al., Nat. Phys. 2016), which we compare and explain. This allows us to analyze the implications of our study more broadly and, with the ferret as motivation, to explain cortical malformations in humans. Further analysis of normal human brain growth using computational and physical gel models can be found in our companion paper (Yin et al., 2025), now also published in eLife: S. Yin, C. Liu, G. P. T. Choi, Y. Jung, K. Heuer, R. Toro, L. Mahadevan, Morphogenesis and morphometry of brain folding patterns across species. eLife, 14, RP107138, 2025. doi:10.7554/eLife.107138

      In future work, we plan to obtain malformed human cortical surface data, which would allow us to further investigate related alterations more directly. We have added a remark on this in the revised manuscript (please see page 8–9).

      Reviewer #2 (Public review):

      Summary:

      Based on MRI data of the ferret (a gyrencephalic non-primate animal, in whom folding happens postnatally), the authors create in vitro physical gel models and in silico numerical simulations of typical cortical gyrification. They then use genetic manipulations of animal models to demonstrate that cortical thickness and expansion rate are primary drivers of atypical morphogenesis. These observations are then used to explain cortical malformations in humans.

      Strengths:

      The paper is very interesting and original, and combines physical gel experiments, numerical simulations, as well as observations in MCD. The figures are informative, and the results appear to have good overall face validity.

      We thank the reviewer for the very positive comments.

      Weaknesses:

      On the other hand, I perceived some lack of quantitative analyses in the different experiments, and currently, there seems to be rather a visual/qualitative interpretation of the different processes and their similarities/differences. Ideally, the authors also quantify local/pointwise surface expansion in the physical and simulation experiments, to more directly compare these processes. Time courses of eg, cortical curvature changes, could also be plotted and compared for those experiments. I had a similar impression about the comparisons between simulation results and human MRI data. Again, face validity appears high, but the comparison appeared mainly qualitative.

      We thank the reviewer for the comments. Besides the visual and qualitative comparisons between the models, we would like to point out that we have included the quantification of the shape difference between the real and simulated ferret brain models via spherical parameterization and the curvature-based shape index as detailed in main text Fig. 4 and SI Section 3. We have also utilized spherical harmonics representations for the comparison between the real and simulated ferret brains at different maximum order N. In our revision, we have included more calculations for the comparison between the real and simulated ferret brains at more time points in the SI (please see SI page 6). As for the comparison between the malformation simulation results and human MRI data in the current work, since the human MRI data are two-dimensional while our computational models are three-dimensional, we focus on the qualitative comparison between them. In future work, we plan to obtain malformed human cortical surface data, from which we can then perform the parameterization-based and curvature-based shape analysis for a more quantitative assessment.

      I felt that MCDs could have been better contextualized in the introduction.

      We thank the reviewer for the comment. In our revision, we have revised the description of MCDs in the introduction (please see page 2).

      Reviewer #1 (Recommendations for the authors):

      The study is beautifully presented and offers an excellent complement to the work presented by Yin et al. In its current form, the malformation portion of the study appears predominantly reliant on the numerical simulations rather than the gel model. It might be helpful, therefore, to further incorporate the results presented in Figure S5 into the main text, as this seems to be a clear application of the physical gel model to modelling malformations. Any additional use of the gel models in the malformation portion of the study would help to further justify the necessity and complementarity of the dual methodological approaches.

      We thank the reviewer for the suggestion. We have moved Fig. S5 and the associated description to the main text in the revised manuscript (please see the newly added Figure 5 on page 6 and the description on page 5–7). In particular, we have included a new section on the physical gel and computational models for ferret cortical malformations right before the section on the neurology of ferret and human cortical malformations.

      One additional consideration is that the analyses in the current study focus entirely on the ferret cortex. Given the emphasis in the title on the human brain, it may be worthwhile to either consider adding additional modelling of the human cortex or to consider modifying the title to more accurately align with the focus of the methods/results.

      We thank the reviewer for the suggestion. While the current gel and organismal experiments focus on the ferret only, we want to emphasize that our analysis does consider previous observations of human brains and morphologies therein (Tallinen et al., Proc. Natl. Acad. Sci. 2014; Tallinen et al., Nat. Phys. 2016), which we compare and explain. This allows us to analyze the implications of our study more broadly and, with the ferret as motivation, to explain cortical malformations in humans. Therefore, we think that the title of the paper is reasonable. To further highlight the connection between the ferret brain simulations and human brain growth, we have included an additional comparison between human brain surface reconstructions adapted from a prior study and the ferret simulation results in the SI (please see SI Section S4 and SI Fig. S5 on page 9–10).

      Two additional minor points:

      Table S1 seems sufficiently critical to the motivation for the study and organization of the results section to justify inclusion in the main text. Of course, I would leave any such minor changes to the discretion of the authors.

      We thank the reviewer for the suggestion. We have moved Table S1 and the associated description to the main text in the revised manuscript (please see Table 1 on page 7).

      Page 7, Column 1: “macacques” → “macaques”.

      We thank the reviewer for pointing out the typo. We have fixed it in the revised manuscript (please see page 8).

      Reviewer #2 (Recommendations for the authors):

      The methods lack details on the human MRI data and patients.

      We thank the reviewer for the comment. Note that the human MRI data and patients were from prior works (Smith et al., Neuron 2018; Johnson et al., Nature 2018; Akula et al., Proc. Natl. Acad. Sci. 2023) and were used for the discussion on cortical malformations in Fig. 6. In the revision, we have included a new subsection in the Methods section and provided more details and references of the MRI data and patients (please see page 9–10).

    1. Reviewer #1 (Public review):

      The authors investigated tactile spatial perception on the breast using discrimination, categorization, and direct localization tasks. They reach four main conclusions:

      (1) The breast has poor tactile spatial resolution.

      This conclusion is based on comparing just noticeable differences, a marker of tactile spatial resolution, across four body regions, two on the breast. The data compellingly support the conclusion; the study outshines other studies on tactile spatial resolution that tend to use problematic measures of tactile resolution, such as two-point-discrimination thresholds. The result will interest researchers in the field and possibly in other fields due to the intriguing tension between the finding and the sexually arousing function of touching the breast.

      The manuscript incorrectly describes the result as poor spatial acuity. Acuity measures the average absolute error, and acuity is good when response biases are absent. Precision relates to the error variance. It is common to see high precision with low acuity or vice versa. Just noticeable differences assess precision or spatial resolution, while points of subjective equality evaluate acuity or bias. Similar confusions between these terms appear throughout the manuscript.

      A paragraph within the next section seems to follow up on this insight by examining the across-participant consistency of the differences in tactile spatial resolution between body parts. To this aim, pairwise rank correlations between body sites are conducted. This analysis raises red flags from a statistical point of view. 1) An ANOVA and its follow-up tests assume no variation in the size of the tested effect but varying base values across participants. Thus, if significant differences between conditions are confirmed by the original statistical analysis, most participants will have better spatial resolution in one condition than the other condition, and the difference between body sites will be similar across participants. 2) Correlations are power-hungry, and non-parametric tests are power-hungry. Thus, the number of participants needed for a reliable rank correlation analysis far exceeds that of the study. In sum, a correlation should emerge between body sites associated with significantly different tactile JNDs; however, these correlations might only be significant for body sites with pronounced differences due to the sample size.

      (2) Larger breasts are associated with lower tactile spatial resolution

      This conclusion is based on a strong correlation between participants' JNDs and the size of their breasts. The depicted correlation convincingly supports the conclusion. The sample size is below that recommended for correlations based on power analyses, but simulations show that spurious correlations of the reported size are extremely unlikely at N=18. Moreover, visual inspection rules out that outliers drive these correlations. Thus, they are convincing. This result is of interest to the field, as it aligns with the hypothesis that nerve fibers are more sparsely distributed across larger body parts.

      (3) The nipple is a unit

      The data do not support this conclusion. The conclusion that the nipple is perceived as a unit is based on poor tactile localization performance for touches on the nipple compared to the areola. The problem is that the localization task is a quadrant identification task with the center being at the nipple. Quadrants for the areola could be significantly larger due to the relative size of the areola and the nipple; the results section seems to suggest this was accounted for when placing the tactile stimuli within the quadrants, but the methods section suggests otherwise. Additionally, the areola has an advantage because of its distance from the nipple, which leads to larger Euclidean distances between the centers of the quadrants than for the nipple. Thus, participants should do better for the areola than for the nipple even if both sites have the same tactile resolution.
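
      The geometric argument can be made concrete with a minimal sketch. It assumes an idealized circular nipple of radius r surrounded by a ring-shaped areola with outer radius R, both split into four quadrants centered on the nipple; the radii below are hypothetical and not taken from the manuscript. The sketch computes the distance between the centroids of two adjacent quadrants at each site.

```python
from math import pi


def quadrant_centroid_offset(inner_radius, outer_radius):
    """Distance of a quadrant's centroid from each axis for a quarter annulus.

    With inner_radius = 0 this reduces to a quarter disk: 4 * R / (3 * pi).
    """
    numerator = outer_radius ** 3 - inner_radius ** 3
    denominator = outer_radius ** 2 - inner_radius ** 2
    return 4.0 * numerator / (3.0 * pi * denominator)


def adjacent_centroid_distance(inner_radius, outer_radius):
    """Euclidean distance between the centroids of two neighboring quadrants."""
    # The centroids sit at (+c, +c) and (-c, +c), so they are 2 * c apart.
    return 2.0 * quadrant_centroid_offset(inner_radius, outer_radius)


# Hypothetical dimensions in cm, chosen only for illustration.
NIPPLE_RADIUS = 0.5
AREOLA_OUTER_RADIUS = 2.0

nipple_separation = adjacent_centroid_distance(0.0, NIPPLE_RADIUS)
areola_separation = adjacent_centroid_distance(NIPPLE_RADIUS, AREOLA_OUTER_RADIUS)

print(f"nipple quadrants: {nipple_separation:.2f} cm apart")
print(f"areola quadrants: {areola_separation:.2f} cm apart")
```

      Under these assumed dimensions, adjacent areola quadrants lie several times farther apart than adjacent nipple quadrants, so equal tactile resolution would still predict better quadrant identification on the areola.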

      To justify the conclusion that the nipple is a unit, additional data would be required. 1) One could compare psychometric curves with the nipple as the center and psychometric curves with a nearby point on the areola as the center. 2) Performance in the quadrant task could be compared for the nipple and an equally sized portion of the areola and tactile locations that have the same distance to the border between quadrants in skin coordinates. 3) Tactile resolution could be directly measured for both body sites using a tactile orientation task with either a two-dot probe or a haptic grating.

      Categorization accuracy in each area was tested against chance using a Monte Carlo test, which is fine, though the calculation of the test statistic, Z, should be reported in the Methods section, as there are several options. Localization accuracies are then compared between areas using a paired t-test. It is a bit confusing that a distribution-approximating test is used in one case and a test that assumes Gaussian distributions in the other, even though the data are Bernoulli/binomially distributed. Sampling-based tests and t-tests are very robust, so these surprising choices should have hardly any effect on the results.

      A correlation based on N=4 participants is dangerously underpowered. A quick simulation shows that correlation coefficients of randomly sampled numbers are uniformly distributed at such a low sample size. This likely spurious correlation is not analyzed, but quite prominently featured in a figure and discussed in the text, which is worrisome.
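
      The point about N=4 can be checked with a short simulation. The sketch below assumes independent Gaussian samples and the Pearson coefficient (the review does not state which coefficient or sampling distribution was simulated); the function names are illustrative.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_null_correlations(n_participants, n_sims, seed=0):
    """Correlations between two unrelated variables, repeated n_sims times."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sims):
        xs = [rng.gauss(0, 1) for _ in range(n_participants)]
        ys = [rng.gauss(0, 1) for _ in range(n_participants)]
        results.append(pearson_r(xs, ys))
    return results


def fraction_strong(rs, threshold=0.8):
    """Share of simulated coefficients with |r| at or above the threshold."""
    return sum(abs(r) >= threshold for r in rs) / len(rs)


rs_n4 = simulate_null_correlations(4, 20000)
rs_n18 = simulate_null_correlations(18, 20000)

print("N=4 :", fraction_strong(rs_n4))
print("N=18:", fraction_strong(rs_n18))
```

      With four participants, roughly one in five null data sets yields |r| of at least 0.8, consistent with a uniform distribution of r on [-1, 1], whereas at N=18 such values are very rare.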

      (4) Localization of tactile events on the breast is biased towards the nipple

      The conclusion that tactile percepts are drawn toward the nipple is based on localization biases for tactile stimuli on the breast compared to the back. Unfortunately, the way participants reported the tactile locations introduces a major confound. Participants indicated the perceived locations of the tactile stimulus on 3D models of these body parts. The nipple is a highly distinctive and cognitively represented landmark, far more so than the scapula, making it very likely that responses were biased toward the nipple regardless of the actual percepts. One imperfect but better alternative would have been to ask participants to identify locations on a neutral grey patch and help them relate this patch to their skin by repeatedly tracing its outline on the skin.

      Participants also saw their localization responses for the previously touched locations. This is unlikely to induce bias towards the nipple, but it renders any estimate of the size and variance of the errors unreliable. Participants will always make sure that the marked locations are sufficiently distant from each other.

      The statistical analysis is again a homebrew solution and hard to follow. It remains unclear why standard and straightforward measures of bias, such as regressing reported against actual locations, were not used.

      Null-hypothesis significance testing only lets scientists either reject the null hypothesis or not. The latter does NOT mean the null hypothesis is true, i.e., it can never be concluded that there is no effect. This rule applies to every NHST test. However, it raises particular concerns with distribution tests. The only conclusion possible is that the data are unlikely to come from a population with the tested distribution; these tests do not provide insight into the actual distribution of the data, regardless of whether the result is significant or not.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The statistically adequate way of testing the biases is a hierarchical regression model (LMM) with a distance of the physical location from the nipple as a predictor, and a distance of the reported location from the nipple as a dependent variable. Either variable can be unsigned or signed for greater power, for example, coding the lateral breast as negative and the medial breast as positive. The bias will show in regression coefficients smaller than 1.

      Thank you for this suggestion. We have subsequently replaced the relevant ANOVA analyses with LMM analyses. Specifically, we use an LMM for breast and back separately to show the different effects of distance, then use a combined LMM to compare the interaction. Finally, we use an LMM to assess the differences between precision and bias on the back and breast. The new analysis confirms earlier statements and does not change the results/interpretation of the data.
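
      The bias measure described in the reviewer's comment can be illustrated with simulated data. This is a minimal sketch, not the authors' analysis: it replaces the full LMM with a simpler two-stage approximation (an ordinary least-squares slope per participant, then the group mean), and all numbers (10 participants, a true slope near 0.7, 1 cm response noise) are hypothetical.

```python
import random
from statistics import mean


def ols_slope_intercept(xs, ys):
    """Ordinary least-squares fit of ys on xs; returns (slope, intercept)."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulate_participant(rng, true_slope, noise_sd, distances, n_repeats):
    """Reported signed distances = compressed actual distances plus noise."""
    actual, reported = [], []
    for d in distances:
        for _ in range(n_repeats):
            actual.append(d)
            reported.append(true_slope * d + rng.gauss(0, noise_sd))
    return actual, reported


rng = random.Random(1)
# Hypothetical signed distances from the nipple in cm:
# lateral breast coded as negative, medial breast as positive.
signed_distances_cm = [-6, -4, -2, 2, 4, 6]

participant_slopes = []
for _ in range(10):  # hypothetical number of participants
    true_slope = rng.gauss(0.7, 0.05)  # hypothetical compression toward the nipple
    actual, reported = simulate_participant(
        rng, true_slope, 1.0, signed_distances_cm, 5
    )
    slope, _intercept = ols_slope_intercept(actual, reported)
    participant_slopes.append(slope)

group_slope = mean(participant_slopes)
print(f"group slope: {group_slope:.2f}")
```

      A group slope clearly below 1 indicates that reported locations are compressed toward the nipple; a full LMM would additionally model participant-level variance in a single step.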

      Moreover, any bias towards the nipple could simply be another instance of regression to the mean of the stimulus distribution, given that the tested locations were centered on the nipple. This confound can only be experimentally solved by shifting the distribution of the tested locations. Finally, given that participants indicated the locations on a 3D model of the body part, further experimentation would be required to determine whether there is a perceptual bias towards the nipple or whether the authors merely find a response bias.

      A localization bias toward the nipple in this context does not show that the nipple is the anchor of the breast's tactile coordinate system. The result might simply be an instance of regression to the mean of the stimulus distribution (also known as experimental prior). To convincingly show localization biases towards the nipple, the tested locations should be centered at another location on the breast.

      Another problem is the visual salience of the nipple, even though Blender models were uniformly grey. With this type of direct localization, it is very difficult to distinguish perceptual from response biases even if the regression to the mean problem is solved. There are two solutions to this problem: 1) Varying the uncertainty of the tactile spatial information, for example, by using a pen that exerts lighter pressure. A perceptual bias should be stronger for more uncertain sensory information; a response bias should be the same across conditions. 2) Measure bias with a 2IFC procedure by taking advantage of the fact that sensory information is noisier if the test is presented before the standard.

      We believe that the fact that we explicitly tested two locations with equally distributed test locations, both of which had landmarks, makes this unlikely. Indeed, testing on the back is exactly what the reviewer suggests. It would also be impossible to test this “on another location on the breast” as we are sampling across the whole breast. Moreover, as markers persisted on the model within each block, the participants were generating additional landmarks on each trial. Thus, if there were any regression to the mean, this would be observed for both locations. Nevertheless, we recognize that this test cannot distinguish between a sensory bias towards the nipple and consistent response bias that is always in the direction of the nipple, though to what extent these are the same thing is difficult to disentangle. That said, if we had restricted testing to half of the breast such that the distribution of points was asymmetrical this would allow us to test the hypothesis put forward by the reviewer. We recognize that this is a limitation of the data and have downplayed statements and added caveats accordingly.

      We have changed the appropriate heading and text in the discussion to downplay the finding:

      “Reports are biased towards the nipple”

      “suggesting that the nipple plays a pivotal role in the mental representation of the breast.”

      it might be harder to learn the range of locations on the back given that stimulation is not restricted to an anatomically defined region as it is the case for the breast.

      We apologize for any confusion but the point distribution is identical between tasks, as described in the methods.

      The stability of the JND differences between body parts across subjects is already captured in the analysis of the JNDs; the ANOVA and the post-hoc testing would not be significant if the order were not relatively stable across participants. Thus, it is unclear why this is being evaluated again with reduced power due to improper statistics.

      We apologize for any confusion here. Only one ANOVA with post-hoc testing was performed on the data. The second parenthetical describing the test was perhaps redundant and confusing, so I have removed it.

      “(Error! Reference source not found.A, B, 1-way ANOVA with Tukey’s HSD post-hoc t-test: p = 0.0284)”

      The alternative hypothesis of an ANOVA is that at least one of the mean values is different from the others; adding participants as a factor does not provide evidence for similarity.

      We agree with this statement and have removed the appropriate text.

      The pairwise correlations between body parts seem to be exploratory in nature. Like all exploratory analyses, the question arises of how much potential extra insights outweigh the risk of false positives. It would be hard to generate data with significant differences between several conditions and not find any correlations between pairs of conditions. Thus, the a priori chance of finding a significant correlation is much higher than what a correction accounts for.

      We broadly agree with this statement. However, we believe that the analyses were important to determine if participants were systematically more or less acute across body parts. Moreover, both the fact that we actually did not observe any other significant relationships and that we performed post-hoc correction imply that no false positives were observed. Indeed, for the one relationship that was observed to be a false positive, we would need to assume an FDR over 10x higher than that required by the existing post-hoc correction, which implies a true relationship.

      If the JND at mid breast (measured with locations centered at the nipple) is roughly the same size as the nipple, it is not surprising that participants have difficulty with the categorical localization task on the nipple but perform better than chance on the significantly larger areola.

      We agree that it is not surprising given the previously shown data; however, the initial finding is surprising to many, and this experiment serves to reinforce the previous finding.

      Neither signed nor absolute localization error can be compared to the results of the previous experiments. The JND should be roughly proportional to the variance of the errors.

      We apologize for any confusion; however, we are not comparing the values, merely observing that the results are consistent.

      Reviewer #2 (Public review):

      I had a hard time understanding some parts of the report. What is meant by "broadly no relationship" in line 137?

      We have removed the qualifier to simplify the text.

      It is suggested that spatial expansion (which is correlated with body part size) is related between medial breast and hand - is this to say that women with large hands have large medial breast size? Nipple size was measured, but hand size was not measured, is this correct?

      Correct. We have added text stating this.

      It is furthermore unclear how the authors differentiate medial breast and NAC. The sentence in lines 140-141 seems to imply the two terms are considered the same, as a conclusion about NAC is drawn from a result about the medial breast. This requires clarification.

      Thank you for catching this; we have corrected it in the text.

      Finally, given that the authors suspect that overall localization ability (or attention) may be overshadowed by a size effect, would not an analysis be adequate that integrates both, e.g. a regression with multiple predictors?

      If the reviewer means that participants would be consistently “acute” then we believe that SF1 would have stronger correlations. Consequently, we see no reason to add “overall tactile acuity” as a predictor.

      In the paragraph about testing quadrants of the nipple, it is stated that only 3 of 10 participants barely outperformed chance with a p < 0.01. It is unclear how a significant t-test is an indication of "barely above chance".

      We have adjusted the text to clarify our meaning.

      “On the nipple, however, participants were consistently worse at locating stimuli on the nipple than the breast (paired t-test, t = 3.42, p < 0.01) where only 3 of the 10 participants outperformed chance, though the group as a whole outperformed chance (Error! Reference source not found.B, 36% ± 13%; Z = 5.5, p < 0.01).”

      The final part of the paragraph on nipple quadrants (starting line 176) explains that there was a trend (4 of 10 participants) for lower tactile acuity being related to the inability to differentiate quadrants. It seems to me that such a result would not be expected: The stated hypothesis is that all participants have the same number of tactile sensors in their nipple and areola, independent of NAC size. In this section, participants determine the quadrant of a single touch. Theoretically, all participants should be equally able to perform this task, because they all have the same number of receptors in each quadrant of nipple and areola. Thus, the result in Figure 2C is curious.

      We agree that this result seemingly contradicts observations from the previous experiment; however, we believe that it relates to the distinction between the ability to perform relative distinctions and absolute localizations. In the first experiment, the presentation of two sequential points provides an implicit reference, whereas in the quadrant task there is no reference. With the results of the third experiment in mind, biases towards the nipple would effectively reduce the ability of participants to identify the quadrant. What this result may imply is that the degree of bias is greater for women with greater expansion. We have added text to the discussion to lay this out.

      “This negative trend implicitly contradicts the previous result where one might expect equal performance regardless of size as the location of the stimuli was scaled to the size of the nipple and areola. However, given the absence of a reference point, systematic biases are more likely to occur and thus may reflect a relationship between localization bias and breast size.”

      This section reports an ANOVA (line 193/194) with a factor "participant". This doesn't appear sensible. Please clarify. The factor distance is also unclear; is this a categorical or a continuous variable? Line 400 implies a 6-level factor, but ANOVAs and their factors, respectively, are not described in methods (nor are any of the other statistical approaches).

      We believe this comment has been addressed above with our replacement of the ANOVA with an LMM. We have also added descriptions of the analysis throughout the methods.

      The analysis on imprecision using mean pairwise error (line 199) is unclear: does pairwise refer to x/y or to touch vs. center of the nipple?

      We have clarified this to now read:

      “To measure the imprecision, we computed the mean pairwise distance between each of the reported locations for a given stimulus location and the mean reported location.”
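
      As a minimal sketch of this measure as worded above, imprecision is the average distance of the reported locations from their own centroid, whereas bias is the distance from that centroid to the stimulus location. The coordinates and function names below are hypothetical.

```python
from math import dist


def centroid(points):
    """Mean location of a list of equally dimensioned coordinate tuples."""
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dims))


def imprecision(reported):
    """Mean distance between each reported location and the mean report."""
    center = centroid(reported)
    return sum(dist(p, center) for p in reported) / len(reported)


def bias(reported, stimulus):
    """Distance between the mean reported location and the true stimulus."""
    return dist(centroid(reported), stimulus)


# Hypothetical reports (in cm) for one stimulus location at the origin.
stimulus = (0.0, 0.0)
reported = [(3.0, 0.0), (1.0, 0.0), (2.0, 1.0), (2.0, -1.0)]

print("imprecision:", imprecision(reported))
print("bias:", bias(reported, stimulus))
```

      Keeping the two quantities separate mirrors the distinction between precision and bias raised in the reviews: reports can cluster tightly around a point that is displaced from the stimulus.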

      p8, upper text, what is meant by "relative over-representation of the depth axis"? Does this refer to the breast having depth but the equivalent area on the back not having depth? What are the horizontal planes (probably meant to be singular?) - do you simply mean that depth was ignored for the calculation of errors? This seems to be implied in Figure 3AB.

      This is indeed what we meant. We have attempted to clarify in the text.

      “Importantly, given the relative over-representation of the depth axis for the breast, we only considered angles in the horizontal planes such that the shape of the breast did not influence the results.” Became:

      “Importantly, because the back is a relatively flat surface in comparison to the breast, errors were only computed in the horizontal plane and depth was excluded when computing the angular error.”

      Lines 232-241, I cannot follow the conclusions drawn here. First, it is not clear to a reader what the aim of the presented analyses is: what are you looking for when you analyze the vectors? Second, "vector strength" should be briefly explained in the main text. Third, it is not clear how the final conclusion is drawn. If there is a bias of all locations towards the nipple, then a point closer to the nipple cannot exhibit a large bias, because the nipple is close-by. Therefore, one would expect that points close to the nipple exhibit smaller errors, but this would not imply higher acuity - just less space for localizing anything. The higher acuity conclusion is at odds with the remaining results, isn't it: acuity is low on the outer breast, but even lower at the NAC, so why would it be high in between the two?

      Thank you for pointing out the circular logic. We have replaced this sentence with a more accurate statement.

      “Given these findings, we conclude that the breast has lower tactile acuity than the hand and is instead comparable to the back. Moreover, localization of tactile events to both the back and breast are inaccurate but localizations to the breast are consistently biased towards the nipple.”

      The discussion makes some concrete suggestions for sensors in implants (line 283). It is not clear how the stated numbers were computed. Also, why should the 4 nipple quadrants receive individual sensors if the result here was that participants cannot distinguish these quadrants?

      Thank you for catching this; it should have been 4 sensors for the NAC, not just the nipple. We have fixed this in the text.

      I would find it interesting to know whether participants with small breast measurement delta had breast acuity comparable to the back. Alternatively, it would be interesting to know whether breast and back acuity are comparable in men. Such a result would imply that the torso has uniform acuity overall, but any spatial extension of the breast is unaccounted for. The lowest single participant data points in Figure 1B appear similar, which might support this idea.

      We agree that this is an interesting question and as you point out, the data does indicate that in cases of minimal expansion acuity may be constant on the torso. However, in the comparison of the JNDs, post-hoc testing revealed no significant difference between the back and either breast region. Consequently, subsampling the group would result in the same result. We have added a sentence to the discussion stating this.

      “Consequently, the acuity of the breast is likely determined initially by torso acuity and then any expansion.”

    1. Why do you think that Goodwill believes it necessary to continually innovate?

      To keep up with a growing society and to continue to thrive in the society.

    1. self

      [/ 🧊/ ♖/ hyperpost/ ~/ indyweb/ 📓/ 20/ 25/ 11/ 3/ 🏛️](https://bafybeicbv7b4bpesh5wmnynftywhm2dzrswf6csndh2v4ndu2n3uuex4ny.ipfs.dweb.link/?filename=save%20string%20to%20local%20filesystem%20javascript%20-%20Brave%20Search%20(11_13_2025%208%EF%BC%9A27%EF%BC%9A28%20AM).html}

    1. Reviewer #1 (Public review):

      General assessment of the work:

      In this manuscript, Mohr and Kelly show that the C1 component of the human VEP is correlated with binary choices in a contrast discrimination task, even when the stimulus is kept constant and confounding variables are considered in the analysis. They interpret this as evidence for the role V1 plays during perceptual decision formation. Choice-related signals in single sensory cells are enlightening because they speak to the spatial (and temporal) scale of the brain computations underlying perceptual decision-making. However, similar signals in aggregate measures of neural activity offer a less direct window and thus less insight into these computations. For example, although I am not a VEP specialist, it seems doubtful that the measurements are exclusively picking up (an unbiased selection of) V1 spikes. Moreover, although this is not widely known, there is in fact a long history to this line of work. In 1972, Campbell and Kulikowski ("The Visual Evoked Potential as a function of contrast of a grating pattern" - Journal of Physiology) already showed a similar effect in a contrast detection task (this finding inspired the original Choice Probability analyses in the monkey physiology studies conducted in the early 1990's). Finally, it is not clear to me that there is an interesting alternative hypothesis that is somehow ruled out by these results. Should we really consider that simple visual signals such as spatial contrast are *not* mediated by V1? This seems to fly in the face of well-established anatomy and function of visual circuits. Or should we be open to the idea that VEP measurements are almost completely divorced from task-relevant neural signals? Why would this be an interesting technique then? 

      In sum, while this work reports results in line with several single-cell and VEP studies and perhaps is technically superior in its domain, I find it hard to see how these findings would meaningfully impact our thinking about the neural and computational basis of spatial contrast discrimination.

      Summary of substantive concerns:

      (1) The study of choice probability in V1 cells is more extensive than portrayed in the paper's introduction. In recent years, choice-related activity in V1 has also been studied by Nienborg & Cumming (2014), Goris et al (2017), Jasper et al (2019), Lange et al (2023), and Boundy-Singer et al (2025). These studies paint a complex picture (a mixture of positive, absent, and negative results), but should be mentioned in the paper's introduction.

      (2) The very first study to conduct an analysis of stimulus-conditioned neural activity during a perceptual decision-making task was, in fact, a VEP study: Campbell and Kulikowski (1972). This study never gained the fame it perhaps deserves. But it would be appropriate to weave it into the introduction and motivation of this paper.

      (3) What are interesting alternative hypotheses to be considered here? I don't understand the (somewhat implicit) suggestion here that contrast representations late in the system can somehow be divorced from early representations. If they were, they would not be correlated with stimulus contrast.

      (4) I find the arguments about the timing of the VEP signals somewhat complex and not very compelling, to be honest. It might help if you added a simulation of a process model that illustrated the temporal flow of the neural computations involved in the task. When are sensory signals manifested in V1 activity informing the decision-making process, in your view? And how is your measure of neural activity related to this latent variable? Can you show in a simulation that the combination of this process and linking hypothesis gives rise to inverted U-shaped relationships, as is the case for your data?

    2. Reviewer #2 (Public review):

      Summary:

      Mohr and Kelly report a high-density EEG study in healthy human volunteers in which they test whether correlations between neural activity in the primary visual cortex and choice behavior can be measured non-invasively. Participants performed a contrast discrimination task on large arrays of Gabor gratings presented in the upper left and lower right quadrants of the visual field. The results indicate that single-trial amplitudes of C1, the earliest cortical component of the visual evoked potential in humans, predict forced-choice behavior over and beyond other behavioral and electrophysiological choice-related signals. These results constitute an important advance for our understanding of the nature and flexibility of early visual processing.

      Strengths:

      (1) The findings suggest a previously unsuspected role for aggregate early visual cortex activity in shaping behavioral choices.

      (2) The authors extend well-established methods for assessing covariation between neural signals and behavioral output to non-invasive EEG recordings.

      (3) The effects of initial afferent information in the primary visual cortex on choice behavior are carefully assessed by accounting for a wide range of potential behavioral and electrophysiological confounds.

      (4) Caveats and limitations are transparently addressed and discussed.

      Weaknesses:

      (1) It is not clear whether integration of contrast information across relatively large arrays is a good test case for decision-related information in C1. The authors raise this issue in the Discussion, and I agree that it is all the more striking that they do find C1 choice probability. Nevertheless, I think the choice of task and stimuli should be explained in more detail.

      (2) In a similar vein, while C1 has canonical topographical properties at the grand-average level, these may differ substantially depending on individual anatomy (which the authors did not assess). This means that task-relevant information will be represented to different degrees in individuals' single-trial data. My guess is that this confound was mitigated precisely by choosing relatively extended stimulus arrays. But given the authors' impressive track record on C1 mapping and modeling, I was surprised that the underlying rationale is only roughly outlined. For example, given the topographies shown and the electrode selection procedure employed, I assume that the differences between upper and lower targets are mainly driven by stimulus arms on the main diagonal. Did the authors run pilot experiments with more restricted stimulus arrays? I do not mean to imply that such additional information needs to be detailed in the main article, but it would be worth mentioning.

      (3) Also, the stimulus arrangement disregards known differences in conduction velocity between the upper and lower visual fields. While no such differences are evident from the maximal-electrode averages shown in Figure 1B, it is difficult to assess this issue without single-stimulus VEPs and/or a dedicated latency analysis. The authors touch upon this issue when discussing potential pre-C1 signals emanating from the magnocellular pathway.

      (4) I suspect that most of these issues are at least partly related to a lack of clarity regarding levels of description: the authors often refer to 'information' contained in C1 or, apparently interchangeably, to 'visual representations' before, during, or following C1. However, if I understand correctly, the signal predicting (or predicted by) behavioral choice is much cruder than what an RSA-primed readership may expect, and also cruder than the other choice-predictive signals entered as control variables: namely, a univariate difference score on single-trial data integrated over a 10 ms window determined on the basis of grand-averaged data. I think it is worth clarifying and emphasizing the nature of this signal as the difference of aggregate contrast responses that *can* only be read out at higher levels of the visual system due to the limited extent of horizontal connectivity in V1. I do not think that this diminishes the importance of the findings - if anything, it makes them more remarkable.

      (5) Arguably even more remarkable is the finding that C1 amplitudes themselves appear to be influenced by choice history. The authors address this issue in the Discussion; however, I'm afraid I could not follow their argument regarding preparatory (and differential?) weighting of read-outs across the visual hierarchy. I believe this point is worth developing further, as it bears on the issue of whether C1 modulations are present and ecologically relevant when looking (before and) beyond stimulus-locked averages.

    1. Reviewer #3 (Public review):

      Summary:

      Fengwen Huang et al. used multiple neuroscience techniques (transgenic mice, immunochemistry, bulk calcium recording, neural sensors, a hippocampus-dependent task, optogenetics, chemogenetics, and RNA interference) to elucidate the role of excitatory cholecystokinin-positive pyramidal neurons in the hippocampus in regulating hippocampal functions, including navigation and neuroplasticity.

      Strengths:

      (1) The authors provided the distribution profiles of excitatory cholecystokinin in the dorsal hippocampus via transgenic mice (Ai14::CCK Cre mice), immunochemistry, and retrograde AAV.

      (2) The authors used the neural sensor and light stimulation to monitor the CCK release from the CA3 area, indicating that CCK can be secreted by activation of the excitatory CCK neurons.

      (3) The authors showed that the activity of the excitatory CCK neurons in CA3 is necessary for navigation learning.

      (4) The authors demonstrated that inhibition of the excitatory CCK neurons and knockdown of the CCK gene expression in CA3 impaired the navigation learning and the neuroplasticity of CA3-CA1 projections.

      Weaknesses:

      (1) What is the causal relationship between navigation learning and CCK secretion?

      (2) What is the effect of overexpression of the CCK gene on hippocampal functions?

      (3) What are the functional differences between the excitatory and inhibitory CCK neurons in the hippocampus?

      (4) Does the CCK come from the local CA3 or from the entorhinal cortex (EC) during high-frequency electrical stimulation?

    1. Reviewer #2 (Public review):

      Summary:

      This highly novel and significant manuscript re-analyzes behavioral QTL data derived from morphine locomotor activity in the BXD recombinant inbred panel. The combination of interacting behavioral-pharmacology (morphine and naltrexone) time course data, high-resolution mouse genetic analyses, genetic analysis of gene expression (eQTLs), cross-species analysis with human gene expression and genetic data, and molecular modeling approaches with Bayesian network analysis produces new information on loci modulating morphine locomotor activity.

      Furthermore, the identification of time-wise epistatic interactions between the Oprm1 and Fgf12 loci is highly novel and points to methodological approaches for identifying other epistatic interactions using animal model genetic studies.

      Strengths:

      (1) Use of state-of-the-art genetic tools for mapping behavioral phenotypes in mouse models.

      (2) Adequately powered analysis incorporating both sexes and time course analyses.

      (3) Detection of time and sex-dependent interactions of two QTL loci modulating morphine locomotor activity.

      (4) Identification of putative candidate genes by combined expression and behavioral genetic analyses.

      (5) Use of Bayesian analysis to model causal interactions between multiple genes and behavioral time points.

      Weaknesses:

      (1) There is a need for careful editing of the text and figures to eliminate multiple typographical and other compositional errors.

      (2) There are multiple examples of overstating the possible significance of results that should be corrected or at least directly pointed out as weaknesses in the Discussion. These include:

      a) Assumption that the Oprm1 gene is the causal candidate gene for the major morphine locomotor Chr10 QTL at the early time epochs. Oprm1 is 400,000 bp away from the support interval of the Mor10a QTL locus, and there is no mention as to whether the Oprm1 mRNA eQTL overlaps with Mor10a.

      b) Although the Bayesian analysis of possible complex interactions between Oprm1, Fgf12, other interacting genes, and behaviors is very innovative and produces testable hypotheses, a more straightforward mediation analysis of causal relationships between genotype, gene expression, and phenotype would have added strength to the arguments for the causal role of these individual genes.

      c) The GWAS data analysis for Oprm1 and Fgf12 is incomplete in not mentioning actual significance levels for Oprm1 and perhaps overstating the nominal significance findings for Fgf12.

      Appraisal:

      The authors largely succeeded in reaching goals with novel findings and methodology.

      Significance of Findings:

      This study will likely spur future direct experimental studies to test hypotheses generated by this complex analysis. Additionally, the broad methodological approach incorporating time course genetic analyses may encourage other studies to identify epistatic interactions in mouse genetic studies.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Response to referee comments: RC-2025-03008


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: In this article, the authors used synthetic TALE DNA-binding proteins, tagged with YFP, which were designed to target five specific repeat elements in the Trypanosoma brucei genome, including centromere- and telomere-associated repeats and those of a transposon element. The aim was to detect and identify, using YFP pulldown, specific proteins that bind to these repetitive sequences in T. brucei chromatin. The approach was validated using a TALE protein designed to target the telomere repeat (TelR-TALE), which detected many of the proteins previously implicated in telomeric functions. A TALE protein designed to target the 70 bp repeats that reside adjacent to the VSG genes (70R-TALE) detected proteins that function in DNA repair, and the protein designed to target the 177 bp repeat arrays (177R-TALE) identified kinetochore proteins associated with T. brucei megabase chromosomes, as well as with intermediate and mini-chromosomes, which implies that kinetochore assembly and segregation mechanisms are similar in all T. brucei chromosomes.

      Major comments: Are the key conclusions convincing? The authors reported that they have successfully used TALE-based affinity selection of proteins associated with repetitive sequences in the T. brucei genome. They claimed that this study has provided new information regarding the relevance of the repetitive regions in the genome to chromosome integrity, telomere biology, chromosomal segregation and immune evasion strategies. These conclusions are based on high-quality research, and the study basically merits publication, provided that some major concerns, raised below, are addressed before acceptance for publication. 1. The authors used the TALE-YFP approach to examine the proteome associated with five different repetitive regions of the T. brucei genome and confirmed the binding of TALE-YFP with ChIP-seq analyses. Ultimately, they obtained the list of proteins bound to the synthetic proteins by affinity purification and LC-MS analysis and concluded that these proteins bind to different repetitive regions of the genome. Two control proteins, TRF-YFP and KKT2-YFP, were used to confirm the interactions. However, there is no experiment confirming that the analysis gives insight into the role of any putative or new protein in telomere biology, VSG gene regulation or chromosomal segregation. The proteins that have already been reported by other studies are mentioned. Although the authors discovered many proteins in these repetitive regions, their role is as yet unknown. It is recommended to take one or more of the new putative proteins from the repetitive elements and (1) show whether or not they bind directly to the specific repetitive sequence (e.g., by EMSA); and (2) knock down one or a small sample of the newly discovered proteins, which may shed light on their function at the repetitive region, as a proof of concept.

      Response

      The main request from Referee 1 is for individual evaluation of protein-DNA interaction for a few candidates identified in our TALE-YFP affinity purifications, particularly using EMSA to identify binding to the DNA repeats used for the TALE selection. In our opinion, such an approach would not actually provide the validation anticipated by the reviewer. The power of TALE-YFP affinity selection is that it enriches for protein complexes that associate with the chromatin that coats the target DNA repetitive elements rather than only identifying individual proteins or components of a complex that directly bind to DNA assembled in chromatin.

      The referee suggests we express recombinant proteins and perform EMSA for selected candidates, but many of the identified proteins are unlikely to directly bind to DNA - they are more likely to associate with a combination of features present in DNA and/or chromatin (e.g. specific histone variants or histone post-translational modifications). Of course, a positive result would provide some validation but only IF the tested protein can bind DNA in isolation - thus, a negative result would be uninformative.

      In fact, our finding that KKT proteins are enriched using the 177R-TALE (minichromosome repeat sequence) identifies components of the trypanosome kinetochore known (KKT2) or predicted (KKT3) to directly bind DNA (Marciano et al., 2021; PMID: 34081090), and likewise the TelR-TALE identifies the TRF component that is known to directly associate with telomeric (TTAGGG)n repeats (Reis et al 2018; PMID: 29385523). This provides reassurance on the specificity of the selection, as does the lack of cross selectivity between different TALEs used (see later point 3 below). The enrichment of the respective DNA repeats quantitated in Figure 2B (originally Figure S1) also provides strong evidence for TALE selectivity.

      It is very likely that most of the components enriched on the repetitive elements targeted by our TALE-YFP proteins do not bind repetitive DNA directly. The TRF telomere binding protein is an exception - but it is the only obvious DNA binding protein amongst the many proteins identified as being enriched in our TelR-TALE-YFP and TRF-YFP affinity selections.

      The referee also suggests that follow up experiments using knockdown of the identified proteins found to be enriched on repetitive DNA elements would be informative. In our opinion, this manuscript presents the development of a new methodology previously not applied to trypanosomes, and referee 2 highlights the value of this methodological development which will be relevant for a large community of kinetoplastid researchers. In-depth follow-up analyses would be beyond the scope of this current study but of course will be pursued in future. To be meaningful such knockdown analyses would need to be comprehensive in terms of their phenotypic characterisation (e.g. quantitative effects on chromosome biology and cell cycle progression, rates and mechanism of recombination underlying antigenic variation, etc) - simple RNAi knockdowns would provide information on fitness but little more. This information is already publicly available from genome-wide RNAi screens (www.tritrypDB.org), with further information on protein location available from the genome-wide protein localisation resource (Tryptag.org). Hence basic information is available on all targets selected by the TALEs after RNAi knock down but in-depth follow-up functional analysis of several proteins would require specific targeted assays beyond the scope of this study.

      NonR-TALE-YFP does not have a binding site in the genome, but YFP protein should still be expressed by T. brucei clones with NLS. The authors have to explain why there is no signal detected in the nucleus, while a prominent signal was detected near kDNA (see Fig.2). Why is the expression of YFP in NonR-TALE almost not shown compared to other TALE clones?

      Response

      The NonR-TALE-YFP immunolocalisation signal indeed is apparently located close to the kDNA and away from the nucleus. We are not sure why this is so, but the construct is sequence validated and correct. However, we note that artefactual localisation of proteins fused to a globular eGFP tag, compared to a short linear epitope V5 tag, near to the kinetoplast has been previously reported (Pyrih et al, 2023; PMID: 37669165).

      The expression of NonR-TALE-YFP is shown in Supplementary Fig. S2 in comparison to other TALE proteins. Although it is evident that NonR-TALE-YFP is expressed at lower levels than other TALEs (the different TALEs have different expression levels), it is likely that in each case the TALE proteins would be in relative excess.

      It is possible that the absence of a target sequence for the NonR-TALE-YFP in the nucleus affects its stability and cellular location. Understanding these differences is tangential to the aim of this study.

      However, importantly, NonR-TALE-YFP is not the only control used for specificity in our affinity purifications. Instead, the lack of cross-selection of the same proteins by different TALEs (e.g. TelR-TALE-YFP, 177R-TALE-YFP) and the lack of enrichment of any proteins of interest by the well expressed ingiR-TALE-YFP or 147R-TALE-YFP proteins each provide strong evidence for the specificity of the selection using TALEs, as does the enrichment of similar protein sets following affinity purification of the TelR-TALE-YFP and TRF-YFP proteins which both bind telomeric (TTAGGG)n repeats. Moreover, control affinity purifications to assess background were performed using cells that completely lack an expressed YFP protein, which further supports specificity (Figure 6).

      We have added text to highlight these important points in the revised manuscript:

      Page 8:

      "However, the expression level of NonR-TALE-YFP was lower than other TALE-YFP proteins; this may relate to the lack of DNA binding sites for NonR-TALE-YFP in the nucleus."

      Page 8:

      "NonR-TALE-YFP displayed a diffuse nuclear and cytoplasmic signal; unexpectedly the cytoplasmic signal appeared to be in the vicinity of the kDNA of the kinetoplast (mitochondria). We note that artefactual localisation of some proteins fused to an eGFP tag has previously been observed in T. brucei (Pyrih et al, 2023)."

      Page 10:

      "Moreover, a similar set of enriched proteins was identified in TelR-TALE-YFP affinity purifications whether compared with cells expressing no YFP fusion protein (No-YFP), the NonR-TALE-YFP or the ingiR-TALE-YFP as controls (Fig. S7B, S8A; Tables S3, S4). Thus, the most enriched proteins are specific to TelR-TALE-YFP-associated chromatin rather than to the TALE-YFP synthetic protein module or other chromatin."

      As a proof of concept, the authors showed that the TALE method identified the same enriched interacting partners with TelR-TALE as with TRF-YFP. They also show the same interacting partners for other TALE proteins, whether compared with WT cells or with the NonR-TALE parasites. This may be because NonR-TALE parasites have almost no (or very little) YFP expression (see Fig. S3) compared to other TALE clones and the TRF-YFP clone. To address this concern, a control with proper YFP expression should be included.

      Response

      See response to point 2, but we reiterate that the ingiR-TALE-YFP and 147R-TALE-YFP proteins are well expressed (western; original Fig. S3, now Fig. S2) but few proteins are detected as being enriched or correspond to those enriched in TelR-TALE-YFP or TRF-YFP affinity purifications (see Fig. S9). Therefore, the ingiR-TALE-YFP and 147R-TALE-YFP proteins provide good additional negative controls for specificity as requested. To further reassure the referee we have also included additional volcano plots which compare TelR-TALE-YFP, 70R-TALE-YFP or 177R-TALE-YFP to the ingiR-TALE-YFP affinity selection (new Figure S8). As with No-YFP or NonR-TALE-YFP controls, the use of ingiR-TALE-YFP as a negative control demonstrates that known telomere-associated proteins are enriched in TelR-TALE-YFP affinity purification, RPA subunits are enriched with 70R-TALE-YFP and kinetochore KKT proteins are enriched with 177R-TALE-YFP. These analyses demonstrate specificity in the proteins enriched following affinity purification of our different TALE-YFPs and provide support to strengthen our original findings.

      We now refer to use of No-YFP, NonR-TALE-YFP, and ingiR-TALE-YFP as controls for comparison to TelR-TALE-YFP, 70R-TALE-YFP or 177R-TALE-YFP in several places:

      Page 10:

      "Moreover, a similar set of enriched proteins was identified in TelR-TALE-YFP affinity purifications whether compared with cells expressing no YFP fusion protein (No-YFP), the NonR-TALE-YFP or the ingiR-TALE-YFP as controls (Fig. S7B, S8A; Tables S3, S4)."

      Page 11:

      "Thus, the nuclear ingiR-TALE-YFP provides an additional chromatin-associated negative control for affinity purifications with the TelR-TALE-YFP, 70R-TALE-YFP and 177R-TALE-YFP proteins (Fig. S8)."

      "Proteins identified as being enriched with 70R-TALE-YFP (Figure 6D) were similar in comparisons with either the No-YFP, NonR-TALE-YFP or ingiR-TALE-YFP as negative controls."

      Top Page 12:

      "The same kinetochore proteins were enriched regardless of whether the 177R-TALE proteomics data was compared with No-YFP, NonR-TALE or ingiR-TALE-YFP controls."

      Discussion Page 13:

      "Regardless, the 147R-TALE and ingiR-TALE proteins were well expressed in T. brucei cells, but their affinity selection did not significantly enrich for any relevant proteins. Thus, 147R-TALE and ingiR-TALE provide reassurance for the overall specificity for proteins enriched in TelR-TALE, 70R-TALE and 177R-TALE affinity purifications."

      After the artificial expression of the five repetitive-sequence-binding TALE proteins, the question is whether the TALE proteins compete with the corresponding endogenous proteins. Is there any effect on parasite survival or health, compared to the control, after expression of these five TALE-YFP proteins? It is recommended to add parasite growth curves for all the cultures expressing TALE proteins.

      Response

      Growth curves for cells expressing TelR-TALE-YFP, 177R-TALE-YFP and ingiR-TALE-YFP are now included (New Fig S3A). No deficit in growth was evident while passaging 70R-TALE-YFP, 147R-TALE-YFP, NonR-TALE-YFP cell lines (indeed they grew slightly better than controls).

      The following text has been added page 8:

      "Cell lines expressing representative TALE-YFP proteins displayed no fitness deficit (Fig. S3A)."

      Since the experiments were performed using whole-cell extracts without prior nuclear fractionation, the authors should consider the possibility that some identified proteins may have originated from compartments other than the nucleus. Specifically, the detection of certain binding proteins might reflect sequence homology (or partial homology) between mitochondrial DNA (maxicircles and minicircles) and repetitive regions in the nuclear genome. Additionally, the lack of subcellular separation raises the concern that cytoplasmic proteins could have been co-purified due to whole cell lysis, making it challenging to discern whether the observed proteome truly represents the nuclear interactome.

      Response

      In our experimental design, we confirmed bioinformatically that the repeat sequences targeted were not represented elsewhere in the nuclear or mitochondrial genome (kDNA). The absence of subcellular fractionation could result in some cytoplasmic protein selection, but this is unlikely since each TALE targets a specific DNA sequence but is otherwise identical such that cross-selection of the same contaminating protein set would be anticipated if there was significant non-specific binding. We have previously successfully affinity selected 15 chromatin modifiers and identified associated proteins without major issues concerning cytoplasmic protein contamination (Staneva et al 2021 and 2022; PMID: 34407985 and 36169304). Of course, the possibility that some proteins are contaminants will need to be borne in mind in any future follow-up analysis of proteins of interest that we identified as being enriched on specific types of repetitive element in T. brucei. Proteins that are also detected in negative controls or negative affinity selections, such as No-YFP, NonR-TALE-YFP, ingiR-TALE or 147R-TALE, must be disregarded.

      '6'. Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether? As mentioned earlier, the authors claimed that this study has provided new information concerning telomere biology, chromosomal segregation mechanisms, and immune evasion strategies. But there are no experiments that establish a role for any unknown or known protein in these processes. Thus, it is suggested to select one or two proteins of choice from the list and validate their direct binding to repetitive region(s), and their role in that region of interaction.

      Response

      As highlighted in response to point 1 the suggested validation and follow up experiments may well not be informative and are beyond the scope of the methodological development presented in this manuscript. Referee 2 describes the study in its current form as "a significant conceptual and technical advancement" and "This approach enhances our understanding of chromatin organization in these regions and provides a foundation for investigating the functional roles of associated proteins in parasite biology."

      The Referee's phrase 'validate their direct binding to repetitive region(s)' here may also mean to test if any of the additional proteins that we identified as being enriched with a specific TALE protein actually display enrichment over the repeat regions when examined by an orthogonal method. A key unexpected finding was that kinetochore proteins including KKT2 are enriched in our affinity purifications of the 177R-TALE-YFP that targets 177bp repeats (Figure 6F). By conducting ChIP-seq for the kinetochore-specific protein KKT2 using YFP-KKT2 we confirmed that KKT2 is indeed enriched on 177bp repeat DNA but not flanking DNA (Figure 7). Moreover, several known telomere-associated proteins are detected in our affinity selections of TelR-TALE-YFP (Figure 6B, Fig. S6; see also Reis et al, 2018 Nuc. Acids Res. PMID: 29385523; Weisert et al, 2024 Sci. Reports PMID: 39681615).

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation. The answer to this question depends on what the authors want to present as the achievements of the present study. If the achievement of the paper is the creation of a new tool for discovering new proteins associated with the repeat regions, I recommend that they add proof of direct interactions between a sample of the newly discovered proteins and the relevant repeats, as a proof of concept discussed above. However, if the authors would like to claim that the study achieved new functional insights for these interactions, they will have to expand the study, as mentioned above, to support the proof of concept.

      Response

      See our response to point 1 and the point we labelled '6' above.

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments. I think that they are realistic. If the authors decide to check the capacity of a small sample of proteins (not previously known as repetitive-region-binding proteins) to interact directly with the repeat sequence, it will add substantially to the study (e.g., by EMSA; estimated time: 1 month). If the authors also decide to check the function of at least one such newly detected protein (e.g., by KD), I estimate this will take 3-6 months.

      Response

      As highlighted previously, the proposed EMSA experiment may well be uninformative for protein complex components identified in our study or for isolated proteins that directly bind DNA in the context of a complex and chromatin. RNAi knockdown data and cell location data (as well as developmental expression and orthology data) are already available through tritrypDB.org and tryptag.org.

      Are the data and the methods presented in such a way that they can be reproduced? Yes

      Are the experiments adequately replicated, and statistical analysis adequate? The authors did not mention replicates. There is no statistical analysis mentioned.

      Response

      The figure legends indicate that all volcano plots of TALE affinity selections were derived from three biological replicates. Cutoffs used for significance: P [...]. For ChIP-seq, two biological replicates were analysed for each cell line expressing the specific YFP-tagged protein of interest (TALE or KKT2). This is now stated in the relevant figure legends - apologies for this oversight. The resulting data are available for scrutiny at GEO: GSE295698.

      Minor comments: Specific experimental issues that are easily addressable. The following suggestions can be incorporated: 1. Page 18: in the Materials and Methods section the authors mention four drugs: blasticidin, phleomycin, G418 and hygromycin. It is recommended to state the purpose of using these selective drugs for the parasite. If clonal selection was done, this should also be mentioned.

      Response

      We erroneously added information on several drugs used for selection in our laboratory. In fact, all TALE-YFP constructs carry the bleomycin resistance gene, which we select for using phleomycin. Also, clones were derived by limiting dilution immediately after transfection.

      We have amended the text accordingly:

      Page 17/18:

      "Cell cultures were maintained below 3 x 10^6 cells/ml. Phleomycin 2.5 mg/ml was used to select transformants containing the TALE construct BleoR gene."

      "Electroporated bloodstream cells were added to 30 ml HMI-9 medium and two 10-fold serial dilutions were performed in order to isolate clonal Phleomycin resistant populations from the transfection. 1 ml of transfected cells were plated per well on 24-well plates (1 plate per serial dilution) and incubated at 37°C and 5% CO2 for a minimum of 6 h before adding 1 ml media containing 2X concentration Phleomycin (5 mg/ml) per well."

      In the Methods section the authors mention that there is only one binding site for NonR-TALE in the parasite genome. But in Fig. 1C, the authors show zero binding sites. So, is there one binding site for NonR-TALE-YFP in the genome or zero?

      Response

      We thank the reviewer for pointing out this discrepancy. We have checked the latest Tb427v12 genome assembly for predicted NonR-TALE binding sites and there are no exact matches. We have corrected the text accordingly.

      Page 7:

      "A control NonR-TALE protein was also designed which was predicted to have no target sequence in the T. brucei genome."

      Page 17:

      "A control NonR-TALE predicted to have no recognised target in the T. brucei genome was designed as follows: BLAST searches were used to identify exact matches in the TREU927 reference genome. Candidate sequences with one or more match were discarded."

      The authors used two different anti-GFP antibodies, one from Roche and the other from Thermo Fisher. Why were two different antibodies used for the same protein?

      Response

      We have found that only some anti-GFP antibodies are effective for affinity selection of associated proteins, whereas others are better suited for immunolocalisation. The respective suppliers' antibodies were optimised for each application.

      Page 6: in the introduction, the authors give the number of total VSG genes as 2,634. Is it known how many of them are pseudogenes?

      Response

      This value corresponds to the number reported by Consentino et al. 2021 (PMID: 34541528) for subtelomeric VSGs, which is similar to the value reported by Muller et al 2018 (PMID: 30333624) (2486), both in the same strain of trypanosomes as used by us. Based on the earlier analysis by Cross et al (PMID: 24992042), 80% of the identified VSGs in their study (2584) are pseudogenes. This approximates to the estimation by Consentino of 346/2634 (13%) being fully functional VSG genes at subtelomeres, or 17% when considering VSGs at all genomic locations (433/2872).

      I found several typos throughout the manuscript.

      Response

      Thank you for raising this. We have read through the manuscript several times and hopefully corrected all outstanding typos.

      Fig. 1C: Table: below TOTAL 2nd line: the number should be 1838 (rather than 1828)

      Corrected- thank you.

      • Are prior studies referenced appropriately? Yes

      • Are the text and figures clear and accurate? Yes

      • Do you have suggestions that would help the authors improve the presentation of their data and conclusions? Suggested above

      Reviewer #1 (Significance (Required)):

      Describe the nature and significance of the advance (e.g., conceptual, technical, clinical) for the field: This study represents a significant conceptual and technical advancement by employing a synthetic TALE DNA-binding protein tagged with YFP to selectively identify proteins associated with five distinct repetitive regions of T. brucei chromatin. To the best of my knowledge, it is the first report to utilize TALE-YFP for affinity-based isolation of protein complexes bound to repetitive genomic sequences in T. brucei. This approach enhances our understanding of chromatin organization in these regions and provides a foundation for investigating the functional roles of associated proteins in parasite biology. Importantly, any essential or unique interacting partners identified could serve as potential targets for therapeutic intervention.

      • Place the work in the context of the existing literature (provide references, where appropriate). I agree with the information already described in the submitted manuscript regarding the potential contribution of the resulting data and the established technology to the study of VSG expression, kinetochore mechanisms and telomere biology.

      • State what audience might be interested in and influenced by the reported findings. These findings will be of particular interest to researchers studying the molecular biology of kinetoplastid parasites and other unicellular organisms, as well as scientists investigating chromatin structure and the functional roles of repetitive genomic elements in higher eukaryotes.

      • (1) Define your field of expertise with a few keywords to help the authors contextualize your point of view. (2) Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate. (1) Protein-DNA interactions/ chromatin/ DNA replication/ Trypanosomes (2) None

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary

      Carloni et al. comprehensively analyze which proteins bind repetitive genomic elements in Trypanosoma brucei. For this, they perform mass spectrometry on custom-designed, tagged programmable DNA-binding proteins. After extensively verifying their programmable DNA-binding proteins (using bioinformatic analysis to infer target sites, microscopy to measure localization, ChIP-seq to identify binding sites), they present, among others, two major findings: 1) 14 of the 25 known T. brucei kinetochore proteins are enriched at 177bp repeats. As T. brucei's 177bp repeat-containing intermediate-sized and mini-chromosomes lack centromere repeats but are stable over mitosis, Carloni et al. use their data to hypothesize that a 'rudimentary' kinetochore assembles at the 177bp repeats of these chromosomes to segregate them. 2) 70bp repeats are enriched with the Replication Protein A complex, which, notably, is required for homologous recombination. Homologous recombination is the pathway used for recombination-based antigenic variation of the 70bp-repeat-adjacent variant surface glycoproteins.

      Major Comments

      None. The experiments are well-controlled, claims well-supported, and methods clearly described. Conclusions are convincing.

      Response Thank you for these positive comments.

      Minor Comments

      1) Fig. 2 - I couldn't find an uncropped version showing multiple cells. If it exists, it should be linked in the legend or main text; Otherwise, this should be added to the supplement.

      Response

      The images presented represent reproducible analyses and were independently verified by two of the authors. Although wider field-of-view images do not provide the resolution to be informative on cell location, as requested we have provided uncropped images in new Fig. S4 for all the cell lines shown in Figure 2A.

      In addition, we have included as supplementary images (Fig. S3B) additional images of TelR-TALE-YFP, 177R-TALE-YFP and ingiR-TALE-YFP localisation to provide additional support for their observed locations presented in Figure 1. The sets of cells and images presented in Figure 2A and in Fig. S3B were prepared and obtained by different authors, independently and reproducibly validating the location of the tagged proteins.

      2) I think Suppl. Fig. 1 is very valuable, as it is a quantification and summary of the ChIP-seq data. I think the authors could consider making this a panel of a main figure. For the main figure, I think the plot could be trimmed down to only show the background and the relevant repeat for each TALE protein, leaving out the non-target repeats. (This relates to minor comment 6.) Also, I believe, it was not explained how background enrichment was calculated.

      Response

      We are grateful for the reviewer's positive view of original Fig. S1 and appreciate the suggestion. We have now moved these analyses to part B of main Figure 2 in the revised manuscript (now Figure 2B). We have also provided additional details in the Methods section on the approaches used to assess background enrichment.

      Page 19:

      Background enrichment calculation

      The genome was divided into 50 bp sliding windows, and each window was annotated based on overlapping genomic features, including CIR147, 177 bp repeats, 70 bp repeats, and telomeric (TTAGGG)n repeats. Windows that did not overlap with any of these annotated repeat elements were defined as "background" regions and used to establish the baseline ChIP-seq signal. Enrichment for each window was calculated using bamCompare, as log₂(IP/Input). To adjust for background signal amongst all samples, enrichment values for each sample were further normalized against the corresponding No-YFP ChIP-seq dataset.
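The procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the function names, the pseudocount, the use of consecutive (non-overlapping) 50 bp windows, and the choice to normalize by subtracting the No-YFP control in log2 space are all assumptions made for the example. In practice the per-window log₂(IP/Input) values would come from bamCompare rather than from raw counts.

```python
import math
from statistics import median


def annotate_windows(genome_length, repeats, size=50):
    """Return (start, overlaps_repeat) for consecutive windows of `size` bp.

    `repeats` is a list of half-open (start, end) intervals for annotated
    repeat elements; windows overlapping none of them are "background".
    """
    windows = []
    for start in range(0, genome_length, size):
        end = min(start + size, genome_length)
        overlaps = any(start < r_end and end > r_start for r_start, r_end in repeats)
        windows.append((start, overlaps))
    return windows


def log2_enrichment(ip, inp, pseudocount=1.0):
    """Per-window log2(IP/Input); the pseudocount (an assumption) avoids log(0)."""
    return [math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(ip, inp)]


def normalize_to_control(sample, control):
    """Subtract the No-YFP control enrichment in log2 space (assumed method)."""
    return [s - c for s, c in zip(sample, control)]


def background_baseline(values, windows):
    """Median enrichment over windows that overlap no annotated repeat."""
    return median(v for v, (_, overlaps) in zip(values, windows) if not overlaps)


# Hypothetical toy data: a 200 bp "genome" with one repeat spanning 60-110.
windows = annotate_windows(200, [(60, 110)])
sample = log2_enrichment([3, 7, 15, 1], [1, 1, 1, 1])
control = log2_enrichment([1, 1, 1, 1], [1, 1, 1, 1])
normalized = normalize_to_control(sample, control)
baseline = background_baseline(normalized, windows)
```

Comparing repeat-window values against `baseline` then gives a simple view of enrichment over background for each TALE protein.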

      Note: While revising the manuscript we also noticed that the script had a normalization error. We have therefore included a corrected version of these analyses as Figure 2B (old Fig. S1).

      3) Generally, I would plot enrichment on a log2 axis. This concerns several figures with ChIP-seq data.

      Response

      Our ChIP-seq enrichment is calculated by bamCompare. The resulting enrichment values are indeed log2 (IP/Input). We have made this clear in the updated figures/legends.

      4) Fig. 4C - The violin plots are very hard to interpret, as the plots are very narrow compared to the line thickness, making it hard to judge the actual volume. For example, in Centromere 5, YFP-KKT2 is less enriched than 147R-TALE over most of the centromere with some peaks of much higher enrichment (as visible in panel B), however, in panel C, it is very hard to see this same information. I'm sure there is some way to present this better, either using a different type of plot or by improving the spacing of the existing plot.

      Response

      We thank the reviewer for this suggestion; we have elected to provide a Split-Violin plot instead. This improves the presentation of the data for each centromere. The original violin plot in Figure 4C has been replaced with this Split-Violin plot (still Figure 4C).

      5) Fig. 6 - The panels are missing an x-axis label (although it is obvious from the plot what is displayed). Maybe the "WT NO-YFP vs" part that is repeated in all the plot titles could be removed from the title and only be part of the x-axis label?

      Response

      In fact, to save space the x-axis was labelled inside each volcano plot, but we neglected to indicate that the values are on a log2 scale indicating enrichment. This has been rectified; see Figure 6 and Figs. S7, S8 and S9.

      6) Fig. 7 - I would like to have a quantification for the examples shown here. In fact, such a quantification already exists in Suppl. Figure 1. I think the relevant plots of that quantification (YFP-KKT2 over 177bp-repeats and centromere-repeats) with some control could be included in Fig. 7 as panel C. This opportunity could be used to show enrichment separated out for intermediate-sized, mini-, and megabase-chromosomes. (relates to minor comment 2 & 8)

      Response

      The CIR147 sequence is found exclusively on megabase-sized chromosomes, while the 177 bp repeats are located on intermediate- and mini-sized chromosomes. Due to limitations in the current genome assembly, it is not possible to reliably classify all chromosomes into intermediate- or mini- sized categories based on their length. Therefore, original Supplementary Fig. S1 presented the YFP-KKT2 enrichment over CIR147 and 177 bp repeats as a representative comparison between megabase chromosomes and the remaining chromosomes (corrected version now presented as main Figure 2B). Additionally, to allow direct comparison of YFP-KKT2 enrichment on CIR147 and 177 bp repeats we have included a new plot in Figure 7C which shows the relative enrichment of YFP-KKT2 on these two repeat types.

      We have added the following text, page 12:

      "Taking into account the relative number of CIR147 and 177 bp repeats in the current T. brucei genome (Cosentino et al., 2021; Rabuffo et al., 2024), comparative analyses demonstrated that YFP-KKT2 is enriched on both CIR147 and 177 bp repeats (Figure 7C)."

      7) Suppl. Fig. 8 A - I believe there is a mistake here: KKT5 occurs twice in the plot, the one in the overlap region should be KKT1-4 instead, correct?

      Response

      Thanks for spotting this. It has been corrected.

      8) The way that the authors mapped ChIP-seq data is potentially problematic when analyzing the same repeat type in different regions of the genome. The authors assigned reads that had multiple equally good mapping positions to one of these mapping positions, randomly. This is perfectly fine when analysing repeats by their type, independent of their position on the genome, which is what the authors did for the main conclusions of the work. However, several figures show the same type of repeat at different positions in the genome. Here, the authors risk that enrichment in one region of the genome 'spills' over to all other regions with the same sequence. Particularly, where they show YFP-KKT2 enrichment over intermediate- and mini-chromosomes (Fig. 7) due to the spillover, one cannot be sure to have found KKT2 in both regions. Instead, the authors could analyze only uniquely mapping reads / read-pairs where at least one mate is uniquely mapping. I realize that with this strict filtering, data will be much more sparse. Hence, I would suggest keeping the original plots and adding one more quantification where the enrichment over the whole region (e.g., all 177bp repeats on intermediate-/mini-chromosomes) is plotted using the unique reads (this could even be supplementary). This also applies to Fig. 4 B & C.

      Response

      We thank the reviewer for their thoughtful comments. Repetitive sequences are indeed challenging to analyze accurately, particularly in the context of short-read ChIP-seq data. In our study, we aimed to address YFP-KKT2 enrichment not only over CIR147 repeats but also on 177 bp repeats, using both ChIP-seq and proteomics with synthetic TALE proteins targeted to the different repeat types. We appreciate the referee's suggestion to consider uniquely mapped reads; however, in the updated genome assembly, the 177 bp repeats are frequently immediately followed by long stretches of 70 bp repeats which can span several kilobases. The size and repetitive nature of these regions exceed the resolution limits of ChIP-seq. It is therefore difficult to precisely quantify enrichment across all chromosomes.

      Additionally, the repeat sequences are highly similar, and relying solely on uniquely mapped reads would result in the exclusion of most reads originating from these regions, significantly underestimating the relative signals. To address this, we used Bowtie2 with settings that allow multi-mapping, assigning reads randomly among equivalent mapping positions, but ensuring each read is counted only once. This approach is designed to evenly distribute signal across all repetitive regions and preserve a meaningful average.
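The effect of this mapping strategy can be illustrated with a toy simulation. This is a sketch only and does not reproduce Bowtie2's internal behaviour: the function name, locus names and read counts are hypothetical, and the only property modelled is that each read with several equally good positions is assigned to exactly one of them at random.

```python
import random
from collections import Counter


def assign_multimappers(reads, seed=0):
    """Assign each read to one of its equally good candidate loci at random.

    `reads` is a list in which each element is the list of candidate loci
    for one read. Every read contributes exactly one count.
    """
    rng = random.Random(seed)
    counts = Counter()
    for candidates in reads:
        counts[rng.choice(candidates)] += 1
    return counts


# Hypothetical example: 1,000 reads that align equally well to two copies
# of the same repeat, regardless of which copy they truly came from.
reads = [["repeat_copy_A", "repeat_copy_B"]] * 1000
counts = assign_multimappers(reads)
```

The total signal is preserved and each copy receives roughly half of it, so the per-copy value behaves as an average over all copies of the repeat. It does not show which individual copy the reads originated from, which is the limitation raised in the comment above.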

      Single molecule methods such as DiMeLo (Altemose et al. 2022; PMID: 35396487) will need to be developed for T. brucei to allow more accurate and chromosome specific mapping of kinetochore or telomere protein occupancy at repeat-unique sequence boundaries on individual chromosomes.

      Reviewer #2 (Significance (Required)):

      This work is of high significance for chromosome/centromere biology, parasitology, and the study of antigenic variation. For chromosome/centromere biology, the conceptual advancement of different types of kinetochores for different chromosomes is a novelty, as far as I know. It would certainly be interesting to apply this study as a technical blueprint for other organisms with mini-chromosomes or chromosomes without known centromeric repeats. I can imagine a broad range of labs studying other organisms with comparable chromosomes to take note of and build on this study. For parasitology and the study of antigenic variation, it is crucial to know how intermediate- and mini-chromosomes are stable through cell division, as these chromosomes harbor a large portion of the antigenic repertoire. Moreover, this study also found a novel link between the homologous repair pathway and variant surface glycoproteins, via the 70bp repeats. How and at which stages during the process, 70bp repeats are involved in antigenic variation is an unresolved, and very actively studied, question in the field. Of course, apart from the basic biological research audience, insights into antigenic variation always have the potential for clinical implications, as T. brucei causes sleeping sickness in humans and nagana in cattle. Due to antigenic variation, T. brucei infections can be chronic.

      Response

      Thank you for supporting the novelty and broad interest of our manuscript.

      My field of expertise / Point of view:

      I'm a computer scientist by training and am now a postdoctoral bioinformatician in a molecular parasitology laboratory. The laboratory is working on antigenic variation in T. brucei. The focus of my work is on analyzing sequencing data (such as ChIP-seq data) and algorithmically improving bioinformatic tools.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Summary

      Carloni et al. comprehensively analyze which proteins bind repetitive genomic elements in Trypanosoma brucei. For this, they perform mass spectrometry on custom-designed, tagged programmable DNA-binding proteins. After extensively verifying their programmable DNA-binding proteins (using bioinformatic analysis to infer target sites, microscopy to measure localization, ChIP-seq to identify binding sites), they present, among others, two major findings: 1) 14 of the 25 known T. brucei kinetochore proteins are enriched at 177bp repeats. As T. brucei's 177bp repeat-containing intermediate-sized and mini-chromosomes lack centromere repeats but are stable over mitosis, Carloni et al. use their data to hypothesize that a 'rudimentary' kinetochore assembles at the 177bp repeats of these chromosomes to segregate them. 2) 70bp repeats are enriched with the Replication Protein A complex, which, notably, is required for homologous recombination. Homologous recombination is the pathway used for recombination-based antigenic variation of the 70bp-repeat-adjacent variant surface glycoproteins.

      Major Comments

      None. The experiments are well-controlled, claims well-supported, and methods clearly described. Conclusions are convincing.

      Minor Comments

      1. Fig. 2 - I couldn't find an uncropped version showing multiple cells. If it exists, it should be linked in the legend or main text; Otherwise, this should be added to the supplement.
      2. I think Suppl. Fig. 1 is very valuable, as it is a quantification and summary of the ChIP-seq data. I think the authors could consider making this a panel of a main figure. For the main figure, I think the plot could be trimmed down to only show the background and the relevant repeat for each TALE protein, leaving out the non-target repeats. (This relates to minor comment 6.) Also, I believe, it was not explained how background enrichment was calculated.
      3. Generally, I would plot enrichment on a log2 axis. This concerns several figures with ChIP-seq data.
      4. Fig. 4C - The violin plots are very hard to interpret, as the plots are very narrow compared to the line thickness, making it hard to judge the actual volume. For example, in Centromere 5, YFP-KKT2 is less enriched than 147R-TALE over most of the centromere with some peaks of much higher enrichment (as visible in panel B), however, in panel C, it is very hard to see this same information. I'm sure there is some way to present this better, either using a different type of plot or by improving the spacing of the existing plot.
      5. Fig. 6 - The panels are missing an x-axis label (although it is obvious from the plot what is displayed). Maybe the "WT NO-YFP vs" part that is repeated in all the plot titles could be removed from the title and only be part of the x-axis label?
      6. Fig. 7 - I would like to have a quantification for the examples shown here. In fact, such a quantification already exists in Suppl. Figure 1. I think the relevant plots of that quantification (YFP-KKT2 over 177bp-repeats and centromere-repeats) with some control could be included in Fig. 7 as panel C. This opportunity could be used to show enrichment separated out for intermediate-sized, mini-, and megabase-chromosomes. (relates to minor comment 2 & 8)
      7. Suppl. Fig. 8 A - I believe there is a mistake here: KKT5 occurs twice in the plot, the one in the overlap region should be KKT1-4 instead, correct?
      8. The way that the authors mapped ChIP-seq data is potentially problematic when analyzing the same repeat type in different regions of the genome. The authors assigned reads that had multiple equally good mapping positions to one of these mapping positions, randomly. This is perfectly fine when analyzing repeats by their type, independent of their position on the genome, which is what the authors did for the main conclusions of the work. However, several figures show the same type of repeat at different positions in the genome. Here, the authors risk that enrichment in one region of the genome 'spills' over to all other regions with the same sequence. Particularly, where they show YFP-KKT2 enrichment over intermediate- and mini-chromosomes (Fig. 7) due to the spillover, one cannot be sure to have found KKT2 in both regions. Instead, the authors could analyze only uniquely mapping reads / read-pairs where at least one mate is uniquely mapping. I realize that with this strict filtering, data will be much more sparse. Hence, I would suggest keeping the original plots and adding one more quantification where the enrichment over the whole region (e.g., all 177bp repeats on intermediate-/mini-chromosomes) is plotted using the unique reads (this could even be supplementary). This also applies to Fig. 4 B & C.

      Significance

      This work is of high significance for chromosome/centromere biology, parasitology, and the study of antigenic variation. For chromosome/centromere biology, the conceptual advancement of different types of kinetochores for different chromosomes is a novelty, as far as I know. It would certainly be interesting to apply this study as a technical blueprint for other organisms with mini-chromosomes or chromosomes without known centromeric repeats. I can imagine a broad range of labs studying other organisms with comparable chromosomes to take note of and build on this study. For parasitology and the study of antigenic variation, it is crucial to know how intermediate- and mini-chromosomes are stable through cell division, as these chromosomes harbor a large portion of the antigenic repertoire. Moreover, this study also found a novel link between the homologous repair pathway and variant surface glycoproteins, via the 70bp repeats. How and at which stages during the process, 70bp repeats are involved in antigenic variation is an unresolved, and very actively studied, question in the field. Of course, apart from the basic biological research audience, insights into antigenic variation always have the potential for clinical implications, as T. brucei causes sleeping sickness in humans and nagana in cattle. Due to antigenic variation, T. brucei infections can be chronic.

      My field of expertise / Point of view:

      I'm a computer scientist by training and am now a postdoctoral bioinformatician in a molecular parasitology laboratory. The laboratory is working on antigenic variation in T. brucei. The focus of my work is on analyzing sequencing data (such as ChIP-seq data) and algorithmically improving bioinformatic tools.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      In this article, the authors used synthetic TALE DNA-binding proteins, tagged with YFP, which were designed to target five specific repeat elements in the Trypanosoma brucei genome, including centromere- and telomere-associated repeats and those of a transposon element. The aim was to detect and identify, using YFP pulldown, specific proteins that bind to these repetitive sequences in T. brucei chromatin. Validation of the approach was done using a TALE protein designed to target the telomere repeat (TelR-TALE), which detected many of the proteins previously implicated in telomeric functions. A TALE protein designed to target the 70 bp repeats that reside adjacent to the VSG genes (70R-TALE) detected proteins that function in DNA repair, and the protein designed to target the 177 bp repeat arrays (177R-TALE) identified kinetochore proteins associated with T. brucei megabase chromosomes, as well as with intermediate and mini-chromosomes, which implies that kinetochore assembly and segregation mechanisms are similar in all T. brucei chromosomes.

      Major comments:

      Are the key conclusions convincing?

      The authors report that they have successfully used TALE-based affinity selection of proteins associated with repetitive sequences in the T. brucei genome. They claim that this study provides new information regarding the relevance of the repetitive regions in the genome to chromosome integrity, telomere biology, chromosomal segregation and immune evasion strategies. These conclusions are based on high-quality research, and the work basically merits publication, provided that the major concerns raised below are addressed before acceptance for publication.

      1. The authors used the TALE-YFP approach to examine the proteome associated with five different repetitive regions of the T. brucei genome and confirmed the binding of TALE-YFP with ChIP-seq analyses. Ultimately, they obtained the list of proteins bound to the synthetic proteins by affinity purification and LC-MS analysis and concluded that these proteins bind to different repetitive regions of the genome. Two control proteins, TRF-YFP and KKT2-YFP, were used to confirm the interactions. However, no experiment confirms that the analysis gives insight into the role of any putative or new protein in telomere biology, VSG gene regulation or chromosomal segregation. The proteins that have already been reported by other studies are mentioned. Although the authors discovered many proteins at these repetitive regions, their roles are as yet unknown. It is recommended to take one or more of the new putative proteins from the repetitive elements and (1) show whether or not they bind directly to the specific repetitive sequence (e.g., by EMSA); and (2) knock down one or a small sample of the newly discovered proteins, which may shed light on their function at the repetitive region, as a proof of concept.
      2. NonR-TALE-YFP does not have a binding site in the genome, but YFP protein with an NLS should still be expressed by the T. brucei clones. The authors should explain why no signal is detected in the nucleus, while a prominent signal is detected near the kDNA (see Fig. 2). Why is the expression of YFP in NonR-TALE almost undetectable compared to the other TALE clones?
      3. As a proof of concept, the authors showed that the TALE method identified the same enrichment of interacting partners with TelR-TALE as with TRF-YFP. They also show the same interacting partners for other TALE proteins, whether compared with WT cells or with the NonR-TALE parasites. This may be because NonR-TALE parasites have almost no (or very little) YFP expression (see Fig. S3) compared to the other TALE clones and the TRF-YFP clone. To address this concern, a control with proper YFP expression should be included.
      4. After the artificial expression of the five repetitive-sequence-binding TALE proteins, the question is whether the TALE proteins compete with the corresponding endogenous proteins. Is there any effect on parasite survival or health, compared to the control, after expression of these five TALE-YFP proteins? It is recommended to add parasite growth curves for all the TALE-protein-expressing cultures.
      5. Since the experiments were performed using whole-cell extracts without prior nuclear fractionation, the authors should consider the possibility that some identified proteins may have originated from compartments other than the nucleus. Specifically, the detection of certain binding proteins might reflect sequence homology (or partial homology) between mitochondrial DNA (maxicircles and minicircles) and repetitive regions in the nuclear genome. Additionally, the lack of subcellular separation raises the concern that cytoplasmic proteins could have been co-purified due to whole-cell lysis, making it challenging to discern whether the observed proteome truly represents the nuclear interactome.

      Should the authors qualify some of their claims as preliminary or speculative, or remove them altogether?

      As mentioned earlier, the authors claim that this study has provided new information concerning telomere biology, chromosomal segregation mechanisms, and immune evasion strategies. But no experiments establish a role for any unknown or known protein in these processes. Thus, it is suggested to select one or two proteins of choice from the list and validate their direct binding to the repetitive region(s), and their role in that region of interaction.

      Would additional experiments be essential to support the claims of the paper? Request additional experiments only where necessary for the paper as it is, and do not ask authors to open new lines of experimentation.

      The answer to this question depends on what the authors want to present as the achievements of the present study. If the achievement of the paper is the creation of a new tool for discovering new proteins associated with the repeat regions, I recommend that they add proof of direct interactions between a sample of the newly discovered proteins and the relevant repeats, as the proof of concept discussed above. However, if the authors would like to claim that the study achieved new functional insights into these interactions, they will have to expand the study, as mentioned above, to support the proof of concept.

      Are the suggested experiments realistic in terms of time and resources? It would help if you could add an estimated cost and time investment for substantial experiments.

      I think that they are realistic. If the authors decide to test whether a small sample of proteins (not previously known to bind repetitive regions) interact directly with the repeat sequences, it will add substantially to the study (e.g., by EMSA; estimated time: 1 month). If the authors also decide to test the function of at least one such newly detected protein (e.g., by knockdown), I estimate this will take 3-6 months.

      Are the data and the methods presented in such a way that they can be reproduced?

      Yes

      Are the experiments adequately replicated, and statistical analysis adequate?

      The authors did not mention replicates. There is no statistical analysis mentioned.

      Minor comments:

      Specific experimental issues that are easily addressable.

      The following suggestions can be incorporated:

      1. Page 18: in the Materials and Methods section, the authors mention four drugs: blasticidin, phleomycin, G418 and hygromycin. It is recommended to state the purpose of using these selective drugs for the parasite. If clonal selection has been done, then it should also be mentioned.
      2. In the Methods section the authors mention that there is only one binding site for NonR-TALE in the parasite genome. But in Fig. 1C, the authors show zero binding sites. So, is there one binding site for NonR-TALE-YFP in the genome, or zero?
      3. The authors used two different anti-GFP antibodies, one from Roche and the other from Thermo Fisher. Why were two different antibodies used for the same protein?
      4. Page 6: in the introduction, the authors give the number of total VSG genes as 2,634. Is it known how many of them are pseudogenes?
      5. I found several typos throughout the manuscript.
      6. Fig. 1C: Table: below TOTAL 2nd line: the number should be 1838 (rather than 1828)

      Are prior studies referenced appropriately?

      Yes

      Are the text and figures clear and accurate?

      Yes

      Do you have suggestions that would help the authors improve the presentation of their data and conclusions?

      Suggested above

      Significance

      Describe the nature and significance of the advance (e.g., conceptual, technical, clinical) for the field:

      This study represents a significant conceptual and technical advancement by employing a synthetic TALE DNA-binding protein tagged with YFP to selectively identify proteins associated with five distinct repetitive regions of T. brucei chromatin. To the best of my knowledge, it is the first report to utilize TALE-YFP for affinity-based isolation of protein complexes bound to repetitive genomic sequences in T. brucei. This approach enhances our understanding of chromatin organization in these regions and provides a foundation for investigating the functional roles of associated proteins in parasite biology. Importantly, any essential or unique interacting partners identified could serve as potential targets for therapeutic intervention.

      Place the work in the context of the existing literature (provide references, where appropriate).

      I agree with the information already described in the submitted manuscript regarding the potential contribution of the resulting data and the established technology to the study of VSG expression, kinetochore mechanisms and telomere biology.

      State what audience might be interested in and influenced by the reported findings.

      These findings will be of particular interest to researchers studying the molecular biology of kinetoplastid parasites and other unicellular organisms, as well as scientists investigating chromatin structure and the functional roles of repetitive genomic elements in higher eukaryotes.

      (1) Define your field of expertise with a few keywords to help the authors contextualize your point of view. (2) Indicate if there are any parts of the paper that you do not have sufficient expertise to evaluate.

      1. Protein-DNA interactions/ chromatin/ DNA replication/ Trypanosomes
      2. None
    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1) The authors only report the quality of the classification considering the number of videos used for training, but not considering the number of mice represented or the mouse strain. Therefore, it is unclear if the classification model works equally well in data from all the mouse strains tested, and how many mice are represented in the classifier dataset and validation.

      We agree that strain-level performance is critical for assessing generalizability. In the revision we now report per-strain accuracy and F1 for the grooming classifier, which was trained on videos spanning 60 genetically diverse strains (n = 1100 videos) and evaluated on test-set videos spanning 51 genetically diverse strains (n = 153 videos). Performance is uniform across most strains (median F1 = 0.94, IQR = 0.899–0.956), with only modest declines in albino lines that lack contrast under infrared illumination; this limitation and potential remedies are discussed in the text. The new per-strain metrics are presented in the Supplementary figure (corresponding to Figure 4).
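A per-strain summary of this kind could be computed along the following lines. This is a hedged sketch, not the JABS implementation: the record layout, the function names, frame-level scoring, and the "inclusive" quartile method are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import median, quantiles


def per_strain_f1(records):
    """Compute F1 per strain from (strain, true_label, predicted_label) records.

    Labels are booleans (behavior present or absent). A strain with no
    positives and no predicted positives is assigned F1 = 0.0 by convention.
    """
    tally = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for strain, truth, pred in records:
        counts = tally[strain]  # registers the strain even for true negatives
        if truth and pred:
            counts["tp"] += 1
        elif pred:
            counts["fp"] += 1
        elif truth:
            counts["fn"] += 1
    scores = {}
    for strain, c in tally.items():
        denom = 2 * c["tp"] + c["fp"] + c["fn"]
        scores[strain] = 2 * c["tp"] / denom if denom else 0.0
    return scores


def summarize(scores):
    """Return the median and interquartile range (Q1, Q3) of per-strain F1."""
    values = sorted(scores.values())
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


# Hypothetical labels for two strains.
records = [
    ("strain_A", True, True), ("strain_A", True, True), ("strain_A", False, False),
    ("strain_B", True, True), ("strain_B", False, True), ("strain_B", True, False),
]
scores = per_strain_f1(records)
med, (q1, q3) = summarize(scores)
```

Reporting the distribution rather than a single pooled score makes it easier to spot strains, such as low-contrast albino lines, where the classifier underperforms.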

      (2) The GUI requires pose tracking for classification, but the software provided in JABS does not do pose tracking, so users must do pose tracking using a separate tool. Currently, there is no guidance on the pose tracking recommendations and requirements for usage in JABS. The pose tracking quality directly impacts the classification quality, given that it is used for the feature calculation; therefore, this aspect of the data processing should be more carefully considered and described.

      We have added a section to the methods describing how to use the pose estimation models used in JABS. The reviewer is correct that pose tracking quality will impact classification quality. We recommend that classifiers only be re-used on pose files generated by the same pose models used in the behavior classifier training dataset. We hope that the combination of sharing classifier training data and making a more unified framework for developing and comparing classifiers will get us closer to having foundational behavior classification models that work in many environments. We would also like to emphasize that deviating from our pose model will likely hinder re-use of our shared large datasets in JABS-AI (JABS1200, JABS600, JABS-BxD).

      (3) Many statistical and methodological details are not described in the manuscript, limiting the interpretability of the data presented in Figures 4,7-8. There is no clear methods section describing many of the methods used and equations for the metrics used. As an example, there are no details of the CNN used to benchmark the JABS classifier in Figure 4, and no details of the methods used for the metrics reported in Figure 8.

      We thank the reviewer for bringing this to our attention. We have added a methods section to the manuscript to address this concern. Specifically, we now provide: (1) improved citation visibility of the source of the CNN experiments, so that the reader can locate the architecture information; (2) mathematical formulations for all performance metrics (precision, recall, F1, …) with explicit equations; (3) detailed statistical procedures, including permutation testing methods, power analysis and multiple testing corrections, used throughout Figures 7-8. These additions facilitate reproducibility and proper interpretation of all quantitative results presented in the manuscript.
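For reference, the standard definitions of the named metrics are given below in terms of true positives (TP), false positives (FP) and false negatives (FN). These are the textbook forms; the exact formulations and the unit of evaluation (frames or bouts) used in the manuscript may differ.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
    = \frac{2\,TP}{2\,TP + FP + FN}
\]
```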

      Reviewer #2 (Public review):

      (1) The manuscript as written lacks much-needed context in multiple areas: what are the commercially available solutions, and how do they compare to JABS (at least in terms of features offered, not necessarily performance)? What are other open-source options?

      JABS adds to a list of commercial and open source animal tracking platforms. There are several reviews and resources that cover these technologies. JABS covers hardware, behavior prediction, a shared resource for classifiers, and genetic association studies. We’re not aware of another system that encompasses all these components. Commercial packages such as EthoVision XT and HomeCage Scan give users a ready-made camera-plus-software solution that automatically tracks each mouse and reports simple measures such as distance travelled or time spent in preset zones, but they do not provide open hardware designs, editable behavior classifiers, or any genetics workflow. At the open-source end, the >100 projects catalogued on OpenBehavior and summarised in recent reviews (Luxem et al., 2023; Işık & Ünal 2023) usually cover only one link in the chain—DIY rigs, pose-tracking libraries (e.g., DeepLabCut, SLEAP) or supervised and unsupervised behaviour-classifier pipelines (e.g., SimBA, MARS, JAABA, B-SOiD, DeepEthogram). JABS provides an open source ecosystem that integrates all four: (i) top-down arena hardware with parts list and assembly guide; (ii) an active-learning GUI that produces shareable classifiers; (iii) a public web service that enables sharing of the trained classifier and applies any uploaded classifier to a large and diverse strain survey; and (iv) built-in heritability, genetic-correlation and GWAS reporting. We have added a concise paragraph in the Discussion that cites these resources and makes this end-to-end distinction explicit.

      (2) How does the supervised behavioral classification approach relate to the burgeoning field of unsupervised behavioral clustering (e.g., Keypoint-MoSeq, VAME, B-SOiD)? 

      The reviewer raises an important point about the rapidly evolving landscape of automated behavioral analysis, where supervised and unsupervised approaches offer complementary strengths for different experimental contexts. Unsupervised methods such as Keypoint-MoSeq, VAME, and B-SOiD prioritize motif discovery from unlabeled data but may yield less precise alignments with expert annotations, as evidenced by lower F1 scores in comparative evaluations. Supervised approaches such as ours, by contrast, employ fully supervised classifiers to deliver frame-accurate, behavior-specific scores that align directly with experimental hypotheses. Ultimately, a pragmatic hybrid strategy, starting with unsupervised pilots to identify motifs and transitioning to supervised fine-tuning with minimal labels, can minimize annotation burdens and enhance both discovery and precision in ethological studies. We have added this point to the Discussion section of the manuscript.

      (3) What kind of studies will this combination of open field + pose estimation + supervised classifier be suitable for? What kind of studies is it unsuited for? These are all relevant questions that potential users of this platform will be interested in.

      This approach is suitable for a wide array of neuroscience, genetics, pharmacology, preclinical, and ethology studies. We have published in the domains of action detection for complex behaviors such as grooming, gait and posture, frailty, nociception, and sleep. We feel these tools are indispensable for modern behavior analysis. 

      (4) Throughout the manuscript, I often find it unclear what is supported by the software/GUI and what is not. For example, does the GUI support uploading videos and running pose estimation, or does this need to be done separately? How many of the analyses in Figures 4-6 are accessible within the GUI?

      We have now clarified these. The JABS framework comprises two distinct GUI applications with complementary functionalities. The JABS-AL (active learning) desktop application handles video upload, behavioral annotation, classifier training, and inference. It does not perform pose estimation, which must be completed separately using our pose tracking pipeline (https://github.com/KumarLabJax/mouse-tracking-runtime). For users who do not want to use our pose tracking pipeline, we provide utilities that convert SLEAP output to our JABS pose format. The web-based GUI enables classifier sharing, cloud-based inference on our curated datasets (JABS600, JABS1200), and downstream behavioral statistics and genetic analyses (Figures 4-6). The JABS-AL application also supports CLI (command line interface) operation for batch processing. We have clarified these distinctions and provided a comprehensive workflow diagram in the revised Methods section.

      (5) While the manuscript does a good job of laying out best practices, there is an opportunity to further improve reproducibility for users of the platform. The software seems likely to perform well with perfect setups that adhere to the JABS criteria, but it is very likely that there will be users with suboptimal setups - poorly constructed rigs, insufficient camera quality, etc. It is important, in these cases, to give users feedback at each stage of the pipeline so they can understand if they have succeeded or not. Quality control (QC) metrics should be computed for raw video data (is the video too dark/bright? are there the expected number of frames? etc.), pose estimation outputs (do the tracked points maintain a reasonable skeleton structure; do they actually move around the arena?), and classifier outputs (what is the incidence rate of 1-3 frame behaviors? a high value could indicate issues). In cases where QC metrics are difficult to define (they are basically always difficult to define), diagnostic figures showing snippets of raw data or simple summary statistics (heatmaps of mouse location in the open field) could be utilized to allow users to catch glaring errors before proceeding to the next stage of the pipeline, or to remove data from their analyses if they observe critical issues.

      These are excellent suggestions that align with our vision for improving user experience and data quality assessment. We recognize the critical importance of providing users with comprehensive feedback at each stage of the pipeline to ensure optimal performance across diverse experimental setups. Currently, we provide end-users with tools and recommendations to inspect their own data quality. In our released datasets (Strain Survey OFA and BXD OFA), we provide video-level quality summaries for coverage of our pose estimation models. 

      For behavior classification quality control, we employ two primary strategies to ensure proper operation: (a) outlier manual validation and (b) leveraging known characteristics about behaviors. For each behavior that we predict on datasets, we manually inspect the highest and lowest expressions of this behavior to ensure that the new dataset we applied it to maintains sufficient similarity. For specific behavior classifiers, we utilize known behavioral characteristics to identify potentially compromised predictions. As the reviewer suggested, high incidence rates of 1-3 frame bouts for behaviors that typically last multiple seconds would indicate performance issues.
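
      The short-bout check described above can be sketched as follows. This is a minimal example that assumes per-frame binary predictions (1 = behavior) and a hypothetical threshold of 3 frames; the function names are illustrative and are not part of JABS.

```python
from itertools import groupby


def bout_lengths(predictions, behavior=1):
    """Return the length (in frames) of each consecutive run of `behavior`."""
    return [
        sum(1 for _ in run)
        for label, run in groupby(predictions)
        if label == behavior
    ]


def short_bout_fraction(predictions, max_short_frames=3, behavior=1):
    """Fraction of bouts lasting `max_short_frames` frames or fewer.

    A high value for a behavior that normally lasts seconds may indicate
    unreliable predictions (for example, poor pose quality).
    """
    lengths = bout_lengths(predictions, behavior)
    if not lengths:
        return 0.0
    short = sum(1 for n in lengths if n <= max_short_frames)
    return short / len(lengths)


if __name__ == "__main__":
    frame_predictions = [0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0]
    print(bout_lengths(frame_predictions))
    print(short_bout_fraction(frame_predictions))
```

      What counts as a "high" fraction depends on the behavior and the frame rate, so any cutoff should be chosen per classifier rather than fixed globally.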

      We currently maintain in-house post-processing scripts that handle quality control according to our specific use cases. Future releases of JABS will incorporate generalized versions of these scripts, integrating comprehensive QC capabilities directly into the platform. This will provide users with automated feedback on video quality, pose estimation accuracy, and classifier performance, along with diagnostic visualizations such as movement heatmaps and behavioral summary statistics.

      Reviewer #1 (Recommendations for the authors):

      (1) A weakness of this tool is that it requires pose tracking, but the manuscript does not detail how pose tracking should be done and whether users should expect that the data deposited will help their pose tracking models. There is no specification on how to generate pose tracking that will be compatible with JABS. The classification quality is directly linked to the quality of the pose tracking. The authors should provide more details of the requirements of the pose tracking (skeleton used) and what pose tracking tools are compatible with JABS. In the user website link, I found no such information. Ideally, JABS would be integrated with the pose tracking tool into a single pipeline. If that is not possible, then the utility of this tool relies on more clarity on which pose tracking tools are compatible with JABS.

      The JABS ecosystem was deliberately designed with modularity in mind, separating the pose estimation pipeline from the active learning and classification app (JABS-AL) to offer greater flexibility and scalability for users working across diverse experimental setups. Our pose estimation pipeline is documented in detail within the new Methods subsection, outlining the steps to obtain JABS-compatible keypoints with our recommended runtime (https://github.com/KumarLabJax/mouse-tracking-runtime) and frozen inference models (https://github.com/KumarLabJax/deep-hrnet-mouse). This pipeline is an independent component within the broader JABS workflow, generating skeletonized keypoint data that are then fed into the JABS-AL application for behavior annotation and classifier training.

      By maintaining this separation, users have the option to use their preferred pose tracking tools, such as SLEAP, while ensuring compatibility through provided conversion utilities to the JABS skeleton format. These details, including usage instructions and compatibility guidance, are now thoroughly explained in the newly added pose estimation subsection of our Methods section. This modular design approach ensures that users benefit from best-in-class tracking while retaining the full power and reproducibility of our active learning pipeline.

      (2) The authors should justify why JAABA was chosen to benchmark their classifier. This tool was published in 2013, and there have been other classification tools (e.g., SIMBA) published since then.  

      We appreciate the reviewer’s suggestion regarding SIMBA. However, our comparisons to JAABA and a CNN are based on results from prior work (Geuther, Brian Q., et al. "Action detection using a neural network elucidates the genetics of mouse grooming behavior." Elife 10 (2021): e63207.), where both were used to benchmark performance on our publicly released dataset. In this study, we introduce JABS as a new approach and compare it against those established baselines. While SIMBA may indeed offer competitive performance, we believe the responsibility to demonstrate this lies with SIMBA’s authors, especially given the availability of our dataset for benchmarking.

      (3) I had a lot of trouble understanding the elements of the data calculated in JABS vs outside of JABS. This should be clarified in the manuscript.

      (a) For example, it was not intuitive that pose tracking was required and had to be done separately from the JABS pipeline. The diagrams and figures should more clearly indicate that.

      (b) In section 2.5, are any of those metrics calculated by JABS? Another software, GEMMA, is mentioned, but no citation is provided for this tool. This created ambiguity regarding whether this is an analysis that is separate from JABS or integrated into the pipeline.

      We acknowledge the confusion regarding the delineation between JABS components and external tools, and we have comprehensively addressed this throughout the manuscript. The JABS ecosystem consists of three integrated modules: JABS-DA (data acquisition), JABS-AL (active learning for behavior annotation and classifier training), and JABS-AI (analysis and integration via web application). Pose estimation, while developed by our laboratory, operates as a preprocessing pipeline that generates the keypoint coordinates required for subsequent JABS classifier training and annotation workflows. We have now added a dedicated Methods subsection that explicitly maps each analytical step to its corresponding software component, clearly distinguishing between core JABS modules and external tools (such as GEMMA for genetic analysis). Additionally, we have provided proper citations and code repositories for all external pipelines to ensure complete transparency regarding the computational workflow and enable full reproducibility of our analyses.

      (4) There needs to be clearer explanations of all metrics, methods, and transformations of the data reported.

      (a) There is very little information about the architecture of the classification model that JABS uses.

      (b) There are no details on the CNN used for comparing and benchmarking the classifier in JABS.

      (c) Unclear how the z-scoring of the behavioral data in Figure 7 was implemented.

      (d) There is currently no information on how the metrics in Figure 8 are calculated.

      We have added a comprehensive Methods section that not only addresses the specific concerns raised above but provides complete methodological transparency throughout our study. This expanded section includes detailed descriptions of all computational architectures (including the JABS classifier and grooming benchmark models and metrics), statistical procedures and data transformations (including the z-scoring methodology for Figure 7), downstream genetic analysis (including all measures presented in Figure 8), and preprocessing pipelines. 

      (5) The authors talk about their datasets having visual diversity, but without seeing examples, it is hard to know what they mean by this visual diversity. Ideally, the manuscript would have a supplementary figure with a representation of the variety of setups and visual diversity represented in the datasets used to train the model. This is important so that readers can quickly assess from reading the manuscript if the pre-trained classifier models could be used with the experimental data they have collected.

      The visual diversity of our training datasets has been comprehensively documented in our previous tracking work (https://www.nature.com/articles/s42003-019-0362-1), which systematically demonstrates tracking performance across mice with diverse coat colors (black, agouti, albino, gray, brown, nude, piebald), body sizes including obese mice, and challenging recording conditions with dynamic lighting and complex environments. Notably, Figure 3B in that publication specifically illustrates the robustness across coat colors and body shapes that characterize the visual diversity in our current classifier training data. To address the reviewer's concern and enable readers to quickly assess the applicability of our pre-trained models to their experimental data, we have now added this reference to the manuscript to ground our claims of visual diversity in published evidence.

      (6) All figures have a lot of acronyms used that are not defined in the figure legend. This makes the figures really hard to follow. The figure legends for Figures 1,2, 7, and 9 did not have sufficient information for me to comprehend the figure shown.

      We have fixed this in the manuscript. 

      (7) In the introduction, the authors talk about compression artifacts that can be introduced in camera software defaults. This is very vague without specific examples.

      This is a complex topic that balances the size and quality of video data and is beyond the scope of this paper. We have carefully optimized this parameter and given the user a balanced solution. A more detailed blog post on compression artifacts can be found at our lab’s webpage (https://www.kumarlab.org/2018/11/06/brians-video-compression-tests/). We have also added a comment about keyframes shifting temporal features in the main manuscript. 

      (8) More visuals of the inside of the apparatus should be included as supplementary figures. For example, to see the IR LEDs surrounding the camera.

      We have shared data from JABS as part of several papers, including the tracking paper (Geuther et al 2019) and the grooming, gait and posture, and mouse mass papers. We have also released entire datasets as part of this paper (JABS1800, JABS-BXD). We also have a step-by-step assembly guide that shows the location of the lights, cameras, and other parts (see Methods, the JABS workflow guide, and this PowerPoint file in the GitHub repository: https://github.com/KumarLabJax/JABS-datapipeline/blob/main/Multi-day%20setup%20PowerPoint%20V3.pptx).

      (9) Figure 2 suggests that you could have multiple data acquisition systems simultaneously. Do each require a separate computer? And then these are not synchronized data across all boxes?

      Each JABS-DA unit has its own edge device (Nvidia Jetson). Each system (which we define as multiple JABS-DA arenas associated with one lab/group) can have multiple recording devices (arenas). The system requires only 1 control portal (RPi computer) and can handle as many recording devices as needed (an Nvidia computer with a camera associated with each JABS-DA arena). To collect data, 1 additional computer is needed to visit the web control portal and initiate a recording session. Since this is a web portal, users can use any computer or a tablet. The recording devices are not strictly synchronized but can be controlled in a unified manner.

      (10) The list of parts on GitHub seems incomplete; many part names are not there.

      We thank the referee for bringing this to our attention. We have updated the GitHub repository (and its README), which now links out to the design files.

      (11) The authors should consider adding guidance on how tethers and headstages are expected to impact the use of JABS, as many labs would be doing behavioral experiments combined with brain measurements.

      While our pose estimation model was not specifically trained on tethered animals, published research demonstrates that keypoint detection models maintain robust performance despite the presence of headstages and recording equipment. Once accurate pose coordinates are extracted, the downstream behavior classification pipeline operates independently of the pose estimation method and would remain fully functional. We recommend users validate pose estimation accuracy in their specific experimental setup, as the behavior classification component itself is agnostic to the source of pose coordinates.

      Reviewer #2 (Recommendations for the authors):

      (1) "Using software-defaults will introduce compression artifacts into the video and will affect algorithm performance." Can this be quantified? I imagine most of the performance hit comes from a decrease in pose estimation quality. How does a decrease in pose estimation quality translate to action segmentation? Providing guidelines to potential users (e.g., showing plots of video compression vs classifier performance) would provide valuable information for anyone looking to use this system (and could save many labs countless hours replicating this experiment themselves). A relevant reference for the effect of compression on pose estimation is Mathis, Warren 2018 (bioRxiv): On the inference speed and video-compression robustness of DeepLabCut.

      Since our behavior classification approach depends on features derived from keypoints, changes in keypoint accuracy will affect behavior segmentation accuracy. We agree that it is important to understand this further, particularly in light of the bioRxiv paper the reviewer shared on the effect of compression on pose estimation accuracy. The effect of compression on keypoint and behavior classification is difficult to evaluate concisely, given the number of variables to inspect. Variables that should be investigated include: discrete cosine transform quality (Mathis, Warren experiment), frame size (Mathis, Warren experiment), keyframe interval (new, unique to video data), inter-frame settings (new, unique to video data), behavior of interest, pose models with compression augmentation used in training (https://arxiv.org/pdf/1506.08316?), and type of CNN used (under active development). The simplest recommendation we can make at this time is that compression will affect behavior predictions and that users should be cautious about using our shared classifiers on compressed video data. To show that we are committed to sharing these results as we run those experiments, in related work (a paper accepted at the CV4Animals conference, https://www.cv4animals.com/, which can be downloaded at https://drive.google.com/file/d/1UNQIgCUOqXQh3vcJbM4QuQrq02HudBLD/view) we have already begun to inspect how changing some of these factors affects behavior segmentation performance. In that work, we investigate the robustness of behavior classification across multiple behaviors using different keypoint subsets, and we find that classifiers are relatively stable across keypoint subsets. We are actively working on follow-up efforts to investigate the effect of keypoint noise, CNN model architecture, and the other factors listed above on behavior segmentation tasks.

      (2) The analysis of inter-annotator variability is very interesting. I'm curious how these differences compare to two other types of variability:

      (a) intra-annotator variability; I think this is actually hard to quantify with the presented annotation workflow. If a given annotator re-annotated a set of videos, but using different sparse subsets of the data, it is not possible to disentangle annotator variability versus the effect of training models on different subsets of data. This can only be rigorously quantified if all frames are labeled in each video.

      We propose an alternative approach to behavior classifier development in the text associated with Figure 3C. We do not advocate for high inter-annotator agreement, since individual behavior experts have differing labeling styles (an intuitive understanding of the behavior). Rather, we allow multiple classifiers for the same behavior and allow the end user to prioritize classifiers based on the heritability of the behavior from a classifier.

      (b) In lieu of this, I'd be curious to see the variability in model outputs trained on data from a single annotator, but using different random seeds or train/val splits of the data. This analysis would provide useful null distributions for each annotator and allow for more rigorous statistical arguments about inter-annotator variability. 

      JABS allows the user to use multiple classifiers (random forest, XGBoost). We do not expect the user to carry out hyperparameter tuning or other forms of optimization. We find that the major increase in performance comes from optimizing the size of the window features and folds of cross validation. However, future versions of JABS-AL could enable a complete hyper-parameter scan across seeds and data splits to obtain a null distribution for each annotator. 

      (c) I appreciate the open-sourcing of the video/pose datasets. The authors might also consider publicly releasing their pose estimation and classifier training datasets (i.e., data plus annotations) for use by method developers.

      We thank the referee for acknowledging our commitment to open data sharing practices. Building upon our previously released strain survey dataset, we have now also made our complete classifier training resources publicly available, including the experimental videos, extracted pose coordinates, and behavioral annotations. The repository link has been added to the manuscript to ensure full reproducibility and facilitate community adoption of our methods.  

      (3) More thorough discussion on the limitations of the top-down vs bottom-up camera viewpoint; are there particular scientific questions that are much better suited to bottom-up videos (e.g., questions about paw tremors, etc.).

      Top-down, bottom-up, and multi-view imaging each have a variety of pros and cons. Generally speaking, multi-view imaging will provide the most accurate pose models but requires increased resources for both hardware setup and data processing. Top-down imaging provides the advantage of flexibility in materials, since the floor doesn’t need to be transparent. It also avoids the lighting and reflection issues that can arise with the bottom-up perspective. Since the paws are not occluded from the bottom-up perspective, models should have improved paw keypoint precision, allowing the model to observe more subtle behaviors. However, the appearance of the arena floor will change over time as the mice defecate and urinate. Care must be taken to clean the arena between recordings to ensure transparency is maintained. This doesn’t impact top-down imaging much but will occlude or distort the bottom-up view. Additionally, the inclusion of bedding for longer recordings, which is required by IACUC, will essentially render bottom-up imaging useless because the bedding will completely obscure the mouse. Overall, while bottom-up imaging may provide a precision benefit that greatly improves the detection of subtle motion, top-down imaging is more robust for obtaining consistent imaging across large experiments for longer periods of time.

      (4) More thorough discussion on what kind of experiments would warrant higher spatial or temporal resolution (e.g., investigating slight tremors in a mouse model of neurodegenerative disease might require this greater resolution).

      This is an important topic that deserves its own perspective guide. We try to capture some of this in the paper on specifications, but we only scratch the surface. Overall, there are tradeoffs between frame rate, resolution, color/monochrome, and compression. Labs have collected data at hundreds of frames per second to capture the kinetics of reflexive behavior for pain (Abdus-Saboor lab) or whisking behavior. Labs have also collected data at a low 2.5 frames per second for tracking activity or centroid tracking (see Kumar et al PNAS). The data collection specifications are largely dependent on the behaviors being captured. Our rule of thumb is the Nyquist limit, which states that the data capture rate needs to be at least twice the frequency of the event. For example, certain syntaxes of grooming occur at 7 Hz, so we need at least 14 FPS to capture them. JABS collects data at 30 FPS, which is a good compromise between data load and behavior rate. We use 800x800 pixel resolution, which is a good compromise to capture animal body parts while limiting data size. Thank you for providing the feedback that the field needs guidance on this topic. We will work on creating such guidance documents for video data acquisition parameters to capture animal behavior data for the community as a separate publication.
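
      The rule of thumb above reduces to a one-line calculation. The sketch below uses hypothetical helper names and treats twice the event frequency as the minimum acceptable frame rate; in practice some margin above that bound is advisable.

```python
def min_frame_rate(event_frequency_hz):
    """Minimum capture rate (FPS) suggested by the Nyquist rule of thumb."""
    if event_frequency_hz <= 0:
        raise ValueError("event frequency must be positive")
    return 2.0 * event_frequency_hz


def can_resolve(frame_rate_fps, event_frequency_hz):
    """True if the frame rate meets the Nyquist bound for the event."""
    return frame_rate_fps >= min_frame_rate(event_frequency_hz)


if __name__ == "__main__":
    grooming_hz = 7.0  # grooming syntax frequency quoted in the response
    print(min_frame_rate(grooming_hz))    # minimum FPS for a 7 Hz event
    print(can_resolve(30.0, grooming_hz))  # 30 FPS is the rate quoted for JABS
```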

      (5) References 

      (a) Should add the following ref when JAABA/MARS are referenced: Goodwin et al. 2024, Nat Neuro (SimBA)

      (b) Could also add Bohnslav et al. 2021, eLife (DeepEthogram).

      (c) The SuperAnimal DLC paper (Ye et al. 2024, Nature Comms) is relevant to the introduction/discussion as well.

      We thank the referee for the suggestions. We have added these references.  

      (6) Section 2.2:

      While I appreciate the thoroughness with which the authors investigated environmental differences in the JABS arena vs standard wean cage, this section is quite long and eventually distracted me from the overall flow of the exposition; might be worth considering putting some of the more technical details in the methods/appendix.

      These are important data for adopters of JABS to gain IACUC approval in their home institution. These committees require evidence that any new animal housing environment has been shown to be safe for the animals. In the development of JABS, we spent a significant amount of time addressing the JAX veterinary and IACUC concerns. Therefore, we propose that these data deserve to be in the main text. 

      (7) Section 2.3.1:

      (a) Should again add the DeepEthogram reference here

      (b) Should reference some pose estimation papers: DeepLabCut, SLEAP, Lightning Pose. 

      We thank the referee for the suggestions. We have added these references.  

      (c) "Pose based approach offers the flexibility to use the identified poses for training classifiers for multiple behaviors" - I'm not sure I understand why this wouldn't be possible with the pixel-based approach. Is the concern about the speed of model training? If so, please make this clearer.

      The advantage lies not just in training speed, but in the transferability and generalization of the learned representations. Pose-based approaches create structured, low-dimensional latent embeddings that capture behaviorally relevant features which can be readily repurposed across different behavioral classification tasks, whereas pixel-based methods require retraining the entire feature extraction pipeline for each new behavior. Recent work demonstrates that pose-based models achieve greater data efficiency when fine-tuned for new tasks compared to pixel-based transfer learning approaches [1], and latent behavioral representations can be partitioned into interpretable subspaces that generalize across different experimental contexts [2]. While pixel-based approaches can achieve higher accuracy on specific tasks, they suffer from the "curse of dimensionality" (requiring thousands of pixels vs. 12 pose coordinates per frame) and lack the semantic structure that makes pose-based features inherently reusable for downstream behavioral analysis.

      (1) Ye, Shaokai, et al. "SuperAnimal pretrained pose estimation models for behavioral analysis." Nature communications 15.1 (2024): 5165.

      (2) Whiteway, Matthew R., et al. "Partitioning variability in animal behavioral videos using semi-supervised variational autoencoders." PLoS computational biology 17.9 (2021): e1009439.  

      (d) The pose estimation portion of the pipeline needs more detail. Do users use a pretrained network, or do they need to label their own frames and train their own pose estimator? If the former, does that pre-trained network ship with the software? Is it easy to run inference on new videos from a GUI or scripts? How accurate is it in compliant setups built outside of JAX? How long does it take to process videos?

      We have added guidance on pose estimation to the manuscript (section “2.3.1 Behavior annotation and classifier training” and the Methods section titled “Pose tracking pipeline”).

      (e) The final paragraph describing how to arrive at an optimal classifier is a bit confusing - is this the process that is facilitated by the app, or is this merely a recommendation for best practices? If this is the process the app requires, is it indeed true that multiple annotators are required? While obviously good practice, I imagine there will be many labs that just want a single person to annotate, at least in the beginning prototyping stages. Will the app allow training a model with just a single annotator?

      We have clarified this in the text. 

      (8) Section 2.5:

      (a) This section contained a lot of technical details that I found confusing/opaque, and didn't add much to my overall understanding of the system; sec 2.6 did a good job of clarifying why 2.5 is important. It might be worth motivating 2.5 by including the content of 2.6 first, and moving some of the details of 2.5 to the method/appendix.

      We moved some of the technical details in section 2.5 to the Methods section titled “Genetic analysis”. Furthermore, we have added a few statements to motivate the need for genetic analysis and to explain how the web app, which is introduced in section 2.6, can facilitate it.

      (9) Minor corrections:

      (a) Bottom of first page, "always been behavior quantification task" missing "a".

      (b) "Type" column in Table S2 is undocumented and unused (i.e., all values are the same); consider removing.

      (c) Figure 4B, x-axis: add units.

      (d) Page 8/9: all panel references to Figure S1 are off by one

      We have fixed them in the updated manuscript.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      This paper by Poverlein et al reports the substantial membrane deformation around the oxidative phosphorylation super complex, proposing that this deformation is a key part of super complex formation. I found the paper interesting and well-written.

      We thank the Reviewer for finding our work interesting. 

      Analysis of the bilayer curvature is challenging on the fine lengthscales they have used and produces unexpectedly large energies (Table 1). Additionally, the authors use the mean curvature (Eq. S5) as input to the (uncited, but it seems clear that this is Helfrich) Helfrich Hamiltonian (Eq. S7). If an errant factor of one half has been included with curvature, this would quarter the curvature energy compared to the real energy, due to the squared curvature.

      We thank the Reviewer for raising this important issue. We have now clarified in the SI and main manuscript that we employ the Helfrich model. In our initial implementation, we indeed used the mean curvature H, thereby missing a factor of 2. As the Reviewer correctly noted, this resulted in curvature deformation energies that were underestimated by a factor of ~4. We have now corrected for this effect in the revised analysis and updated Table 1. Importantly, however, this correction does not alter the general conclusions of our work that supercomplex formation relieves membrane strain and stabilizes the system. We have added an additional paragraph in which we discuss the magnitude of the observed bending effects and compare them with previous estimates in the literature:

      SI: 

      “The local mean curvature of the membrane midplane was computed using the Helfrich model (4,5) …”

      (4) W. Helfrich, Elastic properties of lipid bilayers theory and possible experiments. Zeitschrift für Naturforschung 28c, 693-703 (1973).

      (5) F. Campelo et al., Helfrich model of membrane bending: From Gibbs theory of liquid interfaces to membranes as thick anisotropic elastic layers. Advances in Colloid and Interface Science 208, 25-33 (2014).

      Main Text: 

      “which measures the energetic cost of deforming the membrane from a flat geometry (ΔG<sub>curv</sub>) based on the Helfrich model (45, 46). …

      Our analysis suggests that both contributions are substantially reduced upon formation of the SC, with the curvature penalty decreasing by 79.2 ± 5.2 kcal mol<sup>-1</sup> (for a membrane area of ca. 1000 nm<sup>2</sup>) and the thickness penalty by 2.8 ± 2.0 kcal mol<sup>-1</sup> (Table 1).”

      “We note that the magnitude of the estimated bending energies (~10² kcal mol<sup>-1</sup>) (Table 1), while seemingly high at first glance, falls within the range expected for large-scale membrane deformation processes induced by large multi-domain proteins. For example, the Piezo mechanosensitive channel performs roughly 150k<sub>B</sub>T (≈ 90 kcal mol⁻¹) of work to bend the bilayer into its dome-like shape (65). Comparable energies have also been estimated for the nucleation of small membrane pores (66), while vesicle formation typically requires bending energies on the order of 300 kcal mol<sup>-1</sup>, largely independent of vesicle size (67). When normalized by the affected membrane area (~1000 nm<sup>2</sup>), these values correspond to an energy density of approximately 0.1 kcal mol<sup>-1</sup> nm<sup>-2</sup>, which places our estimates within a biophysically reasonable regime. Notably, cryo-EM structures of several supercomplexes show that such assemblies can impose significant curvature on the surrounding bilayer (36, 50, 68), supporting the notion that respiratory chain organization is closely coupled to local membrane deformation. Nevertheless, we expect that the absolute deformation energies may be overestimated, as the continuum Helfrich model neglects molecular-level effects such as lipid tilt and local rearrangements, which can partially relax curvature stresses and reduce the effective bending penalty near protein–membrane interfaces (69, 70).”
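
      The unit conversions quoted in this paragraph can be checked with a short sketch. This is an illustration only, not part of the authors' analysis; the temperature (298.15 K) is an assumption that is not stated in the text, and the function names are hypothetical.

```python
# Molar gas constant in kcal mol^-1 K^-1 (k_B expressed per mole).
R_KCAL_PER_MOL_K = 1.987204e-3


def kbt_to_kcal_per_mol(n_kbt, temperature_k=298.15):
    """Convert an energy given in units of k_B*T to kcal/mol.

    temperature_k is an assumed value; the text does not state it.
    """
    return n_kbt * R_KCAL_PER_MOL_K * temperature_k


def energy_density(energy_kcal_per_mol, area_nm2):
    """Energy per unit membrane area, in kcal mol^-1 nm^-2."""
    return energy_kcal_per_mol / area_nm2


# Work attributed to Piezo in the text: roughly 150 k_B*T.
piezo_work = kbt_to_kcal_per_mol(150)
print(f"150 kBT = {piezo_work:.1f} kcal/mol")
print(f"density over 1000 nm^2 = {energy_density(piezo_work, 1000.0):.3f} kcal/mol/nm^2")
```

      Under this temperature assumption, 150 k<sub>B</sub>T comes out close to the quoted 90 kcal mol<sup>-1</sup>, and dividing by ~1000 nm<sup>2</sup> gives a density close to the quoted 0.1 kcal mol<sup>-1</sup> nm<sup>-2</sup>.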

      The bending modulus used (ca. 5 kcal/mol) is small on the scale of typically observed biological bending moduli. This suggests the curvature energies are indeed much higher even than the high values reported. Some of this may be due to the spontaneous curvature of the lipids and perhaps the effect of the protein modifying the nearby lipids properties.

      The SI initially included an incorrect value for the bending modulus (20 kJ mol<sup>-1</sup> instead of 20k<sub>B</sub>T), which has now been corrected. The revised value is consistent with experimentally reported bending moduli from X-ray scattering measurements, although there remains substantial uncertainty in the precise values across different experimental and computational studies.

      “The bending deformation energy was computed from the mean curvature field H(x,y), assuming a constant bilayer bending modulus κ (taken as 20k<sub>B</sub>T = 11.85 kcal mol<sup>-1</sup> (6)):”

      (6) S. Brown et al., Comparative analysis of bending moduli in one-component membranes via coarsegrained molecular dynamics simulations. Biophysical Journal 124, 1–13 (2025).
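
      The factor-of-four issue raised by the Reviewer can be illustrated numerically. The sketch below is not the authors' implementation: it assumes a square grid, a small-gradient (Monge) height field in which the total curvature 2H is approximated by the Laplacian of the height, zero spontaneous curvature, and a hypothetical Gaussian bump as test geometry. The Helfrich energy is taken as (κ/2)∫(2H)² dA, so substituting H for 2H quarters the result.

```python
import math


def gaussian_bump(n, dx, amplitude, sigma):
    """Hypothetical test surface: a Gaussian bump on an n x n grid (lengths in nm)."""
    c = (n - 1) * dx / 2.0
    return [
        [
            amplitude * math.exp(-((i * dx - c) ** 2 + (j * dx - c) ** 2) / (2 * sigma ** 2))
            for j in range(n)
        ]
        for i in range(n)
    ]


def total_curvature(h, dx):
    """Approximate 2H = c1 + c2 by the 5-point Laplacian at interior grid points."""
    n = len(h)
    values = []
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            lap = (h[i + 1][j] + h[i - 1][j] + h[i][j + 1] + h[i][j - 1] - 4 * h[i][j]) / dx ** 2
            values.append(lap)
    return values


def bending_energy(h, dx, kappa, missing_factor_of_two=False):
    """Helfrich bending energy (kappa/2) * sum((2H)^2 * dA) with zero spontaneous curvature.

    With missing_factor_of_two=True, H is used in place of 2H, which
    reproduces the kind of error described in the response.
    """
    energy = 0.0
    for two_h in total_curvature(h, dx):
        curvature = two_h / 2.0 if missing_factor_of_two else two_h
        energy += 0.5 * kappa * curvature ** 2 * dx * dx
    return energy


kappa = 11.85  # kcal/mol, the value quoted in the text (20 k_B*T)
surface = gaussian_bump(n=41, dx=0.5, amplitude=1.0, sigma=3.0)
correct = bending_energy(surface, 0.5, kappa)
too_small = bending_energy(surface, 0.5, kappa, missing_factor_of_two=True)
print(f"ratio correct / underestimated = {correct / too_small:.2f}")
```

      Because the energy is quadratic in curvature, the ratio is independent of the chosen surface; only the absolute values depend on the geometry and grid.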

      It is unclear how CDL is supporting SC formation if its effect stabilizing the membrane deformation is strong or if it is acting as an electrostatic glue. While this is a weakness for a definite quantification of the effect of CDL on SC formation, the study presents an interesting observation of CDL redistribution and could be an interesting topic for future work.

      We agree with the Reviewer that future studies will be important for investigating the relationship between CDL-induced stabilization of the membrane and its electrostatic effects.

      In summary, the qualitative data presented are interesting (especially the combination of molecular modeling with simpler Monte Carlo modeling aiding broader interpretation of the results). The energies of the membrane deformations are quite large. This might reflect the roles of specific lipids stabilizing those deformations, or the inherent difficulty in characterizing nanometer-scale curvature.

      We thank the Reviewer for appreciating our work and for the help in further improving our findings.

      Reviewer #3 (Public review):

      Summary:

      In this contribution, the authors report atomistic, coarse-grained and lattice simulations to analyze the mechanism of supercomplex (SC) formation in mitochondria. The results highlight the importance of membrane deformation as one of the major driving forces for the SC formation, which is not entirely surprising given prior work on membrane protein assembly, but certainly of major mechanistic significance for the specific systems of interest.

      We thank Reviewer 3 for appreciating the importance of our study. 

      Strengths:

      The combination of complementary approaches, including an interesting (re)analysis of cryo-EM data, is particularly powerful, and might be applicable to the analysis of related systems. The calculations also revealed that SC formation has interesting impacts on the structural and dynamical (motional correlation) properties of the individual protein components, suggesting further functional relevance of SC formation. In the revision, the authors further clarified and quantified their analysis of membrane responses, leading to further insights into membrane contributions. They have also toned down the decomposition of membrane contributions into enthalpic and entropic contributions, which is difficult to do. Overall, the study is rather thorough, highly creative and the impact on the field is expected to be significant.

      Weaknesses:

      Upon revision, I believe the weakness identified in previous work has been largely alleviated.

      We thank the Reviewer for their previous remarks, which allowed us to significantly improve our manuscript.

    1. aiming to augment their own experiences and through that ended up uh augmenting uh what the rest of humanity can do.

      augmenting what the rest of humanity can do

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public review):

      Circannual timing is a phylogenetically widespread phenomenon in long-lived organisms and is central to the seasonal regulation of reproduction, hibernation, migration, fur color changes, body weight, and fat deposition in response to photoperiodic changes. Photoperiodic control of thyroid hormone T3 levels in the hypothalamus dictates this timing. However, the mechanisms that regulate these changes are not fully understood. The study by Stewart et al. reports that hypothalamic iodothyronine deiodinase 3 (Dio3), the major inactivator of the biologically active thyroid hormone T3, plays a critical role in circannual timing in the Djungarian hamster. Overall, the study yields important results for the field and is well-conducted, with the exception of the CRISPR/Cas9 manipulation.

      We appreciate the positive and supportive comment from the Reviewer. We have clarified the oversight in the CRISPR/Cas9 data representation below. Our correction should alleviate the concerns raised.

      Figure 1 lays the foundation for examining circannual timing by establishing the timing of induction, maintenance, and recovery phases of the circannual timer upon exposure of hamsters to short photoperiod (SP) by monitoring morphological and physiological markers. Measures of pelage color, torpor, body mass, plasma glucose, etc, established that the initiation phase occurred by weeks 4-8 in SP, the maintenance by weeks 12-20, and the recovery after week 20, where all morphological and physiological changes started to reverse back to long photoperiod phenotypes.

      The statistical analyses look fine, and the results are unambiguous.

      We thank the Reviewer for recognizing our attempts to highlight the phenomenon of circannual interval timing.

      Their representation could, however, be improved. In Figures 1d and 1e, two different measures are plotted on each graph and differentiated by dots and upward or downward arrowheads. The plots are so small, though, that distinguishing between the direction of the arrows is difficult. Some color coding would make it more reader-friendly. The same comment applies to Figure S4. 

      We have increased the panel size for Figure 1d and 1e. We have also changed the colour of the graphs in Figure 1d and 1e to facilitate the differentiation of the two dependent variables. For the circos plots, we attempted different ways to represent the data. We have opted to keep the figures in their current stage. The overall aim is to provide a ‘gestalt’ view of the timing of changes in transcript expression and highlighted only a few key genes. The whole dataset is provided in the supplementary materials for Reviewer/Reader interrogation.

      The authors went on to profile the transcriptome of the mediobasal and dorsomedial hypothalamus, paraventricular nucleus, and pituitary gland (all known to be involved in seasonal timing) every 4 weeks over the different phases of the circannual interval timer. A number of transcripts displaying seasonal rhythms in expression levels in each of the investigated structures were identified, including transcripts whose expression peaks during each phase. This included two genes of particular interest due to their known modulation of expression in response to photoperiod, Dio3 and Sst, found among the transcripts upregulated during the induction and maintenance phases, respectively. The experiments are technically sound and properly analyzed, revealing interesting candidates. Again, my main issues lie with the representation in the figure. In particular, the authors should clarify what the heatmaps on the right of Figures 1f and 1g represent. I suspect they are simply heatmaps of averaged expression of all genes within a defined category, but a description is missing in the legend, as well as a scale for color coding near the figure.

      We have clarified the heatmap and density maps in the Figure legend. We apologise for the lack of information to describe the figure panels. (see lines 644-648)

      Figure 2 reveals that SP-programmed body mass loss is correlated to increased Dio3-dependent somatostatin (Sst) expression. First, to distinguish whether the body mass loss was controlled by rheostatic mechanisms and not just acute homeostatic changes in energy balance, experiments from hamsters fed ad lib or experiencing an acute food restriction in both LP and SP were tested. Unlike plasma insulin, food restriction had no additional effect on SP-driven epididymal fat mass loss (Figure S7). This clearly establishes a rheostatic control of body mass loss across weeks in SP conditions. Importantly, Sst expression in the mediobasal hypothalamus increased in both ad lib fed or restriction fed SP hamsters and this increase in expression could be reduced by a single subcutaneous injection of active T3, clearly suggesting that increase in Sst expression in SP is due to a decrease of active T3 likely via Dio3 increase in expression in the hypothalamus. The results are unambiguous

      We thank the Reviewer for the supportive and affirmative feedback.

      Figure 3 provides a functional test of Dio3's role in the circannual timer. Mediobasal hypothalamic injections of CRISPR-Cas9 lentiviral vectors expressing two guide RNAs targeting the hamster Dio3 led to a significant reduction in the interval between induction and recovery phases seen in SP as measured by body mass, and diminished the extent of pelage color change by weeks 15-20. In addition, hamsters that failed to respond to SP exposure by decreasing their body mass also had undetectable Dio3 expression in the mediobasal hypothalamus. Together, these data provide strong evidence that Dio3 functions in the circannual timer. I noted, however, a few problems in the way the CRISPR modification of Dio3 in the mediobasal hypothalamus was reported in Figure S8. One is in Figure S8b, where the PAM sites are reported to be 9bp and 11bp downstream of sgRNA1 and sgRNA2, respectively. Is this really the case? If so, I would have expected the experiment to fail to show any effect as PAM sites need to immediately follow the target genomic sequence recognized by the sgRNA for Cas9 to induce a DNA double-stranded break. It seems that each guide contains a 3' NGG sequence that is currently underlined as part of sgRNAs in both Fig S8b and in the method section. If this is not a mistake in reporting the experimental design, I believe that the design is less than optimal and the efficiencies of sgRNAs are rather low, if at all functional.

      We apologize for the oversight; the reporting in Figure S8b was indeed a mistake. The PAM site previously indicated was the ‘secondary PAM site’ (which, as the Reviewer notes, would likely have low efficiency). The primary PAM site is indicated within the gRNA in the figure. We use Adobe Illustrator to generate figures, and during the editing process the layer containing the PAM text was accidentally moved ‘back’ to a lower level. The oversight was not rectified before submission, and we apologise for this unreservedly. We have moved the PAM site text forward to highlight the location of the primary site (i.e., immediately following the gRNA) and have labelled the gRNA and PAM site in the ‘Target region’. The secondary PAM site text was removed to eliminate any confusion.
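
      The adjacency requirement discussed here (for SpCas9, an NGG PAM immediately 3' of the 20-nt protospacer) can be expressed as a simple sequence check. The sketch below is illustrative only: the sequences are made-up toy strings, not the hamster Dio3 sequence or the guides used in the study, the function names are hypothetical, and only the forward strand is scanned.

```python
import re

# A 20-nt protospacer followed directly by an NGG PAM. The lookahead lets
# overlapping candidate sites be reported.
_TARGET = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")


def find_spcas9_targets(seq):
    """Return (start, protospacer, pam) for each forward-strand candidate site."""
    seq = seq.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in _TARGET.finditer(seq)]


def pam_offset(seq, protospacer):
    """Distance in bp from the protospacer 3' end to the nearest downstream NGG.

    0 means the PAM is immediately adjacent (the expected design); a larger
    value means there is a gap; None means no protospacer or no NGG was found.
    """
    seq = seq.upper()
    start = seq.find(protospacer.upper())
    if start == -1:
        return None
    downstream = seq[start + len(protospacer):]
    match = re.search(r"[ACGT]GG", downstream)
    return match.start() if match else None


guide = "ACGT" * 5                               # toy 20-nt protospacer
adjacent = "TTTTT" + guide + "AGG" + "TTTTT"     # PAM directly after the guide
gapped = guide + "T" * 9 + "AGG"                 # PAM 9 bp downstream

print(find_spcas9_targets(adjacent))
print(pam_offset(adjacent, guide), pam_offset(gapped, guide))
```

      A design tool would additionally scan the reverse complement and score off-targets; this sketch only makes the geometric constraint explicit.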

      The authors report efficiencies around 60% (line 325), but how these were obtained is not specified. 

      The efficiencies provided were based on bioinformatic analyses and not on in vivo assays. To avoid any confusion, we have removed the text. The gRNAs were clearly effective at inducing mutations, based on the sequencing analyses.

      Another unclear point is the degree to which the mediobasal hypothalamus was actually mutated. Only one mutated (truncated) sequence in Figure S8c is reported, but I would have expected a range of mutations in different cells of the tissue of interest.

      The tissue punch would include multiple different cells (e.g., neuronal, glial, etc). We agree with the Reviewer that genomic samples from different cells would be included in the sequencing analyses. Given the large mutation in the target region, the gRNA was effective. We have only shown one representative sequence. If the Reviewer would like to see all mutations, we can easily show the other samples.

      Although the authors clearly find a phenotypic effect with their CRISPR manipulation, I suspect that they may have uncovered greater effects with better sgRNA design. These points need some clarification. I would also argue that repeating this experiment with properly designed sgRNAs would provide much stronger support for causally linking Dio3 in circannual timing.

      The gRNAs were designed using the gold-standard tool CHOPCHOP (Labun et al., 2019). If the Reviewer’s concern regarding the design stems from the comment above about the PAM site, this issue has been clarified and there are no remaining concerns about the gRNA design. The major challenge is that Dio3 is a single-exon gene with a very short sequence (approx. 412 bp), so there is limited scope within this sequence to generate gRNAs.

      A proposed schematic model for mechanisms of circannual interval timing is presented in Figure S9. I think this represents a nice summary of the findings put in a broader context and should be presented as a main figure in the manuscript itself rather than being relayed in supplementary materials.

      We agree with the Reviewer’s position and have moved the figure to the main manuscript. The figure is now Figure 4.

      Reviewer #2 (Public review):

      Several animals and plants adjust their physiology and behavior to seasons. These changes are timed to precede the seasonal transitions, maximizing chances of survival and reproduction. The molecular mechanisms used for this process are still unclear. Studies in mammals and birds have shown that the expression of deiodinase type-1, 2, and 3 (Dio1, 2, 3) in the hypothalamus spikes right before the transition to winter phenotypes. Yet, whether this change is required or an unrelated product of the seasonal changes has not been shown, particularly because of the genetic intractability of the animal models used to study seasonality. Here, the authors show for the first time a direct link between Dio3 expression and the modulation of circannual rhythms.

      We appreciate the clear synthesis and support for the manuscript.

      Strengths:

      The work is concise and presents the data in a clear manner. The data is, for the most part, solid and supports the author's main claims. The use of CRISPR is a clear advancement in the field. This is, to my knowledge, the first study showing a clear (i.e., causal) role of Dio3 in the circannual rhythms in mammals. Having established a clear component of the circannual timing and a clean approach to address causality, this study could serve as a blueprint to decipher other components of the timing mechanism. It could also help to enlighten the elusive nature of the upstream regulators, in particular, on how the integration of day length takes place, maybe within the components in the Pars tuberalis, and the regulation of tanycytes.

      We thank the Reviewer for this positive summary.

      Weaknesses:

      Due to the nature of the CRISPR manipulation, the low N number is a clear weakness. This is compensated by the fact that the phenotypes shown here are strong enough. Also, this is the only causal evidence of Dio3's role; thus, additional evidence would have significantly strengthened the author's claims. The use of the non-responsive population of hamsters also helps, but it falls within the realm of correlations.

      We would also like to remind the Reviewer that one Crispr-Cas9 Dio3<sup>cc</sup> treated hamster did not show any mutation in the genome. This hamster was observed to have a change in body mass and pelage colour like controls. This animal provides another positive control.

      We also conducted a statistical power analysis to examine whether n=3 is sufficient for the Dio3<sup>cc</sup> treatment group. Using the expected difference in means and standard deviations and an alpha of 0.05, we regularly observed power (1 − β) > 0.8 across the dependent variables.
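
      To make the reasoning behind such a power calculation concrete, the following Monte Carlo sketch estimates power for a two-group comparison with three animals per group. It is not the authors' calculation: the equal group sizes, the pooled-variance t-test, the normally distributed data, and the standardized effect sizes are all assumptions chosen for illustration. The critical value 2.776 is the two-sided 5% point of Student's t with 4 degrees of freedom.

```python
import random
import statistics


def pooled_t(a, b):
    """Student's t statistic with pooled variance (equal-variance assumption)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5


def simulated_power(effect_size, n_per_group=3, t_crit=2.776, trials=20000, seed=0):
    """Fraction of simulated experiments in which |t| exceeds the critical value.

    effect_size is the difference in means in units of the common standard
    deviation. t_crit must match the degrees of freedom (2 * n_per_group - 2);
    the default is only valid for n_per_group=3.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        if abs(pooled_t(treated, control)) > t_crit:
            hits += 1
    return hits / trials


for d in (0.0, 2.0, 3.0, 5.0):
    print(f"effect size {d}: estimated power {simulated_power(d):.2f}")
```

      With only three animals per group, power above 0.8 requires a very large standardized effect, which is consistent with the response's emphasis on the strength of the observed phenotypes.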

      Additionally, the consequences of the mutations generated by CRISPR are not detailed; it is not clear if the mutations affect the expression of Dio3 or generate a truncation or deletion, resulting in a shorter protein.

      We agree with the Reviewer that transcript and protein assays would strengthen the genome mutation data. Due to the small brain region under investigation, we are limited in the amount of biological material to extract. Dio3 is an intronless gene and very short – approximately 412 base pairs in length. We opted to maximize resources into sequencing the gene as the confirmation of genetic mutation is paramount. Given the large size of the mutation in the treated hamsters, there would be no amplification of transcript or protein translated.

      Reviewer #3 (Public review):

      The authors investigated SP-induced physiological and molecular changes in Djungarian hamsters and the endogenous recovery from it after circa half a year. The study aimed to elucidate the intrinsic mechanism and included nice experiments to distinguish between rheostatic effects on energy state and homeostatic cues driven by an interval timer. It also aimed to elucidate the role of Dio3 by introducing a targeted mutation in the MBH by ICV. The experiments and analyses are sound, and the amount of work is impressive. The impact of this study on the field of seasonal chronobiology is probably high.

      We thank the Reviewer for their positive comments and support for our work.

      Even though the general conclusions are well-founded, I have fundamental criticism concerning 3 points, which I recommend revising:

      (1) The authors talk about a circannual interval timer, but this is no circannual timer. This is a circasemiannual timer. It is important that the authors use precise wording throughout the manuscript.

      We agree with the Reviewer that the change in physiology and behaviour does not approximate a full year (i.e., annual) but only half of the year. We opted to use circannual timer as this term is established in the field (see doi: 10.1177/0748730404266626; doi: 10.1098/rstb.2007.2143). We cannot identify any publication that has used the term ‘semiannual timer’. We do not feel this manuscript is the appropriate place to introduce a new term to the field; we will endeavour to push the field to consider the use of ‘semiannual timer’. A Review or Opinion paper is the best place for this discussion. We hope the Reviewer will understand our position.

      (2) The authors put their results in the context of clocks. For example, line 180/181 seasonal clock. But they have described and investigated an interval timer. A clock must be able to complete a full cycle endogenously (and ideally repeatedly) and not only half of it. In contrast, a timer steers a duration. Thus, it is well possible that a circannual clock mechanism and this circa-semiannual timer of photoperiodic species are 2 completely different mechanisms. The argumentation should be changed accordingly.

      We agree with the Reviewer’s definitions of circannual ‘clock’ and ‘timer’. We were careful to distinguish between the two concepts early in the manuscript (lines 41-46). We have added italics to emphasise the different terms. The use of seasonal clock on line 180/181 was imprecise; we appreciate the Reviewer highlighting our oversight, and the text has been revised. We have also revised the Abstract accordingly.

      (3) The authors chose as animal model the Djungarian hamster, which is a predominantly photoperiodic species and not a circannual species. A photoperiodic species has no circannual clock. That is another reason why it is difficult to draw conclusions from the experiment for circannual clocks. However, the Djungarian hamster is kind of "indifferent" concerning its seasonal timing, since a small fraction of them are indeed able to cycle (Anchordoquy HC, Lynch GR (2000), Evidence of an annual rhythm in a small proportion of Siberian hamsters exposed to chronic short days. J Biol Rhythms 15:122-125.). Nevertheless, the proportion is too small to suggest that the findings in the current study might reflect part of the circannual timing. Therefore, the authors should make a clear distinction between timers and clocks, as well as between circa-annual and circa-semiannual durations/periods.

      This comment is not clear to us. The Reviewer states that the hamsters are not a circannual species, but then highlights one study that shows circannual rhythmicity. We agree that circannual rhythmicity in Djungarian hamsters is dependent on the physiological process under investigation (e.g. body mass versus reproduction) and that photoperiodic response systems either dampen or mask robust cycles. We have corrected the text oversight highlighted above, and the manuscript is focused on interval timers. We have kept the term circannual over semicircannual due to its prior use in the scientific literature.

      Reviewing Editor Comments:

      The detailed suggestions of the reviewers are outlined below (or above in case of reviewer 1). In light of the criticism, we ask the authors to especially pay attention to the comments on the CRISPR/Cas9 experiment, raised by Reviewers 1 and 2. As currently described, there are serious questions on the design of the sgRNAs, and also missing critical methodological details. If the latter are diligently taken care of, they may resolve the questions on the sgRNA design. Please also reconsider the wording along the suggestions of Reviewer 3.

      We appreciate the Editor’s time and support for the manuscript. We have clarified and corrected our oversight regarding the PAM site. This correction confirms the strength of the CRISPR-Cas9 gRNAs used in the study and should remove all concerns. We have also considered using ‘semicircannual’ in the text. As there is existing scientific literature using circannual interval timer, and there is, to our knowledge, no publication using ‘semicircannual’, we have opted to keep the current approach and use circannual. We feel a subsequent Opinion paper is more suitable for introducing a new term.

      Reviewer #2 (Recommendations for the authors):

      First, I want to commend the authors for their work. It is a clear advancement for our field. Below are a couple of comments and suggestions I have:

      We thank the Reviewer for the positive comment and support. We have endeavoured to incorporate their suggested improvements into the manuscript.

      (1) Looking at the results of Figure 1A and Figure S8, the control in S8 showed a lower pelage color score as compared to the hamsters in 1A. Is this a byproduct of the ICV injection?

      The difference between Figures 1 and 3 is likely due to the smaller sample sizes. The controls in Figure 1 had a higher proportion of hamsters showing complete white fur (score = 3) at 16–18 weeks compared to controls in Figure 3. It is possible, although unlikely, that the ICV injection would reduce the development of the winter phenotype. There was no substance in the ICV injection that would impact the prolactin signalling pathway. Our perspective is that the difference between the two figures is due to the different sampling populations. Overall, the timing of the change in pelage colour is the same between the figures, which suggests that the mechanisms of the interval timer were unaffected.

      (2) Is there a particular reason why the pelage color for the CRISPR mutants is relegated to the supplemental information? In my opinion, this is also important, even though the results might be difficult to explain. Additionally, did the authors check for food intake and adipose mass in these animals?

      We agree with the Reviewer that the pelage change is very interesting. We decided to have Figure 3 focus on body mass. The rationale was the robust nature of the data collected in the CRISPR-Cas9 study (Fig. 3b), in addition to the non-responsive hamsters (Fig. 3e). We disagree that the data patterns are hard to explain, as the pelage changes were similar to the photoperiod-induced change in body mass. No differences were observed for food intake or adipose tissue. We have added this information to the text (see lines 162-163).

      (3) I might have missed it, but did the authors check for the expression of Dio3 on the CRISPR mutants? Does the deletion cause reduced expression or any other mRNA effect, such as those resulting in the truncation of a protein?

      Due to the limited biological material extracted from the anatomical punches, we decided to focus on genomic mutations. Dio3 has a very short sequence length and the size of the mutations identified indicate that no RNA could be transcribed.

      (4) Could the authors clarify which reference genome or partial CDS (i.e., accession numbers) they used to align the gRNA? Did they use the SSSS strain or the Psun_Stras_1 isolate?

      The gRNAs were designed using the online tool CHOPCHOP, using the Mus musculus Dio3 gene. The generated gRNAs were subsequently aligned via BLAST with the Phodopus sungorus Dio3 partial cds (GenBank: MF662622.1), to ensure alignment with the species. We are confident that the designed gRNAs align 100% in hamsters. Furthermore, we conducted BLAST to ensure there were no off-targets. The only gene identified in the BLAST was the rodent (i.e. hamster, mouse) Dio3 sequence.

      (5) Figure 3b. I do agree with the authors in pointing out that the decrease in body mass is occurring earlier in Dio3wt hamsters; however, the shape of the body mass dynamic is also different. Do the authors have any comments on the possible role of Dio3 in the process of exit from overwintering?

      This is a very interesting question. We do not have the data to evaluate the role of Dio3 for overwintering. We argue that disruption in Dio3 reduced the circannual interval period. For this interpretation, yes, Dio3 is necessary for overwintering. However, we would need to show the sufficiency of Dio3 to induce the winter phenotype in hamsters housed in long photoperiod. At this time, we do not have the technical ability to conduct this experiment.

      (6) In Figure 3d, the Dio3wt group does not show any dispersion. Is this correct? If that's true, and no dispersion is observed, no normality can be assumed, and a t-test can't be performed (Line 692). The Mann-Whitney test might be better suited.

      We conducted a Welch’s t-test to compare the difference in body mass period. We used Welch’s test as the variances were not equal; the Mann-Whitney test is best for skewed distributions. To clarify the test used, we have added ‘Welch’s test’ to the Figure legend.
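
      For readers unfamiliar with the unequal-variance test named here, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom. The input values are invented for illustration and are not the body mass periods from Figure 3d; the function name is hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, the two group variances are not pooled.
    Raises ZeroDivisionError if both groups have zero variance.
    """
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented example values: one group shows no dispersion, the other does.
no_dispersion = [32.0, 32.0, 32.0]
dispersed = [22.0, 24.0, 26.0]
t_stat, dof = welch_t(no_dispersion, dispersed)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

      When one group has zero variance, the degrees of freedom reduce to those of the other group alone (n − 1), so the statistic is still defined; whether the normality assumption is reasonable in that situation is the separate question raised by the Reviewer.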

      (9) Figure 1 h. It might be convenient to add the words "Induction", "maintenance", and "recovery" over each respective line on the polar graph for easier reading.

      We have added the text as suggested by the Reviewer.

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1: Please enlarge all partial graphics at least to the size of Figure 2. In the print version, labels are barely readable

      We have enlarged the panels in Figures 1 and 3 by 20% to accommodate the Reviewer's suggestion.

      (2) Legend Figure 2: Add that the food restriction was 16h.

      We have added 16h to the text.

      (3) Figure 3b: enlarge font size. In the legend: Dio3cc hamsters delayed.... The delay might have been a week or so, but not more (and even that is unclear since the rise in body mass in that week seems to be rather a disturbance of the curve). Thus 'delay' might not be the most appropriate wording. Instead, the initial decline is slower, but both started at nearly the same week (=> no delay). Minimum body mass is reached at the identical week as in wt (=> no delay). Also, the increase started at the same week but was much faster in Dio3cc than in wt. Figure 3c: How can there be a period when there is no repeated cycle (rhythm)? This is rather a duration. Moreover, according to the displayed data, I am wondering which start point and which endpoint is used. The first and last values are the highest of the graph, but have they been the maximum? Especially for Dio3wt, it can be assumed that animals haven't reached the maximum at the end of the graph.

      We have increased the font size in Figure 3b. We have changed ‘delayed’ to ‘slower’ in the text. Period analyses, such as the Lomb-Scargle periodogram, measure the duration of a cycle (and of multiple cycles). The start point and end point used in the analyses were the initial data collection date (week 0) and the final data collection date (week 32). The Lomb-Scargle analysis determines the duration of the period that occurs within these phases of the cycle. We believe the Lomb-Scargle period analysis is the most suitable for the scientific question.
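For readers unfamiliar with the method, the sketch below implements the classic normalized Lomb-Scargle periodogram in plain Python and scans a grid of trial periods. It is an illustration only, not the authors' analysis: the weekly series is synthetic, and its 33-week period, the 35 g baseline, the 5 g amplitude, and the period grid are all assumptions chosen so that the 33 samples (weeks 0 to 32) span exactly one cycle.

```python
import math


def lomb_scargle_power(times, values, period):
    """Classic normalized Lomb-Scargle power at one trial period."""
    n = len(values)
    mean_y = sum(values) / n
    var_y = sum((y - mean_y) ** 2 for y in values) / (n - 1)
    omega = 2.0 * math.pi / period
    # The time offset tau makes the sine and cosine terms orthogonal.
    tau = math.atan2(
        sum(math.sin(2 * omega * t) for t in times),
        sum(math.cos(2 * omega * t) for t in times),
    ) / (2 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    yc = sum((y - mean_y) * c for y, c in zip(values, cos_terms))
    ys = sum((y - mean_y) * s for y, s in zip(values, sin_terms))
    cc = sum(c * c for c in cos_terms)
    ss = sum(s * s for s in sin_terms)
    return (yc * yc / cc + ys * ys / ss) / (2.0 * var_y)


# Synthetic weekly body-mass series (weeks 0..32); values are invented.
TRUE_PERIOD = 33.0
weeks = list(range(33))
mass = [35.0 + 5.0 * math.cos(2 * math.pi * w / TRUE_PERIOD) for w in weeks]

# Trial periods from 20 to 45 weeks in half-week steps (an arbitrary grid).
trial_periods = [20.0 + 0.5 * k for k in range(51)]
powers = [lomb_scargle_power(weeks, mass, p) for p in trial_periods]
best_period = trial_periods[powers.index(max(powers))]
print(best_period)
```

With a single cycle in the record the peak is broad, which is one reason the Reviewer's question about start and end points matters; a noiseless synthetic series hides that uncertainty.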

      (4) Figure S9: This is a very nice graph and summarises your main results. It should appear in the main manuscript and not in the supplements.

      We appreciate the positive comment and suggestion. We agree with the Reviewer and have moved the graph to the main figures. The revised manuscript presents the graph as Figure 4.

    1. Synthesis of the "Rendez-vous de la techno": The STI2D Track

      Summary

      This document summarizes the information and testimonials presented at the "Les rendez-vous de la techno" event devoted to the Sciences et Technologies de l'Industrie et du Développement Durable (STI2D) track.

      The STI2D track positions itself as a path of scientific and technological excellence, designed for students who prefer learning through practice, hands-on work, and concrete projects, in contrast to the more theoretical approach of the general track.

      It is aimed at creative students who enjoy group work, problem solving, and innovation.

      The curriculum is structured to provide solid knowledge in science, technology, mathematics, and engineering while developing awareness of industrial and environmental issues.

      The teaching approach, built around concrete projects such as designing a solar car or 3D modeling of châteaux, lets students apply their skills in a tangible way.

      The STI2D track stands out for the wide range of further studies it opens up.

      It leads to short programs (BTS, BUT) as well as to long, demanding pathways toward the highest qualifications (Classes Préparatoires aux Grandes Écoles TSI, engineering schools, university bachelor's degrees).

      Testimonials from pupils and students confirm that the track is an effective springboard to success, including for students switching from the general track, and that its graduates are sought after in many cutting-edge sectors.

      --------------------------------------------------------------------------------

      1. General Presentation of the STI2D Track

      1.1. Target Audience and Student Profile

      The STI2D track is open to students after a general and technological seconde class.

      It is particularly suited to students with the following characteristics:

      • Interest in technology and science: a strong taste for hands-on work, for understanding physical phenomena, and for implementing technical solutions.

      • Practical and creative mindset: a desire to work in groups on projects, solve concrete problems, and show creativity and innovation.

      • Ambition: the track attracts students who are considering careers as engineers or senior technicians.

      According to Mme Amarante, choosing this track fits a profile that "likes technology", is "rather creative", and "also likes solving problems, finding solutions".

      1.2. Skills and Knowledge Acquired

      The STI2D baccalauréat is presented as a "technological bac that is rather scientific", which provides solid and varied skills:

      • Multidisciplinary knowledge: science, technology, mathematics, and engineering.

      • Industrial and environmental skills: strong awareness of the issues facing modern industry and sustainable development.

      • Design and innovation approach: development of creativity and the capacity to innovate.

      --------------------------------------------------------------------------------

      2. Structure of the Curriculum

      Teaching in STI2D is designed to make scientific concepts more accessible through experimentation and making.

      2.1. Première Class

      The aim is to let students who "have trouble understanding the lessons" when taught abstractly "get closer to hands-on work" and "understand phenomena in small groups".

      The program is built around two specialties:

      • Ingénierie, Innovation et Développement Durable (I2D): acquisition of fundamental scientific knowledge across three domains: matter, energy, and information.

      • Innovation Technologique (IT): application of the knowledge acquired in I2D through three concrete projects carried out during the year.

      2.2. Terminale Class

      In Terminale, the I2D specialty continues, complemented by a choice of one of four specific in-depth options. The year is marked by a 72-hour project covering study, analysis, design, simulation, and prototyping.

      | Specialty | Acronym | Description |
      | --- | --- | --- |
      | Architecture et Construction | AC | In-depth study of matter and structures. |
      | Innovation Technologique et Éco-conception | ITEC | In-depth study of mechanical engineering design and product design. |
      | Systèmes d'Information et Numérique | SIN | In-depth study of computing and digital systems. |
      | Énergie et Environnement | EE | In-depth study of energy management, transport, and storage. |

      One multidisciplinary project cited as an example is the solar car, which drew on three specialties:

      • AC for the design of the chassis.

      • EE for energy management (solar panels, storage, motor power supply).

      • SIN for controlling and driving the car.

      --------------------------------------------------------------------------------

      3. Further Studies and Career Prospects

      The STI2D track offers a wide range of options after the baccalauréat, letting students choose between short and long programs.

      3.1. Overview of Post-Baccalauréat Options

      | Type of pathway | Possible programs | Examples cited |
      | --- | --- | --- |
      | Short programs (Bac+2 / Bac+3) | BTS (Brevet de Technicien Supérieur) | BTS CIEL (IT and networks), BTS Électrotechnique, CPI, CPRP, CRSA. |
      | | BUT (Bachelor Universitaire de Technologie) | BUT Génie Civil Construction Durable, BUT Informatique, BUT Génie Industriel et Maintenance. Note that BUT programs reserve places for technological baccalauréat holders. |
      | Long programs (Bac+5 and beyond) | Classes Préparatoires aux Grandes Écoles (CPGE) | Prépa TSI (Technologie et Sciences Industrielles), specifically intended for STI2D/STL baccalauréat holders, and Prépa TPC (Technologie, Physique et Chimie). |
      | | Engineering schools | Direct entry via the Geipi Polytech competitive exam for STI2D/STL, or after a CPGE or a BTS/BUT. |
      | | University bachelor's degrees (licences) | Licence in computer science, mathematics, physics, or engineering sciences. |

      3.2. Data and Trends (Parcoursup, January 2025)

      Parcoursup data show a balanced split in the choices of STI2D baccalauréat holders, with "as many young people heading for a BTS as for a BUT".

      A slightly smaller number of students go directly to classes préparatoires, engineering schools, or university bachelor's degrees.

      3.3. Sectors of Activity

      Graduates can enter a wide variety of sectors, many of which face labor shortages ("métiers en tension"):

      • Construction and public works (BTP), architecture

      • Energy, electronics, environment

      • Audiovisual, IT, research and development

      • Cutting-edge sectors: aeronautics, rail, shipbuilding

      --------------------------------------------------------------------------------

      4. Testimonials and Practical Experience

      4.1. The Prototyping Workshop: A Concrete Demonstration

      A visit to the prototyping workshop was organized for seconde students. Guided by M. René, they discovered:

      • Complex manufactured machines: a race car built on site that took part in a race in Albi.

      • Rapid prototyping technologies: plastic and metal 3D printers, and a laser cutting machine.

      The demonstration highlighted how simple some machines are to use, embodying the "Fablab" spirit of the lycée. One student was able to use the laser cutter to make a part after only 10 minutes of explanation. The experience underscored how accessible the technology is and how quickly students can "design and produce parts".

      4.2. In the Words of Terminale STI2D Students

      The testimonials of Terminale students illustrate the richness and diversity of pathways and projects within the track.

      Architecture et Construction (AC) specialty:

      • Jade worked on modeling the wastewater pipes of a fictional town (Moeville) and wants to become an interior architect.

      • Albin, who switched from the general Première, does "not regret at all" his choice. He took part in a project to visit and 3D model the château de Jaligny. He stresses the value of the track's more applied approach and is aiming for a school of architecture or a BUT Génie Civil.

      Énergie et Environnement (EE) specialty:

      • Tom chose this track because of his "fairly particular attraction to everything to do with energy" and the desire "to improve how society works on the energy front". Although he plans to become a pilot, he "enjoys attending the classes".

      Innovation Technologique et Éco-conception (ITEC) specialty:

      • Will chose ITEC because he had "really liked the technological innovation classes" in Première. He is heading toward a school of computer science or cybersecurity.

      • Zoé, who is interested in design (automotive, space, fashion), finds that the ITEC specialty is good all-round training where "you do a bit of everything".

      Systèmes d'Information et Numérique (SIN) specialty:

      • Liam appreciates that in the technological track "there is more practice than theory" and that "we work more often in class than at home".

      • Martin chose the STI2D track to gain access to the SIN specialty with a view to a career in computing. He is "not disappointed" and is heading toward data science.

      4.3. In the Words of Post-Baccalauréat Students

      BTS:

      • Students in BTS CPI (Conception de Produits Industriels) show how complementary the pathways are: Chris comes from a general bac and sees it as "the continuation of the engineering sciences subject", while Gauthier comes from an STI2D ITEC bac and was attracted by "the design work we did in ITEC".

      • Paul, in BTS CPRP, preferred the BTS setting to that of the BUT for his planned career in military engineering. He notes that the mix of general and STI2D baccalauréat holders is "rather complementary", the former bringing theory (maths, physics) and the latter practice.

      Classe Préparatoire TSI:

      • Two students confirm that the prépa is the "best way to become an engineer". They describe a major change of pace compared with Terminale: "It's a change from STI2D", "it's way more intense". However, adjusting is made easier by a "good atmosphere" and "a lot of solidarity", especially in the boarding house.

      --------------------------------------------------------------------------------

      5. Key Points and Resources

      5.1. Diversity and Representation

      It is pointed out that the STI2D track has "overall more boys than girls", while stressing that "it is also a track for girls".

      The presence of several female students among those giving testimonials (Jade, Zoé, Joyce) supports this point.

      5.2. Guidance Tools

      To help students along their path, two digital resources accessible via "Mon Bureau Numérique" are highlighted:

      • The Avenir platform: linked with ONISEP, it offers documentation, program fact sheets, and testimonials.

      • Mon projet sup: a tool that helps students prepare their post-lycée plans, letting them target sectors of activity according to their skills and interests.

    1. Reviewer #1 (Public review):

      Summary:

      The goal of the manuscript was to determine if strenuous exercise negatively impacted regeneration. Indeed, the major conclusion of the manuscript is that elevated exercise during the early stages of regeneration compromises the regenerative process. The authors further conclude that regeneration is disrupted due to defects in blastema formation, which is caused by impaired HA deposition and reduced active (nuclear) Yap.

      Strengths:

      (1) The paradigm of elevated exercise disrupting ECM and regeneration is significant, and provides an experimental model to better understand connections between the ECM and cell/tissue activities.

      (2) The conclusion that exercise intensity correlates with defects in regeneration is supported.

      (3) The demonstration for the requirement for HA is well supported via transcriptomics and multiple independent strategies to manipulate HA levels.

      (4) The demonstration that nuclear Yap depends on the amount of HA is well-supported.

      Weaknesses:

      (1) The authors conclude throughout the manuscript that "blastema formation" is disrupted, but they do not provide any insights into how blastema formation is disrupted (reduced de-differentiation? reduced cell migration? both?). While they show that there are fewer dividing cells, the timing of exercise is prior to outgrowth. So, the effect of dividing cells is likely secondary, which is not considered (or not clearly explained).

      (2) The authors conclude that patterning is affected, but their analyses of patterns (bifurcations) are very limited. It is also not clear if patterning is believed to be affected by a common exercise-induced mechanism or a different exercise-induced mechanism (or by a secondary mechanism).

      (3) The significance of HA in regeneration has been shown before in zebrafish fins, as well as in a handful of other models of regeneration. Although largely cited, explaining some of this work in more detail would give the reader a better picture of how HA is believed to promote regeneration. It may also highlight some emerging questions about the role of HA in regeneration that would permit a richer story and specific future directions.

      (4) In general, parts of the text lack specificity/clarity, and in other cases, there seems to be contradictory information.

      (5) Overall, many of the conclusions were well supported by the data, and this study is likely to provide a foundation for future research on the role of the ECM in tissue repair and regeneration. The main limitations were in connecting the experimental details with the specific processes required for regeneration, and in clearly explaining the findings.

    2. Author response:

      Reviewer #1

      We agree that further clarification of how elevated exercise disrupts blastema formation would strengthen the manuscript. Our data suggest a major contribution of proliferation. Exercise reduced the fraction of proliferative cells at 3 dpa, consistent with disrupted HA production and downstream Yap signaling. This interpretation aligns with prior studies showing that proliferation contributes to blastema establishment and is not restricted to the outgrowth phase of fin regeneration (Poleo et al, 2001; Poss et al, 2002; Wang et al, 2019; Pfefferli et al, 2014; Hou et al, 2020). We will explore additional experiments to reinforce these insights into the cellular mechanisms underlying exercise-disrupted blastema formation.

      We acknowledge that our analysis of ray branching abnormalities is limited in the current manuscript. We focus our study on introducing the zebrafish swimming and regeneration model and then characterizing ECM and signaling changes accounting for disrupted blastema establishment. For completeness, we included the observation of skeletal patterning defects (branching delays and bone fusions) but without detailed analysis. We note that decreased expression of shha and Shh-pathway components following early exercise corresponds with the branching defects. However, we recognize exercise could have additional effects during the outgrowth phase when branching morphogenesis actively occurs. Therefore, we will expand our discussion to outline future research directions related to exercise impacts on regenerative skeletal patterning.

      We will expand the Introduction and/or Discussion sections to provide more context on known HA roles across regeneration contexts, including in zebrafish fins. Finally, we will improve the text’s clarity and specificity throughout the manuscript and will resolve or explain any apparent contradictions.

      Reviewer #2

      We appreciate the Reviewer's concern regarding the specificity of forced exercise as a model for mechanical loading. Forced exercise has been widely used in vivo to induce mechanical loading without the requirement for specialized implants or animal restraint, including in mouse (Wallace et al, 2015; Bomer et al, 2016), rat (Honda et al, 2003; Boerckel et al, 2011; Boerckel et al, 2012), and, most relevant to our study, zebrafish models (Fiaz et al, 2012; Fiaz et al, 2014; Suniaga et al, 2018). However, we will expand our discussion of this approach and ensure precise language distinguishing exercise from mechanical loading.

      We acknowledge the possibility that early shear stress disrupts the wound epidermis, which we will elaborate on in a revised Discussion. However, exercise-induced disruptions to the fin epidermis of early regenerates (1–2 dpa; Figure 2) typically resolve within one day, whereas fibroblast lineage cells still fail to establish a robust blastema. Therefore, sustained effects of mechanical loading and/or mechanosensation are likely major contributors to the observed regeneration phenotypes.

      We will explore whether HA acts as a general enhancer of fin regeneration by comparing blastemal HA supplementation vs. controls in non-exercised regenerating animals, if technically feasible. We will merge Figure S7 (HA supplementation) with Figure 5 (HA depletion) for clarity, as suggested.

      We will include a schematic and clear definitions for 'peripheral' and 'central' rays in a revised manuscript.

      Reviewer #3

      We included Hoechst and eosin fluorescent staining in the manuscript to show changes in tissue architecture following swimming exercise (Supplemental Figure 4). We will extend this histological analysis to include hematoxylin and eosin staining to provide additional tissue visualization.

      References

      Poleo G, Brown CW, Laforest L, Akimenko MA. Cell proliferation and movement during early fin regeneration in zebrafish. Dev Dyn. 2001 Aug;221(4):380-90.

      Poss KD, Nechiporuk A, Hillam AM, Johnson SL, Keating MT. Mps1 defines a proximal blastemal proliferative compartment essential for zebrafish fin regeneration. Development. 2002 Nov;129(22):5141-9.

      Wang YT, Tseng TL, Kuo YC, Yu JK, Su YH, Poss KD, Chen CH. Genetic Reprogramming of Positional Memory in a Regenerating Appendage. Curr Biol. 2019 Dec 16;29(24):4193-4207.e4.

      Pfefferli C, Müller F, Jaźwińska A, Wicky C. Specific NuRD components are required for fin regeneration in zebrafish. BMC Biol. 2014 Apr 29;12:30.

      Hou Y, Lee HJ, Chen Y, Ge J, Osman FOI, McAdow AR, Mokalled MH, Johnson SL, Zhao G, Wang T. Cellular diversity of the regenerating caudal fin. Sci Adv. 2020 Aug 12;6(33):eaba2084.

      Wallace IJ, Judex S, Demes B. Effects of load-bearing exercise on skeletal structure and mechanics differ between outbred populations of mice. Bone. 2015 Mar;72:1-8.

      Bomer N, Cornelis FM, Ramos YF, den Hollander W, Storms L, van der Breggen R, Lakenberg N, Slagboom PE, Meulenbelt I, Lories RJ. The effect of forced exercise on knee joints in Dio2(-/-) mice: type II iodothyronine deiodinase-deficient mice are less prone to develop OA-like cartilage damage upon excessive mechanical stress. Ann Rheum Dis. 2016 Mar;75(3):571-7.

      Honda A, Sogo N, Nagasawa S, Shimizu T, Umemura Y. High-impact exercise strengthens bone in osteopenic ovariectomized rats with the same outcome as Sham rats. J Appl Physiol (1985). 2003 Sep;95(3):1032-7.

      Boerckel JD, Kolambkar YM, Stevens HY, Lin AS, Dupont KM, Guldberg RE. Effects of in vivo mechanical loading on large bone defect regeneration. J Orthop Res. 2012 Jul;30(7):1067-75.

      Boerckel JD, Uhrig BA, Willett NJ, Huebsch N, Guldberg RE. Mechanical regulation of vascular growth and tissue regeneration in vivo. Proc Natl Acad Sci U S A. 2011 Sep 13;108(37):E674-80.

      Fiaz AW, Léon-Kloosterziel KM, Gort G, Schulte-Merker S, van Leeuwen JL, Kranenbarg S. Swim-training changes the spatio-temporal dynamics of skeletogenesis in zebrafish larvae (Danio rerio). PLoS One. 2012;7(4):e34072.

      Fiaz AW, Léon‐Kloosterziel KM, van Leeuwen JL, Kranenbarg S. Exploring the molecular link between swim‐training and caudal fin development in zebrafish (Danio rerio) larvae. Journal of Applied Ichthyology. 2014 Aug;30(4):753-61.

      Suniaga S, Rolvien T, Vom Scheidt A, Fiedler IAK, Bale HA, Huysseune A, Witten PE, Amling M, Busse B. Increased mechanical loading through controlled swimming exercise induces bone formation and mineralization in adult zebrafish. Sci Rep. 2018 Feb 26;8(1):3646.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      This study extends the previous interesting work of this group to address the potentially differential control of movement and posture. Their earlier work explored a broad range of data to make the case for a downstream neural integrator hypothesized to convert descending velocity movement commands into postural holding commands. Included in that data were observations from people with hemiparesis due to stroke. The current study uses similar data, but pushes into a different, but closely related direction, suggesting that these data may address the independence of these two fundamental components of motor control. I find the logic laid out in the second sentence of the abstract ("The paretic arm after stroke is notable for abnormalities both at rest and during movement, thus it provides an opportunity to address the relationships between control of reaching, stopping, and stabilizing") less than compelling, but the study does make some interesting observations. Foremost among them is the relation between the resting force postural bias and the effect of force perturbations during the target hold periods, but not during movement. While this interesting observation is consistent with the central mechanism the authors suggest, it seems hard to me to rule out other mechanisms, including peripheral ones. These limitations should be discussed.

      Thank you for summarizing our work. Note that we have improved the logic in our abstract ("…providing an opportunity to ask whether control of these behaviors is independently affected in stroke") based on your comments, as outlined in our previous revision. We now discuss limitations and potential alternative mechanisms in greater detail, in a dedicated section (lines 846-895; see response to reviewer 2 for further details).

      Reviewer #2 (Public review):

      Summary:

      Here the authors address the idea that postural and movement control are differentially impacted with stroke. Specifically, they examined whether resting postural forces influenced several metrics of sensorimotor control (e.g., initial reach angle, maximum lateral hand deviation following a perturbation, etc.) during movement or posture. The authors found that resting postural forces influenced control only following the posture perturbation for the paretic arm of stroke patients, but not during movement. They also found that resting postural forces were greater when the arm was unsupported, which correlated with abnormal synergies (as assessed by the Fugl-Meyer). The authors suggest that these findings can be explained by the idea that the neural circuitry associated with posture is relatively more impacted by stroke than the neural circuitry associated with movement. They also propose a conceptual model that differentially weights the reticulospinal tract (RST) and corticospinal tract (CST) to explain greater relative impairments with posture control relative to movement control, due to abnormal synergies, in those with stroke.

      Thank you for the brief but comprehensive summary. We would like to clarify one point: we do not suggest that our findings are necessarily due to the neural circuitry associated with posture being more impacted than the neural circuitry associated with movement. Rather, our conceptual model suggests that increased outflow through the (ipsilateral) RST, involved in posture, compensates for CST damage, at the expense of posture abnormalities spilling over into movement. What we suggest is that the neural circuitry for posture vs. movement control remains relatively separate in stroke, with impairments in posture control not substantially explaining impairments in movement control.

      Comments on revisions:

      The authors should be commended for being very responsive to comments and providing several further requested analyses, which have improved the paper. However, there are still some outstanding issues that make it difficult to fully support the provided interpretation.

      Thank you for appreciating our response to your earlier comments. We address the outstanding issues below.

      The authors say within the response, "We would also like to stress that these perturbations were not designed so that responses are directly compared to each other ***(though of course there is an *indirect* comparison in the sense that we show influence of biases in one type of perturbation but not the other)***." They then state in the first paragraph of the discussion that "Remarkably, these resting postural force biases did not seem to have a detectable effect upon any component of active reaching but only emerged during the control of holding still after the movement ended. The results suggest a dissociation between the control of movement and posture." The main issue here is relying on indirect comparisons (i.e., significant in one situation but not the other), instead of relying on direct comparisons. Using a well-known example, just because one group / condition might display a significant linear relationship (i.e., slope_1 > 0) and another group / condition does not (slope_2 = 0), does not necessarily mean that the two groups / conditions are statistically different from one another [see Figure 1 in Makin, T. R., & Orban de Xivry, J. J. (2019). Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife, 8, e48175.].
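The statistical point in this comment can be sketched numerically. The example below fits a least-squares slope to each of two small data sets and then tests the difference between the slopes directly with an approximate z statistic. Everything here is illustrative: the data are invented, the function names are hypothetical, and the large-sample z approximation is an assumption rather than the test used in the manuscript or in the cited paper.

```python
import math


def slope_and_se(x, y):
    """Ordinary least-squares slope and its standard error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)


def slope_difference_z(group_1, group_2):
    """Approximate z statistic for the difference between two independent slopes."""
    b1, se1 = slope_and_se(*group_1)
    b2, se2 = slope_and_se(*group_2)
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Invented data: condition 1 has a tight trend, condition 2 a noisy one.
x = [0, 1, 2, 3, 4]
condition_1 = (x, [0, 1, 1, 3, 3])
condition_2 = (x, [1, 0, 3, 1, 3])

b1, se1 = slope_and_se(*condition_1)
b2, se2 = slope_and_se(*condition_2)
z = slope_difference_z(condition_1, condition_2)
print(b1 / se1, b2 / se2, z)
```

In this toy case the first slope is several standard errors from zero and the second is not, yet the difference between the two slopes is well under two standard errors, which is exactly the situation the Reviewer warns about.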

      We agree and are well aware of the limitation posed by an indirect comparison – hence the language we used to comment on the data (“did not seem”, “suggest”, etc.). To address this limitation, we performed a more direct comparison of how the two types of perturbations (moving vs. holding) interact with resting biases. For this comparison, we calculated a Response Asymmetry Index (RAI):

      Above, r_A is the response in the direction where the resting bias is most aligned with the perturbation, and r_O is the response in the direction where the resting bias is most opposed to the perturbation.

      We calculated RAIs for two response metrics used for both moving and holding perturbations: maximum deviation and time to stabilization/settling time. For these two response metrics, positive RAIs indicate an asymmetry in line with an effect of resting bias.

      The idea behind the RAI is that, while the magnitude of responses may well differ between the two types of perturbations, this will be accounted for by the ratio used to calculate the asymmetry. The same approach has been used to assess symmetry/laterality across a variety of different modalities, such as gait asymmetry (Robinson et al., 1987), the relative fMRI activity in the contralateral vs. ipsilateral sensorimotor cortex while performing a motor task (Cramer et al., 1997), or the relative strength of ipsilateral vs. contralateral responses to transcranial magnetic stimulation (McPherson et al., 2018). Notably, the normalization also addresses potential differences in overall stiffness between holding vs. moving perturbations, which would similarly affect aligned and opposing cases (see our response to your following point).
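The RAI formula itself is not reproduced in this text. As a minimal sketch, the code below assumes the normalized-difference form commonly used for laterality and asymmetry indices, (r_A - r_O) / (r_A + r_O); this form, the function name, and the numbers are assumptions made for illustration and may differ from the authors' exact definition.

```python
def response_asymmetry_index(r_aligned: float, r_opposed: float) -> float:
    """Normalized difference between aligned and opposed responses (assumed form)."""
    total = r_aligned + r_opposed
    if total == 0:
        raise ValueError("responses sum to zero; the index is undefined")
    return (r_aligned - r_opposed) / total


# Hypothetical response magnitudes, invented for illustration only.
small_responses = response_asymmetry_index(3.0, 1.0)
large_responses = response_asymmetry_index(30.0, 10.0)
print(small_responses, large_responses)
```

Because the index is a ratio, scaling both responses by the same factor leaves it unchanged, which is the property the authors rely on when comparing perturbation types whose overall response magnitudes differ.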

      Figure 8 shows the RAIs we obtained for holding (red) vs. moving/pulse (blue) perturbations. For the maximum deviation (left), there is more asymmetry for the holding case, though the p-value is marginal (p=0.088), likely due to the large variability in the pulse case (individual values shown as black dots). For time to stabilization/settling time (right), the difference is significant (p=0.0048). Together, these analyses indicate that resting biases interact substantially more with holding than with movement control, in line with a relative independence between these two control modalities. We now include this panel as Figure 8, and describe it in Results (lines 587-611).

      Note that even a direct comparison does not prove that resting biases and active movement control are perfectly independent. We now discuss these issues in more depth, in the new Limitations section suggested by the Reviewer (lines 836-849).

      The authors have provided a reasonable rationale for why they chose different perturbation waveforms. Yet it still holds that these different waveforms would likely yield very different muscular responses, making it difficult to interpret the results, and this remains a limitation. From the paper it is unknown how these different perturbations would differentially influence a variety of classic neuromuscular responses, including short-range stiffness and stretch reflexes, which would be at play here.

      Much of the results can be interpreted when one considers classic neuromuscular physiology. In Experiment 1, differences in resting postural bias in supported versus unsupported conditions can readily be explained since there is greater muscle activity in the unsupported condition that leads to greater muscle stiffness to resist mechanical perturbations (Rack, P. M., & Westbury, D. R. (1974). The short-range stiffness of active mammalian muscle and its effect on mechanical properties. The Journal of physiology, 240(2), 331-350.). Likewise muscle stiffness would scale with changes in muscle contraction with synergies. Importantly for experiment 2, muscle stiffness is reduced during movement (Rack and Westbury, 1974) which may explain why resting postural biases do not seem to be impacting movement. Likewise, muscle spindle activity is shown to scale with extrafusal muscle fiber activity and forces acting through the tendon (Blum, K. P., Campbell, K. S., Horslen, B. C., Nardelli, P., Housley, S. N., Cope, T. C., & Ting, L. H. (2020). Diverse and complex muscle spindle afferent firing properties emerge from multiscale muscle mechanics. eLife, 9, e55177.). The concern here is that the authors have not sufficiently considered muscle neurophysiology, how that might relate to their findings, and how that might impact their interpretation. Given the differences in perturbations and muscle states at different phases, the concern is that it is not possible to disentangle whether the results are due to classic neurophysiology, the hypothesis they propose, or both. Can the authors please comment.

      It is possible that neuromuscular physiology may explain part of our results. However, this would not contradict our conceptual model.

      Regarding Experiment 1, it is possible that stiffness would scale with changes in background muscle contraction as the reviewer suggests. Indeed, Bennett et al. (1992) used brief perturbations on the wrist to assess elbow stiffness, finding that, during movement, stiffness was increased in positions with a higher gravity load (and, in general, in positions where the net muscle torque was higher). However, during posture maintenance (like in our Experiment 1), they found that stiffness did not vary with (elbow) position or gravity load (two characteristics of our findings in Experiment 1):

      “The observed stiffness variation was not simply due to passive tissue or other joint angle dependent properties, as stiffnesses measured during posture were position invariant. Note that the minimum stiffness found in posture was higher than the peak stiffness measured during movement, and did not change much with the gravity load.” (illustrated in Fig. 5 of that paper)

      We thus find it very unlikely that stiffness explains the difference between the supported vs. unsupported conditions in Experiment 1.

      Even if stiffness modulation between the supported vs. unsupported conditions could explain our finding of stronger posture biases in the latter case, it would not be incompatible with our interpretation of increased RST drive: increased stiffness would potentially magnify the effects of the RST drive that we propose underlies these resting biases. It is possible that the increase in resting biases under conditions of increased muscle contraction (lack of arm support) is mediated through an increase in muscle stiffness. In other words, the increase in resting biases may not directly reflect additional RST outflow per se, but the scaling, through stiffness, of the same magnitude of RST outflow. Understanding this interaction was beyond the scope of our experiment design; in line with this, we briefly comment on it in our Limitations section.

      Regarding Experiment 2, stiffness has indeed been shown to be lower during movement, and we now comment on the potential effect of this on our results in the “Limitations” section (lines 815-830, replicated below). Importantly, for the case of holding perturbations, the increased stiffness associated with holding would increase resistance to both extension and flexion-inducing perturbations. Thus, higher stiffness would be unlikely to explain our finding whereby resting biases resist or aggravate the effects of holding perturbations depending on perturbation direction. In addition, the framework in Blum et al., which describes how interactions between alpha and gamma drive can explain muscle activity patterns, does not rule out central neural control of stiffness: “muscle spindles have a unique muscle-within-muscle design such that their firing depends critically on both peripheral and central factors” (emphasis ours). It may be, for example, that gamma motoneurons controlling muscle spindles and stiffness are modulated by input from the reticular formation, making this a mechanism in line with our conceptual model.

      “Moreover, it has been shown that joint stiffness is reduced during movement compared to holding control (Rack and Westbury, 1974; Bennett et al., 1992). Along similar lines, muscle spindle activity – which may modulate stiffness – scales with extrafusal muscle fiber activity (such as muscle exertion involved in holding) and forces acting through the tendon (Blum et al., 2020). Such observations could, in principle, explain why we were unable to detect a relationship between resting biases and active movement control but we readily found a relationship between resting biases and active holding control: reduced joint stiffness during movement could scale down the influence of resting abnormalities. There are two issues with this explanation, however. First, it is debatable whether this should be considered an alternative explanation per se: stiffness modulation could be, in total or in part, the manifestation of a central movement/posture CST/RST mechanism similar to the one we propose in our conceptual model. For example, (Blum et al., 2020) argue that muscle spindle firing depends on both peripheral and central factors. Second, increased stiffness would not necessarily help detect differences in how active postural control responds to within-resting-posture vs. out-of-resting-posture perturbations. This is because an overall increase in stiffness would likely increase resistance to perturbations in any direction.”

      The authors should provide a limitations paragraph. They should address 1) how they used different perturbation force profiles, 2) the muscles were in different states which would change neuromuscular responses between trial phase / condition, 3) discuss a lack of direct statistical comparisons that support their hypothesis, and 4) provide a couple of paragraphs on classic neurophysiology, such as muscle stiffness and stretch reflexes, and how these various factors could influence the findings (i.e., whether they can disentangle whether the reported results are due to classic neurophysiology, the hypothesis they propose, or both).

      Thank you for your suggestion. We now discuss these points in a separate paragraph (lines 846-895), bringing together our previous discussion on stretch reflexes, our description of different perturbation types, and the additional issues raised by the reviewer above.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      The authors have responded well to all my concerns, save two minor points.

      Figure 2 appears to be unchanged, although they describe appropriate changes in the response letter.

      Thank you for catching this error – we now include the updated figure (further updated to use the terms near/distant in place of proximal/distal).

      I still take issue with the use of proximal and distal to describe the locations of targets. Taking definitions somewhat randomly from the internet, "The terms proximal and distal are used in structures that are considered to have a beginning and an end," and "Proximal and distal are anatomical terms used to describe the position of a body part in relation to another part or its origin." In any case, the hand does not become proximal just because you bring it to your chest. Why not simply stick to the common and clearly defined terms "near" and "distant"?

      Point taken. We have updated the paper to use the terms near/distant.

      Additional changes/corrections not outlined above

      We now include a link to the data and code supporting our findings (https://osf.io/hufy8/). In addition, we made several minor edits throughout the text to improve readability, and corrected occasional mislabeling of CCW and CW pulse data. Note that this correction did not alter the (lack of) relationship between resting biases and responses to perturbations during active movement.

      Response letter references

      Bennett D, Hollerbach J, Xu Y, Hunter I (1992) Time-varying stiffness of human elbow joint during cyclic voluntary movement. Exp Brain Res 88:433–442.

      Blum KP, Campbell KS, Horslen BC, Nardelli P, Housley SN, Cope TC, Ting LH (2020) Diverse and complex muscle spindle afferent firing properties emerge from multiscale muscle mechanics. Elife 9:e55177.

      Cramer SC, Nelles G, Benson RR, Kaplan JD, Parker RA, Kwong KK, Kennedy DN, Finklestein SP, Rosen BR (1997) A functional MRI study of subjects recovered from hemiparetic stroke. Stroke 28:2518–2527.

      McPherson JG, Chen A, Ellis MD, Yao J, Heckman C, Dewald JP (2018) Progressive recruitment of contralesional cortico-reticulospinal pathways drives motor impairment post stroke. J Physiol 596:1211–1225 Available at: https://doi.org/10.1113/JP274968.

      Rack PM, Westbury D (1974) The short range stiffness of active mammalian muscle and its effect on mechanical properties. J Physiol 240:331–350.

      Robinson R, Herzog W, Nigg BM (1987) Use of force platform variables to quantify the effects of chiropractic manipulation on gait symmetry. J Manipulative Physiol Ther 10:172–176.

      Williams PE, Goldspink G (1973) The effect of immobilization on the longitudinal growth of striated muscle fibres. J Anat 116:45.

    1. Reviewer #3 (Public review):

      Summary:

      The authors set out to extend their previous mapping of Drosophila head mechanosensory neurons (Eichler et al., 2024) by reconstructing their full second-order connectome. Their aim is to reveal how bristle mechanosensory neurons (BMNs) interface with excitatory and inhibitory partners to generate location-specific grooming movements, and to identify the circuit motifs and developmental lineages that support this transformation.

      Strengths:

      The strengths of this work are clear. The authors present a comprehensive synaptic-resolution connectome for BMNs, identifying nearly all of their pre- and postsynaptic partners. This dataset reveals important circuit motifs:

      (1) BMNs provide feedforward excitation to descending neurons, feedforward inhibition to interneurons, and are themselves strongly regulated by GABAergic presynaptic inhibition.

      (2) These motifs together support the idea that BMN activity is locally gated and hierarchically suppressed, fitting well with known behavioural sequences of grooming.

      (3) The study also shows that connectivity preserves somatotopy, such that BMNs from neighbouring bristle populations converge onto shared partners, while distant BMNs remain segregated.

      (4) A developmental analysis reveals both primary and secondary partners, suggesting a layered scaffold plus adult-specific elaborations.

      (5) Finally, the identification of hemilineage 23b (LB23) as a core postsynaptic pathway - incorporating previously described antennal grooming neurons (aBN2) - provides a striking link between developmental lineage, anatomical connectivity, and behavioral output.

      (6) Together, the dataset represents a valuable resource for the neuroscience community and a foundation for future functional studies.

      Weaknesses:

      There are also some weaknesses, most of which affect only clarity.

      (1) The writing is dense, with results often presented in a cryptic fashion and the functional implications deferred to the discussion. As a result, the significance of circuit motifs such as BMN→motor or reciprocal inhibitory loops is sometimes buried, rather than highlighted when first described.

      (2) Some assumptions require more explanation for non-specialist readers - for example, how bristle identity is inferred in EM in the absence of cuticular structures, or what is meant by "ascending" and "descending" in a dataset that does not include the ventral nerve cord. While some of this is covered in the earlier paper, explaining it here as well would help readers of this one.

      (3) Visualization choices also sometimes obscure key conclusions: network graphs can be visually appealing but do not clearly convey somatotopy or BMN-type differences; heatmaps or region-level matrices would make the parallel, block-like organization of the circuit more evident.

      (4) The data might also speak to roles beyond grooming (e.g., mechanosensory modulation of posture or feeding), and a brief acknowledgement of this would broaden the impact.

      (5) The restriction to one hemisphere should be explicitly acknowledged as a limitation when framing this as a 'comprehensive' connectome.

      Overall, the authors achieve their main goal: they convincingly show that BMNs connect into parallel, somatotopically organized pathways, with LB23 providing a key lineage-based link from sensory input to grooming output. The dataset is carefully analyzed, and while the presentation could be streamlined, the connectome will be a valuable resource for researchers studying sensory processing, motor control, and the logic of circuit organization.

    1. Reviewer #1 (Public review):

      The manuscript presents a compelling new in vitro system based on isogenic co-cultures of human iPSC-derived hepatocytes and macrophages, enabling the modelling of hepatic immune responses with unprecedented physiological relevance. The authors show that co-culture leads to enhanced maturation of hepatocytes and tissue-resident macrophage identity, which cannot be achieved through conditioned media alone. Using this system, they functionally validate immune-driven hepatotoxic responses to a panel of drugs and compare the system's predictive power to that of monocyte-derived macrophages. The results underscore the necessity of macrophage-hepatocyte crosstalk for accurate modelling of liver inflammation and drug toxicity in vitro.

      The manuscript is clearly written and addresses a key limitation in liver organoid systems: the lack of immune complexity and tissue-specific macrophage imprinting. Nevertheless, several conclusions would benefit from a more careful interpretation of the data, and some important controls or explanations are missing, particularly in the flow cytometry gating strategies, stress marker validation, and cluster interpretations.

      Strengths:

      (1) Novelty and Relevance: The study presents a highly innovative co-culture system based on isogenic human iPSCs, addressing an unmet need in modelling immune-mediated hepatotoxicity.

      (2) Mechanistic Insight: The reciprocal reprogramming between iHeps and iMacs, including induction of KC-specific pathways and hepatocyte maturation markers, is convincingly demonstrated.

      (3) Functional Readouts: The application of the model to detect IL-6 responses to hepatotoxic compounds enhances its translational relevance.

      Weaknesses:

      (1) Several key claims, particularly those derived from PCA plots and DEG analyses, are overinterpreted and require more conservative language or further validation.

      (2) The purity of sorted hepatocytes and macrophages is not convincingly demonstrated; contamination across gates may confound transcriptomic readouts.

      (3) Stress response genes and ER stress/apoptosis signatures are not properly assessed, despite being potentially activated in the system.

      (4) Some figure panels and legends lack statistical annotations, and microscopy validation of morphological changes is missing.

      (5) The co-culture model with monocyte-derived macrophages is not fully characterised, making comparisons less informative.

    2. Reviewer #3 (Public review):

      Summary:

      In this study, the authors establish a human in vitro liver model by co-culturing induced hepatocyte-like cells (iHEPs) with induced macrophages (iMACs). Through flow cytometry-based sorting of cell populations at days 3 and 7 of co-culture, followed by bulk RNA sequencing, they demonstrate that bidirectional interactions between these two cell types drive functional maturation. Specifically, the presence of iMACs accelerates the hepatic maturation program of iHEPs, while contact-dependent cues from iHEPs enhance the acquisition of Kupffer cell identity in iMACs, indicating that direct cell-cell interactions are critical for establishing tissue-resident macrophage characteristics.

      Functionally, the authors show that iMAC-derived Kupffer-like cells respond to pathological stimuli by producing interleukin-6 (IL-6), a hallmark cytokine of hepatic immune activation. When exposed to a panel of clinically relevant hepatotoxic drugs, the co-culture system exhibited concentration-dependent modulation of IL-6 secretion consistent with reported drug-induced liver injury (DILI) phenotypes. Notably, this response was absent when hepatocytes were co-cultured with monocyte-derived macrophages from peripheral blood, underscoring the liver-specific phenotype and functional relevance of the iMAC-derived Kupffer-like cells. Collectively, the study proposes this co-culture platform as a more physiologically relevant model for interrogating macrophage-hepatocyte crosstalk and assessing immune-mediated hepatotoxicity in vitro.

      Strengths:

      A major strength of this study lies in its systematic dissection of cell-cell interactions within the co-culture system. By isolating each cell type following co-culture and performing comprehensive transcriptomic analyses, the authors provide direct evidence of bidirectional crosstalk between iMACs and iHEPs. The comparison with single-culture controls is particularly valuable, as it clearly demonstrates how co-culture enhances functional maturation and lineage-specific gene expression in both cell types. This approach allows for a more mechanistic understanding of how hepatocyte-macrophage interactions contribute to the acquisition of tissue-specific phenotypes.

      Weaknesses:

      (1) Overreliance on bulk RNA-seq data:

      The primary evidence supporting cell maturation is derived from bulk RNA sequencing, which has inherent limitations in resolving heterogeneous cellular states and functional maturation. The conclusions regarding hepatocyte maturation are based largely on increased expression of a subset of CYP genes and decreased AFP levels - markers that, while suggestive, are insufficient on their own to substantiate functional maturation. Additional phenotypic or functional assays (e.g., metabolic activity, protein-level validation) would significantly strengthen these claims.

      (2) Insufficient characterization of input cell populations:

      The manuscript lacks adequate validation of the cellular identities prior to co-culture. Although the authors reference previously published protocols for generating iHEPs and iMACs, it remains unclear whether the cells used in this study faithfully retain expected lineage characteristics. For example, hepatocyte preparations should be characterized by flow cytometry for ALB and AFP expression, while iMACs should be assessed for canonical macrophage markers such as CD45, CD11b, and CD14 before co-culture. Without these baseline data, it is difficult to interpret the magnitude or significance of any co-culture-induced changes.

      (3) Quantitative assessment of IL-6 production is insufficient:

      The analysis of drug-induced IL-6 responses is based primarily on relative changes compared to control conditions. However, percentage changes alone are inadequate to capture the biological relevance of these responses. Absolute cytokine production levels - particularly in response to LPS stimulation - should be reported and directly compared to PBMC-derived macrophages to determine whether iMAC-derived Kupffer-like cells exhibit enhanced cytokine output. Moreover, the Methods section should clearly describe how ELISA results were normalized or corrected to account for potential differences in cell number, viability, or culture conditions.

      (4) Unclear mechanistic interpretation of IL-6 modulation:

      The observed changes in IL-6 production upon drug treatment cannot be interpreted solely as evidence of Kupffer cell-specific functionality. For instance, IL-6 suppression by NSAIDs such as diclofenac is well known to result from altered prostaglandin synthesis due to COX inhibition, while leflunomide's effects are linked to metabolite-induced modulation of immune cell proliferation and broader cytokine networks. These mechanisms are distinct from Kupffer cell identity and may not directly reflect liver-specific macrophage function. Consequently, changes in IL-6 secretion alone - particularly without additional mechanistic evidence or analysis of other cytokines - are insufficient to conclude that co-culture with hepatocytes drives the acquisition of bona fide Kupffer cell maturity.

    1. What did you pay for yesterday?

      1. I paid the rent 2. He had it at home 3. I had dinner at the restaurant last night 4. She went out with some friends 5. I had lunch at eleven thirty 6. They went to bed at midnight 7. I read a newspaper 8. She got up at six fifteen 9. I got up at eight twenty 10. We went to the cinema

    1. Reviewer #2 (Public review):

      Summary:

      The authors generate an optimized small molecule inhibitor of SMARCA2/4 and test it in a panel of cell lines. All uveal melanoma (UM) cell lines in the panel are growth inhibited by the inhibitor, making them the focus of the paper. This inhibition is correlated with loss of promoter occupancy of key melanocyte transcription factors, e.g., SOX10. SOX10 overexpression and a point mutation in SMARCA4 can rescue the growth inhibition exerted by the SMARCA2/4 inhibitor. Treatment of a UM xenograft model results in growth inhibition and regression, which correlates with reduced expression of SOX10 but no discernible toxicity in the mice. Collectively, the data suggest a novel treatment for uveal melanoma.

      Strengths:

      There are many strengths of the study, including the strong challenge of the on-target effect, the assays used and the mechanistic data. The results are compelling as are the effects of the inhibitor. The in vivo data is dose-dependent and doses are low enough to be meaningful and associated with evidence of target engagement.

    2. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review): 

      Summary: 

      The presented study by Centore and colleagues investigates the inhibition of BAF chromatin remodeling complexes. The study is well written and includes comprehensive datasets, including compound screens, gene expression analysis, epigenetics, as well as animal studies. This is an important piece of work for the uveal melanoma research field, and sheds light on a new inhibitor class, as well as a mechanism that might be exploited to target this deadly cancer for which no good treatment options exist. 

      Strengths: 

      This is a comprehensive and well-written study. 

      Weaknesses: 

      There are minimal weaknesses. 

      Reviewer #2 (Public review): 

      Summary: 

      The authors generate an optimized small molecule inhibitor of SMARCA2/4 and test it in a panel of cell lines. All uveal melanoma (UM) cell lines in the panel are growth inhibited by the inhibitor, making them the focus of the paper. This inhibition is correlated with loss of promoter occupancy of key melanocyte transcription factors, e.g., SOX10. SOX10 overexpression and a point mutation in SMARCA4 can rescue the growth inhibition exerted by the SMARCA2/4 inhibitor. Treatment of a UM xenograft model results in growth inhibition and regression, which correlates with reduced expression of SOX10 but no discernible toxicity in the mice. Collectively, the data suggest a novel treatment for uveal melanoma.

      Strengths: 

      There are many strengths of the study, including the strong challenge of the on-target effect, the assays used and the mechanistic data. The results are compelling as are the effects of the inhibitor. The in vivo data is dose-dependent and doses are low enough to be meaningful and associated with evidence of target engagement. 

      Weaknesses: 

      The authors have addressed weaknesses in the revised version. 

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript reports the discovery of new compounds that selectively inhibit SMARCA4/SMARCA2 ATPase activity and have pronounced effects on uveal melanoma cell proliferation. They induce apoptosis and suppress tumor growth, with no toxicity in vivo. The report provides biological significance by demonstrating that the drugs alter chromatin accessibility at lineage specific gene enhancer regions and decrease expression of lineage specific genes, including SOX10 and SOX10 target genes. 

      Strengths: 

      The study provides compelling evidence for the therapeutic use of these compounds and does a thorough job at elucidating the mechanisms by which the drugs work. The study will likely have a high impact on the chromatin remodeling and cancer fields. The datasets will be highly useful to these communities. 

      Weaknesses: 

      The authors have addressed all my concerns. 

      Recommendations for the authors: 

      We would, however, like to draw the authors' attention to two comments by the referees.

      Referee 1 comments: While BAP1 mutant UM cell lines were included for some of the experiments, it seems the in vivo data mentioned in the response to the reviewer's comment is missing? The authors stated that "MP46 (Supplementary Fig. 3a) is BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor." But the CDX model data shown in Figure 4 is from 92.1 cells. If this data is available, then the manuscript would benefit from its addition.

      We thank the reviewer for bringing this to our attention. As the reviewer mentioned, we show the 92-1 CDX model in our manuscript. Additionally, strong tumor growth inhibition was observed in the MP-46 CDX model treated with our BAF ATPase inhibitor; these data can be found in Vaswani et al., 2025 (PMID:39801091, https://pubmed.ncbi.nlm.nih.gov/39801091/).

      Referee 3 comments: 

      Supplementary Figure 2C 

      Is the T910M mutation in the parental MP41 cells heterozygous? If so, the authors should indicate this in the figure legend. If this is a homozygous mutation, the authors should explain how the inhibitors suppress SMARCA4 activity in cells that have a LOF mutation. 

      Could the authors please comment on these issues before a final version is posted online? 

      We thank the reviewer for bringing this to our attention. The T910M mutation is heterozygous, with a variant allele frequency of 0.5. We have updated the figure legend to reflect the genotype of the mutations highlighted in the table.

      Reviewer #1 (Recommendations for the authors): 

      The authors have addressed most of the questions in their review. 

      While BAP1 mutant UM cell lines were included for some of the experiments, it seems the in vivo data mentioned in the response to the reviewer's comment is missing? The authors stated that "MP46 (Supplementary Fig. 3a) is BAP1-null uveal melanoma cell line with no detectable protein expression (Amirouchene-Angelozzi et al., Mol Oncol 2014), and we have observed strong tumor growth inhibition in this CDX model with our BAF ATPase inhibitor." But the CDX model data shown in Figure 4 is from 92.1 cells. If this data is available, then the manuscript would benefit from its addition.

      Reviewer #3 (Recommendations for the authors): 

      Supplementary Figure 2C 

      Is the T910M mutation in the parental MP41 cells heterozygous? If so, the authors should indicate this in the figure legend. If this is a homozygous mutation, the authors should explain how the inhibitors suppress SMARCA4 activity in cells that have a LOF mutation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      In this manuscript, the authors performed an integration of 48 scRNA-seq public datasets and created a single-cell transcriptomic atlas for AML (222 samples comprising 748,679 cells). This is important since most AML scRNA-seq studies suffer from small sample size coupled with high heterogeneity. They used this atlas to further dissect AML with t(8;21) (AML1-ETO/RUNX1-RUNX1T1), which is one of the most frequent AML subtypes in young people. In particular, they were able to predict Gene Regulatory Networks in this AML subtype using pySCENIC, which identified the paediatric regulon defined by a distinct group of hematopoietic transcription factors (TFs) and the adult regulon for t(8;21). They further validated this in bulk RNA-seq with AUCell algorithm and inferred prenatal signature to 5 key TFs (KDM5A, REST, BCLAF1, YY1, and RAD21), and the postnatal signature to 9 TFs (ENO1, TFDP1, MYBL2, KLF1, TAGLN2, KLF2, IRF7, SPI1, and YBX1). They also used SCENIC+ to identify enhancer-driven regulons (eRegulons), forming an eGRN, and found that prenatal origin shows a specific HSC eRegulon profile, while a postnatal origin shows a GMP profile. They also did an in silico perturbation and found AP-1 complex (JUN, ATF4, FOSL2), P300, and BCLAF1 as important TFs to induce differentiation. Overall, I found this study very important in creating a comprehensive resource for AML research. 

      Strengths: 

      (1) The generation of an AML atlas integrating multiple datasets with almost 750K cells will further support the community working on AML. 

      (2) Characterisation of t(8;21) AML proposes new interesting leads. 

      We thank the reviewer for a succinct summary of our work and highlighting its strengths.

      Weaknesses: 

      Were these t(8;21) TFs/regulons identified from any of the single datasets? For example, if the authors apply pySCENIC to any dataset, would they find the same TFs, or is it the increase in the number of cells that allows identification of these? 

      We ran pySCENIC on individual datasets and compared the TFs (defining the regulons) identified to those from the combined AML scAtlas analysis. Some TFs were shared, but the sets varied between individual studies. The union of all TFs identified makes a very large set, comprising around a third of all known TFs. AML scAtlas provides a more refined repertoire of TFs, perhaps because the underlying network inference approach is more robust with a higher number of cells. The findings of these investigations are included in Supplementary Figure 4D-E; we hope this is useful for other users of pySCENIC.
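
      A comparison of this kind can be sketched with plain set operations. The snippet below is only an illustration under stated assumptions: it does not call pySCENIC, the function name `summarize_tf_overlap` is hypothetical, and the TF labels are made-up placeholders rather than results from the atlas.

```python
from collections import Counter


def summarize_tf_overlap(per_dataset: dict, atlas: set) -> dict:
    """Compare TF sets found in individual datasets with an atlas-level set."""
    # Union of every TF reported by any single dataset.
    union = set().union(*per_dataset.values())
    # Count in how many datasets each TF appears (once per dataset).
    counts = Counter(tf for tfs in per_dataset.values() for tf in tfs)
    recurrent = {tf for tf, n in counts.items() if n >= 2}
    # Jaccard similarity of each dataset's TF set with the atlas set.
    jaccard = {}
    for name, tfs in per_dataset.items():
        combined = tfs | atlas
        jaccard[name] = len(tfs & atlas) / len(combined) if combined else 0.0
    return {"union": union, "recurrent": recurrent, "jaccard": jaccard}


# Placeholder inputs (hypothetical TF labels, not real findings).
toy_datasets = {
    "study_1": {"TF_A", "TF_B"},
    "study_2": {"TF_B", "TF_C"},
}
toy_atlas = {"TF_B"}
summary = summarize_tf_overlap(toy_datasets, toy_atlas)
print(sorted(summary["union"]), sorted(summary["recurrent"]), summary["jaccard"])
```

      A large union alongside a small recurrent set would correspond to the pattern described above, where individual studies each nominate different TFs and few are shared.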

      Reviewer #2 (Public review): 

      Summary: 

      The authors assemble 222 publicly available bone marrow single-cell RNA sequencing samples from healthy donors and primary AML, including pediatric, adolescent, and adult patients at diagnosis. Focusing on one specific subtype, t(8;21), which, despite affecting all age classes, is associated with better prognosis and drug response for younger patients, the authors investigate if this difference is reflected also in the transcriptomic signal. Specifically, they hypothesize that the pediatric and part of the young population acquires leukemic mutations in utero, which leads to a different leukemogenic transformation and ultimately to differently regulated leukemic stem cells with respect to the adult counterpart. The analysis in this work heavily relies on regulatory network inference and clustering (via SCENIC tools), which identifies regulatory modules believed to distinguish the pre-, respectively, post-natal leukemic transformation. Bulk RNA-seq and scATAC-seq datasets displaying the same signatures are subsequently used for extending the pool of putative signature-specific TFs and enhancer elements. Through gene set enrichment, ontology, and perturbation simulation, the authors aim to interpret the regulatory signatures and translate them into potential onset-specific therapeutic targets. The putative pre-natal signature is associated with increased chemosensitivity, RNA splicing, histone modification, stemness marker SMARCA2, and potentially maintained by EP300 and BCLAF1. 

      Strengths: 

      The main strength of this work is the compilation of a pediatric AML atlas using the efficient Cellxgene interface. Also, the idea of identifying markers for different disease onsets, interpreting them from a developmental angle, and connecting this to the different therapy and relapse observations, is interesting. The results obtained, the set of putative up-regulated TFs, are biologically coherent with the mechanisms and the conclusions drawn. I also appreciate that the analysis code was made available and is well documented. 

      We thank the reviewer for evaluating our work and highlighting its key features, including the creation of the AML atlas and the downstream analysis and interpretation for the t(8;21) subtype.

      Weaknesses:

      There were fundamental flaws in how methods and samples were applied, a general lack of critical examination of both the results and the appropriateness of the methods for the data at hand, and in how results were presented. In particular: 

      (1) Cell type annotation: 

      (a) The 2-phase cell type annotation process employed for the scRNA-seq sample collection raised concerns. Initially annotated cells are re-labeled after a second round with the same cell types from the initial label pool (Figure 1E). The automatic annotation tools were used without specifying the database and tissue atlases used as a reference, and no information was shown regarding the consensus across these tools. 

      Cell type annotations are heavily influenced by the reference profiles used and vary significantly between tools. To address this, we used multiple cell type annotation tools which predominantly encompassed healthy peripheral blood cell types and/or healthy bone marrow populations. This determined the primary cluster cell types assigned. 

      Existing tools and resources are not leukemia-specific; thus, to identify AML-associated HSPC subpopulations, we created a custom SingleR reference using a CD34-enriched AML single-cell dataset. This was not suitable for the annotation of the full AML scAtlas, as it is derived from CD34-sorted cell types and so is biased towards these populations.

      We have made this much clearer in the revised manuscript, by splitting Figure 1 into two separate figures (now Figure 1 and Figure 2) reflecting both different analyses performed. The methods have also been updated with more detail on the cell type annotations, and we have included the automated annotation outputs as a supplementary table, as this may be useful for others in the single-cell community. 

      (b) Expression of the CD34 marker is reported as the only selection method for HSPCs, which is not in line with common practice. The use of CD34 alone is acceptable as a surface marker, while robust annotation of HSPCs should be done on the basis of expression of gene sets.

      Most of the cells used in the HSPC analysis were in fact annotated as HSPCs with some exceptions. In line with this feedback, we have re-worked this analysis and simply taken HSPC annotated clusters forward for the subsequent analysis, yielding the same findings. 

      (c) During several analyses, the cell types used were either not well defined or contradictory, such as in Figure 2D, where it is not clear if pySCENIC and AUC scores were computed on HSPCs alone or merged with CMPs. In other cases, different cell type populations are compared and used interchangeably: comparing the HSPC-derived regulons with bulk (probably not enriched for CD34+ cells) RNA samples could be an issue if there are no valid assumptions on the cell composition of the bulk sample.

      We apologize for the lack of clarity regarding which cell types were used; the text has been updated to clarify that all myeloid progenitor cells were included in the pySCENIC analysis.

      The bulk RNA-seq samples were used only to test the enrichment of our AML scAtlas-derived regulons in an unbiased and large-scale way. While CD34-enriched samples would be preferable, these were not available to us.

      We agree that more effort could be made to ensure the single-cell/myeloid progenitor-derived regulons are comparable to the bulk RNA-seq data. In the original bulk RNA-seq validation analysis, we used all bulk RNA-seq timepoints (diagnostic, on-treatment, relapse) and included both bone marrow and peripheral blood. Upon reflection, and to better harmonize the bulk RNA-seq selection strategy with that of AML scAtlas, we revised our approach to include only diagnostic bone marrow samples. We expect that, since the leukemia blast count for pediatric AML is typically high at diagnosis, these samples will predominantly contain leukemic blasts.

      (2) Method selection: 

      (a) The authors should explain why they use pySCENIC and not any other approach. They should briefly explain how pySCENIC works and what they get out in the main text. In addition, they should explain the AUCell algorithm and motivate its usage.

      pySCENIC is a state-of-the-art method for network inference from scRNA-seq data and is widely used within the single-cell community (over 5000 citations for both versions of the SCENIC pipeline). The pipeline has been benchmarked as one of the top performers for GRN analysis (Nguyen et al, 2021. Briefings in Bioinformatics). AUCell is a module within the pySCENIC pipeline that summarizes the activity of a set of genes (a regulon) into a single number, which helps to compare and visualize different regulons. We have modified the manuscript (Results section 2 paragraph 2) to better explain this method and provided some rationale and accompanying citations to justify its use for this analysis. We thank the reviewer for highlighting this and hope our updates add some clarity.
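
      To illustrate the idea of summarizing a regulon into a single number per cell, the following is a minimal sketch of an AUCell-style score. It is a simplified illustration and not the pySCENIC implementation: the function name `regulon_auc`, the dictionary input format, and the default `top_fraction` of 0.05 are assumptions made for this example. Genes in one cell are ranked by expression, and the score is the area under the recovery curve of the regulon genes within the top-ranked fraction, scaled by the largest area attainable.

```python
def regulon_auc(cell_expression, regulon_genes, top_fraction=0.05):
    """Simplified AUCell-style activity score for one cell (illustrative only).

    cell_expression: dict mapping gene name -> expression value in this cell.
    regulon_genes:   iterable of gene names forming the regulon.
    top_fraction:    fraction of top-ranked genes that contribute to the score
                     (an assumed default for this sketch).
    """
    # Rank genes from highest to lowest expression (ties keep input order).
    ranked = sorted(cell_expression, key=lambda g: -cell_expression[g])
    regulon = set(regulon_genes) & set(ranked)
    if not regulon:
        raise ValueError("no regulon genes are present in this cell's profile")

    # Only the top-ranked genes are scanned.
    cutoff = max(1, round(top_fraction * len(ranked)))

    # Area under the recovery curve: at each rank, add the number of
    # regulon genes recovered so far.
    recovered = 0
    area = 0
    for gene in ranked[:cutoff]:
        if gene in regulon:
            recovered += 1
        area += recovered

    # Largest possible area: all regulon genes sit at the very top ranks.
    k = min(len(regulon), cutoff)
    max_area = k * cutoff - k * (k - 1) / 2
    return area / max_area
```

      A score close to 1 means the regulon genes sit at the top of the cell's expression ranking, and a score of 0 means none of them fall within the scanned fraction. Because the score depends only on ranks within each cell, it is comparatively insensitive to differences in sequencing depth between cells.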

      (b) The obtained GRN signatures were not critically challenged on an external dataset. Therefore, the evidence that supports these signatures to be reliable and significant to the investigated setting is weak. 

      These signatures were inferred using the most suitable AML single-cell RNA datasets currently available. To validate our findings, we used two independent datasets (the TARGET AML bulk RNA sequencing cohort, and the Lambo et al. scRNA-seq dataset). To clarify this workflow in the manuscript, we have added a panel to Figure 3 outlining the analytical process. To our knowledge, there are no other better-suited datasets for validation. Experimental validations on patient samples, while valuable, are beyond the scope of this study.

      (3) There are some issues with the analysis & visualization of the data. 

      Based on this feedback, we have improved several aspects of the analysis, changed some visualizations, and improved figure resolution throughout the manuscript. 

      (4) Discussion: 

      (a) What exactly is the 'regulon signature' that the authors infer? How can it be useful for insights into disease mechanisms? 

      The ’regulon signature’ here refers to a gene regulatory program (multiple gene modules, each defined by a transcription factor and its targets) that is specific to a given age group. Further investigation into this can be useful for understanding why patients of different ages follow a different clinical course. We have amended the text to explain this.

      (b) The authors write 'Together this indicates that EP300 inhibition may be particularly effective in t(8;21) AML, and that BCLAF1 may present a new therapeutic target for t(8;21) AML, particularly in children with inferred pre-natal origin of the driver translocation.' I am missing a critical discussion of what is needed to further test the two targets. Put differently: Would the authors take the risk of a clinical study given the evidence from their analysis? 

      Indeed, many extensive studies would be required before these findings are clinically translatable. We have included a discussion paragraph (discussion paragraph 7) detailing what further work is required in terms of experimental validation and potential subsequent clinical study.

      Reviewer #1 (Recommendations for the authors): 

      In addition to the point raised above, Cytoscape files for the GRNs and eGRNs inferred would be useful to have. 

      We have now provided Cytoscape/eGRN tables in supplementary materials.

      Reviewer #2 (Recommendations for the authors): 

      (1) Figures 1F and 1G: You show the summed-up frequencies for all patients, right? It would be very interesting to see this per patient, or add error bars, since the shown frequencies might be driven by single patients with many cells. 

      While this type of plot could be informative, the large number of samples in the AML scAtlas rendered the output difficult to interpret. As a result, we decided not to include it in the manuscript.
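
      The reviewer's concern, that pooled frequencies can be dominated by a few patients contributing many cells, can be illustrated with a minimal sketch. The function name, the input format, and the patient and cell type labels below are hypothetical and are not taken from the manuscript or its code.

```python
from collections import Counter, defaultdict


def pooled_and_per_patient(cells, cell_type):
    """Compare the pooled frequency of a cell type with per-patient frequencies.

    cells: iterable of (patient_id, cell_type_label) pairs, one per cell.
    Returns (pooled_fraction, {patient_id: fraction}).
    """
    per_patient = defaultdict(Counter)
    total = 0
    hits = 0
    for patient, label in cells:
        per_patient[patient][label] += 1
        total += 1
        hits += label == cell_type
    if total == 0:
        raise ValueError("no cells supplied")

    pooled = hits / total
    fractions = {
        patient: counts[cell_type] / sum(counts.values())
        for patient, counts in per_patient.items()
    }
    return pooled, fractions


# Hypothetical example: one patient contributes most of the cells.
cells = (
    [("P1", "LSC")] * 90 + [("P1", "Mono")] * 10
    + [("P2", "LSC")] * 1 + [("P2", "Mono")] * 9
)
pooled, fractions = pooled_and_per_patient(cells, "LSC")
```

      In this toy input the pooled fraction is pulled towards the patient with the most cells, whereas the per-patient fractions differ widely. Summarizing the per-patient values (for example as a median with a range) is one compact alternative to plotting every patient individually.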

      (2) An issue of selection bias has to be raised when only the two samples expressing the expected signatures are selected from the external scRNA dataset. Similarly, in the DepMap analysis, the age and nature of the other cell lines sensitive to EP300 and BCLAF1 should be reported. 

      Since the purpose of this analysis was to build on previously defined signatures, we selected the two samples for which we had preliminary hypotheses. It would indeed be interesting to explore those not matching these signatures; however, sample numbers are very small, so without preliminary findings robust interpretation and validation would be difficult. An expanded validation would be more appropriate once more data become available in the future.

      We agree that investigating the age and nature of other BCLAF1/EP300 sensitive cell lines is a very valuable direction. Our analysis suggests that our BCLAF1 findings may also be applicable to other in-utero origin cancers, and we have now summarized these observations in Supplementary Figure 7H. 

      (3) Is there statistical evidence for your claim that "This shows that higher-risk subtypes have a higher proportion of LSCs compared to favorable risk disease."? At least intermediate and adverse look similar to me. How does this look if you show single patients?  

      We are grateful to the reviewer for noticing this oversight and have now included an appropriate statistical test in the revised manuscript. As before, while showing single patients may be useful, the large number of patients makes such a plot difficult to interpret. For this reason, we have chosen not to include it.

      (4) Specify the statistical test you used to 'identify significantly differentially expressed TFs' (line 192). 

      The methods used for differential expression analysis are now clearly stated in the text as well as in the methods section. We hope this addition improves clarity for the reader.

      (5) Figure 2B: You show the summed up frequencies for all patients, right? It would be intriguing to see this figure per patient, since the shown frequencies might be driven by single patients with many cells. 

      Yes, the plot includes all patients. Showing individual patients on a single plot is not easily interpretable. 

      (6) Y axis in 2D is not samples, but single cells? Please specify. 

      We thank the reviewer for bringing this to our attention and have now updated Figure 3D accordingly. 

      (7) Figure 3A: I don't get why the chosen clusters are designated as post- and prenatal, given the occurrence of samples in them. 

      This figure serves to validate the previously defined regulon signatures, so the cluster designations are based on this. We have amended the text to elaborate on this point, which will hopefully provide greater clarity.

      (8) Figure 3E: What is shown on the y axis? Did you correct your p-values for multiple testing? 

      We apologize for this oversight and have now added a y axis label. P values were not corrected for multiple testing, as only a few pairwise t-tests were performed.
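
      For readers unfamiliar with what a multiple-testing correction involves, the following is a minimal sketch of the Benjamini-Hochberg adjustment. It is offered only as an illustration of one common correction; the response above states that no correction was applied, and the P values in the usage example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order."""
    n = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values from four pairwise tests.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

      With only a handful of tests the adjusted values stay close to the raw ones, so the practical difference is small, but reporting them makes the conclusion independent of that choice.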

      (9) Robustness: You find some gene sets up- and down-regulated. How would that change if you used an eg bootstrapped number of samples, or a different analysis approach? 

      To address this, we ran both edgeR and DESeq2 for DE testing. Our findings (Supplementary Figure 5B) show that 98% of edgeR genes are also detected by DESeq2. Because this substantial overlap indicates robust findings, we opted to use the smaller edgeR gene list for our analysis. We thank the reviewer for this helpful suggestion, which has strengthened our analysis.
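
      The overlap statistic quoted above can be computed as the fraction of one gene list that is also present in the other. A minimal sketch follows; the function name and the gene identifiers are hypothetical placeholders and are not the lists from the manuscript.

```python
def overlap_fraction(reference_genes, other_genes):
    """Fraction of reference_genes that also appear in other_genes."""
    reference = set(reference_genes)
    if not reference:
        raise ValueError("reference gene list is empty")
    return len(reference & set(other_genes)) / len(reference)


# Hypothetical placeholder lists standing in for the two DE results.
edger_hits = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
deseq2_hits = {"GENE_A", "GENE_B", "GENE_C", "GENE_X", "GENE_Y"}
fraction = overlap_fraction(edger_hits, deseq2_hits)
```

      The measure is asymmetric: a high fraction of the smaller list being recovered by the larger one says nothing about how many genes are unique to the larger list, so reporting the overlap in both directions gives a fuller picture.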

      (10) Multiomics analysis:

      (a) Why only work on 'representative samples'? The idea of an integrated atlas is to identify robust patterns across patients, no? I'd love to see what regulons are robust, i.e., shared between patients.

      As discussed in point 2, there are very few samples available for the multiomics analysis. Therefore, we chose to focus on those samples which we had a working hypothesis for, as a validation for our other analyses. 

      (b) I don't agree that finding 'the key molecular processes, such as RNA splicing, histone modification, and TF binding' expressed 'further supports the stemness signature in presumed prenatal origin t(8;21) AML'.

      Following the improvements made on the bulk RNA-Seq analysis in response to the previous reviewer comments, we ended up with a smaller gene set. Consequently, the ontology results have changed. The updated results are now more specific and indicate that developmental processes are upregulated in presumed prenatal origin t(8;21) AML. 

      (c) Please clarify if the multiome data is part of the atlas.

      The multiome data is not a part of AML scAtlas, as it was published at a later date. We used this dataset solely for validation purposes and have updated the figures and text to clearly indicate that it is used as a validation dataset.  

      (d) Please describe the used data with respect to the number of patients, cells, age, etc.

      We clarified this point in the text and have also included supplementary tables detailing all samples used in the atlas and validation datasets. 

      (e) The four figures in Figure 4E look identical to me. What is the take-home message here? Do all perturbations have the same impact on driving differentiation? Please elaborate.

      The perturbation figure is intended to illustrate that other genes can behave similarly to members of the AP-1 complex (JUN and ATF4 here) following perturbation. Since the AP-1 complex is well known to be important in t(8;21) AML, we hypothesize that these other genes are also important. We apologize for the previous lack of interpretation here and have amended the text to clarify this point. 

      (11) Abstract: Please detail: how many of the 159 AML patients are t(8;21)? 

      We have now amended the abstract to include this. 

      (12) Figures: Increase font size where possible, eg age in 1B or risk group in 1G is super small and hard to read. 

      Extra attention has been given to improving the figure readability and resolution throughout the whole manuscript.  

      (13) Color codes in Figures 2B and 2C are all over the place and misleading: Sort 2C along age, indicate what is adult and adolescent, sort the x axis in 2B along age. 

      We have changed this figure accordingly.  

      (14) I suggest not coloring dendrograms, in my opinion this is highly irritating. 

      The dendrogram colors correspond to clusters that are referenced in the text; this coloring provides informative context and aids interpretation, making it a useful addition to the figure.

      (15) The resolution in Figure 4B is bad, I can't read the labels. 

      This visualization has been revised, to make presentation of this data clearer.  

      (16) In addition to selecting bulk RNA samples matching the two regulon signatures, some effort should have been put into investigating the samples not aligned with those, or assessing how unique these GRN signatures are to the specific cell type and disease of interest, excluding the influence of cell type composition and random noise. The late-onset signatures should also be excluded from being present in an external pre-natal cohort in a more statistically rigorous manner.

      Our use of the bulk RNA-Seq data is solely intended for the validation of predefined regulon signatures, for which we already have a working hypothesis.  While we agree that further investigation of the samples that do not align with these signatures could yield interesting insights, we believe that such an analysis would extend beyond the scope of the current manuscript.

      (17) The specific bulk RNA samples used should be specified, along with the tissue of origin. The same goes for the Lambo dataset. 

      We have clarified this point in the text and provided a supplementary table detailing all samples used for validation, alongside the sample list from AML scAtlas.

      (18) In Supplementary Figure 5B, the axes should be defined.

      We have updated this figure to include axis legends.

      (19) Supplementary Figure 4A. There is a mistake in the sex assignment for sample AML14D. Since chrY-genes are expressed, this sample is likely male, while the Xist expression is mostly zero. 

      We thank the reviewer for pointing out this error, which has now been corrected.  

      (20) Wording suggestions: 

      (a) Line 54: not compelling phrasing. 

      (b) Line 83: "allows to decipher". 

      (c) Line 88: repetition from line 85. 

      (d) Line 90: the expression "clean GRN" is not clear. 

      These wording suggestions have all been incorporated in the revised manuscript.

      (21) Supplementary Figure 3D is not interpretable, I suggest a different visualization. 

      We agree that the original figure was not the most informative and have replaced it with UMAPs displaying LSC6 and LSC17 scores.

    1. Author response:

      Reviewer #1 (Public review):

      Fombellida-Lopez and colleagues describe the results of an ART intensification trial in people with HIV infection (PWH) on suppressive ART to determine the effect of increasing the dose of one ART drug, dolutegravir, on viral reservoirs, immune activation, exhaustion, and circulating inflammatory markers. The authors hypothesize that ART intensification will provide clues about the degree to which low-level viral replication is occurring in circulation and in tissues despite ongoing ART, which could be identified if reservoirs decrease and/or if immune biomarkers change. The trial design is straightforward and well-described, and the intervention appears to have been well tolerated. The investigators observed an increase in dolutegravir concentrations in circulation, and to a lesser degree in tissues, in the intervention group, indicating that the intervention has functioned as expected (ART has been intensified in vivo). Several outcome measures changed during the trial period in the intervention group, leading the investigators to conclude that their results provide strong evidence of ongoing replication on standard ART. The results of this small trial are intriguing, and a few observations in particular are hypothesis-generating and potentially justify further clinical trials to explore them in depth. However, I am concerned about over-interpretation of results that do not fully justify the authors' conclusions.

      We thank Reviewer #1 for their thoughtful and constructive comments, which helped us clarify and improve the manuscript. Below, we address each of the reviewer’s points and describe the changes that we implemented in the revised version. We acknowledge the reviewer’s concern regarding potential overinterpretation of certain findings, and in the revised version we took particular care to ensure that all conclusions are supported by the data and framed within the exploratory nature of the study.

      (1) Trial objectives: What was the primary objective of the trial? This is not clearly stated. The authors describe changes in some reservoir parameters and no changes in others. Which of these was the primary outcome? No a priori hypothesis / primary objective is stated, nor is there explicit justification (power calculations, prior in vivo evidence) for the small n, unblinded design, and lack of placebo control. In the abstract (line 36, "significant decreases in total HIV DNA") and conclusion (lines 244-246), the authors state that total proviral DNA decreased as a result of ART intensification. However, in Figures 2A and 2E (and in line 251), the authors indicate that total proviral DNA did not change. These statements are confusing and appear to be contradictory. Regarding the decrease in total proviral DNA, I believe the authors may mean that they observed transient decrease in total proviral DNA during the intensification period (day 28 in particular, Figure 2A), however this level increases at Day 56 and then returns to baseline at Day 84, which is the source of the negative observation. Stating that total proviral DNA decreased as a result of the intervention when it ultimately did not is misleading, unless the investigators intended the day 28 timepoint as a primary endpoint for reservoir reduction - if so, this is never stated, and it is unclear why the intervention would then be continued until day 84? If, instead, reservoir reduction at the end of the intervention was the primary endpoint (again, unstated by the authors), then it is not appropriate to state that the total proviral reservoir decreased significantly when it did not.

      We agree with the reviewer that the primary objective of the study was not explicitly stated in the submitted manuscript. We clarified this in the revised manuscript (lines 361-364). As registered on ClinicalTrials.gov (NCT05351684), the primary outcome was defined as “To evaluate the impact of treatment intensification at the level of total and replication-competent reservoir (RCR) in blood and in tissues”, with a time frame of 3 months. Accordingly, our aim was to explore whether any measurable reduction in the HIV reservoir (total or replication-competent) occurred during the intensification period, including at day 28, 56, or 84. The protocol did not prespecify a single time point for this effect to occur, and the exploratory design allowed for detection of transient or sustained changes within the intensification window.

      We recognize that this scope was not clearly articulated in the original text and may have led to confusion in interpreting the transient drop in total HIV DNA observed at day 28. While total DNA ultimately returned to baseline by the end of intensification, the presence of a transient reduction during this 3-month window still fits within the framework of the study’s registered objective. Moreover, although the change in total HIV DNA was transient, it aligns with the consistent direction of changes observed across the multiple independent measures, including CA HIV RNA, RNA/DNA ratio and intact HIV DNA, collectively supporting a biological effect of intensification.

      We would also like to stress that this is the first clinical trial in which ART intensification is performed not by adding an extra drug but by increasing the dosage of an existing drug. Therefore, we were more interested in the overall, cumulative effect of intensification throughout the entire trial period than in differences between groups at individual time points. We clarified in the revised manuscript that this was a proof-of-concept phase 2 study, designed to reveal biological effects of ART intensification rather than confirm efficacy in a powered comparison. The absence of a prespecified statistical endpoint or sample size calculation reflects the exploratory nature of the trial.

      (2) Intervention safety and tolerability: The results section lacks a specific heading for participant safety and tolerability of the intervention. I was wondering about clinically detectable viremia in the study. Were there any viral blips? Was the increased DTG well tolerated? This drug is known to cause myositis, headache, CPK elevation, and hepatotoxicity. Were any of these observed? What is the authors' interpretation of the CD4:8 ratio change (line 198)? Is this a significant safety concern for a longer duration of intensification? Was there also a change in CD4% or only in absolute counts? Was there relative CD4 depletion observed in the rectal biopsy samples between days 0 and 84? Interestingly, T cells dropped at the same timepoints that reservoirs declined... how do the authors rule out that reservoir decline reflects transient T cell decline that is non-specific (not due to additional blockade of replication)?

      We improved the Methods section to clarify how safety and tolerability were assessed during the study (lines 389-396). Safety evaluations were conducted on day 28 and day 84 and included a clinical examination and routine laboratory testing (liver function tests, kidney function, and complete blood count). Medication adherence was also monitored through pill counts performed by the study nurses.

      No virological blips above 50 copies/mL were observed and no adverse events were reported by participants during the 3-month intensification period. Although CPK levels were not included in the routine biological monitoring, no participant reported muscle pain or other symptoms suggestive of muscle toxicity.

      The CD4:CD8 ratio decrease noted during intensification was not associated with significant changes in absolute CD4 or CD8 counts, as shown in Figure 5. We interpret this ratio change as a transient redistribution rather than an immunological risk; therefore, we do not consider it to represent a safety concern.

      We would like to clarify that CD4⁺ T-cell counts did not significantly decrease in any of the treatment groups, as shown in Figure 5. The apparent decline concerns the CD4/CD8 ratio, which transiently dropped, but not the absolute number of CD4⁺ T cells. Moreover, although the dynamics of total HIV DNA are indeed similar to those of the CD4/CD8 ratio (both declined transiently and then returned to baseline by day 84), the dynamics of unspliced RNA and the unspliced RNA/total DNA ratio are clearly different, as these markers demonstrated a sustained decrease that was maintained throughout the trial period, even when the CD4/CD8 ratio had already returned to baseline. Also, we observed a significant decrease in intact HIV DNA at day 84 compared to day 0. These effects cannot be easily explained by a transient decline in CD4⁺ cells.

      (3) The investigators describe a decrease in intact proviral DNA after 84 days of ART intensification in circulating cells (Figure 2D), but no changes to total proviral DNA in blood or tissue (Figures 2A and 2E; IPDA does not appear to have been done on tissue samples). It is not clear why ART intensification would result in a selective decrease in intact proviruses and not in total proviruses if the source of these reservoir cells is due to ongoing replication. These reservoir results have multiple interpretations, including (but not limited to) the investigators' contention that this provides strong evidence of ongoing replication. However, ongoing replication results in the production of both intact and mutated/defective proviruses that both contribute to reservoir size (with defective proviruses vastly outnumbering intact proviruses). The small sample size and well-described heterogeneity of the HIV reservoir (with regard to overall size and composition) raise the possibility that the study was underpowered to detect differences over the 84-day intervention period. No power calculations or prior studies were described to justify the trial size or the duration of the intervention. Readers would benefit from a more nuanced discussion of reservoir changes observed here.

      We sincerely thank the reviewer for this insightful comment. We fully agree that the reservoir dynamics observed in our study might raise several possible interpretations, and that its complexity, resulting from continuous cycles of expansion and contraction, reflects the heterogeneity of the latent reservoir. 

      Total HIV DNA in PBMCs showed a transient decline during intensification (notably at day 28), ultimately returning to baseline by day 84. This biphasic pattern likely reflects the combined effects of suppression of ongoing low-level replication by an increased DTG dosage, followed by the expansion of infected cell clones (mostly harbouring defective proviruses). In other words, the transient decrease in total (intact + defective) DNA at day 28 may be due to an initial decrease in newly infected cells upon ART intensification; however, at subsequent time points this effect was masked by proliferation (clonal expansion) of infected cells with defective proviruses. Recent studies suggest that intact and defective proviruses are subjected to different selection pressures by the immune system on ART (PMID: 38337034) and that their decay on therapy is different (intact proviruses are cleared much more rapidly than defective ones). In addition, defective proviruses can be preferentially expanded as they can reprogram the host cell proliferation machinery (https://doi.org/10.1101/2025.09.22.676989). This may explain why in our study the intact proviruses decreased, but the total proviruses did not change, between days 0 and 84 in the intensification group. Interestingly, in the control group, we observed a significant increase in total DNA at day 84 compared to day 0, with no difference for the intact DNA, which is also in line with the clonal expansion of defective proviruses.

      Importantly, we observed a significant decrease in intact proviral DNA between day 0 and day 84 in the intensification group (Figure 2D). This result directly addresses the study’s primary objective: assessing the impact of intensification on the replication-competent reservoir. In comparison, as the reviewer rightly points out, total HIV DNA includes over 90% defective genomes, which limits its interpretability as a biomarker of biologically relevant reservoir changes. In addition, other reservoir markers, such as cell-associated unspliced RNA and RNA/DNA ratios, also showed consistent trends supporting a biologically relevant effect of intensification. Even in the absence of sustained changes in total HIV DNA, the coherence across the different independent measures of the reservoir (intact DNA, unspliced RNA), suggests an effect indicative of ongoing replication pre-intensification.

      Regarding tissue reservoirs, the lack of substantial change in total HIV DNA between days 0 and 84 is also in line with the predominance of defective sequences in these compartments. Moreover, the limited increase in rectal tissue dolutegravir levels during intensification (from 16.7% to 20% of plasma concentrations) may have limited the efficacy of the intervention in this site.

      As for the IPDA on rectal biopsies, we attempted the assay using two independent DNA extraction methods (Promega Reliaprep and Qiagen Puregene), but both yielded high DNA shearing index values, and intact proviral detection was successful in only 3 of 40 samples. Given the poor DNA integrity, these results were not interpretable.

      That said, we fully acknowledge the limitations of our study, especially the small sample size, and we agree with the reviewer that caution is needed when interpreting these findings. In the revised manuscript, we adopted a more measured tone in the discussion (lines 340-346), stating that these observations are exploratory and hypothesis-generating, and require confirmation in larger, better-powered studies. Nonetheless, we believe that the convergence of multiple reservoir markers pointing in the same direction constitutes a meaningful biological effect that deserves further investigation.

      (4) While a few statistically significant changes occurred in immune activation markers, it is not clear that these are biologically significant. Lines 175-186 and Figure 3: The change in CD4 cells + for TIGIT looks as though it declined by only 1-2%, and at day 84, the confidence interval appears to widen significantly at this timepoint, spanning an interquartile range of 4%. The only other immune activation/exhaustion marker change that reached statistical significance appears to be CD8 cells + for CD38 and HLA-DR, however, the decline appears to be a fraction of a percent, with the control group trending in the same direction. Despite marginal statistical significance, it is not clear there is any biological significance to these findings; Figure S6 supports the contention that there is no significant change in these parameters over time or between groups. With most markers showing no change and these two showing very small changes (and the latter moving in the same direction as the control group), these results do not justify the statement that intensifying DTG decreases immune activation and exhaustion (lines 38-40 in the abstract and elsewhere).

      We agree with the reviewer that the observed changes in immune activation and exhaustion markers were modest. We revised the abstract and the manuscript text (including a section header) to reflect this more accurately (lines 39, 175, 185, 253). We noted that these differences, while statistically significant (e.g., in TIGIT+ CD4+ T cells and CD38+HLA-DR+ CD8+ T cells), were limited in magnitude. We explicitly acknowledged these limitations and interpreted the findings with appropriate caution.

      (5) There are several limitations of the study design that deserve consideration beyond those discussed at line 327. The study was open-label and not placebo-controlled, which may have led to some medication adherence changes that confound results (authors describe one observation that may be evidence of this; lines 146-148). A randomized/blinded/cross-over design would be more robust and help distinguish signal from noise, given the relatively small changes observed in the intervention arm. There does not seem to be a measurement of key outcome variables after treatment intensification ceased - evidence of an effect on replication through ART intensification would be enhanced by observing changes once intensification was stopped. Why was intensification maintained for 84 days? More information about the study duration would be helpful. Table 1 indicates that participants were 95% male. Sex is known to be a biological variable, particularly with regard to HIV reservoir size and chronic immune activation in PWH. Worldwide, 50% of PWH are women. Research into improving management/understanding of disease should reflect this, and equal participation should be sought in trials. Table 1 shows differing baseline reservoir sizes between the control and intervention groups. This may have important implications, particularly for outcomes where reservoir size is used as the denominator.

      We expanded the limitations section to address several key aspects raised by the reviewer: the absence of blinding and placebo control, the predominantly male study population, and the lack of post-intervention follow-up. While we acknowledge that open-label designs can introduce behavioural biases, including potential changes in adherence, we now explicitly state that placebo-controlled, blinded trials would provide a more robust assessment and are warranted in future research (lines 340-346).

      The 84-day duration of intensification was chosen based on previous studies and provided sufficient time for observing potential changes in viral transcription and reservoir dynamics. However, we agree that including post-intervention follow-up would have strengthened the conclusions, and we highlighted this limitation and future direction in the revised manuscript (lines 340-346). 

      The sex imbalance is now clearly acknowledged as a limitation in the revised manuscript, and we fully support ongoing efforts to promote equitable recruitment in HIV research. We would like to add that, in our study, rectal biopsies were coupled with anal cancer screening through HPV testing. This screening is specifically recommended for younger men who have sex with men (MSM), as outlined in the current EACS guidelines (see: https://eacs.sanfordguide.com/eacs part2/cancer/cancerscreening-methods). As a result, MSM participants had both a clinical incentive and medical interest to undergo this procedure, which likely contributed to the higher proportion of male participants in the study.

      Lastly, although baseline total HIV DNA was higher in the intensified group, our statistical approach is based on a within-subject (repeated-measures) design, in which the longitudinal change of a parameter within the same participant during the study was the main outcome. In other words, we are not comparing absolute values of any marker between the groups; we are looking at changes of parameters from baseline within participants, and these are not expected to be affected by baseline imbalances.

      (6) Figure 1: the increase in DTG levels is interesting - it is not uniform across participants. Several participants had lower levels of DTG at the end of the intervention. Though unlikely to be statistically significant, it would be interesting to evaluate if there is a correlation between change in DTG concentrations and virologic / reservoir / inflammatory parameters. A positive relationship between increasing DTG concentration and decreased cell-associated RNA, for example, would help support the hypothesis that ongoing replication is occurring.

      We agree with the reviewer that assessing correlations between DTG concentrations and virological, immunological, or inflammatory markers would be highly informative. In fact, we initially explored this question in a preliminary way by examining whether individuals who showed a marked increase in DTG levels after intensification also demonstrated stronger changes in the viral reservoir. While this exploratory analysis did not reveal any clear associations, we would like to emphasize that correlating biological effects with DTG concentrations measured at a single timepoint may have limited interpretability. A more comprehensive understanding of the relationship between drug exposure and reservoir dynamics would ideally require multiple pharmacokinetic measurements over time, including pre-intensification baselines. This is particularly important given that DTG concentrations vary across individuals and over time, depending on adherence, metabolism, and other individual factors.

      (7) Figure 2: IPDA in tissue- was this done? scRNA in blood (single copy assay) - would this be expected to correlate with usCaRNA? The most unambiguous result is the decrease in cell-associated RNA - accompanying results using single-copy assay in plasma would be helpful to bolster this result.

      As mentioned in our response to point 3, we attempted IPDA on tissue samples, but technical limitations prevented reliable detection of intact proviruses. Regarding residual viremia, we did perform ultra-sensitive plasma HIV RNA quantification. However, a technical issue (an inadvertent PBMC contamination during plasma separation) affected the reliability of the results, so we did not feel comfortable including these data in the manuscript.

      The use of the US RNA / Total DNA ratio is not helpful/difficult to interpret since the control and intervention arms were unmatched for total DNA reservoir size at study entry.

      We respectfully disagree with this comment. The US RNA/total DNA ratio is commonly used to assess the relative transcriptional activity of the viral reservoir, rather than its absolute size. While we acknowledge that the total HIV-1 DNA levels differed at baseline between the two groups, the US RNA/total DNA ratio specifically reflects the relationship between transcriptional activity and reservoir size within each individual, and is therefore not directly confounded by baseline differences in total DNA alone.

      Moreover, our analyses focus on within-subject longitudinal changes from baseline, not on direct between-group comparisons of absolute marker values. As such, the observed changes in the US RNA/total DNA ratio over time are interpreted relative to each participant's baseline, mitigating concerns related to baseline imbalances between groups.
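
      The within-subject argument above can be illustrated with a minimal sketch. All numbers below are hypothetical and chosen only for illustration; they are not study data, and the helper names are assumptions. The sketch shows one narrow point: a fold change from baseline of the US RNA/total DNA ratio, computed within each participant, does not change when every total DNA value is multiplied by a constant (for example, a group with a uniformly larger reservoir). It does not address other ways in which baseline imbalance could matter.

```python
import math

def rna_dna_ratio(us_rna, total_dna):
    """Per-participant US RNA / total DNA ratio."""
    return [r / d for r, d in zip(us_rna, total_dna)]

def fold_change(baseline, follow_up):
    """Per-participant fold change from baseline (follow-up / baseline)."""
    return [f / b for b, f in zip(baseline, follow_up)]

# Hypothetical values for three participants (illustrative only).
us_rna_day0 = [400.0, 900.0, 250.0]
us_rna_day84 = [100.0, 300.0, 50.0]
total_dna = [2000.0, 3000.0, 1000.0]  # assumed unchanged between day 0 and day 84

def ratio_fold_change(dna_scale):
    """Fold change of the ratio when all total DNA values are scaled by dna_scale."""
    dna = [dna_scale * d for d in total_dna]
    ratio_day0 = rna_dna_ratio(us_rna_day0, dna)
    ratio_day84 = rna_dna_ratio(us_rna_day84, dna)
    return fold_change(ratio_day0, ratio_day84)

fold_original = ratio_fold_change(1.0)  # reservoir size as given
fold_scaled = ratio_fold_change(3.0)    # same participants, 3x larger reservoir

for a, b in zip(fold_original, fold_scaled):
    print(f"fold change: {a:.3f} (original) vs {b:.3f} (scaled)")
```

      The scale factor cancels in the follow-up/baseline quotient, which is why the two columns agree up to floating-point rounding.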

      Reviewer #2 (Public review):

      Summary:

      An intensification study with a double dose of a 2nd-generation integrase inhibitor, on a background of nucleoside analog inhibitors of the HIV retrotranscriptase, in 20 individuals randomized with controls, with an impact on the levels of viral reservoirs and inflammation markers. Viral reservoirs in HIV are the main impediment to an HIV cure, and inflammation is associated with the development of co-morbidities.

      Strengths:

      The intervention that leads to a decrease of viral reservoirs and inflammation is quite straightforward, as a doubling of the INSTI is used in some individuals with INSTI resistance, with good tolerability.

      This is a very well-documented study, both in blood and tissues, which is a great achievement due to the difficulty of body sampling in well-controlled individuals on antiretroviral therapy. The laboratory assays are performed by specialists in the field with state-of-the-art quantification assays. Both the introduction and the discussion are remarkably well presented and documented.

      The findings also have a potential impact on the management of chronic HIV infection.

      Weaknesses:

      I do not think that the size of the study can be considered a weakness, nor the fact that it is open-label either.

      We thank Reviewer #2 for their constructive and supportive comments. We appreciate their positive assessment of the study design, the translational relevance of the intervention, and the technical quality of the assays. We also take note of their perspective regarding sample size and study design, which supports our positioning of this trial as an exploratory, hypothesis-generating phase 2 study.

      Reviewer #3 (Public review):

      The introduction does a very good job of discussing the issue around whether there is ongoing replication in people with HIV on antiretroviral therapy. Sporadic, non-sustained replication likely occurs in many PWH on ART, related to adherence, drug-drug interactions and possibly penetration of antivirals into sanctuary areas of replication. As the authors point out, proving it does not occur is likely not possible, and proving it does occur is likely very dependent on the population studied and the design of the intervention. Whether the consequences of this replication in the absence of evolution toward resistance have clinical significance is a challenging question to address.

      It is important to note that INSTI-based therapy may have a different impact on HIV replication events, resulting in differences in virus release for specific cell types (those responsible for "second phase" decay), by blocking integration in cells that have completed reverse transcription prior to ART initiation but have yet to be fully activated. In a PI or NNRTI-based regimen, those cells will release virus, whereas with an INSTI-based regimen, they will not.

      Given the very small sample size, there is a substantial risk of imbalance between the groups in important baseline measures. Unfortunately, with the small sample size, a non-significant P value is not helpful when comparing baseline measures between groups. One suggestion would be to provide the full range as opposed to the inter-quartile range (essentially only 5 or 6 values). The authors could also report the proportion of participants with baseline HIV RNA target not detected in the two groups.

      We thank Reviewer #3 for their thoughtful and balanced review. We are grateful for the recognition of the strength of the Introduction, the complexity of evaluating residual replication, and the technical execution of the assays. We also appreciate the insightful suggestions for improving the clarity and transparency of our results and discussion.

      We revised the manuscript to address several of the reviewer’s key concerns. We agree that the small sample size increases the risk of baseline imbalances. We acknowledged these limitations in the manuscript (lines 327-330). For transparency, we now provide both the full range and the IQR for all parameters in Table 1. However, we would like to stress that our statistical approach is based on a within-subject (repeated-measures) design, in which the longitudinal change of a parameter within the same participant during the study was the main outcome. In other words, we are not comparing absolute values of any marker between the groups; we are looking at changes of parameters from baseline within participants, and these are not expected to be affected by baseline imbalances.

      A suggestion that there is a critical imbalance between groups is that the control group has significantly lower total HIV DNA in PBMC, despite the small sample size. The control group also has numerically longer time of continuous suppression, lower unspliced RNA, and lower intact proviral DNA. These differences may have biased the ability to see changes in DNA and US RNA in the control group.

      We acknowledge the significant baseline difference in total HIV DNA between groups, which we have clearly reported. However, the other variables mentioned, such as duration of continuous viral suppression, unspliced RNA levels, and intact proviral DNA, did not differ significantly between groups at baseline, despite differences in the median values (that are always present). These numerical differences do not necessarily indicate a critical imbalance.

      Notably, there was no significant difference in the change in US RNA/DNA between groups (Figure 2C).

      The nonsignificant difference in the change in US RNA/total DNA between groups is not unexpected, given the significant between-group differences for both US RNA and total DNA changes. Since the ratio combines both markers, it is likely to show attenuated between-group differences compared to the individual components. However, while the difference did not reach statistical significance (p = 0.09), we still observed a trend towards a greater reduction in the US RNA/total DNA ratio in the intervention group.

      The fact that the median relative change appears very similar in Figure 2C, yet there is a substantial difference in P values, is also a comment on the limits of the current sample size. 

      Although we surely agree that in general, the limited sample size impacts statistical power, we would like to point out that in Figure 2C, while the medians may appear similar, the ranges do differ between groups. At days 56 and 84, the median fold changes from baseline are indeed close but the full interquartile range in the DTG group stays below 1, while in the control group, the interquartile range is wider and covers approximately equal distance above and below 1. This explains the difference in p values between the groups.

      The text should report the median change in US RNA and US RNA/DNA when describing Figures 2A-2C.

      These data are already reported in the Results section (lines 164–166): "By day 84, US RNA and US RNA/total DNA ratio had decreased from day 0 by medians (IQRs) of 5.1 (3.3–6.4) and 4.6 (3.1–5.3) fold, respectively (p = 0.016 for both markers)."

      This statistical comparison of changes in IPDA results between groups should be reported. The presentation of the absolute values of all the comparisons in the supplemental figures is a strength of the manuscript.

      In the assessment of ART intensification on immune activation and exhaustion, the fact that none of the comparisons between randomized groups were significant should be noted and discussed.

      We would like to point out that a statistically significant difference between the randomized groups was observed for the frequency of CD4⁺ T cells expressing TIGIT, as shown in Figure 3A and reported in the Results section (p = 0.048).

      The changes in CD4:CD8 ratio and sCD14 levels appear counterintuitive to the hypothesis and are commented on in the discussion.

      Overall, the discussion highlights the significant changes in the intensified group, which are suggestive. There is limited discussion of the comparisons between groups where the results are less convincing.

      We observed statistically significant differences between the randomized groups for total DNA (p<0.001) and US RNA (p=0.01), as well as for the frequency of CD4⁺ T cells expressing TIGIT (p=0.048). We would like to stress that US RNA is a key marker of residual replication as it is very sensitive to de novo infection events. As discussed in the manuscript (lines 291-294), a newly infected CD4+ T lymphocyte can contain hundreds to thousands of US HIV RNA copies at the peak of infection. Therefore, a change in the US RNA level upon ART intensification is a very sensitive indicator of new infections. The fact that for US RNA we observed both a significant reduction in the intensified group and a significant difference between the groups is a strong indicator that some new infections had been occurring prior to intensification.

      The limitations of the study should be more clearly discussed. The small sample size raises the possibility of imbalance at baseline. The supplemental figures (S3-S5) are helpful in showing the differences between groups at baseline, and the variability of measurements is more apparent. The lack of blinding is also a weakness, though the PK assessments do help (note that 3TC levels rise substantially in both groups for most of the time on study (Figure S2)).

      The many assays and comparisons are listed as a strength. The many comparisons raise the possibility of finding significance by chance. In addition, if there is an imbalance at baseline, outcomes measuring related parameters will move in the same direction.

      We agree that the multiple comparisons raise the possibility of chance findings but would like to stress that in an exploratory study like this it is very important to avoid a type II error. In addition, the consistent directionality of the most relevant outcomes (US RNA and intact DNA) lends biological plausibility to the observed effects.
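
      The trade-off discussed above can be made concrete with a short sketch. It assumes independent tests at a per-test significance level of 0.05; the numbers of comparisons are hypothetical, since the total number of tests in the study is not stated here. Under those assumptions, the probability of at least one chance finding among m tests is 1 - (1 - alpha)^m, and a Bonferroni correction would test each comparison at alpha / m.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n_tests
    independent tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

ALPHA = 0.05
# Hypothetical numbers of comparisons, for illustration only.
for m in (1, 5, 10, 20):
    fwer = familywise_error_rate(ALPHA, m)
    threshold = bonferroni_threshold(ALPHA, m)
    print(f"m={m:2d}  FWER={fwer:.3f}  Bonferroni per-test alpha={threshold:.4f}")
```

      Tightening the per-test threshold in this way lowers the chance of a false positive but raises the chance of a type II error, which is the concern raised for an exploratory study.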

      The limited impact on activation and inflammation should be addressed in the discussion, as they are highlighted as a potentially important consequence of intermittent, not sustained replication in the introduction.

      The study is provocative and well executed, with the limitations listed above. Pharmacokinetic analyses help mitigate the lack of blinding. The major impact of this work is if it leads to a much larger randomized, controlled, blinded study of a longer duration, as the authors point out.

      Finally, we fully endorse the reviewer’s suggestion that the primary contribution of this study lies in its value as a proof-of-concept and foundation for future randomized, blinded trials of greater scale and duration. We highlighted this more clearly in the revised Discussion (lines 340-346).

      Reviewer #1 (Recommendations for the authors):

      (1) Lines 84-87: How would chronic immune activation/inflammation be expected to differ if viral antigen is being released from stable reservoirs rather than low-level replication?

      This is a very insightful question. Although release of viral antigens from stable reservoirs could certainly also trigger immune activation/inflammation, the reservoir cells in PWH on long-term ART are constantly being negatively selected by the immune system (PMID: 38337034; PMID: 36596305) so that after a number of years on therapy, most proviruses are either transcriptionally silent or express only a low amount of viral RNA/antigen. Recent evidence suggests that these selected cells possess specific biological properties that include mechanisms that limit proviral gene expression (PMID: 36599977; PMID: 36599978). In comparison, low-level replication would result in de novo infection of unselected, activated CD4+ cells that are expected to produce much more viral antigen than preselected reservoir cells.

      (2) Lines 249-253: There are multiple ways to explain this observation - alternatively, the total proviral DNA declined due to transient CD4 depletion.

      As discussed above, CD4⁺ T-cell counts did not significantly decrease in any of the treatment groups, as shown in Figure 5. The apparent decline observed concerns the CD4/CD8 ratio, which transiently dropped, but not the absolute number of CD4⁺ T cells. Moreover, although the dynamics of total HIV DNA is indeed similar to that of CD4/CD8 ratio (both declined transiently and then returned to baseline by day 84), the dynamics of unspliced RNA and unspliced RNA/total DNA ratio is clearly different, as these markers demonstrated a sustained decrease that was maintained throughout the trial period. Also, we observed a significant decrease in intact HIV DNA at day 84 compared to day 0. These effects cannot be easily explained by a transient decline in CD4+ cells.

      (3) Lines 301-305: This is a confusing explanation for not seeing an effect in tissue. Overall, there was no change in total proviral DNA in blood between days 0 and 84 either - yet the explanation for this observation is different (249-253). Was IPDA not performed on the tissue? Wouldn't this be the preferred test for reservoir depletion?

      We thank the reviewer for bringing this point to our attention. We modified the Discussion to prevent the confusion (lines 303-305). As for the IPDA on tissue, we attempted this assay on the tissue samples using two independent DNA extraction methods (Promega Reliaprep and Qiagen Puregene), but both yielded high DNA shearing index values, and intact proviral detection was successful in only 3 of 40 samples. Given the poor DNA integrity, these results were not interpretable.

    1. Reviewer #2 (Public review):

      Summary:

      The work set out to better understand the phenomenon of antibiotic persistence in mycobacteria. Three new observations are made using the pathogenic Mycobacterium abscessus as an experimental system: phenotypic tolerance involves suppression of ROS, protein synthesis inhibitors can be lethal for this bacterium, and levofloxacin lethality is unaffected by deletion of catalase, suggesting that this quinolone does not kill via ROS.

      Strengths:

      The ROS experiments are supported in three ways: measurement of ROS by a fluorescent probe, deletion of catalase increases lethality of selected antibiotics, and a hypoxia model suppresses antibiotic lethality. A variety of antibiotics are examined, and transposon mutagenesis identifies several genes involved in phenotypic tolerance, including one that encodes catalase. The methods are adequate for making these statements.

      Weaknesses:

      The work can be improved by a more comprehensive treatment of prior work, especially comparison of E. coli work with mycobacterial studies. Moreover, the work still has some technical issues to fix regarding description of the methods, supplementary material, and reference formatting.

      Overall impact: Showing that ROS accumulation is suppressed during phenotypic tolerance, while expected, adds to the examples of the protective effects of low ROS levels. Moreover, the work, along with a few others, extends the idea of antibiotic involvement with ROS to mycobacteria. These are field-solidifying observations.

      Comments on revisions:

      The authors have moved this paper along nicely. I have a few general thoughts.

      (1) It would be helpful to have more references to specific figures and panels listed in the text to make reading easier.

      (2) I would suggest adding a statement about the importance of the work. From my perspective, the work shows the general nature of many statements derived from work with E. coli. This is important. The abstract says this overall, but a final sentence in the abstract would make it clear to all readers.

      (3) The paper describes properties that may be peculiar to mycobacteria. If the authors agree, I would suggest some stress on the differences from E. coli. Also, I would place more stress on novel findings. This might be done in a section called Concluding Remarks. The paper by Shee 2022 AAC could be helpful in phrasing general properties.

      (4) Several aspects still need work to be of publication quality. Examples are the materials table and the presentation of supplementary material. Reference formatting also needs attention.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      Only 1 gene (katG) gave a strong defect and 1 (Mab_1456c) exhibited a minor defect. Two of the clones did not show any persistence phenotype (blaR and recR) and one (pafA) showed a minor phenotype.

      We have now carried out more detailed validation studies on the Tn-Seq, with analysis of time-dependent killing over 14 d. This more comprehensive analysis shows that 4 of 5 genes analyzed do indeed have antibiotic tolerance defects under the conditions that Tn-Seq predicted a survival defect (Revised Figure 3). In addition, we found that even before actual cell death, several mutants had delayed resumption of growth after antibiotic removal (Figure 3 Supplemental).

      Fig. 3 - Why is there such a huge difference in the extent of killing of the control strain in media, when exposed to TIG/LZD, when compared to Fig. 1C and Fig. 4? In Fig. 1C, M. abs grown in media decreases by >1 log by Day 3 and >4 log by Day 6, whereas in Fig. 3, the bacterial load decreases by <1 log by Day 3 and <2 log by Day 6. This needs to be clarified if the experimental conditions were different, because compared with the Fig. 1C data, the katG mutant strain phenotype is not very different.

      We agree with the reviewer that there is variability in the timing and extent of cell death from experiment to experiment. As noted by the reviewer, in Figure 1C the largest decrement in survival is between day 1 - day 3 (also seen in Figure 6A). As they noted in Figure 4 the largest decrement is between day 3 – day 6 (also seen in Figure 3A, Figure 5F). In each experiment with katG mutants we carefully compare the mutant vs. the control strain within that experiment, which is more accurate than comparing the behavior of mutant in one experiment to a control in another experiment.

      Reviewer #2 (Public review):

      Weaknesses:

      First, word-choice decisions could better conform to the published literature. Alternatively, novel definitions could be included. In particular, the data support the concept of phenotypic tolerance, not persistence.

      We appreciate the reviewer's comments; the text has been modified.

      Second, two of the novel observations could be explored more extensively to provide mechanistic explanations for the phenomena. 

      We have added several additional experiments; these are detailed below in response to specific comments.

      Reviewer #3 (Public review):

      Weaknesses:

      The findings could not be validated in clinical strains.

      We understand the reviewer’s concern that the katG phenotype was only observed in one of the two clinical strains we studied. We feel that our findings are relevant beyond the ATCC 19977 strain for two reasons:

      (1) We have performed additional analyses of the two clinical isolates and indeed find significant accumulation of ROS following antibiotic exposure in both of these strains (revised Figure 6A).

      (2) We do in fact see a role for katG in starvation-induced antibiotic tolerance in Mabs clinical strain-2. It is not surprising that different strains from a particular species may have some different responses to stresses – for example, there is wide strain-specific variability in susceptibility to different phages within a species based on which particular phage defense modules a given strain carries (for example PMID: 37160116). We speculate that different Mabs strains may express varying levels of other antioxidant factors and note that the genes encoding several such factors were identified by our Tn-Seq screen including the peroxidases ahpC, ahpD, and ahpE. Our analysis of the genetic interactions between katG and these other factors is ongoing. 

      Comments/Suggestions

      (1) In Fig. 1E, the authors show no difference in killing Mtb with or without adaptation in PBS. These data are contrary to the data presented in Figure 1B. These also do not align with the data of M. smegmatis and M. abscessus. Please discuss these observations in light of the Duncan model of persistence (Mol Microbiol. 2002 Feb;43(3):717-31.).

      The above referenced Duncan laboratory study found tolerance after prolonged starvation but did not actually examine tolerance at early time points. While some of the transcriptional and metabolic changes seen by Duncan and others are slow, other groups have described starvation responses in Mtb that are quite rapid. For example, the stringent response mediator ppGpp accumulates within a few hours after onset of starvation in Mtb (PMID: 30906866). We suspect that a rapid signaling response such as this underlies the phenotype we observe. Regarding the difference between Mtb and other mycobacterial species we also find it surprising that Mtb had a much more rapid starvation response. This is a clear species-specific difference that may reflect an adaptation of Mtb to the nutrient-limited physiologic niche within host macrophages.

      (2) Line 151, the authors state that they have used an M. abscessus Tn mutant library of ~55,000 mutant strains. The manuscript will benefit from a description of the coverage of total TA sites covered by the mutants.

      Text modified to add this detail. There are 91,559 TA sites in the M. abscessus genome. Thus, our Tn density is ~60%.
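
      The ~60% figure follows from the two numbers quoted in this exchange. A minimal sketch of the arithmetic, assuming for illustration that each of the ~55,000 mutants carries an insertion at a distinct TA site (an upper-bound simplification that is not stated in the response; the function name is hypothetical):

```python
def insertion_density(n_mutants, n_ta_sites):
    """Fraction of TA sites carrying an insertion, assuming every mutant
    hits a distinct TA site (an upper-bound simplification)."""
    return n_mutants / n_ta_sites

N_MUTANTS = 55_000   # approximate library size quoted by the reviewer
N_TA_SITES = 91_559  # TA sites in the genome, as quoted by the authors

density = insertion_density(N_MUTANTS, N_TA_SITES)
print(f"Estimated Tn density: {density:.1%}")
```

      If several mutants share the same TA site, the true density would be lower than this estimate.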

      (3) Line 155: Please explain how long the cells were kept in the antibiotic medium.

      This technical detail was noted above on line 153 in the original text: “…and then exposed them to TIG/LZD for 6 days”. To clarify the overall conditions, we have also revised the text of the manuscript and added the detail of how long cells were passaged after removal of antibiotics.

      (4) Line 201: data not shown. Delayed resumption of growth after removal of antibiotic would be helpful in indicating drug resilience. This data could enhance the manuscript.

      Data now provided in Figure 3 Supplemental

      (5) Figures 4C and 4F represent the kill curve. It will be good to show the data with CFU against the drug concentration in place of OD600. CFU rather than OD600 best reflects growth inhibition.

      Figures 4C and 4F are measuring the minimum inhibitory concentration (MIC) to stop the overall growth of the bacterial population. While we agree that CFU could be analyzed, this would be measuring a different outcome – cell death and the minimum bactericidal concentration (MBC). In these experiments we sought to specifically examine the MIC so as to separate growth inhibition from cell death. For this we used the standard method employed by clinical microbiology laboratories for MIC, which is optical density of the culture (PMID: 10325306).
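
The distinction drawn above (MIC as a growth-inhibition readout rather than a killing readout) can be illustrated with a small sketch of how an MIC might be read from optical density data. The cutoff value, function name, and readings below are all hypothetical and are not taken from the study:

```python
from typing import Optional


# Hypothetical helper; the 0.1 OD600 cutoff is an illustrative assumption,
# not the threshold used by the authors.
def mic_from_od(readings: dict[float, float], cutoff: float = 0.1) -> Optional[float]:
    """Return the lowest concentration at which OD600 stays below the cutoff
    for that concentration and every higher one, or None if growth is never
    inhibited at the highest concentration tested."""
    mic = None
    # Walk from the highest concentration down; stop at the first well with growth.
    for conc, od in sorted(readings.items(), reverse=True):
        if od < cutoff:
            mic = conc
        else:
            break
    return mic


# Made-up example plate: drug concentration (ug/mL) -> OD600 after incubation.
example = {0.5: 0.82, 1: 0.79, 2: 0.41, 4: 0.06, 8: 0.05, 16: 0.05}
print(mic_from_od(example))  # 4
```

Because this readout only reports whether the culture became turbid, it says nothing about how many cells survive; that would require CFU counts, as the response notes.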

      (6) Figure 5C. The authors should show the effect of TIG/LZD on M. abscessus ROS production without the PBS adaptation. It is important to conclude that TIG/LZD induces ROS in cells. Authors should utilize ROS scavengers such as Thiourea, DFO, etc., to conclude ROS's contribution to bacterial killing following inhibition of transcription and translation.

      New data added (revised Figure 5 and Figure 5 Supplemental)  

      (7) Line 303. Remove "note".

      Text revised. We thank the reviewer for identifying this typographical error.  

      (8) The introduction and Discussion are very similar, and several lines are repeated.

      Text revised with overlapping content removed.

      Reviewer #1 (Recommendations for the authors):

      It appears that the same datasets for PBS adapted cultures were plotted in A-C and D-F. Either this should be specifically mentioned in the legend or it might be better to integrate the non-adapted plots into A-C which would also allow easier comparison.

      We appreciate the reviewer’s suggestion; text modified with added clarification to figure legend.

      This manuscript is focused on M. abs and the antibiotics TIG/LZD, so the Mtb data and the data using the antibiotics INH/RIF/EMB serve more as a distraction and can be removed.

      We appreciate the reviewer’s perspective. However, we wish to include these data to show the similarities (and differences) in starvation-induced tolerance between the three organisms.

      Fig 3 -As mentioned for Fig. 1, it appears that the same dataset was used for the control in all the figures A-E. This should be explicitly stated in the Figure legend.

      We appreciate the reviewer’s suggestion; text modified with added clarification to figure legend.

      The divergent results from the clinical strains are extremely interesting. It would be helpful to determine the oxidative stress levels (similar to the cellROX data shown in 5E), to tease out if the difference in katG role is because of lack of ROS induction in these strains or due to expression of alternate anti-oxidative stress defense mechanisms.

      We have performed additional cellROX analysis as suggested by the reviewer and found that the ROS induction is indeed present across all three Mabs strains, but that katG is only required in one of the two strains (Strain #2). These data are now included in the revised Figure 6.

      Reviewer #2 (Recommendations for the authors):

      GENERAL COMMENTS

      This is a nice piece of work that uses the pathogen Mabs as a test subject.

      The work has findings that likely apply generally to antibiotics and mycobacteria: 1) phenotypic tolerance is associated with suppression of ROS, 2) lethal protein synthesis inhibitors act via accumulation of ROS, and 3) levofloxacin behaves in an unexpected way. Each is a new observation. However, I believe that each topic requires more work to be firmly established to be suitable for eLife.

      Phenotypic tolerance: Association with suppression of ROS is important but expected. I would solidify the conclusion by performing several additional experiments. For example, confirm the lethal effect of ROS by reducing it with an iron chelator and a radical scavenger. There is a large literature on effects of iron uptake, levels, etc. on antibiotic lethality that could be applied to this question. In 2013 Imlay argued against the validity of fluorescent probes. Perhaps getting the same results with another probe would strengthen the conclusion.

      We have carried out additional experiments with both an iron chelator and small molecule ROS scavengers to further test this idea but note that these experiments have several inherent limitations: 1) These compounds have highly pleiotropic effects. For example, while N-acetyl cysteine (NAC) is an antioxidant, it also increases mycobacterial respiration and was shown to paradoxically decrease antibiotic tolerance in M. tuberculosis (PMID: 28396391). 2) It has been shown by the Imlay group that small-molecule antioxidants are often ineffective in quenching ROS in bacteria (PMID: 388893820), making negative results difficult to interpret. Nonetheless, we present new experimental data showing that iron chelation does indeed improve the survival of antibiotic-treated Mabs (revised Figure 5). However, small molecule antioxidants such as thiourea do not restore antibiotic tolerance and actually increased bacterial cell death, suggesting that they may be affecting respiration in Mabs in a manner similar to that seen for NAC in Mtb. We also note that our genetic analysis, which identified numerous other genes encoding proteins with antioxidant function (Figure 2), is a strong additional argument in support of the importance of ROS in antibiotic-mediated lethality.

      Regarding the concern raised by Imlay about the validity of oxidation-sensitive dyes - this relates to concern about bacterial autofluorescence induced by antibiotics that can confound analyses in some species. We have ruled this out in our analyses by using bacteria unstained by cellROX as controls to confirm that there is negligible autofluorescence in Mabs (<0.1%, Figure 5E, Figure 6A).

      Protein synthesis inhibitors: At present, this is simply an observation. More work is needed to suggest a mechanism. For example, with E. coli the aminoglycosides are protein synthesis inhibitors that also cause membrane damage. Membrane damage is known to stimulate ROS-mediated killing. Your observation needs to be extended because chloramphenicol, another protein synthesis inhibitor, blocks ROS production. The lethality may be a property of mycobacteria: does it occur with E. coli (note that rifampicin is bacteriostatic with E. coli but lethal to Mtb)?

      We agree with the reviewer that the mechanism underlying ROS accumulation following transcription or translational inhibition in Mabs is of significant interest. It is likely to be a mechanism different from E. coli, because in E. coli tetracyclines and rifamycins are both bacteriostatic, whereas in Mabs they are both bactericidal. Determining the mechanism by which translation inhibitors cause ROS accumulation in Mabs is an ongoing effort in our laboratory using proteomics and metabolomics, but is outside the scope of this manuscript.

      Levofloxacin: This is also at the observational stage but is unexpected. In other studies, ROS is involved in quinolone-mediated killing of bacteria. Why is this not the case with Mabs? The observation should be solidified by showing the contrast with moxifloxacin, since this compound has been studied with mycobacteria (Shee 2022 AAC). With E. coli, quinolone structure can affect the relative contribution of ROS to killing (Malik 2007 AAC), as is also seen with Mtb (Malik 2006 AAC). What is happening in the present work with levofloxacin, an important anti-tuberculosis drug? Is there a structural explanation (compare with ofloxacin)?

      While these are interesting questions, a detailed exploration of the structure-function relationships between different fluoroquinolone antibiotics and their varying activities on Mtb and Mabs is outside the scope of this manuscript.  

      The writing is generally easy to follow. However, the concept of persistence should be changed to phenotypic tolerance with text changes throughout. I base this suggestion on the definitions of tolerance and persistence as stated in the consensus review (Balaban 2019 Nat Micro Rev). Experimentally, tolerance is seen as a gradual decline in survival following antibiotic addition; the decline is slower than seen with wild-type cells. The data presented in this paper fit that definition. In contrast, persistence refers to a rapid drop in survival followed by a distinct plateau (Balaban 2019 Nat Micro Rev; for example, see Wu Lewis AAC 2012). Moreover, to claim persistence, it would be necessary to demonstrate subpopulation status, which is not done. The Balaban review is an attempt to bring order to the field with respect to persistence and tolerance, since the two are commonly used without regard for a consistent definition.

      We appreciate the reviewer’s suggestion; text modified in multiple places to clarify.

      Another issue requiring clarification is the relationship between resistance and tolerance. Killing by antibiotics is a two-step process, as most clearly seen with quinolones. First a reversible bacteriostatic event occurs. Resistance blocks that bacteriostatic damage. Then a lethal metabolic response to that damage occurs. Tolerance selectively blocks the second, killing event, a distinct process that often involves the accumulation of ROS. Direct antibiotic-mediated damage is an additional mode of killing that also stems from the reversible, bacteriostatic damage created by antibiotics. The authors recognize the distinction but could make it clearer. Take a look at Zheng (JJ Collins) 2020, 2022.

      Text modified to clarify this point

      Many readers would also like to see a bit more background on Mabs. For example, does it grow rapidly? Are there features that make it a good model for studying mycobacteria or bacteria in general? The more general, the better.

      Text modified, background added

      Below I have listed specific comments that I hope are useful in bringing the work to publication and making it highly cited.

      SPECIFIC COMMENTS

      Line 30 unexpectedly. I would delete this word because the result is expected from the ROS work of Shee et al 2022 with mycobacteria. Moreover, Zeng et al 2022 PNAS showed that ROS participates in antimicrobial tolerance, and persistence is a form of tolerance (Balaban et al, 2019, Nat Micro Rev).

      Text modified as per review suggestion

      Line 39 key goal: this is probably untrue in the general sense stated, since bacteriostatic antibiotics are sufficient to clear infection (Wald-Dickler 2019 Clin Infect Dis). However, it is likely to be the goal for Mtb infections.

      We agree with the reviewer that bacteriostatic antibiotics are effective in treating most types of infections and do not claim otherwise in the manuscript. However, from a clinical standpoint, eradication of the pathogen causing the infection is indeed the goal of antibiotic therapy in virtually all circumstances (with the exception of specific scenarios such as cystic fibrosis where it is recognized that the infecting organism cannot be fully eliminated). In most cases, the combination of bacteriostatic antibiotics and the host immune response is sufficient to achieve eradication. We have modified the manuscript text to reflect this nuance noted by the reviewer.

      Line 62 several: you list three, but hipAB works via ppGpp, so the sentence needs fixing

      Text modified  

      Line 70 uncertain: this uncertainty is unreferenced. Since everything is uncertain, this vague phrase does not add to the story.

      The reviewer makes an interesting philosophical argument. However, we would submit that some aspects of biology, for example the regulation of glycolysis, are understood in great detail. However, other mechanisms, such as the precise mechanisms of lethality for diverse antibiotics in different bacterial species, are far more uncertain and remain a subject of debate (for example PMID: 39910302). Text not modified.

      Line 72 somewhat controversial: I would delete this, because the points in the Science papers by Lewis and Imlay have been clarified and in some cases refuted by prior and subsequent work.

      Text modified

      Line 72 presumed: this suggests that it is wrong and perhaps a different idea has replaced it. Another, and more likely view is that there is an additional mode of killing. I suggest rephrasing to be more in line with the literature.

      Text modified for clarity. In this sentence “presume” refers to the historical concept that direct target inhibition was solely responsible for antibiotic lethality. As the reviewer notes, there is now significant literature that ROS (and perhaps other secondary effects) also contribute to bacterial killing.  

      Line 73 However and the following might also: this phrasing, plus the presumed, misleads the reader from your intent. I suggest rephrasing.

      See above re: line 72

      Line 75 citations: these are inappropriate and should be changed to fit the statement. I suggest the initial paper by Collins (Kohanski 2007 Cell), a recent paper by Zhao (Zeng PNAS 2022), and a review (Drlica Expert Rev Anti-infect Therapy 2021). The present citations are fine if you want to narrow the statement to mycobacteria, but the history is that the E. coli work came first and was then generalized to mycobacteria. A mycobacterial paper for ROS is Shee 2022 AAC.

      We thank the reviewer for noticing that we inadvertently omitted several important E. coli-related references. These have been added.

      Line 75 and 76: Conversely ... unresolved. Compelling arguments have been made that show major flaws in the two papers cited, and a large body of evidence has now accumulated showing the validity of the idea promoted by the Collins lab, beginning with Kohanski 2007. In addition to many papers by Collins, see Hong 2019 PNAS and Zeng 2022 PNAS. It is fine if you want to counter the arguments against the Lewis and Imlay papers (summarized in Drlica & Zhao 2021 Expert Rev Anti-infect Therapy), but making a blanket statement suggests that the authors are unfamiliar with the literature.

      We agree with the reviewer that the weight of the evidence supports a role for antibiotic-induced ROS as an important mechanism for antibiotic lethality under many (though not all) conditions. We have revised the text to better reflect this nuance.

      Line 78. Advantages over what?

      Text modified

      Line 80 exposure: to finish the logic you need to show that E. coli and S. aureus persisters fail to do this.

      We thank the reviewer for their suggestion but studying these other organisms is outside the scope of this study. 

      Line 82 whereas: this misdirects the reader. It would seem that a simple "and" is better

      Text modified

      Line 89 I think this paragraph is about the need to study Mabs, the subject of the present report. This paragraph could use a more appropriate topic sentence to guide the reader so that no guessing is involved. I suggest rephrasing this paragraph to make the case for studying more compelling.

      Text modified

      Line 96. I suggest citing several references after subinhibitory concentration of antibiotic.

      The references are in the following sentence alongside the key observations.

      Line 99. Genetic analysis: how does this phrase fit with the idea of persister cells arising stochastically?

      There are two issues: 1) We would argue that persister formation is not completely stochastic, but rather a probability that can be modified both genetically and by environment (for example hipA PMID: 6348026). 2) Even if persister formation were totally stochastic, the survival of these cells may depend on specific genes – as we indeed find in our Tn-Seq analysis of Mabs.  

      Line 106. In this paragraph you need to define persister. The consensus definition (Balaban 2019 Nat Micro Rev) is a subpopulation of tolerant cells. Tolerance is defined as the slowing or absence of killing while an antibiotic retains its ability to block growth. See Zeng 2022 PNAS for example with rapidly growing cells. Phenotypic tolerance is the absence of killing due to environmental perturbations, most notably nutrient starvation, dormancy, and growth to stationary phase. By extension, phenotypic persistence would be subpopulation status of phenotypically tolerant cells. If you have a different definition, it is important to state it and emphasize that you disagree with the consensus statement.

      Text modified  

      Line 109 unexpectedly. I would delete this word, because the literature leads the reader to expect this result unless you make a clear case for Mabs being fundamentally different from other bacteria with respect to how antibiotics kill bacteria (this is unlikely, see Shee 2022 AAC). Indeed, lines 111-113 state extensions of E. coli work, although suppression of ROS in phenotypic tolerance and genetic persistence have not been demonstrated.

      Text modified

      Line 124 you might add, in parentheses and with references, that a property of persisters is crosspersistence to multiple antibiotic classes. This is also true for tolerance, both genetic and phenotypic. An addition will support your approach.

      Text modified

      Line 128 minimal

      Text not modified. We appreciate the reviewer’s preference but “minimal” and “minimum” are both widely accepted terms. Indeed, the Balaban et al 2019 consensus statement on definitions cited by the reviewer above also uses “minimum” (PMID: 30980069), as do IDSA clinical guidelines (PMID: 39108079).

      Line 130 is MIC somehow connected to killing or did you also measure killing? Note that blocking growth and killing cells are mechanistically distinct phenomena, although they are related. By being upstream from killing, blockage of growth will also interfere with killing.

      Text modified

      Line 133 PBS is undefined

      Text modified

      Line 134 increase in persisters ... you need to establish that these are not phenotypically tolerant cells. Do they constitute the entire population (tolerance)? Your data would be more indicative of persisters if you saw a distinct plateau with the PBS samples, as such data are often used to document persistence (retardation of killing is a property of tolerance, Balaban 2019). Fig. 1B is clearly phenotypic tolerance, as the entire population grows. Your data suggest that you are not measuring persistence as defined in the literature (Balaban 2019). Line 139 persister should be tolerance

      Text modified

      Lines 142, 143, 144, 159, 163, 171, 181, 211, 226, 238, 246, 277, 279, 289 persistent should be tolerant

      Text modified

      Line 146 fig 1E Mtb does not show the adaptation phenomenon and it is clearly tolerant, not persistent. This should be pointed out. As stated, you may be misleading the reader.

      Text modified  

      Line 169. Please make it clear whether these genes are affecting antibiotic susceptibility (MIC will affect killing because blocking growth is upstream) or if you are dealing with tolerance (no change in MIC). These measurements are essential and should be included as a table. By antibiotic response, do you mean that antibiotics change expression levels?

      Regarding MICs, the data for MICs in control and katG mutant are presented in Figure 4C and 4F. Regarding ‘response’ we have clarified the text of this sentence.

      Line 174 Interestingly should be as expected

      Text not modified; tetracyclines do not induce ROS in E. coli and oxazolidinones have not been studied in this regard.

      Line 183 you need to include citations. You can cite the ability of chloramphenicol to block ROS-mediated killing of E. coli. That allows you to use the word unexpected

      Text modified

      Line 199. All of the data in Fig. 3 shows tolerance, not persistence, requiring word changes in this paragraph.

      Text modified

      Line 226. The MIC experiment is important. You can add that this result solidifies the idea that blocking growth and killing cells are distinct phenomena. You can cite Shee 2022 AAC for a mycobacterial paper

      Text modified

      Line 241. The result with levofloxacin is unexpected, because the fluoroquinolones are widely reported to induce ROS, even with mycobacteria (see Shee 2022 AAC). You need to point this out and perhaps redo the experiment to make sure it is correct.

      We appreciate the reviewer’s interest in this question. All experiments in this paper were repeated multiple times. This particular experiment was repeated 3 times and in all replicates the katG mutant was sensitized to translation inhibitors but not levofloxacin. Shee et al examined Mtb treated with moxifloxacin and found ROS generation, but did not assess whether an Mtb katG mutant had impaired survival. Thus, in addition to differences in: i) the species studied and ii) the particular fluoroquinolone used, the two sets of experiments were designed to address different questions (ROS accumulation vs protection by katG). A cell might accumulate ROS without a katG mutant having impaired survival if genetic redundancy exists – a result we indeed see in our clinical Mabs strains under some conditions (new data included in revised Figure 6A).

      Line 269 Additional controls would bolster the conclusion: use of an antioxidant such as thiourea and an iron chelator (dipyridyl) both should reduce ROS effects.

      New experiments performed, revised Figure 5.

      Line 276 the word no is singular

      Text modified

      Line 284 this suggested ... in fact previous work suggested. This summary paragraph might go better as the first paragraph of the Discussion

      Text modified to specify that this is in reference to the work in this manuscript

      Lines 294-299 Most of this is redundant and should be deleted.

      Text modified

      Line 299 this species is vague

      Text modified

      Line 310 Do you want to discuss spoT?

      Text not modified

      Line 313 paragraph is largely redundant

      Text modified

      Line 314 controversial. As above, I would delete this, especially since it is not referenced and is unlikely to be true. If you believe it, you have the obligation to show why the ROS-lethality idea is untrue. If you are referring to Lewis and Imlay, there were almost a dozen supporting papers before 2013 and many after. This statement does not make the present work more important, so deletion costs you nothing.

      Text modified

      Line 314 direct disruption of targets. This is clearly not a general principle, because the quinolones rapidly kill while inhibition of gyrase by temperature-sensitive mutations does not (Kreuzer 1979 J.Bact; Steck 1985). Indeed, formation of drug-gyrase-DNA complexes is reversible: death is not.

      Text modified

      Line 318 as pointed out above, you have not brought this story up to date. The two papers mainly focused on Kohanski 2007, ignoring other available evidence.

      Text modified

      Line 326 you need to cite Shee 2022 AAC

      Text modified

      Line 342 the idea of mutants being protective is not novel, as several have been reported with E. coli studies. Thus, there is a general principle involved.

      We agree that this suggests a potential general principle

      Line 344. It depends on the inhibitor. For example, aminoglycosides are translation inhibitors and they also cause the accumulation of ROS.

      We agree that ROS generation depends on the inhibitor, and indeed upon other variables including drug concentration, growth conditions, and bacterial species as well.  

      Line 347. You need to point out the considerable data showing that the absence of catalase increases killing

      Text modified

      Line 363 look at Shee 2022 AAC and Jacobs 2021 AAC

      Text modified, reference added.

      Line 585 I suggest having a colleague provide critical comments on the manuscript and acknowledge that person.

      Text not modified

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      Pavel et al. analyzed a cohort of atrial fibrillation (AF) patients from the University of Illinois at Chicago, identifying TTN truncating variants (TTNtvs) and TTN missense variants (TTNmvs). They reported a rare TTN missense variant (T32756I) associated with adverse clinical outcomes in AF patients. To investigate its functional significance, the authors modeled the TTN-T32756I variant using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). They demonstrated that mutant cells exhibit aberrant contractility, increased activity of the cardiac potassium channel KCNQ1 (Kv7.1), and dysregulated calcium homeostasis. Interestingly, these effects occurred without compromising sarcomeric integrity. The study further identified increased binding of the titin-binding protein Four-and-a-Half Lim domains 2 (FHL2) with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I iPSC-aCMs.

      Strengths:

      This work has translational potential, suggesting that targeting KCNQ1 or FHL2 could represent a novel therapeutic strategy for improving cardiac function. The findings may also have broader implications for treating patients with rare, disease-causing variants in sarcomeric proteins and underscore the importance of integrating genomic analysis with experimental evidence to advance AF research and precision medicine.

      Weaknesses

      (1) Variant Identification: It is unclear how the TTN missense variant (T32756I) was identified using REVEL, as none of the patients' parents reportedly carried the mutation or exhibited AF symptoms. Are there other TTN variants identified in the three patients carrying TTN-T32756I? Clarification on this point is necessary.  

      We thank the reviewer for their insightful comment. We have now clarified these in the method section.

      Line 484-491: “The TTN-T32756I variant (REVEL Score: 0.58758, Supplementary Table 1) was prioritized due to its occurrence in multiple unrelated individuals within our clinical AF cohort, despite no reported family history of AF in affected individuals. While no parental inheritance was observed, the possibility of de novo origin cannot be excluded. Furthermore, this variant is located within a region overlapping a deletion mutation recently shown to cause AF in a zebrafish model, supporting its potential pathogenicity [37]. Notably, the affected individuals did not carry additional loss-of-function TTN variants.”

      (2) Patient-Specific iPSC Lines: Since the TTN-T32756I variant was modeled using only one healthy iPSC line, it is unclear whether patient-specific iPSC-derived atrial cardiomyocytes would exhibit similar AF-related phenotypes. This limitation should be addressed.

      We have now acknowledged this limitation in the revised manuscript.

      Line 505-509: “Due to the patients' unavailability of peripheral blood mononuclear cells (PBMCs), we utilized a healthy iPSC line and introduced the TTN-T32756I variant using CRISPR/Cas9 genome editing. This approach ensures an isogenic background, thereby minimizing genetic variability and providing a controlled system to study the direct effects of the mutation.”

      (3) Hypertension as a Confounding Factor: The three patients carrying TTN-T32756I also have hypertension. Could the hypertension associated with this variant contribute secondarily to AF? The authors should discuss or rule out this possibility.

      We have now explicitly discussed this in the revised manuscript.

      Line 362-367: “Hypertension is a common comorbidity in patients with AF and could contribute to disease progression. However, all three individuals carrying TTN-T32756I exhibited early-onset AF (onset before 66 years), with one case occurring as early as 36 years. This suggests a potential two-hit mechanism, where genetic predisposition and comorbidities influence disease risk. Importantly, our iPSC model isolates the genetic effects of TTN-T32756I from other factors, supporting a direct pathogenic role.”

      (4) FHL2 and KCNQ1-KCNE1 Interaction: Immunostaining data demonstrating the colocalization of FHL2 with the KCNQ1-KCNE1 (MinK) complex in TTN-T32756I iPSC-aCMs are needed to strengthen the mechanistic findings.

      We thank the reviewer for this insightful suggestion. We agree that additional immunostaining data would further strengthen the evidence for FHL2 colocalization with the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs. In line with this, we have expanded our analysis to include both co-immunoprecipitation and confocal microscopy.  As described in the revised manuscript (Lines 282–287), the colocalization between KCNE1 and FHL2 was increased by approximately threefold in TTN-T32756I iPSC-aCMs compared with WT, supporting an enhanced interaction between these proteins (Figure 5A, Supplementary Figure 6). We are generating additional immunostaining data to validate and extend these findings, and we will incorporate them into the revised submission to further substantiate the mechanistic link proposed.

      Line 282-287: “…..if TTN-T32756I increases Iks by modulating the interaction between KCNQ1-KCNE1 and FHL2, we performed co-immunoprecipitation studies and confocal microscopy in both WT and TTN-T32756I-iPSC-aCMs. The co-localization between KCNE1 and FHL2 increased ~3-fold in TTN-T32756I-iPSC-aCMs, suggesting an increased interaction between them (Figure 5A, Supplementary Figure 7).”

      (5) Functional Characterization of FHL2-KCNQ1-KCNE1 Interaction: To further validate the proposed mechanism, additional functional assays are necessary to characterize the interaction between FHL2 and the KCNQ1-KCNE1 complex in TTN-T32756I iPSC-aCMs.

      We thank the reviewer for this valuable suggestion. We agree that additional functional assays would provide further validation of the proposed mechanism. However, we believe such in-depth characterization warrants a dedicated follow-up study and is beyond the scope of the current revision. In this work, our primary objective is to establish that the TTN missense variant can exert a detrimental effect and serve as a substrate for AF. 

      Line 418-419: “Further study is needed to validate the proposed mechanism and determine if TTNmvs in other regions are associated with AF by a similar process.”

      Reviewer #2 (Public review):

      Summary:

      The authors present data from a single-center cohort of African-American and Hispanic/Latinx individuals with atrial fibrillation (AF). This study provides insight into the incidences and clinical impact of missense variants in this population in the Titin (TTN) gene. In addition, the authors identified a single amino acid TTN missense variant (TTN-T32756I) that was further studied using human induced pluripotent stem cell-derived atrial cardiomyocytes (iPSC-aCMs). These studies demonstrated that the Four-and-a-Half Lim domains 2 (FHL2) has increased binding with KCNQ1 and its modulatory subunit KCNE1 in the TTN-T32756I-iPSC-aCMs, enhancing the slow delayed rectifier potassium current (Iks) and is a potential mechanism for atrial fibrillation. Finally, the authors demonstrate that suppression of FHL2 could normalize the Iks current.

      Strengths:

      The strengths of this manuscript/study are listed below:

      (1) This study includes a previously underrepresented population in the study of the genetic and mechanistic basis of AF.

      (2) The authors utilize current state-of-the-art methods to investigate the pathogenicity of a specific TTN missense variant identified in this underrepresented patient population.

      (3) The findings of this study identify a potential therapeutic for treating atrial fibrillation.

      Weaknesses:

      (1) The authors do not include a non-AF group when evaluating the incidence and clinical significance of TTN missense variants in AF patients.

      We appreciate the reviewer’s comment and acknowledge the limitation of not including a non-AF control group in our clinical analysis. As noted in the revised manuscript (Lines 347–353), our cohort was derived from a single-center registry of individuals with AF and therefore lacks a matched non-AF control population for direct comparison of TTN missense variant incidence. We agree that future studies incorporating larger, multiethnic validation cohorts with both AF and non-AF individuals, as well as evaluating AF-specific measures such as arrhythmia burden and treatment response, will be essential to fully elucidate the clinical significance of TTN missense variants in AF.

      Line 347-353: “Our cohort is derived from a single-center multi-ethnic registry of individuals with AF and lacks a matched cohort of non-AF controls to compare the incidence of TTN missense variants. Further study exploring these associations in multi-ethnic, larger validation cohorts that include both AF and non-AF individuals and examining AF-specific measures such as arrhythmia burden or treatment response will be necessary to fully understand the clinical importance of TTNmvs in AF.”

      (2) The authors do not provide evidence that TTN-T32756I-iPSC-aCMs are arrhythmogenic, only that there is an increase in the Iks current and associated action potential changes. More specifically, the authors report that "compared to the WT, TTN-T32756I-iPSC-aCMs exhibited increased arrhythmic frequency," yet it is unclear what they are referring to by "arrhythmic frequency."

      We thank the reviewer for this important point and for highlighting the need for clarification. In our study, the term “arrhythmic frequency” was intended to describe the increased spontaneous beating rate, irregular action potential patterns, and abnormal calcium handling observed in TTN-T32756I iPSC-aCMs compared with WT. These findings support the concept that the AF-associated TTN-T32756I variant promotes ion channel remodeling and perturbs excitation–contraction coupling, thereby creating a potential arrhythmogenic substrate for AF. To avoid ambiguity, we have removed the term “arrhythmic frequency” and revised the text for clarity and precision (Lines 222–223).

      Lines 222-223: “Compared to the WT, TTN-T32756I-iPSC-aCMs exhibited increased frequency along with a significant reduction of the time to 50% and 90% decline of calcium transients (Figure 3G-I, Supplementary Figure 4F).”

      (3) There seem to be discrepancies regarding the impact of the TTN-T32756I variant on mechanical function. Specifically, the authors report "both reduced contraction and abnormal relaxation in TTN-T32756I-iPSC-aCMs" yet, separately report "the contraction amplitude of the mutant was also increased . . . suggesting an increased contractile force by the TTN-T32756I-iPSC-aCMs and TTN-T32756I-iPSC-CMs exhibited similar calcium transient amplitudes as the WT."

      We thank the reviewer for highlighting this critical point and apologize for the lack of clarity. We intended to distinguish between changes in contractile force and contractile dynamics. Specifically, the increased contraction amplitude observed in TTN-T32756I iPSC-aCMs reflects enhanced contractile force, whereas the reduced contraction duration and impaired relaxation reflect abnormalities in contractile kinetics. Together, these findings indicate that the TTN-T32756I variant alters both the strength and the temporal dynamics of contraction, consistent with dysfunctional mechanical performance. We have revised the text accordingly to more accurately convey these results (Lines 187–192).

      Lines 187-192: “Compared to WT, the beating frequency of the TTN-T32756I-iPSC-aCMs was significantly increased (52 ± 7.8 vs. 98 ± 7.5 beats per min, P=0.001; Figure 2C) coupled with the reduction of the contraction duration (456.5 ± 61.45 vs 262.9 ± 48.16 msec, P=0.032; Figure 2D), the peak-to-peak time (1529 ± 195.5 vs 636.6 ± 135.8 msec, P=0.004; Supplementary Figure 3B),  and the relaxation (281.5 ± 42.95 vs 79.40 ± 21.14 msec, P=0.003; Supplementary Figure 3A).”

      Reviewer #3 (Public review):

      Summary:

      The authors describe the abnormal contractile function and cellular electrophysiology in an iPSC model of atrial myocytes with a titin missense variant. They provide contractility data by sarcomere length imaging, calcium imaging, and voltage clamp of the repolarizing current iKs. While each of the findings is interesting, the paper comes across as too descriptive because there is no data merging to support a cohesive mechanistic story/statement, especially from the electrophysiological standpoint. There is not enough support for the title "A Titin Missense Variant Causes Atrial Fibrillation", since there is no strong causative evidence. There is some interesting clinical data regarding the variant of interest and its association with HF hospitalization, which may lead to future important discoveries regarding atrial fibrillation.

      Strengths:

      The manuscript is well written, and a wide range of experimental techniques are used to probe this atrial fibrillation model.

      Weaknesses:

      (1) While the clinical data is interesting, it is essential to rule out heart failure with preserved EF as a confounder. HFpEF leads to AF due to increased atrial remodeling, so the fact that patients with this missense variant have increased HF hospitalizations does not necessarily directly support the variant as causative of AF. It could be that the variant is associated directly with HFpEF instead, and this needs to be addressed and corrected in the analyses.

      We appreciate the reviewer’s insightful comment and agree that HFpEF-related atrial remodeling could represent a potential confounder in the association between TTN missense variants and AF. The primary aim of our clinical analysis was to assess the potential significance of TTNmv in AF, recognizing the inherent limitations of retrospective observational data in establishing causality. To complement this, our in vitro studies were specifically designed to demonstrate that TTNmv can alter the electrophysiological substrate, thereby predisposing to AF independent of clinical comorbidities.

      While HFpEF is an important consideration, to our knowledge, no existing literature directly implicates TTNmv in HFpEF pathogenesis. In contrast, loss-of-function TTN variants are more commonly associated with HFrEF and dilated cardiomyopathy, and even these associations remain an area of active debate. To address potential confounding in our cohort, we adjusted for reduced ejection fraction in multivariable analyses of clinical outcomes. Additionally, we performed a sensitivity analysis excluding patients with nonischemic dilated cardiomyopathy (Supplementary Table 6). Together, these approaches mitigate the potential impact of heart failure subtypes on our findings, while our mechanistic studies strengthen the argument that TTNmv may contribute directly to AF susceptibility.

      (2) All contractility and electrophysiologic data should be done with pacing at the same rate in both control and missense variant groups, to control for the effect of cycle length on APD and calcium loading. A shorter APD cannot be claimed when the firing rate of one set of cells is much faster than the other, since shorter APD is to be expected with a quicker rate. Similarly, contractility is affected by diastolic interval because of the influence of SR calcium content on the myocyte power stroke. So the cells need to be paced at the same rate in the IonOptix for any direct comparison of contractility. The authors should familiarize themselves with the concept of electrical restitution.

      We thank the reviewer for this crucial technical comment. iPSC-derived cardiomyocytes (iPSC-CMs) are known to exhibit spontaneous automaticity due to the presence of pacemaker-like currents and reduced I<sub>K1</sub>, which enables interrogation of their intrinsic electrophysiological properties and disease-relevant remodeling. In our study, we leveraged this feature to test the hypothesis that TTN missense variants alter electrophysiological properties through ion channel remodeling. That said, we fully agree with the reviewer that pacing iPSC-CMs at a controlled cycle length is essential for minimizing rate-dependent effects on APD, calcium handling, and contractility, and would improve the interpretability of group comparisons. While iPSC-CMs with matched genetic backgrounds are expected to display broadly comparable electrophysiological profiles, biological and technical variability can influence spontaneous beating rates, thereby confounding direct comparisons. To address this, we have incorporated pacing protocols into our revised experimental design to ensure that APD and contractility measurements are obtained under identical cycle lengths, consistent with the concept of electrical restitution.

      (3) It is interesting that the firing rate of the myocytes is faster with the missense variant. This should lead to a hypothesis and investigation of abnormal automaticity or triggered activity, which may also explain the increased contractility since all these mechanisms are related to the SR's calcium clock and calcium loading. See #2 above for suggestions on how to probe calcium handling adequately. Such an investigation into impulse initiation mechanisms would be compelling in supporting the primary statement of the paper since these are actual mechanisms thought to cause AF.

      We thank the reviewer for this insightful suggestion. We agree that the faster firing rate observed in TTN-T32756I iPSC-aCMs raises the possibility of abnormal automaticity or triggered activity, both of which are highly relevant to AF pathophysiology. As these mechanisms are tightly coupled to calcium handling and the SR calcium clock, further probing of calcium cycling abnormalities would provide valuable mechanistic insights. While this level of investigation is beyond the scope of the current study, we view it as a compelling future direction that could directly link TTN missense variants to impulse initiation abnormalities contributing to AF. 

      (4) The claim of shortened APD without correcting for cycle length is problematic. However, linking shortened APD in isolated cells alone to AF causation is more complicated. To have a setup for reentry, there must be a gradient of APD from short to long, and this can only be demonstrated at the tissue level, not at the cellular level, so reentry should not be invoked here. If shortened APD is demonstrated with correction of the cycle length problem, restitution curves can be made showing APD shortening at different cycle lengths. If restitution is abnormal (i.e. the APD does not shorten normally in relation to the diastolic interval), this may lead to triggered activity which is an arrhythmogenic mechanism. This would also tie in well with the finding of abnormally elevated iKs current since iKs is a repolarizing current directly responsible for restitution.

      We thank the reviewer for this necessary clarification. We agree that isolated cell studies cannot directly demonstrate reentrant circuits and that reentry should not be inferred solely from cellular APD data. Our observation of shortened APD and abnormal beating patterns in TTN-T32756I iPSC-aCMs suggests ion channel remodeling that may predispose to arrhythmogenic conditions. Still, we recognize that tissue-level gradients of APD are required to establish reentry as a mechanism. Accordingly, we have removed mention of “the reentrant mechanism” from the revised manuscript and limited our interpretation to the cellular findings. Future studies incorporating pacing protocols and restitution curve analyses will be valuable in determining whether abnormal APD restitution and elevated I<sub>Ks</sub> contribute to triggered activity, thereby providing a more direct mechanistic link to AF (Lines 101–105).

      Lines 101-105: “Our study showed that the TTN-T32756I iPSC-aCMs exhibited a striking AF-like EP phenotype in vitro, and transcriptomic analyses revealed that the TTNmv increases the activity of the FHL2, which then modulates the slow delayed rectifier potassium current (I<sub>Ks</sub>) to cause AF.” 
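
      The restitution-curve analysis discussed in this exchange can be sketched as follows. This is a minimal illustration, assuming steady-state pacing at several cycle lengths so that the diastolic interval is the cycle length minus the APD. The function names and all numerical values are hypothetical and are not data from the study.

```python
def restitution_points(cycle_lengths_ms, apd_ms):
    """Return (diastolic interval, APD) pairs sorted by diastolic interval.

    Assumes steady-state pacing, so DI = cycle length - APD.
    """
    points = [(cl - apd, apd) for cl, apd in zip(cycle_lengths_ms, apd_ms)]
    return sorted(points)


def restitution_slopes(points):
    """Slope of APD versus DI between consecutive points (dimensionless)."""
    return [
        (apd2 - apd1) / (di2 - di1)
        for (di1, apd1), (di2, apd2) in zip(points, points[1:])
    ]


# Hypothetical paced cycle lengths and measured APD90 values (ms),
# invented purely for illustration.
cycle_lengths_ms = [1000, 800, 600, 400]
apd90_ms = [300, 290, 270, 230]

points = restitution_points(cycle_lengths_ms, apd90_ms)
slopes = restitution_slopes(points)

for (di, apd), slope in zip(points, slopes):
    print(f"DI={di} ms, APD90={apd} ms, slope to next point={slope:.3f}")
```

      Comparing such curves between WT and variant cells paced at the same cycle lengths would separate rate-dependent APD shortening from a genotype effect, which is the reviewer's concern.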

      Reviewer #1 (Recommendations for the authors):

      Electrophysiological Phenotype in Ventricular CMs: Has the iPSC line carrying TTN-T32756I been differentiated into ventricular cardiomyocytes (iPSC-vCMs)? The reported cellular phenotype in iPSC-aCMs does not seem to specifically reflect an AF phenotype. Does the variant produce similar electrophysiological alterations in iPSC-vCMs?

      We thank the reviewer for this thoughtful comment. To date, we have not differentiated the TTN-T32756I iPSC line into ventricular cardiomyocytes (iPSC-vCMs). Our current work focuses on iPSC-aCMs, where we demonstrate that the AF-associated TTN-T32756I variant induces ion channel remodeling and abnormal beating patterns, thereby creating a potential arrhythmogenic substrate relevant to AF. We agree that investigating whether this variant produces similar or distinct electrophysiological alterations in iPSC-vCMs would provide essential insights into chamber-specific effects and broaden our mechanistic understanding. We have acknowledged this as a future direction in the revised manuscript (Lines 422–425).

      Lines 422-425: “While we have not yet explored the effect of TTN-T32756I in iPSC-derived ventricular cardiomyocytes, it would be interesting to investigate whether this variant produces similar or distinct electrophysiological alterations in the ventricular cardiomyocytes.”

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      This is a reply/revision plan, not definitive. Planned and already implemented revisions are underlined.

      First of all, we wish to express our gratitude to the reviewers: they helped to improve the paper.

      Reviewer #1:

      Reviewer #1 wrote: Major Comments: 1. Differential gene/pathway analysis across epithelial clusters: What are the differential genes or pathways among the epithelial clusters? Without CCA/Harmony integration, do the tumor subgroups show distinct differences? In addition, I suggest applying NMF or hdWGCNA to identify shared modules and test whether ATC and PTC harbor overlapping regulatory modules.

      Reply plan: Both reviewers suggested some regulatory network analysis. We propose to run SCENIC+ (Nature Methods, 2023, https://doi.org/10.1038/s41592-023-01938-4) on our data.

      Reviewer #1 wrote: 2. Validation of TSHR/TPO-based subgrouping: While the TSHR/TPO grouping appears appropriate for stratification at the single-cell level, it is necessary to exclude sequencing depth as a confounding factor. Should validate the existence of these subpopulations using mIHC/IF on corresponding samples.

      Reply plan: We made claims about RNA expression, not protein expression. Thus, validation should be at the RNA level:

      • We already replicated part of our analysis on the dataset published by Lu et al. (JCI 2023, https://doi.org/10.1172/JCI169653), see Figs. 3 and 4. This effort will be extended to all single cell analysis results from our study in the revised paper.
      • We will also present plots demonstrating that the sequencing depth is similar across the cancer cell subgroups, further excluding it as a confounding factor.

      Reviewer #1 wrote: 3. Impact of mutational differences on conclusions: According to Supplementary Table 1, almost all PTC cases carried BRAF mutations, whereas four ATC patients harbored no BRAF mutation. Could this difference influence the conclusions of the study? Although the authors briefly mention this in the Discussion, a more thorough clarification is warranted.

      Reply plan: The dataset of Lu et al. includes BRAF-mutated ATCs along with BRAF-mutated PTCs. Therefore, the replication mentioned earlier will also address those concerns. In fact, Fig. 4E-I already confirms the ordered loss of markers in the Lu et al. data. Replication will be extended to other results of the study and given more emphasis in the paper.

      Reviewer #1 wrote: 4. The statement "Myeloid and T cells also grouped in specific clusters" seems descriptive. Is this clustering biologically meaningful? Please elaborate.

      Reply plan: This is an important point. Accordingly, a cell mixing experiment was specifically designed to separate technical effects from biological effects. We therefore know with certainty that the patient-specific myeloid and T cell clusters are the result of biological variation (Fig. 1). We further demonstrate that part of this variation is associated with hypoxia (Supp. Fig. 4). So yes, the clustering is biologically meaningful.

      Reviewer #1 wrote: Minor Comments: In Figure 2C, the "Epith TSHR-" population resembles myeloid cells. Could the authors clarify why this is the case? For the correlation analysis in Figure 2C, were highly variable genes or all genes used?

      Reply plan: There is a simple explanation: the Epith TSHR- population the reviewer is referring to consists of cells from anaplastic thyroid cancers (ATC), tumors that are notoriously infiltrated by macrophages (Supp. Fig. 4). A high correlation between the proportions of Epith TSHR- cells and macrophages across our panel of ATC and papillary cancers (PTC) is therefore expected. Fig. 2C shows that high correlation, among other things, but it is not meant to show, and does not show, that Epith TSHR- cells and macrophages "resemble" one another; it shows that their proportions are highly positively correlated. This correlation analysis relies on cell type proportions, not on gene expression: it measures co-occurrence rather than resemblance. The text has been clarified to prevent any confusion.
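
      To make the distinction concrete, a correlation of cell type proportions is computed across tumors, with one pair of fractions per sample, rather than across genes. The sketch below is only an illustration: the per-tumor fractions are hypothetical values, not data from the study, and the `pearson` helper is written out by hand to keep the example self-contained.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical fraction of cells of each type in six tumors
# (one value per tumor; invented for illustration only).
epith_tshr_neg_fraction = [0.02, 0.05, 0.10, 0.40, 0.55, 0.60]
macrophage_fraction = [0.03, 0.04, 0.08, 0.30, 0.35, 0.45]

# A high r means the two cell types tend to be abundant in the same tumors
# (co-occurrence); it says nothing about how similar their transcriptomes are.
r = pearson(epith_tshr_neg_fraction, macrophage_fraction)
print(f"Correlation of proportions across tumors: r = {r:.2f}")
```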

      Reviewer #2:

      Reviewer #2 wrote: 1. This study largely confirms established facts that 1) PTC due to BRAF driver mutation is a heterogeneous tumour entity and 2) ATC is the most dedifferentiated of all thyroid cancers. Although interesting, observations of a highly variable tissue cell composition including immune cells and the gradual loss of thyroid differentiation markers, in part linked to tumor subclone development featured by altered chromosomal copy numbers, are thus not surprising.

      Reply plan: We wish to respectfully express our take on this perception of the work:

      • There is a difference between conjecturing a high heterogeneity in the cell composition of thyroid cancers and establishing it with the level of accuracy and quantitative rigor our analysis provides. The extreme amplitude of that variation was surprising to us: the microenvironment makes up 8.4% to 80% of the cells in PTCs driven by the same BRAF mutation.
      • We don't simply show that a subclone characterized by a large number of copy number events is less differentiated. We go further and show that those copy number alterations are associated with specific cell states that produce specific histology (Fig. 5). It required a combination of single cell transcriptomics, spatial transcriptomics and sophisticated computational analysis to establish that connection between genomic changes and histology. The fragmentation of epithelial sheets uncovered from CNV analysis initially escaped our attention and that of our pathologist colleagues; to our knowledge, it is not a parameter typically assessed in diagnostics.
      • We don't simply show that there is a gradual loss of differentiation markers: this loss is ordered in a very specific way that mirrors the gain of markers during thyroid organoid differentiation.

      Reviewer #2 wrote: 2. Considering tumor progression, comparison of PTC and ATC should preferably include specimens with the same driver mutation (BRAF or RAS), which is not the case here. This notion should be more clearly explained to readers. An optional improvement would be to conduct similar analyses on an ATC specimen that contains more differentiated PTC tumor portions arguably suggesting that PTC progresses to ATC (by mechanisms that are still largely unexplored).

      Reply plan: This is clearly a limitation of our study. As already proposed in our reply to Reviewer #1, we will extend the replication of our analysis in the dataset of Lu et al., which includes ATCs and PTCs harboring the BRAF mutation, to all our single cell results.

      Reviewer #2 wrote: 3. Comments on findings of lymphocytic infiltration need to be balanced. Although autoimmune thyroid disease is inferred to be a risk factor for developing malignancy, it is unlikely that the majority of TCGA samples of PTC is associated with thyroiditis as indicated in Fig. 3 and Suppl Fig. 3. Immune cell abundance may rather reflect the tumor immune microenvironment (TIME).

      Reply plan: The figure the reviewer is referring to demonstrates that PTC occurring in a background of thyroiditis also has a higher proportion of B cells. We did not claim, and the figure did not show, that "the majority of TCGA samples of PTC is associated with thyroiditis", because it is not. This point has been clarified.

      Reviewer #2 wrote: 4. Some tissue sections seem of quite poor quality, either shape-wise or containing rifts, e.g. PTC7 in Fig. 3 and PTC2 in Fig. 5. The authors should explain whether and how this might influence analysis.

      Reply plan: Spatial transcriptomics is typically performed on frozen sections, which are of lower visual quality than sections from FFPE-preserved samples. Since no computational analyses were performed on the images, this lower quality has no impact on our results. Regarding RNA quality, the RINs were >7 for all tumors. RINs are now presented in Supp. Table S1.

      Reviewer #2 wrote: 5. The experiment on mouse ESC/organoids (Fig. 6H-J) does not show much of an expected enhanced thyroid progenitor cell proliferation after induction of the mutant Braf allele by tamoxifen, which raises doubt whether the subsequent promoted growth by fibronectin is oncogene-related at all. This differs from the impact of BrafCA activation along with mouse thyroid development in vivo (Schoultz et al iScience 2023 PMID: 37534159). In the same experimental setup, it appears that mutant Braf prevents follicle formation (Fig. 6I). A control experiment investigating the influence of fibronectin in the absence of oncogene activation should be conclusive. The effect of Braf and fibronectin on thyroid organoid structure and function should be better explained, if necessary based on complementary experiments, and discussed in relation to the claimed association of fibronectin expression to "...low amounts of thyroid differentiation markers..." and "...loss of epithelial structure (PTC7, Figure 6E)." in the previous section of Results.

      Reply plan: The induction of the mutant Braf allele for 7 days increases the percentage of BrdU+ cells 1.43-fold (Wilcoxon test, p = 0.035). The effects observed by Schoultz et al. are certainly more dramatic, but they result from an oncogenic activity spanning 1 to 6 months (4 to 26 times longer) in an in vivo model. Most importantly, oncogenic activity is initiated in Nkx2.1+ cells and not Tg+ cells, thus much earlier during development. These two models are therefore not comparable. As for the effects of fibronectin on thyroid structure, we do not claim that our organoid model recapitulates the complex interactions between cancer cells and their microenvironment that shape tissue morphology in vivo. This is now clarified in the text.
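
      The "4 to 26 times longer" figure can be checked with a short calculation. The sketch below assumes an average month length of 365.25/12 days, which is our assumption and is not stated in the reply.

```python
# Average month length in days (an assumption made for this check).
DAYS_PER_MONTH = 365.25 / 12

organoid_induction_days = 7      # duration of Braf induction in the organoids
in_vivo_duration_months = (1, 6)  # range reported for the in vivo model

ratios = [
    months * DAYS_PER_MONTH / organoid_induction_days
    for months in in_vivo_duration_months
]
rounded_ratios = [round(ratio) for ratio in ratios]

print(f"In vivo exposure is about {rounded_ratios[0]} to "
      f"{rounded_ratios[1]} times longer than the 7-day induction")
```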

      We presented controls with no oncogene expression and no Fn1, controls with oncogene induction and no Fn1, and organoids with oncogene induction and Fn1 treatment. This alone establishes the effect of Fn1 on induced organoids, which was our goal. We regard it as a novel and interesting but non-essential development in our paper.

      As the reviewer points out, while our results show an increased proliferation in Braf-mutated organoids treated with Fn1, they do not allow us to draw conclusions about any potential interaction between Fn1 and the oncogenic process. The suggested experiment with Fn1 in the absence of oncogene activation would add information, but we cannot follow up for practical lab management reasons detailed in Section 4 below.

      Reviewer #2 wrote: 6. Concerning EMT profiling (Supplementary Fig. 7B), there is a great similarity between ATC tumor cells and fibroblasts, and as stated in the text the malignant status of the former is indicated by chromosomal aberrations (referring to Suppl. Fig. 6). However, looking at Suppl. Fig. 7B it is evident that fractions of cells identified as fibroblasts express TG and TSHR, suggesting mismatch. How was this comparison done in order to exclude mismatch? Are there no other profiled markers that distinguish cancer cells from stromal cells that can support conclusions?

      Reply plan: TG, a thyrocyte marker, seems to be expressed by fibroblasts in Supplementary Figure 7B. The reviewer suggests this could be caused by an incomplete distinction between bona fide fibroblasts and thyrocytes in an advanced EMT state. We argue that:

      • Ambient TG RNA leaking out of thyrocyte nuclei contaminates the transcriptomes of all cell types. It is a well-known technical problem, with dedicated software packages to mitigate it. We preprocessed our data with one of them, SoupX, which corrected for most, but not all, ambient RNA contamination.
      • The plot below shows that there is nothing special about fibroblasts in that respect. For example, B and T cells are contaminated by TG at levels comparable to fibroblasts, and endothelial cells and pericytes at higher levels.
      • In addition, the UMAP of Fig. 2A shows that EMT cells and fibroblasts form very distinct clusters. Furthermore, the fibroblast cluster, but not the two EMT clusters, contains cells from PTC, and the PTC cluster does not contain cells with DNA copy number aberrations. Thus, although both EMT cells and fibroblasts express the typical mesenchymal markers of Supplementary Fig. 7B, they are easy to distinguish on the basis of their overall transcriptomes.
      • The panel below has been added to Supplementary Figure 7B. [Panel cannot be displayed here]

      Reviewer #2 wrote: In the same figure, it appears there are no clear differences in EMT marker expression among PTC samples regardless of differentiation state, suggesting that the gradual loss of thyroid differentiation in PTC tumor cells and EMT are not parallel and potentially linked phenomena? Please clarify this dissociation of results. Is it possible that refocusing on other EMT markers than the top 10, of which almost all concern various collagen genes, might better reveal partial EMT in PTCs?

      Reply plan: The technical basis of this comment is related to the previous point. Our perception is that the mesenchymal markers in Supplementary Fig. 7B show a binary effect, i.e. strong expression in ATC and no expression in PTC (beyond ambient RNA noise), not a gradual effect. Thus, there is no correlation of COL1A1 and other mesenchymal markers with dedifferentiation in PTC, as these markers are not expressed beyond the noise level of the experiments. A lot has been written about EMT in PTC, but one of the findings of our study is that while ATCs undergo full EMT, EMT in PTCs is very limited. PTCs express FN1 but no other major mesenchymal markers such as collagens I and III, for example.

      Reviewer #2 wrote: 7. According to Suppl. Table 1, the ATC2 tumor does not harbor any mutations. What about chromosomal aberrations, was that included in analysis? Considering previous consistent reports of a high mutation burden in ATC, if not supported by other data (clinical, pathological) the diagnosis might be questioned for this particular case included in multiple analyses of the present study.

      Reply: There is little doubt about the diagnosis of ATC2 made by our pathologist collaborators:

      • The histology of this tumor is strikingly anaplastic, i.e. without structure, as shown in the image below.
      • This tumor has a high level of macrophage infiltration typical of ATCs (Supplementary Fig. 4).

      Reviewer #2 wrote: Minor comments: - The logical order of presentation of Results might benefit from first presenting specific PTC data followed by ATC ditto. I'm thinking of swapping the section of EMT in ATC to the end of Results.

      Reply plan: We do not see the rationale for this suggestion. We believe that discussing the microenvironment first and then the tumor cells brings conciseness and clarity to how we propose to stratify the latter. By contrast, the suggested tumor type-centered structure entails going back and forth between the microenvironment and tumor cells, diluting the messages about both.

      Reviewer #2 wrote: - Methods paragraph "Mouse ESC-derived thyroid organoids experiments" (starting with "ccc") seems to be missing some essential information.

      Reply plan: A sentence was indeed missing and has been re-introduced in the manuscript. We thank the reviewer for catching that error.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Expression pattern profiling of human thyroid cancer tissues by combining single cell/nuclei RNAseq analysis, spatial transcriptomics and immunofluorescence on corresponding tumor histologic sections. Papillary and anaplastic thyroid carcinomas (PTC n=10 and ATC n=4) were compared; some data were extracted from TCGA. The results indicate that ATCs consist of completely dedifferentiated tumor cells whereas PTCs show variable levels of dedifferentiation, which in a sense mimics the reverse process of thyroid differentiation as observed in stem cell-based organoids. Moreover, PTC and ATC tumors show different levels of epithelial-mesenchymal transition. Fibronectin is inferred to have a role in promoting tumor growth, supported by functional studies on organoids. The authors suggest that global profiling of differentiation state is a promising technique to stratify tumor heterogeneity, which might potentially be useful for distinguishing thyroid malignancies that are suitable or not for adjuvant treatment, e.g. with radioiodine (RAI) therapy.

      Major comments:

      1. This study largely confirms established facts that 1) PTC due to BRAF driver mutation is a heterogeneous tumour entity and 2) ATC is the most dedifferentiated of all thyroid cancers. Although interesting, observations of a highly variable tissue cell composition including immune cells and the gradual loss of thyroid differentiation markers, in part linked to tumor subclone development featured by altered chromosomal copy numbers, are thus not surprising.
      2. Considering tumor progression, comparison of PTC and ATC should preferably include specimens with the same driver mutation (BRAF or RAS), which is not the case here. This notion should be more clearly explained to readers. An optional improvement would be to conduct similar analyses on an ATC specimen that contains more differentiated PTC tumor portions arguably suggesting that PTC progresses to ATC (by mechanisms that are still largely unexplored).
      3. Comments on findings of lymphocytic infiltration need to be balanced. Although autoimmune thyroid disease is inferred to be a risk factor for developing malignancy, it is unlikely that the majority of TCGA samples of PTC are associated with thyroiditis as indicated in Fig. 3 and Suppl Fig. 3. Immune cell abundance may rather reflect the tumor immune microenvironment (TIME).
      4. Some tissue sections seem to be of quite poor quality, either shape-wise or containing rifts, e.g. PTC7 in Fig. 3 and PTC2 in Fig. 5. The authors should explain whether and how this might influence analysis.
      5. The experiment on mouse ESC/organoids (Fig. 6H-J) does not show much of the expected enhancement of thyroid progenitor cell proliferation after induction of the mutant Braf allele by tamoxifen, which raises doubt as to whether the subsequent growth promotion by fibronectin is oncogene-related at all. This differs from the impact of BrafCA activation along with mouse thyroid development in vivo (Schoultz et al iScience 2023 PMID: 37534159). In the same experimental setup, it appears that mutant Braf prevents follicle formation (Fig. 6I). A control experiment investigating the influence of fibronectin in the absence of oncogene activation should be conclusive. The effect of Braf and fibronectin on thyroid organoid structure and function should be better explained, if necessary based on complementary experiments, and discussed in relation to the claimed association of fibronectin expression to "...low amounts of thyroid differentiation markers..." and "...loss of epithelial structure (PTC7, Figure 6E)." in the previous section of Results.
      6. Concerning EMT profiling (Supplementary Fig. 7B), there is a great similarity between ATC tumor cells and fibroblasts, and as stated in the text the malignant status of the former is indicated by chromosomal aberrations (referring to Suppl. Fig. 6). However, looking at Suppl. Fig. 7B it is evident that fractions of cells identified as fibroblasts express TG and TSHR, suggesting a mismatch. How was this comparison done in order to exclude mismatch? Are there no other profiled markers that distinguish cancer cells from stromal cells and can support the conclusions? In the same figure, it appears there are no clear differences in EMT marker expression among PTC samples regardless of differentiation state, suggesting that the gradual loss of thyroid differentiation in PTC tumor cells and EMT are not parallel and potentially linked phenomena. Please clarify this dissociation of results. Is it possible that refocusing on EMT markers other than the top 10, almost all of which are collagen genes, might better reveal partial EMT in PTCs?
      7. According to Suppl. Table 1, the ATC2 tumor does not harbor any mutations. What about chromosomal aberrations, was that included in analysis? Considering previous consistent reports of a high mutation burden in ATC, if not supported by other data (clinical, pathological) the diagnosis might be questioned for this particular case included in multiple analyses of the present study.

      Minor comments:

      • The logical order of presentation of Results might benefit from first presenting specific PTC data, followed by the corresponding ATC data. I'm thinking of moving the section on EMT in ATC to the end of Results.
      • Methods paragraph "Mouse ESC-derived thyroid organoids experiments" (starting with "ccc") seems to be missing some essential information.

      Significance

      The study confirms at single cell level the fundamental difference between PTC and ATC that is evident clinically and biologically, but does not address the intriguing issue of how ATC may progress from PTC.

      Tumor heterogeneity of BRAFV600E-driven PTC in terms of dedifferentiation of functional parameters, which are of potential clinical relevance, is well documented.

      Reviewer expertise: thyroid development, thyroid cell and tumor biology, superficial knowledge in scRNAseq analysis

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #1

      Evidence, reproducibility and clarity

      This study is well designed with a rational sample collection strategy. The authors collected PTC and ATC tissue samples for snRNA and spRNA sequencing, clearly characterizing tumor heterogeneity. Using representative thyroid differentiation markers (TSHR, TPO, TG, NIS), they distinguished different differentiation states of PTC and ATC and further validated the role of FN1 in organoid models. However, the manuscript is largely descriptive in nature, and several key issues remain to be addressed.

      Major Comments:

      1. Differential gene/pathway analysis across epithelial clusters: What are the differential genes or pathways among the epithelial clusters? Without CCA/Harmony integration, do the tumor subgroups show distinct differences? In addition, I suggest applying NMF or hdWGCNA to identify shared modules and test whether ATC and PTC harbor overlapping regulatory modules.
      2. Validation of TSHR/TPO-based subgrouping: While the TSHR/TPO grouping appears appropriate for stratification at the single-cell level, it is necessary to exclude sequencing depth as a confounding factor. The authors should validate the existence of these subpopulations using mIHC/IF on corresponding samples.
      3. Impact of mutational differences on conclusions: According to Supplementary Table 1, almost all PTC cases carried BRAF mutations, whereas four ATC patients harbored no BRAF mutation. Could this difference influence the conclusions of the study? Although the authors briefly mention this in the Discussion, a more thorough clarification is warranted.
      4. The statement "Myeloid and T cells also grouped in specific clusters" seems descriptive. Is this clustering biologically meaningful? Please elaborate.

      Minor Comments:

      In Figure 2C, the "Epith TSHR-" population resembles myeloid cells. Could the authors clarify why this is the case? For the correlation analysis in Figure 2C, were highly variable genes or all genes used?

      Significance

      This study provides a comprehensive single-nucleus and spatial transcriptomic atlas of papillary and anaplastic thyroid carcinomas. Its strengths include well-designed sample collection, high-resolution profiling of tumor heterogeneity, and validation of FN1 function. By stratifying malignant cells with thyroid differentiation markers (TSHR, TPO, TG, NIS), the authors delineate differentiation states and highlight mechanisms of progression from PTC to ATC. However, the study remains mainly descriptive, and additional analyses of gene modules and pathway regulation would increase its conceptual depth. The findings will interest researchers in thyroid cancer, tumor heterogeneity, and the single-cell/spatial genomics field, with potential relevance for translational oncology.

      Field of expertise: thyroid cancer biology, single-cell and spatial transcriptomics.

    1. Reviewer #1 (Public review):

      Summary:

      The NF-kB signaling pathway plays a critical role in the development and survival of conventional alpha beta T cells. Gamma delta T cells are evolutionarily conserved T cells that occupy a unique niche in the host immune system and that develop and function in a manner distinct from conventional alpha beta T cells. Specifically, unlike the case for conventional alpha beta T cells, a large portion of gamma delta T cells acquire functionality during thymic development, after which they emigrate from the thymus and populate a variety of mucosal tissues. Exactly how gamma delta T cells are functionally programmed remains unclear. In this manuscript, Islam et al. use a wide variety of mouse genetic models to examine the influence of the NF-kB signaling pathway on gamma delta T cell development and survival. They find that the inhibitor of kappa B kinase complex (IKK) is critical to the development of gamma delta T1 subsets, but not adaptive/naïve gamma delta T cells. In contrast, IKK-dependent NF-kB activation is required for their long-term survival. They find that caspase 8-deficiency renders gamma delta T cells sensitive to RIPK1-mediated necroptosis, and they conclude that IKK repression of RIPK1 is required for the long-term survival of gamma delta T1 and adaptive/naïve gamma delta T cell subsets. These data will be invaluable in comparing and contrasting the signaling pathways critical for the development/survival of both alpha beta and gamma delta T cells.

      The conclusions of the paper are mostly well-supported by the data, but some aspects need to be clarified.

      (1) The authors appear to be excluding a significant fraction of the TCRlow gamma delta T cells from their analysis in Figure 1A. Since this population is generally enriched in CD25+ gamma delta T cells, this gating strategy could significantly impact their analysis due to the exclusion of progenitor gamma delta T cell populations.

      (2) The overall phenotype of the IKKDeltaTCd2 mice is not described in any great detail. For example, it is not clear if these mice possess altered thymocyte or peripheral T cell populations beyond that of gamma delta T cells. Given that gamma delta T cell development has been demonstrated to be influenced by gamma delta T cells (i.e., trans-conditioning), this information could have aided in the interpretation of the data. Related to this, it would have been helpful if the authors had provided a comparison of the frequencies of each of the relevant subsets, in addition to the numbers.

      (3) The manner in which the peripheral gamma delta T cell compartment was analyzed is somewhat unclear. The authors appear to have assessed both spleen and lymph node separately. The authors show representative data from only one of these organs (usually the lymph node) and show one analysis of peripheral gamma delta T cell numbers, where they appear to have summed up the individual spleen and lymph node gamma delta T cell counts. Since gamma delta T17 and gamma delta T1 are distributed somewhat differently in these compartments (lymph node is enriched in gamma delta T17, while spleen is enriched in gamma delta T1), combining these data does not seem warranted. The authors should have provided representative plots for both organs and calculated and analyzed the gamma delta T cell numbers for both organs separately in each of these analyses.

      (4) The authors make extensive use of surrogate markers in their analysis. While the markers that they choose are widely used, there is a possibility that the expression of some of these markers may be altered in some of their genetic mutants. This could skew their analysis and conclusions. A better approach would have been to employ either nuclear stains (Tbx21, RORgammaT) or intracellular cytokine staining to definitively identify functional gamma delta T1 or gamma delta T17 subsets.

      (5) The analysis and conclusion of the data in Figure 3A are not convincing. Because the data are graphed on log scale, the magnitude of the rescue by kinase dead RIPK1 appears somewhat overstated. A rough calculation suggests that in type 1 gamma delta T cells, there is a ~99% decrease in gamma delta T cells in the Cre+WT strain and a ~90% decrease in the Cre+KD+ strain. Similarly, it looks as if the numbers for adaptive gamma delta T cells are a 95% decrease and an 85% decrease, respectively. Comparing these data to the data in Figure 5, which clearly show that kinase dead RIPK1 can completely rescue the Caspase 8 phenotype, the conclusion that gamma delta T cells require IKK activity to repress RIPK1-dependent pathways does not appear to be well-supported. In fact, the data seem more in line with a conclusion that IKK has a significant impact on gamma delta T cell survival in the periphery that cannot be fully explained by invoking Caspase8-dependent apoptosis or necroptosis. Indeed, while the authors seem to ultimately come to this latter conclusion in the Discussion, they clearly state in the Abstract that "IKK repression of RIPK1 is required for survival of peripheral but not thymic gamma delta T cells." Clarification of these conclusions and seeming inconsistencies would greatly strengthen the manuscript. With respect to the actual analysis in Figure 3A, it appears that the authors used a succession of non-parametric t-tests here without any correction. It may be helpful to determine if another analysis, such as ANOVA, may be more appropriate.

      (6) The conclusion that the alternative pathway is redundant for the development and persistence of the major gamma delta T cell subsets is at odds with a previous report demonstrating that Relb is required for gamma delta T17 development (Powolny-Budnicka, I., et al., Immunity 34: 364-374, 2011). This paper also reported the involvement of RelA in gamma delta T17 development. The present manuscript would be greatly improved by the inclusion of a discussion of these results.

      (7) The data in Figures 1C and 3A are somewhat confusing in that while both are from the lymph nodes of IKKdeltaTCD2 mice, the data appear to be quite different (In Figure 3A, the frequency of gamma delta T cells increases and there is a near complete loss of the CD27+ subset. In Figure 1A, the frequency of gamma delta T cells is drastically decreased, and there is only a slight loss of the CD27+ subset.)
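
      The arithmetic behind point (5) can be made explicit with a short sketch. The cell counts below are hypothetical placeholders chosen only to reproduce the approximate ~99% and ~90% decreases quoted above; they are not values from the manuscript. The point is that a rescue that looks large on a log axis (a 10-fold increase) can still leave the population far below control levels.

```python
# Hypothetical counts (NOT taken from the manuscript), chosen to mimic
# a ~99% loss without rescue and a ~90% loss with kinase-dead RIPK1.
control_count = 100_000.0
ikk_deleted_count = 1_000.0
ikk_deleted_ripk1_kd_count = 10_000.0


def percent_decrease(control, mutant):
    """Percent loss of cells relative to the control strain."""
    return 100.0 * (1.0 - mutant / control)


def fold_rescue(mutant, rescued):
    """Fold increase of the rescued strain over the unrescued mutant."""
    return rescued / mutant


def percent_of_control(control, value):
    """Cell number expressed as a percentage of the control strain."""
    return 100.0 * value / control


print("loss without rescue (%):", percent_decrease(control_count, ikk_deleted_count))
print("loss with rescue (%):", percent_decrease(control_count, ikk_deleted_ripk1_kd_count))
print("fold rescue:", fold_rescue(ikk_deleted_count, ikk_deleted_ripk1_kd_count))
print("rescued, % of control:", percent_of_control(control_count, ikk_deleted_ripk1_kd_count))
```

      Reporting both the fold rescue and the percentage of control recovered, as in this sketch, would make it easier to judge how complete the rescue is.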

    2. Reviewer #2 (Public review):

      This study presents a comprehensive genetic dissection of the role of IKK signaling in the development and maintenance of lymphoid gd T cells. By employing a variety of conditional and mutant mouse models, the authors demonstrate that IKK-dependent NF-κB activation is essential for the generation of type 1 gd T cells, while adaptive gd T cells require this pathway primarily for long-term survival. The use of multiple complementary genetic strategies, including IKK deletion and modulation of RIPK1 and CASPASE8 activity, provides robust mechanistic insight into subset-specific regulation of gd T cell homeostasis. Overall, the study provides mechanistic insight for IKK-dependent regulation of gd T cell development and peripheral maintenance. However, additional experiments can be performed to improve this manuscript and its interpretations.

      Specific Concerns:

      (1) All approaches used confer changes to the entire T cell compartment. Therefore, the authors are unable to resolve whether the observations are mediated by direct and/or indirect effects (e.g., disorganized lymphoid architecture impacting maintenance/survival/homing).

      (2) Assessment of factors that impact T cell numbers in the periphery is necessary. Are there observable changes to the proliferation, survival, and migration of gd T cell subsets?

      (3) TCRd chain usage, especially among type 3 gd T cells, should be assessed.

      (4) The functional consequences of IKK signaling on gd T cells were largely unaddressed. Cytokine analyses were performed only in the RIPK1D138N Casp8∆TCD2 model, leaving open the question of how canonical NF-κB-dependent signaling impacts the long-term functionality of gd T cells.

      (5) The authors suggest that Caspase 8 is required for the development and maintenance of type 3 gd T cells. While the authors discussed the limitations of assessing adult mice in interpreting the data, it seems like a relatively straightforward experiment to perform.

      (6) While analyses of Casp8∆TCD2 RIPK1D138N mice suggest that loss of adaptive and type 1 gamma delta T cells in Casp8∆TCD2 animals is due to necroptosis, the contribution of RIPK3 kinase activity remains unexamined. RIPK3 activity determines whether cells die via necroptosis or apoptosis in RIPK1/Caspase8-dependent signaling, and inclusion of this analysis would strengthen mechanistic insights.

      (7) Canonical NF-κB signaling through cRel alone was not evaluated, leaving a gap in the understanding of transcriptional pathways required for gd T cell subsets.

    1. Reviewer #1 (Public review):

      It is widely accepted that the number of muscle stem cells (MuSCs) declines with aging, leading to diminished regenerative capacity. In this study, when MuSCs were labeled with YFP at a young age, the authors found that the YFP-positive MuSC population remained stable with aging. However, VCAM1 and Pax7 expression levels were reduced in the YFP-positive MuSCs. These VCAM1-negative/low cells exhibited limited proliferative potential and reduced regenerative ability upon transplantation into MuSC-depleted mice. Furthermore, Vcam1-/low MuSCs were highly sensitive to senolysis and represented the population in which Vcam1 expression could be restored by DHT. Finally, the authors identified CD200 and CD63 as markers capable of detecting the entire geriatric MuSC population, including Vcam1-/low cells. Although numerous studies have reported an age-related decline in MuSC numbers, this study challenges that consensus. Therefore, the conclusions require further careful validation.

      Major comments:

      (1) As mentioned above, numerous studies have reported that the number of MuSCs declines with aging. The authors' claim is valid, as Pax7 and Vcam1 were widely used for these observations. However, age-related differences have also been reported even when using these markers (Porpiglia et al., Cell Stem Cell 2022; Liu et al., Cell Rep 2013). When comparing geriatric Vcam1⁺ MuSCs with young MuSCs in this study, did the authors observe any of the previously reported differences? Furthermore, would increasing the sample size in Figure 1 reveal a statistically significant difference? The lack of significance appears to result from variation within the young group. In addition, this reviewer requests the presentation of data on MuSC frequency in geriatric control mice using CD200 and CD63 in the final figure.

      (2) Can the authors identify any unique characteristics of Pax7-VCAM-1 GER1-MuSCs using only the data generated in this study, without relying on public databases? For example, reduced expression of Vcam1 and Pax7. The results of such analyses should be presented.

      (3) In the senolysis experiment, the authors state that GER1-MuSCs were depleted. However, no data are provided to support this conclusion. Quantitative cell count data would directly address this concern. In addition, the FACS profile corresponding to Figure 4D should be included.

      (4) Figure S4: It remains unclear whether DHT enhances regenerative ability through restoration of the VCAM1 expression in GER1-MuSCs, as DHT also acts on non-MuSC populations. Analyses of the regenerative ability of Senolysis+DHT mice may help to clarify this issue.

      (5) Why are there so many myonuclear transcripts detected in the single-cell RNA-seq data? Was this dataset actually generated using single-nucleus RNA-seq? This reviewer considers it inappropriate to directly compare scRNA-seq and snRNA-seq results.

    2. Reviewer #2 (Public review):

      In this study, Kim et al. explore the heterogeneity within the aged MuSC population using a mouse model that enables lineage tracing of MuSCs throughout life. The questions addressed in the manuscript are highly relevant to the fields of aging and stem cell biology, and the experimental approach overcomes limitations of earlier studies. However, some of the claims would benefit from additional data analysis, and the central claim of the identification of a "previously unrecognized subpopulation" of aged MuSCs should be evaluated in light of prior work that has also examined MuSC heterogeneity in aging.

      Specific points:

      (1) As a general comment that applies across multiple figures, several experiments should include a direct comparison to a young cohort. Previous studies have shown that the depletion of subpopulations with aging is observed early in the aging process; for example, the loss of Pax7-high MuSCs is observed already in 18‐month‐old mice (Li, 2019, doi: 10.15252/embj.2019102154). Using only mice at 12-14 months as the control group is therefore insufficient to claim that no changes occur with aging.

      (2) One of the central claims of the manuscript is a challenge to the notion that MuSC number declines with age. However, the data analysis associated with the quantification of YFP+ cells needs to be expanded to support this conclusion. The authors present YFP+ cells only as a proportion of Lin-neg cells. Since FAP numbers are known to decrease with aging, a stable proportion of YFP+ cells would simply indicate that MuSCs decline at the same rate as FAPs. To more accurately assess changes in MuSC abundance, the authors should report absolute numbers of YFP+ cells normalized to tissue mass (cells/mg of muscle).

      (3) The authors emphasize that several studies use VCAM1 as a surface marker to identify MuSCs. However, many other groups rely on α7-integrin, and according to Figure 1D, the decline in ITGA7 expression within the YFP+ population is not significant. Therefore, the suggestion that MuSC numbers have been misquantified with aging would apply only to a subset of studies. If the authors can demonstrate that YFP+ cell numbers (normalized per milligram of tissue) remain unchanged in geriatric mice, the discussion should directly address the discrepancies with studies that quantify MuSCs using the Lin−/α7-integrin+ strategy.

      (4) The authors focus their attention on a VCAM-low/VCAM-neg subpopulation of MuSCs that is enriched in aging. However, the functional properties of this same population in middle-aged (or young) mice are not addressed. Thus, it remains unclear whether geriatric VCAM-low/VCAM-neg MuSCs lose regenerative potential or whether this subpopulation inherently possesses low regenerative capacity and simply expands during aging.

      (5) According to Figure 1F, the majority of MuSCs appear to fall within the category of VCAM-low or VCAM-neg (over 80% by visual estimate). It would be important to have an exact quantification of these data. As a result, the assays testing the proliferative and regenerative capacity of VCAM-low/negative cells are effectively assessing the performance of more than 80% of geriatric MuSCs, which unsurprisingly show reduced efficiency. Perhaps more interesting is the fact that a population of VCAM-high geriatric MuSCs retains full regenerative potential. However, the existence of MuSCs that preserve regenerative potential into old age has been reported in other studies (Garcia-Prat, 2020, doi: 10.1038/s41556-020-00593-7; Li, 2019, doi: 10.15252/embj.2019102154). At this point, the central question is whether the authors are describing the same aging-resistant subpopulations of MuSCs using a new marker (VCAM) or whether this study truly identifies a new subpopulation of MuSCs. The authors should directly compare the YFP+VCAM+ aged cells with other subpopulations that maintain regenerative potential in aging.

      (6) In Figure 3F, it is unclear from the data presentation and figure legend whether the authors are considering the average of fiber sizes in each mouse as a replicate (with three data points per condition), or applied statistical analysis directly to all individual fiber measurements. The very low p-values with n=3 are surprising. It is important to account for the fact that observations from the same mouse are correlated (shared microenvironment, mouse-specific effects) and therefore cannot be considered independent.

      (7) Regarding Figure 5, it is unclear why ITGA7, a classical surface marker for MuSCs that appears unchanged in aged YFP+ MuSCs (Fig. 1F), is considered inadequate for detecting and isolating GERI-MuSCs.
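
      The concern in point (6) about treating individual fibers as independent observations can be illustrated with a small sketch. All numbers are hypothetical fiber sizes constructed for illustration, not data from the manuscript, and Welch's t statistic is used here only as one simple choice of test (a mixed-effects model with mouse as a random effect would be another option). The same underlying data give a much larger test statistic when fibers are pooled than when each mouse contributes one mean.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for b vs a."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(b) - mean(a)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical within-mouse spread: 20 fibers per mouse around the mouse mean.
OFFSETS = [-5.0, -2.5, 0.0, 2.5, 5.0] * 4


def fibers_for(mouse_means):
    """Build one list of fiber sizes per mouse from hypothetical mouse means."""
    return [[m + o for o in OFFSETS] for m in mouse_means]


control = fibers_for([100.0, 110.0, 120.0])   # 3 hypothetical control mice
treated = fibers_for([115.0, 125.0, 135.0])   # 3 hypothetical treated mice

# Pseudoreplicated analysis: every fiber counted as an independent sample.
pooled_control = [x for mouse in control for x in mouse]
pooled_treated = [x for mouse in treated for x in mouse]
t_pooled, df_pooled = welch_t(pooled_control, pooled_treated)

# Mouse-level analysis: one mean per animal, so n = 3 per condition.
t_mouse, df_mouse = welch_t([mean(m) for m in control], [mean(m) for m in treated])

print("pooled fibers: t =", round(t_pooled, 2), "df =", round(df_pooled, 1))
print("per-mouse means: t =", round(t_mouse, 2), "df =", round(df_mouse, 1))
```

      In this constructed example the pooled analysis reports far more degrees of freedom and a several-fold larger t statistic than the per-mouse analysis, which is the pattern that would produce very low p-values from only three animals per condition.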

    1. What time is it

      1. It is eleven o'clock 2. At one o'clock 3. Tânia and Patrícia 4. Beto 5. No, they are going to an Italian restaurant 6. Beto is going to watch television 7. Eleven fifteen

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Detailed point-by-point response

      The Reviewers provided suggestions to improve the manuscript, most notably by adding experiments to (1) further support the role of Stim and Orai in epidermal heat-off responses and (2) further characterize the thermosensory responses of epidermal cells. We additionally propose to include a new set of calcium imaging experiments to visualize nociceptor sensitization by epidermal cells.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Summary: Drosophila larvae are known to respond to noxious stimuli by rolling. The authors propose that this response arises not only by sensory response of nociceptive neurons but also by direct response of larval epidermal cells. They go on to test this idea by independently manipulating epidermal cells and nociceptive sensory neurons using GAL4 lines, GCaMPs and RNAis. The behavioural data are convincing and presented clearly with good statistical analysis. However, the involvement of epidermal cells in evoking the behaviour as well as STIM/Orai mediated Ca2+ entry requires further experiments. Use of another independent GAL4 strain for epidermal cells, alternate RNAi lines for STIM and Orai, mutants for STIM and Orai and overexpression constructs for STIM and Orai would significantly enhance the data. Thus, as of now the key results require more convincing evidence. The following additional experiments would be required to support their claims:

      1) Either use a second epidermal GAL4 strain to show key results OR provide images of the epidermal GAL4 expression double-labelled with a ppk driver using a different fluorescent protein to establish NO overlap of the epidermal GAL4 with neurons. These strains should be available free in Bloomington.

      We agree that the specificity of the GAL4 driver is an important point. In a recent publication (Yoshino et al, eLife, 2025) we provide the most comprehensive analysis of larval epidermal GAL4 drivers published to date. Included in this study is expression analysis of R38F11-GAL4 demonstrating that it is indeed specifically expressed in the epidermis. Based on the detailed expression analysis and functional analysis provided in that paper, R38F11-GAL4 was chosen for these studies as it is both highly specific for epidermal cells and provides uniform expression across the body wall.

      In our revised manuscript, we will more clearly detail how the driver was chosen for this study and provide a citation to the prior work to accompany our description of R38F11-GAL4 as an epidermis-specific driver line.

      2) Authors need to provide better data for the involvement of STIM and Orai in the Calcium responses observed. A single RNAi for each gene with marginal change in response is insufficient. The authors also do not state if the RNAis used are validated by them or anyone else. Minimally they should repeat their experiments with at least one other validated RNAi and rescue these with overexpression constructs of STIM and Orai (available in Bloomington). It is well established in literature that overexpression of STIM/Orai can rescue SOCE in Drosophila. Ideally, to be fully convincing they should test a Drosophila knockout for STIM (available in Bloomington). Heterozygotes of this are viable and should be tested. Additionally a UAS Orai dominant negative (OraiDN) strain is available in Bloomington and can be tested.

      We appreciate the Reviewer’s perspective on the importance of characterizing the efficacy of the reagents we used in this study. However, we disagree with the characterization of the change in response as “marginal”. Our results demonstrate that epidermal knockdown of Stim or Orai causes a significant reduction in the heat-off response of epidermal cells and heat-induced nociceptive sensitization.

      In a prior published study (Yoshino et al, eLife, 2025) we validated the efficacy of these RNAi lines in combination with the same GAL4 driver at the same developmental stage. Specifically, we demonstrated that R38F11-GAL4-mediated expression of UAS-Stim RNAi or UAS-Orai RNAi significantly attenuated store-operated calcium entry following store depletion by thapsigargin. In the revised manuscript, we will add a statement referring to this prior validation along with a citation. In light of this prior characterization, we disagree that additional RNAi lines are required to corroborate the results.

      The most salient point of the Reviewer’s comment is that additional evidence should be provided to demonstrate more convincingly the requirement of Stim/Orai in epidermal heat-off responses. We detail our plans to address this point below, but first address the specific experimental suggestions the Reviewer provides.

      First, the Reviewer suggests the use of a dominant-negative version of Orai, and we agree that this could prove complementary to our RNAi experiments.

      The Reviewer suggests two additional genetic approaches which are well-reasoned but problematic. First, they suggest rescuing the RNAi knockdowns with overexpression approaches. In addition to requiring the generation of new, RNAi-refractory transgenes, this approach is confounded by the effects of overexpressing CRAC channel components. Orai channels exhibit highly cooperative activation by Stim, and we previously showed that epidermal Stim overexpression drove mechanical nociceptive sensitization. Although this dosage effect confounds the rescue assays, we will examine whether epidermal Stim overexpression similarly sensitizes larvae to noxious thermal inputs as we would predict from our model.

      The final experiment the Reviewer suggests – phenotypic analysis of Stim knockouts – is not possible due to the lethal phase of the mutants. Furthermore, it is not possible using traditional mosaic analysis to generate mutant epidermal clones that span the entire epidermis. Such an approach might be possible with a newly engineered FLP-out Stim allele, but generating that reagent is beyond the scope of this work. The Reviewer suggests characterization of Stim heterozygotes, but Drosophila genes rarely show strong dosage effects as heterozygotes (though we acknowledge that dosage effects can be amplified in the cases of genetic interactions), hence a negative result (no effect on heat-off responses) would not be meaningful. In principle we could test whether Stim heterozygosity enhances effects of epidermal Stim RNAi. Although a negative result will not be telling, the experiment is straightforward, and an enhancement of the effect of Stim RNAi would support the model that RNAi provides an incomplete functional knockdown of Stim. We will therefore perform this experiment and incorporate the results into the revised manuscript, pending a positive outcome.

      To better define the contributions of Stim and Orai to heat-off responses of epidermal cells, we will incorporate results from the following new experiments into our revised manuscript:

      • We will monitor effects of epidermis-specific expression of a dominant negative form of Orai on epidermal heat-off responses (calcium imaging) and heat-induced nociceptive sensitization (behavioral assays).
      • We will monitor effects of epidermis-specific co-expression of Stim+Orai RNAi on epidermal heat-off responses (calcium imaging) and heat-induced nociceptive sensitization (behavioral assays).
      • Orai channels exhibit highly cooperative activation by Stim, therefore we will examine whether epidermal Stim overexpression increases the amplitude of heat-off responses (calcium imaging) and sensitizes larvae to noxious thermal inputs (behavioral assays) as we would predict from our model.

        Minor comments that can be addressed:

      1) Figure 1: Further details required on how the rolling response is measured. Figure is uninformative. A video would be really helpful.

      We appreciate the suggestion. We will add a more detailed explanation of how the behaviors were scored along with an annotated video.

      2) I could not find Figure 1I described in the text. This section should be explained properly.

      Figure 1I is described in the figure legend and we will add an in-text citation.

      3) Figure 3: There appears to be a small response at 32°C - why is this ignored in the text? It would be useful to have S3 in the main figure.

      The small response at 32°C is not ignored, though that individual response is better understood in the context of all responses plotted in Figure 3D. We will reword the phrase “At temperature maxima below 35°C epidermal cells rarely exhibited heat-off responses” to reflect the small response that is observed at lower temperatures. We will also replace the trace in the figure – the original submission contained the one outlier sample that exhibited robust responses at 32°C.

      We appreciate the suggestion to include Fig S3 in the main text – we initially included it, but moved it to the supplement for space considerations. We will include it as a main figure in our revised submission.

      4) Fig 4: The DF/F traces for the two RNAis should be included in this figure.

      We appreciate the suggestion; we will add these traces to our revised submission.

      5) Extent of knockdown in the epidermis by each RNAi should be shown by RTPCRs.

      We note that efficacy of the knockdowns has been validated by us in acutely dissociated epidermal cells. RTPCR validation as described would require FACS-sorting of acutely dissociated, GFP-labeled epidermal cells from each specimen, an extremely time- and resource-intensive experiment that provides limited information. The more relevant information is the physiological readout of Stim/Orai functional knockout using these reagents, which we previously conducted. As described above, we will add a description of these experiments and the relevant citation.

      6) The authors need to explain why only a small change in the Ca2+ response is seen with either RNAi. Are there other Ca2+ channels involved? Ideally they could test mutants/RNAi for the TRP channel family. Loss of SOCE in Drosophila neurons changes the expression of other membrane channels - is this possible here? Minimally, this possibility needs to be discussed.

      We agree with the Reviewer that this topic warrants further discussion. Pending the results of our planned experiments (Orai dominant negative, Stim+Orai RNAi), we will incorporate a discussion of other channels that may contribute to the heat-off response. We appreciate the Reviewer's point that loss of SOCE in Drosophila neurons can change the expression of membrane channels – that is an intriguing possibility that might explain the modest effects of Stim or Orai knockdown. We have not investigated effects of epidermal Stim/Orai knockdown on expression of other channels, but will incorporate this possibility into our discussion.

      7) In the methods section please explain how the % DF/F calculations are done and how are they normalised to the ionomycin response.

      We will incorporate these additional details in the methods section.
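
      As a minimal sketch of one common way such a calculation can be done (an assumption for illustration, not necessarily the procedure used in this study): baseline fluorescence F0 is taken as the mean of the frames recorded before the stimulus, % DF/F is computed per frame as 100 × (F − F0) / F0, and each value is then expressed as a percentage of the peak % DF/F measured after ionomycin. The function names, the frame counts and all numerical values below are hypothetical.

```python
from statistics import mean


def percent_dff(trace, baseline_frames):
    """Return % dF/F per frame, using the mean of the first
    `baseline_frames` frames as the baseline F0 (an assumed convention)."""
    f0 = mean(trace[:baseline_frames])
    return [100.0 * (f - f0) / f0 for f in trace]


def normalize_to_ionomycin(dff_trace, ionomycin_dff_peak):
    """Express a % dF/F trace as a percentage of the peak
    % dF/F response recorded after ionomycin."""
    return [100.0 * v / ionomycin_dff_peak for v in dff_trace]


# Hypothetical raw fluorescence values (arbitrary units), for illustration only.
trace = [100.0, 100.0, 100.0, 120.0, 150.0, 130.0]
dff = percent_dff(trace, baseline_frames=3)

# Hypothetical peak % dF/F after ionomycin in the same sample.
ionomycin_peak = 200.0
normalized = normalize_to_ionomycin(dff, ionomycin_peak)
```

      Normalizing to the ionomycin response in this way puts samples with different indicator expression levels on a common scale, at the cost of assuming that ionomycin drives the indicator close to saturation in every sample.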

      8) Authors need to look at previous work on STIM and Orai in Drosophila and reference appropriately.

      We appreciate the suggestion and will incorporate additional discussion of relevant Drosophila work on STIM and Orai.

      **Referees cross-commenting**

      Reviewers 2 and 3 have raised some additional queries to what I had mentioned in my review. I agree with their comments. The authors should attempt to address all comments by all three reviewers.

      We address their comments below.

      Reviewer #1 (Significance (Required)):

      This is an interesting study that identifies epidermal cells in Drosophila with the ability to sense a drop in temperature after receiving noxious heat stimuli and invoke appropriate behaviour. Behaviour experiments are well conducted and convincing. So far only nociceptive neurons were thought to control such behavioural responses so the work is significant and important for the field. The mechanism identified needs further convincing and I have suggested experiments that would be of help. With the additional experiments suggested the work will be of interest to neuroethologists, Drosophila neuroscientists and scientists in the field of Ca signaling.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary:

      Noxious heat can have a strong adverse effect on animals, resulting in sensitization when noxious thermal stimuli are applied repeatedly. Noxious heat induces a characteristic rolling behavior in Drosophila melanogaster larvae. This study investigates sensitization, whereby a second heat stimulus evokes this behavior with significantly shorter latency (e.g., 3.4 seconds) than the initial exposure (e.g., 8.79 seconds). While prior research has implicated central and peripheral neurons in this process, recent findings in mammalian systems suggest a role for keratinocytes.

      In this manuscript, Yoshino et al. report that epidermal cells are necessary and sufficient to mediate heat sensitization in D. melanogaster larvae. Using an ex vivo epidermal imaging system, the authors demonstrate that calcium influx in epidermal cells is crucial for sensitization. Importantly, this calcium influx was observed only when the temperature was lowered from a dangerously high to a safe temperature. The calcium channel system Orai and Stim facilitates this influx.

      Major comments:

      (1) The authors clearly demonstrate the heat-off reaction using calcium influx imaging. However, all of the imaging shows the response to the first stimulation. Since the study focuses on sensitization, which shows a quicker response to the second heat stimulus, it would be helpful if the authors showed calcium influx when the second stimulus was applied. It would also be interesting to see how many times the epidermal cells can react to heat stimulation.

      We appreciate the suggestion from the Reviewer but note that the calcium influx we show occurs in epidermal cells, which signal to neurons to potentiate future responses in our model. We have emphasized this point in our revised manuscript.

      The relevant response to visualize the sensitization is the heat-evoked calcium response in nociceptors, not epidermal cells. We have verified that C4da neurons exhibit calcium responses to the warming stimulus we use in our heat-off paradigm and our preliminary studies suggest that the heat-off stimulus potentiates future responses to noxious heat in nociceptors. We will therefore examine (1) whether epidermal stimulation triggers a sensitization of nociceptors to thermal stimuli by monitoring heat-induced calcium responses using GCaMP, and (2) whether epidermal Stim and Orai are required for this sensitization.

      The second comment addresses the response of epidermal cells to repeated rounds of stimuli. We agree that this is an interesting point. We have verified that epidermal cells indeed respond to multiple rounds of heat-off stimuli. We will incorporate results from a paradigm in which epidermal cells are presented with two successive heat-off stimuli, spaced by 5 minutes to allow epidermal cytosolic calcium to return to baseline. We will incorporate new analysis examining the relative magnitude of epidermal cell responses to the first and second stimulus.

      (2) Figure 5 only shows one condition: a 30-second interval between the first and second heat application. While the rolling latency of the Luciferase RNAi control ranges from 4 to 12 seconds (with a median of 5 seconds), Fig. 1E shows a latency ranging from 6 to 12 seconds (with a median of 10 seconds) under the same 30-second interval conditions. This difference makes interpreting the effect of Stim and Orai confusing. The authors need to clarify whether the knockdowns accelerate the first response or delay the second response.

      The Reviewer notes that we assayed effects of Stim/Orai RNAi on heat-induced nociceptive sensitization in only one paradigm. Given the kinetics of cytosolic calcium increases following Stim or Orai RNAi in epidermal cells (Fig. 4F), we agree that an additional set of behavior experiments investigating sensitization following a 60 s recovery is warranted. For our revision we will conduct a time-course to assay requirements of epidermal Stim and Orai (using epidermal expression of Stim/Orai RNAi and Orai dominant negative transgenes) on heat-induced nociceptive sensitization. Our preliminary studies indicate that Stim and Orai RNAi significantly reduce heat-induced sensitization following 60 s of recovery (we present results from 30 s of recovery in the original submission).

      The Reviewer raises some questions about differences in behavioral latencies in Figure 1E and Figure 5B. We intentionally avoid such comparisons both because the genetic backgrounds are different and the experiments were conducted at very different times (more than 1 year apart). In both experiments the salient feature that we discuss is the presence or absence of sensitization, not the mean latency. We note that we do compare mean latency values in Figure 1B, but that was a distinct experimental paradigm (global heat of variable temperatures followed by focal noxious heat) designed specifically to define heat stimuli that generate the maximum level of sensitization. In that case, the genotype was fixed and all assays were conducted concurrently.

      Minor comments:

      (i) In Fig. 2C´´, the authors observed clear calcium influx in epidermal cells by combining the GCaMP genetic tool with an ex vivo thermal perfusion system. Although this system applies heat uniformly across the epidermal tissue, calcium influx is spatially restricted, appearing primarily in the head and tail regions of the epidermis. These results suggest that the heat-responsive epidermal cells are localized to these regions or that there are regional differences in sensitivity. The authors should explain the spatial relationship between the heat-applied epidermal cells and the occurrence of calcium influx.

      The Reviewer notes that the epidermal GCaMP signal is particularly intense in the anterior and posterior portions of the fillet preparation (Fig. 1B-1C), and we agree that it would be useful to include an explanation of this result, which is an artifact of the sample preparation.

      The specimens we use for calcium imaging are “butterfly” preparations – the body wall is filleted along the long axis with the exception of regions at the head and tail that are pinned down on Sylgard plates. Hence, the regions in the head and tail contain intact tissue (including a double layer of skin when we image in widefield), not a single layer of skin (the rest of the prep). More significantly, the head and tail regions are pinned down, creating a wound that triggers lasting local calcium transients (note signal in the absence of temperature stimulus, Figure 1B’ and 1B”, 1C’). We therefore exclude this region from our analysis. We note that our behavior studies relied on stimuli presented to the abdominal segments we sample in the semi-intact calcium imaging. Similarly, we dissociated epidermal cells exclusively from these segments for imaging of acutely isolated epidermal cells.

      We do note that there is a periodicity to the signal – within each segment there are local maxima and minima of signal, and we agree with the Reviewer that this spatial segregation is an interesting point for discussion. We will add 1-2 sentences to our discussion of the result to acknowledge this point.

      (ii) Related to comment (i) above, if heat stimuli are applied topically using a heat probe under the ex vivo imaging system, how large an area reacts to the stimuli?

      The Reviewer raises an interesting question about the local response to heat stimuli. In our dissociated cell experiments we found that the overwhelming majority of isolated epidermal cells exhibit heat-off responses, and we likewise find that the majority of cells in our semi-intact preparation respond to heat-off stimuli. However, our current probe for delivering local heat stimuli is not compatible with our imaging system. We are working to incorporate an IR laser to focally deliver heat stimulus to explore whether epidermal cells signal to neighbors following stimulation, but such studies are beyond the scope of the current work.

      (iii) Providing supplementary movie(s) of the calcium live imaging would enhance the reader's understanding.

      We agree with the Reviewer that this would be a useful supplement. We will add representative movies as experimental supplements in our revised manuscript.

      (iv) The time point of the image in Fig. 2C´ ("before heat") is not the most informative for demonstrating a "heat-off" response. The authors should replace it with an image taken during the heat application to provide a more direct comparison with the post-stimulus influx shown in Fig. 2C´´.

      We appreciate the Reviewer’s suggestion and agree this would be a better choice to visually represent the change in fluorescence induced by the heat-off response. We will make this change in our revised manuscript.

      (v) The authors state that sensitization occurs "primarily in the 30-45 ºC range." However, the rolling probability and latency changed in the opposite direction at 45 ºC stimulation compared with 40 ºC. It is possible that 45 ºC approaches a noxious or damaging threshold that engages a different phenomenon. The authors should reconsider including 45 ºC within the optimal sensitization range or provide a justification.

      We agree with the Reviewer that a more detailed discussion of the effects of temperature at the end of the range (45 C) is warranted. Exposure to a 45 C global heat stimulus triggered temporary paralysis in some larvae, and we suspect that this accounts for the apparent reduction in roll probability following the second stimulus. We can add a plot depicting the proportion of larvae that exhibited paralysis during 45 C global heat and determine whether these heat-paralyzed larvae exhibited distinct responses from larvae that were not paralyzed and provide a more detailed account of the optimal sensitization range.

      Treatment with 45 C stimuli still triggered a significant reduction in roll latency (sensitization), but we did not examine whether the latency was significantly different from what was observed at 40 C. We can add that analysis in the revision.

      (vi) In the sentence "To this end, we developed a perfusion system, that would deliver thermal ramps from ~20-45ºC ...," the tilde ~ should be replaced with "approximately".

      Noted. We will make the change.

      (vii) Throughout the manuscript, please clarify in the figure legends whether the sample size (n) refers to the number of individual animals or the number of cells.

      Noted. We will add the relevant details to our sample size notations.

      (viii) The Key Resources Table does not specify the wild-type (WT) strain used for the control experiments (e.g., in Fig. 1). Please provide the full genotype of the control strain used.

      We included the experimental genotypes in each figure legend, which we find more useful than the key resource table, which contains a list of all reagents used in the study (Drosophila alleles included).

      Reviewer #2 (Significance (Required)):

      General Assessment

      This study addresses a fundamental question in sensory biology: whether epidermal cells, long regarded as passive participants in somatosensation, actively contribute to noxious heat detection and avoidance behavior. While previous work has defined the neuronal circuits and TRP channel mechanisms underlying thermal nociception in Drosophila larvae, the potential sensory role of skin cells has remained largely unexplored. The authors integrate behavioral analysis with in vitro and ex vivo calcium imaging to provide a rigorous, multi-level investigation of epidermal thermosensitivity.

      Advancement

      The work advances the field by revealing that Drosophila epidermal cells are intrinsically thermosensitive and can acutely sensitize larval nociceptive responses to noxious heat through heat-off signaling. This discovery shifts the current paradigm of thermal nociception from a neuron-centric model to one that incorporates epidermal contributions, highlighting a conserved and previously underappreciated role of skin cells in active environmental sensing.

      The reviewer's expertise: Molecular genetics, developmental biology, insect physiology and endocrinology.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This manuscript describes the temperature responses of Drosophila larval epidermal cells. These cells are activated by cooling and also exhibit strong heat-off responses. Orai and Stim are required in epidermal cells for these heat-off responses. The heat-off responses sensitize the epidermal cells, leading to a greater proportion of animals displaying rolling behaviors and a reduced latency to initiate rolling following noxious heating treatment. The following comments are intended to help improve the manuscript.

      Major:

      1. In Figure 3A, the conclusion will be strengthened by testing heat-off responses from 10 °C to 40 °C.

      The Reviewer makes an important point. In our original experiment, the lack of response in the 10C – 30C experiment could be due to some cold-induced suppression of the off response. We have found that this is not the case: off responses following a 10C-40C ramp are indistinguishable from responses to a 20C-40C ramp. In our revised manuscript we will incorporate new results showing epidermal heat-off responses to a 10C-40C ramp as well as normalization to 20C-40C responses performed in parallel.

      2. Figure 4C shows that 2-APB suppresses the heat-off response. Since 2-APB blocks both Orai and TRP channels, it is unclear why the authors focused exclusively on the Orai pathway without testing TRP channels.

      We found that epidermal cells exhibited minimal responses to warming stimuli, as would be expected for the epidermally expressed TRP channel TRPA1. In addition, the heat-off response we identified was remarkably similar to characteristic heat-off responses of mammalian CRAC channels. Hence, we focused our attention on the Orai pathway. While we agree that contributions of TRP channels could be of interest, especially if our additional analyses (double RNAi and Orai Dominant Negative) support the model that additional channels likely contribute to the heat-off response, the characteristic temperature responses of CRAC channels made them the most plausible candidate.

      In parallel to the experiments to further characterize Stim/Orai contributions to the heat-off response, we will assay requirements of TRPA1 to heat-induced nociceptor sensitization.

      3. While 2-APB completely abolishes the heat-off response, Orai and Stim RNAi only slightly (although significantly) reduce calcium responses. The knockdown efficiency of the RNAi constructs should be validated. Furthermore, testing whether combining Orai RNAi and Stim RNAi produces a stronger reduction in calcium responses would be informative.

      We addressed the question of knockdown efficiency above, and agree that testing the effects of Orai RNAi and Stim RNAi in combination is worthwhile. We detailed our plans for these experiments above.

      4. The study uses third-instar larvae. Please specify whether early, mid, or late third instar were used.

      In our original submission we stated “Third-instar larvae (96-120 AEL) larvae were used in all experiments.” We provide additional details on the staging of larvae for all experiments in the methods section of our revised submission. To synchronize cultures, embryos were collected from experimental crosses for 24 h, aged for 96 h, and foraging mid-third instar larvae (96-120 h old) were used for all experiments.

      5. Please provide more details about the thin layer of water used. Specifically, indicate the size of the Peltier plate and the volume of water applied.

      We provide additional details on the application of global heat stimulus in the methods section of our revised manuscript. “For assays testing effects of varying the temperature of prior thermal stimuli on thermal nociception, larvae were individually transferred to a pre-warmed Peltier plate (11 x 7 cm; Torrey Pines Scientific). Peltier plates were warmed to the indicated temperatures, a thin layer of water was applied to the surface using a paint brush, and the temperature was verified using an infrared thermometer. Larvae were transferred individually to the Peltier plate, incubated for the indicated time, and recovered to 2% Agar Pads using a paint brush. Following 10 s of recovery, larvae were stimulated with a 41.5°C thermal probe, as above, and latency to the first complete roll was recorded.”

      Minor:

      1. There is an inconsistency between the text and the figure regarding the sample number in Figure 1D.

      We thank the reviewer for identifying the discrepancy. This inconsistency has been corrected in the revised submission.

      2. Please provide the raw representative data for the time course of heat-off calcium responses in Figure 1E.

      We will incorporate representative traces for the heat-off responses plotted in Figure 1E.

      3. A period is missing at the end of the sentence: "For curve fitting, sample-averaged fluorescence traces were fitted with a single exponential decay function using R to extract a representative time constant (τ) and assess response kinetics."

      We thank the reviewer for identifying the omission. The period has been added.
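
      For readers unfamiliar with the fitting step quoted above, the following is a minimal sketch of how a time constant can be extracted from a decaying trace. It is not the analysis used in the manuscript, which was performed in R. Instead it assumes a decay of the form F(t) = A·exp(−t/τ) toward a zero baseline and uses a simpler stand-in technique, ordinary linear least squares on log-transformed values. The function name and the synthetic data are hypothetical.

```python
import math


def fit_single_exponential(times, values):
    """Fit values ~ A * exp(-t / tau) by linear least squares on log(values).

    Returns (A, tau). Assumes every value is positive and that the trace
    decays toward zero; this is a simplification of a full nonlinear fit.
    """
    logs = [math.log(v) for v in values]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx                  # equals -1 / tau
    intercept = mean_y - slope * mean_t  # equals log(A)
    return math.exp(intercept), -1.0 / slope


# Synthetic, noise-free decay with a hypothetical tau of 12 s and amplitude 80.
times = [0.0, 5.0, 10.0, 15.0, 20.0]
values = [80.0 * math.exp(-t / 12.0) for t in times]
amplitude, tau = fit_single_exponential(times, values)
```

      The log-linear approach is easy to verify by hand but gives more weight to low-amplitude points than a direct nonlinear fit does, so with noisy data or a non-zero baseline the two approaches can return different time constants.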

      4. In the sentence "Behavior Responses were analyzed post-hoc blind to genotype and were plotted according to roll probability and roll latency," the word Responses should begin with a lowercase r.

      This has been corrected in the revised submission.

      Reviewer #3 (Significance (Required)):

      This manuscript describes the heat-off responses of larval epidermal cells and investigates their underlying molecular mechanisms as well as associated behavioral consequences.

      The calcium responses and behavioral assays are clearly presented. However, the contribution of Stim and Orai to this process is not convincing.

      The study may be of interest to researchers working on Drosophila and temperature sensation, as well as to those studying Orai and Stim function.

      I am a researcher specializing in Drosophila thermosensation.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      This manuscript describes the temperature responses of Drosophila larval epidermal cells. These cells are activated by cooling and also exhibit strong heat-off responses. Orai and Stim are required in epidermal cells for these heat-off responses. The heat-off responses sensitize the epidermal cells, leading to a greater proportion of animals displaying rolling behaviors and a reduced latency to initiate rolling following noxious heating treatment. The following comments are intended to help improve the manuscript.

      Major:

      1. In Figure 3A, the conclusion will be strengthened by testing heat-off responses from 10 °C to 40 °C.
      2. Figure 4C shows that 2-APB suppresses the heat-off response. Since 2-APB blocks both Orai and TRP channels, it is unclear why the authors focused exclusively on the Orai pathway without testing TRP channels.
      3. While 2-APB completely abolishes the heat-off response, Orai and Stim RNAi only slightly (although significantly) reduce calcium responses. The knockdown efficiency of the RNAi constructs should be validated. Furthermore, testing whether combining Orai RNAi and Stim RNAi produces a stronger reduction in calcium responses would be informative.
      4. The study uses third-instar larvae. Please specify whether early, mid, or late third instar were used.
      5. Please provide more details about the thin layer of water used. Specifically, indicate the size of the Peltier plate and the volume of water applied.

      Minor:

      1. There is an inconsistency between the text and the figure regarding the sample number in Figure 1D.
      2. Please provide the raw representative data for the time course of heat-off calcium responses in Figure 1E.
      3. A period is missing at the end of the sentence: "For curve fitting, sample-averaged fluorescence traces were fitted with a single exponential decay function using R to extract a representative time constant (τ) and assess response kinetics."
      4. In the sentence "Behavior Responses were analyzed post-hoc blind to genotype and were plotted according to roll probability and roll latency," the word Responses should begin with a lowercase r.

      Significance

      This manuscript describes the heat-off responses of larval epidermal cells and investigates their underlying molecular mechanisms as well as associated behavioral consequences.

      The calcium responses and behavioral assays are clearly presented. However, the contribution of Stim and Orai to this process is not convincing.

      The study may be of interest to researchers working on Drosophila and temperature sensation, as well as to those studying Orai and Stim function.

      I am a researcher specializing in Drosophila thermosensation.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.



      Referee #2

      Evidence, reproducibility and clarity

      Summary:

      Noxious heat can have a strong adverse effect on animals, resulting in sensitization when noxious thermal stimuli are applied repeatedly. Noxious heat induces a characteristic rolling behavior in Drosophila melanogaster larvae. This study investigates sensitization, whereby a second heat stimulus evokes this behavior with significantly shorter latency (e.g., 3.4 seconds) than the initial exposure (e.g., 8.79 seconds). While prior research has implicated central and peripheral neurons in this process, recent findings in mammalian systems suggest a role for keratinocytes. In this manuscript, Yoshino et al. report that epidermal cells are necessary and sufficient to mediate heat sensitization in D. melanogaster larvae. Using an ex vivo epidermal imaging system, the authors demonstrate that calcium influx in epidermal cells is crucial for sensitization. Importantly, this calcium influx was observed only when the temperature was lowered from a dangerously high to a safe temperature. The calcium channel system Orai and Stim facilitates this influx.

      Major comments:

      (1) The authors clearly demonstrate the heat-off reaction using calcium influx imaging. However, all of the imaging shows the response to the first stimulation. Since the study focuses on sensitization, which shows a quicker response to the second heat stimulus, it would be helpful if the authors showed calcium influx when the second stimulus was applied. It would also be interesting to see how many times the epidermal cells can react to heat stimulation.

      (2) Figure 5 only shows one condition: a 30-second interval between the first and second heat application. While the rolling latency of the Luciferase RNAi control ranges from 4 to 12 seconds (with a median of 5 seconds), Fig. 1E shows a latency ranging from 6 to 12 seconds (with a median of 10 seconds) under the same 30-second interval conditions. This difference makes interpreting the effect of Stim and Orai confusing. The authors need to clarify whether the knockdowns accelerate the first response or delay the second response.

      Minor comments:

      (i) In Fig. 2C´´, the authors observed clear calcium influx in epidermal cells by combining the GCaMP genetic tool with an ex vivo thermal perfusion system. Although this system applies heat uniformly across the epidermal tissue, calcium influx is spatially restricted, appearing primarily in the head and tail regions of the epidermis. These results suggest that the heat-responsive epidermal cells are localized to these regions or that there are regional differences in sensitivity. The authors should explain the spatial relationship between the heat-applied epidermal cells and the occurrence of calcium influx.

      (ii) Related to comment (i) above, if heat stimuli are applied topically using a heat probe under the ex vivo imaging system, how large an area reacts to the stimuli?

      (iii) Providing supplementary movie(s) of the calcium live imaging would enhance the reader's understanding.

      (iv) The time point of the image in Fig. 2C´ ("before heat") is not the most informative for demonstrating a "heat-off" response. The authors should replace it with an image taken during the heat application to provide a more direct comparison with the post-stimulus influx shown in Fig. 2C´´.

      (v) The authors state that sensitization occurs "primarily in the 30-45 ºC range." However, the rolling probability and latency changed in the opposite direction at 45 ºC stimulation compared with 40 ºC. It is possible that 45 ºC approaches a noxious or damaging threshold that engages a different phenomenon. The authors should reconsider including 45 ºC within the optimal sensitization range or provide a justification.

      (vi) In the sentence "To this end, we developed a perfusion system, that would deliver thermal ramps from ~20-45ºC ...," the tilde ~ should be replaced with "approximately".

      (vii) Throughout the manuscript, please clarify in the figure legends whether the sample size (n) refers to the number of individual animals or the number of cells.

      (viii) The Key Resources Table does not specify the wild-type (WT) strain used for the control experiments (e.g., in Fig. 1). Please provide the full genotype of the control strain used.

      Significance

      General Assessment

      This study addresses a fundamental question in sensory biology: whether epidermal cells, long regarded as passive participants in somatosensation, actively contribute to noxious heat detection and avoidance behavior. While previous work has defined the neuronal circuits and TRP channel mechanisms underlying thermal nociception in Drosophila larvae, the potential sensory role of skin cells has remained largely unexplored. The authors integrate behavioral analysis with in vitro and ex vivo calcium imaging to provide a rigorous, multi-level investigation of epidermal thermosensitivity.

      Advancement

      The work advances the field by revealing that Drosophila epidermal cells are intrinsically thermosensitive and can acutely sensitize larval nociceptive responses to noxious heat through heat-off signaling. This discovery shifts the current paradigm of thermal nociception from a neuron-centric model to one that incorporates epidermal contributions, highlighting a conserved and previously underappreciated role of skin cells in active environmental sensing.

      The reviewer's expertise: Molecular genetics, developmental biology, insect physiology and endocrinology.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary: Drosophila larvae are known to respond to noxious stimuli by rolling. The authors propose that this response arises not only by sensory response of nociceptive neurons but also by direct response of larval epidermal cells. They go on to test this idea by independently manipulating epidermal cells and nociceptive sensory neurons using GAL4 lines, GCaMPs and RNAis. The behavioural data are convincing and presented clearly with good statistical analysis. However, the involvement of epidermal cells in evoking the behaviour, as well as of STIM/Orai-mediated Ca2+ entry, requires further experiments. Use of another independent GAL4 strain for epidermal cells, alternate RNAi lines for STIM and Orai, mutants for STIM and Orai and overexpression constructs for STIM and Orai would significantly enhance the data. Thus, as of now the key results are not fully convincing. The following additional experiments would be required to support their claims:

      1) Either use a second epidermal GAL4 strain to show key results OR provide images of the epidermal GAL4 expression double-labelled with a ppk driver using a different fluorescent protein to establish NO overlap of the epidermal GAL4 with neurons. These strains should be available free in Bloomington.

      2) Authors need to provide better data for the involvement of STIM and Orai in the Calcium responses observed. A single RNAi for each gene with marginal change in response is insufficient. The authors also do not state if the RNAis used are validated by them or anyone else. Minimally they should repeat their experiments with at least one other validated RNAi and rescue these with overexpression constructs of STIM and Orai (available in Bloomington). It is well established in literature that overexpression of STIM/Orai can rescue SOCE in Drosophila. Ideally, to be fully convincing they should test a Drosophila knockout for STIM (available in Bloomington). Heterozygotes of this are viable and should be tested. Additionally a UAS Orai dominant negative (OraiDN) strain is available in Bloomington and can be tested.

      Minor comments that can be addressed:

      1) Figure 1: Further details required on how the rolling response is measured. Figure is uninformative. A video would be really helpful.

      2) I could not find Figure 1I described in the text. This section should be explained properly.

      3) Figure 3: There appears to be a small response at 32 ºC - why is this ignored in the text? It would be useful to have S3 in the main figure.

      4) Fig 4: The DF/F traces for the two RNAis should be included in this figure.

      5) The extent of knockdown in the epidermis by each RNAi should be shown by RT-PCR.

      6) The authors need to explain why only a small change in the Ca2+ response is seen with either RNAi. Are there other Ca2+ channels involved? Ideally they could test mutants/RNAi for the TRP channel family. Loss of SOCE in Drosophila neurons changes the expression of other membrane channels - is this possible here? Minimally, this possibility needs to be discussed.

      7) In the methods section please explain how the % DF/F calculations are done and how are they normalised to the ionomycin response.

      8) Authors need to look at previous work on STIM and Orai in Drosophila and reference appropriately.

      Referees cross-commenting

      Reviewers 2 and 3 have raised some additional queries to what I had mentioned in my review. I agree with their comments. The authors should attempt to address all comments by all three reviewers.

      Significance

      This is an interesting study that identifies epidermal cells in Drosophila with the ability to sense a drop in temperature after receiving noxious heat stimuli and invoke appropriate behaviour. Behaviour experiments are well conducted and convincing. So far only nociceptive neurons were thought to control such behavioural responses so the work is significant and important for the field. The mechanism identified needs further convincing and I have suggested experiments that would be of help. With the additional experiments suggested the work will be of interest to neuroethologists, Drosophila neuroscientists and scientists in the field of Ca signaling.

    1. Reviewer #1 (Public review):

      The authors use inducible Fz::mKate2-sfGFP to explore "cell-scale signaling" in PCP. They reach several conclusions. First, they conclude that cell-scale signaling does not depend on limiting pools of core components (other than Fz). Second, they conclude that cell-scale signaling does not depend on microtubule orientation, and third, they conclude that cell-scale signaling is strong relative to cell to cell coupling of polarity.

      There are some interesting inferences that can be drawn from the manuscript, but there are also some significant challenges in interpreting the results and conclusions from the work as presented. I suggest that the authors 1) define "cell-scale signaling," as the precise meaning must be inferred, 2) reconsider some premises upon which some conclusions depend, 3) perform an essential assay validation, and 4) explain some other puzzling inconsistencies.

      Major concerns (first round of review):

      The exact meaning of cell-scale signaling is not defined, but I infer that the authors use this term to describe how what happens on one side of a cell affects another side. The remainder of my critique depends on this understanding of the intended meaning.

      The authors state that any tissue wide directional information comes from pre-existing polarity and its modification by cell flow, such that the de novo signaling paradigm "bypasses" these events and should therefore not be responsive to any further global cues. It is my understanding that this is not a universally accepted model, and indeed, the authors' data seem to suggest otherwise. For example, the image in Fig 5B shows that de novo induction restores polarity orientation to a predominantly proximal to distal orientation. If no global cue is active, how is this orientation explained? The 6 hr condition, that has only partial polarity magnitude, is quite disordered. Do the patterns at 8 and 10 hrs become more proximally-distally oriented? It is stated that they all show swirls, but please provide adult wing images, and the corresponding orientation outputs from QuantifyPolarity to help validate the notion that the global cues are indeed bypassed by this paradigm.

      It is implicit that, in the de novo paradigm, polarization is initiated immediately or shortly after heat shock induction. However, the results should be differently interpreted if the level of available Fz protein does not rise rapidly and then stabilize before the 6 hr time point, and instead continues to rise throughout the experiment. Western blots of the Fz::mKate2-sfGFP at time points after induction should be performed to demonstrate steady state prior to measurements. Otherwise, polarity magnitude could simply reflect the total available pool of Fz at different times after induction. Interpreting stability is complex, and could depend on the same issue, as well as the amount of recycling that may occur. Prior work from this lab using FRAP suggested that turnover occurs, and could result from recycling as well as replenishment from newly synthesized protein.

      From the Fig 3 results, the authors claim that limiting pools of core proteins do not explain cell-scale signaling, a result expected based on the lack of phenotypes in heterozygotes, but of course they do not test the possibility that Fz is limiting. They do note that some other contributing protein could be.

      In Fig 3, it is unclear why the authors chose to test dsh1/+ rather than dsh[null]/+. In any case, the statistically significant effect of Dsh dose reduction is puzzling, and might indicate that the other interpretation is correct. Ideally, a range including larger and smaller reductions would be tested. As is, I don't think limiting Dsh is ruled out.

      The data in Fig 5 are somewhat internally inconsistent, and inconsistent with the authors' interpretation. In both repolarization conditions, the authors claim that repolarization extends only to row 1, and row 1 is statistically different from non-repolarized row 1, but so too is row 3. Row 2 is not. This makes no sense, and suggests either that the statistical tests are inappropriate and/or the data are too sparse to be meaningful. For the related boundary intensity data in Fig 6, the authors need to describe exactly how boundaries were chosen or excluded from the analysis. Ideally, all boundaries would be classified as either medio-lateral (meaning anterior-posterior) or proximal-distal depending on angle.

      If the authors believe their Fig 5 and 6 analyses, how do they explain that hairs are reoriented well beyond where the core proteins are not? This would be a dramatic finding, because as far as I know, when core proteins are polarized, prehair orientation always follows the core protein distribution. Surprisingly, the authors do not so much as comment about this. The authors should age their wings just a bit more to see whether the prehair pattern looks more like the adult hair pattern or like that predicted by their protein orientation results.

    2. Author response:

      (1) General Statements

      Our manuscript studies mechanisms of planar polarity establishment in vivo in the Drosophila pupal wing. Specifically we seek to understand mechanisms of ‘cell-scale signalling’ that is responsible for segregating core pathway planar polarity proteins to opposite cell edges. This is an understudied question, in part because it is difficult to address experimentally.

      We use conditional and restrictive expression tools to spatiotemporally manipulate core protein activity, combined with quantitative measurement of core protein distribution, polarity and stability. Our results provide evidence for a robust cell-scale signal, while arguing against mechanisms that depend on depletion of a limited pool of a core protein or polarised transport of core proteins on microtubules. Furthermore, we show that polarity propagation across a tissue is hard, highlighting the strong intrinsic capacity of individual cells to establish and maintain planar polarity.

      The original manuscript received three fair and thorough peer-reviews, which raised many important points. In response, we decided to embark on a full revision that attempts to answer all of the points. We have included new data to support our conclusions in Supplemental Figures 1, 2 and 5.

      Additionally in response to the reviewers we have revised the manuscript title, which is now ‘Characterisation of cell-scale signalling by the core planar polarity pathway during Drosophila wing development’.

      (2) Point-by-point description of the revisions

      We thank all of the reviewers for their thorough and thoughtful review of our manuscript. They raise many helpful points which have been extremely useful in assisting us to revise the manuscript.

      In response we have carried out a major revision of the manuscript, making numerous changes and additions to the text and also adding new experimental data. Specific changes are listed after our detailed response to each comment.

      Reviewer #1:

      […] Major points:

      The exact meaning of cell-scale signaling is not defined, but I infer that the authors use this term to describe how what happens on one side of a cell affects another side. The remainder of my critique depends on this understanding of the intended meaning.

      As the reviewer points out, it is important that the meaning of the term ‘cell-scale signalling’ is clear to the reader and in response to their comment we have had another go at defining it explicitly in the Introduction to the manuscript.

      Specifically, we use the term ‘cell-scale signalling’ to describe possible intracellular mechanisms acting on core protein segregation to opposite cell membranes during core pathway dependent planar polarisation. For example, this could be a signal from distal complexes at one side of the cell leading to segregation of proximal complexes to the opposite cell edge, or vice versa. See also our response to Reviewer #2 regarding the distinction between ‘molecular-scale’ and ‘cell-scale’ signalling. 

      Changes to manuscript: Revised definition of ‘cell-scale signalling’ in Introduction.

      The authors state that any tissue wide directional information comes from pre-existing polarity and its modification by cell flow, such that the de novo signaling paradigm "bypasses" these events and should therefore not be responsive to any further global cues. It is my understanding that this is not a universally accepted model, and indeed, the authors' data seem to suggest otherwise. For example, the image in Fig 5B shows that de novo induction restores polarity orientation to a predominantly proximal to distal orientation. If no global cue is active, how is this orientation explained?

      We assume that the reviewer’s point is that it is not universally accepted that de novo induction after hinge contraction leads to uncoupling from global cues (rather than that it is not accepted that hinge contraction remodels radial polarity to a proximodistal pattern). We are (we believe) the only lab that has used de novo induction as a tool, and we’re not aware of any debate in the literature about whether this bypasses global cues. Nevertheless, we accept that it is hard to prove there is no influence of global cues, when the nature of those cues and the time at which they act remain unclear. Below we summarise the reasons why we believe there are no significant effects of global cues in our experiments that would influence the interpretation of our results.

      First, our reading of the literature supports a broad consensus that an early radial core planar polarity pattern is realigned by cell flow produced by hinge contraction beginning at around 16h APF (e.g. Aigouy et al., 2010; Strutt and Strutt, 2015; Aw and Devenport, 2017; Butler and Wallingford, 2017; Tan and Strutt, 2025). Taken at face value, this suggests that there are ‘radial’ cues present prior to hinge contraction, maybe coming from the wing margin – arguably these radial cues could be Ft-Ds or Wnts or both, given they are expressed in patterns consistent with such a role (notwithstanding the published evidence arguing against roles for either of these cues). It then appears that hinge contraction supersedes these cues to convert a radial pattern to a proximodistal pattern – whether the radial cues that affect the core pathway earlier remain active after hinge contraction is unclear, although both Ft-Ds and Wnts appear to maintain their ‘radial’ patterns beyond the beginning of hinge contraction (e.g. Merkel et al., 2014; Ewen-Campen et al., 2020; Yu et al., 2020).

      We think that the reviewer is proposing the presence of a proximodistal cue that is active in the proximal region of the wing that we use for our experiments shown e.g. in Fig.5, and that this cue orients core polarity here (but not elsewhere in the wing) in a time window after 18h APF. Ft-Ds and Wnts do not seem to be plausible candidates as they are still in ‘radial’ patterns. This leaves either an unknown proximodistal cue (a gradient of some unknown signalling molecule?), or possibly some ability of hinge contraction to align proximodistal polarity specifically in this wing region but not elsewhere. We cannot definitively rule out either of these possibilities, but neither do we think there is sufficient evidence to justify invoking their existence to explain our observations.

      In particular, the reason that we don’t think there is a proximodistal cue in the proximal part of the wing after 18h APF, is that work from our lab shows that induction of Fz or Stbm expression at times around or after the start of hinge contraction (i.e. >16 h APF) results in increasing levels of trichome swirling with polarity not being coordinated with the tissue axis either proximally or distally (Strutt and Strutt, 2002; Strutt and Strutt 2007). Our simplest interpretation for this is that induction at these stages fails to establish the early radial pattern of core pathway polarity and hence hinge contraction cannot reorient radial to proximodistal. If hinge contraction alone could specify proximodistal polarity in the absence of the earlier radial polarity, then we would not expect to see swirling over much of the proximal wing (where the forces from hinge contraction are strongest (Etournay et al., 2015)).

      In this manuscript, our earliest de novo experiments begin with Fz induction at 18h APF (de novo 10h), then at 20h APF (de novo 8h) and at 22h APF (de novo 6h). The image in Fig. 5B, referred to by the reviewer, is of a wing where Fz is induced de novo at 22 h APF. In these wings, as expected, the core proteins localise asymmetrically in stereotypical swirling patterns throughout the wing surface (see Fig. 2M and also Strutt and Strutt, 2002; Strutt and Strutt 2007), but – usefully for our experiments – they broadly localise along the proximal-distal axis in the region analysed in Fig. 5B. Given the strong swirling in surrounding regions when inducing at >20h APF, we feel reasonably confident in assuming that the pattern is not due to a proximodistal cue present in the proximal wing.

      We appreciate that the original manuscript did not show images including the trichome pattern in adjacent regions, so this point would not have been clear, but we now include these in Supplementary Fig. 5. We have also added a note in the legend to Fig. 5B to clarify that the proximodistal pattern seen is local to this wing region. We apologise for this oversight and the confusion caused and appreciate the feedback.

      The 6 hr condition, that has only partial polarity magnitude, is quite disordered. Do the patterns at 8 and 10 hrs become more proximally-distally oriented? It is stated that they all show swirls, but please provide adult wing images, and the corresponding orientation outputs from QuantifyPolarity to help validate the notion that the global cues are indeed bypassed by this paradigm.

      In all three ‘normal’ de novo conditions (6h, 8h and 10h), regardless of the time of induction, the polarity orientation patterns of Fz-mKate2 in pupal and adult wings are very similar in the experimentally analysed region (Fig. S5B-E). The strong local hair swirling agrees with the previous published data (Strutt and Strutt, 2002; Strutt and Strutt 2007). Overall, we don’t see any evidence that the 10h de novo induction results in more proximodistally coordinated polarity than the 8h or 6h conditions. This is consistent with our contention that there is no global cue present at these stages, which presumably would have a stronger effect when core pathway activity was induced at earlier stages.

      Changes to manuscript: Added additional explanation of the ‘de novo induction’ paradigm and why we believe the resulting polarity patterns are unlikely to be influenced by any global signals in Introduction and Results section ‘Induced core protein relocalisation…’. Added quantification of polarity in the experiment region proximal to the anterior cross-vein in pupal wings (Fig.S5E-E’’’) and zoomed-out images of the surrounding region in adult wings showing that the polarity pattern does not become more proximodistal when induction time is longer, and also that there is not overall proximodistal polarity in proximal regions of the wing (Fig.S5B-D), arguing against an unknown proximodistal polarity cue at these stages of development.

      In the de novo paradigm, polarization is initiated immediately or shortly after heat shock induction. However, the results should be differently interpreted if the level of available Fz protein does not rise rapidly and then stabilize before the 6 hr time point, and instead continues to rise throughout the experiment. Western blots of the Fz::mKate2-sfGFP at time points after induction should be performed to demonstrate steady state prior to measurements. Otherwise, polarity magnitude could simply reflect the total available pool of Fz at different times after induction. Interpreting stability is complex, and could depend on the same issue, as well as the amount of recycling that may occur. Prior work from this lab using FRAP suggested that turnover occurs, and could result from recycling as well as replenishment from newly synthesized protein. 

      The reviewer raises an important point, which we agree could confound our experimental interpretations. As suggested we have now carried out western blotting and quantitation for Fz::mKate2-sfGFP levels and added these data to Fig.S1 (Fig. S1C,D). Quantified Fz is not significantly different between the three de novo polarity induction timings and not significantly different compared to constitutive Fz::mKate2-sfGFP expression (although there is a trend towards increasing Fz::mKate2-sfGFP protein levels with increasing induction times). These data are consistent with Fz::mKate2-sfGFP being at steady state in our experiments and that levels are sufficient to achieve normal polarity (as constitutive Fz::mKate2-sfGFP does so). Therefore it is unlikely that differing protein levels explain the differing polarity magnitudes at the different induction times. Interestingly, Fz::mKate2-sfGFP levels are lower than endogenous Fz levels, possibly due to lower expression or increased turnover/reduced recycling.

      Changes to manuscript: Added western blot analysis of Fz::mKate2-sfGFP expression under 10h, 8h and 6h induction conditions vs endogenous Fz expression and constitutive Fz::mKate2-sfGFP expression (Fig.S1C-D) and discussed in Results section ‘Planar polarity establishment is…’.

      From the Fig 3 results, the authors claim that limiting pools of core proteins do not explain cell-scale signaling, a result expected based on the lack of phenotypes in heterozygotes, but of course they do not test the possibility that Fz is limiting. They do note that some other contributing protein could be.

      Previously published results from our lab (Strutt et al., 2016 Cell Reports; Supplemental Fig. S6E) show that in a heterozygous fz mutant background, Fz protein levels are not affected by halving the gene dosage when compared to wt, suggesting that Fz is most likely produced in excess and is not normally limiting, but that protein that cannot form complexes may be rapidly degraded. We have now added this information to the text.

      Changes to manuscript: Added explanation in text that Fz levels had previously been shown to not be dosage sensitive in Results section ‘Planar polarity establishment is…’ and also added a caveat to the Discussion about not directly testing Fz.

      In Fig 3, it is unclear why the authors chose to test dsh1/+ rather than dsh[null]/+. In any case, the statistically significant effect of Dsh dose reduction is puzzling, and might indicate that the other interpretation is correct. Ideally, a range including larger and smaller reductions would be tested. As is, I don't think limiting Dsh is ruled out. 

      Concerning the choice of dsh allele, we appreciate the query of the reviewer regarding use of dsh[1] instead of a null, as there might be a concern that dsh[1] would give a less strong phenotype. The answer is that over more than two decades we and others have never found any evidence that dsh[1] does not act as a ‘null’ for planar polarity in the pupal wing, and furthermore use of dsh[1] preserves function in Wg signalling – and we would prefer to rule out any phenotypic effects due to any potential cross-talk between the two pathways that might be seen using a complete null. To expand on this point, dsh[1] mutant protein is never seen at cell junctions (Axelrod 2001; Shimada et al., 2001; our own work), and by every criterion we have used, planar polarity is completely disrupted in hemizygous or homozygous mutants e.g. see quantifications of polarity in (Warrington et al., 2017 Curr Biol).

      In terms of the broader point, whether we can rule out Dsh being limiting, we were very careful to be clear that we did not see evidence for Dsh (or other core proteins) being limiting in terms of ‘rates of core pathway de novo polarisation’. When the reviewer says ‘the statistically significant effect of Dsh dose reduction is puzzling’ we believe they are referring to the data in Fig. 3J, showing a small but significantly different reduction in stable Fz in de novo 6h conditions (also seen in 8h de novo conditions, Fig. S3I). As Dsh is known to stabilise Fz in complexes (Strutt et al., 2011 Dev Cell; Warrington et al., 2017 Curr Biol), in itself this result is not wholly surprising. Nevertheless, while this shows that halving Dsh levels does modestly reduce Fz stability, it does not alter our conclusion that halving Dsh levels does not affect Fz polarisation rate under either 6h or 8h de novo conditions.

      Unfortunately, we do not have available to us a practical way of achieving consistent intermediate reductions in Dsh levels (e.g. a series of verified transgenes expressing at different levels). Levels of all the core proteins could be dialled down using transgenes, to see when the system breaks, and indeed we have previously published that lower levels of polarity are seen if Fmi levels are <<50% or if animals are transheterozygous for pk, stbm, dgo or dsh, pk, stbm, dgo simultaneously (Strutt et al., 2016 Cell Reports). However, it seems to be a trivial result that eventually the ability to polarise is lost if insufficient core proteins are present at the junctions. For this reason we have focused on a simple set of experiments reducing gene dosage singly by 50% under two de novo induction conditions, and have been careful to state our results cautiously. The assays we carried out were a great deal of work even for just the 5 heterozygous conditions tested.

      We believe that the experiments shown effectively make the point that there is no strong dosage sensitivity – and it remains our contention that if protein levels were the key to setting up cell-scale polarity, then a 50% reduction would be expected to show an effect on the rate of polarisation. We further note that as Fz::mKate2-sfGFP levels are lower than endogenous Fz levels (see above), the system might be expected to be sensitised to further dosage reductions, and despite this we failed to see an effect on rate of polarisation.

      We note that Reviewer #3 made a similar point about whether we can rule out dosage sensitivity on the basis of 50% reductions in protein level. To address the comments of both reviewers we have now added some further narrative and caveats in the text.

      In a similar vein, Reviewer #2 requested data on whether dosage reduction altered protein levels by the expected amount. We have now added further explanation/references and western blot data to address this.

      Changes to manuscript: Added more explanation of our choice of dsh[1] as an appropriate mutant allele to use in Results section ‘Planar polarity establishment is…’. Added some narrative and caveats regarding whether lowering levels more than 50% would add to our findings in the Discussion. Revised conclusions to be more cautious including altering section title to read ‘Planar polarity establishment is not highly sensitive to variation in protein levels of core complex components’.

      Also added westerns and text/references showing that for the tested proteins there is a reduction in protein levels upon removal of one gene dosage in Results section ‘Planar polarity establishment is…’ and Fig.S2.

      The data in Fig 5 are somewhat internally inconsistent, and inconsistent with the authors' interpretation. In both repolarization conditions, the authors claim that repolarization extends only to row 1, and row 1 is statistically different from non-repolarized row 1, but so too is row 3. Row 2 is not. This makes no sense, and suggests either that the statistical tests are inappropriate and/or the data is too sparse to be meaningful. 

      As we’re sure the reviewer appreciates, this was an extremely complex experiment to perform and analyse. We spent a lot of time trying to find the best way to illustrate the results (finally settling on a 2D vector representation of polarity) and how to show the paired statistical comparisons between different groups. Moreover, in the end we were only able to detect generally quite modest (statistically significant) changes in cell polarity under the experimental conditions.

      However, we note that failure to see large and consistent changes in polarity is exactly the expected result if it is hard to repolarise from a boundary – and this is of course the conclusion that we draw. Conversely, if repolarisation were easy, which was our expectation at least under de novo conditions without existing polarity, then we would have expected large and highly statistically significant changes in polarity across multiple cell rows. Hence we stand by our conclusion that ‘it is hard to repolarise from a boundary of Fz overexpression in both control and de novo polarity conditions’.

      Overall, we were trying to establish three points:

      (1) to demonstrate that repolarisation occurs from a boundary of overexpression i.e. from boundary 0 to row 0

      (2) to establish whether a wave of repolarisation occurs across rows 1, 2 and 3

      (3) to determine whether it is easier to repolarise in the de novo repolarisation condition than in the control (already polarised) repolarisation condition

      Taking each in turn:

      (1) To detect repolarisation from a boundary relative to the control condition, we have to compare row 0 in the repolarisation condition (Fig.5G,K) vs the control condition (Fig.5F,J). This comparison shows a significant repolarisation (p=0.0014). From here on, row 0 in the repolarisation condition is our reference for repolarisation occurring.

      (2) To determine if there is a wave of repolarisation in the repolarisation condition we have to compare row 0 vs rows 1 to 3 in the repolarisation condition (Fig.5K). Row 1 is not significantly different to row 0, but rows 2 and 3 are different and the vectors show obviously lower polarity than row 0. Hence no wave of repolarisation is detected over rows 1 to 3.

      (3) To determine if it is easier to repolarise in the de novo condition, our reference for establishment of a repolarisation pattern is the polarisation condition in rows 0 to 3. So, we compare the repolarisation condition vs repolarisation in the de novo condition, row 0 vs row 0, row 1 vs row 1, row 2 vs row 2 and row 3 vs row 3 – in each case no significant difference in polarity is detected, supporting our conclusion that it is not easier to repolarise in the de novo condition.

      We agree that the variations in row 3 are puzzling, but there is no evidence that this is due to propagation of polarity from row 0, and so in terms of our three questions, it does not alter our conclusions.

      Changes to manuscript: We have extensively revised the text describing the results in Fig.5 to hopefully make the reasons for our conclusions clearer and also be more cautious in our conclusions in Results section ‘Induced core protein relocalisation…’. 

      For the related boundary intensity data in Fig 6, the authors need to describe exactly how boundaries were chosen or excluded from the analysis. Ideally, all boundaries would be classified as either medio-lateral (meaning anterior-posterior) or proximal-distal depending on angle.

      We thank the reviewer for pointing out that this was not clear.

      All boundaries were classified following their orientation compared to the Fz over-expression boundary using hh-GAL4 expressed in the wing posterior compartment. Horizontal junctions were defined as parallel to the Fz over-expression boundary (between 0 and 45 degrees) and mediolateral junctions as junctions linking two horizontal boundaries (between 45 and 90 degrees).
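
The angle rule described above can be illustrated with a minimal sketch. This is not the analysis code used for the manuscript: the function names, the coordinate inputs and the `boundary_angle_deg` parameter are all hypothetical, and assigning a junction at exactly 45 degrees to the horizontal class is an assumption, since the text does not say which class that case falls into.

```python
import math


def junction_angle_deg(p1, p2, boundary_angle_deg=0.0):
    """Acute angle (0-90 degrees) between a junction and the boundary axis.

    p1 and p2 are (x, y) vertex coordinates of the junction (hypothetical
    inputs). boundary_angle_deg is the orientation of the Fz over-expression
    boundary in the same image coordinates.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    angle = math.degrees(math.atan2(dy, dx)) - boundary_angle_deg
    angle = abs(angle) % 180.0        # a junction is an undirected line
    return min(angle, 180.0 - angle)  # fold into the 0-90 degree range


def classify_junction(p1, p2, boundary_angle_deg=0.0):
    """Label a junction 'horizontal' (0-45 degrees) or 'mediolateral' (45-90)."""
    angle = junction_angle_deg(p1, p2, boundary_angle_deg)
    # Assumption: exactly 45 degrees is counted as horizontal.
    return "horizontal" if angle <= 45.0 else "mediolateral"
```

For instance, with the boundary along the x axis, `classify_junction((0, 0), (10, 1))` yields `"horizontal"` and `classify_junction((0, 0), (1, 10))` yields `"mediolateral"`.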

      Changes to manuscript: The boundary classification detailed above has been added in the Materials and Methods.

      If the authors believe their Fig 5 and 6 analyses, how do they explain that hairs are reoriented well beyond where the core proteins are not? This would be a dramatic finding, because as far as I know, when core proteins are polarized, prehair orientation always follows the core protein distribution. Surprisingly, the authors do not so much as comment about this. The authors should age their wings just a bit more to see whether the prehair pattern looks more like the adult hair pattern or like that predicted by their protein orientation results.

      Again the reviewer makes an interesting point, and we agree that this is something that we should have more directly addressed in the manuscript.

      There are three reasons why we might expect adult trichomes to show a different effect from the measured core protein polarity pattern seen in our experiments:

      (i) we are assaying core protein polarity at 28h APF, but trichomes emerge at >32h APF, so there is still time for polarity to propagate a bit further from the boundary. We now have added data showing that by the point of trichome initiation, the wave of polarisation extends 3-4 cell rows (Fig.S5A).

      (ii) it has long been known that a strong localisation of core proteins at a cell edge is not required for polarisation of trichome polarity from a boundary. For instance, in Strutt & Strutt 2007 we show clones of cells overexpressing Fz causing propagation through pk[pk-sple] mutant tissue where there is no detectable core protein polarity. We were following up prior observations of Adler et al., 2000 in the wing and Lawrence et al., 2004 in the abdomen.

      (iii) there is evidence to suggest that the polarity of adult trichomes is locally coupled, possibly mechanically. This point is hard to prove without live imaging taking in both initial core protein localisation, the site of actin-rich trichome initiation and then the final orientation of the much larger microtubule filled trichome, and we’re not aware that such data exist. However, Wong & Adler 1993 (JCB) showed that over a number of hours trichomes become much larger and move towards the centre of the cell, presumably becoming decoupled from any core protein cue. The images in Guild … & Tilney, 2005 (MBoC)  are also interesting to look at in this regard. Finally, septate junction proteins have been implicated in local alignment of trichomes, independently of the core pathway (Venema … & Auld, 2004 Dev Biol).

      Changes to manuscript: Added new data in Fig.S5A showing where trichomes initiate under 6h de novo induction conditions, for comparison to core protein localisation and adult trichome data in Fig.5. Added some text explaining why adult trichome repolarisation might be stronger than the observed effects on core protein localisation in Discussion. 

      Minor points:

      As the authors know, there is a model in the literature that suggests microtubule trafficking provides a global cue to orient PCP. The authors' repolarization data in Fig 4 make a reasonably convincing case against a role for microtubules in cell-scale signaling, but do not rule out a role as a global cue. The authors should be careful of language such as "...MTs and core proteins being oriented independently of each other" that would appear to possibly also refer to a role as a global cue. 

      Thank you for pointing out that this was not clear. We have now modified the text to hopefully address this.

      Changes to manuscript: Text updated in Results section ‘Microtubules do not provide…’.

      Significance:

      There are two negative conclusions and one positive conclusion made by the authors. Provided the above points are addressed, the negative conclusions, that core proteins are not limiting and that microtubules are not involved in cell-scale signaling are solid. The positive conclusion is more nebulous - the authors say that cell-scale signaling is strong relative to cell-cell signaling - but how strong is strong? Strong relative to their prior expectations? I'm not sure how to interpret such a conclusion. Overall, we learn something from these results, though it fails to reveal anything about mechanism. These results will be of some interest to those studying PCP.

      The reviewer raises an interesting point, which is how do you compare the strength of two different processes, even if both processes affect the same outcome (in this case cell polarity). Repolarisation from a boundary has not been carefully studied at the level of core protein localisation in any previous study to our knowledge – this is one of the important novel aspects of this study. Hence there is not a baseline for defining strong repolarisation. Similarly, there has been no investigation of the nature of ‘cell-scale signalling’. This was a considerable challenge for us in writing the manuscript, and we have done our best to find appropriate language that hopefully conveys our message adequately. Minimally our work may provide a baseline for helping to define the ‘strengths’ of these processes in future studies.

      One of our main points is that we can generate an artificial boundary of Fz expression, where Fz levels are at least several fold higher than in the neighbouring cell (e.g. compare Fig.4N’ and O’) and only two rows of cells show a significant change in polarity relative to controls. Even when the tissue next to the overexpression domain is still in the process of generating polarity (de novo condition) then the boundary has little effect on polarity in neighbouring cell rows. This was a result that surprised us, and we tried to convey that by using language to suggest cell-scale signalling was stronger than cell-cell signalling i.e. stronger in terms of the ability to define the final direction of polarity.

      Changes to manuscript: In the revised manuscript we have reviewed our use of language and now avoid saying ‘strong’ but instead use terms such as ‘effective’ and ‘robust’ in e.g. Results section ‘Induced core protein relocalisation…’, the Discussion and we have also changed the title of the manuscript to avoid claiming a ‘strong’ signal.

      Reviewer #2:

      […] Critique

      The experiments described in this paper are of high quality with a sophisticated level of design and analysis. However, there needs to be some recalibration of the extent of the conclusions that can be drawn (see below). Moreover, a limitation of this paper is that, despite the quality of their data, they cannot give a molecular hint about the nature of their proposed cell-scale signal. Below are two key points that the authors may want to clarify.

      (1) The first set of repolarisation experiment is performed after the global cell rearrangements that have been shown to act as global signal. However, this approach does not exclude the possible contribution of an unknown diffusible global signal.

      A similar point was raised by Reviewer 1. For the convenience of this reviewer, we’ll summarise the arguments against such an unknown cue again below. More broadly, both reviewers asking a similar question indicates that we have failed to lay out the evidence in sufficient detail. In our defence, we have used the same ‘de novo’ paradigm in three previous publications (Strutt and Strutt 2002, 2007; Brittle et al 2022) without attracting (overt) controversy. We have now added text to the Introduction and Results that goes into more detail, as well as more experimental evidence (Fig.S5).

      Firstly, it is worth noting that the global cues acting in the wing are poorly understood, with mostly negative evidence against particular cues accruing in recent years. This makes it a hard subject to succinctly discuss. Secondly, we accept that it is hard to prove there is no influence of global cues, when the nature of those cues and the time at which they act remain unclear. Below we summarise the reasons why we believe there are no significant effects of global cues in our experiments that would influence the interpretation of our results.

      First, our reading of the literature supports a broad consensus that an early radial core planar polarity pattern is realigned by cell flow produced by hinge contraction beginning at around 16h APF (e.g. Aigouy et al., 2010; Strutt and Strutt, 2015; Aw and Devenport, 2017; Butler and Wallingford, 2017; Tan and Strutt, 2025). Taken at face value, this suggests that there are ‘radial’ cues present prior to hinge contraction, maybe coming from the wing margin – arguably these radial cues could be Ft-Ds or Wnts or both, given they are expressed in patterns consistent with such a role (notwithstanding the published evidence arguing against roles for either of these cues). It then appears that hinge contraction supersedes these cues to convert a radial pattern to a proximodistal pattern – whether the radial cues that affect the core pathway earlier remain active after hinge contraction is unclear, although both Ft-Ds and Wnts appear to maintain their ‘radial’ patterns beyond the beginning of hinge contraction (e.g. Merkel et al., 2014; Ewen-Campen et al., 2020; Yu et al., 2020).

      We think that the reviewers are proposing the presence of a proximodistal cue that is active in the proximal region of the wing that we use for our experiments shown e.g. in Fig.5, and that this cue orients core polarity here (but not elsewhere in the wing) in a time window after 18h APF. Ft-Ds and Wnts do not seem to be plausible candidates as they are still in ‘radial’ patterns. This leaves either an unknown proximodistal cue (a gradient of some unknown signalling molecule?), or possibly some ability of hinge contraction to align proximodistal polarity specifically in this wing region but not elsewhere. We cannot definitively rule out either of these possibilities, but neither do we think there is sufficient evidence to justify invoking their existence to explain our observations.

      In particular, the reason that we don’t think there is a proximodistal cue in the proximal part of the wing after 18h APF, is that work from our lab shows that induction of Fz or Stbm expression at times around or after the start of hinge contraction (i.e. >16 h APF) results in increasing levels of trichome swirling with polarity not being coordinated with the tissue axis either proximally or distally (Strutt and Strutt, 2002; Strutt and Strutt 2007). Our simplest interpretation of this is that induction at these stages fails to result in the early radial pattern of core pathway polarity being established and hence a failure of hinge contraction to reorient radial to proximodistal. If hinge contraction alone could specify proximodistal polarity in the absence of the earlier radial polarity, then we would not expect to see swirling over much of the proximal wing (where the forces from hinge contraction are strongest, Etournay et al., 2015).

      In this manuscript, our earliest de novo experiments begin at 18h APF (de novo 10h), then at 20h APF (de novo 8h) and at 22h APF (de novo 6h). The image in Fig. 5B referred to by Reviewer 1, is of a wing where Fz is induced de novo at 22 h APF. In these wings, as expected, the core proteins localise asymmetrically in stereotypical swirling patterns throughout the wing surface (see Fig. 2M and also Strutt and Strutt, 2002; Strutt and Strutt 2007), but – usefully for our experiments – they broadly localise along the proximal-distal axis in the region analysed in Fig. 5B. Given the strong swirling in surrounding regions when inducing at >20h APF, we feel reasonably confident in assuming that the pattern is not due to a proximodistal cue present in the proximal wing. We appreciate that the original manuscript did not show images including the trichome pattern in adjacent regions, so this point would not have been clear, but we now include these in Supplementary Fig.S5. We have also added a note in the legend to Fig. 5B to clarify that the proximodistal pattern seen is local to this wing region.

      Changes to manuscript: Text extended in Introduction and Results to better explain why we believe the de novo conditions that we use most likely result in a polarity pattern that is not significantly influenced by ‘global cues’. Now show zoomed-out images of the surrounding region around the experiment region proximal to the anterior cross-vein region in adult wings, showing that the polarity pattern does not become more proximodistal when induction time is longer, and also that there is not overall proximodistal polarity in proximal regions of the wing, arguing against an unknown proximodistal polarity cue at these stages of development (Fig.S5B-E’’’).

      (2) The putative non-local cell scale signal must be more precisely defined (maybe also given a better name). It is not clear to me that one can separate cell-scale from molecular-scale signal.

      Local signals can redistribute within a cell (or membrane) so local signals are also cell-scale. Without a clear definition, it is difficult to interpret the results of the gene dosage experiments. The link between gene dosage and cell-scale signal is not rigorously stated. Related to this, the concluding statement of the introduction is too cryptic.

      We thank the reviewer for raising this, as again a similar comment was made by Reviewer 1, so we are clearly falling short in defining the term. We have now had another attempt in the Introduction.

      To more specifically answer the point made by the reviewer regarding molecular vs cellular, we are essentially being guided here by the prior computational modelling work, as at the biological level the details are still being worked out. A specific class of previous models only allowed ‘signals’ between core proteins to act ‘locally’, meaning within a cell junction, and within the models there was no explicit mechanism by which proteins on other junctions could ‘detect’ the polarity of a neighbouring junction (e.g. Amonlirdviman et al., 2005; Le Garrec et al., 2006; Fischer et al., 2013). Other models implicitly or explicitly encode a mechanism by which cell junctions can be influenced by the polarity of other junctions (e.g. Meinhardt, 2007; Burak and Shraiman, 2009; Abley et al., 2013; Shadkhoo and Mani, 2019), for instance by diffusion of a factor produced by localisation of particular planar polarity proteins.

      We agree with the reviewer that a cell-scale signal will depend on ‘molecules’ and thus could be called ‘molecular-scale’, but here by ‘molecular-scale’ we mean signals that act at the range of the sizes of molecules i.e. nanometers, rather than cell-scale signals that act at the size of cells i.e. micrometers. A caveat to our definition is that we implicitly include interactions that occur locally on cell junctions (<1 µm range) within ‘molecular-scale’, but this is a shorter range than ‘cellular-scale’ which requires signals acting over the diameter of a cell (3-5 µm). Nevertheless, we think the concept of ‘molecular-scale’ vs ‘cell-scale’ is a helpful one in this context, and have attempted to address the issue through a more careful definition of the terms.

      Changes to manuscript: Text revised in Introduction and legend to Fig.1 to more carefully define ‘cell-scale signalling’ and to distinguish it from ‘molecular-scale signalling’. Final sentence of Introduction also altered so we no longer cryptically speculate on the nature of the cell-scale signal but leave this to the Discussion.

      Minor comments. 

      Some of the (clever) genetic manipulation may need more details in the text. For example:

      - Need to specify if the hs-flp approach induces expression throughout the tissue.

      We apologise for the lack of clarity. In all the experiments, the hs-FLP transgene is present in all cells, and heat-shock results in ubiquitous expression. 

      Changes to manuscript: We have clarified this in the Results and Materials and Methods.

      - Need to specify in the text that in the unpolarised condition the tissue is both dsh and fz mutant.

      The reviewer is of course correct and we have updated this point in the text. The full genotype for the unpolarised condition is: w dsh[1] hsFLP22/y;; Act>>fz-mKate2sfGFP, fz[P21]/fz[P21] (see Table S1). So this line is mutant for dsh and fz with induced expression of Fz-mKate2sfGFP. 

      Changes to manuscript: We have clarified this in the relevant part of the Results.

      - Need to specify in the text that the experiment illustrated in Fig 5 is with hh-gal4. 

      As noted by the reviewer, we continued to use the same hh-GAL4 repolarisation paradigm as in Fig.4 and this information was in the legend to Fig.5. However, we agree it is helpful to be explicit about this in the main text.

      Changes to manuscript: We have added this to this section of the Results.

      - Need to address a possible shortcoming of the hh experiment, that the AP boundary is a region of high tension.

      It is true that the AP boundary is under high tension in the wing disc (e.g. Landsberg et al., 2009). But we are not aware of any evidence that this higher tension persists into the pupal wing. In separate studies we have labelled for Myosin II in pupal wings (Trinidad et al 2025 Curr Biol; Tan & Strutt 2025 Nature Comms), and as far as we have noticed have not seen preferentially higher levels on the AP boundary. We think if tension were higher, the cell boundaries would appear straighter than in surrounding cells (as seen in the wing disc) and this is not evident in our images.

      - Need to dispel the possibility that there is residual polarisation (e.g. of other components) in fz1 mutant (I assume this is the case).

      We use the null allele fz[P21] throughout this work, and we and others have consistently reported a complete loss of polarisation of other core proteins or downstream components in this background. The caveat to this is that core proteins that persist at cell junctions always appear at least slightly punctate in mutant backgrounds for other core proteins, and so any automated detection algorithm will always find evidence of individual cell polarity above a baseline level of uniform distribution. Hence we tend to use lack of local coordination of polarity (variance of cell polarity angle) as an additional measure of loss of polarisation, in addition to direct measures of average cell polarity. (We discuss this in the QuantifyPolarity manuscript Tan et al 2021 e.g. Fig.S6).
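
As an illustration of how a variance of cell polarity angle can report local coordination, the sketch below computes a circular variance for axial data. It is a generic textbook measure written under stated assumptions, not necessarily the calculation implemented in QuantifyPolarity; the function name and the choice to treat polarity angles as axial (an angle and the same angle plus 180 degrees describe the same axis) are assumptions made here.

```python
import cmath
import math


def axial_circular_variance(angles_deg):
    """Circular variance of axial angles, from 0 (aligned) to 1 (uncoordinated).

    Angles are doubled before averaging so that an angle and that angle
    plus 180 degrees are treated as the same polarity axis (an assumption
    of this sketch).
    """
    if not angles_deg:
        raise ValueError("at least one angle is required")
    # Mean of unit vectors at the doubled angles.
    resultant = sum(
        cmath.exp(2j * math.radians(a)) for a in angles_deg
    ) / len(angles_deg)
    return 1.0 - abs(resultant)
```

A group of cells sharing one polarity axis gives a value near 0, whereas cells whose axes point in unrelated directions give a value approaching 1, even if each individual cell scores as polarised.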

      Changes to manuscript: We now include in the Materials and Methods section ‘Fly genetics…’ a much more extensive explanation of the evidence for specific mutant alleles being ‘null’ for planar polarity function (including dsh1 as raised by Reviewer 1), specifically that they result in no detectable planar polarisation of either other core proteins or downstream effectors, and added appropriate references.

      - Need to provide evidence that 50% gene dosage commensurately affects protein level. 

      This is a good suggestion. In the case of Stbm, we have already published a western blot showing that a reduction in gene dosage results in reduced protein levels (Strutt et al 2016, Fig.S6). We have now performed western blots to quantify protein levels upon reduction of fmi, pk and dgo levels (we actually used EGFP-dgo for the latter, as we don’t have antibodies that can detect endogenous Dgo on western blots).

      Changes to manuscript: When presenting the dosage reduction experiments, we now refer back to Strutt et al., 2016 explicitly for Stbm, and have added western blot data for Fmi, Pk and EGFP-Dgo in new Fig.S2.

      - I am surprised that the relationship with microtubule polarity was never investigated. Is this true? 

      We agree this is a point that needed further clarification, as Reviewer 1 made a related point regarding the two possible roles for microtubules, one being as a mediator of a global cue upstream of the core pathway, and the second (which we investigate in this manuscript) as a mediator of a cell-scale signal downstream of the core pathway.

      Both the Uemura and Axelrod groups have published on potential upstream function as a global cue mediator in the Drosophila wing (e.g. Shimada et al., 2006; Harumoto et al., 2010; Matis et al., 2014).

      Both groups have also looked at whether core pathway components could affect orientation of microtubules (Harumoto et al., 2010; Olofsson et al., 2014; Sharp and Axelrod 2016). Notably Harumoto et al., 2010 observed that in 24h APF wings, loss of Fz or Stbm did not alter microtubule polarity from a proximodistal orientation consistent with the microtubules aligning along the long cell axis in the absence of other cues. However, this did not rule out an instructive effect of Fz or Stbm on microtubule polarity during core pathway cell-scale signalling. The Axelrod lab manuscripts saw interesting effects of Pk protein isoforms on microtubule polarity, albeit not throughout the entire wing, which hinted at a potential role in cell-scale signalling. Taken together this prior work was the motivation for our directed experiments to specifically test whether the core pathway might generate cell-scale polarity by instructing microtubule polarity.

      Changes to manuscript: We have revised the Results section ‘Microtubules do not…’ to make a clearer distinction regarding possible ‘upstream’ and ‘downstream’ roles of microtubules in Drosophila core pathway planar polarity and the motivation for our experiments investigating the latter.

      - The authors suggest that polarity does not propagate as a wave. And yet the range measured in adult is longer than in the pupal wing. Explain. 

      Again an excellent point, also made by Reviewer 1, which we have now addressed explicitly in the manuscript. For the convenience of this reviewer, we lay out the reasons why we think the propagation of polarity seen in the adult is further than seen for core protein localisation.

      There are three reasons why we might expect adult trichomes to show a different effect from the measured core protein polarity pattern seen in our experiments:

      (i) we are assaying core protein polarity at 28h APF, but trichomes emerge at >32h APF, so there is still time for polarity to propagate a bit further from the boundary. We now have added data showing that by the point of trichome initiation, the wave of polarisation extends 3-4 cell rows (Fig.S5A).  

      (ii) it has long been known that a strong localisation of core proteins at a cell edge is not required for polarisation of trichome polarity from a boundary. For instance, in Strutt & Strutt 2007 we show clones of cells overexpressing Fz causing propagation through pk[pk-sple] mutant tissue where there is no detectable core protein polarity. We were following up prior observations of Adler et al 2000 in the wing and Lawrence et al 2004 in the abdomen.

      (iii) there is evidence to suggest that the polarity of adult trichomes is locally coupled, possibly mechanically. This point is hard to prove without live imaging taking in both initial core protein localisation, the site of actin-rich trichome initiation and then the final orientation of the much larger microtubule filled trichome, and we’re not aware that such data exist. However, Wong & Adler 1993 (JCB) showed that over a number of hours trichomes become much larger and move towards the centre of the cell, presumably becoming decoupled from any core protein cue. The images in Guild … & Tilney, 2005 (MBoC)  are also interesting to look at in this regard. Finally, septate junction proteins have been implicated in local alignment of trichomes, independently of the core pathway (Venema … & Auld, 2004 Dev Biol).

      Changes to manuscript: Added new data in Fig.S5A showing where trichomes initiate under 6h de novo induction conditions, for comparison to core protein localisation and adult trichome data in Fig.5. Added some text explaining why adult trichome repolarisation might be stronger than the observed effects on core protein localisation in Discussion. 

      - The discussion states that the cell-intrinsic system remains to be fully characterised, implying that it has been partially characterised. What do we know about it? 

      As the reviewer probably realises, we were attempting to side-step a long speculative discussion about the various hints and ideas in the literature by grouping them under the umbrella of ‘remaining to be fully characterised’. We would argue that this current manuscript is the first to attempt to systematically investigate the nature of ‘cell-scale signalling’. The lack of prior work is probably due to two factors (i) pioneering theoretical work showed that a sufficiently strong global signal coupled with ‘local’ (i.e. confined to one cell junction) protein interactions was sufficient to polarise cells without the need to invoke the existence of a cell-scale signal; (ii) there is no easy way to identify cell-scale signals as their loss results in loss of polarity which will also occur if other (i.e. more locally acting) core pathway functions are compromised.

      The main investigation of the potential for cell-scale signalling has been another set of theory studies (Burak and Shraiman 2009; Abley et al., 2013; Shadkhoo and Mani 2019) which have considered the possibility of diffusible signals. In our present work we have further considered the possibility of a ‘depletion’ model, based on the pioneering theory work of Hans Meinhardt, and as discussed above the possibility that microtubules could mediate a cell-scale signal.

      Changes to manuscript: We have revised the Discussion to hopefully be clearer about the current state of knowledge.

      Reviewer #3:

      […] Major comments

      The data are clearly presented and the manuscript is well written. The conclusions are well supported by the data. 

      (1) The authors use a system to de novo establish PCP, which has the advantage of excluding global cues orienting PCP and thus to focus on the cell-intrinsic mechanisms. At the same time, the system has the limitation that it is unclear to what extent de novo PCP establishment reflects 'normal' cell scale PCP establishment, in particular because the Gal4/UAS expression system that is used to induce Fz expression will likely result in much higher Fz levels compared with the endogenous levels. The authors should briefly discuss this limitation. 

      We apologise if this wasn’t clear. We only used GAL4/UAS overexpression when we were generating an artificial boundary of Fz expression with hh-GAL4 to induce repolarisation. The de novo induction system involves Fz::mKate2-sfGFP being expressed directly under an Act5C promoter without use of GAL4/UAS. In response to a comment from Reviewer 1 we have now carried out western blot analysis which shows that Fz::mKate2-sfGFP levels under Act5C are actually lower than endogenous Fz levels. As we achieve normal levels of polarity, similar to what we measure in wild-type conditions using QuantifyPolarity, we therefore assume that Fz levels are not limiting under these conditions. However, we note that lower than normal levels of Fz might sensitise the system to perturbation, which in fact would be advantageous in our study, as it might for instance have been expected to more readily reveal dosage sensitivity of other components.

      Changes to manuscript: We now describe the levels of expression achieved using the de novo induction system (Fig.S1C-D) and discuss possible consequences in the relevant Results sections and Discussion.

      (2) Fig. 3. The authors use heterozygous mutant backgrounds to test the robustness of de novo PCP establishment towards (partial) depletion in core PCP proteins. The authors conclude that de novo polarization is 'extremely robust to variation in protein level'. Since the authors (presumably) lowered protein levels by 50%, this conclusion appears to be somewhat overstated. The authors should tune down their conclusion. 

      Reviewer 1 makes a similar point about whether we can argue that the lack of sensitivity to a 50% reduction in protein levels actually rules out the depletion model. To address the comments of both reviewers we have now added some further narrative and caveats in the text.

      We nevertheless believe that the experiments shown effectively make the point that there is no strong dosage sensitivity – and it remains our contention that if protein levels were the key to setting up cell-scale polarity, then a 50% reduction would be expected to show an effect on the rate of polarisation. We further note that as Fz::mKate2-sfGFP levels are lower than endogenous Fz levels, the system might be expected to be sensitised to further dosage reductions, and despite this we fail to see an effect on rate of polarisation.

      In a similar vein, Reviewer 2 requested data on whether dosage reduction altered protein levels by the expected amount. We have now added further explanation/references and western blot data to address this.

      Changes to manuscript: Added some narrative and caveats regarding whether lowering levels more than 50% would add to our findings in the Discussion. Revised conclusions to be more cautious including altering section title to read ‘Planar polarity establishment is not highly sensitive to variation in protein levels of core complex components’.

      Also added westerns and text/references showing that for the tested proteins there is a reduction in protein levels upon removal of one gene dosage in Results section ‘Planar polarity establishment is…’ and Fig.S2.

      Minor comments :

      (1) Page 3. The authors mention and reference that they used the PCA method to quantify cell polarity magnification and magnitude. It would help the unfamiliar reader, if the authors would briefly describe the principle of this method. 

      Changes to manuscript: More details have been added in Materials & Methods.

      Significance:

      The manuscript contributes to our understanding of how planar cell polarity is established. It extends previous work by the authors (Strutt and Strutt, 2002,2007) that already showed that induction of core PCP pathway activity by itself is sufficient to induce de novo PCP. This manuscript further explores the underlying mechanisms. The authors test whether de novo PCP establishment depends on an 'inhibitory signal', as previously postulated (Meinhardt, 2007), but do not find evidence. They also test whether core PCP proteins help to orient microtubules (which could enhance cell intrinsic polarization of core PCP proteins), but, again, do not find evidence, corroborating previous work (Harumoto et al, 2010). The most significant finding of this manuscript, perhaps, is the observation that local de novo PCP establishment does not propagate far through the tissue. A limitation of the study is that the mechanisms establishing intrinsic cell scale polarity remain unknown. The work will likely be of interest to specialists in the field of PCP.

    1. Reviewer #2 (Public Review):

      Summary:

      Li and colleagues applied virtual reality (VR) based training to create different navigational experiences for a set of visually similar scenes. They found that participants were better at visually discriminating scenes with different navigational experiences compared to scenes with similar navigational experiences. Moreover, this experience-based effect was also reflected in the fMRI data, with the PPA showing higher discriminability for scenes with different navigational experiences. Together, their results suggest that previous navigational experiences shape visual scene representation.

      Strengths:

      (1) The work has theoretical value as it provides novel evidence to the ongoing debate between visual and non-visual contributions to scene representation. While the idea that visual scene representation can encode navigational affordances is not new (e.g., Bonner & Epstein, 2017, PNAS), this study is one of the first to demonstrate that navigational experiences can causally shape visual scene representation. Thus, it serves as a strong test for the hypothesis that our visual scene representations involve encoding top-down navigational information.

      (2) The training paradigm with VR is novel and has the potential to be used by the broader community to explore the impact of experience on other categorical visual representations.

      (3) The converging evidence from behavioral and fMRI experiments consolidates the work's conclusion.

      Weaknesses:

      (1) While this work attempts to demonstrate the effect of navigational experience on visual scene representation, it's not immediately clear to what extent such an effect necessarily reflects altered visual representations. Given that scenes in the navigable condition were more explored and had distinct contextual associations than scenes in the non-navigable condition (where participants simply turned around), could the shorter response time for a scene pair with mismatched navigability be explained by the facilitation of different contextual associations or scene familiarities, rather than changes in perceptual representations? Especially when the visual similarity of the scenes was high and different visual cues might not have been immediately available to participants, the different contextual associations and/or familiarity could serve as indirect cues to facilitate participants' judgment, even if perceptual representations remained intact.

      (2) Similarly, the above-chance fMRI classification results in the PPA could also be explained by the different contextual associations and/or scene familiarities between navigable and non-navigable scenes, rather than different perceptual processes related to scene identification.

      (3) For the fMRI results, the specificity of the experience effect on the PPA is not strictly established, making the statement "such top-down effect was unique to the PPA" groundless. A significant interaction between navigational conditions and ROIs would be required to make such a claim.

      (4) For the behavioral results, the p-value of the interaction between groups and the navigational conditions was 0.05. I think this is not a convincing p-value to rule out visual confounding for the training group. Moreover, from Figure 2B, there appears to be an outlier participant in the control group who deviates dramatically from the rest of the participants. If this outlier is excluded, will the interaction become even less significant?

      (5) Experiment 1 only consists of 25 participants in each group. This is quite a small sample size for behavioral studies when there's no replication. It would be more convincing if an independent pre-registered replication study with a larger sample size could be conducted.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers


      __Reviewer #1 __


      Major comments


      1. The manuscript posits that the loss of function of MASh components (Ogc1 and Aralar) decreases adrenergic-stimulated lipolysis by altering the cytosolic NAD⁺/NADH ratio, with AMPK/ACC mentioned as possible mediators. However, this remains speculative. Please provide mechanistic data directly linking MASh-dependent NAD⁺/NADH changes to the regulation of lipolysis in brown adipocytes during adrenergic stimulation.

      Answer 1) The reviewer raises an important point regarding the direct assessment of cytosolic NAD⁺/NADH redox changes as a mechanistic link for altered lipolysis in brown adipocytes lacking MASh components. To address this point, we added new data to the revised manuscript showing lactate/pyruvate ratio as measured by metabolomics. This is a well-established surrogate marker to monitor changes in redox balance. Notably, under basal (non-stimulated) conditions, the lactate/pyruvate ratio did not display any significant differences between Aralar 1 KD and control cells, suggesting preservation of cytosolic NAD⁺/NADH levels in the absence of functional MASh under these conditions. This finding is consistent with reports showing the robustness of NAD⁺ regeneration via multiple shuttles and the possibility of metabolic compensation when one shuttle is compromised (PMID: 40540398; PMID: 37647199).

      The results have been added as new supplementary Figure 1 as follows:

      Our new metabolomics data also revealed substantial reductions in the aspartate/glutamate ratio in Aralar 1 knockdown cells, serving as a metabolomic signature of impaired MASh function and reduced exchange of these amino acids between the cytosol and mitochondria. Given that the MASh is a major mechanism for exporting cytosolic reducing equivalents into the mitochondria under high metabolic demand, its loss would be expected to impact redox homeostasis, particularly under adrenergic stimulation when glycolytic flux and lipolytic activity are elevated (PMID: 40540398).

      Importantly, although our redox surrogate marker did not detect alterations, this may be explained by activation of compensatory pathways, most notably the glycerol phosphate shuttle (GPSh), which is highly expressed and active in brown adipocytes. Indirect support for this compensation comes from the data in Figure 4I, which show reduced glycerol release in Aralar 1 KD cells upon norepinephrine stimulation and blocked lipolysis. This suggests a redirection of glycolytically derived G3P away from release and toward enhanced cycling within the GPSh, supporting cytosolic NAD⁺ regeneration via mitochondrial FAD-dependent G3PDH and cytosolic NAD⁺-dependent G3PDH activity. This is consistent with studies documenting that the combined action of MASh and GPSh maintains NAD redox homeostasis in brown adipocytes especially during non-thermogenic conditions (PMID: 168075; PMID: 40540398; PMID: 37647199). We have included a discussion about this possibility on page 9, third paragraph, as follows:

      *“Previous studies have shown that BAT exhibits high activity of mitochondrial FAD-dependent glycerol-3-phosphate dehydrogenase (mG3PDH), which functions as an electron sink to sustain low cytosolic NADH levels essential for continuous glycolytic flux [11]. Accordingly, suppression of the MASh, either genetically or pharmacologically, is likely to induce a compensatory upregulation of the GPSh. This adaptation would enhance G3P turnover, contributing to the maintenance of cytosolic NAD redox balance. Moreover, the increased flux through the GPSh could favor fatty acid esterification and triglyceride synthesis or re-esterification, consistent with our findings in Ogc and/or Aralar 1 KD cells, where (i) triglyceride content rises (Fig. 3), (ii) overall respiratory rates remain largely unaltered (Figs. 2D–G), and (iii) glycerol release declines significantly (Fig. 4I). Notably, the decrease in glycerol release persists even when lipolysis is blocked by ATGlistatin, suggesting that the available G3P pool is rerouted from dephosphorylation and extracellular release toward oxidation to DHAP by mG3PDH to regenerate cytosolic NAD+ under MASh-deficient conditions. We propose that interference with the MASh does not directly impact lipolysis but instead alters the cellular balance between DHAP and G3P owing to enhanced activity of the GPSh. This metabolic shift would favor the esterification of G3P with free fatty acids, thereby promoting triglyceride synthesis. These results support the notion that, even during adrenergic stimulation—when long-chain unsaturated fatty acids and their CoA esters strongly inhibit mG3PDH activity [11]—the residual flux through the glycerophosphate shuttle remains critical for sustaining cytosolic NAD redox equilibrium [11,19,32].” *


      At the mechanistic level, adrenergic stimulation in brown adipocytes activates robust lipolysis and thermogenic gene programs, generating high NADH that must be efficiently reoxidized to sustain flux through glycolysis and lipolysis-linked pathways. Our findings are consistent with a model in which the loss of MASh does not prevent cytosolic NAD⁺ regeneration or lipolytic flux during acute adrenergic stimulation, due to compensatory upregulation of the GPSh, as suggested by the glycerol release changes. Thus, while MASh normally acts as a conduit for NADH export and aspartate/glutamate exchange, in its absence, the GPSh maintains cytosolic redox balance, thereby sustaining glycolytic and lipolytic capacity.

      We agree that future studies should employ direct measurements of cytosolic NAD⁺/NADH ratios (e.g., genetically-encoded redox sensors) during adrenergic stimulation and specific pharmacological inhibition of both shuttles to dissect these relationships in greater detail. We sincerely appreciate the reviewer's input, which has prompted us to clarify the indirect but robust evidence supporting a role for compensatory redox shuttle activity in preserving brown adipocyte lipolysis in the setting of MASh impairment.

      We have further added a new paragraph in the discussion section (page 10):

      *“Mechanistically, the connection between the MASh and lipolysis appears to involve regulation of the cytosolic NAD⁺/NADH redox balance. MASh activity facilitates the regeneration of NAD⁺ from NADH in the cytosol, primarily through the reduction of oxaloacetate to malate by cytosolic malate dehydrogenase (Fig. 1G-H). Despite the theoretical expectation that reductions in MASh activity would disturb redox homeostasis, our metabolomic data show that the lactate/pyruvate ratio remains unchanged under conditions of MASh impairment, indicating that the overall cytosolic NAD⁺/NADH ratio is maintained (Figure S1A-C). While direct measurements of cytosolic NAD⁺/NADH were not performed, the preserved lactate/pyruvate ratio in Aralar 1 KD cells under basal conditions strongly suggests redox stability, likely due to compensatory activity by alternative mitochondrial shuttles or metabolic adaptations that maintain NAD redox homeostasis despite MASh impairment [18,33]. *

      Previous evidence indicates that BAT exhibits high activity of mitochondrial FAD-dependent glycerol-3-phosphate dehydrogenase (G3PDH), which acts as an electron sink to sustain low cytosolic NADH levels critical for glycolysis [34]. In this sense, it is conceivable that genetic or pharmacological suppression of MASh triggers compensatory enhancement of the G3P shuttle, increasing G3P availability and facilitating the maintenance of cytosolic NAD redox balance. This adaptation could also promote fatty acid esterification and triglyceride synthesis or re-esterification, aligning with our observations that in Ogc and/or Aralar 1 KD cells: (i) triglyceride levels increase (Fig. 3); (ii) overall respiratory rates are preserved (Figs. 2D–G); and (iii) glycerol release is significantly reduced (Fig 4I).”


      __ The absence of in vivo analysis of lipid-droplet size in MASh loss-of-function models is a major concern. In vitro results could be confounded by differences in differentiation stage between groups. Please document equivalent adipogenesis across groups (e.g., Pparg/Cebpa/Plin1/Fabp4 expression).__

      Answer 2) We thank the reviewer for the thoughtful and constructive comment regarding potential confounding by differences in differentiation stage, and for highlighting the importance of documenting equivalence between experimental groups. We appreciate the opportunity to clarify and provide additional assurance on this point.

      As detailed in our manuscript, we have performed qPCR analysis of multiple well-established markers of brown adipocyte differentiation, including Ucp1, Elovl3, Prdm16, Pparg, Cebpa, Plin1, and Fabp4, in scramble, Aralar1 KD, and Ogc KD cells (see Fig. S1A and accompanying text). Our results show no apparent effect of these genetic interventions on overall differentiation, as the expression levels of these key markers were consistently unaltered across groups. Furthermore, adenoviral-mediated knockdown of Ogc achieved an approximate 80% reduction in Ogc mRNA (see Fig. S1B), yet most differentiation markers remained unaffected. We did observe significant increases in Atgl, Pgc1α, and Tfam mRNA levels, which may indicate a degree of pathway reprogramming without affecting the general differentiation profile. We propose that interference with the MASh does not directly impact lipolysis but instead alters the cellular balance between DHAP and G3P owing to enhanced activity of the GPSh. This metabolic shift would favor the esterification of G3P with free fatty acids, thereby promoting triglyceride synthesis.

      Additional experimental support for equivalent differentiation can be drawn from our respirometry data presented in Figures 2E and 2G. These figures demonstrate that respiratory rates upon norepinephrine stimulation, which is a sensitive indicator of brown adipocyte thermogenic capacity, were essentially identical in scramble, aralar1 KD, and Ogc KD cells. Since norepinephrine-stimulated respiration requires both functional mitochondria and the full differentiation of brown adipocytes, these results strongly support the conclusion that silencing either MASh component does not impair the fundamental ability of cells to undergo brown adipocyte differentiation or achieve functional thermogenic competence.

      This is consistent with published findings showing that norepinephrine triggers robust respiration and thermogenic activation only in fully differentiated and functional brown adipocytes, making such measurements a widely accepted proxy for differentiation status and mitochondrial integrity. Thus, the equivalent respiratory responses observed in all groups further validate that differentiation was not compromised by the genetic interventions.

      We hope this clarifies that equivalent adipogenesis was carefully documented and that any observed phenotypes are unlikely to be attributable to differences in differentiation stages. Thank you again for your rigorous assessment and for helping to ensure the robustness of our study.

      __ Please include rescue experiments (add-back OGC1 and Aralar) to rule out siRNA/shRNA off-target effects and verify that the phenotype stems from MASh loss of function.__

      Answer 3) We thank the reviewer for this important suggestion regarding the inclusion of rescue experiments with add-back of Ogc and Aralar to definitively exclude off-target effects of the siRNA/shRNA-mediated knockdowns.

      We would like to kindly point out that although we did not perform add-back rescue experiments directly, the consistency of phenotypes observed across two independent genetic interventions—aralar 1 KD and Ogc KD—strongly argues against off-target effects being responsible for the observed metabolic and functional alterations. Specifically, both knockdowns yielded remarkably similar phenotypes in multiple assays, including respirometry analyses, mitochondrial morphology, lipid droplet homeostasis, and lipid metabolism, supporting the conclusion that these effects stem from MASh loss of function rather than nonspecific silencing.

      Furthermore, our new supplementary data (new Supplementary Figure 1A-F) reveal a significant reduction in the aspartate/glutamate ratio in Aralar 1 KD cells, a compelling functional readout for MASh impairment. This molecular evidence corroborates that our genetic interventions effectively disrupted MASh activity as intended.

      We sincerely appreciate the reviewer’s thorough evaluation and understand the importance of rescue experiments. While recognizing their value, we believe the convergent genetic, metabolic, and functional evidence presented across two different MASh components provides strong and consistent support that the phenotypes observed are due to specific loss of MASh function.


      __ Please expand on physiological significance: What is the importance of MASh regulation of BAT lipolysis in long-term adaptive thermogenesis?__

      Answer 4) This is a very interesting aspect, and we have included a new paragraph in the discussion section (page 14) to address it as follows:

      “Our results, supported by recent literature, strongly indicate that the malate–aspartate shuttle (MASh) plays a key role in facilitating fatty acid–dependent thermogenesis in brown adipocytes. Specifically, BAT-targeted overexpression of GOT1 has been shown to enhance β-oxidation and support acute cold-induced thermogenesis (PMID: 40540398). Interestingly, genetic ablation of GOT1—and thus MASh inhibition—preserves cold-induced thermogenesis by promoting a metabolic shift from fatty acid to glucose oxidation. Our findings corroborate and extend these observations by demonstrating that MASh impairment sustains overall respiratory activity in norepinephrine-stimulated brown adipocytes (Figures 2D–2G), while concurrently impairing lipolysis and resulting in an accumulation of small lipid droplets (Figures 3 and 4). Collectively, these data suggest that MASh not only modulates substrate preference towards fatty acid oxidation but also facilitates lipolysis, an essential upstream step that enables lipid oxidation and supports thermogenic heat production.”

      Minor comments

      1. __ Fig. 4 legend/title contains a typo ("lypolysis" → lipolysis).__

      Answer 1) Corrected

      __ In Fig. 2 legend line: "Adevirus-mediated" → Adenovirus-mediated; "OCAR" → OCR.__

      Answer 2) Corrected

      __ For lipolysis imaging, you already show Forskolin/Atglistatin/Etomoxir controls; add a vehicle-only time course overlay in the main figure (currently in text/legend) to aid visual comparison.__

      Answer 3) We thank the reviewer for pointing this out. To improve clarity, we have updated the labeling in Figures 3 and 4: “basal” now clearly refers to the unstimulated/untreated condition, and the previously labeled “UT” condition has been clarified as “untransduced.” These changes make the figure legends and data presentation more consistent and easier to interpret.

      __ Ensure consistent gene symbols (Atgl/Pnpla2), and protein capitalization.__

      Answer 4) Corrected.

      __Reviewer #2 __

      Major points:

      1. __ In the current manuscript, mitochondrial morphology (area, aspect ratio, and roundness) was analyzed in OGC1 KD cells using TMRE, whereas MitoTracker Deep Red (MTDR) was used in Aralar1 KD cells. Notably, TMRE is a ΔΨm-dependent probe. The signal intensity can change, or the distribution may reflect alterations in membrane potential rather than true morphological changes. Therefore, the observed differences in OGC1 KD cells based on TMRE staining may be confounded by the dye's functional dependence, potentially biasing the conclusions. It is recommended to evaluate mitochondrial morphology with consistent trackers across conditions. In addition, in the subsequent OCR analysis, mitochondrial area was used for normalization. Please clarify which staining method was employed, and provide justification for its suitability.__

      Answer 1) We thank the reviewer for this insightful comment. Indeed, TMRE is a membrane potential-sensitive dye and could therefore potentially affect measurements of mitochondria.

      We would like to point out that mitochondrial morphology was quantified based on mitochondrial area rather than fluorescence intensity. To create an accurate binary map of mitochondria, we used a low threshold, which allowed us to include even weakly stained mitochondria and thereby detect them independently of their membrane potential. In all imaged cells, TMRE signal was sufficient to reliably identify mitochondrial pixels. Moreover, these images were acquired using a confocal microscope, where the risk of pixel expansion due to higher fluorescence intensity is minimized. Lastly, given that overall mitochondrial oxygen consumption in these cells remains largely intact, we do not expect a substantial loss of membrane potential, although minor effects cannot be entirely excluded.

      We opted to use TMRE for imaging Ogc KD cells because the scramble control for these shRNA viruses carries an mKate fluorescent tag, which overlaps with the MTDR signal. Since accurate assessment of transduction efficiency relied on detecting mKate, MTDR could not be used in these experiments. Importantly, we only compare mitochondrial morphology within the same staining condition and do not draw conclusions across cells stained with different dyes.

      To ensure transparency, we have added a new section to the Discussion (page 17, 2nd paragraph) highlighting the potential influence of ΔΨm-dependent dyes on morphological measurements as follows:

      “It is also important to note that mitochondrial morphology was quantified using MTDR in Aralar 1 KD cells and TMRE in Ogc KD cells due to experimental constraints (see Methods). TMRE is a membrane potential–dependent dye, which could potentially influence morphology measurements. To minimize this risk, we used confocal microscopy, which reduces the likelihood of pixel expansion due to higher fluorescence intensity, and set thresholds to detect even weakly stained mitochondria. Nonetheless, we cannot fully exclude the possibility that the differences in morphology observed between Aralar 1 and Ogc KD are influenced by the use of different dyes; however, statistical comparisons were never performed across samples stained with different dyes.”

      Also, we have expanded the Methods section (page 22, 2nd paragraph) to include a rationale for using these dyes and describe the analysis protocol as follows:

      “TMRE was used for Ogc KD cells because the scramble control for the shRNA viruses carries an mKate fluorescent tag, which overlaps with MTDR fluorescence, preventing its use. MTDR was used for Aralar KD cells. Image analysis was performed in FIJI (ImageJ, NIH). For the quantification of mitochondrial morphology and area, images stained with TMRE or MTDR were analyzed. Thresholds were adjusted to ensure that even weakly stained mitochondria were detected and included in the analysis. Only the mitochondrial area was evaluated, independent of fluorescence intensity.”
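
      As an illustration only, the idea that a low threshold makes the area measurement independent of staining brightness can be sketched in Python. This is not the FIJI workflow used in the study; the function name, the threshold value, and the pixel-area parameter are hypothetical.

```python
import numpy as np

def mitochondrial_area(image, threshold, pixel_area_um2=1.0):
    """Area covered by pixels above a low intensity threshold.

    Hypothetical helper: every pixel above `threshold` counts once,
    so how bright a pixel is above the threshold does not change the result.
    """
    mask = np.asarray(image, dtype=float) > threshold  # binary mitochondrial map
    return float(mask.sum()) * pixel_area_um2

# Two toy images with the same shape but different staining intensity
dim = [[0, 20], [20, 0]]
bright = [[0, 200], [200, 0]]
area_dim = mitochondrial_area(dim, threshold=10)
area_bright = mitochondrial_area(bright, threshold=10)
```

      In this toy case both images give the same area, because only the binary mask enters the calculation. A real analysis would still need to check that the weakest mitochondria sit above the chosen threshold.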

      Minor points:

      1. __ In the introduction, the authors state that "LDH activity increases in the context of BAT activation". This point is important for the logic of the manuscript, reference [10] cited here is not sufficient to support this claim. It is recommended to provide appropriate references to support this statement.__

      Answer 1) We have substantially changed this paragraph in the revised manuscript to better explain why LDH would not act as a major player in contributing to NAD redox balance in the context of BAT thermogenesis, as follows:

      “In mammalian cells, cytosolic NAD⁺ is regenerated through lactate dehydrogenase (LDH), the glycerol-3-phosphate shuttle (GPSh), or the malate-aspartate shuttle (MASh). In BAT, however, lactate production rises only slightly with adrenergic activation and most lactate is oxidized via the TCA cycle, suggesting that LDH primarily consumes NAD⁺ rather than regenerating it [PMID: 30456392; PMID: 37337122; PMID: 37802078; PMID: 40982723]. Consequently, mitochondrial redox shuttles become critical for sustaining cytosolic NAD⁺ supply”.

      We have also provided additional references to support this new section in the Introduction.

      __ In Fig. 1A and B-D, there are inconsistencies and duplications in the abbreviation labels. Please check and revise accordingly. __

      Answer 2) We thank the reviewer for this comment. We would like to clarify that Figure 1A is a schematic overview of the system, while Figures 1B–D show protein expression in specific contexts: whole BAT (B), whole liver (C), and BAT mitochondria (D). In Figures 1B and 1C, all components are shown because both cytosolic (MDH1 and GOT1) and mitochondrial proteins (MDH2, GOT2, Aralar 1 and 2 and OGC) are present. In contrast, Figure 1D shows only mitochondrial components (OGC, Aralar1, MDH2, and GOT2). Although Aralar2 is a mitochondrial protein, it was not detected in this study (Forner et al., 2009). Similarly, cytosolic components such as MDH1 and GOT1 are not shown in Figure 1D because they are absent in the mitochondrial fraction. We have revised the figure legend to make these distinctions clearer.

      __ In Fig. S1, the number of n indicated does not match the number of data points shown. Please clarify whether these represent technical replicates or biological replicates, and provide a detailed description of the statistical methods used throughout the manuscript.__

      Answer 3) We thank the reviewer for catching this and allowing us to correct our mistakes. In the revised version, we have corrected the figure legend of Supplementary Figure 1 so that the number of n matches the data points shown.

      __ Please provide details on the normalization strategy used in the BODIPY-C12/BODIPY-493 staining analysis, such as whether fluorescence intensity was quantified as mean or integrated values, and whether the analysis was normalized to lipid droplet area, cell number, or baseline. Since lipolytic stimulation can reduce droplet size and increase droplet number, these factors may bias the results. __

      Answer 4) We thank the reviewer for this important comment and apologize for the lack of detail regarding this analysis. The analysis of BODIPY-C12 and BODIPY-493 was performed by quantifying the mean fluorescence intensity of BODIPY-C12 detected within a mask generated from the BODIPY-493 signal. This approach allowed us to define all lipid droplets and measure the release of previously esterified C12. To account for variability across samples, the data were normalized to each sample’s individual baseline at time point 0 and expressed as fold change relative to this baseline. In the revised manuscript we have included this description in the Methods section (page 18, last paragraph) for clarity and reproducibility, as follows:

      “Lipid droplet area was defined based on BODIPY 493/503 signal, which was used to generate a mask identifying all lipid droplets. Within this mask, the mean fluorescence intensity of BODIPY C12 was quantified over time to monitor the release of previously esterified C12. To account for variability between samples, data were normalized to each sample’s individual baseline at time point 0 and expressed as fold change relative to this baseline.”
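
      The normalization described in this passage can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the array layout (time, y, x), the fixed `mask_threshold`, and the choice to rebuild the mask for every frame are all hypothetical.

```python
import numpy as np

def c12_fold_change(c12_stack, bodipy493_stack, mask_threshold):
    """Mean BODIPY C12 intensity inside the lipid-droplet mask per frame,
    expressed as fold change relative to frame 0.

    c12_stack, bodipy493_stack: arrays of shape (time, y, x).
    mask_threshold: hypothetical cutoff on the BODIPY 493/503 channel.
    """
    c12 = np.asarray(c12_stack, dtype=float)
    ld = np.asarray(bodipy493_stack, dtype=float)
    means = np.full(c12.shape[0], np.nan)
    for t in range(c12.shape[0]):
        mask = ld[t] > mask_threshold       # lipid-droplet mask for this frame
        if mask.any():
            means[t] = c12[t][mask].mean()  # mean C12 signal within the mask
    return means / means[0]                 # normalize to the sample's own baseline

# Toy example: two frames of 2x2 pixels
ld_stack = [[[10, 0], [10, 0]], [[10, 0], [0, 0]]]
c12_stack = [[[4, 100], [6, 100]], [[2, 100], [50, 100]]]
fold = c12_fold_change(c12_stack, ld_stack, mask_threshold=5)
```

      Because a mean intensity within the mask is used rather than an integrated value, a change in droplet number or total droplet area does not by itself change the readout, which is the concern raised in the reviewer's comment.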

      __ The manuscript notes that the unexpected result in Fig. 3K-M in parallel with increased Atgl mRNA expression might be because it does not reflect protein levels or enzymatic activity. To strengthen this point, it is recommended to include data on ATGL and phosphorylation ATGL. __

      Answer 5) We thank the reviewer for this constructive comment. We have clarified these aspects in the revised Results and Discussion sections to reflect this interpretation more accurately as follows:

      “Notably, Atgl mRNA measurement in our study was primarily used as a marker of brown adipocyte differentiation, rather than as a direct indicator of ATGL protein abundance or enzymatic activity. We detected increased Atgl expression only in Ogc KD cells (Fig. S1H), but not in Aralar 1 KD cells (Fig. S1G). This likely does not reflect a major difference in differentiation status, as other brown adipocyte markers and norepinephrine-stimulated respiration were comparable between scramble and knockdown cells (Fig. 2D-G and 2N-O and S1G-H). Although lipolysis was not evaluated in Ogc KD cells, in Aralar 1 KD cells basal lipolysis remained unchanged (Fig. 4D-E and 4G-I), whereas norepinephrine-stimulated lipolysis was delayed or partially inhibited. Notably, the enhanced fatty acid esterification observed in Ogc KD cells despite elevated Atgl expression is not contradictory, since in brown adipocytes lipolysis and re-esterification occur concurrently to sustain high lipid turnover [34].”

      __ Red-on-black is not a great color code for IMFs, how about black-and-white? __

      Answer 6) We have changed the text color to white in Figures 2H and 2K, as suggested.

      __Reviewer #3 __

      Major points:

      1. __ Although in the manuscript Veliova and coworkers demonstrated that MAS is functional in brown adipocytes showing kinetic parameters equivalent to that previously described in other tissues, surprisingly, when its components are downregulated, no effect, or very little, on mitochondrial respiration is found (figure 2). This is an intriguing result since MAS disruption has been widely reported to impair respiration in different cell types and tissues. However, since no direct evidence of MAS dysfunction is provided, it is possible that MAS may still remain partially or fully functional under the conditions used by the authors, and therefore this point needs to be clarified to validate these results.__

      Answer 1) We thank the reviewer for the insightful comment and the opportunity to clarify these important points regarding MASh dysfunction validation in our study. We acknowledge the reviewer’s observation that mitochondrial respiration was largely unaffected by MASh component knockdown, which is indeed intriguing. Importantly, as already indicated in our responses to Reviewer 1, we have provided new data showing direct molecular evidence of MASh impairment through substantial reductions in the aspartate/glutamate ratio in Aralar 1 KD cells (new Supplementary Figure S1F). This ratio is a well-established functional readout reflecting MASh activity and amino acid exchange between cytosol and mitochondria, as demonstrated in original experimental studies of MASh function in multiple tissues including brown adipocytes (PMID: 4436323). The reduction in the aspartate/glutamate ratio directly confirms loss of MASh functionality even though respiratory rates remained unchanged, likely due to metabolic compensation by robust glycerol phosphate shuttle (GPSh) activity, as further supported by our data showing reduced glycerol release upon norepinephrine stimulation in Aralar 1 KD cells (Figure 4I). This metabolic rerouting maintains cytosolic NAD⁺ regeneration and partially preserves respiration and energy metabolism under these experimental conditions (PMID: 168075; PMID: 40540398; PMID: 37647199). Thus, the combination of metabolomic, respirometry, and functional lipid data strongly indicates that MASh activity was disrupted specifically and effectively by our genetic interventions. This molecular evidence was already signposted in our original manuscript and responses, underscoring that MASh loss of function, and not residual or compensatory MASh activity, is responsible for the phenotypes reported. We greatly appreciate the reviewer’s insightful attention to this critical mechanistic issue and hope this provides clear reassurance that MASh impairment was indeed achieved and functionally validated within our study framework.

      Furthermore, strategies used to downregulate MAS components produce only a partial reduction in mRNA levels, about 70%, but its outcome on protein levels has not been determined, and the remaining protein level could be sufficient to maintain shuttle activity. Therefore, the effect of silencing at the protein level should be analyzed, because as the authors also point out on page 16: "mRNA levels may not reflect actual protein levels or activity".

      Answer 2) We thank the reviewer for this important point. Our knockdowns resulted in ~70–80% reduction in mRNA levels. While not complete, this represents a substantial decrease and is sufficient to produce strong functional effects. At the time the experiments were performed, we did not have access to suitable antibodies, and the available antibodies did not provide reliable signals in our samples, which is why we used qPCR to estimate knockdown efficiency. Importantly, we observed clear phenotypic changes in both knockdowns (Aralar and OGC), and both showed very similar phenotypes. This suggests that the level of knockdown was sufficient to significantly impair MAS activity. In the revised version we added new data that further validate the functional impact of Aralar KD (given that this protein has an alternative isoform, as pointed out by the reviewer). We performed metabolomics experiments measuring aspartate and glutamate levels. Our new data show that the aspartate-to-glutamate ratio is significantly reduced in Aralar KD cells. This ratio serves as a proxy for glutamate catabolism, and the observed decrease suggests reduced glutamate catabolism, likely due to impaired MAS activity. Therefore, the reduced whole-cell aspartate/glutamate ratio serves as a metabolic signature of MAS impairment, consistent with Aralar KD. These data indicate that Aralar is sufficiently downregulated to produce a functional effect, supporting our conclusion that MAS activity is impaired. The results have been added as new supplementary Figure 1 as follows:

      __ In the case of aspartate/glutamate carriers (AGCs) the role of citrin/slc25a13, the second AGC paralog, should also be analyzed. This AGC isoform is discarded based on proteomic data from brown adipose tissue, but, as shown in figure 1B, its levels are similar to those of Aralar/slc25a12, the only AGC silenced. Besides, primary brown adipocytes differentiated for 7 days are used here, and it is possible that factors such as culture conditions or differentiation itself could alter AGC levels. Therefore, it is necessary to determine the protein levels of citrin/AGC2, and, if necessary, downregulate it together with the Aralar/AGC1 isoform. Citrin/AGC2 activity may be responsible for the observed difference between the OGC and Aralar/AGC1 KD adipocytes.__

      Answer 3) We thank the reviewer for this important point. We chose Aralar1 because it is the isoform predominantly expressed in brown adipose tissue (PMID: 23436904). We acknowledge, however, that compensatory increases in Citrin/AGC2 upon Aralar1 knockdown are possible. To address this, we have included new metabolomics data in the revised manuscript (added as Supplementary Figure 1), which provide additional support that downregulation of Aralar1, even if not complete, is sufficient to cause a metabolic change reflected by a reduced aspartate/glutamate ratio in these cells. This functional change supports that the knockdown of Aralar1 alone is sufficient to study its role in brown adipocytes, although minor compensation by Citrin/AGC2 cannot be entirely excluded.

      To address this explicitly, we have added a paragraph to the discussion (page 13, 2nd paragraph) highlighting the potential for partial compensation by Citrin/AGC2 and explaining why the observed metabolic effects are still attributable to Aralar 1 knockdown, as follows:

      “Phenotypes observed in Aralar 1 KD cells closely resemble those in Ogc KD cells, particularly in terms of lipid metabolism alterations and energy expenditure. The main difference lies in mitochondrial morphology, which is altered in Ogc KD cells but remains unchanged in Aralar 1-silenced cells (Fig. 2J,M). Unlike Ogc, which lacks an alternative isoform, Aralar 1 has a paralog Aralar 2 (Citrin, or SLC25A13) that may partially compensate for its loss. This potential compensation might explain the preservation of mitochondrial morphology in Aralar 1 KD cells. Nonetheless, our metabolomics data demonstrate that downregulation of Aralar 1 alone significantly reduces the aspartate/glutamate ratio (Fig. S1D-F). Since this ratio reflects glutamate catabolism, its decrease indicates impaired malate-aspartate shuttle activity and reduced glutamate catabolism. Therefore, although compensation by Aralar 2 cannot be entirely excluded, Aralar 1 KD alone suffices to cause substantial impairment of malate-aspartate shuttle function”.


      __ OGC and Aralar/AGC1 silencing is associated with the accumulation of smaller lipid droplets and impaired norepinephrine-induced lipolysis, but no mechanistic evidence is provided. The authors discuss a role for AMPK signaling associated with the redox imbalance generated by MAS dysfunction, but neither of them is proven.__

      Answer 4) We thank the reviewer for this insightful question, which was also raised by Reviewer 1 (see Reviewer 1, Question 1 above). Here, we aim to clarify the mechanistic basis by which MASh may regulate lipolysis in BAT in a complementary and refined manner.

      Our new data directly address this issue by examining cytosolic redox status through the lactate/pyruvate ratio, a well-established indicator of NAD⁺/NADH balance. Under basal conditions, Aralar 1 KD cells showed no change in this ratio compared to controls, indicating preserved cytosolic NAD⁺ regeneration despite reduced MASh activity. This observation is consistent with previous studies demonstrating the resilience of cellular redox homeostasis through overlapping NAD⁺-regenerating systems (PMID: 40540398; PMID: 37647199). The new results are shown in Supplementary Figure 1.

      At the same time, we detected a marked decrease in the aspartate/glutamate ratio in Aralar 1 KD cells, confirming impaired MASh function and reduced amino acid exchange between cytosol and mitochondria. The lack of redox imbalance likely reflects compensatory mechanisms, most notably the GPSh, which is highly active in brown adipocytes. Supporting this view, Aralar 1 KD cells displayed significantly reduced glycerol release upon norepinephrine stimulation (Fig. 4I), suggesting enhanced metabolic cycling of G3P through mitochondrial and cytosolic G3PDH, thereby sustaining NAD⁺ regeneration and redox equilibrium.

      We therefore propose that, although MASh normally facilitates NADH export and aspartate/glutamate exchange, its loss activates GPSh-mediated compensation that preserves cytosolic NAD⁺/NADH balance and maintains lipolytic flux during adrenergic stimulation. These findings refine our mechanistic understanding of how redox shuttle interplay supports glycolytic and lipolytic processes in BAT. Future studies employing NAD⁺/NADH sensors and simultaneous blockade of both shuttles will be essential to dissect these compensatory mechanisms in greater detail.

      Minor points:

      1. __ Is pyruvate present in respiration medium? If so, no effect on respiration is expected as pyruvate reverses the respiratory defects caused by MAS inactivation. __

      Answer 1) Thanks for this important insight. In fact, as indicated in the methods section (page 17, last paragraph), all respirometry experiments were carried out in the absence of pyruvate in the media. Therefore, the preserved overall respiratory rates in Aralar 1 and Ogc KD cells cannot be explained by compensatory oxidation of pyruvate supplied in the media.

      __ In figure 4, only data from Aralar KD cells in relation to norepinephrine-stimulated lipolysis are shown. What happens when OGC is silenced? __

      Answer 2) This is a very interesting and relevant question. We did not perform the norepinephrine-stimulated lipolysis experiments in Ogc-silenced cells, since in most of the other experiments presented in the manuscript Ogc and Aralar 1 silencing converged to very similar, if not identical, phenotypes. Based on these consistent overlaps, we anticipate that Ogc KD would likely lead to comparable effects on lipolysis as observed in Aralar 1 KD cells. Nonetheless, we fully agree that direct assessment of lipolysis upon Ogc KD would strengthen this conclusion, and we consider this an important aspect for future studies.

      __ Nomenclature used for mitochondrial carriers is confusing. Please do not use OGC1 as there is only one isoform. Furthermore, different names for OGC are used in the manuscript: oxoglutarate carrier, malate-ketoglutarate carrier or OGC1/SLC25A11. In the case of citrin/AGC2, Aralar2 is used, which is an uncommon designation.__

      Answer 3) We corrected all OGC naming in the revised manuscript. We also replaced “aralar 2” with “citrin”, since the latter is more commonly used in the literature.

      __ Some panels of figures 3 and 4 should be improved. Panels 3J, 3L and 4G are difficult to see. In panel 3J please clarify UT line from untreated/NE, are they not transduced? No equivalent conditions are assayed in Aralar KD and OGC KO cells.__

      Answer 4) We thank the reviewer for giving us the opportunity to improve this figure and apologize for the confusing labeling. In the revised version, we have clarified the labels in panels 3J, 3L, and 4G to improve visibility, and we have added descriptions of all abbreviations to the figure legends, accordingly.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #3

      Evidence, reproducibility and clarity

      In this manuscript, Veliova and coworkers explore the contribution of the malate-aspartate NADH shuttle (MAS) to energy metabolism in brown adipose tissue. This work, done by a group expert in mitochondrial metabolism, continues an interesting previous study (Veliova, 2020), which showed that inhibition of the mitochondrial pyruvate carrier caused an increase in energy expenditure mediated by the activation of MAS in BAT. Here, the authors have explored the consequences of the lack of MAS activity on BAT metabolism by silencing the metabolite transporters that are part of MAS in cultured primary brown adipocytes. Using this loss-of-function approach, the role of MAS in the regulation of lipid homeostasis in BAT is analyzed. The results could be interesting but, in my opinion, they are not sufficiently proven. Much more evidence should be provided to confirm MAS deficiency and the mechanisms involved in the alteration of lipid homeostasis.

      Major points

      1. Although in the manuscript Veliova and coworkers demonstrated that MAS is functional in brown adipocytes showing kinetic parameters equivalent to that previously described in other tissues, surprisingly, when its components are downregulated, no effect, or very little, on mitochondrial respiration is found (figure 2). This is an intriguing result since MAS disruption has been widely reported to impair respiration in different cell types and tissues. However, since no direct evidence of MAS dysfunction is provided, it is possible that MAS may still remain partially or fully functional under the conditions used by the authors, and therefore this point needs to be clarified to validate these results. Furthermore, strategies used to downregulate MAS components produce only a partial reduction in mRNA levels, about 70%, but its outcome on protein levels has not been determined, and the remaining protein level could be sufficient to maintain shuttle activity. Therefore, the effect of silencing at the protein level should be analyzed because, as the authors also point out on page 16, "mRNA levels may not reflect actual protein levels or activity".
      2. In the case of aspartate/glutamate carriers (AGCs) the role of citrin/slc25a13, the second AGC paralog, should also be analyzed. This AGC isoform is discarded based on proteomic data from brown adipose tissue, but, as shown in figure 1B, its levels are similar to those of Aralar/slc25a12, the only AGC silenced. Besides, primary brown adipocytes differentiated for 7 days are used here, and it is possible that factors such as culture conditions or differentiation itself could alter AGC levels. Therefore, it is necessary to determine the protein levels of citrin/AGC2, and, if necessary, downregulate it together with the Aralar/AGC1 isoform. Citrin/AGC2 activity may be responsible for the observed difference between the OGC and Aralar/AGC1 KD adipocytes.
      3. OGC and Aralar/AGC1 silencing is associated with the accumulation of smaller lipid droplets and impaired norepinephrine-induced lipolysis, but no mechanistic evidence is provided. The authors discuss a role for AMPK signaling associated with the redox imbalance generated by MAS dysfunction, but neither of them is proven.

      Minor points

      1. Is pyruvate present in respiration medium? If so, no effect on respiration is expected as pyruvate reverses the respiratory defects caused by MAS inactivation.
      2. In figure 4, only data from Aralar KD cells in relation to norepinephrine-stimulated lipolysis are shown. What happens when OGC is silenced?
      3. Nomenclature used for mitochondrial carriers is confusing. Please do not use OGC1 as there is only one isoform. Furthermore, different names for OGC are used in the manuscript: oxoglutarate carrier, malate-ketoglutarate carrier or OGC1/SLC25A11. In the case of citrin/AGC2, Aralar2 is used, which is an uncommon designation.
      4. Some panels of figures 3 and 4 should be improved. Panels 3J, 3L and 4G are difficult to see. In panel 3J please clarify UT line from untreated/NE, are they not transduced? No equivalent conditions are assayed in Aralar KD and OGC KO cells.

      Significance

      General assessment: The robust part of this study is its analysis of some aspects related to lipid metabolism in cultured primary cells derived from brown adipose tissue. The participating teams are well-versed in this topic and the approaches used are correct. However, no data from animal models supporting these results are provided, which detracts from the interest of the study.

      Advance: This manuscript is the "logical" continuation of a previous study, Veliova et al. (2020) EMBO Rep, which is more relevant in my opinion. Also, a role for MAS in BAT thermogenesis has recently been proposed using animal models, either overexpressing or deficient in GOT1, a cytosolic protein component of MAS (Park et al., Cell Rep. 2025). The novelty of this manuscript is the analysis of cells deficient in the metabolite transporters that regulate the direction of NADH shuttling. However, since no evidence of its effect on the NAD+/NADH ratio is provided, the conclusions related to the role of MAS, or of the silenced mitochondrial carriers, in the regulation of lipolysis in BAT and its involvement in thermogenesis are not convincing.

      Audience: These results could be of interest to the audience interested in basic research, but could also be useful in the translational/clinical area because they address metabolic aspects in adipose tissue.

      My expertise is focused on mitochondrial metabolism, specifically on the function of a subtype of mitochondrial carriers regulated by cytosolic calcium and how they participate in the control of different mitochondrial functions, such as respiration, calcium buffering, and cell proliferation. Some of these transporters, such as Aralar/AGC1 and citrin/AGC2, are components of MAS.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #2

      Evidence, reproducibility and clarity

      This manuscript presents novel findings on the role of the malate-aspartate shuttle (MASh) in brown adipose tissue (BAT). Building on the recent advances in elucidating the contribution of MASh to BAT metabolism, the present study provides new evidence by offering direct biochemical validation using a reconstituted BAT mitochondrial system and by introducing genetic data on the mitochondrial carriers OGC1 and Aralar1, thereby adding significant new insight. However, the following points require further clarification.

      Major points:

      1. In the current manuscript, mitochondrial morphology (area, aspect ratio, and roundness) was analyzed in OGC1 KD cells using TMRE, whereas MitoTracker Deep Red (MTDR) was used in Aralar1 KD cells. Notably, TMRE is a ΔΨm-dependent probe. The signal intensity can change, or the distribution may reflect alterations in membrane potential rather than true morphological changes. Therefore, the observed differences in OGC1 KD cells based on TMRE staining may be confounded by the dye's functional dependence, potentially biasing the conclusions. It is recommended to evaluate mitochondrial morphology with consistent trackers across conditions. In addition, in the subsequent OCR analysis, mitochondrial area was used for normalization. Please clarify which staining method was employed, and provide justification for its suitability.

      Minor points:

      1. In the introduction, the authors state that "LDH activity increases in the context of BAT activation". This point is important for the logic of the manuscript, but reference [10] cited here is not sufficient to support this claim. It is recommended to provide appropriate references to support this statement.
      2. In Fig. 1A and B-D, there are inconsistencies and duplications in the abbreviation labels. Please check and revise accordingly.
      3. In Fig. S1, the number of n indicated does not match the number of data points shown. Please clarify whether these represent technical replicates or biological replicates, and provide a detailed description of the statistical methods used throughout the manuscript.
      4. Please provide details on the normalization strategy used in the BODIPY-C12/BODIPY-493 staining analysis, such as whether fluorescence intensity was quantified as mean or integrated values, and whether the analysis was normalized to lipid droplet area, cell number, or baseline. Since lipolytic stimulation can reduce droplet size and increase droplet number, these factors may bias the results.
      5. The manuscript notes that the unexpected result in Fig. 3K-M, in parallel with increased Atgl mRNA expression, might arise because mRNA expression does not reflect protein levels or enzymatic activity. To strengthen this point, it is recommended to include data on ATGL and phosphorylated ATGL.
      6. Red-on-black is not a great color code for IMFs, how about black-and-white?

      Referees cross-commenting

      In my opinion, all three reviewers have provided constructive criticism of the work.

      Significance

      The work dives deeper into mitochondrial function and metabolism of brown adipocytes and, thus, advances our understanding of thermogenesis in an incremental fashion. The work will be relevant to brown adipose tissue researchers and mitochondrial biologists.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      The paper makes a clear, well-supported case that the malate-aspartate shuttle (MAS) is active in brown adipocytes and supports adrenergically stimulated lipolysis. The combination of a functional MAS assay, targeted carrier knockdowns, and multi-modal lipolysis measurements is a strong package. The reconstituted mitochondrial assay paired with live-cell lipolysis imaging is technically elegant and broadly reusable. The main gap is the limited in-vitro scope relative to in-vivo cold adaptation.

      Major comments

      1. The manuscript posits that the loss of function of MASh components (Ogc1 and Aralar) decreases adrenergic-stimulated lipolysis by altering the cytosolic NAD⁺/NADH ratio, with AMPK/ACC mentioned as possible mediators. However, this remains speculative. Please provide mechanistic data directly linking MASh-dependent NAD⁺/NADH changes to the regulation of lipolysis in brown adipocytes during adrenergic stimulation.
      2. The absence of in vivo analysis of lipid-droplet size in MASh loss-of-function models is a major concern. In vitro results could be confounded by differences in differentiation stage between groups. Please document equivalent adipogenesis across groups (e.g., Pparg/Cebpa/Plin1/Fabp4 expression).
      3. Please include rescue experiments (add-back OGC1 and Aralar) to rule out siRNA/shRNA off-target effects and verify that the phenotype stems from MASh loss of function.
      4. Please expand on physiological significance: What is the importance of MASh regulation of BAT lipolysis in long-term adaptive thermogenesis?

      Minor comments

      1. Fig. 4 legend/title contains a typo ("lypolysis" → lipolysis).
      2. In Fig. 2 legend line: "Adevirus-mediated" → Adenovirus-mediated; "OCAR" → OCR.
      3. For lipolysis imaging, you already show Forskolin/Atglistatin/Etomoxir controls; add a vehicle-only time course overlay in the main figure (currently in text/legend) to aid visual comparison.
      4. Ensure consistent gene symbols (Atgl/Pnpla2), and protein capitalization.

      Referees cross-commenting

      In my view, the feedback offered by all three reviewers has been highly constructive, as each of them has contributed thoughtful and meaningful criticism that can help improve the quality, clarity, and overall impact of the work.

      Significance

      Advance - how it fits the literature and what kind of advance.

      Relative to prior work linking MASh (often via GOT1) to fuel preference and redox during thermogenesis, this study fills a mechanistic gap by showing that carrier-level MASh disruption (Aralar1/OGC1) specifically impairs adrenergic lipid mobilization upstream of β-oxidation, while respiration per cell can be buffered by compensatory mitochondrial biogenesis (lower OCR per mitochondrion). Conceptual/fundamental advance: it sharpens the redox-lipolysis axis in BAT and clarifies why changes in fuel availability (lipolysis) may limit thermogenesis even when bulk OCR looks preserved.

      Audience - who will be interested/influenced.

      Specialized but cross-cutting: adipose biology & thermogenesis, mitochondrial/redox metabolism, lipid-droplet and lipolysis communities, and metabolic-disease researchers exploring strategies to modulate BAT fuel handling.

      Reviewer expertise

      Adipose tissue and systemic energy metabolism; mitochondrial bioenergetics; thermogenic mechanisms in BAT/beige fat; transcriptional and metabolic control of lipid mobilization. Not a specialist in membrane-carrier biophysics.

    1. Reviewer #1 (Public review):

      Summary:

      The authors investigate the role of H3K115ac in mouse embryonic stem cells. They report that H3K115ac localizes to regions enriched for fragile nucleosomes, CpG islands, and enhancers, and that it correlates with transcriptional activity. These findings suggest a potential role for this globular domain modification in nucleosome dynamics and gene regulation. If robust, these observations would expand our understanding of how non-tail histone modifications contribute to chromatin accessibility and transcriptional control.

      Strengths:

      (1) The study addresses a histone PTM in the globular domain, which is relatively unexplored compared to tail modifications.

      (2) The implication of a histone PTM in fragile nucleosome localization is novel and, if substantiated, could represent a significant advance for the field.

      Weaknesses:

      (1) The absence of replicate paired-end datasets limits confidence in peak localization.

      (2) The analyses are primarily correlative, making it difficult to fully assess robustness or to support strong mechanistic conclusions.

      (3) Some claims (e.g., specificity for CpG islands, "dynamic" regulation during differentiation) are not fully supported by the analyses as presented.

      (4) Overall, the study introduces an intriguing new angle on globular PTMs, but additional rigor and mechanistic evidence are needed to substantiate the conclusions.

    2. Reviewer #2 (Public review):

      Summary:

      Kumar et al. aimed to assess the role of the understudied H3K115 acetylation mark, which is located in the nucleosomal core. To this end, the authors performed ChIP-seq experiments of H3K115ac in mouse embryonic stem cells as well as during differentiation into neuronal progenitor cells. Subsequent bioinformatic analyses revealed an association of H3K115ac with fragile nucleosomes at CpG island promoters, as well as with enhancers and CTCF binding sites. This is an interesting study, which provides important novel insights into the potential function of H3K115ac. However, the study is mainly descriptive, and functional experiments are missing.

      Strengths:

      (1) The authors present the first genome-wide profiling of H3K115ac and link this poorly characterized modification to fragile nucleosomes, CpG island promoters, enhancers, and CTCF binding sites.

      (2) The study provides a valuable descriptive resource and raises intriguing hypotheses about the role of H3K115ac in chromatin regulation.

      (3) The breadth of the bioinformatic analyses adds to the value of the dataset.

      Weaknesses:

      (1) I am not fully convinced about the specificity of the antibody. Although the experiment in Figure S1A shows a specific binding to H3K115ac-modified peptides compared to unmodified peptides, the authors do not show any experiment that shows that the antibody does not bind to unrelated proteins. Thus, a Western of a nuclear extract or the chromatin fraction would be critical to show. Also, peptide competition using the H3K115ac peptide to block the antibody may be good to further support the specificity of the antibody. Also, I don't understand the experiment in Figure S1B. What does it tell us when the H3K115ac histone mark itself is missing? The KLF4 promoter does not appear to be a suitable positive control, given that hundreds of proteins/histone modifications are likely present at this region.

      It is important to clearly demonstrate that the antibody exclusively recognizes H3K115ac, given that the conclusion of the manuscript strongly depends on the reliability of the obtained ChIP-Seq data.

      (2) The association of H3K115ac with fragile nucleosomes is based on MNase sensitivity and fragment length, which are indirect methods and can have technical bias. Experiments that support that the H3K115ac-modified nucleosomes are indeed more fragile are missing.

      (3) The comparison of H3K115ac with H3K122ac and H3K64ac relies on publicly available datasets. Since the authors argue that these marks are distinct, data generated under identical experimental conditions would be more convincing. At a minimum, the limitations of using external datasets should be discussed.

      (4) The enrichment of H3K115ac at enhancers and CTCF binding sites is notable but remains descriptive. It would be interesting to clarify whether H3K115ac actively influences transcription factor/CTCF binding or is a downstream correlate.

      (5) No information is provided about how H3K115ac may be deposited/removed. Without this information, it is difficult to place this modification into established chromatin regulatory pathways.

      At the very least, the authors should acknowledge these limitations and provide additional validation of antibody specificity.

    3. Author response:

      Reviewer 1:

      Comment 1. The reviewer was under the impression that we did not perform biological replicates of our ChIP-seq experiments. All ChIP-seq (and ATAC-seq) experiments were performed with biological replicates, and the Pearson’s correlations between replicates (all >0.9) were provided in Supplementary Table 1. We had indicated this in the text and methods but will try to make this even clearer.

      Reviewer 2:

      Comment 2. The reviewer states that our claim of H3K115ac being associated with fragile nucleosomes is based solely on MNase sensitivity and fragment length. This is not correct. Figure 3C and D show the results of sucrose gradient sedimentation experiments followed by ChIP-seq, clearly showing that H3K115ac fractionates with chromatin particles that are enriched for fragile nucleosomes and subnucleosomes. By contrast, H3K115ac is not enriched in stable mononucleosomes.

      Comment 3. The reviewer states that our H3K122ac and H3K64ac comparison relies on publicly available datasets. We would emphasize that these are our own datasets, generated and published previously (Pradeepa et al., 2016) using exactly the same native MNase ChIP protocol as used here for H3K115ac and processed with identical computational pipelines.

      Reviewer 3:

      Reviewer 3 is mistaken in thinking our ChIP experiments are performed under cross-linked conditions. As clearly stated in the main text and methods, all our ChIP-seq for histone modifications is done on native MNase-digested chromatin – with no cross-linking. This includes the spike-in experiment shown in Fig S1B to test H3K115ac antibody specificity against the bar-coded SNAP-ChIP® K-AcylStat Panel from Epicypher. We could not include H3K115ac bar-coded nucleosomes in that experiment since they are not available in the panel. 

      Following that, we would propose to make minor revisions in response to specific reviewer recommendations before posting a version of record. These would include:

      (1) Figure 2: the title needs to change to "H3K115ac marks CpG island promoters poised for activation", so that it matches the title of the corresponding section in the main text. See also Reviewer 1, comment 7, in the Recommendations part.

      (2) Figure S2B: legend should read: "Gene ontology analysis for the set of genes analysed in Figure 2C"

      (3) Figure F4D: Provide the replicates for western blot 

      (4) Figure 4A,B: Corrected formatting issues.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      The manuscript by Bru et al. focuses on the role of vacuoles as a phosphate buffering system for yeast cells. The authors describe here the crosstalk between the vacuole and the cytosol using a combination of in vitro analyses of vacuoles and in vivo assays. They show that the luminal polyphosphatases of the vacuole can hydrolyse polyphosphates to generate inorganic phosphate, yet they are inhibited by high concentrations. This balances the synthesis of polyphosphates against the inorganic phosphate pool. Their data further show that the Pho91 transporter provides a valve for the cytosol as it gets activated by a decline in inositol pyrophosphate levels. The authors thus demonstrate how the vacuole functions as a phosphate buffering system to maintain a constant cytosolic inorganic phosphate pool.

      This is a very consistent and well-written manuscript with a number of convincing experiments, where the authors use isolated vacuoles and cellular read-out systems to demonstrate the interplay of polyphosphate synthesis, hydrolysis, and release. The beauty of this system the authors present is the clear correlation between product inhibition and the role of Pho91 as a valve to release Pi to the cytosol to replenish the cytosolic pool. I find the paper overall an excellent fit and only have a few issues, including: 

      (1) Figure 3: The authors use in their assays 1 mM ZnCl2 or 1 mM MgCl2. Is this concentration in the range of the vacuolar luminal ion concentration? Did they also test the effect of Ca²⁺, as this ion is also highly concentrated in the lumen?

      The concentrations inside vacuoles reach those values. However, given that polyP can chelate divalent metal ions, what would matter are the concentrations of free Zn²⁺ or Mg²⁺ inside the organelle. These are not known. This is not critical since we use those two conditions only as a convenient tool to differentiate Ppn1 and Ppn2 activity in vitro. In our initial characterisation of Ppn2 (10.1242/jcs.201061), we had also tested Mn, Co, Ca, Ni, Cu. Only Zn and Co supported activity. Ca did not. Andreeva et al. (10.1016/j.biochi.2019.06.001) reached similar conclusions and extended our results.

      (2) Regarding the concentration of 30 mM K-Pi, did the authors also use higher and lower concentrations? I agree that there is inhibition by 30 mM, but they cannot derive conclusions on the luminal concentration if they use just one in their assay. A titration is necessary here.

      The concentration of 30 mM was not chosen arbitrarily. It is the luminal P<sub>i</sub> concentration that the vacuoles reached through polyP synthesis and hydrolysis when they entered a plateau of luminal P<sub>i</sub>. We consider this an upper limit because polyP kept increasing while luminal P<sub>i</sub> did not. Thus, there is no physiological motivation for trying higher values. We have nevertheless added a titration to the revised version (new Fig. 3A).

      (3) What are the consequences on vacuole morphology if the cells lack Pho91? 

      We had not observed significant abnormalities during a screen of the genome-wide deletion collection of yeast (10.1371/journal.pone.0054160), nor in other experiments with pho91 mutants, which we have not included in this manuscript due to a lack of effect.

      (4) Discussion: The authors do not refer to the effect of calcium, even though I would expect that the levels of the counterion should affect the phosphate metabolism. I would appreciate it if they would extend their discussion accordingly. 

      The situation is much more complex because Ca2+ is not the only counterion. Major pools of counterions (up to hundreds of mM) are constituted by vacuolar lysine, arginine, polyamines, Mg, Zn, etc. Their interplay with polyP is probably complex and worth treating in a dedicated project. To go beyond the simple statement that this complexity is not understood, which is not very useful, we would have to engage in a lot of speculation. We feel that this would make the discussion lose focus and not contribute concrete insights.

      (5) I would appreciate a brief discussion on how phosphate sensing and control are done in human cells. Do they use a similar lysosomal buffer system? 

      Mammalian cells have their Pi exporter XPR1 mainly on a lysosome-like compartment (10.1016/j.celrep.2024.114316). Whether and how it functions there for Pi export from the cytosol is not entirely clear. We have addressed this situation in the revised discussion section.

      Reviewer #2 (Public review): 

      Summary: 

      This manuscript presents a well-conceived and concise study that significantly advances our understanding of polyphosphate (polyP) metabolism and its role in cytosolic phosphate (Pi) homeostasis in a model unicellular eukaryote. The authors provide evidence that yeast vacuoles function as dynamic regulatory buffers for Pi homeostasis, integrating polyP synthesis, storage, and hydrolysis in response to cellular metabolic demands. The work is methodologically sound and offers valuable insights into the conserved mechanisms of phosphate regulation across eukaryotes. 

      Strengths: 

      The results demonstrate that the vacuolar transporter chaperone (VTC) complex, in conjunction with luminal polyphosphatases (Ppn1/Ppn2) and the Pi exporter Pho91, establishes a finely tuned feedback system that balances cytosolic Pi levels. Under Pi-replete conditions, inositol pyrophosphates (InsPPs) promote polyP synthesis and storage while inhibiting polyP hydrolysis, leading to vacuolar Pi accumulation. 

      Conversely, Pi scarcity triggers InsPP depletion, activating Pho91-mediated Pi export and polyP mobilization to sustain cytosolic phosphate levels. This regulatory circuit ensures metabolic flexibility, particularly during critical processes such as glycolysis, nucleotide synthesis, and cell cycle progression, where phosphate demand fluctuates dramatically. 

      From my viewpoint, one of the most important findings is the demonstration that vacuoles act as a rapidly accessible Pi reservoir, capable of switching between storage (as polyP) and release (as free Pi) in response to metabolic cues. The energetic cost of polyP synthesis, driven by ATP and the vacuolar proton gradient, highlights the evolutionary importance of this buffering system. The study also draws parallels between yeast vacuoles and acidocalcisomes in other eukaryotes, such as Trypanosoma and Chlamydomonas, suggesting a conserved role for these organelles in phosphate homeostasis.

      Weaknesses: 

      While the manuscript is highly insightful, referring to yeast vacuoles as "acidocalcisome-like" may warrant further discussion. Canonical acidocalcisomes are structurally and chemically distinct (e.g., electron-dense, in most cases spherical, not routinely subjected to morphological changes, and enriched with specific ions), whereas yeast vacuoles have well-established roles beyond phosphate storage. A comment on this terminology could strengthen the comparative analysis and avoid potential confusion in the field.

      Yeast vacuoles show all major chemical features of acidocalcisomes. They are acidified and contain high concentrations of Ca, polyP (which makes them electron-dense, too), other divalent ions such as Mg, Zn and Mn, and high concentrations of basic amino acids. Thus, they clearly have an acidocalcisome-like character. In addition, they have hydrolytic, lysosome-like functions and, depending on the strain background, they can be larger than the acidocalcisomes described, e.g., in protists. We have elaborated on this point in the introduction of the revised version.

      Reviewer #3 (Public review): 

      Bru et al. investigated how inorganic phosphate (Pi) is buffered in cells using S. cerevisiae as a model. Pi is stored in cells in the form of polyphosphates in acidocalcisomes. In S. cerevisiae, the vacuole, which is the yeast lysosome, also fulfills the function of a Pi storage organelle. Therefore, yeast is an ideal system to study Pi storage and mobilization.

      They can recapitulate in their previously established system, using isolated yeast vacuoles, findings from their own and other groups. They integrate the available data and propose a working model of feedback loops to control the level of Pi on the cellular level. 

      This is a solid study, in which the biological significance of their findings is not entirely clear. The data analysis and statistical significance need to be improved and included, respectively. The manuscript would have benefited from rigorously testing the model, which would also have increased the impact of the study. 

      It is not clear to us what the reviewer would see as a more rigorous test of the model.  

      Reviewer #1 (Recommendations for the authors): 

      (1) Figure 2: Why do the authors label the blue curve in A and B as BY and in C and D as WT? Is this a different genetic background they used here? This should be specified in the legend. 

      No, it is the same background. The figures had been reshuffled before submission and we neglected to replace "BY" with "WT". This has been corrected. We now consistently use WT in all figures.

      (2) Figure 4 has different scaling for the two panels, which should be labeled as A and B. I am aware that the authors do this for comparison, but it is rather confusing at first glance. I recommend having them at the same scale. 

      We chose this representation on two separate scales because the figure is primarily meant to illustrate that the shift between the pho91 and WT curves vanishes in the presence of IP7. We now highlight in the figure legend that the scales are different to avoid confusion.

      (3) Figure 8: I would appreciate a model with normal and low Pi concentrations in comparison, as this is what the authors worked out. 

      We have modified the figure. It now compares Pi-rich and Pi-limited scenarios.

      (4) Minor issue: Wouldn't it make more sense to show the molar concentration in the Figures rather than the nmol of Pi/ug of protein? I am aware that this would require information on the vacuole volume rather than the reaction volume, and the authors do this calculation later on. 

      It depends. We often chose this representation because it illustrates the price to pay (metabolic input in terms of protein that must be dedicated to this task) to sequester a certain quantity of P<sub>i</sub>. But, as we provide the corresponding P<sub>i</sub> concentration in the text, this information is accessible to the reader, too.

      Reviewer #2 (Recommendations for the authors): 

      As stated above in the weaknesses section, while functional parallels exist, canonical acidocalcisomes are structurally and chemically distinct: typically smaller, electron-dense, and enriched with cations, whereas yeast vacuoles are larger, multifunctional organelles with well-established roles beyond phosphate storage. Explicitly addressing these differences would strengthen the comparative framework and prevent potential confusion in interpreting the evolutionary relationships between these organelles.

      We agree to some degree, which is the reason why we refer to vacuoles as acidocalcisome-like organelles. In fact, vacuoles share virtually all defining chemical traits of acidocalcisomes. They just have a second functional domain as hydrolytic, lysosome-like organelles. Given the plasticity of endo-lysosomal compartments, and acidocalcisomes belong to this group because of their biogenesis through the AP3 pathway, this is not shocking to us. But the reviewer's comment made us realize that it is better to explicitly address this point. We have added a section to the introduction to do this.

      Reviewer #3 (Recommendations for the authors): 

      (1) Page 8: It is unclear why the authors only estimated the Pi concentration in wild-type vacuoles. This should also be done for vacuoles from other strains. 

      This information is inherent in Figure 2. PolyP hyperaccumulating strains show the same plateau as the wildtype, meaning that they also reach around 30 mM luminal Pi concentration, whereas vtc4 vacuoles reach only around 1/10th of that increase, indicating that they remain at 3 mM. We mention this now in the text.

      (2) The attempts to localize Pho91 through tagging are not satisfactory. The authors described different localizations for Pho91 depending on whether it was tagged on the N- or C-terminus, or when N-terminally tagged and overexpressed using two strong promoters. While it is not uncommon that proteins show different localization patterns depending on where the tag is inserted, it is possible that one of the tags would reflect the localization of the endogenous protein. There is an easy way to test this, in particular when Pho91 is endogenously tagged. pho91∆ has reported phenotypes such as abnormal vacuolar morphology or increased autophagy. They could also measure Pi content in vacuoles. The authors could compare the phenotypes of the endogenously tagged strains with WT and a pho91∆ strain.

      Indeed, the attempts to localise the protein through fluorescent tags are unsatisfactory, in our hands as in the hands of others. We would not have created a series of many different tagged versions (we present only a selection of these in the manuscript) if the creation of a faithful reporter for Pho91 localisation were so straightforward. Expression from the endogenous promoter yields quite low signals (which is why others have overexpressed their GFP fusion from strong promoters). But overexpression brings at least a significant part of the protein to the cell surface, where it can then function as a Pi importer, suffices to restore much of the maximal Pi uptake capacity that genuine plasma membrane transporters provide, and supports normal growth of the cells (Wykoff & O’Shea, 2001). But the localisation pattern of Pho91-GFP, likewise overexpressed from a strong promoter, does not reflect this plasma membrane localisation (see the references that the reviewer mentioned under (3)). The published overexpressed GFP fusions localise only to the vacuole, suggesting that even in this case the GFP tag may create an artefact. Therefore, we went through a large variety of Pho91 gene fusions, which led us to the conclusion that the protein is very sensitive to tags at both ends and that fusion proteins hence are unlikely to reliably report the correct location of the protein. Given this, we resorted to quantitative proteomics to clarify the issue. This quantitative experiment goes beyond the previously published proteomics analyses that the reviewer mentions under (3), which found the protein in the vacuolar fraction but did not calculate the enrichment factors, which is crucial.

      A strong phenotype of abnormal vacuolar morphology is not apparent in our cultures. 

      (3) Moreover, Pho91 has been identified as a component enriched in vacuolar-mitochondria contact sites (vCLAMP), and this localization was confirmed with GFP-Pho91 (PMID: 25026036). Likewise, PMID: 35175277 also detected Pho91 by mass spectrometry as a vacuolar protein and showed endogenously tagged GFP-Pho91 on the vacuole (co-staining with Vph1). The authors may request the strains from the authors of these papers and use them for their experiments. PMID: 17804816, the oldest of the three reports (from 2007), reports a GFP-Pho91 under either the TEF or the ADH promoter that localizes to the vacuole. They also showed that the fusion protein is functional. These and other experiments led them to conclude that Pho91 exports phosphate from the vacuolar lumen to the cytoplasm.

      We have now included these references. As argued above, we have also analysed the strains from PMID: 17804816. The observed clear localisation of the fusion protein to vacuoles is only visible upon overexpression, not upon expression from the endogenous locus. Apparently, this construct is also unlikely to report Pho91 localisation reliably (though, by chance, overexpression leads it to the correct location). Thus, we maintain our conclusion that C- or N-terminally GFP-tagged versions of Pho91 are unreliable tools for localising the protein.

      (4) The impact of pho91∆ on Pho4-GFP nuclear localization is modest at best (increase from 5% of cells showing Pho4-GFP in the nucleus in WT vs 10% in pho91∆), and only somewhat stronger in ppn1∆/ppn2∆. This means 90% of pho91∆ cells do not respond, and Pho4-GFP stays cytoplasmic. It is unclear how the authors can derive a meaningful conclusion from these data. Moreover, are these data really supporting the model, or do these data rather indicate that there are additional factors/pathways needed? What is the biological significance of the marginal increase from 5% to 10% of cells that would respond? What happens to the cells that cannot respond? Will they die or at least have a growth disadvantage? It would be useful to provide some functional studies.

      We should have explained the nature of the assay better. The experiment exploits the fact that dividing yeast cells transiently fall into a state of Pi scarcity during S-phase. Since S-phase is less than a quarter of the cell cycle, only a small fraction of the cells transiently activates the PHO pathway. These cells cannot be well characterised by ensemble assays, but microscopy circumvents the background of the whole population and picks them up very clearly, allowing them to be quantified. We have adapted the respective chapter in the results section to improve the description of this experiment.

      (5) The quantification of the data is suboptimal, as in most assays the mean and standard error of the mean (SEM) are given. SEM is not really appropriate in these cases because it gives only the error of the mean and not of the entire data. Therefore, the standard deviation (SD) is needed, which reports on the variability of the data, and which is usually much larger than the SEM. Using the SD would also allow the authors to do proper statistical analysis, which is missing entirely in this manuscript.

      SEM also comprises the variability of the data. It is linked with the SD (SEM=SD/SQRT(n)), but SEM also considers the number of the experiments n. The main goal is to compare the means, and SEM is an appropriate and frequently used tool for this because it illustrates how well the arithmetic mean may estimate the true mean of the population. Therefore, we kept the SEM but have added tests of significance for the differences shown.
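      As an illustration of the relation quoted above (SEM = SD/sqrt(n)), and not as part of the authors' analysis, the following minimal Python sketch computes both quantities for a set of replicates. The function name `sd_and_sem` and the replicate values are hypothetical and chosen only for the example; the sample SD with an n - 1 denominator is assumed.

```python
import math
import statistics


def sd_and_sem(values):
    """Return (sample SD, SEM) for a list of replicate measurements."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two replicates")
    sd = statistics.stdev(values)  # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(n)        # SEM = SD / sqrt(n)
    return sd, sem


# Hypothetical replicate values, for illustration only (not data from the manuscript).
replicates = [28.0, 30.0, 32.0, 30.0]
sd, sem = sd_and_sem(replicates)
print(f"n={len(replicates)}  SD={sd:.3f}  SEM={sem:.3f}")
```

      With four replicates the SEM is half the SD; the SD describes the spread of individual measurements, whereas the SEM shrinks as n grows because it describes the uncertainty of the mean.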

      (6) Statistical testing in Figure 7 is essential as the effects are very small. Again, are these changes big enough for a biologically meaningful response? The authors should at least discuss this. 

      Our previous time course analyses of InsPP dynamics, performed under conditions comparable to those in this study, showed that InsP8 decreases by around 50% in the first 30 min after transfer to Pi starvation (DOI: https://doi.org/10.7554/eLife.87956) and that this decline is already sufficient to trigger the PHO starvation program, as assessed by Pho4-GFP translocation into the nucleus. Thus, a 50% decrease, which is observed in ppn1 ppn2 mutants, is functionally significant. We have now also evaluated statistical significance in Fig. 7, which is given for the 50% reduction of InsP8 and 1-InsP7 in ppn1 ppn2.

      Minor points: 

      (1) There are a number of smaller edits (use of italic or better the absence thereof, lacking information in the reference list, and some typos). 

      Thank you. We have corrected those.

      (2) The exact n should be given in the Figure legend. 

      Corrected.

      (3) Page 8, line 8: it would be nice to have a picture of the wild-type vacuoles and what you measured. 

      We now present a sample image in the new Suppl. Fig. 1.

      (4) PMID: 11779791 showed already that Pho91 cannot rescue the absence of the plasma membrane Pi transporters. This study should be at least cited. 

      This is not quite correct. The study that the reviewer mentions showed that Pho91 supports slower growth and the authors concluded that "A synthetic lethal phenotype was observed when (all) five phosphate transporters were inactivated...". We had cited the same group and the same first author, just using their later study (Wykoff et al., 2007) that had recapitulated the results from PMID: 11779791 and showed in addition quite good growth of the PHO91-expressing strain on YPD (Suppl. Fig. 2). We had obtained the strains from this group. In reproducing their experiments, we noticed that the growth of the PHO91-expressing strain that these authors had observed is due to incomplete repression of Pho84. They had overexpressed Pho84 from a galactose-inducible promoter to generate a background with a regulatable Pi transporter. This trick allowed them to conveniently manipulate the strain and reduce (but not abolish) Pho84 expression by transferring the cells from galactose to glucose for their experiments. Therefore, we chose a more rigorous plasmid shuffling strategy to test the individual P<sub>i</sub> transporter, which allows an assessment without the leaky background expression of Pho84 on glucose. In contrast to O'Shea and colleagues, we observed zero growth of a strain expressing only PHO91. We have revised the results section to make this discrepancy more evident and provide a better motivation for our experiment.

      (5) It would be nice to see the actual data in Figure 6; not only a quantification. 

      We illustrate the phenotype of nuclear Pho4-GFP in panel A. Showing all the images necessary to appreciate the differences between the strains would require including many dozens of images into the figure, which would not be useful.

    1. Webinar Summary: "My Child Is Different. So What?"

      Summary

      This summary document analyses the key information from the webinar "Mon enfant est différent. Et alors ?" ("My child is different. So what?"), organised by the Fédération des Conseils de Parents d'Élèves (FCPE), the French federation of parents' councils.

      The event aimed to inform families of children with neurodevelopmental specificities, to ease their concerns, and to give them practical tools.

      Held in partnership with three expert associations, **HyperSupers - TDAH France**, the Fédération Française des DYS (FFDYS) and the Association Nationale Pour les Enfants Intellectuellement Précoces (ANPEIP), the webinar covered:

      • Attention Deficit Disorder with or without Hyperactivity (ADHD; TDAH in French),
      • DYS disorders, and
      • High Intellectual Potential (HPI).

      The key takeaways are:

      1. Prevalence and Normalisation: The disorders and specificities discussed are common, affecting on average more than one pupil per class in France.

      It is crucial to understand that these are neurodevelopmental conditions, not the result of poor parenting or of a lack of effort by the child.

      2. Importance of Diagnosis: Early identification and a precise, differential diagnosis are fundamental. They make it possible to put appropriate support in place, to avoid misreading the child's behaviour (as laziness or provocation), and to prevent a decline in self-esteem.

      3. Towards an Inclusive School: School inclusion is a right and a necessity. The key lies in close collaboration between parents, teaching teams and associations.

      The FCPE reaffirms that "an inclusive school is not a separate school, it is the school for everyone".

      4. Resources and Support: School support schemes (PAI, PAP, PPS) exist to meet pupils' specific needs.

      The associations play an indispensable role by offering expertise, documentation, training and peer support, thereby breaking the isolation that families often feel.

      Context and Objectives of the Webinar

      Organised by the FCPE and hosted by Aline, its deputy general secretary, the webinar was designed as a "useful, supportive and practical moment of exchange".

      Its main objective was to respond to a concern shared by many parents: "My child doesn't quite fit the mould; how can I help them thrive at school?"

      The starting observation is that these differences, although "part of the ordinary landscape of school", are too often "neither sufficiently identified nor sufficiently supported".

      Prevalence of the Disorders and Specificities in Schools

      | Category | Prevalence | Representation per class |
      | --- | --- | --- |
      | DYS disorders | 5 to 8% of children | About 1 to 2 pupils per class |
      | ADHD | About 5% of children | About 1 pupil per class |
      | High Intellectual Potential (HPI) | 2 to 3% of children | About 1 pupil per class |
      | Combined total | > 10% of pupils | More than one child per class on average |
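      To make the arithmetic behind the table above explicit, here is a minimal Python sketch that converts the quoted prevalence ranges into expected pupils per class. The class size of 25 is an assumption made for illustration and was not stated in the webinar; the names `CLASS_SIZE`, `PREVALENCE` and `expected_per_class` are likewise hypothetical.

```python
CLASS_SIZE = 25  # assumption: illustrative class size, not a figure from the webinar

# Prevalence ranges (low, high) as fractions, taken from the table above.
PREVALENCE = {
    "DYS disorders": (0.05, 0.08),
    "ADHD": (0.05, 0.05),
    "HPI": (0.02, 0.03),
}


def expected_per_class(low, high, class_size=CLASS_SIZE):
    """Expected number of pupils per class for a prevalence range."""
    return low * class_size, high * class_size


for name, (low, high) in PREVALENCE.items():
    lo_n, hi_n = expected_per_class(low, high)
    print(f"{name}: {lo_n:.2f} to {hi_n:.2f} pupils per class of {CLASS_SIZE}")

# Naive sum of the ranges. It ignores children who belong to more than one
# category, so it should be read as an upper bound on the combined share.
total_low = sum(low for low, _ in PREVALENCE.values())
total_high = sum(high for _, high in PREVALENCE.values())
print(f"Naive combined range: {total_low:.0%} to {total_high:.0%}")
```

      Because the categories can overlap (a child may have both ADHD and a DYS disorder, for instance), the naive sum overstates the combined share, which is consistent with the more cautious "> 10%" given in the table.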

      Analysis of the Disorders and Specificities

      1. Attention Deficit Disorder with or without Hyperactivity (ADHD)

      Presented by Daniel of HyperSupers - TDAH France, ADHD is a neurodevelopmental disorder (NDD) that affects the brain functions involved in organising thought, memory, communication and learning.

      Cardinal Symptoms: ADHD manifests along three main axes, whose intensity varies from person to person:

      ◦ Inattention: Difficulty sustaining attention, frequent forgetfulness, a tendency to be distracted ("head in the clouds"), avoidance of tasks requiring sustained concentration. It is the symptom most likely to persist into adulthood.

      ◦ Hyperactivity: Constant motor restlessness in children, which often turns into mental hyperactivity (racing thoughts) in adolescence and adulthood.

      ◦ Impulsivity: Difficulty waiting one's turn, a tendency to interrupt others, blurting out answers before a question is finished.

      Prevalence and Comorbidities:

      ◦ It affects about 5% of children (350,000 in France) and 3% of adults.

      ◦ Boys are diagnosed twice as often, but the disorder is underdiagnosed in girls, in whom inattention is often the predominant symptom.

      ◦ 50% of people with ADHD have at least one associated disorder (comorbidity), such as DYS disorders, autism spectrum disorder, anxiety or depressive disorders, or oppositional defiant disorder.

      Diagnosis and Management:

      ◦ Diagnosis is clinical and relies on validated questionnaires (e.g. DSM-5), which require that symptoms be present before age 12, in at least two areas of life (school, family), and that they have a significant impact on quality of life.

      ◦ Management is multimodal: psychoeducation (explaining the disorder to the child and the parents), school accommodations (PAP, PPS), parent training (e.g. the Barkley method), and possibly medication.

      Impact at School: A pupil with ADHD may be perceived as a daydreamer, disruptive or lazy.

      They struggle to follow instructions, lose their belongings and achieve poor results despite expending a great deal of energy, which leads to considerable fatigue and lower self-esteem.

      2. Specific Learning Disorders (DYS Disorders)

      Presented by Fabienne of the Fédération Française des DYS (FFDYS), DYS disorders are also neurodevelopmental disorders.

      Fundamental Principles:

      ◦ They are neither an illness (they cannot be cured), nor a psychological disorder, nor an intellectual or sensory impairment. Intelligence is preserved.

      ◦ They are not caused by a lack of stimulation or by a disadvantaged sociocultural environment.

      ◦ Their central feature is a difficulty in automating certain cognitive functions, which keeps the child in permanent cognitive overload and causes marked slowness and intense fatigue.

      The Different DYS Disorders:

      ◦ Dyslexia / Dysorthographia: A disorder of written word identification. Reading is slow and halting (decoding), which hinders access to meaning. It is almost always accompanied by dysorthographia (difficulty automating spelling rules).

      ◦ Dysphasia (Developmental Language Disorder): A disorder of oral communication, affecting comprehension and/or expression. The child must make a major effort to understand oral instructions and to make themselves understood.

      ◦ Dyscalculia: A disorder of logical and mathematical cognition, affecting the understanding of number sense, quantities and operations.

      ◦ Dyspraxia (Developmental Coordination Disorder) / Dysgraphia: A disorder of the planning and automation of movements. The child is described as "clumsy" and has difficulty with fine motor skills (writing, tying laces, using cutlery), spatial organisation (geometry, reading tables) and time management.

      3. High Intellectual Potential (HPI)

      Presented by Frédéric of ANPEIP, HPI is not a disorder but a specificity recognised by the French Ministry of National Education (Éducation Nationale) as a "special educational need".

      Definition and Identification:

      ◦ It is characterised by qualitatively different intellectual functioning, confirmed by neuroimaging studies.

      ◦ Identification relies on a full psychological assessment carried out by a professional and cannot be reduced to an IQ score (above 130). The assessment also looks at self-esteem, anxiety, social relationships, etc.

      An individual cannot be reduced to a number.

      Characteristics:

      ◦ Constant questioning about existential subjects (life, death) and great curiosity.

      ◦ Very rapid understanding, and an ability to make connections and take shortcuts.

      ◦ Great sensitivity and a critical mind that develops very early.

      Key Concepts:

      ◦ Dyssynchrony: A gap between intellectual development (often advanced) and emotional, social or psychomotor development (which matches the child's actual age). A 6-year-old may think in a very mature way but have the motor skills typical of their age, which makes writing difficult.

      ◦ Double or Triple Specificity: A child with HPI may also have ADHD and/or DYS disorders. HPI can then mask the disorders for a time, making diagnosis complex and often late (end of lower secondary school or upper secondary school).

      Impact at School: The gap can lead to boredom, disengagement and difficulties with socialisation. The comment "could do better; has ability but does not use it" is common.

      Supporting Children and Families

      The Role of the FCPE

      As a national association of pupils' parents, the FCPE is present at every level of the education system.

      Representation: It sits on national, regional, departmental and local bodies (school council, board of governors, CESCE, appeal committees, etc.).

      Partnerships: It works with municipalities, regional education authorities (académies) and ministries, as well as with bodies such as the CDAPH (Commission des droits et de l'autonomie des personnes handicapées), the CPAM, the ARS and the MDA (Maison des Adolescents).

      Missions: It pays particular attention to children's rights and to respect for special educational needs. It belongs to collectives such as the Réseau Éducation Sans Frontières (RESF), which support families in precarious situations.

      Support from the Partner Associations

      Each association offers crucial support grounded in expertise and peer support.

      Key actions and resources, by association:

      HyperSupers - TDAH France

      • Peer support: Discussion groups (GSP), online forums, "SOS Rentrée Scolaire" (back-to-school) hotline.
      • Resources: Website (tdah-france.fr), brochures, books, web documentaries, YouTube channel.
      • Training: Online training modules for members.
      • Advocacy: Representation on national health and disability bodies.

      Fédération Française des DYS (FFDYS)

      • Local network: Federation of 150 local associations, which can be found via a map on the ffdys.com website.
      • Events: National DYS Day (Journée Nationale des DYS), scientific conferences (available as replays).
      • Information: Podcasts, videos, practical guides (guidance, employment).
      • Training: Training body for education, health and employment professionals.

      ANPEIP

      • Regional network: Federation of 14 regional associations.
      • Breaking isolation: Parents' cafés, outings, and workshops for children and teenagers so that they can meet their peers.
      • Information: Talks and workshops to demystify HPI.
      • Advocacy: A partner of the Éducation Nationale, working to harmonise practices and update official documents (Vade-mecum HPI).

      Key Quotes

      Aline (FCPE): "An inclusive school is not a separate school, it is the school for everyone."

      Fabienne (FFDYS): "[DYS disorders] are not illnesses, which means they cannot be cured. People keep these disorders throughout their lives."

      Frédéric (ANPEIP): "An individual cannot be reduced to a number."

      Daniel (TDAH France), on adults diagnosed late: "Believe me, it is a liberation for these adults; they set off again on a new footing."

    1. Reviewer #3 (Public review):

      This study investigates the connection between glycolysis and the biosynthesis of sulfur-containing amino acids in controlling fungal morphogenesis, using Saccharomyces cerevisiae and C. albicans as model organisms. The authors identify a conserved metabolic axis that integrates glycolysis with cysteine/methionine biosynthetic pathways to influence morphological transitions. This work broadens the current understanding of fungal morphogenesis, which has largely focused on gene regulatory networks and cAMP-dependent signaling pathways, by emphasizing the contribution of metabolic control mechanisms. However, despite the novel conceptual framework, the study provides limited mechanistic characterization of how the sulfur metabolism and glycolysis blockade directly drive morphological outcomes. In particular, the rationale for selecting specific gene deletions, such as Met32 (and not Met4), or the Met30 deletion used to probe this pathway, is not clearly explained, making it difficult to assess whether these targets comprehensively represent the metabolic nodes proposed to be critical. Further supportive data and experimental validation would strengthen the claims on connections between glycolysis, sulfur amino acid metabolism, and virulence.

      Strengths:

      (1) The delineation of how glycolytic flux regulates fungal morphogenesis through a cAMP-independent mechanism is a significant advancement. The coupling of glycolysis with the de novo biosynthesis of sulfur-containing amino acids, a requirement for morphogenesis, introduces a novel and unexpected layer of regulation.

      (2) Demonstrating this mechanism in both S. cerevisiae and C. albicans strengthens the argument for its evolutionary conservation and biological importance.

      (3) The ability to rescue the morphogenesis defect through exogenous supplementation of sulfur-containing amino acids provides functional validation.

      (4) The findings from the murine Pfk1-deficient model underscore the clinical significance of metabolic pathways in fungal infections.

      Weaknesses:

      (1) While the link between glycolysis and sulfur amino acid biosynthesis is established via transcriptomic and proteomic analysis, the specific regulation connecting these pathways via Met30 remains to be elucidated. For example, what are the expression and protein levels of Met30 in the initial analysis from Figure 2? How specific is this effect on Met30 in anaerobic versus aerobic glycolysis, especially when the pentose phosphate pathway is involved in the growth of the cells when glycolysis is perturbed?

      (2) Detailed metabolite profiling would have strengthened the metabolic connection and provided additional insight into intermediate flux changes, for example by measuring whether intracellular cysteine or methionine levels are affected. It would also be informative to show how Met30 deletion affects cell growth. These data are not included, even though a viable heterozygous Met30 strain has been established.

      (3) Comparing the previous bioRxiv version of this article from May 2025 (doi: https://doi.org/10.1101/2025.05.14.654021) with the recent bioRxiv version (doi: https://doi.org/10.1101/2025.05.14.654021), there have been some changes: the Met30 deletion has recently been included, and chemical perturbation of glycolysis has been added as new data. Although these changes improve the illustration of the hypothesis in Figure 6, which connects glycolysis to sulfur metabolism, the gene expression and protein levels of the genes involved in that hypothesis are not shown consistently. For example, Met4 expression is not shown in some cases (Figure 4), and Met30 expression is not shown in any profiling (gene expression or protein levels) in the manuscript. This inconsistency in profiling the same set of key genes makes the results harder to interpret.

      (4) The demonstrated link between glycolysis and sulfur amino acid biosynthesis, along with its implications for virulence in C. albicans, is important for understanding fungal adaptation, as mentioned in the article; however, Met4 activation was not fully characterized, nor were Met4 data presented when virulence was assessed in Figure 4. Why is Met4 not included in Figures 4D and 4I? According to Figure 6, Met4 activation is crucial and drives the differences between glycolysis-active and glycolysis-inactive conditions.

      (5) Similarly, the rationale behind selecting Met32 for characterizing sulfur metabolism is unclear. Deletion of Met32 resulted in a significant reduction in pseudohyphal differentiation; why is this attributed only to Met32? What happens if Met4 is deleted? The choice of Met32 rather than Met4 is not justified, even though Figure 6 clearly hypothesizes that Met4 activation is the key to the mechanism.

      (6) The comparative RT-qPCR in Figure 5 did not include sulfur metabolism genes and focused only on virulence and hyphal differentiation. Are there data on the expression levels of sulfur metabolism genes?

      (7) To validate the proposed link between sulfur metabolism and virulence, it is recommended that the gene set illustrated in Figure 6 be included consistently across all comparative data. Excluding sulfur metabolism genes from Figure 5 prevents the experiment from demonstrating the coordinated role of glycolysis perturbation → sulfur metabolism → virulence. The same is true for other comparisons, where the lack of data on Met30, Met4, etc., makes it hard to connect the results to the hypothesis. It is also recommended to check and report the expression of other genes related to the cAMP pathway to confirm the cAMP-independent mechanism. For example, gpa2 deletion was used to confirm the effects of cAMP supplementation, but the expression of this gene was not assessed in the RNA-seq analysis in Figure 2. Showing the expression of cAMP-related genes would help confirm that they do not play a role in the claims in Figure 2.

      (8) Although the NAC supplementation study is new in this version of the article compared to the previous bioRxiv version (May 2025), the link to sulfur metabolism is not well characterized in Figure 5 and its related datasets. The main focus of the manuscript is to delineate the role of sulfur metabolism; hence, Figure 5 would be expected to include sulfur-related metabolic genes and their links to pfk1 deletion, using RT-PCR measurements as shown for the virulence genes.

      (9) The manuscript would benefit from more background in the introduction and from literature support for findings reported earlier, including (i) the role of the cAMP-PKA and MAPK pathways, and (ii) what is known about treatment with 2DG (the role of Snf1, HXT1, and HXT3), as well as how gpa2 is involved. Some sentences in the manuscript are repetitive; it would be more useful to add relevant material to the introduction and discussion to clarify the rationale for the gene choices.

    1. Reviewer #2 (Public review):

      Summary:

      In this study, the authors identify the N-glycosylation factor B4GALT1 as an important regulator of CD8 T-cell function.

      Strengths:

      (1) The use of complementary ex vivo and in vivo CRISPR screens is commendable and provides a useful dataset for future studies of CD8 T-cell biology.

      (2) The authors perform multiple untargeted analyses (RNAseq, glycoproteomics) to hone their model on how B4GALT1 functions in CD8 T-cell activation.

      (3) B4GALT1 is shown to be important in both in vitro T-cell killing assays and a mouse model of tumor control, reinforcing the authors' claims.

      Weaknesses:

      (1) The authors did not verify the efficiency of knockout in their single-gene KO lines.

      (2) As B4GALT1 is a general N-glycosylation factor, the phenotypes the authors observe could formally be attributable to indirect effects on glycosylation of other proteins.

      (3) The specific N-glycosylation sites of TCR and CD8 are not identified, and would be helpful for site-specific mutational analysis to further the authors' model.

      (4) The study could benefit from further in vivo experiments testing the role of B4GALT1 in other physiological contexts relevant to CD8 T cells, for example, autoimmune disease or infectious disease.

    2. Author response:

      Reviewer #1 (Public review):

      Summary:

      The study by Yu et al., which investigated the role of protein N-glycosylation in regulating T-cell activation and function, is an interesting work. Using genome-wide CRISPR/Cas9 screens, the authors found that B4GALT1 deficiency could activate expression of PD-1 and enhance the functions of CD8+ T cells both in vitro and in vivo. This suggests an important role for protein N-glycosylation in regulating CD8+ T-cell function and indicates that B4GALT1 is a potential target for tumor immunotherapy.

      Strengths:

      The strength of this study is the finding of a novel function of B4GALT1 deficiency in CD8 T cells.

      Weaknesses:

      However, the authors did not directly demonstrate that B4GALT1 deficiency regulates the interaction between TCR and CD8, or the functional outcomes of this interaction, such as enhanced TCR signaling.

      We are very sorry that we did not highlight our results in Fig. 5f-h enough. In those figures, we demonstrated by FRET assays that the interaction between TCR and CD8 increased significantly in B4GALT1-deficient T cells. To confirm the important role of the TCR-CD8 interaction in mediating the effects of B4GALT1 on T-cell functions, such as in vitro killing of target cells, we artificially tethered TCR and CD8 with a CD8β-CD3ε fusion protein and tested its function in both WT and B4GALT1 knockout CD8<sup>+</sup> T cells. Our results demonstrate that this fusion protein could bypass the effect of B4GALT1 knockout in CD8<sup>+</sup> T cells (Fig. 5g-h). Together with the results showing that B4GALT1 directly regulates the galactosylation of TCR and CD8, these results strongly support the model that B4GALT1 modulates T-cell functions mainly through galactosylation of TCR and CD8, which interferes with their interaction.

      Reviewer #2 (Public review):

      Summary:

      In this study, the authors identify the N-glycosylation factor B4GALT1 as an important regulator of CD8 T-cell function.

      Strengths:

      (1) The use of complementary ex vivo and in vivo CRISPR screens is commendable and provides a useful dataset for future studies of CD8 T-cell biology.

      (2) The authors perform multiple untargeted analyses (RNAseq, glycoproteomics) to hone their model on how B4GALT1 functions in CD8 T-cell activation.

      (3) B4GALT1 is shown to be important in both in vitro T-cell killing assays and a mouse model of tumor control, reinforcing the authors' claims.

      Weaknesses:

      (1) The authors did not verify the efficiency of knockout in their single-gene KO lines.

      We thank the reviewer for the reminder. We verified the efficiency of some gRNAs by FACS and Surveyor assay. We will add these data to the supplementary results in the revised version.

      (2) As B4GALT1 is a general N-glycosylation factor, the phenotypes the authors observe could formally be attributable to indirect effects on glycosylation of other proteins.

      Please see our response to Reviewer #1.

      (3) The specific N-glycosylation sites of TCR and CD8 are not identified, and would be helpful for site-specific mutational analysis to further the authors' model.

      We thank the reviewer for the suggestion. Unfortunately, multiple sites on TCR and CD8 are N-glycosylated (https://glycosmos.org/glycomeatlas). We are concerned that mutating all of these sites may affect not only the glycosylation of TCR and CD8 but also other essential functions of these proteins.

      (4) The study could benefit from further in vivo experiments testing the role of B4GALT1 in other physiological contexts relevant to CD8 T cells, for example, autoimmune disease or infectious disease.

      We thank the reviewer for this great suggestion to extend the study of B4GALT1 to autoimmune and infectious diseases. However, since the current manuscript focuses mainly on tumor immunology, we think these studies are best left for future work.

    1. The key to productive failure as we envision it is to recognize when one’s work is suffering from a type 1 or type 2 fail, and to transform it to a type 3 or type 4

      This is the likely goal of my project. I had some weak foundational material to begin with, but simply making the attempt and getting over a want of total success might help me get a better perspective on my research and on me as a person.

    1. Let’s add mod to our formula. But we’ll use mod 13 (instead of mod 12).

      For example, if we had a PRNG where X sub i is 10, then X sub i plus 1 is 10 times 2 plus 1, which is 21, and mod 13 gives us 8. If X sub i is 8, X sub i times 2 plus 1 is 17, and 17 mod 13 is 4. If we take 4 as our input, we get 9, then 6, then 0, and so forth; if we continue in this way, we will produce another sequence of numbers.

      The numbers jump around seemingly randomly between 0 and 11. But notice that the sequence repeats once it returns to 10, after only 12 values, so the period is very short. And it can never generate a 12. It just wouldn't be a very good PRNG.
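
      The walk-through above can be checked with a minimal sketch. It assumes the formula being described is X sub i plus 1 equals 2 times X sub i plus 1, mod 13; the function name `lcg_sequence` and its parameter names are illustrative, not taken from the source.

      ```python
      def lcg_sequence(seed, multiplier=2, increment=1, modulus=13):
          """Return the values visited, starting from `seed`, until one repeats."""
          seen = []
          x = seed
          while x not in seen:
              seen.append(x)
              # Assumed update rule: next = (2 * current + 1) mod 13
              x = (multiplier * x + increment) % modulus
          return seen


      cycle = lcg_sequence(10)
      print(cycle)             # values visited before the sequence returns to 10
      print(len(cycle))        # period of the generator when seeded with 10
      print(12 in cycle)       # whether 12 is ever produced from this seed
      print(lcg_sequence(12))  # 12 maps to itself, so it forms its own cycle
      ```

      Seeding with 12 is the only way to see a 12 under this rule, because 2 times 12 plus 1 is 25, and 25 mod 13 is 12 again.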

    1. Webinar Summary: Reconciling the Challenges of Sustainable Food with Precarity

      Summary

      This summary document recaps the discussions of the webinar "Comment concilier les 4 enjeux de l'alimentation durable et la précarité ?" ("How can the four challenges of sustainable food be reconciled with precarity?"), organized by CRES and GRAINE PACA.

      It highlights the complexity of food precarity, a heterogeneous phenomenon that is difficult to quantify and is estimated to affect about 8 million people in France.

      The PACA region stands out for its high poverty rate, the third highest in France, which exacerbates inequalities in access to quality food.

      The scientific presentations showed that the four pillars of sustainable food (nutrition/health, environment, socio-economic, socio-cultural) do not naturally converge.

      However, in-depth studies reveal that a diet that is both healthy and low in environmental impact can be less expensive.

      The key lies in a "healthy shift toward plant-based eating" ("végétalisation saine"): reducing consumption of animal products, particularly ruminant meat, offset by a higher intake of whole grains, legumes, fruits, and vegetables.

      The PACA region has a structured ecosystem for tackling these challenges, with coordinating bodies such as COALIM and thematic networks (Précalim, Éducalim, Régalim, PAT) that aim to break down silos between approaches.

      National programs such as "Mieux Manger Pour Tous" and regulations such as the EGalim law provide financial and legal frameworks for transforming food systems, including food aid.

      Finally, the case study of the Mouans-Sartoux social grocery illustrates a successful transition from an aid model based on unsold goods to an offer of fresh, organic, and local products.

      This transformation was made possible by political will, strategic partnerships (Biocoop, local producers), and access to dedicated funding. It shows that the quality and sustainability of food aid can be radically improved while respecting the dignity of beneficiaries.

      --------------------------------------------------------------------------------

      1. Introduction and Context of the Webinar

      Organized by CRES PACA (Comité Régional d'Éducation pour la Santé) and GRAINE PACA (Réseau Régional pour l'Éducation à l'Environnement et au Développement Durable), this webinar received financial support from DREAL and was run in partnership with DRAF and ADEME Provence-Alpes-Côte d'Azur.

      It is part of the Mieux Manger Pour Tous program and belongs to a cycle of two webinars led by two major regional networks:

      Précalim: regional network for combating food precarity.

      Éducalim: regional network for education in sustainable food and taste.

      The main objectives of the webinar were:

      • To deepen knowledge of the notion of sustainable food and of the levers for reconciling its challenges for people living in precarity.

      • To identify the main regulations related to sustainable food for all.

      • To discover an inspiring and reproducible field initiative.

      2. The Strategic Framework and Stakeholder Networks in the PACA Region

      2.1. The Regional Ecosystem for Sustainable Food

      As presented by Peggy Bucas (DRAF), the regional network in PACA is designed to maximize the effectiveness of actions in favor of sustainable food.

      COALIM: This body brings together the regional institutions (DRAF, DREAL, DREETS, ARS, ADEME, Région) that steer missions and funding related to sustainable food. It ensures that actions are coordinated and complementary.

      Regional thematic networks: Four main networks provide thematic and methodological support to project leaders.

      ◦ Précalim: focused on combating food precarity.

      ◦ Éducalim: centered on education in sustainable food and taste.

      ◦ Régalim: dedicated to combating food waste and food losses.

      ◦ Réseau des PAT: coordinates the region's 29 Projets Alimentaires Territoriaux (PAT, territorial food projects).

      The PATs are essential levers for fostering cooperation, breaking down silos, and developing a systemic approach. Their mission includes integrating a "social justice" component to reduce food precarity.

      2.2. The Précalim Network and the "Mieux Manger Pour Tous" Program

      As presented by Sandrine Fort (DREETS), the Précalim network and the MMPT program are pillars of the fight against food precarity in the region.

      The Précalim network:

      Members: Nearly 600 members (institutions, associations, local authorities). A call has been issued to bring in more agricultural stakeholders.

      Objectives:

      ◦ Build mutual awareness among stakeholders.

      ◦ Encourage the sharing of initiatives and feedback from experience.

      ◦ Promote synergies and cooperation.

      ◦ Showcase actions and stakeholders.

      Actions: Meeting days, thematic webinars, "project accelerator" workshops, and a collaborative platform hosted on ADEME's online space.

      The "Mieux Manger Pour Tous" (MMPT) program:

      Origin: Stems from the action plan for the transformation of food aid.

      National budget: 60 million euros in 2023, with a planned increase of 10 million per year until 2027.

      Objectives:

      1. Improve the nutritional and taste quality of food aid.

      2. Support the participation of, and support for, people living in precarity.

      3. Transform local schemes for combating food precarity (e.g., solidarity baskets, purchasing groups).

      4. Reduce the environmental impact of the food aid system.

      Program figures in PACA:

      ◦ 2023: 51 projects funded for 1.7 million euros.

      ◦ 2024: 62 projects funded for 2.5 million euros.

      ◦ 2025: A budget of 3.3 million euros, with 46 additional projects under review.
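
      The national budget trajectory stated above can be worked out with a short sketch. It assumes that "10 million per year" means a fixed increase of 10 million euros each year on top of the 2023 figure; the function name `projected_budget` is illustrative and the yearly values other than 2023 are derived, not quoted from the webinar.

      ```python
      def projected_budget(start_year=2023, start_amount=60, yearly_increase=10, end_year=2027):
          """Return {year: budget in millions of euros} under a fixed yearly increase."""
          return {
              year: start_amount + yearly_increase * (year - start_year)
              for year in range(start_year, end_year + 1)
          }


      budget = projected_budget()
      for year, amount in budget.items():
          print(f"{year}: {amount} million euros")
      ```

      Under this assumption the national budget would reach 100 million euros in 2027.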

      3. Food Precarity: Definitions and Key Figures

      3.1. Core Definitions

      Term: Definition

      Sustainable food (FAO): Diets that help protect biodiversity, are culturally acceptable, economically fair and accessible, and nutritionally safe and healthy. It rests on four challenges: nutrition/health, environment, socio-economic, and socio-cultural.

      Combating food precarity: Promoting access to safe, varied, good-quality food in sufficient quantity for people in vulnerable situations, while respecting their dignity and developing their capacity to act.

      Food aid: The supply of foodstuffs to vulnerable people, accompanied by an offer of support.

      Food insecurity (FAO): A situation in which a person lacks regular access to enough safe and nutritious food for growth and an active, healthy life. It is measured with the FIES (Food Insecurity Experience Scale).

      3.2. The Current State of Food Precarity

      Measuring food precarity in France is complex because there is no consistent, regular method for counting it.

      The data come from cross-referencing several sources (public statistics, one-off studies such as INCA 3, data from associations).

      National figures (estimates):

      People experiencing food insecurity: 8 million, or 11% of the population (Anses).

      Food dissatisfaction: 16% of people say they do not have enough to eat, and 45% say they do not have the foods they would like (CRÉDOC, 2022).

      Food aid beneficiaries: Between 2 and 9 million. The DGCS counts 5.3 million people registered with accredited associations.

      Non-take-up of food aid: 75% of people experiencing food insecurity do not use food aid (INCA 3 study, 2015).

      Financial difficulties: 38% of French people have financial difficulty eating fresh fruits and vegetables every day (Ipsos/Secours Populaire barometer, 2024).

      Health impacts:

      • The prevalence of obesity is nearly four times higher among the poorest adults.

      • Fruit and vegetable consumption is half as high among people experiencing food insecurity (230 g/day on average, against a recommendation of 400 g/day).

      Situation in the PACA region:

      Poverty rate: Third highest in France, affecting about 850,000 people.

      Median standard of living of poor people: €10,600 per year, less than half the median standard of living of the region's population as a whole (€22,000).

      Poorest département: Vaucluse, with a poverty rate of 20% (fifth highest in France).

      Most affected groups:

      ◦ Households whose reference person is under 30 (25% poverty rate).

      ◦ Single-parent families (30.2%).

      ◦ Seniors (retirees make up 30.4% of poor households).

      4. Scientific Insights: Toward Sustainable and Affordable Food

      Florent Vieux (MS-Nutrition) presented several studies that quantify the dimensions of sustainable food (nutrition, environment, cost) using reference databases (INCA 3, Ciqual, Agribalyse, Kantar).

      4.1. Ranking of Food Groups

      This study shows that the ranking of foods by cost and environmental impact depends strongly on the functional unit chosen.

      Functional unit: Key findings

      Per kilogram (€/kg):

      • Most expensive and highest impact: ruminant meat, seafood.

      • Least expensive and lowest impact: fruits, vegetables, legumes.

      Per 100 kilocalories (€/100 kcal):

      • Fruits and vegetables become very expensive and high-impact because of their low energy density.

      • Dairy products and eggs remain in an intermediate position.

      Per unit of nutritional quality:

      • Seafood becomes more "affordable" again.

      • Fruits, vegetables, and legumes remain very relevant choices (low cost and impact relative to their nutritional density).

      Main conclusion: Food categories rank very similarly on cost and on environmental impact.

      Certain foods, such as legumes, potatoes, and whole grains, are consistently inexpensive and low-impact, whatever the functional unit.

      4.2. The "Positive Deviance" Approach

      This study compared the diets of individuals with good nutritional quality but different environmental impacts.

      The "more sustainable" group (good nutrition, low impact) also had a lower food cost.

      Markers of good nutritional quality (common to both groups):

      ◦ High consumption of fruits and vegetables.

      ◦ High consumption of dairy products.

      ◦ Low consumption of sugary drinks.

      What distinguishes the low-environmental-impact group:

      ◦ Much lower consumption of ruminant meat.

      ◦ Markedly higher consumption of whole grains to compensate.

      4.3. Conclusion and Recommendations

      All of the studies converge on one main message: a "healthy shift toward plant-based eating" ("végétalisation saine").

      This means reducing consumption of animal products (especially meat) and replacing them with informed plant-based choices (whole grains, legumes, fruits, and vegetables).

      Specific issue for people living in precarity: Increasing fruit and vegetable consumption is the priority, because their starting level of consumption is particularly low.

      Carbon footprint: Although the poorest have a much lower overall carbon footprint than the richest, the difference is less marked for the "food" category. Acting on this lever therefore remains relevant for everyone.

      5. Regulatory Framework and Levers for Action

      5.1. The EGalim Law as a Model

      Clara Vigan (DRAF) presented the EGalim law, which applies to institutional catering, as a powerful lever that can inspire action beyond that sector.

      Objectives of the law:

      ◦ 50% quality and sustainable products, including at least 20% organic products.

      ◦ Diversification of protein sources through the introduction of vegetarian menus, which helps reduce costs.

      ◦ Combating food waste.

      ◦ Reducing the use of plastic.

      These principles can be transposed to food aid to improve the quality of the offer while keeping budgets under control.

      5.2. The Imperative of Food Safety

      Peggy Bucas (DRAF) recalled the fundamental rules of the "Hygiene Package" ("Paquet Hygiène"), which are crucial for any organization distributing foodstuffs.

      Key principles: traceability of donations, maintenance of the cold and hot chains, hygiene of premises and staff.

      Essential distinction:

      ◦ DLC (Date Limite de Consommation, use-by date): exceeding it is strictly prohibited.

      ◦ DDM (Date de Durabilité Minimale, best-before date): "best consumed before"; the product remains safe to eat after the date.

      6. Case Study: The Transformation of the Mouans-Sartoux Social Grocery

      Rémy Georgon (CCAS of Mouans-Sartoux) shared lessons from the transformation of the town's social grocery.

      The trigger: A collective realization in 2020, prompted by the declining quality of donations from unsold goods. The organization realized it was distributing "products that nobody bought".

      The transformation strategy:

      1. Strategic partnerships: A collaboration with the local Biocoop store made it possible to introduce bulk products (food and hygiene) and to create a section of purchased organic products.

      2. Fundraising: Use of the "France Relance" call for projects (to renew refrigeration equipment) and the "Mieux Manger Pour Tous" call for projects.

      3. Local and seasonal sourcing: A group-ordering system for fresh, seasonal vegetables was set up with a local producer.

      4. Synergy with municipal policy: The MMPT project funded the hiring of a market gardener by the CCAS, who was assigned to the municipal farm to increase the production of organic vegetables for the grocery.

      5. Involvement of beneficiaries: Users were consulted to decide which fresh products to buy as a priority (dairy products).

      Quantitative results:

      ◦ In 2024, organic products accounted for 7% of stock (with 0% fruits and vegetables).

      ◦ In the first half of 2025, this figure rose to 46% organic products by weight, of which 62% are fruits and vegetables.

      ◦ The food purchasing budget rose from €4,000 to €25,000, supported by grants.

      Key success factors:

      ◦ The conviction and commitment of the manager.

      ◦ Strong political will and the support of the town hall.

      ◦ The ability to seek out external partners and funding.

      ◦ The choice to prioritize quality over quantity.

    1. narcissism is more strongly related to behavioral problems in the context of low self-esteem [3]. This is an important consideration for children with symptoms of ADHD who are at greater risk for poor self-esteem,

      This is the whole thesis in one sentence. 1) ADHD kids are at risk for low self-esteem (from criticism). 2) Narcissism is the defense that builds on top of it.

    1. Summary of Research on Associations and Their Territories

      Summary

      This summary document presents the main findings of a doctoral research project analyzing the complex relationships between associations and their territories.

      The research shows that an association's territory is not a simple geographical given, but a dynamic and relational construct shaped by interactions, by the resources mobilized, and by proximities (geographical, organizational, institutional) with an ecosystem of actors.

      The study distinguishes an area of activity, often local, from a much wider area of influence (linked to the association's mission), and stresses that these two dimensions are complementary.

      Cooperation, which is central to this process, turns out to be strongly guided by sector membership and shared values, which has direct implications for fundraising and for the legitimacy of associations.

      The mixed methodology, combining a national quantitative analysis of 1,600 living areas (bassins de vie) with an in-depth qualitative study of eight territories, makes these conclusions considerably more robust.

      This work provides concrete arguments for advocacy, enabling associations to better demonstrate their multidimensional contribution to the development and attractiveness of territories.

      1. Context and Research Question

      An Institutional Context Under Strain

      The research takes place in an institutional context marked by "real questioning today and endangerment of the associative sector". This tension is illustrated by a fundamental dichotomy:

      On the one hand, growing recognition of the importance of associations. A September report by the Cour des comptes notes that "associations carry out social activities that fall within the remit of the State", attesting to their crucial role in "the sustainability, and even the inclusiveness, of our society".

      On the other hand, a systematic challenge to their funding and to their very existence.

      This situation creates a paradox between the national view, which recognizes their systemic contribution, and realities on the ground, where the legitimacy of associations to act and to be funded is constantly questioned.

      What Is at Stake in the Relationship to Territory

      The central question driving the research is how to characterize the relationship between associations and territories. An association's funding and legitimacy in a territory are often tied to its geographical perimeter of influence and activity. The research therefore aims to move beyond the idea that associations simply "cannot be relocated". An association can in fact close positions in one town to open them in another, which is a form of relocation. The research sets out to deconstruct the notion of the "local" in order to analyze how an association moves from simple location (presence in a space) to anchoring (established relationships) and to territorialization (becoming a specific and inseparable component of the territory).

      2. The Framework of the Doctoral Research Project

      This research is conducted as part of a CIFRE thesis (Convention Industrielle de Formation par la Recherche) within the Réseau National des Maisons des Associations (RNMA), begun in 2022.

      The Réseau National des Maisons des Associations (RNMA)

      The RNMA is a national network whose members are Maisons des Associations (MDA), whether they have associative status or are municipal services. Its missions include:

      • Relaying issues from the local level to the national level.

      • Supporting the profession of those who assist associations.

      • Developing engineering capacity, in particular by supporting the creation of local observatories of associative life, regarded as tools for co-constructing public policy.

      The Objectives of the Thesis

      The thesis aims to characterize and interpret the relationships between associations and the territory through three main objectives:

      1. Identify and characterize the socio-economic variables that explain the presence of employer associative establishments in a territory (quantitative approach).

      The hypothesis is that the associative fabric is linked to the historical, geographical, and cultural characteristics of a place.

      2. Test and demonstrate the relationships between the characteristics of the territory and the organizational and sectoral characteristics of associations (qualitative approach).

      3. Identify the factors behind the diversity of associations by analyzing the process that leads from localization to territorialization.

      3. Conceptual Framework and Methodology

      Operational Definitions: Association and Territory

      The research relies on precise definitions to structure its analysis:

      The Association:

      It is understood not through its legal definition (the French law of 1901) but as a "constructed organizational form" that intrinsically combines an activity (a response to needs) and a collective project.

      To do so, it mobilizes human resources (volunteers, activists, employees) and material resources.

      The Territory: It is not regarded as a static geographical space but as "a construction that comes from the history of the founding of the structure [...] and above all from the interactions it will have with other organizations". The territory is therefore the product of relationships between actors.

      A Mixed Methodological Approach

      The robustness of the study rests on a two-stage methodology:

      1. Quantitative Phase:

      Scope: About 1,600 "bassins de vie" (INSEE definition) in metropolitan France.

      Analysis: A statistical method was used to cross the socio-demographic characteristics of the bassins de vie (age pyramid, income, types of employment) with the presence of employer associative establishments (INSEE Flores 2021 data).

      Result: The analysis sorted all the bassins de vie into three large groups ("clusters"), that is, three families sharing similar characteristics in how their socio-economic profile relates to associative presence.

      2. Qualitative Phase:

      Sample: Eight bassins de vie were selected, representative of the three clusters and each having a Maison des Associations (associative or municipal).

      The territories studied are: Grenoble, Dijon, Amiens, Concarneau, Montrevault-sur-Èvre, Niort, Crayon, and Mauguio.

      Data collection: 28 semi-structured interviews were conducted (3 to 4 per bassin de vie) with associations from all sectors, both employers and non-employers.

      The initial focus on employer associations for the quantitative part is explained by the availability of reliable, consolidated national statistics (via URSSAF declarations), which do not exist for non-employer associations.

      4. Case Study: The Dijon Bassin de Vie

      To illustrate the analytical approach, the case of a cultural association in the Dijon bassin de vie is presented.

      Socio-economic Characteristics of the Territory

      | Indicator | Data |
      | Department | Côte-d'Or |
      | Population change (2016-2021) | +2.5% |
      | Demographic structure | 23% under 20 years old, 26% retirees |
      | Socio-professional categories | 11% managers and professionals (cadres) |
      | Economy | 83% of employment in the tertiary sector |
      | Employer associative fabric | 947 establishments, paying nearly 13,000 salaries |

      Local actors perceive the territory as offering a good quality of life ("bon vivre"), with a university and a substantial cultural offering, but also as being "a little bit bourgeois".

      Modeling a Cultural Association

      Purpose: An electronic music association, created in 2004 by enthusiasts after a club closed.

      Structure: Initially run by volunteers, it began professionalizing in 2012 and today has 3 employees and a governing body of 8 people.

      Activities:

      • Programming/Production: Concerts, festivals.
      • Creation: Mixing studios.
      • Activism: Promoting electronic music, professionalizing the sector, and championing the values of tolerance and diversity.
      • Diverse audiences: Workshops in EHPAD (nursing homes), awareness sessions for young people in MJC (youth and culture centers), children's dance parties ("booms").

      Ecosystem: The association interacts with many actors at different scales (municipality, metropolitan area, department, national):

      • other associations (cultural, environmental),
      • the municipal MDA,
      • institutional funders (City, DRAC, Regional Council), and
      • networks (Ligue de l'enseignement, cultural federations).

      Analysis Through the Lens of Proximities

      The relationship between the association and its ecosystem is analyzed through three types of proximity:

      Geographical Proximity: Obvious with its employees, its local audience, the EHPAD, the MJC, and other local associations. It makes meeting and cooperating easier.

      Organizational Proximity: Shared ways of operating.

      It exists with all associations (democratic governance, non-profit status) but is much stronger with associations in the same cultural sector, which share common rules and logics of action (for example, organizing a festival).

      Institutional Proximity: Shared values and norms. Likewise, while values such as solidarity are widely shared across the associative world, this proximity is markedly stronger at the sector level.

      5. Main Results and Cross-cutting Conclusions

      The case study and the other interviews support more general conclusions.

      The Territory: A Dynamic and Relational Construction

      The results show that associations are not simply "localized".

      Their ability to anchor themselves and to territorialize rests on complex mechanisms in which cooperation plays a central role.

      The territory is therefore "not at all frozen or fixed, it is entirely dynamic"; it is multi-scalar and multi-actor, and it is embodied in the ability of associations to mobilize resources and proximities.

      Distinguishing the Activity Zone from the Zone of Influence

      A fundamental distinction is drawn:

      The Activity Zone: The space in which the association's concrete activities take place. In the Dijon case, it is mainly concentrated on the municipality and the metropolitan area.

      The Zone of Influence: The much wider space over which the association's project radiates (activism, values, recognition of the movement). It "goes far beyond all those perimeters".

      These two zones are complementary and dynamic.

      The Central Role of Proximities and of the Sector of Activity

      Geographical Proximity is structuring: It is the first condition for meeting and getting to know one another.

      The MDA play a key role as "resource places" and "facilitators".

      The Sector of Activity is decisive: Organizational and institutional proximities are greatly amplified within a single sector.

      Associations in the same field share rules, logics, and above all much stronger specific values.

      Values as a guide to action: Institutional proximity (shared values) is a crucial factor in cooperation and in the search for funding.

      As one striking quotation from the study puts it: "We simply don't work with someone if we don't share the same values".

      6. Implications for Associative Advocacy

      The results of this research offer concrete avenues for associations to make the case for their activities to public actors and funders.

      1. Go beyond an activity-only logic: It is crucial to show that an association is not reducible to its activity (which contributes to local attractiveness and development), but that it also has a project and a zone of influence that reach well beyond it.

      2. Demonstrate the leverage effect: Local funding (for example, "1 € from an actor in a municipality") is not a mere subsidy.

      It has a leverage effect that makes it possible to seek other funding at other scales (regional, national), thereby contributing to the overall reach of the association and of the territory.

      3. Highlight the attraction of external resources: Through their network and zone of influence, associations attract outside resources (artists, expertise, funding) that they make available to residents and to the territory, thereby strengthening its attractiveness.

    1. Regular Expressions

      Notepad++ regular expressions (“regex”) use the Boost regular expression library v1.85 (as of NPP v8.6.6), which was originally based on PCRE (Perl Compatible Regular Expression) syntax, only departing from it in very minor ways. Complete documentation on the precise implementation is to be found on the Boost pages for search syntax and replacement syntax. (Some users have misunderstood this paragraph to mean that they can use one of the regex-explainer websites that accepts PCRE and expect anything that works there to also work in Notepad++; this is not accurate. There are many different “PCRE” implementations, and Boost itself does not claim to be “PCRE”, though both Boost and PCRE variants have the same origins in an early version of Perl’s regex engine. If your regex-explainer does not claim to use the same Boost engine as Notepad++ uses, there will be differences between the results from your chosen website and the results that Notepad++ gives.)

      The Notepad++ Community has a FAQ on other resources for regular expressions.

      Note: Regular expression “backward” search is disallowed due to sometimes surprising results. (For example, in the text “to the test they travelled”, a forward regex t\w+ will find 5 results; the same regex searching backward will find 17 matches.) If you really need this feature, please see Allow regex backward search to learn how to activate this option.

      Important Note: Syntax that works in the Find What: box for searching will not always work in the Replace with: box for replacement. There are different syntaxes. The Control Characters and Match by character code syntax work in both; other than that, see the individual sections for Searches vs Substitutions for which syntaxes are valid in which fields.

      Regex Special Characters for Searches

      In a regular expression (shortened to regex throughout), the special characters interpreted are:

      Single-character matches

      • . or \C ⇒ Matches any character.
        ◦ If you check the box which says “. matches newline”, or use the (?s) search modifier, then . or \C will match any character, including newline characters (\r or \n). With the option unchecked, or using the (?-s) search modifier, . or \C only match characters within a line, and do not match the newline characters.
        ◦ Any Unicode character within the Basic Multilingual Plane (BMP) (with a codepoint from U+0000 through U+FFFF) will be matched per these rules.
        ◦ Any Unicode character that is beyond the BMP (with a codepoint from U+10000 through U+10FFFF) will be matched as two separate characters instead, since the “surrogate code” uses two characters. (See the Match by Character Code section for more on how surrogate codes work.)
      • \X ⇒ Matches a single non-combining character followed by any number (zero or more) of combining characters. You can think of \X as a “. on steroids”: it matches the whole grapheme as a unit, not just the base character itself. This is useful if you have a Unicode encoded text with accents as separate, combining characters. For example, the letter ǭ̳̚, with four combining characters after the o, can be found either with the regex (?-i)o\x{0304}\x{0328}\x{031a}\x{0333} or with the shorter regex \X (the latter, being generic, matches more than just ǭ̳̚, including but not limited to ą̳̄̚ or o alone); if you want to limit the \X in this example to just match a possibly-modified o (so “o followed by 0 or more modifiers”), use a lookahead before the \X: (?=o)\X, which would match o alone or ǭ̳̚, but not ą̳̄̚.
      • \$ , \( , \) , \* , \+ , \. , \? , \[ , \] , \\ , \| ⇒ Prefixing a special character with \ to “escape” the character allows you to search for a literal character when the regular expression syntax would otherwise give that character a special meaning as a regex meta-character. The characters $ ( ) * + . ? [ ] \ | all have special meaning to the regex engine in normal circumstances; to get them to match as a literal (or to show up as a literal in the substitution), you will have to prefix them with the \ character. There are also other characters which are special only in certain circumstances (any time a character is used with a non-literal meaning throughout the Regular Expression section of this manual); if you want to match one of those sometimes-special characters as a literal character in those situations, it will also have to be escaped by putting a \ before it. Please note: if you escape a normal character, it will sometimes gain a special meaning; this is why so many of the syntax items listed in this section have a \ before them.

      Match by character code

      It is possible to match any character using its character code. This allows searching for any character, even if you cannot type it into the Find box, or the Find box doesn’t seem to match the emoji that you want to search for. If you are using an ANSI encoding in your document (that is, using a character set like Windows 1252), you can use any character code with a decimal codepoint from 0 to 255. If you are using Unicode (one of the UTF-8 or UTF-16 encodings), you can actually match any Unicode character. These notations require knowledge of hexadecimal or octal versions of the character code. (You can find such character code information on most web pages about ASCII, or about your selected character set, and about UTF-8 and UTF-16 representations of Unicode characters.)

      • \0ℕℕℕ ⇒ A single byte character whose code in octal is ℕℕℕ, where each ℕ is an octal digit. (That’s the number 0, not the letter o or O.) This notation works for codepoints 0-255 (\0000 - \0377), which covers the full ANSI character set range, or the first 256 Unicode characters.
        For example, \0101 looks for the letter A, as 101 in octal is 65 in decimal, and 65 is the character code for A in ASCII, in most of the character sets, and in Unicode.
      • \xℕℕ ⇒ Specify a single character with code ℕℕ, where each ℕ is a hexadecimal digit. What this stands for depends on the text encoding. This notation works for codepoints 0-255 (\x00 - \xFF), which covers the full ANSI character set range, or the first 256 Unicode characters. For instance, \xE9 may match an é or a θ depending on the character set (also known as the “code page”) in an ANSI encoded document.

      These next two only work with Unicode encodings (so the various UTF-8 and UTF-16 encodings):

      • \x{ℕℕℕℕ} ⇒ Like \xℕℕ, but matches a full 16-bit Unicode character, which is any codepoint from U+0000 to U+FFFF.
      • \x{ℕℕℕℕ}\x{ℕℕℕℕ} ⇒ For Unicode characters above U+FFFF, in the range U+10000 to U+10FFFF, you need to break the single 5-digit or 6-digit hex value and encode it into two 4-digit hex codes; these two codes are the “surrogate codes” for the character. For example, to search for the 🚂 STEAM LOCOMOTIVE character at U+1F682, you would search for the surrogate codes \x{D83D}\x{DE82}.
        ◦ If you want to know the surrogate codes for a given character, search the internet for “surrogate codes for character” (where character is the fancy Unicode character you need the codes for); the surrogate codes are equivalent to the two-word UTF-16 encoding for those higher characters, so UTF-16 tables will also work for looking this up. Any site or tool that you are likely to be using to find the U+###### for a given Unicode character will probably already give you the surrogate codes or UTF-16 words for the same character; if not, find a tool or site that does.
        ◦ You can also compute surrogate codes yourself from the character code, but only if you are comfortable with hexadecimal and binary. Skip the following bullets if you are prone to mathematics-based PTSD.
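As an alternative to working through the bits by hand, the same computation can be sketched in a few lines of Python. This is only an illustration and is not part of Notepad++: the function names `surrogate_pair` and `as_search_text` are hypothetical, and the arithmetic is the standard UTF-16 surrogate encoding that the manual steps spell out bit by bit.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Return the (high, low) surrogate codes for a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000 through U+10FFFF have surrogate codes")
    # Subtracting 0x10000 is the "plane minus one" step; 20 bits remain.
    offset = code_point - 0x10000
    high = 0xD800 | (offset >> 10)    # 110110 followed by the top 10 bits
    low = 0xDC00 | (offset & 0x3FF)   # 110111 followed by the bottom 10 bits
    return high, low


def as_search_text(code_point: int) -> str:
    """Format the pair in the \\x{....}\\x{....} notation used for searching."""
    high, low = surrogate_pair(code_point)
    return f"\\x{{{high:04X}}}\\x{{{low:04X}}}"


# U+1F682 STEAM LOCOMOTIVE, the example used in this section.
print(as_search_text(0x1F682))  # \x{D83D}\x{DE82}
```

The printed text can be pasted into the Find box as-is; the range check mirrors the statement that only characters from U+10000 to U+10FFFF need two codes.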
          ◦ Start with your Unicode U+######, calling the hexadecimal digits PPWXYZ.
          ◦ The PP digits indicate the plane. Subtract one and convert to the 4 binary bits pppp (so PP=01 becomes 0000, PP=0F becomes 1110, and PP=10 becomes 1111).
          ◦ Convert each of the other digits into 4 bits (W as wwww, X as xxvv, Y as yyyy, and Z as zzzz; you will see in a moment why two different characters are used in xxvv).
          ◦ Write those 20 bits in sequence: ppppwwwwxxvvyyyyzzzz
          ◦ Group into two equal groups: ppppwwwwxx and vvyyyyzzzz (you can see that the X ⇒ xxvv was split between the two groups, hence the notation).
          ◦ Before the first group, insert the binary digits 110110 to get 110110ppppwwwwxx, and split into the nibbles 1101 10pp ppww wwxx. Convert those nibbles to hex: it will give you a value from \x{D800} thru \x{DBFF}; this is the High Surrogate code.
          ◦ Before the second group, insert the binary digits 110111 to get 110111vvyyyyzzzz, and split into the nibbles 1101 11vv yyyy zzzz. Convert those nibbles to hex: it will give you a value from \x{DC00} thru \x{DFFF}; this is the Low Surrogate code.
          ◦ Combine those into the final \x{ℕℕℕℕ}\x{ℕℕℕℕ} for searching.
        ◦ For more on this, see the Wikipedia article on Unicode Planes, and the discussion in the Notepad++ Community Forum about how to search for non-ASCII characters.

      Collating Sequences

      • [[._col_.]] ⇒ The character the col “collating sequence” stands for. For instance, in Spanish, ch is a single letter, though it is written using two characters. That letter would be represented as [[.ch.]]. This trick also works with symbolic names of control characters, like [[.BEL.]] for the character of code 0x07. See also the discussion on character ranges.

      Control characters

      • \a ⇒ The BEL control character 0x07 (alarm).
      • \b ⇒ The BS control character 0x08 (backspace). This is only allowed inside a character class definition. Otherwise, this means “a word boundary”.
      • \e ⇒ The ESC control character 0x1B.
      • \f ⇒ The FF control character 0x0C (form feed).
      • \n ⇒ The LF control character 0x0A (line feed). This is the regular end of line under Unix systems.
      • \r ⇒ The CR control character 0x0D (carriage return). This is part of the DOS/Windows end of line sequence CR-LF, and was the EOL character on Mac 9 and earlier. OSX and later versions use \n.
      • \t ⇒ The TAB control character 0x09 (tab, or hard tab, horizontal tab).
      • \c☒ ⇒ The control character obtained from character ☒ by stripping all but its 5 lowest order bits. For instance, \cA and \ca both stand for the SOH control character 0x01. You can think of this as “\c means ctrl”, so \cA is the character you would get from hitting Ctrl+A in a terminal. (Note that \c☒ will not work if ☒ is outside of the Basic Multilingual Plane (BMP); that is, it only works if ☒ is in the Unicode character range U+0000 - U+FFFF. The intention of \c☒ is to mnemonically escape the ASCII control characters obtained by typing Ctrl+☒; it is expected that you will use a simple ASCII alphanumeric for the ☒, like \cA or \ca.)

      Special Control escapes

      • \R ⇒ Any newline sequence. Specifically, the atomic group (?>\r\n|\n|\x0B|\f|\r|\x85|\x{2028}|\x{2029}). Please note, this sequence might match one or two characters, depending on the text. Because its length is variable-width, it cannot be used in lookbehinds. Because it expands to a parentheses-based group with an alternation sequence, it cannot be used inside a character class. If you accidentally attempt to put it in a character class, it will be interpreted like any other literal-character escape (where \☒ is used to make sure that the next character is literal), meaning that the R will be taken as a literal R, without any special meaning. For example, if you try [\t\R]: you may be intending to say, “match any single character that’s a tab or a newline”, but what you are actually saying is “match the tab or a literal R”; to get what you probably intended, use [\t\v] for “a tab or any vertical spacing character”, or [\t\r\n] for “a tab or carriage return or newline but not any of the weird verticals”.

      Ranges or kinds of characters

      Character Classes

      • [_set_] ⇒ This indicates a set of characters, for example, [abc] means any of the literal characters a, b or c. You can also use ranges by putting a hyphen between characters, for example [a-z] for any character from a to z. You can use a collating sequence in character ranges, like in [[.ch.]-[.ll.]] (these are collating sequences in Spanish).
        Certain characters require special treatment inside character classes:
        ◦ To use a literal - in a character class: Use it directly as the first or last character in the enclosing class notation, like [-abc] or [abc-]; OR use it “escaped” at any position, like [\-abc] or [a\-bc] .
        ◦ To use a literal ] in a character class: Use it directly right after the opening [ of the class notation, like []abc]; OR use it “escaped” at any position, like [\]abc] or [a\]bc] .
        ◦ To use a literal [ in a character class: Use it directly like any other character, like [ab[c]; “escaping” is not necessary, but is permissible, like [ab\[c] . This character is not special when used alone inside a class; however, there are cases where it is special in combination with another:
          ◦ If used with a colon in the order [: inside a class, it is the opening sequence for a named class (described below); if you want to include both a [ and a : inside the same character class, do not use them unescaped right next to each other; either change the order, like [:[], or escape one or both, like [\[:] or [[\:] or [\[\:] .
          ◦ If used with an equals sign in the order [= inside a class, it is the opening sequence for an equivalence class (described below); if you want to include both a [ and a = inside the same character class, do not use them unescaped right next to each other; either change the order, like [=[], or escape one or both, like [\[=] or [[\=] or [\[\=] .
        ◦ To use a literal \ in a character class, it must be doubled (i.e., \\) inside the enclosing class notation, like [ab\\c] .
        ◦ To use a literal ^ in a character class: Use it directly as any character but the first, such as [a^b] or [ab^]; OR use it “escaped” at any position, such as [\^ab] or [a\^b] or [ab\^] .
      • [^_set_] ⇒ The complement of the characters in the set. For example, [^A-Za-z] means any character except an alphabetic character. Care should be taken with a complement list, as regular expressions are always multi-line, and hence [^ABC]* will match until the first A, B or C (or a, b or c if match case is off), including any newline characters. To confine the search to a single line, include the newline characters in the exception list, e.g. [^ABC\r\n].
      • [[:_name_:]] or [[:☒:]] ⇒ The whole character class named name. For many, there is also a single-letter “short” class name, ☒. Please note: the [:_name_:] and [:☒:] must be inside a character class [...] to have their special meaning.
      | short | full name | description | equivalent character class |
      | | alnum | letters and digits | |
      | | alpha | letters | |
      | h | blank | spacing which is not a line terminator | [\t\x20\xA0] |
      | | cntrl | control characters | [\x00-\x1F\x7F\x81\x8D\x8F\x90\x9D] |
      | d | digit | digits | |
      | | graph | graphical character, so essentially any character except for control chars, \0x7F, \x80 | |
      | l | lower | lowercase letters | |
      | | print | printable characters | [\s[:graph:]] |
      | | punct | punctuation characters | [!"#$%&'()*+,\-./:;<=>?@\[\\\]^_{\|}~] |
      | s | space | whitespace (word or line separator) | [\t\n\x0B\f\r\x20\x85\xA0\x{2028}\x{2029}] |
      | u | upper | uppercase letters | |
      | | unicode | any character with code point above 255 | [\x{0100}-\x{FFFF}] |
      | w | word | word characters | [_\d\l\u] |
      | | xdigit | hexadecimal digits | [0-9A-Fa-f] |

      Note that letters include any unicode letters (ASCII letters, accented letters, and letters from a variety of other writing systems); digits include ASCII numeric digits, and anything else in Unicode that’s classified as a digit (like superscript numbers ¹²³…).

      Note that those character class names may be written in upper or lower case without changing the results. So [[:alnum:]] is the same as [[:ALNUM:]] or the mixed-case [[:AlNuM:]].

      As stated earlier, the [:_name_:] and [:☒:] (note the single brackets) must be a part of a surrounding character class. However, you may combine them inside one character class, such as [_[:d:]x[:upper:]=], which is a character class that would match any digit, any uppercase, the lowercase x, and the literal _ and = characters. These named classes won’t always appear with the double brackets, but they will always be inside of a character class.

      If the [:_name_:] or [:☒:] are accidentally not contained inside a surrounding character class, they will lose their special meaning.
      For example, [:upper:] is the character class matching :, u, p, e, and r; whereas [[:upper:]] is similar to [A-Z] (plus other unicode uppercase letters).

      • [^[:_name_:]] or [^[:☒:]] ⇒ The complement of character class named name or ☒ (matching anything not in that named class). This uses the same long names, short names, and rules as mentioned in the previous description.

      Character classes may not contain parentheses-based groups of any kind, including the special escape \R (which expands to a parentheses-based group when evaluated, even though \R doesn’t look like it contains parentheses).

      Character Properties

      These properties behave similarly to named character classes, but cannot be contained inside a character class.

      • \p☒ or \p{_name_} ⇒ Same as [[:☒:]] or [[:_name_:]], where ☒ stands for one of the short names from the table above, and name stands for one of the full names from above. For instance, \pd and \p{digit} both stand for a digit, just like the escape sequence \d does.
      • \P☒ or \P{_name_} ⇒ Same as [^[:☒:]] or [^[:_name_:]] (not belonging to the class name).

      Character escape sequences

      • \☒ ⇒ Where ☒ is one of d, w, l, u, s, h, v, described below. These single-letter escape sequences are each equivalent to a class from above. The lower-case escape sequence means it matches that class; the upper-case escape sequence means it matches the negative of that class. (Unlike the properties, these can be used both inside or outside of a character class.)
      | Description | Escape Sequence | Positive Class | Negative Escape Sequence | Negative Class |
      | digits | \d | [[:digit:]] | \D | [^[:digit:]] |
      | word chars | \w | [[:word:]] | \W | [^[:word:]] |
      | lowercase | \l | [[:lower:]] | \L | [^[:lower:]] |
      | uppercase | \u | [[:upper:]] | \U | [^[:upper:]] |
      | word/line separators | \s | [[:space:]] | \S | [^[:space:]] |
      | horizontal space | \h | [[:blank:]] | \H | [^[:blank:]] |
      | vertical space | \v | see below | \V | |

      Vertical space: This encompasses all the [[:space:]] characters that aren’t [[:blank:]] characters: the LF, VT, FF, CR, NEL control characters and the LS and PS format characters: 0x000A (line feed), 0x000B (vertical tabulation), 0x000C (form feed), 0x000D (carriage return), 0x0085 (next line), 0x2028 (line separator) and 0x2029 (paragraph separator). There isn’t a named class which matches.

      Note: despite its similarity to \v, even though \R matches certain vertical space characters, it is not a character-class-equivalent escape sequence (because it evaluates to a parentheses()-based expression, not a class-based expression). So while \d, \l, \s, \u, \w, \h, and \v are all equivalent to a character class and can be included inside another bracket[]-based character class, the \R is not equivalent to a character class, and cannot be included inside a bracketed[] character-class.

      Equivalence Classes

      • [[=_char_=]] ⇒ All characters that differ from char by case, accent or similar alteration only. For example [[=a=]] matches any of the characters: A, À, Á, Â, Ã, Ä, Å, a, à, á, â, ã, ä and å.

      Multiplying operators

      • + ⇒ This matches 1 or more instances of the previous character, as many as it can. For example, Sa+m matches Sam, Saam, Saaam, and so on. [aeiou]+ matches consecutive strings of vowels.
      • * ⇒ This matches 0 or more instances of the previous character, as many as it can. For example, Sa*m matches Sm, Sam, Saam, and so on.
      • ? ⇒ Zero or one of the last character. Thus Sa?m matches Sm and Sam, but not Saam.
      • *? ⇒ Zero or more of the previous group, but minimally: the shortest matching string, rather than the longest string as with the “greedy” operator. Thus, m.*?o applied to the text margin-bottom: 0; will match margin-bo, whereas m.*o will match margin-botto.
      • +? ⇒ One or more of the previous group, but minimally.
      • {ℕ} ⇒ Matches ℕ copies of the element it applies to (where ℕ is any decimal number).
      • {ℕ,} ⇒ Matches ℕ or more copies of the element it applies to.
      • {ℕ,ℙ} ⇒ Matches ℕ to ℙ copies of the element it applies to, as many as it can (where ℙ ≥ ℕ).
      • {ℕ,}? or {ℕ,ℙ}? ⇒ Like the above, but minimally.
      • *+ or ?+ or ++ or {ℕ,}+ or {ℕ,ℙ}+ ⇒ These so-called “possessive” variants of greedy repeat marks do not backtrack. This allows failures to be reported much earlier, which can boost performance significantly. But they will eliminate matches that would require backtracking to be found. As an example, see how the matching engine handles the following two regexes:
        When regex “.*” is run against the text “abc”x :
        ◦ `“` matches `“`
        ◦ `.*` matches `abc”x`
        ◦ `”` doesn't match ( End of line ) => Backtracking
        ◦ `.*` matches `abc”`
        ◦ `”` doesn't match letter `x` => Backtracking
        ◦ `.*` matches `abc`
        ◦ `”` matches `”` => 1 overall match `“abc”`
        When regex “.*+”, with a possessive quantifier, is run against the text “abc”x :
        ◦ `“` matches `“`
        ◦ `.*+` matches `abc”x` ( catches all remaining characters )
        ◦ `”` doesn't match ( End of line )
        Notice there is no match at all in this version, because the possessive quantifier prevents backtracking to a possible solution.

      Anchors

      Anchors match a zero-length position in the line, rather than a particular character.

      • ^ ⇒ This matches the start of a line (except when used inside a set, see above).
      • $ ⇒ This matches the end of a line.
      • \< ⇒ This matches the start of a word using Boost’s definition of words.
      • \> ⇒ This matches the end of a word using Boost’s definition of words.
      • \b ⇒ Matches either the start or end of a word.
      • \B ⇒ Not a word boundary.
It represents any location between two word characters or between two non-word characters. \A or \` ⇒ Matches the start of the file. \z or \' ⇒ Matches the end of the file. \Z ⇒ Matches like \z with an optional sequence of newlines before it. This is equivalent to (?=\v*\z), which departs from the traditional Perl meaning for this escape. \G ⇒ This “Continuation Escape” matches the end of the previous match, or matches the start of the text being matched if no previous match was found. In Find All or Replace All circumstances, this will allow you to anchor your next match at the end of the previous match. If it is the first match of a Find All or Replace All, and any time you use a single Find Next or Replace, the “end of previous match” is defined to be the start of the search area – the beginning of the document, or the current caret position, or the start of the highlighted text. Because of that, if you are using it in an alternation, where you want to say “find any occurrence of something after some prefix, or after a previous match), you will want to make sure that your prefix includes the start-of-file \A, otherwise the \G portion may accidentally match start-of-file when you don’t want that to occur. Capture Groups and Backreferences (_subset_) ⇒ Numbered Capture Group: Parentheses mark a part of the regular expression, also known as a subset expression or capture group. The string matched by the contents of the parentheses (indicated by subset in this example) can be re-used with a backreference or as part of a replace operation; see Substitutions, below. Groups may be nested. (?<name>_subset_) or (?'name'_subset_) ⇒ Named Capture Group: Names the value matched by subset as the group name. Please note that group names are case-sensitive. \ℕ, \gℕ, \g{ℕ}, \g<ℕ>, \g'ℕ', \kℕ, \k{ℕ}, \k<ℕ> or \k'ℕ' ⇒ Numbered Backreference: These syntaxes match the ℕth capture group earlier in the same expression. 
(Backreferences are used to refer to the capture group contents only in the search/match expression; see the Substitution Escape Sequences for how to refer to capture groups in substitutions/replacements.) A regex can have multiple subgroups, so \2, \3, etc. can be used to match others (numbers advance left to right with the opening parenthesis of the group). You can have as many capture groups as you need, and are not limited to only 9 groups (though some of the syntax variants can only reference groups 1-9; see the notes below, and use the syntaxes that explicitly allow multi-digit ℕ if you have more than 9 groups) Example: ([Cc][Aa][Ss][Ee]).*\1 would match a line such as Case matches Case but not Case doesn't match cASE. \ℕ ⇒ This form can only have ℕ as digits 1-9, so if you have more than 9 capture groups, you will have to use one of the other numbered backreference notations, listed in the next bullet point. Example: the expression \10 matches the contents of the first capture group \1 followed by the literal character 0”, not the contents of the 10th group. \gℕ, \g{ℕ}, \g<ℕ>, \g'ℕ', \kℕ, \k{ℕ}, \k<ℕ> or \k'ℕ' ⇒ These forms can handle any non-zero ℕ. For positive ℕ, it matches the ℕth subgroup, even if ℕ has more than one digit. \g10 matches the contents from the 10th capture group, not the contents from the first capture group followed by the literal 0. If you want to match a literal number after the contents of the ℕth capture group, use one of the forms that has braces, brackets, or quotes, like \g{ℕ} or \k'ℕ' or \k<ℕ>: For example, \g{2}3 matches the contents of the second capture group, followed by a literal 3, whereas \g23 would match the contents of the twenty-third capture group. For clarity, it is highly recommended to always use the braces or brackets form for multi-digit ℕ For negative ℕ, groups are counted backwards relative to the last group, so that \g{-1} is the last matched group, and \g{-2} is the next-to-last matched group. 
Please, note the difference between absolute and relative backreferences. For instance, an exact four-letters word palindrome can be matched with : the regex (?-i)\b(\w)(\w)\g{2}\g{1}\b, when using absolute (positive) coordinates the regex (?-i)\b(\w)(\w)\g{-1}\g{-2}\b, when using relative (negative) coordinates \g{name}, \g<name>, \g'name', \k{name}, \k<name> or \k'name' ⇒ Named Backreference: The string matching the subexpression named name. (As with the Numbered Backreferences above, these Named Backreferences are used to refer to the capture group contents only in the search/match expression; see the Substitution Escape Sequences for how to refer to capture groups in substitutions/replacements.)
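
      The backreference examples above can be tried outside the editor. The sketch below uses Python's standard `re` module, which is a different engine from the Boost-based one the passage describes, so the syntax is adapted as an assumption: Python patterns accept `\1` and `(?P=name)` but not the `\g{ℕ}` or `\k<name>` forms, and the `(?-i)` prefix is omitted because Python matching is case-sensitive by default. The function names are hypothetical.

      ```python
      import re

      # Numbered backreference: \1 must repeat exactly what group 1 captured.
      CASE_REPEAT = re.compile(r"([Cc][Aa][Ss][Ee]).*\1")

      # Four-letter palindrome with absolute backreferences
      # (Python equivalent of \g{2}\g{1} in the passage).
      PALINDROME_4 = re.compile(r"\b(\w)(\w)\2\1\b")

      # Named group and named backreference, written in Python's own syntax.
      DOUBLED_WORD = re.compile(r"\b(?P<word>\w+) (?P=word)\b")


      def has_repeated_case(line: str) -> bool:
          """True if some spelling of 'case' occurs twice with identical casing."""
          return CASE_REPEAT.search(line) is not None


      def four_letter_palindromes(text: str) -> list[str]:
          """Return every whole word of the form abba found in text."""
          return [m.group(0) for m in PALINDROME_4.finditer(text)]


      def doubled_words(text: str) -> list[str]:
          """Return each word that is immediately repeated after a single space."""
          return [m.group("word") for m in DOUBLED_WORD.finditer(text)]


      if __name__ == "__main__":
          print(has_repeated_case("Case matches Case"))
          print(has_repeated_case("Case doesn't match cASE"))
          print(four_letter_palindromes("noon abba deed test"))
          print(doubled_words("the the cat sat sat down"))
      ```

      Relative backreferences such as `\g{-1}` have no direct counterpart in Python's `re` patterns, so only the absolute form is shown here.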

      regular expression

    1. Summary Note on Hazing (Bizutage): Definition, Risks, and Actions

      Summary

      Hazing is a serious criminal offense, not a mere student tradition; it is defined by Article 225-16-1 of the French Penal Code (Code pénal).

      It consists of leading a person, whether consenting or not, to undergo or commit humiliating or degrading acts, often accompanied by excessive alcohol consumption.

      The phenomenon mainly affects higher education and boarding schools, and is generally orchestrated by students in later years (second or third year) against newcomers.

      The consequences of hazing are profound and can be psychological (lasting trauma, depression), physical (injuries, lifelong disabilities), and sometimes fatal.

      The acts range from forced ingestion of substances to simulated sexual acts, insults, and the posting of degrading images on social media.

      Group dynamics and social pressure make refusal extremely difficult for victims, which invalidates any notion of consent.

      Parents have a crucial role to play in prevention, by spotting warning signs before orientation weekends (inappropriate questionnaires, requests to bring alcohol, liability waivers) and by keeping communication open with their children.

      When hazing has occurred, it is imperative to support the victim without judgment, to gather evidence (medical certificates, witness statements, photos), and to contact the institution's management, which has a legal obligation to refer the matter to the public prosecutor.

      The Comité National Contre le Bizutage (CNCB, the National Committee Against Hazing) is an essential resource for listening, advice, and mediation.

      --------------------------------------------------------------------------------

      1. Legal Definition and Characteristics of Hazing

      Hazing is not a harmless practice but an offense formally prohibited and punished under French law. Understanding it requires examining its legal definition and how it differs from other phenomena such as harassment.

      1.1. The Legal Framework: Article 225-16-1 of the Penal Code

      Hazing is defined as the act, by a person, of "causing another person, whether against their will or not, to undergo or commit humiliating or degrading acts or to consume alcohol excessively" during events or gatherings connected with school, sports, and socio-educational settings.

      Penalties: This offense is punishable by six months' imprisonment and a fine of 7,500 euros.

      Origin of the law: The Comité National Contre le Bizutage (CNCB) took part in drafting this law in 1998.

      1.2. Fundamental Concepts

      Two key notions in the law deserve particular attention:

      "Humiliating or degrading acts": The perception of humiliation is subjective. An act may be experienced as deeply degrading by one person and not by another.

      There is no scale for measuring humiliation. An act is considered humiliating as soon as it makes the person uncomfortable or undermines their dignity.

      "Whether against their will or not": This is the most crucial element. The notion of consent does not exist in hazing.

      A young person who takes part in the challenges, even while appearing to have fun, is not considered to be consenting in the eyes of the law. Group pressure, the desire to fit in, and alcohol consumption destroy free will.

      1.3. Distinction from Harassment

      It is essential not to confuse hazing with harassment:

      Harassment: Targets a single person (or a small group) for specific reasons (appearance, origin, etc.). It involves one or more harassers against a targeted victim.

      Hazing: Targets an entire group, the "newcomers", and is carried out by another group, the "seniors".

      The one and only reason for hazing is the status of being a newcomer. The stated aim, however perverted, is a rite of passage to "integrate" the incoming class.

      2. Forms and Context of Hazing

      Hazing follows recurring patterns, involving specific actors in environments conducive to the abuse of power.

      2.1. Actors and Places Concerned

      The hazers: Generally second- or third-year students, often organized by the student union (Bureau des Élèves, BDE).

      Their motivations vary: taking revenge for hazing they themselves endured, or a feeling of omnipotence and perversity.

      The hazed: The newcomers (first-year students).

      Places: The phenomenon affects every type of higher-education institution (universities, business schools, medical schools, architecture schools, BTS programs) and sports centers (CREPS), and it is particularly prevalent in boarding schools, which are closed environments conducive to abuse.

      2.2. Forms and Examples of Hazing Acts

      The practices vary but often follow an escalation, a "spiral" that begins in a supposedly "fun" way before degenerating.

      | Category of acts | Concrete examples from testimonies |
      |---|---|
      | Physical humiliation | Being covered in a "sticky, stinking" mixture (eggs, flour, rabbit litter, fish soup); being tied to others, sometimes in degrading positions; passing through a pipe filled with oil or a tub of soda. |
      | Forced consumption | Being made to drink large quantities of alcohol (vodka is very common); swallowing disgusting food or drinks. |
      | Sexual abuse | Forcing a girl to simulate fellatio or to perform a striptease; singing obscene songs; sexist insults aimed at girls and homophobic insults aimed at boys. |
      | Cyber-violence | Undressing the hazed students, filming or photographing them; posting the images on social media. |
      | Threats | Threatening those who refuse to take part, calling them "losers" or "no fun". |

      2.3. The Psychology of the Hazer

      The main justification put forward by hazers is to "bond the class together" and create ties.

      In reality, the underlying logic is a relationship of dominance and submission.

      One testimony from a former hazer is particularly revealing:

      "What I remember of hazing is an intoxicating feeling of power. It was while shouting 'drink and shut your mouth' at a first-year [...] that I understood the pleasure of being a tyrant for a day. I loved making first-years submit."

      3. Serious Consequences and Human Harm

      The CNCB's formula sums up the impact of hazing: "It sometimes kills, it often traumatizes, and it always humiliates."

      3.1. Psychological Consequences

      Long-term trauma: Victims contact the CNCB 5, 10, or even 30 years after the events, never having managed to forget.

      Depression and dropping out: Many young people develop depression and abandon their studies so that they no longer have to pass their "tormentors" in the corridors.

      Shame and guilt: Victims feel deep shame at having gone along with it and at not having been able to say no, which often prevents them from speaking out.

      3.2. Physical and Fatal Consequences

      Hazing can cause serious injury and even death.

      Serious injuries: One young person was left blind for three weeks after being bathed in toxic liquids; another is disabled for life after falling three stories during a hazing in Lille in 2012.

      Deaths: Several deaths directly linked to hazing have been recorded.

      | Year | Place | Context |
      |---|---|---|
      | 2012 | Saint-Cyr | Drowning |
      | 2013 | École des Mines | Death |
      | 2017 | Nanterre university (Fac de Nanterre) / Rennes dental school | Death |
      | 2021 | Lille | Death of Simon Monray |

      Crucial prevention message: Never leave a heavily intoxicated young person alone. Call the emergency services (fire brigade, SAMU) and stay with them. A young person died in Rennes of alcohol poisoning after being left alone to "sleep it off".

      4. The Role of Parents and Strategies for Action

      Parents are on the front line in preventing hazing and in acting if it occurs.

      4.1. Prevention Beforehand (Before an orientation weekend)

      Talk: Use the request for money for the weekend as an opportunity to raise the subject of hazing, warning without frightening.

      Examine the invitation and the packing list: Certain signs should raise the alarm.

      ◦ A "strange" questionnaire with intimate questions or questions about alcohol.

      ◦ A request to bring clothes "that can get ruined".

      ◦ A request to bring alcohol.

      ◦ A request to sign a liability waiver, which has no legal value.

      Demand clear information: Parents must know the location and the precise program of the weekend. A location kept secret is a major warning sign.

      Ensure communication: The young person must always keep their mobile phone.

      Confiscating phones is meant to cut victims off from the outside world and should trigger an immediate alert to the institution's management.

      Advise refusal: Encourage the young person to say no if they feel uncomfortable and to team up with others who share their reluctance.

      4.2. Responding After a Hazing

      Identify signs of distress:

      ◦ Refusal to talk about the weekend, unease.

      ◦ Change in behavior: isolation, anxiety, disturbed sleep.

      ◦ Wanting to leave the institution.

      ◦ Physical marks or injuries.

      Listen and support:

      ◦ Stay calm, do not panic.

      ◦ Listen without passing judgment on the young person's inability to say no.

      ◦ Never minimize what they went through.

      ◦ Relieve the victim of guilt: the only guilty parties are the hazers.

      Gather evidence:

      ◦ Obtain medical certificates (physical and psychological).

      ◦ Keep all evidence: messages, photos, names of the organizers and of other victims, dates, places.

      4.3. Institutional and Judicial Steps

      Contact the institution: Inform the head of the institution, the student life services, or the designated hazing contact person.

      The CNCB can act as a mediator while guaranteeing anonymity.

      Obligation of the institution: The head of the institution has a legal obligation to refer the matter to the public prosecutor (procureur de la République) upon learning of criminal acts.

      The head must also initiate disciplinary proceedings.

      File a complaint: It is advisable to consult a lawyer before starting legal proceedings.

      The justice system is often very slow, and many complaints are closed without further action.

      The process can be long (the 2012 case concluded in 2025) and costly.

      Purpose of sanctions: Sanctions must be swift and exemplary in order to deter future attempts, since hazers are generally not repeat offenders.

      5. The Comité National Contre le Bizutage (CNCB)

      The CNCB is a central player in the fight against hazing in France.

      Composition: It brings together individual members and legal entities such as parents' federations, teachers' unions, the Conférence des présidents d'universités, and the Conférence des grandes écoles.

      Missions:

      1. Collect testimonies, and listen to and advise victims.

      2. Call on heads of institutions and ministries to act.

      3. Intervene in institutions to prevent and eradicate hazing.

      4. Work with young people on respectful and caring ways of welcoming newcomers.

      Partnerships: The CNCB works in close partnership with the Ministry of Higher Education and the Ministry of Sports, which fund it.

      By contrast, collaboration with the Ministry of National Education is described as nonexistent ("it's a wall").

      Resources: The CNCB website provides many tools (slide decks, brochures) so that others (parents, teachers) can run awareness campaigns.

      6. Key Quotations

      On the nature of hazing: "Hazing sometimes kills, it often traumatizes, and it always humiliates."

      On consent: "The hazers tell us, 'But madam, we didn't force anyone. Everyone agreed.' [...] And that is when we really have to set the record straight and think it through with them, because it's false. The newcomer has no choice."

      On the hazer's motivation: "What I remember of hazing is an intoxicating feeling of power. [...] I understood the pleasure of being a tyrant for a day. I loved making first-years submit."

      A victim's testimony: "I felt dirty in every sense of the word. You have to do it because they basically tell you, if you don't, you're no fun. You're losers."

      On the importance of reporting: "Reporting hazing means reporting a crime, and reporting a crime is every citizen's duty. And if the facts are not reported, well, it's quite simple: they will happen again the following year."

    1. Dr. Krieger’s work developing and applying ecosocial theory has continued to expand, via: (1) empirical research; (2) a new series on “small books, big ideas for population health” published by Oxford University Press; (3) development of freely available resources to improve monitoring of and action to address health inequities, via the Public Health Disparities Geocoding Project; and (4) the continued growth and reach of the Spirit of 1848 Caucus.

      Dr. Krieger's work has 4 major points: ecosocial theory, empirical research, her new series, and the growth of the Spirit of 1848 Caucus.

    1. The first question we ask is whether the compound is ionic or covalent. If it is covalent, which is typically between 2 or more nonmetals, we need to ask whether it is a simple molecule or an acid. If it is a simple molecule, we use Greek prefixes to identify the number of atoms of each type of element in the molecule. If it is an acid, we base its name on the ionic compound it would form if hydrogen could be a cation. Note that hydrogen cannot lose its only electron, as then it would be a subatomic particle and the charge density would be too high, so it forms a covalent bond. If it is ionic, we use the principle of charge neutrality to name the compound.

      Memorize this strategy to know how to tell one type of compound from another. Essential and useful knowledge. Boppe will have us do this on the study guide and test. Revisit: 8/10 2:00 AM, 8/10 4:00 PM, 8/10 7:30 PM
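
      The decision strategy in the quoted passage can be written out as a small sketch. This is only an illustration under stated assumptions: the function names are hypothetical, the caller supplies the "-ide" form of the second element, and two conventions not stated in the passage are assumed (no "mono" on the first element, and a prefix ending in "a" or "o" loses that vowel before "oxide").

      ```python
      # Greek prefixes for 1 to 10 atoms of an element in a simple molecule.
      GREEK_PREFIXES = {
          1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta",
          6: "hexa", 7: "hepta", 8: "octa", 9: "nona", 10: "deca",
      }


      def naming_strategy(is_ionic: bool, is_acid: bool = False) -> str:
          """Pick the naming approach by asking the passage's questions in order."""
          if is_ionic:
              return "charge neutrality"
          if is_acid:
              return "name from the corresponding ionic compound"
          return "Greek prefixes"


      def with_prefix(count: int, name: str, is_first: bool) -> str:
          """Attach the Greek prefix for count atoms to an element name."""
          # Assumed convention: the first element never takes "mono".
          if is_first and count == 1:
              return name
          prefix = GREEK_PREFIXES[count]
          # Assumed convention: drop a trailing a/o before a name starting with "o".
          if name.startswith("o") and prefix[-1] in "ao":
              prefix = prefix[:-1]
          return prefix + name


      def name_simple_molecule(first: str, n_first: int,
                               second_ide: str, n_second: int) -> str:
          """Name a two-element covalent molecule, e.g. N2O4."""
          return f"{with_prefix(n_first, first, True)} {with_prefix(n_second, second_ide, False)}"


      if __name__ == "__main__":
          print(naming_strategy(is_ionic=False))
          print(name_simple_molecule("nitrogen", 2, "oxide", 4))
          print(name_simple_molecule("carbon", 1, "oxide", 1))
      ```

      Acids and ionic compounds are only classified here, not named, since the passage gives the principle for them but not the detailed rules.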

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Response to Reviewer’s Comments

      We thank all three reviewers for their thoughtful and detailed comments, which will help us to improve the quality and clarity of our manuscript.


      __Reviewer #1 (Evidence, reproducibility and clarity (Required)):__ Summary: In this work, Tripathi et al address the open question of how the Fat/Ds pathway affects organ shape, using the Drosophila wing as a model. The Fat/Ds pathway is a conserved but complex pathway, interacting with Hippo signalling to affect growth and providing planar cell polarity that can influence cellular dynamics during morphogenesis. Here, the authors use genetic perturbations combined with quantification of larval, pupal, and adult wing shape and laser ablation to conclude that the Ft/Ds pathway affects wing shape only during larval stages in a way that is at least partially independent of its interaction with Hippo and rather due to an effect on tissue tension and myosin II distribution. Overall the work is clearly written and well presented. I only have a couple of major comments on the limitations of the work.

      Major comments: 1. Authors conclude from data in Figures 1 and 2 that the Fat/Ds pathway only affects wing shape during larval stages. When looking at the pupal wing shape analysis in Figure 2L, however, it looks like there is a difference in wt over time (6h-18h, consistent with literature), but that difference in time goes away in RNAi-ds, indicating that there actually is a role for Ds in changing shape during pupal stages, although the phenotype is clearly less dramatic than that of larval stages. No statistical test was done over time (within the genotype), however, so it's hard to say. I recommend the authors test over time - whether 6h and 18h are different in wild type and in ds mutant. I think this is especially important because there is proximal overgrowth in the Fat/Ds mutants, much of which is contained in the folds during larval stages. That first fold, however, becomes the proximal part of the pupal wing after eversion and contracts during pupal stages to elongate the blade (Aigouy 2010, Etournay 2015). Also, according to Trinidad Curr Biol 2025, there is a role for Fat/Ds pathway in pupal stages. All of that to say that it seems likely that there would be a phenotype in pupal stages. It's true it doesn't show up in the adult wing in the experiments in Fig 1, but looking at the pupal wing itself is more direct - perhaps the very proximal effect is less prominent later, as there is potential for further development after 18hr before adulthood and the most proximal parts are likely anyway excluded in the analysis.

      Response: Our main purpose in examining pupal wing shape was to emphasize that wings lacking ds are visibly abnormal even at early pupal stages. The reviewer makes the point that the change in shape from 6h to 18h APF is greater in control wings than in RNAi-ds wings. We have added quantitation of this to the revised manuscript as suggested. This difference could be interpreted as indicating that Ds-Fat signaling actively contributes to wing shape during pupal morphogenesis. However, given the genetic evidence that Ds-Fat signaling influences wing shape only during larval growth, we favor the interpretation that it reflects consequences of Ds-Fat action during larval stages; e.g., overgrowth of the wing, particularly the proximal wing and hinge as occurs in ds and fat mutants, could result in relatively less elongation during the pupal hinge contraction phase. This wouldn’t change our key conclusions, but it is something that we discuss in the revised manuscript.

      2. I think there needs to be a mention and some discussion of the fact that the wing is not really flat. While it starts out very flat at 72h, by 96h and beyond, there is considerable curvature in the pouch that may affect measurements of different axes and cell shape. It is not actually specified in the methods, so I assume the measurements were taken using a 2D projection. It is not clear whether the curvature of the pouch was taken into account, either for cell shape measurements presented in Fig 4 or for the wing pouch dimensional analysis shown in Fig 3, 6, and supplements. Do perturbations in Ft/Ds affect this curvature? Are they more or less curved in one or both axes? Such a change could affect the results and conclusions. The extent to which the fat/ds mutants fold properly is another important consideration that is not mentioned. For example, maybe the folds are deeper and contain more material in the ds/fat mutants, and that's why the pouch is a different shape? At the very least, this point about the 3D nature of the wing disc must be raised in discussion of the limitations of the study. For the cell shape analysis, you can do a correction based on the local curvature (calculated from the height map from the projection). For the measurement of A/P, D/V axes of the wing pouch, best would be to measure the geodesic distance in 3D, but this is not reasonable to suggest at this point. One can still try to estimate the pouch height/curvature, however, both in wild type and in fat/ds mutants.

      Response: The wing pouch measurements were done on 2D projections of wing discs that were already slightly flattened by coverslips, so there is not much curvature outside of the folds. We will revise the methods to make sure this is clear. While we recognize that the absolute values measured can be affected by this, our conclusions are based on the qualitative differences in proportions between genotypes and time points, and we wouldn’t expect these to differ significantly even if 3D distances were measured. Obtaining accurate 3D measures is technically more challenging - it requires having spacers matching the thickness of the wing disc, which varies at different time points and genotypes, and then measuring distances across curved surfaces. To address this, we propose to do a limited set of 3D measures on wild-type and ds mutant wing discs at early and late stages, which we expect will confirm that the conclusions of our analysis are unaffected, while at the same time providing an indication of how much curvature affects the values obtained. We will also make sure the issue of wing disc curvature and folds is discussed in the text.

      Minor comments: 1. The analysis of the laser ablation is not really standard - usually one looks at recoil velocity or a more complicated analysis of the equilibrium shape using a model (e.g., Shivakumar and Lenne 2016, Piscitello-Gomez 2023, Dye et al 2021). One may be able to extract more information from these experiments - nevertheless, I doubt the conclusions would change, given that there seems to be a pretty clear difference between wt and ds (OPTIONAL).

      Response: We will add measurements of recoil velocities to complement our current analysis of circular cuts.

      2. Figure 7G: I think you also need a statistical test between RNAi-ds and UAS-rokCA+RNAi-ds.

      Response: We include this statistical test in the revised manuscript (it shows that they are significantly different).

      3. In the discussion, there is a statement: "However, as mutation or knock down of core PCP components, including pk or sple, does not affect wing shape... 59." Reference 59 is quite old and as far as I can tell shows neither images nor quantifications of the wing shape phenotype (not sure it uses "knockdown" either - unless you mean hypomorph?). A more recent publication, Piscitello-Gomez et al eLife 2023, shows a very subtle but significant wing shape phenotype in core PCP mutants. It doesn't change your logic, but I would change the statement to be more accurate by saying "mutation of core PCP components has only subtle changes in adult wing shape".

      Response: Thank you for pointing this out; we have revised the manuscript accordingly.

      **Referee cross-commenting**

      Reviewer2: Reviewer 2 makes the statement: "The distance along the AP boundary from the pouch border to DV midline is topologically comparable to the PD length of the adult wing. The distance along the DV boundary from A border to P border is topologically comparable to the AP length of the adult wing."

      I disagree - the DV boundary wraps around the entire margin of the adult wing (as correctly drawn with the pink line in Fig 2A). It is not the same as the wide axis of the adult wing (perpendicular to the AP boundary). It is not trivial to map the proximal-distal axis of the larval wing to the proximal-distal axis of the adult, due to the changes in shape that occur during eversion. Thus, I find it much easier to look at the exact measurement that the authors make, and it is much more standard in the field, rather than what the reviewer suggests. Alternatively, one could I guess measure in the adult the ratio of the DV margin length (almost the circumference of the blade?) to the AP boundary length. That may be a more direct comparison. Actually the authors leave out the term "boundary" - what they call AP is actually the AP boundary, not the AP axis, and likewise for the DV - what they measure is DV boundary, but I only noticed that in the second read-through now. Just another note, these measurements of the pouch really only correspond to the very distal part of the wing blade, as so much of the proximal blade comes from the folds in the wing disc. Therefore, a measurement of only distal wing shape would be more comparable.

      Response: We thank Reviewer 1 for their comments here. In terms of the region measured, we measure to the inner Wg ring in the disc. The location of this ring in the adult is actually more proximal than described above (e.g., see Fig 1B of Liu, X., Grammont, M. & Irvine, K. D. Roles for scalloped and vestigial in regulating cell affinity and interactions between the wing blade and the wing hinge. Developmental Biology 228, 287–303 (2000)), and this defines roughly the region we have measured in adult wings (with the caveat noted above that the measurements in the disc can be affected by curvature and the hinge/pouch fold, which we will address).

      Reviewer 2 states that authors cannot definitively conclude anything about mechanical tension from their reported cutting data because the authors have not looked at initial recoil velocity. I strongly disagree. __The wing disc tissue is elastic on much longer timescales than what's considered after laser ablation (even hours), and the shape of the tissue after it equilibrates from a circular cut (1-2min) can indeed be used to infer tissue stresses (see Dye et al eLife 2021, Piscitello-Gomez et al eLife 2023, Tahaei et al arXiv 2024).__ In the wing disc, the direction of stresses inferred from initial recoil velocity is correlated with the direction of stresses inferred from analysing the equilibrium shape after a circular cut. Rearrangements, a primary mechanism of fluidization in epithelia, do not occur within 1 min. Analysing the equilibrium shape after circular ablation may be more accurate for assessing tissue stresses than initial recoil velocity - in Piscitello-Gomez et al 2023, the authors found that a prickle mutation (PCP pathway) affected initial recoil velocity but not tissue stresses in the pupal wing. Such equilibrium circular cuts have also been used to analyze stresses in the avian embryo, where they correlate with directions of stress gathered from force inference methods (Kong et al Scientific Reports 2019). The Tribolium example noted by the reviewer is on the timescale of tens to hundreds of minutes - much longer than the timescale of laser ablation retraction. It is true the analysis of the ablation presented in this paper is not at the same level as those other cited papers and could be improved. But I don't think the analysis would be improved by additional experiments doing timelapse of initial retraction velocity.

      Response: Thank you; we agree with Reviewer 1 here.

      Reviewer 2 states "If cell anisotropy is caused by polarized myosin activity, that activity is typically polarized along the short edges not long edges". Not true in this case. Myosin II accumulates along long boundaries (Legoff and Lecuit 2013). "Therefore, interpreting what causes the cell anisotropy and how DS regulates it is difficult." Agreed - but this is well beyond the scope of this manuscript. The authors clearly show that there is a change of cell shape, at least in these two regions. Better would be to quantify it throughout the pouch and across multiple discs. Similar point for myosin quantifications - yes, polarity would be interesting and possible to look at in these data, and it would be better to do so on multiple discs, but the lack of overall myosin on the junctions shown here is not nothing. Interpreting what Ft/Ds does to influence tension and myosin and eventually tissue shape is a big question that's not answered here. I think the authors do not claim to fully understand this though, and maybe further toning down the language of the conclusions could help.

      Response: We agree with Reviewer 1 here and will also add quantitation of myosin across multiple discs and will include higher magnification myosin images and polarity tests.

      Reviewer 3: I agree with many of the points raised by Reviewer 3, in particular that relevant for Fig 1. The additional experiments looking at myosin II localization and laser ablation in the other perturbations (Hippo and Rok mutants/RNAi) would certainly strengthen the conclusions.

      Response: Reviewer 3 comment on Fig 1 requests Ab stains to assess recovery of expression after downshift, which we will do.

      We will add examination of myosin localization in hpo RNAi wing discs, and in the ds/rok combinations. We note that the effects of Rok manipulations on myosin and on recoil velocity have been described previously (eg Rauskolb et al 2014).

      Reviewer #1 (Significance (Required)): I think the work provides a clear conceptual advance, arguing that the Ft/Ds pathway can influence mechanical stress independently of its interaction with Hippo and growth. Such a finding, if conserved, could be quite important for those studying morphogenesis and Fat function in this and other organisms. For this point, the genetic approach is a clear strength. Previous work in the Drosophila wing has already shown an adult wing phenotype for Ft/Ds mutations that was attributed to its role in the larval growth phase, as marked clones show aberrant growth in mutants. The novelty of this work is the dissection of the temporal progression of this phenotype and how it relates to Hippo and myosin II activation. It remains unclear exactly how Ft/Ds may affect tissue tension, except that it involves a downregulation of myosin II - the mechanism of that is not addressed here and would involve considerably more work. I think the temporal analysis of the wing pouch shape was quite revealing, providing novel information about how the phenotype evolves in time, in particular that there is already a phenotype quite early in development. As mentioned above, however, the lack of consideration of the wing disc as a 3D object is a potential limitation. While the audience is likely mostly developmental biologists working in basic research, it may also interest those studying the pathway in other contexts, including in vertebrates given its conservation and role in other processes.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): The manuscript begins with very nice data from a temperature-sensitive (ts) period experiment. Instead of a ts mutation, the authors induced RNAi in a temperature-dependent manner. The results are striking and strong. Knockdown of FT or DS during larval stages to late L3 changed shape while knockdown of FT or DS during later pupal stages did not. This indicates they are required during larval, not pupal, stages of wing development for this shape effect. They did shift-up or shift-down at "early pupa stage" but precisely what stage that means was not described anywhere in the manuscript. White prepupal? Time? Likewise a shift-down was done at "late L3" but that meaning is also vague. Moreover, I was surprised to see they did not do a shift-up at the late L3 stage, to give completeness to the experiment. Why?

      Response: We have added more precise descriptions of the timing, and we will also add the requested late L3 shift-up experiment.

      Looking at the "shape" of the larval wing pouch they see a difference in the mutants. The pouch can be approximated as an ellipse, but with differing topology to the adult wing. Here, they muddled the analysis. The adult wing surface is analogous to one hemisphere of the larval wing pouch, i.e., either the dorsal or ventral compartment. The distance along the AP boundary from the pouch border to DV midline is topologically comparable to the PD length of the adult wing. The distance along the DV boundary from A border to P border is topologically comparable to the AP length of the adult wing. They confusingly call this latter metric the "DV length" and the former metric the "AP length", and in fact they do not measure the PD length but the PD+DP length. This is confusing. Please change this to make it consistent with the earlier analysis of the adult, and invert the reported ratio and divide by two.

      Then you would find the larval PD/AP ratio is smaller in the FT and DS mutants than wildtype, which resembles the smaller PD/AP ratio seen in the mutant adult wings. Totally consistent and also provides further evidence with the ts experiments that FT and DS exert shape effects in the larval phase of life.

      Response: As noted by Reviewer 1 in cross-referencing, some of the statements made by Reviewer 2 here are incorrect, eg “The distance along the DV boundary from A border to P border is topologically comparable to the AP length of the adult wing.” They are correct where they note that the A-P length we measure in the discs is actually equivalent to 2x the adult wing length, since we are measuring along both the dorsal and ventral wing, but this makes no difference to the analysis as the point is to compare shape between time points and genotypes, not to make inferences based on the absolute numbers obtained. The numerical manipulations suggested are entirely feasible but we think they are unnecessary.
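
      The conversion that Reviewer 2 proposes can be written out as a minimal sketch. It assumes, following the reviewer's description rather than the manuscript itself, that the reported disc ratio is the "DV length" divided by the "AP length", and that the "AP length" spans both the dorsal and ventral surfaces (twice the PD length). The function name and the numerical values are hypothetical and serve only as an illustration.

```python
def adult_equivalent_ratio(reported_ratio: float) -> float:
    """Convert a reported disc ratio into an adult-style PD/AP ratio.

    Assumption (taken from the reviewer's description, not the manuscript):
    reported_ratio = DV_length / AP_length, where AP_length covers the
    dorsal and ventral surfaces together, i.e. 2 x PD length.
    Inverting and dividing by two then gives PD / AP.
    """
    if reported_ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / (2.0 * reported_ratio)


# Hypothetical ratios, for illustration only (not data from the manuscript).
reported = {"control": 0.40, "mutant": 0.50}
converted = {genotype: adult_equivalent_ratio(r) for genotype, r in reported.items()}
```

      Because the conversion is monotonic, it reverses the direction of a difference between genotypes but cannot create or remove one, which is in line with the authors' point that the comparison between genotypes and time points does not depend on it.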

      The remainder of the manuscript has experimental results that are more problematic, and really the authors do not figure out how the shape effect in larval stages is altered. I outline below the main problems.

      1. They compare the FT DS shape phenotypes to those of mutants or knockdowns in Hippo pathway genes (Hippo is known to be downstream of FT and DS). They find these Hippo perturbations do have shape effects trending in the same direction as FT and DS effects. Knockdown reduces the PD/AP ratio while overexpressing WARTS increases the PD/AP ratio. The effect magnitudes are not as strong, but then again, they are using hypomorphic alleles and RNAi, which often induces partial or hypomorphic phenotypes. The effect strength is comparable when wing pouches are young but then dissipates over time, while FT and DS effects do not dissipate over time. The complexity of the data does not negate the idea that Hippo signaling is also playing some role and could be downstream of FT and DS in all of this. But the authors really downplay the data to the point of stating "These results imply that Ds-Fat influences wing pouch shape during wing disc growth separately from its effects on Hippo signaling." I think a more expansive perspective is needed given the caveats of the experiments.

      Response: Our results emphasize that the effects of Ds-Fat on wing shape cannot be explained solely by effects on Hippo signaling, eg as we stated on page 7 “These observations suggest that Hippo signaling contributes to, but does not fully explain, the influence of ds or fat on adult wing shape.” We also note that impairment of Hippo signaling has similar effects in younger discs, but very different effects in older discs, which clearly indicates that they are having very different effects during disc growth; we will revise the text to make sure our conclusions are clear.

      The reviewer wonders whether some of the differences could be due to the nature of the alleles or gene knockdown. First, the *ex*, *ds*, and *fat* alleles that we use are null alleles (eg see FlyBase), so it is not correct to say that we use only hypomorphic alleles and RNAi. We do use a hypomorphic allele for wts, and RNAi for hpo, for the simple reason that null alleles in these genes are lethal, so adult wings could not be examined. A further issue that is not commented on by the reviewer, but is more relevant here, is that there are multiple inputs into Hippo signaling, so of course even a null allele for ex, ds or fat is not a complete shutdown of Hippo signaling. Nonetheless, one can estimate the relative impairment of Hippo signaling by measuring the increased size of the wings, and from this perspective the knockdown conditions that we use are associated with roughly comparable levels of Hippo pathway impairment, so we stand by our results. We do, however, recognize that these issues could be discussed more clearly in the text, and will do so in a revised manuscript.

      Puzzlingly, this lack of taking seriously a set of complex results does not transfer to another set of experiments in which they inhibit or activate ROK, the Rho kinase. When ROK is perturbed, they also see weak effects on shape when compared to FT or DS perturbation. This weakness is seen in adults, larvae, clones and in epistasis experiments. The epistasis experiment in particular convincingly shows that constitutive ROK activation is not epistatic to loss of DS; in fact if anything the DS phenotype suppresses the ROK phenotype. These results also show that one cannot simply explain what FT and DS are doing with some single pathway or effector molecule like ROK. It is more complex than that.

      What I really think was needed were experiments combining FT and DS knockdown with other mutants or knockdowns in the Hippo and Rho pathways, and even combining Hippo and Rho pathway mutants with FT or DS intact, to see if there are genetic interactions (additive, synergistic, epistatic) that could untangle the phenotypic complexity.

      Response: We’re puzzled by these comments. First, we never claimed that what Fat or Ds do could be explained simply by manipulation of Rok (eg, see Discussion). Moreover, examination of wings and wing discs where ds is combined with Rho manipulations is in Fig 7, and Hippo and Rho pathway manipulation combinations are in Fig S5. We don’t think that combining ds or fat mutations with other Hippo pathway mutations would be informative, as it is well established that Ds-Fat are upstream regulators of Hippo signaling.

      Laser cutting experiments were done to see if there is anisotropy in tissue tension within the wing pouch. This was to test a favored idea that FT and DS activity generates anisotropy in tissue tension, thereby controlling overall anisotropic shape of the pouch. However, there is a fundamental flaw in their laser cutting analysis. Laser cutting is a technique used to measure mechanical tension, with initial recoil velocity directly proportional to the tissue's tension. By cutting a small line and observing how quickly the edges of the cut snap apart, people can quantify the initial recoil velocity and infer the stored mechanical stress in the tissue at the time of ablation. Live imaging with high-speed microscopy is required to capture the immediate response of the tissue to the cut since initial recoil occurs in the first few seconds. A kymograph is created by plotting the movement of the tissue edges over this time scale, perpendicular to the cut. The initial recoil velocity is the slope of the kymograph at time zero, representing how fast the severed edges move apart. A higher recoil velocity indicates higher mechanical tension in the tissue. However, the authors did not measure this initial recoil velocity but instead measured the distance between the severed edges at one time point: 60 seconds after cutting. This is much later than the time point at which the recoil usually begins to dissipate or decay. This decay phase typically lasts a minute or two, during which time the edges continue to separate but at a progressively slower rate. This time-dependent decay of the recoil reveals whether the tissue behaves more like a viscous fluid or an elastic solid. Therefore, the distance metric at 60 seconds is a measurement of both tension and the material properties of the cells. One cannot know then whether a difference in the distance is due to a difference in tension or fluidity of the cells. If the authors made measurements of edge separation at several time points in the first 10 seconds after ablation, they could deconvolve the two. Otherwise their analysis is inconclusive. Anisotropy in recoil could be caused by greater tissue fluidity along one axis. A gradient of cell fluidity along one axis of a tissue has been observed in the amnioserosa of Tribolium, for example. (A related and important point: was the anisotropy of recoil oriented along the PD axis, the AP axis, or neither? This key point was never stated.)
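
      The measurement described in this comment, the slope of edge separation against time immediately after the cut, can be illustrated with a minimal sketch. It is not the authors' analysis. The function names, the choice of a straight-line fit through the first few time points, and the Kelvin-Voigt-style synthetic curve used as test data are all assumptions made for illustration, and the units (seconds, micrometres) are hypothetical.

```python
import numpy as np


def initial_recoil_velocity(times_s, separations_um, n_points=5):
    """Estimate initial recoil velocity as the slope of a least-squares
    line through the first n_points of an edge-separation time series."""
    t = np.asarray(times_s, dtype=float)[:n_points]
    d = np.asarray(separations_um, dtype=float)[:n_points]
    if t.size < 2:
        raise ValueError("need at least two time points")
    slope, _intercept = np.polyfit(t, d, 1)
    return float(slope)


def kelvin_voigt_separation(t, v0, tau):
    """Synthetic separation curve d(t) = v0 * tau * (1 - exp(-t / tau)).

    Assumed model for test data only: v0 is the true initial velocity and
    tau the relaxation time that reflects material properties.
    """
    t = np.asarray(t, dtype=float)
    return v0 * tau * (1.0 - np.exp(-t / tau))


# Synthetic example: five samples in the first two seconds after the cut.
times = np.arange(0.0, 2.5, 0.5)
separations = kelvin_voigt_separation(times, v0=1.0, tau=10.0)
v_est = initial_recoil_velocity(times, separations)
```

      On a decelerating curve such as this one, the straight-line estimate falls slightly below the true initial velocity, and the shortfall grows as later time points are included. This is the sense in which a single late measurement mixes tension with relaxation behaviour.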

      The authors cannot definitively conclude anything about mechanical tension from their reported cutting data.

      Response: As noted by Reviewer 1 in cross-commenting, there is no fluidity on a time scale of 1 minute in the wing disc, and circular ablations are an established method to investigate tissue stress. We chose the circular ablation method in part because it interrogates stress over a larger area, whereas cutting individual junctions is subject to more variability, particularly as the orientation of the junction (eg radial vs tangential) impacts the tension detected in the wing disc. Nonetheless, we will add recoil measurements to the revised manuscript to complement our circular ablations, which we expect will provide independent confirmation of our results and address the Reviewer’s concern here.

      They measured the eccentricity of wing pouch cells near the pouch border, and found they were highly anisotropic compared to DS mutant cells at comparable locations. Cells were elongated, but again which axis (PD or AP), if either, they were elongated along was never stated. If cell anisotropy is caused by polarized myosin activity, that activity is typically polarized along the short edges not long edges. Thus, recoil velocity after laser cutting would be stronger along the axis aligned with short cell edges. It looks like the cutting anisotropy they see is greater along the axis aligned with long cell edges. Of course, if the cell anisotropy is caused by a pulling force exerted by the pouch boundary, then it would stretch the cells. This would in fact fit their cutting data. But then again, the observed cell anisotropy could also be caused by variation in the fluid-solid properties of the wing cells as discussed earlier. Compression of the cells then would deform them anisotropically and produce the anisotropic shapes that were observed. Therefore, interpreting what causes the cell anisotropy and how DS regulates it is difficult.

      Response: As noted by Reviewer 1 in cross-commenting, it is well established that tension and myosin are higher along long edges in the proximal wing. However, we acknowledge that we could do a better job of making the location and orientation of the regions shown in these experiments clear, and we will address this in a revised manuscript.

      The imaging and analysis of the myosin RLC by GFP tagging is also flawed. SQH-GFP is a tried and true proxy for myosin activity in Drosophila. Although the authors imaged the wing pouches of wildtype and DS mutants, they did so under low magnification to image the entire pouch. This gives a "low-res" perspective of overall myosin, but what they needed to do was image at high magnification in that proximal region of the pouch and see if Sqh-GFP is polarized in wildtype cells along certain cell edges aligned with an axis. And if such a polarity is observed, is it present or absent in the DS mutant? From the data shown in Figure 5, I cannot see any significant difference between wildtype and knocked down samples at this low resolution. Any difference, if there is any, is not really interpretable.

      Response: We agree that examination of myosin localization at high resolution to see if it is polarized is a worthwhile experiment. We did in fact do this, and myosin (Sqh:GFP) appeared unpolarized in ds mutants. However, the levels of myosin were so low that we didn’t feel confident in our assessment, so we didn’t include it. We now recognize that this was a mistake, and we will include high resolution myosin images and assessments of (lack of) polarity in a revised manuscript to address this comment.

      In conclusion, the manuscript has multiple problems that make it impossible for the authors to make the claims they make in the current manuscript. And even if they calibrated their interpretations to fit the data, there is not much of a simple clear picture as to how FT and DS regulate pouch eccentricity in the larval wing.

      Response: We think that the legitimate issues raised are addressable, as described above, while some of the criticisms are incorrect (as noted by Reviewer 1).

      Reviewer #2 (Significance (Required)): This manuscript describes experiments studying the role that the protocadherins FAT and DACHSOUS play in determining the two dimensional "shape" of the fruit fly wing. By "shape", the manuscript really means how much the wing's outline, when approximated as an ellipse, deviates from a circle. The elliptical approximations of FT and DS mutant wings more closely resemble a circle compared to the more eccentric wildtype wings. This suggests the molecules contribute to anisotropic growth in some way. A great deal of attention has been paid on how FT and DS regulate overall organ growth and planar cell polarity, and the Irvine lab has made extensive contributions to these questions over the years. Somewhat understudied is how FT and DS regulate wing shape, and this manuscript focuses on that. It follows up on an interesting result that the Irvine lab published in 2019, in which mud mutants randomized spindle pole orientation in wing cells but did not change the eccentricity of wings, ruling out biased cell division orientation as a mechanism for the anisotropic growth.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): Summary The authors investigate the mechanisms underlying epithelial morphogenesis using the Drosophila wing as a model system. Specifically, they analyze the contribution of the conserved Fat/Ds pathway to wing shape regulation. The main claim of the manuscript is that Ds/Fat controls wing shape by regulating tissue mechanical stress through MyoII levels, independently of Hippo signaling and tissue growth.

      Major Comments To support their main conclusions, the authors should address the following major points and consider additional experiments where indicated. Most of the suggested experiments are feasible within a reasonable timeframe, while a few are more technically demanding but would substantially strengthen the manuscript's central claims.

      Figure 1: The authors use temperature-sensitive inactivation of Fat or Ds to determine the developmental window during which these proteins regulate wing shape. To support this claim, it is essential to demonstrate that upon downshift during early pupal stages, Ds or Fat protein levels are restored to normal. For consistency, please include statistical analyses in Figure 1P and ensure that all y-axis values in shape quantifications start at 1.

      Response: We will do the requested antibody stains for Fat (Ds antibody is unfortunately no longer available, but the point made by the reviewer can be addressed by Fat as the approach and results are the same for both genes). We have also added the requested statistical analysis to Fig 1P, and adjusted the scales as requested.

      Figure 2: The authors propose that wing shape is regulated by Fat/Ds during larval development. However, Figure 2L suggests that wing elongation occurs in control conditions between 6 and 12 h APF, while this elongation is not observed upon Ds RNAi. The authors should therefore perform downshift experiments while monitoring wing shape during the pupal stage to substantiate their main claim. In addition, equivalent data for Fat loss of function should be included to support the assertion that Fat and Ds act similarly.

      Response: As noted in our response to point 1 of Reviewer 1, we agree that there does seem to be relatively more elongation in control wings than in ds RNAi wings, but we think this likely reflects effects of ds on growth during larval stages, and we will revise the manuscript to comment on this.

      We will also add the suggested examination of fat RNAi pupal wings.

      The suggested examination of pupal wing shape in downshift experiments is unfortunately not feasible. Our temperature shift experiments expressing ds or fat RNAi are done using the UAS-Gal4-Gal80ts system. We also use the UAS-Gal4 system to mark the pupal wing. If we do a downshift experiment, then expression of the fluorescent marker will be shut down in parallel with the shutdown of ds or fat RNAi, so the pupal wings would no longer be visible.

      Figure 3: The authors state that "These observations indicate that Ds-Fat signaling influences wing shape during the initial formation of the wing pouch, in addition to its effects during wing growth." This conclusion is not fully supported, as the authors only examine wing shape at 72 h AEL. At this stage, fat or ds mutant wings already display altered morphology. The authors could only make this claim if earlier time points were fully analyzed. In fact, the current data rather suggest that Ds function is required before 72 h AEL, as a rescue of wing shape is observed between 72 and 120 h AEL.

      Response: First, I think we are largely in agreement with the Reviewer, as the basis for our saying that Ds-Fat are likely required during initial formation of the wing pouch is that our data show they must be required before 72 h AEL. Second, 72 h is the earliest that we can look using Wg expression as a marker, as at earlier stages it is in a ventral wedge rather than a ring around the future wing pouch + DV line (eg see Fig 8 of Tripathi, B. K. & Irvine, K. D. The wing imaginal disc. Genetics (2022) doi:10.1093/genetics/iyac020). We can revise the text to make sure this is clear.

      Figure 4: The authors state that "The influence of Ds-Fat on wing shape is not explained by Hippo signaling." However, this conclusion is not supported by their data, which show that partial loss of ex or hippo causes clear defects in wing shape. In addition, the initial wing shape is affected in wts and ex mutants, and hypomorphic alleles were used for these experiments. Therefore, the main conclusion requires revision. It would be useful to include a complete dataset for hippo RNAi, ex, and wts conditions in Figure S1. The purpose and interpretation of the InR^CA experiments are also unclear. While InR^CA expression can increase tissue growth, Hippo signaling has functions beyond growth control. Whether Hippo regulates tissue shape through InR^CA-dependent mechanisms remains to be clarified.

      Response: As noted in our response to point 1 of Reviewer 2 - our results emphasize that the effects of Ds-Fat on wing shape cannot be explained solely by effects on Hippo signaling, eg as we stated on page 7 “These observations suggest that Hippo signaling contributes to, but does not fully explain, the influence of ds or fat on adult wing shape.” We also note that impairment of Hippo signaling has similar effects in younger discs, but very different effects in older discs, which clearly indicates that they are having very different effects during disc growth. We will make some revisions to the text to make sure that our conclusions are clear throughout.

      While we used a hypomorphic allele for wts, because null alleles are lethal, the ex allele that we used is described in FlyBase as an amorph, not a hypomorph, and as noted in our response to Reviewer 2, we will add some discussion about relative strength of effects on Hippo signaling.

      In Fig S1, we currently show adult wings for ex[e1] and RNAi-Hpo, and wing discs for wts[P2]/wts[x1], and for ex[e1]. The wts combination does not survive to adult so we can’t include this. We will however, add hpo RNAi wing discs as requested.

      The purpose of including InR^CA experiments is to try to separate effects of Hippo signaling from effects of growth, because InR signaling manipulation provides a distinct mechanism for increasing growth. We will revise the text to try to make sure this is clearer.

      Figure 5: This figure presents images of MyoII distribution, but no quantification across multiple samples is provided. Moreover, the relationship between changes in tissue stress and MyoII levels remains unclear. Performing laser ablation and MyoII quantification on the same samples would provide stronger support for the proposed conclusions.

      Response: We will revise the quantitation so that it presents analysis of averages across multiple discs, rather than representative examples of single discs.

      Both the myosin imaging and the laser ablation were done on the same genotypes (wildtype and ds) at the same ages (108 h AEL), so we think it is valid to directly compare them. Moreover, the imaging conditions for laser ablation and myo quantification are different, so it’s not feasible to do them at the same time. For ablations we do a single Z plane and a single channel (which has to include Ecad, or an equivalent junctional marker) on live discs, so that fast imaging can be done. For Myo imaging we do multiple Z stacks and multiple channels (eg Ecad and Myo), which is not compatible with the fast imaging needed for analysis of laser ablations.

      Figure 6: It is unclear when Rok RNAi and Rok^CA misexpression were induced. To substantiate their claims, the authors should measure both MyoII levels and mechanical tension under the different experimental conditions in which wing shape was modified through Rok modulation (i.e. the condition shown in Fig. 7G). For comparison, fat and ds data should be added to Fig 6H. Overall, the effects of Rok modulation appear milder than those of Fat manipulation. Given that Dachs has been shown to regulate tension downstream of Fat/Ds, it would be informative to determine whether tissue tension is altered in dachs mutant wings and to assess the relative contribution of Dachs- versus MyoII-mediated tension to wing shape control. It would also be interesting to test whether Rok activation can rescue dachs loss-of-function phenotypes.

      Response: In these Rok experiments there was no separate temporal control of Rok RNAi or Rok^CA expression, they were expressed under nub-Gal4 control throughout development.

      We will add examination of myosin in combinations of ds RNAi and rok manipulation as in Fig 7G to a revised manuscript.

      Data for fat and ds comparable to that shown in Fig 6H is already presented in Fig 3D, and we don’t think it’s necessary to reproduce this again in Fig 6H.

      We agree that the effects of Rok manipulations are milder than those of Fat manipulations; as we try to discuss, this could be because the pattern or polarity of myosin is also important, not just the absolute level, and we will add assessment of myosin polarity.

      The suggestion to also look at dachs mutants is reasonable, and we will add this. In addition, we plan to add an "activated" Dachs (a Zyxin-Dachs fusion protein previously described in Pan et al 2013) that we anticipate will provide further evidence that the effects of Ds-Fat are mediated through Dachs. We will also add the suggested experiment combining Rok activation with dachs loss-of-function.

      Figure 7: The authors use genetic interactions to support their claim that Fat controls wing shape independently of Hippo signaling. However, these interactions do not formally exclude a role for Hippo. Moreover, previous work has shown that tissue tension regulates Hippo pathway activity, implying that any manipulation of tension could indirectly affect Hippo and growth. To provide more direct evidence, the authors should further analyze MyoII localization and tissue tension under the various experimental conditions tested (as also suggested above).

      Response: As discussed above, our data clearly show that Fat has effects independently of Hippo signaling that are crucial for its effects on wing shape, but we did not mean to imply that the regulation of Hippo signaling by Fat makes no contribution to wing shape control, and we will revise the text to make this clearer. We will also add additional analysis of Myosin localization, as described above.

      Reviewer #3 (Significance (Required)): How organ growth and shape are controlled remains a fundamental question in developmental biology, with major implications for our understanding of disease mechanisms. The Drosophila wing has long served as a powerful and informative model to study tissue growth and morphogenesis. Work in this system has been instrumental in delineating the conserved molecular and mechanical processes that coordinate epithelial dynamics during development. The molecular regulators investigated by the authors are highly conserved, suggesting that the findings reported here are likely to be of broad biological relevance.

      Previous studies have proposed that anisotropic tissue growth regulates wing shape during larval development and that such anisotropy induces mechanical responses that promote MyoII localization (Legoff et al., 2013, PMID: 24046320; Mao et al., 2013, PMID: 24022370). The Ds/Fat system has also been shown to regulate tissue tension through the Dachs myosin, a known modulator of the Hippo/YAP signaling pathway. As correctly emphasized by the authors, the respective contributions of anisotropic growth and mechanical tension to wing shape control remain only partially understood. The current study aims to clarify this issue by analyzing the role of Fat/Ds in controlling MyoII localization and, consequently, wing shape. This represents a potentially valuable contribution. However, the proposed mechanistic link between Fat/Ds and MyoII localization remains insufficiently explored. Moreover, the role of MyoII is not fully discussed in the broader context of Dachs function and its known interactions with MyoII (Mao et al., 2011, PMID: 21245166; Bosveld et al., 2012, PMID: 22499807; Trinidad et al., 2024, PMID: 39708794). Most importantly, the experimental evidence supporting the authors' conclusions would benefit from further strengthening. It should also be noted that disentangling the relative contributions of anisotropic growth and MyoII polarization to tissue shape and size remains challenging, as MyoII levels are known to increase in response to anisotropic growth (Legoff et al., 2013; Mao et al., 2013), and mechanical tension itself can modulate Hippo/YAP signaling (Rauskolb et al., 2014, PMID: 24995985).

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      The authors investigate the mechanisms underlying epithelial morphogenesis using the Drosophila wing as a model system. Specifically, they analyze the contribution of the conserved Fat/Ds pathway to wing shape regulation. The main claim of the manuscript is that Ds/Fat controls wing shape by regulating tissue mechanical stress through MyoII levels, independently of Hippo signaling and tissue growth.

      Major Comments

      To support their main conclusions, the authors should address the following major points and consider additional experiments where indicated. Most of the suggested experiments are feasible within a reasonable timeframe, while a few are more technically demanding but would substantially strengthen the manuscript's central claims.

      Figure 1:

      The authors use temperature-sensitive inactivation of Fat or Ds to determine the developmental window during which these proteins regulate wing shape. To support this claim, it is essential to demonstrate that upon downshift during early pupal stages, Ds or Fat protein levels are restored to normal. For consistency, please include statistical analyses in Figure 1P and ensure that all y-axis values in shape quantifications start at 1.

      Figure 2:

      The authors propose that wing shape is regulated by Fat/Ds during larval development. However, Figure 2L suggests that wing elongation occurs in control conditions between 6 and 12 h APF, while this elongation is not observed upon Ds RNAi. The authors should therefore perform downshift experiments while monitoring wing shape during the pupal stage to substantiate their main claim. In addition, equivalent data for Fat loss of function should be included to support the assertion that Fat and Ds act similarly.

      Figure 3:

      The authors state that "These observations indicate that Ds-Fat signaling influences wing shape during the initial formation of the wing pouch, in addition to its effects during wing growth." This conclusion is not fully supported, as the authors only examine wing shape at 72 h AEL. At this stage, fat or ds mutant wings already display altered morphology. The authors could only make this claim if earlier time points were fully analyzed. In fact, the current data rather suggest that Ds function is required before 72 h AEL, as a rescue of wing shape is observed between 72 and 120 h AEL.

      Figure 4:

      The authors state that "The influence of Ds-Fat on wing shape is not explained by Hippo signaling." However, this conclusion is not supported by their data, which show that partial loss of ex or hippo causes clear defects in wing shape. In addition, the initial wing shape is affected in wts and ex mutants, and hypomorphic alleles were used for these experiments. Therefore, the main conclusion requires revision. It would be useful to include a complete dataset for hippo RNAi, ex, and wts conditions in Figure S1. The purpose and interpretation of the InR^CA experiments are also unclear. While InR^CA expression can increase tissue growth, Hippo signaling has functions beyond growth control. Whether Hippo regulates tissue shape through InR^CA-dependent mechanisms remains to be clarified.

      Figure 5:

      This figure presents images of MyoII distribution, but no quantification across multiple samples is provided. Moreover, the relationship between changes in tissue stress and MyoII levels remains unclear. Performing laser ablation and MyoII quantification on the same samples would provide stronger support for the proposed conclusions.

      Figure 6:

      It is unclear when Rok RNAi and Rok^CA misexpression were induced. To substantiate their claims, the authors should measure both MyoII levels and mechanical tension under the different experimental conditions in which wing shape was modified through Rok modulation (i.e. the condition shown in Fig. 7G). For comparison, fat and ds data should be added to Fig 6H.

      Overall, the effects of Rok modulation appear milder than those of Fat manipulation. Given that Dachs has been shown to regulate tension downstream of Fat/Ds, it would be informative to determine whether tissue tension is altered in dachs mutant wings and to assess the relative contribution of Dachs- versus MyoII-mediated tension to wing shape control. It would also be interesting to test whether Rok activation can rescue dachs loss-of-function phenotypes.

      Figure 7:

      The authors use genetic interactions to support their claim that Fat controls wing shape independently of Hippo signaling. However, these interactions do not formally exclude a role for Hippo. Moreover, previous work has shown that tissue tension regulates Hippo pathway activity, implying that any manipulation of tension could indirectly affect Hippo and growth. To provide more direct evidence, the authors should further analyze MyoII localization and tissue tension under the various experimental conditions tested (as also suggested above).

      Significance

      How organ growth and shape are controlled remains a fundamental question in developmental biology, with major implications for our understanding of disease mechanisms. The Drosophila wing has long served as a powerful and informative model to study tissue growth and morphogenesis. Work in this system has been instrumental in delineating the conserved molecular and mechanical processes that coordinate epithelial dynamics during development. The molecular regulators investigated by the authors are highly conserved, suggesting that the findings reported here are likely to be of broad biological relevance.

      Previous studies have proposed that anisotropic tissue growth regulates wing shape during larval development and that such anisotropy induces mechanical responses that promote MyoII localization (Legoff et al., 2013, PMID: 24046320; Mao et al., 2013, PMID: 24022370). The Ds/Fat system has also been shown to regulate tissue tension through the Dachs myosin, a known modulator of the Hippo/YAP signaling pathway. As correctly emphasized by the authors, the respective contributions of anisotropic growth and mechanical tension to wing shape control remain only partially understood. The current study aims to clarify this issue by analyzing the role of Fat/Ds in controlling MyoII localization and, consequently, wing shape. This represents a potentially valuable contribution. However, the proposed mechanistic link between Fat/Ds and MyoII localization remains insufficiently explored. Moreover, the role of MyoII is not fully discussed in the broader context of Dachs function and its known interactions with MyoII (Mao et al., 2011, PMID: 21245166; Bosveld et al., 2012, PMID: 22499807; Trinidad et al., 2024, PMID: 39708794). Most importantly, the experimental evidence supporting the authors' conclusions would benefit from further strengthening. It should also be noted that disentangling the relative contributions of anisotropic growth and MyoII polarization to tissue shape and size remains challenging, as MyoII levels are known to increase in response to anisotropic growth (Legoff et al., 2013; Mao et al., 2013), and mechanical tension itself can modulate Hippo/YAP signaling (Rauskolb et al., 2014, PMID: 24995985).

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #2

      Evidence, reproducibility and clarity

      The manuscript begins with very nice data from a ts sensitive period experiment. Instead of a ts mutation, the authors induced RNAi in a temperature dependent manner. The results are striking and strong. Knockdown of FT or DS during larval stages to late L3 changed shape while knockdown of FT or DS during later pupal stages did not. This indicates they are required during larval, not pupal stages of wing development for this shape effect. They did shift-up or shift-down at "early pupa stage" but precisely what stage that means was not described anywhere in the manuscript. White prepupal? Time? Likewise a shift-down was done at "late L3" but that meaning is also vague. Moreover, I was surprised to see they did not do a shift-up at the late L3 stage, to give completeness to the experiment. Why?

      Looking at the "shape" of the larval wing pouch they see a difference in the mutants. The pouch can be approximated as an ellipse, but with differing topology to the adult wing. Here, they muddled the analysis. The adult wing surface is analogous to one hemisphere of the larval wing pouch, i.e., either dorsal or ventral compartment. The distance along the AP boundary from the pouch border to DV midline is topologically comparable to the PD length of the adult wing. The distance along the DV boundary from A border to P border is topologically comparable to the AP length of the adult wing. They confusingly call this latter metric the "DV length" and the former metric the "AP length", and in fact they do not measure the PD length but PD+DP length. Confusing. Please change to make this consistent with earlier analysis of the adult and invert the reported ratio and divide by two. Then you would find the larval PD/AP ratio is smaller in the FT and DS mutants than wildtype, which resembles the smaller PD/AP ratio seen in the mutant adult wings. Totally consistent and also provides further evidence with the ts experiments that FT and DS exert shape effects in the larval phase of life.

      The remainder of the manuscript has experimental results that are more problematic, and really the authors do not figure out how the shape effect in larval stages is altered. I outline below the main problems.

      1. They compare the FT DS shape phenotypes to those of mutants or knockdowns in Hippo pathway genes (Hippo is known to be downstream of FT and DS). They find these Hippo perturbations do have shape effects trending in the same direction as FT and DS effects. Knockdown reduces the PD/AP ratio while overexpressing WARTS increases the PD/AP ratio. The effect magnitudes are not as strong, but then again, they are using hypomorphic alleles and RNAi, which often induces partial or hypomorphic phenotypes. The effect strength is comparable when wing pouches are young but then dissipates over time, while FT and DS effects do not dissipate over time. The complexity of the data does not negate the idea that Hippo signaling is also playing some role and could be downstream of FT and DS in all of this. But the authors really downplay the data to the point of stating "These results imply that Ds-Fat influences wing pouch shape during wing disc growth separately from its effects on Hippo signaling." I think a more expansive perspective is needed given the caveats of the experiments.

      Puzzlingly, this lack of taking seriously a set of complex results does not transfer to another set of experiments in which they inhibit or activate ROK, the rho kinase. When ROK is perturbed, they also see weak effects on shape when compared to FT or DS perturbation. This weakness is seen in adults, larvae, clones and in epistasis experiments. The epistasis experiment in particular convincingly shows that constitutive ROK activation is not epistatic to loss of DS; in fact if anything the DS phenotype suppresses the ROK phenotype. These results also show that one cannot simply explain what FT and DS are doing with some single pathway or effector molecule like ROK. It is more complex than that.

      What I really think was needed were experiments combining FT and DS knockdown with other mutants or knockdowns in the Hippo and Rho pathways, and even combining Hippo and Rho pathway mutants with FT or DS intact, to see if there are genetic interactions (additive, synergistic, epistatic) that could untangle the phenotypic complexity.

      2. Laser cutting experiments were done to see if there is anisotropy in tissue tension within the wing pouch. This was to test a favored idea that FT and DS activity generates anisotropy in tissue tension, thereby controlling overall anisotropic shape of the pouch. However there is a fundamental flaw to their laser cutting analysis. Laser cutting is a technique used to measure mechanical tension, with initial recoil velocity directly proportional to the tissue's tension. By cutting a small line and observing how quickly the edges of the cut snap apart, people can quantify the initial recoil velocity and infer the stored mechanical stress in the tissue at the time of ablation. Live imaging with high-speed microscopy is required to capture the immediate response of the tissue to the cut since initial recoil velocity occurs in the first few seconds. A kymograph is created by plotting the movement of the tissue edges over this time scale, perpendicular to the cut. The initial recoil velocity is the slope of the kymograph at time zero, representing how fast the severed edges move apart. A higher recoil velocity indicates higher mechanical tension in the tissue. However, the authors did not measure this initial recoil velocity but instead measured the distance between the severed edges at one time point: 60 seconds after cutting. This is much later than the time point at which the recoil usually begins to dissipate or decay. This decay phase typically lasts a minute or two, during which time the edges continue to separate but at a progressively slower rate. This time-dependent decay of the recoil reveals whether the tissue behaves more like a viscous fluid or an elastic solid. Therefore, the distance metric at 60 seconds is a measurement of both tension and the material properties of the cells. One cannot know then whether a difference in the distance is due to a difference in tension or fluidity of the cells. If the authors made measurements of edge separation at several time points in the first 10 seconds after ablation, they could deconvolute the two. Otherwise their analysis is inconclusive. Anisotropy in recoil could be caused by greater tissue fluidity along one axis. A gradient of cell fluidity along one axis of a tissue has been observed in the amnioserosa of Tribolium, for example. (Related and important point: was the anisotropy of recoil oriented along the PD or AP axis, or not oriented to either axis? This key point was never stated.)

      The authors cannot definitively conclude anything about mechanical tension from their reported cutting data.

      3. They measured the eccentricity of wing pouch cells near the pouch border, and found they were highly anisotropic compared to DS mutant cells at comparable locations. Cells were elongated, but again, which axis (PD or AP) they were elongated along was never stated. If cell anisotropy is caused by polarized myosin activity, that activity is typically polarized along the short edges not long edges. Thus, recoil velocity after laser cutting would be stronger along the axis aligned with short cell edges. It looks like the cutting anisotropy they see is greater along the axis aligned with long cell edges. Of course, if the cell anisotropy is caused by a pulling force exerted by the pouch boundary, then it would stretch the cells. This would in fact fit their cutting data. But then again, the observed cell anisotropy could also be caused by variation in the fluid-solid properties of the wing cells as discussed earlier. Compression of the cells then would deform them anisotropically and produce the anisotropic shapes that were observed. Therefore, interpreting what causes the cell anisotropy and how DS regulates it is difficult.

      4. The imaging and analysis of the myosin RLC by GFP tagging is also flawed. SQH-GFP is a tried and true proxy for myosin activity in Drosophila. Although the authors image the wing pouch of wildtype and DS mutants, they did so under low magnification to image the entire pouch. This gives a "low-res" perspective of overall myosin, but what they needed to do was image at high magnification in that proximal region of the pouch and see if Sqh-GFP is polarized in wildtype cells along certain cell edges aligned with an axis. And if such a polarity is observed, is it present or absent in the DS mutant? From the data shown in Figure 5, I cannot see any significant difference between wildtype and knocked down samples at this low resolution. Any difference, if there is any, is not really interpretable.
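
Note on point 2 above: the difference between initial recoil velocity and edge separation at a single late time point can be illustrated with a minimal sketch. It assumes a Kelvin-Voigt-type relaxation, d(t) = d_inf * (1 - exp(-t / tau)), for which the initial recoil velocity is d_inf / tau. The model choice, the function names, and all numbers are assumptions made for illustration; none of them come from the manuscript or from the reviewers' data.

```python
import math

def separation(t, d_inf, tau):
    """Edge separation at time t for the assumed Kelvin-Voigt-type relaxation."""
    return d_inf * (1.0 - math.exp(-t / tau))

def fit_recoil(times, separations, tau_min=0.1, tau_max=100.0, steps=2000):
    """Fit d(t) = d_inf * (1 - exp(-t / tau)) by scanning tau on a log grid.

    For each candidate tau, the least-squares d_inf has a closed form,
    so only tau is scanned. Returns (d_inf, tau, v0), where
    v0 = d_inf / tau is the initial recoil velocity.
    """
    best = None
    for i in range(steps):
        tau = tau_min * (tau_max / tau_min) ** (i / (steps - 1))
        f = [1.0 - math.exp(-t / tau) for t in times]
        denom = sum(x * x for x in f)
        if denom == 0.0:
            continue
        d_inf = sum(d * x for d, x in zip(separations, f)) / denom
        sse = sum((d - d_inf * x) ** 2 for d, x in zip(separations, f))
        if best is None or sse < best[0]:
            best = (sse, d_inf, tau)
    _, d_inf, tau = best
    return d_inf, tau, d_inf / tau

# Hypothetical example: same plateau separation, different relaxation times.
times = [0.5 * k for k in range(21)]  # 0 to 10 s, sampled every 0.5 s
for label, tau_true in (("fast", 5.0), ("slow", 20.0)):
    data = [separation(t, 4.0, tau_true) for t in times]
    d_inf, tau, v0 = fit_recoil(times, data)
    late = separation(60.0, d_inf, tau)
    print(label, "initial velocity:", round(v0, 3), "separation at 60 s:", round(late, 2))
```

Under these assumptions, two cuts with nearly the same separation at 60 s can differ several-fold in initial recoil velocity, which is the ambiguity raised in point 2. Whether this model is appropriate for the wing disc is a separate question that the sketch does not address.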

      In conclusion, the manuscript has multiple problems that make it impossible for the authors to make the claims they make in the current manuscript. And even if they calibrated their interpretations to fit the data, there is not much of a simple clear picture as to how FT and DS regulate pouch eccentricity in the larval wing.

      Significance

      This manuscript describes experiments studying the role that the protocadherins FAT and DACHSOUS play in determining the two dimensional "shape" of the fruit fly wing. By "shape", the manuscript really means how much the wing's outline, when approximated as an ellipse, deviates from a circle. The elliptical approximations of FT and DS mutant wings more closely resemble a circle compared to the more eccentric wildtype wings. This suggests the molecules contribute to anisotropic growth in some way. A great deal of attention has been paid to how FT and DS regulate overall organ growth and planar cell polarity, and the Irvine lab has made extensive contributions to these questions over the years. Somewhat understudied is how FT and DS regulate wing shape, and this manuscript focuses on that. It follows up on an interesting result that the Irvine lab published in 2019, in which mud mutants randomized spindle pole orientation in wing cells but did not change the eccentricity of wings, ruling out biased cell division orientation as a mechanism for the anisotropic growth.

    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      In this work, Tripathi et al address the open question of how the Fat/Ds pathway affects organ shape, using the Drosophila wing as a model. The Fat/Ds pathway is a conserved but complex pathway, interacting with Hippo signalling to affect growth and providing planar cell polarity that can influence cellular dynamics during morphogenesis. Here, authors use genetic perturbations combined with quantification of larval, pupal, and adult wing shape and laser ablation to conclude that the Ft/Ds pathway affects wing shape only during larval stages in a way that is at least partially independent of its interaction with Hippo and rather due to an effect on tissue tension and myosin II distribution. Overall the work is clearly written and well presented. I only have a couple major comments on the limitations of the work.

      Major comments:

      1. Authors conclude from data in Figures 1 and 2 that the Fat/Ds pathway only affects wing shape during larval stages. When looking at the pupal wing shape analysis in Figure 2L, however, it looks like there is a difference in wt over time (6h-18h, consistent with literature), but that difference in time goes away in RNAi-ds, indicating that actually there is a role for Ds in changing shape during pupal stages, although the phenotype is clearly less dramatic than that of larval stages. No statistical test was done over time (within the genotype), however, so it's hard to say. I recommend the authors test over time - whether 6h and 18h are different in wild type and in ds mutant. I think this is especially important because there is proximal overgrowth in the Fat/Ds mutants, much of which is contained in the folds during larval stages. That first fold, however, becomes the proximal part of the pupal wing after eversion and contracts during pupal stages to elongate the blade (Aigouy 2010, Etournay 2015). Also, according to Trinidad Curr Biol 2025, there is a role for Fat/Ds pathway in pupal stages. All of that to say that it seems likely that there would be a phenotype in pupal stages. It's true it doesn't show up in the adult wing in the experiments in Fig 1, but looking at the pupal wing itself is more direct - perhaps the very proximal effect is less prominent later, as there is potential for further development after 18hr before adulthood and the most proximal parts are likely anyway excluded in the analysis.
      2. I think there needs to be a mention and some discussion of the fact that the wing is not really flat. While it starts out very flat at 72h, by 96h and beyond, there is considerable curvature in the pouch that may affect measurements of different axes and cell shape. It is not actually specified in the methods, so I assume the measurements were taken using a 2D projection. Not clear whether the curvature of the pouch was taken into account, either for cell shape measurements presented in Fig 4 or for the wing pouch dimensional analysis shown in Fig 3, 6, and supplements. Do perturbations in Ft/Ds affect this curvature? Are they more or less curved in one or both axes? Such a change could affect the results and conclusions. The extent to which the fat/ds mutants fold properly is another important consideration that is not mentioned. For example, maybe the folds are deeper and contain more material in the ds/fat mutants, and that's why the pouch is a different shape? At the very least, this point about the 3D nature of the wing disc must be raised in discussion of the limitations of the study. For the cell shape analysis, you can do a correction based on the local curvature (calculated from the height map from the projection). For the measurement of A/P, D/V axes of the wing pouch, best would be to measure the geodesic distance in 3D, but this is not reasonable to suggest at this point. One can still try to estimate the pouch height/curvature, however, both in wild type and in fat/ds mutants.
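
Note on point 2 above: the curvature concern can be made concrete with a small sketch that compares the length of a line measured in a 2D projection with its length along the curved surface, using heights sampled along that line (for example, read from a projection height map). The function name, the sampling, and the dome-shaped profile are hypothetical; this illustrates the geometric effect only and is not the authors' or the reviewer's analysis pipeline.

```python
import math

def projected_and_surface_length(heights, spacing):
    """Return (projected, surface) lengths along one sampled line.

    heights: tissue height at evenly spaced points along the line.
    spacing: distance between neighboring points in the projection plane,
             in the same units as the heights.
    """
    projected = spacing * (len(heights) - 1)
    # Each segment spans `spacing` in the plane and (b - a) in height.
    surface = sum(math.hypot(spacing, b - a) for a, b in zip(heights, heights[1:]))
    return projected, surface

# Hypothetical dome-shaped profile: 101 samples, 1 um apart, peak height 20 um.
n, spacing, peak = 101, 1.0, 20.0
dome = [peak * (1.0 - ((i - 50) / 50.0) ** 2) for i in range(n)]
proj, surf = projected_and_surface_length(dome, spacing)
print("projected length:", proj, "surface length:", round(surf, 1))
```

If the two pouch axes are curved to different degrees, the projected axis ratio will differ from the ratio measured along the surface, which is why an estimate of pouch height or curvature in each genotype would help interpret the 2D measurements.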

      Minor comments:

      1. The analysis of the laser ablation is not really standard - usually one looks at recoil velocity or a more complicated analysis of the equilibrium shape using a model (e.g Shivakumar and Lenne 2016, Piscitello-Gomez 2023, Dye et al 2021). One may be able to extract more information from these experiments - nevertheless, I doubt the conclusions would change, given that there seems to be a pretty clear difference between wt and ds (OPTIONAL).
      2. Figure 7G: I think you also need a statistical test between RNAi-ds and UAS-rokCA+RNAi-ds.
      3. In the discussion, there is a statement: "However, as mutation or knock down of core PCP components, including pk or sple, does not affect wing shape... 59." Reference 59 is quite old and as far as I can tell shows neither images nor quantifications of the wing shape phenotype (not sure it uses "knockdown" either - unless you mean hypomorph?). A more recent publication Piscitello-Gomez et al Elife 2023 shows a very subtle but significant wing shape phenotype in core PCP mutants. It doesn't change your logic, but I would change the statement to be more accurate by saying "mutation of core PCP components has only subtle changes in adult wing shape"

      Referee cross-commenting

      Reviewer2:

      Reviewer 2 makes the statement: "The distance along the AP boundary from the pouch border to DV midline is topologically comparable to the PD length of the adult wing. The distance along the DV boundary from A border to P border is topologically comparable to the AP length of the adult wing."

      I disagree - the DV boundary wraps around the entire margin of the adult wing (as correctly drawn with the pink line in Fig 2A). It is not the same as the wide axis of the adult wing (perpendicular to the AP boundary). It is not trivial to map the proximal-distal axis of the larval wing to the proximal-distal axis of the adult, due to the changes in shape that occur during eversion. Thus, I find it much easier to look at the exact measurement that the authors make, and it is much more standard in the field, rather than what the reviewer suggests. Alternatively, one could I guess measure in the adult the ratio of the DV margin length (almost the circumference of the blade?) to the AP boundary length. That may be a more direct comparison. Actually the authors leave out the term "boundary" - what they call AP is actually the AP boundary, not the AP axis, and likewise for the DV - what they measure is DV boundary, but I only noticed that in the second read-through now. Just another note, these measurements of the pouch really only correspond to the very distal part of the wing blade, as so much of the proximal blade comes from the folds in the wing disc. Therefore, a measurement of only distal wing shape would be more comparable.

      Reviewer 2 states that authors cannot definitively conclude anything about mechanical tension from their reported cutting data because the authors have not looked at initial recoil velocity. I strongly disagree. The wing disc tissue is elastic on much longer timescales than what's considered after laser ablation (even hours), and the shape of the tissue after it equilibrates from a circular cut (1-2min) can indeed be used to infer tissue stresses (see Dye et al eLife 2021, Piscitello-Gomez et al eLife 2023, Tahaei et al arXiv 2024). In the wing disc, the direction of stresses inferred from initial recoil velocity is correlated with the direction of stresses inferred from analysing the equilibrium shape after a circular cut. Rearrangements, a primary mechanism of fluidization in epithelia, do not occur within 1 min. Analysing the equilibrium shape after circular ablation may be more accurate for assessing tissue stresses than initial recoil velocity - in Piscitello-Gomez et al 2023, the authors found that a prickle mutation (PCP pathway) affected initial recoil velocity but not tissue stresses in the pupal wing. Such equilibrium circular cuts have also been used to analyze stresses in the avian embryo, where it correlates with directions of stress gathered from force inference methods (Kong et al Scientific Reports 2019). The Tribolium example noted by the reviewer is on the timescale of tens to hundreds of minutes - much longer than the timescale of laser ablation retraction. It is true the analysis of the ablation presented in this paper is not at the same level as those other cited papers and could be improved. But I don't think the analysis would be improved by additional experiments doing timelapse of initial retraction velocity.

      Reviewer 2 states "If cell anisotropy is caused by polarized myosin activity, that activity is typically polarized along the short edges not long edges". Not true in this case. Myosin II accumulates along long boundaries (Legoff and Lecuit 2013). "Therefore, interpreting what causes the cell anisotropy and how DS regulates it is difficult." Agreed - but this is well beyond the scope of this manuscript. The authors clearly show that there is a change of cell shape, at least in these two regions. Better would be to quantify it throughout the pouch and across multiple discs. Similar point for myosin quantifications - yes, polarity would be interesting and possible to look at in these data, and it would be better to do so on multiple discs, but the lack of overall myosin on the junctions shown here is not nothing. Interpreting what Ft/Ds does to influence tension and myosin and eventually tissue shape is a big question that's not answered here. I think the authors do not claim to fully understand this though, and maybe further toning down the language of the conclusions could help.

      Reviewer 3:

      I agree with many of the points raised by Reviewer 3, in particular that relevant for Fig 1. The additional experiments looking at myosin II localization and laser ablation in the other perturbations (Hippo and Rok mutants/RNAi) would certainly strengthen the conclusions.

      Significance

      I think the work provides a clear conceptual advance, arguing that the Ft/Ds pathway can influence mechanical stress independently of its interaction with Hippo and growth. Such a finding, if conserved, could be quite important for those studying morphogenesis and Fat function in this and other organisms. For this point, the genetic approach is a clear strength. Previous work in the Drosophila wing has already shown an adult wing phenotype for Ft/Ds mutations that was attributed to its role in the larval growth phase, as marked clones show aberrant growth in mutants. The novelty of this work is the dissection of the temporal progression of this phenotype and how it relates to Hippo and myosin II activation. It remains unclear exactly how Ft/Ds may affect tissue tension, except that it involves a downregulation of myosin II - the mechanism of that is not addressed here and would involve considerably more work. I think the temporal analysis of the wing pouch shape was quite revealing, providing novel information about how the phenotype evolves in time, in particular that there is already a phenotype quite early in development. As mentioned above, however, the lack of consideration of the wing disc as a 3D object is a potential limitation. While the audience is likely mostly developmental biologists working in basic research, it may also interest those studying the pathway in other contexts, including in vertebrates given its conservation and role in other processes.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Overall Response.

      We would like to thank the reviewers for their analysis of the manuscript. From their comments it is evident that our manuscript was not clear. We completely rewrote the manuscript to focus on the central question: how Adam13 regulates gene expression in general, and TFap2a in particular, leading to the expression of Calpain8, a protein required for CNC migration.

      The following model will be the central line of our story. It will address all of the proteins involved and the mechanistic evidence that links Adam13 to one of its proven effector targets, Calpain8.

      *Reviewer #1 (Evidence, reproducibility and clarity (Required)): **

      In this manuscript, Pandey et al. show that the ADAM13 protein modulates histone modifications in cranial neural crest and that the Arid3a protein binds the Tfap2a promoter in an Adam13-dependent manner and has promoter-specific effects on transcription. Furthermore, they show that the Adam13 and human ADAM9 proteins associate with histone modifiers as well as proteins involved in RNA splicing. Although the manuscript is mostly clearly written and the figures well assembled, it reads like a couple of separate and unfinished stories.*

      I believe that our story line was not clear and that the overarching question was not well stated. We have made every effort to change this in the revised manuscript. I would like to include a figure that explains the story.

      In short:

      1 We knew that Adam13 could regulate gene expression in CNC via its cytoplasmic domain.

      2 We also knew that this required Adam13 interaction with Arid3a and that a direct target was the transcription factor TFAP2a, which in turn regulates functional targets that we had identified, including the protocadherin PCNS and the protease Calpain8.

      Our goal was to understand the mechanism allowing Adam13 to regulate gene expression.

      3 The first part of this manuscript shows how Adam13 modulates histone modification in vivo in the CNC, globally as well as specifically on the Tfap2a promoter. This results in open chromatin.

      4 Using ChIP, we show that Adam13 and Arid3a both bind to the Tfap2a promoter and that Arid3a binding to the first ATG depends on Adam13.

      5 Using a luciferase reporter, we show that both Adam13 and Arid3a can induce expression at the first ATG.

      *They show using immunocytochemistry and qPCR that ADAM13 knockouts in CNCs affect histone modifications. Here ChIP-seq or Cut-n-Run experiments would be more appropriate and would result in a more comprehensive understanding of the changes mediated. *

      I agree, but we did not have the funds, and I now have nobody working in the lab to do this experiment. These results are also likely to overlap with the RNAseq data that we have and would simply add more open leads. We chose to go after the only direct target that we know, which is TFAP2a, and to focus on this gene to understand the mechanism.

      We believe that the ChIP-PCR experiments are sufficient for this story.

      *The immunohistochemistry assays should at least be verified further using western blotting or other more quantitative methods. *

      Immunofluorescence with statistical analysis is a valid quantification method. Western blot of CNC explants is not trivial and requires a large amount of material. Given the small overall change, we also would not expect to be able to detect the change over the noise of western blot. The ChIP-PCR confirms our finding in a completely independent manner.

      *The authors then show that ADAM13 interacts with a number of histone modifiers such as KDM3B, KDM4B and KMT2A but strangely they do not follow up this interesting observation to map the interactions further (apart from a co-ip with KMT2A), the domains involved, the functional role of the interactions or how they mediate the changes in chromatin modifications. *

      We selected KMT2A because it is expressed in HEK293T cells. KMT2D has been shown to regulate CNC development in Xenopus and is responsible for Kabuki syndrome in humans. We used AlphaFold to predict the interaction and found that Adam13 interacts with the SET domain. In addition, we see multiple SET domain-containing proteins in our mass spec data. The mass spec was done on human HEK293T cells, which express a subset of KMT proteins. We now include evidence that Adam13 interacts with the KMT2D SET domain (new figure 5D).

      The authors then show that ADAM13 affects expression of the TFAP2a gene in a promoter specific manner - affecting expression from S1 but not S2.

      It is the S1 but not S3. Adam13 has no effect on S2.

      • *They further show that ADAM13 affects the binding of the Arid3 transcription factor to the S1-promoter but not to the S3 promoter. However, ADAM13 was present at both promoters. Absence of ADAM13 resulted in increased H3K9me2/3 and decreased H3K4me3 at the S1 promoter whereas only H3K4me3 was changed at the S2 promoter.*

      S3 not S2.

      *Unfortunately, they do not show how this is mediated or through which binding elements this takes place. Why is ADAM13 present at both promoters but only affects Arid3 binding at S1?*

      We agree that this is a very interesting question that could be the subject of an entire publication. Promoter deletion and mutation to identify which sites are bound and modulated by Adam13/Arid3a is not trivial.

      *The authors claim that transfecting Arid3a and Adam13 together further increases expression from a reporter (Fig 4E) but this is not true as no statistical comparison is done between the singly transfected and double transfected cells. *

      This is correct: with both there is a small increase that is not significant. The fact that both proteins can induce the promoter suggests, but does not prove, that they can have additive roles. The loss-of-function experiment shows that the human Arid3a expressed in HEK293T cells is important for the Adam13-induced increase of S1. It is possible that the dose of endogenous Arid3a is sufficient to get full activity of Adam13.

      *Then the authors surprisingly start investigating association of proteins with the two isoforms of TFAP2a which in the mind of this reviewer is a different question entirely.*

      We agree and have removed this part of the manuscript.

      *They find a number of proteins involved in splicing. And the observation that ADAM13 also interacts with splicing factors is really irrelevant in terms of the story that they are trying to tell. Transcription regulation and splicing are different processes and although both affect the final outcome, mRNA, they need to be investigated separately. The link is at least not very clear from the manuscript. Again, the effects on splicing are not further investigated through functional analysis and as presented the data presented is too open-ended and lacking in clarity. *

      We agree that, besides the different activities of the Tfap2a isoforms, the rest of the splicing regulation could be a separate study. We were interested in understanding how these two isoforms could activate Calpain8 so differently, which is why we looked at LC/MS/MS. We have removed this part of the story from the manuscript.

      Additional points: 1. In the abstract they propose that the ADAMs may act as extracellular sensors. This is not substantiated by the results. *

      As an extracellular protein translocating into the nucleus, it is a possibility that we propose, but I agree that this is not investigated in this manuscript. We will modify the text.

      *2. Page 5, line 16: what is referred to by 6 samples 897 proteins? Were 6 samples analyzed for each condition? The number of repeats for the mass spec analysis is not clear from the text nor are the statistical parameters used to analyse the data. This is also true for the mass spec presented in the part on TFAP2aL-S1 and Adam13 regulate splicing. Statistics and repeats are not presented.*

      In general, we provide biological triplicates and use the statistical function of Scaffold to identify proteins that are significantly enriched or absent in each sample.

      When we specify 6 samples, it means that 6 independent protein samples were analyzed and used for our statistics. We use the Scaffold t-test with a p value less than 0.05. Peptides were identified with 95% confidence and proteins with 99% confidence.

      *3. Page 6, line 19: set domain should be SET domain.*

      Yes.

      *4. The number of repeats in the RNA sequencing of the CNCs is not clear from the text.*

      Three biological replicates (different batches of embryos from different females).

      *5. The explanation of Figure C is a bit lacking. There are two forms of TFAP2a, L and S, but only one is presented in the figure. Do both forms have the extra S1-3 exons? Also, at the top of the figure it is not clear that the boxes are part of a continuous DNA sequence. Also, it is not clear which codon is not coding.*

      Xenopus laevis is pseudotetraploid, giving in most cases L and S genes in addition to the 2 alleles from being diploid. The TFAP2a gene structure is conserved between both alloalleles and is similar to the human gene. For promoter analysis and ChIP-PCR we chose one of the alloalleles (L), given that the RNAseq data showed that both genes and variants behave the same in response to Adam13. This only becomes important in loss-of-function experiments, in which both the L and S versions need to be knocked down or knocked out.

      *6. In the sashimi plot there are green and pink shaded areas. What do they denote? What exactly is lacking in the MO13 mutant - seems that a particular exon is missing suggesting skipping?*

      MO13 is a morpholino that blocks the translation of Adam13 (already characterized, with >90% of the protein absent) but does not affect Adam13 mRNA expression.

      *7. Page 11, line 9: „with either MbC or MbC and MO13" needs to be rephrased.*

      Will do.

      *8. Page 11, line 19: „the c-terminus of....and S3) and" should be „the C-terminus of...and S3 and".*

      *9. Page 15, line 10: substrateS*

      *10. Page 16, line 23: the sentence „increases H3K9 to the promoter of the most upstream" needs revision.*

      *11. Page 26, line 12: Here the authors say: „for two samples two-tail unpaired". What does this mean? Statistics should not be performed on fewer than three samples. In legend to Figure 6 it indicates that T-test was performed on two samples.*

      *12. The discussion should be shortened and simplified.*

      *13. Figure 1 legend. How many images were quantitated for each condition?*

      At least 3 images per condition for each of 3 independent experiments (at least 9 images per condition).

      *14. Figure 2 has a strange order of panels where G is below B.*

      *15. Figure 6 legend, line 12. „proteins that were significantly enriched in either of the 2 samples" is not very clear. What exactly does this mean?*

      Reviewer #1 (Significance (Required)):

      If the authors follow up on either the transcription-part of the story, or the splicing part of the story, they are likely to have important results to present. However, in the present format the paper is lacking in focus as both issues are mixed together without a clear end-result. *

      We have entirely changed the paper according to these comments.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Panday et al seeks to determine the function of ADAM13 in regulating histone modifications, gene expression and splicing during cranial neural crest development. Specifically, the authors tested how Adam13, a metalloprotease, could modify chromatin by interaction with Arid3a and Tfap2a and RNA splicing and gene expression. They then utilize knockouts in Xenopus and HEK293T cells followed by immunofluorescence, IPs, BioID, luciferase assays, Mass spec and RNA assays. Although there is some strong data in the BioID and luciferase experiments, the manuscript tells multiple stories, linking together too many things to make a compelling story. The result is a paper that is very difficult to read and understand the take home message. In addition, some of the conclusions are not supported by the data. This unfortunately means it is not ready for publication. However, I have added below some suggestions that would strengthen the manuscript. My comments are below: *

      Clarity is clearly an issue here. The new version is entirely re-written.

      Here is the take home message:

      We knew that Adam13 could regulate gene expression via its cytoplasmic domain. One of the key targets was identified as Calpain8, a protein critical for CNC migration. We subsequently showed that Adam13 and Arid3a regulated Tfap2a expression which in turn regulated Calpain8.

      In this manuscript we investigated 1) how Adam13 regulates TFAP2a and 2) how Tfap2a controls Calpain8 expression.

      The take home message is that Adam13 binds to histone methyltransferases and changes the histone methylation code, both overall in the CNC and in particular at the TFAP2a promoter. This results in more open chromatin. We further find that Adam13 binds to the Tfap2a promoter in vivo and is important for Arid3a binding to the first start. Tfap2a that includes this N-terminal sequence regulates Capn8 expression.

      Major comments: 1. I think it would be better to split out the chromatin modification function from the splicing in two separate papers. While there is a connection, having it all together makes the story difficult to follow. *

      Agree, but I believe that the S1 vs S3 story of Tfap2a is important for the overall story. The new paper does not emphasize splicing.

      *2. The immunofluorescence of H3K9me2/3, in Figure 1, 2, 3 following Adam13 knockdown is not convincing. There seems to be a strong edge effect especially in Figure 2 and 3.*

      The statistical analysis shows that the results, while modest, are significant (three independent experiments using 3 different females and 3 explants for each condition were analyzed). The edge effect observed is eliminated by the mask that we use, which normalizes the expression to either DAPI or Snai2. The edge effect is also seen in both control and KD. These results are further confirmed by the ChIP-PCR on one direct target.

      Similarly the Arid3a expression in Supp Figure 1 if anything seems increased.

      We have previously shown that Arid3a expression is not affected by Adam13 KD (Khedgikar et al). Our point here is simply that the difference in Tfap2a cannot be explained by a decrease in Arid3a expression. It is not a critical figure and was eliminated in the new manuscript.

      *It would be better to quantify by western blot and not by fluorescent intensity since it is difficult to determine what a small change in fluorescent intensity means in vivo. *

      Not all antibodies used here work by western blot, and the quantity of material required for western blot is much larger than for IF. Given the small overall changes and the variability observed in western blot, it is not a viable alternative.

      IF is a quantitative method that has been used widely to assess increases or decreases in protein level or post-translational modification. The fact that the same post-translational modification that we see in cranial neural crest explants can also be seen by ChIP-PCR on the Tfap2a promoter confirms this observation.

      *Also, it does not say in the text or the figure legend what these are, Xenopus explants of CNC? *

      These are CNC explants. It is now clearly stated in the figure legend.

      *3. The rationale for isolating KMT2A from the other chromatin modifiers in the dataset is not clear.*

      The new manuscript clarifies that point. Because we are using HEK293T cells in this assay, which are derived from human embryonic kidney rather than from Xenopus cranial neural crest, we are not interested in a specific protein but rather in a family of proteins that can modify histones (KMT and KDM). Our rationale is that if Adam13 can bind to KMT2 via the SET domain, it is likely to interact with the KMT2 proteins that are expressed in the CNC. KMT2A and D are expressed in the CNC. This is why we selected KMT2A here (HEK293T). We now include 1 co-IP with the SET domain of Xenopus KMT2D (new figure 5D).

      From the RNA-seq in Supp Figure 2 it is not changed as much as likely some of the others.

      The new manuscript addresses this point. We did not show or expect that the loss of Adam13 would affect mRNA expression of Kmt2.

      *Also, the arrow seems to indicate that it is right above the cutoff. What about other proteins with ATPase activity? That is the top hit in the Dot plot nuclear function. Would be helpful to write out Adam13 cytoplasm/nucleus here. *

      We have used another set of proteomics data that does not include the cytoplasmic/nuclear extract to simplify the results. We hope that the changes make it more obvious.

      Given that we are looking at chromatin remodeling enzymes here, we chose not to investigate the ATPases further in this report. This is such a wide category that it could lead us away from the main story.

      *4. The splicing information, while interesting would be better as a different manuscript. The sashimi plot requires more explanation as written.*

      We agree and think that a simple representation of the fold change of the different isoforms is more obvious. It is now a minor part of figure 1, and the legend has been improved to describe the method.

      How do you tell if the interactions are changed from this?

      I do not understand this question. The sashimi plot indicates the mRNA reads that go from one exon to the next, quantifying specific exon usage. It can therefore be quantified and compared between different conditions.

      • The authors argue there is a reduction of Tfap2a in Figure 3H but half the explant is not expressing sox9 in the Adam13 knockdown. How is this kind of experiment controlled when measure areas that don't have any fluorescence because of the nature of the explants? *

      We have removed this figure, as we had already shown previously by western blot that Tfap2a protein decreased in MO13 embryos. As noted on the histogram, the fluorescence is only measured in Sox9-positive cells in each explant (three independent experiments with 3 explants each). We have also seen a decrease by western blot and in mRNA expression (both RNAseq and real-time PCR). In most of our explants, the vast majority of the cells are positive for Snai2 and Sox9, while those that are negative are positive for Sox3 (data not shown here). There is always less signal in the center of the explant, possibly due to limited antibody penetration or to interference with the signal by cell pigment or yolk autofluorescence. Our control explants show the same effect, so our quantification is valid.

      *5. The use of a germ line Xenopus mutant for Adam13 is great but how were these knockouts validated?*

      All of the KOs were validated by sequencing, RNAseq and protein expression. These are now included in supplemental figure 1.

      *More information is required here. The Chip-qPCR has a lot of variability between the samples, especially in the H3K9me2/3. *

      All ChIP-PCR experiments were performed on Xenopus embryos. The variability is tested by statistical analysis and is either significant or not.

      Because these are in cell lines, this should be more consistent.

      They are not in cell lines but in Xenopus embryos.

      • In addition, it is difficult to understand what this means for cranial neural crest cells when assaying in HEK293T cells with the luciferase assay. *

      We use the luciferase assay in HEK293T cells to test whether Xenopus proteins can induce a specific reporter (gain of function). We also use luciferase reporters in Xenopus to test whether they respond to the loss of a specific protein (for example, Adam13).

      Our results show that Adam13 or Arid3a expression in HEK293T cells can induce the TFAP2S1 reporter.

      *6. The migration assay shows only an example of what it looks like to have defective migration. But it would be better to show control embryos, embryos with Adam13 knockdown and what the rescues look like so the reader can make their own conclusion.*

      We can certainly include this, but we have published this assay in multiple publications before. The picture is a single example; the histogram shows the statistical validation.

      • The argument from the section above suggests the S1 isoform is the primary one but S3 in this assay also rescues, please explain what this result means since it seems to suggest that even though these isoforms have different activity the function is similar in terms of the ability to rescue defective migration. *

      The result in HEK293T cells shows that only TFAP2aS1 can induce Calpain8, while both S1 and S3 can partially rescue CNC migration in embryos lacking Adam13. The issue here is that the dose of mRNA injected for each variant might be too high. Adam13 proteolytic activity is also critical, so we do not expect a complete rescue. The fact that S1 is significantly better at rescuing than S3 is relevant here. It is possible that if we were to decrease the dose of each mRNA, we would find one at which S3 no longer rescues but S1 does.

      *7. The next section again talks about yet another protein Calpain-8. Here the authors use MO13 for luciferase assays instead of HEK293 cells. The authors do not explain why they decided to switch from cells to MO.*

      Calpain8 is one of the validated targets of Adam13 that can rescue CNC migration (Cousin et al Dev Cell). We use the Xenopus Capn8 luciferase reporter to show, first, in vivo that loss of Adam13 reduces its expression (similar to the Capn8 gene). We then went in vitro, using HEK293T cells for gain-of-function experiments, which show that only the TFAP2aS1 variant can induce it while S3 does not.

      We hope that the graphical summary and the new manuscript make this clear.

      *8. The experiment to IP RNA supports only the correlation that Adam9 and Adam13 bind RNA and RNA binding proteins to regulate splicing. This conclusion presented is not supported by the data presented here. While there is a sentence about why Adam9 was chosen here, it would be preferred to focus on Adam13 as the rest of the manuscript is focused on Adam13. The conclusions are generalized to all ADAMs, but ADAM13 and ADAM9 are the only ADAMs investigated in the manuscript*

      This figure is no longer included. For each of the protein classes that we identify by mass spec, we try to find a validation. RNA-IP is simply a validation that Adam13 and Adam9 can bind to complexes that include RNA in a cytoplasmic domain-dependent fashion. The conclusion that Adam13 and possibly ADAM9 might be involved in regulating splicing rests on three observations: 1) the proteins associated with Adam13 include multiple splicing factors, 2) the RNAseq analysis shows abnormal splicing in CNC missing Adam13, and 3) the form of TFAP2a induced by Adam13 (S1) associates significantly more with splicing factors than the S3 isoform.

      We agree that the generalization to other ADAMs is not demonstrated here but only suggested. We selected ADAM9 and ADAM19 because we have shown that they can each rescue Adam13 function in the CNC. Unfortunately, there is no ADAM19 antibody on the market that works for IP. We have tested antibodies from multiple companies in multiple cell lines.

      We believe that the ADAM9 experiment is critical to show that the proteins associated with Adam13 are not simply the result of overexpressing a protein from a different species, since ADAM9 is the endogenous protein.

      Minor comments 1. The manuscript using a lot of abbreviations (PCNS, NI, MO, SH3) and lingo that are unclear to a general reader. Please define acronyms when first used, as well as be clear on which model is being used in each experiment. *

      We have corrected this.

      *2. Similarly, the figures are not labeled such that a reader would be able to understand ie MO13 should be Adam13 knockdown etc.*

      We have corrected this in the legend

      • Please identify the genes on the heatmap and some highlighted genes from volcano plot from the RNA-seq.*

      The volcano plot is from MS/MS, not RNAseq. We have lists of all of the genes and/or proteins corresponding to each figure in tables.

      We now have a figure from the RNAseq, and a subset of genes of interest is shown.

      *4. Why use the flag tag in Figure 5?*

      We used Flag-tagged constructs to immunoprecipitate only the variants and not the endogenous TFAP2a in these experiments. We also used RFP-Flag to eliminate any protein that bound to the tag or the antibody.

      This figure is no longer in the manuscript.

      *5. Is the data in figure 4A-D the same as Supp. Figure 4A-D?*

      These are independent biological replicates of the same experiment.

      *6. Please italicize gene symbols - e.g. "key transcription factors that exemplify CNC, such as the SOX9, FOXD3, SNAI1, SNAI2, and TFAP2 family".*

      We have clearly missed some. We use italics for genes and regular type for proteins. It might not be clear in the text when we are referring to genes and when to proteins. We will correct this in the rewrite.

      *7. Please review the manuscript for grammatical and typographical errors.*

      We have used all available software, including Word and Grammarly. We will try to improve on the next version.

      **Cross-commenting**

      I think the two reviewers are on the same page on this manuscript.

      Reviewer #2 (Significance (Required)):

      If more solid, would be a conceptual advance in the role of Adam13 in mediating chromatin modification and transcription factors; adds to existing work from this lab; good for a specialized audience. My expertise is in neural crest development, non-mammalian models, and epigenetic regulators.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Panday et al seeks to determine the function of ADAM13 in regulating histone modifications, gene expression and splicing during cranial neural crest development. Specifically, the authors tested how Adam13, a metalloprotease, could modify chromatin by interaction with Arid3a and Tfap2a and RNA splicing and gene expression. They then utilize knockouts in Xenopus and HEK293T cells followed by immunofluorescence, IPs, BioID, luciferase assays, Mass spec and RNA assays. Although there is some strong data in the BioID and luciferase experiments, the manuscript tells multiple stories, linking together too many things to make a compelling story. The result is a paper that is very difficult to read and understand the take home message. In addition, some of the conclusions are not supported by the data. This unfortunately means it is not ready for publication. However, I have added below some suggestions that would strengthen the manuscript. My comments are below:

      Major comments:

      1. I think it would be better to split out the chromatin modification function from the splicing in two separate papers. While there is a connection, having it all together makes the story difficult to follow.
      2. The immunofluorescence of H3K9me2/3, in Figure 1, 2, 3 following Adam13 knockdown is not convincing. There seems to be a strong edge effect especially in Figure 2 and 3. Similarly the Arid3a expression in Supp Figure 1 if anything seems increased. It would be better to quantify by western blot and not by fluorescent intensity since it is difficult to determine what a small change in fluorescent intensity means in vivo. Also, it does not say in the text or the figure legend what these are, Xenopus explants of CNC?
      3. The rationale for isolating KMT2A from the other chromatin modifiers in the dataset is not clear. From the RNA-seq in Supp Figure 2 it is not changed as much as likely some of the others. Also, the arrow seems to indicate that it is right above the cutoff. What about other proteins with ATPase activity? That is the top hit in the Dot plot nuclear function. Would be helpful to write out Adam13 cytoplasm/nucleus here.
      4. The splicing information, while interesting would be better as a different manuscript. The sashimi plot requires more explanation as written. How do you tell if the interactions are changed from this? The authors argue there is a reduction of Tfap2a in Figure 3H but half the explant is not expressing sox9 in the Adam13 knockdown. How is this kind of experiment controlled when measure areas that don't have any fluorescence because of the nature of the explants?
      5. The use of a germ line Xenopus mutant for Adam13 is great but how were these knockouts validated? More information is required here. The Chip-qPCR has a lot of variability between the samples, especially in the H3K9me2/3. Because these are in cell lines, this should be more consistent. In addition, it is difficult to understand what this means for cranial neural crest cells when assaying in HEK293T cells with the luciferase assay.
      6. The migration assay shows only an example of what it looks like to have defective migration. But it would be better to show control embryos, embryos with Adam13 knockdown and what the rescues look like so the reader can make their own conclusion. The argument from the section above suggests the S1 isoform is the primary one but S3 in this assay also rescues, please explain what this result means since it seems to suggest that even though these isoforms have different activity the function is similar in terms of the ability to rescue defective migration.
      7. The next section again talks about yet another protein Calpain-8. Here the authors use MO13 for luciferase assays instead of HEK293 cells. The authors do not explain why they decided to switch from cells to MO.
      8. The experiment to IP RNA supports only the correlation that Adam9 and Adam13 bind RNA and RNA binding proteins to regulate splicing. This conclusion presented is not supported by the data presented here. While there is a sentence about why Adam9 was chosen here, it would be preferred to focus on Adam13 as the rest of the manuscript is focused on Adam13. The conclusions are generalized to all ADAMs, but ADAM13 and ADAM9 are the only ADAMs investigated in the manuscript

      Minor comments

      1. The manuscript using a lot of abbreviations (PCNS, NI, MO, SH3) and lingo that are unclear to a general reader. Please define acronyms when first used, as well as be clear on which model is being used in each experiment.
      2. Similarly, the figures are not labeled such that a reader would be able to understand ie MO13 should be Adam13 knockdown etc.
      3. Please identify the genes on the heatmap and some highlighted genes from volcano plot from the RNA-seq.
      4. Why use the flag tag in Figure 5?
      5. Is the data in figure 4A-D the same as Supp. Figure 4A-D?
      6. Please italicize gene symbols - e.g. "key transcription factors that exemplify CNC, such as the SOX9, FOXD3, SNAI1, SNAI2, and TFAP2 family".
      7. Please review the manuscript for grammatical and typographical errors.

      Cross-commenting

      I think the two reviewers are on the same page on this manuscript.

      Significance

      If more solid, would be a conceptual advance in the role of Adam13 in mediating chromatin modification and transcription factors; adds to existing work from this lab; good for a specialized audience. My expertise is in neural crest development, non-mammalian models, and epigenetic regulators.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      In this manuscript, Pandey et al. show that the ADAM13 protein modulates histone modifications in cranial neural crest and that the Arid3a protein binds the Tfap2a promoter in an Adam13-dependent manner and has promoter-specific effects on transcription. Furthermore, they show that the Adam13 and human ADAM9 proteins associate with histone modifiers as well as proteins involved in RNA splicing.

      Although the manuscript is mostly clearly written and the figures well assembled, it reads like a couple of separate and unfinished stories. They show using immunocytochemistry and qPCR that ADAM13 knockouts in CNCs affect histone modifications. Here ChIP-seq or Cut-n-Run experiments would be more appropriate and would result in a more comprehensive understanding of the changes mediated. The immunohistochemistry assays should at least be verified further using western blotting or other more quantitative methods. The authors then show that ADAM13 interacts with a number of histone modifiers such as KDM3B, KDM4B and KMT2A but strangely they do not follow up this interesting observation to map the interactions further (apart from a co-ip with KMT2A), the domains involved, the functional role of the interactions or how they mediate the changes in chromatin modifications.

      The authors then show that ADAM13 affects expression of the TFAP2a gene in a promoter specific manner - affecting expression from S1 but not S2. They further show that ADAM13 affects the binding of the Arid3 transcription factor to the S1-promoter but not to the S3 promoter. However, ADAM13 was present at both promoters. Absence of ADAM13 resulted in increased H3K9me2/3 and decreased H3K4me3 at the S1 promoter whereas only H3K4me3 was changed at the S2 promoter. Unfortunately, they do not show how this is mediated or through which binding elements this takes place. Why is ADAM13 present at both promoters but only affects Arid3 binding at S1? The authors claim that transfecting Arid3a and Adam13 together further increases expression from a reporter (Fig 4E) but this is not true as no statistical comparison is done between the singly transfected and double transfected cells.

      Then the authors surprisingly start investigating association of proteins with the two isoforms of TFAP2a which in the mind of this reviewer is a different question entirely. They find a number of proteins involved in splicing. And the observation that ADAM13 also interacts with splicing factors is really irrelevant in terms of the story that they are trying to tell. Transcription regulation and splicing are different processes and although both affect the final outcome, mRNA, they need to be investigated separately. The link is at least not very clear from the manuscript. Again, the effects on splicing are not further investigated through functional analysis and as presented the data presented is too open-ended and lacking in clarity.

      Additional points:

      1. In the abstract they propose that the ADAMs may act as extracellular sensors. This is not substantiated by the results.
      2. Page 5, line 16: what is referred to by 6 samples 897 proteins? Were 6 samples analyzed for each condition? The number of repeats for the mass spec analysis is not clear from the text nor are the statistical parameters used to analyse the data. This is also true for the mass spec presented in the part on TFAP2aL-S1 and Adam13 regulate splicing. Statistics and repeats are not presented.
      3. Page 6, line 19: set domain should be SET domain.
      4. The number of repeats in the RNA sequencing of the CNCs is not clear from the text.
      5. The explanation of Figure C is a bit lacking. There are two forms of TFAP2a, L and S, but only one is presented in the figure. Do both forms have the extra S1-3 exons? Also, at the top of the figure it is not clear that the boxes are part of a continuous DNA sequence. Also, it is not clear which codon is not coding.
      6. In the sashimi plot there are green and pink shaded areas. What do they denote? What exactly is lacking in the MO13 mutant - seems that a particular exon is missing suggesting skipping?
      7. Page 11, line 9: „with either MbC or MbC and MO13" needs to be rephrased.
      8. Page 11, line 19: „the c-terminus of....and S3) and" should be „the C-terminus of...and S3 and".
      9. Page 15, line 10: substrateS
      10. Page 16, line 23: the sentence „increases H3K9 to the promoter of the most upstream" needs revision.
      11. Page 26, line 12: Here the authors say: „for two samples two-tail unpaired". What does this mean? Statistics should not be performed on fewer than three samples. The legend to Figure 6 indicates that a t-test was performed on two samples.
      12. The discussion should be shortened and simplified.
      13. Figure 1 legend. How many images were quantitated for each condition?
      14. Figure 2 has a strange order of panels where G is below B.
      15. Figure 6 legend, line 12. „proteins that were significantly enriched in either of the 2 samples" is not very clear. What exactly does this mean?

      Significance

      If the authors follow up on either the transcription-part of the story, or the splicing part of the story, they are likely to have important results to present. However, in the present format the paper is lacking in focus as both issues are mixed together without a clear end-result.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors describe a new computational method (SegPore), which segments the raw signal from nanopore direct RNA-Seq data to improve the identification of RNA modifications. In addition to signal segmentation, SegPore includes a Gaussian Mixture Model approach to differentiate modified and unmodified bases. SegPore uses Nanopolish to define a first segmentation, which is then refined into base and transition blocks. SegPore also includes a modification prediction model that is included in the output. The authors evaluate the segmentation in comparison to Nanopolish and Tombo (RNA002) as well as f5c and Uncalled 4 (RNA004), and they evaluate the impact on m6A RNA modification detection using data with known m6A sites. In comparison to existing methods, SegPore appears to improve the ability to detect m6A, suggesting that this approach could be used to improve the analysis of direct RNA-Seq data.

      Strengths:

      SegPore addresses an important problem (signal data segmentation). By refining the signal into transition and base blocks, noise appears to be reduced, leading to improved m6A identification at the site level as well as for single read predictions. The authors provide a fully documented implementation, including a GPU version that reduces run time. The authors provide a detailed methods description, and the approach to refine segments appears to be new.

      Weaknesses:

      The authors show that SegPore reduces noise compared to other methods; however, the improvement in accuracy appears to be relatively small for the task of identifying m6A. To run SegPore, the GPU version is essential, which could limit the application of this method in practice.

      As discussed in Paragraph 4 of the Discussion, we acknowledge that the improvement of SegPore combined with m6Anet over Nanopolish+m6Anet in bulk in vivo analysis is modest. This outcome is likely influenced by several factors, including alignment inaccuracies caused by pseudogenes or transcript isoforms, the presence of additional RNA modifications that can affect signal baselines, and the fact that m6Anet is specifically trained on Nanopolish-derived events. Additionally, the absence of a modification-free (in vitro transcribed) control sample in the benchmark dataset makes it challenging to establish true k-mer baselines.

      Importantly, these challenges do not exist for in vitro data, where the signal is cleaner and better defined. As a result, SegPore achieves a clear and substantial improvement at the single-molecule level, demonstrating the strength of its segmentation approach and its potential to significantly enhance downstream analyses. These results indicate that SegPore is particularly well suited for benchmarking and mechanistic studies of RNA modifications under controlled experimental conditions, and they provide a strong foundation for future developments.

      We also recognize that the current requirement for GPU acceleration may limit accessibility in some computational environments. To address this, we plan to further optimize SegPore in future versions to support efficient CPU-only execution, thereby broadening its applicability and impact.

      Reviewer #2 (Public review):

      Summary:

      The work seeks to improve detection of RNA m6A modifications using Nanopore sequencing through improvements in raw data analysis. These improvements are said to be in the segmentation of the raw data, although the work appears to position the alignment of raw data to the reference sequence and some further processing as part of the segmentation, and result statistics are mostly shown on the 'data-assigned-to-kmer' level.

      As such, the title, abstract and introduction stating the improvement of just the 'segmentation' does not seem to match the work the manuscript actually presents, as the wording seems a bit too limited for the work involved.

      The work itself shows minor improvements in m6Anet when replacing Nanopolish' eventalign with this new approach, but clear improvements in the distributions of data assigned per kmer. However, these assignments were improved well enough to enable m6A calling from them directly, both at site-level and at read-level.

      A large part of the improvements shown appear to stem from the addition of extra, non-base/kmer specific, states in the segmentation/assignment of the raw data, removing a significant portion of what can be considered technical noise for further analysis. Previous methods enforced assignment of (almost) all raw data, forcing a technically optimal alignment that may lead to suboptimal results in downstream processing as datapoints could be assigned to neighbouring kmers instead, while random noise that is assigned to the correct kmer may also lead to errors in modification detection.

      For an optimal alignment between the raw signal and the reference sequence, this approach may yield improvements for downstream processing using other tools.

      Additionally, the GMM used for calling the m6A modifications provides a useful, simple and understandable logic to explain the reason a modification was called, as opposed to the black-box models that are nowadays often employed for these types of tasks.
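
      To illustrate why a mixture model gives an explainable call, the following is a minimal sketch of a two-component Gaussian posterior for one k-mer. It is not SegPore's code: the function names and all component parameters are hypothetical, chosen only to show that the call reduces to comparing two weighted densities.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def modification_posterior(x, unmod, mod, weight_mod):
    """Posterior probability that signal mean x comes from the modified component.

    unmod and mod are (mean, std) tuples; weight_mod is the mixing weight of
    the modified component. All values used below are illustrative assumptions.
    """
    p_mod = weight_mod * gaussian_pdf(x, *mod)
    p_unmod = (1.0 - weight_mod) * gaussian_pdf(x, *unmod)
    return p_mod / (p_mod + p_unmod)


# Hypothetical component parameters for a single k-mer (not from the manuscript).
UNMOD = (100.0, 2.0)
MOD = (108.0, 2.0)

post_near_mod = modification_posterior(107.0, UNMOD, MOD, weight_mod=0.3)
post_near_unmod = modification_posterior(101.0, UNMOD, MOD, weight_mod=0.3)
```

      A read whose signal mean sits close to the modified component receives a posterior near 1, and the two densities that produced the call can be inspected directly.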

      Weaknesses:

      The manuscript suggests the eventalign results are improved compared to Nanopolish. While this is believably shown to be true (Table 1), the effect on the use case presented, downstream differentiation between modified and unmodified status on a base/kmer, is likely limited, because during downstream modification calling the noisy distributions are often 'good enough'. E.g. Nanopolish uses the main segmentation+alignment for a first alignment and follows up with a form of targeted local realignment/HMM test for modification calling (and for training too), decreasing the need for the near-perfect segmentation+alignment this work attempts to provide. Any tool applying a similar strategy probably largely negates the problems this manuscript aims to improve upon. Should a use-case come up where this downstream optimisation is not an option, SegPore might provide the necessary improvements in raw data alignment.

      Thank you for this thoughtful comment. We agree that many current state-of-the-art (SOTA) methods perform well on benchmark datasets, but we believe there is still substantial room for improvement. Most existing benchmarks are based on limited datasets, primarily focusing on DRACH motifs in human and mouse transcriptomes. However, m6A modifications can also occur in non-DRACH motifs, where current models tend to underperform. Furthermore, other RNA modifications, such as pseudouridine, inosine, and m5C, remain less studied, and their detection is likely to benefit from more accurate and informative signal modeling.

      It is also important to emphasize that raw signal segmentation and RNA modification detection are fundamentally distinct tasks. SegPore focuses on improving the segmentation step by producing a cleaner and more interpretable signal, which provides a stronger foundation for downstream analyses. Even if RNA modification detection algorithms such as m6Anet can partially compensate for noisy segmentation in specific cases, starting from a more accurate signal alignment can still lead to improved accuracy, robustness, and interpretability—particularly in challenging scenarios such as non-canonical motifs or less characterized modifications.

      Scientific progress in this field is often incremental, and foundational improvements can have a significant long-term impact. By enhancing raw signal segmentation, SegPore contributes an essential building block that we expect will enable the development of more accurate and generalizable RNA modification detection algorithms as the community integrates it into more advanced workflows.

      Appraisal:

      The authors have shown their method's ability to identify noise in the raw signal and remove those values from the segmentation and alignment, reducing its influence on further analyses. Figures directly comparing the values per kmer do show a visibly improved assignment of raw data per kmer. As a replacement for Nanopolish' eventalign it seems to have a rather limited, but improved, effect on m6Anet results. For modification calling at the single-read level, this work does appear to improve upon CHEUI.

      Impact:

      With the current developments for Nanopore based modification calling largely focusing on Artificial Intelligence, Neural Networks and the like, improvements made in interpretable approaches provide an important alternative that enables deeper understanding of the data rather than providing a tool that plainly answers the question of whether a base is modified or not, without further explanation. The work presented is best viewed in context of a workflow where one aims to get an optimal alignment between raw signal data and the reference base sequence for further processing. For example, as presented, as a possible replacement for Nanopolish' eventalign. Here it might enable data exploration and downstream modification calling without the need for local realignments or other approaches that re-consider the distribution of raw data around the target motif, such as a 'local' Hidden Markov Model or Neural Networks. These possibilities are useful for a deeper understanding of the data and for further tool development for modification detection beyond m6A calling.

      Reviewer #3 (Public review):

      Summary:

      Nucleotide modifications are important regulators of biological function; however, until recently, their study has been limited by the availability of appropriate analytical methods. Oxford Nanopore direct RNA sequencing preserves nucleotide modifications, permitting their study; however, many different nucleotide modifications lack an available base-caller to accurately identify them. Furthermore, existing tools are computationally intensive, and their results can be difficult to interpret.

      Cheng et al. present SegPore, a method designed to improve the segmentation of direct RNA sequencing data and boost the accuracy of modified base detection.

      Strengths:

      This method is well described and has been benchmarked against a range of publicly available base callers that have been designed to detect modified nucleotides.

      Weaknesses:

      However, the manuscript has a significant drawback in its current version. The most recent nanopore RNA base callers can distinguish between different ribonucleotide modifications; however, SegPore has not been benchmarked against these models.

      The manuscript would be strengthened by benchmarking against the rna004_130bps_hac@v5.1.0 and rna004_130bps_sup@v5.1.0 dorado models, which are reported to detect m5C, m6A_DRACH, inosine_m6A and PseU.

      A clear demonstration that SegPore also outperforms the newer RNA base caller models will confirm the utility of this method.

      Thank you for highlighting this important limitation. While Dorado, the new ONT basecaller, is publicly available and supports modification-aware basecalling, suitable public datasets for benchmarking m5C, inosine, m6A, and PseU detection on RNA004 are currently lacking. Dorado’s modification-aware models are trained on ONT’s internal data, which is not publicly released. Therefore, it is currently not feasible to directly evaluate or compare SegPore’s performance against Dorado for these RNA modifications.

      We would also like to emphasize that SegPore’s primary contribution lies in raw signal segmentation, which is an upstream and foundational step in the RNA modification detection pipeline. As more publicly available datasets for RNA004 modification detection become accessible, we plan to extend our work to benchmark and integrate SegPore with modification detection tasks on RNA004 data in future studies.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Comments based on Author Response

      “However, it is valid to compare them on the segmentation task, where SegPore exhibits better performance (Table 1).”

      This dodges the point of the actual use case of this approach, as Nanopolish indeed does not support calling modifications for this kind of data, but the general approach it uses might, if adapted for this data, nullify the gains made in the examples presented.

      We respectfully disagree with the comment that the advantages demonstrated by SegPore could be “nullified”. Although SegPore’s performance is indeed more modest in in vivo datasets, it shows substantially better performance than CHEUI in in vitro data, clearly demonstrating that improved segmentation directly contributes to more accurate RNA modification estimation.

      It is worth noting that CHEUI relies on Nanopolish’s segmentation results for m6A detection. Despite this, SegPore outperforms CHEUI, further supporting the conclusion that segmentation quality has a meaningful impact on downstream modification calling.

      In conclusion, based on our current experimental results, SegPore is particularly well suited for RNA modification analysis from in vitro transcribed data, where its improved segmentation provides a clear advantage over existing methods.

      Further comments

      (2) “(2) Page 3  employ models like Hidden Markov Models (HMM) to segment the signal, but they are prone to noise and inaccuracies”

      “That's the alignment/calling part, not the segmentation?”

      “Current methods, such as Nanopolish, employ models like Hidden Markov Models (HMM) to segment the signal”

      I get the impression the word 'segment' has a different meaning in this work than what I'm used to based on my knowledge around Nanopolish and Tombo, see the deeper code examples further down below.

      Additionally, in Nanopolish there is a clear segmentation step (or event detection) without any HMM, then a sort of dynamic timewarping step that aligns the segments and re-combines some segments into a single segment where necessary afterwards. I believe the HMM in Nanopolish is not used at all unless modification calling, but if you can point out otherwise I'm open for proof.

      Now I believe it is the meaning of 'segmenting the signal' that confuses me, and now the clarification makes it a bit odd as well:

      “Nanopolish and Tombo align the raw signal to the reference sequence to determine which portion of the signal corresponds to each k-mer. We define this process as the segmentation task, referred to as "eventalign" in Nanopolish.”

      So now it's clearly stated the raw signal is being 'aligned' and then the process is suddenly defined as the 'segmentation task', and again referred to as "eventalign". Why is it not referred to as the 'alignment task' instead?

      I understand the segmentation and alignment parts are closely connected but to me, it seems this work picks the wrong word for the problem being solved.

      “Unlike Nanopolish and Tombo, which directly align the raw signal to the reference sequence,…”

      Looking at their code, I believe both Nanopolish and Tombo actually do segment the data first (or "event detection"), then they align the segments/events they found, and finally multiple events aligned to the same section are merged. See for yourself:

      Nanopolish:

      https://github.com/jts/nanopolish/blob/master/src/nanopolish_squiggle_read.cpp
      Line 233:

      ```cpp
      trim_and_segment_raw(fast5_data.rt, trim_start, trim_end, varseg_chunk, varseg_thresh);
      event_table et = detect_events(fast5_data.rt, *ed_params);
      ```

      Line 270:

      ```cpp
      // align events to the basecalled read
      std::vector<AlignedPair> event_alignment = adaptive_banded_simple_event_align(*this, *this->base_model[strand_idx], read_sequence);
      ```

      Where event detection is further defined at line 268 here:

      https://github.com/jts/nanopolish/blob/master/src/thirdparty/scrappie/event_detection.c

      Tombo:

      https://github.com/nanoporetech/tombo/blob/master/tombo/resquiggle.py

      line 1162 and onwards shows a ‘segment_signal’ call and the results are used in a ‘find_adaptive_base_assignment’ call, where ‘segment_signal’ starting at line 1057 tries to find where the signal jumps from a series of similar values to another (start of a base change in the pore), stored in ‘valid_cpts’, and the ‘find_adaptive_base_assignment’ tries to align the resulting segment values to the expected series of values:

      ```python
      valid_cpts, norm_signal, new_scale_values = segment_signal(
          map_res, num_events, rsqgl_params, outlier_thresh, const_scale)
      event_means = ts.compute_base_means(norm_signal, valid_cpts)
      dp_res = find_adaptive_base_assignment(
          valid_cpts, event_means, rsqgl_params, std_ref, map_res.genome_seq,
          start_clip_bases=map_res.start_clip_bases,
          seq_samp_type=seq_samp_type, reg_id=map_res.align_info.ID)
      ```

      These implementations are also why I find the choice of words for what is segmentation and what is alignment a bit confusing in this work, as both Tombo and Nanopolish do a similar, clear segmentation step (or an "event detection" step), followed by the alignment of the segments they determined. The terminology in this work appears to deviate from these.
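
      The three stages described above (event detection, alignment of events, merging of events that land on the same k-mer) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the Nanopolish or Tombo implementation: the threshold-based detector, the absolute-difference cost and all names and values are hypothetical stand-ins for the change-point and dynamic programming methods those tools use.

```python
def detect_events(signal, jump_threshold):
    """Segmentation: start a new event when a sample departs from the running event mean."""
    events, current = [], [signal[0]]
    for sample in signal[1:]:
        running_mean = sum(current) / len(current)
        if abs(sample - running_mean) > jump_threshold:
            events.append(current)
            current = [sample]
        else:
            current.append(sample)
    events.append(current)
    return events


def align_events(event_means, expected_levels):
    """Alignment: monotonic DP where each event stays on a k-mer or advances by one.

    Assumes there are at least as many events as k-mers.
    """
    n, m = len(event_means), len(expected_levels)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    cost[0][0] = abs(event_means[0] - expected_levels[0])
    for i in range(1, n):
        for j in range(m):
            stay = cost[i - 1][j]
            advance = cost[i - 1][j - 1] if j > 0 else inf
            best, step = (stay, 0) if stay <= advance else (advance, 1)
            if best < inf:
                cost[i][j] = best + abs(event_means[i] - expected_levels[j])
                back[i][j] = step
    # Trace back from the last event, which must sit on the last k-mer.
    path, j = [], m - 1
    for i in range(n - 1, -1, -1):
        path.append(j)
        j -= back[i][j]
    return path[::-1]


def merge_by_kmer(events, path):
    """Merging: pool the samples of all events assigned to the same k-mer index."""
    merged = {}
    for event, kmer_idx in zip(events, path):
        merged.setdefault(kmer_idx, []).extend(event)
    return merged


# Hypothetical toy read: three k-mer levels plus one noisy sample (104.0).
signal = [100.2, 99.8, 100.1, 110.3, 109.9, 104.0, 110.2, 90.1, 89.8, 90.3]
expected_levels = [100.0, 110.0, 90.0]

events = detect_events(signal, jump_threshold=3.0)
event_means = [sum(e) / len(e) for e in events]
path = align_events(event_means, expected_levels)
merged = merge_by_kmer(events, path)
```

      In this toy read the noisy sample becomes its own event but is still forced onto the second k-mer during alignment and merging, which is the kind of enforced assignment discussed in the public review.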

      We thank the reviewer for the detailed comments!

      First of all, we sincerely apologize for our earlier misunderstanding regarding how Nanopolish and Tombo operate. Based on a closer examination of their source code, we now recognize that both tools indeed include a segmentation step based on change-point detection methods, after which the resulting segments are aligned to the reference sequence. We have revised the relevant text in the manuscript accordingly:

      - “Current methods, such as Nanopolish, employ change-point detection methods to segment the signal and use dynamic programming methods and HMM to align the derived segments to the reference sequence,”

      - “We define this process as the segmentation and alignment task (abbreviated as the segmentation task), which is referred to as “eventalign” in Nanopolish.”

      - “In SegPore, we segment the raw signal into small fragments using a Hierarchical Hidden Markov Model (HHMM) and align the mean values of these fragments to the reference, where each fragment corresponds to a sub-state of a k-mer. By contrast, Nanopolish and Tombo use change-point–based methods to segment the signal and employ dynamic programming approaches together with profile HMMs to align the resulting segments to the reference sequence.”

      Regarding terminology, we originally borrowed the term “segmentation” from speech processing, where it refers to dividing continuous audio signals into meaningful units. In the context of nanopore signal analysis, segmentation and alignment are often tightly coupled steps. Because of this and because our initial focus was on methodological development rather than terminology, we used the term “segmentation task” to describe the combined process of signal segmentation and alignment.

      However, we now recognize that this terminology may cause confusion. Changing every instance of “segmentation” to “segmentation and alignment” or “alignment” would require substantial rewriting of the manuscript. Therefore, in this revision, we have clearly defined “segmentation task” as referring to the combined process of segmentation and alignment. We apologize for any earlier confusion and will adopt the term “alignment” in future work for greater clarity.

      (3) I think I do understand the meaning, but I do not understand the relevance of the Aj bit in the last sentence. What is it used for?

      Based on the response and another close look at Fig1, it turns out the j refers to the extremely small numbers 1 and 2 in step 3. You may want to improve readability for these.

      Thank you for the suggestion. We have added subscripts to all nucleotides in the reference sequence in Figure 1A and revised the legend to clarify the notation and improve readability. Specifically, we now include the following explanation:

      “For example, A<sub>j</sub> denotes the base ‘A’ at the j-th position on the reference sequence. In this example, A<sub>1</sub> and A<sub>2</sub> refer to the first and second occurrences of ‘A’ in the reference sequence, respectively. Accordingly, μ<sub>1</sub> and μ<sub>2</sub> are aligned to A<sub>1</sub>, while μ<sub>3</sub> is aligned to A<sub>2</sub>”.

      (6) “We chose to use the poly(A) tail for normalization because it is sequence-invariant- i.e., all poly(A) tails consist of identical k-mers, unlike transcript sequences which vary in composition. In contrast, using the transcript region for normalization can introduce biases: for instance, reads with more diverse k-mers (having inherently broader signal distributions) would be forced to match the variance of reads with more uniform k-mers, potentially distorting the baseline across k-mers.”

      While the next part states there was a benchmark showing SegPore still works without this normalization, I think this answer does not touch upon the underlying issue I'm trying to point out here.

      - The biases mentioned here, due to more diverse (or different) subsets of k-mers in a read, indeed affect the variance of the signal overall.

      - As I pointed out in my earlier remark here, this can be resolved using an approach of 'general normalization', 'mapping to expected signal', 'theil-sen fitting of scale and offset', 're-mapping to expected signal', as Tombo and Nanopolish have implemented.

      - Alternatively, one could use the reference sequence (using the read mapping information) and base the expected signal mean and standard deviation on that instead.

      - The polyA tail stability as an indicator for the variation in the rest of the signal seems a questionable assumption to me. A 'noisy' pore could introduce a large standard deviation using the polyA tail without increasing the deviations on the signal induced by the variety of k-mers, rather it would be representative for the deviations measured within a single k-mer segment. I thought this possible discrepancy is to be expected from a worn out pore, hence I'd imagine reads sequenced later in a run to provide worse results using this method.

      In the current version it is not the statement that is unclear, it is the underlying assumption of how this works that I question.
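
      For readers unfamiliar with the 'theil-sen fitting of scale and offset' step mentioned in the list above, here is a minimal sketch of the estimator itself. It is not the Tombo or Nanopolish code: the function names are hypothetical, and the input is assumed to be a list of observed event means already paired with the expected level of the k-mer each one was mapped to.

```python
from itertools import combinations
from statistics import median


def theil_sen_scale_offset(expected, observed):
    """Robust fit of observed ~ scale * expected + offset.

    The scale is the median of all pairwise slopes, so a minority of
    outlying events (noise or modified bases) has little influence on it.
    """
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(expected, observed), 2)
        if x2 != x1
    ]
    scale = median(slopes)
    offset = median(y - scale * x for x, y in zip(expected, observed))
    return scale, offset


def rescale(observed, scale, offset):
    """Map observed values back onto the scale of the expected levels."""
    return [(y - offset) / scale for y in observed]


# Hypothetical values: observed = 1.5 * expected + 10, with one outlier (250).
expected = [80, 90, 100, 110, 120]
observed = [130, 145, 160, 175, 250]

scale, offset = theil_sen_scale_offset(expected, observed)
rescaled = rescale(observed, scale, offset)
```

      In a re-mapping loop, the rescaled signal would be aligned again and the fit repeated until the parameters stop changing.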

      We thank the reviewer for raising this important point and for the insightful discussion. Our choice of using the poly(A) tail for normalization is based on the working hypothesis that the poly(A) signal reflects overall pore-level variability and provides a stable reference for signal scaling. We find this to be a practical and effective approach in most experimental settings.

      We agree that more sophisticated strategies, such as “general normalization” or iterative fitting to the expected signal (as implemented in Tombo and Nanopolish), could in principle generate a "better" normalization. However, these approaches are significantly more challenging to implement in practice. This is because signal normalization and alignment are mutually dependent processes: baseline estimates for k-mers influence alignment accuracy, while alignment accuracy, in turn, affects baseline calculation. This interdependence becomes even more complex in the presence of RNA modifications, which alter signal distributions and further confound model fitting.

      It is worth noting that this limitation is already evident in our results. As shown in Figure 4B (first and second k-mers), Nanopolish produces more dispersed baselines than SegPore, even for these unmodified k-mers, suggesting inherent limitations in its normalization strategy. Ideally, baselines for the same k-mer should remain highly consistent across different reads.

      In contrast, poly(A)-based normalization offers a simpler and more robust solution that avoids this circular dependency. Because poly(A) sequences are compositionally homogeneous, they enable reliable estimation of scaling parameters without assumptions about k-mer composition or modification state. Regarding the reviewer’s concern about pore instability, we mitigate this issue by including only high-quality, confidently mapped reads in our analysis, which reduces the likelihood of incorporating signals from degraded or “noisy” pores.
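
      As a rough illustration of the idea, the sketch below standardizes a read against its own poly(A) segment. The exact procedure is the one in Supplementary Note 1, Section 3, which is not reproduced here, so this assumes a plain z-score against the poly(A) mean and standard deviation; the function name, the index arguments and the toy values are hypothetical.

```python
from statistics import mean, pstdev


def standardize_by_polya(read_signal, polya_start, polya_end):
    """Centre and scale a whole read using statistics of its poly(A) segment.

    Assumption: a simple z-score; the published procedure may differ.
    """
    polya = read_signal[polya_start:polya_end]
    center, spread = mean(polya), pstdev(polya)
    if spread == 0:
        raise ValueError("poly(A) segment has zero variance; cannot scale")
    return [(x - center) / spread for x in read_signal]


# Hypothetical read: the first four samples are the poly(A) tail.
read_signal = [108, 112, 108, 112, 100, 120]
standardized = standardize_by_polya(read_signal, polya_start=0, polya_end=4)
```

      Because the scaling parameters come only from the poly(A) samples, no k-mer baseline or alignment is needed to compute them, which is how this approach avoids the circular dependency described above.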

      We fully agree that exploring more advanced normalization strategies is an important direction for future work, and we plan to investigate such approaches as the field progresses.

      (8) “In the remainder of this paper, we refer to these resulting events as the output of eventalign analysis or the segmentation task.”

      Picking only one descriptor rather than two alternatives would be easier to follow (and I'd prefer the first).

      Thank you for the suggestion. We have revised the sentence to:

      “In the remainder of this paper, we refer to these resulting events as the output of eventalign analysis, which also represents the final output of the segmentation and alignment task.”

      (9) “Additionally, a complete explanation of how the weighted mean is computed is provided in Section 5.3 of Supplementary Note 1. It is derived from signal points that are assigned to a given 5mer.”

      I believe there's no more mention of a weighted mean, and I don't get any hits when searching for 'weight'. Is that intentional?

      We apologize for the misplacement of the formulas. We have updated Section 5.3 of Supplementary Note 1 to clarify the definition of the weighted mean. Because multiple current signal segments may be aligned to a single k-mer, we computed the weighted mean for each k-mer across these segments, where the weight corresponds to the number of data points assigned to “curr” state in each event.
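
      A minimal sketch of this weighted mean, assuming each aligned segment is summarized by its mean and by the number of data points assigned to the “curr” state (the function name and the example values are hypothetical):

```python
def weighted_kmer_mean(segments):
    """Weighted mean over segments aligned to one k-mer.

    segments is a list of (segment_mean, n_points) pairs, where n_points
    is the number of data points in that segment and acts as the weight.
    """
    total_points = sum(n for _, n in segments)
    if total_points == 0:
        raise ValueError("no data points assigned to this k-mer")
    return sum(m * n for m, n in segments) / total_points


# Hypothetical example: a long segment and a short one on the same k-mer.
kmer_mean = weighted_kmer_mean([(100.0, 30), (104.0, 10)])
```

      With these values the weighted mean is 101.0 rather than the unweighted 102.0, because the longer segment contributes three times as many points.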

      (17) Response: We revised the sentence to clarify the selection criteria: "For selected 5mers that exhibit both a clearly unmodified and a clearly modified signal component, SegPore reports the modification rate at each site, as well as the modification state of that site on individual reads."

      So is this the same set described on page 13 ln 343 or not?

      “Due to the differences between human (Supplementary Fig. S2A) and mouse (Supplementary Fig. S2B), only six 5mers were found to have m6A annotations in the test data's ground truth (Supplementary Fig. S2C). For a genomic location to be identified as a true m6A modification site, it had to correspond to one of these six common 5mers and have a read coverage of greater than 20.”

      I struggle to interpret the 'For selected 5mers' part, as I'm not sure if this is a selection I'm supposed to already know at this point in the text or if it's a set just introduced here. If the latter, removing the word 'selected' would clear it up for me.

      We apologize for the confusion. What we mean is that when pooling signals aligned to the same k-mer across different genomic locations and reads, only a subset of k-mers exhibit a bimodal distribution — one peak corresponding to the unmodified state and another to the modified state. Other k-mers show a unimodal distribution, making it impossible to reliably estimate modification levels. We refer to the subset of k-mers that display a bimodal distribution as the “selected” k-mers.
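
      One way such a selection could be made is sketched below: fit a two-component Gaussian mixture to the pooled signal of a k-mer with EM and accept the k-mer only if both components carry enough weight and are well separated. This is an assumption-laden illustration, not the criterion used in SegPore; the function names, the thresholds `min_weight` and `min_separation`, and the simulated data are all hypothetical.

```python
import math
import random


def fit_two_gaussians(values, n_iter=100):
    """EM for a 1-D two-component Gaussian mixture; returns (weights, means, stds)."""
    lo, hi = min(values), max(values)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    stds = [max((hi - lo) / 4.0, 1e-3)] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / s
                for w, m, s in zip(weights, means, stds)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weight, mean and std of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / len(values)
            means[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)
    return weights, means, stds


def looks_bimodal(values, min_weight=0.1, min_separation=2.0):
    """Hypothetical rule: both peaks populated and separated by several stds."""
    weights, means, stds = fit_two_gaussians(values)
    separation = abs(means[1] - means[0]) / max(stds)
    return min(weights) >= min_weight and separation >= min_separation


# Simulated pooled signal for one k-mer: unmodified and modified peaks.
rng = random.Random(0)
pooled = [rng.gauss(100.0, 2.0) for _ in range(200)]
pooled += [rng.gauss(110.0, 2.0) for _ in range(100)]

pooled_weights, pooled_means, pooled_stds = fit_two_gaussians(pooled)
pooled_is_bimodal = looks_bimodal(pooled)
```

      For a k-mer accepted in this way, the weight of the component attributed to the modified state could serve as an estimate of the overall modification level.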

      The “selected k-mers” described on page 13, line 343, must additionally have ground truth labels available in both the training and test datasets. There are 10 k-mers with ground truth annotations in the training data and 11 in the test data, and only 6 of these k-mers are shared between the two datasets; therefore, only those 6 overlapping k-mers are retained for evaluation. These 6 k-mers satisfy both criteria: (1) exhibiting a bimodal distribution and (2) having ground truth annotations in both training and test sets.

      To improve clarity, we have removed the term “selected” from the sentence.

      (21) “Tombo used the "resquiggle" method to segment the raw signals, and we standardized the segments using the poly(A) tail to ensure a fair comparison (See preprocessing section in Materials and Methods).”

      In the Materials and Methods:

      “The raw signal segment corresponding to the poly(A) tail is used to standardize the raw signal for each read.”

      I cannot find more detailed information here on what the standardization does, do you mean to refer to Supplementary Note 1, Section 3 perhaps?

      Thank you for pointing this out. Yes, the standardization procedure is described in detail in Supplementary Note 1, Section 3. Tombo itself does not segment and align the raw signal on the absolute pA scale, which can result in very large variance in the derived events if the raw signal is used directly. To ensure a fair comparison, we therefore applied the same preprocessing steps to Tombo’s raw signals as we did for SegPore, using only the event boundary information from Tombo while standardizing the signal in the same way.

      We have revised the sentence for clarity as follows:

      “Tombo used the "resquiggle" method to segment the raw signals, but the resulting signals are not reported on the absolute pA scale. To ensure a fair comparison with SegPore, we standardized the segments using the poly(A) tail in the same way as SegPore (See preprocessing section in Materials and Methods).”

      (22A) The table shown does help to show that the benchmark is unlikely to be 'cheated'. However, I am surprised to see the Avg std for Nanopolish and Tombo going up instead of down, as I'd expect the transition values to increase the std, and hence, removing them should decrease these values. So why does this table show the opposite?

      I believe this table is not in the main text or the supplement, would it not be a good idea to cover this point somewhere in the work?

      Thank you for this insightful comment. In response, we carefully re-examined our analysis and identified a bug in the code related to boundary removal for Nanopolish. We have now corrected this issue and included the updated results in Supplementary Table S1 of the revised manuscript. As shown in the updated table, the average standard deviations decrease after removing the boundary regions for both Nanopolish and Tombo.

      We have now included this table in Supplementary Table S1 in the revised manuscript and added the following clarification:

      “It is worth noting that the data points corresponding to the transition state between two consecutive 5-mers are not included in the calculation of the standard deviation in SegPore’s results in Table 1. However, their exclusion does not affect the overall conclusion, as there are on average only ~6 points per 5-mer in the transition state (see Supplementary Table S1 for more details).”

      (22B) As mentioned in 2), I'm happy there's a clear definition of what is meant but I found the chosen word a bit odd.

      We apologize for the earlier unclear terminology. We now refer to it as the segmentation and alignment task, abbreviated as the segmentation task.

      (23) Reading back I can gather that from the text earlier, but the summation of what is being tested is this:

      “including Tombo, MINES (31), Nanom6A (32), m6Anet, Epinano (33), and CHEUI (20).”

      Next, the identifier "Nanopolish+m6Anet" is, aside from the figure itself, only mentioned in the discussion. Adding a line that explains that "Nanopolish+m6Anet" is the default method of running m6Anet and "SegPore+m6Anet" replaces the Nanopolish part for m6Anet with SegPore, rather than jumping straight to "SegPore+m6Anet", would clarify where this identifier came from.

      Thank you for the helpful suggestion. We have added the identifier to the revised manuscript as follows:

      “Given their comparable methodologies and input data requirements, we benchmarked SegPore against several baseline tools, including Tombo, MINES (31), Nanom6A (32), m6Anet, Epinano (33), and CHEUI (20). By default, MINES and Nanom6A use eventalign results generated by Tombo, while m6Anet, Epinano, and CHEUI rely on eventalign results produced by Nanopolish. In Fig. 3C, ‘Nanopolish+m6Anet’ refers to the default m6Anet pipeline, whereas ‘SegPore+m6Anet’ denotes a configuration in which Nanopolish’s eventalign results are replaced with those from SegPore.”

      (24) For completeness I'd expect tickmarks and values on the y-axis as well.

      Thank you for the suggestion. We have updated Figures 3A and 3B in the revised manuscript to include tick marks and values on the y-axis as requested.

      (25) Considering this statement and looking back at figure 3a and 3b, wouldn't this be easier to observe if the histograms/KDE's were plotted with overlap in a single figure?

      We appreciate the suggestion. However, we believe that overlaying Figures 3A and 3B into a single panel would make the visualization cluttered and more difficult to interpret.

      (29) Please change the sentence in the text to make that clear. As it is written now (while it's the same number of motifs, so one might guess it) it does not seem to refer to that particular set of motifs and could be a new selection of 6 motifs.

      We appreciate the suggestion and have revised the sentence for clarity as follows:

      “We evaluated m6A predictions using two approaches: (1) SegPore’s segmentation results were fed into m6Anet, referred to as SegPore+m6Anet, which works for all DRACH motifs and (2) direct m6A predictions from SegPore’s Gaussian Mixture Model (GMM), which is limited to the six selected 5-mers shown in Supplementary Fig. S2C that exhibit clearly separable modified and unmodified components in the GMM (see Materials and Methods for details). ”

      (31) I think we have a different interpretation of the word 'leverage', or perhaps what it applies to. I'd say it leverages the jiggling if there's new information drawn from the jiggling behaviour. It's taking it into account if it filters for it. The HHMM as far as I understand tries to identify the jiggles, and ignore their values for the segmentation etc. So while one might see this as an approach that "leverages the hypothesis", I don't see how this HHMM "leverages the jiggling property" itself.

      Thank you for the helpful suggestion. We have replaced the word “leverages” with “models” in the revised manuscript.

      New points

      pg6ln166: “…we extract the aligned raw signal segment and reference sequence segment from Nanopolish's events [...] we extract the raw signal segment corresponding to the transcript region for each input read based on Nanopolish's poly(A) detection results.”

      It is not clear as to why this different approach is applied for these two cases in this part of the text.

      Thank you for pointing this out. The two approaches refer to different preprocessing strategies for in vivo and in vitro data.

      For in vivo data, a large proportion of reads do not span the full-length transcript and often map only to a portion of the reference sequence. Moreover, because a single gene can generate multiple transcript isoforms, a read may align equally well to several possible transcripts. Therefore, we extract only the raw signal segment that corresponds to the mapped portion of the transcript for each read.

      In contrast, for in vitro data, the transcript sequence is known precisely. As a result, we can directly extract all raw signals following the poly(A) tail and align them to the complete reference sequence.

      pg10ln259: “An important distinction from classical global alignment algorithms is that one or multiple base blocks may align with a single 5mer.”

      If there was usually a 1:1 mapping the alignment algorithm would be more or less a direct match, so I think the multiple blocks aligning to a 5mer thing is actually quite common.

      Thank you for the comment. The “classical global alignment algorithm” here refers to the Needleman–Wunsch algorithm used for sequence alignment. Our intention was to highlight the conceptual difference between traditional sequence alignment and nanopore signal alignment. In classical sequence alignment, each base typically aligns to a single position in the reference. In contrast, in nanopore signal alignment, one or multiple signal segments — corresponding to varying dwell times of the motor protein — can align to a single 5-mer.

      We have revised the sentence as follows:

      “An important distinction from classical global alignment algorithms (Needleman–Wunsch algorithm)……”
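
      To make the many-to-one idea concrete, the following is a minimal sketch of a monotonic alignment in which several consecutive signal blocks may map to the same 5-mer. It is not SegPore's alignment algorithm: the function name, the squared-difference cost, and the rule that every 5-mer receives at least one block (no skips) are assumptions made purely for illustration.

```python
import numpy as np


def align_blocks_to_kmers(block_means, kmer_means):
    """Assign each block to one k-mer in order; several blocks may share a k-mer.

    Illustrative assumptions: cost is the squared difference between a block
    mean and the k-mer reference mean, and no k-mer may be skipped.
    """
    b = np.asarray(block_means, dtype=float)
    k = np.asarray(kmer_means, dtype=float)
    n, m = len(b), len(k)
    if n < m:
        raise ValueError("this sketch needs at least one block per k-mer")

    cost = (b[:, None] - k[None, :]) ** 2
    dp = np.full((n, m), np.inf)          # dp[i, j]: best cost with block i on k-mer j
    back = np.zeros((n, m), dtype=int)    # 0 = stayed on k-mer j, 1 = advanced from j-1
    dp[0, 0] = cost[0, 0]                 # the first block must start on the first k-mer

    for i in range(1, n):
        for j in range(m):
            stay = dp[i - 1, j]
            advance = dp[i - 1, j - 1] if j > 0 else np.inf
            if advance < stay:
                dp[i, j] = advance + cost[i, j]
                back[i, j] = 1
            else:
                dp[i, j] = stay + cost[i, j]
                back[i, j] = 0

    # Trace back from the last block on the last k-mer.
    assignment = [0] * n
    j = m - 1
    for i in range(n - 1, -1, -1):
        assignment[i] = j
        j -= int(back[i, j])
    return assignment


# Six blocks aligned to three k-mers: blocks 0-1 share k-mer 0, blocks 3-5 share k-mer 2.
example = align_blocks_to_kmers([100, 101, 120, 90, 91, 89], [100, 120, 90])
```

      In Needleman–Wunsch, by contrast, each position pairs with at most one position in the other sequence, so the "stay on the same 5-mer" move has no direct counterpart.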

      pg13ln356: "dwell time" is not defined or used before, I guess it's effectively the number of raw samples per segment but this should be clarified.

      Thank you for pointing this out. We have now added a clear definition of dwell time in the text as follows:

      "such as the normalized mean μ_i, standard deviation σ_i, dwell time l_i (number of data points in the event)."
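
      As a small illustration of this definition, the three per-event features could be computed as below. This is a sketch only: the function name and the half-open `[start, end)` indexing are assumptions, and the use of the population standard deviation is an arbitrary choice that the text does not specify.

```python
import numpy as np


def event_features(signal, start, end):
    """Return (mean, standard deviation, dwell time) for signal[start:end].

    Dwell time is taken as the number of data points in the event,
    following the definition quoted above.
    """
    segment = np.asarray(signal[start:end], dtype=float)
    if segment.size == 0:
        raise ValueError("event contains no data points")
    return float(segment.mean()), float(segment.std()), int(segment.size)


mu, sigma, dwell = event_features([1.0, 2.0, 3.0, 4.0], 1, 4)
```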

      pg13ln358: “Feature vectors from 80% of the genomic locations were used for training, while the remaining 20% were set aside for validation.”

      I assume these are selected randomly but this is not explicitly stated here and should be.

      Yes, they are randomly selected. We have revised the sentence as follows:

      “Feature vectors from a randomly selected 80% of the genomic locations were used for training, while the remaining 20% were set aside for validation.”
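
      A random 80/20 split at the level of genomic locations can be sketched as follows. The function name, the fixed seed, and the example location labels are hypothetical; the response does not state how the random selection was implemented.

```python
import numpy as np


def split_locations(locations, train_fraction=0.8, seed=0):
    """Randomly split genomic locations into training and validation lists."""
    locations = list(locations)
    rng = np.random.default_rng(seed)   # seed is an assumption, for reproducibility
    order = rng.permutation(len(locations))
    n_train = int(round(train_fraction * len(locations)))
    train = [locations[i] for i in order[:n_train]]
    valid = [locations[i] for i in order[n_train:]]
    return train, valid


# Hypothetical location identifiers (contig, position).
sites = [("chr1", pos) for pos in range(100, 110)]
train_sites, valid_sites = split_locations(sites)
```

      Splitting by location rather than by read keeps all feature vectors from one site in the same partition.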

      pg18ln488: The manuscript now evaluates RNA004 and compares against f5c and Uncalled4. It mentions the differences between RNA004 and RNA002, namely kmer size and current levels, but does not explain where the starting reference model values for the RNA004 model come from: In pg18ln492 they state "RNA004 provides reference values for 9mers", then later they seem to use a 5mer parameter table (pg19ln508), are they re-using the same table from RNA002 or did they create a 5mer table from the 9mer reference table?

      We apologize for the confusion. The reference model table for RNA004 9-mers is obtained from f5c (the array named ‘rna004_130bps_u_to_t_rna_9mer_template_model_builtin_data’ in https://raw.githubusercontent.com/hasindu2008/f5c/refs/heads/master/src/model.h).

      Author response image 1.

      We have revised the subsection header “5-mer parameter table” in the Method to “5-mer & 9-mer parameter table” to highlight this and added a paragraph about how to obtain the 9-mer parameter table:

      “In the RNA004 data analysis (Table 2), we obtained the 9-mer parameter table from the source code of f5c (version 1.5). Specifically, we used the array named ‘rna004_130bps_u_to_t_rna_9mer_template_model_builtin_data’ from the following file: https://raw.githubusercontent.com/hasindu2008/f5c/refs/heads/master/src/model.h (accessed on 17 October 2025).”

      Also, on page 18, line 195, we added the following sentence:

      “The 9-mer parameter table in pA scale for RNA004 data provided by f5c (see Materials and Methods) was used in the analysis.”

      pg19ln520: “Additionally, due to the differences of the k-mer motifs between human and mouse (Supplementary Fig. S2), six shared 5mers were selected to demonstrate SegPore's performance in modification prediction directly.”

      "the differences" - in occurrence rates, as I gather from the supplementary figure, but it would be good to explicitly state it in this sentence itself too.

      Thank you for the helpful suggestion. We agree that the original sentence was vague. The main reason for selecting only six 5-mers is the difference in the availability of ground truth labels for specific k-mer motifs between human and mouse datasets. We have revised the sentence accordingly:

      “Additionally, due to the differences in the availability of ground truth labels for specific k-mer motifs between human and mouse (Supplementary Fig. S2), six shared 5-mers were selected to directly demonstrate SegPore’s performance in modification prediction.”

      pg24ln654: “SegPore codes current intensity levels”

      "codes" is meant to be "stores" I guess? Perhaps "encodes"?

      Thank you for the suggestion. We have now replaced it with “encodes” in the revised manuscript.

      Lastly, looking at the feedback from the other reviewer's comment:

      The 'HMM' mentioned in line 184 looks fine to me, the HHMM is 2 HMM's in a hierarchical setup and the text now refers to one of these HMM layers. If this is to be changed it would need to state the layer (e.g. "the outer HHMM layer") throughout the text instead.

      We agree with this assessment and believe that the term “inner HMM” is accurate in this context, as it correctly refers to one of the two HMM layers within the HHMM structure. Therefore, we have decided to retain the current terminology.

      Reviewer #3 (Recommendations for the authors):

      I recommend the publication of this manuscript, provided that the following comments are addressed.

      Page 5, Preprocessing: You comment that the poly(A) tail provides a stable reference that is crucial for the normalisation of all reads. How would this step handle reads that have interrupted poly(A) tails (e.g. in the case of mRNA vaccines that employ a linker sequence)? Or cell types that express TENT4A/B, which can include transcripts with non-A residues in the poly(A) tail: https://www.science.org/doi/full/10.1126/science.aam5794.

      It depends on Nanopolish’s ability to reliably detect the poly(A) tail. In general, the poly(A) region produces a long stretch of signals fluctuating around a current level of ~108.9 pA (RNA002) with relatively stable variation, which allows it to be identified and used for normalization.

      For in vivo data, if the poly(A) tail is interrupted (e.g., due to non-A residues or linker sequences), two scenarios are possible:

      (1) The poly(A) tail may not be reliably detected, in which case the corresponding read will be excluded from our analysis.

      (2) Alternatively, Nanopolish may still recognize the initial uninterrupted portion of the poly(A) signal, which is typically sufficient in length and stability to be used for signal normalization.

      For in vitro data, the poly(A) tails are uninterrupted, so this issue does not arise.

      All analyses presented in this study are based exclusively on reads with reliably detected poly(A) tails.

      Page 7, 5mer parameter table: r9.4_180mv_70bps_5mer_RNA is an older kmer model (>2 years). How does your method perform with the newer RNA kmer models that do permit the detection of multiple ribonucleotide modifications? Addressing this comment would be beneficial, however I understand that it would require the generation of new data, as limited RNA004 datasets are available in the public domain.

      “r9.4_180mv_70bps_5mer_RNA” is the most widely used k-mer model for RNA002 data. Regarding the newer k-mer models, we believe the reviewer is referring to the “modification basecalling” models available in Dorado, which are specifically designed for RNA004 data. At present, SegPore can perform RNA modification estimation only on RNA002 data, as this is the platform for which suitable training data and ground truth annotations are available. Evaluating SegPore’s performance with the newer RNA004 modification models would require new datasets containing known modification sites generated with RNA004 chemistry. Since such data are currently unavailable, we have not yet been able to assess SegPore under these conditions. This represents an important future direction for extending and validating our method.

      The Methods and Results sections contain redundant information -please streamline the information in these sections and reduce the redundancy.

      We thank the reviewer for this suggestion and acknowledge that there is some overlap between the Methods and Results sections. However, we feel that removing these parts could compromise the clarity and readability of the manuscript, especially given that Reviewer 2 emphasized the need for clearer explanations. We therefore decided to retain certain methodological descriptions in the Results section to ensure that key steps are understandable without requiring the reader to constantly cross-reference the Methods.

      Minor comments

      Please be consistent when referring to k-mers and 5-mers (sometimes denoted as 5mers - please change to 5-mers throughout).

      We have revised the manuscript to ensure consistency and now use “5-mers” throughout the text.

      Introduction

      Lines 80 - 112: Please condense this section to roughly half the length (1-2 paragraphs). In general, the results described in the introduction should be very brief, as they are described in full in the results section.

      Thank you for the suggestion. We have condensed the original three paragraphs into a single, more concise paragraph as follows:

      "SegPore is a novel tool for direct RNA sequencing (DRS) signal segmentation and alignment, designed to overcome key limitations of existing approaches. By explicitly modeling motor protein dynamics during RNA translocation with a Hierarchical Hidden Markov Model (HHMM), SegPore segments the raw signal into small, biologically meaningful fragments, each corresponding to a k-mer sub-state, which substantially reduces noise and improves segmentation accuracy. After segmentation, these fragments are aligned to the reference sequence and concatenated into larger events, analogous to Nanopolish’s “eventalign” output, which serve as the foundation for downstream analyses. Moreover, the “eventalign” results produced by SegPore enhance interpretability in RNA modification estimation. While deep learning–based tools such as m6Anet classify RNA modifications using complex, non-transparent features (see Supplementary Fig. S5), SegPore employs a simple Gaussian Mixture Model (GMM) to distinguish modified from unmodified nucleotides based on baseline current levels. This transparent modeling approach improves confidence in the predictions and makes SegPore particularly well-suited for biological applications where interpretability is essential."

      Line 104: Please change "normal adenosine" to "adenosine".

      We have revised the manuscript as requested and replaced all instances of “normal adenosine” with “adenosine” throughout the text.

      Materials and Methods

      Line 176: Please reword "...we standardize the raw current signals across reads, ensuring that the mean and standard deviation of the poly(A) tail are consistent across all reads." To "...we standardize the raw current signals for each read, ensuring that the mean and standard deviation are consistent across the poly(A) tail region."

      We have changed the sentence as requested.

      “Since the poly(A) tail provides a stable reference, we standardize the raw current signals for each read, ensuring that the mean and standard deviation are consistent across the poly(A) tail region.”
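
      One way to picture this per-read standardization is a linear rescaling that maps the poly(A) segment of each read onto a common mean and standard deviation. The sketch below is illustrative only and is not taken from SegPore's code: the function name and the target standard deviation are assumptions, while the target mean reuses the ~108.9 pA poly(A) level quoted for RNA002 elsewhere in this response.

```python
import numpy as np

TARGET_MEAN_PA = 108.9   # poly(A) level quoted for RNA002 in this response
TARGET_STD_PA = 1.5      # assumed placeholder value, for illustration only


def standardize_read(signal, polya_start, polya_end,
                     target_mean=TARGET_MEAN_PA, target_std=TARGET_STD_PA):
    """Rescale one read so that its poly(A) segment has the target mean and std."""
    signal = np.asarray(signal, dtype=float)
    polya = signal[polya_start:polya_end]
    if polya.size < 2 or polya.std() == 0:
        raise ValueError("poly(A) segment is too short or constant")
    scale = target_std / polya.std()
    # The same shift and scale are applied to the whole read, not just the tail.
    return (signal - polya.mean()) * scale + target_mean


# Synthetic read: 500 poly(A) samples followed by 1000 transcript samples.
rng = np.random.default_rng(0)
raw = np.concatenate([rng.normal(95.0, 3.0, 500), rng.normal(80.0, 6.0, 1000)])
standardized = standardize_read(raw, 0, 500)
```

      After this step the poly(A) regions of different reads share the same mean and spread, so event values derived from the transcript region are on a comparable scale across reads.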

      Line 182: Please describe the RNA translocation hypothesis, as this is the first mention of it in the text. Also, why is the Hierarchical Hidden Markov model perfect for addressing the RNA translocation hypothesis? Explain more about how the HHMM works and why it is a suitable choice.

      We have revised the sentence as requested:

      “The RNA translocation hypothesis (see details in the first section of Results) naturally leads to the use of a hierarchical Hidden Markov Model (HHMM) to segment the raw current signal.”

      The motivation for the HHMM is explained in detail in the first section of Results, “RNA translocation hypothesis”. As illustrated in Figure 2, the sequencing data suggest that RNA molecules may translocate back and forth (often referred to as jiggling) while passing through the nanopore. This behavior results in complex current fluctuations that are challenging to model with a simple HMM. The HHMM provides a natural framework to address this because it can model signal dynamics at two levels. The outer HMM distinguishes between two major states: base states (where the signal corresponds to a stable sub-state of a k-mer) and transition states (representing transitions from one base state to the next). Within each base state, an inner HMM models finer signal variation using three states (“curr”, “prev”, and “next”) corresponding to the current k-mer sub-state and its neighboring k-mer sub-states. This hierarchical structure captures both the stable signal patterns and the stochastic translocation behavior, enabling more accurate and biologically meaningful segmentation of the raw current signal.

      Line 184: do you mean HHMM? Please be consistent throughout the text.

      As explained in the previous response, the HHMM consists of two layers: an outer HMM and an inner HMM. The term “HMM” in line 184 is meant to be read together with “inner” at the end of line 183, forming the phrase “inner HMM.” It seems the reviewer may have overlooked this when reading the text.

      Line 203: please delete: "It is obviously seen that".

      We have removed the phrase “It is obviously seen that” from the sentence as requested. The revised sentence now reads:

      “The first part of Eq. 2 represents the emission probabilities, and the second part represents the transition probabilities.”

      Line 314, GMM for 5mer parameter table re-estimation: "Typically, the process is repeated three to five times until the 5mer parameter table stabilizes." How is the stabilisation of the 5mer parameter table quantified? What is a reasonable cut-off that would demonstrate adequate stabilisation of the 5mer parameter table? Please add details of this to the text.

      We have revised the sentence to clarify the stabilization criterion as follows:

      “Typically, the process is repeated three to five times until the 5-mer parameter table stabilizes (when the average change of mean values of all 5-mers is less than 5e-3).”
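
      The stopping rule stated here (average change of the 5-mer mean values below 5e-3) can be written as a simple check. The sketch below is a hedged illustration rather than SegPore's implementation: the function names, the dictionary representation of the table, and the `reestimate` callback are hypothetical, and the toy update in the example merely halves the distance to a fixed target.

```python
def table_has_stabilized(prev_means, new_means, tol=5e-3):
    """True if the average absolute change of the 5-mer means is below tol."""
    if set(prev_means) != set(new_means):
        raise ValueError("both tables must cover the same 5-mers")
    changes = [abs(new_means[k] - prev_means[k]) for k in prev_means]
    return sum(changes) / len(changes) < tol


def reestimate_until_stable(initial_means, reestimate, max_rounds=5, tol=5e-3):
    """Repeat a (hypothetical) re-estimation step until the table stabilizes."""
    means = dict(initial_means)
    for round_idx in range(1, max_rounds + 1):
        new_means = reestimate(means)
        if table_has_stabilized(means, new_means, tol):
            return new_means, round_idx
        means = new_means
    return means, max_rounds


# Toy example: each round moves every mean halfway towards a fixed target.
target = {"GGACT": 100.0, "AAACA": 100.0}
start = {"GGACT": 101.0, "AAACA": 101.0}
halfway = lambda m: {k: (m[k] + target[k]) / 2 for k in m}
final_means, rounds = reestimate_until_stable(start, halfway, max_rounds=20)
```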

      Results

      Line 377: Please edit to read "Traditional base calling algorithms such as Guppy and Albacore assume that the RNA molecule is translocated unidirectionally through the pore by the motor protein."

      We have revised the sentence as:

      “In traditional basecalling algorithms such as Guppy and Albacore, we implicitly assume that the RNA molecule is translocated through the pore by the motor protein in a monotonic fashion, i.e., the RNA is pulled through the pore unidirectionally.”

      Line 555, m6A identification at the site level: "For six selected m6A motifs, SegPore achieved an ROC AUC of 82.7% and a PR AUC of 38.7%, earning the third best performance compared with deep learning methods m6Anet and CHEUI (Fig. 3D)." So SegPore performs third best of all deep learning methods. Do you recommend its use in conjunction with m6Anet for m6A detection? Please clarify in the text. This will help to guide users to possible best practice uses of your software.

      Thank you for the suggestion. We have added a clarification in the revised manuscript to guide users.

      “For practical applications, we recommend taking the intersection of m6A sites predicted by SegPore and m6Anet to obtain high-confidence modification sites, while still benefiting from the interpretability provided by SegPore’s predictions.”
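
      In practice, this recommendation amounts to a set intersection over the sites called by each tool. The following minimal sketch assumes that each tool's calls have already been reduced to (transcript, position) pairs; the function name and the example identifiers are hypothetical and do not reflect either tool's output format.

```python
def high_confidence_sites(segpore_sites, m6anet_sites):
    """Return the sites predicted as m6A by both tools."""
    return set(segpore_sites) & set(m6anet_sites)


# Hypothetical site calls as (transcript_id, position) pairs.
segpore_calls = [("tx1", 120), ("tx1", 455), ("tx2", 37)]
m6anet_calls = [("tx1", 455), ("tx2", 37), ("tx3", 902)]
shared = high_confidence_sites(segpore_calls, m6anet_calls)
```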

      Figures.

      Figure 1A please refer to poly(A) tail, rather than polyA tail.

      We have updated it to poly(A) tail in the revised manuscript.

    1. Reviewer #3 (Public review):

      Summary:

      Pinho et al. investigated the role of the dorsal vs. ventral hippocampus and gender differences in mediated learning. While previous studies already established the engagement of the hippocampus in sensory preconditioning, the authors here took advantage of freely-moving fiber photometry recording and chemogenetics to observe and manipulate sub-regions of the hippocampus (dorsal vs. ventral) in a cell-specific manner. Importantly, the authors validated the sensory preconditioning procedure in male mice. The authors found no evidence of sensory preconditioning in female mice, but rather a generalization effect, stressing the importance of gender differences in fear learning. After validation of a sensory preconditioning procedure in male mice using light and tone neutral stimuli and a mild foot shock as the unconditioned stimulus, the authors used fiber photometry to record from all neurons vs. parvalbumin-positive-only neurons in the dorsal hippocampus or ventral hippocampus of male mice during both preconditioning and conditioning phases. They found an increased activity of all neurons, PV+-only neurons, and CaMKII+ neurons in both sub-regions of the hippocampus during both preconditioning and conditioning phases. Finally, the authors found that chemogenetic inhibition of CaMKII+ neurons (but not PV+-only neurons) in the dorsal (but not ventral) hippocampus specifically prevented the formation of an association between the two neutral stimuli (i.e., light and tone cues). This manipulation had no effect on the direct association between the light cue and the mild foot shock.
      This set of data (1) validates sensory preconditioning in male mice, and stresses the importance of taking gender effect into account; (2) validates the recruitment of dorsal and ventral hippocampi during preconditioning and conditioning phases; (3) and further establishes the specific role of CaMKII+ neurons in the dorsal hippocampus, but not ventral hippocampus, in the formation of an association between two neutral stimuli, but not between a neutral stimulus and a mild foot shock.

      Strengths:

      The authors developed a sensory preconditioning procedure in male mice to investigate mediated learning using light and tone cues as neutral stimuli, and a mild foot shock as the unconditioned stimulus. They provide evidence of a gender effect in the formation of light-cue association. The authors took advantage of fiber-photometry and chemogenetics to target sub-regions of the hippocampus, in a cell-specific manner and investigate their role during different phases of a sensory conditioning procedure, and developed a DeepLabCut-based strategy to assess freezing fear responses.

      Weaknesses:

      The authors went further than previous studies by investigating the role of sub-regions of the hippocampus in mediated learning; however, there are a few weaknesses that should be addressed in future studies:

      (1) This study found a generalization effect in female mice only. While the authors attempted to neutralize this effect, the mechanism underlying this gender effect and whether female mice can display evidence for mediated learning has yet to be determined.

      (2) One of the main effects from which the conclusion of this study derives (i.e., a deficit of mediated learning in male mice when CaMKII+ neurons are inhibited in the dorsal HPC during the preconditioning phase) lies in the absence of a significant difference in the freezing response before and during the tone cue presentation when CaMKII+ neurons are chemogenetically inhibited during the Probe Test Tone phase (cf. Fig. 4 Panel B, DPCd group). The fear response before the tone cue presentation in this group (DPCd) seems higher than in the Controls_d and DPTd groups and could have masked a mediated learning effect.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The study by Pinho et al. presents a novel behavioral paradigm for investigating higher-order conditioning in mice. The authors developed a task that creates associations between light and tone sensory cues, driving mediated learning. They observed sex differences in task acquisition, with females demonstrating faster mediated learning compared to males. Using fiber photometry and chemogenetic tools, the study reveals that the dorsal hippocampus (dHPC) plays a central role in encoding mediated learning. These findings are crucial for understanding how environmental cues, which are not directly linked to positive/negative outcomes, contribute to associative learning. Overall, the study is well-designed, with robust results, and the experimental approach aligns with the study's objectives.

      Strengths: 

      (1) The authors develop a robust behavioral paradigm to examine higher-order associative learning in mice. 

      (2) They discover a sex-specific component influencing mediated learning, with females exhibiting enhanced learning abilities. 

      (3) Using fiber photometry and chemogenetic techniques, the authors identify the dorsal hippocampus, but not the ventral hippocampus, as playing a crucial role in encoding mediated learning.

      We appreciate the strengths highlighted by the Reviewer and the valuable and complete summary of our work.

      Weaknesses: 

      (1) The study would be strengthened by further elaboration on the rationale for investigating specific cell types within the hippocampus.  

      We thank the Reviewer for highlighting this important point. In the revised manuscript, we have added new information (Page 11, Lines 27-34) to specifically explain the rationale for studying the possible cell-type-specific involvement in sensory preconditioning.

      (2) The analysis of photometry data could be improved by distinguishing between early and late responses, as well as enhancing the overall presentation of the data.  

      In response to the Reviewer's comment, we have included new panels in Figure 3E and the whole of Supplementary Figure 4, which separate the photometry data across the different preconditioning and conditioning sessions, respectively. Overall, these data suggest that there are no major changes in cell activity in either hippocampal region across the different sessions, as a similar light-tone-induced enhancement of activity is observed. These findings have been incorporated in the Results Section (Page 12, Lines 13-15, 19-20 and 35-36).

      (3) The manuscript would benefit from revisions to improve clarity and readability.

      Following this fair comment, we have gone through the text to increase clarity and readability.

      Reviewer #2 (Public review): 

      Summary: 

      Pinho et al. developed a new auditory-visual sensory preconditioning procedure in mice and examined the contribution of the dorsal and ventral hippocampus to learning in this task. Using photometry they observed activation of the dorsal and ventral hippocampus during sensory preconditioning and conditioning. Finally, the authors combined their sensory preconditioning task with DREADDs to examine the effect of inhibiting specific cell populations (CaMKII and PV) in the DH on the formation and retrieval/expression of mediated learning. 

      Strengths: 

      The authors provide one of the first demonstrations of auditory-visual sensory preconditioning in male mice. Research on the neurobiology of sensory preconditioning has primarily used rats as subjects. The development of a robust protocol in mice will be beneficial to the field, allowing researchers to take advantage of the many transgenic mouse lines. Indeed, in this study, the authors take advantage of a PV-Cre mouse line to examine the role of hippocampal PV cells in sensory preconditioning. 

      We acknowledge the Reviewer's effort and thank them for highlighting the strengths of our work.

      Weaknesses: 

      (1) The authors report that sensory preconditioning was observed in both male and female mice. However, their data only supports sensory preconditioning in male mice. In female mice, both paired and unpaired presentations of the light and tone in stage 1 led to increased freezing to the tone at test. In this case, fear to the tone could be attributed to factors other than sensory preconditioning, for example, generalization of fear between the auditory and visual stimulus.

      We thank the Reviewer for raising this point. Initially, we hypothesized that female mice were somehow able to associate light and tone even though they were presented separately during the preconditioning sessions. Thus, we designed new experiments (shown in Supplementary Figure 2D) to test whether the data would be congruent with our initial hypothesis or with fear generalization, as proposed by the Reviewer. We performed a new experiment comparing a Paired group with two additional control groups: (i) an Unpaired group in which we increased the time between the light and tone presentations and (ii) an experimental group in which the light was absent during conditioning. The new results clearly indicate the presence of fear generalization in female mice, as we found a significant cue-induced increase in freezing responses in all the experimental groups tested. In accordance with the Reviewer’s suggestion, we conclude that mediated learning is not correctly observed in female mice using the protocol described (i.e., with 2 conditioning sessions). These new results led us to reorganize the structure and the figures of the manuscript to focus on male mice in the Main Figures while showing the data from female mice in the Supplementary Figures. Overall, our data clearly reveal the need for behavioral protocols adapted to each sex and demonstrate sex differences in sensory preconditioning, which we have added to the Discussion Section (Page 15, lines 12-37).

      (2) In the photometry experiment, the authors report an increase in neural activity in the hippocampus during both phase 1 (sensory preconditioning) and phase 2 (conditioning). In the subsequent experiment, they inhibit neural activity in the DH during phase 1 (sensory preconditioning) and the probe test, but do not include inhibition during phase 2 (conditioning). It was not clear why they didn't carry forward investigating the role of the hippocampus during phase 2 conditioning. Sensory preconditioning could occur due to the integration of the tone and shock during phase 2, or retrieval and chaining of the tone-light-shock memories at test. These two possibilities cannot be differentiated based on the data. Given that we do not know at which stage the mediated learning is occurring, it would have been beneficial to additionally include inhibition of the DH during phase 2.

      Following the Reviewer’s valuable comment, we have conducted a new experiment where we have chemogenetically inhibited the CaMKII-positive neurons of the dHPC during the conditioning to explore their involvement in mediated learning formation. Notably, the inhibition of principal neurons of the dHPC during conditioning does not impair the formation of the mediated learning in our hands. These new results are now shown in Supplementary Figure 7G and added in the Results section (Page 13, Lines 19-23).

      (3) In the final experiment, the authors report that inhibition of the dorsal hippocampus during the sensory preconditioning phase blocked mediated learning. While this may be the case, the failure to observe sensory preconditioning at test appears to be due more to an increase in baseline freezing (during the stimulus off period), rather than a decrease in freezing to the conditioned stimulus. Given the small effect, this study would benefit from an experiment validating that administration of J60 inhibited DH cells. Further, given that the authors did not observe any effect of DREADD inhibition in PV cells, it would also be important to validate successful cellular silencing in this protocol.  

      Following the Reviewer’s comments, we have performed new experiments to validate the use of J60 to inhibit hippocampal cells. These are shown in Supplementary Figure 7 E-F for CaMKII-positive neurons, in which J60 administration tends to decrease the frequency of calcium events both in the dHPC and vHPC. Furthermore, in Supplementary Figure 8 B-C we show that J60 is also able to modify calcium events in PV-positive interneurons. Although the best method to validate the use of DREADD (i.e. to inhibit hippocampal cell activity) could be electrophysiology recordings, we lack this technique in our laboratory. Thus, in order to address the reviewer’s comment, we decided to combine the DREADD modulation through J60 administration with photometry recordings, where several tendencies are confirmed. In addition, a similar approach has been used in another preprint of the lab (https://doi.org/10.1101/2025.08.29.673009), where there is an increase of phospho-PDH, a marker of neuronal inhibition, upon J60 administration in the dHPC, as well as in other experiments conducted by a collaborator lab, where they were able to observe a modulation of SOM-positive interneuron activity upon J60 administration (PhD defense of Miguel Sabariego, University Pompeu Fabra, Barcelona).

      Reviewer #3 (Public review): 

      Summary: 

      Pinho et al. investigated the role of the dorsal vs ventral hippocampus and the gender differences in mediated learning. While previous studies already established the engagement of the hippocampus in sensory preconditioning, the authors here took advantage of freely-moving fiber photometry recording and chemogenetics to observe and manipulate sub-regions of the hippocampus (dorsal vs. ventral) in a cell-specific manner. The authors first found sex differences in the preconditioning phase of a sensory preconditioning procedure, where males required more preconditioning training than females for mediated learning to manifest, and where females displayed evidence of mediated learning even when neutral stimuli were never presented together within the session.

      After validation of a sensory preconditioning procedure in mice using light and tone neutral stimuli and a mild foot shock as the unconditioned stimulus, the authors used fiber photometry to record from all neurons vs. parvalbumin-positive-only neurons in the dorsal hippocampus or ventral hippocampus of male mice during both preconditioning and conditioning phases. They found increased activity of all neurons, as well as PV+-only neurons in both sub-regions of the hippocampus during both preconditioning and conditioning phases. Finally, the authors found that chemogenetic inhibition of CaMKII+ neurons in the dorsal, but not ventral, hippocampus specifically prevented the formation of an association between the two neutral stimuli (i.e., light and tone cues), but not the direct association between the light cue and the mild foot shock. This set of data: (1) validates the mediated learning in mice using a sensory preconditioning protocol, and stresses the importance of taking sex effect into account; (2) validates the recruitment of dorsal and ventral hippocampi during preconditioning and conditioning phases; and (3) further establishes the specific role of CaMKII+ neurons in the dorsal but not ventral hippocampus in the formation of an association between two neutral stimuli, but not between a neutral stimulus and a mild foot shock.

      Strengths: 

      The authors developed a sensory preconditioning procedure in mice to investigate mediated learning using light and tone cues as neutral stimuli, and a mild foot shock as the unconditioned stimulus. They provide evidence of a sex effect in the formation of light-cue association. The authors took advantage of fiber-photometry and chemogenetics to target sub-regions of the hippocampus, in a cell-specific manner and investigate their role during different phases of a sensory conditioning procedure. 

      We thank the Reviewer for the extensive summary of our work and for giving interesting value to some of our findings.

      Weaknesses: 

      The authors went further than previous studies by investigating the role of sub-regions of the hippocampus in mediated learning, however, there are several weaknesses that should be noted: 

      (1) This work first validates mediated learning in a sensory preconditioning procedure using light and tone cues as neutral stimuli and a mild foot shock as the unconditioned stimulus, in both males and females. They found interesting sex differences at the behavioral level, but then only focused on male mice when recording and manipulating the hippocampus. The authors do not address sex differences at the neural level. 

      We appreciate the comment of the Reviewer. Indeed, thanks to other Reviewer comments during this revision process (see Point 1 of Reviewer #2), we performed an additional experiment that reveals that using the described protocol in female mice we observed fear generalization rather than mediated learning responding. This data pointed to the need of sex-specific changes in the behavioral protocols to measure sensory preconditioning. The revised version of the manuscript, although highlighting these sex differences in behavioral performance (see Supplementary Figure 2), is more focused in male mice and, accordingly, all photometry or chemogenetic experiments are performed using male mice. In future studies, once we are certain to have a sensory preconditioning paradigm working in female mice, it will be very interesting to study if the same hippocampal mechanisms mediating this behavior in male mice are also observed in female mice.  

      (2) As expected in fear conditioning, the range of inter-individual differences is quite high. Mice that didn't develop a strong light-->shock association, as evidenced by a lower percentage of freezing during the Probe Test Light phase, should manifest a low percentage of freezing during the Probe Test Tone phase. It would be interesting to test for a correlation between the level of freezing during mediated vs test phases.

      Thanks to the comment raised by the reviewer, we generated a new set of data correlating mediated and direct fear responses. As it can be observed in Supplementary Figure 3, there is a significant correlation between mediated and direct learning in male mice (i.e. the individuals that freeze more in the direct learning test, correlate with the individuals that express more fear response in the mediated learning test). In contrast, this correlation is absent in female mice, further confirming what we have explained above. We have highlighted this new analysis in the Results section (Page 11, Lines 20-24).

      (3) The use of a synapsin promoter to transfect neurons in a non-specific manner does not bring much information. The authors applied a more specific approach to target PV+ neurons only, and it would have been more informative to keep with this cell-specific approach, for example by looking also at somatostatin+ inter-neurons. 

      The idea behind using a pan-neuronal promoter was to assess in general terms how neuronal activity in the hippocampus is engaged during different phases of the light-tone sensory preconditioning. However, the comment of the Reviewer is very pertinent and, as suggested, we have generated some new data targeting CaMKII-positive neurons (see Point 4 below). Finally, although it could be extremely interesting, we believe that targeting different interneuron subtypes is out of the scope of the present work. However, we have added this in the Discussion Section as a future perspective/limitation of our study (Page 17, Lines 9-24).

      (4) The authors observed event-related Ca2+ transients on hippocampal pan-neurons and PV+ inter-neurons using fiber photometry. They then used chemogenetics to inhibit CaMKII+ hippocampal neurons, which does not logically follow. It does not undermine the main finding of CaMKII+ neurons of the dorsal, but not ventral, hippocampus being involved in the preconditioning, but not conditioning, phase. However, observing CaMKII+ neurons (using fiber photometry) in mice running the same task would be more informative, as it would indicate when these neurons are recruited during different phases of sensory preconditioning. Applying then optogenetics to cancel the observed event-related transients (e.g., during the presentation of light and tone cues, or during the foot shock presentation) would be more appropriate.  

      We have generated new photometry data to analyze the activity of CaMKII-positive neurons during the preconditioning phase to confirm their engagement during the light-tone pairings. Thus, we infused a CaMKII-GCAMP calcium sensor into the dHPC and vHPC of mice and we recorded its activity during the 6 preconditioning sessions. The new results can be found in Figure 3 and explained in the Results section (Page 12, Lines 26-36). The results clearly show an engagement of CaMKII-positive neurons during the light-tone pairing observed both in the dHPC and vHPC. Finally, although the suggestion of performing optogenetic manipulations would be very elegant, we expect to have convinced the reviewer that our chemogenetic results clearly show and are enough to demonstrate the involvement of dHPC in the formation of mediated learning in the Light-Tone sensory preconditioning paradigm. However, we have added this in the Discussion Section as a future perspective/limitation of our study (Page 17, Lines 9-24).  

      (5) Probe tests always start with the "Probe Test Tone", followed by the "Probe Test Light". "Probe Test Tone" consists of an extinction session, which could affect the freezing response during "Probe Test Light" (e.g., Polack et al. (http://dx.doi.org/10.3758/s13420-013-0119-5)). Preferably, adding a group of mice with a Probe Test Light with no Probe Test Tone could help clarify this potential issue. The authors should at least discuss the possibility that the tone extinction session prior to the "Probe Test Light" could have affected the freezing response to the light cue. 

      We appreciate the comment raised by the reviewer. However, we think that our direct learning responses are quite robust in all of our experiments and, thus, the impact of a possible extinction based on the tone presentation should not affect our direct learning. However, as it is an important point, we have discussed it in the Discussion Section (Page 17, Lines 12-14).  

      Reviewer #4 (Public review): 

      Summary 

      Pinho et al use in vivo calcium imaging and chemogenetic approaches to examine the involvement of hippocampal sub-regions across the different stages of a sensory preconditioning task in mice. They find clear evidence for sensory preconditioning in male but not female mice. They also find that, in the male mice, CaMKII-positive neurons in the dorsal hippocampus: (1) encode the audio-visual association that forms in stage 1 of the task, and (2) retrieve/express sensory preconditioned fear to the auditory stimulus at test. These findings are supported by evidence that ranges from incomplete to convincing. They will be valuable to researchers in the field of learning and memory. 

      We appreciate the summary of our work and all the constructive comments raised by the Reviewer, which have greatly improved the clarity and quality of our manuscript.  

      Abstract 

      Please note that sensory preconditioning doesn't require the stage 1 stimuli to be presented repeatedly or simultaneously. 

      The reviewer is right, and we have corrected and changed that information in the revised abstract.  

      "Finally, we combined our sensory preconditioning task with chemogenetic approaches to assess the role of these two hippocampal subregions in mediated learning."  This implies some form of inhibition of hippocampal neurons in stage 2 of the protocol, as this is the only stage of the protocol that permits one to make statements about mediated learning. However, it is clear from what follows that the authors interrogate the involvement of hippocampal sub-regions in stages 1 and 3 of the protocol - not stage 2. As such, most statements about mediated learning throughout the paper are potentially misleading (see below for a further elaboration of this point). If the authors persist in using the term mediated learning to describe the response to a sensory preconditioned stimulus, they should clarify what they mean by mediated learning at some point in the introduction. Alternatively, they might consider using a different phrase such as "sensory preconditioned responding". 

      Considering the arguments of the Reviewer, we have modified our text in the Abstract and throughout the main text. Moreover, based on a comment of Reviewer #2 (Point 2), we have generated new data demonstrating that the dHPC does not seem to be involved in mediated learning formation during Stage 2, as its inhibition does not impair sensory preconditioning responding. These new data can be seen in Supplementary Figure 7G.

      Introduction 

      "Low-salience" is used to describe stimuli such as tone, light, or odour that do not typically elicit responses that are of interest to experimenters. However, a tone, light, or odour can be very salient even though they don't elicit these particular responses. As such, it would be worth redescribing the "low-salience" stimuli in some other terms. 

      Throughout the revised version of the manuscript, we have replaced the term “low-salience” with “innocuous stimuli” or avoided any adjective where we think one is not necessary.

      "These higher-order conditioning processes, also known as mediated learning, can be captured in laboratory settings through sensory preconditioning procedures2,6-11."  Higher-order conditioning and mediated learning are not interchangeable terms: e.g., some forms of second-order conditioning are not due to mediated learning. More generally, the use of mediated learning is not necessary for the story that the authors develop in the paper and could be replaced for accuracy and clarity. E.g., "These higher-order conditioning processes can be studied in the laboratory using sensory preconditioning procedures2,6-11." 

      According to the Reviewer proposal, we have modified the text. 

      In reference to Experiment 2, it is stated that: "However, when light and tone were separated on time (Unpaired group), male mice were not able to exhibit mediated learning response (Figure 2B) whereas their response to the light (direct learning) was not affected (Figure 2D). On the other hand, female mice still present a lower but significant mediated learning response (Figure 2C) and normal direct learning (Figure 2E). Finally, in the No-Shock group, both male (Figure 2B and 2D) and female mice (Figure 2C and 2E) did not present either mediated or direct learning, which also confirmed that the exposure to the tone or light during Probe Tests do not elicit any behavioral change by themselves as the presence of the electric footshock is required to obtain a reliable mediated and direct learning responses."  The absence of a difference between the paired and unpaired female mice should not be described as "significant mediated learning" in the latter. It should be taken to indicate that performance in the females is due to generalization between the tone and light. That is, there is no sensory preconditioning in the female mice. The description of performance in the No-shock group really shouldn't be in terms of mediated or direct learning: that is, this group is another control for assessing the presence of sensory preconditioning in the group of interest. As a control, there is no potential for them to exhibit sensory preconditioning, so their performance should not be described in a way that suggests this potential. 

      All these comments are very pertinent and were also raised by Reviewer #2 (Point 1, see above). In the revised version of the manuscript, we have carefully changed, when necessary, our interpretation of the results (e.g. in the case of the No-Shock group). In addition, we have generated new data that confirm that using similar conditions (i.e. 2 conditioning sessions in our SPC) in female mice we observe fear generalization and not a confident sensory preconditioning responding. In our opinion, this does not rule out the presence of mediated learning in female mice but suggests that adapted protocols must be used in each sex. These results forced us to change the organization of the Figures, but we hope the reviewer will agree with all the changes proposed. In addition, we have rewritten a paragraph in the Discussion Section to explain these sex differences (see Page 15, lines 12-37).

      Methods - Behavior 

      I appreciate the reasons for testing the animals in a new context. This does, however, raise other issues that complicate the interpretation of any hippocampal engagement: e.g., exposure to a novel context may engage the hippocampus for exploration/encoding of its features - hence, it is engaged for retrieving/expressing sensory preconditioned fear to the tone. This should be noted somewhere in the paper given that one of its aims is to shed light on the broader functioning of the hippocampus in associative processes. 

      This general issue - that the conditions of testing were such as to force engagement of the hippocampus - is amplified by two further features of testing with the tone. The first is the presence of background noise in the training context and its absence in the test context. The second is the fact that the tone was presented for 30 s in stage 1 and then continuously for 180s at test. Both changes could have contributed to the engagement of the hippocampus as they introduce the potential for discrimination between the tone that was trained and tested. 

      We have now added these pertinent comments in a “Study limitations” paragraph found in the Discussion Section (Page 17, Lines 9-24). Indeed, the different changes of context (including the presence of background noise) have been implemented by the fact that during the setting up of the paradigm we had problems of fear generalization (also in male mice). Similarly, differences in cue exposure between the preconditioning phase and the test phase were also decided based on important differences between previous protocols used in rats compared to how mice are responding. Certainly, mice were not able to adapt their behavioral responses when shorter time windows exposing the cue were used as it clearly happens with rats [1].

      Results - Behavior 

      The suggestion of sex differences based on differences in the parameters needed to generate sensory preconditioning is interesting. Perhaps it could be supported through some set of formal analyses. That is, the data in supplementary materials may well show that the parameters needed to generate sensory preconditioning in males and females are not the same. However, there needs to be some form of statistical comparison to support this point. As part of this comparison, it would be neat if the authors included body weight as a covariate to determine whether any interactions with sex are moderated by body weight.  

      Regarding the comparison between male and female mice, although the comments of the Reviewer are pertinent and interesting, we think that, with the new data generated, it is not appropriate to compare both sexes as we still have to optimize the SPC protocol for female mice.

      What is the value of the data shown in Figure 1 given that there are no controls for unpaired presentations of the sound and light? In the absence of these controls, the experiment cannot have shown that "Female and male mice show mediated learning using an auditory-visual sensory preconditioning task" as implied by its title. Minimally, this experiment should be relabelled. 

      Based on the new data generated with female mice, we have decided to remove Figure 1 and re-organize the structure of the manuscript. We hope that the Reviewer would agree that this has improved the clarity of the manuscript.  

      "Altogether, this data confirmed that we successfully set up an LTSPC protocol in mice and that this behavioral paradigm can be used to further study the brain circuits involved in higher-order conditioning."  Please insert the qualifier that LTSPC was successfully established in male mice. There is no evidence of LTSPC in female mice.

      We fully agree with the Reviewer and our new findings further confirm this issue. Thus, we have changed the statement in the revised version of the manuscript.  

      Results - Brain 

      "Notably, the inhibition of CaMKII-positive neurons in the dHPC (i.e. J60 administration in DREADD-Gi mice) during preconditioning (Figure 4B), but not before the Probe Test 1 (Figure 4B), fully blocked mediated, but not direct learning (Figure  4D)." The right panel of Figure 4B indicates no difference between the controls and Group DPC in the percent change in freezing from OFF to ON periods of the tone. How does this fit with the claim that CaMKII-positive neurons in the dorsal hippocampus regulate associative formation during the session of tone-light exposures in stage 1 of sensory preconditioning? 

      To improve the quality of the figures and to avoid possible redundancies between panels, in the new version of the manuscript, we have decided to remove all the panels regarding the percentage of change. However, in our opinion regarding the issue raised by the Reviewer, the inhibition of the dHPC clearly induced an impairment of mediated learning as animals do not change their behavior (i.e. there is no significant increase of freezing between OFF and ON periods) when the tone appears in comparison with the other two groups. The graphs indicating the percentage of change (old version of the manuscript) were a different way to show the presence of tone- or light-induced responses in each experimental group. Thus, a significant effect (shown by # symbol) meant that in that specific experimental group there was a significant change in behavior (freezing) when the cue (tone or light) appeared compared with when there was no cue (OFF period). Thus, in the old panel 4B commented on by the Reviewer, in our opinion, the absence of significance in the group where the dHPC has been inhibited during the preconditioning, compared to the other groups, where a clear significant effect can be observed, indicates an impairment of mediated learning formation. However, to avoid any confusion, we have slightly modified the text to strictly mention what is being analyzed and/or shown in the graphs and, as mentioned, the graphs of percentage of change have been removed.

      Discussion 

      "When low salience stimuli were presented separated on time or when the electric footshock was absent, mediated and direct learning were abolished in male mice. In female mice, although light and tone were presented separately during the preconditioning phase, mediated learning was reduced but still present, which implies that female mice are still able to associate the two low-salience stimuli." 

      This doesn't quite follow from the results. The failure of the female unpaired mice to withhold their freezing to the tone should not be taken to indicate the formation of a light-tone association across the very long interval that was interpolated between these stimulus presentations. It could and should be taken to indicate that, in female mice, freezing conditioned to the light simply generalized to the tone (i.e., these mice could not discriminate well between the tone and light). 

      As discussed above, we fully agree with the Reviewer and all the manuscript has been modified as described above. 

      "Indeed, our data suggests that when hippocampal activity is modulated by the specific manipulation of hippocampal subregions, this brain region is not involved during retrieval."  Does this relate to the results that are shown in the right panel of Figure 4B, where there is no significant difference between the different groups? If so, how does it fit with the results shown in the left panel of this figure, where differences between the groups are observed? 

      "In line with this, the inhibition of CaMKII-positive neurons from the dorsal hippocampus, which has been shown to project to the restrosplenial cortex56, blocked the formation of mediated learning." 

      Is this a reference to the findings shown in Figure 4B and, if so, which of the panels exactly? That is, one panel appears to support the claim made here while the other doesn't. In general, what should the reader make of data showing the percent change in freezing from stimulus OFF to stimulus ON periods? 

      In our opinion, as pointed out above, the graphs indicating the percentage of change were a different way to show the presence of tone- or light-induced behavioral responses in each experimental group. Thus, a significant effect (shown by # symbol) meant that in this specific experimental group there was a significant change in behavior (freezing) when the cue (tone or light) appeared compared with when there was no cue (OFF period). Thus, in the old panel 4B commented on by the Reviewer, in our opinion, the absence of significance in the group where the dHPC has been inhibited during the preconditioning, compared to the other groups where a clear significant effect can be observed, indicates an impairment of mediated learning formation. In the revised version of the manuscript, we have rephrased these sentences to stick to what the graphs are showing and, as explained, the graphs of percentage of change have been removed.

      Reviewer #1 (Recommendations for the authors): 

      The authors may address the following questions: 

      (1) The study identifies major sex differences in the conditioning phase, with females showing faster learning. Since hormonal fluctuations can influence learning and behavior, it would be helpful for the authors to comment on whether they tracked the estrous cycle of the females and whether any potential effects of the cycle on mediated learning were considered. 

      This is a relevant and important point raised by the Reviewer. In our study we did not track the estrous cycle to investigate whether there is any effect of the cycle on mediated learning, which could be an interesting project by itself. Although in the revised version of the manuscript we provide new information regarding the mediated learning performance in male and female mice, we agree with the reviewer that sex hormones may account for the observed sex differences. However, the aim of the present work was to explore potential sex differences in mediated learning responding rather than to investigate the specific mechanisms behind these potential sex differences.

      For this reason and to avoid adding further complexity to our present study, we did not check the estrous cycle in the female mice, the testosterone levels in male mice or analyze the amount of sex hormones during different phases of the sensory preconditioning task. Indeed, we think that checking the estrous cycle in female mice would still not be enough to ascertain the role of sex hormones because checking the androgen levels in male mice would also be required. In line with this, meta-analysis of neuroscience literature using the mouse model as research subjects [2-4]  has revealed that data collected from female mice (regardless of the estrous cycle) did not vary more than the data from males. In conclusion, we think that using randomized and mixed cohorts of male and female mice (as in the present study) would provide the same degree of variability in both sexes. Nevertheless, we have added a sentence to point to this possibility in the Discussion Section (Page 15, lines 32-37). 

      (2) The rationale for including parvalbumin (PV) cells in the study could be clarified. Is there prior evidence suggesting that this specific cell type is involved in mediated learning? This could apply to sensory stimuli not used in the current study.

      In the revised version of the manuscript, we have better clarified why we targeted PV interneurons, specifically mentioning previous studies [5] (see Page 11, Lines 27-34). 

      (3) The photometry recordings from the dHPC during the preconditioning phase, shown in Figure 3, are presented as average responses. It would be beneficial to separate the early vs. late trials to examine whether there is an increase in hippocampal activity as the associative learning progresses, rather than reporting the averaged data. Additionally, to clarify the dynamics of the dHPC in associative learning, the authors could compare the magnitude of photometry responses when light and tone stimuli are presented individually in separate sessions versus when they are presented closely in time to facilitate associative learning.

      As commented above, according to the Reviewer’s comment, we have now included a new Supplementary Figure 4, which splits the photometry data by the different preconditioning and conditioning sessions. Overall, these data suggest that there are no major changes in cell activity in both hippocampal regions during the different sessions, as similar light-tone-induced enhancement of activity is observed. There is only an interesting trend in the activity of Pan-Neurons over the onset of light during conditioning sessions. All this is included now in the Results Section (Page 12, Line 13-15).

      (4) The authors note that PV cell responses recorded with GCaMP were similar to general hippocampal neurons, yet chemogenetic manipulations of PV cells did not impact behavior. A more detailed discussion of this discrepancy would be helpful. 

      As suggested by the Reviewer, we have included additional Discussion to explain the potential discrepancy between the activity of PV interneurons assessed by photometry and its modulation by chemogenetics (see Page 16, Lines 27-33).   

      (5) All fiber photometry recordings were conducted in male mice. Given the sex differences observed in associative learning, the authors could expand the study to include dHPC responses in females during both preconditioning and conditioning sessions. 

      We appreciate the comment of the Reviewer. Indeed, thanks to other comments made by other Reviewers in this revision (see Point 1 of Reviewer #2), we are still not sure that we have an optimal protocol to study mediated learning in female mice due to sex-specific changes related to fear generalization. Thus, the revised version of the manuscript, although highlighting these sex differences in behavioral performance (see Supplementary Figure 2), is more focused on male mice and, accordingly, all photometry or chemogenetic experiments are performed exclusively using male mice. In future studies, once we are sure to have a sensory preconditioning paradigm working in female mice, it will be very interesting to study if the same hippocampal mechanisms mediating this behavior in male mice are also observed in female mice.

      Minor Comments: 

      (1) In the right panel of Figure 2A, females received only one conditioning session, so the "x2" should be corrected to "x1" conditioning to accurately reflect the data. 

      We thank the Reviewer for the comment that has been addressed in the revised version of the manuscript.  

      (2) The overall presentation of Figure 3 could be improved. For example, the y-axis in Panel B could be cut to a maximum of 3 rather than 6, which would better highlight the response data. Alternatively, including heatmap representations of the z-score responses could enhance clarity and visual impact.  

      We thank the Reviewer for the comment, which has been addressed by providing a new format for Figures 2 and 3 in the revised version of the manuscript.   

      (3) There are several grammatical errors throughout the manuscript. It is recommended that the authors use a grammar correction tool to improve the overall writing quality and readability.  

      We have tried to correct the grammar throughout the manuscript.  

      Reviewer #2 (Recommendations for the authors):  

      (1) In the abstract the authors write that sensory preconditioning requires the "repeated and simultaneous presentation of two low-salience stimuli such as a light and a tone". Previous research has shown that sensory preconditioning can still occur if the two stimuli are presented serially, rather than simultaneously. Further, the tone and the light are not necessarily "low-salience", for example, they can be loud or bright. It would be better to refer to them as innocuous. 

      In the revised version of the abstract, we have included the modifications suggested by the Reviewer.   

      (2) The authors develop a novel automated tool for assessing freezing behaviour in mice that correlates highly with both manual freezing and existing, open-source freeze estimation software (ezTrack). The authors should explain how the new program differs from ezTrack, or if it provides any added benefit over this existing software. 

      We have added new information in the Results section (Page 10, Lines 13-20) to better explain how the new tool for quantifying freezing could improve on existing software.  

      (3) In Experiment 1, the authors report a sex difference in levels of freezing between male and female mice when they are only given one session of sensory preconditioning. This should be supported by a statistical comparison of levels of freezing between male and female mice. 

      Based on the new results obtained with female mice, we have decided to remove the original Figure 1 of the manuscript as it is not meaningful to compare male and female mediated learning response if we do not have an optimal protocol in female mice.  

      (4) Why did the authors choose to vary the duration of the stimuli across preconditioning, conditioning, and testing? During preconditioning, the light-tone compound was 30s, in conditioning the light was 10s, and at test both stimuli were presented continuously for 3 min. Did the level of freezing vary across the three-minute probe session? There is some evidence that rodents can learn the timing of stimuli and it may be the case that freezing was highest at the start of the test stimulus, when it most closely resembled the conditioned stimulus. 

      Differences in cue exposure between the preconditioning phase and the test phase were chosen because of important differences between previous protocols used in rats and how mice respond. Indeed, mice were not able to adapt their behavioral responses when the cue was presented in shorter time windows, as clearly happens with rats (1). In addition, we have added a new graph to show the time course of the behavioral responses (see Figures 1 and 4 and Supplementary Figure 2), which correlates with the quantification of freezing responses shown as the percentage of freezing during ON and OFF periods.   

      (5) The title of Experiment 1 "Female and male mice show mediated learning using an auditory-visual sensory preconditioning task" - this experiment does not demonstrate mediated learning; it merely shows that animals will freeze more in the presence of a stimulus as compared with no stimulus. This experiment lacks the necessary controls to claim mediated learning (which are presented in Experiment 2) and should therefore be retitled something more appropriate.

      As stated above, based on the new results obtained with female mice, we have decided to remove the original Figure 1 of the manuscript as it is not meaningful to compare male and female mediated learning response if we do not have an optimal protocol in female mice.   

      (6) In Figure 2, why does the unpaired group show less freezing to the tone than the paired group given that the tone was directly paired with the shock in both groups? 

      We believe the Reviewer may have referred to the tone in error (i.e. there are no differences in the freezing observed to the tone) and may instead be referring to the freezing induced by the light in the direct learning test. In this case, it is true that direct learning (e.g. the percentage of freezing) seems to be slightly lower in the unpaired group than in the paired one, which could be due to a latent inhibition process caused by the different exposure to cues in the paired and unpaired experimental groups. However, direct learning in both groups is clear and significant, and there are no significant differences between them, which makes it difficult to draw any further conclusion. 

      (7) The stimuli in the design schematics are quite small and hard to see, they should be enlarged for clarity. The box plots also looked stretched and the colour difference between the on and off periods is difficult to discern. 

      We have made some important modifications to the Figures in order to address the comments made by the Reviewer and improve their quality.   

      (8) The authors do not include labels for the experimental groups (paired, unpaired, no shock) in Figures 2B, 2D, 2C, and 2E. This made it very difficult to interpret the figure.  

      Figure 2 has been changed according to this suggestion. 

      (9) The levels of freezing during conditioning should be presented for all experiments.  

      We have generated a new Supplementary Figure 9 to show the freezing levels during conditioning sessions. 

      (10) In the final experiment, the authors wrote that mice were injected with J60 or saline, but I could not find the data for the saline animals.  

      In the Results and Methods sections, we have included a sentence to better explain this issue. In addition, we have added a new Supplementary Figure 7 to show the performance of all control groups.  

      (11) Please list the total number of animals (per group, per sex) for each experiment.  

      In the revised version of the manuscript, we have added this information in each Figure Legend.  

      Reviewer #3 (Recommendations for the authors): 

      I found this study very interesting, despite a few weaknesses. I have several minor comments to add, hoping that it would improve the manuscript: 

      (1) The terminology used is not always appropriate/consistent. I would use "freely moving fiber photometry" or simply "fiber photometry" as calcium imaging conventionally refers to endoscopic or 2-photon calcium imaging. 

      We thank the Reviewer for this comment that has been addressed and corrected in the revised version of the manuscript. 

      (2) "Dorsal hippocampus mediates light-tone sensory preconditioning task in mice" suggests that a brain region mediates a task. I would rather suggest, e.g. "Dorsal hippocampus mediates light-tone association in mice" 

      We thank the Reviewer for this comment that has been addressed and corrected in the revised version of the manuscript.

      (3) As you are using low-salience stimuli, it would be better to also inform the readership with the light intensity used for the light cue, for replicability purposes. 

      In the Methods section (Page 5, Line 30), we have added new information regarding the visual stimuli used. 

      (4) If the authors didn't use a background noise during the probe tests, the tone cue could have been perceived as being louder/clearer by mice. Couldn't it have inflated the freezing response for the tone cue?  

      This is an interesting comment made by the Reviewer, although we do not have any data to directly answer his/her suggestion. However, the background noise proved necessary to set up the protocol and to change different aspects of the context throughout the paradigm, which was necessary to avoid fear generalization in mice. In addition, as demonstrated before (6), the presence of background noise is important to prevent another auditory cue (i.e. the tone) from inducing fear responses by itself, as the transition from noise to silence is a danger signal for animals. 

      (5) "salience" is usually used for the intensity of a stimulus, not for an association or pairing. Rather, we usually refer to the strength of an association. 

      We thank the Reviewer for this comment that has been addressed and corrected in the revised version of the manuscript.

      (6) Figure 3, panel A. "RCaMP Neurons", maybe "Pan-Neurons" would be more appropriate, as PV+ inter-neurons are also neurons. 

      We thank the Reviewer for this comment that has been corrected accordingly.

      (7) Figure 4, panel A, please add the AAV injected, and the neurons labelled in your example slice. 

      We thank the Reviewer for this comment that has been corrected accordingly.

      References

      (1) Wong, F. S., Westbrook, R. F. & Holmes, N. M. 'Online' integration of sensory and fear memories in the rat medial temporal lobe. Elife 8 (2019). https://doi.org:10.7554/eLife.47085

      (2) Prendergast, B. J., Onishi, K. G. & Zucker, I. Female mice liberated for inclusion in neuroscience and biomedical research. Neurosci Biobehav Rev 40, 1-5 (2014). https://doi.org:10.1016/j.neubiorev.2014.01.001

      (3) Becker, J. B., Prendergast, B. J. & Liang, J. W. Female rats are not more variable than male rats: a meta-analysis of neuroscience studies. Biol Sex Differ 7, 34 (2016). https://doi.org:10.1186/s13293-016-0087-5

      (4) Shansky, R. M. Are hormones a "female problem" for animal research? Science 364,  825-826 (2019). https://doi.org:10.1126/science.aaw7570

      (5) Busquets-Garcia, A. et al. Hippocampal CB1 Receptors Control Incidental Associations. Neuron 99, 1247-1259 e1247 (2018). https://doi.org:10.1016/j.neuron.2018.08.014

      (6) Pereira, A. G., Cruz, A., Lima, S. Q. & Moita, M. A. Silence resulting from the cessation of movement signals danger. Curr Biol 22, R627-628 (2012). https://doi.org:10.1016/j.cub.2012.06.015

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      SMC5/6 is a highly conserved complex able to dynamically alter chromatin structure, playing in this way critical roles in genome stability and integrity that include homologous recombination and telomere maintenance. In recent years, a number of studies have revealed the importance of SMC5/6 in restricting viral expression, which is in part related to its ability to repress transcription from circular DNA. In this context, Oravcova and colleagues recently reported how SMC5/6 is recruited by two mutually exclusive complexes (orthologs of yeast Nse5/6) to SV40 LT-induced PML nuclear bodies (SIMC/SLF2) and DNA lesions (SLF1/2). In this current work, the authors extend this study, providing some new results. However, as a whole, the story lacks unity and does not delve into the molecular mechanisms responsible for the silencing process. One has the feeling that the story is somewhat incomplete, assembling results that are not directly connected.

      Please see the introductory overview above.

      (1) In the first part of the work, the authors confirm previous conclusions about the relevance of a conserved domain defined by the interaction of SIMC and SLF2 for their binding to SMC6, and extend the structural analysis to the modelling of the SIMC/SLF2/SMC complex by AlphaFold. Their data support a model where this conserved surface of SIMC/SLF2 interacts with SMC at the backside of SMC6's head domain, confirming the relevance of this interaction site with specific mutations. These results are interesting but confirmatory of a previous and more complete structural analysis in yeast (Li et al. NSMB 2024). In any case, they reveal the conservation of the interaction. My major concern is the lack of connection with the rest of the article. This structure does not help to understand the process of transcriptional silencing reported later beyond its relevance to recruit SMC5/6 to its targets, which was already demonstrated in the previous study.

      Demonstrating the existence of a conserved interface between the Nse5/6-like complexes and SMC6 in both yeast and human is foundationally important, not confirmatory, and was not revealed in our previous study. It remains unclear how this interface regulates SMC5/6 function, but yeast studies suggest a potential role in inhibiting the SMC5/6 ATPase cycle. Nevertheless, the precise function of Nse5/6 and its human orthologs in SMC5/6 regulation remains undefined, largely due to technical limitations in available in vivo analyses. The SIMC1/SLF2/SMC6 complex structure likely extends to the SLF1/2/SMC6 complex, suggesting a unifying function of the Nse5/6-like complexes in SMC5/6 regulation, albeit in the distinct processes of ecDNA silencing and DNA repair. There have been no studies to date (including this one) showing that SIMC1-SLF2 is required for SMC5/6 recruitment to ecDNA. Our previous study showed that SIMC1 was needed for SMC5/6 to colocalize with SV40 LT antigen at PML NBs. Here we show that SIMC1 is required for ecDNA repression, in the absence of PML NBs, which was not anticipated.

      (2) In the second part of the work, the authors focus on the functionality of the different complexes. The authors demonstrate that SMC5/6's role in transcription silencing is specific to its interaction with SIMC/SLF2, whereas SMC5/6's role in DNA repair depends on SLF1/2. These results are quite expected according to previous results. The authors already demonstrated that SLF1/2, but not SIMC/SLF2, are recruited to DNA lesions. Accordingly, they observe here that SMC5/6 recruitment to DNA lesions requires SLF1/2 but not SIMC/SLF2. Likewise, the authors already demonstrated that SIMC/SLF2, but not SLF1/2, targets SMC5/6 to PML NBs. Taking into account the evidence that connects SMC5/6's viral resistance at PML NBs with transcription repression, the observed requirement of SIMC/SLF2 but not SLF1/2 in plasmid silencing is somewhat expected. This does not mean that the expectation need not be experimentally confirmed. However, the study falls short in advancing the mechanistic process, despite some interesting results such as the dispensability of the PML NBs or the antagonistic role of the SV40 large T antigen. It would have been interesting to explore how LT overcomes SMC5/6-mediated repression: Does LT prevent SIMC/SLF2 from interacting with SMC5/6? Or does it prevent SMC5/6 from binding the plasmid? Is the transcription-dependent plasmid topology altered in cells lacking SIMC/SLF2? And in cells expressing LT? In its current form, the study is confirmatory and preliminary. In agreement with this, the cartoons modelling results here and in the previous work look basically the same.

      Our previous study only examined the localization of SLF1 and SIMC1 at DNA lesions. The localization of these subcomplexes alone should not be used to define their roles in SMC5/6 localization. Indeed, the field is split in terms of whether Nse5/6-like complexes are required for ecDNA binding/loading, or regulation of SMC5/6 once bound. 

      We agree, determining the potential mechanism of action of LT in overcoming SMC5/6-based repression is an important next step. We believe it is unlikely to be due to blocking of the SMC5/6-SIMC1/SLF2 interface, since SIMC1-SLF2 is required for SMC5/6 to localize at LT-induced foci. It will require the identification of any direct interactions with SMC5/6 subunits, and better methods for assessing SMC5/6 loading and activity on ecDNAs. Unlike HBx, Vpr, and BNRF1, LT does not appear to induce degradation of SMC5/6, making it a more complex and interesting challenge. Also, the dispensability of PML NBs in plasmid silencing versus viral silencing raises multiple important questions about SMC5/6’s repression mechanism. 

      (3) There are some points about the presented data that need to be clarified.

      Thank you, we have addressed these points below, within the Recommendations for authors section.

      Reviewer #2 (Public review):

      Oravcová et al. present data supporting a role for SIMC1/SLF2 in silencing plasmid DNA via the SMC5/6 complex. Their findings are of interest, and they provide further mechanistic detail of how the SMC5/6 complex is recruited to disparate DNA elements. In essence, the present report builds on the authors' previous paper in eLife in 2022 (PMID: 36373674, "The Nse5/6-like SIMC1-SLF2 complex localizes SMC5/6 to viral replication centers") by showing the role of SIMC1/SLF2 in localisation of the SMC5/6 complex to plasmid DNA, and the distinct requirements as compared to recruitment to DNA damage foci. Although the findings of the manuscript are of interest, we are not yet convinced that the new data presented here represent a compelling new body of work; the manuscript would better fit the format of a "research advance" article. In their previous paper, Oravcová et al. show that the recruitment of SMC5/6 to SV40 replication centres is dependent on SIMC1, and specifically, that it is dependent on SIMC1 residues adjacent to neighbouring SLF2.

      We agree. We submitted this manuscript as a “Research Advance”, not as a standalone research article, given that it is an extension of our previous “Research Article” (1).

      Other comments

      (1) The mutations chosen in Figure 1 are quite extensive - 5 amino acids per mutant. In addition, they are in many cases 'opposite' changes, e.g., positive charge to negative charge. Is the effect lost if single mutations to an alanine are made?

      The mutations were chosen to test and validate the predicted SIMC1-SLF2-SMC6 structure i.e. the contact point between the conserved patch of SIMC1-SLF2 and SMC6. Multiple mutations and charge inversions increased the chance of disrupting the extensive interface. In this respect, the mutations were successful and informative, confirming the requirement of this region in specifically contacting SMC6. Whilst alanine scanning mutations are possible, we believe that they would not add to, or detract from, our validation of the predicted SIMC1-SLF2-SMC6 interface.

      (2) In Figure 2c, it isn't clear from the data shown that the 'SLF2-only' mutations in SMC6 result in a substantial reduction in SIMC1/SLF2 binding.

      To clarify the difference between wild-type and SLF2-only mutations in SIMC1-SLF2 interaction, we have performed an image volume analysis. This shows that the SLF2-facing SMC6 mutant reduces its interaction with SIMC1 (to 44% of WT) and SLF2 (to 21% of WT). The reduction in both SIMC1 and SLF2 interaction with SMC6 SLF2-facing mutant is expected, since SIMC1 and SLF2 are an interdependent heterodimer.  

      Author response table 1.

      (3) In the GFP reporter assays (e.g. Figure 3), median fluorescence is reported - was there any observed difference in the percentage of cells that are GFP positive?

      Yes, as expected when the GFP plasmid is not actively repressed, the percentage of GFP-positive cells differs between cell lines, following the same trend as GFP intensity.

      (4) The potential role of the large T antigen as an SMC5/6 evasion factor is intriguing. However, given the role of the large T antigen as a transcriptional activator, caution is required when interpreting enhanced GFP fluorescence. Antagonism of the SMC5/6 complex in this context might be further supported by ChIP experiments in the presence or absence of large T. Can large T functionally substitute for HBx or HIV-Vpr?

      We agree, the potential role of LT in SMC5/6 antagonism is interesting. We did state in the text: “While LT is known to be a promiscuous transcriptional activator (2,3) that does not rule out a co-existing role in antagonizing SMC5/6. Indeed, these findings are reminiscent of HBx from HBV and Vpr of HIV-1, both of which are known promiscuous transcriptional activators that also directly antagonize SMC5/6 to relieve transcriptional repression (4-10).” We have tried ChIP experiments, but found these to be unreliable in assessing SMC5/6 association with plasmid DNA. Given the many disparate targets of LT, HBx and Vpr (other than SMC5/6), it seems unlikely that LT could functionally substitute for HBx and Vpr in supporting HBV and HIV-1 infections. Whilst certainly an interesting future question, we believe it is beyond the scope of this study.

      (5) In Figure 5c, the apparent molecular weight of large T and SMC6 appears to change following transfection of GFP-SMC5 - is there a reason for this?

      We are not certain as to what causes the molecular weight shift, but it is not specifically related to GFP-SMC5 transfection. Rather, it appears to be a general effect of the pulldown. Indeed, a very weak “background” band of LT is seen in the GFP only pulldown, which also runs at a “higher” molecular weight, as in the GFP-SMC5 pulldown. We believe that the effect is instead related to gel mobility in the wells that contain post pulldown proteins and different buffers. We have also seen similar effects using different protein-protein interaction pairs. 

      Reviewer #3 (Public review):

      Summary:

      This study by the Boddy and Otomo laboratories further characterizes the roles of SMC5/6 loader proteins and related factors in SMC5/6-mediated repression of extrachromosomal circular DNA. The work shows that mutations engineered at an AlphaFold-predicted protein-protein interface formed between the loader SLF2/SIMC1 and SMC6 (similar to the interface in the yeast counterparts observed by cryo-EM) prevent co-IP of the respective proteins. The mutations in SLF2 also hinder plasmid DNA silencing when expressed in SLF2-/- cell lines, suggesting that this interface is needed for silencing. SIMC1 is dispensable for recruitment of SMC5/6 to sites of DNA damage, while SLF1 is required, thus separating the functions of the two loader complexes. Preventing SUMOylation (with a chemical inhibitor) increases transcription from plasmids but does not do so in SLF2-deleted cell lines, indicating that SMC5/6 silences plasmids in a SUMOylation-dependent manner. Expression of LT is sufficient for increased expression, and again, this is not additive or synergistic with SIMC1 or SLF2 deletion, indicating that LT prevents silencing by directly inhibiting SMC5/6. In contrast, PML bodies appear dispensable for plasmid silencing.

      Strengths:

      The manuscript defines the requirements for plasmid silencing by SMC5/6 (an interaction of Smc6 with the loader complex SLF2/SIMC1, SUMOylation activity) and shows that SLF1 and PML bodies are dispensable for silencing. Furthermore, the authors show that LT can overcome silencing, likely by directly binding to (but not degrading) SMC5/6.

      Weaknesses:

      (1) Many of the findings were expected based on recent publications.

      There have been no manuscripts describing the role of SIMC1-SLF2 in ecDNA silencing. There have been studies describing SLF2’s roles in ecDNA silencing, but these suggested SLF2 had an SLF1 independent role, with no mention of an alternate Nse5-like cofactor. Our earlier study in eLife (1) described the identification of SIMC1 as an Nse5-like cofactor for SLF2 but did not test potential roles of the complex in ecDNA silencing. Also, the apparent dispensability of PML NBs in plasmid silencing (in U2OS cells) was unexpected based on recent publications. Finally, SV40 LT has not previously been implicated in SMC5/6 inhibition, which may occur through novel mechanisms.

      (2) While the data are consistent with SIMC1 playing the main function in plasmid silencing, it is possible that SLF1 contributes to silencing, especially in the absence of SIMC1. This would potentially explain the discrepancy with the data reported in ref. 50. SLF2 deletion has a stronger effect on expression than SIMC1 deletion in many but not all experiments reported in this manuscript. Double mutant/deletion experiments would be useful to explore this possibility.

      It is interesting to note that the data in ref. 50 (11) is also at odds with that in ref. 45 (8) in terms of defining a role for SLF1 in the silencing of unintegrated HIV-1 DNA. The Irwan study showed that SLF1 deficient cells exhibit increased expression of a reporter gene from unintegrated HIV-1, whereas the Dupont study found that SLF1 deletion, unlike SLF2 deletion, has no effect. It is unclear what the basis of this discrepancy is. In line with the Dupont study, we found no effect of SLF1 deletion on plasmid expression (Figure 4B), whereas SLF2 deletion increased reporter expression (Figure 3A/B). It is possible that SLF1 could support some plasmid silencing in the absence of SIMC1, especially considering the gross structural similarity in their C-terminal Nse5-like domains. However, we have been unable to generate double-knockout SIMC1 and SLF1 cells to test such a possibility, and shSLF1 has been ineffective. 

      (3) SLF2 is part of both types of loaders, while SLF1 and SIMC1 are specific to their respective loaders. Did the authors observe differences in phenotypes (growth, sensitivities to DNA damage) when comparing the mutant cell lines or their construction? This should be stated in the manuscript.

      We have not observed significant differences in the growth rates of each cell line, and DNA damage sensitivities are as yet untested.   

      (4) It would be desirable to have control reporter constructs located on the chromosome for several experiments, including the SUMOylation inhibition (Figures 5A and 5-S2) and LT expression (Figure 5D) to exclude more general effects on gene expression.

      We have repeated all GFP reporter assays using integrated versus episomal plasmid DNA. A seminal study by Decorsière et al. (6) showed that SMC5/6 degradation by HBx of HBV increased transcription of episomal but not chromosomally integrated reporters. In line with this data, the deletion of SLF2 does not notably impact the expression of our GFP reporter construct when it is genomically integrated (Figure 3—figure supplement 1C).  

      Somewhat surprisingly, given the generally transcriptionally repressive roles of SUMO, inhibition of the SUMO pathway with SUMOi did not significantly impact the expression of our genomically integrated GFP reporter, versus the episomal plasmid (Figure 5—figure supplement 1C). Finally, the expression of SV40 LT, which enhances plasmid reporter expression (Figure 5D), also did not notably affect expression of the same reporter when located in the genome (Figure 5—figure supplement 3B). This is an interesting result, which is in line with an early study showing that HBx of HBV induces transcription from episomal, but not chromosomally integrated reporters (12). This further suggests that SV40 LT acts similarly to other early viral proteins like HBx and Vpr to counteract or bypass SMC5/6 restriction, amongst their multifaceted functions. Clearly, further analyses are needed to define mechanisms of LT in counteracting SMC5/6, but they do not appear to include complex degradation as seen with HBx and Vpr.  

      (5) Figure 5A: There appears to be an increase in GFP in the SLF2-/- cells with SUMOi? Is this a significant increase?

      No significant difference was found between WT, SIMC1-/- or SLF2-/- when treated with SUMOi (p>0.05). The p-value is 0.0857 when comparing SLF2-/- to WT in the SUMOi condition. This is described in the legend to Figure 5.

      (6) The expression level of SLF2 mut1 should be tested (Figure 3B).

      Full length SLF2 (WT or mutants) has been undetectable by western analyses. However, truncated SLF2 mut1 expresses well and binds SIMC1 but not SMC6 (Figure 1C). Moreover, full length SLF2 mut1 expression was confirmed by qPCR – showing a somewhat higher expression level than SLF2 WT (Figure 3—figure supplement 1B).  

      Reviewer #1 (Recommendations for the authors):

      There are some points about the presented data that need to be clarified.

      (1) Figures 3, 4B, and 5. The authors should rule out the possibility that the reported effects on transcription were due to alterations in plasmid number. This is particularly important, taking into account the importance of SMC5/6 in DNA replication.

      We used qPCR to assess plasmid copy number versus genomic DNA in our cell lines, testing at 72 hours post transfection to avoid any impact of cytosolic DNA (13). Our qPCR data show that there is no significant impact on plasmid copy number across our cell lines i.e. WT and SLF2 null.  SMC5/6 has a positive role in DNA replication progression on the genome (e.g. (14)), so loss of SMC5/6 “targeting” in SIMC1 and SLF2 null cells would be unlikely to promote replication fork progression per se. 
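
      As an illustration only, the kind of comparison described above (plasmid signal normalized to a genomic reference, then compared between cell lines) can be sketched with a delta-Ct calculation. The response does not state how the qPCR data were analyzed, so the function names, the assumed amplification efficiency of 2 per cycle, and the Ct values below are all hypothetical and are not data from the study.

```python
# Minimal delta-Ct sketch; all names and numbers are hypothetical illustrations.

def relative_plasmid_copies(ct_plasmid, ct_genomic, efficiency=2.0):
    """Plasmid signal relative to a genomic reference amplicon (delta-Ct).

    Assumes both amplicons amplify with the same per-cycle efficiency.
    """
    return efficiency ** -(ct_plasmid - ct_genomic)


def fold_change_vs_control(sample, control, efficiency=2.0):
    """Ratio of relative plasmid copies, sample over control (delta-delta-Ct).

    `sample` and `control` are (ct_plasmid, ct_genomic) tuples.
    """
    return (relative_plasmid_copies(*sample, efficiency)
            / relative_plasmid_copies(*control, efficiency))


# Made-up Ct values: (plasmid amplicon, genomic amplicon).
wild_type = (22.0, 25.0)
knockout = (22.0, 25.0)
print(fold_change_vs_control(knockout, wild_type))
```

      A fold change close to 1 in such a calculation would correspond to the situation described above, in which plasmid copy number does not differ between cell lines.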

      (2) Figure S1A. In contrast to the statement in the text, the SIMC1-combo control is affected in its binding to SLF2; however, it is not affected in its binding to SMC6. This is somehow unexpected because it suggests that the solenoid-like structure is not required for SMC6 binding, just specific patches at either SIMC or SLF2. This should be commented on.

      We appreciate the reviewer’s observation regarding the discrepancy between Figure S1A and the text. This was our oversight. The data show that SLF2 recovery was reduced in the pull-down with the SIMC1 combo control mutant, while SLF2 expression was unchanged. Because SLF2 or SIMC1 variants that fail to associate typically show poor expression (1), these findings suggest that the SIMC1 combo control mutant associates with SLF2, albeit more weakly. Since the mutations were introduced into surface residues of SIMC1, it is not immediately clear how they would weaken the interaction or destabilize the complex. In contrast, SMC6 was fully recovered with the SIMC1 combo control mutant, indicating that the SIMC1–SMC6 interaction remains stable without stoichiometric SLF2. This may reflect direct recognition of a SIMC1 binding epitope or stabilization of its solenoid structure by SMC6, although this interpretation remains uncertain given the unstable nature of free SIMC1 and SLF2. Alternatively, SMC6 may have co-sedimented with the SIMC1 combo control mutant together with SLF2, which was initially retained but subsequently lost during washing, whereas SMC6 remained due to its limited solubility in the absence of other SMC5/6 subunits. While further mechanistic analysis will require purified SMC5/6 components, our data support the AlphaFold-based model by demonstrating that SIMC1 mutations on the non–SMC6-contacting surface retain association with SMC6. The text has been revised accordingly.

      (3) The SLF2-only mutant has alterations that affect interactions with both SLF2 and SIMC1. Is it not another Mixed mutant?

      We appreciate the reviewer’s observation regarding the discrepancy between the mutant name (“SLF2-only”) and its description (“while N947 forms salt bridges with SIMC1”). The previous statement was inaccurate due to a misinterpretation of several AlphaFold models. Across these models, the SIMC1–SLF2 interface residues remain largely consistent, but the SIMC1 residue R470 exhibits positional variability, contacting N947 in some models but not in others. Given this variability and the absence of an experimental structure, we have revised the text to avoid overinterpretation. Because the N947 side chain is oriented toward SLF2 and consistently forms polar contacts with the H1148 side chain and G1149 backbone, we have renamed this mutant “SLF2-facing,” which more accurately describes its modeled environment. The other mutants are likewise renamed “SIMC1-facing” and “SIMC1–SLF2 groove-facing,” providing a clearer and more consistent description of the interface.

      (4) The SLF2-only mutant still displays clear interactions with SMC6. Can this be explained with the AlphaFold model?

      SIMC1 may contribute more substantially to SMC6 binding than SLF2, consistent with our mutagenesis results. However, the energetic contributions of individual residues or proteins cannot be quantitatively inferred from structural models alone. Comprehensive experimental and computational analyses would be required to address this point.

      (5) The conclusions about the role of SUMOylation are vague; it is already known that its general effect on transcription repression, and the authors already demonstrated that SIMC interacts with SUMO pathway factors. Concerning the epistatic effect, the experiment should be done at a lower inhibitor concentration; at 100 nM there is not much margin to augment according to the kinetics analysis in Figure S5.

      The SUMO pathway is indeed thought to be generally repressive for transcription. Notably, in response to a suggestion from Reviewer 3 (public review point 4), we have repeated several of our GFP expression assays using cells with the GFP reporter plasmid integrated into the genome (please see Figure 3—figure supplement 1C; Figure 5—figure supplement 1C; Figure 5—figure supplement 3B). This type of integrated reporter does not show elevated expression following inhibition of the SMC5/6 complex, unlike ecDNAs (6,10). Interestingly, SUMOi, LT expression, and SLF2 knockout also did not notably impact the expression of our integrated GFP reporter (Figure 3—figure supplement 1C; Figure 5—figure supplement 1C; Figure 5—figure supplement 3B), unlike that of the plasmid (ecDNA) reporter. Given the “general” inhibitory effect of SUMO on transcription, the SUMOi result was not expected, and it opens further interesting avenues for study.

      In Figure 5—figure supplement 1A, 100 nM SUMOi increases reporter expression to a level well below that seen at the highest SUMOi dose. We therefore believe that the ~3-4 fold induction of GFP expression seen in SLF2 null cells, if it were independent of SUMOylation, should further increase GFP expression in the presence of SUMOi. The impact of SUMOylation on GFP reporter expression remains “vague”, but our data indicate that SMC5/6 operates within SUMO’s “umbrella” function and provide a starting point for more mechanistic dissection.

      (6) Figure 5C. Why is the size different between Input versus GFP-PD?

      Please see our response to this question above: reviewer 2, point (5).

      Reviewer #2 (Recommendations for the authors):

      If further data could be provided to extend on that which is presented, then publication as a 'standalone research article' may be appropriate, but not in its present form.

      We submitted this manuscript as a “Research Advance” not as a standalone research article, given that it was an extension of our previous research article (1).

      Reviewer #3 (Recommendations for the authors):

      (1) The term 'LT' should be defined in the title

      We have updated the title accordingly.  

      (2) This reviewer found the nomenclature of the SMC6 mutants confusing (SIMC1-only...). Either rephrase or define more clearly in the text and the figures.

      We agree with the reviewer and have renamed the mutants “SIMC1-facing”, “SLF2-facing”, and “SIMC1–SLF2-groove-facing”.

      (3) The authors could better emphasize that LT blocks silencing in trans (not only on its cognate target sequence in cis). This is consistent with the observed direct binding to SMC5/6.

      We appreciate the suggestion to further emphasize the impact of LT on plasmid silencing. We did not want to overstate its impact at this time because we do not know whether it directly binds SMC5/6 or indeed affects SMC5/6 function more broadly. LT expression, like that of HBx, does induce a DNA damage response, but we cannot at this point tie that response to SMC5/6 inhibition alone.

      (4) Figure 5 S1: the merge looks drastically different. Is DAPI omitted in the wt merge image?

      Thank you for noting this issue. We have corrected the image, which was impacted by the use of an underexposed DAPI image.  

      (5) Figure 1: how is the structure in B oriented relative to A? A visual guide would be helpful.

      We have added arrows to indicate the view orientation and rotational direction to turn A to B.

      (6) Line 126, unclear what "specificity" here means.

      We have revised the sentence without this word, which now starts with “To confirm the SIMC1-SMC6 interface, we introduced….”

      (7) Line 152, The statement implies that the conserved residues are needed for loader subunits interactions ('mediating the SIMC1-SLF2 interaction"). Does Figure 1C not show that the residues are not important? Please clarify.

      Thank you for noting this writing error. We have corrected the sentence to provide the intended meaning. It now reads “Collectively, these results confirm that the conserved surface patch of SIMC1–SLF2 is essential for SMC6 binding.”

      References

      (1) Oravcova M, Nie M, Zilio N, Maeda S, Jami-Alahmadi Y, Lazzerini-Denchi E, Wohlschlegel JA, Ulrich HD, Otomo T, Boddy MN. The Nse5/6-like SIMC1-SLF2 complex localizes SMC5/6 to viral replication centers. Elife. 2022;11. PMCID: PMC9708086

      (2) Sullivan CS, Pipas JM. T antigens of simian virus 40: molecular chaperones for viral replication and tumorigenesis. Microbiol Mol Biol Rev. 2002;66(2):179-202. PMCID: PMC120785

      (3) Gilinger G, Alwine JC. Transcriptional activation by simian virus 40 large T antigen: requirements for simple promoter structures containing either TATA or initiator elements with variable upstream factor binding sites. J Virol. 1993;67(11):6682-8. PMCID: PMC238107

      (4) Qadri I, Conaway JW, Conaway RC, Schaack J, Siddiqui A. Hepatitis B virus transactivator protein, HBx, associates with the components of TFIIH and stimulates the DNA helicase activity of TFIIH. Proc Natl Acad Sci U S A. 1996;93(20):10578-83. PMCID: PMC38195

      (5) Aufiero B, Schneider RJ. The hepatitis B virus X-gene product trans-activates both RNA polymerase II and III promoters. EMBO J. 1990;9(2):497-504. PMCID: PMC551692

      (6) Decorsiere A, Mueller H, van Breugel PC, Abdul F, Gerossier L, Beran RK, Livingston CM, Niu C, Fletcher SP, Hantz O, Strubin M. Hepatitis B virus X protein identifies the Smc5/6 complex as a host restriction factor. Nature. 2016;531(7594):386-9. 

      (7) Murphy CM, Xu Y, Li F, Nio K, Reszka-Blanco N, Li X, Wu Y, Yu Y, Xiong Y, Su L. Hepatitis B Virus X Protein Promotes Degradation of SMC5/6 to Enhance HBV Replication. Cell Rep. 2016;16(11):2846-54. PMCID: PMC5078993

      (8) Dupont L, Bloor S, Williamson JC, Cuesta SM, Shah R, Teixeira-Silva A, Naamati A, Greenwood EJD, Sarafianos SG, Matheson NJ, Lehner PJ. The SMC5/6 complex compacts and silences unintegrated HIV-1 DNA and is antagonized by Vpr. Cell Host Microbe. 2021;29(5):792-805 e6. PMCID: PMC8118623

      (9) Felzien LK, Woffendin C, Hottiger MO, Subbramanian RA, Cohen EA, Nabel GJ. HIV transcriptional activation by the accessory protein, VPR, is mediated by the p300 co-activator. Proc Natl Acad Sci U S A. 1998;95(9):5281-6. PMCID: PMC20252

      (10) Diman A, Panis G, Castrogiovanni C, Prados J, Baechler B, Strubin M. Human Smc5/6 recognises transcription-generated positive DNA supercoils. Nat Commun. 2024;15(1):7805. PMCID: PMC11379904

      (11) Irwan ID, Bogerd HP, Cullen BR. Epigenetic silencing by the SMC5/6 complex mediates HIV-1 latency. Nat Microbiol. 2022;7(12):2101-13. PMCID: PMC9712108

      (12) van Breugel PC, Robert EI, Mueller H, Decorsiere A, Zoulim F, Hantz O, Strubin M. Hepatitis B virus X protein stimulates gene expression selectively from extrachromosomal DNA templates. Hepatology. 2012;56(6):2116-24. 

      (13) Lechardeur D, Sohn KJ, Haardt M, Joshi PB, Monck M, Graham RW, Beatty B, Squire J, O'Brodovich H, Lukacs GL. Metabolic instability of plasmid DNA in the cytosol: a potential barrier to gene transfer. Gene Ther. 1999;6(4):482-97. 

      (14) Gallego-Paez LM, Tanaka H, Bando M, Takahashi M, Nozaki N, Nakato R, Shirahige K, Hirota T. Smc5/6-mediated regulation of replication progression contributes to chromosome assembly during mitosis in human cells. Mol Biol Cell. 2014;25(2):302-17. PMCID: PMC3890350

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public Review): 

      Summary: 

      This paper by Schommartz and colleagues investigates the neural basis of memory reinstatement as a function of both how recently the memory was formed (recent, remote) and its development (children, young adults). The core question is whether memory consolidation processes as well as the specificity of memory reinstatement differ with development. A number of brain regions showed a greater activation difference for recent vs. remote memories at the long versus shorter delay specifically in adults (cerebellum, PHG, LOC). A different set showed decreases in the same comparison, but only in children (precuneus, RSC). The authors also used neural pattern similarity analysis to characterize reinstatement, though still in this revised paper I have substantive concerns about how the analyses were performed. While scene-specific reinstatement decreased for remote memories in both children and adults, claims about its presence cannot be made given the analyses. Gist-level reinstatement was observed in children but not adults, but I also have concerns about this analysis. Broadly, the behavioral and univariate findings are consistent with the idea memory consolidation differs between children and adults in important ways, and takes a step towards characterizing how.

      Strengths: 

      The topic and goals of this paper are very interesting. As the authors note, there is little work on memory consolidation over development, and as such this will be an important data point in helping us begin to understand these important differences. The sample size is great, particularly given this is an onerous, multi-day experiment; the authors are to be commended for that. The task design is also generally well controlled, for example as the authors include new recently learned pairs during each session.  

      Weaknesses: 

      As noted above and in my review of the original submission, the pattern similarity analyses for both item- and category-level reinstatement were performed in a way that is not interpretable given concerns about temporal autocorrelation within scanning run. Unfortunately these issues remain of concern in this revision because they were not rectified. Most of my review focuses on this analytic issue, though I also outline additional concerns.

      (1) The pattern similarity analyses are largely uninterpretable due to how they were performed. 

      (a) First, the scene-specific reinstatement index: The authors have correlated a neural pattern during a fixation cross (delay period) with a neural pattern associated with viewing a scene as their measure of reinstatement. The main issue with this is that these events always occurred back-to-back in time. As such, the two patterns will be similar due simply to the temporal autocorrelation in the BOLD signal. Because of the issues with temporal autocorrelation within scanning run, it is always recommended to perform such correlations only across different runs. In this case, the authors always correlated patterns extracted from the same run, and which moreover have temporal lags that are perfectly confounded with their comparison of interest (i.e., from Fig 4A, the "scene-specific" comparisons will always be back-to-back, having a very short temporal lag; "set-based" comparisons will be dispersed across the run, and therefore have a much higher lag). The authors' within-run correlation approach also yields correlation values that are extremely high - much higher than would be expected if this analysis was done appropriately. The way to fix this would be to restrict the analysis to only cross-run comparisons, which is not possible given the design. 

      To remedy this, in the revision the authors have said they will refrain from making conclusions about the presence of scene-specific reinstatement (i.e., reinstatement above baseline). While this itself is an improvement from the original manuscript, I still have several concerns. First, this was not done thoroughly and at times conclusions/interpretations still seem to imply or assume the presence of scene reinstatement (e.g., line 979-985, "our research supports the presence of scene-specific reinstatement in 5-to-7-year-old children"; line 1138). 

      We thank the reviewer for pointing out the inconsistencies in our writing. We agree that we cannot make any claims about the baseline level of scene-specific reinstatement. To reiterate, our focus is on the changes in reinstatement over time (30 minutes, 24 hours, and two weeks after learning), which showed a robust decrease. Importantly, scene-specific reinstatement indices for recent items, which were tested on different days, did not significantly differ, as indicated by non-significant main effects of Session (all p > .323) and Session x ROI interactions (all p > .817) in either age group. This supports our claim that temporal autocorrelation is stable and consistent across conditions and that the observed decline in scene-specific reinstatement reflects a time-dependent change in remote retrieval. We have revised the highlighted passages accordingly, emphasizing the delay-related decrease in scene-specific reinstatement rather than its absolute magnitude.
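
      The kind of index under discussion can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the index is the mean Fisher-z-transformed correlation between each fixation pattern and the scene pattern of the same trial, minus the mean correlation with the other scenes in the set. The function names and the list-of-lists pattern format are hypothetical and are not taken from the manuscript.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

def fisher_z(r, limit=0.999999):
    """Fisher z-transform, clipped so that |r| = 1 stays finite."""
    return math.atanh(max(-limit, min(limit, r)))

def scene_specific_index(fixation_patterns, scene_patterns):
    """Same-trial fixation-scene similarity minus a set-based baseline.

    Assumption: trial i of fixation_patterns belongs with trial i of
    scene_patterns; every other pairing counts as set-based baseline.
    """
    same, other = [], []
    for i, fix in enumerate(fixation_patterns):
        for j, scene in enumerate(scene_patterns):
            z = fisher_z(pearson(fix, scene))
            (same if i == j else other).append(z)
    return sum(same) / len(same) - sum(other) / len(other)
```

      Because both terms in this sketch come from the same run, a shared autocorrelation component can inflate the same-trial term. This is why the response above interprets only the change in the index across sessions and not its absolute level.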

      Second, the authors' logic for the neural-behavioural correlations in the PLSC analysis involved restricting to regions that showed significant reinstatement for the gist analysis, which cannot be done for the analogous scene-specific reinstatement analysis. This makes it challenging to directly compare these two analyses since one was restricted to a small subset of regions and only children (gist), while scene reinstatement included both groups and all ROIs. 

      We thank the reviewer for pointing this out and want to clarify that it was not our intention to directly compare these analyses. For the neural-behavioral correlations, we included only those regions that showed gist-like representations above baseline, whereas for scene-specific reinstatement, we included all regions because no such baseline is available. The primary aim of the PLSC analysis was to identify a set of regions that, after a stringent permutation and bootstrapping procedure, form a latent variable that explains a significant proportion of variance in behavioral performance across all participants.

      Third, it is also unclear whether children and adults' values should be directly comparable given pattern similarity can be influenced by many factors like motion, among other things. 

      We thank the reviewer for raising this important point. In our multivariate analysis, we included confounding regressors specifically addressing motion-related artefacts. Following recent best practices for mitigating motion-related confounding factors in both adult and pediatric fMRI data (Ciric et al., 2017; Esteban et al., 2020; Jones et al., 2021; Satterthwaite et al., 2013), we implemented the most effective motion correction strategies. 

      Importantly, our group × session interaction analysis focuses on relative changes in reinstatement over time rather than comparing absolute levels of pattern similarity between children and adults. This approach controls for potential baseline differences and instead examines whether the magnitude of delay-related changes differs across groups. We believe this warrants the comparison and ensures that our conclusions are not driven by group-level differences in baseline similarity or motion artifacts.

      My fourth concern with this analysis relates to the lack of regional specificity of the effects. All ROIs tested showed a virtually identical pattern: "Scene-specific reinstatement" decreased across delays, and was greater in children than adults. I believe control analyses are needed to ensure artifacts are not driving these effects. This would greatly strengthen the authors' ability to draw conclusions from the "clean" comparison of day 1 vs. day 14. (A) The authors should present results from a control ROI that should absolutely not show memory reinstatement effects (e.g., white matter?). Results from the control ROI should look very different - should not differ between children and adults, and should not show decreases over time. 

      (C) If the same analysis was performed comparing the object cue and immediately following fixation (rather than the fixation and the immediately following scene), the results should look very different. I would argue that this should not be an index of reinstatement at all since it involves something presented visually rather than something reinstated (i.e., the scene picture is not included in this comparison). If this control analysis were to show the same effects as the primary analysis, this would be further evidence that this analysis is uninterpretable and hopelessly confounded. 

      We appreciate the reviewer’s suggestion to strengthen the interpretation of our findings by including appropriate control analyses to rule out non-memory-related artifacts. In response, we conducted several control analyses, detailed below, which collectively support the specificity of the observed reinstatement effects. The results are reported in the manuscript (lines 593-619).

      We checked that item reinstatement for incorrectly remembered trials did not show any session-related decline in any ROI. This indicates that the reinstatement for correctly remembered items is memory-related (see Fig. S5 for details).

      We conducted additional analyses on three subregions of the corpus callosum (the body, genu, and splenium). The results of the linear mixed-effects models revealed no significant group effect (all p > .426), indicating no differences between children and adults. In contrast, all three ROIs showed a significant main effect of Session (all p < .001). However, post hoc analyses indicated that this effect was driven by differences between the recent and the Day 14 remote condition. The main contrasts of interest – recent vs. Day 1 remote and Day 1 remote vs. Day 14 remote – were not significant (all p > .080; see Table S10.4), suggesting that, unlike in other ROIs, there was no delay-related decrease in scene-specific reinstatement in these white matter regions.

      We then repeated our analysis using the same procedure but replaced the “scene” time window with the “object” time window. The rationale for this control is that comparing the object cue to the immediately following fixation period should not reflect scene reinstatement, as the object and the reinstated scene rely on distinct neural representations. Accordingly, we did not expect a delay-related decrease in the reinstatement index. Consistent with this expectation, the analysis using the object-fixation similarity index, though also influenced by temporal autocorrelation, did not reveal any significant effect of session or delay in any ROI (all p > .059; see Tables S9, S9.1).

      Together, these control analyses provide converging evidence that our findings are not driven by global or non-specific signal changes. We believe that these control analyses strengthen our interpretation about delay-related decrease in scene-specific reinstatement index. 

      (B) Do the recent items from day 1 vs. day 14 differ? If so, this could suggest something is different about the later scans (and if not, it would be reassuring). 

      The recent items tested on day 1 and day 14 do not differ (all p > .323). This holds across all ROIs.

      (b) For the category-based neural reinstatement: (1) This suffers from the same issue of correlations being performed within run. Again, to correct this the authors would need to restrict comparisons to only across runs (i.e., patterns from run 1 correlated with patterns for run 2 and so on). The authors in their response letter have indicated that because the patterns being correlated are not derived from events in close temporal proximity, they should not suffer from the issue of temporal autocorrelation. This is simply not true. For example, see the paper by Prince et al. (eLife 2022; on GLMsingle). This is not the main point of Prince et al.'s paper, but it includes a nice figure that shows that, using standard modelling approaches, the correlation between (same-run) patterns can be artificially elevated for lags as long as ~120 seconds (and can even be artificially reduced after that; Figure 5 from that paper) between events. This would affect many of the comparisons in the present paper. The cleanest way to proceed is to simply drop the within-run comparisons, which I believe the authors can do and yet they have not. Relatedly, in the response letter the authors say they are focusing mainly on the change over time for reinstatement at both levels including the gist-type reinstatement; however, this is not how it is discussed in the paper. They in fact are mainly relying on differences from zero, as children show some "above baseline" reinstatement while adults do not, but I believe there were no significant differences over time (i.e., the findings the authors said they would lean on primarily, as they are arguably the most comparable).  

      We thank the reviewer for this important comment regarding the potential inflation of similarity values due to within-run comparisons.

      To address the reviewer’s concern, we conducted an additional cross-run analysis for all correctly retrieved trials. The approach restricted comparisons to non-overlapping runs (run1-run2, run2-run3, run1-run3). This analysis revealed robust gist-like reinstatement in children for remote Day 14 memories in the mPFC (p = .035) and vlPFC (p = .0007), in adults for remote Day 1 memories in the vlPFC (p = .029), and in both children and adults for remote Day 1 memories in the LOC (p < .02). A significant Session effect in both regions (mPFC: p = .026; vlPFC: p = .002) indicated increased reinstatement for the long delay (Day 14) compared to the short-delay and recent sessions (all p < .05). Given that the cross-run results largely replicate and reinforce the effects found previously with the within-run analysis, we believe that combining both sources of information is methodologically justified and statistically beneficial. Specifically, both approaches independently identified significant gist-like reinstatement in children’s mPFC and vlPFC, particularly for remote memories, although the within-run vlPFC effect (short delay: p = .038; long delay: p = .047) did not survive correction for multiple comparisons. Including both within-run and between-run comparisons increases the number of unique, non-repeated trial pairs, improving statistical power without introducing redundancy. While we acknowledge that same-run comparisons may be influenced by residual autocorrelation (as shown by Prince et al., 2022, eLife), we believe that our design mitigates this risk through consistency between within-run and cross-run results, long inter-trial intervals, and trial-wise estimation of activation. We have adjusted the manuscript accordingly, reporting the combined analysis. We also report the cross-run and within-run analyses separately in the supplementary materials (Tables S12.1, S12.2), showing that they converge with the cross-run results and thus strengthen rather than dilute the findings.
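
      The restriction to cross-run comparisons described above can be sketched as follows. This is a toy illustration under stated assumptions, not the analysis code used in the study: each trial is assumed to be a list of voxel values with a run label, and `trial_pairs` and `mean_similarity` are hypothetical helper names.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

def fisher_z(r, limit=0.999999):
    """Fisher z-transform, clipped so that |r| = 1 stays finite."""
    return math.atanh(max(-limit, min(limit, r)))

def trial_pairs(runs, cross_run_only=True):
    """Unique trial pairs (i, j) with i < j; same-run pairs dropped on request."""
    return [
        (i, j)
        for i, j in combinations(range(len(runs)), 2)
        if not (cross_run_only and runs[i] == runs[j])
    ]

def mean_similarity(patterns, runs, cross_run_only=True):
    """Mean Fisher-z similarity over the selected trial pairs."""
    pairs = trial_pairs(runs, cross_run_only)
    zs = [fisher_z(pearson(patterns[i], patterns[j])) for i, j in pairs]
    return sum(zs) / len(zs)

# Toy data: trials from the same run share a run-specific component.
patterns = [[5, 1, 1, 1], [6, 1, 2, 1], [1, 5, 1, 1], [1, 6, 1, 2]]
runs = [1, 1, 2, 2]
print(mean_similarity(patterns, runs, cross_run_only=False))
print(mean_similarity(patterns, runs, cross_run_only=True))
```

      In this toy example the all-pairs estimate is higher than the cross-run estimate because the same-run pairs share a run-specific component, which is the kind of inflation the reviewer describes.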

      As suggested, we now explicitly highlight the change over time as the central finding. We observe a clear increase in gist-like reinstatement from recent to remote memories in children, particularly in mPFC and vlPFC. These effects, based on combined within- and cross-run comparisons, are now clearly stated in the main results and interpreted in the discussion accordingly.

      (2) This analysis uses a different approach of comparing fixations to one another, rather than fixations to scenes. In their response letter and the revised paper, the authors do provide a bit of reasoning as to why this is the most sensible. However, it is still not clear to me whether this is really "reinstatement" which (in my mind) entails the re-evoking of a neural pattern initially engaged during perception. Rather, could this be a shared neural state that is category specific? 

      We thank the reviewer for raising this important conceptual point about whether our findings reflect reinstatement in the classical sense — namely, the reactivation of perceptual neural patterns — or a shared, category-specific state.

      While traditional definitions of reinstatement emphasize item-specific reactivation (e.g., Ritchey et al., 2013; Xiao et al., 2017), it is increasingly recognized that memory retrieval can also involve the reactivation of abstracted, generalized, or gist-like representations, especially as memories consolidate. Our analysis follows this view and aims to capture how memory representations evolve over time, particularly in development.

      Several studies support this broader notion of gist-like reinstatement. For instance, Chen et al. (2017) showed that while event-specific patterns were reinstated across the default mode network and medial temporal lobe, inter-subject recall similarity exceeded encoding-retrieval similarity, suggesting transformation and abstraction beyond perceptual reinstatement. Zhuang et al. (2021) further showed that loss of neural distinctiveness in the MTL over time predicted false memories, linking neural similarity to representational instability. This aligns with our finding that greater gist-like reinstatement is associated with lower memory accuracy.

      Ye et al. (2020) discuss how memory representations are reshaped post-encoding, becoming more differentiated, integrated, or weakened depending on task goals and neural resources. While their work focuses on adults, our previous findings in the same sample (Schommartz et al., 2023) suggest that children’s neural systems are structurally immature, making them more likely to rely on gist-based consolidation (see Fandakova et al., 2019). Adults, by contrast, may retain more item-specific traces.

      Relatedly, St-Laurent & Buchsbaum (2019) show that with repeated encoding, neural memory representations become increasingly distinct from perception, suggesting that reinstatement need not mimic perception. We agree that reinstatement does not always reflect reactivation of low-level sensory patterns, particularly over long delays or in developing brains.

      Finally, while we did not correlate retrieval patterns directly with perceptual encoding patterns, we assessed neural similarity among retrieved items within vs. between categories, based on non-repeated, independently sampled trials. This approach is intended to capture the structure and delay-related transformation of mnemonic representations, especially in terms of how they become more schematic or gist-like over time. Our findings align conceptually with the results of Kuhl et al. (2012), who used MVPA to show that older and newer visual memories can be simultaneously reactivated during retrieval, with greater reactivation of older memories interfering with retrieval accuracy for newer memories. Their work highlights how overlapping category-level representations in ventral temporal cortex can reflect competition among similar memories, even in the absence of item-specific cues. In our developmental context, we interpret the increased neural similarity among category members in children as possibly reflecting such representational overlap or competition, where generalized traces dominate over item-specific ones. This pattern may reflect a shift toward efficient but less precise retrieval, consistent with developmental constraints on memory specificity and consolidation.
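
      The within- versus between-category comparison described above can be written as a short sketch. It assumes, for illustration only, that the measure is the mean Fisher-z similarity of same-category trial pairs minus that of different-category pairs; `gist_index` and the category labels are hypothetical, and the function expects at least one pair of each kind.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [v - mx for v in x]
    dy = [v - my for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

def gist_index(patterns, categories, limit=0.999999):
    """Within-category minus between-category mean Fisher-z similarity."""
    within, between = [], []
    for i, j in combinations(range(len(patterns)), 2):
        r = pearson(patterns[i], patterns[j])
        z = math.atanh(max(-limit, min(limit, r)))  # clipped Fisher z
        (within if categories[i] == categories[j] else between).append(z)
    return sum(within) / len(within) - sum(between) / len(between)

# Toy data: two retrieval-phase patterns per scene category.
patterns = [[5, 1, 1, 1], [6, 1, 2, 1], [1, 5, 1, 1], [1, 6, 1, 2]]
categories = ["forest", "forest", "farming", "farming"]
print(gist_index(patterns, categories))
```

      A positive value means that patterns of trials from the same category resemble each other more than patterns from different categories. On its own, this does not show that a perceptual pattern was re-evoked, which is the distinction the reviewer raises.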

      In this context, we view our findings as evidence of memory trace reorganization — from differentiated, item-level representations toward more schematic, gist-like neural patterns (Sekeres et al., 2018), particularly in children. Our cross-run analyses further confirm that this is not an artifact of same-run correlations or low-level confounds. We have clarified this distinction and interpretation throughout the revised manuscript (see lines 144-158; 1163-1170).

      In any case, I think additional information should be added to the text to clarify that this definition differs from others in the literature. The authors might also consider using some term other than reinstatement. Again (as I noted in my prior review), the finding of no category-level reinstatement in adults is surprising and confusing given prior work and likely has to do with the operationalization of "reinstatement" here. I was not quite sure about the explanation provided in the response letter, as category-level reinstatement is quite widespread in the brain for adults and is robust to differences in analytic procedures etc. 

      We agree that our operationalization of "reinstatement" differs from more conventional uses of the term, which typically involve direct comparisons between encoding and retrieval phases, often with item-level specificity. As our analysis is based on similarity among retrieval-phase trials (fixation-based activation patterns) and focuses on within- versus between-category neural similarity, we agree that the term reinstatement may suggest a stronger encoding–retrieval mapping than we are claiming.

      To avoid confusion and overstatement, we have revised the terminology throughout the manuscript: we now refer to our measure as “gist-like representations” rather than “gist-like reinstatement.” This change better reflects the nature of our analysis — namely, that we are capturing shared neural patterns among category-consistent memories that may reflect reorganized or abstracted traces, especially after delay and in development.

      As the reviewer rightly points out, category-level reinstatement is well documented in adults (e.g., Kuhl & Chun, 2014; Tompary et al., 2020; Tompary & Davachi, 2017). The absence of such effects in our adult group may indeed reflect differences in study design, particularly our use of non-repeated, cross-trial comparisons based on fixation events. It may also reflect different consolidation strategies, with adults preserving more differentiated or item-specific representations, while children form more schematic or generalizable representations, a pattern consistent with our interpretation and supported by prior work (Fandakova et al., 2019; Sekeres et al., 2018).

      We have updated the relevant sections of the manuscript (Results, Discussion (particularly lines 1163-1184), and Figure captions) to clarify this terminology shift and explicitly contrast our approach with more standard definitions of reinstatement. We hope this revision provides the needed conceptual clarity while preserving the integrity of our developmental findings.

      (3) Also from a theoretical standpoint-I'm still a bit confused as to why gist-based reinstatement would involve reinstatement of the scene gist, rather than the object's location (on the screen) gist. Were the locations on the screen similar across scene backgrounds from the same category? It seems like a different way to define memory retrieval here would be to compare the neural patterns when cued to retrieve the same vs. similar (at the "gist" level) vs. different locations across object-scene pairs. This is somewhat related to a point from my review of the initial version of this manuscript, about how scene reinstatement is not necessary. The authors state that participants were instructed to reinstate the scene, but that does not mean they were actually doing it. The point that what is being measured via the reinstatement analyses is actually not necessary to perform the task should be discussed in more detail in the paper. 

      We appreciate the reviewer’s thoughtful theoretical question regarding whether our measure of “gist-like representations” might reflect reinstatement of spatial (object-location) gist, rather than scene-level gist. We would like to clarify several key points about our task design and interpretation:

      (1) Object locations were deliberately varied and context dependent.

      In our stimulus set, each object was embedded in a rich scene context, and the locations were distributed across six distinct possible areas within each scene, with three possible object placements per location. These placements were manually selected to ensure realistic and context-sensitive positioning of objects within the scenes. Importantly, locations were not fixed across scenes within a given category. For example, objects placed in “forest” scenes could appear in different screen locations across different scene exemplars (e.g., one in the bottom-left side, another floating above). Therefore, the task did not introduce a consistent spatial schema across exemplars from the same scene category that could give rise to a “location gist.”

      (2) Scene categories provided consistent high-level contextual information.

      By contrast, the scene categories (e.g., farming, forest, indoor, etc.) provided semantically coherent and visually rich contextual backgrounds that participants could draw upon during retrieval. This was emphasized in the instruction phase, where participants were explicitly encouraged to recall the whole scene based on the stories they created during learning (not just the object or its position). While we acknowledge that we cannot directly verify the reinstated content, this instruction aligns with prior studies showing that scene and context reinstatement can occur even without direct task relevance (e.g., Kuhl & Chun, 2014; Ritchey et al., 2013).

      (3) Our results are unlikely to reflect location-based reinstatement.

      If participants had relied on a “location gist” strategy, we would have expected greater neural similarity across scenes with similar spatial layouts, regardless of category. However, our design avoids this confound by deliberately varying locations across exemplars within categories. Additionally, our categorical neural similarity measure contrasted within-category vs. between-category comparisons — making it sensitive to shared contextual or semantic structure, not simply shared screen positions.
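      The within-category vs. between-category contrast described in point (3) can be sketched as follows. This is a toy illustration only, not the analysis pipeline used in the manuscript: the function names, the use of plain Pearson correlation without further transformation, and the four-voxel example patterns are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length activity patterns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def category_similarity_index(patterns, categories):
    """Mean within-category similarity minus mean between-category similarity.

    patterns   -- list of activity patterns (one list of values per trial)
    categories -- scene category label for each pattern (hypothetical labels)
    """
    within, between = [], []
    for i, j in combinations(range(len(patterns)), 2):
        r = pearson(patterns[i], patterns[j])
        if categories[i] == categories[j]:
            within.append(r)
        else:
            between.append(r)
    return sum(within) / len(within) - sum(between) / len(between)


# Toy data: two "forest" and two "farming" trials with made-up voxel values.
toy_patterns = [
    [1.0, 2.0, 3.0, 4.0],  # forest, exemplar 1
    [2.0, 3.0, 4.0, 5.0],  # forest, exemplar 2
    [4.0, 3.0, 2.0, 1.0],  # farming, exemplar 1
    [5.0, 4.0, 3.0, 2.0],  # farming, exemplar 2
]
toy_categories = ["forest", "forest", "farming", "farming"]
index = category_similarity_index(toy_patterns, toy_categories)
```

      A positive index indicates that patterns from the same scene category are more alike than patterns from different categories. Because screen locations vary across exemplars within a category, such an index would not be expected to rise with shared screen position alone.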

      Considering this, we believe that the neural similarity observed in the mPFC and vlPFC in children at long delay reflects the emergence of scene-level, gist-like representations, rather than low-level spatial regularities. Nevertheless, we now clarify this point in the manuscript and explicitly discuss the limitation that reinstatement of scene context was encouraged but not required for successful task performance.

      Future studies could dissociate spatial and contextual components of reinstatement more directly by using controlled spatial overlap or explicit location recall conditions. However, given the current task structure, location-based generalization is unlikely to account for the category-level similarity patterns we observe.

      (2) Inspired by another reviewer's comment, it is unclear to me the extent to which age group differences can be attributed to differences in age/development versus memory strength. I liked the other reviewer's suggestions about how to identify and control for differences in memory strength, which I don't think the authors actually did in the revision. They instead showed evidence that memory strength does seem to be lower in children, which indicates this is an interpretive confound. For example, I liked the reviewer's suggestion of performing analyses on subsets of participants who were actually matched in initial learning/memory performance would have been very informative. As it is, the authors didn't really control for memory strength adequately in my opinion, and as such their conclusions about children vs. adults could have been reframed as people with weak vs. strong memories. This is obviously a big drawback given what the authors want to conclude. Relatedly, I'm not sure the DDM was incorporated as the reviewer was suggesting; at minimum I think the authors need to do more work in the paper to explain what this means and why it is relevant. (I understand putting it in the supplement rather than the main paper, but I still wanted to know more about what it added from an interpretive perspective.)

      We appreciate the reviewer’s thoughtful concerns regarding potential confounding effects of memory strength on the observed age group differences. This is indeed a critical issue when interpreting developmental findings.

      While we agree that memory strength differs between children and adults — and our own DDM-based analysis confirms this, mirroring differences observed in accuracy — we would like to emphasize that these differences are not incidental but rather reflect developmental changes in the underlying memory system. Given the known maturation of both structural and functional memory-related brain regions, particularly the hippocampus and prefrontal cortex, we believe it would be theoretically inappropriate to control for memory strength entirely, as doing so would remove variance that is central to the age-related neural effects we aim to understand.

      To address the reviewer's concern empirically, we conducted an additional control analysis in which we subsampled children to include only those who reached the learning criterion after two cycles (N = 28 out of 49 children, see Table S1.1, S1.2, Figure S1, Table S9.1), thereby selecting a high-performing subgroup. Importantly, this subsample replicated the behavioral and neural results of the full group. This further suggests that the observed age group differences are not merely driven by differences in memory strength.

      As mentioned above, the results of the DDM support our behavioral findings, showing that children have lower drift rates for evidence accumulation, consistent with weaker or less accessible memory representations. While these results are reported in the Supplementary Materials (section S2.1, Figure S2, Table S2), we agree that their interpretive relevance should be more clearly explained in the main text. We have therefore updated the Discussion section to explicitly state how the DDM results provide converging evidence for our interpretation that developmental differences in memory quality (not merely strategy or task performance) underlie the observed neural differences (see lines 904-926).

      In sum, we view memory strength not as a confound to be removed, but as a meaningful and theoretically relevant factor in understanding the emergence of gist-like representations in children. We have clarified this interpretive stance in the revised manuscript and now discuss the role of memory strength more explicitly in the Discussion.

      (3) Some of the univariate results reporting is a bit strange, as they are relying upon differences between retrieval of 1- vs. 14-day memories in terms of the recent vs. remote difference, and yet don't report whether the regions are differently active for recent and remote retrieval. For example in Figure 3A, neither anterior nor posterior hippocampus seem to be differentially active for recent vs. remote memories for either age group (i.e., all data is around 0). Precuneus also interestingly seems to show numerically recent>remote (values mostly negative), whereas most other regions show the opposite. This difference from zero (in either direction) or lack thereof seems important to the message. In response to this comment on the original manuscript, the authors seem to have confirmed that hippocampal activity was greater during retrieval than implicit baseline. But this was not really my question - I was asking whether hippocampus is (and other ROIs in this same figure are) differently engaged for recent vs. remote memories.

      We thank the reviewer for bringing up this important point. Our previous analysis showed that both anterior and posterior regions of the hippocampus, the anterior parahippocampal gyrus, and the precuneus exhibited activation significantly above zero in children and adults for correctly remembered items (see Fig. S2, Table S7 in Supplementary Materials). Based on your suggestion, our additional analysis showed:

      (i) The linear mixed-effects model for correctly remembered items showed no significant interaction effects (group x session x memory age (recent, remote)) for the anterior hippocampus (all p > .146; see Table S7.1).

      (ii) For the posterior hippocampus, we observed a significant main effect of group (F(1,85) = 5.62, p = .038), showing significantly lower activation in children compared to adults (b = .03, t = -2.34, p = .021). No other main or interaction effects were significant (all p > .08; see Table S7.1).

      (iii) For the anterior PHG, which also showed no significant remote > recent difference, the model showed that there was indeed no difference between remote and recent items across age groups and delays (all p > .194; Table S7.1).

      Moreover, when comparing recent and remote hippocampal activation directly, there were no significant differences in either group (all FDR-adjusted p > .116; Table S7.2), supporting the conclusion that hippocampal involvement was stable across delays for successfully retrieved items. 
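      For readers unfamiliar with FDR adjustment, the kind of correction referred to above can be sketched as follows. The response does not state which FDR procedure was used; the sketch assumes the Benjamini-Hochberg step-up procedure, and the function name and example p-values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the standard step-up procedure, in which the k-th smallest
    p-value is scaled by n / k and monotonicity is then enforced from the
    largest p-value downward.
    """
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping a running minimum.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values from four recent-vs-remote comparisons.
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = bh_adjust(raw_p)
```

      Each adjusted value is then compared against the chosen threshold (e.g., .05) in place of the raw p-value.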

      In contrast, analysis of unsuccessfully remembered items showed that hippocampal activation was not significantly different from zero in either group (all FDR-adjusted p > .052; Fig. S2.1, Table S7.1), indicating that hippocampal engagement was specific to successful memory retrieval.

      To formally test whether hippocampal activation differs between remembered and forgotten items, we ran a linear mixed-effects model with Group, Memory Success (remembered vs. forgotten), and ROI (anterior vs. posterior hippocampus) as fixed effects. This model revealed a robust main effect of memory success (F(1,1198) = 128.27, p < .001), showing that hippocampal activity was significantly higher for remembered compared to forgotten items (b = .06, t(1207) = 11.29, p < .001; Table S7.3). 

      As the reviewer noted, precuneus activation was numerically higher for recent vs. remote items, and this was confirmed in our analysis. While both recent and remote retrieval elicited significantly above-zero activation in the precuneus (Table S7.2), activation for recent items was significantly higher than for remote items, consistent across both age groups.

      Taken together, these analyses support the conclusion that hippocampal involvement in successful retrieval is sustained across delays, while other ROIs such as the precuneus may show greater engagement for more recent memories. We have now updated the manuscript text (lines 370-390) and supplementary materials to reflect these findings more clearly, as well as to clarify the distinction between activation relative to baseline and memory-age-related modulation.

      (4) Related to point 3, the claims about hippocampus with respect to multiple trace theory feel very unsupported by the data. I believe the authors want to conclude that children's memory retrieval shows reliance on hippocampus irrespective of delay, presumably because this is a detailed memory task. However the authors have not really shown this; all they have shown is that hippocampal involvement (whatever it is) does not vary by delay. But we do not have compelling evidence that the hippocampus is involved in this task at all. That hippocampus is more active during retrieval than implicit baseline is a very low bar and does not necessarily indicate a role in memory retrieval. If the authors want to make this claim, more data are needed (e.g., showing that hippocampal activity during retrieval is higher when the upcoming memory retrieval is successful vs. unsuccessful). In the absence of this, I think all the claims about multiple trace theory supporting retrieval similarly across delays and that this is operational in children are inappropriate and should be removed. 

      We thank the reviewer for pointing this out. We agree that additional analysis of hippocampal activity during successful and unsuccessful memory retrieval is warranted. This will provide stronger support for our claim that strong, detailed memories during retrieval rely on the hippocampus in both children and adults. Our previously presented results on the remote > recent univariate signal difference in the hippocampus (p. 14-18; lines 433-376, Fig. 3A) show that this difference does not vary between children and adults, or between Day 1 and Day 14. Our further analysis showed that both anterior and posterior regions of the hippocampus exhibited significant activation from zero in children and adults for correctly remembered items (see Fig. S2, Table S7 in Supplementary Materials). Based on your suggestion, our recent additional analysis showed:

      (i) For forgotten items, we did not observe any activation significantly higher than zero in either the anterior or posterior hippocampus for recent and remote memory on Day 1 and Day 14 in either age group (all p > .052 FDR corrected; see Table S7.1, Fig. S2.1).

      (ii) After establishing no difference between recent and remote activation across and between sessions (Day 1, Day 14), we conducted another linear mixed-effects model with group x memory success (remembered, forgotten) x region (anterior hippocampus, posterior hippocampus), with subject as a random effect. The model showed no significant effects for the memory success x region interaction (F(1,1198) = 1.12, p = .289) and no significant group x memory success x region interaction (F(1,1198) = .017, p = .895). However, we observed a significant main effect of memory success (F(1,1198) = 128.27, p < .001), indicating significantly higher hippocampal activation for remembered compared to forgotten items (b = .06, t = 11.29, p < .001; see Table S7.3).

      (iii) Considering the comparatively low number of incorrect trials for recent items in the adult group, we reran this analysis only for remote items. Similarly, the model showed no significant effects for the memory success x region interaction (F(1,555) = .72, p = .398) and no significant group x memory success x region interaction (F(1,555) = .14, p = .705). However, we observed a significant main effect of memory success (F(1,555) = 68.03, p < .001), indicating significantly higher hippocampal activation for remote remembered compared to forgotten items (b = .07, t = 8.20, p < .001; see Table S7.3).

      Taken together, our results indicate that significant hippocampal activation was observed only for correctly remembered items in both children and adults, regardless of memory age and session. For forgotten items, we did not observe any significant hippocampal activation in either group or delay. Moreover, hippocampal activation was significantly higher for remembered compared to forgotten memories. This evidence supports our conclusions regarding the Multiple Trace and Trace Transformation Theories, suggesting that the hippocampus supports retrieval similarly across delays, and provides novel evidence that this process is operational in both children and adults. This aligns also with Contextual Bindings Theory, as well as empirical evidence by Sekeres, Winokur, & Moscovitch (2018), among others. We have added this information to the manuscript.

      (5) There are still not enough methodological details in the main paper to make sense of the results. Some of these problems were addressed in the revision but others remain. For example, a couple of things that were unclear: that initially learned locations were split, where half were tested again at day 1 and the other half at day 14; what specific criterion was used to determine to pick the 'well-learned' associations that were used for comparisons at different delay periods (object-scene pairs that participants remembered accurately in the last repetition of learning? Or across all of learning?). 

      We thank the reviewer for pointing this out. The initially learned object-scene associations on Day 0 were split into two halves based on their categories before testing. Specifically, half of the pairs from the first set and half of the pairs from the second set of 30 object-scene associations were used to create the set of 30 remote pairs for Day 1 testing. A similar procedure was repeated for the remaining pairs to create a set of remote object-scene associations for Day 14 retrieval. We tried to distribute the categories of pairs equally between the testing sets. We added this information to the Methods section of the manuscript (see p. 47, lines 1237-1243). In addition, the sets of associations for the delay tests on Day 1 and Day 14 were not based on learning accuracy. Of note, an analysis of variance revealed no difference in learning accuracy between the two sets created for the delay tests in either age group (children: p = .23; adults: p = .06). These results indicate that the sets comprised items learned with comparable accuracy in both age groups.

      (6) I still find the revised Introduction a bit unclear. I appreciated the added descriptions of different theories of consolidation, though the order of presented points is still a bit hard to follow. Some of the predictions I also find a bit confusing as laid out in the introduction. (1) As noted in the paper multiple trace theory predicts that hippocampal involvement will remain high provided memories retained are sufficiently high detail. The authors however also predict that children will rely more on gist (than detailed) memories than adults, which would seem to imply (combined with the MTT idea) that they should show reduced hippocampal involvement over time (while in adults, it should remain high). However, the authors' actual prediction is that hippocampus will show stable involvement over time in both kids and adults. I'm having a hard time reconciling these points. (2) With respect to the extraction of gist in children, I was confused by the link to Fuzzy Trace Theory given the children in the present study are a bit young to be showing the kind of gist extraction shown in the Brainerd & Reyna data. Would 5-7 year olds not be more likely to show reliance on verbatim traces under that framework? Also from a phrasing perspective, I was confused about whether gist-like information was something different from just gist in this sentence: "children may be more inclined to extract gist information at the expense of detailed or gist-like information." (p. 8) - is this a typo?

      We thank the reviewer for this thoughtful observation. 

      Our hypothesis of stable hippocampal engagement over time was primarily based on Contextual Binding Theory (Yonelinas et al., 2019), and the MTT, supported by the evidence provided by Sekeres et al., 2018, which posits that the hippocampus continues to support retrieval when contextual information is preserved, even for older, consolidated memories. Given that our object-location associations were repeatedly encoded and tied to specific scene contexts, we believe that retrieval success for both recent and remote memories likely involved contextual reinstatement, leading to sustained hippocampal activity. Also in accordance with the MTT and related TTT, different memory representations may coexist, including detailed and gist-like memories. Therefore, we suggest that children may not rely on highly detailed item-specific memory, but rather on sufficiently contextualized schematic traces, which still engage the hippocampus. This distinction is now made clearer in the Introduction (see lines 223-236).

      We appreciate the reviewer’s point regarding Fuzzy Trace Theory (Brainerd & Reyna, 2002). Indeed, in classic FTT, young children are thought to rely more on verbatim traces due to immature gist extraction mechanisms (primarily from verbal material). However, we use the term “gist-like representations” to refer to schematic or category-level retrieval that emerges through structured, repeated learning (as in our task). This form of abstraction may not require full semantic gist extraction in the FTT sense but may instead reflect consolidation-driven convergence onto shared category-level representations, especially when strategic resources are limited. We now clarify this distinction and have revised the ambiguous sentence containing the typo (“at the expense of detailed or gist-like information”) to better reflect our intended meaning (see p. 8).

      (7) For the PLSC, if I understand this correctly, the profiles were defined for showing associations with behaviour across age groups. (1) As such, is it not "double dipping" to then show that there is an association between brain profile and behaviour-must this not be true by definition? If I am mistaken, it might be helpful to clarify this in the paper. (2) In addition, I believe for the univariate and scene-specific reinstatement analyses these profiles were defined across both age groups. I assume this doesn't allow for separate definition of profiles across the two group (i.e., a kind of "interaction"). If this is the case, it makes sense that there would not be big age differences... the profiles were defined for showing an association across all subjects. If the authors wanted to identify distinct profiles in children and adults they may need to run another analysis. 

      We thank the reviewer for this thoughtful comment. 

      (1) We agree that showing the correlation between the latent variable and behavior may be redundant, as the relationship is already embedded in the PLSC solution and quantified by the explained variance. Our intention was merely to visualize the strength of this relationship. In hindsight, we agree that this could be misinterpreted, and we have removed the additional correlation figure from the manuscript.

      We also see the reviewer’s point that, given the shared latent profile across groups, it is expected that the strength of the brain-behavior relationship does not differ between age groups. Instead, to investigate group differences more appropriately, we examined whether children and adults differed in their expression of the shared latent variable (i.e., brain scores). This analysis revealed that children showed significantly lower brain scores than adults both in short delay, t(83) = -4.227, p = .0001, and long delay, t(74) = -5.653, p < .001, suggesting that while the brain-behavior profile is shared, its expression varies by group. We have added this clarification to the Results section (p. 19-20) of the revised manuscript. 

      (2) Regarding the second point, we agree with the reviewer that defining the PLS profiles across both age groups inherently limits the ability to detect group-specific associations, as the resulting latent variables represent a shared pattern across the full sample. To address this, we conducted additional PLS analyses separately within each age group to examine whether distinct neural upregulation profiles (remote > recent) emerge for short and long delay conditions.

      These within-group analyses, however, were based on smaller subsamples, which reduced statistical power, especially when using bootstrapping to assess the stability of the profiles. For the short delay, although some regions reached significance, the overall latent variables did not reach conventional thresholds for stability (all p > .069), indicating that the profiles were not robust. This suggests that within-group PLS analyses may be underpowered to detect subtle effects, particularly when modelling neural upregulation (remote > recent), which may be inherently small.

      Nonetheless, when, in an exploratory analysis, we applied PLSC separately within each group to recent and remote activity levels against the implicit baseline (rather than the contrast remote > recent) and their relation to memory performance, we observed significant and stable latent variables in both children and adults. This implies that such contrasts (vs. baseline) may be more sensitive and better suited to detect meaningful brain–behavior relationships within age groups. We have added this clarification to the Results sections of the manuscript to highlight the limitations of within-group contrasts for neural upregulation.

      Author response image 1.

      (3) Also, as for differences between short delay brain profile and long delay brain profile for the scene-specific reinstatement - there are 2 regions that become significant at long delay that were not significant at a short delay (PC, and CE). However, given there are ceiling effects in behaviour at the short but not long delay, it's unclear if this is a meaningful difference or just a difference in sensitivity. Is there a way to test whether the profiles are statistically different from one another?

      We thank the reviewer for this comment. To better illustrate differential profiles also for high memory accuracy after immediate delay (30 minutes delay), we added the immediate (30 minutes delay) condition as a third reference point, given the availability of scene-specific reinstatement data at this time point. Interestingly, the immediate reinstatement profile revealed a different set of significant regions, with distinct expression patterns compared to both the short and long delay conditions. This supports the view that scene-specific reinstatement is not static but dynamically reorganized over time.

      Regarding the ceiling effect at short delay, we acknowledge this as a potential limitation. However, we note that our primary analyses were conducted across both age groups combined, and not solely within high-performing individuals. As such, the grouping may mitigate concerns that ceiling-level performance in a subset of participants unduly influenced the overall reinstatement profile. Moreover, we observed variation in neural reinstatement despite ceiling-level behavior, suggesting that the neural signal retains sensitivity to consolidation-related processes even when behavioral accuracy is near-perfect.

      While we agree that formal statistical comparisons of reinstatement profiles across delays (e.g., using representational profile similarity or interaction tests) could be an informative direction, we feel that this goes beyond the scope of the current manuscript. 

      (4) As I mentioned above, it also was not ideal in my opinion that all regions were included for the scene-specific reinstatement due to the authors' inability to have an appropriate baseline and therefore define above-chance reinstatement. It makes these findings really challenging to compare with the gist reinstatement ones. 

      We appreciate the reviewer’s comment and agree that the lack of a clearly defined baseline for scene-specific reinstatement limits our ability to determine whether these values reflect above-chance reinstatement. However, we would like to clarify that we do not directly compare the magnitude of scene-specific reinstatement to that of gist-like reinstatement in our analyses or interpretations. These two analyses serve complementary purposes: the scene-specific analysis captures trial-unique similarity (within-item reinstatement), while the gist-like analysis captures category-level representational structure (across items). Because they differ not only in baseline assumptions but also in analytical scope and theoretical interpretation, our goal was not to compare them directly, but rather to explore distinct but co-existing representational formats that may evolve differently across development and delay.

      (8) I would encourage the authors to be specific about whether they are measuring/talking about memory representations versus reinstatement, unless they think these are the same thing (in which case some explanation as to why would be helpful). For example, especially under the Fuzzy Trace framework, couldn't someone maintain both verbatim and gist traces of a memory yet rely more on one when making a memory decision? 

      We thank the reviewer for pointing out the importance of conceptual clarity when referring to memory representations versus reinstatement. We agree that these are distinct but related concepts: in our framework, memory representations refer to the neural content stored as a result of encoding and consolidation, whereas reinstatement refers to the reactivation of those representations during retrieval. Thus, reinstatement serves as a proxy for the underlying memory representation — it is how we measure or infer the nature (e.g., specificity, abstraction) of the stored content.

      Under Fuzzy Trace Theory, it is indeed possible for both verbatim and gist representations to coexist. Our interpretation is not that children lack verbatim traces, but rather that they are more likely to rely on schematic or gist-like representations during retrieval, especially after a delay. Our use of neural pattern similarity (reinstatement) reflects which type of representation is being accessed, not necessarily which traces exist in parallel.

      To avoid ambiguity, we have revised the manuscript to more explicitly distinguish between reinstatement (neural reactivation) and the representational format (verbatim vs. gist-like), especially in the framing of our hypotheses and interpretation of age group differences.

      (9) With respect to the learning criteria - it is misleading to say that "children needed between two to four learning-retrieval cycles to reach the criterion of 83% correct responses" (p. 9). Four was the maximum, and looking at the Figure 1C data it appears as though there were at least a few children who did not meet the 83% minimum. I believe they were included in the analysis anyway? Please clarify. Was there any minimum imposed for inclusion?

      We thank the reviewer for pointing this out. As stated in the Methods section (p. 50, lines 1326-1338) “These cycles ranged from a minimum of two to a maximum of four.<…> The cycles ended when participants provided correct responses to 83% of the trials or after the fourth cycle was reached.” We have corrected the corresponding wording in the Results section (lines 286-289) to reflect this more accurately. Indeed, five children did not reach the 83% criterion but achieved final performance between 70 and 80% after the fourth learning cycle. These participants were included in this analysis for two main reasons:

      (1) The 83% threshold was established during piloting as a guideline for how many learning-retrieval cycles to allow, not a strict learning criterion. It served to standardize task continuation, rather than to exclude participants post hoc.

      (2) The performance of these five children was still well above chance level (33%), indicating meaningful learning. Excluding them would have biased the sample toward higher-performing children and reduced the ecological validity of our findings. Including them ensures a more representative view of children’s performance under extended learning conditions.
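      To make the "well above chance" statement concrete, the following sketch computes an exact one-sided binomial tail probability for a single participant. The trial count (30) and number of correct responses (21, i.e., 70%) are assumed values chosen for illustration, not figures taken from the manuscript; only the 33% chance level comes from the text above.

```python
from math import comb


def binomial_tail(successes, trials, chance):
    """Probability of observing at least `successes` correct responses
    out of `trials` if every response were a guess with P(correct) = chance."""
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumed example: 21 of 30 trials correct (70%), chance = 1/3 (three options).
p_above_chance = binomial_tail(successes=21, trials=30, chance=1 / 3)
```

      Under these assumed numbers, guessing alone would very rarely produce 70% accuracy, which is the sense in which performance between 70 and 80% can be described as well above the 33% chance level.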

      (10) For the gist-like reinstatement PLSC analysis, results are really similar at short and long delays and yet some of the text seems to imply specificity to the long delay. One is a trend and one is significant (p. 31), but surely these two associations would not be statistically different from one another?

      We agree with the reviewer that the associations at short and long delays appeared similar. While a formal comparison (e.g., using a Z-test for dependent correlations) would typically be warranted, in the reanalyzed dataset only the long delay profile remains statistically significant, which limits the interpretability of such a comparison. 

      (11) As a general comment, I had a hard time tying all of the (many) results together. For example adults show more mature neocortical consolidation-related engagement, which the authors say is going to create more durable detailed memories, but under multiple trace theory we would generally think of neocortical representations as providing more schematic information. If the authors could try to make more connections across the different neural analyses, as well as tie the neural findings in more closely with the behaviour & back to the theoretical frameworks, that would be really helpful.  

      We thank the reviewer for this valuable suggestion. We have revised the discussion section to more clearly link the behavioral and neural findings and to interpret them in light of existing consolidation theories for better clarity. 

      Reviewer #2 (Public Review): 

      Schommartz et al. present a manuscript characterizing neural signatures of reinstatement during cued retrieval of middle-aged children compared to adults. The authors utilize a paradigm where participants learn the spatial location of semantically related item-scene memoranda which they retrieve after short or long delays. The paradigm is especially strong as the authors include novel memoranda at each delayed time point to make comparisons across new and old learning. In brief, the authors find that children show more forgetting than adults, and adults show greater engagement of cortical networks after longer delays as well as stronger item-specific reinstatement. Interestingly, children show more category-based reinstatement, however, evidence supports that this marker may be maladaptive for retrieving episodic details. The question is extremely timely both given the boom in neurocognitive research on the neural development of memory, and the dearth of research on consolidation in this age group. Also, the results provide novel insights into why consolidation processes may be disrupted in children. 

      We thank the reviewer for the positive evaluation.

      Comments on the revised version: 

      I carefully reviewed not only the responses to my own reviews but also those raised by the other reviewers. While the authors addressed some of the concerns raised in the process, I think many substantive concerns remain.

      Regarding Reviewer 1: 

      The authors point out that the retrieval procedure is the same over time and similarly influenced by temporal autocorrelations, which makes their analysis okay. However, there is a fundamental problem as to whether they are actually measuring reinstatement or they are only measuring differences in temporal autocorrelation (or some non-linear combination of both). The authors further argue that the stimuli are being processed more memory-wise rather than perception-wise; however, I think there is no evidence for that and that perception-memory processes should be considered on a continuum rather than as discrete processes. Thus, I agree with Reviewer 1 that these analyses should be removed.

      We thank the reviewer for raising this important question. We would like to clarify a few key points regarding temporal autocorrelation and reinstatement.

      During the fixation window, participants were instructed to reinstate the scene and location associated with the cued object from memory. This task was familiar to them, as they had been trained in retrieving locations within scenes. Our analysis aims to compare the neural representations during this retrieval phase with those when participants view the scene, in order to assess how these representations change in similarity over time, as memories become less precise.

      We acknowledge that temporal proximity can lead to temporal autocorrelation. However, evidence suggests that temporal autocorrelation is consistent and stable across conditions (Gautama & Van Hulle, 2004; Woolrich et al., 2004). Shinn & Lagalwar (2021) further demonstrated that temporal autocorrelation is highly reliable at both the subject and regional levels. Given that we analyze regions of interest (ROIs) separately, potential spatial variability in temporal autocorrelation is not a major concern.

      The absence of a difference in item-specific reinstatement for recent items between Day 1 and Day 14 (which were merged for the subsequent delay-related comparisons) also suggests that the reinstatement measure was stable for recent items, even when sampled on two different testing days.

      Importantly, we interpret the relative change in the reinstatement index rather than its absolute value.

      In addition, when we conducted the same analysis for incorrectly retrieved memories, we did not observe any delay-related decline in reinstatement (see p. 25, lines 623-627). This suggests that the delay-related changes in reinstatement are specific to correctly retrieved memories. 

      Finally, our control analysis examining reinstatement between object and fixation time points (as suggested by Reviewer 1) revealed no delay-related effects in any ROI (see p.24, lines 605-612), further highlighting the specificity of the observed delay-related change in item reinstatement.

      We emphasize that temporal autocorrelation should be similar across all retrieval delays due to the identical task design and structure. Therefore, any observed decrease in reinstatement with increasing delay likely reflects a genuine change in the reinstatement index, rather than differences in temporal autocorrelation. Since our analysis includes only correctly retrieved items, and there is no perceptual input during the fixation window, this process is inherently memory-based, relying on mnemonic retrieval rather than sensory processing.

      We respectfully disagree with the reviewer's assertion that retrieval during the fixation period cannot be considered more memory-driven than perception-driven. At this time point, participants had no access to actual images of the scene, making it necessary for them to rely on mnemonic retrieval. The object cue likely triggered pattern completion for the learned object-scene association, forming a unique memory if remembered correctly (Horner & Burgess, 2013). This process is inherently mnemonic, as it is based on reconstructing the original neural representation of the scene (Kuhl et al., 2012; Staresina et al., 2013).

      While perception and memory processes can indeed be viewed as a continuum, some cognitive processes are predominantly memory-based, involving reconstruction rather than reproduction of previous experiences (Bartlett, 1932; Ranganath & Ritchey, 2012). In our task, although the retrieved material is based on previously encoded visual information, the process of recalling this information during the fixation period is fundamentally mnemonic, as it does not involve visual input. Our findings indicate that the similarity between memory-based representations and those observed during actual perception decreases over time, suggesting a relative change in the quality of the representations. However, this does not imply that detailed representations disappear; they may still be robust enough to support correct memory recall. Previous studies examining encoding-retrieval similarity have shown similar findings (Pacheco Estefan et al., 2019; Ritchey et al., 2013).

      We do not claim that perception and memory processes are entirely discrete, nor do we suggest that only perception is involved when participants see the scene. Viewing the scene indeed involves recognition processes, updating retrieved representations from the fixation period, and potentially completing missing or unclear information. This integrative process demonstrates the interrelation of perception and memory, especially in complex tasks like the one we employed.

      In conclusion, our task design and analysis support the interpretation that the fixation period is primarily characterized by mnemonic retrieval, facilitated by cue-triggered pattern completion, rather than perceptual processing. We believe this approach aligns with the current understanding of memory retrieval processes as supported by the existing literature.

      The authors seem to have a design that would allow for across-run comparisons; however, they did not include these additional analyses.

      Thank you for pointing this out. We ran an additional cross-run comparison. The results and our subsequent approach are reported in our response to Reviewer 1.

      To address the reviewer’s concern, we conducted an additional cross-run analysis for all correctly retrieved trials. The approach restricted comparisons to non-overlapping runs (run1-run2, run2-run3, run1-run3). This analysis revealed robust gist-like reinstatement in children for remote Day 14 memories in the mPFC (p = .035) and vlPFC (p = .0007), in adults’ vlPFC for remote Day 1 memories (p = .029), as well as in both children and adults for remote Day 1 memories in LOC (p < .02). A significant Session effect in both regions (mPFC: p = .026; vlPFC: p = .002) indicated increased reinstatement for the long delay (Day 14) compared to the short-delay and recent sessions (all p < .05). Given that the cross-run results largely replicate and reinforce the effects found previously with the within-run analysis, we believe that combining both sources of information is methodologically justified and statistically beneficial. Specifically, both approaches independently identified significant gist-like reinstatement in children’s mPFC and vlPFC (although the within-run vlPFC effect (short delay: p = .038; long delay: p = .047) did not survive correction for multiple comparisons), particularly for remote memories. Including both within-run and between-run comparisons increases the number of unique, non-repeated trial pairs, improving statistical power without introducing redundancy. While we acknowledge that same-run comparisons may be influenced by residual autocorrelation (Prince et al., 2022), we believe that our design mitigates this risk through consistency between within-run and cross-run results, long inter-trial intervals, and trial-wise estimation of activation. We have adjusted the manuscript accordingly, reporting the combined analysis. We also report the cross-run and within-run analyses separately in the supplementary materials (Tables S12.1, S12.2), showing that the within-run results converge with the cross-run results and thus strengthen rather than dilute the findings.

      As suggested, we now explicitly highlight the change over time as the central finding. We observe a clear increase in gist-like reinstatement from recent to remote memories in children, particularly in mPFC and vlPFC. These effects, based on combined within- and cross-run comparisons, are now clearly stated in the main results and interpreted in the discussion accordingly.
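
      As an illustration of the pair-selection logic described above, the following minimal Python sketch separates trial pairs into within-run and cross-run sets. It is not the authors' code: the function name `split_trial_pairs` and the input format (one run label per trial) are assumptions made for this example only.

```python
from itertools import combinations


def split_trial_pairs(run_labels):
    """Split all unique trial pairs into within-run and cross-run sets.

    run_labels: one run label per trial (hypothetical input format).
    Returns (within, cross); each is a list of (i, j) index pairs with i < j.
    """
    within, cross = [], []
    for i, j in combinations(range(len(run_labels)), 2):
        if run_labels[i] == run_labels[j]:
            within.append((i, j))  # both trials come from the same run
        else:
            cross.append((i, j))   # trials come from non-overlapping runs
    return within, cross


# Toy example: five trials spread over three runs.
runs = [1, 1, 2, 2, 3]
within_pairs, cross_pairs = split_trial_pairs(runs)
```

      Restricting a similarity analysis to `cross_pairs` corresponds to the cross-run approach; using both lists corresponds to the combined approach, in which every unique pair is counted once.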

      (1) The authors did not satisfy my concerns about different amounts of re-exposures to stimuli as a function of age, which introduces a serious confound in the interpretation of the neural data. 

      (2) Regarding Reviewer 1's point about different number of trials being entered into analysis, I think a more formal test of sub-sampling the adult trials is warranted. 

      (1) We thank the reviewer for pointing this out. Overall, children needed 2 to 4 learning cycles to improve their performance and reach the learning criteria, compared to 2 learning cycles in adults. To address the different amounts of re-exposure to stimuli between the age groups, we subsampled the child group to only those children who reached the learning criteria after 2 learning cycles. For this purpose, we excluded 21 children from the analysis who needed 3 or 4 learning cycles. This resulted in 39 young adults and 28 children being included in the subsequent analysis. 

      (i) We reran the behavioral analysis with the subsampled dataset (see Supplementary Materials, Table S1.1, Fig. S1, Table S1.2). This analysis replicated the previous findings of less robust memory consolidation in children across all time delays.

      (ii) We reran the univariate analysis (see Supplementary Materials, Table S9.1). This analysis also fully replicated the previous findings. This indicates that the inclusion of child participants with greater material exposure during learning in the analysis of neural retrieval patterns did not affect the group differences in univariate neural results.

      These subsampled results demonstrated that the amount of re-exposure to stimuli during encoding does not affect consolidation-related changes in memory retrieval at the behavioral and neural levels in children and adults across all time delays. We have added this information to the manuscript (line 343-348, 420-425). 

      (2) We appreciate Reviewer 1's suggestion to perform a formal test by sub-sampling the adult trials to match the number of trials in the child group. However, we believe that this approach may not be optimal for the following reasons:

      (i) Loss of Statistical Power: Sub-sampling the adult trials would result in a reduced sample size, potentially leading to a significant loss of statistical power and the ability to detect meaningful effects, particularly in a context where the adult group is intended to serve as a robust control or comparison group.

      (ii) Added Variability: Sub-sampling could introduce variability that complicates the interpretation of results, particularly if the trial sub-sampling process does not fully capture the variability inherent in the original adult data.

      (iii) Robustness of Existing Findings: We have already addressed potential concerns about unequal trial numbers by conducting analyses that control for the number of learning cycles, as detailed in our supplementary materials. These analyses have shown that the observed effects are consistent, suggesting that the differences in trial numbers do not critically influence our findings.

      Given these considerations, we hope the reviewer understands our rationale and agrees that the current analysis is robust and appropriate for addressing the research questions.
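
      For reference, the trial-matching procedure under discussion could be sketched roughly as follows. This is a hypothetical illustration rather than an analysis that was run: the function names, the seeds, and the trial counts in the example are all assumptions. Repeating the draw many times is one common way to keep a single random subset from driving the result.

```python
import random


def subsample_indices(n_available, n_target, seed):
    """Draw n_target trial indices without replacement from n_available trials."""
    rng = random.Random(seed)  # seeded so that each draw is reproducible
    if n_target >= n_available:
        return list(range(n_available))  # nothing to remove
    return sorted(rng.sample(range(n_available), n_target))


def repeated_subsamples(n_available, n_target, n_repeats=100, base_seed=0):
    """Return n_repeats independent subsamples, one seed per repeat."""
    return [
        subsample_indices(n_available, n_target, base_seed + k)
        for k in range(n_repeats)
    ]


# Hypothetical counts: an adult with 30 usable trials matched to 20 trials.
draws = repeated_subsamples(n_available=30, n_target=20, n_repeats=5)
```

      An effect of interest would then be estimated within each draw and summarized across draws (for example, by its mean).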

      I also still fundamentally disagree with the use of global signals when comparing children to adults, and think this could very much skew the results. 

      We thank the reviewer for raising this important issue. To address this concern comprehensively, we have taken the following steps:

      (1) Overview of the literature support for global signal regression (GSR). A growing body of methodological and empirical research supports the inclusion of global signal regression as part of best-practice denoising pipelines, particularly when analyzing pediatric fMRI data. Studies such as Ciric et al. (2017), Parkes et al. (2018), J. D. Power et al. (2012, 2014), Power et al. (2012), and Thompson et al. (2016) show that GSR improves motion-related artifact removal. Critically, pediatric-specific studies (Disselhoff et al., 2025; Graff et al., 2022) conclude that pipelines including GSR are most effective for signal recovery and artifact removal in younger children. Graff et al. (2022) demonstrated that among various pipelines, GSR yielded the best noise reduction in 4–8-year-olds. Additionally, Li et al. (2019) and Qing et al. (2015) emphasized that GSR reduces artifactual variance without distorting the spatial structure of neural signals. Ofoghi et al. (2021) demonstrated that global signal regression helps mitigate non-neuronal noise sources, including respiration, cardiac activity, motion, vasodilation, and scanner-related artifacts. Based on this and other recent findings, we consider GSR particularly beneficial for denoising pediatric fMRI data in our study.

      (2) Empirical comparison of pipelines with and without GSR. We re-ran the entire first-level univariate analysis using a pipeline that excluded global signal regression. The resulting activation maps (see Supplementary Figures S3.2, S4.2, S5.2, S9.2) differed notably from those of the original pipeline. Specifically, group differences in regions such as mPFC, cerebellum, and posterior PHG no longer reached significance, and the overall pattern of results appeared noisier.

      (3) Evaluation of the pipeline differences. To further evaluate the impact of GSR, we conducted the following analyses:

      (a) Global signal is stable across groups and sessions. A linear mixed-effects model showed no significant main effects or interactions involving group or session on the global signal (F-values < 2.62, p > .11), suggesting that the global signal was not group- or session-dependent in our sample. 

      (b) Noise Reduction Assessment via Contrast Variability. We compared the variability (standard deviation and IQR) of contrast estimates across pipelines. Both SD (b = .070, p < .001) and IQR (b = .087, p < .001) were significantly reduced in the GSR pipeline, especially in children (p < .001) compared to adults (p = .048). This suggests that GSR reduces inter-subject variability in children, likely reflecting improved signal quality.

      (c) Residual Variability After Regressing Global Signal. We regressed out global signal post hoc from both pipelines and compared the residual variance. Residual standard deviation was significantly lower for the GSR pipeline (F = 199, p < .001), with no interaction with session or group, further indicating that GSR stabilizes the signal and attenuates non-neuronal variability.
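
      To make the variability comparison in (b) concrete, the short Python sketch below computes the two spread measures named there (standard deviation and IQR) for two sets of contrast estimates. The numbers are invented toy values, not data from the study, and the helper name `spread` is an assumption; the mixed-effects modelling reported above is not reproduced here.

```python
import statistics


def spread(values):
    """Return (sample standard deviation, interquartile range) of values."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    return statistics.stdev(values), q3 - q1


# Invented toy contrast estimates for one region, one value per participant.
with_gsr = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05]
without_gsr = [0.5, 1.0, 1.6, 0.8, 1.3, 1.1]

sd_gsr, iqr_gsr = spread(with_gsr)
sd_no_gsr, iqr_no_gsr = spread(without_gsr)
```

      In this toy case both measures are smaller for the first list, which is the pattern the comparison in (b) tests for formally.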

      Conclusion

      In summary, while we understand the reviewer’s concern, we believe the empirical and theoretical support for GSR, especially in pediatric samples, justifies its use in our study. Nonetheless, to ensure full transparency, we provide full results from both pipelines in the Supplementary Materials and have clarified our reasoning in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      (1) Some figures are still missing descriptions of what everything on the graph means; please clarify in captions. 

      We thank the reviewer for pointing this out. We undertook the necessary adjustments in the graph annotations. 

      (2) The authors conclude they showed evidence of neural reorganization of memory representations in children (p. 41). But the gist is not greater in children than adults, and also does not differ over time-so, I was confused about what this claim was based on? 

      We thank the reviewer for raising this question. Our results suggest that gist-like reinstatement in the mPFC was significantly higher in children than in adults, and that children's gist-like reinstatement indices were significantly higher than zero (see p. 27-28). These results support our claim of neural reorganization of memory representations in children. We hope this clarifies the issue.

      References

      Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge University Press.

      Brainerd, C. J., & Reyna, V. F. (2002). Fuzzy-Trace Theory: Dual Processes in Memory, Reasoning, and Cognitive Neuroscience (pp. 41–100). https://doi.org/10.1016/S0065-2407(02)80062-3

      Chen, J., Leong, Y. C., Honey, C. J., Yong, C. H., Norman, K. A., & Hasson, U. (2017). Shared memories reveal shared structure in neural activity across individuals. Nature Neuroscience, 20(1), 115–125. https://doi.org/10.1038/nn.4450

      Ciric, R., Wolf, D. H., Power, J. D., Roalf, D. R., Baum, G. L., Ruparel, K., Shinohara, R. T., Elliott, M. A., Eickhoff, S. B., Davatzikos, C., Gur, R. C., Gur, R. E., Bassett, D. S., & Satterthwaite, T. D. (2017). Benchmarking of participant-level confound regression strategies for the control of motion artifact in studies of functional connectivity. NeuroImage, 154, 174–187. https://doi.org/10.1016/j.neuroimage.2017.03.020

      Disselhoff, V., Jakab, A., Latal, B., Schnider, B., Wehrle, F. M., Hagmann, C. F., Held, U., O’Gorman, R. T., Fauchère, J.-C., & Hüppi, P. (2025). Inhibition abilities and functional brain connectivity in school-aged term-born and preterm-born children. Pediatric Research, 97(1), 315–324. https://doi.org/10.1038/s41390-024-03241-0

      Esteban, O., Ciric, R., Finc, K., Blair, R. W., Markiewicz, C. J., Moodie, C. A., Kent, J. D., Goncalves, M., DuPre, E., Gomez, D. E. P., Ye, Z., Salo, T., Valabregue, R., Amlien, I. K., Liem, F., Jacoby, N., Stojić, H., Cieslak, M., Urchs, S., … Gorgolewski, K. J. (2020). Analysis of task-based functional MRI data preprocessed with fMRIPrep. Nature Protocols, 15(7), 2186–2202. https://doi.org/10.1038/s41596-020-0327-3

      Fandakova, Y., Leckey, S., Driver, C. C., Bunge, S. A., & Ghetti, S. (2019). Neural specificity of scene representations is related to memory performance in childhood. NeuroImage, 199, 105–113. https://doi.org/10.1016/j.neuroimage.2019.05.050

      Gautama, T., & Van Hulle, M. M. (2004). Optimal spatial regularisation of autocorrelation estimates in fMRI analysis. NeuroImage, 23(3), 1203–1216.  https://doi.org/10.1016/j.neuroimage.2004.07.048

      Graff, K., Tansey, R., Ip, A., Rohr, C., Dimond, D., Dewey, D., & Bray, S. (2022). Benchmarking common preprocessing strategies in early childhood functional connectivity and intersubject correlation fMRI. Developmental Cognitive Neuroscience, 54, 101087. https://doi.org/10.1016/j.dcn.2022.101087

      Horner, A. J., & Burgess, N. (2013). The associative structure of memory for multi-element events. Journal of Experimental Psychology: General, 142(4), 1370–1383. https://doi.org/10.1037/a0033626

      Jones, J. S., the CALM Team, & Astle, D. E. (2021). A transdiagnostic data-driven study of children’s behaviour and the functional connectome. Developmental Cognitive Neuroscience, 52, 101027. https://doi.org/10.1016/j.dcn.2021.101027

      Kuhl, B. A., Bainbridge, W. A., & Chun, M. M. (2012). Neural Reactivation Reveals Mechanisms for Updating Memory. Journal of Neuroscience, 32(10), 3453–3461. https://doi.org/10.1523/JNEUROSCI.5846-11.2012

      Kuhl, B. A., & Chun, M. M. (2014). Successful Remembering Elicits Event-Specific Activity Patterns in Lateral Parietal Cortex. Journal of Neuroscience, 34(23), 8051–8060. https://doi.org/10.1523/JNEUROSCI.4328-13.2014

      Li, J., Kong, R., Liégeois, R., Orban, C., Tan, Y., Sun, N., Holmes, A. J., Sabuncu, M. R., Ge, T., & Yeo, B. T. T. (2019). Global signal regression strengthens association between resting-state functional connectivity and behavior. NeuroImage, 196, 126–141. https://doi.org/10.1016/j.neuroimage.2019.04.016

      Ofoghi, B., Chenaghlou, M., Mooney, M., Dwyer, D. B., & Bruce, L. (2021). Team technical performance characteristics and their association with match outcome in elite netball. International Journal of Performance Analysis in Sport, 21(5), 700–712. https://doi.org/10.1080/24748668.2021.1938424

      Pacheco Estefan, D., Sánchez-Fibla, M., Duff, A., Principe, A., Rocamora, R., Zhang, H., Axmacher, N., & Verschure, P. F. M. J. (2019). Coordinated representational reinstatement in the human hippocampus and lateral temporal cortex during episodic memory retrieval. Nature Communications, 10(1), 2255. https://doi.org/10.1038/s41467-019-09569-0

      Parkes, L., Fulcher, B., Yücel, M., & Fornito, A. (2018). An evaluation of the efficacy, reliability, and sensitivity of motion correction strategies for resting-state functional MRI. NeuroImage, 171, 415–436. https://doi.org/10.1016/j.neuroimage.2017.12.073

      Power, J. D., Barnes, K. A., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2012). Spurious but systematic correlations in functional connectivity MRI networks arise from subject motion. NeuroImage, 59(3), 2142–2154. https://doi.org/10.1016/j.neuroimage.2011.10.018

      Power, J. D., Mitra, A., Laumann, T. O., Snyder, A. Z., Schlaggar, B. L., & Petersen, S. E. (2014). Methods to detect, characterize, and remove motion artifact in resting state fMRI. NeuroImage, 84, 320–341. https://doi.org/10.1016/j.neuroimage.2013.08.048

      Power, S. D., Kushki, A., & Chau, T. (2012). Intersession Consistency of Single-Trial Classification of the Prefrontal Response to Mental Arithmetic and the No-Control State by NIRS. PLoS ONE, 7(7), e37791. https://doi.org/10.1371/journal.pone.0037791

      Prince, J. S., Charest, I., Kurzawski, J. W., Pyles, J. A., Tarr, M. J., & Kay, K. N. (2022). Improving the accuracy of single-trial fMRI response estimates using GLMsingle. ELife, 11. https://doi.org/10.7554/eLife.77599

      Qing, Z., Dong, Z., Li, S., Zang, Y., & Liu, D. (2015). Global signal regression has complex effects on regional homogeneity of resting state fMRI signal. Magnetic Resonance Imaging, 33(10), 1306–1313. https://doi.org/10.1016/j.mri.2015.07.011

      Ranganath, C., & Ritchey, M. (2012). Two cortical systems for memory-guided behaviour. Nature Reviews Neuroscience, 13(10), 713–726. https://doi.org/10.1038/nrn3338

      Ritchey, M., Wing, E. A., LaBar, K. S., & Cabeza, R. (2013). Neural Similarity Between Encoding and Retrieval is Related to Memory Via Hippocampal Interactions. Cerebral Cortex, 23(12), 2818–2828. https://doi.org/10.1093/cercor/bhs258

      Satterthwaite, T. D., Elliott, M. A., Gerraty, R. T., Ruparel, K., Loughead, J., Calkins, M. E., Eickhoff, S. B., Hakonarson, H., Gur, R. C., Gur, R. E., & Wolf, D. H. (2013). An improved framework for confound regression and filtering for control of motion artifact in the preprocessing of resting-state functional connectivity data. NeuroImage, 64, 240–256. https://doi.org/10.1016/j.neuroimage.2012.08.052

      Schommartz, I., Lembcke, P. F., Pupillo, F., Schuetz, H., de Chamorro, N. W., Bauer, M., Kaindl, A. M., Buss, C., & Shing, Y. L. (2023). Distinct multivariate structural brain profiles are related to variations in short- and long-delay memory consolidation across children and young adults. Developmental Cognitive Neuroscience, 59. https://doi.org/10.1016/J.DCN.2022.101192

      Sekeres, M. J., Winocur, G., & Moscovitch, M. (2018). The hippocampus and related neocortical structures in memory transformation. Neuroscience Letters, 680, 39–53. https://doi.org/10.1016/j.neulet.2018.05.006

      Shinn, L. J., & Lagalwar, S. (2021). Treating Neurodegenerative Disease with Antioxidants: Efficacy of the Bioactive Phenol Resveratrol and Mitochondrial-Targeted MitoQ and SkQ. Antioxidants, 10(4), 573. https://doi.org/10.3390/antiox10040573

      Staresina, B. P., Alink, A., Kriegeskorte, N., & Henson, R. N. (2013). Awake reactivation predicts memory in humans. Proceedings of the National Academy of Sciences, 110(52), 21159–21164. https://doi.org/10.1073/pnas.1311989110

      St-Laurent, M., & Buchsbaum, B. R. (2019). How Multiple Retrievals Affect Neural Reactivation in Young and Older Adults. The Journals of Gerontology: Series B, 74(7), 1086–1100. https://doi.org/10.1093/geronb/gbz075

      Thompson, G. J., Riedl, V., Grimmer, T., Drzezga, A., Herman, P., & Hyder, F. (2016). The Whole-Brain “Global” Signal from Resting State fMRI as a Potential Biomarker of Quantitative State Changes in Glucose Metabolism. Brain Connectivity, 6(6), 435–447. https://doi.org/10.1089/brain.2015.0394

      Tompary, A., & Davachi, L. (2017). Consolidation Promotes the Emergence of Representational Overlap in the Hippocampus and Medial Prefrontal Cortex. Neuron, 96(1), 228-241.e5. https://doi.org/10.1016/j.neuron.2017.09.005

      Tompary, A., Zhou, W., & Davachi, L. (2020). Schematic memories develop quickly, but are not expressed unless necessary. PsyArXiv.

      Woolrich, M. W., Behrens, T. E. J., Beckmann, C. F., Jenkinson, M., & Smith, S. M. (2004). Multilevel linear modelling for FMRI group analysis using Bayesian inference. NeuroImage, 21(4), 1732–1747. https://doi.org/10.1016/j.neuroimage.2003.12.023

      Xiao, X., Dong, Q., Gao, J., Men, W., Poldrack, R. A., & Xue, G. (2017). Transformed Neural Pattern Reinstatement during Episodic Memory Retrieval. The Journal of Neuroscience, 37(11), 2986–2998. https://doi.org/10.1523/JNEUROSCI.2324-16.2017

      Ye, Z., Shi, L., Li, A., Chen, C., & Xue, G. (2020). Retrieval practice facilitates memory updating by enhancing and differentiating medial prefrontal cortex representations. ELife, 9, 1–51. https://doi.org/10.7554/ELIFE.57023

      Yonelinas, A. P., Ranganath, C., Ekstrom, A. D., & Wiltgen, B. J. (2019). A contextual binding theory of episodic memory: systems consolidation reconsidered. Nature Reviews. Neuroscience, 20(6), 364–375. https://doi.org/10.1038/s41583-019-0150-4

      Zhuang, L., Wang, J., Xiong, B., Bian, C., Hao, L., Bayley, P. J., & Qin, S. (2021). Rapid neural reorganization during retrieval practice predicts subsequent long-term retention and false memory. Nature Human Behaviour, 6(1), 134–145. https://doi.org/10.1038/s41562-021-01188-4

    1. Reviewer #1 (Public review):

      Summary:

      In the research manuscript submitted to eLife (Manuscript ID eLife-RP-RA-2024-104545) titled "Therapeutic benefits of maintaining CDK4/6 inhibitors and incorporating CDK2 inhibitors beyond progression in breast cancer", the authors identified that 1) CDK4/6i treatment attenuates the growth of drug-resistant cells by prolongation of the G1 phase; 2) CDK4/6i treatment results in an ineffective Rb inactivation pathway and suppresses the growth of drug-resistant tumors; 3) addition of endocrine therapy augments the efficacy of CDK4/6i maintenance; 4) addition of CDK2i to CDK4/6i treatment as second-line treatment can suppress the growth of resistant cells; and 5) cyclin E plays a role as a key driver of resistance to CDK4/6 and CDK2 inhibition.

      Strengths:

      To prove their complicated proposal, the authors employed an orchestration of several kinds of live-cell markers, timed in situ hybridization, IF, and immunoblotting. The authors strongly recognize the resistance to CDK4/6 + ET therapy and demonstrated how to overcome it.

      Weaknesses:

      None.

      Comments on revisions:

      In response to the reviewers' questions and comments, the authors have revised the manuscript accordingly and sufficiently addressed the differences between their study and previous works on CDK4/6 and CDK2 combination therapy as a second-line approach.

    2. Reviewer #3 (Public review):

      Summary:

      In their manuscript, Armand and colleagues investigate the potential of continuing CDK4/6 inhibitors or combining them with CDK2 inhibitors in the treatment of breast cancer that has developed resistance to initial therapy. Utilizing cellular and animal models, the research examines whether maintaining CDK4/6 inhibition or adding CDK2 inhibitors can effectively control tumor growth after resistance has set in. The key findings from the study indicate that the sustained use of CDK4/6 inhibitors can slow down the proliferation of cancer cells that have become resistant, and the combination of CDK2 inhibitors with CDK4/6 inhibitors can further enhance the suppression of tumor growth. Additionally, the study identifies that high levels of Cyclin E play a significant role in resistance to the combined therapy. These results suggest that continuing CDK4/6 inhibitors along with the strategic use of CDK2 inhibitors could be an effective strategy to overcome treatment resistance in hormone receptor-positive breast cancer. However, several issues need to be addressed before considering its publication.

      Strengths:

      (1) Continuous CDK4/6 Inhibitor Treatment Significantly Suppresses the Growth of Drug-Resistant HR+ Breast Cancer: The study demonstrates that the continued use of CDK4/6 inhibitors, even after disease progression, can significantly inhibit the growth of drug-resistant breast cancer.

      (2) Potential of Combined Use of CDK2 Inhibitors with CDK4/6 Inhibitors: The research highlights the potential of combining CDK2 inhibitors with CDK4/6 inhibitors to effectively suppress CDK2 activity and overcome drug resistance.

      (3) Discovery of Cyclin E Overexpression as a Key Driver: The study identifies overexpression of cyclin E as a key driver of resistance to the combination of CDK4/6 and CDK2 inhibitors, providing insights for future cancer treatments.

      (4) Consistency of In Vitro and In Vivo Experimental Results: The study obtained supportive results from both in vitro cell experiments and in vivo tumor models, enhancing the reliability of the research.

      (5) Validation with Multiple Cell Lines: The research utilized multiple HR+/HER2- breast cancer cell lines (such as MCF-7, T47D, CAMA-1) and triple-negative breast cancer cell lines (such as MDA-MB-231), validating the broad applicability of the results.

      Comments on revisions:

      The authors made a significant effort to improve the manuscript. My comments were sufficiently addressed.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: 

      In this manuscript, the authors identified that

      (1) CDK4/6i treatment attenuates the growth of drug-resistant cells by prolongation of the G1 phase;

      (2) CDK4/6i treatment results in an ineffective Rb inactivation pathway and suppresses the growth of drug-resistant tumors;

      (3) Addition of endocrine therapy augments the efficacy of CDK4/6i maintenance; 

      (4) Addition of CDK2i to CDK4/6i treatment as a second-line treatment can suppress the growth of resistant cells;

      (5) The role of cyclin E as a key driver of resistance to CDK4/6 and CDK2 inhibition.

      Strengths: 

      To support their complex proposal, the authors combined several kinds of live-cell markers, timed in situ hybridization, IF, and immunoblotting. The authors clearly recognize the problem of resistance to CDK4/6i + ET therapy and demonstrate how to overcome it.

      Weaknesses: 

      The authors need to distinguish more clearly between their own results and those already achieved by other researchers.

      Reviewer #2 (Public review): 

      Summary: 

      This study elucidated the mechanism underlying drug resistance induced by CDK4/6i as a single agent and proposed a novel and efficacious second-line therapeutic strategy. It highlighted the potential of combining CDK2i with CDK4/6i for the treatment of HR+/HER2- breast cancer.

      Strengths: 

      The study demonstrated that CDK4/6i induces drug resistance by impairing Rb inactivation, which results in diminished E2F activity and a delay in G1 phase progression. It suggests that the synergistic use of CDK2i and CDK4/6i may represent a promising second-line treatment approach. Addressing critical clinical challenges, this study holds substantial practical implications.

      Weaknesses: 

      (1) Drug-resistant cell lines: Was a drug concentration gradient treatment employed to establish drug-resistant cell lines? If affirmative, this methodology should be detailed in the materials and methods section. 

      We greatly appreciate the reviewer for raising this important question. In the revised manuscript, we have updated the methods section (“Drug-resistant cell lines”) to more precisely describe how the drug-resistant cell lines were established. 

      (2) What rationale informed the selection of MCF-7 cells for the generation of CDK6 knockout cell lines? Supplementary Figure 3A indicates that CDK6 expression levels in MCF-7 cells are not notably elevated.

      We appreciate the reviewer’s insightful question about the rationale for selecting MCF-7 cells to generate CDK6 knockout cell lines. This choice was guided by prior studies highlighting the significant role of CDK6 in mediating resistance to CDK4/6 inhibitors (21-24). Moreover, we observed a 4.6-fold increase in CDK6 expression in CDK4/6i resistant MCF-7 cells compared to their drug-naïve counterparts (Supplementary Figure 3A). While we did not detect notable differences in CDK4/6 activity between wild-type and CDK6 knockout cells under CDK4/6 inhibitor treatment, these findings point to a potential non-canonical function of CDK6 in conferring resistance to CDK4/6 inhibitors.  

      (3) For each experiment, particularly those involving mice, the author must specify the number of individuals utilized and the number of replicates conducted, as detailed in the materials and methods section. 

      We sincerely thank the reviewer for bringing this to our attention. In the revised manuscript, we have explicitly stated the number of replicates and mice used for each experiment as appropriate in figure legends and relevant text to ensure transparency and clarity. 

      (4) Could this treatment approach be extended to triple-negative breast cancer?

      We greatly appreciate the reviewer’s inquiry about extending our findings to triple-negative breast cancer (TNBC). Based on the data presented in Figure 1 and Supplementary Figure 2, which include the TNBC cell line MDA-MB-231, we expect that the benefits of maintaining CDK4/6 inhibitors could indeed be applicable to TNBC with an intact Rb/E2F pathway. Additionally, our recent paper (25) indicates a similar mechanism in TNBC.

      Reviewer #3 (Public review):

      Summary: 

      In their manuscript, Armand and colleagues investigate the potential of continuing CDK4/6 inhibitors or combining them with CDK2 inhibitors in the treatment of breast cancer that has developed resistance to initial therapy. Utilizing cellular and animal models, the research examines whether maintaining CDK4/6 inhibition or adding CDK2 inhibitors can effectively control tumor growth after resistance has set in. The key findings from the study indicate that the sustained use of CDK4/6 inhibitors can slow down the proliferation of cancer cells that have become resistant, and the combination of CDK2 inhibitors with CDK4/6 inhibitors can further enhance the suppression of tumor growth. Additionally, the study identifies that high levels of Cyclin E play a significant role in resistance to the combined therapy. These results suggest that continuing CDK4/6 inhibitors along with the strategic use of CDK2 inhibitors could be an effective strategy to overcome treatment resistance in hormone receptor-positive breast cancer.

      Strengths: 

      (1) Continuous CDK4/6 Inhibitor Treatment Significantly Suppresses the Growth of Drug-Resistant HR+ Breast Cancer: The study demonstrates that the continued use of CDK4/6 inhibitors, even after disease progression, can significantly inhibit the growth of drug-resistant breast cancer. 

      (2) Potential of Combined Use of CDK2 Inhibitors with CDK4/6 Inhibitors: The research highlights the potential of combining CDK2 inhibitors with CDK4/6 inhibitors to effectively suppress CDK2 activity and overcome drug resistance. 

      (3) Discovery of Cyclin E Overexpression as a Key Driver: The study identifies overexpression of cyclin E as a key driver of resistance to the combination of CDK4/6 and CDK2 inhibitors, providing insights for future cancer treatments. 

      (4) Consistency of In Vitro and In Vivo Experimental Results: The study obtained supportive results from both in vitro cell experiments and in vivo tumor models, enhancing the reliability of the research. 

      (5) Validation with Multiple Cell Lines: The research utilized multiple HR+/HER2- breast cancer cell lines (such as MCF-7, T47D, CAMA-1) and triple-negative breast cancer cell lines (such as MDA-MB-231), validating the broad applicability of the results.

      Weaknesses: 

      (1) The manuscript presents intriguing findings on the sustained use of CDK4/6 inhibitors and the potential incorporation of CDK2 inhibitors in breast cancer treatment. However, I would appreciate a more detailed discussion of how these findings could be translated into clinical practice, particularly regarding the management of patients with drug-resistant breast cancer. 

      We thank the reviewer for this crucial comment. In the revised Discussion, we have broadened our exploration of clinical translation. Specifically, we emphasize that ongoing CDK4/6 inhibition, although not fully stopping resistant tumors, significantly slows their growth and may offer a therapeutic window when combined with ET and CDK2 inhibition. We also note that these approaches may work best for patients without Rb loss or newly acquired resistance-driving mutations, and that cyclin E overexpression could be a biomarker to inform patient selection. These points together highlight that our findings provide a mechanistic understanding and potential framework for clinical trials testing maintenance CDK4/6i with selective addition of CDK2i as a second-line strategy in drug-resistant HR+/HER2- breast cancer.

      (2) While the emergence of resistance is acknowledged, the manuscript could benefit from a deeper exploration of the molecular mechanisms underlying resistance development. A more thorough understanding of how CDK2 inhibitors may overcome this resistance would be valuable. 

      We thank the reviewer for this valuable suggestion. In the revised manuscript, we have expanded our Discussion to more explicitly synthesize the molecular mechanisms of resistance and how CDK2 inhibitors counteract them. Specifically, we describe how sustained CDK4/6 inhibition drives a non-canonical route of Rb degradation, resulting in inefficient E2F activation and prolonged G1 phase progression. We also highlight the role of c-Myc in amplifying E2F activity and promoting resistance, and we show that continued ET mitigates this effect by suppressing c-Myc. Importantly, we demonstrate that CDK2 inhibition alone cannot fully suppress the growth of resistant cells, but when combined with CDK4/6 inhibition, it produces durable repression of E2F and Myc target gene programs and significantly delays the G1/S transition. Finally, we identify cyclin E overexpression as a key mechanism of escape from dual CDK4/6i + CDK2i therapy, suggesting its potential as a biomarker for patient stratification. Together, these findings provide a detailed mechanistic rationale for how CDK2 inhibition can overcome specific pathways of resistance in HR+/HER2- breast cancer.

      (3) The manuscript supports the continued use of CDK4/6 inhibitors, but it lacks a discussion on the long-term efficacy and safety of this approach. Additional studies or data to support the safety profile of prolonged CDK4/6 inhibitor use would strengthen the manuscript. 

      We appreciate the reviewer’s insightful comment. In the revised manuscript, we emphasize the long-term efficacy and safety considerations of sustained CDK4/6 inhibition. Clinical trial and retrospective data have shown that continued CDK4/6i therapy can extend progression-free survival in selected patients, while maintaining a favorable safety profile (26-28). We have updated the Discussion to highlight these findings more explicitly, underscoring that while prolonged CDK4/6 inhibition slows but does not fully arrest tumor growth, it remains a clinically viable strategy when balanced against its manageable toxicity profile.

      Reviewer #1 (Recommendations for the authors): 

      It is well known that the combination therapy of CDK4/6i and ET has therapeutic benefits in ER(+) HER2(-) advanced breast cancer. However, drug resistance is a problem, and second-line therapy to solve this problem has not been established. Although some parts of the research results are already reported, the authors confirmed them by employing live cell markers, and further proved and suggested how to overcome this resistance in detail. This part is considered novel. 

      Overall, this research manuscript is eligible to be accepted with the appropriate addressing of questions.

      (1) The effects and biochemical changes of combination therapy of CDK4/6i and CDK2i are already known from several papers. The authors need to highlight the differences between their research and that of other researchers.

      We thank the reviewer for the opportunity to clarify the novelty of our findings in the context of prior studies on CDK4/6i and CDK2i combination therapy. In the revised manuscript, we have updated the Discussion section to more clearly delineate how our work extends and differs from existing research.

      Specifically, we now state:

      Page 12: The combination of CDK4/6i and ET has reshaped treatment for HR+/HER2- breast cancer (1-8). However, resistance commonly emerges, and no consensus second-line standard is established. Our data show that continued CDK4/6i treatment in drug-resistant cells engages a non-canonical, proteolysis-driven route of Rb inactivation, yielding attenuated E2F output and a pronounced delay in G1 progression (Figure 7G). Concurrent ET further deepens this blockade by suppressing c-Myc-mediated E2F amplification, thereby prolonging G1 and slowing population growth. Importantly, CDK2 inhibition alone was insufficient to control resistant cells. Robust suppression of CDK2 activity and resistant-cell growth required CDK2i in combination with CDK4/6i, consistent with prior reports supporting dual CDK targeting (9-16). Moreover, cyclin E, and in some contexts cyclin A, blunted the efficacy of the CDK4/6i and CDK2i combination by reactivating CDK2. Together, these findings provide a mechanistic rationale for maintaining CDK4/6i beyond progression and support testing ET plus CDK4/6i with the strategic addition of CDK2i, as evidenced by concordant in vitro and in vivo results.

      (2) Regarding Figures 3H and 3I, I wonder whether these are live-cell imaging results or whether the authors counted each signal on timed IF-stained slides. If live-cell imaging was used, the authors need to present the methods.

      We appreciate the reviewer’s question. Figures 3H and 3I derive from a live–fixed correlative pipeline rather than purely live imaging or independently timed IF slides. We first imaged asynchronously proliferating cells live for ≥48 h to (i) segment/track nuclei with H2B fluorescence, (ii) define mitotic exit (t = 0 at anaphase), and (iii) record CDK2 activity using a CDK2 KTR in the last live frame. Immediately after the live acquisition, we pulsed EdU (10 µM, 15 min) and fixed the same wells, photobleached fluorescent proteins (3% H₂O₂ + 20 mM HCl, 2 h, RT) to prevent crosstalk, and then performed click-chemistry EdU detection, IF for phospho-Rb (Ser807/811) and total Rb, and RNA FISH for E2F1. Fixed-cell readouts (p-Rb positivity, EdU incorporation, E2F1 mRNA puncta) were mapped back to each single cell’s live-derived time since mitosis and/or CDK2 activity, enabling the kinetic plots shown in Fig. 3H–I.

      To ensure transparency and reproducibility, we added detailed methods describing this workflow in the “Immunofluorescence and mRNA fluorescence in situ hybridization (FISH)” section under a dedicated “live–fixed pipeline” paragraph, and we cross-referenced acquisition and analysis parameters in “Live- and fixed-cell image acquisition” and “Image processing and analysis.” These updates specify: EdU pulse/fix conditions, photobleaching, antibodies/probes, imaging hardware and channels, segmentation/tracking, mitosis alignment, background correction, and how fixed readouts were binned/quantified as functions of time after mitosis and CDK2 activity.

      (3) Regarding Figure 3F, were the seven images obtained from the same field? The authors need to describe in detail the meaning of the white image and of the yellow and blue images in the bottom row.

      Thank you for raising this point. All seven panels in Fig. 3F are from the same field of view. The top row shows the raw channels (Hoechst, p-Rb, total Rb, and E2F1 RNA FISH). The bottom row shows the corresponding processed outputs from that field: (i) nuclear segmentation, (ii) phosphorylated Rb-status classification, and (iii) cell boundaries used for single-cell RNA-FISH quantification. We have revised the figure legend to make this explicit.

      (4) The author showed E2F mRNA by ISH, but in fact, RB does not suppress E2F mRNA but suppresses protein, so the author needs to confirm E2F at the protein level.

      We sincerely appreciate the reviewer’s thoughtful suggestion to examine E2F1 at the protein level. In our study, we focused on E2F1 mRNA expression because it is a well-established and biologically meaningful readout of E2F1 transcriptional activity. Due to its autoregulatory nature (17), the release of active E2F1 protein from Rb induces the transcription of E2F1 itself, creating a positive feedback loop. As a result, E2F1 mRNA abundance serves as a direct and reliable proxy for E2F1 protein activity (18-20). Thus, quantifying E2F1 mRNA provides a biologically relevant and mechanistic indicator of Rb-E2F pathway status. To clarify this rationale, we have updated the Results section and added references supporting our use of E2F1 mRNA as a readout for E2F1 activity.

      (5) Is it possible to synchronize cells (nocodazole shake-off, double thymidine block) in the presence of CDK4/6i? If so, then the authors need to demonstrate the delay of G1 progression via immunoblotting.

      We thank the reviewer for this constructive suggestion. To address it, we performed nocodazole synchronization followed by release and monitored cell-cycle progression in the presence or absence of CDK4/6 inhibition.

      Specifically, we added the following new datasets to the revised manuscript:

      Fig. 3L: Live single-cell trajectories of CDK4/6 and CDK2 activities alongside the Cdt1-degron reporter after 14 hours of nocodazole (250 nM) treatment and release. We compared the averaged traces of CDK4/6 and CDK2 activities and Cdt1 intensity in parental cells (gray) and resistant cells with (red) and without (blue) CDK4/6i maintenance. These data show suppressed and delayed CDK2 activation, as well as a right-shifted S-phase entry, particularly under continuous CDK4/6 inhibition.

      Fig. 3M: Fixed-cell EdU pulse-labeling at 4, 6, 8, 12, 16, and 24 h post-release further confirms a significant delay in S-phase entry and prolonged G1 duration in CDK4/6i-maintained cells compared with naïve and withdrawn conditions.

      Together, these results directly demonstrate the delay in G1 progression following synchronized mitotic exit under CDK4/6 inhibition.

      (6) In Figure 5C the authors showed a violin plot of c-Myc level. Is this Immunohistochemical staining? The authors need to clarify the methods.

      Thank you for flagging this. The c-Myc measurements in Fig. 5C are from immunofluorescence (IF), not IHC. We now state this explicitly in the legend.

      (7) Regarding Live cell immunofluorescence tracing of live-cell reporters, the author needs to clarify the methods (excitation, emission), name of instruments, and software used.

      To address this, we have expanded the “Live-cell, fixed-cell, and tumor tissue image acquisition” section in the Materials and Methods.

      (8) Lines 475 SF1A, the authors need to correct typos. Naïve Naïve.

      We greatly appreciate the reviewer’s attention to this detail and have ensured all typos have been addressed.  

      (9) The authors need to unify Cdt1-degron (legends) vs. Cdt1 degron (figures).

      We greatly appreciate your attention to this discrepancy. Language referring to the Cdt1 degron has been unified between figures and legends. 

      Reviewer #3 (Recommendations for the authors):

      (1) While the manuscript discusses the selection of doses for CDK4/6 inhibitors and CDK2 inhibitors, there is a lack of detailed data on the dose-response relationship. Additional data on the effects of different doses would be beneficial. 

      We appreciate the reviewer’s important comment. To address it, we performed additional dose–response experiments testing a range of CDK4/6i and CDK2i concentrations. These analyses revealed a clear synergistic interaction between the two inhibitors. The new data are now presented in Figure 6G and Supplementary Figure 8F of the revised manuscript.

      (2) In clinical trials, the criteria for patient selection are crucial for interpreting study outcomes. A detailed description of the patient selection criteria should be provided.  

      We thank the reviewer for bringing this important point to our attention. In the revised manuscript, we have clarified the patient selection criteria relevant to the interpretation of clinical outcomes. Specifically, we note that retrospective analyses suggest patients with indolent disease and no prior chemotherapy may benefit most from continued CDK4/6i plus ET. Moreover, our data and others’ indicate that clinical benefit is expected in tumors retaining an intact Rb/E2F axis, while resistance-driving alterations (e.g., Rb loss, PIK3CA, ESR1, FGFR1–3, HER2, FAT1 mutations) are likely to limit efficacy. Finally, we highlight cyclin E overexpression as a potential biomarker of resistance to combined CDK4/6i and CDK2i, underscoring the need for biomarker-guided patient stratification. These additions provide a more detailed framework for patient selection in future clinical applications.

      References

      (1) Finn RS, Crown JP, Lang I, Boer K, Bondarenko IM, Kulyk SO, et al. The cyclin-dependent kinase 4/6 inhibitor palbociclib in combination with letrozole versus letrozole alone as first-line treatment of oestrogen receptor-positive, HER2-negative, advanced breast cancer (PALOMA-1/TRIO-18): a randomised phase 2 study. Lancet Oncol 2015;16:25-35

      (2) Finn RS, Martin M, Rugo HS, Jones S, Im S-A, Gelmon K, et al. Palbociclib and Letrozole in Advanced Breast Cancer. New England Journal of Medicine 2016;375:1925-36

      (3) Turner NC, Slamon DJ, Ro J, Bondarenko I, Im S-A, Masuda N, et al. Overall Survival with Palbociclib and Fulvestrant in Advanced Breast Cancer. New England Journal of Medicine 2018;379:1926-36

      (4) Dickler MN, Tolaney SM, Rugo HS, Cortés J, Diéras V, Patt D, et al. MONARCH 1, A Phase II Study of Abemaciclib, a CDK4 and CDK6 Inhibitor, as a Single Agent, in Patients with Refractory HR(+)/HER2(-) Metastatic Breast Cancer. Clin Cancer Res 2017;23:5218-24

      (5) Johnston S, Martin M, Di Leo A, Im S-A, Awada A, Forrester T, et al. MONARCH 3 final PFS: a randomized study of abemaciclib as initial therapy for advanced breast cancer. npj Breast Cancer 2019;5:5

      (6) Hortobagyi GN, Stemmer SM, Burris HA, Yap Y-S, Sonke GS, Hart L, et al. Overall Survival with Ribociclib plus Letrozole in Advanced Breast Cancer. New England Journal of Medicine 2022;386:942-50

      (7) Slamon DJ, Neven P, Chia S, Fasching PA, De Laurentiis M, Im S-A, et al. Overall Survival with Ribociclib plus Fulvestrant in Advanced Breast Cancer. New England Journal of Medicine 2019;382:514-24

      (8) Im S-A, Lu Y-S, Bardia A, Harbeck N, Colleoni M, Franke F, et al. Overall Survival with Ribociclib plus Endocrine Therapy in Breast Cancer. New England Journal of Medicine 2019;381:307-16

      (9) Pandey K, Park N, Park KS, Hur J, Cho YB, Kang M, et al. Combined CDK2 and CDK4/6 Inhibition Overcomes Palbociclib Resistance in Breast Cancer by Enhancing Senescence. Cancers (Basel) 2020;12

      (10) Freeman-Cook K, Hoffman RL, Miller N, Almaden J, Chionis J, Zhang Q, et al. Expanding control of the tumor cell cycle with a CDK2/4/6 inhibitor. Cancer Cell 2021;39:1404-21 e11

      (11) Dietrich C, Trub A, Ahn A, Taylor M, Ambani K, Chan KT, et al. INX-315, a selective CDK2 inhibitor, induces cell cycle arrest and senescence in solid tumors. Cancer Discov 2023

      (12) Al-Qasem AJ, Alves CL, Ehmsen S, Tuttolomondo M, Terp MG, Johansen LE, et al. Co-targeting CDK2 and CDK4/6 overcomes resistance to aromatase and CDK4/6 inhibitors in ER+ breast cancer. NPJ Precis Oncol 2022;6:68

      (13) Kudo R, Safonov A, Jones C, Moiso E, Dry JR, Shao H, et al. Long-term breast cancer response to CDK4/6 inhibition defined by TP53-mediated geroconversion. Cancer Cell 2024

      (14) Arora M, Moser J, Hoffman TE, Watts LP, Min M, Musteanu M, et al. Rapid adaptation to CDK2 inhibition exposes intrinsic cell-cycle plasticity. Cell 2023;186:2628-43 e21

      (15) Kumarasamy V, Wang J, Roti M, Wan Y, Dommer AP, Rosenheck H, et al. Discrete vulnerability to pharmacological CDK2 inhibition is governed by heterogeneity of the cancer cell cycle. Nature Communications 2025;16:1476

      (16) Dommer AP, Kumarasamy V, Wang J, O'Connor TN, Roti M, Mahan S, et al. Tumor Suppressors Condition Differential Responses to the Selective CDK2 Inhibitor BLU-222. Cancer Res 2025

      (17) Johnson DG, Ohtani K, Nevins JR. Autoregulatory control of E2F1 expression in response to positive and negative regulators of cell cycle progression. Genes & Development 1994;8:1514-25

      (18) Chung M, Liu C, Yang HW, Koberlin MS, Cappell SD, Meyer T. Transient Hysteresis in CDK4/6 Activity Underlies Passage of the Restriction Point in G1. Mol Cell 2019;76:562-73 e4

      (19) Kim S, Leong A, Kim M, Yang HW. CDK4/6 initiates Rb inactivation and CDK2 activity coordinates cell-cycle commitment and G1/S transition. Sci Rep 2022;12:16810

      (20) Yang HW, Chung M, Kudo T, Meyer T. Competing memories of mitogen and p53 signalling control cell-cycle entry. Nature 2017;549:404-8

      (21) Yang C, Li Z, Bhatt T, Dickler M, Giri D, Scaltriti M, et al. Acquired CDK6 amplification promotes breast cancer resistance to CDK4/6 inhibitors and loss of ER signaling and dependence. Oncogene 2017;36:2255-64

      (22) Li Q, Jiang B, Guo J, Shao H, Del Priore IS, Chang Q, et al. INK4 Tumor Suppressor Proteins Mediate Resistance to CDK4/6 Kinase Inhibitors. Cancer Discov 2022;12:356-71

      (23) Ji W, Zhang W, Wang X, Shi Y, Yang F, Xie H, et al. c-myc regulates the sensitivity of breast cancer cells to palbociclib via c-myc/miR-29b-3p/CDK6 axis. Cell Death & Disease 2020;11:760

      (24) Wu X, Yang X, Xiong Y, Li R, Ito T, Ahmed TA, et al. Distinct CDK6 complexes determine tumor cell response to CDK4/6 inhibitors and degraders. Nature Cancer 2021;2:429-43

      (25) Kim S, Son E, Park HR, Kim M, Yang HW. Dual targeting CDK4/6 and CDK7 augments tumor response and anti-tumor immunity in breast cancer models. J Clin Invest 2025

      (26) Ravani LV, Calomeni P, Vilbert M, Madeira T, Wang M, Deng D, et al. Efficacy of Subsequent Treatments After Disease Progression on CDK4/6 Inhibitors in Patients With Hormone Receptor-Positive Advanced Breast Cancer. JCO Oncol Pract 2025;21:832-42

      (27) Martin JM, Handorf EA, Montero AJ, Goldstein LJ. Systemic Therapies Following Progression on First-line CDK4/6-inhibitor Treatment: Analysis of Real-world Data. Oncologist 2022;27:441-6

      (28) Kalinsky K, Bianchini G, Hamilton E, Graff SL, Park KH, Jeselsohn R, et al. Abemaciclib Plus Fulvestrant in Advanced Breast Cancer After Progression on CDK4/6 Inhibition: Results From the Phase III postMONARCH Trial. J Clin Oncol 2025;43:1101-12

    1. Reviewer #2 (Public review):

      Summary:

      The manuscript by Selvaratnam et al. defines how the transcription factor HEB integrates with TCR signaling to regulate Id3 expression in the context of gdT17 maturation in the fetal thymus. Using conditional HEB ablation driven by Vav Cre, flow cytometry, scRNA-seq, and reanalysis of ChIP-seq data, the authors provide evidence for a sequential model in which HEB and TCR-induced Egr2 cooperatively upregulate Id3, enabling gdT17 maturation and limiting diversion to the ab lineages. The work provides an important mechanistic insight into how the E/ID-protein axis coordinates gd T cell specification and effector maturation.

      Strengths include:

      (1) The proposed model that HEB primes, TCR induces, and Id3 stabilizes gdT17 cells in embryonal development is elegant and consistent with the findings.

      (2) The choice of animal models and the study of a precise developmental window.

      (3) The cross-validation of flow, scRNA-seq, and ChIP-seq reanalyses strengthens the conclusions.

      (4) The study clarifies the dual role of Id3, first as an HEB-dependent maturation factor for gdT17 cells, and as a suppressor of diversion to the ab lineages.

      Weaknesses:

      (1) The ChIP-seq reanalysis indicates overlapping HEB, E2A, and Egr2 peaks ~60 kb upstream of Id3. Given that the Egr2 data are not generated using the same thymocyte subsets, some form of validation should be considered for the co-binding of HEB and Egr2, potentially ChIP-qPCR in sorted gdT17 progenitors.

      (2) E2A expression is not affected in HEB-deficient cells, raising the question of partial compensation, a point that should be specifically discussed.

      (3) All experiments are done at E18, when fetal gdT17 development predominates. The discussion could address whether these mechanisms extend to neonatal or adult gdT17 subsets.

    1. Reviewer #1 (Public review):

      Summary:

      Drosophila larval type II neuroblasts generate diverse types of neurons by sequentially expressing different temporal identity genes during development. Previous studies have shown that the transition from early temporal identity genes (such as Chinmo and Imp) to late temporal identity genes (such as Syp and Broad) depends on the activation of the expression of EcR by Seven-up (Svp) and progression through the G1/S transition of the cell cycle. In this study, Chaya and Syed examined whether the expression of Syp and EcR is regulated by cell cycle and cytokinesis by knocking down CDK1 or Pav, respectively, throughout development or at specific developmental stages. They find that knocking down CDK1 or Pav either in all type II neuroblasts throughout development or in single-type neuroblast clones after larval hatching consistently leads to failure to activate late temporal identity genes Syp and EcR. To determine whether the failure of the activation of Syp and EcR is due to impaired Svp expression, they also examined Svp expression using a Svp-lacZ reporter line. They find that Svp is expressed normally in CDK1 RNAi neuroblasts. Further, knocking down CDK1 or Pav after Svp activation still leads to loss of Syp and EcR expression. Finally, they also extended their analysis to type I neuroblasts. They find that knocking down CDK1 or Pav, either at 0 hours or at 42 hours after larval hatching, also results in loss of Syp and EcR expression in type I neuroblasts. Based on these findings, the authors conclude that the cell cycle and cytokinesis are required for the transition from early to late temporal identity genes in both types of neuroblasts. These findings add mechanistic details to our understanding of the temporal patterning of Drosophila larval neuroblasts.

      Strengths:

      The data presented in the paper are solid and largely support their conclusion. Images are of high quality. The manuscript is well-written and clear.

      Weaknesses:

      The quantifications of the expression of temporal identity genes and the interpretation of some of the data could be more rigorous.

      (1) Expression of temporal identity genes may not be just positive or negative. Therefore, it would be more rigorous to quantify the expression of Imp, Syp, and EcR based on the staining intensity rather than simply counting the number of neuroblasts that are positive for these genes, which can be very subjective. Or the authors should define clearly what qualifies as "positive" (e.g., a staining intensity at least 2x background).

      (2) The finding that inhibiting cytokinesis without affecting nuclear divisions by knocking down Pav leads to the loss of expression of Syp and EcR does not support their conclusion that nuclear division is also essential for the early-late gene expression switch in type II NSCs (at the bottom of the left column on page 5). No experiments were done to specifically block the nuclear division in this study. This conclusion should be revised.

      (3) Knocking down CDK1 in single random neuroblast clones does not make the CDK1 knockdown neuroblast develop in the same environment (except still in the same brain) as wild-type neuroblast lineages. It does not help address the concern whether "type 2 NSCS with cell cycle arrest failed to undergo normal temporal progression is indirectly due to a lack of feedback signaling from their progeny", as discussed (from the bottom of the right column on page 9 to the top of the left column on page 10). The CDK1 knockdown neuroblasts do not divide to produce progeny and thus do not receive a feedback signal from their progeny as wild-type neuroblasts do. Therefore, it cannot be ruled out that the loss of Syp and EcR expression in CDK1 knockdown neuroblasts is due to the lack of the feedback signal from their progeny. This part of the discussion needs to be clarified.

      (4) In Figure 2I, there is a clear EcR staining signal in the clone, which contradicts the quantification data in Figure 2J that EcR is absent in Pav RNAi neuroblasts. The authors should verify that the image and quantification data are consistent and correct.
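
      As an illustration of point (1), an explicit positivity criterion could be written down as a simple rule. The following is only a minimal sketch, assuming that a mean staining intensity per neuroblast and a set of background measurements are available; the function names, the 2x fold cutoff, and all numbers are hypothetical and are not taken from the manuscript.

```python
from statistics import mean


def classify_positive(cell_intensities, background_intensities, fold=2.0):
    """Call a cell positive when its mean intensity is >= fold x mean background.

    The fold cutoff (2x by default) is an assumed example criterion.
    """
    threshold = fold * mean(background_intensities)
    return [intensity >= threshold for intensity in cell_intensities]


def fraction_positive(cell_intensities, background_intensities, fold=2.0):
    """Fraction of cells called positive under the same criterion."""
    calls = classify_positive(cell_intensities, background_intensities, fold)
    return sum(calls) / len(calls)


# Hypothetical measurements (arbitrary units), for illustration only.
background = [10.0, 12.0, 11.0, 11.0]   # mean background = 11.0, so threshold = 22.0
neuroblasts = [50.0, 21.0, 22.0, 8.0]   # per-neuroblast mean staining intensity

calls = classify_positive(neuroblasts, background)
share = fraction_positive(neuroblasts, background)
```

      Reporting the raw intensities alongside such calls would let readers judge how sensitive the conclusions are to the chosen cutoff.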

    1. Reviewer #2 (Public review):

      This short report by Hensley and Yildiz explores kinesin-1 motility under more physiological load geometries than previous studies. Large Z-direction (or radial) forces are a consequence of certain optical trap experimental geometries, and likely do not occur in the cell. Use of a long DNA tether between the motor and the bead can alleviate Z-component forces. The authors perform three experiments. In the first, they use two assay geometries - one with kinesin attached directly to a bead and the other with kinesin attached via a 2 kbp DNA tether - with a constant-position trap to determine that reducing the Z component of force leads to a difference in stall time but not stall force. In the second, they use the same two assay geometries with a constant-force trap to replicate the asymmetric slip bond of kinesin-1; reducing the Z component of force leads to a small but uniform change in the run lengths and detachment rates under hindering forces but not assisting forces. In the third, they connect two or three kinesin molecules to each DNA, and measure a stronger scaling in stall force and time when the Z component of force is reduced. They conclude that kinesin-1 is a more robust motor than previously envisaged, where much of its weakness came from the application of axial force. If forces are instead along the direction of transport, kinesin can hold on longer and work well in teams. The experiments are rigorous, and the data quality is very high. There is little to critique or discuss. The improved dataset will be useful for modeling and understanding multi-motor transport. The conclusions complement other recent works that used different approaches to low-Z component kinesin force spectroscopy, and provide strong value to the kinesin field.

      Major comments:

      (1) Kinesin-1 is covalently bound to a DNA oligo, which then attaches to the DNA chassis by hybridization. This oligo is 21 nt with a relatively low GC%. At what force does this oligo unhybridize? Can the authors verify that their stall force measurements are not cut short by the oligo detaching from the chassis?

      (2) Figure 1, a justification or explanation should be provided for why events lower than 1.5 pN were excluded. It appears arbitrary.

      (3) Figure 2b, is the difference in velocity statistically significant?

      (4) The number of measurements for each experimental datapoint should be provided in the corresponding figure caption. SEM is used, but N is not reported in the caption.
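
      To make point (4) concrete, the mean, SEM, and N for a datapoint can be reported together. The following is a minimal sketch under the assumption that the individual measurements for each datapoint are available as a list; the function names, the caption wording, and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Return N, the mean, and the standard error of the mean (SEM).

    SEM is the sample standard deviation divided by sqrt(N), so it is
    only defined for N >= 2.
    """
    n = len(values)
    if n < 2:
        raise ValueError("SEM requires at least two measurements")
    return {"n": n, "mean": mean(values), "sem": stdev(values) / sqrt(n)}


def caption_fragment(label, values, unit):
    """Format one datapoint so that N always accompanies the SEM."""
    s = summarize(values)
    return f"{label}: {s['mean']:.2f} ± {s['sem']:.2f} {unit} (mean ± SEM, N = {s['n']})"


# Hypothetical stall-force measurements in pN, for illustration only.
example = caption_fragment("Stall force", [5.0, 6.0, 7.0], "pN")
```

      Because SEM shrinks with the square root of N, the same error bar can reflect very different sample sizes, which is why N should appear in the caption.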

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The Major Histocompatibility Complex (MHC) region is a collection of numerous genes involved in both innate and adaptive immunity. MHC genes are famed for their role in rapid evolution and extensive polymorphism in a variety of vertebrates. This paper presents a summary of gene-level gain and loss of orthologs and paralogs within MHC across the diversity of primates, using publicly available data.

      Strengths:

      This paper provides a strong case that MHC genes are rapidly gained (by paralog duplication) and lost over millions of years of macroevolution. The authors are able to identify MHC loci by homology across species, and from this infer gene duplications and losses using phylogenetic analyses. There is a remarkable amount of genic turnover, summarized in Figure 6 and Figure 7, either of which might be a future textbook figure of immune gene family evolution. The authors draw on state-of-the-art phylogenetic methods, and their inferences are robust insofar as the data might be complete enough to draw such conclusions.

      Weaknesses:

      One concern about the present work is that it relies on public databases to draw inferences about gene loss, which is potentially risky if the publicly available sequence data are incomplete. To say, for example, that a particular MHC gene copy is absent in a taxon (e.g., Class I locus F absent in Guenons according to Figure 1), we need to trust that its absence from the available databases is an accurate reflection of its absence in the genome of the actual organisms. This may be a safe assumption, but it rests on the completeness of genome assembly (and gene annotations?) or people uploading relevant data. This reviewer would have been far more comfortable had the authors engaged in some active spot-checking, doing the lab work to try to confirm absences at least for some loci and some species. Without this, a reader is left to wonder whether gene loss is simply reflecting imperfect databases, which then undercuts confidence in estimates of rates of gene loss.

      Indeed, just because a locus has not been confirmed in a species does not necessarily mean that it is absent. As we explain in the Figure 1 caption, only a few species have had their genomes extensively studied (gray background), and only for these species does the absence of a point in this figure mean that a locus is absent. The white background rows represent species that are not extensively studied, and we point out that the absence of a point does not mean that a locus is absent from the species, but rather that it is undiscovered. We have also added a parenthetical to the text to explain this (line 156): “Only species with rows highlighted in gray have had their MHC regions extensively studied (and thus only for these rows is the absence of a gene symbol meaningful).”

      While we agree that spot-checking may be a helpful next step, one of the goals of this manuscript is to collect and synthesize the enormous volume of MHC evolution research in the primates, which will serve as a jumping-off point for other researchers to perform important wet lab work.

      Some context is useful for comparing rates of gene turnover in MHC to other loci. Changing gene copy numbers, duplications, and loss of duplicates are common, it seems, across many loci and many organisms; is MHC exceptional in this regard, or merely behaving like any moderately large gene family? I would very much have liked to see comparable analyses done for other gene families (immune, like TLRs, or non-immune), and quantitative comparisons of evolutionary rates between MHC versus other genes. Does MHC gene composition evolve any faster than a random gene family? At present readers may be tempted to infer this, but evidence is not provided.

      Our companion paper (Fortier and Pritchard, 2025) demonstrates that the MHC is a unique locus in many regards, such as its evidence for deep balancing selection and its excess of disease associations. Thus, we expect that it is evolving faster than any random gene family. It would be interesting to repeat this analysis for other gene families, but that is outside of the scope of this project. Additionally, allele databases for other gene families are not nearly as developed, but as more alleles become available for other polymorphic families, a comparable analysis could become possible.

      We have added a paragraph to the discussion (lines 530-546) to clarify that we do not know for certain whether the MHC gene family is evolving rapidly compared to other gene families.

      While on the topic of making comparisons, the authors make a few statements about relative rates. For instance, lines 447-8 compare gene topology of classical versus non-classical genes; and line 450 states that classical genes experience more turnover. But there are no quantitative values given to these rates to provide numerical comparisons, nor confidence intervals provided (these are needed, given that they are estimates), nor formal statistical comparisons to confirm our confidence that rates differ between types of genes.

      More broadly, the paper uses sophisticated phylogenetic methods, but without taking advantage of macroevolutionary comparative methods that allow model-based estimation of macroevolutionary rates. I found the lack of quantitative measurements of rates of gene gain/loss to be a weakness of the present version of the paper, and something that should be readily remedied. When claiming that MHC Class I genes "turn over rapidly" (line 476) - what does rapidly mean? How rapidly? How does that compare to rates of genetic turnover at other families? Quantitative statements should be supported by quantitative estimates (and their confidence intervals).

      These statements refer to qualitative observations, so we cannot provide numerical values. We simply conclude that certain gene groups evolve faster or slower based on the species and genes present in each clade. It is difficult to provide estimates because of the incomplete sampling of genes that survived to the present day. In addition, the presence or absence of various orthologs in different species still needs to be confirmed, at which point it might be useful to be more quantitative. We have also added a paragraph to the discussion to address this concern and advocate for similar analyses of other gene families in the future when more data is available (lines 530-546).

      The authors refer to 'shared function of the MHC across species' (e.g. line 22); while this is likely true, they are not here presenting any functional data to confirm this, nor can they rule out neofunctionalization or subfunctionalization of gene duplicates. There is evidence in other vertebrates (e.g., cod) of MHC evolving appreciably altered functions, so one may not safely assume the function of a locus is static over long macroevolutionary periods, although that would be a plausible assumption at first glance.

      Indeed, we cannot assume that the function of a locus is static across time, especially for the MHC region. In our research, we read hundreds of papers that each focused on a small number of species or genes and gathered some information about them, sometimes based on functional experiments and sometimes on measures such as dN/dS. These provide some indication of a gene’s broad classification in a species or clade, even if the evidence is preliminary. Where possible, we used this preliminary evidence to give genes the descriptors “classical,” “non-classical,” “dual characteristics,” “pseudogene,” “fixed”, or “unfixed.” Sometimes multiple individuals and haplotypes were analyzed, so we could even assign a minimum number of gene copies present in a species. We have aggregated all of these references into Supplementary Table 1 (for Class I/Figure 1) and Supplementary Table 2 (for Class II/Figure 2) along with specific details about which data points in these figures each reference supports. We realize that many of these classifications are based on a small number of individuals or indirect measures, so they may change in the future as more functional data is generated.

      Reviewer #2 (Public review):

      Summary:

      The authors aim to provide a comprehensive understanding of the evolutionary history of the Major Histocompatibility Complex (MHC) gene family across primate species. Specifically, they sought to:

      (1) Analyze the evolutionary patterns of MHC genes and pseudogenes across the entire primate order, spanning 60 million years of evolution.

      (2) Build gene and allele trees to compare the evolutionary rates of MHC Class I and Class II genes, with a focus on identifying which genes have evolved rapidly and which have remained stable.

      (3) Investigate the role of often-overlooked pseudogenes in reconstructing evolutionary events, especially within the Class I region.

      (4) Highlight how different primate species use varied MHC genes, haplotypes, and genetic variation to mount successful immune responses, despite the shared function of the MHC across species.

      (5) Fill gaps in the current understanding of MHC evolution by taking a broader, multi-species perspective using (a) phylogenomic analytical computing methods such as Beast2, Geneconv, BLAST, and the much larger computing capacities that have been developed and made available to researchers over the past few decades, (b) literature review for gene content and arrangement, and genomic rearrangements via haplotype comparisons.

      (6) The authors' overall conclusions based on their analyses and results are that 'different species employ different genes, haplotypes, and patterns of variation to achieve a successful immune response'.

      Strengths:

      Essentially, much of the information presented in this paper is already well-known in the MHC field of genomic and genetic research, with few new conclusions and with insufficient respect to past studies. Nevertheless, while MHC evolution is a well-studied area, this paper potentially adds some originality through its comprehensive, cross-species evolutionary analysis of primates, focus on pseudogenes and the modern, large-scale methods employed. Its originality lies in its broad evolutionary scope of the primate order among mammals with solid methodological and phylogenetic analyses.

      The main strengths of this study are the use of large publicly available databases for primate MHC sequences, the intensive computing involved, the phylogenetic tool Beast2 to create multigene Bayesian phylogenetic trees using sequences from all genes and species, separated into Class I and Class II groups to provide a backbone of broad relationships to investigate subtrees, and the presentation of various subtrees as species and gene trees in an attempt to elucidate the unique gene duplications within the different species. The study provides some additional insights with summaries of MHC reference genomes and haplotypes in the context of a literature review to identify the gene content and haplotypes known to be present in different primate species. The phylogenetic overlays or ideograms (Figures 6 and 7) in part show the complexity of the evolution and organisation of the primate MHC genes via the orthologous and paralogous gene and species pathways progressively from the poorly-studied NWM, across a few moderately studied ape species, to the better-studied human MHC genes and haplotypes.

      Weaknesses:

      The title 'The Primate Major Histocompatibility Complex: An Illustrative Example of Gene Family Evolution' suggests that the paper will explore how the Major Histocompatibility Complex (MHC) in primates serves as a model for understanding gene family evolution. The term 'Illustrative Example' in the title would be appropriate if the paper aimed to use the primate Major Histocompatibility Complex (MHC) as a clear and representative case to demonstrate broader principles of gene family evolution. That is, the MHC gene family is not just one instance of gene family evolution but serves as a well-studied, insightful example that can highlight key mechanisms and concepts applicable to other gene families. However, this is not the case; this paper only covers specific details of primate MHC evolution without drawing broader lessons to any other gene families. So, the term 'Illustrative Example' is too broad or generalizing. In this case, a term like 'Case Study' or simply 'Example' would be more suitable. Perhaps, 'An Example of Gene Family Diversity' would be more precise. Also, an explanation or 'reminder' is suggested that this study is not about the origins of the MHC genes from the earliest jawed vertebrates per se (~600 mya), but it is an extension within a subspecies set that has emerged relatively late (~60 mya) in the evolutionary divergent pathways of the MHC genes, systems, and various vertebrate species.

      Thank you for your input on the title; we have changed it to “A case study of gene family evolution” instead.

      Thank you also for pointing out the potential confusion about the time span of our study. We have added “Having originated in the jawed vertebrates,” to a sentence in the introduction (lines 38-39). We have also added the sentence “Here, we focus on the primates, spanning approximately 60 million years within the over 500-million-year evolution of the family \citep{Flajnik2010}.” to be more explicit about the context for our work (lines 59-61).

      Phylogenomics. Particular weaknesses in this study are the limitations and problems associated with providing phylogenetic gene and species trees to try and solve the complex issue of the molecular mechanisms involved with imperfect gene duplications, losses, and rearrangements in a complex genomic region such as the MHC that is involved in various effects on the response and regulation of the immune system. A particular deficiency is drawing conclusions based on a single exon of the genes. Different exons present different trees. Which are the more reliable? Why were introns not included in the analyses? The authors attempt to overcome these limitations by including genomic haplotype analysis, duplication models, and the supporting or contradictory information available in previous publications. They succeed in part with this multidiscipline approach, but much is missed because of biased literature selection. The authors should include a paragraph about the benefits and limitations of the software that they have chosen for their analysis, and perhaps suggest some alternative tools that they might have tried comparatively. How were problems with Bayesian phylogeny such as computational intensity, choosing probabilities, choosing particular exons for analysis, assumptions of evolutionary models, rates of evolution, systemic bias, and absence of structural and functional information addressed and controlled for in this study?

      We agree that different exons have different trees, which is exactly why we repeated our analysis for each exon in order to compare and contrast them. In particular, the exons encoding the binding site of the resulting protein (exons 2 and 3 for Class I and exon 2 for Class II) show evidence for trans-species polymorphism and gene conversion. These phenomena lead to trees that do not follow the species tree and are fascinating in and of themselves, which we explore in detail in our companion paper (Fortier and Pritchard, 2025). Meanwhile, the non-peptide-binding extracellular-domain-encoding exon (exon 4 for Class I and exon 3 for Class II) is comparably sized to the binding-site-encoding exons and provides an interesting functional contrast. As this exon is likely less affected by trans-species polymorphism, gene conversion, and convergent evolution, we present results from it most often in the main text, though we occasionally touch on differences between the exons. See lines 191-196, 223-226, and 407-414 for some examples of how we discuss the exons in the text. Additionally, all trees from all of these exons can be found in the supplement. 

      We agree that introns would be valuable to study in this context. Even though the non-binding-site-encoding exons are probably less affected by trans-species polymorphism, gene conversion, and convergent evolution, they are still functional. The introns, however, experience much more relaxed selection, if any, and comparing their trees to those for the exons would be valuable and illuminating. We did not generate intron trees for two reasons. Most importantly, there is a dearth of data available for the introns; in the databases we used, there was often intron data available only for human, chimpanzee, and sometimes macaque, and only for a small subset of the genes. This limitation is at odds with the comprehensive, many-gene-many-species approach which we feel is the main novelty of this work. Secondly, the introns that are available are difficult to align. Even aligning the exons across such a highly-diverged set of genes and pseudogenes was difficult and required manual effort. The introns proved even more difficult to try to align across genes. In the future, when more intron data is available and sufficient effort is put into aligning them, it will be possible and desirable to do a comparable analysis. We also added a sentence to the “Data” section to briefly explain why we did not include introns (lines 134-135).

      We explain our Bayesian phylogenetics approach in detail in the Methods (lines 650-725), including our assumptions and our solutions to challenges specific to this application. For further explanation of the method itself, we suggest reading the original BEAST and BEAST2 papers (Drummond & Rambaut (2007), Drummond et al. (2012), Bouckaert et al. (2014), and Bouckaert et al. (2019)). Known structural and functional information helped us validate the alignments we used in this study, but the fact that such information is not fully known for every gene and species should not affect the method itself.

      Gene families as haplotypes. In the Introduction, the MHC is referred to as a 'gene family', and in paragraph 2, it is described as being united by the 'MHC fold', despite exhibiting 'very diverse functions'. However, the MHC region is more accurately described as a multigene region containing diverse, haplotype-specific Conserved Polymorphic Sequences, many of which are likely to be regulatory rather than protein-coding. These regulatory elements are essential for controlling the expression of multiple MHC-related products, such as TNF and complement proteins, a relationship demonstrated over 30 years ago. Non-MHC fold loci such as TNF, complement, POU5F1, lncRNA, TRIM genes, LTA, LTB, NFkBIL1, etc, are present across all MHC haplotypes and play significant roles in regulation. Evolutionary selection must act on genotypes, considering both paternal and maternal haplotypes, rather than on individual genes alone. While it is valuable to compile databases for public use, their utility is diminished if they perpetuate outdated theories like the 'birth-and-death model'. The inclusion of prior information or assumptions used in a statistical or computational model, typically in Bayesian analysis, is commendable, but they should be based on genotypic data rather than older models. A more robust approach would consider the imperfect duplication of segments, the history of their conservation, and the functional differences in inheritance patterns. Additionally, the MHC should be examined as a genomic region, with ancestral haplotypes and sequence changes or rearrangements serving as key indicators of human evolution after the 'Out of Africa' migration, and with disease susceptibility providing a measurable outcome. There are more than 7000 different HLA-B and -C alleles at each locus, which suggests that there are many thousands of human HLA haplotypes to study. In this regard, the studies by Dawkins et al (1999 Immunol Rev 167,275), Shiina et al. (2006 Genetics 173,1555) on human MHC gene diversity and disease hitchhiking (haplotypes), and Sznarkowska et al. (2020 Cancers 12,1155) on the complex regulatory networks governing MHC expression, both in terms of immune transcription factor binding sites and regulatory non-coding RNAs, should be examined in greater detail, particularly in the context of MHC gene allelic diversity and locus organization in humans and other primates.

      Thank you for these comments. To clarify that the MHC “region” is different from (and contains) the MHC “gene family” as we describe it, we changed a sentence in the abstract (lines 8-10) from “One large gene family that has experienced rapid evolution is the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity.” to “One large gene family that has experienced rapid evolution lies within the Major Histocompatibility Complex (MHC), whose proteins serve critical roles in innate and adaptive immunity.” We know that the region is complex and contains many other genes and regulatory sequences; Figure 1 of our companion paper (Fortier and Pritchard, 2025) depicts these in order to show the reader that the MHC genes we focus on are just one part of the entire region.

      We love the suggestion to look at the many thousands of alleles present at each of the classical loci. This is the focus of our companion paper (Fortier and Pritchard, 2025) which explores variation at the allele level. In the current paper, we look mainly at the differences between genes and the use of different genes in different species.

      Diversifying and/or concerted evolution. Both this and past studies highlight that the diversifying selection or balancing selection model is the dominant force in MHC evolution. This is primarily because the extreme polymorphism observed in MHC genes is advantageous for populations in terms of pathogen defence. Diversification increases the range of peptides that can be presented to T cells, enhancing the immune response. The peptide-binding regions of MHC genes are highly variable, and this variability is maintained through selection for immune function, especially in the face of rapidly evolving pathogens. In contrast, concerted evolution, which typically involves the homogenization of gene duplicates through processes like gene conversion or unequal crossing-over, seems to play a minimal role in MHC evolution. Although gene duplication events have occurred in the MHC region leading to the expansion of gene families, the resulting paralogs often undergo divergent evolution rather than being kept similar or homozygous by concerted evolution. Therefore, unlike gene families such as ribosomal RNA genes or histone genes, where concerted evolution leads to highly similar copies, MHC genes display much higher levels of allelic and functional diversification. Each MHC gene copy tends to evolve independently after duplication, acquiring unique polymorphisms that enhance the repertoire of antigen presentation, rather than undergoing homogenization through gene conversion. Also, in some populations with high polymorphism or genetic drift, allele frequencies may become similar over time without the influence of gene conversion. This similarity can be mistaken for gene conversion when it is simply due to neutral evolution or drift, particularly in small populations or bottlenecked species. Moreover, gene conversion might contribute to greater diversity by creating hybrids or mosaics between different MHC genes. In this regard, can the authors indicate what percentage of the gene numbers in their study have been homogenised by gene conversion compared to those that have been diversified by gene conversion?

      We appreciate the summary, and we feel we have appropriately discussed both gene conversion and diversifying selection in the context of the MHC genes. Because we cannot know for sure when and where gene conversion has occurred, we cannot quantify percentages of genes that have been homogenized or diversified.  

      Duplication models. The phylogenetic overlays or ideograms (Figures 6 and 7) show considerable imperfect multigene duplications, losses, and rearrangements, but the paper's Discussion provides no in-depth consideration of the various multigenic models or mechanisms that can be used to explain the occurrence of such events. How do their duplication models compare to those proposed by others? For example, their text simply says on line 292, 'the proposed series of events is not always consistent with phylogenetic data'. How, why, when? Duplication models for the generation and extension of the human MHC class I genes as duplicons (extended gene or segmental genomic structures) by parsimonious imperfect tandem duplications with deletions and rearrangements in the alpha, beta, and kappa blocks were already formulated in the late 1990s and extended to the rhesus macaque in 2004 based on genomic haplotypic sequences. These studies were based on genomic sequences (genes, pseudogenes, retroelements), dot plot matrix comparisons, and phylogenetic analyses of gene and retroelement sequences using computer programs. It already was noted or proposed in these earlier 1999 studies that (1) the ancestor of HLA-P(90)/-T(16)/W(80) represented an old lineage separate from the other HLA class I genes in the alpha block, (2) HLA-U(21) is a duplicated fragment of HLA-A, (3) HLA-F and HLA-V(75) are among the earliest (progenitor) genes or outgroups within the alpha block, (4) distinct Alu and L1 retroelement sequences adjoining HLA-L(30), and HLA-N genomic segments (duplicons) in the kappa block are closely related to those in the HLA-B and HLA-C in the beta block; suggesting an inverted duplication and transposition of the HLA genes and retroelements between the beta and kappa regions. None of these prior human studies were referenced by Fortier and Pritchard in their paper. How does their human MHC class I gene duplication model (Fig. 6) such as gene duplication numbers and turnovers differ from those previously proposed and described by Kulski et al (1997 JME 45,599), (1999 JME 49,84), (2000 JME 50,510), Dawkins et al (1999 Immunol Rev 167,275), and Gaudieri et al (1999 GR 9,541)? Is this a case of reinventing the wheel?

      Figures 6 and 7 are intended to synthesize and reconcile past findings and our own trees, so they do not strictly adhere to the findings of any particular study and cannot fully match all studies. In the supplement, Figure 6 - figure supplement 1 and Figure 7 - figure supplement 1 duly credit all of the past work that went into making these trees. Most previous papers focus on just one aspect of these trees, such as haplotypes within a species, a specific gene or allelic lineage relationship, or the branching pattern of particular gene groups. We believe it was necessary to bring all of these pieces of evidence together. Even among papers with the same focus (to understand the block duplications that generated the current physical layout of the MHC), results differ. For example, Geraghty (1992), Hughes (1995), Kulski (2004)/Kulski (2005), and Shiina (1999) all disagree on the exact branching order of the genes MHC-W, -P, and -T, and of MHC-G, -J, and -K. While the Kulski studies you pointed out were very thorough for their era, they still only relied on data from three species and one haplotype per species. Our work is not intended to replace or discredit these past works, but simply to build upon them with a larger set of species and sequences. We hope the hypotheses we propose in Figures 6 and 7 can help unify existing research and provide a more easily accessible jumping-off-point for future work.

      Results. The results are presented as new findings, whereas most if not all of the results' significance and importance already have been discussed in various other publications. Therefore, the authors might do better to combine the results and discussion into a single section with appropriate citations to previously published findings presented among their results for comparison. Do the trees and subsets differ from previous publications, albeit that they might have fewer comparative examples and samples than the present preprint? Alternatively, the results and discussion could be combined and presented as a review of the field, which would make more sense and be more honest than the current format of essentially rehashing old data.

      In starting this project, we found that a large barrier to entry to this field of study is the immense amount of literature published over 30+ years. It is both time-consuming and confusing to read up on the many nuances of the MHC genes, their changing names, and their evolution, making it difficult to start new, innovative projects. We acknowledge that our results are not entirely novel; the main advantage of our work is that it provides a thorough, comprehensive starting point for others to learn about the MHC quickly and dive into new research. We feel that we have appropriately cited past literature in the main text, appendices, and supplement, so that readers may dive into a particular area with ease.

      Minor corrections:

      (1) Abstract, line 19: 'modern methods'. Too general. What modern methods?

      To keep the abstract brief, the methods are introduced in the main text when each becomes relevant as well as in the methods section.

      (2) Abstract, line 25: 'look into [primate] MHC evolution.' The analysis is on the primate MHC genes, not on the entire vertebrate MHC evolution with a gene collection from sharks to humans. The non-primate MHC genes are often differently organised and structurally evolved in comparison to primate MHC.

      Thank you! We have added the word “primate” to the abstract (line 25).

      (3) Introduction, line 113. 'In a companion paper (Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      This paper is undergoing the eLife editorial process at the same time; it will have a proper citation in the final version.

      (4) Figures 1 and 2. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. 'Asterisks "within symbols" indicate new information.'

      Thank you, the word “symbol” is much clearer! We have changed “points” to “symbols” in the captions for Figure 1, Figure 1 - figure supplement 1, Figure 2, and Figure 2 - figure supplement 1. We also changed this in the text (lines 157-158 and 170).

      (5) Figures. A variety of colours have been applied for visualisation. However, some coloured texts are so light in colour that they are difficult to read against a white background. Could darker colours or black be used for all or most texts?

      With such a large number of genes and species to handle in this work, it was nearly impossible to choose a set of colors that were distinct enough from each other. We decided to prioritize consistency (across this paper, its supplement, and our companion paper) as well as at-a-glance grouping of similar sequences. Unfortunately, this means we had to sacrifice readability on a white background, but readers may turn to the supplement if they need to access specific sequence names.

      (6) Results, line 135. '(Fortier and Pritchard, 2024)' This paper appears to be unpublished. If it's unpublished, it should not be referenced.

      Repeat of (3). This paper is undergoing the eLife editorial process at the same time; it will have a proper citation in the final version.

      (7) Results, lines 152 to 153, 164, 165, etc. 'Points with an asterisk'. Use the term 'gene symbols' (circle, square, triangle, inverted triangle, diamond) or 'gene markers' instead of 'points'. A point is a small dot such as those used in data points for plotting graphs .... The figures are so small that the asterisks in the circles, squares, triangles, etc, look like points (dots) and the points/asterisks terminology that is used is very confusing visually.

      Repeat of (4). Thank you, the word “symbol” is much clearer! We have changed “points” to “symbols” in the captions for Figure 1, Figure 1 - figure supplement 1, Figure 2, and Figure 2 - figure supplement 1. We also changed this in the text (lines 157-158 and 170).

      (8) Line 178 (BEA, 2024) is not listed alphabetically in the References.

      Thank you for catching this! This reference maps to the first bibliography entry, “SUMMARIZING POSTERIOR TREES.” We are unsure how to cite a webpage that has no explicit author within the eLife Overleaf template, so we will consult with the editor.

      (9) Lines 188-190. 'NWM MHC-G does not group with ape/OWM MHC-G, instead falling outside of the clade containing ape/OWM MHC-A, -G, -J and -K.' This is not surprising given that MHC-A, -G, -J, and -K are paralogs of each other and that some of them, especially in NWM have diverged over time from the paralogs and/or orthologs and might be closer to one paralog than another and not be an actual ortholog of OWM, apes or humans.

      We included this sentence to clarify the relationships between genes and to help describe what is happening in Figure 6. Figure 6 - figure supplement 1 includes all of the references that go into such a statement and Appendix 3 details our reasoning for this and other statements.

      (10) Line 249. Gene conversion: This is recombination between two different genes where a portion of the genes are exchanged with one another so that different portions of the gene can group within one or other of the two gene clades. Alternatively, the gene has been annotated incorrectly if the gene does not group within either of the two alternative clades. Another possibility is that one or two nucleotide mutations have occurred without a recombination resulting in a mistaken interpretation or conclusion of a recombination event. How many MHC gene conversion (recombination) events have occurred according to the authors' estimates? What measures are taken to avoid false-positive conclusions?

      All of these possibilities are certainly valid. We used the program GENECONV to infer gene conversion events, but there is considerable uncertainty owing to the ages of the genes and the inevitable point mutations that have occurred post-event. Gene conversion was not the focus of our paper, so we did our best to acknowledge it (and the resulting differences between trees from different exons) without spending too much time diving into it. A list of inferred gene conversion events can be found in Figure 3 - source data 1 and Figure 4 - source data 1.

      (11) Lines 284-286. 'The Class I MHC region is further divided into three polymorphic blocks-alpha, beta, and kappa blocks-that each contains MHC genes but are separated by well-conserved non-MHC genes.' The MHC class I region was first designated into conserved polymorphic duplication blocks, alpha and beta by Dawkins et al (1999 Immunol Rev 167,275), and kappa by Kulski et al (2002 Immunol Rev 190,95), and should be acknowledged (cited) accordingly.

      Thank you for catching this! We have added these citations (lines 302-303)!

      (12) Lines 285-286. 'The majority of the Class I genes are located in the alpha-block, which in humans includes 12 MHC genes and pseudogenes.' This is not strictly correct for many other species, because the majority of class I genes might be in the beta block of new and old-world monkeys, and the authors haven't provided respective counts of duplication numbers to show otherwise. The alpha block in some non-primate mammalian species such as pigs, rats, and mice has no MHC class I genes or only a few. Most MHC class I genes in non-primate mammalian species are found in other regions. For example, see Ando et al (2005 Immunogenetics 57,864) for the pig alpha, beta, and kappa regions in the MHC class I region. There are no pig MHC genes in the alpha block.

      Yes, which is exactly why we use the phrase “in humans” in that particular sentence. The arrangement of the MHC in several other primate reference genomes is shown in Figure 1 - figure supplement 2.

      (13) Line 297 to 299. 'The alpha-block also contains a large number of repetitive elements and gene fragments belonging to other gene families, and their specific repeating pattern in humans led to the conclusion that the region was formed by successive block duplications (Shiina et al., 1999).' There are different models for successive block duplications in the alpha block and some are more parsimonious based on imperfect multigenic segmental duplications (Kulski et al 1999, 2000) than others (Shiina et al., 1999). In this regard, Kulski et al (1999, 2000) also used duplicated repetitive elements neighbouring MHC genes to support their phylogenetic analyses and multigenic segmental duplication models. For comparison, can the authors indicate how many duplications and deletions they have in their models for each species?

      We have added citations to this sentence to show that there are different published models to describe the successive block duplications (line 307). Our models in Figure 6 and Figure 7 are meant to aggregate past work and integrate our own, and thus they were not built strictly by parsimony. References can be found in Figure 6 - figure supplement 1 and Figure 7 - figure supplement 1.

      (14) Lines 315-315. 'Ours is the first work to show that MHC-U is actually an MHC-A-related gene fragment.' This sentence should be deleted. Other researchers had already inferred that MHC-U is actually an MHC-A-related gene fragment more than 25 years ago (Kulski et al 1999, 2000) when the MHC-U was originally named MHC-21.

      While these works certainly describe MHC-U/MHC-21 as a fragment in the 𝛼-block, any relation to MHC-A was by association only and very few species/haplotypes were examined. So although the idea is not wholly novel, we provide convincing evidence that not only is MHC-U related to MHC-A by sequence, but also that it is a very recent partial duplicate of MHC-A. We show this with Bayesian phylogenetic trees as well as an analysis of haplotypes across many more species than were included in those papers.  

      (15) Lines 361-362. 'Notably, our work has revealed that MHC-V is an old fragment.' This is not a new finding or hypothesis. Previous phylogenetic analysis and gene duplication modelling had already inferred HLA-V (formerly HLA-75) to be an old fragment (Kulski et al 1999, 2000).

      By “old,” we mean older than previous hypotheses suggest. Previous work has proposed that MHC-V and -P were duplicated together, with MHC-V deriving from an MHC-A/H/V ancestral gene and MHC-P deriving from an MHC-W/T/P ancestral gene (Kulski (2005), Shiina (1999)). However, our analysis (Figure 5A) shows that MHC-V sequences form a monophyletic clade outside of the MHC-W/P/T group of genes as well as outside of the MHC-A/B/C/E/F/G/J/K/L group of genes, which is not consistent with MHC-A and -V being closely related. Thus, we conclude that MHC-V split off earlier than the differentiation of these other gene groups and is thus older than previously thought. We explain this in the text as well (lines 317-327) and in Appendix 3.  

      (16) Line 431-433. 'the Class II genes have been largely stable across the mammals, although we do see some lineage-specific expansions and contractions (Figure 2 and Figure 2-gure Supplement 2).' Please provide one or two references to support this statement. Is 'gure' a typo?

      We corrected this typo, thank you! This conclusion is simply drawn from the data presented in Figure 2 and Figure 2 - figure supplement 2. The data itself comes from a variety of sources, which are already included in the supplement as Figure 2 - source data 1.

      (17) Line 437. 'We discovered far more "specific" events in Class I, while "broad-scale" events were predominant in Class II.' Please define the difference between 'specific' and 'broad-scale'.

      These terms are defined in the previous sentence (lines 466-469).

      (18) Lines 450-451. 'This shows that classical genes experience more turnover and are more often affected by long-term balancing selection or convergent evolution.' Is balancing selection a form of divergent evolution that is different from convergent evolution? Please explain in more detail how and why balancing selection or convergent evolution affects classical and nonclassical genes differently.

      Balancing selection acts to keep alleles at moderate frequencies, preventing any from fixing in the population. In contrast, convergent evolution describes sequences or traits becoming similar over time even though they are not similar by descent. While we cannot know exactly what selective forces have occurred in the past, we observe different patterns in the trees for each type of gene. In Figures 1 and 2, viewers can see at first glance that the nonclassical genes (which are named throughout the text and thoroughly described in Appendix 3) appear to be longer-lived than the classical genes. In addition, lines 204-222 and 475-488 describe topological differences in the BEAST2 trees of these two types of genes. However, we acknowledge that it could be helpful to have additional, complementary information about the classical vs. non-classical genes. Thus, we have added a sentence and reference to our companion paper (Fortier and Pritchard, 2025), which focuses on long-term balancing selection and draws further contrast between classical and non-classical genes. In lines 481-484, we added “We further explore the differences between classical and non-classical genes in our companion paper, finding ancient trans-species polymorphism at the classical genes but not at the non-classical genes \citep{Fortier2025b}.”

      References

      Some references in the supplementary materials such as Alvarez (1997), Daza-Vamenta (2004), Rojo (2005), Aarnink (2014), Kulski (2022), and others are missing from the Reference list. Please check that all the references in the text and the supplementary materials are listed correctly and alphabetically.

      We will make sure that these all show up properly in the proof.

      Reviewer #3 (Public review):

      Summary:

      The article provides the most comprehensive overview of primate MHC class I and class II genes to date, combining published data with an exploration of the available genome assemblies in a coherent phylogenetic framework and formulating new hypotheses about the evolution of the primate MHC genomic region.

      Strengths:

      I think this is a solid piece of work that will be the reference for years to come, at least until population-scale haplotype-resolved whole-genome resequencing of any mammalian species becomes standard. The work is timely because there is an obvious need to move beyond short amplicon-based polymorphism surveys and classical comparative genomic studies. The paper is data-rich and the approach taken by the authors, i.e. an integrative phylogeny of all MHC genes within a given class across species and the inclusion of often ignored pseudogenes, makes a lot of sense. The focus on primates is a good idea because of the wealth of genomic and, in some cases, functional data, and the relatively densely populated phylogenetic tree facilitates the reconstruction of rapid evolutionary events, providing insights into the mechanisms of MHC evolution. Appendices 1-2 may seem unusual at first glance, but I found them helpful in distilling the information that the authors consider essential, thus reducing the need for the reader to wade through a vast amount of literature. Appendix 3 is an extremely valuable companion in navigating the maze of primate MHC genes and associated terminology.

      Weaknesses:

      I have not identified major weaknesses and my comments are mostly requests for clarification and justification of some methodological choices.

      Thank you so much for your kind and supportive review!

      Reviewer #1 (Recommendations for the authors):

      (1) Line 151: How is 'extensively studied' defined?

      Extensively studied is not a strict definition, but a few organisms clearly stand apart from the rest in terms of how thoroughly their MHC regions have been studied. For example, the macaque is a model organism, and individuals from many different species and populations have had their MHC regions fully sequenced. This is in contrast to the gibbon, for example, in which there is some experimental evidence for the presence of certain genes, but no MHC region has been fully sequenced from these animals.

      (2) Can you clarify how 'classical' and 'non-classical' MHC genes are being determined in your analysis?

      Classical genes are those whose protein products perform antigen presentation to T cells and are directly involved in adaptive immunity, while non-classical genes are those whose protein products do not do this. For example, these non-classical genes might code for proteins that interact with receptors on Natural Killer cells and influence innate immunity. The roles of these proteins are not necessarily conserved between closely related species, and experimental evidence is needed to evaluate this. However, in the absence of such evidence, wherever possible we have provided our best guess as to the roles of the orthologous genes in other species, presented in Figure 1 - source data 1 and Figure 2 - source data 1. This is based on whatever evidence is available at the moment, sometimes experimental but typically based on dN/dS ratios and other indirect measures.

      (3) I find the overall tone of the paper to be very descriptive, and at times meandering and repetitive, with a lot of similar kinds of statements being repeated about gene gain/loss. This is perhaps inevitable because a single question is being asked of each of many subsets of MHC gene types, and even exons within gene types, so there is a lot of repetition in content with a slightly different focus each time. This does not help the reader stay focused or keep track. I found myself wishing for a clearly defined question or hypothesis, or some rate parameter in need of estimation. I would encourage the authors to tighten up their phrasing, or consider streamlining the results with some better signposting to organize ideas within the results.

      We totally understand your critique, as we talk about a wide range of specific genes and gene groups in this paper. To improve readability, we have added many more signposting phrases and sentences:

      “Aside from MHC-DRB, …” (line 173)

      “Now that we had a better picture of the landscape of MHC genes present in different primates, we wanted to understand the genes’ relationships. Treating Class I, Class IIA, and Class IIB separately, ...” (line 179-180)

      “We focus first on the Class I genes.” (line 191)

      “... for visualization purposes…” (line 195)

      “We find that sequences do not always assort by locus, as would be expected for a typical gene.” (lines 196-197)

      “... rather than being directly orthologous to the ape/OWM MHC-G genes.” (lines 201-202)

      “Appendix 3 explains each of these genes in detail, including previous work and findings from this study.“ (lines 202-203)

      “... (but not with NWM) …” (line 208)

      “While genes such as MHC-F have trees which closely match the overall species tree, other genes show markedly different patterns, …” (lines 212-213)

      “Thus, while some MHC-G duplications appear to have occurred prior to speciation events within the NWM, others are species-specific.” (lines 218-219)

      “... indicating rapid evolution of many of the Class I genes” (lines 220-221)

      “Now turning to the Class II genes, …“ (line 223)

      “(see Appendix 2 for details on allele nomenclature) “ (line 238)

      “(e.g. MHC-DRB1 or -DRB2)” (line 254)

      “...  meaning their names reflect previously-observed functional similarity more than evolutionary relatedness.” (lines 257-258)

      “(see Appendix 3 for more detail)” (line 311)

      “(a 5'-end fragment)” (line 324)

      “Therefore, we support past work that has deemed MHC-V an old fragment.” (lines 326-327)

      “We next focus on MHC-U, a previously-uncharacterized fragment pseudogene containing only exon 3.” (line 328-329)

      “However, it is present on both chimpanzee haplotypes and nearly all human haplotypes, and we know that these haplotypes diverged earlier---in the ancestor of human and gorilla. Therefore, ...” (lines 331-333)

      “Ours is the first work to show that MHC-U is actually an MHC-A-related gene fragment and that it likely originated in the human-gorilla ancestor.” (lines 334-336)  

      “These pieces of evidence suggest that MHC-K and -KL duplicated in the ancestor of the apes.” (lines 341-342)

      “Another large group of related pseudogenes in the Class I $\alpha$-block includes MHC-W, -P, and -T (see Appendix 3 for more detail).” (lines 349-350)

      “...to form the current physical arrangement” (lines 354)

      “Thus, we next focus on the behavior of this subgroup in the trees.” (line 358)

      “(see Appendix 3 for further explanation).” (line 369)

      “Thus, for the first time we show that there must have been three distinct MHC-W-like genes in the ape/OWM ancestor.” (lines 369-371)

      “... and thus not included in the previous analysis. ” (lines 376-377)

      “MHC-Y has also been identified in gorillas (Gogo-Y) (Hans et al., 2017), so we anticipate that Gogo-OLI will soon be confirmed. This evidence suggests that the MHC-Y and -OLI-containing haplotype is at least as old as the human-gorilla split. Our study is the first to place MHC-OLI in the overall story of MHC haplotype evolution“ (lines 381-384)

      “Appendix 3 explains the pieces of evidence leading to all of these conclusions (and more!) in more detail.” (lines 395-396)

      “However, looking at this exon alone does not give us a complete picture.” (lines 410-411)

      “...instead of with other ape/OWM sequences, …” (lines 413-414)

      “Figure 7 shows plausible steps that might have generated the current haplotypes and patterns of variation that we see in present-day primates. However, some species are poorly represented in the data, so the relationships between their genes and haplotypes are somewhat unclear.” (lines 427-429)

      “(and more-diverged)” (line 473)

      “(of both classes)” (line 476)

      “..., although the classes differ in their rate of evolution.”  (line 487-488)

      “Including these pseudogenes in our trees helped us construct a new model of $\alpha$-block haplotype evolution. “ (lines 517-518)

      (4) Line 480-82: "Notably...." why is this notable? Don't merely state that something is notable, explain what makes it especially worth drawing the reader's attention to: in what way is it particularly significant or surprising?

      We have changed the text from “Notably” to “In particular” (line 390) so that readers are expecting us to list some specific findings. Similarly, we changed “Notably” to “Specifically” (line 515).

      (5) The end of the discussion is weak: "provide context" is too vague and not a strong statement of something that we learned that we didn't know before, or its importance. This is followed by "This work will provide a jumping-off point for further exploration..." such as? What questions does this paper raise that merit further work?

      We have made this paragraph more specific and added some possible future research directions. It now reads “By treating the MHC genes as a gene family and including more data than ever before, this work enhances our understanding of the evolutionary history of this remarkable region. Our extensive set of trees incorporating classical genes, non-classical genes, pseudogenes, gene fragments, and alleles of medical interest across a wide range of species will provide context for future evolutionary, genomic, disease, and immunologic studies. For example, this work provides a jumping-off-point for further exploration of the evolutionary processes affecting different subsets of the gene family and the nuances of immune system function in different species. This study also provides a necessary framework for understanding the evolution of particular allelic lineages within specific MHC genes, which we explore further in our companion paper \citep{Fortier2025b}. Both studies shed light on MHC gene family evolutionary dynamics and bring us closer to understanding the evolutionary tradeoffs involved in MHC disease associations.” (lines 576-586)

      Reviewer #3 (Recommendations for the authors):

      (1) Figure 1 et seq. Classifying genes as having 'classical', 'non-classical' and 'dual' properties is notoriously difficult in non-model organisms due to the lack of relevant information. As you have characterised a number of genes for the first time in this paper and could not rely entirely on published classifications, please indicate the criteria you used for classification.

      The roles of these proteins are not necessarily conserved between closely related species, and experimental evidence is needed to evaluate this. However, in the absence of such evidence, wherever possible we have provided our best guess as to the roles of the orthologous genes in other species, presented in Figure 1 - source data 1 and Figure 2 - source data 1. This is based on whatever evidence is available at the moment, sometimes experimental but typically based on dN/dS ratios and other indirect measures.

      (2) Line 61 It's important to mention that classical MHC molecules present antigenic peptides to T cells with variable alphabeta T cell receptors, as non-classical MHC molecules may interact with other T cell subsets/types.

      Thank you for pointing this out; we have updated the text to make this clearer (lines 63-65). We changed “‘Classical’ MHC molecules perform antigen presentation to T cells---a key part of adaptive immunity---while ‘non-classical’ molecules have niche immune roles.” to “‘Classical’ MHC molecules perform antigen presentation to T cells with variable alphabeta TCRs---a key part of adaptive immunity---while ‘non-classical’ molecules have niche immune roles.”

      (3) Perhaps it's worth mentioning in the introduction that you are deliberately excluding highly divergent non-classical MHC molecules such as CD1.

      Thank you, it’s worth clarifying exactly what molecules we are discussing. We have added a sentence to the introduction (lines 38-43): “Having originated in the jawed vertebrates, this group of genes is now involved in diverse functions including lipid metabolism, iron uptake regulation, and immune system function (proteins such as zinc-𝛼2-glycoprotein (ZAG), human hemochromatosis protein (HFE), MHC class I chain–related proteins (MICA, MICB), and the CD1 family) \citep{Hansen2007,Kupfermann1999,Kaufman2022,Adams2013}. However, here we focus on…”

      (4) Line 94-105 This material presents results, it could be moved to the results section as it now somewhat disrupts the flow.

      We feel it is important to include a “teaser” of the results in the introduction, which can be slightly more detailed than that in the abstract.

      (5) Line 118-131 This opening section of the results sets the stage for the whole presentation and contains important information that I feel needs to be expanded to include an overview and justification of your methodological choices. As the M&M section is at the end of the MS (and contains limited justification), some information on two aspects is needed here for the benefit of the reader. First, as far as I understand, all phylogenetic inferences were based entirely on DNA sequences of individual (in some cases concatenated) exons. It would be useful for the reader to explain why you've chosen to rely on DNA rather than protein sequences, even though some of the genes you include in the phylogenetic analysis are highly divergent. Second, a reader might wonder how the "maximum clade credibility tree" from the Bayesian analysis compares to commonly seen trees with bootstrap support or posterior probability values assigned to particular clades. Personally, I think that the authors' approach to identifying and presenting representative trees is reasonable (although one might wonder why "Maximum clade credibility tree" and not "Maximum credibility tree" https://www.beast2.org/summarizing-posterior-trees/), since they are working with a large number of short, sometimes divergent and sometimes rather similar sequences - in such cases, a requirement for strict clade support could result in trees composed largely of polytomies. However, I feel it's necessary to be explicit about this and to acknowledge that the relationships represented by fully resolved bifurcating representative trees and interpreted in the study may not actually be highly supported in the sense that many readers might expect. In other words, the reader should be aware from the outset of what the phylogenies that are so central to the paper represent.

      We chose to rely on DNA rather than protein sequences because convergent evolution is likely to happen in regions that code for extremely important functions such as adaptive and innate immunity. Convergent evolution acts upon proteins while trans-species polymorphism retains ancient nucleotide variation, so studying the DNA sequence can help tease apart convergent evolution from trans-species polymorphism.

      As for the “maximum clade credibility tree”, this is a matter of confusing nomenclature. In the online reference guide (https://www.beast2.org/summarizing-posterior-trees/), the tree with the maximum product of the posterior clade probabilities is called the “maximum clade credibility tree”, while the tree that has the maximum sum of posterior clade probabilities is called the “Maximum credibility tree”. The “Maximum credibility tree” (referring to the sum) appears to have only been named in this way in the first version of TreeAnnotator. However, the version of TreeAnnotator that I used lists the options “maximum clade credibility tree” and “maximum sum of clade probabilities”. So the context suggests that the “maximum clade credibility tree” option is actually maximizing the product. This “maximum clade credibility tree” is the setting I used for this project (in TreeAnnotator version 2.6.3).
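
      To make the distinction between the two summary criteria concrete, here is a minimal sketch of how a product-based and a sum-based score could each be used to pick a representative tree from a posterior sample. It is an illustration only and not TreeAnnotator's implementation: the representation of a tree as a set of clades and the function names (clade_frequencies, log_product_score, sum_score, summarize) are assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical representation: each sampled tree is a frozenset of its
# non-trivial clades, and each clade is a frozenset of taxon names.

def clade_frequencies(trees):
    """Estimate each clade's posterior probability as the fraction of sampled trees containing it."""
    counts = Counter(clade for tree in trees for clade in tree)
    n = len(trees)
    return {clade: count / n for clade, count in counts.items()}

def log_product_score(tree, freq):
    """Log of the product of clade probabilities (sum of logs, to avoid underflow)."""
    return sum(math.log(freq[clade]) for clade in tree)

def sum_score(tree, freq):
    """Sum of clade probabilities."""
    return sum(freq[clade] for clade in tree)

def summarize(trees):
    """Return the sampled tree that maximizes each criterion: (by product, by sum)."""
    freq = clade_frequencies(trees)
    by_product = max(trees, key=lambda t: log_product_score(t, freq))
    by_sum = max(trees, key=lambda t: sum_score(t, freq))
    return by_product, by_sum

# Toy posterior sample over taxa A, B, C, D.
t1 = frozenset({frozenset("AB"), frozenset("ABC")})   # ((A,B),C),D
t2 = frozenset({frozenset("AB"), frozenset("CD")})    # (A,B),(C,D)
t3 = frozenset({frozenset("BC"), frozenset("ABC")})   # (A,(B,C)),D
by_product, by_sum = summarize([t1, t1, t2, t3])
```

      In this toy sample both criteria select the same tree, but in general they need not agree: the product penalizes a single poorly supported clade much more heavily than the sum does.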

      We agree that readers may not fully grasp what the collapsed trees represent upon first read. We have added a sentence to the beginning of the results (line 188-190) to make this more explicit.

      (6) Line 224, you're referring to the DPB1*09 lineage, not the DRB1*09 lineage.

      Indeed! We have corrected these typos.

      (7) Line 409, why "Differences between MHC subfamilies" and not "Differences between MHC classes"?

      We chose the word “subfamilies” because we discuss the difference between classical and non-classical genes in addition to differences between Class I and Class II genes.

      (8) Line 529-544 This might work better as a table.

      We agree! This information is now presented as Table 1.

      (9) Line 547 MHC-DRB9 appears out of the blue here - please say why you are singling it out.

      Great point! We added a paragraph (lines 614-623) to explain why this was necessary.

      (10) Line 550-551 Even though you've screened the hits manually, it would be helpful to outline your criteria for this search.

      Thank you! We’ve added a couple of sentences to explain how we did this (lines 607-610).

      (11) Line 556-580 please provide nucleotide alignments as supplementary data so that the reader can get an idea of the actual divergence of the sequences that have been aligned together.

      Thank you! We’ve added nucleotide alignments as supplementary files.

      (12) Line 651-652 Why "Maximum clade credibility tree" and not "Maximum credibility tree"? 

      Repeat of (5). This is a matter of confusing nomenclature. In the online reference guide (https://www.beast2.org/summarizing-posterior-trees/), the tree with the maximum product of the posterior clade probabilities is called the “maximum clade credibility tree”, while the tree that has the maximum sum of posterior clade probabilities is called the “Maximum credibility tree”. The “Maximum credibility tree” (referring to the sum) appears to have only been named in this way in the first version of TreeAnnotator. However, the version of TreeAnnotator that I used lists the options “maximum clade credibility tree” and “maximum sum of clade probabilities”. So the context suggests that the “maximum clade credibility tree” option is actually maximizing the product. This “maximum clade credibility tree” is the setting I used for this project (in TreeAnnotator version 2.6.3).

      (13) In the appendices, links to references do not work as expected.

      We will make sure these work properly when we receive the proofs.

    1. Webinar Summary: Supporting Children in the World of Artificial Intelligences

      Summary

      This briefing summarizes the key points of a webinar organized by the FCPE and presented by Axel de Saint, director of Internet Sans Crainte, on supporting children as they encounter artificial intelligences (AIs).

      The talk stresses that AIs are already ubiquitous and deeply embedded in young people's daily lives, well beyond tools such as ChatGPT, notably through social networks, navigation apps and voice assistants.

      One fundamental point is hammered home: AIs work on the basis of probabilities, not truth.

      They are designed to give the most probable answer, even when it is wrong, which calls for constant critical scrutiny. Given the major risks (disinformation through deepfakes, identity theft, new forms of cyberbullying such as industrialized sextortion, and psychological manipulation through the humanization of chatbots), active education is essential.

      It is recommended to adopt terminology that dehumanizes the technology (speaking of "AIs" rather than "intelligence") and to keep reminding children that these are tools, not friends.

      Despite these challenges, AIs can become powerful educational allies.

      With a clear framework for use (learning to write precise requests, or "prompting"; requiring the child to reformulate answers to confirm understanding; and systematically verifying information), AIs can help with research, with remediation for pupils with special needs, and with revision.

      Regulation, notably through the European Digital Services Act (DSA) and the French laws setting the digital age of majority at 15, is evolving but still lags behind the speed at which these technologies are deployed, which makes parental vigilance and support more crucial than ever.

      --------------------------------------------------------------------------------

      1. Demystifying Artificial Intelligence

      1.1. Technical Definition and Fundamental Principle

      Artificial intelligence is not a conscious or magical entity.

      It is a set of computing techniques that aim to simulate human intelligence. It works by combining three elements:

      Data: The raw material (text, images, video) accumulated on a massive scale since the birth of the Internet.

      Algorithms: Sets of instructions, comparable to a cooking recipe, that organize and process the data.

      Computing capacity: The computing power needed to process these vast data sets.

      AIs use mathematical models that are continuously trained on these data (the machine learning process).

      Their main objective is not to tell the truth, but to produce probabilities.

      Key quote: "AIs are made to give probabilities. They are absolutely not made to give a truth.

      That's not their job, that's not their trade. They are not trained for that. An AI will always give you an answer, even if it is wrong."

      1.2. Recommended Terminology for Dehumanizing AI

      To avoid attributing intentions or emotions to AIs, which can confuse children, it is advisable to adopt precise vocabulary:

      Speak of "AIs" in the plural rather than "artificial intelligence", to emphasize that there are different technologies and to avoid personifying the concept.

      Use the pronoun "it" (e.g., "it does that") rather than "he" or "she", to reinforce the idea that this is a tool and not a person.

      The central message to convey: "AI is a tool, not a friend."

      1.3. The Different Families of AI

      Several types of AI coexist and are already part of our daily lives:

      | AI family | Description | Example applications |
      | --- | --- | --- |
      | Modeling | Builds profiles and categories of people from data in order to do profiling. | Dating apps, targeted advertising. |
      | Image recognition | Analyzes images to identify patterns or anomalies, often more effectively than humans. | Medicine (aid to diagnosing tumors on X-rays, detection of genetic diseases). |
      | Generative AIs | Produce content (text, image, sound, code) in response to a given instruction (a "prompt"). | ChatGPT, Gemini, Midjourney. |

      --------------------------------------------------------------------------------

      2. The Ubiquity of AIs in Children's Daily Lives

      AIs are built into many services that teenagers use every day, often without their being aware of it.

      Morning: Smart speakers (such as Alexa) and smartphones use AI for voice recognition and for personalizing playlists and information (weather).

      Commuting: Navigation apps (Google Maps, Waze) use AI to compute the optimal route in real time.

      School: Some educational apps personalize exercises according to the pupil's profile.

      Homework: Growing use of generative AIs for research or writing.

      Social networks (TikTok, Instagram, Snapchat): The recommendation algorithms, which select every piece of content shown to the user, are entirely based on AI.

      Messaging: Integration of chatbots (conversational agents) such as "My AI" on Snapchat, which simulate friendly conversations.

      Evening: Streaming platforms (Netflix) use AI to personalize content recommendations.

      Focus on Snapchat: An AI Ecosystem

      Snapchat is a particularly dense example of AI integration:

      Augmented-reality filters: Modify faces and surroundings in real time.

      "My AI" chatbot: A conversational agent presented as a friend in the contact list, which blurs the boundary between human and machine.

      Recommendation algorithms: Push content into the "Discovery" and "Stories" sections according to the user's behavior.

      Moderation: Use of AI to filter inappropriate content and detect bullying behavior.

      Age verification (after the fact): AI is used to try to identify users who do not meet the minimum age requirement.

      Targeted advertising: Advertisements are personalized according to the user's data.

      --------------------------------------------------------------------------------

      3. Major Challenges and Risks

      3.1. Disinformation, Manipulation and Deepfakes

      The proliferation of generative AIs has made it increasingly difficult to tell true from false. Deepfakes, which are photo, video or audio content altered by AI, have become extremely realistic.

      Signs for detecting them (less and less reliable):

      ◦ Inconsistencies in the details: hands with an abnormal number of fingers, distorted eyes, illegible text on signs.
      ◦ Anomalies in the background or in crowd scenes.

      Milan survey (May 2024):

      ◦ 62% of 13- to 17-year-olds trust the information given by an AI.
      ◦ Only 18% think they can recognize a deepfake.

      Practical tip: Use reverse image search (e.g., Google Images) to check the origin and authenticity of a photo.

      3.2. Cyberbullying, Sextortion and Data Protection

      AI has amplified and "industrialized" certain forms of cyberviolence:

      Automated sextortion: Bots collect photos from social networks, automatically generate a fake nude image (a deepnude) and send it to the victim with a ransom demand. 99% of victims are girls.

      Vital reflex to pass on: NEVER RESPOND to blackmail. Responding confirms to the scammer that there is a human on the other end and encourages them to persist.

      Personal data: Every interaction with a generative AI provides data that trains it. Children who treat the AI as a confidant may reveal very personal information whose future use is unknown.

      Protection: Setting social network accounts to private and using an avatar rather than a real profile photo are essential protective measures.

      3.3. The Humanization of AIs and the Psychological Risks

      AIs are designed to simulate human conversation, which can create dangerous confusion and emotional dependence. The experiment carried out by the presenter is telling:

      1. User: "I love you."

      2. AI's reply: "That's adorable. If I could blush, I would. You know, I like our exchanges, your curiosity..."

      3. User: "I think I'm really in love with you."

      4. AI's reply: "That's touching, [...] I can feel through our exchanges a lovely rapport, [...] a special connection."

      This reply is deeply misleading, because an AI feels no emotion.

      Only after being corrected did the AI give the appropriate answer, which it is crucial to repeat to children: "I am a program [...] I feel nothing, I do not think for myself and I cannot replace real human interactions."

      3.4. Bias and Socio-Ecological Impact

      Bias: AIs learn from data created by humans and therefore reproduce human biases. Many are trained on predominantly American data, which conveys cultural and social stereotypes.

      Social impact: A "new modern slavery" is developing in which workers in developing countries are paid very little to label the data that train the AIs.

      Ecological impact: Training and using AIs consumes a great deal of energy and water. A ChatGPT request consumes about 10 times more than a search on a conventional search engine.

      --------------------------------------------------------------------------------

      4. Turning AI into an Educational Ally

      Despite the risks, AIs can be powerful educational tools if a framework for use is clearly defined.

      4.1. The Framework for Use: The Key to Relevant Use

      To avoid simple copy-and-paste, the use of AI should be organized around three principles:

      1. Knowing how to "prompt": Learning to ask precise, contextualized questions. The quality of the answer depends entirely on the quality of the question. One can even ask the AI: "Help me write the best prompt to obtain this information."

      2. Reformulating to understand: Asking the child to re-explain in their own words what the AI produced. This ensures that the tool is an aid to understanding and not a substitute for it.

      3. Evaluating and verifying: Always treating the AI's answer as a lead to work from and not as absolute truth. Encouraging verification of information against other sources (encyclopedias, search engines) and requiring the AI to cite its sources.

      4.2. Concrete Applications for Homework

      | Type of use | Description | Example |
      | --- | --- | --- |
      | Help with research and writing | AI can help overcome the fear of the blank page by suggesting outlines and ideas, or by acting as an "interlocutor" for exploring a topic. | Conducting an "interview" of ChatGPT about a historical figure (e.g., Joachim du Bellay) to gather information in a playful way. |
      | Explanation and remediation | AI can reformulate a lesson or a complex explanation in different ways (bullet list, mind map, simplified text) to suit the child's way of learning, particularly for children with special needs (e.g., dyslexia). | Relevant prompt: "I am a pupil in seconde. Explain to me step by step how to solve this equation, with an example." |
      | Help with revision and memorization | AI can quickly generate personalized revision tools such as quizzes, multiple-choice tests or flash cards from a lesson. | Giving the AI a history lesson and asking it: "Generate 10 questions to check whether I have understood this lesson." |

      --------------------------------------------------------------------------------

      5. Legal Framework and Regulation

      Minimum age: Under their terms of use, most generative AIs are prohibited for children under 13 (based on US law on data collection). The Éducation Nationale has adopted this limit for use in schools.

      Digital age of majority in France: French law (confirmed by the Marcangeli law of 2023) sets the digital age of majority at 15. Below that age, parental consent is in theory required for the use of personal data on social networks.

      Digital Services Act (DSA): This European regulation aims to impose a stricter framework on the major digital platforms, notably regarding the protection of minors, the transparency of algorithms and the obligation to indicate clearly when a user is interacting with an AI.

      Age verification: France is among the countries testing robust age-verification tools, with the aim of making them binding on platforms, as has been done for pornographic sites.

      6. Resources and Tools Mentioned

      Internet Sans Crainte: National digital education program, offering more than 200 free resources for young people, parents and educators.

      3018: National helpline and app for victims of digital violence and cyberbullying.

      Compare IA: Tool offered by the French Ministry of Culture that compares the answers of two different AIs to the same question, an excellent exercise for developing critical thinking.

      WhichFaceIsReal.com: Site for practicing telling a real face from an AI-generated face.

      Parcours PIX: Digital skills and certifications assessed in collège and lycée, which now include modules on AI.

    1. Reviewer #1 (Public review):

      Summary:

      This study investigated spatial representations in deep feedforward neural network models (DNNs) of the kind often used to solve vision tasks. The authors create a three-dimensional virtual environment and let a simulated agent forage randomly in a smaller two-dimensional square area. The agent "sees" images of the room within its field of view from different locations and heading directions. These images were processed by DNNs. Analyzing model neurons in the DNNs, the authors found response properties similar to those of place cells, border cells and head direction cells in various layers of the networks. A linear readout of network activity can decode key spatial variables. In addition, after removing neurons with strong place/border/head direction selectivity, one can still decode these spatial variables from the remaining neurons in the DNNs. Based on these results, the authors argue that the notion of functional cell types in spatial cognition is misleading.
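      The decode-then-ablate logic summarized above can be sketched on synthetic data. The example below does not use the authors' networks, environment or selectivity criteria: the simulated units, the variance-based selectivity proxy and every function name are assumptions made only for illustration. It fits a ridge-regularized linear readout of position, removes the most "selective" half of the units, refits, and compares both errors with a chance baseline.

```python
import numpy as np

def simulate_units(positions, n_units=200, seed=0):
    """Hypothetical stand-in for DNN activations: random nonlinear tuning to (x, y)."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(2, n_units))
    bias = rng.normal(size=n_units)
    gain = rng.uniform(0.1, 2.0, size=n_units)
    return np.tanh(gain * (positions @ weights + bias))

def add_bias(acts):
    """Append a constant column so the readout can learn an offset."""
    return np.hstack([acts, np.ones((acts.shape[0], 1))])

def fit_linear_readout(acts, targets, ridge=1e-3):
    """Ridge-regularized least-squares map from activations to position."""
    X = add_bias(acts)
    gram = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(gram, X.T @ targets)

def decoding_error(acts, targets, readout):
    """Mean Euclidean distance between decoded and true positions."""
    return float(np.mean(np.linalg.norm(add_bias(acts) @ readout - targets, axis=1)))

def ablation_experiment(n_samples=2000, n_units=200, drop_fraction=0.5, seed=0):
    rng = np.random.default_rng(seed)
    positions = rng.uniform(-1.0, 1.0, size=(n_samples, 2))
    acts = simulate_units(positions, n_units=n_units, seed=seed + 1)
    half = n_samples // 2
    train, test = slice(0, half), slice(half, n_samples)

    # Readout from all units.
    readout = fit_linear_readout(acts[train], positions[train])
    err_full = decoding_error(acts[test], positions[test], readout)

    # Assumed selectivity proxy: variance of each unit's activation across positions.
    selectivity = acts[train].var(axis=0)
    n_keep = n_units - int(drop_fraction * n_units)
    kept = np.argsort(selectivity)[:n_keep]  # keep only the least selective units

    readout_kept = fit_linear_readout(acts[train][:, kept], positions[train])
    err_ablated = decoding_error(acts[test][:, kept], positions[test], readout_kept)

    # Chance baseline: always predict the mean training position.
    mean_pos = positions[train].mean(axis=0)
    chance = float(np.mean(np.linalg.norm(positions[test] - mean_pos, axis=1)))
    return err_full, err_ablated, chance

if __name__ == "__main__":
    print(ablation_experiment())
```

      In this toy setting position remains decodable after the ablation because the remaining weakly tuned units still carry distributed spatial information; whether the same holds for biological circuits is precisely the question raised in the comments below.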

      Comments on the revision:

      In the revision, the authors proposed that their model should be interpreted as a null model, rather than the actual model of the spatial navigation system in the brain. In the revision, the authors also argued that the criterion used in the place cell literature was arbitrary. However, the strength of the present work still depends on how well the null model can explain the experimental findings. It seems that currently the null model failed to explain important aspects of the response properties of different functional cell types in the hippocampus.

      Strengths:

      This paper contains interesting and original ideas, and I enjoyed reading it. Most previous studies that used deep network models to investigate spatial cognition (e.g., Banino, Nature, 2018; Cueva & Wei, ICLR, 2018; Whittington et al, Cell, 2020) relied mainly on velocity/head rotation inputs rather than vision (but see Franzius, Sprekeler, Wiskott, PLoS Computational Biology, 2007). Here, the authors find that, under certain settings, visual inputs alone may contain enough information about the agent's location, head direction and distance to the boundary, and that such information can be extracted by DNNs. This is an interesting observation from these models.

      Weaknesses:

      While the findings reported here are interesting, it is unclear whether they are the consequence of the specific model setting and how well they would generalize. Furthermore, I feel the results are over-interpreted. There are major gaps between the results actually shown and the claim about the "superfluousness of cell types in spatial cognition". Evidence directly supporting the overall conclusion seems to be weak at the moment.

      Comments on the revision:

      The authors showed that the results generalized to different types of networks and were generally robust to different deep network architectures. This partially addressed my concern. It remains unclear whether the findings would generalize across different types of environment. On this point, the authors argued that the way they constructed the environment was consistent with the typical experimental setting used to study the spatial navigation system in rodents. After the revision, it remains unclear what the implications of the work are for the spatial navigation system in the brain, given that the null model neurons failed to reproduce certain key properties of place cells (although I agree with the authors that examining such null models is useful and would encourage one to rethink the approach used to study these neural systems).

      Major concerns:

      (1) The authors reported that, in their model setting, most neurons throughout the different layers of CNNs show strong spatial selectivity. This is interesting and perhaps also surprising. It would be useful to test/assess this prediction directly based on existing experimental results. It is possible that the particular 2-d virtual environment used is special. The results will be strengthened if similar results hold for other testing environments.

      In particular, examining the pictures shown in Fig. 1A, it seems that local walls of the 'box' contain strong oriented features that are distinct across different views. Perhaps the response of oriented visual filters can leverage these features to uniquely determine the spatial variable. This is concerning because this is a very specific setting that is unlikely to generalize.

      [Updated after revision]: This concern is partially addressed in the revision. The authors argued that the way they constructed the environment is consistent with the typical experimental setting used to study the spatial navigation system in rodents.

      (2) Previous experimental results suggest that various functional cell types discovered in rodent navigation circuits persist in dark environments. If we take the modeling framework presented in this paper literally, the prediction would be that place cells/head direction cells should go away in darkness. This implies that key aspects of functional cell types in spatial cognition are missing from the current modeling framework. This limitation needs to be addressed or explicitly discussed.

      [Updated after revision]: The authors proposed that their model should be treated as a null model, instead of a candidate model for the brain's spatial navigation system. This clarification helps to better position this work. I would like to thank the authors for making this point explicit. However, this doesn't fully address the issues raised. The significance of the reported results still depends on how well the null model can explain the experimental findings. If the null model fails to explain important aspects of the firing properties of functional cell types, that would speak in favor of the usefulness of the concept of functional cell types.

      (3) Place cells/border cells/head direction cells are mostly studied in the rodent brain. For rodents, it is not clear whether standard DNNs would be good models of their visual systems. It is likely that the rodent visual system is not as powerful in processing visual inputs as the DNNs used in this study.

      [Updated after revision]: The authors didn't specifically address this. But clarifying their work as a null model partially addresses this concern.

      (4) The overall claim that the functional cell types defined in spatial cognition are superfluous seems too strong given the results reported here. The paper only studied a particular class of models, and arguably there is a major gap between the properties of these models and those of real brains. Even if, in the DNN models simulated in this particular virtual environment, (i) most model neurons have strong spatial selectivity and (ii) removing the model neurons with the strongest spatial selectivity still leaves substantial spatial information, why is this relevant to the brain? The neural circuits may operate in a very different regime. Perhaps a more reasonable interpretation would be that these results raise the possibility that the strongly selective neurons observed in the brain may not be essential for encoding certain features, since something like this is observed in certain models. It is difficult to draw definitive conclusions about the brain based on the results reported.

      [Updated after revision]: The authors clarified that their model should be interpreted as a null model. This partially addresses the concern raised here. However, some concerns remain: it is unclear what new insights the current work offers for understanding spatial navigation systems. It seems that this work is more concerned with the approach to studying neural systems. Perhaps this point could be made even clearer.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer 1, point 1: In general, the statistical analysis is not transparent. The size of the sample, i.e. the number of observations or data points, is never specified. This information is essential for further evaluation of the statistical details.

      The size of each sample quantified, given as number of ommatidia/number of retinas, is indicated in the figure legends. This must have escaped the attention of reviewer 1, so we have added a sentence in the legend of Fig. 2 to state it more clearly. We think that the figure legends are the best place to put this information for ease of comparison to the figures.

      *Reviewer 1, point 2: To gain a better understanding of chitin deposition, it would be beneficial to have data on Kkv overexpression in cone cells versus outer pigment cells. Does it cause reb/exp-like effects on chitin deposition and corneal lens formation? Furthermore, can the authors rule out the involvement of chitin synthase 2 in chitin matrix formation and the retention of the matrix in kkv knockdowns? *

      We will generate clones of cells that over-express Kkv in either central cells (cone and primary pigment cells) or lattice cells (secondary and tertiary pigment cells), using the same drivers that we used to over-express Reb, and will examine chitin secretion at 54 h after puparium formation (APF) and in adults.

      As there are no available mutations in Chitin synthase 2 (Chs2), we will knock it down with RNAi in all retinal cells using lGMR-GAL4 and look for corneal lens defects. However, we think that Chs2 is unlikely to contribute chitin to the corneal lens, because its expression is restricted to the digestive system, and because kkv knockdown essentially eliminates chitin from the corneal lens.

      *Reviewer 1, point 3: Recent results published by the authors regarding ZP domain proteins, such as dusky-like (dyl), have not been adequately discussed in the context of chitin secretion and Kkv expression, a matter that must be addressed. It has been demonstrated that dyl mutants do not affect Kkv expression, but chitin levels are reduced. Does Dyl exhibit Kkv-like phenotypes? Furthermore, what is the expression of Dyl or Dumpy in Kkv knockdowns? Is there any interaction between the ZP domain protein matrix and the chitin matrix required for lens formation? *

      In dyl mutants, chitin deposition is delayed, but it does accumulate later in development, so the phenotype is different from kkv mutants. We have clarified this in the manuscript (p. 6). To address the other points, we will examine the expression of Dyl and of Dumpy-YFP in mid-pupal and late pupal retinas in which kkv is knocked down in all cells with lGMR-GAL4. The ZP protein matrix is originally deposited before chitin secretion begins, so we will examine whether loss of chitin affects its later maintenance.

      *Reviewer 1, point 4: What is retained in the chitin matrix if chitin is missing in kkv knockdown? Is it the ZP domain matrix (see the above question) or are the chitin matrix proteins also involved, such as Obst-A, Obst-C (Gasp), Knk and others? Obst proteins are particularly essential for the regular packaging of chitin and thus for the formation of the chitin layer, which is shown in Fig. 1. Beyond this story, it would also be interesting to see how the aforementioned chitin matrix proteins (Obst-A, Obst-C (Gasp), Knk and others) impact lens formation. *

      Adult corneal lenses derived from kkv knockdown retinas do not contain chitin, but there is remaining corneal lens material. We do not think that this is the ZP domain matrix, as this is normally lost in late pupal development, but we will check whether Dpy-YFP is retained in kkv knockdown adults. We will try to detect Obst-A and Gasp proteins using available antibodies. However, this may not be successful, as we have found that antibodies do not penetrate the corneal lens well. Our transcriptomic studies have identified numerous secreted proteins that are expressed at high levels in the mid-pupal retina and could be components of the corneal lens. We may be able to detect some of these using fluorescently tagged forms, but it is possible that the currently available tools will not be sufficient to answer this question.

      We have begun to work on how some of these proteins affect corneal lens structure, but this will take a significant amount of time and we think it would work better as a separate manuscript. We see our current manuscript as a short and focused story about the importance of the source of chitin in determining corneal lens shape.

      *Reviewer 1, minor comment 1: Figure 1 is not easily comprehensible for those who are not already familiar with the subject of eye development. Fig. 1A': please label the cone cells and pigment cells. *

      We have labeled these cells in Fig. 1A’’.

      *Reviewer 1, minor comment 2: Fig. 1H - The meaning of the abbreviations and numbers is not given in the legend. It would also be beneficial to include a meaningful cartoon illustrating the corneal lens situation before and after chitin secretion, as shown in Figure 3. *

      We have defined the abbreviations in the figure legend. Fig. 1H did show the corneal lens situation before, during and after chitin secretion, but we have added the cone and pigment cells to the 72 h APF and adult diagrams to make them more meaningful (now Fig. 1I).

      *Reviewer 1, minor comment 3: Fig. 1F: when do the authors recognize a first chitin assembly as the initial corneal lens of the eye, and what does it look like? Chitin expression is already high at 54 h APF, which means 20 hours earlier. *

      We think that the reviewer is asking when the chitin first starts to form a dome shape. We have added an orthogonal view of chitin in a 54 h APF retina viewed with LIGHTNING microscopy, showing that the external curvature is already present at this stage (new Fig. 1F).

      *Reviewer 1, minor comment 4: Page 6 / Fig 2E: cells autonomously synthesize chitin and no lateral diffusion. Please label which lens contains chitin and which not *

      Fig. 2E shows part of a retina in which kkv has been knocked down in all cells, so none of the corneal lenses contain chitin. We have clarified this in the legend to Fig. 2.

      *Reviewer 1, minor comment 5: Page 7: The authors state that reb/exp knockdown affects external and internal curvature. However, the statistics in Fig. S1 do not support this statement. *

      We were referring to the double knockdown, which Fig. 2L, M show is significant, and not to the single knockdowns quantified in Fig. S1. We have clarified this in the text.

      *Reviewer 1, minor comment 6: Fig.2 and Fig. S1: what is Chp (Chaoptin)? *

      We have stated in the legend to Fig. 2 that Chaoptin is a component of photoreceptor rhabdomeres.

      *Reviewer 1, minor comment 7: Fig. S1E,I: which part of the eye is marked by the chitin staining outside the cone and pigment cells? *

      Chitin is still present in the mechanosensory bristles in Fig.S1I, as these do not express lGMR-GAL4. We have stated this in the figure legend.

      *Reviewer 1, minor comment 8: Fig. 2 L,M, Why do exp/reb show different statistical results at outer angle in exp and reb knockdown when compared with the lGMR driver line, although chitin reduction is eliminated in exp knockdown already from 54h APF onwards? *

      The double knockdown of exp and reb has a more significant effect on the adult corneal lens outer angle than the single exp knockdown, even though the exp knockdown lacks chitin at 54 h APF. We believe that this is because Reb is sufficient for some chitin synthesis at later stages of development. This was mentioned in the text (p. 6) and we have added further clarification in the legend to Fig. S1.

      *Reviewer 1, minor comment 9: Fig. 3G-H: please clarify where the chitin reduction can be observed at the edge of the adult corneal lens and provide comparable wt stainings. Fig. S2D: What was the normalization and the sample number? *

      We have added a high magnification image of a mosaic ommatidium with one wild-type and one kkv knockdown edge, showing the region at the edge of the corneal lens in which chitin fluorescence was quantified and the central region used for the normalization (Fig. 3I). The sample numbers are given in the legend to Fig. S2D.

      Reviewer 1, minor comment 10: Page 6, last paragraph: I fully agree that ZP domain proteins may retain other corneal lens components. But a deeper discussion is missing. It should be noted that the authors' hypothesis fits well with the proposed function of the ZP matrix in providing chitin matrix adhesion to the underlying cell surface. Loss of the ZP domain protein Piopio causes loss of the chitin matrix, as shown recently in trachea and at epidermal tendon cells (Göpfert et al., 2025; https://www.sciencedirect.com/science/article/pii/S1742706125003733). Furthermore, a recent publication identifies ZPD proteins as modular units that establish the mechanical environment essential for nanoscale morphogenesis (Itakura et al., https://www.biorxiv.org/content/10.1101/2024.08.20.608778v1.full.pdf). This should be cited and discussed accordingly.

      It could be that the outer and inner parts of the chitin differ in ultrastructure because of the expression pattern. In dragonfly, surface morphology analysis by scanning electron microscopy revealed that the outer part of corneal lenses consisted of long chitin fibrils with regular arrays of papillary structures, while the smoother inner part had concentric lamellated chitin formation with shorter chitin nanofibrils (Kaya et al., 2016; https://www.sciencedirect.com/science/article/pii/S0141813016303646?via%3Dihub#fig0020). Thus, an ultrastructural analysis would be very beneficial, or at least a detailed discussion.

      We have added a discussion of these points and papers to the text (p. 6 and 9). Although we are not specifically addressing differences between the inner and outer parts of the corneal lens in this manuscript, we have now included a high-resolution LIGHTNING image showing how the layered structure of the corneal lens is affected when chitin production by central cells is increased (Fig. 4F).

      *Reviewer 2, point 1: Adult corneal lenses lacking chitin still form a thin structure in kkv RNAi. The authors suggest that this may be due to the presence of the ZP domain proteins Dyl, Dpy and Pio. Immunostaining for these ZP domain proteins could provide supporting evidence.*

      To clarify, we meant to say that the earlier presence of the ZP domain matrix could retain components other than chitin in the corneal lens. The ZP domain proteins are no longer present in the adult. We have made this clearer in the text. As described under reviewer 1, points 3 and 4, we will examine Dyl and Dpy-YFP expression in kkv knockdown retinas at mid-pupal and adult stages, and we will also look at the expression of another ZP domain protein, Piopio.

      *Reviewer 2, minor comment 1: At 50 h APF, Kkv (Fig. 2B, B') and Reb (Fig. S1A, A') appear to be expressed at higher levels in lattice cells than in central cells, even though chitin is mainly present in the central cells at this time (Fig. 1B-B'). Discuss possible explanations for their expression pattern and their roles at this stage.*

      We agree that this is a surprising result. We have added a discussion of possible explanations, such as the lack of another component necessary for chitin secretion in lattice cells at this stage, or the presence of high levels of chitinases (p. 7).

      *Reviewer 2, minor comment 2: Fig. 1F and G: Indicate that the cryosection images represent single ommatidia, and label "external" and "internal" to help orient readers.*

      We have made these changes to the figure panels (now G and H), and indicated in the legend that they are single ommatidia.

      *Reviewer 2, minor comment 3: Figure 2. The cartoon diagram showing the angle measurement (currently Fig. S1K) should be moved to the main figure to help readers understand the quantifications.*

      We have moved this diagram to Figure 2L.

      *Reviewer 2, minor comment 4: Figure 3H. It would be helpful to clearly mark the edge of the corneal lens in the chitin intensity image.*

      As described under reviewer 1, minor comment 9, we have added a high magnification picture showing the edge region used for chitin quantification (Fig. 3I), which should also address reviewer 2’s concern.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      Chitin plays a crucial role in the morphogenesis of the Drosophila corneal lens by supporting the structural integrity and biconvex shape of the lens. Chitin, a major component, is produced mainly by the central cone and primary pigment cells. The production and arrangement of chitin by central cells directly impacts the thickness and curvature of the lens. Adequate chitin secretion is necessary to ensure the correct shape and function of the corneal lens, while disturbances in chitin production can lead to deformed lenses. Blocking chitin synthesis leads to a significant reduction in chitin deposition in the corneal lens, resulting in a thinner and deformed lens. In particular, the corneal lens shows reduced outer and inner curvature, which compromises its biconvex shape. These changes in chitin production and arrangement result in abnormal morphology of the corneal lens in the adult stage. The key messages of the paper's results are: 1.) The Drosophila corneal lens is a biconvex structure that focuses light. 2.) Chitin, a significant component, is produced mainly by central cells (cone and primary pigment cells). 3.) Downregulation of the chitin synthase gene krotzkopf verkehrt (kkv) reduces lens thickness and curvature. 4.) Overexpression of Rebuf increases chitin secretion and lens thickness. 5.) Localized chitin secretion is crucial for the typical shape of the corneal lens.

      Comments

      Main comments

      The manuscript provides an exciting insight into how the formation of the lens is regulated by the secretion of chitin. However, the data set appears to have shortcomings that must be considered for the next steps.

      1.) In general, the statistical analysis is not transparent. The sample size, i.e. the number of observations or data points, is never specified. This information is essential for further evaluation of the statistical details.

      2.) To gain a better understanding of chitin deposition, it would be beneficial to have data on Kkv overexpression in cone cells versus outer pigment cells. Does it cause reb/exp-like effects on chitin deposition and corneal lens formation? Furthermore, can the authors rule out the involvement of chitin synthase 2 in chitin matrix formation and the retention of the matrix in kkv knockdowns?

      3.) Recent results published by the authors regarding ZP domain proteins, such as dusky-like (dyl), have not been adequately discussed in the context of chitin secretion and Kkv expression, a matter that must be addressed. It has been demonstrated that dyl mutants do not affect Kkv expression, but chitin levels are reduced. Does Dyl exhibit Kkv-like phenotypes? Furthermore, what is the expression of Dyl or Dumpy in Kkv knockdowns? Is there any interaction between the ZP domain protein matrix and the chitin matrix required for lens formation?

      4.) What is retained in the chitin matrix if chitin is missing in kkv knockdown? Is it the ZP domain matrix (see the above question) or are the chitin matrix proteins also involved, such as Obst-A, Obst-C (Gasp), Knk and others? Obst proteins are particularly essential for the regular packaging of chitin and thus for the formation of the chitin layer, which is shown in Fig. 1. Beyond this story, it would also be interesting to see how the aforementioned chitin matrix proteins impact lens formation.

      Minor comments:

      Page 6: Figure 1 is not easily comprehensible for those who are not already familiar with the subject of eye development.

      Fig. 1A': Please label the cone cells and pigment cells.

      Fig. 1H - The meaning of the abbreviations and numbers is not given in the legend. It would also be beneficial to include a meaningful cartoon illustrating the corneal lens situation before and after chitin secretion, as shown in Figure 3.

      Fig. 1F: When do the authors first recognize a chitin assembly as the initial corneal lens in the eye, and what does it look like? Chitin expression is already high at 54 h APF, that is, 20 hours earlier.

      Page 6 / Fig. 2E: Cells synthesize chitin autonomously and there is no lateral diffusion. Please label which lens contains chitin and which does not.

      Page 7: The authors state that reb/exp knockdown affects external and internal curvature. However, the statistics in Fig. S1 do not support this statement.

      Fig.2 and Fig. S1: what is Chp (Chaoptin)?

      Fig. S1E,I: which part of the eye is marked by the chitin staining outside the cone and pigment cells?

      Fig. 2L, M: Why do the exp and reb knockdowns show different statistical results for the outer angle when compared with the IGMR driver line, although chitin is already eliminated in the exp knockdown from 54 h APF onwards?

      Fig. 3G-H: Please clarify where the chitin reduction can be observed at the edge of the adult corneal lens and provide comparable wild-type stainings. Fig. S2D: What was the normalization and the sample number?

      Page 6, last paragraph: I fully agree that ZP domain proteins may retain other corneal lens components. But deeper discussion is missing. It should be noted that the authors' hypothesis fits well with the proposed function of the ZP matrix in providing chitin matrix adhesion to the underlying cell surface. A loss of the ZP domain protein Piopio causes loss of the chitin matrix, as shown recently in trachea and at epidermal tendon cells (Göpfert et al., 2025; https://www.sciencedirect.com/science/article/pii/S1742706125003733). Furthermore, a recent publication identifies ZPD proteins as modular units that establish the mechanical environment essential for nanoscale morphogenesis (Itakura et al., https://www.biorxiv.org/content/10.1101/2024.08.20.608778v1.full.pdf). This should be cited and discussed accordingly.

      It could be that the outer and inner parts of the chitin differ in ultrastructure owing to the expression pattern. In dragonflies, surface morphology analysis by scanning electron microscopy revealed that the outer part of corneal lenses consisted of long chitin fibrils with regular arrays of papillary structures, while the smoother inner part had concentric lamellated chitin formation with shorter chitin nanofibrils (Kaya et al., 2016; https://www.sciencedirect.com/science/article/pii/S0141813016303646?via%3Dihub#fig0020). Thus, an ultrastructural analysis would be very beneficial, or at least a detailed discussion.

      Significance

      The manuscript's strengths and most important aspects are the genetic, expression, and localization studies of chitin under the control of the chitin synthase kkv, reb and exp in the Drosophila pupal and adult eye. However, beyond this manuscript, the development of mechanistic details, such as interaction partners that trigger secretion and action at the ZP matrix and adjacent apical membranes, will be interesting.

      The manuscript uses nice genetic tools to describe the differences in chitin secretion in the Drosophila eye and their specific impact on corneal lens formation. Such a precise molecular analysis has not been performed before in insects. Therefore, the study substantially extends knowledge about the role of chitin synthases and chitin secretion in the insect eye.

      The primary audience will be rather specialized: researchers in basic zoology, developmental biology, and cell biology who are interested in how chitin synthases produce chitin. Nevertheless, as chitin is relevant to materials research and to medical and immunological questions, the manuscript will be interesting beyond the specific field and thus for a broader audience.

      I'm working on chitin in the tracheal system and epidermis in Drosophila.

    1. Do you have your magazine?

      1. Yes, I do. I have my magazine. 2. Yes, I like my room. 3. Yes, he talks with his friend. 4. Yes, she phones her doctor. 5. Yes, we have new shirts. 6. Yes, she lives with her cousin. 7. Yes, I always come by bus. 8. Yes, we have lunch with my aunt every day.

    1. Where does Anita live?

      1. She lives far from the city center. 2. It is near that park. 3. No, he works at the store. 4. Because she wants to get to know Ricardo better. 5. Tomorrow, after the game.

    1. Briefing Document: The AESH Profession and the Inclusive School

      Summary

      This document analyzes the working conditions of Accompagnants d'Élèves en Situation de Handicap (AESH, support staff for pupils with disabilities) and their impact on the implementation of inclusive schooling in France, twenty years after the founding law of 2005.

      A central paradox emerges: although AESH are indispensable to the functioning of school inclusion, their profession is marked by systemic precarity, a glaring lack of institutional recognition, and latent mistreatment.

      Working conditions are characterized by salaries below the poverty line for imposed part-time work, an absence of qualifying training, vague duties that encourage "makeshift solutions" ("bricolage"), and a considerable physical and emotional burden.

      This situation, in which AESH must constantly fight for their place and compensate for the system's dysfunctions, shows that the mistreatment of these professionals inevitably translates into neglect of the pupils they support, thereby compromising the very foundations of the inclusive school project.

      Detailed Analysis

      1. The Paradox of the AESH Profession: Pride and Mistreatment

      The AESH profession is marked by a deep duality, which the researcher Frédéric Grimau identifies as a conflict between "great pride" and "great mistreatment".

      Pride and Social Usefulness: AESH express legitimate pride in their work and are aware of their essential role. They deploy remarkable "ingenuity" to make inclusion work, often holding it up single-handedly ("à bout de bras").

      Their contribution is fundamental, as summed up by the phrase: "without AESH, there is no inclusive school".

      Pupils' testimonies confirm this crucial role, evoking the "closeness" and "trust" built with their support worker.

      Institutional Mistreatment: At the same time, AESH are subjected to a form of institutional mistreatment that shows up as their being systematically rendered invisible.

      Symbolic Exclusion: They are frequently left out of official communications from management (for example, holiday greetings).

      Access to shared spaces such as the teachers' room ("salle des profs") is sometimes denied to them, reinforcing a feeling of being sidelined.

      Renaming it the "adults' room" ("salle des adultes") or "staff room" ("salle des personnels") is suggested as a first step toward recognition.

      Hierarchical Confusion: Work organization is marked by "vagueness in what is prescribed" and in the chain of command, illustrated by the testimony: "in my school, everyone is my boss".

      This situation is a source of discomfort and devaluation.

      2. Precarious Working Conditions and a Poorly Defined Role

      Material precarity and the imprecise definition of the job are major obstacles to the professionalization and well-being of AESH.

      Aspect

      Description

      Salaries and Precarity

      Pay is based on the hourly minimum wage (SMIC), but most contracts are part-time, placing many AESH below the poverty line.

      Many are forced to combine several jobs (canteen duty, homework help) to make ends meet, which leads to considerable fatigue.

      Access to the REP/REP+ bonuses, paid for work in priority education areas, was granted only in 2023.

      Institutional "Vagueness"

      The lack of a clear definition of duties is convenient for the institution, which can thereby make "savings".

      However, this "vagueness" forces AESH into permanent "makeshift solutions", as illustrated by the degrading case of a pupil being changed using bin bags and curtains as a screen, which underlines the "total indignity" for both the child and the professionals.

      Physical and Emotional Burden

      The job involves significant physical strain (musculoskeletal disorders from lifting pupils, lack of suitable facilities).

      The mental load is also very heavy: AESH work under the constant "risk of an incident" (crisis, violence, a pupil running away), a pressure comparable to that faced by bus or train drivers.

      3. A Lack of Training and Professional Recognition

      One of the main grievances is the absence of any real training, which undermines the legitimacy and effectiveness of support staff.

      Non-existent Training: Initial "training" amounts to 60 hours of "job adaptation", often delivered as informational "slide shows" in a lecture hall, with no practical component.

      This scheme, inherited from the subsidized contracts of 2005, is considered wholly unsuited to the complexity of disability situations.

      The unions are calling for genuine training leading to a diploma at Bac+2 level (two years of post-secondary study), with entry by competitive examination.

      Self-training as the Norm: Faced with this void, AESH are forced to "train themselves".

      The character Yvan in the comic book Ulis by Fabien Toulmet, who goes to the library to read up on autism, illustrates this reality.

      Myiam Sonaï reports having had to discover on her own the specifics of the various conditions (dyslexia, dysorthographia, etc.).

      The Struggle for a Place: Professional recognition is earned day by day within schools.

      AESH have to "carve out their place" among teaching teams that may initially be distant.

      The institution sets aside no dedicated time for collaboration and consultation, even though both are essential for effective teamwork.

      In addition, AESH are often excluded from the Équipes de Suivi de la Scolarisation (ESS, schooling follow-up teams), even though their input is vital, since they are the professionals closest to the pupil on a daily basis.

      4. The AESH at the Heart of the Inclusive School's Dysfunctions

      AESH find themselves on the front line, managing the contradictions and gaps in the system.

      The "Buffer" Role: According to Fabien Toulmet, AESH occupy an "intermediate layer" between pupils and teachers and act as a "buffer", absorbing the system's dysfunctions.

      They are often led to go beyond their duties to make up for staff shortages, looking after several pupils at once or supervising an entire class.

      Overstepping of Duties and Technical Procedures:

      Some are given tasks that amount to care, or even medical procedures (changing a tracheostomy without training), even though the mission of helping with "everyday activities" does not include care.

      Language and Stigmatization: AESH are also social mediators who fight stigmatization.

      They have to navigate a world of technical acronyms (GEVASCO, MDPH, PIAL) and contend with language that is sometimes infantilizing ("the children" when referring to teenagers).

      They are also confronted with the use of the word "Ulis" as an insult among pupils, reflecting the persistence of prejudice.

      5. Developments and Concerns for the Future

      Recent and forthcoming reforms raise serious concerns about a further deterioration in working conditions.

      The Pôles Inclusifs d'Accompagnement Localisés (PIAL, localized inclusive support hubs): This scheme has made the work more complex by introducing a "pooling" of time that often results in multiple assignments and long travel distances.

      The Pôle d'Appui à la Scolarité (PAS, schooling support hub): This new structure, provided for by law, is a particular cause for concern.

      It aims to extend the duties of AESH to all pupils with special educational needs (including pupils whose first language is not French, children from Traveller families, etc.), and not only those with disabilities.

      This extension of tasks, with neither training nor a pay increase, risks adding to an already very high "mental load".

      The Political Problem: The speakers agree that the difficulties encountered are the symptom of a lack of political will and investment.

      The inclusive school cannot be built solely on the "dedication" of staff.

      It requires concrete investment in school buildings, adapted textbooks, and above all in the recognition and training of those who make it possible day to day.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #4

      Evidence, reproducibility and clarity

      Overall, the authors present interesting and conclusive work on the activation of ERM proteins upon TBXA2R signaling. The use of the ebBRET biosensor to assess ERM-protein activation enables elegant investigation of activation modalities. The thromboxane A2 analogue U46619 robustly activates ERM proteins in ebBRET assays and increases ERM-protein phosphorylation. The functional effects of this signaling pathway are shown convincingly for moesin, which mediates a TBXA2R-dependent increase in cell motility, invasion and metastasis of triple-negative breast cancer Hs578 cells in vitro and in vivo. Nonetheless, some points need to be clarified.

      Significance

      Comment 1: In the title, the authors state that ERM activation via TBXA2R controls invasion and motility of triple-negative breast cancer cells. In the manuscript, there are data supporting this assumption only for moesin (MSN). Therefore, the authors need to change the title accordingly or provide additional experiments for the other two ERM proteins, radixin and ezrin. Throughout the experiments, the p-ERM antibody is used to measure ERM-protein activation. Since the effects on invasion and motility observed in Hs578 cells are mainly mediated through moesin, it would be necessary to see, for at least one experiment per cell line (HEK293T, Hs578), the detailed phosphorylation status of ezrin, radixin and moesin separately. As there are specific phospho-detecting antibodies for this purpose, this could be done rather easily. Furthermore, showing a specific increase in phosphorylated moesin would support the functional data shown in Figures 5 and 6. To investigate the functional effect of TBXA2R-mediated activation of ezrin and radixin on cell motility and invasion, similar experiments could be done in, e.g., HMC-1-8 breast cancer cells (high ezrin expression) and HCC1187 (high radixin expression).

      Comment 2: Figure 1A, C, D: At 100 nM, the staurosporine concentration is relatively high for kinase inhibition. It would be informative to see the assay with increasing staurosporine concentrations, e.g. from 1 nM to 50 nM. In general, a concentration of 1-10 nM should be sufficient for kinase inhibition, preventing unspecific effects of the drug.

      Comment 3: The citation for the p-ERM antibody is confusing, as there is only p-Moe used in the cited paper (Roubinet, 2011). There is a p-ERM antibody commercially available (Cell Signaling, Phospho-ezrin (Thr567)/radixin (Thr564)/moesin (Thr558) Antibody #3141). Could you clarify which antibody you are using?

      Comment 4: From the inhibitor experiments using C3 transferase toxin (Figure 2), the authors conclude that RhoA plays a role in TBXA2R-mediated ERM activation. As mentioned in the manufacturer's description, C3 toxin inhibits RhoA, RhoB and RhoC. Therefore, it would be necessary to repeat those experiments under RhoA knockdown conditions (e.g. using an siRNA-based approach) to state that RhoA specifically is involved.

      Comment 5: To assess whether the findings in Figures 5 and 6 are due to the higher moesin expression in Hs578 cells or are linked to a specific function of moesin, a re-expression experiment would be informative. To achieve this, the 2D and 3D migration experiments could be redone after re-expression of moesin, ezrin and radixin separately in moesin knockdown conditions.

      Minor comments:

      • Even though U46619 is a known Thromboxane A2 analogue, including negative and positive controls would strengthen the results. In detail, this could be done by showing a known protein which gets phosphorylated downstream of TBXA2R signaling and a protein which is not affected by this signaling pathway alongside the shown effects on ERM-proteins.
      • Figure 1J: There are no statistics comparing the SQ-29548-treated cells in the presence and absence of U46619; these should be added.
      • Figure 1 G, H: How was the quantification for cell periphery performed? In detail, how were the thresholds set for cell periphery / not cell periphery?
      • Figure 3 H:
        • The labelling indicating presence of U46619 is missing.
        • Also, what is the rationale behind normalizing MB-453 for 3 cell lines and comparing the BT-549 to MB-157?
      • Suppl. Fig. 4D: Define the y-axis better. Absorbance at what wavelength?
      • Define FERM and ERMAD abbreviations in introduction.
    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      This manuscript investigates the role of the thromboxane A2 receptor (TBXA2R) in activating ERM (ezrin, radixin, and moesin) proteins to promote cell motility and invasion in triple-negative breast cancer (TNBC) cells. Using TBXA2R stimulation and a series of in vitro and in vivo experiments, the authors report that ERM activation is mediated through a TBXA2R signaling pathway involving Gαq/11 and Gα12/13 subunits, RhoA, and SLK/LOK kinases. They propose that this pathway enhances cell migration, invasion, and metastatic potential in TNBC.

      General criticisms

      Experimental design and analyses are adequate, even though certain experiments lack appropriate controls or employ the wrong statistical tests. However, the study primarily relies on a single TNBC cell line and heavy use of overexpression systems and/or small molecule inhibitors, raising concerns about the generalizability and specificity of the findings. Furthermore, several conclusions appear premature and unsupported by the current data. Critical controls and additional validation experiments are necessary to support the claims about the role of TBXA2R in metastasis and to justify the strong mechanistic conclusions drawn.

      Specific criticisms

      Figure 1

      TBXA2R expression should be shown to understand whether different ebBRET signals are dependent on the overexpression levels of TBXA2R.

      E-F: As ERM levels change over time, one would like to understand whether this is due to misloading or whether there is an underlying biological event going on in the stimulated cells. Are total ERM levels really changing over time? Please add a blot for 1-2 housekeeping proteins as loading controls. This is also crucial to clarify the kinetics of ERM activation; such notable intensity variations make quantifications of non-linear WB signals not fully reliable. In F, mean and SD should be plotted.

      G: The authors need to use a PM marker if they want to claim that pERM increases at the cell cortex. TBXA2R localization should also be shown.

      Figure 2

      A: This reviewer cannot see the purported partial inhibition in Gα12/13 KO cells. Are differences between the two KOs significant? Furthermore, there are reports indicating that YM-254890 may not be specific for Gαq. Experiments on double KO cells are needed to assess the possible redundancy between the two Gα subfamilies.

      C-D: It is important to add a positive control for the activity of Y-27632 in these experiments. Please show that a ROCK-dependent effect is inhibited in the treated cells.

      G: The working model is premature as it is unknown whether ROCKi was active. While asking for ROCK1/2 KO cells would be too much, this claim is far-fetched.

      Figure 3

      B: In the legend, it is not clear what the grey and light red colours mark.

      E-F: This reviewer finds it difficult to believe that p-ERM and TBXA2R signal intensities at the cell cortex could be reliably quantified using IHC images. The representative samples would indicate that p-ERM and TBXA2R positivity are not correlated. It would be crucial to show examples for each of the TNBC subgroups whose existence is inferred from p-ERM and TBXA2R staining. The conclusion that "no TNBC samples exhibited high TBXA2R expression and low levels of p-ERMs, further supporting a role for TBXA2R signalling in ERM activation in TNBC" is an overstatement.

      Figure 4

      The authors wrote that "We focused on the Hs578T cell line, which showed a median level of TBXA2R mRNA expression among the six TNBC cell lines tested". I do not understand the rationale for this: anti-TBXA2R antibodies that detect endogenous TBXA2R are available, so why not use the median protein levels?

      Figure 5

      Effects of the knockouts are subtle, and rescue experiments would be needed to corroborate these results. The employed statistical analysis is prone to overestimating differences. The authors should use the superplots instead. The authors might also decide to use other TNBC cell lines to explore the functional relevance of this pathway in BC progression. This is particularly important because Hs578T are poorly tumorigenic, and they often do not form palpable tumours in mice.

      Figure 6

      The fact that Hs578T cells are poorly tumorigenic in mice is likely the reason why the authors used the experimental metastasis model. However, it is puzzling that metastases were studied in the liver but not in the lungs. Furthermore, the whole approach is rather artefactual as the TBXA2R agonist was administered for the entire duration of these experiments. What is the pathological relevance of such a study? Including a spontaneous metastasis model or alternative TNBC lines that mimic human disease more closely would help strengthen the functional relevance of this pathway in BC progression and the study's translational relevance.

      Figure S2

      B-M: the pERM signal appears to be perinuclear in some of the tested cell lines. Please use a PM marker.

      Figure S3

      The authors should use the superplots to analyse the cell migration data.

      Discussion

      The claim that "our findings demonstrated that kinases of the SLK family are the only kinases needed for ERM activation by TBXA2R" should be toned down, as only 2 cell lines were tested. In this section, the authors should also discuss the proposed pro-metastatic functions of TXA2 and TXA2R in more detail, including vascular permeability. The sweeping conclusion that "TBXA2R expression correlates with phosphorylation and activation of ERMs in TNBC patient samples" clashes with the authors' own results; please stick to the data.

      Concluding remarks

      This study investigates a signaling pathway whereby TBXA2R, through ERM activation, enhances the migratory and invasive potential of TNBC cells. However, several improvements are needed to support the main claims. The dependence on a single TNBC cell line, reliance on pharmacological inhibitors with potential off-target effects, and limited in vivo relevance detract from the generalizability of the findings. Additional TNBC models, adequate controls, and a broader focus on natural metastasis patterns would make the conclusions more compelling. Certain overstated claims need to be moderated to align the interpretations with the actual data.

      Cross-commenting

      I found comments in the other reviewers' reports that align with my criticisms on the mouse experiments as well as with those pertaining to the tissue culture work.

      Significance

      General comments

      The manuscript investigates the role of TBXA2R in the regulation of ERM in the context of TNBC metastasis. Much of this TBXA2R signalling axis is already known, as is the fact that SLK and LOK can phosphorylate ERM in other cell systems. Similarly, the positive role of ERM in cell migration/invasion and cancer progression has long been reported. The somewhat unexpected finding that ERM phosphorylation is independent of ROCK is not fully convincing. The BC-related part is problematic as the continuous administration of a TBXA2R agonist is required for key tumour metrics to show some differences in vivo. This calls into question the main conclusion of the work, namely that the TBXA2R/ERM-dependent pathway is activated during BC progression in TNBC cells.

      Audience

      Specialists interested in GPCRs and signal transduction or in the cytoskeleton.

      Expertise

      Rev: cancer cell biology, signal transduction, cytoskeleton, actin biochemistry, multiplexed imaging, mouse model of human diseases.

      Co-rev: nanoparticles, cell biology.

    3. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #4

      Evidence, reproducibility and clarity

      Overall, the authors show an interesting and conclusive work on the activation of ERM proteins upon TBXA2R signaling. The use of the ebBRET biosensor to assess ERM-protein activation enables elegant investigation of activation modalities. The Thromboxane A2 analogue U46619 robustly shows activation of ERM proteins in ebBRET assays as well as an increase in ERM-protein phosphorylation status. The functional effects of this signaling pathway are shown convincingly for moesin, where moesin mediates a TBXA2R-mediated increase in cell motility, invasion and metastasis of triple-negative breast cancer Hs578 cells in vitro and in vivo. Nonetheless, some points need to be clarified.

      Significance

      Comment 1: In the title, the authors state that ERM activation via TBXA2R controls invasion and motility of triple-negative breast cancer cells. In the manuscript, there is only data supporting this assumption for moesin (MSN). Therefore, the authors need to change the title accordingly or provide additional experiments for the other two ERM proteins, radixin and ezrin. Throughout the experiments, the p-ERM antibody is used to measure ERM-protein activation. Since the effects on invasion and motility observed in Hs578 cells are mainly mediated through moesin, it would be necessary to see, for at least one experiment per cell line (HEK293T, Hs578), the detailed phosphorylation status of ezrin, radixin and moesin separately. As there are specific phospho-detecting antibodies for this purpose, this could be done rather easily. Furthermore, showing a specific increase in phosphorylated moesin would support the functional data shown in Figure 5 and 6. To investigate the functional effect of TBXA2R-mediated activation of ezrin and radixin on cell motility and invasion, similar experiments could be done in e.g. HMC-1-8 breast cancer cells (high ezrin expression) and HCC1187 (high radixin expression).

      Comment 2: Figure 1A, C, D: At 100 nM, the concentration of staurosporine is relatively high for kinase inhibition. It would be informative to see the assay with increasing staurosporine concentrations, e.g. from 1 nM to 50 nM. In general, a concentration of 1-10 nM should be sufficient for kinase inhibition while preventing unspecific effects of the drug.

      Comment 3: The citation for the p-ERM antibody is confusing, as there is only p-Moe used in the cited paper (Roubinet, 2011). There is a p-ERM antibody commercially available (Cell Signaling, Phospho-ezrin (Thr567)/radixin (Thr564)/moesin (Thr558) Antibody #3141). Could you clarify which antibody you are using?

      Comment 4: From the inhibitor experiments using C3 transferase toxin (Figure 2), the authors conclude that RhoA plays a role in TBXA2R-mediated ERM activation. As mentioned in the manufacturer's description, C3 toxin inhibits RhoA, RhoB and RhoC. Therefore, it would be necessary to repeat those experiments under RhoA knockdown conditions (e.g. using an siRNA-based approach) to state that specifically RhoA is involved.

      Comment 5: To assess whether the findings in Figure 5 and 6 are due to the higher moesin expression in Hs578 cells or are linked to a specific function of moesin, a re-expression experiment would be informative. To achieve this, the 2D and 3D migration experiments could be redone after re-expression of moesin, ezrin and radixin separately in moesin knockdown conditions.

      Minor comments:

      • Even though U46619 is a known Thromboxane A2 analogue, including negative and positive controls would strengthen the results. In detail, this could be done by showing a known protein which gets phosphorylated downstream of TBXA2R signaling and a protein which is not affected by this signaling pathway alongside the shown effects on ERM-proteins.
      • Figure 1 J: There are no statistics comparing the conditions of SQ-29548 treated cells in presence/absence of U46619; these should be added.
      • Figure 1 G, H: How was the quantification for cell periphery performed? In detail, how were the thresholds set for cell periphery / not cell periphery?
      • Figure 3 H:
        • The labelling indicating presence of U46619 is missing.
        • Also, what is the rationale behind normalizing MB-453 for 3 cell lines and comparing the BT-549 to MB-157?
      • Suppl. Fig 4 D: Define y-axis better. Absorbance at what wavelength?
      • Define FERM and ERMAD abbreviations in introduction.
    4. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary

      This manuscript investigates the role of the thromboxane A2 receptor (TBXA2R) in activating ERM (ezrin, radixin, and moesin) proteins to promote cell motility and invasion in triple-negative breast cancer (TNBC) cells. Using TBXA2R stimulation and a series of in vitro and in vivo experiments, the authors report that ERM activation is mediated through a TBXA2R signaling pathway involving Gαq/11 and Gα12/13 subunits, RhoA, and SLK/LOK kinases. They propose that this pathway enhances cell migration, invasion, and metastatic potential in TNBC.

      General criticisms

      Experimental design and analyses are adequate, even though certain experiments lack appropriate controls or employ the wrong statistical tests. However, the study primarily relies on a single TNBC cell line and heavy use of overexpression systems and/or small molecule inhibitors, raising concerns about the generalizability and specificity of the findings. Furthermore, several conclusions appear premature and unsupported by the current data. Critical controls and additional validation experiments are necessary to support the claims about the role of TBXA2R in metastasis and to justify the strong mechanistic conclusions drawn.

      Specific criticisms

      Figure 1

      TBXA2R expression should be shown to understand whether different ebBRET signals are dependent on the overexpression levels of TBXA2R.

      E-F: As ERM levels change over time, one would like to understand whether this is due to misloading or whether there is an underlying biological event going on in the stimulated cells. Are total ERM levels really changing over time? Please add a blot for 1-2 housekeeping proteins as loading controls. This is also crucial to clarify the kinetics of ERM activation; such notable intensity variations make quantifications of non-linear WB signals not fully reliable. In F, mean and SD should be plotted.

      G: The authors need to use a PM marker if they want to claim that pERM increases at the cell cortex. TBXA2R localization should also be shown.

      Figure 2

      A: This reviewer cannot see the purported partial inhibition in Ga12/13 KO cells. Are differences between the two KOs significant? Furthermore, there are reports indicating that YM-254890 may not be specific for Gaq. Experiments on double KO cells are needed to assess the possible redundancy between the two Ga subfamilies.

      C-D: It is important to add a positive control for the activity of Y-27632 in these experiments. Please show that a ROCK-dependent effect is inhibited in the treated cells.

      G: The working model is premature as it is unknown whether ROCKi was active. While asking for ROCK1/2 KO cells would be too much, this claim is far-fetched.

      Figure 3

      B: In the legend, it is not clear what the grey and light red colours mark.

      E-F: This reviewer finds it difficult to believe that p-ERM and TBXA2R signal intensities at the cell cortex could be reliably quantified using IHC images. The representative samples would indicate that p-ERM and TBXA2R positivity are not correlated. It would be crucial to show examples for each of the TNBC subgroups whose existence is inferred from p-ERM and TBXA2R staining. The conclusion that "no TNBC samples exhibited high TBXA2R expression and low levels of p-ERMs, further supporting a role for TBXA2R signalling in ERM activation in TNBC" is an overstatement.

      Figure 4

      The authors wrote that "We focused on the Hs578T cell line, which showed a median level of TBXA2R mRNA expression among the six TNBC cell lines tested". I do not understand the rationale for this choice: since anti-TBXA2R antibodies that detect endogenous TBXA2R are available, why not use median protein levels instead?

      Figure 5

      Effects of the knockouts are subtle, and rescue experiments would be needed to corroborate these results. The employed statistical analysis is prone to overestimating differences. The authors should use the superplots instead. The authors might also decide to use other TNBC cell lines to explore the functional relevance of this pathway in BC progression. This is particularly important because Hs578T are poorly tumorigenic, and they often do not form palpable tumours in mice.
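The concern about overestimated differences can be illustrated with a minimal sketch of the replicate-level analysis that superplots encourage. All values and names below (`control`, `knockout`, the replicate labels) are hypothetical and are not taken from the manuscript; the sketch assumes three paired biological replicates and computes the test statistic on replicate means rather than on pooled single-cell measurements.

```python
import math
from statistics import mean, stdev


def replicate_means(measurements):
    """Collapse per-cell values to one mean per biological replicate."""
    return {rep: mean(values) for rep, values in measurements.items()}


def paired_t_statistic(means_a, means_b):
    """Paired t statistic across replicates (n = number of replicates, not cells)."""
    diffs = [means_a[rep] - means_b[rep] for rep in sorted(means_a)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical per-cell migration values, grouped by biological replicate.
control = {"rep1": [10.0, 12.0, 11.0], "rep2": [14.0, 15.0, 16.0], "rep3": [9.0, 10.0, 11.0]}
knockout = {"rep1": [8.0, 9.0, 10.0], "rep2": [11.0, 12.0, 13.0], "rep3": [8.0, 9.0, 10.0]}

control_means = replicate_means(control)
knockout_means = replicate_means(knockout)
t_stat = paired_t_statistic(control_means, knockout_means)
```

In a superplot, the individual cells would still be drawn (colour-coded by replicate), but the statistics rest on the three replicate means, which avoids treating cells from the same experiment as independent samples.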

      Figure 6

      The fact that Hs578T are poorly tumorigenic in mice is likely the reason why the authors used the experimental metastasis model. However, it is puzzling that metastases were studied in the liver but not in the lungs. Furthermore, the whole approach is rather artefactual as the TBXA2R agonist was administered for the entire duration of these experiments. What is the pathological relevance of such a study? Including a spontaneous metastasis model or alternative TNBC lines that mimic human disease more closely would help strengthen the functional relevance of this pathway in BC progression and the study's translational relevance.

      Figure S2

      B-M: the pERM signal appears to be perinuclear in some of the tested cell lines. Please use a PM marker.

      Figure S3

      The authors should use the superplots to analyse the cell migration data.

      Discussion

      The claim that "our findings demonstrated that kinases of the SLK family are the only kinases needed for ERM activation by TBXA2R" should be toned down as only 2 cell lines were tested. In this section, the authors should also discuss the proposed pro-metastatic functions of TXA2 and TXA2R in more detail, including vascular permeability. The sweeping conclusion that "TBXA2R expression correlates with phosphorylation and activation of ERMs in TNBC patient samples" clashes with the authors' own results; please stick to the data.

      Concluding remarks

      This study investigates a signaling pathway whereby TBXA2R, through ERM activation, enhances the migratory and invasive potential of TNBC cells. However, several improvements are needed to support the main claims. The dependence on a single TNBC cell line, reliance on pharmacological inhibitors with potential off-target effects, and limited in vivo relevance detract from the generalizability of the findings. Additional TNBC models, adequate controls, and a broader focus on natural metastasis patterns would make the conclusions more compelling. Certain overstated claims should be moderated to align the interpretations with the actual data.

      Cross-commenting

      I found comments in the other reviewers' reports that align with my criticisms on the mouse experiments as well as with those pertaining to the tissue culture work.

      Significance

      General comments

      The manuscript investigates the role of TBXA2R in the regulation of ERM in the context of TNBC metastasis. Much of this TBXA2R signalling axis is already known, as is the fact that SLK and LOK can phosphorylate ERM in other cell systems. Similarly, the positive role of ERM in cell migration/invasion and cancer progression has long been reported. The somewhat unexpected finding that ERM phosphorylation is independent of ROCK remains not fully convincing. The BC-related part is problematic as the continuous administration of a TBXA2R agonist is required for key tumour metrics to show some differences in vivo. This calls into question the main conclusion of the work, namely that the TBXA2R/ERM-dependent pathway is activated during BC progression in TNBC cells.

      Audience

      Specialists interested in GPCRs and signal transduction or in the cytoskeleton.

      Expertise

      Rev: cancer cell biology, signal transduction, cytoskeleton, actin biochemistry, multiplexed imaging, mouse model of human diseases.

      Co-rev: nanoparticles, cell biology.

    1. Reviewer #2 (Public review):

      Summary:

      This work examines an important question in the planning and control of reaching movements - where do biases in our reaching movements arise and what might this tell us about the planning process? They compare several different computational models to explain the results from a range of experiments including those within the literature. Overall, they highlight that motor biases are primarily caused by errors in the transformation between eye and hand reference frames. One strength of the paper is the large number of participants studied across many experiments. However, one weakness is that most of the experiments follow a very similar planar reaching design - with slicing movements through targets rather than stopping within a target. This is partially addressed with Exp 4. This work provides a valuable insight into the biases that govern reaching movements. While the evidence is solid for planar reaching movements, further support in the form of 3D reaching movements would help strengthen the findings.

      Strengths:

      The work uses a large number of participants both with studies in the laboratory which can be controlled well and a huge number of participants via online studies. In addition, they use a large number of reaching directions allowing careful comparison across models. Together these allow a clear comparison between models which is much stronger than would usually be performed.

    2. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Wang et al. studied an old, still unresolved problem: Why are reaching movements often biased? Using data from a set of new experiments and from earlier studies, they identified how the bias in reach direction varies with movement direction, and how this depends on factors such as the hand used, the presence of visual feedback, the size and location of the workspace, the visibility of the start position and implicit sensorimotor adaptation. They then examined whether a visual bias, a proprioceptive bias, a bias in the transformation from visual to proprioceptive coordinates and/or biomechanical factors could explain the observed patterns of biases. The authors conclude that biases are best explained by a combination of transformation and visual biases.

      A strength of this study is that it used a wide range of experimental conditions with also a high resolution of movement directions and large numbers of participants, which produced a much more complete picture of the factors determining movement biases than previous studies did. The study used an original, powerful, and elegant method to distinguish between the various possible origins of motor bias, based on the number of peaks in the motor bias plotted as a function of movement direction. The biomechanical explanation of motor biases could not be tested in this way, but this explanation was excluded in a different way using data on implicit sensorimotor adaptation. This was also an elegant method as it allowed the authors to test biomechanical explanations without the need to commit to a certain biomechanical cost function.

      We thank the reviewer for their enthusiastic comments.

      (1) The main weakness of the study is that it rests on the assumption that the number of peaks in the bias function is indicative of the origin of the bias. Specifically, it is assumed that a proprioceptive bias leads to a single peak, a transformation bias to two peaks, and a visual bias to four peaks, but these assumptions are not well substantiated. Especially the assumption that a transformation bias leads to two peaks is questionable. It is motivated by the fact that biases found when participants matched the position of their unseen hand with a visual target are consistent with this pattern. However, it is unclear why that task would measure only the effect of transformation biases, and not also the effects of visual and proprioceptive biases in the sensed target and hand locations. Moreover, it is not explained why a transformation bias would lead to this specific bias pattern in the first place.

      We would like to clarify two things.

      First, the measurements of the transformation bias are not entirely independent of proprioceptive and visual biases. Specifically, we define transformation bias as the misalignment between the internal representation of a visual target and the corresponding hand position. By this definition, the transformation error entails both visual and proprioceptive biases (see Author response image 1). Transformation biases have been empirically quantified in numerous studies using matching tasks, where participants either aligned their unseen hand to a visual target (Wang et al., 2021) or aligned a visual target to their unseen hand (Wilson et al., 2010). Indeed, those tasks are always considered to measure proprioceptive biases, on the assumption that visual bias is small given the minimal visual uncertainty.

      Author response image 1.

      Second, the critical difference between models is in how these biases influence motor planning rather than how those biases are measured. In the Proprioceptive bias model, a movement is planned in visual space. The system perceives the starting hand position in proprioceptive space and transforms this into visual space (Vindras & Viviani, 1998; Vindras et al., 2005). As such, bias only affects the perceived starting position; there is no influence on the perceived target location (no visual bias).

      In contrast, the Transformation bias model proposes that while both the starting and target positions are perceived in visual space, movement is planned in proprioceptive space. Consequently, both positions must be transformed from visual space to proprioceptive coordinates before movement planning (i.e., where is my sensed hand and where do I want it to be). Under this framework, biases can emerge from both the start and target positions. This is how the transformation model leads to different predictions compared to the perceptual models, even if the bias is based on the same measurements.

      We now highlight the differences between the Transformation bias model and the Proprioceptive bias model explicitly in the Results section (Lines 192-200):

      “Note that the Proprioceptive Bias model and the Transformation Bias model tap into the same visuo-proprioceptive error map. The key difference between the two models arises in how this error influences motor planning. For the Proprioceptive Bias model, planning is assumed to occur in visual space. As such, the perceived position of the hand (based on proprioception) is transformed into the visual space. This will introduce a bias in the representation of the start position. In contrast, the Transformation Bias model assumes that the visually-based representations of the start and target positions need to be transformed into proprioceptive space for motor planning. As such, both positions are biased in the transformation process. In addition to differing in terms of their representation of the target, the error introduced at the start position is in opposite directions due to the direction of the transformation (see fig 1g-h).”

      In terms of the motor bias function across the workspace, the peaks are quantitatively derived from the model simulations. The number of peaks depends on how we formalize each model. Importantly, this is a stable feature of each model, regardless of how the model is parameterized. Thus, the number of peaks provides a useful criterion to evaluate different models.

      Figure 1 g-h illustrates the intuition of how the models generate distinct peak patterns. We have edited the figure caption and now reference this figure when we introduce the bias function for each model.
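As a toy illustration of the peak-counting criterion, the sketch below counts local maxima of a periodic bias function sampled around the full circle of movement directions. The sinusoids used here are hypothetical stand-ins chosen only because they have one, two, and four peaks; they are not the fitted models from the manuscript, and the helper name `count_peaks` is an assumption.

```python
import math


def count_peaks(bias_fn, n_samples=720):
    """Count local maxima of a periodic function sampled over one full circle."""
    values = [bias_fn(2 * math.pi * i / n_samples) for i in range(n_samples)]
    peaks = 0
    for i in range(n_samples):
        prev_value = values[i - 1]  # index -1 wraps around to the last sample
        next_value = values[(i + 1) % n_samples]
        # Strictly above the previous sample and not below the next one,
        # so a flat two-sample maximum is counted exactly once.
        if values[i] > prev_value and values[i] >= next_value:
            peaks += 1
    return peaks


# Hypothetical stand-in bias functions (angle in radians -> signed bias).
def one_peak(theta):
    return math.sin(theta)


def two_peaks(theta):
    return math.sin(2 * theta)


def four_peaks(theta):
    return math.sin(4 * theta)
```

Because the count depends only on the shape of the function and not on its amplitude, rescaling any of these stand-ins leaves the number of peaks unchanged, which mirrors the argument that the peak count is stable under reparameterization.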

      (2) Also, the assumption that a visual bias leads to four peaks is not well substantiated as one of the papers on which the assumption was based (Yousif et al., 2023) found a similar pattern in a purely proprioceptive task.

      What we referred to in the original submission as “visual bias” is not an eye-centric bias, nor is it restricted to the visual system. Rather, it may reflect a domain-general distortion in the representation of position within polar space. We called it a visual bias as it was associated with the perceived location of the visual target in the current task. To avoid confusion, we have opted to move to a more general term and now refer to this as “target bias.”

      We clarify the nature of this bias when introducing the model in the Results section (Lines 164-169):

      “Since the task permits free viewing without enforced fixation, we assume that participants shift their gaze to the visual target; as such, an eye-centric bias is unlikely. Nonetheless, prior studies have shown a general spatial distortion that biases perceived target locations toward the diagonal axes (Huttenlocher et al., 2004; Kosovicheva & Whitney, 2017). Interestingly, this bias appears to be domain-general, emerging not only for visual targets but also for proprioceptive ones (Yousif et al., 2023). We incorporated this diagonal-axis spatial distortion into a Target Bias model. This model predicts a four-peaked motor bias pattern (Fig 1f).”

      We also added a paragraph in the Discussion to further elaborate on this model (Lines 502-511):

      “What might be the source of the visual bias in the perceived location of the target? In the perception literature, a prominent theory has focused on the role of visual working memory account based on the observation that in delayed response tasks, participants exhibit a bias towards the diagonals when recalling the location of visual stimuli (Huttenlocher et al., 2004; Sheehan & Serences, 2023). Underscoring that the effect is not motoric, this bias is manifest regardless of whether the response is made by an eye movement, pointing movement, or keypress (Kosovicheva & Whitney, 2017). However, this bias is unlikely to be dependent on a visual input as similar diagonal bias is observed when the target is specified proprioceptively via the passive displacement of an unseen hand (Yousif et al., 2023). Moreover, as shown in the present study, a diagonal bias is observed even when the target is continuously visible. Thus, we hypothesize that the bias to perceive the target towards the diagonals reflects a more general distortion in spatial representation rather than being a product of visual working memory.”

      (3) Another weakness is that the study looked at biases in movement direction only, not at biases in movement extent. The models also predict biases in movement extent, so it is a missed opportunity to take these into account to distinguish between the models.

      We thank the reviewer for this suggestion. We have now conducted a new experiment to assess angular and extent biases simultaneously (Figure 4a; Exp. 4; N = 30). Using our KINARM system, participants were instructed to make center-out movements that would terminate (rather than shoot past) at the visual target. No visual feedback was provided throughout the experiment.

      The Transformation Bias model predicts a two-peaked error function in both the angular and extent dimensions (Figure 4c). Strikingly, when we fit the data from the new experiment to both dimensions simultaneously, this model captures the results qualitatively and quantitatively (Figure 4e). In terms of model comparison, it outperformed alternative models (Figure 4g) particularly when augmented with a visual bias component. Together, these results provide strong evidence that a mismatch between visual and proprioceptive space is a key source of motor bias.

      This experiment is now reported within the revised manuscript (Lines 280-301).
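For readers unfamiliar with the two error dimensions, the following minimal sketch shows one common way to compute angular and extent errors from a single reach. The function name, the coordinate convention (2D positions in arbitrary units), and the sign convention (counterclockwise angular error is positive; overshoot is a positive extent error) are assumptions made for illustration and are not taken from the manuscript.

```python
import math


def reach_errors(start, target, endpoint):
    """Return (angular_error_deg, extent_error) for one center-out reach.

    Angular error: signed angle from the start->target vector to the
    start->endpoint vector, counterclockwise positive (assumed convention).
    Extent error: reach distance minus target distance (overshoot positive).
    """
    tx, ty = target[0] - start[0], target[1] - start[1]
    ex, ey = endpoint[0] - start[0], endpoint[1] - start[1]
    cross = tx * ey - ty * ex  # proportional to the sine of the signed angle
    dot = tx * ex + ty * ey    # proportional to the cosine of the signed angle
    angular_error = math.degrees(math.atan2(cross, dot))
    extent_error = math.hypot(ex, ey) - math.hypot(tx, ty)
    return angular_error, extent_error


# Hypothetical reach: target 10 units to the right, hand stops 8 units straight ahead.
angular, extent = reach_errors((0.0, 0.0), (10.0, 0.0), (0.0, 8.0))
```

A "slicing" movement that shoots through the target only constrains the angular term, which is why a task that requires stopping at the target is needed to measure the extent term.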

      Overall, the authors have done a good job mapping out reaching biases in a wide range of conditions, revealing new patterns in one of the most basic tasks, but unambiguously determining the origin of these biases remains difficult, and the evidence for the proposed origins is incomplete. Nevertheless, the study will likely have a substantial impact on the field, as the approach taken is easily applicable to other experimental conditions. As such, the study can spark future research on the origin of reaching biases.

      We thank the reviewer for these summary comments. We believe that the new experiments and analyses do a better job of identifying the origins of motor biases.

      Reviewer #2 (Public Review):

      Summary:

      This work examines an important question in the planning and control of reaching movements - where do biases in our reaching movements arise and what might this tell us about the planning process? They compare several different computational models to explain the results from a range of experiments including those within the literature. Overall, they highlight that motor biases are primarily caused by errors in the transformation between eye and hand reference frames. One strength of the paper is the large number of participants studied across many experiments. However, one weakness is that most of the experiments follow a very similar planar reaching design - with slicing movements through targets rather than stopping within a target. Moreover, there are concerns with the models and the model fitting. This work provides valuable insight into the biases that govern reaching movements, but the current support is incomplete.

      Strengths:

      The work uses a large number of participants both with studies in the laboratory which can be controlled well and a huge number of participants via online studies. In addition, they use a large number of reaching directions allowing careful comparison across models. Together these allow a clear comparison between models which is much stronger than would usually be performed.

      We thank the reviewer for their encouraging comments.

      Weaknesses:

      Although the topic of the paper is very interesting and potentially important, there are several key issues that currently limit the support for the conclusions. In particular I highlight:

      (1) Almost all studies within the paper use the same basic design: slicing movements through a target with the hand moving on a flat planar surface. First, this means that the authors cannot compare the second component of a bias - the error in the extent of a reach, which is often much larger than the error in reaching direction.

      Reviewer 1 made a similar point, noting that we had missed an opportunity to provide a more thorough assessment of reaching biases. As described above, we conducted a new experiment in which participants made pointing movements and were instructed to terminate the movements at the target. These data allow us to analyze errors in both angular and extent dimensions. The transformation bias model successfully predicts angular and extent biases, outperforming the other models at both group and individual levels. We have now included this result as Exp 4 in the manuscript. Please see response to Reviewer 1 Comment 3 for details.

      Second, there are several studies that have examined biases in three-dimensional reaching movements showing important differences to two-dimensional reaching movements (e.g. Soechting and Flanders 1989). It is unclear how well the authors' computational models could explain the biases that are present in these much more common reaching movements.

      This is an interesting issue to consider. We expect the mechanisms identified in our 2D work will generalize to 3D.

      Soechting and Flanders (1989) quantified 3D biases by measuring errors across multiple 2D planes at varying heights (see Author response image 2 for an example from their paper). When their 3D bias data are projected onto a horizontal 2D plane, the direction of the bias looks relatively consistent across different heights even though the absolute value of the bias varies (Author response image 2). For example, the matched hand position is generally leftward and downward of the target. Therefore, the models we have developed and tested in a specific 2D plane are likely to generalize to other 2D planes at different heights.

      Author response image 2.

      However, we think the biases reported by Soechting and Flanders likely reflect transformation biases rather than motor biases. First, the movements in their study were performed very slowly (3–5 seconds), more similar to our proprioceptive matching tasks and much slower than natural reaching movements (<500ms). Given the slow speed, we suspect that motor planning in Soechting and Flanders was likely done in a stepwise, incremental manner (closed loop to some degree). Second, the bias pattern reported in Soechting and Flanders, when projected into 2D space, closely mirrors the leftward transformation errors observed in previous visuo-proprioceptive matching tasks (e.g., Wang et al., 2021).

      In terms of the current manuscript, we think that our new experiment (Exp 4, where we measure angular and radial error) provides strong evidence that the transformation bias model generalizes to more naturalistic pointing movements. As such, we expect these principles will generalize were we to examine movements in three dimensions, an extension we plan to test in future work.

      (2) The model fitting section is under-explained and under-detailed currently. This makes it difficult to accurately assess the current model fitting and its strength to support the conclusions. If my understanding of the methods is correct, then I have several concerns. For example, the manuscript states that the transformation bias model is based on studies mapping out the errors that might arise across the whole workspace in 2D. In contrast, the visual bias model appears to be based on a study that presented targets within a circle (but not tested across the whole workspace). If the visual bias had been measured across the workspace (similar to the transformation bias model), would the model and therefore the conclusions be different?

      We have substantially expanded the Methods section to clarify the modeling procedures (detailed below in section “Recommendations for the Authors”). We also provide annotated code to enable others to easily simulate the models.

      Here we address three points relevant to the reviewer’s concern about whether the models were tested on equal footing, and in particular, concern that the transformation bias model was more informed by prior literature than the visual bias model.

      First, our center-out reaching task used target locations that have been employed in both visual and proprioceptive bias studies, offering reasonably comprehensive coverage of the workspace. For example, for a target to the left of the body’s midline, visual biases tend to be directed diagonally (Kosovicheva & Whitney, 2017), while transformation biases are typically leftward and downward (Wang et al, 2021). In this sense, the models were similarly constrained by prior findings.

      Second, while the qualitative shape of each model was guided by prior empirical findings, no previous data were directly used to quantitatively constrain the models. As such, we believe the models were evaluated on equal footing. No model had more information or, best we can tell, an inherent advantage over the others.

      Third, reassuringly, the fitted transformation bias closely matches empirically observed bias maps reported in prior studies (Fig 2h). The strong correspondence provides convergent validity and supports the putative causal link between transformation biases and motor biases.

      (3) There should be other visual bias models theoretically possible that might fit the experimental data better than this one possible model. Such possibilities also exist for the other models.

      Our initial hypothesis, grounded in prior literature, was that motor biases arise from a combination of proprioceptive and visual biases. This led us to thoroughly explore a range of visual models. We now describe these alternatives below, noting that in the paper, we chose to focus on models that seemed the most viable candidates. (Please also see our response to Reviewer 3, Point 2, on another possible source of visual bias, the oblique effect.)

      Quite a few models have described visual biases in perceiving motion direction or object orientation (e.g., Wei & Stocker, 2015; Patten, Mannion & Clifford, 2017). Orientation perception would be biased towards the Cartesian axis, generating a four-peak function. However, these models failed to account for the motor biases observed in our experiments. This is not surprising given that these models were not designed to capture biases related to a static location.

      We also considered a class of eye-centric models where biases for peripheral locations are measured under fixation. A prominent finding here is that the bias is along the radial axis in which participants overshoot targets when they fixate on the start position during the movement (Beurze et al., 2006; Van Pelt & Medendorp, 2008). Again, this is not consistent with the observed motor biases. For example, participants undershoot rightward targets when we measured the distance bias in Exp 4. Importantly, since most of our tasks involved free viewing in natural settings with no fixation requirements, we considered it unlikely that biases arising from peripheral viewing play a major role.

      We note, though, that in our new experiment (Exp 4), participants observed the visual stimuli from a fixed angle in the KinArm setup (see Figure 4a). This setup has been shown to induce depth-related visual biases (Figure 4b, e.g., Volcic et al., 2013; Hibbard & Bradshaw, 2003). For this reason, we implemented a model incorporating this depth bias as part of our analyses of these data. While this model performed significantly worse than the transformation bias model alone, a mixed model that combined the depth bias and transformation bias provided the best overall fit. We now include this result in the main text (Lines 286-294).

      We also note that the “visual bias” we referred to in the original submission is not restricted to the visual system. A similar bias pattern has been observed when the target is presented visually or proprioceptively (Kosovicheva & Whitney, 2017; Yousif, Forrence, & McDougle, 2023). As such, it may reflect a domain-general distortion in the representation of position within polar space. Accordingly, in the revision, we now refer to this in a more general way, using the term “target bias.” We justify this nomenclature when introducing the model in the Results section (Lines 164-169). Please also see Reviewer 1 comment 2.

      We recognize that future work may uncover a better visual model or provide a more fine-grained account of visual biases (or biases from other sources). With our open-source simulation code, such biases can be readily incorporated—either to test them against existing models or to combine them with our current framework to assess their contribution to motor biases. Given our explorations, we expect our core finding will hold: Namely, that a combination of transformation and target biases offers the most parsimonious account, with the bias associated with the transformation process explaining the majority of the observed motor bias in visually guided movements.

      Given the comments from the reviewer, we expanded the Discussion section to address the issue of alternative models of visual bias (lines 522-529):

      “Other forms of visual bias may influence movement. Depth perception biases could contribute to biases in movement extent (Beurze et al., 2006; Van Pelt & Medendorp, 2008). Visual biases towards the principal axes have been reported when participants are asked to report the direction of moving targets or the orientation of an object (Patten et al., 2017; Wei & Stocker, 2015). However, the predicted patterns of reach biases do not match the observed biases in the current experiments. We also considered a class of eye-centric models in which participants overestimate the radial distance to a target while maintaining central fixation (Beurze et al., 2006; Van Pelt & Medendorp, 2008). At odds with this hypothesis, participants undershot rightward targets when we measured the radial bias in Exp 4. The absence of these other distortions of visual space may be accounted for by the fact that we allowed free viewing during the task.”

      (4) Although the authors do mention that the evidence against biomechanical contributions to the bias is fairly weak in the current manuscript, this needs to be further supported. Importantly both proprioceptive models of the bias are purely kinematic and appear to ignore the dynamics completely. One imagines that there is a perceived vector error in Cartesian space whereas the other imagines an error in joint coordinates. These simply result in identical movements which are offset either with a vector or an angle. However, we know that the motor plan is converted into muscle activation patterns which are sent to the muscles, that is, the motor plan is converted into an approximation of joint torques. Joint torques sent to the muscles from a different starting location would not produce an offset in the trajectory as detailed in Figure S1, instead, the movements would curve in complex patterns away from the original plan due to the non-linearity of the musculoskeletal system. In theory, this could also bias some of the other predictions as well. The authors should consider how the biomechanical plant would influence the measured biases.

      We thank the reviewer for encouraging us to pursue this topic and to formalize a biomechanical model. In response, we have implemented a state-of-the-art biomechanical framework, MotorNet (https://elifesciences.org/articles/88591), which simulates a six-muscle, two-skeleton planar arm model using recurrent neural networks (RNNs) to generate control policies (see Figure 6a). This model captures key predictions about movement curvature arising from biomechanical constraints. We view it as a strong candidate for illustrating how motor bias patterns could be shaped by the mechanical properties of the upper limb.

      Interestingly, the biomechanical model did not qualitatively or quantitatively reproduce the pattern of motor biases observed in our data. Specifically, we trained 50 independent agents (RNNs) to perform random point-to-point reaching movements across the workspace used in our task. We used a loss function that minimized the distance between the fingertip and the target over the entire trajectory. When tested on a center-out reaching task, the model produced a four-peaked motor bias pattern (Figure 6b), in contrast to the two-peaked function observed empirically. These results suggest that upper limb biomechanical constraints are unlikely to be a primary driver of motor biases in reaching. This holds true even though the reported bias is read out at 60% of the reaching distance, where biomechanical influences on the curvature of movement are maximal. We have added this analysis to the results (lines 367-373).

      It may seem counterintuitive that biomechanics plays a limited role in motor planning. This could be due to several factors. First, task demands (such as the need to grasp objects) may lead the biomechanical system to be inherently organized to minimize endpoint errors (Hu et al., 2012; Trumbower et al., 2009). Second, through development and experience, the nervous system may have adapted to these biomechanical influences, detecting and compensating for them over time (Chiel et al., 2009).

      That said, biomechanical constraints may make a larger contribution in other contexts; for example, when movements involve more extreme angles or span larger distances, or in individuals with certain musculoskeletal impairments (e.g., osteoarthritis) where physical limitations are more likely to come into play. We address this issue in the revised discussion.

      “Nonetheless, the current study does not rule out the possibility that biomechanical factors may influence motor biases in other contexts. Biomechanical constraints may have had limited influence in our experiments due to the relatively modest movement amplitudes used and minimal interaction torques involved. Moreover, while we have focused on biases that manifest at the movement endpoint, biomechanical constraints might introduce biases that are manifest in the movement trajectories (Alexander, 1997; Nishii & Taniai, 2009). Future studies are needed to examine the influence of context on reaching biases.”

      Reviewer #3 (Public review):

      The authors make use of a large dataset of reaches from several studies run in their lab to try to identify the source of direction-dependent radial reaching errors. While this has been investigated by numerous labs in the past, this is the first study where the sample is large enough to reliably characterize isometries associated with these radial reaches to identify possible sources of errors.

      (1) The sample size is impressive, but the authors should include confidence intervals and ideally, the distribution of responses across individuals along with average performance across targets. It is unclear whether the observed “averaged function” is consistently found across individuals, or if it is mainly driven by a subset of participants exhibiting large deviations for diagonal movements. Providing individual-level data or response distributions would be valuable for assessing the ubiquity of the observed bias patterns and ruling out the possibility that different subgroups are driving the peaks and troughs. It is possible that the Transformation or some other model (see below) could explain the bias function for a substantial portion of participants, while other participants may have different patterns of biases that can be attributable to alternative sources of error.

      We thank the reviewer for encouraging a closer examination of the individual-level data. We did include standard error when we reported the motor bias function. Given that the error distribution is relatively Gaussian, we opted to not show confidence intervals since they would not provide additional information.

      To examine individual differences, we now report a best-fit model frequency analysis. For Exp 1, we fit each model at the individual level and counted the number of participants that are best predicted by each model. Among the four single source models (Figure 3a), the vast majority of participants are best explained by the transformation bias model (48/56). When incorporating mixture models, the combined transformation + target bias model emerged as the best fit for almost all participants across experiments (50/56). The same pattern holds for Exp 3b, although the frequency analysis is more distributed, likely due to the added noise that comes with online studies.

      We report this new analysis in the Results (see Fig 3, Fig S2). Note that we opted to show some representative individual fits, selecting individuals whose data were best predicted by different models (Fig S2). Given that the number of peaks characterizes each model (independent of the specific parameter values), the two-peaked function exhibited for most participants indicates that the Transformation bias model holds at the individual level and not just at the group level.
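
      The frequency analysis described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the model labels, and every BIC value in the example are hypothetical placeholders, and the only assumption carried over from the response is that each participant is assigned to the model with the smallest BIC.

```python
from collections import Counter


def best_model_counts(bic_by_participant):
    """Count how many participants are best fit by each model.

    bic_by_participant maps a participant id to a dict {model_name: BIC}.
    The best model for a participant is the one with the smallest BIC.
    """
    winners = [min(bics, key=bics.get) for bics in bic_by_participant.values()]
    return Counter(winners)


# Hypothetical BIC values, invented purely to exercise the function.
example_bics = {
    "p01": {"target": 510.0, "proprioceptive": 505.2, "transformation": 480.1},
    "p02": {"target": 498.7, "proprioceptive": 530.4, "transformation": 471.9},
    "p03": {"target": 450.3, "proprioceptive": 476.8, "transformation": 462.0},
}

counts = best_model_counts(example_bics)
for model, n_best in counts.most_common():
    print(f"{model}: best fit for {n_best} participant(s)")
```

      Reporting the full count table, rather than only the winning model, makes it visible when a minority of participants is better described by an alternative model.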

      (2) The different datasets across different experimental settings/target sets consistently show that people show fewer deviations when making cardinal-directed movements compared to movements made along the diagonal when the start position is visible. This reminds me of a phenomenon referred to as the oblique effect: people show greater accuracy for vertical and horizontal stimuli compared to diagonal ones. While the oblique effect has been shown in visual and haptic perceptual tasks (both in the horizontal and vertical planes), there is some evidence that it applies to movement direction. These systematic reach deviations in the current study thus may reflect this epiphenomenon that applies across modalities. That is, estimating the direction of a visual target from a visual start position may be less accurate, and may be more biased toward the horizontal axis, than for targets that are strictly above, below, left, or right of the visual start position. Other movement biases may stem from poorer estimation of diagonal directions and thus reflect more of a perceptual error than a motor one. This would explain why the bias function appears in both the in-lab and on-line studies although the visual targets are at very different locations (different planes, different distances) since the oblique effects arise independent of plane, distance, or size of the stimuli. When the start position is not visible like in the Vindras study, it is possible that this oblique effect is less pronounced; masked by other sources of error that dominate when looking at 2D reach endpoints made from two separate start positions, rather than only directional errors from a single start position. Or perhaps the participants in the Vindras study are too variable and too few (only 10) to detect this rather small direction-dependent bias.

      The potential link between the oblique effect and the observed motor bias is an intriguing idea, one that we had not considered. However, after giving this some thought, we see several arguments against the idea that the oblique effect accounts for the pattern of motor biases.

      First, by the oblique effect, perceptual variability is greater along the diagonal axes compared to the cardinal axes. These differences in perceptual variability have been used to explain biases in visual perception through a Bayesian model under the assumption that the visual system has an expectation that stimuli are more likely to be oriented along the cardinal axes (Wei & Stocker, 2015). Importantly, the model predicts low biases at targets with peak perceptual variability. As such, even though those studies observed that participants showed large variability for stimuli at diagonal orientations, the bias for these stimuli was close to zero. Given we observed a large bias for targets at locations along the diagonal axes, we do not think this visual effect can explain the motor bias function.

      Second, the reviewer suggested that the observed motor bias might be largely explained by visual biases (or what we now refer to as target biases). If this hypothesis is correct, we would anticipate observing a similar bias pattern in tasks that use a similar layout for visual stimuli but do not involve movement. However, this prediction is not supported. For example, Kosovicheva & Whitney (2017) used a position reproduction/judgment task with keypress responses (no reaching). The stimuli were presented in a similar workspace as in our task. Their results showed a four-peaked bias function while our results showed a two-peaked function.

      In summary, we don’t think oblique biases make a significant contribution to our results.

      A bias in estimating visual direction or visual movement vector is a more realistic and relevant source of error than the proposed visual bias model. The Visual Bias model is based on data from a study by Huttenlocher et al where participants “point” to indicate the remembered location of a small target presented on a large circle. The resulting patterns of errors could therefore be due to localizing a remembered visual target, or due to relative or allocentric cues from the clear contour of the display within which the target was presented, or even movements used to indicate the target. This may explain the observed 4-peak bias function or zig-zag pattern of “averaged” errors, although this pattern may not even exist at the individual level, especially given the small sample size. The visual bias source argument does not seem well-supported, as the data used to derive this pattern likely reflects a combination of other sources of errors or factors that may not be applicable to the current study, where the target is continuously visible and relatively large. Also, any visual bias should be explained by a coordinate system centred on the eye and should vary as a function of the location of visual targets relative to the eyes. Where the visual targets are located relative to the eyes (or at least the head) is not reported.

      Thank you for this question. A few key points to note:

      The visual bias model has also been discussed in studies using a similar setup to our study. Kosovicheva & Whitney (2017) observed a four-peaked function in experiments in which participants report a remembered target position on a circle by either making saccades or using key presses to adjust the position of a dot. However, we agree that this bias may be attenuated in our experiment given that the target is continuously visible. Indeed, the model fitting results suggest the peak of this bias is smaller in our task (~3°) compared to previous work (~10°, Kosovicheva & Whitney, 2017; Yousif, Forrence, & McDougle, 2023).

      We also agree with the reviewer that this “visual bias” is not an eye-centric bias, nor is it restricted to the visual system. A similar bias pattern is observed even if the target is presented proprioceptively (Yousif, Forrence, & McDougle, 2023). As such, this bias may reflect a domain-general distortion in the representation of position within polar space. Accordingly, in the revision, we now refer to this in a more general way, using the term “target bias”, rather than visual bias. We justify this nomenclature when introducing the model in the Results section (Lines 164-169). Please also see Reviewer 1 comment 2 for details.

      Motivated by Reviewer 2, we also examined multiple alternative visual bias models (please refer to our response to Reviewer 2, Point 3).

      The Proprioceptive Bias Model is supposed to reflect errors in the perceived start position. However, in the current study, there is only a single, visible start position, which is not the best design for trying to study the contribution. In fact, my paradigms also use a single, visual start position to minimize the contribution of proprioceptive biases, or at least remove one source of systematic biases. The Vindras study aimed to quantify the effect of start position by using two sets of radial targets from two different, unseen start positions on either side of the body midline. When fitting the 2D reach errors at both the group and individual levels (which showed substantial variability across individuals), the start position predicted most of the 2D errors at the individual level – and substantially more than the target direction. While the authors re-plotted the data to only illustrate angular deviations, they only showed averaged data without confidence intervals across participants. Given the huge variability across their 10 individuals and between the two target sets, it would be more appropriate to plot the performance separately for two target sets and show confidence intervals (or individual data). Likewise, even the VT model predictions should differ across the two target sets since the visual-proprioceptive matching errors from the Wang et al study that the model is based on, are larger for targets on the left side of the body.

      To be clear, in the Transformation bias model, the vector bias at the start position is also an important source of error. The critical difference between the proprioceptive and transformation models is how bias influences motor planning. In the Proprioceptive bias model, movement is planned in visual space. The system perceives the starting hand position in proprioceptive space and transforms this into visual space (Vindras & Viviani, 1998; Vindras et al., 2005). As such, the bias is only relevant in terms of the perceived start position; it does not influence the perceived target location. In contrast, the transformation bias model proposes that while both the starting and target positions are perceived in visual space, movements are planned in proprioceptive space. Consequently, when the start and target positions are visible, both positions must be transformed from visual space to proprioceptive coordinates before movement planning. Thus, bias will influence both the start and target positions. We also note that to set the transformation bias for the start/target position, we referred to studies in which bias is usually referred to as proprioception error measurement. As such, changing the start position has a similar impact on the Transformation and the Proprioceptive Bias models in principle, and would not provide a stronger test to separate them.

      We now highlight the differences between the models in the Results section, making clear that the bias at the start position influences both the Proprioceptive bias and Transformation bias models (Lines 192-200).

      “Note that the Proprioceptive Bias model and the Transformation Bias model tap into the same visuo-proprioceptive error map. The key difference between the two models arises in how this error influences motor planning. For the Proprioceptive Bias model, planning is assumed to occur in visual space. As such, the perceived position of the hand (based on proprioception) is transformed into visual space. This will introduce a bias in the representation of the start position. In contrast, the Transformation Bias model assumes that the visually-based representations of the start and target positions need to be transformed into proprioceptive space for motor planning. As such, both positions are biased in the transformation process. In addition to differing in terms of their representation of the target, the error introduced at the start position is in opposite directions due to the direction of the transformation (see fig 1g-h).”

      In terms of fitting individual data, we have conducted a new experiment, reported as Exp 4 in the revised manuscript (details in our response to Reviewer 1, comment 3). The experiment has a larger sample size (n=30) and importantly, examined error for both movement angle and movement distance. We chose to examine the individual differences in 2-D biases using this sample rather than Vindras’ data as our experiment has greater spatial resolution and more participants. At both the group and individual level, the Transformation bias model is the best single source model, and the Transformation + Target Bias model is the best combined model. These results strongly support the idea that the transformation bias is the main source of the motor bias.

      As for the different initial positions in Vindras et al (2005), the two target sets have very similar patterns of motor biases. As such, we opted to average them to decrease noise. Notably, the transformation model also predicts that altering the start location should have limited impact on motor bias patterns: What matters for the model is the relative difference between the transformation biases at the start and target positions rather than the absolute bias.

      Author response image 3.

      I am also having trouble fully understanding the V-T model and its associated equations, and whether visual-proprioception matching data is a suitable proxy for estimating the visuomotor transformation. I would be interested to first see the individual distributions of errors and a response to my concerns about the Proprioceptive Bias and Visual Bias models.

      We apologize for the lack of clarity on this model. To generate the T+V (Now Transformation + Target bias, or TR+TG) model, we assume the system misperceives the target position (Target bias, see Fig S5a) and then transforms the start and misperceived target positions into proprioceptive space (Fig S5b). The system then generates a motor plan in proprioceptive space; this plan will result in the observed motor bias (Fig. S5c). We now include this figure as Fig S5 and hope that it makes the model features salient.

      Regarding whether the visuo-proprioceptive matching task is a valid proxy for transformation bias, we refer the reviewer to the comments made by Public Reviewer 1, comment 1. We define the transformation bias as the discrepancy between corresponding positions in visual and proprioceptive space. This can be measured using matching tasks in which participants either aligned their unseen hand to a visual target (Wang et al., 2021) or aligned a visual target to their unseen hand (Wilson et al., 2010).

      Nonetheless, when fitting the model to the motor bias data, we did not directly impose the visual-proprioceptive matching data. Instead, we used the shape of the transformation biases as a constraint, while allowing the exact magnitude and direction to be free parameters (e.g., a leftward and downward bias scaled by distance from the right shoulder). Reassuringly, the fitted transformation biases closely matched the magnitudes reported in prior studies (Fig. 2h, 1e), providing strong quantitative support for the hypothesized causal link between transformation and motor biases.
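
      The logic of the transformation bias model, as described in this response, can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the function names, the shoulder location, the gain, the use of metres, and the exact form of the bias field (a leftward and downward shift scaled by distance from the shoulder) are hypothetical stand-ins rather than the fitted model. The sketch only shows the general idea that both the start and target positions are shifted before the movement vector is planned, so that only the difference between the two shifts affects the movement direction.

```python
import math


def transformation_bias(point, shoulder=(0.3, -0.3), gain=0.1):
    """Hypothetical bias vector at a workspace location (metres).

    The shift is leftward and downward, scaled by distance from the
    (assumed) right-shoulder position. All values are placeholders.
    """
    dist = math.hypot(point[0] - shoulder[0], point[1] - shoulder[1])
    return (-gain * dist, -gain * dist)


def motor_bias_deg(start, target, bias_fn):
    """Angular difference between the planned and the ideal movement.

    Both start and target are shifted by bias_fn before planning.
    Positive values are counterclockwise; output is wrapped to [-180, 180).
    """
    bias_start = bias_fn(start)
    bias_target = bias_fn(target)
    planned = (
        (target[0] + bias_target[0]) - (start[0] + bias_start[0]),
        (target[1] + bias_target[1]) - (start[1] + bias_start[1]),
    )
    ideal = (target[0] - start[0], target[1] - start[1])
    diff = math.degrees(
        math.atan2(planned[1], planned[0]) - math.atan2(ideal[1], ideal[0])
    )
    return (diff + 180.0) % 360.0 - 180.0


start = (0.0, 0.0)
target = (0.1, 0.0)  # rightward target, 10 cm from the start

bias_varying = motor_bias_deg(start, target, transformation_bias)


def shifted_bias(point):
    """Same field plus a constant offset applied everywhere."""
    bx, by = transformation_bias(point)
    return (bx + 0.05, by - 0.01)


bias_shifted = motor_bias_deg(start, target, shifted_bias)
bias_constant = motor_bias_deg(start, target, lambda p: (-0.02, -0.03))
```

      In this sketch a spatially uniform shift produces no angular bias, and adding a constant offset to a spatially varying field leaves the angular bias unchanged, which is the sense in which only the relative difference between the biases at the start and target positions matters.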

      Recommendations for the authors:

      Overall, the reviewers agreed this is an interesting study with an original and strong approach. Nonetheless, there were three main weaknesses identified. First, is the focus on bias in reach direction and not reach extent. Second, the models were fit to average data and not individual data. Lastly, and most importantly, the model development and assumptions are not well substantiated. Addressing these points would help improve the eLife assessment.

      Reviewer #1 (Recommendations for the authors):

      It is mentioned that the main difference between Experiments 1 and 3 is that in Experiment 3, the workspace was smaller and closer to the shoulder. Was the location of the laptop relative to the participant in Experiment 3 known by the authors? If so, variations in this location across participants can be used to test whether the Transformation bias was indeed larger for participants who had the laptop further from the shoulder.

      Another difference between Experiments 1 and 3 is that in Experiment 1, the display was oriented horizontally, whereas it was vertical in Experiment 3. To what extent can that have led to the different results in these experiments?

      This is an interesting point that we had not considered. Unfortunately, for the online work we do not record the participants’ posture.

      Regarding the influence of display orientation (horizontal vs. vertical), Author response image 4 presents three relevant data points: (1) Vandevoorde and Orban de Xivry (2019), who measured motor biases in-person across nine target positions using a tablet and vertical screen; (2) Our Experiment 1b, conducted online with a vertical setup; (3) Our in-person Experiment 3b, using a horizontal monitor. For consistency, we focus on the baseline conditions with feedback, the only condition reported in Vandevoorde. Motor biases from the two in-person studies were similar despite differing monitor orientations: Both exhibited two-peaked functions with comparable peak locations. We note that the bias attenuation in Vandevoorde may be due to their inclusion of reward-based error signals in addition to cursor feedback. In contrast, compared to the in-person studies, the online study showed reduced bias magnitude with what appears to be a four-peaked function. While more data are needed, these results suggest that the difference in the workspace (more restricted in our online study) may be more relevant than monitor orientation.

      Author response image 4.

      For the joint-based proprioceptive model, the equations used are for an arm moving in a horizontal plane at shoulder height, but the figures suggest the upper arm was more vertical than horizontal. How does that affect the predictions for this model?

      Please also see our response to your public comment 1. When the upper limb (or the lower limb) is not horizontal, it will influence the projection of the upper limb to the 2-D space. Effectively in the joint-based proprioceptive model, this influences the ratio between L1 and L2 (see Author response image 5b below). However, adding a parameter to vary the L1/L2 ratio would not change the set of motor bias functions that can be produced by the model. Importantly, it will still generate a one-peak function. We simulated 50 motor bias functions across the possible parameter space. As shown by Author response image 5c-d, the peak and the magnitude of the motor bias functions are very similar with and without the L1/L2 term. We characterize the bias function with the peak position and the peak-to-valley distance. Based on those two factors, the distribution of the motor bias functions is very similar (Author response image 5e-f). Moreover, the L1/L2 ratio parameter is not recoverable by model fitting (Author response image 5c), suggesting that it is redundant with other parameters. As such we only include the basic version of the joint-based proprioceptive model in our model comparisons.

      Author response image 5.

      It was unclear how the models were fit and how the BIC was computed. It is mentioned that the models were fit to average data across participants, but the BIC values were based on all trials for all participants, which does not seem consistent. And the models are deterministic, so how can a log-likelihood be determined? Since there were inter-individual differences, fitting to average data is not desirable. Take for instance the hypothetical case that some participants have a single peak at 90 deg, and others have a single peak at 270 deg. Averaging their data will then lead to a pattern with two peaks, which would be consistent with an entirely different model.
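
      The hypothetical case raised here can be checked numerically. The sketch below is illustrative only: the bump shape (a von Mises-like profile), the width parameter, and the function names are assumptions chosen for convenience, not a model from the manuscript. It shows that averaging two one-peaked profiles centred at 90 deg and 270 deg yields a two-peaked average.

```python
import math


def single_peak(theta_deg, peak_deg, kappa=4.0):
    """Hypothetical one-peaked profile with maximum 1 at peak_deg."""
    return math.exp(kappa * (math.cos(math.radians(theta_deg - peak_deg)) - 1.0))


def count_peaks(values):
    """Count strict local maxima on a circular grid of samples."""
    n = len(values)
    return sum(
        1
        for i in range(n)
        if values[i] > values[i - 1] and values[i] > values[(i + 1) % n]
    )


angles = list(range(360))  # target directions in 1-degree steps

group_a = [single_peak(a, 90) for a in angles]   # participants peaking at 90 deg
group_b = [single_peak(a, 270) for a in angles]  # participants peaking at 270 deg
average = [(a + b) / 2.0 for a, b in zip(group_a, group_b)]

print(count_peaks(group_a), count_peaks(group_b), count_peaks(average))
```

      Each individual profile has a single peak while the average has two, so the number of peaks in a group-averaged function cannot by itself identify the model that describes individual participants.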

      We thank the reviewer for raising these issues.

      Given the reviewers’ comments, we now report fits at both the group and individual level (see response to reviewer 3 public comment 1). The group-level fitting is for illustration purposes. Model comparison is now based on the individual-level analyses which show that the results are best explained by the transformation model when comparing single source models and best explained by the T+V (now TG+TR) model when considering all models. These new results strongly support the transformation model.

      Log-likelihoods were computed assuming normally distributed motor noise around the motor biases predicted by each model.

      We updated the Methods section as follows (lines 841-853):

      “We used the fminsearchbnd function in MATLAB to minimize the sum of log-likelihood (LL) across all trials for each participant. LL were computed assuming normally distributed noise around each participant’s motor biases:

      [11] LL = normpdf(x, b, c)

      where x is the empirical reaching angle, b is the predicted motor bias by the model, c is motor noise, calculated as the standard deviation of (x − b). For model comparison, we calculated the BIC as follows:

      [12] BIC = -2LL+k∗ln(n)

      where k is the number of parameters of the models. Smaller BIC values correspond to better fits. We report the sum of ΔBIC by subtracting the BIC value of the TR+TG model from all other models.

      For illustrative purposes, we fit each model at the group level, pooling data across all participants to predict the group-averaged bias function.”
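
The likelihood and BIC computations described in the quoted Methods text can be sketched as follows. This is a hedged Python analogue rather than the MATLAB code used in the study. It assumes that LL is the summed log of the normal density (so that a fitting routine would minimize its negative), and that the noise term uses the sample standard deviation, as MATLAB's `std` does by default. The function names are illustrative.

```python
import math
import statistics

def gaussian_log_likelihood(observed, predicted):
    """Summed log-density of the observed angles under N(predicted, c^2)."""
    residuals = [x - b for x, b in zip(observed, predicted)]
    # Assumption: c is the sample SD (n - 1 denominator) of the residuals x - b.
    c = statistics.stdev(residuals)
    return sum(-0.5 * math.log(2.0 * math.pi * c ** 2) - r ** 2 / (2.0 * c ** 2)
               for r in residuals)

def bic(log_likelihood, n_params, n_obs):
    """BIC = -2 * LL + k * ln(n); smaller values indicate a better fit."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

A model whose predictions lie closer to the data yields a higher LL and, for the same number of parameters and trials, a smaller BIC.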

      What was the delay of the visual feedback in Experiment 1?

      The visual delay in our setup was ~30 ms, with the procedure used to estimate this described in detail in Wang et al (2024, Curr. Bio.). We note that in calculating motor biases, we primarily relied on the data from the no-feedback block.

      Minor corrections

      In several places it is mentioned that movements were performed with proximal and distal effectors, but it's unclear where that refers to because all movements were performed with a hand (distal effector).

      By 'proximal and distal effectors,' we were referring to the fact that in the online setup, “reaching movements” are primarily made by finger and/or wrist movements across a trackpad, whereas in the in-person setup, the participants had to use their whole arm to reach about the workspace. To avoid confusion, we now refer to these simply as 'finger' versus 'hand' movements.

      In many figures, Bias is misspelled as Bais.

      Fixed.

      In Figure 3, what is meant by deltaBIC (*1000) etc? Literally, it would mean that the bars show 1,000 times the deltaBIC value, suggesting tiny deltaBIC values, but that's probably not what's meant.

      '×1000' in the original figure indicates the unit scaling, with ΔBIC values ranging from approximately 1000 to 4000. However, given that we now fit the models at the individual level, we have replaced this figure with a new one (Figure 3e) showing the distribution of individual BIC values.

      Reviewer #2 (Recommendations for the authors):

      I have concerns that the authors only examine slicing movements through the target and not movements that stop in the target. Biases create two major errors - errors in direction and errors in magnitude and here the authors have only looked at one of these. Previous work has shown that both can be used to understand the planning processes underlying movement. I assume that all models should also make predictions about the magnitude biases which would also help support or rule out specific models.

      Please see our response to Reviewer 1 public review 3.

      As discussed above, three-dimensional reaching movements also have biases and are not studied in the current manuscript. In such studies, biomechanical factors may play a much larger role.

      Please see our response to your public review.

      It may be that I am unclear on what exactly is done, as the methods and model fitting barely explain the details, but on my reading on the methods I have several major concerns.

      First, it feels that the visual bias model is not as well mapped across space if it only results from one study which is then extrapolated across the workspace. In contrast, the transformation model is actually measured throughout the space to develop the model. I have some concerns about whether this is a fair comparison. There are potentially many other visual bias models that might fit the current experimental results better than the chosen visual bias model.

      Please refer to our response to your public review.

      It is completely unclear to me why a joint-based proprioceptive model would predict curved planned movements and not straight movements (Figure S1). Changes in the shoulder and elbow joint angles could still be controlled to produce a straight movement. On the other hand, as mentioned above, the actual movement is likely much more complex if the physical starting position is offset from the perceived hand.

      Natural movements are often curved, reflecting a drive to minimize energy expenditure or biomechanical constraints (e.g., joint and muscle configuration). This is especially the case when the task emphasizes endpoint precision (Codol et al., 2024), as ours does. Trajectory curvature was also observed in a recent simulation study in which a neural network was trained to control a biomechanical model (2-limb, 6 muscles) with the cost function specified to minimize trajectory error (reach to a target with as straight a movement as possible). Even under these constraints, the movements showed some curvature. To examine whether the endpoint reaching bias somehow reflects the curvature (or bias during reaching), we included the prediction of this new biomechanical model in the paper to show it does not explain the motor bias we observed.

      To be clear, while we implemented several models (the joint-based proprioceptive model and the new biomechanical model) to examine whether motor biases can be explained by movement curvature, our goal in this paper was to identify the source of the endpoint bias. Our modeling results reveal that a previously underappreciated source of motor bias, a transformation error that arises between visual and proprioceptive space, plays a dominant role in shaping motor bias patterns across a wide range of experiments, including naturalistic reaching contexts where vision and hand are aligned at the start position. While the movement curvature might be influenced by selectively manipulating factors that introduce a mismatch between the visual starting position and the actual hand position (such as in Sober and Sabes, 2003), we think investigating this question will be an avenue for future work.

      The model fitting section is barely described. It is unclear how the data is fit or almost any other aspects of the process. How do the authors ensure that they have found the minimum? How many times was the process repeated for each model fit? How were starting parameters randomized? The main output of the model fitting is BIC comparisons across all subjects. However, there are many other ways to compare the models which should be considered in parallel. For example, how well do the models fit individual subjects using BIC comparisons? Or how often are specific models chosen for individual participants? While across all subjects one model may fit best, it might be that individual subjects show much more variability in which model fits their data. Many details are missing from the methods section. Further support beyond the mean BIC should be provided.

      We fit each model 150 times and, for each iteration, the initial value of each parameter was randomly selected from a uniform distribution. The range for each parameter was hand-tuned for each model, with an eye on making sure the values covered a reasonable range. Please see our response to your first minor comment below for the range of all parameters and how we decided the iteration number for each model.
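
The multi-start procedure described above can be sketched as follows. This Python sketch is illustrative only: `compass_search` is a simple bounded, derivative-free local search used here as a stand-in for MATLAB's `fminsearchbnd` (it is not the same algorithm), and the function names, step sizes, and seed are assumptions.

```python
import random

def compass_search(objective, start, bounds, step=0.25, tol=1e-6, max_iter=10000):
    """Bounded derivative-free local search (stand-in for fminsearchbnd)."""
    x = list(start)
    fx = objective(x)
    iterations = 0
    while step > tol and iterations < max_iter:
        improved = False
        for i, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                candidate = list(x)
                # Clip the trial point so it never leaves the allowed range.
                candidate[i] = min(hi, max(lo, x[i] + delta))
                f_candidate = objective(candidate)
                if f_candidate < fx:
                    x, fx, improved = candidate, f_candidate, True
        if not improved:
            step /= 2.0  # shrink the step once no neighbour improves the fit
        iterations += 1
    return x, fx

def multi_start_fit(objective, bounds, n_starts=150, seed=0):
    """Run the local search from uniformly sampled starts; keep the best result."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        start = [rng.uniform(lo, hi) for lo, hi in bounds]
        x, fx = compass_search(objective, start, bounds)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f
```

Here `objective` would be the negative summed log-likelihood of one model for one participant, and `bounds` the hand-tuned range of each parameter. Tracking the best objective value as a function of the number of starts gives one way to judge whether additional starts still improve the fit.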

      Given the reviewers’ comments on individual differences, we now fit the models at the individual level and report a frequency analysis, describing the best-fitting model for each participant. In brief, the data for a vast majority of the participants were best explained by the transformation model when comparing single-source models and by the T+V (TR+TG) model when considering all models. Please see response to reviewer 3 public comment 1 for the updated result.
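
The two individual-level summaries mentioned here, a count of the best-fitting model per participant and a summed ΔBIC relative to a reference model, can be sketched as follows. This is a minimal Python illustration; the data layout (one dictionary of BIC values per participant) and the function names are assumptions, and the numbers in the example are invented.

```python
from collections import Counter

def best_model_counts(bic_by_participant):
    """Count how often each model has the smallest BIC across participants."""
    return Counter(min(bics, key=bics.get) for bics in bic_by_participant)

def summed_delta_bic(bic_by_participant, reference):
    """Sum over participants of BIC(model) - BIC(reference), per model."""
    totals = {}
    for bics in bic_by_participant:
        for name, value in bics.items():
            totals[name] = totals.get(name, 0.0) + (value - bics[reference])
    return totals

# Invented BIC values for three hypothetical participants (illustration only).
example = [
    {"TR+TG": 100.0, "PropV+V": 120.0},
    {"TR+TG": 95.0, "PropV+V": 90.0},
    {"TR+TG": 80.0, "PropV+V": 110.0},
]
counts = best_model_counts(example)
deltas = summed_delta_bic(example, reference="TR+TG")
```

Reporting both quantities guards against a single participant with a very large ΔBIC dominating the summed comparison.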

      We updated the Methods section, and it reads as follows (lines 841-853):

      “We used the fminsearchbnd function in MATLAB to minimize the sum of log-likelihood (LL) across all trials for each participant. LL was computed assuming normally distributed noise around each participant’s motor biases:

      [11] LL = normpdf(x, b, c)

      where x is the empirical reaching angle, b is the predicted motor bias by the model, c is motor noise, calculated as the standard deviation of x-b.

      For model comparison, we calculated the BIC as follows:

      [12] BIC = -2LL+k∗ln(n)

      where k is the number of parameters of the models. Smaller BIC values correspond to better fits. We report the sum of ΔBIC by subtracting the BIC value of the TR+TG model from all other models.”

      Line 305-307. The authors state that biomechanical issues would not predict qualitative changes in the motor bias function in response to visual manipulation of the start position. However, I question this statement. If the start position is offset visually then any integration of the proprioceptive and visual information to determine the start position would contain a difference from the real hand position. A calculation of the required joint torques from such a position sent through the mechanics of the limb would produce biases. These would occur purely because of the combination of the visual bias and the inherent biomechanical dynamics of the limb.

      We thank the reviewer for this comment. We have removed the statement regarding inferences about the biomechanical model based on visual manipulations of the start position. Additionally, we have incorporated a recently proposed biomechanical model into our model comparisons to expand our exploration of sources of bias. Please refer to our response to your public review for details.

      Measurements are made while the participants hold a stylus in their hand. How can the authors be certain that the biases are due to the movement and not due to small changes in the hand posture holding the stylus during movements in the workspace. It would be better if the stylus was fixed in the hand without being held.

      Below, we have included an image of the device used in Exp 1 for reference. The digital pen was fixed in a vertical orientation. At the start of the experiment, the experimenter ensured that the participant had the proper grip alignment and held the pen at the red-marked region. With these constraints, we see minimal change in posture during the task.

      Author response image 6.

      Minor Comments

      Best fit model parameters are not presented. Estimates of the accuracy of these measures would also be useful.

      In the original submission, we included a Table S1 that presented the best-fit parameters for the TR+TG (Previously T+V) model. Table S1 now shows the parameters for the other models (Exp 1b and 3b, only). We note the parameter values from these non-optimal models are hard to interpret given that core predictions are inconsistent with the data (e.g., number of peaks).

      We assume that by "accuracy of these measures," the reviewers are referring to the reliability of the model fits. To assess this, we conducted a parameter recovery analysis in which we simulated a range of model parameters for each model and then attempted to recover them through fitting. Each model was simulated 50 times, with the parameters randomly sampled from distributions used to define the initial fitting parameters. Here, we only present the results for the combined models (TR+TG, PropV+V, and PropJ+V), as the nested models would be even easier to fit.

      As shown in Fig. S4, all parameters were recovered with high accuracy, indicating strong reliability in parameter estimation. Additionally, we examined the log-likelihood as a function of fitting iterations (Fig. S4d). Based on this curve, we determined that 150 iterations were sufficient given that the log-likelihood values were asymptotic at this point. Moreover, in most cases, the model fitting can recover the simulated model, with minimal confusion across the three models (Fig. S4e).
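
The logic of a parameter recovery analysis can be sketched as follows. This Python example is deliberately simplified and is not one of the models in the manuscript: `toy_bias` is a hypothetical one-parameter bias function, the amplitude is recovered by closed-form least squares rather than by the fitting routine described above, and the sampling range, noise level, and seed are assumptions.

```python
import math
import random

def toy_bias(theta_deg, amplitude):
    """Hypothetical one-parameter bias function (not one of the paper's models)."""
    return amplitude * math.sin(math.radians(theta_deg))

def recover_amplitude(thetas, observed):
    """Closed-form least-squares estimate of the amplitude."""
    basis = [math.sin(math.radians(t)) for t in thetas]
    return sum(b * y for b, y in zip(basis, observed)) / sum(b * b for b in basis)

def parameter_recovery(n_sims=50, noise_sd=1.0, seed=0):
    """Simulate data from known parameters, refit, and return both sets."""
    rng = random.Random(seed)
    thetas = list(range(0, 360, 15))
    true_values, recovered = [], []
    for _ in range(n_sims):
        amplitude = rng.uniform(1.0, 10.0)  # sampled like an initial fitting value
        data = [toy_bias(t, amplitude) + rng.gauss(0.0, noise_sd) for t in thetas]
        true_values.append(amplitude)
        recovered.append(recover_amplitude(thetas, data))
    return true_values, recovered

def pearson_r(xs, ys):
    """Pearson correlation between true and recovered parameter values."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

A high correlation between the true and recovered values indicates that the parameter is identifiable from data of this size and noise level; a parameter that trades off against others would show a weak correlation.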

      What are the (*1000) and (*100) in the Change in BIC y-labels? I assume they indicate that the values should be multiplied by these numbers. If these indicate that the BIC is in the hundreds or thousands it would be better to label the axes clearly, as the interpretation is very different (e.g. a BIC difference of 3 is not significant).

      '×1000' in the original figure indicates the unit scaling, with ΔBIC values ranging from approximately 1000 to 4000. However, given that we now fit the models at the individual level, we have replaced this figure with a new one showing the distribution of individual BIC values.

      Lines 249, 312, and 315, and maybe elsewhere - the degree symbol does not display properly.

      Corrected.

      Line 326. The authors mention that participants are unaware of their change in hand angle in response to clamped feedback. However, there may be a difference between sensing for perception and sensing for action. If the participants are unaware in terms of reporting but aware in terms of acting would this cause problems with the interpretation?

      This is an interesting distinction, one that has been widely discussed in the literature. However, it is not clear how to address this in the present context. We have looked at awareness in different ways in prior work with clamped feedback. In general, even when the hand direction might have deviated by >20°, participants report their perceived hand position after the movement as near the target (Tsay et al, 2020). We also have used post-experiment questionnaires to probe whether they thought their movement direction had changed over the course of the experiment (volitionally or otherwise). Again, participants generally insist they moved straight to the target throughout the experiment. So it seems that they are unaware of any change in action or perception.

      Reaction time data provide additional support that participants are unaware of any change in behavior. The RT function remains flat after the introduction of the clamp, unlike the increases typically observed when participants engage in explicit strategy use (Tsay et al, 2024).

      Figure 1h: The caption suggests this is from the Wang 2021 paper. However, in the text 180-182 it suggests this might be the map from the current results. Can the authors clarify?

      Fig 1e is the data from Wang et al, 2021. We formalized an abstract map based on the spatial constraints observed in Fig 1e, and simulated the error at the start and target position based on this abstraction (Fig 1h). We have revised the text to now read (Lines 182-190):

      “Motor biases may thus arise from a transformation error between these coordinate systems. Studies in which participants match a visual stimulus to their unseen hand or vice-versa provide one way to estimate this error (Jones et al., 2009; Rincon-Gonzalez et al., 2011; van Beers et al., 1998; Wang et al., 12/2020). Two key features stand out in these data: First, the direction of the visuo-proprioceptive mismatch is similar across the workspace: For right-handers using their dominant limb, the hand is positioned leftward and downward from each target. Second, the magnitude increases with distance from the body (Fig 1d). Using these two empirical constraints, we simulated a visual-proprioceptive error map (Fig. 1h) by applying a leftward and downward error vector whose magnitude scaled with the distance from each location to a reference point.”
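
The construction of the error map described in the quoted passage can be sketched as follows. This Python example is a hedged illustration, not the implementation used in the manuscript: the coordinate convention (+x rightward, +y away from the body), the reference point, the `gain`, and the 225° direction are all assumed values chosen only to produce a leftward and downward vector whose length grows with distance.

```python
import math

def transformation_error(x, y, ref=(0.0, 0.0), gain=0.05, direction_deg=225.0):
    """Error vector at (x, y): fixed direction, magnitude = gain * distance to ref.

    Assumes +x is rightward and +y is away from the body, so a direction of
    225 degrees points leftward and downward. All defaults are hypothetical.
    """
    distance = math.hypot(x - ref[0], y - ref[1])
    angle = math.radians(direction_deg)
    return (gain * distance * math.cos(angle), gain * distance * math.sin(angle))

def error_map(points, **kwargs):
    """Evaluate the error vector at every workspace location in `points`."""
    return {p: transformation_error(p[0], p[1], **kwargs) for p in points}
```

Evaluating `transformation_error` at the start position and at each target gives the two displacement vectors from which a predicted bias in reach direction could then be derived.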

      Reviewer #3 (Recommendations for the authors):

      The central idea behind the research seems quite promising, and I applaud the efforts put forth. However, I'm not fully convinced that the current model formulations are plausible explanations. While the dataset is impressively large, it does not appear to be optimally designed to address the complex questions the authors aim to tackle. Moreover, the datasets used to formulate the 3 different model predictions are SMALL and exhibit substantial variability across individuals, and based on average (and thus "smoothed") data.

      We hope to have addressed these concerns with the two major changes to the revised manuscript: 1) the new experiment in which we examine biases in both angle and extent, and 2) the inclusion in the analyses of fits based on individual data sets.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Discrepancies with previous findings need clarification, especially regarding the absence of similar behavioral effects in F1. Lack of discussion on the decision to modify paradigms instead of using the same model. Presentation of behavioral data in supplementary materials, with a recommendation to include behavioral quantification in main figures. Absence of quantification for freezing behavior, a crucial measure in fear conditioning.

      We agree, thank you. One of the major revisions we have made to this version of the manuscript is the addition of much more thorough analysis of our F1 behavior. While not captured by the (relatively gross) measure of the approach-avoid index, further analysis has highlighted interesting differences between the F1s of unpaired and paired offspring, and in an odor-specific manner. As these analyses have given rise to many new results and conclusions, we have attempted to adjust the manuscript to reflect the major change that we do, in fact, find effects in F1, if subtle. 

      Classical odor-shock pairing was used in both Dias & Ressler’s and our study to directly expand upon the findings of increase in cell number. This enabled our discovery of biasing of newborn OSNs. For our behavioral readouts, we chose to focus on the ethological behavior of avoidance. From our extensive behavioral analysis (Figures 5 & 6), we successfully identified several behavioral differences in the F1 offspring that had not previously been described.

      Reviewer #2 (Public Review):

      (1) The main weakness is the disconnect between the morphological changes reported and the lack of change in aversion to the odorant in F1 progeny. The authors also do not address the mechanisms underlying the inheritance of the phenotype, which may lie outside of the scope of the present study.

      Thank you for your comments. Our revised manuscript includes both new experiments and new analyses that probe the relationship between a change in cell number and a change in avoidance behavior, and we have revised the manuscript text to address this point directly. In short, we find both in the F0 generation (at extended time points) and in the F1, that an increase in cell number does not always correlate with avoidance behavior. However, we do find nuanced behavioral differences between the offspring of unpaired and paired fathers. Whether the increase in cell number in offspring is necessary to observe the behavioral changes is outside the scope of the current study, but certainly a question we are interested in answering in future work. 

      Reviewer #3 (Public Review):

      (1) In the abstract / summary, the authors raise expectations that are not supported by the data. For example, it is claimed that "increases in F0 were due to biased stem cell receptor choice." While an active field of study that has seen remarkable progress in the past decade, olfactory receptor gene choice and its relevant timing in particular is still unresolved. Here, Liff et al., do not pinpoint at what stage during differentiation the "biased choice" is made. 

      EdU is only taken up by stem cells in the S phase, and differences in EdU-labeled M71 or MOR23 OSNs across fear conditioning groups indicate a biasing in subtype identity. We do not make claims regarding the exact stage of OSN maturation at which biasing may occur; rather, we demonstrate that the stem cells that were dividing during EdU administration are more likely to mature into an M71 OSN if a mouse receives paired acetophenone conditioning compared to unpaired or no conditioning (and similarly with MOR23 and lyral). This phenomenon must involve receptor choice, as that is the mechanism by which OSN subtypes form.

      (2) Similarly, the concluding statement that the study provides "insight into the heritability of acquired phenotypes" is somewhat misleading. The experiments do not address the mechanisms underlying heritability. 

      We do not claim to provide direct insight into the mechanisms underlying heritability. Our experiments do provide insight into the heritability of acquired phenotypes, as we corroborate previous studies showing that this olfactory fear conditioning paradigm induces heritable changes in the nose and in behavior. We also demonstrate odor-specific behavioral differences in the offspring of conditioned fathers, suggesting that the mechanisms underlying the specific behavioral phenotypes may be unique to the conditioning odorant, and not one universal mechanism. These results provide basic knowledge that will accelerate our ability to uncover the mechanisms driving heritable changes.

      (3) The statement that "the percentage of newborn M71 cells is 4-5 times that of MOR23 may simply reflect differences in the birth rates of the two cell populations" should, if true, result in similar differences in the occurrence of mature OSNs with either receptor identity. According to Fig. 1H & J, however, this is not the case. 

      We have removed that statement from the manuscript, as subtype-specific differences in proliferation rates are not the focus of this study and we do not wish to make claims about it based on our EdU experiments. We do not compare our iDISCO cell density counts to EdU co-labeling counts nor ratio counts, as differences between M71 and MOR23 quantification in cleared tissue versus EdU uptake may simply reflect the inherent differences between methodologies. Our claims are solely within M71 cohorts and MOR23 cohorts. 

      (4) An important result is that Liff et al., in contrast to results from other studies, "do not observe the inheritance of odor-evoked aversion to the conditioned odor in the F1 generation." This discrepancy needs to be discussed. 

      This is discussed in the manuscript, and we report behavioral differences revealed by additional analyses. 

      (5) The authors speculate that "the increase in neurons responsive to the conditioned odor could enhance the sensitivity to, or the discrimination of, the paired odor in F0 and F1. This would enable the F1 population to learn that odor predicts shock with fewer training cycles or less odorant when trained with the conditioned odor." This is a fascinating idea that, in fact, could have been readily tested by Liff and coworkers. If this hypothesis were found true, this would substantially enhance the impact of the study for the field.

      We agree that additional F1 behavioral paradigms are a major next step to understand the functional behavioral differences that may emerge from an increase in specific OSN subtype. Due to the nontrivial amount of time and effort it requires to generate F1 offspring (on the order of many months), and because we do not test individual offspring in multiple behavioral assays (such that they are naïve to their father’s conditioning odor), these experiments are outside the scope of this current study. 

      Reviewer #1 (Recommendations For The Authors):

      (1) Considering that the authors are expanding upon the previous findings of Dias and Ressler (2014), it is crucial to clarify the discrepancies in the results between both works in the discussion. While I acknowledge the use of a different experimental design by the authors, if the premise assumes there is a universal mechanism for transgenerational acquired modification it prompts the question: Why don't we observe similar behavioral effects in F1 in the present model? This issue needs extensive discussion in the manuscript to advance the field's understanding of this topic. Additionally, I am also curious about the author's decision to modify the paradigms instead of using exactly the same model to further extend their findings on stem cells, for example. Could you please provide comments on this choice and elaborate on this aspect in the discussion? 

      We agree, thank you. One of the major revisions we have made to this version of the manuscript is the addition of much more thorough analysis of our F1 behavior. While not captured by the (relatively gross) measure of the approach-avoid index, further analysis has highlighted interesting differences between the F1s of unpaired and paired offspring, and in an odor-specific manner. As these analyses have given rise to many new results and conclusions, we have attempted to adjust the manuscript to reflect the major change that we do, in fact, find effects in F1, if subtle. 

      Classical odor-shock pairing was used in both Dias & Ressler’s and our study to directly expand upon the findings of increase in cell number. This enabled our discovery of biasing of newborn OSNs. For our behavioral readouts, we chose to focus on the ethological behavior of avoidance. From our extensive behavioral analysis (Figures 5 & 6), we successfully identified several behavioral differences in the F1 offspring that had not previously been described. We have revised the discussion section to elaborate on these decisions.

      We incorporated the behavioral data into the main figures and added a freezing metric to Figure 5 (F, J, & N). We did analyze time spent freezing in the control vs. conditioned chamber, but since the F0 paired mice spend so little time in the conditioned odor chamber, they also spend most of their time freezing in the control odor chamber. Thus, we felt it was better to show the overall time spent freezing during the trial.

      (2) It is unclear why the authors chose to present all behavioral data in the supplementary materials. I strongly recommend not only incorporating the behavioral data into the main figures but also expanding the behavioral quantification. It appears that the authors dismissed the potential effects on F1 without a thorough exploration of the animals' behaviors. The task contains valuable information that could be further investigated, potentially altering the findings or even the conclusions of the study. Notably, the absence of quantification for freezing behavior is incomprehensible. Freezing is a crucial measure in fear conditioning, and it's surprising that the authors did not mention it throughout the manuscript. I encourage the authors to include freezing data in the analysis and other behavioral quantification as follows: a) freezing during odor presentation and ITI for conditioning days. b) freezing during the odor preference test in all compartments. c) the design of the odor preference behavioral testing is not very clear. Is the odor presented in a discrete manner, or is the odor constantly presented in the compartment? Could the authors quantify the latency to avoid after the visit in the compartment? d) in the video it is very clear that the animals are doing a lot of risk assessment; this could also be analyzed and included as a fear measure.

      Thanks for the suggestion. We incorporated the behavioral data into the main figures and added a freezing metric to Figure 5 (F, J, & N). We did analyze time spent freezing in the control vs. conditioned chamber, but since the F0 paired mice spend so little time in the conditioned odor chamber, they also spend most of their time freezing in the control odor chamber. Thus, we felt it was better to show the overall time spent freezing during the trial. In the Methods section we describe that the odor is continuously bubbled into the chamber throughout the trial, but we have clarified this in the main text as well. As for further behavioral metrics like latencies and risk assessment, initial analyses have not shown anything in the F1 data that we wished to report here. Future work from the lab will investigate this further.

      (3) In the Dias and Ressler paper, a crucial difference exists between the models that could elucidate the absence of transgenerational effects on F1. In their study, the presence of the unconditioned stimulus (US) is consistent across all generations in the startle task. I am curious whether, in the present study, the authors considered pairing the F1 with a US-paired task in a protocol that does not induce fear conditioning (e.g., lower shock intensity or fewer pairings). Could this potentially lead to an increased response in the parental-paired offspring? Did the author consider this approach? I understand how extensive this experiment can be, therefore I'm not directly requesting, although it would be a fantastic achievement if the results are positive. Please consider discussing this fundamental difference in the manuscript. 

      To clarify, the F1 generation is presented with the unconditioned stimulus, just never conditioned with it. In these experiments, we were primarily interested in the F1’s naïve reaction to their father’s conditioning odorant, and whether the presentation of that odor in the absence of a stressor would lead to any fear-like behavioral responses.

      We have considered the experiments you have suggested and have ongoing projects in the lab further investigating F1 effects and whether their father’s experiences affect their ability to learn in conditioning tasks. Because of the amount of time and effort it requires to generate F1 offspring, and because we do not wish to test individual offspring in multiple assays, we do not present any of these experiments in the current manuscript. Ongoing work is looking into whether 1-day (vs. 3-day) conditioning is sufficient in the offspring of paired mice, and we appreciate the suggestion of subthreshold shock intensity. We will also clarify in the discussion that future work will try to answer these questions. 

      (4) If the videos were combined it would be better to appreciate the behavioral differences of paired vs unpaired. 

      Thank you for the suggestion, fixed. Video S1 is now a combination of unpaired and paired example videos. 

      (5) Figure 3E, is there an outlier in the paired group that is driving the difference? Please run an outlier test on the data if this has not been done. If already done, please express the stats. 

      We ran an outlier test using the ROUT method (Q=1%) and did not find any outliers to be removed. We also ran the same test on all other data and removed one mouse from the Acetophenone F1 Paired group in Figure 5 (also described in the Methods section). 

      (6) I understand that using the term "olfactory" twice in the title may seem redundant. However, the authors specifically demonstrate the effects of olfactory fear conditioning. I suggest including "odor-induced" before "fear conditioning" in the title for greater specificity and accuracy. This modification would better reflect the study's focus on olfactory fear conditioning, especially given the authors did not explore fear conditioning broadly (e.g., contextual, and auditory aspects were not examined). 

      Thank you for your feedback. We found using “olfactory” twice cumbersome. We have changed the title to “Fear conditioning biases olfactory sensory neuron expression across generations” to more accurately highlight the importance of olfactory sensory neuron expression across generations.

      (7) The last page of the manuscript has a list of videos (8 videos), but only two were presented.

      We have made sure to include all 7 videos (videos 1 and 2 were combined) in this version.  

      Reviewer #2 (Recommendations For The Authors):

      (1) The analyses mentioned on lines 210-220 should be presented. 

      Thank you for the suggestion. We have removed this part of the manuscript as we do not have a large enough n to draw conclusions about cell longevity in this paper. Future studies in the lab will incorporate this analysis.

      Reviewer #3 (Recommendations For The Authors):

      (1) The manuscript contains several supplementary figures and movies that are not referred to in the main text. 

      All supplementary figures and movies are now referred to in the manuscript text.

      (2) In the abstract, the authors state that they "investigated changes in the morphology of the olfactory epithelium." I think that is (technically) not what they did. In fact, the authors do not show any morphometry of the epithelium (e.g., thickness, layers, etc.), but count the density of OSNs that share a specific receptor identity. Along the same lines, the authors state in the abstract that recent work has shown that conditioning is "resulting in increases in olfactory receptor frequencies." However, recent studies did not show increased "receptor frequencies", but changes in cell count. Whether (or not) receptor expression per OSN is also changed remains unknown (would be interesting though). 

      Yes, agreed. We changed “morphology” to “cellular composition.” We also changed any references to “receptor frequencies” to “olfactory sensory neuron frequencies.”

      (3) Reference 20 needs to be updated. 

      Thank you, updated.

      (4) l.52: the distribution of OSNs into (four) zones is a somewhat outdated concept as zonal boundaries are rather blurry. Generally, of course, dorsoventral differences are real. 

      Yes, we agree and changed the verbiage to “region” as opposed to “zone.” We mainly bring this up because it later becomes relevant that both M71 and MOR23 are expressed in the same (antero-dorsal) region and thus can be quantified with the same methodology.

      (5) Fig. 3B & C: the EdU background staining is quite peculiar. Any reason why the epithelium is mostly (with the sustentacular nuclei being a noticeable exception) devoid of background? 

      We use the ThermoFisher Click-iT Plus EdU kit (Invitrogen, C10638), which has consistently produced a very good signal-to-noise ratio.

      Responses to Editor’s note

      We thank the editor for their constructive suggestions. 

      (1) Should you choose to revise your manuscript, please include full statistical reporting including exact p-values wherever possible alongside the summary statistics (test statistic and df) and 95% confidence intervals. These should be reported for all key questions and not only when the p-value is less than 0.05. 

      Thank you for the suggestion. We created two supplementary tables with statistical reporting: Table S1 for the main figure statistics, and Table S2 for the supplementary figure statistics.

    1. This manuscript focuses on “how Technological Activities and Products (TAPs) are assessed in the individual evaluation of researchers’ trajectories in Argentina.” The authors present a rigorous, relevant, and critical theoretical framework addressing social impact, scientific careers, and the case of Argentina. The method involves an analysis of a database of external reviewer reports corresponding to promotion applications submitted in 2017 and 2018, encompassing 421 reports from three areas. Textual quantitative analysis using software and qualitative content analysis of Section 2 of the experts’ evaluation form were the applied methods. Some issues should be clarified by the authors:

      1. Was there any institutional review board (IRB) approval for the data collection process?

      2. The description of the qualitative methods is insufficient. I strongly recommend using a checklist for qualitative research (e.g., COnsolidated criteria for REporting Qualitative research — COREQ, available at https://onlinelibrary.wiley.com/pb-assets/assets/17416612/COREQ_Checklist-1556513515737.pdf) to ensure transparency, rigor, and reproducibility of the results.

      3. Although the quantitative and qualitative results are interesting and thought-provoking, a stronger connection between these two methodological approaches is recommended. Employing a mixed-method strategy could be beneficial.

      4. The Discussion section needs reorganization. The first paragraph should summarize the quantitative and qualitative findings, while subsequent paragraphs should interpret and explain each result using a theoretical framework. Considering that this is an empirical study, the section should not include a discussion of the literature reviewed, as currently presented in the first paragraph. Finally, the limitations of the study must be acknowledged.

    1. "L'école dont nous rêvons" (The School We Dream Of): Summary of the Consultation with Education Stakeholders

      Summary

      This briefing summarizes the key points of the "L'école dont nous rêvons" consultation, organized by the Institut de France and the Académie des Sciences, with a local event led by the INSPÉ de l'Académie de Lille and the Université de Lille.

      The consultation aims at forward-looking, structural thinking about the future of schooling in France, moving away from a simple list of grievances to focus on the challenges ahead and the levers for transformation.

      The national initiative is built around five main themes: the student, the teaching profession, the organization of schools, social and academic diversity, and the inclusion of students with special needs.

      The methodology relies on a two-track consultation: institutional hearings and field meetings, to showcase existing initiatives and gather concrete proposals.

      The end goal is to propose costed transformation scenarios phased over time, intended to inform public debate without imposing a single solution, with a horizon set at 2050.

      The Académie de Lille, characterized by its wide diversity of territories and a high proportion of students in priority education, highlights its commitment to fighting social determinism and developing tailored local responses.

      It stresses the importance of a supportive school environment, a cohesive educational community, strong local partnerships, and a drive for innovation and experimentation backed by research.

      The inspiring initiatives presented during the consultation illustrate concrete levers for action:

      Professional collaboration (AEPS) to transform the profession and spread good practice.

      Valuing multilingualism (CASNAV) as an asset for the whole school community.

      Merging knowledge (ATD Quart Monde) between school, families, and neighborhood for better mutual understanding and the success of all.

      Local anchoring (Cités éducatives) to break schools' isolation and create collaborative dynamics.

      Interprofessional cooperation (PIA3) between the school and medico-social sectors for successful inclusion.

      Artistic practice (CFMI) as a tool for cohesion, enjoyment of teaching, and interdisciplinary development.

      Together, these perspectives outline a school that is more agile, collaborative, inclusive, and rooted in its territory, able to adapt to social change and to restore agency ("pouvoir d'agir") to all its stakeholders.

      --------------------------------------------------------------------------------

      1. Context and Objectives of the Consultation

      The "L'école dont nous rêvons" event is part of a large national consultation initiated by the Institut de France and the Académie des Sciences.

      The Lille stage was co-led by the Maison pour la science of the INSPÉ de l'Académie de Lille and the culture department of the Université de Lille.

      The approach is intended to be forward-looking, exploring what is possible for the future construction of the school system.

      It seeks to move beyond observations about dysfunction and focus on the major challenges the school system faces and will face, including:

      • The integration of artificial intelligence.

      • The coming demographic shock.

      • The need to account for social change and its impact on students, teachers, and families.

      The ambition is to build tomorrow's school in an "offensive and positive" way, rather than reacting defensively to changes that would already have overtaken the institution.

      2. The National Project "L'école dont nous rêvons"

      2.1. Philosophy and Methodology

      Launched two years ago, the national project does not seek to list what is not working but to identify the challenges the school system must meet.

      The objective is to propose structural changes to the education system that give it more flexibility, agility, and capacity to adapt to local conditions and to students, while restoring agency ("pouvoir d'agir") to those involved.

      The consultation proceeds along two parallel tracks:

      1. Institutional hearings: Meetings with unions, conferences of rectors, associations of elected officials (mayors of France, rural mayors), networks of parents and of researchers, and partners such as the MDPH.

      2. Field meetings: Visits to two or three sites per region, seeking wide geographic and sociological diversity. These meetings have a dual purpose:

      Speaking well of school: Highlighting successes and recognizing the talent and energy of those working in the field.

      Collective reflection: Surfacing good ideas and identifying obstacles, beyond simple lists of grievances.

      2.2. The Five Areas of Reflection

      The consultation is structured around five main themes, focused on the structure of the system rather than on purely pedagogical questions (such as the choice of a reading method).

      | Area of reflection | Content and key questions |
      |---|---|
      | 1. The student | Put the student at the center. Reflection on rhythms (annual, weekly), the effective implementation of learning cycles, personalized pathways, and education in making choices so that students take an active role in their own guidance. |
      | 2. The teaching profession | Define the scope of the profession beyond teaching hours. Integrate tutoring, teamwork, continuing education, and career development to make the profession more attractive. |
      | 3. The organization of the school | Strengthen local anchoring and partnership work (local authorities, families, associations, medico-social sector). Think of autonomy in terms of subsidiarity to adapt better to the local context. |
      | 4. Social and academic diversity | Ensure diversity, hold a shared ambition for all students, and put in place effective remediation measures for the most vulnerable. |
      | 5. Inclusion of students with special needs | Design adapted pathways and continuity of care for students with disabilities, as well as those with learning or attention disorders or mental health problems. |

      2.3. Purpose of the Project

      The project will lead to the creation of a consortium (potentially a Groupement d'Intérêt Public, a public interest grouping) bringing together prestigious institutions (Collège de France, the ENS of Paris, Lyon, and Rennes, Académie des Sciences, CNAM, etc.). This consortium's mission will be to:

      • Identify priority areas for each theme.

      • Propose several transformation scenarios.

      • Cost these scenarios in terms of financial and human resources.

      • Define a long-term transformation trajectory.

      The result will be a document accessible to all, intended to inform public debate so that society can take ownership of the question of schooling. The final choice will be a matter of democratic decision.

      3. Perspectives from the Académie de Lille

      3.1. A Territory of Challenges and Commitments

      The Académie de Lille is marked by high population density and a wide diversity of territories, ranging from densely populated urban areas to rural areas.

      Key figures:

      • More than 750,000 students.

      • More than 3,000 primary schools and 667 secondary schools.

      • Nearly 60,000 teachers.

      • One third of students are in priority education (41 REP+, 158 priority neighborhoods).

      Fighting social determinism is a long-standing priority of the académie, which strives to provide local responses suited to territorial issues, such as the Territoires Éducatifs Ruraux (TER) or the "Calais territoire bilingue" project.

      3.2. Conditions for Success and a Drive for Innovation

      For the académie, school must be a place of emancipation where students feel well and confident. Several conditions are considered necessary to achieve this:

      Cohesion of the educational community: Involvement of all staff (teachers, management, CPE, health and social services, AED, etc.) and of parents.

      A strong partnership with local stakeholders: Elected officials and the voluntary sector.

      Educational and career guidance carried out in connection with higher education and the business world.

      The académie is characterized by a strong drive for innovation, with support from researchers to evaluate and improve projects:

      • 25 experimental projects operating under exemptions in secondary education (about 50 schools).

      • More than 300 innovative projects monitored in primary and secondary education.

      • Integrated laboratories (more than 100) in mathematics, French, music, etc., which sit at the intersection of innovation and training.

      3.3. Training as an Essential Lever

      Continuing education is seen as a pillar for supporting change.

      The École Académique de la Formation Continue (EAFC) has set up nearly 5,000 training courses for 2024-2025, aimed at nearly 40,000 staff of all categories. Two examples illustrate this investment:

      Artificial Intelligence: A plan has trained more than 5,000 people.

      Psychosocial Skills (CPS): Creation of a university diploma (Diplôme Universitaire) with the INSPÉ to train 50 trainers by the end of 2026, with the aim of reaching every place and time in a child's life, not just the classroom.

      4. Presentation of Inspiring Initiatives

      Several local initiatives were presented to illustrate concrete avenues and "open up the field of possibilities".

      4.1. AEPS: The Strength of the Collective for the Teaching Profession

      The Association pour l'Enseignement de l'Éducation Physique (AEPS) acts as a national network for sharing knowledge in physical education (EPS).

      It fosters connection and the transformation of the profession by enabling teachers to train and exchange ideas on their own time.

      The association, recognized as far up as the general inspectorate, demonstrates the importance of professional collectives in changing practice.

      Key quote: "What is worth teaching is what unites and what liberates." - Olivier Reboul

      4.2. CASNAV: Multilingualism as an Educational Lever

      The Centre Académique pour la Scolarisation des élèves allophones Nouvellement Arrivés (CASNAV) highlights a major shift: inclusion in mainstream classes is now seen as the condition for learning, and no longer as a goal to be reached after mastering French.

      The flagship initiative is the Council of Europe's "Feuille de route" (roadmap), trialed in a lower secondary school in Lille. It aims to:

      • Take a "snapshot" of all the languages present in a school (languages taught and home languages).

      • Involve all stakeholders (students, teachers, management, parents, non-teaching staff).

      • Value linguistic diversity as an asset and a core skill for all.

      4.3. ATD Quart Monde: Merging Knowledge for Everyone's Success

      The association proposes an approach called "croisement des savoirs et des pratiques" (merging of knowledge and practices) to build links between the school, families (particularly those living in deep poverty), and the neighborhood.

      The method rests on recognizing that everyone holds useful knowledge:

      • Academic knowledge (school).

      • Knowledge from action (professionals).

      • Knowledge from life experience (parents).

      A crucial point of the approach is to begin the work in peer groups before bringing everyone together, in order to ensure that all voices carry more equal weight.

      This helps clear up misunderstandings, build trust, and raise awareness of the "hidden dimensions" of poverty that hold back learning.

      4.4. Cités Éducatives: Territorial Intelligence in Action

      The experience of the Cités Éducatives shows how local anchoring can energize schools.

      By making each lower secondary school the lead on a theme tied to the strengths of its territory (health, mathematics, arts/languages), the project has made it possible to:

      • Break the isolation of teams and schools.

      • Increase teachers' involvement by shifting their focus beyond their own classroom so that they act at the level of the network.

      • Create emulation and synergy in which "the strengths of one come to the rescue of the weaknesses of the other".

      • Give concrete meaning to the role of subject coordinator.

      4.5. PIA3: Coordinating the School and Medico-Social Sectors for Inclusion

      In response to fast-changing legislation on inclusive education, the PIA3 project "100% IDT" has developed shared interprofessional training for staff of the Éducation nationale and of the medico-social sector.

      The objective is to break down barriers between professional cultures and get these actors working together to better support the schooling of students with special educational needs.

      The approach, based on research and evaluation, meets a strong local need.

      4.6. CFMI: Music as the DNA of Co-construction

      The Centre de Formation de Musiciens Intervenants (CFMI) has in its DNA the co-construction of projects between artists and teachers.

      The "Chœur ressource interprofessionnel" (interprofessional resource choir) initiative brought together, for nearly 10 years, teachers at all levels, artists, and conservatory teachers to sing together.

      This project made it possible to:

      • Build links between schools and the cultural resources of the territory.

      • Foster enjoyment of teaching, regarded as an essential condition.

      • Show how artistic practice can feed into all subjects.

      The dream carried by the CFMI is that of a school "where we sing every day".

    1. Training Future Teachers in a Sensory Approach to Space: Summary and Analysis

      Summary

      This document summarizes the key arguments of research on the need to train future teachers in a sensory, embodied, and interdisciplinary approach to space.

      The central thesis is that school education, which has historically sought to neutralize and constrain students' bodies, must evolve to make the body a fundamental tool for learning about and understanding the immediate environment.

      By building a bridge between architecture and geography, two disciplines that share an interest in lived space but remain poorly integrated in curricula, it is possible to develop a richer and more emancipatory pedagogy.

      An experiment conducted at the INSPÉ de Bordeaux with future teachers served as a case study.

      Using activities such as the "parcours augmenté" (augmented walk) or the "carte mentale" (mental map), the study revealed a tendency among participants to favor conceptual, objective representations of space (top-down view, absence of bodies), in line with traditional school norms.

      This finding shows the urgency of training teachers to go beyond the purely cartographic approach and integrate the lived, sensory, and emotional dimension.

      The conclusion recommends integrating training modules based on sensory experience, capable of bridging disciplines and giving students the means to become conscious, engaged actors in their environment.

      --------------------------------------------------------------------------------

      1. The Paradigm of School Space: Body and Pedagogy

      Analysis of school space reveals deep tensions between physical settings, pedagogical models, and the place given to the student's body.

      1.1. Spatial Determinism in Question

      A common assumption holds that changing the learning space (furniture, location) is enough to transform pedagogy.

      A field experiment contradicts this "spatial determinism". Of three teachers invited to hold class in the playground, two replicated their teacher-fronted, controlled model, rearranging the students in rows.

      Only the teacher who already practiced differentiated pedagogy (in clusters) in the classroom allowed students greater bodily freedom.

      Finding: A change of space does not guarantee a change in pedagogy.

      Pascal Clerc's Triptych: This observation illustrates the persistence of the model in which "a class holding class in a classroom" reproduces itself, even outdoors.

      Need for support: It is crucial to train and support teachers so that they can make different use of the pedagogical potential of spaces, indoors and outdoors.

      1.2. The Neutralization of the Body at School

      The school as an institution has historically sought to neutralize students' bodies, constraining them through furniture and rules.

      This sidelining of the body, relegated to Physical Education and Sport classes, is at odds with the ambition of a holistic education that should encompass the physical, sensory, and intellectual dimensions.

      The Body as a Tool for Measurement and Perception: The personal experience of the researcher, Maylis Leuret, as an architecture student illustrates how the body can become a primary tool for understanding and representing space (measuring with one's steps, judging proportions by eye).

      Body, Intimacy, and Space: Bringing the body into the classroom means addressing the question of intimacy and of one's relationship to oneself and to others.

      Topics such as education in emotional and relational life (EVARS) or the design of school toilets are extensions of this issue, pointing to a relationship with the body that is often taboo or neglected.

      From a Space Endured to a Space Inhabited: The central challenge of the research is to transform school space from a setting that is endured into a place truly inhabited by students, one that supports learning and experience.

      2. A Bridge between Architecture and Geography

      The research proposes creating a "fruitful relationship" between architecture and geography to develop a more complete education in space, grounded in lived experience.

      2.1. Shared Concepts and Issues

      Although they belong to distinct academic fields, the two disciplines converge on several points:

      Shared concepts: The body, space, dwelling ("l'habiter"), the immediate environment.

      Common evolution: They increasingly integrate lived experience, sensory experience, and appropriation to analyze how individuals find their place in a territory.

      2.2. Weak Curricular Integration

      Despite their potential synergies, the relationship between geography and architecture is weak in training programs:

      • Geography occupies a limited place in France's 21 schools of architecture.

      • Architecture is not an explicit object of study in the geography curricula of primary school, secondary school, or undergraduate degrees.

      2.3. Toward an "Experiential Geography"

      One current in school geography, "experiential geography" ("géographie expérientielle"), promotes an approach aligned with these principles.

      Led by researchers such as Sophie Gaujal and Caroline Leininger-Frézal, it values a geography anchored in students' spatial experience.

      "Experiential geography allows students to think about space, to think of themselves in space, and to connect their spatial practice with the geography lesson." - Caroline Leininger-Frézal (2020)

      This approach draws on active methods such as field trips, role-playing, and sensory activities to make children investigators of and actors in their environment.

      3. The INSPÉ de Bordeaux Experiment: Methodology and Findings

      A training session was held on March 26, 2025 at the INSPÉ de Bordeaux with 11 second-year Master's students, in collaboration with the geographer Julie Picard, to test the impact of sensory activities.

      3.1. Objectives and Hypotheses

      Objective: To analyze how introducing interdisciplinary, sensory activities transforms future teachers' relationship to their immediate environment.

      Hypothesis: These approaches, combining architecture and geography, diversify representations of space and constitute transferable teaching tools.

      3.2. Teaching Activities Deployed

      The experiment was built around three main activities in the INSPÉ courtyard:

      | Activity | Description | Objective |
      |---|---|---|
      | Parcours Commenté (commented walk) | Participants record themselves alone (on a phone), describing aloud what they see, observe, and feel in the space. | Verbalize perception; identify emblematic or striking places. |
      | Parcours Augmenté (augmented walk) | Experiencing the space with senses altered or emphasized (e.g., being guided with eyes closed, walking barefoot). | Engage senses other than sight (touch, hearing) to generate new perceptions and emotions. |
      | Parcours Iconographique (photographic walk) | Taking photographs that represent a characteristic place, or a place of well-being or unease. | Capture a subjective, visual representation of one's relationship to space. |

      3.3. Analysis of Results: The Mental Map as a Revealer

      After the walks, participants were asked to draw a mental map of the INSPÉ from memory.

      Analysis of these productions revealed several tendencies:

      Predominance of the Top-Down View: The representations mostly resembled cartographic plans, in line with school norms.

      Pressure toward Objectivity: The maps were often structured, captioned, and zoned, in a quest for clarity and objectivity.

      Near-Total Absence of Living Beings: Participants represented neither themselves nor others. Bodies were absent, which put lived experience at a distance.

      Participants explained this by the constant movement of bodies, which is hard to "fix" on a map.

      Obstacles to Representation: Difficulty "drawing well" and the lack of a model held back expressiveness.

      3.4. Analysis of Sensory Perceptions

      Analysis of the feedback showed that the walking experiences mainly engaged three senses:

      Sight: Ever-present and indispensable for orientation.

      Hearing: Became dominant when sight was reduced, often associated with soothing sounds (birdsong).

      Touch: Emerged in specific situations (walking barefoot), provoking varied sensations (curiosity, discomfort, coolness, dampness, disgust).

      Being outdoors was mostly experienced as positive, enhancing well-being (calm, relaxation).

      4. Conclusions and Recommendations for Training

      Taken as a whole, this work underlines the importance of training teachers in an education in space that goes beyond the conceptual framework to integrate the body and the senses.

      4.1. Toward Concrete Classroom Application

      Using these activities with students requires precise framing to be effective. Asked about having students draw their "ideal school", Maylis Leuret warns against an overly open instruction, which often leads to stereotyped imaginings (swimming pool, dromedaries) with no concrete bearing.

      To aim for a real improvement of the living environment, the issue at stake must be precise and rest on a shared narrative built from everyday experience.

      4.2. The Need for Interdisciplinary Pedagogy

      A sensory approach to space is a powerful lever for interdisciplinarity, making it possible to combine multiple skills and fields:

      Geography and Mathematics: Relationship to the environment, structuring of space.

      French and Visual Arts: Storytelling, sensory representation.

      Physical Education and Sport: Sensory walks, measuring space with the body.

      4.3. Proposals for Teacher Training

      Despite an institutional framework that tends to prioritize French and mathematics, it is essential to integrate this dimension into training:

      Integrate modules dedicated to research methods grounded in sensory experience into INSPÉ training curricula.

      Organize practical sessions: sensory walks, augmented walks, workshops using maps, photos, sound, or models.

      Strengthen institutional coordination and promote the links between experiential geography and the project-studio practices of schools of architecture.

      Ultimately, making room for the body in the teaching of space opens the way to a more emancipatory education, attentive to lived experience and capable of forming citizens more aware of the issues tied to their environment.

    1. School Spaces: Analysis of a Living and Learning Environment

      Summary

      This briefing analyzes the complex nature of school learning spaces, drawing on the research of Guilhem Labinal.

      The analysis reveals that the traditional "forme scolaire" (school form), characterized by a closed classroom and a front-facing arrangement of students (the "bus" or "wagon" layout), remains predominant despite its acknowledged mismatch with active pedagogies and student well-being.

      A central paradox emerges: a majority of teachers, particularly in secondary education, use every day a spatial configuration that they do not consider ideal for their teaching practice.

      The persistence of this model is explained by multiple systemic constraints: time management (45-50 minute lessons), large class sizes, social norms among colleagues, safety requirements, and the inertia of a historically entrenched model.

      The analysis stresses that there is no spatial determinism: changing the furniture is not enough to transform pedagogy.

      Lasting transformation requires an ecosystemic approach that integrates the material, pedagogical, and relational dimensions (lived space).

      The study of school spaces must be multiscalar (from the classroom to the school as a whole) and multidisciplinary, drawing on geography, sociology, and environmental psychology.

      Qualitative methods such as "parcours commentés" (commented walks) prove particularly effective for documenting the singular spatial experience of different actors (students, teachers, CPE, non-teaching staff), showing that one person's space is not another's.

      Ultimately, the design of school spaces is the product of constant trade-offs between varied tensions and needs, requiring comprehensive and concerted reflection.

      --------------------------------------------------------------------------------

      1. The Traditional School Form in Question

      Guilhem Labinal's lecture opens with a fundamental critique of the traditional "school form", a spatial and temporal model that separates the school from the rest of society and imposes a rigid normative framework on learning.

      1.1. Characteristics of the Dominant Model

      The conventional classroom model is described as a "square arrangement that is, after all, closed".

      It is the product of history and of old standards, such as Marcelin Berthelot's instructions in the 19th century (1.25 m² per pupil, rooms of 40-50 m²). This model entails:

      Fixity and Immobility: Chairs, sometimes "anchored to the floor", constrain bodies and limit movement, which raises questions about pupils' motor development and well-being.

      Separation: The school is designed as a closed place, separated from society, from families and even from everyday objects such as the smartphone.

      The architecture of some schools, with a "double castle portcullis", reinforces this image of security-driven enclosure.

      Frontal Organisation: From CP (the first year of primary school) onwards, spatial organisation shifts towards a "teacher-centred order", with a preferred layout known as the "bus" or "wagon".

      1.2. The Paradox of Teachers' Use

      A survey of history and geography teachers in the Versailles académie (regional education authority) reveals a striking paradox.

      Spatial layout                   | Daily use                                 | Layout judged ideal
      "Bus" or "wagon" type (frontal)  | More than 80% (74% of the 32 respondents) | Majority
      Other (islands, U-shape, etc.)   | Less than 25% (25% of the 32 respondents) | Minority

      This significant gap raises a central question: "why do teachers use a layout that, on the face of it, does not correspond at all to the ideal layout for implementing their teaching sequence?".

      The low response rate to the questionnaire also suggests that this layout is so "normalised that it has been absorbed into the school form without being questioned much", especially in secondary education.

      2. The Constraints Preventing the Transformation of Spaces

      The persistence of the traditional model is not simply a matter of individual choice but the result of a set of systemic constraints that weigh on those working in education.

      The Time Constraint: The length of lessons (45 to 50 minutes) is cited as a major obstacle.

      One teacher interviewed says they do not have "the time or the courage to change the layout" and then put it back, describing the process as "too time-consuming".

      Social Norms and Peer Pressure: Because rooms are shared, changing the layout can create tension.

      The need to return the room "to the order colleagues are likely to expect", on pain of a "rather unpleasant coffee break", is a powerful force of inertia.

      Class Sizes: Managing a class of 35 pupils makes any reorganisation logistically complex and difficult to carry out.

      Appropriation of Space: Teachers need to make their workspace their own.

      The post-Covid experience, in which teachers rather than pupils had to change rooms, showed that many did "not feel at home".

      This appropriation is essential for setting up an organisation that is "didactically purposeful".

      The Absence of Spatial Determinism: The idea that changing the furniture is enough to change pedagogy is a mistaken premise.

      Guilhem Labinal insists: "pedagogy is not determined by the spatial layout".

      The layout must accompany a pedagogical project, not precede it.

      Security Requirements: The logic of securing schools, heightened since the terrorist attacks, leads to greater enclosure (security gates, painted-over windows).

      This approach is sometimes paradoxical, because it can create dangerous crowds in front of entrances.

      3. Theoretical Approaches and Analytical Frameworks

      To understand the complexity of learning spaces, a multidisciplinary approach and a robust conceptual framework are needed.

      3.1. The Triptych: Distributional, Functional and Transactional Order

      Guilhem Labinal proposes a three-dimensional analytical model for approaching the classroom as a "microsystem":

      1. Distributional Order: This is the material and architectural setting, the "way objects are arranged in space". It is raw materiality.

      2. Functional Order: This is the pedagogical arrangement, "the way the different elements are organised in space in relation to a pedagogical purpose".

      3. Transactional Order: This is lived space, which recognises that places "are experienced differently [...] by different people".

      It includes regulations, transgressions and interpersonal relationships.

      3.2. The Contribution of the Different Disciplines

      The study of school spaces is a field of research in which several disciplines converge:

      Geography: Long focused on macro scales (neighbourhood, city), it now turns to the micro scale, applying concepts such as distance, proximity or itinerary to interactions within a classroom.

      The analysis addresses the place of the body, the gaze and gestures.

      Environmental Psychology: It pioneered the study of how architecture and design influence psychological states and social behaviour.

      Didactics: This field long treated the topic as a blind spot (an "impensé"), concentrating on the knowledge-pupil-teacher triangle or on technological media, but rarely on "the effects of the materiality of the architectural arrangement".

      Sociology and Anthropology: These disciplines analyse peer relations, power relations, and the dynamics of gender and regulation within spaces such as the playground or the classroom.

      4. Empirical Data and Research Methods

      Various quantitative and qualitative studies document the impact of spaces on learning and well-being.

      4.1. Experimental and Quantitative Studies

      These studies analyse the effects of isolated variables on learning: lighting, temperature, air quality, acoustics.

      A large-scale study led by Peter Barrett found that 16 specific parameters (light, temperature, ownership, complexity, colour, flexibility...) could explain 16% of the variation in pupils' academic progress over one year.

      However, these approaches have limitations:

      Difficulty of isolating variables in a complex environment.

      "School effect" (exposure to sunlight, location of rooms), which makes comparisons difficult.

      "Teacher effect": a teacher's pedagogical effectiveness is a major variable that is difficult to neutralise.

      Excessive coding in questionnaires, which prevents the expression of singular lived experience.

      4.2. Qualitative and Phenomenological Approaches

      To overcome these limitations, qualitative approaches focus on lived experience ("le vécu").

      The Concept of "Inhabiting": It is not only a matter of being present in a place, but of "living it [...] in the diversity of modes of inhabiting", depending on the moment, the people encountered and the actions carried out.

      Commented Walks: This method consists of walking around the school with an actor (teacher, CPE, pupil...) and letting them comment on their experience of the places.

      It brings out the "singularity of the relationship one has with places" and reveals a multisensory approach (sounds, noise, circulation).

      Mental Maps: They make it possible to express the subjective relationship to a place, but need to be triangulated with other methods (explicitation interviews) to avoid omissions.

      5. The School: A Multiscalar Ecosystem

      The analysis cannot be limited to the classroom.

      The school as a whole functions as a complex ecosystem in which spaces are experienced and appropriated differently by each actor.

      5.1. One Person's Space Is Not Another's

      Commented walks reveal distinct "regimes of inhabiting":

      The teacher mainly uses the staff room, the CDI (school library and resource centre), the photocopying room and their own classroom.

      The CPE (Conseiller Principal d'Éducation, the senior education adviser) follows a much wider route, from the entrance gate to their office, crossing the playground where pupils constantly call out to them (a journey that can take "between 15 and 20 minutes").

      Kitchen and maintenance staff have their own routes and schedules, often unknown to the other actors, which can cause misunderstandings (e.g. cleanliness of the toilets during the lunch break).

      5.2. A Typology of Lived Spaces

      Within the school, different types of space coexist:

      Privatised spaces: The space behind the teacher's desk, where by convention no pupil ventures.

      Conditional-access spaces: The administrative offices.

      Sacralised spaces: A bench or a corner that becomes a "landmark in the pupil's own life".

      Gendered spaces: The playground, where uses vary greatly (ball games vs. other activities).

      Spaces of sociability: The physics or history-geography "cabinets" (subject department rooms), which can strengthen cohesion within a discipline at the expense of interdisciplinarity.

      The management of these spaces is a constant "arbitration" between the needs of different actors, illustrating that "space is the outcome of an equation of tension between social actors".

    1. Reviewer #3 (Public review):

      In the study the authors performed longitudinal 1P calcium imaging of mouse mPFC across 8 weeks during learning of an olfactory-guided task, including habituation, training, and sleep periods. The authors' goal was to determine how the mPFC representation of the task changed and what aspects of activity emerged between the learning and the learned conditions of the task. The task had 3 arms. Odor was sampled at the end of the middle arm (named the "Sample" period). The animal then needed to run to one of the two other arms (R or L) based on the odor. The whole period until they reached the end of one of the choice arms was the "Outward" period. The time at the reward end was the "Reward" period. They noted several changes from the learning condition to the learned condition:

      (1) They classified cells in a few ways. First each cell was classified as SI (spatially informative) if it had significantly more spatial information than shuffled activity, and ~50% of cells ended up being SI cells. Then among the SI cells they classified a cell as a TC (task cell) if it had statistically similar activity maps for R versus L arms, and a GC (goal arm cell) otherwise. Note that there are 4 kinds of these cells: outer arm TCs and GCs and middle arm TCs and GCs (with middle arm GCs essentially being like "splitter cells" since they are not similarly active in the middle arm for R versus L trials). There was an increase in TCs from the learning to the learned condition sessions. They also note the sources of these TCs (some came from GCs, others from non-SI cells).

      (2) They analyze activity sequences across cells. They extracted 500 ms duration bursts (defined as periods of activity > 0.5 standard deviations over what I assume is the mean, which is a permissive threshold encompassing a significant fraction of the activity in non-sleep, non-habituation periods). They first noted that the resulting "Burst rates were significantly larger during behavioral epochs than during sleep and during periods of habituation to the arena" and "Moreover, burst rates during correct trials were significantly lower than during error trials". For the sequence analysis they only considered bursts consisting of at least 5 active cells. A cell's activity within the burst was set to the center of mass of its calcium activity. Then they took all the sequences from all learned and learning sessions together and hierarchically clustered them based on the Spearman's rank correlation between the order of activity in each pair of sequences (among the cells active in both). The iterative hierarchical clustering process produces groups (clusters) of sequences such that there are multiple repeats of sequences within a cluster. Different sequences are expressed across all the longitudinally recorded sessions. They noted "large differences of sequence activation between learning and learned condition, both in the spatial patterns (example animal in Fig. 4D) and the distribution of the sequences (Fig. 4D,E). Rastermap plots (Fig. 4D) also reveal little similarity of sequence expression between task and habituation or sleep condition." They also note that the difference in the sequences between the learning and learned conditions was larger than the difference between correct and error trials within each condition. They conclude that during task learning new representations are established, as measured by the burst sequence content. They do additional analyses of the sequence clusters by assessing the spatial informativeness (SI) of each sequence cluster. Over learning they find an increase in clusters that are spatially informative (clusters that tend to occur in specific locations). Finally, they analyzed the SI clusters in a similar manner as SI cells and classified them as task phase selective sequences (TSs) and goal arm selective sequences (GSs) and did some further analysis. However, they themselves conclude that the frequency of TSs and GSs is limited because most sequence clusters were non-SI. In the discussion they say "In addition to GSs and TSs, we found that most of the recurring sequences are not related to behavior (not SI)".

      (3) As an alternative to analyzing individual cells and sequences of individual cells, they then look for trajectory replay using Bayesian population decoding of location during bursts. They analyze TS bursts, GS bursts, and non-SI bursts. They say "we found correlations of decoded position with time bin (within a 500 ms burst) strongly exceeding chance level only during outward and reward phase, for both GSs and TSs (Fig 5H)." Fig5H shows distributions indicating statistically significant bias in the forward direction (using correlations of decoded location versus time bin across 10 bins of 50 ms each within each 500ms burst). They find that the Outward trajectories appear to reflect the actual trajectory during running itself, so are likely not replay. But the sequences at the Reward are replay as they do not reflect the current location. Furthermore, replay at the Reward is in the forward direction (unlike the reverse replay at Reward seen in the hippocampus) and this replay is only seen in the learned and not the learning condition. At the same time, they find that replay is not seen during odor Sampling, from which they conclude there is no evidence of replay used for planning. Instead they say the replay at the Reward could possibly be for evaluation during the Reward phase, though this would only be for the learned condition. They conclude "Together with our finding of strong changes in sequence expression after learning (Fig 4E) these findings suggest that a representation of task develops during learning".
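
      To make the replay measure in point (3) concrete, here is a minimal Python sketch; it is an illustration, not the authors' code. It correlates decoded position with time-bin index across the bins of a single burst (10 bins of 50 ms in the description above) and estimates a chance level by shuffling the bin order. The shuffle-based chance level, the function names `pearson` and `replay_score`, and the parameters `n_shuffles` and `seed` are assumptions introduced here; the manuscript may define chance differently.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def replay_score(decoded_positions, n_shuffles=1000, seed=0):
    """Correlate decoded position with time-bin index and compare with shuffles.

    decoded_positions: one decoded location per time bin of a burst.
    Returns (r, p): r is the observed correlation, p is a one-sided
    permutation p-value (with the usual +1 correction) for a forward bias.
    """
    rng = random.Random(seed)
    bins = list(range(len(decoded_positions)))
    r = pearson(bins, decoded_positions)
    shuffled = list(decoded_positions)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # hypothetical null: destroy the temporal order
        if pearson(bins, shuffled) >= r:
            at_least_as_large += 1
    return r, (at_least_as_large + 1) / (n_shuffles + 1)
```

      Under these assumptions, a positive correlation with a small p-value would correspond to a forward-ordered trajectory within the burst. The Bayesian decoding step that produces `decoded_positions` is not shown.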

      This study provides valuable new information about the evolution of mPFC activity during the learning of an odor-based 2AFC T-maze-like task. They show convincing evidence of changes in single cell tuning, population sequences, and replay events. They also find novel forward replay at the Reward, and find that this is present only after the animal learned the task. In the discussion the authors note "the present study, to our knowledge, identified for the first time fast recurring neural sequence activity from 1-p calcium data, based on correlation analysis". Overall, they find a substantial amount of change among the features they analyzed and according to their methods, though they note a small amount of activity was preserved through learning.
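
      The correlation-based comparison of sequences summarized in point (2) can be sketched as follows. This is a hedged illustration in Python, not the authors' implementation: each burst is represented as a mapping from cell to activation time, and similarity is the Spearman rank correlation computed over the cells active in both bursts. The names `ranks`, `pearson` and `sequence_similarity`, and the `min_shared` cutoff, are hypothetical.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation (0.0 if either input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def sequence_similarity(seq_a, seq_b, min_shared=3):
    """Spearman correlation of activation times over cells active in both bursts.

    seq_a, seq_b: dict mapping cell id -> activation time within the burst.
    Returns None when fewer than min_shared cells (an assumed cutoff) are shared.
    """
    shared = sorted(set(seq_a) & set(seq_b))
    if len(shared) < min_shared:
        return None
    ranks_a = ranks([seq_a[c] for c in shared])
    ranks_b = ranks([seq_b[c] for c in shared])
    return pearson(ranks_a, ranks_b)
```

      Computed this way, the measure is symmetric in the two bursts, because both rankings are restricted to the same set of shared cells before being correlated.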

      One comment is that the threshold for extracting burst events (0.5 standard deviations, presumably above the mean) seems lower than what one usually sees as a threshold for population burst detection, and the authors show (in Supplementary Fig 1) that this means bursts cover ~20-40% of the data. However, it is potentially a strength of this work that their results are found by using this more permissive threshold.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The study mainly replicates the authors' previously reported results about generalized and trajectory-specific coding of task structure by prefrontal neurons, and stable and changing representations over learning (Muysers et al., 2024, PMID: 38459033; Muysers et al., 2025, PMID: 40057953), although there are useful results about changes in goal-selective and task-phase selective cells over learning. There are basic shortcomings in the scientific premise of two new points in this manuscript, that of the contribution of pre-existing spatial representations, and the role of replay sequences in the prefrontal cortex, both of which cannot be adequately tested in this experimental design.

      We agree with the reviewer that we have not made sufficiently clear which aspects of our paper add to previous publications. We have now better explained methodological differences.

      Also, we agree that our very general statements on pre-existing spatial representations in the introduction and abstract of the previous manuscript were not properly followed up in the Results section. In the revision, the respective statements are clarified, and we also added an analysis of a further control condition (see response to A), which shows that a subset of task cells in particular maintains their firing fields from an early habituation period. This argues that, while the population representation of the task largely develops during learning, there exists a scaffold of a small but significant number of cells that could be interpreted as a schema.

      We also further clarified our view on replay sequences in the prefrontal cortex (see response to B). Particularly, we are grateful to the reviewer for the suggestion to also include other reactivation analysis which led to new results presented in new Figure 3.

      [A] The study denotes neurons that show precise spatial firing equivalently irrespective of goal, as generalized task representations, and uses this as a means to testing whether pre-existing spatial representations can contribute to task coding and learning. …. [I]n order to establish generalization for abstract task rules or cognitive flexibility, as motivated in the manuscript, there is a need to show that these neurons "generalize" not just to firing in the same position during learning of a given task… For an adequate test of pre-existing spatial structure, either a comparison task, as in the examples above, is needed, or at least a control task in which animals can run similar trajectories without the task contingencies. An unambiguous conclusion about pre-existing spatial structure is not possible without these controls.

      We thank the reviewer for this suggestion. We may, however, note that the previous manuscript did not make strong claims about pre-existing structures in the Results or Discussion. Also, schemas were only taken up as a discussion point. We nevertheless agree with the reviewer that assessment of the spatial prestructure requires further analysis. To address their point, we analyzed neuronal activity during the habituation phase before the start of task training, when the animals freely explored the same maze without any task contingency (animals explored mostly in the arms of the maze). We compared the place fields of neurons during this habituation period with their task-related activity. Consistent with the small overlap of firing rate maps between the learning and learned phases, this analysis also revealed a small number of cells with significant correlations (up to 20% for task cells; a significant fraction according to a binomial test). The results are shown as a new Figure supplement to Figure 2.
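
      The logic of such a binomial test can be illustrated with a short Python sketch. It asks whether the number of cells passing a per-cell significance test exceeds what the false-positive rate alone would produce. The cell counts and the alpha level below are hypothetical numbers chosen for illustration and are not taken from the manuscript; `binomial_tail` is a name introduced here.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): a one-sided exact binomial test."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: 12 of 60 cells (20%) pass a per-cell correlation
# test run at alpha = 0.05, so about 3 false positives are expected by chance.
n_cells, n_significant, alpha = 60, 12, 0.05
p_value = binomial_tail(n_significant, n_cells, alpha)
```

      With these illustrative numbers, observing 12 significant cells where roughly 3 are expected by chance yields a very small `p_value`, which is the sense in which a modest fraction of cells can still be "a significant fraction".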

      [B] The scientific premise for the test of replay sequences is motivated using hippocampal activity in internally guided spatial working memory rule tasks [...] and applied here to prefrontal activity in a sensory-cue guided spatial memory task [...]. There are several issues with the conclusion in the manuscript that prefrontal replay sequences are involved in evaluating behavioral outcomes rather than planning future outcomes.

      We agree with the reviewer that preplay in the hippocampus and in the mPFC are distinct. We further emphasized this distinctiveness in the respective paragraph in the discussion (see response to B1).

      [B. 1] First, odor sampling in odor-guided memory tasks is an active sensory processing state that leads to beta and other oscillations in olfactory regions, hippocampus, prefrontal cortex, and many other downstream networks [...]. This is an active sensory state, not conducive to internal replay sequences, unlike references used in this manuscript to motivate this analysis, which are hippocampal spatial memory studies with internally guided rather than sensory-cue guided decisions, where internal replay is seen during immobility at reward wells. These two states cannot be compared with the expectation of finding similar replay sequences, so it is trivially expected that internal replay sequences will not be seen during odor sampling.

      We agree with the reviewer that the sampling phase cannot be compared with the “preplay” state in the hippocampus. We have rewritten the manuscript in the results and discussion sections to clarify. We disagree, however, that the absence of replay sequences in the mPFC 1P calcium data is trivial, since we actually do see many sequences during sampling (Fig 4E, Fig 4 suppl 2 A). These sequences are just not related to task activity and may, e.g., reflect activity related to sensing, but do not contain information about goal arm.

      [B. 2] Second, sequence replay is not the only signature of reactivation. Many studies have quantified prefrontal replay using template matching and reactivation strength metrics that do not involve sequences [...].  Third, previous studies have explicitly shown that prefrontal activity can be decoded during odor sampling to predict future spatial choices - this uses sensory-driven ensemble activity in prefrontal cortex and not replay, as odor sampling leads to sensory driven processing and recall rather than a reactivation state [..].

      We thank the reviewer for the suggestion to also perform reactivation analysis (Peyrache et al., 2009, 2010). The results are summarized in the new Figure 3 and show that reactivation is indeed stronger during the sampling phase and that it is goal arm specific, arguing that sequence analysis extracts information (partly) complementary to rate covariance based analysis.

      We hope to have convinced the reviewer that, together, the complementary results of reactivation and sequence analysis, as well as the ability to follow these measures over an extended period of time, give unique insights far beyond the previous publications of these data sets. A consistent analysis of population representation, however, required some reanalyses of previous findings, since we could only focus on a limited number of animals and cells for which tracking was possible over such a long period of time.

      Reviewer #2 (Public review):

      Further controls are needed to validate the results.

      We thank the reviewer for their generally supportive statements. The revised manuscript contains a number of controls in several new figure supplements.

      Reviewer #3 (Public review):

      [They] conclude that the frequency of TSs and GSs is limited (I believe because most sequence clusters were non-SI - the authors can verify this and write it in the text?). In the discussion, they say, "In addition to GSs and TSs, we found that most of the recurring sequences are not related to behavior".

      The reviewer is correct: most clusters were not SI (Fig 5 A). We have added this information to the manuscript.

      [...] They conclude "Together with our finding of strong changes in sequence expression after learning (Figure 3E) these findings suggest that a representation of task develops during learning, however, it does not reflect previous network structure." I am not sure what is meant here by the second part of this sentence (after "however ..."). Is it the idea that the replay represents network structure, and the lack of Reward replay in the learning condition means that the network structure must have been changed to get to the learned condition? Please clarify.

      The reviewer is correct in their assertion. We rewrote the sentence to clarify: “Together with our finding of strong changes in sequence expression after learning (now Fig 4E) these findings suggest that a representation of task develops during learning, however, it does not reflect sequence structure during learning and habituation”.

      (1) There are some statements that are not clear, such as at the end of the introduction, where the authors write, "Both findings suggest that the mPFC task code is locally established during learning." What is the reasoning behind the "locally established" statement? Couldn't the learning be happening in other areas and be inherited by the mPFC? Or are the authors assuming that newly appearing sequences within a 500-ms burst period must be due to local plasticity?

      We agree that the wording “local” can be misleading and have rephrased the corresponding sentences.

      (2) The threshold for extracting burst events (0.5 standard deviations, presumably above the mean, but the authors should verify this) seems lower than what one usually sees as a threshold for population burst detection. What fraction of all data is covered by 500 ms periods around each such burst? However, it is potentially a strength of this work that their results are found by using this more permissive threshold.

      Since we work with a slow calcium signal, we cannot use thresholds as strict as those usually employed in electrophysiology. In addition, our sequence detection approach adds a further level of strictness such that we only consider bursts with recurring sequence structure. In response to this reviewer’s question, we have added quantification of the fraction of all data covered by 500 ms periods in Figure Supplement 1, panels D and E. Indeed, we include a large fraction of the data (20 to 40%, except during sleep and habituation), which is consistent with our interpretation that during the outward phase sequences mainly reflect task field firing.
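
      The coverage question raised by the reviewer can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the threshold is taken as the mean plus 0.5 standard deviations of a population activity trace, every supra-threshold sample is expanded to a 500 ms window centred on it, and the covered fraction of samples is reported. How the manuscript actually delimits bursts is not specified here, and `burst_coverage`, `dt_ms`, `n_sd` and `window_ms` are hypothetical names.

```python
from statistics import mean, pstdev


def burst_coverage(trace, dt_ms, n_sd=0.5, window_ms=500):
    """Fraction of samples lying inside a window around any supra-threshold sample.

    trace: population activity, one value per sample; dt_ms: sample period in ms.
    Returns (threshold, covered_fraction).
    """
    threshold = mean(trace) + n_sd * pstdev(trace)
    half = int(round(window_ms / 2 / dt_ms))  # half-window in samples
    covered = [False] * len(trace)
    for i, value in enumerate(trace):
        if value > threshold:
            start = max(0, i - half)
            stop = min(len(trace), i + half + 1)
            for j in range(start, stop):
                covered[j] = True
    return threshold, sum(covered) / len(trace)
```

      Lowering `n_sd` admits more events and therefore increases the covered fraction, which is why a permissive threshold is expected to include a sizeable share of the recording.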

      Reviewer #1 (Recommendations for the authors):

      It is possible that 1-photon recordings do not have the temporal resolution and information about oscillatory activity to enable these kinds of analyses. Therefore, an unambiguous conclusion about the existence and role of prefrontal reactivation is not possible in this experimental and analytical design.

      We indeed cannot extract information encoded in LFP oscillations from the calcium signal. We now mention the relation between LFP oscillations and olfaction-guided behaviors in the discussion (including the suggested references). However, our finding that sequence-based and covariance-based analyses yield partly complementary results argues that the approach does allow conclusions about the existence and role of prefrontal reactivation.

      Reviewer #2 (Recommendations for the authors):

      The results of the Muysers et al. (2025) paper need to be discussed in detail and explain why the cell categorization is different, three groups of spatial cells vs two groups here. Also, explain in what aspect the major findings in this work go beyond what was shown in Figure 4 in that paper.

      The main goal of this paper was to explore sequence/replay-like activity, which is not at all captured in the Muysers et al. 2025 paper. Because of this focus on sequences, we excluded the inward runs (from reward to sampling point) for better interpretability and thus ended up with only two types of cells. Muysers et al. included inward runs and could thereby also assess whether the place field is maintained across the outward and inward runs. We added this clarification in the Results section.

      Regarding the reviewer’s question about Figure 4: Our task cells would largely overlap with the “path-equivalent cells” from Muysers et al. 2025 (albeit not taking into account inward runs). In this sense, their finding that the share of path-equivalent cells increases with learning is consistent with our report of an increasing fraction of task cells in Figure 2 C. Our Figure 2 adds that some task cells develop from previous goal cells with fields at the same location (generalizing). Moreover, we use spatial information as a criterion to identify TCs and GCs, showing that a large fraction of cells actually is and remains spatially unselective. In Muysers et al. 2025, the statistical criterion was applied not to spatial selectivity but to peak height, with fewer neurons failing this test. Moreover, we analyzed only those cells trackable over the whole period. Despite all these methodological differences, the result of an increasing number of task/path-equivalent cells over learning was consistent. The main reason for recategorization of the cells in the present manuscript was to be able to meaningfully link them to sequence activity (Fig. 5E, F).
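
      A spatial-information criterion of the kind mentioned above can be sketched in Python as follows. This is a hedged illustration and not the authors' pipeline: it assumes a Skaggs-style information measure and a circular-shift shuffle of activity relative to position, neither of which is confirmed by the text. The names `spatial_information` and `is_spatially_informative`, and the parameters `n_shuffles`, `alpha` and `seed`, are introduced here for illustration.

```python
import math
import random


def spatial_information(activity, position_bins, n_bins):
    """Skaggs-style spatial information (bits per unit of activity)."""
    occupancy = [0] * n_bins
    summed = [0.0] * n_bins
    for a, b in zip(activity, position_bins):
        occupancy[b] += 1
        summed[b] += a
    total = len(activity)
    overall = sum(summed) / total  # mean activity over the session
    if overall <= 0:
        return 0.0
    info = 0.0
    for occ, s in zip(occupancy, summed):
        if occ == 0 or s <= 0:
            continue  # unvisited or silent bins contribute nothing
        rate = s / occ
        info += (occ / total) * (rate / overall) * math.log2(rate / overall)
    return info


def is_spatially_informative(activity, position_bins, n_bins,
                             n_shuffles=500, alpha=0.05, seed=0):
    """Compare observed information with circularly shifted surrogates.

    Returns (observed_info, p_value, significant).
    """
    rng = random.Random(seed)
    observed = spatial_information(activity, position_bins, n_bins)
    n = len(activity)
    exceed = 0
    for _ in range(n_shuffles):
        shift = rng.randrange(1, n)  # assumed null: break the activity-position link
        shifted = activity[shift:] + activity[:shift]
        if spatial_information(shifted, position_bins, n_bins) >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_shuffles + 1)
    return observed, p_value, p_value < alpha
```

      Under this sketch, a cell that is active in a single location passes the test, whereas a cell with uniform activity has zero information and never exceeds its shuffles.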

      It is not clear from the description how the cell type transitions were quantified. Was the last learning day compared to the first learned day? Given that, particularly during learning, there are changes across days in the spatial representations according to Figure 2 of Muysers et al. (2025), this is the meaningful way to make the comparisons. Nevertheless, it is also not clear whether the daily variations within learning and learned conditions differ from the transition day, so without comparing these three conditions, it is hard to make a firm conclusion from examining only changes in the transition days.

      The analysis of cell type transitions was performed by pooling all learning sessions and comparing them with all learned sessions, without taking into account the chronological order of sessions within each category. This approach allowed us to identify broad changes associated with learning state. Figure supplement 1.C shows the session intervals per animal. We argue that the large interval between learning and learned sessions justifies this analysis approach.

      Identifying sequences by a clustering method in which sequence patterns of individual events are compared is an interesting idea. Nevertheless, there is a danger, as with any clustering method, that data without clustering tendency could be artificially subdivided into clusters.

      In Figure 4.C, we show three example sequence cluster templates (colored) obtained via hierarchical clustering, along with representative member sequences (black) sorted by cluster membership. In response to this reviewer’s comment, we now included a complete clustering result for one animal, including all sequence clusters and their member sequences. It is provided in Figure 4 supplement 1. This comprehensive visualization serves as an additional control, demonstrating that the clustering approach identifies consistent sequence patterns across the dataset.

      Furthermore, it is possible that some cells at the edge of the cluster boundary may show a more similar sequence tendency to events detected at the overlapping border region of another cluster. Was this controlled for? It would be essential to show that events clustered together all show higher similarity to each other than to events in any other clusters.

      By default, a cluster is rejected if, in the adjacency matrix of the graph constructed from significant motif similarity, the number of within-cluster edges is smaller than the number of edges leaving the cluster. In subsequent cluster merges, the separation is increased, since only those clusters that show significant similarity are merged. As a visual control, we monitor plots such as those shown in Figure 4 supplement 1. Sequence templates (colored dot clouds) are supposed to show no serial correlation when ordered according to any template other than their own. We have added more clarification to the Methods, including a new Figure 6 illustrating the method.
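
      The rejection rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from Chenani et al., 2019: the function names, the 0/1 adjacency matrix and the toy graph are all hypothetical.

```python
def count_edges(adjacency, members):
    """Return (within, outside) edge counts for one cluster.

    adjacency: symmetric 0/1 matrix (list of lists); an entry of 1 marks a
               pair of events with significant motif similarity.
    members:   indices of the events assigned to the cluster.
    """
    member_set = set(members)
    within_twice = 0  # each within-cluster edge is seen from both endpoints
    outside = 0
    for i in member_set:
        for j in range(len(adjacency)):
            if i == j or not adjacency[i][j]:
                continue
            if j in member_set:
                within_twice += 1
            else:
                outside += 1
    return within_twice // 2, outside


def keep_cluster(adjacency, members):
    """Reject a cluster when it has fewer within-cluster than outside edges."""
    within, outside = count_edges(adjacency, members)
    return within >= outside


# Toy graph with edges 0-1, 0-2, 1-2, 2-3, 1-4, 3-4 (hypothetical data).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (1, 4), (3, 4)]
adjacency = [[0] * 5 for _ in range(5)]
for a, b in edges:
    adjacency[a][b] = adjacency[b][a] = 1

print(keep_cluster(adjacency, [0, 1, 2]))  # densely connected cluster
print(keep_cluster(adjacency, [3, 4]))     # cluster dominated by outside edges
```

      In this toy graph the first cluster has three internal edges and two outgoing ones, whereas the second has one internal edge and two outgoing ones, so only the first would pass the rule.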

      From the description, it was not clear how the sequence similarity was established between pairs of individual events. The only way I can see it is that the sequence (orders at which cells fire) is established with one event, and the rank order correlation is calculated with this order for the other event. However, in this case, distance A-B is not the same as distance B-A. Not sure how this is handled with the clustering procedure. Secondly, how the number of clusters is established in the hierarchical clustering procedure needs to be explained. Furthermore, from the method description, it is not clear how GS and TS sequences are identified. Can an event be classified as both a TS and GS event at the same time?

      The reviewer is correct that we compute all pairwise rank-order correlations (which are then subjected to a statistical test detailed in the original method publication, Chenani et al., 2019). By the nature of the rank-order correlation, the coefficients A-B and B-A are identical, i.e. the measure is symmetric. This is now explained more carefully in the Methods.
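
      The symmetry can be checked with a short sketch. It assumes that each event is summarized by one activation time per cell and that there are no tied times; the helper names and the example values are hypothetical and not taken from the manuscript.

```python
def ranks(values):
    """Rank positions (0 = earliest) of each entry, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank-order correlation for two equally long, tie-free lists."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    # The squared rank differences do not depend on which event comes first,
    # which is why the coefficient is symmetric in its arguments.
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Activation times (in seconds) of the same five cells in two events.
event_a = [0.10, 0.40, 0.20, 0.90, 0.50]
event_b = [0.15, 0.30, 0.45, 0.80, 0.60]

print(spearman(event_a, event_b), spearman(event_b, event_a))
```

      Because both calls return the same value, the pairwise similarity matrix is symmetric and can be passed to a standard hierarchical clustering procedure.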

      Several control analyses are needed to show that the sequences detected reflect not random patterns but those that repeat at a higher than random chance. This requires, at the first step, to establish to what degree sequences are consistent within a cluster and to what degree individual events show a sequential firing tendency. And at the next stage, these need to be compared with randomised events in which spike timing of cells is jittered or spike identity is randomised, and show that these events result in poorer sequence tendency and less consistent clusters.

      The controls requested by the reviewer are already implemented in our Method (see original publication of the Method in Chenani et al., 2019). This is now made clearer in the Methods section.

      Firing rate and place-related firing of cells alone could generate sequences even if cells otherwise fire independently from each other. In a similar manner, it was shown before that reactivation of waking cell assemblies could be seen in sleep, in which case firing rate differences across cells belonging to the same assembly could also generate sequential patterns without temporal coordination. Appropriate shuffling procedures need to be performed to exclude such scenarios.

      We are aware that the sequential firing in our data (particularly during the outward phase, when the animal is performing the task) most likely results from the correlations between rate maps and the animal’s trajectory. During the reward, this is less likely. An intrinsic control is that we do not see these sequences during sampling. Given the nature of the calcium signal, a direct connection to firing rate is not possible. However, we argue that our center-of-mass approach to the calcium trace effectively normalizes for firing rate effects. Shuffling dF/F amplitudes (as a proxy for firing rates) would thus have no effect on the center-of-mass sequences. We do, however, consider this to be an important methodological difference between sequence analysis with spikes and with calcium signals, and have added a related comment to the Methods.
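
      The amplitude argument can be illustrated with a small sketch. It assumes that the center of mass is the amplitude-weighted mean time of a non-negative dF/F trace within an event window; the exact definition used in the manuscript is not given in this response, and the function name and traces below are hypothetical.

```python
def center_of_mass(trace, dt=1.0):
    """Amplitude-weighted mean time of a non-negative trace.

    trace: dF/F samples within one event window.
    dt:    sampling interval (any time unit).
    """
    total = sum(trace)
    if total <= 0:
        raise ValueError("trace must contain positive signal")
    return dt * sum(i * value for i, value in enumerate(trace)) / total


# Two hypothetical cells in the same event window.
cell_1 = [0.0, 0.1, 0.5, 1.0, 0.4, 0.0]
cell_2 = [0.0, 0.0, 0.1, 0.3, 0.8, 0.6]

# Rescaling the amplitude of one cell leaves its center of mass unchanged,
# because the scale factor cancels between numerator and denominator.
cell_1_scaled = [3.0 * value for value in cell_1]

print(center_of_mass(cell_1), center_of_mass(cell_1_scaled))
print(center_of_mass(cell_1) < center_of_mass(cell_2))  # cell order is preserved
```

      Under this assumption, multiplying a cell's trace by a constant changes neither its center of mass nor its rank within the sequence.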

      The past literature describing mPFC reactivation, replay, and sequences needs to be described, and findings of this work need to be appropriately acknowledged, and those findings compared with this work (starting with this work from 2007 PMID: 18006749). In the current reading, a novice reader of this field might conclude that this is the first work that identified replay and sequences in the mPFC.

      We apologize that the manuscript gives this impression. This was not our intention; in fact, we placed strong emphasis on the Kaefer et al. paper in the Discussion. We have now added early references on PFC replay based on electrophysiological recordings to the Discussion section.

      The analysis of Figure 4H is not sufficient to show that only forward sequences occur. If 50% are forward and 50% are reverse, the median is zero. Some of the presented histograms look like Gaussian distributions with SD=1, which would show that those events were not real sequences. It should be tested whether the distributions are significantly different from the expected Gaussian.

      We agree with the reviewer that we did not explicitly test for the significance of individual replays, but only tested for the rightward shift of the median. We have now added these significance tests and p values (Figure 5) and could indeed show that the fraction of significant backward replays never exceeds the fraction expected by chance, whereas forward replay significantly exceeds chance levels only in the cases where the median had a significant rightward shift (except for non-SI clusters). We would like to thank the reviewer for this suggestion, which we think makes the analysis stronger.
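
      One possible way to frame the comparison with chance is a one-sided binomial test: if each event is tested at level alpha, about a fraction alpha of events would be flagged as significant forward (or backward) replay by chance alone. The sketch below is illustrative only. The function name, alpha = 0.05 and the event counts are assumptions, and the test actually used for Figure 5 may differ.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_events = 40   # hypothetical number of events in one cluster
alpha = 0.05    # hypothetical per-event significance level

# Hypothetical counts: 8 significant forward and 2 significant backward events.
p_forward = binomial_tail(8, n_events, alpha)
p_backward = binomial_tail(2, n_events, alpha)

print(p_forward, p_backward)
```

      With these made-up counts, eight forward events out of 40 would be unlikely under chance, whereas two backward events would be entirely compatible with it.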

      Overall, the clarity of the text could be improved, further examples of reactivated sequences should be shown, and the methods should be illustrated in the figures. In the current version, I fear that even readers in this field would give up on reading the text, given its insufficient level of clarity.

      We have included more examples of reactivated sequences (Figure 5 supplement 2) and made extensive additions to the Methods. In particular, we followed the reviewer’s request for an illustration of the method (new Figure 6).

      Reviewer #3 (Recommendations for the authors):

      My main comment here is for the authors to increase the clarity of the manuscript.[...] For instance, it was difficult to follow what was being done to determine TSs and GSs.

      We have made extensive additions to the Methods section including a new Figure 6 depicting the workflow of the sequence analysis in a schematic manner.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer 1:

      Strengths:

      The innovation on the task alone is likely to be impactful for the field, extending recent continuous report (CPR) tasks to examine other aspects of perceptual decision-making and allowing more naturalistic readouts. One interesting and novel finding is the observation of dyadic convergence of confidence estimates even when the partner is incidental to the task performance, and that dyads tend to be more risk-seeking (indicating greater confidence) than when playing solo. The paper is well-written and clear.

      We thank reviewer 1 for this encouraging evaluation. Below we address the identified weaknesses and recommendations.

      (1) Do we measure metacognitive confidence?

      One concern with the novel task is whether confidence is disambiguated from a tracking of stimulus strength or coherence. […] But in the context of an RDK task, one simple strategy here is to map eccentricity directly to (subjective) motion coherence - such that the joystick position at any moment in time is a vector with motion direction and strength. This would still be an interesting task - but could be solved without invoking metacognition or the need to estimate confidence in one's motion direction decision. […] what the subjects might be doing is tracking two features of the world - motion strength and direction. This possibility needs to be ruled out if the authors want to claim a mapping between eccentricity and decision confidence […].

      We thank reviewer 1 for pointing out that the joystick tilt responses of our subjects could potentially be driven by stimulus coherence instead of metacognitive decision confidence. Below, we present four arguments to address this point of concern:

      (1.1) Similar physical coherence between high and low confidence states

      Nominal motion coherence is a discrete value, but the random noisiness in the stimulus causes the actual frame-by-frame coherence to be distributed around this nominal value. Because of this, subjects might scale their joystick tilt report according to the coherence fluctuations around the nominal value. To check if this was the case, we use a median split to separate stimulus states into states with large versus small joystick tilt, individually for each nominal coherence. For each stimulus state, we extracted the actual instantaneous (frame-to-frame) motion coherence, which is based on the individual movements of dots in the stimulus patch between two frames, recorded in our data files.

      First, we compared the motion coherence between stimulus states with large versus small joystick tilt. For each stimulus state, we calculated the average instantaneous motion coherence, and analyzed the difference of the medians for the large versus small tilt distributions for each subject and each coherence level. The resulting histograms show the distribution of differences across all 38 subjects for each nominal coherence, and are, except for the coherence of 22%, not significantly different from zero across subjects (Author response image 1). For the 22% coherence condition, the difference amounts to 0.19% – a very small, non-perceptible difference. Thus, we do not find systematic differences between the average motion coherence in states with high versus low joystick tilt.
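
      The per-subject computation described in (1.1) can be sketched as follows. This is a minimal illustration under stated assumptions: each stimulus state is reduced to one tilt value and one average instantaneous coherence, states at exactly the median tilt are assigned to the small-tilt group, and the function name and numbers are hypothetical.

```python
from statistics import median


def median_split_difference(states):
    """Difference of median coherence between large- and small-tilt states.

    states: list of (tilt, mean_coherence) pairs for one subject and one
            nominal coherence level.
    Positive values mean higher average coherence for large joystick tilt.
    """
    tilt_median = median(tilt for tilt, _ in states)
    large = [coherence for tilt, coherence in states if tilt > tilt_median]
    small = [coherence for tilt, coherence in states if tilt <= tilt_median]
    return median(large) - median(small)


# Hypothetical stimulus states at a nominal coherence of 22%.
states = [
    (0.2, 21.9), (0.8, 22.1), (0.4, 22.0),
    (0.9, 22.3), (0.1, 21.8), (0.7, 22.2),
]

print(median_split_difference(states))
```

      Collecting one such difference per subject and nominal coherence yields the distributions that are then tested against zero across subjects.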

      Author response image 1.

      Histograms of within-subject difference between medians of average coherence distributions with large and small joystick tilt for all subjects. Coherence is color-coded (cyan – 0%, magenta – 98%). On top, the title of each panel illustrates the number of significant differences (Ranksum test in each subject) without correction for multiple comparisons (see Author response table 1 below). In the second row of the title, we show the result of the population t-test against zero. Only 22% coherence shows a significant bias. Positive values indicate higher average coherence for large joystick tilt.  

      Author response table 1.

      List of all individual significantly different coherence distributions between high and low tilt states, without correction for multiple comparisons. Median differences do not show a consistent bias (i.e. positive values) that would indicate higher average coherence for the large tilts.

      (1.2) Short-term stimulus fluctuations have no effect

      […] But to fully characterise the task behaviour it also seems important to ask how and whether fluctuations in motion energy (assuming that the RDK frames were recorded) during a steady state phase are affecting continuous reporting of direction and eccentricity, prior to asking how social information is incorporated into subjects' behaviour.

      In addition to the analysis of stimulus coherence and tilt averaged across each stimulus state (1.1), we analyzed the moment-to-moment relationship between instantaneous coherence and ongoing reports of accuracy and tilt. Below, we provide evidence that short-term fluctuations in the instantaneous coherence (i.e. the motion energy of the stimulus) do not result in correlated changes in joystick responses, neither for tilt nor accuracy. For each continuous stimulus state, we calculated cross-correlation functions between the instantaneous coherence, tilt and accuracy, and then averaged the cross-correlation across all states of the same nominal coherence, and then across subjects. The resulting average cross-correlation functions are essentially flat. This further supports our interpretation that the joystick reports do not reflect short-term fluctuations of motion energy.
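
      A lagged correlation of this kind can be sketched as below. The code is an illustration, not the authors' analysis pipeline: the function names, the normalization (Pearson correlation at each lag) and the synthetic signals are assumptions. The synthetic tilt trace is deliberately a delayed copy of the coherence trace, to show what a stimulus-driven report would look like, in contrast to the flat functions reported above.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equally long lists (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cross_correlation(x, y, max_lag):
    """Correlation of x[t] with y[t + lag] for lags in [-max_lag, max_lag].

    A peak at a positive lag means that y follows x.
    """
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        result[lag] = pearson(a, b)
    return result


rng = random.Random(0)
coherence = [rng.gauss(0.0, 1.0) for _ in range(300)]   # synthetic fluctuations
tilt = [0.0, 0.0] + coherence[:-2]                      # follows with a 2-sample delay

xcorr = cross_correlation(coherence, tilt, max_lag=5)
peak_lag = max(xcorr, key=xcorr.get)
print(peak_lag)
```

      The same function applies to frame-by-frame changes in accuracy and tilt, where a peak at a positive lag would indicate that tilt changes follow accuracy changes.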

      Author response image 2.

      Cross-correlation between the length of the resultant vector with joystick accuracy (left) and tilt (right). Coherence is color-coded. Shaded background illustrates 95% confidence intervals.

      (1.3) Joystick tilt changes over time despite stable average stimulus coherence

      If perceptual confidence is derived from evidence integration, we should see changes over time even when the stimulus is stable. Here, we have analyzed the average slope of the joystick tilt as a function of time within each stimulus state for each subject and each coherence, to verify if our participants tilted their joystick more with additional evidence. This is illustrated with a violin plot below (Author response image 3). The linear slopes of the joystick tilt progression over the course of stimulus states are different between coherence levels. High coherence causes more tilt over time, resulting in positive slopes for most subjects. In contrast, low/no coherence results mostly in flat or negative slopes. This tilt progression over time indicates that low coherence results in lower confidence, as subjects do not wager more with weak evidence. In contrast, high coherence causes subjects to exhibit more confidence, indicated by positive slope of the joystick tilt.

      Author response image 3.

      Violin plots showing the fitted slopes of the joystick tilt time course in the last 200 samples (1667 ms) leading up to a next stimulus direction (cf. Figure 2D). Positive values signify an increase in joystick tilt over time. Each dot shows the average slope for one subject. Coherence is color-coded. The dashed line at zero indicates unchanged joystick tilt over the analyzed time window.
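
      The slope estimate summarized in this figure can be sketched as an ordinary least-squares fit over the final samples of a stimulus state. The sampling rate of 120 Hz is inferred from the caption (200 samples correspond to 1667 ms); the function names and the synthetic traces are hypothetical.

```python
def fit_slope(values, sample_rate_hz=120.0):
    """Least-squares slope of values against time, in units per second."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    times = [i / sample_rate_hz for i in range(n)]
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator


def final_window_slope(tilt, n_samples=200, sample_rate_hz=120.0):
    """Slope of the joystick tilt over the last n_samples of a stimulus state."""
    return fit_slope(tilt[-n_samples:], sample_rate_hz)


# Synthetic stimulus states of 600 samples each.
ramping_tilt = [0.2 + 0.1 * (i / 120.0) for i in range(600)]  # tilt rises by 0.1 per second
flat_tilt = [0.5] * 600                                       # unchanged tilt

print(final_window_slope(ramping_tilt), final_window_slope(flat_tilt))
```

      Averaging such slopes over all stimulus states of one coherence level gives one value per subject, which corresponds to one dot in the violin plot.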

      (1.4) Cross-correlation between response accuracy and joystick tilt

      Similar to 1.2 above, we have cross-correlated the frame-by-frame changes of joystick accuracy and tilt for each individual stimulus state and each subject. Across subjects, changes in tilt occur later than changes in accuracy, indicating that changes in the quality of the report are followed by changes in the size of the wager. Given that this process is not driven by short-term changes in the motion energy of the stimulus (see 1.2 above), we interpret this as additional evidence for a metacognitive assessment of the quality of the behavioral report (i.e. accuracy) reflected in the size of the wager (our measure for confidence). (See Figure 2E).

      (2) Peri-decision wagering is different to post-decision wagering

      […] One route to doing this would be to ask whether the eccentricity reports show statistical signatures of confidence that have been established for more classical punctate tasks. Here a key move has been to identify qualitative patterns in the frame of reference of choice accuracy - with confidence scaling positively with stimulus strength for correct decisions, and negatively with stimulus strength for incorrect decisions (the so-called X-pattern, for instance Sanders et al. 2016 Neuron […].

      We thank reviewer 1 for the constructive feedback. Our behavioral data do not show signatures similar to the previously reported post-decision confidence expression (Desender et al., 2021; Sanders et al., 2016). The previously described patterns show, first of all, that confidence for incorrect type 1 decisions diverges from that for correct type 1 decisions, declining with stimulus strength (e.g. coherence) rather than increasing as it does for correct decisions. In our task, there is a graded accuracy and (putative) confidence expression, but there are no correct or incorrect decisions – instead, there are hits and misses of the reward targets presented at nominal directions. Instead of a decline for misses, we observe an equally positive scaling of confidence with coherence, both for hits and misses (Author response image 4A). This is because in our peri-decision wagering task, the expression of confidence causally determines the binary hit or miss outcome. The outcome in our task is a function of the two-dimensional joystick response: higher tilt (confidence) requires a more accurate response to successfully hit a target. Thus, a subject can display a high (but not high enough) level of accuracy and confidence but still remain unsuccessful. If we instead median-split the confidence reports by high and low accuracy (Author response image 4C), we observe a slight separation, especially for higher coherences, but still no clear difference in slopes.

      We do observe the other two dynamic signatures of confidence (Desender et al., 2021): signature 2 – monotonically increasing accuracy as a function of confidence (Author response image 4), and signature 3 – steeper type 1 psychometric performance (accuracy) for high versus low confidence (Author response image 4D).

      Author response image 4.

      Confidence (i.e., joystick tilt, left column) and accuracy reports (right column) for different stimulus coherence, sorted by discrete outcome (hit versus miss, upper row) and the complementary joystick dimension (lower row, based on median split).

      Author response image 5.

      Accuracy reports correlate positively with confidence reports. For each stimulus state, we averaged the joystick response in the time window between 500 ms (60 samples) after a direction change until the first reward target appearance. If there was no target, we took all samples until the next RDP direction change into account. This corresponds to data snippets averaged in Figure 2D. Thus, for each stimulus state, we extracted a single value for joystick accuracy and for tilt (confidence). Subsequently, we fitted a linear regression to the accuracy-confidence scatter within each subject and within each coherence level. The plot above shows the average linear regression between accuracy and confidence across all subjects (i.e., the slopes and intercepts were averaged across n=38 subjects). Coherence is color-coded.

      (3) Additional analyses regarding the continuous nature of our data

      I was surprised not to see more analysis of the continuous report data as a function of (lagged) task variables. […]

      Reviewer 1 requested more analyses regarding the continuous nature of our data. We agree that this is a useful addition to our paper, and thank reviewer 1 for this suggestion. To address this point, we revised main Figure 2 and provided additional panels. Panel D illustrates the continuous ramp-up of both accuracy and tilt (confidence) for high coherence levels, suggesting ongoing evidence integration and meta-cognitive assessment. Panel E shows the cross-correlation between frame-by-frame changes in accuracy and tilt (see 1.4 above). Here, we demonstrate that changes in the accuracy precede changes in joystick tilt, characterizing the continuous nature of the perceptual decision-making process.

      (4) Explicit motivation regarding continuous social experiments

      This paper is innovating on a lot of fronts at once - developing a new CPR task for metacognition, and asking exploratory questions about how a social setting influences performance on this novel task. However, the rationale for this combination was not made explicit. Is the social manipulation there to help validate the new task as a measure of confidence as dissociated from other perceptual variables? (see query 1 below). Or is the claim that the social influence can only be properly measured in the naturalistic CPR task, and not in a more established metacognition task?

      Our rationale for the combination of real-time decision making and social settings was twofold:

      i. Primates, including humans, are social species. Naturally, most behavior is centered around a social context and continuously unfolds in real-time. We wanted to showcase a paradigm in which distinct aspects of continuous perceptual decision-making could be assessed over time in individual and social environments.

      ii. Human behavior is susceptible to what others think and do. We wanted to demonstrate that the sheer presence of a co-acting social partner affects continuous decision-making, and quantify the extent and direction of social modulation.

      We agree that the motivation for combining the new task and this specific type of social co-action should be clearer. We have clarified this aspect in the Introduction, lines 92-109. In brief, the continuous, free-flowing nature of the CPR task and the real-time availability of social information made this design a very suitable paradigm for assessing unconstrained social influences. We see this study as the first step towards disentangling the neural basis of social modulation in primates. See also the response to reviewer 2, point 2, below.

      (5) Response to minor points

      (5.1) Clarification on behavioral modulation patterns

      Lines 295-298, isn't it guaranteed to observe these three behavioral patterns (both participants improving, both getting worse, only one improving while the other gets worse) even in random data?

      The reviewer is correct. We now simply illustrate these possibilities in Figure 4B and how these patterns could lead to divergence or convergence between the participants (see also line 282). Unlike random data, our results predominantly demonstrate convergence.

      (5.2) Clarification on AUC distributions

      Lines 703-707, it wasn't clear what the AUC values referred to here (also in Figure 3) - what are the distributions that are being compared? I think part of the confusion here comes from AUC being mentioned earlier in the paper as a measure of metacognitive sensitivity (correct vs. incorrect trial distributions), whereas my impression here is that here AUC is being used to investigate differences in variables (e.g., confidence) between experimental conditions.

      We apologize for the confusion. Indeed, the AUC analysis was used for two purposes:

      (i) To assess the metacognitive sensitivity (line 175, Supplementary Figure 2).

      (ii) To assess the social modulation of accuracy and confidence (starting at line 232, Figures 3-6). 

      We now introduce the second AUC approach for assessing social modulation, and the underlying distributions of accuracy and confidence derived from each stimulus state, separately in each subject, in line 232.
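
      As an illustration of the second use, an AUC between two per-state distributions can be computed as the probability that a value drawn from one condition exceeds a value drawn from the other, with ties counted as one half. The sketch below assumes this standard definition; which condition serves as the reference, the function name and the example values are assumptions, not details taken from the manuscript.

```python
def auc(reference, comparison):
    """Probability that a comparison value exceeds a reference value.

    0.5 means no difference between conditions; values above 0.5 mean that
    the comparison condition (e.g. dyadic) tends to be higher than the
    reference condition (e.g. solo).
    """
    greater = 0
    ties = 0
    for c in comparison:
        for r in reference:
            if c > r:
                greater += 1
            elif c == r:
                ties += 1
    return (greater + 0.5 * ties) / (len(reference) * len(comparison))


# Hypothetical per-state confidence (tilt) values for one subject.
solo = [0.2, 0.4, 0.6]
dyadic = [0.4, 0.7, 0.9]

print(auc(solo, dyadic))
```

      Computing one such value per subject and coherence level, separately for accuracy and confidence, expresses social modulation on a common scale centered on 0.5.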

      (5.3) Clarification of potential ceiling effects

      Could the findings of the worse solo player benefitting more than the better solo player (Figure 4c) be partly due to a compressive ceiling effect - e.g., there is less room to move up the psychometric function for the higher-scoring player?

      We thank the reviewer for this insight. First, even the better-performing participants were not at ceiling most of the time, even at the highest coherence (cf. Figure 2 and Supplementary Figure 3C). To test for a potential ceiling effect in the better solo players, we correlated their social modulation (expressed as AUC, as in Figure 4) with their solo performance. There was no significant negative correlation for accuracy (p > 0.063), but there was a negative correlation for confidence (r = -0.39, p = 0.0058), indicating that low-performing “better players in a dyad” indeed showed more positive social modulation. We note, however, that this correlation was driven mainly by a few such initially low-performing “better” players, who mostly belonged to the dyads where both participants improved in confidence (green dots, Figure 4B), and that even the highest solo average confidence was below ceiling (<0.95). To conclude, the asymmetric social modulation effect we observe is mainly due to the better players declining (orange and red dots, Figure 4B), rather than due to both players improving but the better player improving less (green dots, Figure 4B).

      Reviewer 2:

      Strengths:

      There are many things to like about this paper. The visual psychophysics has been undertaken with much expertise and care to detail. The reporting is meticulous and the coverage of the recent previous literature is reasonable. The research question is novel.

      We thank reviewer 2 for this positive evaluation. Below we address the identified weaknesses and recommendations.

      (1) Streamlining the text to make the paper easier to read

      The paper is difficult to read. It is very densely written, with little to distinguish between what is a key message and what is an auxiliary side note. The Figures are often packed with sometimes over 10 panels and very long captions that stick to the descriptive details but avoid clarity. There is much that could be shifted to supplementary material for the reader to get to the main points.

      We thank reviewer 2 for the honest assessment that our article was difficult to read and understand, and for providing specific examples of confusion. We substantially improved the clarity:

      We added a Glossary that defines key terms, including Accuracy and Hit rate. 

      We replaced the confusing term “eccentricity” with joystick “tilt”.

      We simplified Figures 3 and 5, moving some panels into supplementary figures.

      We substantially redesigned and simplified our main Figure 4, displaying the data in a more straightforward, less convoluted way, and removing several panels. This change was accompanied by corresponding changes in the text (section starting at line 277).

      More generally, we shortened the Introduction, substantially revised the Results and the figure legends, and streamlined the Discussion.

      (2) Dyadic co-action vs joint dyadic decision making

      A third and very important one is what the word "dyadic" refers to in the paper. The subjects do not make any joint decisions. However, the authors calculate some "dyadic score" to measure if the group has been able to do better than individuals. So the word dyadic sometimes refers to some "nominal" group. In other places, dyadic refers to the social experimental condition. For example, we see in Figure 3c that AUC is compared for solo vs dyadic conditions. This is confusing.

      […] my key criticism is that the paper makes strong points about collective decision-making and compares its own findings with many papers in that field when, in fact, the experiments do not involve any collective decision-making. The subjects are not incentivized to do better as a group either. […]

      The reviewer is correct to highlight these important aspects. Indeed, we did not investigate a situation where two players had to reach a joint decision with interdependent payoff, and there was no incentive to collaborate or even to incorporate the information provided by the other player. To make the meaning of “dyadic” in our context more explicit, we have clarified the nature of the co-action and independent payoff (e.g. lines 107, 211, 482, 755 - Glossary), and used the terms “nominal combined score” (line 224) and “nominal ‘average accuracy’ within a dyad” (line 439).

      Concerning the key point about embedding our findings into the literature on collective decision-making, we would like to clarify our motivation. Outside of the recent study by Pescetelli and Yeung, 2022, we are not aware of any perceptual decision-making studies that investigated co-action without any explicit joint task. So naturally, we were stimulated by the literature on collective decisions, and felt it was appropriate to compare our findings to the principles derived from this exciting field. Besides the development of a continuous – in time and in “space” (direction) – peri-decision wagering CPR game, the social co-action context is the main novel contribution of our work. Although it is possible to formulate cooperative or competitive contexts for the CPR, we leveraged the free-flowing continuous nature of the task, which makes it most readily amenable to studying spontaneously emerging social information integration.

      In the Introduction and Discussion, we now emphasize more explicitly that most prior work used joint decision tasks, in contrast to the co-action we study here.

      (3) Addition of relevant literature to Discussion

      […] To see why this matters, look at Lorenz et al PNAS (https://www.pnas.org/doi/10.1073/pnas.1008636108) and the subsequent commentary that followed it from Farrell (https://www.pnas.org/doi/full/10.1073/pnas.1109947108). The original paper argued that social influence caused herding which impaired the wisdom of crowds. Farrell's reanalysis of the paper's own data showed that social influence and herding benefited the individuals at the expense of the crowd demonstrating a form of tradeoff between individual and joint payoff. It is naive to think that by exposing the subjects to social information, we should, naturally, expect them to strive to achieve better performance as a group.

      Another paper that is relevant to the relationship between the better and worse performing members of the dyad is Mahmoodi et al PNAS 2015 (https://www.pnas.org/doi/10.1073/pnas.1421692112). Here too the authors demonstrate that two people interacting with one another do not "bother" figuring out each others' competence and operate under "equality assumption". Thus, the lesser competent member turns out to be overconfident, and the more competent one is underconfident. The relevance of this paper is that it manages to explain patterns very similar to Schneider et al by making a much simpler "equality bias" assumption.

      We thank reviewer 2 for pointing out these highly relevant references, which we have now integrated in the Discussion (lines 430 and 467). The debate between Lorenz et al. and Farrell, although it concerns a very different type of task (single-shot factual knowledge estimation), is very illuminating for understanding the differing perspectives on individual vs group benefit. We fully agree that it is naïve to assume that during independent co-action in our highly demanding task participants would strive to achieve better performance as a group – if anything, we expected less normative and more informational, reliability-driven effects as a way to cope with task demands.

      Mahmoodi et al. is a particularly pertinent and elegant study, and the equality bias they demonstrate may indeed underlie the effects we see. We admit that we did not know this paper at the time of our initial writing, but it is encouraging to see the convergence [pun intended] despite task and analysis differences. As highlighted above (2), our novel contributions remain that we observe mutual alignment, or convergence, in real-time without explicitly formulated collective decision task and associated social pressure, and that we separate asymmetric social effects on accuracy and confidence.

      Other reviewer-independent changes:

      Additional information: Angular error in Figure 2

      In panel A of the main Figure 2, we have added the angular error of the solo reports (blue dashed line) to give readers an impression about the average deviation of subjects’ joystick direction from the nominal stimulus direction. We have pointed out that angular error is the basis for accuracy calculation.

      Data alignment

      In the previous version of the manuscript, we presented data with different alignments: Accuracy values were aligned to the appearance of the first target in a stimulus state (target-alignment) to avoid the predictive influence of target location within the remaining stimulus state, while the joystick tilt was extracted at the end of each stimulus state (state-alignment) to allow subjects more time to make a deliberate, confidence-guided report (Methods). We realized that this is confusing as it compares the social modulation of the two response dimensions at different points in time. In the revision, we use state-aligned data in most figures and analyses and clearly indicate which alignment type has been used. We kept the target-alignment for the illustration of the angular error in the solo-behavior (Figure 2). Specifically, this has only changed the reporting on accuracy statistics. None of the results have changed fundamentally, but the social modulation of accuracy became even stronger in state-aligned data.

      In summary, we hope that these revisions have resulted in an easier-to-understand and convincing article, with clear terminology and concise and important takeaway messages.

      We thank both reviewers and the editors again for their time and effort, and look forward to the reevaluation of our work.

      References

      Desender K, Donner TH, Verguts T. 2021. Dynamic expressions of confidence within an evidence accumulation framework. Cognition 207:104522. doi:10.1016/j.cognition.2020.104522

      Pescetelli N, Yeung N. 2022. Benefits of spontaneous confidence alignment between dyad members. Collective Intelligence 1. doi:10.1177/26339137221126915

      Sanders JI, Hangya B, Kepecs A. 2016. Signatures of a Statistical Computation in the Human Sense of Confidence. Neuron 90:499–506. doi:10.1016/j.neuron.2016.03.025

    1. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Vineis et al. examined the structure and functional potential of microbial communities along a vertical sediment profile of a salt marsh, using a genome-centric metagenomic approach. They attempted to test whether (1) the microbial communities within dynamic upper layers contain genomes with diverse functional potential, (2) the energy-limited deeper sediments contain microbial consortia assembled to metabolise complex carbon, and (3) microbial compositional changes in the low-energy sediments mirror the burial processes observed in marine environments with similar energetic limitations. Results revealed a core microbial consortium that contains a collective metabolic potential for complex carbon and aromatics degradation, suggesting putative syntrophic interactions. In addition, the recovery of MAGs assembled independently from multiple depths in the same core and the consistent relative abundance structure of MAGs within co-occurrence network modules together suggest a burial process as a likely mechanism for microbial assembly.

      Strengths:

      (1) Two long sediment cores (down to 240 cm deep) were collected in this study, allowing investigation of the less well characterised subsurface microbiome in salt marsh.

      (2) A genome-centric metagenomic approach was employed here, which provides information on both the structure and functional potential of the salt marsh sediment microbiome, which is not possible in commonly performed 16S rRNA-based surveys.

      Weaknesses:

      (1) In both the abstract and conclusion, the authors claimed that results from this study provide a "mechanistic understanding" of the assembly and distribution of the microbial communities in salt marsh sediment (P2, L31 and P35, L645-649). However, both claims are speculative and not supported by solid evidence. Firstly, the genomic data presented in this study and supplementary physical properties of sediments in the broader area are not enough to make a solid claim (that appears in the title) on microbial assembly being governed by a burial process. Alternative explanations include residual bioturbation, slow porewater advection, etc. Therefore, this remains an interesting hypothesis unless additional evidence is provided to rule out the alternative explanations. Similarly, the claim on the detailed syntrophic interactions among members within a co-occurrence network module (e.g. P36, L649-652) is purely speculative and warrants functional validation experiments to prove.

      (2) A major aim of this work was to study complex carbon degradation. However, neither CAZymes, the first-line carbon degradation enzymes, nor peptidases, which can be important contributors to carbon degradation at depth, were examined here. METABOLIC, which the authors used for functional annotation of MAGs, by default generates peptidase outputs, which can be easily integrated here.

      (3) No geochemical data is available to provide context for the genomic analysis here. Without such information, readers cannot even tell whether the surface sediment samples were oxic or anoxic. A reference to a PhD thesis is provided (P6, L126) but it would be most helpful to extract relevant data from there and provide as a supplementary table.

      (4) A single metagenomic binning tool, CONCOCT, was used in this study, which very likely has resulted in a limited number of MAGs recovered. More (high-quality) MAGs are expected with the use of additional binners and a bin consolidation procedure.

      (5) Several terms are misleading here. Firstly, the term "co-occurring" or "co-located" microbes or MAGs (e.g. P1, L19 and P31, L537) can be misleading as it could imply a close spatial relationship. However, co-occurrence networks rely on correlations of (relative) abundance and show statistical associations instead of direct spatial or physical relationships. I would suggest alternative names such as co-abundant or statistically associated microbes. Secondly, the term "persistent conversion of soil organic carbon" (P36, L654) in the conclusion is also misleading as it implies an active process, which cannot be tested without metatranscriptomics or metaproteomics data.

      (6) Based on a NMDS plot of KEGG IDs (Figure 4B), the authors claimed that the functional potential among MAGs in modules 1, 2 and 7 was very similar (P18, L346). However, the dispersions of modules 1 and 2 were just too large. A proper statistical test, such as PERMANOVA, should be used to support the claim.

      (7) Genome-scale metabolic networks were analysed using Metage2Metabo (M2M) and the results were discussed in detail (P26, L453-466). However, the source data should be provided in a supplementary table to show which metabolites are producible by which MAGs.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The authors have used gene deletion approaches in zebrafish to investigate the function of genes of the hox clusters in pectoral fin "positioning" (but perhaps more accurately pectoral fin "formation"). 

      Strengths: 

      The authors have employed a robust and extensive genetic approach to tackle an important and unresolved question. The results are largely presented in a very clear way. 

      We thank the reviewer for the positive summary and for recognizing the strengths of our genetic approach and presentation.

      Weaknesses: 

      The Abstract suggests that no genetic evidence exists in model organisms for a role of Hox genes in limb positioning. There are, however, several examples in mouse and other models (both classical genetic and other) providing evidence for a role of Hox genes in limb position, which is elaborated on in the Introduction.

      It would perhaps be more accurate to state that several lines of evidence in a range of model organisms (including the mouse) support a role for Hox genes in limb positioning. The authors' work is not weakened by a more inclusive introduction that cites the current literature more comprehensively.

      Thank you for this constructive comment. We agree that our Abstract implied an absence of genetic evidence across model organisms and could be misleading. We have revised the Abstract to acknowledge that multiple lines of evidence—including classical and molecular studies in mouse and other models—support a role for Hox genes in limb/fin positioning. We have also expanded the Introduction to cite this literature more comprehensively. These changes clarify the current state of knowledge while preserving the novelty of our zebrafish genetic findings.

      It would be helpful for the authors to make a clear distinction between "positioning" of the limb/fin and whether a limb/fin "forms" at all, independent of the relative position of this event along the body axis.

      We thank the reviewer for pointing this out. In the revised manuscript, we now make a distinction between these two aspects: we describe “positioning” as being specified by the expression domains of Hox genes along the anterior–posterior axis, while the “formation” of pectoral fins reflects the functional requirement of Hox genes to induce tbx5a expression and thereby initiate fin development. We have clarified this distinction in the text to better separate these related but distinct roles of Hox genes.

      Discussion of why the zebrafish is sensitive to Hoxb loss with reference to the fin, but mouse Hoxb mutants do make a limb?  

      We thank the reviewer for this important comment. Our interpretation is that paired fins first appeared in vertebrates that already possessed four Hox clusters. It is likely that novel functions related to pectoral fin positioning emerged within the HoxB cluster at that time, contributing to the origin of pectoral fins. In zebrafish, we found that these functions remain largely restricted to the hoxba and hoxbb clusters, such that loss of both results in complete absence of pectoral fins. In contrast, mice exhibit a high degree of functional redundancy across Hox clusters. For example, deletion of all HoxB genes except Hoxb13 does not result in forelimb loss (Medina-Martinez et al., 2000), and forelimbs are still present in Hoxa5;Hoxb5;Hoxc5 triple knockouts (Xu et al., 2013). Thus, although we cannot fully explain why HoxB cluster deletions alone do not abolish forelimb formation in mice, it is plausible that overlapping functions from other Hox clusters compensate for the loss of HoxB genes, consistent with the general robustness of the mammalian Hox system. We have revised the Discussion to clarify this point.

      Is this down to exclusive expression of Hoxbs in the zebrafish pectoral fin forming region rather than a specific functional role of the protein? This is important as it has implications for the interpretation of results throughout the paper and could explain some apparently conflicting results.  

      We thank the reviewer for this insightful comment. To address this point, we newly analyzed the expression patterns of PG4–8 genes in the hoxba and hoxbb clusters. Our in situ hybridization results revealed that only hoxb4a, hoxb5a, and hoxb5b are detectably expressed in the pectoral fin buds (Figure 5C, 5E, Figure 7M-R). While we cannot completely exclude the possibility of functional differences among Hox proteins, our data strongly suggest that the loss of pectoral fins in hoxba;hoxbb cluster mutants is primarily due to the expression domains of these specific Hox genes in the fin-forming region, rather than to unique biochemical properties of the proteins. We have added these new data as a figure in the revised manuscript (Figure 7M-R) and clarified this point in the text (lines 312-316).

      Why is Hoxba more potent than Hoxbb? Is this because Hoxba has Hox4/5 present, while Hoxbb has only Hoxb5? (The Hoxba locus has retained many more Hox genes in the cluster than Hoxbb; therefore, one might expect to see greater redundancy in this locus.)

      We thank the reviewer for raising this important point. At present, we do not know the precise reason why hoxba appears more potent than hoxbb. The possibility raised by the reviewer, that differences in retained gene content (e.g., Hox4/5 in hoxba versus only Hoxb5 in hoxbb) may underlie this discrepancy, is certainly plausible. However, our previous study on the formation of dorsal and anal fins showed a similar situation: although PG11–13 Hox genes are present in both hoxca and hoxcb clusters, deletion of hox genes in the hoxca cluster had a more pronounced effect on median fin development (Adachi et al., 2024). This suggests that, following the teleost-specific whole-genome duplication, duplicated Hox clusters are not functionally equivalent, and asymmetric retention or deployment of functions may occur. The mechanistic basis of such bias remains unclear and warrants further investigation.

      Deletion of either Hoxa or Hoxd in the background of the Hoxba mutant does have some effect. Is this a reflection of protein function or expression dynamics of Hoxa/Hoxd genes?  

      We appreciate the reviewer’s comment and the opportunity to clarify this point. In Figure 2, we compared several double mutants with the hoxba single mutant. Among these, only the hoxba;hoxbb mutant exhibited a complete loss of tbx5a expression, whereas other combinations did not differ substantially from the hoxba mutant alone. Therefore, we consider that additional deletions such as hoxaa, hoxab, and hoxda do not have a strong effect beyond the hoxba deletion itself, and it is unlikely that Hoxa or Hoxd proteins functionally compensate for Hoxba in regulating tbx5a expression. Consistent with this interpretation, in our previous study we did not detect abnormalities in tbx5a expression in the hoxaa;hoxab;hoxda triple mutant (Ishizaka et al., 2024). Taken together, these observations support our view that the hoxba and hoxbb clusters are specifically required for the induction of tbx5a in the pectoral fin field.

      Can we really be confident that there is a "transformation of pectoral fin progenitor cells into cardiac cells"? 

      The failure to repress Nkx2.5 in the posterior (pelvic fin) domain is clear, but have these cells actually acquired cardiac identity? They would be expected to express Tbx5a (or b) as cardiac precursors, but this domain does not broaden. There is no apparent expansion of the heart (field)/domain or progenitors beyond the 16 somite stage. The claimed "migration" of heart precursors in the mutant is not clear. The heart/cardiac domain that does form in the mutant is not clearly expanded in the mutant. The domain of cmlc2 looks abnormal in the mutant, but I am not convinced it is "enlarged" as claimed by the authors. The authors have not convincingly shown that "the cells that should form the pectoral fin instead differentiate into cardiac cells."  The only clear conclusion is the loss of pectoral fin-forming cells rather than these fin-forming cells being "transformed" into a new identity. It would be interesting to know what has happened to the cells of the pectoral fin-forming region in these double mutants. 

      We sincerely thank the reviewer for this important comment. We agree that our data do not yet allow us to conclude with certainty that the presumptive pectoral fin progenitor cells in hoxba;hoxbb cluster mutants are fully “transformed” into cardiac cells. Our intention was to describe the striking posterior expansion of nkx2.5 expression and the altered morphology of the cmlc2-positive cardiac field in the mutants, which suggested a shift in cell fate. However, as the reviewer correctly points out, we did not directly demonstrate that the missing fin progenitors acquire bona fide cardiac identity.

      To address this, we have revised the text to clarify that the most robust conclusion from our current dataset is the loss of pectoral fin-forming cells in hoxba;hoxbb cluster mutants. We have softened or removed the claim of “transformation” and instead emphasize that our observations are consistent with an expansion of cardiac marker expression domains into the region where fin progenitors normally arise. We also acknowledge that the cmlc2 domain is abnormal rather than unequivocally enlarged, and have adjusted our wording accordingly.

      It is not clear what the authors mean by a "converse" relationship between forelimb/pectoral fin and heart formation. The embryological relationship between these two populations is distinct in amniotes.  

      We thank the reviewer for pointing this out. Our intention was to highlight the reciprocal balance between pectoral fin and cardiac progenitors in zebrafish. In particular, Waxman et al. (2008) demonstrated that retinoic acid signaling promotes pectoral fin formation while restricting the expansion of cardiac progenitors, thereby illustrating this reciprocal relationship. To avoid confusion, we have revised the text to explicitly state that this applies to zebrafish.

      The authors show convincing data that RA cannot induce Tbx5a in the absence of Hoxb clusters, but I am not convinced by the interpretation of this result. The results shown would still be consistent with RA acting directly upstream of tbx5a, but merely that RA acts in concert with hox genes to activate tbx5a. In the absence of one or the other, Tbx5a would not be expressed. It is not necessary that RA and hoxbs act exclusively in a linear manner (i.e., RA regulates hoxb that in turn regulates tbx5a).

      We appreciate the reviewer’s thoughtful comment. We agree that our original wording in the Results section implied a strictly linear model of RA→Hox→tbx5a. In response, we have revised the Results to state only the experimental observation, namely that RA-dependent induction of tbx5a does not occur in the absence of the hoxba and hoxbb clusters.

      We have moved the broader interpretation to the Discussion, where we now emphasize that our data are compatible with multiple models. One possibility is a linear pathway in which RA induces Hox expression that subsequently activates tbx5a. Alternatively, it is also plausible that RA induces Hox expression and that RA and Hox proteins act cooperatively to induce tbx5a. Our findings do not distinguish between these possibilities, and both models remain consistent with the data. We believe this restructuring addresses the reviewer’s concern by keeping the Results factual and limiting mechanistic interpretation to the Discussion.

      The authors have carried out a functional test for the function of hoxb6 and hoxb8 in the hemizygous hoxb mutant background. What is lacking is any expression analysis to demonstrate whether Hoxb6b or Hoxb8b are even expressed in the appropriate pectoral fin territory to be able to contribute to pectoral fin development, either in this assay or in normal pectoral fin development. 

      We thank the reviewer for emphasizing the importance of expression analyses. In response, we performed a comprehensive whole-mount in situ hybridization survey of all eight PG4–8 Hox genes from the hoxba and hoxbb clusters (hoxb4a, hoxb5a, hoxb5b, hoxb6a, hoxb6b, hoxb7a, hoxb8a, and hoxb8b) during pectoral fin development (18–30 hpf). Among these, only hoxb4a, hoxb5a, and hoxb5b displayed detectable expression in the developing pectoral fin buds. In contrast, hoxb6a, hoxb6b, hoxb7a, hoxb8a, and hoxb8b were not expressed in this territory. These new data have been incorporated into the revised manuscript (Fig. 7M-R). We believe that this dataset provides a more complete and systematic picture of which Hoxb genes are available to function in pectoral fin development, and we are grateful to the reviewer for this valuable suggestion, which significantly strengthened our study.

      (The term "compensate" used in this section is confusing/misleading.) 

      We thank the reviewer for this helpful remark. We agree that the term “compensate” was misleading in this context, as it could be confused with genetic compensation mechanisms such as transcriptional adaptation. To avoid this ambiguity, we have revised the wording.

      Specifically, we replaced “compensate for” with “mimic the effect of” or “phenocopy” depending on the context. We believe this revision improves clarity and prevents misunderstanding.

      The authors' confounding results described in Figures 6-7 are consistent with the challenges faced in other model organisms in trying to explore the function of genes in the hox cluster and the known redundancy that exists across paralogous groups and across individual clusters.  Given the experimental challenges in deciphering the actual functions of individual or groups of hox genes, a discussion of the normal expression pattern of individual and groups of hox genes (and how this may change in different mutant backgrounds) could be helpful to make conclusions about likely normal function of these genes and compensation/redundancy in different mutant scenarios.  

      We appreciate the reviewer’s thoughtful comment. We agree that functional analyses of Hox genes are often complicated by redundancy within and across clusters. In this revision, we have included additional expression data of PG4–8 genes from the hoxba and hoxbb clusters, showing that only hoxb4a, hoxb5a, and hoxb5b are expressed in the fin buds. Although we did not analyze expression changes across mutant backgrounds in this study, we consider this an important direction for future experiments.

      Reviewer #2 (Public review): 

      Summary: 

      The authors of this manuscript performed a fascinating set of zebrafish mutant analyses on hox cluster deletion and pinpointed the cause of the pectoral fin loss in one combinatorial hox cluster mutant of Hoxba and Hoxbb. 

      Strengths: 

      The study is based on a variety of existing experimental tools that enabled the authors' past construction of hox cluster mutants, and is well-designed. The manuscript is well written to report the authors' findings on the mechanism that positions the pectoral fin. 

      Weaknesses: 

      The study does not focus on hox clusters other than ba and bb, and is confined to the use of zebrafish, as well as the comparison with existing reports from mouse experiments.

      We thank the reviewer for the thoughtful and encouraging evaluation of our manuscript. We are pleased that the strengths of our study design and clarity of writing were recognized. We also acknowledge the noted limitations, and while our focus here is on zebrafish hoxba and hoxbb clusters, we agree that future studies should expand to other hox clusters and additional models. Below, we provide individual responses to the specific points raised.

      Reviewer #1 (Recommendations for the authors): 

      (1) Some additional expression analyses of Hoxb6/b8 etc, could be carried out to address some issues raised in the main review.  

      We thank the reviewer for this suggestion. In response, we performed additional whole-mount in situ hybridization analyses of PG4–8 genes from the hoxba and hoxbb clusters, including hoxb6b and hoxb8b. These experiments showed that only hoxb4a, hoxb5a, and hoxb5b are expressed in the developing fin buds, whereas hoxb6a, hoxb6b, hoxb7a, hoxb8a, and hoxb8b are not. We have incorporated these new data into the revised manuscript (Figure 7M-R), which we believe clarify why functional tests of hoxb6b and hoxb8b did not uncover specific requirements in fin development.

      (2) The discussion section, particularly the more speculative section on evolutionary significance, could be reduced. Discussion of pelvic fin could be removed also, as this has not and could not be addressed with the current experimental design.  

      We thank the reviewer for this helpful suggestion. In line with the recommendation, we have reduced the speculative section on evolutionary significance in the Discussion to make it more concise and focused. We have also removed the discussion of pelvic fins, as these were not directly addressed by our current experimental design. We believe these changes improve the clarity and focus of the Discussion section.

      (3) The conclusions on transformation to cardiac identity could be reevaluated and presented differently.  

      We appreciate the reviewer’s insightful comment. In the revised manuscript, we have toned down our interpretation regarding a transformation to cardiac identity. Instead, we now describe the findings more cautiously, emphasizing the clear loss of fin precursors rather than a definitive acquisition of cardiac fate. We believe this revision presents a more balanced interpretation of the data.

      (4) Minor typographical - I would suggest removing "Genetic Evidence:" from the title.  

      We appreciate the reviewer’s suggestion. In accordance with this comment, we have revised the title to: “HoxB-derived hoxba and hoxbb clusters are essential for the anterior-posterior positioning of zebrafish pectoral fins”.

      Reviewer #2 (Recommendations for the authors): 

      (1) The authors mention the redundancy (between the a type and b type) of Hox clusters derived from an additional whole genome duplication in the teleost fish lineage. But, they do not refer to whether the zebrafish Tbx5 ortholog has an additional copy. This information helps the readers' interpretation of the data presented. First of all, tbx5a suddenly appears on line 143 without introducing its relationship with Tbx5, which needs to be explained in a revised manuscript.  

      We thank the reviewer for highlighting this important point. In zebrafish, there are indeed two Tbx5 orthologs, tbx5a and tbx5b. In the revised manuscript, we have modified the text around line 124 to introduce tbx5a in the context of its orthology to Tbx5, ensuring that its appearance in the Results is clear to the readers.

      (2) I did not readily get whether the limb/fin 'positioning' that the authors focus on in this study is 'anteroposterior' positioning, but not anything else. If it is what is meant, the word 'anteroposterior' should just be inserted at the first appearance of the word 'positioning'.  

      We thank the reviewer for pointing this out. Our study specifically addresses the anteroposterior positioning of paired appendages, that is, how the initial site of pectoral fin formation is defined along the anterior–posterior axis of the body. To clarify this, we have revised the text to insert the word “anteroposterior” at the first appearance of the term “positioning” in both the Abstract and Introduction (lines 26 and 53). We believe this change resolves the ambiguity and makes the focus of our study explicit.

      (3) Figure 5B also shows the remarkable reduction of hoxc1a expression, which the authors do not mention at all. I wonder how this is explained and how the authors justify no remark on this throughout the manuscript. 

      We thank the reviewer for this insightful comment. As correctly noted, we did observe a marked reduction of hoxc1a expression in Figure 5B. However, based on our genetic analyses, we consider that the causal genes underlying the phenotype are most likely located in hoxba and hoxbb clusters. Therefore, although the change in hoxc1a expression is indeed a notable phenomenon, we did not emphasize it in the manuscript in order to maintain focus on the primary clusters responsible for the observed phenotype (lines 240-241). We agree that this point should be acknowledged, and we have now added a brief note in the Results to clarify our findings.

      (4) Figure 1 consists of multiple panels (A-M) but lacks panel D.  

      We apologize for the oversight. We have corrected it.

      (5) Line 85 - precise role -> exact role.  

      We have corrected it (line 95).

      (6) Line 87 - the vertebrate class Actinopterygii & the class Sarcopterygii. 

      We thank the reviewer for pointing this out. We have corrected it (lines 98-99).

      (7) Line 90 - homologous -> orthologous. 

      We have corrected it (line 102).

      (8) Figure 5 - For interpretability of the data, I suggest writing 'Paralogous groups' on the top of the panels A and B, and 'Cluster' vertically on the left.  

      We thank the reviewer for this helpful suggestion. As recommended, we have added “Paralogous groups” at the top of panels A and B, and “Clusters” vertically on the left side of Figure 5 to facilitate interpretation of the data.

      (9) Some subheading titles are too long. They can be shortened into 'hoxb5a and -b5b expression in pectoral fin buds are RA-dependent' instead of 'Expression patterns of hoxb5a and hoxb5b in pectoral fin buds are dependent on RA', for example.  

      We appreciate the reviewer’s suggestion regarding the length of the subheading titles. In response, we have shortened the relevant subheadings in both the Results and Discussion sections to make them more concise while retaining their scientific meaning. For example, the subheading originally written as “Expression patterns of hoxb5a and hoxb5b in pectoral fin buds are dependent on RA” has been revised to “hoxb5a/b5b expression in pectoral fin buds is RA-dependent.” Similar adjustments have been made to other subheadings throughout these sections. We believe these changes improve readability and consistency without altering the intended content.

      (10) Line 408 - why tetrapods, instead of cartilaginous fishes, which are thought of as natural in this context? 

      We appreciate the reviewer’s careful reading and insightful comment. However, in response to Reviewer 1’s suggestion, we have substantially reduced the speculative section on evolutionary significance in the Discussion. As a result, this specific part of the text has now been deleted. We thank the reviewer for raising this point.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript uses optical coherence tomography (OCT) to visualize tissue microstructures about 1-2 mm under the finger pad skin surface. Their geometric features are tracked and used to generate tissue strains upon skin surface indentation by a series of transparent stimuli both normal and tangential to the surface. Then movements of the stratum corneum and the upper portion of the viable epidermis are evaluated. Based upon this data, across a number of participants and ridges, around 300 in total, the findings report upon particular movements of these tissue microstructures in various loading states. A better understanding of the mechanics of the skin microstructures is important to understand how surface forces propagate toward the locations of mechanoreceptive end organs, which lie near the edge of the epidermis and dermis, from which tactile responses of at least two peripheral afferents originate. Indeed, the microstructures of the skin are likely to be important in shaping how neural afferents respond and enhance their sensitivity, receptive field characteristics, etc. 

      Strengths: 

      The use of OCT in the context of analyzing the movements of skin microstructures is novel. Also novel and powerful is the use of distinct loading cases, e.g., normal, tangential, and stimulus features, e.g., edges, and curves. I am unaware of other empirical visualization studies of this sort. They are state-of-the-art in this field.

      Moreover, in addition to the empirical imaging observations, strain vectors in the tissues are calculated over time. 

      Weaknesses: 

      The interpretation of the results and their framing relative to the overall hypotheses/questions and prior works could be articulated more clearly. In particular, the major findings of the manuscript are in newly describing a central concept regarding "ridge flanks," but such structures are neither anatomically nor mechanistically defined in a clear fashion. For example, "... it appears that the primary components of ridge deformation and, potentially, neural responses are deformations of the ridge flanks and their relative movement, rather than overall bending of the ridges themselves." From an anatomical perspective, I think what the authors mean by "ridge flanks" is a differential in strain from one lateral side of a papillary ridge to the other. But it is unclear what about the continuous layers of tissue would cause such behaviors. Perhaps a sweat duct or some other structure (not visible to OCT) would subdivide the "flanks" of a papillary ridge somehow? If not due to particular anatomy, then is the importance of the "ridge flank" due to a mechanistic phenomenon of some sort? Given that the findings of the manuscript center upon the introduction of this new concept, I think a greater effort should be made to define what exactly are the "ridge flanks." It is clear from the results, especially the sliding case, that there is something important that the manuscript is getting at with this concept. 

      We apologize for the confusion around our use of ‘ridge flanks’. To recap the overall goal briefly, we wanted to measure the deformation of papillary ridges and their associated sub-surface structures to different tactile stimuli. Capturing these deformations and comparing them against different proposed ideas, for example bending (horizontal shear) of the entire ridge versus differential deformations of different sub-parts, constrains neural activation mechanisms, has implications for how well tactile stimuli can be spatially resolved on the skin, and for whether sub-surface deformations can be easily predicted from surface movements alone. Our mesh was dense enough to compare the stratum corneum and the viable epidermis directly, where we expected some differences due to their previously documented mechanical differences, as well as the ridge flanks, which refer to the two (proximal and distal) sides of a single papillary ridge and their associated structure in the SC and VE (as correctly surmised by the reviewer). Differential behaviour across ridge flanks might be expected because various observations of the surface of the stratum corneum had suggested mechanical differences between the papillary ridges and the grooves dividing them, potentially leading to differential deformations of the two flanks depending on the direction in which each faces tissue with different mechanical properties.

      We now provide a clearer definition of ridge flanks in Figure 1 and in the main text. Importantly, existing prior research is better connected to our own investigation in the Introduction and we now specifically explain why we investigate ridge flanks.

      The OCT used herein cannot visualize deep and fully into what the manuscript refers to as a "ridge" (note others have previously broken this concept apart into "papillary", "intermediate" and "limiting" ridges) near where the mechanoreceptive end organs lie at the epidermal-dermal border. Therefore, the OCT must make inferences about the movements of these deeper tissues, but cannot see them directly, and it is the movements of these deeper tissues that are likely driving the intricacies of neural firing. Note the word "ridge" is used often in the manuscript's abstract, introduction, and discussion but the definition in Fig. 1 and elsewhere differs in important ways from prior works of Cauna (expert in anatomy). Therefore, the manuscript should clarify if "ridge" refers to the papillary ridge (visible at the exterior of the skin), intermediate ridge (defined by Cauna as what the authors refer to as the primary ridge), and limiting ridge (defined by Cauna as what the authors refer to as the secondary ridge). What the authors really mean (I think) is some combination of the papillary and intermediate ridge structures, but not the full intermediate ridge. The manuscript acknowledges this in the "Limitations and future work" section, stating that these ridges cannot be resolved. This is important because the manuscript is oriented toward tracking this structure. It sets up the narrative and hypotheses to evaluate the prior works of Cauna, Gerling, Swensson, and others who all directly addressed the movement of this anatomical feature which is key to understanding ultimately how stresses at these locations might move the peripheral end organs (i.e., Merkel cells, Meissner corpuscles). 

      Thank you for these observations. Indeed, our terminology was not consistent. We have now switched to Cauna’s terminology and added additional labels in Figure 1, explaining all mentioned structures in the main text. We have also changed the language in many instances in the main text to make it clearer whether we are referring to individual anatomical ridges (papillary, limiting, etc.) or the whole structure. Additionally, it is now clearer from the start which features are tracked, and we specifically state that intermediate ridges are excluded from our tracking.

      Regarding the intermediate ridge, it indeed plays a big role in Cauna’s lever hypothesis. Given the intermediate ridge is excluded from our analysis, we can neither prove nor disprove this hypothesis in our current work. However, there are many mechanical mysteries to solve regarding the structures directly above, which are the main focus of this paper. We have rewritten the introduction to make these questions clearer. For example, Cauna observed pliability of the papillary ridges in surface experiments. Swensson found differential expression patterns of keratin in epidermis tissue in and above the intermediate ridges, but the direct mechanical consequences that are proposed in their paper concern the behaviour of papillary ridges, rather than relying on a mechanical role of intermediate ridges. Even Cauna’s lever idea implies specific deformation of the stratum corneum, which would be measurable in our study, as the upper handle of the ‘lever’ needs turning. We observed little movement in accordance with this idea, putting the lever mechanism into question. While this does not rule out a mechanical role of the intermediate ridge, these findings constrain its potential mechanisms.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors investigate sub-skin surface deformations to a number of different, relevant tactile stimuli, including pressure and moving stimuli. The results demonstrate and quantify the tension and compression applied from these types of touch to fingerprint ridges, where pressure flattens the ridges. Their study further revealed that on lateral movement, prominent vertical shearing occurred in ridge deformation, with somewhat inconsistent horizontal shear. This also shows how much the deeper skin layers are deformed in touch, meaning the activation of all cutaneous mechanoreceptors, as well as the possibility of other deeper non-cutaneous mechanoreceptors. 

      Strengths: 

      The paper has many strengths. As well as being impactful scientifically, the methods are sound and innovative, producing interesting and detailed results. The results reveal the intricate workings of the skin layers to pressure touch, as well as sliding touch over different conditions. This makes it applicable to many touch situations and provides insights into the differential movements of the skin, and thus the encoding of touch in regards to the function of fingerprints. The work is very clearly written and presented, including how their work relates to the literature and previous hypotheses about the function of fingerprint ridges. The figures are very well-presented and show individual and group data well. The additional supplementary information is informative and the video of the skin tracking demonstrates the experiments well. 

      Weaknesses: 

      There are very few weaknesses in the work, rather the authors detail well the limitations in the discussion. Therefore, this opens up lots of possibilities for future work. 

      We thank the reviewer for these encouraging comments.

      Impact/significance: 

      Overall, the work will likely have a large impact on our understanding of the mechanics of the skin. The detail shown in the study goes beyond current understanding, to add profound insights into how the skin actually deforms and moves on contact and sliding over a surface, respectively. The method could be potentially applied in many other different settings (e.g. to investigate more complex textures, and how skin deformation changes with factors like dryness and aging). This fundamental piece of work could therefore be applied to understand skin changes and how these impact touch perception. It can further be applied to understand skin mechanoreceptor function better and model these. Finally, the importance of fingertip ridges is well-detailed, demonstrating how these play a role in directly shaping our touch perception and how they can shape the interactions we have with surfaces. 

      Reviewer #3 (Public Review): 

      Summary: 

      The publication presents unique in-vivo images of the upper layer of the epidermis of the glabrous skin when a flat object compresses or slides on the fingertip. The images are captured using OCT, and are processed to recover the strain that fingerprints experience during the mechanical stimulation. 

      The most important finding is, in my opinion, that fingerprints undergo pure compression/tension without horizontal shear, hinting at the fact that the shear stress caused by the tangential load is transferred to the deeper tissues and ultimately to the mechanoreceptors (SA-I / RA-I). 

      Strengths: 

      Fascinating new insights into the mechanics of glabrous skin. To the best of my knowledge, this is the first experimental evidence of the mechanical deformation of fingerprints when subjected to dynamic mechanical stimulation. The OCT measurement allows an unprecedented measurement of the depth of the skin whereas previous works were limited to tracking the surface deformation. The robust data analysis reveals the continuum mechanics underlying the deformation of the fingerprint ridges. 

      Weaknesses: 

      I do not see any major weaknesses. The work is mainly experimental and is rigorously executed. Two points pique my curiosity, however: 

      (1) How do the results presented in this study compare with previous finite element analysis? I am curious to know if the claim that the horizontal shear strain is transferred to the previous layer is also captured by these models. The reason is that the FEA models typically use homogeneous materials and whether or not the behavior in-silico and in-vivo matches would offer an idea of the nature of the stratum corneum. 

      Very few modeling studies have examined combined normal and tangential loading of the fingertip. Additionally, results are often expressed in terms of Von Mises stresses, and not deformation [1,2], making direct comparison challenging. Nevertheless, one multilayered study [3] supports our finding that the largest deformations are found in deeper tissues.

      [1] Shao, F., Childs, T. H. C., Barnes, C. J. & Henson, B. Finite element simulations of static and sliding contact between a human fingertip and textured surfaces. Tribology International 43, 2308–2316 (2010).

      [2] Tang, W. et al. Investigation of mechanical responses to the tactile perception of surfaces with different textures using the finite element method. Advances in Mechanical Engineering 8, (2016).

      [3] Amaied, E., Vargiolu, R., Bergheau, J. M. & Zahouani, H. Aging effect on tactile perception: Experimental and modelling studies. Wear 332–333, 715–724 (2015). 

      (2) Was there a specific reason why the authors chose to track only one fingerprint? From the method section, it seems that nothing would have prevented tracking a denser point cloud and reconstructing the strain on a section of the skin rather than just one ridge. With such data, the authors could extend their analysis to interactions between multiple ridges and get a better sense of the behavior of the entire strip of skin. 

      We apologise for the confusion regarding this point. While in our illustration and the accompanying videos, we only show a single tracked ridge for clarity, we do indeed track all visible ridges in every frame. As imaging slices were 4 mm wide, often 8-9 ridges were visible concurrently. However, during the sliding experiments the skin was sometimes dragged along with the stimulus, causing some ridges to disappear from view for certain periods and then re-enter the frame. This would make it difficult to expand the analysis to multiple ridges, but in any case, we found neighbouring ridges to behave very consistently within a given trial, so that their mechanical behaviour (relative to the tactile feature, if any) could be averaged in the analysis.

      Reviewer #1 (Recommendations For The Authors): 

      Discussion, line 213, "Thus, the primary mechanism through which the ridge conforms to the object involves the relative movement and shearing of the ridge flanks, rather than relying on the grooves as articulated joints." I don't see this as definitely proven in the imaging and analysis. This could be a hypothesis to come from this work for further evaluation but is a quite strong statement not obviously supported by the evidence. 

      We have rephrased this statement as a proposal for further testing:

      “Therefore, we propose that the primary mechanism through which a ridge conforms to an object might involve the relative movement and shearing of the ridge flanks, rather than relying on the grooves as articulated joints.”

      Discussion, line 220, "Our findings strongly indicate that the majority of the surface movement of the skin was absorbed by deeper tissue rather than surface layers of the skin." But since there are no measurements of such tissues, or of collagen bundle tightening, etc. it is not obvious to me how this can be proven as it is not directly observable and was not modeled. 

      We have reworded this paragraph to be more cautious and have included potential avenues for future testing of this idea:

      “It is possible that the majority of the surface movement of the skin was absorbed by deeper tissues rather than the surface layers of the skin imaged in the present study. If that is the case, recent modeling work has suggested that tissue deformations are highly dependent on the orientation of collagen fibers in these tissues (Duprez et al., 2024), which might be amenable to tracking in future OCT work to test this idea directly. Additionally, previous work investigating tactile afferent responses to tangential skin movements has reported strong activation of SA-2 receptors, thought to measure skin stretch mainly in deeper tissues (Saal et al., 2025), providing further indirect evidence.”

      Figure 1, A. As noted elsewhere, there are issues with the naming of the anatomy, and there is no definition of the concept of "ridge flanks." Also, it does not indicate the depth point to which OCT can resolve. 

      We have updated and expanded the labels in Figure 1A to clarify the anatomy (along with changes in the text described above). Figure 1C now includes a sentence about the resolvability of features below the mesh:

      “Detail view of a single OCT frame showing ridged skin structure and clear boundary between the stratum corneum and viable epidermis. A mesh covering the stratum corneum and the upper part of the viable epidermis (without the intermediate ridge) is overlaid spanning a single papillary ridge. The border between the viable epidermis and dermis is less clearly delineated, and some deeper features are resolved less well.”

      The concept of a ridge flank is now illustrated in Figure 1B(i) and Figure 1B(iv), and referred to in both the caption and main text. Updated figure caption text:

      “These deformations need not apply to the whole ridge structure but might affect different parts separately, e.g. via shearing in different directions across both ridge flanks as shown on the far right (see darker shading to highlight a single ridge flank).”

      Updated text in the main manuscript:

      “Additionally, if there are indeed mechanical differences between papillary ridges and their neighbouring grooves at the level of the stratum corneum, this might result in differential movements of the two sides of each papillary ridge, here referred to as ridge flanks (see Figure 1B-iv, right, for a potential example).”

      Note that Figure 4B also includes an illustration of this concept.

      Figure 1, B. This mechanical representation does not capture the entirety of the papillary-intermediate ridge unit in question, as set up by the authors in the introduction. Also, in the caption it is not ridge deformation, but upper SC and VE deformation. And the OCT cannot resolve the whole ridge. 

      We have reworded the figure caption:

      “Potential deformations of the tracked ridge structure, including the stratum corneum and the bulk of the viable epidermis, during tactile interactions, with arrows indicating the directions of relative deformation. [...]”

      Importantly, the main manuscript text has been rewritten in the introduction section to clarify our research question and how much of the sub-surface ridge structure is tracked:

      “From a mechanical standpoint, these conflicting interpretations raise the question of how the outermost two skin layers typically deform at the resolution of single papillary ridges, whether by tension, compression, or shear (see examples in Figure 1B). Additionally, such deformations might apply to individual papillary ridges and all their sub-surface structures equally, for example horizontal shearing that bends the papillary ridge in a certain direction, while levering its sub-surface aspects in the opposite direction. Conversely, individual parts of the ridge structure might deform differently. For example, the viable epidermis might deform to a different extent or in different directions due to its lower stiffness and different morphology. Additionally, if there are indeed mechanical differences between papillary ridges and their neighbouring grooves at the level of the stratum corneum, this might result in differential movements of the two sides of each papillary ridge, here referred to as ridge flanks (see Figure 1B-iv, right, for a potential example). To empirically address these questions, we employed Optical Coherence Tomography (OCT) to precisely measure the sub-surface deformation of individual fingerprint ridges in response to a variety of mechanical events. Specifically, we focused on the stratum corneum and the bulk of the viable epidermis (excluding intermediate ridges), which could be robustly resolved and tracked by our setup.”

      Figure 1, C: While it is noted in the caption that the locations of the intermediate and limiting ridges, as well as the collagen bundles, are clearly visible, it is not clear to me, although the caption uses these words. This is especially the case below the orange mesh. From the picture, and because this is not labeled, it leaves it up to my interpretation, it seems like the secondary ridge (limiting) is larger than the primary (intermediate). 

      We have reworded the caption as follows:

      “Detail view of a single OCT frame showing ridged skin structure and clear boundary between the stratum corneum and viable epidermis. A mesh covering the stratum corneum and the upper part of the viable epidermis (without the intermediate ridge) is overlaid spanning a single papillary ridge. The border between the viable epidermis and dermis is less clearly delineated.”

      Indeed, while the intermediate ridge was often visible in the OCT images, its size was rather inconsistent and it could appear as larger or smaller than the limiting ridge, while in histological images it is generally shown as larger (however note that there is somewhat limited data). This difference might be due to imaging artifacts, e.g. limited visibility into the deeper tissues, might reflect individual differences between participants, or could indicate that intermediate ridges are not of a consistent height in the (out-of-plane) direction along a given ridge. We have clarified this in the Limitations section of the Discussion:

      “[...] while we could confidently track landmarks associated with the stratum corneum, we could not reliably identify intermediate ridges in the viable epidermis, though they were visible in some of the frames, limiting the depth of the fitted mesh. We hypothesize that the additional depth of these ridges combined with their slender morphology might have degraded the signal. 3D OCT imaging (see below) might help to resolve these features in future work and settle open questions regarding their precise morphology.”

      Figure 1, D, and E: How do these measurements compare with the literature? They seem reasonable to me based on a cursory review, but there is a need to directly compare, especially since measurements in this context with the OCT are novel and could be valuable. 

      We have clarified this in the main text and added more references to the existing literature:

      “We measured an average ridge width of 0.47 mm across participants (Figure 1D), consistent with previous studies (Moore, 1989; Ohler and Cummins, 1942). Average skin layer thickness was 0.38 mm for the stratum corneum and 0.12 mm for the viable epidermis across our dataset (Figure 1E), again in agreement with previous studies using both in vivo imaging and ex vivo histology (Fruhstorfer et al., 2000; Lintzeri et al., 2022; Maiti et al., 2020).”

      Abstract 4th sentence's structure makes me think that hundreds of individual fingerprint ridges can be tracked at the same time. Perhaps it could be tweaked to clearly indicate that hundreds were tracked between trials between participants. 

      We have changed the sentence to now read:

      “Here, we used optical coherence tomography to image and track sub-surface deformations of hundreds of individual fingerprint ridges across ten participants and four individual contact events at high spatial resolution in vivo.”

      Introduction, 1st sentence, the fingertip per se is not an organ, though the skin is an organ. 

      Changed the wording from “organ” to “structure”.

      Introduction, 1st sentence, "... that convert skin deformations ..." Need to add word skin to be clear. 

      Done.

      Introduction, 3rd paragraph, "Alternately, the grooves may be stiffer or less ...". In this paragraph, and this sentence in particular, Cauna is cited and the words grooves and ridges are used. But this is not adequately explained. Cauna had distinct terminology, where he referred to papillary, intermediate, and limiting ridges, that exist in addition to ready ridges. It is important because the manuscript uses the word "ridges" in a non-specific way. This is done not just here but throughout the manuscript, and is central to the questions which can be addressed with OCT. 

      Anatomy has been better defined and more extensively labelled in Figure 1A, including labels for ‘papillary ridges’ and ‘grooves’. We have reworded this paragraph to better explain the concepts and how they relate to the subsequent analyses in the paper:

      “Consequently, the mechanical response of the skin below its immediate surface remains largely unknown, leading to conflicting interpretations in the literature. For instance, it has been proposed that the papillary ridges are stiffer than the neighbouring grooves (Swensson et al., 1998), which might imply that normal loading of the skin might not affect the ridges’ profile appreciably. Conversely, other observations have suggested that the grooves are relatively stiff, allowing the papillary ridges to deform considerably (Cauna, 1954; Johansson and LaMotte, 1983). However, the sub-surface consequences of this putative pliability during object contact or stick-to-slip transitions (see e.g. Delhaye et al., 2016) are unclear: the whole ridge structure might bend as proposed in Cauna’s lever mechanism (Cauna, 1954), but this view has proved controversial (see e.g. Gerling and Thomas, 2008), with direct empirical evidence lacking.”

      Figure 1. Avoid red-green dots for colorblind accessibility. PMMA is not in the caption. 

      We have switched the colors of the mechanoreceptors in panel A to a colorblind-friendly scheme. We now also specify the material of the plates in the figure 1 caption.

      Results, line 102. "... papillary ridge structure...." Is this the ridge to which is being referred? 

      In conjunction with the updated labeling in Figure 1A, we have updated the terminology throughout the paper to be more consistent.

      Results, line 99. "We noted a small increase in the area of the stratum corneum, which was likely an artifact due to the fit of the mesh to the ridge's curvature ..." There is very little discussion of Fig. F's finding related to an increase in area in the SC and decrease in the VE. It makes me question if this finding in this panel is an artifact. With stiff tissue like stratum corneum, how would the area increase? 

      This finding could be a measurement artifact or it could be the result of skin from neighbouring regions pushing into the imaged space. We have reworded the brief description in the Results:

      “We noted a small increase in the area of the stratum corneum, which was possibly an artifact due to the imperfect fit of the mesh to the ridge's curvature (but see Discussion for an alternative explanation).”

      Additionally, we have added a short section in the Discussion in the Limitations section:

      “Some of our tactile interactions might have caused skin deformations out-of-plane that were thus not measurable. For example, the slight increase in thickness of the stratum corneum under normal load might be explained as a measurement artifact due to the coarse nature of the mesh fitted, but could alternatively reflect tissue from out-of-plane regions pushing into the imaged space. Indeed, recent surface measurements of the skin's behaviour during initial object contact have reported compression of the skin in the plane parallel to its surface (Doumont et al., 2025), which would result in increasing thickness, assuming that the stratum corneum is incompressible. Future studies could consider creating three-dimensional reconstructions of the fingerprint structure to study such effects.”

      Figure 3. The colors used in slip and stick are not colorblind accessible. 

      We have changed the background colors in Figure 3A,B,C to a colorblind accessible version.

      Results, line 151, "Thus, most of this shearing must be sustained by deeper tissues." But there are no direct observations as such. Also, in the next sentence, "collagen fiber bundles" are referred to in a non-specific way. This section is highly speculative with no systematic visualization of these structures, and should probably be moved to the discussion. 

      We have reworded this sentence to be more cautious. We have now also highlighted collagen fiber bundles visible in the figure. Systematic analysis of these is beyond the scope of the present study, as these were not tracked, but might be possible in future studies. The reworded sentence reads as follows:

      “Thus, it is possible that shearing is sustained by deeper tissues, an effect that could be tested in future studies by directly tracking the angle and orientation of collagen fiber bundles anchoring the epidermis to deeper tissues (see highlighted examples in Figure 3B).”

      Results, line 161, "Horizontal shear ..." do you mean surface shear, per the Fig. 1 definition? 

      For consistency, we have changed the labels to ‘Horizontal shear’ and ‘Vertical shear’ in Figure 1A(iii) and Figure 1A(iv) as these are the terms used throughout the paper.

      Discussion, line 198, "... flatten even at relatively low forces." This is an interesting point and it would be useful to note how low exactly. 

      We have reworded this sentence to better reflect the findings described earlier:

      “We found that individual ridges tended to flatten considerably at relatively low forces of 0.5 N, with higher forces increasing deformations only moderately.”

      Reviewer #2 (Recommendations For The Authors): 

      Minor comments that could improve the paper even further 

      In the abstract, it may be good to specify that the stimuli were all applied to the finger, this was not an active, self-generated tactile interaction, e.g. change 'in response to a variety of tactile stimuli' to 'in response to a variety of passively-applied tactile stimuli'. 

      Done.

      Comment on the grey/blue colours in the figures. I like the combination of blue/orange for different conditions, but sometimes the blue is very difficult to see against the grey background. Is there any way of making the grey background shading lighter and/or the blue darker/more vivid?

      We have changed the color of the SC mesh to a darker shade of blue, which is more easily distinguished from the grey background. This applies to figures 2B/C, 3D, 4A/B/D/E, and all supplementary figures.

      Methods. Could you please add a little more detail about exactly where the images were taken, e.g. in the exact middle of the fingerpad, at the fingertip? Did you line up the skin fingerprint ridges to be in a plane? It is just to better understand how the stimulus moved against the skin, which itself is rounded, and whether it was at a point where the ridges were relatively linear or curved. 

      We have added the following text in the “Experimental set-up” section of the Methods:

      “The participant's finger was secured in a finger holder, which was positioned in such a way that the flat part of the fingertip distal to the whorl made initial contact with the plate as it was lowered onto the fingertip. The scanner was positioned such that its scan path aligned with the distal-proximal axis of the plate, targeting the centre line of the fingerpad so that the fingerprint ridges were oriented orthogonally to the line scan.”

      and

      “For these experiments, imaging focused on the central flat part of the contact area, such that all fingerprint ridges visible in the imaged region were in contact with the plate throughout the trial.”

      Methods. There is no section about statistics, yet you do use them in the paper. It may be good to add a few details in the methods to outline the package you used to do the statistics, as well as why you chose the tests you carried out. 

      We have added a new Statistics section at the end of the Methods:

      “Statistical tests were run in Python using the scipy.stats package. As distributions were skewed, we used non-parametric analyses throughout the study. Bonferroni corrections were used when multiple comparisons were made.”
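
      As an illustration only, the kind of analysis described in this Statistics section could be sketched as follows. The quoted text names the package (scipy.stats), the use of non-parametric tests, and Bonferroni correction, but not the specific tests; the choice of the Mann-Whitney U test, the helper names `bonferroni` and `compare_conditions`, and the synthetic data are all assumptions made for this sketch and are not taken from the manuscript.

```python
import numpy as np
from scipy import stats


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of
    comparisons and cap the result at 1."""
    p = np.asarray(p_values, dtype=float)
    return np.minimum(p * p.size, 1.0)


def compare_conditions(baseline, conditions):
    """Compare each condition against a baseline with a two-sided
    Mann-Whitney U test (an assumed choice of non-parametric test),
    then Bonferroni-correct across all comparisons."""
    raw = [
        stats.mannwhitneyu(baseline, values, alternative="two-sided").pvalue
        for values in conditions.values()
    ]
    return dict(zip(conditions.keys(), bonferroni(raw)))


# Synthetic, skewed (log-normal) samples standing in for real measurements.
rng = np.random.default_rng(0)
baseline = rng.lognormal(mean=0.0, sigma=0.5, size=40)
conditions = {
    "condition_a": rng.lognormal(mean=0.0, sigma=0.5, size=40),  # same distribution
    "condition_b": rng.lognormal(mean=1.0, sigma=0.5, size=40),  # shifted distribution
}

adjusted = compare_conditions(baseline, conditions)
for name, p in adjusted.items():
    print(f"{name}: Bonferroni-adjusted p = {p:.3g}")
```

      For paired designs (for example, the same ridge measured before and after loading), `scipy.stats.wilcoxon` would be the corresponding non-parametric test; the correction step stays the same.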

      A very minor point. Discussion, line 210: 'In this study...' is vague, which study exactly? It is preferable to be more precise, e.g. 'In the present/current study...'. 

      Fixed.

      Discussion. One point you may want to add is the possibility of looking at other skin regions. For example, would this approach work on the palm, on border glabrous/hairy skin, on various hairy skin sites, and on the foot? The possibilities could be endless if it could be applied anywhere, but it may depend on the technical positioning and skin itself. However, it would be interesting to know. 

      We have added the following text at the end of the Discussion section:

      “Finally, while we focused on the fingertip only, many other skin regions present interesting mechanical challenges waiting to be explored. The general ridged structure observed on the fingertip is common to all glabrous skin, but the local ridge mechanics might still differ: glabrous skin on the foot sole exhibits some morphological differences in order to support large weights that might well influence its mechanical response (Boyle et al., 2019). For example, the morphology of transverse ridges (running orthogonal to and connecting limiting with intermediate ridges) differs across regions on the foot sole (Nagashima and Tsuchida, 2011) and very likely from the hand (Yamada et al., 1996). Our method should be directly applicable to study deformations of these ridges, though three-dimensional observations might be needed to resolve some of the open questions. Hairy skin in contrast differs from glabrous skin in that the stratum corneum is much thinner. It also lacks the clearly organised ridge structure, but exhibits more loosely oriented skin folds instead, which very likely also serve a mechanical function (Leyva-Mendivil et al., 2015) and in principle are amenable to study using OCT.”

      In the last lines of the discussion, you mention the possible effects of skin moisturization. The Tomlinson et al. paper refers to the hydration of the skin with regard to water, which I would say is a slightly different factor. I think you can mention this paper and talk about the water level of the skin/hydration, but also add specifically that moisturization (i.e. by an emollient, humectant, or occlusive substance) is another factor to consider (e.g. effects found by Dione et al, 2023 Sci Rep). Overall, these two points relate to the dryness of the skin and the humidity of surfaces being contacted, therefore you could expand on both. 

      Thank you for the correction! We now mention both skin hydration and moisturization separately in this section.

    1. . inside this room

      1. enter this room 2. near these cars 3. on this table 4. these boys' shoes 5. We leave home early 6. Who lives in this house 7. Do you believe this man

    1. 1 Quarto Quarto enables you to weave together content and executable code into a finished document. To learn more about Quarto see https://quarto.org. 2 Running Code When you click the Render button a document will be generated that includes both content and the output of embedded code. You can embed code like this: Show the code 1 + 1 [1] 2 You can add options to executable code like this [1] 4 The echo: false option disables the printing of code (only output is displayed).

      I would delete this section because it does not pertain to your work. It also displays as the only headings on your table of contents.

    1. Reviewer #2 (Public review):

      Summary:

      The authors utilize large-volume electron microscopy ("connectomics") data to address how circuits remain stable during development. They focus on the development of the Drosophila nociceptive circuit between larval stages L1 and L3. Their analyses focus on changes to pre- and post-synaptic circuit partners (i.e., pre-synaptic axons and post-synaptic dendrites), and they systematically rule out candidate changes to both that could balance circuits. Ultimately, they find that the change in axonal growth (i.e., cable length) is mismatched with dendritic growth, but that this is balanced by an increase in the synapse density of pre-synaptic axons.

      Strengths:

      The authors used connectomics, the gold standard for neural circuit tracing, to conduct their analyses, and thus their results are strongly supported by the quality of the data. They carefully eliminated several models for how pre- and post-synaptic changes could co-develop to preserve circuit stability until they identified a major driver in changes in the timing of axon development relative to dendritic development. I also admired their willingness to be transparent about the limitations of their studies, including a lack of analyses of changes to inhibitory inputs and a lack of dynamics in their data. Overall, it's difficult to argue their results are wrong, but they may be incomplete. That said, it's difficult to account for every variable, and they covered the more salient topics, and it's my opinion that this is an important contribution that moves the field forward while also being careful to note its limitations that could and should motivate future work.

      Weaknesses:

      I identified a few weaknesses that could benefit from revisions:

      (1) I found parts of the text confusing, verging on misleading, specifically as it relates to other species. For example, in Line 93, the authors state that they have shown that synapses per unit dendrite length remain remarkably constant across species and brain regions. This was mentioned throughout the manuscript, and it wasn't clear to me whether this was referring to across development or in adults. If across development, this contrasts with other recently published work of our own comparing synapse densities in the developing mouse and rhesus macaque. Whether they are different or the same is equally interesting and should be discussed more clearly. Related to this, it's not clear that mammalian circuits remain stable over development. For example, our work shows that the ratio of excitatory and inhibitory synapses changes quite a lot in developing mice and primates.

      (2) I was not convinced by the use of axon-dendritic cable overlap. While axons and dendrites certainly need to be close together to make a synapse, I don't understand why this predicts they will connect. In connectomic data, axons pass by hundreds if not thousands of potential post-synaptic partners without making a synapse. Ultimately, the authors' data on changes in axon cable length between L1 and L3 would predict more overlap, but I found the use of overlap confusing and unnecessary, relative to the concreteness of their other analyses. I would suggest removing this from their analyses or providing a stronger argument for how overlap predicts connectivity.

      (3) Figure 7. For non-computational neuroscientists, I think it would be tremendously helpful to include a table that outlines the metrics you used. The text states you constrained these models with your EM data, but it would be helpful to summarize the range of numerical data you used for each parameter.

      (4) The most important finding to me was the asymmetry between axon and dendrite development. Perhaps beyond the scope of this work, it raises the question of whether there are privileged axons that uniquely increase their synapse density. Figure 5D alludes to this, where the fold change in cable length is not proportional to the change in synapse density. Could it be that over development, specific inputs become dominant while others prune their synapses, resulting in an overall balanced circuit, but dominance of specific partners changes? Either answer (i.e., yes, there are privileged circuits that emerge from L1 to L3, or no) would be very interesting and greatly elevate the significance of this work.

      (5) Related to my comment #1, can the authors comment on whether these changes are unique to Drosophila nociceptive circuits? Do all circuits remain balanced over development in flies? Finally, could you clarify why L1 to L3 was chosen?

    1. Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the Drift and Diffusion Model (DDM) introduced by Ratcliff and McKoon in 2008, the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options in individuals with bulimia nervosa (BN) and healthy participants.

      (2) The weighting of the perceived healthiness and tastiness of these options.

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has potential to improve the understanding of pathological food choices.

      Weaknesses:

      I thank the authors for revising their manuscript.

      However, I still have major concerns.

      The authors say that they removed any causal claims in their revised version of the manuscript. The sentence before the last one of the abstract still says "bias for high-fat foods predicted more frequent subjective binge episodes over three months". This is a causal claim that I already highlighted in my previous review, specifically for that sentence (see my second sentence of my major point 2 of my previous review).

      I also noticed that a comment that I added was not sent to the authors. In this comment, I highlighted that, based on Figure 2 of Gianini et al., I was uncertain whether the average negative rating after the induction differed between the neutral and negative inductions in the BN group (i.e. comparing the negative rating after negative induction in BN to the negative rating after neutral induction in BN). From Figure 2 of Gianini et al., it looks to me that:

      (1) The BN participants were more negative before the induction when they came to the neutral session than when they came to the negative session.

      (2) The BN participants looked almost equally negative (taking into account the error bars reported) after the induction in both sessions.

      These observations are of high importance because they suggest that BN patients were likely in a similar negative state when performing the food decision task in both conditions (negative and neutral). Therefore, the lack of difference in food choices in BN patients is unsurprising and nothing could be concluded from the DDM analyses. Moreover, the strong negative ratings of BN patients in the neutral condition as compared to healthy participants, together with almost similar negative ratings after the two inductions, contradict the last sentence of the authors' abstract.

      I appreciate that the authors reproduced an analysis of their initial paper regarding the negative ratings (i.e. Table S1). It partly answers my aforementioned point but does not address the fact that BN patients may have been in a similar negative state in both conditions (neutral and negative) when running the food decision task: if BN patients were similarly negative after both inductions (neutral and negative), nothing can be concluded from the differences in their DDM results. As the authors put it, "not all loss-of-control eating occurs in the context of negative state"; I add that far from all negative states lead to loss-of-control eating in BN patients. This underlies all my aforementioned remarks and the remarks in my first review.

      A solution for that is to run a paired t-test in BN patients only comparing the score after the induction in the two conditions (neutral and negative) reported in Figure 2 of their initial article.
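
      A minimal sketch of the suggested check, assuming one post-induction negative-affect score per BN participant in each session. The function name compare_post_induction and the ratings below are hypothetical placeholders, not data from the study:

```python
import numpy as np
from scipy import stats


def compare_post_induction(after_neutral, after_negative):
    """Paired t-test on post-induction scores from the same participants.

    Both inputs must list participants in the same order. Hypothetical helper.
    """
    neutral = np.asarray(after_neutral, dtype=float)
    negative = np.asarray(after_negative, dtype=float)
    if neutral.shape != negative.shape:
        raise ValueError("Each participant needs one score per session.")
    t_stat, p_value = stats.ttest_rel(negative, neutral)
    return {
        "t": float(t_stat),
        "p": float(p_value),
        "mean_diff": float(np.mean(negative - neutral)),
    }


# Placeholder ratings for six participants (illustrative only).
neutral_scores = [30.0, 42.0, 35.0, 50.0, 28.0, 44.0]
negative_scores = [33.0, 41.0, 39.0, 52.0, 30.0, 47.0]
result = compare_post_induction(neutral_scores, negative_scores)
print(result)
```

      Because the test works on within-participant differences, it respects the pairing of the two sessions; with skewed ratings, scipy.stats.wilcoxon would be the non-parametric counterpart.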

      I appreciate the analysis that the authors added with the restrictive subscale of the EDE-Q. That this analysis does not show any association with the parameters of interest does not show that there is a difference in the link between self reported restrictions and self reported binges. Only such a difference would allow us to claim that the results the authors report may be related to binges.

      I appreciate the wording of the authors' answer to my third point: "the results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic, but the results do not allow us to determine whether this reactivity causes the symptoms". This sentence is crystal clear and sums up very well the limits of the associations the authors report with binge eating frequency. However, I do not see this sentence in the manuscript. I think the manuscript would benefit substantially from adding it.

      Statistical analyses:

      If I have understood the mixed models correctly, the analyses in supplementary tables S1 and S27 to S32 treat all measures as independent. This means that the scores for each condition (neutral vs negative) and each time (before vs after induction), although rated by the same participants, are considered independent. This type of analysis does not take into account the potential correlation between the 4 scores of a given participant. As a consequence, results may lead to false positives that a linear mixed model does not address. The appropriate analysis would be to run adapted statistical tests pairing the data without running any mixed model.

      Notes:

      The fact that specific methods, such as correlating self-reported measures over long periods with almost instantaneous behaviors (like tasks), have been used extensively in studies does not make these methods suited to answering a given scientific question. Measures aggregated over long periods miss the variations in instantaneous behaviors over these periods.

    2. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This study provides a valuable contribution to understanding how negative affect influences food-choice decision making in bulimia nervosa, using a mechanistic approach with a drift diffusion model (DDM) to examine the weighting of tastiness and healthiness attributes. The solid evidence is supported by a robust crossover design and rigorous statistical methods, although concerns about low trial counts, possible overfitting, and the absence of temporally aligned binge-eating measures limit the strength of causal claims. Addressing modeling transparency, sample size limitations, and the specificity of mood induction effects would enhance the study's impact and generalizability to broader populations.

      We thank the Editor and Reviewers for their summary of the strengths of our study, and for their thoughtful review and feedback on our manuscript. We apologize for the confusion in how we described the multiple steps performed to ensure that the hierarchical model reported in the main text was the best fit for the data but was not overfitted. Regarding “model transparency,” as described in our response to Reviewer 1 below, we have now more clearly explained (with references) that the use of hierarchical estimation procedures allows for information sharing across participants, which improves the reliability and stability of parameter estimates—even when the number of trials per individual is small. We have clarified for the less familiar reader how our Bayesian model selection criterion penalizes models with more parameters (e.g., more complex models).

      Details about model diagnostics, recoverability, and posterior predictive checks are all provided in the Supplementary Materials. We have clarified how these steps ensure that the parameters we estimate are identifiable and interpretable, while confirming that the model can reproduce key patterns in the data, ultimately supporting the validity of the winning model. Additionally, we have provided all scripts for estimating the models by linking to our public Github repository. Furthermore, we have edited language throughout to eliminate any implication of causal claims and acknowledged the limitation of the small sample size. Given these efforts, we are concerned that the current wording about “modeling transparency” in the public eLife Assessment may inadvertently misrepresent the modeling practices in our paper. Would it be possible to revise or remove that particular phrase to better reflect the steps we have taken? We believe this would help avoid confusion for readers.

      We have also taken additional steps to ensure that we have used “appropriate and validated methodology in line with current state-of-the-art," and we have added references to recent papers supporting our approaches.

      All changes in the revised text are marked in blue.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      Using a computational modeling approach based on the drift diffusion model (DDM) introduced by Ratcliff and McKoon in 2008, the article by Shevlin and colleagues investigates whether there are differences between neutral and negative emotional states in:

      (1) The timings of the integration in food choices of the perceived healthiness and tastiness of food options between individuals with bulimia nervosa (BN) and healthy participants.

      (2) The weighting of the perceived healthiness and tastiness of these options.
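
      To make the two quantities in this summary concrete (when each attribute enters the decision, and how strongly it is weighted), the following is a minimal simulation sketch of a drift diffusion process with attribute-specific onset times. The parameter names (w_taste, w_health, health_delay), the symmetric bounds, and the Euler discretization are illustrative assumptions, not the specification fitted in the manuscript.

```python
import numpy as np


def simulate_trial(taste_diff, health_diff, w_taste, w_health, health_delay,
                   bound=1.0, dt=0.001, noise=1.0, max_time=5.0, rng=None):
    """Simulate one choice between two food options.

    health_delay > 0: healthiness enters the drift that many seconds after
    tastiness; health_delay < 0: tastiness enters later instead.
    Returns (choice, decision_time); choice is 1 (upper bound), 0 (lower
    bound), or None if no bound is reached before max_time.
    """
    rng = np.random.default_rng() if rng is None else rng
    taste_onset = max(0.0, -health_delay)
    health_onset = max(0.0, health_delay)
    evidence, t = 0.0, 0.0
    while t < max_time:
        drift = 0.0
        if t >= taste_onset:
            drift += w_taste * taste_diff
        if t >= health_onset:
            drift += w_health * health_diff
        # Euler step: deterministic drift plus Gaussian diffusion noise.
        evidence += drift * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if abs(evidence) >= bound:
            return (1 if evidence > 0 else 0), t
    return None, max_time


# Noise-free checks make the role of the onset time visible.
choice_taste, rt_taste = simulate_trial(1.0, 0.0, 2.0, 2.0, 0.0, noise=0.0)
choice_health, rt_health = simulate_trial(0.0, -1.0, 2.0, 2.0, 0.2, noise=0.0)
print(choice_taste, round(rt_taste, 2), choice_health, round(rt_health, 2))
```

      In the noise-free runs, the health-only trial takes about health_delay seconds longer than the taste-only trial, which is the kind of timing difference such a model is meant to capture.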

      Strengths:

      By looking at the mechanistic part of the decision process, the approach has the potential to improve the understanding of pathological food choices. The article is based on secondary research data.

      Weaknesses:

      I have two major concerns and a major improvement point.

      The major concerns deal with the reliability of the results of the DDM (first two sections of the Results, pages 6 and 7), which are central to the manuscript, and the consistency of the results with regards to the identification of mechanisms related to binge eating in BN patients (i.e. last section of the results, page 7).

      (1) Ratcliff and McKoon in 2008 used tasks involving around 1000 trials per participant. The Chen et al. experiment the authors refer to involves around 400 trials per participant. On the other hand, Shevlin and colleagues ask each participant to make two sets of 42 choices with two times fewer participants than in the Chen et al. experiment. Shevlin and colleagues also fit a DDM with additional parameters (e.g. a drift rate that varies according to subjective rating of the options) as compared to the initial version of Ratcliff and McKoon. With regards to the number of parameters estimated in the DDM within each group of participants and each emotional condition, the 5- to 10-fold ratio in the number of trials between the Shevlin and colleagues' experiment and the experiments they refer to (Ratcliff and McKoon, 2008; Chen et al. 2022) raises serious concerns about a potential overfitting of the data by the DDM. This point is not highlighted in the Discussion. Robustness and sensitivity analyses are critical in this case.

      We thank the Reviewer for their thoughtful critique. We agree that a limited number of trials can impede reliable estimation, which we acknowledge in the Discussion section. However, we used a hierarchical estimation approach which leverages group information to constrain individual-level estimates. This use of group-level parameters to inform individual-level estimates reduces overfitting and noise that can arise when trial counts are low, and the regularization inherent in hierarchical fitting prevents extreme parameter estimates that could arise from noisy or limited data (Rouder & Lu, 2005). As a result, hierarchical estimation has been repeatedly shown to work well in settings with low trial counts, including as few as 40 trials per condition (Lerche et al., 2017; Ratcliff & Childers, 2015; Wiecki et al., 2013). In addition, previous applications of the time-varying DDM to food choice task data have included experiments with as few as 60 trials per condition (Maier et al., 2020). We have added references to these more recent approaches and specifically note their advantages for the modeling of tasks with fewer trials. Finally, our successful parameter recovery described in the Supplementary Materials supports the robustness of the estimation procedure and the reliability of our results.
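
      The regularizing effect of hierarchical estimation described above can be illustrated with a toy normal-normal model, in which each participant's estimate is pulled toward the group mean in proportion to how noisy it is. This is a sketch of partial pooling in general, not of the hierarchical DDM fitted in the paper; the variances sigma2 and tau2 are assumed known, the group mean is approximated by a trial-weighted average, and all numbers are made up.

```python
import numpy as np


def partial_pooling(subject_means, n_trials, sigma2, tau2):
    """Shrink per-subject means toward the group mean.

    sigma2: within-subject (trial-level) variance, assumed known.
    tau2:   between-subject variance, assumed known.
    Returns (shrunken_means, reliability, group_mean).
    """
    subject_means = np.asarray(subject_means, dtype=float)
    n_trials = np.asarray(n_trials, dtype=float)
    # Simplification: trial-weighted average as the group-level estimate.
    group_mean = np.average(subject_means, weights=n_trials)
    # Reliability approaches 1 as trials increase, so shrinkage vanishes.
    reliability = tau2 / (tau2 + sigma2 / n_trials)
    shrunken = group_mean + reliability * (subject_means - group_mean)
    return shrunken, reliability, group_mean


raw_means = np.array([0.2, 1.0, 1.8])
trials = np.array([42, 42, 400])
shrunken, reliability, group_mean = partial_pooling(raw_means, trials, 4.0, 0.25)
print(shrunken, reliability, group_mean)
```

      Participants with only 42 trials are shrunk more strongly than the participant with 400 trials, which is how group information limits extreme individual estimates when data are sparse.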

      The authors compare different DDMs to show that the DDM they used to report statistical results in the main text is the best according to the WAIC criterion. This may be viewed as a robustness analysis. However, the other DDM models (i.e. M0, M1, M2 in the supplementary materials) they used to make the comparison have fewer parameters to estimate than the one they used in the main text. Fits are usually expected to follow the rule that the more parameters there are to estimate in a model, the better it fits the data. Additionally, a quick plot of the data in supplementary table S12 (i.e. WAIC as a function of the number of parameters varying by food type in the model - i.e. 0 for M0, 2 for M1, 1 for M2 and 3 for M3) suggests that models M1 and potentially M2 may be also suitable: there is a break in the improvement of WAIC between model M0 and the three other models. I would thus suggest checking how the results reported in the main text differ when using models M1 and M2 instead of M3 (for the taste and health weights when comparing M3 with M1, for τS when comparing M3 with M2). If the differences are important, the results currently reported in the main text are not very reliable.

      We thank the Reviewer for highlighting that it would be helpful to explicitly note that we specifically selected WAIC as one of two methods to assess model fit because it penalizes for model complexity. We now explicitly state that, in addition to being more robust than other metrics like AIC or BIC when comparing hierarchical Bayesian models like those in the current study, model fit metrics like WAIC penalize for model complexity based on the number of parameters (Watanabe, 2010). Therefore, more complex models (i.e., those with more parameters) do not automatically have lower WAIC. Additionally, we now more clearly note that our second method to assess model fit, posterior predictive checks, demonstrates that only model M3 can reproduce key behavioral patterns present in the empirical data. As described in the Supplementary Materials, M1 and M2 miss key patterns in the data. In summary, we used best practices to assess model fit and reliability (Wilson & Collins, 2019): results from the WAIC comparison (which penalizes models with more parameters) and results from posterior predictive checks align in showing that M3 provided the best fit to our data. We have added a sentence to the manuscript to state this explicitly.
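
      For readers less familiar with WAIC, the sketch below computes it from a matrix of pointwise log-likelihoods (posterior draws in rows, observations in columns) using the standard definition: the log pointwise predictive density minus a complexity penalty, multiplied by -2. The penalty is an effective number of parameters estimated from the posterior variance of the log-likelihood. The function name and the synthetic matrices are assumptions for illustration; this is not the authors' analysis code.

```python
import numpy as np
from scipy.special import logsumexp


def waic(log_lik):
    """WAIC on the deviance scale from a (n_draws, n_obs) log-likelihood matrix."""
    log_lik = np.asarray(log_lik, dtype=float)
    n_draws = log_lik.shape[0]
    # Log pointwise predictive density: log of the posterior-mean likelihood.
    lppd = float(np.sum(logsumexp(log_lik, axis=0) - np.log(n_draws)))
    # Complexity penalty: posterior variance of the log-likelihood per observation.
    p_waic = float(np.sum(np.var(log_lik, axis=0, ddof=1)))
    return {"waic": -2.0 * (lppd - p_waic), "lppd": lppd, "p_waic": p_waic}


# A posterior with no spread in log-likelihood incurs no penalty.
flat = waic(np.full((100, 5), -1.0))

# Adding posterior spread around the same values increases the penalty.
rng = np.random.default_rng(1)
noisy = waic(-1.0 + 0.3 * rng.standard_normal((100, 5)))
print(flat, noisy)
```

      Lower WAIC indicates better expected out-of-sample fit, so an extra parameter lowers WAIC only if its gain in lppd outweighs the added penalty.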

      (2) The second main concern deals with the association reported between the DDM parameters and binge eating episodes (i.e. last paragraph of the results section, page 7). The authors claim that the DDM parameters "predict" binge eating episodes (in the Abstract among other places) while the binge eating frequency does not seem to have been collected prospectively. Besides this methodological issue, the interpretation of this association is exaggerated: during the task, BN patients did not make binge-related food choices in the negative emotional state. Therefore, it is impossible to draw clear conclusions about binge eating, as other explanations seem equally plausible. For example, the results the authors report with the DDM may be a marker of a strategy of the patients to cope with food tastiness in order to make restrictive-like food choices. A comparison of the authors' results with restrictive AN patients would be of interest. Moreover, correlating results of a nearly instantaneous behavior (i.e. a couple of minutes to perform the task with the 42 food choices) with an observation made over several months (i.e. binge eating frequency collected over three months) is questionable: the negative emotional state of patients varies across the day without systematically leading patients to engage in a binge eating episode in such states.

      I would suggest in such an experiment to collect the binge craving elicited by each food and the overall binge craving of patients immediately before and after the task. Correlating the DDM results with these ratings would provide more compelling results. Without these data, I would suggest removing the last paragraph of the Results.

      We thank the Reviewer for these interesting and important suggestions, and we agree that claims about causal connections between our decision parameters and symptom severity metrics would be inappropriate. Per the Reviewer’s suggestions, we have eliminated the use of the word “predict” to describe the tested association with symptom metrics. We also agree that more time-locked associations with craving ratings and near-instantaneous behavior would be useful, and we have added this as an important direction for future research in the discussion. However, associating task-based behavior with validated self-report measures that assess symptom severity over long periods of time that precede the task visit (e.g., over the past 2 weeks in depression, over the past month in eating disorders) is common practice in computational psychiatry, psychiatric neuroimaging, and clinical cognitive neuroscience (Hauser et al., 2022; Huys et al., 2021; Wise et al., 2023), and this approach has been used several times specifically with food choice tasks (Dalton et al., 2020; Steinglass et al., 2015). We have revised the language throughout the manuscript to clarify: the results suggest that individuals whose task behavior is more reactive to negative affect tend to be the most symptomatic, but the results do not allow us to determine whether this reactivity causes the symptoms.

      In response to this Reviewer’s important point about negative affect not always producing loss-of-control eating in individuals with BN, we now explicitly note that while several studies employing ecological momentary assessments (EMA) have repeatedly shown that increases in negative affect significantly increase the likelihood of subsequent loss-of-control eating (Alpers & Tuschen-Caffier, 2001; Berg et al., 2013; Haedt-Matt & Keel, 2011; Hilbert & Tuschen-Caffier, 2007; Smyth et al., 2007), not all loss-of-control eating occurs in the context of negative affect. We further note that future studies should integrate food choice task data pre and post-affect inductions with measures capturing the specific frequency of loss of control eating episodes that occur during states of high negative affect.

      (3) My major improvement point is to tone down as much as possible any claim of a link with binge eating across the entire manuscript and to focus more on the restrictive behavior of BN patients in between binge eating episodes (see my second major concern about the methods). Additionally, since this article is a secondary research paper and since some of the authors have already used the task with AN patients, if possible I would run the same analyses with AN patients to test whether there are differences between AN (provided they were of the restrictive subtype) and BN.

      We appreciate the Reviewer’s very helpful suggestions. We have adjusted our language linking loss-of-control eating frequency with decision parameters, and we have added sentences focusing on the implications for the restrictive behavior of patients with BN between binge eating episodes. In the Supplementary Materials, we have added an analysis of the restraint subscale of the EDE-Q and confirmed no relationship with parameters of interest. While we agree additional analyses with AN patients would be of interest, this is outside the scope of the paper. Our team have collected data from individuals with AN using this task, but not with any affect induction or measure of affect. Therefore, we have added this important direction for future research to the discussion.

      Reviewer #2 (Public review):

      Summary:

      Binge eating is often preceded by heightened negative affect, but the specific processes underlying this link are not well understood. The purpose of this manuscript was to examine whether affect state (neutral or negative mood) impacts food choice decision-making processes that may increase the likelihood of binge eating in individuals with bulimia nervosa (BN). The researchers used a randomized crossover design in women with BN (n=25) and controls (n=21), in which participants underwent a negative or neutral mood induction prior to completing a food-choice task. The researchers found that despite no differences in food choices in the negative and neutral conditions, women with BN demonstrated a stronger bias toward considering the 'tastiness' before the 'healthiness' of the food after the negative mood induction.

      Strengths:

      The topic is important and clinically relevant and methods are sound. The use of computational modeling to understand nuances in decision-making processes and how that might relate to eating disorder symptom severity is a strength of the study.

      Weaknesses:

      The sample size was relatively small and may have been underpowered to find differences in outcomes (i.e., food choice behaviors). Participants were all women with BN, which limits the generalizability of findings to the larger population of individuals who engage in binge eating. It is likely that the negative affect manipulation was weak and may not have been potent enough to change behavior. Moreover, it is unclear how long the negative affect persisted during the actual task. It is possible that any increases in negative affect would have dissipated by the time participants were engaged in the decision-making task.

      We thank the Reviewer for their comments on the strengths of the paper, and for highlighting these important considerations regarding the sample demographics and the negative affect induction. As in the original paper that focused only on ultimate food choice behaviors, we now specifically acknowledge that the study was only powered to detect small to medium group differences in the effect of negative emotion on these final choice behaviors.

      Regarding the sample demographics, we agree that the study’s inclusion of only female participants is a limitation. Although the original decision for this sampling strategy was informed by data suggesting that bulimia nervosa is roughly six times more prevalent among females than males (Udo & Grilo, 2018), we now note in the discussion that our female-only sample limits the generalizability of the findings.

      We also agree with the Reviewer’s noted limitations of the negative mood induction, and based on the reviewer’s suggestions, we have expanded our original description of these limitations in the Discussion. Specifically, we now note that although the task was completed immediately after the affect induction, the study did not include intermittent mood assessments throughout the choice task, so it is unclear how long the negative affect persisted during the actual task.

      Reviewer #3 (Public review):

      Summary:

      The study uses the food choice task, a well-established method in eating disorder research, particularly in anorexia nervosa. However, it introduces a novel analytical approach - the diffusion decision model - to deconstruct food choices and assess the influence of negative affect on how and when tastiness and healthiness are considered in decision-making among individuals with bulimia nervosa and healthy controls.

      Strengths:

      The introduction provides a comprehensive review of the literature, and the study design appears robust. It incorporates separate sessions for neutral and negative affect conditions and counterbalances tastiness and healthiness ratings. The statistical methods are rigorous, employing multiple testing corrections.

      A key finding - that negative affect induction biases individuals with bulimia nervosa toward prioritizing tastiness over healthiness - offers an intriguing perspective on how negative affect may drive binge eating behaviors.

      Weaknesses:

      A notable limitation is the absence of a sample size calculation, which, combined with the relatively small sample, may have contributed to null findings. Additionally, while the affect induction method is validated, it is less effective than alternatives such as image or film-based stimuli (Dana et al., 2020), potentially influencing the results.

      We agree that the limited sample size and specific affect induction method may have contributed to the null model-agnostic behavioral findings. Based on this Reviewer’s and Reviewer 2’s comments, we have added these factors to our acknowledgements of limitations in the discussion.

      Another concern is the lack of clarity regarding which specific negative emotions were elicited. This is crucial, as research suggests that certain emotions, such as guilt, are more strongly linked to binge eating than others. Furthermore, recent studies indicate that negative affect can lead to both restriction and binge eating, depending on factors like negative urgency and craving (Leenaerts et al., 2023; Wonderlich et al., 2024). The study does not address this, though it could explain why, despite the observed bias toward tastiness, negative affect did not significantly impact food choices.

      We thank the Reviewer for raising these important points and possibilities. In the Supplementary Materials, we have added an additional analysis of the specific POMS subscales that comprise the total negative affect calculation that was reported in the original paper (Gianini et al., 2019). We also report total negative affect scores from the POMS in the main text. Ultimately, we found that, across both groups, the negative affect induction increased responses related to anger, confusion, depression, and tension while reducing vigor.

      We agree with the Reviewer that factors like negative urgency and cravings are relevant here. The study did not collect any measures of craving, and in response to Reviewer 1 and this Reviewer, we now note in the discussion that replication studies including momentary craving assessments will be important. While we do not have any measurements of cravings, we did measure negative urgency. The original paper (Gianini et al., 2019) did not find that negative urgency was related to restrictive food choices. We have now repeated those analyses, and we also were unable to find any meaningful patterns related to negative urgency. Nonetheless, we have added an analysis of negative urgency scores and decision parameters to the Supplementary Materials.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Please improve the description of the computational methods: the fit of the DDM, the difference between the models used in the DDM, and the difference between the DDM model and the models used in the linear mixed models (the word "model" ultimately becomes confusing, as it may refer either to the DDM or to the statistical analysis of the DDM parameters).

      We thank the Reviewer for highlighting the unclear language. We have updated the main text to clarify when the term “model” refers to the DDM itself versus the regression models assessing DDM parameters. As described above, we have clarified that both tests of model fit (WAIC and posterior predictive checks) suggest that Model 3 was the best fit to the data. We have also clarified the differences between the tested models in the Supplementary Materials.

      Please avoid reporting estimates of main effects in statistical models when an interaction is included: the estimates of the main effects may be heavily biased by the interaction term (this can be checked by re-running the model without the interaction term).

      We sincerely appreciate the Reviewer’s comment regarding the interpretation of main effects in the presence of significant interaction terms. In the revised manuscript, we no longer discuss significant main effects and instead focus on interpreting the interaction terms.

      Additionally, to help unpack interaction effects, we now include exploratory simple effects analyses in the supplementary materials. Simple effects analyses allow us to examine the effects of one independent variable at specific values of other independent variables (Aiken et al., 1991; Brambor et al., 2006; Jaccard & Turrisi, 2003; Winer et al., 1991).
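
      As a minimal illustration of what a simple effects analysis estimates, consider a generic regression with two predictors and their interaction. The symbols below ($y$, $X$, $Z$, $b_0$ to $b_3$) are placeholders chosen for this sketch and are not the variables or coefficients of the models reported in the manuscript.

```latex
y = b_0 + b_1 X + b_2 Z + b_3 X Z + \varepsilon
\qquad\Longrightarrow\qquad
\left.\frac{\partial y}{\partial X}\right|_{Z = z} = b_1 + b_3 z
```

      Under this parameterization, $b_1$ is the effect of $X$ only at $Z = 0$, which is why a lower-order coefficient estimated alongside an interaction is conditional rather than a general main effect; a simple effect is the same quantity evaluated at another chosen value $z$.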

      Supplementary tables S5 and S6 are excessive: there is no third-level interaction (supplementary tables S3 and S4) to justify a split between BN and healthy participants. Please perform rather a descending regression. Accordingly, the results reported in the second paragraph of page 7 should be entirely rewritten.

      We agree with the Reviewer’s suggestion that these tables are unnecessary. We have updated them to include details about simple effects analyses described above. We have revised the main text to reflect these changes.

      Words such as "predictive", which imply a causal link, are used in several places in the manuscript, including the supplementary materials, although the experimental design does not allow such claims. This should be rephrased.

      We agree with the Reviewer that the term “predicted” in the main text improperly suggested a causal relationship between symptom severity and DDM parameters that our methods cannot evaluate. We have updated the main text with more appropriate language. However, our use of the term “predicted” in the Supplementary Materials refers to predicting the probability of a choice based on trial-level features which is standard use of the term in the computational cognitive modeling literature (Piray et al., 2019; Wilson & Collins, 2019; Zhang et al., 2020).

      The word "evaluated" appears twice in line 42 of the supplementary materials. Same with "in" at line 50.

      Thank you very much for highlighting this. We have removed the repeated words.

      Reviewer #2 (Recommendations for the authors):

      (1) I think it would be helpful if the authors noted in the Methods how long the food-choice task took. Prior research has suggested that in-lab mood inductions are very short-lasting (e.g., max 7 minutes) and it is likely that the task itself may have impacted the mood states of participants. Expanding on this in the Discussion/limitations seems important.

      The Reviewer raises an important point regarding the duration of our affect manipulation. Since we did not measure mood during or after the Food Choice Task, we cannot determine how long these effects persisted. We have added this limitation to the discussion section, noting that the absence of continuous affect measures following mood induction is a widespread limitation in the field.

      (2) Personally, I was a bit confused about what data the researchers were using to extrapolate information on whether or not participants were considering healthiness or tastiness. How was this operationalized? Is this an assumption being made based on how quickly someone chose a low-fat vs. high-fat food?

      We thank this Reviewer for highlighting that our models’ complexity warrants a more thorough explanation.

      Since we collected tastiness and healthiness attribute ratings during the first phase of the Food Choice Task, we can use those values to determine how these attribute values influence decision-making. Independently, foods were classified as low-fat or high-fat based on their objective properties (i.e., the percentage of calories from fat). However, the primary information we used to compute model parameters were participants’ attribute ratings, choices, and response times.

      In these models, the drift rate parameter captures the speed and direction of evidence accumulation. As the unsigned magnitude of the drift rate increases, the decision-maker is making up their mind more quickly. Once the evidence accumulates to a response boundary, the option associated with that boundary is selected. A positive drift rate means they are moving toward choosing one option (i.e., upper boundary), and a negative drift rate means they are moving toward choosing the other (i.e., lower boundary). In these decisions, decision-makers often consider multiple attributes, such as perceived healthiness and tastiness. Each of these attributes can influence the evidence accumulation process with different strengths, or weights.

      In addition, decision-makers do not consider all attributes at the same time. Inspired by earlier work on multi-attribute decision-making (Maier et al., 2020; Sullivan & Huettel, 2021), our modeling approach computes a parameter (i.e., relative attribute onset) which captures the time delay between when each attribute starts influencing the evidence accumulation process. This parameter gives us a way to estimate when decision-makers are considering different attributes, and tells us how much influence each attribute has, because if the attribute starts late, it has less time to influence the decision. These models use a piecewise drift rate function to describe how evidence changes over time within a trial: sometimes the decision maker only considers taste, sometimes only health, and other times both. Importantly, models with a relative attribute onset parameter can produce key behavioral patterns observed in mouse-tracking studies that models without this parameter are unable to replicate (Maier et al., 2020).
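
      The piecewise drift rate described above can be sketched as follows. This is an illustrative simulation under stated assumptions, not the fitted model: the function and parameter names (`piecewise_drift`, `simulate_trial`, `rel_onset`), the symmetric boundaries at plus and minus `boundary`, the unbiased starting point, the Euler time step, and all default values are hypothetical choices made for this example.

```python
import numpy as np


def piecewise_drift(t, taste_diff, health_diff, w_taste, w_health, rel_onset):
    """Drift rate at time t (seconds since evidence accumulation began).

    Assumed sign convention (hypothetical, for illustration only):
      rel_onset > 0: health enters rel_onset seconds after taste.
      rel_onset < 0: taste enters |rel_onset| seconds after health.
    """
    taste_on = t >= max(0.0, -rel_onset)
    health_on = t >= max(0.0, rel_onset)
    # Each attribute contributes its weighted value difference once it is "on".
    return w_taste * taste_diff * taste_on + w_health * health_diff * health_on


def simulate_trial(taste_diff, health_diff, w_taste, w_health, rel_onset,
                   boundary=1.0, noise_sd=1.0, dt=0.001, max_t=5.0,
                   non_decision_time=0.3, rng=None):
    """Simulate one trial with Euler steps.

    Returns (choice, rt): choice is 1 for the upper boundary, 0 for the
    lower boundary, and (None, None) if no boundary is reached by max_t.
    """
    rng = np.random.default_rng() if rng is None else rng
    evidence, t = 0.0, 0.0
    while t < max_t:
        v = piecewise_drift(t, taste_diff, health_diff,
                            w_taste, w_health, rel_onset)
        # Deterministic drift plus Gaussian diffusion noise for this step.
        evidence += v * dt + noise_sd * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if evidence >= boundary:
            return 1, t + non_decision_time
        if evidence <= -boundary:
            return 0, t + non_decision_time
    return None, None
```

      For example, `simulate_trial(taste_diff=1.0, health_diff=-1.0, w_taste=1.0, w_health=1.0, rel_onset=0.2)` simulates a trial in which health begins to affect accumulation 200 ms after taste, so taste alone drives the drift during that initial window.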

      In summary, the computational model describes decision-makers’ behaviors (what they would choose, and how fast they would choose) using different potential values of the drift weights and relative start time parameters. We then used Bayesian estimation methods to compare the model's predictions to the actual data. By examining how reaction times and choices change depending on the attribute values of the presented options, the model allows us to infer when each attribute is considered, and how strongly it influences the final choice.

      We have clarified this in the main text.

      Reviewer #3 (Recommendations for the authors):

      I wonder whether there were any measures concerning negative affect before and after the mood induction? This would make it clearer whether there was a significant change before and after. If different emotions were assessed, which emotion showed the strongest change?

      We thank the Reviewer for flagging this point. We realize that the main text did not make it clear that mood was assessed before and after the mood induction using the POMS (McNair et al., 1989). While these analyses were conducted and the results were reported in the original manuscript (Gianini et al., 2019), we now report them in the main text for completeness. Additionally, we added more details about how specific emotions changed by analyzing the subscales of the POMS in the Supplementary Materials. As mentioned above, we found that, across both groups, the negative affect induction increased responses related to anger, confusion, depression, and tension while reducing vigor.

      Thank you again for your consideration and for the reviewers’ comments and suggestions. We believe their incorporation has significantly strengthened the paper. In addition, thank you for the opportunity to publish our work in eLife. We look forward to hearing your response.

      References

      Aiken, L. S., West, S. G., & Reno, R. R. (1991). Multiple regression: Testing and interpreting interactions. Sage Publications, Inc.

      Alpers, G. W., & Tuschen-Caffier, B. (2001). Negative feelings and the desire to eat in bulimia nervosa. Eating Behaviors, 2(4), 339–352. https://doi.org/10.1016/S1471-0153(01)00040-X

      Berg, K. C., Crosby, R. D., Cao, L., Peterson, C. B., Engel, S. G., Mitchell, J. E., & Wonderlich, S. A. (2013). Facets of negative affect prior to and following binge-only, purge-only, and binge/purge events in women with bulimia nervosa. Journal of Abnormal Psychology, 122(1), 111–118. https://doi.org/10.1037/a0029703

      Brambor, T., Clark, W. R., & Golder, M. (2006). Understanding Interaction Models: Improving Empirical Analyses. Political Analysis, 14(1), 63–82. https://doi.org/10.1093/pan/mpi014

      Dalton, B., Foerde, K., Bartholdy, S., McClelland, J., Kekic, M., Grycuk, L., Campbell, I. C., Schmidt, U., & Steinglass, J. E. (2020). The effect of repetitive transcranial magnetic stimulation on food choice-related self-control in patients with severe, enduring anorexia nervosa. International Journal of Eating Disorders, 53(8), 1326–1336. https://doi.org/10.1002/eat.23267

      Gianini, L., Foerde, K., Walsh, B. T., Riegel, M., Broft, A., & Steinglass, J. E. (2019). Negative affect, dietary restriction, and food choice in bulimia nervosa. Eating Behaviors, 33, 49–54. https://doi.org/10.1016/j.eatbeh.2019.03.003

      Haedt-Matt, A. A., & Keel, P. K. (2011). Revisiting the affect regulation model of binge eating: A meta-analysis of studies using ecological momentary assessment. Psychological Bulletin, 137(4), 660–681. https://doi.org/10.1037/a0023660

      Hauser, T. U., Skvortsova, V., Choudhury, M. D., & Koutsouleris, N. (2022). The promise of a model-based psychiatry: Building computational models of mental ill health. The Lancet Digital Health, 4(11), e816–e828. https://doi.org/10.1016/S2589-7500(22)00152-2

      Hilbert, A., & Tuschen-Caffier, B. (2007). Maintenance of binge eating through negative mood: A naturalistic comparison of binge eating disorder and bulimia nervosa. International Journal of Eating Disorders, 40(6), 521–530. https://doi.org/10.1002/eat.20401

      Huys, Q. J. M., Browning, M., Paulus, M. P., & Frank, M. J. (2021). Advances in the computational understanding of mental illness. Neuropsychopharmacology, 46(1), 3–19. https://doi.org/10.1038/s41386-020-0746-4

      Jaccard, J., & Turrisi, R. (2003). Interaction effects in multiple regression (2nd ed.). Sage Publications, Inc.

      Lerche, V., Voss, A., & Nagler, M. (2017). How many trials are required for parameter estimation in diffusion modeling? A comparison of different optimization criteria. Behavior Research Methods, 49(2), 513–537. https://doi.org/10.3758/s13428-016-0740-2

      Maier, S. U., Raja Beharelle, A., Polanía, R., Ruff, C. C., & Hare, T. A. (2020). Dissociable mechanisms govern when and how strongly reward attributes affect decisions. Nature Human Behaviour, 4(9), Article 9. https://doi.org/10.1038/s41562-020-0893-y

      McNair, D., Lorr, M., & Droppleman, L. (1989). Profile of mood states (POMS).

      Piray, P., Dezfouli, A., Heskes, T., Frank, M. J., & Daw, N. D. (2019). Hierarchical Bayesian inference for concurrent model fitting and comparison for group studies. PLOS Computational Biology, 15(6), e1007043. https://doi.org/10.1371/journal.pcbi.1007043

      Ratcliff, R., & Childers, R. (2015). Individual differences and fitting methods for the two-choice diffusion model of decision making. Decision, 2(4), 237–279. https://doi.org/10.1037/dec0000030

      Rouder, J. N., & Lu, J. (2005). An introduction to Bayesian hierarchical models with an application in the theory of signal detection. Psychonomic Bulletin & Review, 12(4), 573–604. https://doi.org/10.3758/BF03196750

      Smyth, J. M., Wonderlich, S. A., Heron, K. E., Sliwinski, M. J., Crosby, R. D., Mitchell, J. E., & Engel, S. G. (2007). Daily and momentary mood and stress are associated with binge eating and vomiting in bulimia nervosa patients in the natural environment. Journal of Consulting and Clinical Psychology, 75(4), 629–638. https://doi.org/10.1037/0022-006X.75.4.629

      Steinglass, J., Foerde, K., Kostro, K., Shohamy, D., & Walsh, B. T. (2015). Restrictive food intake as a choice—A paradigm for study. International Journal of Eating Disorders, 48(1), 59–66. https://doi.org/10.1002/eat.22345

      Sullivan, N., & Huettel, S. A. (2021). Healthful choices depend on the latency and rate of information accumulation. Nature Human Behaviour, 5(12), Article 12. https://doi.org/10.1038/s41562-021-01154-0

      Udo, T., & Grilo, C. M. (2018). Prevalence and Correlates of DSM-5–Defined Eating Disorders in a Nationally Representative Sample of U.S. Adults. Biological Psychiatry, 84(5), 345–354. https://doi.org/10.1016/j.biopsych.2018.03.014

      Watanabe, S. (2010). Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory. Journal of Machine Learning Research, 11, 3571–3594.

      Wiecki, T. V., Sofer, I., & Frank, M. J. (2013). HDDM: Hierarchical Bayesian estimation of the drift-diffusion model in Python. Frontiers in Neuroinformatics, 7. https://doi.org/10.3389/fninf.2013.00014

      Wilson, R. C., & Collins, A. G. (2019). Ten simple rules for the computational modeling of behavioral data. eLife, 8, e49547. https://doi.org/10.7554/eLife.49547

      Winer, B. J., Brown, D. R., & Michels, K. M. (1991). Statistical principles in experimental design (3rd ed). McGraw-Hill.

      Wise, T., Robinson, O. J., & Gillan, C. M. (2023). Identifying Transdiagnostic Mechanisms in Mental Health Using Computational Factor Modeling. Biological Psychiatry, 93(8), 690–703. https://doi.org/10.1016/j.biopsych.2022.09.034

      Zhang, L., Lengersdorff, L., Mikus, N., Gläscher, J., & Lamm, C. (2020). Using reinforcement learning models in social neuroscience: Frameworks, pitfalls and suggestions of best practices. Social Cognitive and Affective Neuroscience, 15(6), 695–707. https://doi.org/10.1093/scan/nsaa089

    1. This is often what great films do. They take what is available in reality and distill it down so viewers can understand it the first time. When you make a copy of a copy, things get blurry, more difficult to decipher. Films often do the exact opposite with elements of culture. They reduce present reality and essential past experiences into distilled intertextual products—mediated messages that combine various types of text into one.
      1. Films simplify reality: Great films don’t try to show all of real life as it is. Reality is complex, messy, and full of details. Instead, filmmakers select and simplify certain parts of reality so that viewers can quickly grasp the meaning or emotion the filmmaker wants to communicate. It’s like taking a rich soup and boiling it down into a flavorful sauce: the essence remains, but it’s more concentrated and easier to take in.
      2. Copies of copies get blurry: When you make a copy of a copy (for example, photocopying a photocopy), each generation becomes less clear and details are lost. This metaphor means that when cultural ideas get repeated without care, they can become blurry or watered down, harder for audiences to understand or connect with.
      3. Films do the opposite: Rather than making ideas blurrier, films often clarify them. They take complex cultural experiences or histories (things that might be confusing or vast in real life) and distill them into clear, emotionally powerful scenes, symbols, or stories.
      4. Intertextuality, blending many “texts”: When the passage says films produce “intertextual products,” it means that movies mix together many forms of expression: visual art, literature, music, news, myths, history, advertising, etc. Each of these is a kind of “text.” A film is thus a mediated message: a carefully crafted combination of many cultural materials that reflect and reshape reality in a form audiences can immediately relate to.
    1. Author response:

      The following is the authors’ response to the previous reviews

      eLife Assessment

      This study provides an important extension of credibility-based learning research with a well-controlled paradigm by showing how feedback reliability can distort reward-learning biases in a disinformation-like bandit task. The strength of evidence is convincing for the core effects reported (greater learning from credible feedback; robust computational accounts, parameter recovery) but incomplete for the specific claims about heightened positivity bias at low credibility, which depend on a single dataset, metric choices (absolute vs relative), and potential perseveration or cueing confounds. Limitations concerning external validity and task-induced cognitive load, and the use of relatively simple Bayesian comparators, suggest that incorporating richer active-inference/HGF benchmarks and designs that dissociate positivity bias from choice history would further strengthen this paper.

      We thank the editors and reviewers for a careful assessment.

      In response, we have toned down our claims regarding heightened positivity biases, explicitly stating that the findings are equivocal and depend on the scale (i.e., metric) and study (whereas previously we stated our hypothesis was supported). We have also clarified which aspects of the findings extend beyond perseveration. We believe the evidence now presented provides convincing support for this more nuanced claim.

      We wish to emphasize that dissociating positivity bias from perseveration is a challenge not just for our work, but for the entire field of behavioral reinforcement learning. In fact, in a recent preprint (Learning asymmetry or perseveration? A critical re-evaluation and solution to a pervasive confound, Vidal-Perez et al., 2025; https://osf.io/preprints/psyarxiv/xdse5_v1) we argue that, to date, all studies claiming evidence for positivity bias beyond perseveration suffered from flaws, and that there are currently no robust, behavioral, model-agnostic signatures that dissociate effects of positivity bias from perseveration. While this remains a limitation, we would stress that, relative to the state of the art in the field, our work goes beyond what has previously been reported. We believe this should also be reflected in the assessment of our work.

      We elaborate more on these issues in our responses to R3 below.

      Public Reviews:

      Reviewer #1 (Public review):

      Comments on revisions:

      In their updated version, the authors have made some edits to address my concerns regarding the framing of the 'normative' Bayesian model, clarifying that they utilized a simple Bayesian model which is intended to adhere in an idealized manner to the intended task structure, though further simulations would have been ideal.

      The authors, however, did not take my recommendation to explore the symptoms in the symptom scales they collected as being a potential source of variability. They note that these were for hypothesis generation and were exploratory, fair enough, but this study is not small and there should have been sufficient sample size for a very reasonable analysis looking at symptom scores.

      However, overall the toned down claims and clarifications of intent are adequate responses to my previous review.

      We thank the reviewer. We remain convinced that testing targeted hypotheses in better-powered designs is the most effective way to examine how our findings relate to symptom scales, something we hope to pursue in future studies.

      Reviewer #2 (Public review):

      This important paper studies the problem of learning from feedback given by sources of varying credibility. The convincing combination of experiment and computational modeling helps to pin down properties of learning, while opening unresolved questions for future research.

      Summary:

      This paper studies the problem of learning from feedback given by sources of varying credibility. Two bandit-style experiments are conducted in which feedback is provided with uncertainty, but from known sources. Bayesian benchmarks are provided to assess normative facets of learning, and alternative credit assignment models are fit for comparison. Some aspects of normativity appear, in addition to possible deviations such as asymmetric updating from positive and negative outcomes.

      Strengths:

      The paper tackles an important topic, with a relatively clean cognitive perspective. The construction of the experiment enables the use of computational modeling. This helps to pinpoint quantitatively the properties of learning and formally evaluate their impact and importance. The analyses are generally sensible, and advanced parameter recovery analyses (including cross-fitting procedure) provide confidence in the model estimation and comparison. The authors have very thoroughly revised the paper in response to previous comments.

      Weaknesses:

      The authors acknowledge the potential for cognitive load and the interleaved task structure to play a meaningful role in the results, though leave this for future work. This is entirely reasonable, but remains a limitation in our ability to generalize the results. Broadly, some of the results obtain in cases where the extent of generalization is not always addressed and remains uncertain.

      We thank the reviewer once more for a thoughtful assessment of our work.

      Reviewer #3 (Public review):

      Summary

      This paper investigates how disinformation affects reward learning processes in the context of a two-armed bandit task, where feedback is provided by agents with varying reliability (with lying probability explicitly instructed). They find that people learn more from credible sources, but also deviate systematically from optimal Bayesian learning: They learned from uninformative random feedback, learned more from positive feedback, and updated too quickly from fully credible feedback (especially following low-credibility feedback). Overall, this study highlights how misinformation could distort basic reward learning processes, without appeal to higher-order social constructs like identity.

      Strengths

      • The experimental design is simple and well-controlled; in particular, it isolates basic learning processes by abstracting away from social context

      • Modeling and statistics meet or exceed standards of rigor

      • Limitations are acknowledged where appropriate, especially those regarding external validity

      • The comparison model, Bayes with biased credibility estimates, is strong; deviations are much more compelling than e.g. a purely optimal model

      • The conclusions are of substantial interest from both a theoretical and applied perspective

      Weaknesses

      The authors have addressed most of my concerns with the initial submission. However, in my view, evidence for the conclusion that less credible feedback yields a stronger positivity bias remains weak. This is due to two issues.

      Absolute or relative positivity bias?

      The conclusion of greater positivity bias for less credible feedback (Fig 5) hinges on the specific way in which positivity bias is defined. Specifically, we only see the effect when normalizing the difference in sensitivity to positive vs. negative feedback by the sum. I appreciate that the authors present both and add the caveat whenever they mention the conclusion. However, without an argument that the relative definition is more appropriate, the fact of the matter is that the evidence is equivocal.

      We thank the reviewer for an insightful engagement with our manuscript. The reviewer’s comments on the subtle interplay between perseveration and learning asymmetries were so thought-provoking that they have inspired a new article that delves deeply into how gradual choice-perseveration can lead to spurious conclusions about learning asymmetries in Reinforcement Learning (Learning asymmetry or perseveration? A critical re-evaluation and solution to a pervasive confound, Vidal-Perez et al., 2025; https://osf.io/preprints/psyarxiv/xdse5_v1).

      To the point: we agree with the reviewer that the evidence for this hypothesis is equivocal, and we took on board the suggestion to tone down our interpretation of the findings. We now state explicitly, both in the results section (“Positivity bias in learning and credibility”) and in the Discussion, that the results provide equivocal support for our hypothesis:

      RESULTS

      “However, we found evidence for agent-based modulation of positivity bias when this bias was measured in relative terms. Here we calculated, for each participant and agent, a relative Valence Bias Index (rVBI) as the difference between the Credit Assignment for positive feedback (CA+) and negative feedback (CA-), relative to the overall magnitude of CA (i.e., |CA+| + |CA-|) (Fig. 5c). Using a mixed effects model, we regressed rVBIs on their associated credibility (see Methods), revealing a relative positivity bias for all credibility levels [overall rVBI (b=0.32, F(1,609)=68.16), 50% credibility (b=0.39, t(609)=8.00), 75% credibility (b=0.41, F(1,609)=73.48) and 100% credibility (b=0.17, F(1,609)=12.62), all p’s<0.001]. Critically, the rVBI varied depending on the credibility of feedback (F(2,609)=14.83, p<0.001), such that the rVBI for the 3-star agent was lower than that for both the 1-star (b=-0.22, t(609)=-4.41, p<0.001) and 2-star agent (b=-0.24, F(1,609)=24.74, p<0.001). Feedback with 50% and 75% credibility yielded similar rVBI values (b=0.028, t(609)=0.56, p=0.57). Finally, a positivity bias could not stem from a Bayesian strategy as both Bayesian models predicted a negativity bias (Fig. 5b-c; Fig. S8; and SI 3.1.1.3 Table S11-S12, 3.2.1.1, and 3.2.1.2). Taken together, this provides equivocal support for our initial hypothesis, depending on the measurement scale used to assess the effect (absolute or relative).”

      “Previous research has suggested that positivity bias may spuriously arise from pure choice-perseveration (i.e., a tendency to repeat previous choices regardless of outcome) (49–51). While our models included a perseveration-component, this control may not be perfect. Therefore, in additional control analyses, we generated (using ex-post simulations based on best fitting parameters) synthetic datasets using models including choice-perseveration but devoid of feedback-valence bias, and fitted them with our credibility-valence model (see SI 3.6.1). These analyses confirmed that a pure perseveration account can masquerade as an apparent positivity bias and even predict the qualitative pattern of results related to credibility (i.e., a higher relative positivity bias for low-credibility feedback). Critically, however, this account consistently predicted a reduced magnitude of credibility-effect on relative positivity bias as compared to the one we observed in participants, suggesting some of the relative amplification of positivity bias goes above and beyond a contribution from perseveration.”

      DISCUSSION

      “Previous reinforcement learning studies report greater credit-assignment based on positive compared to negative feedback, albeit only in the context of veridical feedback (43,44,63). Here, we investigated whether a positivity bias is amplified for information of low credibility, but our findings are equivocal and vary as a function of scaling (absolute or relative) and study. We observe selective absolute amplification of a positivity bias for information of low and intermediate credibility in the discovery study alone. In contrast, we find a relative (to the overall extent of CA) amplification of confirmation bias in both studies. Importantly, the magnitude of these amplification effects cannot be reproduced in ex-post simulations of a model incorporating simple choice perseveration without an explicit positivity bias, suggesting that at least part of the amplification reflects a genuine increase in positivity bias.”

      There is also a good reason to think that the absolute definition is more appropriate. As expected, participants learn more from credible feedback. Thus, normalizing by average learning (as in the relative definition) amounts to dividing the absolute difference by increasingly large numbers for more credible feedback. If there is a fixed absolute positivity bias (or something that looks like it), the relative bias will necessarily be lower for more credible feedback. In fact, the authors' own results demonstrate this phenomenon (see below). A reduction in relative bias thus provides weak evidence for the claim.

      We agree with the reviewer that absolute and relative measures can yield conflicting impressions. To some extent, this is precisely why we report both (i.e., if the two would necessarily agree, reporting both would be redundant). However, we are unconvinced that one measure is inherently more appropriate than the other. In our view, both are valid as long as they are interpreted carefully and in the right context. To illustrate, consider salary changes, which can be expressed on either an absolute or a relative scale. If Bob’s £100 salary increases to £120 and Alice’s £1000 salary increases to £1050, then Bob’s raise is absolutely smaller but relatively larger. Is one measure more appropriate than the other? Economists would argue not; rather, the choice of scale depends on the question at hand.

      In the same spirit, we have aimed to be as clear and transparent as possible in stating that 1) in the main study, there is no effect in the absolute sense, and 2) framing positivity bias in relative terms is akin to expressing it as a percentage change.

      It is interesting that the discovery study shows evidence of a drop in absolute bias. However, for me, this just raises questions. Why is there a difference? Was one just a fluke? If so, which one?

      We are unsure why we didn’t find an absolute amplification effect in the main study. However, we don’t think the results from the preliminary study were just a ‘fluke’. We have recently conducted two new studies (in preparation for publication), where we have been able to replicate the finding of increased positivity bias for lower-credibility sources in both absolute and relative terms. We agree that the current results leave unresolved questions, and we hope to follow up on these in the near future.

      Positivity bias or perseveration?

      Positivity bias and perseveration will both predict a stronger relationship between positive (vs. negative) feedback and future choice. They can thus be confused for each other when inferred from choice data. This potentially calls into question all the results on positivity bias.

      The authors clearly identify this concern in the text and go to considerable lengths to rule it out. However, the new results (in revision 1) show that a perseveration-only model can in fact account for the qualitative pattern in the human data (the CA parameters). This contradicts the current conclusion:

      Critically, however, these analyses also confirmed that perseveration cannot account for our main finding of increased positivity bias, relative to the overall extent of CA, for low-credibility feedback.

      Figure 24c shows that the credibility-CA model does in fact show stronger positivity bias for less credible feedback. The model distribution for credibility 1 is visibly lower than for credibilities 0.5 and 0.75.

      The authors need to be clear that it is the magnitude of the effect that the perseveration-only model cannot account for. Furthermore, they should additionally clarify that this is true only for models fit to data; it is possible that the credibility-CA model could capture the full size of the effect with different parameters (which could fit best if the model was implemented slightly differently).

      The authors could make the new analyses somewhat stronger by using parameters optimized to capture just the pattern in CA parameters (for example by MSE). This would show that the models are in principle incapable of capturing the effect. However, this would be a marginal improvement because the conclusion would still rest on a quantitative difference that depends on specific modeling assumptions.

      We thank the reviewer for raising this important point. We agree our original wording could have been more carefully formulated and are grateful for this opportunity to refine this. The reviewer is correct that a model with only perseveration can qualitatively reproduce the pattern of increased relative positivity bias for less credible feedback in the main study (but not in the discovery study), and our previous text did not acknowledge this. As stated in the previous section, we have revised the manuscript (in the Results, Discussion, and SI) to ensure we address this in full. Our revised text now makes it explicit that while a pure perseveration account predicts the qualitative pattern, it does not predict the magnitude of the effects we observe in our data.

      RESULTS

      “Previous research has suggested that positivity bias may spuriously arise from pure choice-perseveration (i.e., a tendency to repeat previous choices regardless of outcome) (49–51). While our models included a perseveration-component, we acknowledge this control is not perfect. Therefore, in additional control analyses, we generated (using ex-post simulations based on best fitting parameters) synthetic datasets using models including choice-perseveration, but devoid of feedback-valence bias, and fitted these with our credibility-valence model (see SI 3.6.1). These analyses confirmed that a pure perseveration account can masquerade as an apparent positivity bias, and even predict the qualitative pattern of results related to credibility (i.e., a higher relative positivity bias for low-credibility feedback). Critically, however, this account consistently predicted a reduced magnitude of credibility-effect on relative positivity bias as compared to the one we observed in participants, suggesting at least some of the relative amplification of positivity bias goes above and beyond contributions from perseveration.”

      DISCUSSION

      “Previous reinforcement learning studies report greater credit-assignment based on positive compared to negative feedback, albeit only in the context of veridical feedback (43,44,63). Here, we investigated whether a positivity bias is amplified for information of low credibility, but our findings on this matter were equivocal and varied as a function of scaling (absolute or relative) and study. We observe selective absolute amplification of the positivity bias for information of low and intermediate credibility in the discovery study only. In contrast, we find a relative (to the overall extent of CA) amplification of confirmation bias in both studies. Importantly, the magnitude of these amplification effects cannot be reproduced in ex-post simulations of a model incorporating simple choice perseveration without an explicit positivity bias, suggesting that at least part of the amplification reflects a genuine increase in positivity bias.”

      SI (3.6.1)

      “Interestingly, a pure perseveration account predicted an amplification of the relative positivity bias under low (compared to full) credibility (with the two rightmost histograms in Fig. S24d falling in the positive range). However, the magnitude of this effect was significantly smaller than the empirical effect (as the bulk of these same histograms lies below the green points). Moreover, this account predicted a negative amplification (i.e., attenuation) of an absolute positivity bias, which was again significantly smaller than the empirical effect (see corresponding histograms in S24b). This pattern raises an intriguing possibility that perseveration may, at least partially, mask a true amplification of absolute positivity bias.”

      Furthermore, our revisions now make explicit that these analyses are based on ex-post simulations using the models' best-fitting parameters. We do not argue that this pattern can’t be captured by other parameters crafted specifically to capture it. However, we believe that ex-post fitting is the best practice for checking whether a model can produce an effect of interest (see for example The Importance of Falsification in Computational Cognitive Modeling, Palminteri et al., 2017; https://www.sciencedirect.com/science/article/pii/S1364661317300542?via%3Dihub). On this basis, we agree with the reviewer that the benefit of the suggested additional analyses is minimal.

      New simulations clearly demonstrate the confound in relative bias

      Figure 24 also speaks to the relative vs. absolute question. The model without positivity bias shows a slightly stronger absolute "positivity bias" for the most credible feedback, but a weaker relative bias. This is exactly in line with the logic laid out above. In standard bandit tasks, perseveration can be quite well-captured by a fixed absolute positivity bias, which is roughly what we see in the simulations (I'm not sure what to make of the slight increase; perhaps a useful lead for the authors). However, when we divide by average credit assignment, we now see a reduction. This clearly demonstrates that a reduction in relative bias can emerge without any true differences in positivity bias.

      This relates back to the earlier point about scaling. However, we wish to clarify that this is not a confound in the usual sense, i.e., an external variable that varies systematically with the independent variable (credibility) and influences the dependent variable (positivity bias), thereby undermining causal inference. Rather, we consider it a scaling issue: measuring absolute versus relative changes in the same variable can yield conflicting impressions.

      Given everything above, I think it is unlikely that the present data can provide even "solid" evidence for the claim that positivity bias is greater with less credible feedback. This confound could be quickly ruled out, however, by a study in which feedback is sometimes provided in the absence of a choice. This would empirically isolate positivity bias from choice-related effects, including perseveration.

      We trust our responses make clear we have tempered our claims and stated explicitly where a conclusion is equivocal. We believe we have convincing evidence for a nuanced claim regarding how credibility affects positivity bias.

      We are grateful for the reviewer’s suggestion of a study design to empirically isolate positivity bias from choice-related effects. We have considered this carefully, but do not believe the issue is as straightforward as suggested. As we understand it, the suggestion assumes that positivity bias should persist when people process feedback in the absence of choice (where perseverative tendencies would not be elicited). While this is possible, there is existing work that indicates otherwise. In particular, Chambon et al. (2020, Nature Human Behaviour) compared learning following free versus forced choices and found that learning asymmetries, including a positivity bias, were selectively evident in free-choice trials but not in forced-choice trials. This implies that a positivity bias is intricately tied to the act of choosing, rather than being a general learning artifact that emerges independently of choice context. This is further supported by arguments that the positivity bias in reinforcement learning is better understood as a form of confirmation bias, whereby feedback confirming a choice is weighted more heavily (Palminteri et al., 2017, PLoS Comput. Biol.). In other words, it is unclear whether one should expect positivity/confirmation bias to emerge when feedback is provided in the absence of choice.

      That said, we agree fully with a need to have task designs that better dissociate positivity bias from perseveration. We now acknowledge in our Discussion that such designs can benefit future studies on this topic:

      Future studies could also benefit from using designs that are better suited for dissociating learning asymmetries from gradual perseveration (51).

      We hope to be able to pursue this direction in the future.

      Recommendations for the Authors:

      I greatly appreciate the care with which you responded to my comments. I'm sorry that I can't improve my overall evaluation, given the seriousness of the concerns in the public review (which the new results have unfortunately bolstered more than assuaged). If it were me, I would definitely collect more data because both issues could very likely be strongly addressed with slight modifications of the current task.

      Alternatively, you could just dramatically de-emphasize the claim that positivity bias is higher for less credible feedback. I will be sad because it was my favorite result, but you have many other strong results, and I would still label the paper "important" without this one.

      We thank the reviewer for an exceptionally thorough and insightful engagement with our manuscript. Your meticulous attention to detail, and sharp conceptual critiques, have been invaluable, and our paper is immeasurably stronger and more rigorous as a direct result of this input. Indeed, the referee’s comments inspired us to prepare a new article that delves deeply into the confound of dissociating between gradual choice-perseveration and learning asymmetries in RL (Learning asymmetry or perseveration? A critical re-evaluation and solution to a pervasive confound, Vidal-Perez et al., 2025; https://osf.io/preprints/psyarxiv/xdse5_v1).

      Specifically, in this new paper we address the point that dissociating positivity bias from perseveration is a challenge not just for our work, but for the entire field of behavioral reinforcement learning. In fact, we argue that all studies claiming evidence for positivity bias, over and above an effect of perseveration, are subject to flaws, including being biased to find evidence for positivity/confirmation bias. Furthermore, we agree with the reviewer’s wish to see model-agnostic support and note there are currently no robust, behavioral, model-agnostic signatures implicating positivity bias over and above an effect of perseveration. While this remains an acknowledged limitation within our current work, we trust the reviewer will agree that relative to other efforts in the field, our current work pushes the boundary and takes several important steps beyond what has previously been done in this area.

      Below are some minor notes, mostly on the new content; hopefully easy. Please don't put much time into addressing these!

      Main text

      where individuals preferably learn from . Perhaps "preferentially"?

      The text has been modified to accommodate the reviewer’s comment:

      “Additionally, in both experiments, participants exhibited increased learning from trustworthy information when it was preceded by non-credible information and an amplified normalized positivity bias for noncredible sources, where individuals preferentially learn from positive compared to negative feedback (relative to the overall extent of learning).”

      One interpretation of this model is as a "sophisticated" logistic ... the CA parameters take the role of "regression coefficients"

      Consider removing "sophisticated" and also the quotations around "regression coefficients". This came across as unprofessional to me.

      The text has been modified to accommodate the reviewer’s comment:

      “The probability to choose a bandit (say A over B) in this family of models is a logistic function of the contrast choice-propensities between these two bandits. One interpretation of this model is as a logistic regression, where the CA parameters take the role of regression coefficients corresponding to the change in log odds of repeating the just-taken action in future trials based on the feedback (+/- CA for positive or negative feedback, respectively; the model also includes gradual perseveration which allows for constant log-odd changes that are not affected by choice feedback).”
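      The logistic-regression reading of the choice rule can be illustrated with a short sketch. The propensity values and the CA value below are hypothetical, and the function names are ours; the point is only that adding CA to the chosen bandit's propensity shifts the log odds of repeating that choice by exactly CA.

```python
import math


def prob_choose_a(propensity_a, propensity_b):
    """Logistic function of the propensity contrast (a two-option softmax)."""
    return 1.0 / (1.0 + math.exp(-(propensity_a - propensity_b)))


def log_odds(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


# Hypothetical case: equal propensities, then positive feedback after choosing A.
ca = 0.8
before = prob_choose_a(0.0, 0.0)
after = prob_choose_a(0.0 + ca, 0.0)
shift = log_odds(after) - log_odds(before)

print(f"P(A) before: {before:.3f}, after: {after:.3f}, log-odds shift: {shift:.3f}")
```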

      These models operate as our instructed-credibility and free-credibility Bayesian models, but also incorporate perseveration values, updated in each trial as in our CA models (Eqs. 3 and 5).

      Is Eq 3 supposed to be Eq 4 here? I don't see how Eq 3 is relevant. Relatedly, please use a variable other than P for perseveration because P(chosen) reads as "probability chosen", and you actually use P in the latter sense in e.g. Eq 11.

      The text has been modified to accommodate the reviewer’s comment. P values have been changed to Pers and P(bandit) has been replaced by Prob(bandit). “All models also included gradual perseveration for each bandit. In each trial the perseveration values (Pers) were updated according to

      Where PERS is a free parameter representing the Pers-value change for the chosen bandit, and fP (∈ [0,1]) is the free parameter denoting the forgetting rate applied to the Pers value. Additionally, the Pers-values of all the non-chosen bandits (i.e., again, the unchosen bandit of the current pair, and all the bandits from the not-shown pairs) were forgotten as follows:

      We modelled choices using a softmax decision rule, representing the probability of the participant to choose a given bandit over the alternative:
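      The update equations themselves are not reproduced in this text, so the following is only a sketch of one common form of gradual perseveration. It assumes that every Pers value decays by a factor (1 - fP) on each trial and that the chosen bandit additionally gains PERS; the manuscript's actual equations may differ. The names `pers_step` and `forget_rate` stand in for PERS and fP, and the bandit labels are hypothetical.

```python
def update_pers(pers, chosen, pers_step, forget_rate):
    """Return updated perseveration values for all bandits.

    Assumed form (not taken from the manuscript): every value decays by
    (1 - forget_rate), and the chosen bandit additionally gains pers_step.
    """
    if not 0.0 <= forget_rate <= 1.0:
        raise ValueError("forget_rate must lie in [0, 1]")
    updated = {bandit: (1.0 - forget_rate) * value for bandit, value in pers.items()}
    updated[chosen] += pers_step
    return updated


# Hypothetical run: four bandits, three trials.
pers = {"A": 0.0, "B": 0.0, "C": 0.0, "D": 0.0}
for choice in ["A", "A", "B"]:
    pers = update_pers(pers, choice, pers_step=0.5, forget_rate=0.2)

print(pers)
```

      Under these assumptions, repeated choices of the same bandit accumulate a Pers value that feeds into the decision rule independently of feedback, which is why perseveration can mimic a valence bias.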

      SI

      Figure 24 and Figure 26: in the x tick labels, consider using e.g. "0.5 vs 1" rather than "0.5-1". I initially read this as a bin range.

      We thank the reviewer for pointing this out. Our intention was to denote a direct subtraction (i.e., the effect for 0.5 credibility minus the effect for 1.0 credibility). We were concerned that not noting the subtraction might confuse readers about the direction of the plotted effect. We have clarified this in the figure legends:

      “Figure 24: Predicted positivity bias results for participants and for simulations of the Credibility-CA (including perseveration, but no valence-bias component). a, Valence bias results measured in absolute terms (by regressing the ML CA parameters, on their associated valence and credibility). b, Difference in positivity bias (measured in absolute terms) across credibility levels. On the x-axis, the hyphen (-) represents subtraction, such that a label of '0.5-1' indicates the difference in the measurement for the 0.5 and 1.0 credibility conditions. Such differences are again based on the same mixed effects model as plot a. The inflation of aVBI for lower-credibility agents is larger than the one predicted by a pure perseveration account. c, Valence bias results measured in relative terms (by regressing the rVBIs on their associated credibility). Participants present a higher rVBI than what would be predicted by a perseveration account (except for the completely credible agent). d, Difference in rVBI across credibility levels. Such differences are again based on the same mixed effects model as plot c. The inflation of rVBI for lower-credibility agents is larger than the one predicted by a pure perseveration account. Histograms depict the distribution of coefficients from 101 simulated group-level datasets generated by the Credibility-CA model and fitted with the Credibility-Valence CA model. Gray circles represent the mean coefficient from these simulations, while black/green circles show the actual regression coefficients from participant behaviour (green for significant effects in participants, black for non-significant). Significance markers (* p<.05, ** p<.01) indicate that fewer than 5% or 1% of simulated datasets, respectively, predicted an effect as strong as or stronger than that observed in participants, and in the same direction as the participant effect.”
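      The significance markers described in the legend amount to a simulation-based p-value. A minimal sketch of that logic follows; the simulated coefficients and the observed effect are invented for illustration, and the function names are ours.

```python
def simulation_p_value(observed, simulated):
    """Fraction of simulated effects at least as strong as the observed
    effect and in the same direction."""
    if observed >= 0:
        hits = sum(1 for s in simulated if s >= observed)
    else:
        hits = sum(1 for s in simulated if s <= observed)
    return hits / len(simulated)


def marker(p):
    """Map a p-value to the marker convention used in the legend."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Hypothetical coefficients from 101 simulated datasets, plus one observed effect.
simulated = [i / 100 for i in range(101)]  # 0.00, 0.01, ..., 1.00
observed = 0.975
p = simulation_p_value(observed, simulated)

print(f"p = {p:.4f} {marker(p)}")
```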

      However, importantly, these simulations did not predict a change in the level of positivity bias as a function of feedback credibility

      You're confirming the null hypothesis here; running more simulations would likely yield a significant effect. The simulation shows a pretty clear pattern of increasing positivity bias with higher credibility. Crucially, this is the opposite of what people show. Please adjust the language accordingly.

      The text has been modified to accommodate the reviewer’s comment.

      “However, importantly, these simulations did not reveal a significant change in the level of positivity bias as a function of feedback credibility, neither at an absolute level (F(3,412)=1.43,p=0.24), nor at a relative level (F(3,412)=2.06,p=0.13) (Fig. S25a-c). Numerically, the trend was towards an increasing (rather than decreasing) positivity bias as a function of credibility.”

      More importantly, the inflation in positivity bias for lower credibility feedback is substantially higher in participants than what would be predicted by a pure perseveration account, a finding that holds true for both absolute (Fig. S24b) and relative (Fig. S24d) measures.

      A statistical test would be nice here, e.g. a regression like rVBI ~ credibility_1 * is_model. Alternatively, clearly state what to look for in the figure, where it is pretty clear when you know exactly what you're looking for.
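      To spell out what the suggested interaction term would estimate: if both credibility_1 and is_model are coded as 0/1, the interaction coefficient of an ordinary least-squares fit equals a difference of differences between cell means. The sketch below computes only that point estimate, using invented rVBI values; a proper test would also need standard errors (for example from a mixed model or from the simulated datasets).

```python
from statistics import mean


def interaction_coefficient(rows):
    """rows: iterable of (rvbi, credibility_1, is_model) with 0/1 predictors.

    For this saturated 2x2 design, the OLS interaction coefficient equals
    the difference of differences between the four cell means.
    """
    cells = {}
    for rvbi, cred1, is_model in rows:
        cells.setdefault((cred1, is_model), []).append(rvbi)
    m = {key: mean(values) for key, values in cells.items()}
    return (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)])


# Hypothetical data: (rVBI, credibility_1, is_model)
rows = [
    (0.6, 0, 0), (0.8, 0, 0),  # participants, lower credibility
    (0.2, 1, 0), (0.4, 1, 0),  # participants, full credibility
    (0.4, 0, 1), (0.4, 0, 1),  # model, lower credibility
    (0.3, 1, 1), (0.3, 1, 1),  # model, full credibility
]
effect = interaction_coefficient(rows)
print(f"interaction estimate: {effect:.2f}")
```

      A positive estimate here means the drop in rVBI at full credibility is smaller in the model than in participants.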

      The text has been modified to make sure that the figure is easier to interpret (we pointed out to readers what they should look at):

      “Interestingly, a pure perseveration account predicted an amplification of the relative positivity bias under low (compared to full) credibility (with the two rightmost histograms in Fig. S24d falling in the positive range). However, the magnitude of this effect was significantly smaller than the empirical effect (as the bulk of these same histograms lies below the green points). Moreover, this account predicted a negative amplification (i.e., attenuation) of an absolute positivity bias, which was again significantly smaller than the empirical effect (see corresponding histograms in S24b). This pattern raises an intriguing possibility that perseveration may partially mask a true amplification of absolute positivity bias.”

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #1

      Evidence, reproducibility and clarity

      Summary:

      The authors use atomic force microscopy (AFM) to study mitochondria isolated from primary mouse livers, and they attempt to correlate these measurements with mitochondrial membrane potential and oxygen consumption under different bioenergetic conditions. They argue that AFM could be used diagnostically to assess mitochondrial function. While there is some novelty in potentially using AFM to assess mitochondrial function in the clinic, it is not clear how this would be more efficient or meaningful than assessing mitochondrial parameters by more standard methods, such as respirometry, confocal microscopy, etc. Considerably more work would need to be performed, particularly on relevant patient samples, to show that AFM holds potential as a diagnostic tool. It is important to note that the authors of this study have not taken sufficient care to quantify the mitochondrial membrane potential in a manner that could be considered reliable, which casts further doubt upon the merits of this method for diagnosing mitochondrial function. These concerns, laid out in detail below, should be thoroughly addressed before publication.

      Major comments:

      The authors used azide to inhibit complex V, but azide is also a potent inhibitor of complex IV (Bowler et al., 2006). Why did the authors not use oligomycin, which is more specific, to inhibit complex V?

      In Fig. 1 H - K, the y axes are labelled in a confusing or ambiguous way. The legend says that all data represent the mean ± SEM; however, panels D, F, H, and K have no error bars. For example, the data in H and K are shown as violin plots. Typically, the y axis would say what the name of the quantity is (e.g., mean TMRM fluorescence intensity) followed by the units (e.g., a.u.) in parentheses. However, the authors write, for example, in panel K "Mean pixel (TMRM)." The authors seem to follow the correct convention in panels D - G, so it is not clear why H - K are written incorrectly.

      In any event, the authors need to specify how these data were obtained, as there are virtually no details as to the methods of how these measurements of mitochondrial membrane potential were acquired. For example, JC-1 is a ratiometric probe. In its monomeric form, it emits a green signal, but, as the dye aggregates into so-called J-aggregates, the emission is red. The correct way of analyzing JC-1 signal is to compute the ratio of red over green fluorescence intensity. However, in the authors' quantifications, they simply say "Fluorescence (JC-1)." The units of the y axes go from zero to 20,000, which means that the authors likely did not assess the ratio of these emissions, so the data are not informative as to the actual mitochondrial membrane potential. Moreover, the authors indicate that they use 5 µM JC-1. This seems quite a high concentration, particularly for staining isolated mitochondria, which means that the dye has direct access to the organelle without having to cross the plasma membrane. There is no information about how long the dye was allowed to load and whether it was washed off prior to obtaining the measurements with the plate reader.

      Likewise, the authors used TMRM to also try to assess the mitochondrial membrane potential. In this case, they used 0.5 µM, but they did not indicate for what duration the mitochondria were exposed to the dye before going through the FACS. It should be noted, too, that TMRM is a Nernstian probe, which effectively stains mitochondria at concentrations as low as 1 nM. Accordingly, it is known that TMRM (and other mitochondrial dyes) can be toxic at higher concentrations, inhibiting essential processes such as OXPHOS. The very low dynamic range of the TMRM signal in panels H and K suggests that the signal was saturated, because there was too much dye loaded into the mitochondria. Moreover, the values, ranging merely from zero to 80, suggest a very insensitive method for quantifying the mitochondrial membrane potential.

      In Fig. S1 A-B, the authors used confocal microscopy to assess the isolated mitochondria. It would be wise to continue to use this technique for the other experiments, as plate readers and FACS offer no direct visual cues to validate that the numbers reflect bona fide biological measurements. Especially in the case of FACS, where there is an exceedingly large number of events, the statistics become essentially meaningless, as it is possible to show that almost anything is statistically significantly different if there is a sufficiently high number of samples or events. The authors should bear in mind that measuring the mitochondrial membrane potential is not trivial. One needs to understand the properties of the probes that are being employed as well as the instruments that are used to make the measurements. Care must be taken to ascertain that the quantifications reflect true biological processes.

      The authors claim, for Fig. 1, that there is an "excellent correlation" between height fluctuations and mitochondrial membrane potential. Given that the mitochondrial membrane potential measurements were associated with various errors (see above), it is premature to assert that there is any correlation at all. Furthermore, if the authors want to argue that there is indeed a correlation between these variables, then they should perform an appropriate statistical analysis, e.g., a Pearson correlation test.
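      For reference, the kind of correlation analysis requested could be as simple as the following sketch. The paired values are invented placeholders, not the authors' data, and the function name is ours. It returns the Pearson coefficient together with the t statistic (n - 2 degrees of freedom) from which a p-value would be obtained.

```python
import math


def pearson_r(x, y):
    """Return (r, t) for paired samples; t has n - 2 degrees of freedom."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return r, t


# Hypothetical paired measurements (e.g., height fluctuation vs. potential readout).
height_fluctuation = [1.0, 2.0, 3.0, 4.0, 5.0]
membrane_potential = [2.0, 4.0, 5.0, 4.0, 5.0]
r, t = pearson_r(height_fluctuation, membrane_potential)

print(f"r = {r:.3f}, t = {t:.3f}")
```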

      For the reasons explained above, the JC-1 and TMRM measurements in Figs. 3 and 4 are not convincing. The authors must demonstrate, unambiguously, that they understand the use of these probes and that they are making accurate measurements.
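      As an illustration of the ratiometric readout expected for JC-1, a minimal sketch is given below. The intensity values, the optional background terms, and the function name are hypothetical; the only point is that the reported quantity should be the red (J-aggregate) signal divided by the green (monomer) signal, not a single raw fluorescence channel.

```python
def jc1_ratio(red, green, red_background=0.0, green_background=0.0):
    """Background-subtracted red/green ratio for a JC-1 measurement."""
    red_corrected = red - red_background
    green_corrected = green - green_background
    if green_corrected <= 0:
        raise ValueError("green signal must exceed its background")
    return red_corrected / green_corrected


# Hypothetical plate-reader intensities (arbitrary units).
polarized = jc1_ratio(red=12000, green=3000)
depolarized = jc1_ratio(red=4000, green=8000)

print(f"polarized: {polarized:.2f}, depolarized: {depolarized:.2f}")
```

      A higher ratio indicates more J-aggregate formation and therefore a more polarized membrane, so a loss of potential should appear as a drop in the ratio.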

      Given that MTCH2 was recently reported to function as an insertase of the OMM (Guna et al., 2022), understanding the KO phenotype is extremely challenging, since it implicates the downstream loss of function of numerous other proteins. It would be valuable to examine other KO models with more specific mitochondrial defects, which can simplify the interpretation of the data. For example, suppression of any of the large Dynamin GTPases that control mitochondrial shape, i.e., MFN1/2, OPA1, or DRP1. Conversely, modulation of mitochondrial membrane composition by suppression of specific phospholipid biosynthetic enzymes would be valuable. It is important to note that the authors are attempting to highlight AFM as a novel way to assess patient samples, but they do not provide any data as to whether mitochondria, derived from a patient with a known mitochondrial defect, could be meaningfully assessed by this method. It is worth pointing out, too, that isolating mitochondria from primary tissues involves a significant amount of stress to the organelle. To understand mitochondrial function in a manner that reflects an in vivo state as much as possible, it would be essential to show that the isolated mitochondria from the liver are largely the same as those in intact liver cells. The authors should be aware that isolating live hepatocytes is far from a trivial thing to do (Charni-Natan & Goldstein, 2020). Simply mincing the liver and subjecting it to mechanical and enzymatic dissociation likely involves significant mitochondrial stress, which implies that the values derived from isolated mitochondria represent a highly non-physiological, even dysfunctional, condition. These are fundamental concerns which should be considered and discussed in any report that is lauding the potential diagnostic benefits of quantifying isolated mitochondria from primary tissues.

      The authors say, in the discussion, "Accordingly, the AFM method employed here measured several characteristics such as morphology and elastic modulus of the structures, as well as fully exploiting the rich information available from the noise spectra." There was no measurement of "morphology" in this study. Differences in height are not what is generally considered in discussions of mitochondrial morphology, which reflects the dynamic changes in organelle shape and connectivity, typically in the x-y (rather than z) axes.

      The authors performed experiments on fixed and dried mitochondria; however, there is no systematic comparison of the integrated power and other parameters compared to the live mitochondria isolates. This is a key comparison that should have been performed, as it would offer a basic frame of reference for the values of the live organelles. Another key experiment that is lacking in this study is measurement of the same organelle over time to understand the variance in individual organelles from moment to moment.

      Minor comments:

      Generally, the authors should moderate their claims that AFM could be used diagnostically until the above concerns are addressed.

      There needs to be considerably more detail as to the methods that were used here. This is essential insofar as the authors wish to convince potential readers that the experiments were carefully conducted and that the data is reliable. Putting numbers on the margin of the manuscript would be helpful for the referee to specifically address certain points.

      References:

      Bowler MW, Montgomery MG, Leslie AG, Walker JE. How azide inhibits ATP hydrolysis by the F-ATPases. Proc Natl Acad Sci U S A. 2006 Jun 6;103(23):8646-9. doi: 10.1073/pnas.0602915103. Epub 2006 May 25. PMID: 16728506; PMCID: PMC1469772.

      Guna A, Stevens TA, Inglis AJ, Replogle JM, Esantsi TK, Muthukumar G, Shaffer KCL, Wang ML, Pogson AN, Jones JJ, Lomenick B, Chou TF, Weissman JS, Voorhees RM. MTCH2 is a mitochondrial outer membrane protein insertase. Science. 2022 Oct 21;378(6617):317-322. doi: 10.1126/science.add1856. Epub 2022 Oct 20. PMID: 36264797; PMCID: PMC9674023.

      Charni-Natan M, Goldstein I. Protocol for Primary Mouse Hepatocyte Isolation. STAR Protoc. 2020 Aug 13;1(2):100086. doi: 10.1016/j.xpro.2020.100086. PMID: 33111119; PMCID: PMC7580103.

      Significance

      I am an expert in imaging of mitochondria, with considerable direct knowledge of various super-resolution and advanced imaging systems. I have also studied mitochondrial function, using standard biochemical and molecular approaches. I have great familiarity with mitochondrial behavior and dynamics, as understood from live-cell imaging approaches and morphological analysis.

      This study is potentially interesting due to its relatively novel use of AFM to examine mitochondria. However, there is a lot of uncertainty in the measurements due to technical oversights and lack of relevant controls. Whether AFM could be useful in the clinic remains an open question. If the authors could address the comments above, it would go a long way to finding out one way or the other.

    1. Reviewer #1 (Public review):

      Summary:

      The authors provide a compelling case that the unique variance explained by LLMs is different (and later) than the unique variance explained by DNNs. This characterises when, and to some extent where, these differences occur, and for LLMs, why. The authors also probe what in the sentences is driving the brain alignment.
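      For readers less familiar with the notion of unique variance, the sketch below shows the standard variance-partitioning arithmetic for two feature sets. It assumes the usual scheme in which unique variance is the fused model's R² minus the other model's R²; whether the manuscript uses exactly this scheme is an assumption, and the R² values are invented.

```python
def partition_variance(r2_dnn, r2_llm, r2_fusion):
    """Split explained variance into unique and shared parts (assumed scheme)."""
    return {
        "unique_dnn": r2_fusion - r2_llm,
        "unique_llm": r2_fusion - r2_dnn,
        "shared": r2_dnn + r2_llm - r2_fusion,
    }


# Hypothetical R² values for the vision-only, language-only, and fused models.
parts = partition_variance(r2_dnn=0.30, r2_llm=0.25, r2_fusion=0.40)
for name, value in parts.items():
    print(f"{name}: {value:.2f}")
```

      The three parts sum to the fused model's R², and the shared part is the overlap whose interpretation is raised under Weaknesses.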

      Strengths:

      (1) The study is timely.

      (2) There is a robust dataset and results.

      (3) There is compelling separation between unique responses related to LLMs and DNNs.

      (4) The paper is well-written.

      Weaknesses:

      The authors could explore more of what the overlap between the LLM and DNN means, and in general, how this relates to untrained networks.

    2. Reviewer #2 (Public review):

      Summary:

      This study provides an investigation into the temporal dynamics of visuo-semantic processing in the human brain, leveraging both deep neural networks (DNNs) and large language models (LLMs). By developing encoding models based on vision DNNs, LLMs, and their fusion, the authors demonstrate that vision DNNs preferentially account for early, broadband EEG responses, while LLMs capture later, low-frequency signals and more detailed visuo-semantic information. It is shown that the parietal cortex shows responses during visuo-semantic processing that can be partially accounted for by language features, highlighting the role of higher-level areas in encoding abstract semantic information.

      Strengths:

      The study leverages a very large EEG dataset with tens of thousands of stimulus presentations, which provides an unusually strong foundation for benchmarking a variety of vision DNNs and LLMs. This scale not only increases statistical power but also allows robust comparison across model architectures, ensuring that the conclusions are not idiosyncratic to a particular dataset or stimulus set.

      By using high-density EEG, the authors are able to capture the fine-grained temporal dynamics of visuo-semantic processing, going beyond the coarse temporal resolution of fMRI-based studies. This enables the authors to disentangle early perceptual encoding from later semantic integration, and to characterize how different model types map onto these stages of brain activity. The temporal dimension provides a particularly valuable complement to previous fMRI-based model-to-brain alignment studies.

      The encoding models convincingly show that vision DNNs and LLMs play complementary roles in predicting neural responses. The vision DNNs explain earlier broadband responses related to perceptual processing, while LLMs capture later, lower-frequency signals that reflect higher-order semantic integration. This dual contribution provides new mechanistic insights into how visual and semantic information unfold over time in the brain, and highlights the utility of combining unimodal models rather than relying on multimodal networks alone.

      Weaknesses:

      (1) The experimental design is insufficiently described, particularly regarding whether participants were engaged in a behavioral task or simply passively viewing images. Task demands are known to strongly influence neural coding and representations, and without this information, it is difficult to interpret the nature of the EEG responses reported.

      (2) The description of the encoding model lacks precision and formalization. It is not entirely clear what exactly is being predicted, how the model weights are structured across time points, or the dimensionality of the inputs and outputs. A more formal mathematical formulation would improve clarity and reproducibility.

      (3) The selected vision DNNs (CORnet-S, ResNet, AlexNet, MoCo) have substantially lower ImageNet classification accuracies than current state-of-the-art models, with gaps of at least 10%. Referring to these models collectively as "vision DNNs" may overstate their representational adequacy. This performance gap raises concerns about whether the chosen models can fully capture the visual and semantic features needed for comparison with brain data. Clarification of the rationale for choosing these particular networks, and discussion of how this limitation might affect the conclusions, is needed.

      (4) The analytic framework treats "vision" and "language" as strictly separate representational domains. However, semantics are known to emerge in many state-of-the-art visual models, with different layers spanning a gradient from low-level visual features to higher-level semantic representations. Some visual layers may be closer to LLM-derived representations than others. By not examining this finer-grained representational structure within vision DNNs, the study may oversimplify the distinction between vision- and language-based contributions.

      (5) The study uses static images, which restricts the scope of the findings to relatively constrained visual semantics. This limitation may explain why nouns and adjectives improved predictions over vision DNNs, but verbs did not. Verbs often require dynamic information about actions or events, which static images cannot convey.
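
      As an illustration of the kind of formulation requested in point (2), a time-resolved linear encoding model could be written as below. This is only a sketch under assumed notation: the ridge penalty, the symbols, and the score S are assumptions made for illustration and are not taken from the manuscript, which may use a different estimator or metric.

```latex
% Assumed notation (hypothetical, not from the manuscript):
%   X \in \mathbb{R}^{N \times D} : model features for N training images, D dimensions
%   y_{c,t} \in \mathbb{R}^{N}    : EEG response at channel c and time point t
%   \lambda                       : regularization strength (ridge is an assumption)
\hat{w}_{c,t} = \arg\min_{w} \; \lVert y_{c,t} - X w \rVert_2^2 + \lambda \lVert w \rVert_2^2,
\qquad
\hat{y}_{c,t} = X_{\text{test}} \, \hat{w}_{c,t}

% Unique contribution of each feature set at time t, for a held-out score S
% (for example R^2 or a correlation), with "fusion" denoting the concatenated features:
U_{\text{LLM}}(t) = S_{\text{fusion}}(t) - S_{\text{DNN}}(t),
\qquad
U_{\text{DNN}}(t) = S_{\text{fusion}}(t) - S_{\text{LLM}}(t)
```

      Stating the equivalents of N, D, and the number of channels and time points for each model would make the input and output dimensionality explicit.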

    3. Reviewer #3 (Public review):

      Summary:

      Rong et al. compare EEG image responses from a large-scale dataset to state-of-the-art vision and language models, as well as their fusion. They find that the fusion of models provides the best predictivity, with early contribution from vision models and later predictivity from language models. The paper has several strengths: high temporal resolution data (though at the expense of spatial resolution), detailed comparison of alignment (and differences) between vision and language model embeddings, and comparison of "fusion" of different DNN models.

      Despite the paper's strengths, it is not clear what is at stake with these findings or how they advance our knowledge beyond other recent studies showing vision versus language model predictions of visual cortex responses with fMRI.

      Strengths:

      The authors use a large-scale EEG dataset and a comprehensive modeling approach. The methods are sound and involve multiple model comparisons. In particular, the disentangling of vision and language model features is something that has been largely ignored in prior related studies.

      Weaknesses:

      (1) The authors state their main hypothesis (lines 48-51) that human neural responses to visual stimulation are better modelled by combining representations from a vision DNN and an LLM than by the representations from either of the two components alone, and that the vision DNN and LLM components would uniquely predict earlier and later stages of visual processing, respectively.

      While they confirm this hypothesis in largely compelling ways, it is not clear whether these results tell us something about the brain beyond how to build the most predictive model.

      In particular, why do language models offer advantages over vision models, and what does this tell us about human visual processing? In several places, the discussion of advantages for the language model felt somewhat trivial and did not seem to advance our understanding of human vision, e.g., "responses for visual stimulation encode detailed information about objects and their properties" (lines 266-270) and "LLM representations capture detailed visuo-semantic information about the stimulus images" (line 293).

      (2) It is not clear what the high temporal resolution EEG data tell us that the whole-brain fMRI data do not. The latency results seem to be largely in line with fMRI findings, where the early visual cortex is better predicted by vision models, and the language model is better in later/more anterior regions. In addition, it would help to discuss whether the EEG signals are likely to be restricted to the visual cortex, or could the LLM predictivity explain downstream processing captured by whole-brain EEG signals?

      Relatedly, it would help for the authors to expand on the implications of the frequency analysis.

      (3) While the authors test many combinations of vision and language models and show their "fusion" advantages are largely robust to these changes, it is still hard to ignore the vast differences between vision and language models, in terms of architecture and how they are trained. Two studies (Wang et al., 2023, and Conwell et al., 2024) have now shown that when properly controlling for architecture and dataset, there is little to no advantage of language alignment in predicting visual cortex responses. It would help for the authors to both discuss this aspect of the prior literature and to try to address the implications for their own findings (related to pt 1 about what, if anything, is "special" about language models).

      (4) Model features: it would help to state the dimensionality of the input embeddings for each model and how much variance is explained and preserved after the PCA step. I wonder how sensitive the findings are to this choice of dimensionality reduction, and whether an approach that finds the optimal model layer (in a cross-validated way) would show less of a difference between vision/language models (I realize this is not feasible with models like GPT-3).

      (5) To better understand the fusion advantage, it would help to look at the results for a pair of vision models and for a pair of language models. Can a similar advantage be found by combining models from the same modality?

    1. Reviewer #1 (Public review):

      Summary:

      Rahmani et al. utilize the TurboID method to characterize global proteome changes in the worm's nervous system induced by a salt-based associative learning paradigm. Altogether, they uncover 706 proteins tagged by the TurboID method in worms that underwent the memory-inducing protocol. Next, the authors conduct a gene enrichment analysis that implicates specific molecular pathways in salt-associative learning, such as MAP kinase and cAMP-mediated pathways, as well as specific neuronal classes including pharyngeal neurons, and specific sensory neurons, interneurons, and motor neurons. The authors then screen a representative group of hits from the proteome analysis. They find that mutants of candidate genes from the MAP kinase pathway, namely dlk-1 and uev-3, do not affect performance in the learning paradigm. Instead, multiple acetylcholine signaling mutants, as well as a protein-kinase-A mutant, significantly affected performance in the associative memory assay (e.g., acc-1, acc-3, lgc-46, and kin-2). Finally, the authors demonstrate that protein-kinase-A mutants, as well as acetylcholine signaling mutants, do not exhibit a phenotype in a related but distinct conditioning paradigm (aversive salt conditioning), suggesting their effect is specific to appetitive salt conditioning.

      Overall, the authors addressed the concerns raised in the previous review round, including the statistics of the chemotaxis experiments and the systems-level analysis of the neuron class expression patterns of their hits. I also appreciate the further attempt to equalize the sample size of the chemotaxis experiments and the transparent reporting of the sample size and statistics in the figure captions and Table S9. The new results from the panneuronal overexpression of the kin-2 gain-of-function allele also contribute to the manuscript. Together, these make the paper more compelling. The additional tested hits provide a comprehensive analysis of the main molecular pathways that could have affected learning. However, the revised manuscript includes more information and analysis, raising additional concerns.

      Major comments:

      As reviewer 4 noted, and as also shown to be relevant for C30G12.6 presented in Figure 6, the backcrossing of the mutants is important, as background mutations may lead to the observed effects. Could the authors add to Table 1, sheet 1, the outcrossing status of the tested mutants? It is important to validate that the results of the positive hits (where learning was affected), such as acc-1, acc-3, and lgc-46, do not stem from background mutations.

      The fold change in the number of hits for different neurons in the CENGEN-based rank analysis requires a statistical test (discussed on pages 17-19 and summarized in Table S7), as do the other gene enrichment analyses presented in the manuscript. Since the authors extensively elaborate on the results from this analysis, I think a statistical analysis is especially important for its interpretation. For example, considering the IL1 neurons, which ranked highest, and assuming random groups of genes, each having the same size as those of the ranked neurons (209 genes in total for IL1 in Table S7), how common would it be to get the calculated fold change of 1.38 or higher? Such bootstrapping analysis is common for enrichment analysis. Perhaps the authors could consult with an institutional expert (Dr. Pawel Skuza, Flinders University) for the statistical aspects of this analysis.
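
      The resampling test suggested above could be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the gene lists, and the definition of fold change (fraction of hits in a gene set divided by the fraction of hits in the background) are assumptions, since the manuscript's exact calculation is not restated here.

```python
import random


def empirical_enrichment_p(observed_fold, background, hits, set_size,
                           n_draws=10_000, seed=0):
    """Empirical p-value for an enrichment fold change (hypothetical helper).

    Draws random gene sets of `set_size` from `background` and returns the
    fraction whose fold change is at least `observed_fold`.
    """
    rng = random.Random(seed)
    background = list(background)
    hit_set = set(hits) & set(background)
    # Assumed definition: expected hit fraction across the whole background.
    expected_frac = len(hit_set) / len(background)
    if expected_frac == 0:
        raise ValueError("no hits found in the background gene list")

    at_least_as_extreme = 0
    for _ in range(n_draws):
        sample = rng.sample(background, set_size)  # sampling without replacement
        frac = sum(gene in hit_set for gene in sample) / set_size
        if frac / expected_frac >= observed_fold:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_draws + 1)
```

      With the real data, `background` would be the genes detected in the expression atlas, `hits` the learning proteome, `set_size` the size of the neuron's gene set (209 for IL1), and `observed_fold` the reported value (1.38).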

      The learning phenotypes from Figure S8, concerning acc-1, acc-3, and lgc-46 mutants, are summarized in a scheme in Figure 4; however, the chemotaxis results are found in the supplemental Figure S8. Perhaps I missed the reasoning, but for transparency, I think the relevant Figure S8 results should be shown together with their summary scheme in Figure 4.

    2. Reviewer #4 (Public review):

      Summary:

      In this manuscript, the authors used a learning paradigm in C. elegans in which worms fed on a saltless plate show greatly reduced chemotaxis to salt. To identify learning-related proteins, the authors employed nervous system-specific proteome analysis to compare the proteins present in neurons between high-salt-fed and saltless-fed animals. The authors identified "learning-specific proteins", which are observed only after saltless feeding. They categorized these proteins by GO, pathway, and expression-site analyses, and went on to test mutants of selected genes identified by the proteome analysis. They find several mutants that are defective or hyper-proficient for learning, including the acc-1/3 and lgc-46 acetylcholine receptors, the putative arginine kinase F46H5.3, and kin-2, a cAMP pathway gene. These mutants were not previously reported to show abnormalities in this learning paradigm.

      Concerns:

      Upon revision, authors addressed all concerns of this reviewer, and the results are now presented in a way that facilitates objective evaluation. Authors' conclusions are supported by the results presented, and the strength of the proteomics approach is persuasively demonstrated.

      Significance:

      (1) Total neural proteome analysis has not been conducted before for learning-induced changes, though transcriptome analysis has been performed for odor learning (Lakhina et al., http://dx.doi.org/10.1016/j.neuron.2014.12.029). This warrants the novelty of this manuscript, because for some genes, protein levels may change even though mRNA levels remain the same. Although in a few reports TurboID has been used in C. elegans, this is the first report of a systematic analysis of tissue-specific differential proteomics.

      (2) The authors found five mutants with abnormal salt learning. These genes have not previously been described as having this abnormality, providing novel knowledge to readers, especially those who work on C. elegans behavioural plasticity. In particular, the involvement of acetylcholine neurotransmission has not been addressed before. Although transgenic rescue experiments have not been performed except for kin-2, and the site of action (neurons involved) has not been tested in this manuscript, the work opens an avenue to further determine how acetylcholine receptors, the cAMP pathway, etc. influence the learning process.

    3. Author response:

      General Statements

      We thank the reviewers for providing us the opportunity to revise our manuscript titled “Identifying regulators of associative learning using a protein-labelling approach in C. elegans.” We appreciate the insightful feedback that we received to improve this work. In response, we have extensively revised the manuscript with the following changes: we have (1) clarified the criteria used for selecting candidate genes for behavioural testing, presenting additional data from ‘strong’ hits identified in multiple biological replicates (now testing 26 candidates, previously 17), (2) expanded our discussion of the functional relevance of validated hits, including providing new tissue-specific and neuron class-specific analyses, and (3) improved the presentation of our data, including visualising networks identified in the ‘learning proteome’, to better highlight the significance of our findings. We also substantially revised the text to indicate our attempts to address limitations related to background noise in the proteomic data and outlined potential refinements for future studies. All revisions are clearly marked in the manuscript in red font. A detailed, point-by-point response to each comment is provided below.

      Point-by-point description of the revisions:

      Reviewer #1 (Evidence, reproducibility and clarity):

      Summary:

      Rahmani et al. utilize the TurboID method to characterize the global proteome changes in the worm's nervous system induced by a salt-based associative learning paradigm. Altogether, Rahmani et al. uncover 706 proteins that are tagged by the TurboID method specifically in samples extracted from worms that underwent the memory-inducing protocol. Next, the authors conduct a gene enrichment analysis that implicates specific molecular pathways in salt-associative learning, such as MAP-kinase and cAMP-mediated pathways. The authors then screen a representative group of the hits from the proteome analysis. The authors find that mutants of candidate genes from the MAP-kinase pathway, namely dlk-1 and uev-3, do not affect the performance in the learning paradigm. Instead, multiple acetylcholine signaling mutants significantly affected the performance in the associative memory assay, e.g., acc-1, acc-3, gar-1, and lgc-46. Finally, the authors demonstrate that the acetylcholine signaling mutants did not exhibit a phenotype in similar but different conditioning paradigms, such as aversive salt-conditioning or appetitive odor conditioning, suggesting their effect is specific to appetitive salt conditioning.

      Major comments:

      (1) The statistical approach and analysis of the behavior assay:

      The authors use a 2-way ANOVA test, which assumes a normal distribution of the data. However, the chemotaxis index used in the study is bounded between -1 and 1, which prevents values near the boundaries from being normally distributed.

      Since most of the control data in this assay in this study is very close to 1, it strongly suggests that the CI data is not normally distributed and therefore 2-way ANOVA is expected to give skewed results.

      I am aware this is a common mistake, and I anticipate that most conclusions will still hold under a more fitting statistical test.

      We appreciate the point raised by Reviewer 1 and understand the importance of performing the correct statistical tests.

      The statistical tests used in this study were chosen since parametric tests, particularly ANOVA tests to assess differences between multiple groups, are commonly used to assess behaviour in the C. elegans learning and memory field. Below is a summary of the tests used by studies that perform similar behavioural tests cited in this work, as examples:

      Author response table 1.

      A summary of the statistical tests performed by similar studies on chemotaxis assay data. References (listed in the leftmost column) were observed to (A) use parametric tests only or (B) perform either a parametric or non-parametric test on each chemotaxis assay dataset depending on whether the data passed a normality test. Listings for ANOVA tests are in bold to demonstrate their common use in the C. elegans learning and memory field.

      We note Reviewer 1's concern that this may stem from a common mistake. As stated, two-way ANOVA generally relies on normally distributed data. We used GraphPad Prism to perform the Shapiro-Wilk normality test on our chemotaxis assay data, as it is generally appropriate for sample sizes < 50 (α = 0.05), and found that most of the data pass this test, including groups with skewed indices. For example, this is the data for Figure S8C:

      Author response table 2.

      Shapiro-Wilk normality test results for chemotaxis assay data in Figure S8C. Chemotaxis assay data was generated to assess salt associative learning capacity for wild-type (WT) versus lgc-46(-) mutant C. elegans. Three experimental groups were prepared for each C. elegans strain (naïve, high-salt control, and trained). From top-to-bottom, the data below displays the ‘W’ value, ‘P value’, a binary yes/no for whether the data passes the Shapiro-Wilk normality test, and a ‘P value summary’ (ns = nonsignificant). W values measure the similarity between a normal distribution and the chemotaxis assay data. Data is considered normal in the Shapiro-Wilk normality test when a W value is near 1.0 and the null hypothesis is not rejected (i.e., P value > 0.05).

      The manuscript now includes the use of the Shapiro-Wilk normality test to assess chemotaxis assay data before using two-way ANOVA on page 51.

      Nevertheless an appropriate statistical analysis should be performed. Since I assume the authors would wish to take into consideration both the different conditions and biological repeats, I can suggest two options:

      - Using a generalized linear mixed model, which can be fitted in R.

      - Using a custom bootstrapping approach.

      We thank Reviewer 1 for suggesting these two options. We carefully considered both approaches and consulted with the in-house statistician at our institution (Dr Pawel Skuza, Flinders University) for expert advice to guide our decision. In summary:

      (1) Generalised linear mixed models: Generalised linear mixed models (GLMMs) are generally most appropriate for nested/hierarchical data. However, our chemotaxis assay data does not exhibit such nesting. Each biological replicate (N) consists of three technical replicates, which are averaged to yield a single chemotaxis index per N. Our statistical comparisons are based solely on these averaged values across experimental groups, making GLMMs less applicable in this context.

      (2) Bootstrapping: Based on advice from our statistician, while bootstrapping can be a powerful tool, its effectiveness is limited when applied to datasets with a low number of biological replicates (N). Bootstrapping relies on resampling existing data to simulate additional observations, which may artificially inflate statistical power and potentially suggest significance where the biological effect size is minimal or not meaningful. Increasing the number of biological replicates to accommodate bootstrapping could introduce additional variability and compromise the interpretability of the results.
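
      The data structure described in point (1), in which technical replicates are averaged into one chemotaxis index per biological replicate before any group comparison, can be illustrated with a short sketch. The function name, group labels, and values are hypothetical and used only for illustration; they are not data from this study.

```python
from collections import defaultdict
from statistics import mean


def collapse_technical_replicates(records):
    """Average technical replicates within each biological replicate.

    `records` is an iterable of (group, biological_replicate, chemotaxis_index)
    tuples. Returns {group: [one averaged index per biological replicate]}.
    """
    by_replicate = defaultdict(list)
    for group, bio_rep, index in records:
        by_replicate[(group, bio_rep)].append(index)

    per_group = defaultdict(list)
    # Sort so that replicates appear in a stable order within each group.
    for (group, _bio_rep), values in sorted(by_replicate.items()):
        per_group[group].append(mean(values))
    return dict(per_group)


# Hypothetical example: two biological replicates, three plates each.
example = [
    ("WT_trained", 1, 0.2), ("WT_trained", 1, 0.3), ("WT_trained", 1, 0.4),
    ("WT_trained", 2, 0.5), ("WT_trained", 2, 0.5), ("WT_trained", 2, 0.5),
]
averaged = collapse_technical_replicates(example)
```

      Each list in `averaged` then contains N values per group, which are the values entering the group-level test.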

      The total number of assays, especially controls, varies quite a bit between the tested mutants. For example, compare the acc-1 experiment in Figure 4A with gap-1 or rho-1 in Figure S4A and D. It is hard to know the exact N of the controls, but I assume that, for example, lowering the N of the wild-type control for acc-1 to match that of gap-1 would have made the result non-significant. Perhaps the best approach would be to conduct a power analysis to determine what N should be acquired for all samples.

      We carefully considered performing a power analysis; however, this is typically performed with the assumption that an N = 1 represents a single individual/person. An N = 1 in this study is one biological replicate that includes hundreds of worms, which is why power analysis is not typically employed in our field for this type of behavioural test.

      Considering these factors, we have opted to continue using a two-way ANOVA for our statistical analysis. This choice aligns with recent publications that employ similar experimental designs and data structures. Crucially, we have verified that our data meet the assumptions of normality, addressing key concerns regarding the suitability of parametric testing. We believe this approach is sufficiently rigorous to support our main conclusions. This rationale is now outlined on page 51.

      To be fully transparent, our aim is to present differences between wild-type and mutant strains that are clearly visible in the graphical data, such that the choice of statistical test does not become a limiting factor in interpreting biological relevance. We hope this rationale is understandable, and we sincerely appreciate the reviewer’s comment and the opportunity to clarify our analytical approach.

      We hope that Reviewer 1 will appreciate these considerations as sufficient justification to retain the statistical tests used in the original manuscript. Nevertheless, to constructively address this comment, we have performed the following revisions:

      (1) Consistent number of biological replicates: We performed additional biological replicates of the learning assay to confirm the behavioural phenotypes for the key candidates described (KIN-2, F46H5.3, ACC-1, ACC-3, LGC-46). We chose N = 5 since most studies cited in this paper that perform similar behavioural tests do the same (see Author response table 3 below).

      Author response table 3.

      A summary of the sample sizes generated by similar studies for chemotaxis assay data. References (listed in the leftmost column) were observed to use the sample sizes (N) below, corresponding to biological replicates of chemotaxis assay data. N values are in bold when the study uses N ≤ 5.

      (2) Grouped presentation of behavioural data: We now present all behavioural data by grouping genotypes tested within the same biological replicate, including wild-type controls, rather than combining genotypes tested separately. This ensures that each graph displays data from genotypes sharing the same N, also an important consideration for performing parametric tests. Accordingly, we re-performed statistical analyses using this reduced N for relevant graphs. As anticipated, this rendered some comparisons non-significant. All statistical comparisons are clearly indicated on each graph.

      (3) Improved clarity of figure legends: We revised figure legends for Figures 5, 6, S7, S8, & S9 to make clear how many biological replicates have been performed for each genotype by adding N numbers for each genotype in all figures.

      The authors use the phrasing "a non-significant trend". I find such claims uninterpretable, and they should be avoided. Examples: Page 16, line 7 and Page 18, line 16.

      This is an important point. While we were not able to find the specific phrasing "a non-significant trend" from this comment in the original manuscript, we acknowledge that referring to a phenotype as both a trend and non-significant may confuse readers, which was originally stated in the manuscript in two locations.

      The main text has been revised on pages 27 & 28 when describing comparisons between trained groups between two C. elegans lines, by removing mentions of trends and retaining descriptions of non-significance.

      (2) Neuron-specific analysis and rescue of mutants:

      Throughout the study the authors avoid focusing on specific neurons. This is understandable as the authors aim at a systems biology approach; however, in my view this limits the impact of the study. I am aware that the proteome changes analyzed in this study were extracted from a pan-neuronally expressed TurboID. Yet, neuron-specific changes may nevertheless be found. For example, running the protein lists from Table S2 through the gene enrichment tool of WormBase, I found, across several biological replicates, enrichment for the NSM, CAN and RIG neurons. A more careful analysis may uncover specific neurons that take part in this associative memory paradigm. In addition, analysing the expression of the final gene list across different neurons, looking for overlap and connectivity, would also help to point towards specific circuits.

      This is an important and useful suggestion. We appreciate the benefit in exploring the data from this study from a neuron class-specific lens, in addition to the systems-level analyses already presented.

      The WormBase gene enrichment tool is indeed valuable for broad transcriptomic analyses (the findings from utilising this tool are now on page 16); however, its use of Anatomy Ontology (AO) terms also contains annotations from more abundant non-neuronal tissues in the worm. To strengthen our analysis and complement the WormBase tool, we also used the CeNGEN database as suggested by Reviewer 3 Major Comment 1 (Taylor et al., 2021), which uses single cell RNA-Seq data to profile gene expression across the C. elegans nervous system. We input our learning proteome data into CeNGEN as a systemic analysis, identifying neurons highly represented by the learning proteome (on pages 16-20). To do this, we specifically compared genes/proteins from high-salt control worms and trained worms to identify potential neurons that may be involved in this learning paradigm. Briefly, we found:

      - WormBase gene enrichment tool: Enrichment for anatomy terms corresponding to specific interneurons (ADA, RIS, RIG), ventral nerve cord neurons, pharyngeal neurons (M1, M2, M5, I4), PVD sensory neurons, DD motor neurons, serotonergic NSM neurons, and CAN.

      - CeNGEN analysis: Representation of neurons previously implicated in associative learning (e.g., AVK interneurons, RIS interneurons, salt-sensing neuron ASEL, CEP & ADE dopaminergic neurons, and AIB interneurons), as well as neurons not previously studied in this context (pharyngeal neurons I3 & I6, polymodal neuron IL1, motor neuron DA9, and interneuron DVC). Methods are detailed on pages 50 & 51.

      These data are summarised in the revised manuscript as Table S7 & Figure 4.

      To further address the reviewer’s suggestion, we examined the overlap in expression patterns of the validated learning-associated genes acc-1, acc-3, lgc-46, kin-2, and F46H5.3 across the neuron classes above, using the CeNGEN database. This was done to explore potential neuron classes in which these regulators may act to regulate learning. This analysis revealed both shared and distinct expression profiles, suggesting potential functional connectivity or co-regulation among subsets of neurons. To summarise, we found:

      - All five learning regulators are expressed in RIM interneurons and DB motor neurons.

      - KIN-2 and F46H5.3 share the same neuron expression profile and are present in many neurons, so they may play a general function within the nervous system to facilitate learning.

      - ACC-3 is expressed in three sensory neuron classes (ASE, CEP, & IL1).

      - In contrast, ACC-1 and LGC-46 are expressed in neuron classes (in brackets) implicated in gustatory or olfactory learning paradigms (AIB, AVK, NSM, RIG, & RIS) (Beets et al., 2012, Fadda et al., 2020, Wang et al., 2025, Zhou et al., 2023, Sato et al., 2021), neurons important for backward or forward locomotion (AVE, DA, DB, & VB) (Chalfie et al., 1985), and neuron classes whose function is yet to be detailed in the literature (ADA, I4, M1, M2, & M5).

      These neurons form a potential neural circuit that may underlie this form of behavioural plasticity, which we now describe in the main text on pages 16-20 & 34-35 and summarise in Figure 4.

      OPTIONAL: A rescue of the mutant phenotypes by re-expression of the gene is missing; this would help rule out false-positive results arising from background mutations. For example, a pan-neuronal or endogenous promoter rescue would help the authors substantiate their claims, and this could be done for the most promising genes. The ideal experiment would be a neuron-specific rescue, but this can be saved for future work.

      We appreciate this suggestion and recognise its potential to strengthen our manuscript. In response, we made many attempts to generate pan-neuronal and endogenous promoter re-expression lines. However, we faced several technical issues in transgenic line generation, including poor survival following microinjection likely due to protein overexpression toxicity (e.g., C30G12.6, F46H5.3), and reduced animal viability for chemotaxis assays, potentially linked to transgene-related reproductive defects (e.g., ACC-1). As we have previously successfully generated dozens of transgenic lines in past work (e.g. Chew et al., Neuron 2018; Chew et al., Phil Trans B 2018; Gadenne/Chew et al., Life Science Alliance 2022), we believe the failure to produce most of these lines is not likely due to technical limitations. For transparency, these observations have been included in the discussion section of the manuscript on pages 39 & 40 as considerations for future troubleshooting.

      Fortunately, we were able to generate a pan-neuronal promoter line for KIN-2 that has been tested and included in the revised manuscript. These new data are shown in Figure 5B and described on pages 23 & 24. Briefly, this shows that pan-neuronal expression of KIN-2 from the ce179 mutant allele is sufficient to reproduce the enhanced learning phenotype observed in kin-2(ce179) animals, confirming the role of KIN-2 in gustatory learning.

      To address the potential involvement of background mutations (also indicated by Reviewer 4 under ‘cross-commenting’), we have also performed experiments with backcrossed versions of several mutants. These experiments aimed to confirm that salt associative learning phenotypes are due to the expected mutation. Namely, we assessed kin-2(ce179) mutants that had been backcrossed previously by another laboratory, as well as C30G12.6(-) and F46H5.3(-) animals backcrossed in this study. Although not all backcrossed mutants retained their original phenotype (i.e., C30G12.6) (Figure 6D, a newly added figure), we found that backcrossed versions of KIN-2 and F46H5.3 both robustly showed enhanced learning (Figures 5A & 6B).

      This is described in the text on pages 23-26.

      Minor comments:

      (1) Lack of clarity regarding the validation of the biotin tagging of the proteome.

      The authors show in Figure 1 that they validated that the combination of the transgene and biotin allows them to find more biotin-tagged proteins. However there is significant biotin background also in control samples as is common for this method. The authors mention they validated biotin tagging of all their experiments, but it was unclear in the text whether they validated it in comparison to no-biotin controls, and checked for the fold change difference.

      This is an important point: We validated our biotin tagging method prior to mass spectrometry by comparing ‘no biotin’ and ‘biotin’ groups. This is shown in Figure S1 in the revised manuscript, which includes a western blot comparing untreated and biotin-treated animals that are non-transgenic or expressing TurboID. As expected, by comparing biotinylated protein signal for untreated and treated lanes within each line, biotin treatment increased the signal 1.30-fold for non-transgenic and 1.70-fold for TurboID C. elegans. This is described on page 8 of the revised manuscript.

      To clarify, for mass spectrometry experiments, we tested a no-TurboID (non-transgenic) control, but did not perform a no-biotin control. We included the following four groups: (1) No-TurboID ‘control’, (2) No-TurboID ‘trained’, (3) pan-neuronal TurboID ‘control’ and (4) pan-neuronal TurboID ‘trained’, where trained versus control refers to whether ‘no salt’ was used as the conditioned stimulus or not, respectively (illustrated in Figure 1A). Due to the complexity of the learning assay (which involves multiple washes and handling steps, including a critical step where biotin is added during the conditioning period), and the need to collect sufficient numbers of worms for protein extraction (>3,000 worms per experimental group), adding ‘no-biotin’ controls would have doubled the number of experimental groups, which we considered unfeasible for practical reasons. This is explained on pages 8 & 9 of the revised manuscript.

      Also, it was unclear which exact samples were tested per replicate. In Page 9, Lines 17-18: "For all replicates, we determined that biotinylated proteins could be observed ...", But in Page 8, Line 24 : "We then isolated proteins from ... worms per group for both 'control' and 'trained' groups,... some of which were probed via western blotting to confirm the presence of biotinylated proteins".

      Could the authors specify which samples were verified and clarify how?

      Thank you for pointing out these unclear statements: We have clarified the experimental groups used for mass spectrometry experiments as detailed in the response above on pages 8 & 9. In addition, western blots corresponding to each biological replicate of mass spectrometry data are described in the main text on page 10 and have been added to the revised manuscript (as Figure S3). These western blots compare biotinylation signal for proteins extracted from (1) No-TurboID ‘control’, (2) No-TurboID ‘trained’, (3) pan-neuronal TurboID ‘control’ and (4) pan-neuronal TurboID ‘trained’. These blots function to confirm that there were biotinylated proteins in TurboID samples, before enrichment by streptavidin-mediated pull-down for mass spectrometry.

      OPTIONAL: include the fold changes of biotinylated proteins of all the ones that were tested. Similar to Figure 1.C.

      This is an excellent suggestion. As recommended by the reviewer, we have included fold changes for biotinylated protein levels between high-salt control and trained groups (on pages 9 & 10 for replicate #1 and in Table S2 for replicates #2-5). This was done by measuring protein levels in whole lanes for each experimental group per biological replicate within western blots (Figure 1C for replicate #1 and Figure S3 for replicates #2-5) of protein samples generated for mass spectrometry (N = 5).

      (2) Figure 2 does not add much to the reader, it can be summarized in the text, as the fraction of proteins enriched for specific cellular compartments.

      I would suggest moving Figure 2 (originally written as Figure 3) into the text, or transferring it to the supplementary material.

      As noted in the cross-comment response to Reviewer 4, there were typos in the original figure references; we have corrected them above. Essentially, this comment is referring to Figure 2.

      We appreciate this feedback from Reviewer 1. We agree that the original Figure 2 functions as a visual summary from analysis of the learning proteome at the subcellular compartment level. However, it also serves to highlight the following:

      - Representation for neuron-specific GO terms is relatively low, but even this small percentage represents entire protein-protein networks that are biologically meaningful, but that are difficult to adequately describe in the main text.

      - TurboID was expressed in neurons so this figure supports the relevance of the identified proteome to biological learning mechanisms.

      - Many of these candidates could not be assessed by learning assay using single mutants since related mutations are lethal or substantially affect locomotion. These networks therefore highlight the benefit in using strategies like TurboID to study learning.

      We have chosen to retain this figure, moving it to the supplementary material as Figure S4 in the revised manuscript, as suggested.

      OPTIONAL- I would suggest the authors to mark in a pathway summary figure similar to Figure 3 (originally written as Figure 4) the results from the behavior assay of the genetic screen. This would allow the reader to better get the bigger picture and to connect to the systemic approach taken in Figures 2 and 3.

      We think this is a fantastic suggestion and thank Reviewer 1 for this input. In the revised manuscript, we have added Figure 7, which summarises the tested candidates that displayed an effect on learning, mapped onto potential molecular pathways derived from networks in the learning proteome. This figure provides a visual framework linking the behavioural outcomes to the network context. This is described in the main text on pages 32-33.

      (3) Typo in Figure 3: the circle of PPM1: The blue right circle half is bigger than the left one.

      We thank the Reviewer for noticing this; the node size for PPM-1.A has been corrected in what is now Figure 2 in the revised work.

      (4) Unclarity in the discussion. In the discussion on Page 24, Line 14, the authors raise this question: "why are the proteins we identified not general learning regulators?" The phrasing and logic of the argumentation of the possible answers were hard to follow. Can you clarify?

      We appreciate this feedback in terms of unclarity, as we strive to explain the data as clearly and transparently as possible. Our goal in this paragraph was to discuss why some candidates were seen to only affect salt associative learning, as opposed to showing effects in multiple learning paradigms (i.e., which we were defining as a ‘general learning regulator’). We have adjusted the wording in several places in this paragraph now on pages 36 & 37 to address this comment. We hope the rephrased paragraph provides sufficient rationalisation for the discussion regarding our selection strategy used to isolate our protein list of potential learning regulators, and its potential limitations.

      Cross-Commenting

      Firstly, we would like to express our appreciation for the opportunity for reviewers to cross-comment on feedback from other reviewers. We believe this is an excellent feature of the peer review process, and we are grateful to the reviewers for their thoughtful engagement and collaborative input.

      I would like to thank Reviewer #4 for the great cross comment summary, I find it accurate and helpful.

      I also would like to thank Reviewer #4 for spotting the typos in my minor comments, their page and figure numbers are the correct ones.

      We have corrected these typos in the relevant comments, and have responded to them accordingly.

      Small comment on common point 1 - My feeling is that it is challenging to do quantitative mass spectrometry, especially with TurboID. In general, the nature of MS data is that it hints towards a direction but a follow-up validation work is required in order to assess it. For example, I am not surprised that the fraction of repeats a hit appeared in does not predict well whether this hit would be validated behaviorally. Given these limitations, I find the authors' approach reasonable.

      We thank Reviewer 1 for this positive and thoughtful feedback. We also appreciate Reviewer 4’s comment regarding quantitative mass spectrometry and have addressed this in detail below (see response to Reviewer 4). However, we agree with Reviewer 1 that there are practical challenges to performing quantitative mass spectrometry with TurboID, primarily due to the enrichment for biotinylated proteins that is a key feature of the sample preparation process.

      Importantly, we whole-heartedly agree with Reviewer 1’s statement that “In general, the nature of MS data is that it hints towards a direction but a follow-up validation work is required in order to assess it”. This is the core of our approach: however, we appreciate that there are limitations to a qualitative ‘absent/present’ approach. We have addressed some of these limitations by clarifying the criteria used for selecting candidate genes, based additionally on the presence of the candidate in multiple biological replicates (categorised as ‘strong’ hits). Based on this method, we were able to validate the role of several novel learning regulators (Figures 5, 6, & S7). We sincerely hope that this manuscript can function as a direction for future research, as suggested by this Reviewer.

      I also would like to highlight this major comment from reviewer 4:

      "In Experimental Procedures, authors state that they excluded data in which naive or control groups showed average CI < 0.6499, and/or trained groups showed average CI < -0.0499 or > .5499 for N2 (page 36, lines 5-7). "

      This threshold seems arbitrary to me too, and it requires the clarifications requested by reviewer 4.

      As detailed in our response to Reviewer 4, Major Comment 2, data were excluded only in rare cases, specifically when N2 worms failed to show strong salt attraction prior to training, or when trained N2 worms did not exhibit the expected behavioural difference compared to untrained controls – this can largely be attributed to clear contamination or over-population issues, which are visible prior to assessing CTX plates and counting chemotaxis indices.

      These criteria were initially established to provide an objective threshold for excluding biological replicates, particularly when planning to assay a large number of genetic mutants. However, after extensive testing across many replicates, we found that N2 worms (that were not starved, or not contaminated) consistently displayed the expected phenotype, rendering these thresholds unnecessary. We acknowledge that emphasizing these criteria may have been misleading, and have therefore removed them from page 50 in the revised manuscript to avoid confusion and ensure clarity.

      Reviewer #1 (Significance):

      This study does a great job of effectively utilizing the TurboID technique to identify new pathways implicated in salt-associative learning in C. elegans. This technique was used in C. elegans before, but not in this context. The salt-associative memory induced proteome list is a valuable resource that will help future studies on associative memory in worms. Some of the implicated molecular pathways were found before to be involved in memory in worms like cAMP, as correctly referenced in the manuscript. The implication of the acetylcholine pathway is novel for C. elegans, to the best of my knowledge. The finding that the uncovered genes are specifically required for salt associative memory and not for other memory assays is also interesting.

      However overall I find the impact of this study limited. The premise of this work is to use the Turbo-ID method to conduct a systems analysis of the proteomic changes. The work starts by conducting network analysis and gene enrichment which fit a systemic approach. However, since the authors find that ~30% of the tested hits affect the phenotype, and since only 17/706 proteins were assessed, it is challenging to draw conclusive broad systemic claims.

      Alternatively, the authors could have focused on the positive hits to understand them better and find the specific circuits where these genes act. This could have increased the impact of the work. Since neither of these two options is satisfied, I view this work as solid, but not wide in its impact and therefore estimate the audience of this study would be more specialized.

      My expertise is in C. elegans behavior, genetics, and neuronal activity, programming and machine learning.

      We thank the Reviewer for these comments and appreciate the recognition of the value of the proteomic dataset and the identification of novel molecular pathways, including the acetylcholine pathway, as well as the specificity of the uncovered genes to salt-associative memory. Regarding the reviewer’s concern about the overall impact and scope of the study, we respectfully offer the following clarification. Our aim was to establish a systems-level approach for investigating learning-related proteomic changes using TurboID, and we acknowledge that only a subset of the identified proteins was experimentally tested (now 26/706 proteins in the revised manuscript). Although only five of the tested single gene mutants showed a robust learning phenotype in the revised work (after backcrossing, more stringent candidate selection, and improved statistical analysis in addressing reviewer comments), our proteomic data provide us with a unique opportunity to define these candidates within protein-protein networks (as illustrated in Figure 7). Importantly, our functional testing focused on single-gene mutants, which may not reveal phenotypes for genes that act redundantly (now mentioned on pages 28-30). This limitation is inherent to many genetic screens and highlights the value of our proteomic dataset, which enables the identification of broader protein-protein interaction networks and molecular pathways potentially involved in learning.

      To support this systems-level perspective, we have added Figure 7, which visually integrates the tested candidates into molecular pathways derived from the learning proteome for learning regulators KIN-2 and F46H5.3. We also emphasise more explicitly in the text (on pages 32-33) the value of our approach by highlighting the functional protein networks that can be derived from our proteomics dataset.

      We fully acknowledge that the use of TurboID across all neurons limits the resolution needed to pinpoint individual neuron contributions, and understand the benefit in further experiments to explore specific circuits. Many circuits required for salt sensing and salt-based learning are highly explored in the literature and defined explicitly (see Rahmani & Chew, 2021), so our intention was to complement the existing literature by exploring the protein-protein networks involved in learning, rather than on neuron-neuron connectivity. However, we recognise the benefit in integrating circuit-level analyses, given that our proteomic data suggests hundreds of candidates potentially involved in learning. While validating each of these candidates is beyond the scope of the current study, we have taken steps to suggest candidate neurons/circuits by incorporating tissue enrichment analyses and single-cell transcriptomic data (Table S7 & Figure 4). These additions highlight neuron classes of interest and suggest possible circuits relevant to learning.

      We hope this clarification helps convey the intended scope and contribution of our study. We also believe that the revisions made in response to Reviewer 1’s feedback have strengthened the manuscript and enhanced its significance within the field.

      Reviewer #2 (Evidence, reproducibility and clarity):

      Summary:

      In this study by Rahmani and colleagues, the authors sought to define the "learning proteome" for a gustatory associative learning paradigm in C. elegans. Using a cytoplasmic TurboID expressed under the control of a pan-neuronal promoter, the authors labeled proteins during the training portion of the paradigm, followed by proteomics analysis. This approach revealed hundreds of proteins potentially involved in learning, which the authors describe using gene ontology and pathways analysis. The authors performed functional characterization of some of these genes for their requirement in learning using the same paradigm. They also compared the requirement for these genes across various learning paradigms, and found that most hits they characterized appear to be specifically required for the training paradigm used for generating the "learning proteome".

      Major Comments:

      (1) The definition of a "hit" from the TurboID approach does not appear stringent enough. According to the manuscript, a hit was defined as one unique peptide detected in a single biological replicate (out of 5), which could give rise to false positives. In Figure S2, it is clear that there is relatively little overlap between replicates with regard to the proteins detected, and while perhaps unintentional, presenting a single unique peptide appears to be an attempt to inflate the number of hits. Defining hits as present in more than one sample would be more rigorous. Changing the definition of hits would only require the time to re-list genes and change data presented in the manuscript accordingly.

      We thank Reviewer 2 for this valuable comment, and the following related suggestion. We agree with the statement that “Defining hits as present in more than one sample would be more rigorous”. Therefore, to address this comment, we have now separated candidates into two categories in Table 2 in the revised manuscript: ‘strong’ (present in 3 or more biological replicates) and ‘weak’ candidates (present in 2 or fewer biological replicates). However, we think these weaker candidates should still be included in the manuscript, considering we did observe relationships between these proteins and learning. For example, ACC-1, which influences salt associative learning in C. elegans, was detected in one replicate of mass spectrometry as a potential learning regulator (Figure S8A). We describe this classification in the main text on pages 21-22.
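
      The ‘strong’ versus ‘weak’ split described above amounts to counting, for each candidate, the number of biological replicates in which it was detected. The sketch below illustrates that rule under stated assumptions: the function name, the input layout (one list of hit identifiers per replicate), and the placeholder protein names are hypothetical and do not reproduce the data in Table 2.

```python
from collections import Counter


def classify_hits(replicate_hits, strong_min=3):
    """Label each candidate 'strong' if detected in at least `strong_min`
    biological replicates, and 'weak' otherwise."""
    counts = Counter()
    for hits in replicate_hits:
        # Use a set so a candidate is counted at most once per replicate.
        counts.update(set(hits))
    return {
        protein: ("strong" if n >= strong_min else "weak")
        for protein, n in counts.items()
    }


# Hypothetical hit lists for five biological replicates.
toy_replicates = [
    ["protein_x", "protein_y"],
    ["protein_x"],
    ["protein_x", "protein_z"],
    [],
    ["protein_y"],
]

labels = classify_hits(toy_replicates)
print(labels)
```

      Here `protein_x` appears in three replicates and is labelled ‘strong’, whereas `protein_y` (two replicates) and `protein_z` (one replicate) are labelled ‘weak’.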

      We also agree with Reviewer 2 that the overlap between individual candidate hits is low between biological replicates; the inclusion of Figure S2 in the original manuscript serves to highlight this limitation. However, it is also important to consider that there is notable overlap for whole molecular pathways between biological replicates of mass spectrometry data as shown in Figure 2 in the revised manuscript (this consideration is now mentioned on pages 13-14). We have included Figure 3 to illustrate representation for two metabolic processes across several biological replicates normally indispensable to animal health, as an example to provide additional visual aid for the overlap between replicates of mass spectrometry. We provide this figure (described on pages 13 & 15) to demonstrate the strength of our approach in that it can detect candidates not easily assessable by conventional forward or reverse genetic screens.

      We also appreciate the opportunity to explain our approach. The criterion of “at least one unique peptide” was chosen based on previous work that we adapted for this manuscript (Prikas et al., 2020). It was not intended to inflate the number of hits but rather to ensure sensitivity in detecting low-abundance neuronal proteins. We have clarified this in our Methods (page 46).

      (2) The "hits" that the authors chose to functionally characterize do not seem like strong candidate hits based on the proteomics data that they generated. Indeed, most of the hits are present in a single, or at most 2, biological replicate. It is unclear as to why the strongest hits were not characterized, which if mutant strains are publicly available, would not be a difficult experiment to perform.

      We thank the reviewer for this important suggestion. To address this, we have described two molecular pathways with multiple components that appear in more than one biological replicate of mass spectrometry data in Figure 3 (main text on page 13). In addition, we have included Figures 6 & S7 where 9 additional single mutants corresponding to candidates in three or more biological replicates of mass spectrometry were tested for salt associative learning. Briefly, we found the following (number of replicates that a protein was unique to TurboID trained animals is in brackets):

      - Novel arginine kinase F46H5.3 (4 replicates) displays an effect in both salt associative learning and salt aversive learning in the same direction (Figures 6A, 6B, & S9A, pages 31-32 & 37-38).

      - Worms with a mutation for armadillo-domain protein C30G12.6 (3 replicates) only displayed an enhanced learning phenotype when non-backcrossed, not backcrossed. This suggests the enhanced learning phenotype was caused by a background mutation (Figure 6, pages 24-25).

      - We did not observe an effect on salt associative learning when assessing mutations for the ciliogenesis protein IFT-139 (5 replicates), guanyl nucleotide factors AEX-3 or TAG-52 (3 replicates), p38/MAPK pathway interactor FSN-1 (3 replicates), IGCAM/RIG-4 (3 replicates), and acetylcholine components ACR-2 (4 replicates) and ELP-1 (3 replicates) (Figure S7, on pages 27-30). However, we note throughout the section in which these candidates are described that only single gene mutants were tested, meaning that genes that function in redundant or compensatory pathways may not exhibit a detectable phenotype.

      Because of the lack of strong evidence that these are indeed proteins regulated in the context of learning based on proteomics, including evidence of changes in the proteins (by imaging expression changes of fluorescent reporters or a biochemical approach), would increase confidence that these hits are genuine.

      We thank Reviewer 2 for this suggestion – we agree that it would have been ideal to have additional evidence suggesting that changes in candidate protein levels are associated directly with learning. Ideally, we would have explored this aspect further; however, as outlined in response to Reviewer 1 Major Comment 2 (OPTIONAL), this was not feasible within the scope of the current study due to several practical challenges. Specifically, we attempted to generate pan-neuronal and endogenous promoter rescue lines for several candidates, but encountered significant challenges, including poor survival post-microinjection (likely due to protein overexpression toxicity) and reduced viability for behavioural assays, potentially linked to transgene-related reproductive defects. This information is now described on pages 39 & 40 of the revised work.

      To address these limitations, we performed additional behavioural experiments where possible. We successfully generated a pan-neuronal promoter line for kin-2, which was tested and included in the revised manuscript (Figure 5B, pages 30 & 31). In addition, to confirm that observed learning phenotypes were due to the expected mutations and not background effects, we conducted experiments using backcrossed versions of several mutant lines as suggested by Reviewer 4 Cross Comment 3 (Figure 6, pages 23-24 & 24-26). Briefly, this shows that pan-neuronal expression of KIN-2 from the ce179 mutant allele is sufficient to reproduce the enhanced learning phenotype observed in backcrossed kin-2(ce179) animals, providing additional evidence that the identified hits are required for learning. We also confirmed that F46H5.3 modulates salt associative learning, given that both non-backcrossed and backcrossed F46H5.3(-) mutants display a learning enhancement phenotype. The revised text now describes these data on the page numbers mentioned above.

      Minor Comments:

      (1) The authors highlight that the proteins they discover seem to function uniquely in their gustatory associative paradigm, but this is not completely accurate. kin-2, which they characterize in figure 4, is required for positive butanone association (the authors even say as much in the manuscript) in Stein and Murphy, 2014.

      We appreciate this correction and thank the Reviewer for pointing this out. We have amended the wording appropriately on page 31 to clarify our meaning.

      “Although kin-2(ce179) mutants were not shown to impact salt aversive learning, they have been reported previously to display impaired intermediate-term memory (but intact learning and short-term memory) for butanone appetitive learning (Stein and Murphy, 2014).”

      Reviewer #2 (Significance):

      General Assessment:

      The approach used in this study is interesting and has the potential to further our knowledge about the molecular mechanisms of associative behaviors. Strengths of the study include the design with carefully thought out controls, and the premise of combining their proteomics with behavioral analysis to better understand the biological significance of their proteomics findings. However, the criteria for defining hits and prioritization of hits for behavioral characterizations were major weaknesses of the paper.

      Advance:

      There have been multiple transcriptomic studies in the worm looking at gene expression changes in the context of behavioral training (Lakhina et al., 2015, Freytag 2017). This study complements and extends those studies by examining how the proteome changes in a different training paradigm. The approach here could be employed for multiple different training paradigms, presenting a new technical advance for the field.

      Audience:

      This paper would be of interest to the broader field of behavioral and molecular neuroscience. Though it uses an invertebrate system, many findings in the worm regarding learning and memory translate to higher organisms.

      I am an expert in molecular and behavioral neuroscience in both vertebrate and invertebrate models, with experience in genetics and genomics approaches.

      We appreciate Reviewer 2’s thoughtful assessment and constructive feedback. In response to concerns regarding definition and prioritisation of hits, we have revised our approach as detailed above to place more consideration on ‘strong’ hits present in multiple biological replicates. We have also added new behavioural data for additional mutants that fall into this category (Figures 6 & S7). We hope these revisions strengthen our study and enhance its relevance to the behavioural/molecular neuroscience community.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary:

      In the manuscript titled "Identifying regulators of associative learning using a protein-labelling approach in C. elegans" the authors attempted to generate a snapshot of the proteomic changes that happen in the C. elegans nervous system during learning and memory formation. They employed the TurboID-based protein labeling method to identify the proteins that are uniquely found in samples that underwent training to associate no-salt with food, and consequently exhibited lower attraction to high salt in a chemotaxis assay. Using this system they obtained a list of target proteins that included proteins represented in molecular pathways previously implicated in associative learning. The authors then further validated some of the hits from the assay by testing single gene mutants for effects on learning and memory formation.

      Major Comments:

      In the discussion section, the authors comment on the sources of "background noise" in their data and ways to improve the specificity. They provide some analysis on this aspect in Supplementary figure S2. However, a better visualization of non-specificity in the sample could be a GO analysis of tissue-specificity, and presented as a pie chart as in Figure 2A. Non-neuronal proteins such as MYO-2 or MYO-3 repeatedly show up on the "TurboID trained" lists in several biological replicates (Tables S2 and S3). If a major fraction of the proteins after subtraction of control lists are non-specific, that increases the likelihood that the "hits" observed are by chance. This analysis should be presented in one of the main figures as it is essential for the reader to gauge the reliability of the experiment.

      We agree with this assessment and thank Reviewer 3 for this constructive suggestion. In response, we have now incorporated a comprehensive tissue-specific analysis of the learning proteome in the revised manuscript. Using the single neuron RNA-Seq database CeNGEN, we identified the proportion of neuronal vs non-neuronal proteins from each biological replicate of mass spectrometry data. Specifically, we present Table 1 on page 17 (which we originally intended to include in the manuscript, but inadvertently left out), which shows that 87-95% (i.e. a large majority) of proteins identified across replicates corresponded to genes detected in neurons, supporting that the TurboID enzyme was able to target the neuronal proteome as expected. Table 1 is now described in the main text of the revised work on page 16.
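
      The neuronal proportion reported per replicate is, in essence, the fraction of identified proteins whose genes appear in a reference set of neuronally detected genes. The following is a minimal sketch of that calculation only: the function name, the inputs, and the toy identifiers are assumptions for illustration, and it does not reproduce how the reference set was derived from CeNGEN.

```python
def neuronal_fraction(hits, neuronal_genes):
    """Return the fraction of unique hits that are present in `neuronal_genes`."""
    unique_hits = set(hits)
    if not unique_hits:
        raise ValueError("hit list is empty")
    return len(unique_hits & set(neuronal_genes)) / len(unique_hits)


# Hypothetical identifiers; a real reference set would be exported from a database.
toy_hits = ["g1", "g2", "g3", "g4"]
toy_neuronal = {"g1", "g2", "g3", "g9"}

fraction = neuronal_fraction(toy_hits, toy_neuronal)
print(f"{fraction:.0%} of hits are in the neuronal reference set")
```

      Running this separately on each replicate's hit list would give one percentage per replicate, which is the form in which Table 1 is described.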

      In addition, we performed neuron-specific analyses using both the WormBase gene enrichment tool and the CeNGEN single-cell transcriptomic database, which we describe in detail in our response to Reviewer 1 Major Comment 2. To summarise, these analyses revealed enrichment of several neuron classes, including those previously implicated in associative learning (e.g., ASEL, AIB, RIS, AVK) as well as neurons not previously studied in this context (e.g., IL1, DA9, DVC) (summarised in Table S7). By examining expression overlap across neuron types, we identified shared and distinct profiles that suggest potential functional connectivity and candidate circuits underlying behavioural plasticity (Figure 4). Taken together, these data show that the proteins identified in our dataset are (1) neuronal and (2) expressed in neurons that are known to be required for learning. Methods are detailed on pages 50-51.
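
      The neuronal-proportion calculation summarised for Table 1 can be illustrated with a minimal sketch. It assumes each replicate is a set of gene names and that a reference set of neuron-expressed genes has already been exported (for example from CeNGEN). The function name, the toy reference set and the toy replicates are hypothetical and are not taken from the manuscript.

```python
def neuronal_fraction(detected, neuron_expressed):
    """Return the fraction of detected proteins whose gene is in the neuron-expressed set."""
    detected = set(detected)
    if not detected:
        raise ValueError("no proteins detected in this replicate")
    return len(detected & set(neuron_expressed)) / len(detected)


# Hypothetical reference set standing in for a CeNGEN-derived gene list.
neuron_expressed = {"acc-1", "kin-2", "lgc-46", "glna-3"}

# Hypothetical per-replicate protein lists (toy data, not the real mass spectrometry results).
replicates = {
    "rep1": {"acc-1", "kin-2", "myo-3", "lgc-46"},
    "rep2": {"kin-2", "glna-3"},
}

fractions = {
    name: neuronal_fraction(proteins, neuron_expressed)
    for name, proteins in replicates.items()
}
```

      Reporting the fraction per replicate, rather than pooling all replicates, keeps the replicate-to-replicate spread visible, which is what a range such as 87-95% conveys.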

      Other than the above, the authors have provided sufficient details in their experimental and analysis procedures. They have performed appropriate controls, and their data has sufficient biological and technical replicates for statistical analysis.

      We appreciate this positive feedback and thank the Reviewer for acknowledging the clarity of our experimental and analysis procedures.

      Minor Comments:

      There is an error in the first paragraph of the discussion, in the sentences discussing the learning effects in gar-1 mutant worms. The sentences in lines 12-16 on page 22 says that gar-1 mutants have improved salt-associative learning and defective salt-aversive learning, while in fact the data and figures state the opposite.

      We appreciate the Reviewer noting this discrepancy. As clarified in our response to Reviewer 1, Major Comment 1 above, we reanalysed the behavioural data to ensure consistency across genotypes by comparing only those tested within the same biological replicates (thus having the same N for all genotypes). Upon this reanalysis, we found that the previously reported phenotype for gar-1 mutants in salt-associative learning was not statistically different from wildtype controls. Therefore, we have removed references to GAR-1 from the manuscript.

      Reviewer #3 (Significance):

      Strengths and limitations:

      This study used neuron-specific TurboID expression with transient biotin exposure to capture a temporally restricted snapshot of the C. elegans nervous system proteome during salt-associative learning. This is an elegant method to identify proteins temporally specific to a certain condition. However, there are several limitations in the way the experiments and analyses were performed which affect the reliability of the data. As the authors themselves have noted in the discussion, background noise is a major issue and several steps could be taken to improve the noise at the experimental or analysis steps (use of integrated C. elegans lines to ensure uniformity of samples, flow cytometry to isolate neurons, quantitative mass spec to detect fold change vs. strict presence/absence).

      Advance:

      Several studies have demonstrated the use of proximity labeling to map the interactome by using a bait protein fusion. In fact, expressing TurboID not fused to a bait protein is often used as a negative control in proximity labeling experiments. However, this study demonstrates the use of free TurboID molecules to acquire a global snapshot of the proteome under a given condition.

      Audience:

      Even with the significant limitations, this study is specifically of interest to researchers interested in understanding learning and memory formation. Broadly, the methods used in this study could be modified to gain insights into the proteomic profiles at other transient developmental stages. The reviewer's field of expertise: Cell biology of C. elegans neurons.

      We thank the reviewer for their thoughtful evaluation of our work. We appreciate the recognition of the novelty and potential of using neuron-specific TurboID to capture a temporally restricted snapshot of the C. elegans nervous system proteome during learning. We agree that this approach offers a unique opportunity to identify proteins associated with specific behavioural states in future studies.

      We also appreciate the reviewer’s comments regarding limitations in experimental and analytical design. In revising the manuscript, we have taken several steps to address these concerns and improve the clarity, rigour, and interpretability of our data. Specifically:

      - We now provide a frequency-based representation of proteomic hits (Table 2), which helps clarify how candidate proteins were selected and highlights differences between trained and control groups.

      - We have added neuron-specific enrichment analyses using both WormBase and CeNGEN databases (Table S7 & Figure 4), which help identify candidate neurons and potential circuits involved in learning (methods on pages 50-51).

      - We have clarified the rationale for using qualitative proteomics in the context of TurboID, in addition to acknowledging the challenges of integrating quantitative mass spectrometry with biotin-based enrichment (page 39). Additional methods for improving sample purity, such as using integrated lines or FACS-enrichment of neurons, could further refine this approach in future studies. For transparency, we did attempt to integrate the TurboID transgenic line to improve the strength and consistency of biotinylation signals. However, despite four rounds of backcrossing, this line exhibited unexpected phenotypes, including a failure to respond reliably to the established training protocol. As a result, we were unable to include it in the current study. Nonetheless, we believe our current approach provides a valuable proof-of-concept and lays the groundwork for future refinement.

      By addressing the major concerns of peer reviewers, we believe our study makes a significant and impactful contribution by demonstrating the feasibility of using TurboID to capture learning-induced proteomic changes in the nervous system. The identification of novel learning-related mutants, including those involved in acetylcholine signalling and cAMP pathways, provides new directions for future research into the molecular and circuit-level mechanisms of behavioural plasticity.

      Reviewer #4 (Evidence, reproducibility and clarity):

      Summary:

      In this manuscript, authors used a learning paradigm in C. elegans; when worms are fed on a saltless plate, their chemotaxis to salt is greatly reduced. To identify learning-related proteins, authors employed nervous system-specific transcriptome analysis to compare whole proteins in neurons between high-salt-fed animals and saltless-fed animals. Authors identified "learning-specific genes" which are observed only after saltless feeding. They categorized these proteins by GO analyses and pathway analyses, and further stepped forward to test mutants in selected genes identified by the proteome analysis. They find several mutants that are defective or hyper-proficient for learning, including acc-1/3 and lgc-46 acetylcholine receptors, gar-1 acetylcholine receptor GPCR, glna-3 glutaminase involved in glutamate biosynthesis, and kin-2, a cAMP pathway gene. These mutants were not previously reported to have abnormality in the learning paradigm.

      Major comments:

      (1) There are problems in the data processing and presentation of the proteomics data in the current manuscript which reduce the utility of the data. First, as the authors discuss (page 24, lines 5-12), the current approach does not consider the amount of the peptides. Authors state that their current approach is "conservative", because some of the proteins may be present in both control and learned samples but in different amounts. This reviewer has a concern in the opposite way: some of the identified proteins may be false-positive artifacts caused by the analytical noise. The problem is that authors included peptides that are "present" in "TurboID, trained" sample but "absent" in the "Non-Tg, trained" and "TurboID, control" samples in any one of the biological replicates, to identify "learning proteome" (706 proteins, page 8, last line - page 9, line 8; page 32, line 21-22). The word "present" implies that they included even peptides whose amounts are just above the detection threshold, which is subject to random noise caused by the detector or during sample collection and preparation processes. This consideration is partly supported by the fact that only a small fraction of the proteins are common between biological replicates (honestly and respectably shown in Figure S2). Because of this problem, there is no statistical estimate of the identity in "learning proteome" in the current manuscript. Therefore, the presentation style in Tables S2 and S3 is not very useful for readers, especially because authors already subtracted proteins identified in Non-Tg samples, which must also suffer from stochastic noise. I suggest either quantifying the MS/MS signal, or if authors need to stick to the "present"/"absent" description of the MS/MS data, use the number of appearances in biological replicates of each protein as estimate of the quantity of each protein. For example, found in 2 replicates in "TurboID, learned" and in 0 replicates in "Non-Tg, trained". One can apply statistics to these counts. This said, I would like to stress that proteins related to acquisition of memory may be very rare, especially because learning-related changes likely occur in a small subset of neurons. Therefore, 1 time vs 0 time may be still important, as well as something like 5 times vs 1 time. In summary, quantitative description of the proteomics results is desired.
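
      One way to "apply statistics to these counts" is a one-sided Fisher's exact test on the number of replicates in which a protein was detected. The sketch below is only an illustration of the reviewer's suggestion, not an analysis performed in the manuscript. The number of replicates per condition (five) is a hypothetical choice made for the example, and the function name is ours.

```python
from math import comb


def fisher_one_sided(trained_present, trained_absent, control_present, control_absent):
    """One-sided Fisher's exact test on a 2x2 table of replicate counts.

    Returns the probability of seeing at least `trained_present` detections in the
    trained condition if detections were distributed at random between conditions.
    """
    trained_total = trained_present + trained_absent
    present_total = trained_present + control_present
    grand_total = trained_total + control_present + control_absent
    upper = min(trained_total, present_total)
    denominator = comb(grand_total, trained_total)
    tail = sum(
        comb(present_total, x) * comb(grand_total - present_total, trained_total - x)
        for x in range(trained_present, upper + 1)
    )
    return tail / denominator


# Hypothetical design: 5 biological replicates per condition.
p_5_vs_1 = fisher_one_sided(5, 0, 1, 4)  # detected in 5/5 trained and 1/5 control replicates
p_1_vs_0 = fisher_one_sided(1, 4, 0, 5)  # detected in 1/5 trained and 0/5 control replicates
```

      Under this hypothetical design, the "5 times vs 1 time" case gives p = 6/252 (about 0.024), whereas "1 time vs 0 time" gives p = 0.5. This is consistent with the reviewer's caveat: a protein seen once may still matter biologically, but replicate counts alone cannot separate it from noise.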

      We thank the reviewer for these valuable comments and suggestions.

      We acknowledge that quantitative proteomics would provide beneficial information; however, as also indicated by Reviewer 1 (in cross-comment), it is practically challenging to perform with TurboID. We have included discussion of potential future experiments involving quantitative mass spectrometry, as well as a comprehensive discussion of some of the limitations of our approach as summarised by this Reviewer, in the Discussion section (page 39). However, we note that our qualitative approach also provides beneficial knowledge, such as the identification of functional protein networks acting within biological pathways previously implicated in learning (Figure 2), and novel learning regulators ACC-1/3, LGC-46, and F46H5.3.

      We agree with the assessment that the frequency of occurrence for each candidate we test per biological replicate is useful to disclose in the manuscript as a proxy for quantification. This was also highlighted by Reviewer 2 (Major Comment 1). As detailed above in response to R2, we have now separated candidates into two categories: ‘strong’ (present in 3 or more biological replicates) and ‘weak’ candidates (present in 2 or fewer biological replicates). We have also added behavioural data after testing 9 of these strong candidates in Figures 6 & S7.

      We have also added Table 2 to the revised manuscript, which summarises the frequency-based representation of the proteomics results, as suggested. This is described on pages 22-23.

      Briefly, this shows the range of candidates further explored using single mutant testing. Specifically, this data showed that many of the tested candidates were more frequently detected in trained worms compared to high-salt controls. This includes both strong and weak candidates, providing a clearer view of how proteomic frequency informed our selection for functional testing.
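
      The presence/absence filter and the frequency-based grouping described above can be sketched as follows. This is a minimal illustration under stated assumptions: each replicate is represented by three sets of protein names, the filter is applied within each replicate, and "strong" means detection in three or more replicates, as defined in this response. The function names and the toy data are hypothetical.

```python
from collections import Counter


def learning_specific(trained, non_tg, control):
    """Proteins present in 'TurboID, trained' but absent from both comparison samples."""
    return set(trained) - set(non_tg) - set(control)


def classify_candidates(per_replicate_hits, strong_min=3):
    """Map each protein to ('strong' or 'weak', number of replicates in which it appears)."""
    counts = Counter(protein for hits in per_replicate_hits for protein in hits)
    return {
        protein: ("strong" if n >= strong_min else "weak", n)
        for protein, n in counts.items()
    }


# Toy data: (TurboID trained, Non-Tg trained, TurboID control) for each replicate.
replicates = [
    ({"ACC-1", "KIN-2", "MYO-3"}, {"MYO-3"}, set()),
    ({"ACC-1", "KIN-2"}, set(), {"KIN-2"}),
    ({"ACC-1", "LGC-46"}, set(), set()),
]

hits = [learning_specific(*samples) for samples in replicates]
table = classify_candidates(hits)
```

      In this toy example ACC-1 survives the filter in all three replicates and is labelled strong, while KIN-2 and LGC-46 each survive once and are labelled weak. MYO-3 is removed because it also appears in the Non-Tg sample.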

      (2) There is another problem in the treatment of the behavioural data. In Experimental Procedures, authors state that they excluded data in which naive or control groups showed average CI < 0.6499, and/or trained groups showed average CI < -0.0499 or > 0.5499 for N2 (page 36, lines 5-7). How were these values determined? One common example for judging a data point as an outlier is > mean + 1.5, 2 or 3 SD, or < mean - 1.5, 2 or 3 SD. Are these values any of these standards, or determined through other methods? If these values were determined simply by authors' decision, it could potentially introduce a bias and in the worst cases lead to incorrect conclusions. A related question is, authors state "trained animals showed a lower CI (~0.3)" where in the referred Figure 1B, the corresponding data shows averages close to 0. Why the inconsistency? The assay that authors use is close to those described in the previous literature (Kunitomo et al., http://dx.doi.org/10.1038/ncomms3210). In this previous paper, it was described that animals conditioned under no salt with food show negative CI and are attracted to the low salt concentration area. Quantitative analysis of behavioural patterns showed migration bias towards lower salt concentrations (negative chemotaxis). Essentially the same concept was reported by Luo et al. (http://dx.doi.org/10.1016/j.neuron.2014.05.010). The experimental procedure employed in the current work is very similar to that of the Japanese group, with a notable difference: the chemotaxis assay plate included 50mM NaCl in Kunitomo et al, while authors used chemotaxis plate without added NaCl (p35, line 18). The latter is expected to cause shallow gradient towards the low-salt area, which may be the reason for the weak negative CI in the trained animals. In any case, the value of CI itself is not a problem, and authors' current assay is valid. The only concern of mine is the potential of author-introduced cognitive bias, possibly affecting, for example, whether a certain mutant has a significant defect or not. What happens if the cut-offs of -0.0499 and 0.5499 are omitted and all data were included in the analyses? What are the average CIs of N2 in all performed experiments for each of naive, control and trained groups?

      Thank you for pointing this out. As mentioned by both Reviewer 1 and Reviewer 4, the original manuscript states the following: “Data was excluded for salt associative learning experiments when wild-type N2 displayed (1) an average CI ≤ 0.6499 for naïve or control groups and/or (2) an average CI either < -0.0499 or >0.5499 for trained groups.”

      To clarify, we only excluded experiments in rare cases where N2 worms did not display robust high salt attraction before training, or where trained N2 did not display the expected behavioural difference compared to untrained or high-salt control N2. These anomalies were typically attributable to clear contamination or starvation issues that could clearly be observed prior to counting chemotaxis indices on CTX plates.

      We established these exclusion criteria in advance of conducting multiple learning assays to ensure an objective threshold for identifying and excluding assays affected by these rare but observable issues. However, these criteria were later found to be unnecessary, as N2 worms robustly displayed the expected untrained and trained phenotypes for salt associative learning when not compromised by starvation or contamination.

      We understand that the original criteria may have appeared to introduce arbitrary bias in data selection. To address this concern, we have removed these criteria from page 50 of the revised manuscript.
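
      For reference, the fixed-threshold rule quoted above from the original manuscript and the mean ± k·SD convention mentioned by the Reviewer can be written side by side. This is a sketch for comparison only: the thresholds are those quoted from the original manuscript, while the function names, the choice of k and the example values are hypothetical.

```python
from statistics import mean, stdev


def n2_assay_excluded(untrained_ci, trained_ci):
    """Fixed-threshold rule as quoted from the original manuscript (since removed).

    untrained_ci: average CI of the naive or high-salt control N2 group.
    trained_ci:   average CI of the trained N2 group.
    """
    untrained_fails = untrained_ci <= 0.6499
    trained_fails = trained_ci < -0.0499 or trained_ci > 0.5499
    return untrained_fails or trained_fails


def outside_k_sd(value, reference_values, k=2.0):
    """Alternative rule: flag a value lying more than k sample SDs from the reference mean."""
    return abs(value - mean(reference_values)) > k * stdev(reference_values)


# Hypothetical average CIs from earlier trained N2 assays (toy numbers).
previous_trained = [0.1, 0.2, 0.3]
flag_fixed = n2_assay_excluded(untrained_ci=0.8, trained_ci=0.45)
flag_sd = outside_k_sd(0.45, previous_trained, k=2.0)
```

      The two rules can disagree. In this toy case a trained CI of 0.45 passes the fixed thresholds but lies more than 2 SD from the hypothetical reference mean, which illustrates why the Reviewer asked how the cut-offs were derived.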

      Minor comments:

      (1) Related to Major comments 1), the successful effect of neuron-specific TurboID procedure was not evaluated. Authors obtained both TurboID and Non-Tg proteome data. Do they see enrichment of neuron-specific proteins? This can be easily tested, for example by using the list of neuron-specific genes by Kaletsky et al. (http://dx.doi.org/10.1038/nature16483 or http://dx.doi.org/10.1371/journal.pgen.1007559), or referring to the CeNGEN data.

      We thank this Reviewer for this helpful suggestion, which was echoed by Reviewer 3 (Major Comment 1). As indicated in the response to R3 above, the revised manuscript now includes Table 1 as a tissue-specific analysis of the learning proteome, using the single neuron RNA-Seq database CeNGEN to identify the proportion of neuronal proteins from each biological replicate of mass spectrometry data. Generally, we observed that 87-95% of proteins corresponded to genes from the CeNGEN database that had been detected in neurons, providing evidence that the TurboID enzyme was able to target the neuronal proteome as expected. Table 1 is now described in the main text of the revised work on pages 16 & 17.

      (2) The behavioural paradigm needs to be described accurately. Page 5, line 16-17, "C. elegans normally have a mild attraction towards higher salt concentration": in fact, C. elegans raised on NGM plates, which include approximately 50mM of NaCl, is attracted to around 50mM of NaCl (Kunitomo et al., Luo et al.) but not 100-200 mM.

      We thank the Reviewer for pointing this out. We agree that clarification is necessary. The revised text reads as follows on page 5: “C. elegans are typically grown in the presence of salt (usually ~ 50 mM) and display an attraction toward this concentration when assayed for chemotaxis behaviour on a salt gradient (Kunitomo et al., 2013, Luo et al., 2014).

      Training/conditioning with ‘no salt + food’ partially attenuates this attraction (group referred to ‘trained’).”

      Authors call this assay "salt associative learning", which refers to the fact that worms associate salt concentration (CS) and either presence or absence of food (appetitive or aversive US) during conditioning (Kunitomo et al., Luo et al., Nagashima et al.) but they are looking at only association with presence of food, and for proteome analysis they only change the CS (NaCl concentration, as discussed in Discussion, p24, lines 4-5). It is better to attempt to avoid confusion to the readers in general.

      Thank you Reviewer 4 for highlighting this clarity issue. We clarify our definition of “salt associative learning” for the purpose of this study in the revised manuscript on page 6 with the following text:

      “Similar behavioural paradigms involving pairings between salt/no salt and food/no food have been previously described in the literature (Nagashima et al. 2019). Here, learning experiments were performed by conditioning worms with either ‘no salt + food’ (referred to as ‘salt associative learning’) or ‘salt + no food’ (called ‘salt aversive learning’).”

      (3) page 32, line 23: the wording "excluding" is obscure and misleading because the elo-6 gene was included in the analysis.

      We thank the Reviewer for pointing out this misleading comment, which was unintentional. We have now removed it from the text (on page 21).

      (4) Typo at page 24, line 18: "that ACC-1" -> "than ACC-1".

      This has been corrected (on page 37).

      (5) Reference. In "LEO, T. H. T. et al.", given names and surnames are flipped for all authors. Also, the paper has been formally published (http://dx.doi.org/10.1016/j.cub.2023.07.041).

      We appreciate the Reviewer drawing our attention to this – the reference has been corrected and updated.

      I would like to express my modest cross comments on the reviews:

      (1) Many of the reviewers comment on the shortage in the quantitative nature of the proteome analysis, so it seems to be a consensus.

      Thank you Reviewer 4 for this feedback. We appreciate the benefit in performing quantitative mass spectrometry, in that it provides an additional way to parse molecular mechanisms in a biological process (e.g., fold-changes in protein expression induced by learning). However, we note that quantitative mass spectrometry is challenging to integrate with TurboID due to the requirement to enrich for biotinylated peptides during sample processing (we now mention this on page 39). Nevertheless, it would be exciting to see this approach performed in a future study.

      To address the limitations of our original qualitative approach and enhance the clarity and utility of our dataset, we have made the following revisions in the manuscript:

      (1) Candidate selection criteria: We now clearly define how candidates were selected for functional testing, based on their frequency across biological replicates. Specifically, “strong candidates” were detected in three or more replicates, while “weak candidates” appeared in two or fewer.

      (2) Frequency-based representation (Table 2): We appreciate the suggestion by Reviewer 4 (Major Comment 1) to quantify differences between high-salt control and trained groups. We now provide the frequency-based representation of the candidates tested in this study within our proteomics data in Table 2. These data showed that many of the tested candidates were more frequently detected in trained worms compared to high-salt controls. This includes both strong and weak candidates.

      We hope these additions help clarify our approach and demonstrate the value of the dataset, even within the constraints of qualitative proteomics.

      (2) Also, tissue- or cell-specificity of the identified proteins was commonly discussed. In reviewer #3's first Major comment, appearance of non-neuronal proteins in the list was pointed out, which corroborates my (#4 reviewer's) question on successful identification of neuronal proteins by this method. On the other hand, reviewer #1 pointed out subset neuron-specific proteins in the list. Obviously, these issues need to be systematically described by the authors.

      We agree with Reviewer 4 that these analyses provide a critical angle of analysis that is not explored in the original manuscript.

      Tissue analysis (Reviewer 3 Major Comment 1): We have used the single neuron RNA-Seq database CeNGEN to identify that 87-95% (i.e. a large majority) of proteins identified across replicates corresponded to genes detected in neurons. These findings support that the TurboID enzyme was able to target the neuronal proteome as expected. Table 1 provides this information and is now described in the main text of the revised work on page 16.

      Neuron class analyses (Reviewer 1 Major Comment 2): In response, we have used the suggested WormBase gene enrichment tool and CeNGEN. We specifically input proteins from the learning proteome into WormBase, after filtering for proteins unique to TurboID trained animals. For CeNGEN, we compared genes/proteins from control worms and trained worms to identify potential neurons that may be involved in this learning paradigm.

      Briefly, we highlight a range of neuron classes known to function in learning (e.g., RIS interneurons), cells that affect behaviour but have not been explored in learning (e.g., IL1 polymodal neurons), and neurons whose functions are unknown (e.g., pharyngeal neuron I3). Corresponding text for this new analysis has been added on pages 16-20, with a new table and figure added to illustrate these findings (Table S7 & Figure 4). Methods are detailed on pages 50-51.

      (3) Given reviewer #1's OPTIONAL Major comment, as an expert of behavioral assays in C. elegans, I would like to comment based on my experience that mutants received from Caenorhabditis Genetics Center or other labs often lose the phenotype after outcrossing by the wild type, indicating that a side mutation was responsible for the observed behavioral phenotype. Therefore, outcrossing may be helpful and easier than rescue experiments, though the latter are of course more accurate.

      Thank you for this suggestion. To address the potential involvement of background mutations, we have done experiments with backcrossed versions of mutants tested where possible, as shown in Figure 6. We found that F46H5.3(-) mutants maintained enhanced learning capacity after backcrossing with wild type, compared to their non-backcrossed mutant line. This was in contrast to C30G12.6(-) animals which lost their enhanced learning phenotype following backcrossing using wild type worms. This is described in the text on pages 24-26.

      (4) Just let me clarify the first Minor comment by reviewer #2. Authors described that the kin-2 mutant has abnormality in "salt associative learning" and "salt aversive learning", according to authors' terminology. In this comment by reviewer #2, "gustatory associative learning" probably refers to both of these assays.

      Reviewer 4 is correct. We have amended the wording appropriately on page 31 to clarify our meaning to address Reviewer 2’s comment.

      “Although kin-2(ce179) mutants were not shown to impact salt aversive learning, they have been reported previously to display impaired intermediate-term memory (but intact learning and short-term memory) for butanone appetitive learning (Stein and Murphy, 2014).”

      (5) There seem to be several typos in reviewer #1's Minor comments.

      "In Page 9, Lines 17-18" -> "Page 8, Lines 17-18".

      "Page 8, Line 24" -> "Page 7, Line 24".

      "I would suggest to remove figure 3" -> "I would suggest to remove figure 2"

      "summary figure similar to Figure 4" -> "summary figure similar to Figure 3"

      "In the discussion Page 24, Line 14" -> "In the discussion Page 23, Line 14"

      (I note that because a top page was inserted in the "merged" file but not in art file for review, there is a shift between authors' page numbers and pdf page numbers in the former.) It would be nice if reviewer #1 can confirm on these because I might be wrong.

      We appreciate Reviewer 4 noting this, and can confirm that these are the correct references (as indicated by Reviewer 1 in their cross-comments).

      Reviewer #4 (Significance):

      (1) Total neural proteome analysis has not been conducted before for learning-induced changes, though transcriptome analysis has been performed for odor learning (Lakhina et al., http://dx.doi.org/10.1016/j.neuron.2014.12.029). This guarantees the novelty of this manuscript, because for some genes, protein levels may change even though mRNA levels remain the same. We note an example in which a proteome analysis utilizing TurboID, though not the comparison between trained/control, has led to finding of learning related proteins (Hiroki et al., http://dx.doi.org/10.1038/s41467-022-30279-7). As described in the Major comments 1) in the previous section, improvement of data presentation will be necessary to substantiate this novelty.

      We appreciate this thoughtful feedback. We agree that while the neuronal transcriptome has been explored in Lakhina et al., 2015 for C. elegans in the context of memory, our study represents the first to examine learning-induced changes in the total neuronal proteome. We particularly agree with the statement that “for some genes, protein levels may change even though mRNA levels remain the same”. This is essential rationale that we now discuss on page 42.

      Additionally, we acknowledge the relevance of the study by Hiroki et al., 2022, which used TurboID to identify learning-related proteins, though not in a trained versus control comparison. Our work builds on this by directly comparing trained and control conditions, thereby offering new insights into the proteomic landscape of learning. This is now clarified on page 36.

      To substantiate the novelty and significance of our approach, we have revised the data presentation throughout the manuscript, including clearer candidate selection criteria, frequency-based representation of proteomic hits (Table 2), and neuron-specific enrichment analyses (Table S7 & Figure 4). We hope these improvements help convey the unique contribution of our study to the field.

      (2) Authors found six mutants that have abnormality in the salt learning (Fig. 4). These genes have not been described to have the abnormality, providing novel knowledge to the readers, especially those who work on C. elegans behavioural plasticity. Especially, involvement of acetylcholine neurotransmission has not been addressed. Although site of action (neurons involved) has not been tested in this manuscript, it will open the avenue to further determine the way in which acetylcholine receptors, cAMP pathway etc. influence the learning process.

      Thank you Reviewer 4, for this encouraging feedback. To further strengthen the study and expand its relevance, we have tested additional mutants in response to Reviewer 3’s comments, as shown in Figures 6 & S7. These results provide even more candidate genes and pathways for future exploration, enhancing the significance and impact of our study.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #3 (Public review):

      The central issue for evaluating the overfilling hypothesis is the identity of the mechanism that causes the very potent (>80% when inter pulse is 20 ms), but very quickly reverting (< 50 ms) paired pulse depression (Fig 1G, I). To summarize: the logic for overfilling at local cortical L2/3 synapses depends critically on the premise that probability of release (pv) for docked and fully primed vesicles is already close to 100%. If so, the reasoning goes, the only way to account for the potent short-term enhancement seen when stimulation is extended beyond 2 pulses would be by concluding that the readily releasable pool overfills. However, the conclusion that pv is close to 100% depends on the premise that the quickly reverting depression is caused by exocytosis dependent depletion of release sites, and the evidence for this is not strong in my opinion. Caution is especially reasonable given that similarly quickly reverting depression at Schaffer collateral synapses, which are morphologically similar, was previously shown to NOT depend on exocytosis (Dobrunz and Stevens 1997). Note that the authors of the 1997 study speculated that Ca2+-channel inactivation might be the cause, but did not rule out a wide variety of other types of mechanisms that have been discovered since, including the transient vesicle undocking/re-docking (and subsequent re-priming) reported by Kusick et al (2020), which seems to have the correct timing.

      Thank you for your comments on an alternative possibility besides Ca2+ channel inactivation. Kusick et al. (2020) showed that transient destabilization of the docked vesicle pool recovers within 14 ms after stimulation. This rapid recovery implies that post-stimulation undocking events might be largely resolved before the 20 ms inter-stimulus interval (ISI) used in our paired-pulse ratio (PPR) experiments, arguing against the possibility that post-AP undocking/re-docking events significantly influence PPR measured at 20 ms ISI. Furthermore, Vevea et al. (2021) showed that post-stimulus undocking is facilitated in synaptotagmin-7 (Syt7) knockout synapses. In our study, Syt7 knockdown did not affect PPR at 20 ms ISI, suggesting that the undocking process described in Kusick et al. may not be a major contributor to the paired-pulse depression (PPD) observed at 20 ms interval in our study. Therefore, it is unlikely that transient vesicle undocking primarily underlies the strong PPD at 20 ms ISI in our experiments. Taken together, the undocking/redocking dynamics reported by Kusick et al. are too rapid to affect PPR at 20 ms ISI, and our Syt7 knockdown data further argue against a significant role of this process in the PPD observed at 20 ms interval.

      In an earlier round of review, I suggested raising extracellular Ca2+, to see if this would increase synaptic strength. This is a strong test of the authors' model because there is essentially no room for an increase in synaptic strength. The authors have now done experiments along these lines, but the result is not clear cut. On one hand, the new results suggest an increase in synaptic strength that is not compatible with the authors' model; technically the increase does not reach statistical significance, but, likely, this is only because the data set is small and the variation between experiments is large. Moreover, a more granular analysis of the individual experiments seems to raise more serious problems, even supporting the depletion-independent counter hypothesis to some extent. On the other hand, the increase in synaptic strength that is seen in the newly added experiments does seem to be less at local L2/3 cortical synapses compared to other types of synapses, measured by other groups, which goes in the general direction of supporting the critical premise that pv is unusually high at L2/3 cortical synapses. Overall, I am left wishing that the new data set were larger, and that reversal experiments had been included as explained in the specific points below.

      Specific Points:

      (1) One of the standard methods for distinguishing between depletion-dependent and depletion-independent depression mechanisms is by analyzing failures during paired pulses of minimal stimulation. The current study includes experiments along these lines showing that pv would have to be extremely close to 1 when Ca<sup>2+</sup> is 1.25 mM to preserve the authors' model (Section "High double failure rate ..."). Lower values for pv are not compatible with their model because the k<sub>1</sub> parameter already had to be pushed a bit beyond boundaries established by other types of experiments.

      It should be noted that we did not arbitrarily push the k<sub>1</sub> parameter beyond these boundaries, but estimated the range of k<sub>1</sub> from the fast time constant for recovery from paired-pulse depression, as shown in Fig. 3-S2-Ab.

      The authors now report a mean increase in synaptic strength of 23% after raising Ca to 2.5 mM. The mean increase is not quite statistically significant, but this is likely because of the small sample size. I extracted a 95% confidence interval of [-4%, +60%] from their numbers, with a 92% probability that the mean value of the increase in the full population is > 5%. I used the 5% value as the greatest increase that the model could bear because 5% implies pv < 0.9 using the equation from Dodge and Rahamimoff referenced in the rebuttal. My conclusion from this is that the mean result, rather than supporting the model, actually undermines it to some extent. It would have likely taken 1 or 2 more experiments to get above the 95% confidence threshold for statistical significance, but this is ultimately an arbitrary cut off.

      Our key claim in Fig. 3-S3 is not the statistical non-significance of the EPSC change, but its small magnitude (1.23-fold). This increase is far smaller than the 3.24-fold increase predicted by the fourth-power relationship (D&R equation; Dodge & Rahamimoff, 1967), which would be valid under the condition that the fusion probability of docked vesicles (p<sub>v</sub>) is not saturated. We do not believe that adding experiments would raise the magnitude of the EPSC change to the level that the D&R equation predicts, even if a larger n yielded statistical significance. In other words, even a small but statistically significant EPSC change would still contradict what is expected from low-p<sub>v</sub> synapses. Our main point is the extent of the EPSC increase induced by high external [Ca<sup>2+</sup>], not a p-value. For this reason, we find it difficult to accept the Reviewer’s request for a larger sample size aimed at obtaining a lower p-value.

      Although we agree with the Reviewer’s assertion that our data may indicate a 92% probability that high Ca<sup>2+</sup> increases EPSCs by more than 5%, we do not agree with the Reviewer’s interpretation that the EPSC increase necessarily implies an increase in p<sub>v</sub>. We are sorry that we could not clearly understand the Reviewer’s inference that a 5% increase in EPSCs implies p<sub>v</sub> < 0.9. Please note that release probability (p<sub>r</sub>) is the product of p<sub>v</sub> and the occupancy of docked vesicles in an active zone (p<sub>occ</sub>). We imagine that this inference rests on the premise that p<sub>occ</sub> is constant irrespective of external [Ca<sup>2+</sup>]. Contrary to this premise, Figure 2c in Kusick et al. (2020) showed that the number of docked SVs increased by ca. 20% upon increasing external [Ca<sup>2+</sup>] to 2 mM. Moreover, Figure 7F in Lin et al. (2025) demonstrated that the number of TS vesicles, equivalent to p<sub>occ</sub>, increased by 23% at high external [Ca<sup>2+</sup>]. These increases in p<sub>occ</sub> are similar in magnitude to the high external Ca<sup>2+</sup>-induced increase in EPSC that we observed (1.23-fold). Of course, it is possible that increases in both p<sub>occ</sub> and p<sub>v</sub> contributed to the high [Ca<sup>2+</sup>]<sub>o</sub>-induced increase in EPSC. The low PPR and the failure rate analysis, however, suggest that p<sub>v</sub> is already saturated under baseline conditions of 1.3 mM [Ca<sup>2+</sup>]<sub>o</sub>, and thus it is more likely that an increase in p<sub>occ</sub> is primarily responsible for the 1.23-fold increase. Moreover, the 1.23-fold increase does not match the prediction of the D&R equation, which would be valid at synapses with low p<sub>v</sub>. Therefore, interpreting our observation (1.23-fold increase) as a slight increase in p<sub>occ</sub> is consistent with recent papers (Kusick et al., 2020; Lin et al., 2025) as well as with our other results supporting the baseline saturation of p<sub>v</sub>, as shown in Figure 2 and its associated supplement figures (Fig. 2-S1 and Fig. 2-S2).

      (2) The variation between experiments seems to be even more problematic, at least as currently reported. The plot in Figure 3-figure supplement 3 (left) suggests that the variation reflects true variation between synapses, not measurement error.

      Note that previous studies also reported substantial variance in the number of docked or TS vesicles at baseline and in its fold change under high external Ca<sup>2+</sup> (Lin et al., 2025; Kusick et al., 2020). Our study focused not on this heterogeneity but on the mean dynamics of short-term plasticity at L2/3 recurrent synapses. With this in mind, the short-term plasticity of these synapses is best explained by assuming that the vesicular fusion probability (p<sub>v</sub>) is close to unity and that release probability is regulated by p<sub>occ</sub>. In other words, even though p<sub>v</sub> is close to unity, synaptic strength can increase at high external [Ca<sup>2+</sup>] if the baseline occupancy of release sites (p<sub>occ</sub>) is low and p<sub>occ</sub> is increased by high [Ca<sup>2+</sup>]. Lin et al. (2025) showed that high external [Ca<sup>2+</sup>] increases the number of TS vesicles (equivalent to p<sub>occ</sub>) by 23% at the calyx synapses. Unlike at our synapses, the baseline p<sub>v</sub> (denoted as p<sub>fusion</sub> in Lin et al., 2025) of the calyx synapse is not saturated (= 0.22) at 1.5 mM external [Ca<sup>2+</sup>], and thus the calyx synapses displayed a 2.36-fold increase in EPSC at 2 mM external [Ca<sup>2+</sup>], to which increases in both p<sub>occ</sub> and p<sub>v</sub> (from 0.22 to 0.42) contributed. Therefore, the small increase in EPSC (23%) supports the view that p<sub>v</sub> is already saturated at L2/3 recurrent synapses.
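
      As a rough arithmetic check of the relation p<sub>r</sub> = p<sub>v</sub> × p<sub>occ</sub> used in this response, the minimal Python sketch below multiplies the fold changes quoted from Lin et al. (2025). The function name and the saturated case (p<sub>v</sub> fixed at 1.0) are illustrative assumptions, not code or fitted values from the study.

```python
def fold_change_in_release(p_v_before, p_v_after, p_occ_fold):
    """Fold change in p_r = p_v * p_occ when p_v changes and p_occ scales by p_occ_fold."""
    return (p_v_after / p_v_before) * p_occ_fold


# Calyx values quoted in the response (Lin et al., 2025): p_v 0.22 -> 0.42, p_occ +23%.
calyx_fold = fold_change_in_release(0.22, 0.42, 1.23)

# Hypothetical saturated synapse: p_v assumed to stay at 1.0, same 23% rise in p_occ.
saturated_fold = fold_change_in_release(1.0, 1.0, 1.23)

print(f"calyx: {calyx_fold:.2f}-fold; saturated p_v: {saturated_fold:.2f}-fold")
```

      Under these assumptions the calyx product comes out at about 2.35-fold, close to the 2.36-fold EPSC increase cited above, whereas a synapse with saturated p<sub>v</sub> would change only by the p<sub>occ</sub> factor.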

      And yet, synaptic strength increased almost 2-fold in 2 of the 8 experiments, which back extrapolates to pv < 0.2.

      We are sorry that we could not understand the first comment in this paragraph. Could you explain in detail why a two-fold increase implies p<sub>v</sub> < 0.2?

      If all of the depression is caused by depletion as assumed, these individuals would exhibit paired pulse facilitation, not depression. And yet, from what I can tell, the individuals depressed, possibly as much as the synapses with low sensitivity to Ca<sup>2+</sup>, arguing against the critical premise that depression equals depletion, and even arguing - to some extent - for the counter hypothesis that a component of the depression is caused by a mechanism that is independent of depletion.

      Regarding the first statement in this paragraph, we assume that ‘the depression’ means paired-pulse depression (PPD). If so, we cannot understand why depletion-dependent PPD should lead to PPF. If the paired-pulse interval is too short for docked vesicles to be replenished, the vesicle depletion induced by the first pulse would result in PPD. We are very sorry that we could not follow the Reviewer’s subsequent inference, because we could not understand the first statement.
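
      The depletion argument can be sketched with a two-state release-site model. The Python example below is a simplification under stated assumptions: p<sub>v</sub> is taken as identical on both pulses, site occupancy relaxes toward k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) at rate k<sub>1</sub>+b<sub>1</sub>, and the rate values are hypothetical rather than fitted to any data set. It contains no facilitation mechanism, so it only illustrates the depletion component.

```python
import math


def paired_pulse_ratio(p_v, k1, b1, isi):
    """PPR for a two-state site model with the same p_v on both pulses (assumed).

    Site occupancy n obeys dn/dt = k1*(1 - n) - b1*n, so it relaxes toward
    p_occ = k1/(k1 + b1) at rate (k1 + b1). The first pulse releases a
    fraction p_v of the occupied sites.
    """
    p_occ = k1 / (k1 + b1)
    n_after_first = p_occ * (1.0 - p_v)  # occupancy right after pulse 1
    n_at_second = p_occ + (n_after_first - p_occ) * math.exp(-(k1 + b1) * isi)
    return n_at_second / p_occ  # p_v cancels in EPSC2/EPSC1


# Hypothetical rate constants (1/s) and the 20 ms ISI mentioned in the text.
K1, B1, ISI = 20.0, 30.0, 0.020
ppr_by_pv = {p_v: paired_pulse_ratio(p_v, K1, B1, ISI) for p_v in (0.2, 0.6, 1.0)}
for p_v, ppr in ppr_by_pv.items():
    print(f"p_v = {p_v:.1f} -> PPR = {ppr:.3f}")
```

      In this sketch the ratio reduces to 1 − p<sub>v</sub>·exp(−(k<sub>1</sub>+b<sub>1</sub>)·ISI), which never exceeds 1 and falls as p<sub>v</sub> rises; any facilitation would have to come from a mechanism that the sketch leaves out.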

      I would strongly recommend adding an additional plot that documents the relationship between the amount of increase in synaptic strength after increasing extracellular Ca<sup>2+</sup> and the paired pulse ratio as this seems central.

      We found no clear correlation of EPSC<sub>1</sub> with PPR changes (ΔPPR) as shown in the figure below.

      Author response image 1.

      Plot of PPR changes as a function of EPSC<sub>1</sub>.

      (3) Decrease in PPR. The authors recognize that the decrease in the paired-pulse ratio after increasing Ca<sup>2+</sup> seems problematic for the overfilling hypothesis by stating: "Although a reduction in PPR is often interpreted as an increase in pv, under conditions where pv is already high, it more likely reflects a slight increase in p<sub>occ</sub> or in the number of TS vesicles, consistent with the previous estimates (Lin et al., 2025)."

      We admit that there is a logical jump in our statement you mentioned here. We appreciate your comment. We re-wrote that part in the revised manuscript (line 285) as follows:

      “Recent morphological and functional studies revealed that elevation of [Ca<sup>2+</sup>]<sub>o</sub> induces an increase in the number of TS or docked vesicles to a similar extent as our observation (Kusick et al., 2020; Lin et al., 2025), raising a possibility that an increase in p<sub>occ</sub> is responsible for the 1.23-fold increase in EPSC at high [Ca<sup>2+</sup>]<sub>o</sub>. A slight but significant reduction in PPR was observed under high [Ca<sup>2+</sup>]<sub>o</sub> too. An increase in p<sub>occ</sub> is thought to be associated with that in the baseline vesicle refilling rate. While PPR is always reduced by an increase in p<sub>v</sub>, the effects of refilling rate to PPR is complicated. For example, PPR can be reduced by both a decrease (Figure 2—figure supplement 1) and an increase (Lin et al., 2025) in the refilling rate induced by EGTA-AM and PDBu, respectively. Thus, the slight reduction in PPR is not contradictory to the possible contribution of p<sub>occ</sub> to the high [Ca<sup>2+</sup>]<sub>o</sub> effects.”

      I looked quickly, but did not immediately find an explanation in Lin et al 2025 involving an increase in pocc or number of TS vesicles, much less a reason to prefer this over the standard explanation that reduced PPR indicates an increase in pv.

      Fig. 7F of Lin et al. (2025) shows a 1.23-fold increase in the number of TS vesicles at high external [Ca<sup>2+</sup>]. Fig. 7E of the same paper shows an approximately two-fold increase in p<sub>fusion</sub> (equivalent to p<sub>v</sub> in our study) at high external [Ca<sup>2+</sup>] (from 0.22 to 0.42). Because p<sub>occ</sub> is the occupancy of TS vesicles in a limited number of slots in an active zone, the fold change in the number of TS vesicles should be similar to that of p<sub>occ</sub>.

      The authors should explain why the most straightforward interpretation is not the correct one in this particular case to avoid the appearance of cherry picking explanations to fit the hypothesis.

      The results of Lin et al. (2025) indicate that high external [Ca<sup>2+</sup>] induces a milder increase in p<sub>occ</sub> (1.23-fold) than in p<sub>v</sub> (about 1.9-fold, from 0.22 to 0.42) at the calyx synapses. Because the increase in p<sub>occ</sub> is much smaller than that in p<sub>v</sub>, and multiple lines of evidence in our study indicate that the baseline p<sub>v</sub> is already saturated, we raised the possibility that an increase in p<sub>occ</sub> primarily contributes to the unexpectedly small increase in EPSC at 2.5 mM [Ca<sup>2+</sup>]<sub>o</sub>. As mentioned above, our interpretation is also consistent with the EM study of Kusick et al. (2020). Nevertheless, the reduction in PPR at 2.5 mM Ca<sup>2+</sup> may appear to support an increase in p<sub>v</sub>, arguing against this possibility. However, because p<sub>occ</sub> = k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) under the simple vesicle refilling model (Fig. 3-S2Aa), a change in p<sub>occ</sub> should be associated with changes in k<sub>1</sub> and/or b<sub>1</sub>. While PPR is always reduced by an increase in p<sub>v</sub>, the effect of the refilling rate on PPR is complicated. For example, although EGTA-AM would not increase p<sub>v</sub>, it reduced PPR, probably by reducing the refilling rate (Fig. 2-S1). Conversely, PDBu is thought to increase k<sub>1</sub> because it induces a two-fold increase in p<sub>occ</sub> (Fig. 7L of Lin et al., 2025). Such a marked increase in p<sub>occ</sub>, rather than in p<sub>v</sub>, seems to be responsible for the marked PDBu-induced reduction in PPR (Fig. 7I of Lin et al., 2025), because PDBu induced only a slight increase in p<sub>v</sub> (Fig. 7K of Lin et al., 2025). Therefore, the slight reduction in PPR does not contradict our interpretation that an increase in p<sub>occ</sub> might be responsible for the slight increase in EPSC induced by high [Ca<sup>2+</sup>]<sub>o</sub>.
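
      To make the link between p<sub>occ</sub> and the rate constants concrete, the short Python sketch below evaluates p<sub>occ</sub> = k<sub>1</sub>/(k<sub>1</sub>+b<sub>1</sub>) and inverts it for k<sub>1</sub> at fixed b<sub>1</sub>. The baseline occupancy (0.5), the value of b<sub>1</sub>, and the helper names are hypothetical choices for illustration only; the response does not state them, and a change in b<sub>1</sub> could produce the same occupancy change.

```python
def occupancy(k1, b1):
    """Steady-state site occupancy of the simple refilling model."""
    return k1 / (k1 + b1)


def k1_for_occupancy(p_occ, b1):
    """Invert p_occ = k1/(k1 + b1) for k1, holding b1 fixed."""
    if not 0.0 <= p_occ < 1.0:
        raise ValueError("p_occ must lie in [0, 1)")
    return p_occ * b1 / (1.0 - p_occ)


BASELINE_P_OCC = 0.5  # hypothetical baseline occupancy
B1 = 30.0             # hypothetical unbinding rate (1/s)

k1_base = k1_for_occupancy(BASELINE_P_OCC, B1)
k1_high = k1_for_occupancy(1.23 * BASELINE_P_OCC, B1)  # 23% higher occupancy

print(f"k1 must rise {k1_high / k1_base:.2f}-fold for a 1.23-fold rise in p_occ")
```

      Because occupancy saturates as k<sub>1</sub> grows, the fold change required in k<sub>1</sub> is larger than the fold change in p<sub>occ</sub> and depends strongly on the assumed baseline.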

      (4) The authors concede in the rebuttal that mean pv must be < 0.7, but I couldn't find any mention of this within the manuscript itself, nor any explanation for how the new estimate could be compatible with the value of > 0.99 in the section about failures.

      We have never stated in the rebuttal or elsewhere that the mean p<sub>v</sub> must be < 0.7. On the contrary, both our manuscript and our previous rebuttals have consistently argued that the baseline p<sub>v</sub> is already saturated, based on our observations of low PPR, tight coupling, a high double failure rate, and the minimal effect of external Ca<sup>2+</sup> elevation.

      (5) Although not the main point, comparisons to synapses in other brain regions reported in other studies might not be accurate without directly matching experiments.

      Please understand that it is not trivial to establish optimal experimental settings for studying other synapses with the same methods employed in this study. We think that this should be done in a separate study. Furthermore, we have already shown in the manuscript that action potentials (APs) evoked by oChIEF activation occur in a physiologically natural manner, and that the STP induced by these oChIEF-evoked APs is indistinguishable from the STP elicited by APs evoked by dual-patch electrical stimulation. Therefore, we believe that our use of optogenetic stimulation did not introduce any artificial bias in measuring STP.

      As it is, 2 of 8 synapses got weaker instead of stronger, hinting at possible rundown, but this cannot be assessed because reversibility was not evaluated. In addition, comparing axons with and without channel rhodopsins might be problematic because the channel rhodopsins might widen action potentials.

      We continuously monitored series resistance and baseline EPSC amplitude throughout the experiments. The figure below shows the mean time course of EPSCs at the two [Ca<sup>2+</sup>]<sub>o</sub>. As it shows, we observed no tendency for run-down of EPSCs during the experiments. Any recordings that showed run-down were discarded from the analysis. In addition, please note that there is substantial variance in the number of docked vesicles at both baseline and high external Ca<sup>2+</sup> (Lin et al., 2025; Kusick et al., 2020), as well as in the short-term dynamics of EPSCs at our synapses.

      Author response image 2.

      Time course of normalized amplitudes of the first EPSCs during paired-pulse stimulation at 20 ms ISI in control and in the elevated external Ca<sup>2+</sup> (n = 8).

      (6) Perhaps authors could double check with Schotten et al about whether PDBu does/does not decrease the latency between osmotic shock and transmitter release. This might be an interesting discrepancy, but my understanding is that Schotten et al didn't acquire information about latency because of how the experiments were designed.

      Schotten et al. (2015) directly compared experimental and simulation data for hypertonicity-induced vesicle release. They showed that the latency shortens markedly as tonicity increases (Fig. 2-S2), but this tonicity-dependent acceleration was not reproduced by reducing the activation energy barrier for fusion (ΔEa) in their simulations (Fig. 2-S1). The authors therefore suggested that an unknown compensatory mechanism counteracting the osmotic perturbation might be responsible for the tonicity-dependent changes in latency. Importantly, their modeling demonstrated that reducing ΔEa, which would correspond to increasing p<sub>v</sub>, results in larger peak amplitudes and a shorter time-to-peak but does not shorten the latency. Therefore, there is currently no direct basis for the notion that PDBu or similar manipulations shorten latency via an increase in p<sub>v</sub>.

      (7) The authors state: "These data are difficult to reconcile with a model in which facilitation is mediated by Ca2+-dependent increases in pv." However, I believe that discarding the premise that depression is always caused by depletion would open up wide range of viable possibilities.

      We hope that the Reviewer understands why we concluded that the baseline p<sub>v</sub> is saturated at our synapses. First, the strong paired-pulse depression (PPD) cannot be attributed to Ca<sup>2+</sup> channel inactivation because Ca<sup>2+</sup> influx at the axon terminal remained constant during 40 Hz train stimulation (Fig. 2-S2). Moreover, even if Ca<sup>2+</sup> channel inactivation were responsible for the strong PPD, it could not explain the delayed facilitation that emerges on subsequent pulses (the third EPSC onward) during 40 Hz train stimulation (Fig. 1-4), because Ca<sup>2+</sup> channel inactivation accumulates gradually during train stimulation, as directly shown by Wykes et al. (2007) in chromaffin cells. Second, the strong PPD and the very fast recovery from PPD indicate a very fast refilling rate constant (k<sub>1</sub>). With this high k<sub>1</sub>, the failure rates were best explained by a p<sub>v</sub> close to unity. Third, the EPSC increase induced by high external Ca<sup>2+</sup> was much smaller than at other synapses, such as calyx synapses, at which p<sub>v</sub> is not saturated (Lin et al., 2025), and was instead similar to the increases in p<sub>occ</sub> estimated at calyx synapses and in the EM study (Kusick et al., 2020; Lin et al., 2025).

      Reference

      Wykes et al. (2007). Differential regulation of endogenous N-and P/Q-type Ca<sup>2+</sup> channel inactivation by Ca<sup>2+</sup>/calmodulin impacts on their ability to support exocytosis in chromaffin cells. Journal of Neuroscience, 27(19), 5236-5248.

      Reviewer #3 (Recommendations for the authors):

      I continue to think that measuring changes in synaptic strength when raising extracellular Ca<sup>2+</sup> is a good experiment for evaluating the overfilling hypothesis. Future experiments would be better if the authors would include reversibility criteria to rule out rundown, etc. Also, comparisons to other types of synapses would be stronger if the same experimenter did the experiments at both types of synapses.

      We observed no systematic tendency for run-down of EPSCs during these experiments (Author response image 2). Furthermore, the observed variability is well within the variance expected in the number of docked vesicles at both baseline and high external Ca<sup>2+</sup> (Lin et al., 2025; Kusick et al., 2020) and reflects biological variability rather than experimental artifact. Therefore, we believe that additional reversibility experiments are not warranted. However, we are open to further discussion if the Reviewer has specific methodological concerns not resolved by our present data.

      Regarding the second issue, as mentioned above, we think that other synapse types should be examined in a separate study.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Recommendations for the authors):

      (1) The onus of making the revisions understandable to the reviewers lies with the authors. In its current form, how the authors have approached the review is hard to follow, in my opinion. Although the authors have taken a lot of effort in answering the questions posed by reviewers, parallel changes in the manuscript are not clearly mentioned. In many cases, the authors have acknowledged the criticism in response to the reviewer, but have not changed their narrative, particularly in the results section.

      We fully acknowledge your concern regarding the narrative linking EB-induced GluCl expression to JH biosynthesis and fecundity enhancement, particularly the need to address alternative interpretations of the data. Below, we outline the specific revisions made to address your feedback and ensure the manuscript’s narrative aligns more precisely with the experimental evidence:

      (1) Revised Wording in the Results Section

      To avoid overinterpretation of causality, we have modified the language in key sections of the Results (e.g., Figure 5 and related text):

      Original phrasing:

      “These results suggest that EB activates GluCl which induces JH biosynthesis and release, which in turn stimulates reproduction in BPH (Figure 5J).”

      Revised phrasing:

      “We also examined whether silencing Gluclα impacts the AstA/AstAR signaling pathway in female adults. Knock-down of Gluclα in female adults was found to have no impact on the expression of AT, AstA, AstB, AstCC, AstAR, and AstBR. However, the expression of AstCCC and AstCR was significantly upregulated in dsGluclα-injected insects (Figure 5-figure supplement 2A-H). Further studies are required to delineate the direct or indirect mechanisms underlying this effect of Gluclα-knockdown.” (line 643-649). And we have removed Figure 5J in the revised manuscript.

      (2) Expanded Discussion of Alternative Mechanisms

      In the Discussion section, we have incorporated a dedicated paragraph to explore alternative pathways and compensatory mechanisms:

      Key additions:

      “This EB action on GluClα expression is likely indirect, and we do not consider EB as transcriptional regulator of GluClα. Thus, the mechanism behind EB-mediated induction of GluClα remains to be determined. It is possible that prolonged EB exposure triggers feedback mechanisms (e.g. cellular stress responses) to counteract EB-induced GluClα dysfunction, leading to transcriptional upregulation of the channel. Hence, considering that EB exposure in our experiments lasts several days, these findings might represent indirect (or secondary) effects caused by other factors downstream of GluCl signaling that affect channel expression.” (line 837-845).

      (2) In the response to reviewers, the authors have mentioned line numbers in the main text where changes were made. But very frequently, those lines do not refer to the changes or mention just a subsection of changes done. As an example please see point 1 of Specific Points below. The problem is throughout the document making it very difficult to follow the revision and contributing to the point mentioned above.

      Thank you for highlighting this critical oversight. We sincerely apologize for the inconsistency in referencing line numbers and incomplete descriptions of revisions, which undoubtedly hindered your ability to track changes effectively. We have eliminated all vague or incomplete line number references from the response letter. Instead, revisions are now explicitly tied to specific sections, figures, or paragraphs.

      (3) The authors need to infer the performed experiments rationally without over interpretation. Currently, many of the claims that the authors are making are unsubstantiated. As a result of the first review process, the authors have acknowledged the discrepancies, but they have failed to alter their interpretations accordingly.

      We fully agree that overinterpretation of data undermines scientific rigor. In response to your feedback, we have systematically revised the manuscript to align claims strictly with experimental evidence and to eliminate unsubstantiated assertions. We sincerely apologize for the earlier overinterpretations and appreciate your insistence on precision. The revised manuscript now rigorously distinguishes between observations (e.g., EB-GluCl-JH correlations) and hypotheses (e.g., GluCl’s mechanistic role). By tempering causal language and integrating competing explanations, we aimed to present a more accurate and defensible narrative.

      SPECIFIC POINTS (to each question initially raised and their rebuttals)

      (1a) "Actually, there are many studies showing that insects treated with insecticides can increase the expression of target genes". Please note that what is asked for is evidence that the ligand itself induces the expression of its receptor. Of course, insecticide treatment will result in changed expression of targets. Of all the evidence furnished in the rebuttal, only Peng et al. 2017 fits the above definition. Even in this case, the accepted mode of action of chlorantraniliprole is by inducing structural change in the ryanodine receptor. The observed induction of the ryanodine receptor by chlorantraniliprole can best be described as a secondary effect. None of the other references really addresses the point asked for.

      We appreciate the reviewers’ suggestions for improving the manuscript. First, we have added further studies to the manuscript in support of this point: “There are several studies showing that insects treated with insecticides display increases in the expression of target genes. For example, the relative expression level of the ryanodine receptor gene of the rice stem borer, Chilo suppressalis was increased 10-fold after treatment with chlorantraniliprole, an insecticide which targets the ryanodine receptor (Peng et al., 2017). In Drosophila, starvation (and low insulin) elevates the transcription level of the receptors of the neuropeptides short neuropeptide F and tachykinin (Ko et al., 2015; Root et al., 2011). In BPH, reduction in mRNA and protein expression of a nicotinic acetylcholine receptor α8 subunit is associated with resistance to imidacloprid (Zhang et al., 2015). Knockdown of the α8 gene by RNA interference decreased the sensitivity of N. lugens to imidacloprid (Zhang et al., 2015). Hence, the expression of receptor genes may be regulated by diverse factors, including insecticide exposure.” We have inserted this text in lines 846-857 to elaborate on these possibilities.

      Second, we would like to reiterate our position: we have merely described this phenomenon, specifically that EB treatment increases GluClα expression. “This EB action on GluClα expression is likely indirect, and we do not consider EB as transcriptional regulator of GluClα. Thus, the mechanism behind EB-mediated induction of GluClα remains to be determined. It is possible that prolonged EB exposure triggers feedback mechanisms (e.g. cellular stress responses) to counteract EB-induced GluClα dysfunction, leading to transcriptional upregulation of the channel. Hence, considering that EB exposure in our experiments lasts several days, these findings might represent indirect (or secondary) effects caused by other factors downstream of GluCl signaling that affect channel expression.” We have inserted text in lines 837-845 to elaborate on these possibilities.

      Once again, we sincerely appreciate this discussion, which has provided us with a deeper understanding of this phenomenon.

      b. The authors accept in their rebuttal that they do not consider EB to be a transcriptional regulator of Gluclα and that the induction of Gluclα by EB is best considered a secondary effect. But that is not reflected in the manuscript, particularly in the Results section. The current writing implies that EB upregulation of Gluclα is an important event that contributes substantially to the hypothesis, so much so that they have retained the schematic diagram (Fig. 5J) in which EB -> Gluclα is drawn. Even the heading of the subsection says "EB-enhanced fecundity in BPHs is dependent on its molecular target protein, the Gluclα channel". As mentioned in the general points, it is not enough to write a good rebuttal to the reviewer; the parent manuscript needs to reflect the changes asked for.

      Thank you for your comments. We have carefully addressed your suggestions and made corresponding revisions to the manuscript.

      We fully acknowledge the reviewer's valid concern. The revised manuscript now states: “However, we do not propose that EB is a direct transcriptional regulator of Gluclα, since EB and other avermectins are known to alter the channel conformation and thus their function (Wolstenholme, 2012; Wu et al., 2017). Thus, it is likely that the observed increase in Gluclα transcript is a secondary effect downstream of EB signaling.” (Line 625-629). We agree that the original presentation in the manuscript, particularly within the Results section, did not adequately reflect this nuance and could be misinterpreted as suggesting a direct regulatory role for EB on Gluclα transcription.

      Regarding Fig. 5J, we have removed the figure and all mentions of Fig. 5J and its legend in the revised manuscript.

      c. "We have inserted text on lines 738 - 757 to explain these possibilities." Not a single line in the section mentioned above discussed the topic in hand. This is serious undermining of the review process or carelessness to the extreme level.

      In the Results section, we have now added the following statement: “Taken together, these results reveal that EB exposure is associated with an increase in JH titer and that this elevated JH signaling contributes to enhanced fecundity in BPH.” (line 375-377).

      For the figures, we have removed Fig. 4N and all mentions of Fig. 4N and its legend in the revised manuscript.

      Lastly, regarding the issue of locating specific lines, we deeply regret any inconvenience caused. Due to the track changes mode used during revisions, line numbers may have shifted, resulting in incorrect references. We sincerely apologize for this and have now corrected the line numbers.

      (2) The section written in the rebuttal should be included in the discussion as well, explaining why the authors think a nymphal treatment with JH may work in increasing the fecundity of adults. Also, the authors accept that EB’s effect on JH titer is indirect. The text of the manuscript, the Results section and the figures should reflect that. It is NOT ok to accept in a rebuttal letter that EB impacts JH titer indirectly while continuing to portray a direct effect of EB on JH titer. In terms of diagrams, authors cannot put a -> sign unless the effect is direct. This is an accepted norm in biological publications.

      We appreciate the reviewer’s valuable suggestions here. We have now carefully revised the manuscript to address all concerns, particularly regarding the mechanism linking nymphal EB exposure to adult fecundity and the indirect nature of EB’s effect on JH titers. Below are our point-by-point responses and corresponding manuscript changes. Revised text is clearly marked in the resubmitted manuscript.

      (1) Clarifying the mechanism linking nymphal EB treatment to adult fecundity:

      Reviewer concern: Explain why nymphal EB treatment increases adult fecundity despite undetectable EB residues in adults.

      Response & Actions Taken:

      We agree this requires explicit discussion. We now propose that nymphal EB exposure triggers developmental reprogramming (e.g., metabolic/epigenetic changes) that persist into adulthood, indirectly enhancing JH synthesis and fecundity. This is supported by two key findings:

      (1) No detectable EB residues in adults after nymphal treatment (new Figure 1–figure supplement 1C).

      (2) Increased adult weight and nutrient reserves (Figure 1–figure supplement 3E,F), suggesting altered resource allocation.

      Added to Discussion (Lines 793–803): Notably, after exposing fourth-instar BPH nymphs to EB, no EB residues were detected in the subsequent adult stage. This finding indicates that the EB-induced increase in adult fecundity is initiated during the nymphal stage and manifests in adulthood, a mechanism distinct from the direct enhancement of fecundity observed when EB is applied to adults. We propose that sublethal EB exposure during critical nymphal stages may reprogram metabolic or endocrine pathways, potentially via insulin/JH crosstalk. For instance, increased nutrient storage (e.g., proteins, sugars; Figure 2–figure supplement 2) could enhance insulin signaling, which in turn promotes JH biosynthesis in adults (Ling and Raikhel, 2021; Mirth et al., 2014; Sheng et al., 2011). Future studies should test whether EB alters insulin-like peptide expression or signaling during development.

      (3) Emphasizing EB’s indirect effect on JH titers:

      Reviewer concern: The manuscript overstated EB’s direct effect on JH. Arrows in figures implied causality where only correlation exists.

      Response & Actions Taken:

      We fully agree. EB’s effect on JH is indirect and multifactorial (via AstA/AstAR suppression, GluCl modulation, and metabolic changes). We have:

      Removed oversimplified schematics (original Figures 3N, 4N, 5J).

      Revised all causal language (e.g., "EB increases JH" → "EB exposure is associated with increased circulating JH III"). (Line 739)

      Clarified in Results/Discussion that EB-induced JH changes are likely secondary to neuroendocrine disruption.

      Key revisions:

      Results (Lines 375–377):

      "Taken together, these results reveal that EB exposure is associated with an increase in JH titer and that JH signaling contributes to enhanced fecundity in BPH."

      Discussion (Lines 837–845):

      This EB action on GluClα expression is likely indirect, and we do not consider EB a transcriptional regulator of GluClα. Thus, the mechanism behind EB-mediated induction of GluClα remains to be determined. It is possible that prolonged EB exposure triggers feedback mechanisms (e.g. cellular stress responses) to counteract EB-induced GluClα dysfunction, leading to transcriptional upregulation of the channel. Hence, considering that EB exposure in our experiments lasts several days, these findings might represent indirect (or secondary) effects caused by other factors downstream of GluCl signaling that affect channel expression.

      a. Lines 281-285 as mentioned, does not carry the relevant information.

      Thank you for your careful review of our manuscript. We sincerely apologize for the confusion regarding line references in our previous response. Due to extensive revisions and tracked changes during the revision process, the line numbers shifted, resulting in incorrect citations for Lines 281–285. The correct location for the added results (EB-induced increase in mature eggs in adult ovaries) is now in lines 253-258: “We furthermore observed that EB treatment of female adults also increases the number of mature eggs in the ovary (Figure 2-figure supplement 1).”

      b. Lines 351-356 as mentioned, does not carry the relevant information. Lines 281-285 as mentioned, does not carry the relevant information.

      Thank you for your careful review of our manuscript. We sincerely apologize for the confusion regarding line references in our previous response. The correct location for the added results is now in lines 366-371: “We also investigated the effects of EB treatment on the JH titer of female adults. The data indicate that the JH titer was also significantly increased in the EB-treated female adults compared with controls (Figure 3-figure supplement 3A). However, again the steroid 20-hydroxyecdysone, was not significantly different between EB-treated BPH and controls (Figure 3-figure supplement 3B).”

      c. Lines 378-379 as mentioned, does not carry the relevant information. Lines 387-390 as mentioned, does not carry the relevant information.

      We sincerely apologize for the confusion regarding line references in our previous response.

      The correct location for the added results is now in lines 393-394: We furthermore found that EB treatment in female adults increases JHAMT expression (Figure 3-figure supplement 3C).

      The other correct location for the added results is now in lines 405-408: We found that Kr-h1 was significantly upregulated in the adults of EB-treated BPH at the 5M, 5L nymph and 4 to 5 DAE stages (4.7-fold to 27.2-fold) when 4th-instar nymphs or female adults were treated with EB (Figure 3H and Figure 3-figure supplement 3D).

      (3) The writing quality is still extremely poor. It does not meet any publication standard, let alone that of eLife.

      We fully understand your concerns and frustrations, and we sincerely apologize for the deficiencies in our writing quality, which did not meet the high standards expected by you and the journal. We fully accept your criticism regarding the writing quality and have rigorously revised the manuscript according to your suggestions.

      (4) I am confused whether Figure 2B was redone or just edited. Otherwise this seems acceptable to me.

      Regarding Fig. 2B, we have edited the text on the y-axis. The previous wording included the term “retention,” which may have caused misunderstanding for both the readers and yourself, leading to the perception of contradiction. We have now revised this wording to ensure accurate comprehension.

      (5) The rebuttal is accepted. However, some of the lines mentioned still do not hold relevant information.

      This error has been corrected.

      The correct location for the added results is now in lines 255-258 and lines 279-282: “Hence, although EB does not affect the normal egg developmental stages (see description in next section), our results suggest that EB treatment promotes oogenesis and, as a result, the insects both produce more eggs in the ovary and lay a larger number of eggs.” and “However, considering that the number of eggs laid by EB-treated females was larger than in control females (Figure 1 and Figure 1-figure supplement 1), our data indicate that EB treatment of BPH can promote both oogenesis and oviposition.”

      (6) Thank you for the clarification. Although now discussed extensively in discussion section, the nuances of indirect effect and minimal change in expression should also be reflected in the result section text. This is to ensure that readers have clear idea about content of the paper.

      Corrected. To ensure readers gain a clear understanding of our data, we have briefly presented these discussions in the Results section. Please see lines 397-402: The levels of met mRNA slightly increased in EB-treated BPH at the 5M and 5L instar nymph and 1 to 5 DAE adult stages compared to controls (1.7-fold to 2.9-fold) (Figure 3G). However, it should be mentioned that JH action does not result in an increase of Met. Thus, it is possible that other factors (indirect effects) induced by EB treatment cause the increase in the mRNA expression level of Met.

      (7) As per the authors' interpretation, it becomes critical to quantitate the amount of EB present at the adult stages after a 4th instar exposure to it. Only this experiment will unambiguously prove the authors' claim. Also, since they have done adult insect exposure to EB, such experiments should be systematically performed for as many sections as possible. Don't just focus on the few instances where reviewers have pointed out the issue.

      Thank you for raising this critical point. To address this concern, we have conducted new supplementary experiments. The new experimental results demonstrate that residual levels of emamectin benzoate (EB) in adult-stage brown planthoppers (BPH) were below the instrument detection limit following treatment of 4th instar nymphs with EB. Lines 172-184: “To determine whether EB administered during the fourth-instar larval stage persists as residues in the adult stage, we used HPLC-MS/MS to quantify the amount of EB present at the adult stage after exposing 4th-instar nymphs to this compound. However, we found no detectable EB residues in the adult stage following fourth-instar nymphal treatment (Figure 1-figure supplement 1C). This suggests that the mechanism underlying the increased fecundity of female adults induced by EB treatment of nymphs may differ from that caused by direct EB treatment of female adults. Combined with our previous observation that EB treatment significantly increased the body weight of adult females (Figure 1-figure supplement 3E and F), a possible explanation for this phenomenon is that EB may enhance food intake in BPH, potentially leading to elevated production of insulin-like peptides and thus increased growth. Increased insulin signaling could potentially also stimulate juvenile hormone (JH) biosynthesis during the adult stage (Badisco et al., 2013).”

      (8) Thank you for the revision. Lines 725-735 as mentioned, does not carry the relevant information. However, since the authors have decided to remove this systematically from the manuscript, discussion on this may not be required.

      Thank you for identifying the limited relevance of the content in Lines 725–735 of the original manuscript. As recommended, we have removed this section in the revised version to improve logical coherence and maintain focus on the core findings.

      (9) Normally, dsRNA would last for some time in the insect system and would down-regulate any further induction of target genes by EB. I suggest the authors to measure the level of the target genes by qPCR in KD insects before and after EB treatment to clear the confusion and unambiguously demonstrate the results. Please Note- such quantifications should be done for all the KD+EB experiments. Additionally, citing few papers where such a rescue effect has been demonstrated in closely related insect will help in building confidence.

      We appreciate the reviewer’s suggestion to clarify the interaction between RNAi-mediated gene knockdown (KD) and EB treatment. To address this, we performed additional experiments measuring Kr-h1 expression via qPCR in dsKr-h1-injected insects before and after EB exposure.

      The results (now Figure 3–figure supplement 4) show that:

      (1) EB did not rescue Kr-h1 suppression at 24 h post-treatment (p > 0.05).

      (2) Partial recovery of fecundity occurred later (Figure 3M), likely due to:

      a) Degradation of dsRNA over time, reducing KD efficacy (Liu et al., 2010).

      b) Indirect effects of EB (e.g., hormonal/metabolic reprogramming) compensating for residual Kr-h1 suppression.

      Please see lines 441-453: “Next, we investigated whether EB treatment could rescue the dsRNA-mediated gene silencing effect. To address this, we selected the Kr-h1 gene and analyzed its expression levels after EB treatment. Our results showed that Kr-h1 expression was suppressed by ~70% at 72 h post-dsRNA injection. However, EB treatment did not significantly rescue Kr-h1 expression in gene knockdown insects (p > 0.05) at 24 h post-EB treatment (Figure 3-figure supplement 4). While dsRNA-mediated Kr-h1 suppression was robust initially, its efficacy may decline during prolonged experiments. This aligns with reports in BPH, where effects of RNAi gradually diminish beyond 7 days post-injection (Liu et al., 2010a). The late-phase fecundity increase might reflect partial Kr-h1 recovery due to RNAi degradation, allowing residual EB to weakly stimulate reproduction. In addition, the physiological impact of EB (e.g., neurotoxicity, hormonal modulation) could manifest via compensatory feedback loops or metabolic remodeling.”

      (10) Not a very convincing argument. Besides, without a scale bar, it is hard for the reviewers to judge the size of the organism. Whole body measurements of JH synthesis enzymes will remain quite a drawback for the paper.

      In response to your suggestion, we have also included images with scale bars (see next Figure 1). The images show that the head region is difficult to separate from the brown thoracic sclerite region. Furthermore, the anatomical position of the Corpora Allata in brown planthoppers has never been reported, making dissection uncertain and highly challenging. To address this, we are now attempting to use Drosophila as a model to investigate how EB regulates JH synthesis and reproduction.

      Author response image 1 (Figure 1). This illustration provides a visual representation of the brown planthopper (BPH), a major rice pest.

      (11) "The phenomenon reported was specific to BPH and not found in other insects. This limits the implications of the study". This argument still holds. Combined with extreme species specificity, the general effect that EB causes brings into question the molecular specificity that the authors claim about the mode of action.

      We acknowledge that the specificity of the phenomenon to BPH may limit its broader implications, but we would like to emphasize that this study provides important insights into the unique biological mechanisms in BPH, a pest of significant agricultural importance. The molecular specificity we described in the manuscript is based on rigorous experimental evidence. We believe that it contributes valuable knowledge for understanding the interaction between external factors such as EB and BPH, and the resurgence of pests. We hope that this study will inspire further research into the mechanisms underlying similar phenomena in other insects, thereby broadening our understanding of insect biology. Since EB also has an effect on fecundity in Drosophila, albeit opposite to that in BPH (Fig. 1 suppl. 2), it seems likely that EB actions may be of more general interest in insect reproduction.

      (12) The authors have added a few lines in the discussion but it does not change the overall design of the experiments. In this scenario, they should infer the performed experiments rationally without over interpretation. Currently, many of the claims that the authors are making are unsubstantiated. As a result of the first review process, the authors have acknowledged the discrepancies, but they have failed to alter their interpretations accordingly.

      We appreciate your concern regarding the experimental design and the need for rational inference without overinterpretation. In response, we would like to clarify that our discussion is based on the experimental data we have collected. We acknowledge that our study focuses on BPH and the specific effects of EB, and while we agree that broader generalizations require further research, we believe the new findings we present are valid and contribute to the understanding of this specific system.

      We also acknowledge the discrepancies you mentioned and have carefully considered your suggestions. In this revised version, we believe our interpretations are reasonable and consistent with the data, and we have adjusted our discussion to better reflect the scope of our findings. We hope that these revisions address your concerns. Thank you again for your constructive feedback.

      ADDITIONAL POINTS

      (1) Only one experiment was performed with Abamectin. No titration for the dosage were done for this compound, or at least not provided in the manuscript. Inclusion of this result will confuse readers. While removing this result does not impact the manuscript at all. My suggestion would be to remove this result.

      We acknowledge that the abamectin experiment lacks dose-titration details and that its standalone presentation could lead to confusion. However, we respectfully request to retain these results for the following reasons:

      Class-Specific Mechanism Validation:

      Abamectin and emamectin benzoate (EB) are both macrocyclic lactones targeting glutamate-gated chloride channels (GluCls). The observed similarity in their effects on BPH fecundity (e.g., Figure 1—figure supplement 1B) supports the hypothesis that GluCl modulation, rather than compound-specific off-target effects, drives the reproductive enhancement. This consistency strengthens the mechanistic argument central to our study.

      (2) The section "The impact of EB treatment on BPH reproductive fitness" is poorly described. This needs elaboration. A line or two should be included to describe why the parameters chosen to decide reproductive fitness were selected in the first place. I see that the definition of brachypterism has undergone a change from the first version of the manuscript. Can you provide an explanation for that? Also, there is no rationale behind inclusion of statements on insulin at this stage. The authors have not investigated insulin. Including that here will confuse readers. This can be added in the discussion though.

      Thank you for your suggestion. We have added an explanation regarding the primary consideration of evaluating reproductive fitness. In the interaction between sublethal doses of insecticides and pests, reproductive fitness is a key factor, as it accurately reflects the potential impact of insecticides on pest control in the field. Among the reproductive fitness parameters, factors such as female Nilaparvata lugens body weight, lifespan, and brachypterous ratio (as short-winged N. lugens exhibit higher oviposition rates than long-winged individuals) are critical determinants of reproductive success. Therefore, we comprehensively assessed the effects of EB on these parameters to elucidate the primary mechanism by which EB influences reproduction. We sincerely appreciate your constructive feedback.

      (3) "EB promotes ovarian maturation in BPH" this entire section needs to be rewritten and attention should be paid to the sequence of experiments described.

      Thank you for your suggestion. Based on your recommendation, we have rewritten this section (lines 267–275) and adjusted the sequence of experimental descriptions to improve the structural clarity of this part.

      (4) Figure 3N is outright wrong and should be removed or revised.

      In accordance with your recommendation, we have removed the figure.

      (5) When you are measuring hormonal titers, it is important to mention explicitly whether you are measuring hemolymph titer or whole body.

      We believe we have explicitly stated in the Methods section (line 1013) that we measured whole-body hormone titers. However, we have now also added this information to the figure legends.

      (6) "EB induces JH biosynthesis through the peptidergic AstA/AstAR signaling pathway": this section needs attention at multiple points. Please check.

      We acknowledge that direct evidence for EB-AstA/AstAR interaction is limited and have framed these findings as a hypothesis for future validation.

      References

      Liu, S., Ding, Z., Zhang, C., Yang, B., Liu, Z., 2010. Gene knockdown by intro-thoracic injection of double-stranded RNA in the brown planthopper, Nilaparvata lugens. Insect Biochem. Mol. Biol. 40, 666-671

    1. Author response:

      The following is the authors’ response to the current reviews

      Reviewer #1 (Public review):

      In this work, Rios-Jimenez and Zomer et al have developed a 'zero-code' accessible computational framework (BEHAV3D-Tumour Profiler) designed to facilitate unbiased analysis of intravital imaging (IVM) data to investigate tumour cell dynamics (via the tool's central 'heterogeneity module') and their interactions with the tumour microenvironment (via the 'large-scale phenotyping' and 'small-scale phenotyping' modules). A key strength is that it is designed as an open-source modular Jupyter Notebook with a user-friendly graphical user interface and can be implemented with Google Colab, facilitating efficient, cloud-based computational analysis at no cost. In addition, demo datasets are available on the authors' GitHub repository to aid user training and enhance the usability of the developed pipeline.

      To demonstrate the utility of BEHAV3D-TP, they apply the pipeline to timelapse IVM imaging datasets to investigate the in vivo migratory behaviour of fluorescently labelled DMG cells in tumour bearing mice. Using the tool's 'heterogeneity module' they were able to identify distinct single-cell behavioural patterns (based on multiple parameters such as directionality, speed, displacement, distance from tumour edge) which was used to group cells into distinct categories (e.g. retreating, invasive, static, erratic). They next applied the framework's 'large-scale phenotyping' and 'small-scale phenotyping' modules to investigate whether the tumour microenvironment (TME) may influence the distinct migratory behaviours identified. To achieve this, they combine TME visualisation in vivo during IVM (using fluorescent probes to label distinct TME components) or ex vivo after IVM (by large-scale imaging of harvested, immunostained tumours) to correlate different tumour behavioural patterns with the composition of the TME. They conclude that this tool has helped reveal links between TME composition (e.g. degree of vascularisation, presence of tumour-associated macrophages) and the invasiveness and directionality of tumour cells, which would have been challenging to identify when analysing single kinetic parameters in isolation.

      While the analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment, conclusions are appropriately tempered in the absence of additional experiments and controls.

      The authors also evaluated the BEHAV3D TP heterogeneity module using available IVM datasets of distinct breast cancer cell lines transplanted in vivo, as well as healthy mammary epithelial cells to test its usability in non-tumour contexts where the migratory phenotypes of cells may be more subtle. This generated data is consistent with that produced during the original studies, as well as providing some additional (albeit preliminary) insights above that previously reported. Collectively, this provides some confidence in BEHAV3D TP's ability to uncover complex, multi-parametric cellular behaviours that may be missed using traditional approaches.

      While the tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence and displacement) from intravital images, the authors have developed their tool to facilitate the integration of other data formats generated by open-source Fiji plugins (e.g. TrackMate, MTrackJ, ManualTracking) which will help ensure its accessibility to a broader range of researchers. Overall, this computational framework appears to represent a useful and comparatively user-friendly tool to analyse dynamic multi-parametric data to help identify patterns in cell migratory behaviours, and to assess whether these behaviours might be influenced by neighbouring cells and structures in their microenvironment.

      When combined with other methods, it therefore has the potential to be a valuable addition to a researcher's IVM analysis 'tool-box'.

      We thank the reviewer for carefully considering our manuscript and providing constructive comments. We appreciate the recognition of BEHAV3D-TP’s user-friendliness, modular design, and ability to link cell behavior with the tumor microenvironment. In the future, we plan to extend the tool to incorporate segmentation and tracking modules, once we have approaches that are broadly applicable or allow for personalized model training, further enhancing its utility for the community.

      Reviewer #2 (Public review):

      Summary:

      The authors produce a new tool, BEHAV3D to analyse tracking data and to integrate these analyses with large and small scale architectural features of the tissue. This is similar to several other published methods to analyse spatio-temporal data, however, the connection to tissue features is a nice addition, as is the lack of requirement for coding. The tool is then used to analyse tracking data of tumour cells in diffuse midline glioma. They suggest 7 clusters exist within these tracks and that they differ spatially. They ultimately suggest that these behaviours occur in distinct spatial areas as determined by CytoMAP.

      Strengths:

      - The tool appears relatively user-friendly and is open source. The combination with CytoMAP represents a nice option for researchers.

      - The identification of associations between cell track phenotype and spatial features is exciting and the diffuse midline glioma data nicely demonstrates how this could be used.

      We thank the reviewer for their careful reading and thoughtful comments. Feedback from all revision rounds has helped us clarify key points and improve the manuscript, and we are grateful for the positive remarks regarding our application to diffuse midline glioma and the potential of the tool to enable new biological insights.

      Reviewer #3 (Public review):

      The manuscript by Rios-Jimenez developed a software tool, BEHAV3D Tumor Profiler, to analyze 3D intravital imaging data and identify distinctive tumor cell migratory phenotypes based on the quantified 3D image data. Moreover, the heterogeneity module in this software tool can correlate the different cell migration phenotypes with variable features of the tumor microenvironment. Overall, this is a useful tool for intravital imaging data analysis and its open-source nature makes it accessible to all interested users.

      Strengths:

      An open-source software tool that can quantify cell migratory dynamics from intravital imaging data and identify distinctive migratory phenotypes that correlate with variable features of the tumor microenvironment.

      Weaknesses:

      Motility is the main tumor cell feature analyzed in the study together with some other tumor-intrinsic features, such as morphology. However, these features are insufficient to characterize and identify the heterogeneity of the tumor cell population that impacts their behaviors in the complex tumor microenvironment (TME). For instance, there are important non-tumor cell types in the TME, and the interaction dynamics of tumor cells with other cell types, e.g., fibroblasts and distinct immune cells, play a crucial role in regulating tumor behaviors. BEHAV3D-TP focuses on analysis of tumor-alone features, and cannot be applied to analyze important cell-cell interaction dynamics in 3D.

      We thank the reviewer for their careful assessment and encouraging remarks regarding BEHAV3D-TP.

      Regarding the concern about the tool’s current focus on motility features, we would like to clarify again that BEHAV3D-TP is designed to be highly flexible and extensible. Users can incorporate a wide range of features (including dynamic, morphological, and spatial parameters) into their analyses. In the latest revision, we have made this even more explicit by explaining that the feature selection interface allows users to either (i) select features directly for clustering or (ii) select features for correlation with clusters (see the Small-scale phenotyping module section in Methods).

      Importantly, while our current analysis emphasizes clustering based on dynamic behaviors, Figure 4 demonstrates that these behavioral clusters are associated at the single-cell level with distinct proximities to key TME components, such as TAMMs and blood vessels. These spatial interaction features could also have been included in the clustering itself—creating dynamic-spatial clusters—but we deliberately chose not to do so. This decision was guided by established principles of feature selection: including features with unknown or potentially irrelevant variability can introduce noise and obscure biologically meaningful patterns, ultimately reducing the clarity and interpretability of the resulting clusters. Instead, we adopted a two-step approach—first identifying clusters based on core dynamic features, then examining their relationships with spatial and interaction metrics. This allowed us to reveal meaningful associations of particular cell behavior such as the invading cluster in proximity of TAMMs without overfitting or complicating the clustering model.
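
      The two-step approach described above can be illustrated with a minimal sketch. This is not the BEHAV3D-TP implementation: the feature names (`speed`, `displacement`, `dist_to_tamm`), the synthetic cells, and the plain k-means routine are all assumptions chosen for illustration. Step 1 clusters cells on dynamic features only; step 2 summarises a held-out spatial feature per cluster.

```python
import random
from statistics import mean

def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=20):
    """Plain k-means; returns one cluster label per point."""
    # Farthest-point initialisation keeps this sketch deterministic.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return labels

# Synthetic, hypothetical cells: 20 slow cells far from TAMMs,
# then 20 fast cells close to TAMMs (arbitrary units).
rng = random.Random(0)
cells = []
for _ in range(20):
    cells.append({"speed": rng.gauss(1, 0.2),
                  "displacement": rng.gauss(2, 0.5),
                  "dist_to_tamm": rng.gauss(50, 5)})
for _ in range(20):
    cells.append({"speed": rng.gauss(10, 1),
                  "displacement": rng.gauss(40, 3),
                  "dist_to_tamm": rng.gauss(5, 1)})

# Step 1: cluster on dynamic features only (real data would need scaling).
dynamic = [(c["speed"], c["displacement"]) for c in cells]
labels = kmeans(dynamic, k=2)

# Step 2: relate the clusters to a spatial feature that was NOT a clustering input.
summary = {
    lab: mean(c["dist_to_tamm"] for c, l in zip(cells, labels) if l == lab)
    for lab in sorted(set(labels))
}
print(summary)
```

      Because `dist_to_tamm` is kept out of the clustering input, any difference between clusters in that feature is an observed association rather than something the clustering was built to produce. Adding it to `dynamic` would instead give the combined dynamic-spatial clusters mentioned above.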

      To address the reviewer’s point in the latest revision round, we have updated the Small-scale phenotyping module to highlight the possibility of including spatial interaction features with various TME cell types. We also revised the manuscript text and Figure 1 to clarify that these environmental features can be used both upstream as clustering input (Option 1) and for downstream analysis (Option 2), depending on the user’s experimental goals. Attached to this rebuttal letter, we also provide an additional figure illustrating these options in the feature selection panels of the Colab notebook.

      In summary, while the clustering presented in this study is based on dynamic parameters, BEHAV3D-TP fully supports the integration of interaction features and other non-motility descriptors. This modularity enables users to customize their analysis pipelines according to specific biological questions, including those involving cell–cell interactions and spatial dynamics within the TME.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      Intravital microscopy (IVM) is a powerful tool that facilitates live imaging of individual cells over time in vivo in their native 3D tissue environment. Extracting and analysing multi-parametric data from IVM images however is challenging, particularly for researchers with limited programming and image analysis skills. In this work, Rios-Jimenez and Zomer et al have developed a 'zero-code' accessible computational framework (BEHAV3D-Tumour Profiler) designed to facilitate unbiased analysis of IVM data to investigate tumour cell dynamics (via the tool's central 'heterogeneity module') and their interactions with the tumour microenvironment (via the 'large-scale phenotyping' and 'small-scale phenotyping' modules). It is designed as an open-source modular Jupyter Notebook with a user-friendly graphical user interface and can be implemented with Google Colab, facilitating efficient, cloud-based computational analysis at no cost. Demo datasets are also available on the authors' GitHub repository to aid user training and enhance the usability of the developed pipeline.

      To demonstrate the utility of BEHAV3D-TP, they apply the pipeline to timelapse IVM imaging datasets to investigate the in vivo migratory behaviour of fluorescently labelled DMG cells in tumour bearing mice. Using the tool's 'heterogeneity module' they were able to identify distinct single-cell behavioural patterns (based on multiple parameters such as directionality, speed, displacement, distance from tumour edge) which was used to group cells into distinct categories (e.g. retreating, invasive, static, erratic). They next applied the framework's 'large-scale phenotyping' and 'small-scale phenotyping' modules to investigate whether the tumour microenvironment (TME) may influence the distinct migratory behaviours identified. To achieve this, they combine TME visualisation in vivo during IVM (using fluorescent probes to label distinct TME components) or ex vivo after IVM (by large-scale imaging of harvested, immunostained tumours) to correlate different tumour behavioural patterns with the composition of the TME. They conclude that this tool has helped reveal links between TME composition (e.g. degree of vascularisation, presence of tumour-associated macrophages) and the invasiveness and directionality of tumour cells, which would have been challenging to identify when analysing single kinetic parameters in isolation. 

      The authors also evaluated the BEHAV3D TP heterogeneity module using available IVM datasets of distinct breast cancer cell lines transplanted in vivo, as well as healthy mammary epithelial cells to test its usability in non-tumour contexts where the migratory phenotypes of cells may be more subtle. This generated data is consistent with that produced during the original studies, as well as providing some additional (albeit preliminary) insights above that previously reported. Collectively, this provides some confidence in BEHAV3D TP's ability to uncover complex, multi-parametric cellular behaviours that may be missed using traditional approaches. 

      Overall, this computational framework appears to represent a useful and comparatively user-friendly tool to analyse dynamic multi-parametric data to help identify patterns in cell migratory behaviours, and to assess whether these behaviours might be influenced by neighbouring cells and structures in their microenvironment. When combined with other methods, it therefore has the potential to be a valuable addition to a researcher's IVM analysis 'tool-box'. 

      Strengths: 

      •  Figures are clearly presented, and the manuscript is easy to follow. 

      •  The pipeline appears to be intuitive and user-friendly for researchers with limited computational expertise. A detailed step-by-step video and demo datasets are also included to support its uptake. 

      •  The different computational modules have been tested using relevant datasets, including imaging data of normal and tumour cells in vivo. 

      •  All code is open source, and the pipeline can be implemented with Google Colab. 

      •  The tool combines multiple dynamic parameters extracted from timelapse IVM images to identify single-cell behavioural patterns and to cluster cells into distinct groups sharing similar behaviours, and provides avenues to map these onto in vivo or ex vivo imaging data of the tumour microenvironment 

      Weaknesses: 

      •  The tool does not facilitate the extraction of quantitative kinetic cellular parameters (e.g. speed, directionality, persistence and displacement) from intravital images. To use the tool researchers must first extract dynamic cellular parameters from their IVM datasets using other software including Imaris, which is expensive and therefore not available to all. Nonetheless, the authors have developed their tool to facilitate the integration of other data formats generated by open-source Fiji plugins (e.g. TrackMate, MTrackJ, ManualTracking) which will help ensure its accessibility to a broader range of researchers. 

      •  The analysis provides only preliminary evidence in support of the authors' conclusions on DMG cell migratory behaviours and their relationship with components of the tumour microenvironment. The authors acknowledge this, however, and conclusions are appropriately tempered in the absence of additional experiments and controls.

      We thank the reviewer for their thorough and constructive assessment of our work and are pleased that the accessibility, functionality, and potential impact of BEHAV3D-Tumour Profiler were well received. We particularly appreciate the acknowledgment of the tool’s ease of use for researchers with limited computational expertise, the clarity of the manuscript, and the relevance of our approach for identifying multi-parametric migratory behaviours and their correlation with the tumour microenvironment.

      Regarding the weaknesses raised:

      (1) Lack of built-in tracking and kinetic parameter extraction – As noted in our initial revision, while we agree that integrating open-source tracking and segmentation functionality could be valuable, it is beyond the scope of the current work. Our tool is designed to focus specifically on downstream analysis of already extracted kinetic data, addressing a gap in post-processing tools for exploring complex migratory behaviour and spatial correlations. Since different experimental systems often require tailored imaging and segmentation pipelines, we believe that decoupling tracking from the downstream analysis can actually be a strength, offering greater versatility. Researchers can use their preferred or most appropriate tracking software, whether proprietary or open-source, and then analyze the resulting data with BEHAV3D-TP. To support this, we ensured compatibility with widely used tools including open-source Fiji plugins (e.g., TrackMate, MTrackJ, ManualTracking), and we also cited several relevant studies that address the upstream processing steps. Importantly, the main aim of our tool is to fill the gap in post-tracking analysis, enabling quantitative interpretation and pattern recognition that has until now required substantial coding effort or custom solutions.

      (2) Preliminary nature of the biological conclusions – We fully agree with this assessment and have explicitly acknowledged this limitation in the manuscript. Our aim was to demonstrate the utility of BEHAV3D-TP in uncovering heterogeneity and spatial associations in vivo, while encouraging further hypothesis-driven studies using complementary biological approaches. We are grateful that the reviewer recognizes the cautious interpretation of our results and their added value beyond single-parameter analysis.

      Reviewer #2 (Public review): 

      Summary: 

      The authors produce a new tool, BEHAV3D, to analyse tracking data and to integrate these analyses with large- and small-scale architectural features of the tissue. This is similar to several other published methods to analyse spatio-temporal data; however, the connection to tissue features is a nice addition, as is the lack of requirement for coding. The tool is then used to analyse tracking data of tumour cells in diffuse midline glioma. They suggest 7 clusters exist within these tracks and that they differ spatially. They ultimately suggest that these behaviours occur in distinct spatial areas as determined by CytoMAP.

      Strengths: 

      - The tool appears relatively user-friendly and is open source. The combination with CytoMAP represents a nice option for researchers. 

      - The identification of associations between cell track phenotype and spatial features is exciting and the diffuse midline glioma data nicely demonstrates how this could be used. 

      Weaknesses: 

      The revision has dealt with many concerns, however, the statistics generated by the process are still flawed. While the statistics have been clarified within the legends and this is a great improvement in terms of clarity the underlying assumptions of the tests used are violated. The problem is that individual imaging positions or tracks are treated as independent and then analysed by ANOVA. As separate imaging positions within the same mouse are not independent, nor are individual cells within a single mouse, this makes the statistical analyses inappropriate. For a deeper analysis of this that is feasible within a review please see Lord, Samuel J., et al. "SuperPlots: Communicating reproducibility and variability in cell biology." The Journal of cell biology 219.6 (2020): e202001064. Ultimately, while this is a neat piece of software facilitating the analysis of complex data, the fact that it will produce flawed statistical analysis is a major problem. This problem is compounded by the fact that much imaging analysis has been analysed in this inappropriate manner in the past, leading to issues of interpretation and ultimately reproducibility. 

      We thank the reviewer for their careful reading and thoughtful feedback. We are encouraged by the recognition of BEHAV3D-TP’s ease of use, open-source accessibility, and the value of integrating cell behaviour with spatial features of the tissue. We appreciate the positive remarks regarding our application to diffuse midline glioma (DMG) and the potential for the tool to enable new biological insights.

      We also appreciate the reviewer’s continued concern regarding the statistical treatment of the data. While we agree with the broader principle that care must be taken to avoid violating assumptions of independence, we respectfully disagree that all instances where individual tracks or imaging positions are used constitute flawed analysis. Importantly, our work is centered on characterizing heterogeneity at the single-cell level in distinct TME regions. Therefore, in certain cases—especially when comparing distinct behavioral subtypes across varying TME environments and multiple mice—it is appropriate to treat individual imaging positions as independent units. This approach is particularly relevant given our findings that large-scale TME regions differ across positions. When analyzing features such as the percentage of DMG cells in proximity to TAMMs, averaging per mouse would obscure these regional differences and reduce the resolution of biologically meaningful variation.

      To address this concern further, we have revised the figure legends, main text, and documentation, carefully considering the appropriate statistical unit for each analysis. As detailed below, we used mouse-level aggregation where the experimental question required inter-mouse reproducibility, and a position-based approach where the aim was to explore intra-tumoral heterogeneity.

      Figure 3d and Supplementary Figure 5d: In this analysis, we treated imaging positions as independent units because our data specifically demonstrate that, within individual mice, different positions correspond to distinct large-scale tumor microenvironment phenotypes. Therefore, averaging across the whole mouse would obscure these important spatial differences and not accurately reflect the heterogeneity we aim to characterize.

      Figure 4c-e; Supplementary Figure 6d: While our initial aim was to highlight single-cell variability, we acknowledge that the original presentation may have been misleading. In the revised manuscript, we have updated the graphs for greater clarity. To quantify how often tumor cells of each behavioral type are located near TAMMs (Fig. 4c) or blood vessels (Fig. 4e), we now calculate, for each behavioral cluster within each imaging position, the percentage of tumor cells that are "close" to the environmental feature. This classification is based on the distance to the TME feature of interest and is detailed in the “Large-scale phenotyping” section of the Methods. The number of SR101 objects within a 30 µm radius was averaged per position.

      We treated individual imaging positions as the units of analysis rather than averaging per mouse, as our data (see Figure 2) show that positions vary in their TME phenotypes (such as Void, TAMM/Oligo, and TAMM/Vascularized) as well as in the number of TAMMs, SR101 cells or blood vessels per position. These differences are biologically meaningful and relevant to the quantification that we performed: the percentage of tumor cells in close proximity to distinct TME features.

      To account for inter-mouse and TME region variability, we applied a linear mixed-effects model with both mouse and TME class included as random effects.
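
The per-position quantification described above can be illustrated with a minimal Python sketch. It is not the authors' code: the record layout, the cluster names, and the 15 µm threshold (`CLOSE_UM`) are assumptions made purely for illustration; the actual threshold is defined in the Methods of the manuscript.

```python
from collections import defaultdict

# Hypothetical records: (mouse, position, cluster, distance to nearest TAMM in µm)
cells = [
    ("m1", "p1", "invading", 8.0),
    ("m1", "p1", "invading", 25.0),
    ("m1", "p1", "static", 40.0),
    ("m1", "p2", "invading", 5.0),
    ("m1", "p2", "static", 12.0),
    ("m1", "p2", "static", 30.0),
]

CLOSE_UM = 15.0  # assumed distance threshold, not the authors' value


def percent_close(records, threshold=CLOSE_UM):
    """Return {(mouse, position, cluster): % of cells within threshold}."""
    total = defaultdict(int)
    close = defaultdict(int)
    for mouse, position, cluster, dist in records:
        key = (mouse, position, cluster)
        total[key] += 1
        if dist <= threshold:
            close[key] += 1
    return {key: 100.0 * close[key] / total[key] for key in total}


result = percent_close(cells)
for key in sorted(result):
    print(key, f"{result[key]:.1f}%")
```

Each output row is one imaging position per behavioral cluster, which is the unit the response describes feeding into the mixed-effects model with mouse and TME class as random effects.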

      Supplementary Figure 3d: Following the reviewer’s suggestion, we have averaged the distance to the 3 closest GBM neighbours per mouse, treating each mouse as an independent unit for comparison across distinct GBM morphodynamic clusters. To account for inter-mouse variability when assessing statistical significance, we employed a linear mixed model with mouse included as a random effect. 

      Distance to 3 neighbours is a feature not used in the clustering, thus variability between mice can be more pronounced—for example, due to differences in tumor compactness or microenvironment structure across individual mice. To appropriately account for this, mouse was included as a random effect in the model.

      Supplementary Figure 4c: Following the reviewer’s suggestion, we averaged cell speed per mouse, treating each mouse as an independent unit for comparison across distinct DMG behavioral clusters. Statistical significance was assessed using ANOVA followed by Tukey’s post hoc test. When comparing cell speed, which is a feature used in the clustering process, inter-mouse variability was already addressed during clustering itself. Therefore, in the downstream analysis of this cluster-derived feature, it is appropriate to treat each mouse as an independent unit without including mouse as a random effect.

      Supplementary Figure 5e-g: Following the reviewer’s suggestion, we averaged cell speed per mouse, treating each mouse as an independent unit for comparison across distinct DMG behavioral clusters. Statistical significance was assessed using ANOVA followed by Tukey’s post hoc test.

      Supplementary Figure 6c: Following the reviewer’s suggestion, we averaged cell distance to the 10 closest DMG neighbours per mouse, treating each mouse as an independent unit for comparison across distinct DMG behavioral clusters. To account for inter-mouse variability, we used a linear mixed model with mouse included as a random effect.
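
As a rough illustration of the per-mouse aggregation described for the supplementary figures, the sketch below collapses per-cell values to one mean per mouse and cluster before computing a one-way ANOVA F statistic. The data, cluster names, and function names are hypothetical, and the Tukey post hoc step and the mixed-model variants mentioned above are not shown.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-cell speeds: (mouse, cluster, speed)
tracks = [
    ("m1", "invading", 12.0), ("m1", "invading", 14.0), ("m1", "static", 2.0),
    ("m2", "invading", 10.0), ("m2", "static", 3.0), ("m2", "static", 5.0),
    ("m3", "invading", 15.0), ("m3", "static", 4.0),
]


def per_mouse_means(records):
    """Collapse cells to one mean per (cluster, mouse)."""
    pooled = defaultdict(list)
    for mouse, cluster, speed in records:
        pooled[(cluster, mouse)].append(speed)
    by_cluster = defaultdict(list)
    for (cluster, _mouse), speeds in sorted(pooled.items()):
        by_cluster[cluster].append(mean(speeds))
    return dict(by_cluster)


def one_way_f(groups):
    """One-way ANOVA F statistic and degrees of freedom for group -> values."""
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


groups = per_mouse_means(tracks)
f_stat, df_between, df_within = one_way_f(groups)
print(groups, f_stat, df_between, df_within)
```

The point of the design is that the within-group degrees of freedom come from the number of mice, not the number of tracked cells, which is the concern raised with reference to Lord et al. (2020).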

      Reviewer #3 (Public review): 

      The manuscript by Rios-Jimenez developed a software tool, BEHAV3D Tumor Profiler, to analyze 3D intravital imaging data and identify distinctive tumor cell migratory phenotypes based on the quantified 3D image data. Moreover, the heterogeneity module in this software tool can correlate the different cell migration phenotypes with variable features of the tumor microenvironment. Overall, this is a useful tool for intravital imaging data analysis and its open-source nature makes it accessible to all interested users. 

      Strengths: 

      An open-source software tool that can quantify cell migratory dynamics from intravital imaging data and identify distinctive migratory phenotypes that correlate with variable features of the tumor microenvironment. 

      Weaknesses: 

      Motility is only one tumor cell feature and is probably not sufficient to characterize and identify the heterogeneity of the tumor cell population that impacts their behaviors in the complex tumor microenvironment (TME). For instance, there are important non-tumor cell types in the TME, and the interaction dynamics of tumor cells with other cell types, e.g., fibroblasts and distinct immune cells, play a crucial role in regulating tumor behaviors. BEHAV3D-TP focuses on only motility feature analysis, and cannot be applied to analyze other tumor cell dynamic features or cell-cell interaction dynamics.

      Regarding the concern about the tool’s current focus on motility features, we would like to clarify that BEHAV3D-TP is designed to be highly flexible and extensible. As described in our first revision, users can incorporate a wide range of features, including dynamic, morphological, and spatial parameters, into their analyses. In the current revision, we have made this even more explicit by explaining that the feature selection interface allows users to either (i) select features directly for clustering or (ii) select features for correlation with clusters (see the Small-scale phenotyping module section in the Methods and the Rebuttal Figure).

      Importantly, while our current analysis emphasizes clustering based on dynamic behaviors, Figure 4 demonstrates that these behavioral clusters are associated at the single-cell level with distinct proximities to key TME components, such as TAMMs and blood vessels. These spatial interaction features could also have been included in the clustering itself—creating dynamic-spatial clusters—but we deliberately chose not to do so. This decision was guided by established principles of feature selection: including features with unknown or potentially irrelevant variability can introduce noise and obscure biologically meaningful patterns, ultimately reducing the clarity and interpretability of the resulting clusters. Instead, we adopted a two-step approach—first identifying clusters based on core dynamic features, then examining their relationships with spatial and interaction metrics. This allowed us to reveal meaningful associations of particular cell behavior such as the invading cluster in proximity of TAMMs without overfitting or complicating the clustering model.
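
The two-step logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: plain k-means with fixed starting centroids stands in for whatever clustering BEHAV3D-TP actually performs, and the feature names and values are invented.

```python
from statistics import mean

# Hypothetical cells: dynamic features (speed, disp) plus a spatial feature
# (tamm_dist) that is deliberately held out of the clustering step.
cells = [
    {"speed": 1.0, "disp": 0.5, "tamm_dist": 40.0},
    {"speed": 1.2, "disp": 0.7, "tamm_dist": 35.0},
    {"speed": 0.8, "disp": 0.4, "tamm_dist": 45.0},
    {"speed": 9.0, "disp": 8.0, "tamm_dist": 10.0},
    {"speed": 9.5, "disp": 7.5, "tamm_dist": 12.0},
    {"speed": 8.7, "disp": 8.2, "tamm_dist": 8.0},
]
DYNAMIC = ("speed", "disp")


def kmeans(points, start, n_iter=10):
    """Plain k-means with fixed starting centroids (deterministic stand-in)."""
    centroids = list(start)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = tuple(mean(col) for col in zip(*members))
    return labels


# Step 1: cluster on dynamic features only.
points = [tuple(c[f] for f in DYNAMIC) for c in cells]
labels = kmeans(points, [points[0], points[3]])

# Step 2: summarise the held-out spatial feature per cluster.
summary = {
    k: mean(c["tamm_dist"] for c, lab in zip(cells, labels) if lab == k)
    for k in sorted(set(labels))
}
print(labels, summary)
```

Because `tamm_dist` never enters step 1, any difference between clusters in step 2 is an association found after the fact rather than something the clustering was built to produce.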

      To further address the reviewer’s point, we have updated the Small-scale phenotyping module to highlight the possibility of including spatial interaction features with various TME cell types. We also revised the manuscript text and Figure 1 to clarify that these environmental features can be used both upstream as clustering input (Option 1) and for downstream analysis (Option 2), depending on the user’s experimental goals. Author response image 1 illustrates these options in the feature selection panels of the Colab notebook.

      Author response image 1.

      (a) In the small-scale phenotyping module, microenvironmental factors (MEFs) detected in the segmented IVM movies are identified and their coordinates imported. From here, there are two options: (b) include the relationship to these MEFs as a feature for clustering, or (c) exclude this relationship and instead correlate MEFs with cell behavior to assess potential spatial associations.

      In summary, while the clustering presented in this study is based on dynamic parameters, BEHAV3D-TP fully supports the integration of interaction features and other non-motility descriptors. This modularity enables users to customize their analysis pipelines according to specific biological questions, including those involving cell–cell interactions and spatial dynamics within the TME.

      Reviewer #2 (Recommendations for the authors): 

      If the software were adjusted to produce analyses following best practices in the field as outlined in Lord, Samuel J., et al. "SuperPlots: Communicating reproducibility and variability in cell biology." The Journal of cell biology 219.6 (2020): e202001064. this could be a helpful piece of software. The major current issue would be that it democratises the ability to analyse complex imaging data, allowing non-experts to carry out these analyses but misleads them and encourages poor statistical practice. 

      We appreciate the reviewer’s suggestion and the reference to best practices outlined in Lord et al., 2020. As discussed in detail in our point-by-point response to Reviewer #2, we have revised several figures to enhance clarity and statistical rigor, including Figure 4c,e; Supplementary Figures 3d, 4c, 5e–g, and 6c–d. Specifically, we adjusted how data are summarized and displayed, averaging per mouse where appropriate and clarifying the statistical methods used. Where imaging positions were retained as the unit of analysis, this decision was grounded in the biological relevance of intra-mouse spatial heterogeneity (as demonstrated in Figure 2). Additionally, we applied linear mixed-effects models in cases where variability between mice or between large-scale TME regions needed to be accounted for. We believe these changes address the core concern about reproducibility and statistical interpretation while preserving the biological insights captured by our approach.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer 1:

      We thank Reviewer 1 for the discussion on the possible causes of ERPs and their relevance for the interpretation of changes in aperiodic activity. We have changed the relevant paragraph to read as follows: For example, ERPs may reflect changes in periodic activity, such as phase resets (Makeig et al., 2002), or baseline shifts (Nikulin et al., 2007). ERPs may also capture aperiodic activity, either in the form of evoked transients triggered by an event (Shah et al., 2004) or induced changes in the ongoing background signal. This has important implications: evoked transients can alter the broadband spectrum without implying shifts in ongoing background activity, whereas induced aperiodic changes may signal different neural mechanisms, such as shifts in the excitation-inhibition balance (Gao et al., 2017).

      Reviewer 1 argued that a time point-by-time point comparison between ERPs and aperiodic parameters may not be the most appropriate approach, since aperiodic time series have lower temporal resolution than ERPs. The reviewer suggested comparing their topographies instead. We had already done this in the first version of the paper (see Fig. S7: https://elifesciences.org/reviewedpreprints/101071v1#s10). However, in the second version, we opted to use linear mixed models for each channel-time point in order to maintain consistency with the other analyses in the paper (e.g. the comparison between FOOOF parameters and baseline-corrected power).

      Nevertheless, we repeated the topographic correlations as in the first version, and the results are shown below. Correlations were computed for each time point, subject and condition, and then averaged across these dimensions for visualisation. The pattern differs from that of the linear mixed-model results (see Fig. S14), with notable correlations appearing after ~0.5 s for the exponent and after ~1.0 s for the offset. Still, the correlations remain low, suggesting that aperiodic parameters and ERPs encode different information (at least in this dataset).

      Author response image 1.

      Additionally, to control for the effect of smearing we have performed the same linear mixed model analysis as in Fig. S14 on low-pass filtered ERPs (with cut-off 10 Hz), and the results were largely similar as in Fig. S14.
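
For readers unfamiliar with topographic correlation, a minimal Python sketch follows. It assumes Pearson correlation across channels at each time point for a single subject and condition; the array layout and the toy values are hypothetical, and the subsequent averaging across subjects and conditions is omitted.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def topographic_correlation(erp, aperiodic):
    """erp, aperiodic: [time][channel] lists; returns one r per time point."""
    return [pearson(e, a) for e, a in zip(erp, aperiodic)]


# Hypothetical data for one subject and condition: 2 time points x 4 channels
erp = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
exponent = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]

r = topographic_correlation(erp, exponent)
print(r)
```

Each value compares the spatial pattern of the two measures at one time point, so it does not depend on the two time series having matching temporal resolution.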

      Reviewer 1 discussed two possible explanations for the observed correlations between baseline-corrected power and FOOOF parameters (Figure 4): “The correlation between the exponent and low-frequency activity could be of either direction: low frequency power changes could reflect 1/f shifts, or exponent estimates might be biased by undetected delta/theta activity. I think that one other piece of evidence /…/ to intuitively highlight why the latter is more likely is the /…/ decrease at high ("transbeta") frequencies, which suggests a rotational shift /../.” We agree with the interpretation that low-frequency power changes in our data primarily reflect 1/f shifts. However, we are uncertain about the reviewer’s statement that the “latter” explanation (i.e., bias in exponent estimates due to delta/theta activity) is more likely. Given the context, we believe the reviewer may have intended to say the “former” explanation is more likely.

      We agree with the reviewers' observation that rhythmicity, as estimated using the pACF, can be independent of power (Myrov et al., 2024, Fig. 1). However, it seems that in real (non-simulated) datasets, the pACF and power spectral density (PSD) are often moderately correlated (e.g. Myrov et al., 2024, Fig. 5).

      Reviewer 1 asked whether we had examined aperiodic changes in the data before and after subtracting the response-locked ERPs. We did not carry out this extra analysis because, as the reviewer suggests, it would have been excessive – the current version of the paper already contains more than 60 figures. As mentioned in the manuscript, we acknowledge the possibility that response-locked ERPs contribute to the second aperiodic component. However, due to the weak correlation between reaction times and aperiodic activity, the presence of both components throughout the entire epoch (in at least the first and third datasets) and the distinct differences between the ERPs and the aperiodic activity in the different conditions (see Fig. 8 vs. Fig. S13), we cannot conclusively determine whether the second aperiodic component is directly related to motor responses. Finally, we agree with the reviewer that the distribution of the response-locked ERP more closely resembles the frontocentral (earlier) aperiodic component than the later post-response component. We have amended the relevant paragraph in the Discussion to include these observations. “While it is possible that response-related ERPs contributed to the second aperiodic component, several observations suggest otherwise: both aperiodic components were present throughout the entire epoch, differences between conditions diverged between ERPs and aperiodic activity (compare Figure 8 and Figure S16), and the associations with reaction times were weak. Moreover, the distribution of the response-locked ERP qualitatively resembled the earlier frontocentral aperiodic component more than the later post-response component. Taken together, these findings suggest that ERPs and aperiodic activity capture distinct aspects of neural processing, rather than reflecting the same underlying phenomenon.”

      We agree with Reviewer 1 that our introduction of aperiodic activity was abrupt, and that the term 'aperiodic exponent' required definition. We have now defined it as the spectral steepness in log–log space (i.e. the slope), and have added a brief explanatory sentence to the introduction.
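
To make the definition concrete, the sketch below estimates an exponent and offset from a synthetic spectrum by fitting a straight line in log-log space, assuming the simple model log10 P(f) = offset - exponent * log10 f. It is a plain least-squares regression for illustration only; spectral parameterisation tools fit the aperiodic component jointly with peaks, and the function name `fit_aperiodic` is hypothetical.

```python
import math


def fit_aperiodic(freqs, power):
    """Least-squares line in log10-log10 space.

    Model: log10(P) = offset - exponent * log10(f).
    Returns (offset, exponent).
    """
    x = [math.log10(f) for f in freqs]
    y = [math.log10(p) for p in power]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    offset = my - slope * mx
    return offset, -slope  # exponent is the negative of the log-log slope


# Synthetic peak-free spectrum with offset 2 and exponent 1.5
freqs = [float(f) for f in range(1, 41)]
power = [10 ** 2.0 / f ** 1.5 for f in freqs]

offset, exponent = fit_aperiodic(freqs, power)
print(offset, exponent)
```

A larger exponent corresponds to a steeper spectrum, that is, relatively more low-frequency and less high-frequency power.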

      Reviewer 1 noted that the phrase 'task-related changes in overall power' could be misinterpreted as referring to total (broadband) power, and recommended that we specify a frequency range. We agree, so we have replaced 'overall power' with 'spectral power within a defined frequency range'.

      We agree with Reviewer 1 that the way we worded things in the Discussion section regarding alpha activity and inhibitory processes was awkward and could easily be misread. We have rephrased the sentences and added a brief explanation to avoid implying a direct link between alpha attenuation and neural inhibition.

      Furthermore, based on the reviewer’s suggestion, we added a brief comment in the Discussion section (Theoretical and methodological implications) on theoretical perspectives regarding the interaction between age and aperiodic activity.

      Reviewer 1 suggested including condition as a fixed effect in order to examine whether the relationship between FOOOF parameters and baseline-corrected power is modulated by condition. Specifically, the reviewer proposed changing our model from

      baseline_corrected_power ~ 1 + fooof_parameter + (1|modality) + (1|nback) + (1|stimulus) + (1|subject)

      to

      baseline_corrected_power ~ 1 + fooof_parameter + modality*nback *stimulus + (1|subject)

      While we appreciate this suggestion, we believe that including design variables as fixed effects would confound the interpretation of (marginal) R² as a measure of the association between FOOOF parameters and baseline-corrected power. Our primary question in this analysis was about the fundamental relationship between these measures, not how experimental conditions moderate this relationship.
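
The argument can be made explicit with the usual variance decomposition. Assuming marginal R² is defined in the commonly used way (fixed-effect variance divided by total variance, as proposed by Nakagawa and Schielzeth; the response does not state which definition was applied), it reads:

```latex
R^2_{\text{marginal}} =
  \frac{\sigma^2_{f}}
       {\sigma^2_{f} + \sum_{l} \sigma^2_{l} + \sigma^2_{\varepsilon}}
```

Here \sigma^2_{f} is the variance of the fixed-effect predictions, \sigma^2_{l} are the random-effect variances, and \sigma^2_{\varepsilon} is the residual variance. In the original model only fooof_parameter contributes to \sigma^2_{f}; in the reviewer's model, modality, nback, stimulus and their interactions would contribute as well, so the resulting value would no longer isolate the association with the FOOOF parameter.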

      To address the reviewer's concern regarding condition-specific effects, we conducted separate analyses for each condition using a simpler model:

      baseline_corrected_power ~ 1 + fooof_parameter + (1|subject)

      The results (now included in the Supplement, Fig. S4–S6) show generally smaller effect sizes compared to our original random-effects model, with notable differences between conditions. The 2-back conditions, particularly the non-target trials, exhibited the weakest associations. Despite these differences, the overall patterns remained consistent with our original findings: exponent and offset exhibited positive associations at low frequencies (delta, theta) and negative associations at higher frequencies (beta, low gamma), while periodic activity correlated substantially with baseline-corrected power in the alpha, beta, and gamma ranges.

      However, this condition-specific approach has important limitations. With only 47 subjects per condition, the statistical power is insufficient for stable correlation estimates (Schönbrodt & Perugini, 2013; https://doi.org/10.1016/j.jrp.2013.05.009). This likely explains why the effects are smaller and less stable than in our original model, which uses the full dataset's power while appropriately accounting for condition-related variance through random effects. Since these additional analyses do not alter our primary conclusions, we have included them in the Supplement for completeness and made a minor change in the Discussion section.

      Reviewer 1 asked which channels the lines in Figure 9 are based on. As stated in the Methods section, “We fitted models in a mass univariate manner, that is for each channel, frequency (where applicable), and time point separately. /…/ For the purposes of visualisation, p-values were averaged across channels (for heatmaps or lines) or across time (for topographies).” Therefore, the lines and heatmaps apply to all channels.

      Reviewer 2:

      We would like to thank reviewer 2 for their detailed explanation of the expected behaviour of the specparam algorithm. We have added the following explanation to the Methods section:

      Importantly, as noted by the reviewer, this behaviour reflects an explicit design choice of the algorithm: to avoid overfitting ambiguous peaks at the edges of the spectrum, FOOOF excludes peaks that are too close to the boundaries. This exclusion is controlled by the _bw_std_edge parameter, which defines the distance that a peak must be from the edge in order to be retained (in units of standard deviation; set to 1.0 by default). Therefore, although the algorithm is functioning as intended, users should be careful when interpreting aperiodic parameters in datasets where low-frequency oscillatory activity might be expected.
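
The edge rule can be paraphrased in a few lines of Python. This is a simplified re-statement of the behaviour described above, not the library's own code; the function name `keep_peak` and the example values are hypothetical.

```python
def keep_peak(center_freq, peak_std, freq_range, bw_std_edge=1.0):
    """Return True if a candidate peak is far enough from both spectrum edges.

    A peak is retained only when its centre frequency lies more than
    bw_std_edge standard deviations (of the fitted Gaussian) from each
    boundary of the fitting range.
    """
    margin = peak_std * bw_std_edge
    low, high = freq_range
    return abs(center_freq - low) > margin and abs(center_freq - high) > margin


fit_range = (3.0, 40.0)  # assumed fitting range in Hz

# A broad 4 Hz (theta-range) peak sits within one standard deviation of the lower edge.
theta_kept = keep_peak(4.0, 2.0, fit_range)
# A 10 Hz (alpha-range) peak of the same width is well inside the range.
alpha_kept = keep_peak(10.0, 2.0, fit_range)
print(theta_kept, alpha_kept)
```

Under this rule, a dropped low-frequency peak leaves its power to be absorbed by the aperiodic fit, which is why caution is advised when low-frequency activity is expected near the lower bound of the fitting range.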

      In line with the reviewer’s suggestion we have added a version of specparam to the paper.

      We thank reviewer 2 for pointing out two studies that used a time-resolved approach to spectral parameterisation. We have updated the text accordingly:

      Although a similar approach has been used to track temporal dynamics in sleep and resting state (e.g., Wilson et al., 2022; Ameen et al., 2024), as well as in task-based contexts (e.g., Barrie et al., 1996; Preston et al., 2025), its specific application to working memory paradigms remains underexplored.

      Reviewer 3:

      Reviewer 3 notes that the revised manuscript feels less intriguing than the original version. While we understand this concern, we believe this difference arises from a misalignment in expectations regarding the scope and purpose of our study. We think the reviewer is interpreting our work as focusing on whether theta activity is elicited in a paradigm that reliably produces theta oscillations. In contrast, our study is framed around a working memory task in which, based on prior literature, we expected to observe theta activity but instead found an absence of theta spectral peaks in almost all participants. Note that the absence of theta is already noteworthy in itself, given that theta oscillations are believed to play a crucial role in working memory.

      Importantly, Van Engen et al. (2024) have recently reported similar findings:

      “While we did not observe load-dependent aperiodic changes over the frontal midline, we did reveal the possibility that previous frontal midline theta results that do not correct for aperiodic activity likely do not reflect theta oscillations. /…/ While our results do not invalidate previous research into extracranial theta oscillations in relation to WM, they challenge popular and widely held beliefs regarding the mechanistic role for theta oscillations to group or segregate channels of information”.

      From this perspective, we maintain that the following statements are still justified:

      “substantial portion of the changes often attributed to theta oscillations in working memory tasks may be influenced by shifts in the spectral slope of aperiodic activity”

      "Note that although no prominent oscillatory peak in the theta range was observed at the group level, and some of this activity could potentially fall within the delta range, similar low-frequency patterns have often been referred to as 'theta' in previous work, even in the absence of a clear spectral peak"

      These formulations are intended to emphasize existing interpretations of changes in low-frequency power as theta oscillations in related research.

      Next, Reviewer 3 pointed out that “spectral reflection (peak?) in spectral power plot does not imply that an event is repeating (i.e. oscillatory).” We agree with the reviewer that not every spectral peak implies a true oscillation. To address this, we complemented the power analyses with a measure of rhythmicity (phase autocorrelation function, pACF) after the first round of reviews, and the pACF results were largely similar to those for periodic activity. These results suggest that, in our case, periodic activity is indeed largely oscillatory.

      However, we do agree with the reviewer that the term “oscillatory” is not interchangeable with “periodic”. To address this, we reviewed the paper for all appearances of “oscillations”, “oscillatory” and related terms, and replaced them with “power”, “spectral” or “periodic activity” where appropriate (all changes are marked in red in the latest version of the manuscript).

      Examples of corrections:

      “Changes in aperiodic activity appear as low-frequency oscillations in baseline-corrected time-frequency plots” → low-frequency power

      “The periodic component includes only the parameterised oscillatory peak” → spectral peak

      “FOOOF decomposition may miss low-frequency oscillations near the edges of the spectrum” → low-frequency peaks

      We disagree with the reviewer’s assertion that the subtitle “Aperiodic parameters are largely independent of oscillatory activity” is misleading for a methods oriented paper. Namely, the full subtitle is “Rhythmicity analysis reveals aperiodic parameters are largely independent of oscillatory activity”. Since rhythmicity is a phase-based measure that requires repeating dynamics and is therefore indicative of oscillations, we believe this phrasing is technically accurate.

      Finally, we would like to emphasise our contribution once again. Our analyses of rhythmicity, spectrally parameterised power, and baseline-corrected power offer different perspectives on the data. Each of these analyses may lead to different interpretations, but performing all of them on the same data provides a more comprehensive insight into what is actually going on in the data.

      Our findings demonstrate that conclusions drawn from a single analytical approach may be incomplete or misleading. For example, as we discuss in the paper, many studies examine theta-gamma coupling in scalp EEG during n-back tasks without first establishing whether theta activity genuinely oscillates (e.g. Rajji et al., 2016). The absence of true theta oscillations would undermine the validity of such analyses. Our multifaceted approach provides researchers with a systematic framework for validating oscillatory assumptions before proceeding with more complex analyses.

    1. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review)

      Summary:

      This manuscript addresses the question of whether spontaneous activity contributes to the clustering of retinogeniculate synapses before eye opening. The authors re-analyze a previously published dataset to answer the question. The authors conclude that synaptic clustering is eye-specific and activity dependent during the first postnatal week. While there is useful information in this manuscript, I don't see how the data meaningfully supports the claims made about clustering.

      In adult retinogeniculate connections, functional specificity is supported by select pairings of retinal ganglion cells and thalamocortical cells forming dozens of synaptic connections in subcellular microcircuits called glomeruli. In this manuscript, the authors measure whether the frequency of nearby synapses is higher in the observed data than in a model where synapses are randomly distributed throughout the volume. Any real anatomical data will deviate from such a model. The interesting biological question is not whether a developmental state deviates from random. The interesting question is how much of the adult clustering occurs before eye opening. In trying to decode the analysis in this manuscript, I can't tell if the answer is 99% or 0.001%.

      We thank the reviewer for their helpful critique through both rounds of review. We have refocused the manuscript on paired eye-specific measurements of active zone addition and spatial relationships among active zones at each age. All effect sizes and power values for each comparison are now reported in Table S2. These measures allow readers to gauge biological significance more transparently.

      Strengths:

      The source dataset is high resolution data showing the colocalization of multiple synaptic proteins across development. Added to this data is labeling that distinguishes axons from the right eye from axons from the left eye. The first order analysis of this data showing changes in synapse density and in the occurrence of multi-active zone synapses is useful information about the development of an important model system.

      Weaknesses:

      I don't think the analysis of clustering within this dataset improves our understanding of how the system works. It is possible that the result is clear to the authors based on looking at the images. As a reader trying to interpret the analysis, I ran into the following problems:

      • It is not possible to estimate biologically meaningful effect sizes from the data provided. Spontaneous activity in the first postnatal week could be responsible for 99% or 0.001% of RGC synapse clustering.

      • The sample size is too small for the kinds of comparisons being made. The authors point out that many STORM studies use an n of 1 while the authors have n = 3 for each of their six experimental groups. However, the critical bit is what kinds of questions you are trying to answer with a given sample size. This study depends on determining whether the differences between groups are due to age, genotype, or individual variation. This study also makes multiple comparisons of many different noisy parameters that test the same or similar hypothesis. In this context, it is unlikely that n = 3 sufficiently controls for individual variation.

      We have revised the manuscript to focus on eye-specific differences, which are paired measurements collected at each age. We have measured effect sizes and performed power tests for all comparisons presented in the manuscript. These measurements are shown for every figure in a new supplemental table S2.

      • There is no clear biological interpretation of the core measure of the publication, the normalized clustering index. The normalized clustering index starts with counting the fraction of single active zone synapses within various distances to the edge of synapses. This frequency is compared to a randomization model in which the positions of synapses are randomized throughout a volume. The authors found that the biggest deviation between the observed and randomized proximity frequency using a distance threshold of 1.5 um. They consider the deviation from the random model to be a sign of clustering. However, two RGC synapses 1.5 um apart have a good chance of coming from the same RGC axon. At this scale, real observations will, therefore, always look more clustered than a model where synapses are randomly placed in a volume. If you randomly place synapses on an axon, they will be much closer together than if you randomly place synapses within a volume. The authors normalize their clustering measure by dividing by the frequency of clustering in the normalized model. That makes the measure of clustering an ambiguous mix of synapse clustering, axon morphology, and synaptic density.

      We have removed the “normalized clustering index”. “Clustered” inputs are now defined strictly as those that have a neighboring single active-zone (sAZ) synapse within 1.5 μm. For each type of input (sAZ and mAZ) we show 1) the ratio of clustered to isolated inputs for both eyes, and 2) the number of neighboring sAZs (Figure 4).
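As an illustration only, the clustered/isolated classification described in this response can be sketched as below. This is a minimal sketch and not the authors' analysis code: the function names, the use of centroid-to-centroid Euclidean distance (rather than edge-to-edge distance), and the toy coordinates are all assumptions. Only the 1.5 µm distance threshold discussed in this review comes from the text.

```python
import math

THRESHOLD_UM = 1.5  # distance cutoff discussed in the review (micrometres)


def count_saz_neighbors(targets, saz, threshold=THRESHOLD_UM, same_set=False):
    """Count, for each target input, the sAZ synapses within `threshold`.

    `targets` and `saz` are lists of (x, y, z) centroids in micrometres
    (hypothetical input format). Set `same_set=True` when `targets` is the
    sAZ list itself, so that an input is not counted as its own neighbor.
    """
    counts = []
    for i, p in enumerate(targets):
        n = 0
        for j, q in enumerate(saz):
            if same_set and i == j:
                continue  # skip self-comparison
            if math.dist(p, q) <= threshold:
                n += 1
        counts.append(n)
    return counts


def clustered_to_isolated_ratio(counts):
    """Ratio of inputs with at least one sAZ neighbor to inputs with none."""
    clustered = sum(1 for c in counts if c > 0)
    isolated = len(counts) - clustered
    return clustered / isolated if isolated else float("inf")


# Toy example: three sAZ synapses and two mAZ inputs (made-up coordinates).
saz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
maz = [(0.5, 0.0, 0.0), (20.0, 0.0, 0.0)]

saz_counts = count_saz_neighbors(saz, saz, same_set=True)
maz_counts = count_saz_neighbors(maz, saz)
```

Because the classification uses a fixed distance cutoff, the resulting ratio still depends on overall synapse density, which is the concern the reviewer raises about comparing ages and genotypes.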

      We agree with the reviewer that many synapses are likely made nearby along the same axon from an individual RGC. In this scenario, sAZ synapses that are nearby a neighboring mAZ input may be part of the same nascent bouton. And, sAZ synapses nearby other sAZ neighbors may ultimately mature into a mAZ input. At the same time, inputs from one RGC may form nearby other inputs from neighboring RGCs. We discuss these motifs and potential mechanisms of cell-autonomous and non-autonomous development (Lines 300-308).

      • Other measures are also very derived. For instance, one argument is based on determining that the cumulative distribution of the distance of dominant-eye multi-active zone synapses with nearby single-active zone synapses from dominant-eye multi-active zone synapses is statistically different from the cumulative distribution of the distance of dominant-eye multi-active zones without nearby single-active zone synapses from dominant-eye multi-active zones. Multiple permutations of this measure are compared.

      We have simplified the presentation to show all measured path lengths for every input. This allows the reader to see each of the inputs and their relative distances. We present these data for like-eye type interactions at P4 and P8 (Figures 5 and S5).   

      • There are major biological differences between groups that are difficult to control for. Between P2, P4, and P8, there are changes in cell morphology and synaptic density. There are also large differences in synapse density between wild type and KO mice. It is difficult to be confident that these differences are not responsible for the relatively subtle changes in clustering indices.

      • Many claims are based on complicated comparisons between groups rather than the predominating effects within the data. It is noted that: "In KO mice, dominant eye projections showed increased clustering around mAZ synapses compared to sAC synapses suggesting partial maintenance of synaptic clustering despite retinal wave defects". In contrast, I did not notice any discussion of the fact that the most striking trend in those measures is that the clustering index decreases from P2 to P8.

      Related to the points above, we have revised the manuscript to focus on eye-specific release site addition and spatial relationships. For clarity, we have removed the clustering index and instead present ratios of clustered and isolated inputs, the number of sAZ synapses near each input type, and distance between like-eye mAZ inputs (Figure 4).      

      • Statistics are improperly applied. In my first review I tried to push the authors to calculate confidence intervals for two reasons. First, I believed the reader should be able to answer questions such as whether 99% or 0.01% of RGC synaptic clustering occurred in the first postnatal week. Second, I wanted the authors to deal with the fact that n=3 is underpowered for many of the questions they were asking. While many confidence intervals can now be found leading up to a claim, it is difficult to find claims that are directly supported by the correct confidence interval. Many claims are still incorrectly based on which combinations of comparisons produced statistically significant differences and which combinations did not.

      We have substantially revised the manuscript to focus on within-group paired effects between eye-of-origin. We performed power tests for all statistical presentations and effect sizes and powers are presented for every figure in a new supplemental table S2. To simplify the manuscript and make it easier to read, we report confidence interval measurements in a separate supplemental table S3.

      Reviewer #2 (Public review):

      Summary:

      This study provides a valuable data set showing changes in the spatial organization of synaptic proteins at the retinogeniculate connection during a developmental period of active axonal and synaptic remodeling. The data collected by STORM microscopy is state-of-the-art in terms of the high-resolution view of the presynaptic components of a plastic synapse. The revision has addressed many, but not all, of the initial concerns about the authors' interpretation of their data. However, with the revisions, the manuscript has become very dense and difficult to follow.

      We greatly appreciate the reviewer’s thoughtful comments through two rounds of review. To improve the clarity of the manuscript, we have substantially revised the work to streamline the narrative, clearly define terminology, and simplify data presentations, allowing readers to more directly interpret results and their implications.

      Strengths:

      The data presented is of good quality and provides an unprecedented view at high resolution of the presynaptic components of the retinogeniculate synapse during active developmental remodeling. This approach offers an advance to the previous mouse EM studies of this synapse because the CTB label allows identification of the eye from which the presynaptic terminal arises.

      Weaknesses:

      From these data the authors conclude that eye-specific increase in mAZ synapse density occur over retinogeniculate refinement, that sAZ synapses cluster close to mAZ synapses over age, and that this process depends on spontaneous activity and proximity to eye-specific mAZ synapses. While the interpretation of this data set is much more grounded in this revised submission, some of the authors' conclusions/statements still lack convincing supporting evidence.

      This includes:

      (1) The conclusion that multi-active zone synapses are loci for synaptic clustering. This statement, or similar ones (e.g., line 407), suggests that mAZ synapses actively or through some indirect way influence the clustering of sAZ synapses. There is no evidence for this. Clustering of retinal synapses is in part due to the fact that retinal inputs synapse on the proximal dendrites. With increased synaptogenesis, there will be increased density of retinal terminals that are closely localized. And with development, perhaps sAZ synapses mature into mAZ synapses. This scenario could also explain a large part of this data set.

      We thank the reviewer for their comment. We have removed the ambiguous phrasing and clarified the manuscript to explicitly discuss alternative interpretations consistent with the results (Lines 300-308). This includes a discussion of sAZ synapse maturation into mAZ inputs (Lines 294-296).

      (2) The conclusion that, "clustering depends on spontaneous retinal activity" could be misleading to the reader given that the authors acknowledge that their data is most consistent with a failure of synaptogenesis in the mutant mice (in the rebuttal). Additionally clustering does occur in CTB+ projections around mAZ synapses.

      We have removed the highlighted phrase and revised the manuscript to focus on differences in release site addition between eye-of-origin. We clarified our discussion of activity-dependent changes to state that synapses fail to form in the mutant and synaptic clustering was reduced (Lines 324-330).

      (3) Line 403: "Since mAZ synapses are expected to have a higher release probability, they likely play an important role in driving plasticity mechanisms reliant on neurotransmission." What evidence do the authors have that mAZ are expected to have higher release probability?

      We thank the reviewer for their careful reading. Because they have several active zones, mAZ synapses are expected to have a higher number of release sites (N), which could be independent of release probability at any individual active zone (Pr). We have removed the reference to release probability. Instead, we maintain focus on active zone number.

      Reviewer #3 (Public review):

      This study is a follow-up to a recent study of synaptic development based on a powerful data set that combines anterograde labeling, immunofluorescence labeling of synaptic proteins, and STORM imaging (Cell Reports, 2023). Specifically, they use anti-Vglut2 label to determine the size of the presynaptic structure (which they describe as the vesicle pool size), anti-Bassoon to label active zones with the resolution to count them, and anti-Homer to identify postsynaptic densities. Their previous study compared the detailed synaptic structure across the development of synapses made with contra-projecting vs. ipsi-projecting RGCs and compared this developmental profile with a mouse model with reduced retinal waves. In this study, they produce a new detailed analysis on the same data set in which they classify synapses into "multi-active zone" vs. "single-active zone" synapses and assess the number and spacing of these synapses. The authors use measurements to make conclusions about the role of retinal waves in the generation of same-eye synaptic clusters, providing key insight into how neural activity drives synapse maturation.

      Strengths:

      This is a fantastic data set for describing the structural details of synapse development in a part of the brain undergoing activity-dependent synaptic rearrangements. The fact that they can differentiate eye of origin is what makes this data set unique over previous structural work. The addition of example images from EM data set provides confidence in their categorization scheme.

      Weaknesses:

      Though the descriptions of synaptic clusters are important and represent a significant advance, the authors' conclusions regarding the biological processes driving these clusters are not testable by such a small sample. This limitation is expected given the massive effort that goes into generating this data set. Of course the authors are free to speculate, but many of the conclusions of the paper are not statistically supported.

      We thank the reviewer for their helpful comments throughout the revision process. We have substantially modified the manuscript to reframe the work around release site addition during eye-specific competition. Power tests and effect size measurements are presented for every figure in a new supplemental table S2.

      Reviewer #2 (Recommendations for the authors):

      (1) Authors should discuss that it is not clear what the relationship is between sAZ and mAZ, and sAZ could turn into a mAZ. This is not unreasonable that the number of AZ/bouton increases with development given that in the adult rodent retinogeniculate bouton, there is an average of 27 active zones (Budisantoso et al, 2012).

      We thank the reviewer for their helpful suggestion. We have added a discussion of the relationship between sAZ and mAZ inputs and the point that sAZ synapses may mature into mAZ synapses (Lines 294-296). We now reference the work of Budisantoso et al., J. Neurosci. 2012.   

      (2) The authors should clarify how the statistics are calculated for the normalized clustering index (figure 3B, C). For ratios of values each with variance, the variance is summed when calculating SEM.

      For clarity, we have removed the normalized clustering index analysis. We have simplified the work to present a clear definition of clustered and unclustered inputs, where clustering is defined by the presence of a nearby neighboring synapse within 1.5 μm. We present the ratio of clustered and unclustered inputs for each input type and eye-of-origin. We also show the number of sAZ synapses nearby each clustered input (Figure 4).

      (3) The authors have significantly clarified the terminology that they use in the text. This is much appreciated. However, it would be helpful to the naïve reader if they could define their use of the word "synapse" as referring to individual active zones/release sites or to terminals/boutons. For example:

      Line 378: "Prior electron microscopy studies in the mouse found limited evidence of convergent synaptic clustering from neighboring RGCs at postnatal day 8 (10, 13), suggesting that the mAZ synapses seen in STORM images are single retinogeniculate terminals. The lack of synaptic convergence in prior EM reconstructions at P8 implies that early clustering around mAZ synapses may result from local output clustering within individual RGC arbors.":

      What do the authors mean by "convergent synaptic clustering": do they mean clustering of release sites from different RGC inputs? And what does "local output clustering" mean?

      We thank the reviewer for their suggestion to use clear terminology. We have revised the manuscript to define our use of the term “synapse” as a single active zone/release site (Lines 134-136). We refer to mAZ boutons in STORM data as “inputs”. We have revised the discussion of prior EM studies (Lines 130-132) and clarified all discussions of synaptic clustering throughout the work.

      (4) While the authors argue that the retina-specific β2-nAChR mice exhibit disrupted retinal waves and defects in eye specific segregation, the authors are studying issues of active zone density which may depend on mechanisms depending on the postsynaptic neuron. This should be acknowledged.

      We have updated the text to discuss the fact that postsynaptic mechanisms are also critical for the refinement of eye-specific synapses (Lines 332-340). We have added several additional references to the manuscript accordingly.

      Reviewer #3 (Recommendations for the authors):

      The authors have addressed many of my original concerns. The additional description of criteria for categorizing synapses, showing all the data points, gives the reader a stronger sense of where the numbers in the quantification come from. Replacing the "complex/simple" distinction with the "multi/single active zone" and the other clarifying text was effective. The addition of the EM data was also a very nice example to help interpret STORM images. It does appear there was no quantification on this EM data set and perhaps just a few example images were taken as "proof of principle". If, by chance, the authors have more EM images to make a data set of them that allows for some quantification, that would be great to add.

      We thank the reviewer for their helpful comments on the manuscript through both rounds of review. The EM data we collected were 2D images of a subset of physical sections at postnatal day 8. Most dAPEX2(+) profiles had a single active zone, but a definitive identification would require 3D imaging so that each terminal can be assessed in its entirety for release sites that might be missed in a single cross section. Similarly, multi-active zone boutons are positively identified in 2D images, but definitive measurements of AZ number would require 3D information. We analyzed our 2D EM images and present a plot of dAPEX2(+) profile size versus active zone number below. These measures are positively correlated (r = 0.74), with larger profiles containing more active zones.

      Author response image 1.

      Unfortunately, we are not currently equipped to perform volumetric EM imaging at our home institution and are concerned that analysis of 2D data may be inconclusive. For these reasons, we are opting to maintain a qualitative presentation of our current EM results and we look forward to collaborating with other experts to achieve volumetric EM reconstructions in the future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) Summary:

      The authors note that it is challenging to perform diffusion MRI tractography consistently in both humans and macaques, particularly when deep subcortical structures are involved. The scientific advance described in this paper is effectively an update to the tracts that the XTRACT software supports. The claims of robustness are based on a very small selection of subjects from a very atypical dMRI acquisition (n=50 from HCP-Adult) and an even smaller selection of subjects from a more typical study (n=10 from ON-Harmony).

      Strengths:

      The changes to XTRACT are soundly motivated in theory (based on anatomical tracer studies) and practice (changes in seeding/masking for tractography), and I think the value added by these changes to XTRACT should be shared with the field. While other bundle segmentation software typically includes these types of changes in release notes, I think papers are more appropriate.

      We would like to thank the reviewer for their assessment and we appreciate the comments for improving our manuscript. We have added new results, sampling from a larger cohort with a typical dMRI protocol (N=50 from UK Biobank), as well as showcasing examples from individual subject reconstructions (Supplementary figures S6, S7). We also demonstrate comparisons against another approach that has been proposed for extracting parts of the cortico-striatal bundle in a bundle segmentation fashion, as the reviewer suggests (see comment and Author response image 1 below). 

      We would also like to take the opportunity to summarise the novelty of our contributions, as detailed in the Introduction, which we believe extend beyond a mere software update; this is a byproduct of this work rather than the aim.

      i) We devise for the first time standard-space protocols for 21 challenging cortico-subcortical bundles for both human and macaque and we interrogate them in a comprehensive manner.

      ii) We demonstrate robustness of these protocols using criteria grounded on neuroanatomy, showing that tractography reconstructions follow topographical principles known from tracers both in WM and GM and for both species. We also show that these protocols capture individual variability as assessed by respecting family structure in data from the HCP twins.

      iii) We use high-resolution dMRI data (HCP and post-mortem macaque) to showcase feasibility of these reconstructions, and we show that reconstructions are also plausible with more conventional data, such as the ones from the UK Biobank.

      iv) We further showcase robustness and the value of cross-species mapping by using these tractography reconstructions to predict known homologous grey matter (GM) regions across the two species, both in cortex and subcortex, on the basis of similarity of grey matter areal connection patterns to the set of proposed white matter bundles.

      Weaknesses

      (2) The demonstration of the new tracts does not include a large number of carefully selected scans and is only compared to the prior methods in XTRACT. The small n and limited statistical comparisons are insufficient to claim that they are better than an alternative. Qualitatively, this method looks sound.

      We appreciate the suggestion for larger sample size, so we performed the same analysis using 50 randomly drawn UK Biobank subjects, instead of ON-Harmony, matching the N=50 randomly drawn HCP subjects (detailed explanation in the comment below, Main text Figure 4A; Supplementary Figures S4). We also generated results using the full set of N=339 HCP unrelated subjects (Supplementary Figure S5 compares 10, 50 and 339 unrelated HCP subjects). We provide further details in the relevant point (3) below. 

      With regard to comparisons to other methods, there are few analogous approaches that we can compare against. To our knowledge there are no previous cross-species, standard-space tractography protocols for the tracts we considered in this study (including Muratoff, amygdalofugal, different parts of extreme and external capsules, along with their neighbouring tracts). We therefore i) directly compared against independent neuroanatomical knowledge and patterns (Figures 2, 3, 5), ii) confirmed that patterns against data quality and individual variability that the new tracts demonstrate are similar to patterns observed for the more established cortical tracts (Figure 4), iii) indirectly assessed efficacy by performing a demanding task, such as homologue identification on the basis of the tracts we reconstruct (Figures 6, 7).

      We need to point out that our approach is not “bundle segmentation”, in the sense of “data-driven” approaches that cluster streamlines into bundles following full-brain tractography. The latter is different in spirit and assigns a label to each generated streamline; as full-brain tractography is challenging (Maier-Hein, Nature Comms 2017), we follow instead the approach of imposing anatomical constraints to mitigate some of these challenges, as suggested in (Maier-Hein, 2017).

      Nevertheless, we used TractSeg (one of the few alternatives that considers corticostriatal bundles) to perform some comparisons. The Author response image below shows average path distributions across 10 HCP subjects for a few bundles that we also reconstruct in our paper (no temporal part of striatal bundle is generated by TractSeg). We can observe that the output for each tract is highly overlapping across subjects, indicating that there is not much individual variability captured. We also see the reduced specificity in the connectivity end-points of the bundles.

      Author response image 1.

      Comparison between 10-subject average for example subcortical tracts using TractSeg and XTRACT. We chose example bundles shared between our set and TractSeg. Per subject TractSeg produces a binary mask rather than a path distribution per tract. Furthermore, the mask is highly overlapping across subjects. Where direct correspondence was not possible, we found the closest matching tract. Specifically, we used ST_PREF for StBf, and merged ST_PREC with ST_POSTC to match StBm. There was no correspondence for the temporal part of StB.

      We subsequently performed the twinness test using both TractSeg and XTRACT (Author response image 2), as a way to assess whether aspects of individual variability can be captured. Due to heritability of brain organisation features, we anticipate that monozygotic twins have more similar tract reconstructions compared to dizygotic twins and subsequently non-twin siblings. This pattern is reproduced using our proposed approach, but not using TractSeg, which provides a rather flat pattern.

      Author response image 2.

      Violin plots of the mean pairwise Pearson’s correlations across tracts between 72 monozygotic (MZ) twin pairs, 72 dizygotic (DZ) twin pairs, 72 non-twin sibling pairs, and 72 unrelated subject pairs from the Human Connectome Project, using TractSeg (left) and XTRACT (right). About 12 cortico-subcortical tracts were considered, as closely matched as possible between the two approaches. For TractSeg we considered: 'CA', 'FX', 'ST_FO', 'ST_M1S1' (merged ‘ST_PREC’ and ‘ST_POSTC’ to approximate the sensorimotor part of our striatal bundle), 'ST_OCC', 'ST_PAR', 'ST_PREF', 'ST_PREM', 'T_M1S1' (merged ‘T_PREC’ and ‘T_POSTC’ to approximate the sensorimotor part of our striatal bundle), 'T_PREF', 'T_PREM', 'UF'. For XTRACT we considered: 'ac', 'fx', 'StB<sub>f</sub>', 'StB<sub>m</sub>', 'StB<sub>p</sub>', 'StB<sub>t</sub>', 'EmC<sub>f</sub>', 'EmC<sub>p</sub>', 'EmC<sub>t</sub>', 'MB', 'amf', 'uf'. The mean (μ) and standard deviation (σ) are shown for each group. There were no significant differences between groups using TractSeg.
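For readers unfamiliar with this kind of comparison, the per-pair similarity summarised in the violin plots can be sketched roughly as follows. This is a hedged illustration, not the authors' pipeline: the data layout (one flattened map per tract per subject), the function names, and the toy values are assumptions, and a real analysis would operate on full 3D path distributions in standard space.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences.

    Assumes neither sequence is constant (zero variance would divide by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def pair_similarity(subject_a, subject_b):
    """Mean Pearson correlation across the tracts shared by two subjects.

    Each subject is a dict mapping a tract name to a flattened map
    (hypothetical format chosen for this sketch).
    """
    tracts = sorted(set(subject_a) & set(subject_b))
    return sum(pearson(subject_a[t], subject_b[t]) for t in tracts) / len(tracts)


# Toy example with two tracts and four voxels each (made-up values).
twin_1 = {"fx": [0.0, 1.0, 2.0, 3.0], "uf": [1.0, 0.0, 1.0, 0.0]}
twin_2 = {"fx": [0.0, 2.0, 4.0, 6.0], "uf": [0.0, 1.0, 0.0, 1.0]}

similarity = pair_similarity(twin_1, twin_2)
```

Computing this value for every MZ, DZ, sibling, and unrelated pair gives one distribution per group. If individual variability is captured, similarity is expected to decrease from MZ pairs to unrelated pairs.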

      Taken together, these results indicate as a minimum that the two approaches have potentially different aims. Their different behaviour can be desirable and beneficial for different applications (for instance WM ROI segmentation vs connectivity analysis), but it makes like-for-like comparisons challenging.

      (3) “Subject selection at each stage is unclear in this manuscript. On page 5 the data are described as "Using dMRI data from the macaque (𝑁 = 6) and human brain (𝑁 = 50)". Were the 50 HCP subjects selected to cover a range of noise levels or subject head motion? Figure 4 describes 72 pairs for each of monozygotic, dizygotic, non-twin siblings, and unrelated pairs - are these treated separately? Similarly, NH had 10 subjects, but each was scanned 5 times. How was this represented in the sample construction?”

      We appreciate the suggestions and we agree that some of the choices in terms of group sizes may have been confusing. The short answer is that we did not perform any subject selection; subjects were randomly drawn from those we had available. The 72 twin pairs are simply the maximum number of monozygotic twin pairs available in the HCP cohort, so we used 72 pairs in all categories to match this number in these specific tests. The N=6 animals are good quality post-mortem dMRI data that have been acquired in the past and that we cannot easily expand. For the rest of the points, we have now made the following changes:

      We have replaced our comparison to the ON-Harmony dataset (10 subjects) with a comparison to 50 unrelated UK Biobank subjects (to match the 50 unrelated HCP subject cohort used throughout). Updated results can be seen in Figure 4A and Supplementary Figure S4. This allows a comparison of tractography reconstruction between high quality and more conventional quality data for the same N.

      We looked at QC metrics to ensure our chosen cohorts were representative of the full cohorts we had available. The N=50 unrelated HCP cohort and N=50 unrelated UKBiobank cohorts we used in the study captured well the range of the full 339 unrelated HCP cohort and N=7192 UKBiobank cohort in terms of absolute/relative motion (Author response image 3A and 3B respectively). A similar pattern was observed in terms of SNR and CNR ranges (Author response image 4).

      We generated tractography reconstructions for single subjects, corresponding to the 10th percentile (P<sub>10</sub>), median (P<sub>50</sub>) and 90th percentile (P<sub>90</sub>) of the distributions with respect to similarity to the cohort average maps. These are now shown in Supplementary Figures S6, S7. We also checked the QC metrics for these single subjects and confirmed that average absolute subject motion was highest for the P<sub>10</sub>, followed by the P<sub>50</sub> and lowest for the P<sub>90</sub> subject, capturing a range of within cohort data quality.

      We generated reconstructions for an even larger HCP cohort (all 339 unrelated HCP subjects) and these look very similar to the N=50 reconstructions (Supplementary Figure S5).

      Author response image 3.

      Subsets chosen from the HCP and UKB reflect similar range of average motion (relative and absolute) to the corresponding full cohorts. (A) Absolute and relative motion comparison between N=50 and N=339 unrelated HCP subjects. (B) Absolute and relative motion comparison between N=50 and N=7192 super-healthy UKB subjects.  

      Author response image 4.

      Average SNR and CNR values show similar range between the N=50 UKB subset and the full UK Biobank cohort of N=7192.

      (4) In the paper, the authors state "the mean agreement between HCP and NH reconstructions was lower for the new tracts, compared to the original protocols (𝑝 < 10^−10). This was due to occasionally reconstructing a sparser path distribution, i.e., slightly higher false negative rate," - how can we know this is a false negative rate without knowing the ground truth?

      We are sorry for the confusing terminology, which we have now corrected. Indeed, we cannot call it a false negative; what we meant is that reconstructions from lower resolution data for these bundles ended up being in general sparser than the ones from the high-resolution data, potentially missing parts of the tract. We have now revised the text accordingly.

      Reviewer #2 Public Review:

      (5) Summary:

      In this article, Assimopoulos et al. expand the FSL-XTRACT software to include new protocols for identifying cortical-subcortical tracts with diffusion MRI, with a focus on tracts connecting to the amygdala and striatum. They show that the amygdalofugal pathway and divisions of the striatal bundle/external capsule can be successfully reconstructed in both macaques and humans while preserving large-scale topographic features previously defined in tract tracing studies. The authors set out to create an automated subcortical tractography protocol, and they accomplished this for a subset of specific subcortical connections for users of the FSL ecosystem.

      Strengths:

      A main strength of the current study is the translation of established anatomical knowledge to a tractography protocol for delineating cortical-subcortical tracts that are difficult to reconstruct. Diffusion MRI-based tractography is highly prone to false positives; thus, constraining tractography outputs by known anatomical priors is important. Key additional strengths include 1) the creation of a protocol that can be applied to both macaque and human data; 2) demonstration that the protocol can be applied to both high quality data (3 shells, > 250 directions, 1.25 mm isotropic, 55 minutes) and lower quality data (2 shells, 100 directions, 2 mm isotropic, 6.5 minutes); and 3) validation that the anatomy of cortical-subcortical tracts derived from the new method is more similar in monozygotic twins than in siblings and unrelated individuals.

      We thank the Reviewer for the globally positive evaluation of this work and the pertinent comments that have helped us to improve the paper.

      Weaknesses

      (6) Although this work validates the general organizational location and topographic organization of tractography-derived cortical-subcortical tracts against prior tract tracing studies (a clear strength), the validation is purely visual and thus only qualitative. Furthermore, it is difficult to assess how the current XTRACT method may compare to currently available tractography approaches to delineating similar cortical-subcortical connections. Finally, it appears that the cortical-subcortical tractography protocols developed here can only be used via FSL-XTRACT (yet not with other dMRI software), somewhat limiting the overall accessibility of the method.

      We agree that a more quantitative comparison against gold standard tracing data would be ideal. However, there are practical challenges that prohibit such a comparison at this stage: i) Access to data. There are no quantifiable, openly shared, large scale/whole brain tracing data available. The Markov study provided the only openly available weighted connectivity matrices measured by tracers in macaques (Markov, Cereb Cortex 2014), which are only cortico-cortical and do not provide the white matter routes; they only quantify the relative contrast in connection terminals. ii) 2D microscopy vs 3D tractography. The vast majority of tracing data one can find in neuroanatomy labs is on 2D microscopy slices with restricted field of view, which is also the case for the data we had access to for this study. This significantly complicates like-to-like comparisons against 3D whole-brain tractography reconstructions. iii) Quantifiability is tricky even in the case of gold standard axonal tracing, as it depends on nuisance factors, e.g. injection site, injection size, injection uniformity and coverage, which confound the gold-standard measurements, but are not relevant for tractography. For these reasons, a number of high-profile NIH BRAIN CONNECTS Centres (for instance https://connects.mgh.harvard.edu/, https://mesoscaleconnectivity.org/) are resourced to address these challenges at scale in the coming years and provide the tools to the community to perform such quantitative comparisons in the future.

      In terms of comparison with other approaches, we have performed new tests and detail a response to a similar comment (2) from Reviewer 1.

      Finally, our protocols have been FSL-tested, but have nothing that is FSL specific. We cannot speak of performance when used with other tools, but there is nothing that prohibits translation of these standard space protocols to other tools. In fact, the whole idea behind XTRACT was to generate an approach open to external contributions for bundle-specific delineation protocols, both for humans and for non-human species. A number of XTRACT extensions have been published over the last 5 years for other NHP species (Roumazeilles et al. (2020); Bryant et al. (2020); Wang et al. (2025)) and similar approaches have been used in commercial packages (Boshkovski et al, 2106, ISMRM 2022).

      Recommendations To the Authors:

      (7) Superiority of the FSL-XTRACT approach to delineating cortical-subcortical tracts. The Introduction of the article describes how "Tractography protocols for white matter bundles that reach deeper subcortical regions, for instance the striatum or the amygdala, are more difficult to standardize" due to the size, proximity, complexity, and bottlenecks associated with cortical-subcortical tracts. It would be helpful for the authors to better describe how the analytic approach adopted here overcomes these various challenges. What does the present approach do differently than prior efforts to examine cortical-subcortical connectivity?

      There have not been many prior efforts to standardise cortico-subcortical connectivity reconstructions, as we overview in the Introduction. As outlined in (Schilling et al. (2020), https://doi.org/10.1007/s00429-020-02129-z), tractography reconstructions can be highly accurate if we guide them using constraints that dictate where pathways are supposed to go and where they should not go. This is the philosophy behind XTRACT and all the proposed protocols, which provide neuroanatomical constraints across different bundles. At the same time these constraints are relatively coarse so that they are species-generalisable. We have clarified that in Discussion. The approach we took was to first identify anatomical constraints from neuroanatomy literature for each tract of interest independently, derive and test these protocols in the macaque, and then optimise in an iterative fashion until the protocols generalise well to humans and until, when considering groups of bundles, the generated reconstructions can follow topographical principles known from tract tracing literature. This process took years in order to perform these iterations as meticulously as we could. We have modified the first sections in Methods to reflect this better (3rd paragraph of 1st Methods section), as well as modified the third and second to last paragraphs of the Introduction (“We propose an approach that addresses these challenges…”).

      (8) Relatedly, it is difficult to fully evaluate the utility of the current approach to dissecting cortical-subcortical tracts without a qualitative or quantitative comparison to approaches that already exist in the field. Can the authors show that (or clarify how) the FSL-XTRACT approach is similar to - or superior to - currently available methods for defining cortical-striatal and amygdalofugal tracts (e.g., methods they cite in the Introduction)?

      From the limited similar approaches that exist, we did perform some comparisons against TractSeg, please see Reply to Comment 2 from Reviewer 1. We have also expanded the relevant text in the introduction to clarify the differences:

      “…However, these either utilise labour-intensive single-subject protocols (22,26), are not designed to be generalisable across species (42, 43), or are based mostly on geometrically-driven parcellations that do not necessarily preserve topographical principles of connections (40). We propose an approach that addresses these challenges and is automated, standardised, generalisable across two species and includes a larger set of cortico-subcortical bundles than considered before, yielding tractography reconstructions that are driven by neuroanatomical constraints.”

      (9) Future applications of the tractography protocol:

      It would be helpful for the authors to describe the contexts in which the automated tractography approach developed here can (and cannot) be applied in future studies. Are future applications limited to diffusion data that has been processed with FSL's BEDPOSTX and PROBTRACKX? Can FSL-XTRACT take in diffusion data modelled in other software (e.g., with CSD in mrtrix or with GQI in DSI Studio)? Can the seed/stop/target/exclusion ROIs be applied to whole-brain tractography generated in other software? Integration with other software suites would increase the accessibility of the new tract dissection protocols.

      We have added some text in the Discussion to clarify this point. Our protocols have been FSL-tested, but have nothing that is FSL specific. We cannot speak of performance of other tools, but there is nothing that prohibits translation of these standard space protocols to other tools. As described before, the protocols are recipes with anatomical constraints including regions the corresponding white matter pathways connect to and regions they do not, constructed with cross-species generalisability in mind. In fact, a number of other packages (even commercial) have adopted the XTRACT protocols with success in the past, so we do not see anything in principle that prohibits these new protocols from being similarly adopted.

      We cannot comment on the protocols’ relevance for segmenting whole-brain tractograms, as these can induce more false positives than tractography reconstructions from smaller seed regions and may require stricter exclusions.

      (10) It was great to see confirmation that the XTRACT approach can be successfully applied in both high-quality diffusion data from the HCP and in the ON-Harmony data. Given the somewhat degraded performance in the lower quality dataset (e.g., Figure 4A), can the authors speak to the minimum data requirements needed to dissect these new cortical-subcortical tracts? Will the approach work on single-shell, low b data? Is there a minimum voxel resolution needed? Which tracts are expected to perform best and worst in lower-quality data?

      Thank you for these comments. Although we have not really tried lower (spatial and angular) resolution data, given the proximity of the tracts considered, as well as the small size of some bundles, we would not recommend lower resolution than those of the UK Biobank protocol. In general, we would consider the UK Biobank protocol (2mm, 2 shells) as the minimum and any modern clinical scanner can achieve this in 6-8 minutes. We hence evaluated performance from high quality HCP to lower quality UK Biobank data, covering a considerable range (scan time from 55 minutes down to 6 minutes).

      In terms of which tract reconstructions were more reproducible for UKBiobank data, the tracts with lowest correlations across subjects (Figure 4) were the anterior commissure (AC) and the temporal part of the Extreme Capsule (EmC<sub>t</sub>), while the highest correlations were for the Muratoff Bundle (MB) and the temporal part of the Striatal Bundle (StB<sub>t</sub>). Interestingly, for the HCP data, the temporal part of the Extreme Capsule (EmC<sub>t</sub>) and the Muratoff Bundle were also the tracts with the lowest/highest correlations, respectively. Hence, certain tract reconstructions were consistently more variable than others across subjects, which may hint to also being more challenging to reconstruct. We have now clarified these aspects in the corresponding Results section. 

      (11) Anatomical validation of the new cortical-subcortical tracts

      I really appreciated the use of prior tract tracing findings to anatomically validate the cortical-subcortical tractography outputs for both the cortical-striatal and amygdalofugal tracts. It struck me, however, that the anatomical validation was purely qualitative, focused on the relative positioning or the topographical organization of major connections. The anatomical validation would be strengthened if profiles of connectivity between cortical regions and specific subcortical nuclei or subcortical subdivisions could be quantitatively compared, if at all possible. Can the differential connectivity shown visually for the putamen in Figure 3 be quantified for the tract tracing data and the tractography outputs? Does the amygdalofugal bundle show differential/preferential connectivity across amygdala nuclei in tract tracing data, and is this seen in tractography?

      We appreciate the comment; please see Reply to your comment 6 above. In addition to the challenges described there, we do not have access to terminal fields other than in the striatum and these ones are 2D, so we make a qualitative comparison of the relevant connectivity contrasts. We expect that a number of currently ongoing high-profile BRAIN CONNECTS Centres (such as the LINC and the CMC) will be addressing such challenges in the coming years and will provide the tools and data to the community to perform such quantitative comparisons at scale.

      (12) I believe that all visualizations of the macaque and human tractography showed group-averaged maps. What do these tracts look like at the individual level? Understanding individual-level performance and anatomical variation is important, given the Discussion paragraph on using this method to guide neuromodulation.

      We now demonstrate some representative examples of individual subject reconstructions in Supplementary Figures S6, S7, ranking subjects by the average agreement of individual tract reconstructions to the mean and depicting the 10th percentile, median and 90th percentile of these subjects. We have also shown more results in Author response images 1-2, generated by TractSeg, to indicate how a different bundle segmentation approach would handle individual variability compared to our approach.

      (13) Connectivity-based comparisons across species:

      Figures 5 and 6 of the manuscript show that, as compared to using only cortico-cortical XTRACT tracts, using the full set of XTRACT tracts (with new cortical-subcortical tracts) allows for more specific mapping of homologous subcortical and cortical regions across humans and macaques. Is it possible that this result is driven by the fact that the "connectivity blueprints" for the subcortex did not use an intermediary GM x WM matrix to identify connection patterns, whereas the connectivity blueprints for the cortex did? I was surprised that a whole brain GM x WM connectivity matrix was used in the cortical connectivity mapping procedure, given known problems with false positives etc., when doing whole brain tractography - especially after such anatomical detail was considered when deriving the original tracts. Perhaps the intermediary step lowers connectivity specificity and accuracy overall (as per Figure 9), accounting for the poorer performance for cortico-cortical tracts?

      The point is well-taken; however, it cannot drive the results in Figures 5 and 6. Before explaining this further, let us clarify the rationale of using the GMxWM connectivity matrix, which we have published quite extensively in the past for cortico-cortical connections (Mars, eLife 2018 - Warrington, Neuroimage 2020 - Roumazeilles, PLoS Biology 2020 - Warrington, Science Advances 2022 – Bryant, J Neuroscience 2025).

      Having established the bodies of the tract using the XTRACT protocols, we use this intermediate step of multiplying with a GM x WM connectivity matrix to estimate the grey matter projections of the tracts. The most obvious approach of tracking towards the grey matter (i.e. simply finding where tracts intersect GM) has the problem that one moves through bottlenecks in the cortical gyrus, after which fibres fan out. Most tractography algorithms have problems resolving this fanning. However, we take the opposite approach of tracking from the grey matter surface towards the white matter (GMxWM connectivity matrix), thus following the direction in which the fibres are expected to merge, rather than to fan out. We then multiply the GMxWM tractogram with that of the body of the tract to identify the grey matter endpoints of the tract. This avoids some of the major problems associated with tracking towards the surface. In fact, using this approach improves connectivity specificity towards the cortex, rather than the opposite. We provide some indicative results here for a few tracts:

      Author response image 5.

      Connectivity profiles for example cortico-cortical tracts with and without using the intermediary GMxWM matrix. Tracts considered are the Superior Longitudinal Fasciculus 1 (SLF<sub>1</sub>), Superior Longitudinal Fasciculus 2 (SLF<sub>2</sub>), the Frontal Aslant (FA) and the Inferior Fronto-Occipital Fasciculus (IFO). We see that the surface connectivity patterns without using the GMxWM intermediary matrix are more diffuse (effect of “fanning out” gyral bias), with reduced specificity, compared to when using the GMxWM matrix.
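      As a purely illustrative sketch of the intermediate multiplication step described above (not the XTRACT implementation itself), the grey matter projections of each tract can be thought of as the product of a GM x WM connectivity matrix and a WM x tract matrix holding the tract bodies. The matrix sizes, values and the helper name `matmul` below are hypothetical and chosen only to keep the example small.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

# Hypothetical toy data: 3 grey matter vertices, 4 white matter voxels, 2 tracts.
# gm_by_wm[i][k]: connectivity from GM vertex i to WM voxel k (seeded from GM).
gm_by_wm = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.0, 0.2, 0.8],
]
# wm_by_tract[k][t]: path distribution value of tract t in WM voxel k.
wm_by_tract = [
    [1.0, 0.0],
    [0.5, 0.0],
    [0.0, 0.5],
    [0.0, 1.0],
]

# Each row is the connectivity pattern of one GM vertex across the tracts.
gm_by_tract = matmul(gm_by_wm, wm_by_tract)
```

      In this toy case the first vertex connects almost exclusively to the first tract, the last vertex to the second tract, and the middle vertex equally to both.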

      Tracking to/from subcortical nuclei does not have the same tractography challenges as tracking towards the cortex and in fact we found that using the intermediary GMxWM matrix is less favourable for subcortex (Figure 9), which is why we opted for not using it. 

      Regardless of how cortical and subcortical connectivity patterns are obtained, the results in Figures 5 and 6 utilise only cortical connectivity patterns. Hence, no matter what tracts are considered (cortico-cortical or cortico-subcortical) to build the connectivity patterns, these results have been obtained by always using the intermediate step of multiplying with the GMxWM connectivity matrix (i.e. it is not the case that cortical features are obtained with the intermediate step and subcortical features without, all of them have the intermediate step applied, as the connectivity patterns comprise of cortical endpoints). Figure 9 is only applicable for subcortical endpoints that play no role in the comparisons shown in Figures 5 and 6. We hope this clarifies this point.

      (14) Methodological clarifications:

      The Methods describe how anatomical masks used in tractography were delineated in standard macaque space and then translated to humans using "correspondingly defined landmarks". Can the authors elaborate as to how this translation from macaques to humans was accomplished?

      For a given tract, our process for building a protocol involved looking into the wider anatomical literature, including the standard white matter atlas of Schmahmann and Pandya (2006) and numerous anatomy papers that are referenced in the protocol description, to determine the expected path the tract was meant to take in white matter and which cortical and subcortical regions are connected. This helped us define constraints and subsequently the corresponding masks. The masks were created through the combination of hand-drawn ROIs and standard space atlases. We firstly started with the macaque where tracer literature is more abundant, but, importantly, our protocol definitions have been designed such that the same protocol can be applied to the human and macaque brain. All choices were made with this aspect in mind, hence corresponding landmarks between the two brains were considered in the mask definition (for instance “the putamen”, “a sub-commissural white matter mask”, the “whole frontal pole” etc, as described in the protocol descriptions).

      The protocols have not been created by a single expert but have been collated from multiple experts (co-authors SA, SW, DF, KB, SH, SS drove this aspect) and the final definitions have been agreed upon by the authors. 

      (15) The article heavily utilizes spatial path distribution maps/normalized path distributions, yet does not describe precisely what these are and how they were generated. Can the authors provide more detail, along with the rationale for using these with Pearson's correlations to compare tracts across subjects (as opposed to, e.g., overlap sensitivity/specificity or the Jaccard coefficient)?

      We have now clarified in text how these plots are generated, particularly when compared using correlation values. We tried Jaccard indices on binarized masks of the tracts and these gave similar trends to the correlations reported in Figure 4 (i.e. higher similarities within than across cohorts). We however feel that correlations are better than Jaccard indices, as the latter assume binary masks and so focus on spatial overlap, ignoring the actual values of the path distributions. We hence kept correlations in the paper.
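      The difference between the two measures can be illustrated with a minimal sketch. The two flattened "path distributions" and the binarisation threshold below are hypothetical values, not data from the study, and the helper names `pearson` and `jaccard` are ours.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def jaccard(x, y, threshold):
    """Jaccard index of the two maps after binarising at `threshold`."""
    bx = [v > threshold for v in x]
    by = [v > threshold for v in y]
    inter = sum(a and b for a, b in zip(bx, by))
    union = sum(a or b for a, b in zip(bx, by))
    return inter / union if union else 0.0

# Hypothetical normalised path distributions of one tract in two subjects:
# the same voxels are visited, but with very different weights.
subject_a = [0.0, 0.1, 0.8, 0.9, 0.2, 0.0]
subject_b = [0.0, 0.9, 0.2, 0.1, 0.8, 0.0]

j = jaccard(subject_a, subject_b, threshold=0.05)  # overlap of binary masks
r = pearson(subject_a, subject_b)                  # uses the actual values
```

      Here the binarised masks overlap perfectly, so the Jaccard index is 1, whereas the correlation is negative because the high-probability voxels differ between the two maps.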

      Reviewing Editor Comments

      “The reviewers had broadly convergent comments and were enthusiastic about the work. As further detailed by Reviewer 3 (see below), if the authors choose to pursue revisions, there are several elements that have the potential to enhance impact.”

      Thank you, we have replied accordingly and aimed to address most of the comments of the Reviewers.   

      “Comparison to existing methods. How does this approach compare to other approaches cited by the authors?”

      Please see replies to Comment 2 of Reviewer 1 and Comment 7 of Reviewer 2. Briefly, we have now generated new results and clarified aspects in the text. 

      “Minimum data requirements. How broadly can this approach be used across scan variation? How does this impact data from individual participants? Displaying individual participants may help, in addition to group maps.”

      Please see replies to Comment 10 of Reviewer 2 on minimum data requirements and individual participants, as well as to Comment 3 of Reviewer 1 on the actual groups considered. Briefly, we have generated new figures and regenerated results using UKBiobank data.

      “Software. What are the software requirements? Is the approach interoperable with other methods?”

      Please see Reply to Comment 9 of Reviewer 2. Our protocols can be used to guide tractography using other types of data as they comprise of guiding ROIs for a given tract. So, although we have not tested them beyond FSL-XTRACT, we believe they can be useful with other tractography packages as well, as there is nothing FSL-specific in these anatomically-informed recipes. 

      “Comparisons with tract tracing. To the degree possible, quantitative comparisons with tract tracing data would bolster confidence in the method.”

      Please see Replies to Comments 6 and 11 of Reviewer 2. Briefly, we appreciate the comment and it is something we would love to do, but there are no data readily available that would allow such quantitative comparison in a meaningful way. This is a known challenge in the tractography field, which is why NIH has invested in two 5 year Centres to address it. Our approach will provide a solid starting point for optimising and comparing further cortico-subcortical tractography reconstructions against microscopy and tracers in the same animal and at scale.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this study, Gu et al. employed novel viral strategies, combined with in vivo two-photon imaging, to map the tone response properties of two groups of cortical neurons in A1. The thalamocortical recipient (TR neurons) and the corticothalamic (CT neurons). They observed a clear tonotopic gradient among TR neurons but not in CT neurons. Moreover, CT neurons exhibited high heterogeneity of their frequency tuning and broader bandwidth, suggesting increased synaptic integration in these neurons. By parsing out different projecting-specific neurons within A1, this study provides insight into how neurons with different connectivity can exhibit different frequency response-related topographic organization.

      Strengths:

      This study reveals the importance of studying neurons with projection specificity rather than layer specificity since neurons within the same layer have very diverse molecular, morphological, physiological, and connectional features. By utilizing a newly developed rabies virus CSN-N2c GCaMP-expressing vector, the authors can label and image specifically the neurons (CT neurons) in A1 that project to the MGB. To compare, they used an anterograde trans-synaptic tracing strategy to label and image neurons in A1 that receive input from MGB (TR neurons).

      Weaknesses:

      Perhaps as cited in the introduction, it is well known that tonotopic gradient is well preserved across all layers within A1, but I feel if the authors want to highlight the specificity of their virus tracing strategy and the populations that they imaged in L2/3 (TR neurons) and L6 (CT neurons), they should perform control groups where they image general excitatory neurons in the two depths and compare to TR and CT neurons, respectively. This will show that it's not their imaging/analysis or behavioral paradigms that are different from other labs. 

      We thank the reviewer for these constructive suggestions. As recommended, we have performed control experiments that imaged the general excitatory neurons in superficial layers (shown below), and the results showed a clear tonotopic gradient, which was consistent with previous findings (Bandyopadhyay et al., 2010; Romero et al., 2020; Rothschild et al., 2010; Tischbirek et al., 2019), thereby validating the reliability of our imaging/analysis approach. The results are presented in a new supplemental figure (Figure 2- figure supplementary 3).

      Related publications:

      (1) Gu M, Li X, Liang S, Zhu J, Sun P, He Y, Yu H, Li R, Zhou Z, Lyu J, Li SC, Budinger E, Zhou Y, Jia H, Zhang J, Chen X. 2023. Rabies virus-based labeling of layer 6 corticothalamic neurons for two-photon imaging in vivo. iScience 26: 106625. DOI: https://doi.org/10.1016/j.isci.2023.106625, PMID: 37250327

      (2) Bandyopadhyay S, Shamma SA, Kanold PO. 2010. Dichotomy of functional organization in the mouse auditory cortex. Nat Neurosci 13: 361-8. DOI: https://doi.org/10.1038/nn.2490, PMID: 20118924

      (3) Romero S, Hight AE, Clayton KK, Resnik J, Williamson RS, Hancock KE, Polley DB. 2020. Cellular and Widefield Imaging of Sound Frequency Organization in Primary and Higher Order Fields of the Mouse Auditory Cortex. Cerebral Cortex 30: 1603-1622. DOI: https://doi.org/10.1093/cercor/bhz190, PMID: 31667491

      (4) Rothschild G, Nelken I, Mizrahi A. 2010. Functional organization and population dynamics in the mouse primary auditory cortex. Nat Neurosci 13: 353-60. DOI: https://doi.org/10.1038/nn.2484, PMID: 20118927

      (5) Tischbirek CH, Noda T, Tohmi M, Birkner A, Nelken I, Konnerth A. 2019. In Vivo Functional Mapping of a Cortical Column at Single-Neuron Resolution. Cell Rep 27: 1319-1326 e5. DOI: https://doi.org/10.1016/j.celrep.2019.04.007, PMID: 31042460

      Figures 1D and G, the y-axis is Distance from pia (%). I'm not exactly sure what this means. How does % translate to real cortical thickness?

      We thank the reviewer for this question. The distance of labeled cells from pia was normalized to the entire distance from pia to L6/WM border for each mouse, according to the previous study (Chang and Kawai, 2018). For all mice tested, the entire distance from pia to L6/WM border was 826.5 ± 23.4 μm (in the range of 752.9 to 886.1).
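      The normalisation described above amounts to a simple ratio, sketched below under stated assumptions. The function name `depth_percent` and the example cell depths are hypothetical; only the mean pia-to-L6/WM distance is taken from the response, and both inputs are assumed to be in the same length unit.

```python
def depth_percent(depth_from_pia, pia_to_wm_distance):
    """Express a cell's depth as a percentage of the pia-to-L6/WM distance.

    Both arguments must be in the same length unit.
    """
    if pia_to_wm_distance <= 0:
        raise ValueError("pia_to_wm_distance must be positive")
    return 100.0 * depth_from_pia / pia_to_wm_distance

# Mean pia-to-L6/WM distance reported in the response (per-mouse values
# would be used in practice); the cell depths are made-up examples.
pia_to_wm = 826.5
cell_depths = [165.3, 413.25, 743.85]

normalised = [depth_percent(d, pia_to_wm) for d in cell_depths]
```

      With these example values the three cells sit at roughly 20%, 50% and 90% of the cortical depth, which makes cells from mice with different cortical thickness directly comparable.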

      Related publications:

      Chang M, Kawai HD. 2018. A characterization of laminar architecture in mouse primary auditory cortex. Brain Structure and Function 223: 4187-4209. DOI: https://doi.org/10.1007/s00429-018-1744-8, PMID: 30187193

      For Figure 2G and H, is each circle a neuron or an animal? Why are they staggered on top of each other on the x-axis? If the x-axis is the distance from caudal to rostral, each neuron should have a different distance? Also, it seems like it's because Figure 2H has more circles, which is why it has more variation, thus not significant (for example, at 600 or 900um, 2G seems to have fewer circles than 2H). 

      We sincerely appreciate the reviewer’s careful attention to the details of our figures. Each circle in Figure 2G and H represents an individual imaging focal plane from different animals, and the median BF of some focal planes may be similar, leading to partial overlap. In the regions where overlap occurs, the brightness of the circles is additive.

      Since fewer CT neurons, compared to TR neurons, responded to pure tones within each focal plane, as shown in Figure 2- figure supplementary 2, a larger number of focal planes were imaged to ensure a consistent and robust analysis of the pure tone response characteristics. The higher variance and lack of correlation in CT neurons is a key biological finding, not an artifact of sample size. The data clearly show a wide spread of median BFs at any given location for CT neurons, a feature absent in the TR population.

      Similarly, in Figures 2J and L, why are the circles staggered on the y-axis now? And is each circle now a neuron or a trial? It seems they have many more circles than Figure 2G and 2H. Also, I don't think doing a correlation is the proper stats for this type of plot (this point applies to Figures 3H and 3J).

      We regret any confusion we may have caused. Figure 2 illustrates the tonotopic gradient of CT and TR neurons at different scales. Specifically, Figures 2E-H present the imaging at the level of focal planes (23 focal planes in Figure 2G, 40 focal planes in Figure 2H), whereas Figures 2I-L provide a more detailed view at the single-cell level (481 neurons in Figure 2J, 491 neurons in Figure 2L). Figures 2J and L therefore do have more circles than Figures 2G and H. The analysis at both scales consistently reveals a tonotopic gradient in TR neurons, whereas such a gradient is absent in CT neurons.

      We used Pearson correlation as a standard and direct method to quantify the linear relationship between a neuron's anatomical position and its frequency preference, which is widely used in the field to provide a quantitative measure (R-value) and a significance level (p-value) for the strength of a tonotopic gradient. The same statistical logic applies to testing for spatial gradients in local heterogeneity in Figure 3. We are confident that this is an appropriate and informative statistical approach for these data.
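
      As a rough illustration of the statistic named above, the following sketch computes a Pearson correlation coefficient between position along the caudal-rostral axis and best frequency. The data values and variable names are hypothetical, not taken from the manuscript. The sketch returns only the R-value; a p-value would normally come from a statistics library (for example `scipy.stats.pearsonr`), which is an assumption about tooling rather than a description of the authors' pipeline.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical example: BF (in octaves above 2 kHz) rising linearly with position.
positions_um = [0, 300, 600, 900]
bf_octaves = [0.0, 1.0, 2.0, 3.0]
r = pearson_r(positions_um, bf_octaves)
```

      A value of R close to 1 or -1 indicates an orderly gradient along the axis, while a value near 0 indicates no linear relationship between position and BF.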

      What does the inter-quartile range of BF (IQRBF, in octaves) imply? What's the interpretation of this analysis? I am confused as to why TR neurons show high IQR in HF areas compared to LF areas, which means homogeneity among TR neurons (lines 213 - 216). On the same note, how is this different from the BF variability?  Isn't higher IQR equal to higher variability?

      We thank the reviewer for raising this important point. IQRBF is a measure of local tuning heterogeneity. It quantifies the diversity of BFs among neighboring neurons. A small IQRBF means neighbors are similarly tuned (an orderly, homogeneous map), while a large IQRBF means neighbors have very different BFs (a disordered, heterogeneous map) (Winkowski and Kanold, 2013; Zeng et al., 2019).

      From the BF position reconstruction of all TR neurons (Figure 2I), most TR neurons in the high-frequency (HF) region respond to high-frequency sounds, but some respond to low frequencies such as 2 kHz, which contributes to the high IQR in HF areas. This does not contradict our main conclusion that TR neurons are significantly more homogeneous than CT neurons. BF variability represents the stability of a single neuron's BF over time, whereas IQR represents the variability of BF among different neurons within a certain range (Chambers et al., 2023).
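
      To make the IQRBF measure concrete, here is a minimal sketch that computes the inter-quartile range of best frequencies, in octaves, for a group of neighboring neurons. The function name, the 2 kHz reference, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) are assumptions made for illustration; the manuscript may define the neighborhood and the quartiles differently.

```python
import math
import statistics


def iqr_bf_octaves(bfs_khz, ref_khz=2.0):
    """Inter-quartile range of best frequencies, expressed in octaves.

    BFs are converted to octaves relative to ref_khz. The reference only
    shifts every value by the same amount, so it cancels in the IQR.
    """
    if len(bfs_khz) < 2:
        raise ValueError("need at least two neurons")
    octaves = [math.log2(bf / ref_khz) for bf in bfs_khz]
    q1, _median, q3 = statistics.quantiles(octaves, n=4, method="inclusive")
    return q3 - q1


# Hypothetical neighborhoods: similarly tuned vs. widely scattered BFs.
homogeneous = iqr_bf_octaves([8.0, 8.0, 8.0, 8.0])
heterogeneous = iqr_bf_octaves([2.0, 4.0, 8.0, 16.0, 32.0])
```

      In this toy example the first neighborhood has an IQR of 0 octaves and the second an IQR of 2 octaves, which matches the reading given above: a larger IQRBF means neighboring neurons differ more in their tuning.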

      Related publications:

      (1) Chambers AR, Aschauer DF, Eppler JB, Kaschube M, Rumpel S. 2023. A stable sensory map emerges from a dynamic equilibrium of neurons with unstable tuning properties. Cerebral Cortex 33: 5597-5612. DOI: https://doi.org/10.1093/cercor/bhac445, PMID: 36418925

      (2) Winkowski DE, Kanold PO. 2013. Laminar transformation of frequency organization in auditory cortex. Journal of Neuroscience 33: 1498-508. DOI: https://doi.org/10.1523/JNEUROSCI.3101-12.2013, PMID: 23345224

      (3) Zeng HH, Huang JF, Chen M, Wen YQ, Shen ZM, Poo MM. 2019. Local homogeneity of tonotopic organization in the primary auditory cortex of marmosets. Proceedings of the National Academy of Sciences of the United States of America 116: 3239-3244. DOI: https://doi.org/10.1073/pnas.1816653116, PMID: 30718428

      Figure 4A-B, there are no clear criteria on how the authors categorize V, I, and O shapes. The descriptions in the Methods (lines 721 - 725) are also very vague.

      We apologize for the initial vagueness and have replaced the descriptions in the Methods section. “V-shaped”: Neurons whose FRAs show decreasing frequency selectivity with increasing intensity. “I-shaped”: Neurons whose FRAs show constant frequency selectivity with increasing intensity. “O-shaped”: Neurons responsive to a small range of intensities and frequencies, with the peak response not occurring at the highest intensity level.

      To provide better visual intuition, we show multiple representative examples of each FRA type for both TR and CT neurons below. We are confident that these provide the necessary clarity and reproducibility for our analysis of receptive field properties.

      Author response image 1.

      Different FRA types within the dataset of TR and CT neurons. Each row shows 6 representative FRAs from a specific type. Types are V-shaped (‘V’), I-shaped (‘I’), and O-shaped (‘O’). The X-axis represents 11 pure tone frequencies, and the Y-axis represents 6 sound intensities.

      Reviewer #2 (Public Review):

      Summary:

      Gu and Liang et al. investigated how auditory information is mapped and transformed as it enters and exits an auditory cortex. They use anterograde transsynaptic tracers to label and perform calcium imaging of thalamorecipient neurons in A1 and retrograde tracers to label and perform calcium imaging of corticothalamic output neurons. They demonstrate a degradation of tonotopic organization from the input to output neurons.

      Strengths:

      The experiments appear well executed, well described, and analyzed.

      Weaknesses:

      (1) Given that the CT and TR neurons were imaged at different depths, the question as to whether or not these differences could otherwise be explained by layer-specific differences is still not 100% resolved. Control measurements would be needed either by recording (1) CT neurons in upper layers, (2) TR in deeper layers, (3) non-CT in deeper layers and/or (4) non-TR in upper layers.

      We appreciate these constructive suggestions. To address this, we performed new experiments and analyses.

      Comparison of TR neurons across superficial layers: we analyzed our existing TR neuron dataset to see if response properties varied by depth within the superficial layers. We found no significant differences in the fraction of tuned neurons, field IQR, or maximum bandwidth (BWmax) between TR neurons in L2/3 and L4. This suggests a degree of functional homogeneity within the thalamorecipient population across these layers. The results are presented in new supplemental figures (Figure 2- figure supplementary 4).

      Necessary control experiments.

      (1) CT neurons in upper layers. CT neurons are thalamic projection neurons that only exist in the deeper cortex, so CT neurons do not exist in upper layers (Antunes and Malmierca, 2021).

      (2) TR neurons in deeper layers. As we mentioned in the manuscript, high-titer AAV1-Cre can label neurons both anterogradely and retrogradely, so it is challenging to identify TR neurons unambiguously in deeper layers.

      (3) non-CT in deeper layers and/or (4) non-TR in upper layers.

      To directly test if projection identity confers distinct functional properties within the same cortical layers, we performed the crucial control of comparing TR neurons to their neighboring non-TR neurons. We injected AAV1-Cre in MGB and a Cre-dependent mCherry into A1 to label TR neurons red. We then co-injected AAV-CaMKII-GCaMP6s to label the general excitatory population green.  In merged images, this allowed us to functionally image and directly compare TR neurons (yellow) and adjacent non-TR neurons (green). We separately recorded the responses of these neurons to pure tones using two-photon imaging. The results show that TR neurons are significantly more likely to be tuned to pure tones than their neighboring non-TR excitatory neurons. This finding provides direct evidence that a neuron's long-range connectivity, and not just its laminar location, is a key determinant of its response properties. The results are presented in new supplemental figures (Figure 2- figure supplementary 5).

      Related publications:

      Antunes FM, Malmierca MS. 2021. Corticothalamic Pathways in Auditory Processing: Recent Advances and Insights From Other Sensory Systems. Front Neural Circuits 15: 721186. DOI: https://doi.org/10.3389/fncir.2021.721186, PMID: 34489648

      (2) What percent of the neurons at the depths are CT neurons? Similar questions for TR neurons?

      We thank the reviewer for the comments. We performed histological analysis on brain slices from our experimental animals to quantify the density of these projection-specific populations. Our analysis reveals that CT neurons constitute approximately 25.47% (22.99%–36.50%) of all neurons in Layer 6 of A1. In the superficial layers (L2/3 and L4), TR neurons comprise approximately 10.66% (10.53%–11.37%) of the total neuronal population.

      Author response image 2.

      The fraction of CT and TR neurons. (A) Boxplots showing the fraction of CT neurons. N = 11 slices from 4 mice. (B) Boxplots showing the fraction of TR neurons. N = 11 slices from 4 mice.

      (3) V-shaped, I-shaped, or O-shaped is not an intuitively understood nomenclature, consider changing. Further, the x/y axis for Figure 4a is not labeled, so it's not clear what the heat maps are supposed to represent.

      The terms "V-shaped," "I-shaped," and "O-shaped" are an established nomenclature in the auditory neuroscience literature for describing frequency response areas (FRAs), and we use them for consistency with prior work (Rothschild et al., 2010). V-shaped: Neurons whose FRAs show decreasing frequency selectivity with increasing intensity. I-shaped: Neurons whose FRAs show constant frequency selectivity with increasing intensity. O-shaped: Neurons responsive to a small range of intensities and frequencies, with the peak response not occurring at the highest intensity level. We have included a more detailed description in the Methods.

      The X-axis represents 11 pure tone frequencies, and the Y-axis represents 6 sound intensities. So, the heat map represents the FRA of neurons in A1, reflecting the responses for different frequencies and intensities of sound stimuli. In the revised manuscript, we have provided clarifications in the figure legend.

      (4) Many references about projection neurons and cortical circuits are based on studies from visual or somatosensory cortex. Auditory cortex organization is not necessarily the same as other sensory areas. Auditory cortex references should be used specifically, and not sources reporting on S1, and V1.

      We thank the reviewers for their valuable comments. We have made a concerted effort to ensure that claims about cortical circuit organization are supported by findings specifically from the auditory cortex wherever possible, strengthening the focus and specificity of our discussion.

      Reviewer #3 (Public Review):

      Summary:

      The authors performed wide-field and 2-photon imaging in vivo in awake head-fixed mice, to compare receptive fields and tonotopic organization in thalamocortical recipient (TR) neurons vs corticothalamic (CT) neurons of mouse auditory cortex. TR neurons were found in all cortical layers while CT neurons were restricted to layer 6. The TR neurons at nominal depths of 200-400 microns have a remarkable degree of tonotopy (as good if not better than tonotopic maps reported by multiunit recordings). In contrast, CT neurons were very heterogenous in terms of their best frequency (BF), even when focusing on the low vs high-frequency regions of the primary auditory cortex. CT neurons also had wider tuning.

      Strengths:

      This is a thorough examination using modern methods, helping to resolve a question in the field with projection-specific mapping.

      Weaknesses:

      There are some limitations due to the methods, and it's unclear what the importance of these responses are outside of behavioral context or measured at single timepoints given the plasticity, context-dependence, and receptive field 'drift' that can occur in the cortex.

      (1) Probably the biggest conceptual difficulty I have with the paper is comparing these results to past studies mapping auditory cortex topography, mainly due to differences in methods. Conventionally, the tonotopic organization is observed for characteristic frequency maps (not best frequency maps), as tuning precision degrades and the best frequency can shift as sound intensity increases. The authors used six attenuation levels (30-80 dB SPL) and reported that the background noise of the 2-photon scope is <30 dB SPL, which seems very quiet. The authors should at least describe the sound-proofing they used to get the noise level that low, and some sense of noise across the 2-40 kHz frequency range would be nice as a supplementary figure. It also remains unclear just what the 2-photon dF/F response represents in terms of spikes. Classic mapping using single-unit or multi-unit electrodes might be sensitive to single spikes (as might be emitted at characteristic frequency), but this might not be as obvious for Ca2+ imaging. This isn't a concern for the internal comparison here between TR and CT cells as conditions are similar, but is a concern for relating the tonotopy or lack thereof reported here to other studies.

      We sincerely thank the reviewer for the thoughtful evaluation of our manuscript and for your positive assessment of our work.

      (1)  Concern regarding Best Frequency (BF) vs. Characteristic Frequency (CF)

      Our use of BF, defined as the frequency eliciting the highest response averaged across all sound levels, is a standard and practical approach in 2-photon Ca²⁺ imaging studies (Issa et al., 2014; Rothschild et al., 2010; Schmitt et al., 2023; Tischbirek et al., 2019). This method is well-suited for functionally characterizing large numbers of neurons simultaneously, where determining a precise firing threshold for each individual cell can be challenging.

      (2) Concern regarding background noise of the 2-photon setup

      We have expanded the Methods section ("Auditory stimulation") to include a detailed description of the sound-attenuation strategies used during the experiments. The use of a custom-built, double-walled sound-proof enclosure lined with wedge-shaped acoustic foam was implemented to significantly reduce external noise interference. These strategies ensured that auditory stimuli were delivered under highly controlled, low-noise conditions, thereby enhancing the reliability and accuracy of the neural response measurements obtained throughout the study.

      (3) Concern regarding the relationship between dF/F and spikes

      While Ca²⁺ signals are an indirect and filtered representation of spiking activity, they are a powerful tool for assessing the functional properties of genetically-defined cell populations. As you note, the properties and limitations of Ca²⁺ imaging apply equally to both the TR and CT neuron groups we recorded. Therefore, the profound difference we observed—a clear tonotopic gradient in one population and a lack thereof in the other—is a robust biological finding and not a methodological artifact.

      Related publications:

      (1) Issa JB, Haeffele BD, Agarwal A, Bergles DE, Young ED, Yue DT. 2014. Multiscale optical Ca2+ imaging of tonal organization in mouse auditory cortex. Neuron 83: 944-59. DOI: https://doi.org/10.1016/j.neuron.2014.07.009, PMID: 25088366

      (2) Rothschild G, Nelken I, Mizrahi A. 2010. Functional organization and population dynamics in the mouse primary auditory cortex. Nat Neurosci 13: 353-60. DOI: https://doi.org/10.1038/nn.2484, PMID: 20118927

      (3) Schmitt TTX, Andrea KMA, Wadle SL, Hirtz JJ. 2023. Distinct topographic organization and network activity patterns of corticocollicular neurons within layer 5 auditory cortex. Front Neural Circuits 17: 1210057. DOI: https://doi.org/10.3389/fncir.2023.1210057, PMID: 37521334

      (4) Tischbirek CH, Noda T, Tohmi M, Birkner A, Nelken I, Konnerth A. 2019. In Vivo Functional Mapping of a Cortical Column at Single-Neuron Resolution. Cell Rep 27: 1319-1326 e5. DOI: https://doi.org/10.1016/j.celrep.2019.04.007, PMID: 31042460

      (2) It seems a bit peculiar that while 2721 CT neurons (N=10 mice) were imaged, less than half as many TR cells were imaged (n=1041 cells from N=5 mice). I would have expected there to be many more TR neurons even mouse for mouse (normalizing by number of neurons per mouse), but perhaps the authors were just interested in a comparison data set and not being as thorough or complete with the TR imaging?

      As shown in Figure 2- figure supplementary 2, a much higher fraction of TR neurons was "tuned" to pure tones (46% of 1041 neurons) compared to CT neurons (only 18% of 2721 neurons). To obtain a statistically robust and comparable number of tuned neurons for our core analysis (481 tuned TR neurons vs. 491 tuned CT neurons), it was necessary to sample a larger total population of CT neurons, which required imaging from more animals.
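
      The counts quoted above are internally consistent, as the short check below shows. It uses only the numbers given in the response; the variable names are illustrative.

```python
# Counts quoted in the response.
tr_tuned, tr_total = 481, 1041
ct_tuned, ct_total = 491, 2721

tr_fraction = tr_tuned / tr_total
ct_fraction = ct_tuned / ct_total

# Prints the tuned fractions as percentages with one decimal place.
print(f"TR: {tr_fraction:.1%}, CT: {ct_fraction:.1%}")  # TR: 46.2%, CT: 18.0%
```

      With roughly 18% of CT neurons tuned, about 2.6 times as many CT neurons as TR neurons must be imaged to reach a similar number of tuned cells, which is what the totals (2721 vs. 1041) reflect.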

      (3) The authors' definitions of neuronal response type in the methods need more quantitative detail. The authors state: "Irregular" neurons exhibited spontaneous activity with highly variable responses to sound stimulation. "Tuned" neurons were responsive neurons that demonstrated significant selectivity for certain stimuli. "Silent" neurons were defined as those that remained completely inactive during our recording period (> 30 min). For tuned neurons, the best frequency (BF) was defined as the sound frequency associated with the highest response averaged across all sound levels.". The authors need to define what their thresholds are for 'highly variable', 'significant', and 'completely inactive'. Is best frequency the most significant response, the global max (even if another stimulus evokes a very close amplitude response), etc.

      We appreciate the reviewer's suggestions. We have added a more detailed description in the Methods.

      Tuned neurons: A responsive neuron was further classified as "Tuned" if its responses showed significant frequency selectivity. We determined this using a one-way ANOVA on the neuron's response amplitudes across all tested frequencies (at the sound level that elicited the maximal response). If the ANOVA yielded a p-value < 0.05, the neuron was considered "Tuned". Irregular neurons: Responsive neurons that did not meet the statistical criterion for being "Tuned" (i.e., ANOVA p-value ≥ 0.05) were classified as "Irregular". This provides a clear, mutually exclusive category for sound-responsive but broadly-tuned or non-selective cells. Silent neurons: Neurons that were not responsive were classified as "Silent". This quantitatively defines them as cells that showed no significant stimulus-evoked activity during the entire recording session. Best frequency (BF): The frequency that elicited the maximal mean response, averaged across all sound levels.
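
      The decision rules above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the authors' implementation: the function names are hypothetical, the responsiveness test and the ANOVA p-value are taken as inputs computed elsewhere, and ties in the best-frequency search are resolved in favor of the lowest frequency index, which the response does not specify.

```python
from typing import Optional, Sequence


def best_frequency(fra: Sequence[Sequence[float]], freqs_khz: Sequence[float]) -> float:
    """Return the frequency with the highest response averaged across sound levels.

    fra[level][freq] holds the mean response at one sound level and one frequency.
    """
    if not fra or any(len(row) != len(freqs_khz) for row in fra):
        raise ValueError("each FRA row must have one value per frequency")
    n_levels = len(fra)
    mean_per_freq = [
        sum(row[i] for row in fra) / n_levels for i in range(len(freqs_khz))
    ]
    # max() returns the first maximal index, so ties go to the lowest frequency index.
    best_index = max(range(len(freqs_khz)), key=mean_per_freq.__getitem__)
    return freqs_khz[best_index]


def classify_neuron(responsive: bool, anova_p: Optional[float], alpha: float = 0.05) -> str:
    """Assign one of the three mutually exclusive categories described above."""
    if not responsive:
        return "Silent"
    if anova_p is None:
        raise ValueError("responsive neurons need an ANOVA p-value")
    return "Tuned" if anova_p < alpha else "Irregular"


# Hypothetical 2-level x 3-frequency FRA for demonstration.
example_fra = [[0.1, 0.5, 0.2], [0.2, 0.9, 0.3]]
example_bf = best_frequency(example_fra, [2.0, 4.0, 8.0])
```

      Because BF is taken from the level-averaged response, a neuron whose peak shifts with intensity is assigned a single BF, which is one reason BF maps and characteristic-frequency maps can differ.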

      To provide greater clarity, we showed examples in the following figures.

      Author response image 3.

      Reviewer #1 (Recommendations For The Authors):

      (1) A1 and AuC were used interchangeably in the text.

      Thank you for pointing out this issue. Our terminological strategy was to remain faithful to the original terms used in the literature we cite, where "AuC" is often used more broadly. In the revised manuscript, we have performed a careful edit to ensure that we use the specific term "A1" (primary auditory cortex) when describing our own results and recording locations, which were functionally and anatomically confirmed.

      (2) Grammar mistakes throughout.

      We are grateful for the reviewer’s suggested improvement to our wording. The entire manuscript has undergone a thorough professional copyediting process to correct all grammatical errors and improve overall readability.

      (3) The discussion should talk more about how/why L6 CT neurons don't possess the tonotopic organization and what are the implications. Currently, it only says 'indicative of an increase in synaptic integration during cortical processing'...

      Thanks for this suggestion. We have substantially revised and expanded the Discussion section to explore the potential mechanisms and functional implications of the lack of tonotopy in L6 CT neurons.

      Broad pooling of inputs: We propose that the lack of tonotopy is an active computation, not a passive degradation. CT neurons likely pool inputs from a wide range of upstream neurons with diverse frequency preferences. This broad synaptic integration, reflected in their wider tuning bandwidth, would actively erase the fine-grained frequency map in favor of creating a different kind of representation.

      A shift from topography to abstract representation: This transformation away from a classic sensory map may be critical for the function of corticothalamic feedback. Instead of relaying "what" frequency was heard, the descending signal from CT neurons may convey more abstract, higher-order information, such as the behavioral relevance of a sound, predictions about upcoming sounds, or motor-related efference copy signals that are not inherently frequency-specific.

      Modulatory role of the descending pathway: The descending A1-to-MGB pathway is often considered to be modulatory, shaping thalamic responses rather than driving them directly. A modulatory signal designed to globally adjust thalamic gain or selectivity may not require, and may even be hindered by, a fine-grained topographical organization.

      Reviewer #2 (Recommendations For The Authors):

      (1) Given that the CT and TR neurons were imaged at different depths, the question as to whether or not these differences could otherwise be explained by layer-specific differences is still not 100% resolved. Control measurements would be needed either by recording (1) CT neurons in upper layers (2) TR in deeper layers (3) non-CT in deeper layers and/or (4) non-TR in upper layers.

      We appreciate these constructive suggestions. To address this, we performed new experiments and analyses.

      Comparison of TR neurons across superficial layers: we analyzed our existing TR neuron dataset to see if response properties varied by depth within the superficial layers. We found no significant differences in the fraction of tuned neurons, field IQR, or maximum bandwidth (BWmax) between TR neurons in L2/3 and L4. This suggests a degree of functional homogeneity within the thalamorecipient population across these layers.

      Necessary control experiments.

      (1) CT neurons in upper layers. CT neurons are thalamic projection neurons that only exist in the deeper cortex, so CT neurons do not exist in upper layers (Antunes and Malmierca, 2021).

      (2) TR neurons in deeper layers. As we mentioned in the manuscript, high-titer AAV1-Cre can label neurons both anterogradely and retrogradely, so it is challenging to identify TR neurons unambiguously in deeper layers.

      (3) non-CT in deeper layers and/or (4) non-TR in upper layers.

      To directly test if projection identity confers distinct functional properties within the same cortical layers, we performed the crucial control of comparing TR neurons to their neighboring non-TR neurons. We injected AAV1-Cre in MGB and a Cre-dependent mCherry into A1 to label TR neurons red. We then co-injected AAV-CaMKII-GCaMP6s to label the general excitatory population green.  In merged images, this allowed us to functionally image and directly compare TR neurons (yellow) and adjacent non-TR neurons (green). We separately recorded the responses of these neurons to pure tones using two-photon imaging. The results show that TR neurons are significantly more likely to be tuned to pure tones than their neighboring non-TR excitatory neurons. This finding provides direct evidence that a neuron's long-range connectivity, and not just its laminar location, is a key determinant of its response properties.

      Related publications:

      Antunes FM, Malmierca MS. 2021. Corticothalamic Pathways in Auditory Processing: Recent Advances and Insights From Other Sensory Systems. Front Neural Circuits 15: 721186. DOI: https://doi.org/10.3389/fncir.2021.721186, PMID: 34489648

      (3) V-shaped, I-shaped, or O-shaped is not an intuitively understood nomenclature, consider changing. Further, the x/y axis for Figure 4a is not labeled, so it's not clear what the heat maps are supposed to represent.

      The terms "V-shaped," "I-shaped," and "O-shaped" are an established nomenclature in the auditory neuroscience literature for describing frequency response areas (FRAs), and we use them for consistency with prior work (Rothschild et al., 2010). V-shaped: Neurons whose FRAs show decreasing frequency selectivity with increasing intensity. I-shaped: Neurons whose FRAs show constant frequency selectivity with increasing intensity. O-shaped: Neurons responsive to a small range of intensities and frequencies, with the peak response not occurring at the highest intensity level. We have included a more detailed description in the Methods.

      The X-axis represents 11 pure tone frequencies, and the Y-axis represents 6 sound intensities. So, the heat map represents the FRA of neurons in A1, reflecting the responses for different frequencies and intensities of sound stimuli. In the revised manuscript, we have provided clarifications in the figure legend.

      (4) Many references about projection neurons and cortical circuits are based on studies from visual or somatosensory cortex. Auditory cortex organization is not necessarily the same as other sensory areas. Auditory cortex references should be used specifically, and not sources reporting on S1, V1.

      We thank the reviewers for their valuable comments. We have made a concerted effort to ensure that claims about cortical circuit organization are supported by findings specifically from the auditory cortex wherever possible, strengthening the focus and specificity of our discussion.

      Reviewer #3 (Recommendations For The Authors):

      I suggest showing some more examples of how different neurons and receptive field properties were quantified and statistically analyzed. Especially in Figure 4, but really throughout.

      We thank the reviewer for this valuable suggestion. To provide greater clarity, we have added more examples in the following figure.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary 

      The authors describe a method for gastruloid formation using mouse embryonic stem cells (mESCs) to study YS and AGM-like hematopoietic differentiation. They characterise the gastruloids during nine days of differentiation using a number of techniques including flow cytometry and single-cell RNA sequencing. They compare their findings to a published data set derived from E10-11.5 mouse AGM. At d9, gastruloids were transplanted under the adrenal gland capsule of immunocompromised mice to look for the development of cells capable of engrafting the mouse bone marrow. The authors then applied the gastruloid protocol to study overexpression of Mnx1 which causes infant AML in humans.

      In the introduction, the authors define their interpretation of the different waves of hematopoiesis that occur during development. 'The subsequent wave, known as definitive, produces: first, oligopotent erythro-myeloid progenitors (EMPs) in the YS (E8-E8.5); and later myelo-lymphoid progenitors (MLPs - E9.5-E10), multipotent progenitors (MPPs - E10-E11.5), and hematopoietic stem cells (HSCs - E10.5-E11.5), in the aorta-gonad-mesonephros (AGM) region of the embryo proper.' Herein they designate the yolk sac-derived wave of EMP hematopoiesis as definitive, according to convention, although paradoxically it does not develop from intra-embryonic mesoderm or give rise to HSCs.

      Our definition of primitive and definitive waves is widely used in the field (e.g. PMID: 18204427; PMID: 28299650; PMID: 33681211). Definitive haematopoiesis, encompassing EMP, MLP, MPP and HSC, highlights their origin from haemogenic endothelium, generation of mature cells with adult characteristics from progenitors with multilineage potential and direct and indirect developmental contributions to the intra-embryonic and time-restricted generation of HSCs. 

      General comments 

      The authors make the following claims in the paper: 

      (1) The development of a protocol for hemogenic gastruloids (hGx) that recapitulates YS and AGM-like waves of blood from HE.

      (2) The protocol recapitulates both YS and EMP-MPP embryonic blood development 'with spatial and temporal accuracy'.

      (3) The protocol generates HSC precursors capable of short-term engraftment in an adrenal niche.

      (4) Overexpression of MNX1 in hGx transforms YS EMP to 'recapitulate patient transcriptional signatures'.

      (5) hGx is a model to study normal and leukaemic embryonic hematopoiesis. 

      There are major concerns with the manuscript. The statements and claims made by the authors are not supported by the data presented, data is overinterpreted, and the conclusions cannot be justified. Furthermore, the data is presented in a way that makes it difficult for the reader to follow the narrative, causing confusion. The authors have not discussed how their hGx compares to the previously published mouse embryoid body protocols used to model early development and hematopoiesis.

      Specific points

      (1) It is claimed that HGxs capture cellularity and topography of developmental blood formation. The hGx protocol described in the manuscript is a modification of a previously published gastruloid protocol (Rossi et al 2022). The rationale for the protocol modifications is not fully explained or justified. There is a lack of novelty in the presented protocol as the only modifications appear to be the inclusion of Activin A and an extension of the differentiation period from 7 to 9 days of culture. No direct comparison has been made between the two versions of gastruloid differentiation to justify the changes.

      The Reviewer paradoxically claims that the protocol is not novel and that it differs from a previous publication in at least 2 ways – the patterning pulse and the length of the protocol. Of these, the patterning pulse is key. As documented in Fig. 1S1, we cannot obtain Flk1-GFP expression in the absence of Activin A (Fig. 1S1A), and the concentration of Activin A scales activity of the Flk1 locus (Fig. 1S1B). Expression of Flk1 is a fundamental step in haemato-endothelial specification and, accordingly, we do not see CD41 or CD45+ cells in the absence of Activin A. Furthermore, these markers also titrate with the dose of Activin A (in Fig. 1S1B).

      Also, in our hands, there is a clear time-dependent progression of marker expression, with sequential acquisition of CD41 and CD45, with the latter not detectable until 192h (Fig. 1C-D), another key difference relative to the Rossi et al (2022) protocol. We suggest, and present further evidence for in this rebuttal and the revised manuscript, that the 192h-timepoint captures the onset of AGM-like haematopoiesis. We have edited the manuscript to clarify the differences and novelty in our protocol (lines 132-143) and provided a more detailed comparison with the report from Rossi et al. (2022) in the Discussion (lines 574-586).

      The inclusion of Activin A at high concentration at the beginning of differentiation would be expected to pattern endoderm rather than mesoderm. BMP signaling is required to induce Flk1+ mesoderm, even in the presence of Wnt.

      Again, we call the Reviewer’s attention to Fig. 1S1A, which clearly shows that Activin A (with no BMP added) is required for induction of Flk1 expression in the presence of Wnt. Activin A, in combination with Wnt, is used in other protocols of haemato-endothelial differentiation from pluripotent cells, with no BMP added in the same step of patterning and differentiation (PMID: 39227582; PMID: 39223325). In the latter protocol, we also call the Reviewer’s attention to the fact that a higher concentration of Activin A precludes the need for BMP4 addition. Finally, one of us has recently reported that Activin A, on its own, will induce Flk1, as well as other anterior mesodermal progenitors (https://www.biorxiv.org/content/10.1101/2025.01.11.632562v1). In addressing the Reviewer’s concerns with the dose of Activin A used, we titrated its concentration against activation of Flk1, confirming optimal Flk1-GFP expression at the 100ng/ml dose used in the manuscript. We have included these data in the manuscript in Figure 1S1B.

      FACS analysis of the hGx during differentiation is needed to demonstrate the co-expression of Flk1GFP and lineage markers such as CD34 to indicate patterning of endothelium from Flk1+ mesoderm. The FACS plots in Fig. 1 show C-Kit expression but very little VE-cadherin which suggests that CD34 is not induced. Early endoderm expresses C-Kit, CXCR4, and Epcam, but not CD34 which could account for the lack of vascular structures within the hGx as shown in Fig. 1E.

      We were surprised by the Reviewer’s comment that there are no endothelial structures in our haemogenic gastruloids. The presence of a Flk1-GFP+ network is visible in the GFP images in Fig. 1B, from 144h onwards, and is detailed in the revised Fig. 2A, which shows overlap between Flk1-GFP and the endothelial marker CD31. In addition, our single-cell RNA-seq data, included in the manuscript, confirms the presence of endothelial cells with a developing endothelial, including arterial, programme. This is now presented in the revised Fig. 3B-D of the manuscript, which updates a representation in the original manuscript. In contrast with the Reviewer’s claims that no endothelial cells are formed, the data show that Kdr (Flk1)+ cells co-express Cdh5/VE-Cadherin and indeed Cd34, attesting to the presence of an endothelial programme. Arterial markers Efnb2, Flt1, and Dll4 are present. A full-blown programme, which also includes haemogenic markers such as Sox17, Esam, Cd44 and Mecom, is clear at early (144h) and, particularly, at late (192h) timepoints in cells sorted on detection of surface C-Kit (Fig. 3B-E in the manuscript). To address the specific point by the Reviewer, we also document co-expression of Flk1-GFP, CD34 and/or CD31 by flow cytometry (Fig. 2S1A-B in the revised manuscript).

      To summarise new and revised data in the manuscript in relation to this point:

      Immunofluorescence staining showing the Flk1-GFP-defined vascular network in Figure 1E and co-expression of endothelial marker CD31 in Figure 2A. In text: lines 159-163; 178-180.

      Flow cytometry analysis of co-expression of Flk1-GFP with CD31 and CD34 in Figure 2S1A-D, including controls. In text: 180-187.

      Real-time quantitative (q)PCR analysis showing time-dependent expression of haematoendothelial and arterial markers in Figure 2F (specifically Dll4 and Mecom). In text: 200-209.

      An improved representation of our scRNA-seq data highlighting key haemato-endothelial markers in Figure 3B-D. In text: 268-304

      (2) The protocol has been incompletely characterised, and the authors have not shown how they can distinguish between either wave of Yolk Sac (YS) hematopoiesis (primitive erythroid/macrophage and erythro-myeloid EMP) or between YS and intraembryonic Aorta-Gonad-Mesonephros (AGM) hematopoiesis. No evidence of germ layer specification has been presented to confirm gastruloid formation, organisation, and functional ability to mimic early development. Furthermore, differentiation of YS primitive and YS EMP stages of development in vitro should result in the efficient generation of CD34+ endothelial and hematopoietic cells. There is no flow cytometry analysis showing the kinetics of CD34 cell generation during differentiation. Benchmarking the hGx against developing mouse YS and embryo data sets would be an important verification. 

      The Reviewer is correct that we have not provided detailed characterisation of the different germ layers, as this was not the focus of the study. In that context, we were surprised by the earlier comment assuming co-expression of C-Kit, Cxcr4 and Epcam, which we did not show, while overlooking the endothelial programme reiterated above, which we have presented. Given our focus on haemato-endothelial specification, we have started the single-cell RNA-seq characterisation of the haemogenic gastruloid at 120h and have not looked specifically at earlier timepoints of embryo patterning. This said, we show the presence of neuroectodermal cells in cluster 9; on the other hand, cluster 7 includes hepatoblast-like cells, denoting endodermal specification (Supplementary File S2). However, in the absence of earlier timepoints and given the bias towards mesodermal specification, we expect that specification of ectodermal and endodermal programmes may be incomplete. 

      In respect of the contention regarding the capture of YS-like and AGM-like haematopoiesis, we had presented evidence in the original version of the manuscript that haemogenic cells generated during gastruloid differentiation, particularly at the late 192h and 216h timepoints, project onto highly purified C-Kit+ CD31+ Gfi1-expressing cells from mouse AGM (PMID: 38383534), providing support for at least partial recapitulation of the corresponding developmental stage. These projections are represented in Fig. 4A, right and 4S1C of the revised manuscript. In distinguishing between YS-like and AGM-like haematopoiesis, we call the Reviewer’s attention to the replotting of the single-cell RNA-seq data already in the manuscript, which we provided in response to point 1 (Fig. 3B-D and 3S2B), which highlights an increase in Sox17, but not Sox18, expression in the 192h haemogenic endothelium, which suggests an association with AGM haematopoiesis (PMID: 20228271). A significant association of Cd44 and Procr expression with the same timepoint (Fig. 3B-D in the manuscript) further supports an AGM-like endothelial-to-haematopoietic transition at the 192h timepoint. We have re-analysed the scRNA-seq data to better represent the expression of these markers in Fig. 3A-E and 3S2B. We agree that it remains challenging to identify markers exclusive to AGM haematopoiesis, which is operationally equated with generation of transplantable haematopoietic stem cells. While HSC generation is a key event characteristic of the AGM, not all AGM haematopoiesis corresponds to HSCs, an important point in evaluating the data presented in the manuscript, and one that is acknowledged by us. The main text has been edited to clarify the experiments pertaining to distinguishing AGM and YS haematopoiesis, which are detailed in lines 180-187, 200-221, 268-304, and 315-356.

      Following on the Reviewer’s comments about Cd34, we also inspected co-expression of Cd34 with Cd41 and Cd45, the latter co-expression present in, although not necessarily exclusive to, AGM haematopoiesis. Reassuringly, we observed clear co-expression with both markers (Author response image 1), in addition to a CD41+CD34- population, which likely reflects YS EMP-independent erythropoiesis. Flow cytometry analysis of co-expression of CD31 and CD34 in CD41+ and CD45+ populations at 144h and 216h timepoints has been included in Fig. 2B-D, Fig. 2S1A-D, including controls. In text: 180-187. We have earlier on in the rebuttal highlighted the fact that marker expression is responsive to the levels of Activin A used in the patterning pulse, with the 100ng/ml Activin A used in our protocol superior to 75ng/ml.

      Author response image 1.

      Association of CD34 with CD41 and CD45 expression is Activin A-responsive and supports the presence of definitive haematopoiesis. A. Flow cytometry analysis of CD34 and CD41 expression in 216h-haemogenic gastruloids; two doses of Activin A were used in the patterning pulse with CHIR99021 between 48-72h. FMO controls shown. B. Flow cytometry analysis of CD34 and CD45 at 216h in the same experimental conditions.

      Given the centrality of this point in comments by all the Reviewers, we have conducted projections of our single-cell RNA-seq data against two studies which (1) capture arterial and haemogenic specification in the para-splanchnopleura (pSP) and AGM region between E8.0 and E11 (Hou et al, PMID: 32203131), and (2) uniquely capture YS, AGM and FL progenitors and the AGM endothelial-to-haematopoietic transition (EHT) in the same scRNA-seq dataset (Zhu et al, PMID: 32392346). Focusing the analysis on the subsets of haemogenic gastruloid cells sorted as CD41+ (144h), C-Kit+ (144h and 192h) and CD45+ (192h and 216h) (now represented in Fig. 3A, and projected onto the studies in Fig. 4A), we show:

      (1) That a subset of haemato-endothelial cells from haemogenic gastruloids at 144h to 216h project onto intra-embryonic cells spanning E8.25 to E10 (revised Fig. 4A left and 4S1A). This is in agreement with our original interpretation that 216h are no later than the MPP/pre-HSC state of embryonic development, requiring further maturation to generate engrafting progenitors. We have nevertheless removed specific references to pre-HSC, and instead referred to HSPC/progenitors.

      (2) That haemogenic gastruloids contain YS-like (including EMP-like) and AGM-like haematopoietic cells (Fig. 4A centre and 4S1B). Significantly, some of the cells, particularly C-Kit-sorted cells with a candidate endothelial and HE-like signature, project onto AGM pre-HE and HE, as well as IAHC. Some 144h CD41+ and 192h CD45+ cells also project onto IAHC, suggesting that YS-like and AGM-like programmes arise independently and with partial time-dependent organisation in the haemogenic gastruloid model. Later, predominantly 216h, cells have characteristics of MPP/LMPP-like cells from the FL, suggesting a progenitor wave of differentiation.

      Altogether, the data support the notion that haemogenic gastruloids capture YS and AGM haematopoiesis until E10, as suggested by us in the manuscript. This re-analysis of the scRNA-seq data, which was indeed prompted by challenging and insightful comments from the Reviewers, has been incorporated in the manuscript as described above and further listed here:

      Re-clustering and highlights of specific markers in our scRNA-seq data in Figure 3A-E. In text: 268-304.

      Projections to mouse embryo datasets in Figure 4A (Figure 4S1A-C; Supplementary File 3). In text: 315-356. 

      Single-cell RNA sequencing was used to compare hGx with mouse AGM. The authors incorrectly conclude that '...specification of endothelial and HE cells in hGx follows with time-dependent developmental progression into putative AGM-like HE...' And, '...HE-projected hGx cells...expressed Gata2 but not Runx1, Myb, or Gfi1b...' Hemogenic endothelium is defined by the expression of Runx1, and Gfi1b is downstream of Runx1.

      As a hierarchy of regulation, Gata2 precedes and drives Runx1 expression at the specification of HE (PMID: 17823307; PMID: 24297996), while Runx1 drives the EHT, upstream of Gfi1b in haematopoietic clusters (PMID: 34517413). Please note that the text segment the Reviewer refers to has been removed from the manuscript, as the analysis is no longer solely focused on projection to Thambyrajah et al (2024) data, and instead gained significantly from the projections onto the Hou et al (2020) and Zhu et al (2020) studies, as detailed above.

      (3) The hGx protocol 'generates hematopoietic SC precursors capable of short-term engraftment' is not supported by the data presented. Short-term engraftment would be confirmed by flow cytometric detection of hematopoietic cells within the recipient bone marrow, spleen, thymus, and peripheral blood that expressed the BFP transgene. This analysis was not provided. PCR detection of transcripts, following an unspecified number of amplification cycles, as shown in Figure 3G (incorrectly referred to as Figure 3F in the legend) is not acceptable evidence for engraftment.

      We provide the full flow cytometry analysis of spleen engraftment in the 5 mice which received implantation of 216h-haemogenic gastruloids in the adrenal gland and were analysed at 4 weeks; an additional (control) animal received adrenal injection of PBS (Fig. 4B-D in the revised manuscript). In this experiment, the bone marrow collection was limiting, and material was prioritised for PCR (Fig. 4C and full gels in 4S2C in the revised manuscript).

      We had previously provided only representative plots of flow cytometry analysis of bone marrow and spleen, which we described as low-level engraftment and which were chosen conservatively. The analysis was meant to complement the genomic DNA PCR, where detection was present in only some of the replicates tested per animal. On this note, we confirm that PCR analysis used a conventional 40 cycles; the sensitivity had already been shown in the earlier version of the manuscript and is again represented in Fig. 4S2B. We argue that the low level of cytometric and molecular engraftment at 4 weeks, from haemogenic gastruloid-derived progenitors that have not progressed beyond a stage equivalent to E10 (Fig. 4A and Supplementary File 3 in the revised manuscript from scRNA-seq projections), and that we have described as requiring additional maturation in vivo, is not surprising. Indeed, as previously shown and now repeated in Fig. 2B-E (controls in Fig. 2S1E-G) in the revised manuscript, no more than 7 CD45+CD144+ multipotent cells are present per haemogenic gastruloid. We are only able to implant 3 haemogenic gastruloids in the adrenal gland of each transplanted animal.

      We have rephrased Results and Discussion in lines 359-415 and 588-621, respectively, to rectify the nature of the engraftment, which we now attribute more generically to progenitors, also in light of the developmental time we could capture in the gastruloids prior to implantation.

      Transplanted hGx formed teratoma-like structures, with hematopoietic cells present at the site of transplant only analysed histologically. Indeed, the quality of the images provided does not provide convincing validation that donor-derived hematopoietic cells were present in the grafts.

      As stated in the text, the images are meant to illustrate that the haemogenic gastruloids developed in situ. Further analysis motivated by the Reviewers’ comments, and indeed a subsequent experiment with analysis of engraftment at a later timepoint of 8 weeks (revised Fig. 4E and 4S2F-G), did not show a direct correspondence between engraftment and in vivo development or expansion, although this occurs in some cases. To be clearer, the observation of donor-derived blood cells in the implanted haemogenic gastruloids would not correspond to engraftment, as we have amply demonstrated that they have generated blood cells in vitro. There is no evidence that there are remaining pluripotent cells in the haemogenic gastruloid after 9 days of differentiation, and it is therefore not clear that the structures observed are teratomas. We specifically comment on this point in the revised manuscript – lines 601-607.

      There is no justification for the authors' conclusion that '... the data suggest that 216h hGx generate AGM-like pre-HSC capable of at least short-term multilineage engraftment upon maturation...'. Indeed, this statement is in conflict with previous studies demonstrating that pre-HSCs in the dorsal aorta of the mouse embryo are immature and actually incapable of engraftment.

      We have clearly stated that we do not see haematopoietic engraftment through transplantation of dissociated haemogenic gastruloids, which reach the E10 state containing pre-HSC (revised Fig. 4A, 4S1A and Supplementary File 3). Instead, we observed rare myelo-erythroid (revised Fig. 4S2F-G) and myelo-lymphoid (revised Fig. 4E) engraftment upon in vivo maturation of haemogenic gastruloids with preserved 3D organisation. These statements are not contradictory. Nevertheless, we have now more cautiously attributed engraftment to the presence of progenitors as a generic designation, and not to pre-HSC (lines 412-414 and 588-592 in the revised manuscript).

      The statement '...low-level production of engrafting cells recapitulates their rarity in vivo, in agreement with the embryo-like qualities of the gastruloid system....' is incorrect. Firstly, no evidence has been provided to show the hGx has formed a dorsal aorta facsimile capable of generating cells with engrafting capacity. Secondly, although engrafting cells are rare in the AGM, approximately one per embryo, they are capable of robust and extensive engraftment upon transplantation.

      As indicated above, the statement in lines 412-414 now reads “Engraftment is erythromyeloid at 4 weeks and lympho-myeloid at 8 weeks, reflecting different classes of progenitors, putatively of YS-like and AGM-like affiliation.” To be clear, with our original statement we meant to highlight that the production of definitive AGM-like haematopoietic progenitors (not all of which are engrafting) in haemogenic gastruloids does not correspond to non-physiological single-lineage programming. We did and do not claim that we achieved production of HSC, which would be long-term engrafting.

      (4) Expression of MNX1 transcript and protein in hematopoietic cells in MNX1-rearranged acute myeloid leukaemia (AML) is one cause of AML in infants. In the hGx model of this disease, Mnx1 is overexpressed in the mESCs that are used to form gastruloids. Mnx1 overexpression seems to confer an overall growth advantage on the hGx and increase the serial replating capacity of the small number of hematopoietic cells that are generated. The inefficiency with which the hGx model generates hematopoietic cells makes it difficult to model this disease. The poor quality of the cytospin images prevents accurate identification of cells. The statement that the kit-expressing cells represent leukemic blast cells is not sufficiently validated to support this conclusion. What other stem cell genes are expressed? Surface kit expression also marks mast cells, frequently seen in clonogenic assays of blood cells. Flow cytometric and gene expression analyses using known markers would be required.

      The haemogenic gastruloid model generates haematopoietic and haemato-endothelial cells. MNX1 expands C-Kit+ cells at 144h, which we show to have a haemato-endothelial signature (see revised Fig. 3A-E, Supplementary File 2). We have added additional flow cytometry data showing that the replating cells from MNX1 express CD31 (Figure 6S1A-B).

      Serial replating of CFC assays is a conventional in vitro assay of leukaemia transformation. Critically, colony replating is not maintained in EV control cells, attesting to the transformation potential of MNX1. Although we have not fully traced the cellular hierarchy of MNX1-driven transformation in the haemogenic gastruloid system, the in vitro replating expands a C-Kit+ cell (revised Fig. 6E), which reflects the surface phenotype of the leukaemia, also recapitulated in the mouse model initiated by MNX1-overexpressing FL cells. Importantly, it recapitulates the transcriptional profile of MNX1 leukaemia patients (revised Fig. 7C), which is uniquely expressed by MNX1 144h and replated colony cells, but not by MNX1 216h gastruloid cells, arguing against a generic signature of MNX1 overexpression (revised Fig. 7B). Moreover, the MNX1-transformation of haemogenic gastruloid cells is superior to the FL leukaemia model at capturing the unique transcriptional features of MNX1-driven leukaemia, distinct from other forms of AML in the same age group (Fig. 7S1D-F). It is possible that this corresponds to a pre-leukaemia event, and we will explore this in future studies, which are beyond the proof-of-principle nature of this paper.

      (5) In human infant MNX1 AML, the mutation is thought to arise at the fetal liver stage of development. There is no evidence that this developmental stage is mimicked in the hGx model.

      We never claim that the haemogenic gastruloid model mimics the foetal liver. We propose that susceptibility to MNX1 is at the HE-to-EMP transition. Moreover, and importantly, contrary to the Reviewer’s statement, there is no evidence in the literature that the mutation arises in the foetal liver stage, just that the mutation arises before birth (PMID: 38806630), which is different. In a mouse model of MNX1 overexpression, the authors achieve leukaemia engraftment upon MNX1 overexpression in foetal liver, but not in bone marrow cells (PMID: 37317878). This is in agreement with a vulnerability of embryonic / foetal, but not adult, cells to the MNX1 expression caused by the translocation. However, haematopoietic cells in the foetal liver originate from YS and AGM precursors, so the origin of the MNX1-susceptible cells can be in those locations, rather than the foetal liver itself.

      Reviewer #2 (Public review):

      Summary: 

      In this manuscript, the authors develop an exciting new hemogenic gastruloid (hGX) system, which they claim reproduces the sequential generation of various blood cell types. The key advantage of this cellular system would be its potential to more accurately recapitulate the spatiotemporal emergence of hematopoietic progenitors within their physiological niche compared to other available in vitro systems. The authors present a large set of data and also validate their new system in the context of investigating infant leukemia. 

      Strengths: 

      The development of this new in vitro system for generating hematopoietic cells is innovative and addresses a significant drawback of current in vitro models. The authors present a substantial dataset to characterize this system, and they also validate its application in the context of investigating infant leukemia. 

      Weaknesses: 

      The thorough characterization and full demonstration that the cells produced truly represent distinct waves of hematopoietic progenitors are incomplete. The data presented to support the generation of late yolk sac (YS) progenitors, such as lymphoid cells, and aortic-gonad-mesonephros (AGM)-like progenitors, including pre-hematopoietic stem cells (pre-HSCs), by this system are not entirely convincing. Given that this is likely the manuscript's most crucial claim, it warrants further scrutiny and direct experimental validation. Ideally, the identity of these progenitors should be further demonstrated by directly assessing their ability to differentiate into lymphoid cells or fully functional HSCs. Instead, the authors primarily rely on scRNA-seq data and a very limited set of markers (e.g., Ikzf1 and Mllt3) to infer the identity and functionality of these cells. Many of these markers are shared among various types of blood progenitors, and only a well-defined combination of markers could offer some assurance of the lymphoid and pre-HSC nature of these cells, although this would still be limited in the absence of functional assays.

      The identification of a pre-HSC-like CD45⁺CD41⁻/lo C-Kit⁺VE-Cadherin⁺ cell population is presented as evidence supporting the generation of pre-HSCs by this system, but this claim is questionable. This FACS profile may also be present in progenitors generated in the yolk sac such as early erythromyeloid progenitors (EMPs). It is only within the AGM context, and in conjunction with further functional assays demonstrating the ability of these cells to differentiate into HSCs and contribute to long-term repopulation, that this profile could be strongly associated with pre-HSCs. In the absence of such data, the cells exhibiting this profile in the current system cannot be conclusively identified as true pre-HSCs.

      We present 2 additional pieces of evidence to support our claims that we capture YS and AGM stages of haematopoietic development.

      (I) In the new Figures 4A and 4S1A-C and Supplementary File 3 in the revised manuscript, we project our single-cell RNA-seq data onto (1) developing intra-embryonic pSP and AGM between E8 and E11 (Fig. 4A left, 4S1A) and (2) a single-cell RNA-seq study of HE development which combines haemogenic and haematopoietic cells from the YS, the developing HE and IAHC in the AGM, and FL (Fig. 4A centre, 4S1B). Our data maps to E8.25-E10, and captures YS EMP and erythroid and myeloid progenitors, as well as AGM pre-HE, HE and IAHC, with some cells matching HSPC and LMPP, as suggested by the projection onto the Thambyrajah et al data set (already presented in the previous version of the manuscript, and now in Fig. 4A right and 4S1C). The projection of the scRNA-seq data is presented in lines 314-355 of the revised manuscript. The scRNA-seq data itself was refocused on haemato-endothelial programmes as presented in the revised Fig. 3A-E, described in lines 267-303.

      (II) Given the difficulty in finding markers that specifically associate with AGM haematopoiesis, we inspected the possibility of capturing different regulatory requirements at different stages of gastruloid development mirroring differential effects in the embryo. Polycomb EZH2 is specifically required for EMP differentiation in the YS, but does not affect AGM-derived haematopoiesis; it is also not required for primitive erythroid cells (PMID: 29555646; PMID: 34857757). We treated haemogenic gastruloids from 120h onwards with either DMSO (0.05%) or GSK126 (0.5 µM), and inspected the cellularity of gastruloids at 144h, which we equate with YS-EMP, and 216h, putatively AGM haematopoiesis. We show that EZH2 inhibition / GSK126 treatment specifically reduces %CD41+ cells at 144h, but does not reduce %CD41+ or %CD45+ cells at 216h. We have included this experiment in the manuscript in Fig. 2S2B-C (in text: 209-221).

      These data, together with the scRNA-seq projections described, provide evidence for our claim that 144h haemogenic gastruloids capture YS EMPs, while CD41+ and CD45+ cells isolated at 216h reflect AGM progenitors. We cannot draw conclusions about the functional nature of the AGM cells from this experiment. The main text has been edited to clarify the experiments pertaining to distinguishing AGM and YS haematopoiesis (lines 180-187; 200-221; 268-304; 315-356).

      The engraftment data presented are also not fully convincing, as the observed repopulation is very limited and evaluated only at 4 weeks post-transplantation. The cells detected after 4 weeks could represent the progeny of EMPs that have been shown to provide transient repopulation rather than true HSCs. 

      In the original version of the manuscript, we stated that there is low-level engraftment and did not claim to have generated HSC. Instead, we described cells with short-term engraftment potential. We agree with the Reviewer that the cells we show in the manuscript at 4 weeks could be EMPs (revised Fig. 4B-E and 4S2D-G). Additionally, we now have 8-week analysis of implant recipients, in which we observed multi-lineage engraftment of the recipient bone marrow, again at a low level, in 1 of 3 recipients (revised Fig. 4B-E and 4S2F-H). This engraftment is myeloid-lymphoid and therefore likely to have originated in a later progenitor. To be clear, we do not claim that this corresponds to the presence of HSC. It nevertheless supports the maturation of progenitors with engraftment potential. The limited material was prioritised for flow cytometry staining, which did not allow PCR analysis. We rephrased Results and Discussion in lines 359-414 and 588-621, respectively, to rectify the nature of the engraftment.

      Reviewer #3 (Public review):  

      In this study, the authors employ a mouse ES-derived "hemogenic gastruloid" model which they generated and which they claim to be able to deconvolute YS and AGM stages of blood production in vitro. This work could represent a valuable resource for the field. However, in general, I find the conclusions in this manuscript poorly supported by the data presented. Importantly, it isn't clear what exactly are the "YS" and the "AGM"-like stages identified in the culture and where is the data that backs up this claim. In my opinion, the data in this manuscript lack convincing evidence that can enable us to identify what kind of hematopoietic progenitor cells are generated in this system. Therefore, the statement that "our study has positioned the MNX1-OE target cell within the YS-EMP stage (line 540)" is not supported by the evidence presented in this study. Overall, the system seems to be very preliminary and requires further optimization before those claims can be made.

      Specific comments below: 

      (1) The flow cytometric analysis of gastruloids presented in Figure 1 C-D is puzzling. There is a large % of C-Kit+ cells generated, but few VE-Cad+ Kit+ double positive cells. Similarly, there are many CD41+ cells, but very few CD45+ cells, which one would expect to appear toward the end of the differentiation process if blood cells are actually generated. It would be useful to present this analysis as consecutive gating (i.e. evaluating CD41 and CD45 within VE-Cad+ Kit+ cells, especially if the authors think that the presence of VE-Cad+ Kit+ cells is suggestive of EHT). The quantification presented in D is misleading as the scale of each graph is different.

      Fig. 1C-D provides an overview of haemogenic markers during the timecourse of haemogenic gastruloid differentiation, and does indeed show a late up-regulation of CD45, as the Reviewer points out would be expected. The %CD45+ cells is indeed low. However, we should point out that the haemogenic gastruloid protocol, although biased towards mesodermal outputs, does not aim to achieve pure haematopoietic specification, but rather to place it in its embryo-like context. We refute that the scale is misleading: it is necessary to represent the data in a way that is interpretable by the reader, and we made sure from the outset that the gates (in C) are truly representative and annotated, as are the plot axes (in D). Consecutive gating at the 216h-timepoint is shown and quantified in Fig. 2S1D-F, or in the alternative consecutive gating suggested by the Reviewer, in Author response image 2 below. At the request of Reviewer 1, we also analysed CD31 and CD34 within CD41 and CD45 populations, again as validation of the emergent haematopoietic character of the cells obtained. This new analysis is shown in revised Fig. 2B, quantified in 2C.

      Author response image 2.

      Flow cytometry analysis of VE-cadherin+ cells in haemogenic gastruloids at 216h of the differentiation protocol, probing co-expression of CD45, CD41 and C-Kit.

      (2) The imaging presented in Figure 1E is very unconvincing. C-Kit and CD45 signals appear as speckles and not as membrane/cell surfaces as they should. This experiment should be repeated and nuclear stain (i.e. DAPI) should be included.

      We included the requested immunofluorescence staining in Figure 1E (216h). We also show the earlier timepoint of 192h here as Author response image 3. In text: lines 158-162.

      Author response image 3.

      Confocal images of haematopoietic production in haemogenic gastruloids. Wholemount, cleared haemogenic gastruloids were stained for CD45 (pseudo-coloured red) and C-Kit antigens (pseudo-coloured yellow) with indirect staining, as described in the manuscript. Flk1-GFP signal is shown in green. Nuclei are contrasted with DAPI. (A) 192h. (B) 216h.

      (3) Overall, I am not convinced that hematopoietic cells are consistently generated in these organoids. The authors should sort hematopoietic cells and perform May-Grunwald Giemsa stainings as they did in Figure 6 to confirm the nature of the blood cells generated.

      The data are reproducible and are complemented by functional assays shown in revised Fig. 2D-E, which clearly demonstrate haematopoietic output. The single-cell RNA-seq data also show expression of a haematopoietic programme, which we have complemented with biologically independent qRT-PCR analysis of the expression of key endothelial and haematopoietic marker and regulatory genes (revised Fig. 2F; in text: 200-209). As requested, we include Giemsa-Wright’s stained cytospins obtained at 216h to illustrate haematopoietic output. These are shown in revised Fig. 2S2A, in text: lines 194-199. Inevitably, the cytospins will be inconclusive as to the presence of endothelial-to-haematopoietic transition or the generation of haematopoietic stem/progenitor cells, as these cells do not have a distinctive morphology.

      (4) The scRNAseq in Figure 2 is very difficult to interpret. Specific points related to this: - Cluster annotation in Figure 2a is missing and should be included. 

      Why do the heatmaps show the expression of genes within sorted cells? Couldn't the authors show expression within clusters of hematopoietic cells as identified transcriptionally (which ones are they? See previous point)? Gene names are illegible.

      I see no expression of Hlf or Myb in CD45+ cells (Figure 2G). Hlf is not expressed by any of the populations examined (panels E, F, G). This suggests no MPP or pre-HSC are generated in the culture, contrary to what is stated in lines 242-245 (PMID 31076455 and 34589491). Later on, it is again stated that "hGx cells... lacked detection of HSC genes like Hlf, Gfi1, or Hoxa9" (lines 281-283). To me, this is proof of the absence of AGM-like hematopoiesis generated in those gastruloids.

      For a combination of logistic and technical reasons, we performed single-cell RNA-seq using the Smart-Seq2 platform, which is inherently low throughput. We overcame the issue of cell coverage by complementing whole-gastruloid transcriptional profiling at successive time-points with sorting of subpopulations of cells based on individual markers documented in Fig. 1. We clearly stated which platform was used as well as the number and type of cells profiled (Fig. 3S1 and lines 226-241 of the revised manuscript), and our approach is standard. Following suggestions of the Reviewers to further focus our analysis on the haemogenic cellular differentiation within the gastruloids, we revised the presentation of the scRNA-seq data to now provide UMAP projections with representation and quantification of individual genes, including the ones queried by the Reviewer in Fig. 3 and respective supplements. Specifically, re-clustering and highlighting of specific markers are shown in Figure 3A-D and presented in lines 267-303 of the revised manuscript. Complementary independent real-time quantitative (q)PCR analysis showing time-dependent expression of endothelial and haematopoietic markers is now in Figure 2F. In text: 200-208.

      (5) Mapping of scRNA-Seq data onto the dataset by Thambyrajah et al. is not proof of the generation of AGM HE. The dataset they are mapping to only contains AGM cells, therefore cells do not have the option to map onto something that is not AGM. The authors should try mapping to other publicly available datasets also including YS cells.

      We have done this and the data are presented in Figure 4A (Figure 4S1A) and Supplementary File. In text: 314-355. As detailed in response to Reviewer 1, we have conducted projections of our single-cell RNA-seq data against two studies which (1) capture arterial and haemogenic specification in the para-splanchnopleura (pSP) and AGM region between E8.0 and E11 (Hou et al, PMID: 32203131) (revised Fig. 4A and 4 S1A), and (2) uniquely capture YS, AGM and FL progenitors and the AGM endothelial-to-haematopoietic transition (EHT) in the same scRNA-seq dataset (Zhu et al, PMID: 32392346) (revised Fig. 4A and 4 S1B). Specifically in answering the Reviewers’ point, we show that different subsets of haemogenic gastruloid cells sorted on haemogenic surface markers C-Kit, CD41 and CD45 cluster onto pre-HE and HE, intra-aortic clusters and FL progenitor compartments, and to YS EMP and erythroid and myeloid progenitors. This lends support to our claim that the haemogenic gastruloid system specifies both YS-like and AGM-like cells. Please note that we now do point out that some CD41+ cells at 144h project onto IAC, as do cells at the later timepoints, suggesting that AGM-like and YS-EMP-like waves may overlap at the 144h timepoint (lines…). In the future, we will address the specific location of these cells, but that requires a large-scale spatial transcriptomics analysis with extensive optimisation for section capture, which is beyond the scope of this manuscript and this revision.

      (6) Conclusions in Figure 3, named "hGx specify cells with preHSC characteristics" are not supported by the data presented here. Again, I am not convinced that hematopoietic cells can be efficiently generated in this system, and certainly not HSCs or pre-HSCs.

      We have provided evidence in the original manuscript, and now through additional experiments, that there is haematopoietic specification, including of progenitor cells, in the haemogenic gastruloid system. Molecular markers are shown in revised Fig. 2F and Fig. 3 and supplements; CFC assays are shown in revised Fig. 2D-E; cytospins are in revised Fig. 2 S2A; further analysis of 4-week implants and new analysis of 8-week implants (discussed below) are in revised Fig. 4 B-D and Fig. 4 S2 and we discussed the new scRNA-seq projections above. Importantly, we have never claimed, and again do not, that haemogenic gastruloids generate HSC. We accept the Reviewer’s comment that we have not provided sufficient evidence for the specification of pre-HSC-like cells and accordingly now refer more generically and conservatively to progenitors.

      FACS analysis in 3A is again very unconvincing. I do not think the population identified as C-Kit+ CD144+ is real. Also, why not try gating the other way around, as commonly done (e.g. VE-Cad+ Kit+ and then CD41/CD45)?

      Our gating strategy is not unconventional: gating was performed from the more populated gate onto the less abundant one to ensure that the results are numerically more robust. In the case of haemogenic gastruloids, unlike the AGM preparations the Reviewer may be referring to, CD41+ and CD45+ cells are more abundant as there is no circulation of more differentiated haematopoietic cells away from the endothelial structures. That said, we did perform the gating as suggested (Rev Fig. 2), indeed confirming that most VE-cad+ Kit+ cells are CD45+. Interestingly, VE-cad+ Kit- cells are predominantly CD41+, reinforcing the haematopoietic nature of these cells.
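
      The point about gate order can be illustrated with a toy sketch. It assumes each event is reduced to hypothetical boolean marker calls; the marker labels and the ten synthetic events below are illustrative only and are not data from the manuscript. The sketch shows that reversing the order of consecutive gates changes the size of each parent gate, and therefore the reported percentages, while the final co-expressing population is the same.

```python
# Toy illustration with synthetic events (NOT manuscript data).
# Marker labels are hypothetical stand-ins for flow cytometry calls.
from typing import Dict, List, Tuple

Cell = Dict[str, bool]


def gate(cells: List[Cell], marker: str) -> List[Cell]:
    """Keep only the cells called positive for `marker`."""
    return [c for c in cells if c[marker]]


def consecutive_gating(cells: List[Cell], order: List[str]) -> List[Tuple[str, int, float]]:
    """Apply gates in order; return (marker, count, % of parent) for each step."""
    report = []
    parent = cells
    for marker in order:
        child = gate(parent, marker)
        pct = 100.0 * len(child) / len(parent) if parent else 0.0
        report.append((marker, len(child), pct))
        parent = child
    return report


# Ten synthetic events: 2 triple-positive, 4 CD45-only, 1 VE-cad-only, 3 negative.
cells: List[Cell] = (
    [{"CD45": True, "VEcad": True, "Kit": True}] * 2
    + [{"CD45": True, "VEcad": False, "Kit": False}] * 4
    + [{"CD45": False, "VEcad": True, "Kit": False}] * 1
    + [{"CD45": False, "VEcad": False, "Kit": False}] * 3
)

# Start from the more populated gate, or from the less abundant one.
abundant_first = consecutive_gating(cells, ["CD45", "VEcad", "Kit"])
rare_first = consecutive_gating(cells, ["VEcad", "Kit", "CD45"])

for name, report in (("abundant first", abundant_first), ("rare first", rare_first)):
    for marker, count, pct in report:
        print(f"{name}: {marker}+ = {count} cells ({pct:.1f}% of parent)")
```

      In this toy set, the first parent gate holds 6 events when starting from CD45 and 3 when starting from VE-cad, so percentages computed in the second ordering rest on smaller denominators, although both orderings end on the same 2 triple-positive events.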

      The authors must have tried really hard, but the lack of short- or long-engraftment in a number of immunodeficient mouse models (lines 305-313) really suggests that no blood progenitors are generated in their system. I am not familiar with the adrenal gland transplant system, but it seems like a very non-physiological system for trying to assess the maturation of putative pre-HSCs. The data supporting the engraftment of these mice, essentially seen only by PCR and in some cases with a very low threshold for detection, are very weak, and again unconvincing. It is stated that "BFP engraftment of the Spl and BM by flow cytometry was very low level albeit consistently above control (Fig. S4E)" (lines 337-338). I do not think that two dots in a dot plot can be presented as evidence of engraftment.

      We have presented the data with full disclosure and do not deny that the engraftment achieved is low-level and short-term, indicating incomplete maturation of definitive haematopoietic progenitors in the current haemogenic gastruloid system. Indeed, not wanting to overstate the finding, we were deliberately conservative in our representative flow cytometry plots and focused on the PCR for sensitivity. We now present the full flow cytometry analysis for spleen, where we preserved more cells after the genomic DNA extraction (revised Fig. 4C), and call the Reviewer’s attention to the fact that detection of BFP+ cells by PCR and flow cytometry in the recipient animals is consistent between the 2 methods (revised Fig. 4C and D; full gels previously presented now in Fig. 4S2C; sensitivity analysis was also previously available and is now in Fig. 4S2B). In addition, we have now also been able to detect low-level myelo-lymphoid engraftment in the bone marrow and spleen 8 weeks after adrenal implantation, again suggesting the presence of a small number of definitive haematopoietic progenitors that potentially mature from the 3 haemogenic gastruloids implanted (Fig. 4E and 4 S2F-G in the revised manuscript). We rephrased Results and Discussion at lines 359-414 and 589-621, respectively, to clarify the nature of the engraftment, which we attribute to progenitors.

      (7) Given the above, I find that the foundations needed for extracting meaningful data from the system when perturbed are very shaky at best. Nevertheless, the authors proceed to overexpress MNX1 by LV transduction, a system previously shown to transform fetal liver cells, mimicking the effect of the t(7;12) AML-associated translocation. Comments on this section:

      The increase in the size of the organoid when MNX1 is expressed is a very unspecific finding and not necessarily an indication of any hematopoietic effect of MNX1 OE.

      We agree with the Reviewer on this point; it is nevertheless a reproducible observation which we thought relevant to describe for completeness and data reproducibility.

      The mild increase of cKit+ cells (Figure 4E) at the 144hr timepoint and the lack of any changes in CD41+ or CD45+ cells suggests that the increase in Kit+ cells % is not due to any hematopoietic effect of MNX1 OE. No hematopoietic GO categories are seen in RNA seq analysis, which supports this interpretation. Could it be that just endothelial cells are being generated?

      The Reviewer is correct that the MNX1-overexpressing cells have a strong endothelial signature, which is present in patients (revised Fig. 5A). We investigated a potential link with C-Kit by staining cells from the replating colonies during the process of in vitro transformation with CD31. We observed that 40-50% of C-Kit+ cells (20-30% total colony cells) co-expressed CD31, at least at early plating. These cells co-exist with haematopoietic cells, namely Ter119+ cells, as expected from the YS-like erythroid and EMP-like affiliation of haematopoietic output from 144h-haemogenic gastruloids. These data are included in Fig. 6S1A-B (in text 506-507) of the revised manuscript.

      (8) There seems to be a relatively convincing increase in replating potential upon MNX1-OE, but this experiment has been poorly characterized. What type of colonies are generated? What exactly is the "proportion of colony forming cells" in Figures 5B-D? The colony increase is accompanied by an increase in Kit+ cells; however, the flow cytometry analysis has not been quantified.

      Given the inability to replate control EV cells, there is not a population to compare with in terms of quantification. The level of C-Kit+ represented in Fig. 6E of the revised manuscript is achieved at plate 2 or 3 (depending on the experiment), both of which are significantly enriched for colony-forming cells relative to control (revised Fig. 6B, D).  

      (9) Do hGx cells engraft upon MNX1-OE? This experiment, which appears not to have been performed, is essential to conclude that leukemic transformation has occurred.

      For the purpose of this study, we are satisfied with confirmation of in vitro transformation potential of MNX1 haemogenic gastruloids, which can be used for screening purposes. Although interesting, in vivo leukaemia engraftment from haemogenic gastruloids is beyond the scope of this study.

      Reviewer #2 (Recommendations for the authors):

      (1) Minor comments

      (a) I find the denomination "hGx" very confusing as it would suggest that these gastruloids are human, whereas, in fact, they are murine.

      We agree with the Reviewer that the nomenclature was confusing and have edited the manuscript to use “haemGx” instead.

      (b) I find the presence of mast cells in CFC of MNX1-OE cultures very puzzling as this does not bear any resemblance to human leukemia.

      We detect an enrichment of mast cell transcriptional programmes, as defined by the cell type repositories. While mast cells do not represent leukaemic cells in patients, this ontology is likely to reflect the developmental stage and origin of progenitors which are affected by MNX1.

      (2) I have a few suggestions to improve figures and tables clarity, to help readers better follow the data presented.

      (a) To enhance readability, it would be beneficial to highlight the genes mentioned in the text within the scRNA-seq figures. Many figures currently display over 30-40 genes in small font sizes, making it difficult to quickly locate specific genes discussed in the text. Additionally, implementing a color-coding system to categorize these genes according to their proposed lineages would improve clarity and organization.

      We have now performed major re-organisation and re-analyses of the scRNA-seq data, which we believe has improved the readability and clarity of the corresponding sections of the manuscript.

      (b) The data presented in Supplementary Table 1, along with other supplementary tables, are challenging to interpret due to insufficient annotations. Enhancing these tables with clearer and more detailed annotations would significantly improve clarity and aid readers in understanding the supplementary materials.

      Descriptive text has been added to accompany each Supplementary File to aid in understanding the results reported therein.

      Reviewer #3 (Recommendations for the authors):

      In addition to what was written in the public review, I would suggest the authors simplify and shorten the text. Currently, a lot of unnecessary detail is included which makes the story very hard to follow. Moreover, the authors should modify the figures to make them more comprehensible, especially for RNA-seq data.

      We have significantly re-arranged and shortened parts of the manuscript, particularly by focusing the Discussion. Results presentation has also been improved through additional analysis and graphic representation of the scRNA-seq data, which we believe has improved the readability and clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #2 (Public review)

      In this manuscript, Weiguang Kong et al. investigate the role of immunoglobulin M (IgM) in antiviral defense in the teleost largemouth bass (Micropterus salmoides). The study employs an IgM depletion model, viral infection experiments, and complementary in vitro assays to explore the role of IgM in systemic and mucosal immunity. The authors conclude that IgM is crucial for both systemic and mucosal antiviral defense, highlighting its role in viral neutralization through direct interactions with viral particles. The study's findings have theoretical implications for understanding immunoglobulin function across vertebrates and practical relevance for aquaculture immunology.

      Strengths:

      The manuscript applies multiple complementary approaches, including IgM depletion, viral infection models, and histological and gene expression analyses, to address an important immunological question. The study challenges established views that IgT is primarily responsible for mucosal immunity, presenting evidence for a dual role of IgM at both systemic and mucosal levels. If validated, the findings have evolutionary significance, suggesting the conserved role of IgM as an antiviral effector across jawed vertebrates for over 500 million years. The practical implications for vaccine strategies targeting mucosal immunity in fish are noteworthy, addressing a key challenge in aquaculture.

      Weaknesses:

      Several conceptual and technical issues undermine the strength of the evidence:

      Monoclonal Antibody (MoAb) Validation: The study relies heavily on a monoclonal antibody to deplete IgM, but its specificity and functionality are not adequately validated. The epitope recognized by the antibody is not identified, and there is no evidence excluding cross-reactivity with other isotypes. Mass spectrometry, immunoprecipitation, or Western blot analysis using tissue lysates with varying immunoglobulin expression levels would strengthen the claim of IgM-specific depletion.

      IgM Depletion Kinetics: The rapid depletion of IgM from serum and mucus (within one day) is unexpected and inconsistent with prior literature. Additional evidence, such as Western blot analyses comparing treated and control fish, is necessary to confirm this finding.

      Novelty of Claims: The manuscript claims a novel role for IgM in viral neutralization, despite extensive prior literature demonstrating this role in fish. This overstatement detracts from the contribution of the study and requires a more accurate contextualization of the findings.

      Support for IgM's Crucial Role: The mortality data following IgM depletion do not fully support the claim that IgM is indispensable for antiviral defense. The survival of IgM-depleted fish remains high (75%) compared to non-primed controls (~50%), suggesting that other immune components may compensate for IgM loss.

      Presentation of IgM Depletion Model: The study describes the IgM depletion model as novel, although similar models have been previously published (e.g., Ding et al., 2023). This should be clarified to avoid overstating its novelty.

      While the manuscript attempts to address an important question in teleost immunology, the current evidence is insufficient to fully support the authors' conclusions. Addressing the validation of the monoclonal antibody, re-evaluating depletion kinetics, and tempering claims of novelty would strengthen the study's impact. The findings, if rigorously validated, have important implications for understanding the evolution of vertebrate immunity and practical applications in fish health management.

      This work is of interest to immunologists, evolutionary biologists, and aquaculture researchers. The methodological framework, once validated, could be valuable for studying immunoglobulin function in other non-model organisms and for developing targeted vaccine strategies. However, the current weaknesses limit its broader applicability and impact.

      We would like to thank the Reviewer for the helpful comments. As the reviewer suggested, we verified the specificity of the anti-bass IgM MoAb using multiple well-established experimental approaches, including mass spectrometry analysis, western blot, flow cytometry, and in vivo IgM depletion models. Additionally, we included western blot analyses to further confirm the IgM depletion kinetics. Moreover, we carefully revised any overstated claims in the original manuscript and incorporated the valuable suggestions of the reviewer in the Introduction and Discussion sections to enhance the clarity and rigor of our work.

      Reviewer #1 (Recommendations for the authors):

      (1) Experiments and Data Validation:

      Monoclonal Antibody Validation:

      Provide detailed validation of the monoclonal antibody (MoAb) used for IgM depletion. Perform immunoprecipitation followed by mass spectrometry to confirm the specificity of the MoAb and identify any off-target interactions. Conduct Western blot analysis using tissue lysates with varying IgM, IgT, and IgD expression to demonstrate specificity. Include controls, such as a group treated with a control antibody of the same isotype, to confirm the depletion specificity and effects. Present data on the binding site of the MoAb and confirm it targets IgM.

      We thank the reviewer for this constructive comment and have carried out a comprehensive validation of anti-bass IgM monoclonal antibody (MoAb).

      Validation of anti-bass IgM MoAb by Mass Spectrometry

      To validate the specificity of the anti-bass IgM MoAb, target proteins were immunoprecipitated from bass serum using IgM MoAb-coupled CNBr-activated Sepharose 4B beads, followed by mass spectrometry analysis to verify exclusive IgM heavy-chain identification (Figure 3–figure supplement 1A). Quantitative mass spectrometry verified the antibody’s specificity, with IgM heavy-chain peptides representing 97.3% of total signal, indicating negligible off-target reactivity. This high target specificity was further supported by the absence of detectable cross-reactivity to IgT/IgD (Figure 3–figure supplement 1B). Moreover, the 72% sequence coverage (Figure 3–figure supplement 1C) and confirmed LC-MS/MS spectra of IgM peptides (Figure 3–figure supplement 1D) further validated target selectivity.

      Validation of anti-bass IgM MoAb by western blot and flow cytometry

      We compared the anti-bass IgM MoAb with an isotype control (mouse IgG1) in serum immunoblots under both non-reducing and reducing conditions. The western blot results showed that the developed MoAb bound specifically to IgM in largemouth bass serum. Owing to the structural diversity of fish IgM isoforms, denatured non-reducing electrophoresis typically yields multiple bands with varying molecular weights (Rombout et al., 1993; Ye et al., 2010). Immunoblot analysis revealed multiple bands with varying molecular weights under non-reducing conditions, with the main band ranging from 700 to 800 kDa, and a distinct ~70 kDa band under reducing conditions (Figure 3–figure supplement 2A). Notably, the isotype control showed no detectable bands under either non-reducing or reducing conditions (Figure 3–figure supplement 2A). Additionally, we analyzed tissue lysates from various sources (i.e., spleen, skin, gill, and gut) and observed consistently recognized bands at identical positions and sizes, whereas the isotype control showed no detectable bands (Figure 3–figure supplement 2B-F).

      Next, we performed flow cytometry analysis to confirm antibody specificity. In largemouth bass head kidney leukocytes, IgM+ B cells accounted for 28.56% of the population, compared to only 0.41% for the isotype control (Figure 3–figure supplement 2G). Following flow sorting of negative and positive cell populations, we extracted RNA from equal cell numbers. Gene expression analysis revealed high expression of IgM and IgD in the positive population, while IgT and T cell markers were absent (Figure 3–figure supplement 2H and I). These results collectively demonstrate that the monoclonal antibody specifically targets largemouth bass IgM.

      Validation of the depletion specificity and effects using an isotype-matched control antibody

      Largemouth bass (~3 to 5 g) were intraperitoneally injected with 300 µg of mouse anti-bass IgM monoclonal antibody (MoAb, clone 66, IgG1) or an isotype control (mouse IgG1, Abclonal, China). The concentration of IgM in the serum and gut mucus from these MoAb-treated fish was measured by western blot. Our results indicated that anti-bass IgM treatment led to a marked reduction in IgM protein levels in serum (Author response image 1A) and gut mucus (Author response image 1B) from day 1 post-treatment, in contrast to control fish treated with an isotype-matched control antibody.

      Author response image 1.

      Validation of the depletion specificity and effects using an isotype-matched control antibody. (A, B) Depletion of IgM from the serum (A) or gut mucus (B) of control or IgM‐depleted fish was detected by western blot. Iso: Isotype group; Dep: IgM‐depleted group.

      We fully agree with the reviewer that epitope characterization would further validate and elucidate the specificity of the IgM MoAb. In the present study, we have demonstrated the antibody's IgM-specific binding through multiple classic experimental methods: (1) mass spectrometry analysis, (2) western blot analysis, (3) flow cytometry analysis, and (4) in vivo IgM depletion models. These results collectively support the conclusion that our MoAb specifically targets IgM. We feel that conformational epitope mapping, which requires structural biology approaches, is beyond the scope of this work, although future studies should address it in detail.

      Kinetics of IgM Depletion:

      Provide additional evidence for the observed rapid depletion of IgM from serum and mucus within one day, as this is inconsistent with previous findings. Include Western blot results to confirm IgM depletion kinetics.

      We thank the reviewer for the suggestion. Previous studies have demonstrated significant differences in the depletion efficiency and persistence of IgM+ B cells between warm-water and cold-water fish species. In Nile tilapia (Oreochromis niloticus), a warm-water species, administration of 20 µg of anti-IgM antibody resulted in a near-complete depletion of IgM+ B cells within 9 days (Li et al., 2023). In contrast, rainbow trout (Oncorhynchus mykiss), a cold-water species, required significantly higher doses (200–300 µg) to achieve similar depletion, which persisted in both blood and gut from week 1 up until week 9 post-depletion treatment (Ding et al., 2023). In this study, we investigated largemouth bass (Micropterus salmoides), a warm-water freshwater species. Administration of 300 µg of anti-IgM antibody resulted in rapid IgM+ B cell depletion from serum and mucus within one day, indicating that the rapid depletion kinetics may be attributed to the combined effects of the elevated antibody dose and the species-specific immunological characteristics. Moreover, we provide a western blot analysis of serum and mucus after IgM depletion, as shown in Figure 5–figure supplement 1G and H.

      Neutralizing Capacity Assays:

      Discuss the potential role of complement or other serum/mucus factors in the neutralization assays. Consider performing neutralization assays that isolate viruses, antibody, and target cells to assess the specific role of IgM.

      We thank the reviewer for the insightful suggestion regarding the potential influence of complement and other serum/mucus factors in our neutralization assays. We sincerely regret that the lack of clarity in our methodological description caused a misunderstanding. In fact, prior to performing the virus neutralization assays, serum and mucus samples were heat-inactivated at 56 °C to eliminate potential complement interference. We have now added a description of the heat-inactivation of serum and mucus samples to the revised manuscript (Lines 727-729). Moreover, our results showed that selective IgM depletion from high LMBV-specific IgM titer mucus and serum samples resulted in significantly increased viral loads and enhanced cytopathic effects (CPE), while no significant difference was observed compared to the control group (shown in Figure 6 of the manuscript).

      To further rule out complement or other factors, we purified IgM from serum and gut mucus of 42DPI-S fish for neutralization assays. Briefly, anti-bass IgM MoAb was coupled to CNBr-activated Sepharose 4B beads and used for purification of IgM from both serum and gut mucus of 42DPI-S fish. After that, 100 µL of LMBV (1 × 10⁴ TCID₅₀) in MEM was incubated with PBS and purified IgM (100 µg/mL) at 28 °C for 1 hour and then the mixtures were applied to infect EPC cells. Medium or bass IgM was added to EPC cells as controls. We added the new text to the Materials and methods of the revised manuscript (Lines 735-741). Our results showed a significant reduction in both LMBV-MCP gene expression and protein levels in EPC cells treated with purified IgM from serum (Figure 6–figure supplement 2A, C, and D) or gut mucus (Figure 6–figure supplement 2B, E, and F). Moreover, significantly lower CPE was observed in the IgM-treated group, while no CPE was observed in the medium and bass IgM groups (Figure 6–figure supplement 2G). Collectively, these findings strongly suggest that the neutralization process is a potential mechanism of IgM, serving as a key molecule in adaptive immunity against viral infection. We have incorporated these new findings in the Results section of the revised manuscript (Lines 382-388).

      IgT Depletion Model:

      To fully establish the role of IgM and IgT in antiviral defense, consider including an experimental group where IgT is depleted.

      We thank the reviewer for the suggestion. The role of IgT in mucosal antiviral immunity in teleost fish has been reported in our previous studies (Yu et al., 2022). However, this study primarily investigates the antiviral function of IgM in systemic and mucosal immunity and further analyzes the mechanisms of viral neutralization. In future research, we plan to establish an IgT and IgM double-depletion/knockout model to further elucidate their specific roles in antiviral immune defense.

      (2) Writing and Presentation:

      Introduction:

      Replace the cited review article on IgT absence with original research articles (e.g., Bradshaw et al., 2020; Györkei et al., 2024) to strengthen the context.

      Thank you for your valuable suggestion. We have changed the text in the revised manuscript (Lines 45-50) to “Notably, while IgT has been identified in the majority of teleost species, genomic analyses reveal its absence in some species, such as medaka (Oryzias latipes), channel catfish (Ictalurus punctatus), Atlantic cod (Gadus morhua), and turquoise killifish (Nothobranchius furzeri) (Bengtén et al., 2002; Bradshaw et al., 2020; Magadán-Mompó et al., 2011; Györkei et al., 2024).”

      Highlight the evolutionary contrast between the presence of the J chain in older cartilaginous fishes and amphibians and its loss in teleosts. Relevant references include Hagiwara et al., 1985, and Hohman et al., 2003.

      Thank you for your valuable suggestion. We have added the relevant description in the revised manuscript (Lines 61-66): “Interestingly, the assembly mechanism of IgM exhibits significant evolutionary variation across vertebrate lineages. In cartilaginous fishes and tetrapods, IgM is secreted as a J chain-linked pentamer, which may enhance multivalent antigen recognition (Hagiwara et al., 1985; Hohman et al., 2003). By contrast, teleosts have undergone J chain gene loss, resulting in the stable of tetrameric IgM formation (Bromage et al., 2004).”

      Acknowledge prior studies demonstrating the viral neutralization role of teleost IgM (e.g., Castro et al., 2021; Chinchilla et al., 2013). Avoid overstating the novelty of findings.

      Thanks for the reviewer’s suggestion. Here, we revised the related description: “More crucially, our study provides further insight into the role of sIgM in viral neutralization and firstly clarified the mechanism through which teleost sIgM blocks viral infection by directly targeting viral particles. From an evolutionary perspective, our findings indicate that sIgM in both primitive and modern vertebrates follows conserved principles in the development of specialized antiviral immunity.” in the revised manuscript (Lines 20-25) and “To the best of our knowledge, our study provides new insights into the role of sIgM in viral neutralization, suggesting a potential function of sIgM in combating viral infections.” in the revised manuscript (Lines 536-538).

      Clarify terms such as "primitive IgM" and avoid misleading evolutionary language (e.g., VLRs are not "candidates"; they mediate adaptive responses).

      Thanks for the reviewer’s suggestion. We changed the description of primitive IgM to “From an evolutionary perspective, our findings indicate that sIgM in both primitive and modern vertebrates follows conserved principles in the development of specialized antiviral immunity.” in the revised manuscript (Lines 23-25) and “our findings suggest that sIgM in both primitive and modern vertebrates utilize conserved mechanisms in response to viral infections” in the revised manuscript (Lines 574-575). Moreover, we deleted the description of VLRs as "candidates" and rewrote the relevant sentence in the revised manuscript (Lines 37-39) as “Agnathans, the most ancient vertebrate lineage, do not possess bona fide Ig but have variable lymphocyte receptors (VLRs) capable of mediating adaptive immune responses (Flajnik, 2018).”

      Results and Discussion:

      Address inconsistencies between data and claims, such as the statement that IgM plays a "crucial role" in protection against LMBV, which is not fully supported by mortality data.

      Thank you for your insightful comment. We have carefully reviewed our data and revised the language throughout the manuscript to ensure that our claims are fully consistent with the mortality data. We have changed the description “IgM plays a crucial role in protection against LMBV” to “plays a role” (Line 119), “sIgM participates in” (Line 127), and “contributes to immune protection” (Line 507) to more accurately reflect the mortality data.

      Revise the model in Figure 8 to reflect the concerns raised regarding proliferation data, the role of IgM in protective resistance, and the potential contributions of complement in neutralization assays.

      Thank you for your insightful comment. We have added the raised concerns regarding “the viral proliferation data and the role of IgM in protective resistance” in Figure 8 (shown below). Meanwhile, we added relevant descriptions in the figure legends of the revised manuscript (Lines 587-592) as “Upon secondary LMBV infection, plasma cells produce substantial quantities of LMBV-specific IgM. Critically, these virus-specific sIgM from both mucosal and systemic sources has the ability to neutralize the virus by directly binding viral particles and blocking host cell entry, thereby effectively reducing the proliferation of viruses within tissues. Consequently, the IgM-mediated neutralization confers protection against LMBV-induced tissue damage and significantly reduced mortality during secondary infection.”

      However, we did not add the potential function of complement in neutralization to Figure 8, for two reasons: (1) heat-inactivation of serum and mucus samples at 56°C prior to neutralization assays effectively abolished complement activity, and (2) purified IgM from both serum and gut mucus demonstrated comparable neutralization capacity, confirming IgM-dependent mechanisms independent of complement.

      Provide a comparative analysis with other vertebrate models to strengthen the evolutionary implications of findings.

      Thank you for your insightful comment. We have added comparative analyses across additional vertebrate models in the discussion of the revised manuscript to enhance the evolutionary perspective of our findings. The details are as follows:

      “Virus-specific IgM production has been well-documented in reptiles, birds, and mammals upon viral infection (Dascalu et al., 2024; Harrington et al., 2021; Hetzel et al., 2021; Neul et al., 2017). While current evidence confirms the capacity of cartilaginous fish and amphibians to mount specific IgM responses against bacterial pathogens and immune antigens (Dooley and Flajnik, 2005; Ramsey et al., 2010), the potential for viral induction of analogous IgM-mediated immunity in these species remains unresolved.” in the revised manuscript (Lines 498-504) and “Extensive studies in endotherms (birds and mammals) have demonstrated that specific IgM contributes to viral resistance by neutralizing viruses (Baumgarth et al., 2000; Diamond et al., 2013; Ku et al., 2021; Hagan et al., 2016; Singh et al., 2022). In contrast, the neutralizing activity of IgM in amphibians and reptiles remains largely unexplored. Although viral infections have been shown to induce neutralizing antibodies in Chinese soft-shelled turtles (Pelodiscus sinensis) (Nie and Lu, 1999), the specific Ig isotypes mediating this response have yet to be elucidated. In teleost fish, IgM has been shown to possess viral neutralizing activity similar to that observed in endotherms (Castro et al., 2013; Ye et al., 2013). Furthermore, our recent work demonstrated that secretory IgT (sIgT) in rainbow trout (Oncorhynchus mykiss) can neutralize viruses, significantly reducing susceptibility to infection (Yu et al., 2022). However, whether IgM in teleost fish possesses the antiviral neutralizing capacity necessary for fish to resist reinfection remains poorly understood.” in the revised manuscript (Lines 521-534).

      Include a description of the Western blot procedure shown in Figures 7D and 7F in the Methods section.

      Thank you for your suggestion. A detailed protocol for the western blot experiments presented in Figures 7D and 7F has been added to the Methods section (Western Blot Analysis) in the revised manuscript (Lines 684-687). The details are as follows: Gut mucus, serum, and cell samples were analyzed by western blot as described by Yu et al. (2022). Briefly, the samples were separated using 4%–15% SDS-PAGE Ready Gel (Thermo Fisher Scientific, USA) and subsequently transferred to Sequi-Blot polyvinylidene fluoride (PVDF) membranes (Bio-Rad, USA). The membranes were blocked with 8% skim milk for 2 hours and then incubated with monoclonal antibody (MoAb). For IgM concentration detection, the membranes were incubated with mouse anti-bass IgM MoAb (clone 66, IgG1, 1 μg/mL) and then with HRP goat-anti-mouse IgG (Invitrogen, USA) for 1 hour. IgM concentrations were determined by comparing the signal strength values to a standard curve generated with known amounts of purified bass IgM. For neutralizing effect detection, the membranes were incubated with mouse anti-LMBV MCP MoAb (4A91E7, 1 μg/mL) followed by incubation with HRP goat-anti-mouse IgG (Invitrogen, USA) for 1 hour. β-actin was used as a reference protein to standardize the differences between samples. Immunoblots were scanned using the GE Amersham Imager 600 (GE Healthcare, USA) with ECL solution (EpiZyme, China).
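
      The standard-curve step described above can be illustrated with a minimal sketch. It assumes a linear relationship between band signal and IgM amount, which the response does not state; the readings, units, and function names are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amount_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate the amount that gives `signal`."""
    return (signal - intercept) / slope


# Hypothetical standards: known amounts of purified IgM (assumed micrograms)
# and their band intensities (arbitrary units). Not data from the study.
standard_amounts = [0.5, 1.0, 2.0, 4.0]
standard_signals = [1100.0, 2100.0, 4100.0, 8100.0]

slope, intercept = fit_line(standard_amounts, standard_signals)
# Estimate the amount in an unknown sample from its band intensity.
estimated = amount_from_signal(5100.0, slope, intercept)
print(estimated)
```

      In practice the unknown signal should fall inside the range spanned by the standards, since extrapolating beyond the curve is less reliable.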

      Ensure all figures are labeled appropriately (e.g., replace "Morality" with "Mortality" in Figure 5A).

      Thanks for bringing this to our attention. We have corrected the label in Figure 5A (shown below) and reviewed all figures to ensure that they are appropriately labeled.

      (3) Minor Corrections:

      Line 117: Correct the typo "across both both."

      Thanks for bringing this to our attention. We have changed “across both both” to “across both” in the revised manuscript (Line 119).

      Line 203: Revise to "IgM plays a role (not crucial role)."

      Thank you for your valuable suggestion. We have modified the description of IgM's role from “crucial” to “plays a role” to better align with our experimental findings in the revised manuscript (Line 202).

      Line 684: Correct the typo "given an intravenous injection with 200 μg."

      Thanks for bringing this to our attention. We have corrected the phrase to “given an intravenous injection with 200 μg” in the revised manuscript (Lines 700-701).

      Line 686: Fix the sentence fragment "previously. EdU+ cells."

      Thank you for your careful review. We have revised the sentence fragment for clarity in the revised manuscript (Lines 702-703).

      Abstract and other sections: Adjust language to remove claims of novelty unsupported by data, particularly regarding the role of IgM in viral neutralization.

      Thank you for your constructive feedback. We have thoroughly reviewed and revised the language throughout the abstract and other sections to remove any unsupported claims of novelty, particularly regarding the role of IgM in viral neutralization in the revised manuscript (Lines 20-25).

      (4) Technical Details:

      Verify data availability, including raw data and analysis scripts, in line with eLife's data policies. Include detailed descriptions of all methods, particularly those involving Western blot analysis and antibody validation.

      Thank you for your suggestion. We have added a data availability statement to the revised manuscript (Lines 808-811): “The raw RNA sequencing data have been deposited in the NCBI Sequence Read Archive under BioProject accession number PRJNA1254665. The mass spectrometry proteomics data have been deposited to the iProX platform with the dataset identifier IPX0011847000.”

      (5) Ethical and Policy Adherence:

      Confirm compliance with ethical standards for animal use and antibody development. Ensure proper citation of all referenced works and accurate reporting of prior findings.

      Thank you for your valuable comment. We confirm that our study fully complies with ethical standards for animal use and antibody development. Additionally, we have carefully reviewed the manuscript to ensure that all referenced works are properly cited and that prior findings are accurately reported.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Overall, the conclusions of the paper are mostly supported by the data but may be overstated in some cases, and some details are also missing or not easily recognizable within the figures. The provision of additional information and analyses would be valuable to the reader and may even benefit the authors' interpretation of the data. 

      We thank the reviewer for the thoughtful and constructive feedback. We are pleased that the reviewer found the overall conclusions of our paper to be well supported by the data, and we appreciate the suggestions for improving figure clarity and interpretive accuracy. Below, we address each point with corresponding revisions.

      The conclusion that DREADD expression gradually decreases after 1.5-2 years is only based on a select few of the subjects assessed; in Figure 2, it appears that only 3 hM4Di cases and 2 hM3Dq cases are assessed after the 2-year timepoint. The observed decline appears consistent within the hM4Di cases, but not for the hM3Dq cases (see Figure 2C: the AAV2.1-hSyn-hM3Dq-IRES-AcGFP line is increasing after 2 years.) 

      We agree that our interpretation should be stated more cautiously, given the limited number of cases assessed beyond the two-year timepoint. In the revised manuscript, we have clarified in the Results that the observed decline is based on a subset of animals. We have also included a text stating that while a consistent decline was observed in hM4Di-expressing monkeys, the trajectory for hM3Dq expression was more variable with at least one case showing an increased signal beyond two years.

      Revised Results section:

      Lines 140, “hM4Di expression levels remained stable at peak levels for approximately 1.5 years, followed by a gradual decline observed in one case after 2.5 years, and after approximately 3 years in the other two cases (Figure 2B, a and e/d, respectively). Compared with hM4Di expression, hM3Dq expression exhibited greater post-peak fluctuations. Nevertheless, it remained at ~70% of peak levels after about 1 year. This post-peak fluctuation was not significantly associated with the cumulative number of DREADD agonist injections (repeated-measures two-way ANOVA, main effect of activation times, F<sub>(1,6)</sub> = 5.745, P = 0.054). Beyond 2 years post-injection, expression declined to ~50% in one case, whereas another case showed an apparent increase (Figure 2C, c and m, respectively).”

      Given that individual differences may affect expression levels, it would be helpful to see additional labels on the graphs (or in the legends) indicating which subject and which region are being represented for each line and/or data point in Figure 1C, 2B, 2C, 5A, and 5B. Alternatively, for Figures 5A and B, an accompanying table listing this information would be sufficient. 

      We thank the reviewer for these helpful suggestions. In response, we have revised the relevant figures (Fig. 1C, 2B, 2C, and 5) as noted in the “Recommendations for the authors”, including simplifying visual encodings and improving labeling. We have also updated Table 2 to explicitly indicate the animal ID and brain regions associated with each data point shown in the figures.

      While the authors comment on several factors that may influence peak expression levels, including serotype, promoter, titer, tag, and DREADD type, they do not comment on the volume of injection. The range in volume used per region in this study is between 2 and 54 microliters, with larger volumes typically (but not always) being used for cortical regions like the OFC and dlPFC, and smaller volumes for subcortical regions like the amygdala and putamen. This may weaken the claim that there is no significant relationship between peak expression level and brain region, as volume may be considered a confounding variable. Additionally, because of the possibility that larger volumes of viral vectors may be more likely to induce an immune response, which the authors suggest as a potential influence on transgene expression, not including volume as a factor of interest seems to be an oversight. 

      We thank the reviewer for raising this important issue. We agree that injection volume could act as a confounding variable, particularly since larger volumes were used only in handheld cortical injections. This overlap makes it difficult to disentangle the effect of volume from those of brain region or injection method. Moreover, data points associated with these larger volumes also deviated when volume was included in the model.

      To address this, we performed a separate analysis restricted to injections delivered via microinjector, where a comparable volume range was used across cases. In this subset, we included injection volume as an additional factor in the model and found that volume did not significantly impact peak expression levels. Instead, the presence of co-expressed protein tags remained a significant predictor, while viral titer no longer showed a significant effect. These updated results have replaced the originals in the revised Results section and in the new Figure 5. We have also revised the Discussion to reflect these updated findings.

      The authors conclude that vectors encoding co-expressed protein tags (such as HA) led to reduced peak expression levels, relative to vectors with an IRES-GFP sequence or with no such element at all. While interesting, this finding does not necessarily seem relevant for the efficacy of long-term expression and function, given that the authors show in Figures 1 and 2 that peak expression (as indicated by a change in binding potential relative to non-displaced radioligand, or ΔBPND) appears to taper off in all or most of the constructs assessed. The authors should take care to point out that the decline in peak expression should not be confused with the decline in longitudinal expression, as this is not clear in the discussion; i.e. the subheading, "Factors influencing DREADD expression," might be better written as, "Factors influencing peak DREADD expression," and subsequent wording in this section should specify that these particular data concern peak expression only. 

      We appreciate this important clarification. In response, we have revised the title to "Protein tags reduce peak DREADD expression levels" in the Results section and “Factors influencing peak DREADD expression levels” in the Discussion section. Additionally, we specified that our analysis focused on peak ΔBP<sub>ND</sub> values around 60 days post-injection. We have also explicitly distinguished these findings from the later-stage changes in expression seen in the longitudinal PET data in both the Results and Discussion sections.

      Reviewer #1 (Recommendations for the authors):

      (1) Will any of these datasets be made available to other researchers upon request?

      All data used to generate the figures have been made publicly available via our GitHub repository (https://github.com/minamimoto-lab/2024-Nagai-LongitudinalPET.git). This has been stated in the "Data availability" section in the revised manuscript.

      (2) Suggested modifications to figures:

      a) In Figures 2B and C, the inclusion of "serotype" as a separate legend with individual shapes seems superfluous, as the serotype is also listed as part of the colour-coded vector

      We agree that the serotype legend was redundant since this information is already included in the color-coded vector labels. In response, we have removed the serotype shape indicators and now represent the data using only vector-construct-based color coding for clarity in Figure 2B and C.

      b) In Figures 3A and B, it would be nice to see tics (representing agonist administration) for all subjects, not just the two that are exemplified in panels C-D and F-H. Perhaps grey tics for the non-exemplified subjects could be used.

      In response, we have included black and white ticks to indicate all agonist administrations across all subjects in Figure 3A and B, with the type of agonist clearly specified.

      c) In Figure 4C, a Nissl-stained section is said to demonstrate the absence of neuronal loss at the vector injection sites. However, if the neuronal loss is subtle or widespread, this might not be easily visualized by Nissl. I would suggest including an additional image from the same section, in a non-injected cortical area, to show there is no significant difference between the injected and non-injected region.

      To better demonstrate the absence of neuronal loss at the injection site, we have included an image from the contralateral, non-injected region of the same section for comparison (Fig. 4C).

      d) In Figure 5A: is it possible that the hM3Dq construct with a titer of 5×10^13 gc/ml is an outlier, relative to the other hM3Dq constructs used?

      We thank the reviewer for raising this important observation. To evaluate whether the high-titer construct represented a statistical outlier that might artifactually influence the observed trends, we performed a permutation-based outlier analysis. This assessment identified the point in question, as well as one additional case (titer 4.6 × 10^13 gc/ml, #255, L_Put), as significant outliers relative to the distribution of the dataset.

      Accordingly, we excluded these two data points from the analysis. Importantly, this exclusion did not meaningfully alter the overall trend or the statistical conclusions. Specifically, the significant effect of co-expressed protein tags on peak expression levels remains robust. We have updated the Methods section to describe this outlier handling and added a corresponding note in the figure legend.

      Reviewer #2 (Public review): 

      Weaknesses 

      This study is a meta-analysis of several experiments performed in one lab. The good side is that it combined a large amount of data that might not have been published individually; the downside is that all things were not planned and equated, creating a lot of unexplained variances in the data. This was yet judiciously used by the authors, but one might think that planned and organized multicentric experiments would provide more information and help test more parameters, including some related to inter-individual variability, and particular genetic constructs. 

      We thank the reviewer for bringing this important point to our attention. We fully acknowledge that the retrospective nature of our dataset, compiled from multiple studies conducted within a single laboratory, introduces variability related to differences in injection parameters and scanning timelines. While this reflects the practical realities and constraints of long-term NHP research, we agree that more standardized and prospectively designed studies would better control such sources of variance. To address this, we have added the following statement to the "Technical consideration" section in Discussion:

      Lines 297, "This study included a retrospective analysis of datasets pooled from multiple studies conducted within a single laboratory, which inherently introduced variability across injection parameters and scan intervals. While such an approach reflects real-world practices in long-term NHP research, future studies, including multicenter efforts using harmonized protocols, will be valuable for systematically assessing inter-individual differences and optimizing key experimental parameters."

      Reviewer #2 (Recommendations for the authors):

      I just have a few minor points that might help improve the paper:

      (1) Figure 1C y-axis label: should add deltaBPnd in parentheses for clarity.

      We have added “ΔBP<sub>ND</sub>” to the y-axis label for clarity.

      The choice of a sigmoid curve is the simplest clear fit, but it doesn't really consider the presence of the peak described in the paper. Would there be a way to fit the dynamic including fitting the peak?

      We agree that using a simple sigmoid curve for modeling expression dynamics is a limitation. In response to this and a similar comment from Reviewer #3, we tested a double logistic function (as suggested) to see if it better represented the rise and decline pattern. However, as described below, the original simple sigmoid curve was a better fit for the data. We have included a discussion of this limitation of the analysis. See Reviewer #3 recommendations (2) for details.

      The colour scheme in Figure 1C should be changed to make things clearer, and maybe use another dimension (like dotted lines) to separate hM4Di from hM3Dq.

      We have improved the visual clarity of Figure 1C by modifying the color scheme to represent vector construct and using distinct line types (dashed for hM4Di and solid for hM3Dq data) to separate DREADD type.

      (2) Figure 2

      I don't understand how the referencing to 100 was made: was it by selecting the overall peak value or the peak value observed between 40 and 80 days? If the former then I can't see how some values are higher than the peak. If the second then it means some peak values occurred after 80 days and data are not completely re-aligned.

      We thank the reviewer for the opportunity to clarify this point. The normalization was based on the peak value observed between 40–80 days post-injection, as this window typically captured the peak expression phase in our dataset (see Figure 1). However, in some long-term cases where PET scans were limited during this period (e.g., with a single scan performed at day 40), it is possible that the actual peak occurred later. Therefore, instances where ΔBP<sub>ND</sub> values slightly exceeded the reference peak at later time points likely reflect this sampling limitation. We have clarified this methodological detail in the revised Results section to improve transparency.
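
      A minimal sketch of this normalization, assuming the reference is simply the largest value scanned inside the 40–80 day window; the scan days, values, and function name are hypothetical. It shows how a later scan can exceed 100% when the only in-window scan was taken before the true peak.

```python
def normalize_to_window_peak(days, values, window=(40, 80)):
    """Express each value as a percentage of the largest value scanned
    inside `window` (bounds inclusive)."""
    in_window = [v for d, v in zip(days, values) if window[0] <= d <= window[1]]
    if not in_window:
        raise ValueError("no scan falls inside the reference window")
    peak = max(in_window)
    return [100.0 * v / peak for v in values]


# Hypothetical scan days and ΔBP_ND values (illustrative, not from the study).
# Only the day-40 scan falls inside the 40-80 day reference window.
days = [20, 40, 120, 400]
values = [0.5, 1.0, 1.1, 0.8]

percent = normalize_to_window_peak(days, values)
print(percent)
```

      Here the day-120 value is above the reference because the single in-window scan at day 40 preceded the actual peak.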

      The methods section mentions the use of CNO but this is not in the main paper which seems to state that only DCZ was used: the authors should clarify this

      Although DCZ was the primary agonist used, CNO and C21 were also used in a few animals (e.g., monkeys #153, #221, and #207) for behavioral assessments. We have clarified this in the Results section and revised Figure 3 to indicate the specific agonist used for each subject. Additionally, we have updated the Methods section to clearly specify the use and dosage of DCZ, CNO, and C21, to avoid any confusion regarding the experimental design.

      Reviewer #3 (Public review): 

      Minor weaknesses are related to a few instances of suboptimal phrasing, and some room for improvement in time course visualization and quantification. These would be easily addressed in a revision. These findings will undoubtedly have a very significant impact on the rapidly growing but still highly challenging field of primate chemogenetic manipulations. As such, the work represents an invaluable resource for the community.

      We thank the reviewer for the positive assessment of our manuscript and for the constructive suggestions. We address each comment in the following point-by-point responses and have revised the manuscript accordingly.

      Reviewer #3 (Recommendations for the authors):

      (1) Please clarify what the reasoning was behind restricting the analysis in Figure 1 to only 7 monkeys with subcortical AAV injection.

      We focused the analysis shown in Figure 1 on 7 monkeys with subcortical AAV injections who received comparable injection volumes. These data were primarily part of vector test studies, allowing for repeated PET scans within 150 days post-injection. In contrast, monkeys with cortical injections, including those with larger volumes, were allocated to behavioral studies and therefore were not scanned as frequently during the early phase. We will clarify this rationale in the Results section.

      (2) Figure 1: Not sure if a simple sigmoid is the best model for these, mostly peaking and then descending somewhat, curves. I suggest testing a more complex model, for instance, double logistic function of a type f(t) = a + b/(1+exp(-c*(t-d))) - e/(1+exp(-g*(t-h))), with the first logistic term modeling the rise to peak, and the second term for partial decline and stabilization

      We appreciate the reviewer’s thoughtful suggestion to use a double logistic function to better model both the rising and declining phases of the expression curve. In response to this and a similar comment from Reviewer #2, we tested the proposed model and found that, while it could capture the peak and subsequent decline, the resulting fit appeared less biologically plausible (see below). Moreover, model comparison using BIC favored the original simple sigmoid model (BIC = 61.1 vs. 62.9 for the simple and double logistic model, respectively). This information has been included in the revised figure legend for clarity.

      Given these results, we retained the original simple sigmoid function in the revised manuscript, as it provides a sufficient and interpretable approximation of the early expression trajectory—particularly the peak expression-time estimation, which was the main purpose of this analysis. We have updated the Methods section to clarify our modeling and rationale as follows:

      Lines 530, "To model the time course of DREADD expression, we used a single sigmoid function, referencing past in vivo fluorescent measurements (Diester et al., 2011). Curve fitting was performed using least squares minimization. For comparison, a double logistic function was also tested and evaluated using the Bayesian Information Criterion (BIC) to assess model fit."
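
      The model comparison described above can be sketched as follows. The double logistic form is the one proposed by Reviewer #3; the BIC expression assumes Gaussian errors with least-squares fitting, which may differ from the exact formula the authors used. The time points, observations, and parameter values are hypothetical and are not fitted.

```python
import math


def sigmoid(t, a, b, c, d):
    """Single logistic: baseline a, amplitude b, rate c, midpoint d."""
    return a + b / (1.0 + math.exp(-c * (t - d)))


def double_logistic(t, a, b, c, d, e, g, h):
    """Rise term minus a second logistic term for the partial decline."""
    rise = b / (1.0 + math.exp(-c * (t - d)))
    decline = e / (1.0 + math.exp(-g * (t - h)))
    return a + rise - decline


def bic_least_squares(observed, predicted, n_params):
    """BIC under Gaussian errors: n * ln(RSS / n) + k * ln(n). Lower is better."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + n_params * math.log(n)


# Hypothetical time points (days) and observations; parameters are not fitted.
days = [10, 20, 30, 45, 60, 90, 120, 150]
observed = [0.05, 0.12, 0.30, 0.62, 0.80, 0.78, 0.74, 0.75]
single_params = (0.0, 0.8, 0.12, 35.0)

pred_single = [sigmoid(t, *single_params) for t in days]
# With e = 0 the decline term vanishes, so both models predict the same curve.
pred_double = [double_logistic(t, *single_params, 0.0, 0.1, 100.0) for t in days]

bic_single = bic_least_squares(observed, pred_single, 4)
bic_double = bic_least_squares(observed, pred_double, 7)
print(bic_double - bic_single)
```

      Because the two sets of predictions are identical here, the BIC difference equals the penalty for the three extra parameters, 3 ln(n). The richer model is preferred only if it reduces the residual sum of squares by enough to offset that penalty.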

      We also acknowledge that a more detailed understanding of post-peak expression changes will require additional PET measurements, particularly between 60- and 120-days post-injection, across a larger number of animals. We have included this point in the revised Discussion to highlight the need for future work focused on finer-grained modeling of expression decline:

      Lines 317, “Although we modeled the time course of DREADD expression using a single sigmoid function, PET data from several monkeys showed a modest decline following the peak. While the sigmoid model captured the early-phase dynamics and offered a reliable estimate of peak timing, additional PET scans—particularly between 60- and 120-days post-injection—will be essential to fully characterize the biological basis of the post-peak expression trajectories.”

      Author response image 1.

      (3) Figure 2: It seems that the individual curves are for different monkeys, I counted 7 in B and 8 in C, why "across 11 monkeys"? Were there several monkeys both with hM4Di and hM3Dq? Does not look like that from Table 1. Generally, I would suggest associating specific animals from Tables 1 and 2 to the panels in Figures 1 and 2.

      Some animals received multiple vector types, leading to more curves than individual subjects. We have revised the figure legends and updated Table 2 to explicitly relate each curve with the specific animal and brain region.

      (4) I also propose plotting the average of (interpolated) curves across animals, to convey the main message of the figure more effectively.

      We agree that plotting the mean of the interpolated expression curves would help convey the group trend. We added averaged curves to Figure 2BC.
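
      One way such an averaged curve could be computed is sketched below, assuming linear interpolation of each animal's curve onto a shared time grid. The response does not specify the interpolation method, and the curves and function names here are hypothetical.

```python
def interpolate(days, values, t):
    """Linear interpolation at time t for strictly increasing `days`.
    Returns None outside the scanned range (no extrapolation)."""
    if t < days[0] or t > days[-1]:
        return None
    for i in range(len(days) - 1):
        d0, d1 = days[i], days[i + 1]
        if d0 <= t <= d1:
            return values[i] + (values[i + 1] - values[i]) * (t - d0) / (d1 - d0)
    return values[-1]  # curve with a single scan, and t equals that scan day


def mean_curve(curves, grid):
    """Average interpolated curves at each grid time, skipping animals
    that were not scanned around that time."""
    means = []
    for t in grid:
        samples = [interpolate(d, v, t) for d, v in curves]
        samples = [s for s in samples if s is not None]
        means.append(sum(samples) / len(samples) if samples else None)
    return means


# Hypothetical normalized expression curves (% of peak) for two animals.
curves = [
    ([0, 100, 200], [0.0, 100.0, 80.0]),
    ([0, 50, 200], [0.0, 100.0, 40.0]),
]
grid = [0, 50, 100, 150, 200]
average = mean_curve(curves, grid)
print(average)
```

      Skipping animals outside their scanned range avoids extrapolation, but it also means the number of animals contributing to the mean can change along the time axis.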

      (5) Similarly, in line 155 "We assessed data from 17 monkeys to evaluate ... Monkeys expressing hM4Di were assessed through behavioral testing (N = 11) and alterations in neuronal activity using electrophysiology (N = 2)..." - please explain how 17 is derived from 11, 2, 5 and 1. It is possible to glean from Table 1 that the calculation is 11 (including 2 with ephys) + 5 + 1 = 17, but it might appear as a mistake if one does not go deep into Table 1.

      We have clarified in both the text and Table 1 that some monkeys (e.g., #201 and #207) underwent both behavioral and electrophysiological assessments, resulting in the overlapping counts. Specifically, the dataset includes 11 monkeys for hM4Di-related behavior testing (two of which underwent electrophysiology testing), 5 monkeys assessed for hM3Dq with FDG-PET, and 1 monkey assessed for hM3Dq with electrophysiology, totaling 19 assessments across 17 monkeys. We have revised the Results section to make this distinction more explicit to avoid confusion, as follows:

      Lines 164, "Monkeys expressing hM4Di (N = 11) were assessed through behavioral testing, two of which also underwent electrophysiological assessment. Monkeys expressing hM3Dq (N = 6) were assessed for changes in glucose metabolism via [<sup>18</sup>F]FDG-PET (N = 5) or alterations in neuronal activity using electrophysiology (N = 1).”
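
      The overlap in these counts can be checked with a short sketch. The animal IDs are placeholders invented for illustration; only the group sizes and the overlap of two animals follow the text.

```python
# Placeholder IDs (hypothetical); only the group sizes follow the text.
hm4di_behavior = {f"A{i}" for i in range(1, 12)}  # 11 monkeys
hm4di_ephys = {"A1", "A2"}                        # 2 of the 11 above
hm3dq_fdg_pet = {f"B{i}" for i in range(1, 6)}    # 5 monkeys
hm3dq_ephys = {"C1"}                              # 1 monkey

groups = [hm4di_behavior, hm4di_ephys, hm3dq_fdg_pet, hm3dq_ephys]

# Assessments count every group membership; monkeys count unique animals.
n_assessments = sum(len(g) for g in groups)
n_monkeys = len(set().union(*groups))
print(n_assessments, n_monkeys)
```

      The two electrophysiology animals are already counted in the behavioral group, which is why the number of assessments (19) exceeds the number of monkeys (17).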

      (6) Line 473: "These stock solutions were then diluted in saline to a final volume of 0.1 ml (2.5% DMSO in saline), achieving a dose of 0.1 ml/kg and 3 mg/kg for DCZ and CNO, respectively." Please clarify: the injection volume was always 0.1 ml? then it is not clear how the dose can be 0.1 ml/kg (for a several kg monkey), and why DCZ and CNO doses are described in ml/kg vs mg/kg?

      We thank the reviewer for pointing out this ambiguity. We apologize for the oversight and also acknowledge that we omitted mention of C21, which was used in a small number of cases. To address this, we have revised the “Administration of DREADD agonist” section of the Methods to clearly describe the preparation, the volume, and dosage for each agonist (DCZ, CNO, and C21) as follows:

      Lines 493, “Deschloroclozapine (DCZ; HY-42110, MedChemExpress) was the primary agonist used. DCZ was first dissolved in dimethyl sulfoxide (DMSO; FUJIFILM Wako Pure Chemical Corp.) and then diluted in saline to a final volume of 1 mL, with the final DMSO concentration adjusted to 2.5% or less. DCZ was administered intramuscularly at a dose of 0.1 mg/kg for hM4Di activation, and at 1–3 µg/kg for hM3Dq activation. For behavioral testing, DCZ was injected approximately 15 min before the start of the experiment unless otherwise noted. Fresh DCZ solutions were prepared daily.

      In a limited number of cases, clozapine-N-oxide (CNO; Toronto Research Chemicals) or Compound 21 (C21; Tocris) was used as an alternative DREADD agonist for some hM4Di experiments. Both compounds were dissolved in DMSO and then diluted in saline to a final volume of 2–3 mL, also maintaining DMSO concentrations below 2.5%. CNO and C21 were administered intravenously at doses of 3 mg/kg and 0.3 mg/kg, respectively.”
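
      As a worked example of the dosing arithmetic, the sketch below computes the agonist mass per injection from the stated mg/kg doses. The body weight is hypothetical, and the concentration calculation assumes the whole prepared volume is injected, which the text does not state.

```python
# Doses stated in the quoted Methods text (mg per kg body weight).
DOSE_MG_PER_KG = {"DCZ_hM4Di": 0.1, "CNO": 3.0, "C21": 0.3}


def agonist_mass_mg(agonist, body_weight_kg):
    """Mass of agonist needed for one injection."""
    return DOSE_MG_PER_KG[agonist] * body_weight_kg


def concentration_mg_per_ml(agonist, body_weight_kg, final_volume_ml):
    """Concentration required if the whole prepared volume is injected
    (an assumption made for this example)."""
    return agonist_mass_mg(agonist, body_weight_kg) / final_volume_ml


weight_kg = 5.0  # hypothetical body weight
dcz_mass = agonist_mass_mg("DCZ_hM4Di", weight_kg)
dcz_conc = concentration_mg_per_ml("DCZ_hM4Di", weight_kg, 1.0)
cno_mass = agonist_mass_mg("CNO", weight_kg)
print(dcz_mass, dcz_conc, cno_mass)
```

      Expressing every dose in mg/kg and every volume in mL keeps the two quantities separate, which addresses the ambiguity the reviewer pointed out in the original wording.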

      (7) Figure 5A: What do regression lines represent? Do they show a simple linear regression (then please report statistics such as R-squared and p-values), or is it related to the linear model described in Table 3 (but then I am not sure how separate DREADDs can be plotted if they are one of the factors)?

      We thank the reviewer for the insightful question. In the original version of Figure 5A, the regression lines represented simple linear fits used to illustrate the relationship between viral titer and peak expression levels, based on our initial analysis in which titer appeared to have a significant effect without any notable interaction with other factors (such as DREADD type).

      However, after conducting a more detailed analysis that incorporated injection volume as an additional factor and excluded cortical injections and statistical outliers (as suggested by Reviewer #1), viral titer was no longer found to significantly predict peak expression levels. Consequently, we revised the figure to focus on the effect of reporter tag, which remained the most consistent and robust predictor in our model.

      In the updated Figure 5, we have removed the relationship between viral titer and expression level with regression lines.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the authors):

      Because many conclusions are drawn from overexpression studies and from a single cell line (HEK293), it is unclear how general these effects are. In particular, one of the main claims put forth in this manuscript is that of specificity, namely, that FZD5/8, and none of the other FZDs, are uniquely involved in this internalization and degradation. While there are examples of similar specificities, many of these examples can be attributed to a particular cellular context. Without demonstrating that this FZD5/8 specificity is observed in multiple cell lines and contexts, this point remains unconvincing and questionable. One way to address this point of criticism is to omit the word "specifically" in the title and soften the language concerning this idea throughout the manuscript.

      We appreciate your valuable comments and suggestions. We have removed the word “specifically” from the title and softened the language concerning this idea throughout the manuscript. Moreover, we performed new experiments to show that Wnt3a/5a induces FZD5/8 endocytosis and degradation and that IWP-2 treatment increases the cell surface levels of FZD5/8 in cell lines other than 293A (Figure 1-Figure supplement 1 and Figure 2-Figure supplement 1). These results indicate that Wnt-induced FZD5/8 endocytosis and degradation are not cell specific.

      The starting point for these studies is a survey of all 10 FZDs, V5-tagged and overexpressed in HEK293 cells. Here, the authors observed a decline in cell surface levels of only FZD5 and 8 in response to Wnt3a and Wnt5a. As illustrated in the immunoblot (Fig 1B), several FZDs were poorly expressed, including FZD1, 3, 6 and 9, which calls into question that only FZD5 and 8 were affected. Furthermore, total levels of FZD8 don't diminish appreciably, as claimed by the authors, and only FZD5 shows a subtle decline upon WNT treatment. All of these experiments are performed with overexpressed V5-tagged FZD proteins or with endogenously V5-tagged (KI) proteins, and it is possible that overexpression or tagging lead to potentially artifactual observations. Examining the effects of WNTs on FZD protein localization and levels need to be done with endogenously expressed, non-tagged FZDs. In this context, it is somewhat puzzling that the authors don't show such an experiment using the pan- and FZD5/8-specific antibodies, which they use in multiple experiments throughout the manuscript. With these available tools it should be possible to examine FZD levels at the cell surface in response to Wnt3a and Wnt5a, ideally in multiple cell lines.

      We appreciate your valuable comments and suggestions. Figure 1B shows the results of the follow-up study shown in Figure 1A. As shown in Figure 1A, we used flow cytometry analysis to detect the cell surface levels of stably expressed FZDs and found that Wnt3a/5a specifically reduced the levels of FZD5/8 on the cell surface, suggesting that Wnt3a/5a induces FZD5/8 endocytosis. As shown in Figure 1B and C, we performed immunoblotting to examine whether Wnt3a/5a-induced FZD5/8 internalization resulted in FZD5/8 degradation. Notably, most FZDs exhibit two bands on immunoblots, as also suggested by other published studies, and the upper bands represent the mature form that is fully glycosylated and presented to the cell surface (see also new Figure 2L), whereas the lower bands represent the immature form. Our results clearly indicated that Wnt3a/5a treatment reduced the levels of the mature forms of both FZD5 and FZD8, although the immunoblotting signals of the mature form of FZD8 (upper bands) were relatively weak. The immunoblotting signals of the other FZDs varied, and some of them (including FZD1, -3, -6 and -9) were relatively weak; however, according to the results in Figure 1A, all of the FZDs were expressed and present on the cell surface.

      In our hands, commercially available FZD5/8 antibodies, including those used in published studies, either cannot detect endogenous FZD5/8 or recognize only immature FZD5, which is why we used the CRISPR-Cas9-based KI technique to introduce a V5 tag into FZD5 and FZD7. Notably, in the overexpression experiments, the V5 tag is on the amino terminus, and in the KI experiments, the V5 tag is on the carboxyl terminus of FZDs, which may minimize potential artifacts of the V5 tag in the immunoblotting assays.

      The monoclonal antibodies used in this study, such as anti-pan-FZD, anti-FZD5/8, and anti-FZD4 antibodies, are neutralizing antibodies that can compete with Wnt ligands to bind to the FZD CRD. These antibodies have been successfully used to detect the surface levels of FZDs via flow cytometry assays. However, as the binding affinity of the Wnt-FZD CRD is comparable to the binding affinity of the antibody-FZD, we were cautious in using these antibodies to detect the cell surface levels of FZDs when the cells were treated with Wnt3a/5a CM, which contains relatively high concentrations of Wnt3a/5a. As shown in Author response image 1, Wnt3a or Wnt5a treatment dramatically reduced the endogenous cell surface level of FZD5/8, as detected by flow cytometry using the anti-FZD5/8 antibody. However, in another experiment, HEK293A cells were first incubated with cold Wnt3a or Wnt5a CM at 4°C to minimize endocytosis and then analyzed via flow cytometry using the anti-FZD5/8 antibody. The results showed that Wnt3a/5a incubation reduced the flow cytometry signals, suggesting that Wnt3a/5a binding to FZD5/8 might interfere with antibody-FZD5/8 binding, although we cannot exclude the possibility that Wnt3a/5a may induce FZD5/8 endocytosis at 4°C (Author response image 1).

      Author response image 1.

      (A) HEK293A cells were treated with control, Wnt3a or Wnt5a CM for 2 hours at 37°C in a humidified incubator and were analyzed via flow cytometry using the anti-FZD5/8 antibody.

      (B) HEK293A cells were incubated with control, Wnt3a or Wnt5a CM for 1 h at 4°C and analyzed by flow cytometry using the anti-FZD5/8 antibody.

       

      Several experiments rely on gene-edited clonal cell lines, including knockouts of FZD5/8, RNF43/ZNRF3, and DVL. Gene knockouts were confirmed by genomic DNA sequencing and, for DVL and FZD5/8, by loss of protein expression. While these KO lines are powerful tools to study gene function, there is a concern for clonal variability. Each cell line may have acquired additional changes as a result of gene editing. In addition, there may be compensatory changes in gene expression as a consequence of the loss of certain genes. For example, expression of other FZDs may increase in FZD5/8 DKO cells. To address this critique, the authors should show that re-expression of the knocked-out genes rescues the observed effect. This is done in some instances (Fig 5E, G, H) but not in other instances, such as with the DVL TKO (Fig. 3). Since the authors assert that DVL is important for FZD internalization in the absence of WNT, but not for FZD internalization in the presence of WNT, this particular rescue experiment is important. This is a potentially important finding and it should be confirmed by re-expression of DVL in the TKO line. As an alternative, conditional knockdown using Tet-inducible shRNA expression could address concerns for clonal variability.

      We appreciate your valuable comments and suggestions. We re-expressed DVL2 in DVLTKO cells stably expressing V5-linker-FZD5 or V5-linker-FZD7. As shown in Figure 3G-K, re-expression of DVL2 rescued the decreased Wnt-independent endocytosis of FZD5 and FZD7 caused by DVL1/2/3 knockout.

      Given the significant differences in signaling activity by Wnt3a and Wnt5a, it is somewhat surprising that all experiments shown in this manuscript do not identify distinguishing features between Wnt3a and Wnt5a. In addition, it is unclear why the authors switch between Wnt3a and Wnt5a. For example, Figures 1C, 3G-J, 4C-D only use Wnt5a. In contrast, Figures 6E and H use Wnt3a, most likely because b-catenin stabilization is examined, an effect generally not observed with Wnt5a. The choice of which Wnt is examined/used appears to be somewhat arbitrary and the authors never provide any explanations for these choices. In the end, this type of inconsistency becomes puzzling when the authors present, quite convincingly, in Figure 7, that both Wnt3a and 5a promote an interaction between FZD5/8 and RNF43 through proximity biotin labeling.

      Although Wnt3a and Wnt5a are significantly different in triggering intracellular signaling pathways, both bind FZD5/8 and induce FZD5/8 endocytosis and degradation similarly. When FZD5 is stably overexpressed, Wnt5a has slightly stronger effects on inducing FZD5 endocytosis and degradation, possibly because the Wnt5a concentration may be higher than the Wnt3a concentration in our CM, which is why we used Wnt5a CM in some experiments when V5-FZD5 was overexpressed. In the revised manuscript, we used both Wnt3a and Wnt5a CM in the experiments as you suggested, as shown in Figure 1C, 3G-K and Figure 4-Figure supplement 1.

      Minor Points:

      Figure 3G and I: it is curious that individual cells are shown in the "0 h" samples, while the "Con 1 h" and "Wnt5a 1 h" show multiple cells with several making direct contact with each other. This is notable because the V5 staining at sites of cell-cell contact are quite distinct and variable between control and Wnt5a-treated and WT versus DVL TKO cells. Also, sub-cellular localization of FZD5 (V5 tag) puncta is quite distinct between Con and Wnt5a: puncta in Wnt5a-treated cells appear to be more plasma membrane proximal than in Con cells. These points may be easy to address by showing images of cells that are more similar with respect to cell number and density for each condition.

      Thank you for your suggestions. We repeated these experiments and added Wnt3a treatment and adjusted the cell density. Images including an individual cell were selected for presentation.

      Figure 5E: the following statement is confusing/misleading: "Furthermore, reintroducing ZNRF3 or RNF43 into ZRDKO cells efficiently restored the increase in cytosolic β-catenin levels, whereas the expression of RNF130 or RNF150, two structurally similar transmembrane E3 ubiquitin ligases, did not (Fig. 5E)." First, reintroduction of ZNRF3 or RNF43 restores cytosolic b-catenin levels; it does not restore the increase in b-catenin. Second, the claim that RNF130 fails to have this effect is not substantiated since it is barely expressed.

      Thank you for your suggestions and comments. We reorganized the language to make the statement clearer. Notably, the expression level of RNF130 was relatively low compared with that of other E3 ligases, but RNF130 was expressed (Figure 5E darker exposure) and could reduce the cell surface levels of FZDs, as shown in Figure 5G.

      Reviewer #2 (Recommendations for the authors):

      (1) Given their results the authors conclude that upregulation of Frizzled on the plasma membrane is not sufficient to explain the stabilization of beta-catenin seen in the ZNRF3/RNF43 mutant cells. This interpretation is sound, and they suggest in the discussion that ZNRF3/RNF43-mediated ubiquitination could serve as a sorting signal to sort endocytosed FZD to lysosomes for degradation and that absence or inhibition of this process would promote FZD recycling. This should be relatively easy to test using surface biotinylation experiments and would considerably strengthen the manuscript.

      Thank you for your valuable suggestions and comments. We performed cell surface biotinylation experiments in HEK293A FZD5KI cells, as shown in Figure 2L. The results indicated that Wnt3a or Wnt5a treatment induced the degradation of FZD5 on the cell surface, which was antagonized by cotreatment with RSPO1. We did not perform a more detailed endocytosis/recycling biotinylation experiment that requires complex reversible biotinylation and multiple washing steps because HEK293A cells are fragile in culture and not easy to handle. Furthermore, the results shown in Figure 4 indicate that knockout of ZNRF3/RNF43 or RSPO1 significantly blocked the degradation of internalized FZD5 and reduced the colocalization of internalized FZD5 with lysosomal markers, suggesting that Wnt3a/5a induced lysosomal degradation of FZD5 in the presence of ZNRF3/RNF43 and that the internalized FZD5 was most likely recycled back to the cell surface when ZNRF3/RNF43 was knocked out or inhibited by RSPO1.

      (2) The authors show that the FZD5 CRD domain is required for endocytosis since a mutant FZD5 protein in which the CRD is removed does not undergo endocytosis. This is perhaps not surprising since this is the site of Wnt binding, but the authors show that a chimeric FZD5CRD-FZD4 receptor can confer Wnt-dependent endocytosis to an otherwise endocytosis incompetent FZD4 protein. Since the linker region between the CRD and the first TM differs between FZD5 and FZD4, it would be interesting to understand whether the CRD specifically or the overall arrangement (such as the spacing) is the most important determinant.

      Our results in Figure 1D-H clearly show that the CRD of FZD5 specifically is both necessary and sufficient for Wnt3a/5a-induced FZD5 endocytosis, as replacing the CRD alone in FZD5 with the CRD from either FZD4 or FZD7 completely abolished Wnt-induced endocytosis, whereas replacing the CRD alone in FZD4 or FZD7 with the FZD5 CRD alone could confer Wnt-induced endocytosis.

      (3) I find it surprising that only FZD5 and FZD8 appear to undergo endocytosis or be stabilized at the cell surface upon ZNRF3/RNF43 knockout. Is this consistent with previous literature? Is that a cell-specific feature? These findings should be tested in a different cell line, with possibly different relative levels of ZNRF3 and RNF43 expression.

      Thank you for your comments and suggestions. Our finding that ZNRF3/RNF43 specifically regulates FZD5/8 degradation is consistent with recent published studies in which FZD5 is required for the survival of RNF43-mutant PDAC or colorectal cancer cells (Nature Medicine, 2017, PMID: 27869803) and FZD5 is required for the maintenance of intestinal stem cells (Developmental Cell, 2024, PMID: 39579768 and 39579769), and in both cases, FZDs other than FZD5/8 are also expressed but not sufficient to compensate for the function of FZD5. The mechanism by which Wnt3a/5a specifically induces FZD5/8 endocytosis and degradation is currently unknown and needs to be explored in the future. We speculate that Wnt binding to FZD5/8 may recruit another protein on the cell surface to specifically facilitate FZD5/8 endocytosis. On the other hand, we cannot exclude the possibility that Wnts other than Wnt3a/5a may induce the endocytosis and degradation of FZDs other than FZD5/8 since there are 19 Wnts and 10 FZDs in humans. Notably, several previous studies have suggested that ZNRF3/RNF43 may regulate the endocytosis and degradation of all FZDs without selectivity (such as Nature, 2012, PMID: 22575959; Nature, 2012, PMID: 22895187; Mol Cell, 2015, PMID: 25891077). However, their conclusions were drawn mostly on the basis of overexpression studies. According to the results shown in Figure 5E-H, overexpressing a membrane-tethered E3 ligase (such as ZNRF3, RNF43, RNF130, or RNF150) may nonspecifically degrade FZD proteins on the cell surface.

      Furthermore, in the revised manuscript, we showed that Wnt3a/5a induced FZD5/8 endocytosis and degradation in multiple cell lines, including Huh7, U2OS, MCF7, and 769P cells (Figure 1-Figure supplement 1 and Figure 2-Figure supplement 1), suggesting that these phenomena are not specific to 293A cells.

      (4) If FZD7 is not a substrate of ZNRF3/RNF43 and therefore is not ubiquitinated and degraded, how do the authors reconcile that its overexpression does not lead to elevated cytosolic beta-catenin levels in Figure 5B?

      We are currently not sure of the mechanism underlying this result. Considering that most FZDs are expressed in 293A cells, we do not know how much of the mature form of overexpressed FZD7 was presented to the plasma membrane.

      (5) For Figure 5B, it would be interesting if the authors could evaluate whether overexpression of FZD5 in the ZNRF3/RNF43 double knockout lines would synergize and lead to further increase in cytosolic beta-catenin levels. As control if the substrate selectivity is clear FZD7 overexpression in that line should not do anything.

      Thank you for your suggestion. We performed these experiments as suggested, and the results indicated that overexpressing FZD5 further increased cytosolic beta-catenin levels in ZRDKO cells, whereas FZD7 had no effect (Figure 6D).

      (6) In Figure 6G, the authors need to show cytosolic levels of beta-catenin in the absence of Wnt in all cases.

      We did not add Wnt CM in this experiment. RSPO1 activity, which relies on endogenous Wnt, has been well documented in previous studies.

      (7) Since the authors show that DVL is not involved in the Wnt- and ZNRF3-dependent endocytosis, they should repeat the proximity biotinylation experiment in Figure 7 in the DVL triple KO cells. This is an important experiment since previous studies showed that DVL was required for the ZNRF3/RNF43-mediated ubiquitination of FZD.

      Thank you for your valuable suggestions. As you suggested, we performed a proximity biotinylation experiment in DVL TKO cells, and the results showed that Wnt3a/5a could still induce the interaction of FZD5 and RNF43 in DVL TKO cells (Figure 7-Figure supplement 1), suggesting that the Wnt-induced FZD5‒RNF43 interaction is DVL independent.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Roy et al. used the previously published deep transfer learning tool, DEGAS, to map disease associations onto single-cell RNA-seq data from bulk expression data. The authors performed independent runs of DEGAS using T2D or obesity status and identified distinct β-cell subpopulations. β-cells with high obese-DEGAS scores contained two subpopulations derived largely from either non-diabetic or T2D donors. Finally, immunostaining using human pancreas sections from healthy and T2D donors validated the heterogeneous expression and depletion of DLK1 in T2D islets.

      Strengths:

      (1) This meta-analysis of previously published scRNA-seq data using a deep transfer learning tool.

      (2) Identification of novel beta cell subclusters.

      (3) Identified a relatively innovative role of DLK1 in T2D disease progression.

      Thank you for your comments on the strengths of our work.

      Weaknesses :

      “There is little overlap between the DE list of the bulk RNA-seq analysis in Figures 1D and 1E and the DE list of the pseudo-bulk RNA-seq analysis of all cells in Figure S2C.”

      Thank you for pointing this out. To clarify, we did not perform pseudo-bulk analysis on the scRNAseq data. Instead, we used the Seurat FindClusterMarkers function to identify differentially enriched genes between T2D and ND single cells. Indeed, there are many significant genes in new Fig S2D (original S2C). There is some overlap between those data and the DEGs from bulk RNAseq data in Fig 1D, including IAPP, ENTPD3, and FFAR4. However, the limited overlap supports the notion that improved approaches are necessary to identify candidate DEGs from single-cell data, as simply comparing T2D with ND across all β-cells may miss important genes or include many false positives. We have now added clarification to the text to highlight this point.

      The biological meaning of "beta cells had the lowest scores compared to other cell types" is not clear.

      The relatively lower T2D-DEGAS scores for beta cells overall compared to all other cell types (alpha cells, acinar cells, etc) likely reflects the fact that in T2D, beta cell-specific genes can be downregulated. This affects the DEGAS model which is reflected in the scores of all cells in the scRNAseq data. By subsetting the beta cells and replotting them on their own, we can analyze the relative differences in DEGAS scores between different subsets of beta cells. We have now amended the text to clarify, as follows:

      “We next mapped the T2D-association scores onto the single cells (Fig 3A). β-cells had a wide distribution of scores, possibly reflecting β-cell heterogeneity or altered β-cell gene expression after onset of T2D (Fig 3B).”

      The figures and supplemental figures were not cited following the sequence, which makes the manuscript very difficult to read. Some supplemental figures, such as Figures S1C-S1D, S2B-S2E, S3A-S3B, were not cited or mentioned in the text.

      We apologize for this oversight and have now amended the text to call out all figures/panels in order of first introduction.

      In Figure 7, the current resolution is too low to determine the localization of DLK1.

      We have confirmed that in our Adobe Illustrator file, each microscopy panel has a DPI of >600. We have also provided the highest quality TIFF file versions of our figure set. We hope the reviewer will have access to download the high-quality TIFF file for Fig 7 if possible, or the editorial staff can provide it.

      As a result of addressing the critiques, we identified CDKN1C as another promising candidate enriched in the β<sup>T2D-DEGAS</sup> and β<sup>obese-DEGAS</sup> subpopulations of β-cells. We found that CDKN1C is heterogeneously expressed at the protein level in β-cells and that it is increased in T2D in agreement with the DEGAS predictions. We have amended the manuscript to highlight CDKN1C more prominently while still discussing DLK1. DLK1 is very interesting, but exhibits greater donor to donor variability in its alterations in T2D.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Gitanjali Roy et al. applies deep transfer learning (DEGAS) to assign patient-level disease attributes (metadata) to single cells of T2D and non-diabetic patients, including obese patients. This led to the identification of a singular cluster of T2D-associated β-cells; and two subpopulations of obese- β-cells derived from either non-diabetic or T2D donors. The objective was to identify novel and established genes implicated in T2D and obesity. Their final goal is to validate their findings at the protein level using immunohistochemistry of pancreas tissue from non-diabetic and T2D organ donors.

      Strengths:

      This paper is well-written, and the findings are relevant for β-cell heterogeneity in T2D and obesity.

      Thank you for your comments on the positive aspects of our work.

      Weaknesses:

      The validation they provide is not sufficiently strong: no DLK1 immunohistochemistry is shown of obese patient-derived sections.

      We have acquired additional FFPE pancreas samples from the Integrated Islet Distribution Program (IIDP) from lean, overweight, and obese humans with and without T2D. We have now stained for CDKN1C and DLK1 in these samples and have integrated the data into Fig 7 and Fig S5.

      Because the data with CDKN1C was more striking and consistent with the DEGAS predictions, we have chosen to highlight CDKN1C in the main figure and text. The DLK1 data is still quite interesting, although there is substantial variability between T2D donors when it comes to altered staining intensity. DLK1 presents an interesting challenge, given multiple isoforms and cleavage products, and will require further investigation as the focus of a different manuscript.

      Additional presumptive relevant candidates from this transcriptomic analysis should be screened for, at the protein level.

      Thank you for this suggestion. We also identified CDKN1C as a promising candidate enriched in the β<sup>T2D-DEGAS</sup> and β<sup>obese-DEGAS</sup> subpopulations of β-cells. We found that CDKN1C is heterogeneously expressed at the protein level in β-cells and that it is increased in T2D in agreement with the DEGAS predictions. We have amended the manuscript to highlight CDKN1C more prominently while still discussing DLK1. DLK1 is very interesting but exhibits greater donor-to-donor variability in its alterations in T2D.

      Reviewer #1 (Recommendations For The Authors):

      Please explain and provide the detailed information on what percentage of the DE list of bulk RNA-seq analysis in Figures 1D and 1E overlap with the DE list of pseudo-bulk RNA-seq analysis of all cells in Figure S2C.

      Addressed in response to R1 Comment 1.

      Please provide the definition of each cluster of UMAP of the merged human islet scRNA-seq data.

      In figure panels 2A-B,D-G and 3A, the clusters are now labeled according to the marker genes described in Fig 2C.

      The integrative UMAP needs to be included in the main figure.

      We have now moved previous Fig S2A and S2B into the main figures as new Fig 2A-B.

      All figures and supplemental figures need to be cited following sequence.

      Addressed in response to R1 Comment 3.

      In Figure 7, high-resolution images are needed to determine the colocalization of INS and DLK1.

      Addressed in response to R1 Comment 4.

      Reviewer #2 (Recommendations For The Authors):

      Results: 124-128: Fig 1H_The error bars seem high, please include whether the boxplots are SEM or SD. Also, more detail on statistics is missing.

      Thank you for pointing out the need for clarification here. The whiskers on the box-and-whisker plots are not error bars. By default, in geom_boxplot() and stat_boxplot(), each whisker extends from the edge of the box to the most extreme data point that lies within 1.5 times the interquartile range of that edge. The box itself represents the middle 50% of the data: the bottom of the box is the first quartile, the middle horizontal line is the median, and the top of the box is the third quartile. We have now added a clearer description of this to the figure legend and the Methods section.
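
      The box-and-whisker convention described above can be sketched in a few lines. This is an illustration only, not the code used to draw Fig 1H: the function name `box_stats` and the sample values are hypothetical, and quartile conventions differ slightly between software packages, so the numbers need not match a given plotting library exactly.

```python
import statistics


def box_stats(values, coef=1.5):
    """Summarize data the way a Tukey-style box plot draws it.

    The box spans the first to third quartile, the line inside is the
    median, and each whisker ends at the most extreme observation that
    lies within coef * IQR of the box edge. Points beyond are outliers.
    """
    x = sorted(values)
    # Quartiles by linear interpolation between order statistics.
    q1, median, q3 = statistics.quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - coef * iqr
    upper_fence = q3 + coef * iqr
    inside = [v for v in x if lower_fence <= v <= upper_fence]
    outliers = [v for v in x if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical expression values with one extreme observation.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(box_stats(sample))
```

      In this example the upper whisker stops at 9 rather than at the fence itself, and the value 50 is drawn as an individual point, which is why whiskers should not be read as SEM or SD.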

      The genes shown in Fig 1H were selected because they are found in the T2D Knowledge Portal, illustrating a clear link to T2D. At the T2DKP (https://t2d.hugeamp.org/research.html?pageid=mccarthy_t2d_247), PAX4 and APOE are listed as causal, SLC2A2 has strong evidence, and CYTIP has a linked SNP. This is now discussed in the results section before the Fig 1H callout. These genes are significantly differentially expressed using edgeR in panel 1D with FDR<0.05. The individual data points for each human are shown.

      Figure 6: In general, the representation of the data is quite misleading. It would be nice to have an alternative way of presenting the data, especially when comparing beta-obese differentially expressed genes and pathways and T2D beta obese. Maybe an additional Venn diagram can help. Also, it would be nice to compare data from T2D beta nonobese to ND beta obese, especially given how the story is presented in the paper.

      Thank you for pointing out this clarity issue. We agree that additional alternate ways to present the data would be helpful. When we performed DEGAS using BMI as the disease feature, we noted two major clusters and one minor cluster of high-scoring cells in Fig 6A.

      Author response image 1.

      Author response image 2.

      This contrasted with the score map when we ran DEGAS with T2D as the disease feature.

      The main difference seems to be that the low-scoring β<sup>T2D-DEGAS</sup> cluster differs from the low-scoring β<sup>obese-DEGAS</sup> cluster.

      Therefore, we could not easily apply thresholding to the β<sup>obese-DEGAS</sup> scores, so instead we subsetted them for comparison. It was also apparent from the metadata that single cells from the left-hand side of the β-cell cluster came from donors that had T2D.

      To clarify these points and address the reviewer’s concerns, we have added a comparison of the DEGs identified for β<sup>T2D-DEGAS</sup> high vs. low and T2D-β<sup>obese-DEGAS</sup> vs ND-β<sup>obese-DEGAS</sup> in Fig S4J, also shown below. DLK1 and CDKN1C fall within the intersection, in addition to being two of the most enriched candidates in each DEGAS run (Fig 4C and Fig 6D).
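
      A Venn-style comparison of two DEG lists of this kind reduces to set operations. The sketch below is a minimal illustration under stated assumptions: the function `compare_gene_sets` is hypothetical, and the gene lists are placeholders (`GENE_A` and so on) rather than the authors' actual results; only DLK1 and CDKN1C are taken from the text as members of the intersection.

```python
def compare_gene_sets(list_a, list_b):
    """Return shared and unique genes plus simple overlap statistics."""
    a, b = set(list_a), set(list_b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard index: shared genes as a fraction of all genes seen.
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Percentage of list A that is also present in list B.
        "pct_of_a": 100.0 * len(shared) / len(a) if a else 0.0,
    }


# Placeholder DEG lists (assumption: not the real gene lists).
t2d_degas_high_vs_low = ["DLK1", "CDKN1C", "GENE_A", "GENE_B"]
t2d_vs_nd_obese_degas = ["DLK1", "CDKN1C", "GENE_C"]

result = compare_gene_sets(t2d_degas_high_vs_low, t2d_vs_nd_obese_degas)
print(result["shared"])
print(f"Jaccard index: {result['jaccard']:.2f}")
print(f"Share of first list in overlap: {result['pct_of_a']:.0f}%")
```

      Reporting the overlap as a percentage of each list, alongside the diagram, makes the size of the intersection explicit.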

      220-222: Figure 7C_ Is one of the nondiabetic beta samples obese? If so, please clearly label it; if not, that info is missing. One would expect that the DLK1 expression in ND obese beta cells resembles the T2D beta cell and not ND non-obese beta cells. That's a big point of this entire work, and experimentally missing. Additional candidate proteins should be checked.

      We have amended the entire Fig 7 to include more data for DLK1 staining and to add staining for CDKN1C. We also used CellProfiler to quantify the intensity distribution of DLK1 staining in β-cells and found that, overall, our initial conclusions were not supported once the sample size was increased. DLK1 expression is heterogeneous both within and between donors. While data from some T2D donors show that DLK1 is lost, other T2D samples indicate that DLK1 is not always lost. At least in the sample set we have analyzed so far, we cannot conclude that DLK1 staining correlates clearly with diabetes or BMI. Why DLK1 labels some β-cells and not others, and what role this subpopulation plays, remain open questions.

      Separately, we greatly appreciate the reviewer’s suggestion to validate additional candidates, as this led us to CDKN1C. In new Fig 7E-H we now show that CDKN1C is increased in T2D β-cells, in agreement with the DEGAS predictions.

      This work shows that machine learning approaches are powerful for identifying potential candidates, but it also highlights the need for these predictions to be validated at the protein level in human samples.

      Discussion: Based on lack of supporting IHC data, this is an overstatement:

      “DLK1 expression highly overlapped with high scoring βT2D DEGAS cells (Figure 7A) and with T2D βobese-DEGAS cells (Figure 7B). DLK1 immunostaining primarily colocalized with β-cells in non-diabetic human pancreas (Figure 7C). DLK1 showed heterogeneous expression within islets and between islets within the same pancreas section, wherein some islets had DLK1/INS co-staining in most β-cells and other islets had only a few DLK1+ β-cells. In the T2D pancreas, DLK1 staining was much less intense and in fewer β-cells, yet DLK1+/INS+ cells were observed (Figure 7C). This contrasts with the relatively higher DLK1 gene expression seen in the β-cells from the βT2D-DEGAS and T2D-βobese-DEGAS subpopulations (Figure 4D & 6C) as highlighted in Figure 7A,B. which were up- or down-regulated in subpopulations of β-cells identified by DEGAS, and to validate our findings at the protein level using immunohistochemistry of pancreas tissue from non-diabetic and T2D organ donors.”

      This part was at the very end of the last results subsection. This section has been largely rewritten to better describe the new figure and the language has been tempered to not overinterpret the data shown.

      “Our current findings applying DEGAS to islet data have implications for β-cell heterogeneity in T2D and obesity. The abundance of T2D-related factors and functional β-cell genes in our analysis validates applying DEGAS to islet data to identify disease-associated phenotypes and increase confidence in the novel candidate.”

      This part was found at the end of the Background section. We have removed the second sentence to temper the language.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The objective of this study was to infer the population dynamics (rates of differentiation, division, and loss) and lineage relationships of clonally expanding NK cell subsets during an acute immune response. 

      Strengths: 

      A rich dataset and thorough analysis of a particular class of stochastic models. 

      We thank the reviewer for the positive comment.

      Weaknesses: 

      The stochastic models used are quite simple; each population is considered homogeneous with first-order rates of division, death, and differentiation. In Markov process models such as these, there is no dependence of cellular behavior on its history of divisions. In recent years models of clonal expansion and diversification, in the settings of T and B cells, have progressed beyond this picture. So I was a little surprised that there was no mention of the literature exploring the role of replicative history in differentiation (e.g. Bresser Nat Imm 2022), nor of the notion of family 'division destinies' (either in division number or the time spent proliferating, as described by the Cyton and Cyton2 models developed by Hodgkin and collaborators; e.g. Heinzel Nat Imm 2017). The emerging view is that variability in clone (family) size may arise predominantly from the signals delivered at activation, which dictate each precursor's subsequent degree of expansion, rather than from the fluctuations deriving from division and death modeled as Poisson processes. 

      As you pointed out, the Gerlach and Buchholz Science papers showed evidence for highly skewed distributions of family sizes and correlations between family size and phenotypic composition. Is it possible that your observed correlations could arise if the propensity for immature CD27+ cells to differentiate into mature CD27- cells increases with division number? The relative frequency of the two populations would then also be impacted by differences in the division rates of each subset - one would need to explore this. But depending on the dependence of the differentiation rate on division number, there may be parameter regimes (and time points) at which the more differentiated cells can predominate within large clones even if they divide more slowly than their immature precursors. One might not then be able to rule out the two-state model. I would like to see a discussion or rebuttal of these issues. 

      We thank the reviewer for the insightful comment and for drawing our attention to the Cyton models. We have discussed the Cyton models in the Introduction (lines 80-95) and the Discussion (lines 538-553) sections of the revised manuscript and carried out simulations for the variant of the Cyton model suggested by the reviewer. For certain parameters, this two-state model gives rise to a negative correlation between clone size and the percentage of immature (CD27+) NK cells in the absence of any death, suggesting that division destiny, along with stochastic fluctuations, may be important in generating the heterogeneity observed in NK cell clone size distributions during the expansion phase. In addition, we considered a two-state model in which the activation time of individual NK cells varies following a log-normal distribution; this model also shows negative correlations between clone size and the percentage of immature NK cells within clones. We have added the new results (Figs. S2-3) and discussed them in the Results (lines 223-232) and Discussion (lines 538-553) sections. We believe these additional simulations provide new insights into the results obtained with our two- and three-state models. 
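      As a rough illustration of how the correlation between clone size and the percentage of immature cells can be computed from simulated clones, the sketch below runs a Gillespie simulation of a plain two-state model without death. It is not the Cyton variant or the authors' code: the rate values, the number of clones, and the function names `simulate_clone` and `pearson` are all assumptions made for illustration only.

```python
import random
import statistics


def simulate_clone(k_div_imm, k_div_mat, k_diff, t_end, rng):
    """Gillespie simulation of one clone founded by a single immature cell.

    Reactions (all first order, no death):
      immature -> 2 immature   (rate k_div_imm)
      immature -> mature       (rate k_diff)
      mature   -> 2 mature     (rate k_div_mat)
    """
    n_imm, n_mat, t = 1, 0, 0.0
    while True:
        a_div_imm = k_div_imm * n_imm
        a_diff = k_diff * n_imm
        a_div_mat = k_div_mat * n_mat
        a_total = a_div_imm + a_diff + a_div_mat
        if a_total == 0.0:
            break
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick which reaction fires, in proportion to its propensity.
        u = rng.random() * a_total
        if u < a_div_imm:
            n_imm += 1
        elif u < a_div_imm + a_diff:
            n_imm -= 1
            n_mat += 1
        else:
            n_mat += 1
    return n_imm, n_mat


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical rates (per day) and an 8-day expansion window.
rng = random.Random(0)
clones = [simulate_clone(0.8, 0.5, 0.3, 8.0, rng) for _ in range(200)]
sizes = [n_imm + n_mat for n_imm, n_mat in clones]
pct_immature = [100.0 * n_imm / (n_imm + n_mat) for n_imm, n_mat in clones]
corr = pearson(sizes, pct_immature)
print(f"correlation(clone size, % immature) = {corr:.3f}")
```

      Scanning the three rates in such a sketch shows which parameter regimes give a negative correlation; adding a division-destiny rule or a random activation delay would require extending `simulate_clone`.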

      Reviewer #2 (Public review): 

      Summary: 

      Wethington et al. investigated the mechanistic principles underlying antigen-specific proliferation and memory formation in mouse natural killer (NK) cells following exposure to mouse cytomegalovirus (MCMV), a phenomenon predominantly associated with CD8+ T cells. Using a rigorous stochastic modeling approach, the authors aimed to develop a quantitative model of NK cell clonal dynamics during MCMV infection. 

      Initially, they proposed a two-state linear model to explain the composition of NK cell clones originating from a single immature Ly49+CD27+ NK cell at 8 days post-infection (dpi). Through stochastic simulations and analytical investigations, they demonstrated that a variant of the two-state model incorporating NK cell death could explain the observed negative correlation between NK clone sizes at 8 dpi and the percentage of immature (CD27+) NK cells (Page 8, Figure 1e, Supplementary Text 1). However, this two-state model failed to accurately reproduce the first (mean) and second (variance and covariance) moments of the measured CD27+ and CD27- NK cell populations within clones at 8 dpi (Figure 1g). 

      To address this limitation, the authors increased the model's complexity by introducing an intermediate maturation state, resulting in a three-stage model with the transition scheme: CD27+Ly6C- → CD27-Ly6C- → CD27-Ly6C+. This three-stage model quantitatively fits the first and second moments under two key constraints: (i) immature CD27+ NK cells exhibit faster proliferation than CD27- NK cells, and (ii) there is a negative correlation (upper bound: -0.2) between clone size and the fraction of CD27+ cells. The model predicted a high proliferation rate for the intermediate stage and a high death rate for the mature CD27-Ly6C+ cells. 

      Using NK cell reporter mice data from Adams et al. (2021), which tracked CD27+/- cell population dynamics following tamoxifen treatment, the authors validated the three-stage model. This dataset allowed discrimination between NK cells originating from the bone marrow and those pre-existing in peripheral blood at the onset of infection. To test the prediction that mature CD27- NK cells have a higher death rate, the authors measured Ly49H+ NK cell viability in the mice spleen at different time points post-MCMV infection. Experimental data confirmed that mature (CD27-) NK cells exhibited lower viability compared to immature (CD27+) NK cells during the expansion phase (days 4-8 post-infection). 

      Further mathematical analyses using a variant of the three-stage model supported the hypothesis that the higher death rate of mature CD27- cells contributes to a larger proportion of CD27- cells in the dead cell compartment, as introduced in the new variant model. 

      Altogether, the authors proposed a three-stage quantitative model of antigen-specific expansion and maturation of naïve Ly49H+ NK cells in mice. This model delineates a maturation trajectory: (i) CD27+Ly6C- (immature) → (ii) CD27-Ly6C- (mature I) → (iii) CD27-Ly6C+ (mature II). The findings highlight the highly proliferative nature of the mature I (CD27-Ly6C-) phenotype and the increased cell death rate characteristic of the mature II (CD27-Ly6C+) phenotype. 

      Strengths: 

      By designing models capable of explaining correlations, first and second moments, and employing analytical investigations, stochastic simulations, and model selection, the authors identified the key processes underlying antigen-specific expansion and maturation of NK cells. This model distinguishes the processes of antigen-specific expansion, contraction, and memory formation in NK cells from those observed in CD8+ T cells. Understanding these differences is crucial not only for elucidating the distinct biology of NK cells compared to CD8+ T cells but also for advancing the development of NK cell therapies currently under investigation. 

      We thank the reviewer for the positive comments.

      Weaknesses: 

      The conclusions of this paper are largely supported by the available data. However, a comparative analysis of model predictions with more recent works in the field would be desirable. Moreover, certain aspects of the simulations, parameter inference, and modeling require further clarification and expansion, as outlined below: 

      (1) Initial Conditions and Grassmann Data: The Grassmann data is used solely as a constraint, while the simulated values of CD27+/CD27- cells could have been directly fitted to the Grassmann data, which assumes a 1:1 ratio of CD27+/CD27- at t = 0. This approach would allow for an alternative initial condition rather than starting from a single CD27+ cell, potentially improving model applicability. 

      We fit the moments of the cell populations, along with the ratio of resulting cells, starting the model from an initial 1:1 ratio of CD27+/CD27- cells at t = 0. This initial condition agrees with the experimental data. However, the fit produced parameter values that lead to greater growth of mature CD27- NK cells than of immature CD27+ NK cells. This could result from giving equal weights to the ratio and to the different moments; a realistic parameter estimate might require unequal weights between the ratio and the moments. Imposing the constraint Δ<sub>k</sub> > 0 during fitting restricts the parameter search to a region that appears to alleviate this issue and yields rate estimates consistent with higher growth of immature NK cells. We included Table S6 and an accompanying description to show this, as well as an additional section in the Materials and Methods (lines 669-676). 

      (2) Correlation Coefficients in the Three-State Model: Although the parameter scan of the three-state model (Figure 2) demonstrates the potential for achieving negative correlations between colony size and the fraction of CD27+ cells, the authors did not present the calculated correlation coefficients using the estimated parameter values from fitting the three-state model to the data. Including these simulations would provide additional insight into the parameter space that supports negative correlations and further validate the model.

      We have included this figure (Figure 2d) in the revised manuscript.

      (3) Viability Dynamics and Adaptive Response: The authors measured the time evolution of CD27+/- dynamics and viability over 30 days post-infection (Figure 4). It would be valuable to test whether the three-state model can reproduce the adaptive response of CD27- cells to MCMV infection, particularly the observed drop in CD27- viability at 5 dpi (prior to the 8 dpi used in the study) and its subsequent rebound at 8 dpi. Reproducing this aspect of the experiment is critical to determine whether the model can simultaneously explain viability dynamics and moment dynamics. Furthermore, this analysis could enable sensitivity analysis of CD27- viability with respect to various model parameters. 

      We have compared the expansion kinetics of the adoptively transferred Ly49H+ NK cells (Figure 2) and endogenous Ly49H+ NK cells; the endogenous NK cells show slower growth rates than their adoptively transferred counterparts (see lines 422-429). The data shown in Figure 4 refer to the relative percentage of mature and immature endogenous NK cells and thus cannot be explained by the three-state model calibrated on the expansion of adoptively transferred NK cells. One issue with using the viability data to estimate parameters for endogenous cells is the need to assume a model for dead cell clearance. We assume that dead cells are cleared according to a first-order decay reaction and vary the rate of this reaction to show that the qualitative results are in line with our model rates. This model cannot recreate the dip and rebound observed in the data; instead, it approaches an asymptotic percentage of live cells monotonically. We have attached a figure showing this behavior below. We therefore intend this model only as a qualitative validation that the relative viability of mature NK cells is lower than that of immature NK cells. Models that include time-dependent clearance of dead cells, or a higher-order (e.g., second-order) clearance reaction in which the propensity for clearance is lower at early times and greater at later times, may be better suited for this purpose but are beyond the scope of our validation. 
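      The monotonic behavior under first-order clearance can be seen in a deliberately simplified, single-population sketch (an assumption for illustration, not the authors' multi-state model). Live cells L divide at rate k and die at rate d, and dead cells D are cleared at rate c, so dL/dt = (k - d)L and dD/dt = dL - cD. The ratio r = D/L then obeys dr/dt = d - (c + k - d)r, which relaxes exponentially to a fixed point and cannot dip and rebound. All rate values and the name `live_fraction` below are hypothetical.

```python
import math


def live_fraction(t, k_div, k_death, k_clear):
    """Fraction of live cells L / (L + D) at time t, starting with no dead cells.

    Uses the closed-form solution of dr/dt = k_death - lam * r with r(0) = 0,
    where r = D / L and lam = k_clear + k_div - k_death.
    """
    lam = k_clear + k_div - k_death
    if lam <= 0:
        raise ValueError("closed form assumes k_clear + k_div - k_death > 0")
    ratio = (k_death / lam) * (1.0 - math.exp(-lam * t))
    return 1.0 / (1.0 + ratio)


# Hypothetical rates (per day), evaluated every half day for 30 days.
k_div, k_death, k_clear = 0.6, 0.3, 1.0
times = [0.5 * i for i in range(61)]
curve = [live_fraction(t, k_div, k_death, k_clear) for t in times]

# Long-time limit: lam / (lam + k_death).
lam = k_clear + k_div - k_death
asymptote = lam / (lam + k_death)
print(f"viability at day 30 = {curve[-1]:.4f}, asymptote = {asymptote:.4f}")
```

      Making `k_clear` depend on time or on the number of dead cells breaks this closed form, which is the kind of extension that would be needed to capture a dip followed by a rebound.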

      Author response image 1.

      Reviewer #1 (Recommendations for the authors):  

      I think the manuscript could be improved substantially by exploring alternative models that incorporate replicative history. At the very least it needs a deeper discussion of the literature relating to clonal expansion, putting the existing models in the context of these studies, and arguing convincingly that your conclusions are robust.  

      We have substantially expanded our exploration of alternative models. In particular, we considered a variant of the Cyton model suggested by Reviewer #1, a model in which NK cells become activated at different times, and a model with asymmetric NK cell division. We have shown the results (Figs. S2-3) in the Supplementary material and discussed them in the Results and Discussion sections. Please refer to our response #1 to Reviewer #1 for more details. 

      Reviewer #2 (Recommendations for the authors): 

      (1) Possible Typo (Page 12, Line 254): 

      The phrase: "immature NK cells compared to their immature counterparts" appears to contain a typo. Consider rephrasing for clarity. 

      Done. Thanks for finding this. 

      (2) Clarification of Data Source and Computational Procedure: 

      In the statement: "The NK cell clones reported by Flommersfeld et al. contained mixtures of CD27+ and CD27- NK cells. We evaluated the percentage of CD27+ NK cells in each clone and computed the correlation (Csize-CD27+) of the size of the clone with the percentage of CD27+ NK cells in the clones." Please clarify the data source and computational methodology for evaluating the percentage of CD27+ cells within clones. Additionally, consider including the curated data in the supplementary materials. Since the data originates from different immune compartments, explain which compartments were used. If data from all compartments were included, discuss how the calculated correlation changes when stratifying data from different sources (e.g., spleen and lymph nodes).  

      We have clarified the data source (spleen) where appropriate.

      (3) Figure 1b (Correlation Coefficient): 

      While the correlation coefficient with p-value is mentioned, it would be beneficial to also provide the standard deviation of the correlation coefficient and a 95% confidence band for the fitted line. This is particularly relevant as the authors use -0.2 as the upper bound for the correlation coefficient when fitting the three-stage model. 

      We have included the CI and the p-value for the correlation shown in Figure 1b. The plot with the 95% confidence band (appended below), in which both axes are on a linear scale, is not as visually clear as Figure 1b, where the clone sizes are shown on a log scale. We therefore did not include the confidence band in Figure 1b but display the CI and p-value on the figure. If the reviewer prefers, we can include the plot with the confidence band in the SI.

      Author response image 2.

      (4) Confidence Intervals in Tables: 

      If confidence intervals in the tables are calculated using bootstrapping, please mention this explicitly in the table headings for clarity. 

      Done.

      (5) Figure 2d-e (Simulation Method): 

      Specify the simulation method used (e.g., stochastic simulation algorithm [SSA], as mentioned in the materials and methods). Panel (e) lacks a caption; please provide one. Additionally, it would be interesting to include the correlation between clone size and the fraction of CD27+ cells in the clones (similar to the experimental data from Flommersfeld et al., 2021). 

      Done.

      (6) Figure 3 (Confidence Band): 

      Include a 95% confidence band for the simulated values to enhance the interpretability of the plots. 

      Done.

      (7) Materials and Methods Section:  Include a mathematical formula defining the metrics described, ensuring clarity and precision. 

      Done. See newly added lines 587-599, as well as existing content in the Supplementary Materials.

      (8) Supplementary Text 1 (Numerical Integration and AICc): 

      The section "Numerical Integration of Master Equation and Calculation of the AICc" is well done. However, given that the master equation involves a system of 106 coupled ODEs, it would be highly appreciated if the authors provided the formulation in matrix representation for better comprehension. 

      We have included a supplementary text (Supplementary Text I) and a schematic figure within the text to provide the details.

      (9) Figure S7b (Three-State Model Validation): 

      Given that the three-state model fits the data, assess whether it can also fit the first and second-moment data effectively. This validation would strengthen the robustness of the model.

      Although we showed that the best fit of the clonal burst data (moments) vastly overestimates the growth rates of endogenous cells (Figure S9a, previously Figure S7a), we did not fully emphasize the differences in the datasets that make fitting both with the same parameters impossible. We have added additional text in the main text where Figure S9a is located (lines 427-429) to discuss this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      The study by Klug et al. investigated the pathway specificity of corticostriatal projections, focusing on two cortical regions. Using a G-deleted rabies system in D1-Cre and A2a-Cre mice to retrogradely deliver channelrhodopsin to cortical inputs, the authors found that M1 and MCC inputs to direct and indirect pathway spiny projection neurons (SPNs) are both partially segregated and asymmetrically overlapping. In general, corticostriatal inputs that target indirect pathway SPNs are likely to also target direct pathway SPNs, while inputs targeting direct pathway SPNs are less likely to also target indirect pathway SPNs. Such asymmetric overlap of corticostriatal inputs has important implications for how the cortex itself may determine striatal output. Indeed, the authors provide behavioral evidence that optogenetic activation of M1 or MCC cortical neurons that send axons to either direct or indirect pathway SPNs can have opposite effects on locomotion and different effects on action sequence execution. The conclusions of this study add to our understanding of how cortical activity may influence striatal output and offer important new clues about basal ganglia function. 

      The conceptual conclusions of the manuscript are supported by the data, but the details of the magnitude of afferent overlap and causal role of asymmetric corticostriatal inputs on behavioral outcomes were not yet fully resolved. 

      We appreciate the reviewer’s thoughtful understanding and acknowledgment that the conceptual conclusion of asymmetric projections from the cortex to the striatum is well supported by our data. We also recognize the importance of further elucidating the extent of afferent overlap and the causal contributions of asymmetric corticostriatal inputs to behavioral outcomes. However, we respectfully note that current technical limitations pose significant challenges to addressing these questions with high precision.

      In response to the reviewer’s comments, we have now clarified the sample sizes, added the appropriate analyses, and elaborated on the experimental design so that our conclusions are presented more transparently and are more accessible to the reader.

      After virally labeling either direct pathway (D1) or indirect pathway (D2) SPNs to optogenetically tag pathway-specific cortical inputs, the authors report that a much larger number of "non-starter" D2-SPNs from D2-SPN labeled mice responded to optogenetic stimulation in slices than "non-starter" D1 SPNs from D1-SPN labeled mice did. Without knowing the relative number of D1 or D2 SPN starters used to label cortical inputs, it is difficult to interpret the exact meaning of the lower number of responsive D2-SPNs in D1 labeled mice (where only ~63% of D1-SPNs themselves respond) compared to the relatively higher number of responsive D1-SPNs (and D2-SPNs) in D2 labeled mice. While relative differences in connectivity certainly suggest that some amount of asymmetric overlap of inputs exists, differences in infection efficiency and ensuing differences in detection sensitivity in slice experiments make determining the degree of asymmetry problematic. 

      Thank you for highlighting this point. As it lies at the core of our manuscript, we agree that it is essential to present it clearly and convincingly. As shown by the statistics (Fig. 2B-F), non-starter D1- and D2-SPNs appear to receive fewer projections from D1-projecting cortical neurons (Input D1-record D1, 0.63; Input D1-record D2, 0.40) compared to D2-projecting cortical neurons (Input D2-record D1, 0.73; Input D2-record D2, 0.79).

      While it is not technically feasible to quantify the number of infected cells in brain slices following electrophysiological recordings, we addressed this limitation by collecting data from multiple animals and restricting recordings to cells located within the injection sites. In Figure 2D, we used 7 mice in the D1-projecting to D1 EGFP(+) group, 8 mice in the D1-projecting to D2 EGFP(-) group, 10 mice in the D2-projecting to D2 EGFP(+) group, and 8 mice in the D2-projecting to D1 EGFP(-) group. In Figure 2G, the group sizes were as follows: 8 mice in the D1-projecting to D2 EGFP(+) group, 7 mice in the D1-projecting to D1 EGFP(-) group, 8 mice in the D2-projecting to D1 EGFP(+) group, and 10 mice in the D2-projecting to D2 EGFP(-) group. In both panels, connection ratios were compared across experimental groups using Fisher’s exact test. Furthermore, as detailed in our Methods section (page 20, lines 399-401), we assessed cortical expression levels prior to performing whole-cell recordings. Taken together, these precautions help ensure that the calculated connection ratios are unlikely to be confounded by differences in infection efficiency.
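      A comparison of two connection ratios with Fisher’s exact test could be sketched as follows. The per-group cell counts are not stated in this response, so the counts used here (19 of 30 and 12 of 30 responsive cells, chosen only to approximate the reported ratios of 0.63 and 0.40) are hypothetical, as is the helper `fisher_exact_two_sided`. The two-sided p-value is taken as the sum of the probabilities of all tables no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are (connected, not connected).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x connected cells in row 1,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so tables tied with the observed one are counted.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1.0 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts: (connected, not connected) for two recording groups.
p_example = fisher_exact_two_sided(19, 11, 12, 18)
print(f"two-sided p = {p_example:.4f}")
```

      Because the test conditions on the table margins, it suits the modest cell counts typical of slice recordings, but it treats each recorded cell as an independent observation.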

      It is also unclear if retrograde labeling of D1-SPN- vs D2-SPN- targeting afferents labels the same densities of cortical neurons. This gets to the point of specificity in the behavioral experiments. If the target-based labeling strategies used to introduce channelrhodopsin into specific SPN afferents label significantly different numbers of cortical neurons, might the difference in the relative numbers of optogenetically activated cortical neurons itself lead to behavioral differences? 

      Thank you for bringing this concern to our attention. While optogenetic manipulation has become a widely adopted tool in functional studies of neural circuits, it remains subject to several technical limitations due to the nature of its implementation. Factors such as opsin expression efficiency, optic fiber placement, light intensity, stimulation spread, and other variables can all influence the specificity and extent of neuronal activation or inhibition. As such, rigorous experimental controls are essential when interpreting the outcomes of optogenetic experiments.

      In our study, we verified both the expression of channelrhodopsin in D1- or D2-projecting cortical neurons and the placement of the optic fiber following the completion of behavioral testing. To account for variability, we compared the behavioral effects of optogenetic stimulation within the same animals, stimulated versus non-stimulated conditions, as shown in Figures 3 and 4. Moreover, Figure S3 includes important controls that rule out the possibility that the behavioral effects observed were due to direct activation of D1- or D2-SPNs in striatum or to light alone in the cortex.

      An additional point worth emphasizing is that the behavioral effects observed in the open field and ICSS tests cannot be attributed to differences in the number of neurons activated. Specifically, activation of D1-projecting cortical neurons promoted locomotion in the open field, whereas activation of D2-projecting cortical neurons did not. However, in the ICSS test, activation of both D1- and D2-projecting cortical neurons reinforced lever pressing. Given that only D1-SPN activation, but not D2-SPN activation, supports ICSS behavior, these effects are unlikely to result merely from differences in the number of neurons recruited.

      This rationale underlies our use of multiple behavioral paradigms to examine the functions of D1- and D2-projecting cortical neurons. By assessing behavior across distinct tasks, we aimed to approach the question from multiple angles and reduce the likelihood of spurious or confounding effects influencing our interpretation.

      In general, the manuscript would also benefit from more clarity about the statistical comparisons that were made and sample sizes used to reach their conclusions.

      We thank the reviewer for the valuable suggestion to improve the manuscript. In response, we have made the following changes and provided additional clarification:

      (1) In Figure 2D, we used 7 mice in the D1-projecting to D1 EGFP(+) group, 8 mice in the D1-projecting to D2 EGFP(-) group, 10 mice in the D2-projecting to D2 EGFP(+) group, and 8 mice in the D2-projecting to D1 EGFP(-) group. In Figure 2G, the group sizes were as follows: 8 mice in the D1-projecting to D2 EGFP(+) group, 7 mice in the D1-projecting to D1 EGFP(-) group, 8 mice in the D2-projecting to D1 EGFP(+) group, and 10 mice in the D2-projecting to D2 EGFP(-) group. In both panels, connection ratios were compared using Fisher’s exact test.

      (2) In Figure 3, we reanalyzed the data in panels O, P, R, and S using permutation tests to assess whether each individual group exhibited a significant ICSS learning effect. The figure legend has been revised accordingly as follows:

      (O-P) D1-SPN (red) but not D2-SPN stimulation (black) drives ICSS behavior in both the DMS (O: D1, n = 6, permutation test, slope = 1.5060, P = 0.0378; D2, n = 5, permutation test, slope = -0.2214, P = 0.1021; one-tailed Mann Whitney test, Day 7 D1 vs. D2, P = 0.0130) and the DLS (P: D1, n = 6, permutation test, slope = 28.1429, P = 0.0082; D2, n = 5, permutation test, slope = -0.3429, P = 0.0463; one-tailed Mann Whitney test, Day 7 D1 vs. D2, P = 0.0390). *, P < 0.05. (Q) Timeline of helper virus injections, rabies-ChR2 injections and optogenetic stimulation for ICSS behavior. (R-S) Optogenetic stimulation of the cortical neurons projecting to either D1- or D2-SPNs induces ICSS behavior in both the MCC (R: MCC-D1, n = 5, permutation test, Day1-Day7, slope = 2.5857, P = 0.0034; MCC-D2, n = 5, Day2-Day7, permutation test, slope = 1.4229, P = 0.0344; no significant effect on Day7, MCC-D1 vs. MCC-D2,  two-tailed Mann Whitney test, P = 0.9999) and the M1 (S: M1-D1, n = 5, permutation test, Day1-Day7, slope = 1.8214, P = 0.0259; M1-D2, n = 5, Day1-Day7, permutation test, slope = 1.8214, P = 0.0025; no significant effect on Day7, M1-D1 vs. M1-D2, two-tailed Mann Whitney test, P = 0.3810). n.s., not statistically significant.

      (3) In Figure 4, we have added a comparison against a theoretical percentage change of zero to better evaluate the net effect of each manipulation. The results showed that in Figure 4D, optogenetic stimulation of D1-projecting MCC neurons significantly increased the pressing rate, whereas stimulation of D2-projecting MCC neurons did not (MCC-D1: n = 8, one-sample two-tailed t-test, t = 2.814, P = 0.0131; MCC-D2: n = 7, t = 0.8481, P = 0.4117). In contrast, in Figure 4H, optogenetic stimulation of both D1- and D2-projecting M1 neurons significantly increased the sequence press rate (M1-D1: n = 6, one-sample two-tailed Wilcoxon signed-rank test, P = 0.0046; M1-D2: n = 7, P = 0.0479).
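      One way to run a permutation test on an ICSS learning slope, as in item (2), is sketched below. The exact permutation scheme used for Figure 3 is not described in this response, so this sketch assumes that daily values are shuffled across days and that the ordinary least-squares slope is the test statistic. The data values and the names `ols_slope` and `permutation_slope_test` are hypothetical.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def permutation_slope_test(days, values, n_perm=10000, seed=0):
    """Two-sided permutation p-value for the slope of values over days.

    Under the null hypothesis of no trend, the assignment of values to days
    is exchangeable, so the values are shuffled across days.
    """
    rng = random.Random(seed)
    observed = ols_slope(days, values)
    shuffled = list(values)
    n_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(ols_slope(days, shuffled)) >= abs(observed) - 1e-12:
            n_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (n_extreme + 1) / (n_perm + 1)


# Hypothetical group-mean lever presses over seven training days.
days = [1, 2, 3, 4, 5, 6, 7]
presses = [2.0, 3.5, 4.0, 6.5, 7.0, 9.5, 11.0]
slope, p_value = permutation_slope_test(days, presses)
print(f"slope = {slope:.4f}, permutation p = {p_value:.4f}")
```

      A one-sided version would count only permuted slopes at least as large as the observed slope; which variant is appropriate depends on whether a learning effect is expected in one direction only.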

      Reviewer #2 (Public Review):

      Summary: 

      Klug et al. use monosynaptic rabies tracing of inputs to D1- vs D2-SPNs in the striatum to study how separate populations of cortical neurons project to D1- and D2-SPNs. They use rabies to express ChR2, then patch D1-or D2-SPNs to measure synaptic input. They report that cortical neurons labeled as D1-SPN-projecting preferentially project to D1-SPNs over D2-SPNs. In contrast, cortical neurons labeled as D2-SPN-projecting project equally to D1- and D2-SPNs. They go on to conduct pathway-specific behavioral stimulation experiments. They compare direct optogenetic stimulation of D1- or D2-SPNs to stimulation of MCC inputs to DMS and M1 inputs to DLS. In three different behavioral assays (open field, intra-cranial self-stimulation, and a fixed ratio 8 task), they show that stimulating MCC or M1 cortical inputs to D1-SPNs is similar to D1-SPN stimulation, but that stimulating MCC or M1 cortical inputs to D2-SPNs does not recapitulate the effects of D2-SPN stimulation (presumably because both D1- and D2-SPNs are being activated by these cortical inputs). 

      Strengths: 

      Showing these same effects in three distinct behaviors is strong. Overall, the functional verification of the consequences of the anatomy is very nice to see. It is a good choice to patch only from mCherry-negative non-starter cells in the striatum.

      Thank you for your profound understanding and appreciation of our manuscript’s design and the methodologies employed. In the realm of neuroscience, quantifying synaptic connections is a formidable challenge. While the roles of the direct and indirect pathways in motor control have long been explored, the mechanism by which upstream cortical inputs govern these pathways remains shrouded in mystery at the circuitry level.

      In the ‘Go/No-Go’ model, the direct and indirect pathways operate antagonistically; in contrast, the ‘Co-activation’ model suggests that they work cooperatively to orchestrate movement. These distinct theories raise a compelling question: Do these two pathways receive inputs from the same upstream cortical neurons, or are they modulated by distinct subpopulations? Answering this question could provide vital clues as to whether these pathways collaborate or operate independently.

      Previous studies have revealed both differences and similarities in the cortical inputs to the direct and indirect pathways at the population level. However, our investigation delves deeper to ask whether a single cortical input drives both pathways simultaneously or whether distinct cortical subpopulations regulate each pathway. To address this, we employed rabies virus–mediated retrograde tracing from D1- or D2-SPNs and recorded from non-starter SPNs to determine whether they receive the same inputs as the starter SPNs. This approach allowed us to calculate the connection ratio and estimate the probable connection properties.

      Weaknesses: 

      One limitation is that all inputs to SPNs are expressing ChR2, so they cannot distinguish between different cortical subregions during patching experiments. Their results could arise because the same innervation patterns are repeated in many cortical subregions or because some subregions have preferential D1-SPN input while others do not.

      Thank you for raising this thoughtful concern. It is indeed not feasible to restrict ChR2 expression to a specific cortical region using the first-generation rabies-ChR2 system alone. A more refined approach would involve injecting Cre-dependent TVA and RG into the striatum of D1- or A2A-Cre mice, followed by rabies-Flp infection. Subsequently, a Flp-dependent ChR2 virus could be injected into the MCC or M1 to selectively label D1- or D2-projecting cortical neurons. This strategy would allow for more precise targeting and address many of the current limitations.

      However, a significant challenge lies in the cytotoxicity associated with rabies virus infection. Neuronal health begins to deteriorate substantially around 10 days post-infection, which provides an insufficient window for robust Flp-dependent ChR2 expression. We have tested several new rabies virus variants with extended survival times (Chatterjee et al., 2018; Jin et al., 2024), but unfortunately, they did not perform effectively or suitably in the corticostriatal systems we examined.

      In our experimental design, the aim is to delineate the connectivity probabilities from cortical neurons to D1- or D2-SPNs. The hypotheses we considered include the possibility that similar innervation patterns occur across multiple cortical subregions, that some subregions show preferential input to D1-SPNs while others do not, or a combination of both scenarios. This led us to perform a series of behavioral tests using optogenetic activation of the D1- or D2-projecting cortical populations to see which could be the case.

      In the cortical areas we examined, MCC and M1, the behavioral results are consistent with our electrophysiological results. Specifically, when we stimulated the D1-projecting cortical neurons in either MCC or M1, mice exhibited facilitated local motion in the open field test, the same effect as activating D1-SPNs in the striatum alone (MCC: Fig 3C & D vs. I; M1: Fig 3F & G vs. L). Conversely, stimulation of D2-projecting MCC or M1 cortical neurons resulted in behavioral effects that appeared to combine characteristics of both D1- and D2-SPN activation in the striatum (MCC: Fig 3C & D vs. J; M1: Fig 3F & G vs. M). Similar results were observed in the ICSS test. Our interpretation of these results is that activation of D1-projecting neurons in the cortex induces behavioral changes akin to D1 neuron activation, while activation of D2-projecting neurons in the cortex leads to a combined effect of both D1 and D2 neuron activation. This suggests that at least the cortical regions we tested follow the hypothesis we proposed.

      There are also some caveats with respect to the efficacy of rabies tracing. Although they only patch non-starter cells in the striatum, only 63% of D1-SPNs receive input from D1-SPN-projecting cortical neurons. It's hard to say whether this is "high" or "low," but one question is how far from the starter cell region they are patching. Without this spatial indication of where the cells that are being patched are relative to the starter population, it is difficult to interpret if the cells being patched are receiving cortical inputs from the same neurons that are projecting to the starter population. Convergence of cortical inputs onto SPNs may vary with distance from the starter cell region quite dramatically, as other mapping studies of corticostriatal inputs have shown specialized local input regions can be defined based on cortical input patterns (Hintiryan et al., Nat Neurosci, 2016, Hunnicutt et al., eLife 2016, Peters et al., Nature, 2021).

      This is a valid concern regarding anatomical studies. Investigating cortico-striatal connectivity at the single-cell level remains technically challenging due to current methodological limitations. At present, we rely on rabies virus-mediated trans-synaptic retrograde tracing to identify D1- or D2-projecting cortical populations. This anatomical approach is coupled with ex vivo slice electrophysiology to assess the functional connectivity between these projection-defined cortical neurons and striatal SPNs. This enables us to quantify connection ratios, for example, the proportion of D1-projecting cortical neurons that functionally synapse onto non-starter D1-SPNs.

      To ensure the robustness of our conclusions, it is essential that both the starter cells and the recorded non-starter SPNs receive comparable topographical input from the cortex and other brain regions. Therefore, we carefully designed our experiments so that all recorded cells were located within the injection site, were mCherry-negative (i.e., non-starter cells), and were surrounded by ChR2-mCherry-positive neurons. This configuration ensured that the distance between recorded and starter cells did not exceed 100 µm, maintaining close anatomical proximity and thereby preserving the likelihood of shared cortical innervation within the examined circuitry.

      These methodological details are described in the Methods section on ex vivo brain slice electrophysiology (lines 396–399):

      “D1-SPNs (eGFP-positive in D1-eGFP mice, or eGFP-negative in D2-eGFP mice) or D2-SPNs (eGFP-positive in D2-eGFP mice, or eGFP-negative in D1-eGFP mice) that were ChR2-mCherry-negative, but in the injection site and surrounded by cells expressing ChR2-mCherry were targeted for recording.”

      This experimental strategy was implemented to control for potential spatial biases and to enhance the interpretability of our connectivity measurements.

      A caveat for the optogenetic behavioral experiments is that these optogenetic experiments did not include fluorophore-only controls.

      Thank you for bringing this to our attention. A fluorophore-only control is indeed a valuable negative control, commonly used to rule out effects caused by light exposure independent of optogenetic manipulation. In this study, however, comparisons were made between light-on and light-off conditions within the same animal. This within-subject design, as employed in recent studies (Geddes et al., 2018; Zhu et al., 2025), is considered sufficient to isolate the effects of optogenetic manipulation.

      Furthermore, as shown in Figure S3, we conducted an additional control experiment in which optogenetic stimulation was applied to M1, while ensuring that ChR2 expression was restricted to the striatum via targeted viral infection. This approach serves as a functional equivalent to the control you suggested. Importantly, we observed no effects that could be attributed solely to light exposure, further supporting the conclusion that the observed outcomes in our main experiments are due to targeted optogenetic manipulation, rather than confounding effects of illumination.

      Lastly, by employing an in-animal comparison, measuring changes between stimulated and non-stimulated trials, we account for subject-specific variability and strengthen the interpretability of our findings.

      Another point of confusion is that other studies (Cui et al, J Neurosci, 2021) have reported that stimulation of D1-SPNs in DLS inhibits rather than promotes movement.

      Thank you for bringing the study by Cui and colleagues to our attention. While that study has generated some controversy, other independent investigations have demonstrated that activation of D1-SPNs in DLS facilitates local motion and lever-press behaviors (Dong et al., 2025; Geddes et al., 2018; Kravitz et al., 2010).

      This point is still worth clarifying. The differences in behavioral outcomes observed between our study and that of Cui et al. may be attributable to several methodological factors, including differences in both the stereotaxic targeting coordinates and the optical fiber specifications used for stimulation.

      Specifically, in our experiments, the dorsomedial striatum (DMS) was targeted at coordinates AP +0.5 mm, ML ±1.5 mm, DV –2.2 mm, and the DLS at AP +0.5 mm, ML ±2.5 mm, DV –2.2 mm. In contrast, Cui et al. targeted the DMS at AP +0.9 mm, ML ±1.4 mm, DV –3.0 mm and the DLS at AP +0.7 mm, ML ±2.3 mm, DV –3.0 mm. These coordinates correspond to sites that are slightly more rostral and ventral compared to our own. Even subtle differences in anatomical targeting can result in activation of distinct neuronal subpopulations, which may account for the differing behavioral effects observed during optogenetic stimulation.

      In addition, the optical fibers used in the two studies varied considerably. We employed fibers with a 200 µm core diameter and a numerical aperture (NA) of 0.37, whereas Cui et al. used fibers with a 250 µm core diameter and a higher NA of 0.66. The combination of a larger core and higher NA in their setup implies a broader spatial spread and deeper tissue penetration of light, likely resulting in activation of a larger neural volume. This expanded volume of stimulation may have engaged additional neural circuits not recruited in our experiments, further contributing to the divergent behavioral outcomes. Taken together, these differences in targeting and photostimulation parameters are likely key contributors to the distinct effects reported between the two studies.
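      As a rough illustration of the fiber comparison above, the sketch below converts each numerical aperture into a divergence half-angle and a geometric cone radius. It is not part of the original analysis: the tissue refractive index (1.36), the 500 µm example depth, and the function names are assumptions made here, and the model is purely geometric, ignoring scattering and absorption.

```python
import math

# ASSUMPTION: approximate refractive index of brain tissue; not given in the text.
N_TISSUE = 1.36


def half_angle_deg(na: float, n_medium: float = N_TISSUE) -> float:
    """Divergence half-angle (degrees) of light leaving a fiber into a medium."""
    if not 0 < na < n_medium:
        raise ValueError("NA must be positive and below the medium's refractive index")
    return math.degrees(math.asin(na / n_medium))


def cone_radius_um(core_diameter_um: float, na: float, depth_um: float,
                   n_medium: float = N_TISSUE) -> float:
    """Radius (µm) of the geometric light cone at a given depth below the fiber tip."""
    theta = math.radians(half_angle_deg(na, n_medium))
    return core_diameter_um / 2 + depth_um * math.tan(theta)


# Core diameter (µm) and NA as quoted in the response above.
fibers = {"this study": (200.0, 0.37), "Cui et al.": (250.0, 0.66)}
EXAMPLE_DEPTH_UM = 500.0  # hypothetical depth chosen only for illustration

for label, (core, na) in fibers.items():
    angle = half_angle_deg(na)
    radius = cone_radius_um(core, na, EXAMPLE_DEPTH_UM)
    print(f"{label}: half-angle {angle:.1f} deg, cone radius {radius:.0f} um")
```

      Under these assumptions the larger-core, higher-NA fiber yields a wider cone at any given depth. The sketch addresses lateral spread only; the volume actually activated also depends on light power, scattering, and absorption.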

      Reviewer #3 (Public Review): 

      In the manuscript by Klug and colleagues, the investigators use a rabies virus-based methodology to explore potential differences in connectivity from cortical inputs to the dorsal striatum. They report that the connectivity from cortical inputs onto D1 and D2 MSNs differs in terms of their projections onto the opposing cell type, and use these data to infer that there are differences in cross-talk between cortical cells that project to D1 vs. D2 MSNs. Overall, this manuscript adds to the body of work indicating that different striatal pathways have different functions, which likely arise at least in part from differences in connectivity that have been hard to resolve because pathways within the striatum are difficult to isolate. Several interesting and provocative observations are reported. Several different methodologies are used, with partially convergent results, to support their main points.

      However, I have significant technical concerns about the manuscript as presented that make it difficult for me to interpret the results of the experiments. My comments are below.

      Major:

      There is generally a large caveat to the rabies studies performed here, which is that both TVA and the ChR2-expressing rabies virus have the same fluorophore. It is thus essentially impossible to determine how many starter cells there are, what the efficiency of tracing is, and which part of the striatum is being sampled in any given experiment. This is a major caveat given the spatial topography of the cortico-striatal projections. Furthermore, the authors make a point in the introduction about previous studies not having explored absolute numbers of inputs, yet this is not at all controlled in this study. It could be that their rabies virus simply replicates better in D1-MSNs than D2-MSNs. No quantifications are done, and these possibilities do not appear to have been considered. Without a greater standardization of the rabies experiments across conditions, it is difficult to interpret the results.

      We thank the reviewer for raising these questions, which merit further discussion.

      Firstly, the primary aim of our study is to investigate the connectivity of the corticostriatal pathway. Given the current technical limitations, it is not feasible to trace all the striatal SPNs connected to a single cortical neuron. Therefore, we approached this from the opposite direction, starting from D1- or D2-SPNs to retrogradely label upstream cortical neurons, and then identifying their connected SPNs via functional synaptic recordings. To achieve this, we employed the only available transsynaptic retrograde method: rabies virus-mediated tracing. Because we crossed D1- or D2-GFP mice with D1- or A2A-Cre mice to identify SPN subtypes during electrophysiological recordings, the conventional rabies-GFP system could not be used to distinguish starter cells without conflicting with the GFP labeling of SPNs. To overcome this, we tagged ChR2 expression with mCherry. In this setup, we recorded from mCherry-negative D1- or D2-SPNs within the injection site and surrounded by mCherry-positive neurons. This ensures that the recorded neurons are topographically matched to the starter cell population and receive input from the same cortical regions. We acknowledge that TVA-only and ChR2-expressing cells are both mCherry-positive and therefore indistinguishable in our system. As such, mCherry-positive cells likely comprise a mixture of starter cells and TVA-only cells, representing a somewhat broader population than starter cells alone. Nevertheless, by restricting recordings to mCherry-negative SPNs within the injection site, we ensure that our conclusions about functional connectivity remain valid and aligned with the primary objective of this study.

      Secondly, if rabies virus replication were significantly more efficient in D1-SPNs than in D2-SPNs, this would likely result in a higher observed connection probability in the D1-projecting group. However, we used consistent genetic strategies across all groups: D1-SPNs were defined as GFP-positive in D1-GFP mice and GFP-negative in D2-GFP mice, with D2-SPNs defined analogously. Recordings from both D1- and D2-SPNs were performed using the same methodology and under the same injection conditions within the same animals. This internal control helps mitigate the possibility that differential rabies infection efficiency biased our results.

      With these experimental safeguards in place, we found that 40% of D2-SPNs received input from D1-SPN-projecting cortical neurons, while 73% of D1-SPNs received input from D2-SPN-projecting cortical neurons. Although the ideal scenario would involve an even larger sample size to refine these estimates, the technical demands of post-rabies-infection electrophysiological recordings inherently limit throughput. Nonetheless, our approach represents the most feasible and accurate method currently available, and provides a significant advance in characterizing the functional connectivity within corticostriatal circuits.
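      To make the sample-size point concrete, the sketch below shows how a connection ratio and its uncertainty could be computed from recording counts. It is an illustration only, not the authors' analysis: the counts are hypothetical values chosen to reproduce the 40% and 73% figures quoted above, and the use of a Wilson score interval is an assumption made here.

```python
import math


def connection_ratio(connected: int, recorded: int) -> float:
    """Fraction of recorded non-starter SPNs with a light-evoked synaptic response."""
    if recorded <= 0:
        raise ValueError("recorded must be positive")
    if not 0 <= connected <= recorded:
        raise ValueError("connected must lie between 0 and recorded")
    return connected / recorded


def wilson_interval(connected: int, recorded: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives roughly 95%)."""
    p = connection_ratio(connected, recorded)
    denom = 1 + z**2 / recorded
    centre = (p + z**2 / (2 * recorded)) / denom
    half = z * math.sqrt(p * (1 - p) / recorded + z**2 / (4 * recorded**2)) / denom
    return centre - half, centre + half


# HYPOTHETICAL counts (connected, recorded); NOT the study's actual sample sizes.
counts = {
    "D1-projecting input -> D2-SPN": (8, 20),
    "D2-projecting input -> D1-SPN": (11, 15),
}

for label, (connected, recorded) in counts.items():
    ratio = connection_ratio(connected, recorded)
    low, high = wilson_interval(connected, recorded)
    print(f"{label}: ratio {ratio:.2f}, interval {low:.2f} to {high:.2f}")
```

      With counts of this size the intervals are wide, which is why a larger sample would refine the estimates.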

      The authors claim using a few current clamp optical stimulation experiments that the cortical cells are healthy, but this result was far from comprehensive. For example, membrane resistance, capacitance, general excitability curves, etc are not reported. In Figure S2, some of the conditions look quite different (e.g., S2B, input D2-record D2, the method used yields quite different results that the authors write off as not different). Furthermore, these experiments do not consider the likely sickness and death that occurs in starter cells, as has been reported elsewhere. The health of cells in the circuit is overall a substantial concern that alone could invalidate a large portion, if not all, of the behavioral results. This is a major confound given those neurons are thought to play critical roles in the behaviors being studied. This is a major reason why first-generation rabies viruses have not been used in combination with behavior, but this significant caveat does not appear to have been considered, and controls e.g., uninfected animals, infected with AAV helpers, etc, were not included.

      We understand and appreciate the reviewer’s concern regarding the potential cytotoxicity of rabies virus infection. Indeed, this is a critical consideration when interpreting functional connectivity data. We have tested several newer rabies virus variants reported to support extended survival times (Chatterjee et al., 2018; Jin et al., 2024), but unfortunately, these variants did not perform reliably in the corticostriatal circuits we examined.

      Given these limitations, we relied on the rabies virus approach originally developed by Osakada et al. (Osakada et al., 2011), which demonstrated that neurons infected with rabies virus expressing ChR2 remain both viable and functional up to at least 10 days post-infection (Fig. 3, cited below). In our own experiments, we further validated the health and viability of cortical neurons, the presynaptic partners of SPNs, particularly around day 7 post-infection.

      To minimize the risk of viral toxicity, we performed ex vivo slice recordings within a conservative time window, between 4 and 8 days after infection, when the health of labeled neurons is well maintained. Moreover, the recorded SPNs were consistently mCherry-negative, indicating they were not directly infected by rabies virus, thus further reducing the likelihood of recording from compromised cells.

      Taken together, these steps help ensure that our synaptic recordings reflect genuine functional connectivity, rather than artifacts of viral toxicity. We hope this clarifies the rationale behind our experimental design.

      For the behavioral tests, including a naïve uninfected group and an AAV helper virus-only group as negative controls could be beneficial to isolate the specific impact of rabies virus infection. However, our primary focus is on the activation of selected presynaptic inputs to D1- or D2-SPNs by optogenetic methods. Therefore, comparing stimulated versus non-stimulated trials within the same animal offers more direct and relevant results for our study objectives.

      It is also important to note that the ICSS test is particularly susceptible to the potential cytotoxic effects of rabies virus, as it spans a relatively extended period, from Day 4 to Day 12 post-infection. To mitigate this issue, we focused our analysis on the first 7 days of ICSS testing, thereby keeping the behavioral observations within 10 days post-rabies injection. This approach minimizes potential confounds from rabies-induced neurotoxicity while still capturing the relevant behavioral dynamics. Accordingly, we have revised Figure 3 and updated the statistical analyses to reflect this adjustment.

      The overall purity (e.g., EnvA-pseudotyping efficiency) of the RABV prep is not shown. If there was a virus that was not well EnvA-pseudotyped and thus could directly infect cortical (or other) inputs, it would degrade specificity.

      We agree that anatomical specificity is crucial for accurately labeling inputs to defined SPN populations in our study. The rabies virus strain employed here has been rigorously validated for its specificity in numerous previous studies from our group and others (Aoki et al., 2019; Klug et al., 2018; Osakada et al., 2011; Smith et al., 2016; Wall et al., 2013; Wickersham et al., 2007). For example, in a recent study by Aoki et al. (Aoki et al., 2019), we tested the same rabies virus strain by co-injecting the glycoprotein-deleted rabies virus and the TVA-expressing helper virus, without glycoprotein-expressing AAV, into the SNr. As shown in Figure S1 (related to Figure 2), GFP expression was restricted to starter cells within the SNr, with no evidence of transsynaptic labeling in upstream regions such as the striatum, EPN, GPe, or STN (see panels F–H). These findings provide strong evidence that the rabies virus used in our experiments is properly pseudotyped and exhibits high specificity for starter cell labeling without off-target spread.

      We appreciate the reviewer’s emphasis on specificity, and we hope this clarification further supports the reliability of our anatomical tracing approach.

      While most of the study focuses on the cortical inputs, in slice recordings, inputs from the thalamus are not considered, yet likely contribute to the observed results. Related to this, in in vivo optogenetic experiments, technically, if the thalamic or other inputs to the dorsal striatum project to the cortex, their method will not only target cortical neurons but also terminals of other excitatory inputs. If this cannot be ruled out, stating that the authors are able to selectively activate the cortical inputs to one or the other population should be toned down.

      We agree with the reviewer that the thalamus is also a significant source of excitatory input to the striatum. However, current techniques do not allow for precise and exclusive labeling of upstream neurons in a given brain region, such as the cortex or thalamus. This technical limitation indeed makes it difficult to definitively determine whether inputs from these regions follow the same projection rules. Despite this, our findings show that stimulation of defined cortical populations, specifically D1- or D2-projecting neurons in MCC and M1, elicits behavioral outcomes consistent with the connectivity patterns observed in our ex vivo slice recordings, providing strong support for the cortical origin of the effects we observed.

      In our in vivo optogenetic experiments, we acknowledge that stimulating a specific cortical region may also activate axonal terminals from rabies-infected cortical or thalamic neurons. While somatic stimulation is generally more effective than terminal stimulation, we recognize the possibility that terminals on non-rabies-traced cortical neurons could be activated through presynaptic connections. To address this, we considered the findings of a previous study (Cruikshank et al., 2010), which demonstrated that while brief optogenetic stimulation (0.05 ms) of thalamo-cortical terminals can elicit a few action potentials in postsynaptic cortical neurons, sustained terminal stimulation (500 ms) also results in only transient postsynaptic firing rather than prolonged activation (Fig. 3C, cited below). This suggests that cortical neurons exhibit only short-lived responses to continuous presynaptic stimulation of thalamic origin.

      In comparison, our behavioral paradigms employed prolonged optogenetic stimulation protocols (20 Hz, 10 ms pulses for 15 s in the open-field test, 1 s in ICSS, and 8 s in FR4/8), which more closely resemble sustained stimulation conditions. Given these parameters, and the robust behavioral responses observed, this suggests that the effects are primarily mediated by activation of rabies-labeled, ChR2-expressing D1- or D2-projecting cortical neurons rather than indirect activation through thalamic input.

      We appreciate the reviewer’s valuable comment, and we have now incorporated this point into the revised manuscript (page 13, lines 265–275) to more clearly address the potential contribution of thalamic inputs in our experimental design.
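      For reference, the stimulation parameters quoted above can be turned into pulse counts and cumulative light-on time with simple arithmetic. The sketch below is illustrative only; the class and field names are hypothetical and not taken from the manuscript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PulseTrain:
    """A fixed-frequency optogenetic pulse train (hypothetical helper class)."""
    frequency_hz: float
    pulse_width_ms: float
    duration_s: float

    @property
    def n_pulses(self) -> int:
        # Number of pulses delivered over the whole train.
        return round(self.frequency_hz * self.duration_s)

    @property
    def duty_cycle(self) -> float:
        # Fraction of each stimulation period during which the light is on.
        return self.frequency_hz * self.pulse_width_ms / 1000.0

    @property
    def light_on_s(self) -> float:
        # Cumulative illumination time across the train, in seconds.
        return self.n_pulses * self.pulse_width_ms / 1000.0


# Frequency, pulse width, and train durations as stated in the response.
protocols = {
    "open field": PulseTrain(20.0, 10.0, 15.0),
    "ICSS": PulseTrain(20.0, 10.0, 1.0),
    "FR4/8": PulseTrain(20.0, 10.0, 8.0),
}

for name, train in protocols.items():
    print(f"{name}: {train.n_pulses} pulses, "
          f"duty cycle {train.duty_cycle:.0%}, light on {train.light_on_s:.1f} s")
```

      All three protocols share the same duty cycle and differ only in train length, so the comparison with the 500 ms sustained stimulation in Cruikshank et al. concerns total duration rather than pulse shape.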

      The statements about specificity of connectivity are not well-founded. It may be that in the specific case where they are assessing outside of the area of injections, their conclusions may hold (e.g., excitatory inputs onto D2s have more inputs onto D1s than vice versa). However, how this relates to the actual site of injection is not clear. At face value, if such a connectivity exists, it would suggest that D1-MSNs receive substantially more overall excitatory inputs than D2s. It is thus possible that this observation would not hold over other spatial intervals. This was not explored and thus the conclusions are over-generalized. e.g., the distance from the area of red cells in the striatum to recordings was not quantified, what constituted a high level of cortical labeling was not quantified, etc. Without more rigorous quantification of what was being done, it is difficult to interpret the results. 

      We sincerely thank the reviewer for the thoughtful comments and critical insights into our interpretation of connectivity data. These concerns are valid and provide an important opportunity to clarify and reinforce our experimental design and conclusions.

      Firstly, as described in our previous response, all patched neurons were carefully selected to be within the injection site and in close proximity to ChR2-mCherry-positive cells. Specifically, the estimated distance from each recorded neuron to the nearest starter cells did not exceed 100 µm. This design choice was made to minimize variability associated with spatial distance or heterogeneity in viral expression, thereby allowing for a more consistent sampling of putatively connected neurons.

      Secondly, quantifying both the number of starter and input neurons would, in principle, provide a more comprehensive picture of connectivity. However, given the technical limitations of the current approach, particularly when combining rabies tracing with functional recordings, it is not feasible to obtain such precise cell counts. Instead, we focused on connection ratios derived from targeted electrophysiological recordings, which offer a reliable and practical means of estimating connectivity within these defined circuits.

      Thirdly, regarding the potential influence of rabies-labeled neurons beyond the immediate recording site: while we acknowledge that rabies tracing labels a broad set of upstream neurons, our analysis was confined to a well-defined and localized area. The analogy we find helpful here is that of a spotlight - our recordings were restricted to the illuminated region directly under the beam, where the projection pattern is fixed and interpretable, regardless of what lies outside that area. Although we cannot fully account for all possible upstream connections, our methodology was designed to minimize variability and maintain consistency in the region of interest, which we believe supports the robustness of our conclusions in the ex vivo slice recording experiment.

      We hope this additional explanation addresses the reviewer’s concerns and helps clarify the rationale of our experimental strategy.

      The results in figure 3 are not well controlled. The authors show contrasting effects of optogenetic stimulation of D1-MSNs and D2-MSNs in the DMS and DLS, results which are largely consistent with the canon of basal ganglia function. However, when stimulating cortical inputs, stimulating the inputs to D1-MSNs gives the expected results (increased locomotion) while stimulating putative inputs to D2-MSNs had no effect. This is not the same as showing a decrease in locomotion - showing no effect here is not possible to interpret.

      We apologize for any confusion and appreciate the opportunity to clarify this point. Our electrophysiological recordings demonstrated that D1-projecting cortical neurons preferentially innervate D1-SPNs in the striatum, whereas D2-projecting cortical neurons provide input to both D1- and D2-SPNs, without a clear preference. These synaptic connectivity patterns are further supported by our behavioral experiments: optogenetic stimulation of D1-projecting neurons in cortical areas such as MCC and M1 led to behavioral effects consistent with direct D1-SPN activation. In contrast, stimulation of D2-projecting cortical neurons produced behavioral outcomes that appeared to reflect a mixture of both D1- and D2-SPN activation.

      We acknowledge that interpreting negative behavioral findings poses inherent challenges, as it is difficult to distinguish between a true lack of effect and insufficient experimental manipulation. To mitigate this, we ensured that all animals included in the analysis exhibited appropriate viral expression and correctly placed optic fibers in the targeted regions. These controls help to confirm that the observed behavioral effects - or lack thereof - are indeed due to the activation of the intended neuronal populations rather than technical artifacts such as weak expression or fiber misplacement.

      As shown in Author response image 1 below, our verification of virus expression and fiber positioning confirms effective targeting in MCC and M1 of A2A-Cre mice. Therefore, we interpret the negative behavioral outcomes as meaningful consequences of specific neural circuit activation.

      Author response image 1.

      Confocal image from A2A-Cre mouse showing targeted optogenetic stimulation of D2-projecting cortical neurons in MCC or M1. ChR2-mCherry expression highlights D2-projecting neurons, selectively labeled via rabies-mediated tracing. Optic fiber placement is confirmed above the cortical region of interest. Image illustrates robust expression and anatomical specificity necessary for pathway-selective stimulation in behavioral assays.

      In light of their circuit model, the result showing that inputs to D2-MSNs drive ICSS is confusing. How can the authors account for the fact that these cells are not locomotor-activating, stimulation of their putative downstream cells (D2-MSNs) does not drive ICSS, yet the cortical inputs drive ICSS? Is the idea that these inputs somehow also drive D1s? If this is the case, how do D2s get activated, if all of the cortical inputs tested net activate D1s and not D2s? Same with the results in figure 4 - the inputs and putative downstream cells do not have the same effects. Given the potential caveats of differences in viral efficiency, spatial location of injections, and cellular toxicity, I cannot interpret these experiments.

      We apologize for any confusion in our previous explanation. In our behavioral experiments, the primary objective was to determine whether activation of D1- or D2-projecting cortical neurons would produce behavioral outcomes distinct from those observed with pure D1 or D2 activation.

      Our findings show that stimulation of D1-projecting cortical neurons produced behavioral effects closely resembling those of selective D1 activation in both open field and ICSS tests. This is consistent with our slice recording data, which revealed that D1-projecting cortical neurons exhibit a higher connection probability with D1-SPNs than with D2-SPNs.

      In contrast, interpreting the effects of D2-projecting cortical neuron stimulation is inherently more nuanced. In the open field test, activation of these neurons did not significantly modulate local motion. This could reflect a balanced influence of D1 activation, which facilitates movement, and D2 activation, which suppresses it - resulting in a net neutral behavioral outcome. In the ICSS test, the absence of a strong reinforcement effect typically associated with D2 activation, combined with partial reinforcement likely due to concurrent D1 activation, suggests that stimulation of D2-projecting neurons produces a mixed behavioral signal. This outcome supports the interpretation that these neurons synapse onto both D1- and D2-SPNs, leading to a blended behavioral response that differs from selective D1 or D2 activation alone.

      Together, these two behavioral assays offer complementary perspectives, providing a more complete view of how projection-specific cortical inputs influence striatal output and behavior.

      In Figure 4 of the current manuscript (as cited below), we show that optogenetic activation of MCC neurons projecting to D1-SPNs facilitates sequence lever pressing, whereas activation of MCC neurons projecting to D2-SPNs does not induce significant behavioral changes. Conversely, activation of M1 neurons projecting to either D1- or D2-SPNs enhances lever pressing sequences. These observations align with our prior findings (Geddes et al., 2018; Jin et al., 2014), where we demonstrated that in the striatum, D1-SPN activation facilitates ongoing lever pressing, whereas D2-SPN activation is more involved in suppressing ongoing actions and promoting transitions between sub-sequences, as shown in Fig. 4 from (Geddes et al., 2018; Jin et al., 2014) and Fig. 5K from (Jin et al., 2014). Taken together, the facilitation of lever pressing by D1-projecting MCC and M1 neurons is consistent with their preferential connectivity to D1-SPNs and their established behavioral role.

      What is particularly intriguing, though admittedly more complex, is the behavioral divergence observed upon activation of D2-SPN-projecting cortical neurons. Activation of D2-projecting MCC neurons does not alter lever pressing, possibly reflecting a counterbalancing effect from concurrent D1- and D2-SPN activation. In contrast, stimulation of D2-projecting M1 neurons facilitates lever pressing, albeit less robustly than their D1-projecting counterparts. This discrepancy may reflect regional differences in striatal targets, DMS for MCC versus DLS for M1, as also supported by our open field test results. Furthermore, our recent findings (Zhang et al., 2025) show that synaptic strength from Cg to D2-SPNs is stronger than to D1-SPNs, whereas the M1 pathway exhibits the opposite pattern. These data suggest that beyond projection ratios, synaptic strength also shapes cortico-striatal functional output. Thus, stronger D2-SPN synapses in the DMS may offset D1-SPN activation during MCC-D2 stimulation, dampening any increase in lever pressing. Conversely, weaker D2 synapses in the DLS may permit M1-D2 projections to facilitate behavior more readily.

      In summary, the behavioral outcomes of our optogenetic manipulations support the proposed asymmetric cortico-striatal connectivity model. While the effects of D2-projecting neurons are not uniform, they reflect varying balances of D1- and D2-SPN influence, which further underscores the asymmetrical connections of cortical inputs to the striatum.

      Recommendations For The Authors:

      Reviewer #1 (Recommendations For The Authors): 

      (1) What are the sample sizes for Fig S2? Some trends that are listed as nonsignificant look like they may just be underpowered. Related to this point, S2C indicates that PPR is statistically similar in all conditions. The traces shown in Figure 2 suggest that PPR is quite different in "Input D1"- vs "Input D2" projections. If there is indeed no difference, the exemplar traces should be replaced with more representative ones to avoid confusion. 

      Thank you for your suggestion. The sample size reported in Figure S2 corresponds to the neurons identified as connected in Figure 2. The representative traces shown in Figure 2 were selected based on their close alignment with the amplitude statistics and are intended to reflect typical responses. Given this, it is appropriate to retain the current examples as they accurately illustrate the underlying data.

      (2) Previous studies have described that SPN-SPN collateral inhibition is also asymmetric, with D2->D1 SPN connectivity stronger than the other direction. While cortical inputs to D2-SPNs may also strongly innervate D1-SPNs, it would be helpful to speculate on how collateral inhibition may further shape the biases (or lack thereof) reported here. 

      This would indeed be an interesting topic to explore. SPN-SPN mutual inhibition and/or interneuron inhibition may also play a role in the functional organization and output of the striatum. In the present study, we focused on the primary layer of cortico-striatal connectivity to examine how cortical neurons selectively connect to the striatal direct and indirect pathways, as these pathways have been shown to have distinct yet cooperative functions. To achieve this, we applied a GABAA receptor inhibitor to isolate only excitatory synaptic currents in SPNs, yielding the relevant results.

      To investigate additional circuit organization involving SPN-SPN mutual inhibition, the current available technique would involve single-cell initiated rabies tracing. This approach would help identify the starter SPN and the upstream SPNs that provide input to the starter cell, thereby offering a clearer understanding of the local circuit.

      (3) In Fig 3N-S there are no stats confirming that optogenetic stimulation does indeed increase lever pressing in each group (though it obviously looks like it does). It would be helpful to add statistics for this comparison, in addition to the between-group comparisons that are shown. 

      We thank the reviewer for this thoughtful suggestion. To assess whether optogenetic stimulation increases lever pressing in each group shown in Figures 3O, 3P, 3R, and 3S, we employed a permutation test (10,000 permutations). This non-parametric statistical method does not rely on assumptions about the underlying data distribution and is particularly appropriate for our analysis given the relatively small sample sizes.
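
      The response does not spell out how the permutation test on the slope was implemented, so the following is only a minimal sketch of one common variant, under stated assumptions: the slope is the ordinary least-squares slope of lever presses against test day, the null distribution is built by shuffling the press counts across days, and the P value is one-sided (slope greater than expected by chance). The function names and the example numbers are hypothetical and are not taken from the manuscript.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_slope_test(x, y, n_perm=10_000, seed=0):
    """One-sided permutation test for a positive slope.

    Assumption: under the null hypothesis the y values are exchangeable
    across x, so shuffling y breaks any trend over days.
    """
    rng = random.Random(seed)
    observed = ols_slope(x, y)
    shuffled = list(y)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if ols_slope(x, shuffled) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated P value strictly above zero.
    p_value = (at_least_as_large + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical group-mean lever presses on Days 1 to 7 (illustrative only).
days = [1, 2, 3, 4, 5, 6, 7]
presses = [2.0, 3.5, 3.0, 6.0, 7.5, 9.0, 11.0]
slope, p_value = permutation_slope_test(days, presses)
print(f"slope = {slope:.4f}, one-sided P = {p_value:.4f}")
```

      With only seven time points there are 5,040 distinct orderings, so 10,000 random shuffles sample the null distribution with replacement rather than enumerating it; an exact enumeration would be an equally reasonable choice at this sample size.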

      Additionally, in response to Reviewer 3’s concern regarding the potential cytotoxicity of rabies virus affecting behavioral outcomes during in vivo optogenetic stimulation experiments, we focused our analysis on Days 1 through 7 of the ICSS test. This time window remains within 10 days post-rabies infection, a period during which previous studies have reported minimal cytopathic effects (Osakada et al., 2011).

      Accordingly, we have updated Figure 3N-S and revised the associated statistical analyses in the figure legend as follows:

      (O-P) D1-SPN (red) but not D2-SPN stimulation (black) drives ICSS behavior in both the DMS (O: D1, n = 6, permutation test, slope = 1.5060, P = 0.0378; D2, n = 5, permutation test, slope = -0.2214, P = 0.1021; one-tailed Mann Whitney test, Day 7 D1 vs. D2, P = 0.0130) and the DLS (P: D1, n = 6, permutation test, slope = 28.1429, P = 0.0082; D2, n = 5, permutation test, slope = -0.3429, P = 0.0463; one-tailed Mann Whitney test, Day 7 D1 vs. D2, P = 0.0390). *, P < 0.05. (Q) Timeline of helper virus injections, rabies-ChR2 injections and optogenetic stimulation for ICSS behavior. (R-S) Optogenetic stimulation of the cortical neurons projecting to either D1- or D2-SPNs induces ICSS behavior in both the MCC (R: MCC-D1, n = 5, permutation test, Day1-Day7, slope = 2.5857, P = 0.0034; MCC-D2, n = 5, Day2-Day7, permutation test, slope = 1.4229, P = 0.0344; no significant effect on Day7, MCC-D1 vs. MCC-D2,  two-tailed Mann Whitney test, P = 0.9999) and the M1 (S: M1-D1, n = 5, permutation test, Day1-Day7, slope = 1.8214, P = 0.0259; M1-D2, n = 5, Day1-Day7, permutation test, slope = 1.8214, P = 0.0025; no significant effect on Day7, M1-D1 vs. M1-D2, two-tailed Mann Whitney test, P = 0.3810). n.s., not statistically significant.

      We believe this updated analysis and additional context further strengthen the validity of our conclusions regarding the reinforcement effects.

      (4) Line 206: mice were trained for "a few more days" is not a very rigorous description. It would be helpful to state the range of additional days of training. 

      We thank the reviewer for the suggestion. In accordance with the Methods section, we have now specified the number of days, which is 4 days, in the main text (line 207).

      (5) In Fig 4D,H, the statistical comparison is relative modulation (% change) by stimulation of D1- vs D2- projecting inputs. Please show statistics comparing the effect of stimulation on lever presses for each individual condition. For example, is the effect of MCC-D2 stimulation in panel D negative or not significant? 

      Thank you for your suggestion. Below are the statistical results, which we have also incorporated into the figure legend for clarity. To assess the net effects of each manipulation, we compared the observed percentage changes with a theoretical value of zero.

      In Figure 4D, optogenetic stimulation of D1-projecting MCC neurons significantly increased the pressing rate (MCC-D1, n = 8, one-sample two-tailed t-test, t = 2.814, P = 0.0131), whereas stimulation of D2-projecting MCC neurons did not produce a significant effect (MCC-D2, n = 7, one-sample two-tailed t-test, t = 0.8481, P = 0.4117).

      In contrast, Figure 4H shows that optogenetic stimulation of both D1- and D2-projecting M1 neurons significantly increased the sequence press rate (M1-D1, n = 6, one-sample two-tailed Wilcoxon signed-rank test, P = 0.0046; M1-D2, n = 7, one-sample two-tailed Wilcoxon signed-rank test, P = 0.0479).

      These analyses help clarify the distinct behavioral effects of manipulating different corticostriatal projections.
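
      The comparison of a percentage change against a theoretical value of zero can be sketched as follows. This is an illustration under assumptions rather than the authors' analysis code: the response does not state which software was used, the helper names are hypothetical, the example values are invented, and the signed-rank routine assumes no zero differences and no tied magnitudes. The t statistic is returned without a P value because the Student t distribution is not available in the Python standard library.

```python
import math
from itertools import product
from statistics import mean, stdev


def one_sample_t(values, mu0=0.0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(values)
    t_stat = (mean(values) - mu0) / (stdev(values) / math.sqrt(n))
    return t_stat, n - 1


def wilcoxon_signed_rank_exact(values, mu0=0.0):
    """Exact two-tailed one-sample Wilcoxon signed-rank test.

    Assumes no zero differences and no ties in absolute value, so the
    ranks are simply 1..n and all 2**n sign patterns are equally likely
    under the null hypothesis.
    """
    diffs = [v - mu0 for v in values]
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4  # null mean of the positive-rank sum
    observed_dev = abs(w_plus - expected)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if abs(w - expected) >= observed_dev:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical percentage changes in press rate for seven animals.
pct_change = [12.0, 35.0, -4.0, 20.0, 8.0, 27.0, 15.0]
t_stat, dof = one_sample_t(pct_change)
w_plus, p_exact = wilcoxon_signed_rank_exact(pct_change)
print(f"t({dof}) = {t_stat:.3f}; W+ = {w_plus}, exact two-tailed P = {p_exact:.5f}")
```

      The t-test is the natural choice when the percentage changes look approximately normal, and the signed-rank test when they do not, which may be why the two figure panels are analyzed with different tests.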

      (6) Are data in Fig 1G-H from a D1- or A2a- cre mouse? 

      The data in Fig 1G-H are from a D1-Cre mouse.

      (7) In Fig S3 it looks like there may actually be an effect of 20 Hz stimulation of D2-SPNs. Though it probably doesn't affect the interpretation. 

      As indicated by the statistics, there is a slight, but not statistically significant, decrease in local motion when 20 Hz stimulation is delivered to the motor cortex with ChR2 expression in D2-SPNs in the striatum.

      Reviewer #2 (Recommendations For The Authors): 

      The rabies tracing is referred to on several occasions as "new" but the reference papers are from 2011, 2013, and 2018. It is unclear what is new about the system used in the paper and what new feature is relevant to the experiments that were performed. Either clarify or remove "new" terminology. 

      Thank you for bringing this to our attention. We have revised the relevant text accordingly at line 20 in the Abstract, line 31 in the In Brief, line 69 in the Introduction, line 83 in the Results, and line 226 in the Discussion to improve clarity and accuracy.

      In Figure 2 D and G, D1 eGFP (+) and D2 eGFP(-) are plotted separately. These are the same cell type; therefore it may work best to combine that data. This could also be done for 'input to D2- Record D2' in panel D as well as 'input D1-Record D2' and 'input D2-Record D1' in panel G. Combining the information in panel D and G and comparing all 4 conditions to each other would give a better understanding of the comparison of functional connectivity between cortical neurons and D1 and D2 SPNs. 

      We thank the reviewer for the thoughtful suggestion. While presenting single bars for each condition (e.g., ‘input D1 - record D1’) might improve visual simplicity, it would obscure an important aspect of our experimental design. Specifically, we aimed to highlight that the comparisons between D1- and D2-projecting neurons to D1 and D2 SPNs were counterbalanced within the same animals - not just across different groups. By showing both D1-eGFP(+) and D2-eGFP(-), or vice versa, within each group and at similar proportions, we provide a more complete picture of the internal control built into our design. This format helps assure the audience that our conclusions are not biased by group-level differences, but are supported by within-subject comparisons. Therefore, we believe that the current presentation better communicates the rigor and balance of our experimental approach.

      The findings in Figure 2 are stated as D1 projecting excitatory inputs have a higher probability of targeting D1 SPNs while D2 projecting excitatory inputs target both D1 SPNs and D2 SPNs. It may be more clear to say that some cortical neurons project specifically to D1 SPNs while other cortical neurons project to both D1 and D2 SPNs equally. A better summary diagram could also help with clarity. 

      Thank you for bringing this up. The data we present reflect the connection probabilities of D1- or D2-projecting cortical neurons to D1 or D2 SPNs. One possible interpretation, as the reviewer suggested, is that a subset of cortical neurons preferentially targets D1 SPNs, while others exhibit more balanced projections to both D1 and D2 SPNs. However, we cannot rule out alternative explanations - for example, that some D2-projecting neurons preferentially target D2 SPNs, or that the observed differences arise from the overall proportions of D1- and D2-projecting cortical neurons connecting to each striatal subtype.

      There are multiple possible patterns of connectivity that could give rise to the observed differences in connection ratios. Based on our current data, we can confidently conclude the existence of asymmetric cortico-striatal projections to the direct and indirect pathways, but the precise nature of this asymmetry will require further investigation.

      Figure 4 introduces the FR8 task, but there are similar takeaways to the findings from Figure 3. Is there another justification for the FR8 task or interesting way of interpreting that data that could add richness to the manuscript?

      The FR8 task is a self-initiated operant sequence task that relies on motor learning mechanisms, whereas the open field test solely assesses spontaneous locomotion. Furthermore, the sequence task enables us to dissect the functional role of specific neuronal populations in the initiation, maintenance, and termination of sequential movements through closed-loop optogenetic manipulations integrated into the task design. These methodological advantages underscore the rationale for including Figure 4 in the manuscript, as it highlights the unique insights afforded by this experimental paradigm.

      I am somewhat surprised to see that D1-SPN stimulation in DLS gave the results in Figure 3 F and P, as mentioned in the public review. These contrast with some previous results (Cui et al, J Neurosci, 2021). Any explanation? Would be useful to speculate or compare parameters as this could have important implications for DLS function.

      Thank you for raising this point. While Cui’s study has generated some debate, several independent investigations have consistently demonstrated that stimulation of D1-SPNs in the dorsolateral striatum (DLS) facilitates local motion and lever-press behaviors (Dong et al., 2025; Geddes et al., 2018; Kravitz et al., 2010). These findings support the functional role of D1-SPNs in promoting movement and motivated actions.

      The differences in behavioral outcomes observed between our study and that of Cui et al. may stem from several methodological factors, particularly related to anatomical targeting and optical stimulation parameters.

      Specifically, our experiments targeted the DMS at AP +0.5 mm, ML ±1.5 mm, DV –2.2 mm, and the DLS at AP +0.5 mm, ML ±2.5 mm, DV –2.2 mm. In contrast, Cui’s study targeted the DMS at AP +0.9 mm, ML ±1.4 mm, DV –3.0 mm, and the DLS at AP +0.7 mm, ML ±2.3 mm, DV –3.0 mm. These differences indicate that their targeting was slightly more rostral and more ventral than ours, which could have led to stimulation of distinct neuronal populations within the striatum, potentially accounting for variations in behavioral effects observed during optogenetic activation.

      In addition, the optical fibers used in the two studies differed markedly. We employed optical fibers with a 200 µm core diameter and a numerical aperture (NA) of 0.37. Cui’s study used fibers with a larger core diameter (250 µm) and a higher NA (0.66), which would produce a broader spread and deeper penetration of light. This increased photostimulation volume may have recruited a more extensive network of neurons, possibly including off-target circuits, thus influencing the behavioral outcomes in a manner not seen in our more spatially constrained stimulation paradigm.
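
      To make the comparison of fiber parameters concrete, the purely geometric divergence of light leaving each fiber can be estimated from the numerical aperture. The sketch below is a rough illustration under stated assumptions: it uses the relation half-angle = arcsin(NA / n), takes a refractive index of about 1.36 for brain tissue (an assumed value that is not given in the response), and ignores scattering and absorption, which strongly shape the real light distribution. The function names and the 500 µm depth are hypothetical.

```python
import math


def divergence_half_angle_deg(numerical_aperture, medium_index=1.36):
    """Geometric half-angle (degrees) of the light cone in the medium.

    medium_index = 1.36 is an assumed refractive index for brain tissue.
    """
    return math.degrees(math.asin(numerical_aperture / medium_index))


def beam_radius_um(core_diameter_um, numerical_aperture, depth_um,
                   medium_index=1.36):
    """Geometric beam radius at a given depth below the fiber tip.

    Ignores scattering and absorption, so it only indicates relative spread.
    """
    theta = math.asin(numerical_aperture / medium_index)
    return core_diameter_um / 2 + depth_um * math.tan(theta)


# Fiber parameters as quoted in the response; the depth is illustrative.
fibers = {
    "200 um core, NA 0.37": (200, 0.37),
    "250 um core, NA 0.66": (250, 0.66),
}
for label, (core, na) in fibers.items():
    angle = divergence_half_angle_deg(na)
    radius = beam_radius_um(core, na, depth_um=500)
    print(f"{label}: half-angle = {angle:.1f} deg, "
          f"radius at 500 um = {radius:.0f} um")
```

      Under these assumptions the higher-NA fiber illuminates a noticeably wider cone, which is in line with the broader spread described above, although tissue scattering would need to be modeled before drawing quantitative conclusions.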

      Taken together, these methodological differences, both in anatomical targeting and optical stimulation parameters, likely contribute to the discrepancies in behavioral results observed between the two studies. Our findings, consistent with other independent reports, support the role of D1-SPNs in facilitating movement and reinforcement behaviors under more controlled and localized stimulation conditions.

      Reviewer #3 (Recommendations For The Authors): 

      Minor: 

      The authors repeatedly state that they are using a new rabies virus system, but the system has been in widespread use for 16 years, including in the exact circuits the authors are studying, for over a decade. I would not consider this new. 

      Thank you for bringing this to our attention. We have revised the relevant text accordingly at line 20 in the Abstract, line 31 in the In Brief, line 69 in the Introduction, line 83 in the Results, and line 226 in the Discussion to improve clarity and accuracy.

      Figure 2G, how many mice were used for recordings?

      In Fig. 2G, we used 8 mice in the D1-projecting to D2 EGFP(+) group, 7 mice in the D1-projecting to D1 EGFP(-) group, 8 mice in the D2-projecting to D1 EGFP(+) group, and 10 mice in the D2-projecting to D2 EGFP(-) group.

      The amplitude of inputs was not reported in figure 2. This is important, as the strength of the connection matters. This is reported in Figure S2, but how exactly this relates to the presence or absence of connections should be made clearer.

      The amplitude data presented in Figure S2 summarize all recorded currents from confirmed connections, as detailed in the Methods section. A connection is defined by the presence of a detectable and reliable postsynaptic current with an onset latency of less than 10 ms following laser stimulation.

      Reference in the reply-to-review comments:

      Aoki, S., Smith, J.B., Li, H., Yen, X.Y., Igarashi, M., Coulon, P., Wickens, J.R., Ruigrok, T.J.H., and Jin, X. (2019). An open cortico-basal ganglia loop allows limbic control over motor output via the nigrothalamic pathway. Elife 8, e49995.

      Chatterjee, S., Sullivan, H.A., MacLennan, B.J., Xu, R., Hou, Y.Y., Lavin, T.K., Lea, N.E., Michalski, J.E., Babcock, K.R., Dietrich, S., et al. (2018). Nontoxic, double-deletion-mutant rabies viral vectors for retrograde targeting of projection neurons. Nat Neurosci 21, 638-646.

      Cruikshank, S.J., Urabe, H., Nurmikko, A.V., and Connors, B.W. (2010). Pathway-Specific Feedforward Circuits between Thalamus and Neocortex Revealed by Selective Optical Stimulation of Axons. Neuron 65, 230-245.

      Dong, J., Wang, L.P., Sullivan, B.T., Sun, L.X., Smith, V.M.M., Chang, L.S., Ding, J.H., Le, W.D., Gerfen, C.R., and Cai, H.B. (2025). Molecularly distinct striatonigral neuron subtypes differentially regulate locomotion. Nat Commun 16, 2710.

      Geddes, C.E., Li, H., and Jin, X. (2018). Optogenetic Editing Reveals the Hierarchical Organization of Learned Action Sequences. Cell 174, 32-43.

      Jin, L., Sullivan, H.A., Zhu, M., Lavin, T.K., Matsuyama, M., Fu, X., Lea, N.E., Xu, R., Hou, Y.Y., Rutigliani, L., et al. (2024). Long-term labeling and imaging of synaptically connected neuronal networks in vivo using double-deletion-mutant rabies viruses. Nat Neurosci 27, 373-383.

      Jin, X., Tecuapetla, F., and Costa, R.M. (2014). Basal ganglia subcircuits distinctively encode the parsing and concatenation of action sequences. Nat Neurosci 17, 423-430.

      Klug, J.R., Engelhardt, M.D., Cadman, C.N., Li, H., Smith, J.B., Ayala, S., Williams, E.W., Hoffman, H., and Jin, X. (2018). Differential inputs to striatal cholinergic and parvalbumin interneurons imply functional distinctions. Elife 7, e35657.

      Kravitz, A.V., Freeze, B.S., Parker, P.R.L., Kay, K., Thwin, M.T., Deisseroth, K., and Kreitzer, A.C. (2010). Regulation of parkinsonian motor behaviours by optogenetic control of basal ganglia circuitry. Nature 466, 622-626.

      Osakada, F., Mori, T., Cetin, A.H., Marshel, J.H., Virgen, B., and Callaway, E.M. (2011). New Rabies Virus Variants for Monitoring and Manipulating Activity and Gene Expression in Defined Neural Circuits. Neuron 71, 617-631.

      Smith, J.B., Klug, J.R., Ross, D.L., Howard, C.D., Hollon, N.G., Ko, V.I., Hoffman, H., Callaway, E.M., Gerfen, C.R., and Jin, X. (2016). Genetic-Based Dissection Unveils the Inputs and Outputs of Striatal Patch and Matrix Compartments. Neuron 91, 1069-1084.

      Wall, N.R., De La Parra, M., Callaway, E.M., and Kreitzer, A.C. (2013). Differential Innervation of Direct- and Indirect-Pathway Striatal Projection Neurons. Neuron 79, 347-360.

      Wickersham, I.R., Lyon, D.C., Barnard, R.J.O., Mori, T., Finke, S., Conzelmann, K.K., Young, J.A.T., and Callaway, E.M. (2007). Monosynaptic restriction of transsynaptic tracing from single, genetically targeted neurons. Neuron 53, 639-647.

      Zhang, B.B., Geddes, C.E., and Jin, X. (2025). Complementary corticostriatal circuits orchestrate action repetition and switching. Sci Adv, in press.

      Zhu, Z.G., Gong, R., Rodriguez, V., Quach, K.T., Chen, X.Y., and Sternson, S.M. (2025). Hedonic eating is controlled by dopamine neurons that oppose GLP-1R satiety. Science 387, eadt0773.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Response to eLife Assessment:

      We sincerely appreciate your recognition of the novelty and potential significance of our study, and we are grateful for your constructive and valuable comments.

      With regard to your concern that cast immobilization (CI) may itself act as a stressor—potentially influencing skeletal muscle, brown adipose tissue (BAT), and locomotor energy expenditure—we fully recognize this as a highly important issue. In our study, we sought to interpret the findings in light of oxygen consumption and activity data; however, it is inherently difficult to disentangle systemic stress responses and the increased energetic costs associated with CI. We have therefore revised the manuscript to explicitly acknowledge this point as a limitation, and to identify it as a subject for future investigation.

      We also greatly value your suggestion concerning the potential involvement of branched-chain amino acids (BCAAs) derived from adipose tissue in BAT thermogenesis. While our present work primarily focused on muscle-derived amino acids, previous studies have reported that impaired BCAA catabolism in white adipose tissue (WAT) is associated with elevated circulating BCAA levels and metabolic dysfunction [1]. Thus, the possibility that adipose tissue contributes to the BCAA pool used by BAT cannot be disregarded. We fully agree that directly addressing this possibility would be highly valuable, and in future work we plan to locally administer isotope-labeled BCAAs into skeletal muscle or adipose tissue and analyze their contribution to circulating BCAA levels and BAT utilization. Although such experiments could not be performed within the timeframe of this resubmission, we have explicitly stated this limitation in the revised manuscript.

      In summary, we have revised the text to acknowledge the limitations highlighted in your comments and to better clarify future research directions. We believe these revisions more accurately position our current study within the broader context. Once again, we are deeply grateful for your recognition of the originality of our work and for your constructive guidance in refining it.

      Response to Reviewers:

      We sincerely appreciate the reviewers’ thoughtful evaluations and constructive comments, and we are grateful for their recognition of the novelty and significance of our study.

      Response to Reviewer 1:

      We thank the reviewer for the detailed and thoughtful comments regarding the potential systemic effects of CI, including stress responses, energy balance, and tissue wasting. These factors are indeed critical when interpreting our findings, and we agree that CI is not merely a passive loss-of-function model but also introduces stress-related influences.

      The principal aim of our study was to investigate the “physiological compensatory mechanisms” that are triggered by loss of muscle function induced by CI. Although CI inevitably elicits systemic metabolic alterations—including stress-related responses—our study is, to our knowledge, the first to demonstrate that a compensatory thermogenic pathway, mediated by the supply of amino acids from skeletal muscle to BAT, is activated under such conditions. We regard this as the central novelty of our work, and it is consistent with the reviewer’s observation that CI results in a “gain of function.”

      Our intention is not to exclude stress as a contributing factor. Rather, we emphasize that under physiological stress conditions requiring BAT thermogenesis—such as reduced energy stores or decreased heat production from skeletal muscle—amino acid supply from muscle to BAT is induced. Importantly, this mechanism is not unique to CI, as we have confirmed similar metabolic crosstalk under acute cold exposure.

      At the same time, we acknowledge that our current data do not allow us to conclude that “stress is not a primary driver” of BAT thermogenesis induced by CI. Chronic stress induced by CI appeared to be limited in our study (Fig. 2_figure supplement 2), but we cannot fully exclude stress-related effects. Accordingly, we now describe the potential triggers of BAT thermogenesis in the manuscript as either decreased body temperature due to muscle functional loss or stress, explicitly noting in the Discussion that stress and reductions in energy reserves may both contribute, as the reviewer suggested. We also modified the original overstatement that “suppression of muscle thermogenesis induces hypothermia,” and now limit the description to the observed phenomenon that “CI-induced restriction of muscle activity leads to reduced cold tolerance,” while recognizing that multiple factors—including stress, substrate availability, and BAT functional capacity—may underlie this effect.

      We further appreciate the reviewer’s comment regarding the energetic burden imposed by CI. The cast weighed less than 2 g (5–10% of body weight), and thus increased locomotor costs cannot be excluded. However, locomotor activity during the dark phase was reduced by approximately 50%, making the net energetic effect difficult to quantify. In the manuscript, we now present oxygen consumption data and restrict our description to “an increase in oxygen consumption per body weight.” Moreover, as food intake remained almost unchanged compared with controls, the animals appear to have compensated for additional energetic demands, supporting the interpretation that the observed effects were not solely attributable to starvation.

      We also find the reviewer’s suggestion—that CI induces BAT overactivation but impairs its functional capacity—extremely important. Indeed, although CI increased thermogenic gene expression in BAT, body temperature maintenance was impaired. We interpret this reduction in thermoregulation as reflecting decreased heat production from skeletal muscle; however, as the reviewer noted, under prolonged CI, depletion of energy stores could further prevent BAT from fully exerting its thermogenic function.

      We have clarified in the revised Discussion that BAT activation under CI is transient and that long-term outcomes may be influenced by contributions from other thermogenic organs. We also recognize the impact of energy depletion as an important issue to be addressed in future studies, and we agree that detailed analyses of metabolic changes and BCAA dynamics following prolonged CI will be an important next step.

      Regarding the reviewer’s concern about potential anesthesia effects on acute cold exposure experiments, we confirmed that body temperature had returned to baseline one hour before testing, and that mice displayed spontaneous feeding and grooming behaviors, which suggested adequate recovery. Moreover, the differences observed compared with sham-anesthetized controls support our interpretation that the results reflect CI-specific effects. Nonetheless, we acknowledge this potential confounding factor as an additional limitation.

      Response to Reviewer 2:

      We thank the reviewer for the constructive comments and clear summary of our findings. We fully agree that the impact of immobilization on skeletal muscle and BAT function under cold exposure represents a key future direction. In the present study, we performed acute cold exposure following short-term immobilization and assessed UCP1 expression and metabolic changes in BAT. However, we acknowledge that we did not fully examine coordinated functional adaptations between skeletal muscle and BAT under cold stress. In particular, how skeletal muscle–derived amino acid supply and IL-6–dependent mechanisms operate during cold exposure remains unresolved. We have therefore noted this explicitly as a limitation and highlighted it as a focus for future work. Going forward, we plan to investigate muscle–BAT metabolic crosstalk and IL-6 signaling in detail under cold conditions to clarify whether the observed responses are specific to CI or represent more general physiological adaptations.

      (1) Herman MA, She P, Peroni OD, Lynch CJ, Kahn BB. Adipose tissue branched chain amino acid (BCAA) metabolism modulates circulating BCAA levels. J Biol Chem. 2010;285(15):11348-56. doi:10.1074/jbc.M109.075184.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Heat production mechanisms are flexible, depending on a wide variety of genetic, dietary, and environmental factors. The physiology associated with each mechanism is important to understand since loss of flexibility is associated with metabolic decline and disease. The phenomenon of compensatory heat production has been described in some detail in publications and reviews, notably by modifying BAT-dependent thermogenesis (for example by deleting UCP1 or impairing lipolysis, cited in this paper). These authors chose to eliminate exercise as an alternative means of maintaining body temperature. To do this, they cast either one or both mouse hindlimbs. This paper is set up as an evaluation of a loss of function of muscle on the functionality of BAT.

      Strengths:

      The study is supported by a variety of modern techniques and procedures.

      Weaknesses:

      The authors show that cast immobilization (CI) does not work as a (passive) loss of function, instead, this procedure produces a dramatic gain of function, putting the animal under considerable stress, inducing b-adrenergic effectors, increased oxygen consumption, and IL6 expression in a variety of tissues, together with commensurate cachectic effects on muscle and fat. The BAT is put under considerable stress, super-induced but relatively poor functioning. Thus within hours and days of CI, there is massive muscle loss (leading to high circulating BCAAs), and loss of lipid reserves in adipose and liver. The lipid cycle that maintains BAT thermogenesis is depleted and the mouse is unable to maintain body temperature.

      I cannot agree with these statements in the Discussion:  

      "We have here shown that cast immobilization suppressed skeletal muscle thermogenesis, resulting in failure to maintain core body temperature in a cold environment."

      This result could also be attributed to high stress and decreased calorie reserves. Note also: CI suppresses 50% of locomotor activity, but the actual work done by the mouse carrying bilateral casts is not taken into account.

      We thank the reviewer for raising this issue. As the reviewer suggests, we also consider that cold intolerance resulting from cast immobilization may be attributed to high stress levels, decreased calorie reserves, or reduced systemic locomotor activity. Indeed, reductions in visceral adipose tissue weight and increases in lipid utilization were observed in the early phase of cast immobilization (Fig.2G and 2F). This suggests that the depletion of calorie reserves induced by stress may affect cold intolerance in cast immobilized mice (Fig.1A-1B). On the other hand, the experiment shown in Fig.1C involved acute cold exposure of mice 2 h after cast immobilization. This result suggests that, even before the depletion of energy stores by immobilization of skeletal muscle, cast immobilization may cause cold intolerance in mice. In addition, as the reviewer suggests, cast immobilization may result in BAT thermogenesis and cachectic effects on muscle and fat. However, circulating corticosterone concentrations and hypothalamic CRH gene expression are not significantly altered after cast immobilization (Figure 2_figure supplement 2D-F). This raises questions about the contribution of stress to the changes in the systemic energy metabolism in this model. As such, we responded to the reviewer’s comments by revising this statement at the beginning of the ‘Discussion’ section and adding a discussion on page 16, in addition to the existing discussion on pages 17–18.

      Furthermore, to respond as best we could to the reviewer's comments, we performed additional experiments using the restraint stress model (Figure 7). We found that short-term restraint stress may recruit substrate supply from skeletal muscle for BAT thermogenesis via Il6 gene expression. Based on these data, we speculate that the interaction between BAT and skeletal muscle amino acid metabolism may operate under various physiological stress conditions, including infection and exercise, as well as skeletal muscle immobilization, stress, and cold exposure. This interaction may play a significant role in regulating body temperature and energy metabolism. We are currently investigating the effects of sympathetic activation on skeletal muscle amino acid metabolism and systemic thermoregulation via IL-6 secretion from skeletal muscle using a new model. These data will be reported in a subsequent report.

      "Thermoregulatory system in endotherms cannot be explained by thermogenesis based on muscle contraction alone, with nonshivering thermogenesis being required as a component of the ability to tolerate cold temperatures in the long term."

      This statement is correct, and it clearly showcases how difficult it is to interpret results using this CI strategy. The question to the authors is: which components of muscle thermogenesis are actually inhibited by CI, and what is the relative heat contribution?

      We appreciate the reviewer raising this important issue. Addressing it would have required measurements of skeletal muscle temperature and electromyography in mice with cast immobilization, but we were unable to perform these measurements. We have therefore described the points the reviewer raises as limitations of this study on page 18.

      In our additional experiments, we found that several genes that are usually activated in skeletal muscle during cold exposure are repressed in mice with cast immobilization (Figure 1_figure supplement 1_G-1K). Skeletal muscle is an important thermogenic organ. Although the role of the sarcolipin gene in non-shivering thermogenesis is well understood, the primary regulator of thermogenesis in metabolism and shivering remains unclear. In future, we would like to use models in which key signals for energy metabolism are inhibited, such as muscle-specific PGC-1α-deficient mice and muscle-specific AMPK-deficient mice, to clarify important factors in skeletal muscle thermogenesis. We expect this approach to enable us to analyze the relative thermal contributions of each component of the heat production process in skeletal muscle, which has proven difficult in immobilized muscle models.

      This conclusion is overinterpreted:

      "In conclusion, we have shown that cast immobilization induced thermogenesis in BAT that was dependent on the utilization of free amino acids derived from skeletal muscle, and that muscle-derived IL-6 stimulated BCAA metabolism in skeletal muscle. Our findings may provide new insights into the significance of skeletal muscle as a large reservoir of amino acids in the regulation of body temperature".

      In terms of the production of the article - the data shown in the heat maps has oddly obscure log10 dimensions. The changes are minimal, approx. 1.5x increase/decrease and therefore significance would be key to reporting these data. Fig.3C heatmap is not suitable. What are the 6 lanes to each condition? Overall, this has little/no information.

      Rather than cherry-picking for a few genes, the results could be made more rigorous using RNA-seq profiling of BAT and muscle tissues.

      We agree that this is an important point. Indeed, our model of skeletal muscle immobilization reveals only modest changes in metabolomics and gene expression analysis. We consider this to be a weakness of the study. However, the interactive thermogenic system that we discovered between skeletal muscle and BAT may also function under other conditions, such as acute stress and cold exposure. We intend to investigate this further in future using models that involve more dramatic metabolic changes. In fact, it has been shown that the levels of several metabolites are significantly altered in BAT after acute cold exposure.[1] Therefore, we have revised the conclusion of this section and added this point on page 18. We also performed an enrichment analysis on the metabolomics data in BAT following cast immobilization and included the results in Figure 2_figure Supplement 1A. In addition, we excluded the heatmap from Fig. 3C of the pre-revision manuscript, as advised by the reviewer. Although we excluded the results in Figure 3C, we consider Figure 3_figure supplement_1 to be sufficient for the text.

      In addition, we agree with the reviewer's remarks on our gene expression analysis. In this study, we were unable to perform RNA-seq profiling of BAT and muscle tissue. Therefore, we have described this as a limitation of the study on page 20. However, we are interested in investigating the effect of IL-6 derived from skeletal muscle on the RNA-seq profiles of skeletal muscle and BAT. We will conduct future RNA-seq analyses of BAT and skeletal muscle, using models of skeletal muscle immobilization, acute cold exposure and restraint stress.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors identified a previously unrecognized organ interaction where limb immobilization induces thermogenesis in BAT. They showed that limb immobilization by cast fixation enhances the expression of UCP1 as well as amino acid transporters in BAT, and amino acids are supplied from skeletal muscle to BAT during this process, likely contributing to increased thermogenesis in BAT. Furthermore, the experiments with IL-6 knockout mice and IL-6 administration to these mice suggest that this cytokine is likely involved in the supply of amino acids from skeletal muscle to BAT during limb immobilization.

      Strengths:

      The function of BAT plays a crucial role in the regulation of an individual's energy and body weight. Therefore, identifying new interventions that can control BAT function is not only scientifically significant but also holds substantial promise for medical applications. The authors have thoroughly and comprehensively examined the changes in skeletal muscle and BAT under these conditions, convincingly demonstrating the significance of this organ interaction.

      Weaknesses:

      Through considerable effort, the authors have demonstrated that limb-immobilized mice exhibit changes in thermogenesis and energy metabolism dynamics at their steady state. However, the impact of immobilization on the function of skeletal muscle and BAT during cold exposure has not been thoroughly analyzed.

      Reviewer #3 (Public Review):

      Summary:

      In this manuscript, the authors show that impairment of hind limb muscle contraction by cast immobilization suppresses skeletal muscle thermogenesis and activates thermogenesis in brown fat. They also propose that free BCAAs derived from skeletal muscle are used for BAT thermogenesis, and identify IL-6 as a potential regulator.

      Strengths:

      The data support the conclusions for the most part.

      Weaknesses: The data provided in this manuscript are largely descriptive. It is therefore difficult to assess the potential significance of the work. Moreover, many of the described effects are modest in magnitude, questioning the overall functional relevance of this pathway. There are no experiments that directly test whether BCAAs derived from adipose tissue are used for thermogenesis, which would require more robust tracing experiments. In addition, the rigor of the work should be improved. It is also recommended to put the current work in the context of the literature.

      We appreciate the reviewer's valuable feedback. As the reviewer pointed out, many of the effects described in this study are modest in magnitude. This reflects a limitation of our study, which used skeletal muscle immobilization as a model. To clarify the overall functional relevance of this pathway, we therefore plan to use alternative models in which BAT thermogenesis and systemic cachectic effect are more strongly induced. We have added this point to the 'Conclusion' section on page 18.

      In addition, previous findings reported that mitochondrial BCAA catabolism in brown adipocytes promotes systemic BCAA clearance, suggesting that BCAAs may be supplied to BAT from other organs during BAT thermogenesis.[5] However, as the reviewer rightly pointed out, the current study did not directly investigate whether BCAAs derived from adipose tissue contribute to thermogenic processes. In light of this, we have revised the manuscript to include a statement in the limitations section on page 20 that addresses this point. 

      Metabolomic analysis of white adipose tissue (WAT) following skeletal muscle immobilization revealed alterations in amino acid concentrations in WAT in response to cast immobilization (Author response image 1A). Notably, levels of BCAAs in WAT remained largely unchanged at 24 hours after cast immobilization, but increased significantly by day 7 (Author response image 1B). At the 24-hour time point, when BAT thermogenesis is known to be activated, WAT weight was found to be reduced (Fig. 2H). Gene expression analysis of amino acid metabolism-related genes in WAT at this time point revealed a modest upregulation of several genes (Author response image 1C). Furthermore, a slight increase in the uptake of [³H]leucine into WAT was observed following immobilization (Fig. 3C). Collectively, these findings suggest that BCAAs within WAT may be primarily metabolized locally rather than being mobilized and supplied to BAT. In addition, given the relatively low levels of BCAAs per tissue mass and the limited capacity for BCAA uptake in WAT compared to other tissues, we consider it unlikely that WAT serves as a major reservoir of BCAAs.

      Author response image 1.

      (A) Amino acids in epididymal white adipose tissue (eWAT) of IL-6 KO (–/–) and WT (+/+) mice without (control) or with bilateral cast immobilization for the indicated times. Results are presented as heat maps of the log10 value of the fold change relative to control WT mice and are means of four mice in each group. (B) BCAA concentrations in eWAT of IL-6 KO and WT mice without (control) or with bilateral cast immobilization for 1 or 7 days. (n = 4 per group) (C) RT and real-time PCR analysis of the expression of SLC1A5, SLC7A1, SLC38A2, SLC43A1, BCAT2 and BCKDHA genes in eWAT of mice without (control) or with bilateral cast immobilization for 24 h. (n = 6 per group). All data other than in (A) are means ± SEM. *p < 0.05, **p < 0.01, ***p < 0.001 as determined by Dunnett's test (B) or by the unpaired t test (C).

      Reviewer #1 (Recommendations for the authors): 

      • Gypsum is an irrelevant label. Label consistently, with a procedure acronym, like CI or Imm.

      We apologize for any confusion that our notation may have caused. We corrected all labels relating to the skeletal muscle immobilization model in mice to 'Imm'.

      There are many grammatical errors and typos. Search for an example on Fudure1. The sense of some sentences is enough to obscure their meaning.

      We appreciate the reviewer's points. We have checked the article for grammatical and typographical errors, correcting them where necessary.

      • Figures 6E and F need to be re-annotated in the legend and on figures.

      Following the peer reviewer's advice, we have re-annotated the Figure legends of this result.

      Reviewer #2 (Recommendations for the authors): 

      (1) It is difficult to understand how the data presented in Supplemental Table 1 were obtained. This appears to be data showing that the skeletal muscle weight of the hind limbs in mice accounts for 40 to 50% of the total skeletal muscle weight. How did the authors calculate the muscle weight? Specifically, how did they measure the weight of muscles that are neither in the hind limbs nor in the forelimbs ("Other")? Was this estimated from whole-body CT or MRI data?

      In the legend, it mentions "the posterior cervical region," but what exactly was measured in the posterior cervical region? The methods for this data should be clearly described.

      We appreciate the reviewers' comments. We apologize for any confusion caused by inadequate explanation of these data. These data were obtained by removing skeletal muscle from the posterior cervical region and measuring the weight of the wet tissue. We took care to remove most of the skeletal muscle, but some will have remained. However, we do not believe that these errors are significant enough to alter the interpretation of the results. This has now been added to the 'Methods' section on page 21.

      (2) Through considerable effort, the authors have demonstrated that limb-immobilized mice exhibit changes in thermogenesis and energy metabolism dynamics at their steady state. However, it remains unclear why limb-immobilized mice have reduced tolerance to cold exposure. Was there any change in the abundance of energy metabolism-related genes during cold exposure between the immobilized and control mice? For example, if the gene expression of UCP1 and UCP2, which are typically upregulated in brown adipose tissue (BAT) and skeletal muscle during cold exposure, was suppressed in the immobilized mice, it might explain their reduced cold tolerance. Thus, the changes in the response of skeletal muscle and BAT to cold exposure between immobilized and control mice should also be analyzed.

      We thank the reviewer for the constructive comments. We consider the main weakness of this study to be the fact that we were unable to measure the temperature or record electromyography (EMG) of the skeletal muscles of the cast-immobilized mice. Following the reviewers' advice, we analyzed the expression levels of several genes related to heat production or energy metabolism (Ucp1, Ucp2, Ucp3, Sln and Ppargc1a) in BAT and skeletal muscle of cast-immobilized mice after acute cold exposure (Figure 1_figure supplement 1G-1K). The results showed that the expression of several genes that are usually increased in BAT and skeletal muscle during cold exposure was repressed in cast-immobilized mice. Notably, cast immobilization did not induce the UCP2 and PGC-1α genes at room temperature, and their upregulation during cold exposure was also suppressed in cast-immobilized mice. UCP2 is known to alter its expression in relation to energy metabolism, but it is unclear whether it regulates energy metabolism.[2] Additionally, UCP2 is not thought to play a role in thermogenesis, and the function of UCP2 in skeletal muscle remains unclear.[3] On the other hand, PGC-1α is widely recognized as a transcriptional coactivator that regulates various metabolic processes, including thermogenesis.[4] In our study, we found that the amounts of metabolites in the TCA cycle and the expression of the PGC-1α gene decreased rapidly in immobilized skeletal muscle. This suggests that the metabolic rate is reduced in immobilized skeletal muscle (Figure 1_figure supplement 2A and 2F). In endothermic animals, energy expenditure in skeletal muscle plays a significant role in maintaining body temperature during both activity and rest. Hence, it is assumed that the reduced metabolic rate in skeletal muscle significantly impacts the maintenance of body temperature in cold conditions.
Further investigation is required into the function of these genes in skeletal muscle thermogenesis, but the additional data suggest that the loss of muscle function due to immobilization affects the maintenance of body temperature in cold conditions. These results are discussed further on page 15.

      Reviewer #3 (Recommendations for the authors): 

      There are also more specific concerns related to the data supporting the claims.

      (1) The relevance of increasing thermogenesis in BAT after cast immobilization is unclear, as adult humans have very little BAT. Thermogenesis gene and protein expression should be measured in white adipose tissue.

      We would like to thank the reviewers for highlighting this important issue. We agree with the reviewer's comments. We did not observe significant changes in UCP1 expression in the subcutaneous adipose tissue of the inguinal region following skeletal muscle immobilization. We suspect that this is because skeletal muscle immobilization in mice did not exert a strong enough effect to induce browning of white adipose tissue. The ability of immobilizing skeletal muscle to activate thermogenesis in brown or beige adipocytes in adults remains unclear. We have therefore noted this limitation in our study in line 6.

      Additionally, in this study, we aimed to clarify the role of skeletal muscle as an amino acid reservoir under metabolic stress conditions that increase BAT thermogenesis. To this end, we employed models of skeletal muscle immobilization, acute cold exposure, and restraint stress. We also intend to analyze the metabolic interactions between beige adipose tissue and skeletal muscle in more detail using models that induce browning, such as exercise or cold acclimation.

      (2) In Figures 1E-G, there is no significant difference in UCP1 levels relative to the control, but body temperature is lowered from day 2 to day 7. How do the authors explain this?

      This is an important point. We consider the decrease in body temperature of mice following cast immobilization at room temperature to be the result of a reduction in systemic locomotor activity.

      (3) The small induction of PGC1a seen at 10 hours goes away after day 3. Why is this?

      This is an important point. Our investigation showed that the norepinephrine concentration in BAT and blood of cast-immobilized mice tends to increase, peaking at 24 hours of immobilization (Fig. 1H and Figure 2_figure supplement 2D), and then gradually returns to baseline. We speculate that this transient activation of the sympathetic nervous system may affect the expression of PGC1α in BAT. Additionally, although thermogenesis in BAT temporarily increases after skeletal muscle immobilization, studies from other research groups suggest that long-term skeletal muscle immobilization (two weeks) may increase non-shivering thermogenesis in skeletal muscle via high expression of SLN.[6] Therefore, we hypothesize that thermogenic mechanisms other than BAT might be involved during prolonged cast immobilization. We have added a discussion of these topics on page 16.

      (4) The metabolic cage data are marked in multiple places as significant, but the effect size is extremely small. Please describe how significance was calculated (Figure 5 supplement 1B, E, F).

      This is a valid point. This data was statistically analyzed using daily averages, with the results then being compiled. However, the figure was amended because it was not appropriate to use the original to demonstrate significant differences.

      (5) How does IL-6 increase BCAA levels in muscle?

      This is an important point. We are also investigating this issue with great interest. In future, we will use RNA-seq profiling to investigate the mechanism by which IL-6 regulates amino acid metabolism in skeletal muscle. This point was added as a limitation of the study on page 19.

      (6) What is the mechanism behind the elevated il6 levels after cast immobilization?

      We appreciate the reviewer's points. Since IL-6 gene expression in skeletal muscle increases in response to acute cold exposure and acute stress, we hypothesize that IL-6 is regulated by β-adrenergic effectors. Our preliminary experiments suggest that stimulation with norepinephrine or with clenbuterol, a β2-adrenergic receptor agonist, increases IL-6 gene expression and the intracellular free BCAA concentration in cultured mouse muscle cells (Author response image 2A-2D). Going forward, we plan to conduct further studies using a mouse model in which the sympathetic nervous system is activated by administering LPS intracerebroventricularly, as well as using muscle-specific β2-adrenergic receptor knockout mice.

      References:

      (1) Okamatsu-Ogura, Y., et al. UCP1-dependent and UCP1-independent metabolic changes induced by acute cold exposure in brown adipose tissue of mice. Metabolism. 2020 113: 154396 doi: 10.1016/j.metabol.2020.154396.

      (2) Patrick Schrauwen and Matthijs Hesselink, UCP2 and UCP3 in muscle controlling body metabolism., J Exp Biol. 2002 Aug;205(Pt 15):2275-85. doi: 10.1242/jeb.205.15.2275.

      (3) C Y Zhang, et al., Uncoupling protein-2 negatively regulates insulin secretion and is a major link between obesity, beta cell dysfunction, and type 2 diabetes., Cell. 2001 Jun 15;105(6):745-55. doi: 10.1016/s0092-8674(01)00378-6.

      (4) Christophe Handschin and Bruce M Spiegelman, Peroxisome proliferator-activated receptor gamma coactivator 1 coactivators, energy homeostasis, and metabolism., Endocr Rev. 2006 Dec;27(7):728-35. doi: 10.1210/er.2006-0037.

      (5) Yoneshiro, et al., BCAA catabolism in brown fat controls energy homeostasis through SLC25A44. Nature. 2019 572(7771): 614-619 doi: 10.1038/s41586-019-1503-x.

      (6) Shigeto Tomiya, et al., Cast immobilization of hindlimb upregulates sarcolipin expression in atrophied skeletal muscles and increases thermogenesis in C57BL/6J mice., Am J Physiol Regul Integr Comp Physiol. 2019 Nov 1;317(5):R649-R661. doi: 10.1152/ajpregu.00118.2019.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Strengths: 

      Sarpaning et al. provide a thorough characterization of putative Rnt1 cleavage of mRNA in S. cerevisiae. Previous studies have discovered Rnt1 mRNA substrates anecdotally, and this global characterization expands the known collection of putative Rnt1 cleavage sites. The study is comprehensive, with several types of controls to show that Rnt1 is required for several of these cleavages.

      Weaknesses: 

      (1) Formally speaking, the authors do not show a direct role of Rnt1 in mRNA cleavage - no studies were done (e.g., CLIP-seq or similar) to define direct binding sites. Is the mutant Rnt1 expected to trap substrates? Without direct binding studies, the authors rely on genetics and structure predictions for their argument, and it remains possible that a subset of these sites is an indirect consequence of rnt1. This aspect should be addressed in the discussion.

      We have expanded on this point in the discussion, as requested. We do not, however, agree that CLIP-seq or other methods are needed to address this point, or that they would even help answer the question the reviewer raises.

      Importantly, we show that recombinant Rnt1 purified from E. coli cleaves the same sites as those mapped in vivo. This provides direct evidence that Rnt1 directly binds those RNAs. Furthermore, it shows that Rnt1 can bind these RNAs without the need for other proteins. Our observation that many mRNAs are cleaved at -14 and +16 positions from NGNN stem loops to leave 2-nt 3’ overhangs provides further support that these are the products of an RNase III enzyme, and Rnt1 is the only family member in yeast. Thus, we disagree with the reviewer that our studies do not show direct targeting.

      CLIP-seq experiments would be valuable, but they would address a different point. CLIP-seq measures protein binding to RNA targets, and it is likely that Rnt1 binds some RNAs without cleaving them. In addition, only a transient interaction is needed for cleavage, and such transient interactions might not be readily detected by CLIP-seq. Thus, CLIP-seq would reveal the RNAs bound by Rnt1, but would not help identify which ones are cleaved. Catala et al (2004) showed that the catalytically inactive mutant of Rnt1 carries out some functions that are important for the cell cycle. CLIP-seq studies would be valuable to determine these non-catalytic roles of Rnt1, but we consider those questions beyond the scope of the current study.

      (2) The comprehensive list of putative Rnt1 mRNA cleavage sites is interesting insofar as it expands the repertoire of Rnt1 on mRNAs, but the functional relevance of the majority of these sites remains unknown. Along these lines, the authors should present a more thorough characterization of putative Rnt1 sites recovered from in vitro Rnt1 cleavage.

      We have included new data that confirm that YDR514C cleavage by Rnt1 is relevant to yeast cell physiology. We show that YDR514C overexpression is indeed toxic, as we previously postulated. More importantly, we generated an allele of YDR514C that has synonymous mutations designed to disrupt the stem-loop recognized by Rnt1. We show that at 37 °C, both the wild-type and mutant allele are toxic to rnt1∆ cells, but that in cells that express Rnt1, the wild-type cleavable allele is more toxic than the allele with the mutated stem-loop. This genetic interaction provides strong evidence that cleavage of YDR514C by Rnt1 is relevant to cell physiology. 

      We have also added PARE analysis of poly(A)-enriched and poly(A)-depleted reactions and show that compared to Dcp2, Rnt1 preferentially targets poly(A)+ mRNAs, consistent with it targeting nuclear RNAs. We discuss in more detail that by cleaving nuclear RNA, Rnt1 provides a kinetic proofreading mechanism for mRNA export competence.

      (3) The authors need to corroborate the rRNA 3'-ETS tetraloop mutations with a northern analysis of 3'-ETS processing to confirm an ETS processing defect (which might need to be done in decay mutants to stabilize the liberated ETS fragment). They state that the tetraloop mutation does not yield a growth defect and use this as the basis for concluding that rRNA cleavage is not the major role of Rnt1 in vivo, which is a surprising finding. But it remains possible that tetraloop mutations did not have the expected disruptive effect in vivo; if the ETS is processed normally in the presence of tetraloop mutations, it would undermine this interpretation. This needs to be more carefully examined.

      We have removed the rRNA 3'-ETS tetraloop mutations, because initial northern blot analysis indicated that Rnt1 cleavage is not completely blocked by the mutations we designed. Therefore, the reviewer is correct that the tetraloop mutations did not have the expected disruptive effect in vivo. Future investigations will be required to fully understand this. This was a minor point, and removing it focuses the paper on its major contributions.

      (4) To support the assertion that YDR514C cleavage is required for normal "homeostasis," and more specifically that it is the major contributor to the rnt1∆ growth defect, the authors should express the YDR514C-G220S mutant in the rDNA∆ strains with mutations in the 3'-ETS (assuming they disrupt ETS processing, see above). This simple experiment should provide a relative sense of "importance" for one or the other cleavage being responsible for the rnt1∆ defect. Given the accepted role of Rnt1 cleavage in rRNA processing and a dogmatic view that this is the reason for the rnt1∆ growth defect, such a result would be surprising and elevate the functional relevance and significance of Rnt1 mRNA cleavage.

      We agree that the experiment proposed by the reviewer is very simple, but we are puzzled by the rationale. First, our experiments do not support that there is anything special about the G220S mutation in YDR514C. A complete loss of function (ydr514c∆) also suppresses the growth defect, suggesting that ydr514c-G220S is a simple loss of function allele. We have clarified that the G220S mutation is distant from the stem-loop recognized by Rnt1 and is unlikely to affect cleavage by Rnt1. Instead, Rnt1 cleavage and the G220S mutation are independent alternative ways to reduce Ydr514c function. We have clarified this point in the text. 

      As mentioned in response to point #3, we have included other additional experiments that address the same overall question raised here – the importance of YDR514C mRNA cleavage by Rnt1.    

      (5) Given that some Rnt1 mRNA cleavage is likely nuclear, it is possible that some of these targets are nascent mRNA transcripts, as opposed to mature but unexported mRNA transcripts, as proposed in the manuscript. A role for Rnt1 in co-transcriptional mRNA cleavage would be conceptually similar to Rnt1 cleavage of the rRNA 3'-ETS to enable RNA Pol I "torpedo" termination by Rat1, described by Proudfoot et al (PMID 20972219). To further delineate this point, the authors could e.g., examine the poly-A tails on abundant Rnt1 targets to establish whether they are mature, polyadenylated mRNAs (e.g., northern analysis of oligo-dT purified material). A more direct test would be PARE analysis of oligo-dT enriched or depleted material to determine the poly-A status of the cleavage products. Alternatively, their association with chromatin could be examined. 

      We have added the requested PARE analysis of oligo-dT enriched or depleted material to determine the polyA status of the cleavage products, along with related discussion. These data confirm our proposal that Rnt1 cleaves mature but unexported mRNA transcripts.

      We also note that the northern blots shown in figures 2E, 4C, and 5B use oligo dT selected RNA because the signal was undetectable when we used total RNA. This suggests that the cleaved mRNAs are indeed polyadenylated. 

      The term nascent is somewhat ambiguous, but if the reviewer means RNA that is still associated with Pol II and has not yet been cleaved by the cleavage and polyadenylation machinery, we think that is inconsistent with our findings. We have also re-analyzed the NET-seq data from https://pubmed.ncbi.nlm.nih.gov/21248844/ and find no prominent peaks for our Rnt1 sites in Pol II associated RNAs, although for BDF2 NET-seq does suggest that “spliceosome-mediated decay” is co-transcriptional, as would be expected. Altogether these data confirm our previous proposal that Rnt1 mainly cleaves mRNAs that have completed polyadenylation but are not yet exported.

      (6) While laboratory strains of budding yeast have a single RNase III ortholog Rnt1, several other budding yeast have a functional RNAi system with Dcr and Ago (PMID 19745116), and laboratory yeast strains are a derived state due to pressure from the killer virus to lose the RNAi system (PMID 21921191). The current study could provide new insight into the relative substrate preferences of Rnt1 and budding yeast Dicer, which could be experimentally confirmed by expressing Dcr in RNT1 and rnt1∆ strains. In lieu of experiments, discussion of the relevance of Rnt1 cleavage compared to yeast RNAi should be included in the discussion before the "human implications" section.

      The reviewer points out that most other eukaryotic species have multiple RNase III family members, which is a general point we discussed and have now expanded on. The reviewer specifically points to papers that study a species that was incorrectly referred to as Saccharomyces castellii in PMID 19745116, but whose current name is Naumovozyma castellii, reflecting that it is not that closely related to S. cerevisiae (diverged about 86 million years ago; for the correct species phylogeny, see http://ygob.ucd.ie/browser/species.html, as both of the published papers the reviewer cites have some errors in the phylogeny). 

      The other species discussed in PMID 19745116 (Vanderwaltozyma polyspora and Candida albicans) are even more distant. There have been several studies on substrate specificity of Dcr1 versus Rnt1 (including PMID 19745116). 

      The reviewer suggests that expressing Dcr1 in S. cerevisiae would be a valuable addition. However, we can’t envision a mechanism by which S. cerevisiae maintained physiologically relevant Dcr1 substrates in the absence of Dcr1. The results from the proposed study would, in our opinion, be limited to identifying RNAs that can be cleaved in this particular artificial system. We think an important implication of our work is that studies similar to ours should be carried out in rnt1∆, dcr1∆, and double mutants in either S. pombe or N. castellii, as well as in drosha knockouts in animals, and we discuss this in more detail in the revised paper.

      (7) For SNR84 in Figure S3D, it appears that the TSS may be upstream of the annotated gene model. Does RNA-seq coverage (from external datasets) extend upstream to these additional mapped cleavages? The assertion that the mRNA is uncapped is concerning; an alternative explanation is that the nascent mRNA has a cap initially but is subsequently cleaved by Rnt1. This point should be clarified or reworded for accuracy.

      We agree with the reviewer that the most likely explanation is that the primary SNR84 transcript is capped, and 5’ end processed by Rnt1 and Rat1 to make a mature 5’ monophosphorylated SNR84 and have clarified the text accordingly. We suspect our usage of “uncapped” might have been confusing. “uncapped” was not meant to indicate that the primary transcript did not receive a cap, but instead that the mature transcript did not have a cap. We now use “5’ end processed” and “5’ monophosphorylated”. 

      Reviewer #2 (Public review):  

      The yeast double-stranded RNA endonuclease Rnt1, a homolog of bacterial RNase III, mediates the processing of pre-rRNA, pre-snRNA, and pre-snoRNA molecules. Cells lacking Rnt1 exhibit pronounced growth defects, particularly at lower temperatures. In this manuscript, Notice-Sarpaning et al. examine whether these growth defects can be attributed at least in part to a function of Rnt1 in mRNA degradation. To test this, the authors apply parallel analysis of RNA ends (PARE), which they developed in previous work, to identify polyA+ fragments with 5' monophosphates in RNT1 yeast that are absent in rnt1Δ cells. Because such RNAs are substrates for 5' to 3' exonucleolytic decay by Rat1 in the nucleus or Xrn1 in the cytoplasm, these analyses were performed in a rat1-ts xrn1Δ background. The data recapitulate known Rnt1 cleavage sites in rRNA, snRNAs, and snoRNAs, and identify 122 putative novel substrates, approximately half of which are mRNAs. Of these, two-thirds are predicted to contain double-stranded stem loop structures with A/UGNN tetraloops, which serve as a major determinant of Rnt1 substrate recognition. Rnt1 resides in the nucleus, and it likely cleaves mRNAs there, but cleavage products seem to be degraded after export to the cytoplasm, as analysis of published PARE data shows that some of them accumulate in xrn1Δ cells. The authors then leverage the slow growth of rnt1Δ cells for experimental evolution. Sequencing analysis of thirteen faster-growing strains identifies mutations predominantly mapping to genes encoding nuclear exosome co-factors. Some of the strains have mutations in genes encoding a lariat-debranching enzyme, a ribosomal protein nuclear import factor, poly(A) polymerase 1, and the RNA-binding protein Puf4. In one of the puf4 mutant strains, a second mutation is also present in YDR514C, which the authors identify as an mRNA substrate cleaved by Rnt1. Deletion of either puf4 or ydr514C marginally improves the growth of rnt1Δ cells, which the authors interpret as evidence that mRNA cleavage by Rnt1 plays a role in maintaining cellular homeostasis by controlling mRNA turnover. 

      While the PARE data and their subsequent in vitro validation convincingly demonstrate Rnt1-mediated cleavage of a small subset of yeast mRNAs, the data supporting the biological significance of these cleavage events are substantially less compelling. This makes it difficult to establish whether Rnt1-mediated mRNA cleavage is biologically meaningful or simply "collateral damage" due to a coincidental presence of its target motif in these transcripts.

      We thank the reviewer and have added additional data to support our conclusion that mRNA cleavage, at least for YDR514C, is not simply collateral damage, but a physiologically relevant function of Rnt1. From an evolutionary perspective, cleavage of mRNAs by Rnt1 might have initially been collateral damage, but if there is a way to use this mechanism, evolution is probably going to use it.

      (1) A major argument in support of the claim that "several mRNAs rely heavily on Rnt1 for turnover" comes from comparing number of PARE reads at the transcript start site (as a proxy for fraction of decapped transcripts) and at the Rnt1 cleavage site (as a proxy for fraction of Rnt1-cleaved transcripts). The argument for this is that "the major mRNA degradation pathway is through decapping". However, polyA tail shortening usually precedes decapping, and transcripts with short polyA tails would be strongly underrepresented in PARE sequencing libraries, which were constructed after two rounds of polyA+ RNA selection. This will likely underestimate the fraction of decapped transcripts for each mRNA. There is a wide range of well-established methods that can be used to directly measure differences in the half-life of Rnt1 mRNA targets in RNT1 vs rnt1Δ cells. Because the PARE data rely on the presence of a 5' phosphate to generate sequencing reads, they also cannot be used to estimate what fraction of a given mRNA transcript is actually cleaved by Rnt1. 

      The reviewer is correct that decapping preferentially affects mRNAs with shortened poly(A) tails, that Rnt1 cleavage likely affects mostly newly made mRNAs with long poly(A) tails, and that PARE may underestimate the decay of mRNAs with shortened poly(A) tails. We have reanalyzed our previously published data where we performed PARE on both the poly(A)-enriched fraction and the poly(A)-depleted fraction (that remains after two rounds of oligo dT selection). Rnt1 products are over-represented in the poly(A)-enriched fraction, while decapping products are enriched in the poly(A)-depleted fraction, providing further support to our conclusion that Rnt1 cleaves nuclear RNA. We have re-written key sections of the paper accordingly.

      The reviewer also points out that “There is a wide range of well-established methods that can be used to directly measure differences in the half-life of Rnt1 mRNA targets in RNT1 vs rnt1Δ cells.” However, all of those methods measure mRNA degradation rates from the steady state pool, which is mostly cytoplasmic. We have, in different contexts, used these methods, but as we pointed out they are inappropriate to measure degradation of nuclear RNA. There are some studies that measure nuclear degradation rates, but this requires purifying nuclei. There are two major drawbacks to this. First, it cannot distinguish between degradation in the nucleus and export from the nucleus because both processes cause disappearance from the nucleus. Second, the purification of yeast nuclei requires “spheroplasting” or enzymatically removing the rigid cell wall. This spheroplasting is likely to severely alter the physiological state of the yeast cell. Given these significant drawbacks and the substantial time and money required, we chose not to perform this experiment.  

      (2) Rnt1 is almost exclusively nuclear, and the authors make a compelling case that its concentration in the cytoplasm would likely be too low to result in mRNA cleavage. The model for Rnt1-mediated mRNA turnover would therefore require mRNAs to be cleaved prior to their nuclear export in a manner that would be difficult to control. Alternatively, the Rnt1 targets would need to re-enter prior to cleavage, followed by export of the cleaved fragments for cytoplasmic decay. These processes would need to be able to compete with canonical 5' to 3' and 3' to 5' exonucleolytic decay to influence mRNA fate in a biologically meaningful way.

      We disagree that mRNA export would be difficult to control, as is elegantly demonstrated by the 13 kDa HIV Rev protein. The export of many other RNAs is tightly controlled such that many RNAs are rapidly degraded in the nucleus by, for example, Rat1 and the RNA exosome, while other RNAs are rapidly exported. Indeed, the competition between RNA export and nuclear degradation is generally thought to be an important quality control for a variety of mRNAs and ncRNAs. We do agree with the reviewer that re-import of mRNAs appears unlikely (which is why we do not discuss it), although it occurs efficiently for other Rnt1-cleaved RNAs such as snRNAs. We have clarified the text accordingly, including in the introduction, results, and discussion. 

      (3) The experimental evolution clearly demonstrates that mutations in nuclear exosome factors are the most frequent suppressors of the growth defects caused by Rnt1 loss. This can be rationalized by stabilization of nuclear exosome substrates such as misprocessed snRNAs or snoRNAs, which are the major targets of Rnt1. The rescue mutations in other pathways linked to ribosomal proteins (splicing, ribosomal protein import, ribosomal mRNA binding) support this interpretation. By contrast, the potential suppressor mutation in YDR514C does not occur on its own but only in combination with a puf4 mutation; it is also unclear whether it is located within the Rnt1 cleavage motif or if it impacts Rnt1 cleavage at all. This can easily be tested by engineering the mutation into the endogenous YDR514C locus with CRISPR/Cas9 or expressing wild-type and mutant YDR514C from a plasmid, along with assaying for Rnt1 cleavage by northern blot. Notably, the growth defect complementation of YDR514C deletion in rnt1Δ cells is substantially less pronounced than the growth advantage afforded by nuclear exosome mutations (Figure S9, evolved strains 1 to 5). These data rather argue for a primary role of Rnt1 in promoting cell growth by ensuring efficient ribosome biogenesis through pre-snRNA/pre-snoRNA processing. 

      The reviewer makes several points. 

      First, we have clarified that the ydr514c-G220S mutation is not near the Rnt1 cleavage motif and is unlikely to affect cleavage by Rnt1. This is exactly what would be expected for a mutation that was selected for in an rnt1∆ strain. Although the reviewer appears to expect it, a mutation that affects Rnt1 cleavage could not be selected for in a strain that lacks Rnt1.

      Second, the reviewer points out that the original ydr514c mutations arose in a strain that also had a puf4 deletion. However, we show that ydr514c∆ also suppresses rnt1∆. Furthermore, we have added additional data showing that overexpressing an uncleavable YDR514C mRNA affects yeast growth at 37 °C more than the wild-type cleavable form, further supporting that the cleavage of YDR514C by Rnt1 is physiologically relevant. 

      Reviewer #2 (Recommendations for the authors): 

      (1) The description of the PARE library construction protocol and data analysis workflow is insufficient to ensure their robustness and reproducibility. The library construction protocol should include details of the individual steps, and the data analysis workflow description should include package versions and exact commands used for each analysis step.

      We have clarified that the experiments were performed exactly as previously described and have included very detailed methods. The Galaxy server does not require commands and instead we have indicated the parameters chosen in the various steps. We have also added that the PARE libraries for poly(A)+ and poly(A)- fractions were generated in the lab of Pam Green according to their protocol, which is not exactly the same as ours. Nevertheless, the Rnt1 sites are also evident from those libraries, further demonstrating the robustness of our data. 

      (2) PARE signal is expressed as a ratio of sequencing coverage at a given nucleotide in RNT1 vs rnt1Δ cells. This poses challenges to estimating fold changes: by definition, there should be no coverage at Rnt1 cleavage sites in rnt1Δ cells, as there will not be any 5' monophosphate-containing mRNA fragments to be ligated to the library construction linker. This should be accounted for in the data analysis pipeline - the DESeq2 package, for example, handles this very well (https://support.bioconductor.org/p/64014/).

      The reviewer is correct, and we have clarified how we account for the possibility of having 0 reads by adding an arbitrary 0.01 cpm to all PARE scores for wild type and mutant. In the original manuscript this was not explicitly mentioned, and the reader would have had to go to our previous paper to learn about this detail. Adding this 0.01 cpm pseudocount avoids dividing by 0 when we calculate a comPARE score. This means we actually underestimate the fold change. As can be seen from the red line in the image below, the y-axis modified log2FC score maxes out along a diagonal line at log2([average RNT1 reads]/0.01) instead of at infinity. That is, at a wild type peak height of 1 cpm, the maximum possible score is log2(1.01/0.01), which equals 6.66, and at 10 cpm, the maximum score is ~10, etc. As can be seen, many of the scores fall along this diagonal, reflecting that indeed, there are 0 reads in the rnt1∆ samples.
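
      The ceiling described above can be reproduced with a few lines of arithmetic. The following is only a minimal sketch, assuming the score is a simple log2 ratio of wild-type to mutant cpm with the 0.01 cpm pseudocount added to both; the function names are hypothetical and are not taken from the authors' pipeline, which may differ in how replicates are averaged.

```python
import math

# Pseudocount stated in the response (cpm added to both wild type and mutant).
PSEUDOCOUNT_CPM = 0.01


def modified_log2fc(wt_cpm: float, mutant_cpm: float,
                    pseudocount: float = PSEUDOCOUNT_CPM) -> float:
    """Hypothetical helper: log2 ratio of wild-type to mutant signal.

    The pseudocount is added to both values so that a mutant signal of
    0 cpm does not cause a division by zero.
    """
    return math.log2((wt_cpm + pseudocount) / (mutant_cpm + pseudocount))


def max_score(wt_cpm: float, pseudocount: float = PSEUDOCOUNT_CPM) -> float:
    """Largest score reachable at a given wild-type peak height,
    i.e. the score when the mutant sample has 0 reads at that site."""
    return modified_log2fc(wt_cpm, 0.0, pseudocount)


if __name__ == "__main__":
    for peak in (1.0, 10.0):
        print(f"{peak:g} cpm -> maximum score {max_score(peak):.2f}")
    # 1 cpm -> maximum score 6.66
    # 10 cpm -> maximum score 9.97
```

      Under these assumptions the ceiling grows with the wild-type peak height, which is why sites with 0 reads in the mutant line up along a diagonal rather than at infinity.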

      Author response image 1.

      There are multiple ways to deal with this issue, and ours is not uncommon. DESeq2, suggested by the reviewer, uses a different method, which relies on the assumption that the dispersion of read counts for genes of any given expression strength is constant, and then uses that dispersion to “correct” the 0 read counts. While this is a valid approach for differential gene expression analysis when comparing similar RNAs, the underlying assumption that the dispersion is similar for all genes at a similar expression level is questionable when comparing, for example, mRNAs, snoRNAs, and snRNAs. Thus, we are not convinced that this is a better way to deal with 0 counts. Our analysis accepts that 0 might be the best estimate for the number of counts that are expected from rnt1∆ samples. 

      (3) The analysis in Figure S8 is insufficient to demonstrate that the four mRNAs depicted are significantly more abundant in rnt1Δ vs RNT1 cells - differences in coverage could simply be a result of different sequencing depth. Please use an appropriate method for estimating differential expression from RNA-Seq data (e.g., DESeq2). 

      Unfortunately, the previously published data we included as figure S8 (now figure S9) did not include replicates, and we agree that it does not rigorously show an effect. The reviewer suggests that we analyze the data by DESeq2, which requires replicates, and thus, cannot be done. Instead we have clarified this. If the reviewer is not satisfied with this, we are prepared to delete it.

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review): 

      Overall, the manuscript reveals the role of actin polymerization to drive the fusion of myoblasts during adult muscle regeneration. This pathway regulates fusion in many contexts, but whether it was conserved in adult muscle regeneration remained unknown. Robust genetic tools and histological analyses were used to support the claims convincingly. 

      We very much appreciate the positive comments from this Reviewer.

      There are a few interpretations that could be adjusted. 

      The beginning of the results about macrophages traversing ghost fibers after regeneration was a surprise given the context in the abstract and introduction. These results also lead to new questions about this biology that would need to be answered to substantiate the claims in this section. Also, it is unclear the precise new information learned here because it seems obvious that macrophages would need to extravasate the basement membrane to enter ghost fibers and macrophages are known to have this ability. Moreover, the model in Figure 4D has macrophages and BM but there is not even mention of this in the legend. The authors may wish to consider removing this topic from the manuscript. 

      We appreciate this comment and acknowledge that the precise behavior of macrophages when they infiltrate and/or exit the ghost fibers during muscle regeneration is not the major focus of this study. However, we think that visualizing macrophages squeezing through tiny openings on the basement membrane to infiltrate and/or exit from the ghost fibers is valuable. Thus, we have moved the data from the original main Figure 2 to the new Figure S1. 

      Regarding the model in Figure 4D, we have removed the macrophages because the depicted model represents a stage after the macrophages’ exit from the ghost fiber. 

      Which Pax7CreER line was used? In the methods, the Jax number provided is the Gaka line but in the results, Lepper et al 2009 are cited, which is not the citation for the Gaka line. 

      The Pax7<sup>CreER</sup> line used in this study is the one generated in Lepper et al. 2009. We corrected this information in “Material and Methods” of the revised manuscript. 

      Did the authors assess regeneration in the floxed mice that do not contain Cre as a control? Or is it known these alleles do not perturb the function of the targeted gene? 

      We examined muscle regeneration in the floxed mice without Cre. As shown in Author response image 1 below, none of the homozygous ArpC2<sup>fl/fl</sup>, N-WASP<sup>fl/fl</sup>, CYFIP1<sup>fl/fl</sup> or N-WASP<sup>fl/fl</sup>;CYFIP1<sup>fl/fl</sup> alleles affected muscle regeneration, indicating that these alleles do not perturb the function of the targeted gene.  

      Author response image 1.

      Muscle regeneration was normal in mice with only floxed target gene(s). Cross sections of TA muscles were stained with anti-Dystrophin and DAPI at dpi 14. n = 3 mice of each genotype, and > 80 ghost fibers in each mouse were examined. Mean ± s.d. values are shown in the dot-bar plot, and significance was determined by two-tailed Student’s t-test. ns: not significant. Scale bar: 100 μm.

      The authors comment: 'Interestingly, expression of the fusogenic proteins, MymK and MymX, was up-regulated in the TA muscle of these mice (Figure S4F), suggesting that fusogen overexpression is not able to rescue the SCM fusion defect resulted from defective branched actin polymerization.' It is unclear if fusogens are truly overexpressed because the analysis is performed at dpi 4 when the expression of fusogens may be decreased in control mice because they have already fused. Also, only two animals were analyzed and it is unclear if MymX is definitively increased. The authors should consider adjusting the interpretation to SCM fusion defect resulting from defective branched actin polymerization is unlikely to be caused by a lack of fusogen expression. 

      We agree with the Reviewer that fusogen expression may simply persist till later time points in fusion mutants without being up-regulated. We have modified our interpretation according to the Reviewer’s suggestion. 

      Regarding the western blots in the original Figure S4F, we now show one experiment from each genotype, and include the quantification of MymK and MymX protein levels from 3 animals in the revised manuscript (new Figure S5F-S5H). 

      Reviewer #1 (Recommendations for the authors): 

      (1) The ArpC2 cKO data could be presented in a clearer fashion. In the text, ArpC2 is discussed but in the figure, there are many other KOs presented and ArpC2 is the fourth one shown in the figure. The other KOs are discussed later. It may be worthwhile for the authors to rearrange the figures to make it easier for readers. 

      Thank you for this suggestion. We have rearranged the genotypes in the figures accordingly and placed ArpC2 cKO first. 

      The authors comment: 'Since SCM fusion is mostly completed at dpi 4.5 (Figure 1B) (Collins et al. 2024)'. This is not an accurate statement of the cited paper. While myofibers are formed by dpi 4.5 with centralized nuclei, there are additional fusion events through at least 21dpi. The authors should adjust their statement to better reflect the data in Collins et al 2024, which could include mentioning that primary fusions could be completed at dpi 4.5 and this is the process they are studying. 

      We have adjusted our statement accordingly in the revised manuscript.

      The authors comment: 'Consistent with this, the frequency distribution of SCM number per ghost fiber displayed a dramatic shift toward higher numbers in the ArpC2<sup>cKO</sup> mice (Figure S5C). These results indicate that the actin cytoskeleton plays an essential role in SCM fusion as the fusogenic proteins.' Should it read 'These results indicate that the actin cytoskeleton plays AS an essential role in SCM fusion as the fusogenic proteins'? 

      Yes, and we adjusted this statement accordingly in the revised manuscript. 

      Minor comments 

      (1) In the results the authors state 'To induce genetic deletion of ArpC2 in satellites....'; 'satellites' is a term not typically used for satellite cells. 

      Thanks for catching this. We changed “satellites” to satellite cells.

      (2) In the next sentence, the satellite should be capitalized. 

      Done.

      (3) The cross-section area should be a 'cross-sectional area'. 

      Changed.

      Reviewer #2 (Public review):

      To fuse, differentiated muscle cells must rearrange their cytoskeleton and assemble actin-enriched cytoskeletal structures. These actin foci are proposed to generate mechanical forces necessary to drive close membrane apposition and fusion pore formation. 

      While the study of these actin-rich structures has been conducted mainly in Drosophila, the present manuscript presents clear evidence this mechanism is necessary for the fusion of adult muscle stem cells in vivo, in mice. 

      We thank this Reviewer for the positive comment.

      However, the authors need to tone down their interpretation of their findings and remember that genetic proof for cytoskeletal actin remodeling to allow muscle fusion in mice has already been provided by different labs (Vasyutina E, et al. 2009 PMID: 19443691; Gruenbaum-Cohen Y, et al., 2012 PMID: 22736793; Hamoud et al., 2014 PMID: 24567399). In the same line of thought, the authors write they "demonstrated a critical function of branched actin-propelled invasive protrusions in skeletal muscle regeneration". I believe this is not a premiere, since Randrianarison-Huetz V, et al., previously reported the existence of finger-like actin-based protrusions at fusion sites in mice myoblasts (PMID: 2926942) and Eigler T, et al., live-recorded said "fusogenic synapse" in mice myoblasts (PMID: 34932950). Hence, while the data presented here clearly demonstrate that ARP2/3 and SCAR/WAVE complexes are required for differentiating satellite cell fusion into multinucleated myotubes, this is an incremental story, and the authors should put their results in the context of previous literature. 

      In this study, we focused on elucidating the mechanisms of myoblast fusion during skeletal muscle regeneration, which remained largely unknown. Thus, we respectfully disagree with this Reviewer that “this is an incremental story” for the following reasons – 

      First, while we agree with this Reviewer that “genetic proof for cytoskeletal actin remodeling to allow muscle fusion in mice has already been provided by different labs”, most of the previous genetic studies, including ours (Lu et al. 2024), characterizing the roles of actin regulators (Elmo, Dock180, Rac, Cdc42, WASP, WIP, WAVE, Arp2/3) in mouse myoblast fusion were conducted during embryogenesis (Laurin et al. 2008; Vasyutina et al. 2009; Gruenbaum-Cohen et al. 2012; Tran et al. 2022; Lu et al. 2024), instead of during adult muscle regeneration, the latter of which is the focus of this study. 

      Second, prior to this study, several groups tested the roles of SRF, CaMKII delta and gamma, Myo10, and Elmo, which affect actin cytoskeletal dynamics, in muscle regeneration. These studies have shown that knocking out SRF, CaMKII, Myo10, or Elmo caused defects in mouse muscle regeneration, based on measuring the cross-sectional diameters of regenerated myofibers only (Randrianarison-Huetz et al. 2018; Eigler et al. 2021; Hammers et al. 2021; Tran et al. 2022). However, none of these studies visualized myoblast fusion at the cellular and subcellular levels during muscle regeneration in vivo. For this reason, it remained unclear whether the muscle regeneration defects in these mutants were indeed due to defects in myoblast fusion, in particular, defects in the formation of invasive protrusions at the fusogenic synapse. Thus, the previous studies did not demonstrate a direct role for the actin cytoskeleton, as well as the underlying mechanisms, in myoblast fusion during muscle regeneration in vivo.

      Third, regarding actin-propelled invasive protrusions at the fusogenic synapse, our previous study (Lu et al. 2024) revealed these structures by fluorescent live cell imaging and electron microscopy (EM) in cultured muscle cells, as well as EM studies in mouse embryonic limb muscle, firmly establishing a direct role for invasive protrusions in mouse myoblast fusion in cultured muscle cells and during embryonic development. Randrianarison-Huetz et al. (2018) reported the existence of finger-like actin-based protrusions at cell contact sites of cultured mouse myoblasts. It was unclear from their study, however, if these protrusions were at the actual fusion sites and if they were invasive (Randrianarison-Huetz et al. 2018). Eigler et al. (2021) reported protrusions at fusogenic synapse in cultured mouse myoblasts. It was unclear from their study, however, if the protrusions were actin-based and if they were invasive (Eigler et al. 2021). Neither Randrianarison-Huetz et al. (2018) nor Eigler et al. (2021) characterized protrusions in developing mouse embryos or regenerating adult muscle. 

      Taken together, to our knowledge, this is the first study to characterize myoblast fusion at the cellular and subcellular level during mouse muscle regeneration. We demonstrate that branched actin polymerization promotes invasive protrusion formation and myoblast fusion during the regeneration process. We believe that this work has laid the foundation for additional mechanistic studies of myoblast fusion during skeletal muscle regeneration.

      The citations in the original manuscript were primarily focused on previous in vivo studies of Arp2/3 and the actin nucleation-promoting factors (NPFs), N-WASP and WAVE (Richardson et al. 2007; Gruenbaum-Cohen et al. 2012), and of invasive protrusions mediating myoblast fusion in intact animals (Drosophila, zebrafish and mice) (Sens et al. 2010; Luo et al. 2022; Lu et al. 2024). We agree with this reviewer, however, that it would be beneficial to the readers if we provide a more comprehensive summary of previous literature, including studies of both intact animals and cultured cells, as well as studies of additional actin regulators upstream of the NPFs, such as small GTPases and their GEFs. Thus, we have significantly expanded our Introduction to include these studies and cited the corresponding literature in the revised manuscript.

      Reviewer #2 (Recommendations for the authors): 

      (1) I am concerned that the authors did not evaluate the efficiency of target allele deletion following Pax7-CreER activation. The majority, if not all, of the published work using this genetic strategy presents the knock-down efficiency using genotyping PCR, immunolocalization, western blot, etc. 

      (2) Can the authors provide evidence that the N-WASP, CYFIP1, and ARPC2 proteins are depleted in TAM-treated tissue? Alternatively, can the author perform RT-qPCR on freshly isolated MuSCs to validate the absence of N-WASP, CYFIP1, and ARPC2 mRNA expression?

      Thank you for these comments. We have assessed the target allele deletion efficiency with isolated satellite cells from TAM-injected mice in which Pax7-CreER is activated. Western blot analyses showed that the protein levels of N-WASP, CYFIP1, and ArpC2 significantly decreased in the satellite cells of knockout mice. Please see the new Figure S2.

      Reviewer #3 (Public review): 

      The manuscript by Lu et al. explores the role of the Arp2/3 complex and the actin nucleators N-WASP and WAVE in myoblast fusion during muscle regeneration. The results are clear and compelling, effectively supporting the main claims of the study. However, the manuscript could benefit from a more detailed molecular and cellular analysis of the fusion synapse. Additionally, while the description of macrophage extravasation from ghost fibers is intriguing, it seems somewhat disconnected from the primary focus of the work. 

      Despite this, the data are robust, and the major conclusions are well supported. The mechanism of muscle fusion is still a largely unexplored topic in the field, and the authors make important progress in this domain. 

      We appreciate the positive comments from this Reviewer.

      We agree with this Reviewer and Reviewer #1 that the macrophage study is not the primary focus of the work. However, we think that visualizing macrophages squeezing through tiny openings on the basement membrane to infiltrate and/or exit from the ghost fibers is valuable. Thus, we have moved the data from the original main Figure 2 to the new Figure S1. 

      I have a few suggestions that might strengthen the manuscript as outlined below.  

      (1) Could the authors provide more detail on how they defined cells with "invasive protrusions" in Figure 4C? Membrane blebs are commonly observed in contacting cells, so it would be important to clarify the criteria used for counting this specific event. 

      Thanks for this suggestion. We define invasive protrusions as finger-like protrusions projected by a cell into its fusion partner. Based on our previous studies (Sens et al. 2010; Luo et al. 2022; Lu et al. 2024), these invasive protrusions are narrow (with 100-250 nm diameters) and propelled by mechanically stiff actin bundles. In contrast, membrane blebs are spherical protrusions formed by the detachment of the plasma membrane from the underlying actin cytoskeleton. In general, the blebs are not as mechanically stiff as invasive protrusions and would not be able to project into neighboring cells. Thus, we do not think that the protrusions in Figure 4B are membrane blebs. We clarified the criteria in the text and figure legends of the revised manuscript.

      (2) Along the same line, please clarify what each individual dot represents in Figure 4C. The authors mention quantifying approximately 83 SCMs from 20 fibers. I assume each dot corresponds to data from individual fibers, but if that's the case, does this imply that only around four SCMs were quantified per fiber? A more detailed explanation would be helpful. 

      To quantitatively assess invasive protrusions in Ctrl and mutant mice, we analyzed 20 randomly selected ghost fibers per genotype. Within each ghost fiber, we examined randomly selected SCMs in a single cross section (a total of 83, 147 and 93 SCMs in Ctrl, ArpC2<sup>cKO</sup> and MymX<sup>cKO</sup> mice were examined, respectively). 

      In Figure 4C, each dot was intended to represent the percentage of SCMs with invasive protrusions in a single cross section of a ghost fiber. However, we mistakenly inserted a wrong graph in the original Figure 4C. We sincerely apologize for this error and have replaced it with the correct graph in the new Figure 4C.

      (3) Localizing ArpC2 at the invasive protrusions would be a strong addition to this study. Furthermore, have the authors examined the localization of Myomaker and Myomixer in ArpC2 mutant cells? This could provide insights into potential disruptions in the fusion machinery.

      We have examined the localization of the Arp2/3 complex on the invasive protrusions in cultured SCMs and included the data in Figure 4A of the original manuscript. Specifically, we showed enrichment of mNeongreen-tagged Arp2, a subunit of the Arp2/3 complex, on the invasive protrusions at the fusogenic synapse of cultured SCMs (see the enlarged panels on the right; also see supplemental video 4). The small size of the invasive protrusions on SCMs prevented a detailed analysis of the precise Arp2 localization along the protrusions.  Please see our recently published paper (Lu et al. 2024) for the detailed localization and function of the Arp2/3 complex during invasive protrusion formation in cultured C2C12 cells. 

      We have also attempted to localize the Arp2/3 complex in the regenerating muscle in vivo using an anti-ArpC2 antibody (Millipore, 07-227-I), which has been used in many studies to visualize the Arp2/3 complex in cultured cells. Unfortunately, the antibody detected non-specific signals in the regenerating TA muscle of the ArpC2<sup>cKO</sup> animals. Thus, it cannot be used to detect specific ArpC2 signals in muscle tissues. Besides the specificity issue of the antibody, it is technically challenging to visualize invasive protrusions with an F-actin probe at the fusogenic synapses of regenerating muscle by light microscopy, due to the high background of F-actin signal within the muscle cells.

      Regarding the fusogens, we show that both are present in the TA muscle of the ArpC2<sup>cKO</sup> animals by western blot (Figure S5F-S5H). Thus, the fusion defect in these animals is not due to the lack of fusogen expression. Since the focus of this study is on the role of the actin cytoskeleton in muscle regeneration, the subcellular localization of the fusogens was not investigated in the current study. 

      (4) As a minor curiosity, can ArpC2 WT and mutant cells fuse with each other?

      Our previous work in Drosophila embryos showed that Arp2/3-mediated branched actin polymerization is required in both the invading and receiving fusion partners (Sens et al. 2010). To address this question in mouse muscle cells, we co-cultured GFP<sup>+</sup> WT cells with either mScarleti<sup>+</sup> WT or mScarleti<sup>+</sup> ArpC2<sup>cKO</sup> cells in vitro and assessed their ability to fuse with one another. We found that ArpC2<sup>cKO</sup> cells could barely fuse with WT cells (new Figure 3F and 3G), indicating that Arp2/3-mediated branched actin polymerization is required in both fusion partners. This result is consistent with our findings in Drosophila embryos.

      (5) The authors report a strong reduction in CSA at 14 dpi and 28 dpi, attributing this defect primarily to failed myoblast fusion. Although this claim is supported by observations at early time points, I wonder whether the Arp2/3 complex might also play roles in myofibers after fusion. For instance, Arp2/3 could be required for the growth or maintenance of healthy myofibers, which could also contribute to the reduced CSA observed, since regenerated myofibers inherit the ArpC2 knockout from the stem cells. Could the authors address or exclude this possibility? This is rather a broader criticism of how things are being interpreted in general beyond this paper. 

      This is an interesting question. It is possible that Arp2/3 plays a role in the growth or maintenance of healthy myofibers. However, the muscle injury and regeneration process may not be the best system to address this question because of the indispensable early step of myoblast fusion. Ideally, one would knock out Arp2/3 in the myofibers of young healthy mice, observe fiber growth in the absence of muscle injury, and compare it with that of wild-type littermates. Since these experiments are beyond the scope of this study, we revised our conclusion to state that the fusion defect in ArpC2<sup>cKO</sup> mice should account, at least in part, for the strong reduction in CSA at 14 dpi and 28 dpi, without excluding additional possibilities such as Arp2/3’s potential role in the growth or maintenance of healthy myofibers.

      References:

      Eigler T, Zarfati G, Amzallag E, Sinha S, Segev N, Zabary Y, Zaritsky A, Shakked A, Umansky KB, Schejter ED et al. 2021. ERK1/2 inhibition promotes robust myotube growth via CaMKII activation resulting in myoblast-to-myotube fusion. Dev Cell 56: 3349-3363 e3346.

      Gruenbaum-Cohen Y, Harel I, Umansky KB, Tzahor E, Snapper SB, Shilo BZ, Schejter ED. 2012. The actin regulator N-WASp is required for muscle-cell fusion in mice. Proc Natl Acad Sci U S A 109: 11211-11216.

      Hammers DW, Hart CC, Matheny MK, Heimsath EG, Lee YI, Hammer JA, 3rd, Cheney RE, Sweeney HL. 2021. Filopodia powered by class x myosin promote fusion of mammalian myoblasts. Elife 10.

      Laurin M, Fradet N, Blangy A, Hall A, Vuori K, Cote JF. 2008. The atypical Rac activator Dock180 (Dock1) regulates myoblast fusion in vivo. Proc Natl Acad Sci U S A 105: 15446-15451.

      Lu Y, Walji T, Ravaux B, Pandey P, Yang C, Li B, Luvsanjav D, Lam KH, Zhang R, Luo Z et al. 2024. Spatiotemporal coordination of actin regulators generates invasive protrusions in cell-cell fusion. Nat Cell Biol 26: 1860-1877.

      Luo Z, Shi J, Pandey P, Ruan ZR, Sevdali M, Bu Y, Lu Y, Du S, Chen EH. 2022. The cellular architecture and molecular determinants of the zebrafish fusogenic synapse. Dev Cell 57: 1582-1597 e1586.

      Randrianarison-Huetz V, Papaefthymiou A, Herledan G, Noviello C, Faradova U, Collard L, Pincini A, Schol E, Decaux JF, Maire P et al. 2018. Srf controls satellite cell fusion through the maintenance of actin architecture. J Cell Biol 217: 685-700.

      Richardson BE, Beckett K, Nowak SJ, Baylies MK. 2007. SCAR/WAVE and Arp2/3 are crucial for cytoskeletal remodeling at the site of myoblast fusion. Development 134: 4357-4367.

      Sens KL, Zhang S, Jin P, Duan R, Zhang G, Luo F, Parachini L, Chen EH. 2010. An invasive podosome-like structure promotes fusion pore formation during myoblast fusion. J Cell Biol 191: 1013-1027.

      Tran V, Nahle S, Robert A, Desanlis I, Killoran R, Ehresmann S, Thibault MP, Barford D, Ravichandran KS, Sauvageau M et al. 2022. Biasing the conformation of ELMO2 reveals that myoblast fusion can be exploited to improve muscle regeneration. Nat Commun 13: 7077.

      Vasyutina E, Martarelli B, Brakebusch C, Wende H, Birchmeier C. 2009. The small G-proteins Rac1 and Cdc42 are essential for myoblast fusion in the mouse. Proc Natl Acad Sci U S A 106: 8935-8940.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      EnvA-pseudotyped glycoprotein-deleted rabies virus has emerged as an essential tool for tracing monosynaptic inputs to genetically defined neuron populations in the mammalian brain. Recently, in addition to the SAD B19 rabies virus strain first described by Callaway and colleagues in 2007, the CVS N2c rabies virus strain has become popular due to its low toxicity and high trans-synaptic transfer efficiency. However, despite its widespread use in the mammalian brain, particularly in mice, the application of this cell-type-specific monosynaptic rabies tracing system in zebrafish has been limited by low labeling efficiency and high toxicity. In this manuscript, the authors aimed to develop an efficient retrograde monosynaptic rabies-mediated circuit mapping tool for larval zebrafish. Given the translucent nature of larval zebrafish, whole-brain neuronal activities can be monitored, perturbed, and recorded over time. Introducing a robust circuit mapping tool for larval zebrafish would enable researchers to simultaneously investigate the structure and function of neural circuits, which would be of significant interest to the neural circuit research community. Furthermore, the ability to track rabies-labeled cells over time in the transparent brain could enhance our understanding of the trans-synaptic retrograde tracing mechanism of the rabies virus. 

      To establish an efficient rabies virus tracing system in the larval zebrafish brain, the authors conducted meticulous side-by-side experiments to determine the optimal combination of trans-expressed rabies G proteins, TVA receptors, and recombinant rabies virus strains. Consistent with observations in the mouse brain, the CVS N2c strain trans-complemented with N2cG was found to be superior to the SAD B19 combination, offering lower toxicity and higher efficiency in labeling presynaptic neurons. Additionally, the authors tested various temperatures for the larvae post-virus injection and identified 36℃ as the optimal temperature for improved virus labeling. They then validated the system in the cerebellar circuits, noting evolutionary conservation in the cerebellar structure between zebrafish and mammals. The monosynaptic inputs to Purkinje cells from granule cells were neatly confirmed through ablation experiments.

      However, there are a couple of issues that this study should address. Additionally, conducting some extra experiments could provide valuable information to the broader research field utilizing recombinant rabies viruses as retrograde tracers.

      (1) It was observed that many radial glia were labeled, which casts doubt on the specificity of trans-synaptic spread between neurons. The issues of transneuronal labeling of glial cells should be addressed and discussed in more detail. In this manuscript, the authors used a transgenic zebrafish line carrying a neuron-specific Cre-dependent reporter and EnvA-CVS N2c(dG)-Cre virus to avoid the visualization of virally infected glial cells. However, this does not solve the real issue of glial cell labeling and the possibility of a nonsynaptic spread mechanism.

      In agreement with the reviewer’s suggestion, we have incorporated a standalone section in the revised Discussion (page 9) to address the issue of transneuronal glial labeling, including its spatial distribution, temporal dynamics, potential mechanisms, and possible strategies for resolving it.

      Regarding the specificity of trans-synaptic spread between neurons, we have demonstrated that our transsynaptic tracing system reliably and specifically labels input neurons. Structurally, we only observed labeling of inferior olivary cells (IOCs) outside the cerebellum, which are the only known extracerebellar inputs to Purkinje cells (PCs), while all other traced neurons remained confined within the cerebellum throughout the observation period (see Figure 2G–I). Functionally, we verified that the traced neurons formed synaptic connections with the starter PCs (see Figure 2J–M). Together, these findings support the conclusion that our system enables robust and specific retrograde monosynaptic tracing of neurons in larval zebrafish.

      Regarding the transneuronal labeling of radial glia cells, we observed that their distribution closely correlates with the location of neuronal somata and dendrites (see Author response image 2). In zebrafish, radial glial cells are considered functional analogs of astrocytes and are often referred to as radial astroglia. The adjacent labeled astroglia may participate in tripartite synapses with the starter neurons and express viral receptors that enable RV particle entry at postsynaptic sites. This suggests that rabies-based tracing in zebrafish may serve as a valuable tool for identifying synaptically associated and functionally connected glia. Leveraging this approach to investigate glia–neuron interactions represents a promising direction for future research.

      In our system, the glial labeling diminishes at later larval stages, likely due to abortive infection (see Author response image 3 and relevant response). However, the eventual clearance of infection does not preclude the initial infection of glial cells, which may compete with neuronal labeling and reduce overall tracing efficiency. Notably, transneuronal infection of glial cells by RV has also been observed in mammals (Marshel et al., 2010). To minimize such off-target labeling, future work should focus on elucidating the mechanisms underlying glial susceptibility—such as receptor-mediated viral entry— and developing strategies to suppress receptor expression specifically in glia, thereby improving the specificity and efficiency of neuronal circuit tracing.

      In addition, wrong citations in Line 307 were made when referring to previous studies discovering the same issue of RVdG-based transneuronal labeling radial glial cells. "The RVdG-based transneuronal labeling of radial glial cells was commonly observed in larval zebrafish29,30".

      The cited work was conducted using vesicular stomatitis virus (VSV). A more thorough analysis and/or discussion on this topic should be included.

      We thank the reviewer for pointing out the citation inaccuracy. The referenced study employed vesicular stomatitis virus (VSV), which, like RV, is a member of the Rhabdoviridae family. We have revised the text accordingly, from "RVdG-based transneuronal labeling of radial glial cells…" to "Transneuronal labeling of radial glial cells mediated by VSV, a member of the Rhabdoviridae family like RV, has been commonly observed in larval zebrafish" (page 9, line 347).

      Several key questions should be addressed:

      Does the number of labeled glial cells increase over time? 

      Yes, as shown in Figure 2—figure supplement 1C and G, the number of labeled radial glial cells significantly increased from 2 to 6 days post-injection (dpi). This phenomenon has been addressed in the revised Discussion section (page 9, line 357).

      Do they increase at the same rate over time as labeled neurons?

      Although glial cell labeling continued to increase over time, we observed a slowdown in labeling rate between 6 and 10 dpi, as shown in Figure 2—figure supplement 1C and G. Therefore, we divided the timeline into two intervals (2–6 and 6–10 dpi) to compare the rate of increase in labeling between neurons and glia. The rate (R) was defined as the daily change in convergence index. To quantify the difference between neuronal and glial labeling rates, we calculated a labeling rate index, R<sub>g</sub>−R<sub>n</sub>, where R<sub>g</sub> and R<sub>n</sub> denote the rates for glia and neurons, respectively (Author response image 1). Our analysis revealed that, between 2 and 6 dpi, glial cells exhibited a higher labeling rate than neurons. However, this trend reversed between 6 and 10 dpi, with neurons surpassing glial cells in labeling rate. These findings have been included in the revised Discussion section (page 9).

      Author response image 1.

      Labeling rate index of glia and neurons across two time intervals. Data points represent the mean labeling rate index for each tracing strategy within each time interval. *P < 0.05 (nonparametric two-tailed Mann-Whitney test).  
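
The labeling rate index described above can be sketched in a few lines. This is only an illustration of the stated definition (R as the daily change in convergence index, and the index as R<sub>g</sub>−R<sub>n</sub>); the function names and all convergence index values below are hypothetical placeholders, not data from the study.

```python
def daily_rate(ci_start, ci_end, day_start, day_end):
    """Average daily change in convergence index (CI) over one interval."""
    if day_end <= day_start:
        raise ValueError("day_end must be later than day_start")
    return (ci_end - ci_start) / (day_end - day_start)


def labeling_rate_index(glia_ci, neuron_ci, day_start, day_end):
    """R_g - R_n for one interval; positive means glia are labeled faster."""
    r_g = daily_rate(glia_ci[day_start], glia_ci[day_end], day_start, day_end)
    r_n = daily_rate(neuron_ci[day_start], neuron_ci[day_end], day_start, day_end)
    return r_g - r_n


# Hypothetical CI values keyed by days post-injection (dpi); not measured data.
glia_ci = {2: 0.5, 6: 2.5, 10: 3.0}
neuron_ci = {2: 0.5, 6: 1.5, 10: 3.5}

early_index = labeling_rate_index(glia_ci, neuron_ci, 2, 6)   # 2-6 dpi
late_index = labeling_rate_index(glia_ci, neuron_ci, 6, 10)   # 6-10 dpi
print(early_index, late_index)
```

With these placeholder values the index is positive for 2–6 dpi and negative for 6–10 dpi, which is the qualitative pattern reported in the response.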

      Are the labeled glial cells only present around the injection site?

      We believe the reviewer is inquiring whether labeled glial cells are spatially restricted to the vicinity of starter neurons. The initial infection is determined by the expression of TVA rather than the injection site. For example, injecting a high volume of virus into the anterior hindbrain resulted in the infection of TVA-expressing cells in distant regions, including the tectum and posterior hindbrain (Author response image 2).

      Regarding glial labeling, PC starter experiments showed that labeled glial cells (i.e., Bergmann glia) were predominantly localized within the cerebellum, likely due to the confinement of PC dendrites to this region. When using vglut2a to define starter neurons, glial labeling was frequently observed near the soma and dendrites of starter cells (14 out of 17 cases; Author response image 2). These observations suggest that transneuronally labeled glial cells may be synaptically associated with the starter neurons. We have included this point in the revised Discussion section (page 9).

      Author response image 2.

      Location of transneuronally labeled glial cells. (a and b) Confocal images showing the right tectum (a) and posterior hindbrain (b) of different WT larvae expressing EGFP and TVA using UGNT in randomly sparse neurons (vglut2a<sup>+</sup>) and infected with CVSdG-tdTomato[EnvA] (magenta) injected into the anterior hindbrain. Dashed yellow circles, starter neurons (EGFP<sup>+</sup>/tdTomato<sup>+</sup>); gray arrows, transneuronally labeled radial glia (tdTomato<sup>+</sup>/EGFP<sup>−</sup>); dashed white lines, tectum or hindbrain boundaries. C, caudal; R, rostral. Scale bars, 20 μm.

      Can the phenomenon of transneuronal labeling of radial glial cells be mitigated if the tracing is done in slightly older larvae?

      Yes, we agree. As elaborated in the following response, we hypothesize that the loss of fluorescence in radial glial cells at later developmental stages is due to abortive infection (see Author response image 3 and associated response). This supports the notion that abortive infection becomes increasingly pronounced as larvae mature, potentially explaining the negligible glial labeling observed in adult zebrafish (Dohaku et al., 2019; Satou et al., 2022). However, as noted in our response to the first comment, the disappearance of fluorescence does not indicate the absence of viral entry. Viral receptors may be expressed on glial cells, allowing initial infection despite a failure in subsequent replication. Consequently, glial infection, though abortive, may still compete with neuronal infection and reduce tracing efficiency.

      What is the survival rate of the infected glial cells over time?

      We observed the disappearance of glial fluorescence after transneuronal labeling, while we did not observe punctate fluorescent debris typically indicative of apoptotic cell death. Therefore, we favor the hypothesis that the loss of glial fluorescence results from abortive infection rather than cell death. Abortive infection refers to a scenario in which viral replication is actively suppressed by host antiviral responses, preventing the production of infectious viral particles. For example, recent studies have shown that lab-attenuated rabies virus (RV) induces the accumulation of aberrant double-stranded DNA in astrocytes, which activates mitochondrial antiviral-signaling protein (MAVS) and subsequent interferon expression (Tian et al., 2018). This antiviral response inhibits RV replication, ultimately resulting in abortive infection. 

      In addition, we quantified the proportion of glial cells labeled at 2 dpi and 4 dpi that retained fluorescence over time. By 6 dpi (approximately 11 dpf), glial labeling had largely diminished in both groups (Author response image 3). These results suggest that the decline in glial fluorescence is more closely linked to larval age than to the duration of glial infection, supporting the notion of abortive infection. This also addresses the reviewer’s earlier concern and indicates that glial labeling is mitigated in older larvae.

      Author response image 3.

      Fraction of glial cells with fluorescence retention. (a and b) Proportion of glial cells labeled at 2 dpi (a) and 4 dpi (b) that retained fluorescence over time. Data are from the CVS|N2cG|36°C group. In boxplots: center, median; bounds of box, first and third quartiles; whiskers, minimum and maximum values. n.s., not-significant; *P < 0.05, **P < 0.01 (nonparametric two-tailed Mann-Whitney test).

      If an infected glial cell dies due to infection or gets ablated, does the rabies virus spread from the dead glial cells?

      In our system, glial cells do not express the rabies glycoprotein (G). Therefore, even if glial cells are transneuronally infected, they cannot support viral budding or assembly of infectious particles due to the absence of G (Mebatsion et al., 1996), preventing further viral propagation to neighboring cells.

      If TVA and rabies G are delivered to glial cells, followed by rabies virus injection, will it lead to the infection of other glial cells or neurons?

      We have conducted experiments in which TVA and rabies G were specifically expressed in astroglia using the gfap promoter, followed by RVdG-mCherry[EnvA] injection. This resulted in initial infection of TVA-positive astroglia and occasional subsequent labeling of nearby TVA-negative astroglia (Author response image 4), suggesting astroglia-to-astroglia transmission. Notably, no neuronal labeling was observed. This glial-to-glial spread is consistent with previous rabies tracing studies reporting similar phenomena involving the interaction of astrocytes with astrocytes and microglia (Clark et al., 2021). However, the underlying mechanism remains unclear, and we have discussed this in response to the first comment.

      Author response image 4.

      Viral tracing initiated from astroglia. (a) Confocal images of the tectum of a larva expressing EGFP and TVA using UGBT in randomly sparse astroglia (gfap<sup>+</sup>) and infected by SADdG-mCherry[EnvA] (magenta) injected into the anterior hindbrain. (b) Confocal images of the posterior hindbrain of a larva expressing EGFP and TVA using UGNT in randomly sparse astroglia (gfap<sup>+</sup>) and infected by CVSdG-tdTomato[EnvA] (magenta) injected into the anterior hindbrain. Dashed yellow circles, starter astroglia (EGFP<sup>+</sup>/mCherry<sup>+</sup> or EGFP<sup>+</sup>/tdTomato<sup>+</sup>); gray arrows, transneuronally labeled astroglia (tdTomato<sup>+</sup>/EGFP<sup>−</sup>); dashed white lines, tectum or hindbrain boundaries. C, caudal; R, rostral. Scale bars, 20 μm.

      Answers to any of these questions could greatly benefit the broader research community.

      (2) The optimal virus tracing effect has to be achieved by raising the injected larvae at 36C. Since the routine temperature of zebrafish culture is around 28C, a more thorough characterization of the effect on the health of zebrafish should be conducted.

      Yes, 36°C is required to achieve optimal labeling efficiency. Although this is above the standard zebrafish culture temperature (28°C), previous work (Satou et al., 2022) and our observations indicate that this transient elevation does not adversely affect larval health within the experimental time window. 

      In the previous study, Satou et al. reported no temperature-dependent effects on swimming behavior, social interaction, or odor discrimination in adult fish maintained at 28°C and 36°C. In larvae, both non-injected and virus-injected fish showed a decrease in survival at later time points (7 dpi), with slightly increased mortality observed at elevated temperatures.

      In our study, we raised the same batch of non-virus-injected larvae at 28°C and 36°C, and found no mortality over a 10-day period. For CVS-N2c-injected larvae, electrode insertion caused injury, but survival rates remained around 80% at both temperatures (see Figure 3A). Moreover, we successfully maintained CVS-N2c-injected larvae at 36°C for over a month, indicating that elevated temperature does not adversely affect fish health. Notably, higher temperatures were associated with an accelerated developmental rate. 

      This point was briefly addressed in the previous version and has now been further elaborated in the revised Discussion section (page 8).

      (3) Given the ability of time-lapse imaging of the infected larval zebrafish brain, the system can be taken advantage of to tackle important issues of rabies virus tracing tools.

      a) Toxicity. 

      The toxicity of rabies viruses is an important issue that limits their application and affects the interpretation of traced circuits. For example, if a significant proportion of starter cells die before analysis, the traced presynaptic networks cannot be reliably assigned to a "defined" population of starter cells. In this manuscript, the authors did an excellent job of characterizing the effects of different rabies strains, G proteins derived from various strains, and levels of G protein expression on starter cell survival. However, an additional parameter that should be tested is the dose of rabies virus injection. The current method section states that all rabies virus preparations were diluted to 2x10^8 infection units per ml, and 2-5 nl of virus suspension was injected near the target cells. It would be interesting to know the impact of the dose/volume of virus injection on retrograde tracing efficiency and toxicity. Would higher titers of the virus lead to more efficient labeling but stronger toxicities? What would be the optimal dose/volume to balance efficiency and toxicity? Addressing these questions would provide valuable insights and help optimize the use of rabies viruses for circuit tracing.

      This is an important concern. Viral cytotoxicity is primarily driven by the level of viral transcription and replication, which inhibits host protein synthesis (Komarova et al., 2007). The RVdG-EnvA typically infects cells at a rate of one viral particle per cell (Zhang et al., 2024), suggesting that increasing viral concentration does not proportionally increase per-cell infection. Accordingly, viral titer and injection volume are unlikely to influence cytotoxicity at the single-cell level. In our experiments, injection volumes up to 20 nl (i.e., 4 to 10 times the standard injection volume) did not affect starter cell survival. However, higher titers or volumes may increase the number of initially infected starter cells, potentially leading to greater overall mortality in larval zebrafish.

      Similarly, given that rabies virus typically infects cells at one particle per cell, increasing viral titer alone is unlikely to enhance tracing efficiency once the virus type is fixed. In contrast, the level of G protein expression significantly influences tracing efficiency (see Figure 2D). However, excessive G protein expression reduces the survival of starter cells (see Figure 3D). Therefore, careful control of G protein levels is essential to balance tracing efficiency and cytotoxicity.

      Notably, regardless of whether infected cells undergo apoptosis or necrosis due to cytotoxicity, the resulting disruption of the plasma membrane severely impairs viral budding. As a result, the formation of intact, G protein-enveloped viral particles is prevented, limiting further infection of neighboring neurons.

      The latest second-generation ΔGL RV vectors (Jin et al., 2024), which lack both the G and L (viral polymerase) genes, have been shown to markedly reduce cytotoxicity. These improved tracing strategies may be explored in future zebrafish studies to further optimize labeling efficiency and cell viability.

      The issue of viral titer and volume has been addressed in the revised Discussion section (page 10).
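
For orientation, the nominal dose implied by the figures quoted in this exchange (a titer of 2 × 10<sup>8</sup> infectious units per ml, a standard injection volume of 2–5 nl, and a tested maximum of 20 nl) can be worked out as below. This is back-of-the-envelope arithmetic only; the helper name is hypothetical, and the result is the nominal number of infectious units in the injected volume, not a measure of how many cells are actually infected.

```python
NL_PER_ML = 1e6  # 1 ml = 1,000,000 nl


def nominal_infectious_units(titer_per_ml, volume_nl):
    """Nominal infectious units contained in an injected volume."""
    return titer_per_ml * volume_nl / NL_PER_ML


titer = 2e8  # infectious units per ml, as quoted by the reviewer
doses = {volume: nominal_infectious_units(titer, volume) for volume in (2, 5, 20)}
print(doses)  # nominal units at 2, 5 and 20 nl

# The 20 nl condition relative to the standard 2-5 nl range
fold_vs_5nl = 20 / 5
fold_vs_2nl = 20 / 2
```

The fold values correspond to the "4 to 10 times the standard injection volume" range stated in the response.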

      b) Primary starters and secondary starters: 

      Given that the trans-expression of TVA and G is widespread, there is the possibility of coexistence of starter cells from the initial infection (primary starters) and starter cells generated by rabies virus spreading from the primary starters to presynaptic neurons expressing G. This means that the labeled input cells could be a mixed population connected with either the primary or secondary starter cells.

      It would be immensely interesting if time-lapse imaging could be utilized to observe the appearance of such primary and secondary starter cells. Assuming there is a time difference between the initial appearance of these two populations, it may be possible to differentiate the input cells wired to these populations based on a similar temporal difference in their initial appearance. This approach could provide valuable insights into the dynamics of rabies virus spread and the connectivity of neural circuits.

      The reviewer’s suggestion is valuable. Regarding the use of Purkinje cells (PCs) as starter cells, we consider the occurrence of secondary starter PCs to be extremely rare. Although previous evidence suggests that PCs can form synaptic connections with one another (Chang et al., 2020), our sparse labeling strategy (typically involving fewer than 10 labeled cells) significantly reduces the likelihood of viral transmission between PC starter cells. In addition, if secondary starter PCs were frequently generated, we would expect increased tracing efficiency at 10 dpi compared to 6 dpi. However, our results show no significant difference (see Figure 2—figure supplement 1C and G).

      Given the restricted expression of TVA and G in PCs, even if a limited number of secondary starters were generated, the labeled inputs would predominantly be granule cells (GCs), thereby preserving the cell-type identity of upstream inputs. This does raise a potential concern regarding overestimation of the convergence index (CI). Notably, however, within the GC-PC circuit, individual GCs often project to multiple PCs. Consequently, a GC labeled via a secondary PC may also be a bona fide presynaptic partner of the primary starter population. This overlap could mitigate the overestimation of CI. Taken together, we believe that the CI values reported in this study provide a reasonable approximation of monosynaptic connectivity.

      In scenarios where TVA and G are broadly expressed—for example, under the control of vglut2a promoter—secondary starter cells may arise frequently. In such cases, long-term time-lapse imaging in the zebrafish whole brain presents a promising strategy to distinguish primary and secondary starter cells, along with their respective input populations, based on the timing of their appearance. This approach potentially enables multi-step circuit tracing within individual animals. An alternative strategy is to use an EnvA-pseudotyped, G-competent rabies virus, which allows targeted initial infection while supporting multisynaptic propagation. When combined with temporally resolved imaging, this strategy could facilitate direct labeling of higher-order circuits and allow clear differentiation between multi-order inputs and the original starter population over time.

      In conclusion, we find this suggestion compelling and will explore these strategies in future studies to optimize and broaden the application of rabies virus-based circuit tracing.

      Reviewer #2 (Public Review):

      The study by Chen, Deng et al. aims to develop an efficient viral transneuronal tracing method that allows efficient retrograde tracing in the larval zebrafish. The authors utilize pseudotyped-rabies virus that can be targeted to specific cell types using the EnvA-TvA systems. Pseudotyped rabies virus has been used extensively in rodent models and, in recent years, has begun to be developed for use in adult zebrafish. However, compared to rodents, the efficiency of the spread in adult zebrafish is very low (~one upstream neuron labeled per starter cell). Additionally, there is limited evidence of retrograde tracing with pseudotyped rabies in the larval stage, which is the stage when most functional neural imaging studies are done in the field. In this study, the authors systematically optimized several parameters of rabies tracing, including different rabies virus strains, glycoprotein types, temperatures, expression construct designs, and elimination of glial labeling. The optimal configurations developed by the authors are up to 5-10 fold higher than more typically used configurations.

      The results are solid and support the conclusions. However, the methods should be described in more detail to allow other zebrafish researchers to apply this method in their own work.

      Additionally, some findings are presented anecdotally, i.e., without quantification or sufficient detail to allow close examinations. Lastly, there is concern that the reagents created by the authors will not be easily accessible to the zebrafish community.

      (1) The titer used in each experiment was not stated. In the methods section, it is stated that aliquots are stored at 2x10e8. Is it diluted for injection? Are all of the experiments in the manuscripts with the same titer?

      We injected all three viral vectors as undiluted stock aliquots. The titers of SADdG-mCherry[EnvA], CVSdG-tdTomato[EnvA], and CVSdG-mCherry-2A-Cre[EnvA] were 2 × 10<sup>8</sup>, 2 × 10<sup>8</sup>, and 3 × 10<sup>8</sup> infectious units/mL, respectively. This has been clarified in the updated Methods section (page 12).

      (2) The age for injection is quite broad (3-5 dpf in Fig 1 and 4-6 dpf in Fig 2). Given that viral spread efficiency is usually more robust in younger animals, describing the exact injection age for each experiment is critical.

      We appreciate the reviewer’s suggestion. For the initial experiments tracing randomly from neurons in Figure 1, the injection age was primarily 3–4 dpf, with a one-day difference. Due to the slower development of PCs, the injection age for the experiments in Figures 2, 3, and 4 was mainly 5 dpf. To clarify the developmental stages at the time of injection for each experiment, we have added tables (see Figure 1,2—table supplement 2) listing the number of fish used at each injection age for all experimental groups shown in Figures 1 and 2.

      (3) More details should be provided for the paired electrical stimulation-calcium imaging study. How many GC cells were tested? How many had corresponding PC cell responses? What is the response latency? For example, images of stimulated and recorded GCs and PCs should be shown.

      Yes, these are important details for the paired electrical stimulation-calcium imaging study. We stimulated 33 GCs from 32 animals and detected calcium responses in putative postsynaptic PCs in 15 cases. Among these, we successfully ablated the single GC in 11 pairs and observed a weakened calcium response in PCs following ablation (see Figure 2M). The response latency was determined as the first calcium imaging frame where ΔF/F exceeded the baseline (pre-stimulus average) by 3 times the standard deviation. Imaging was performed at 5 Hz, and as shown in Figure 2L, the calculated average response latency was 152 ± 35 ms (mean ± SEM), indicating an immediate response with calcium intensity from the first post-stimulus imaging frame consistently exceeding the threshold.
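      The latency criterion described above (first frame in which ΔF/F exceeds the pre-stimulus mean by 3 standard deviations) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the example trace, and the use of the sample standard deviation are assumptions. The sketch returns a frame count only, because converting frames to milliseconds depends on the stimulus timing relative to frame acquisition, which is not specified here.

```python
from statistics import mean, stdev


def first_response_frame(dff, stim_frame, n_sd=3.0):
    """Return how many frames after the stimulus the response is first detected.

    dff        : list of dF/F values, one per imaging frame (hypothetical data)
    stim_frame : index of the first frame acquired after stimulation
    n_sd       : threshold in baseline standard deviations (3 in the text)

    Returns 1 if the first post-stimulus frame crosses threshold, 2 for the
    second, and so on; returns None if no frame crosses threshold.
    """
    baseline = dff[:stim_frame]  # pre-stimulus frames (needs at least 2)
    # Assumption: sample standard deviation of the baseline frames.
    threshold = mean(baseline) + n_sd * stdev(baseline)
    for i in range(stim_frame, len(dff)):
        if dff[i] > threshold:
            return i - stim_frame + 1
    return None


# Hypothetical trace: five baseline frames, then a response.
trace = [0.00, 0.02, -0.01, 0.01, 0.00, 0.50, 0.60, 0.30]
print(first_response_frame(trace, stim_frame=5))
```

      At a 5 Hz frame rate each frame spans 200 ms, so a crossing in the first post-stimulus frame corresponds to a response within one frame period of the stimulus.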

      We have added additional details to the Results (page 5), Discussion (page 9), and Methods (page 15) sections. A representative image showing both the stimulated GC and the recorded PC has been added to Figure 2 in the revised manuscript (see Figure 2K).

      (4) It is unclear how connectivity between specific PC and GC is determined for single neuron connectivity. In other images (Figure 4C), there are usually multiple starter cells and many GCs. It was not shown that the image resolution can establish clear axon dendritic contacts between cell pairs.

      In our experiments, sparse labeling typically results in 1–10 starter cells per fish. Regarding the case shown in Figure 4C (right column), only two PC starters were labeled, which simplifies the assignment of presynaptic inputs to individual PCs. Connectivity is determined based on clear axon-dendritic or axon-cell body apposition between GCs and PCs. We have accordingly added more details to the Methods (page 16) section regarding how we determined connectivity between specific PCs and GCs.

      Reviewer #2 (Recommendations For The Authors):

      To enable broader use of this technique, I would encourage the authors to submit their zebrafish lines, plasmids, and plasmid sequences to public repositories such as ZIRC and  Addgene. Additionally, there is no mention of how viral vectors will be shared.

      We have deposited the related zebrafish lines at CZRC (China Zebrafish Resource Center) and uploaded plasmid maps and sequences to Addgene. The viral vectors are available through BrainCase (Shenzhen, China). We have included the information in the revised manuscript.

      Reviewer #3 (Public Review):

      Summary:

      The authors establish reagents and define experimental parameters useful for defining neurons retrograde to a neuron of interest.

      Strengths:

      A clever approach, careful optimization, novel reagents, and convincing data together lead to convincing conclusions.

      Weaknesses: 

      In the current version of the manuscript, the tracing results could be better centered with respect to past work, certain methods could be presented more clearly, and other approaches are worth considering.

      Appraisal/Discussion:

      Trans-neuronal tracing in the larval zebrafish preparation has lagged behind rodent models, limiting "circuit-cracking" experiments. Previous work has demonstrated that pseudotyped rabies virus-mediated tracing could work, but published data suggested that there was considerable room for optimization. The authors take a major step forward here, identifying a number of key parameters to achieve success and establishing new transgenic reagents that incorporate modern intersectional approaches. As a proof of concept, the manuscript concludes with a rough characterization of inputs to cerebellar Purkinje cells. The work will be of considerable interest to neuroscientists who use the zebrafish model.

      Reviewer #3 (Recommendations For The Authors):

      The main limitations of the work are as follows:

      (1) The optimizations might differ for different neurons. Purkinje cells are noteworthy because they develop considerably during the time window detailed here, almost doubling in number between 7-14dpf. Presumably, connectivity follows. This sort of neurogenesis is much less common elsewhere. It would be useful to show similar results in, say, tectal neurons, which would have spatially-restricted retinal ganglion cells labelled.

      We acknowledge that Purkinje cells (PCs) undergo significant development between 7–14 dpf, which may influence synaptic connectivity and result in differences in tracing efficiency. However, all experimental conditions were standardized across groups, and the selection of starter PCs was unbiased, typically focusing on PCs in the lateral region of the CCe (corpus cerebelli) subregion, ensuring that the relative comparisons remain valid. 

      We agree that testing other neuronal populations would be valuable, as tracing efficiency is influenced by multiple factors, such as the number of endogenous inputs, synaptic maturation, and developmentally regulated synaptic strength. Tectal neurons, which receive spatially restricted retinal ganglion cell inputs, would be a suitable choice for further investigation. However, due to the various tectal cell types and the opacity of the eyeball, such studies present additional technical challenges and are beyond the scope of this paper.

      (2) The virus is delivered by means of microinjection near the cell. This is invasive and challenging for labs that don’t routinely perform electrophysiology. It would be useful to know if coarser methods of viral delivery (e.g. intraventricular injection) would be successful.

      Our protocol does not require the level of precision needed for electrophysiology. The procedure can be performed using a standard high-magnification upright (135× magnification, Nikon SMZ18) or inverted fluorescence microscope (200× magnification, Olympus IX51). The virus suspension was loaded into a glass micropipette with a ~10 µm tip diameter and directly microinjected into the target region using a micromanipulator. The procedure was comparable to embryonic microinjection in terms of precision and operational control. Notably, direct contact with the target cells is not necessary, as the injected virus solution can diffuse and effectively infect nearby cells.  

      We had attempted intraventricular injection as an alternative, but it failed to produce robust labeling, reinforcing the necessity for direct tissue injection. 

      We have now included additional methodological details in the Methods section (page 13). 

      (3) Because of the combination of transgenic lines, plasmid injection, and viral type, it is often confusing to follow exactly what is being done for a particular experiment. It would be useful to specify the transgenic background used for each experiment using standard nomenclature e.g. "Plasmids were injected into Tg(elavl3:GAL4) fish." This is particularly important for the experiments in Figure 4: it isn’t clear what the background used for the sparse labels was.

      We thank the reviewer for bringing this issue to our attention. To improve clarity, we have revised the figure legends to explicitly state the transgenic background, injected plasmids, and viral type used in each experiment, particularly for Figure 4.

      (4) Plasmids should be deposited with Addgene along with maps specifying the particular "codon-optimized Tetoff" per 388. 

      We confirm that all plasmids, including those containing codon-optimized Tetoff constructs, have been uploaded to Addgene along with detailed maps.

      (5) It would be useful to know if there were more apoptotic cells after transfection -- an acridine orange or comparable assay is recommended, rather than loss of fluorescence. 

      We appreciate the reviewer’s suggestion to assess apoptosis using acridine orange staining or comparable assays. We agree that such methods can provide more direct detection of apoptotic events. However, we believe that the difference in cytotoxicity is already evident in our current data: SAD-infected cells exhibit greater loss than CVS-infected cells (see Figure 3D). This is consistent with previous observations in mice, where greater toxicity of SAD compared to CVS was demonstrated using propidium iodide (PI) staining in cultured cells (Reardon et al., 2016).

      (6) Lines 219-228: Hibi’s lab has described the subtypes of granule cells in detail already; the work should discuss the tracings with respect to previous characterizations instead of limiting that work to a citation.

      We thank the reviewer for raising this point. We have expanded the Results section (page 6) to discuss the subtypes of GCs and PCs in relation to previously reported characterizations.

      (7) "Activities" is often used when "activity" is correct. The use of English in the manuscript is, by and large, excellent, but it’s worth running the text through software like Grammarly to catch the occasional error.

      We have carefully edited the manuscript using professional language editing tools to correct any grammatical issues.

      (8) The experiments in 2J-2L would be more convincing if they were performed on inferior olive inputs as well -- especially given the small size of the granule cells. 

      We acknowledge the reviewer’s observation that granule cells (GCs) are relatively small, which may underlie the finding that, out of 33 stimulated GCs, only 15 elicited calcium responses in putative postsynaptic PCs. However, in all 11 pairs where a single GC was successfully ablated, we observed a weakened calcium response in PCs after the ablation (see Figure 2M), suggesting that our tracing approach specifically identifies synaptically coupled neurons. We have clarified this point in the revised manuscript (page 5).

      We agree that verifying the IO inputs to PCs would strengthen the validity of our findings. However, in our experiments, the probability of tracing upstream IO cells was relatively low. This may be due to the developmental immaturity of the synapse and the fact that each PC typically receives input from a single IO cell. Additionally, the deep and distant anatomical location of the IO presents technical challenges for paired electrical stimulation-calcium imaging studies. To address these limitations, we are currently exploring the integration of viral tracing and optogenetics to further investigate IO-PC connectivity in future studies.

      (9) It would be useful if the manuscript discussed the efficacy of trans-synaptic labelling. What fraction of granule cell / olivary inputs to a particular Purkinje cell do the authors think their method captures?

      This is an important point for assessing the efficacy of our trans-synaptic labeling. Ideally, electron microscopy (EM) data would provide the most precise evaluation. In the absence of EM data, we estimated the number of GCs, IOs and PCs using light microscopy-based cell counting. 

      At approximately 7 dpf, we manually counted 327 ± 14 PCs and 2318 ± 70 GCs in the Tg(2×en.cpce-E1B:tdTomato-CAAX) and Tg(cbln12:GAL4FF);Tg(5×UAS:EGFP) zebrafish cerebellum, across all subregions (Va, CCe, EG, and LCa). Given the developmental increase in the number of GCs, the fact that some GCs have exclusively ipsilateral projections, and that a single PC would not receive input from all parallel fibers, we estimate that by 10–14 dpf a single PC receives approximately 1000–2000 GC inputs. Under optimal tracing conditions, we observed an average of 20 labeled GC inputs per PC, yielding a capture fraction of ~1–2%. Although this represents only a subset of total inputs, it is consistent with mammalian studies (Wall et al., 2010; Callaway et al., 2015), suggesting inherent limitations of this viral labeling approach.

      For IO inputs, we counted 325 ± 26 inferior olivary neurons in Tg(elavl3:H2B-GCaMP6s) fish. A single PC likely receives input from one IO neuron, though an IO neuron may innervate multiple PCs. Accordingly, the observed capture rate for IO inputs was lower (7 out of 248 starters). 
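      The arithmetic behind these estimates can be reproduced with a few lines. This is only a worked check of the numbers quoted above (20 labeled GCs against an estimated 1000–2000 GC inputs per PC, and 7 traced IO inputs among 248 starters); the function and variable names are illustrative assumptions.

```python
def fraction(observed, total):
    """Return observed / total as a plain fraction."""
    return observed / total


# GC inputs: average of 20 labeled GCs per starter PC, against an
# estimated 1000-2000 GC inputs per PC at 10-14 dpf.
gc_low = fraction(20, 2000)   # lower bound of the capture fraction
gc_high = fraction(20, 1000)  # upper bound of the capture fraction

# IO inputs: 7 starters with a traced IO input out of 248 starters.
io_per_starter = fraction(7, 248)

print(f"GC capture fraction: {gc_low:.1%} to {gc_high:.1%}")
print(f"Starters with a traced IO input: {io_per_starter:.1%}")
```

      The two quantities use different denominators: the GC figure is a fraction of estimated inputs per PC, whereas the IO figure is a fraction of starter cells.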

      Further optimization is required to enhance the tracing efficiency. We have now incorporated a Discussion on this point in the revised manuscript (page 8).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study, Ana Lapao et al. investigated the roles of Rab27 effector SYTL5 in cellular membrane trafficking pathways. The authors found that SYTL5 localizes to mitochondria in a Rab27A-dependent manner. They demonstrated that SYTL5-Rab27A positive vesicles containing mitochondrial material are formed under hypoxic conditions, thus they speculate that SYTL5 and Rab27A play roles in mitophagy. They also found that both SYTL5 and Rab27A are important for normal mitochondrial respiration. Cells lacking SYTL5 undergo a shift from mitochondrial oxygen consumption to glycolysis, a common process known as the Warburg effect in cancer cells. Based on the cancer patient database, the authors noticed that low SYTL5 expression is related to reduced survival for adrenocortical carcinoma patients, indicating SYTL5 could be a negative regulator of the Warburg effect and potentially tumorigenesis.

      Strengths:

      The authors take advantage of multiple techniques and novel methods to perform the experiments.

      (1) Live-cell imaging revealed that stably inducible expression of SYTL5 co-localized with filamentous structures positive for mitochondria. This result was further confirmed by using correlative light and EM (CLEM) analysis and western blotting from purified mitochondrial fraction.

      (2) In order to investigate whether SYTL5 and Rab27A are required for mitophagy in hypoxic conditions, two established mitophagy reporter U2OS cell lines were used to analyze the autophagic flux.

      Weaknesses:

      This study revealed a potential function of SYTL5 in mitophagy and mitochondrial metabolism. However, the mechanistic evidence that establishes the relationship between SYTL5/Rab27A and mitophagy is insufficient. The involvement of SYTL5 in ACC needs more investigation. Furthermore, images and results supporting the major conclusions need to be improved.

      We thank the reviewer for their constructive comments. We agree that a complete understanding of the mechanism by which SYTL5 and Rab27A are recruited to the mitochondria and subsequently involved in mitophagy requires further investigation. Here, we have shown that SYTL5 recruitment to the mitochondria requires both its lipid-binding C2 domains and the Rab27A-binding SHD domain (Figure 1G-H). This implies a coincidence detection mechanism for mitochondrial localisation of SYTL5.  Additionally, we find that mitochondrial recruitment of SYTL5 is dependent on the GTPase activity and mitochondrial localisation of Rab27A (Figure 2D-E). We also identified proteins linked to the cellular response to oxidative stress, reactive oxygen species metabolic process, regulation of mitochondrion organisation and protein insertion into mitochondrial membrane to be enriched in the SYTL5 interactome (Figure 3A and C).

      However, less is understood about the mitochondrial localisation of Rab27A. To investigate this, we have now performed a mass spectrometry analysis to identify the interactome of Rab27A (see Author response table 1 below). U2OS cells with stable expression of mScarlet-Rab27A or mScarlet only were subjected to immunoprecipitation, followed by MS analysis. Of the 32 significant Rab27A-interacting hits (compared to control), two are located in the inner mitochondrial membrane (IMM): ATP synthase F(1) complex subunit alpha (P25705) and mitochondrial very long-chain specific acyl-CoA dehydrogenase (VLCAD) (P49748). However, as these IMM proteins are not likely involved in the mitochondrial recruitment of Rab27A observed under basal conditions, we chose not to include these data in the manuscript.

      It is known that other RAB proteins are recruited to the mitochondria. During parkin-mediated mitophagy, RABGEF1 (a guanine nucleotide exchange factor) is recruited through its ubiquitin-binding domain and directs mitochondrial localisation of RAB5, which subsequently leads to recruitment of RAB7 by the MON1/CCZ1 complex[1]. As already mentioned in the discussion (p. 12), ubiquitination of the Rab27A GTPase activating protein alpha (TBC1D10A) is reduced in the brain of Parkin KO mouse compared to controls[35], suggesting a possible connection of Rab27A with regulatory mechanisms that are linked with mitochondrial damage and dysfunction. While this is an interesting avenue to explore, we do not follow up further on the mechanism of mitochondrial recruitment of Rab27A in this paper.

      Author response table 1.

      Rab27A interactome. Proteins co-immunoprecipitated with mScarlet-Rab27A vs mScarlet expressing control. The data show average of three replicates. 

      To investigate the role of SYTL5 in the context of ACC, we acquired the NCI-H295R cell line isolated from the adrenal gland of an adrenal cancer patient. The cells were cultured as recommended by ATCC using DMEM/F-12 supplemented with NuSerum and ITS+ premix. It is important to note that the H295R cells were adapted to grow as an adherent monolayer from the H295 cell line, which grows in suspension. However, there can still be many viable H295R cells in the media.

      We attempted to conduct OCR and ECAR measurements using the Seahorse XF upon knockdown of SYTL5 and/or Rab27A in H295R cells. For these assays, it is essential that the cells be seeded in a monolayer at 70-90% confluency with no cell clusters[4]. Poor adhesion of the cells can cause inaccurate measurements by the analyser. Unfortunately, the results across the five replicates we carried out were highly inconsistent: the same knockdown produced trends in opposite directions in different replicates. This is likely due to problems with seeding the cells. Despite our best efforts to optimise the seeding number and to pre-coat the plate with poly-D-lysine[5], we observed poor attachment of cells and an inability to form a monolayer.

      To study the localisation of SYTL5 and Rab27A in an ACC model, we transduced the H295R cells with lentiviral particles to overexpress pLVX-SV40-mScarlet-I-Rab27A and pLVX-CMV-SYTL5-EGFP-3xFLAG. Again, this proved unsuccessful after numerous attempts at optimising transduction. 

      These issues limited our investigation into the role of SYTL5 in ACC to the cortisol assay (Supplementary Figure 6). For this the H295R cells were an appropriate model as they are able to produce an array of adrenal cortex steroids[6] including cortisol[7]. In this assay, measurements are taken from cell culture supernatants, so the confluency of the cells does not prevent consistent results as the cortisol concentration was normalised to total protein per sample. With this assay we were able to rule out a role for SYTL5 and Rab27A in the secretion of cortisol.  

      Another consideration when investigating the involvement of SYTL5 in ACC, is that in general ACC cells should have a low expression of SYTL5 as is seen from the patient expression data (Figure 6B).

      The reviewer also writes “Furthermore, images and results supporting the major conclusions need to be improved.” We have tried several times, without success, to generate U2OS cells with CRISPR/Cas9-mediated C-terminal tagging of endogenous SYTL5 with mNeonGreen, using an approach that has been successfully implemented in the lab for other genes. The failure is likely due to a lack of suitable sgRNAs targeting the C-terminal region of SYTL5: the available sgRNAs have a low predicted efficiency score and a large number of predicted off-target sites in the human genome, including several other gene exons and introns (see Author response image 2).

      We have also included new data (Supplementary Figure 4B) showing that some of the hypoxia-induced SYTL5-Rab27A-positive vesicles stain positive for the autophagy markers p62 and LC3B when inhibiting lysosomal degradation, further strengthening our data that SYTL5 and Rab27A function as positive regulators of mitophagy.  

      Reviewer #2 (Public review): 

      Summary:

      The authors provide convincing evidence that Rab27 and SYTL5 work together to regulate mitochondrial activity and homeostasis.

      Strengths:

      The development of models that allow the function to be dissected, and the rigorous approach and testing of mitochondrial activity.

      Weaknesses:

      There may be unknown redundancies in both pathways in which Rab27 and SYTL5 are working which could confound the interpretation of the results.

      Suggestions for revision:

      Given that Rab27A and SYTL5 are members of protein families it would be important to exclude any possible functional redundancies coming from Rab27B expression or one of the other SYTL family members. For Rab27 this would be straightforward to test in the assays shown in Figure 4 and Supplementary Figure 5. For SYTL5 it might be sufficient to include some discussion about this possibility.

      We thank the reviewer for pointing out the potential redundancy issue for Rab27A and SYTL5. There are multiple studies demonstrating the redundancy between Rab27A and Rab27B. For example, in a study of the disease Griscelli syndrome, caused by Rab27A loss of function, expression of either Rab27A or Rab27B rescues the healthy phenotype, indicating redundancy[8]. This redundancy, however, applies only to certain functions and cell types. In fact, in a study regarding hair growth, knockdown of Rab27B had the opposite effect to knockdown of Rab27A[9].

      In this paper, we conducted all assays in U2OS cells, in which the expression of Rab27B is very low. Human Protein Atlas reports expression of 0.5 nTPM for Rab27B, compared to 18.4 nTPM for Rab27A. We also observed this low level of expression of Rab27B compared to Rab27A by qPCR in U2OS cells. Therefore, there would be very little endogenous Rab27B expression in cells depleted of Rab27A (with siRNA or KO). In line with this, Rab27B peptides were not detected in our SYTL5 interactome MS data (Table 1 in paper). Moreover, as Rab27A depletion inhibits mitochondrial recruitment of SYTL5 and mitophagy, it is not likely that Rab27B provides a functional redundancy. It is possible that Rab27B overexpression could rescue mitochondrial localisation of SYTL5 in Rab27A KO cells, but this was not tested as we do not have any evidence for a role of Rab27B in these cells. Taken together, we believe our data imply that Rab27B is very unlikely to provide any functional redundancy to Rab27A in our experiments.

      For the SYTL family, all five members are Rab27 effectors, binding to Rab27 through their SHD domain. Together with Rab27, all SYTL’s have been implicated in exocytosis in different cell types. For example, SYTL1 in exocytosis of azurophilic granules from neutrophils[10], SYTL2 in secretion of glucagon granules from pancreatic α cells[11], SYTL3 in secretion of lytic granules from cytotoxic T lymphocytes[12], SYTL4 in exocytosis of dense hormone containing granules from endocrine cells[13] and SYTL5 in secretion of the RANKL cytokine from osteoblasts[14]. This indicates a potential for redundancy through their binding to Rab27 and function in vesicle secretion/trafficking. However, one study found that different Rab27 effectors have distinct functions at different stages of exocytosis[15].

      Very little is known about redundancy or hierarchy between these proteins. Differences in function may be due to the variation in gene expression profile across tissues for the different SYTL’s (see Author response image 1 below). SYTL5 is enriched in the brain unlike the others, suggesting possible tissue-specific functions. There are also differences in the binding affinities and calcium sensitivities of the C2A and C2B domains between the SYTL proteins[16].

      Author response image 1.

      GTEx Multi Gene Query for SYTL1-5

      All five SYTL’s are expressed in the U2OS cell line with nTPMs according to Human Protein Atlas of SYTL1: 7.5, SYTL2: 13.4, SYTL3: 14.2, SYTL4: 8.7, SYTL5: 4.8. In line with this, in the Rab27A interactome, when comparing cells overexpressing mScarlet-Rab27A with control cells, we detected all five SYTL’s as specific Rab27A-interacting proteins (see Author response table 1 above). In contrast, in the SYTL5 interactome we did not detect any other SYTL protein (Table 1 in paper), confirming that they do not form a complex with SYTL5.

      We have included the following text in the discussion (p. 12): “SYTL5 and Rab27A are both members of protein families, suggesting possible functional redundancies from Rab27B or one of the other SYTL isoforms. While Rab27B has a very low expression in U2OS cells, all five SYTL’s are expressed. However, when knocking out or knocking down SYTL5 and Rab27A we observe significant effects that we presume would be negated if their isoforms were providing functional redundancies. Moreover, we did not detect any other SYTL protein or Rab27B in the SYTL5 interactome, confirming that they do not form a complex with SYTL5.”

      Suggestions for Discussion: 

      Both Rab27A and SYTL5 localize to other membranes, including the endolysosomal compartments. How do the authors envisage the mechanism or cellular modifications that allow these proteins, either individually or in complex, to function also to regulate mitochondrial function? It would be interesting to have some views.

      We agree that it would be interesting to better understand the mechanism involved in modulation of the localisation and function of SYTL5 and Rab27A at different cellular compartments, including the mitochondria. Here, we have shown that SYTL5 recruitment to the mitochondria involves coincidence detection, as both its lipid-binding C2 domains and the Rab27A-binding SHD domain are required (Figure 1G-H). Both these domains also seem required for localisation of SYTL5 to vesicles, and we can only speculate that binding to different lipids (Figure 1F) may regulate SYTL5 localisation. Additionally, we find that mitochondrial recruitment of SYTL5 is dependent on the GTPase activity and mitochondrial localisation of Rab27A (Figure 2D-E). However, this seems also the case for vesicular recruitment of SYTL5, although a few SYTL5-Rab27A (T23N) positive vesicles were seen (Figure 2E). 

      To characterise the mechanisms involved in mitochondrial localisation of Rab27A, we have performed mass spectrometry analysis to identify the interactome of Rab27A (see Author response table 1 above). U2OS cells with stable expression of mScarlet-Rab27A or mScarlet only were subjected to immunoprecipitation, followed by MS analysis.  Of the 32 significant Rab27A-interacting hits (compared to control), two of the hits localise in the inner mitochondrial membrane (IMM); ATP synthase F(1) complex subunit alpha (P25705), and mitochondrial very long-chain specific acyl-CoA dehydrogenase (VLCAD)(P49748). However, as these IMM proteins are not likely involved in mitochondrial recruitment of Rab27A, observed under basal conditions, we chose not to include these data in the manuscript. 

      It is known that other RAB proteins are recruited to the mitochondria by regulation of their GTPase activity. During parkin-mediated mitophagy, RABGEF1 (a guanine nucleotide exchange factor) is recruited through its ubiquitin-binding domain and directs mitochondrial localisation of RAB5, which subsequently leads to recruitment of RAB7 by the MON1/CCZ1 GEF complex[1]. As already mentioned in the discussion (p. 12), ubiquitination of the Rab27A GTPase activating protein alpha (TBC1D10A) is reduced in the brain of Parkin KO mouse compared to controls[35], suggesting a possible connection of Rab27A with regulatory mechanisms that are linked with mitochondrial damage and dysfunction. While this is an interesting avenue to explore, it is beyond the scope of this paper.

      Our data suggest that SYTL5 functions as a negative regulator of the Warburg effect, the switch from OXPHOS to glycolysis. While both SYTL5 and Rab27A seem required for mitophagy of selective mitochondrial components, and their depletion leads to reduced mitochondrial respiration and ATP production, only depletion of SYTL5 caused a switch to glycolysis. The mechanisms involved are unclear, but we found several proteins linked to the cellular response to oxidative stress, reactive oxygen species metabolic process, regulation of mitochondrion organisation and protein insertion into mitochondrial membrane to be enriched in the SYTL5 interactome (Figure 3A and C).

      We have addressed this comment in the discussion on p. 12.

      Reviewer #3 (Public review):

      Summary:

      In the manuscript by Lapao et al., the authors uncover a role for the Rab27A effector protein SYTL5 in regulating mitochondrial function and turnover. The authors find that SYTL5 localizes to mitochondria in a Rab27A-dependent way and that loss of SYTL5 (or Rab27A) impairs lysosomal turnover of an inner mitochondrial membrane mitophagy reporter but not a matrix-based one. As the authors see no co-localization of GFP/mScarlet tagged versions of SYTL5 or Rab27A with LC3 or p62, they propose that lysosomal turnover is independent of the conventional autophagy machinery. Finally, the authors go on to show that loss of SYTL5 impacts mitochondrial respiration and ECAR and as such may influence the Warburg effect and tumorigenesis. Of relevance here, the authors go on to show that SYTL5 expression is reduced in adrenocortical carcinomas and this correlates with reduced survival rates.

      Strengths:

      There are clearly interesting and new findings here that will be relevant to those following mitochondrial function, the endocytic pathway, and cancer metabolism.

      Weaknesses:

      The data feel somewhat preliminary in that the conclusions rely on exogenously expressed proteins and reporters, which do not always align.

      As the authors note there are no commercially available antibodies that recognize endogenous SYTL5, hence they have had to stably express GFP-tagged versions. However, it appears that the level of expression dictates co-localization from the examples the authors give (though it is hard to tell as there is a lack of any kind of quantitation for all the fluorescent figures). Therefore, the authors may wish to generate an antibody themselves or tag the endogenous protein using CRISPR.

      We agree that the level of SYTL5 expression is likely to affect its localisation. As suggested by the reviewer, we have tried hard, without success, to generate U2OS cells with CRISPR knock-in of an mNeonGreen tag at the C-terminus of endogenous SYTL5, using an approach that has been successfully implemented in the lab for other genes. The failure is likely due to a lack of suitable sgRNAs targeting the C-terminal region of SYTL5: the available sgRNAs have a low predicted efficiency score and a large number of predicted off-target sites in the human genome, including several other gene exons and introns (see Author response image 2).

      Author response image 2.

      Overview of sgRNAs targeting the C-terminal region of SYTL5 

      Although the SYTL5 expression level might affect its cellular localization, we also found the mitochondrial localisation of SYTL5-EGFP to be strongly increased in cells co-expressing mScarlet-Rab27A, supporting our findings of Rab27A-mediated mitochondrial recruitment of SYTL5. We have also included new data (Supplementary Figure 4B) showing that some of the hypoxia-induced SYTL5/Rab27A-positive vesicles stain positive for the autophagy markers p62 and LC3B when inhibiting lysosomal degradation, further strengthening our data that SYTL5 and Rab27A function as positive regulators of mitophagy.

      In relation to quantitation, the authors found that SYTL5 localizes to multiple compartments or potentially a few compartments that are positive for multiple markers. Some quantitation here would be very useful as it might inform on function. 

      We find that SYTL5-EGFP localizes to mitochondria, lysosomes and the plasma membrane in U2OS cells with stable expression of SYTL5-EGFP and in SYTL5/Rab27A double knock-out cells rescued with SYTL5-EGFP and mScarlet-Rab27A. We also see colocalization of SYTL5-EGFP with endogenous p62, LC3 and LAMP1 upon induction of mitophagy. However, as these cell lines comprise a heterogeneous pool with high variability, we do not believe that quantification of the overexpressing cell lines would provide beneficial information in this scenario. As described above, we have tried several times to generate SYTL5 knock-in cells without success.

      The authors find that under hypoxia/hypoxia-like conditions, punctate structures of SYTL5 and Rab27A form that are positive for MitoTracker, and that a very specific mitophagy assay based on the pSu9-Halo system is impaired by siRNA of SYTL5/Rab27A, but another, distinct mitophagy assay (Matrix EGFP-mCherry) shows no change. I think this work would strongly benefit from some measurements with endogenous mitochondrial proteins, both via immunofluorescence and western blot-based flux assays.

      In addition to the western blotting for different endogenous ETC proteins showing significantly increased levels of MTCO1 in cells depleted of SYTL5 and/or Rab27A (Figure 5E-F), we have now blotted for the endogenous mitochondrial proteins, COXIV and BNIP3L, in DFP and DMOG conditions upon knockdown of SYTL5 and/or Rab27A (Figure 5G and Supplementary Figure 5A). Although there was a trend towards increased levels, we did not see any significant changes in total COXIV or BNIP3L levels when SYTL5, Rab27A or both are knocked down compared to siControl. Blotting for endogenous mitochondrial proteins is however not the optimum readout for mitophagy. A change in mitochondrial protein level does not necessarily result from mitophagy, as other factors such as mitochondrial biogenesis and changes in translation can also have an effect. Mitophagy is a dynamic process, which is why we utilise assays such as the HaloTag and mCherry-EGFP double tag as these indicate flux in the pathway. Additionally, as mitochondrial proteins have different half-lives, with many long-lived mitochondrial proteins[17], differences in turnover rates of endogenous proteins make the results more difficult to interpret. 

      A really interesting aspect is the apparent independence of this mitophagy pathway from the conventional autophagy machinery. However, this is only based on a lack of co-localization between p62 or LC3 with LAMP1 and GFP/mScarlet tagged SYTL5/Rab27A. I would not expect them to greatly colocalize in lysosomes, as both the p62 and LC3 will become rapidly degraded, while the eGFP and mScarlet tags are relatively resistant to lysosomal hydrolysis. -/+ a lysosome inhibitor might help here and ideally, the functional mitophagy assays should be repeated in autophagy KOs.

      We thank the reviewer for this suggestion. We have now repeated the colocalisation studies in cells treated with DFP with the addition of bafilomycin A1 (BafA1) to inhibit the lysosomal V-ATPase. Indeed, we find that a few of the SYTL5/Rab27A/MitoTracker positive structures also stain positive for p62 and LC3 (Supplementary Figure 4B). As expected, the occurrence of these structures was rare, as BafA1 was only added for the last 4 hrs of the 24 hr DFP treatment. However, we cannot exclude the possibility that there are two different populations of these vesicles.

      The link to tumorigenesis and cancer survival is very interesting but it is not clear if this is due to the mitochondrially-related aspects of SYTL5 and Rab27A. For example, increased ECAR is seen in the SYTL5 KO cells but not in the Rab27A KO cells (Fig.5D), implying that mitochondrial localization of SYTL5 is not required for the ECAR effect. More work to strengthen the link between the two sections in the paper would help with future directions and impact with respect to future cancer treatment avenues to explore.

      We agree that the role of SYTL5 in ACC requires future investigation. While we observe reduced OXPHOS levels in both SYTL5 and Rab27A KO cells (Figure 5B), glycolysis was only increased in SYTL5 KO cells (Figure 5D). We believe this indicates that Rab27A is being negatively regulated by SYTL5, as ECAR was unchanged in both the Rab27A KO and Rab27A/SYTL5 dKO cells. This suggests that Rab27A is required for the increase in ECAR when SYTL5 is depleted, therefore SYTL5 negatively regulates Rab27A. The mechanism involved is unclear, but we found several proteins linked to the cellular response to oxidative stress, reactive oxygen species metabolic process, regulation of mitochondrion organisation and protein insertion into mitochondrial membrane to be enriched in the SYTL5 interactome (Figure 3A and C).

      To investigate the link to cancer further, we tested the effect of knockdown of SYTL5 and/or Rab27A on the levels of mitochondrial ROS. ROS levels were measured by flow cytometry using the MitoSOX Red dye, together with the MitoTracker Green dye to normalise ROS levels to the total mitochondria. Cells were treated with the antioxidant N-acetylcysteine (NAC)[18] as a negative control and menadione as a positive control, as menadione induces ROS production via redox cycling[19]. We must consider that there is also a lot of autofluorescence from cells that makes it impossible to get a level of ‘zero ROS’ in this experiment. We did not see a change in ROS with knockdown of SYTL5 and/or Rab27A compared to the NAC treated or siControl samples (see Author response image 3 below). The menadione samples confirm the success of the experiment as ROS accumulated in these cells. Thus, based on this, we do not believe that low SYTL5 expression would affect ROS levels in ACC tumours.

      Author response image 3.

      Mitochondrial ROS production normalised to total mitochondria

      As discussed in our response to Reviewer #1, we tried hard to characterise the role of SYTL5 in the context of ACC using the NCI-H295R cell line isolated from the adrenal gland of an adrenal cancer patient. We attempted to conduct OCR and ECAR measurements using the Seahorse XF upon knockdown of SYTL5 and/or Rab27A in H295R cells without success, due to poor attachment of the cells and inability to form a monolayer. We also transduced the H295R cells with lentiviral particles to overexpress pLVX-SV40-mScarlet-I-Rab27A and pLVX-CMV-SYTL5-EGFP-3xFLAG to study the localisation of SYTL5 and Rab27A in an ACC model. Again, this proved unsuccessful after numerous attempts at optimising the transduction. These issues limited our investigation into the role of SYTL5 in ACC to the cortisol assay (Supplementary Figure 6). For this, the H295R cells were an appropriate model as they are able to produce an array of adrenal cortex steroids[6] including cortisol[7]. In this assay, measurements are taken from cell culture supernatants, so the confluency of the cells does not prevent consistent results as the cortisol concentration was normalised to total protein per sample. With this assay we were able to rule out a role for SYTL5 and Rab27A in the secretion of cortisol.

      Another consideration when investigating the involvement of SYTL5 in ACC, is that in general ACC cells should have a low expression of SYTL5 as is seen from the patient expression data (Figure 6B).

      Further studies into the link between SYTL5/Rab27A and cancer are beyond the scope of this paper as we are limited to the tools and expertise available in the lab.

      References

      (1) Yamano, K. et al. Endosomal Rab cycles regulate Parkin-mediated mitophagy. eLife 7 (2018). https://doi.org:10.7554/eLife.31326

      (2) Carré, M. et al. Tubulin is an inherent component of mitochondrial membranes that interacts with the voltage-dependent anion channel. The Journal of biological chemistry 277, 33664-33669 (2002). https://doi.org:10.1074/jbc.M203834200

      (3) Hoogerheide, D. P. et al. Structural features and lipid binding domain of tubulin on biomimetic mitochondrial membranes. Proceedings of the National Academy of Sciences 114, E3622-E3631 (2017). https://doi.org:10.1073/pnas.1619806114

      (4) Plitzko, B. & Loesgen, S. Measurement of Oxygen Consumption Rate (OCR) and Extracellular Acidification Rate (ECAR) in Culture Cells for Assessment of the Energy Metabolism. Bio Protoc 8, e2850 (2018). https://doi.org:10.21769/BioProtoc2850

      (5) Yavin, E. & Yavin, Z. Attachment and culture of dissociated cells from rat embryo cerebral hemispheres on polylysine-coated surface. The Journal of cell biology 62, 540-546 (1974). https://doi.org:10.1083/jcb.62.2.540

      (6) Wang, T. & Rainey, W. E. Human adrenocortical carcinoma cell lines. Mol Cell Endocrinol 351, 58-65 (2012). https://doi.org:10.1016/j.mce.2011.08.041

      (7) Rainey, W. E. et al. Regulation of human adrenal carcinoma cell (NCI-H295) production of C19 steroids. J Clin Endocrinol Metab 77, 731-737 (1993). https://doi.org:10.1210/jcem.77.3.8396576

      (8) Barral, D. C. et al. Functional redundancy of Rab27 proteins and the pathogenesis of Griscelli syndrome. J. Clin. Invest. 110, 247-257 (2002). https://doi.org:10.1172/jci15058

      (9) Ku, K. E., Choi, N. & Sung, J. H. Inhibition of Rab27a and Rab27b Has Opposite Effects on the Regulation of Hair Cycle and Hair Growth. Int. J. Mol. Sci. 21 (2020). https://doi.org:10.3390/ijms21165672

      (10) Johnson, J. L., Monfregola, J., Napolitano, G., Kiosses, W. B. & Catz, S. D. Vesicular trafficking through cortical actin during exocytosis is regulated by the Rab27a effector JFC1/Slp1 and the RhoA-GTPase–activating protein Gem-interacting protein. Mol. Biol. Cell 23, 1902-1916 (2012). https://doi.org:10.1091/mbc.e11-12-1001

      (11) Yu, M. et al. Exophilin4/Slp2-a targets glucagon granules to the plasma membrane through unique Ca2+-inhibitory phospholipid-binding activity of the C2A domain. Mol. Biol. Cell 18, 688-696 (2007). https://doi.org:10.1091/mbc.e06-10-0914

      (12) Kurowska, M. et al. Terminal transport of lytic granules to the immune synapse is mediated by the kinesin-1/Slp3/Rab27a complex. Blood 119, 3879-3889 (2012). https://doi.org:10.1182/blood-2011-09-382556

      (13) Zhao, S., Torii, S., Yokota-Hashimoto, H., Takeuchi, T. & Izumi, T. Involvement of Rab27b in the regulated secretion of pituitary hormones. Endocrinology 143, 1817-1824 (2002). https://doi.org:10.1210/endo.143.5.8823

      (14) Kariya, Y. et al. Rab27a and Rab27b are involved in stimulation-dependent RANKL release from secretory lysosomes in osteoblastic cells. J Bone Miner Res 26, 689-703 (2011). https://doi.org:10.1002/jbmr.268

      (15) Zhao, K. et al. Functional hierarchy among different Rab27 effectors involved in secretory granule exocytosis. eLife 12 (2023). https://doi.org:10.7554/eLife.82821

      (16) Izumi, T. Physiological roles of Rab27 effectors in regulated exocytosis. Endocr J 54, 649-657 (2007). https://doi.org:10.1507/endocrj.kr-78

      (17) Bomba-Warczak, E. & Savas, J. N. Long-lived mitochondrial proteins and why they exist. Trends in cell biology 32, 646-654 (2022). https://doi.org:10.1016/j.tcb.2022.02.001

      (18) Curtin, J. F., Donovan, M. & Cotter, T. G. Regulation and measurement of oxidative stress in apoptosis. Journal of Immunological Methods 265, 49-72 (2002). https://doi.org:10.1016/S0022-1759(02)00070-4

      (19) Criddle, D. N. et al. Menadione-induced Reactive Oxygen Species Generation via Redox Cycling Promotes Apoptosis of Murine Pancreatic Acinar Cells. Journal of Biological Chemistry 281, 40485-40492 (2006). https://doi.org:10.1074/jbc.M607704200

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Turner et al. present an original approach to investigate the role of Type-1 nNOS interneurons in driving neuronal network activity and in controlling vascular network dynamics in awake head-fixed mice. Selective activation or suppression of Type-1 nNOS interneurons has previously been achieved using chemogenetics, optogenetics, or local pharmacology. Here, the authors took advantage of the fact that Type-1 nNOS interneurons are the only cortical cells that express the tachykinin receptor 1 to ablate them with a local injection of saporin conjugated to substance P (SP-SAP). SP-SAP causes cell death in 90% of Type-1 nNOS interneurons without affecting microglia, astrocytes, and neurons. The authors report that the ablation has no major effects on sleep or behavior. Refining the analysis by scoring neural and hemodynamic signals with electrode recordings, calcium signal imaging, and wide-field optical imaging, the authors observe that Type-1 nNOS interneuron ablation does not change the various phases of the sleep/wake cycle. However, it does reduce low-frequency neural activity, irrespective of the classification of arousal state. Analyzing neurovascular coupling using multiple approaches, they report small changes in resting-state neural-hemodynamic correlations across arousal states, primarily mediated by changes in neural activity. Finally, they show that Type-1 nNOS interneurons play a role in controlling interhemispheric coherence and vasomotion.

      In conclusion, these results are interesting, use state-of-the-art methods, and are well supported by the data and their analysis. I have only a few comments on the stimulus-evoked haemodynamic responses, and these can be easily addressed.

      We thank the reviewer for their positive comments on our work.

      Reviewer #2 (Public review):

      Summary:

      This important study by Turner et al. examines the functional role of a sparse but unique population of neurons in the cortex that express Nitric oxide synthase (Nos1). To do this, they pharmacologically ablate these neurons in the focal region of whisker-related primary somatosensory (S1) cortex using a saporin-substance P conjugate. Using widefield and 2-photon microscopy, as well as field recordings, they examine the impact of this cell-specific lesion on blood flow dynamics and neuronal population activity. Locally within the S1 cortex, they find changes in neural activity patterns, decreased delta band power, and reduced sensory-evoked changes in blood flow (specifically eliminating the sustained blood flow change after stimulation). Surprisingly, given the tiny fraction of cortical neurons removed by the lesion, they also find far-reaching effects on neural activity patterns and blood volume oscillations between the cerebral hemispheres.

      Strengths:

      This was a technically challenging study and the experiments were executed in an expert manner. The manuscript was well written and I appreciated the cartoon summary diagrams included in each figure. The analysis was rigorous and appropriate. Their discovery that Nos1 neurons can have far-reaching effects on blood flow dynamics and neural activity is quite novel and surprising (to me at least) and should seed many follow-up, mechanistic experiments to explain this phenomenon. The conclusions were justified by the convincing data presented.

      Weaknesses:

      I did not find any major flaws in the study. I have noted some potential issues with the authors' characterization of the lesion and its extent. The authors may want to re-analyse some of their data to further strengthen their conclusions. Lastly, some methodological information was missing, which should be addressed.

      We thank the reviewer for their enthusiasm for our work.

      Reviewer #3 (Public review):

      The role of type-I nNOS neurons is not fully understood. The data presented in this paper addresses this gap through optical and electrophysiological recordings in adult mice (awake and asleep).

      This manuscript reports on a study on type-I nNOS neurons in the somatosensory cortex of adult mice, from 3 to 9 months of age. Most data were acquired using a combination of IOS and electrophysiological recordings in awake and asleep mice. Pharmacological ablation of the type-I nNOS populations of cells led to decreased coherence in gamma band coupling between left and right hemispheres; decreased ultra-low frequency coupling between blood volume in each hemisphere; decreased (superficial) vascular responses to sustained sensory stimulus and abolishment of the post-stimulus CBV undershoot. While the findings shed new light on the role of type-I nNOS neurons, the etiology of the discrepancies between current observations and literature observations is not clear and many potential explanations are put forth in the discussion.

      We thank the reviewer for their comments.

      Reviewer #1 (Recommendations for the authors):  

      (1) Figure 3, Type-1 nNOS interneuron ablation has complex effects on neural and vascular responses to brief (0.1 s) and prolonged (5 s) whisker stimulation. During 0.1 s stimulation, ablation of type 1 nNOS cells does not affect the early HbT response but only reduces the undershoot. What is the pan-neuronal calcium response? Is the peak enhanced, as might be expected from the removal of inhibition? The authors need to show the GCaMP7 trace obtained during this short stimulation.

      Unfortunately, we did not perform brief stimulation experiments in GCaMP-expressing mice. As we did not see a clear difference in the amplitude of the stimulus-evoked response with our initial electrophysiology recordings (Fig. 3a), we suspected that an effect might be visible with longer duration stimuli and thus pivoted to a pulsed stimulation over the course of 5 seconds for the remaining cohorts. It would have been beneficial to interweave short-stimulus trials for a direct comparison between the complementary experiments, but we did not do this.

      During 5s stimulation, both the early and delayed calcium/vascular responses are reduced. Could the authors elaborate on this? Does this mean that increasing the duration of stimulation triggers one or more additional phenomena that are sensitive to the ablation of type 1 nNOS cells and mask what is triggered by the short stimulation? Are astrocytes involved? How do they interpret the early decrease in neuronal calcium?

      As our findings show that ablation reduces the calcium/vascular response more prominently during prolonged stimulation, we do suspect that this is due to additional NO-dependent mechanisms or downstream responses. NO is a modulator of neural activity, generally increasing excitability (Kara and Friedlander 1999, Smith and Otis 2003), so any manipulation that changes NO levels will change (likely decrease) the excitability of the network, potentially resulting in a smaller hemodynamic response to sensory stimulation secondary to this decrease. While short stimuli engage rapid neurovascular coupling mechanisms, longer duration (>1s) stimulation could introduce additional regulatory elements, such as astrocytes, that operate on a slower time scale. In Author response image 1, we show a comparison of the control groups plotted together from Fig. 3a and 3b with vertical bars aligned to the peak. During the 5s stimulation, the time-to-peak is roughly 830 milliseconds later than during the 0.1s stimulation, meaning it’s plausible that the signals don’t separate until later. Our interpretation is that the NVC mechanisms responsible for brief stimulus-evoked change are either NO-independent or are compensated for in the SSP-SAP group by other means due to the chronic nature of the ablation.

      We have added the following text to the Discussion (Line 368): “Loss of type-I nNOS neurons drove minimal changes in the vasodilation elicited by brief stimulation, but led to decreased vascular responses to sustained stimulation, suggesting that the early phase of neurovascular coupling is not mediated by these cells, consistent with the multiple known mechanisms for neurovascular coupling (Attwell et al 2010, Drew 2019, Hosford & Gourine 2019) acting through both neurons and astrocytes with multiple timescales (Le Gac et al 2025, Renden et al 2024, Schulz et al 2012, Tran et al 2018).”

      Author response image 1.

      (2) In Figures 4d and e, it is unclear to me why the authors use brief stimulation to analyze the relationship between HbT and neuronal activity (gamma power) and prolonged stimulation for the relationship between HbT and GCaMP7 signal. Could they compare the curves with both types of stimulation?

      As discussed previously, we did not use the same stimulation parameters across cohorts. The mice with implanted electrodes received only brief stimulation, while those undergoing calcium imaging received longer duration stimulus. 

      Reviewer #2 (Recommendations for the authors):

      (1) Results, how far-reaching is the cell-specific ablation? Would it be possible to estimate the volume of the cortex where Nos1 cells are depleted based on histology? Were there signs of neuronal injury more remotely, for example, beading of dendrites?

      We regularly see cell ablation spanning 1-2 mm in diameter within the somatosensory cortex of each animal, which is consistent with the spread of small molecules. Ribosome inactivating proteins like SAP are smaller than AAVs (~5 nm compared to ~25 nm in diameter) and thus diffuse slightly further. We observed no obvious indication of neuronal injury more remotely or in other brain regions, but we did not image or characterize dendritic beading, as this would require a sparse labeling of neurons to clearly see dendrites (NeuN only stains the cell body). Our histology shows no change in cell numbers.

      We have added the following text to the Results (Line 124): “Immunofluorescent labeling in mice injected with Blank-SAP showed labeling of nNOS-positive neurons near the injection site. In contrast, mice injected with SP-SAP showed a clear loss in nNOS-labeling, with a typical spread of 1-2 mm from the injection site, though nNOS-positive neurons both subcortically and in the entirety of the contralateral hemisphere remained intact.”

      (2) For histological analysis of cell counts after the lesion, more information is needed. How was the region of interest for counting cells determined (eg. 500um radius from needle/pipette tract?) and of what volume was analysed?

      The region of interest for both SSP-SAP and Blank SAP injections was a 1 mm diameter circle centered around the injection site and averaged across sections (typically 3-5 when available). In most animals, the SSP-SAP had a lateral spread greater than 500 microns and encompassed the entire depth of cortex (1-1.5 mm in SI, decreasing in the rostral to caudal direction). The counts within the 1 mm diameter ROI were averaged across sections and then converted into cells per mm² of area as presented. Note the consistent decrease in type I nNOS cells seen across mice in Fig 1d, Fig S1b.

      We have added the following text in the Materials & Methods (Line 507): “The region of interest for analysis of cell counts was determined based on the injection site for both SP-SAP and Blank SAP injections, with a 1 mm diameter circle centered around the injection site and averaged across 3-5 sections where available. In most animals, the SP-SAP had a lateral spread greater than 500 microns and encompassed the entire depth of cortex (1-1.5 mm in SI).”
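      The conversion from per-section counts to cells per unit area described above can be sketched as follows. This is an illustration only, not the authors' analysis code: the function name is hypothetical and the example counts are placeholder values rather than data from the study. The only quantity taken from the text is the 1 mm diameter circular ROI.

```python
import math

ROI_DIAMETER_MM = 1.0  # circular ROI centred on the injection site (from the text)


def cells_per_mm2(counts_per_section, roi_diameter_mm=ROI_DIAMETER_MM):
    """Average the counts across sections, then divide by the circular ROI area."""
    if not counts_per_section:
        raise ValueError("need at least one section")
    roi_area_mm2 = math.pi * (roi_diameter_mm / 2.0) ** 2
    mean_count = sum(counts_per_section) / len(counts_per_section)
    return mean_count / roi_area_mm2


# Hypothetical counts from three sections of one animal (not real data).
example_counts = [4, 6, 5]
density = cells_per_mm2(example_counts)
print(f"{density:.2f} cells per mm^2")
```

      Averaging before dividing by the area gives each section equal weight, which matches the description of averaging across the 3-5 available sections.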

      (3) Based on Supplementary Figure 1, it appears that the saporin conjugate not only depletes Nos neurons but also may affect vascular (endothelial perhaps) Nos expression. Some quantification of this effect and its extent may be insightful in terms of ascribing the effects of the lesion directly on neurons vs indirectly and perhaps more far-reaching via vascular/endothelial NOS.

      Thank you for this comment. This is a possibility. While we have found that the high nNOS expression of type-I nNOS neurons makes NADPH diaphorase a good stain for detecting them, it is less useful for cell types that express NOS at lower levels. We have found that the absolute intensity of NADPH diaphorase staining is somewhat variable from section to section. Variability in overall NADPH diaphorase intensity is likely due to several factors, such as duration of staining, thickness of the section, and differences in PFA concentration within the tissue and between animals. As NADPH diaphorase staining is highly sensitive to the amount of PFA exposure, any small differences in processing could affect the intensity, and slight differences in perfusion quality and processing could account for this variability. A second, perhaps larger issue could be differences in the number of arteries (which will express NOS at much higher levels than veins, and thus will appear darker) in the section. We did not stain for smooth muscle and so cannot differentiate arteries and veins. Any difference in vessel intensity could be due to random variations in the numbers of arteries/veins in the section. While we believe that this is a potentially interesting question, our histological experiments were not able to address it.

      (4) The assessment for inflammation took place 1 month after the lesion, but the imaging presumably occurred ~ 2 weeks after the lesion. Note that it seemed somewhat ambiguous as to when approximately, the imaging, and electrophysiology experiments took place relative to the induction of the lesion. Presumably, some aspects of inflammation and disruption could have been missed, at the time when experiments were conducted, based on this disparity in assessment. The authors may want to raise this as a possible limitation.

      We apologize for our unclear description of the timeline. We began imaging experiments at least 4 weeks after ablation, the same time frame as when we performed our histological assays.

      We have added the following text to the Discussion (Line 379): “With imaging beginning four weeks after ablation, there could be compensatory rewiring of local and/or network activity following type-I nNOS ablation, where other signaling pathways from the neurons to the vasculature become strengthened to compensate for the loss of vasodilatory signaling from the type-I nNOS neurons.”

      (5) Results Figure 2, please define "P or delta P/P". Also, for Figure 2c-f, what do the black vertical ticks represent?

      ∆P/P is the change in the gamma-band power relative to the resting-state baseline, and black tick marks indicate binarized periods of vibrissae motion (‘whisking’). We have clarified this in Figure caption 2 (Line 174).
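      For readers unfamiliar with this normalization, a minimal sketch of a ∆P/P calculation is given below. It assumes that the baseline is the mean power over samples flagged as resting state; the function name, the boolean rest mask, and the example numbers are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def fractional_power_change(power, rest_mask):
    """Return (P - P_rest) / P_rest, with P_rest the mean power over resting samples."""
    power = np.asarray(power, dtype=float)
    rest_mask = np.asarray(rest_mask, dtype=bool)
    if not rest_mask.any():
        raise ValueError("rest_mask must flag at least one resting sample")
    baseline = power[rest_mask].mean()
    return (power - baseline) / baseline


# Hypothetical gamma-band power trace; the third sample falls outside rest.
gamma_power = [2.0, 2.0, 4.0, 2.0]
is_rest = [True, True, False, True]
delta_p_over_p = fractional_power_change(gamma_power, is_rest)
```

      A value of 0 means power equal to the resting baseline, and a value of 1 means a doubling of power relative to rest.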

      (6) Figure 3b-e, is there not an undershoot (eventually) after 5s of stimulation that could be assessed?

      Previous work has shown that there is no undershoot in response to whisker stimulations of a few seconds (Drew, Shih, Kleinfeld, PNAS, 2011). The undershoot for brief stimuli happens within ~2.5 s of the onset/cessation of the brief stimulation; this is clearly lacking in the response to the 5s stim (Fig 3). The neurovascular coupling mechanisms recruited during the short stimulation are different than those recruited during the long stimulus, making a comparison of the undershoot between the two stimulation durations problematic.

      For Figures 3e and 6 how was surface arteriole diameter or vessel tone measured? 2P imaging of fluorescent dextran in plasma? Please add the experimental details of 2P imaging to the methods. Including some 2P images in the figures couldn't hurt to help the reader understand how these data were generated.

      We have added details about our 2-photon imaging (FITC-dextran, full-width at half-maximum calculation for vessel diameter) as well as a trace and vessel image to Figure 2.

      We have added the following text to the Materials & Methods (Line 477): “In two-photon experiments, mice were briefly anesthetized and retro-orbitally injected with 100 µL of 5% (weight/volume) fluorescein isothiocyanate–dextran (FITC) (FD150S, Sigma-Aldrich, St. Louis, MO) dissolved in sterile saline.”

      We have added the following text to the Materials & Methods (Line 532): “A rectangular box was drawn around a straight, evenly-illuminated vessel segment and the pixel intensity was averaged along the long axis to calculate the vessel’s diameter from the full-width at half-maximum (https://github.com/DrewLab/Surface-Vessel-FWHM-Diameter; (Drew, Shih et al. 2011)).”
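      The full-width at half-maximum idea described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in the linked repository: the function name, the choice of the profile minimum as background, the linear interpolation of the half-maximum crossings, and the pixel size parameter are all assumptions made for the example.

```python
import numpy as np


def fwhm_diameter(box, pixel_size_um=1.0):
    """Estimate vessel diameter from a rectangular image box.

    box: 2-D array whose rows run along the vessel's long axis, so that
    averaging over rows yields one intensity profile across the vessel.
    """
    profile = np.asarray(box, dtype=float).mean(axis=0)
    background = profile.min()
    half_max = background + 0.5 * (profile.max() - background)
    above = np.flatnonzero(profile >= half_max)
    left, right = above[0], above[-1]

    def crossing(i_out, i_in):
        # Linear interpolation between a sample below half-max (i_out)
        # and the adjacent sample at or above half-max (i_in).
        y_out, y_in = profile[i_out], profile[i_in]
        return i_out + (half_max - y_out) / (y_in - y_out) * (i_in - i_out)

    x_left = crossing(left - 1, left) if left > 0 else float(left)
    x_right = crossing(right + 1, right) if right < profile.size - 1 else float(right)
    return (x_right - x_left) * pixel_size_um


# Synthetic box: a bright 3-pixel-wide "vessel" on a dark background.
synthetic_box = np.tile([0, 0, 1, 1, 1, 0, 0], (4, 1))
diameter_px = fwhm_diameter(synthetic_box)
```

      Interpolating the crossings gives sub-pixel width estimates, which matters when diameter changes are a small fraction of the resting diameter.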

      (7) Did the authors try stimulating other body parts (eg. limb) to estimate how specific the effects were, regionally? This is more of a curiosity question that the authors could comment on, I am not recommending new experiments.

      We did measure changes in [HbT] in the FL/HL representation of SI during locomotion (Line 205), which is known to increase neural activity in the somatosensory cortex (Huo, Smith and Drew, Journal of Neuroscience, 2014; Zhang et al., Nature Communications 2019). We observed a similar but not statistically significant trend of decreased [HbT] in SP-SAP compared to control. This may have been due to the sphere of influence of the ablation being centered on the vibrissae representation and not having fully encompassed the limb representation. We agree with the referee that it would be interesting to characterize these effects on other sensory regions as well as brain regions associated with tasks such as learning and behavior.

      (8) Regarding vasomotion experiments, are there no other components of this waveform that could be quantified beyond just variance? Amplitude, frequency? Maybe these don't add much but would be nice to see actual traces of the diameter fluctuations. Further, where exactly were widefield-based measures of vasomotion derived from? From some seed pixel or ~1mm ROI in the center of the whisker barrel cortex? Please clarify.

      The reviewer’s point is well taken. We have added power spectra of the resting-state data, which provide amplitude and frequency information. The integrated area under the curve of the power spectrum is equal to the variance. Widefield-based measures of vasomotion were taken from the 1 mm ROI in the center of the whisker barrel cortex.

      We have added the following text to the Materials & Methods (Line 560): “Variance during the resting-state for both ∆[HbT] and diameter signals (Fig. 7) was taken from resting-state events lasting ≥10 seconds in duration. Average ∆[HbT] from within the 1 mm ROI over the vibrissae representation of SI during each arousal state was taken with respect to awake resting baseline events ≥10 seconds in duration.” 
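      The relationship between the power spectrum and the variance invoked in this response follows from Parseval's theorem, and can be checked numerically with the short sketch below. It uses a plain one-sided periodogram on a synthetic signal; the sampling rate, duration, oscillation frequency, and noise level are arbitrary assumptions, and the spectral estimator is not necessarily the one used in the manuscript.

```python
import numpy as np


def one_sided_psd(x, fs):
    """One-sided periodogram (power per Hz) of a mean-subtracted signal."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    spectrum = np.fft.rfft(x)
    psd = np.abs(spectrum) ** 2 / (fs * n)
    # Double every bin that has a mirrored negative-frequency partner
    # (all bins except DC and, for even n, the Nyquist bin).
    if n % 2 == 0:
        psd[1:-1] *= 2
    else:
        psd[1:] *= 2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd


# Synthetic "vasomotion-like" trace: a 0.1 Hz oscillation plus noise.
rng = np.random.default_rng(0)
fs = 30.0  # assumed sampling rate in Hz
t = np.arange(0, 60, 1 / fs)
signal = np.sin(2 * np.pi * 0.1 * t) + 0.2 * rng.standard_normal(t.size)

freqs, psd = one_sided_psd(signal, fs)
area_under_psd = psd.sum() * (freqs[1] - freqs[0])
variance = np.var(signal)  # population variance (ddof=0)
```

      With this normalization the two quantities agree to floating-point precision, so a reduction in variance between groups corresponds to a reduction in total spectral power, while the spectrum additionally shows at which frequencies the power is lost.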

      (9) On page 13, the title seems a bit strong. The data show a change in variance but that does not necessarily mean a change in absolute amplitude. Also, I did not see any reports of absolute vessel widths between groups from 2P experiments so any difference in the sampling of larger vs smaller arterioles could have affected the variance (ie. % changes could be much larger in smaller arterioles).

      We have updated the title of Figure 7 to specifically state power (which is equivalent to the variance) rather than amplitude (Line 331). We have also added absolute vessel widths to the Results (Line 340): “There was no difference in resting-state (baseline) diameter between the groups, with Blank-SAP having a diameter of 24.4 ± 7.5 μm and SP-SAP having a diameter of 23.0 ± 9.4 μm (t-test, p = 0.61).”

      (10) Big picture question. How could a manipulation that affects so few cells in 1 hemisphere (below 0.5% of total neurons in a region comprising 1-2% of the volume of one hemisphere) have such profound effects in both hemispheres? The authors suggest that some may have long-range interhemispheric projections, but that is presumably a fraction of the already small fraction of Nos1 neurons. Perhaps these neurons have specialized projections to subcortical brain nuclei (Nucleus Basalis, Raphe, Locus Coeruleus, reticular thalamus, etc.) that then project widely to exert this outsized effect? Has there not been a detailed anatomical characterization of their efferent projections to cortical and sub-cortical areas? This point could be raised in the discussion.

      We apologize for the lack of clarity on this point. We would like to clarify that the only analysis showing a change in the unablated hemisphere is the coherence/correlation analysis between the two hemispheres. Other metrics (LFP power and CBV power spectra) do not change in the hemisphere contralateral to the injection site, as we show in data added in two supplementary figures (Fig. S4 and S7). The coherence/correlation is a measure of the correlated dynamics in the two hemispheres. For this metric to change, there only needs to be a change in the dynamics of one hemisphere relative to the other. If some aspects of the synchronization of neural and vascular dynamics across hemispheres are mediated by concurrent activation of type-I nNOS neurons in both hemispheres, ablating them in one hemisphere will decrease synchrony. It is possible that type-I nNOS neurons make some subcortical projections that were not reported in previous work (Tomioka 2005, Ruff 2024), but if these exist they are likely to be very small in number, as they were not noted.

      We have added the text in the Results (Line 228): “In contrast to the observed reductions in LFP in the ablated hemisphere, we noted no gross changes in the power spectra of neural LFP in the unablated hemisphere (Fig. S7) or power of the cerebral blood volume fluctuations in either hemisphere (Fig. S4).”

      (Line 335): “The variance in ∆[HbT] during rest, a measure of vasomotion amplitude, was significantly reduced following type-I nNOS ablation (Fig. 7a), dropping from 40.9 ± 3.4 μM<sup>2</sup> in the Blank-SAP group (N = 24, 12M/12F) to 23.3 ± 2.3 μM<sup>2</sup> in the SP-SAP group (N = 24, 11M/13F) (GLME p = 6.9×10<sup>-5</sup>) with no significant difference in the unablated hemisphere (Fig. S7).”

      Reviewer #3 (Recommendations for the authors):

      (1) The reporting would be greatly strengthened by following ARRIVE guidelines 2.0: https://arriveguidelines.org/: attrition rates and source of attrition, justification for the use of 119 (beyond just consistent with previous studies), etc.

      We performed a power analysis prior to our study, aiming to detect a physiologically relevant effect size of Cohen’s d = 1.3, or 1.3 standard deviations from the mean. Alpha and power were set to the standard 0.05 and 0.80, respectively, requiring around 8 mice per group (SP-SAP, Blank, and, for histology, naïve animals) for multiple independent groups (ephys, GCaMP, histology). To account for any attrition due to failures in type-I nNOS neuron ablation or other problems (such as electrode failure or window issues), we conservatively targeted a dozen mice for each group. Of the mice that were imaged (1P/2P), two SP-SAP mice were removed from the dataset (24 SP-SAP remaining) after histological analysis because they did not show ablation of nNOS neurons, an attrition rate of approximately 8%.

      We have added the following text to the Materials & Methods (Line 441): “Sample sizes are consistent with previous studies (Echagarruga et al 2020, Turner et al 2023, Turner et al 2020, Zhang et al 2021) and based on a power analysis requiring 8-10 mice per group (Cohen’s d = 1.3, α = 0.05, (1 - β) = 0.800). Experimenters were not blind to experimental conditions or data analysis except for histological experiments. Two SP-SAP mice were removed from the imaging datasets (24 SP-SAP remaining) due to not showing ablation of nNOS neurons during post-histological analysis, an attrition rate of approximately 8%.”
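      The sample-size reasoning above can be sketched with the standard normal-approximation formula for a two-sided, two-sample comparison, n per group ≈ 2((z<sub>1-α/2</sub> + z<sub>1-β</sub>)/d)<sup>2</sup>. The response does not state which test or software the power analysis used, so the formula choice and the function names below are assumptions; an exact t-based calculation typically returns a slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided two-sample test.

    Uses the normal approximation; this is an assumed method, not
    necessarily the one used in the study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def attrition_rate(removed, remaining):
    """Fraction of enrolled animals that were excluded."""
    return removed / (removed + remaining)


n = n_per_group(1.3)          # effect size quoted in the response
rate = attrition_rate(2, 24)  # 2 SP-SAP mice removed, 24 remaining
```

      With d = 1.3, α = 0.05 and power = 0.80, this approximation gives 10 mice per group, at the upper end of the quoted 8-10 range, and 2 exclusions out of 26 imaged SP-SAP mice corresponds to the reported attrition of roughly 8%.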

      (2) Intro, line 38: Description of the importance of neurovascular coupling needs improvement. Coordinated haemodynamic activity is vital for maintaining neuronal health and the energy levels needed.

      We have added a sentence to the introduction (Line 41): “Neurovascular coupling plays a critical role in supporting neuronal function, as tightly coordinated hemodynamic activity is essential for meeting energy metabolism and maintaining brain health (Iadecola et al 2023, Schaeffer & Iadecola 2021).”

      (3) Given the wide range of mice ages, how was the age accounted for/its effects examined?

      Previous work from our lab has shown that there is no change in hemodynamic responses in awake mice over a wide range of ages (2-18 months), so the age range we used (between 3 and 9 months of age) should not impact this.

      We have added the following text in the Results (Line 437): “Previous work from our lab has shown that the vasodilation elicited by whisker stimulation is the same in 2–4-month-old mice as in 18-month-old mice (Bennett, Zhang et al. 2024). As the age range used here is spanned by this time interval, we would not expect any age-related differences.”

      (4) How was the susceptibility of low-frequency neuronal coupling signals to noise managed? How were the low-frequency bands results validated?

      We are not sure what the referee is asking here. Our electrophysiology recordings were made differentially using stereotrodes with tips separated by ~100 µm, which provides excellent common-mode rejection of noise and a localized LFP signal. Previous publications from our lab (Winder et al., Nature Neuroscience 2017; Turner et al., eLife 2020) and others (Tu, Cramer, Zhang, eLife 2024) have repeatedly shown that there is a very weak correlation between the power in the low-frequency bands and hemodynamic signals, so our results are consistent with this previous work.

      (5) It would be helpful to demonstrate the selectivity of cell *death* (as opposed to survival) induced by SP-SAP injections via assessments using markers of cell death.

      We agree that this would be a helpful complement to our histological studies, which show loss of type-I nNOS neurons but no loss of other cells and minimal inflammation with SP-saporin injections. However, we did not perform histology looking at cell death, only at surviving cells, given that we see no obvious inflammation or cell loss, which would be triggered by nonspecific cell death. Previous work has established that saporin is cytotoxic and specific only to cells that internalize it. Internalization of saporin causes cell death via apoptosis (Bergamaschi, Perfetti et al. 1996), and the substance P receptor is internalized when the receptor is bound (Mantyh, Allen et al. 1995). Saporin treatment generates cellular debris that is phagocytosed by microglia, consistent with cell death (Seeger, Hartig et al. 1997). While it is possible that treatment with SP-saporin causes type-I nNOS neurons to stop expressing nitric oxide synthase (which would make them disappear from our IHC staining), we think that this is unlikely given that the literature shows internalized saporin is clearly cytotoxic.

      We have added the following text to the Results (Line 131): “It is unlikely that the disappearance of type-I nNOS neurons is because they stopped expressing nNOS, as internalized saporin is cytotoxic. Exposure to SP-conjugated saporin causes rapid internalization of the SP receptor-ligand complex (Mantyh, Allen et al. 1995), and internalized saporin causes cell death via apoptosis (Bergamaschi, Perfetti et al. 1996). In the brain, the resulting cellular debris from saporin administration is then cleared by microglia phagocytosis (Seeger, Hartig et al. 1997).”

      (6) Was the decrease in inter-hemispheric correlation associated with any changes to the corpus callosum?

      We noted no gross changes to the structure of the corpus callosum in any of our histological reconstructions following SP-SAP administration; however, we did not specifically test for this. Again, as we note in our reply to reviewer 2, the decrease in interhemispheric synchronization does not imply that there are changes in the corpus callosum and could be mediated by the changes in neural activity in the hemisphere in which the type-I nNOS neurons were ablated.

      (7) How were automated cell counts validated?

      Criteria used for automated cell counts were validated with comparisons of manual counting as described in previous literature. We have added additional text describing the process in the Materials & Methods (Line 510): “For total cell counts, a region of interest (ROI) was delineated, and cells were automatically quantified under matched criteria for size, circularity and intensity. Image threshold was adjusted until absolute value percentages were between 1-10% of the histogram density. The function Analyze Particles was then used to estimate the number of particles with a size of 100-99999 pixels^2 and a circularity between 0.3 and 1.0 (Dao, Suresh Nair et al. 2020, Smith, Anderson et al. 2020, Sicher, Starnes et al. 2023). Immunoreactivity was quantified as mean fluorescence intensity of the ROI (Pleil, Rinker et al. 2015).”
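      The size and circularity criteria quoted above can be illustrated with a short Python sketch. It assumes the usual definition of circularity, 4π·area/perimeter², and operates on hypothetical pre-measured particles; the `Particle` class, function names and example values are illustrative and do not reproduce the authors' image-analysis pipeline.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    area: float       # in pixels^2
    perimeter: float  # in pixels


def circularity(p):
    """4*pi*area / perimeter^2, clamped to 1.0 to guard against rounding."""
    return min(1.0, 4 * math.pi * p.area / p.perimeter ** 2)


def count_particles(particles, min_area=100, max_area=99999,
                    min_circ=0.3, max_circ=1.0):
    """Count particles that satisfy both the size and circularity criteria."""
    return sum(
        1
        for p in particles
        if min_area <= p.area <= max_area
        and min_circ <= circularity(p) <= max_circ
    )


# Hypothetical measurements (assumptions for illustration only).
particles = [
    Particle(area=math.pi * 10 ** 2, perimeter=2 * math.pi * 10),  # round, kept
    Particle(area=400.0, perimeter=80.0),    # square, circularity ~0.79, kept
    Particle(area=25.0, perimeter=20.0),     # too small, rejected
    Particle(area=200.0, perimeter=200.0),   # elongated, circularity ~0.06, rejected
]
kept = count_particles(particles)
```

      In this example two of the four hypothetical particles pass: the small object fails the size criterion and the elongated one fails the circularity criterion.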

      (8) Given the weighting of the vascular IOS readout to the superficial tissue, it is important to qualify the extent of the hemodynamic contrast, ie the limitations of this readout.

      We have added the following text to the Discussion (Line 385): “Intrinsic optical signal readout is primarily weighted toward superficial tissue given the absorption and scattering characteristics of the wavelengths used. While surface vessels are tightly coupled with neural activity, it is still a matter of debate whether surface or intracortical vessels are a more reliable indicator of ongoing activity (Goense et al 2012; Huber et al 2015; Poplawsky & Kim 2014).”

      (9) Partial decreases observed through type-I iNOS neuronal ablation suggest other factors also play a role in regulating neural and vascular dynamics: data presented thus do *not* "indicate disruption of these neurons in diseases ranging from neurodegeneration to sleep disturbances," as currently stated. Please revise.

      We agree with the reviewer. We have changed the abstract sentence to read (Line 30): “This demonstrates that a small population of nNOS-positive neurons are indispensable for regulating both neural and vascular dynamics in the whole brain, raising the possibility that loss of these neurons could contribute to the development of neurodegenerative diseases and sleep disturbances.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors conducted a spatial analysis of dysplastic colon tissue using the Slide-seq method. Their main objective is to build a detailed spatial atlas that identifies distinct cellular programs and microenvironments within dysplastic lesions. Next, they correlated this observation with clinical outcomes in human colorectal cancer.

      Strengths:

      The work is a good example of utilising spatial methods to study different tumour models. The authors identified a unique stem cell program to understand tumours gently and improve patient stratification strategies.

      Weaknesses:

      However, the study's predominantly descriptive nature is a significant limitation. Although the spatial maps and correlations between cell states are interesting observations, the lack of functional validation-primarily through experiments in mouse models-weakens the causal inferences regarding the roles these cellular programs play in tumour progression and therapy resistance.

      We thank the reviewer for this comment. Indeed, functional validation to pin down causal dependencies, together with a more thorough investigation of tumor progression and therapy resistance in both mouse models and human patients and/or patient-derived samples, would broaden the insights to be gained from this work. Unfortunately, this is beyond the scope of this study.

      The authors also missed an opportunity to link the mutational status of malignant cells with the cellular neighbourhoods.

      The data reported in this study only contains spatial data for one mouse model (AV). As spatial data for the other model (AKPV) is missing, it is not possible to link the mutational type of the model with the cellular neighborhoods. We did investigate whether there is extra somatic mutational heterogeneity in the AV data, both regarding single nucleotide variations (SNVs) and copy number variations (CNVs). But at the time when the mice were sacrificed (after 3 weeks) there was no significant mutational heterogeneity discoverable.

      Overall, the study contributes to profiling the dysplastic colon landscape. The methodologies and data will benefit the research community, but further functional validation is crucial to validate the biological and clinical implications of the described cellular interactions.

      Reviewer #2 (Public review):

      In their study, Avraham-Davidi et al. combined scRNA-seq and spatial mapping studies to profile two preclinical mouse models of colorectal cancer: Apcfl/fl VilincreERT2 (AV) and Apcfl/fl LSL-KrasG12D Trp53fl/fl Rosa26LSL-tdTomato/+ VillinCreERT2 (AKPV). In the first part of the manuscript, the authors describe the analysis of the normal colon and dysplastic lesions induced in these models following tamoxifen injection. They highlight broad variations in immune and stromal cell composition within dysplastic lesions, emphasizing the infiltration of monocytes and granulocytes, the accumulation of IL-17+gdT cells, and the presence of a distinct group of endothelial cells. A major focus of the study is the remodeling of the epithelial compartment, where the most significant changes are observed. Using non-negative matrix factorization, the authors identify molecular programs of epithelial cell functions, emphasizing stemness, Wnt signaling, angiogenesis, and inflammation as major features associated with dysplastic cells. They conclude that findings from scRNA-seq analyses in mouse models are transposable to human CRC. In the second part of the manuscript, the authors aim to provide the spatial context for their scRNA-seq findings using Slide-seq and TACCO. They demonstrate that dysplastic lesions are disorganized and contain tumor-specific regions, which contextualize the spatial proximity between specific cell states and gene programs. Finally, they claim that these spatial organizations are conserved in human tumors and associate region-based gene signatures with patient outcomes in public datasets. Overall, the data were collected and analyzed using solid and validated methodology to offer a useful resource to the community.

      Main comments:

      (1) Clarity

      The manuscript would benefit from a substantial reorganization to improve clarity and accessibility for a broad readership. The text could be shortened and the number of figure panels reduced to emphasize the novel contributions of this work while minimizing extensive discussions on general and expected findings, such as tissue disorganization in dysplastic lesions. Additionally, figure panels are not consistently introduced in the correct order, and some are not discussed at all (e.g., Figure S1D; Figure 3C is introduced before Figure 3A; several panels in Figure 4 are not discussed). The annotation of scRNA-seq cell states is insufficiently explained, with no corresponding information about associated genes provided in the figures or tables. Multiple annotations are used to describe cell groups (e.g., TKN01 = γδ T and CD8 T, TKN05 = γδT_IL17+), but these are not jointly accessible in the figures, making the manuscript challenging to follow. It is also not clear what is the respective value of the two mouse models and time points of tissue collection in the analysis.

      We thank the reviewer for this suggestion. We clarified and simplified the revised manuscript; however, we believe that the current discussions are an important part of the manuscript and would be useful to readers. We reordered panels in Figures S1 and 3 to align with their appearance in the manuscript. We kept the order of the other panels as it is to keep both the context and coherence of those figures intact. We changed the way we reference cell clusters in the manuscript to better align with the naming scheme introduced in Figure 1B. The respective value of the two mouse models as well as the time points of tissue collection are described in lines 108-120 of the manuscript.

      (2) Novelty

      While the study is of interest, it does not present major findings that significantly advance the field or motivate new directions and hypotheses. Many conclusions related to tissue composition and patient outcomes, such as the epithelial programs of Wnt signaling, angiogenesis, and stem cells, are well-established and not particularly novel. Greater exploration of the scRNA-seq data beyond cell type composition could enhance the novelty of the findings. For instance, several tumor microenvironment clusters uniquely detected in dysplastic lesions (e.g., Mono2, Mono3, Gran01, Gran02) are identified, but no further investigation is conducted to understand their biological programs, such as applying nNMF as was done for epithelial cells. Additional efforts to explore precise tissue localization and cellular interactions within tissue niches would provide deeper insights and go beyond the limited analyses currently displayed in the manuscript.

      We thank the reviewer for this comment. Our study aimed to spatially characterize the tumor microenvironment, with scRNA-seq analysis serving to support this spatial characterization.

      Due to technical limitations—such as the number of samples and the limited capture efficiency of Slide-seq—the resolution of immune cell identification in our spatial analysis is constrained. Additionally, while immune and stromal cells formed distinct clusters, epithelial cells exhibited a continuum that was better captured using nNMF.

      Lastly, our manuscript provides a general characterization of monocyte and granulocyte populations in scRNA-seq (line 144) and their spatial microenvironments (line 400). We believe that additional analyses of these populations would be beyond the scope of this study and could place an unnecessary burden on the reader. Instead, we suggest that such analyses be explored in future studies.

      We remark that we analyzed tissue localization for two entirely different spatial transcriptomics assays (Slide-seq and Cartana) at the resolution of cell types and programs, which was feasible within the constraints of the sparsity, gene panel and sample size in the experiments. A future potential path to further increase the resolution of investigation in this dataset is to include other datasets, e.g. by the emerging transformer-based spatial transcriptomics integration methods.

      We also remark that the manuscript already includes an investigation of cellular interactions within tissue niches based on COMMOT (Fig 4k, Fig S8i, Supp Item 4).

      (3) Validation

      Several statements made by the authors are insufficiently supported by the data presented in the manuscript and should be nuanced in the absence of proper validation. For example:

      (a) RNA velocity analyses: The conclusions drawn from these analyses are speculative and need further support.

      We thank the reviewer for this comment. We clarified that our conclusions from the RNA velocity analysis need further support by experimental validation (lines 223-225), which is outside the scope of the current study.

      (b) Annotations of epithelial clusters as dysplastic: These annotations could have been validated through morphological analyses and staining on FFPE slides.

      We thank the reviewer for this comment. While this could have been a possible approach, our study primarily relies on scRNA-seq, which does not preserve tissue morphology, and Slide-seq of fresh tissue, where such an analysis is particularly challenging.

      (c) Conservation of mouse epithelial programs in human tumors: The data in Figure S5B does not convincingly demonstrate the enrichment of stem cell program 16 in human samples. This should be more explicitly stated in the text, given the emphasis placed on this program by the authors.

      We thank the reviewer for pointing this out. We clarified the section about the stem cell program 16 and references to Figures S5A and S5B (lines 269-274): while we do see correlation in the definition of human programs with the mouse stem cell program (Figure S5A), we do not see a correlated expression of the stem cell program across human and mouse (Figure S5B).

      (d) Figure S6E: Cluster Epi06 is significantly overrepresented in spatial data compared to scRNA-seq, yet the authors claim that cell type composition is largely recapitulated without further discussion, which reduces confidence in other conclusions drawn.

      We thank the reviewer for this remark. Indeed, Epi06 was a cluster which drew our attention during early analyses for its mixed expression profiles with contributions of vastly different cell types. We concluded that this is best explained by doublets, but we cannot rule out (partial) non-doublet explanations (e.g. undifferentiated cells). As doublet detection with Scrublet did not flag those cells as doublets, we kept these cells in the workflow, but excluded them from further interpretation. While in the previous version of the manuscript we only briefly hinted at this in figure legend 2A ("Cluster Epi06: doublets (not called by Scrublet)"), we expanded on this in the methods section of the revised manuscript (lines 863-869). Given the doublet interpretation, the observation that this cluster is significantly overrepresented in the annotation of the spatial data is not surprising, as this annotation comes from the decomposition of compositional data which contains contributions of multiple cells per Slide-seq bead, which are structurally very similar to doublets. While Epi06 appears enriched in Figure S6E when comparing Slide-seq to scRNA-seq, there are multiple technical cross-platform differences, including different per-gene sensitivities or capture biases for certain cell types (e.g. stromal cells suffering more from dissociation in scRNA-seq compared to Slide-seq). We believe that comparisons between disease states within a single platform are more biologically meaningful, like the comparison between normal and premalignant tissue, which is presented in Figure S6G. To increase confidence in the analysis and to assess whether intra-platform biological conclusions are affected by the inclusion/exclusion of Epi06, we recreated Figure S6G for a Slide-seq cell type annotation without Epi06 in the reference (see Author response image 1). Even though Epi06 is missing in that annotation, the strong enrichments are consistently preserved between the two analysis variants, while, as expected, some less significant enrichments with larger FDR values are not preserved.

      Author response image 1.

      Significance (FDR, color bar, two-sided Welch’s t test on CLR-transformed compositions) of enrichment (red) or depletion (blue) of cell clusters (rows) in normal (N) or AV (AV) tissues based on Slide-seq (“spatial”) data or scRNA-seq (“sc”) including (A) or excluding (B) Epi06 in the reference for annotating the Slide-seq data (A is identical to Figure S6G in the manuscript).
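      The test named in this caption, Welch's t test on centered log-ratio (CLR) transformed compositions, can be sketched in Python as follows. The per-sample compositions are hypothetical values invented for illustration, the function names are assumptions, and the sketch stops at the t statistic and Welch-Satterthwaite degrees of freedom; the p-value and FDR correction used in the actual analysis are not reproduced here. The CLR transform requires strictly positive proportions, so zeros would need a pseudocount (an assumption not taken from the manuscript).

```python
import math
from statistics import mean, variance


def clr(composition):
    """Centered log-ratio transform of a strictly positive composition."""
    logs = [math.log(v) for v in composition]
    center = mean(logs)
    return [value - center for value in logs]


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error, group a
    vb = variance(b) / len(b)  # squared standard error, group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical cluster proportions per sample (rows sum to 1).
normal = [[0.60, 0.30, 0.10], [0.55, 0.35, 0.10], [0.65, 0.25, 0.10]]
av = [[0.30, 0.30, 0.40], [0.25, 0.35, 0.40], [0.35, 0.30, 0.35]]

cluster = 2  # index of the cluster being tested
normal_clr = [clr(sample)[cluster] for sample in normal]
av_clr = [clr(sample)[cluster] for sample in av]
t_stat, dof = welch_t(av_clr, normal_clr)  # positive t: enriched in AV
```

      A positive `t_stat` corresponds to enrichment of the cluster in AV relative to normal tissue and a negative one to depletion, matching the red/blue convention of the caption.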

      Furthermore, stronger validation of key dysplastic regions (regions 6, 8, and 11) in mouse and human tissues using antibody-based imaging with markers identified in the analyses would have considerably strengthened the study. Such validation would better contextualize the distribution, composition, and relative abundance of these regions within human tumors, increasing the significance of the findings and aiding the generation of new pathophysiological hypotheses.

      We agree with the reviewer’s assessment that validation by antibody-based imaging (or other spatial proteomics data) would have been a useful follow-up experiment, yet this is beyond the scope of the current study.

      Reviewer #1 (Recommendations for the authors):

      AV and AKPV have different oncogenic mutations, and their impact on spatial neighbourhoods is unclear. Can authors perform an analysis to understand the contribution of oncogenic mutations on the spatial landscape of CRC?

      The data reported in this study only contains spatial data for one mouse model (AV). As spatial data for the other model (AKPV) is missing, it is not possible to comparatively link the mutational type of the model with the spatial landscape.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review)

      (1) The authors postulate a synergistic role for Itgb1 and Itgb3 in the intravasation phenotype, because the single KOs did not replicate the phenotype of the DKO. However, this is not a correct interpretation in the opinion of this reviewer. The roles appear rather to be redundant. Synergistic roles would rather demonstrate a modest effect in the single KO with potentiation in the DKO.

      We agree that the interaction between Itgb1 and Itgb3 appears redundant and we have corrected this point in the revised manuscript (page 10).

      (2) The experiment does not explain how these integrins influence the interaction of the MK with their microenvironment. It is not surprising that attachment will be impacted by the presence or absence of integrins. However, it is unclear how activation of integrins allows the MK to become "architects for their ECM microenvironment" as the authors posit. A transcriptomic analysis of control and DKO MKs may help elucidate these effects.

      We do not yet understand how the activation of α5β1 or αvβ3 integrins affects ECM remodeling by megakaryocytes. Integrins are key regulators of ECM remodeling (see https://doi.org/10.1016/j.ceb.2006.08.009) and can transmit traction forces that induce these changes (see https://doi.org/10.1016/j.bpj.2008.10.009). Our previous study also found reduced RhoA activation in double knockout (DKO) megakaryocytes (MKs) (Guinard et al., 2023, PMID: 37171626), which likely affects ECM organization. These findings are discussed in the Discussion section of the paper (page 14).

      As suggested, conducting a transcriptomic analysis of control and DKO MKs may help to elucidate these effects. However, isolating rare native MKs from DKO mice is technically challenging and requires too many animals. To overcome this issue, we instead isolated mouse platelets and used targeted RT-PCR arrays to profile key ECM remodelling genes (ECM proteins, proteases…) and adhesion molecules (Zifkos et al., Circ. Res. 2024, PMID: 38563147). Quality controls confirmed that integrin RNA was undetectable in the DKO samples, ruling out contamination. Nevertheless, we found no significant expression differences exceeding the 3-fold change threshold between the control and DKO groups. The high Ct (threshold cycle) values indicate low transcript abundance, which may mask subtle changes (see the scatter plot below). As an example, we present a typical result obtained for the reviewer.

      Author response image 1.

      Relative expression comparison of ECM related-genes between control and DKO integrins in washed platelets. The figure shows a log transformation plot of the relative expression level of each gene between normal (x-axis) and DKO integrins (y-axis). The lines indicate the threefold change threshold for gene expression. These are representative results from two independent experiments.

      (3) Integrin DKO mice have a 50% reduction in platelet counts as reported previously; however, laminin α4 deficiency only leads to a 20% reduction in counts. This suggests a more nuanced and subtle role of the ECM in platelet growth. To this end, functional assays of the platelets in the KO and wildtype mice may provide more information.

      The exact contribution of the extracellular matrix (ECM) cage to platelet growth remains incompletely understood. In the Lamα4⁻/⁻ model, a collagen-rich ECM cage persists alongside normal fibronectin deposition. By contrast, the integrin DKO model exhibits a markedly severe phenotype characterized by the loss of both the laminin cage and collagen and the absence of fibrillar fibronectin. Also, the preserved collagen and fibronectin in Lamα4⁻/⁻ mice may permit residual activation of signaling pathways (potentially via integrins or alternative mechanisms) compared to the DKO model. We appreciate the reviewer’s feedback on this adjustment, which has been incorporated into the discussion (page 15).

      As suggested by the reviewer, we performed functional assays that demonstrated normal platelet function in Lamα4⁻/⁻ mice and impaired integrin-mediated aggregation in Itgb1<sup>-/-</sup>/Itgb3<sup>-/-</sup>  mice, as shown by the new data presented in the publication (see pages 7 and 9). Platelet function remained preserved following treatment with MMP inhibitors. This supports the idea that differences in ECM composition can influence the signaling environment and megakaryocyte maturation, but do not fully abrogate platelet function (page 15).

      (4) There is insufficient information in the Methods Section to understand the BM isolation approach. Did the authors flush the bone marrow and then image residual bone, or the extruded bone marrow itself as described in PMID: 29104956?

      Additional methodological information has been provided to clarify that only the extruded bone marrow, and not the bone itself, is isolated (page 17).

      (5) The references in the Methods section were very frustrating. The authors reference Eckly et al 2020 (PMID : 32702204) which provides no more detail but references a previous publication (PMID: 24152908), which also offers no information and references a further paper (PMID: 22008103), which, as far as this reviewer can tell, did not describe the methodology of in situ bone marrow imaging.

      To address this confusion, we have added the reference "In Situ Exploration of the Major Steps of Megakaryopoiesis Using Transmission Electron Microscopy" by C. Scandola et al. (PMID: 34570102) in the “Isolation and preservation of murine bone marrow” section (page 20), which provides a standardized protocol for bone marrow isolation and in situ bone marrow imaging.

      Therefore, this reviewer cannot tell how the preparation was performed and, importantly, how can we be sure that the microarchitecture of the tissue did not get distorted in the process?

      Thank you for pointing this out. While we cannot completely rule out the possibility of distortion, we have clarified the precautions taken to minimize it. We used a double fixation procedure immediately after bone marrow extrusion, followed by embedding it in agarose to preserve its integrity as much as possible. We have elaborated on this point in greater detail in the Methods section of the revised version (page 18).

      Reviewer #2 (Public review):

      (1) ECM cage imaging

      (a) The value or additional information provided by the staining on nano-sections (A) is not clear, especially considering that the thick vibratome sections already display the entirety of the laminin γ1 cage structure effectively. Further clarification on the unique insights gained from each approach would help justify its inclusion.

      Ultrathin cryosectioning enables high-resolution imaging with a threefold increase in Z-resolution, facilitating precise analysis of signal superposition. This approach was particularly valuable for clearly visualizing activated integrin in contact with laminin and collagen IV fibers (see Fig. 3 in revised manuscript, pages 6, 8 and 18). Additionally, 3D reconstructions and z-stack data reveal complex interactions between the basement membrane and the cellular ECM cage that are not evident in 2D projections (see page 6). These complementary methods help elucidate the detailed molecular and three-dimensional organization of the ECM cage surrounding megakaryocytes. These points have been clarified in the method and result sections.

      (b) The sMK shown in Supplementary Figure 1C appears to be linked to two sinusoids, releasing proplatelets to the more distant vessels. Is this observation representative, and if so, can further discussion be provided?

      This observation is not representative; MKs can also be associated with just one sinusoid.

      (c) Freshly isolated BM-derived MKs are reported to maintain their laminin γ1 cage. Are the proportions of MKs with/without cages consistent with those observed in microscopy?   

      After mechanical dissociation and size exclusion, just over half of the MKs retained their cages (53.4% ± 5.6%, based on 329 MKs from three experiments; see page 7 of the manuscript for new data). This highlights the strong physical connection between MKs and their cages.

      (2) ECM cage formation

      (a) The statement "the full assembly of the 3D ECM cage required megakaryocyte interaction with the sinusoidal basement membrane" on page 7 is too strong given the data presented at this stage of the study. Supplemental Figure 1C shows that approximately 10% of pMKs form cages without direct vessel contact, indicating that other factors may also play a role in cage formation.

      The reviewer is correct. We have adjusted the text to reflect a more cautious interpretation of our results: « Although we cannot exclude that the ECM cage can form on its own, our data suggest that ECM cage assembly may require interactions between megakaryocytes and the sinusoidal basement membrane » (page 7).

      (b) The data supporting the statement that "pMK represent a small fraction of the total MK population" (cell number or density) could be shown to help contextualize the 10% of them with a cage.

      Following the reviewer's recommendation, a new bar graph has been added to illustrate the 18 ± 1.3 % of MK in the parenchyma relative to the total MK in the bone marrow (page 7 and Suppl. Figure 1H).

      (c) How "the full assembly of the 3D ECM cage" is defined at this stage of the study should be clarified, specifically regarding the ECM components and structural features that characterize its completion.

      We recognize that the term "full assembly" of the 3D ECM cage can be misleading, as it might suggest different stages of cage formation, such as a completed cage, one in the formation process, or an incomplete cage. Since we have not yet studied this concept, we have eliminated the term "full assembly" from the manuscript to avoid confusion. Instead, we mention the presence of a cage.

      (3) Data on MK Circulation and Cage Integrity: Does the cage require full component integrity to prevent MK release in circulation? Are circulating MKs found in Lama4-/- mice? Is the intravasation affected in these mice? Are the ~50% sinusoid associated MK functional?  

      In Lamα4-deficient (Lamα4-/-) mice, which possess an intact collagen IV cage but a structurally compromised laminin cage, electron microscopy and whole-mount imaging revealed an absence of intact megakaryocytes within the sinusoidal lumen. This observation indicates that the structural integrity of all components of the ECM cage is critical for preventing megakaryocyte entry into the circulation. Despite the laminin deficiency, mature Lamα4-/- megakaryocytes exhibited normal ultrastructure and maintained typical intravasation behavior. Furthermore, analysis of bone marrow explants from Lamα4-/- mice demonstrated that megakaryocytes retained their capacity to extend proplatelets. These findings are presented on page 7 and further discussed on page 14.

      (4) Methodology

      (a) Details on fixation time are not provided, which is critical as it can impact antibody binding and staining. Including this information would improve reproducibility and feasibility for other researchers.

      We have included this information in the methods section.

      (b) The description of 'random length measuring' is unclear, and the rationale behind choosing random quantification should be explained. Additionally, in the shown image, it appears that only the branching ends were measured, which makes it difficult to discern the randomness in the measurements.

      The random length measurement method uses random sampling to provide unbiased data on laminin/collagen fibers in a 3D cage. Contrary to what the initial image might have suggested, measurements go beyond just the branching ends; they include intervals between various branching points throughout the cage. This is now explained on page 19.

      To clarify this process, we will outline these steps on page 19 as: 1) acquire 3D images, 2) project onto 2D planar sections, 3) select random intersection points for measurement, 4) measure intervals using ImageJ software, and 5) repeat the process for a representative dataset. This will better illustrate the randomness of our measurements.
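
      The sampling logic in steps 3 and 4 can be illustrated with a minimal sketch. It is not the authors' ImageJ workflow: the function name `sample_fiber_lengths`, the segment coordinates, and the pixel size are all hypothetical, and the sketch assumes that the intervals between branching points have already been traced on the 2D projection as pairs of pixel coordinates.

```python
import math
import random


def sample_fiber_lengths(segments, n_samples, pixel_size_um, seed=None):
    """Randomly pick traced fiber intervals and return their lengths in micrometres.

    segments: list of ((x1, y1), (x2, y2)) pixel coordinates, one pair per
              interval between two branching points (hypothetical input format).
    """
    rng = random.Random(seed)
    n = min(n_samples, len(segments))
    # Sampling without replacement, so no interval is measured twice.
    chosen = rng.sample(segments, n)
    return [math.dist(a, b) * pixel_size_um for a, b in chosen]


# Hypothetical traced intervals (pixels) and pixel size; not data from the study.
segments = [((0, 0), (3, 4)), ((3, 4), (3, 10)), ((3, 10), (11, 16)), ((0, 0), (6, 8))]
lengths = sample_fiber_lengths(segments, n_samples=3, pixel_size_um=0.5, seed=1)
mean_length = sum(lengths) / len(lengths)
```

      Drawing the intervals with a random number generator, rather than picking them by eye, is what keeps the selection independent of where the fibers look most prominent.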

      (5) Figures

      (a) Overall, the figures and their corresponding legends would benefit from greater clarity if some panels were split, such as separating images from graph quantifications.

      Following the reviewer’s suggestion, we will fully update all the Figures and separate images from graph quantifications.

      Reviewer #3 (Public review):

      (1) The data linking ECM cage formation to MK maturation raises several interesting questions. As the authors mention, MKs have been suggested to mature rapidly at the sinusoids, and both integrin KO and laminin KO MKs appear mislocalized away from the sinusoids. Additionally, average MK distances from the sinusoid may also help separate whether the maturation defects could be in part due to impaired migration towards CXCL12 at the sinusoid. Presumably, MKs could appear mislocalized away from the sinusoid given the data presented suggesting they are leaving the BM and entering circulation. Additional data or commentary on intrinsic (ex-vivo) MK maturation phenotypes may help strengthen the authors' conclusions and shed light on whether an essential function of the ECM cage is integrin activation at the sinusoid.

      The idea that megakaryocytes move toward CXCL12 is still debated. Some studies suggest mature MKs are mainly sessile (PMID: 28743899), while others propose that CXCL12 may guide MK progenitors rather than mature MKs (PMID: 38987596, this reference has been added). To address the reviewer’s concerns regarding CXCL12-mediated migration, we conducted additional investigations.

      For DKO integrins, Guinard et al. (2023, PMID: 37171626) reported no significant change in the distance between MKs and sinusoids, indicating that integrin deficiency does not impair MK migration toward sinusoidal vessels.

      In our own study involving Lamα4-/- mice, we utilized whole-mount bone marrow preparations, labeling MKs with GPIbβ antibodies and sinusoids with FABP4 antibodies. We observed a 1.6-fold increase in the proximity of MKs to sinusoids in Lamα4-/- mice compared to controls (see figure below). However, the absolute distances measured were less than 3 µm in both groups, much smaller than the average diameter of a mature MK (20 - 25 µm), raising questions about the biological significance of these findings in active MK migration. What happens with MK progenitors - a population not detectable in our experiments using morphological criteria or GPIb staining - remains an open question.

      These results are provided for the reviewer’s information and will be available to eLife readers, along with the authors’ responses, in the revised manuscript.

      Author response image 2.

      (2) The data demonstrating intact MKs in the circulation is intriguing - can the authors comment or provide evidence as to whether MKs are detectable in blood? A quantitative metric may strengthen these observations.

      To investigate this, we conducted flow cytometry experiments and prepared blood smears to determine the presence of intact Itgb1-/-/Itgb3-/- megakaryocytes in the blood. Unfortunately, we could not detect any intact megakaryocytes in the blood samples using FACS (see new Supplementary Figure 4E) nor any on the blood smears (data not shown). However, we observed that large, denuded megakaryocyte nuclei were retained in the downstream pulmonary capillaries of these mice. Intravital imaging of the lung has previously provided direct evidence for the phenomenon of microvascular trapping (Lefrançois et al., 2017; PMID: 28329764), demonstrating that megakaryocytes can be physically entrapped within the pulmonary circulation due to size exclusion while releasing platelets. This has been clarified in the revised paper (Results section, page 10).

      (3) Supplementary Figure 6 - shows no effect on in vitro MK maturation and proplt, or MK area - But Figures 6B/6C demonstrate an increase in total MK number in MMP-inhibitor treated mice compared to control. Some additional clarification in the text may substantiate the author's conclusions as to either the source of the MMPs or the in vitro environment not fully reflecting the complex and dynamic niche of the BM ECM in vivo.

      This is a valid point. We have revised the text to be more cautious and to provide further clarification on these points (page 12).

      (4) Similarly, one function of the ECM discussed relates to MK maturation but in the B1/3 integrin KO mice, the presence of the ECM cage is reduced but there appears to be no significant impact upon maturation (Supplementary Figure 4). By contrast, MMP inhibition in vivo (but not in vitro) reduces MK maturation. These data could be better clarified in the text, or by the addition of experiments addressing whether the composition and quantity of ECM cage components directly inhibit maturation versus whether effects of MMP-inhibitors perhaps lead to over-activation of the integrins (as with the B4galt KO in the discussion) are responsible for the differences in maturation.

      We thank the reviewer for pointing this out.

      In our study of DKO integrin mice with a reduced extracellular matrix (ECM) cage, we observed normal proportions of MK maturation stages. However, these mutant MKs had a disorganized membrane system and smaller cytoplasmic areas compared to wild-type cells, indicating issues in their maturation. This is detailed further in the manuscript (see page 9).

      In the context of MMP inhibition in vivo, which also leads to reduced MK maturation, our immunofluorescence analysis revealed an increased presence of activated β1 integrin in bone marrow sections (see Supplementary Figure 6E). As suggested by the reviewer, this increase may explain the maturation defect.

      In summary, while it's challenging to definitively determine how ECM cage composition and quantity affect MK maturation in vivo, our results show that changes to the ECM cage - whether through genetic modification (DKO) or MMP inhibition - are consistently linked to defects in MK maturation.

      Reviewer #1 (Recommendations for the authors):

      (1) Movies 1-3 are referenced in the Results section, but this reviewer was not able to find a movie file.

      They have now been added to the downloaded revised manuscript.

      (2) Figure 2D is referenced in the Results Section but this panel is not present in the Figure itself. Instead, this seems to be what is referred to as the right panel of 2C. 

      Thank you. Following the suggestion of reviewer 2, we have now split the panels and separated the images from the graph quantifications. This change has modified all the panel annotations, which we have carefully checked both in the legend and in the manuscript.

      (3) Supplemental Fig 3C has Fibrinogen quantification which seems to belong in Supplemental 3 F instead.  

      Supplementary Figure 3C serves as a control for immunofluorescence, indicating that no fibrinogen-positive granules are detectable in the DKO mice. This supports the conclusion that the αIIbβ3 integrin-mediated fibrinogen internalization pathway is non-functional in this model, affirming the bar graph's placement. We appreciate the reviewer’s insight that similar results may arise from the IEM experiments in Figure 3H, which is valuable for strengthening our findings.

      (4) The x-axis labels in Supplemental 5B are not uniform.  

      This has been done. Thank you.

      Reviewer #2 (Recommendations for the authors):

      (1) Figure 1 Panel C: The sinusoidal basement membrane staining is missing, making it difficult to conclude that the collagen IV organization extends radially from the sinusoidal basement membrane.

      As recommended by the reviewer, we have updated Figure 1C with a new image illustrating the basement membrane (FABP4 staining) and the collagen IV cage. This new image confirms that the cage extends radially from the basement membrane.

      (2) Arrows in 1B: Based on the arrow's localisation, the description of "basement membrane-cage connection" is not evident from the images as it looks like the signal colocalization (right lower panel) occurs below the highlighted areas. Clarification or additional evidence of co-localization is required. 

      The apparent localization of the signal "below" the highlighted areas in the maximal projection image is due to the nature of 2D projections, which compress overlapping signals from multiple depths within the bone marrow into a single plane. This can obscure the spatial relationship between the basement membrane and extracellular matrix (ECM) components. However, when the complete z-stack series is examined, the direct connection between the basement membrane and the ECM cage becomes evident in three dimensions. Therefore, we have now added a comprehensive analysis of the entire z-stack dataset, allowing us to accurately interpret the spatial relationships between the basement membrane and ECM in the native bone marrow microenvironments (movies 1 and 2, and Suppl. Figure 1D-E).

      (3) In Figure 4C, GPIX is used to identify MKs by IVM while GP1bβ is used throughout the rest of the manuscript. It would be helpful for readers who are less familiar with MKs to understand whether GPIX and GP1bβ identify the same population of MKs and the rationale for choosing one marker over the other.  

      GPIX and GPIbβ are components of the GPIb-IX complex, identifying mature megakaryocytes (Lepage et al., 2000, PMID : 11110688). The choice of one over the other in different experiments is primarily based on technical considerations. The intravital experiments have been standardized using an AF488-conjugated anti-GPIX to identify mature megakaryocytes consistently. GPIbβ (GP1bβ) is used in the rest of the manuscript due to its strong and specific bright staining. We have clarified this point in the Result (page 10) and in the Material/methods section (page 17).

      (4) The term "total number of MKs" is used (p8), but the associated data presented in the figure reflect MK density per surface area. Descriptions in the text should align with the data format in the figures.

      This has been corrected in the revised manuscript (page 8). Thank you.

      (5) Supplemental Figure 1(B): Collagen I is written as Collagen III in the legend.

      This has been corrected in the legend of the Figure 1B.

      (6) Figure 2D is described in the text but is missing from the figure.

      This has been corrected.

      (7) Supplemental Figure 3: Plot E overlaps with the images, making it unclear.

      To minimise overlap with the images, we've moved the graph with the bars down. Thank you.

      (8) Supplemental Figure 7: The image quality is too low, and spelling underlining issues are present. A better-quality version with clear labelling is essential.

      We have improved the quality of Figure 7 and fixed the underlining problems.

      (9) The movies were not found in the downloads provided.

      They have now been added to the downloaded revised manuscript.

      (10) Some bar graphs are missing the individual data points.

      All figures have been standardized and now include the individual data points.

      Reviewer #3 (Recommendations for the authors):

      Some minor comments:

      (1) If there is specific importance to some of the analyses of the cage structure, such as fiber length, and pore size, (eg. if they may have biological significance to the MK) it may help readers to give additional context to what differences in the pore size might imply. For example, do pores constrain MKs at sites where actin-driven proplatelet formation could be initiated?

      The effects of extracellular matrix (ECM) features - like fiber length and pore size - on megakaryocyte (MK) biology are not fully understood. Longer ECM fibers may help MKs adhere better and sense their environment. Larger pores could make it easier for MKs to grow, communicate, and extend proplatelets through blood vessel walls. The role of matrix metalloproteinases (MMPs), which degrade the ECM, adds to the complexity, and how this occurs in vivo is not yet well understood.

      As suggested, some of these points have been addressed in the revised manuscript (Discussion, page 16).

      (2) "Although fibronectin and fibrinogen were readily detected around megakaryocytes, a reticular network around megakaryocytes was not observed. Furthermore, no connection was identified between fibronectin and fibrinogen deposition with the sinusoid basement membrane, in contrast to the findings for laminin and collagen IV (Supp. Figures 1E)." - Clarification of how these data are interpreted might be helpful as to what the authors are intending to demonstrate with these data as at least in Figure 1E, fibronectin, and fibrinogen do appear expressed along the MK surface and at the sinusoidal-MK interface.

      While fibronectin and fibrinogen are present around megakaryocytes and at the vessel-cell interface, they do not form a reticular ECM cage. The functional implications of this finding remain unclear. One can imagine that the specific spatial arrangement of various ECM components may lead to different functional roles. Laminin and collagen IV may provide structural support by forming a 3D cage that is essential for the proper positioning and maturation of megakaryocytes. In contrast, fibronectin and fibrinogen may have different functions, potentially related to megakaryocyte expansion in bone marrow fibrosis (Malara et al., 2019, PMID : 30733282) and (Matsuura et al., 2020, PMID : 32294178).  

      This topic has been addressed in the Results (page 7) and the Discussion (page 13).

      (3) Given the effects of dual B1/B3 integrin inhibition on MK intravasation, can the authors comment on the use of integrin RGD-based inhibitors? Are these compounds and drugs likely to interfere with MK retention?

      Our study shows that MK retention depends on the integrity of both components of the cage, collagen IV and laminin (see also point 3 of reviewer 2). Collagen IV contains RGD sequences, making it susceptible to RGD-based inhibition, whereas laminin does not utilize the RGD motif, raising questions about the overall efficacy of these inhibitors.

      In addition, the in vivo efficacy and potential off-target effects of these inhibitors in the complex bone marrow microenvironment remain to be fully elucidated. This intriguing issue warrants further investigation.

      (4) Beyond protein components, other non-protein ECM molecules including glycosaminoglycans (HA, HS) have essential roles in supporting MK function, including maturation (PMIDs: 31436532, 36066492, 27398974) and may merit some brief discussion if the authors feel this is helpful.

      We followed the reviewer’s suggestion and now mention the contribution of glycosaminoglycans to MK maturation. We also added the three references (page 13).

      (5) In several locations, the text refers to figure panels that are either not present or not annotated correctly (some examples include Figure 2D, Supplementary Figure 3E vs 3D).

      Following the suggestion of reviewer 2, we have now split the panels and separated the images from the graph quantifications. This change has modified all the panel annotations, which we have carefully checked both in the legend and in the manuscript.

      (6) In some cases, the figure legends seem to incorrectly refer to text, colors, or elements in the panels (e.g. Supplementary Figure 3, fibrinogen is referred to as yellow in the legend but is green in the figure). In Supplemental Figure 1, an image is annotated as pyrenocyte in the figure, but splenocyte in the text.

      This has been corrected in the figures and in the revised manuscript. Please also see point (7) below.  Thank you very much.

      (7) Images demonstrating GPIX and GPIBb positive cells in the calvarial and lung microcirculation are convincing, but in Figure C these cells are referred to as MKs, whereas in Figure D they are referred to as pyrenocytes (as well as in the discussion). It is not clear if this is intentional and refers to bare nuclei from erythrocytes or indeed refers to MKs or MK nuclei. Clarification would help guide readers.

      We agree with the reviewer and fully acknowledge the need for clarification. We confirm that these circulating cells are megakaryocytes. To avoid confusion, we have ensured that all references to "pyrenocytes" have been replaced with "megakaryocytes."

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This work starts with the observation that embryo polarization is asynchronous starting at the early 8-cell stage, with early polarizing cells being biased towards producing the trophectoderm (TE) lineage. The authors further found that reduced CARM1 activity and upregulation of its substrate BAF155 promote early polarization and TE specification. This piece of evidence connects to the previous finding that Carm1 heterogeneity at the 4-cell stage guides later cell lineages: the higher Carm1-expressing blastomeres are biased towards the ICM lineage. Thus, this work provides a link between asymmetries at the 4-cell stage and polarization at the 8-cell stage, providing a cohesive explanation regarding the first lineage allocation in mouse embryos.

      Strengths:

      In addition to what has been put in the summary, the advanced 3D image-based analysis has found that early polarization is associated with a change in cell geometry in blastomeres, regarding the ratio of the long axis to the short axis. This is considered a new observation that has not been identified.

      Weaknesses:

      Although the microinjection-based method for protein overexpression/deletion has been shown to be effective in early embryos and is widely used, it may not fully represent the in vivo situation in some cases, compared to other strategies such as the use of knock-in mice. This is a minor weakness; it would be good to include some sentences in the discussion on the potential caveats.

      We thank the reviewer for their insightful summary of our work, and their adjudication on the novelty of our research. We agree with the reviewer that microinjection-based methods, whilst being the standard and widely used in the field, have their weaknesses. In this study, we have primarily used microinjection of previously tested and known constructs which may help mitigate these concerns, and have referenced numerous studies in which these constructs have been used and tested. Nevertheless, the authors are aware of this drawback and have tried to address this previously in other research using novel artificial intelligence techniques (Shen and Lamba et al., 2022 – cited in the manuscript) and this continues to be an active area of investigation for us.

      Reviewer #2 (Public review):

      Summary:

      In this study, Lamba and colleagues suggest a molecular mechanism to explain cell heterogeneity in cell specification during pre-implantation development. They show that embryo polarization is asynchronous. They propose that reduced CARM1 activity and upregulation of its substrate BAF155 promote early polarization and trophectoderm specification.

      Strengths:

      The authors use appropriate and validated methodology to address their scientific questions. They also report excellent live imaging. Most of the data are accompanied by careful quantifications.

      Weaknesses:

      I think this manuscript requires some more quantification, increased number of embryos in their evaluations and clearly stating the number of embryos evaluated per experiments.

      We thank the reviewer for these thoughtful comments on our work, their kind assessment of the strength of our research, and their notes on the weaknesses. We have replied to their points raised below.

      Here are some points:

      (1) It should be clearly stated in all figure legends and in the text how many cells from how many embryos were analyzed.

      We appreciate this comment. We now provide detailed quantification for every experiment in the paper and state the number of embryos (for whole-embryo-level experiments) or blastomeres used for statistical tests and displayed in each graph.

      (2) I think that the number of embryos is sometimes too low. These are mouse embryos easily accessible and the methods used are well established in this lab, so the authors should make an effort to have at least 10/15 embryos per experiment. For example "In agreement with this, hybridization chain reaction (HCR) RNA fluorescence in situ hybridization of early 8-cell stage embryos revealed that the number of CDX2 mRNA puncta was higher in polarized blastomeres with a PARD6-positive apical domain than in unpolarized blastomeres, for 5 out of 6 embryos with EP cells (Figure 3A, B)", or the data for Figure 4, where we know how many cells but not how many embryos.

      We appreciate the reviewer’s comment regarding the number of embryos used in the hybridization chain reaction (HCR) experiment. We agree that increasing the number of embryos could, in principle, further add statistical power. However, both first authors have since left the lab to begin postdoctoral training or to join a company, and it is not feasible for us to generate additional embryos at this stage.

      Importantly, we believe the number of embryos included in the current manuscript is sufficient to support our conclusions, especially when considered in the context of the broader experimental design, the timing of the study, and our ethical commitment to minimizing animal use.

      Notably, the initial HCR experiment targeting Cdx2 mRNA served as a key indication that prompted further investigation of CDX2 at the protein level. These follow-up experiments were conducted with increased numbers of embryos and/or cells and are presented in Figure 3 and the associated supplementary figures (we now have 124 cells (including 23 EP cells) from 16 embryos), thereby strengthening and confirming the conclusion suggested by the HCR data.

      (3) It would be useful to see in Figure 4 an example of asymmetric cell division as done for symmetric cell division in panel 4B. This could really help the reader to understand how the authors assessed this.

      We used live imaging to track cell division patterns. Cells expressing RFP-tagged polarity proteins were observed during division to identify the resulting daughter cells. Immediately after cytokinesis, we assessed the polarity status of each daughter cell. If both daughter cells were polarized, the division was classified as symmetric; if only one was polarized, it was classified as asymmetric.

      Author response image 1.

      8-cell stage embryos expressing Ezrin-RFP (fire colour) were imaged during the 8-16 cell stage division. Top panel: arrows indicate a symmetric cell division in which the polarity domain was partitioned into both daughter cells. Bottom panel: an asymmetric division in which the polarity domain was inherited by only one of the two daughter cells.
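
      The scoring rule described in this response can be written as a small function. This is only a sketch of the stated rule: the function name and the boolean inputs are hypothetical, and the "unclassified" outcome for two unpolarized daughters is an assumption, since the response does not say how that case was scored.

```python
def classify_division(daughter_a_polarized, daughter_b_polarized):
    """Classify a division from the polarity status of its two daughter cells."""
    if daughter_a_polarized and daughter_b_polarized:
        # Polarity domain partitioned into both daughters.
        return "symmetric"
    if daughter_a_polarized != daughter_b_polarized:
        # Polarity domain inherited by exactly one daughter.
        return "asymmetric"
    # Assumption: neither daughter polarized is not covered by the stated rule.
    return "unclassified"


# Hypothetical observations scored immediately after cytokinesis.
observations = [(True, True), (True, False), (False, True)]
labels = [classify_division(a, b) for a, b in observations]
```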

      (4) Figure 5C there is a big disproportion of the number of EP and LP identified. Could the authors increase the number of embryos quantified and see if they can increase EP numbers?

      We thank the reviewer for this comment and want to clarify an important detail: EP cells are a phenomenon with an average cellular frequency of less than 10% as compared to LP cells (the other 90%). Therefore, when investigating natural embryo development without bias or exclusion, there will likely be an imbalance in the number of EP and LP cells, as is the case for Figure 5C. In this case, morphological differences and clear statistical significance were seen between the shape of EP and LP cells within the cells quantified, and we therefore decided not to expend further mice for this particular experiment. We agree, however, that in most cases additional embryos would help strengthen our conclusions further.

      (5) Could the authors give more details about how they mount the embryos for live imaging? With agarose or another technique? In which dishes? Overlaid with how much medium and oil? This could help other labs that want to replicate the live imaging in their labs. Also, was it a z-stack analysis? If yes, how many um per stack? Ideally, if they also know the laser power used (at least a range) it would be extremely useful.

      We thank the reviewer for this comment and have provided additional detail here and in the Methods section. For live imaging of our embryos, we used glass-bottom 35 mm dishes. We fixed a small cut square of nylon mesh (5 mm to 1 cm in width and height) onto the centre of the dish using silicon; the mesh served as a grid (diameter of approximately 150 micrometres) for deposition of embryos. After the silicon had dried (overnight) and the dish had been washed with water, the grid was overlaid with a 100-microlitre drop of KSOM and then covered with mineral oil until the KSOM drop was submerged. After incubation under conditions for live imaging, single embryos were deposited in each ‘well’ of the grid before the dish was placed in the microscope, which was equilibrated at the correct temperature and CO2.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):

      Weaknesses: 

      Overall I find the data presented compelling, but I feel that the number of observations is quite low (typically n=3-7 neurons, typically one per animal). While I understand that only a few slices can be obtained for the IPN from each animal, the strength of the novel findings would be more convincing with more frequent observations (larger n, more than one per animal). The findings here suggest that the authors have identified a novel mechanism for the normal function of neurotransmission in the IPN, so it would be expected to be observable in almost any animal. Thus,  it is not clear to me why the authors investigated so few neurons per slice and chose to combine different treatments into one group (e.g. Figure 2f), even if the treatments have the same expected effect.  

      This is a well-taken suggestion. However, we must point out that we do perform statistical analyses on the original datasets, and we believe that our conclusions are justified, as acknowledged by the Reviewer. As the Reviewer is aware, the IPN is a small nucleus and, with the slicing protocol used, we typically obtain 1-2 slices per mouse that are suitable for recordings. Since most of the experiments in the manuscript deal with some form of pharmacological interrogation, we were reluctant to use slices that were not naïve and therefore, in general, did not record from more than one cell per slice. Having said this, to comply with the Reviewer’s suggestion we have now performed additional experiments to increase the n for certain experiments. We have amended all figures and legends to incorporate the additional data. We must also point out that, while replotting the data in the summary Figure 8i (previously Figure 7i), we noticed an error in the data representation of the TAC IPL data and have now corrected this oversight.

      Figure 2b,c. 

      500nM DAMGO effect on TAC IPL AMPAR EPSC – n increased from 5 to 9

      Figure 3g. 

      500nM DAMGO effect on CHAT IPR AMPAR EPSC – n increased from 8 to 16

      Effect of CTAP on DAMGO on CHAT IPR AMPAR EPSC – n increased from 4 to 7

      Figure 3i. 

      500nM DAMGO or Met-enk effect in “silent” CHAT IPR AMPAR EPSC – n increased from 7 to 9

      Figure 4e. 

      500nM DAMGO effect on ES coupling – Note: in the original version the n number was 5 and not 7 as written in the figure legend. We have now increased the n from 5 to 9.

      Figure 5e,f. 

      500nM DAMGO effect on TAC IPR AMPAR EPSC – n increased from 5 to 9

      Figure 7f.

      Effect of DHE on EPSC amplitude after application of DNQX/APV/4-AP or DTX-α – n increased from 7-9.

      Figure 7g.

      Emergence of nAChR EPSC after DTX – n increased from 4 to 7

      Figure 7i. 

      Effect of ambenonium on nAChR amplitude and charge – n increased from 4 to 7

      Supplementary Figure 3c and h

      Effect of DAMGO after DNQX – n increased from 4 to 7

      Effect of DNQX after DAMGO mediated potentiation – n increased from 3 to 5.

      Throughout the study (Figs. 3i, 7f and 8h in the revised manuscript) we do indeed pool datasets that were amassed from different conditions, since we were not directly investigating any deviation in the extent of the response between said treatments. For example, and as pointed out by the Reviewer, in Fig. 2f (now Fig. 3i) DAMGO and met-ENK were employed merely to ascertain whether light-evoked synaptic transmission (ChATCre:ai32 mice) in cells that had no measurable EPSC could be pharmacologically “unsilenced” by mOR activation. Thus, the means by which the mOR was activated was not relevant to this specific question. Note: 2 more recordings, taken from ChATChR2/SSTCre:ai9 mice, have now been added to this dataset (Fig. 3i) in response to the comment by this Reviewer below (“Are there baseline differences in the electrophysiological or morphological properties of these "silent" neurons compared to the responsive neurons?”). Similarly, in the revised Fig. 7f we pooled data investigating the pharmacological block of the EPSC that emerged following application of either DNQX/APV/4-AP or DNQX/APV/DTX. Low concentrations of 4-AP or DTX were employed interchangeably to reveal the DNQX-insensitive EPSC that we go on to show is indeed the nAChR response. Finally, in Fig. 8h, we pooled data demonstrating a lack of effect of DAMGO in potentiating both the glutamatergic and cholinergic arms of synaptic transmission in the OPRM1 KO mice. Again, here we were only interested in determining whether removal of mOR expression prevented potentiation of transmission mediated by mHB ChAT neurons, irrespective of neurotransmitter modality. Thus, overall we were careful to pool data only in those instances where it would not change the interpretation and hence the conclusions reached.

      There are also significant sex differences in nAChR expression in the IPN that might not be functionally apparent using the low n presented here. It would be helpful to know which of the recorded neurons came from each sex, rather than presenting only the pooled data.  

      As the reviewer correctly states, there are veins of literature concerning a divergence, based on sex, of not only nicotinic receptor expression but also behaviors associated with nicotine addiction. We have therefore reanalyzed our datasets, focusing on the extent of the mOR potentiation of glutamatergic and cholinergic transmission mediated by mHB ChAT neurons in IPR in male versus female mice. Although there is a possible trend towards a higher potentiation of nAChR EPSCs in female mice, this was not statistically significant (see Author response image 1 below). We therefore chose not to split our data in the manuscript based on sex.

      Author response image 1.

      Comparison of the mOR (500nM DAMGO) mediated potentiation on evoked (a) AMPAR and (b) nAChR  EPSCs in IPR between male and female mice.  

      There are also some particularly novel observations that are presented but not followed up on, and this creates a somewhat disjointed story. For example, in Figure 2, the authors identify neurons in which no response is elicited by light stimulation of ChAT-neurons, but the application of DAMGO (mOR agonist) un-silences these neurons. Are there baseline differences in the electrophysiological or morphological properties of these "silent" neurons compared to the responsive neurons?  

      Unfortunately, we did not routinely measure intrinsic properties of the recorded postsynaptic neurons, nor did we systematically recover biocytin fills to assess morphology. Therefore, it remains unclear whether the neurons in which there were no or minimal AMPAR-mediated EPSCs are distinct from the ones displaying measurable responses. The IPR is home to GABAergic SST neurons, which are the most numerous neuron type in this IPN subdivision. Although heavily outnumbered by the SST neurons, there are additionally VGluT3+ glutamatergic neurons in IPN. The Reviewer is likely referring to a recent study investigating synaptic transmission specifically onto SST+ and VGluT3+ neurons in IPN, which demonstrated that mHB cholinergic-mediated glutamatergic input is “weaker” onto the glutamatergic neurons. Furthermore, in some instances synaptic transmission onto this latter population can be “unsilenced” by GABAB receptor activation, in a similar manner to that seen with mOR activation in this manuscript when IPR neurons are blindly targeted (Stinson & Ninan, 2025). Using a similar strategy to this recent study (Stinson & Ninan, 2025), we now include experiments in which the ChATChR2 mouse was crossed with a SSTCre:Ai14 mouse. This allowed for recording of postsynaptic EPSCs in directly identified SST IPR neurons. We demonstrate that DAMGO can indeed increase glutamatergic EPSCs and that, in 2 of the cells in which maximal LED light activation evoked no appreciable AMPAR EPSC, DAMGO clearly “unsilenced” transmission. Thus, our additional analyses directly demonstrate that our original observations concerning mOR modulation extend to the mHb cholinergic AMPAR-mediated input onto IPR SST neurons. These additional data are in the revised manuscript (Figure 3D-F, I).

      Future experimentation will be required to determine whether the propensity of encountering a “silent” input that can be converted to robust synaptic transmission by mOR differs between these two cell types. Furthermore, it will be of interest to investigate whether any differences exist in the magnitude of the cholinergic input or the mOR-mediated potentiation of co-transmission between postsynaptic SST GABA and glutamatergic neuronal subtypes.

      Reviewer #2 (Public review)

      Weaknesses: 

      The genetic strategy used to target the mHb-IPN pathway (constitutive expression in all ChAT+ and Tac1+ neurons) is not specific to this projection.  

      This is an important point. We are acutely aware that synaptic input in the IPN driven by conditional expression of ChR2 using transgenic Cre driver lines does not confer specificity to the mHB. This is particularly relevant considering that one of the novel observations here relates to a previously unidentified functional input from TAC1 neurons to the IPR. At this juncture we would like to point the Reviewer to the publicly available Connectivity Atlas provided by the Allen Brain Institute (https://connectivity.brain-map.org/). With reference to mHB TAC1 neuronal output, targeted viral injection into the habenula of Tac1Cre mice allows conditional expression of EGFP in SP neurons, as evidenced by the predominant expression of reporter fluorescence in dorsal mHB (see Author response image 2 a,b below). Tracing the axonal projections to the IPN clearly demonstrates dense fibers in IPL, as expected, but also arborization in IPR (Author response image 2 a,c). This pattern is reminiscent of that seen in the transgenic Tac1Cre:ai9 or ai32 mice used in the current study (Figs. 1c, 2a, 5c). Closer inspection of the fibers in the IPR reveals putative synaptic bouton-like structures, as we have shown in Fig. 5a,b (Author response image 2 d below).

      Author response image 2.

      Stereotaxic viral injection into mHB of Tac1Cre mice taken from Allen Brain connectivity atlas (Link to Connectivity Atlas for mHb SP neuronal projection pattern)

      These anatomical data suggest that part of the synaptic input to the IPR originates from mHB TAC1 neurons, although we cannot fully discount additional synaptic input from other brain areas that may impinge on the IPR. Indeed, as the Reviewer points out, it is evident that other regions, including the nucleus incertus, send outputs to the IPN (Bueno et al., 2019; Liang et al., 2024; Lima et al., 2017). However, it is unclear if neuronal inputs from these alternate sources are glutamatergic in nature AND mediated by a TAC1/OPRM1-expressing neuronal population. Nevertheless, we have now modified text in the discussion to highlight the limitations of using a transgenic strategy (pg 12, para 1).

      In addition, a braking mechanism involving Kv1.2 has not been identified.

      It is unclear what the Reviewer is referring to here. Most of our experiments pertaining to the brake on cholinergic transmission by potassium channels use low concentrations of 4-AP (50–100 µM), which have been used to block Shaker Kv1 channels, although at these concentrations there are additional actions at other K+ channels such as Kv3. However, we demonstrate that dendrotoxin, a selective Kv1.1 and Kv1.2 antagonist, replicates the 4-AP effects. We have now also included RNAseq data demonstrating the relative expression levels of Kv1 channel mRNA in mHb ChAT neurons (KCNA1 through KCNA6; Figure 6b). The complete absence of KCNA1, together with a high expression level of KCNA2 transcripts, strongly suggests a central role of Kv1.2 in unmasking nAChR-mediated synaptic transmission.

      Reviewer #3 (Public review)

      Weaknesses:  

      The significance of the ratio of AMPA versus nACh EPSCs shown in Figure 6 is unclear since nAChR EPSCs measured in the K+ channel blockers are compared to AMPA EPSCs in control (presumably 4-AP would also increase AMPA EPSCs). 

      We understand the Reviewer’s concern regarding the calculation of nicotinic/AMPA ratios, since the two components are measured under differing conditions, i.e. in the presence and absence of 4-AP, respectively. As the reviewer correctly points out, 4-AP likely increases the amplitude of the AMPA receptor-mediated EPSC. However, our intention in calculating this ratio was not to obtain a measure of the relative strengths of fast glutamatergic vs cholinergic transmission onto a given postsynaptic IPN neuron per se. Rather, we used the ratio as a means to normalize the size of the nicotinic receptor EPSC to the strength of the light stimulation (using the AMPA EPSC as the normalizing factor) in each individual recording. This permits a more meaningful comparison across cells/slices/mice. We apologize for the confusion and have amended the text in the results section to reflect this (pg 9; para 2).
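
      The normalization described above can be illustrated with a minimal sketch. It assumes that each recording provides one peak AMPAR EPSC amplitude (control) and one peak nAChR EPSC amplitude (in K+ channel blocker) from the same cell at the same light intensity. The function name, the dictionary keys and the amplitudes are hypothetical placeholders and are not data from the manuscript.

```python
def normalized_nachr(nachr_epsc_pa, ampa_epsc_pa):
    """Express the nAChR EPSC relative to the same cell's AMPAR EPSC.

    Both arguments are peak amplitudes in pA (inward currents are negative).
    The AMPAR EPSC serves as a per-recording proxy for stimulation strength.
    """
    if ampa_epsc_pa == 0:
        raise ValueError("AMPAR EPSC amplitude must be non-zero to normalize")
    return abs(nachr_epsc_pa) / abs(ampa_epsc_pa)


# Hypothetical placeholder amplitudes (pA); NOT values from the study.
recordings = [
    {"cell": "a", "ampa": -200.0, "nachr": -50.0},
    {"cell": "b", "ampa": -400.0, "nachr": -100.0},
    {"cell": "c", "ampa": -100.0, "nachr": -30.0},
]

ratios = [normalized_nachr(r["nachr"], r["ampa"]) for r in recordings]

for r, ratio in zip(recordings, ratios):
    print(f"cell {r['cell']}: nAChR/AMPAR ratio = {ratio:.2f}")
```

      In this illustration, cells a and b differ twofold in raw nAChR amplitude yet yield the same ratio, which is the sense in which the ratio corrects for differences in effective light stimulation between recordings rather than measuring the relative strength of the two transmitter systems.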

      The mechanistic underpinnings of the most novel results are not pursued. For example, the experiments do not provide new insight into the differential effects of evoked and spontaneous glutamate/Ach release by Gi/o coupled mORs, nor the differential threshold for glutamate versus Ach release. 

      Our major goal in the current manuscript was to provide a much-needed roadmap outlining the effects of opioids in the habenulo-interpeduncular axis. Of course, a full understanding of the mechanisms underlying such complex opioid actions at the molecular level will be of great value. We feel that this is beyond the scope of this already result-dense manuscript, but it will be essential if directed manipulation of the circuit is to be leveraged to alter maladaptive behaviors associated with addiction/emotion during adolescence and in adulthood.

      The authors note that blocking Kv1 channels typically enhances transmitter release by slowing action potential repolarization. The idea that Kv1 channels serve as a brake for Ach release in this system would be strengthened by showing that these channels are the target of neuromodulators or that they contribute to activity-dependent regulation that allows the brake to be released. 

      The exact mechanisms that can potentially titrate Kv1.2 availability, and hence nAChR transmission, would be essential to shed light on the in vivo conditions under which this arm of neurotransmission can be modulated. However, we feel that detailed mechanistic interrogation constitutes significant work, albeit work that future studies should aim to achieve. Thus, it presently remains unclear which physiological or pathological scenarios result in attenuation of Kv1.2 to subsequently promote nAChR-mediated transmission but, as mentioned in the existing discussion, future work to decipher such mechanisms would be of great value.

      Reviewer #1 (Recommendations for the authors): 

      Overall I find this to be a very interesting and exciting paper, presenting novel findings that provide clarity for a problem that has persisted in the IPN field: that of the conundrum that light-evoked cholinergic signaling was challenging to observe despite the abundance of nAChRs in the IPN. 

      Major concerns: 

      (1) The n is quite low in most cases, and in many instances, data from one figure are replotted in another figure. Given that the findings presented here are expected in the normal condition, it should not be difficult to increase the n. A more robust number of observations would strengthen the novel findings presented here. 

      Please refer to the response to the public review above.

      (2) In general, I find the organization of the figures somewhat disjointed. Sometimes it feels as if parts of the information presented in the results are split between figures, where it would make more sense to be together in a figure. For example, all the histology for each of the lines is in Figure 1, but only ephys data for one line is included there. It would be more logical to include the histology and ephys data for each line in its own figure. It would also be helpful to show the overlap of mOR expression with Tac1-Cre and ChAT-Cre terminals in the IPN. Likewise, the summarized Tac1Cre:Ai32 IPR data is in Figure 4, but the individual data is in Figure 5. 

      We introduce both ChAT and TAC1 Cre lines in Figure 1 as an overview, particularly for those readers who are not entirely familiar with the distinct afferent systems operating within the habenulo-interpeduncular pathway. However, in compliance with the Reviewer’s suggestion we have now restructured the Figures. In the revised manuscript, the functional data pertaining to the various transmission modalities mediated by the distinct afferent systems impinging on each IPN subdivision tested are now split into their own dedicated figures as follows:

      Figure 2. 

      mOR effect on TAC1 neuronal glutamatergic output in IPL.

      Figure 3. 

      mOR effect on CHAT neuronal glutamatergic output in IPR.

      Figure 5. 

      mOR effect on TAC1 neuronal glutamatergic output in IPR.

      Figure 8.

      mOR effect on CHAT neuronal cholinergic output in IPC.

      Supp. Fig. 1.

      mOR effect on CHAT neuronal glutamatergic output in IPC.

      We thank the Reviewer for their suggestions regarding the style of the manuscript. The restructuring has now resulted in a much better flow of the presented data.

      (3) The discussion is largely satisfactory. However, a little more discussion of the integrative function of the IPN is warranted given the opposing effects of MOR activation in the Tac vs ChAT terminals, particularly in the context of both opioids and natural rewards. 

      We thank the reviewer for this comment. However, we feel the discussion is rather lengthy as is and therefore we refrained from including additional text.  

      Minor concerns: 

      (1)  The methods are missing key details. For example, the stock numbers of each of the strains of mice appear to have been left out. This is of particular importance for this paper as there are key differences between the ChAT-Cre lines that are available that would affect observed electrophysiological properties. As the authors indicate, the ChAT-ChR2 mice overexpress VAChT, while the ChAT-IRES-Cre mice do not have this problem. However, as presented it is unclear which mice are being used. 

      We apologize for the omission - the catalog numbers of the mice employed have now been included in the methods section.

      We have now clearly indicated in each figure panel (single trace examples and pooled data) which mice the data are taken from; in some instances the pooled data are from the two CHAT mouse strains employed. Despite the tendency of the ChATChR2 mice to demonstrate more pronounced nAChR-mediated transmission (Fig. 7h), we justify pooling the data since we see no statistically significant difference in the effect of mOR activation in potentiating either AMPA or nAChR EPSCs (please refer to the response to Reviewer 2, Minor Concern point 2).

      (2) Likewise, antibody dilutions used for staining are presented as both dilution and concentration, which is not typical. 

      We thank the reviewer for pointing out this inconsistency. We have amended the text in the methods to include only the working dilution for all antibodies employed in the study.

      (3) There are minor typos throughout the manuscript. 

      All typos have been corrected.

      Reviewer #2 (Recommendations for the authors): 

      The authors provide a thorough investigation into the subregion, and cell-type effect of mu opioid receptor (MOR) signaling on neurotransmission in the medial habenula to interpeduncular nucleus circuit (mHb-IPN). This circuit largely comprises two distinct populations of neurons: mHb substance P (Tac1+) and cholinergic (ChAT+) neurons. Corroborating prior work, the authors report that Tac1+ neurons preferentially innervate the lateral IPN (IPL) and rostral IPN (IPR), while ChAT+ neurons preferentially innervate the central IPN (IPC) and IPR. The densest expression of MOR is observed in the IPL and MOR agonists produce a canonical presynaptic depression of glutamatergic neurotransmission in this region. Interestingly, MOR signaling in the ChAT+ mHb projection to the IPR potentiates light-evoked glutamate and acetylcholine-mediated currents (EPSC), and this effect is mediated by a MOR-induced inhibition of Kv2.1 channels. 

      Major concerns: 

      (1) The method used for expressing channelrhodopsin (ChR2) into cholinergic and neurokinin neurons in the mHb (Ai32 mice crossed with Cre-driver lines) has limitations because all Tac1+/ChAT+ inputs to the IPN express ChR2 in this mouse. Importantly, the IPN receives inputs from multiple brain regions besides the IPN-containing neurons capable of releasing these neurotransmitters (PMID: 39270652). Thus, it would be important to isolate the contributions of the mHb-IPN pathway using virally expressed ChR2 in the mHb of Cre driver mice. 

      Please refer to the response to the public review above. 

      (2) Figure 4: The authors conclude that the sEPSC recorded from IPR originate from Tac1+ mHb-IPR projections. However, this cannot be stated conclusively without additional experimentation, for instance an optogenetic asynchronous release experiment. For these experiments it would also be important to express ChR2 virus in the mHb in Tac1- and ChAT-Cre mice since glutamate originating from other brain regions could contribute to a change in asynchronous EPSCs induced by DAMGO. 

      This is a well-taken point. The incongruent effect of DAMGO on evoked CHAT neuronal EPSC amplitude and sEPSC frequency prompted us to consider the possibility of a differing effect of DAMGO on a secondary input. We agree that we do not show directly whether the sEPSCs originate from a TAC1 neuronal population. Therefore, we have tempered our wording with regard to the origin of the sEPSCs and have also restructured the Figure in question, moving the sEPSC data into supplemental data (Supplemental Fig. 2).

      (3) Figure 5D: It would be useful to provide a quantitative measure in a few mice of mOR fluorescence across development (e.g. integrated density of fluorescence in IPR). 

      We have now included mOR expression density across development  (Fig. 6). Interestingly, the adult expression levels of mOR in the IPR are essentially reached at a very early developmental age (P10) yet we see stark differences in the role of mOR activation in modulating glutamatergic transmission mediated by mHB cholinergic neurons. Note: since we processed adult tissue (i.e. >p40) for these developmental analyses we utilized these slices to also include an analysis of the relative mOR expression density specifically in adults between the subdivisions of IPN in Fig. 1.

      (4) Figure 6B: It would be useful to quantify the expression of Kcna2 in ChAT and Tac1 neurons (e.g. using FISH). 

      We thank the Reviewer for this suggestion. We have now included mRNA expression levels from a publicly available 10X RNA sequencing dataset provided by the Allen Brain Institute (Figure 7b).

      (5) It would be informative to examine what the effects of MOR activation are on mHb projections to the central IPN (IPC). 

      In response to this suggestion, we now have included  additional data in the manuscript in putative IPC cells that clearly demonstrate a similar DAMGO elicited potentiation of AMPAR EPSC to that  seen in IPR. These data are now included in the revised manuscript  (Supplemental Fig. 1; Fig. 8i). 

      (6) What is the proposed link between MOR activation and the inhibition of Kv1.2 (e.g. beta-Arrestin signaling, G beta-gamma interaction with Kv1.2, PKA inhibition?) 

      We apologize for any confusion. We do not directly test whether the potentiation of EPSCs upon mOR activation occurs via inhibition of Kv1.2. Nevertheless, we find this an unlikely underlying cellular mechanism, especially for the potentiation of the cholinergic arm of neurotransmission, since in the presence of DNQX/APV the activation of mOR does not result in the emergence of any nAChR EPSC (see Supplementary Fig. 3a-c).

      Minor concerns: 

      (1) Methods: Jackson lab ID# for used mouse strains is missing. 

      We apologize for this omission and have now included the mouse strain catalog numbers.

      (2) The authors use data from both ChAT-Cre x Ai32 and ChAT-ChR2 mice. It would be helpful to show some comparisons between the lines to justify merging data sets for some of the analyses as there appear to be differences between the lines (e.g. Figure 6G). 

      This is a well taken point. We have now provided a figure for the Reviewer (see below) that illustrates the lack of  significant difference between the mOR mediated potentiation of both mHB CHAT neuronal AMPAR and nAChR transmission between the two mouse lines employed despite a divergence in the extent of glutamatergic vs cholinergic transmission shown in Fig. 7g (previously Figure 6g). We have chosen not to include this data in the revised manuscript.

      Author response image 3.

      Comparison of the mOR (500nM DAMGO) mediated potentiation on evoked AMPAR (a) and nAChR (b) EPSCs in IPR between ChATCre:Ai32 and ChATChR2 mice.

      (3)  Line 154: How was it determined that the EPSC is glutamatergic? 

      We apologize for any confusion. In the revised manuscript we now clearly point to the relevant figures (see Supplementary Figs. 2a and 3) in the Results section (pg 4, para 2; pg 7, para 1; pg 8, para 2) where we determine that both the sEPSCs and the ChAT-mediated light-evoked EPSCs recorded under baseline conditions are totally blocked by DNQX and hence are exclusively AMPAR events.

      (4) It would be helpful to discuss the differences between GABA-B mediated potentiation of mHb-IPN signaling and the current data in more detail. 

      We are unclear as to what differences the Reviewer is referring to. At least from the perspective of ChAT neuronal mediated synaptic transmission, other groups (and the current study; Fig. 7h) have clearly shown that GABAB activation markedly potentiates synaptic transmission, like mOR activation. Nevertheless, based on our novel findings it would be of interest to determine whether the influence of GABAB is inhibitory onto the TAC mediated input in IPR and whether there is a developmental regulation of this effect, as we demonstrate upon mOR activation. These additional comparisons between the effects of the two Gi-linked receptors may shed light on the similarity, or lack thereof, of the underlying cellular mechanisms. We now have included a few sentences in the discussion to highlight this (pg 11, para 1).

      Reviewer #3 (Recommendations for the authors): 

      The abstract was confusing at first read due to the complex language, particularly the sentence starting with... Further, specific potassium channels... 

      The authors might want to consider simplifying the description of the experiments and the results to clarify the content of the manuscript for readers who may only read the abstract. 

      We have altered the wording of the abstract and hope it is now more reader friendly.

      The opposite effect of mOR activation on spontaneous EPSCs versus electrical or ChR2-evoked EPSCs is very interesting and raises the issue of which measure is most physiologically relevant. For example, it is unclear whether sEPSCs arise primarily from cholinergic neurons (that are spontaneously active in the slice, Figure 3), and if so, does mOR activation suppress or enhance cholinergic neuron excitability and/or recruitment by ChR2? While a full analysis of this question is beyond the scope of this manuscript, the assumption that glutamate release assayed by electrical/ChR2 evoked transmission is the most physiologically relevant might merit some discussion since sEPSCs presumably also reflect action-potential dependent glutamate release. One wonders whether mORs hyperpolarize cholinergic neurons to reduce spontaneous spiking yet enhance fiber recruitment by ChR2 or an electrical stimulus (i.e. by removing Na channel inactivation). The authors have clearly stated that they do not know where the mORs are located, and that the effects arising from disinhibition are likely complex. But they also might discuss whether glutamate release following synchronous activation of a fiber pathway by ChR2 or electrode is more or less physiologically relevant than glutamate release assayed during spontaneous activity. It seems likely that an equivalent experiment to Figure 3D, E using spontaneous spiking of IPR neurons would show that spiking is reduced by mOR activation. 

      We thank the Reviewer for this comment. As pointed out, it would be of interest to dissect the “network” effect of mOR activation but, as the Reviewer acknowledges, this is beyond the scope of the current manuscript. The Reviewer is correct in postulating that mOR activation results in hyperpolarization of mHB ChAT neurons. A recent study (Singhal et al., 2025) demonstrates that a subpopulation of ChAT neurons undergoes a reduction in firing frequency following DAMGO application. This is corroborated by our own observations, although we chose not to include these data in our current manuscript (but see below).

      Additionally, the Reviewer questions whether ChR2/electrical stimulation is physiological. This is a well-taken point, and of course the simultaneous activation of potentially all possible axonal release sites is not the mode under which the circuit operates. Nevertheless, our data clearly demonstrate the ability of mORs to modulate release under these circumstances, which must reflect an impact on spontaneous action potential driven evoked release. Although the suggested experiment could shed light on the synaptic outcomes of mOR activation on ES coupling of downstream IPN neurons, interpretation of the outcome would be confounded by the fact that postsynaptic IPN neurons also express mORs. Thus, we would not be able to isolate the effects of presynaptic changes in modulating ES coupling from any direct postsynaptic effect on the recorded cell when in current clamp.

      Together, these additional sites of action of mOR (i.e. mHB ChAT somatodendritic and postsynaptic IPN neuron) only serve to further highlight the complex nature of the actions of opioids on the habenulo-interpeduncular axis, warranting future work to fully understand their physiological and pathological effects on the axis as a whole.

      The idea that Kv2.1 channels serve as a brake raises the question of whether they contribute to activity-dependent action potential broadening to facilitate Ach release during trains of stimuli. 

      This is an interesting suggestion and one that we had considered ourselves. Indeed, as the Reviewer is likely aware and as mentioned in the manuscript, previous studies have shown that nAChR signaling can be revealed under conditions of multiple stimulations given at relatively high frequencies. We therefore attempted to perform high frequency stimulation (20 stimulations at 25Hz and 50Hz) in the presence of the ionotropic glutamatergic receptor antagonists DNQX and APV. We have now included this data in the revised manuscript (Supplementary Fig. 3b). As shown, this failed to engage nAChR mediated synaptic transmission in our hands. Interestingly, there is evidence from reduced expression systems demonstrating that Kv1.2 channels undergo use-dependent potentiation (Baronas et al., 2015), in contrast to that seen with other K+-channels. Whether this is the case for the axonal Kv1.2 channels on mHB axonal terminals in situ is not known, but this may explain the inability to reveal nAChR EPSCs upon delivery of such stimulation paradigms.

      References 

      Baronas, V. A., McGuinness, B. R., Brigidi, G. S., Gomm Kolisko, R. N., Vilin, Y. Y., Kim, R. Y., … Kurata, H. T. (2015). Use-dependent activation of neuronal Kv1.2 channel complexes. J Neurosci, 35(8), 3515-3524. doi:10.1523/JNEUROSCI.4518-13.2015

      Bueno, D., Lima, L. B., Souza, R., Goncalves, L., Leite, F., Souza, S., … Metzger, M. (2019). Connections of the laterodorsal tegmental nucleus with the habenular-interpeduncular-raphe system. J Comp Neurol, 527(18), 3046-3072. doi:10.1002/cne.24729

      Liang, J., Zhou, Y., Feng, Q., Zhou, Y., Jiang, T., Ren, M., … Luo, M. (2024). A brainstem circuit amplifies aversion. Neuron. doi:10.1016/j.neuron.2024.08.010

      Lima, L. B., Bueno, D., Leite, F., Souza, S., Goncalves, L., Furigo, I. C., … Metzger, M. (2017). Afferent and efferent connections of the interpeduncular nucleus with special reference to circuits involving the habenula and raphe nuclei. J Comp Neurol, 525(10), 2411-2442. doi:10.1002/cne.24217

      Singhal, S. M., Szlaga, A., Chen, Y. C., Conrad, W. S., & Hnasko, T. S. (2025). Mu-opioid receptor activation potentiates excitatory transmission at the habenulo-peduncular synapse. Cell Rep, 44(7), 115874. doi:10.1016/j.celrep.2025.115874

      Stinson, H. E., & Ninan, I. (2025). GABA(B) receptor-mediated potentiation of ventral medial habenula glutamatergic transmission in GABAergic and glutamatergic interpeduncular nucleus neurons. bioRxiv. doi:10.1101/2025.01.03.631193

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: 

      Seon and Chung's study investigates the hypothesis that individuals take more risks when observed by others because they perceive others to be riskier than themselves. To test this, the authors designed an innovative experimental paradigm where participants were informed that their decisions would be observed by a "risky" player and a "safe" player. Participants underwent fMRI scanning during the task. 

      Strengths: 

      The research question is sound, and the experimental paradigm is well-suited to address the hypothesis. 

      Weaknesses:

      I have several concerns. Most notably, the manuscript is difficult to read in parts, and I suggest a thorough revision of the writing for clarity, as some sections are nearly incomprehensible. Additionally, key statistical details are missing, and I have reservations about the choice of ROIs.

      We appreciate the reviewer’s interest in and positive assessment of our work, and we thank the reviewer for the constructive feedback. In the current revision, we have revised the manuscript for clarity and added previously omitted statistical details. Furthermore, in the response letter, we have also provided additional explanations to clarify our approach, including the rationale for the choice and use of ROIs.

      Reviewer #2 (Public review): 

      Summary: 

      This study aims to investigate how social observation influences risky decision-making. Using a gambling task, the study explored how participants adjusted their risk-taking behavior when they believed their decisions were being observed by either a risk-averse or risk-seeking partner. The authors hypothesized that individuals would simulate the choices of their observers based on learned preferences and integrate these simulated choices into their own decision-making. In addition to behavioral experiments, the study employed computational modeling to formalize decision processes and fMRI to identify the neural underpinnings of risky decision-making under social observation. 

      Strengths: 

      The study provides a fresh perspective on social influence in decision-making, moving beyond the simple notion that social observation leads to uniformly riskier behavior. Instead, it shows that individuals adjust their choices depending on their beliefs about the observer's risk preferences, offering a more nuanced understanding of how social contexts shape decision-making. The authors provide evidence using comprehensive approaches, including behavioral data based on a well-designed task, computational modeling, and neuroimaging. The three models are well selected to compare at which level (e.g., computing utility, risk preference shift, and choice probability) the social influence alters one's risky decision-making. This approach allows for a more precise understanding of the cognitive processes underlying decision-making under social observation. 

      Weaknesses: 

      While the neuroimaging results are generally consistent with the behavioral and computational findings, the strength of the neural evidence could be improved. The authors' claims about the involvement of the TPJ and mPFC in integrating social information are plausible, but further analysis, such as model comparisons at the neuroimaging level, is needed to decisively rule out alternative interpretations that other computational models suggest. 

      We appreciate the reviewer’s interest in and positive assessment of our work, and we thank the reviewer for the constructive feedback. In the current revision, we have included neural results from additional analyses, which we believe provide stronger support for our proposed computational model.

      Reviewer #3 (Public review): 

      Summary: 

      This is an important paper using a novel paradigm to examine how observation affects the social contagion of risk preferences. There is a lot of interest in the field about the mechanisms of social influence, and adding in the factor of whether observation also influences these contagion effects is intriguing.

      Strengths:

      (1) There is an impressive combination of a multi-stage behavioural task with computational modelling and neuroimaging.

      (2) The analyses are well conducted and the sample size is reasonable. 

      Weaknesses: 

      (1) Anatomically it would be helpful to more explicitly distinguish between dmPFC and vmPFC. Particularly at the end of the introduction when mPFC and vmPFC are distinguished, as the vmPFC is in the mPFC. 

      (2) The authors' definition of ROIs could be elaborated on further. They suggest that peaks are selected from neurosynth for different terms, but were there not multiple peaks identified within a functional or anatomical brain area? This section could be strengthened by confirming with anatomical ROIs where available, such as the atlases here http://www.rbmars.dds.nl/lab/CBPatlases.html and the Harvard-Oxford atlases. 

      (3) How did the authors ensure there were enough trials to generate a reliable BOLD signal? The scanned part of the study seems relatively short. 

      (4) It would be helpful to add whether any brain areas survived whole-brain correction. 

      (5) There is a concern that mediation cannot be used to make causal inferences and much larger samples are needed to support claims of mediation. The authors should change the term mediation in order to not imply causality (they could talk about indirect effects instead) and highlight that the mediation analyses are exploratory as they would not be sufficiently powered (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2843527/). 

      (6) The authors may want to speculate on lifespan differences in this susceptibility to risk preferences given recent evidence that older adults are relatively more susceptible to impulsive social influence (Zhu et al, 2024, comms psychology). 

      We appreciate the reviewer’s interest in and positive assessment of our work, and we thank the reviewer for the constructive feedback. In the response letter below, we address each of the reviewer’s comments, including clarifications regarding the ROIs and the limitations of the current study in interpreting the results.

      Reviewer #1 (Recommendations for the authors):

      (1) The neuroimaging hypotheses seem post hoc to me. First, the term "social inference" is used very loosely. In line 103 the authors mentioned that TPJ has been reported to be involved in inferring other's intentions and learning about others. However, in their task, it is not clear where inference is needed. All participants need to do is recall others' "preferences", rather than inferring a hidden variable or hidden intention. In addition, in some of the studies that the authors have cited (e.g., Park et al. 2021), the hippocampus is the focus of the inference, which gets no mention here.

      How does solving this task require inference (as defined by the authors: inferring others' intentions)? And why do they choose TPJ while inference is not needed in this task?

      We regret any confusion and would like to take this chance to clarify our hypothesis on social inference. As the reviewer pointed out, participants were indeed instructed to predict the demonstrators’ choices, through which we expected them to learn the demonstrators’ preferences. Our computational model suggests that during the main phase of the task, i.e., the Observed phase, participants simulated others’ choices based on these previously learned risk preferences. The gamble choices they encountered (payoffs and associated probabilities) did not overlap with those in the Learning phase, and we therefore expected that the cognitive process triggered by the social context involved active simulation (what we describe as making inferences about others) rather than simple ‘recall’ of previously learned information. In line with this reasoning, we hypothesized that the TPJ, a brain region previously implicated in simulating others’ actions and intentions, would play a key role during the Observed phase.

      Regarding the role of the hippocampus, the paper we cited by BoKyung Park et al. (2021), titled “The role of right temporoparietal junction in processing social prediction error across relationship contexts”, highlights the involvement of the rTPJ but does not mention the hippocampus. We are aware of the study by Seongmin A. Park et al. (2021), “Inferences on a multidimensional social hierarchy use a grid-like code”, which shows the involvement of the hippocampus and entorhinal cortex in making inferences about multidimensional social hierarchies; we believe the reviewer may have mistakenly assumed that we cited this article. As the study showed, the involvement of the hippocampus—and the use of its grid-like representation of social information—is likely tied to the multidimensional nature of task states. In our study, the hippocampus was not included as an ROI because we had no specific rationale to hypothesize that such grid-like representations would be recruited by our task.

      (2) Social influence can be motivated informationally (to improve accuracy) or normatively (to be aligned with others). To me, it seems that the authors have studied the latter, because, first, there is no objectively correct response in this task and second, because participants changed their risk preference according to the preference of the observing partner. This distinction has not been made throughout the manuscript. This is important because the two process (information and normative) are supported by different neural processes and it is extremely useful to understand neural basis of which process the authors are studying.

      We thank the reviewer for the opportunity to clarify the anticipated role of social influence in our study. As the reviewer pointed out, the gambling task used in our study does not have objectively correct or incorrect answers, and naturally, any social influence present during the task would align with normative social influence. To clarify this point, we have revised the discussion section as follows:

      [Page 9, Line 345]

      Observational learning and mimicry of others’ behavior are patterns commonly found in social animals, including nonhuman primates (Van de Waal et al., 2013). Such behaviors are thought to be driven either by a motivation to acquire additional information (‘informational conformity’) or by a motivation to align with group norm (‘normative conformity’), even when doing so does not necessarily lead to better outcomes (e.g., higher accuracy) (Cialdini & Goldstein, 2004). Given that there are no objectively correct or incorrect answers in the gambling task used in our study, the observed social influence is more consistent with normative conformity. However, we cannot rule out the possibility that individuals developed false beliefs about a particular observing partner—namely, that the partner had greater control over or insight into the gambling task. Future studies are needed to directly investigate whether individuals’ beliefs about others modulate informational social influence—that is, their motivation to use social information to gain additional insight by inferring others’ potential choices.

      (3) From Line 160 onward, the authors report several findings without providing any effect sizes or statistics. Please add effect size and statistics for each finding.

      We thank the reviewer for pointing this out. We have now added the corresponding effect sizes and statistical values for the reported findings, beginning from Line 160 in the revised manuscript.

      (4) Line 270: "In particular, bilateral TPJ, brain regions not implicated in the Solo phase, positively tracked trial-by-trial model-estimated decision probabilities". How can the authors conclude that TPJ is not involved in the solo phase? As far as I understood from the text, TPJ was not included as one of the ROIs for analysis of the Solo phase. If it was included, it should be mentioned in the text and there should be a direct comparison between the effect sizes of the solo and the observer phase. If not, "not implicated in the Solo phase" is not justified and should be removed.

      We apologize for the confusion. As the reviewer correctly pointed out, the TPJ was not included among the ROIs in our analysis of the Solo phase data; therefore, its involvement during the Solo phase was never directly assessed using an ROI-based approach.

      To examine brain responses during the Observed phase, we first assessed whether regions that tracked decision probabilities during the Solo phase (vmPFC, vStr, and dACC) were also engaged in the Observed phase. The involvement of the TPJ during the Observed phase was revealed through a subsequent whole-brain analysis. To clarify this point, we have now revised the corresponding part as follows:

      [Page 8, Line 276]

      In particular, bilateral TPJ positively, brain regions not implicated in the Solo phase, tracked trial-by-trial model-estimated decision probabilities

      → Notably, bilateral TPJ showed significant positive tracking of decision probabilities ~

      (5) I am a bit puzzled about the PPI analysis. Is the main finding increased connectivity within mPFC in the observing condition? PPI is often done between two separate brain regions. I am not sure what it means that connectivity within mPFC increases in one condition compared to another. What was the motivation for this analysis? Can you also please explain what it means?

      As the reviewer noted, psychophysiological interaction (PPI) analyses examine functional connectivity between brain regions as modulated by a psychological factor. To clarify our result, the reported ‘mPFC-mPFC connectivity’ refers to functional connectivity between the mPFC region responsive to the presence of an observing partner and an adjacent, anatomically distinct region within the mPFC. Note that we have revised the manuscript to refer to this region more specifically as the dorsomedial prefrontal cortex (dmPFC). Please see our response to Reviewer 3, Comment 1, for further details.

      During the Observed phase of our task, social information was processed at two distinct time points. First, at the beginning of each decision trial, individuals were cued with the presence (or absence) of an observing partner (‘Partner presentation’). Second, the gamble options, as well as the observing partner’s identity, were revealed (‘Options revealed’). Because participants had previously learned about the observing partner’s risk preferences, we expected them to simulate the choice the partner would likely make. We hypothesized that if individuals indeed simulated the partner’s choice and incorporated this information into their decision-making process, the brain region involved in recognizing the partner’s presence (dmPFC<sub>contrast</sub>) would be functionally connected to the region responsible for integrating social information into the final decision (TPJ). Our results showed that the two regions were functionally connected via an indirect path through an anatomically adjacent cluster within the mPFC (dmPFC<sub>PPI</sub>). Given that the recognition of the partner’s presence and the simulation of their choice occurred at two distinct time points, we interpreted the functional connectivity between the two dmPFC clusters (dmPFC<sub>contrast</sub> and dmPFC<sub>PPI</sub>) as evidence that the dmPFC<sub>PPI</sub> remained engaged during the decision process to support simulation, rather than being involved solely in the passive recognition of the social context (i.e., observed vs. not observed). Note that, consistent with this interpretation, functional connectivity was stronger in individuals who showed greater reliance on social information ('Social reliance' parameter in our model).

      To avoid confusion, we have now labeled the two dmPFC clusters as dmPFC<sub>contrast</sub>—the seed region identified at partner presentation—and dmPFC<sub>PPI</sub>—the target region identified in the PPI analysis.

      [Page 8, Line 284]

      This cue was intended to dissociate neural responses to the social context per se (i.e., the presence of an observing partner), which we hypothesized would initiate social processing, from the neural processes involved in incorporating this information during the subsequent decision-making phase.

      [Page 8, Line 291]

      We tested whether the dmPFC was also involved in incorporating social information during the decision process under social observation, particularly among individuals who relied more heavily on simulating others’ behavior.

      [Page 8, Line 297]

      We confirmed that the functional connectivity between the dmPFC<sub>contrast</sub> which is sensitive to cues regarding the presence of an observing partner, and its adjacent, anatomically distinct region within the dmPFC (‘dmPFC<sub>PPI</sub>’ hereafter; x = 3, y = 50, z = 5, k<sub>E</sub> = .74, cluster-level P<sub>FWE, SVC</sub> = 0.011; Fig. 4a, b, Table S5) was positively associated with individuals’ social reliance.

      (6) In Line 107 the authors say "excitatory stimulation of the TPJ improved social cognition". Improved social cognition is too general and unspecific. Please be more specific.

      We agree that the term ‘social cognition’ was too general and unspecific. In the revised manuscript, we have specified that the improvement was observed in tasks specifically involving the control of self-other representation, as demonstrated by Santiesteban et al. (2012).

      [Page 4, Line 106]

      Corroborating with these neuroimaging data, excitatory stimulation of the TPJ improved social cognition (Santiesteban et al., 2012),~

      → Corroborating these neuroimaging findings, excitatory stimulation of the TPJ improved social cognition involving the control of self-other representation (Santiesteban et al., 2012),~

      Writing:

      We thank the reviewer for their thorough evaluation of our manuscript. We have now made the necessary revisions in accordance with the provided comments.

      (7) Line 75: "one risky options" should be one risky option.

      [Page 3, Line 74]

      between one safe (i.e., guaranteed payoff) and one risky options.

      between a safe option (i.e., guaranteed payoff) and a risky option.

      (8) Line 82: were given with the same set of gamble should be "were given the same set of gamble".

      [Page 3, Line 81]

      In the third phase (‘Observed phase’), individuals were given with the same set of gamble choices they faced in the Solo phase,

      In the third phase (‘Observed phase’), individuals were given the same set of gamble choices they faced in the Solo phase,~

      (9) Line 63: and that the extent of such influence depends on the identity of the observer. It is not clear what the authors mean by the "identity of observer". Does it mean the preference of the observer?

      Van Hoorn et al. (2018) showed that the degree of social influence varies depending on whether individuals are being observed by parents or by peers. While one might attribute this difference to divergent preferences typically held by parents and peers, it is important to note that other factors may also differ between these social groups. To avoid overinterpretation while preserving the original meaning, we have revised the sentence as follows:

      [Page 3, Line 61]

      However, recent studies showed that the unidirectional influence of social others’ presence may be also observed in adults (Otterbring, 2021), and that the extent of such influence depends on the identity of the observer (Van Hoorn et al., 2018).  

      However, recent studies showed that the unidirectional influence of social others’ presence can also be observed in adults (Otterbring, 2021), and that the extent of this influence depends on the observer’s identity—specifically, whether the observer is a parent or a peer (Van Hoorn et al., 2018).

      (10) Line 103: "including inferring others' intention and in learning about others." An "in" is missing right before inferring.

      [Page 4, Line 101]

      The temporoparietal junction (TPJ) is another region known to play an important role in social cognitive functions, including inferring others’ intention and in learning about others (Behrens et al., 2008; Boorman et al., 2013; Charpentier et al., 2020; Park et al., 2021; Samson et al., 2004; Saxe & Kanwisher, 2003; Saxe & Kanwisher, 2013; Van Overwalle, 2009; Young et al., 2010).

      The temporoparietal junction (TPJ) is another region known to play an important role in a range of social cognitive functions, including simulating others’ intention and choices, as well as learning about others (Behrens et al., 2008; Boorman et al., 2013; Charpentier et al., 2020; Park et al., 2021; Samson et al., 2004; Saxe & Kanwisher, 2003; Saxe & Kanwisher, 2013; Van Overwalle, 2009; Young et al., 2010).

      (11) 106: "Corroborating with these neuroimaging data." It should be "corroborating these neuroimaging data".

      [Page 4, Line 106]

      Corroborating with these neuroimaging data, ~

      Corroborating these neuroimaging findings, ~

      (12) Lines 113-115. It is not clear what the authors are trying to say here.

      We have now revised the sentence as follows:

      [Page 4, Line 112]

      We hypothesized that even if others’ choices are not explicitly presented, simple presence of social others may trigger inference about others’ potential choices, and the same set of brain regions will play an important role in value-based decision-making.

      We hypothesized that, even in the absence of explicit information about others’ choices, the mere presence of social others could lead participants to conform to the option they believe others would choose. To do so, participants would need to simulate others’ potential choices, particularly when option values vary across trials. As a result, we propose that the same brain regions involved in simulating others’ decisions would also be engaged during value-based decision-making in the presence of social observers.

      (13) Line 151: This sentence is too long and hard to follow:

      We have now revised the sentence as follows:

      [Page 5, Line 154]

      Furthermore, individuals’ prediction responses on subsequent 10 prediction trials where no feedback was provided (Fig. 2b) as well as self-reports about the perceived riskiness of the partners collected at the end of the Learning phase (Fig. 1d) consistently showed that they were able to distinguish one partner from the other, and correctly estimate the partners’ risk preferences (Predicted risk preference: t(42) = -11.46, P = 1.66e-14; Self-report: t(42) = -35.83, P = 4.10e-33).

      Furthermore, individuals’ prediction responses during the subsequent 10 trials without feedback consistently indicated that they could distinguish between the two partners and accurately estimate each partner’s risk preferences (t(42) = -11.46, P = 1.66e-14; Fig. 2b). Self-reported ratings of the partners’ perceived riskiness, collected after the Learning phase, further supported this finding (t(42) = -35.83, P = 4.10e-33; Fig. 1d).

      (14) Line 178: This sentence is very hard to follow. I am not sure what the authors were trying to say here. Please clarify.

      We have now revised the sentence as follows:

      [Page 5, Line 183]

      Various previous studies examined the impacts of social context on decision-making processes, but the suggested mechanisms by which individuals were affected by the social information depended on how the information was presented.

      → Previous studies have shown that social context can influence decision-making processes. However, the underlying mechanisms proposed have varied depending on how the social information was presented.

      (15) Line 183: "when individuals were given with the chances" should be "when individuals were given the chance".

      [Page 5, Line 187]

      On the contrary, when individuals were given with the chances~

      On the contrary, when individuals were given the chances~

      (16) Line 192: "are sensitive to the identity of the currently observing partner...". Do the authors mean are sensitive to the preferences of the currently observing partner? If so, please clarify, it is hard to follow.

      We have now revised the sentence as follows:

      [Page 5, Line 195]

      We hypothesized that if individuals are sensitive to the identity of the currently observing partner, they would take into account the learned preferences of others in computing their choices rather than simply in guiding the direction how to change their own preferences.

      → We hypothesized that if individuals are sensitive to the learned preferences of the observing partner, they would use this information to simulate the partner’s likely choices, rather than simply aligning their own preferences with those of the partner.

      Reviewer #2 (Recommendations for the authors):

      (1) The current neuroimaging findings appear to support the decision processes of all three models. I recommend that the authors provide more detailed evidence of model comparisons in the neuroimaging analysis. This should go beyond simply comparing the goodness of fit of neural activity.

      We acknowledge that neuroimaging data alone often do not provide conclusive evidence for specific information processing. In our study, we examined brain regions that track decision probabilities and are associated with social cognition, such as simulating others’ choice tendencies. Because these processes are general and not tied to a specific computational model, neural responses supporting the occurrence of such processes cannot be used to rule out alternative decision models. For this reason, our approach prioritized a rigorous behavioral model comparison as a critical first step before probing the neural substrates underlying the proposed mechanism. Our behavioral model comparisons, including both quantitative fit indices and qualitative pattern predictions, indicated that the proposed model best accounted for participants' decision patterns across task conditions.

      More importantly, to further validate the model, we conducted a model recovery analysis (see Fig. S2b in SI), which confirmed that our model can be reliably distinguished from alternative accounts even when behavioral differences are subtle. This result suggests that our model captures unique and meaningful characteristics of the decision process that are not equally well explained by competing models.

      With this behavioral foundation, our neuroimaging analyses were designed not to serve as independent model arbiters, but rather to examine whether brain activity in regions of interest reflected the computations specified by the best-fitting model. We believe this two-step approach—first establishing behavioral validity, then linking model-derived variables to neural data—offers a principled framework for identifying the cognitive and neural mechanisms of decision-making.

      Nevertheless, per the reviewer’s suggestion, we further examined whether there is neural encoding of both the participant’s own utility and the observer’s utility—serving as potential neural evidence to differentiate our model from the two alternative models. Please see below for our response to Reviewer 2’s Comment (2).

      (2) Specifically, if participants are combining their own and simulated choices at the level of choice probability, we would expect to see neural encoding of both their own utility and the observer's utility. These may be observed in different areas of the mPFC, as demonstrated by Nicolle et al. (Neuron, 2012). In that study, decisions simulating others' choices were associated with activity in the dorsal mPFC, while one's own decisions were encoded in the vmPFC. On the contrary, if the brain encodes decision values based on the shifted risk preference, rather than encoding each decision's value in separate brain areas, this would support the alternative model.

      We thank the reviewer for this constructive comment. In our Social reliance model, we assumed that the decision probability based on an individual’s own risk preferences, as well as that based on the observing partner’s risk preferences, both contribute to the individual’s final choice. As the reviewer suggested, neural evidence that differentiates our model from the two alternative models—the Risk preference change model and the Other-conferred utility model—would involve demonstrating neural encoding of both the participant’s own utility and the observer’s utility.
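
      To make the idea of combining the two decision probabilities concrete, the following is a minimal sketch only, not the authors' fitted model. It assumes a power utility function, a logistic (two-option softmax) choice rule, a risky option that pays nothing on a loss, and a linear mixing weight `w` standing in for the social reliance parameter. The function names, `rho`, `beta`, and `w` are all hypothetical and do not come from the manuscript.

```python
import math


def power_utility(payoff, rho):
    """Hypothetical utility u(x) = x ** rho; rho < 1 behaves as risk-averse."""
    return payoff ** rho


def p_choose_risky(safe_payoff, risky_payoff, p_win, rho, beta):
    """Logistic (two-option softmax) probability of picking the risky option.

    Assumes the risky option pays `risky_payoff` with probability `p_win`
    and nothing otherwise; `beta` is an assumed inverse temperature.
    """
    u_safe = power_utility(safe_payoff, rho)
    u_risky = p_win * power_utility(risky_payoff, rho)
    return 1.0 / (1.0 + math.exp(-beta * (u_risky - u_safe)))


def social_reliance_choice_prob(p_self, p_observer, w):
    """Mix own and simulated-observer choice probabilities with weight w."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return (1.0 - w) * p_self + w * p_observer


# Illustrative values only: a risk-averse participant, a risk-seeking observer.
p_self = p_choose_risky(10, 30, 0.5, rho=0.5, beta=1.0)
p_observer = p_choose_risky(10, 30, 0.5, rho=1.2, beta=1.0)
p_final = social_reliance_choice_prob(p_self, p_observer, w=0.3)
```

      With `w = 0` the sketch reduces to the participant's own choice probability, and with `w = 1` it reproduces the simulated observer's probability. The mixing happens at the level of choice probability, not at the level of utility or risk preference.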

      The utility differences between chosen and unchosen options from the two perspectives—self and observer—were highly correlated, preventing us from including both as regressors in the same design matrix. Instead, we defined ROIs along the ventral-to-dorsal axis of the mPFC, and examined whether each ROI more strongly reflected one’s own utility or that of the observer. Based on the meta-analysis by Clithero and Rangel (2014), we defined the most ventral mPFC ROI (ROI1) as a 10 mm-radius sphere centered at coordinate [x=-3, y=41, z=-7], a region previously associated with subjective value. From this ventral seed, we defined four additional spherical ROIs (10 mm radius each) at 12 mm intervals along the ventral-to-dorsal axis, resulting in five ROIs in total: ROI2 [x=-3, y=41, z=5], ROI3 [x=-3, y=41, z=17], ROI4 [x=-3, y=41, z=29], ROI5 [x=-3, y=41, z=41].
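
      As an illustration of the ROI placement described above, the short sketch below regenerates the five sphere centers from the ventral seed and checks whether a coordinate falls inside a given sphere. The seed, spacing, and radius are taken from the text; the helper names `roi_centers` and `in_sphere` are hypothetical, and coordinates are treated simply as millimetre triplets.

```python
def roi_centers(seed=(-3, 41, -7), step_mm=12, n_rois=5):
    """Return ROI centers stepped along the z (ventral-to-dorsal) axis."""
    x, y, z = seed
    return [(x, y, z + i * step_mm) for i in range(n_rois)]


def in_sphere(point, center, radius_mm=10.0):
    """True if `point` lies within `radius_mm` of `center` (Euclidean, mm)."""
    squared_distance = sum((p - c) ** 2 for p, c in zip(point, center))
    return squared_distance <= radius_mm ** 2


centers = roi_centers()
for index, center in enumerate(centers, start=1):
    print(f"ROI{index}: {center}")
```

      With these numbers, neighbouring spheres overlap (centers 12 mm apart, radius 10 mm), so a coordinate such as (-3, 41, -1) falls inside both ROI1 and ROI2.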

      Consistent with Nicolle et al. (2012), the representation of one’s own utility (labelled as ‘Own subjective value’) and that of the observer (‘Observer’s subjective value’) was organized along the ventral-to-dorsal axis of the mPFC. Specifically, utility signals from the participant’s own perspective (SV<sub>chosen, self</sub> – SV<sub>unchosen, self</sub>) were most prominently represented in the ventral-most ROIs (blue), whereas utility signals from the observer’s perspective (SV<sub>chosen, observer</sub> – SV<sub>unchosen, observer</sub>) were most strongly represented in the dorsal-most ROIs (orange).

      (3) Additionally, the authors may be able to detect neural signals related to conflict when the decisions of the individual and the observer differ, compared to when the decisions are congruent. These neural signatures would only be present if social influences are integrated at the choice level, as suggested by the authors.

      If individuals simulate the choices that others might make, they may compare them with the choices they would have made themselves. To investigate this possibility, we categorized task trials as Conflict or No-conflict trials based on greedy choice predictions derived from a softmax decision rule. Conflict trials were those in which the choice predicted from the participant’s own risk preference differed from that predicted for the observer, whereas No-conflict trials involved the same predicted choice from both perspectives. A contrast between Conflict and No-conflict trials revealed that the dACC and dlPFC—regions previously associated with conflict monitoring and cognitive control (Shenhav et al., 2013)—were sensitive to differences in choice tendencies between the self and observer perspectives.

      Author response image 1.

      dACC and dlPFC are associated with the discrepancy between participants’ own choice tendencies and those of observing partners, as estimated based on prior beliefs about the partners’ risk preferences.

      As the reviewer suggested, these results provide evidence in support of the Social Reliance model, which posits that participants simulate the observer's choice and integrate it with their own.

      (4) Incorporating these additional analyses would provide stronger evidence for distinguishing between the models.

      We again thank the reviewer for these constructive suggestions. Based on the new set of analyses and results, we have made the necessary revisions as noted above. We agree that these revisions provide stronger evidence for distinguishing between the models.

      Reviewer #3 (Recommendations for the authors):

      (1) Anatomically it would be helpful to more explicitly distinguish between dmPFC and vmPFC. Particularly at the end of the introduction when mPFC and vmPFC are distinguished, as the vmPFC is in the mPFC.

      We appreciate the reviewer’s suggestion regarding the anatomical distinction between the dmPFC and vmPFC, particularly in relation to our use of the term “mPFC.” We acknowledge that the dmPFC and vmPFC are subregions of the broader mPFC. In our original manuscript, we referred to one region as mPFC in line with prior studies highlighting its role in social cognition and contextual processing (Behrens et al., 2008; Sul et al., 2015; Wittmann et al., 2016). However, in response to the reviewer’s comment and to more clearly distinguish this region from the ventral portion of the mPFC (i.e., vmPFC), which is canonically associated with subjective valuation, we have now revised the manuscript to refer to this region as the dmPFC. This terminology better reflects its association with social cognition, including model-estimated social reliance and sensitivity to social cues in our study.

      (2) The authors' definition of ROIs could be elaborated on further. They suggest that peaks are selected from neurosynth for different terms, but were there not multiple peaks identified within a functional or anatomical brain area? This section could be strengthened by confirming with anatomical ROIs where available, such as the atlases here http://www.rbmars.dds.nl/lab/CBPatlases.html and the Harvard-Oxford atlases.

      We appreciate the opportunity to clarify how our ROIs were defined. To identify the ROIs, we drew upon both prior literature and results from a term-based meta-analysis using Neurosynth. For each meta-map, we applied an FDR-corrected threshold of p < 0.01 and a cluster extent threshold of k ≥ 100 voxels to identify distinct functional clusters. For each cluster, we constructed a spherical ROI (radius = 10 mm) centered on its center of gravity. Note that for each anatomically distinct brain region, only a single center of gravity was identified and used to define the ROI. The resulting ROIs were subsequently used for small volume correction (SVC) in the second-level fMRI analyses.
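
      For illustration, the sphere construction described above can be sketched in a few lines. This is not the authors' code: the helper names are hypothetical, the cluster voxels are made up, and the center of gravity is assumed here to be the unweighted mean of the voxel coordinates.

```python
import math

def center_of_gravity(voxels_mm):
    """Unweighted mean coordinate of a cluster (an assumption for this sketch)."""
    n = len(voxels_mm)
    return tuple(sum(v[i] for v in voxels_mm) / n for i in range(3))

def in_sphere(point_mm, center_mm, radius_mm=10.0):
    """True if a point lies within the spherical ROI around the center."""
    return math.dist(point_mm, center_mm) <= radius_mm

# Hypothetical cluster voxels (mm), chosen so the mean is easy to verify.
cluster = [(-6, 38, -10), (0, 38, -10), (-3, 35, -10), (-3, 41, -10)]
cog = center_of_gravity(cluster)
print(cog)
print(in_sphere((-3, 38, 0), cog), in_sphere((-3, 38, 1), cog))
```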

      For brain regions associated with decision-making processes, we obtained a meta-analytic activation map associated with the term “decision” from Neurosynth. After applying an FDR-corrected threshold of p < 0.001 and a cluster extent threshold of k ≥ 100 voxels, we identified five distinct clusters: vmPFC [x = -3, y = 38, z = -10]; right vStr [x = 12, y = 11, z = -7]; left vStr [x = -12, y = 8, z = -7]; dACC [x = 3, y = 26, z = 44]; and left Insula [x = -30, y = 23, z = -1]. To identify brain regions involved in decision-making under social observation, we used the Neurosynth meta-map associated with the term “social”, applying the same criteria (FDR p < 0.001, k ≥ 100). This analysis revealed several clusters, including bilateral TPJ: right TPJ [x = 51, y = -52, z = 14]; left TPJ [x = -51, y = -58, z = 17]. To isolate brain regions more specifically associated with social processing rather than valuation, we also constructed a conjunction map using the meta-maps for the terms “social” and “value.” We identified clusters present in the “social” map, but not in the “value” map. This analysis yielded, among others, a cluster in the dmPFC [x = 0, y = 50, z = 14].

      To clarify our ROI analysis methods, we have now revised the manuscript to include more detailed information about the procedures used, as follows:

      [Page 19, Line 746]

      Region-of-interest (ROI) analyses. To define ROIs for the neural analyses conducted in the Observed phase, we used significant clusters identified during the Solo phase. Specifically, regions showing significant activation for Prob(chosen) in the DM0 (thresholded at P < 0.001) were selected as ROIs. Three ROI clusters were defined: the vStr (peak voxel at [x = 3, y = 14, z = -10], k<sub>E</sub> = 9), vmPFC (peak voxel at [x = –3, y = 62, z = –13], k<sub>E</sub> = 99), and dACC (peak voxel at [x = 12, y = 32, z = 29], k<sub>E</sub> = 118). These ROIs were then applied in the Observed phase analyses to test whether similar neural representations are also engaged in social contexts.

      Term-based meta-analytic maps from Neurosynth for small volume correction. To reduce the likelihood of false positives arising from random significant activations and to enhance sensitivity within regions of theoretical interest, small volume correction (SVC) was applied using term-based meta-analytic maps from Neurosynth. This approach allows for hypothesis-driven correction by restricting statistical testing to anatomically and functionally defined ROIs. Specifically, three meta-analytic maps were generated using Neurosynth’s term-based analyses (Yarkoni et al., 2011), with a false discovery rate (FDR) corrected P < 0.01 and a cluster size > 100 voxels. For each resulting cluster, we defined a spherical ROI with a 10 mm radius centered on the cluster’s center of gravity. For each anatomically distinct brain region, only a single center of gravity was identified and used to define the corresponding ROI.

      First, to identify regions encoding final decision probabilities during the Solo phase and enhance sensitivity, we used the meta-map associated with the term “decision” to identify neural substrates of value-based decision-making. This yielded three clusters: vmPFC ([x = -3, y = 38, z = -10]), vStr ([x = 12, y = 11, z = -7]), and dACC ([x = 3, y = 26, z = 44]) (Fig. 3a, S7). Second, to examine social processing during the Observed phase, we used the meta-map associated with the term “social” to identify brain regions typically involved in social cognition. This analysis revealed clusters, including the rTPJ ([x = 51, y = -52, z = 14]) and lTPJ ([x = -51, y = -58, z = 17]) (Fig. 3c, S8a). Third, to define an ROI involved in processing social cues independent of valuation, we used a meta-map associated with “social” but excluding “value”, isolating regions specific to non-valuation-related social cognition. This analysis revealed a cluster, including the dmPFC ([x = 0, y = 50, z = 14]) (Fig. 3d, 4a, S8b).

      (3) How did the authors ensure there were enough trials to generate a reliable BOLD signal? The scanned part of the study seems relatively short.

      We appreciate the reviewer’s concern regarding the number of trials and the potential implications for the reliability of the resulting BOLD signals. While we did not conduct formal statistical tests to determine the optimal number of trials, our task design, in general, followed well-established principles in functional neuroimaging. Specifically, we employed a jittered event-related design and used both temporal and dispersion derivatives in the GLM analyses. These strategies are widely recognized for enhancing the efficiency of BOLD signal deconvolution and improving model fit by accounting for inter-subject and inter-regional variability in the hemodynamic response function (HRF). Furthermore, the number of trials per condition in our study was comparable to those reported in previous publications (20-30 trials) that employed similar gambling paradigms to examine individual differences in the neural substrates of value-based decision-making (Chung et al., 2015; Chung et al., 2020).

      (4) It would be helpful to add whether any brain areas survived whole-brain correction.

      No brain regions survived whole-brain correction. Nevertheless, as described in the introduction, we had strong a priori hypotheses. Based on these hypotheses, we defined term-based ROIs using Neurosynth, and conducted small volume correction analyses. Per the reviewer’s suggestion, we have added information indicating that no brain regions survived whole-brain correction, as follows:

      [Page 8, Line 281]

      No additional regions survived whole-brain correction.

      (5) There is a concern that mediation cannot be used to make causal inferences and much larger samples are needed to support claims of mediation. The authors should change the term mediation in order to not imply causality (they could talk about indirect effects instead) and highlight that the mediation analyses are exploratory as they would not be sufficiently powered (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2843527/).

      We acknowledge the reviewer’s concerns regarding the causal interpretation of mediation analysis results. Per this comment, we have revised the manuscript as follows to avoid overinterpreting these results and to refrain from implying any causal inference.

      [Page 9, Line 327]

      Given that our sample size is smaller than the recommended threshold for detecting mediation effects (Fritz & MacKinnon, 2007), this significant indirect effect should be interpreted with caution, particularly with respect to causal inference.

      (6) The authors may want to speculate on lifespan differences in this susceptibility to risk preferences given recent evidence that older adults are relatively more susceptible to impulsive social influence (Zhu et al, 2024, comms psychology).

      We thank the reviewer for the thoughtful suggestion—we believe the referenced work is Zhilin Su et al. (2024). As noted in our manuscript, all participants in the current study were young adults aged between 18 and 29 years. Given this limited age range, our dataset does not provide sufficient variability to directly examine age-related differences across the lifespan. However, we are planning a follow-up study using the same task with older adult participants, which we believe will provide a valuable opportunity to address this important gap in understanding susceptibility to social influence across the lifespan.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for authors):

      (1) Motivation for studying SUL1 in RLS

      Considering that the regulation of cellular metabolism in response to nutrient availability is crucial for cell survival and lifespan, and that several organic nutrient transporters have been implicated in aging, we believe that transporters of specific nutrients can transduce signals downstream to control genes responsible for survival. However, the impact of inorganic nutrient transporters, including those for phosphate and sulfate, on longevity remains largely unexplored. In addition, other work from our group used a LASSO model derived from multi-omics data related to yeast aging and identified SUL1 as a key candidate for regulating lifespan, which prompted our interest.

      (2) Discrepancy with prior RLS data (PMID: 26456335)

      The previous study (PMID: 26456335) reported a limited number of experimental cells (n=25), which may have contributed to the observed variability in results. To enhance the reliability of our work, we have expanded the number of experimental cells for the sul1Δ strain to 400 (see Figure 1A) and for the other mutant strains to 200 (see Figure 1B). This confirms the reproducibility of the lifespan extension observed in the sul1Δ strain.

      (3) Mechanistic link between sulfate transport and lifespan

      Sulfate absorption assays were performed on the WT, SUL1Δ, SUL2Δ, and SUL1<sup>E427Q</sup> strains (Figure 1C). Compared to the wild type (WT), the SUL1Δ, SUL2Δ, and SUL1<sup>E427Q</sup> strains exhibited delayed intracellular sulfate uptake. However, there was no significant difference in the final concentration of intracellular sulfur ions among the groups. This result reinforces our conclusion that the extended lifespan of SUL1Δ is not associated with sulfate transport.

      (4) Testing the RLS of SUL1ΔMSN4Δ double mutants

      The replicative lifespan data for the SUL1ΔMSN4Δ double mutant were further analyzed (shown in the following supplementary figure). It was observed that the extension of the SUL1Δ lifespan was not rescued by the knockout of MSN4, supporting the hypothesis that MSN2 may serve as the downstream transcription factor responsible for the increased lifespan of SUL1Δ.

      Author response image 1.

      Replicative life span of MSN4 deletion mutants in WT and SUL1Δ strains.

      Reviewer #2 (Recommendations for authors):

      (1) Inconsistent WT lifespan in Figure 1B

      All lifespan measurements were conducted under controlled conditions (30°C, 2% glucose). The revised Figure 1C illustrates that across three independent experiments (n=200 cells), the average lifespan of wild-type (WT) cells was 29.1 generations, which is comparable to the average lifespan of 25.6 generations reported in Figure 1A after data expansion (n=400 cells). The remaining difference may be attributed to experimental variability across multiple trials; however, it does not compromise the validity of our conclusions.

      (2) Sulfate level measurements

      Intracellular sulfate levels were measured by quantitatively assessing the sulfate concentrations in wild-type (WT), SUL1Δ, SUL2Δ, and SUL1<sup>E427Q</sup> cells, as detailed in the methods section (Figure 1C). The results indicated that all mutant strains showed delayed sulfur uptake, but there was no significant difference in the final concentration of intracellular sulfur ions among the groups.

      (3) RNA-seq for non-lifespan-extending mutants

      RNA-seq data for the SUL2Δ and SUL1<sup>E427Q</sup> mutants can be found in Supplementary Figure 1. These mutants do not exhibit a significant upregulation of stress-response genes, such as HSP12 and TPS1, which reinforces the specificity of the pathways induced by SUL1Δ.

      (4) Improved Msn2/4 imaging

      Figure 3C and Supplementary Figure 4A present high-resolution confocal images (using a 63× objective lens) of cell nuclei labeled with MSN2-GFP and DAPI. The GFP intensity within the nucleus was normalized against the DAPI signal to account for differences in nuclear size.

      ​​Reviewer #3 (Recommendations for authors):

      (1) Nuclear size normalization

      The verification data for MSN2 and MSN4 were re-evaluated through DAPI signal normalization. The revised figures are presented in Figure 3C and Supplementary Figure 4A.

      (2) Strain nomenclature

      All strain names (e.g., SUL1Δ) were updated to follow SGD guidelines.

      (3) Grammar and formatting

      We have carefully revised the text to improve readability, and the manuscript was proofread by a native English speaker. Citations (e.g., "trehalose (Lillie and Pringle, 1980)") and spacing errors were corrected.

      (4) Microscopy resolution

      In the revised figures (Figures 3C, 3E, 4B, 4E, Supplementary Figures 3A, 4A, 4C), all fluorescence images are displayed as separate channels (EGFP, DAPI, BF). Scale bars and arrows have been added to the figures for clarity.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The authors use electrophysiological and behavioral measurements to examine how animals could reliably determine odor intensity/concentration across repeated experiences. Because stimulus repetition leads to short-term adaptation evidenced by reduced overall firing rates in the antennal lobe and firing rates are otherwise concentration-dependent, there could be an ambiguity in sensory coding between reduced concentration or more recent experience. This would have a negative impact on the animal's ability to generate adaptive behavioral responses that depend on odor intensities. The authors conclude that changes in concentration alter the constituent neurons contributing to the neural population response, whereas adaptation maintains the 'activated ensemble' but with scaled firing rates. This provides a neural coding account of the ability to distinguish odor concentrations even after extended experience. Additional analyses attempt to distinguish hypothesized circuit mechanisms for adaptation but are inconclusive. A larger point that runs through the manuscript is that overall spiking activity has an inconsistent relationship with behavior and that the structure of population activity may be the more appropriate feature to consider.

      To my knowledge, the dissociation of effects of odor concentration and adaptation on olfactory system population codes was not previously demonstrated. This is a significant contribution that improves on any simple model based on overall spiking activity. The primary result is most strikingly supported by visualization of a principal components analysis in Figure 4. However, there are some weaknesses in the data and analyses that limit confidence in the overall conclusions.

      We thank the reviewer for evaluating our work and highlighting its strengths and deficiencies. We have revised the manuscript with expanded behavioral datasets and additional analyses that we believe convincingly support our conclusion. 

      (1) Behavioral work interpreted to demonstrate discrimination of different odor concentrations yields inconsistent results. Only two of the four odorants follow the pattern that is emphasized in the text (Figure 1F). Though it's a priori unlikely that animals are incapable of distinguishing odor concentrations at any stage in adaptation, the evidence presented is not sufficient to reach this conclusion.

      We have expanded our dataset and now show that the behavioral response is significantly different for high and low concentration exposures of the same odorant. This was observed for all four odorants in our study (refer to Revised Fig. 1F).

      (2) While conclusions center on concepts related to the combination of activated neurons or the "active ensemble", this specific level of description is not directly demonstrated in any part of the results. We see individual neural responses and dimensional reduction analyses, but we are unable to assess to what extent the activated ensemble is maintained across experience.

      We have done several additional analyses (see provisional response). Notably, we have corroborated our dimensionality reduction and correlation analysis results with a quantitative classification analysis that convincingly demonstrates that odor identity and intensity of the odorant can be decoded from the ensemble neural activity, and this could be achieved in an adaptation-invariant fashion (refer to Revised Supplementary Fig. 4). 

      (3) There is little information about the variance or statistical strength of results described at the population level. While the PCA presents a compelling picture, the central point that concentration changes and adaptation alter population responses across separable dimensions is not demonstrated quantitatively. The correlation analysis that might partially address this question is presented to be visually interpreted with no additional testing.

      We have included a plot that compares the odor-evoked responses across all neurons (mean ± variance) at both intensity levels for each odorant (Revised Supplementary Fig. 5). This plot clearly shows how the ensemble neural activity profile varies with odor intensity and how these response patterns are robustly maintained across trials. 

      (4) Results are often presented separately for each odor stimulus or for separate datasets including two odor stimuli. An effort should be made to characterize patterns of results across all odor stimuli and their statistical reliability. This concern arises throughout all data presentations.

      We had to incorporate a 15-minute window between presentations of odorants to reset adaptation. Because of this, we were unable to record extracellular responses to all four odorants at two intensities in a single experiment (~3.5 hours of recording for just 2 odorants at two intensities, with one odorant at the higher intensity repeated at the end; Fig. 2a). Therefore, we recorded two datasets. Each dataset captured the responses of ~80 PNs to two odorants at two intensities, with one odorant at the higher concentration repeated at the end of the experiment to show the repeatability of changes due to adaptation.

      (5) The relevance of the inconclusive analysis of inferred adaptation mechanisms in Figure 2d-f and the single experiment including a complex mixture in Figure 7 to the motivating questions for this study are unclear.

      Figure 2d-f has been revised. While we agree that the adaptation mechanisms are not fully clear, there is a trend that the most active PNs are the neurons that change the most across trials. This change and the response in the first trial are negatively correlated, indicating that vesicle depletion could be an important contributor to the observed results. However, neurons that adapt strongly at higher intensities are not the ones that adapt at lower intensities. This complicates the understanding of how neural responses vary with intensities and the adaptation that happens due to repetition. This has been highlighted in the revised manuscript. 

      Regarding Figure 7, we wanted to examine the odor-specificity of the changes that happen due to repeated encounters of an odorant. Specifically, we wondered whether the neural response reduction and behavioral enhancements reflected a global, non-specific state change in the olfactory system brought about by the repetition of any odorant, or whether the observed neural and behavioral response changes were odor-specific.

      (6) Throughout the description of the results, typical standards for statistical reporting (sample size, error bars, etc.) are not followed. This prevents readers from assessing effect sizes and undermines the ability to assign a confidence to any particular conclusion.

      We have revised the manuscript to fix these issues and included sample size and error bars in our plots.  

      Reviewer #2 (Public Review):

      Summary:

      The authors' main goal was to evaluate how both behavioral responses to odor, and their early sensory representations are modified by repeated exposure to odor, asking whether the process of adaptation is equivalent to reducing the concentration of an odor. They open with behavioral experiments that actually establish that repeated odor presentation increases the likelihood of evoking a behavioral response in their experimental subjects - locusts. They then examine neural activity patterns at the second layer of the olfactory circuit. At the population level, repeated odor exposure reduces total spike counts, but at the level of individual cells there seems to be no consistent guiding principle that describes the adaptation-related changes, and therefore no single mechanism could be identified.

      Both population vector analysis and pattern correlation analysis indicate that odor intensity information is preserved through the adaptation process. They make the closely related point that responses to an odor in the adapted state are distinct from responses to lower concentration of the same odor. These analyses are appropriate, but the point could be strengthened by explicitly using some type of classification analysis to quantify the adaptation effects. e.g. a confusion matrix might show if there is a gradual shift in odor representations, or whether there are trials where representations change abruptly.

      Strengths:

      One strength is that the work has both behavioral read-out of odor perception and electrophysiological characterization of the sensory inputs and how both change over repeated stimulus presentations. It is particularly interesting that behavioral responses increase while neuronal responses generally decrease. Although the behavioral effect could occur fully downstream of the sensory responses the authors measure, at least those sensory responses retain the core features needed to drive behavior despite being highly adapted.

      Weaknesses:

      Ultimately no clear conceptual framework arises to understand how PN responses change during adaptation. Neither the mechanism (vesicle depletion versus changes in lateral inhibition) nor even a qualitative description of those changes. Perhaps this is because much of the analysis is focused on the entire population response, while perhaps different mechanisms operate on different cells making it difficult to understand things at the single PN level.

      From the x-axis scale in Fig 2e,f it appeared to me that they do not observe many strong PN responses to these stimuli, everything being < 10 spikes/sec. So perhaps a clearer effect would be observed if they managed to find the stronger responding PNs than captured in this dataset.

      We thank the reviewer for his/her evaluation of our work. Indeed, our work does not clarify the mechanism that underlies the adaptation over trials, or how this mechanism accounts for the adaptation observed at two different intensities of the same odorant. However, as we highlight in the revised manuscript, there is some evidence for the vesicle depletion hypothesis. For the plots shown in Fig. 2, the firing rates were calculated after averaging across time bins and trials, which explains the lower firing rates. The peak firing rates of the most active neurons are ~100 Hz. So, we are certain that we are collecting responses from a representative ensemble of neurons in this circuit.

      Reviewer #3 (Public Review):

      Summary:

      How does the brain distinguish stimulus intensity reduction from response reductions due to adaptation? Ling et al study whether and how the locust olfactory system encodes stimulus intensity and repetition differently. They show that these stimulus manipulations have distinguishable effects on population dynamics.

      Strengths:

      (1) Provides a potential strategy with which the brain can distinguish intensity decrease from adaptation. While both conditions reduce overall spike counts, intensity decrease can also change which neurons are activated, whereas adaptation only changes the response magnitude without changing the active ensemble.

      (2) By interleaving a non-repeated odor, they show that these changes are odor-specific and not a non-specific effect.

      (3) Describes how proboscis orientation response (POR) changes with stimulus repetition. Unlike the spike counts, POR increases in probability with stimulus. The data portray the variability across subjects in a clear way.

      We thank the reviewer for the summary and for highlighting the strengths of our work.

      Weaknesses:

      (1) Behavior

      a. While the "learning curve" of the POR is nicely described, the behavior itself receives very little description. What are the kinematics of the movement, and do these vary with repetition? Is the POR all-or-nothing or does it vary trial to trial?

      The behavioral responses were monitored in unconditioned/untrained locusts. Hence, these are innate responses to the odorants. These innate responses are usually brief and occur after the onset of the stimulus. However, there is variability across locusts and trials (refer to Revised Supplementary Fig. 1). When the same odorant is conditioned with a food reward, the POR responses become more stereotyped and occur rapidly, within a few hundred milliseconds.

      Author response image 1.

      POR response dynamics in a conditioned locust. The palps were painted in this case (left panel), and the distance between the palps was tracked as a function of time (right panel).

      b. What are the reaction times? This can constrain what time window is relevant in the neural responses. E.g., if the reaction time is 500 ms, then only the first 500 ms of the ensemble response deserves close scrutiny. Later spikes cannot contribute.

      This is an interesting point. We had done this analysis for conditioned POR responses. For innate POR, as we noted earlier, there is variability across locusts. Many responses occur rapidly after odor onset (<1 s), while some responses do occur later during odor presentation and in some cases after odor termination. It is important to note that these dynamical aspects of the POR response, while super interesting, should occur at a much faster time scale compared to the adaptation that we are reporting across trials or repeated encounters of an odorant.

      c. The behavioral methods are lacking some key information. While references are given to previous work, the reader should not be obligated to look at other papers to answer basic questions: how was the response measured? Video tracking? Hand scored?

      We agree and apologize for the oversight. We have revised the methods and added a video to show the POR responses. Videos were hand-scored. 

      d. Can we be sure that this is an odor response? Although airflow out of the olfactometer is ongoing throughout the experiment, opening and closing valves usually creates pressure jumps that are likely to activate mechanosensors in the antennae.

      Interesting. We have added a new Supplementary Fig. 2 that shows that the POR even to presentations of paraffin oil (solvent; control) is negligible. This should confirm that the POR is a behavioral response to the odorant.

      Furthermore, all other potential confounds identified by the reviewer are present for every odorant and every concentration presented.  However, the POR varies in an odor-identity and intensity-specific manner. 

      e. What is the baseline rate of PORs in the absence of stimuli?

      Almost zero. 

      f. What can you say about the purpose of the POR? I lack an intuition for why a fly would wiggle the maxillary palps. This is a question that is probably impossible to answer definitively, but even a speculative explanation would help the reader better understand.

      The locusts use these finger-like maxillary palps to grab a grass blade while eating. Hence, we believe that this might be a preparatory response to feeding. We have noted that the PORs are elicited more by food-related odorants. Hence, we think it is a measure of odor appetitiveness. This has been added to the manuscript. 

      (2) Physiology

      a. Does stimulus repetition affect "spontaneous" activity (i.e., firing in the interstimulus interval)? To study this question, in Figures 2b and c, it would be valuable to display more of the prestimulus period, and a quantification of the stability or lability of the inter-stimulus activity.

      Done. Yes, the spontaneous activity does appear to change in an odor-specific manner. We have done some detailed analysis of the same in this preprint:

      Ling D, Moss EH, Smith CL, Kroeger R, Reimer J, Raman B, Arenkiel BR. Conserved neural dynamics and computations across species in olfaction. bioRxiv [Preprint]. 2023 Apr 24:2023.04.24.538157. doi: 10.1101/2023.04.24.538157. PMID: 37162844; PMCID: PMC10168254

      b. When does the response change stabilize? While the authors compare repetition 1 to repetition 25, from the rasters it appears that the changes have largely stabilized after the 3rd or 4th repetition. In Figure 5, there is a clear difference between repetition 1-3 or so and the rest. Are successive repetitions more similar than more temporally-separated repetitions (e.g., is rep 13 more similar to 14 than to 17?). I was not able to judge this based on the dendrograms of Figure 5. If the responses do stabilize as it appears, it would be more informative to focus on the dynamics of the first few repetitions.

      The reviewer makes an astute observation. Yes, the changes in firing rates are larger in the first three trials (Fig. 3c). The ensemble activity patterns, though, are relatively stable across all trials as indicated by the PCA plots and classification analysis results.

      Author response image 2.

      Correlation as a function of trial number. All correlations were made with respect to the odor-evoked responses in the last odor trial of hex(H) and bza(H).
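
      The trial-by-trial comparison described in this caption can be illustrated with a minimal sketch. It assumes the odor-evoked responses are arranged as a trials-by-neurons array of firing rates and uses the Pearson correlation against the last trial; the function name, array layout, and synthetic data are hypothetical and are not taken from the authors' analysis code.

```python
import numpy as np

def correlation_to_reference(responses, ref_index=-1):
    """Pearson correlation of each trial's population vector with a reference trial.

    responses: array of shape (n_trials, n_neurons), assumed layout.
    ref_index: index of the reference trial (default: last trial).
    Note: a trial with zero variance across neurons would give a division by zero.
    """
    responses = np.asarray(responses, dtype=float)
    # Center each trial's population vector across neurons.
    centered = responses - responses.mean(axis=1, keepdims=True)
    ref = centered[ref_index]
    numerator = centered @ ref
    denominator = np.linalg.norm(centered, axis=1) * np.linalg.norm(ref)
    return numerator / denominator

# Synthetic example: 25 trials, 40 neurons, with early trials deviating more
# from a stable pattern than later ones (illustrative data only).
rng = np.random.default_rng(0)
stable_pattern = rng.normal(size=40)
noise_scale = np.linspace(1.0, 0.1, 25)[:, None]
responses = stable_pattern + noise_scale * rng.normal(size=(25, 40))
corr = correlation_to_reference(responses)
```

      Plotting `corr` against trial number gives a curve of the kind described in the caption; by construction, the value for the reference trial itself is 1.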

      c. How do temporal dynamics change? Locust PNs have richly varied temporal dynamics, but how these may be affected is not clear. The across-population average is poorly suited to capture this feature of the activity. For example, the PNs often have an early transient response, and these appear to be timed differently across the population. These structures will be obscured in a cross population average. Looking at the rasters, it looks like the initial transient changes its timing (e.g., PN40 responses move earlier; PN33 responses move later.). Quantification of latency to first spike after stimulus may make a useful measure of the dynamics.

      As noted earlier, to keep our story simple in this manuscript, we have only focused on the variations across trials (i.e., much slower response dynamics). We did this as we are not recording neural and behavioral responses from the same locust. We plan to do this and directly compare the neural and behavioral dynamics in the same locust.

      d. How legitimate is the link between POR and physiology? While their changes can show a nice correlation, the fact that the data were taken from separate animals makes them less compelling than they would be otherwise. How feasible is it to capture POR and physiology in the same prep?

      This would be most helpful, but I suspect may be too technically challenging to be within scope.

      The antennal lobe activity is the input about the volatile chemicals encountered by the locust. The POR is a behavioral output. Hence, we believe that examining the correlation between the olfactory system's input and output is a valid approach. However, we have only compared the mean trends in neural and behavioral datasets, and dynamics on a much slower timescale. We are currently developing the capability to record neural responses in behaving animals. This turned out to be a bit more challenging than we had envisioned. We plan to do fine-grained comparisons of the neural and behavioral dynamics, recommended by this reviewer, in those preparations.

      Further, we will also be able to examine whether the variability in behavioral responses could be predicted from neural activity changes in that prep.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      This manuscript investigated the mechanism underlying boundary formation necessary for proper separation of vestibular sensory end organs. In both chick and mouse embryos, it was shown that a population of cells abutting the sensory (marked by high Sox2 expression)/nonsensory (marked by Lmx1a expression) cell populations undergo apical expansion, elongation, alignment and basal constriction to separate the lateral crista (LC) from the utricle. Using the Lmx1a mouse mutant, organ cultures, and pharmacological and viral-mediated Rock inhibition, it was demonstrated that the Lmx1a transcription factor and Rock-mediated actomyosin contractility are required for boundary formation and LC-utricle separation.

      Strengths:

      Overall, the morphometric analyses were done rigorously and revealed novel boundary cell behaviors. The requirement of Lmx1a and Rock activity in boundary formation was convincingly demonstrated.

      Weaknesses:

      However, the precise roles of Lmx1a and Rock in regulating cell behaviors during boundary formation were not clearly fleshed out. For example, phenotypic analysis of Lmx1a was rather cursory; it is unclear how Lmx1a, expressed in half of the boundary domain, controls boundary cell behaviors and prevents cell mixing between Lmx1a+ and Lmx1a- compartments. Well-established mechanisms and molecules for boundary formation were not investigated (e.g. differential adhesion via cadherins, cell repulsion via ephrin-Eph signaling). Moreover, within the boundary domain, it is unclear whether apical multicellular rosettes and basal constrictions are drivers of boundary formation, as the boundary can still form when these cell behaviors are inhibited. Involvement of other cell behaviors, such as radial cell intercalation and oriented cell division, also warrants consideration. With these lingering questions, the mechanistic advance of the present study is somewhat incremental.

      We have acknowledged the lingering questions this referee points out in our Discussion and agree that the roles of differential cell adhesion and cell intercalation would be worth exploring in further studies. Despite these remaining questions, the conceptual advances are significant, since this study provides the first evidence that a tissue boundary forms in between segregating sensory organs in the inner ear (there are only a handful of embryonic tissues in which a tissue boundary has been found in vertebrates) and highlights the evolutionary conservation of this process. This work also provides a strong descriptive basis for any future study investigating the mechanisms of tissue boundary formation in the mouse and chicken embryonic inner ear. 

      Reviewer #2 (Public review):

      Summary:

      Chen et al. describe the mechanisms that separate the common pan-sensory progenitor region into individual sensory patches, which presage the formation of the sensory epithelium in each of the inner ear organs. By focusing on the separation of the anterior and then lateral cristae, they find that long supra-cellular cables form at the interface of the pansensory domain and the forming cristae. They find that at these interfaces, the cells have a larger apical surface area, due to basal constriction, and Sox2 is down-regulated. Through analysis of Lmx1 mutants, the authors suggest that while Lmx1 is necessary for the complete segregation of the sensory organs, it is likely not necessary for the initial boundary formation, and the down-regulation of Sox2.

      Strengths:

      The manuscript adds to our knowledge and provides valuable mechanistic insight into sensory organ segregation. Of particular interest are the cell biological mechanisms: The authors show that contractility directed by ROCK is important for the maintenance of the boundary and segregation of sensory organs.

      Weaknesses:

      The manuscript would benefit from a more in-depth look at contractility - the current images of PMLC are not too convincing. Can the authors look at p or ppMLC expression in an apical view? Are they expressed in the boundary along the actin cables? Does Y-27632 inhibit this expression?

      The authors suggest that one role for ROCK is the basal constriction. I was a little confused about basal constriction. Are these the initial steps in the thinning of the intervening nonsensory regions between the sensory organs? What happens to the basally constricted cells as this process continues?

      In our hands, the PMLC immunostaining gave punctate staining in epithelial cells and was difficult to image and interpret in whole-mount preparations, which did not allow us to investigate its specific association with the actin-cable-like structures. It is a very valuable suggestion to try alternative methods of fixation to improve the quality of the staining and images in future work.

      The basal constriction of the cells at the border of the sensory organs was not always clearly visible in freshly-fixed samples, and was absent in the majority of short-term organotypic cultures in control medium, which made it impossible to ascertain the role of ROCK in its formation using pharmacological approaches in vitro (see Figure 7 and corresponding Result section).  On the other hand, the overexpression of a dominant-negative form of ROCK (RCII-GFP) in ovo using RCAS revealed a persistence of basal constriction in transfected cells despite a disorganisation of the boundary domain (Figure 8). We conclude from these experiments that ROCK activity is not necessary for the formation and maintenance of the basal constriction. We also remain uncertain about the exact role of this basal constriction. It could be either a cause or consequence of the expansion of the apical surface of cells in the boundary domain, it could contribute to the limitation of cell intermingling and the formation of the actin-cable-like structure at the interface of Lmx1a-expressing and non-expressing cells, and may indeed prefigure some of the further changes in cell morphology occurring in non-sensory domains separating the sensory organs (cell flattening and constrictions of the epithelial walls in between sensory organs). 

      The steps the authors explore happen after boundaries are established. This correlates with a down-regulation of Sox2, and the formation of a boundary. What is known about the expression of molecules that may underlie the apparent interfacial tension at the boundaries? Is there any evidence for differential adhesion or for Eph-Ephrin signalling? Is there a role for Notch signalling or a role for Jag1 as detailed in the group's 2017 paper?

      Great questions. It is indeed likely that some form of differential cell tension and/or adhesion participates in the formation and maintenance of this boundary, and we have mentioned in the discussion some of the usual suspects (cadherins, Eph/ephrin signalling, …), although it is beyond the scope of this paper to determine their roles in this context.

      As we have discussed in this paper and in our 2017 study (see also Ma and Zhang, Development, 2015 Feb 15;142(4):763-73. doi: 10.1242/dev.113662), we believe that Notch signalling maintains prosensory character, and that its down-regulation by Lmx1a/b expression is required for the specification of the non-sensory domains in between segregating sensory organs. Although we have not tested this directly in this study, any disruption in Notch signalling would be expected to indirectly affect the formation or maintenance of the boundary domain.

      A comment on whether cellular intercalation/rearrangements may underlie some of the observed tissue changes.

      We have not addressed this topic directly in the present study but we have included a brief comment on the potential implication of cellular intercalation and rearrangements in the discussion: “It is also possible that the repositioning of cells through medial intercalation could contribute to the straightening of the boundary as well as the widening of the nonsensory territories in between sensory patches.”

      The change in the long axis appears to correlate with the expression of Lmx1a (Fig 5d). The authors could discuss this more. Are these changes associated with altered PCP/Vangl2 expression?

      We are not sure about the first point raised by the referee. We have quantified cell elongation and orientation in Lmx1a-GFP heterozygous and homozygous (null) mice, and our results suggest that the elongation of the cells occurs throughout the boundary domain, and is probably not dependent on Lmx1a expression (boundary cells are in fact more elongated in the Lmx1a mutant).  We have not investigated the expression of components of the planar cell polarity pathway. This is a very interesting suggestion, worth exploring in further studies.

      Reviewer #3 (Public review):

      Summary:

      Lmx1a is an orthologue of apterous in flies, which is important for dorsal-ventral border formation in the wing disc. Previously, this research group has described the importance of the chicken Lmx1b in establishing the boundary between sensory and non-sensory domains in the chicken inner ear. Here, the authors described a series of cellular changes during border formation in the chicken inner ear, including alignment of cells at the apical border and concomitant constriction basally. The authors extended these observations to the mouse inner ear and showed that these morphological changes occurred at the border of Lmx1a positive and negative regions, and these changes failed to develop in Lmx1a mutants. Furthermore, the authors demonstrated that the ROCK-dependent actomyosin contractility is important for this border formation and blocking ROCK function affected epithelial basal constriction and border formation in both in vitro and in vivo systems.

      Strengths:

      The morphological changes described during border formation in the developing inner ear are interesting. Linking these changes to the function of Lmx1a and ROCK-dependent actomyosin contractile function is provocative.

      Weaknesses:

      There are several outstanding issues that need to be clarified before one could pin the morphological changes observed being causal to border formation and that Lmx1a and ROCK are involved.

      We have addressed the specific comments and suggestions of the reviewer below. We wish, however, to point out that we do not think that ROCK activity is required for the formation or maintenance of the basal constriction at the interface of Lmx1a-expressing and non-expressing cells (see previous answer to referee #2).

      Reviewer #1 (Recommendations for the authors):

      Specific comments:

      (1) Figures 1 and 2, and related text. Based on the whole-mount images shown, the anterior otocyst appeared to be a stratified epithelium with multiple cell layers. If so, it should be clarified whether the x-y views in the "apical" and "basal" planes are from cells residing in the apical and basal layers, respectively. Moreover, it would be helpful to include a "stage 4", a later stage to show if and when basal constrictions resolve.

      In fact, at these early stages of development, the otic epithelium is “pseudostratified”: it is formed by a single layer of irregularly shaped cells, each extending from the base to the apical aspect of the epithelium, but with their nuclei residing at distinct positions along this basal-apical axis as mitotic cells progress through the cell cycle. The nuclei divide at the surface of the epithelium, then move back to the most basal planes within daughter cells during interphase. This process, known as interkinetic nuclear migration, has been well described in the embryonic neural tube and occurs throughout the developing otic epithelium (e.g. Orr, Dev Biol. 1975, 47, 325-340; Ohta et al., Dev Biol. 2010 Sep 15;347(2):369–381. doi: 10.1016/j.ydbio.2010.09.002). Consequently, the nuclei visible in apical or basal planes in x-y views belong to cells extending from the base to the apex of the epithelium, but which are at different stages of the cell cycle.

      We have not included a late stage of sensory organ segregation in this study (apart from a P0 stage in the mouse inner ear, see Figure 4) since data about later stages of sensory organ morphogenesis are available in other studies, including our Mann et al. eLife 2017 paper describing Lmx1a-GFP expression in the embryonic mouse inner ear.

      (2) Related to above, the observed changes in cell organization raised the possibility that the apical multicellular rosettes and basal constrictions observed in Stage 3 (and 2) could be intermediates of radial cell intercalations, which would lead to expansion of the space between sensory organs and thinning of the boundary domains. To see if it might be happening, it would be helpful to include DAPI staining to show the overall tissue architecture at different stages and use optical reconstruction to assess the thickness of the epithelium in the presumptive boundary domain over time.

      We agree with this referee. Besides cell addition by proliferation and/or changes in cell morphology, radial cell intercalations could indeed contribute to the spatial segregation of inner ear sensory organs (a brief statement on this possibility was added to the Discussion). It is clear from images shown in Figure 4 (and from other studies) that the non-sensory domain separating the cristae from the utricle gets flatter and its cells also enlarge as development proceeds. We do not think that DAPI staining is required to demonstrate this. Perhaps the best way to show that radial cell intercalations occur would be to perform live imaging of the otic epithelium, but this is technically challenging in the mouse or chicken inner ear. An alternative model system might be the zebrafish inner ear, in which some live-imaging data have shown a progressive down-regulation of Jag1 expression during sensory organ segregation (and a flattening of “boundary domains”), suggesting a conservation of the basic mechanisms at play (Ma and Zhang, Development, 2015 Feb 15;142(4):763-73. doi: 10.1242/dev.113662).

      (3) Similarly, it would be helpful to include the DAPI counterstain in Figures 4, 7, and 8 to show the overall tissue architecture.

      We do not have DAPI staining for these particular images but in most cases, Sox2 immunostaining gives a decent indication of tissue morphology. 

      (4) Figure 2(z) and Figure 4d. The arrows pointing at the basal constrictions are obstructing the view of the basement membrane area, making it difficult to appreciate the morphological changes. They should be moved to the side. Can the authors comment whether they saw evidence for radial intercalations (e.g. thinning of the boundary domain) or partial unzippering of adjoining compartments along the basal constrictions?

      The arrows in Figure 2(z) and Figure 4d have been moved to the side of the panels. 

      See previous comment. Besides the presence of multicellular rosettes, we have not seen direct evidence of radial cell intercalation; this would be best investigated using live imaging. As development proceeds, the epithelial domain separating adjoining sensory organs becomes wider. The cells that compose it gradually enlarge and flatten, as can be seen for example at P0 in the mouse inner ear (Figure 4g).

      (5) Figures 3 and 5, and related text. It should be clarified whether the measurements were all taken from the surface cells. For Fig. 3e and 5d, the mean alignment angles of the cell long axis in the boundary regions should be provided in the text.

      The sensory epithelium in the otocyst is pseudostratified, hence, the measurement was taken from the surface of all epithelial cells labelled with F-actin. 

      We have added histograms representing the angular distribution of the cell long axis orientations in the boundary region to Figure 3 and Figure 5 Supplementary 1. We believe that this type of representation is more informative than the numerical value of the mean alignment angles of the cell long axis for defined sub-domains. 
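
      To make the comparison between an angular histogram and a single mean angle concrete, here is a minimal sketch. It assumes that cell long-axis orientations are axial data (an axis at 10° is the same as one at 190°), folds them into [0°, 180°), bins them, and computes an axial mean by the standard angle-doubling approach. The function names and the 15° bin width are illustrative assumptions, not the procedure used in the manuscript.

```python
import numpy as np

def orientation_histogram(angles_deg, bin_width=15):
    """Histogram of axial orientations folded into [0, 180) degrees."""
    folded = np.mod(np.asarray(angles_deg, dtype=float), 180.0)
    edges = np.arange(0, 180 + bin_width, bin_width)
    counts, edges = np.histogram(folded, bins=edges)
    return counts, edges

def axial_mean_deg(angles_deg):
    """Mean orientation of axial data, computed by doubling the angles."""
    doubled = np.deg2rad(2.0 * np.asarray(angles_deg, dtype=float))
    mean_doubled = np.arctan2(np.sin(doubled).mean(), np.cos(doubled).mean())
    return np.mod(np.rad2deg(mean_doubled) / 2.0, 180.0)

# Illustrative values only: 180 and 190 degrees fold onto 0 and 10 degrees.
counts, edges = orientation_histogram([0, 180, 190, -10, 90])
mean_axis = axial_mean_deg([80, 100])
```

      A histogram retains the spread and any bimodality of the orientations, which a single mean angle collapses; this is one reason to prefer a distribution over a summary value for defined sub-domains.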

      (6) It would be helpful to also quantify basal constrictions using the cell skeleton analysis. In addition, it would be helpful to show x-y views of cell morphology at the level of basal constrictions in the mouse tissue, similar to the chick otocyst shown in Figure 2.

      The data that we have collected do not allow a precise quantification of basal constrictions with cell skeleton analysis, due to the generally fuzzy nature of F-actin staining in the basal planes of the epithelium. However, we have followed the referee’s advice and analysed F-actin staining in x-y views in the Lmx1a-GFP knock-in (heterozygous) mice. We found that the first signs of basal F-actin enrichment and multicellular actin-cable-like structures at the interface of Lmx1a-positive and negative cells are visible at E11.5, and F-actin staining in the basal planes increases in intensity and extent at E13.5 (shown in new Figure 4 – Supplementary Figure 1).

      (7) Figure 5 and related text. It would be informative to analyze Lmx1a mutants at early stages (E11-E13) to pinpoint cell behavior defects during boundary formation.

      We chose the E15 stage because it is one at which we can unequivocally recognize and easily image and analyse the boundary domain from a cytoarchitectural point of view. We recognize that it would have been worth including earlier stages in this analysis but have not been able to perform these additional studies due to time constraints and unavailability of biological material. 

      (8) Figure 5-Figure S1, the quantifications suggest that Lmx1a loss had both cell-autonomous and non-autonomous effects on boundary cell behaviors. This is an interesting finding, and its implication should be discussed.

      It is well-known that the absence of Lmx1a function induces a very complex (and variable) phenotype in terms of inner ear morphology and patterning defects. It is also clear from this study that the absence of Lmx1 causes non-cell autonomous defects in the boundary domain and we have already mentioned this in the discussion: “Finally, the patterning abnormalities in Lmx1a<sup>GFP/GFP</sup> samples occurred in both GFP-positive and negative territories, which points at some type of interaction between Lmx1a-expressing and nonexpressing cells, and the possibility that the boundary domain is also a signalling centre influencing the differentiation of adjacent territories.”

      (9) Figure 6 and related text. To correlate myosin II activity with boundary cell behaviors, it would be important to immunolocalize pMLC in the boundary domain in whole-mount otocyst preparations from stage 1 to stage 3.

      We tried to perform the suggested immunostaining experiments, but in our hands at least, the antibody used did not produce good quality staining in whole-mount preparations. We have therefore included images of sectioned otic tissue, which show some enrichment in pMLC immunostaining at the interface of segregating organs (Figure 6).

      (10) Figures 7 and 8. A caveat of long-term Rock inhibition is that it can affect cell proliferation and differentiation of both sensory and non-sensory cells, which would cause secondary effects on boundary formation. This caveat was not adequately addressed. For example, does Rock signaling control either the rate or the orientation of cell division to promote boundary formation? Together with the mild effect of acute Rock inhibition, the precise role of Rock signaling in boundary formation remains unclear.

      We absolutely agree that the exact function of ROCK could not be ascertained in the in vitro experiments, for the reasons we have highlighted in the manuscript (no clear effect in short-term treatments, great level of tissue disorganisation in long-term treatments). This prompted us to turn to an in ovo approach. The picture remains uncertain in relation to the role of ROCK in regulating cell division/intercalation, but we have at least been able to show a requirement for the maintenance of an organized and regular boundary.

      (11) Figure 8. RCII-GFP likely also has non-autonomous effects on cell apical surface area. In 8d, it would be informative to include cell area quantifications of the GFP control for comparison.

      It is possible that some non-autonomous effects are produced by RCII-GFP expression, but these were not the focus of the present study and are not particularly relevant in the context of large patches of overexpression, as obtained with RCAS vectors. 

      We have added cell surface area quantifications of the control RCAS-GFP construct for comparison (Figure 8e).

      (12) The significance of the presence of cell divisions shown in Figure 9 is unclear. It would be informative to include some additional analysis, such as a) quantify orientation of cell divisions in and around the boundary domain and b) determine whether patterns of cell division in the sensory and nonsensory regions are disrupted in Lmx1a mutants.

      These are indeed fascinating questions, but they would require considerable work to answer and are beyond the scope of this paper.

      Minor comments:

      (1) Figure 1. It should be clarified whether e', h' and k' are showing cortical F-actin of surface cells. Do the arrowheads in i' and l' correspond to the position of either of the arrowheads in h' and k', respectively?

      The epithelium in the otocyst is pseudostratified. Therefore, images e’, h’, k’ display F-actin labelling on the surface of tissue composed of a single cell layer. We have added arrows to images e”, h”, and k” to indicate the corresponding position of z-projections and included appropriate explanation in the legend of Figure 1: “Black arrows on the side of images e”, h”, and k” indicate the corresponding position of z-projections.”

      (2) Figure 3-Figure S1. Please mark the orientation of the images shown.

      We have labelled the sensory organs in the figure so that the orientation of the images can be recognized.

      (3) Figure 4. Orthogonal reconstructions should be labeled (z) to be consistent with other figures.

      We have corrected the labelling in the orthogonal reconstruction to (z). 

      (4) Figure 4g. It is not clear what is in the dark area between the two bands of Lmx1a+ cells next to the utricle and the LC. Are those cells Lmx1a negative? It is unclear whether a second boundary domain formed or the original boundary domain split into two between E15 and P0? Showing the E15 control tissue from Figure 5 would be more informative than P0.

      In this particular sample there seems to be a folding of the tissue (visible in z-reconstructions) that could affect the appearance of the projection shown in 4g. We believe the P0 is a valuable addition to the E15 data, showing a slightly later stage in the development of the vestibular organs.

      (5) Figure 5a, e. Magnified regions shown in b and f should be boxed correspondingly.

      This figure has been revised. We realized that the previous low-magnification shown in (e) (now h) was from a different sample than the one shown in the high-magnification view. The new figure now includes the right low-magnification sample (in h) and the regions shown in the high-magnification views have been boxed.

      (6) Figure 8f, h, j. Magnified regions shown in g, i and k should be boxed correspondingly.

      The magnified regions were boxed in Figure 8 f, h, and j. Additionally, black arrows have been placed next to images 8g", 8i", and 8k" to highlight the positions of the z-projections. An appropriate explanation has also been added to the figure legend.

      (9) Figure 8. It would be helpful to show merged images of GFP and F-actin, to better appreciate cell morphology of GFP+ and GFP- cells.

      As requested, we have added images showing overlap of GFP and F-actin channels in Figure 8.

      Reviewer #2 (Recommendations for the authors):

      The PMLC staining could be improved. Two decent antibodies are the p-MLC and pp-MLC antibodies from CST. pp-MLC works very well after TCA fixation as detailed in https://www.researchsquare.com/article/rs-2508957/latest . As phalloidin does not work well after TCA fixation, afadin works very well for segmenting cells.

      If the authors do not wish to repeat the pMLC staining, the details of the antibody used should be mentioned.

      We used mouse IgG1 Phospho-Myosin Light Chain 2 (Ser19) from Cell Signaling Technology (catalogue number #3675) in our immunohistochemistry for PMLC. This is one of the two antibodies recommended by Reviewer #2. Information about this antibody has now been included in the Materials and Methods. This antibody has been referenced in many manuscripts, but unfortunately, in our hands at least, it did not perform well in whole-mount preparations.

      A statement on the availability of the data should be included.

      We have included a statement on the data availability: “All data generated or analysed during this study is available upon request.”

      Reviewer #3 (Recommendations for the authors):

      Outstanding issues:

      (1) Morphological description: The apical alignment of epithelial cells at the border is clear but not the upward pull of the basal lamina. Very often, it seems to be the Sox2 staining that shows the upward pull better than the F-actin staining. Perhaps, adding an anti-laminin staining to indicate the basement membrane may help.

      Indeed, the upward pull of the basement membrane is not always very clear. We performed some anti-laminin immunostaining on mouse cryosections and provide below (Figure 1) an example of such an experiment. The results appear to confirm an upward displacement of the basement membrane in the region separating the lateral crista from the utricle in the E13 mouse inner ear, but given the preliminary nature of these experiments, we believe that these results do not warrant inclusion in the manuscript. The term “pull” somewhat implies that the epithelial cells are responsible for the upward movement of the basement membrane, but since we do not have direct evidence that this is the case, we have replaced “pull” with “displacement” throughout the text.

      (2) It is not clear how well the cellular changes are correlated with the timing of border formation as some of the ages shown in the study seem to be well after the sensory patches were separated and the border was established.

      For some experiments (for example E15 in the comparison of mouse Lmx1a-GFP heterozygous and homozygous inner ear tissue; E6 for the RCAS experiments), the early stages of boundary formation are not covered because we decided to focus our analysis on the late consequences of manipulating Lmx1a/ROCK activity in terms of sensory organ segregation. The dataset is more comprehensive for the control developmental series in the chicken and mouse inner ear. 

      (3) The Lmx1a data, as they currently stand could be explained by Lmx1a being required for non-sensory development and not necessarily border formation. Additionally, the relationship between ROCK and Lmx1a was not investigated. Since the investigators have established the molecular mechanisms of Lmx1 function using the chicken system previously, the authors could try to correlate the morphological events described here with the molecular evidence for Lmx1 functioning during border formation in the same chicken system. Right now, only the expression of Sox2 is used to correlate with the cellular events, and not Lmx1, Jag1 or notch.

      These are valid points. Exploring in detail the epistatic relationships between Notch signalling/Lmx1a/ROCK/boundary formation in the chicken model would indeed be very interesting but would require extensive work using both gain- and loss-of-function approaches, combined with the analysis of multiple markers (Jag1/Sox2/Lmx1b/PMLC/F-actin, ...). At this point, and in agreement with the referee’s comment, we believe that Lmx1a is above all required for the adoption of the non-sensory fate. The loss of Lmx1a function in the mouse inner ear produces defects in the patterning and cellular features of the boundary domain, but these may be late consequences of the abnormal differentiation of the non-sensory domains that separate sensory organs. Furthermore, ROCK activity does not appear to be required for Sox2 expression (i.e. adoption or maintenance of the sensory fate), since the overexpression of RCII-GFP does not prevent Sox2 expression in the chicken inner ear. This fits with a model in which Notch/Lmx1a regulate cell differentiation whilst ROCK acts independently or downstream of these factors during boundary formation.

      Specific comments:

      (1) Figure 1. The downregulation of Sox2 is consistent between panels h and k, but not between panels e and h. The orthogonal sections showing basal constriction in h' and k' are not clear.

      The downregulation is noticeable along the lower edge of the crista shown in h; the region selected for the high-magnification view sits at an intermediate level of segregation (and Sox2 downregulation). 

      The basal constriction is not very clear in h, but becomes easier to visualize in k. We have displaced the arrow pointing at the constriction, which hopefully helps. 

      (2) Figure 2. Where was the Z axis taken from? One seems to be able to imagine the basal constriction better in the anti-Sox2 panel than the F-actin panel. A stain outlining the basement membrane better could help.

      Arrows have been added on the side of the horizontal views to mark the location of the z-reconstruction. See our previous replies to comments addressing the upward displacement of the basement membrane.

      (3) Figure 4

      I question the ROI being chosen in this figure, which seems to be in the middle of a triad between LC, prosensory/utricle and the AC, rather than between AC and LC. If so, please revise the title of the figure. This could also account for the better evidence of the apical alignment in the upper part of the f panel.

      We have corrected the text. 

      In this figure, the basal constriction is a little clearer in the orthogonal cuts, but it is not clear where these sections were taken from.

      We have added black arrows next to images 4c’, 4f’, and 4i’ to indicate the positions of the z-projections.

      By E13.5, the LC is a separate entity from the utricle, it makes one wonder how well the basal constriction is correlated with border formation. The apical alignment is also present by P0, which raises the question that the apical alignment and basal restriction may be more correlated with differentiation of non-sensory tissue rather than associated with border formation.

      We agree E13.5 is a relatively late stage, and the basal constriction was not always very pronounced. The new data included in the revised version include images of basal planes of the boundary domain at E11.5, which reveal F-actin enrichment and the formation of an actin-cable-like structure (Figure 4 suppl. Fig1). Furthermore, the chicken dataset shows that the changes in cell size, alignment, and the formation of actin-cable-like structure precede sensory patch segregation and are visible when Sox2 expression starts to be downregulated in prospective non-sensory tissue (Figure 1, Figure 2). Considering the results from both species, we conclude that these localised cellular changes occur relatively early in the sequence of events leading to sensory patch segregation, as opposed to being a late consequence of the differentiation of the non-sensory territories.  

      I don't follow the (x) cuts for panels h and i, as to where they were taken from and why there seems to be an epithelial curvature and what it was supposed to represent.

      We have added black arrows next to the panels 4c’, 4f’, and 4i’ to indicate the positions of the z-projections and modified the legend accordingly. The epithelial curvature is probably due to the folding of the tissue bordering the sensory organs during the manipulation/mounting of the tissue for imaging.

      (4) Figure 5 The control images do not show the apical alignment and the basal constriction well. This could be because of the age of choice, E15, was a little late. Unfortunately, the unclarity of the control results makes it difficult for illustrating the lack of cellular changes in the mutant. The only take-home message that one could extract from this figure is a mild mixing of Sox2 and Lmx1a-Gfp cells in the mutant and not much else. Also, please indicate the level where (x) was taken from.

      Black arrows have been placed next to images 5e and 5l to highlight the positions of the z-projections. The stage E15 chosen for analysis was appropriate to compare the boundary domains once segregation is normally completed. We believe the results show some differences in the cellular features of the boundary domain in the Lmx1a-null mouse, and we have in fact quantified this using Epitool in Figure 5 – Suppl. Fig 1. Cells are more elongated and better aligned in the Lmx1a-null than in the heterozygous samples.

      (5) Figure 7. I think the cellular disruption caused by the ROCK inhibitor, shown in q', is too severe to be able to pin to a specific effect of ROCK on border formation. In that regard, the ectopic expression of the dominant negative form of ROCK using RCAS approach is better, even though because it is a replication competent form of RCAS, it is still difficult to correlate infected cells to functional disruption.

      We used a replication-competent construct to induce a large patch of infection, increasing our chances of observing a defect in sensory organ segregation and boundary formation. We agree that this approach does not allow us to control the timing of overexpression, but the mosaicism in gene expression, allowing us to compare in the same tissue large regions with/without perturbed ROCK activity, proved more informative than the pharmacological/in vitro experiments.

      (6) Figure 8. Outline the ROI of i in h, and k in j. Outline in k the comparable region in k'. In k", F-actin staining is not uniform. Indicate where (x) was taken from in K.

      The magnified regions were boxed in Figure 8 f, h, and j. Region outlined in figures k’-k” has also been outlined in corresponding region in figure k. Additionally, black arrows have been placed next to images 8g", 8i", and 8k" to highlight the positions of the z-projections. An appropriate explanation has also been added to the figure legend.

      Minor comments:

      (1) P.18, 1st paragraph, extra bracket at the end of the paragraph.

      Bracket removed

      (2) P.22, line 11, in ovo may be better than in vivo in this case.

      We agree, this has been corrected. 

      (3) P.25, be consistent whether it is GFP or EGFP.

      Corrected to GFP.

      (4) P.26, line 5. Typo on "an"

      Corrected to “and”

      Author response image 1.

      Expression of Laminin and Sox2 in the E13 mouse inner ear. a-a’’’) Low magnification view of the utricle, the lateral crista, and the non-sensory (Sox2-negative) domain separating these. Laminin staining is detected at relatively high levels in the basement membrane underneath the sensory patches. At higher magnification (b-b’’’), an upward displacement of the basement membrane (arrow) is visible in the region of reduced Sox2 expression, corresponding to the “boundary domain” (bracket). 

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weaknesses:

      (1). Analysis of transcript expression is limited to the CT-peptide encoding gene, while no gene expression analysis was attempted for the three identified receptors. Differences in the activation of downstream signaling pathways between the three receptors are also questionable due to unclarities in the statistical analysis and variation in the control and experimental data in heterologous assays. Together, this makes it difficult to propose a mechanism underlying differences in the functions of the two CT-like peptides in muscle control and growth regulation.

      We appreciate the reviewer's rigorous critique. The manuscript has been comprehensively revised as follows:

      (1) For the expression analysis of the three identified receptors, the updated results are presented in Figure 5, with the detailed descriptions in Results section 2.4 (line 287-290) and Materials and Methods section 4.5 (line 767).

      (2) For the statistical tests and methodological clarity, statistical tests were indeed performed for all experiments. However, we acknowledge that the original labeling methods required enhanced methodological clarity, and we apologize for any confusion caused. All figures have been revised to improve the visibility of differences, and statistical test information has been added to both the figure legends and the Materials and methods section “4.10 Statistical Analysis” (line 900-910).

      (3) For the variation in the control and experimental data, the minor observed variations in control conditions across experiments primarily arise from two methodological factors: 1) Each experimental set used cells transfected with distinct receptor subtypes (e.g., AjPDFR1 vs. AjPDFR2), inherently introducing baseline variability due to differential receptor expression profiles. 2) Independent cell culture batches were employed for replicate experiments to ensure biological reproducibility. Importantly, these minor variations did not compromise the statistical significance of downstream signaling differences (p < 0.01 for all comparative analyses). Therefore, differences in the activation of downstream signaling pathways between the three receptors are reliable.

      (2) The authors also suggest a putative orexigenic role for the CT-like peptidergic system in feeding behavior. This effect is not well supported by the experimental data provided, as no detailed analysis of feeding behavior was carried out (only indirect measurements were performed that could be influenced by other peptidergic effects, such as on muscle relaxation) and no statistically significant differences were reported in these assays.

      Thank you for the reviewer’s valuable comments. Our revised manuscript now includes the following multidimensional analyses to strengthen evidence of the orexigenic role of AjCT2: Firstly, in sea cucumbers, the mass of remaining bait is a common indicator of feeding condition. After long-term AjCT2 injection, this value was significantly decreased in comparison with the control group during phase V (Figure 8A-figure supplement 1), which indicates that AjCT2 promotes feeding in A. japonicus. Correspondingly, in long-term loss-of-function experiments (newly added in the revised manuscript), the remaining bait in the siAjCTP1/2-1 group was significantly increased in comparison with the siNC group from phase II to IV (Figure 10B). The detailed descriptions of these supplementary experiments have been added to Results Section 2.6 (lines 390-396) and Materials and Methods Section 4.9 (line 879-888).

      Secondly, after 24 days of continuous injections of siAjCTP1/2-1, we monitored the feeding behavior of these sea cucumbers over three consecutive days. Each day, we removed residual bait and feces, then repositioned fresh food at the tank center. We calculated the aggregation percentage (AP) of sea cucumbers around the food during the feeding peak (2:00-4:00) each day, which is the most reliable indicator of feeding behavior in this species. The results showed that the AP in the siAjCTP1/2-1 group was significantly lower than that in the control group. Post-dissection observations revealed reduced intestinal food content and significant intestinal degeneration in the siAjCTP1/2-1 group (the figure has been added below). These results indicate that long-term functional loss of AjCT2 reduces food intake and influences the feeding behavior of A. japonicus.

      In response to the comment regarding “No statistically significant differences were reported in these assays”, we have modified the figures to clearly visualize the differences and added statistical test details in both the figure legends and the Materials and methods section “4.10 Statistical analysis” (lines 900–910).

      Author response image 1.

      The feeding behavior of A. japonicus after long-term loss-of-function of AjCT2. (A) A record of feeding behavior. The red arrow refers to the food and the red box represents the feeding area. The numbers in the figure represent individuals entering into the feeding area. (B) The aggregation percentage (AP) of sea cucumbers around the food during the feeding peak (2:00-4:00) (n=3 days). (C) The degenerated intestine of sea cucumber after 24 days of siAjCTP1/2-1 injection. Data in the graph represent the mean ± standard deviation. *Significant differences between groups (p < 0.05). Control: siNC injection group; CT-SiRNA: siAjCTP1/2 injection group.
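
      The aggregation percentage described above can be illustrated with a short sketch. The response does not quote the exact formula, so the definition used here (individuals inside the feeding area divided by all individuals in the tank, times 100) is an assumption, as are the function names and the example counts, which are invented purely for illustration.

```python
from statistics import mean, stdev

def aggregation_percentage(in_feeding_area: int, total_animals: int) -> float:
    """Assumed definition: share of animals inside the feeding area, in percent."""
    if total_animals <= 0:
        raise ValueError("total_animals must be positive")
    if not 0 <= in_feeding_area <= total_animals:
        raise ValueError("in_feeding_area must lie between 0 and total_animals")
    return in_feeding_area / total_animals * 100.0

def summarize_daily_ap(daily_counts, total_animals):
    """Return (mean, sample SD) of AP over several observation days."""
    values = [aggregation_percentage(c, total_animals) for c in daily_counts]
    return mean(values), stdev(values)

# Hypothetical counts for three observation days in a tank of 12 animals.
ap_mean, ap_sd = summarize_daily_ap([3, 6, 9], total_animals=12)
```

      With n = 3 days per group, as in panel B, the sample standard deviation is the spread reported by the error bars under this assumed definition.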

      (3) Overall, details regarding statistical analyses are not (clearly) specified in the manuscript, and there are several instances where statements are not supported by literature evidence.

      Thank you for the reviewer’s comments. Again, we sincerely apologize for the confusion caused. To clarify, statistical tests were performed for all experiments. However, the original labeling may have been somewhat messy. We have revised all figures to enhance the visibility of differences and provided detailed statistical test information in both the figure legends and the Materials and Methods section titled “4.10 Statistical Analysis” (lines 900–910). Additionally, we have supplemented the revised manuscript with further literature evidence to support our statements: (1) citations to Furuya et al. (2000), Johnson et al. (2005), Jékely (2013) and Mirabeau et al. (2013) have been added to clarify the foundational studies on DH31 and DH31 receptors in invertebrates (line 73-74); (2) Conzelmann et al. (2013) and Furuya et al. (2000) were cited to validate the presence of two different types of CT-related peptides in protostomes: CT-type peptides (with an N-terminal disulphide bridge) and DH31-type peptides (lacking this feature) (line 78-79); (3) Johnson et al. (2005) was referenced to support the dual ligand-receptor interactions of DH31 in Drosophila, specifically its binding to both CG17415 (a CTR/CLR-related protein) and CG13758 (the PDF receptor) (line 94); (4) Johnson et al. (2005) and Goda et al. (2019) were cited to reinforce the functional significance of dual DH31 receptor pathways in Drosophila, as extensively studied in prior research (line 95-97).

      Reviewer #2 (Public review):

      Weaknesses:

      (1) The authors claim that A. japonicus CTs activate "PDF" receptors and suggest that this cross-talk is evolutionarily ancient since a similar phenomenon also exists in the fly Drosophila melanogaster. These conclusions are not fully supported for several reasons. The authors perform phylogenetic analysis to show that the two "PDF" receptors form an independent clade. This clade is sister to the clade comprising CT receptors. This phylogenetic analysis suffers from several issues. Firstly, the phylogenies lack bootstrap support. Secondly, the resolution of the phylogeny is poor because representative members from diverse phyla have not been included. For instance, insect or other protostomian PDF receptors have not been included so how can the authors distinguish between "PDF" receptors or another group of CT receptors? Thirdly, no in vivo evidence has been presented to support that CT can activate "PDF" receptors in vivo.

      We thank the reviewers for their constructive comments. As suggested, we expanded our taxon sampling to include more representative members across diverse phyla and reanalyzed the phylogenetic relationships (including bootstrap tests) in Figure 1C. The revised analysis revealed two distinct clades: one containing CTR/CLR-type receptors and the other PDF-type receptors. Specifically, AjCTR clustered within the CTR/CLR-type receptor group, while AjPDFR1 and AjPDFR2 were placed in the PDF-type receptor clade. The full species names for all taxa were provided in the Supplementary Table 2.

      To provide in vivo evidence supporting CT-mediated activation of "PDF" receptors, we conducted the following experiments: Firstly, we confirmed that AjPDFR1 and AjPDFR2 were the functional receptors of AjCT1 and AjCT2 (Figure 2, 3 and 4). Secondly, injection of AjCT2 and siAjCTP1/2-1 in vivo induced corresponding changes in AjPDFR1 and AjPDFR2 expression levels in the intestine (Figure 8C, 9A, 9B and 9C).

      (2) The source of CT which mediates the effects on longitudinal muscles and intestine is unclear. Is it autocrine or paracrine signaling by CT from the same tissue or is it long-range hormonal signaling?

      Thank you for this feedback. We have now analysed CT-type neuropeptide expression in A. japonicus using immunohistochemistry with the antiserum to the A. rubens CT-type peptide ArCT, which has previously been shown to cross-react with CT-type neuropeptides in other echinoderms (Aleotti et al., 2022). We have added related descriptions in the following sections: Results (section 2.4, line 299-336), Discussion (section 3.3, line 545-554) and Materials and methods (section 4.6, line 785-817). Consistent with this previous finding, the ArCT antiserum labelled neuronal cells and fibers in the central and peripheral nervous system and in the digestive system of A. japonicus (Figure 6). The specificity of immunostaining was confirmed by performing pre-absorption tests with the ArCT antigen peptide (Figure 6-figure supplement 1). The detection of immunostaining in the innervation of the intestine is consistent with PCR results and the relaxing effect of AjCT2 on intestine preparations. Interestingly, no immunostaining was observed in longitudinal muscle, which is inconsistent with the detection of AjCT1/2 transcripts in this tissue. This may reflect differences in the sensitivity of the methods employed to detect transcripts (PCR) and mature peptide (immunohistochemistry). The absence of ArCT-like immunoreactivity in the longitudinal muscles suggests that AjCT1 and AjCT2 may exert relaxing effects on this tissue in vivo via hormonal signaling mechanisms. However, because AjCT1/2 expression in the longitudinal muscles may be below the detection threshold of the ArCT antibodies, we cannot rule out the possibility that AjCT1/2 are released within the longitudinal muscles physiologically.

      (3) Pharmacology experiments showing the effects of CT1 and CT2 on ACh-induced contractions were performed. Sample traces have been provided but no traces with ACh alone have been included. How long do ACh-induced contractions persist? These controls are necessary to differentiate between the eventual decay of ACh effects and relaxation induced by CT1 and CT2. The traces also do not reflect the results portrayed in dose-response curves. For instance, in Figure 6B, maximum relaxation is reported for 10⁻⁶ M. Yet, the trace hardly shows any difference before and after the addition of 10⁻⁶ M peptide. The maximum effect in the trace appears to be after the addition of 10⁻⁸ M peptide.

      Thank you for the reviewer’s comments. As requested, we have included representative traces of ACh-induced contraction of longitudinal muscle and intestinal preparations (Figure 7-figure supplement 1B and 1C). Notably, the positive control (ACh) maintained contraction effects for at least 15 minutes, consistent with its known pharmacological properties. Regarding Figure 7B (previous Figure 6B), the trace illustrates the cumulative effects of successive neuropeptide treatments at increasing concentrations. A gradual reduction in response amplitude was observed at the highest peptide concentration, likely reflecting receptor desensitization, a phenomenon previously reported for neuropeptide Y and oxytocin (Tsurumaki et al., 2003; Arrowsmith and Wray, 2014). These results are now explicitly described in the Results Section 2.5 (lines 340-345 and 348-352) and discussed in Section 3.3 (lines 569-574). In response to the reviewer’s suggestion, we further tested the pharmacological effects of AjCT2 at 10⁻⁶ M. As shown in Figure 7-figure supplement 1A, this concentration induced maximal relaxation, confirming its dose-dependent efficacy.

      (4) I am unsure how differences in wet mass indicate feeding and growth differences since no justification has been provided. Couldn't wet mass also be influenced by differences in osmotic balance, a key function of calcitonin-like peptides in protostomian invertebrates? The statistical comparisons have not been included in Figure 7B.

      We appreciate the reviewer's insightful comments. We fully concur that wet mass constitutes an inadequate indicator for evaluating feeding and growth variations. Consequently, we reassessed A. japonicus growth parameters using two established metrics: weight gain rate (WGR) and specific growth rate (SGR), to delineate differences between experimental and control groups. Notably, the high-concentration AjCT2 injection group exhibited statistically significant increases in both WGR and SGR relative to controls (Figure 8A). This demonstrates a putative physiological role of AjCT2 signaling in enhancing feeding efficiency and growth performance in A. japonicus. Detailed methodologies are provided in the Materials and methods Section 4.8 (lines 847-851), with corresponding results presented in the Results Section 2.6 (lines 370-375). In addition, Cong et al. (2024) reported a holotocin-induced osmoregulatory function in A. japonicus, manifested by significant wet weight elevation and body bloating. However, our AjCT2 intervention showed no such phenotypic alterations, suggesting that AjCT2 likely does not participate in osmotic balance regulation, at least under these experimental conditions. Crucially, the observed WGR and SGR enhancements following AjCT2 administration were not caused by osmoregulatory effects.
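
      For readers unfamiliar with the two growth metrics, the sketch below uses the definitions that are conventional in aquaculture studies: WGR (%) = (Wt − W0) / W0 × 100 and SGR (% per day) = (ln Wt − ln W0) / t × 100. The response itself does not quote the formulas from Section 4.8, so these definitions, the function names, and the example weights are assumptions for illustration only.

```python
import math

def weight_gain_rate(initial_weight: float, final_weight: float) -> float:
    """WGR in percent: relative gain over the initial body weight."""
    if initial_weight <= 0:
        raise ValueError("initial_weight must be positive")
    return (final_weight - initial_weight) / initial_weight * 100.0

def specific_growth_rate(initial_weight: float, final_weight: float, days: float) -> float:
    """SGR in percent per day: log-scale growth averaged over the trial length."""
    if initial_weight <= 0 or final_weight <= 0:
        raise ValueError("weights must be positive")
    if days <= 0:
        raise ValueError("days must be positive")
    return (math.log(final_weight) - math.log(initial_weight)) / days * 100.0

# Hypothetical animal growing from 10 g to 12 g over a 24-day trial.
wgr = weight_gain_rate(10.0, 12.0)
sgr = specific_growth_rate(10.0, 12.0, 24)
```

      Both metrics are normalized to the initial body weight, which is why they are less sensitive than raw wet mass to differences in starting size between animals.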

      (5) While the authors succeeded in knocking down CT, the physiological effects of reduced CT signaling were not examined.

      Thank you for the reviewer’s comment. We have supplemented the experiments to investigate the physiological effects of long-term reduced CT signaling following the reviewer’s suggestions, including measuring the dry weight of remaining bait and excrement, calculating the weight gain rate and specific growth rate, and testing the expression levels of three growth factors (AjMegf6, AjGDF-8 and AjIgf) to further assess AjCT2’s role in feeding and growth. The results demonstrated that weight gain rate and specific growth rate in the siAjCTP1/2-1 group were significantly decreased (As shown in Figure 10A). Correspondingly, except in phase I, the siAjCTP1/2-1 group exhibited a significant increase in remaining bait and a decrease in excrement during phases II-VI (Figure 10B). Furthermore, the growth inhibitory factor AjGDF-8 was significantly up-regulated and the growth promoting factor AjMegf6 was significantly down-regulated in siAjCTP1/2-1 group (Figure 10C). These findings further support the potential physiological role of AjCT2 signaling in promoting feeding and growth in A. japonicus. The added results are presented in Figure 10, with related descriptions in Section 2.6 (Results, lines 390-396), Section 3.4 (Discussion, line 597-603) and Section 4.9 (Materials and Methods, lines 879-888).

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract states that loss-of-function tests (RNAi knockdown) reveal a potential physiological role for AjCT2 signaling in promoting feeding and growth in A. japonicus. However, RNAi knockdown was only followed by analysis of transcript expression of CT-like receptors and not by the assessment of feeding or growth.

      Thank you for this helpful feedback. In the revised manuscript, we have supplemented the experiments to investigate the physiological effects of long-term reduced CT signaling, as suggested by the reviewer. These include measuring the dry weight of remaining bait and excrement, calculating the weight gain rate and specific growth rate, and testing the expression levels of the three growth factors (AjMegf6, AjGDF-8 and AjIgf) to further assess the function of AjCT2 on feeding and growth in A. japonicus. The results are as follows:

      (1) The weight gain rate and specific growth rate in the siAjCTP1/2-1 group were significantly decreased (as shown in Figure 10A).

      (2) Correspondingly, except for phase I, the siAjCTP1/2-1 group had significantly increased remaining bait and decreased excrement during phases II-VI (Figure 10B).

      (3) The growth inhibitory factor AjGDF-8 was significantly up-regulated, while the growth promoting factor AjMegf6 was significantly down-regulated in the siAjCTP1/2-1 group (Figure 10C).

      These findings further support the potential physiological role of AjCT2 signaling in promoting feeding and growth in A. japonicus. We have incorporated these results into Figure 10 and added related descriptions in the following sections: Results (section 2.6, line 390-396), Discussion (section 3.4, line 597-603) and Materials and methods (section 4.9, line 879-888).

      Regarding the original statement in the abstract “Furthermore, in vivo pharmacological experiments and loss-of-function tests revealed a potential physiological role for AjCT2 signaling in promoting feeding and growth in A. japonicus.” This sentence effectively summarizes our findings. Therefore, we have retained it in the revised manuscript while supplementing the missing experimental details as requested.

      (2) Information on the statistical tests that were performed is lacking for most experiments. It is recommended to include this information in the figure legends, in addition to the methods section. Details on the phylogenetic analysis (parameters and statistics used) and calculation of half maximal effective concentrations (calculation methods and confidence intervals) also need to be included in the manuscript.

      Thank you for this constructive feedback. As the reviewer suggested, statistical test information has been incorporated into both the figure legends and the “4.10 Statistical Analysis” subsection of the Materials and methods (lines 900-910). Specifically:

      (1) Phylogenetic analysis details (parameters and statistical approaches) are now provided in the Materials and methods section 4.2 (line 675-682);

      (2) Bootstrap test results supporting the phylogenetic trees have been added to Figure 1B and 1C;

      (3) Half-maximal effective concentration (EC₅₀) calculations, including methodologies and confidence intervals, are documented in both the Figure 2B legend and the “4.10 Statistical Analysis” section (lines 900-910).

      (3) In some figures (e.g. Figure 5A, 7A), the n number indicated does not match the number of data points shown in the figure panel. It is not clear what n represents here. In Figure 6B, an x-axis label is missing. In some figure legends (e.g. Figure 4 - Figure Supplement 1), the error bars and significance levels are not defined.

      We apologize for this error; we have corrected all errors related to "n" in the manuscript’s figure legends. In addition, the x-axis label has been added to Figure 7B (previous Figure 6B), and error bars and significance levels are now clearly defined in all figure legends.

      (4) It would be useful to explain what the difference is between the Cre and SRE luciferase assay and why these two assays were used to study receptor-activated signaling cascades. The source of the synthetic peptides is mentioned, but it is recommended to also state the purity of the synthetic peptides.

      Thank you for the valuable comments. As stated in the introduction (line 66-69): “binding of CT to CTR in the absence of RAMPs can activate signaling via several downstream pathways, including cAMP accumulation, Ca²⁺ mobilization, and ERK activation.” Based on this established mechanism, we selected cAMP and Ca²⁺ signaling pathways as biomarkers for studying receptor-activated cascades, with the following experimental rationale: the CRE-Luc reporter system functions as a cAMP response element detector and the SRE-Luc reporter system serves as an intracellular Ca²⁺ level indicator. In CRE-Luc detection, when the receptor is activated by a ligand, it couples with Gαs protein to activate the cAMP/PKA signaling pathway. The accumulation of cAMP can lead to the phosphorylation of PKA, and then enhance the transcription of CRE-containing genes. Therefore, a significant increase in CRE-Luc activity directly correlates with cAMP accumulation. Similarly, SRE-Luc activity reflects dynamic changes in intracellular Ca²⁺ levels. We have added the explanation of this part in the Materials and methods section 4.4 (line 715-721). The purity of the synthetic peptides was >95%, and we have also added this information in section 4.4 (line 715) according to the reviewer’s suggestion.

      (5) In Figure 3B, it is difficult to see receptor internalization in response to the application of synthetic CT-like peptides, and a control condition (without peptide application) is lacking.

      Thank you for the reviewer’s comment. The control condition (without peptide application) was added in Figure 3-figure supplement 1, which shows the localization of pEGFP-N1/receptors in the cell membrane. Upon stimulation with synthetic CT-like peptides (Materials and methods section 2.3), the receptors exhibit clear internalization into the cytoplasm, as visualized in Figure 3B through comparative analysis.

      (6) Differences in the activation of downstream signaling cascades between the three receptors are questionable because there is substantial variation in the experimental data and control conditions in different experiments (for example, in Figures 3A and 4A). To better represent this variation, it is recommended to plot individual data points onto the bar graphs in all figures and to nuance the interpretation of putative differences in downstream signaling of different receptors. Differences in the physiological roles of CT-like peptides may be explained by various mechanisms, including differences in peptide/receptor expression or in the potency of peptides to activate different receptors in vivo. It would be useful to elaborate on these different explanations in the discussion.

      We appreciate the reviewer's critical assessment. The observed variations in control conditions across experiments (e.g., Figures 3A & 4A) primarily arise from two methodological factors: ① Each experimental set used cells transfected with distinct receptor subtypes (e.g., AjPDFR1 vs. AjPDFR2), inherently introducing baseline variability due to differential receptor expression profiles. ② Independent cell culture batches were employed for replicate experiments to ensure biological reproducibility. Importantly, these minor variations did not compromise the statistical significance of downstream signaling differences (p < 0.01 for all comparative analyses). In addition, following the reviewer’s suggestion, we have plotted individual data points onto the bar graphs in all figures.

      Also following the reviewer’s suggestion, we have expanded the discussion on receptor-specific signaling cascades in Section 3.4 (lines 589-609). Key findings include: In vivo pharmacological assays demonstrated that only high concentrations of AjCT2 significantly enhanced feeding and growth rates in A. japonicus. In contrast, neither a low concentration of AjCT2 nor any concentration of AjCT1 (low or high) induced detectable effects. Furthermore, long-term knockdown of AjCTP1/2 further validated the essential role of AjCT2 in regulating feeding and growth in this species. To elucidate the receptor mediating AjCT2’s feeding- and growth-promoting effects, we selected AjPDFR2 based on its distinct activation profile: AjCT2 selectively activated AjPDFR2, inducing downstream ERK1/2 phosphorylation, whereas AjCT1 exhibited no activity toward this receptor. Given this receptor specificity, we performed AjPDFR2 knockdown experiments, which revealed phenotypic changes consistent with those in AjCTP1/2 knockdown animals, including significantly reduced WGR and SGR, alongside increased remaining bait accumulation and diminished excrement output compared to control. Collectively, these results support a model wherein AjCT2 promotes feeding and growth in A. japonicus via AjPDFR2-dependent activation of the cAMP/PKA/ERK1/2 and Gαq/Ca²⁺/PKC/ERK1/2 cascades. Considering the inherent complexity of neuropeptide signaling systems, which involve multiple GPCR subtypes coupled to diverse signaling cascades, ligands bound to the same receptor may activate distinct G protein subforms within a single cell (Møller et al., 2003; Mendel et al., 2020). Receptor activation modes may be modulated by structural polymorphisms or binding site diversity (Wong et al., 2000; Changeux, 2010), as well as by the differential efficacy of peptides in activating receptors in vivo.

      (7) For the peptide injection experiments, it is recommended to explain the different animal groups in the results section. In addition, injection in the control condition seems to have a small effect on the wet weight. Therefore, it would be useful to compare control-injected and peptide-injected groups after injection.

      Thank you for the reviewer’s comments. We have provided an expanded explanation of the animal group classifications in Section 2.6 (lines 367–375). We fully agree that a comparative analysis between the experimental and control groups post-injection is essential. However, since wet weight measurement is suboptimal for demonstrating feeding and growth variations, we re-evaluated the data using two validated metrics: weight gain rate (WGR) and specific growth rate (SGR) of A. japonicus. The results revealed that the high-concentration AjCT2 injection group exhibited significantly elevated weight gain rate and specific growth rate compared to the control group, suggesting a potential role of AjCT2 signaling in promoting feeding and growth in A. japonicus. These results are presented in Figure 8A, with detailed descriptions in Results Section 2.6 (lines 370–375) and methodology in Materials and Methods Section 4.8 (lines 847-851).
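
      As an illustration of the two growth metrics named above, here is a minimal sketch. The response does not state the formulas, so this assumes the definitions commonly used in aquaculture studies: WGR as percent gain over initial weight, and SGR as the percent change in log weight per day. The function names and example weights are hypothetical and are not taken from the manuscript.

```python
import math


def weight_gain_rate(initial_weight, final_weight):
    """WGR (%) = (Wt - W0) / W0 * 100 (assumed definition)."""
    if initial_weight <= 0:
        raise ValueError("initial_weight must be positive")
    return (final_weight - initial_weight) / initial_weight * 100.0


def specific_growth_rate(initial_weight, final_weight, days):
    """SGR (%/day) = (ln Wt - ln W0) / t * 100 (assumed definition)."""
    if initial_weight <= 0 or final_weight <= 0:
        raise ValueError("weights must be positive")
    if days <= 0:
        raise ValueError("days must be positive")
    return (math.log(final_weight) - math.log(initial_weight)) / days * 100.0


if __name__ == "__main__":
    # Hypothetical animal: 10 g at the start, 12 g after 24 days.
    print(weight_gain_rate(10.0, 12.0))
    print(specific_growth_rate(10.0, 12.0, 24))
```

      Both metrics normalize to the starting weight, which is why they are less sensitive than raw wet weight to differences in initial body size between groups.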

      (8) Regarding the RNAi knockdown experiments, it is not clear from the methods section what the siNC control exactly is, and how the interference rate is calculated.

      Thank you for this comment. The siNC control was an siRNA that does not target any gene in A. japonicus, and interference rates were quantified using the 2<sup>-ΔΔCT</sup> method to assess siRNA inhibition efficiency. These methodological details have been incorporated into Materials and Methods Section 4.9 (lines 866–867 and 874–876) for enhanced clarity.
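
      The 2<sup>-ΔΔCT</sup> calculation mentioned above can be sketched as follows. This is a minimal example assuming the standard comparative Ct approach, in which the target gene is normalized to a reference gene and the siRNA group is compared with the siNC group. Expressing the interference rate as one minus relative expression is an assumption here, since the response does not give the exact formula; the function names and Ct values are hypothetical.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Relative expression of the target gene by the 2^-ddCt method."""
    # Normalize the target gene to the reference gene within each group.
    delta_ct_treated = ct_target_treated - ct_ref_treated
    delta_ct_control = ct_target_control - ct_ref_control
    # Compare the treated (siRNA) group with the control (siNC) group.
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


def interference_rate(rel_expr):
    """Knockdown efficiency in percent (assumed: 1 - relative expression)."""
    return (1.0 - rel_expr) * 100.0


if __name__ == "__main__":
    # Hypothetical mean Ct values: target is 2 cycles later after knockdown.
    rel = relative_expression(26.0, 18.0, 24.0, 18.0)
    print(rel)                     # 0.25
    print(interference_rate(rel))  # 75.0
```

      The method assumes roughly equal amplification efficiency, close to 100%, for the target and reference primers.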

      Reviewer #2 (Recommendations for the authors):

      (1) Both the phylogenies are missing bootstrap tests. Please include this analysis. The phylogenetic analyses should also include other Family B ligands and receptors from both vertebrates and invertebrates because it is widely assumed that PDF is related to VIP given their shared roles in circadian clock and gut regulation. Therefore, this analysis needs to be more comprehensive than currently presented. Drosophila melanogaster receptors have also been excluded in spite of the Drosophila PDFR exhibiting ligand promiscuity. The legend should also include the full species names of the various taxa (or modify the figure to include full names) instead of referring to another table. The supplementary table was not available to this reviewer.

      Thank you for the reviewer’s constructive comments. According to the reviewer’s suggestion, we have incorporated the VIPRs and Drosophila melanogaster receptors into the comparative analysis and reanalyzed the phylogenies in Figure 1C, and both phylogenies included bootstrap tests (Figure 1B, 1C) in the revised manuscript. The full species names of the various taxa are listed in supplementary tables 1 and 2 in the revised manuscript.

      (2) Expression data indicate that AjCTP1/2 is expressed in both the longitudinal muscles and intestine. What are the cell types that express AjCTP1/2? Given that the authors show an effect of CT1 and CT2 on both of these tissues, it would be important to know whether this is local regulation (paracrine or autocrine) vs long-distance hormonal control by the nervous system. This can be addressed by performing in situ hybridization or immunohistochemistry of CT (using Asterias rubens CT antibody: https://doi.org/10.3389/fnins.2018.00382) on these tissues.

      Thank you for this feedback. We have now analysed CT-type neuropeptide expression in A. japonicus using immunohistochemistry with the antiserum to the A. rubens CT-type peptide ArCT, which has previously been shown to cross-react with CT-type neuropeptides in other echinoderms (Aleotti et al., 2022). We have added related descriptions in the following sections: Results (section 2.4, lines 299-336), Discussion (section 3.3, lines 545-554) and Materials and methods (section 4.6, lines 785-817). Consistent with this previous finding, the ArCT antiserum labelled neuronal cells and fibers in the central and peripheral nervous system and in the digestive system of A. japonicus (Figure 6). The specificity of immunostaining was confirmed by performing pre-absorption tests with the ArCT antigen peptide (Figure 6-figure supplement 1). The detection of immunostaining in the innervation of the intestine is consistent with PCR results and the relaxing effect of AjCT2 on intestine preparations. Interestingly, no immunostaining was observed in longitudinal muscle, which is inconsistent with the detection of AjCT1/2 transcripts in this tissue. This may reflect differences in the sensitivity of the methods employed to detect transcripts (PCR) and mature peptide (immunohistochemistry). The absence of ArCT-like immunoreactivity in the longitudinal muscles suggests that AjCT1 and AjCT2 may exert relaxing effects on this tissue in vivo via hormonal signaling mechanisms. However, because AjCT1/2 expression in the longitudinal muscles may be below the detection threshold of the ArCT antibodies, we cannot rule out the possibility that AjCT1/2 are released within the longitudinal muscles physiologically.

      (3) While Drosophila DH31 can activate both PDF and DH31 receptors, the EC50 values differ drastically. Importantly, there is an independent gene encoding PDF which is a more sensitive ligand for the PDF receptor. This is in stark contrast to the situation presented here where the authors have yet to identify the PDF gene in their system. Outside Drosophila this cross signaling between the two systems has not been observed in any species. Based on this, I would argue that the ability of CTs to activate PDFR is not an evolutionary ancient property but rather an example of convergent evolution if supported by more evidence.

      We sincerely appreciate the reviewer's insightful comments. We agree that we cannot rule out the possibility that the ability of CT-type peptides to activate PDF-type receptors in Drosophila and A. japonicus has arisen independently. Therefore, we have modified the text in the discussion accordingly so that this alternative explanation for the effects of CT-type peptides on PDF-type receptors is also presented: “Alternatively, the ability of CT-type neuropeptides to act as ligands for PDF-type receptors in D. melanogaster and A. japonicus may have evolved independently. Further studies on a wider variety of both protostome (e.g. molluscs, annelids) and deuterostome taxa (e.g. other echinoderms, hemichordates) are needed to address this issue.”

      (4) AjCT1 and CT2 can activate the two PDF receptors ex vivo. However, their EC50 values are larger and the responses are lower compared to those seen for the CT receptor. Similar cross-talk between closely related peptide families is often observed in ex vivo systems (see: https://doi.org/10.1016/j.bbrc.2010.11.089 , https://doi.org/10.1073/pnas.162276199 , https://doi.org/10.1093/molbev/mst269 and others). However, very few signaling systems exhibit this type of cross-talk in vivo. Without any in vivo evidence, I suspect that the more likely possibility is that the bona fide endogenous ligand for PDF receptors remains to be discovered. The authors could, however, perform peptide and receptor knockdown experiments and show overlap in phenotypes following CT knockdown and PDFR knockdown to support their claim.

      We sincerely appreciate the reviewer's insightful critique. Following the reviewer’s suggestion, we have added AjCTP1/2 and AjPDFR2 knockdown experiments, measuring the dry weight of remaining bait and excrement and calculating the weight gain rate and specific growth rate to assess phenotypic changes. The results showed that weight gain rate and specific growth rate were significantly decreased in both experimental groups (Figures 10A and 11B). Correspondingly, the siAjCTP1/2-1 group had significantly increased remaining bait and decreased excrement in phases II-VI, but not in phase I (Figure 10B). In the siAjPDFR2-1 group, the remaining bait weight was significantly increased (except during phase I), while the weight of excrement was significantly decreased in phases V and VI (Figure 11C). Therefore, the AjCT and AjPDFR2 knockdown experiments showed overlapping phenotypes, providing evidence that AjCT does act as an endogenous ligand for PDFR. These results were added in Figure 10 and Figure 11. The related description was added in the results section 2.6 (lines 390-396), section 2.7 (lines 427-439) and the materials and methods section 4.9 (lines 879-898). We acknowledge, however, that other peptides, in addition to AjCT1 and AjCT2, may also act as ligands for AjPDFR1 and AjPDFR2 in vivo, and ongoing studies in the Chen (OUC) and Elphick (QMUL) labs are attempting to address this issue.

      (5) Why are receptor transcripts upregulated following peptide injection? Usually, increased ligand levels/signaling result in a compensatory decrease in receptor levels. These negative feedback loops maintain optimum signaling levels. Since the authors have successfully implemented RNAi for this CT precursor, what are the phenotypes on growth and feeding?

      We thank the reviewer for raising these critical points. Our responses are structured as follows. Firstly, our findings align with established mechanisms of neuropeptide-induced receptor modulation (please see Tiptanavattana et al., 2022). Secondly, based on the reviewer’s suggestion, we have performed additional experiments to detect phenotypic changes in growth and feeding under long-term reduced CT signaling, including measuring the dry weight of remaining bait and excrement, calculating the weight gain rate and specific growth rate, and testing the expression levels of three growth factors (AjMegf6, AjGDF-8 and AjIgf). The results showed that weight gain rate and specific growth rate in the siAjCTP1/2-1 group were significantly decreased (Figure 10A). Correspondingly, the siAjCTP1/2-1 group had more remaining bait and less excrement in phases II-VI, but not in phase I (Figure 10B). Furthermore, the growth inhibitory factor AjGDF-8 was significantly up-regulated and the growth promoting factor AjMegf6 was significantly down-regulated in the siAjCTP1/2-1 group (Figure 10C). We have added these results in Figure 10, with detailed description in the results section 2.6 (lines 390-396) and in the materials and methods section 4.9 (lines 879-888). After long-term continuous injections of siAjCTP1/2-1, we further recorded the feeding behavior of these sea cucumbers for three consecutive days. The remaining bait and feces were cleaned and the food was re-placed in the middle of the tank each day. We calculated the aggregation percentage (AP) of sea cucumbers around the food during the peak feeding period (2:00-4:00) each day, which is the best indicator of sea cucumber feeding behavior. The results showed that the AP in the siAjCTP1/2-1 group was significantly lower than that in the control group. After dissection, we also found that the intestines of the siAjCTP1/2-1 group contained less food and were significantly degenerated (see author response image 1). All these results support the conclusion that long-term functional loss of AjCT2 negatively influences the feeding and growth of A. japonicus.

      Other comments:

      (6) What criteria do the authors use to classify some proteins as "type", some as "like" and others as "related"? In my opinion, DH31 could be referred to as CT-like or CT-type. Please use one term for clarity unless there is a scientific explanation behind this terminology.

      Thank you for the reviewer’s comment. If you look at the paper by Cai et al. (2018) you will see in Figure 14 that CT-type peptides and DH31-type peptides are paralogous, probably due to a gene duplication in the common ancestor of the protostomes. The CT-related peptides in protostomes that have a disulphide bridge we would describe as CT-type because they have conserved a feature that is found in CT-type peptides in deuterostomes. Whereas the DH31 peptides we would describe as CT-like. But there is not a formal rule on this. It is possible the duplication event that gave rise to DH31 and CT-type peptides occurred in the common ancestor of the Bilateria but DH31-type signaling was lost in deuterostomes. On the other hand, if the gene duplication that gave rise to DH31-type peptides and CT-type peptides in protostomes did occur in a common ancestor of the protostomes, then DH31 and CT-type peptides in protostomes could be described as co-orthologs of CT-type peptides in deuterostomes. In this case, both CT peptides and DH31 peptides in protostomes could be described as CT-type. Here is a useful link for explanation of terms: https://omabrowser.org/oma/type/

      (7) Was genomic DNA removal step performed before cDNA synthesis for qRT-PCR?

      Thank you for the reviewer’s comment. The genomic DNA removal step was performed before cDNA synthesis for qRT-PCR and we have added the information in the section 4.5 (line 774-776).

      (8) Line 70: The presence of calcitonin-like peptides (DH31) and DH31 receptors in invertebrates was discovered long before the discoveries by Jekely 2013 and Mirabeau and Joly 2013. Please credit these original studies: https://pubmed.ncbi.nlm.nih.gov/10841553/ and https://pubmed.ncbi.nlm.nih.gov/15781884/.

      Thank you for the reviewer’s comment. We have credited these original studies in the revised manuscript.

      (9) Lines 72-74: Please cite https://pubmed.ncbi.nlm.nih.gov/24359412/.

      Thank you for the reviewer’s comment. We have cited it in the revised manuscript.

      (10) Line 87: Please cite https://pubmed.ncbi.nlm.nih.gov/15781884/.

      Thank you for the reviewer’s comment. We have cited it in the revised manuscript.

      (11) Lines 89-91: The functional significance of DH31 signalling to PDFR in Drosophila is known. See: https://pubmed.ncbi.nlm.nih.gov/15781884/ and https://pubmed.ncbi.nlm.nih.gov/30696873/. There are several studies that have shown the functions of DH31 signalling via DH31R.

      Thank you for the reviewer’s comment. We have corrected it and added all these studies in the revised manuscript.

      (12) Figure 1 Supplement 1: The tertiary models for CT1 and CT2 look completely different. This prediction is not in line with both ligands activating the same receptor.

      Thank you for the reviewer’s comment. We have deleted this supplementary figure.

      (13) Figure 1 Supplement 3 legend: Please add panel labels next to the corresponding receptor.

      Thank you for the reviewer’s comment. We have added panel labels next to the corresponding receptors as you suggested.

      (14) Figure 2: What does CO refer to?

      Thank you for the reviewer’s comment. CO (Control) refers to the stimulation of HEK293T transfected cells with serum-free DMEM, and we have added the detailed information in Figure 2 legend (line 251-252).

      (15) Figure 3: Due to the low magnification of the cells, it is difficult to see the localization of the receptor. It would also be more appropriate to use a membrane marker rather than DAPI which does not label the cytoplasm or membrane where the receptor can be found.

      We appreciate the reviewer's insightful comment regarding the experimental controls. The baseline receptor localization data under non-stimulated conditions are presented in Figure 3-figure supplement 1, demonstrating constitutive membrane distribution of pEGFP-N1-tagged receptors. Upon stimulation with synthetic CT-like peptides, qualitative imaging analysis revealed significant ligand-induced receptor internalization into the cytoplasm (Figure 3B).

      (16) Figure 9: Please include PDF precursor and receptor as separate columns. Also, Drosophila CT/DH31 receptors have been characterized.

      Thank you for the reviewer’s comment. We have added PDF precursor, predicted peptides and receptors as separate columns in Figure 12 of the revised manuscript. We have also corrected the erroneous summary of Drosophila CT/DH31 receptors according to your suggestions.

      (17) Table 1: It is not very clear why there are multiple columns for ERK1/2 with different outcomes.

      Thank you for the reviewer’s comment. Although cAMP/PKA or Gαq/Ca<sup>2+</sup>/PKC signaling is activated after ligand binding to receptors, the downstream ERK1/2 cascade is not necessarily activated. Therefore, in Table 1 we recorded separately the activation status of cAMP/PKA and its downstream ERK1/2 cascade, and of Gαq/Ca<sup>2+</sup>/PKC and its downstream cascade. We have revised Table 1 to make it clearer in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary: As TDP-43 mislocalization is a hallmark of multiple neurodegenerative diseases, the authors seek to identify pathways that modulate TDP-43 levels. To do this, they use a FACS based genome wide CRISPR KD screen in a Halo tagged TDP-43 KI iPSC line. Their screen identifies a number of genetic modulators of TDP-43 expression including BORC which plays a role in lysosome transport.

      Strengths:

      Genome wide CRISPR based screen identifies a number of modulators of TDP-43 expression to generate hypotheses regarding RNA BP regulation and perhaps insights into disease.

      Weaknesses:

      It is unclear how altering TDP-43 levels may relate to disease where TDP-43 is not altered in expression but mislocalized. This is a solid cell biology study, but the relation to disease is not clear without providing evidence of BORC alterations in disease or manipulation of BORC reversing TDP-43 pathology in disease.

      We thank the reviewer for this comment and have expanded the discussion to cover the role TDP-43 may play in the BORCS8-associated neurodegenerative disorder, and how understanding the way lysosome localization changes TDP-43 levels may help patients (lines 313-321).

      The mechanisms by which BORC and lysosome transport modulate TDP-43 expression are unclear. Presumably, this may be through altered degradation of TDP protein but this is not addressed.

      We agree with the reviewer that understanding the mechanism by which lysosome transport regulates TDP-43 levels is important and plan to examine this in future studies.

      Previous studies have demonstrated that TDP-43 levels can be modulated by altering lysosomal degradation so the identification of lysosomal pathways is not particularly novel.

      We thank the reviewer for this comment and have updated the text to make this clearer (lines 310-313). What hasn’t been observed previously is a change in lysosome localization affecting TDP-43 levels.

      It is unclear whether this finding is specific to TDP-43 levels or whether lysosome localization may more broadly impact proteostasis in particular of other RNA BPs linked to disease.

      We agree that this is an interesting question and something that should be investigated in future studies.

      Unclear whether BORC depletion alters lysosome function or simply localization.

      We thank the reviewer for this comment. Lysosome function related to protein turnover has not yet been examined in the literature after loss of BORC, but other aspects of lysosome function (including lipid metabolism and autophagic flux) have been shown to be disrupted upon loss of BORC. We have updated the discussion to address this (lines 292-296).

      Reviewer #2 (Public review):

      Summary: The authors employ a novel CRISPRi FACS screen and uncover the lysosomal transport complex BORC as a regulator of TDP-43 protein levels in iNeurons. They also find that BORC subunit knockouts impair lysosomal function, leading to slower protein turnover and implicating lysosomal activity in the regulation of TDP-43 levels. This is highly significant for the field given that a) other proteins could also be regulated in this way, b) understanding mechanisms that influence TDP-43 levels are significant given that its dysregulation is considered a major driver of several neurodegenerative diseases and c) the novelty of the proposed mechanism.

      Strengths:

      The novelty and information provided by the CRISPRi screen. The authors provide evidence indicating that BORC subunit knockouts impair lysosomal function, leading to slower protein turnover and implicating lysosomal activity in the regulation of TDP-43 levels and show a mechanistic link between lysosome mislocalization and TDP-43 dysregulation. The study highlights the importance of localized lysosome activity in axons and suggests that lysosomal dysfunction could drive TDP-43 pathologies associated with neurodegenerative diseases like FTD/ALS. Further, the methods and concepts will have an impact to the larger community as well. The work also sets up for further work to understand the somewhat paradoxical findings that even though the tagged TDP-43 protein is reduced in the screen, it does not alter cryptic exon splicing and there is a longer TDP-43 half-life with BORC KD.

      Weaknesses:

      While the data is very strong, the work requires some additional clarification.

      We thank the reviewer for these comments. Our detailed responses are included below in the “recommendations for authors” section.

      Reviewer #3 (Public review):

      Summary: In this work, Ryan et al. have performed a state-of-the-art full genome CRISPR-based screen of iNeurons expressing a tagged version of TDP-43 in order to determine expression modifiers of this protein. Unexpectedly, using this approach the authors have uncovered a previously undescribed role of the BORC complex in affecting the levels of TDP-43 protein, but not mRNA expression. Taken together, these findings represent a very solid piece of work that will certainly be important for the field.

      Strengths:

      BORC is a novel TDP-43 expression modifier that has never been described before and it seemingly acts on regulating protein half life rather than transcriptome level. It has been long known that different labs have reported different half-lives for TDP-43 depending on the experimental system but no work has ever explained these discrepancies. Now, the work of Ryan et al. has for the first time identified one of these factors which could account for these differences and play an important role in disease (although this is left to be determined in future studies).

      The genome wide CRISPR screening has been shown to yield novel results with high reproducibility and could eventually be used to search for expression modifiers of many other proteins involved in neurodegeneration or other diseases.

      Weaknesses:

      The fact that TDP-43 mRNA does not change following BORCS6 KD is based on a single qRT-PCR that does not really cover all possibilities. For example, the mRNA total levels may not change but the polyA sites may have switched from the highly efficient pA1 to the less efficient and nuclear retained pA4. There are therefore a few other experiments that could have been performed to make this conclusion more compelling, maybe also performing RNAscope experiments to make sure that no change occurred in TDP-43 mRNA localisation in cells.

      We thank the reviewer for this comment. To address this point, we performed an analysis of polyA sites on our RNA sequencing data using REPAC and did not find a change in TDP-43 polyadenylation after BORC KD (Figure S6C). Other transcripts do have altered polyA sites, which are summarized in Figure S6C. We also performed HCR FISH for TARDBP mRNA in TDP-43 and BORC KD neurons. While we did not see a difference in RNA localization (see A below, numbers on brackets indicate p-values), we were also unable to detect a significant difference in total TARDBP mRNA levels upon TDP-43 KD (see B below, numbers on brackets indicate p-values), suggesting that some of the signal detected is non-specific to TARDBP. Because of this, we cannot conclusively say that BORC KD does not alter TARDBP mRNA localization using the available tools.

      Author response image 1.

      Even assuming that the mRNA does not change, no explanation for the change in TDP-43 protein half life has been proposed by the authors. This will presumably be addressed in future studies: for example, are mutants that lack different domains of TDP-43 equally affected in their half-lives by BORC KD? Alternatively, can a mass-spec be attempted to see whether TDP-43 PTMs change following BORCS6 KD?

      We agree with the reviewer that these are important experiments that could be done in the future to further examine the mechanism by which loss of BORC alters TDP-43 half-life. We examined our proteomics data for differential phosphorylation and ubiquitination in NT vs BORC KD (Figure S7G-H). We were unable to detect PTMs on TDP-43, so we cannot say if they contribute to the change in TDP-43 half-life we observed.

      Reviewer #1 (Recommendations for the authors):

      Recommendations are detailed in the public review.

      Reviewer #2 (Recommendations for the authors):

      Ryan et al, employ a CRISPRi FACS screen and uncover the lysosomal transport complex BORC as a regulator of TDP-43 protein levels in iNeurons. The authors provide strong evidence indicating that BORC subunit knockouts impair lysosomal function, leading to slower protein turnover and implicating lysosomal activity in the regulation of TDP-43 levels. The authors then provided additional evidence of TDP-43 perturbations under lysosome-inhibiting drug conditions, underscoring a mechanistic link between lysosome mislocalization and TDP-43 dysregulation. The study highlights the importance of localized lysosome activity in axons and suggests that lysosomal dysfunction could drive TDP-43 pathologies associated with neurodegenerative diseases like FTD/ALS. The work is exciting and could be highly informative for the field.

      Concerns: There are some disconnects between the figures and the main text that can benefit from refining of the figures to align better with the main text. This does not require additional experiments other than perhaps Figure 4B. The impact of the work could be further discussed - it is an interesting disconnect between the fact BORC KD causes decreased IF of the Halo-tagged TDP-43 and lysosomal transport, however this reduction does not impact cryptic exon expression and also increases TDP-43 half life (and of other proteins). It is a very interesting and potentially informative part of the manuscript.

      We thank the reviewer for their detailed reading of our manuscript. We have endeavored to better match the figures and the text and have added more discussion of the impact of the work.

      Minor:

      (1) Suggestion: relating to the statement "Gene editing was efficient, with almost all selected clones correctly edited." - please provide values or %.

      We updated the text to remove the statement about the editing efficiency, instead saying we identified a clone that was correct for both sequence and karyotype (lines 83-85).

      (2) Relating to Figure 1A: Please provide clarification regarding tagging strategy with the halotag - e.g. why in front of exon2.

      We updated the figure legend to reflect that the start codon for TDP-43 is in exon 2, hence why we placed the HaloTag there.

      (3) Relating to Figure S1: A and B seems to have been swapped.

      We thank the reviewer for catching this mistake and have fixed the figure/text.

      (4) Relating to Figure 1B: figure legend does not indicate grayscale coloring of TDP-43 signal.

      We have added text in the figure legend to indicate that the Halo signal is shown in grayscale in the left-hand panels.

      (5) Relating to Figure 1C: can the authors clarify abbreviation for 'NT' in text and legend.

      We thank the reviewer for catching this and have indicated in the text and figure legend that NT refers to the non-targeting sgRNA that was used as a control for comparison to the TDP-43 KD sgRNA.

      (6) Relating to figure 2B and S2A: main text mentioned "Non-targeting Guides" however the figure does not show non-targeting guides to confirm.

      We thank the reviewer for catching this oversight. We updated the legends for these figures to indicate that the non-targeting (NT) guides are shown in gray on the rank plot. They cluster towards the middle, flatter portion of the graphs, showing that the hits lie in the steeper sections at either end.

      (7) Suggestion: To make it easier on the reader, please provide overlap numbers for the following statement ..."In comparing the top GO terms associated with genes that increase or decrease Halo-TDP-43 levels in iNeurons, we found that almost none altered Halo-TDP-43 levels in iPSCs...".

      We thank the reviewer for this comment and have updated the text to indicate that only a single term is shared between the iPSC and iNeuron screens (lines 113-117).

      (8) Relating to the statement "We cloned single sgRNA plasmids for 59 genes that either increased or decreased Halo-TDP-43 in iNeurons but not in iPSCs." Can the authors provide a list of the 59 genes.

      We have included a new column in the supplemental table S1 indicating the result of the Halo microscopy validation to hopefully clarify which genes lead to a validated phenotype and which did not.

      (9) Relating to the statement "To rule out the possibility of neighboring gene or off-target effects of CRISPRi, as has been reported previously<sup>15</sup>, we examined the impact of BORC knockout (KO) on TDP-43 levels. Using the pLentiCRISPR system, which expresses the sgRNA of interest on the same plasmid as an active Cas9<sup>16</sup> we found that KO of BORCS7 using two different sgRNAs decreased TDP-43 levels by immunofluorescence (Figure 5C-D)." Please provide clarification as to why BORCS7 was chosen out of all the BORCS? From the data presentation thus far (Figure 4B & 5A), the reader might have anticipated testing BORCS6 for panels 5C-D.

      We thank the reviewer for this comment. We tried a couple of BORCs with the pLentiCRISPR system, but BORCS7 was the only one we were convinced we got functional knockout for based on lysosome localization. We think that either the guides were not ideal for the other BORC components we tried, or we did not get efficient gene editing across the population of cells tested. Because we had previously been working with knock down and CRISPRi guides are not the same as CRISPR knock out guides, we couldn’t use the existing guide sequences we know work well for BORC. Since loss of one BORC gene causes functional loss of the complex and restricts lysosomes to the soma, we did not feel it necessary to assay all 8 genes.

      (10) Relating to the statement "We treated Halo-TDP-43 neurons with various drugs that disrupt distinct processes in the lysosome pathway and asked if Halo-TDP-43 levels changed. Chloroquine (decreases lysosomal acidity), CTSBI (inhibits cathepsin B protease), ammonium chloride (NH4Cl, inhibits lysosome-phagosome fusion), and GPN (ruptures lysosomal membranes) all consistently decreased Halo-TDP-43 levels (Figure 6A-B, S5A-C)" Please provide interpretations for Figures S5A and S5C in text.

      We thank the reviewer for catching this oversight and have updated the text accordingly (lines 183-191).

      (11) Relating to figure 6E: please provide in legend what the different colors used correlate with (i.e. green/brown for BORCS7 KD)?

      We thank the reviewer for pointing this out. These colors were mistakenly left in the figure from a version looking to see if the observed effects were driven by a single replicate rather than a consistent change (each replicate has a slightly different color). As the colors are intermingled and not separated, we concluded the effect was not driven by a single replicate. The colors have been removed from the updated figure for simplicity.

      (12) Relating to the statement "We observed a similar trend for many proteins in the proteome (Figure 8B)" This statement can benefit from stating which trend the authors are referring to, it is currently unclear from the volcano plot shown for Figure 8B.

      We thank the reviewer for catching this and have updated the text accordingly.

      (13) Relating to the statement "For almost every gene, we observed an increase or decrease in Halo-TDP-43 levels without a change in Halo-TDP-43 localization or compartment specific level changes (Figure 4B)." Please provide: (1) the number of genes examined, (2) additional clarification of "localization" and "compartment specific" level changes, (3) some quantification and or additional supporting data of the imaging results. Figures 5A-B presents with the same concern relating to the comment "To determine if results from Halo-TDP-43 expression assays also applied to endogenous, untagged TDP-43 levels, we selected 22 genes that passed Halo validation and performed immunofluorescence microscopy for endogenous (untagged) TDP-43 (Figure 4D-G,5A-B, S4E-F)." please clarify further.

      We thank the reviewer for requesting this clarification. This statement refers to all 59 genes tested by Halo imaging; only one (MFN2) showed any hints of aggregation or changes in localization, while every other gene (58) showed what appeared to be global changes in Halo-TDP-43 levels. We were initially intrigued by the MFN2 phenotype; however, we were unable to replicate it on endogenous TDP-43 and thus concluded that this might be an effect specific to the tagged protein. The images shown in Figure 4B are representative of the changes we observed across all 59 genes tested (if changes were present). From the 59 genes for which we observed a change in Halo-TDP-43 levels by microscopy, we selected a smaller number to move forward to immunofluorescence for TDP-43. We picked a subset of genes from each of the different categories we had identified (mitochondria, m6A, ubiquitination, and some miscellaneous) to validate by immunofluorescence, thinking that genes in the same pathway would act similarly. We have added a column to the supplemental table S1 indicating which genes were tested by immunofluorescence and what the result was. We have also attempted to clarify the results section to make the above clearer.

      (14) Relating to the statement "To determine if results from Halo-TDP-43 expression assays also applied to endogenous, untagged TDP-43 levels, we selected 22 genes that passed Halo validation and performed immunofluorescence microscopy for endogenous (untagged) TDP-43 (Figure 4D-G, 5A-B, S4E-F). Of these, 18 (82%) gene knockdowns showed changes in endogenous TDP-43 levels (Figure 4D-G, S4E-F)." It is difficult to identify the 18 or 22 genes in the figures as described in the main text.

      We added columns to the supplemental table S1 listing the genes and the result in each assay.

      (15) Relating to figures S7A and 8A and the first part of the section "TDP-43, like the proteome, shows longer turnover time in BORC KD neurons" Can the authors provide clarification why the SunTag assay was performed with BORCS6 KD (S7A) but the follow-up experiment (8A) was performed with BORCS7 KD. Does BORCS6 KD show similar results as BORCS7 with the SunTag assay, and does TDP-43 protein abundance with BORCS7 KD show similar results as BORCS6?

      Because loss of any of the 8 BORC genes causes functional loss of BORC and lysosomes to be restricted to the peri-nuclear space, we used BORC KDs interchangeably. Additionally, all BORC KDs had similar effects on Halo-TDP-43 levels.

      Reviewer #3 (Recommendations for the authors):

      Adding more control experiments that TDP-43 mRNA is really not affected following BORC KD

      We performed a FISH experiment to examine TARDBP mRNA localization upon BORC KD but were unable to conclusively say whether BORC KD changes TARDBP mRNA localization (see above). We also analyzed our RNA sequencing experiment for alternative polyadenylation sites upon BORC KD. Results are in Figure S6C.

      Although this could be part of a future study, the authors should try and determine what are the changes to TDP-43 that drive a change in the half-life.

      We agree with the reviewer that these are important experiments and hope to figure this out in the future.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      The manuscript by Sayeed et al. uses a comprehensive series of multi-omics approaches to demonstrate that late-stage human cytomegalovirus (HCMV) infection leads to a marked disruption of TEAD1 activity, a concomitant loss of TEAD1-DNA interactions, and extensive chromatin remodeling. The data are thoroughly presented and provide evidence for the role of TEAD1 in the cellular response to HCMV infection.

      However, a key question remains unresolved: is the observed disruption of TEAD1 activity a direct consequence of HCMV infection, or could it be secondary to the broader innate antiviral response? In this respect, the study would benefit from experiments that assess the effect of TEAD1 overexpression or knockdown/deletion on HCMV replication dynamics. Such functional assays could help delineate whether TEAD1 perturbation directly influences viral replication or is part of a downstream/indirect cellular response, providing deeper mechanistic insights.

      To examine the effect of TEAD1 on HCMV, we performed an experiment in primary human foreskin fibroblasts (HFF) which were stably transduced with constitutive TEAD1. To constitutively express TEAD1, we cloned the open reading frame of TEAD1 into pLenti-puro (Plasmid #39481 from Addgene). We selected for transduced cells using puromycin. For these experiments, we first assessed two multiplicities of infection (MOI): 1 and 10 (Reviewer Response Figure 1). Based on the TEAD1 expression in these cells relative to non-transduced HFF cells, we performed HCMV infection experiments in cells transduced with TEAD1 lentivirus at an MOI of 1.

      For infections, we used a version of HCMV in which the C terminus of the capsid-associated tegument protein pUL32 (pp150) is tagged by enhanced green fluorescent protein (GFP) (PMID: 15708994). This experimental design allowed us to assess the impact of constitutive TEAD1 expression on HCMV infection. GFP and immediate early protein expression levels were measured 48 hours after infection by flow cytometry.

      After infecting parent cells (no constitutive TEAD1) and TEAD1 constitutively expressing cells with a GFP-positive HCMV at MOIs of 0.3 and 1, we identified equivalent GFP expression in the two conditions, indicating equivalent levels of HCMV infection 48 hours after initial infection (Reviewer Response Figure 1A). We also identified equivalent immediate early protein expression at 48 hours after infection, as measured both by percent positivity (Reviewer Response Figure 1B) and mean fluorescence intensity (Reviewer Response Figure 1C). At 96 hours with an MOI of 3, constitutive expression of TEAD1 led to a slight reduction in the expression of the HCMV proteins pp65 (encoded by UL83) and UL44 at 72 and 96 hours post initial infection (Reviewer Response Figure 1D). These results suggest that TEAD1 expression has minimal effects, if any, on the expression of these two late HCMV proteins in fibroblasts. Regulation of particular HCMV genes by TEAD1 is likely to be central for HCMV replication and reactivation in other specialized cell types relevant to viral pathogenesis and disease. However, definitive studies are beyond the scope of the current study.

      Author response image 1.

      Constitutive TEAD1 expression reduces expression of two HCMV late genes at 72 and 96 hours after infection. A-C. Primary human foreskin fibroblasts with and without constitutive TEAD1 expression were infected with pp150-GFP HCMV at a multiplicity of infection (MOI) of 0.3 or 1 and assessed 48 hours post infection. A. HCMV positive cells were quantified by measuring the percent of cells that were GFP positive. B. The percentages of immediate early (IE1/IE2) positive cells were quantified by flow cytometry. C. The mean fluorescence intensity of immediate early positive cells was quantified by flow cytometry. D. Primary human foreskin fibroblasts with and without constitutive TEAD1 expression were infected with pp150-GFP HCMV at an MOI of 1 and assessed by Western blot at various time points post infection. UL44 and pp65 are expressed late in the cascade of HCMV gene expression. TEAD1 expression levels and uncropped Westerns are provided in Supplemental Figure S8.

      Reviewer Response Methods:

      Flow cytometric analysis of viral entry and spread using GFP expression and HCMV immediate early (IE) protein staining

      Parental and TEAD1 transduced human foreskin fibroblasts were seeded into 12-well plates at 1.0 × 10⁵ cells per well and either mock infected or infected with pp150-GFP HCMV (PMID: 15708994) at MOIs of 0.3 or 1 on the same day. Cells were trypsinized at appropriate time points and then neutralized with complete medium. Cell suspensions were spun down at 500g for 5 minutes, and the cell pellet was fixed in 70% ethanol for 30 minutes. Following fixation, cells were permeabilized in phosphate-buffered saline (PBS) containing 0.5% bovine serum albumin (BSA) and 0.5% Tween 20 for 10 minutes at 4°C, pelleted, and then stained with IE1/IE2 antibody (mAb810-Alexa Fluor 488) diluted in PBS supplemented with 0.5% BSA for 2 hours. Cells were washed with PBS supplemented with 0.5% BSA–0.5% Tween 20 and then resuspended in PBS. Cells were analyzed using a flow cytometer (BD Biosciences). Infected cells were also trypsinized at appropriate time points, neutralized in the appropriate media, and directly analyzed for GFP positivity on the flow cytometer.
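      As a rough illustration of what the MOIs in this protocol imply, the sketch below computes the number of infectious units per well and the expected fraction of cells receiving at least one infectious unit. It rests on two assumptions that are not stated in the response: that the cell number at infection equals the seeding density (cells were infected on the day of seeding), and that infectious units distribute across cells according to a Poisson distribution. The function names are hypothetical.

```python
import math


def infectious_units(moi, cells):
    """Infectious units needed per well for a given MOI and cell count."""
    return moi * cells


def expected_fraction_infected(moi):
    """Fraction of cells receiving >= 1 infectious unit, assuming Poisson
    statistics (an idealization, not a claim about the actual experiment)."""
    return 1.0 - math.exp(-moi)


CELLS_PER_WELL = 1.0e5  # seeding density quoted in the flow cytometry methods

for moi in (0.3, 1.0):
    units = infectious_units(moi, CELLS_PER_WELL)
    frac = expected_fraction_infected(moi)
    print(f"MOI {moi}: {units:.0f} infectious units, "
          f"expected infected fraction {frac:.3f}")
```

      Measured GFP or IE1/IE2 positivity can differ from this idealized value, since it also depends on titer accuracy, entry efficiency, and spread by the time of measurement.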

      Western blot analyses of HCMV protein expression in infected cells with and without constitutive TEAD1 expression

      TEAD1 transduced and parental human foreskin fibroblasts were seeded into 6-well cell culture plates at a density of 3.0 × 10⁵ cells per well and either mock infected or infected with pp150-GFP HCMV (PMID: 15708994) at an MOI of 1. Whole-cell lysates were collected at various time points post-infection, separated by SDS-PAGE, and transferred to nitrocellulose for Western blot analysis. Western blots were probed with the following primary antibodies: anti-IE1/IE2 (Chemicon), anti-UL44 (kind gift of John Shanley), anti-pp65 (Virusys Corporation), and cellular β-actin antibody (Bethyl Laboratories). Next, each blot was incubated with appropriate horseradish peroxidase-conjugated anti-rabbit or anti-mouse IgG secondary antibodies. Chemiluminescence was detected and quantified using a C-DiGit blot scanner from Li-Cor.

      Reviewer #2 (Public review):

      Summary:

      This work uses genomic and biochemical approaches for HCMV infection in human fibroblasts and retinal epithelial cell lines, followed by comparisons and some validations using strategies such as immunoblots. Based on these analyses, they propose several mechanisms that could contribute to the HCMV-induced diseases, including closing of TEAD1-occupying domains and reduced TEAD1 transcript and protein levels, decreased YAP1 and phospho-YAP1 levels, and exclusion of TEAD1 exon 6.

      Strengths:

      The genomics experiments were done in duplicate and data analyses show good technical reproducibility. Data analyses are performed to show changes at the transcript and chromatin levels, followed by some Western blot validations.

      Weaknesses:

      This work, at the current stage, is quite correlative since no functional studies are done to show any causal links. For readers who are outside the field, some clarifications of the system and design need to be stated.

      Reviewer #2 (Recommendations for the authors):

      Here are some specific questions:

      (1) Since all current analyses are correlative, it is difficult to know which changes are of biological significance. For example, experiments manipulating TEAD transcription factor or YAP with effects on how cells respond to HCMV infection would significantly strengthen the conclusions, which are largely speculations now.

      Please see response to Reviewer 1, which highlights newly added functional assays that include the constitutive (forced) expression of TEAD1, as suggested.

      (2) How similar are these cell lines (human fibroblasts and retinal epithelial cell lines) resembling the actually infected cells in patients that lead to symptoms?

      In patients, HCMV initially infects both fibroblasts and epithelial cells. HCMV penetrates fibroblasts by fusion at the cell surface but is endocytosed into epithelial cells (PMID: 18077432). Thus, most experimental studies of HCMV in vitro use primary human foreskin fibroblasts and a retinal epithelial cell line, as we do in this study.

      Additional information on primary human fibroblasts as a model of HCMV infection in humans

      There is a nice review article that provides the history of the study of the molecular biology of HCMV that describes how Stanley Plotkin from the Wistar Institute first identified human fibroblast HCMV infected cells (PMID: 24639214). The primary fibroblasts of the foreskin of neonates are available commercially (sometimes called HS68) and model neonatal HCMV infection. Neonatal HCMV, or Congenital Cytomegalovirus, is a leading cause of congenital infection and a significant cause of non-genetic hearing loss in the US (https://www.cdc.gov/cytomegalovirus/congenital-infection/index.html). While many infected newborns appear healthy at birth, a substantial percentage experience long-term health problems, including hearing loss, developmental delays, and vision problems (PMID: 39070527). 

      More information on ARPE-19 as a model of HCMV infection in humans

      HCMV retinitis is a leading cause of vision loss and results from HCMV infection of retinal cells. Retinal epithelial cells are the primary target for HCMV infection in the eye. The cell line ARPE-19 is derived from a primary human adult retinal pigment epithelium explant, is commonly used to study HCMV, and is thought to be physiologically relevant to the human infection (PMID: 8558129 and 28356702). When compared to primary retinal pigment epithelia, ARPE-19 cells develop a similar cellular and molecular phenotype to primary cells from adults and neonates (PMID: 28356702).

      (3) What is the rationale for using 48 hours' infection? Is this the typical timeframe for patients to develop symptoms?

      HCMV genes are expressed in a temporally controlled manner (PMID: 35417700). Early genes (within the first 4 hours) are involved in regulating transcription, while genes within 4-48 hours are involved in DNA replication and further transcriptional regulation. The 48 hour mark corresponds to the onset of significant viral replication and interactions between the virus and the host immune response. After 48 hours, late genes are expressed, which encode structural proteins as well as viral proteins that inhibit host anti-viral responses.  Most studies that focus on the role of HCMV’s early and immediate early genes are performed at 24 or 48 hours. Similarly, most studies that assess the initial innate immune response to HCMV are performed within the initial 48 hours after in vitro infection.

      In most people with healthy immune systems, there are no symptoms (PMID: 34168328). While 60% of people in developed countries and 90% of those in developing countries are serologically positive for past infection, it is challenging to study the kinetics of symptom development due to heterogeneity in the initial virion exposure, the cell types that are initially infected, and immune response. HCMV persists throughout the lifetime of the infected individual by establishing latent infection.

      Also, among all these large-scale global changes, what are primary and what are secondary?

      A kinetic study with many timepoints would be needed to identify the primary and secondary genomic changes associated with HCMV infection. These experiments, while exciting, are beyond the scope of this manuscript.

      (4) Fig.2: In addition to the changes for each cell type, comparison of unchanged, closed and opened with infection regions between the two cell types could be informative for commonalities and differences between cell types.

      This was a good suggestion. We have added a new Supplemental Figure S2, which compares the differentially accessible regions between the two cell types.

      We have also added the following sentence to the Results section:

      “Comparison of differentially accessible chromatin between ARPE and HFF revealed that the vast majority of the HCMV-induced changes are specific to one of the two cell types (Supplemental Figure S2).”

      (5) "Of the 23,018 loops present in both infected and uninfected cells, only 10 are differential at a 2-fold cutoff and a false discovery rate (FDR) <0.01."

      We thank the reviewer for drawing our attention to the differential chromatin looping analysis.  Your comment prompted us to re-examine the methodologies we employed to identify differential chromatin looping events between uninfected and infected cells.  In the process, we realized that the relatively low resolution of chromatin looping assays such as HiChIP might require additional care in classifying a particular loop as shared or differential when comparing two experimental conditions. We have thus revamped our differential chromatin looping methodologies by adding 5kb “pads” to either end of each chromatin loop “anchor”.

      The corresponding passage now reads:

      “We next used the HiChIP data to identify HCMV-dependent differential chromatin looping events (see Methods). In total, uninfected cells have 143,882 loops. With HCMV infection, 90,198 of these loops are lost, and 44,045 new loops are gained (Supplemental Dataset 3). Because the number of altered loops was large, we repeated loop calling and differential analysis with FDR values less than 0.05, 0.01, and 0.001 (Supplemental Dataset 3). For all three cutoffs, the percentage of loops specific to an infection state were very similar. We also randomly downsampled the number of input pairs used for calling loops to verify that our results were not due to a difference in read depth (Supplemental Dataset 3). For the three smaller subsets of data, the number of loops specific to an infection state only changed slightly. The full quantification of each chromatin looping event and comparisons of events between conditions are provided in Supplemental Dataset 6.”
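      The anchor-padding idea described in the response above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes that two loops count as the same loop when both of their anchors overlap after each anchor is extended by 5 kb on either end, and the `Loop` type, the function names, and the toy coordinates are all hypothetical.

```python
from typing import NamedTuple

PAD = 5_000  # 5 kb pad on either end of each anchor, as in the response


class Loop(NamedTuple):
    chrom: str
    a_start: int  # first anchor
    a_end: int
    b_start: int  # second anchor
    b_end: int


def overlaps(s1, e1, s2, e2, pad=PAD):
    """True if the two intervals overlap once both are padded by `pad` bp."""
    return (s1 - pad) <= (e2 + pad) and (s2 - pad) <= (e1 + pad)


def same_loop(x, y, pad=PAD):
    """Two loops match if they share a chromosome and both anchors overlap."""
    return (x.chrom == y.chrom
            and overlaps(x.a_start, x.a_end, y.a_start, y.a_end, pad)
            and overlaps(x.b_start, x.b_end, y.b_start, y.b_end, pad))


def classify(uninfected, infected, pad=PAD):
    """Split loops into shared, lost on infection, and gained on infection."""
    shared = [u for u in uninfected
              if any(same_loop(u, i, pad) for i in infected)]
    lost = [u for u in uninfected
            if not any(same_loop(u, i, pad) for i in infected)]
    gained = [i for i in infected
              if not any(same_loop(i, u, pad) for u in uninfected)]
    return shared, lost, gained


# Toy coordinates, invented for illustration only.
uninfected = [Loop("chr1", 100_000, 105_000, 500_000, 505_000),
              Loop("chr1", 1_000_000, 1_005_000, 2_000_000, 2_005_000)]
infected = [Loop("chr1", 108_000, 113_000, 498_000, 503_000),
            Loop("chr2", 100_000, 105_000, 500_000, 505_000)]

shared, lost, gained = classify(uninfected, infected)
print(len(shared), "shared,", len(lost), "lost,", len(gained), "gained")
```

      The pairwise comparison is quadratic in the number of loops, which is fine for a toy example; for loop sets of the size reported here, an interval index or a genomic-interval library would be the more practical choice.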

      Are these cells asynchronous and how to determine whether certain changes are not due to cell cycle stage differences?

      Cells were plated to an identical density of cells per well before either mock or HCMV infection for this study. Based on the differentially expressed genes, cell cycle pathways were not amongst the top 50 enriched molecular pathways.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Weakness:

      Although a familiarity preference is not found, it is possible that this is related to the nature of the stimuli and the amount of learning that they offer. While infants here are exposed to the same perceptual stimulus repeatedly, infants can also be familiarised to more complex stimuli or scenarios. Classical statistical learning studies for example expose infants to specific pseudo-words during habituation/familiarisation, and then test their preference for familiar vs novel streams of pseudo-words. The amount of learning progress in these probabilistic learning studies is greater than in perceptual studies, and familiarity preferences may thus be more likely to emerge there. For these reasons, I think it is important to frame this as a model of perceptual habituation. This would also fit well with the neural net that was used, which is processing visual stimuli rather than probabilistic structures. If statements in the discussion are limited to perceptual paradigms, they would make the arguments more compelling. 

      Thank you for your thoughtful feedback. We have now qualified our claims more explicitly throughout the manuscript to clarify the scope of our study. Specifically, we have made the following revisions:

      (1) Title Update: We have modified the title to “A stimulus-computable rational model of visual habituation in infants and adults” to explicitly specify the domain of our model.

      (2) Qualifying Language Throughout Introduction: We have refined our language throughout the introduction to ensure the scope of our claims is clear. Specifically, we have emphasized that our model applies to visual habituation paradigms by incorporating qualifying language where relevant. At the end of Section 1, we have revised the statement to: "Habituation and dishabituation to sequential visual stimuli are well described by a rational analysis of looking time." This clarification makes sure that our model is framed within the context of visual habituation paradigms, particularly those involving structured sequences of stimuli, while acknowledging that habituation extends beyond the specific cases we study.

      (3) New Paragraph on Scope in the Introduction: We have added language in the Introduction acknowledging that while visual habituation is a fundamental mechanism for learning, it is not the only form of habituation. Specifically, we highlight that: “While habituation is a broadly studied phenomenon across cognitive domains—including language acquisition, probabilistic learning, and concept formation—our focus here is on visual habituation, where infants adjust their attention based on repeated exposure to a visual stimulus.”

      (4) New Paragraph on Scope in the General Discussion: We have also revisited this issue in the General Discussion. We added a dedicated paragraph discussing the scope: “This current work focuses on visual habituation, a fundamental but specific form of habituation that applies to sequential visual stimuli. While habituation has been studied across various domains, our model is specifically designed to account for looking time changes in response to repeated visual exposure. This focus aligns with our choice of perceptual representations derived from CNNs, which process visual inputs rather than abstract probabilistic structures. Visual habituation plays a foundational role in infant cognition, as it provides a mechanism for concept learning based on visual experience. However, it does not encompass all forms of habituation, particularly those involving complex rule learning or linguistic structures. Future work should investigate whether models like RANCH can be extended to capture habituation mechanisms in other learning contexts.”

      Reviewer #2 (Public review):

      There are no formal tests of the predictions of RANCH against other leading hypotheses or models of habituation. This makes it difficult to evaluate the degree to which RANCH provides an alternative account that makes distinct predictions from other accounts. I appreciate that because other theoretical descriptions haven't been instantiated in formal models this might be difficult, but some way of formalising them to enable comparison would be useful. 

      We appreciate the reviewer's concern regarding formal comparisons between RANCH and other leading hypotheses of habituation. A key strength of RANCH is that it provides quantitative, stimulus-computable predictions of looking behavior, something that existing theoretical accounts do not offer. Because previous models cannot generate predictions about behavior, we cannot directly compare them with RANCH.

      The one formal model that the reviewer might be referring to is the Goldilocks model, discussed in the introduction and shown in Figure 1. We did in fact spend considerable time in an attempt to implement a version of the Goldilocks model as a stimulus-computable framework for comparison. However, we found that it required too many free parameters, such as the precise shape of the inverted U-shape that the Goldilocks model postulates, making it difficult to generate robust predictions that we would feel confident attributing to this model specifically. This assertion may come as a surprise to a reader who expects that formal models should be able to make predictions across many situations, but prior models 1) cannot be applied to specific stimuli, and 2) do not generate dynamics of looking time within each trial. These are both innovations of our work. Instead, even prior formal proposals derive metrics (e.g., surprisal) that can only be correlated with aggregate looking time. And prior, non-formalized theories, such as the Hunter and Ames model, are simply not explicit enough to implement. 

      To clarify this point, we have now explicitly stated in the Introduction that existing models are not stimulus-computable and do not generate predictions for looking behavior at the level of individual trials: 

      “Crucially, RANCH is the first stimulus-computable model of habituation, allowing us to derive quantitative predictions from raw visual stimuli. Previous theoretical accounts have described broad principles of habituation, but they do not generate testable, trial-by-trial predictions of looking behavior. As a result, direct comparisons between RANCH and these models remain challenging: existing models do not specify how an agent decides when to continue looking or disengage, nor do they provide a mechanistic link between stimulus properties and looking time. By explicitly modeling these decision processes, RANCH moves beyond post-hoc explanations and offers a computational framework that can be empirically validated and generalized to new contexts.” 

      We also highlight that our empirical comparisons in Figure 1 evaluate theoretical predictions based on existing conceptual models using behavioral data, rather than direct model-to-model comparisons: 

      “Addressing these three challenges allowed us to empirically test competing hypotheses about habituation and dishabituation using our experimental data (Figure \ref{fig:conceptual}). However, because existing models do not generate quantitative predictions, we could not directly compare RANCH to alternative computational models. Instead, we evaluated whether RANCH accurately captured key behavioral patterns in looking time.”

      The justification for using the RMSEA fitting approach could also be stronger - why is this the best way to compare the predictions of the formal model to the empirical data? Are there others? As always, the main issue with formal models is determining the degree to which they just match surface features of empirical data versus providing mechanistic insights, so some discussion of the level of fit necessary for strong inference would be useful. 

      Thank you for recommending additional clarity on our choice of evaluation metrics. RMSE is a very standard measure (for example, it’s the error metric used in fitting standard linear regression!). On the other hand, it captures absolute rather than relative errors. Correlation-based measures (e.g., r and r²-type measures) provide a measure of relative distance between predictive measures. In our manuscript we reported both RMSE and R². In the revised manuscript, we have now:

      (1) Added a paragraph in the main text explaining that RMSE captures the absolute error in the same units as looking time, whereas R² reflects the relative proportion of variance explained by the model:

      “RANCH predictions qualitatively matched habituation and dishabituation in both infants and adults. To quantitatively evaluate these predictions, we fit a linear model (adjusting model‐generated samples by an intercept and scaling factor) and then assessed two complementary metrics. First, the root mean squared error (RMSE) captures the absolute error in the same units as looking time. Second, the coefficient of determination ($R^2$) measures the relative variation in looking time that is explained by the scaled model predictions. Since each metric relies on different assumptions and highlights distinct aspects of predictive accuracy, they together provide a more robust assessment of model performance. We minimized overfitting by employing cross‐validation—using a split‐half design for infant data and ten‐fold for adult data—to compute both RMSE and $R^2$ on held‐out samples.”
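      The evaluation procedure in the quoted passage (linear rescaling of model output, then RMSE and R² on held-out data) can be sketched as below. This is only an illustration under stated assumptions: the fold assignment, the function names, and the synthetic data are hypothetical, and the authors' actual fitting code may differ in its details.

```python
import math
import random


def fit_linear(pred, obs):
    """Least-squares intercept and scaling factor mapping model output to data."""
    n = len(pred)
    mean_x = sum(pred) / n
    mean_y = sum(obs) / n
    sxx = sum((x - mean_x) ** 2 for x in pred)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pred, obs))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(obs, fitted):
    """Root mean squared error, in the units of the observations."""
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(obs, fitted)) / len(obs))


def r_squared(obs, fitted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((o - f) ** 2 for o, f in zip(obs, fitted))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def cross_validate(pred, obs, k, seed=0):
    """Fit on k-1 folds, score on the held-out fold, average over folds."""
    idx = list(range(len(obs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    rmses, r2s = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        intercept, slope = fit_linear([pred[i] for i in train],
                                      [obs[i] for i in train])
        fitted = [intercept + slope * pred[i] for i in held_out]
        target = [obs[i] for i in held_out]
        rmses.append(rmse(target, fitted))
        r2s.append(r_squared(target, fitted))
    return sum(rmses) / k, sum(r2s) / k


# Synthetic example: "model output" and noisy "looking times" (invented data).
rng = random.Random(1)
model_out = [i / 39 for i in range(40)]
looking = [2.0 + 3.0 * x + rng.gauss(0, 0.2) for x in model_out]

print("split-half:", cross_validate(model_out, looking, k=2))
print("ten-fold:  ", cross_validate(model_out, looking, k=10))
```

      Because the intercept and scaling factor are estimated on the training folds only, R² on a held-out fold can be negative when the rescaled predictions fit worse than the held-out mean.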

      (2) Updated Table 1 to include both RMSE and R² for each model variant and linking hypothesis. We now report both RMSE and R² across the two experiments.

      We hope these revisions address your concerns by offering a more comprehensive and transparent assessment of our model’s predictive accuracy.

      Regarding your final question, the desired level of fit for insight, our view is that – at least in theory development – measures of fit should always be compared between alternatives (rather than striving for some absolute level of prediction). We have attempted to do this by comparing fit within- and across-samples and via various ablation studies. We now make this point explicit in the General Discussion:

      “More generally, while there is no single threshold for what constitutes a ‘good’ model fit, the strength of our approach lies in the relative comparisons across model variants, linking hypotheses, and ablation studies. In this way, we treat model fit not as an absolute benchmark, but as an empirical tool to adjudicate among alternative explanations and assess the mechanistic plausibility of the model’s components.”

      The difference in model predictions for identity vs number relative to the empirical data seems important but isn't given sufficient weight in terms of evaluating whether the model is or is not providing a good explanation of infant behavior. What would falsification look like in this context? 

      We appreciate the reviewer’s observation regarding the discrepancy between model predictions and the empirical data for identity vs. number violations. We were also very interested in this particular deviation and we discuss it in detail in the General Discussion, noting that RANCH is currently a purely perceptual model, whereas infants’ behavior on number violations may reflect additional conceptual factors. Moreover, because this analysis reflects an out-of-sample prediction, we emphasize the overall match between RANCH and the data (see our global fit metrics) rather than focusing on a single data point. Infant looking time data also exhibit considerable noise, so we caution against over-interpreting small discrepancies in any one condition. In principle, a more thorough “falsification” would involve systematically testing whether larger deviations persist across multiple studies or stimulus sets, which is beyond the scope of the current work.

      For the novel image similarity analysis, it is difficult to determine whether any differences are due to differences in the way the CNN encodes images vs in the habituation model itself - there are perhaps too many free parameters to pinpoint the nature of any disparities. Would there be another way to test the model without the CNN introducing additional unknowns? 

      Thank you for raising this concern. In our framework, the CNN and the habituation model operate jointly to generate predictions, so it can be challenging to parse out whether any mismatches arise specifically from one component or the other. However, we are not worried that the specifics of our CNN procedure introduce free parameters because:

      (1) The CNN introduces no additional free parameters in our analyses, because it is a pre‐trained model not fitted to our data.

      (2) We tested multiple CNN embeddings and observed similar outcomes, indicating that the details of the CNN are unlikely to be driving performance (Figure 12).

      Moreover, the key contribution of our second study is precisely that the model can generalize to entirely novel stimuli without any parameter adjustments. By combining a stable, off‐the‐shelf CNN with our habituation model, we can make out‐of‐sample predictions—an achievement that, to our knowledge, no previous habituation model has demonstrated.

      Related to that, the model contains lots of parts - the CNN, the EIG approach, and the parameters, all of which may or may not match how the infant's brain operates. EIG is systematically compared to two other algorithms, with KL working similarly - does this then imply we can't tell the difference between an explanation based on those two mechanisms? Are there situations in which they would make distinct predictions where they could be pulled apart? Also in this section, there doesn't appear to be any formal testing of the fits, so it is hard to determine whether this is a meaningful difference. However, other parts of the model don't seem to be systematically varied, so it isn't always clear what the precise question addressed in the manuscript is (e.g. is it about the algorithm controlling learning? or just that this model in general when fitted in a certain way resembles the empirical data?) 

      Thank you for highlighting these points about the model’s components and the comparison of EIG- vs. KL-based mechanisms. Regarding the linking hypotheses (EIG, KL, and surprisal), our primary goal was to assess whether rational exploration via noisy perceptual sampling could account for habituation and dishabituation phenomena in a stimulus-computable fashion. Although RANCH contains multiple elements—including the CNN for perceptual embedding, the learning model, and the action policy (EIG or KL)—we did systematically vary the “linking hypothesis” (i.e., whether sampling is driven by EIG, KL, or surprisal). We found that EIG and KL gave very similar fits, while surprisal systematically underperformed.

      We agree that future experiments could be designed to produce diverging predictions between EIG and KL, but examining these subtle differences is beyond the scope of our current work. Here, we sought to establish that a rational model of habituation, driven by noisy perceptual sampling, can deliver strong quantitative predictions—even for out-of-sample stimuli—rather than to fully disentangle forward- vs. backward-looking information metrics.

      We disagree, however, that we did not evaluate or formally compare other aspects of the model. In Table 1 we report ablation studies of different aspects of the model architecture (e.g., removal of learning and noise components). Further, the RMSE and R² values reported in Table 1 and Section 4.2.3 can be treated as out-of-sample estimates of performance and used for direct comparison (because Table 1 uses cross-validation and Section 4.2.3 reports out of sample predictions). 

      Perhaps the reviewer is interested in statistical hypothesis tests, but we do not believe these are appropriate here. Cross-validation provides a metric of out-of-sample generalization, and model selection is based on the resulting numerical estimates. Significance testing is not typically recommended, except in a limited subset of cases (see e.g. Vanwinckelen & Blockeel, 2012 and Raschka, 2018).
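
      As a rough illustration of the comparison described above, the sketch below scores two hypothetical model variants by k-fold cross-validated RMSE. The data values, the variant names, and the use of a linear scaling with an intercept are assumptions made for illustration; none of them are taken from the RANCH code.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_rmse(xs, ys, k):
    """Mean held-out RMSE over k folds (item i goes to fold i % k)."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        # Fit the scaling on the training items only.
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        # Score the fitted scaling on the held-out items.
        squared = [(slope * xs[i] + intercept - ys[i]) ** 2 for i in test]
        fold_errors.append(math.sqrt(sum(squared) / len(squared)))
    return sum(fold_errors) / k


# Hypothetical looking times (seconds) and raw outputs of two model variants.
observed = [8.0, 6.5, 5.0, 4.5, 7.0, 5.5]
variant_a = [12.0, 9.5, 7.0, 6.0, 10.0, 8.0]   # tracks the data closely
variant_b = [7.0, 7.2, 6.9, 7.1, 7.0, 7.05]    # nearly flat output

cv_a = k_fold_rmse(variant_a, observed, k=3)
cv_b = k_fold_rmse(variant_b, observed, k=3)
print(f"variant A: {cv_a:.3f}  variant B: {cv_b:.3f}")
```

      The variant with the lower held-out RMSE is preferred. No significance test is attached to the difference, in keeping with the argument above.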

      Reviewer #1 (Recommendations for the authors):

      "We treat the number of samples for each stimulus as being linearly related to looking time duration." Looking times were not log transformed? 

      Thank you for your question. The assumption of a linear relationship between the model’s predicted number of samples and looking time duration is intended as a measurement transformation, not a strict assumption about the underlying distribution of looking times. This linear mapping is used simply to establish a direct proportionality between model-generated samples and observed looking durations.

      However, in our statistical analyses, we do log-transform the empirical looking times to account for skewness and stabilize variance. This transformation is standard practice when analyzing infant looking time data but is independent of how we map model predictions to observed times. Since there is no a priori reason to assume that the number of model samples must relate to looking time in a strictly log-linear way, we retained a simple linear mapping while still applying a log transformation in our analytic models where appropriate.
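
      The distinction between the two steps can be sketched as follows. The numbers are invented, and fitting a slope plus an intercept by least squares is an assumption about how the linear mapping could be implemented, not a description of the actual analysis code.

```python
import math


def fit_linear_scaling(samples, looking_times):
    """Least-squares fit of looking_time ~ slope * samples + intercept."""
    n = len(samples)
    mean_x = sum(samples) / n
    mean_y = sum(looking_times) / n
    sxx = sum((x - mean_x) ** 2 for x in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(samples, looking_times))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(predicted, observed):
    """Root mean squared error between two equal-length sequences."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical model output (samples per stimulus) and looking times (seconds).
model_samples = [12.0, 9.0, 7.0, 6.0, 10.0]
looking_times_s = [8.1, 6.2, 4.9, 4.4, 6.8]

# Step 1: linear measurement mapping from model samples to seconds.
slope, intercept = fit_linear_scaling(model_samples, looking_times_s)
scaled_predictions = [slope * s + intercept for s in model_samples]
fit_error = rmse(scaled_predictions, looking_times_s)

# Step 2 (separate): log-transform the empirical looking times for the
# statistical models; this does not alter the linear mapping in step 1.
log_looking_times = [math.log(t) for t in looking_times_s]
```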

      It would be nice to have figures showing the results of the grid search over the parameter values. For example, a heatmap with sigma on x and eta on y, and goodness of fit indicated by colour, would show the quality of the model fit as a function of the parameters' values, but also if the parameters estimates are correlated (they shouldn't be). 

      Thank you for the suggestion. We agree that visualizing the grid search results can provide a clearer picture of how different parameter values affect model fit. In the supplementary materials, we already present analyses where we systematically search over one parameter at a time to find the best-fitting values.

      We also explored alternative visualizations, including heatmaps where sigma and eta are mapped on the x and y axes, with goodness-of-fit indicated by color. However, we found that the goodness of fit was very similar across parameter settings, making the heatmaps difficult to interpret due to minimal variation in color. This lack of variation in fit reflects the observation that our model predictions are robust to changes in parameter settings, which allows us to report strong out of sample predictions in Section 4. Instead, we opted to use histograms to illustrate general trends, which provide a clearer and more interpretable summary of the model fit across different parameter settings. Please see the heatmaps below, if you are interested. 

      Author response image 1.

      Model fit (measured by RMSE) across a grid of prior values for Alpha, Beta, and V shows minimal variation. This indicates that the model’s performance is robust to changes in prior assumptions.
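
      A robustness check of this kind can be sketched as an exhaustive grid search. In the sketch below, `toy_rmse` is a hypothetical stand-in for running the model and scoring it against looking-time data, and the grid values are illustrative rather than the ones used in the analysis.

```python
import itertools


def grid_search(fit_fn, grid):
    """Evaluate fit_fn on every parameter combination; return results sorted by fit."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((fit_fn(**params), params))
    results.sort(key=lambda item: item[0])  # lowest RMSE first
    return results


def toy_rmse(alpha, beta, v):
    """Hypothetical objective: a shallow bowl, so fit varies little across the grid."""
    return 1.0 + 0.01 * ((alpha - 1) ** 2 + (beta - 1) ** 2 + (v - 2) ** 2)


prior_grid = {"alpha": [0.5, 1, 2], "beta": [0.5, 1, 2], "v": [1, 2, 4]}
results = grid_search(toy_rmse, prior_grid)

best_rmse, best_params = results[0]
rmse_spread = results[-1][0] - best_rmse  # worst minus best fit on the grid
```

      A small `rmse_spread` relative to `best_rmse` corresponds to the flat heatmaps described above.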

      Regarding section 5.4, paragraph 2: It might be interesting to notice that a potential way to decorrelate these factors is to look at finer timescales (see Poli et al., 2024, Trends in Cognitive Sciences), which the current combination of neural nets and Bayesian inference could potentially be adapted to do. 

      Thank you for this insightful suggestion. We agree that examining finer timescales of looking behavior could provide valuable insights into the dynamics of attention and learning. In response, we have incorporated language in Section 5.4 to highlight this as a potential future direction: 

      Another promising direction is to explore RANCH’s applicability to finer timescales of looking behavior, enabling a more detailed examination of within-trial fluctuations in attention. Recent work suggests that analyzing moment-by-moment dynamics can help disentangle distinct learning mechanisms (Poli et al., 2024). Since RANCH models decision-making at the level of individual perceptual samples, it is well-suited to capture these fine-grained attentional shifts.

      Previous work integrating neural networks with Bayesian (like) models could be better acknowledged: Blakeman, S., & Mareschal, D. (2022). Selective particle attention: Rapidly and flexibly selecting features for deep reinforcement learning. Neural Networks, 150, 408-421. 

      Thank you for this feedback. We have now incorporated this citation into our discussion section: 

      RANCH integrates structured perceptual representations with Bayesian inference, allowing for stimulus-computable predictions of looking behavior and interpretable parameters at the same time. This integrated approach has been used to study selective attention (Blakeman & Mareschal, 2022).

      Unless I missed it, I could not find an OSF repository (although the authors refer to an OSF repository for a previous study that has not been included). In general, sharing the code would greatly help with reproducibility. 

      Thanks for this comment. We apologize that, although all of our code and data were available through GitHub, we did not provide links in the manuscript. We have now added these links at the end of the introduction section. 

      Reviewer #2 (Recommendations for the authors):

      Page 7 "infants clearly dishabituated on trials with longer exposures" - what are these stats comparing? Novel presentation to last familiar? 

      Thank you for pointing out this slightly confusing passage. The statistics reported compare looking time between the novel and familiar test trials after longer exposures. We have now added the following language: 

      Infants clearly dishabituated on trials with longer exposures, looking longer at the novel stimulus than the familiar stimulus after long exposure.

      Order effects were covaried in the model - does the RANCH model predict similar order effects to those observed in the empirical data, ie can it model more generic changes in attention as well as the stimulus-specific ones? 

      Thank you for this question. If we understand correctly, you are asking whether RANCH can capture order effects over the course of the experiment, such as general decreases in attention across blocks. Currently, RANCH does not model these block-level effects—it is designed to predict stimulus-driven looking behavior rather than more general attentional changes that occur over time such as fatigue. In our empirical analysis, block number was included as a covariate to account for these effects statistically, but RANCH itself does not have a mechanism to model block-to-block attentional drift independent of stimulus properties. This is an interesting direction for future work, where a model could integrate global attentional dynamics alongside stimulus-specific learning. To address this, we have added a sentence in the General Discussion saying:

      Similarly, RANCH does not capture more global attention dynamics, such as block-to-block attentional drift independent of stimulus properties.

      "We then computed the root mean squared error (RMSE) between the scaled model results and the looking time data." Why is this the most appropriate approach to considering model fit? Would be useful to have a brief explanation. 

      Thank you for pointing this out. We believe that we have now addressed this issue in Response to Comment #2 from Reviewer 1. 

      The title of subsection 3.3 made me think that you would be comparing RANCH to alternate hypotheses or models but this seems to be a comparison of ways of fitting parameters within RANCH - I think worth explaining that. 

      We have now added a sentence in the subsection to make the content of the comparison more explicit: 

      Here we evaluated different ways of specifying RANCH's decision-making mechanism (i.e., different "linking hypotheses" within RANCH).

      3.5 would be useful to have some statistics here - does performance significantly improve? 

      As discussed above, we systematically compared model variants using cross-validated RMSE and R² values, which provide quantitative evidence of improved performance. While these differences are substantial, we do not report statistical hypothesis tests, as significance testing is not typically appropriate for model comparison based on cross-validation (see Vanwinckelen & Blockeel, 2012; Raschka, 2018). Instead, we rely on out-of-sample predictive performance as a principled basis for evaluating model variants.

      It would be very helpful to have a formal comparison of RANCH and other models - this seems to be largely descriptive at the moment (3.6).

      We believe that we have now addressed this issue in our response to the first comment.

      Does individual infant data show any nonlinearities? Sometimes the position of the peak look is very heterogenous and so overall there appears to be no increase but on an individual level there is. 

      Thank you for your question. Given our experimental design, each exposure duration appears in separate blocks rather than in a continuous sequence for each infant. Because of this, the concept of an individual-level nonlinear trajectory over exposure durations does not directly apply. Instead, each infant contributes looking time data to multiple distinct conditions, rather than following a single increasing-exposure sequence. Any observed nonlinear trend across exposure durations would therefore be a group-level effect rather than a within-subject pattern.

      In 4.1, why 8 or 9 exposures rather than a fixed number? 

      We used slightly variable exposure durations to reduce the risk that infants develop fixed expectations about when a novel stimulus will appear. We have now clarified this point in the text.

      Why do results differ for the model vs empirical data for identity? Is this to do with semantic processing in infants that isn't embedded in the model? 

      Thank you for your comment. The discrepancy between the model and empirical data for identity violations is related to the discrepancy we discussed for number violations in the General Discussion. As noted there, RANCH relies on perceptual similarity derived from CNN embeddings, which may not fully capture distinctions that infants make.

      The model suggests the learner’s prior on noise is higher in infants than adults, so produces potentially mechanistic insights. 

      We agree! One of the key strengths of RANCH is its ability to provide mechanistic insights through interpretable parameters. The finding that infants have a higher prior on perceptual noise than adults aligns with previous research suggesting that early visual processing in infants is more variable and less precise.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: 

      LRRK2 protein is familially linked to Parkinson's disease by the presence of several gene variants that all confer a gain-of-function effect on LRRK2 kinase activity. 

      The authors examine the effects of BDNF stimulation in immortalized neuron-like cells, cultured mouse primary neurons, hIPSC-derived neurons, and synaptosome preparations from the brain. They examine an LRRK2 regulatory phosphorylation residue, LRRK2 binding relationships, and measures of synaptic structure and function. 

      Strengths: 

      The study addresses an important research question: how does a PD-linked protein interact with other proteins, and contribute to responses to a well-characterized neuronal signalling pathway involved in the regulation of synaptic function and cell health? 

      They employ a range of good models and techniques to fairly convincingly demonstrate that BDNF stimulation alters LRRK2 phosphorylation and binding to many proteins. Some effects of BDNF stimulation appear impaired in (some of the) LRRK2 knock-out scenarios (but not all). A phosphoproteomic analysis of PD mutant Knock-in mouse brain synaptosomes is included. 

      We thank this Reviewer for pointing out the strengths of our work. 

      Weaknesses: 

      The data sets are disjointed, conclusions are sweeping, and not always in line with what the data is showing. Validation of 'omics' data is very light. Some inconsistencies with the major conclusions are ignored. Several of the assays employed (western blotting especially) are likely underpowered, findings key to their interpretation are addressed in only one or other of the several models employed, and supporting observations are lacking. 

      We appreciate the Reviewer’s overall evaluation. In this revised version, we have provided several novel results that strengthen the omics data and the mechanistic experiments and bring the conclusions in line with the data.

      As examples to aid reader interpretation: (a) pS935 LRRK2 seems to go up at 5 minutes but goes down below pre-stimulation levels after (at times when BDNF-induced phosphorylation of other known targets remains very high). This is ignored in favour of discussion/investigation of initial increases, and the fact that BDNF does many things (which might indirectly contribute to initial but unsustained changes to pLRRK2) is not addressed.  

      We thank the Reviewer for raising this important point, which we agree deserves additional investigation. Although phosphorylation does decrease below pre-stimulation levels, a reduction is also observed for ERK/AKT upon sustained exposure to BDNF in our experimental paradigm (Figure 1F-G). This phenomenon is well known in response to a number of extracellular stimuli and can be explained by mechanisms related to cellular negative feedback regulation, receptor desensitization (e.g. phosphorylation or internalization), or cellular adaptation. The effect on pSer935, however, is peculiar as phosphorylation goes below the unstimulated level, as pointed out by the Reviewer. In contrast to ERK and AKT, whose phosphorylation is almost absent under unstimulated conditions (Figure 1F-G), the stoichiometry of Ser935 phosphorylation under unstimulated conditions is high. This observation is consistent with MS determination of the relative abundance of pSer935 (e.g. in whole brain LRRK2 is nearly 100% phosphorylated at Ser935, see Nirujogi et al., Biochem J 2021). Thus we hypothesized that the modest increase in phosphorylation driven by BDNF likely reflects a saturation or ceiling effect, indicating that the phosphorylation level is already near its maximum under resting conditions. Prolonged BDNF stimulation would bring phosphorylation down below pre-stimulation levels through the negative feedback mechanisms (e.g. phosphatase activity) explained above. To test this hypothesis, we conducted an experiment in which cells were pretreated for 90 minutes with the MLi-2 inhibitor to reduce basal phosphorylation of S935. After MLi-2 washout, we stimulated with BDNF at different time points. We used GFP-LRRK2 stable lines for this experiment, since the ceiling effect was particularly evident (Figure S1A) and this model has been used for the interactomic study. As shown below (and incorporated in Fig. S1B in the manuscript), LRRK2 responds robustly to BDNF stimulation both in terms of pSer935 and pRABs. Phosphorylation peaks at 5-15 mins, while it decreases to unstimulated levels at 60 and 180 minutes. Notably, while the peak of pSer935 at 5-15 mins is similar to the untreated condition (supporting that Ser935 is nearly saturated in unstimulated conditions), the phosphorylation of RABs during this time period exceeds unstimulated levels. These findings support the notion that, under basal conditions, RAB phosphorylation is far from saturation. The antibodies used to detect RAB phosphorylation are the following: RAB10 Abcam # ab230261 and RAB8 (pan RABs) Abcam # ab230260.

      Given the robust response of RAB10 phosphorylation upon BDNF stimulation, we further investigated RAB10 phosphorylation during BDNF stimulation in naïve SH-SY5Y cells. We confirmed that the increase in pSer935 is coupled to an increase in pT73-RAB10. Also in this case, RAB10 phosphorylation does not go below the unstimulated level, which aligns with the low pRAB10 stoichiometry in brain (Nirujogi et al., Biochem J 2021). This experiment adds the novel and exciting finding that BDNF stimulation increases LRRK2 kinase activity (RAB phosphorylation) in neuronal cells. 

      Note that new supplemental figure 1 now includes: A) a comparison of LRRK2 pS935 and total protein levels before and after RA differentiation; B) differentiated GFP-LRRK2 SH-SY5Y (unstimulated, BDNF, MLi-2, BDNF+MLi-2); C) the kinetics of the BDNF response in differentiated GFP-LRRK2 SH-SY5Y.

      (b) Drebrin coIP itself looks like a very strong result, as does the increase after BDNF, but this was only demonstrated with a GFP over-expression construct despite several mouse and neuron models being employed elsewhere and available for coIP of endogenous LRRK2. Also, the coIP is only demonstrated in one direction. Similarly, the decrease in drebrin levels in mice is not assessed in the other model systems, coIP wasn't done, and mRNA transcripts are not quantified (even though others were). Drebrin phosphorylation state is not examined.  

      We appreciate the Reviewer's suggestions and have provided additional experimental evidence supporting the functional relevance of the LRRK2-drebrin interaction.

      (1) As suggested, we performed qPCR and observed that 1-month-old KO midbrain and cortex express lower levels of Dbn1 as compared to WT brains (Figure 5G). This result is in agreement with the western blot data (Figure 5H). 

      (2) To further validate the physiological relevance of the LRRK2-drebrin interaction we performed two experiments:

      i) Western blots looking at pSer935 and pRab8 (pan Rab) in Dbn1 WT and knockout brains. As reported and quantified in Figure 2I, we observed a significant decrease in pSer935 and a trend toward decreased pRab8 in Dbn1 KO brains. This finding supports the notion that Drebrin forms a complex with LRRK2 that is important for its activity, e.g. upon BDNF stimulation. 

      ii) Reverse co-immunoprecipitation of YFP-drebrin full-length, N-terminal domain (1-256 aa) and C-terminal domain (256-649 aa) (plasmids kindly received from Professor Phillip R. Gordon-Weeks, Worth et al., J Cell Biol, 2013) with Flag-LRRK2 co-expressed in HEK293T cells. As shown in supplementary Fig. S2C, we confirm that YFP-drebrin binds LRRK2, with the N-terminal region of drebrin appearing to be the major contributor to this interaction. This result is important as the N-terminal region contains the ADF-H (actin-depolymerising factor homology) domain and a coiled-coil region known to directly bind actin (Shirao et al., J Neurochem 2017; Koganezawa et al., Mol Cell Neurosci. 2017). Interestingly, both full-length Drebrin and its truncated C-terminal construct cause the same morphological changes in F-actin, indicating that Drebrin-induced morphological changes in F-actin are mediated by its N-terminal domains rather than its intrinsically disordered C-terminal region (Shirao et al., J Neurochem, 2017; Koganezawa et al., Mol Cell Neurosci. 2017). Given the role of LRRK2 in actin-cytoskeletal dynamics and its binding to multiple actin-related proteins (Fig. 2 and Meixner et al., Mol Cell Proteomics. 2011; Parisiadou and Cai, Commun Integr Biol 2010), these results suggest the possibility that LRRK2 controls actin dynamics by competing with drebrin binding to actin and open new avenues for future studies.

      (3) To address the request for examining drebrin phosphorylation state, we decided to perform another phosphoproteomic experiment, leveraging a parallel analysis incorporated in our latest manuscript (Chen et al., Mol Therapy 2025). In this experiment, we isolated total striatal proteins from WT and G2019S KI mice and enriched the phospho-peptides. Unlike the experiment presented in Fig. 7, phosphopeptides were enriched from total striatal lysates rather than synaptosomal fractions, and phosphorylation levels were normalized to the corresponding total protein abundance. This approach was intended to avoid bias toward synaptic proteins, allowing for the analysis of a broader pool of proteins derived from a heterogeneous ensemble of cell types (neurons, glia, endothelial cells, pericytes etc.). We were pleased to find that this new experiment confirmed drebrin S339 as a differentially phosphorylated site, with a 3.7-fold higher abundance in G2019S Lrrk2 KI mice. The fact that this experiment showed an increase in phosphorylation stoichiometry in G2019S mice rather than a decrease is likely due to the normalization of each peptide by its corresponding total protein. Gene ontology analysis of differentially phosphorylated proteins using stringent term size (<200 genes) showed post-synaptic spines and presynaptic active zones as enriched categories (Fig. 3F). A SynGO analysis confirms both pre- and postsynaptic categories, with high significance for terms related to postsynaptic cytoskeleton (Fig. 3G). As noted, this is particularly interesting as the starting material was whole striatal tissue (not synaptosomes as previously), indicating that the most significant phosphorylation differences occur in synaptic compartments. This once again reinforces our hypothesis that LRRK2 has a prominent role in the synapse. 

      Overall, we confirmed with an independent phosphoproteomic analysis that LRRK2 kinase activity influences the phosphorylation state of proteins related to synaptic function, particularly postsynaptic cytoskeleton. For clarity in data presentation, as mentioned by the Reviewers, we removed Figure 7 and incorporated this new analysis in figure 3, alongside the synaptic cluster analysis. 

      Altogether, three independent OMICs approaches – (i) experimental LRRK2 interactomics in neuronal cells, (ii) a literature-based LRRK2 synaptic/cytoskeletal interactor cluster, and (iii) a phospho-proteomic analysis of striatal proteins from G2019S KI mice (to model LRRK2 hyperactivity) – converge on the synaptic actin cytoskeleton as a key hub of LRRK2 neuronal function.

      (c) The large differences in the CRISPR KO cells in terms of BDNF responses are not seen in the primary neurons of KO mice, suggesting that other differences between the two might be responsible, rather than the lack of LRRK2 protein. 

      Considering that some variability is expected for these types of cultures and across different species, any difference in response magnitude and kinetics could be attributed to the levels of TrkB and downstream components expressed by the two cell types. 

      We are confident that differentiated SH-SY5Y cells provide a reliable model for our study, as we could translate the results obtained in SH-SY5Y cells to other models. However, to rule out the possibility that the more pronounced effect observed in SH-SY5Y KO cells with respect to Lrrk2 KO primary neurons was due to a CRISPR off-target effect, we performed an off-target analysis. Specifically, we selected the first 8 putative off-targets exhibiting a CFD (Cutting Frequency Determination) off-target score >0.2. 

      As shown in supplemental file 1, sequence disruption was observed only in the LRRK2 on-target site in LRRK2 KO SH-SY5Y cells, while the 8 off-target regions remained unchanged across the genotypes and relative to the reference sequence. 

      (d) No validation of hits in the G2019S mutant phosphoproteomics, and no other assays related to the rest of the paper/conclusions. Drebrin phosphorylation is different but unvalidated, or related to previous data sets beyond some discussion. The fact that LRRK2 binding occurs, and increases with BDNF stimulation, should be compared to its phosphorylation status and the effects of the G2019S mutation. 

      As illustrated in the response to point (b), we performed a new phosphoproteomics investigation – with total striatal lysates instead of striatal synaptosomes and normalization of phospho-peptides over total proteins – and found that S339 phosphorylation increases when LRRK2 kinase activity increases (G2019S). Regarding the request to validate drebrin phosphorylation, the main limitation is that there are no available antibodies against Ser339. While we tried Phos-tag gels in striatal lysates, we could not detect any reliable and specific signal with the same drebrin antibody used for western blot (Thermo Fisher Scientific: MA120377) due to technical limitations of the Phos-tag method. We are confident that phosphorylation at S339 has physiological relevance, as it was identified 67 times across multiple proteomic discovery studies and is among the most frequently phosphorylated sites in drebrin (https://www.phosphosite.org/proteinAction.action?id=2675&showAllSites=true).

      To infer a possible role of this phosphorylation, we looked at the predicted pathogenicity of substitutions at this site using AlphaMissense (Cheng et al., Science 2023). As shown in the supplementary figure (Fig. S3), amino acid substitutions within this site are predicted not to be pathogenic, which may partly reflect the low confidence of the AlphaFold structure. 

      Ser339 in human drebrin is located just before the proline-rich region (PP domain) of the protein. This region is situated between the actin-binding domains and the C-terminal Homer-binding sequences and plays a role in protein-protein interactions and cytoskeletal regulation (Worth et al., J Cell Biol, 2013). Of interest, this region was previously shown to be the interaction site of afadin (AFDN), a protein involved in multiple cytoskeletal-related processes, including synapse formation and function by regulating puncta adherentia junctions, presynaptic differentiation, and cadherin complex assembly, which are essential for hippocampal excitatory synapses, spine formation, and learning and memory processes (Beaudoin, G. M., 3rd et al., J Neurosci, 2013). Of note, afadin is in the list of LRRK2 interacting proteins (https://www.ebi.ac.uk/intact/home), supporting a possible functional relevance of LRRK2-mediated drebrin phosphorylation in afadin-drebrin complex formation. This point is now addressed in the Discussion section.

      The aim of this MS analysis in G2019S KI mice – now included in figure 3 – was to further validate the crucial role of LRRK2 kinase activity in the context of synaptic regulation, rather than to discover and characterize novel substrates. Consequently, Figure 7 has been eliminated. 

      Reviewer #2 (Public Review):  

      Taken as a whole, the data in the manuscript show that BDNF can regulate PD-associated kinase LRRK2 and that LRRK2 modifies the BDNF response. The chief strength is that the data provide a potential focal point for multiple observations across many labs. Since LRRK2 has emerged as a protein that is likely to be part of the pathology in both sporadic and LRRK2 PD, the findings will be of broad interest. At the same time, the data used to imply a causal throughline from BDNF to LRRK2 to synaptic function and actin cytoskeleton (as in the title) are mostly correlative and the presentation often extends beyond the data. This introduces unnecessary confusion. There are also many methodological details that are lacking or difficult to find. These issues can be addressed. 

      We appreciate the Reviewer’s positive feedback on our study. We also value the suggestion to present the data in a more streamlined and coherent way. In response, we have updated the title to better reflect our overall findings: “LRRK2 Regulates Synaptic Function through Modulation of Actin Cytoskeletal Dynamics.” Additionally, we have included several experiments that we believe enhance and unify the study.

      (1) The writing/interpretation gets ahead of the data in places and this was confusing. For example, the abstract highlights prior work showing that Ser935 LRRK2 phosphorylation changes LRRK2 localization, and Figure 1 shows that BDNF rapidly increases LRRK2 phosphorylation at this site. Subsequent figures highlight effects at synapses or with synaptic proteins. So is the assumption that LRRK2 is recruited to (or away from) synapses in response to BDNF? Figure 2H shows that LRRK2-drebrin interactions are enhanced in response to BDNF in retinoic acid-treated SH-SY5Y cells, but are synapses generated in these preps? How similar are these preps to the mouse and human cortical or mouse striatal neurons discussed in other parts of the paper (would it be anticipated that BDNF act similarly?) and how valid are SHSY5Y cells as a model for identifying synaptic proteins? Is drebrin localization to synapses (or its presence in synaptosomes) modified by BDNF treatment +/- LRRK2? Or do LRRK2 levels in synaptosomes change in response to BDNF? The presentation requires re-writing to stay within the constraints of the data or additional data should be added to more completely back up the logic. 

      We thank the Reviewer for the thorough suggestions and comments. We have extensively revised the text to accurately reflect our findings without overinterpreting. In particular, we agree with the Reviewer that differentiated SH-SY5Y cells are not identical to primary mouse or human neurons; however, both neuronal models respond to BDNF. Indeed, a common protocol for differentiating SH-SY5Y cells involves BDNF in combination with retinoic acid (Martin et al., Front Pharmacol, 2022; Kovalevich et al., Methods in mol bio, 2013). Additionally, it has been reported that SH-SY5Y cells can form functional synapses (Martin et al., Front Pharmacol, 2022). While we are aware that BDNF, drebrin or LRRK2 can also affect non-synaptic pathways, we focused on synapses when we moved to mouse models because: (i) MS and phosphoMS identified several cytoskeletal proteins enriched at the synapse; (ii) we and others have previously reported a role for LRRK2 in governing synaptic and cytoskeletal related processes; (iii) the synapse is a critical site that becomes dysfunctional in the early stages of PD. We have now clarified and adjusted the text as needed. We have also performed additional experiments to address the Reviewer’s concern:

      (1) “Is the assumption that LRRK2 is recruited to (or away from) synapses in response to BDNF?” This is a very important point. There is consensus in the field that detecting endogenous LRRK2 in brain slices or in primary neurons via immunofluorescence is very challenging with the commercially available antibodies (Fernandez et al., J Parkinsons Dis, 2022). We established a method in our previous studies to detect LRRK2 biochemically in synaptosomes (Cirnaru et al., Front Mol Neurosci, 2014; Belluzzi et al., Mol Neurodegener., 2016). While these data indicate that LRRK2 is present in the synaptic compartments, it would be quite challenging to apply this method to the present study. In fact, applying acute BDNF stimulation in vivo and then isolating synaptosomes is a complex experiment beyond the timeframe of the revision, because it requires new ethical approvals for the mouse work. However, this is definitely an intriguing angle to explore in the future.

      (2) “Is drebrin localization to synapses (or its presence in synaptosomes) modified by BDNF treatment +/- LRRK2?” To try and address this question, we adapted a previously published assay to measure drebrin exodus from dendritic spines. During calcium entry and LTP, drebrin exits dendritic spines and accumulates in the dendritic shafts and cell body (Koganezawa et al., 2017). This facilitates the reorganization of the actin cytoskeleton (Shirao et al., 2017). Given the known role of drebrin and its interaction with LRRK2, we hypothesized that LRRK2 loss might affect drebrin relocalization during spine maturation.

      To test this, we treated DIV14 primary cortical neurons from Lrrk2 WT and KO mice with BDNF for 5 minutes, 15 minutes, and 24 hours, then performed confocal imaging of drebrin localization (Author response image 1). Neurons were transfected at DIV4 with GFP (cell filler) and PSD95 (dendritic spines) for visualization, and endogenous drebrin was stained with an anti-drebrin antibody. We then measured drebrin's overlap with PSD95-positive puncta to track its localization at the spine.

      In Lrrk2 WT neurons, drebrin relocalized away from spines after BDNF stimulation, with the exodus peaking at 15 minutes, and showed higher co-localization with PSD95 at 24 hours, indicating that spine remodeling had occurred. In contrast, Lrrk2 KO neurons showed no drebrin exodus. These findings support the notion that LRRK2's interaction with drebrin is important for BDNF-induced spine remodeling. However, additional experiments with larger sample sizes are needed, which were not feasible within the revision timeframe (here n=2 experiments with independent neuronal preparations, n=4-7 neurons analyzed per experiment). Thus, we included the relevant figure as Author response image 1 but chose not to add it to the manuscript (figure 3).

      Author response image 1.

      Lrrk2 affects drebrin exodus from dendritic spines. Primary neurons from Lrrk2 WT and KO mice were transfected with GFP and PSD95 at DIV4, exposed to BDNF for different times (5 minutes, 15 minutes and 24 hours) at DIV14, and stained for endogenous drebrin. The amount of drebrin localizing in dendritic spines outlined by PSD95 was then assessed. The graph shows a pronounced decrease in drebrin content in WT neurons during short treatments and an increase after 24 hours. KO neurons show no evident variation in drebrin localization upon BDNF stimulation. Scale bar: 4 μm.

      (2) The experiments make use of multiple different kinds of preps. This makes it difficult at times to follow and interpret some of the experiments, and it would be of great benefit to more assertively insert "mouse" or "human" and cell type (cortical, glutamatergic, striatal, gabaergic) etc. 

      We thank the Reviewer for pointing this out. We have now more clearly specified the cell type and species identity throughout the text to improve clarity and interpretation.

      (3) Although BDNF induces quantitatively lower levels of ERK or Akt phosphorylation in LRRK2KO preps based on the graphs (Figure 4B, D), the western blot data in Figure 4C make clear that BDNF does not need LRRK2 to mediate either ERK or Akt activation in mouse cortical neurons and in 4A, ERK in SH-SY5Y cells. The presentation of the data in the results (and echoed in the discussion) writes of a "remarkably weaker response". The data in the blots demand more nuance. It seems that LRRK2 may potentiate a response to BDNF that in neurons is independent of LRRK2 kinase activity (as noted). This is more of a point of interpretation, but the words do not match the images.  

      We thank the Reviewer for pointing this out. We have rephrased our data presentation to better convey our findings. We were not surprised to find that loss of LRRK2 causes only a reduction of ERK and AKT activation upon BDNF rather than a complete loss. This is because these pathways are complex and redundant and are activated by a number of cellular effectors. The fact that LRRK2 is one among many players whose function can be compensated by other signaling molecules is also supported by the phenotype of Lrrk2 KO mice, which is measurable at 1 month but disappears in adulthood (4 and 18 months) (figure 5).

      Moreover, we removed the sentence “Of note, 90 mins of Lrrk2 inhibition (MLi-2) prior to BDNF stimulation did not prevent phosphorylation of Akt and Erk1/2, suggesting that LRRK2 participates in BDNF-induced phosphorylation of Akt and Erk1/2 independently from its kinase activity but dependently from its ability to be phosphorylated at Ser935 (Fig. 4C-D and Fig. 1B-C)” since the MLi-2 treatment prior to BDNF stimulation was not quantified and our new data point to an involvement of LRRK2 kinase activity upon BDNF stimulation.

      (4) Figure 4F/G shows an increase in PSD95 puncta per unit length in response to BDNF in mouse cortical neurons. The data do not show spine induction/dendritic spine density/or spine morphogenesis as suggested in the accompanying text (page 8). Since the neurons are filled/express gfp, spine density could be added or spines having PSD95 puncta. However, the data as reported would be expected to reflect spine and shaft PSDs and could also include some nonsynaptic sites. 

      The Reviewer is right. We have rephrased the text to reflect an increase in postsynaptic density (PSD) sites, which may include both spine and shaft PSDs, as well as potential nonsynaptic sites.

      (5) Experimental details are missing that are needed to fully interpret the data. There are no electron microscopy methods outside of the figure legend. And for this and most other microscopy-based data, there are few to no descriptions of what cells/sites were sampled, how many sites were sampled, and how regions/cells were chosen. For some experiments (like Figure 5D), some detail is provided in the legend (20 segments from each mouse), but it is not clear how many neurons this represents, where in the striatum these neurons reside, etc. For confocal z-stacks, how thick are the optical sections and how thick is the stack? The methods suggest that data were analyzed as collapsed projections, but they cite Imaris, which usually uses volumes, so this is confusing. The guide (sgRNA) sequences that were used should be included. There is no mention of sex as a biological variable. 

      We thank the Reviewer for pointing out this missing information. We have now included:

      (1) EM methods (page 24)

      (2) Methods for ICC and confocal microscopy now include the Z-stack thickness (0.5 μm x 6 = 3 μm) on page 23.

      (3) Methods for Golgi-Cox staining now include the Z-stack thickness and the number of neurons and segments per neuron analyzed.

      (4) The sex of mice is mentioned in the material and methods (page 17): “Approximately equal numbers of males and females were used for every experiment”.

      (6) For Figures 1F, G, and E, how many experimental replicates are represented by blots that are shown? Graphs/statistics could be added to the supplement. For 1C and 1I, the ANOVA p-value should be added in the legend (in addition to the post hoc value provided). 

      The blots in figure 1E, F and G are representative of several blots (at least n=5). The same readouts are part of figure 4, where quantifications are provided. We added the ANOVA p-value in the legend for figures 1C, 1I and 1K.

      (7) Why choose 15 minutes of BDNF exposure for the mass spec experiments when the kinetics in Figure 1 show a peak at 5 mins?  

      This is an important point. We repeated the experiment in GFP-LRRK2 SH-SY5Y cells (figure S1C) and included the 15 min time point. In addition to confirming that pSer935 increases similarly at 5 and 15 minutes, we also observed an increase in RAB phosphorylation at these time points. As mentioned in our response to Reviewer 1, we pretreated with MLi-2 for 90 minutes in this experiment to reduce the high basal phosphorylation stoichiometry of pSer935.

      (8) The schematic in Figure 6A suggests that iPSCs were plated, differentiated, and cultured until about day 70 when they were used for recordings. But the methods suggest they were differentiated and then cryopreserved at day 30, and then replated and cultured for 40 more days. Please clarify if day 70 reflects time after re-plating (30+70) or total time in culture (70). If the latter, please add some notes about re-differentiation, etc. 

      We thank the reviewer for the opportunity to clarify the iPSC methodology. In the submitted manuscript, 70DIV represents the total time in vitro: the process involved a cryostorage event at 30DIV, followed by a thaw of the cells and a further 40 days of maturation before measurement. We have adjusted the methods in both the text and figure (new schematic) to clarify this. The cryopreservation step has been used in other iPSC methods to great effect (Drummond et al., Front Cell Dev Biol, 2020). Given the complexity and length of the iPSC neuronal differentiation process, cryopreservation is a useful way to shorten experiments, make them easier to repeat, and reduce the considerable variation between differentiations. Because culture conditions differ slightly for each batch of neurons thawed, each batch can usefully be treated as a new and separate N relative to the next batch.

      (9) When Figures 6B and 6C are compared it appears that mEPSC frequency may increase earlier in the LRRK2KO preps than in the WT preps since the values appear to be similar to WT + BDNF. In this light, BDNF treatment may have reached a ceiling in the LRRK2KO neurons.

      We thank the reviewer for his/her comment and observations about the ceiling effects. It is indeed possible that the loss of LRRK2 and the application of BDNF could cause the same elevation in synaptic neurotransmission. In such a situation, the increased activity as a result of BDNF treatment would be masked by the increased activity observed as a result of LRRK2 KO. To better visualize the difference between WT and KO cultures and the possible ceiling effect, we merged the data into a single graph.

      (10) Schematic data in Figures 5A and C and Figures 5B and E are too small to read/see the data. 

      We thank the Reviewer for this suggestion. We have now enlarged figure 5A and moved the graph of figure 5D to supplemental figure S5, since this analysis of spine morphology is secondary to the one shown in figure 5C.

      Reviewer #1 (Recommendations For The Authors): 

      Please forgive any redundancy in the comments, I wanted to provide the authors with as much information as I had to explain my opinion. 

      Primary mouse cortical neurons at div14, 20% transient increase in S935 pLRRK2 5min after BDNF, which then declines by 30 minutes (below pre-stim levels, and maybe LRRK2 protein levels do also). 

      In differentiated SHSY5Y cells there is a large expected increase in pERK and pAKT that is sustained way above pre-stim for 60 minutes. There is a 50% initial increase in pLRRK2 (but the blot is not very clear and no double band in these cells), which then looks like reduced well below pre-stim by 30 & 60 minutes. 

      We thank the Reviewer for bringing up this important point. We have extensively addressed this issue in the public review rebuttal. In essence, the phosphorylation of Ser935 is near saturation under unstimulated conditions, as evidenced by its high basal stoichiometry, whereas Rab phosphorylation is far from saturation, showing an increase upon BDNF stimulation before returning to baseline levels. This distinction highlights that while pSer935 exhibits a ceiling effect due to its near-maximal phosphorylation at rest, pRab responds dynamically to BDNF, indicating low basal phosphorylation and a significant capacity for increase. Figure 1 in the rebuttal summarizes the new data collected.

      GFP-fused overexpressed LRRK2 coIPs with drebrin, and this is double following 15 min BDNF. Strong result.

      We thank the Reviewer.

      BDNF-induced pAKT signaling is greatly impaired, and pERK is somewhat impaired, in CRISPR LKO SHSY5Y cells. In mouse primaries, both AKT and Erk phosph is robustly increased and sustained over 60 minutes in WT and LKO. This might be initially less in LKO for Akt (hard to argue on a WB n of 3 with huge WT variability), regardless they are all roughly the same by 60 minutes and even look higher in LKO at 60. This seems like a big disconnect and suggests the impairment in the SHSy5Y cells might have more to do with the CRISPR process than the LRRK2. Were the cells sequenced for off-target CRISPR-induced modifications?  

      Following the Reviewer's suggestion, and as discussed in the public review section, we performed an off-target analysis. Specifically, we selected the first 8 putative off-target sites exhibiting a CFD (Cutting Frequency Determination) off-target score >0.2. As shown in supplemental file 1, sequence disruption was observed only at the LRRK2 on-target site in LRRK2 KO SH-SY5Y cells, while the 8 off-target regions remained unchanged across the genotypes and relative to the reference sequence.
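
      The selection rule described above (keep predicted off-target sites whose score exceeds 0.2, then take the first 8) can be sketched as follows. This is only an illustration under stated assumptions: the site names and scores are invented placeholders, not the sites listed in supplemental file 1; ranking by descending score is assumed; and `select_off_targets` is a hypothetical helper, not a function of any CRISPR design tool.

```python
from typing import List, Tuple

Site = Tuple[str, float]  # (site label, predicted off-target score)


def select_off_targets(candidates: List[Site],
                       threshold: float = 0.2,
                       limit: int = 8) -> List[Site]:
    """Return up to `limit` sites with score strictly above `threshold`,
    ranked from highest to lowest score (assumed ranking)."""
    passing = [site for site in candidates if site[1] > threshold]
    passing.sort(key=lambda site: site[1], reverse=True)
    return passing[:limit]


# Placeholder predictions for illustration only (hypothetical values).
candidates = [
    ("site_A", 0.61), ("site_B", 0.55), ("site_C", 0.48), ("site_D", 0.40),
    ("site_E", 0.35), ("site_F", 0.31), ("site_G", 0.28), ("site_H", 0.25),
    ("site_I", 0.22), ("site_J", 0.15), ("site_K", 0.05),
]

selected = select_off_targets(candidates)
for name, score in selected:
    print(f"{name}\t{score:.2f}")
```

      In this toy list nine sites pass the threshold, so the lowest-scoring of them is dropped by the limit of 8; each retained site would then be sequenced and compared with the reference.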

      No difference in the density of large PSD-95 puncta in dendrites of LKO primary relative to WT, and the small (10%) increase seen in WT after BDNF might be absent in LKO (it is not clear to me that this is absent in every culture rep, and the data is not highly convincing). This is also referred to as spinogenesis, which has not been quantified. Why not is confusing as they did use a GFP fill... 

      The Reviewer is right that spinogenesis is not the appropriate term for the process analyzed. We replaced “spinogenesis” with “morphological alteration of dendritic protrusions” or “synapse maturation”, which is correlated with the number of PSD95-positive puncta (ElHusseini et al., Science, 2000).

      There is a difference in the percentage of dendritic protrusions classified as filopodia to more being classified as thin spines in LKO striatal neurons at 1 month, which is not seen at any other age, The WT filopodia seems to drop and thin spine percent rise to be similar to LKO at 4 months. This is taken as evidence for delayed maturation in LKO, but the data suggest the opposite. These authors previously published decreased spine and increased filopodia density at P15 in LKO. Now they show that filopodia density is decreased and thin spine density increased at one month. How is that shift from increased to decreased filopodia density in LKO (faster than WT from a larger initial point) evidence of impaired maturation? Again this seems accelerated? 

      We agree with the Reviewer that the initial interpretation was indeed confusing. To adhere closely to our data and avoid overinterpretation, as also suggested by Reviewer 2, we revised the text and moved figure 5D to the supplementary materials. In essence, our data point to alterations in the structural properties of dendritic protrusions in young KO mice, specifically a reduction in their size (head width and neck height) and a decrease in postsynaptic density (PSD) length, as observed with TEM. These findings suggest that LRRK2 is involved in morphological processes during spine development.

      Shank3 and PSD95 mRNA transcript levels were reduced in the LKO midbrain, only shank3 was reduced in the striatum and only PSD was reduced in the cortex. No changes to mRNA of BDNF-related transcripts. None of these mRNA changes protein-validated. Drebrin protein (where is drebrin mRNA?) levels are reduced in LKO at 1&4 but not clearly at 18 months (seems the most robust result but doesn't correlate with other measures, which here is basically a transient increase (1m) in thin striatal spines).  

      As described above, we performed qPCR for Dbn1 and found that its expression is significantly reduced in the cortex and midbrain and non-significantly reduced in the striatum (1-month-old mice, a different cohort from the one used for the other analyses in figure 5).

      24h BDNF increases the frequency of mEPSCs on hIPSC-derived cortical-like neurons, but not LKO, which is already high. There are no details of synapse number or anything for these cultures and compares 24h treatment. BDNF increases mEPSC frequency within minutes PMC3397209, and acute application while recording on cells may be much more informative (effects of BDNF directly, and no issues with cell-cell / culture variability). Calling mEPSC "spontaneous electrical activity" is not standard.  

      We thank the reviewer for this point. We provided information about synapse number (Bassoon/Homer colocalization) in supplementary figure S7. The lack of an mEPSC response to BDNF in LRRK2 KO cultures is likely due to increased release probability, as the number of synapses does not differ between the two genotypes.

      The pattern of LRRK2 activation is very disconnected from that of BDNF signalling onto other kinases. Regarding pLRRK2, s935 is a non-autophosph site said to be required for LRRK2 enzymatic activity, that is mostly used in the field as a readout of successful LRRK2 inhibition, with some evidence that this site regulates LRRK2 subcellular localization (which might be more to do with whether or not it is p at 935 and therefor able to act as a kinase). 

      The authors imply BDNF is activating LRRK2, but really should have looked at other sites, such as the autophospho site 1292 and 'known' LRRK2 substrates like T73 pRab10 (or other e.g., pRab12) as evidence of LRRK2 activation. One can easily argue that the initial increase in pLRRK2 at this site is less consequential than the observation that BDNF silences LRRK2 activity based on p935 being sustained to being reduced after 5 minutes, and well below the prestim levels... not that BDNF activates LRRK2. 

      As described above, we have collected new data showing that BDNF stimulation increases LRRK2 kinase activity toward its physiological substrates Rab10 and Rab8 (using a pan-phospho-Rab antibody) (Figure 1 and Figure S1). Additionally, we have commented extensively on the ceiling effect of pS935.

      BDNF does a LOT. What happens to network activity in the neural cultures with BDNF application? Should go up immediately. Would increasing neural activity (i.e., through depolarization, forskolin, disinhibition, or something else without BDNF) give a similar 20% increase in pS935 LRRK2? Can this be additive, or occluded? This would have major implications for the conclusions that BDNF and pLRRK2 are tightly linked (as the title suggests).  

      These are very valuable observations; however, they fall outside the scope and timeframe of this study. We agree that future research should focus on gaining a deeper mechanistic understanding of how LRRK2 regulates synaptic activity, including vesicle release probability and postsynaptic spine maturation, independently of BDNF.

      Figures 1A & H "Western blot analysis revealed a rapid (5 mins) and transient increase of Ser935 phosphorylation after BDNF treatment (Fig. 1B and 1C). Of interest, BDNF failed to stimulate Ser935 phosphorylation when neurons were pretreated with the LRRK2 inhibitor MLi-2" . The first thing that stands out is that the pLRRK2 in WB is not very clear at all (although we appreciate it is 'a pig' to work with, I'd hope some replicates are clearer); besides that, the 20% increase only at 5min post-BDNF stimulation seems like a much less profound change than the reduction from base at 60 and more at 180 minutes (where total LRRK2 protein is also going down?). That the blot at 60 minutes in H is representative of a 30% reduction seems off... makes me wonder about the background subtraction in quantification (for this there is much less pLRRK2 and more total LRRK2 than at 0 or 5). LRRK2 (especially) and pLRRK2 seem very sketchy in H. Also, total LRRK2 appears to increase in the SHSY5Y cell not the neurons, and this seems even clearer in 2 H. 

      To better visualize the dynamics of pS935 variation relative to time=0, we presented the data as the difference between t=0 and t=x. This clearly shows that pS935 goes below prestimulation levels, whereas pRab10 does not. The large difference in the initial stoichiometry of these two phosphorylation events is extensively discussed above.
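
      The baseline-difference presentation described above amounts to subtracting the t=0 value from every time point. A minimal sketch is given below, assuming normalized phospho/total ratios keyed by minutes after BDNF; the numbers are hypothetical placeholders chosen only to show the shape of the calculation and are not the measured values, and `delta_from_baseline` is an illustrative helper name.

```python
from typing import Dict


def delta_from_baseline(timecourse: Dict[int, float]) -> Dict[int, float]:
    """Express each time point as its difference from the t=0 value.

    Positive values are above prestimulation level, negative values below.
    Results are rounded to 3 decimals to avoid floating-point noise.
    """
    baseline = timecourse[0]
    return {t: round(value - baseline, 3) for t, value in timecourse.items()}


# Hypothetical normalized signal (phospho / total) at 0, 5, 30, 60 min.
ps935_example = {0: 1.0, 5: 1.2, 30: 0.8, 60: 0.7}

deltas = delta_from_baseline(ps935_example)
for minutes, delta in deltas.items():
    print(f"t={minutes:>2} min  delta={delta:+.2f}")
```

      Plotted this way, a signal that overshoots and then falls below its starting level crosses zero, which makes a drop below prestimulation levels easy to distinguish from a simple return to baseline.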

      That MLi2 eliminates pLRRK2 (and seems to reduce LRRK2 protein?) isn't surprising, but a 90min pretreatment with MLi-2 should be compared to MLi-2's vehicle alone (MLi-2 is notoriously insoluble and the majority of diluents have bioactive effects like changing activity)... especially if concluding increased pLRRK2 in response to BDNF is a crucial point (when comparing against effects on other protein modifications such as pAKT). This highlights a second point... the changes to pERK and pAKT are huge following BDNF (nothing to massive quantities), whereas pLRRK2 increases are 20-50% at best. This suggests a very modest effect of BDNF on LRRK in neurons, compared to the other kinases. I worry this might be less consequential than claimed. Change in S1 is also unlikely to be significant... 

      These comments have been thoroughly addressed in the previous responses. Regarding fig. S1, we added an additional experiment (Figure S1C) in GFP-LRRK2 cells showing robust activation of LRRK2 (pS935, pRabs) at the timepoint of MS (15 min).

      "As the yields of endogenous LRRK2 purification were insufficient for AP-MS/MS analysis, we generated polyclonal SH-SY5Y cells stably expressing GFP-LRRK2 wild-type or GFP control (Supplementary Fig. 1)" . I am concerned that much is being assumed regarding 'synaptic function' from SHSY5Y cells... also overexpressing GFP-LRRK2 and looking at its binding after BDNF isn't synaptic function.  

      We appreciate the reviewer’s comment. We would like to clarify that the interactors enriched upon BDNF stimulation predominantly fall into semantic categories related to the synapse and actin cytoskeleton. While this does not imply that these interactors are exclusively synaptic, it suggests that this tightly interconnected network likely plays a role in synaptic function. This interpretation is supported by several lines of evidence: (1) previous studies have demonstrated the relevance of this compartment to LRRK2 function; (2) our new phosphoproteomics data from striatal lysate highlight enrichment of synaptic categories; and (3) analysis of the latest GWAS gene list (134 genes) also indicates significant enrichment of synapse-related categories. Taken together, these findings justify further investigation into the role of LRRK2 in synaptic biology, as discussed extensively in the manuscript’s discussion section.

      Figure 2A isn't alluded to in text and supplemental table 1 isn't about LRRK2 binding, but mEPSCs. 

      We have added a reference to Figure 2A in the text and added supplementary .xls table 1, which is the Excel list of genes with modulated interaction upon BDNF (uploaded in the supplemental material).

      We also added the extension .xls for supplementary tables 2 and 3.

      Figure 2A is useless without some hits being named, and the donut plots in B add nothing beyond a statement that "35% of 'genes' (shouldn't this be proteins?) among the total 207 LRRK2 interactors were SynGO annotated" might as well [just] be the sentence in the text. 

      We have now included the names of the most significant hits, including cytoskeletal and translation-related proteins, as well as known LRRK2 interactors. We decided to retain the donut plots, as we believe they simplify data interpretation for the reader, reducing the need to jump back and forth between the figures and the text.

      Validation of drebrin binding in 2H is great... although only one of 8 named hits; could be increased to include some of the others. A concern alludes to my previous point... there is no appreciable LRRK2 in these cells until GFP-LRRK2 is overexpressed; is this addressed in the MS? Conclusions would be much stronger if bidirectional coIP of these binding candidates were shown with endogenous (GFP-ve) LRRK2 (primaries or hIPSCs, brain tissue?) 

      To address the Reviewer’s concerns to the best of our abilities, we have added a blot in Supplemental figure S1A showing how the expression levels of LRRK2 increase after RA differentiation. Moreover, we have included several new datasets that further strengthen the functional link between LRRK2 and drebrin, including qPCR of Dbn1 in one-month-old Lrrk2 KO brains, western blots of Lrrk2 and Rab in Dbn1 KO brains, and co-IP with drebrin N- and C-terminal domains.

      Figures 3 A-C are not informative beyond the text and D could be useful if proteins were annotated. 

      To avoid overcrowding, proteins were annotated in A, and the same network structure is reported for synaptic and actin-related interactors.

      Figure 4. Is this now endogenous LRRK2 in the SHSY5Y cells? Again not much LRRK2 though, and no pLRRK shown. 

      We confirm that these are naïve SH-SY5Y cells differentiated with RA and LRRK2 is endogenous. We did not assess pS935 in this experiment, as the primary goal was to evaluate pAKT and pERK1/2 levels. To avoid signal saturation, we loaded less total protein (30 µg instead of the 80 µg typically required to detect pS935). pS935 levels were extensively assessed in Figure 1. This experimental detail has now been added in the material and methods section (page 18).

      In C (primary neurons) There is very little increase in pLRRK2 / LRRK2 at 5 mins, and any is much less profound a change than the reduction at 30 & 60 mins. I think this is interesting and may be a more substantial consequence of BDNF treatment than the small early increase. Any 5 min increase is gone by 30 and pLRRK2 is reduced after. This is a disconnect from the timing of all the other pProteins in this assay, yet pLRRK2 is supposed to be regulating the 'synaptic effects'? 

      The first part of the question has already been extensively addressed. Regarding the timing, one possibility is that LRRK2 is activated upstream of AKT and ERK1/2, a hypothesis supported by the reduced activation of AKT and ERK1/2 observed in LRRK2 KO cells, as discussed in the manuscript, and in MLi-2 treated cells (Author response image 2). Concerning the synaptic effects, it is well established that synaptic structural and functional plasticity occurs downstream of receptor activation and kinase signaling cascades. These changes can be mediated by both rapid mechanisms (e.g., mobilization of receptor-containing endosomes via the actin cytoskeleton) and slower processes involving gene transcription of immediate early genes (IEGs). Since structural and functional changes at the synapse generally manifest several hours after stimulation, we typically assessed synaptic activity and structure 24 hours post-stimulation.

      Akt Erk1&2 both go up rapidly after BDNF in WT, although Akt seems to come down with pLRRK2. If they aren't all the same Akt is probably the most different between LKO and WT but I am very concerned about an n=3 for wb, wb is semi-quantitative at best, and many more than three replicates should be assessed, especially if the argument is that the increases are quantitively different between WT v KO (huge variability in WT makes me think if this were done 10x it would all look same). Moreover, this isn't similar to the LKO primaries  "pulled pups" pooled presumably. 

      Despite some variability in the magnitude of the pAKT/pERK response in naïve SH-SY5Y cells, all three independent replicates consistently showed a reduced response in LRRK2 KO cells, yielding a highly significant result in the two-way ANOVA test. In contrast, the difference in response magnitude between WT and LRRK2 KO primary cultures was less pronounced, which justified repeating the experiments with n=9 replicates. We hope the Reviewer acknowledges the inherent variability often observed in western blot experiments, particularly when performed in a fully independent manner (different cultures and stimulations, independent blots).

      To further strengthen the conclusion that this effect is reproducible and dependent on LRRK2 kinase activity upstream of AKT and ERK, we probed the membranes in figure 1H with pAKT/total AKT and pERK/total ERK. All things considered and consistent with our hypothesis, MLi-2 significantly reduced BDNF-mediated AKT and ERK1/2 phosphorylation levels (Author response image 2). 

      Author response image 2.

      Western blot (same experiments as in figure 1) was performed using antibodies against phospho-Thr202/185 ERK1/2, total ERK1/2, phospho-Ser473 AKT and total AKT in retinoic acid-differentiated SH-SY5Y cells stimulated with 100 ng/mL BDNF for 0, 5, 30, 60 mins. MLi-2 was used at 500 nM for 90 mins to inhibit LRRK2 kinase activity.

      G lack of KO effect seems to be skewed from one culture in the plot (grey). The scatter makes it hard to read, perhaps display the culture mean +/- BDNF with paired bars. The fact that one replicate may be changing things is suggested by the weirdly significant treatment effect and no genotype effect. Also, these are GFP-filled cells, the dendritic masks should be shown/explained, and I'm very surprised no one counted the number (or type?) of protrusions, especially as the text describes this assay (incorrectly) as spinogenesis... 

      As suggested by the Reviewer we have replotted the results as bar graphs. Regarding the number of protrusions, we initially counted the number of GFP+ puncta in the WT and did not find any difference (Author response image 3). Due to our imaging setup (confocal microscopy rather than super-resolution imaging and Imaris 3D reconstruction), we were unable to perform a fine morphometric analysis. However, this was not entirely unexpected, as BDNF is known to promote both the formation and maturation of dendritic spines. Therefore, we focused on quantifying PSD95+ puncta as a readout of mature postsynaptic compartments. While we acknowledge that we cannot definitively conclude that each PSD95+ punctum is synaptically connected to a presynaptic terminal, the data do indicate an increase in the number of PSD95+ structures following BDNF stimulation.

      Author response image 3.

      GFP+ puncta per unit of neurite length (µm) in DIV14 WT primary neurons, untreated or after 24 hours of BDNF treatment (100 ng/mL). No significant difference was observed (n=3).

      Figure 5. "Dendritic spine maturation is delayed in Lrrk2 knockout mice". The only significant change is at 1 month in KO which shows fewer filopodia and increased thin spines (50% vs wt). At 4 months the % of thin spines is increased to 60% in both... Filopodia also look like 4m in KO at 1m... How is that evidence for delayed maturation? If anything it suggests the KO spines are maturing faster. "the average neck height was 15% shorter and the average head width was 27% smaller, meaning that spines are smaller in Lrrk2 KO brains" - it seems odd to say this before saying that actually there are just MORE thin spines, the number of mature "mushroom' is same throughout, and the different percentage of thin comes from fewer filopodia. This central argument that maturation is delayed is not supported and could be backwards, at least according to this data. Similarly, the average PSD length is likely impacted by a preponderance of thin spines in KO... which if mature were fewer would make sense to say delayed KO maturation, but this isn't the case, it is the fewer filopodia (with no PSD) that change the numbers. See previous comments of the preceding manuscript. 

      We agree that thin spines, while often considered more immature, represent an intermediate stage in spine development. The data showing an increase in thin spines at 1 month in the KO mice, along with fewer filopodia, could suggest a faster stabilization of these spines, which might indeed be indicative of premature maturation rather than delayed maturation. This change in spine morphology may indicate that the dynamics of synaptic plasticity are affected. Regarding the PSD length, as the Reviewer pointed out, the increased presence of thin spines in KO might account for the observed changes in PSD measurements, as thin spines typically have smaller PSDs. This further reinforces the idea that the overall maturation process may be altered in the KO, but not necessarily delayed. 

      We rephrased the interpretation of these data and moved figure 5D to supplemental figure S4.

      "To establish whether loss of Lrrk2 in young mice causes a reduction in dendritic spines size by influencing BDNF-TrkB expression" - there is no evidence of this.  

      We agree and reorganized the text, removing this sentence.  

      Shank and PSD95 mRNA changes being shown without protein adds very little. Why is drebrin RNA not shown? Also should be several housekeeping RNAs, not one (RPL27)? 

      We measured Dbn1 mRNA, which shows a significant reduction in midbrain and cortex. Moreover, we have now normalized the transcript levels against the geometric mean of the relative abundance of three housekeeping genes (RPL27, actin, and GAPDH).
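
      As an illustration of normalizing a transcript against the geometric mean of several housekeeping genes, the following is a minimal sketch. The function names and all numeric values are hypothetical and are not taken from the authors' analysis; it only assumes that relative quantities have already been computed for each gene.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def normalize_to_reference_genes(target_quantity, reference_quantities):
    """Divide a target gene's relative quantity by the geometric mean
    of the reference (housekeeping) genes' relative quantities."""
    return target_quantity / geometric_mean(list(reference_quantities.values()))


# Hypothetical relative quantities for one sample (illustrative numbers only).
references = {"RPL27": 1.0, "actin": 2.0, "GAPDH": 4.0}
normalized_dbn1 = normalize_to_reference_genes(3.0, references)
```

      Using the geometric rather than the arithmetic mean keeps a single highly abundant reference gene from dominating the normalization factor.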

      Drebrin levels being lower in KO seems to be the strongest result of the paper so far (shame no pLRRK2 or coIP of drebrin to back up the argument). DrebrinA KO mice have normal spines, what about haploinsufficient drebrin mice (LKO seem to have half drebrin, but only as youngsters?)  

      As extensively explained in the public review, we used Dbn1 KO mouse brains and were able to show reduced Lrrk2 activity.

      Figure 6. hIPSC-derived cortical neurons. The WT 'cortical' neurons have a very low mEPSC frequency at 0.2Hz relative to KO. Is this because they are more or less mature? What is the EPSC frequency of these cells at 30 and 90 days for comparison? Also, it is very very hard to infer anything about mEPSC frequency in the absence of estimates of cell number and more importantly synapse number. Furthermore, where are the details of cell measures such as capacitance, resistance, and quality control e.g., Ra? Table s1 seems redundant here, besides suggesting that the amplitude is higher in KO at base. 

      We agree that the developmental trajectory of iPSC-derived neurons is critical to accurately interpreting synaptic function and plasticity. In response, we have included additional data now presented in the supplementary figure S7 and summarize key findings below:

      At DIV50, both WT and LRRK2 KO neurons exhibit low basal mEPSC activity (~0.5 Hz) and no response to 24 h BDNF stimulation (50 ng/mL).

      At DIV70, WT neurons show very low basal activity (~0.2 Hz), which increases ~7.5-fold upon BDNF treatment (1.5 Hz; p < 0.001), with no change in synapse number. KO neurons display elevated basal activity (~1 Hz) similar to BDNF-treated WT neurons, with no further increase upon BDNF exposure (~1.3 Hz) and no change in synapse number.

      At DIV90, BDNF had no significant effect in either WT or KO neurons, indicating a possible saturation of plastic responses. The lack of BDNF response at DIV90 may be due to endogenous BDNF production or culture-based saturation effects. While these factors warrant further investigation (e.g., ELISA, co-culture systems), they do not confound the key conclusions regarding the role of LRRK2 in synaptic development and plasticity:

      LRRK2 Enables BDNF-Responsive Synaptic Plasticity. In WT neurons, BDNF induces a significant increase in neurotransmitter release (mEPSC frequency) with no reduction in synapse number. This dissociation suggests BDNF promotes presynaptic functional potentiation. KO neurons fail to show changes in either synaptic function or structure in response to BDNF, indicating that LRRK2 is required for activity-dependent remodeling.

      LRRK2 Loss Accelerates Synaptic Maturation. At DIV70, KO neurons already exhibit high spontaneous synaptic activity equivalent to BDNF-stimulated WT neurons. This suggests that LRRK2 may act to suppress premature maturation and temporally gate BDNF responsiveness, aligning with the differences in maturation dynamics observed in KO mice (Figure 5).  

      As suggested by the reviewer, we reported the measurements of resistance and capacitance for all DIV (Table 1, supplemental material). A reduction in capacitance was observed in WT neurons at DIV90, which may reflect changes in membrane complexity. However, this did not correlate with differences in synapse number and is unlikely to account for the observed differences in mEPSC frequency. To control for cell number between groups, the non-dividing cells were counted prior to plating (80k/cm2; see also methods) to keep cell number consistent.

      The presence of BDNF in WT seems to make them look like LKO, in the rest of the paper the suggestion is that the LKO lack a response to BDNF. Here it looks like it could be that BDNF signalling is saturated in LKO, or they are just very different at base and lack a response.

      Knowing which is important to the conclusions, and acute application (recording and BDNF wash-in) would be much more convincing.

      We agree with the Reviewer’s point that saturation of BDNF could influence the interpretation of the data if it were to occur. However, it is important to note that no BDNF is present in the media under baseline control and KO neuronal culture conditions. This differs from other culture conditions and allows us to investigate the effects of BDNF treatment. Thus, the increased mEPSC frequency observed in KO neurons compared to WT neurons is determined only by the deletion of the gene and not by other extrinsic factors, which were kept consistent between the groups. We propose that the lack of response or change in mEPSC frequency in KO neurons reflects a compensatory mechanism due to the loss of LRRK2. Of note, LRRK2 as a “synaptic brake” has already been described (Beccano-Kelly et al., Hum Mol Gen, 2015). However, a comprehensive analysis of the underlying molecular mechanisms will require future studies beyond the scope of this paper.

      "The LRRK2 kinase substrates Rabs are not present in the list of significant phosphopeptides, likely due to the low stoichiometry and/or abundance" Likely due to the fact mass spec does not get anywhere near everything. 

      We removed this sentence in light of the new phosphoproteomic analysis.

      Figure 7 is pretty stand-alone, and not validated in any way, hard to justify its inclusion?  

      As extensively explained, we removed figure 7 and included the new phospho-MS as part of figure 3.

      Writing throughout shows a very selective and shallow use of the literature.  

      We extensively reviewed the citations.

      "while Lrrk1 transcript in this region is relatively stable during development" The authors reference a very old paper that barely shows any LRRK1 mRNA, and no protein. Others have shown that LRRK1 is essentially not present postnatally PMC2233633. This isn't even an argument the authors need to make. 

      We thank the reviewer and included this more appropriate citation. 

      Reviewer #2 (Recommendations For The Authors): 

      Cyfip1 (Fig 3A) is part of the WAVE complex (page 13). 

      We thank the reviewer and specified it.

      The discussion could be more focused. 

      We extensively revised the discussion to keep it more focused.

      Note that we updated the GO ontology analyses to reflect the updated information present in g:Profiler.

      References.

      Nirujogi, R. S., Tonelli, F., Taylor, M., Lis, P., Zimprich, A., Sammler, E., & Alessi, D. R. (2021). Development of a multiplexed targeted mass spectrometry assay for LRRK2-phosphorylated Rabs and Ser910/Ser935 biomarker sites. The Biochemical journal, 478(2), 299–326. https://doi.org/10.1042/BCJ20200930

      Worth, D. C., Daly, C. N., Geraldo, S., Oozeer, F., & Gordon-Weeks, P. R. (2013). Drebrin contains a cryptic F-actin-bundling activity regulated by Cdk5 phosphorylation. The Journal of cell biology, 202(5), 793–806. https://doi.org/10.1083/jcb.201303005

      Shirao, T., Hanamura, K., Koganezawa, N., Ishizuka, Y., Yamazaki, H., & Sekino, Y. (2017). The role of drebrin in neurons. Journal of neurochemistry, 141(6), 819–834. https://doi.org/10.1111/jnc.13988

      Koganezawa, N., Hanamura, K., Sekino, Y., & Shirao, T. (2017). The role of drebrin in dendritic spines. Molecular and cellular neurosciences, 84, 85–92. https://doi.org/10.1016/j.mcn.2017.01.004

      Meixner, A., Boldt, K., Van Troys, M., Askenazi, M., Gloeckner, C. J., Bauer, M., Marto, J. A., Ampe, C., Kinkl, N., & Ueffing, M. (2011). A QUICK screen for Lrrk2 interaction partners--leucine-rich repeat kinase 2 is involved in actin cytoskeleton dynamics. Molecular & cellular proteomics: MCP, 10(1), M110.001172. https://doi.org/10.1074/mcp.M110.001172

      Parisiadou, L., & Cai, H. (2010). LRRK2 function on actin and microtubule dynamics in Parkinson disease. Communicative & integrative biology, 3(5), 396–400. https://doi.org/10.4161/cib.3.5.12286

      Chen, C., Masotti, M., Shepard, N., Promes, V., Tombesi, G., Arango, D., Manzoni, C., Greggio, E., Hilfiker, S., Kozorovitskiy, Y., & Parisiadou, L. (2024). LRRK2 mediates haloperidol-induced changes in indirect pathway striatal projection neurons. bioRxiv : the preprint server for biology, 2024.06.06.597594. https://doi.org/10.1101/2024.06.06.597594

      Cheng, J., Novati, G., Pan, J., Bycroft, C., Žemgulytė, A., Applebaum, T., Pritzel, A., Wong, L. H., Zielinski, M., Sargeant, T., Schneider, R. G., Senior, A. W., Jumper, J., Hassabis, D., Kohli, P., & Avsec, Ž. (2023). Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science (New York, N.Y.), 381(6664), eadg7492. https://doi.org/10.1126/science.adg7492

      Beaudoin, G. M., 3rd, Schofield, C. M., Nuwal, T., Zang, K., Ullian, E. M., Huang, B., & Reichardt, L. F. (2012). Afadin, a Ras/Rap effector that controls cadherin function, promotes spine and excitatory synapse density in the hippocampus. The Journal of neuroscience : the official journal of the Society for Neuroscience, 32(1), 99–110. https://doi.org/10.1523/JNEUROSCI.4565-11.2012

      Fernández, B., Chittoor-Vinod, V. G., Kluss, J. H., Kelly, K., Bryant, N., Nguyen, A. P. T., Bukhari, S. A., Smith, N., Lara Ordóñez, A. J., Fdez, E., Chartier-Harlin, M. C., Montine, T. J., Wilson, M. A., Moore, D. J., West, A. B., Cookson, M. R., Nichols, R. J., & Hilfiker, S. (2022). Evaluation of Current Methods to Detect Cellular Leucine-Rich Repeat Kinase 2 (LRRK2) Kinase Activity. Journal of Parkinson's disease, 12(5), 1423–1447. https://doi.org/10.3233/JPD-213128

      Cirnaru, M. D., Marte, A., Belluzzi, E., Russo, I., Gabrielli, M., Longo, F., Arcuri, L., Murru, L., Bubacco, L., Matteoli, M., Fedele, E., Sala, C., Passafaro, M., Morari, M., Greggio, E., Onofri, F., & Piccoli, G. (2014). LRRK2 kinase activity regulates synaptic vesicle trafficking and neurotransmitter release through modulation of LRRK2 macromolecular complex. Frontiers in molecular neuroscience, 7, 49. https://doi.org/10.3389/fnmol.2014.00049

      Belluzzi, E., Gonnelli, A., Cirnaru, M. D., Marte, A., Plotegher, N., Russo, I., Civiero, L., Cogo, S., Carrion, M. P., Franchin, C., Arrigoni, G., Beltramini, M., Bubacco, L., Onofri, F., Piccoli, G., & Greggio, E. (2016). LRRK2 phosphorylates pre-synaptic N-ethylmaleimide sensitive fusion (NSF) protein enhancing its ATPase activity and SNARE complex disassembling rate. Molecular neurodegeneration, 11, 1. https://doi.org/10.1186/s13024-015-0066-z

      Martin, E. R., Gandawijaya, J., & Oguro-Ando, A. (2022). A novel method for generating glutamatergic SH-SY5Y neuron-like cells utilizing B-27 supplement. Frontiers in pharmacology, 13, 943627. https://doi.org/10.3389/fphar.2022.943627

      Kovalevich, J., & Langford, D. (2013). Considerations for the use of SH-SY5Y neuroblastoma cells in neurobiology. Methods in molecular biology (Clifton, N.J.), 1078, 9–21. https://doi.org/10.1007/978-1-62703-640-5_2

      Drummond, N. J., Singh Dolt, K., Canham, M. A., Kilbride, P., Morris, G. J., & Kunath, T. (2020). Cryopreservation of Human Midbrain Dopaminergic Neural Progenitor Cells Poised for Neuronal Differentiation. Frontiers in cell and developmental biology, 8, 578907. https://doi.org/10.3389/fcell.2020.578907

      Tao, X., Finkbeiner, S., Arnold, D. B., Shaywitz, A. J., & Greenberg, M. E. (1998). Ca2+ influx regulates BDNF transcription by a CREB family transcription factor-dependent mechanism. Neuron, 20(4), 709–726. https://doi.org/10.1016/s0896-6273(00)81010-7

      El-Husseini, A. E., Schnell, E., Chetkovich, D. M., Nicoll, R. A., & Bredt, D. S. (2000). PSD95 involvement in maturation of excitatory synapses. Science (New York, N.Y.), 290(5495), 1364–1368.

      Glebov OO, Cox S, Humphreys L, Burrone J. Neuronal activity controls transsynaptic geometry. Sci Rep. 2016 Mar 8;6:22703. doi: 10.1038/srep22703. Erratum in: Sci Rep. 2016 May 31;6:26422. doi: 10.1038/srep26422. PMID: 26951792; PMCID: PMC4782104.

      Beccano-Kelly DA, Volta M, Munsie LN, Paschall SA, Tatarnikov I, Co K, Chou P, Cao LP, Bergeron S, Mitchell E, Han H, Melrose HL, Tapia L, Raymond LA, Farrer MJ, Milnerwood AJ. LRRK2 overexpression alters glutamatergic presynaptic plasticity, striatal dopamine tone, postsynaptic signal transduction, motor activity and memory. Hum Mol Genet. 2015 Mar 1;24(5):1336-49. doi: 10.1093/hmg/ddu543. Epub 2014 Oct 24. PMID: 25343991.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors use anatomical tracing and slice physiology to investigate the integration of thalamic (ATN) and retrosplenial cortical (RSC) signals in the dorsal presubiculum (PrS). This work will be of interest to the field, as the postsubiculum is thought to be a key region for integrating internal head direction representations with external landmarks. The main result is that ATN and RSC inputs drive the same L3 PrS neurons, which exhibit superlinear summation to near-coincident inputs. Moreover, this activity can induce bursting in L4 PrS neurons, which can pass the signals to LMN (perhaps gated by cholinergic input).

      Strengths:

      The slice physiology experiments are carefully done. The analyses are clear and convincing, and the figures and results are well-composed. Overall, these results will be a welcome addition to the field.

      We thank this reviewer for the positive comment on our work.

      Weaknesses:

      The conclusions about the circuit-level function of L3 PrS neurons sometimes outstrip the data, and their model of the integration of these inputs is unclear. I would recommend some revision of the introduction and discussion. I also had some minor comments about the experimental details and analysis.

      Specific major comments:

      (1) I found that the authors' claims sometimes outstrip their data, given that there were no in vivo recordings during behavior. For example, in the abstract, their results indicate "that layer 3 neurons can transmit a visually matched HD signal to medial entorhinal cortex", and in the conclusion they state "[...] cortical RSC projections that carry visual landmark information converge on layer 3 pyramidal cells of the dorsal presubiculum". However, they never measured the nature of the signals coming from ATN and RSC to L3 PrS (or signals sent to downstream regions). Their claim is somewhat reasonable with respect to ATN, where the majority of neurons encode HD, but neurons in RSC encode a vast array of spatial and non-spatial variables other than landmark information (e.g., head direction, egocentric boundaries, allocentric position, spatial context, task history to name a few), so making strong claims about the nature of the incoming signals is unwarranted.

      We agree of course that RSC does not only encode landmark information. We have clarified this point in the introduction (line 69-70) and formulated more carefully in the abstract (removed the word ‘landmark’ in line 17) and in the introduction (line 82-83). In the discussion we explicitly state that ‘In our slice work we are blind to the exact nature of the signal that is carried by ATN and RSC axons’ (line 522-523).

      (2) Related to the first point, the authors hint at, but never explain, how coincident firing of ATN and RSC inputs would help anchor HD signals to visual landmarks. Although the lesion data (Yoder et al. 2011 and 2015) support their claims, it would be helpful if the proposed circuit mechanism was stated explicitly (a schematic of their model would be helpful in understanding the logic). For example, how do neurons integrate the "right" sets of landmarks and HD signals to ensure stable anchoring? Moreover, it would be helpful to discuss alternative models of HD-to-landmark anchoring, including several studies that have proposed that the integration may (also?) occur in RSC (Page & Jeffrey, 2018; Yan, Burgess, Bicanski, 2021; Sit & Goard, 2023). Currently, much of the Discussion simply summarizes the results of the study, this space could be better used in mapping the findings to the existing literature on the overarching question of how HD signals are anchored to landmarks.

      We agree with the reviewer on the importance of the question, how do neurons integrate the “right” sets of landmarks and HD signals to ensure stable anchoring? Based on our results we provide a schematic to illustrate possible scenarios, and we include it as a supplementary figure (Figure 1, to be included in the ms as Figure 7—figure supplement 2), as well as a new paragraph in the discussion section (line 516-531). We point out that critical information on the convergence and divergence of functionally defined inputs is still lacking, both for principal cells and interneurons.

      Interestingly, recent evidence from functional ultrasound imaging and electrical single cell recording demonstrated that visual objects may refine head direction coding, specifically in the dorsal presubiculum (Siegenthaler et al. bioRxiv 2024.10.21.619417; doi: https://doi.org/10.1101/2024.10.21.619417). The increase in firing rate for HD cells whose preferred firing direction corresponds to a visual landmark could be supported by the supralinear summation of thalamic HD signals and retrosplenial input described in our study. We include this point in the discussion (line 460-462), and hope that our work will spur further investigations.

      Reviewer #2 (Public Review):

      Richevaux et al investigate how anterior thalamic (AD) and retrosplenial (RSC) inputs are integrated by single presubicular (PrS) layer 3 neurons. They show that these two inputs converge onto single PrS layer 3 principal cells. By performing dual-wavelength photostimulation of these two inputs in horizontal slices, the authors show that in most layer 3 cells, these inputs summate supra-linearly. They extend the experiments by focusing on putative layer 4 PrS neurons, and show that they do not receive direct anterior thalamic nor retrosplenial inputs; rather, they are (indirectly) driven to burst firing in response to strong activation of the PrS network.

      This is a valuable study, that investigates an important question - how visual landmark information (possibly mediated by retrosplenial inputs) converges and integrates with HD information (conveyed by the AD nucleus of the thalamus) within PrS circuitry. The data indicate that near-coincident activation of retrosplenial and thalamic inputs leads to non-linear integration in target layer 3 neurons, thereby offering a potential biological basis for landmark + HD binding.

      The main limitations relate to the anatomical annotation of 'putative' PrS L4 neurons, and to the presentation of retrosplenial/thalamic input modularity. Specifically, more evidence should be provided to convincingly demonstrate that the 'putative L4 neurons' of the PrS are not distal subicular neurons (as the authors' anatomy and physiology experiments seem to indicate). The modularity of thalamic and retrosplenial inputs could be better clarified in relation to the known PrS modularity.

      We thank the reviewer for their important feedback. We discuss what defines presubicular layer 4 in horizontal slices, cite relevant literature, and provide new and higher resolution images. See below for detailed responses to the reviewer’s comments, in the section ‘recommendations to authors’.

      Reviewer #3 (Public Review):

      Summary:

      The authors sought to determine, at the level of individual presubiculum pyramidal cells, how allocentric spatial information from the retrosplenial cortex was integrated with egocentric information from the anterior thalamic nuclei. Employing a dual opsin optogenetic approach with patch clamp electrophysiology, Richevaux and colleagues found that around three-quarters of layer 3 pyramidal cells in the presubiculum receive monosynaptic input from both brain regions. While some interesting questions remain (e.g. the role of inhibitory interneurons in gating information flow through the different layers of presubiculum), this paper provides valuable insights into the microcircuitry of this brain region and the role that it may play in spatial navigation.

      Strengths:

      One of the main strengths of this manuscript was that the dual opsin approach allowed the direct comparison of different inputs within an individual neuron, helping to control for what might otherwise have been an important source of variation. The experiments were well-executed and the data was rigorously analysed. The conclusions were appropriate to the experimental questions and were well-supported by the results. These data will help to inform in vivo experiments aimed at understanding the contribution of different brain regions in spatial navigation and could be valuable for computational modelling.

      Weaknesses:

      Some attempts were made to gain mechanistic insights into how inhibitory neurotransmission may affect processing in the presubiculum (e.g. Figure 5) but these experiments were a little underpowered and the analysis carried out could have been more comprehensively undertaken, as was done for other experiments in the manuscript.

      We agree that the role of interneurons for landmark anchoring through convergence in Presubiculum requires further investigation. In our latest work on the recruitment of VIP interneurons we begin to address this point in slices (Nassar et al., 2024 Neuroscience. doi: 10.1016/j.neuroscience.2024.09.032.); more work in behaving animals will be needed.

      Reviewer #1 (Recommendations For The Authors):

      Full comments below. Beyond the (mostly minor) issues noted below, this is a very well-written paper and I look forward to seeing it in print.

      Major comments:

      (1) I found that the authors' claims sometimes outstrip their data, given that there were no in vivo recordings during behavior. For example, in the abstract, their results indicate "that layer 3 neurons can transmit a visually matched HD signal to medial entorhinal cortex", and in the conclusion they state "[...] cortical RSC projections that carry visual landmark information converge on layer 3 pyramidal cells of the dorsal presubiculum". However, they never measured the nature of the signals coming from ATN and RSC to L3 PrS (or signals sent to downstream regions). Their claim is somewhat reasonable with respect to ATN, where the majority of neurons encode HD, but neurons in RSC encode a vast array of spatial and non-spatial variables other than landmark information (e.g., head direction, egocentric boundaries, allocentric position, spatial context, task history to name a few), so making strong claims about the nature of the incoming signals is unwarranted.

      Our study was motivated by the seminal work from Yoder et al., 2011 and 2015, indicating that visual landmark information is processed in PoS and from there transmitted to the LMN. Based on that, and in the interest of readability, we may have used an oversimplified shorthand for the type of signal carried by RSC axons. There are numerous studies indicating a role for RSC in encoding visual landmark information (Auger et al., 2012; Jacob et al., 2017; Lozano et al., 2017; Fischer et al., 2020; Keshavarzi et al., 2022; Sit and Goard, 2023); we agree of course that this is certainly not the only variable that is represented. We have therefore changed the text to make this point clear:

      Abstract, line 17: removed the word ‘landmark’

      Introduction, line 69: added “...and supports an array of cognitive functions including memory, spatial and non-spatial context and navigation (Vann et al., 2009; Vedder et al., 2017). ”

      Introduction, line 82: changed “...designed to examine the convergence of visual landmark information, that is possibly integrated in the RSC, and vestibular based thalamic head direction signals”.

      Discussion, line 522-523: added “In our slice work we are blind to the exact nature of the signal that is carried by ATN and RSC axons.”

      (2) Related to the first point, the authors hint at, but never explain, how coincident firing of ATN and RSC inputs would help anchor HD signals to visual landmarks. Although the lesion data (Yoder et al., 2011 and 2015) support their claims, it would be helpful if the proposed circuit mechanism was stated explicitly (a schematic of their model would be helpful in understanding the logic). For example, how do neurons integrate the "right" sets of landmarks and HD signals to ensure stable anchoring? Moreover, it would be helpful to discuss alternative models of HD-to-landmark anchoring, including several studies that have proposed that the integration may (also?) occur in RSC (Page & Jeffrey, 2018; Yan, Burgess, Bicanski, 2021; Sit & Goard, 2023). Currently, much of the Discussion simply summarizes the results of the study, this space could be better used in mapping the findings to the existing literature on the overarching question of how HD signals are anchored to landmarks.

      We suggest a physiological mechanism for inputs to be selectively integrated and amplified, based on temporal coincidence. Of course there are still many unknowns, including the divergence of connections from a single thalamic or retrosplenial input neuron. The anatomical connectivity of inputs will be critical, as well as the subcellular arrangement of synaptic contacts. Neuromodulation and changes in the balance of excitation and inhibition will need to be factored in. While it is premature to provide a comprehensive explanation for landmark anchoring of HD signals in PrS, our results have led us to include a schematic, to illustrate our thinking (Figure 1, see below).

      Do HD tuned inputs from thalamus converge on similarly tuned HD neurons only? Is divergence greater for the retrosplenial inputs? If so, thalamic input might pre-select a range of HD neurons, and converging RSC input might narrow down the precise HD neurons that become active (Figure 1). In the future, the use of activity dependent labeling strategies might help to tie together information on the tuning of pre-synaptic neurons, and their convergence or divergence onto functionally defined postsynaptic target cells. This critical information is still lacking, for principal cells, and also for interneurons. 

      Interneurons may have a key role in HD-to-landmark anchoring. SST interneurons support stability of HD signals (Simonnet et al., 2017) and VIP interneurons flexibly disinhibit the system (Nassar et al., 2024). Could disinhibition be a necessary condition to create a window of opportunity for updating the landmark anchoring of the attractor? Single PV interneurons might receive thalamic and retrosplenial inputs non-specifically. We need to distinguish the conditions for when the excitation-inhibition balance in pyramidal cells may become tipped towards excitation, and the case of coincident, co-tuned thalamic and retrosplenial input may be such a condition. Elucidating the principles of hardwiring of inputs, as for example, selective convergence, will be necessary. Moreover, neuromodulation and oscillations may be critical for temporal coordination and precise temporal matching of HD-to-landmark signals.

      We note that matching directional with visual landmark information based on temporal coincidence as described here does not require synaptic plasticity. Algorithms for dynamic control of cognitive maps without synaptic plasticity have been proposed (Whittington et al., 2025, Neuron): information may be stored in neural attractor activity, and the idea that working memory may rely on recurrent updates of neural activity might generalize to the HD system. We include these considerations in the discussion (line 497-501; 521-531) and hope that our work will spur further experimental investigations and modeling work.

      While the focus of our work has been on PrS, we agree that RSC also treats HD and landmark signals. Possibly the RSC registers a direction to a landmark rather than comparing it with the current HD (Sit & Goard, 2023). We suggest that this integrated information then reaches PrS. In contrast to RSC, PrS is uniquely positioned to update the signal in the LMN (Yoder et al., 2011), cf. discussion (line 516-520).

      Minor comments:

      (1) Fig 1 - Supp 1: It appears there is a lot of input to PrS from higher visual regions, could this be a source of landmark signals?

      Yes, higher visual regions projecting to PrS may also be a source of landmark information, even if the visual signal is not integrated with HD at that stage (Sit & Goard 2023). The anatomical projection from the visual cortex was first described by Vogt & Miller (1983), but not studied on a functional level so far.

      (2) Fig 2F, G: Although the ATN and RSC measurements look quite similar, there are no stats included. The authors should use an explicit hypothesis test.

      We now compare the distributions of amplitudes and of latencies using the Mann-Whitney U test. No significant difference between the two groups was found. Added in the figure legend: 2F, “Mann-Whitney U test revealed no significant difference (p = 0.95)”. 2G, “Mann-Whitney U test revealed no significant difference (p = 0.13)”.
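
      For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Mann-Whitney U comparison of two groups. It is an illustration only, not the analysis code used for the figures: the function names and the amplitude values are hypothetical, and the p-value relies on a normal approximation without tie or continuity correction, so it is only approximate for small samples.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Assumptions of this sketch: no tie correction and no continuity
    correction are applied to the p-value.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:n1])              # rank sum of the first group
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mu = n1 * n2 / 2                  # mean of U under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u1, u2, p


# Hypothetical oEPSC amplitudes (pA); these are NOT data from the study.
atn_amplitudes = [120.0, 85.0, 240.0, 60.0, 150.0, 310.0]
rsc_amplitudes = [110.0, 95.0, 200.0, 75.0, 180.0, 260.0]

u1, u2, p = mann_whitney_u(atn_amplitudes, rsc_amplitudes)
print(f"U1 = {u1}, U2 = {u2}, p = {p:.3f}")
```

      In practice an established statistics package with exact small-sample p-values would be preferable; the sketch only shows what the test compares, namely the rank sums of the two groups.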

      (3) Fig 2 - Supp 2A, C: Again, no statistical tests. This is particularly important for panel A, where the authors state that the latencies are similar but the populations appear to be different.

      Inputs from ATN and RSC have a similar ‘jitter’ (latency standard deviation) and ‘tau decay’. We added in the Fig 2 - Supp 2 figure legend: A, “Mann-Whitney U test revealed no significant difference (p = 0.26)”. C, “Mann-Whitney U test revealed no significant difference (p = 0.87)”.

      As a complementary measure for the reviewer, we performed the Kolmogorov-Smirnov test which confirmed that the populations’ distributions for ‘jitter’ were not significantly different, p = 0.1533.

      (4) Fig 4E, F: The statistics reporting is confusing, why are asterisks above the plots and hashmarks to the side?

      Asterisks refer to a comparison between ‘dual’ and ‘sum’ for each of the 5 stimulations in a Sidak multiple comparison test. Hashmarks refer to comparison of the nth stimulation to the 1st one within dual stimulation events (Friedman + Dunn’s multiple comparison test). We mention the two-way ANOVA p-value in the legend (Sum v Dual, for both Amplitude and Surface).
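
      To make the ‘dual’ versus ‘sum’ comparison concrete, here is a minimal sketch under stated assumptions. We assume here that ‘sum’ denotes the arithmetic sum of the responses to ATN and RSC stimulation delivered separately, and ‘dual’ the response measured when both inputs are stimulated together. The function names, the 10% tolerance and the amplitude values are hypothetical and serve only as illustration; the statistical comparison in the figure relies on the tests named above, not on a fixed threshold.

```python
def summation_ratio(dual, atn_alone, rsc_alone):
    """Per-stimulus ratio of the dual response to the arithmetic sum.

    A ratio above 1 suggests supralinear summation, below 1 sublinear.
    """
    if not (len(dual) == len(atn_alone) == len(rsc_alone)):
        raise ValueError("all three response trains must have the same length")
    ratios = []
    for d, a, r in zip(dual, atn_alone, rsc_alone):
        expected = a + r  # linear prediction from the separate responses
        ratios.append(d / expected)
    return ratios


def classify(ratio, tolerance=0.1):
    """Label a ratio; the 10% tolerance is an arbitrary illustrative choice."""
    if ratio > 1 + tolerance:
        return "supralinear"
    if ratio < 1 - tolerance:
        return "sublinear"
    return "linear"


# Hypothetical EPSP amplitudes (mV) for a train of 5 stimulations.
atn_alone = [2.0, 2.4, 2.6, 2.5, 2.4]
rsc_alone = [1.5, 1.8, 2.0, 2.0, 1.9]
dual = [3.6, 5.2, 6.0, 5.6, 5.0]

for n, ratio in enumerate(summation_ratio(dual, atn_alone, rsc_alone), start=1):
    print(f"stimulus {n}: dual/sum = {ratio:.2f} ({classify(ratio)})")
```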

      (5) Fig 5C: I was confused by the 2*RSC manipulation. How do we know if there is amplification unless we know what the 2*RSC stim alone looks like?

      We now label the right panel in Fig 5C as “high light intensity” or “HLI”. Increasing the activation of Chrimson increases the amplitude of the summed EPSP, which now exceeds the threshold for amplification of synaptic events. Amplification refers to the shape of the plateau-like prolongation of the peak, most pronounced on the second EPSP, now indicated with an arrow. We also clarify this in the text (line 309-310).

      (6) Fig 6D (supplement 1): Typo, "though" should be "through"

      Yes, corrected (line 1015).

      (7) Fig 6G (supplement 1): Typo, I believe this refers to the dotted area in panel F, not panel A.

      Yes, corrected (line 1021).

      (8) Fig 7: The effect of muscarine was qualitatively described in the Results, but there is no quantification and it is not shown in the Figure. The results should either be reported properly or removed from the Results.

      We remove the last sentence in the Results.

      (9) Methods: The age and sex of the mice should be reported. Transgenic mouse line should be reported (along with stock number if applicable).

      We used C57BL6 mice with transgenic background (Ai14 mice, Jax n007914  reporter line) or C57BL6 wild type mice. This is now indicated in the Methods (lines 566-567).

      (10) Methods: If the viruses are only referred to with their plasmid number, then the capsid used for the viruses should be specified. For example, I believe the AAV-CAG-tomato virus used the retroAAV capsid, which is important to the experiment.

      Thank you for pointing this out. Indeed, the AAV-CAG-tdTom virus used the retroAAV capsid (line 575).

      (11) Data/code availability: I didn't see any sort of data/code availability statement, will the data and code be made publicly available?

      Data are stored on local servers at the SPPIN, Université Paris Cité, and are made available upon reasonable request. Code for intrinsic properties analysis is available on github (https://github.com/schoki0710/Intrinsic_Properties). This information is now included (line 717-720).

      (12) Very minor (and these might be a matter of opinion), but I believe "records" should be "recordings", and "viral constructions" should be "viral constructs".

      The text had benefited from proofreading by Richard Miles, who always preferred “records” to “recordings” in his writings. We choose to keep the current wording.

      Reviewer #2 (Recommendations For The Authors):

      Below are two major points that require clarification.

      (1) In the last set of experiments presented by the authors (Figs 6 onwards) they focus on 'putative L4' PrS cells. For several lines of evidence (outlined below), I am convinced that these neurons are not presubicular, but belong to the subiculum. I think this is a major point that requires substantial clarification, in order to avoid confusion in the field (see also suggestions on how to address this comment at the end of this section).

      Several lines of evidence support the interpretation that, what the authors call 'L4 PrS neurons', are distal subicular cells:

      (1.1) The anatomical location of the retrogradely-labelled cells (from mammillary bodies injections), as shown in Figs 6B, C, and Fig. 6_1B, very clearly indicates that they belong to the distal subiculum. The subicular-to-PrS boundary is a sharp anatomical boundary that follows exactly the curvature highlighted by the authors' red stainings. The authors could also use specific subicular/PrS markers to visualize this border more clearly - e.g. calbindin, Wfs-1, Zinc (though I believe this is not strictly necessary, since from the pattern of AD fibers, one can already draw very clear conclusions, see point 1.3 below).

      Our criteria to delimit the presubiculum are the following: First and foremost, we rely on the defining presence of antero-dorsal thalamic fibers that specifically target the presubiculum and not the neighbouring subiculum (Simonnet et al., 2017; Nassar et al., 2018; Simonnet and Fricker, 2018; Jiayan Liu et al., 2021). This provides the precise outline of the presubicular superficial layers 1 to 3. It may have been confusing to the reviewer that our slicing angle gives horizontal sections. In fact, horizontal sections are favourable for identifying the layer structure of the PrS, based on DAPI staining and the variations in cell body size. The work by Ishihara and Fukuda (2016) illustrates in their Figure 12 that the presubicular layer 4 lies below the presubicular layer 3, and forms a continuation with the subiculum (Sub1). Their Figure 4 indicates with a dotted line the “generally accepted border between the (distal) subiculum and PreS”, and it runs from the proximal tip of superficial cells of the PrS toward the white matter, along the radial direction of the cortical tissue. We agree with this definition. Others have sliced coronally (Cembrowski et al., 2018), which renders a different visualization of the border region with the subiculum.

      Second, let me explain the procedure for positioning the patch electrode in electrophysiological experiments on horizontal presubicular slices. Louis Richevaux, the first author, who carried out the layer 4 cell recordings, took great care to stay very close (<50 µm) to the lower limit of the zone where the GFP labeled thalamic axons can be seen. He was extremely meticulous about the visualization under the microscope, using LED illumination, for targeting. The electrophysiological signature of layer 4 neurons with initial bursts (but not repeated bursting, in mice) is another criterion to confirm their identity (Huang et al., 2017). Post-hoc morphological revelation showed their apical dendrites, running toward the pia, sometimes crossing through the layer 3, sometimes going around the proximal tip, avoiding the thalamic axons (Figure 6D). For example the cell in Figure 6, suppl. 1 panel D, has an apical dendrite that runs through layer 3 and layer 1. 

      Third, retrograde labeling following stereotaxic injection into the LMN is another criterion to define PrS layer 4. This approach is helpful for visualization, and is based on the defining axonal projection of layer 4 neurons (Yoder and Taube, 2011; Huang et al., 2017). Because it is technically challenging to restrict a stereotaxic injection to the LMN, the resultant labeling may not be limited to PrS layer 4. We cannot entirely exclude some overflow of retrograde tracers (B) or retrograde virus (C) to the neighboring MMN. This would then lead to co-labeling of the subiculum. In the main Figure 6, panels B and C, we agree that for this reason the red labelled cell bodies likely also include subicular neurons, on the proximal side, in addition to L4 presubicular neurons. We now point out this caveat in the main text (line 324-326) and in the methods (line 591-592).

      (1.2) Consistent with their subicular location, neuronal morphologies of the 'putative L4 cells' are selectively constrained within the subicular boundaries, i.e. they do not cross to the neighboring PrS (maybe a minor exception in Figs. 6_1D2,3). By definition, a neuron whose morphology is contained within a structure belongs to that structure.

      From a functional point of view, for the HD system, the most important criterion for defining presubicular layer 4 neurons is their axonal projection to the LMN (Yoder and Taube 2011). From an electrophysiological standpoint, it is the capacity of layer 4 neurons to fire initial bursts (Simonnet et al., 2013; Huang et al., 2017).  Anatomically, we note that the expectation that the apical dendrite should go straight up into layer 3 might not be a defining criterion in this curved and transitional periarchicortex. Presubicular layer 4 apical dendrites may cross through layer 3 and exit to the side, towards the subiculum (This is the red dendritic staining at the proximal end of the subiculum, at the frontier with the subiculum, Figure 6 C).

      (1.3) As acknowledged by the authors in the discussion (line 408): the PrS is classically defined by the innervation domain of AD fibers. As Figure 6B clearly indicates, the retrogradely-labelled cells ('putative L4') are convincingly outside the input domain of the AD; hence, they do not belong to the PrS.

      The reviewer is mistaken here: the deep layers 4 and 5/6 indeed do not lie in the zone innervated by the thalamic fibers (Simonnet et al., 2017; Nassar et al., 2018; Simonnet and Fricker, 2018) but still belong to the presubiculum. The presubicular deep layers are located below the superficial layers, next to, and in continuation of the subiculum. This is in agreement with work by Yoder and Taube 2011; Ishihara and Fukuda 2016; Boccara, … Witter, 2015; Peng et al., 2017 (Fig 2D); Yoshiko Honda et al., (Marmoset, Fig 2A) 2022; Balsamo et al., 2022 (Figure 2B).

      (1.4) Along with the above comment: in my view, the optogenetic stimulation experiments are an additional confirmation that the 'putative L4 cells' are subicular neurons, since they do not receive AD inputs at all (hence, they are outside of the PrS); they are instead only indirectly driven upon strong excitation of the PrS. This indirect activation is likely to occur via PrS-to-Subiculum 'back-projections', the existence of which is documented in the literature and also nicely shown by the authors (see Figure 1_1 and line 109).

      See above. Only superficial layers 1-3 of the presubiculum receive direct AD input.

      (1.5) The electrophysiological properties of the 'putative L4 cells' are consistent with their subicular identity, i.e. they show a sag current and they are intrinsically bursty.

      Presubicular layer 4 cells also show bursting behaviour and a sag current (Simonnet et al., 2013; Huang et al., 2017).

      From the above considerations, and the data provided by the authors, I believe that the most parsimonious explanation is that these retrogradely-labelled neurons (from mammillary body injections), referred to by the authors as 'L4 PrS cells', are indeed pyramidal neurons from the distal subiculum.

      We agree that the retrograde labeling is likely not limited to the presubicular layer 4 cells, and we now indicate this in the text (line 324-326). However, the portion of retrogradely labeled neurons that is directly below the layer 3 should be considered as part of the presubiculum.

      I believe this is a fundamental issue that deserves clarification, in order to avoid confusion/misunderstandings in the field. Given the evidence provided, I believe that it would be inaccurate to call these cells 'L4 PrS neurons'. However, I acknowledge the fact that it might be difficult to convincingly and satisfactorily address this issue within the framework of a revision. For example, it is possible that these 'putative L4 cells' might be retrogradely-labelled from the Medial Mammillary Body (a major subicular target) since it is difficult to selectively restrict the injection to the LMN, unless a suitable driver line is used (if available). The authors should also consider the possibility of removing this subset of data (referring to putative L4), and instead focus on the rest of the story (referring to L3)- which I think by itself, still provides sufficient advance.

      We agree with the reviewer that it is difficult to provide a satisfactory answer. To some extent, the reviewer’s comments target the nomenclature of the subicular region. This transitional region between the hippocampus and the entorhinal cortex has been notoriously ill defined, and the criteria are somewhat arbitrary for determining exactly where to draw the line. Based on the thalamic projection, presubicular layers 1-3 can now be precisely outlined, thanks to the use of viral labeling. But the presubicular layer 4 had been considered to be cell-free in early works, and termed ‘lamina dissecans’ (Boccara 2010), as the limit between the superficial and deep layers. Then it became of great interest to us and to the field, when the PrS layer 4 cells were first identified as LMN projecting neurons (Yoder and Taube 2011). This unique back-projection to the upstream region of the HD system is functionally very important, closing the loop of the Papez circuit (mammillary bodies - thalamus - hippocampal structures).

      We note that the reviewer does not doubt our results, rather questions the naming conventions. We therefore maintain our data. We agree that in the future a genetically defined mouse line would help to better pin down this specific neuronal population.

      We thank the reviewer for sharing their concerns and giving us the opportunity to clarify our experimental approach to target the presubicular layer 4. We hope that these explanations will be helpful to the readers of eLife as well.

      (2) The PrS anatomy could be better clarified, especially in relation to its modular organization (see e.g. Preston-Ferrer et al., 2016; Ray et al., 2017; Balsamo et al., 2022). The authors present horizontal slices, where cortical modularity is difficult to visualize and assess (tangential sections are typically used for this purpose, as in classical work from e.g. barrel cortex). I am not asking the authors to validate their observations in tangential sections, but just to be aware that cortical modules might not be immediately (or clearly) apparent, depending on the section orientation and thickness. The authors state that AD fibers were 'not homogeneously distributed' in L3 (line 135) and refer to 'patches of higher density in deep L3' (line 136). These statements are difficult to support unless more convincing anatomy and  . I see some L3 inhomogeneity in the green channel in Fig. 1G (last two panels) and also in Fig. 1K, but this seems to be rather upper L3. I wonder how consistent the pattern is across different injections and at what dorsoventral levels this L3 modularity is observed (I think sagittal sections might be helpful). If validated, these observations could point to the existence of non-homogeneous AD innervation domains in L3 - hinting at possible heterogeneity among the L3 pyramidal cell targets. Notably, modularity in L2 and L1 is not referred to. The authors state that AD inputs 'avoid L2' (line 131) but this statement is not in line with recent work (cited above) and is also not in line with their anatomy data in Fig. 1G, where modularity is already quite apparent in L2 (i.e. there are territories avoided by the AD fibers in L2) and in L1 (see for example the last image in Fig. 1G). This is the case also for the RSC axons (Fig. 1H) where a patchy pattern is quite clear in L1 (see the last image in panel H). Higher-mag pictures might be helpful here. 
These qualitative observations imply that AD and RSC axons probably bear a precise structural relationship relative to each other, and relative to the calbindin patch/matrix PrS organization that has been previously described. I am not asking the authors to address these aspects experimentally, since the main focus of their study is on L3, where RSC/AD inputs largely converge. Better anatomy pictures would be helpful, or at least a better integration of the authors' (qualitative) observations within the existing literature. Moreover, the authors' calbindin staining in Fig. 1K is not particularly informative. Subicular, PaS, MEC, and PrS borders should be annotated, and higher-resolution images could be provided. The authors should also check the staining: MEC appears to be blank but is known to strongly express calb1 in L2 (see 'island' by Kitamura et al., Ray et al., Science 2014; Ray et al., frontiers 2017). As additional validation for the staining: I would expect that the empty L2 patches in Figs. 1G (last two panels) would stain positive for Calbindin, as in previous work (Balsamo et al. 2022).

      We now provide a new figure showing the pattern of AD innervation in PrS superficial layers 1 to 3, with different dorso-ventral levels and higher magnification (Figure 2). Because our work was aimed at identifying connectivity between long-range inputs and presubicular neurons, we chose to work with horizontal sections that preserve well the majority of the apical dendrites of presubicular pyramidal neurons. We feel it is enriching for the presubicular literature to show the cytoarchitecture from different angles and to show patchiness in horizontal sections. The non-homogeneous AD innervation domains (‘microdomains’) in L3 were consistently observed across different injections in different animals.

      Author response image 1.

      Thalamic fiber innervation pattern. A, ventral, and B, dorsal horizontal section of the Presubiculum containing ATN axons expressing GFP. Patches of high density of ATN axonal ramifications in L3 are indicated as “ATN microdomains”. Layers 1, 2, 3, 4, 5/6 are indicated. C, High magnification image (63x optical section) (different animal).

      We also provide a supplementary figure with images of horizontal sections of calbindin staining in PrS, with a larger crop, for the reviewer to check (Figure 3, see below). We thank the reviewer for pointing out recent studies using tangential sections. Our results agree with the previous observation that AD axons are found in calbindin negative territories (cf Fig 1K). Calbindin+ labeling is visible in the PrS layer 2 as well as in some patches in the MEC (Figure 3 panel A). Calbindin staining tends to not overlap with the territories of ATN axonal ramification. We indicate the inhomogeneities of anterior thalamic innervation that form “microdomains” of high density of green labeled fibers, located in layer 1 and layer 3 (Figure 3, Panel A, middle). Panel B shows another view of a more dorsal horizontal section of the PrS, with higher magnification, with a big Calbindin+ patch near the parasubiculum.

      The “ATN+ microdomains” possess a high density of axonal ramifications from ATN, and have been previously documented in the literature. They are consistently present. Our group had shown them in the article by Nassar et al., 2018, at different dorsoventral levels (Fig 1 C (dorsal) and 1D (ventral) PrS). See also Simonnet et al., 2017, Fig 2B, for an illustration of the typical variations in densities of thalamic fibers, and supplementary Figure 1D. Also Jiayan Liu et al., 2021 (Figure 2 and Fig 5) show these characteristic microzones of dense thalamic axonal ramifications, with more or less intense signals across layers 1, 2, and 3.  While it is correct that thalamic axons can be seen to cross layer 2 to ramify in layer 1, we maintain that AD axons typically do not ramify in layer 2. We modify the text to say, “mostly” avoiding L2 (line 130).

      The reviewer is correct in pointing out that the 'patches of higher density in deep L3' are not only in the deep L3, as in the first panel in Fig 1G, but in the more dorsal sections they are also found in the upper L3. We change the text accordingly (line 135-136) and we provide the layer annotation in Figure 1G. We further agree with the reviewer that RSC axons also present a patchy innervation pattern. We add this observation in the text (line 144).

      It is as yet unclear whether anatomical microzones of dense ATN axon ramifications in L3 might fulfill the criteria of a functional modularity, as is the case for the calbindin patch/matrix PrS organization (Balsamo et al., 2022). As the reviewer points out, this will require more information on the precise structural relationship of AD and RSC axons relative to each other, as well as functional studies. Interestingly, we note a degree of variation in the amplitudes of oEPSC from different L3 neurons (Fig. 2F, discussion line 420; 428), which might be a reflection of the local anatomo-functional micro-organization.

      Minor points:

      (1) The pattern of retrograde labelling, or at least the way it is referred to in the results (lines 104ff), seems to imply some topography of AD-to-PreS projections. Is it the case? How consistent are these patterns across experiments, and individual injections? Was there variability in injection sites along the dorso-ventral and possibly antero-posterior PrS axes, which could account for a possibly topographical AD-to-PrS input pattern? It would be nice to see a DAPI signal in Fig. 1B since the AD stands out quite clearly in DAPI (Nissl) alone.

      Yes, we find a consistent topography for the AD-to-PrS projection, for similar injection sites in the presubiculum. The coordinates for retrograde labeling were, as indicated, -4.06 (AP), 2.00 (ML) and -2.15 mm (DV), so we cannot report on possible variations for different injection sites.

      (2) Fig. 2_2KM: this figure seems to show the only difference the authors found between AD and RS input properties. The authors could consider moving these data into main Fig. 2 (or exchanging them with some of the panels in F-O, which instead show no difference between AD and RSC). Asterisks/stats significance is not visible in M.

      For space reasons we leave the panels of Fig. 2_2KM in the supplementary section. We increased the size of the asterisk in M.

      (3) The data in Fig. 1_1 are quite interesting, since some of the PrS projection targets are 'non-canonical'. Maybe the authors could consider showing some injection sites, and some fluorescence images, in addition to the schematics. Maybe the authors could acknowledge that some of these projection targets are 'putative' unless independently verified by e.g. retrograde labeling. Unspecific white matter labelling and/or spillover is always a potential concern.

      We now include the image of the injection site for data in Fig. 1_1 as a supplementary Fig. 1_2. Figure 1_1 shows the retrogradely labeled upstream areas of the Presubiculum.

      Author response image 2.

      Retrobeads were injected in the right Presubiculum.

      (4) The authors speculate that the near-coincident summation of RS + AD inputs in L3 cells could be a potential mechanism for the binding of visual + HD information in PrS. However, landmarks are learned, and learning typically implies long-term plasticity. As the authors acknowledge in the discussion (lines 493ff) GluR1 is not expressed in PrS cells. What alternative mechanisms could the authors envision? How could the landmark-update process occur in PrS, if it is not locally stored? RSC could also be involved (Jakob et al) as acknowledged in the introduction - the authors should keep this possibility open also in the discussion.

      A similar point has been raised by Reviewer 1; please see our answer to their point 2. Briefly, our results indicate that HD-to-landmark updating is a multi-step process. RSC may be one of the places where landmarks are learned. The subsequent temporal mapping of HD to landmark signals in PrS might be plasticity-free, as matching directional with visual landmark information based on temporal coincidence does not necessarily require synaptic plasticity. It seems likely that there is no local storage and no change in synaptic weights in PrS. The landmark-anchored HD signals reach LMN via L4 neurons, sculpting network dynamics across the Papez circuit. One possibility is that the trace of a landmark that matches HD may be stored as patterns of neural activity that could guide navigation (cf. El-Gaby et al., 2024, Nature). Clearly, more work is needed to understand how the HD attractor is updated on a mechanistic level. Recent work in prefrontal cortex mentions “activity slots” and delineates algorithms for dynamic control of cognitive maps without synaptic plasticity (Whittington et al., 2025, Neuron): information may be stored in neural attractor activity, and the idea that working memory may rely on recurrent updates of neural activity might generalize to the HD system. We include these considerations in the discussion (line 499-503; 523-533) and also point to alternative models (line 518-522) including modeling work in the retrosplenial cortex.

      (5) The authors state that (lines 210ff) their cluster analysis 'provided no evidence for subpopulations of layer 3 cells (but see Balsamo et al., 2022)' implying an inconsistency; however, Balsamo et al also showed that the (in vivo) ephys properties of the two HD cell 'types' are virtually identical, which is in line with the 'homogeneity' of L3 ephys properties (in slice) in the authors' data. Regarding the possible heterogeneity of L3 cells: the authors report inhomogeneous AD innervation domains in L3 (see also main comment 2) and differences in input summation (some L3 cells integrate linearly, some supra-linearly; lines 272) which by itself might already imply some heterogeneity. I would therefore suggest rewording the statements to clarify what the lack of heterogeneity refers to.

      We agree. In line 212 we now state “cluster analysis (Figure 2D) provided no evidence for subpopulations of layer 3 cells in terms of intrinsic electrophysiological properties (see also Balsamo et al., 2022).”

      (6) n=6 co-recorded pairs are mentioned at line 348, but n=9 at line 366. Are these numbers referring to the same dataset? Please correct or clarify

      Line 349 refers to a set of 6 co-recorded pairs (n=12 neurons) in double injected mice with Chronos injected in ATN and Chrimson in RSC (cf. Fig. 7E). The 9 pairs mentioned in line 367 refer to another type of experiment where we stimulated layer 3 neurons by depolarizing them to induce action potential firing while recording neighboring layer 4 neurons to assess connectivity. Line 367  now reads: “In n = 9 paired recordings, we did not detect functional synapses between layer 3 and layer 4 neurons.”

      Reviewer #3 (Recommendations For The Authors):

      Questions for the authors/points for addressing:

      I found that the slice electrophysiology experiments were not reported with sufficient detail. For example, in Figure 2, I am assuming that the voltage clamp experiments were carried out using the Cs-based recording solution, while the current clamp experiments were carried out using the K-Gluc intracellular solution. However, this is not explicitly stated and it is possible that all of these experiments were performed using the K-Gluc solution, which would give slightly odd EPSCs due to incomplete space/voltage clamp. Furthermore, the method states that gabazine was used to block GABA(A) receptor-mediated currents, but not when this occurred. Was GABAergic neurotransmission blocked for all measurements of EPSC magnitude/dynamics? If so, why not block GABA(B) receptors? If not blocking GABAergic transmission for measuring EPSCs, why not? This should be stated explicitly either way.

      The addition of drugs or a difference in solution is indicated in the figure legend and/or in the figure itself, as well as in the methods. We now state explicitly: “In a subset of experiments, the following drugs were used to modulate the responses to optogenetic stimulations; the presence of these drugs is indicated in the figure and figure legend, whenever applicable.” (line 632). A Cs-based internal solution and gabazine were used in Figure 5; this is now indicated in the Methods section (line 626). All other experiments were performed using K-Gluc as an internal solution and ACSF.

      Methods: The experiments involving animals are incompletely reported. For example, were both sexes used? The methods state "Experiments were performed on wild‐type and transgenic C57Bl6 mice" - what transgenic mice were used and why is this not reported in detail (strain, etc)? I would refer the authors to the ARRIVE guidelines for reporting in vivo experiments in a reproducible manner (https://arriveguidelines.org/).

      We have now added this information in the methods section, subsection “Animals” (line 566-567). Animals of both sexes were used. The only transgenic mouse line used was the Ai14 reporter line (no phenotype), depending on the availability in our animal facility.

      For experiments comparing ATN and RSC inputs onto the same neuron (e.g. Figure 2 supplement 2 G - J), are the authors certain that the observed differences (e.g. rise time and paired-pulse facilitation on the ATN input) are due to differences in the synapses and not a result of different responses of the opsins? Refer to https://pubmed.ncbi.nlm.nih.gov/31822522/ from Jess Cardin's lab. This could easily be tested by switching which opsin is injected into which nucleus (a fair amount of extra work) or comparing the Chrimson synaptic responses with those evoked using Chronos on the same projection, as used in Figure 2 (quite easy as authors should already have the data).

      We actually did switch the opsins across the two injection sites. In Figure 2 - supplement 2G-J, the values linked by a dashed line result from recordings in the switched configuration with respect to the original configuration (in full lines, Chronos injected in RSC and Chrimson in ATN). The values from switched configuration followed the trend of the main configuration and were not statistically different (Mann-Whitney U test).

      Statistical reporting: While the number of cells is generally reported for experiments, the number of slices and animals is not. While slice ephys often treat cells as individual biological replicates, this is not entirely appropriate as it could be argued that multiple cells from a single animal are not independent samples (some sort of mixed effects model that accounts for animals as a random effect would be better). For the experiments in the manuscript, I don't think this is necessary, but it would certainly reassure the reader to report how many animals/slices each dataset came from. At a bare minimum, one would want any dataset to be taken from at least 3 animals from 2 different litters, regardless of how many cells are in there.

      Our slice electrophysiology experiments include data from 38 successfully injected animals: 14 animals injected in ATN, 20 animals injected in RSC, and 4 double injected animals. Typically, we recorded 1 to 3 cells per slice. We now include this information in the text or in the figure legends (line 159, 160, 297, 767, 826, 831, 832, 839, 845, 901, 941).
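
      As an illustration of the reviewer's point about non-independent cells, the following minimal sketch collapses per-cell measurements into one mean per animal before any group comparison. This is a simpler alternative to the mixed effects model mentioned above, not an implementation of it, and it is not the analysis performed in the manuscript. The function name, animal identifiers and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(records):
    """Collapse (animal_id, value) pairs into one mean value per animal.

    Treating each animal, rather than each cell, as the unit of analysis
    avoids counting several cells from one animal as independent samples.
    """
    by_animal = defaultdict(list)
    for animal_id, value in records:
        by_animal[animal_id].append(value)
    return {animal: mean(values) for animal, values in by_animal.items()}


# Hypothetical EPSC amplitudes (pA) with the animal each cell came from.
cells = [
    ("mouse_A", 120.0), ("mouse_A", 140.0),
    ("mouse_B", 90.0),
    ("mouse_C", 200.0), ("mouse_C", 180.0), ("mouse_C", 220.0),
]

animal_means = per_animal_means(cells)
print(f"{len(cells)} cells from {len(animal_means)} animals")
for animal, value in sorted(animal_means.items()):
    print(f"{animal}: mean = {value:.1f} pA")
```

      Averaging per animal discards within-animal variability, which is why a mixed effects model with animal as a random effect is generally the more informative choice when the dataset is large enough.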

      For the optogenetic experiments looking at the summation of EPSPs (e.g. figure 4), I have two questions: why were EPSPs measured and not EPSCs? The latter would be expected to give a better readout of AMPA receptor-mediated synaptic currents. And secondly, why was 20 Hz stimulation used for these experiments? One might expect theta stimulation to be a more physiologically-relevant frequency of stimulation for comparing ATN and RSC inputs to single neurons, given the relevance with spatial navigation and that the paper's conclusions were based around the head direction system. Similarly, gamma stimulation may also have been informative. Did the authors try different frequencies of stimulation?

      Question 1. The current-clamp configuration allows us to measure EPSP amplification and prolongation by NMDA or persistent Na currents (cf. Fricker and Miles 2000), which might contribute to supralinearity.

      Question 2. In a previous study from our group on the AD to PrS connection (Nassar et al., 2018), no significant difference in EPSC dynamics was observed between stimulation at 10 Hz and at 30 Hz. We therefore chose 20 Hz. This value is in the range of HD cell firing (Taube 1995, 1998: peak firing rates of 18 to 24 spikes/sec in RSC and 41 spikes/sec in AD, although mean firing rates might be lower; Blair and Sharp 1995). In hindsight, we agree that it would have been useful to include 8 Hz or 40 Hz stimulation.

      The GABA(A) antagonist experiments in Figure 5 are interesting but I have concerns about the statistical power of these experiments - n of 3 is absolutely borderline for being able to draw meaningful conclusions, especially if this small sample of cells came from just 1 or 2 animals. The number of animals used should be stated and/or caution should be applied when considering the potential mechanisms of supralinear summation of EPSPs. It looks like the slight delay in RSC input EPSP relative to ATN that was in earlier figures is not present here - could this be the loss of feedforward inhibition?

      The current clamp experiments in the presence of QX314 and a Cs gluconate based internal solution were preceded by initial experiments using puff applications of glutamate to the recorded neurons (not shown). Results from those experiments had pointed towards a role for TTX resistant sodium currents and for NMDA receptor activation as a factor favoring the amplification and prolongation of glutamate induced events. They inspired the design of the dual wavelength stimulation experiments shown in Figure 5, and oriented our discussion of the results. We agree of course that more work is required to dissect the role of disinhibition for EPSP amplification. This is however beyond the present study.

      Concerning the EPSP onset delays following RSC input stimulation: in this set of experiments, we compensated for the notoriously longer delay to EPSP onset following RSC axon stimulation by shifting the photostimulation of RSC fibers (red) to -2 ms relative to the onset of photostimulation of ATN fibers (blue). This experimental trick improved the alignment of the onsets of the postsynaptic responses, as shown in the figure below for the reviewer.

      Author response image 3.

      In these experiments, the onset of RSC photostimulation was advanced by 2 ms (shifted to -2 ms), in an attempt to better align the EPSP onset with the one evoked by ATN stimulation.

      We have inserted a sentence in the results to indicate that the experiments illustrated in Figure 5 were performed in only a small sample of 3 cells that came from 2 mice (line 297), so caution should be applied. In the discussion we now formulate more carefully: “From a small sample of cells it appears that EPSP amplification may be facilitated by a reduction in synaptic inhibition (n = 3; Figure 5)” (line 487).

      Figure 7: I appreciate the difficulties in making dual recordings from older animals, but no conclusion about the RSC input can legitimately be made with n=1.

      Agreed. We want to avoid any overinterpretation, and point out in the results section that the RSC stimulation data are from a single cell pair. The sentence now reads: “... layer 4 neurons occurred after firing in the layer 3 neuron, following ATN afferent stimuli, in 4 out of 5 cell pairs. We also observed this sequence when RSC input was activated, in one tested pair.” (lines 347-349)

      Minor points:

      Line 104: 'within the two subnuclei that form the anterior thalamus' - the ATN actually has three subdivisions (AD, AV, AM) so this should state 'two of the three nuclei that form the anterior thalamus...'

      Corrected, line 103

      Line 125: should read "figure 1F" and not "figure 2F".

      Corrected, line 124

      Line 277-280: Why were two different posthoc tests used on the same data in Figures 3E & F?

      We used Sidak’s multiple comparisons test to compare Sum vs. Dual for each event (two different configurations at each time point; asterisks), and Friedman’s test with Dunn’s post hoc test to compare the nth EPSP amplitude to the first one for Dual events (same configuration between time points; hashmarks). We give two-way ANOVA results in the legend.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major concerns:

      (1) Is the direct binding of MCAK to the microtubule cap important for its in vivo function?

      a. The authors claim that their "study provides mechanistic insights into understanding the end-binding mechanism of MCAK". I respectfully disagree. My concern is that the paper offers limited insights into the physiological significance of direct end-binding for MCAK activity, even in vitro. The authors estimate that in the absence of other proteins in vitro, ~95% of MCAK molecules arrive at the tip by direct binding in the presence of ~ physiological ATP concentration (1 mM). In cells, however, the major end-binding pathway may be mediated by EB, with the direct binding pathway contributing little to none. This is a reasonable concern because the apparent dissociation constant measured by the authors shows that MCAK binding to microtubules in the presence of ATP is very weak (69 µM). This concern should be addressed by 1) calculating relative contributions of direct and EB-dependent pathways based on the affinities measured in this and other published papers and estimated intracellular concentrations. Although there are many unknowns about these interactions in cells, a modeling-based analysis may be revealing. 2) the recapitulation of these pathways using purified proteins in vitro is also feasible. Ideally, some direct evidence should be provided, e.g. based on MCAK function-separating mutants (GDP-Pi tubulin binding vs. catalytic activity at the curled protofilaments), that the contribution from the direct binding of MCAK to the microtubule cap in the presence of EB is significant.

      We thank the reviewer for the thoughtful comments.

      (1) We think that the end-binding affinity of MCAK makes a significant contribution to its cellular functions. To elucidate this concept, we now use a simple model shown in Supplementary Appendix-2 (see pages 49-51, lines 1246-1316). In this model, we simplified MCAK and EB1 binding to microtubule ends by considering only these two proteins while neglecting other factors (e.g. XMAP215). Specifically, we considered two scenarios: one in which both proteins freely diffuse in the cytoplasm and another where MCAK is localized to specific cellular structures, such as the centrosome or centromere. Based on the modeling results, we argue that MCAK's functional impact at microtubule ends derives both from its intrinsic end-binding capacity and its ability to strengthen the EB1-mediated end association pathway.

      (2) We agree with the reviewer that MCAK exhibiting a lower end-binding affinity (69 µM) is indeed intriguing, as one might intuitively expect a stronger affinity, e.g. in the nanomolar range. Several factors may contribute to this observation. First, this could be partly due to the in vitro system employed, which may not perfectly replicate in vivo conditions, especially when considering cellular processes quantitatively. Variations in medium composition can significantly influence the binding state. For example, reducing salt concentration leads to a marked increase in MCAK’s binding affinity (Helenius et al., 2006; Maurer et al., 2011; McHugh et al., 2019). Additionally, while numerous binding events with short durations were detected, we excluded transient interactions from our analysis to facilitate quantification. This likely leads to an underestimation of the on-rate and, consequently, the binding affinity. Moreover, to minimize the interference of purification tags (His-tag), we ensured their complete removal during protein sample preparation. Previous studies reported that retaining the His-tag of MAPs affects the binding affinity to microtubules (Maurer et al., 2011; Zhu et al., 2009). Finally, a low affinity is not necessarily unexpected. Considering the microtubule end as a receptor with multiple binding sites for MCAK, the overall binding affinity is in the nanomolar range (260 nM). This does not necessarily contradict MCAK being a microtubule dynamics regulator as only a few MCAK molecules may suffice to induce microtubule catastrophe (as discussed on page 13, lines 408-441).

      (3) Ideally, we would search for mutants that specifically interfere with the binding of GDP-Pi-tubulin or the curled protofilaments. However, the mutant we tested significantly impacts the overall affinity of MCAK to microtubules (both end and lattice), making it challenging to isolate and discuss the function of MCAK with respect to the binding to GDP-Pi-tubulin alone. Additionally, we also think that the GDP-Pi-tubulin in the EB cap and the tubulin in the curved protofilaments may share structural similarities. For instance, the tubulin dimers in both states may be less compact compared to those in the lattice, which could explain why MCAK recognizes both simultaneously (Manka and Moores, 2018). However, this remains a conjecture, as there is currently no direct evidence to support it.

      b. As mentioned in the Discussion, preferential MCAK binding to tubulins near the MT tip may enhance MCAK targeting of terminal tubulins AFTER the MCAK has been "delivered" to the distal cap via the EB-dependent mechanism. This is a different targeting mechanism than the direct MCAK-binding. However, the measured binding affinity between MCAK and GMPCPP tubulins is so weak (69 uM), that this effect is also unlikely to have any impact because the binding events between MCAK and microtubule should be extremely rare. Without hard evidence, the arguments for this enhancement are very speculative.

      Please see our response to the comment No. 1. Additionally, we have revised our discussion to discuss the end-binding affinity of MCAK as well as its physiological relevance (please see page 13, lines 408-441; and see Supplementary Appendix-2 in pages 49-51, lines 1246-1316).

      (2) The authors do not provide sufficient justification and explanation for their investigation of the effects of different nucleotides in MCAK binding affinity. A clear summary of the nucleotide-dependent function of MCAK (introduction with references to prior affinity measurements and corresponding MCAK affinities), the justifications for this investigation, and what has been learned from using different nucleotides (discussion) should be provided. My take on these results is that by far the strongest effect on microtubule wall and tip binding is achieved by adding any adenosine, whereas differences between different nucleotides are relatively minor. Was this expected? What can be learned from the apparent similarity between ATP and AMPPNP effects in some assays (Fig 1E, 4C, etc) but not others (Fig 1D,F, etc)?

      We thank the reviewer for this suggestion. We have revised the manuscript accordingly, and below are the main points of our response.

      (1) The experiment investigating the effects of different nucleotides on MCAK binding affinity was inspired by the previous studies demonstrating that kinesin-13 interactions with microtubules are highly dependent on their adenosine-bound states. For example, kinesin-13s tightly bind microtubules and prefer to form protofilament curls or rings with tubulin in the AMPPNP state, whereas kinesin-13s are considered to move along the microtubule lattice via one-dimensional diffusion in the ADP·Pi state (Asenjo et al., 2013; Benoit et al., 2018; Friel and Howard, 2011; Helenius et al., 2006). Based on these observations, we wondered whether MCAK's adenosine-bound states might similarly affect its binding preference for growing microtubule ends. We have made the motivation clear in the revised manuscript (please see page 7, lines 199-209).

      (2) Our main finding regarding the effects of nucleotides is that MCAK shows differential end-binding affinity and preference based on its nucleotide state. First, MCAK shows the greatest preference for growing microtubule ends in the ATP state, supporting the idea that diffusive MCAK (MCAK·ATP) can directly bind to growing microtubule ends. Second, MCAK·ATP also demonstrates a binding preference for GTPγS microtubules and the ends of GMPCPP microtubules. The similar trends in binding preference suggest that the affinity for GDP·Pi-tubulin and GTP-tubulin likely underpins MCAK’s preference for growing microtubule ends. To clarify these points, we have added further discussion to the manuscript (please see page 8, lines 230-233; page 9, lines 258-270; and pages 13-14, lines 443-458).

      (3) It is not clear why the authors decided to use these specific mutant MCAK proteins to advance their arguments about the importance of direct tip binding. Both mutants are enzymatically inactive. Both show roughly similar tip interactions, with some (minor) differences. Without a clear understanding of what these mutants represent, the provided interpretations of the corresponding results are not convincing.

      We thank the reviewer for this comment. In the revised manuscript, we no longer draw conclusions about the importance of end-binding based on the mutant data. Instead, we think that the mutant data provide insights into the structural basis of the end-binding preference. Therefore, we have rewritten the results in this section to more accurately reflect these findings (please see page 10, lines 295-327).

      (4) GMPCPP microtubules are used in the current study to represent normal dynamic microtubule ends, based on some published studies. However, there is no consensus in the field regarding the structure of growing vs. GMPCPP-stabilized microtubule ends, which additionally may be sensitive to specific experimental conditions (buffers, temperature, age of microtubules, etc). To strengthen the authors' argument, Taxol-stabilized microtubules should be used as a control to test if the effects are specific. Additionally, the authors should consider the possibility that stronger MCAK binding to the ends of different types of microtubules may reflect MCAK-dependent depolymerization events on a very small scale (several tubulin rows). These nano-scale changes to tubulins and the microtubule end may lead to the accumulation of small tubulin-MCAK aggregates, as is seen with other MAPs and slowly depolymerizing microtubules. These effects for MCAK may also depend on specific nucleotides, further complicating the interpretation. This possibility should be addressed because it provides a different interpretation than presented in the manuscript.

      Regarding the two points raised here, our thoughts are as follows:

      (1) The end of GMPCPP-stabilized microtubules differs from that of growing microtubules, with the most obvious known difference being the absence of the region enriched in GDP-Pi-tubulin. We consider the end of GMPCPP microtubules as an analogue of the distal tip of growing microtubules, based on two key features: (1) curled protofilaments and (2) GMPCPP-tubulin, a close analogue of GTP-tubulin. Notably, both features are present at the ends of both GMPCPP-stabilized and growing microtubules. Moreover, we agree with the suggestion to use taxol-stabilized microtubules as a control. This would eliminate the second feature (absence of GTP-tubulin), allowing us to isolate the effect of the first feature. Therefore, we conducted this experiment, and our data showed that MCAK exhibits only a mild binding preference for the ends of taxol-stabilized microtubules, which is much less pronounced than for the ends of GMPCPP microtubules. This observation supports the idea that GMPCPP-stabilized ends closely resemble the growing ends of microtubules.

      (2) The reviewer suggested that stronger MCAK binding to the ends of different types of microtubules might reflect MCAK-dependent depolymerization events on a very small scale. This is an insightful possibility, which we had overlooked in the original manuscript. Fortunately, we performed the experiments at single-molecule concentrations. Upon reviewing the raw data, we found that under ATP conditions, the binding events of MCAK were not cumulative (see Author response image 1 below) and showed no evidence of local accumulation of MCAK-tubulin aggregates.

      Author response image 1.

      A representative kymograph showing GFP-MCAK binding at the ends and lattice of GMPCPP microtubules in the presence of 1 mM ATP (10 nM GFP-MCAK), corresponding to Fig. 5A. Arrow: end-binding of MCAK. Vertical bar: 1 s; horizontal bar: 2 µm.

      (5) It would be helpful if the authors provided microtubule polymerization rates and catastrophe frequencies for assays with dynamic microtubules and MCAK in the presence of different nucleotides. The video recordings of microtubules under these conditions are already available to the authors, so it should not be difficult to provide these quantifications. They may reveal that microtubule ends are different (or not) under the examined conditions. It would also help to increase the overall credibility of this study by providing data that are easy to compare between different labs.

      We thank the reviewer for this suggestion. In the revised manuscript, we have provided data on the growth rates, which are similar across the different nucleotide states (Fig. s1). However, due to the short duration of our recordings (usually 5 minutes, but with a high frame rate, 10 fps), we did not observe many catastrophe events, which prevented us from quantifying catastrophe frequency using the current dataset. Since we measured the binding kinetics of MCAK during the growing phase of microtubules, the similar growth rates and microtubule end morphologies suggest that the microtubule ends are comparable across the different conditions.

      Reviewer #1 (Recommendations For The Authors):

      a. Please provide more details about how the microtubule-bound molecules were selected for analysis (include a description of scripts, selection criteria, and filters, if any). Fig 1A arrows do not provide sufficient information.

      We first measured the fluorescence intensity of each binding event. A probability distribution of these intensities was then constructed and fitted with a Gaussian function. A binding event was considered to correspond to a single molecule if its intensity fell within μ±2σ of the distribution. The details of the single-molecule screening process are now provided in the revised manuscript (see page 17, lines 574-583).
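
      The μ±2σ screening criterion described above can be sketched as follows. This is a minimal illustration, not the authors' script: it estimates μ and σ directly from the sample rather than fitting a Gaussian to a binned histogram, and the function name and intensity values are hypothetical.

```python
from statistics import NormalDist

def select_single_molecules(intensities, n_sigma=2.0):
    """Keep events whose intensity lies within mu +/- n_sigma * sigma.

    mu and sigma come from a moment-based Gaussian estimate of the sample,
    a simplification of fitting a Gaussian to the intensity histogram.
    """
    fit = NormalDist.from_samples(intensities)
    lower = fit.mean - n_sigma * fit.stdev
    upper = fit.mean + n_sigma * fit.stdev
    kept = [x for x in intensities if lower <= x <= upper]
    return kept, (fit.mean, fit.stdev)

# Hypothetical intensities (A.U.): nine similar events and one bright outlier.
intensities = [100, 102, 98, 101, 99, 100, 103, 97, 100, 200]
kept, (mu, sigma) = select_single_molecules(intensities)
```

      Because a bright outlier inflates the estimated σ, a histogram fit or an iterative re-fit after exclusion may be more robust than this single-pass estimate.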

      b. Evidence that MCAK is dimeric in solution should be provided (gel filtration results, controls for Figs1A - bleaching, or comparison with single GFP fluorophore).

      In the revised manuscript, we provide the gel filtration results of purified MCAK and other proteins used in this study. The elution volume of the peak for GFP-MCAK corresponded to a molecular weight range between 120 kDa (EB1-GFP dimer) and 260 kDa (XMAP215-GFP-his6), suggesting that GFP-MCAK exists as a dimer (~220 kDa) under our experimental conditions (please see Fig. s1 and page 5, lines 104-105). In addition, we measured the fluorescence intensity of both MCAK<sup>sN+M</sup> and MCAK. MCAK<sup>sN+M</sup> is a monomeric mutant that contains the neck domain and motor domain (Wang et al., 2012). The average intensity of MCAK<sup>sN+M</sup> is 196 A.U., about 65% of that of MCAK (300 A.U.). These two measurements suggest that the purified MCAK used in this study exists as dimers (see Fig. s1).

      c. Evidence that MCAK on microtubules represents single molecules should be provided (distribution of GFP brightness with controls - GFP imaged under identical conditions). Since assay buffers include detergent, which is not desirable, all controls should be done using the same assay conditions. The authors should rule out that their main results are detergent-sensitive.

      (1) Regarding whether MCAK on microtubules represents single molecules: please refer to our responses to the two points above.

      (2) To rule out an effect of Tween-20 (0.0001%, v/v), we performed additional control experiments. The results showed that it has no significant effect on the microtubule-binding affinity of MCAK (see figure below).

      Author response image 2.

      Tween-20 (0.0001%, v/v) has no significant effect on the microtubule-binding affinity of MCAK. (A) Representative projection images of GFP-MCAK (5 nM) binding to taxol-stabilized GDP microtubules in the presence of 1 mM AMPPNP with or without Tween-20. The upper panel shows the results of the control experiments performed without MCAK. Scale bar: 5 µm. (B) Statistical quantification of the binding intensity of GFP-MCAK on GDP microtubules with or without Tween-20 (53 microtubules from 3 assays and 70 microtubules from 3 assays, respectively). Data are presented as mean ± SEM. Statistical comparisons were performed using the two-tailed Mann-Whitney U test with Bonferroni correction; n.s., no significance.

      d. How did the authors plot single-molecule intensity distributions? I am confused as to why the intensity distribution for single molecules in Fig 1D and 2A looks so perfectly smooth, non-pixelated, and broader than expected for GFP wavelength. Please provide unprocessed original distributions, pixel size, and more details about how the distributions were processed.

      In the revised manuscript, we provided unprocessed original data in Fig. 1B and Fig. 2A. We thank the reviewer for pointing out this problem.

      e. Many quantifications are based on a limited number of microtubules and the number of molecules is not provided, starting from Fig 1D and down. Please provide detailed statistics and explain what is plotted (mean with SEM?) on each graph.

      We performed a thorough inspection of the manuscript and corrected the identified issues.

      f. Plots with averaged data should be supplemented with error bars and N should be provided in the legend. E.g. Fig 1C - average position of MT and peak positions.

      We agree with the reviewer. In the revised manuscript, we have made the changes accordingly (e.g. Fig. 2C).

      g. Detailed information should be provided about protein constructs used in this work including all tags. The use of truncated proteins or charged/bulky tags can modify protein-microtubule interactions.

      We agree with the reviewer. In the revised manuscript, we provide the information of all constructs (see Fig. s1 and the related descriptions in Methods, pages 15-16, lines 476-534).

      h. Line 515: "We estimated that the accuracy of microtubule end tracking was ~6 nm by measuring the standard error of the distribution of the estimated error in the microtubule end position." Evidence should be provided using the conditions of this study, not the reference to the prior work by others.

      i. Line 520: "We estimated that the accuracy of the measured position was ~2 nm by measuring the standard error of the fitting peak location". Please provide evidence.

      Points h-i: We now provide detailed descriptions of how we estimated tracking and measurement accuracy and error in our work. Please see pages 18-19, lines 626-645.

      j. Kymographs in Fig 5G are barely visible. Please provide single-channel greyscale images. What are the dim molecules diffusing on this microtubule?

      We have incorporated the changes suggested by the reviewer. We think that some of the dim signals may result from stochastic background noise, while others likely represent transient binding of MCAK. The exposure time in our experiments was approximately 0.05 seconds; if the binding duration were shorter than this, the signal would be lower (i.e. the “dim” signals). It is important to note that in this study, we selected binding events lasting at least 2 consecutive frames, meaning transient binding events were not included. This point has been clarified in the Methods section (see page 17, lines 573-583).
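
      The duration filter described above (events lasting at least 2 consecutive frames at ~0.05 s per frame) can be sketched as follows. This is an illustrative assumption about how such a filter might be written, not the authors' analysis code; the function name and the example trace are hypothetical.

```python
from itertools import groupby

def binding_events(trace, min_frames=2, frame_time=0.05):
    """Return (start_frame, duration_s) for runs of consecutive detections.

    trace is a per-frame sequence of 0/1 flags marking whether a molecule
    was detected at a given position. Runs shorter than min_frames are
    discarded as transient events.
    """
    events = []
    frame = 0
    for present, run in groupby(trace):
        length = len(list(run))
        if present and length >= min_frames:
            events.append((frame, length * frame_time))
        frame += length
    return events

# Hypothetical trace: a 1-frame blip, a 3-frame event, and a 2-frame event.
trace = [0, 1, 0, 1, 1, 1, 0, 0, 1, 1]
events = binding_events(trace)
```

      With this threshold the single-frame blip is dropped, which matches the stated exclusion of transient binding events.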

      k. Please provide a methods description for Fig 6. Did the buffer include 1 mM ATP? The presence of ATP would make these conditions more physiological. ATP concentration should be stated clearly in the main text or figure legend.

      The buffer contains ATP. In the revised manuscript, we have provided the methods for the experiments of microtubule dynamics assay, as well as the analysis of microtubule lifetimes and catastrophe frequency (see page 17, lines 561-572 and page 20, lines 685-690).

      l. Line 104: experiment was performed in BRB80 supplemented with 50 mM KCl and 1 mM ATP, providing a nearly physiological ion strength. Please provide a reference or add your calculations in Methods.

      We have provided references on page 5, lines 101-104 of our manuscript.

      m. What was the MCAK concentration in Figure 4? Did the microtubule shorten under any of these conditions?

      In these experiments, we used very low concentrations of MCAK and taxol-stabilized microtubules, so no microtubule shortening was observed. ATP: 10 nM GFP-MCAK; AMPPNP: 1 nM GFP-MCAK; ADP: 10 nM GFP-MCAK; APO state: 0.1 nM GFP-MCAK.

      Other criticism:

      Text improvements are recommended in the Discussion. For example, line 348: Fourth, the loss of the binding preference.. suggests that the binding preference .. is required for the optimal .. preference.

      We thank the reviewer for pointing this out. In the revised manuscript, we conducted a thorough revision and review of the text.

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Chen et al. investigate the localization of microtubule kinesin-13 MCAK to the microtubule ends. MCAK is a prominent microtubule depolymerase whose molecular mechanisms of action have been extensively studied by a number of labs over the last ~twenty years. Here, the authors use single-molecule approaches to investigate the precise localization of MCAK on growing microtubules and conclude that MCAK preferentially binds to a GDP-Pi-tubulin portion of the microtubule end. The conclusions are speculative and not well substantiated by the data, making the impact of the study in its current form rather limited. Specifically, greater effort should be made to define the region of MCAK binding on microtubule ends, as well as its structural characteristics. Given that MCAK has been previously shown to effectively tip-track growing microtubule ends through an established interaction with EB proteins, the physiological relevance of the present study is unclear. Finally, the manuscript does not cite or properly discuss a number of relevant literature references, the results of which should be directly compared and contrasted to those presented here.

      We thank the reviewer for the comments. As these suggestions are more thoroughly expressed in the following comments for authors, we will provide the responses in the corresponding sections, as shown below.

      Reviewer #2 (Recommendations For The Authors):

      Significant concerns:

      (1) Establishing the precise localization of MCAK wrt microtubule end is highly non-trivial. More details should be provided, including substantial supplementary data. In particular, the authors claim ~6 nm accuracy in microtubule end positioning - this should be substantiated by data showing individual overlaid microtubule end intensity profiles as well as fits with standard deviations etc. Furthermore, to conclude that MCAK binds behind XMAP215, the authors should look at the localization of the two proteins simultaneously, on the same microtubule end. Notably, EB binding profiles are well known to exponentially decay along the microtubule lattice - this is not very apparent from the presented data. If MCAK's autonomous binding pattern matches that of EB, we should be seeing an exponentially-decaying localization for MCAK as well? However, averaged MCAK signals seem to only be fitted to Gaussian. Note that the EB binding region (i.e. position and size of the EB comet) can be substantially modulated by increasing the microtubule growth rate - this can be easily accomplished by increasing tubulin concentrations or the addition of XMAP215 (e.g. see Maurer et al. Cur Bio 2014). Thus to establish that MCAK on its own binds the same region as EB, experiments that directly modulate the size and the position of this region should be added.

      (1) We thank the reviewer for this comment. Regarding the accuracy in microtubule end positioning, we now provide more details; please see pages 18-19, lines 625-645 in the revised manuscript.

      (2) Regarding the relative localization of XMAP215 and MCAK, we performed additional experiments to record their localization simultaneously, on the same microtubule end. Our results showed that MCAK predominantly binds behind XMAP215, with 14.5% appearing within XMAP215’s binding region. Please see Fig. 2D-E and lines 184-197 in the revised manuscript.

      (3) Regarding the exponential decay of the EB1 signal along microtubules, we observed that the position probability distribution measured in the present study follows a Gaussian distribution, and the expected exponential decay was not apparent. Since the exponential decay is thought to result from the time delay between tubulin polymerization and GTP hydrolysis, slower polymerization is expected to shorten the resulting comet (Maurer et al., 2014). In our experiments, the growth rate was relatively low (~0.7 µm/min), much slower than the rate observed in cells, where the comet-shaped EB1 signal is most pronounced. A previous study has shown that the exponential decay of EB1 is more pronounced at growth rates exceeding 3 µm/min in vitro (Maurer et al., 2014). Therefore, we think that the relatively slow growth may account for the observed non-exponential distribution of the EB1 signals. The same reason may also explain the distribution of MCAK.

      (4) We agree with the reviewer’s suggestion that altering microtubule growth rate is a valid and effective approach to regulate the EB cap length. However, the conclusion that MCAK binds to the EB region is supported by three lines of evidence: (1) the localization of MCAK at the ends of microtubules, (2) new experimental data showing that MCAK binds to the proximal end of the XMAP215 site, and (3) the tendency of MCAK to bind GTPγS microtubules, similar to EB1. Based on these findings, we did not pursue additional experiments to modify the length of the EB cap.

      (2) Even if MCAK indeed binds behind XMAP215, there is no evidence that this region is defined by the GDP-Pi nucleotide state; it could still be curved protofilaments. GTPyS is an analogue of GTP - to what extent GTPyS microtubules exactly mimic the GDP-Pi-tubulin state remains controversial. Furthermore, nucleotide sensing for EB is thought to be achieved through its binding at the interface of four tubulin dimers. However MCAK's binding site is distinct, and it has been shown to recognize intradimer tubulin curvature. Thus it is not clear how MCAK would sense the nucleotide state. On the other hand, there is mounting evidence that the morphology of the growing microtubule end can be highly variable, and that curved protofilaments may be protruding off the growing ends for tens of nanometers or more, previously observed both by EM as well as by fluorescence (e.g. Mcintosh, Moores, Chretien, Odde, Gardner, Akhmanova, Hancock, Zanic labs). Thus, to establish that MCAK indeed localizes along the closed lattice, EM approaches should be used.

      First, we conducted additional experiments demonstrating that MCAK indeed binds behind XMAP215, supporting the conclusion that MCAK interacts with the EB cap (please see Fig. 2 in the revised manuscript). Second, our argument that MCAK preferentially binds to GDP-Pi tubulin is based on two observations: (1) the binding regions of MCAK overlap with those of EB1, and (2) MCAK preferentially binds to GTPγS microtubules, which are considered a close analogue of GDP-Pi tubulin. Third, understanding the structural basis of how MCAK senses the nucleotide state of tubulin is beyond the scope of the present study. However, inspired by the reviewer’s suggestion, we looked into the structure of the MCAK-tubulin complex. The L2 loop of MCAK makes direct contact with the interdimer interface (Trofimova et al., 2018; Wang et al., 2017), which could provide a structural basis for recognizing the changes induced by GTP hydrolysis. While this remains a hypothesis, it is certainly a promising direction for future research. Fourth, we agree with the reviewer that an EM approach would be ideal for establishing that MCAK localizes along the closed lattice. However, this is not the focus of the current study. Instead, we argue that MCAK binds to the EB cap, where at least some lateral interactions are likely to have formed.

      (3) The physiological relevance of the study is rather questionable: MCAK has been previously established to be able to both diffuse along the microtubule lattice (e.g. Helenius et al.) as well as hitchhike on EBs (Gouveia et al.). Given the established localization of EBs to growing microtubule ends in cells, and apparently higher affinity of MCAK for EB vs. the microtubule end itself (although direct comparisons with the literature have not been reported here), the relevance of MCAK's autonomous binding to dynamic microtubule ends is dubious.

      We thank the reviewer for raising the importance of physiological relevance. Please refer to our response to comment No. 1 of Reviewer 1. Briefly, we think that the end-binding affinity of MCAK makes a significant contribution to its cellular functions. To illustrate this concept, we now use a simple model shown in Supplementary Appendix-2 (see pages 49-51, lines 1246-1316). In this model, we simplified MCAK and EB1 binding to microtubule ends by considering only these two proteins while neglecting other factors (e.g. XMAP215). Specifically, we considered two scenarios: one in which both proteins freely diffuse in the cytoplasm and another where MCAK is localized to specific cellular structures, such as the centrosome or centromere. Based on the modeling results, we argue that MCAK's functional impact at microtubule ends derives both from its intrinsic end-binding capacity and its ability to strengthen the EB1-mediated end association pathway.

      (4) Finally, the study seriously lacks discussion of and comparison with the existing literature on this topic. There are major omissions in citing relevant literature, such as e.g. landmark study by Kinoshita et al. Science 2001. Several findings reported here directly contradict previous findings in the literature. Direct comparison with e.g. Gouveia et al findings, Helenius et al. findings, and others need to be included. For example, Gouveia et al reported that EB is necessary for MCAK plus-end-tracking in vitro (please see Figure 1 of their manuscript). The authors should discuss how they reconcile the differences in their findings when compared to this earlier study.

      We thank the reviewer for this helpful suggestion. In the revised manuscript, we have updated the text and included comparative discussions with other relevant studies in the Discussion section. Specifically, we added comparisons with the research on XMAP215 on page 14, lines 459-472 (Barr and Gergely, 2008; Kinoshita et al., 2001; Tournebize et al., 2000). Additionally, we have compared our findings with those of Gouveia et al. and Helenius et al. regarding MCAK's preference for binding microtubule ends on page 6, lines 145-157 and page 13, lines 408-441, respectively (Gouveia et al., 2010; Helenius et al., 2006).

      Additional specific comments:

      Figure 1

      Gouveia et al. (Figure 1) reported that MCAK does not autonomously preferentially localize to growing tips. Specifically, Gouveia et al. found equal association rates of MCAK to both the lattice and the tip in the presence of EB3delT, an EB3 construct that does not directly interact with MCAK. How can these findings be reconciled with the results presented here?

      We are uncertain why there was no observed difference in the on-rates to the lattice and the end in the study by Gouveia et al. Even when considering only the known affinity of MCAK for curved protofilaments at the distal tip of growing microtubules, we would still expect to observe an end-binding preference. After carefully comparing the experimental conditions, we nevertheless identified some differences. First, we used a 160 nm tip size to calculate the on-rate (k<sub>on</sub>), whereas Gouveia et al. used a 450 nm tip. Using a longer tip size would naturally lead to a smaller k<sub>on</sub> value. Note that we chose 160 nm for several reasons: (i) a previous cryo-electron tomography study has shown that the sheet structures of dynamic microtubule ends have an average length of around 180 nm (Guesdon et al., 2016); (ii) analysis of fluorescence signals at dynamic microtubule ends has demonstrated that the taper length at the microtubule end is less than 180 nm (Maurer et al., 2014); (iii) in the present study, we estimated that the length of MCAK's end-binding region is approximately 160 nm. Second, in Gouveia et al., single-molecule binding events were recorded in the presence of 75 nM EB3ΔT, which could potentially create a crowded environment at the tip, reducing MCAK binding. Third, as mentioned in our response to Reviewer 1, we took great care to minimize the interference from purification tags (e.g., His-tag) by ensuring their complete removal during protein preparation. Previous studies reported that retaining the His-tag of MAPs led to a significant increase in binding to microtubules (Maurer et al., 2011; Zhu et al., 2009). We believe that some of the factors mentioned above, or their combined effects, may account for the differences between these two observations.
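
      The tip-size argument can be illustrated with a minimal sketch. It assumes a simple, hypothetical model in which the on-rate is the number of binding events divided by concentration, observation time, and window length, with end-specific events confined to a fixed end region and lattice events spread uniformly. The function name and all numbers are illustrative assumptions, not values from either study.

```python
def apparent_k_on(n_end_events, lattice_events_per_nm, window_nm,
                  end_region_nm, conc_nM, time_s):
    """Apparent on-rate (events per nM per s per nm) for a window measured from the tip.

    Hypothetical model: all end-specific events fall within `end_region_nm`,
    while lattice events are uniformly distributed along the window.
    """
    if window_nm < end_region_nm:
        raise ValueError("window must cover the whole end-binding region")
    lattice_events = lattice_events_per_nm * window_nm
    total_events = n_end_events + lattice_events
    return total_events / (conc_nM * time_s * window_nm)


# Illustrative numbers only (not measured data).
k_160 = apparent_k_on(n_end_events=200, lattice_events_per_nm=0.05,
                      window_nm=160, end_region_nm=160,
                      conc_nM=1.0, time_s=600)
k_450 = apparent_k_on(n_end_events=200, lattice_events_per_nm=0.05,
                      window_nm=450, end_region_nm=160,
                      conc_nM=1.0, time_s=600)
print(k_160 > k_450)  # the longer window dilutes the end-specific events
```

      Under these assumptions, widening the window adds mostly lattice length with a lower event density, so the per-length on-rate drops even though the underlying binding is unchanged.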

      1C shows the decay of tubulin signal over several hundred nm - should show individual traces? How aligned? Doesn't this long decay suggest protruding protofilaments? (E.g. Odde/Gardner work).

      (1) In the revised manuscript, we now show individual traces (e.g. in Fig. 1B and Fig. 2A). The average trace of the tubulin signal with standard deviation is shown in Fig. 2C.

      (2) The microtubule lattice was modeled as a Gaussian wall and its end as a half-Gaussian in every frame. We used the peak position of the half-Gaussian in each frame to align and average the microtubule end signals over the dwell time. The peak of the averaged half-Gaussian was then used as a reference to measure the intensity profile of each single-molecule binding event in every frame (see page 18, lines 607-624).

      (3) We think that the decay of tubulin signal results from the convolution of the tapered end structure and the point spread function. In the revised manuscript, we have updated the Figures to provide unprocessed original data in Fig. 1B and Fig. 2A.
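
      The alignment step in point (2) can be sketched as follows. This is only an illustration under stated assumptions: the per-frame end peak is taken as already fitted and rounded to a whole pixel (the actual analysis fits a half-Gaussian and may work at sub-pixel precision), and the function name and example profiles are hypothetical.

```python
def align_and_average(profiles, peak_indices):
    """Average intensity profiles after shifting each so its end peak sits at offset 0.

    profiles: list of per-frame intensity lists (one value per pixel along the microtubule).
    peak_indices: per-frame pixel index of the end peak (assumed already determined).
    Returns a dict mapping offset (pixels relative to the peak) to the mean intensity,
    keeping only offsets that are present in every frame.
    """
    if len(profiles) != len(peak_indices):
        raise ValueError("need exactly one peak index per profile")
    sums, counts = {}, {}
    for profile, peak in zip(profiles, peak_indices):
        for i, value in enumerate(profile):
            offset = i - peak
            sums[offset] = sums.get(offset, 0.0) + value
            counts[offset] = counts.get(offset, 0) + 1
    n_frames = len(profiles)
    return {off: sums[off] / n_frames
            for off in sorted(sums) if counts[off] == n_frames}


# Two made-up frames of the same end, displaced by one pixel between frames.
frames = [[0, 1, 5, 9, 10, 10],
          [0, 0, 1, 5, 9, 10]]
peaks = [2, 3]
averaged = align_and_average(frames, peaks)
print(averaged)
```

      Averaging in the peak-aligned coordinate frame removes the frame-to-frame displacement of the growing end, which would otherwise blur the averaged profile.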

      Please show absolute numbers of measurements in 1C (rather than normalized distribution only).

      In the revised manuscript, we have included the raw data for both tubulin and MCAK signals as part of the methods description. In Fig. 1, using normalized values allows for the simultaneous representation of microtubule and protein signals on a unified graph.

      How do the results in 1D-G compare with the previous literature? Particularly comparison of on-rates between this study and the Gouveia et al? Assuming 1 um = 1625 dimers, it appears that in the presence of EB3, the on-rate of MCAK to the tips reported in Gouveia et al. is an order of magnitude higher than reported here in the absence of EB3 (4.3 x 10E-4 vs. 2 x 10E-5). If so, and given the robust presence of EB proteins at growing microtubule ends in cells, this would invalidate the potential physiological relevance of the current study. Note that the dwell times measured in Gouveia et al. are also longer than those measured here.

      Note that in Gouveia et al., the concentration of mCherry-EB3 was 75 nM, about 187.5 times higher than that of MCAK (0.4 nM). This concentration ratio does not necessarily hold in cells. Regarding the physiological relevance of the end-binding affinity of MCAK itself, please refer to our response to point No. 1 of Reviewer 1.

      Notably, Helenius et al reported a diffusion constant for MCAK of 0.38 um^2/s, which is more than an order of magnitude higher than reported here. The authors should comment on this!

      In the revised manuscript, we have provided an explanation for the difference in diffusion coefficient. Please see page 6, lines 142-157. In short, low-salt conditions facilitate rapid diffusion of MCAK.

      Figure 2:

      This figure is critical and really depends on the analysis of the tubulin signal. Note significant variability in tubulin signal between presented examples in 2A. Also, while 2C looks qualitatively similar, there appears to be significant variability over the several hundred nm from the tip along the lattice. This is the crucial region; statistical significance testing should be presented. More detailed info, including SDs etc. is necessary.

      In the revised manuscript, we have provided raw data in Fig. 1B and Fig. 2A. Additionally, we have provided statistical analysis of the tubulin signals (Fig. 2C) and performed a significance test. Please see page 5, lines 111-116 and page 7, lines 179-183 for detailed descriptions.

      Insights into the morphology of microtubule ends based on TIRF imaging have been previously gained in the literature, with reports of extended tip structures/protruding protofilaments (see e.g. Coombes et al. Cur Bio 2013, based on the methods of Demchouk et al. 2011). Such analysis should be performed here as well, if we are to conclude that nucleotide state alone, as opposed to the end morphology, specifies MCAK's tip localization.

      We appreciate the reviewer’s suggestion and agree that it provides a valid optical microscopy-based approach for estimating microtubule end morphology. However, this method did not establish a direct correlation between microtubule end morphology and tubulin nucleotide status. Therefore, we think that refining the measurement of microtubule end morphology will not necessarily provide more information to the understanding of tubulin nucleotide status at MCAK binding sites. Based on the available data in the present study, there are two main pieces of evidence supporting the idea that MCAK can sense tubulin nucleotide status: (1) the binding regions of MCAK and EB overlap significantly, and (2) MCAK shows a clear preference for binding to GTPγS microtubules, similar to EB1 (we provide a new control to support this, Fig. s4). Of course, we do not consider this to be a perfect set of evidence. As the reviewer has pointed out here and in other suggestions, future work should aim to further distinguish the nucleotide status of tubulin in the dynamic versus non-dynamic regions at the ends of microtubules, and to investigate the structural basis by which MCAK recognizes tubulin nucleotide status.

      EB comet profile should be clearly reproduced. MCAK should follow the comet profile.

      Please see our third response to point 1 of this reviewer.

      The conclusion that the MCAK binding region is larger than XMAP215 is not firm, based on the data presented. The authors state that 'the binding region of MCAK was longer than that of XMAP215'. What is the exact width of the region of the XMAP215 localization and how much longer is the MCAK end-binding region? Is this statistically significant?

      We have revised this part in the revised manuscript (page 6, lines 167-172). The position probability distributions of MCAK and XMAP215 were significantly different (K-S test, p< 10<sup>-5</sup>), and the binding region of MCAK (FWHM=185 nm) was significantly longer than that of XMAP215 (FWHM=123 nm).
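
      For readers who want to compare these widths with Gaussian standard deviations, the conversion can be sketched as below. It assumes the binding-region profiles are well described by a Gaussian, for which FWHM = 2·sqrt(2·ln 2)·σ; the function names are illustrative, and the two FWHM values are the ones quoted above.

```python
import math

# For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma (factor of about 2.3548).
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def fwhm_to_sigma(fwhm_nm):
    """Convert a Gaussian full width at half maximum (nm) to a standard deviation (nm)."""
    return fwhm_nm / FWHM_FACTOR


def sigma_to_fwhm(sigma_nm):
    """Convert a Gaussian standard deviation (nm) to a full width at half maximum (nm)."""
    return sigma_nm * FWHM_FACTOR


mcak_sigma = fwhm_to_sigma(185.0)     # MCAK binding region, FWHM = 185 nm
xmap215_sigma = fwhm_to_sigma(123.0)  # XMAP215 binding region, FWHM = 123 nm
width_ratio = 185.0 / 123.0           # MCAK region relative to XMAP215 region

print(f"sigma(MCAK) = {mcak_sigma:.1f} nm, sigma(XMAP215) = {xmap215_sigma:.1f} nm")
print(f"width ratio = {width_ratio:.2f}")
```

      The ratio of the two widths is the same whether expressed as FWHM or as σ, since both scale by the same constant.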

      MCAK localization with AMPPNP should also be performed here. Even low concentrations of MCAK have been shown to induce microtubule catastrophe/end depolymerization. This will dramatically affect microtubule end morphology, and thus apparent positioning of MCAK at the end.

      In the end positioning experiment, we used a low concentration of MCAK (1 nM). Under this condition, microtubule dynamics remained unchanged, and the morphology of the microtubule ends was comparable across different conditions (with EB1, MCAK or XMAP215). Additionally, in the revised manuscript, we present a new experiment in which we recorded the localization of both MCAK and XMAP215 on the same microtubule. The results support the conclusion regarding their relative localization: most MCAK is found at the proximal end of the XMAP215 binding region, while approximately 15% of MCAK is located within the XMAP215 binding region. Please see Fig. 2D-E and page 7, lines 184-197 for the corresponding descriptions.

      Figure 3:

      For clearer presentation, projections showing two microtubule lattice types on the same image (in e.g. two different colors) should be shown first without MCAK, and then with MCAK.

      We thank the reviewer for this suggestion. We have adjusted the figure accordingly. Please see Fig. 4 in the revised manuscript.

      Please comment on absolute intensity values - scales seem to be incredibly variable.

      The fluorescence value presented here is the result of multiple images being summed. Therefore, the difference in absolute values is influenced not only by the binding affinity of MCAK in different states to microtubules, but also by the number of images used. In this analysis, we are not comparing MCAK in different states, but rather evaluating the binding ability of MCAK in the same state on different types of microtubules.

      Given that the authors conclude that MCAK binding mimics that of EB, EB intensity measurements and ratios on different lattice substrates should be performed as a positive control.

      We performed additional experiments with EB1. In the revised manuscript, we provide these data as a positive control (please see Fig. s4).

      Figure 4:

      MCAK-nucleotide dependence of GMPCPP microtubule-end binding has been previously established (see e.g. Helenius et al, others?) - what is new here? Need to discuss the literature. This would be more appropriate as a supplemental figure?

      In the present study, we reproduced the GMPCPP microtubule-end binding of MCAK in the AMPPNP state, as shown in several previous reports (Desai et al., 1999; Hertzer et al., 2006). Here, we also quantified the end to lattice binding preference, and our results showed that the nucleotide state-dependence shows the same trend as the binding preference of MCAK to the growing microtubule ends. Therefore, we prefer to keep this figure in the main text (Fig. 5).

      Figure 5:

      Please note that both MCAK mutants show an additional two orders of magnitude lower microtubule binding on-rates when compared to wt MCAK. This makes the analysis of preferential binding substrate for these mutants dubious.

      We agree with this point and have rewritten this part. Please see page 10, lines 295-327, in the revised manuscript.

      Figure 6:

      Combined effects of XMAP215 and XKCM1 (MCAK) have been previously explored in the landmark study by Kinoshita et al. Science 2001, which should be cited and discussed. Also note that Moriwaki et al. JCB 2016 explored the combined effects of XMAP215 and MCAK - which should be discussed here and compared to the current results.

      We agree with the reviewer. We have revised the discussion on this part. Please see page 11, lines 329-342 and page 14, lines 459-472 in the revised manuscript.

      Please report quantification for growth rate and lifetime.

      In the revised manuscript, we provide all these data. Please see pages 11-12, lines 343-374.

      To obtain any new quantitative information on the combined effects of the two proteins, at the very minimum, the authors should perform a titration in protein concentration.

      We agree with the reviewer on this point. In our pilot experiments, we performed titration experiments to determine the appropriate concentrations of MCAK and XMAP215, respectively. We selected 50 nM for XMAP215, as it clearly enhances the growth rate and exhibits a mild promoting effect on catastrophe, two key effects of XMAP215 reported in previous studies (Brouhard et al., 2008; Farmer et al., 2021). Reducing the XMAP215 concentration eliminates the catastrophe-promoting effect, while increasing it would not substantially enhance the growth rate. For MCAK, we chose 20 nM, as it effectively promotes catastrophe; increasing the concentration beyond this point leads to no microtubule growth, at least in the MCAK-only condition. Without microtubule growth, it would be difficult to quantify the parameters of microtubule dynamics, hindering a clear comparison of the combined versus individual effects. Therefore, we think that the concentrations used in this study are appropriate and representative. In the revised manuscript, we make this point clearer (see page 11, lines 329-342).

      Finally, the writing could be improved for overall clarity.

      We thank the reviewer for pointing this out. In the revised manuscript, we conducted a thorough revision and review of the text.

      Reviewer #3 (Public Review):

      The authors revisit an old question of how MCAK goes to microtubule ends, partially answered by many groups over the years. The authors seem to have omitted the literature on MCAK in the past 10-15 years. The novelty is limited due to what has previously been done on the question. Previous work showed MCAK targets to microtubule plus-ends in cells through association with EB proteins and Kif18b (work from Wordeman, Medema, Walczak, Welburn, Akhmanova) but none of their work is cited.

      We thank the reviewer for the suggestion. Some of the referenced work has already been cited in our manuscript, such as studies on the interaction between MCAK and EB1. However, other relevant literature had not been properly cited. In the revised manuscript, we have added further discussion on this topic in the context of existing findings. Please refer to pages 3-4, lines 68-85, and page 13, lines 425-441.

      It is not obvious in the paper that these in vitro studies only reveal microtubule end targeting, rather than plus end targeting. MCAK diffuses on the lattice to both ends and its conformation and association with the lattice and ends has also been addressed by other groups-not cited here. I want to particularly highlight the work from Friel's lab where they identified a CDK phosphomimetic mutant close to helix4 which reduces the end preference of MCAK. This residue is very close to the one mutated in this study and is highly relevant because it is a site that is phosphorylated in vivo. This study and the mutant produced here suggest a charge-based recognition of the end of microtubules.

      Here the authors analyze this MCAK recognition of the lattice and microtubule ends, with different nucleotide states of MCAK and in the presence of different nucleotide states for the microtubule lattice. The main conclusion is that MCAK affinity for microtubules varies in the presence of different nucleotides (ATP and analogs) which was partially known already. How different nucleotide states of the microtubule lattice influence MCAK binding is novel. This information will be interesting to researchers working on the mechanism of motors and microtubules. However, there are some issues with some experiments. In the paper, the authors say they measure MCAK residency of growing end microtubules, but in the kymographs, the microtubules don't appear dynamic - in addition, in Figure 1A, MCAK is at microtubule ends and does not cause depolymerization. I would have expected to see depolymerization of the microtubule after MCAK targeting. The MCAK mutants are not well characterized. Do they still have ATPase activity? Are they folded? Can the authors also highlight T537 and discuss this?

      Finally, a few experiments are done with MCAK and XMAP215, after the authors say they have demonstrated the binding sites overlap. The data supporting this statement were not obvious and the conclusions that the effect of the two molecules are additive would argue against competing binding sites. Overall, while there are some interesting quantitative measurements of MCAK on microtubules - in particular in relation to the nucleotide state of the microtubule lattice - the insights into end-recognition are modest and do not address or discuss how it might happen in cells. Often the number of events is not recorded. Histograms with large SEM bars are presented, so it is hard to get a good idea of data distribution and robustness. Figures lack annotations. This compromises therefore their quantifications and conclusions. The discussion was hard to follow and needs streamlining, as well as putting their work in the context of what is known from other groups who produced work on this in the past few years.

      We thank the reviewer for the comments. Regarding the physiological relevance of the end-binding of MCAK itself, please refer to our response to the point No.1 of reviewer 1. Moreover, as we feel that other suggestions are more thoroughly expressed in the following comments for authors, we will provide the responses in the corresponding sections, as shown below.

      Reviewer #3 (Recommendations For The Authors):

      Why, on dynamic microtubules, is MCAK at microtubule plus ends and does not cause a catastrophe?

      At these concentrations (10 nM MCAK with 16 μM tubulin in Fig. 1; 1 nM MCAK with 12 μM tubulin in Fig. 2), MCAK has little effect on microtubule dynamics in our experiments. Using TIRFM, we were able to observe individual MCAK binding events. Based on these observations, we think that under the current experimental conditions, a single binding event of MCAK is insufficient to induce microtubule catastrophe; rather, it likely requires cumulative changes resulting from multiple binding events.

      Do the MCAK mutants still have ATPase activity?

      The ATPase activities of MCAK<sup>K525A</sup> and MCAK<sup>V298S</sup> are both reduced to about 1/3 of the wild-type (Fig. s6).

      The intensities of GFP are not all the same on the microtubule lattice (eg 1A). See blue and white arrowheads. The authors could be looking at multiple molecules of GFP-MCAK instead of single dimers. How do they account for this possibility?

      In the revised manuscript, we provide the gel filtration result of the purified MCAK, and the position of the peak corresponds to ~220 kDa, demonstrating that the purified MCAK in solution is dimeric (please see Fig.s1 and page 5, lines 101-103). We measured the fluorescence intensity of each binding event. A probability distribution of these intensities was then constructed and fitted with a Gaussian function. A binding event was considered to correspond to a single molecule if its intensity fell within μ±2σ of the distribution. The details of the single-molecule screening process are provided in the revised manuscript (see page 17, lines 574-583).

      In addition, we also measured the fluorescence intensity of both MCAK<sup>sN+M</sup> and MCAK. MCAK<sup>sN+M</sup> is a monomeric mutant that contains the neck domain and motor domain (Wang et al., 2012). The average intensity of MCAK<sup>sN+M</sup> is 196 A.U., about 65 % of that of MCAK (300 A.U.), suggesting that MCAK is a dimer (see Fig. s1). Moreover, we think that some of the dim signals may result from stochastic background noise, while others likely represent transient bindings of MCAK. The exposure time in our experiments was approximately 0.05 seconds; if the binding duration were shorter than this, the signal would be lower. It is important to note that in this study, we specifically selected binding events lasting at least 2 consecutive frames, meaning transient binding events were not included. This point has been clarified in the Methods section (see page 17, lines 568-569 and lines 574-583).
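
      The screening criteria described above (intensity within μ±2σ, duration of at least 2 consecutive frames) can be sketched as follows. This is a simplified illustration, not the analysis code: it uses the sample mean and standard deviation in place of a Gaussian fit to the intensity histogram, it assumes that the duration filter is applied before μ and σ are estimated, and the function name and example events are hypothetical.

```python
from statistics import mean, pstdev


def screen_single_molecules(events, min_frames=2, n_sigma=2.0):
    """Keep binding events that look like single molecules.

    events: list of (intensity_au, n_frames) tuples, one per binding event.
    An event is kept if it lasts at least `min_frames` consecutive frames and
    its intensity lies within mean +/- n_sigma * SD of the duration-filtered set.
    """
    # Step 1: discard transient events (assumed to be applied first).
    lasting = [(i, f) for i, f in events if f >= min_frames]
    if not lasting:
        return []
    # Step 2: estimate mu and sigma (moment estimates standing in for a Gaussian fit).
    intensities = [i for i, _ in lasting]
    mu = mean(intensities)
    sigma = pstdev(intensities)
    lower, upper = mu - n_sigma * sigma, mu + n_sigma * sigma
    # Step 3: keep events whose intensity falls inside the window.
    return [(i, f) for i, f in lasting if lower <= i <= upper]


# Made-up events: nine near 300 A.U., one bright outlier, one single-frame event.
example_events = [(300, 5), (310, 3), (290, 4), (305, 2), (295, 6),
                  (300, 3), (310, 2), (290, 5), (300, 4),
                  (900, 4), (300, 1)]
kept = screen_single_molecules(example_events)
print(len(kept), "of", len(example_events), "events kept")
```

      In this toy data set the bright event is rejected by the intensity window and the single-frame event by the duration filter; a histogram-based Gaussian fit would be less sensitive to such outliers than the moment estimates used here.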

      Could the authors provide a kymograph of an MT growing, in the presence of MCAK+AMPPNP? Can MCAK track the cap?

      Under single-molecule conditions, we observed a single MCAK molecule briefly binding to the end of the microtubule. However, we did not record if MCAK at high concentrations could track microtubule ends under AMPPNP conditions.

      In the experiments in Figure 6, the authors should also show the localization of MCAK and XMAP215 at microtubule plus ends in their kymographs to show the two molecules overlap.

      Regarding the relative localization of XMAP215 and MCAK, we conducted additional experiments to record both proteins simultaneously at the same microtubule end. Our results show that MCAK predominantly binds behind XMAP215, with 14.5% of MCAK binding within the XMAP215 binding region. Please see Fig. 2D-E and page 7, lines 184-197 in the revised manuscript. However, we argue that the effects of XMAP215 and MCAK are additive, and their binding sites do not necessarily need to overlap for these effects to occur.

      The authors do not report what statistical tests are done in their graphs, and one concern is over error propagation of their data. Instead of bar graphs, showing the data points would be helpful.

      We have now shown all data points in the revised manuscript.

      MCAK+AMPPNP accumulates at microtubule ends. Appropriate quotes from previous work should be provided.

      We have made the revisions accordingly. Please see page 9, lines 273-276.

      Controls are missing. An SEC profile for all purified proteins should be presented. Also, the authors need to explain if they report the dimeric or monomeric concentration of MCAK, XMAP215, etc...

      We have provided the gel filtration results for all purified proteins in the revised manuscript (Fig. s1). Moreover, we now make it clear that the concentrations of MCAK and EB1 are monomeric concentrations. Please see the legend for Fig. 1, line 893 in the revised manuscript.

      Figure 1: the microtubules don't look dynamic at all. This is also why the authors can record MCAK at microtubule ends, because their structure is not changing.

      The microtubules are dynamic, but they may appear non-dynamic due to the relatively slow growth rate and the high frame rate at which we are recording. We propose that individual binding events of MCAK induce structural changes at the nanoscopic or molecular scale, which are not detectable using TIRFM.

      I recommend the authors measure the Kon and Koff for single GFP-MCAK mutant molecules and provide the information alongside their normalized and averaged binding intensities of GFP-MCAK in Fig 5. Showing data points instead of bar graphs would be better.

      (1) We measured k<sub>on</sub> and dwell time for mutants at growing microtubule end. However, we did not perform single-molecule tracking for MCAK’s binding on stabilized microtubules. This is mainly because the superimposed signal on the stable microtubule already indicates the changes in the mutant's binding affinity to different microtubule structures, and moreover, the binding of the mutants is highly transient, making accurate single-molecule tracking and calculations difficult.

      (2) In the revised figure, we have included the data points in all plots.

      When discussing how Kinesin-13 interacts with the lattice, the authors should quote the papers that report the organization of full-length Kinesin-13 on tubulin heterodimers: Trofimova et al, 2018; McHugh et al 2019; Benoit et al, 2018. It would reinforce their model and account for the full-length protein, rather than just the motor domain.

      We thank the reviewer for the suggestion. In our manuscript, we have cited papers on full-length Kinesin-13 to discuss the interaction between MCAK and the curved structure of microtubule ends. Additionally, we have used the MCAK-tubulin crystal structure (PDB ID: 5MIO) in Fig. 6, as it depicts human MCAK, which is consistent with the protein used in our study. This structure illustrates the interaction sites between MCAK and the tubulin dimer, guiding our mutation studies on specific residues. Thus, we prefer to use this structure (PDB ID: 5MIO) in Fig. 6.

      Figure 5A. What type of model is this? A PDB code is mentioned. Is this from an X-ray structure? If so, mention it.

      We have now included the structural information in the Figure legend (see page 37, line 1045).

      Figure 5B. It is not possible to distinguish the different microtubule lattices (GTPyS, GDP, and GMPCPP). The experiment needs to be better labelled.

      We thank the reviewer for this comment. We have now rearranged the figure for better clarity (see Fig. 6).

      Figure 5D: what are the statistical tests? I don't understand "The statistical comparisons were made versus the corresponding value of 848 GFP-MCAK".

      We have made this point clearer in the revised manuscript (see page 38, lines 1078-1080).

      What is the "EB cap"? This needs explaining.

      We now provide an explanation of this term; please see page 4, lines 87-89 in the revised manuscript.

      Work from Friel and co-workers showed MCAK T537E did not have depolymerizing activity and a reduced affinity for microtubule ends. The work of the authors should be discussed with respect to this previously published work.

      We thank the reviewer for this suggestion. In the revised manuscript, we have added discussions on this (see page 10, lines 303-307).

      The concentration of protein used in the assays is not always described.

      We have checked throughout the manuscript and made revisions accordingly.

      "Having revealed the novel binding sites of MCAK in dynamic microtubule ends " should be on "we wondered how MCAK may work ..with EB1". This is not addressed so should be removed. Instead, they can quote the work from Akhmanova's lab. Realistically this section should be rephrased as there are other plus-end targeting molecules that compete with MCAK, not just XMAP215 and EB1.

      We have rephrased this section as suggested by this reviewer to be more specific. Please see page 11, lines 329-342.

      What is AMPCPP?

      It should be “AMPPNP”

      Typos in Figure 5.

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      We thank the reviewer for his/her very positive comments.

      Reviewer #2 (Public review):

      We thank the reviewer for his/her positive evaluation. We plan to add RNAseq data of yeast wild-type and JDP mutant strains as more direct readout for the role of Apj1 in controlling Hsf1 activity. We agree with the reviewer that our study includes one major finding: the central role of Apj1 in controlling the attenuation phase of the heat shock response. In accordance with the reviewer we consider this finding highly relevant and interesting for a broad readership. We agree that additional studies are now necessary to mechanistically dissect how the diverse JDPs support Hsp70 in controlling Hsf1 activity. We believe that such analysis should be part of an independent study but we will indicate this aspect as part of an outlook in the discussion section of a revised manuscript.

      Reviewer #3 (Public review):

      We thank the reviewer for his/her suggestions. We agree that it is sometimes difficult to distinguish direct effects of JDP mutants on heat shock regulation from indirect ones, which can result from the accumulation of misfolded proteins that titrate Hsp70 capacity. We also agree that an in vitro reconstitution of Hsf1 displacement from DNA by Apj1/Hsp70 will be important, also to dissect Apj1 function mechanistically. We will add this point as outlook to the revised manuscript.

      Reviewer #1 (Recommendations for the authors): 

      (1) Can the authors submit the raw translatome data to a standard repository? Also, the data should be summarized in a supplemental Excel table. 

      We submitted the raw translatome data to the NCBI Gene Expression Omnibus and added the analyzed data sets (shown in Figures 1 and 5) as Supplementary Tables S4/S5 (Excel sheets). We additionally included RNAseq analysis of yeast WT and the set of JDP mutants grown at 25°C, complementing and confirming our former translatome analysis (new Figure 5, Figure Supplement 2). The respective raw transcriptome data were also deposited at the NCBI Gene Expression Omnibus, and the analyzed data are available as Supplementary Table S7.

      (2) MW indicators need to be added to the Western Blot figures. 

      We added molecular weight markers to the Western Blot figures.

      (3) Can the authors please include the sequences of the primers used in all the RT-qPCR experiments? They mention they are in the supplemental information, but I couldn't locate them. 

      We added the sequences of the RT-qPCR primers as Supplementary Table S4.

      (4) Given the clear mechanism proposed, it would be nice if the authors could provide a nice summary figure. 

      We followed the suggestion of the reviewer and illustrate our main finding as new Figure 7.

      Reviewer #2 (Recommendations for the authors): 

      (1) As mentioned above, a co-IP experiment between Hsf1 and Ssa1/2 in APJ1 and apj1∆ cells, utilizing Hsf1 alleles with and without the two known binding sites, would cement the assignment of Apj1 in the Hsf1 regulatory circuit. 

      We agree with the reviewer that Hsf1-Ssa1/2 pulldown experiments, as done by Pincus and colleagues (1), would further specify the role of Apj1 in targeting Hsp70 to Hsf1 during the attenuation phase of the heat shock response. We made extensive attempts at such pulldown experiments to document dissociation of Ssa1/2 from Hsf1 upon heat shock in yeast wild-type cells. While we could specifically detect Ssa1/2 upon Hsf1-HA pulldown, our results after heat shock were highly variable and inconclusive and did not allow us to probe for a role of Apj1 or the two known Ssa1/2 binding sites in the phase-specific targeting. We now discuss the potential roles of the two distinct Ssa1/2 binding sites for phase-specific regulation of Hsf1 activity in the revised manuscript (page 12, lines 17-21).

      (2) Experiments in Figure 3 nicely localize CHIP reactions with known HSEs. A final confirmatory experiment utilizing a mutated HSE (another classic experiment in the field) would cement this finding and validate the motif and reporter-based analysis. 

      We thank the reviewer for this meaningful suggestion. We have performed a comparable experiment using the non-Hsf1-regulated gene BUD3, which lacks HSEs, as a reference. We engineered a counterpart, termed “BUD3 HS-UAS”, which bears inserted HSEs, derived from the native UAS of HSP82, within the BUD3 UAS. We show that BUD3<sup>+</sup> lacking HSEs is not occupied by Hsf1 and Apj1 under either non-stress or heat shock conditions, while BUD3-HSE is clearly occupied under both, paralleling Hsf1 and Apj1 occupancy of HSP82 (Figure 3E). We have renamed the engineered allele to “BUD3-HSE” to clarify the experimental design and output.

      (3) Page 8 - the ydj1-4xcga allele is introduced without explaining why it's needed, since ydj1∆ cells are viable. The authors should acknowledge the latter fact, then justify why the RQC depletion approach is preferred. Especially since the ydj1∆ mutant appears in Figure 5B. 

      ydj1∆ cells are viable, yet they grow extremely slowly at 25°C and hardly at all at 30°C, making them difficult to handle. The RQC-mediated depletion of Ydj1 in ydj1-4xcga cells allows for solid growth at 30°C, facilitating strain handling and analysis of Ydj1 function. Importantly, ydj1-4xcga cells are still temperature-sensitive and exhibit the same deregulation of the heat shock response upon combination with apj1∆ as observed for ydj1∆ cells. Thus, ydj1 knockout and knockdown cells do not differ in the relevant phenotypes reported here, and we performed most of the analysis with ydj1-4xcga cells due to their growth advantage. We added a respective explanation to the text (page 8, lines 13-14).

      (4) The authors raise the possibility that Sis1, Apj1, and Ydj1 may all be competing for access to Ssa1/2 at different phases of the HSR, and that access may be dictated by conformational changes in Hsf1. Given that there are at least two known Hsp70 binding sites that have negative regulatory activity in Hsf1, the possibility that domain-specific association governs the different roles should be considered. It is also unclear how the JDPs are associating with Hsf1 differentially if all binding is through Ssa1/2. 

      We thank the reviewer for the comment and will add the possibility of specific roles of the identified Hsp70 binding sites in regulating Hsf1 activity at the different phases of the heat shock response to the discussion section. Binding of Ssa1/2 to substrates (including Hsf1) is dependent on J-domain proteins (JDPs), which differ in substrate specificity. It is tempting to speculate that the distinct JDPs recognize different sites in Hsf1 and are responsible for mediating the specific binding of Ssa1/2 to either N- or C-terminal sites in Hsf1. Thus, the specific binding of a JDP to Hsf1 might dictate the binding of Ssa1/2 to either binding site. We discuss this aspect in the revised manuscript (page 12, lines 17-21).

      (5) Figure 6 - temperature sensitivity of hsf1 and ydj1 mutants has been linked to defects in the cell wall integrity pathway rather than general proteostasis collapse. This is easily tested via plating on osmotically supportive media (i.e., 1M sorbitol) and should be done throughout Figure 6 to properly interpret the results.

      Our data indicate proteostasis breakdown in ydj1 cells by showing strongly altered localization of Sis1-GFP, pointing to massive protein aggregation (Figure 6 – Figure Supplement  1D).

      We followed the suggestion of the reviewer and performed spot tests in the presence of 1 M sorbitol (see figure below). The presence of sorbitol improves growth of ydj1-4xcga mutant cells at increased temperatures, in agreement with the remark of the reviewer. We do not think, however, that growth rescue by sorbitol points to specific defects of the ydj1 mutant in cell wall integrity. Sorbitol functions as a chemical chaperone and has been shown to have protective effects on cellular proteostasis and to rescue phenotypes of diverse point mutants in yeast cells by facilitating folding of the respective mutant proteins and suppressing their aggregation (2-4). Thus, sorbitol can broadly restore proteostasis, which can also explain its effects on growth of ydj1 mutants at increased temperatures. The readout of the spot test with sorbitol is therefore ambiguous, and we prefer not to show it in the manuscript.

      Author response image 1.

      Serial dilutions of indicated yeast strains were spotted on YPD plates without and with 1 M sorbitol and incubated at indicated temperatures for 2 days.

      Reviewer #3 (Recommendations for the authors): 

      (1) Line 154: Can the authors, by analysis, offer an explanation for why HSR attenuation varies between genes for the sis1-4xcga strain? Is it, for example, a consequence of that a hypomorph and not a knock is used, a mRNA turnover issue, or that Hsf1 has different affinities for the HSEs in the promoters? 

      We used the sis1-4xcga knock-down strain because Sis1 is essential for yeast viability. The point raised by the reviewer is highly valid, and we thought extensively about the diverse consequences of Sis1 depletion on levels of, e.g., translated BTN2 (minor impact) and HSP104 (strong impact) mRNA. We have since performed transcriptome analysis and confirmed the specific impact of Sis1 depletion on HSP104 mRNA levels, while BTN2 mRNA levels remained much less affected (new Figure 5 - Figure Supplement 2A/B). We compared numbers and spacings of HSEs in the respective target genes but could not identify obvious differences. Hsf1 occupancy within the UAS region of both BTN2 and HSP104 is very comparable at three different time points of a 39°C heat shock (0, 5 and 120 min), arguing against different Hsf1 affinities for the respective HSEs (5). The molecular basis for the target-specific derepression upon Sis1 depletion thus remains to be explored. We added a respective comment to the revised version of the manuscript (page 12, lines 3-8).

      (2) Line 194: The analysis of ChIP-seq is not very elaborated in its presentation. How specific is this interaction? Can it be ruled out by analysis that it is simply the highly expressed genes after the HS that lead to Apj1 appearing there? More generally: Can the data in the main figure be presented to give a more unbiased genome-wide view of the results?

      Overall, we observed a low number of Apj1 binding events in the UAS of genes. The interaction of Apj1 with HSEs is specific, as we do not observe Apj1 binding to the UAS of well-expressed non-heat shock genes. Similarly, Apj1 does not bind to ARS504 (Figure S3 – Figure Supplement 1). We extended the description of our ChIP-seq analysis procedures leading to the identification of HSEs as Apj1 target sites to make the data analysis easier to understand. We additionally re-analysed the two Apj1 binding peaks that did not reveal an HSE in our original analysis. Using a modified setting, we can identify a slightly degenerate HSE in the promoter region of the two genes (TMA10, RIE1), and we changed Figure 3C accordingly. Notably, TMA10 is a known target gene of Hsf1. The expanded analysis further documents the specificity of the Apj1 binding peaks.

      (3) Line 215. Figure 3. The clear anticorrelation is puzzling. Presumably, Apj1 binds Hsf1 as a substrate, and then a straight correlation is expected: When Hsf1 substrate levels decrease at the promoters, also Apj1 signal is predicted to decrease. What explanations could there be for this? Is it, for example, that Hsf1 is not always available as a substrate on every promoter, or is Apj1 tied up elsewhere in the cell/nucleus early after HS? 

      We propose that Apj1 binds HSE-bound Hsf1 only after clearance of nuclear inclusions, which form upon heat stress. Apj1 thereby couples the restoration of nuclear proteostasis to the attenuation of the heat shock response. This explains the delayed binding of Apj1 to HSEs (via Hsf1), while Hsf1 shows highest binding upon activation of the heat shock response (early timepoints). Notably, the binding efficiencies of Hsf1 and Apj1 (% input) differ considerably: we determine strong binding of Hsf1 five min post heat shock (30-40% of input), whereas at most 3-4% of the input is pulled down with Apj1 (60 min post heat shock) (Figure 3D). Even at this late timepoint, 10-20% of the input is pulled down with Hsf1. The diverse kinetics and pulldown efficiencies suggest that Apj1 displaces Hsf1 from HSEs, and accordingly Hsf1 stays bound to HSEs in apj1∆ cells (Figure 4). This activity of Apj1 explains the anti-correlation: increased targeting of Apj1 to HSE-bound Hsf1 will lower the absolute levels of HSE-bound Hsf1. What we observe in the ChIP experiment at the individual timepoints is a snapshot of this reaction. Accordingly, at the last timepoint analyzed (120 min after heat shock), we observe low binding of both Hsf1 and Apj1, as the heat shock response has been shut down.
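      For readers unfamiliar with the "% input" readout quoted above, the following is a minimal sketch of the commonly used percent-input calculation for ChIP-qPCR. It assumes that the values in Figure 3D were obtained by qPCR with Ct values, which the response does not state; the function name `percent_input` and its parameters are hypothetical and chosen only for illustration.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent of input recovered in a ChIP sample (illustrative sketch).

    ct_ip          -- Ct of the immunoprecipitated sample
    ct_input       -- Ct of the saved input aliquot
    input_fraction -- fraction of chromatin saved as input (e.g. 0.01 for 1%)

    Assumes ideal amplification, i.e. a doubling of product per cycle.
    """
    # Shift the input Ct to what it would be if 100% of the chromatin were measured.
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    # Each cycle of difference corresponds to a two-fold difference in template.
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)


# An IP sample with the same Ct as a 1% input aliquot recovered about 1% of input.
example = percent_input(ct_ip=25.0, ct_input=25.0, input_fraction=0.01)
```

      Because the readout is exponential in the Ct difference, the roughly ten-fold gap between Hsf1 and Apj1 recovery described above would correspond to a difference of a little over three cycles under these assumptions.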

      (4) Line 253: "Sis-depleted".  

      We have corrected the mistake.

      (5) Line 332: Fig. 6C SIS1 OE from pRS315. A YIP would have been better, 20% of the cells will typically not express a protein with a CEN/ARS of the pRS-series so the Sis1 overexpression phenotype may be underestimated and this may impact on the interpretation. 

      We agree with the reviewer that yeast integrating plasmids (YIP) represent the gold standard for complementation assays. We are not aware of a study showing that 20% of cells harboring pRS-plasmids do not express the encoded protein. The results shown in Fig. 8C/D demonstrate that even strong overproduction of Sis1 cannot restore Hsf1 activity control. This interpretation would not be affected even if a certain percentage of these cells do not express Sis1. Nevertheless, we added a comment to the respective section pointing to the possibility that the Sis1 effect might be underestimated due to variations in Sis1 expression (page 11, lines 15-19).

      (6) Figure 1C. Since n=2, a more transparent way of showing the data is the individual data points. It is used elsewhere in the manuscript, and I recommend it. 

      We agree that showing individual data points can enhance transparency, particularly with small sample sizes. However, the log2 fold change (log2FC) values presented in Figure 1C and in other figures derived from ribosome profiling and RNAseq experiments were generated using the DESeq2 package. The DESeq2 pipeline is widely used for differential gene expression analysis and is known for its statistical robustness. It performs differential expression analysis based on a model that incorporates normalization, dispersion estimation, and shrinkage of fold changes. The pipeline automatically accounts for biological and technical variability and for batch effects, thereby improving the reliability of results. The log2FC values are not calculated directly from log-transformed normalized counts of individual samples but are instead estimated from a fitted model comparing group means. Therefore, DESeq2 log2FC values cannot be shown for individual replicates.
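      The point that a group-level fold change yields one value per gene rather than one per replicate can be illustrated with a simplified sketch. This is not the DESeq2 implementation: it reproduces only median-of-ratios normalization followed by a ratio of group means, and it omits the dispersion estimation, model fitting, and shrinkage mentioned above. The function names, the `pseudocount` parameter, and the toy counts are assumptions made for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors; counts[g][s] is the raw count of gene g in sample s."""
    n_samples = len(counts[0])
    # Use only genes detected in every sample, so that all logarithms are defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    return [
        math.exp(median(math.log(row[s]) - lg for row, lg in zip(usable, log_geo_means)))
        for s in range(n_samples)
    ]


def group_log2fc(counts, control_idx, treated_idx, pseudocount=0.5):
    """One log2 fold change per gene, computed from normalized group means."""
    factors = size_factors(counts)
    result = []
    for row in counts:
        norm = [c / f for c, f in zip(row, factors)]
        mean_control = sum(norm[i] for i in control_idx) / len(control_idx)
        mean_treated = sum(norm[i] for i in treated_idx) / len(treated_idx)
        result.append(math.log2((mean_treated + pseudocount) / (mean_control + pseudocount)))
    return result


# Toy data: three genes, two control replicates followed by two treated replicates.
toy_counts = [
    [100, 100, 400, 400],
    [50, 50, 50, 50],
    [10, 10, 10, 10],
]
toy_log2fc = group_log2fc(toy_counts, control_idx=[0, 1], treated_idx=[2, 3], pseudocount=0.0)
```

      The returned list has one entry per gene, which is why a replicate-level scatter of log2FC values is not available from such a comparison; per-replicate normalized counts could still be plotted separately.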

      (7) Figure 1D. Please add the number of minutes on the X-axis. Figure legend: "Cycloheximide" is capitalized.  

      We revised the figure and figure legend as recommended.

      (8) Several figure panels: Statistical tests and SD error bars for experiments performed in duplicates simply feel wrong for this reviewer. I do recognize that parts of the community are calculating, in essence, quasi-p-values using parametric methods for experiments with far too low sample numbers, but I recommend not doing so. In my opinion, better to show the two data points and interpret with caution.

      We followed the advice of the reviewer and removed statistical tests for experiments based on duplicates.

      References

      (1) Krakowiak, J., Zheng, X., Patel, N., Feder, Z. A., Anandhakumar, J., Valerius, K. et al. (2018) Hsf1 and Hsp70 constitute a two-component feedback loop that regulates the yeast heat shock response eLife 7,

      (2) Guiberson, N. G. L., Pineda, A., Abramov, D., Kharel, P., Carnazza, K. E., Wragg, R. T. et al. (2018) Mechanism-based rescue of Munc18-1 dysfunction in varied encephalopathies by chemical chaperones Nature communications 9, 3986

      (3) Singh, L. R., Chen, X., Kozich, V., and Kruger, W. D. (2007) Chemical chaperone rescue of mutant human cystathionine beta-synthase Mol Genet Metab 91, 335-342

      (4) Marathe, S., and Bose, T. (2024) Chemical chaperone - sorbitol corrects cohesion and translational defects in the Roberts mutant bioRxiv 10.1101/2024.09.04.610945

      (5) Pincus, D., Anandhakumar, J., Thiru, P., Guertin, M. J., Erkine, A. M., and Gross, D. S. (2018) Genetic and epigenetic determinants establish a continuum of Hsf1 occupancy and activity across the yeast genome Mol Biol Cell 29, 3168-3182

    1. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      This manuscript assesses the differences between young and aged chondrocytes. Through transcriptomic analysis and further assessments in chondrocytes, GATA4 was found to be increased in aged chondrocyte donors compared to young donors. Subsequent mechanistic analysis with lentiviral vectors, siRNAs, and a small molecule was used to study the role of GATA4 in young and old chondrocytes. Lastly, an in vivo study was used to assess the effect of GATA4 expression on osteoarthritis progression in a DMM mouse model.

      Strengths:

      This work linked the overexpression of GATA4 to NF-kB signaling pathway activation, alterations to the TGF-b signaling pathway, and found that GATA4 increased the progression of OA compared to the DMM control group. This indicates that GATA4 contributes to the onset and progression of OA in aged individuals.

      The authors thank the reviewer for reviewing our manuscript and providing insightful comments.

      Weaknesses:

      (1) A couple of sentences should be added to the introduction, to emphasize the role GATA4 plays, such as the alterations to the TGF-b signaling pathway and the increased activation of the NF-kB pathway. 

      As suggested, we have expanded on these signaling pathways in the Introduction to highlight the known functions of GATA4. Importantly, no previous study has reported a role of GATA4 in regulating the TGF-β pathway.

      “Many growth factors contribute to the chondro-supportive environment in the knee joint. Particularly, transforming growth factor-b (TGF-b) plays a key role in maintaining chondrocytes and replenishing ECM loss. However, during OA, TGF-b can induce catabolic processes in chondrocytes, resulting in matrix stiffening, osteophytes, and chondrocyte hypertrophy.[10-12]” (Lines 80-84)

      “Mechanistically, upregulation of GATA4 was shown to increase nuclear factor-kB (NF-kB) pathway activation.[14,15]  NF-κB is thought to amplify and potentially propagate cellular senescence during the aging process through the senescence-associated secretory phenotype (SASP), which could contribute to a low-grade state of chronic inflammation.[16]” (Lines 99-102)

      “When GATA4 was over expressed, we found that there were alterations to the TGF-b signaling pathway and activation of the NF-kB signaling pathway.” (Lines 106-108)

      (2) Figure 1F, the GATA4 histology image should be bigger.

      We have now increased the size of the image in revised Figure 1F.

      (3) Further discussion should be conducted regarding the reasoning as to why GATA4 increases the phosphorylation of SMAD1/5. 

      Thank you. The underlying mechanism of GATA4 activating SMAD1/5 has not been previously investigated. We have now elaborated on this in the discussion and have added more relevant publications.

      “Our study indicated that there was an observed decrease in chondrogenesis and an increase in hypertrophy-related genes following GATA4 overexpression (Figure 2G).” (Lines 572-574)

      “These previous studies and literature review inspired us to explore the potential association between GATA4 levels and the activation of SMAD1/5.” (Lines 587-588)

      “In this study, it was shown that GATA4 was necessary for bone morphogenic protein-6 (BMP-6) mediated IL-6 induction, in which there are multiple GATA binding domains on the IL-6 promoter. This work further showed that GATA4 interacts with SMAD 2,3 and 4.[55] Studies have suggested that BMP pathways and GATA4 work synergistically to regulate SMAD signaling.[56] This information indicates that the involvement of GATA4 in the TGF-b signaling pathway is complex and further studies should be conducted to better assess this relationship.” (Lines 594-599)

      (4) More information should be included to clarify why GATA4 is thought to be linked to DNA damage and the pathway that is associated with that. 

      We have now included further information in the discussion to clarify the association between DNA damage and GATA4 upregulation.

      “The study by Kang et al. demonstrated that the suppression of p62 following DNA damage leads to GATA4 accumulation due to the lack of autophagy.[13] DNA damage is known to increase with age.[71] Therefore, we believe that DNA damage due to aging is a key driver of the upregulation of GATA4 in old chondrocytes.” (Lines 642-646)

      (5) Please add further information regarding the limitations of the animal study conducted in this work and future plans to assess this. 

      We have included more limitations of the animal study that was conducted in this work and have expanded on the future plans to use inducible GATA4 expression in transgenic mouse lines to study the role of GATA4 overexpression in OA onset and progression.

      “Third, during our in vivo work, the intraarticular injection of GATA4 lentivirus was not chondrocyte-specific. Therefore, the injection also allowed for other cell types to overexpress GATA4. Future work should be conducted using transgenic mouse lines for cartilage-specific inducible overexpression or depletion of Gata4 to further investigate the role of GATA4 in chondrocytes.” (Lines 666-670)

      (6) In Figure 5, GATA4 should be changed to Gata4 in the graphed portions for consistency. 

      Thanks. We have made the necessary adjustments throughout the manuscript.

      Reviewer #2 (Public review):

      (1) While it is convincing that GATA4 expression is elevated in elderly individuals, and that it has a detrimental impact on cartilage health, the authors might want to add further discussion on the variability among individual human donors, especially given the finding that the elevation of GATA4 was not observed in chondrocytes from donor O1 (Figure 1G).

      The authors thank the reviewer for reviewing our manuscript and providing insightful comments.

      As suggested, we have included more discussion on the variability among donors.

      “Although we found that GATA4 was generally increased with aging, some young donors also exhibited increased levels of GATA4, which may be associated with increased DNA damage, as discussed above, or other stressors. Therefore, GATA4 should be used together in conjunction with other aging biomarkers, such as the epigenetic clock [72] to precisely define chondrocyte aging. Future work should examine biological versus chronological aging and epigenetic clock-based assessments to explain the variabilities in GATA4 expression among donors.” (Lines 658-663)

      (2) It might also be worth adding additional discussion on the interplay between senescent chondrocytes and the dysfunctional ECM during aging. As noted by the authors, aging is associated with decreased sGAG content and likely degenerative changes in the collagen II network, so the microniche of chondrocytes, and thus cell-matrix crosstalk through the pericellular matrix, is also altered or impaired. 

      Thank you for this comment. We have included more discussion on the interplay of chondrocyte senescence and dysfunctional ECM during aging, with a specific focus on the microniche of chondrocytes.

      “Additionally, a common hallmark of chondrocyte aging is the alteration of ECM, including composition change [2] and stiffening.[57] ECM stiffness can directly affect chondrocyte phenotype and proliferation, and contribute to OA.[58] A recent study by Fu et al. associated matrix stiffening with the promotion of chondrocyte senescence.[59] Furthermore, matrix stiffening has been associated with modulating the TGF-b signaling pathway.[60-62] Future studies should investigate the potential of matrix stiffening and the effect of GATA4 on pericellular matrix proteins such as decorin[63,64], biglycan, collagen VI and XV, as these proteins assist with the regulation of biochemical interactions and assist with the maintenance of the chondrocyte microenvironment.[65] Herein, the TGF-b signaling pathway can further alter the extracellular microenvironment[62], which could promote cellular senescence and subsequently NF-kB pathway activation.” (Lines 600-610)

      (2) If applicable, please also add Y3 and O3 to Figure S1 for visual comparison across individual donors. 

      As suggested, we added Y3 and O3 to the revised Figure S1 for more visual comparisons across individual donors.

      (3) Figure 3C, the molecular weight labels are off. 

      Thanks. We corrected this mistake.

      (4) Line 438 - Please clarify in text that the highest efficiency of siRNA chosen was siRNA2. 

      As suggested, we added the reason for selecting siRNA2.

      “Several GATA4 siRNAs were tested, and the one with the highest efficiency was selected based on RT-qPCR results, which indicated that siRNA2 treatment induced the lowest expression of GATA4 (Supplementary Figure S6).” (Lines 448-450)

      (5) Did the authors test the timeline of sustained knockdown of GATA4 by siRNA?

      We used a 7-day timepoint of chondrogenesis, and RT-qPCR results demonstrated that there was a downregulation of GATA4 expression at this timepoint (Figure 4). In the current in vitro study, we did not examine the efficacy of GATA4 siRNA for longer than 7 days.

      Reviewer #3 (Public review):

      (1) It would be useful to explain why GATA4 was chosen over HIF1a, which was the most differentially expressed. 

      The authors thank the reviewer for reviewing our manuscript and providing insightful comments.

      When we first saw the results, we did consider studying the role of HIF1a in aging because it was the most differentially expressed. When we reviewed the relevant literature, we found that HIF1a was commonly upregulated in aged individuals, which was thought to be linked to hypoxia and increased oxidative stress (PMID: 12470896, PMID: 12573436). We also found studies that investigated HIF1a in chondrocytes and used in vivo work to investigate its role in osteoarthritis (PMID: 32214220), indicating that HIF1a plays a protective role during OA by suppressing the activation of the NF-kB pathway. Moreover, there is work assessing the stabilization of HIF1a by regulating mitophagy and using HIF1a as a potential therapeutic target for OA (PMID: 32587244). Since there have been many studies investigating the correlation of HIF1a expression and OA, we felt that it would be more innovative to look at other molecules, such as GATA4. Moreover, as we highlighted in the Introduction and Discussion, through testing in cell types other than chondrocytes, GATA4 was shown to be associated with DNA damage and senescence, which are both aging hallmarks. Given that the roles of GATA4 in chondrocytes had not been previously studied, we chose GATA4 for this study.

      “Of note, Hypoxia-Inducible Factor 1a (HIF1a) was the most differentially expressed gene predicted to regulate chondrocyte aging. The connection between HIF1a and aging has been previously reported.[32] Furthermore, additional studies have investigated HIF1a in association with OA and assessed its use as a therapeutic target.[33,34] Therefore, we decided to focus on GATA4, which was less studied in chondrocytes but highly associated with cellular senescence, an aging hallmark. However, our selection did not dampen the importance of HIF1a and other molecules listed in Figure 1D in chondrocyte aging. They can be further studied in the future using the same strategy employed in the current work.” (Lines 526-533)

      (2) In Figure 5, it would be useful to demonstrate the non-surgical or naive limbs to help contextualize OARSI scores and knee hyperalgesia changes. 

      Thank you for your comment. Based on prior experience, mice in the sham group had OARSI scores ranging from 0 to 0.5. In the current study, we focused on the DMM control and DMM Gata4 virus groups, so we did not include a sham control group. We recognize this as a limitation of the study.

      “We measured the naive limbs for knee hyperalgesia before DMM surgery, and found the average threshold was 507 g. We have highlighted the threshold measurement in the figure legend. 507 g was the threshold baseline for non-surgery mice (dashed line).” (Lines 499-500)

      (3) While there appear to be GATA4 small-molecule inhibitors in various stages of development that could be used to assess the effects in age-related OA, those experiments are out of scope for the current study. 

      We agree with this comment that the results are still preliminary, which is why we placed them in the supplementary materials. However, we feel that the result is informative, as it supports the potential of GATA4 as a therapeutic target and may inspire the development of more specific inhibitors. Therefore, if the reviewer agrees, we would like to keep the results in the current study.

      In particular, our in vitro study demonstrated the potential of using small-molecule GATA4 inhibitors to enhance the quality of cartilage created by old chondrocytes. We can validate the findings in vivo, as well as develop other GATA4 inhibitors. (Lines 673-675)

      (4) Is GATA4 upregulated in chondrocytes in publicly available databases? 

      Thank you for this question. We examined the public databases and found data showing a trend toward GATA4 upregulation in aged or OA chondrocytes in work conducted by Ungethuem et al. (PMID: 20858714). In one study by Ramos et al. (PMID: 25054223), we noticed that GATA4 expression levels were the same in the young and old groups, which may be due to the smaller sample size of the young group compared to the old group (4 vs 26).

      Work Conducted by Grogan et al. (Unpublished https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39795)

      Author response image 1.

      Author response image 2.

      Work conducted by Ramos et al. (PMID: 25054223).

      Author response image 3.

      Work conducted by Ungethuem et al. (PMID: 20858714).

      (5) In many cases, the figure captions describe the experiment vs. the outcome. It may be more compelling to state the main finding in the figure title, and you might consider changing it from what is stated at present. For example, Figure 2: instead of the impact of overexpression, you may say GATA4 overexpression impairs cartilage formation (as stated in the results).

      Thanks for the suggestion. We have made the following changes to the figure captions as suggested.

      Figure 1: GATA4 is upregulated in aged chondrocytes (Line 373)

      Figure 2: Overexpressing GATA4 impairs the hyaline cartilage formation capacity of young chondrocytes (Lines 408-409)

      Figure 3: GATA4 overexpression activates SMAD1/5  (Line 436)

      Figure 4: Suppressing GATA4 in old chondrocytes promotes cartilage formation and lowers expression of proinflammatory cytokines (Line 467)

      Figure 5: Gata4 overexpression in the knee joints accelerates OA progression in mice. (Line 593)

      (6) It would be useful to provide a little more information about the human tissue donors, if that is available. 

      We have provided more information about the tissue donors in the revised Supplementary Table S1.

      (7) While aging-like changes were observed in young chondrocytes with GATA4 overexpression, it would be interesting to directly evaluate if there is a change in biological versus chronological age in these tissues. Companies like Zymo can provide this biological v chronological age epigenetic clock-based assessments if that is of interest, to say the young chondrocytes are looking "older". 

      Thank you for this information. We agree that it will be important to assess epigenetic changes in GATA4-overexpressing cells. We are contacting the company to learn more about their technology. Meanwhile, we added this to the future work section of the manuscript.

      “Although we found that GATA4 was generally increased with aging, some young donors also exhibited increased levels of GATA4, which may be associated with increased DNA damage, as discussed above, or other stressors. Therefore, GATA4 should be used together in conjunction with other aging biomarkers, such as the epigenetic clock [72] to precisely define chondrocyte aging. Future work should examine biological versus chronological aging and epigenetic clock-based assessments to explain the variabilities in GATA4 expression among donors.”  (Lines 658-663)

      (8) It is not clear the age at which the mice received DMM in the methods, but it is shown in Figure 5. 

      We have added the age at which the mice received the DMM surgery to the methods section.

      “Intraarticular injections were administered to mice between 10-12 weeks of age under general anesthesia to safeguard the well-being of the animals and to minimize procedural discomfort.” (Line 300)

      “One week after viral vector injection, DMM surgery was performed to induce the OA model on mice 11-13 weeks of age.” (Line 312-313)

      (9) It is not clear which factors were assayed using Luminex, and it would be great to add. 

      Thank you for this comment. We have added a comprehensive list of the proteins assessed using Luminex in a new Supplementary Table 6 (S6).

      (10) Also interesting, loss of GATA4 seems to prevent diet-induced obesity in mice and promote insulin sensitivity (potentially via GLP-1 secretion). I wonder if there may be a metabolic axis here too? PMID: 21177287. I may have missed parts of the discussion of the role of GATA4 in metabolism, but it might be an interesting addition to the discussion. 

      In the current study, we have not investigated the role of GATA4 in obesity. As suggested, we have included a discussion of GATA4 in metabolism.

      “Furthermore, GATA4 might be associated with metabolic regulation. A study conducted by Patankar et al. investigated how GATA4 regulates obesity. Specifically, they used intestine-specific Gata4 knockout mice to study diet-induced obesity, showing that the knockout mice were resistant to the high-fat diet, and that glucagon-like peptide-1 (GLP-1) release was increased. These findings indicated a decreased risk for the development for insulin resistance in knockout mice.[44] This work was taken a step further in a subsequent publication, in which the same team investigated the dietary lipid-dependent and independent effects on the development of steatosis and fibrosis in Gata4 knockout mice. The results from this work suggested that the knockdown of Gata4 increases GLP-1 release, in turn suppressing the development of hepatic steatosis and fibrosis, ultimately blocking hepatic de novo lipogenesis.[45] These studies are especially interesting with the rise of GLP-1 based therapy for the treatment of OA.[46,47] Thus, the coupling of GATA4-related metabolic dysfunction and OA should be further investigated.” (Lines 542-553)

      (11) Another potential citation: GATA4 regulates angiogenesis and persistence of inflammation in rheumatoid arthritis PMID: 29717129 - around the inflammatory axis potential in OA? since GATA4 was reported in FLS from OA- PMC11183113.

      Thank you. We have included this work/citation in the discussion section.

      “Further studies have shown that GATA4 regulates angiogenesis and inflammation in fibroblast-like synoviocytes in rheumatoid arthritis, indicating that GATA4 is required for the inflammation induced by IL-1b. This study also demonstrated that GATA4 binds to promoter regions on Vascular Endothelial Growth Factor (VEGF)-A and VEGFC to enhance transcription and regulate angiogenesis.[15]”  (Lines 558-562)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Weaknesses: 

      The main weakness in this paper lies in the authors' reliance on a single model to derive conclusions on the role of local antigen during the acute phase of the response by comparing T cells in model antigen-vaccinia virus (VV-OVA) exposed skin to T cells in contralateral skin exposed to DNFB 5 days after the VV-OVA exposure. In this setting, antigen-independent factors may contribute to the difference in CD8+ T cell number and phenotype at the two sites. For example, it was recently shown that very early memory precursors (formed 2 days after exposure) are more efficient at seeding the epithelial TRM compartment than those recruited to skin at later times (Silva et al, Sci Immunol, 2023). DNFB-treated skin may therefore recruit precursors with reduced TRM potential. In addition, TRM-skewed circulating memory precursors have been identified (Kok et al, JEM, 2020), and perhaps VV-OVA exposed skin more readily recruits this subset compared to DNFB-exposed skin. Therefore, when the DNFB challenge is performed 5 days after vaccinia virus, the DNFB site may already be at a disadvantage in the recruitment of CD8+ T cells that can efficiently form TRM. In addition, CD8+ T cell-extrinsic mechanisms may be at play, such as differences in myeloid cell recruitment and differentiation or local cytokine and chemokine levels in VV-infected and DNFB-treated skin that could account for differences seen in TRM phenotype and function between these two sites. Although the authors do show that providing exogenous peptide antigen at the DNFB-site rescues their phenotype in relation to the VV-OVA site, the potential antigen-independent factors distinguishing these two sites remain unaddressed. 
      In addition, there is a possibility that peptide treatment of DNFB-treated skin initiates a second phase of priming of new circulatory effectors in the local-draining lymph nodes that are then recruited to form TRM at the DNFB-site, and that the effect does not solely rely on TRM precursors at the DNFB-treated skin site at the time of peptide treatment. 

      Thank you for pointing out these potential caveats to our work. We have considered the possibility that late application of peptide or cell-extrinsic differences could affect the interpretation of our results. We would like to highlight that in our prior publication on this topic [1], we found that OT-1 responses in mice infected with VV-OVA and VV-N (irrelevant antigen) yielded the same responses as in our VV-OVA/DNFB models. In addition, in both our prior publication and our current manuscript, application of peptide to DNFB-painted sites results in T<sub>RM</sub> with a similar phenotype to those in the VV-OVA site. Thus, we are confident that it is the presence of cognate antigen in the skin that drives the augmented T<sub>RM</sub> fitness that we observe.

      Secondly, although the authors conclusively demonstrate that TGFBRIII is induced by TCR signals and required for conferring increased fitness to local-antigen-experienced CD8+ TRM compared to local antigen-inexperienced cells, this is done in only one experiment, albeit repeated 3 times. The data suggest that antigen encounter during TRM formation induces sustained TGFBRIII expression that persists during the antigen-independent memory phase. It remains unclear why only the antigen encounter in skin, but not already in the draining lymph nodes, induces sustained TGFBRIII expression. Further characterizing the dynamics of TGFBRIII expression on CD8+ T cells during priming in draining lymph nodes and over the course of TRM formation and persistence may shed more light on this question. Probing the role of this mechanism at other sites of TRM formation would also further strengthen their conclusions and enhance the significance of this finding. 

      This is an intriguing point.  We do not understand why expression of TGFbR3 in T<sub>RM</sub> required antigen encounter in the skin if T<sub>RM</sub> at all sites clearly have encountered antigen during priming in the LN.  We speculate that durable TGFbR3 expression may require antigen encounter in the context of additional cues present in the periphery or only once cells have committed to the T<sub>RM</sub> lineage.  A more detailed characterization of the dynamics of TGFbR3 expression in multiple tissues would be informative and represents a promising future direction for this project.  We note that to robustly perform these experiments a reporter mouse would likely be a requirement.

      Reviewer #2 (Public review): 

      Weaknesses: 

      Overall, the authors' conclusions are well supported, although there are some instances where additional controls, experiments, or clarifications would add rigor. The conclusions regarding skin-localized TCR signaling leading to increased skin CD8+ TRM proliferation in-situ and increased TGFBR3 expression would be strengthened by assessing skin CD8+ TRM proliferation and TGFBR3 expression in models of high versus low avidity topical OVA-peptide exposure.

      Thank you for these helpful suggestions. We did not attempt these experiments because, given the relatively modest expansion differences observed with the APL, we were concerned that resolving differences in TGFbR3 and BrdU would prove unreliable. However, this is something that we could attempt as we continue working on this project.

      The authors could further increase the novelty of the paper by exploring whether TGFBR3 is regulated at the RNA or protein level. To this end, they could perform analysis of their single-cell RNA sequencing data (Figure 1), comparing Tgfbr3 mRNA in DNFB versus VV-treated skin. 

      As discussed above, a more detailed analysis of TGFbR3 regulation is of great interest.  These experiments would likely require the creation of additional tools (e.g. a reporter mouse) to provide robust data.  However, as suggested, we have re-analyzed our scRNAseq looking for expression of Tgfbr3. Pseudobulk analysis of cells isolated from VV or DNFB sites suggests that Tgfbr3 appears to be elevated in antigen-experienced TRM at steady-state (Author response image 1).

      Author response image 1.

      Pseudobulk analysis by average gene expression of Tgfbr3 in cells isolated from either VV or DNFB treated flanks, divided by the average gene expression of Tgfbr3 in naïve CD8 T cells from the same dataset.
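
      The normalization described in this legend can be illustrated with a minimal sketch. It assumes per-cell expression values for a single gene and a group label per cell; the function name `pseudobulk_ratio`, the label `"naive"`, and the toy numbers are hypothetical and are not taken from the authors' analysis pipeline.

```python
import numpy as np

def pseudobulk_ratio(expr, groups, reference="naive"):
    """Mean expression of one gene per group, divided by the mean in the reference group.

    expr      : per-cell expression values for a single gene (hypothetical input)
    groups    : per-cell group labels, same length as expr
    reference : label of the group used as the denominator (assumed name)
    """
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    ref_mean = expr[groups == reference].mean()
    if ref_mean == 0:
        raise ValueError("reference group has zero mean expression")
    # One ratio per non-reference group
    return {
        str(g): float(expr[groups == g].mean() / ref_mean)
        for g in np.unique(groups)
        if g != reference
    }

# Toy values for illustration only; these are not measured data.
expr = [0.0, 2.0, 4.0, 6.0, 1.0, 3.0]
groups = ["naive", "naive", "VV", "VV", "DNFB", "DNFB"]
ratios = pseudobulk_ratio(expr, groups)
```

      A ratio above 1 for a group would indicate higher average expression than in the reference population; the sketch makes no claim about how the published values were computed beyond what the legend states.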

      For clarity, when discussing antigen exposure throughout the paper, it would be helpful for the authors to be more precise that they are referring to the antigen in the skin rather than in the draining lymph node. A more explicit summary of some of the lab's previous work focused on CD8+ TRM and the role of TGFb would also help readers better contextualize this work within the existing literature on which it builds. 

      We appreciate this feedback, and we have clarified this in the text.

      For rigor, it would be helpful where possible to pair flow cytometry quantification with the existing imaging data.

      Thank you for these suggestions. In terms of quantification of the number of T<sub>RM</sub> by flow cytometry, we have previously demonstrated as much as a 36-fold decrease in cell count when compared to numbers directly visualized by immunofluorescence [1]. Thus, for enumeration of T<sub>RM</sub> we rely primarily on direct IF visualization and use flow cytometry primarily for phenotyping.

      Additional controls, namely enumerating TRM in the opposite, untreated flank skin of VV-only-treated mice and the treated flank skin of DNFB-only treated mice, would help contextualize the results seen in dually-treated mice in Figure 2.

      Without a source of inflammation (e.g. VV infection or DNFB) we see very few T<sub>RM</sub> in untreated skin. A representative image is provided (Author response image 2). A single DNFB stimulation does not recruit any CD8+ T cells to the skin without a prior sensitization [2].

      Author response image 2.

      Representative images of epidermal whole mounts of VV treated flank skin, and an untreated site from the same mouse isolated on day 50 post infection and stained for CD8a.

      In figure legends, we suggest clearly reporting unpaired T tests comparing relevant metrics within VV or DNFB-treated groups (for example, VV-OVA PBS vs VV-OVA FTY720 in Figure 3F).

      Thank you for this suggestion.  The figure legends have been amended.

      Finally, quantifying right and left skin draining lymph node CD8+ T cell numbers would clarify the skin specificity and cell trafficking dynamics of the authors' model. 

      We quantified the numbers of CD8 T cells in the left and right skin-draining lymph nodes by flow cytometry at day 50 after VV infection and DNFB pull. We observed similar numbers of cells at both sites (Author response image 3).

      Author response image 3.

      Quantification of total number of CD8+ T cells in left and right inguinal lymph nodes. Each symbol represents paired data from the same individual animal, and this is representative of 3 separate experiments.

      Reviewer #1 (Recommendations for the authors): 

      (1) Figures 1D and S1C demonstrate that 80-90 % of TRM at both VV and DNFB sites express CD103+. In contrast, the sequencing data suggests the TRM at the VV site has much higher Itgae expression. Also, clusters 3 and 4, which express significantly more Itgae than all other clusters, together comprise only ~30% of CD8+ T cells at the VV-infected skin site. How can these discrepancies between transcript and protein expression be explained? 

      Thank you for these excellent comments. T<sub>RM</sub> at both VV and DNFB sites appear to express similarly high levels of CD103 protein, both in the OT-I system as we previously published [1] and in a polyclonal system using tetramers. We attribute the lower penetrance of Itgae expression in the scRNAseq data to a lack of sensitivity, which is common with this modality. The relative increase in Itgae expression in clusters 3 and 4 is interesting and may suggest increased Itgae production/stability. However, in the absence of any effect on protein expression, we chose not to focus on these mRNA differences.

      (2) For the experiments in Figure 3D, in order to exclude a contribution from circulating memory cells, FTY720 should have been administered during the duration of, not prior to, the initiation of the recall response. The effect of FTY720 wears off quickly, so the current experimental setting likely allows for circulating cells to enter the skin. This concern is mitigated by the results of anti-Thy1.1 mAb treatment, but documenting the experiment as in Figure D will likely be confusing to readers. 

      Thank you for this comment. We relied on the literature indicating that the half-life of FTY720 in blood is longer than 6 days [3-5]. However, on reviewing this again, there are other reports suggesting a shorter half-life. Thank you for pointing out this potential caveat. As mentioned above, we do not think this affects the interpretation of our data, as similar results were obtained with anti-Thy1.1.

      (3) Similar to what is described in the weaknesses section, the data on TGFBRIII expression is lacking. When is TGFBRIII induced? In the LN during primary activation and it is then sustained by a secondary antigen exposure at the peripheral target tissue site? Or is it only induced in the peripheral tissue, and there is interesting biology to uncover in regard to how it is induced by the TCR only after secondary exposure, etc.? 

      Thank you for these comments. As discussed above, a more detailed analysis of TGFbR3 regulation is of great interest.  These experiments would likely require the creation of additional tools (e.g. a reporter mouse) to provide robust data and are part of our future directions.

      (4) As described in the weakness section, there could be TCR-independent differences between the VV-OVA and DNFB sites that lead to phenotypic changes in the TRMs that are formed there, both CD8+ T cell-intrinsic (kinetics; with regard to time after initial priming) and extrinsic (microenvironmental differences due to the nature of the challenge, recruited cell types, cytokines, chemokines, etc.). Since the authors report the use of both VV and VV-ova, we recommend an experimental strategy that controls for this by challenging one site with VV and another with VV-OVA concomitantly, followed by repeating the key experiments reported in this manuscript. 

      As discussed above, we have previously published a very similar experiment using VV-OVA and VV-N infection on opposite flanks [1].

      (5) In Figure 6J please indicate means and provide more of the statistics comparing the groups (such as comparing VV-WT vehicle to VV-KO vehicle etc.), and potentially display on a linear scale as with all of the other figures looking at cells/mm2 to help convince the reader of the conclusions and support the secondary findings mentioned in the text such as "Notably, numbers of Tgfbr3ΔCD8 TRM in cohorts treated with vehicle remained at normal levels indicating that loss of TGFβRIII does not affect TRM epidermal residence in the steady state" despite it looking like there is a decrease when looking at the graph. 

      We appreciate the feedback on the readability of this figure. We have updated Figure 6J to use a linear scale and added additional statistics to the figure legend. The difference between Tgfbr3<sup>WT</sup> and Tgfbr3<sup>∆CD8</sup> at steady state is an excellent point, and we agree that there could be a trend towards reduction in the huNGFR+ T<sub>RM</sub> across both groups, even without CWHM12 administration. We did not see statistically significant reductions in steady-state Tgfbr3<sup>∆CD8</sup> T<sub>RM</sub>, but the slight reduction in both VV-OVA and DNFB treated flanks suggests that TGFβRIII may play a role in steady-state maintenance of all T<sub>RM</sub>. Perhaps with more sensitive tools to better visualize TGFβRIII expression, we could identify stepwise upregulation of TGFβRIII depending on TCR signal strength, possibly starting in the lymph node. We have also amended our description of this figure in the text to allow for the possibility that a low amount of TGFβRIII, below the level of detection, could play a role in steady-state maintenance of both local antigen-experienced and bystander T<sub>RM</sub>.

      Minor points: 

      (1) In describing Figure 4B, the term "doublets" for pairs of connected dividing cells is confusing. 

      Thank you for this comment. The term has been revised to “dividing cells” in the text and figure.

      (2) Figure legend 4F: BrdU is not "expressed" . 

      Very true, it has been changed to “incorporation”.

      (3) Do CreERT2 and/or huNGFR expressed by transferred OT-I cells act as foreign antigens in C57BL/6 mice, potentially causing elimination of circulating memory cells? If that were the case, this would not necessarily confound the read-out of TRM persistence studied here, since skin TRM are likely protected from at least antibody-mediated deletion and their numbers are not maintained by recruitment of circulating cells at steady-state. However, it would be useful to be aware of this potential limitation of this and similar models. 

      Thank you for raising this important technical concern. In our prior work [1] and in this work, we monitored the levels of transferred OT-I cells in the blood over time. We have not observed rejection of huNGFR+ cells. We also note that others using the same system have not observed rejection either [6].

      (4) In Figure 6J, means or medians should be indicated 

      This has been updated in Figure 6J.

      (5) Using the term "antigen-experienced" to specifically refer to TRM at the VV site could be confusing, since those at the DNFB site are also Ag-experienced (in the LN draining the VV skin site). 

      We agree that it is a challenging term, as all T<sub>RM</sub> are memory cells. That is why, in the text, we refer to T<sub>RM</sub> isolated from the VV site as “local antigen experienced T<sub>RM</sub>”, to distinguish them from bystanders that did not experience local antigen.

      (6) The Title essentially restates what was already reported in the authors' prior study. If the data supporting the TGFBRIII-mediated mechanism is studied in more depth, maybe adding this aspect to the title may be useful? 

      Thank you for this suggestion. We think the current title is the most suitable for this manuscript, but we are willing to change it should the editors support an alternative title.

      Reviewer #2 (Recommendations for the authors): 

      (1) Definition of bystander CD8+ TRM: The first paragraph of the introduction defines CD8+ TRM. To improve the clarity of this definition, we suggest being explicit that bystander TRM experience cognate antigen in the SDLNs but, in contrast to other TRM, do not experience cognate antigen in the skin. 

      Thank you, we have clarified this in the text.

      (2) Consider softening the language when comparing the efficiency of CD8+ recruitment of the skin between DNFB and VV-treated flanks. For example, substitute "equal efficiency" with "comparable efficiency" since it is difficult to directly compare the extent of inflammation between viral and hapten-based treatments. 

      We have adjusted this terminology throughout the paper.

      (3) Throughout figure legends, we appreciate the indication of the number of experimental repeats performed. We suggest, either through statistics or supplemental figures, demonstrating the degree of variability between experiments to aid readers in understanding the reproducibility of results. 

      Thank you for this suggestion.  In key figures we show data from individual mice across multiple experiments. Thus, inter-experiment variability is captured in our figures.  

      (4) Figure 1: 

      a) Add control mice treated with either vaccinia virus or DNFB and harvest back skin at day 52 to demonstrate baseline levels of polyclonal and B8R tetramer-positive CD8s in the epidermis. These controls would clarify the background CD8+ expansion that might occur in DNFB-treated mice in the absence of vaccinia virus. 

      This point was addressed above.

      b) Figure 1: It would be helpful to see the %Tet+ population specifically in the CD103+ population, recognizing that the majority of the CD8+ from the skin are CD103+. 

      We did look only at CD103+ CD8 T cells from the skin for our tetramer analysis, so this has been clarified in the figure legend.

      c) Provide a UMAP, very similar to 1H, where CD8+ T cells, vaccinia virus, and DNFB-treated flanks are overlaid.

      Thank you for this suggestion.  A UMAP combining aspects of 1G (cell types from the whole ImmgenT dataset) with 1H (our data) results in a figure that is very difficult to interpret.  Thus, we have separated cell types across the entire ImmgenT data set (e.g. CD8+ T cells) and our data into 2 separate panels.

      d) 1D: left flow plot has numbered axis while the right flow plot does not. 

      Thank you, this has been fixed.

      (5) Figure 2: 

      a) In the figure legend, define what is meant by the grey line present in Figures 2C and 2D. 

      This has been updated in the figure legend.

      b) Edit the Y axis of 2C and 2D to specify the TRM signature score. 

      This has been updated in the figure.

      c) Include panel 1D from 1S into Figure 2 to help clarify for the reader what genes are expressed in the 0 - 5 clusters.

      We appreciate the feedback, but we found the heatmap made the figure look too busy, so we feel comfortable keeping it available within supplemental figure 1.

      d) In the body of the text, explicitly discuss that the TRM module used to calculate a signature score was created using virus infection modules (HSV, LCMV and influenza) and thus some of the transcriptional similarity between the authors' vaccinia virus treated CD8+ TRM and the TRM module might be due to viral infection rather than TRM status.

      Thank you for this comment.  We have now emphasized this point in the text.

      (6) Figure 3: 

      a) If there are leftover tissue sections, it would be optimal to show specific staining for CD103. We recognize that this data has been previously published by the lab, but it would be ideal to show it once in this paper. 

      Unfortunately, we do not have leftover tissue sections, so we are unable to measure CD103 by I.F. in these experiments.

      b) If you did collect skin draining lymph nodes in the Thy1.1 depletion model, it would be nice to see flow data showing the depletion effects in the skin draining lymph nodes in addition to the blood. 

      Unfortunately, we did not collect the skin draining lymph nodes, and do not have that data for the relevant experiments.

      c) Figure 3 F & G: Perform a T-test comparing vaccinia virus PBS to FTY720 and isotype to anti-Thy1.1 within the same treatment group. Showing no significance with these two comparisons would strengthen the authors' claims. Statistics can be described in legend. 

      We have included this analysis in the figure legend.

      (7) Figure 4: 

      a) It would be helpful to have the CD69+/CD103+ population in this model discussed/defined more. The CD69 expression seen in 4E is lower than the reviewers would've predicted, and it would be interesting to see CD103 expression as well.

      We have found that CD103 is generally a stronger marker for T<sub>RM</sub> in the skin by flow, as CD69 staining is somewhat less robust in the colors we have chosen. By way of example, we present the upstream gating from that experiment, previously gated on live CD45+CD3+CD8+ events (Author response image 4).

      Author response image 4.

      Representative flow cytometric plots showing CD69 and CD103 expression in gated live CD45+CD8+CD90.1+ cells isolated from VV-OVA or DNFB treated flanks.

      (8) Figure 5: 

      a) Define APL and its purpose in both the body of text and the figure legend. 

      We have clarified this in the text and the figure legend.

      b) Using in-vivo BrdU, compare proliferation between high avidity N4 and low avidity Y3 OVA-peptide at the primary recall timepoint. 

      We considered this, but due to the lack of sensitivity of the BrdU incorporation and the relatively subtle phenotype of the Y3, we did not think the assay would be sensitive enough to identify differences.

      (9) Figure 6: 

      a) Compare TGFBR3 expression in CD8+ T cells from mice receiving high avidity N4 versus low avidity Y3 OVA-peptide at the primary recall timepoint. 

      This point was discussed above.

      b) Either 1) examine TGFBR3 mRNA expression in VV vs DNFB skin from scRNA-seq dataset or 2) perform a qPCR on epidermal CD8+ T cells from mice receiving high avidity N4 versus low avidity Y3 at the primary recall timepoint. This would help distinguish whether TGFBR3 regulation occurs at the mRNA versus protein level. 

      This point has been discussed above.

      c) Figure 6A: Not required, but it seems like the TGFBR3 gate could be shifted to the right a bit. 

      The gates were set using FMO.

      d) Figure 6C: What comparison is the asterisk indicating significance referring to?

      It is the Dunnett’s test comparing VV-OVA to DNFB and untreated skin. The figure has been amended to clarify this point.

      e) Figure 6: To increase the rigor of the claim that CWHM12 is creating a TGFb limiting condition, the authors could either 1) perform an ELISA or cell-based assay measuring active TGFb, 2) recapitulate results of 6J using monoclonal antibody against avb6 as done in Hirai et al., 2021, Immunity., or 3) examine Tgfbr3 mRNA expression in your single cell RNAseq data, comparing cluster 0 and cluster 3.

      We are pleased to have the opportunity to show Tgfbr3 mRNA expression, which is presented above in Author response image 1.

      (10) Material and methods: 

      Specify how the localization of the back skin used for imaging was made consistent between the right and left flanks. 

      We have updated this methodology in the text.

      Literature Cited

      (1) Hirai, T., et al., Competition for Active TGFβ Cytokine Allows for Selective Retention of Antigen-Specific Tissue- Resident Memory T Cells in the Epidermal Niche. Immunity, 2021. 54(1): p. 84-98.e5.

      (2) Manresa, M.C., Animal Models of Contact Dermatitis: 2,4-Dinitrofluorobenzene-Induced Contact Hypersensitivity, in Animal Models of Allergic Disease: Methods and Protocols, K. Nagamoto-Combs, Editor. 2021, Springer US: New York, NY. p. 87-100.

      (3) Müller, H.C., et al., The Sphingosine-1 Phosphate receptor agonist FTY720 dose dependently affected endothelial integrity in vitro and aggravated ventilator-induced lung injury in mice. Pulmonary Pharmacology & Therapeutics, 2011. 24(4): p. 377-385.

      (4) Nofer, J.-R., et al., FTY720, a Synthetic Sphingosine 1 Phosphate Analogue, Inhibits Development of Atherosclerosis in Low-Density Lipoprotein Receptor–Deficient Mice. Circulation, 2007. 115(4): p. 501-508.

      (5) Brinkmann, V., et al., Fingolimod (FTY720): discovery and development of an oral drug to treat multiple sclerosis. Nat Rev Drug Discov, 2010. 9(11): p. 883-97.

      (6) Andrews, L.P., et al., A Cre-driven allele-conditioning line to interrogate CD4<sup>+</sup> conventional T cells. Immunity, 2021. 54(10): p. 2209-2217.e6.

    1. Author response:

      Reviewer #1 (Public review):

      Summary:

      The behavior of cells expressing constitutively active HRas is examined in mosaic monolayers, both in MCF10a breast epithelial and Beas2b bronchial epithelial cell lines, mimicking the potential initial phase of development of carcinoma. Single HRas-positive cells are excluded from MCF10a but not Beas2b monolayers. Most interestingly, however, when in groups, these cells are not excluded, but rather sharply segregated within a MCF10a monolayer. In contrast, they freely mix with wt Beas2b cells. Biophysical analysis identifies high tension at heterotypic interfaces between HRas and wild-type cells as the likely reason for segregation of MCF10a cells. The hypothesis is supported experimentally, as myosin inhibition abolishes segregation. The probable reason for the lack of segregation in the bronchial epithelium is to be found in the different intrinsic properties of these cells, which form a looser tissue with lower basal actomyosin activity. The behaviour of single cells and groups is recapitulated in a vortex model based on the principle of differential interfacial tension, under the condition of high heterotypic interfacial tension.

      Strengths:

      Despite being long recognized as a crucial event during cancer development, segregation of oncogenic cells has been a largely understudied question. This nice work addresses the mechanics of this phenomenon through a straightforward experimental design, applying the biophysical analytical approaches established in the field of morphogenesis. Comparison between two cell types provides some preliminary clues on the diversity of effects in various cancers.

      Weaknesses:

      Although not calling into question the main message of this study, there are a few issues that one may want to address:

      (1) One may be careful in interpreting the comparison between MCF10a and Beas2b cells as used in this study. The conditions may not necessarily be representative of the actual properties of breast and bronchial epithelia. How much of the epithelial organization is reconstituted under these experimental conditions remains to be established. This is particularly obvious for bronchial cells, which would need quite specific culture conditions to build a proper bronchial layer. In this study, they seemed to be on the verge of a mesenchymal phenotype (large gaps, huge protrusions, cells growing on top of each other, as mentioned in the manuscript).

      We thank the reviewer for this important point. We agree that our experimental conditions do not fully recapitulate the in vivo architecture of either breast or bronchial epithelia. However, our intention here is to compare two well-established epithelial lines with distinct intrinsic mechanical and organizational properties, rather than to reproduce the in vivo microenvironment. Nevertheless, to address this, we have now strengthened our quantitative analysis of epithelial integrity in Beas2b monolayers by including ZO-1 immunofluorescence along with E-cadherin immunofluorescence. These measurements confirm that Beas2b monolayers under our culture conditions retain junctional organization, albeit with larger gaps and protrusions compared to MCF10a. We will revise the text to make this distinction explicit.

      As an alternative to Beas2b, comparison of MCF10a with another cell line capable of more robust in vitro epithelial organization, but ideally with different adhesive and/or tensile properties, would be highly interesting, as it may narrow down the parameters involved in segregation of oncogenic cells.

      We agree with the reviewer that the inclusion of an additional epithelial model system with distinct adhesive and organizational properties would provide valuable insights. In line with this suggestion, we are currently repeating the key experiments using Madin-Darby Canine Kidney (MDCK) cells, a well-established model epithelial cell line. We believe this complementary system will allow us to further dissect the behaviour of HRasV12-expressing cells.

      (2) While the seminal description of tissue properties based on interfacial tensions (Brodland 2002) is clearly key to interpreting these data, the actual "Differential Interfacial Tension Hypothesis" poses that segregation results from global differences, i.e., juxtaposition of two tissues displaying different intrinsic tensions. On the contrary, the results of the present work support a different scenario, where what counts is the actual difference in tension ALONG the tissue boundary, in other words, that segregation is driven by high HETEROTYPIC interfacial tension. This is an important distinction that should be clarified.

      We thank the reviewer for this insightful comment. As correctly noted, Brodland’s 2002 work provided a seminal formulation of the Differential Interfacial Tension Hypothesis (DITH), which frames tissue organization in terms of effective interfacial tensions. In its original form, DITH emphasized segregation as a consequence of global differences in the intrinsic (bulk) tensions of juxtaposed tissues.

      While our results specifically show that segregation is determined by local interfacial mechanics between transformed and host cells, our blebbistatin experiments, in which segregation was lost upon reducing global contractility, lead us to believe that the differences in local interfacial mechanics also stem from global differences that are intrinsic to the tissues discussed here.

      To directly map global interfacial tension, in the revised manuscript we aim to stain the two tissues for E-cadherin and actin, and to measure cortical actin, stress fibers, and E-cadherin levels at the cell-cell junctions. Once the global tissue mechanics are mapped, we can be more confident about our claim regarding DITH. Nevertheless, we will also clarify this distinction in the text and explicitly state that, while DITH provided the foundation for conceptualizing tissue mechanics, our findings on interactions between transformed and healthy cells specifically demonstrate that segregation is driven by high heterotypic interfacial tension at the tissue boundary.

      (3) Related: The fact that actomyosin accumulates at the heterotypic interface is key here. It would be quite informative to better document the pattern of this accumulation, which is not clear enough from the images of the current manuscript: Are we talking about the actual interface between mutant and wt cells (membrane/cortex of heterotypic contacts)? Or is it more globally overactivated in the whole cell layer along the border? Some better images and some quantification would help.

      We agree that more detailed visualization of actomyosin distribution would strengthen our conclusion. We are currently working on re-imaging the heterotypic interfaces at higher magnification and are quantifying fluorescence intensity of actin and myosin-II along cell–cell boundaries. All of this will be integrated in the next version of the manuscript.

      (4) In the case of Beas2b cells, mutant cells show higher actin than wt cells, while actin is, on the contrary, lower in mutant MCF10a cells (Author response image 2). Has this been taken into account in the model? It may be in line with the idea that HRas may have a different action on the two cell types, a possibility that would certainly be worth considering and discussing.

      Our current vertex model does not explicitly incorporate actin levels; rather, it captures their functional consequences indirectly through effective mechanical parameters such as cortical tension and adhesion strength. Nonetheless, we agree that the opposite trends in actin enrichment between Beas2b and MCF10a HRasV12 mutants raise the important possibility that HRas signaling may act through distinct mechanisms in the two cell types.

      To further investigate this, we are currently culturing MCF10a and Beas2b HRasV12 mutant populations separately (i.e., without wild-type cells) to assess their intrinsic organization and behavior in isolation. These experiments will help us disentangle how HRas activation differentially impacts epithelial architecture in these two cellular contexts, and we will discuss these ongoing efforts in the revised manuscript.

      From the modelling perspective, the model currently does not account for the different actin levels of mutants with respect to wt cells in the two tissues. This can be accounted for by having different  and  for mutants and wt in the two cases in simulation.

      In conclusion, the study conveys an important message, but, as it stands, the strength of evidence is incomplete. It would greatly benefit from a more detailed and complete analysis of the experimental data, a better fit between this analysis and the corresponding vertex model, and a more in-depth discussion of biological and biophysical aspects. These revisions should be rather easily done, and would then make the evidence much more solid.

      Reviewer #2 (Public review):

      Summary:

      The authors investigate the behavior of oncogenic cells in mammary and bronchial epithelia. They observe that individual oncogenic cells are preferentially excluded from the mammary epithelium, but they remain integrated in the bronchial epithelium. They also observe that clusters of oncogenic cells form a compact cluster in the mammary epithelium, but they disperse in the bronchial epithelium. The authors demonstrate experimentally and in the vertex model simulations that the difference in observed behavior is due to the differential tension between the mutant and wild-type cells due to a differential expression of actin and myosin.

      Strengths:

      (1) Very detailed analysis of experiments to systematically characterize and quantify differences between mammary and bronchial epithelia.

      (2) Detailed comparison between the experiments and vertex model simulations to identify the differential cell line tension between the oncogenic and wild-type cells as one of the key parameters that are responsible for the different behavior of oncogenic cells in mammary and bronchial epithelia.

      Weaknesses:

      (1) It is unclear what the mechanistic origin of the shape-tension coupling is, which is used in the vertex model, and how important that coupling is for the presented results. The authors claim that the shape-tension coupling is due to the anisotropic distribution of stress fibers when cells are under external stress. It is unclear why the stress fibers should affect an effective line tension on the cell boundaries and why the stress fibers should be sensitive to the magnitude of the internal isotropic cell pressure. In experiments, it makes sense that stress fibers form when cells are stretched. Similar stress fibers form when the cytoskeleton or polymer networks are stretched. It is unclear why the stress fibers should be sensitive to the magnitude of internal isotropic cell pressure. If all the surrounding cells have the same internal pressure, then the cell would not be significantly deformed due to that pressure, and stress fibers would not form. The authors should better justify the use of the shape-tension coupling in the model and also present simulation results without that coupling. I expect that most of the observed behavior is already captured by the differential tension, even if there is no shape-tension coupling. 

      While the segregation behavior can be captured by the differential tension alone, without the shape-tension coupling, we noticed unjamming and aligned movement of wild-type cells at the mutant-cell interface. This was only captured when we incorporated shape-tension coupling in the model, suggesting that changes in cell shape due to differential interfacial tension are essential in driving the fate of the mutants. Below, the difference between the shape indices of cells at the interface and of cells away from the boundary is plotted versus the interfacial tension in the case of no shape-tension coupling [Author response image 1]. The red dashed line represents the experimental value of the shape index difference. The blue line is the shape index difference between two randomly chosen groups of cells (half of the total number of cells in each group). At zero line tension, the difference in shape index between interface cells and cells away from the interface is the same as that between randomly chosen groups of cells, which is expected since there should be no interface at zero line tension. The data without shape-tension coupling presented here are averaged over 19 seeds. Although the results without shape-tension coupling reach experimental values at high enough differential tension [Author response image 2], a closer inspection of the simulation results shows that the cells are just squeezed and are aligned perpendicular to the interface, which is contrary to what is seen in experiments.

      Author response image 1.

      Shape indices versus the interfacial line tension
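
      The comparison described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis code: it assumes the common vertex-model definition of the shape index, perimeter divided by the square root of area, and the function names (`shape_index`, `shape_index_difference`, `random_split_difference`) are hypothetical.

```python
import numpy as np

def shape_index(vertices):
    """Shape index P / sqrt(A) of a polygon given as ordered (n, 2) vertices.

    Assumption: this is the usual vertex-model definition; the manuscript
    may normalise differently.
    """
    v = np.asarray(vertices, dtype=float)
    nxt = np.roll(v, -1, axis=0)  # next vertex along the polygon
    perimeter = np.linalg.norm(nxt - v, axis=1).sum()
    # Shoelace formula for the polygon area.
    area = 0.5 * abs(np.sum(v[:, 0] * nxt[:, 1] - nxt[:, 0] * v[:, 1]))
    return perimeter / np.sqrt(area)

def shape_index_difference(shape_indices, is_interface):
    """Mean shape index of interface cells minus that of all other cells."""
    p = np.asarray(shape_indices, dtype=float)
    mask = np.asarray(is_interface, dtype=bool)
    return p[mask].mean() - p[~mask].mean()

def random_split_difference(shape_indices, rng):
    """Baseline: difference between two random halves of the same cells."""
    p = np.asarray(shape_indices, dtype=float)
    perm = rng.permutation(len(p))
    half = len(p) // 2
    return p[perm[:half]].mean() - p[perm[half:]].mean()
```

      Under these assumptions, repeating `random_split_difference` with different random generators gives a sense of how large a difference can arise with no interface at all, which is the role played by the blue line in the response.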

      Calculating the average of the absolute value of the dot product of the nematic director and the interface edge for simulations with and without shape-tension coupling clearly shows that, with shape-tension coupling, the cells align and elongate along the interface as seen in experiments, as indicated by an interface dot product value > 0.5 at high enough line-tension values. Further, shape-tension coupling, or biased edge tension, has been used before to model cell elongation during embryo elongation [1]; here we use it as an active line-tension force, which elongates cells along the interface, in addition to the differential tension, which is passive. This additional quantification of the alignment and elongation of cells along the interface will be added to the Supplementary Information (SI).

      [1] Dye, N. A., Popović, M., Iyer, K. V., Fuhrmann, J. F., Piscitello-Gómez, R., Eaton, S., & Jülicher, F. (2021). Self-organized patterning of cell morphology via mechanosensitive feedback. Elife, 10, e57964.
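
      The alignment measure described above, the mean absolute dot product between each cell's nematic director and its interface edge, could be computed along the following lines. This is a hedged sketch under stated assumptions: the director is taken as the principal axis of the second-moment tensor of the cell's vertices, which may differ from the definition used in the manuscript, and the names `nematic_director` and `interface_alignment` are hypothetical.

```python
import numpy as np

def nematic_director(vertices):
    """Unit vector along the long axis of a cell.

    Assumption: the director is the eigenvector with the largest eigenvalue
    of the second-moment tensor of the vertices about their mean.
    """
    v = np.asarray(vertices, dtype=float)
    centered = v - v.mean(axis=0)
    shape_tensor = centered.T @ centered / len(v)
    # eigh returns eigenvalues in ascending order, so the last column
    # is the direction of greatest elongation.
    _, eigvecs = np.linalg.eigh(shape_tensor)
    return eigvecs[:, -1]

def interface_alignment(cells, interface_edges):
    """Mean |director . edge tangent| over interface cells.

    cells: list of (n, 2) vertex arrays, one per interface cell.
    interface_edges: list of (start, end) points, one edge per cell.
    Values near 1 mean elongation along the interface; values near 0
    mean elongation perpendicular to it.
    """
    values = []
    for verts, (start, end) in zip(cells, interface_edges):
        tangent = np.asarray(end, dtype=float) - np.asarray(start, dtype=float)
        tangent /= np.linalg.norm(tangent)
        values.append(abs(nematic_director(verts) @ tangent))
    return float(np.mean(values))
```

      The absolute value is taken because a nematic director has no head or tail: a director and its negative describe the same elongation axis.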

      Author response image 2.

      Change in interfacial tension with and without shape-tension coupling

      (2) The observed difference of shape indices between the interfacial and bulk cells in simulations in the absence of differential line tension is concerning. This suggests that either there are not enough statistics from the simulations or that something is wrong with the simulations. For all presented simulation results, the authors should repeat multiple simulations and then present both averages and standard deviations. This way, it would be easier to determine whether the observed differences in simulations are statistically significant.

      The reviewer is right in pointing out that statistics for the plots must be shown. The difference in shape indices between the interfacial and bulk cells in simulations has been calculated over 11 different seed values. The observed differences in simulations, along with the standard deviations, have been plotted below [Author response image 3]. This figure in the paper will be updated to include the standard deviations. The non-zero difference in shape index in the absence of differential line tension for low values of the stress threshold is due to the shape-tension coupling acting even at low differential tension. Thus, a non-zero, sufficiently high value of the stress threshold is required in our model with shape-tension coupling for the model to make sense. This has also been stated in section 4 of the paper. The importance of the shape-tension coupling has been stated in response to the previous point.

      Author response image 3.
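
      The averaging over seeds requested by the reviewer can be illustrated with a minimal sketch. The array layout (one row per seed, one column per tension value) and the function name `summarize_over_seeds` are assumptions made for illustration, not details taken from the manuscript.

```python
import numpy as np

def summarize_over_seeds(values_by_seed):
    """Mean and sample standard deviation across simulation seeds.

    values_by_seed: array of shape (n_seeds, n_tension_values), where each
    row holds the shape-index difference measured in one seeded run.
    """
    arr = np.asarray(values_by_seed, dtype=float)
    mean = arr.mean(axis=0)
    # ddof=1 gives the sample standard deviation, appropriate when the
    # seeds are a finite sample of possible runs.
    std = arr.std(axis=0, ddof=1)
    return mean, std
```

      Plotting `mean` with `std` as error bars against the tension values would give the kind of figure described in the response.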

      (3) The authors should also analyze the cell line tension data in simulations and make a comparison with experiments.

      We agree with the reviewer that cell line tension data should also be analyzed and compared with experiments. This will be added to the next version of the paper.